Showing posts with label model. Show all posts
Showing posts with label model. Show all posts

Sunday, March 25, 2012

Cluster Names

I have a basic segmentation model that uses the clustering algorithm. I've renamed the clusters in the cluster diagram, but they still show up as Cluster 1-n in the dimension browser and in the cube browser.

After renaming the clusters, I did a full reprosses on the data mining dimension and on the linked cube. Can anyone tell me what else I need to do to use the new cluster names in the report?

Thanks

A full reprocess of the DM dimension and the linked cube should do the trick - in fact, you don't really need to reprocess the cube with the linked DM dimension. Are you sure the reprocess completed successfully and you refreshed the dimension/cube browsers?|||

Yes. I tried it again with the same results. I noticed that when I opened the project that the cluster names that I entered were gone. The model had Cluster 1-10 again. I renamed them again, saved the .dmm file, reprocessed the dimension (full, successful), and reconnected in the browser. It still shows Cluster 1-10. I tried re-processing the model. That resets the cluster names in the model viewer.

Thanks for your help Raman.

|||

Renaming clusters in the viewer directly updated the model on the server - there is no need to reprocess the model. What you're seeing still looks like a client refresh issue.

What version of the Analysis Services 2005 and client tools (BI Dev Studio) are you running? RTM or SP1? Would you be able to provide complete repro steps for this problem?

|||

SP1, using BIDS to browse

In trying to re-create the problem in AdventureWorks, I found that changing the cluster names worked. I noticed the AllowDrillthrough setting in the AW cluster model was set to True. After changing that property in my project, the cluster names showed in the dimension. This seems to fix the problem.

Thanks for your help Raman.

sqlsql

Cluster Model Viewer Timeout

Hi,

I am trying to browse a clustering model and encounter the following timeout error:

XML for Analysis parser: The XML for Analysis request timed out before it was completed.
Execution of the managed stored procedure GetNodeGraph failed with the following error: Exception has been thrown by the target of an invocation.Microsoft::AnalysisServices::AdomdServer::AdomdException.

Is there anything I can do to change settings to enable viewing of this model? Or, perhaps we have too many attributes in the model?

Thanks.

Please try the following:

In BI Dev Studio, go to the Tools menu and select Options\Business Intelligence Designers. Change the value of the Query Timeout to some larger value (600 would be a good start)

|||

Excellent, thanks Bogdan! That worked perfectly. For clarification, it's:

Tools --> Options

--> Browse left-side tree in the pop-up window to Designers --> Analysis Services Designers --> General

Tuesday, March 20, 2012

Cluster Euclidean Distance

I am new to data mining so please excuse my ignorance. Lets assume

- i have created a cluser model

- identified 3 clusters ( a, b, c)

- each record consists of 15 columns

- collecting new records( 15 variables) real time

what i would like to do is plot these new records programmatically as i collect them realtime. I assume this new record will belong to one of these three clusters. I believe we can find the cluster this new record belongs to by ' SELECT Cluster()....' and distance from the center of the cluster by ClusterDistance(). To plot this on a 2-dimentional space i need (x, y).

ClusterDistance() could be Y but what will be X.

thanks.

Cluster() will return cluster that is most likely contain an input case (in your case, the new case). You can also use ClusterProbability() to get the probability that a case belongs to a particular cluster. This basically serves as the (reversed) cluster distance you are talking about; and it works with more general data (including both numeric and discrete data). Moreover, you can use PredictHistogrom(…) to return a histogram of the likelihood of the input case existing in each of the model’s clusters. You can also use CaseLikelihood(…) to return a measure from 0 to 1 that indicates how likely an input case is to exist considering the model learned by the algorithm.

For your reference, we have a live sample The Art of Clustering demonstrating how to use all these features to render 2D data points according to clustering results. I hope this sample will be of help to your project.

Good luck,

|||Your x and y are whatever you choose them to be. For example, the way our cluster diagram works is to plot cluster locations on a 2d plane by arbitrarily laying them out and using a "point-charge" approach to move the clusters around until they converge (or we get tired....). If you were to use such a method for identifying cluster "locations" in 2d space, you could then use ClusterDistance() (which is 1-ClusterProbability) for each case vs. each cluster to approximate where the case would land in the 2d space. You could then color the case by the most likely cluster, and you would have a diagram that looked similar to Yimin's Art of Clustering example, but with arbitrarily dimensioned cluster models.|||

Thanks Wu. I have one more question.

lets assume the new record( all floats columns ) belongs to cluster A but is significantly away from the center of the cluster.

there can be one or more than one columns that caused this record to be far away from the center of the cluster.

Is there any way we can find out the most significant columns in this record that caused it .

thanks

|||This is exactly how we implemented the outlier detection in the data mining addins for Excel. Code that shows how to do this is at http://www.sqlserverdatamining.com/DMCommunity/LiveSamples/46.aspx

Saturday, February 25, 2012

CLONING PACKAGES

I loved the DTS feature that allowed saving DTS package as a VB model. With a bit of coding you could generate a number of packages from one template.

Are there any analogous solutions for SSIS? I cannot find anything.

My goal is simple. I have an ETL step that transfers data from staging dimension table to the corresponding star schema table in the subject matter database. I have two types of packages for SCD type 1 and type 2. Do you have any suggestions on how I can clone packages so that I don’t have to go manually through each of them to replace certain items such as stored procedure names, etc.

Thank you!


Besides copying the .dtsx file and editing the copy to suit your needs?|||You can create a program to generate SSIS packages. If you want to reverse your current packages into C# code, take a look at http://www.ivolva.com/ssis_code_generator.html.|||You could also create a template for each type of load process and store them in the \Program Files\Microsoft Visual Studio 8\Common7\IDE\PrivateAssemblies\ProjectItems\DataTransformationProject\DataTransformationItems folder. I think that's the correct place.|||Phil Brammer: Yes, I want to be able to change all hard-coded string dynamically. Assuming I use standard naming convention all object names will have the same structure. All I want to do is loop through table names and replace appropriate strings within the existing template package to generate new packages on the fly.|||jwelch, that's an interesting solution. I'll try to play with you and let you know if it works. But it seems it's what I need.|||

LoveDanger wrote:

Phil Brammer: Yes, I want to be able to change all hard-coded string dynamically. Assuming I use standard naming convention all object names will have the same structure. All I want to do is loop through table names and replace appropriate strings within the existing template package to generate new packages on the fly.

Using expressions, you can build just ONE package to do this though.

Provided the structures of all of the tables (both sources and destinations) are all the same... That is, you can dynamically adjust the tables the OLE DB sources and destinations go against.|||Thanks Phil, this should work. I assume the transformation part would be tricky (I'm not sure how to map all the columns dynamically), but I'll check if that could be done. Also, I believe the solution posted by jwelch (http://www.ivolva.com/ssis_code_generator.html) should work if one still wants to generate numerous packages.

Thank you all!
|||Note: My solution works if you are looping through tables that have the same structure. If they don't, then my solution won't work. Once you build the package (mappings and all) you can start using expressions to dynamically change the table names.|||Yeah, in my case we have different set of attributes for each dimension that I'm not sure how to handle. I used your approach with BCP though - use only one package to transfer data from a number of flat files to SQL Server tables. Though you still have to generate format files for each that pretty much translates your column mappings.|||

LoveDanger wrote:

I loved the DTS feature that allowed saving DTS package as a VB model. With a bit of coding you could generate a number of packages from one template.

Are there any analogous solutions for SSIS? I cannot find anything.

My goal is simple. I have an ETL step that transfers data from staging dimension table to the corresponding star schema table in the subject matter database. I have two types of packages for SCD type 1 and type 2. Do you have any suggestions on how I can clone packages so that I don’t have to go manually through each of them to replace certain items such as stored procedure names, etc.

Thank you!


Sounds to me like you want to use templates. Matt explained where to drop them elsewhere in this thread.

-Jamie

|||

Well, I am not really sure how would templates solve my problem. I will still have to go through each package task and replace the names of the objects manually.

Does template allow you to do search and replace for the entire package?

|||

LoveDanger wrote:

Well, I am not really sure how would templates solve my problem. I will still have to go through each package task and replace the names of the objects manually.

It sounds as though you'll have to do that regardless of the solution though?

LoveDanger wrote:

Does template allow you to do search and replace for the entire package?

No, not really. You could open up the package's XML (right-click on the package and select 'View Code') and do a Find-Replace that way of that's what you want to do.

-Jamie

|||Thanks Jamie, I think this would be the easiest in my case. At least it will save time on going through each task manually.

Anastasia|||

LoveDanger wrote:

Thanks Jamie, I think this would be the easiest in my case. At least it will save time on going through each task manually.

Anastasia

Cool. Take a copy of the package before you alter it Smile

Sunday, February 12, 2012

Click Through Error - An item with the same key has already been added - When/Why?

I am getting this error when clicking-through in Report Builder on a report using a report model as the data source -
An item with the same key has already been added.

I searched for the error but came up empty in the SSRS space. Does anyone out there know when this is generated and where I need to look in my model/properties to see what I have set incorrectly?

A little background/detail - It's a simple report against two fields (type & #count) in one entity (customer) in a (guessing here) medium sized model (~16 transactional/fact entities, ~40 lookup entities, ~15 what I am calling domain entities). The entity has 5 DefaultDetailAttributes, one Identifying. The report has 2 fields type (a lookup role) and #customers in the data area (so I can click it to test the click through). Then I click the # Customers to look at the detail for those X customers and after a long while and no queries against the DB (I have profiler running) I get this error.

Thanks in advance for the help.

Getting the same error too...