Showing posts with label expression. Show all posts
Showing posts with label expression. Show all posts

Wednesday, March 21, 2012

May I have my attributes discretized based on my own expression?

Hi, all here.

I am just having one question about discretization of continous attributes values. Cos the current discretization methods available in SQL Server 2005 data mining engine are these 3 ones:

.......................................................................................

automatic;

equal areas;

clusters.

..........................................................................................

So how these 3 methods work respectively? I mean like clusters method, how dose it discretize the continous values?

More importantly, can we have a discretization based on our own expression? like when i have one column with values ranging from 1 to 10, may we discretize this column based on expression like: 1-3,4-6,7-10?

Thanks a lot for any guidance.

User-defined ranges are not supported.

Here are descriptions of the supported discretization methods:

· Clusters: This finds buckets by performing single-dimensional clustering on the input values using the K-Means algorithm. It uses Gaussian distributions.

· EqualAreas: This examines the distribution of values across the population and creates bucket ranges such that that the total population is distributed equally across the buckets. In other words, if the distribution of continuous values were plotted as a curve, the areas under the curve covered by each bucket range would be equal. This is useful when there are a large number of duplicate values.

· Automatic: If this is selected, we try obtaining the requested number of buckets by applying the above discretization methods in the following order: Clusters, EqualAreas. We use the first method that gets closest to the number of requested buckets.

The Clusters method use random sampling (with a sample size of 1000) so EqualAreas may be used in situations where sampling is not desirable.

|||Hi, Thanks a lot.|||

However, you can always add a calculated column to do your own discretization. For example you can add a column "AgeDisc" with the expression

CASE WHEN [Age]<20 THEN 'Under 20'
WHEN [Age] <= 30 THEN 'Between 20 and 30'
ELSE 'Over 30'
END

Of course, you will have to map any input data to these values for predictions.

|||Jamie, thanks a lot. Very helpful.

Friday, March 9, 2012

Maximum length of an expression

Is there a maximum number of characters in an expression?

thx

Helen

There is no hard limit on the length of a single expression.

But even if there were, you could always convert that expression into a custom code or custom assembly function and then simply call the function from the original expression.

-- Robert

|||

If only that were true! Alas, my longest expressions are to provide a sub-title built of the selected values of a pile of multi-select (and other) parameters, and these require the use of such functions as Count, which cannot be called from code behind.

Debugging these things is a painful experience (as is debugging code behind) -- am I alone in wishing fervently for better debugging?

Thx

Helen

|||

Maybe you can post your expression here and I may have suggestions of how to refactor it.

-- Robert

|||

Here ya go (one of the shorter ones!):

=iif(Parameters!DepartmentIDs.Count = count(Fields!DepartmentID.Value, "Departments"),"All Departments", "Departments: " + join(Parameters!DepartmentIDs.Label,", ")) + "; " + iif(Parameters!EmployeeTypeIDs.Count = count(Fields!EmployeeTypeID.Value, "EmployeeTypes"),"All Employee Types", "Employee Types: " + join(Parameters!EmployeeTypeIDs.Label,", ")) + "; " + iif(Parameters!JobTitleIDs.Count = count(Fields!JobTitleID.Value, "JobTitles"),"All Job Titles", "Job Titles: " + join(Parameters!JobTitleIDs.Label,", ")) + "; " + iif(Parameters!VenueIDs.Count = count(Fields!VenueID.Value, "Venues"),"All Venues", "Venue 1: " + join(Parameters!VenueIDs.Label,", ")) + iif(IsNothing(Parameters!LastName.Value), "", "; Last Name Like " + Parameters!LastName.Value) + iif(IsNothing(Parameters!AgreementStartDate.Value), "", "; Agreement Date Between " + Parameters!AgreementStartDate.Value + " and " + Parameters!AgreementEndDate.Value) + iif(IsNothing(Parameters!JobStartDate.Value), "", "; Job Start Date Between " + Parameters!JobStartDate.Value + " and " + Parameters!JobEndDate.Value)

I realize that I could shorten this beast by creating a function to build the label strings, one of the parameters being the count of the fields (since there appears to be no way of calling Count from Code behind), so I would not have to repeat the logic for each multi-select parameter.

The way I debug these things is to build them section by section, using notepad to add each section and copying it to the expression, then going to preview mode to see if it works. This works (slowly!) unless there is a little bombshell awaiting me -- the report prints fine from Report Designer preview mode, but chokes when I deploy it and open it from Report Manager -- happily this doesn't happen too often.