6.2.1. Variable Types: Review and Implications


 

Using the “Type Var Editor”, you can specify the type of each column of the dataset.

 

There are five types of columns:

 

a. Value type. Examples: Age, Size, Cost, Price, …

This type of column can only contain numbers, and its values are ordered. For example, a six-year-old boy is younger than a twelve-year-old boy, and a twelve-year-old boy is younger than an eighteen-year-old boy. There is an order, so a column containing a person’s age should be of type “value”. By contrast, a column containing the zip code of a person’s home should be of type “nominal”, because zip code 1050 is NOT smaller or bigger than zip code 1210: zip codes have no meaningful order.

 

b. Binary type. Examples: isMale, isForeigner, …

This type of column has a “true/false” or “yes/no” meaning. It contains only two modalities (T/F), or three modalities when some values are missing (T/F/missing).

 

c. Nominal type. Examples: CarLabel, Region, …

This type of column contains anything that is not “value type” or “binary type”.

 

d. Target type. The “target column” contains the variable that the predictive model must predict.

 

e. Key type. The “primary key column” contains a value that uniquely identifies each row of the dataset.

 

 

The Type Var Editor window is divided into three zones (see the illustration below):

 

TIMiQuickGuide_english_v5_img32

 
 

The green zone contains information about all the columns of your dataset.

 

Inside this zone, the “Type” column shows the type of every column of your dataset. These types are guesses made by TIMi Modeler and, as with anything produced by a heuristic, a guess can sometimes be wrong. For example, a postal code will be detected as “value type”, but it is actually of “nominal type”. You should check each column’s type to confirm that the guess is correct.
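TIMi Modeler’s actual type-guessing heuristic is not described in this guide. The function below is only a hypothetical rule of thumb, written to show why a purely numeric “code” column such as a postal code ends up guessed as “value”: looking at the raw text, nothing distinguishes it from an age. The function name `guess_type` and its rules are assumptions, not TIMi behavior.

```python
# Hypothetical type-guessing rule of thumb; NOT TIMi Modeler's actual algorithm.
from typing import Optional, Sequence


def is_number(text: str) -> bool:
    """Return True if the text parses as a float (dot used as decimal mark)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def guess_type(values: Sequence[Optional[str]]) -> str:
    """Guess 'binary', 'value' or 'nominal' for one column.

    None and empty strings are treated as missing values.
    """
    present = [v for v in values if v not in (None, "")]
    if len(set(present)) <= 2:
        return "binary"   # two modalities (T/F), possibly plus missing
    if all(is_number(v) for v in present):
        return "value"    # all numeric, so assumed to be ordered
    return "nominal"      # anything else


print(guess_type(["34", "12", "6", "18"]))           # value
print(guess_type(["T", "F", "", "T"]))               # binary
print(guess_type(["1050", "1210", "1000", "4000"]))  # value (wrong: zip codes are nominal)
print(guess_type(["3,14", "2,71", "1,41"]))          # nominal (comma decimal mark)
```

The zip-code line shows the failure mode described above: the values are numeric, so a rule based only on the text cannot know they are unordered codes.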

 

If a column containing numbers such as “3,14” is incorrectly detected as “nominal”, go back to the DataSource Editor and set the “character used for decimal mark” parameter to “,”. Alternatively, a better solution might be to use Anatella to replace the “,” (comma) character with a “.” (dot) character, using the ReplaceStrings action.
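The ReplaceStrings action is configured inside Anatella’s graphical interface; the Python below is not Anatella code. It is only a minimal sketch of the same text substitution, assuming the column uses “,” purely as a decimal mark (no thousands separator). The helper name `to_dot_decimal` is hypothetical.

```python
# Sketch of the comma-to-dot substitution performed by the ReplaceStrings fix.
# Assumption: "," is only ever the decimal mark (no thousands separators).
def to_dot_decimal(text: str) -> str:
    """Replace the ',' decimal mark with '.' so the text parses as a number."""
    return text.replace(",", ".")


raw = ["3,14", "2,5", "10"]
fixed = [to_dot_decimal(v) for v in raw]
print(fixed)                      # ['3.14', '2.5', '10']
print([float(v) for v in fixed])  # [3.14, 2.5, 10.0]
```

A blind replacement like this would corrupt values written with a thousands separator (for example “1.234,5” would become “1.234.5”), so check the column’s format before applying it.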

 

Note that TIMi Modeler works fine even if some types are defined incorrectly. However, if too many column types are wrong, the quality of the predictive model (i.e., its lift) might be slightly lower.

 

Usually, TIMi Modeler’s guesses are correct (except for zip codes and other numeric “code” variables). So, if you do not want to review the types of ten thousand variables, don’t worry: you can proceed directly to the next step and build a predictive model.

 

 


Remark: Once I have a predictive model, I carefully check the types of the few columns (usually no more than 25) that the model actually uses to make its predictions. As a rule of thumb, it is never a good idea to spend a lot of time cleaning and “tuning” every column of the dataset: focus only on the columns used by the predictive model. TIMi Modeler is quite robust to uncleaned data, so don’t worry too much about having a “perfectly” correct .TypeXML file.

 

 


You can edit the table directly to change a column’s type. For example, the column “detailed industry code” is a code and should therefore be of “nominal” type. Change the column’s type as illustrated below:

 

TIMiQuickGuide_english_v5_img36

 

Please note that the manipulation described in this note is only an example. This manipulation has NOT been performed in the rest of this document.

 

 
The red zone: This zone contains filters that let you easily search the central table in the green zone. Some examples:
 

- If you enter “*code” in Filter 1, you get all the columns whose name ends with “code”.

- If you enter “code*” in Filter 1, you get all the columns whose name starts with “code”.

- If you enter “*code*” in Filter 1, you get all the columns whose name contains “code”.
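The “*” wildcard in the filter examples above behaves like shell-style globbing, which can be imitated with Python’s standard `fnmatch` module. This is a sketch under assumptions: the column names are hypothetical, `fnmatch.fnmatchcase` is case-sensitive and also treats “?” and “[...]” as special characters, and the Type Var Editor’s filter may differ on both points.

```python
import fnmatch

# Hypothetical column names, for illustration only.
columns = ["detailed industry code", "code of region", "zip code", "age",
           "encoded flag"]


def filter_columns(names, pattern):
    """Return the names matching a shell-style '*' pattern (case-sensitive)."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


print(filter_columns(columns, "*code"))   # ['detailed industry code', 'zip code']
print(filter_columns(columns, "code*"))   # ['code of region']
print(filter_columns(columns, "*code*"))  # also matches 'encoded flag'
```

The last pattern illustrates a practical caution: “*code*” matches any name containing the letters “code”, including “encoded flag”, so review the filtered list before changing types in bulk.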

 

 
The blue zone: This zone is used to apply changes to the selected lines of the central table in the green zone.
 

 


An example using the red and blue zones: we want to set all the columns whose name contains “code” to the “nominal” type:

 

1. Enter “*code*” in Filter 1.

 
2. Click button 1 and then button 2 (both buttons are numbered in the illustration below).

 

See the illustration below:

 

TIMiQuickGuide_english_v5_img40

 

Please note that the manipulation described in this note is only an example. This manipulation has NOT been performed in the rest of this document.
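The red-zone/blue-zone example above amounts to “filter, then apply one change to every selected row”. The sketch below models the editor’s table as a plain Python dictionary mapping column names to types; the dictionary, the column names, and the helper `set_type_where` are all hypothetical and say nothing about how TIMi Modeler stores types internally.

```python
import fnmatch

# Hypothetical stand-in for the Type Var Editor table: column name -> type.
column_types = {
    "detailed industry code": "value",
    "region code": "value",
    "age": "value",
    "isMale": "binary",
}


def set_type_where(types, pattern, new_type):
    """Select the rows whose name matches the pattern (red zone),
    then set the type of every selected row (blue zone)."""
    selected = [name for name in types if fnmatch.fnmatchcase(name, pattern)]
    for name in selected:
        types[name] = new_type
    return selected


changed = set_type_where(column_types, "*code*", "nominal")
print(changed)               # ['detailed industry code', 'region code']
print(column_types["age"])   # value (not selected, so unchanged)
```

Building the selection first and only then applying the change mirrors the two-step order of the GUI: the filter defines which lines are selected, and the button acts on that selection only.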

 

 
You must define the column that contains the primary key. In our example, the primary key is the column “Key” (see the illustration below):

 
TIMiQuickGuide_english_v5_img41
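Because a primary key is meant to identify each row uniquely, it can be worth checking a candidate key column for duplicates before declaring it. The sketch below is a generic check using the standard `collections.Counter`; the helper name `duplicate_keys` and the sample values are hypothetical, and this section does not say how TIMi Modeler reacts to duplicate keys.

```python
from collections import Counter


def duplicate_keys(keys):
    """Return the key values that appear more than once, sorted."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)


keys = ["1001", "1002", "1003", "1002"]  # hypothetical "Key" column values
print(duplicate_keys(keys))              # ['1002']
```

An empty result means every value occurs once, which is what a primary key column should satisfy.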

 

 
We also have to define the column that contains the target. In this example, the target is a binary (True/False) variable, so in the .TypeXML file we set the type of the column “taxable income amount” to TargetB (“B” stands for “Binary” target). See the illustration below:

 

TIMiQuickGuide_english_v5_img42          
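Before marking a column as TargetB, a quick sanity check is that it really has two modalities, as described for the binary type at the top of this section (T/F, possibly plus missing values). The sketch below counts distinct non-missing values; the helper names and sample data are hypothetical, and how TIMi Modeler handles rows with a missing target is not covered here.

```python
def binary_modalities(values):
    """Return the sorted distinct non-missing values (None or '' = missing)."""
    return sorted({v for v in values if v not in (None, "")})


def looks_like_binary_target(values):
    """True if the column has exactly two distinct non-missing values."""
    return len(binary_modalities(values)) == 2


target = ["T", "F", "F", "T", ""]
print(binary_modalities(target))                         # ['F', 'T']
print(looks_like_binary_target(target))                  # True
print(looks_like_binary_target(["12000", "0", "5400"]))  # False
```

A column that fails this check (for example, a raw amount with many distinct values) would first need to be turned into a true/false flag before it can serve as a binary target.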

 

 
The other parameters available in the .TypeXML file editor are not important at this stage; you can leave them at their default values.