6.2.6. Univariate Importance of a Variable


 

We will define the “Univariate Importance” of a variable X as the AUC of the univariate predictive model built using only the variable X. The “Univariate Importance” of the “age” variable is thus 57.1%.
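As an illustration of this definition, here is a minimal Python sketch. It is not TIMi Modeler's actual algorithm: it assumes a very simple univariate model that scores each row by the target rate observed for its category, and it measures the AUC on the same rows used to compute those rates (so the result is optimistic). The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def auc(scores, targets):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative row")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def univariate_importance(values, targets):
    """AUC of a one-variable model (assumption: score = target rate per category)."""
    counts = defaultdict(lambda: [0, 0])  # category -> [row count, target count]
    for v, t in zip(values, targets):
        counts[v][0] += 1
        counts[v][1] += t
    rate = {v: hits / n for v, (n, hits) in counts.items()}
    return auc([rate[v] for v in values], targets)


# Hypothetical toy data, not taken from the census-income dataset.
education = ["HS", "HS", "HS", "BSc", "BSc", "MSc", "MSc", "MSc"]
target = [0, 0, 1, 0, 1, 1, 1, 0]
print(f"Univariate importance: {univariate_importance(education, target):.2%}")
```

A variable that carries no information about the target (for example, a constant column) gets an AUC of 50% with this sketch, which is the baseline against which values such as 57.1% should be read.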

 

The last section of the audit report generated with TIMi Modeler contains a table with all the “Univariate Importance” of all the variables in the dataset:

 

[Image TIMiQuickGuide_english_v5_img56: table of the “Univariate Importance” of all the variables]

 
 

It means that, if we use the information contained in the column “education” alone to predict the target, we obtain a lift curve with an AUC of 69.75%. Let us have a look at the lift chart of the (univariate) model built using only the column “education”:

[Image TIMiQuickGuide_english_v5_img57: lift chart of the univariate model built on “education”]
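To make the idea of a lift chart more concrete, the sketch below computes the points of a cumulative gains curve: rows are sorted from the highest to the lowest model score, and for each fraction of the population selected we record the fraction of targets captured. This is one common way to draw such a chart and is only an assumption about what TIMi Modeler plots; the function name and the sample scores are hypothetical.

```python
def gains_curve(scores, targets):
    """Return (fraction of population selected, fraction of targets captured) points."""
    total_targets = sum(targets)
    if total_targets == 0:
        raise ValueError("need at least one row inside the target")
    # Rank rows from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += targets[i]
        points.append((rank / len(order), captured / total_targets))
    return points


# Hypothetical scores and targets for four rows.
for selected, found in gains_curve([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]):
    print(f"top {selected:.0%} of rows -> {found:.0%} of targets")
```

The further the curve rises above the diagonal (where selecting x% of the rows captures x% of the targets), the more useful the variable is for prediction.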

 

 

TIMiQuickGuide_english_v5_img58

What can we conclude about the education level of a person?

 

A higher education level means a higher chance of being wealthy as an adult. Interestingly, the “Doctorate degree” (the second from the left) does not seem to pay off in the end. So, if you want to be wealthy, you should study for a long time and stop before the “Doctorate degree”. Apparently, too much studying fries your brain!

 
 

Is the gender of a person related to their income?

 

Let us have a look at the column “sex”. We observe:

[Image TIMiQuickGuide_english_v5_img59: chart for the column “sex”]
 

 

It appears that “Males” are far more likely to be wealthy than “Females”.

 

We could say this chart contradicts the hypothesis of gender equality quite a bit!

 

We can continue to analyze the dataset, column by column, for a long time. In one way or another, almost every variable has some impact and could be (more or less) well suited to predict whether a person is wealthy or not (i.e. to predict whether a person is “inside the target”). In reality, only a small subset of the 42 columns is really needed to accurately predict the target. In this “demo” situation, we only have 42 columns, but real-life datasets often have hundreds or thousands of columns/variables. In the CRISP-DM methodology, a careful review of every variable is required prior to a predictive modeling exercise. With TIMi Modeler, while a careful review of some variables is still needed, the process tends to be much faster.

 

To figure out which variables are really important, the easiest way is to build a predictive model and see which variables end up being included in it.

 

Let us then proceed to the next step: open the “Config File editor” by clicking on the button shown below:
 

start-multivariate
 

 

You can now close the “Type Var Editor” application and the “Microsoft Word” application (the file “census-income_AUDIT.doc” was still visible inside Microsoft Word).