6.2.6. Univariate Importance of a Variable
We will define the “Univariate Importance” of a variable X as the AUC of the univariate predictive model built using only the variable X. The “Univariate Importance” of the “age” variable is thus 57.1%.
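To make the definition concrete, here is a minimal sketch in plain Python. It is not TIMi Modeler's actual algorithm: it assumes a categorical variable and uses a deliberately simple univariate model (each category is scored by its observed target rate, measured on the same rows). The function names `auc` and `univariate_importance` and the toy data are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def auc(scores, targets):
    """Probability that a random positive row outscores a random negative row.

    Ties between a positive and a negative count as half a win.
    """
    pos = defaultdict(int)  # score -> number of positive rows with that score
    neg = defaultdict(int)  # score -> number of negative rows with that score
    for s, t in zip(scores, targets):
        (pos if t else neg)[s] += 1
    n_pos = sum(pos.values())
    n_neg = sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative row")
    wins = 0.0
    neg_below = 0  # negatives with a strictly lower score than the current one
    for s in sorted(set(pos) | set(neg)):
        wins += pos[s] * (neg_below + 0.5 * neg[s])
        neg_below += neg[s]
    return wins / (n_pos * n_neg)


def univariate_importance(values, targets):
    """AUC of a toy univariate model built from a single categorical column.

    Assumption: the "model" simply scores each category by its target rate.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for v, t in zip(values, targets):
        totals[v] += 1
        hits[v] += 1 if t else 0
    rate = {v: hits[v] / totals[v] for v in totals}
    return auc([rate[v] for v in values], targets)


# Made-up toy data (not taken from the census-income dataset).
education = ["HS", "HS", "HS", "BSc", "BSc", "PhD", "PhD", "PhD"]
wealthy = [0, 0, 1, 0, 1, 1, 1, 0]
print(univariate_importance(education, wealthy))  # 0.65625
```

A variable that carries no information about the target gets 0.5 (50%) with this sketch, and a variable that separates the two classes perfectly gets 1.0 (100%). Because the toy model is scored on the rows it was built from, the result is optimistic for columns with many rare categories.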
The last section of the audit report generated with TIMi Modeler contains a table listing the “Univariate Importance” of every variable in the dataset:
It means that, if we use only the information contained in the column “education” to predict the target, we will get a lift curve with an AUC of 69.75%. Let us have a look at the lift chart of the (univariate) model built using only the column “education”:
What can we conclude about the education level of a person?
A higher education means a higher chance of being wealthy as an adult. Interestingly, the “Doctorate degree” (the second from the left) does not seem to pay off in the end. So, if you want to be wealthy, you should study for a long time and stop before the “Doctorate degree”. Apparently, too much studying fries your brain!
Is the gender of a person related to their income?
Let us have a look at the column “sex”. We observe:
It appears that “Males” are much more likely to be wealthy than “Females”.
We could say this chart contradicts the hypothesis of gender equality quite a bit!
We can continue to analyze the dataset, column by column, for a long time. In one way or another, almost every variable has some impact and could be (more or less) well suited to predict whether a person is wealthy or not (i.e. to predict whether a person is “inside the target”). In reality, only a small subset of the 42 columns is really needed to predict the target accurately. In this “demo” situation, we only have 42 columns, but real-life datasets often have hundreds or thousands of columns/variables. In the CRISP-DM methodology, a careful review of every variable is required prior to a predictive modeling exercise. With TIMi Modeler, while a careful review of some variables is still needed, the process tends to be much faster.
To figure out which variables are really important, the easiest way is to find out which ones end up inside a predictive model.
Let us then proceed to the next step: open the “Config File editor” by clicking on the button:
You can now close the “Type Var Editor” application and the “Microsoft Word” application (the file “census-income_AUDIT.doc” was still visible inside Microsoft Word).