6.1. Data Selection: The DataSource Editor (Step 1)



Click the “Select Data Source” button (1) in the TIMi main window:




The following window appears:




The first step of the analysis is to open the dataset file that you will work on. Using the DataSource editor, you specify the location of the dataset that you want to analyze. The default storage format is a CSV/text file (CSV = “Comma Separated Values”).


When working with text files, it is important to set the separator character properly. It can be a comma “,”, a hash “#”, a semicolon “;”, a TAB character, or any other character that you specify. When in doubt, click “Show Datasource” (2) and TIMi Modeler displays the first 100 lines of the file.
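TIMi Modeler's own separator handling is not documented in this section. As a rough illustration of why previewing the first lines helps, the Python sketch below keeps the first 100 lines of a text file (similar in spirit to what “Show Datasource” displays) and guesses the separator by keeping the candidate that appears the same non-zero number of times on every line. The names `preview`, `guess_separator` and `CANDIDATES`, and the consistency heuristic itself, are assumptions for illustration; quoted fields that contain the separator would fool it.

```python
import io
from typing import Iterable, List, Optional

# Separators mentioned in the text. The consistency heuristic below is an
# illustrative assumption, not TIMi Modeler's actual algorithm.
CANDIDATES = [",", "#", ";", "\t"]


def preview(lines: Iterable[str], n: int = 100) -> List[str]:
    """Return the first n lines without their line endings
    (similar in spirit to what "Show Datasource" displays)."""
    out: List[str] = []
    for i, line in enumerate(lines):
        if i >= n:
            break
        out.append(line.rstrip("\r\n"))
    return out


def guess_separator(sample: List[str]) -> Optional[str]:
    """Return the candidate that occurs the same non-zero number of times
    on every non-empty line, preferring the highest count; None if no
    candidate qualifies. Quoted fields containing a separator fool it."""
    rows = [line for line in sample if line.strip()]
    best, best_count = None, 0
    for sep in CANDIDATES:
        counts = {line.count(sep) for line in rows}
        if len(counts) == 1:
            (count,) = counts
            if count > best_count:
                best, best_count = sep, count
    return best


# Usage with made-up rows that use ";" as separator and "," as decimal mark.
data = io.StringIO("age;income;city\n39;77516,0;Paris\n50;83311,5;Lyon\n")
print(repr(guess_separator(preview(data))))  # prints ';'
```

Note that the comma is rejected here because it appears on some lines (inside the numbers) but not on the header line, which is exactly the kind of ambiguity a visual preview helps you resolve.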


Please also note the parameter named “character used for decimal mark”. Its default value is “.”. This means that a column containing “3,14” won’t be recognized as a “value” column but as a “nominal” column, because of the comma. If you want TIMi Modeler to see “3,14” as a number, you must:


1. Change the parameter named “character used for decimal mark” to the value “,”.

2. Do NOT try to manually change the type of the variable from “nominal” to “value” inside the “Type Var Editor” (see next section): this will not work.

Alternatively, a better solution might be to use Anatella to replace the “,” comma character with a “.” dot character (using the ReplaceStrings action).
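To make the decimal-mark rule concrete, the following Python sketch labels a column “value” only if every non-empty cell parses as a number under the chosen decimal mark, and “nominal” otherwise. It is not TIMi code: the helper names `parse_number`, `column_kind` and `comma_to_dot` are hypothetical, and it assumes, for illustration only, that just the configured mark is accepted as the decimal point. It also shows the effect of a “,” to “.” replacement, like the one the ReplaceStrings action performs, applied to a single column.

```python
from typing import List, Optional


def parse_number(cell: str, decimal_mark: str = ".") -> Optional[float]:
    """Return the cell as a float under the given decimal mark, else None.
    Assumption for illustration: only the configured mark is accepted, so
    "3.14" is rejected when the decimal mark is ","."""
    text = cell.strip()
    if decimal_mark != ".":
        if "." in text:
            return None
        text = text.replace(decimal_mark, ".")
    try:
        return float(text)
    except ValueError:
        return None


def column_kind(cells: List[str], decimal_mark: str = ".") -> str:
    """Label a column "value" if every non-empty cell is numeric, else "nominal"."""
    non_empty = [c for c in cells if c.strip()]
    if non_empty and all(parse_number(c, decimal_mark) is not None for c in non_empty):
        return "value"
    return "nominal"


def comma_to_dot(cells: List[str]) -> List[str]:
    """Replace "," by "." in one column, similar to a string-replacement step."""
    return [c.replace(",", ".") for c in cells]


column = ["3,14", "2,71", ""]
print(column_kind(column))                    # prints nominal (default mark ".")
print(column_kind(column, decimal_mark=","))  # prints value
print(column_kind(comma_to_dot(column)))      # prints value
```

Applying the replacement to one column, rather than to the raw text file, matters when “,” is also the field separator: a file-wide replacement would then destroy the column boundaries.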




Note about UNICODE:


Ideally, the character encoding of your text file is Unicode UTF-8. Unicode allows you to manipulate datasets containing any kind of characters: Cyrillic, Chinese, Greek, etc. If Unicode is used, TIMi Modeler automatically produces Unicode versions of the reports in MS Word and MS Excel.


If the first 3 bytes of your text dataset are 0xEF, 0xBB, 0xBF (the Byte-Order-Mark, or BOM, of a UTF-8 encoded text file), TIMi Modeler automatically switches to Unicode mode. By default, Anatella always produces UTF-8 text files with a correct BOM header, so that everything is automated and transparent for the end user.
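A minimal Python sketch of the BOM check described above: it compares the first three bytes of a file with `codecs.BOM_UTF8` (0xEF 0xBB 0xBF), and writes a file with Python's `utf-8-sig` codec, which prepends that BOM. This only illustrates the byte layout; it is not how TIMi Modeler or Anatella are implemented, and the function names `has_utf8_bom` and `write_utf8_with_bom` are hypothetical.

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """True if the file starts with the UTF-8 Byte-Order-Mark (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def write_utf8_with_bom(path: Path, text: str) -> None:
    """Write text as UTF-8; the "utf-8-sig" codec prepends the BOM."""
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


# Usage: a small Cyrillic sample written to a temporary directory.
demo = Path(tempfile.mkdtemp()) / "demo.csv"
write_utf8_with_bom(demo, "name;city\nАнна;Москва\n")
print(has_utf8_bom(demo))  # prints True
```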

Let us go back to the “census-income” example: click the file-selection button, go to the “Central Dataset Repository” directory, and select the “census-income.rar” file (see appendix A about compressed CSV files). See the illustration below:



Once you click “Open” (1), the connection is established.


Note that the dataset can also be stored inside an Anatella .gel file, a .sqlite database file, or any database accessible through ODBC or OLEDB.


The next step is to set the working directory that will contain the analysis results. You can either use the current directory (where the source file is located) or any directory of your choice. Click the directory-selection button and create or select a working directory (below, we selected the “f:\TIMi\income3” directory):



You can also change the name of the project to something more meaningful, such as “demoIncome” (the same “working directory” can contain several projects, as long as they have different names). You should now have something like this:





Click the start-analysis button and wait a few seconds (this can take a few minutes on very large dataset files). You should soon see the following message:




The first step of the analysis is now complete. TIMi Modeler performed a first analysis of your dataset to guess the type of each column. Based on this guess, it produced a “.TypeXML” file inside the “working directory”.
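The exact heuristics TIMi Modeler uses for this first pass are not described in this section. As an assumption-laden sketch only, the Python code below scans up to `max_rows` data rows of a CSV file and labels a column “value” when every non-empty cell parses as a number (with “.” as decimal mark), and “nominal” otherwise. It does not produce a .TypeXML file; `guess_types`, `is_number` and `max_rows` are hypothetical names, and the sample rows are made up.

```python
import csv
import io
from typing import Dict, TextIO


def is_number(cell: str) -> bool:
    """True if the cell parses as a float with "." as decimal mark."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def guess_types(stream: TextIO, sep: str = ",", max_rows: int = 1000) -> Dict[str, str]:
    """Hypothetical first pass: a column stays "value" until one non-empty
    cell in the first max_rows data rows fails to parse as a number."""
    reader = csv.reader(stream, delimiter=sep)
    header = next(reader, [])
    numeric = {name: True for name in header}
    for i, row in enumerate(reader):
        if i >= max_rows:
            break
        for name, cell in zip(header, row):
            if cell.strip() and not is_number(cell):
                numeric[name] = False
    return {name: ("value" if ok else "nominal") for name, ok in numeric.items()}


sample = io.StringIO("age,workclass,wage\n39,State-gov,0\n50,Self-emp,12.5\n")
print(guess_types(sample))
# prints {'age': 'value', 'workclass': 'nominal', 'wage': 'value'}
```

Because the guess is based on a sample, it can be wrong, which is why the result is written to a file that you can inspect and edit afterwards.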


Click the “Yes” button (1): the TIMi Modeler “.TypeXML file Editor” appears. Alternatively, you can double-click the “demoIncome.TypeXML” file inside your working directory (2).