1. Introduction
Welcome to the Quick User's Guide to StarDust!
This document will guide you through the process of creating and interpreting a segmentation model using StarDust. The concepts behind a good segmentation analysis are quite abstract, and a little college-level mathematics is needed to follow this guide. Don't worry! StarDust is fully automated and really easy to use! Apart from that, this document is self-contained: no other prior knowledge is required. Once you have grasped the few concepts explained in this guide, you will be able to create the most advanced segmentation models with StarDust!
With StarDust you can easily explore terabytes of data and extract useful knowledge. Discovering new insights about your customers should be fun and easy, and now, with StarDust, it is! A whole new (VR) world is waiting to be discovered, hidden inside your databases.
This document does not cover the creation of predictive models with TIMi.
During the course of this document we will analyse together a dataset named “census_income” (in statistics, the “data-tables” that are analysed are named “datasets”). This dataset contains data about the income of the inhabitants of the United States of America. Here is an extract of this dataset:
key | Is taxable income amount above $50K? | age | education | marital stat | race | sex | country of birth | weeks worked in year
1 | 0 | 73 | High school graduate | Widowed | White | F | USA | 0
2 | 0 | 58 | Some college but no degree | Divorced | White | M | USA | 52
3 | 0 | 18 | 10th grade | Never married | Asian | F | Vietnam | 0
4 | 0 | 9 | Children | Never married | White | F | USA | 0
5 | 0 | 10 | Children | Never married | White | F | USA | 0
6 | 0 | 48 | Some college but no degree | Married-civilian | Indian | F | USA | 52
7 | 0 | 42 | Bachelors degree(BA AB BS) | Married-civilian | White | M | USA | 52
8 | 1 | 28 | High school graduate | Never married | White | F | USA | 30
9 | 0 | 47 | Some college but no degree | Married-civilian | White | F | USA | 52
10 | 0 | 34 | Some college but no degree | Married-civilian | White | M | USA | 52
11 | 0 | 8 | Children | Never married | White | F | USA | 0
13 | 0 | 51 | Some college but no degree | Married-civilian | White | M | USA | 52
14 | 1 | 46 | High school graduate | Divorced | White | F | Columbia | 52
15 | 0 | 26 | Bachelors degree(BA AB BS) | Never married | White | F | USA | 52
16 | 0 | 13 | Children | Never married | Black | F | USA | 0
17 | 0 | 47 | Bachelors degree(BA AB BS) | Never married | White | F | USA | 52
18 | 0 | 39 | 10th grade | Married-civilian | White | F | Mexico | 0
19 | 0 | 16 | 10th grade | Never married | White | F | USA | 0
20 | 0 | 35 | High school graduate | Married-civilian | White | M | USA | 49
Unlike a predictive model, a segmentation analysis has no “Target” column to explain. What interests us is the segments of customers inside your database.
The “census_income” dataset contains a special column: the “primary key column” or, in other words, the “primary key”. The “primary key” contains a different value on each line of the dataset. Its purpose is to identify each line of our dataset unambiguously. The concept of “primary key” is well known in the database world. If you want to know more about this subject, I suggest that you ask your database administrator. The “primary key column” in our dataset is named “key”.
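As an illustration of the "different value on each line" property, here is a minimal Python sketch that checks whether a column can serve as a primary key. It is not part of StarDust: the function names `duplicated_keys` and `is_primary_key` are hypothetical, and the three rows are a reduced copy of the extract above.

```python
from collections import Counter

# Three lines of the extract above, reduced to a few columns.
rows = [
    {"key": 1, "age": 73, "sex": "F"},
    {"key": 2, "age": 58, "sex": "M"},
    {"key": 3, "age": 18, "sex": "F"},
]

def duplicated_keys(rows, key_column="key"):
    """Return the sorted values that appear on more than one line."""
    counts = Counter(row[key_column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)

def is_primary_key(rows, key_column="key"):
    """A column is usable as a primary key if no value is repeated."""
    return not duplicated_keys(rows, key_column)

print(is_primary_key(rows, "key"))  # every line has its own key value
print(is_primary_key(rows, "sex"))  # "F" is repeated, so not a primary key
```

A column such as "sex" fails the check because the same value appears on several lines, whereas "key" identifies each line on its own.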
StarDust can process datasets stored in many formats: relational databases (such as Microsoft SQL Server, Oracle, Informix, MySQL, ...), SAS dataset files (.sas7bdat files), or simple “flat files”. The preferred storage format for TIMi is a CSV flat file compressed in RAR format.
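To make the notion of a "flat file" concrete, the sketch below reads a small delimited text file with Python's standard `csv` module. It only illustrates the file layout and says nothing about how StarDust itself loads data. The comma separator, the reduced column list, and the helper name `read_flat_file` are assumptions made for this example; the RAR compression step is left out.

```python
import csv
import io

# Hypothetical flat-file content: three lines of the extract above,
# with a header line and an assumed comma separator.
FLAT_FILE = """key,age,education,sex,weeks worked in year
1,73,High school graduate,F,0
2,58,Some college but no degree,M,52
3,18,10th grade,F,0
"""

def read_flat_file(handle, delimiter=","):
    """Read a delimited flat file into a list of dictionaries, one per line."""
    return list(csv.DictReader(handle, delimiter=delimiter))

# io.StringIO stands in for an open file so that the example is self-contained.
rows = read_flat_file(io.StringIO(FLAT_FILE))
print(len(rows))
print(rows[0]["education"])
```

Each line of the file becomes one dictionary keyed by the column names of the header line. All values are read as text, so numeric columns such as "age" still need to be converted before any computation.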