1. Introduction


 

Welcome to the Quick User’s Guide to StarDust!

 

This document guides you through the process of creating and interpreting a segmentation model using StarDust. The concepts behind a good segmentation analysis are quite abstract, and a little knowledge of college-level mathematics is required to read this guide. Don’t worry! StarDust is fully automated and really easy to use! This document is self-contained: no additional knowledge is required before reading it. Once you have grasped the few concepts explained in this guide, you will be able to create the most advanced segmentation models with StarDust! With StarDust you can easily explore terabytes of data and extract useful knowledge. Discovering new insights about your customers should be fun and easy, and now, with StarDust, it is! There is a whole new (VR) world hidden inside your databases, waiting to be discovered.

 

This document does not cover the creation of predictive models with TIMi.

 

Throughout this document we will analyse a dataset named “census_income” (in statistics, the data tables that are analysed are called “datasets”). This dataset contains data about the income of the inhabitants of the United States of America. Here is an extract of this dataset:

 

| key | Is taxable income amount above $50K? | age | education | marital stat | race | sex | country of birth | weeks worked in year |
|-----|--------------------------------------|-----|-----------|--------------|------|-----|------------------|----------------------|
| 1 | 0 | 73 | High school graduate | Widowed | White | F | USA | 0 |
| 2 | 0 | 58 | Some college but no degree | Divorced | White | M | USA | 52 |
| 3 | 0 | 18 | 10th grade | Never married | Asian | F | Vietnam | 0 |
| 4 | 0 | 9 | Children | Never married | White | F | USA | 0 |
| 5 | 0 | 10 | Children | Never married | White | F | USA | 0 |
| 6 | 0 | 48 | Some college but no degree | Married-civilian | Indian | F | USA | 52 |
| 7 | 0 | 42 | Bachelors degree(BA AB BS) | Married-civilian | White | M | USA | 52 |
| 8 | 1 | 28 | High school graduate | Never married | White | F | USA | 30 |
| 9 | 0 | 47 | Some college but no degree | Married-civilian | White | F | USA | 52 |
| 10 | 0 | 34 | Some college but no degree | Married-civilian | White | M | USA | 52 |
| 11 | 0 | 8 | Children | Never married | White | F | USA | 0 |
| 13 | 0 | 51 | Some college but no degree | Married-civilian | White | M | USA | 52 |
| 14 | 1 | 46 | High school graduate | Divorced | White | F | Columbia | 52 |
| 15 | 0 | 26 | Bachelors degree(BA AB BS) | Never married | White | F | USA | 52 |
| 16 | 0 | 13 | Children | Never married | Black | F | USA | 0 |
| 17 | 0 | 47 | Bachelors degree(BA AB BS) | Never married | White | F | USA | 52 |
| 18 | 0 | 39 | 10th grade | Married-civilian | White | F | Mexico | 0 |
| 19 | 0 | 16 | 10th grade | Never married | White | F | USA | 0 |
| 20 | 0 | 35 | High school graduate | Married-civilian | White | M | USA | 49 |

 

 

Unlike a predictive model, a segmentation analysis has no “Target” column to explain. What interests us is the segments of customers inside your database.

 

The “census_income” dataset contains a special column: the “primary key column” or, in other words, the “primary key”. The primary key contains a different value on each line of the dataset. Its purpose is to identify each line of the dataset uniquely. The concept of a primary key is well known in the database world. If you want to know more about this subject, we suggest that you ask your database administrator. The primary key column in our dataset is named “key”.
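The uniqueness property of a primary key can be checked with a few lines of code. The following Python sketch is not part of StarDust: the function name `duplicated_keys` and the reduced set of columns are assumptions made for illustration, and the sample rows are copied from the extract above.

```python
import csv
import io
from collections import Counter

# A few rows copied from the extract above, stored as a small CSV flat file.
# Only four columns are kept to make the example short (an illustrative choice).
SAMPLE = """key,age,education,sex
1,73,High school graduate,F
2,58,Some college but no degree,M
3,18,10th grade,F
4,9,Children,F
"""


def duplicated_keys(csv_text, key_column="key"):
    """Return the key values that appear on more than one line (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[key_column] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)


# A valid primary key has no duplicated value.
print(duplicated_keys(SAMPLE))  # []

# Appending a second line with key 2 breaks uniqueness.
print(duplicated_keys(SAMPLE + "2,42,Bachelors degree(BA AB BS),M\n"))  # ['2']
```

An empty result means that the column can serve as a primary key; any value returned identifies lines that cannot be told apart by the key alone.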

 

StarDust can process datasets stored in many formats. Your datasets can be stored in relational databases (such as Microsoft SQL Server, Oracle, Informix, MySQL, ...), in SAS dataset files (.sas7bdat files) or in simple “flat files”. The preferred storage format for TIMi is a CSV flat file compressed in RAR format.
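To make the notion of a “flat file” concrete, here is a minimal Python sketch that writes a tiny CSV file and reads it back. It is only an illustration under stated assumptions: the file name `census_income.csv`, the comma separator and the reduced set of columns are hypothetical and are not taken from StarDust itself. The file is left uncompressed because the Python standard library has no RAR reader.

```python
import csv
import os
import tempfile

# Header line followed by one line per record (values taken from the extract).
# The comma separator and the file name below are assumptions for illustration.
rows = [
    ["key", "age", "sex", "weeks worked in year"],
    ["1", "73", "F", "0"],
    ["2", "58", "M", "52"],
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "census_income.csv")

    # Write the flat file as plain text.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)

    # Read it back as a list of dictionaries keyed by column name.
    with open(path, newline="", encoding="utf-8") as handle:
        records = list(csv.DictReader(handle))

print(records[1]["weeks worked in year"])  # 52
```

A flat file carries no type information: every value is read back as text, so the tool that loads the file has to decide which columns are numeric.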