2.2.1. The K-Means Algorithm


 

Let’s start with a small example in which each customer is represented by only two values (v1 and v2), so each customer is a 2D point. A segmentation model assigns a color to each of these points, and each color represents a different segment. The segmentation model shown here finds 3 different segments in the dataset:
 

Segment one is composed of the customers in green
 

Segment two is composed of the customers in red
 

Segment three is composed of the customers in blue

 
 

See illustration:

 

STARDU~1_img51

 

 

To build a segmentation model, Stardust uses (among other algorithms) the K-means algorithm. The K-means algorithm works as follows: first, we randomly assign one customer to each segment:

clip0005

 

Next, we assign each customer to the nearest segment:

 

STARDU~1_img53

 

Then we re-compute the center of each segment (the mean of the customers assigned to it).

 

STARDU~1_img54
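
The two steps above (assign each customer to the nearest segment, then re-compute each center) can be sketched in a few lines of Python. This is only an illustrative sketch, not Stardust's actual implementation: the function names, the sample customers, the use of Euclidean distance, and the handling of empty segments are all assumptions made for the example.

```python
import math

def nearest_center(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance assumed)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def recompute_centers(points, labels, centers):
    """Return, for each segment, the mean of the points assigned to it.

    A segment with no points keeps its previous center (an assumption of this sketch).
    """
    new_centers = []
    for k, old_center in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            new_centers.append(old_center)
            continue
        n = len(members)
        new_centers.append(tuple(sum(coord) / n for coord in zip(*members)))
    return new_centers

# Hypothetical customers, each described by (v1, v2).
customers = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centers = [customers[0], customers[2]]  # two customers picked as starting points

# Step 1: assign every customer to the nearest center.
labels = [nearest_center(p, centers) for p in customers]
# Step 2: move each center to the mean of its customers.
centers = recompute_centers(customers, labels, centers)

print(labels)   # [0, 0, 1, 1]
print(centers)  # [(1.0, 2.0), (10.0, 9.0)]
```
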

 

 

… and we repeat: once again, we assign each point to the nearest “center”.

 

STARDU~1_img55

 

… and we re-compute the optimal centers.

 

STARDU~1_img56

 

 

… and we repeat:

STARDU~1_img57

 

… and we repeat until the segmentation no longer changes:
 

STARDU~1_img58

 
 

STARDU~1_img59
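
The complete loop illustrated above can be summarized as a short Python sketch. It is a minimal illustration under stated assumptions, not the code used by Stardust: the function name `segment`, its parameters (`k`, `seed`, `max_iter`), the sample customers, and the choice of Euclidean distance are all hypothetical.

```python
import math
import random

def segment(points, k, seed=0, max_iter=100):
    """Minimal K-means sketch: returns (labels, centers).

    `seed` only controls which customers are randomly picked as starting points.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # randomly pick one customer per segment
    labels = None
    for _ in range(max_iter):
        # Assign each customer to the nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # the segmentation no longer changes
        labels = new_labels
        # Re-compute each center as the mean of its customers.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty segment keeps its old center (assumption)
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical customers, each described by (v1, v2).
customers = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
    (8.0, 8.0), (8.0, 9.0), (9.0, 8.0),
    (1.0, 8.0), (2.0, 9.0), (1.0, 9.0),
]
labels, centers = segment(customers, k=3, seed=0)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

The loop stops as soon as one full pass leaves every customer in the same segment, which corresponds to the last two illustrations above.
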

Please note that the very first step of the K-means algorithm is to randomly assign one customer to each segment (and then iterate).
 
These “special” customers are called “seeds”. Different seeds usually give different (but hopefully similar) segmentations. Thus, there is no unique segmentation: depending on the seeds, you will usually find different segments.
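
To make the effect of the seeds concrete, here is a small Python sketch in which the seeds are passed in explicitly instead of being drawn at random. The dataset is deliberately contrived (four customers at the corners of a rectangle) so that two different seed choices end in two different segmentations. All names (`k_means_from_seeds`, `as_partition`) and the data are hypothetical, and Euclidean distance is assumed.

```python
import math

def k_means_from_seeds(points, seeds, max_iter=100):
    """Run the K-means iterations starting from the given seed customers."""
    centers = list(seeds)
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # the segmentation no longer changes
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty segment keeps its old center (assumption)
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def as_partition(labels):
    """Describe a segmentation independently of how the segments are numbered."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return {frozenset(group) for group in groups.values()}

# Four hypothetical customers at the corners of a 4 x 1 rectangle.
customers = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Seeds on opposite short sides: customers are split left / right.
left_right = k_means_from_seeds(customers, seeds=[customers[0], customers[2]])
# Seeds on the same short side: customers are split bottom / top.
bottom_top = k_means_from_seeds(customers, seeds=[customers[0], customers[1]])

print(left_right)  # [0, 0, 1, 1]
print(bottom_top)  # [0, 1, 0, 1]
print(as_partition(left_right) != as_partition(bottom_top))  # True
```

On real customer data the differences between runs are usually less extreme than in this contrived example.
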
 
Among all the different segmentations proposed by TIMi, you should choose the one that has the best interpretation from a business point of view. TIMi offers many intuitive charts that let you interpret your segmentation from a business perspective in a few mouse clicks, and therefore easily select the segmentation that is best for you.