2.2.1. The K-Means Algorithm

<< Click to Display Table of Contents >>

Navigation:  2. Introduction to the PCA techniques > 2.2. Distance definition >

2.2.1. The K-Means Algorithm


Let’s start with a small example where each customer is represented by only 2 values (v1 and v2): each customer is a 2D point. A segmentation model is a model that assigns a color to each of these points. Each color represents a different segment.  Here we have represented a segmentation model that finds 3 different segments in the dataset:

Segment one   is composed by the customers  in green

Segment two   is composed by the customers  in red

Segment three is composed by the customers  in blue


See illustration:





To construct a segmentation model, Stardust uses (amongst other algorithms) the K-means algorithm. The K-means algorithm works this way: first we assign randomly one customer to each segment:



Thereafter, we assign all the customers to the nearest segment:




After, we re-compute the center of each segment.





… and we repeat: We assign, once again, each point  to the nearest “center”.




… and we re-compute the optimal centers.





… and we repeat:



… and we repeat until the segmentation no longer changes:




Please Note that the very first step of the K-means algorithm is to assign randomly one customer to each segment (and after iterate).
These “special” customers are named “seeds”. Usually, different “seeds” will give different (but hopefully close) segmentations. Thus, there is no unique segmentation: Depending on the “seeds” you will usually find different segments.
Amongst all the different segmentations proposed by TIMi, you should choose the segmentation that has the best interpretation from a business-point-of-view. TIMi offers you many different intuitive charts that allow you, in a few mouse clicks, to interpret easily your segmentation from a business-perspective, and therefore, to easily select the segmentation that is the best for you (and always from a business-perspective!).