
One major drawback of the K-Means algorithm is that you have to guess how many segments your dataset contains, which is difficult. This is why we use Ward’s algorithm to determine the number of segments. Ward’s algorithm is an iterative algorithm that runs after the K-Means algorithm. It works this way:

1. Start with the segments found by the K-Means algorithm.

2. Find the two closest segments among all the segments still available.

3. Delete these two segments and replace them with a new segment that is the sum of the two (this implies computing the position of the center of the new segment).

4. If more than one segment is still available, go back to point 2.

Thus, each iteration of Ward’s algorithm decreases the number of segments by one. We can therefore easily select how many segments we want by choosing how many iterations of Ward’s algorithm to run.
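The merging loop described above can be sketched in a few lines of Python. This is only an illustration, not the StarDust implementation. It assumes that each segment is summarized by its center and its number of rows, and that "closest" is measured by the usual Ward criterion (the increase in within-segment variance caused by a merge), which the text above does not spell out. The names `ward_cost`, `merge` and `ward_reduce` are hypothetical.

```python
from itertools import combinations

# A segment is represented as (center, size): a tuple of coordinates
# and the number of rows in the segment (an assumption for this sketch).

def ward_cost(a, b):
    """Increase in within-segment variance if segments a and b are merged."""
    (ca, na), (cb, nb) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist

def merge(a, b):
    """Replace two segments by one; the new center is the size-weighted mean."""
    (ca, na), (cb, nb) = a, b
    n = na + nb
    center = tuple((na * x + nb * y) / n for x, y in zip(ca, cb))
    return (center, n)

def ward_reduce(segments, target):
    """Merge the closest pair repeatedly until `target` segments remain."""
    segments = list(segments)
    while len(segments) > target:
        # Point 2: find the two closest segments still available.
        i, j = min(
            combinations(range(len(segments)), 2),
            key=lambda p: ward_cost(segments[p[0]], segments[p[1]]),
        )
        # Point 3: delete them and add the merged segment.
        merged = merge(segments[i], segments[j])
        segments = [s for k, s in enumerate(segments) if k not in (i, j)]
        segments.append(merged)
    return segments

# Four made-up segments, as K-Means might return them.
kmeans_segments = [
    ((0.0, 0.0), 10),
    ((0.0, 1.0), 10),
    ((10.0, 10.0), 5),
    ((10.0, 12.0), 5),
]
result = ward_reduce(kmeans_segments, target=2)
```

Starting from four segments, asking for `target=2` runs two iterations. In general, going from the K-Means segments down to the desired count takes as many iterations as the difference between the two numbers.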

5.5.3. The Ward’s algorithm
