KDD 2009

The objective of the world-level datamining competition KDD 2009 was a very common task in the telecom industry. It accurately represents the kind of task encountered in “real life”, as opposed to the purely “abstract” tasks usually proposed in such academic competitions.

For KDD 2009, the final ranking was based on the average predictive quality, measured as AUC (area under the ROC curve), of 3 predictive models built using data from the company “Orange” (Orange is the number one French telecom operator). The 3 predictive models to develop for the KDD 2009 competition were a churn model, an upselling model and a “propensity-to-buy” (appetency) model.
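To make the ranking criterion concrete, here is a minimal sketch of “average AUC over 3 models”, assuming each task is scored independently and the three AUC values are simply averaged. The AUC is computed with the Mann-Whitney formulation: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counting as half. The function names and the toy labels/scores are hypothetical and made up for illustration; this is not the organizers' scoring script and not the Orange data.

```python
from typing import Dict, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney statistic (labels: 1 = positive, 0 = negative).

    Counts, over all (positive, negative) pairs, how often the positive
    example is scored higher; tied scores count as half a correct ordering.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_auc(tasks: Dict[str, Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Mean AUC over tasks; maps a task name to (labels, scores)."""
    values = [auc(labels, scores) for labels, scores in tasks.values()]
    return sum(values) / len(values)


# Hypothetical toy data (NOT the Orange data set), one entry per model.
tasks = {
    "churn":     ([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.5]),
    "appetency": ([0, 1, 0, 1], [0.2, 0.8, 0.1, 0.7]),
    "upselling": ([1, 1, 0, 0], [0.6, 0.6, 0.6, 0.1]),
}

if __name__ == "__main__":
    print(round(average_auc(tasks), 4))  # → 0.8333 (AUCs 0.75, 1.0, 0.75)
```

The pairwise loop is quadratic and only meant to show the definition; for real data sets a rank-based computation such as scikit-learn's `roc_auc_score` is the practical choice.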

The competition was a real “fight to the death” because everybody wanted to demonstrate their superiority on “real-world tasks”. KDD 2009 attracted:

  • 1299 registered teams
  • 7865 entries
  • 46 countries: Argentina, Australia, Austria, Belgium, Brazil, Bulgaria, Canada, Chile, China, Fiji, Finland, France, Germany, Greece, Hong Kong, Hungary, India, Iran, Ireland, Israel, Italy, Japan, Jordan, Latvia, Malaysia, Mexico, Netherlands, New Zealand, Pakistan, Portugal, Romania, Russian Federation, Singapore, Slovak Republic, Slovenia, South Africa, South Korea, Spain, Sweden, Switzerland, Taiwan, Turkey, Uganda, United Kingdom, United States, Uruguay

As usual in such competitions, your ranking is based on the average accuracy (AUC) of the 3 predictive models you create (cross-selling/appetency, up-selling, churn): the higher the accuracy, the better the ranking. Here is a chart that illustrates the accuracy of the top 100 contestants; the RED dot is the accuracy obtained with TIMi:


My guess is that the accuracy obtained with other “mainstream” predictive analytics software is somewhere around 0.77 (AUC = 77%). Although all the major datamining vendors almost certainly took part in the competition, they did not reveal their final rankings… So we will never know for sure how accurate their tools are… I wonder why…

These competition results place TIMi as the best “commercially available” predictive datamining tool in the world (the very few teams that obtained slightly better results than TIMi were all using software prototypes that are not available to the public). These results were obtained at a vendor-neutral, world-level competition organized by qualified university researchers in the datamining field.

The added accuracy of the TIMi predictive models represents a tremendous difference in ROI for a telecom operator.