Not all predictive analytics tools are born equal

The main objective of any “Advanced Analytic” tool is to generate the best, most accurate ranking (or “list of candidates”). There are two main approaches to generating this ranking: segmentation and prediction. Different software packages use different approaches.

Segmentation tools

This category covers 99% of the available tools. These tools are very easy to create and are, most of the time, only a small component of a larger “Operational CRM tool”. Some examples: Probance, Miner3D, Webtrends Segments, etc.

Segmentation techniques can be represented like this:

Each candy in the cookie-jar represents a known prospect. The big red cross represents our segmentation: we decided to create 4 segments based on our analysis of our population.

The objective of the exercise here is to create the best ranking: we want to select only the prospects that will buy our product. To create this ranking, we will analyze our population, paying special attention to the people who already have the product (these customers are represented with a Chokotoff label).

There are many different ways to define segments of your population. Segments can be defined:

Using simple business-rules, for example:
- Segment 1 is composed of the men with age<30
- Segment 2 is composed of the men with age>=30
- Segment 3 is composed of the women with age<30
- Segment 4 is composed of the women with age>=30
Usually, one finds such business-rules by using an obsolete In-RAM-OLAP tool to explore the population and try to find “good” segments.
Using “advanced analytics”: using tools like Stardust (KMeans clustering, Hierarchical clustering, etc.) or SpadSoft.

Most of the time, the segments inside your population were created ‘by a very smart guy a few years ago’. In the above example, the criterion that was used to create the 4 segments is the position of the candy in the cookie-jar.

Let’s now create a ‘ranking’ based on our segmentation. A ranking is simply an ordered list that contains all your prospects, sorted from the one with the highest probability of purchase to the one with the lowest probability of purchase. We can easily compute the “probability of purchase” of the different customers inside a specific segment: it’s the percentage of buyers inside the segment. To create our ranking, we will order our segments from the “best” one (the one with the highest percentage of buyers) to the “worst” one (the one with the lowest percentage of buyers).
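The segment-based ranking described above can be sketched in a few lines of Python. This is only an illustration: the prospect list is invented, the names `segment_of` and `rank_segments` are hypothetical, and placing prospects aged exactly 30 in the older group is an assumption.

```python
from collections import defaultdict

# Hypothetical prospects: (gender, age, already_bought). Invented for illustration.
prospects = [
    ("M", 25, True), ("M", 28, False), ("M", 45, False), ("M", 52, False),
    ("F", 22, True), ("F", 27, True), ("F", 41, True), ("F", 60, False),
]

def segment_of(gender, age):
    """Business rule: split by gender and by age 30 (age 30 goes to the older group)."""
    return (gender, "under 30" if age < 30 else "30 and over")

def rank_segments(rows):
    """Return (segment, buyer_rate) pairs sorted from the best to the worst segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [buyers, total]
    for gender, age, bought in rows:
        seg = segment_of(gender, age)
        counts[seg][0] += int(bought)
        counts[seg][1] += 1
    # The "probability of purchase" of a segment is its share of buyers.
    rates = {seg: buyers / total for seg, (buyers, total) in counts.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

ranking = rank_segments(prospects)
for seg, rate in ranking:
    print(seg, f"{rate:.0%}")
```

Every prospect inherits the buyer rate of its segment, so all prospects in the same segment are tied in the ranking. With only 4 segments, the ranking can take at most 4 distinct values.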

Here is an illustration of the ranking:

The same ranking can also be illustrated in this way (the “Y” axis is now the cumulative number of buyers found):

The blue curve in the above chart is named the ‘Lift curve‘. The lift curve allows you to ‘see’ the quality of your ranking. Different ‘Analytical CRM tools’ will have different lift curves. The lift curve directly translates into ROI!

The TIMi framework includes a unique tool that directly estimates, based on the lift curve, the ROI (in Euros or Dollars) of your marketing campaign. A good “Advanced Analytic” software package will be able to directly find all the buyers and will generate a “high lift curve”. The higher the lift curve, the better your ranking and the higher your ROI.

HIGHER LIFT CURVE = HIGHER ACCURACY = HIGHER ROI

Predictive analytic tools

These tools are quite intricate to create. Although they all deliver superior performance compared to simple “Segmentation tools”, the quality of the delivered results (the quality of the ranking) varies greatly between them.

Predictive techniques can be represented like this:

The large bold red circle represents our predictive model. The objective of this predictive model (in technical terms: the ‘target’) is to find all the people that have bought your product (i.e. the targets are the candies with a Chokotoff label).

This predictive model is making 4 errors:

Two blue candies and one yellow candy are classified as “customers currently having the product”. These errors are interesting: they represent very good “leads” (i.e. customers who do NOT have your product yet but are very likely to purchase it).
One customer is classified as somebody who did not buy your product but, in reality, he bought it.

Predictive models that are built with Timi also give you the exact probability of purchase of each customer. In the above example, the prospects that are inside the thin red curve have the highest probability of purchase.

The quality of the ranking obtained through a predictive technique is visible on the lift curve:

Comparison of both techniques

Let’s now compare these 2 approaches in terms of ROI. By its very nature, segmentation is a technique well-adapted to exploratory work. In contrast, predictive analytics is discriminatory in nature.

In a classical lift-curve, the unit of the X-horizontal-axis is traditionally in percent: it’s the percentage of the population selected. Also, in a classical lift-curve, the unit of the Y-vertical-axis is traditionally also in percent: it’s the percentage of the buyers found.
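As a minimal sketch of how such a curve can be computed, the Python function below sorts the prospects by score and accumulates the buyers found. The function name `lift_curve`, the scores and the outcomes are all hypothetical, and this is not a description of how any particular tool computes its lift.

```python
def lift_curve(scores, bought):
    """Return a list of (pct_population_selected, pct_buyers_found) points.

    Prospects are sorted from the highest to the lowest score; we then walk
    down the list and count the buyers found so far.
    """
    total_buyers = sum(bought)
    if total_buyers == 0:
        raise ValueError("the curve is undefined when there are no buyers")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    found = 0
    for rank, i in enumerate(order, start=1):
        found += bought[i]
        points.append((100.0 * rank / len(scores), 100.0 * found / total_buyers))
    return points

# Hypothetical scores and outcomes for 10 prospects (1 = buyer, 0 = non-buyer).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
bought = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

curve = lift_curve(scores, bought)
for x, y in curve:
    print(f"{x:5.1f}% of population -> {y:5.1f}% of buyers")
```

For a segmentation-based ranking, the score of a prospect would simply be the percentage of buyers in its segment. For a predictive model, it would be the individual probability of purchase.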

Let’s plot the 2 lift curves (the one obtained from the segmentation and the one obtained from the predictive model) on the same chart (the X and Y axes are now in percent):

In the above chart, there is a yellow line that represents the “random selection”. For example, if you randomly select 50% of your population, you will “find” 50% of your buyers; thus, the yellow line goes through the coordinate (50%; 50%). The “random selection line” represents the worst selection/ranking that you can reasonably get (i.e. it’s a pure random selection).

In the above chart, the lift that characterizes the ranking obtained through predictive analytics (the red one) is higher than the one obtained with the segmentation technique (the blue one). This is always the case. What does it mean?

Predictive techniques create better rankings than segmentation techniques: in this example, the ranking obtained through predictive analytics will typically generate 20% more cash than the ranking obtained through the segmentation technique.
The predictive model is able to extract the “right people” out of your database, i.e. the ones that are interested in buying your product (and very few other people).
Depending on the context, a difference between 2 rankings that is as small as 2 or 3 percent on the lift could mean millions of euros (or dollars) of difference between the corresponding marketing campaigns. This is especially true in the banking, telecommunication and insurance world. In these fields, a few added percent on the lift curve directly translate to hundreds of thousands of euros of added ROI for your marketing campaigns. You NEED to have the best lift. Otherwise, you are losing money at each marketing action.
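To make the link between a few percent of lift and money concrete, here is a simplified back-of-the-envelope calculation in Python. Every figure in it (population size, number of buyers, margin per sale, cost per contact, and the two lift values) is a hypothetical assumption chosen for illustration, and the profit formula is a deliberately simple model that treats every buyer found as a sale.

```python
def campaign_profit(population, total_buyers, pct_selected, pct_buyers_found,
                    margin_per_sale, cost_per_contact):
    """Simplified profit: margin on the buyers found minus the cost of the contacts."""
    contacts = population * pct_selected / 100
    sales = total_buyers * pct_buyers_found / 100
    return sales * margin_per_sale - contacts * cost_per_contact

# Hypothetical campaign: contact the top 10% of a ranked population.
common = dict(population=1_000_000, total_buyers=50_000, pct_selected=10,
              margin_per_sale=200, cost_per_contact=2)

# Ranking A finds 30% of the buyers at X=10%; ranking B finds 33% (3 points more).
profit_a = campaign_profit(pct_buyers_found=30, **common)
profit_b = campaign_profit(pct_buyers_found=33, **common)

print(f"Ranking A: {profit_a:,.0f} euros")
print(f"Ranking B: {profit_b:,.0f} euros")
print(f"Difference: {profit_b - profit_a:,.0f} euros")
```

Under these assumed figures, the 3 extra points of lift are worth 300,000 euros for a single campaign, because the cost of the contacts is identical for both rankings and only the number of buyers found changes.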

The lifts obtained with Timi are systematically better than the lifts obtained with any other commercially available analytical CRM software: it’s very common to have an improvement of 5% to 20% at X=10% (i.e. it’s very common to have an added ROI of 5% to 20% when using Timi, compared to other analytical CRM tools).

Why are there so many people still using segmentation techniques to create their ranking?
The answer is:

Creating a predictive model used to be extremely difficult: very often, you had to hire expensive “specialized consultants” for 2 or 3 months to obtain a medium-accuracy predictive model. With Timi, anybody with an analytic mind can create extremely accurate rankings/lifts in a few mouse clicks, in a few minutes. This is a revolution.
Standard predictive software packages have enormous difficulties analyzing large databases. They run for hours without giving any results. Whatever the size of your database, Timi always gives an extremely accurate ranking in a few minutes.
Usually, software packages that are able to create correct predictive models are (very!) expensive. To have the same functionalities as the “basic Timi package”, it’s very common to pay between 170.000 and 940.000 euros per computer the first year (and it’s common practice to pay one third of the initial buying price each year, as “support” costs). Timi has zero up-front cost: you only pay the “support costs”. The Timi support costs are usually 1/3 (one third) of the price of an equivalent “mainstream” solution. Furthermore, Timi is 100% free for education (e.g. for marketing departments in universities) and charity work. This is a revolution.
To create correct predictive models with other tools, you first need to “clean” your databases to remove all errors (like “negative values” in the “customer’s age” column). This process is usually extremely time-consuming (and expensive) and thus people generally avoid using predictive modelling. In contrast, Timi completely removes the need for “cleaning” and allows you to directly analyze “RAW data”. Timi can directly connect to your operational system and instantaneously give you many highly accurate rankings & lifts. This is a revolution!

Differences in results between the tools

Tools based on Prediction techniques (such as Timi):
These tools have a higher ROI (usually ten times higher than segmentation-based tools). Timi delivers the highest ROI of all “Predictive Analytics” tools. Usually, Predictive Analytics tools have a higher price tag (but not Timi).
Tools based on Segmentation techniques:
These tools have a very low ROI compared to Prediction-based tools. The tools based on segmentation techniques are usually marginally cheaper (but not always).

Here are some lift curves (automatically generated with Timi) that illustrate the quality of different rankings:

The lifts obtained with Timi are systematically better (they are higher) than the lifts obtained with any other commercially available “Advanced Analytics” tool. This fact is demonstrated by our outstanding results at various datamining competitions and various industrial benchmarks. This means that, when you are using Timi, all your marketing campaigns will have a substantially higher ROI: from 10% to 20% added ROI compared to another Predictive-Analytical-CRM-Tool, and from 300% to 500% more ROI compared to a segmentation-based-Analytical-CRM-tool.

These differences in ROI between the different “Predictive Analytics” tools are somewhat difficult to understand when you are used to other kinds of software. Let’s take an example: if you run a SQL script on the same database using two different RDBMS engines (e.g. using an “Oracle” engine and using a “Teradata” engine), you’ll get exactly the same answer. The same is not true for “Predictive Analytics” tools. For example, let’s assume that you used the exact same dataset to create a ranking with Timi, with SAS, with SPSS, with R, etc. Each “Predictive Analytics” tool gives you a different ranking (and thus a different ROI), although we used the exact same data to create all these rankings!

Thus, the process of selecting the right “Predictive Analytics” software is different than for other “normal” tools (such as an ETL tool, a datawarehouse tool, an OLAP tool, etc.). With “normal” tools, you are mostly interested in the number of functionalities because there are no differences in the end-results. In contrast, all Predictive Analytics tools have, more or less, the same functionalities. Thus, to select a good “Predictive Analytics” tool, you should rather be interested in the quality of the results (i.e. you should be interested in the ROI that you can get from the tool). (See this page about how to compare precisely the ROI of different predictive models).

Summary

HIGHER LIFT CURVE = HIGHER ACCURACY = HIGHER ROI

Not all predictive analytics tools are born equal