Model Automation with TIMi Modeler
As the world of analytics keeps evolving, companies face a growing challenge: the speed at which they can respond to market changes. While the buzz around big data and analytics keeps rising, many, if not most, companies still struggle to harvest the value contained in their own databases.
While I certainly would not challenge the incredible value that will come from integrating all the data sources of big data (social media, video, audio, commodity and appliance usage through Internet of Things connections), the challenge in 2015 is still, for the most part, generating value from existing, structured, transactional databases.
For many, as reported in the KDnuggets poll of last April, the idea of predictive modelling automation is still a dream, with 95% of respondents considering it something for a somewhat distant future. Automation, however, can already be applied in many situations once the key challenges are addressed.
The need for automation is not always clear to every actor in predictive analytics. However, those who have faced modeling tasks for banks, telecom companies or large retailers know the pressure for fresher models, given the incredible pace at which the market changes and adapts.
In extremely competitive environments, banking customers are constantly solicited, and their financial situation changes quickly, as do the offerings of competing banks, making churn and cross-selling models obsolete very quickly. Telecom companies face an even greater challenge, as patterns of desertion stem from small changes in the “social connections” among subscribers far more than from their objective socio-demographic or product characteristics. To react in time, those companies typically have a window of 3 to 5 days between the occurrence of the pattern and the retention action for it to have any impact. And such models (based on SNA variables) cannot be built on samples: the totality of the call logs (CDR), which quickly amounts to billions of records, must be updated and analyzed in record time.
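To make the idea of an “SNA variable” concrete, here is a minimal sketch in plain Python of one of the simplest such predictors: for each subscriber, the fraction of their direct contacts who have already churned. The function name and toy data are mine for illustration; real CDR pipelines compute many such features at a far larger scale.

```python
from collections import defaultdict

def churned_contact_ratio(cdr_edges, churned):
    """cdr_edges: (caller, callee) pairs extracted from call logs (CDR).
    churned: set of subscribers who already left.
    Returns, per subscriber, the fraction of direct contacts who churned,
    one of the simplest social-network (SNA) churn predictors."""
    contacts = defaultdict(set)
    for a, b in cdr_edges:
        contacts[a].add(b)
        contacts[b].add(a)
    return {s: len(c & churned) / len(c) for s, c in contacts.items()}

# Tiny example: u3 has left; half of u1's and u2's contacts are churners.
ratios = churned_contact_ratio(
    [("u1", "u2"), ("u1", "u3"), ("u2", "u3")], churned={"u3"})
```

Because the ratio for a subscriber depends on everyone they call, a change anywhere in the graph can change the feature, which is why such variables cannot be computed on a sample of the call logs.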
Retailers are the unluckiest of all. In addition to the sheer volume of information, they need to generate hundreds or even thousands of variables to anticipate the needs of their customers and send them relevant discounts or offers. Typically, this means keeping track of months of purchase history, building hundreds of models to compute the purchase probability of every possible product, and finally optimizing the product combination so that both purchase probability and profitability are taken into account. And of course, last month’s models are no longer relevant, as the market did not stand still and competitors made many offers that affected the purchase probability of each product.
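The final optimization step described above can be illustrated with a minimal sketch: rank each customer’s candidate products by expected profit, i.e. purchase probability times margin. The product names and the simple objective are mine for illustration; a real offer optimizer would add constraints (budget, contact pressure, stock) on top of this.

```python
def best_offers(purchase_prob, margin, k=2):
    """Rank a customer's candidate products by expected profit
    (purchase probability x margin) and keep the top k offers."""
    expected = {p: purchase_prob[p] * margin[p] for p in purchase_prob}
    return sorted(expected, key=expected.get, reverse=True)[:k]

# One customer, three products: B wins despite a lower purchase
# probability, because its margin is much higher.
offers = best_offers({"A": 0.30, "B": 0.10, "C": 0.05},
                     {"A": 1.0, "B": 5.0, "C": 2.0})
```

This is where the hundreds of per-product models come together: each one supplies one entry of `purchase_prob`, so a stale model for any single product skews the whole ranking.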
Recreating such a quantity of models by hand is an impossible task. Still, many consider it the only option to obtain models with a “good enough” lift, no over-fitting, and appropriate respect of distribution assumptions.
The good news is that TIMi Modeler is especially well suited to meet such challenges.
First, it drastically reduces the risk of over-fitting thanks to its automated validation system. Models are built with K-fold cross-validation algorithms, and TIMi takes care of the learn/test/validate split automatically.
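As a rough illustration of the underlying technique (not TIMi’s actual implementation, which is automated and internal), here is a minimal K-fold cross-validation loop in plain Python; `fit` and `score` are hypothetical placeholders for any model-training and model-evaluation functions.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, score, k=5):
    """Hold each fold out once, train on the rest, average the held-out
    scores. A model that only memorizes its training rows scores poorly
    here, which is how cross-validation flags over-fitting."""
    folds = k_fold_indices(len(xs), k)
    out = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [(xs[i], ys[i]) for i in range(len(xs)) if i not in held_out]
        model = fit(train)
        out.append(score(model, [(xs[i], ys[i]) for i in test_idx]))
    return sum(out) / k
```

Every row is used for validation exactly once, so the averaged score estimates performance on unseen data without sacrificing any training rows permanently.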
Then, using an extremely fast implementation of ElasticNet regression, automatic variable selection frees analysts from having to pre-select variables before constructing a model.
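To show why ElasticNet performs variable selection, here is a small, plain coordinate-descent sketch (illustrative only, not TIMi’s fast implementation): the L1 part of the penalty zeroes out irrelevant coefficients, while the L2 part stabilizes correlated ones.

```python
import numpy as np

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_sweeps=200):
    """Minimal coordinate-descent Elastic Net on standardized features.
    Soft-thresholding by alpha*l1_ratio drives useless coefficients
    to exactly zero, which is the variable selection effect."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # residual excluding feature j
            rho = X[:, j] @ r / n
            denom = X[:, j] @ X[:, j] / n + alpha * (1 - l1_ratio)
            t = alpha * l1_ratio             # L1 soft-threshold
            b[j] = np.sign(rho) * max(abs(rho) - t, 0.0) / denom
    return b

# Synthetic check: only the first two of ten candidate variables matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(200)
b = elastic_net(X, y, alpha=0.1, l1_ratio=0.9)
selected = [j for j in range(10) if abs(b[j]) > 1e-3]
```

The eight noise variables end up with coefficients at or near zero, so the analyst never has to pre-filter them by hand.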
Finally, because model generation is so fast (a matter of minutes per core per model), keeping track of “which model should be updated?” is often a non-issue, as the answer can easily be: “all of them, by the end of the day.”
After looking at those technical aspects of automation, another problem often remains: the technological infrastructure required to perform such a titanic task.
Here’s the good news: in 99% of cases, this is feasible on existing infrastructure, as TIMi Modeler is built to run quickly and efficiently on a standard modern PC. And for those rare occasions where that is not enough, TIMi Modeler also offers a distributed computing solution in which available computing power can be shared among users.