“Into The Minds” publishes the ETL Guide 2022:
Anatella dominates the market
On Wednesday 26 January, the company “Into The Minds” published its new ETL Guide for 2022. This guide provides a brief history of ETL and a comparative analysis of three modern ETL tools. We invite you to read the guide by following this link.
The acronym ETL stands for Extract – Transform – Load. ETL tools facilitate the data preparation process. The Anatella solution belongs to the ETL category (and, more precisely, to the “ETL+” category, according to the taxonomy used by the researchers from “Into The Minds”).
Historically, ETL tools focused on performing 3 types of operations on data: (1) Extracting data from sources of various kinds; (2) Transforming and enriching the data to prepare it for use; (3) Loading the data into another system so that it can be used further down the chain.
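To make the three operations concrete, here is a minimal sketch in plain Python. It is not how Anatella or any other ETL tool works internally; the sample data, the `sales` table and the function names are assumptions chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: a small CSV held in memory instead of a real file.
RAW_CSV = """customer,country,amount
alice,be,10.50
bob,fr,
carol,be,7.25
"""

def extract(source):
    """Extract: read the rows of a CSV source into dictionaries."""
    return list(csv.DictReader(source))

def transform(rows):
    """Transform: drop rows without an amount, then normalise case and types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (row["customer"].title(), row["country"].upper(), float(row["amount"]))
        )
    return cleaned

def load(rows, connection):
    """Load: write the prepared rows into a target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()

def run_pipeline():
    """Chain the three steps against an in-memory SQLite database."""
    connection = sqlite3.connect(":memory:")
    load(transform(extract(io.StringIO(RAW_CSV))), connection)
    return connection

if __name__ == "__main__":
    conn = run_pipeline()
    print(conn.execute("SELECT * FROM sales").fetchall())
```

In a graphical ETL tool, each of these three functions would roughly correspond to one or more “boxes” wired together.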
These operations are facilitated by the graphical interface of ETL solutions. These solutions are based on “boxes” that are assembled to achieve the desired result. Most ETL tools are therefore “No Code” solutions that can be put into (almost) any hands.
History of ETL
Period | Development
---|---
The 1980s | Invention of ETL: ETL tools are used to manage flows between “simple” databases.
The 1990s | Evolution of ETL to manage complex data warehouses (DWH).
The 2000s | Division of ETL tools into 2 main categories. Category 1, ELT: some ETL tools become ELT and focus only on the “E” and “L” (Extract and Load) tasks, leaving the “T” to the database engine. Examples: Talend, Matillion, etc. Category 2, ETL: more “T”-type functionalities are added, such as data cleaning and slightly more complex joins. Examples of ETL tools still in category 2 in 2021: IBM DataStage, Ab Initio, etc.
2010 | Invention of the data lake: this required an evolution of ETL to handle a situation where there is no database engine “behind” to perform Transformations. Since ELT tools require a database engine “behind” them to run properly, they are a bit outdated. The data lake is a new concept that is optimized for business/data analysts and data scientists who have more advanced data needs. Because of the emergence of the data lake, some “category 2” ETL tools are evolving into a “category 3” that is optimized for the modern needs of data workers.
2015 | Category 3 “ETL+”: for data prep, it makes much more complex operations possible: text mining, data mining (machine learning), AI, big data, etc. Examples of category 3 ETL tools: Anatella, Alteryx, etc.
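The ELT pattern mentioned in the history above can be sketched as follows: the raw data is loaded unchanged, and the “T” is then delegated to the database engine as SQL. This is only an illustration using Python's built-in `sqlite3` module; the `staging` and `sales_by_country` tables and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw records, loaded as-is (all text, no cleaning beforehand).
RAW_ROWS = [("alice", "be", "10.50"), ("bob", "fr", ""), ("carol", "be", "7.25")]

def elt(connection):
    """Load raw rows first, then let the database engine transform them."""
    # "E" and "L": copy the raw data into a staging table without changing it.
    connection.execute(
        "CREATE TABLE staging (customer TEXT, country TEXT, amount TEXT)"
    )
    connection.executemany("INSERT INTO staging VALUES (?, ?, ?)", RAW_ROWS)
    # "T": the transformation is expressed in SQL and run by the engine.
    connection.execute(
        """
        CREATE TABLE sales_by_country AS
        SELECT UPPER(country) AS country, SUM(CAST(amount AS REAL)) AS total
        FROM staging
        WHERE amount <> ''
        GROUP BY UPPER(country)
        """
    )
    return connection.execute(
        "SELECT country, total FROM sales_by_country ORDER BY country"
    ).fetchall()

if __name__ == "__main__":
    print(elt(sqlite3.connect(":memory:")))
```

The sketch also shows the dependency noted in the history: without a database engine to run the SQL, as in a data lake made of plain files, the transformation step has nowhere to execute.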
ETL on the market
The researchers from “Into The Minds” compared 3 ETL tools in the top category: “Category 3: ETL+”. The comparison is carried out along 3 main axes: Extract, Transform and Load operations. For each of these axes, the number of “boxes” available in each tool is reported. Here is a summary table of the results of this study:
Number of “boxes” to: | Alteryx 2020.1.5.25447 | Tableau Prep | Anatella 2.38 | Anatella 2.54
---|---|---|---|---
Extract | 4 | 57 | 37 | 78
Transform | 33 | 5 | 50 | 50
Load | 5 | 3 | 27 | 27
It should be noted that the researchers from “Into The Minds” used a relatively old version of Anatella (v2.38). We are currently at v2.54 and this latest version has many more connectors for Extraction.
Anatella 2.54 provides 78 input connectors for Extraction operations.
In terms of functionality in the ‘Extract’ category, the researchers from “Into The Minds” noted that: “Anatella also handles unstructured input formats”. Indeed, Anatella is the only tool that allows easy manipulation of multi-level XML or JSON files (99% of XML or JSON files are multi-level).
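To illustrate what “multi-level” means in practice, here is a small Python sketch that flattens a nested JSON document into a single-level record. It is not Anatella's implementation; the `flatten` helper, the dotted-path naming and the sample order document are assumptions made for this example.

```python
import json

def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        # Leaf value: store it under the path built so far.
        flat[prefix.rstrip(".")] = value
    return flat

# Hypothetical multi-level document: an order containing a list of lines.
DOCUMENT = '{"order": {"id": 7, "lines": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}}'

if __name__ == "__main__":
    for path, value in flatten(json.loads(DOCUMENT)).items():
        print(path, "=", value)
```

A tabular tool has to make a choice like this one (one wide row, or one row per order line) before nested data can be joined or aggregated, which is why multi-level inputs are harder to handle than flat files.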
The researchers at “Into The Minds” make other pertinent remarks that I invite you to read directly on their blog.
Additional features
As explained in the “History” section, ETL tools have evolved over time to include functionalities that go beyond the strict Extract – Transform – Load framework. ETL tools in “Category 3: ETL+” also offer analytical or visualization functions. Far from being gadgets, these additional functions are accelerators: they speed up the analysis process by bringing some of its steps forward into the data preparation stage.
As an example, here are some additional features unique to Anatella that are highly valued by the researchers from “Into The Minds”:
- NLP (Natural Language Processing): detecting the sentiment of a text in English, French, Dutch, etc. (this feature is used as part of their research work on the virality of posts on LinkedIn)
- Language detection: very useful when working with unstructured data
- Visualizations with R: Anatella uses embedded R code that allows you to quickly make simple visualizations. These can be used to get a first idea of the data but also to perform quality controls. Very useful to check that no data has been lost in the data preparation process.
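The last quality control in the list, checking that no data has been lost during preparation, can also be expressed as a simple reconciliation. The sketch below uses plain Python rather than Anatella's embedded R visualizations; the `check_no_rows_lost` helper and the sample records are hypothetical.

```python
def check_no_rows_lost(source_rows, prepared_rows, key):
    """Return the keys present in the source but missing after preparation."""
    source_keys = {row[key] for row in source_rows}
    prepared_keys = {row[key] for row in prepared_rows}
    return sorted(source_keys - prepared_keys)

# Hypothetical data: record 2 was dropped somewhere in the preparation steps.
SOURCE = [{"id": 1}, {"id": 2}, {"id": 3}]
PREPARED = [{"id": 1, "score": 0.4}, {"id": 3, "score": 0.9}]

if __name__ == "__main__":
    missing = check_no_rows_lost(SOURCE, PREPARED, key="id")
    print("Missing ids:", missing)
```

An empty result means every source key survived the preparation; a chart of row counts per step gives the same information visually.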
Conclusion of the researchers from “Into The Minds”: Which ETL tool to choose?
These are the final conclusions of the researchers from “Into The Minds”:
It is necessary to choose the ETL that best suits your specific needs. …Each ETL solution has its own specificities and the comparisons I have made above are only one aspect among others.
Beyond the range of functionalities, we must also consider the speed of data preparation processes. The researchers at “Into The Minds” have already carried out a benchmark of 4 ETL tools and the differences in processing time were considerable.
In the end, I think there are 2 essential objective criteria to take into account:
- functionalities
- speed
In addition, there are more subjective aspects such as the product roadmap and the customer orientation of the publisher. From this point of view my preference is clearly for Anatella. The company behind it is very responsive and does not hesitate to develop specific solutions for your needs. I doubt that companies like Talend or Alteryx do the same.
The conclusions of the researchers from “Into The Minds” are clear: Whatever the criteria used to qualify an ETL (extent of functionality, speed of execution, quality of customer support, etc.), the dominant ETL is Anatella.