Anatella is the only tool inside the Hadoop ecosystem that is 100% developed in native (C/C++ or assembler) code. More precisely:
•Anatella is the only tool that can manipulate the famous .parquet files (using the readParquet Action and the writeParquet Action) with very efficient C/C++ code (see the parquet sketch after this list).
•Anatella is amongst the very few tools that can natively access (read & write) the HDFS drive using only very efficient C/C++ code.
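Anatella’s parquet reader itself is proprietary and its source is not shown here, but the following minimal sketch, built on the open-source Apache Arrow C++ library, illustrates what a native C++ parquet read looks like (the file name “data.parquet” is a placeholder):

    // Minimal sketch of a native parquet read, using the open-source Apache
    // Arrow C++ library. This is an illustration only, NOT Anatella's
    // proprietary reader; the file name "data.parquet" is a placeholder.
    #include <iostream>
    #include <memory>
    #include <string>
    #include <arrow/api.h>
    #include <arrow/io/file.h>
    #include <parquet/arrow/reader.h>

    arrow::Status ReadWholeFile(const std::string& path) {
      // Open the .parquet file for random access.
      ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::io::ReadableFile> infile,
                            arrow::io::ReadableFile::Open(path));
      // Wrap the input file in a parquet reader.
      std::unique_ptr<parquet::arrow::FileReader> reader;
      ARROW_RETURN_NOT_OK(parquet::arrow::OpenFile(
          infile, arrow::default_memory_pool(), &reader));
      // Materialize the whole file as an Arrow table and print its dimensions.
      std::shared_ptr<arrow::Table> table;
      ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
      std::cout << table->num_rows() << " rows x "
                << table->num_columns() << " columns\n";
      return arrow::Status::OK();
    }

    int main() {
      arrow::Status st = ReadWholeFile("data.parquet");
      if (!st.ok()) { std::cerr << st.ToString() << "\n"; return 1; }
      return 0;
    }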
Thanks to its proprietary, native (C/C++) implementation, Anatella is faster than Hadoop/Spark clusters of any size (i.e. with any number of “nodes”). This is due to the relatively high, incompressible time overhead of Hadoop/Spark (you can find more details on this subject here: https://timi.eu/blog/cloud/).

Furthermore, thanks to its “streaming” (i.e. row-by-row) computations, Anatella is not limited to small data files that “fit into RAM” (in opposition to Spark and to nearly all the other tools inside the Hadoop ecosystem, which are 100% “in-memory” tools). This means that, with Anatella, you can process nearly unlimited dataset sizes (in opposition to Spark, which is strongly limited to smaller data sizes because all the input & output data files must fit simultaneously in the limited RAM available). To summarize, Anatella is for seriously big “Big Data” processing (and Spark is for the “little” Big Data cases). A minimal sketch of this row-by-row processing style is given below.
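The streaming engine inside Anatella is proprietary, but the principle behind row-by-row processing is simple and can be sketched in a few lines of standard C++: only one row (plus the running aggregates) ever sits in RAM, so memory consumption stays constant whatever the file size. The file name “big_table.csv” and its single-numeric-column layout are assumptions made for this illustration:

    // Minimal sketch of "streaming" (row-by-row) processing: the file is
    // read one line at a time, so RAM usage stays constant even for a file
    // far larger than the available memory. "big_table.csv" is a placeholder
    // name and a single numeric column is assumed, for illustration only.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
      std::ifstream in("big_table.csv");
      if (!in) { std::cerr << "cannot open file\n"; return 1; }

      std::string line;
      std::getline(in, line);           // skip the header row

      double sum = 0.0;
      long long rows = 0;
      while (std::getline(in, line)) {  // one row in memory at a time
        sum += std::stod(line);         // assumes a single numeric column
        ++rows;
      }
      std::cout << "rows=" << rows << " sum=" << sum << "\n";
      return 0;
    }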
The tools inside the Hadoop ecosystem always exchange data with each other using the same mechanism: i.e. they read/write .parquet files (or, sometimes, .avro files) from/to the HDFS drive. In this regard, Anatella is no different from the other tools inside the Hadoop ecosystem: i.e. Anatella reads/writes .parquet files from/to the HDFS drive. A sketch of such a native HDFS access follows below.
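For reference, native HDFS access from C/C++ typically goes through libhdfs, the C API distributed with Hadoop. Whether Anatella uses libhdfs or its own HDFS client is not detailed here; the sketch below only shows what such a native read looks like (the namenode “default”, port 0 and the file path are placeholders):

    // Minimal sketch of native HDFS access through libhdfs (the C API
    // shipped with Hadoop), called from C++. This is an illustration, not
    // Anatella's own HDFS client; namenode, port and path are placeholders.
    #include <cstdio>
    #include <fcntl.h>
    #include "hdfs.h"

    int main() {
      // "default" + port 0 means: use the settings found in core-site.xml.
      hdfsFS fs = hdfsConnect("default", 0);
      if (!fs) { std::fprintf(stderr, "cannot connect to HDFS\n"); return 1; }

      hdfsFile f = hdfsOpenFile(fs, "/data/table.parquet", O_RDONLY, 0, 0, 0);
      if (!f) {
        std::fprintf(stderr, "cannot open file\n");
        hdfsDisconnect(fs);
        return 1;
      }

      char buffer[65536];
      tSize n = hdfsRead(fs, f, buffer, (tSize)sizeof(buffer));  // first 64 KB
      std::printf("read %d bytes\n", (int)n);

      hdfsCloseFile(fs, f);
      hdfsDisconnect(fs);
      return 0;
    }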