We can open HDFS drives and read .parquet files.
We've written more than 40k lines of C, making this the first and only tool in the Hadoop ecosystem written 100% in C!
Compared to the common ETL solutions used in Hadoop:
[*]we're 10 to 40 times faster
[*]since we are NOT "in memory", we are NOT LIMITED by the size of the data.
For example, one of our banking/insurance customers had to create a large 40 GB JSON file from a 400 GB data source. With Anatella, it took them less than 15 minutes. They tried the same operation with another solution and gave up after 24 hours, on a cluster of more than 10 servers.
[*]thanks to the Anatella graphical interface, we've removed the coding issues that come from language complexity and the lack of experienced programmers.
[*]thanks to Anatella's integration with TIMi modeler, we can create predictive models quickly and easily, compared to the Hadoop machine learning stack, which we believe is "limited".