About Spark and the Cloud
Here you’ll find two YouTube videos that explain:
* Amdahl’s Law and the “incompressible time” of distributed computation engines.
* why you shouldn’t use Spark for ETL processes.
* why it’s better to avoid using “cloud solutions” (Amazon, Azure) for “data science” projects.
(Subtitles are available in English and French.)
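As a rough illustration of the first point, here is a minimal Python sketch of the textbook form of Amdahl’s Law. It assumes that the “incompressible time” corresponds to the serial fraction of a job, i.e. the part that does not get faster when more machines are added; the videos may define it more precisely. The function names and the 95% figure are hypothetical and chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup of a job on `workers` machines (textbook Amdahl's Law).

    parallel_fraction: share of the single-machine runtime that can be parallelized.
    The remaining share is treated here as the "incompressible" serial part
    (an assumption made for this sketch).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def max_speedup(parallel_fraction: float) -> float:
    """Upper bound on the speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical job: 95% of the runtime is parallelizable.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} workers -> speedup {amdahl_speedup(0.95, n):.2f}")
    print(f"upper bound -> {max_speedup(0.95):.2f}")
```

With a 5% serial part, the speedup can never exceed about 20, however many nodes are added to the cluster.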
Part 1:
Part 2:
The presentation used in the two videos:
http://download.timi.eu/docs/TIMi_vs_Spark.pdf
A one-page executive summary of the two videos:
http://download.timi.eu/docs/Spark_vs_TIMi_Executive_Summary.pdf
A white paper that summarizes the findings explained in the two videos:
http://download.timi.eu/docs/Spark_vs_TIMi_technical_white_paper.pdf
The video by Frédéric Pierucci:
https://youtu.be/dejeVuL9-7c
The GitHub repository with the Anatella graphs and the Scala code used in the video:
https://github.com/Kranf99/TPC-H-Benchmarck-Anatella-Spark