About Spark and the Cloud
Here you’ll find two YouTube videos that explain:
* Amdahl’s Law and the “incompressible time” of distributed computation engines.
* why you shouldn’t use Spark for ETL processes.
* why you shouldn’t use any “cloud solutions” (Amazon, Azure) for “data science” projects.

Subtitles are available in English and French.
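For readers who want the formula behind the first point, here is a minimal sketch of Amdahl's Law in Python. It is not taken from the videos: the function names `amdahl_speedup` and `max_speedup` are made up for illustration, and reading the “incompressible time” as the serial (non-parallelizable) share of a job is an assumption.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup of a job under Amdahl's Law.

    parallel_fraction: share of the run time that can be parallelized (0..1).
    workers: number of parallel workers (hypothetical cluster size).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # The serial share is treated here as the "incompressible" part (assumption).
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def max_speedup(parallel_fraction: float) -> float:
    """Upper bound on the speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Illustrative numbers only: a job that is 90% parallelizable.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
    print("limit:", round(max_speedup(0.9), 2))
```

With these illustrative numbers, adding workers gives diminishing returns: the speedup can never exceed the inverse of the serial share, however large the cluster is.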
Data vaulting: from a bad idea to inefficient implementations
An efficient data management mechanism should have two main characteristics: operational efficiency (it must run faster and with fewer resources than those it aims to replace) and structural clarity (it must be straightforward to access, understand, and query). As an IT data manager, you know you sometimes
Why you need more data engineers, but not for the reasons you think
The role of the data scientist has evolved quite a bit over the last few years. While in some areas it stemmed from groups of software engineers and other IT specialists who soon realized making models was more than linking to a