Most ETL tools have no multithreading functionality. Some use a very specific (and primitive) form of multithreading: they automatically insert a Multithread Action between every pair of normal Actions. For example, when you see the following data-transformation graph inside a “classical multithreaded ETL tool”:
… it is, in reality, executed in the following way (note that a Multithread Action has simply been inserted between every Action):
This means that, if you have a graph with twenty Actions, the data-transformation graph will try to use (roughly) twenty CPUs. Twenty is (usually) significantly more than the number of physical CPUs available, so this graph will run very slowly (because MS Windows must “emulate” the missing CPUs in software: see the previous section on this subject). In such tools, there is no way to prevent this.
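The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration: the function name `oversubscription` and the 4-CPU server are assumptions made for the example, and the sketch assumes exactly one thread per Action, as in the “classical” layout described above.

```python
import os

def oversubscription(num_actions, physical_cpus=None):
    """Return (threads, cpus, threads_per_cpu) for a one-thread-per-Action graph."""
    # Fall back to the CPU count reported by the OS when none is given.
    cpus = physical_cpus or os.cpu_count() or 1
    # Assumption: a Multithread Action between every Action means that
    # each Action ends up in its own thread.
    threads = num_actions
    return threads, cpus, threads / cpus

# Hypothetical server with 4 CPUs running a graph of twenty Actions.
threads, cpus, ratio = oversubscription(20, physical_cpus=4)
print(f"{threads} threads on {cpus} CPUs: {ratio:.1f} threads per CPU")
```

Any ratio above 1 means that the operating system has to time-slice several threads onto each physical CPU.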
Each Multithread Action has an internal FIFO-row-buffer. Managing this FIFO-row-buffer consumes a large amount of precious CPU time. It is thus a good idea to keep the number of Multithread Actions in your graph to a minimum: you can do that by designing your graph so that it has the longest possible Sections (i.e. a Section should include as many Actions as possible).
The data-transformation engines used inside “classical” multithreaded ETLs use a very large number of FIFO-row-buffers (because they internally place a Multithread Action between every Action: all their Sections are composed of only ONE Action: it’s terrible!). They thus lose a very large amount of precious CPU time managing all these unnecessary FIFO-row-buffers (and also in “context switching”).
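The difference between the two layouts can be sketched in Python. This is a minimal illustration, not how Anatella or any particular ETL engine is implemented: `run_sections`, the sentinel `_END` and the four toy Actions are hypothetical names made up for the example. Each Section runs in its own thread, and consecutive Sections are linked by one bounded FIFO buffer. The sketch only shows how many buffers each layout needs; it does not measure speed (standard CPython threads do not execute Python code in parallel).

```python
import queue
import threading

_END = object()  # sentinel pushed through a FIFO to mark the end of the rows

def run_sections(rows, sections, buffer_size=64):
    """Run `sections` (each a list of row -> row functions) as a pipeline.

    One thread per section, one FIFO buffer between consecutive sections.
    Returns (output_rows, number_of_fifo_buffers).
    """
    fifos = [queue.Queue(maxsize=buffer_size) for _ in range(len(sections) - 1)]
    output = []

    def incoming(i):
        # The first section reads the input directly; the others read a FIFO.
        if i == 0:
            yield from rows
            return
        while True:
            row = fifos[i - 1].get()
            if row is _END:
                return
            yield row

    def worker(i, actions):
        last = i == len(sections) - 1
        for row in incoming(i):
            for action in actions:  # inside a section: plain function calls
                row = action(row)
            if last:
                output.append(row)
            else:
                fifos[i].put(row)   # crossing a section boundary costs a FIFO hop
        if not last:
            fifos[i].put(_END)

    threads = [threading.Thread(target=worker, args=(i, s))
               for i, s in enumerate(sections)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output, len(fifos)

# Four toy Actions applied to 1000 rows.
actions = [lambda r: r + 1, lambda r: r * 2, lambda r: r - 3, lambda r: r * r]
rows = range(1000)

# "Classical" layout: every Action is its own section.
classical, n_classical = run_sections(rows, [[a] for a in actions])
# Longer sections: the same four Actions grouped into two sections.
fused, n_fused = run_sections(rows, [actions[:2], actions[2:]])

print(n_classical, n_fused, classical == fused)  # prints "3 1 True"
```

Both layouts produce the same rows, but the one-Action-per-Section layout pushes every row through three FIFO buffers, while the two-Section layout pushes it through only one.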
Most “classical” multithreaded ETL engines do not even offer an equivalent of “N-Way Multithreaded Sections”.
In contrast, inside Anatella, you have complete control over:
a) how many CPUs your data-transformation is using.
b) how many FIFO-row-buffers your data-transformation is using.
This “total control” allows you to use all the CPUs inside your server in the most efficient way.