Sorting is one of the slowest operations that you can perform in any ETL because it involves writing all the data to the hard drive (in tape files) and, just after, reading all the same data back from the hard drive (during the “Merge Sort”). All these I/O operations (writing to and reading from the hard drive) take a considerable amount of time, especially for large input tables. Thus, when optimizing your Anatella graph for speed, you should avoid using any Sort Action. There are many different ways to obtain a sorted table without using a plain Sort Action. I strongly suggest that you use instead (if possible):
1. a MergeSort Action.
2. a MergeSortInput Action.
3. a partitionedSort Action.
These 3 Actions are incomparably faster than a plain Sort Action.
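To see why a plain sort costs so much I/O, here is a minimal Python sketch of a generic external merge sort. It is not Anatella's actual implementation: the function name `external_sort`, the `chunk_size` parameter and the use of temporary text files in place of tape files are all assumptions made for illustration. Note that every value is written to disk once (phase 1) and read back from disk once (phase 2).

```python
import heapq
import os
import tempfile


def external_sort(rows, chunk_size):
    """Sort an iterable of integers using temporary run files on disk.

    Hypothetical illustration only: 'chunk_size' stands for the number of
    rows that fit in memory at once.
    """
    run_paths = []

    def flush(chunk):
        # Phase 1: sort one chunk in memory and WRITE it to disk as a run.
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            for value in sorted(chunk):
                f.write(f"{value}\n")
        run_paths.append(path)

    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            flush(chunk)
            chunk = []
    if chunk:
        flush(chunk)

    # Phase 2: READ every run back from disk and merge them ("Merge Sort").
    files = [open(path) for path in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)
```

If the input already arrives as a few sorted streams, phase 1 disappears and only the cheap merge remains, which is the general reason why merging pre-sorted data is faster than sorting from scratch.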
Sorting is required for the simple Join Action to work properly. If possible, replace the simple Join Action with a MultiJoin Action, to avoid sorting all the data.
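The difference between a join that needs sorted inputs and one that does not can be sketched in Python as follows. This is a generic illustration: `merge_join` and `hash_join` are hypothetical names, the second table is assumed to have unique keys, and whether the MultiJoin Action works internally like the in-memory lookup shown here is an assumption.

```python
def merge_join(left, right):
    """Inner join of (key, value) pairs; BOTH inputs must be sorted by key.

    Assumes keys in 'right' are unique.
    """
    out = []
    j = 0
    for key, left_value in left:
        # Advance the cursor in 'right' until it catches up with 'key'.
        while j < len(right) and right[j][0] < key:
            j += 1
        if j < len(right) and right[j][0] == key:
            out.append((key, left_value, right[j][1]))
    return out


def hash_join(left, right):
    """Inner join of (key, value) pairs; no ordering is required.

    The 'right' table is loaded into a dict, so it must fit in memory.
    """
    lookup = dict(right)
    return [(key, value, lookup[key]) for key, value in left if key in lookup]
```

For example, `hash_join` accepts the rows in any order, whereas `merge_join` only gives correct results after both tables have been sorted, which is exactly the step we want to avoid.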
Sorting is also required for the “out-of-memory” mode of the Aggregate Action to work properly. If possible, replace the “out-of-memory” mode with the “in-memory” mode, to avoid sorting all the data.
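The same trade-off can be sketched for aggregation. The Python below is a generic illustration, not Anatella's implementation, and the function names are hypothetical. The first function needs its input sorted by key but holds only one group in memory at a time; the second needs no sorting but must keep one running total per distinct key in memory.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Sum values per key; 'rows' MUST already be sorted by key."""
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(rows, key=itemgetter(0))
    ]


def aggregate_in_memory(rows):
    """Sum values per key; 'rows' may arrive in any order."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)
```

When the number of distinct keys is small enough for the totals to fit in RAM, the second approach gives the same result without any sort.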