5.5.2.2. Faster Alternatives to the “simple” Sort
Sorting is one of the slowest operations that you can perform inside any ETL tool because it involves writing all the data to the hard drive (in tape files) and, immediately afterwards, reading all the same data back from the hard drive (during the “Merge Sort”). All these I/O operations (writing to and reading from the hard drive) take a considerable amount of time (especially for large input tables). Thus, when optimizing your Anatella graph for speed, you should avoid using any Sort Action.
There exist many different ways to obtain a sorted table without using a plain Sort Action: I strongly suggest that you use instead (if possible):
1. a MergeSort Action.
2. a MergeSortInput Action.
3. a partitionedSort Action.
These 3 Actions are incomparably faster than a plain Sort Action.
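To make the cost difference concrete, here is a minimal Python sketch (not Anatella code). It assumes that a full sort behaves like a classic external sort (sorted runs written to temporary files, then merged) and that a MergeSort-style Action only has to merge inputs that are already sorted. The function names `write_run`, `read_run`, `external_sort` and `merge_sorted_inputs` are hypothetical and exist only for this illustration.

```python
import heapq
import os
import tempfile


def write_run(sorted_rows, directory):
    """Write one sorted run to a temporary file, one integer per line."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as handle:
        for row in sorted_rows:
            handle.write(f"{row}\n")
    return path


def read_run(path):
    """Stream the integers of one run back from disk."""
    with open(path) as handle:
        for line in handle:
            yield int(line)


def external_sort(rows, chunk_size, directory):
    """Full sort: every row is written to disk once, then read back once."""
    paths, chunk = [], []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            # First pass: sort a chunk in RAM and write it out as a run.
            paths.append(write_run(sorted(chunk), directory))
            chunk = []
    if chunk:
        paths.append(write_run(sorted(chunk), directory))
    # Second pass: merge all runs while reading them back from disk.
    return list(heapq.merge(*(read_run(p) for p in paths)))


def merge_sorted_inputs(*sorted_inputs):
    """Inputs are already sorted: one streaming merge, no temporary files."""
    return list(heapq.merge(*sorted_inputs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(external_sort([5, 3, 8, 1, 9, 2], chunk_size=2, directory=tmp))
    print(merge_sorted_inputs([1, 4, 7], [2, 3, 9]))
```

In this sketch, `external_sort` touches the disk twice for every row, whereas `merge_sorted_inputs` only performs the final merge step. That is the kind of saving you can expect when the inputs are already sorted.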
Sorting is required for the simple Join Action to work properly. If possible, replace the simple Join Action with a MultiJoin Action, to avoid sorting all the data.
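The general idea of joining without sorting can be sketched in Python as a hash join: the smaller table is indexed in RAM and the larger table is streamed past it in whatever order it arrives. This is only an illustration of the technique. Whether the MultiJoin Action works this way internally is an assumption, and the names `hash_join`, `big_rows` and `lookup_rows` are hypothetical.

```python
from collections import defaultdict


def hash_join(big_rows, lookup_rows, key):
    """Inner join that needs no sorted input.

    The lookup table is indexed in RAM by `key`; the big table is then
    streamed row by row, in its original order.
    """
    index = defaultdict(list)
    for lookup in lookup_rows:
        index[lookup[key]].append(lookup)
    for big in big_rows:
        # .get() avoids creating empty entries for keys with no match.
        for lookup in index.get(big[key], []):
            # On a column-name clash, the value from the big table wins.
            yield {**lookup, **big}


if __name__ == "__main__":
    orders = [
        {"customer_id": 2, "amount": 10},
        {"customer_id": 1, "amount": 25},
        {"customer_id": 3, "amount": 5},
    ]
    customers = [
        {"customer_id": 1, "name": "Ann"},
        {"customer_id": 2, "name": "Bob"},
    ]
    for row in hash_join(orders, customers, "customer_id"):
        print(row)
```

The trade-off is memory: the indexed table must fit in RAM, which is why this approach is only an option "if possible".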
Sorting is also required for the “out-of-memory” mode of the Aggregate Action to work properly. If possible, replace the “out-of-memory” mode with the “in-memory” mode, to avoid sorting all the data.
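The difference between the two modes can be illustrated with a short Python sketch. It is not Anatella's implementation: it simply assumes that an "in-memory" aggregation keeps one running total per key in a hash table, while an "out-of-memory" aggregation processes one group at a time and therefore needs its input sorted by key. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def aggregate_in_memory(rows):
    """Sum values per key with a hash table; input order does not matter."""
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def aggregate_sorted_stream(sorted_rows):
    """Sum values per key, one group at a time.

    Correct only if the input is sorted by key, because groupby() only
    groups rows that are adjacent.
    """
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(sorted_rows, key=itemgetter(0))
    }


if __name__ == "__main__":
    rows = [("b", 2.0), ("a", 1.0), ("b", 3.0), ("a", 4.0)]
    print(aggregate_in_memory(rows))            # works on unsorted input
    print(aggregate_sorted_stream(sorted(rows)))  # needs the extra sort
```

The hash-table version must hold one entry per distinct key in RAM, so it only applies when the number of distinct keys is small enough to fit in memory.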