It is always tempting to increase the “Memory Buffer Size” parameter to the maximum to obtain the fastest sort, but this is not always possible. Let’s consider the following data transformation graph:
Let’s assume that:
•The total available RAM is 2GB (i.e. it’s a 32-bit computer).
•The memory used inside the MultiJoin Action to store the Slave Table is 1.7GB.
This means that only around 0.3GB (300MB) of memory remains to perform the sort. This is not much, and the sort might be quite slow. Even worse, if you set the “Memory Buffer Size” parameter to a value near or above 300MB, the sort will fail immediately with an “out-of-memory” message (because you cannot use more than 2GB of RAM on a 32-bit computer). There are several solutions:
•Optimize the MultiJoin Action: typically, change the Data Type of some columns in the Slave Table so that less RAM is required to store the Slave Table in memory.
•Rewrite your data transformation graph. For example:
To execute the above data transformation graph, Anatella will:
▪Load the Slave Table into RAM to compute the MultiJoin Action. The amount of free RAM is now only 300MB.
▪Compute the join and save the results inside a temporary file.
▪Unload the Slave Table from memory as soon as the temporary file is complete (the Slave Table is no longer needed once the join has been fully computed). The amount of free RAM is now back to 2GB.
▪Read the temporary file and sort it. You can use 1.5GB of RAM to compute the sort (i.e. “Memory Buffer Size”=1500MB).
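The memory figures in the steps above can be checked with a short calculation. The following Python sketch is only illustrative: the 2GB and 1.7GB values come from the example, while the function names and the 100MB safety margin (`headroom_mb`) are assumptions made here and are not Anatella settings.

```python
# Illustrative memory-budget arithmetic for the example above.
# The 2GB and 1.7GB figures come from the text; all names are hypothetical.

TOTAL_RAM_MB = 2000        # 2GB in total, as in the example
SLAVE_TABLE_MB = 1700      # 1.7GB held by the MultiJoin Action


def free_ram_mb(total_mb: int, resident_mb: int) -> int:
    """RAM left once the resident data (here, the Slave Table) is loaded."""
    return total_mb - resident_mb


def buffer_fits(buffer_mb: int, free_mb: int, headroom_mb: int = 100) -> bool:
    """True if the sort buffer still leaves `headroom_mb` of free RAM.

    `headroom_mb` is an assumed safety margin, not an Anatella parameter.
    """
    return buffer_mb + headroom_mb <= free_mb


# Single graph: the Slave Table is still in memory while the sort runs.
free_during_join = free_ram_mb(TOTAL_RAM_MB, SLAVE_TABLE_MB)

# Rewritten graph: the Slave Table has been unloaded before the sort starts.
free_after_unload = free_ram_mb(TOTAL_RAM_MB, 0)

print(free_during_join, buffer_fits(300, free_during_join))
print(free_after_unload, buffer_fits(1500, free_after_unload))
```

With these assumptions, a 300MB buffer does not fit while the Slave Table is loaded, whereas a 1500MB buffer fits once it has been unloaded.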
Please note that this last solution should ideally be used only as a “quick fix”: it is not efficient in terms of speed, because it requires writing all the data to a temporary file and then re-reading it. This extra I/O (i.e. writing and reading all the data on the hard drive) costs a large amount of time.
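To get a rough feel for the cost of the temporary file, the extra I/O time can be estimated from the data volume and the drive throughput. This is a back-of-the-envelope sketch: the function name, the 10GB data size and the 100MB/s throughput are purely hypothetical values chosen for illustration, not measurements.

```python
# Rough estimate of the extra time spent writing and re-reading a temporary file.
# All numbers below are hypothetical; measure your own drive for real figures.

def extra_io_seconds(data_mb: float, write_mb_per_s: float, read_mb_per_s: float) -> float:
    """Time to write `data_mb` once and then read it back once, in seconds."""
    return data_mb / write_mb_per_s + data_mb / read_mb_per_s


# Assumed example: a 10GB join result on a drive sustaining 100MB/s both ways.
estimate = extra_io_seconds(10_000, 100, 100)
print(f"Extra I/O time: about {estimate:.0f} seconds")
```

The estimate grows linearly with the size of the join result, which is why this approach becomes expensive on large tables.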