5.27.2. Anatella “Gel file” writer
Icon:
Function: writeGel
Property window:
Short description:
Writes a table to a “.gel_anatella” file.
Long Description:
Please refer to section 5.1.1 for more information on how to specify the filename of the .gel_anatella file (i.e. you can use a relative path and Javascript to specify your filename).
The .gel_anatella files contain data in a proprietary compressed format. The maximum decompression speed (when reading a .gel_anatella file) is around 600 MB/sec of uncompressed data. This is roughly equal to 170 MB/sec of compressed data, because the compression ratio is usually between 1:3 and 1:4. The typical compression speed (on an Intel Core i7 processor) is around 60 MB/sec of uncompressed data when using one thread. This (relatively) slow compression speed can sometimes become the main “bottleneck” inside your data-transformation graph. To avoid this situation, you can use several threads/CPUs for data compression. For example, when using 3 CPUs, you get around 3x60=180 MB/sec of compression speed.
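The figures above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the function names are hypothetical (they are not part of Anatella), and it assumes that compression speed scales linearly with the number of threads, as in the 3x60=180 example.

```python
import math

def compressed_rate(uncompressed_mb_per_sec, ratio):
    """MB/sec of compressed data for a compression ratio of 1:ratio."""
    return uncompressed_mb_per_sec / ratio

def compression_speed(threads, per_thread_mb_per_sec=60.0):
    """Total compression speed, assuming ideal linear scaling per thread."""
    return threads * per_thread_mb_per_sec

def threads_needed(upstream_mb_per_sec, per_thread_mb_per_sec=60.0):
    """Smallest thread count whose total speed covers the upstream row rate."""
    return math.ceil(upstream_mb_per_sec / per_thread_mb_per_sec)

# 600 MB/sec uncompressed at ratios 1:3 and 1:4 brackets the ~170 MB/sec figure.
print(compressed_rate(600, 4), compressed_rate(600, 3))
# Three threads at 60 MB/sec each.
print(compression_speed(3))
```

For instance, if the rest of the graph produces rows at 150 MB/sec, `threads_needed(150)` suggests 3 compression threads under these assumptions.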
You can set the number of CPUs used for compression using the “Number of compression Threads” parameter. When this parameter is “-1”, Anatella uses the default value given inside the “Graph Global Parameters” window.
The Anatella automatic “HD cache” system also creates “Gel Files” when you click on the output pins of the different Actions. These “Gel Files” are also useful when you want to exchange data with other Anatella processes: see section 5.3.3 about this subject.
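The “-1” convention can be pictured as a simple fallback rule. The Python sketch below is purely illustrative: the function and argument names are hypothetical and do not correspond to any Anatella API.

```python
def effective_compression_threads(action_setting, graph_default):
    """Resolve the thread count: -1 on the Action means 'use the graph default'."""
    if action_setting == -1:
        return graph_default
    return action_setting

# The Action is left at -1 and the graph-wide default is 2 threads.
print(effective_compression_threads(-1, 2))
# The Action explicitly asks for 4 threads, overriding the graph default.
print(effective_compression_threads(4, 2))
```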
Anatella has a special behaviour when you click the output pin of the WriteGel Action.
The output pin of the WriteGel Action is slightly different from the output pins of ALL the other Actions, and this can sometimes be a little confusing if you don’t expect it.
These are the three main differences:
Difference 1. When you click the output pin of the WriteGel Action while the Anatella graph is running, you don’t get any data preview. Once the data transformation is stopped, you get the data preview as usual.
Difference 2. When you “split” the input table into several different Anatella files (using the “Anatella Gel File Splitting” option), you won’t get any data preview when clicking the output pin of the WriteGel Action. If you still want a data preview, simply click the output pin of the previous or the next Action instead.
Difference 3. When you activate the “column selection” option (to write inside your .gel_anatella file a (sub)selection of some specific columns of the input table), the data preview that you obtain when clicking the output pin of the WriteGel Action only includes the selected columns (and not ALL the columns). Nevertheless, the WHOLE table (including ALL the columns) is still propagated to the “next” Action.
About Blocking (and Non-blocking) I/O Algorithms
A typical algorithm that writes some rows inside a file on the Hard Drive follows these 4 steps:
1. Receive from the input pin of the GelFileWriter Action one row to write on the hard drive.
2. Copy the content of the row inside a large RAM buffer.
3. When the large RAM buffer is “full”:
● Compress the RAM buffer to get one data-block, containing compressed data.
● Compute the CRC code used to validate the data-block when reading the “.gel_anatella” file.
● Write the data-block to the Hard Drive.
4. If there are no more rows to write on the input pin of the GelFileWriter Action, stop here. Otherwise, go back to step 1.
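The four steps above can be sketched in Python as follows. This is a minimal illustration and not Anatella’s implementation: zlib stands in for the proprietary compression, the block layout (payload length, CRC, payload) and the 64 KB buffer size are assumptions, and all function names are hypothetical.

```python
import struct
import zlib

BUFFER_LIMIT = 64 * 1024  # assumed size of the "large RAM buffer"

def flush_block(buffer, out):
    """Step 3: compress the buffer, compute its CRC, write the data-block."""
    payload = zlib.compress(bytes(buffer))   # zlib is a stand-in compressor
    crc = zlib.crc32(payload)                # CRC used to validate on read
    out.write(struct.pack("<II", len(payload), crc))  # assumed block header
    out.write(payload)

def write_rows_blocking(rows, out, buffer_limit=BUFFER_LIMIT):
    """Write rows (bytes objects) to the binary file 'out'; return block count."""
    buffer = bytearray()
    blocks = 0
    for row in rows:                      # step 1: receive one row
        buffer += row                     # step 2: copy it into the RAM buffer
        if len(buffer) >= buffer_limit:   # step 3: the buffer is "full"
            flush_block(buffer, out)      # the caller waits here
            buffer.clear()
            blocks += 1
    if buffer:                            # step 4: no more rows, flush the rest
        flush_block(buffer, out)
        blocks += 1
    return blocks
```

Note that `write_rows_blocking` cannot accept the next row while `flush_block` is running: the compression, the CRC computation and the write all happen in the caller’s own thread.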
In terms of speed, the above algorithm is not very efficient because it is a “blocking” I/O algorithm: when the large RAM buffer is “full”, the Action connected to the input pin of the GelFileWriter Action cannot send any more rows (because there is no space left in the RAM buffer to store them). Instead, this transformation must block until the content of the large RAM buffer is “flushed” to the hard drive. More precisely, it must:
● Block until the compression of the data-block is completed.
● Block until the computation of the CRC code used to validate the data-block is completed.
● Block until the write of the data-block to the Hard Drive is completed.
…and then (and only then) the data-transformation unblocks and can process the next rows.
Anatella always uses a (better) non-blocking I/O algorithm to create all the “.gel_anatella” files. The GelFileWriter Action always uses one (or more) dedicated compression thread(s) when writing the .gel_anatella files to the hard drive. This means that all the tasks from step 3 above are performed continuously by “one (or several) background thread(s)”, so that the data-transformation process never blocks (i.e. it will never block if enough dedicated compression threads are used).
When reading or writing Gel Files (row-based “.gel_anatella” or columnar “.cgel_anatella”), all I/O operations inside Anatella are non-blocking.
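The non-blocking idea can be sketched in Python with a bounded queue and one background thread. This is only an illustration under stated assumptions, not Anatella’s actual code: zlib replaces the proprietary compression, the block header layout is invented, a single background thread is used (several compression threads would also need to keep the blocks in order), and all names are hypothetical.

```python
import queue
import struct
import threading
import zlib

def _background_flusher(pending, out):
    """Background thread: compress, CRC and write each full buffer."""
    while True:
        buffer = pending.get()
        if buffer is None:                   # sentinel: no more buffers
            break
        payload = zlib.compress(buffer)      # zlib is a stand-in compressor
        header = struct.pack("<II", len(payload), zlib.crc32(payload))
        out.write(header)                    # assumed header: length + CRC
        out.write(payload)

def write_rows_non_blocking(rows, out, buffer_limit=64 * 1024, max_pending=4):
    """Write rows (bytes objects) to 'out' while step 3 runs in the background."""
    pending = queue.Queue(maxsize=max_pending)
    worker = threading.Thread(target=_background_flusher, args=(pending, out))
    worker.start()
    buffer = bytearray()
    for row in rows:
        buffer += row
        if len(buffer) >= buffer_limit:
            # Hand the full buffer over and continue immediately. put() only
            # waits if max_pending buffers are already queued, i.e. if the
            # background thread cannot keep up.
            pending.put(bytes(buffer))
            buffer.clear()
    if buffer:
        pending.put(bytes(buffer))
    pending.put(None)                        # tell the background thread to stop
    worker.join()                            # wait until everything is written
```

In this sketch the row-producing loop only waits when the queue is full, which mirrors the remark above that the process never blocks as long as enough compression capacity is available.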