5.3.2.3. Input Actions and Multithread Sections


The ReadCSV Action “injects” new rows (read from the hard drive) into the graph. Each time Anatella calls the ReadCSV Action, it does the following:

1. Look at the internal Row-Buffer of the Action:
   a. If the Row-Buffer is empty:
      i. Read a block of data (typically 1 MB) from the hard drive and copy it into the Row-Buffer (optionally decompressing the binary data into usable data).
      ii. Select the first row in the buffer.
   b. If the Row-Buffer is not empty: select the next row in the buffer.
2. “Inject” the selected row into the Anatella graph and remove it from the Row-Buffer.
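
The steps above can be sketched in Python. This is a minimal model under stated assumptions, not Anatella's implementation: `RowBufferReader` is a hypothetical class, rows are plain comma-separated UTF-8 lines (no CSV quoting rules), and the optional decompression step is omitted.

```python
import io

class RowBufferReader:
    """Hypothetical sketch of the Row-Buffer logic of a CSV input Action."""

    def __init__(self, stream, block_size=1 << 20, delimiter=","):
        self.stream = stream          # binary file-like object, e.g. open(path, "rb")
        self.block_size = block_size  # ~1 MB per disk read (step 1.a.i)
        self.delimiter = delimiter
        self.rows = []                # the internal Row-Buffer
        self.pos = 0                  # index of the next row to select
        self.leftover = b""           # incomplete line carried over to the next block

    def _refill(self):
        # Step 1.a.i: blocking read of one block from the hard drive.
        chunk = self.stream.read(self.block_size)
        if not chunk:
            # End of file: flush a last line that had no trailing newline.
            lines = [self.leftover] if self.leftover else []
            self.leftover = b""
        else:
            lines = (self.leftover + chunk).split(b"\n")
            self.leftover = lines.pop()  # the last piece may be an incomplete line
        self.rows = [line.decode("utf-8").rstrip("\r").split(self.delimiter) for line in lines]
        self.pos = 0                     # step 1.a.ii: select the first row
        return bool(chunk) or bool(self.rows)

    def next_row(self):
        # Step 1: refill the Row-Buffer while it is empty.
        while self.pos >= len(self.rows):
            if not self._refill():
                return None              # no more data
        row = self.rows[self.pos]        # step 1.b: select the next row
        self.pos += 1                    # step 2: inject it and drop it from the buffer
        return row

# A tiny block_size forces several refills on a small in-memory "file".
reader = RowBufferReader(io.BytesIO(b"id,name\n1,Ann\n2,Bob\n"), block_size=8)
while (row := reader.next_row()) is not None:
    print(row)
```

Only whole lines are decoded, so a multi-byte UTF-8 character split across two blocks is handled correctly.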

 

The ReadCSV Action uses a synchronous (i.e. blocking) I/O algorithm (see section 5.2.6.2 about asynchronous I/O algorithms). This means that, while Anatella is busy reading data from the hard drive (i.e. copying 1 MB of data from the hard drive into main RAM), it “freezes” the whole Multithreaded Section that contains the ReadCSV Action. To avoid freezing the whole data-transformation graph, it’s a good idea to isolate the ReadCSV Action in a separate Section (i.e. by using a Multithread Action).

The same remark applies to all the other “Input Actions” that are based on a simple synchronous (i.e. blocking) I/O algorithm: the SASReader Action, the ODBCReader Action, etc. For example, you’ll often have:


[Figure: an Input Action isolated in its own Section by a Multithread Action]

You should not use the above combination to increase the running speed of Actions that have asynchronous (i.e. non-blocking) I/O algorithms (see section 5.2.6.2 about asynchronous I/O algorithms). These Actions include the GelReader Action, the ColumnarGelFileReader Action, the readStat Action and the TcpIPReceiveTable Action.

 

Conceptually, the logic is the same with both solutions (a Multithread Action, or an asynchronous I/O algorithm): use an additional thread so that the rest of the data-transformation graph can run while the data are extracted from the hard drive. The difference comes from the FIFO buffer located inside the Multithread Action. This FIFO buffer implies a (slow) deep copy of all the rows that go through the Multithread Action. An asynchronous I/O algorithm does not need this deep copy at all, so it is more efficient.
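
The Multithread-Action solution can be modelled in Python. This is a minimal sketch, assuming the Multithread Action behaves like a bounded FIFO queue that deep-copies each row from one thread to another; `blocking_source` and `isolated_section` are hypothetical names, not Anatella APIs, and `time.sleep` stands in for a blocking disk read.

```python
import copy
import queue
import threading
import time

_DONE = object()  # end-of-stream marker

def blocking_source(n_rows, read_delay):
    """Hypothetical synchronous input: each row costs `read_delay` seconds
    of blocking I/O, simulated with sleep."""
    for i in range(n_rows):
        time.sleep(read_delay)  # this thread is frozen here, like during a disk read
        yield {"id": i, "value": i * 10}

def isolated_section(source, fifo_size=64):
    """Run `source` in its own thread (its own "Section") and hand its rows
    to the caller through a bounded FIFO, loosely modelling a Multithread Action."""
    fifo = queue.Queue(maxsize=fifo_size)

    def producer():
        for row in source:
            # Deep copy into the FIFO: the extra cost an asynchronous reader avoids.
            fifo.put(copy.deepcopy(row))
        fifo.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        row = fifo.get()
        if row is _DONE:
            return
        yield row

# The downstream "Section" consumes rows while the reader thread blocks on I/O.
total = sum(row["value"] for row in isolated_section(blocking_source(100, 0.001)))
print(total)  # 49500
```

Removing the `copy.deepcopy` call (handing rows over directly) corresponds to the cheaper asynchronous approach.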

 

When the ReadCSV Action is isolated in its own Section in this way, a “freeze” (while it waits for the hard drive) only blocks its own Multithreaded Section: the rest of the transformation graph (i.e. the other Sections) can continue to run without interruption, using the rows stored in the FIFO row-buffer of the adjacent Multithread Action. Of course, if the freeze lasts too long (i.e. if the hard drive or the database is very slow), then the FIFO row-buffer of the Multithread Action empties out and, once again, the whole data processing stops (in technical terms, this is sometimes called a “Pipeline Stall”). This happens very often with the ODBCReader Action because database systems are usually very slow compared to Anatella graphs: more precisely, databases usually have difficulty delivering rows at the high speed that Anatella requires for optimal execution. In this common situation, one way to reach a high processing speed is to run different SQL extractions “in parallel”, in different Multithreaded Sections.
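
A Pipeline Stall can be observed in a small Python model, assuming two Sections joined by a bounded FIFO (`queue.Queue`). `run_pipeline` is a hypothetical helper that counts how often the downstream Section finds the FIFO empty and has to wait; the delays are simulated with `time.sleep`.

```python
import queue
import threading
import time

def run_pipeline(n_rows, produce_delay, consume_delay, fifo_size=8):
    """Hypothetical model of two Sections joined by a bounded FIFO.
    Returns (rows_processed, stalls); a stall is counted each time the
    downstream Section finds the FIFO empty and must wait for the producer."""
    fifo = queue.Queue(maxsize=fifo_size)
    done = object()

    def producer():
        for i in range(n_rows):
            time.sleep(produce_delay)  # slow source, e.g. a database
            fifo.put(i)
        fifo.put(done)

    threading.Thread(target=producer, daemon=True).start()
    processed = stalls = 0
    while True:
        try:
            item = fifo.get_nowait()
        except queue.Empty:
            stalls += 1                # FIFO ran dry: pipeline stall
            item = fifo.get()          # block until the producer catches up
        if item is done:
            return processed, stalls
        time.sleep(consume_delay)      # downstream work on the row
        processed += 1

print(run_pipeline(50, produce_delay=0.004, consume_delay=0.0))  # slow source: many stalls
print(run_pipeline(50, produce_delay=0.0, consume_delay=0.004))  # fast source: few or none
```

The exact stall counts depend on thread scheduling; the point is the contrast between a source slower than the graph and a source faster than it.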

 

For example, this is not very efficient (even though it includes several Multithread Actions):


[Figure: a graph that runs 3 database extractions one after the other]

 

The above graph runs the 3 database extractions one after the other, and the processing speed will most likely be quite slow because of “Pipeline Stalls”. The following graph (which runs the 3 database extractions “in parallel”) is a better solution:


[Figure: a graph that runs the 3 database extractions “in parallel”]
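
The benefit of running the extractions “in parallel” can be approximated in Python, assuming each extraction spends most of its time waiting on the database rather than using the CPU. `extract` and the table names are hypothetical; `time.sleep` simulates the wait (it releases the GIL, so the threads really wait concurrently).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def extract(table, latency=0.2):
    """Hypothetical stand-in for one SQL extraction dominated by database latency."""
    time.sleep(latency)
    return [f"{table}-row{i}" for i in range(3)]

tables = ["customers", "orders", "products"]

start = time.perf_counter()
sequential = [extract(t) for t in tables]       # one extraction after the other
t_seq = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(tables)) as pool:
    parallel = list(pool.map(extract, tables))  # all extractions at the same time
t_par = time.perf_counter() - start

print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

`pool.map` returns results in input order, so both runs produce the same rows; only the elapsed time differs.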

 

You can think of the Multithread Action with several input pins as the “multithreaded” equivalent of the standard GlobalRunFlag. The GlobalRunFlag executes the graph sequentially:
First, the GlobalRunFlag runs all the Actions connected to pin 0.

Once it’s finished, the GlobalRunFlag runs all the Actions connected to pin 1.

Once it’s finished, the GlobalRunFlag runs all the Actions connected to pin 2.

Once it’s finished, the GlobalRunFlag runs all the Actions connected to pin 3, etc.

In contrast, the Multithread Action executes the graph in parallel: as soon as you run the graph (e.g. as soon as you press F5), all the Actions connected to the Multithread Action start running at the same time.
You can still have some control over the order in which the Actions are executed by using the “Synchronization” option of the Multithread Action.
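
The difference in execution order can be shown with a small Python model. `run_sequential` mimics the GlobalRunFlag behaviour described above and `run_parallel` mimics a Multithread Action with several input pins; both functions and `make_branch` are hypothetical, and the “Synchronization” option is not modelled.

```python
import threading
import time

def make_branch(pin, log, lock, duration=0.05):
    """Hypothetical branch of the graph connected to input pin `pin`;
    it logs when it starts and when it ends."""
    def branch():
        with lock:
            log.append(("start", pin))
        time.sleep(duration)  # the branch's work
        with lock:
            log.append(("end", pin))
    return branch

def run_sequential(branches):
    # GlobalRunFlag-like: pin 0 runs to completion, then pin 1, then pin 2, ...
    for branch in branches:
        branch()

def run_parallel(branches):
    # Multithread-like: every branch starts at once, each in its own thread.
    threads = [threading.Thread(target=b) for b in branches]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

for runner in (run_sequential, run_parallel):
    log, lock = [], threading.Lock()
    runner([make_branch(pin, log, lock) for pin in range(3)])
    print(runner.__name__, log)
```

In the sequential log every “start” is immediately followed by its “end”; in the parallel log all three branches start before any of them ends.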