To guarantee that no rows are ever duplicated or lost, we will use an intermediary table.
To add a few rows (available inside a file named <source.gel_anatella>) inside the “final” Table T, we will run the following steps:
1.(optional) CREATE an intermediary table (that has no INDEX)
2.Empty the intermediary table (i.e. run a “TRUNCATE TABLE” SQL command)
3.INSERT all the new rows from <source.gel_anatella> inside the intermediary table (See the previous section for more details about this INSERT procedure).
4.Delete the (local) file <source.gel_anatella>
5.Run an “INSERT INTO <Table T> SELECT * FROM <Intermediary Table>” SQL command (to copy the rows from the intermediary table to the final table T).
6.Empty the intermediary table (i.e. run a “TRUNCATE TABLE” SQL command)
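As an illustration, the 6 steps above can be sketched with a small script. This is only a minimal sketch using SQLite as a stand-in database: the table names (“t” and “staging”), the column layout, and the “rows” argument (standing in for the content of <source.gel_anatella>) are all assumptions, and SQLite uses “DELETE FROM” where most other databases would use “TRUNCATE TABLE”.

```python
import os
import sqlite3

def upload_rows(conn, source_file, rows):
    """Sketch of the 6-step upload via an intermediary (staging) table.
    'rows' stands in for the content of <source.gel_anatella>."""
    cur = conn.cursor()
    # Step 1 (optional): create the intermediary table, without any INDEX.
    cur.execute("CREATE TABLE IF NOT EXISTS staging (id INTEGER, value TEXT)")
    # Step 2: empty the intermediary table (SQLite's equivalent of TRUNCATE TABLE).
    cur.execute("DELETE FROM staging")
    # Step 3: INSERT all the new rows inside the intermediary table.
    cur.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    conn.commit()
    # Step 4: delete the (local) source file.
    if os.path.exists(source_file):
        os.remove(source_file)
    # Step 5: copy the rows from the intermediary table to the final table T.
    cur.execute("INSERT INTO t SELECT * FROM staging")
    # Step 6: empty the intermediary table again.
    cur.execute("DELETE FROM staging")
    conn.commit()
```

If the machine crashes anywhere except during step 4, re-running `upload_rows` from scratch produces the correct final content of table “t”, because the source file still exists and the staging table is truncated before being refilled.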
With this 6-step procedure (via an intermediary table), you are 99% sure to have no "duplicated rows", even in the event of a machine crash (or power off): if the machine crashes, just restart the "complete" Anatella process from scratch. The only unsafe window is step 4: if the PC crashes during step 4, some rows will be "lost". This is very unlikely, however, because step 4 (deleting a small local file) is very short.
We could also move step 4 after step 5 but then, in case of a machine crash during step 5, there would be some “duplicated” rows. This is usually a worse trade-off because step 5 is very long, so a crash during step 5 is much more likely than a crash during step 4.
To get a 100% safe procedure, we can use this slightly more complex procedure:
1. Once per day (or each time we get a new <source.gel_anatella> to upload inside the database): save the result of a “SELECT COUNT(*) FROM <Table T>” query inside the local Anatella variable “Count_Table_T” (more precisely: store the variable “Count_Table_T” inside a text file).
2. Compare the result of a new “SELECT COUNT(*) FROM <Table T>” query to the local Anatella variable “Count_Table_T” (read back from the text file). If these 2 numbers are equal, run the upload procedure (i.e. run the steps 2.1. to 2.3.). If they differ, the rows were already copied into Table T during a previous (interrupted) run, so skip directly to step 3:
2.1. Empty the intermediary table (i.e. run a “TRUNCATE TABLE” SQL command)
2.2. INSERT all the new rows from <source.gel_anatella> inside the intermediary table (See the section 10.10.2 for more details about this INSERT procedure).
2.3. Run an “INSERT INTO <Table T> SELECT * FROM <Intermediary Table>” SQL command (to copy the rows from the intermediary table to the final table T).
3. Delete the (local) file <source.gel_anatella>
4. Empty the intermediary table (i.e. run a “TRUNCATE TABLE” SQL command)
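The 100%-safe procedure above can be sketched in the same way. Again, this is a minimal sketch, not Anatella’s actual implementation: the SQLite database, the table names (“t”, “staging”) and the text file used to store “Count_Table_T” are assumptions. The key point is that a restart after a crash detects the count mismatch and skips the copy, so no rows are ever duplicated.

```python
import sqlite3

def save_count(conn, count_path):
    # Step 1: store the result of "SELECT COUNT(*) FROM t" inside a text file.
    (n,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
    with open(count_path, "w") as f:
        f.write(str(n))

def safe_upload(conn, rows, count_path):
    # Step 2: compare the current count with the stored "Count_Table_T".
    with open(count_path) as f:
        stored = int(f.read())
    (current,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
    if current == stored:
        # Steps 2.1 to 2.3: truncate staging, load the new rows, copy into T.
        conn.execute("DELETE FROM staging")
        conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
        conn.execute("INSERT INTO t SELECT * FROM staging")
    # If current != stored, the copy already succeeded on a previous run:
    # the rows are already inside T, so we skip directly to the cleanup steps.
    # Step 3 (deleting <source.gel_anatella>) would happen here.
    # Step 4: empty the intermediary table.
    conn.execute("DELETE FROM staging")
    conn.commit()
```

Running `safe_upload` a second time (to simulate a restart after a crash at the very end of the procedure) leaves the row count of table “t” unchanged: the stored count no longer matches, so the copy is skipped.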
The above procedure is 100% safe: it guarantees that there will never be any lost or duplicated rows, even in the case of a computer crash (or power off). If a computer crash happens, just restart the whole procedure (excluding step 1).
To make things easier, you can save this whole procedure inside one “parametrized” Anatella graph (actually, you’ll need 2 or 3 graphs, not one). In this way, you only need to call your graph (using the ParallelRun Action) to safely upload your rows to the database. Your “parametrized” Anatella graph will use the 2 global parameters “<source_gel_file>” and “<destination_table>”: see the section 5.1.5. for more information about “Graph Global parameters”.
The use of an intermediary table when making large quantities of "INSERT"s is very common for other reasons, too: when the final table T has an INDEX, it’s usually much better to “go through” an Intermediary Table (to avoid losing time updating the INDEX). You’ll get more details inside the section dedicated to “Uploading rows inside a table with an INDEX”.
Because of the distributed nature of the Teradata database, updating the INDEX structure is very slow. This means that, 99% of the time, you must use an Intermediary Table (see the section about “Uploading rows inside a table with an INDEX” for more details about this subject). Furthermore, to guarantee that no rows are ever duplicated or lost, you must also use an Intermediary Table (as explained in this section). So, when using Teradata, we’ll use an Intermediary Table to upload new rows inside the database 99% of the time.
This is why, when we created the TeradataWriter Action, we decided to use the “FastLoad” tool as the back-end (i.e. the tool working "behind the scenes" inside the TeradataWriter Action). At first sight, the “FastLoad” tool might seem limited because it doesn’t allow you to add rows to a table that has an INDEX (meaning that we are forced to use an Intermediary Table to add rows to such a table). …But since we’ll be using an Intermediary Table 99% of the time *anyway* (for the different reasons explained above), the “FastLoad” tool becomes the most efficient choice in 99% of the situations.