In the previous section we could extract from the database only the rows that have changed by simply executing the SQL command:
SELECT * FROM TableT WHERE LAST_UPDATE_TIME > <LastRunTime>
Since the column “LAST_UPDATE_TIME” is not available anymore, we’ll have to extract every day the *whole* table from the database and resort to other ways to detect the rows that have been updated. One efficient way to construct this “detection” mechanism is to use the MD5 Action (see section 5.18.5. for more information about the MD5 Action). To summarize:
•the MD5 action compute a checksum that summarizes the content of the row.
•If the MD5 checksum on a row has “changed” between two extractions, it means that somewhere inside that row there is one cell that has changed: i.e. We can use the MD5 checksum as a detection mechanism to detect the rows that have been updated.
Here is the initial data extraction graph. You only run this graph one time to create the initial version of “HistoricalData.gel_anatella”:
On the next page, you’ll find the “incremental” extraction procedure that we run everyday. This is a quite “straight forward” Anatella graph.
Inside the above graph, we gave a small ID number to each Action. Here is a small summary of the objective of each Actions:
•Action 1: This actually extract from the database all the rows.
•Actions 2,3,13,14: This filter only keeps the “new” rows (that are not inside the “HistoricalData.gel_anatella” file).
•Action 4: Stop the process if there are no new rows.
•Actions 5,6,7,15: This merges the new rows (that we just extracted) with the “old” rows (that were extracted a long time ago and that are stored inside “HistoricalData.gel_anatella”). In particular, the new rows (coming from the data base) are missing the 3 columns named “RECORD_VALIDITY_FROM”, “RECORD_VALIDITY_TO” and “RECORD_CURRENT_FLAG”, so, (using the Action 5) we add them.
•Action 8,9: On some rows, we must update the column “RECORD_VALIDITY_TO”. The update expression is only working because the table is sorted in a very specific way (see the Action 7 to know the exact Sort Order).
•Action 10: Update the column “RECORD_CURRENT_FLAG”.
•Actions 11,12,16,17: self-explaining.