5.2.1.1. Reading corrupted CSV/Text files.

 

When transferring large .csv/.txt files (e.g. 100 GB text files) over an unreliable computer network, some bytes of the file may not arrive correctly at the destination. In that situation, you end up with a text file with a “hole” in it. The “hole” is typically a run of the ASCII character number zero (the NUL character, sometimes also referred to as “Unicode code point zero”) and is usually 1 or 2 MB long. The same kind of “hole” can also appear when text files are stored on poor-quality USB keys. When you read such a damaged/corrupted file, you have four options:
 

Abort the data-transformation graph if the ASCII character number zero is found inside the text file (this is the default option).
 

Skip all the rows that contain the ASCII character number zero: this makes it easy to “jump” over the large “holes” inside the text file and keep only the correct rows. This option is very useful for reading text files corrupted by a bad network transfer or a bad USB key.
 

Remove every ASCII character number zero from the file and proceed as usual (this option does not discard any row or column). This typically allows handling of erroneous text files produced by SAP.
 

Replace each ASCII character number zero with a valid character (e.g. the “space” character).
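
To make the difference between the four options concrete, here is a minimal Python sketch of the same four strategies applied to rows read in binary mode. It is only an illustration and not how the CSV file reader is implemented: the function name `handle_nul_rows`, the mode names (`abort`, `skip`, `remove`, `replace`) and the `replacement` parameter are all hypothetical.

```python
from typing import Iterable, Iterator

NUL = b"\x00"  # the ASCII character number zero


def handle_nul_rows(rows: Iterable[bytes],
                    mode: str = "abort",
                    replacement: bytes = b" ") -> Iterator[bytes]:
    """Yield rows after applying one strategy to rows containing NUL bytes.

    Hypothetical mode names, mirroring the four options above:
      "abort"   - stop with an error at the first row containing NUL
      "skip"    - drop every row containing NUL
      "remove"  - delete the NUL bytes and keep the row
      "replace" - substitute each NUL byte with `replacement`
    """
    if mode not in ("abort", "skip", "remove", "replace"):
        raise ValueError(f"unknown mode: {mode!r}")
    for row_number, row in enumerate(rows, start=1):
        if NUL not in row:
            yield row  # clean row: always kept unchanged
        elif mode == "abort":
            raise ValueError(f"NUL byte found in row {row_number}")
        elif mode == "skip":
            continue  # jump over the damaged row
        elif mode == "remove":
            yield row.replace(NUL, b"")
        else:  # mode == "replace"
            yield row.replace(NUL, replacement)


if __name__ == "__main__":
    # Small in-memory example: the second row contains a two-byte "hole".
    sample = [b"a;b\n", b"c;\x00\x00d\n", b"e;f\n"]
    for mode in ("skip", "remove", "replace"):
        print(mode, list(handle_nul_rows(sample, mode)))
```

With a real file, the rows can come straight from a file object opened in binary mode, e.g. `with open(path, "rb") as f: rows = list(handle_nul_rows(f, "skip"))`. A large hole usually contains no line break, so in this sketch it merges with the partial rows just before and after it into one long row, and the "skip" strategy discards that whole row.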