5.2.12.3. Character Encoding in EDI/X12 files

The default character encoding used to read X12 files is “CP1252”. The “CP1252” character set contains every printable character of the standard “ISO-8859-1” character set. In addition, it assigns printable characters to most of the byte values 0x80 to 0x9F, which ISO-8859-1 leaves to control codes. One of these characters is the very useful Euro symbol (€). If the X12 file starts with a BOM (Byte-Order-Mark), Anatella uses the character encoding indicated by the BOM (i.e. UTF-8, UTF-16 or UTF-32). If the X12 file does not contain a BOM, Anatella is still able to detect UTF-16 and UTF-32 files and open them accordingly.
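
The detection order described above can be sketched in Python. This is a minimal, hypothetical illustration, not Anatella's implementation: the function names `sniff_encoding` and `decode_x12` and their return conventions are assumptions. The BOM-less UTF-16/UTF-32 test is also an assumed heuristic. It looks at where the zero bytes fall in the first four bytes, which works because X12 content is mostly ASCII. Anatella's actual detection logic may differ.

```python
import codecs

# Hypothetical sketch of the X12 encoding rules, not Anatella's code.
# BOMs in the order they must be tested: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE), so UTF-32 is checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes, default: str = "cp1252") -> tuple:
    """Return (codec name, number of BOM bytes to skip) for an X12 file."""
    # 1. A BOM, if present, decides the encoding.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)

    # 2. No BOM: assumed heuristic based on the position of zero bytes.
    #    An ASCII character has three zero bytes in UTF-32 and one in UTF-16.
    head = data[:4]
    if len(head) == 4:
        if head[0] != 0 and head[1:] == b"\x00\x00\x00":
            return "utf-32-le", 0
        if head[:3] == b"\x00\x00\x00" and head[3] != 0:
            return "utf-32-be", 0
    if len(head) >= 2:
        if head[0] != 0 and head[1] == 0:
            return "utf-16-le", 0
        if head[0] == 0 and head[1] != 0:
            return "utf-16-be", 0

    # 3. Otherwise fall back to the default single-byte encoding (CP1252).
    return default, 0


def decode_x12(data: bytes) -> str:
    """Decode raw X12 bytes using the sniffed encoding, dropping any BOM."""
    encoding, bom_length = sniff_encoding(data)
    return data[bom_length:].decode(encoding)
```

The CP1252 fallback is what makes the Euro symbol readable. For example, `b"\x80".decode("cp1252")` yields "€", whereas decoding the same byte as ISO-8859-1 yields the control character U+0080.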

 

For EDI files, the same rules as for X12 files apply (see the previous paragraph). If the file has no BOM and is neither UTF-16 nor UTF-32, Anatella looks at the first “Data Element” of the “UNB” Segment (the syntax identifier, e.g. “UNOC”) to choose the character encoding:

First “Data Element” of the “UNB” Segment   Character encoding used by Anatella to read the EDI file
UNOA, UNOB, UNOC                            CP1252 (Latin1 - Western Europe and Americas)
UNOD                                        ISO-8859-2 (Latin2 - Slavic and Central European languages)
UNOE                                        ISO-8859-5 (Latin/Cyrillic)
UNOF                                        ISO-8859-7 (Latin/Greek)
UNOG                                        ISO-8859-3 (Latin3 - Esperanto, Galician, Maltese and Turkish)
UNOH                                        ISO-8859-4 (Latin4 - Scandinavia/Baltic)
UNOI                                        ISO-8859-6 (Latin/Arabic)
UNOJ                                        ISO-8859-8 (Latin/Hebrew)
UNOK                                        ISO-8859-9 (Latin5 - same as Latin1 except for the Turkish characters)
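
The table lookup above can be sketched as follows. This is a hypothetical illustration that covers only the last step, which is applied after the BOM and UTF-16/UTF-32 checks. The names `UNB_SYNTAX_TO_CODEC`, `unb_syntax_identifier` and `edifact_encoding` are assumptions, and the dictionary values are Python codec names for the encodings listed in the table. The optional “UNA” service string advice is only used to pick up non-default separators. Falling back to CP1252 for a missing or unlisted identifier is also an assumption, since the table does not say what happens in that case.

```python
from typing import Optional

# Hypothetical sketch, not Anatella's code.
# Python codec names for the encodings listed in the table above.
UNB_SYNTAX_TO_CODEC = {
    "UNOA": "cp1252",
    "UNOB": "cp1252",
    "UNOC": "cp1252",
    "UNOD": "iso-8859-2",
    "UNOE": "iso-8859-5",
    "UNOF": "iso-8859-7",
    "UNOG": "iso-8859-3",
    "UNOH": "iso-8859-4",
    "UNOI": "iso-8859-6",
    "UNOJ": "iso-8859-8",
    "UNOK": "iso-8859-9",
}


def unb_syntax_identifier(data: bytes) -> Optional[str]:
    """Return the first data element of the UNB segment (e.g. "UNOC"), or None."""
    # EDIFACT default separators: ':' between components, '+' between data elements.
    component_sep, element_sep = b":", b"+"
    # An optional leading UNA segment ("UNA" + 6 service characters) redefines them.
    if data.startswith(b"UNA") and len(data) >= 9:
        component_sep, element_sep = data[3:4], data[4:5]
    start = data.find(b"UNB" + element_sep)
    if start < 0:
        return None
    # The identifier starts after "UNB" plus the one-byte data element separator
    # and ends at the next component (or data element) separator.
    begin = end = start + 4
    while end < len(data) and data[end:end + 1] not in (component_sep, element_sep):
        end += 1
    return data[begin:end].decode("ascii", errors="replace")


def edifact_encoding(data: bytes, default: str = "cp1252") -> str:
    """Pick the codec from the UNB syntax identifier (assumed CP1252 fallback)."""
    identifier = unb_syntax_identifier(data)
    if identifier is None:
        return default
    return UNB_SYNTAX_TO_CODEC.get(identifier, default)
```

For example, an interchange starting with `UNB+UNOE:3+...` would be decoded as ISO-8859-5. Scanning the raw bytes before decoding is safe here because the UNA/UNB tags and the default service characters are plain ASCII in all of these single-byte encodings.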