5.9.2. Phonetic Encoding (High-Speed action)



Icon: ANATEL~3_img717

Function: PhoneticEncoding

Property window:




Short description:


State-of-the-art phonetic encoder (Metaphone 3).


Long Description:


Unlike the CorrectSpelling Action, the PhoneticEncoder Action is typically not used to "clean" the database. Instead, it allows you to "bypass" data quality issues when performing operations such as table joins. This phonetic encoder is also used inside Google Refine (the data quality tool from Google).


The typical usage of a phonetic encoder is the following: you have two tables that contain different pieces of information about the same individuals, and you want to create one unified view of these two tables (i.e., you want to join the two tables into one). The only way to match a row of the first table with a row of the second table is the individual's name (i.e., the names act as the join keys), but these names contain some misspellings. To bypass these spelling errors, you can join the two tables on the phonetic encoding of the names rather than on the plain names.


For this, you need a phonetic encoder that understands how names written in English, Dutch, French, German or Spanish are pronounced. The Metaphone 3 encoder (i.e., the phonetic encoder used inside Anatella) is one of the very few phonetic encoders (possibly the only one) that "understands" such a wide variety of languages automatically; other encoders are limited to a single language. It is thus an ideal encoder for such a task.
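
The join-on-phonetic-key idea described above can be sketched as follows. This is an illustration only, not Anatella's implementation: Metaphone 3 is a much larger rule set that is not reproduced here, so the sketch swaps in the far simpler American Soundex algorithm. Soundex is English-oriented and always keeps the first letter, so (unlike a Metaphone-family encoder) it would not match, for example, "Catherine" with "Katherine". The function names `soundex`, `phonetic_key` and `phonetic_join`, as well as the sample rows, are hypothetical and are not part of Anatella.

```python
# Hypothetical sketch: join two tables on phonetic codes of the names.
# ASSUMPTION: American Soundex stands in for Metaphone 3 (used by Anatella).

# Soundex digit for each coded consonant; vowels, y, h and w have no code.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code (ASCII letters only)."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")    # its code still blocks a duplicate
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:             # collapse runs of the same code
                result += code
            prev = code
        elif ch not in "hw":
            prev = ""                    # vowels and y separate equal codes
        # 'h' and 'w' are skipped without separating equal codes
        if len(result) == 4:
            break
    return result.ljust(4, "0")          # pad short codes with zeros


def phonetic_key(full_name: str) -> tuple:
    """Encode each word of the name separately (first name, surname, ...)."""
    return tuple(soundex(token) for token in full_name.split())


def phonetic_join(left, right):
    """Hash join: rows are tuples whose first field is the person's name."""
    index = {}
    for row in left:
        index.setdefault(phonetic_key(row[0]), []).append(row)
    joined = []
    for r_row in right:
        for l_row in index.get(phonetic_key(r_row[0]), []):
            joined.append(l_row + r_row[1:])
    return joined


# Hypothetical sample tables: (name, city) and (misspelled name, amount).
people = [("John Smith", "Brussels"), ("Anna Meyer", "Antwerp"),
          ("Pierre Lefevre", "Ghent")]
orders = [("Jon Smyth", 120.0), ("Ana Meier", 75.5),
          ("Pierre Lefebvre", 42.0), ("Paul Martin", 10.0)]

if __name__ == "__main__":
    exact = [p + o[1:] for p in people for o in orders if p[0] == o[0]]
    print("exact join:", exact)                           # no row matches
    print("phonetic join:", phonetic_join(people, orders))
```

Joining on the plain names matches nothing, while the phonetic join links "Jon Smyth", "Ana Meier" and "Pierre Lefebvre" to their counterparts and leaves "Paul Martin" unmatched. Keep in mind that a phonetic key can also map two genuinely different names to the same code, so phonetic matches may deserve a review before being trusted.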