5.9.2. Phonetic Encoding (High-Speed action)
Icon:
Function: PhoneticEncoding
Property window:
Short description:
State-of-the-art phonetic encoder.
Long Description:
Unlike the CorrectSpelling Action, the PhoneticEncoder Action is typically not used to "clean" the database. Instead, it lets you "bypass" data quality issues when performing (for example) table joins. This Phonetic Encoder is also used, for example, inside Google Refine (the data quality tool from Google).
The typical usage of a phonetic encoder is the following: You have 2 tables that contain different pieces of information about the same individuals. You want to create one unified view of these 2 tables (i.e. you want to join the 2 tables into one). You must use the individuals' names to join 2 rows (one from each table) together (i.e. the primary keys are the individuals' names), but these names contain some misspellings. To bypass these spelling errors, you can join the 2 tables using the phonetic encoding of the names (rather than simply using the plain names). You thus need a phonetic encoder that understands how names written in English, Dutch, French, German or Spanish are pronounced. The Metaphone 3 encoder (i.e. the Phonetic Encoder used inside Anatella) is one of the very few phonetic encoders (maybe the only one) that is able to "understand" a very wide variety of languages automatically (other encoders are limited to one language only). It is thus the perfect encoder for such a task…
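The join described above can be sketched outside Anatella in a few lines of Python. This is only an illustration of the principle, under stated assumptions: Metaphone 3 is not part of the Python standard library, so the sketch swaps in a simplified American Soundex encoder (an English-oriented algorithm, much less capable than Metaphone 3). The function names `soundex` and `phonetic_join` and the two small tables are hypothetical and are not part of Anatella.

```python
# Stand-in encoder: simplified American Soundex (NOT Metaphone 3).
# Consonants that sound alike share the same digit.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character phonetic key (first letter + 3 digits)."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # 'h' and 'w' do not separate identical codes
        code = CODES.get(ch, "")  # vowels (and 'y') have no code
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (key + "000")[:4]  # pad with zeros, keep 4 characters


def phonetic_join(left, right, encode=soundex):
    """Inner-join two lists of tuples on the phonetic key of column 0."""
    index = {}
    for row in left:
        index.setdefault(encode(row[0]), []).append(row)
    joined = []
    for row in right:
        for match in index.get(encode(row[0]), []):
            joined.append(match + row)
    return joined


# Hypothetical tables: the same individuals, with misspelled names.
customers = [("Smith", "Brussels"), ("Jansen", "Antwerp"), ("Dupont", "Liege")]
orders = [("Smyth", 120), ("Janssen", 75), ("Dupond", 40), ("Garcia", 10)]

for row in phonetic_join(customers, orders):
    print(row)
```

In this sketch, "Smith" and "Smyth" (as well as "Jansen"/"Janssen" and "Dupont"/"Dupond") receive the same key and are therefore joined, while "Garcia" has no counterpart and is dropped. A join on the plain names would have matched none of these rows. Note that a phonetic key can also group names that belong to different individuals, so the joined rows may need a further check.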