5.9.3. Detect Language (High-Speed action)




 


 
Function: DetectLang
 

Property window:

 

[Image: property window of the DetectLang action]

 

Short description:

 

Detects the language of a text field.
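Assuming the action reads one text column and appends a column holding the detected language code, its effect on a table can be sketched as follows. This is a minimal Python sketch under stated assumptions: the table is a list of dicts, the column names `text` and `lang` and both helper names are hypothetical, and `script_detect` is a stand-in, not CLD2. It only recognises a few scripts that map to a single language and otherwise returns `un`, the UNKNOWN_LANGUAGE code from the supported-language list.

```python
import unicodedata

# Stand-in detector (NOT CLD2): these scripts each map to a single language,
# so the script of the letters alone decides the result. Codes are taken
# from the supported-language list; anything else yields "un" (unknown).
SCRIPT_TO_CODE = {"GREEK": "el", "THAI": "th", "ARMENIAN": "hy", "GEORGIAN": "ka"}


def script_detect(text: str) -> str:
    counts: dict[str, int] = {}
    for ch in text:
        if not ch.isalpha():
            continue
        # Unicode names start with the script, e.g. "GREEK SMALL LETTER ALPHA".
        script = unicodedata.name(ch, "").split(" ")[0]
        if script in SCRIPT_TO_CODE:
            counts[script] = counts.get(script, 0) + 1
    if not counts:
        return "un"
    # Majority vote over the recognised letters.
    return SCRIPT_TO_CODE[max(counts, key=lambda s: counts[s])]


def add_language_column(rows, text_col="text", lang_col="lang", detect=script_detect):
    """Hypothetical row-wise wrapper: copy each row and append the detected code."""
    out = []
    for row in rows:
        new_row = dict(row)  # leave the input rows unchanged
        new_row[lang_col] = detect(row.get(text_col) or "")
        out.append(new_row)
    return out


rows = [{"text": "Καλημέρα"}, {"text": "გამარჯობა"}, {"text": "hello"}]
for r in add_language_column(rows):
    print(r["text"], r["lang"])
```

Keeping the detector as a parameter separates the per-row plumbing from the detection itself, so a binding to the real engine could be passed in without changing the wrapper.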

 

Long description:

 

The language of a text field is detected with CLD2 (Compact Language Detector 2), the same engine that Google Chrome uses to detect the language of a web page.
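According to the CLD2 project's own description, the engine is a naive Bayes classifier that, for most scripts, scores sequences of four letters (quadgrams), while scripts such as Greek or Thai that map to a single language decide the result directly. The toy Python sketch below illustrates only the quadgram-scoring idea. The training sentences, the add-one smoothing and the function names are assumptions for illustration; the real engine relies on large precomputed tables and many refinements, and this is not its code.

```python
import math
from collections import Counter


def quadgrams(text: str) -> list[str]:
    # Lower-case, collapse whitespace and pad with spaces so that word
    # boundaries are part of the grams: "the" -> " the", "the ".
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 4] for i in range(len(padded) - 3)]


def train(samples: dict[str, str]) -> dict[str, Counter]:
    # One quadgram frequency profile per language code.
    return {lang: Counter(quadgrams(text)) for lang, text in samples.items()}


def detect(text: str, profiles: dict[str, Counter]) -> str:
    grams = quadgrams(text)
    if not grams:
        return "un"  # too short to score
    vocab = set().union(*profiles.values())
    v = len(vocab) + 1  # +1 leaves room for quadgrams never seen in training
    best_lang, best_score = "un", float("-inf")
    for lang, counts in profiles.items():
        total = sum(counts.values())
        # Naive Bayes with a uniform prior: add the log-probabilities of all
        # quadgrams, with add-one smoothing so an unseen gram is not fatal.
        score = sum(math.log((counts[g] + 1) / (total + v)) for g in grams)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Toy training text (an assumption for illustration; CLD2 ships large
# precomputed tables instead).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
    "fr": "le renard brun saute par dessus le chien paresseux et le chat dort dans la maison",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze schläft im haus",
}
profiles = train(SAMPLES)
print(detect("the dog sleeps in the house", profiles))  # en
```

Working in log space keeps the product of many small probabilities from underflowing, and smoothing matters because any single unseen quadgram would otherwise drive a language's score to zero.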

 

A comparative study of language-identification systems concluded that “...across the full set of 65 languages, CLD2 is the single best-performing system”. The study is “Accurate Language Identification of Twitter Messages” by Marco Lui and Timothy Baldwin (NICTA VRL, Department of Computing and Information Systems, University of Melbourne, VIC 3010, Australia).

 

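The supported languages are (name followed by its code in parentheses; most codes are ISO 639 codes, but a few follow older conventions, for example iw rather than he for Hebrew):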

 

ENGLISH(en); DANISH(da); DUTCH(nl); FINNISH(fi); FRENCH(fr); GERMAN(de); HEBREW(iw); ITALIAN(it); JAPANESE(ja); KOREAN(ko); NORWEGIAN(no); POLISH(pl); PORTUGUESE(pt); RUSSIAN(ru); SPANISH(es); SWEDISH(sv); CHINESE(zh); CZECH(cs); GREEK(el); ICELANDIC(is); LATVIAN(lv); LITHUANIAN(lt); ROMANIAN(ro); HUNGARIAN(hu); ESTONIAN(et); TG_UNKNOWN_LANGUAGE(xxx); UNKNOWN_LANGUAGE(un); BULGARIAN(bg); CROATIAN(hr); SERBIAN(sr); IRISH(ga); GALICIAN(gl); TAGALOG(tl); TURKISH(tr); UKRAINIAN(uk); HINDI(hi); MACEDONIAN(mk); BENGALI(bn); INDONESIAN(id); LATIN(la); MALAY(ms); MALAYALAM(ml); WELSH(cy); NEPALI(ne); TELUGU(te); ALBANIAN(sq); TAMIL(ta); BELARUSIAN(be); JAVANESE(jw); OCCITAN(oc); URDU(ur); BIHARI(bh); GUJARATI(gu); THAI(th); ARABIC(ar); CATALAN(ca); ESPERANTO(eo); BASQUE(eu); INTERLINGUA(ia); KANNADA(kn); PUNJABI(pa); SCOTS_GAELIC(gd); SWAHILI(sw); SLOVENIAN(sl); MARATHI(mr); MALTESE(mt); VIETNAMESE(vi); FRISIAN(fy); SLOVAK(sk); CHINESE_T(zh-Hant); FAROESE(fo); SUNDANESE(su); UZBEK(uz); AMHARIC(am); AZERBAIJANI(az); GEORGIAN(ka); TIGRINYA(ti); PERSIAN(fa); BOSNIAN(bs); SINHALESE(si); NORWEGIAN_N(nn); XHOSA(xh); ZULU(zu); GUARANI(gn); SESOTHO(st); TURKMEN(tk); KYRGYZ(ky); BRETON(br); TWI(tw); YIDDISH(yi); SOMALI(so); UIGHUR(ug); KURDISH(ku); MONGOLIAN(mn); ARMENIAN(hy); LAOTHIAN(lo); SINDHI(sd); RHAETO_ROMANCE(rm); AFRIKAANS(af); LUXEMBOURGISH(lb); BURMESE(my); KHMER(km); TIBETAN(bo); DHIVEHI(dv); CHEROKEE(chr); SYRIAC(syr); LIMBU(lif); ORIYA(or); ASSAMESE(as); CORSICAN(co); INTERLINGUE(ie); KAZAKH(kk); LINGALA(ln); PASHTO(ps); QUECHUA(qu); SHONA(sn); TAJIK(tg); TATAR(tt); TONGA(to); YORUBA(yo); MAORI(mi); WOLOF(wo); ABKHAZIAN(ab); AFAR(aa); AYMARA(ay); BASHKIR(ba); BISLAMA(bi); DZONGKHA(dz); FIJIAN(fj); GREENLANDIC(kl); HAUSA(ha); HAITIAN_CREOLE(ht); INUPIAK(ik); INUKTITUT(iu); KASHMIRI(ks); KINYARWANDA(rw); MALAGASY(mg); NAURU(na); OROMO(om); RUNDI(rn); SAMOAN(sm); SANGO(sg); SANSKRIT(sa); SISWANT(ss); TSONGA(ts); TSWANA(tn); VOLAPUK(vo); ZHUANG(za); KHASI(kha); 
SCOTS(sco); GANDA(lg); MANX(gv); MONTENEGRIN(sr-ME); AKAN(ak); IGBO(ig); MAURITIAN_CREOLE(mfe); HAWAIIAN(haw); CEBUANO(ceb); EWE(ee); GA(gaa); HMONG(hmn); KRIO(kri); LOZI(loz); LUBA_LULUA(lua); LUO_KENYA_AND_TANZANIA(luo); NEWARI(new); NYANJA(ny); OSSETIAN(os); PAMPANGA(pam); PEDI(nso); RAJASTHANI(raj); SESELWA(crs); TUMBUKA(tum); VENDA(ve); WARAY_PHILIPPINES(war); NDEBELE(nr); X_BORK_BORK_BORK(zzb); X_PIG_LATIN(zzp); X_HACKER(zzh); X_KLINGON(tlh); X_ELMER_FUDD(zze); X_Common(xx-Zyyy); X_Latin(xx-Latn); X_Greek(xx-Grek); X_Cyrillic(xx-Cyrl); X_Armenian(xx-Armn); X_Hebrew(xx-Hebr); X_Arabic(xx-Arab); X_Syriac(xx-Syrc); X_Thaana(xx-Thaa); X_Devanagari(xx-Deva); X_Bengali(xx-Beng); X_Gurmukhi(xx-Guru); X_Gujarati(xx-Gujr); X_Oriya(xx-Orya); X_Tamil(xx-Taml); X_Telugu(xx-Telu); X_Kannada(xx-Knda); X_Malayalam(xx-Mlym); X_Sinhala(xx-Sinh); X_Thai(xx-Thai); X_Lao(xx-Laoo); X_Tibetan(xx-Tibt); X_Myanmar(xx-Mymr); X_Georgian(xx-Geor); X_Hangul(xx-Hang); X_Ethiopic(xx-Ethi); X_Cherokee(xx-Cher); X_Canadian_Aboriginal(xx-Cans); X_Ogham(xx-Ogam); X_Runic(xx-Runr); X_Khmer(xx-Khmr); X_Mongolian(xx-Mong); X_Hiragana(xx-Hira); X_Katakana(xx-Kana); X_Bopomofo(xx-Bopo); X_Han(xx-Hani); X_Yi(xx-Yiii); X_Old_Italic(xx-Ital); X_Gothic(xx-Goth); X_Deseret(xx-Dsrt); X_Inherited(xx-Qaai); X_Tagalog(xx-Tglg); X_Hanunoo(xx-Hano); X_Buhid(xx-Buhd); X_Tagbanwa(xx-Tagb); X_Limbu(xx-Limb); X_Tai_Le(xx-Tale); X_Linear_B(xx-Linb); X_Ugaritic(xx-Ugar); X_Shavian(xx-Shaw); X_Osmanya(xx-Osma); X_Cypriot(xx-Cprt); X_Braille(xx-Brai); X_Buginese(xx-Bugi); X_Coptic(xx-Copt); X_New_Tai_Lue(xx-Talu); X_Glagolitic(xx-Glag); X_Tifinagh(xx-Tfng); X_Syloti_Nagri(xx-Sylo); X_Old_Persian(xx-Xpeo); X_Kharoshthi(xx-Khar); X_Balinese(xx-Bali); X_Cuneiform(xx-Xsux); X_Phoenician(xx-Phnx); X_Phags_Pa(xx-Phag); X_Nko(xx-Nkoo); X_Sundanese(xx-Sund); X_Lepcha(xx-Lepc); X_Ol_Chiki(xx-Olck); X_Vai(xx-Vaii); X_Saurashtra(xx-Saur); X_Kayah_Li(xx-Kali); X_Rejang(xx-Rjng); X_Lycian(xx-Lyci); X_Carian(xx-Cari); 
X_Lydian(xx-Lydi); X_Cham(xx-Cham); X_Tai_Tham(xx-Lana); X_Tai_Viet(xx-Tavt); X_Avestan(xx-Avst); X_Egyptian_Hieroglyphs(xx-Egyp); X_Samaritan(xx-Samr); X_Lisu(xx-Lisu); X_Bamum(xx-Bamu); X_Javanese(xx-Java); X_Meetei_Mayek(xx-Mtei); X_Imperial_Aramaic(xx-Armi); X_Old_South_Arabian(xx-Sarb); X_Inscriptional_Parthian(xx-Prti); X_Inscriptional_Pahlavi(xx-Phli); X_Old_Turkic(xx-Orkh); X_Kaithi(xx-Kthi); X_Batak(xx-Batk); X_Brahmi(xx-Brah); X_Mandaic(xx-Mand); X_Chakma(xx-Cakm); X_Meroitic_Cursive(xx-Merc); X_Meroitic_Hieroglyphs(xx-Mero); X_Miao(xx-Plrd); X_Sharada(xx-Shrd); X_Sora_Sompeng(xx-Sora); X_Takri(xx-Takr);
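The list mixes four kinds of entries: real languages; two "unknown" results (UNKNOWN_LANGUAGE `un` and TG_UNKNOWN_LANGUAGE `xxx`); script-only entries whose codes start with `xx-` followed by a four-letter ISO 15924 script code (for example X_Latin `xx-Latn`); and the remaining `X_` names, which are pseudo-languages (for example X_PIG_LATIN `zzp`). A minimal Python sketch for sorting a detected code into those groups when post-processing results; the grouping rule is inferred from the list's naming, and the helper names and the abbreviated excerpt are hypothetical.

```python
import re

# Abbreviated excerpt of the supported-language list, same "NAME(code);" format.
EXCERPT = (
    "ENGLISH(en); HEBREW(iw); TG_UNKNOWN_LANGUAGE(xxx); UNKNOWN_LANGUAGE(un); "
    "CHINESE_T(zh-Hant); X_PIG_LATIN(zzp); X_KLINGON(tlh); X_Latin(xx-Latn); X_Han(xx-Hani);"
)

# One entry: a name made of letters/underscores, then a code in parentheses.
ENTRY = re.compile(r"([A-Za-z_]+)\(([^)]+)\)")


def parse_codes(listing: str) -> dict[str, str]:
    """Map each code to its name, e.g. {"en": "ENGLISH", "iw": "HEBREW", ...}."""
    return {code: name for name, code in ENTRY.findall(listing)}


def classify(code: str, names: dict[str, str]) -> str:
    if code in ("un", "xxx"):
        return "unknown"
    if code.startswith("xx-"):
        return "script only"      # suffix is an ISO 15924 script code
    name = names.get(code)
    if name is None:
        return "not in list"
    if name.startswith("X_"):
        return "pseudo-language"  # e.g. X_PIG_LATIN, X_KLINGON
    return "language"


names = parse_codes(EXCERPT)
for code in ("en", "zh-Hant", "zzp", "xx-Latn", "un"):
    print(code, names.get(code), classify(code, names))
```

Passing the complete list text above to `parse_codes` builds the full code-to-name table.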