Language Identification in the Context of Automatic Speech Understanding

E. Noth, S. Harbeck, H. Niemann, V. Warnke, I. Ipšić

Abstract


We present two concepts for systems with language identification in the context of multilingual information retrieval dialogues. The first has an explicit module for language identification. It is based on training a codebook for each language, running the language-specific vector quantizers in parallel, and integrating over the output probability of the best alternative in each language. The system can decide on a language either after a predefined time interval or once the difference between the probabilities of the languages exceeds a certain threshold. This approach makes it possible to recognize languages that the system cannot process and to play a prerecorded message in that language. In the second approach, the trained recognizers of the languages to be recognized, the lexicons, and the language models are combined into one multilingual recognizer. Because transitions are allowed only between words of the same language, each hypothesized word chain contains only words from one language, and language identification is an implicit byproduct of the speech recognizer. First results for the explicit language identification are presented.
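The decision rule of the explicit approach can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each language's codebook is a plain list of centroid vectors and that the "output probability of the best alternative" can be approximated by a log-score equal to the negative half squared distance to the nearest codeword. The names `best_codeword_score`, `identify_language`, `max_frames`, and `threshold` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def best_codeword_score(frame: Vector, codebook: List[Vector]) -> float:
    """Log-score of the best codeword for one frame.

    Assumption: the score is the negative half squared Euclidean distance
    to the nearest codeword, a stand-in for a real output probability.
    """
    return max(
        -0.5 * sum((x - c) ** 2 for x, c in zip(frame, codeword))
        for codeword in codebook
    )


def identify_language(
    frames: Sequence[Vector],
    codebooks: Dict[str, List[Vector]],
    max_frames: int,
    threshold: float,
) -> Tuple[str, int]:
    """Run all language-specific quantizers in parallel and pick a language.

    Scores are accumulated frame by frame. The loop stops either after
    max_frames frames (the predefined time interval) or as soon as the gap
    between the two best accumulated scores exceeds threshold.
    Returns the chosen language and the number of frames consumed.
    """
    totals = {lang: 0.0 for lang in codebooks}
    used = 0
    for frame in frames:
        if used >= max_frames:
            break
        for lang, codebook in codebooks.items():
            totals[lang] += best_codeword_score(frame, codebook)
        used += 1
        ranked = sorted(totals.values(), reverse=True)
        if len(ranked) > 1 and ranked[0] - ranked[1] > threshold:
            break  # early decision: one language is clearly ahead
    best = max(totals, key=lambda lang: totals[lang])
    return best, used


# Toy example with two hypothetical languages and 2-dimensional features.
codebooks = {
    "lang_a": [[0.0, 0.0], [1.0, 1.0]],
    "lang_b": [[5.0, 5.0], [6.0, 6.0]],
}
frames = [[0.1, 0.0], [0.9, 1.1], [0.0, 0.2]]
early = identify_language(frames, codebooks, max_frames=100, threshold=10.0)
timed = identify_language(frames, codebooks, max_frames=2, threshold=1e9)
```

In the toy example, `early` stops at the first frame because the score gap already exceeds the threshold, while `timed` never reaches its threshold and decides when the frame budget runs out. A real system would use trained codebooks and proper probabilities in place of the distance-based score assumed here.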


Keywords


language identification, speech understanding, multilingual information retrieval dialogues

Creative Commons License
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
