Dissertation in the field of speech and language technology, André Mansikkaniemi
The title of the thesis is: Continuous Unsupervised Topic Adaptation for Morph-based Speech Recognition.
Automatic speech recognition (ASR) systems convert speech to text. The basic building blocks of a modern ASR system are the statistical acoustic and language models and the pronunciation dictionary (lexicon). The statistical models are trained on vast amounts of speech and text data from pre-existing collections. Depending on the language, the lexicon is either generated automatically or compiled manually by experts. A persistent challenge for ASR systems is recognizing new words and phrases that emerge over time. In this thesis, methods have been developed to enable automatic adaptation of an ASR system for Finnish. One method studied adapts the language model using new text data collected from the Web. The language model is adapted to a specific recording by automatically selecting the Web articles that are topically closest to the recognized text. A new language model is then obtained by adapting the baseline model with the selected articles. Results show that recognition accuracy improves for Finnish broadcast news when the adapted language model is used. Methods for adapting the lexicon have also been developed, with a special focus on foreign names and acronyms. Foreign names and acronyms are automatically identified in the Web texts, and new pronunciation rules are generated for them and added to the lexicon. Used together with the adapted language model, lexicon adaptation also improves recognition accuracy. These methods can be used to adapt ASR systems to new speech data and to enable continuous update cycles whenever new text data is acquired.
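The article selection step described above can be illustrated with a minimal sketch. It assumes TF-IDF weighting and cosine similarity as the measure of topical closeness, and plain whitespace tokenization; the abstract does not state which similarity measure or tokenization the thesis uses, and a morph-based system would segment words into morphs first. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `select_closest_articles`) and the `top_k` parameter are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokens; a morph-based system would segment into morphs.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency, always positive.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({t: f * idf[t] for t, f in term_freq.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_closest_articles(recognized_text, articles, top_k=1):
    """Return indices of the top_k articles most similar to the recognized text."""
    vectors = tfidf_vectors([recognized_text] + articles)
    query, article_vectors = vectors[0], vectors[1:]
    scored = sorted(
        ((cosine(query, v), i) for i, v in enumerate(article_vectors)),
        reverse=True,
    )
    return [i for _, i in scored[:top_k]]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    first_pass = "the parliament voted on the new budget today"
    web_articles = [
        "football match ended in a draw",
        "parliament budget vote passed today",
        "weather forecast rain tomorrow",
    ]
    print(select_closest_articles(first_pass, web_articles, top_k=1))
```

In such a setup, the selected articles would serve as the adaptation text for the baseline language model; how the adaptation itself is carried out is not covered by this sketch.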
Opponent: Professor Torbjørn Svendsen, Norges Teknisk Naturvitenskapelige Universitet (NTNU)
Supervisor: Professor Mikko Kurimo, Aalto University School of Electrical Engineering, Department of Signal Processing and Acoustics
SPA, Aalto University spa.aalto.fi