Development of a Phonetic Dictionary and Language Model for Latvian Speech Processing (Fonētiskās vārdnīcas un valodas modeļa izstrāde latviešu valodas runas apstrādei)

This study investigates two of the most important components of speech processing: the language model and the phonetic dictionary. The impact of four different text corpora and their combinations on Latvian speech recognition quality was analyzed, and three different methods for obtaining phonetic pronunciations were tested. The results are language dependent, but the methods used are language independent. The baseline continuous speech recognition system has an accuracy of 36.17%. After the improvements, accuracy rose by 6.45 percentage points, from 36.17% to 42.62%. Although the best results were achieved with the baseline methods, the knowledge gained during the work will help improve the quality of the methods used in the baseline. Keywords: speech recognition systems, speech recognition process, language modeling, phonetic dictionary, grapheme-to-phoneme modeling.

Keywords

Computer science

URI

https://dspace.lu.lv/handle/7/21139

Collections

Bakalaura un maģistra darbi (EZTF) / Bachelor's and Master's theses
