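Application of Machine Learning Methods to Word-Alignment-Based Evaluation and Cleaning of Parallel Corpora (original title: Mašīnmācīšanās metožu lietojums vārdu sastatīšanā balstītai paralēlo korpusu novērtēšanai un tīrīšanai)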

This thesis describes a method for evaluating and cleaning parallel corpora that automatically determines the quality of each sentence pair from its word alignments. Word alignments record which words in a sentence correspond to which words in its translation into another language. If the number of alignments is high relative to the number of words in the sentence, the two sentences can be presumed to be good translations of each other. A machine learning algorithm analyses features extracted from the word alignments and builds a model of the features that characterise good and bad sentences. Parallel text corpora are widely used in building machine translation systems. The thesis therefore puts forward the hypothesis that corpus evaluation and cleaning based on word alignments helps to remove inaccurate translations and improves the quality of machine translation systems.
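The alignment-to-word ratio mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the `i-j` alignment notation (as commonly written by word aligners), and the fixed `threshold` are all assumptions made for illustration, and the fixed threshold stands in for the machine learning model that the thesis trains on alignment features.

```python
def parse_alignments(line):
    """Parse space-separated 'i-j' pairs into (source_index, target_index) tuples."""
    pairs = []
    for token in line.split():
        src, tgt = token.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def alignment_coverage(src_tokens, tgt_tokens, pairs):
    """Return the fraction of source and of target words that have at least one alignment."""
    if not src_tokens or not tgt_tokens:
        return 0.0, 0.0
    # Use sets so a word aligned to several words is counted once.
    src_aligned = {i for i, _ in pairs if 0 <= i < len(src_tokens)}
    tgt_aligned = {j for _, j in pairs if 0 <= j < len(tgt_tokens)}
    return len(src_aligned) / len(src_tokens), len(tgt_aligned) / len(tgt_tokens)


def looks_parallel(src, tgt, alignment_line, threshold=0.5):
    """Hypothetical rule: keep the pair if both sides are covered well enough.

    The threshold is an illustrative assumption, not a value from the thesis.
    """
    src_cov, tgt_cov = alignment_coverage(
        src.split(), tgt.split(), parse_alignments(alignment_line)
    )
    return min(src_cov, tgt_cov) >= threshold
```

In a setup closer to the one the abstract describes, the two coverage values would be used as input features for a trained classifier rather than compared against a hand-picked threshold.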

Keywords

Computer science, machine learning, computational linguistics, word alignments, machine translation, corpus

URI

https://dspace.lu.lv/handle/7/35208

Collections

Bakalaura un maģistra darbi (EZTF) / Bachelor's and Master's theses
