Word alignment based parallel corpora evaluation and cleaning using machine learning techniques

Zariņa, Ieva

dc.contributor.advisor	Rikters, Matīss
dc.contributor.author	Zariņa, Ieva
dc.contributor.other	Latvijas Universitāte. Datorikas fakultāte
dc.date.accessioned	2017-07-01T01:09:06Z
dc.date.available	2017-07-01T01:09:06Z
dc.date.issued	2017
dc.identifier.other	57790
dc.identifier.uri	https://dspace.lu.lv/dspace/handle/7/35208
dc.description.abstract	This thesis describes a method for evaluating and cleaning parallel corpora that automatically determines the validity of each sentence from its word alignments with the parallel sentence. The word alignments of a sentence describe how its words correspond to those of the same sentence translated into another language. If there are many alignments relative to the number of words, the sentences can be assumed to correspond. For feature analysis, a machine learning algorithm is used that can build a model of the features characterising good and bad sentences. Parallel text corpora are widely used in building machine translation systems. The thesis therefore puts forward the hypothesis that alignment-based corpus evaluation and cleaning helps to remove inaccurate translations and to improve the quality of machine translation systems.
dc.description.abstract	This thesis presents a method for evaluating and cleaning parallel corpora that automatically determines the quality of each sentence from its word alignments with the parallel sentence. Word alignments map the words of a sentence in one language to the corresponding words of the same sentence translated into another language. If the number of alignments is high relative to the total number of words in the sentence, the parallel sentences can be presumed to be good translations of each other. Machine learning is used to analyse the features extracted from the word alignments. Parallel text corpora are widely used in machine translation. The hypothesis of this thesis is therefore that corpus evaluation and cleaning based on word alignments helps to remove bad translations and improve a machine translation system.
dc.language.iso	lav
dc.publisher	Latvijas Universitāte
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Computer science
dc.subject	machine learning
dc.subject	computational linguistics
dc.subject	word alignments
dc.subject	machine translation
dc.subject	corpus
dc.title	Mašīnmācīšanās metožu lietojums vārdu sastatīšanā balstītai paralēlo korpusu novērtēšanai un tīrīšanai
dc.title.alternative	Word alignment based parallel corpora evaluation and cleaning using machine learning techniques
dc.type	info:eu-repo/semantics/bachelorThesis
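
The abstract's central assumption, that many alignments relative to the sentence length indicate a good translation pair, can be sketched roughly as below. This is only an illustration and not the thesis's implementation: the Pharaoh-style "i-j" alignment format, the function names, and the fixed threshold are all assumptions made for the example. The thesis itself passes alignment features to a machine learning model instead of applying a hand-set cutoff.

```python
def alignment_coverage(source, target, alignment):
    """Return (source_coverage, target_coverage) for one sentence pair.

    `alignment` is assumed to be in Pharaoh "i-j" format, e.g. "0-0 1-2",
    where i indexes source tokens and j indexes target tokens.
    """
    src_tokens = source.split()
    tgt_tokens = target.split()
    if not src_tokens or not tgt_tokens:
        return 0.0, 0.0
    pairs = [tuple(int(n) for n in link.split("-")) for link in alignment.split()]
    # Count each token once, even if it takes part in several links.
    aligned_src = {i for i, _ in pairs if 0 <= i < len(src_tokens)}
    aligned_tgt = {j for _, j in pairs if 0 <= j < len(tgt_tokens)}
    return len(aligned_src) / len(src_tokens), len(aligned_tgt) / len(tgt_tokens)


def looks_parallel(source, target, alignment, threshold=0.5):
    """Hypothetical stand-in for the learned classifier: a fixed cutoff."""
    src_cov, tgt_cov = alignment_coverage(source, target, alignment)
    return min(src_cov, tgt_cov) >= threshold
```

In a pipeline of this kind, the two coverage values would be two of several features computed per sentence pair, and pairs judged bad would be dropped from the corpus before training the machine translation system.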

Files in this item

Name: 302-57790-Zarina_Ieva_iz09005.pdf
Size: 1.134Mb
Format: PDF

This item appears in the following Collection(s)

Bakalaura un maģistra darbi (EZTF) / Bachelor's and Master's theses [5688]
