Mitigating Gender Stereotypes in Machine Translation: Projecting Target-Language Grammatical Gender onto the Source Language

Stafanovičs, Artūrs

Author

Stafanovičs, Artūrs

Co-author

Latvijas Universitāte. Datorikas fakultāte

Advisor

Bergmanis, Toms

Date

2020

Abstract

When translating “The secretary asked for details.” from a language without grammatical gender (English) into a language with grammatical gender (e.g. Latvian), the translator must determine the gender of the subject “the secretary”. This is not always possible when the sentence itself does not contain the necessary information. In such cases, machine translation systems choose the most frequent translation variant, which is also the one that matches gender stereotypes (i.e. “sekretāre”, the feminine form). This work proposes a training method for machine translation systems in which source-language words are explicitly annotated with a gender marker (feminine or masculine). The training data are prepared by projecting the grammatical gender of target-language words onto the corresponding source-language words. The results show translation quality improvements of up to 4.6 BLEU points and reduced reliance on gender stereotypes, with accuracy on the WinoMT challenge set improving by up to 32.9%.
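The data preparation step described in the abstract can be illustrated with a minimal sketch. It assumes that word alignments and target-side morphological tags are already available. The function name `project_gender`, the `word|TAG` annotation format, the `U` (unknown) tag, and the hand-written Latvian example with its alignment are all illustrative assumptions, not details taken from the thesis.

```python
from typing import Dict, List, Optional, Tuple


def project_gender(
    source_tokens: List[str],
    target_genders: List[Optional[str]],
    alignment: List[Tuple[int, int]],
    unknown_tag: str = "U",
) -> List[str]:
    """Annotate each source token with the gender of its aligned target token.

    alignment holds (source_index, target_index) pairs.
    target_genders holds "F", "M", or None for each target token.
    Source tokens with no gendered aligned target token get unknown_tag.
    """
    projected: Dict[int, str] = {}
    for src_idx, tgt_idx in alignment:
        gender = target_genders[tgt_idx]
        # Keep the first gendered alignment found for each source token.
        if gender is not None and src_idx not in projected:
            projected[src_idx] = gender
    return [
        f"{token}|{projected.get(i, unknown_tag)}"
        for i, token in enumerate(source_tokens)
    ]


# Hypothetical training pair; tags and alignment are written by hand here.
source = ["The", "secretary", "asked", "for", "details", "."]
target = ["Sekretāre", "lūdza", "detaļas", "."]
target_genders = ["F", None, "F", None]
alignment = [(1, 0), (2, 1), (4, 2), (5, 3)]

annotated = project_gender(source, target_genders, alignment)
print(" ".join(annotated))
```

In this sketch the source word "secretary" receives the tag of the target word it aligns to, so a reference translation with a masculine form would yield `secretary|M` instead. At translation time the same tags could be supplied by the user or another component, which is what lets the system avoid falling back on the stereotypical form.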

URI

https://dspace.lu.lv/dspace/handle/7/50789

Collections

Bachelor's and Master's theses (EZTF)