N-gramu modeļi sabalansētam latviešu valodas teksta korpusam

Pole, Sandra

dc.contributor.advisor	Siņenko, Nadežda	en_US
dc.contributor.author	Pole, Sandra	en_US
dc.contributor.other	Latvijas Universitāte. Fizikas un matemātikas fakultāte	en_US
dc.date.accessioned	2015-03-24T08:01:46Z
dc.date.available	2015-03-24T08:01:46Z
dc.date.issued	2008	en_US
dc.identifier.other	9619	en_US
dc.identifier.uri	https://dspace.lu.lv/dspace/handle/7/21811
dc.description.abstract	The thesis examines and compares the most commonly used n-gram language models and determines the most suitable model for calculating n-gram probabilities. In the practical part, the most frequently used n-grams in Latvian (n=1, 2, 3) are determined, taking into account that the text resources used were compiled so that the text covers the whole Latvian language. The thesis consists of two parts and an appendix with the programs used; the first part is the theoretical justification for each model, and the practical part is the application of these models to the chosen text file. The necessary information was obtained from the text files using the programming language Turbo Pascal Version 7.0, while the calculations themselves were performed in Microsoft Excel.	en_US
dc.description.abstract	This work compares the most commonly used n-gram language models and selects the most suitable one for calculating n-gram probabilities. The practical part identifies the most frequent n-grams (n=1, 2, 3) in Latvian, given that the text corpus was compiled to cover the Latvian language as a whole. The bachelor thesis consists of two parts and an appendix containing the programs used: the first part gives the theoretical justification for each model, and the second applies these models to the chosen text corpus. The necessary information was extracted from the text corpus with programs written in Turbo Pascal Version 7.0; all other calculations were made in Microsoft Excel.	en_US
dc.language.iso	N/A	en_US
dc.publisher	Latvijas Universitāte	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Mathematics	en_US
dc.title	N-gramu modeļi sabalansētam latviešu valodas teksta korpusam	en_US
dc.title.alternative	N-gram models for integrated Latvian language text corpus	en_US
dc.type	info:eu-repo/semantics/bachelorThesis	en_US
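
The abstract describes counting n-grams for n=1, 2, 3 and then computing n-gram probabilities. As a rough illustration of that idea only, the following Python sketch counts n-grams in a short string and computes unsmoothed maximum-likelihood estimates. It is not the thesis's code: the original programs were written in Turbo Pascal and are not reproduced in this record, the specific models compared in the thesis are not named here, and the function names, the tokenization rule, and the sample sentence are all assumptions made for this example.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation.

    Assumption: this simple rule is for illustration only; the thesis's
    actual tokenization is not described in this record.
    """
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()[]")
        if word:
            tokens.append(word)
    return tokens


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mle_probability(ngram, counts_n, counts_prefix, total_tokens):
    """Unsmoothed maximum-likelihood estimate.

    For a unigram: count(w) / total_tokens.
    For n > 1:     count(w_1..w_n) / count(w_1..w_{n-1}).
    Returns 0.0 when the denominator is zero.
    """
    if len(ngram) == 1:
        return counts_n[ngram] / total_tokens if total_tokens else 0.0
    prefix_count = counts_prefix[ngram[:-1]]
    return counts_n[ngram] / prefix_count if prefix_count else 0.0


# Hypothetical sample text, not taken from the thesis corpus.
tokens = tokenize("Es eju mājās, un es eju gulēt.")
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

p_unigram = mle_probability(("es",), unigrams, None, len(tokens))
p_bigram = mle_probability(("eju", "gulēt"), bigrams, unigrams, len(tokens))
p_trigram = mle_probability(("es", "eju", "gulēt"), trigrams, bigrams, len(tokens))
```

Unsmoothed estimates assign probability zero to any n-gram that does not occur in the corpus, which is one common motivation for comparing several models rather than relying on raw relative frequencies.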

Files in this item

Name: 304-9619-Lazukina_Sandra_Mate0 ...
Size: 488.1Kb
Format: PDF

This item appears in the following Collection(s)

Bakalaura un maģistra darbi (FMOF) / Bachelor's and Master's theses [2730]
