Automātiska teksta konspektēšana izmantojot jēdzientelpu

Pīrāgs, Reinholds

dc.contributor.advisor	Bārzdiņš, Guntis
dc.contributor.author	Pīrāgs, Reinholds
dc.contributor.other	Latvijas Universitāte. Datorikas fakultāte
dc.date.accessioned	2016-07-02T01:08:24Z
dc.date.available	2016-07-02T01:08:24Z
dc.date.issued	2016
dc.identifier.other	53328
dc.identifier.uri	https://dspace.lu.lv/dspace/handle/7/32221
dc.description.abstract	Šobrīd pasaulē ir vērojams milzīgs informācijas daudzuma pieaugums, un ir arvien grūtāk iepazīties ar šo informāciju. Automātiskas teksta konspektēšanas mērķis ir spēt pārveidot lielu tekstuālas informācijas daudzumu īsākā formātā, kurš spēj saglabāt oriģinālā teksta svarīgāko informāciju. Viena no metodēm, kā automātiski konspektēt tekstu, ir izvēlēties svarīgākos teikumus no teksta. Mērķis ir izvēlēties teikumus tā, lai tajos esošā informācija savstarpēji nepārklājas, kā arī nosedz pietiekamu daļu no konspektējamā teksta. Lai to izdarītu, ir jāsalīdzina teikumu ietvertās informācijas līdzīgums. Jēdzientelpa ir moderns rīks, ar kura palīdzību var noteikt vārdu nozīmi un līdzību ar citiem vārdiem. Šajā darbā tiek izveidota sistēma, kura automātiski konspektē tekstu, izmantojot jēdzientelpas vektorus, lai mērītu teikumu informācijas saturu. Pēc tam iegūtie rezultāti tiek salīdzināti ar tradicionālo TF-IDF metodi. Jēdzientelpas vektoru metodes rezultāti ir labi, bet tie ir nedaudz zemāki par tradicionālās TF-IDF metodes rezultātiem.
dc.description.abstract	The amount of information in the world is growing rapidly, and keeping up with it is becoming increasingly difficult. The goal of automated text summarisation is to condense a large amount of text into a shorter form that preserves the most important information of the original. One approach is extractive: selecting the most important sentences from the text. The sentences should be chosen so that the information they contain does not overlap and covers enough of the original content, which requires comparing how similar the information in different sentences is. Word embeddings are a modern tool for determining the meaning of a word and its similarity to other words. In this work, a system is built that automatically summarises text, using word embedding vectors to measure the information content of sentences. The resulting summaries are compared with those of a traditional TF-IDF system. The word embedding method gives good results, but they are slightly lower than those of the TF-IDF method.
dc.language.iso	lav
dc.publisher	Latvijas Universitāte
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Datorzinātne
dc.subject	ekstraktīva konspektēšana
dc.subject	jēdzientelpa
dc.subject	word2vec
dc.subject	TF-IDF
dc.title	Automātiska teksta konspektēšana izmantojot jēdzientelpu
dc.title.alternative	Automated Text Summarisation Using Word Embeddings
dc.type	info:eu-repo/semantics/masterThesis
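
The sentence-selection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not specify the selection procedure, so the greedy coverage-minus-redundancy scoring, the `redundancy_weight` parameter, the averaging of word vectors into sentence vectors, and the three-dimensional `TOY_VECTORS` table are all assumptions made for illustration. A real system would load pretrained word2vec vectors instead.

```python
import math

# Hypothetical 3-dimensional toy embeddings (illustrative values only);
# a real system would load pretrained word2vec vectors instead.
TOY_VECTORS = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.0],
    "purr": [0.7, 0.0, 0.1],
    "stocks": [0.0, 0.9, 0.1],
    "fell": [0.1, 0.8, 0.2],
}


def mean_vector(words):
    """Average the vectors of all known words; zero vector if none are known."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarise(sentences, k, redundancy_weight=0.5):
    """Greedily pick k sentences that cover the document but overlap little."""
    tokens = [s.lower().split() for s in sentences]
    sent_vecs = [mean_vector(t) for t in tokens]
    doc_vec = mean_vector([w for t in tokens for w in t])

    chosen = []
    while len(chosen) < min(k, len(sentences)):
        best, best_score = None, -math.inf
        for i, vec in enumerate(sent_vecs):
            if i in chosen:
                continue
            # Coverage: how close the sentence is to the whole document.
            coverage = cosine(vec, doc_vec)
            # Redundancy: closest match among sentences already selected.
            overlap = max((cosine(vec, sent_vecs[j]) for j in chosen), default=0.0)
            score = coverage - redundancy_weight * overlap
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)

    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


sentences = ["cats purr", "dogs purr", "stocks fell"]
print(summarise(sentences, 2))
```

With these toy vectors, the two animal sentences are near-duplicates, so once one of them is selected the redundancy penalty steers the second pick towards the unrelated sentence. A TF-IDF baseline of the kind the abstract compares against could be obtained by swapping `mean_vector` for a TF-IDF weighted bag-of-words vector while keeping the same selection loop.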

Files in this item

Name: 302-53328-Pirags_Reinholds_rp0 ...
Size: 2.709Mb
Format: PDF

This item appears in the following collections

Bakalaura un maģistra darbi (DF) / Bachelor's and Master's theses [3177]