Using Support Vector Machines for Text Classification (Atbalsta vektoru metodes izmantošana teksta klasifikācijai)

File

304-60695-Dzalbe_Kristine_kd11016.pdf (394.1 KB)

Author

Dzalbe, Kristīne

Co-author

University of Latvia, Faculty of Physics and Mathematics (Latvijas Universitāte. Fizikas un matemātikas fakultāte)

Advisor

Grūzītis, Normunds

Date

2017

Abstract

The amount of stored information and data keeps growing. Much of this information accumulates in text documents, which are mostly stored in unstructured form. The aim of this master's thesis is to become acquainted with the problem of text classification and to study the machine learning methods most commonly used to solve it. The thesis also examines methods for reducing the dimensionality of text data. English-language data were classified by topic using headlines from “The New York Times”, and comments from Latvian news portals were classified as aggressive or non-aggressive. For both datasets, classification was performed with the support vector machine method, classification trees and random forests. The best results were achieved with the support vector machine method.

Nowadays the amount of stored information and data is increasing exponentially, and much of it accumulates as text, which is usually stored in unstructured form. The aim of this thesis is to investigate the problem of text classification and to explore the machine learning methods most frequently used for this task. Different dimensionality reduction techniques for textual data are also investigated. Two data sources are used: English headlines from “The New York Times” and Latvian comments from Latvian news portals. Three classifiers are employed (support vector machines, decision trees and random forests), and the best classification accuracy is achieved with support vector machines.
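The abstract reports support vector machines as the best-performing classifier. As an illustration only (not the thesis's code, data or settings), the sketch below trains a linear SVM on bag-of-words features with the Pegasos stochastic sub-gradient method, using just the Python standard library. The toy headlines, the +1/-1 labels, the function names (`featurize`, `train_pegasos`, `predict`) and the parameter values `lam=0.1`, `iterations=2000` are hypothetical choices made for this example.

```python
import random
import re
from collections import Counter

BIAS = "<bias>"  # hypothetical constant feature; "<" can never appear in a \w+ token


def featurize(text):
    """Bag-of-words term counts plus a constant bias feature.
    \\w+ also matches letters such as ā, ē, š, so Latvian text tokenizes too."""
    features = Counter(re.findall(r"\w+", text.lower()))
    features[BIAS] = 1.0
    return features


def dot(weights, features):
    """Sparse dot product; words never seen in training get weight 0."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())


def train_pegasos(docs, labels, lam=0.1, iterations=2000, seed=0):
    """Linear SVM (hinge loss + L2 penalty lam) trained with Pegasos.
    labels must be +1 or -1."""
    rng = random.Random(seed)
    xs = [featurize(d) for d in docs]
    w = {}
    for t in range(1, iterations + 1):
        i = rng.randrange(len(xs))            # pick one training example
        eta = 1.0 / (lam * t)                 # Pegasos step size
        violated = labels[i] * dot(w, xs[i]) < 1.0  # margin check on current w
        shrink = 1.0 - eta * lam              # gradient of the L2 term
        for f in w:
            w[f] *= shrink
        if violated:                          # hinge-loss sub-gradient step
            for f, v in xs[i].items():
                w[f] = w.get(f, 0.0) + eta * labels[i] * v
    return w


def predict(w, text):
    return 1 if dot(w, featurize(text)) >= 0 else -1


# Hypothetical toy data: +1 = sports headline, -1 = politics headline.
headlines = [
    "Team wins final match",
    "Coach praises team after match",
    "Striker scores in cup final",
    "Parliament passes budget vote",
    "Minister resigns before vote",
    "Government debates new budget",
]
labels = [1, 1, 1, -1, -1, -1]

w = train_pegasos(headlines, labels)
print(predict(w, "Team wins match"), predict(w, "Minister presents budget"))
```

Pegasos is used here only because it fits in a few readable lines; a practical pipeline would more likely rely on a library SVM implementation and would add the dimensionality reduction step the thesis investigates before training.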

URI

https://dspace.lu.lv/dspace/handle/7/36496

Collections

Bachelor's and Master's theses (FMOF)