Datu ievākšanas un lielo valodas modeļu pielietojums ziņu rakstu ģenerēšanā [Application of Data Collection and Large Language Models in News Article Generation]

Burkēvičs, Artūrs

Open

302-102997-Burkevics_Arturs_ab20153.pdf (359.7Kb)

Author

Burkēvičs, Artūrs

Co-author

Latvijas Universitāte. Datorikas fakultāte

Advisor

Zviedris, Reinholds

Date

2024

Abstract

This bachelor's thesis examines the technologies required for, and develops prototypes of, a system that attempts to predict future events in news articles generated by a large language model. The thesis has two main parts. The first reviews methods for collecting the content of news articles from a pre-selected set of news portals and presents a prototype, developed in Python, that attempts this task both when a portal provides an RSS feed and when it does not. The second investigates how the collected and sorted news articles can be passed to a large language model to generate new articles that attempt to anticipate future events. The aims of the thesis are to survey data collection methods, to learn how to run large language models locally, and to assess through a survey how closely the generated articles resemble text that could have been written by a human.

URI

https://dspace.lu.lv/dspace/handle/7/66152

Collections

Bachelor's and Master's theses (EZTF)