Comparison of Gradient Boosting Algorithms and Feature Importance Analysis

Lazareva, Lana

Author

Lazareva, Lana

Co-author

University of Latvia. Faculty of Physics, Mathematics and Optometry

Advisor

Valeinis, Jānis

Date

2020

Abstract

This work studies and compares three of the newest and most widely used gradient boosting implementations: XGBoost, LightGBM and CatBoost. The algorithms are compared by training speed, accuracy and tendency to overfit the training data. Their ability to use categorical variables in modelling is also analysed. In addition, the hyperparameters the algorithms share are examined, together with their effect on overfitting and on model accuracy metrics. Based on the results, recommendations are given for hyperparameter tuning. The second part of the experiment examines a method for estimating feature importance based on Shapley values from game theory. Confidence intervals for the resulting importance estimates are then determined using bootstrap resampling.
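
The second part of the abstract (Shapley-value feature importance with bootstrap confidence intervals) can be illustrated with a minimal sketch. It is not the thesis's code: the toy linear `model`, the synthetic data, the choice of the column means as the background point, and the helper names `shapley_values`, `row_shapley`, `mean_abs` and `bootstrap_ci` are all assumptions made for illustration, and the bootstrap here only resamples rows of already computed Shapley values instead of retraining a boosting model.

```python
import itertools
import math
import random


def shapley_values(value, n_features):
    """Exact Shapley values for a coalition value function value(frozenset) -> float."""
    phi = [0.0] * n_features
    n_fact = math.factorial(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n_features - size - 1) / n_fact
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(x):
    # Hypothetical stand-in for a trained boosting model (assumption).
    return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


def row_shapley(x, background):
    """Shapley values for one row; features outside the coalition take background values."""
    def value(subset):
        mixed = [x[j] if j in subset else background[j] for j in range(len(x))]
        return model(mixed)
    return shapley_values(value, len(x))


def mean_abs(rows, i):
    """Global importance of feature i: mean absolute Shapley value over rows."""
    return sum(abs(r[i]) for r in rows) / len(rows)


def bootstrap_ci(rows, i, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean absolute Shapley value of feature i."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rows[rng.randrange(len(rows))] for _ in rows]
        stats.append(mean_abs(sample, i))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data (assumption): 200 rows, 3 standard normal features.
data_rng = random.Random(42)
data = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
background = [sum(row[j] for row in data) / len(data) for j in range(3)]

phis = [row_shapley(x, background) for x in data]
importance = [mean_abs(phis, i) for i in range(3)]
intervals = [bootstrap_ci(phis, i) for i in range(3)]

for i in range(3):
    lower, upper = intervals[i]
    print(f"feature {i}: importance={importance[i]:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

Exact enumeration over all feature coalitions is only practical for a handful of features; for tree ensembles such as XGBoost, LightGBM or CatBoost, a dedicated tree-based Shapley approximation would normally be used instead.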

URI

https://dspace.lu.lv/dspace/handle/7/51855

Collections

Bakalaura un maģistra darbi (FMOF) / Bachelor's and Master's theses [2775]