Statistical and Data Mining Methods in Classification Tasks (Statistiskās un datu izraces metodes klasifikācijas uzdevumos)
Publisher
Latvijas Universitāte
Abstract
In recent years, classification problems have become highly relevant to decision making in a wide range of fields. Such tasks can be solved with both statistical and data mining methods. The aim of this thesis is to determine whether data mining algorithms can compete with statistical methods. The thesis describes linear discriminant analysis (LDA), kernel discriminant analysis (KDA), classification trees (CRT), and single-layer neural networks (NNET). For the statistical classifiers, it describes how the discriminant function and the decision boundaries are obtained; for the data mining models, it covers the algorithms used to build the classifiers and the methods that help avoid overfitting. The thesis closes with techniques for comparing models. To compare the classical methods with the data mining methods empirically, simulations were carried out in R.
Keywords: data mining, classifier, discriminant analysis, classification trees, neural networks, overall accuracy.
In recent years, classification has become a topical problem in many fields of decision making. Such tasks can be solved with both statistical and data mining techniques. The goal of this thesis is to determine whether data mining algorithms can compete with statistical methods. Linear and kernel discriminant analysis, classification trees, and neural networks are described. The thesis explains how to obtain the discriminant function and decision boundaries for the statistical techniques, and how to construct data mining classifiers while avoiding overfitting. Finally, model assessment and selection are discussed. The thesis includes an empirical comparison of the classical statistical techniques and the data mining algorithms on simulated examples. The simulations were carried out in the statistical software R.
Key words: data mining, classifier, discriminant analysis, classification trees, neural networks, overall accuracy.
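The kind of simulation the abstract describes can be illustrated with a minimal sketch. This is not the thesis code (the thesis used R) and covers only one of the four methods: a two-class linear discriminant fitted to simulated 2-D Gaussian data and scored by overall accuracy. The class means, sample sizes, seed, and all function names are assumptions chosen for illustration.

```python
import math
import random


def simulate(n, mean, rng):
    """Draw n points from a 2-D Gaussian with identity covariance (assumed setup)."""
    return [(rng.gauss(mean[0], 1.0), rng.gauss(mean[1], 1.0)) for _ in range(n)]


def mean_vec(points):
    """Sample mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pooled_cov(groups, means):
    """Pooled within-class covariance, returned as (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    total = 0
    for pts, m in zip(groups, means):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
        total += len(pts)
    dof = total - len(groups)
    return sxx / dof, sxy / dof, syy / dof


def fit_lda(class0, class1):
    """Fit a two-class linear discriminant: score(x) = w . x + b."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    sxx, sxy, syy = pooled_cov([class0, class1], [m0, m1])
    det = sxx * syy - sxy * sxy
    # Inverse of the symmetric 2x2 pooled covariance matrix.
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    w = (ixx * dx + ixy * dy, ixy * dx + iyy * dy)
    mid = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    # Priors are estimated from the class sizes in the training sample.
    b = -(w[0] * mid[0] + w[1] * mid[1]) + math.log(len(class1) / len(class0))
    return w, b


def predict(model, point):
    """Assign class 1 when the discriminant score is positive, else class 0."""
    w, b = model
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0


def overall_accuracy(model, class0, class1):
    """Share of correctly classified points across both classes."""
    correct = sum(predict(model, p) == 0 for p in class0)
    correct += sum(predict(model, p) == 1 for p in class1)
    return correct / (len(class0) + len(class1))


rng = random.Random(42)  # fixed seed so the run is repeatable
train0, train1 = simulate(200, (0.0, 0.0), rng), simulate(200, (3.0, 3.0), rng)
test0, test1 = simulate(200, (0.0, 0.0), rng), simulate(200, (3.0, 3.0), rng)

model = fit_lda(train0, train1)
acc = overall_accuracy(model, test0, test1)
print(f"LDA overall accuracy on the simulated test set: {acc:.3f}")
```

Scoring on a separate test sample rather than on the training data keeps the accuracy estimate from rewarding a model that has merely adapted to the data it was fitted on. A comparison in the spirit of the thesis would score the other classifiers on the same test sample.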