Optimizing Project Management for Machine Learning in Spam and Email Classification (Projektu pārvaldības optimizēšana surogātpasta un e-pasta klasifikācijas mašīnmācībai)
Publisher
Latvijas Universitāte
Language
lav
Abstract
Email (electronic mail) is the world's most used platform for communication between users across different devices, because it is easy to use and faster than other communication platforms. The volume of spam email is increasing day by day, with many new cases appearing daily. During the COVID-19 pandemic, more than 18 million scam email activities have been reported, and spam email also raises the risk of personal information theft and phishing. In addition, a sender has no guarantee that an email will land in the recipient's normal (Primary) inbox rather than in the spam folder, which can slow down communication or leave messages unattended. The aim of this project is to apply and optimize the principles of project management in the machine learning field and to discuss and describe how a machine learning algorithm can help solve the problem of spam email classification. The project trains several machine learning algorithms on different datasets and selects the one that works best for this type of text classification. After training, the algorithm uses binary classifiers to sort emails into two categories (spam and non-spam). The finalized algorithm predicts the percentage of spam text in an email and whether the email will land in the spam folder or in the Primary inbox of the recipient. If an email is classified as spam, the algorithm also identifies and reports the faulty text in it.
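The abstract does not say which algorithms or datasets the thesis uses, so the following is only a minimal sketch of the kind of pipeline it describes: a binary spam/non-spam classifier that reports a spam score, a target folder, and the words that pushed the email toward spam. It assumes a multinomial Naive Bayes model with add-one smoothing, a tiny made-up training set, and hypothetical function names (`train`, `classify`, `suspicious_words`); none of these come from the thesis itself. The spam probability is used here as one possible reading of "percentage of spam text".

```python
import math
from collections import Counter

# Tiny made-up training set, for illustration only (not the thesis data).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("urgent offer win cash now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def train(examples):
    """Count words per class and documents per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def word_log_prob(word, label, model):
    """Log P(word | label) with add-one (Laplace) smoothing."""
    word_counts, _, vocab = model
    total = sum(word_counts[label].values())
    return math.log((word_counts[label][word] + 1) / (total + len(vocab)))

def spam_probability(text, model):
    """Posterior probability that the text is spam under Naive Bayes."""
    _, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        score = math.log(doc_counts[label] / n_docs)  # class prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += word_log_prob(word, label, model)
        scores[label] = score
    # Normalize the two log scores into a probability without overflow.
    top = max(scores.values())
    exp = {label: math.exp(s - top) for label, s in scores.items()}
    return exp["spam"] / (exp["spam"] + exp["ham"])

def suspicious_words(text, model):
    """Known words that are more likely under the spam class than the ham class."""
    _, _, vocab = model
    return [
        w for w in tokenize(text)
        if w in vocab
        and word_log_prob(w, "spam", model) > word_log_prob(w, "ham", model)
    ]

def classify(text, model, threshold=0.5):
    """Return (folder, spam probability, flagged words); words are flagged only for spam."""
    p = spam_probability(text, model)
    folder = "spam" if p >= threshold else "primary"
    flagged = suspicious_words(text, model) if folder == "spam" else []
    return folder, p, flagged

MODEL = train(TRAIN)
```

Calling `classify("claim your free prize", MODEL)` routes the message to the spam folder and flags all four words, while `classify("project meeting on monday", MODEL)` routes it to the primary inbox with no flagged words. Comparing several algorithms, as the abstract describes, would mean training each candidate on the same data and scoring it on a held-out set before choosing one.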