Automated Music Transcription from Multi-Channel Audio (Automatizēta mūzikas transkripcija no vairāku kanālu skaņas)

Jankevics, Dāvis

Author

Jankevics, Dāvis

Co-author

Latvijas Universitāte. Datorikas fakultāte

Advisor

Lazovskis, Jānis

Date

2024

Abstract

This thesis addresses the difficult problem of integrating Automatic Music Transcription (AMT) with Music Source Separation (MSS) in order to improve the accuracy of transcribing polyphonic audio recordings into musical notation. Polyphonic music, characterized by multiple simultaneous voices or instruments, poses a considerable challenge for automatic transcription. The bachelor's thesis evaluates the compatibility of multi-channel audio with automatic transcription models. As part of this work, a Bidirectional Long Short-Term Memory (BiLSTM) model was developed. The model's effectiveness was evaluated on a dataset of classical music pieces performed by solo instruments as well as by instrumental ensembles and orchestras. In addition, existing models were evaluated both with and without the multi-channel audio produced by a Music Source Separation model. Model performance was measured using precision, recall, and F1 score. The results rejected the hypothesis that the accuracy metrics of an AMT model improve when it is used together with an MSS system. Despite these results, the thesis highlights the potential benefits and importance of further improving AMT systems. It also points to the need for precise interoperability between MSS and AMT models, a large database, and improved post-processing of model data. The results and the literature review can serve as a basis for future research aimed at improving the cooperation between MSS and AMT and at improving data post-processing and integration. Keywords: Automatic Music Transcription (AMT), Music Source Separation (MSS), Bidirectional Long Short-Term Memory (BiLSTM), Polyphonic Music.

This thesis addresses the challenge of Automatic Music Transcription (AMT) and its integration with Music Source Separation (MSS) to improve the accuracy of transcribing polyphonic music into musical notation. Polyphonic music, characterized by multiple simultaneous voices or instruments, is difficult to transcribe because its sounds are dense and overlapping. The thesis proposes that using MSS to split the audio into multiple channels before transcription can improve AMT accuracy. The research develops a Bidirectional Long Short-Term Memory (BiLSTM) model aimed at capturing the nuances of polyphonic compositions. The model was evaluated on a dataset of classical music pieces in terms of precision, recall, and F1 score, both on its own and in conjunction with MSS. The comparative analysis also included two existing models, MT3 (Gardner et al., 2022) and Omnizart (Wu et al., 2021), which allowed for a more robust evaluation of the potential advantages of multiple-channel sound in AMT models. The evaluation showed that, while the integration of MSS and AMT is promising in theory, it did not outperform standalone AMT models in this instance. Performance metrics decreased when MSS was applied, highlighting the need for further optimization and compatibility assessments between MSS and AMT technologies. Despite these conclusions, the study emphasizes the potential advantages of improving AMT systems, including expanding the range of musical compositions that can be digitally archived, improving accessibility to sheet music, and providing a theoretical foundation for the combination of MSS and AMT. The limitations encountered, such as the average performance of the model and the specific challenges of integrating MSS with AMT, point to areas for future exploration. These include the development of tailored MSS-AMT models, expanding and diversifying training datasets, and exploring advanced neural network architectures. Future work will aim at overcoming these challenges, with a particular focus on enhancing model architecture, optimizing data preparation and processing, and further exploring the synergies between MSS and AMT to achieve higher transcription accuracy. Keywords: Automatic Music Transcription (AMT), Music Source Separation (MSS), Bidirectional Long Short-Term Memory (BiLSTM), Polyphonic Music.
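
The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes each note is represented as an (onset_frame, midi_pitch) pair and counts as correct only on an exact match. This representation, the function name `transcription_scores`, and the example notes are illustrative assumptions; the evaluation protocol actually used in the thesis (for example, any onset tolerance) is not described on this page.

```python
def transcription_scores(reference, estimated):
    """Return (precision, recall, f1) for two collections of note events.

    Each note is a hashable tuple, here assumed to be (onset_frame, midi_pitch).
    A note counts as correct only if it appears in both collections.
    """
    ref = set(reference)
    est = set(estimated)
    true_positives = len(ref & est)

    # Precision: share of transcribed notes that are in the reference.
    precision = true_positives / len(est) if est else 0.0
    # Recall: share of reference notes that were transcribed.
    recall = true_positives / len(ref) if ref else 0.0

    # F1: harmonic mean of precision and recall (0 if both are 0).
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: a four-note reference and a three-note transcription
# in which two notes match and one has the wrong pitch.
reference = [(0, 60), (0, 64), (4, 67), (8, 72)]
estimated = [(0, 60), (4, 67), (8, 71)]
precision, recall, f1 = transcription_scores(reference, estimated)
```

Under this exact-match assumption, comparing an AMT model with and without MSS preprocessing amounts to calling the function once per configuration on the same reference notes and comparing the resulting scores.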

URI

https://dspace.lu.lv/dspace/handle/7/65597

Collections

Bakalaura un maģistra darbi (EZTF) / Bachelor's and Master's theses