An Empirical Evaluation of End-to-End Polyphonic Optical Music Recognition

Sachinda Edirisooriya; Hao-Wen Dong; Julian McAuley; Taylor Berg-Kirkpatrick

Previous work has shown that neural architectures can perform optical music recognition (OMR) on monophonic and homophonic music with high accuracy. However, piano and orchestral scores frequently contain polyphonic passages, which add a second dimension to the task. Monophonic and homophonic music can be described as homorhythmic, that is, having a single musical rhythm. Polyphonic music, on the other hand, can be seen as having multiple rhythmic sequences, or voices, sounding concurrently. We first introduce a workflow for creating large-scale polyphonic datasets suitable for end-to-end recognition from sheet music publicly available on the MuseScore forum. We then propose two novel formulations for end-to-end polyphonic OMR: one treats the problem as a type of multi-task binary classification, and the other treats it as multi-sequence detection. Building on the encoder-decoder architecture and an image encoder proposed in past work on end-to-end OMR, we propose two novel decoder models, FlagDecoder and RNNDecoder, that correspond to the two formulations. Finally, we compare the empirical performance of these end-to-end approaches to polyphonic OMR and observe new state-of-the-art performance with our multi-sequence detection decoder, RNNDecoder.
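To make the difference between homorhythmic and polyphonic music, and between the two target formulations, more concrete, here is a toy sketch in Python. It assumes a hypothetical encoding in which each voice is a list of (pitch, duration) events and each symbol is a `pitch_duration` token. The names `token`, `onsets`, `is_homorhythmic`, `to_flag_targets`, and `to_sequence_targets` are illustrative only; they are not the paper's symbol vocabulary, nor the actual output layers of FlagDecoder or RNNDecoder.

```python
from typing import Dict, List, Tuple

# Hypothetical toy encoding (not the paper's): a voice is a list of
# (pitch, duration) events, with durations in quarter-note units.
Voice = List[Tuple[str, float]]


def token(pitch: str, duration: float) -> str:
    """Combine pitch and duration into one symbol, e.g. 'C5_1.0'."""
    return f"{pitch}_{duration}"


def onsets(voice: Voice) -> List[Tuple[float, str]]:
    """Return (onset_time, token) pairs by accumulating durations."""
    t = 0.0
    events = []
    for pitch, dur in voice:
        events.append((t, token(pitch, dur)))
        t += dur
    return events


def is_homorhythmic(voices: List[Voice]) -> bool:
    """True if every voice has exactly the same set of onset times."""
    onset_sets = [{t for t, _ in onsets(v)} for v in voices]
    return all(s == onset_sets[0] for s in onset_sets)


def to_flag_targets(voices: List[Voice], vocab: List[str]) -> Dict[float, List[int]]:
    """Multi-task binary classification view: at each onset time, one
    independent 0/1 flag per vocabulary symbol, merged across voices."""
    index = {sym: i for i, sym in enumerate(vocab)}
    frames: Dict[float, List[int]] = {}
    for voice in voices:
        for t, tok in onsets(voice):
            frames.setdefault(t, [0] * len(vocab))[index[tok]] = 1
    return dict(sorted(frames.items()))


def to_sequence_targets(voices: List[Voice]) -> List[List[str]]:
    """Multi-sequence detection view: one ordered token sequence per voice."""
    return [[tok for _, tok in onsets(v)] for v in voices]


if __name__ == "__main__":
    # Two voices: quarter notes C5, D5 over a half-note E4.
    passage = [[("C5", 1.0), ("D5", 1.0)], [("E4", 2.0)]]
    vocab = ["C5_1.0", "D5_1.0", "E4_2.0"]
    print(is_homorhythmic(passage))            # False
    print(to_flag_targets(passage, vocab))     # {0.0: [1, 0, 1], 1.0: [0, 1, 0]}
    print(to_sequence_targets(passage))        # [['C5_1.0', 'D5_1.0'], ['E4_2.0']]
```

In this toy encoding, the flag view records which symbols begin at each onset but drops which voice each note belongs to, while the sequence view keeps every voice as its own ordered list of symbols.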

updated: Tue Aug 03 2021 22:04:40 GMT+0000 (UTC)

published: Tue Aug 03 2021 22:04:40 GMT+0000 (UTC)

arXiv
