Paper Title
Using Cell Phone Pictures of Sheet Music To Retrieve MIDI Passages
Paper Authors
Paper Abstract
This article investigates a cross-modal retrieval problem in which a user would like to retrieve a passage of music from a MIDI file by taking a cell phone picture of several lines of sheet music. This problem is challenging for two reasons: it has a significant runtime constraint since it is a user-facing application, and there is very little relevant training data containing cell phone images of sheet music. To solve this problem, we introduce a novel feature representation called a bootleg score, which encodes the position of noteheads relative to staff lines in sheet music. The MIDI representation can be converted into a bootleg score using deterministic rules of Western musical notation, and the sheet music image can be converted into a bootleg score using classical computer vision techniques for detecting simple geometrical shapes. Once the MIDI and cell phone image have been converted into bootleg scores, we can estimate the alignment using dynamic programming. The most notable characteristic of our system is that it has no trainable weights at all -- only a set of about 40 hyperparameters. With a training set of just 400 images, we show that our system generalizes well to a much larger set of 1600 test images from 160 unseen musical scores. Our system achieves a test F-measure of 0.89, has an average runtime of 0.90 seconds, and outperforms baseline systems based on music object detection and sheet-audio alignment. We provide extensive experimental validation and analysis of our system.
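The alignment step can be illustrated with a minimal sketch. The abstract states only that dynamic programming is used, so everything below is an assumption for illustration: a bootleg score is modeled as a list of columns, each column being a set of integer staff positions occupied by noteheads; the column cost is one minus the Jaccard similarity; and the alignment is a plain subsequence dynamic time warping with unweighted steps. The names `column_cost` and `subsequence_align` are hypothetical, not taken from the paper.

```python
def column_cost(a, b):
    """Dissimilarity of two columns: 1 - Jaccard similarity of their position sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def subsequence_align(query, reference):
    """Find the reference span that best matches the whole query.

    Returns (start, end, cost), where start and end are inclusive
    reference column indices and cost is the accumulated path cost.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("query and reference must be non-empty")

    inf = float("inf")
    D = [[inf] * m for _ in range(n)]    # accumulated cost
    start = [[0] * m for _ in range(n)]  # reference column where each path began

    # The query may begin at any reference column at no extra cost.
    for j in range(m):
        D[0][j] = column_cost(query[0], reference[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Allowed predecessors: vertical, diagonal, horizontal.
            candidates = [(D[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((D[i - 1][j - 1], start[i - 1][j - 1]))
                candidates.append((D[i][j - 1], start[i][j - 1]))
            best_cost, best_start = min(candidates)
            D[i][j] = best_cost + column_cost(query[i], reference[j])
            start[i][j] = best_start

    # The query may also end at any reference column.
    end = min(range(m), key=lambda j: D[n - 1][j])
    return start[n - 1][end], end, D[n - 1][end]


# Toy example: the query matches reference columns 2 through 4 exactly.
reference = [{1}, {2}, {3, 5}, {4}, {6}, {7}]
query = [{3, 5}, {4}, {6}]
print(subsequence_align(query, reference))  # (2, 4, 0.0)
```

In this sketch the reference would come from the MIDI file and the query from the cell phone image; the returned span of reference columns would then be mapped back to a time interval in the MIDI file. Storing the start column alongside each accumulated cost avoids a separate backtracking pass.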