Music Document Layout Analysis through Machine Learning and Human Feedback

Size: px

Start display at page:

Download "Music Document Layout Analysis through Machine Learning and Human Feedback"

Clemence Nichols
5 years ago
Views:

1 Music Document Layout Analysis through Machine Learning and Human Feedback Jorge Calvo-Zaragoza, Ké Zhang, Zeyad Saleh, Gabriel Vigliensoni, Ichiro Fujinaga Schulich School of Music McGill University, Montréal (Canada) 12th IAPR International Workshop on Graphics Recognition (Nov 2017) 1/27

2 Introduction 2/27

3 Introduction I Music archives and libraries preserve music over the centuries I Large amounts of content in symbolic format are required for computational analysis I Manual transcription from source implies a high cost 3/27

4 Introduction I Music archives and libraries preserve music over the centuries I Large amounts of content in symbolic format are required for computational analysis I Manual transcription from source implies a high cost I Automatic transcription systems for ancient scores become valuable tools 3/27

5 Introduction Optical Music Recognition (OMR) I From score image to symbolic encoding 4/27

6 Introduction Optical Music Recognition (OMR) I From score image to symbolic encoding 4/27

7 Introduction Optical Music Recognition (OMR) I Several interdisciplinary steps Score image Document processing Symbol classification Music reconstruction Music encoding Symbolic score 5/27

8 Introduction I Most document-processing stages focus on content separation: 6/27

9 Introduction I Most document-processing stages focus on content separation: 6/27

10 Introduction I Most document-processing stages focus on content separation: 6/27

11 Introduction I Most document-processing stages focus on content separation: 6/27

12 Introduction I Poor generalization of hand-crafted strategies I Music documents have a high level of heterogeneity 7/27

13 Introduction Framework I Machine learning framework for music document processing I Regardless of the specific characteristics of the source I Detection of the di erent layers at the same time 8/27

14 Framework 9/27

15 Framework Pixelwise classification approach I Categorization of each pixel within the input image I Allows detecting small and thin elements present in music notation 10 / 27

16 Framework I Machine learning for avoiding hand-crafted procedures 11 / 27

17 Framework I Machine learning for avoiding hand-crafted procedures I We make use of Convolutional Neural Networks (CNN) I Great performance in image-related tasks I Good generalization 11 / 27

18 Framework Pixelwise classification I Straightforward approach: classify every single pixel of the input image I (x, y)!{background, sta line, symbol, text,...} 12 / 27

19 Framework Pixelwise classification I Ground-truth example 1 I One page 30 million pixels 1 Salzinnes Antiphonal manuscript (CDM-Hsmu M ) 13 / 27

20 Framework Pixelwise classification I CNN is provided with the surrounding region of the pixel to be classified 14 / 27

21 Framework Pixelwise classification I Estimation of a probability for each possible category 15 / 27

22 Framework Performance I Example over piece of test document Original Ground-truth Prediction Mislabeling 16 / 27

23 Framework Generalization I Relevant issue I How to approach a new archive? 17 / 27

24 Framework Generalization I Relevant issue I How to approach a new archive? I Human-aided workflow 17 / 27

25 Human-aided workflow 18 / 27

26 Human-aided workflow 19 / 27

27 Human-aided workflow Pixel.js I Web-based tool for pixel-level annotation 20 / 27

28 Human-aided workflow Pixel.js I Web-based tool for pixel-level annotation I More information at 14:00, stay tuned! 20 / 27

29 Human-aided workflow Preliminary user-centered evaluation I Labeling one whole page ( 24 million pixels) of a new document type I Reduction from 30 to 18 hours with the human-aided approach 21 / 27

30 Conclusions 22 / 27

31 Conclusions Summary I Generalizable music document analysis with machine learning I Human-aided workflow for a new type of document 23 / 27

32 Conclusions Future work I Integration with the rest of the OMR workflow I E cient strategies for the classification stage 24 / 27

33 Conclusions Future work I Integration with the rest of the OMR workflow I E cient strategies for the classification stage I Image-to-image pixel-wise classification 24 / 27

34 Conclusions Future work I Image-to-image pixelwise classification I Classify a whole region at the same time I Fully-Convolutional Neural Network I Similar accuracy but much higher e ciency 25 / 27

35 Thank you! 26 / 27

36 Music Document Layout Analysis through Machine Learning and Human Feedback Jorge Calvo-Zaragoza, Ké Zhang, Zeyad Saleh, Gabriel Vigliensoni, Ichiro Fujinaga Schulich School of Music McGill University, Montréal (Canada) 12th IAPR International Workshop on Graphics Recognition (Nov 2017) 27 / 27

Pixel.js: Web-based Pixel Classification Correction Platform for Ground Truth Creation

Pixel.js: Web-based Pixel Classification Correction Platform for Ground Truth Creation Zeyad Saleh, Ké Zhang, Jorge Calvo-Zaragoza, Gabriel Vigliensoni, and Ichiro Fujinaga Distributed Digital Music Archives