Optical Music Recognition using Hidden Markov Models


Natalie Wilkinson

April 25,

1 Introduction

Optical Music Recognition (OMR) software reads sheet music from an image. This type of software can be very useful because it converts scanned music into a digital form that directly represents the music, rather than storing the scan as an image. Some OMRs can also create a sound file from a scanned sheet of music, without requiring a trained musician to play the piece; this makes such software valuable for students learning to read sheet music. Optical Character Recognition (OCR) software is a similar but much more commonplace technology that converts images of text into an editable form, such as a text file or a PDF. OCR heavily influences OMR, although each faces its own challenges; even so, understanding OCR algorithms is beneficial when implementing an OMR. There are many ways OMRs can be implemented, and each implementation has its own strengths and weaknesses. The goal of this research was to investigate a particular mechanism used for the identification step of OMRs: hidden Markov models (HMMs). An HMM is trained by showing it examples of the items to be identified. Previous applications of HMMs in OCR for handwriting have been very successful [1]. Since sheet music appears to be of comparable difficulty to handwritten text, it was assumed that HMMs would be just as successful in OMR. However, previous OMR implementations using HMMs have performed poorly compared to the other algorithms currently in use [12]. In this paper, we investigate whether increasing the amount of training an HMM receives can improve its accuracy and thereby provide a more useful mechanism for OMRs.

2 Related Work

Research into OMRs started in the 1960s at MIT.
This first system was a rudimentary OMR that could handle chords but not rests, clefs, or time signatures. The next OMR, created in the 1970s, was based on MIT's OMR but used a heuristic algorithm; it could handle all basic symbols but could not parse chords. In the 1980s, a full OMR robot was built that could read sheet music placed in front of it and play the corresponding piece on an organ [2]. The first attempt to handle handwritten music occurred in the early 1990s.

Figure 1: The matrices of an HMM [9]: the starting probability vector [p_0 p_1 ... p_n] over states S_0 ... S_n, the n x n transitional matrix of state-to-state probabilities p_ij, and the n x m conditional matrix relating states S_i to observations E_j.

In 2001, Bainbridge [2] summarized the challenges of Optical Music Recognition and provided a detailed history of OMR work up to that point. However, there are still many challenges in OMR, leading to numerous open problems [9] [13]. As mentioned previously, OMRs were heavily influenced by OCRs, so it is worthwhile to mention the various mechanisms used in OCRs: pattern matching, nearest neighbors, and hidden Markov models [1]. For each of these mechanisms, the image containing the text must first be segmented into individual characters; the identification mechanism is then applied to each character image. Pattern matching directly compares the image to others stored in a database and works best with typewritten text in similar fonts. Nearest neighbors and hidden Markov models compare features computed from the image against features representing known characters and choose the best match. Hidden Markov models tend to excel on handwritten text compared to the other mechanisms [1]. Hidden Markov models are built using machine learning: algorithms that make predictions about a data set given prior knowledge of similar data sets. In a broad sense, machine learning can be compared to human learning, in that we observe our surroundings and learn from those observations, allowing us to make relatively accurate predictions about future events. In computer science, machine learning is typically implemented by first training a model on a set of data and then using the model to interpret similar data sets.
It is in this way that hidden Markov models are applied to OMRs. More specifically, hidden Markov models are probabilistic models of systems with unobservable, or hidden, properties, which we refer to as states. It is assumed that the hidden states can be estimated using probabilities learned from observations of the system. In the case of OMRs, an HMM attempts to model the sequence of states for a given musical symbol (whole note, quarter note, rest, etc.) based on features of the symbol, called observations [10] [11]. The sequence of hidden states differs for each symbol, but the number of states remains constant throughout. Digitally, an HMM is composed of three probability matrices, as shown in Figure 1. The first matrix holds the state-transition probabilities and is of size n x n, with n being the number of states. These probabilities indicate the likelihood of transitioning from one hidden state S_i to another hidden state S_j. The second matrix holds the conditional probabilities and is of size n x m, with m being the number of possible observations. These probabilities indicate the likelihood of being in a hidden state S_i, given a certain observation E_j. The third matrix holds the initial-state probabilities, of size n, with probabilities indicating the likelihood of starting in a hidden state S_i.

Symbol         # of Symbols   Neural Network   Nearest Neighbour   Support Vector Machines   Hidden Markov Models
Quarter Rest   63             85%              100%                100%                      83%
Treble Clef    -              -                100%                100%                      58%
Flat           -              -                100%                99%                       96%
Sharp          13             97%              100%                100%                      99%
Natural        -              -                100%                100%                      95%

Figure 2: Comparison data

The paper by A. Rebelo et al. [12] is a comparative study of the performance of four of the main OMR algorithms on both handwritten and printed music symbols. Performance fluctuated based on which type of symbols were being tested; however, their study found that the performance of HMMs for both handwritten and printed sheet music was lower than that of the other three algorithms. Their results are shown in Figure 2. This was a surprising result considering the algorithm's performance in OCR on handwritten text, as reported by Arica and Yarman-Vural [1]. It was expected that HMMs would have comparable performance to the other algorithms for OMRs, since they outperform them in OCR on handwritten text. Because of this, A. Rebelo et al. listed hidden Markov models as an open problem to be investigated further.

3 Methods

Most OMRs process sheet music in a series of stages. They first implement a pre-processing stage that prepares the sheet music for segmentation, then segment the image into the individual musical symbols, and finally identify those symbols. Once that has been accomplished, they can reconstruct the piece of music in digital form. The whole process is fully described in the paper by Bainbridge and Bell [2], as well as in the paper by Rebelo et al. [13].
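As a concrete reference point for what the identification stage manipulates, the three matrices of Figure 1 can be sketched as numpy arrays. The sizes below (n = 3 states, m = 7 observations) and the random values are purely illustrative, not the parameters used in this work:

```python
import numpy as np

# The three matrices of Figure 1 as numpy arrays. Sizes are arbitrary
# illustrative choices, not the values used in this research.
n, m = 3, 7
rng = np.random.default_rng(0)

pi = np.full(n, 1.0 / n)                                    # starting, size n
A = rng.random((n, n)); A /= A.sum(axis=1, keepdims=True)   # transitional, n x n
B = rng.random((n, m)); B /= B.sum(axis=1, keepdims=True)   # conditional, n x m

# Every row of each matrix is a probability distribution summing to 1.
print(pi.sum(), A.sum(axis=1), B.sum(axis=1))
```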
Our research began with an attempt to create a simplistic OMR in which to implement the HMM algorithm. Because our hypothesis dealt with an increase in training data, we first needed to create the pre-processing and segmentation steps of the OMR to allow us to generate a larger set of training data. Once this was achieved, we created the identification step. Our implementation of the simplistic OMR is described in the following sections. Figure 3 shows the general structure of the created software. Observation Extraction begins the identification step, creating observations for each image. Expectation Maximization is the algorithm used to create the HMMs, while Viterbi is the algorithm used to test them; the full identification step would use only Viterbi once the HMMs were created.

Figure 3: Basic OMR Structure

3.1 Pre-Processing

OMRs do not always need a pre-processing stage; however, aligning the staves and removing the staff lines eased our segmentation process by ensuring that most musical symbols would be surrounded by white space and arranged in a line. Other OMRs have kept the staff lines, which then become part of the model for each symbol. However, since the limited training data acquired from Rebelo et al. [12] did not include staff lines, we removed them so that the training data we generated would be consistent with the acquired data. A piece of sheet music in Western music notation consists of multiple grand staves per page. These grand staves are denoted by curly braces at the beginning of sections of staff lines. Each grand stave is made up of two staves, each containing five staff lines. Notes and other symbols are placed among the staff lines, where a symbol's vertical position relative to the staff lines indicates the pitch of the note. The beginning of the stave contains the clef symbol as well as the time signature. This is illustrated in Figure 4. To ease the segmentation process of the OMR, we aligned the staff lines by musical staves. The alternative would require keeping track of the top, bottom, and midpoint of each grand stave and incorporating those values into the symbol-segmentation process; it was simpler to find these values once and use them to split the sheet music up by staves. This was done by creating a horizontal histogram of the black pixels in the image, as shown in Figures 4 and 5, and using the histogram to find the first rows containing no black pixels above and below each grand stave. The grand staves were then split by calculating the midpoint between the top and bottom of the staves.
Once the staves were split, they were aligned by staff line in the form shown in Figure 6. Removal of the staff lines is not strictly necessary; however, as mentioned previously, it ensured that our segmented images would match our acquired data set. It also eases segmentation by ensuring that there is vertical white space between every individual symbol. The removal of staff lines was accomplished at the same time as aligning the grand staves: the peaks in the horizontal histogram corresponded to staff-line locations, so during aligning, the rows corresponding to staff lines were discarded, yielding an aligned piece of sheet music with staff lines removed, as shown in Figure 6.
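A minimal sketch of the histogram-based staff-line detection and removal described above, assuming a binary image where black pixels are 1. The peak threshold (half the image width) is an illustrative heuristic, not the paper's exact rule:

```python
import numpy as np

def horizontal_histogram(img: np.ndarray) -> np.ndarray:
    """Count black pixels (value 1) in each row of a binary image."""
    return img.sum(axis=1)

def staff_line_rows(img: np.ndarray, frac: float = 0.5) -> np.ndarray:
    """Rows whose black-pixel count peaks above `frac` of the image width
    are taken to be staff lines (illustrative threshold)."""
    hist = horizontal_histogram(img)
    return np.flatnonzero(hist >= frac * img.shape[1])

def remove_staff_lines(img: np.ndarray) -> np.ndarray:
    """Blank out the detected staff-line rows, as in Section 3.1."""
    out = img.copy()
    out[staff_line_rows(out)] = 0
    return out

# Toy 10x8 page: two full-width "staff lines" plus a short note stem.
page = np.zeros((10, 8), dtype=int)
page[2] = 1                            # staff line
page[6] = 1                            # staff line
page[3:6, 4] = 1                       # a vertical stem crossing the staff
print(staff_line_rows(page))           # rows 2 and 6
print(remove_staff_lines(page).sum())  # only the stem's 3 pixels remain
```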

Figure 4: Sample sheet music

Figure 5: Sample staff line histogram

Figure 6: Aligned sheet music with staff lines removed

Figure 7: Segmentation of a beam

3.2 Segmentation

Although other techniques could be used in place of segmentation, such as a sliding window [10], we decided to use segmentation because it simplified the feature extraction step and created images for each musical symbol consistent with our acquired data. Creating an image for each musical symbol on a given piece of sheet music was beneficial because all images of a specific symbol could then be used to train an HMM for that symbol. Segmentation was implemented in multiple steps. First the program was given the pre-processed image, of the form shown in Figure 6. A vertical histogram of the number of black pixels was then created for the image and used to segment the image into sections of symbols by identifying white space. Musical symbols such as beams required multiple passes of segmentation because the beam interferes with simple white-space-based segmentation. Using the initial vertical histogram of the aligned image, sections of symbols were segmented by splitting the image wherever the vertical histogram had a value of 0. In the pre-processing stage, two values were found: n, the height of a single staff line, and d, the height between two adjacent staff lines. For each segmented section, we set w to be its width; if w < 2d, the section was classified as a single musical symbol. Otherwise, it was classified as a beam and required additional segmentation [7]. This condition is valid because of the structure of Western music notation: no single symbol is wider than two staff-line spaces. To break up a beam into its individual symbols, an additional vertical histogram was created, as shown in Figure 7. This histogram was created by computing, for each column, the maximum number of adjacent black pixels whose run length was between 2n and 2n + d.
This ensured that only beams and notes were counted in the histogram, excluding note stems. Beams were then removed from the histogram by averaging the histogram values and setting to 0 any column whose value was less than the average. This process ensured that the beams themselves were not classified as individual symbols and allowed the musical symbols to be extracted from the beam [5].
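The white-space segmentation and the w < 2d beam test described above might be sketched as follows; the toy image and the function names are illustrative, not the paper's implementation:

```python
import numpy as np

def segment_columns(img: np.ndarray) -> list[tuple[int, int]]:
    """Split a staff-line-free binary image at columns with no black pixels,
    returning (start, end) column ranges of candidate symbols."""
    hist = img.sum(axis=0)               # vertical histogram of black pixels
    spans, start = [], None
    for x, count in enumerate(hist):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(hist)))
    return spans

def is_beam(span: tuple[int, int], d: int) -> bool:
    """Section 3.2's width test: a segment at least two staff spaces wide
    (w >= 2d) is treated as a beamed group needing a second pass."""
    w = span[1] - span[0]
    return w >= 2 * d

# Toy image: a narrow symbol (cols 1-2) and a wide beamed group (cols 5-11).
img = np.zeros((10, 14), dtype=int)
img[:, 1:3] = 1
img[2:4, 5:12] = 1
spans = segment_columns(img)
print(spans)                              # [(1, 3), (5, 12)]
print([is_beam(s, d=3) for s in spans])   # [False, True]
```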

3.3 Creation of Training Data

Since our program could already segment an individual piece of sheet music into its constituent symbols, creating the training data involved acquiring full pages of sheet music that could then be converted into the symbols required for the testing data. To do so, we simply ran the pre-processing and segmentation stages of the program on the acquired sheet music images. Because our program removed the staff lines using a histogram method, all sheet music acquired for the training data had to be completely level, ensuring that a single staff line remained on the same row of pixels for the entire width of the image. Thus older pieces of scanned music that were tilted or fuzzy could not be used. All words that appeared in the music, such as verses, were removed using GIMP, a free open-source alternative to Photoshop, to ensure that only musical symbols would be segmented. Changes to the pre-processing stage of the OMR could straighten staff lines, remove blur, and potentially remove words; however, this is outside the scope of this research. Once the appropriate sheet music was acquired, the segmentation component of our OMR was used to segment it into individual musical symbols. Due to the nature of sheet music, every page was segmented into hundreds of images of musical symbols. While symbols of the same type were generally similar to one another, variation did occur because of their different positioning in the sheet music. This was mostly due to the pre-processing stage, which potentially removed sections of an image that overlapped with staff lines. Variation could also occur when parts of a symbol's image overlapped with neighboring symbols. Because of the way segmentation created the images, the symbols were not sorted by type of musical symbol, which was needed for training the HMMs.
Thus a manual sorting of all 18,000 images segmented from the 30 unique pages of sheet music was required.

3.4 Observation Extraction

Before an HMM can be trained, an observation sequence must be created for every musical symbol. These sequences contain a series of integer observations in the range 0 to m, corresponding to the observations in the conditional probability matrix shown in Figure 1. To create them, features were first extracted from the image to form feature vectors, and these vectors were then combined and normalized into sequences of observations. Features were extracted from the image using a sliding window with a width of 2 pixels [10]. For each window, six features were computed and normalized to a value between 0 and 1. These features were chosen based on the work by Pugin [10]. Feature 1 corresponds to the number of distinct connected components of black pixels and is computed as 1/(1 + n), where n is the number of distinct connected black zones. Features 2 and 3 calculate c_x and c_y, the gravity centers in x and y respectively. In each equation, let w be the width of the window, h the height of the window, and, for each distinct black zone i, let c_x^i and c_y^i be its x and y gravity centers and a_i its area. The computations for these features are shown in equations 1 and 2 respectively.

c_x = (Σ_{i=1}^{n} c_x^i a_i) / (A w)   (1)

c_y = (Σ_{i=1}^{n} c_y^i a_i) / (A h)   (2)

Feature 4 corresponds to the area of the largest black element, computed as a(n_i)/S, where a(n_i) is the area of the largest black element and S is the area of the window. Feature 5 corresponds to the area of the smallest white element, computed as a(n_j)/S, where a(n_j) is the area of the smallest white element. The final feature corresponds to the total area of the black elements. Since Pugin's work kept staff lines in the images, they introduced a weighting mask to accurately determine the total area of the black elements. Since we removed staff lines from our images, we simply computed this value as A/S, where A is the total area of the black elements. After feature extraction, the six features of each window were converted to an integer in the range 0 to m. This was done by averaging the six features, then computing the minimum and maximum averaged values across all windows. This allowed us to create bins for every number in the range 0 to 6 and place each averaged value in the appropriate bin, scaling it to the appropriate integer observation.

3.5 Expectation Maximization

Creating and training an HMM requires the Expectation Maximization (EM) algorithm. First, the type of HMM and the initial estimates of its parameters must be chosen. Different types of models are described in detail in the paper by Chen [3]. Based on Pugin's work [10], as well as work by Mohamed [8] in handwritten word recognition, we decided to use left-right HMMs, in which transitions can only occur from state i to state j where j >= i. Initial estimates were chosen to be uniform, as these have been shown to work well [11]. Once the model is chosen, the parameters of the HMMs have to be determined. The EM algorithm makes use of the forward-backward algorithm to compute the expectation.
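The feature-averaging and binning step at the end of Section 3.4 might be sketched as follows; the function name and toy feature values are illustrative, not taken from the paper's software:

```python
import numpy as np

def features_to_observations(feature_vectors: np.ndarray, m: int = 6) -> np.ndarray:
    """Average the six per-window features, then bin the averages into
    integers 0..m using the min/max over all windows (Section 3.4)."""
    avgs = feature_vectors.mean(axis=1)        # one average per window
    lo, hi = avgs.min(), avgs.max()
    if hi == lo:                               # degenerate: all windows equal
        return np.zeros(len(avgs), dtype=int)
    # Scale each average into [0, m] and round down to an integer bin.
    obs = np.floor((avgs - lo) / (hi - lo) * m).astype(int)
    return np.clip(obs, 0, m)

# Toy run: 4 windows x 6 features, each already normalized to [0, 1].
windows = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
])
print(features_to_observations(windows))   # [0 3 1 6]
```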
We will begin with an explanation of the forward-backward algorithm.

Expectation. In the forward-backward algorithm, the conditional probabilities are estimated given a sequence of observations. It does this in two passes. First, it computes the forward probabilities: the likelihood of being in a state given the first k observations. Second, it computes the backward probabilities: the likelihood of being in a state and seeing the remaining observations from the k-th observation onward. Figure 8 illustrates the process of computing the forward probabilities for an HMM with 2 states. For each node S_i, α_it is the probability of being in S_i having observed t observations. For each transition, p_jit is computed from the probability of being in state i given observation t and the probability of transitioning from S_j to S_i, where s stands for Start. The backward probabilities are computed the same

way as the forward probabilities, traveling through the observation sequence backwards.

Figure 8: Forward-Backward Algorithm Diagram

The algorithms for the forward and backward passes are explained in the paper by Eisner [4]. Once the forward and backward probabilities are computed, they are used to compute temporary values for the maximization step. These temporary variables are γ and ξ, computed using the following equations:

γ_it = (α_it β_it) / (Σ_{l=1}^{N} α_lt β_lt)

ξ_ijt = (α_it a_ij b_{j(t+1)} β_{j(t+1)}) / (Σ_{l=1}^{N} α_lt β_lt)

where N is the number of states, T is the last observation, and a and b stand for the transitional and conditional matrices respectively.

Maximization. Using the temporary variables γ and ξ computed above for each sequence, the EM algorithm iteratively re-estimates the three HMM matrices. This is done by running the forward-backward algorithm on each observation sequence and storing the temporary variables computed from each. Using these temporary variables, the three matrices are re-estimated with the following equations:

π_i = γ_i(0)

a_ij = (Σ_{t=1}^{T-1} ξ_ijt) / (Σ_{t=1}^{T-1} γ_it)

b_i(k) = (Σ_{t=1}^{T} γ_it bin(e_t, k)) / (Σ_{t=1}^{T} γ_it)
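A minimal, unscaled numpy sketch of one expectation and one maximization step as described in Section 3.5. This is illustrative only: a practical implementation would rescale α and β to avoid floating-point underflow and would average the re-estimates over many observation sequences, as the text describes:

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Expectation step: forward/backward passes plus the temporary
    variables gamma and xi. Unscaled, for clarity only."""
    n, T = len(pi), len(obs)
    alpha = np.zeros((T, n))
    beta = np.zeros((T, n))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                        # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):               # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)    # gamma_it
    xi = np.zeros((T - 1, n, n))
    for t in range(T - 1):                       # xi_ijt
        x = alpha[t, :, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] = x / x.sum()
    return alpha, gamma, xi

def reestimate(gamma, xi, obs, m):
    """Maximization step: re-estimate pi, A, and B from gamma and xi."""
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros((gamma.shape[1], m))
    for k in range(m):                           # bin(e_t, k) indicator
        B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new

# Toy left-right model with 2 states and 3 possible observations.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 2]
alpha, gamma, xi = forward_backward(pi, A, B, obs)
pi2, A2, B2 = reestimate(gamma, xi, obs, m=3)
# The re-estimated matrices are still valid probability distributions.
print(np.allclose(pi2.sum(), 1.0), np.allclose(A2.sum(axis=1), 1.0))
```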

In the equation for b, bin(e_t, k) is 1 if e_t = k and 0 otherwise, and π is the starting probability matrix. Each observation sequence produces its own re-estimation of the matrices. These re-estimations are then combined by averaging the values of each matrix. The full EM process, running the forward-backward algorithm and then re-estimating the matrices, is repeated until the log-likelihood converges, resulting in one final set of matrices. The log-likelihood is computed by summing the values in the forward step of the forward-backward algorithm, taking the log of these sums, and then summing the logs over all observation sequences. As the matrices are re-estimated, these values should converge, indicating that our probabilities lead to smooth transitions.

3.6 Viterbi

Once the HMMs have been created for each musical symbol, the Viterbi algorithm is used to determine how closely a symbol follows a given hidden Markov model. This is done in a manner similar to the forward-backward algorithm. Figure 9 shows the Viterbi trellis diagram for a hidden Markov model with 2 states, derived from the paper by Levy [6].

Figure 9: Viterbi Algorithm Diagram

For each node S_i, α_it is the probability of the most likely path through the trellis ending at observation t in S_i. This is computed similarly to the forward-backward algorithm, except that instead of summing the incoming transitions α_it p_jit, the maximum over the transitions is taken.
The maximum value at the end state determines how closely a symbol follows the model [6]. Using the Viterbi algorithm, we then tested our models. For each image, we ran the Viterbi algorithm with the model for each type of symbol; the model that produced the highest probability was returned, indicating that the image was most likely that type of symbol.
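The Viterbi scoring and the pick-the-best-model test described above can be sketched as follows; the two toy models are illustrative, not trained on real symbol data:

```python
import numpy as np

def viterbi_score(pi, A, B, obs):
    """Probability of the single most likely state path for `obs`,
    computed with max in place of the forward algorithm's sum."""
    delta = pi * B[:, obs[0]]
    for o in obs[1:]:
        # For each next state, keep only the best incoming transition.
        delta = (delta[:, None] * A).max(axis=0) * B[:, o]
    return delta.max()

def classify(models, obs):
    """Return the symbol whose HMM gives `obs` the highest Viterbi score,
    mirroring the testing procedure of Section 3.6."""
    return max(models, key=lambda name: viterbi_score(*models[name], obs))

# Two toy 2-state models: one prefers low observations, one prefers high.
pi = np.array([0.5, 0.5])
A = np.array([[0.8, 0.2],
              [0.2, 0.8]])
low = (pi, A, np.array([[0.8, 0.2], [0.6, 0.4]]))
high = (pi, A, np.array([[0.2, 0.8], [0.4, 0.6]]))
models = {"flat": low, "sharp": high}
print(classify(models, [0, 0, 0]))   # flat
print(classify(models, [1, 1, 1]))   # sharp
```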

Figure 10: Per-symbol results (columns: Symbol, # of Symbols, Correctly Identified, Percentage; rows: Quarter Rest, Natural, Treble Clef, Flat, Sharp)

4 Results

We successfully created a simplistic OMR with the ability to extract individual musical symbols from an image of sheet music. These symbols can then, with the created software, be converted into a series of observations for use by the EM algorithm and the Viterbi algorithm. The created OMR software is available at research/. The software was successful in generating accurate HMMs for various musical symbols and testing them. HMMs were generated for quarter rests, naturals, treble clefs, sharps, and flats, using the EM algorithm as detailed above. Other musical symbols were not considered for a variety of reasons. Notes were not included because our segmentation program does not handle chords, and we were unsure whether multiple notes versus single notes would be problematic. The bass clef was not considered due to the nature of the clef, which has three distinct pieces; since our segmentation program broke the symbol up, we did not include it. The remaining common musical symbols were not included because of a limited number of samples, which would not have yielded statistically significant results. The trained HMMs were then tested using the Viterbi algorithm, similarly to Rebelo [12]. We ran the Viterbi algorithm in stages, once for each set of observation sequences of the same musical symbol, and counted the number of correct identifications; this automated the counting process. The results are shown in Figure 10. From the data, it is clear that the HMMs worked well for all of the trained symbols and outperformed the HMM in the comparative study by Rebelo [12] (see Figure 2 for comparison). Thus we have shown that different implementations of HMMs can lead to higher performance.
As we also had a greater amount of training data, this could indicate that more training data does improve performance, as hypothesized. However, since Rebelo also included deformations in their data set, this could instead be the reason for the difference in performance.

5 Future Work

The goal of this research was to accurately identify a symbol extracted from a piece of sheet music. For the symbols chosen, this was done successfully. However, there may still be potential to improve the models, to bring their performance to the level of other algorithms. We have currently been using six states for the Markov models, based on our number of observations, as stated in the work by Pugin [10]. A change in the number of states may improve the performance of the models. Thus a goal for future work could be to run the program with a variety of state counts and determine the number of states that leads to the highest HMM performance. Changing the types of features computed may also increase the performance of the HMMs, although this seems unlikely, as Pugin found that the six features used were optimal for their OMR [10]. Still, an investigation into other possible features might be useful. This investigation is more complicated than optimizing the number of states, as it requires reworking the feature extraction part of the program; however, it is certainly feasible, and if changing the number of states does not yield better performance, this would be something to investigate. A secondary goal for future work would be to create a fully working OMR. Our OMR can currently only segment sheet music and then identify a given musical symbol from it. A full OMR not only identifies symbols but correctly interprets an entire piece of music. Thus future work could entail adding a component that recreates the piece of sheet music digitally once the symbols have been identified. To do so, we would need a digital language that can accurately represent any given piece of sheet music, and then store a given piece in this form. The most common digital language for music currently is MusicXML.
While there are already OMRs that can do this, incorporating this component would make the software more complete. We would also like the program to be able to segment a greater variety of sheet music. Currently it is assumed that the staff lines are completely horizontal with no deviation, and that the symbols are not blurred in any way, as mentioned previously. Thus future work could include making the software more robust, so that it can handle scanned sheet music that is tilted or blurred. The number of symbols that the program can accurately identify is also limited, so more hidden Markov models should be built to encompass all symbols that could be encountered. This would require further improvements to segmentation: breaking chords up into single notes and keeping the bass clef connected. More data would also need to be acquired to produce the other models. Breaking up chords will likely require applying a vertical histogram to the chord to determine where the individual notes are and split them up accordingly; this is outside the scope of this research.

6 Conclusion

The software created for this research is currently functional but has room for improvement. The pre-processing stage successfully removes staff lines and aligns the piece by musical staves. The segmentation process accurately segments all symbols, including beams, but

does not break up chords into their individual notes. However, this is sufficient for testing the HMMs. The feature extraction on each segmented symbol is also fully functional, producing integer observation sequences, as are the two algorithms required for this research: EM and Viterbi. The evaluation of our program shows that HMMs work quite well for all of the symbols used, with accuracy above 95%. While more work can be done to further improve the models, the data collected shows that HMMs can be a viable algorithm for OMRs, with better performance than prior work. Overall, we successfully implemented a working simplistic OMR and fully completed the identification process.

References

[1] N. Arica and F. T. Yarman-Vural. An overview of character recognition focused on off-line handwriting. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 31(2), May.

[2] David Bainbridge and Tim Bell. The challenge of optical music recognition. Computers and the Humanities, 35(2):95-121, 2001.

[3] Mou-Yen Chen, A. Kundu, and Jian Zhou. Off-line handwritten word recognition using a hidden Markov model type stochastic network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5), May.

[4] Jason Eisner. An interactive spreadsheet for teaching the forward-backward algorithm. In Dragomir Radev and Chris Brew, editors, Proceedings of the ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL, pages 10-18, Philadelphia, July.

[5] Susan Ella George. Visual Perception of Music Notation: On-Line and Off-Line Recognition. IRM Press, Hershey, PA.

[6] Roger Levy. Lecture notes on Linguistics/CSE 256: Hidden Markov model inference with the Viterbi algorithm: a mini-example, Winter.

[7] S. Marinai and P. Nesi. Projection based segmentation of musical sheets. In Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99), Sep.

[8] M. Mohamed and P. Gader.
Handwritten word recognition using segmentation-free hidden Markov modeling and segmentation-based dynamic programming techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(5), May.

[9] Jiří Novotný and Jaroslav Pokorný. Introduction to optical music recognition: Overview and practical challenges. In DATESO.

[10] Laurent Pugin. Optical music recognition of early typographic prints using hidden Markov models. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR), Victoria, BC, Canada, October 2006. ismir.net/papers/ismir06152_paper.pdf.

[11] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Readings in Speech Recognition. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

[12] A. Rebelo, G. Capela, and Jaime S. Cardoso. Optical recognition of music symbols. International Journal on Document Analysis and Recognition (IJDAR), 13(1):19-31.

[13] Ana Rebelo, Ichiro Fujinaga, Filipe Paszkiewicz, Andre R. S. Marcal, Carlos Guedes, and Jaime S. Cardoso. Optical music recognition: state-of-the-art and open issues. International Journal of Multimedia Information Retrieval, 1(3).


More information

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 12: Deep Reinforcement Learning Types of Learning Supervised training Learning from the teacher Training data includes

More information

Character Recognition Using Matlab s Neural Network Toolbox

Character Recognition Using Matlab s Neural Network Toolbox Character Recognition Using Matlab s Neural Network Toolbox Kauleshwar Prasad, Devvrat C. Nigam, Ashmika Lakhotiya and Dheeren Umre B.I.T Durg, India Kauleshwarprasad2gmail.com, devnigam24@gmail.com,ashmika22@gmail.com,

More information

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points] CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, 2015. 11:59pm, PDF to Canvas [100 points] Instructions. Please write up your responses to the following problems clearly and concisely.

More information

Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework

More information

Research on Emotion Recognition for Facial Expression Images Based on Hidden Markov Model

Research on Emotion Recognition for Facial Expression Images Based on Hidden Markov Model e-issn: 2349-9745 p-issn: 2393-8161 Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com Research on Emotion Recognition for

More information