Research Proposal: Coordinating Knowledge Within an Optical Music Recognition System
J. R. McPherson March, 2001
Introduction to Optical Music Recognition
Optical Music Recognition (OMR), sometimes also called musical score recognition or simply score recognition, is the process of automatically extracting musical meaning from a printed musical score. Music notation provides a rich description of the composer's ideas, but ultimately sheet music is open to some degree of interpretation by performers. Performance considerations aside, the advantages of a computerised representation of a musical score are numerous. These include:
• the ability to automatically transpose a particular instrument's part;
• converting the representation to other musical formats or notations, such as for Braille-reading machines, for various software packages, or for re-typesetting a score published in an outdated fashion;
• allowing musicians to read the music from a computer display, for example to eliminate the need for page turns [GWMD96, McP99];
• a form of compression, resulting in smaller data sizes [BI98];
• ease of sharing and archiving;
• increased ease of editing (using appropriate software), aiding in composition; and
• automatic indexing and retrieval of information [MSBW97].
General framework for OMR
The automated process of extracting musical meaning from sheet music normally follows a number of specialised steps, performed in a fixed order. The first step is to acquire a digital form of the sheet music that a computer can access. Today, this step is fairly easy, with the widespread availability of cheap scanner hardware that can create both colour and monochrome digital images at a resolution of three hundred dots per inch or higher, which is more than adequate for our processing purposes.
The second step is to apply various image processing techniques to the acquired image. This is necessary to recognise the symbols that make up the page — for example, lines and note heads. This step is the hardest, and is often broken up into two or more separate steps. The final step is to determine the musical meaning (also called the musical semantics) of the image, based on the objects found in the previous step. In Common Music Notation (CMN), for example, objects like notes and rests have musical qualities such as pitch, volume and duration; objects such as slurs, accents and trills affect individual notes; and objects such as tempo markings, key signatures and time signatures affect the notes that follow.
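The mapping from recognised objects to musical qualities can be sketched for the simplest case, pitch. The following is an illustrative sketch only (the function name, the step encoding, and the data layout are assumptions, not CANTOR's actual data structures): a note head's pitch follows from its vertical position on the staff and the current clef.

```python
# Hypothetical sketch of one part of the semantics step: deriving a
# pitch name from a note head's vertical staff position. Names and the
# step encoding are illustrative assumptions.

NOTE_NAMES = ["C", "D", "E", "F", "G", "A", "B"]

# Diatonic step index of the bottom staff line for two common clefs
# (treble bottom line = E4, bass bottom line = G2).
CLEF_BOTTOM_LINE = {"treble": 4 * 7 + 2,  # E4
                    "bass":   2 * 7 + 4}  # G2

def pitch_from_staff_position(clef: str, steps_above_bottom_line: int) -> str:
    """Map a vertical staff position (in diatonic steps above the bottom
    line, lines and spaces each counting one step) to a pitch name."""
    index = CLEF_BOTTOM_LINE[clef] + steps_above_bottom_line
    octave, degree = divmod(index, 7)
    return f"{NOTE_NAMES[degree]}{octave}"
```

A real semantics module would of course also track accidentals, key signatures and clef changes; the sketch shows only the positional core of the calculation.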
Background and Starting Base
Common Music Notation (CMN), also called Western staﬀ notation or Western music notation, is the notation most widely used today — an example of CMN is shown in Figure 1. Other music notations include guitar tablature, plainsong notation, sacred harp notation, and various Asian, African and Indian musical notations.
Figure 1: A sample of Common Music Notation: Handel's Sonata V for flute and piano.
Ideally, an OMR system should not be limited to any particular set of symbols. It should be possible to add rules that allow the system to 'understand' a new notation without making significant internal changes to the system. This is referred to as extensibility. Bainbridge's CANTOR system [Bai97] was one of the first fully extensible optical music recognition systems developed. Most prior work was limited to small subsets of CMN, and often made assumptions about staff lines, such as assuming there are always five lines per staff. While CANTOR still has the restriction that the music must be stave-based, there can be an arbitrary number of lines per staff. Here, extensible refers to the design goal of a system with no hard-coded shapes built into it. This research led to the formation of Primela — a Primitive Expression Language for describing specific musical shapes. A set of Primela descriptions can be written to describe a particular music notation and then loaded at run-time to process an image.
CANTOR consists of four main steps:
• Staff line identification, which locates staves, removes staff lines and locates objects in the bitmap;
• Primitive recognition, which identifies basic shapes, such as (for the CMN Primela descriptions) slurs, noteheads, tails, accidentals, and lines;
• Primitive assembly, which joins the basic primitives found into musical objects — for example, noteheads, stems and tails into a note; and
• Musical semantics, which determines musical qualities such as pitch and duration of the musical objects found, and can output various musical file formats.
Areas of Research
Most current projects in the field of OMR are concerned with improving the accuracy of the various components, particularly the pattern recognition stages. Instead of focusing solely on individual components, I wish to research and create methods that improve the overall system: not merely by improving components in isolation, but by improving how they interact with each other so as to maximise the amount of musical information gained from the image. Part of my research will involve determining and evaluating appropriate methods for the process controlling this interaction, known as the coordinator.
Coordinating interaction between components
Determining how best to coordinate the information received from the OMR components will be the main area of focus for the thesis. Figure 2 shows how most current systems operate. The different phases of the OMR system are performed in a linear sequence, and each phase's output becomes the next phase's input. This also means that each phase is tightly coupled to both the previous and following one, as they must share common data structures and formats.
[Figure 2 depicts a linear sequence: scanned music → staff line identification → image enhancement → musical object location → image enhancement → musical feature classification → musical semantics → encoded music data file, with musical knowledge informing the later stages.]
Figure 2: The current "pipeline" approach
However, this model has some limitations. Most seriously, errors made in an early step will propagate through the following steps. For example, when performing musical semantics analysis on the recognised components, an error may be detected, such as a bar of music not having enough (or too many) notes in it. Because this type of error cannot be corrected within the current context, the system is forced to output something that it knows is not quite right. (Some errors, however, such as a missing or mis-detected accidental in a key signature, could conceivably be corrected in this context.) The system's overall accuracy would be improved by using this newly-gained context to re-perform a previous stage, and hopefully correct the error given the new information.
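The one-way nature of the pipeline model can be made concrete with a small sketch. The stage names and data shapes below are placeholders, not the actual phase interfaces; the point is only that each stage consumes the previous stage's output, so a mistake made early on can never be revisited.

```python
# A minimal sketch of the conventional "pipeline" model of Figure 2.
# Stage names and payloads are illustrative placeholders.

def run_pipeline(image, stages):
    """Apply each stage to the output of the one before it."""
    data = image
    for stage in stages:
        data = stage(data)   # no way to go back once a stage has run
    return data

# Placeholder stages standing in for the real OMR phases.
stages = [
    lambda img: {"image": img, "staves": ["staff1", "staff2"]},
    lambda d: {**d, "primitives": ["notehead", "stem", "flat"]},
    lambda d: {**d, "objects": ["quarter-note", "flat"]},
    lambda d: {**d, "semantics": "encoded music"},
]
```

If the second stage mis-classifies a primitive, every later stage simply inherits the error; there is no mechanism for the semantics stage to ask for the classification to be redone.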
Figure 3: The proposed "coordinated" approach
Figure 3 shows a possible revised framework that allows feedback to earlier stages. All execution is controlled by a coordinating process — the modules cannot communicate directly. The idea here is that the top-level process controls the flow of execution, based on a number of variables. Part of the research is to determine the choice of variables used to control program flow, and what effect these variables have on both the performance and the run-time behaviour of the system.

This type of framework would also encourage looser coupling between the various components. Loosely coupled components would allow, for example, the addition of several 'competing' components capable of performing the same or similar steps, whose results could be compared for discrepancies by the coordinator. This would provide either more confidence that the results are right, if the different components agree, or particular areas that should be further examined, if the results conflict. Another advantage is that this framework allows modules that do not directly perform any music processing but still provide additional context. An example is a component that detects the scan quality (perhaps from the level of noise in the bitmap); if the quality is low, then tolerances could be lowered, or a set of descriptions specifically designed for noisy data could be used.
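The essential contract of the coordinated model can be sketched as follows. This is a speculative sketch, not the prototype's actual interface: the class and message names are invented for illustration, and the only property it demonstrates is that modules exchange requests solely through the coordinator, which may also reject a request it cannot route.

```python
# Sketch of the coordinated model of Figure 3: modules never call each
# other directly; all requests flow through a coordinating process.
# Module and message names are hypothetical.

class Coordinator:
    def __init__(self):
        self.handlers = {}          # request type -> module callable

    def register(self, request_type, handler):
        """A module declares which request types it can fulfil."""
        self.handlers[request_type] = handler

    def request(self, request_type, payload):
        """Route a request to a capable module, if one is registered."""
        handler = self.handlers.get(request_type)
        if handler is None:
            return None             # request rejected; caller carries on
        return handler(payload)

coord = Coordinator()
# e.g. the pattern recognition module offers to re-examine a primitive:
coord.register("reclassify",
               lambda p: {"shape": p["shape"], "accepted": False})

# e.g. primitive assembly asks for a suspect 'flat' to be re-examined:
reply = coord.request("reclassify", {"shape": "flat-candidate"})
```

Because the caller must tolerate a `None` reply, a module whose request is rejected can continue processing regardless, which matches the behaviour described for the prototype later in this proposal.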
I would like to devote some effort to investigating and/or designing algorithms that use a priori knowledge to determine possible "object types" before using the lower-level recognition subsystems, such as staff location or character/text recognition. This more general area of research is known as document image analysis, and there are techniques that might be researched and improved with respect to the OMR domain. This could involve the system keeping a history of processed documents, to aid in predicting the layout of future documents, and using prior knowledge to decide that there may be a title and author somewhere
near the top of the page. The proposed coordinated approach for the OMR system could then decide whether or not to test this hypothesis given knowledge gained about this area of the page from other sources.
Classiﬁcation Algorithms for feature extraction
One of the more recent developments in the field of OMR is the use of machine-learning techniques to develop shape descriptions, given a set of training data [Ala95, BAD99, SD98]. These techniques could be investigated to design feature sets for classifying musical primitives, for either the current Primela framework or some new, replacement method for differentiating objects.
Illustration of the Concept
There is currently an existing prototype — based on the CANTOR code and a work in progress — capable of using message passing to provide feedback from a particular phase to earlier phases. While not yet very advanced, the following example demonstrates the potential improvement that the methods under investigation may offer.

Figure 4(a) shows a small extract from the Clarinet Concerto by Mozart. This extract is from the pianist's part, and also has the clarinetist's part displayed above the piano stave. This incidentally also demonstrates how OMR must be able to deal with symbols at different scales within the same piece. Figures 4(b) and 4(c) show the vertical lines and the flats respectively that were found by CANTOR in the pattern recognition stage. There are some errors in both of these classifications.

There are a few mis-identified vertical lines: the time signature (6/8) was just broken enough to pass as two vertical lines. The musical semantics modules could pick up that there was no time signature yet there were extra vertical lines where a time signature might be expected, and allow the system to re-examine this area. Also, the two letter 'l's of the word "Allegro" were not unreasonably determined to be vertical lines, as they were close enough to the staff to be checked. However, they are unlikely to have any musical meaning in CMN, and are also close to other textual characters.

There are four naturals in the extract that were determined to be flats, due to the default descriptions used. This could be solved by writing Primela descriptions that correctly differentiate between flats and naturals for the particular fonts used in this piece of music, but it would be more elegant to correct these automatically with semantic analysis, by noticing that accidentals rarely appear that have no effect on the note, given either the last occurring key signature or an accidental on the same note earlier in the same bar.
Unfortunately, in this particular case there are also missing flats in the key signatures of two of the staves. These could also be picked up using semantic analysis, by noticing that one staff did have a key signature, so the others probably will as well. This, coupled with the fact that there will be unrecognised objects in the position where a key signature could be expected, should provide enough context that the recognition stage should look there again for a key signature. Lastly, for whatever reason the first chord in the second bar did not have a note stem recognised as a vertical line — see the circled area within each figure
to locate this object. (CANTOR currently checks for vertical lines before checking for accidentals, although this order is user-defined in the Primela descriptions.) Because of this, the shape passed the tests as possibly being a flat. This is as far as CANTOR goes.

However, when the prototype system assembles the primitives together, it notices that this particular flat does not have a notehead in the appropriate position to its immediate right. The primitive assembly module then issues a request to the coordinator to check this primitive's classification again. Note that if the request is rejected, the primitive assembly stage has already been completed, and processing can continue regardless. The coordinator determines that the pattern recognition module is capable of fulfilling this request, so passes the request to it. This stage now takes account of the new context, and subsequently rejects the shape as possibly being a flat (Figure 4(d)). Currently this context (that is, the fact that the primitive could not be assembled) is accounted for by re-testing the object for the same classification, but with a higher threshold for passing. While this may seem like a small step, it can have an impact on the final output — this is the difference between the music as written and an incorrect note resulting in a discord. Unfortunately, the prototype does not yet use this new context to correctly identify the shape, in this case as a vertical line. The prototype system also does not currently perform semantic analysis.

As the above discussion shows, there are plenty of opportunities to use musical context for improvement in the recognition stages. The key will be finding a generalised approach to this task.
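The re-testing step just described can be sketched briefly. The scoring function, threshold values and the borderline score are illustrative assumptions; the sketch only shows the mechanism: a candidate that cleared the default threshold is re-scored against a stricter one once assembly has cast doubt on it.

```python
# Sketch of re-testing a doubtful primitive with a higher threshold.
# Threshold values and the match score are illustrative assumptions.

DEFAULT_THRESHOLD = 0.6
RETEST_THRESHOLD = 0.8   # stricter bar when assembly casts doubt

def classify(match_score: float, retest: bool = False) -> bool:
    """Accept a candidate primitive if its match score clears the
    threshold; a re-test triggered by failed assembly uses a higher one."""
    threshold = RETEST_THRESHOLD if retest else DEFAULT_THRESHOLD
    return match_score >= threshold

# A borderline 'flat' candidate passes the first time, but is rejected
# when re-tested after the coordinator relays the assembly failure.
first_pass = classify(0.65)                 # accepted
after_doubt = classify(0.65, retest=True)   # rejected
```

Raising the threshold is only one possible use of the new context; a more general coordinator might instead re-test the object against competing classifications, such as the vertical line it actually is in this example.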
Intended Schedule and Requirements
This research will be carried out using existing equipment within the department. No extra computing (or other) resources are expected to be required. The following is an estimate of the work likely to be completed. Depending on the progress made during these tasks, other work, such as that mentioned in Sections 2.2 and 2.3, might be undertaken. Also, new developments by other researchers may cause a change in direction or scope for this research.

  Task                                                      Months
  Continue research, complete first prototype.                   6
  Experimentation with prototype                                 2
  Write-up methods, ideas and findings.                          1
  Investigate and create other coordinators                     12
  Comparisons between coordinators and other OMR systems         3
  Completion of write-up                                         5
  Total:                                                        29

Note that some work has previously been done during enrolment for a Masters degree since July 2000. There are currently no foreseen ethical issues arising from this research. If at a later date it is necessary to perform evaluation studies on various methods and/or software, then ethical approval from the school's Ethics Committee will be sought.
(a) The starting image
(b) The vertical lines found by CANTOR
(c) The ﬂats found by CANTOR
(d) The ﬂats found by CANTOR with coordination
Figure 4: Part of the ﬁrst line of the Rondo from Mozart’s Clarinet Concerto, with area of interest circled
[Ala95] Jarmo T. Alander. Indexed bibliography of genetic algorithms in optics and image processing. Report 94-1-OPTICS, University of Vaasa, Department of Information Technology and Production Economics, 1995. ftp.uwasa.fi/cs/report94-1/gaOPTICSbib.ps.Z.
[BAD99] Bruce A. Draper, Jose Bins, and Kyungim Baek. ADORE: Adaptive object recognition. In Proceedings of the International Conference on Vision Systems, pages 522–537, Las Palmas de Gran Canaria, Spain, Jan 1999.
[Bai97] David Bainbridge. Extensible Optical Music Recognition. PhD thesis, University of Canterbury, Christchurch, New Zealand, 1997.
[BI98] David Bainbridge and Stuart Inglis. Musical image compression. In Proceedings of the IEEE Data Compression Conference, pages 209–218, Snowbird, Utah, 1998. IEEE.
[GWMD96] Christopher Graefe, Derek Wahila, Justin Maguire, and Orya Dasna. Designing the muse: A digital music stand for the symphony musician. In Proceedings of the CHI ’96 Conference on Human factors in computing systems, page 436, Vancouver, Canada, 1996. ACM. [McP99] J. R. McPherson. Page turning — score automation for musicians. B.Sc Honours thesis, University of Canterbury, New Zealand, 1999.
[MSBW97] Rodger J. McNab, Lloyd A. Smith, David Bainbridge, and Ian H. Witten. The New Zealand Digital Library MELody inDEX, May 1997.
[SD98] Marc Vuilleumier Stückelberg and David Doermann. On musical score recognition using probabilistic reasoning. In Proceedings of the Fifth International Conference on Document Analysis and Recognition, ICDAR '98. IEEE, 1998.