«Author............................................................. Department of ...»
Recognition of Handwritten Mathematical Expressions
Nicholas E. Matsakis
Submitted to the Department of Electrical Engineering and
in partial fulﬁllment of the requirements for the degrees of
Bachelor of Science in Electrical Engineering and Computer Science
Master of Engineering in Electrical Engineering and Computer
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
c Nicholas E. Matsakis, MCMXCIX. All rights reserved.
The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part.
Department of Electrical Engineering and Computer Science May 21, 1999 Certiﬁed by........................................................
Paul A. Viola Associate Professor Thesis Supervisor Accepted by.......................................................
Arthur C. Smith Chairman, Department Committee on Graduate Students Recognition of Handwritten Mathematical Expressions by Nicholas E. Matsakis Submitted to the Department of Electrical Engineering and Computer Science on May 21, 1999, in partial fulﬁllment of the requirements for the degrees of Bachelor of Science in Electrical Engineering and Computer Science and Master of Engineering in Electrical Engineering and Computer Science Abstract In recent years, the recognition of handwritten mathematical expressions has re- cieved an increasing amount of attention in pattern recognition research. The di- versity of approaches to the problem and the lack of a commercially viable system, however, indicate that there is still much research to be done in this area. In this thesis, I will describe an on-line approach for converting a handwritten mathematical expression into an equivalent expression in a typesetting command language such as TEX or MathML, as well as a feedback-oriented user interface which can make errors more tolerable to the end user since they can be quickly corrected.
The three primary components of this system are a method for classifying isolated handwritten symbols, an algorithm for partitioning an expression into symbols, and an algorithm for converting a two-dimensional arrangements of symbols into a typeset expression. For symbol classiﬁcation, a Gaussian classiﬁer is used to rank order the interpretations of a set of strokes as a single symbol. To partition an expression, the values generated by the symbol classiﬁer are used to perform a constrained search of possible partitions for the one with the minimum summed cost. Finally, the expression is parsed using a simple geometric grammar.
Thesis Supervisor: Paul A. Viola Title: Associate Professor
I would like to thank my advisor Paul Viola for providing the motivation and the resources for this research to be done. In particular, I would like to thank him for his continual suggestions for improving system performance and for suggesting the minimum spanning tree constraint. I would also like to thank Erik Miller for his willingness to help me work through a stubborn problem, for his insight into the nature of mathematical expression recognition, and for quickly providing me with counter examples for every deep truth I ever thought I had divined.
In addition, I would like to thank John Winn for his help working through the stroke reversing heuristics, Victor Luchangco and Kevin Simmons for their helpful suggestions for improvements to the manuscript of this thesis, and Jeff Norris for being a supportive ofﬁcemate.
I would like also to thank my parents, Elias and Joanne Matsakis, siblings Chris and Antigone Matsakis, grandparents Nicholas and Theodora Matsakis and Christopher and Antigone Dafnides, Godparents Michael and Penny Pilafas, and the rest of my family for their unwavering support both during this work and in all my endeavors. Finally, I would like to express my deepest gratitude to Terri Iuzzolino for all the encouragement and support she has provided throughout this project, and in particular during the ﬁnal stages of writing.
1-1 Screen image of the recognition system................. 11 1-2 The effect of partitioning on interpretation................ 12 1-3 An expression correctly partitioned into symbols............ 13 2-1 Two characters represented by the same symbol............ 16 2-2 Two symbols representing the same character.............. 16 3-1 The Data Collection Program....................... 21 3-2 The Angles Used to Find a Canonical Stroke Ordering......... 24 3-3 Eigenvalues Associated with the Principal Components........ 27 3-4 Examples of Reconstructed Characters.................. 28 3-5 Gaussian Classiﬁer Confusion Matrix.................. 29 3-6 Symbols Commonly Confused by a Gaussian Classiﬁer........ 29 3-7 Two styles of writing the digit “5”.................... 30 3-8 Classiﬁer Generalization Performance.................. 31 4-1 An Expression with Script Characters.................. 33 4-2 An Expression Correctly Partitioned into Symbols........... 33 4-3 A typical minimum spanning tree.................... 38 4-4 Using a Minimum Spanning Tree to Partition.............. 39 4-5 The chain formation of a minimum spanning tree........... 40 4-6 The star formation of a minimum spanning tree............ 40
1.1 Motivation The problem of machine recognition of handwritten expressions has long been a focus of study in the ﬁeld of pattern recognition. Research in this area has been driven by a desire to combine the natural advantages of handwritten input, including a simple interface and a well-established stylistic vocabulary, with the data processing capabilities of computers. Recently, the problem has been approached with increased vigor with the advent of palmtop computers with pen interfaces, which possess enough processing resources to handle the demands of machine recognition. As a result, a number of commercially successful products are available which recognize a user’s natural handwriting and use this ability to perform simple tasks such as scheduling appointments and writing memos.
Most scientists and engineers, however, are unable to take advantage of these computers for their technical work due to the lack of effective algorithms for interpreting more complex handwritten expressions, particularly diagrams, graphs, equations, and other mathematical forms. While computers can store these forms as “digital ink,” the inability to work with the expressions in a meaningful way after they have been entered has prevented these systems from replacing pencil and paper. Compared to the effort put into the recognition of printed and cursive prose the recognition of more complex forms has received only minor attention in 9 pattern recognition research. In addition, the diversity of approaches to the problem and the lack of a commercially viable system indicate that there is still much research to be done in this area.
As more powerful computers with better displays and input devices become available, demand will increase substantially for software systems which can work with the type of handwritten data that one would ﬁnd in a research notebook or technical document. Mathematical expressions are a natural place to begin such research as they are critical to virtually all technical writing and there already exists a wide body of literature on recognizing handwritten letters and words, major subcomponents of these expressions. Combining mathematical expression recognition capabilities with existing algebra solving software, graphing programs, and simulation systems would be a ﬁrst step towards a superior user interface for doing technical work with a computer.
1.2 System Overview
In this thesis, I describe an on-line approach for converting a handwritten mathematical expression into an equivalent expression in a typesetting command language such as TEX or MathML. In addition, I describe a feedback-oriented user interface which renders errors in a recognition system more tolerable to the end user since they can be quickly detected and corrected. Figure 1-1 provides a screen image of the system.
The problem of interpreting an expression can be divided into three modular subproblems called isolated symbol classiﬁcation, expression partitioning and parsing.
This division has the advantage that each subproblem can be solved and its performance evaluated essentially independent of the others, so that improvements can be made in each area while still maintaining the integrity of the entire system.
In processing an expression, it is ﬁrst partitioned into symbols in a process called expression partitioning. The symbol classiﬁer is used in this process to evaluate the likelihood that particular strokes should be combined into symbols. The
resulting set of symbols is then parsed by assigning characters to the symbols and determining a structure for the expression, resulting in an interpretation of the expression as a typesetting command.
1.2.1 Isolated Symbol Classiﬁcation One of the basic problems in handwriting recognition is determining, out of context, which symbol is best represented by a set of strokes. For this task I created Gaussian models of a set of common symbols from examples of my handwriting.
At run time these models were used to rank possible interpretations of a new set of strokes.
1.2.2 Expression Partitioning
Handwritten expressions typically contain more than a single symbol, so the capabilities of the symbol classiﬁer need to be used within a larger framework to determine the quantity, location, and identity of the symbols in an expression. Therefore, the expression partitioning algorithm attempts to ﬁnd an optimal partitioning 11 of an expression’s strokes into a set of symbols.
The correct partitioning of an expression is not always obvious, even to a human reader. Consider the example given in ﬁgure 1-2. This diagram shows how the strokes in an ambigious expression may be partitioned into two different sets of symbols, which are indicated by the placement of a grey bounding box around the symbols. Depending on the partitioning of strokes, this particular expression could be interpreted as either 1 x or kx. A good partitioning algorithm needs to be able to consider both of these possibilities and determine which one is preferable according to some cost function.
An additive cost model is used to partition an expression, where the cost for considering that a particular set of strokes belongs to the same symbol comes from the values assigned those strokes by the symbol classiﬁer. The cost of a partition of a set of strokes, then, is the sum of the costs of the symbols in the partition.
In addition, a minimum spanning tree based in interstroke distances is used to constrain the set of partitions searched to a reasonable size.
1.2.3 Parsing Once the expression has been correctly partitioned into symbols, as in ﬁgure 1-3, there still remains the problem of determining which characters the symbols represent and deciding on the structure of the resulting typeset expression. This can be done using a geometric grammar whose elements are inspired by the atomic elements of TEX’s typesetting engine. In this grammar, each character belongs to a single grammar type and combines with other characters in well deﬁned ways based on simple relationships between their bounding boxes. In addition, simple characters also have a baseline which aids in determining when a character is superscripted.
1.2.4 User Interface
An accurate recognition algorithm still needs a good user interface if it is to be a viable alternative to pen and paper. An interface needs to be simple, since it may be used on a computer with only a stylus and touch screen for input. In addition, it should allow the user to quickly correct errors, because some expressions are ambiguous and people often make mistakes while writing long expressions. Finally, it is important for the interface to be able to provide immediate feedback to the user so that errors can be quickly detected.
13 The program’s user interface provides a number of features which allow the user to immediately detect and correct errors. One way of detecting potential errors is to set a recognition threshold for symbols, and change the color of any strokes that can not be assigned to a symbol with a cost below the threshold as they are written. Another way of detecting errors is to draw faint boxes around the partitioned symbols, so that the user immediately knows that a two or three stroke symbol has been recognized.
The system also provides two ways of correcting errors. The simplest is the ability to erase a set of strokes simply by scribbling over them with the stylus.
A small part of an expression can then be changed without effecting the rest of the expression. Another feature for error correction is a pop-up correction menu, which allow the user to change the character assigned to a symbol by specifying an alternative from a list of options. Using these features, expressions with minor interpretation errors can be quickly corrected.
2.1 Terminology Research in handwriting recognition has produced a rich glossary of terms pertaining to every aspect of handwriting, with different approaches often necessitating the use of different terminology. Generally, recognition systems can be classiﬁed as either on-line or off-line systems. In on-line systems , the writing is captured by the computer as the user writes on a pen tablet or other similar input device.