Aspects of Mathematical Handwriting Recognition

Friday, 11 July, 2014 - 14:30

Mathematical handwriting poses a number of challenges beyond those of recognizing handwritten natural language. For example, symbols are commonly drawn from several different alphabets, and many symbols look alike. Mathematical notation is two-dimensional, so the size and placement of symbols carry meaning. Additionally, there is no fixed vocabulary of mathematical "words" to disambiguate symbol sequences. On the other hand, there are some simplifications: for example, symbols tend to be well segmented.

We present a geometric theory that we have found useful in recognizing mathematical handwriting. We represent symbols as parametric curves approximated by certain truncated orthogonal series. The Euclidean distance in this coefficient space is closely related to the variational integral between two curves and may be used to find similar symbols very efficiently. Training sets with hundreds of classes turn out to be almost linearly separable, allowing classification by ensembles of linear SVMs. In this setting, we find it particularly effective to classify symbols by their distance to the convex hulls of nearest neighbors from known classes. Additionally, by choosing the functional basis appropriately, the series coefficients can be computed as each symbol is being written. Using truncated series for integral invariant functions yields orientation-independent recognition. The beauty of this theory is that a single, coherent view provides several related geometric techniques that give a high recognition rate and do not rely on peculiarities of the symbol set. Time permitting, we will end with a demonstration of a browser tool for pen-based collaboration.
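As a rough illustration of representing a curve by a truncated orthogonal series, the Python sketch below describes a stroke (x(t), y(t)) by its coefficients in an orthonormal Legendre basis and compares two strokes by the Euclidean distance between coefficient vectors. The Legendre basis, the normalization of the parameter t to [-1, 1], the Simpson's-rule quadrature, and the names `legendre_orthonormal`, `series_coefficients`, `curve_features`, and `feature_distance` are all assumptions made for this illustration; they are not necessarily the basis or method used in the talk. Because the basis is orthonormal, Parseval's identity makes the coefficient distance equal to the L2 distance between the truncated curves, which is one concrete form of the relation described above.

```python
import math
from typing import Callable, List

# Assumption: a stroke is given as coordinate functions x(t), y(t) with the
# parameter t already normalized to [-1, 1].
Coord = Callable[[float], float]


def legendre_orthonormal(n_max: int, t: float) -> List[float]:
    """Return p_0(t), ..., p_{n_max}(t), the Legendre polynomials scaled so that
    the integral of p_m * p_n over [-1, 1] is 1 when m == n and 0 otherwise."""
    P = [1.0, t]
    for n in range(1, n_max):
        # Bonnet recurrence: (n + 1) P_{n+1} = (2n + 1) t P_n - n P_{n-1}
        P.append(((2 * n + 1) * t * P[n] - n * P[n - 1]) / (n + 1))
    return [math.sqrt((2 * n + 1) / 2) * P[n] for n in range(n_max + 1)]


def series_coefficients(f: Coord, n_max: int, intervals: int = 2000) -> List[float]:
    """Approximate c_n = integral of f(t) * p_n(t) over [-1, 1], n = 0..n_max,
    using composite Simpson's rule on `intervals` equal subintervals."""
    if intervals % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = 2.0 / intervals
    sums = [0.0] * (n_max + 1)
    for i in range(intervals + 1):
        t = -1.0 + i * h
        # Simpson weights: 1, 4, 2, 4, ..., 2, 4, 1
        w = 1 if i in (0, intervals) else (4 if i % 2 else 2)
        ft = f(t)
        for n, pn in enumerate(legendre_orthonormal(n_max, t)):
            sums[n] += w * ft * pn
    return [s * h / 3 for s in sums]


def curve_features(x: Coord, y: Coord, n_max: int) -> List[float]:
    """Concatenate the truncated series coefficients of x(t) and y(t)."""
    return series_coefficients(x, n_max) + series_coefficients(y, n_max)


def feature_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two coefficient vectors. With an orthonormal
    basis this equals (up to quadrature error) the L2 distance between the
    truncated curves."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


if __name__ == "__main__":
    # Two hypothetical strokes that differ by 0.1 * t**3 in the y coordinate.
    a = curve_features(lambda t: t, lambda t: t * t, n_max=5)
    b = curve_features(lambda t: t, lambda t: t * t + 0.1 * t ** 3, n_max=5)
    print(feature_distance(a, b))
```

In the usage example the strokes differ only by 0.1 t^3 in y, and with n_max of at least 3 that difference lies in the span of the basis, so the printed distance should agree with the exact L2 distance 0.1 * sqrt(2/7), approximately 0.0535, up to quadrature error. Computing the coefficients in a batch after the stroke is complete, as here, is simpler than, and distinct from, the incremental computation during writing mentioned in the abstract.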


Stephen Watt, University of Western Ontario, Canada