Reading #12: Constellation Models for Sketch Recognition (2006)

by D. Sharon and M. van de Panne (paper)

Comments: Francisco

A Constellation is a collection of objects arranged in a certain pattern, forming an image. Drawings can be imagined as constellations, especially with regards to common objects, such as the human face. Each face has the same features arranged in the same manner, with slight variances in size, shape, and location. However, several overall main relationships are always present: two eyes are on a horizontal line and are above the nose, for example.

The authors have used this concept for sketch recognition. Required drawing elements and their relative positions are set as features and classified using a maximum likelihood search. Many sketches were collected for training data, and common features were identified and labeled. 5 class values were defined, including faces, flowers, sailboats, airplanes, and characters.

_______

This is an interesting way to identify common objects within sketches. Since the authors use only 5 classes, I wonder if this technique will support many classes. Also, it would be interesting to see how detailed this can get to perform more intricate recognition, such as multiple types of faces, or even individuals.

Reading #11: LADDER, a sketching language for user interface developers (2007)

by Tracy Hammond and Randall Davis (paper)

Comments: Jonathan

LADDER is a language to "describe how sketched diagrams in a domain are drawn, displayed, and edited." It is intended to help interface developers create sketch-based interfaces. LADDER is used to create shape descriptions. Shapes consist of components (such as lines), constraints (such as intersections), aliases, editing, and display properties. LADDER descriptions allow domain-specific definitions that can be used with domain-independent recognizers.

The paper gives many examples of shapes that can be modeled using LADDER. Such shapes include arrows and UML diagrams.

There are many predefined shapes (such as point, line, curve, ellipse), constraints (such as perpendicular, collinear, tangent, larger, acute), orientation-dependent constraints (such as horizontal, negative slope, above, centered below), editing methods (such as click, draw, encircle), and display methods (such as original strokes, ideal strokes, circle, rectangle, text, image).

Some shapes can be made up of certain numbers of segments, while some shapes can be made up of infinite segments.

When recognition occurs, primitive shapes are recognized first. Shapes are generated that contain the original strokes and their interpretations. Some shapes contain sub-shapes. Once primitive shapes are recognized, domain-specific shapes, using the domain descriptions, are recognized.

__________

LADDER is a nice tool for interface developers. I like that it allows complex shapes to be completely described using a programming language. However, I do see certain drawbacks, namely that each shape must be explicitly defined in LADDER. The paper mentions the capability to generate descriptions based on drawn examples, and I think this would be a great idea. It would be very tedious to define explicit shapes for large domains (COA...). I don't know if it has been implemented yet.

Reading #10: Graphical Input Through Machine Recognition of Sketches (1976)

by Chrisopher F. Herot (paper)

Comments: Jonathan

This is an early sketch recognition system aiming to allow sketch input to computer programs. It contains a system called HUNCH that is used to recognize primitive sketches. It uses speed only to detect corners in a sketch. Curves were viewed "to be a special case of corners," and were modeled using b-splines. Speed was also used to determine how "careful" the user was, for instance faster strokes were less careful. This was used to help identify curves and decide whether to draw them as b-splines or just corners. The programs used to convert sketches to straight segments and curves were called STRAIT and CURVIT, respectively.

STRAIT and CURVIT did not always create the same interpretation as humans. They seemed to be user-dependent, as sketches were interpreted better for some users than for others. An improved method of straight segmenting was implemented, called STRAIN. It used a function of speed to determine which line endpoints to join (instead of a fixed distance, which is what STRAIT did).

Programs were also developed to detect overtraced lines and turn 2D sketches into 3D.

The paper discusses the importance of context in sketch recognition systems. The HUNCH system does not use context in its recognition schemes. For example, all its subroutines are always called in a fixed sequence and always perform recognition in the same way. However, similar strokes will probably be interpreted in different ways given their context.

The paper discusses an interactive system and examines the hierarchical structures of recognized sketches. It discusses various ways to tune the algorithms to work for a "truly interactive system."

__________

This is an early approach to sketch recognition, and it asks many questions as well as answers some. It can be rather dry and boring, but it does bring up some questions that are still relevant today and whose solutions can still be improved on.

For example, the latching problem: when should close endpoints be merged? It depends on the context, which is another contemporary issue. While I was working on my truss recognizer, I dealt with the latching problem when merging close line endpoints to form truss nodes. I used a distance computed based on stroke lengths, but that was obviously a poor choice.

This paper begins to explore machine learning techniques for interpreting sketches. It introduces many questions and proposes possible solutions to those questions by hypothesizing extensions to an existing primitive corner finding and beautification system. I believe the authors asked good and relevant questions, as we are still asking ourselves how to solve many of the same problems in better ways.

Reading #9: PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (2008)

by Brandon Paulson and Tracy Hammond (paper)

Comments: Jianjie

Paleosketch is a low-level sketch recognition system that can identify primitive shapes from single strokes. It is capable of recognizing Lines, Polylines, Circles, Ellipses, Arcs, Curves, Spirals, and Helixes.

Recognition occurs in three steps: pre-recognition, individual shape tests, and result ranking. Pre-recognition performs is essentially some pre-processing to make recognition easier. This includes removing duplicate points, feature graph generation (speed, curvature, etc) that can be used during recognition, tail removal, and two new features DCR and NDDE. NDDE is the normalized distance between direction extremes, and it is calculated by calculating the stroke length between the point with the highest direction value and the lowest direction value (direction value = change in y over change in x). DCR is the direction change ratio, and it is calculated by taking the "maximum change in direction divided by the average change in direction."

Individual tests are done for lines, polylines, circles, ellipses, arcs, curves, spirals, and helixes. Each test compares some stroke features and calculates the confidence that a stroke is that shape. When all tests are complete, the results are ordered using properties of the corner finding algorithm.

To test this system, data was collected from 10 users. This data was run on Paleosketch and compared with some features disabled as well as with Sezgin's algorithm. Paleosketch achieved a recognition rate of 98.56% across all shapes.

__________

Paleosketch is a good primitive recognition algorithm, essentially combining ideas from previous work and adding in some new stroke features and result ranking. It performs much better than other algorithms we have covered so far, including Rubine's and Sezgin's algorithms. I have used Paleosketch some, and I have found it to be fairly accurate in practice. Its biggest drawback in my opinion is its speed. It runs fairly quick for smaller strokes and collections of a few strokes, but when strokes become long or there are many strokes being analyzed Paleosketch slows down and can take up to a second or more to execute. This makes it a poor choice for online recognition systems. However, it is a great improvement over previous systems.