Reading #26: Picturephone: A Game for Sketch Data Capture (2009)

by Gabe Johnson

Comments: Chris

This paper talks about Picturephone, a sketching game that was mentioned in a previous paper. Picturephone is inspired by the children's game Telephone, in which a message is passed down a line of people, whispered from one person to the next. The message usually changes drastically and can wind up being totally different from the original.

For example, the original message might be, "Marty took a drink of water," and after one pass might change to "Marty drank some water." Eventually it might become "Marty took a drink of soda" and inevitably will become "Marty was arrested for arson while not wearing pants" after one or two more passes.

Picturephone has the first player sketch the story. The second player then writes a new story based on the sketch, a third player sketches that new story, and so on down the line. The sketches are then compared and scored (the paper is loose on exactly how), and labels are collected along the way.


This looks fun. I would like to play it. Who wants to play it with me? I would like to see what would happen if the sketch/story was repeated 20 times.

Reading #25: A descriptor for large scale image retrieval based on sketched feature lines (2009)

by Mathias Eitz, Kristian Hildebrand, Tamy Boubekeur, and Marc Alexa

Comments: Francisco

This paper deals with sketch-based image search, in which the query is a sketch drawn by the user. The authors use a few asymmetric descriptors that match the main feature lines of a sketch against objects in the images. They tested on a set of 1.5 million pictures of outdoor scenes, running 27 query sketches, and the top results look similar to the queries. They illustrate several example sketches alongside their top results.


I have thought about searching using sketch queries. We don't even have image search (where we input a normal image, not even a sketch) widely available. Hopefully that area and sketch searching will become widespread soon, as it is very useful.

Reading #24: Games For Sketch Data Collection (2009)

by Gabe Johnson and Ellen Yi-Luen Do

Comments: Chris

This paper discusses the use of games to collect data, specifically for sketch. The authors wish to understand "how people make and describe hand-made drawings." The paper describes two games: Picturephone (like the telephone game) and Stellasketch. Picturephone gives a description of a sketch for player 1 to draw, and player 2 must then describe the sketch that player 1 drew. More players can then draw the sketch based on player 2's text instead of the original text. This is fun. Stellasketch is like Pictionary. One player draws something based on a clue, and other players privately label the sketch. The point of using the games is to hopefully collect much more data for sketch research than the typical handful of users.


This is a cool idea. I actually want to play these games right now (I want to be in a user study). This is a very cool, free way to reward users for taking the study. Work is nice if it doesn't feel like work.

Reading #23: InkSeine: In Situ Search for Active Note Taking (2007)

by Ken Hinckley, Shengdong Zhao, Raman Sarin, Patrick Baudisch, Ed Cutrell, Michael Shilman, and Desney Tan

Comments: George

This paper presents a note-taking application that helps the user create references by incorporating search and content gathering. While taking notes, the user can perform searches by circling previously written text; actions are performed with pen gestures. Users can add reference icons to a sketch, which appear as normal desktop icons and can link to files or URLs. Five users tested the system.


For sketching to replace the mouse and keyboard, many unique applications such as this need to be invented and developed. Sketching introduces some interface navigation problems, which can be frustrating to the user, especially during sensitive applications like note taking. We need many novel solutions such as this.

Reading #22: Plushie: An Interactive Design System for Plush Toys

by Yuki Mori and Takeo Igarashi

Comments: Chris

This is a follow-up to Teddy. This paper presents a system that generates sewing patterns for plush toys. The program creates 3D models from 2D sketch input and finds a pattern that can be printed and applied to fabric so that the resulting plush toy looks like the 3D model. It incorporates 3D conversion similar to Teddy's, and it also includes editing tools such as cutting, part creation, and seam insertion and deletion.


This is definitely unique. Once again, I like the conversion of the 2d stroke to a 3d shape. I am wondering how complex you can make the shape, since it seems like you have to make a big blob and carve away parts of it maybe. Also, I took a computer-aided sculpting course this semester, and I could have used this to make one of my sculptures (too bad I didn't read this paper when we were doing that project).

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design (1999)

by Takeo Igarashi, Satoshi Matsuoka, Hidehiko Tanaka

Comments: Chris

This paper showcases a program that constructs 3D models from 2D sketches. Basically, it makes wider areas thicker. Once a sufficient stroke is drawn, a 3D model is generated; it can then be rotated in 3D space and drawn on in different orientations to create new 3D features. The interface supports cutting and erasing geometry.

This program is intended to open up new areas of 3D design and to contribute to the rapid prototyping stage of design. Some of the tools include create, bend, paint, extrude, and smooth. Informal feedback from users was generally positive.


I am always intrigued by the conversion of 2d drawings to 3d. I hadn't seen this paper before, and it is pretty interesting. This paper doesn't really do much recognition, and there are many possibilities for expansion by including sketch recognition techniques.

Reading #20: MathPad 2 : A System for the Creation and Exploration of Mathematical Sketches

by Joseph J LaViola Jr and Robert C Zeleznik

Comments: Marty

This paper presents MathPad², a cool mathematical sketching program. The best thing about it is the interface: a big sheet of graph paper on which you can write many different equations, systems of equations, and diagrams. Gestures are used to help perform segmentation and identification, and the system can also generate graphs and plots. Around 12 people tested the system and gave generally positive feedback. The interface is easy to use, and the authors want to extend it with even more capabilities.


This seems like a sketch interface for MATLAB or something similar: it can simulate many kinds of math and serves as a general-purpose math tool. I don't know what its current state is, since this is a fairly old paper.

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields

by Yuan Qi, Martin Szummer, Thomas P Minka

Comments: Sam

The authors use Bayesian Conditional Random Fields (BCRFs) to analyze sketched diagrams, gathering contextual information to better recognize complex diagrams. There are many equations, which boggled my mind at the time of reading. Seventeen users drew diagrams, and the algorithms achieved high recognition rates, in the low to high 90s.


This is a cool approach for recognizing large, context-sensitive drawings, of which diagrams are excellent examples. The mathematical approach works quite well.

Reading #18: Spatial Recognition and Grouping of Text and Graphics (2004)

by Michael Shilman and Paul Viola

Comments: Marty

This paper discusses grouping and recognition of sketched diagrams. The system takes a whole canvas and identifies the many different symbols on it. This is cool: you can draw things in any order, and it will still segment out each symbol. That segmentation is hard, and it is probably the main contribution of the paper. Grouping alone achieved 99% accuracy, and combined grouping and recognition achieved 97%.


I am dealing with the segmentation problem in hand gestures, and I can relate to this problem. It is nice to have this problem solved with a high accuracy. This makes more complicated sketch interaction possible.

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink (2004)

by Christopher M. Bishop and Geoffrey E. Hinton

Comments: Marty

This is an earlier text-vs-shape paper. It uses stroke features, gap information, and timing data to help separate text from other strokes, with HMMs for recognition. The authors collected data from participants who drew whatever they wanted, as long as the sketches contained some text elements and some non-text strokes. Recognition results were mixed, with some groups in the mid 90s and some in the mid 70s.


Shape vs. text is a hard problem, and there are many proposed solutions. I don't like this gaps-and-time solution, however; it just doesn't make sense to me. I think I would prefer entropy-based or purely visual approaches. Also, I think we could mix in gestures to denote text.

Reading #16: An Efficient Graph-Based Symbol Recognizer (2006)

by WeeSan Lee, Levent Burak Kara, Thomas F Stahovich

Comments: Ozgur

This paper takes a graph-based approach to sketched symbol recognition and explores several graph matching techniques. Symbols are represented as graphs, and several error metrics are computed for matching them. The authors collected several types of symbols from users and ran their four matching algorithms on the data, getting results in the mid to high 90s for most algorithms.
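As a toy illustration of the idea (not the paper's algorithms, which use smarter approximate matching), a brute-force matcher can represent a symbol as labeled primitives plus intersection edges and score the best node mapping:

```python
from itertools import permutations

def match_error(sym_a, sym_b):
    """Brute-force graph match: try every node mapping and count label
    and edge disagreements. Only feasible for small symbols."""
    nodes_a, edges_a = sym_a
    nodes_b, edges_b = sym_b
    if len(nodes_a) != len(nodes_b):
        return float("inf")
    undirected_b = edges_b | {(j, i) for i, j in edges_b}
    best = float("inf")
    for perm in permutations(range(len(nodes_b))):
        # node-label mismatches under this mapping
        err = sum(a != nodes_b[p] for a, p in zip(nodes_a, perm))
        # edge mismatches (each undirected mismatch counted twice, so halve)
        mapped = {(perm[i], perm[j]) for i, j in edges_a}
        mapped |= {(j, i) for i, j in mapped}
        err += len(mapped ^ undirected_b) / 2
        best = min(best, err)
    return best

# A square-ish symbol: four lines, consecutive pairs intersecting.
square = (["line"] * 4, {(0, 1), (1, 2), (2, 3), (3, 0)})
assert match_error(square, square) == 0
```

Real matchers replace the factorial search with approximations; the paper compares four such algorithms.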


Some symbols can naturally be represented as graphs. We have seen that graph matching can yield high accuracy for appropriate shapes. I think that this could be one component of a good general purpose recognizer.

Reading #15: An Image-Based, Trainable Symbol Recognizer for Hand-drawn Sketches (2005)

by Levent Burak Kara, Thomas F Stahovich

Comments: Jonathan

This paper takes an image-based approach to sketch recognition using an ensemble classifier consisting of four different classifiers. They want a system that can recognize sketches very fast (real time for interaction) and that is also rotation invariant (using a fast polar coordinate technique).
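The rotation-invariance trick is worth sketching: converting points to polar coordinates about the centroid turns rotation into a constant shift of the angle coordinate, which is cheap to search. A minimal illustration (not the paper's exact procedure):

```python
import math

def to_polar(points):
    """Re-express points as (r, theta) about the centroid, so that
    rotating the whole sketch only shifts every theta by a constant."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx))
            for x, y in points]

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
rotated = [(-y, x) for x, y in square]          # same square, turned 90 degrees
r1 = sorted(round(r, 6) for r, _ in to_polar(square))
r2 = sorted(round(r, 6) for r, _ in to_polar(rotated))
assert r1 == r2   # radii are untouched by rotation
```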

This paper really focuses on the sketch interface and making it an attractive alternative to paper. To be a viable alternative, interaction (and therefore recognition) must be able to occur in real-time with no interruptions to the user. They also want to be able to recognize many shapes as well as "sketchy" shapes.

They used 20 shapes collected from some users. They achieved recognition rates in the mid to high 90s.


This is a good paper for an introduction to image-based approaches. It is also useful for understanding sketch interfaces. Considering the year (2005), the sketches were recognized very quickly and would be recognized even faster on today's machines.

Reading #14: Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (2009)

by Akshay Bhat and Tracy Hammond

Comments: Ayden

The authors propose that entropy rates are higher for text strokes than for non-text strokes and attempt to separate shapes from text using this idea. They define entropy, calculate it for all letters of the alphabet, and perform classification on collected sketches, achieving a 92% recognition rate.
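A rough sketch of the idea, with my own quantization scheme rather than the paper's exact entropy model: treat the turning angles along a stroke as symbols and measure their Shannon entropy. Wiggly text strokes produce a richer symbol mix than smooth shape strokes.

```python
import math
from collections import Counter

def direction_entropy(points, bins=8):
    """Quantize the turning angle at each interior point into `bins`
    symbols and return the Shannon entropy of the symbol distribution."""
    symbols = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        turn = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi  # signed turn
        symbols.append(int((turn + math.pi) / (2 * math.pi) * bins) % bins)
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

line = [(i, 0) for i in range(20)]        # never turns: entropy 0
zigzag = [(i, i % 3) for i in range(20)]  # turns constantly: entropy > 0
assert direction_entropy(line) < direction_entropy(zigzag)
```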


I agree that text shapes have high entropy, and it is interesting that this approach had not been taken earlier in the history of sketch recognition. Obviously some primitive shapes, such as circles and rectangles, will have lower entropy than text, but what about helixes or more complex shapes? This might be good in some diagramming domains.

Reading #13: Ink Features for Diagram Recognition

by Rachel Patel, Beryl Plimmer, John Grundy, and Ross Ihaka

Comments: Jianjie

This paper aims at more accurate diagram recognition through a statistical analysis of the features used to recognize various diagram components from sketched samples. It is pretty much an introduction to some of the important concepts in sketch recognition and illustrates some general approaches, with a particular focus on shape vs. text.

The authors took 46 features grouped into 7 categories. They collected some sketches from 26 participants which contained some diagram elements and text. They used a statistical partitioning technique to find which features can best split the strokes into shape or text strokes and then constructed decision trees with significant features toward the start of the tree.

They tested their methods against some existing shape-vs-text systems and found some interesting results.


Sketch recognition still remains in its infancy despite its age, and formal analyses like this are important to help us understand the processes and achieve greater recognition performance. This work seems kind of inconclusive, however, and I didn't understand the results very well.

Reading #12: Constellation Models for Sketch Recognition (2006)

by D. Sharon and M. van de Panne

Comments: Francisco

A Constellation is a collection of objects arranged in a certain pattern, forming an image. Drawings can be imagined as constellations, especially with regards to common objects, such as the human face. Each face has the same features arranged in the same manner, with slight variances in size, shape, and location. However, several overall main relationships are always present: two eyes are on a horizontal line and are above the nose, for example.

The authors have applied this concept to sketch recognition. Required drawing elements and their relative positions are set as features and classified using a maximum likelihood search. Many sketches were collected for training data, and common features were identified and labeled. Five classes were defined: faces, flowers, sailboats, airplanes, and characters.


This is an interesting way to identify common objects within sketches. Since the authors use only 5 classes, I wonder if this technique will support many classes. Also, it would be interesting to see how detailed this can get to perform more intricate recognition, such as multiple types of faces, or even individuals.

Reading #11: LADDER, a sketching language for user interface developers (2007)

by Tracy Hammond and Randall Davis

Comments: Jonathan

LADDER is a language to "describe how sketched diagrams in a domain are drawn, displayed, and edited." It is intended to help interface developers create sketch-based interfaces. LADDER is used to create shape descriptions. Shapes consist of components (such as lines), constraints (such as intersections), aliases, editing, and display properties. LADDER descriptions allow domain-specific definitions that can be used with domain-independent recognizers.

The paper gives many examples of shapes that can be modeled using LADDER. Such shapes include arrows and UML diagrams.

There are many predefined shapes (such as point, line, curve, ellipse), constraints (such as perpendicular, collinear, tangent, larger, acute), orientation-dependent constraints (such as horizontal, negative slope, above, centered below), editing methods (such as click, draw, encircle), and display methods (such as original strokes, ideal strokes, circle, rectangle, text, image).
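To make the flavor of a description concrete, here is a rough Python rendering of what an arrow description might carry. This is emphatically not LADDER's actual syntax, and the constraint names are illustrative:

```python
# Hypothetical rendering of a LADDER-style shape description for an arrow:
# three line components, constraints tying them together, and a display rule.
arrow = {
    "components": {"shaft": "Line", "head1": "Line", "head2": "Line"},
    "constraints": [
        ("coincident", "shaft.p2", "head1.p1"),   # heads meet the shaft tip
        ("coincident", "shaft.p2", "head2.p1"),
        ("acuteAngle", "head1", "shaft"),         # heads angle back
        ("acuteAngle", "head2", "shaft"),
        ("equalLength", "head1", "head2"),
    ],
    "display": "ideal strokes",
}
assert len(arrow["components"]) == 3
```

A domain-independent recognizer can consume structures like this: find three lines, check the constraints, and report an arrow.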

Some shapes are made up of a fixed number of segments, while others can contain an arbitrary number of segments.

When recognition occurs, primitive shapes are recognized first. Shapes are generated that contain the original strokes and their interpretations. Some shapes contain sub-shapes. Once primitive shapes are recognized, domain-specific shapes, using the domain descriptions, are recognized.


LADDER is a nice tool for interface developers. I like that it allows complex shapes to be completely described using a programming language. However, I do see certain drawbacks, namely that each shape must be explicitly defined in LADDER. The paper mentions the capability to generate descriptions based on drawn examples, and I think this would be a great idea. It would be very tedious to define explicit shapes for large domains (COA...). I don't know if it has been implemented yet.

Reading #10: Graphical Input Through Machine Recognition of Sketches (1976)

by Christopher F. Herot

Comments: Jonathan

This is an early sketch recognition system aiming to allow sketch input to computer programs. It contains a system called HUNCH that is used to recognize primitive sketches. It uses speed alone to detect corners in a sketch. Curves were viewed "to be a special case of corners" and were modeled using B-splines. Speed was also used to determine how "careful" the user was; for instance, faster strokes were considered less careful. This helped identify curves and decide whether to draw them as B-splines or just corners. The programs that converted sketches to straight segments and curves were called STRAIT and CURVIT, respectively.
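A speed-only corner test in the spirit of HUNCH might look like this; the attribution of slow segments to points and the threshold are my own guesses, not the paper's:

```python
import math

def speed_corners(points, times, ratio=0.4):
    """Flag points where pen speed drops below `ratio` of the mean speed --
    the intuition being that the pen slows down at corners."""
    speeds = [math.dist(p, q) / max(t2 - t1, 1e-9)
              for p, q, t1, t2 in zip(points, points[1:], times, times[1:])]
    mean = sum(speeds) / len(speeds)
    return [i + 1 for i, s in enumerate(speeds) if s < ratio * mean]

# An L-shaped stroke that slows down around the corner at index 3.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
times = [0, 1, 2, 6, 10, 11, 12]
corners = speed_corners(points, times)
assert 3 in corners
```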

STRAIT and CURVIT did not always create the same interpretation as humans. They seemed to be user-dependent, as sketches were interpreted better for some users than for others. An improved method of straight segmenting was implemented, called STRAIN. It used a function of speed to determine which line endpoints to join (instead of a fixed distance, which is what STRAIT did).

Programs were also developed to detect overtraced lines and turn 2D sketches into 3D.

The paper discusses the importance of context in sketch recognition systems. The HUNCH system does not use context in its recognition schemes. For example, all its subroutines are always called in a fixed sequence and always perform recognition in the same way. However, similar strokes will probably be interpreted in different ways given their context.

The paper discusses an interactive system and examines the hierarchical structures of recognized sketches. It discusses various ways to tune the algorithms to work for a "truly interactive system."


This is an early approach to sketch recognition, and it asks many questions as well as answers some. It can be rather dry and boring, but it does bring up some questions that are still relevant today and whose solutions can still be improved on.

For example, the latching problem: when should close endpoints be merged? It depends on the context, which is another contemporary issue. While I was working on my truss recognizer, I dealt with the latching problem when merging close line endpoints to form truss nodes. I used a distance computed based on stroke lengths, but that was obviously a poor choice.

This paper begins to explore machine learning techniques for interpreting sketches. It introduces many questions and proposes possible solutions by hypothesizing extensions to an existing primitive corner finding and beautification system. I believe the author asked good and relevant questions, as we are still trying to solve many of the same problems in better ways.

Reading #9: PaleoSketch: Accurate Primitive Sketch Recognition and Beautification (2008)

by Brandon Paulson and Tracy Hammond

Comments: Jianjie

PaleoSketch is a low-level sketch recognition system that identifies primitive shapes from single strokes. It is capable of recognizing lines, polylines, circles, ellipses, arcs, curves, spirals, and helixes.

Recognition occurs in three steps: pre-recognition, individual shape tests, and result ranking. Pre-recognition is essentially pre-processing that makes recognition easier. This includes removing duplicate points, generating feature graphs (speed, curvature, etc.) that can be used during recognition, removing tails, and computing two new features, NDDE and DCR. NDDE is the normalized distance between direction extremes: the stroke length between the points with the highest and lowest direction values, divided by the total stroke length. DCR is the direction change ratio: the "maximum change in direction divided by the average change in direction."
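The two new features can be sketched roughly like this (my own condensed reading of the definitions, not the paper's implementation):

```python
import math

def path_len(pts, i, j):
    return sum(math.dist(pts[k], pts[k + 1]) for k in range(i, j))

def ndde_dcr(points):
    """Simplified NDDE and DCR. Smooth arcs score a high NDDE and a low
    DCR; polylines score the opposite, which helps tell them apart."""
    dirs = [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]
    i, j = sorted((dirs.index(max(dirs)), dirs.index(min(dirs))))
    ndde = path_len(points, i, j) / path_len(points, 0, len(points) - 1)
    changes = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
    dcr = max(changes) / (sum(changes) / len(changes) or 1)
    return ndde, dcr

arc = [(math.cos(k * math.pi / 40), math.sin(k * math.pi / 40))
       for k in range(21)]                       # smooth quarter circle
poly = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]  # an "L" polyline
ndde_arc, dcr_arc = ndde_dcr(arc)
ndde_poly, dcr_poly = ndde_dcr(poly)
assert ndde_arc > ndde_poly and dcr_arc < dcr_poly
```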

Individual tests are done for lines, polylines, circles, ellipses, arcs, curves, spirals, and helixes. Each test compares some stroke features and calculates the confidence that a stroke is that shape. When all tests are complete, the results are ordered using properties of the corner finding algorithm.

To test the system, the authors collected data from 10 users. The data was run through PaleoSketch, both with all features enabled and with some disabled, and compared against Sezgin's algorithm. PaleoSketch achieved a recognition rate of 98.56% across all shapes.


PaleoSketch is a good primitive recognition algorithm, essentially combining ideas from previous work with some new stroke features and result ranking. It performs much better than the other algorithms we have covered so far, including Rubine's and Sezgin's. I have used PaleoSketch some, and I have found it to be fairly accurate in practice. Its biggest drawback, in my opinion, is its speed. It runs fairly quickly for short strokes and small collections of strokes, but when strokes become long or many strokes are analyzed at once, PaleoSketch slows down and can take a second or more to execute. This makes it a poor choice for online recognition systems. Still, it is a great improvement over previous systems.

Reading #8: A Lightweight Multistroke Recognizer for User Interface Prototypes (2010)

by Lisa Anthony and Jacob Wobbrock

Comments: Danielle

This paper presents $N, a multistroke extension to Wobbrock's $1 unistroke recognizer. Like $1, it aims to make recognition easy to incorporate into any program, and the pseudocode for the whole algorithm fits on less than one page.

$N allows multiple strokes by connecting the strokes into one long stroke. The many possible permutations that could occur are generated when new templates are created. Resampling, scaling, and rotation all occur as in $1, with some adjustments to enhance the capabilities of $1.
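The permutation generation is easy to sketch: every stroke order, with every stroke taken in either direction, joined into one unistroke (so a two-stroke gesture yields 2! orders times 2^2 directions = 8 templates):

```python
from itertools import permutations, product

def unistroke_permutations(strokes):
    """Every way to play a multistroke as a single stroke: all stroke
    orders, each stroke in either direction. $N precomputes these when a
    new multistroke template is added."""
    results = []
    for order in permutations(strokes):
        for flips in product([False, True], repeat=len(order)):
            joined = []
            for stroke, rev in zip(order, flips):
                joined.extend(reversed(stroke) if rev else stroke)
            results.append(joined)
    return results

plus = [[(0, 1), (2, 1)], [(1, 0), (1, 2)]]   # the two strokes of a "+"
perms = unistroke_permutations(plus)
assert len(perms) == 8                        # 2! orders x 2^2 directions
```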

In addition to providing support for multiple strokes, $N addresses some problems with the $1 algorithm. $N allows 1-dimensional gestures (such as lines) by calculating the ratio of the sides of the bounding box and using a threshold to determine if the gesture is 1D or not. $N allows rotation in the gestures as well. If rotation is desired, the gestures are rotated to the template angles instead of 0. Finally, $N provides optimizations to increase recognition speed. Templates are only compared if the starting angle of the stroke is "about the same." Also, the developer can choose whether to limit the number of strokes in gestures that will probably always have a set number of strokes (for example, + and = gestures).

The drawbacks to $N are scale invariance, using more strokes than in the template, collision of gestures, and large numbers of templates.

To test the algorithm, 40 middle and high school students used a sketch input program to input simple algebraic equations.


This paper is nice in the same ways $1 was nice: it lets programmers at any level add pen gesture interaction to any program. And because $N is better than $1 overall (despite slightly lower accuracy on one-stroke gestures), it is a clear win.

There are more possibilities with $N than with $1. There are more ways it can be used, and more ways it can be enhanced as well. It also seems that $N could easily be extended to 3D (of course with more computation required) to use for hand gestures or something similar.

Reading #7: Sketch Based Interfaces: Early Processing for Sketch Understanding (2001)

by Tevfik Metin Sezgin, Thomas Stahovich, and Randall Davis

Comments: Marty

This paper describes a system that analyzes a sketch after it is drawn, focusing on what was drawn rather than how it was drawn. It also allows multiple strokes in a sketch, something we had yet to discuss in this course.

One of the main features of Sezgin's system is its vertex detection, or corner finding, implementation. He uses a combination of speed and curvature to detect line corners. After segmentation, the straight edges of a sketch are stored as a polyline.
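A simplified take on the hybrid idea, collecting candidate corners from both speed minima and curvature maxima (the thresholds are mine, not Sezgin's):

```python
import math

def corner_candidates(points, times):
    """Union of slow points and high-curvature points -- a simplified
    sketch of combining speed and curvature for corner finding."""
    speeds = [math.dist(a, b) / max(t2 - t1, 1e-9)
              for a, b, t1, t2 in zip(points, points[1:], times, times[1:])]
    dirs = [math.atan2(b[1] - a[1], b[0] - a[0])
            for a, b in zip(points, points[1:])]
    curvs = [abs(d2 - d1) for d1, d2 in zip(dirs, dirs[1:])]
    mean_s = sum(speeds) / len(speeds)
    mean_c = sum(curvs) / len(curvs)
    slow = {i + 1 for i, s in enumerate(speeds) if s < 0.5 * mean_s}
    bendy = {i + 1 for i, c in enumerate(curvs) if c > 2 * mean_c}
    return sorted(slow | bendy)

points = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
times = [0, 1, 5, 9, 10]      # the pen slows down around the corner
corners = corner_candidates(points, times)
assert 2 in corners           # index 2 is the corner point (2, 0)
```

Using the union of both cues is what makes the approach robust: a corner drawn quickly still shows up in curvature, and a gentle bend drawn slowly still shows up in speed.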

The second feature is curve handling. The system models curves as Bézier curves, approximating the control points with a least-squares method.

The system beautifies the drawn strokes "primarily to make it look as intended." Lines meant to be parallel are made parallel (also similarly with perpendicular lines), straight lines are made straight, and curves are rendered properly.

Finally, the system performs primitive object recognition. It uses simple geometric constraints to recognize ovals, circles, rectangles, and squares.

A user study was done to test the usability of the program and compare it to a tool-based drawing program. Participants found the authors' system easier to use, since any shape can be drawn instantly without first selecting the corresponding tool. The authors report an accuracy of 96% when approximating drawn shapes from a set of 10 figures.


This paper is an early beautification paper that turns sketched drawings into actual technical drawings such as schematics and diagrams. It does this by applying corner and curve finding to determine the user's intended sketch. I think this paper helps show that sketching can be a superior method of input than traditional menu and tool bar based drawing programs. Such interfaces were rare then, and still are now, and hopefully we can build upon this to help popularize sketch-based interfaces.

Reading #6: Protractor: A Fast and Accurate Gesture Recognizer (2010)

by Yang Li

Comments: Wenzhe

Protractor is a modified $1 algorithm. The enhancements include optional orientation sensitivity with up to eight base directions, scale invariance, and greater speed.

Protractor does the resampling as $1 does, but uses N = 16 points ($1 used 64 in its testing). Rotation invariance can be toggled on or off. If gestures are to be rotation-independent, Protractor rotates around the centroid until the indicative angle is 0, just as $1 does; if orientation sensitivity is enabled, it instead rotates the indicative angle to the nearest of eight equidistant base angles. Protractor does not scale the strokes the way $1 does, yet it remains scale-invariant. The rotation adjustment step is also modified: instead of iteratively searching for the optimal orientation, Protractor directly computes an angle close to the optimal one.
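The closed-form angle is the clever bit. Given two preprocessed, unit-length gesture vectors, the best rotation and the resulting angular distance fall out of two dot-product-like sums; this follows the paper's idea, though the code is my own condensed version:

```python
import math

def optimal_cosine_distance(v1, v2):
    """Protractor-style closed-form match: v1 and v2 are flattened,
    unit-length (x0, y0, x1, y1, ...) vectors. Returns the angular
    distance after rotating v2 by the best angle -- no iterative search."""
    a = sum(v1[i] * v2[i] + v1[i + 1] * v2[i + 1]
            for i in range(0, len(v1), 2))
    b = sum(v1[i] * v2[i + 1] - v1[i + 1] * v2[i]
            for i in range(0, len(v1), 2))
    # max over rotations of (a*cos t + b*sin t) is hypot(a, b)
    return math.acos(max(-1.0, min(1.0, math.hypot(a, b))))

def unit(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

v = unit([1.0, 0.0, 0.0, 1.0, -1.0, 0.5])
# The same gesture rotated 30 degrees should match almost perfectly.
th = math.pi / 6
w = []
for i in range(0, len(v), 2):
    x, y = v[i], v[i + 1]
    w += [x * math.cos(th) - y * math.sin(th),
          x * math.sin(th) + y * math.cos(th)]
assert optimal_cosine_distance(v, w) < 1e-6
```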

Because of these modifications, Protractor performs significantly faster than $1 as the number of training examples increases. The recognition rates are not significantly different from $1. Because of the speed enhancements, Protractor is ideally suited for mobile device applications.


I like this extension of the $1 algorithm. It sounds like it isn't much more difficult to implement than $1, and the speed improvements without sacrificing accuracy are nice. It is also nice to be able to specify orientation-dependent gestures; this, along with the scale invariance, can help expand the limited 16-gesture set used by the $1 paper. The paper also evaluated a 26-gesture set, on which Protractor performed significantly better than $1.

Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (2007)

by Jacob Wobbrock, Andrew Wilson, and Yang Li

Comments: George

This paper describes the $1 gesture recognizer. This sketch/gesture recognition algorithm is intended to be simple and easy to program so that it can be implemented anywhere, hopefully allowing gestures to be incorporated into rapidly prototyped interfaces that otherwise might not have been able to use gesture input. Most user interface designers and programmers don't have the knowledge or skills needed to implement complex recognition algorithms, and existing recognition toolkits are not available for every language or environment, especially the ones human-computer interaction experts tend to use.

The authors describe the algorithm in 4 parts: point resampling, indicative angle rotation, scaling and translation, and finding the optimal angle for best score. These transformations applied to each input stroke allow them to easily match up to a few template strokes for each gesture. The recognition result is the template gesture with the smallest Euclidean distance to the input stroke.
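The four steps are compact enough to sketch end to end. This is a simplified version: I use a plain average point distance and skip $1's golden-section search over nearby rotations:

```python
import math

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts))

def resample(pts, n=64):
    """Step 1: n points spaced evenly along the stroke's arc length."""
    total = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    step, acc = total / (n - 1), 0.0
    pts, out = list(pts), [pts[0]]
    i = 1
    while i < len(pts) and len(out) < n:
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= step:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)       # q becomes the next segment's start
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:            # guard against floating-point shortfall
        out.append(pts[-1])
    return out

def normalize(pts):
    """Steps 2-3: rotate so the indicative angle (centroid to first point)
    is zero, scale to a unit box, translate the centroid to the origin."""
    cx, cy = centroid(pts)
    ang = math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    c, s = math.cos(-ang), math.sin(-ang)
    pts = [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c)
           for x, y in pts]
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    size = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    pts = [(x / size, y / size) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    """Step 4, simplified: average point-to-point distance. The real $1
    also searches nearby rotations for the best fit; skipped here."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

line = normalize(resample([(0, 0), (10, 2)]))
line2 = normalize(resample([(5, 5), (45, 13)]))    # same shape, moved/scaled
circle = normalize(resample(
    [(math.cos(t / 10), math.sin(t / 10)) for t in range(63)]))
assert path_distance(line, line2) < path_distance(line, circle)
```

Recognition is then just the template with the smallest distance, which is why a few examples per gesture go a long way.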

The $1 algorithm is compared to the DTW and Rubine algorithms, and it is found to compete well against them, achieving high recognition rates and recognition speed. The $1 algorithm pseudo code is given as well to aid programmers.


This paper is very clearly written and the $1 algorithm is indeed very simple. I find it interesting that such a simple, almost naive, approach can perform very well if executed intelligently. It is easy to imagine improvements and how to add more recognition capabilities to this algorithm, such as rotation-dependent or time-dependent gestures.

Reading #4: Sketchpad: A Man-Machine Graphical Communication System (1963)

by Ivan E. Sutherland

Comments: Jonathan

This paper presents the initial sketch-based interaction work of Ivan Sutherland. This was one of the first systems to use a pen to draw on a screen, ushering in a new form of human-computer interaction.

To use the system, the user has a set of buttons and switches to activate certain modes and tools, such as a line tool or a delete mode. When the desired settings are set, interaction using the pen accomplishes the desired task. It is important to note that the pen does not perform any free-form drawings, but rather creates geometry using only pre-defined tools or performs commands using pointing or dragging. This makes it more like a CAD system that uses a pen for input (note that the mouse did not exist at the time of this work).

The paper shows its age by emphasizing things like the data structures and memory usage as well as generic representations of sketch elements. A "light pen" is used as the input device.

Most of the paper details the various constraints and tools and how they were implemented using non-procedural, object-oriented methods, all of which were new ideas at the time.


This paper introduced many new ideas about human-computer interaction, graphical displays, and programming. It was the first of its kind in almost every aspect, and it is hard to appreciate that now without reading comments from many years ago. Much of the paper seems trivial to implement with current software development languages and tools. I found it interesting that many ideas introduced back in 1963 are still active, and hard, research problems today (such as recognizing artistic drawings and electrical schematics).

Reading #3: "Those Look Similar!" Issues in Automating Gesture Design Advice (2001)

by Long, Landay, and Rowe

Comments: Sam

This paper presents the quill gesture design tool that is aimed at helping developers create pen gesture-based interfaces. The quill software gives advice to the developers if multiple gestures might be ambiguous to the computer or visually similar to people.

The authors conducted experiments to determine what kinds of gestures people perceive as similar, having a few hundred participants judge the similarity of a large number of gestures. They then developed an algorithm for predicting gesture similarity.

Interface designers use quill to input gestures for their interfaces. quill uses the similarity algorithm and Rubine's methods to give feedback to the users and train and recognize the gestures. The paper talks in detail about challenges related to giving advice such as how, when, and where advice is displayed in addition to what advice is displayed.

The authors conclude that the quill system, while it could use many refinements and improvements, is a good start and can possibly inspire other advice-giving systems for gesture-based interfaces.


I can appreciate the assistance given to developers relating to gesture definitions. There still are not many systems that can do this, especially with 3D hand gestures. I have run into issues in my own research where two gestures I didn't think were similar actually were. Re-defining gestures can be a pain, especially if you discover the similarity after a large gesture set has been defined, when it becomes difficult to think of a new, unique gesture. I would really appreciate more development of these tools for 2D and 3D gestures.

Reading #2: Specifying Gestures by Example (1991)

by Dean Rubine (paper)

Comments: Danielle

This paper presents Rubine's gesture-recognition algorithm and his implementation of a program that doesn't require a hand-coded recognizer. His goal is to increase the adoption of sketch-based gesture recognition in user interfaces by making recognition easier to integrate: example gestures are fed into a learning algorithm rather than the recognizer being hand-coded.

Rubine has implemented a gestural drawing program in which simple single-stroke gestures are used to create and manipulate a drawing. Example gestures include rectangle creation, ellipse creation, copy, rotate-scale, and delete. The user of the program is able to add new gesture examples to aid recognition as well as modify the structure of each gesture.

He presents his simple gesture recognition algorithm, which assumes stroke segmentation is already taken care of. For the stroke drawn as the gesture, 13 features are computed. Rubine states that these 13 features are capable of recognizing many gestures, but fail in some cases. Once the features are calculated, they are input to a linear classifier that gives the class name of the stroke. He discusses how the classifier is trained, which is basically the standard method of training a linear classifier.
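A handful of these features are easy to write down. The sketch below computes an illustrative subset in Python; the feature numbering follows Rubine's paper (f1/f2 initial angle, f3/f4 bounding-box diagonal, f5 endpoint distance, f8 total length, f9 total turn angle), but treat the details as an approximation rather than his exact implementation:

```python
import math

def rubine_features(points):
    """A subset of Rubine's 13 stroke features (illustrative, not complete).

    `points` is a list of (x, y) samples from a single stroke, in order.
    """
    (x0, y0), (x2, y2) = points[0], points[2]
    # f1, f2: cosine and sine of the stroke's initial angle
    d = math.hypot(x2 - x0, y2 - y0) or 1e-9
    f1, f2 = (x2 - x0) / d, (y2 - y0) / d

    xs, ys = [p[0] for p in points], [p[1] for p in points]
    # f3, f4: length and angle of the bounding-box diagonal
    f3 = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    f4 = math.atan2(max(ys) - min(ys), max(xs) - min(xs))

    # f5: distance between the first and last points
    f5 = math.hypot(points[-1][0] - x0, points[-1][1] - y0)

    # f8: total stroke length
    f8 = sum(math.hypot(b[0] - a[0], b[1] - a[1])
             for a, b in zip(points, points[1:]))

    # f9: total angle traversed (signed sum of the turn angles)
    f9 = 0.0
    for a, b, c in zip(points, points[1:], points[2:]):
        dx1, dy1 = b[0] - a[0], b[1] - a[1]
        dx2, dy2 = c[0] - b[0], c[1] - b[1]
        f9 += math.atan2(dx1 * dy2 - dx2 * dy1, dx1 * dx2 + dy1 * dy2)

    return [f1, f2, f3, f4, f5, f8, f9]
```

The remaining features (endpoint angle, absolute and squared turn angles, maximum speed, and duration) follow the same pattern from the raw point and timestamp data.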

The classifier always outputs one of the defined gesture classes. A probability function estimates the probability that the gesture was classified correctly, and if that value falls below a threshold, the classification is rejected as ambiguous. He also rejects gestures whose features lie too many standard deviations from the mean of the winning gesture class.
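The linear classification and the probability-based rejection can be sketched together. The 0.95 cutoff below is an assumption for illustration, and this covers only the probability test, not the distance-based rejection:

```python
import math

def classify_with_rejection(features, weights, biases, p_min=0.95):
    """Linear classification with Rubine-style ambiguity rejection (a sketch).

    weights[c] is the learned weight vector and biases[c] the bias for
    class c. Returns the winning class name, or None when the estimate
    of P(correct) falls below p_min (0.95 here is an assumed cutoff).
    """
    # per-class score: v_c = bias_c + sum_j w_cj * f_j
    scores = {c: biases[c] + sum(w * f for w, f in zip(weights[c], features))
              for c in weights}
    best = max(scores, key=scores.get)
    # estimated probability that `best` is the correct class
    p_correct = 1.0 / sum(math.exp(v - scores[best]) for v in scores.values())
    return best if p_correct >= p_min else None
```

With two classes scoring equally, the estimate is 0.5 and the gesture is rejected rather than guessed.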

Rubine says his methods perform well in practice using 10 different gesture sets. He reports recognition rates in the mid to high 90s for varying numbers of examples per class, gesture classes per set, and test gestures per class.


This paper seems to be on the cutting edge of sketch recognition technology for its time. Indeed, the concepts presented in this field are still widely used and studied today. Very little work and very few non-hand-coded recognition applications existed in 1991. I was impressed by the high accuracy achieved on the gesture sets using the linear classifier, though the accuracy reporting didn't seem complete. I have seen other systems, such as in our lab, that can recognize much larger classes of data, and I am particularly interested in 3D extensions of this method as well as other classification algorithms I have been brainstorming, which I look forward to implementing.

Reading #1: Gesture Recognition (2010)

Comments: Chris

Tracy Hammond (paper)

This paper is a summary of some well-known gesture recognition techniques in sketch recognition. It begins with a presentation of Rubine's feature-based algorithm. It describes the 13 features and gives some examples of how they are used as well as some illustrations to help understand the features. It briefly touches on the training and recognition system Rubine used, but doesn't go into much detail. Long's 22-feature extension of Rubine's algorithm is then presented in the same way. It presents Long's extra 11 features. Finally, Wobbrock's $1 algorithm is described.

This is a pretty good summary of these well-known sketch recognition methods. I think it summarizes the key points of each approach and would allow the implementation of each method without much trouble. It could probably make up part of a handbook/reference guide. There were many grammatical errors/typos and some missing figures and details, however.

Homework #1: CSCE 624 Introduction

picture of myself

dalogsdon gmail
2nd year Master of Computer Science

I am taking sketch recognition to gain a deeper knowledge of current work in the field. I have a creative and artistic background in addition to computer science, so I feel I might have a unique viewpoint to the issues we will discuss in the class.

Ten years from now, I expect to know what the next big technological advancement in computer science was. As an undergraduate, I rather enjoyed my computer-human interaction, software engineering, figure drawing, painting, and photography courses. My favorite movies are action/adventures and comedies. If I could travel back in time, I would not meet anyone for fear of destroying my existence, but I might go to my parents' house and peek in the windows to see myself as a baby. It is perhaps interesting to note that 30 of my spinal vertebrae are fused.

Recent Developments and Applications of Haptic Devices


Comments: Manoj, Franck

This paper basically collects all the major haptic devices and technologies for comparison. It separates the various input devices by degrees of freedom. It also presents some glove-based devices and discusses vibration and hydraulics, among other things.

EyeDraw: enabling children with severe motor impairments to draw with their eyes


Comments: Franck, Murat

EyeDraw is a system which runs using an Eyegaze system to allow drawing with the eyes. Many disabled people have used eye-tracking systems for years to use a computer and to communicate. The EyeDraw system was developed and tested with 4 users. With feedback from the users, a 2nd version was developed and some improvements were made to enable easier use. Users gave positive feedback, though drawing was still difficult. The paper mentions a 3rd version under development.


The EyeDraw system is the most successful application I have seen for drawing with the eyes. It seems to solve the Midas touch problem somewhat, and the users can actually draw with it.

Coming to Grips with the Objects We Grasp: Detecting Interactions with Efficient Wrist-Worn Sensors


This paper aims to recognize gestures by attaching an accelerometer and an RFID reader to a wrist-worn device. The idea is that we know which object is being used thanks to RFID tags placed on objects, and the accelerometer data paired with the object ID can be used to classify the current gesture. A "box test" was performed in which various items were placed into and removed from a box, using different RFID antenna types, different objects, and different subjects. A long-term study was also conducted in which the bracelet was worn for an entire day to test real-world applications.


I think the idea of this paper is nice; if RFID tags were present in all our everyday items, much information could be obtained about how users interact on a day-to-day basis. It would only work well with a large number of tagged items, though, so if that doesn't happen, I don't know how useful this will be.

User-Defined Gestures for Surface Computing


This paper is a study of gestures for multitouch display interaction. The gestures were user defined for 27 commands. In all, 1080 gestures were observed. The purpose of this paper is to "help designers create better gesture sets informed by user behavior." The resulting gesture set was completely user defined. It turned out that the combined gestures predicted by the authors only covered 43.5% of the final gesture set. The authors also found that "users rarely care about the number of fingers they employ, that one hand is preferred to two, that desktop idioms strongly influence users’ mental models, and that some commands elicit little gestural agreement."
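Wobbrock et al. quantify that "gestural agreement" per command with an agreement score. A sketch of that computation, assuming identical proposals are represented by equal values so they can be grouped by equality:

```python
from collections import Counter

def agreement(proposals_by_command):
    """Mean agreement score over commands (a sketch of the paper's metric).

    proposals_by_command maps each command (referent) to the list of
    gestures users proposed for it. For each command, identical
    proposals are grouped; each group of size k among n proposals
    contributes (k / n) ** 2 to that command's score.
    """
    scores = []
    for proposals in proposals_by_command.values():
        n = len(proposals)
        groups = Counter(proposals)   # partition identical proposals
        scores.append(sum((k / n) ** 2 for k in groups.values()))
    return sum(scores) / len(scores)  # average over all commands
```

A command where everyone proposes the same gesture scores 1.0; a command where every proposal is unique scores 1/n, which is why some commands "elicit little gestural agreement."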


This paper illustrates the necessity of user input when designing a system. Often, surprising results can be obtained by studying users in this way. This step is crucial when designing a new type of system in which little is known about the input or little of the input is formally defined.

Whack Gestures: Inexact and Inattentive Interaction with Mobile Devices


This paper explores using simple hitting gestures to interact with a device. For example, if the device is attached to the belt, a simple smack or wiggle can provide interaction. A custom device was developed and tested for this project. Users found the whack gestures simple and easy to remember. The authors also experimented with multiple whacks and combinations of whacks and wiggles.


This type of interaction could be useful for cell phones, iPods, and other common pocket devices. For example, if a cell phone starts ringing, it could be silenced with a whack. Some iPods also use a shake, or wiggle, to advance songs. I personally don't use that feature very often.

Device Agnostic 3D Gesture Recognition using Hidden Markov Models


This paper attempts to determine how to effectively use HMMs to classify 3D gestures "regardless of the sensor device being used." The authors use a decomposition technique that works successfully with any sensor combination.
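I can't speak to the paper's decomposition technique, but the core HMM step it builds on, scoring an observation sequence against each gesture class's model and keeping the best, can be sketched generically with the scaled forward algorithm:

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    via the forward algorithm with per-step rescaling (a generic sketch,
    not the paper's models).

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][o] : probability of emitting symbol o from state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for o in obs[1:]:
        s = sum(alpha)               # rescale to avoid numeric underflow
        loglik += math.log(s)
        alpha = [a / s for a in alpha]
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return loglik + math.log(sum(alpha))

def classify(obs, models):
    """Pick the gesture class whose HMM gives the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

Each gesture class gets its own trained HMM; classification is just the argmax over per-class likelihoods.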


I am currently beginning to learn HMMs, and this could be helpful as I design my gesture recognition algorithms for the glove.

Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes


This well-known sketch recognition paper presents a simple algorithm that can recognize 16 shapes.
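The $1 algorithm's simplicity comes from its pipeline: resample the stroke to a fixed number of points, rotate by the indicative angle, scale to a reference square, translate to the origin, then match against templates by average point distance. A minimal sketch of that pipeline follows; it omits $1's golden-section search over candidate rotations, so it is a simplification rather than the full algorithm:

```python
import math

def normalize(points, n=64, size=250.0):
    """$1-style preprocessing: resample to n points, rotate so the indicative
    angle (centroid to first point) is zero, scale, and center."""
    # 1. resample the stroke at n equally spaced points along its path
    path = sum(math.hypot(b[0] - a[0], b[1] - a[1])
               for a, b in zip(points, points[1:]))
    step, acc = path / (n - 1), 0.0
    pts, out = list(points), [points[0]]
    i = 1
    while i < len(pts):
        a, b = pts[i - 1], pts[i]
        d = math.hypot(b[0] - a[0], b[1] - a[1])
        if d > 0 and acc + d >= step:
            t = (step - acc) / d
            q = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            out.append(q)
            pts.insert(i, q)   # continue measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:        # guard against floating-point shortfall
        out.append(pts[-1])
    out = out[:n]
    # 2. rotate so the angle from the centroid to the first point is zero
    cx, cy = sum(p[0] for p in out) / n, sum(p[1] for p in out) / n
    th = -math.atan2(out[0][1] - cy, out[0][0] - cx)
    c, s = math.cos(th), math.sin(th)
    out = [((p[0] - cx) * c - (p[1] - cy) * s,
            (p[0] - cx) * s + (p[1] - cy) * c) for p in out]
    # 3. scale to a size x size box, then move the centroid to the origin
    xs, ys = [p[0] for p in out], [p[1] for p in out]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    out = [(p[0] * size / w, p[1] * size / h) for p in out]
    cx, cy = sum(p[0] for p in out) / n, sum(p[1] for p in out) / n
    return [(p[0] - cx, p[1] - cy) for p in out]

def recognize(points, templates):
    """Return the closest pre-normalized template by average point distance."""
    cand = normalize(points)
    def score(name):
        return sum(math.hypot(p[0] - q[0], p[1] - q[1])
                   for p, q in zip(cand, templates[name])) / len(cand)
    return min(templates, key=score)
```

Recognition is template matching over normalized strokes, which is why the method needs no libraries, toolkits, or statistical training.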


This is a good paper for getting started in sketch recognition. The algorithm is easy to implement and can recognize a fair set of shapes. Higher-level systems can be built on top of it to create simple yet rich sketch interfaces.

The $3 Recognizer: Simple 3D Gesture Recognition on Mobile Devices

This is an extension of the $1 recognizer to 3D. It does not support all the original $1 shapes, but it does add a few new ones, such as tennis serve and hacksaw.


This is cool, and I would like to see a video or a user study of how well this works for 3D sketching.

An Empirical Evaluation of Touch and Tangible Interfaces for Tabletop Displays

This paper compares several different methods for interacting on a tabletop display. The authors implemented both a touch-based interface and a tangible, model-based interface on a single surface. They compared touch and model-based interaction across a variety of computing tasks and found that touch was better for some tasks and models were better for others.


I personally have never played around with any model-based interfaces. I can see the usefulness of such an interface, though I am not sure that having many small pieces would be such a good thing.

FreeDrawer: a free-form sketching system on the responsive workbench

These dudes use a pen to sketch in 3D. They employ curve and surface smoothing to create smooth drawings.


This paper was not so interesting. Drawing in 3D is nice, but I have seen better methods. Also, no tests were done, so we don't know how well it actually works.

That one there! Pointing to establish device identity (2002)

Colin Swindells, John C. Dill, Melanie Tory
Simon Fraser University

Kori M. Inkpen
Dalhousie University

Comments: ...

This paper deals with the issue of human-computer identification. As the number of computing devices per person increases, so does the number of entries in wireless network lists. This makes it difficult for a person to select another computer or device to connect to in order to send it information. Traditionally, the name of the device is selected from a list of all visible devices on the network. As more and more devices are added to that list (often with non-descriptive names), it becomes hard for humans to select the correct device to connect to. This trend contrasts with the increasing ease with which computers automatically enter, exit, and identify previously connected computers and devices. One solution to the human-computer identification problem is pointing: the user simply points at the device to connect to, and the computer can identify the target device and connect. This paper presents a device called the gesturePen, which sends an IR signal to tags installed on the target devices. By pointing the pen at a device, its ID can be acquired and a connection easily established.

The paper illustrates some similar point-to-identify solutions and points out that all the others use systems that constantly broadcast device IDs, which can still overwhelm the user. The gesturePen system tags "are only activated when ‘pinged’ by the gesturePen."

Liquids, Smoke, and Soap Bubbles – Reflections on Materials for Ephemeral User Interfaces (2010)

Axel Sylvester
University of Hamburg

Tanja Döring, Albrecht Schmidt
University of Duisburg-Essen

Comments: ..

This is a short paper intended to "provoke thoughts about durability, control, and materiality of tangible user interfaces" by introducing the concept of an "ephemeral user interface" composed of transient materials (liquid, smoke, and soap bubbles) that eludes complete user control by demanding that the inputs be treated delicately, as the bubbles inevitably will burst. The user interacts with a computer system by generating and then manipulating soap bubbles, which can be empty or filled with smoke. The interaction surface is a dark liquid on which the bubbles land after being generated.

This work is motivated by the increasing presence of computing in our everyday tasks and the lack of research on the materials used for interaction, despite studies illustrating "the importance of materials and materiality for humans." By using unusual, transient materials such as smoke and soap bubbles, this work easily provokes thought about the possibilities of materials and "handles" used for interaction, through its unusualness and "contradiction to ordinary technical and durable materials of computer technology."

Soap bubbles are highly symbolic and therefore are relevant to many fields including science, art, and entertainment. A fascination with soap bubbles occurs when viewing them as "'in-between' spaces - spaces that are neither real nor fully virtual." This is easily applied and understood from a computing interface perspective.

The system consists of a small, dark, round pool of liquid with a camera beneath it tracking the bubbles, which are blown onto the surface of the liquid from above. Either empty or smoke-filled bubbles can be generated, and an overhead projector can illuminate them.

Once on the surface of the liquid, the bubbles can then be moved either by blowing or gently touching. In one application, the size of the bubbles determines the brightness of the ambient light in the room, and the x and y coordinates control red and blue hues of the ambient light in the room.
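The mapping in that application is simple enough to sketch. The normalization and constants below are my assumptions for illustration, not values from the paper:

```python
def ambient_light(bubble, pool_radius=1.0, max_size=1.0):
    """Map a tracked bubble to ambient-light parameters (an illustrative
    sketch of the paper's size-to-brightness, x/y-to-hue mapping).

    bubble: dict with 'x', 'y' in [-pool_radius, pool_radius] and 'size'.
    Returns (brightness, red, blue), each clamped to [0, 1].
    """
    clamp = lambda v: max(0.0, min(1.0, v))
    brightness = clamp(bubble["size"] / max_size)                 # bigger -> brighter
    red = clamp((bubble["x"] + pool_radius) / (2 * pool_radius))  # x controls red
    blue = clamp((bubble["y"] + pool_radius) / (2 * pool_radius)) # y controls blue
    return brightness, red, blue
```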

The researchers see this as a playful, entertaining, yet useful interaction mechanism as computing is further integrated into our everyday lives. For example, the paper suggests "a growing demand for user interfaces for services where specific and accurate control is not necessary and playful interaction with diverse materials suits the situation well." To illustrate this concept, the paper also suggests the concept of "buttons on demand" which could use these ephemeral materials or simple ambient displays.


I like the ideas that this paper provokes. I hadn't thought of such approaches to tangible user interfaces. I tend to view currently available hardware and think of ideas of how to use those for interfaces. This paper inspires me to think of alternate materials and input methods and devices.

Toward Natural Gesture/Speech HCI: A Case Study of Weather Narration

Indrajit Poddar, Yogesh Sethi, Ercan Ozyildiz, Rajeev Sharma
Pennsylvania State University

Comments: ...

This paper discusses the limitations of current gesture recognition, claiming that all the restrictions imposed in most work violate the naturalness of the HCI involved. Therefore, they have decided to impose no restraints on the user by analyzing videos of weathermen, which is a domain they claim to be analogous to HCI. They employ some vision techniques to identify the person's head and hands and extract 5 features for each hand (distances, angles, velocities).

They use an HMM to recognize the gestures and have defined possible causal models. The speech was also analyzed in conjunction with the hand gestures to try to improve correctness and accuracy in recognizing the gestures.

To begin with, three main types of gestures were imagined: "here," which refers to a specific point, "direction," which can be something like east(ern) or north(ern), and "location," which is a proper noun form of "here." Three classes of gestures were named: contour, area, and point. The speech was analyzed in conjunction with the gesture to determine at what time some keywords were spoken: before, during, or after the gesture.

Analysis of speech and gestures shows that relevant keywords are spoken during the gesture the majority of the time, and sometimes after the gesture. Therefore, the speech can be used both as classification and verification of the gesture. Separate analysis of just the video vs the video and speech shows higher correctness and accuracy when speech is included.

Though the accuracy of this system is considerably lower than that of other gesture recognition systems, the authors claim it is much more natural, as the subjects analyzed were not participating in any user study at all. They were just naturally speaking and making gestures. The authors state that this study can "serve as a basis for a statistical approach for more robust gesture/speech recognition for natural HCI."


As someone who is currently working on a hand gesture recognition project (using the data glove), I am thinking about the implications of this work for my own project. We are currently imagining a very limited gesture set, though we have been thinking about the differences in gestures among different users. We have been planning a user study to determine what specific gestures to use, but this paper makes me think of eventually extending my current work in a much more natural direction: users could perform whatever gestures they want, undefined by the game, and the system would respond uniquely to each user. This could make for an interesting system considering the domains we are targeting.

The Wiimote with multiple sensor bars: creating an affordable, virtual reality controller (2009)

Torben Sko, Henry Gardner
Australian National University

Comments: ...

This paper discusses using a Wii remote as a viable method to control a virtual reality system by using multiple sensor bars to define a much larger field of view for the Wii remote. This way, the Wii remote can be used across a surrounding display.

Five sensor bars were arranged in front of the user in a vertical position, as illustrated by the image above and by the video. Software allows the Wii remote to "bunny hop" from one sensor bar to another, since the Wii remote can only see 4 IR sources at one time, and one sensor bar contains 2 IR LEDs.

The researchers modified the Half Life 2 engine to create a game suitable for testing. The Wii remote was able to successfully track across the whole screen, allowing the user to play the game as normal.

The biggest limitation of this system comes from the "bunny hopping" feature of the Wii remote. Because it knows where it is based on the currently visible IR sources, it must be constantly pointed at the screen, which causes fatigue for the user.


I was impressed by the effectiveness of this method. The video clearly shows that the Wii remotes are quite adequate for precise aiming across the large two-walled display. I think the limitation imposed by the Wii remote technology is not a big issue, since more specialized systems could create a controller that the user can set down rather than keep aimed at the IR sources. The main contribution of this paper is that a Wii-remote-style or gun-style free-hand aiming system can be implemented for surround-screen usage using inexpensive parts.

That being said, I would like to see them improve on this by allowing the user to rest and lower the controller away from the screen, whether they are able to do this with the Wii remote or some other hardware. Naturally, this demo makes me think of glove applications on this type of screen, though I don't have any specific ideas yet...

The Peppermill: A Human-Powered User Interface Device (2010)

Nicolas Villar and Steve Hodges
Microsoft Research, Cambridge, UK

Comments: ...

This paper presents the Peppermill, which is a wireless and batteryless interaction device. The device is powered by the user and momentarily sends out a digital signal.

The paper gives some background on user-powered devices, beginning with the Zenith Space Commander developed in 1955. The authors also mention MIT's user-powered button. Both these devices have the limitation that power is generated and therefore interaction only happens on the down-press of a button. The authors aim to improve on this idea by providing a method for richer interaction.

The method they came up with is a rotary control that is powered when the user twists the device. The user can twist the knob in 2 directions with varying speeds. A simple circuit detects the direction and speed along with a set of modifiers consisting of three buttons. This simple device is thus capable of very rich interaction.

The authors give an example of using this device to control a video browsing application. When no buttons are pressed, rotating the control cycles through a set of videos, much like changing the channels on a TV. The speed of rotation controls the speed of cycling, and the direction controls the direction of cycling. When the green button is held down while the control is rotated, the volume is adjusted: the speed of rotation controls how quickly the volume changes, and the direction determines whether it goes up or down.
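That interaction model amounts to a small event handler. A sketch, where the volume step and the channel count are assumptions for illustration:

```python
def handle_input(direction, speed, green_pressed, state):
    """Interpret a Peppermill-style rotation event (an illustrative sketch
    of the video-browser mapping; the numbers here are assumed).

    direction: +1 or -1 (twist direction); speed: rotation speed >= 0.
    state: dict with 'channel' and 'volume' (volume in [0, 1]).
    """
    step = direction * speed
    if green_pressed:
        # green button held: rotation adjusts the volume
        state["volume"] = max(0.0, min(1.0, state["volume"] + 0.1 * step))
    else:
        # otherwise rotation cycles through videos, like changing channels
        state["channel"] = (state["channel"] + int(step)) % 100
    return state
```

Because the twist itself powers the device, every event arrives with a direction and a speed for free; the three buttons just modify what the rotation means.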

The authors talked a bit about future work, most notably a method of providing haptic feedback to the user while the knob is being turned.


I was intrigued by this control, not only because it is human-powered, but also because of its unique interaction style. When I first looked at how this device is used, I actually didn't know it was human-powered. I am impressed with the versatility of a device that has no batteries or cord.

This method of interaction, especially without batteries, got me thinking about our own projects. I wonder if we can come up with a glove that is somehow human-powered. That would allow for greater motion than the wired gloves and would not need batteries, as the wireless gloves do.

Gameplay Issues in the Design of 3D Gestures for Video Games (2006)

John Payne, Paul Keir, Jocelyn Elgoyhen, Mairghread McLundie, Martin Naef, Martyn Horner, Paul Anderson
Digital Design Studio, Glasgow School of Art


This paper talks about the importance of designing 3D gestures and the relationship between gestures and gameplay. It discusses some important concepts such as affordance, mapping, and feedback. It stresses the importance of simplicity and the mapping of gestures to actions.

The team developed a 3D gesture capturing device which they call the 3motion. It uses a combination of accelerometers similar to a Wiimote to perform 3D gestures. It should be noted that this work was done almost a year before the Wii was released.

To test their device, several simple games were used. These included a tilt-ball game, an alarm game, the classic helicopter game, and a spell-casting game. Unique 3D gestures were defined for each game.

Two users were employed to test out each game and its gestures. Some games and their corresponding gestures had more success than others, which the researchers attribute to varying degrees of "informative tutorials, single word instructional phrases, effective semiotics and appropriate user feedback" among the games. These principles, they say, are very important to ensuring that "the gesture based interaction is intuitive, fun and rapidly understood."

While they did not initially consider it an important factor, the researchers came to realize that the gestures and the type of gameplay are tightly coupled and must be evaluated together.


I was very interested in this paper particularly because of the final project for this class. We are also using 3D gestures, though with a glove instead of a handheld device.

We are also faced with the problem of designing the gestures for our system, and we plan on doing a preliminary study to help guide us to the correct gestures. We can use the insight of this paper to help guide us in our design.

I also wonder what influence the upcoming Wii and its controller had on this research, if any. I do not remember when the Wii was announced, unfortunately, though I doubt it was announced as early as this research.

An Architecture for Gesture-Based Control of Mobile Robots

Soshi Iba, J. Michael Vande Weghe, Christiaan J. J. Paredis, and Pradeep K. Khosla
Carnegie Mellon University

Comments: ...........

This paper presents a system for controlling mobile robots using hand gestures.

Previous work has been done with controlling robots in various ways, but most of those have used a keyboard and mouse to control the robots. This is deemed inappropriate for novice or unfamiliar users, so a more intuitive interface is necessary for these kinds of users.

The goal of this project is to work toward an intuitive, multi-modal system for controlling mobile robots. It introduces hand gestures as a means of control: the user waves in the direction the robots should move or points at the location they should move to.

The system uses a CyberGlove, a Polhemus 6DOF sensor, and a GPS unit in the robot itself.

Six gestures were used to control the robot:

OPENING: Moving from a closed fist to a flat open hand
OPENED: Flat open hand
CLOSING: Moving from a flat open hand to a closed fist
POINTING: Moving from a flat open hand to index finger pointing, or from a closed fist to index finger pointing
WAVING LEFT: Fingers extended, waving to the left, as if directing someone to the left
WAVING RIGHT: Fingers extended, waving to the right

They also incorporated a "wait state," which is simply any gesture other than those above.

Robot control can occur in two modes: local and global. In local mode, the gestures are interpreted as if from the point of view of the robot. In global mode, they are interpreted in world coordinates to control the robot from the user's view. The reason for having the local control mode is to operate the robot remotely, in which video signals from the robot are viewed. The global control mode is used if the robot is in sight of the user.

The gestures work like this for Local Control:

CLOSING: decelerates and eventually stops the robot
OPENING, OPENED: maintains the current state of the robot
POINTING: accelerates the robot
WAVING LEFT/RIGHT: increase the rotational velocity to move left/right

The gestures work like this for Global Control:

CLOSING: decelerates and eventually stops the robot (cancels the destination if one exists)
OPENING, OPENED: maintains the current state of the robot
POINTING: "go there"
WAVING LEFT/RIGHT: directs the robot towards the direction in which the hand is waving.
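The two control mappings above can be sketched as a single dispatch function. The numeric increments and the state representation are assumptions for illustration, not values from the paper:

```python
def apply_gesture(gesture, mode, state, target=None):
    """Map a recognized hand gesture to a robot command (a sketch of the
    paper's local/global mappings; increments here are assumed).

    mode: 'local' or 'global'. state holds 'speed', 'turn', and
    optionally 'destination'. target is a world-coordinate point.
    """
    if gesture == "CLOSING":                       # decelerate, eventually stop
        state["speed"] = max(0.0, state["speed"] - 0.2)
        if mode == "global":
            state.pop("destination", None)         # cancel the destination, if any
    elif gesture in ("OPENING", "OPENED"):
        pass                                       # maintain the current state
    elif gesture == "POINTING":
        if mode == "local":
            state["speed"] += 0.2                  # accelerate
        else:
            state["destination"] = target          # "go there"
    elif gesture in ("WAVING_LEFT", "WAVING_RIGHT"):
        delta = 0.2 if gesture == "WAVING_LEFT" else -0.2
        if mode == "local":
            state["turn"] += delta                 # adjust rotational velocity
        else:
            state["turn"] = delta                  # steer toward the waved direction
    else:
        pass                                       # wait state: any other gesture
    return state
```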

A Hidden Markov Model algorithm was used to detect and recognize the gestures with an accuracy of 96%. The wait state feature helps the recognition significantly compared to systems without a wait state.


I like that these researchers are trying to get a more intuitive interface for controlling robots so that novice users can use the system. I always like this approach to projects where appropriate. It is also interesting to see a new use of the data glove that I have not thought of before.

I wonder how feedback is given from the robot/system to the user. It would be very important to know exactly how your actions are affecting the robot so that you don't over-steer or over-accelerate it, for example. This problem would be magnified if there is any delay between a gesture and the robot's response as perceived by the user, or if the user is in local mode, which definitely would introduce some lag.

I would like to see a usability study, obviously, to sort out issues like the one I have described, especially if the research is aimed at the general public.

I am also interested in the high-level multi-robot control to come...