Toward Natural Gesture/Speech HCI: A Case Study of Weather Narration

Indrajit Poddar, Yogesh Sethi, Ercan Ozyildiz, Rajeev Sharma
Pennsylvania State University


This paper discusses the limitations of current gesture recognition work, arguing that the restrictions imposed in most systems undermine the naturalness of the resulting HCI. The authors therefore impose no constraints on the user: they analyze videos of weather narrators, a domain they argue is analogous to natural HCI. Computer vision techniques identify the person's head and hands, and five features are extracted for each hand (distances, angles, velocities).
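The paper names the features only at a high level, so as a concrete illustration, here is a rough sketch of what such per-hand features could look like, assuming tracked 2D head and hand positions per frame (the exact five features below are my guess, not the paper's definition):

    import numpy as np

    def hand_features(head_xy, hand_xy, prev_hand_xy, dt=1.0 / 30):
        # Hypothetical per-hand feature vector, loosely following the
        # paper's "distances, angles, velocities" description.
        head = np.asarray(head_xy, dtype=float)
        hand = np.asarray(hand_xy, dtype=float)
        prev = np.asarray(prev_hand_xy, dtype=float)
        rel = hand - head
        distance = np.linalg.norm(rel)             # hand-to-head distance
        angle = np.arctan2(rel[1], rel[0])         # angle of hand w.r.t. head
        vel = (hand - prev) / dt                   # frame-to-frame velocity
        speed = np.linalg.norm(vel)                # velocity magnitude
        motion_angle = np.arctan2(vel[1], vel[0])  # velocity direction
        return np.array([distance, angle, speed, motion_angle, rel[1]])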

They use HMMs to recognize the gestures and define several possible causal models. Speech is analyzed in conjunction with the hand gestures in an attempt to improve the correctness and accuracy of gesture recognition.
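As a rough picture of the HMM step, one could train one HMM per gesture class on these feature sequences and pick the class with the highest likelihood. A minimal sketch using hmmlearn (the training setup, state count, and data shapes are assumptions, not the paper's configuration):

    import numpy as np
    from hmmlearn import hmm

    def train_models(sequences_by_class, n_states=4):
        # One Gaussian HMM per gesture class (e.g. contour, area, point);
        # each training sequence is a (T, 5) array of per-frame features.
        models = {}
        for label, seqs in sequences_by_class.items():
            X = np.vstack(seqs)
            lengths = [len(s) for s in seqs]
            model = hmm.GaussianHMM(n_components=n_states,
                                    covariance_type="diag", n_iter=50)
            model.fit(X, lengths)
            models[label] = model
        return models

    def classify(models, seq):
        # Pick the gesture class whose HMM assigns the highest log-likelihood.
        return max(models, key=lambda label: models[label].score(seq))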

To begin with, three main categories of spoken keywords were considered: "here," which refers to a specific point; "direction," covering words like "east(ern)" or "north(ern)"; and "location," essentially a proper-noun form of "here." Three classes of gestures were defined: contour, area, and point. Speech was then analyzed in conjunction with gesture to determine when keywords were spoken relative to the gesture: before, during, or after it.
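The before/during/after bookkeeping itself is simple: compare each keyword's timestamp against the gesture's time interval. A toy version (the timestamp and interval formats here are assumptions):

    def keyword_timing(keyword_time, gesture_start, gesture_end):
        # Classify when a keyword was spoken relative to a gesture interval.
        if keyword_time < gesture_start:
            return "before"
        if keyword_time > gesture_end:
            return "after"
        return "during"

    # e.g. keyword_timing(2.4, 2.0, 3.1) -> "during"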

Analysis of the speech and gestures shows that relevant keywords are spoken during the gesture the majority of the time, and sometimes after it. Speech can therefore be used both to classify and to verify the gesture. Comparing recognition on video alone against video plus speech shows higher correctness and accuracy when speech is included.
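One simple way to picture using speech for verification is to reweight each gesture class's HMM score by how compatible the co-occurring keyword is with that class. This is only a sketch of the idea, with made-up compatibility values, not the paper's actual model:

    import math

    # Hypothetical compatibility of keyword categories with gesture classes,
    # e.g. "here" keywords tend to accompany pointing gestures.
    COMPAT = {
        ("here", "point"): 0.8, ("here", "area"): 0.1, ("here", "contour"): 0.1,
        ("direction", "contour"): 0.6, ("direction", "point"): 0.3,
        ("direction", "area"): 0.1,
        ("location", "area"): 0.5, ("location", "point"): 0.4,
        ("location", "contour"): 0.1,
    }

    def fused_class(loglik_by_class, keyword):
        # Add log P(keyword | class) to each HMM log-likelihood, take the max.
        return max(loglik_by_class,
                   key=lambda c: loglik_by_class[c]
                                 + math.log(COMPAT.get((keyword, c), 1e-3)))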

Though the accuracy of this system is considerably lower than that of other gesture recognition systems, the authors claim it is much more natural, since the subjects were not participating in any user study at all; they were simply speaking and gesturing naturally. The authors state that this study can "serve as a basis for a statistical approach for more robust gesture/speech recognition for natural HCI."

----------

As someone who is currently working on a hand gesture recognition project (using a data glove), I am thinking about the implications of this work for my own project. We are currently envisioning a very limited gesture set, though we have been thinking about how gestures differ among users. We have been planning a user study to determine which specific gestures to use, but this paper makes me want to eventually extend my current work in a much more natural direction: users could perform whatever gestures they want, and the system would respond to each user individually, letting each one perform his or her own natural gesture, undefined by the game. This could make for an interesting system given the domains we are targeting.

2 comments of glory:

M Russell said...

A multimodal approach should increase the accuracy of classification. I liked the identification of particular keywords being associated with gesture subcomponents.

Franck Norman said...

Allow the users to perform the gestures they want and make the system respond accordingly. That sounds like a really good idea.
