A Multi-scale Boosted Detector for Efficient and Robust Gesture Recognition

Monnier, C., German, S., and Ost, A.

Presented at the ECCV ChaLearn Workshop on Looking at People, Zurich, Switzerland (September 2014)

We present an approach to detecting and recognizing gestures in a stream of multi-modal data. Our approach combines a sliding window gesture detector with features drawn from skeleton data, color imagery, and depth data produced by a first-generation Kinect sensor. The detector consists of a set of one-versus-all boosted classifiers, each tuned to a specific gesture. Features are extracted at multiple temporal scales, and include descriptive statistics of normalized skeleton joint positions, angles, and velocities, as well as image-based hand descriptors. The full set of gesture detectors may be trained in under two hours on a single machine, and is extremely efficient at runtime, operating at 1700 fps using only skeletal data, or at 100 fps using fused skeleton and image features. Our method achieved a Jaccard Index score of 0.834 on the ChaLearn-2014 Gesture Recognition Test dataset.
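The core recipe in the abstract — descriptive statistics pooled over several temporal window sizes, fed to a one-versus-all boosted classifier per gesture — can be sketched in a few dozen lines. The sketch below is illustrative only: it uses decision-stump AdaBoost as a stand-in for the authors' boosting variant, synthetic "joint" data in place of Kinect skeletons, and hypothetical window sizes and feature choices not taken from the paper.

```python
import numpy as np

def window_features(frames, scales=(5, 10)):
    """Mean/std of joint features over multiple temporal scales,
    echoing the paper's multi-scale descriptive statistics.
    Window sizes here are illustrative, not the authors' values."""
    feats = []
    for s in scales:
        w = frames[-s:]                      # last s frames of the window
        feats.append(w.mean(axis=0))
        feats.append(w.std(axis=0))
    return np.concatenate(feats)

class BoostedDetector:
    """One-versus-all AdaBoost over decision stumps -- a simple
    stand-in for the per-gesture boosted classifiers described above."""
    def __init__(self, n_rounds=10):
        self.n_rounds = n_rounds
        self.stumps = []                     # (feature, threshold, polarity, weight)

    def fit(self, X, y):
        # y must be in {-1, +1}; +1 marks the target gesture
        n, d = X.shape
        w = np.full(n, 1.0 / n)
        for _ in range(self.n_rounds):
            best = None
            for j in range(d):               # exhaustive stump search
                for t in np.unique(X[:, j]):
                    for pol in (1, -1):
                        pred = pol * np.where(X[:, j] > t, 1, -1)
                        err = w[pred != y].sum()
                        if best is None or err < best[0]:
                            best = (err, j, t, pol)
        # reweight: misclassified samples gain weight
            err, j, t, pol = best
            alpha = 0.5 * np.log((1 - err + 1e-10) / (err + 1e-10))
            pred = pol * np.where(X[:, j] > t, 1, -1)
            w = w * np.exp(-alpha * y * pred)
            w /= w.sum()
            self.stumps.append((j, t, pol, alpha))
        return self

    def score(self, x):
        # signed confidence; > 0 means the gesture is detected
        return sum(a * p * (1.0 if x[j] > t else -1.0)
                   for j, t, p, a in self.stumps)

# Synthetic "skeleton" stream: 10-frame windows of 3 joint coordinates,
# with a class-dependent offset standing in for distinct gestures.
rng = np.random.default_rng(0)
def make_window(gesture):
    base = 1.0 if gesture else 0.0
    return base + 0.1 * rng.standard_normal((10, 3))

X = np.array([window_features(make_window(g)) for g in [0, 1] * 20])
y = np.array([1 if g == 0 else -1 for g in [0, 1] * 20])  # one-vs-all labels
det = BoostedDetector().fit(X, y)
```

A sliding-window driver would evaluate every per-gesture detector on the features of the window ending at each new frame and report the highest-scoring gesture whose score exceeds a detection threshold; because feature extraction and stump evaluation are cheap, this style of detector can run far faster than real time, consistent with the frame rates reported above.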

For More Information

To learn more or request a copy of a paper (if available), contact Camille Monnier.

(Please include your name, address, organization, and the paper reference. Requests without this information will not be honored.)