Presented at the ECCV ChaLearn Workshop on Looking at People, Zurich, Switzerland (September 2014)
We present an approach to detecting and recognizing gestures in a stream of multi-modal data. Our approach combines a sliding-window gesture detector with features drawn from skeleton data, color imagery, and depth data produced by a first-generation Kinect sensor. The detector consists of a set of one-versus-all boosted classifiers, each tuned to a specific gesture. Features are extracted at multiple temporal scales and include descriptive statistics of normalized skeleton joint positions, angles, and velocities, as well as image-based hand descriptors. The full set of gesture detectors can be trained in under two hours on a single machine and is highly efficient at runtime, operating at 1,700 fps using only skeletal data, or at 100 fps using fused skeleton and image features. Our method achieved a Jaccard Index score of 0.834 on the ChaLearn-2014 Gesture Recognition Test dataset.
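For readers unfamiliar with the reported metric, the following is a minimal sketch of the Jaccard Index as an overlap measure between the frames labeled with a gesture in the ground truth and the frames a detector assigns to that gesture. The function name `jaccard_index` and the frame ranges are hypothetical and chosen only for illustration. How the challenge aggregates per-gesture scores across sequences is not described here, so the sketch covers a single gesture instance only.

```python
def jaccard_index(true_frames, predicted_frames):
    """Return |A intersect B| / |A union B| for two collections of frame indices.

    Illustrative helper (not from the paper): scores one gesture instance.
    Returns 0.0 when both collections are empty, which is a convention
    assumed here rather than one taken from the challenge rules.
    """
    true_set = set(true_frames)
    pred_set = set(predicted_frames)
    union = true_set | pred_set
    if not union:
        return 0.0
    return len(true_set & pred_set) / len(union)


# Hypothetical example: the gesture truly spans frames 10..19,
# while the detector fires on frames 15..24.
truth = range(10, 20)
prediction = range(15, 25)
score = jaccard_index(truth, prediction)
print(f"Jaccard Index: {score:.3f}")
```

In this example the two intervals share 5 frames out of 15 distinct frames, so detections that start late or run long are penalized even when the gesture label itself is correct.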