Monocular 3D human pose estimation using sparse motion features

Ben Daubney, David Gibson, Neill Campbell

    Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review


    In this paper we demonstrate that the motion of a sparse set of tracked features can be used to extract 3D pose from a single viewpoint. The purpose of this work is to illustrate the wealth of information present in the temporal dimension of an image sequence that is currently not being exploited. Our approach depends entirely on motion. We use low-level part detectors consisting of 3D motion models, which describe probabilistically how well the observed motion of a tracked feature fits each model. Given these initial detections, a bottom-up approach is employed to find the most likely configuration of a person in each frame. The models are learnt directly from motion capture data, and no training is performed using descriptors derived from image sequences. As a result, the presented approach can be applied to people moving at arbitrary and previously unseen orientations relative to the camera, making it particularly versatile and robust. We evaluate our approach for both walking and jogging on the HumanEva data set, where we achieve an accuracy of 65.8 ± 23.3 mm for walking and 69.4 ± 20.2 mm for jogging.
    Original language: English
    Title of host publication: 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops)
    Publisher: Institute of Electrical and Electronics Engineers
    Number of pages: 8
    ISBN (Print): 978-1-4244-4442-7
    Publication status: Published - Sept 2009

    Publication series

    Name: 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops)


    • Humans
    • Motion estimation
    • Tracking
    • Data mining
    • Motion detection
    • Detectors
    • Image sequences
    • Cameras
    • Robustness
    • Legged locomotion
    • pose estimation
    • feature extraction
    • HumanEva data set

