Although text-to-image models such as Stable Diffusion have advanced rapidly, they still struggle to generate anatomically accurate images of humans. Generated images often contain the wrong number of hands or fingers, particularly when subjects are depicted holding items, when their hands are intertwined, or when the viewing angle is obstructed. To provide a correct human prior for text-to-image models, our paper presents a human mesh recovery and motion capture framework. The framework produces a 3D human mesh from monocular images and videos, capturing precise poses of the human body, face, and hands. Our method captures body motion in real time, and the pose of the generated full-body parametric model (SMPL-X) is closely aligned with the input. We demonstrate the effectiveness of our model on various challenging real-world datasets and apply it to multiple downstream tasks.
(Source: PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images)
(Source: SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation)
(Source: https://toyxyz.gumroad.com/l/ciojz)
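To make the regressed representation concrete, below is a minimal sketch of posing the SMPL-X parametric model with the open-source smplx Python package. The model directory path and the zero-valued shape and pose tensors are illustrative placeholders; in the framework they would be the parameters regressed from the input image or video frame.

import torch
import smplx

# Load a gender-neutral SMPL-X model; the model files under "models/smplx/"
# must be downloaded separately from the SMPL-X project page.
model = smplx.create(
    model_path="models",   # placeholder directory containing smplx/ model files
    model_type="smplx",
    gender="neutral",
    use_pca=False,         # full axis-angle hand poses instead of PCA components
)

betas = torch.zeros(1, 10)           # body shape coefficients (placeholder)
global_orient = torch.zeros(1, 3)    # root orientation, axis-angle (placeholder)
body_pose = torch.zeros(1, 21 * 3)   # 21 body joints, axis-angle (placeholder)

# Forward pass: parameters -> posed 3D mesh.
output = model(betas=betas, global_orient=global_orient,
               body_pose=body_pose, return_verts=True)
vertices = output.vertices.detach().cpu().numpy()[0]  # (10475, 3) for SMPL-X
print(vertices.shape, model.faces.shape)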
The generated 3D human mesh can be viewed from a full 360 degrees by varying the camera angle.
(Source: ECON: Explicit Clothed humans Optimized via Normal integration)
(Source: VIBE: Video Inference for Human Body Pose and Shape Estimation)
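One way to reproduce such 360-degree views offline is to orbit a virtual camera around the recovered mesh and render each viewpoint, for example with trimesh and pyrender. This is a hedged sketch, not part of the framework itself; the mesh file name, orbit radius, and image size are assumptions for illustration.

import numpy as np
import trimesh
import pyrender

tri = trimesh.load("mesh.obj", force="mesh")  # placeholder exported mesh
scene = pyrender.Scene()
scene.add(pyrender.Mesh.from_trimesh(tri))
scene.add(pyrender.DirectionalLight(intensity=3.0))

camera = pyrender.PerspectiveCamera(yfov=np.pi / 3.0)
renderer = pyrender.OffscreenRenderer(512, 512)

for deg in range(0, 360, 30):
    theta = np.radians(deg)
    # Orbit the camera around the world y (up) axis at radius 2.5,
    # keeping it pointed at the origin where the mesh is assumed to sit.
    pose = np.array([
        [ np.cos(theta), 0.0, np.sin(theta), 2.5 * np.sin(theta)],
        [ 0.0,           1.0, 0.0,           0.0                ],
        [-np.sin(theta), 0.0, np.cos(theta), 2.5 * np.cos(theta)],
        [ 0.0,           0.0, 0.0,           1.0                ],
    ])
    cam_node = scene.add(camera, pose=pose)
    color, _ = renderer.render(scene)  # (512, 512, 3) uint8 view of the mesh
    scene.remove_node(cam_node)        # reposition the camera for the next view
renderer.delete()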
The captured 3D human motion can be retargeted to animate another character.
(Source: ROMP: Monocular, One-stage, Regression of Multiple 3D People)
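Because SMPL-X decouples pose from shape, a captured pose sequence can be replayed on a body with different shape coefficients, which is a simple form of retargeting. The sketch below assumes a hypothetical captured_poses tensor of per-frame body poses; it is an illustration under those assumptions, not the framework's retargeting pipeline.

import torch
import smplx

model = smplx.create("models", model_type="smplx", gender="neutral")

# captured_poses is a hypothetical (T, 63) tensor of per-frame body poses
# (21 joints in axis-angle) produced by the motion capture stage.
captured_poses = torch.zeros(100, 21 * 3)
# Shape coefficients of a different identity; values are illustrative.
target_betas = torch.tensor([[2.0, -1.0] + [0.0] * 8])

for t in range(captured_poses.shape[0]):
    output = model(betas=target_betas,
                   body_pose=captured_poses[t : t + 1],
                   return_verts=True)
    frame_vertices = output.vertices[0]  # posed mesh of the target character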
@article{yu2023hmr,
  author  = {Yuan-Peng Yu and Scott John Easley},
  title   = {Human Mesh Recovery and Motion Capture},
  journal = {},
  year    = {2023},
}