Dynamic Full-body Motion Agent with Object Interaction via Blending Pre-trained Modular Controllers

KAIST
CVPRF (CVPR Findings) 2026

*Indicates Equal Contribution
Our framework blends pretrained experts from distinct motion domains—a dynamic whole-body agent (PHC) and a contact-aware HOI agent (InterMimic)—enabling dynamic and contact-rich human-object interaction (HOI) generation and imitation in physics simulation. Across diverse motion styles (run, jump, kick, dance) and HOI categories, the composer demonstrates versatility and robustness while maintaining training efficiency.

Abstract

Generating physically plausible dynamic human-object interaction (HOI) motions remains challenging, mainly because existing HOI datasets are limited to static interactions and pretrained agents handle either dynamic full-body motions without objects or static HOI motions, but not both. Recent works such as InsActor and CLoSD generate HOI motions in planning and execution stages, yet they are limited to static or short-term contacts, e.g., striking. In this work, we propose a framework that achieves dynamic, long-term interaction motions, such as running while holding a table, by combining pretrained motion priors and imitation agents across planning and execution stages. In the planning stage, we augment HOI datasets with dynamic priors from a pretrained human motion diffusion model and then generate object trajectories, yielding planned dynamic HOI sequences. In the execution stage, a composer network blends the actions of pretrained imitation agents specialized for either dynamic human motions or static HOI motions, enabling spatio-temporal composition of their complementary skills. Compared with relevant prior art, our method consistently improves success rates while maintaining interaction on dynamic HOI tasks. Furthermore, blending pretrained experts with our composer achieves competitive performance with significantly reduced training time. Ablation studies validate the effectiveness of our augmentation and composer blending.

TL;DR We blend pretrained dynamic motion and HOI imitation agents via a composer network to achieve physically plausible, contact-rich human-object interaction in physics simulation.
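To make the idea of blending expert actions concrete, here is a minimal sketch of one possible blending rule. It is an illustration under stated assumptions, not the paper's implementation: the page does not specify how the composer combines actions, so the convex per-dimension blend, the sigmoid gating, and all names (`blend_actions`, `a_dynamic`, `a_hoi`, `logits`) are hypothetical. The expert actions are random placeholders standing in for the outputs of the dynamic whole-body agent and the HOI agent.

```python
import numpy as np


def sigmoid(x):
    """Map real-valued logits to weights in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def blend_actions(a_dynamic, a_hoi, logits):
    """Convex per-dimension blend of two expert actions (assumed rule).

    a_dynamic: action from the dynamic whole-body expert, shape (D,)
    a_hoi:     action from the HOI expert, shape (D,)
    logits:    hypothetical composer output, shape (D,); a large positive
               value favors the dynamic expert on that action dimension.
    """
    w = sigmoid(logits)
    return w * a_dynamic + (1.0 - w) * a_hoi


# Toy example with placeholder actions for a 6-dimensional action space.
rng = np.random.default_rng(0)
num_dims = 6
a_dynamic = rng.uniform(-1.0, 1.0, num_dims)
a_hoi = rng.uniform(-1.0, 1.0, num_dims)

# Assumed composer output: the first three dimensions (e.g. legs) follow the
# dynamic expert, the last three (e.g. arms in contact) follow the HOI expert.
logits = np.array([10.0, 10.0, 10.0, -10.0, -10.0, -10.0])
blended = blend_actions(a_dynamic, a_hoi, logits)
```

In this sketch, recomputing `logits` at every simulation step would give the temporal part of the composition, while the per-dimension weights give the spatial part. Because the blend is convex, each blended action dimension stays between the two expert values.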

Demo Video

Quantitative Results

Qualitative Results

Dynamic HOI Planning

Dynamic HOI Execution

More End-to-end Results

Poster

BibTeX

@inproceedings{cvprf2026dynamichoi,
  title={Dynamic Full-body Motion Agent with Object Interaction via Blending Pre-trained Modular Controllers},
  author={Nam, Sanghyeok and Kim, Byoungjun and Park, Daehyung and Kim, Tae-Kyun},
  booktitle={CVPRF(CVPR Findings)},
  year={2026},
}