Building Robotics Foundation Models with Reasoning in the Loop by Mr Duan Jiafei

13 Jan 2026, 10.00 AM - 11.00 AM, ESR10 | Current Students, Industry/Academic Partners, Prospective Students

Abstract
Recent advances in generative AI have demonstrated the power of scaling: large language and vision models trained on internet-scale data now exhibit remarkable capabilities in perception, generation, and reasoning. These successes have inspired growing interest in bringing foundation-model paradigms to robotics, with the goal of moving beyond task-specific autonomy in constrained environments toward general-purpose robots that can operate robustly in open-world settings. However, robotics fundamentally differs from language and vision. Robot learning cannot rely on passive internet data at scale, and collecting large-scale, high-quality embodied interaction data remains expensive and slow. As a result, simply scaling data and model parameters is insufficient. To build general-purpose and robust robotics foundation models, we must instead ask: how can robots learn more from less data—and continue to improve over time?

In this talk, I argue that reasoning in the loop offers a promising path forward. Rather than treating reasoning as a downstream capability applied after learning, I show how reasoning can be integrated directly into the learning process itself. This enables robots to learn from structured feedback, temporal context, and failure, thereby compensating for data scarcity and improving generalization. I will present a unified research agenda along three axes. First, I introduce approaches for spatial reasoning, enabling robots to ground language in 3D space and reason about object relationships for precise manipulation. Second, I discuss temporal reasoning, focusing on memory-centric models that retain, query, and reason over past observations to support long-horizon, high-precision control. Third, I show how reasoning over failures allows robots to understand why actions fail and use that understanding to self-improve, increasing robustness without additional supervision. Together, these results reframe robotics foundation models as systems that learn through reasoning, closing the loop between perception, action, and structured inference to enable self-improving autonomy.


Biography
Jiafei Duan is a Ph.D. candidate in Computer Science & Engineering at the University of Washington, advised by Professors Dieter Fox and Ranjay Krishna. His research focuses on robotics foundation models, with an emphasis on scalable data collection and generation, grounding vision–language models in robotic reasoning, and improving robust generalization in robot learning. His work has been featured in MIT Technology Review, GeekWire, VentureBeat, and Business Wire. Jiafei's research has appeared in top AI and robotics venues, including ICLR, ICML, RSS, CoRL, ECCV, IJCAI, CoLM, and EMNLP, and has received several honors, including Best Paper at Ubiquitous Robots 2023, Best Paper at the CoRL RememberRL Workshop 2025, and a Spotlight Award at ICLR 2024.