Data-Efficient Robot Manipulation through Diffusion Augmentation and Vision-Language Models by Asst. Prof. Daniel Seita

Date/Time: 23 Jan 2026, 3.00 PM - 4.00 PM
Venue: LT26 (South Spine)
Audience: Current Students, Industry/Academic Partners, Prospective Students

Abstract:

Recent progress in robot learning has produced impressive results, yet many systems still require large datasets of demonstrations and remain less effective in clutter or with highly deformable objects. This talk presents work on data-efficient manipulation using (i) diffusion-based augmentation that synthesizes geometrically consistent images and action labels to reduce demonstration requirements, and (ii) vision-language models (VLMs) that inject high-level semantics for contact-rich motion planning in clutter. We will also discuss the current state of VLMs for low-level manipulation reasoning, as evaluated by our ManipBench benchmark. Together, these directions point toward robot manipulators that can learn and operate with fewer demonstrations in cluttered, real-world environments.


Bio:

Daniel Seita is an Assistant Professor in the Computer Science Department at the University of Southern California and the director of the Sensing, Learning, and Understanding for Robotic Manipulation (SLURM) Lab. His research interests span computer vision, machine learning, and foundation models for robot manipulation, with a focus on improving performance in visually and geometrically challenging settings. Daniel was a postdoc at Carnegie Mellon University's Robotics Institute and holds a PhD in computer science from the University of California, Berkeley. He was selected for the AAAI 2026 New Faculty Highlights program and presents his work at premier robotics conferences such as ICRA, IROS, RSS, and CoRL.