MEMS Robotics Seminar: How general are generalist robot policies? Data scaling, diagnostic tools, and memorization in VLAs

Speaker

Mac Schwager

Abstract: Vision-Language-Action (VLA) policies have recently emerged as a promising paradigm for generalist robot autonomy. However, VLAs face several challenges that must be overcome before they can achieve their potential. Firstly, these models require fine-tuning with human-teleoperation demonstrations, which can be tedious, expensive, and time-consuming to collect. Secondly, policy performance is limited by teleop demonstration quality, which can be highly variable depending on the human teleoperator's skill and the dexterity barrier of the teleop interface. Lastly, VLA models, with the current state of practice, appear to suffer from strong overfitting to the fine-tuning data. All of these issues lead to "generalist" policies that do not generalize very well. In this talk, I will describe recent work in my lab to address each of these problems. I will describe techniques we have developed to scale up demonstration data by leveraging 3D Gaussian Splatting models and optimization-based planning experts to generate arbitrary volumes of high-quality visual demonstrations to augment or replace human teleop data. I will describe our work on multi-task progress models that can track, based on visual inputs and text prompts, the progress of a demonstration. These models can be used to filter human teleop data for high-quality training data, and as an online performance monitor during policy execution for fault detection, recovery guidance, and diagnostics. Finally, I will describe our work on memorization vs. generalization in visuo-motor policies, where we find that current fine-tuning practices cause overfitting to the training data, limiting a VLA's generalization capabilities. I will explore some remedies for this problem. The talk will include experimental results for drone navigation policies, drone aerial manipulation policies, and table-top manipulation.

DR. MAC SCHWAGER is an Associate Professor of Aeronautics and Astronautics and Computer Science (by courtesy) at Stanford University.

Categories

Engineering, Lecture/Talk, Technology