The increasing affordability of robot hardware is accelerating the integration of robots into everyday activities. However, training a robot to automate a task requires expensive trajectory data, in which a trained human annotator physically moves a robot through the desired motions. Consequently, only those with access to a physical robot can produce the demonstrations needed to train one.
In this work, we remove this restriction with EVE, an iOS app that enables everyday users to train robots using intuitive augmented reality (AR) visualizations, without needing a physical robot. With EVE, users can collect demonstrations by specifying waypoints with their hands, visually inspecting the environment for obstacles, modifying existing waypoints, and verifying collected trajectories.
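To make this waypoint-based workflow concrete, the sketch below shows one plausible representation of a collected demonstration. All names and fields here (Waypoint, Trajectory, and so on) are illustrative assumptions for exposition, not EVE's actual data model.

```python
# A minimal sketch of a waypoint-based demonstration record.
# Class and field names are illustrative assumptions, not EVE's data model.
from dataclasses import dataclass, field


@dataclass
class Waypoint:
    """A single end-effector target specified by the user's hand."""
    position: tuple[float, float, float]            # (x, y, z) in meters, AR world frame
    orientation: tuple[float, float, float, float]  # quaternion (x, y, z, w)
    gripper_open: bool = True


@dataclass
class Trajectory:
    """An ordered sequence of waypoints forming one demonstration."""
    waypoints: list[Waypoint] = field(default_factory=list)
    verified: bool = False  # set after the user replays and approves the motion

    def add(self, wp: Waypoint) -> None:
        """Append a waypoint specified by the user's hand."""
        self.waypoints.append(wp)

    def modify(self, index: int, wp: Waypoint) -> None:
        """Replace an existing waypoint, e.g. after spotting an obstacle."""
        self.waypoints[index] = wp

    def verify(self) -> None:
        """Mark the trajectory as visually inspected and approved."""
        self.verified = True
```

Under this representation, modifying a waypoint or verifying a full trajectory maps directly onto the in-app interactions described above.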
In a user study ($N=14$, $D=30$) spanning three common tabletop tasks, EVE outperformed three state-of-the-art interfaces in success rate and was comparable to kinesthetic teaching (physically moving the robot by hand) in completion time, usability, motion intent communication, enjoyment, and preference (mean $p=0.30$). EVE allows users to train robots for personalized tasks, such as sorting desk supplies, organizing ingredients, or setting up board games. We conclude by enumerating limitations and design considerations for future AR-based demonstration collection systems for robotics.
To understand the opportunities and challenges of trajectory collection, we conducted a formative study with $10$ participants of varying robotics experience, using state-of-the-art collection interfaces (kinesthetic teaching, teleoperation with a VR controller, and AR2-D2) and an initial prototype of EVE with three different AR visualizations (AR Kinesthetic Teaching, Path History, and Invisible Robot).
In addition, based on the semi-structured interviews, we distilled the challenges users faced with the initial AR visualizations into the following six:
Informed by the formative study findings and our own experience using EVE, we upgraded the prototype with seven additional system changes to address these usability challenges.
We conducted a user study to evaluate the effectiveness of EVE against baseline interfaces on three common tabletop tasks. Below, we describe the task setups and a real-world policy evaluation using data collected with EVE.
Success rates and completion times for each task were recorded for all interfaces. After completing collection with all interfaces, participants filled out the System Usability Scale (SUS) survey and a form ranking the interfaces by motion intent communication, enjoyment, and overall preference. We aimed to record $10$ collection attempts for each task.
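For reference, SUS responses are reduced to a $0$-$100$ score using the scale's standard formula: odd-numbered items contribute (rating $-$ 1), even-numbered items contribute (5 $-$ rating), and the sum is scaled by $2.5$. A minimal sketch of this scoring:

```python
def sus_score(responses: list[int]) -> float:
    """Standard System Usability Scale (SUS) scoring.

    `responses` holds the 10 item ratings in questionnaire order,
    each on a 1-5 Likert scale. Odd-numbered items contribute
    (rating - 1); even-numbered items contribute (5 - rating);
    the sum is multiplied by 2.5 to yield a 0-100 score.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based: even i = odd-numbered item
        for i, r in enumerate(responses)
    )
    return total * 2.5


# Example: a participant who rates every item 4 scores
# sus_score([4] * 10) == 50.0
```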
We trained policies with Perceiver-Actor (PerAct), using $6$ demonstrations per interface over $30{,}000$ training iterations. Across $30$ rollouts of the toggle-switch task, the policy trained on EVE-collected data achieved $10\%$ higher accuracy than the policy trained on AR2-D2-collected data.
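As a rough illustration of how such a rollout comparison can be computed, the sketch below evaluates a trained policy over repeated episodes. The `policy` and `env` interfaces are hypothetical placeholders for exposition, not PerAct's actual API.

```python
# Hypothetical evaluation loop; `policy` and `env` are illustrative
# placeholders, not PerAct's actual API.
def success_rate(policy, env, num_rollouts: int = 30, max_steps: int = 200) -> float:
    """Fraction of rollouts in which the policy completes the task."""
    successes = 0
    for _ in range(num_rollouts):
        obs = env.reset()
        for _ in range(max_steps):
            action = policy.predict(obs)           # e.g., next keyframe pose + gripper state
            obs, done, success = env.step(action)  # env reports task completion
            if done:
                successes += int(success)
                break
    return successes / num_rollouts


# Comparing the two policies on the toggle-switch task:
#   success_rate(eve_policy, toggle_env) - success_rate(ar2d2_policy, toggle_env)
# would be about 0.10 under the reported result.
```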