EVE: Enabling Anyone to Train Robots using Augmented Reality
UIST 2024

Jun Wang, Chun-Cheng Chang, Jiafei Duan, Dieter Fox, Ranjay Krishna
University of Washington · NVIDIA · Allen Institute for AI

 EVE is an iOS app that enables everyday users to train robots using
intuitive augmented reality visualizations, without needing a real robot.

Abstract

The increasing affordability of robot hardware is accelerating the integration of robots into everyday activities. However, training a robot to automate a task requires expensive trajectory data, where a trained human annotator physically moves a robot through the task. Consequently, only those with access to robots can produce demonstrations to train them.

In this work, we remove this restriction with EVE, an iOS app that enables everyday users to train robots using intuitive augmented reality visualizations, without needing a physical robot. With EVE, users can collect demonstrations by specifying waypoints with their hands, visually inspecting the environment for obstacles, modifying existing waypoints, and verifying collected trajectories.

In a user study ($N=14$, $D=30$) consisting of three common tabletop tasks, EVE outperformed three state-of-the-art interfaces in success rate and was comparable to kinesthetic teaching (physically moving a real robot) in completion time, usability, motion intent communication, enjoyment, and preference (mean $p=0.30$). EVE allows users to train robots for personalized tasks, such as sorting desk supplies, organizing ingredients, or setting up board games. We conclude by enumerating limitations and design considerations for future AR-based demonstration collection systems for robotics.
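To make the interaction model concrete, the sketch below shows how an iOS app of this kind might record hand-specified waypoints. It is a minimal illustration, not EVE's actual data model: the `Waypoint` and `TrajectoryRecorder` types are hypothetical, and the fingertip position is assumed to come from the app's hand tracking (e.g., Vision's hand-pose detection unprojected into the AR scene).

```swift
import Foundation
import simd

// Hypothetical record for one demonstration waypoint: a world-space
// end-effector target plus a gripper state.
struct Waypoint: Codable {
    var position: SIMD3<Float>   // AR world coordinates, in meters
    var gripperOpen: Bool
    var timestamp: TimeInterval
}

// Collects and edits waypoints for a single trajectory demonstration,
// mirroring the specify/modify/verify workflow described above.
final class TrajectoryRecorder {
    private(set) var waypoints: [Waypoint] = []

    // Append a waypoint at the user's current fingertip location.
    func addWaypoint(at fingertipWorldPosition: SIMD3<Float>, gripperOpen: Bool) {
        waypoints.append(Waypoint(position: fingertipWorldPosition,
                                  gripperOpen: gripperOpen,
                                  timestamp: Date().timeIntervalSince1970))
    }

    // Move an existing waypoint, supporting the "modify existing
    // waypoints" interaction.
    func moveWaypoint(at index: Int, to position: SIMD3<Float>) {
        guard waypoints.indices.contains(index) else { return }
        waypoints[index].position = position
    }
}
```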

Formative User Study

To understand the opportunities and challenges of trajectory collection, we conducted a formative study with $10$ participants of varying robotics experience, using state-of-the-art collection interfaces (kinesthetic teaching, teleoperation with a VR controller, AR2-D2) and the initial prototype of EVE with three different AR visualizations (AR Kinesthetic Teaching, Path History, Invisible Robot).

Figure: User comments about the six demonstration collection methods used in the formative study.

Figure: Formative study results. The mean and standard deviation of usability (SUS: 0-100), motion intent communication (ranking: 1-4), user enjoyment (ranking: 1-4), and user preference (ranking: 1-4) are reported; statistical significance is indicated at $p < 0.05$.

In addition, based on the semi-structured interviews, we distilled the challenges users faced with the current AR visualizations into the following six:

  1. Ambiguous spatial position of the hand and the robot: The lack of depth perception made it challenging for users to accurately gauge the relative positions of their hands and the AR robot.
  2. Lack of knowledge of joint constraints: Without information about the joint limits, and with depth perception further hampered by the iPad screen, the robot moved unexpectedly when a hand-specified point fell outside the joint limits (a joint-limit check is sketched after this list).
  3. Imprecise trajectory visualization: The path history visualization drew a straight green line between the robot's end-effector coordinate and the user's hand coordinate, which did not reflect the robot's actual motion.
  4. Absence of feedback regarding collection efficacy: The lack of collision detection prevented participants from verifying whether a collected trajectory was feasible.
  5. Obstructive robot design: The AR robot's 1:1 scale correspondence with the physical Franka Panda frequently occluded relevant objects within the scene.
  6. Inconsistent hand tracking: The hand tracking system employed in the AR environment suffered from inaccuracies and latency.
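A natural mitigation for challenge 2 is to validate a candidate arm configuration against the joint limits before animating the virtual robot. The sketch below is illustrative rather than EVE's implementation; the limit values are the published joint ranges of the 7-DoF Franka Panda, in radians.

```swift
// Published joint limits of the Franka Panda arm (radians).
let pandaJointLimits: [(lower: Double, upper: Double)] = [
    (-2.8973, 2.8973), (-1.7628, 1.7628), (-2.8973, 2.8973),
    (-3.0718, -0.0698), (-2.8973, 2.8973), (-0.0175, 3.7525),
    (-2.8973, 2.8973)
]

// Returns the indices of joints that a candidate configuration would
// violate, so the UI can warn the user instead of letting the virtual
// robot jump unexpectedly.
func violatedJoints(_ configuration: [Double]) -> [Int] {
    var violations: [Int] = []
    for (index, (angle, limits)) in zip(configuration, pandaJointLimits).enumerated() {
        if angle < limits.lower || angle > limits.upper {
            violations.append(index)
        }
    }
    return violations
}
```

An empty result means the configuration can be previewed safely; otherwise the interface can highlight the offending joints before the user commits the waypoint.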

Evaluation User Study

Informed by the formative study findings and our own experiences using EVE, we made seven additional system changes to EVE's prototype to address the usability challenges.

We conducted a user study to evaluate the effectiveness of EVE compared to baseline interfaces on three common tabletop tasks. The task setups and the real-world policy evaluation using data collected with EVE are presented below.

Figure: Task setups (Toggle Switch shown).

Success rates and remaining time for each task were recorded for all interfaces. After completing collection with all interfaces, participants filled out the SUS survey and ranked the interfaces on motion intent communication, user enjoyment, and overall preference. We targeted $10$ collection attempts for each task.
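As context for the usability measure, SUS scores are derived from ten 1-5 Likert items: odd items contribute (response - 1), even items contribute (5 - response), and the sum is scaled by 2.5 to yield 0-100. A minimal scoring sketch (the `susScore` helper is ours, not part of the study software):

```swift
// Computes a System Usability Scale score from ten 1-5 Likert responses
// using the standard SUS formula; returns nil for malformed input.
func susScore(_ responses: [Int]) -> Double? {
    guard responses.count == 10,
          responses.allSatisfy({ (1...5).contains($0) }) else { return nil }
    let contributions = responses.enumerated().map { index, response in
        // SUS items are 1-indexed, so an even array index is an odd item.
        index % 2 == 0 ? response - 1 : 5 - response
    }
    return Double(contributions.reduce(0, +)) * 2.5
}

// Example: susScore([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]) == 85.0
```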

Figures: Success rate and remaining time for each task across interfaces.

Policy Evaluation: AR2-D2 vs. EVE

We trained policies with Perceiver-Actor (PerAct), using $6$ demonstrations over $30,000$ iterations for each interface. On $30$ task rollouts for the toggle switch task, the policy trained with EVE-collected data achieved a $10\%$ higher accuracy than the policy trained with AR2-D2-collected data.

Figures: RGB-D observations and policy evaluation rollouts for AR2-D2 and EVE.