AR-Guided Robot Data Collection

Using AR to help operators collect more diverse grasp data
Guoheng Sun
ENEE759N — May 16, 2026
1 / 10

The Problem

How biased are humans?
Towse et al. (2014) asked people to generate random number sequences. Even in this trivial 1D task, outputs are far from uniform — adults strongly prefer digits 1–3.

If we can't be unbiased picking numbers on a line, we certainly can't uniformly cover a 6-DOF robot workspace when teleoperating hundreds of grasp demonstrations.
Towse, J. N., Loetscher, T., & Brugger, P. (2014). Not all numbers are equal. Frontiers in Psychology, 5, 19.
2 / 10

Our Idea

Show operators what they've already collected
directly in the AR headset while they teleoperate

See your history → avoid repetition → better coverage → stronger policy
Teleoperate with Quest 3 → Record trajectories → Visualize in AR + random targets → More diverse data
3 / 10

System Architecture

Architecture diagram (image generated with an OpenAI image model)
Quest 3 (AR + control) → UDP → Jetson Thor (bridge) → ROS2 → xArm 6-DOF (executes)
4 / 10

Communication

1. Quest 3: hand Δ → velocity command
2. Jetson: bridge relays the command to the robot
3. xArm: moves and reports its end-effector (EE) pose
4. Jetson: forwards the EE pose back to the headset
5. Quest 3: renders the pose in AR
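
As a concrete sketch, the bridge might look like the following rclpy node. This is a hypothetical minimal version assuming JSON-over-UDP packets; the port (9000), the packet fields, and the topic names (/servo/cmd_vel, /xarm/ee_pose) are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch of the Jetson UDP <-> ROS2 bridge.
# Port, packet format, and topic names are assumptions for illustration.
import json
import socket

import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped, Twist


class QuestBridge(Node):
    def __init__(self):
        super().__init__('quest_bridge')
        # One UDP socket: velocity commands in, EE poses out.
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind(('0.0.0.0', 9000))
        self.sock.setblocking(False)
        self.quest_addr = None  # learned from the first packet we receive
        self.cmd_pub = self.create_publisher(Twist, '/servo/cmd_vel', 10)
        self.create_subscription(PoseStamped, '/xarm/ee_pose', self.on_pose, 10)
        self.create_timer(0.01, self.poll_udp)  # drain UDP at ~100 Hz

    def poll_udp(self):
        # Forward every pending velocity packet from the Quest 3 to ROS2.
        while True:
            try:
                data, addr = self.sock.recvfrom(1024)
            except BlockingIOError:
                return
            self.quest_addr = addr
            v = json.loads(data)  # e.g. {"vx": 0.1, "vy": 0.0, "vz": 0.0}
            msg = Twist()
            msg.linear.x, msg.linear.y, msg.linear.z = v['vx'], v['vy'], v['vz']
            self.cmd_pub.publish(msg)

    def on_pose(self, msg):
        # Send the robot's reported EE position back for AR rendering.
        if self.quest_addr is None:
            return
        p = msg.pose.position
        payload = json.dumps({'x': p.x, 'y': p.y, 'z': p.z}).encode()
        self.sock.sendto(payload, self.quest_addr)


def main():
    rclpy.init()
    rclpy.spin(QuestBridge())


if __name__ == '__main__':
    main()
```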
5 / 10

Calibration

The core challenge:
The robot and Unity have two different coordinate systems. The robot API gives us the end-effector position relative to its base. To render this in AR, we need a mapping between the two coordinate systems.
Coordinate conversion:
Robot (Z-up):   X = forward, Y = left, Z = up
Unity (Y-up):   Z = forward, X = right, Y = up

Mapping:   Unity.x = −robot.y,   Unity.y = robot.z,   Unity.z = robot.x
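
Written out as code, the axis remap is a one-liner; `robot_to_unity` is just an illustrative name.

```python
def robot_to_unity(rx: float, ry: float, rz: float) -> tuple[float, float, float]:
    """Map a point from the robot's right-handed Z-up frame (X forward, Y left,
    Z up) to Unity's left-handed Y-up frame (X right, Y up, Z forward)."""
    return (-ry, rz, rx)

# A point 0.3 m forward and 0.1 m left of the robot base ends up 0.1 m to
# Unity's left (-x) and 0.3 m along Unity's forward (+z) axis.
assert robot_to_unity(0.3, 0.1, 0.2) == (-0.1, 0.2, 0.3)
```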
6 / 10

Data Collection Workflow

Step 1 — Hardware Setup

Power on xArm, connect Jetson via Ethernet. Connect Quest 3 to Jetson via WiFi Direct. Launch xArm ROS2 driver and bridge on Jetson. Deploy the Unity app to Quest 3.

Step 2 — Calibration

Put on the headset — the real world is visible through passthrough. Place the left controller at the robot arm's base, pointing forward. Press X to calibrate. RGB axes appear at the origin, and the first target cube is generated at a random reachable position.
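
A sketch of the transform this calibration could produce, assuming the controller pose defines the robot-base frame in Unity world space; the class name and the yaw-only rotation are simplifying assumptions.

```python
# Hypothetical calibration math: the controller at the robot base defines
# where (and which way) the robot frame sits in Unity world space.
import numpy as np

def robot_to_unity(p):
    # Axis remap from the previous slide: Unity = (-y, z, x).
    return np.array([-p[1], p[2], p[0]])

class Calibration:
    def __init__(self, controller_pos, controller_yaw_deg):
        self.origin = np.asarray(controller_pos, dtype=float)
        yaw = np.radians(controller_yaw_deg)
        c, s = np.cos(yaw), np.sin(yaw)
        # Rotation about Unity's up axis (Y); yaw-only is a simplification.
        self.rot = np.array([[c, 0.0, s],
                             [0.0, 1.0, 0.0],
                             [-s, 0.0, c]])

    def ee_to_world(self, ee_robot):
        # Robot-frame EE position -> Unity world position for rendering.
        return self.origin + self.rot @ robot_to_unity(ee_robot)

# Controller at (1, 0, 2) facing Unity +z: an EE point 0.5 m forward of the
# base renders at (1, 0, 2.5).
cal = Calibration((1.0, 0.0, 2.0), 0.0)
print(cal.ee_to_world([0.5, 0.0, 0.0]))  # -> [1.  0.  2.5]
```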

Step 3 — Place the Physical Object

A small semi-transparent colored cube with a numbered label and black wireframe edges appears in AR. The operator places a real physical cube at that position, aligning it with the AR overlay.

Step 4 — Grasp Demonstration

Hold right trigger to teleoperate — hand motion maps to robot velocity. Press grip to close/open the gripper. The trajectory draws in real time (cyan line). Blue/yellow markers appear at grasp/release points.
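
One plausible form of the hand-motion-to-velocity mapping; the gain, deadzone, and speed clamp are tuning assumptions, not values from the system.

```python
# Hypothetical trigger-gated mapping from hand displacement to EE velocity.
import numpy as np

GAIN = 2.0       # (m/s) per metre of hand offset -- assumed tuning value
DEADZONE = 0.01  # metres of offset to ignore as hand jitter -- assumption
V_MAX = 0.25     # m/s safety clamp -- assumption

def hand_delta_to_velocity(hand_pos, anchor_pos, trigger_held):
    """Offset from the position where the trigger was first pressed acts as
    a spring-style velocity command; releasing the trigger stops the arm."""
    if not trigger_held:
        return np.zeros(3)
    delta = np.asarray(hand_pos, dtype=float) - np.asarray(anchor_pos, dtype=float)
    if np.linalg.norm(delta) < DEADZONE:
        return np.zeros(3)
    v = GAIN * delta
    speed = np.linalg.norm(v)
    if speed > V_MAX:
        v *= V_MAX / speed  # preserve direction, cap speed
    return v

# Hand 5 cm forward of the trigger-press anchor -> 0.1 m/s forward.
print(hand_delta_to_velocity([0.05, 0, 0], [0, 0, 0], True))  # [0.1 0.  0. ]
```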

Step 5 — Next Trial

Press A to stop recording. The trajectory is frozen. Reposition the arm to a new starting pose. Press A again — the old target fades, a new target appears at a different random location, and recording resumes. All historical trajectories stay visible. Press Y to clear everything and start fresh.
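
For the randomized targets, one simple strategy is rejection sampling inside an assumed reachable box, keeping each new target a minimum distance from earlier ones; all bounds here are illustrative, not the xArm's real workspace limits.

```python
# Hypothetical random-target sampler; bounds and spacing are assumptions.
import random

X_RANGE, Y_RANGE, Z_RANGE = (0.25, 0.55), (-0.30, 0.30), (0.05, 0.30)  # metres
MIN_SPACING = 0.08  # keep new targets away from completed ones (metres)

def sample_target(history, rng=random, max_tries=100):
    """Draw a reachable position, rejecting draws near previous targets so
    successive trials spread out across the workspace."""
    p = None
    for _ in range(max_tries):
        p = (rng.uniform(*X_RANGE), rng.uniform(*Y_RANGE), rng.uniform(*Z_RANGE))
        if all(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 >= MIN_SPACING
               for q in history):
            return p
    return p  # workspace crowded: accept the last draw

history = []
for trial in range(3):
    target = sample_target(history)
    history.append(target)
    print(f"target {trial + 1}: {target}")
```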

7 / 10

What You See in AR

Visual           Meaning
Red sphere       Live end-effector position
Cyan line        Trajectory during the current trial
Blue cube        Where the gripper closed (grasp point)
Yellow cube      Where the gripper opened (release point)
Colored target   Randomized grasp target with number label
Faded targets    Completed trials (history stays visible)
8 / 10

Demo

9 / 10

Summary

AR overlays of past trajectories and randomized targets let operators see what they have already collected, avoid repetition, and gather more diverse grasp demonstrations.

Thank you! Questions?
10 / 10