Cost of Building a Teleoperation Rig: Full Breakdown
Three teleop architectures, three price points. Bimanual ALOHA lands near USD 22,000 in parts, a VR-based setup can fit under USD 8,000, and a glove-based dexterous rig sits between the two. Here is the complete itemized cost breakdown, assembly time, and the turnkey alternative when you do not want to build from parts.
Three teleoperation architectures
Before you spend anything, pick the architecture that fits your research agenda. The three popular options as of early 2026 are:
- Leader-follower bimanual (ALOHA). Two small leader arms driven by hand, each joint-copied to a matching follower arm on a shared task surface. Best for high-fidelity bimanual manipulation data collection.
- VR controller-based. A Meta Quest 3 (or similar HMD) provides hand pose and button events, mapped to the end-effector of a commercial arm such as a Franka FR3, UR5e or OpenArm. Good for unimanual workflows and immersive operator feedback.
- Glove-based dexterous. A Manus or Rokoko motion-capture glove streams finger-level pose to a dexterous hand (ORCA, Allegro, Inspire). Required when your research depends on individual-finger control.
Most labs end up with two of these on the same bench, not one. Plan for that.
Option A: Bimanual ALOHA-style rig ($20k–$25k)
ALOHA (A Low-cost Open-source Hardware System for Bimanual Teleoperation) has become the default data-collection architecture for imitation learning in 2024–2026. A faithful build uses four Trossen WidowX 250 arms — two as leaders, two as followers — and four USB cameras on a shared task surface.
Itemized breakdown — bimanual ALOHA rig
| Component | Qty | Unit cost (USD) | Subtotal |
|---|---|---|---|
| WidowX 250 leader arm | 2 | $4,000 | $8,000 |
| WidowX 250 follower arm (with gripper) | 2 | $4,000 | $8,000 |
| Logitech Brio 4K USB camera | 4 | $200 | $800 |
| Intel RealSense D435 (optional, depth) | 1 | $450 | $450 |
| Workstation PC (RTX 4090, 64 GB RAM) | 1 | $3,500–$5,000 | $3,500–$5,000 |
| Powered USB 3 hub (10+ port) | 1 | $80 | $80 |
| 80/20 aluminum extrusion frame + mounting plates | 1 set | $400–$800 | $400–$800 |
| Task-surface table (36×24 in) | 1 | $150–$400 | $150–$400 |
| Cabling, zip ties, Velcro, labels | 1 kit | $120 | $120 |
| Foot pedal (episode start/stop) | 1 | $60 | $60 |
Cost summary — ALOHA rig
Lead time for WidowX arms is typically 4–8 weeks from the manufacturer; SVRC carries stock on several configurations and can often ship faster. See our bimanual teleop hardware setup guide for the reference wiring diagram. Plan 40–80 hands-on engineer hours for the build, most of which is frame assembly, calibration and cable management.
Option B: VR-based teleoperation ($3k–$8k beyond the arm)
The cheapest path to teleoperation in 2026 is a Meta Quest 3 ($500) wired (or streamed) to a workstation running an open-source VR-to-arm bridge. You pay very little for the teleoperation hardware itself; the arm and workstation are where the cost lives.
Itemized breakdown — VR teleop rig
| Component | Qty | Unit cost (USD) | Subtotal |
|---|---|---|---|
| Meta Quest 3 headset | 1 | $500 | $500 |
| SteamVR base stations (optional, for pose anchoring) | 2 | $150 | $300 |
| Vive trackers or HTC trackers (optional) | 2 | $130 | $260 |
| Workstation PC | 1 | $2,500–$4,000 | $2,500–$4,000 |
| Arm mount + task surface + adapters | 1 | $300–$800 | $300–$800 |
| USB cameras (Brio, 2x) | 2 | $200 | $400 |
| Networking / router for low-latency link | 1 | $100–$300 | $100–$300 |
Add this on top of the arm cost (Franka FR3 ~USD 30k, UR5e ~USD 25k, OpenArm at
Option C: Glove-based dexterous teleoperation ($8k–$15k)
If your research agenda requires individual-finger control — dexterous in-hand manipulation, tool use, or whole-hand grasping studies — a glove is the only option. Popular 2026 gloves are Manus Quantum Metagloves (USD 2,500–$5,000 per pair depending on configuration) and Rokoko Smartgloves (USD 2,500–$4,000 per pair). These stream fingertip and joint pose at sub-10ms latency.
Itemized breakdown — glove-based dexterous rig
| Component | Qty | Unit cost (USD) | Subtotal |
|---|---|---|---|
| Manus or Rokoko glove (pair) | 1 | $2,500–$5,000 | $2,500–$5,000 |
| Dexterous robot hand (Allegro, Inspire, ORCA) | 1 pair | $3,000–$8,000 | $3,000–$8,000 |
| SteamVR base stations for wrist pose | 2 | $150 | $300 |
| Workstation PC | 1 | $2,500–$4,000 | $2,500–$4,000 |
| USB/PoE cameras | 3 | $200 | $600 |
| Wrist-mount adapters and cable harness | 1 | $200–$500 | $200–$500 |
See our glove-based dexterous teleoperation guide for a walkthrough on calibration and latency budgets.
Mobile ALOHA add-on ($10k–$15k on top)
When the research task requires mobility — opening a fridge, moving between stations, navigating a home kitchen — the Mobile ALOHA add-on mounts the bimanual rig on a small wheeled base. The reference base is the AgileX Tracer, available through SVRC for approximately USD 10,000–$15,000 depending on configuration.
| Component | Unit cost (USD) |
|---|---|
| AgileX Tracer mobile base | $10,000–$15,000 |
| Additional LiDAR for base (2D or 3D) | $800–$3,500 |
| Onboard battery extension | $500–$1,500 |
| Re-mount kit for ALOHA arms onto base | $400–$1,200 |
Our Mobile ALOHA setup guide covers the mechanical and electrical integration; our Mobile ALOHA at the SVRC Store is a turnkey option.
Step-by-step: build a bimanual ALOHA rig
- Order parts. Place the WidowX arms order first; it has the longest lead time. Order cameras and workstation in parallel.
- Build the frame. Cut and assemble 80/20 extrusion for a shared task surface with mount plates for followers and a separate operator bench.
- Mount arms and cameras. Install two follower arms facing forward on the task surface, two leader arms on the operator bench, and three to four cameras (2 wrist + 1 front + 1 top).
- Wire and calibrate. Connect each arm’s USB to a powered hub into the workstation. Calibrate joint zero positions. Confirm sub-20ms end-to-end latency on the leader-follower loop.
- Run first data session. Record 30 episodes of a simple task. Export to LeRobot or HDF5. Train a small ACT policy and deploy it back to validate the loop.
See our camera setup for teleoperation and data collection guides for detail.
Build vs buy vs lease
Our rough rule: if the team has one ROS-literate integrator with 80 free hours, build. If not, buy turnkey. A turnkey SVRC ALOHA typically ships at a 20–30 percent premium over raw parts but saves 60–120 hours of integration. At a USD 150/hr loaded engineer rate, turnkey pays for itself almost immediately.
For short projects or for evaluation before commitment, leasing an ALOHA from SVRC starts around USD 1,500/month and includes support. For custom teleoperation data collection as a service, see data services.
Latency, throughput and where the rig limits your research
After the hardware is on the bench, the single most important quality metric for a teleoperation rig is end-to-end latency. Leader-follower rigs like ALOHA can hit 10–20 ms round-trip from operator joint motion to follower response, which feels immediate and enables subtle manipulation. VR-based rigs with a USB-tethered Quest 3 typically hit 20–35 ms. Glove-based rigs hit 15–25 ms on the hand pose alone but wrist pose from SteamVR adds another 10 ms. Above roughly 40 ms, fine manipulation tasks begin to suffer.
A second limit is throughput: how many episodes can a single operator collect in an hour. In practice, ALOHA operators typically produce 30–60 episodes per hour for short tasks (pick and place) and 10–20 for longer tasks (folding, tool use). At 40 episodes/hour averaged across a 6-hour shift, one operator generates 240 episodes per day. A thousand-episode dataset — typical for a small VLA fine-tune — takes about four to five operator-days. This is the right framing for budgeting data-collection programs; see our data collection guide for details.
A third limit is operator ergonomics. A badly arranged leader-follower bench will burn out operators in a few hours. Our recommended layout puts leader arms on an adjustable-height bench, followers at a matched height on the task surface, and a chair with good lumbar support; budget USD 600–$1,200 for operator furniture that you will not regret.
Common mistakes that blow the budget
- Underspeccing the workstation. A cheap PC without enough USB bandwidth will drop frames and corrupt your training data silently. Always use a powered USB 3 hub per arm and test at full camera frame-rate before collecting real episodes.
- Buying one-off cameras. Mixed camera brands cause synchronization headaches. Buy four identical cameras from the same manufacturing batch.
- Skipping the frame. 80/20 extrusion feels optional but without it the arms drift relative to each other and the task surface across weeks of use.
- Ignoring lead time. WidowX arm orders of 4+ can delay 8–12 weeks. Place the order first and build in parallel.
- Hand-rolling software. Use LeRobot or the ALOHA reference code rather than writing your own teleoperation bridge — the community implementations handle edge cases you do not want to rediscover.
Cost per collected episode: a useful TCO frame
Divide your total rig TCO by the expected number of collected episodes to get cost per episode. For a USD 22,500 ALOHA build amortized over 5,000 episodes (a realistic first-year output), the hardware cost per episode is USD 4.50. Add the operator cost (at USD 30/hr loaded and 40 episodes/hr, USD 0.75/episode) and total cost per episode lands near USD 5. For a SO-100-based micro-rig, cost per episode can fall below USD 2. For a Mobile ALOHA build, it can rise above USD 15. This framing is what the larger AI labs use internally to budget data programs; use it to sanity-check any teleop purchase before signing the PO.
Frequently asked questions
About USD 20,000–$25,000 in parts for an ALOHA-style rig. VR rigs can be done for USD 3,000–$8,000 beyond the arm; glove rigs at USD 8,000–$15,000.
Turnkey typically runs a 20–30% premium but saves 60–120 engineer-hours. At USD 150/hr it break-evens almost immediately.
Adds an AgileX Tracer mobile base under the bimanual rig. USD 10k–$15k on top of a standard ALOHA.
Three to four Logitech Brio 4K USB cameras ($200 each) for standard work; add a RealSense D435 for depth-aware policies.
40–80 engineer hours of hands-on time plus 4–8 weeks of arm lead time. Order arms first.
Yes. Quest 3 + open-source bridge is the cheapest path but trades off dexterity vs. leader-follower setups.
For data capture: RTX 4060, 32 GB RAM. For training on the same box: RTX 4090 or 5090, 64+ GB RAM.