Latency Requirements by Task

Not all teleoperation tasks have the same latency tolerance. The three tiers that matter in practice:

  • Precision contact tasks (<30ms required): Peg insertion, surface following, connector mating. At these tolerances, only local-area network teleoperation is viable. Even 50ms introduces enough phase lag in the operator's feedback loop to make precise contact judgment unreliable.
  • Medium-precision tasks (100–300ms acceptable): Object pick-and-place with >5mm tolerance, manipulation of large objects, locomotion control. Wide-area internet with compensation strategies works here. This covers the majority of manipulation data collection tasks.
  • High-latency tasks (>500ms, not viable for direct control): Direct teleoperation breaks down for any task requiring reactive control. Supervisory control, in which the operator sends high-level commands and the robot executes them autonomously, is the only viable mode at this latency.

Latency Sources and Optimization

| Pipeline Stage | Typical Latency | Optimization |
|---|---|---|
| Network propagation (US coast-to-coast) | 70–90ms | Operator geographic routing: match operators to nearby robots |
| Video encoding (H.264 software) | 50–100ms | Switch to WebRTC VP8 hardware encode: 15–30ms |
| Video decoding (browser) | 10–30ms | Enable hardware acceleration in browser |
| Control command serialization/deserialization | 2–5ms | Binary protocol (MessagePack/protobuf) vs JSON |
| Robot controller loop overhead | 5–20ms | High-priority RT thread for command processing |
| Camera capture + USB frame delivery | 10–33ms | USB3 camera at 60fps: 16ms max frame age |
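
To see how the stages combine, the figures in the table can be added into a rough end-to-end budget. This is a minimal sketch that assumes the stages add in series and that each range is an independent contribution, which the table itself does not state. The dictionary keys and the `total_budget` helper are illustrative names only.

```python
# (low_ms, high_ms) per pipeline stage, copied from the table above.
STAGES = {
    "network_propagation": (70, 90),
    "video_encode_h264_software": (50, 100),
    "video_decode_browser": (10, 30),
    "command_serialization": (2, 5),
    "controller_loop": (5, 20),
    "camera_capture": (10, 33),
}


def total_budget(stages):
    """Sum the low and high ends of every stage (assumes stages run in series)."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


baseline = total_budget(STAGES)

# Swap software H.264 encoding for the VP8 hardware figure from the table.
optimized_stages = dict(STAGES)
del optimized_stages["video_encode_h264_software"]
optimized_stages["video_encode_vp8_hardware"] = (15, 30)
optimized_budget = total_budget(optimized_stages)

print(f"baseline:  {baseline[0]}-{baseline[1]} ms")
print(f"optimized: {optimized_budget[0]}-{optimized_budget[1]} ms")
```

Under these assumptions the baseline works out to 147–278ms and the hardware-encode variant to 112–208ms, so video encoding is the largest single stage that can be reduced without moving the operator closer to the robot.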

Latency Compensation Strategies

  • Smith Predictor: A classical control compensation technique for systems with known constant delay. The Smith predictor runs an internal model of the robot dynamics in parallel, shifted by the round-trip delay, so the operator is effectively controlling the model output rather than the delayed real output. Works well when delay is stable and the plant model is accurate. Falls apart with variable internet jitter (±30ms swings break the constant-delay assumption).
  • Visual Lead: Operators are trained to aim for where the robot will be rather than where the video shows it currently. A visual overlay on the operator interface shows the predicted robot position based on the last command and the known latency. Simple, effective, and doesn't require model accuracy. Preferred method for most real deployments.
  • Task Selection by Latency Tier: The most robust strategy. Measure round-trip latency at session start and automatically route operators to appropriate tasks. Operators with <50ms RTT get contact-critical tasks; operators with 100–200ms RTT get standard pick-place; operators above 250ms are assigned to navigation or supervisory tasks.
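
The Visual Lead overlay can be approximated with simple dead reckoning. The sketch below assumes the robot keeps moving at the last commanded Cartesian velocity for the whole latency window. That constant-velocity model is an assumption for illustration, not a method described above, and `RobotState` and `predict_position` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RobotState:
    """End-effector position in metres, as last seen in the delayed video."""
    x: float
    y: float
    z: float


def predict_position(last_seen, commanded_velocity, latency_s):
    """Estimate where the robot will be once the known latency has elapsed.

    Assumes the last commanded velocity (m/s) holds for `latency_s` seconds.
    """
    vx, vy, vz = commanded_velocity
    return RobotState(
        x=last_seen.x + vx * latency_s,
        y=last_seen.y + vy * latency_s,
        z=last_seen.z + vz * latency_s,
    )


# Example: 200 ms of latency while moving 5 cm/s in x and -2 cm/s in z.
ghost = predict_position(RobotState(0.10, 0.00, 0.30), (0.05, 0.0, -0.02), 0.2)
print(ghost)
```

The overlay would draw `ghost` on top of the delayed video frame. Because the prediction uses only the last command and the measured latency, it needs no dynamics model, which matches the robustness argument made for Visual Lead above.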

Video Streaming Technology Comparison

| Technology | Glass-to-Glass Latency | Bandwidth | Recommendation |
|---|---|---|---|
| WebRTC VP8 (hardware encode) | 30–50ms | 2–8 Mbps | Best for teleoperation; use this |
| WebRTC H.264 | 40–80ms | 1.5–5 Mbps | Good fallback if VP8 not available |
| H.264 over RTSP | 100–150ms | 1–3 Mbps | Acceptable for supervisory control |
| MJPEG over HTTP | 100–300ms | 10–50 Mbps | Avoid: high bandwidth, high latency |
| JPEG frames over WebSocket | 200–500ms | 5–20 Mbps | Do not use for teleoperation |

Connection Requirements

  • Symmetric fiber (home or office): RTT typically <30ms domestic, <80ms intercontinental. Most reliable for sustained teleoperation sessions. Recommended as the minimum for operators doing 4+ hour sessions.
  • 5G mmWave: 10–20ms RTT, 100Mbps+ bandwidth. Excellent when available. Coverage is still limited to dense urban areas; not reliable for mobile operator setups.
  • 4G LTE: 30–80ms RTT typically, variable. Viable for medium-precision tasks. Jitter is the main problem, so implement an adaptive jitter buffer on the control command receive side.
  • Home WiFi (shared): 20–100ms variable, with potential 200–500ms spikes during household traffic. Require operators to connect via Ethernet for production data collection sessions.
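
One way to size the adaptive jitter buffer mentioned for 4G LTE is to track interarrival jitter with the running estimator defined in RFC 3550 (the RTP specification) and hold commands for a small multiple of it. The sketch below is illustrative only: the class name, the 3x multiplier, and the 5ms and 100ms clamps are assumptions, not values from this guide.

```python
class AdaptiveJitterBuffer:
    """Tracks interarrival jitter and suggests a hold-back delay for commands."""

    def __init__(self, multiplier=3.0, min_delay_ms=5.0, max_delay_ms=100.0):
        self.multiplier = multiplier      # assumed safety factor
        self.min_delay_ms = min_delay_ms  # assumed floor
        self.max_delay_ms = max_delay_ms  # assumed ceiling
        self.jitter_ms = 0.0
        self._last_transit_ms = None

    def observe(self, sent_ms, received_ms):
        """Update the jitter estimate from one command's send/receive timestamps.

        Only the change in transit time is used, so a fixed clock offset
        between sender and receiver cancels out.
        """
        transit = received_ms - sent_ms
        if self._last_transit_ms is not None:
            d = abs(transit - self._last_transit_ms)
            # RFC 3550 estimator: move 1/16 of the way toward the new sample.
            self.jitter_ms += (d - self.jitter_ms) / 16.0
        self._last_transit_ms = transit

    @property
    def target_delay_ms(self):
        """Hold-back delay: a multiple of the jitter, clamped to the limits."""
        wanted = self.multiplier * self.jitter_ms
        return min(self.max_delay_ms, max(self.min_delay_ms, wanted))


buf = AdaptiveJitterBuffer()
for i in range(20):                      # stable link: constant 40 ms transit
    buf.observe(sent_ms=i * 10, received_ms=i * 10 + 40)
stable_delay = buf.target_delay_ms

for i in range(20, 220):                 # jittery link: transit alternates 40/70 ms
    buf.observe(sent_ms=i * 10, received_ms=i * 10 + (70 if i % 2 == 0 else 40))
jittery_delay = buf.target_delay_ms

print(stable_delay, round(jittery_delay, 1))
```

On the stable link the buffer sits at its floor. Once transit times start swinging by 30ms, the target delay grows toward three times the estimated jitter, trading a little added latency for evenly spaced command delivery.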

The SVRC teleop platform measures RTT at session start, routes operators to appropriate task queues automatically, and uses WebRTC VP8 with hardware encode for all video streams.
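
The routing rule from Task Selection by Latency Tier can be sketched as follows. This is not the SVRC implementation. The queue names and the `route_operator` function are hypothetical, and because the thresholds above leave 50–100ms and 200–250ms unassigned, the sketch assumes those operators fall through to the next less demanding queue.

```python
from statistics import median


def route_operator(rtt_samples_ms):
    """Pick a task queue from several RTT probes taken at session start.

    The median is used so that one slow probe does not demote an operator.
    Raises statistics.StatisticsError if no samples are given.
    """
    rtt = median(rtt_samples_ms)
    if rtt < 50:
        return "contact_critical"
    if rtt <= 200:
        # Assumption: the unspecified 50-100 ms band is treated as pick-place.
        return "pick_place"
    # Assumption: the unspecified 200-250 ms band joins the >250 ms queue.
    return "navigation_or_supervisory"


print(route_operator([22, 25, 140, 24, 23]))  # a single spike is ignored
print(route_operator([150, 160, 140]))
print(route_operator([300, 310, 290]))
```

Using a median of several probes rather than a single ping is a design choice in this sketch. A production system would likely also re-measure during the session, since RTT can drift after the initial check.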