Integration Architecture Options

Before writing any code, choose the right integration pattern for your requirements. The four main options each make different tradeoffs between latency, coupling, and ecosystem compatibility:

| Pattern     | Latency   | Coupling                                   | Best For                                             |
|-------------|-----------|--------------------------------------------|------------------------------------------------------|
| Direct SDK  | <5 ms     | Tight (language/platform specific)         | Safety-critical control loops, high-frequency motion |
| REST API    | 50–200 ms | Loose (language agnostic)                  | Task dispatch, status queries, configuration         |
| WebSocket   | 5–50 ms   | Medium (long-lived connection required)    | Real-time telemetry, servo commands, streaming       |
| ROS2 bridge | 10–100 ms | Loose (uses ROS2 ecosystem)                | ROS2-compatible software, visualization, multi-robot |

Most production integrations use a combination: REST API for task dispatch and status queries (where latency is acceptable), WebSocket for real-time telemetry streaming, and the direct SDK only for the low-level control loop running on the robot's onboard computer. Never expose the direct SDK over a network — this bypasses all safety abstractions.

REST API Design for Robot Control

A well-designed robot REST API treats robots as resources and tasks as first-class objects. Here is the minimal API surface that covers most use cases:

  • POST /tasks — Submit a new task. Body: {"task_type": "pick_and_place", "robot_id": "arm-001", "parameters": {...}, "priority": 1}. Returns: {"task_id": "t-12345", "status": "queued", "estimated_start": "..."}.
  • GET /tasks/{id} — Get task status. Returns current status (queued / running / succeeded / failed), execution metrics, and error details if failed.
  • GET /robots/{id}/status — Get robot health: joint temperatures, battery, error codes, current task, and pose.
  • DELETE /tasks/{id} — Cancel a queued or running task. Returns 200 if successfully canceled, 409 Conflict if the task is in a non-cancellable state (e.g., mid-insertion).
  • Authentication: Use an API key for machine-to-machine integrations (simpler, auditable) or the OAuth2 client credentials flow for multi-tenant deployments. Pass the key or access token in the Authorization: Bearer <key> header on every request.
  • Rate limiting: Apply a 100 requests/second per-key rate limit enforced at the API gateway layer. Return 429 Too Many Requests with a Retry-After header. Most integrations never approach this limit.
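
The task endpoints above can be sketched as a small in-memory store that returns (status code, body) pairs. This is a minimal illustration, not a reference implementation: the TaskStore class, the "canceled" state, the non_cancellable flag, and the 201/400/404 codes are assumptions made for the example; only the 200 and 409 responses for DELETE and the queued / running / succeeded / failed states come from the text.

```python
from dataclasses import dataclass
from itertools import count

# "canceled" is an assumed extra state; the others are listed in the text.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


@dataclass
class Task:
    task_id: str
    task_type: str
    robot_id: str
    parameters: dict
    priority: int = 1
    status: str = "queued"
    non_cancellable: bool = False  # hypothetical flag, e.g. set mid-insertion


class TaskStore:
    """In-memory stand-in for the resource behind /tasks (hypothetical)."""

    def __init__(self):
        self._tasks = {}
        self._ids = count(12345)

    def post_task(self, body):
        """POST /tasks: validate the body and queue a new task."""
        missing = {"task_type", "robot_id", "parameters"} - set(body)
        if missing:
            return 400, {"error": f"missing fields: {sorted(missing)}"}
        task = Task(
            task_id=f"t-{next(self._ids)}",
            task_type=body["task_type"],
            robot_id=body["robot_id"],
            parameters=body["parameters"],
            priority=body.get("priority", 1),
        )
        self._tasks[task.task_id] = task
        return 201, {"task_id": task.task_id, "status": task.status}

    def get_task(self, task_id):
        """GET /tasks/{id}: report the current status."""
        task = self._tasks.get(task_id)
        if task is None:
            return 404, {"error": "unknown task"}
        return 200, {"task_id": task.task_id, "status": task.status}

    def delete_task(self, task_id):
        """DELETE /tasks/{id}: cancel, or 409 if the task cannot be canceled."""
        task = self._tasks.get(task_id)
        if task is None:
            return 404, {"error": "unknown task"}
        if task.status in TERMINAL_STATES or task.non_cancellable:
            return 409, {"error": f"cannot cancel task in state {task.status}"}
        task.status = "canceled"
        return 200, {"task_id": task.task_id, "status": task.status}
```

Keeping the state check in one place means the HTTP layer only has to translate these pairs into responses, whichever web framework sits in front of it.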

WebSocket Implementation

WebSocket provides the bidirectional, low-latency channel needed for real-time robot control and telemetry streaming. Use two separate WebSocket channels, one for commands and one for telemetry, so that telemetry volume cannot delay commands.

  • Command stream (robot ← client): Client sends TwistStamped-format commands at 50–100 Hz. The robot's onboard controller must acknowledge each command within 10 ms or enter a safe stop state. JSON is convenient while debugging, but use a binary encoding (MessagePack or CBOR) for production command streams: payloads are typically 3–5× smaller and latency 30–50% lower.
  • Telemetry stream (robot → client): Robot sends joint state, end-effector pose, camera metadata, and error codes at 10 Hz. 10 Hz is sufficient for monitoring and visualization; higher rates require binary encoding and dedicated network bandwidth.
  • Heartbeat and reconnection: The client must send a ping frame every 1 second. If the robot receives no ping for 3 seconds, it enters a safe stop state. Implement exponential backoff reconnection logic in the client (retry at 1s, 2s, 4s, 8s) to handle transient network interruptions.
  • Message sequencing: Include a monotonically increasing sequence number in every message. The receiver can detect dropped messages and decide whether to interpolate or wait for the next message.
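
The heartbeat, reconnection, and sequencing rules above are independent of any particular WebSocket library, so they can be sketched as plain Python helpers. The names below (backoff_delays, HeartbeatWatchdog, SequenceTracker) are hypothetical, and capping the backoff at 8 seconds is an assumption; the text only lists the 1s, 2s, 4s, 8s steps.

```python
import time


def backoff_delays(base=1.0, factor=2.0, max_delay=8.0):
    """Yield reconnect delays 1, 2, 4, 8 seconds, then stay at max_delay."""
    delay = base
    while True:
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)


class HeartbeatWatchdog:
    """Robot-side check: request a safe stop if no ping arrives in time."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested
        self._last_ping = clock()

    def on_ping(self):
        self._last_ping = self._clock()

    def should_safe_stop(self):
        return self._clock() - self._last_ping > self._timeout_s


class SequenceTracker:
    """Receiver-side gap detection on increasing sequence numbers."""

    def __init__(self):
        self._last_seq = None

    def observe(self, seq):
        """Return how many messages were dropped before `seq`.

        Returns 0 when nothing was dropped and -1 for a stale or duplicate
        message, which the caller should ignore.
        """
        if self._last_seq is None:
            self._last_seq = seq
            return 0
        if seq <= self._last_seq:
            return -1
        dropped = seq - self._last_seq - 1
        self._last_seq = seq
        return dropped
```

A client reconnect loop would sleep for each value drawn from backoff_delays() until the connection succeeds, then create a fresh generator so the next outage starts again at 1 second.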

ROS2 Web Bridge

For teams with existing ROS2 infrastructure, rosbridge_suite exposes ROS2 topics, services, and actions over WebSocket with JSON encoding. This allows web dashboards and non-ROS applications to interact with the ROS2 graph without running ROS2 themselves.

  • rosbridge_server: Install via sudo apt install ros-humble-rosbridge-suite. Runs a WebSocket server on port 9090 by default. Clients connect and subscribe/publish to ROS2 topics using a simple JSON protocol.
  • roslibjs: The JavaScript client library for rosbridge. Enables web dashboards to subscribe to /joint_states, /tf, and custom topics, and to call ROS2 services from the browser.
  • JSON encoding overhead: rosbridge uses JSON encoding, which is 5–10× larger than ROS2's native CDR binary encoding. For high-frequency topics (>10 Hz with large messages), consider a dedicated binary bridge, or filter and throttle topics before bridging.
  • Security note: rosbridge has no authentication by default. Always run it behind the WireGuard VPN described in the fleet management guide, never exposed directly to the internet.
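
To make the "simple JSON protocol" concrete, the sketch below builds and parses a few rosbridge frames using only the standard library. The op names and the topic, type, throttle_rate, queue_length, service, and args fields follow the rosbridge v2 protocol; the helper function names are hypothetical, and any service name passed in is up to the caller. Sending the frames over a WebSocket connection to port 9090 is left to whichever client library you already use.

```python
import json


def subscribe_msg(topic, msg_type, throttle_ms=100):
    """Build a 'subscribe' request.

    throttle_rate is the minimum interval between messages in milliseconds,
    so 100 ms caps delivery at roughly 10 Hz.
    """
    return json.dumps({
        "op": "subscribe",
        "topic": topic,
        "type": msg_type,
        "throttle_rate": throttle_ms,
        "queue_length": 1,  # keep only the newest message when throttled
    })


def call_service_msg(service, args, call_id):
    """Build a 'call_service' request; `id` lets the caller match the reply."""
    return json.dumps({
        "op": "call_service",
        "service": service,
        "args": args,
        "id": call_id,
    })


def parse_publish(raw):
    """Return (topic, msg) for an incoming 'publish' frame, else None."""
    frame = json.loads(raw)
    if frame.get("op") != "publish":
        return None
    return frame["topic"], frame["msg"]
```

Throttling at subscription time is one way to limit the JSON overhead noted above, because the server drops intermediate messages before they are ever encoded for that client.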

Enterprise Integration Patterns

Connecting robots to enterprise systems (WMS, ERP, MES) requires patterns that handle the reliability mismatch between always-on enterprise systems and occasionally-offline robots:

  • Message queue for task dispatch (Kafka or RabbitMQ): WMS publishes pick orders to a Kafka topic. The robot fleet manager consumes from the topic and dispatches tasks to available robots. If a robot is offline when a task arrives, the task stays in the queue until a robot is available. This decouples the WMS from robot availability.
  • ERP/WMS webhooks for order events: Configure the WMS to post a webhook to your robot platform API when new orders are ready. The robot platform processes the webhook and dispatches tasks. Implement webhook signature verification (HMAC-SHA256) to prevent spoofed order injection.
  • Time-series database for telemetry (InfluxDB): Store robot telemetry in InfluxDB for long-term analysis and compliance reporting. Use a retention policy to expire raw 100 Hz data automatically after 7 days, and a downsampling job (a continuous query in InfluxDB 1.x, a task in 2.x) to write 1-minute aggregates into storage that is retained for 2 years.
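
Webhook signature verification, mentioned above, can be sketched with Python's standard hmac module. This assumes the WMS sends a hex-encoded HMAC-SHA256 of the raw request body in a header; the header format and the function names are assumptions, so check what your WMS actually sends.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the received signature against one computed locally.

    compare_digest runs in constant time, so response timing does not reveal
    how many leading characters of a forged signature were correct.
    """
    expected = sign_payload(secret, body).encode("ascii")
    received = signature_header.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, received)
```

Verify against the raw bytes as received, before any JSON parsing: re-serializing the payload can change whitespace or key order and invalidate an otherwise correct signature.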

Security Implementation

  • TLS 1.3 for all network communication: All REST, WebSocket, and rosbridge traffic must use TLS 1.3. Configure your reverse proxy (Nginx or Caddy) to accept only TLS 1.3 (in Nginx, ssl_protocols TLSv1.3;) and to send HSTS headers. Robots that connect to cloud services must validate the server certificate.
  • Certificate pinning for robot clients: Robot onboard software should pin the expected server certificate fingerprint and reject connections to unexpected certificates. This prevents man-in-the-middle attacks in factory/warehouse network environments where rogue APs may be present.
  • VPN for sensitive environments: For deployments in high-security environments (pharmaceutical, defense, financial), route all robot-cloud communication through a WireGuard VPN tunnel. The robot's cloud connectivity is then a single authenticated, encrypted tunnel rather than multiple individually-secured connections.
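  • Audit logging: Log every API call with: timestamp, API key ID (never the key itself), endpoint, response status, and request IP. Store audit logs in a write-once store, for example object storage with a retention lock such as S3 Object Lock. AWS CloudTrail and GCP Cloud Audit Logs record calls to the cloud provider's own APIs, so they complement application-level audit logs rather than replace them. Audit logs support SOC 2 compliance and incident investigation.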
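
A minimal sketch of certificate pinning on a robot client, using only the standard library, might look like the following. It assumes the pin is the SHA-256 fingerprint of the server's DER-encoded certificate; the function names are hypothetical, and the pinned value would come from your own provisioning process.

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint_sha256(der_cert: bytes) -> str:
    """Hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_hex: str) -> bool:
    """Compare against a pin written as plain hex or colon-separated hex."""
    normalized = pinned_hex.replace(":", "").strip().lower().encode("utf-8")
    actual = fingerprint_sha256(der_cert).encode("ascii")
    return hmac.compare_digest(actual, normalized)


def open_pinned_connection(host: str, port: int, pinned_hex: str) -> ssl.SSLSocket:
    """Open a TLS 1.3 connection and reject it unless the pin matches."""
    context = ssl.create_default_context()  # normal CA and hostname checks
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    raw = socket.create_connection((host, port), timeout=5)
    try:
        tls = context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
    der = tls.getpeercert(binary_form=True)
    if der is None or not matches_pin(der, pinned_hex):
        tls.close()
        raise ssl.SSLError("server certificate does not match pinned fingerprint")
    return tls
```

Pinning the leaf certificate means every certificate rotation requires a pin update on the robots; pinning the public key or an intermediate CA instead is a common way to reduce that operational cost.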

Testing and Mocking

  • Mock robot server for development: Build a mock HTTP/WebSocket server that responds to the full robot API with realistic simulated data. Developers can build and test integrations against the mock without needing physical robot access. The mock should replay recorded robot sessions for deterministic integration testing.
  • Simulation-based integration tests: For end-to-end testing of the full stack (WMS → robot platform → robot → telemetry → dashboard), use a simulated robot in Gazebo or Isaac Sim. These tests run in CI on every pull request and catch integration regressions that unit tests miss.
  • Contract testing: Use Pact or a similar contract-testing framework to verify that the robot API producer (your robot platform) and its consumers (WMS integration, dashboard) agree on the API schema. This keeps breaking changes from going undetected until integration.
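
As one way to build the mock described above, the sketch below replays recorded status frames through a standard-library HTTP server. The recorded values, the MockRobot class, and the serve helper are made up for illustration; a real mock would load frames from a recorded session and cover the rest of the API, including the WebSocket channels.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Stand-in for a recorded robot session (illustrative values only).
RECORDED_STATUS = [
    {"battery_pct": 87, "error_codes": [], "current_task": None},
    {"battery_pct": 86, "error_codes": [], "current_task": "t-12345"},
]


class MockRobot:
    """Replays recorded frames in order, so every test run is deterministic."""

    def __init__(self, frames):
        self._frames = frames
        self._cursor = 0

    def handle_get(self, path):
        match = re.fullmatch(r"/robots/([^/]+)/status", path)
        if not match:
            return 404, {"error": "not found"}
        frame = self._frames[self._cursor % len(self._frames)]
        self._cursor += 1
        return 200, {"robot_id": match.group(1), **frame}


def make_handler(robot):
    """Bind a MockRobot to an HTTP request handler class."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = robot.handle_get(self.path)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            pass  # keep test output quiet

    return Handler


def serve(port=8080):
    """Run the mock on localhost until interrupted."""
    robot = MockRobot(RECORDED_STATUS)
    ThreadingHTTPServer(("127.0.0.1", port), make_handler(robot)).serve_forever()
```

Because the routing logic lives in MockRobot.handle_get rather than in the HTTP handler, integration tests can exercise it directly without opening a socket, and developers can call serve() when they need a live endpoint.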

Integration Architecture Diagram

A typical end-to-end architecture for a production warehouse deployment:

  • Layer 1 — Robot hardware: Robot arm + onboard computer running ROS2, direct SDK control loop at 500 Hz, local safety watchdog.
  • Layer 2 — Robot platform (on-premise or cloud): REST API for task management, WebSocket server for real-time telemetry, SVRC platform for fleet dashboard and policy deployment.
  • Layer 3 — Enterprise integration: Kafka message bus consuming WMS order events and dispatching to robot platform; InfluxDB for telemetry storage; Grafana dashboard for operations team.
  • Layer 4 — Enterprise systems: WMS (warehouse management), ERP (inventory, finance), MES (manufacturing execution system) — all communicate with Layer 3 only, never directly with robots.
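
The decoupling that Layer 3 provides can be illustrated with a small in-memory dispatcher: orders wait in a queue until a robot reports itself available, so the WMS never needs to know which robots are online. This is only a sketch of the idea; FleetDispatcher is a hypothetical name, and a deque stands in for the Kafka topic and for the robot platform's availability feed.

```python
from collections import deque


class FleetDispatcher:
    """In-memory stand-in for the Layer 3 consumer feeding the robot platform."""

    def __init__(self):
        self._pending = deque()       # stands in for the message queue
        self._idle_robots = deque()   # robots waiting for work
        self.assignments = []         # (task_id, robot_id) pairs sent to Layer 2

    def on_order(self, task_id):
        """Called for each order event consumed from the WMS."""
        self._pending.append(task_id)
        self._drain()

    def on_robot_available(self, robot_id):
        """Called when the robot platform reports an idle robot."""
        self._idle_robots.append(robot_id)
        self._drain()

    def _drain(self):
        # Pair the oldest waiting task with the longest-idle robot.
        while self._pending and self._idle_robots:
            task_id = self._pending.popleft()
            robot_id = self._idle_robots.popleft()
            self.assignments.append((task_id, robot_id))
```

In a real deployment the pending queue would be the durable message bus itself, with offsets committed only after the robot platform accepts the task, so that a dispatcher restart does not lose orders.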