Robot Sensors for Manipulation Learning: Cameras, Encoders, Force, and Tactile

Why Sensors Matter for Learning

A robot policy is only as good as its inputs. The policy observes the world through sensors and must infer everything it needs to act — object positions, grasp quality, contact forces, whether the task succeeded — from the sensor stream. Sensor gaps create policy gaps.

Three sensor properties matter most: resolution (how much detail is captured), latency (how old the data is when the policy uses it), and synchronization (whether all sensors report the same moment in time). A 50 ms desynchronization between cameras and joint states can significantly degrade policy performance on fast tasks.
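A timing offset becomes a position error once the arm is moving. The sketch below shows the arithmetic. It assumes a hypothetical end-effector speed of 0.5 m/s held constant over the offset, and a 30 fps camera. The function names are illustrative only.

```python
def desync_error_mm(speed_m_per_s: float, offset_ms: float) -> float:
    """Apparent position error when two samples taken offset_ms apart are
    paired as if simultaneous. Assumes constant speed over the offset.
    Multiplying m/s by ms gives mm directly."""
    return speed_m_per_s * offset_ms


def offset_in_frames(offset_ms: float, fps: float) -> float:
    """Number of camera frame periods spanned by the offset."""
    return offset_ms * fps / 1000.0


print(desync_error_mm(0.5, 50))  # 25.0 (mm)
print(offset_in_frames(50, 30))  # 1.5 (frames)
```

At this speed, a 50 ms offset pairs each image with a joint state that is 25 mm away from where the arm actually was. That offset is also longer than one 30 fps frame period.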

Joint Encoders

Joint encoders are the primary state sensors for a robot arm — they report the angle of each joint.

Absolute encoders are standard in modern cobots. Unlike incremental encoders (which only measure changes), absolute encoders report the true joint angle from the moment of power-on, without requiring a homing routine.

Resolution: Typically 19-bit, giving 2¹⁹ = 524,288 steps per revolution, or approximately 0.0007° per step. This is more than sufficient for manipulation — mechanical compliance in the joints is the limiting factor.
Sample rate: 1 kHz for joint position; velocity is usually computed by differentiating position and low-pass filtering the result.
Latency: 0.5–2 ms from physical angle to reported value over the motor bus (typically EtherCAT or CAN).
Role in learning: Joint encoder data provides the robot's proprioceptive state (its own body configuration), which is almost always included as a policy input alongside camera images.
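The encoder figures above can be checked with a short sketch. It does two things: it converts a 19-bit step into degrees and into tip displacement at an assumed 1 m lever arm, and it estimates velocity by finite differencing followed by a first-order low-pass filter. The 50 Hz cutoff and the single-pole filter are assumptions for illustration. Real controllers may use different filters.

```python
import math
from typing import Optional

ENCODER_BITS = 19  # 19-bit absolute encoder, as quoted above


def encoder_resolution_deg(bits: int = ENCODER_BITS) -> float:
    """Angular size of one encoder step, in degrees."""
    return 360.0 / (2 ** bits)


def arc_at_reach_mm(step_deg: float, reach_m: float) -> float:
    """Tip displacement (mm) caused by one joint step at the given lever arm."""
    return math.radians(step_deg) * reach_m * 1000.0


class FilteredVelocity:
    """Finite-difference velocity smoothed by a discretized first-order
    (RC) low-pass filter. cutoff_hz is an assumed tuning value."""

    def __init__(self, dt_s: float = 0.001, cutoff_hz: float = 50.0):
        rc = 1.0 / (2.0 * math.pi * cutoff_hz)
        self.alpha = dt_s / (rc + dt_s)  # smoothing factor in (0, 1)
        self.dt = dt_s
        self.prev_pos: Optional[float] = None
        self.vel = 0.0

    def update(self, pos_rad: float) -> float:
        if self.prev_pos is not None:
            raw = (pos_rad - self.prev_pos) / self.dt  # noisy derivative
            self.vel += self.alpha * (raw - self.vel)  # low-pass filter
        self.prev_pos = pos_rad
        return self.vel


step = encoder_resolution_deg()
print(f"{step:.7f} deg per step, {arc_at_reach_mm(step, 1.0):.4f} mm at 1 m")
```

One step moves a point 1 m from the joint by roughly 0.012 mm. This is why the text treats joint compliance, not encoder resolution, as the limiting factor.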

RGB Cameras

RGB cameras are the most information-rich sensors for manipulation. A single 720p camera (1280×720) provides about 0.9 million pixels per frame, each encoding color and texture. Policies based on camera observations have shown strong generalization when trained on diverse data.

Resolution: 640×480 (VGA) is the minimum for object recognition; 1280×720 (HD) is standard; 1920×1080 (FHD) adds detail for small object manipulation.
Frame rate: 30 fps for slow manipulation; 60 fps for fast tasks or dynamic tracking.
Interface: USB3 is convenient but has non-deterministic latency. GigE Vision (Ethernet) provides deterministic timing — preferred for synchronized multi-camera setups.
Shutter type: Global shutter captures all pixels simultaneously (no rolling-shutter distortion on fast motions); critical for wrist cameras that move quickly.
Typical setup: 3 cameras per robot station — one overhead, one side/front, one wrist-mounted.
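Resolution and frame rate together determine how much data each camera produces before compression. The sketch below computes uncompressed bandwidth. It assumes 8 bits per channel and 3 channels, and uses decimal units (1 MB = 10⁶ bytes).

```python
def raw_rgb_mb_per_s(width: int, height: int, fps: int, bytes_per_pixel: int = 3) -> float:
    """Uncompressed bandwidth of an 8-bit-per-channel RGB stream, in MB/s."""
    return width * height * bytes_per_pixel * fps / 1e6


RESOLUTIONS = {"VGA": (640, 480), "HD": (1280, 720), "FHD": (1920, 1080)}

if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        for fps in (30, 60):
            print(f"{name} @ {fps} fps: {raw_rgb_mb_per_s(w, h, fps):.1f} MB/s")
```

A single HD camera at 30 fps produces about 83 MB/s uncompressed, so the standard three-camera station produces about 250 MB/s. Streams at this rate are normally JPEG-compressed before storage.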

Depth Cameras

Depth cameras add a distance measurement to each pixel, giving the policy explicit 3D geometry. One of the most widely used models in manipulation research is the Intel RealSense D435, which uses active infrared stereo (a projected IR pattern that aids stereo matching).

Range: 0.1 m to 3 m; optimal accuracy at 0.3–1.5 m.
Frame rate: 30 fps at 848×480 depth resolution.
Limitation: Active-illumination depth sensors, both structured light and active stereo, perform poorly on shiny, specular, or transparent surfaces (glass, metal, water). For these materials, tactile sensors or force feedback provide more reliable contact information.
When to use: Pick planning on cluttered tabletops, bin picking, any task where estimating object height matters.
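Per-pixel depth becomes explicit 3D geometry through the pinhole camera model. Each pixel (u, v) with metric depth z is back-projected into the camera frame. The minimal sketch below does this and rejects readings outside the 0.1 to 3 m range quoted above. It also rejects zero, which RealSense cameras report where no depth could be computed. The intrinsic values in the test are hypothetical placeholders; real ones come from camera calibration. This sketch ignores lens distortion. librealsense provides `rs2_deproject_pixel_to_point`, which applies the camera's distortion model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MIN_RANGE_M, MAX_RANGE_M = 0.1, 3.0  # operating range quoted above


@dataclass
class PinholeIntrinsics:
    fx: float  # focal length in pixels, x
    fy: float  # focal length in pixels, y
    cx: float  # principal point, x (pixels)
    cy: float  # principal point, y (pixels)


def deproject(u: float, v: float, depth_m: float,
              K: PinholeIntrinsics) -> Optional[Tuple[float, float, float]]:
    """Back-project pixel (u, v) with metric depth to (x, y, z) in the camera
    frame (x right, y down, z forward). Returns None for invalid depth."""
    if not (MIN_RANGE_M <= depth_m <= MAX_RANGE_M):
        return None  # also catches 0, the usual "no data" value
    x = (u - K.cx) * depth_m / K.fx
    y = (v - K.cy) * depth_m / K.fy
    return (x, y, depth_m)
```

For an overhead camera looking straight down, the difference between the table's depth and the depth at an object's top surface gives the object's height.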

Force/Torque Sensors

A six-axis wrist force/torque (F/T) sensor measures the forces and torques the end-effector exerts on (and receives from) the environment. It is mounted between the robot's last link and the end-effector.

Measurements: Fx, Fy, Fz (forces in three axes) and Tx, Ty, Tz (torques around three axes).
Typical range: ±200 N force, ±10 Nm torque.
Bandwidth: 1 kHz, enabling real-time contact detection.
Applications: Detecting grasp success (force below threshold → object slipped), compliant insertion (peg-in-hole using force-guided search), safe human-robot contact (stop if unexpected force detected), surface following.
In policies: F/T readings can be included directly as policy inputs; policies trained with F/T data often perform noticeably better on contact-rich tasks.
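Two of the applications above, contact detection and grasp checking, reduce to comparing bias-corrected forces against thresholds. The sketch below illustrates both. The 5 N threshold and the three-sample debounce are hypothetical values. The grasp check assumes the sensor's z axis is aligned with gravity. The bias (tare) must be recorded with no contact, and in practice it changes with wrist orientation because of the tool's own weight.

```python
import math
from typing import Sequence


class ContactDetector:
    """Flags contact when the bias-corrected force magnitude exceeds a
    threshold for several consecutive samples (debouncing sensor noise)."""

    def __init__(self, bias: Sequence[float], threshold_n: float = 5.0, min_samples: int = 3):
        self.bias = list(bias)  # wrench measured with no contact
        self.threshold_n = threshold_n
        self.min_samples = min_samples
        self._count = 0

    def force_magnitude(self, wrench: Sequence[float]) -> float:
        # wrench = (Fx, Fy, Fz, Tx, Ty, Tz); only the force part is used
        return math.sqrt(sum((wrench[i] - self.bias[i]) ** 2 for i in range(3)))

    def update(self, wrench: Sequence[float]) -> bool:
        if self.force_magnitude(wrench) > self.threshold_n:
            self._count += 1
        else:
            self._count = 0
        return self._count >= self.min_samples


def object_still_held(fz_n: float, bias_fz_n: float, min_load_n: float) -> bool:
    """After lifting, the payload should add at least min_load_n along the
    sensor z axis; a smaller change suggests the object was lost."""
    return abs(fz_n - bias_fz_n) >= min_load_n
```

At 1 kHz, three consecutive samples add only about 3 ms of detection delay while filtering out isolated noise spikes.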

Tactile Sensors

Tactile sensors measure distributed contact information at the fingertip level — going beyond a single F/T reading to spatial pressure maps or even images of the contact patch.

GelSight (MIT): A soft gel fingertip with an internal camera. Contact deforms the gel, and the camera images the deformation, providing high-resolution contact geometry (~400×300 pixels per finger). Excellent for detecting grasp pose and object slip.
Capacitive array: A grid of capacitive pressure sensors on the finger surface. Lighter weight than GelSight; lower resolution but fast (>1 kHz). Used for slip detection at the fingertip.
Current state: Tactile sensing in robot learning is an active research area — most production systems use F/T sensors and rely on cameras for the rest. Tactile data is valuable but adds complexity to the data pipeline.
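One simple way to use a capacitive pressure map for slip detection is to track the center of pressure between frames and flag large jumps. This is a heuristic chosen for illustration; it is not the method used by any particular sensor. The 0.5-cell shift and minimum-load thresholds are hypothetical tuning values.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[float]]  # pressure map: rows x columns of taxel readings


def center_of_pressure(grid: Grid) -> Optional[Tuple[float, float]]:
    """Pressure-weighted (row, col) centroid, or None if nothing is pressed."""
    total = sum(sum(row) for row in grid)
    if total <= 0:
        return None
    r = sum(i * v for i, row in enumerate(grid) for v in row) / total
    c = sum(j * v for row in grid for j, v in enumerate(row)) / total
    return (r, c)


def slip_suspected(prev: Grid, curr: Grid,
                   max_shift_cells: float = 0.5, min_total: float = 1.0) -> bool:
    """Flag slip if the contact centroid moves more than max_shift_cells
    between consecutive frames while the finger is still loaded."""
    if sum(sum(row) for row in curr) < min_total:
        return False  # no meaningful contact in the current frame
    p, q = center_of_pressure(prev), center_of_pressure(curr)
    if p is None or q is None:
        return False
    return math.hypot(q[0] - p[0], q[1] - p[1]) > max_shift_cells
```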

Sensor Data Rates and Storage Requirements

Sensor	Sample Rate	Data per Second	Data per 2-min Episode
Joint encoders (7-DOF)	1000 Hz	~56 KB/s	~6.7 MB
RGB camera 720p JPEG (×3)	30 Hz	~9 MB/s	~1.1 GB
Depth camera 848×480 (×1)	30 Hz	~24 MB/s raw (16-bit, uncompressed)	~2.9 GB raw
Force/torque sensor	1000 Hz	~48 KB/s	~5.8 MB
Tactile sensor (GelSight ×2)	30 Hz	~6 MB/s	~720 MB
Typical total (3 RGB + joints + F/T)	—	~9 MB/s	~1.1 GB
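The table's rates can be reproduced from a few assumptions that the table does not state explicitly. The first is 8 bytes (float64) per joint or F/T value. The second is about 100 KB per compressed camera frame, a figure back-calculated from the table's own RGB and GelSight rates. The third is uncompressed 16-bit depth at 848×480. All units are decimal (1 KB = 10³ bytes).

```python
EPISODE_S = 120  # 2-minute episode


def bytes_per_s(values_per_sample: int, bytes_per_value: int, hz: int, streams: int = 1) -> int:
    """Data rate of a stream: values x bytes per value x sample rate x stream count."""
    return values_per_sample * bytes_per_value * hz * streams


RATES = {
    "joints": bytes_per_s(7, 8, 1000),                 # 7 float64 positions
    "ft": bytes_per_s(6, 8, 1000),                     # Fx..Tz as float64
    "rgb_x3": bytes_per_s(1, 100_000, 30, streams=3),  # ~100 KB per JPEG (assumed)
    "depth": bytes_per_s(848 * 480, 2, 30),            # 16-bit depth, raw
    "gelsight_x2": bytes_per_s(1, 100_000, 30, streams=2),
}


def typical_total() -> int:
    """3 RGB cameras + joints + F/T, matching the table's last row."""
    return RATES["joints"] + RATES["ft"] + RATES["rgb_x3"]


if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name:12s} {rate / 1e6:8.3f} MB/s  {rate * EPISODE_S / 1e9:6.3f} GB/episode")
    print(f"{'total':12s} {typical_total() / 1e6:8.3f} MB/s")
```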

Time Synchronization

All sensors must be timestamped to a common clock. Without synchronization, a camera frame captured at t=100 ms and a joint state recorded at t=105 ms will be incorrectly paired during training, adding noise to every sample.

Best practices:

Hardware trigger: A GPIO pulse from the controller triggers all cameras simultaneously. This achieves <1 ms synchronization and is the standard for serious data collection setups.
Software timestamping: Record system time at receipt; apply measured fixed latency offsets per sensor. Achieves 5–20 ms synchronization — adequate for 30 fps camera policies, marginal for high-speed tasks.
PTP (Precision Time Protocol, IEEE 1588): A network clock-synchronization protocol with hardware timestamping, used for multi-machine setups. Achieves <1 ms across machines on a local network.
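The software-timestamping approach can be sketched as two steps. First, subtract a measured fixed latency from each sensor's receipt times. Second, pair each camera frame with the nearest joint sample, rejecting matches that are too far apart. The 5 ms camera latency and the 1 ms maximum gap are hypothetical values; real offsets must be measured per sensor. Nearest-neighbor pairing is one common choice; interpolating joint states to the frame time is an alternative.

```python
import bisect
from typing import List, Optional, Sequence


def correct_timestamps(receipt_times_s: Sequence[float], latency_s: float) -> List[float]:
    """Shift receipt times back by a measured, fixed per-sensor latency."""
    return [t - latency_s for t in receipt_times_s]


def nearest_index(sorted_times: Sequence[float], t: float, max_gap_s: float) -> Optional[int]:
    """Index of the sample closest to t, or None if none lies within max_gap_s."""
    i = bisect.bisect_left(sorted_times, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(sorted_times[j] - t))
    return best if abs(sorted_times[best] - t) <= max_gap_s else None


def align(camera_times: Sequence[float], joint_times: Sequence[float],
          max_gap_s: float = 0.001) -> List[Optional[int]]:
    """For each camera frame, the index of the matching joint sample (or None)."""
    return [nearest_index(joint_times, t, max_gap_s) for t in camera_times]


joints = [k * 0.001 for k in range(200)]                    # 1 kHz joint stream
frames = correct_timestamps([0.105, 0.138, 0.5], 0.005)     # assumed 5 ms camera latency
print(align(frames, joints))                                # [100, 133, None]
```

Unmatched frames (None) are dropped rather than paired with a stale joint state, which avoids the mismatched pairs described at the start of this section.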

To browse compatible sensors and accessories for your manipulation setup, visit the SVRC equipment store.