The Grasp Planning Problem
Given an RGB-D observation of a scene, predict a 6-DOF gripper pose (position + orientation) that produces a stable grasp on a target object. This deceptively simple problem statement hides significant complexity: the space of possible gripper poses is continuous and high-dimensional, most poses will fail, and the relationship between pre-grasp pose and grasp success depends on contact mechanics that are not fully visible from a single RGB-D frame.
The problem divides roughly into two regimes: top-down grasping (2.5D), which fixes the approach direction to vertical so that only planar position, in-plane rotation, and grasp depth remain, simplifying the problem enough for classical methods; and 6-DOF grasping (full pose prediction), which requires learned methods for reliable generalization.
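To make the two regimes concrete, here is a minimal sketch (plain Python, all names hypothetical) of a full 6-DOF grasp representation and of lifting a 2.5D top-down grasp into it. The quaternion encodes only the in-plane rotation about the vertical axis; the convention that the tool frame already points downward is assumed and left out of the math:

```python
from dataclasses import dataclass
import math

@dataclass
class Grasp6Dof:
    """Full 6-DOF grasp: position in the robot base frame plus
    orientation as a unit quaternion (w, x, y, z)."""
    position: tuple     # (x, y, z) in metres
    orientation: tuple  # (w, x, y, z), unit quaternion
    width: float        # commanded gripper opening in metres

def top_down_to_6dof(x, y, z, theta, width):
    """Lift a 2.5D top-down grasp into the 6-DOF representation.
    The approach direction is fixed (straight down), so the only
    remaining rotation is theta about the vertical axis; we assume
    the gripper tool frame points down by convention."""
    half = theta / 2.0
    return Grasp6Dof(
        position=(x, y, z),
        orientation=(math.cos(half), 0.0, 0.0, math.sin(half)),
        width=width,
    )
```

The point of the sketch is the dimensionality gap: the top-down case needs four numbers plus a width, while the general case needs a full pose, which is why classical samplers cope with the former and learned methods dominate the latter.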
Classical Methods
- GPD (Grasp Pose Detection): Point cloud-based 6-DOF grasp pose detection. Samples candidate grasp poses on the point cloud surface, evaluates each with a learned binary classifier (grasp quality). Runs at approximately 5 Hz on a modern workstation. Reliable on clean, accurate point clouds — struggles with occlusion and sensor noise. Open source (MIT license).
- GQ-CNN (Berkeley AUTOLAB): Grasp quality CNN operating on depth image patches. Fast (~50 ms inference) and accurate for top-down grasps from a direct overhead view. Limitation: restricted to the top-down approach direction, so it cannot predict side grasps or general 6-DOF poses. Best for bin-picking scenarios with an overhead camera. Open source with pre-trained models available.
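The sample-and-score pattern that GPD exemplifies can be sketched as follows. This is not GPD's implementation: `score_fn` is a stand-in for its learned grasp-quality classifier, and the sampling here is deliberately naive (uniform over cloud points rather than GPD's surface-normal-guided candidates):

```python
import random

def plan_grasps(points, score_fn, n_samples=200, top_k=5):
    """Sample-and-score sketch: draw candidate grasp centres from
    the point cloud, score each with a (stubbed) learned quality
    classifier, and return the top-k (score, point) pairs."""
    candidates = random.sample(points, min(n_samples, len(points)))
    scored = [(score_fn(p), p) for p in candidates]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return scored[:top_k]
```

The structure explains the speed/quality trade-off noted above: throughput is roughly (classifier latency × samples), so classical pipelines land in the low-Hz range on full clouds.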
Learned 6-DOF Methods: Comparison
| Method | 6-DOF | Novel Objects | Inference Speed | Availability |
|---|---|---|---|---|
| GraspNet-1Billion (CVPR 2020) | Yes | Good (ShapeNet coverage) | 10 Hz | Open source, pre-trained |
| Contact-GraspNet (ICRA 2021) | Yes | Good (handles occlusion) | 8 Hz | Open source, pre-trained |
| AnyGrasp (2023) | Yes | Best (open-set objects) | 15 Hz | Commercial API + SDK |
| GraspNeRF (ICRA 2023) | Yes | Good (NeRF reconstruction) | 0.5 Hz (NeRF overhead) | Research only |
AnyGrasp: Best for Novel Objects
AnyGrasp, developed by the same group as GraspNet-1Billion, is the state-of-the-art method for grasping novel objects not seen during training. Trained on the GraspNet-1Billion dataset (600K+ training grasps across 97 object categories) plus additional augmentation, AnyGrasp generalizes to open-set objects with 90%+ grasp success rate on standard benchmarks.
The commercial API (available from Graspnet.net) provides inference as a service with a Python client library — useful for teams that want to use AnyGrasp without maintaining the inference infrastructure. For teams that need on-device inference, the SDK supports Jetson AGX Orin deployment at 8–12 Hz.
Integration Pipeline
The standard integration with a ROS2 + MoveIt2 stack:
- Step 1 — Perception: RGB-D camera (RealSense D435i or similar) → point cloud → grasp planner. Register camera extrinsics to robot base frame for coordinate transform.
- Step 2 — Grasp Planning: Pass point cloud to GraspNet or AnyGrasp API → receive list of candidate grasp poses ranked by quality score.
- Step 3 — Grasp Selection: Filter candidates by: (a) reachability given current arm configuration (query MoveIt2 IK), (b) collision-free approach path, (c) quality score threshold (>0.7 typical).
- Step 4 — Execution: Plan and execute the approach motion with MoveIt2. Switch to impedance control for the final 5 cm of the approach to absorb residual position error.
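Step 3 above can be sketched as a simple filter chain. `ik_reachable` and `approach_clear` are hypothetical callbacks standing in for MoveIt2's IK and collision-checking queries; candidates are the (score, pose) pairs returned by the planner:

```python
def select_grasp(candidates, ik_reachable, approach_clear, min_score=0.7):
    """Step 3 sketch: walk candidates from best to worst score and
    return the first pose that passes all three filters, or None.
    ik_reachable / approach_clear stand in for MoveIt2 queries."""
    for score, pose in sorted(candidates, key=lambda c: c[0], reverse=True):
        if score < min_score:
            break  # sorted descending, so no later candidate can pass
        if ik_reachable(pose) and approach_clear(pose):
            return pose
    return None
```

Ordering the filters cheapest-first matters in practice: the score threshold is free, IK is moderately expensive, and collision-checking the full approach path is the costliest, so it runs last.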
Failure Modes and Mitigations
- Transparent Objects (glass, clear plastic): Depth sensors fail on transparent surfaces: both structured light and time-of-flight rely on diffuse reflection of their projected infrared light, which transparent materials transmit or refract rather than return. Mitigation: use tactile search (move the gripper to the estimated contact point and probe with low force) or add polarized lighting to increase surface visibility.
- Heavy Objects Near Payload Limit: Grasp planning doesn't account for payload limits. Grasp an object near the arm payload limit from the wrong angle and you may succeed at grasping but fail to lift due to torque limits. Add payload estimation to your grasp selection filter.
- Thin Objects (<3 mm): Objects such as credit cards, sheets of paper, or thin plates lying flat on a surface offer almost no graspable profile, and standard parallel-jaw fingertips cannot slide beneath them without hardware modification. Grasp planners trained on standard objects produce invalid grasps for these items. Requires specialized fingertip geometry or vacuum-based grasping.
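The payload check suggested above for heavy objects can be folded into the grasp selection filter as a sketch like the following. The limits are illustrative inputs, not values from any specific arm's datasheet, and the wrist-torque model is the simplest possible (weight times lever arm):

```python
G = 9.81  # gravitational acceleration, m/s^2

def payload_ok(mass_kg, lever_arm_m, payload_limit_kg, torque_limit_nm):
    """Reject a grasp if the object's mass exceeds the arm payload,
    or if the static moment of its weight about the wrist
    (mass * g * lever arm) exceeds the wrist torque limit.
    A longer lever arm (grasping far from the centre of mass)
    can fail this check even when the raw mass is within limits."""
    if mass_kg > payload_limit_kg:
        return False
    return mass_kg * G * lever_arm_m <= torque_limit_nm
```

This captures the failure mode described above: the same object can pass when grasped near its centre of mass and fail when grasped at an offset, because the torque term scales with the lever arm.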
SVRC stocks RealSense D435i and D405 cameras optimized for manipulation, along with pre-built ROS2 grasp planning nodes for GraspNet and AnyGrasp. Visit the hardware catalog and platform documentation for integration guides.