The State of the Field in 2025
Robot learning — training neural policies from demonstration data or environment interaction — has advanced more in the past three years than in the preceding decade. Diffusion Policy, ACT, and vision-language-action models like RT-2 and π0 have pushed the boundary of what is possible with 50–500 demonstrations on a single robot.
But "possible in a lab" and "reliable in deployment" remain separated by a long list of open problems. The following ten problems represent the highest-priority research directions for the field from 2025 to 2028, based on deployment experience, literature synthesis, and conversations with practitioners.
1. Scaling Data Collection Efficiently
The bottleneck: Real-world robot demonstration data is expensive to collect — $4–30 per episode depending on task complexity. Training state-of-the-art generalist policies requires tens of thousands of episodes. The math does not work at current collection costs for most organizations.
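The collection-cost arithmetic can be made concrete. A minimal sketch using the per-episode range quoted above; the 50,000-episode dataset size is an illustrative assumption standing in for "tens of thousands of episodes":

```python
def dataset_cost(episodes: int, cost_per_episode: float) -> float:
    """Total collection cost for a demonstration dataset, in dollars."""
    return episodes * cost_per_episode

# $4-$30 per episode (range quoted above), 50,000 episodes (assumed).
episodes = 50_000
low = dataset_cost(episodes, 4.0)
high = dataset_cost(episodes, 30.0)
print(f"${low:,.0f} - ${high:,.0f}")  # → $200,000 - $1,500,000
```

A six- to seven-figure data bill per generalist policy is what makes the efficiency directions below more than academic.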
Research directions: Active learning (collect only the most informative demonstrations, not random ones); simulation pre-training with domain randomization to reduce real-data requirements; cross-embodiment transfer (train on data from robot A, deploy on robot B with minimal fine-tuning); and semi-supervised approaches that extract signal from unlabeled robot videos.
Progress signal: A 10× reduction in demonstrations needed to reach a given success rate on a standardized benchmark task, compared to vanilla behavioral cloning baselines.
2. Long-Horizon Task Completion
Current state: State-of-the-art robot policies handle 10–20 step tasks reliably in controlled settings. Real-world manipulation often requires 50–200 step sequences. Errors compound: at a 95% per-step success rate, overall success over 50 independent steps is 0.95⁵⁰ ≈ 8%.
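The compounding arithmetic is simple enough to check directly; a minimal sketch, assuming independent per-step failures:

```python
def task_success(per_step: float, steps: int) -> float:
    """Overall success probability assuming independent per-step failures."""
    return per_step ** steps

print(f"{task_success(0.95, 50):.1%}")  # → 7.7%

# Per-step rate needed for 70% success over 50 steps:
# the 50th root of 0.7 -- every step must be near-perfect.
print(f"{0.7 ** (1 / 50):.2%}")  # → 99.29%
```

The second figure is the core difficulty: long-horizon reliability demands per-step success rates well beyond what current policies deliver, which is why the directions below focus on structure (subgoals, planning) rather than only on better single-step policies.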
Research directions: Hierarchical policies with explicit subgoal representations; integration of task and motion planning (TAMP) with learned components for the manipulation primitives; large language model subgoal generation combined with low-level learned policies; and temporal abstraction in action representations.
Progress signal: Reliable completion of 50+ step manipulation tasks in unstructured environments with success rates above 70%.
3. Dexterous In-Hand Manipulation
Current state: Most robot learning work uses parallel-jaw grippers performing power grasps. Dexterous in-hand manipulation — repositioning, rotating, and regrasping objects within a multi-fingered hand — remains largely unsolved for general objects.
Research directions: High-resolution tactile sensing arrays that provide contact information comparable to human fingertip sensitivity; dexterous hand teleoperation with data gloves for demonstration collection (SVRC operates a glove-based data collection program); learned contact models that generalize across object shapes and surface properties.
Progress signal: In-hand rotation of arbitrary objects to a specified orientation with success rates above 80% across 50+ novel object types.
4. Generalization to Novel Objects
Current state: Policies trained on a set of objects typically achieve 60–75% success on those objects. On novel objects (unseen during training), success drops to 25–40% — a roughly 2× generalization gap that makes most policies impractical for real environments.
Research directions: Foundation model visual features (CLIP, DINOv2) as frozen encoders that provide semantic generalization; diverse training data covering 100+ object categories rather than 5–10; category-level pose estimation as an intermediate representation; and test-time adaptation using a small number of novel-object demonstrations.
Progress signal: Novel-object success rate within 15 percentage points of trained-object success rate on a standardized evaluation set of 50 held-out object categories.
5. Physical World Understanding
Current state: Current robot policies are excellent pattern matchers but lack understanding of physical object properties. A robot trained on rigid objects cannot adapt its grasp force for fragile objects. Policies have no representation of mass, deformability, or fragility.
Research directions: Implicit physics models learned from interaction data; tactile world models that predict contact outcomes from touch observations; language-conditioned property representations ("handle this gently — it is fragile") integrated into policy conditioning; and physics-informed architectures that embed physical constraints into the policy structure.
Progress signal: Successful manipulation of deformable objects (cloth, foam, soft food items) with success rates above 70% without task-specific training.
6. Human-Robot Collaboration
Current state: Robot policies are designed for static environments. When humans are present, unpredictable movements invalidate the policy's assumptions, causing failures or unsafe behavior.
Research directions: Real-time human intent prediction from pose and gaze; safe compliant control that adapts robot motion in response to unexpected human proximity; collaborative manipulation policies trained on human-robot interaction data; and explicit handover protocols with learned timing.
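One way to picture "safe compliant control" is a spring-damper (impedance) law whose stiffness is scaled down as a human approaches. A minimal one-dimensional sketch; the gains, the 0.5 m safety distance, and the linear scaling rule are all illustrative assumptions, not values from any deployed system:

```python
def impedance_force(x: float, v: float, x_des: float,
                    k: float = 200.0, d: float = 30.0) -> float:
    """Spring-damper command force pulling the end effector toward x_des."""
    return k * (x_des - x) - d * v

def scaled_stiffness(k: float, human_dist: float, safe_dist: float = 0.5) -> float:
    """Soften the controller linearly as a human gets closer (illustrative rule)."""
    return k * min(1.0, human_dist / safe_dist)

k_far = scaled_stiffness(200.0, human_dist=1.0)   # full stiffness: 200.0
k_near = scaled_stiffness(200.0, human_dist=0.1)  # compliant: 40.0
```

A softer controller trades tracking accuracy for bounded contact forces, which is exactly the graceful-degradation behavior the progress signal below asks for.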
Progress signal: Successful completion of collaborative tasks (human hands object to robot, robot continues task) with graceful failure modes when human behavior deviates from expectation.
7. Robust Sim-to-Real for Contact Tasks
Current state: Sim-to-real transfer works well for free-space motion (reaching, pre-grasp approach). Contact tasks (insertion, assembly, surface following) transfer poorly due to the sim-to-real gap in contact dynamics — simulated contact is fundamentally different from real contact.
Research directions: Differentiable simulation with learned contact models that are calibrated to real robot data; randomized contact parameters during training to create robust policies; learned residual dynamics models that correct simulation errors; and contact-rich foundation models trained primarily on real data with simulation for augmentation only.
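The "randomized contact parameters" direction amounts to sampling a fresh physics configuration for every training episode, so the policy never overfits to one (wrong) contact model. A minimal sketch; the parameter names and ranges are illustrative assumptions, not values from any particular simulator:

```python
import math
import random

# Illustrative ranges for contact-related physics parameters.
CONTACT_RANGES = {
    "friction":    (0.3, 1.2),
    "restitution": (0.0, 0.2),
    "stiffness":   (1e4, 1e6),  # N/m; sampled log-uniformly below
}

def sample_contact_params(rng: random.Random) -> dict:
    """Draw one randomized contact configuration for an episode."""
    lo, hi = CONTACT_RANGES["stiffness"]
    return {
        "friction": rng.uniform(*CONTACT_RANGES["friction"]),
        "restitution": rng.uniform(*CONTACT_RANGES["restitution"]),
        # Stiffness spans orders of magnitude, so sample its exponent.
        "stiffness": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
    }

params = sample_contact_params(random.Random(0))
```

The log-uniform draw for stiffness matters: uniform sampling over a range spanning two orders of magnitude would concentrate nearly all episodes at the stiff end.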
Progress signal: Peg-in-hole insertion (0.5mm clearance) trained entirely in simulation achieving 80%+ success on real hardware without any real-data fine-tuning.
8. Lifelong Learning Without Forgetting
Current state: Deployed robot policies are static — they do not improve from operational experience. When new tasks are added or existing tasks need updating, the standard approach is full retraining from scratch, which is expensive and slow.
Research directions: Continual learning methods that add new task capabilities without overwriting existing ones; LoRA and other parameter-efficient fine-tuning methods that modularly extend policies; experience replay with a curated memory buffer to prevent catastrophic forgetting; and modular policy architectures where task-specific modules can be swapped independently.
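The experience-replay direction can be sketched as a per-task reservoir buffer: each past task keeps a bounded, uniformly representative sample of its episodes, which are mixed into training batches for new tasks. The capacity and rehearsal mix below are illustrative assumptions:

```python
import random

class TaskReplayBuffer:
    """Bounded per-task episode memory via reservoir sampling."""

    def __init__(self, per_task_capacity: int = 500, seed: int = 0):
        self.capacity = per_task_capacity
        self.buffers = {}   # task name -> list of stored episodes
        self.counts = {}    # task name -> total episodes seen
        self.rng = random.Random(seed)

    def add(self, task: str, episode) -> None:
        buf = self.buffers.setdefault(task, [])
        n = self.counts.get(task, 0) + 1
        self.counts[task] = n
        if len(buf) < self.capacity:
            buf.append(episode)
        else:
            # Reservoir sampling: every episode seen so far is retained
            # with equal probability capacity / n.
            j = self.rng.randrange(n)
            if j < self.capacity:
                buf[j] = episode

    def rehearsal_batch(self, current_task: str, size: int, old_frac: float = 0.5):
        """Mix current-task episodes with replayed old-task episodes."""
        old = [e for t, b in self.buffers.items() if t != current_task for e in b]
        cur = self.buffers.get(current_task, [])
        n_old = min(int(size * old_frac), len(old))
        return (self.rng.sample(old, n_old)
                + self.rng.sample(cur, min(size - n_old, len(cur))))
```

Rehearsing even a modest fraction of old-task data per batch is a standard defense against catastrophic forgetting; the open question is how small the buffer and the rehearsal fraction can get while still meeting the 90%-retention signal below.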
Progress signal: A robot policy that learns 10 new tasks sequentially while maintaining 90%+ of original performance on all previous tasks, with no full retraining.
9. Efficient Real-World Reinforcement Learning
Current state: Reinforcement learning on real robots is too slow and unsafe for most applications — learning from scratch requires thousands of environment interactions, many of which involve failures that damage equipment or create safety hazards. Most practical robot learning uses imitation learning from demonstrations.
Research directions: Model-based RL that learns a world model from a small number of real interactions and plans within it; safe exploration methods that constrain the agent's behavior space during learning; combining imitation learning initialization with RL fine-tuning for the final performance gap; and offline RL methods that improve policies from logged operational data without any online interaction.
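One concrete instance of the offline-RL direction is advantage-weighted regression (AWR-style), where logged actions are reweighted in a behavioral-cloning loss according to how much better they did than the policy's baseline. The temperature and clip values below are illustrative assumptions:

```python
import math

def awr_weight(advantage: float, beta: float = 1.0, w_max: float = 20.0) -> float:
    """Exponential advantage weight for a logged action, clipped for stability."""
    return min(math.exp(advantage / beta), w_max)

# Better-than-baseline logged actions are upweighted, worse ones downweighted;
# minimizing the weighted cloning loss then improves on the logged policy
# without any online interaction.
assert awr_weight(1.0) > 1.0 > awr_weight(-1.0)
```

Because the policy only ever imitates actions that actually appear in the logs, this family of methods sidesteps the unsafe exploration that makes from-scratch RL impractical on real hardware.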
Progress signal: Reaching 85%+ success rate on a contact-rich manipulation task using only 2 hours of real robot interaction (no simulation), with zero safety violations during learning.
10. Evaluation Standardization
Current state: Robot learning papers report results on incompatible benchmarks with different robots, tasks, success metrics, and evaluation protocols. There is no agreed benchmark analogous to ImageNet for vision or MMLU for language. Comparing methods across papers is nearly impossible.
Research directions: Community convergence on a small set of standard benchmark suites — LIBERO, SimplerEnv, and RoboFlamingo Benchmark are leading candidates; standardized evaluation protocols (number of trials, evaluation conditions, success definition); hardware-in-the-loop evaluation services that allow third-party verification of claimed results; and a leaderboard infrastructure analogous to Papers With Code.
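Part of a standardized protocol is reporting uncertainty rather than a bare success rate; with the small trial counts common in papers, intervals are wide. A minimal sketch using a 95% Wilson score interval (the choice of interval is an assumption here, one common option):

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """95% Wilson score interval for a binomial success rate."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return (center - half, center + half)

# 17/20 successes reads as "85%", but the interval is roughly 64%-95% --
# one reason a standard protocol should fix (large) trial counts.
lo, hi = wilson_interval(17, 20)
```

Two methods reported as "85%" and "70%" on 20 trials each may be statistically indistinguishable, which is precisely the comparison problem this section describes.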
Progress signal: Three or more major robot learning venues (CoRL, ICRA, RSS) requiring results on a common benchmark subset for paper acceptance.
How SVRC Contributes to These Directions
SVRC's research and infrastructure programs directly address problems 1 (data collection efficiency), 3 (dexterous manipulation data via glove collection), 8 (lifelong learning via continuous data flywheel), and 10 (evaluation through our standardized task suite).
For researchers working on these open problems, SVRC provides data collection services that produce benchmark-quality demonstration datasets, robot access programs for hardware-in-the-loop evaluation, and dataset contributions to the LeRobot Hub and Open X-Embodiment format. Visit our research programs page for collaboration and dataset access inquiries.