We present qualitative examples in both simulated and real-world environments in the attached video.
Simulation. We select several challenging cases characterized by severe visual occlusions.
Real world. We show three indoor scenes: office, classroom, and common room. In the classroom, we further demonstrate robustness under two types of dynamic conditions:
Zero-shot object navigation (ZSON) requires a robot to locate a target object in a previously unseen environment without relying on pre-built maps or task-specific training. However, existing ZSON methods often struggle in realistic and cluttered environments, particularly when the scene contains heavy occlusions, unknown risks, or dynamically moving target objects. To address these challenges, we propose Schrödinger’s Navigator, a navigation framework inspired by Schrödinger’s thought experiment on uncertainty. The framework treats unobserved space as a set of plausible future worlds and reasons over them before acting. Conditioned on egocentric visual inputs and three candidate trajectories, a trajectory-conditioned 3D world model imagines future observations along each path. This enables the agent to see beyond occlusions and anticipate risks in unseen regions without requiring extra detours or dense global mapping. The imagined 3D observations are fused into the navigation map and used to update a value map. These updates guide the policy toward trajectories that avoid occlusions, reduce exposure to uncertain space, and better track moving targets. Experiments on a Go2 quadruped robot across three challenging scenarios, including severe static occlusions, unknown risks, and dynamically moving targets, show that Schrödinger’s Navigator consistently outperforms strong ZSON baselines in self-localization, object localization, and overall Success Rate in occlusion-heavy environments. These results demonstrate the effectiveness of trajectory-conditioned 3D imagination in enabling robust zero-shot object navigation.
Overview of our Navigator pipeline. Left: The system receives a goal instruction, RGB-D observations, and the robot pose as input. Bottom center: A trajectory sampler deterministically selects three candidate trajectories (left bypass, right bypass, and over-the-top) and uses them to condition a 3D world model. The model predicts future 3D Gaussian Splatting (3DGS) observations along these trajectories to infer occluded and unobserved regions. Top right: The predicted cues are fused with current observations to construct and update multi-sourced value maps, enabling future-aware reasoning. This process produces a final affordance map used for intermediate waypoint selection. Bottom right: The execution unit follows the selected waypoint and generates control commands to navigate the robot continuously toward the goal.
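The fusion and waypoint-selection steps in this caption can be illustrated with a deliberately simplified sketch. It assumes a 2D grid of cells with scalar goal-relevance scores instead of 3DGS observations, a stub `toy_world_model` standing in for the trajectory-conditioned world model, and a fixed `imagined_weight` discount. All of these names, the toy scores, and the max-based fusion rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

# The three candidate trajectories named in the caption (identifiers are hypothetical).
TRAJECTORY_KINDS = ("left_bypass", "right_bypass", "over_the_top")


@dataclass
class Imagined:
    """Predicted evidence along one trajectory: cell -> goal-relevance score in [0, 1]."""
    kind: str
    scores: Dict[Cell, float]


def toy_world_model(kind: str) -> Imagined:
    """Stub world model with hard-coded predictions (illustrative only)."""
    table = {
        "left_bypass": {(0, 2): 0.2},
        "right_bypass": {(2, 2): 0.9},
        "over_the_top": {(1, 3): 0.4, (1, 1): 1.0},
    }
    return Imagined(kind, table[kind])


def fuse_value_map(observed: Dict[Cell, float],
                   imagined: List[Imagined],
                   imagined_weight: float = 0.5) -> Dict[Cell, float]:
    """Merge observed and imagined scores (assumed rule: real observations win)."""
    fused = dict(observed)
    for pred in imagined:
        for cell, score in pred.scores.items():
            if cell in observed:
                continue  # never overwrite a real observation with imagination
            discounted = imagined_weight * score  # imagined evidence is trusted less
            fused[cell] = max(fused.get(cell, 0.0), discounted)
    return fused


def select_waypoint(value_map: Dict[Cell, float]) -> Cell:
    """Pick the highest-value cell; sorting makes tie-breaking deterministic."""
    return max(sorted(value_map), key=lambda cell: value_map[cell])


observed = {(1, 0): 0.1, (1, 1): 0.3}
imagined = [toy_world_model(kind) for kind in TRAJECTORY_KINDS]
value_map = fuse_value_map(observed, imagined)
waypoint = select_waypoint(value_map)
print(waypoint)
```

In this toy run the waypoint lands on a cell that was never directly observed, which is the behavior the caption attributes to future-aware reasoning: imagined evidence behind an occluder can outrank weaker evidence in the visible region.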
We report results in both real-world and simulation settings. The real-robot experiments validate robustness under occlusions, unknown risks, and moving targets, while the simulator experiments provide controlled comparisons across diverse environments.
Real-robot performance in three indoor scenes: office, classroom, and common room. Each entry is the number of successful trials out of 10.
| Method | Office | Classroom | Common Room | All |
|---|---|---|---|---|
| Search for static objects | | | | |
| InstructNav | 7/10 | 7/10 | 8/10 | 22/30 |
| Ours | 8/10 | 8/10 | 7/10 | 23/30 |
| Search for dynamic objects | | | | |
| InstructNav | 3/10 | 4/10 | 3/10 | 10/30 |
| Ours | 5/10 | 5/10 | 6/10 | 16/30 |
| Sudden obstacles | | | | |
| InstructNav | 4/10 | 3/10 | 5/10 | 12/30 |
| Ours | 6/10 | 7/10 | 6/10 | 19/30 |
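
The per-scene counts can be aggregated into overall success rates and per-condition gains. The short script below is only a convenience for reading the table; the counts are copied from it, while the dictionary keys and the helper `success_rate` are illustrative names.

```python
from fractions import Fraction

TRIALS_PER_SCENE = 10

# Successes per scene (office, classroom, common room), copied from the table.
RESULTS = {
    "static": {"InstructNav": (7, 7, 8), "Ours": (8, 8, 7)},
    "dynamic": {"InstructNav": (3, 4, 3), "Ours": (5, 5, 6)},
    "sudden_obstacles": {"InstructNav": (4, 3, 5), "Ours": (6, 7, 6)},
}


def success_rate(successes):
    """Exact success rate over all scenes for one method and condition."""
    return Fraction(sum(successes), TRIALS_PER_SCENE * len(successes))


rates = {
    cond: {method: success_rate(s) for method, s in methods.items()}
    for cond, methods in RESULTS.items()
}
# Absolute gain of "Ours" over InstructNav for each condition.
gains = {cond: r["Ours"] - r["InstructNav"] for cond, r in rates.items()}

for cond in RESULTS:
    base = float(rates[cond]["InstructNav"])
    ours = float(rates[cond]["Ours"])
    gain_pts = float(gains[cond]) * 100
    print(f"{cond}: InstructNav {base:.1%}, Ours {ours:.1%}, gain {gain_pts:+.1f} pts")
```

The gain is about 3 percentage points for static objects (22/30 vs. 23/30) but grows to 20 points for dynamic objects and about 23 points for sudden obstacles, so most of the improvement comes from the dynamic conditions.
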

Comparison on HM3D.

| Method | Training-free | SR↑ | SPL↑ | DTG↓ |
|---|---|---|---|---|
| ZSON | ✗ | 0.255 | 0.126 | -- |
| PixNav | ✗ | 0.379 | 0.205 | -- |
| SPNet | ✗ | 0.312 | 0.101 | -- |
| SGM | ✗ | 0.602 | 0.308 | -- |
| ESC | ✓ | 0.392 | 0.223 | -- |
| VLFM | ✓ | 0.525 | 0.304 | -- |
| VoroNav | ✓ | 0.420 | 0.260 | -- |
| L3MVN | ✓ | 0.504 | 0.231 | 4.43 |
| TriHelper | ✓ | 0.565 | 0.253 | 3.87 |
| GAMap | ✓ | 0.531 | 0.260 | -- |
| InstructNav | ✓ | 0.510 | 0.187 | 2.89 |
| InstructNav* | ✓ | 0.453 | 0.186 | 3.38 |
| CogNav | ✓ | 0.725 | 0.262 | -- |
| ApexNav | ✓ | 0.762 | 0.380 | -- |
| Ours | ✓ | 0.609 | 0.237 | 2.23 |
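
For readers unfamiliar with the column headers, the sketch below computes the three metrics using their commonly used definitions: SR is the fraction of successful episodes, SPL weights each success by the ratio of the shortest-path length to the path length actually travelled, and DTG is the mean remaining distance to the goal at episode end. The `Episode` fields and the example values are hypothetical, and the evaluation protocol used for the table (success threshold, distance units) is not stated in this section, so this is an assumption-laden illustration rather than the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    success: bool
    shortest_path: float   # shortest-path length from start to goal (assumed > 0)
    agent_path: float      # length of the path the agent actually travelled
    final_distance: float  # remaining distance to the goal at episode end


def success_rate(episodes: List[Episode]) -> float:
    """SR: fraction of episodes that ended in success."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: List[Episode]) -> float:
    """SPL: success weighted by shortest_path / max(agent_path, shortest_path)."""
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)  # failed episodes contribute zero


def distance_to_goal(episodes: List[Episode]) -> float:
    """DTG: mean remaining distance to the goal over all episodes."""
    return sum(e.final_distance for e in episodes) / len(episodes)


# Made-up episodes for illustration only; these are not results from the paper.
episodes = [
    Episode(True, 5.0, 10.0, 0.5),    # success with a 2x detour
    Episode(True, 4.0, 4.0, 0.25),    # success along the shortest path
    Episode(False, 6.0, 12.0, 3.25),  # failure
    Episode(False, 3.0, 1.0, 4.0),    # failure after stopping early
]
print(success_rate(episodes), spl(episodes), distance_to_goal(episodes))
```

Because SPL penalizes detours, a method can have a higher SR than another and still a lower SPL, which is worth keeping in mind when comparing rows of the table.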