Schrödinger's Navigator: Imagining an Ensemble of Futures for Zero-Shot Object Navigation

1Fudan University, 2Shanghai Jiao Tong University, 3Shanghai University of International Business and Economics, 4Shanghai Innovation Institute


We present qualitative examples in both simulated and real-world environments in the attached video.

Simulation. We select several challenging cases characterized by severe visual occlusions.

Real world. We show three indoor scenes: office, classroom, and common room. In the classroom, we further demonstrate robustness under two types of dynamic conditions:

  • Moving objects, such as a chair in motion.
  • Sudden occlusion, caused by a chair abruptly entering the view.
Teaser Image

Trajectory-conditioned 3D imagination helps robots see beyond occlusions.

Real-world zero-shot object navigation often fails when the target object (e.g., a cat) is hidden behind occlusions and surrounded by unknown or potentially hazardous space. Conventional navigation systems typically perceive only the immediate occluder and cannot infer what exists beyond it.

Schrödinger’s Navigator addresses this challenge by modeling unobserved regions as multiple plausible futures. It samples several trajectories around the occluding structure and uses a trajectory-conditioned 3DGS imagination model to predict expected observations along each path. This enables the robot to anticipate the post-occlusion scene and select safer, less-occluded routes that increase the likelihood of locating the target.
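The ensemble-of-futures idea can be sketched in a few lines: sample a small set of candidate paths around the occluder, query a world model for the expected observations along each, and keep the path whose imagined future looks safest. This is a minimal illustrative sketch, not the paper's implementation; the function names, the heading offsets, and the stubbed occupancy predictions are all assumptions made for the example.

```python
import numpy as np

def sample_candidate_trajectories(pose, step=0.5, horizon=5):
    """Sample three candidate paths around an occluder: left bypass,
    right bypass, and straight ahead (illustrative stand-ins for the
    three conditioning trajectories)."""
    headings = {"left": np.pi / 6, "right": -np.pi / 6, "straight": 0.0}
    x, y, yaw = pose
    trajs = {}
    for name, d_yaw in headings.items():
        pts = [(x + step * (i + 1) * np.cos(yaw + d_yaw),
                y + step * (i + 1) * np.sin(yaw + d_yaw))
               for i in range(horizon)]
        trajs[name] = np.array(pts)
    return trajs

def score_trajectory(imagined_occupancy):
    """Score a path by expected free space in the imagined observations:
    lower predicted occupancy (fewer obstacles) -> higher score."""
    return 1.0 - float(np.mean(imagined_occupancy))

# Hypothetical usage: a real world model would predict occupancy along
# each path; here the predictions are stubbed with fixed arrays.
trajs = sample_candidate_trajectories((0.0, 0.0, 0.0))
imagined = {"left": np.array([0.1, 0.2, 0.1, 0.0, 0.1]),
            "right": np.array([0.6, 0.7, 0.8, 0.6, 0.5]),
            "straight": np.array([0.9, 0.9, 0.8, 0.9, 1.0])}
best = max(imagined, key=lambda k: score_trajectory(imagined[k]))
# With these stub predictions, the left bypass is selected.
```

Under these stub predictions the left bypass wins because its imagined observations contain the least obstructed space, mirroring how the navigator prefers less-occluded routes.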

Abstract

Zero-shot object navigation (ZSON) requires a robot to locate a target object in a previously unseen environment without relying on pre-built maps or task-specific training. However, existing ZSON methods often struggle in realistic and cluttered environments, particularly when the scene contains heavy occlusions, unknown risks, or dynamically moving target objects. To address these challenges, we propose Schrödinger’s Navigator, a navigation framework inspired by Schrödinger’s thought experiment on uncertainty. The framework treats unobserved space as a set of plausible future worlds and reasons over them before acting. Conditioned on egocentric visual inputs and three candidate trajectories, a trajectory-conditioned 3D world model imagines future observations along each path. This enables the agent to see beyond occlusions and anticipate risks in unseen regions without requiring extra detours or dense global mapping. The imagined 3D observations are fused into the navigation map and used to update a value map. These updates guide the policy toward trajectories that avoid occlusions, reduce exposure to uncertain space, and better track moving targets. Experiments on a Go2 quadruped robot across three challenging scenarios, including severe static occlusions, unknown risks, and dynamically moving targets, show that Schrödinger’s Navigator consistently outperforms strong ZSON baselines in self-localization, object localization, and overall Success Rate in occlusion-heavy environments. These results demonstrate the effectiveness of trajectory-conditioned 3D imagination in enabling robust zero-shot object navigation.

The framework of Schrödinger's Navigator

Introduction Image

Overview of our Navigator pipeline. Left: The system receives a goal instruction, RGB-D observations, and the robot pose as input. Bottom center: A trajectory sampler deterministically selects three candidate trajectories and conditions a 3D world model. The model predicts future 3DGS observations along these trajectories—left bypass, right bypass, and over-the-top—to infer occluded and unobserved regions. Top right: The predicted cues are fused with current observations to construct and update multi-sourced value maps and enable future-aware reasoning. This process produces a final affordance map used for intermediate waypoint selection. Bottom right: The execution unit follows the selected waypoint and generates control commands to navigate the robot continuously toward the goal.
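The fusion step at the heart of the pipeline can be illustrated with a toy grid: cells the world model fills in are blended with the observed value map, and the next waypoint is the highest-value cell. This is a minimal sketch under assumptions of my own; the blend weight, the NaN-masking convention, and the argmax waypoint rule are illustrative, not the paper's exact multi-source formulation.

```python
import numpy as np

def fuse_value_maps(observed, imagined, weight=0.5):
    """Blend the value map built from current observations with values
    predicted along imagined trajectories. Cells where `imagined` is NaN
    were not covered by the world model and keep their observed value.
    `weight` (illustrative) controls trust in imagination."""
    fused = observed.copy()
    mask = ~np.isnan(imagined)  # cells the world model filled in
    fused[mask] = (1 - weight) * observed[mask] + weight * imagined[mask]
    return fused

def select_waypoint(value_map):
    """Pick the highest-value cell as the next intermediate waypoint."""
    return np.unravel_index(np.argmax(value_map), value_map.shape)

# Toy example: the only observed cue is weak, but the world model
# predicts the target beyond the occluder at cell (2, 3).
observed = np.zeros((4, 4))
observed[0, 0] = 0.3
imagined = np.full((4, 4), np.nan)
imagined[2, 3] = 0.9
fused = fuse_value_maps(observed, imagined)
waypoint = select_waypoint(fused)  # -> (2, 3)
```

Even with an equal blend weight, the imagined cue outweighs the weak observed one, so the waypoint is steered toward the region behind the occluder.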

Results

We report results in both real-world and simulation settings. The real-robot experiments validate robustness under occlusions, unknown risks, and moving targets, while the simulator experiments provide controlled comparisons across diverse environments.

Real-world (Go2)

Real-world experiment results

Performance under three indoor scenes: office, classroom, and common room.

Table: Comparison with the baseline method in real-world environments. Results show success counts over ten trials per environment. The last column summarizes performance across all trials.

| Scene | Office | Classroom | Common Room | All |
|---|---|---|---|---|
| *Search for static objects* | | | | |
| InstructNav | 7/10 | 7/10 | 8/10 | 22/30 |
| Ours | 8/10 | 8/10 | 7/10 | 23/30 |
| *Search for dynamic objects* | | | | |
| InstructNav | 3/10 | 4/10 | 3/10 | 10/30 |
| Ours | 5/10 | 5/10 | 6/10 | 16/30 |
| *Sudden obstacles* | | | | |
| InstructNav | 4/10 | 3/10 | 5/10 | 12/30 |
| Ours | 6/10 | 7/10 | 6/10 | 19/30 |

Simulation (HM3D)

Table: Quantitative comparison of simulation results on HM3D.

| Method | SR↑ | SPL↑ | DTG↓ |
|---|---|---|---|
| ZSON | 0.255 | 0.126 | -- |
| PixNav | 0.379 | 0.205 | -- |
| SPNet | 0.312 | 0.101 | -- |
| SGM | 0.602 | 0.308 | -- |
| ESC | 0.392 | 0.223 | -- |
| VLFM | 0.525 | 0.304 | -- |
| VoroNav | 0.420 | 0.260 | -- |
| L3MVN | 0.504 | 0.231 | 4.43 |
| TriHelper | 0.565 | 0.253 | 3.87 |
| GAMap | 0.531 | 0.260 | -- |
| InstructNav | 0.510 | 0.187 | 2.89 |
| InstructNav* | 0.453 | 0.186 | 3.38 |
| CogNav | 0.725 | 0.262 | -- |
| ApexNav | 0.762 | 0.380 | -- |
| Ours | 0.609 | 0.237 | 2.23 |
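For readers unfamiliar with the table's metrics, SR is the fraction of successful episodes and SPL is the standard Success weighted by Path Length (mean of S_i * l_i / max(p_i, l_i), where l_i is the shortest-path length and p_i the path actually taken). The sketch below computes both on toy episodes; the numbers are illustrative and unrelated to the table.

```python
def success_rate(successes):
    """SR: fraction of episodes that reached the target."""
    return sum(successes) / len(successes)

def spl(successes, shortest, taken):
    """SPL: mean of S_i * l_i / max(p_i, l_i) over episodes."""
    return sum(s * l / max(p, l)
               for s, l, p in zip(successes, shortest, taken)) / len(successes)

# Toy episodes (illustrative numbers only):
S = [1, 1, 0]          # success flags
l = [4.0, 6.0, 5.0]    # shortest-path lengths (m)
p = [5.0, 6.0, 9.0]    # actual path lengths (m)
sr = success_rate(S)   # 2/3
pl = spl(S, l, p)      # (0.8 + 1.0 + 0.0) / 3 = 0.6
```

Note that SPL penalizes successful but inefficient episodes (the first episode contributes 0.8, not 1.0), which is why a method can lead on SR while trailing on SPL.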