We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects, and illustrate how it supports object SLAM for consistent spatial understanding with long-term scene changes. NeuSE is a set of latent object embeddings created from partial object observations. It serves as a compact point cloud surrogate for complete object models, encoding full shape information while transforming SE(3)-equivariantly in tandem with the object in the physical world. With NeuSE, relative frame transforms can be directly derived from inferred latent codes. Using NeuSE for object shape and pose characterization, our proposed SLAM paradigm can operate independently or in conjunction with typical SLAM systems. It directly infers SE(3) camera pose constraints compatible with general SLAM pose graph optimization while maintaining a lightweight object-centric map that adapts to real-world changes. Our approach is evaluated on synthetic and real-world sequences featuring changed objects and shows improved localization accuracy and change-aware mapping capability, when working either standalone or jointly with a common SLAM pipeline.
The above schematic illustrates how we achieve consistent spatial understanding with NeuSE. An object-centric map of mugs and bottles constructed from the real-world experiment is shown here for illustration.
(a) NeuSE acts as a compact point cloud surrogate for objects, encoding full object shapes and transforming SE(3)-equivariantly with the objects. Latent codes of bottles and mugs from different frames can be effectively associated (dashed line) for direct computation of inter-frame transforms, which are then added to constrain camera pose (Ti) optimization both locally (TLi) and globally (TGi).
(b) The system performs change-aware object-level mapping, where changed objects are updated alongside unchanged ones with full shape reconstructions in the object-centric map.
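To make the "direct computation of inter-frame transforms" in (a) concrete, here is a minimal sketch. It assumes each latent code is an N×3 array that behaves like a point cloud, that rows of the two codes correspond one-to-one, and that the observed object is static between the two frames. Under those assumptions, the standard Kabsch (SVD-based) alignment recovers the rigid transform between the frames. The function name `relative_transform` and the array layout are illustrative choices, not taken from the NeuSE code base, and the actual system may compute this differently.

```python
import numpy as np


def relative_transform(z_src: np.ndarray, z_dst: np.ndarray) -> np.ndarray:
    """Estimate the 4x4 rigid transform T with z_dst ~= R @ z_src + t.

    Assumption (hypothetical layout): both latent codes are (N, 3) arrays
    whose rows correspond to each other.
    """
    mu_src = z_src.mean(axis=0)
    mu_dst = z_dst.mean(axis=0)

    # 3x3 cross-covariance of the centered latent "points".
    H = (z_src - mu_src).T @ (z_dst - mu_dst)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation
    # (det = +1) rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_dst - R @ mu_src

    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


if __name__ == "__main__":
    # Synthetic check: transform a random latent code by a known pose
    # and confirm that the same pose is recovered.
    rng = np.random.default_rng(0)
    z_i = rng.normal(size=(16, 3))
    theta = 0.7
    R_true = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ])
    t_true = np.array([0.5, -1.0, 2.0])
    z_j = z_i @ R_true.T + t_true
    T_ij = relative_transform(z_i, z_j)
    print(np.allclose(T_ij[:3, :3], R_true), np.allclose(T_ij[:3, 3], t_true))
```

A transform obtained this way could then be added as a relative pose constraint between the two camera poses in a pose graph, which is the role the caption describes for the NeuSE-inferred inter-frame transforms.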
We evaluate the proposed algorithm on both synthetic and real-world sequences consisting of unseen instances of the bottle and mug categories, where objects are added, removed, and swapped to simulate long-term environment changes. Below, we showcase the localization and mapping results of the proposed NeuSE-based object SLAM approach on two self-collected real-world sequences: the 4-Round loop and the Triple-infinity loop.
(1) In column 1, the proposed object SLAM paradigm demonstrates its ability to sustain reasonable localization performance when working standalone with only NeuSE-inferred inter-frame camera pose constraints (objects available) or noisy external odometry measurements (objects unavailable).
(2) In columns 2-5, the proposed object SLAM paradigm shows its ability to improve localization accuracy when working jointly with typical SLAM systems (ORB-SLAM3 in this case). Specifically, in (a), the integration of our strategy helps prevent tracking failure, as seen by the spike-free trajectory estimates in columns 3 and 5 compared to those in columns 2 and 4. In (b), our strategy successfully eliminates the start and end point drift, resulting in an improved trajectory estimate when revisiting the lower right part of the environment, as indicated by the lighter color of the ATE value distribution in column 5 compared to column 4.
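For readers unfamiliar with the ATE (absolute trajectory error) values referenced above, the following is a minimal sketch of how per-pose ATE and its RMSE are commonly computed. It assumes the estimated and ground-truth trajectories are already time-associated and expressed in a common frame; in practice a rigid alignment step usually comes first. The function names are illustrative, and this page does not state the exact evaluation protocol used.

```python
import math
from typing import List, Sequence

Position = Sequence[float]  # (x, y, z) camera position


def per_pose_ate(estimated: Sequence[Position],
                 ground_truth: Sequence[Position]) -> List[float]:
    """Translational error at each timestamp.

    Assumption: trajectories are pre-aligned and index-matched.
    """
    if len(estimated) != len(ground_truth):
        raise ValueError("trajectories must have the same length")
    return [math.dist(e, g) for e, g in zip(estimated, ground_truth)]


def ate_rmse(errors: Sequence[float]) -> float:
    """Root-mean-square of the per-pose errors."""
    if not errors:
        raise ValueError("no errors given")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


if __name__ == "__main__":
    est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 4.0)]
    errors = per_pose_ate(est, gt)
    print(errors, ate_rmse(errors))
```

The per-pose errors are what a color-coded trajectory plot would typically display, with lighter or darker shades marking smaller or larger errors, while the RMSE summarizes the whole sequence in a single number.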
@inproceedings{fu2023neuse,
title={NeuSE: Neural SE(3)-Equivariant Embedding for Consistent Spatial Understanding with Objects},
author={Fu, Jiahui and Du, Yilun and Singh, Kurran and Tenenbaum, Joshua B and Leonard, John J},
booktitle={Proceedings of Robotics: Science and Systems (RSS)},
year={2023}
}