Text-driven indoor navigation with OCR-based landmark SLAM for ROS 2.
The robot detects text signs (room names, directions, storefronts) with NavOCR, builds a 3D landmark map on top of SLAM, and then navigates on text commands such as "Kitchen" or "Exit" by converting them into Nav2 goals.
NavOCR detects text on the RGB frame. textmap lifts each detection into 3D using the depth image, associates it across frames on a spatial grid, and publishes the result as a persistent landmark map anchored to the SLAM pose graph.
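The depth lift and grid association can be pictured as a pinhole back-projection followed by cell quantization. A minimal sketch, assuming ideal intrinsics; the function names and the 0.5 m cell size are illustrative, not textmap's actual API:

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth into a 3D point in the
    camera frame using the pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

def grid_key(point, cell_size=0.5):
    """Quantize a 3D point onto a coarse spatial grid. Detections of the
    same sign seen from different frames fall into the same (or an
    adjacent) cell, which is the basis for cross-frame association."""
    return tuple(int(c // cell_size) for c in point)

# A detection centered on the principal point at 2 m depth lies on the
# optical axis:
p = backproject(320, 240, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
# p == (0.0, 0.0, 2.0), grid_key(p) == (0, 0, 4)
```

In the real node, the camera-frame point is then transformed into the map frame using the SLAM pose, which is what anchors the landmarks to the pose graph.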
Given a text command, text_nav_bridge finds the closest-matching landmark from the saved map, ray-marches the robot-to-landmark line on the Nav2 costmap to pick a free-space goal, and sends it to Nav2 as a NavigateToPose action.
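The goal-selection step can be sketched in a few lines: fuzzy-match the command against landmark names, then walk from the landmark back toward the robot and take the first unoccupied point. Everything here is illustrative; `difflib` stands in for whatever matcher text_nav_bridge actually uses, and `occupied` stands in for a Nav2 costmap lookup:

```python
import difflib

def match_landmark(command, landmarks):
    """Return the landmark name closest to the text command."""
    names = list(landmarks)
    lowered = [n.lower() for n in names]
    best = difflib.get_close_matches(command.lower(), lowered, n=1, cutoff=0.0)
    return names[lowered.index(best[0])]

def pick_goal(robot_xy, landmark_xy, occupied, step=0.1):
    """March from the landmark back along the robot-landmark line and
    return the first free point: a reachable goal near the sign."""
    rx, ry = robot_xy
    lx, ly = landmark_xy
    dx, dy = rx - lx, ry - ly
    n = max(int(((dx * dx + dy * dy) ** 0.5) / step), 1)
    for i in range(n + 1):
        t = i / n
        p = (lx + dx * t, ly + dy * t)
        if not occupied(p):      # costmap lookup in the real bridge
            return p
    return robot_xy

landmarks = {"Kitchen": (3.0, 0.0), "Exit": (0.0, 3.0)}
name = match_landmark("kichen", landmarks)   # typo still resolves to "Kitchen"
wall = lambda p: p[0] > 2.5                  # toy costmap: wall past x = 2.5 m
goal = pick_goal((0.0, 0.0), landmarks[name], wall)  # free point near the sign
```

Fuzzy matching matters because OCR output and user commands rarely agree character-for-character.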
Both GIFs above were recorded in the Gazebo simulation shipped with this repo; the same pipeline runs unchanged on real hardware (see Real Environment).
End-to-end pipeline in Gazebo: build a map with text landmarks, then drive to them by name. No real hardware required.
ROS 2 Humble on Ubuntu 22.04.
# Nav2 and the simulation SLAM backend
sudo apt install \
ros-humble-navigation2 ros-humble-nav2-bringup \
ros-humble-slam-toolbox
# Gazebo and TurtleBot3
sudo apt install \
ros-humble-gazebo-ros-pkgs \
ros-humble-turtlebot3-gazebo \
ros-humble-turtlebot3-description \
ros-humble-turtlebot3-teleop
# textmap build-time dependency (msg definitions only)
sudo apt install ros-humble-rtabmap-msgs
# NavOCR Python dependencies
pip install paddlepaddle==3.0.0 paddleocr==3.4.0

Set the TurtleBot3 model once per shell:
export TURTLEBOT3_MODEL=waffle

This repository is a colcon workspace umbrella: it pulls in the four packages (NavOCR, textmap, text_nav_bridge, text_nav_sim) as Git submodules. Clone recursively so src/ is populated in one step.
git clone --recurse-submodules https://github.com/kc-ml2/text_navigation.git ~/ros2_ws
cd ~/ros2_ws
source /opt/ros/humble/setup.bash
colcon build
source install/setup.bash

Already cloned without --recurse-submodules? Initialize the submodules after the fact:
cd ~/ros2_ws
git submodule update --init --recursive

Gazebo, the mapping pipeline, and teleop live in separate packages, so they run in separate terminals:
# Terminal 1 — Gazebo (house world + waffle_rgbd)
ros2 launch text_nav_sim simulation.launch.py
# Terminal 2 — slam_toolbox + NavOCR + textmap + RViz
ros2 launch textmap textmap_sim.launch.py
# Terminal 3 — teleop
ros2 run turtlebot3_teleop teleop_keyboard

When you are done exploring, save the occupancy grid before pressing Ctrl+C (Terminal 2 prints the exact command with a timestamped directory):
mkdir -p ~/map/sim_run
# Occupancy grid for AMCL (.pgm + .yaml)
ros2 run nav2_map_server map_saver_cli -f ~/map/sim_run/map \
--ros-args -p map_subscribe_transient_local:=true

Press Ctrl+C in Terminal 2 to stop. textmap writes landmarks.yaml to ~/map/sim_run/ automatically on shutdown.
You should now have:
~/map/sim_run/
map.pgm # occupancy grid image
map.yaml # map_server metadata
landmarks.yaml # detected text landmarks with 3D positions
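The exact schema of landmarks.yaml is defined by textmap; purely as an illustration of the kind of information it carries (the field names below are assumptions, not the real format), each entry pairs recognized text with a map-frame position:

```yaml
# Illustrative sketch only; field names are assumptions, not textmap's schema
landmarks:
  - text: "Kitchen"
    position: {x: 3.2, y: -1.4, z: 0.9}   # map frame, metres
    observations: 17                      # frames this sign was associated across
```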
Same split as Phase 1 — Gazebo in one terminal, Nav2 with AMCL in another:
# Terminal 1 — Gazebo
ros2 launch text_nav_sim simulation.launch.py
# Terminal 2 — map_server + AMCL + Nav2 + text_nav_bridge + RViz
ros2 launch text_nav_bridge text_nav_sim.launch.py \
landmark_file:=~/map/sim_run/landmarks.yaml \
map_yaml_file:=~/map/sim_run/map.yaml

Send a text command; it will be matched against the landmarks captured in Phase 1 and converted into a Nav2 goal:
ros2 topic pub --once /text_nav/command std_msgs/msg/String "data: 'Kitchen'"

Monitor progress:
ros2 topic echo /text_nav/status

Try other landmarks from the sim world: Bedroom, Bathroom, Office, Exit, Living Room, Laundry, Storage Room, Garage, Closet.
Pose graph localization (using the slam_toolbox .posegraph) is planned for a later release; AMCL on the occupancy grid is currently the only supported backend in simulation.
spawn_entity.py hangs or the odom frame does not exist. A previous Gazebo process is still alive; kill it before relaunching:
killall -9 gzserver gzclient; pkill -9 -f ros2

FLAGS_enable_pir_* / oneDNN errors from NavOCR on CPU. Set these before launching:
export FLAGS_enable_pir_api=0
export FLAGS_enable_pir_in_executor=0

Conda (base) breaks spawn_entity.py. Deactivate conda before launching; Gazebo expects the system Python 3.10:
conda deactivate

The same pipeline runs on a physical robot. The simulation and real-hardware flows differ in three ways: (1) the SLAM backend is RTAB-Map instead of slam_toolbox, (2) you need a real RGB-D camera and IMU driver, and (3) you run the navigation stack against an RTAB-Map database (.db) rather than an occupancy grid + pose graph.
This project was tested on an Intel RealSense D455 (RGB-D + built-in IMU) mounted on a mobile base. Any RGB-D + IMU sensor with a ROS 2 driver should work with parameter tweaks to textmap (camera frame names, depth topic, camera_info topic).
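As a concrete (and hypothetical) example of such tweaks, a ROS 2 parameter override for a different camera would remap the topics and optical frame. The parameter names below are invented for illustration; check textmap's launch and config files for the real ones:

```yaml
# Hypothetical override; parameter names are illustrative, not textmap's actual ones
textmap:
  ros__parameters:
    rgb_topic: /camera/color/image_raw
    depth_topic: /camera/depth/image_rect_raw
    camera_info_topic: /camera/color/camera_info
    camera_frame: camera_color_optical_frame
```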
On top of the simulation prerequisites, install:
sudo apt install \
ros-humble-rtabmap-ros \
ros-humble-imu-filter-madgwick \
ros-humble-realsense2-camera

The default flow is to run the camera driver and textmap live while you drive the robot.
# Terminal 1 — RealSense driver
ros2 launch realsense2_camera rs_launch.py \
enable_infra1:=true enable_infra2:=true \
enable_depth:=true enable_gyro:=true enable_accel:=true \
unite_imu_method:=2
# Terminal 2 — textmap + RTAB-Map SLAM + NavOCR
ros2 launch textmap textmap_rtabmap.launch.py \
landmark_save_path:=~/map/real_run/landmarks.yaml

Drive the robot around the environment. On shutdown (Ctrl+C in Terminal 2), textmap writes landmarks.yaml automatically; RTAB-Map writes its database to ~/.ros/rtabmap.db by default (move it next to the landmark file for the navigation step).
To save landmarks manually while the node is still running:
ros2 service call /textmap/save_landmarks std_srvs/srv/Trigger

Alternative: record a rosbag and map from it offline
If you prefer to separate data collection from mapping, record the RGB-D + IMU + TF topics and play them back later:
# Record
ros2 bag record -o my_run \
/camera/color/image_raw /camera/color/camera_info \
/camera/depth/image_rect_raw /camera/infra1/camera_info \
/camera/imu /tf /tf_static
# Later: map from the bag
ros2 launch textmap textmap_rtabmap.launch.py \
landmark_save_path:=~/map/real_run/landmarks.yaml &
ros2 bag play my_run --clock -r 1.0

This is useful when you want to re-run mapping with different textmap parameters without re-collecting data.
text_nav_bridge's launch resolves both the landmark file and the RTAB-Map database from a single bag_name argument, which looks up:
src/text_nav_bridge/landmarks/<bag_name>.yaml
src/text_nav_bridge/rtabmap_db/<bag_name>.db
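For example, with bag_name:=real_run, the mapping outputs can be symlinked into place from the workspace root (the source paths assume the default save locations from the mapping step):

```shell
# Run from the workspace root (e.g. ~/ros2_ws)
mkdir -p src/text_nav_bridge/landmarks src/text_nav_bridge/rtabmap_db
ln -sf ~/map/real_run/landmarks.yaml src/text_nav_bridge/landmarks/real_run.yaml
ln -sf ~/.ros/rtabmap.db src/text_nav_bridge/rtabmap_db/real_run.db
```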
Copy or symlink the outputs from the mapping step into those directories, then launch the navigation stack and the camera driver:
# Terminal 1 — Nav2 + RTAB-Map localization + text_nav_bridge
ros2 launch text_nav_bridge text_nav_rtabmap.launch.py bag_name:=real_run
# Terminal 2 — RealSense driver (same as mapping)
ros2 launch realsense2_camera rs_launch.py \
enable_infra1:=true enable_infra2:=true \
enable_depth:=true enable_gyro:=true enable_accel:=true \
unite_imu_method:=2

Send a text command:
ros2 topic pub --once /text_nav/command std_msgs/msg/String "data: 'restroom'"
ros2 topic echo /text_nav/status

Alternative: replay a rosbag instead of running the camera
If you have a recorded session, swap the camera driver for rosbag playback and keep the navigation launch the same:
# Terminal 1 — same as above
ros2 launch text_nav_bridge text_nav_rtabmap.launch.py bag_name:=real_run
# Terminal 2 — rosbag replay
ros2 bag play my_run --clock

A reference rosbag recorded on our setup is available for download:
- Link: TBD — to be added
The bag corresponds to bag_name:=TBD in the navigation launch and can be used to reproduce the text_nav_bridge GIF above without physical hardware.
| Package | Role | Repository |
|---|---|---|
| NavOCR | Text detection + OCR (PaddleDetection + PaddleOCR) | kc-ml2/NavOCR |
| textmap | 3D text landmark SLAM (NavOCR + depth + SLAM pose graph) | kc-ml2/TextMap |
| text_nav_bridge | Text command to Nav2 NavigateToPose goal | kc-ml2/text_nav_bridge |
| text_nav_sim | Gazebo world + TurtleBot3 waffle_rgbd spawn (mapping and navigation pipelines live in textmap and text_nav_bridge) | kc-ml2/text_nav_sim |
| Package | Role | Upstream |
|---|---|---|
| Nav2 | Navigation stack (planner, controller, recovery) | https://github.com/ros-navigation/navigation2 |
| slam_toolbox | 2D SLAM backend used by the simulation | https://github.com/SteveMacenski/slam_toolbox |
| rtabmap_ros | 3D SLAM backend used on real hardware | https://github.com/introlab/rtabmap_ros |
Apache License 2.0. See LICENSE for the full text and per-package notices.

