A C++/Python toolkit for the SpatialMP4 format: read and process spatial video files containing RGB, depth, pose, and audio data.
- Multi-modal Data Support: Simultaneously process RGB images, depth maps, pose data, and audio
- Stereo Vision: Support for left and right eye RGB image data
- High Performance: Efficient video decoding based on FFmpeg
- Flexible Reading Modes: Support for RGB-only, Depth-only, and Depth-first reading modes
- Random Access: Support for random access to video frames and keyframe indexing
- 3D Reconstruction: Built-in point cloud generation and RGBD data processing
- Camera Calibration: Support for reading and applying intrinsic and extrinsic parameters
- Visualization Tools: Rich data visualization and debugging capabilities
- Cross-Platform: Full support for Linux and macOS
- Operating System:
  - Linux (Ubuntu 18.04+ recommended)
  - macOS (10.15+ Catalina; Xcode required)
- Compiler:
  - GCC 7.0+ or Clang 6.0+ (C++17 support required)
  - Apple Clang from Xcode 11.0+ on macOS
- CMake: 3.24.1+
The project depends on the following third-party libraries:
- FFmpeg: Video encoding/decoding (libavformat, libavcodec, libswscale)
- OpenCV: Image processing and computer vision
- Eigen3: Linear algebra operations
- Sophus: Lie group operations for SE(3)
- spdlog: High-performance logging library
- fmt: Modern C++ formatting library
- Google Test: Unit testing framework (optional)
```bash
git clone https://github.com/Pico-Developer/SpatialMP4
cd SpatialMP4
```

Build FFmpeg first, then install the remaining dependencies:

```bash
bash scripts/build_ffmpeg.sh
bash scripts/install_deps.sh
```

Configure and build:

```bash
mkdir build && cd build

# Configure project
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_PYTHON=OFF

# Build
make -j$(nproc)              # On Linux
make -j$(sysctl -n hw.ncpu)  # On macOS
```

Building a C++ project from source can sometimes be difficult; the installation FAQ may help. If it still does not work, feel free to submit an issue.
```bash
# Build with tests
cmake .. -DBUILD_TESTING=ON
make -j$(nproc)              # On Linux
make -j$(sysctl -n hw.ncpu)  # On macOS

# Run tests
cd ..
./build/test_reader
```

To install the Python bindings:

```bash
git clone https://github.com/Pico-Developer/SpatialMP4
cd SpatialMP4
bash scripts/build_ffmpeg.sh   # Build FFmpeg first
bash scripts/install_deps.sh
pip3 install .
```

Basic C++ usage:

```cpp
#include "spatialmp4/reader.h"
#include "spatialmp4/data_types.h"
// Create reader
SpatialML::Reader reader("path/to/your/spatial.mp4");
// Check data types
if (reader.HasRGB()) {
  std::cout << "Contains RGB data" << std::endl;
}
if (reader.HasDepth()) {
  std::cout << "Contains depth data" << std::endl;
}
if (reader.HasPose()) {
  std::cout << "Contains pose data" << std::endl;
}
// Get camera parameters
auto rgb_intrinsics = reader.GetRgbIntrinsicsLeft();
auto depth_intrinsics = reader.GetDepthIntrinsics();
```

Reading RGB and depth frames together (depth-first mode):

```cpp
// Set reading mode
reader.SetReadMode(SpatialML::Reader::ReadMode::DEPTH_FIRST);
reader.Reset();
while (reader.HasNext()) {
  SpatialML::rgb_frame rgb_frame;
  SpatialML::depth_frame depth_frame;

  // Read RGB and depth frames simultaneously
  reader.Load(rgb_frame, depth_frame);

  // Process data
  cv::Mat left_rgb = rgb_frame.left_rgb;
  cv::Mat right_rgb = rgb_frame.right_rgb;
  cv::Mat depth = depth_frame.depth;
  std::cout << "RGB timestamp: " << rgb_frame.timestamp << std::endl;
  std::cout << "Depth timestamp: " << depth_frame.timestamp << std::endl;
}
```

Reading RGB frames only:

```cpp
reader.SetReadMode(SpatialML::Reader::ReadMode::RGB_ONLY);
reader.Reset();
while (reader.HasNext()) {
  SpatialML::rgb_frame rgb_frame;
  reader.Load(rgb_frame);

  // Process RGB data
  cv::imshow("Left RGB", rgb_frame.left_rgb);
  cv::imshow("Right RGB", rgb_frame.right_rgb);
  cv::waitKey(1);
}
```

Accessing pose data:

```cpp
// Get all pose data
auto pose_frames = reader.GetPoseFrames();
for (const auto& pose : pose_frames) {
  // Convert to SE(3) representation
  Sophus::SE3d se3_pose = pose.as_se3();

  // Get rotation and translation
  Eigen::Matrix3d rotation = se3_pose.rotationMatrix();
  Eigen::Vector3d translation = se3_pose.translation();
  std::cout << "Pose timestamp: " << pose.timestamp << std::endl;
  std::cout << "Position: " << translation.transpose() << std::endl;
}
```

Projecting depth into the RGB frame:

```cpp
#include "utilities/RgbdUtils.h"
// Get camera parameters
auto rgb_intrinsics = reader.GetRgbIntrinsicsLeft().as_cvmat();
auto depth_intrinsics = reader.GetDepthIntrinsics().as_cvmat();
// Calculate transformation matrix
auto T_I_Srgb = reader.GetRgbExtrinsicsLeft().as_se3();
auto T_I_Stof = reader.GetDepthExtrinsics().as_se3();
auto T_Srgb_Stof = T_I_Srgb.inverse() * T_I_Stof;
// Project depth to RGB
cv::Mat projected_depth;
Utilities::ProjectDepthToRgb(depth_frame.depth, rgb_frame.left_rgb,
                             rgb_intrinsics, depth_intrinsics,
                             T_Srgb_Stof, projected_depth);
```

Generating a point cloud from RGBD data:

```cpp
#include "utilities/PointcloudUtils.h"
// Generate point cloud from RGBD data
Utilities::Pointcloud pcd;
Utilities::RgbdToPointcloud(rgb_frame.left_rgb, projected_depth,
                            rgb_intrinsics, pcd, 10.0f);

// Save point cloud
Utilities::SavePointcloudToFile("output.obj", pcd);
```

`Reader` is the main SpatialMP4 file reader class:

```cpp
Reader(const std::string& filename);

bool HasRGB() const; // Whether contains RGB data
bool HasDepth() const; // Whether contains depth data
bool HasPose() const; // Whether contains pose data
bool HasAudio() const; // Whether contains audio data
bool HasDisparity() const; // Whether contains disparity data

camera_intrinsics GetRgbIntrinsicsLeft() const; // Left RGB camera intrinsics
camera_intrinsics GetRgbIntrinsicsRight() const; // Right RGB camera intrinsics
camera_extrinsics GetRgbExtrinsicsLeft() const; // Left RGB camera extrinsics
camera_extrinsics GetRgbExtrinsicsRight() const; // Right RGB camera extrinsics
camera_intrinsics GetDepthIntrinsics() const; // Depth camera intrinsics
camera_extrinsics GetDepthExtrinsics() const; // Depth camera extrinsics
bool IsRgbDistorted() const; // Whether RGB stream applies lens distortion
std::string GetRgbDistortionModel() const; // Distortion model name (e.g., kBrownConrady)
std::string GetRgbDistortionParamsLeft() const; // Left RGB distortion parameters
std::string GetRgbDistortionParamsRight() const; // Right RGB distortion parameters

void SetReadMode(ReadMode mode); // Set reading mode
bool HasNext() const; // Whether has next frame
void Reset(); // Reset to beginning
int GetIndex() const; // Get current index

void Load(rgb_frame& rgb_frame); // Load RGB frame
void Load(depth_frame& depth_frame); // Load depth frame
void Load(rgb_frame& rgb_frame, depth_frame& depth_frame); // Load RGB and depth frames simultaneously

struct rgb_frame {
  double timestamp;   // Timestamp
  cv::Mat left_rgb;   // Left eye RGB image
  cv::Mat right_rgb;  // Right eye RGB image
  pose_frame pose;    // Corresponding pose data
};

struct depth_frame {
  double timestamp;  // Timestamp
  cv::Mat depth;     // Depth image
  pose_frame pose;   // Corresponding pose data
};

struct pose_frame {
  double timestamp;             // Timestamp
  double x, y, z;               // Position
  double qw, qx, qy, qz;        // Quaternion rotation
  Sophus::SE3d as_se3() const;  // Convert to SE(3) representation
};
```

Python quick start:

```python
import spatialmp4
import numpy as np
# Create a reader
reader = spatialmp4.Reader("your_video.mp4")
# Check available streams
print("Has RGB:", reader.has_rgb())
print("Has Depth:", reader.has_depth())
print("Has Pose:", reader.has_pose())
# Set reading mode
reader.set_read_mode(spatialmp4.ReadMode.DEPTH_FIRST)
# Read frames
while reader.has_next():
    rgb_frame, depth_frame = reader.load_both()
    left_rgb = rgb_frame.left_rgb  # numpy array (H, W, 3)
    depth = depth_frame.depth      # numpy array (H, W)
    pose = rgb_frame.pose
    print("RGB timestamp:", rgb_frame.timestamp, "Pose:", pose.x, pose.y, pose.z)

    # Convert head pose to IMU pose using the utility binding
    head_pose = pose.as_se3().matrix()
    head_model_offset = np.array([-0.05057, -0.01874, 0.04309])
    imu_pose = spatialmp4.head_to_imu(head_pose, head_model_offset)
    print("T_W_I:\n", imu_pose)
```

`Reader` is the main class for reading SpatialMP4 files.
- `Reader(filename: str)`: Create a new reader for the given file.
- `has_rgb() -> bool`: Whether the file contains RGB data.
- `has_depth() -> bool`: Whether the file contains depth data.
- `has_pose() -> bool`: Whether the file contains pose data.
- `has_audio() -> bool`: Whether the file contains audio data.
- `has_disparity() -> bool`: Whether the file contains disparity data.
- `get_duration() -> float`: Get video duration in seconds.
- `get_rgb_fps() -> float`: Get RGB stream FPS.
- `get_depth_fps() -> float`: Get depth stream FPS.
- `get_rgb_width() -> int`: Get RGB frame width.
- `get_rgb_height() -> int`: Get RGB frame height.
- `get_depth_width() -> int`: Get depth frame width.
- `get_depth_height() -> int`: Get depth frame height.
- `get_rgb_intrinsics_left() -> CameraIntrinsics`: Get left RGB camera intrinsics.
- `get_rgb_intrinsics_right() -> CameraIntrinsics`: Get right RGB camera intrinsics.
- `get_rgb_extrinsics_left() -> CameraExtrinsics`: Get left RGB camera extrinsics.
- `get_rgb_extrinsics_right() -> CameraExtrinsics`: Get right RGB camera extrinsics.
- `get_depth_intrinsics() -> CameraIntrinsics`: Get depth camera intrinsics.
- `get_depth_extrinsics() -> CameraExtrinsics`: Get depth camera extrinsics.
- `is_rgb_distorted() -> bool`: Whether the RGB stream includes lens distortion.
- `get_rgb_distortion_model() -> str`: Distortion model name ("kBrownConrady", etc.).
- `get_rgb_distortion_params_left() -> str`: Serialized left-eye distortion parameters.
- `get_rgb_distortion_params_right() -> str`: Serialized right-eye distortion parameters.
- `get_pose_frames() -> List[PoseFrame]`: Get all pose frames.
- `set_read_mode(mode: ReadMode)`: Set reading mode (see enums below).
- `has_next() -> bool`: Whether there is a next frame.
- `reset()`: Reset to the beginning of the file.
- `get_index() -> int`: Get current frame index.
- `get_frame_count() -> int`: Get total number of frames.
- `load_rgb() -> RGBFrame`: Load the next RGB frame.
- `load_depth(raw_head_pose: bool = False) -> DepthFrame`: Load the next depth frame.
- `load_both() -> (RGBFrame, DepthFrame)`: Load the next RGB and depth frames simultaneously.
- `load_rgbd(densify: bool = False) -> Rgbd`: Load RGBD data (for advanced use).
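The metadata getters combine naturally into a quick inspection script; a minimal sketch using only the methods listed above (the file path is a placeholder):

```python
import spatialmp4

reader = spatialmp4.Reader("your_video.mp4")

# Stream geometry and timing
print("Duration (s):", reader.get_duration())
print(f"RGB: {reader.get_rgb_width()}x{reader.get_rgb_height()} @ {reader.get_rgb_fps():.1f} fps")
print(f"Depth: {reader.get_depth_width()}x{reader.get_depth_height()} @ {reader.get_depth_fps():.1f} fps")
print("Frames:", reader.get_frame_count())

# Distortion metadata is only meaningful when the RGB stream is distorted
if reader.is_rgb_distorted():
    print("Distortion model:", reader.get_rgb_distortion_model())
    print("Left params:", reader.get_rgb_distortion_params_left())

# Pose track
print("Pose samples:", len(reader.get_pose_frames()))
```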
Module-level utility:

- `head_to_imu(head_pose: np.ndarray, head_model_offset: np.ndarray) -> np.ndarray`: Convert a head pose (4×4 matrix) and a head-model offset of shape (3,) into the corresponding IMU pose (4×4 matrix). The function validates the incoming array shapes, accepts standard NumPy row-major buffers, and returns a copy of the resulting transform.
RGBFrame:
- `timestamp: float`: Frame timestamp.
- `left_rgb: np.ndarray`: Left RGB image (H, W, 3, uint8).
- `right_rgb: np.ndarray`: Right RGB image (H, W, 3, uint8).
- `pose: PoseFrame`: Associated pose data.
DepthFrame:
- `timestamp: float`: Frame timestamp.
- `depth: np.ndarray`: Depth image (H, W, float32, meters).
- `pose: PoseFrame`: Associated pose data.
PoseFrame:
- `timestamp: float`: Pose timestamp.
- `x, y, z: float`: Position.
- `qw, qx, qy, qz: float`: Quaternion orientation.
- `as_se3()`: Convert to SE(3) representation (requires Sophus/Eigen, advanced use).
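Where the Sophus-backed `as_se3()` is inconvenient, the documented fields are enough to assemble the transform by hand. The sketch below is a hypothetical `pose_to_matrix` helper (not part of the bindings) applying the standard unit-quaternion-to-rotation formula; it assumes the quaternion is unit-norm and keeps whatever frame convention the stored pose uses:

```python
import numpy as np

def pose_to_matrix(pose):
    """Build a 4x4 transform from a PoseFrame's documented fields."""
    qw, qx, qy, qz = pose.qw, pose.qx, pose.qy, pose.qz
    # Standard rotation matrix of a unit quaternion (qw, qx, qy, qz)
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [pose.x, pose.y, pose.z]
    return T
```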
CameraIntrinsics:
- `fx, fy, cx, cy: float`: Camera intrinsic parameters.
- `as_cvmat()`: Return as an OpenCV matrix.
CameraExtrinsics:
- `extrinsics: np.ndarray`: 4x4 extrinsic matrix.
- `as_cvmat()`: Return as an OpenCV matrix.
- `as_se3()`: Return as SE(3) (advanced use).
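For use with OpenCV or NumPy-based projection code, the intrinsics fields map directly onto a 3x3 pinhole matrix; a small sketch with a hypothetical `intrinsics_to_K` helper:

```python
import numpy as np

def intrinsics_to_K(intr):
    """Assemble the 3x3 pinhole projection matrix from fx/fy/cx/cy."""
    return np.array([
        [intr.fx, 0.0,     intr.cx],
        [0.0,     intr.fy, intr.cy],
        [0.0,     0.0,     1.0],
    ])

# Usage with a reader as above:
# K_depth = intrinsics_to_K(reader.get_depth_intrinsics())
# T_I_Stof = reader.get_depth_extrinsics().extrinsics  # 4x4 np.ndarray
```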
ReadMode:
- `RGB_ONLY`: Only read RGB frames.
- `DEPTH_ONLY`: Only read depth frames.
- `DEPTH_FIRST`: Read both RGB and depth frames, with depth as the reference.
StreamType:
- `UNKNOWN`: Unknown stream type.
- `AUDIO`: Audio stream.
- `AUDIO_2`: Secondary audio stream.
- `RGB`: RGB video stream.
- `DISPARITY`: Disparity stream.
- `POSE`: Pose data stream.
- `DEPTH`: Depth stream.
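A short sketch of switching read modes from Python, mirroring the C++ RGB-only example earlier (the file path is a placeholder):

```python
import spatialmp4

reader = spatialmp4.Reader("your_video.mp4")
reader.set_read_mode(spatialmp4.ReadMode.RGB_ONLY)
reader.reset()

while reader.has_next():
    frame = reader.load_rgb()  # RGBFrame with left_rgb / right_rgb / pose
    print(frame.timestamp, frame.left_rgb.shape)
```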
- See examples/python/visualize_rerun.py and examples/python/generate_pcd.py for advanced usage, including point cloud generation and visualization with Open3D or Rerun.
- All image and depth data are returned as NumPy arrays for easy integration with OpenCV, Open3D, PyTorch, etc.
- Camera parameters and pose data can be used for 3D reconstruction and SLAM applications.
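As a concrete instance of the last point, a depth frame plus the depth intrinsics already yields a simple point cloud; the back-projection below is a minimal NumPy sketch built only on the documented fields (the hypothetical `depth_to_points` helper is not a library API):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) metric depth map into an (N, 3) point
    cloud in the depth camera frame using the pinhole model."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]        # pixel row/column grids
    valid = depth > 0                # keep pixels with a depth reading
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Usage with the bindings documented above:
# intr = reader.get_depth_intrinsics()
# pts = depth_to_points(depth_frame.depth, intr.fx, intr.fy, intr.cx, intr.cy)
```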
The project uses spdlog for logging:

```cpp
#include <spdlog/spdlog.h>

// Set log level
spdlog::set_level(spdlog::level::debug);

// Debug output is then emitted automatically by the library code
```

This project is licensed under the MIT License. See the LICENSE file for details.
Issues and Pull Requests are welcome to improve this project!
For questions or suggestions, please contact us through GitHub Issues.