MAD-LTX

Controllable driving-video generation with motion and appearance decoupling.

MAD-LTX generates driving videos from a first frame, a text prompt, and optional structured controls such as ego motion, object motion, semantic segmentation, or HD maps. The method first predicts an intermediate motion representation, then conditions a second LTX-Video model on that motion video to generate the final RGB video.
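The two-stage data flow described above can be sketched as follows. This is only an illustration under stated assumptions: `generate_driving_video`, `motion_model`, and `appearance_model` are hypothetical names, not part of this repository's API, and passing the first frame and prompt to the second stage is assumed rather than confirmed here. The stand-in callables return strings instead of videos so that the flow is easy to follow.

```python
from typing import Any, Callable, Optional


def generate_driving_video(
    first_frame: Any,
    prompt: str,
    motion_model: Callable[..., Any],      # hypothetical stage-1 model
    appearance_model: Callable[..., Any],  # hypothetical stage-2 model
    control: Optional[Any] = None,         # optional structured control, e.g. ego motion
) -> Any:
    # Stage 1: predict the intermediate motion representation.
    motion_video = motion_model(first_frame, prompt, control)
    # Stage 2: generate the RGB video conditioned on that motion video.
    # Assumption: the second stage also receives the first frame and prompt.
    return appearance_model(first_frame, prompt, motion_video)


# Usage with stand-in callables (placeholders, not real models).
def motion_stub(frame, prompt, control):
    return f"motion({frame}, control={control})"


def appearance_stub(frame, prompt, motion):
    return f"rgb({frame} | {motion})"


result = generate_driving_video(
    "frame0", "a residential road", motion_stub, appearance_stub, control="ego"
)
print(result)
```

Keeping the two stages behind separate callables mirrors the decoupling of motion and appearance: either stage can be swapped without touching the other.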

What This Repo Contains

- Config-driven inference for the released MAD-LTX LoRA checkpoints.
- Training configs and latent-caching scripts for reproducing the main OpenDV LoRA models.
- Minimal retained utilities for ego-motion control rendering and panoptic-control coloring.
- Evaluation entrypoints for video quality and control-following metrics.

Installation

The tested environment is captured in the Dockerfile. You can build the image locally:

docker build -t mad-ltx -f Dockerfile .

or start from the published image:

docker pull ahparyald/mad-ltx

If you prefer a native conda or pip setup, use the package list in the Dockerfile as the reference environment for both training and inference.

Quick Inference

PYTHONPATH=src python -m ltxv_trainer.inference \
  --config configs/inference/rgb_pose_motion_13b.yaml \
  --prompt "The image depicts a residential urban road. A number of parked vehicles are present in the road, one parked directly in front of the ego vehicle. The surrounding environment includes buildings and trees. The lighting suggests daytime." \
  --rgb-image examples/inference/ego_motion/rgb_2.jpg \
  --control-image examples/inference/ego_motion/pose_2.jpg \
  --reference-video examples/inference/ego_motion/egomotion_2.mp4 \
  --out outputs/rgb_pose_motion

The command downloads the base model and selected LoRA checkpoint if they are not already cached. See docs/inference.md for all supported modes and example assets.
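When running inference over several inputs, it may be convenient to assemble the command programmatically. The sketch below only builds the argument list, using exactly the flags shown in the command above; `build_inference_command` is a hypothetical helper and not part of the repository, and the prompt is shortened for brevity.

```python
import shlex
from typing import List


def build_inference_command(
    config: str,
    prompt: str,
    rgb_image: str,
    control_image: str,
    reference_video: str,
    out_dir: str,
) -> List[str]:
    """Return the argv list for the inference module (hypothetical helper)."""
    return [
        "python", "-m", "ltxv_trainer.inference",
        "--config", config,
        "--prompt", prompt,
        "--rgb-image", rgb_image,
        "--control-image", control_image,
        "--reference-video", reference_video,
        "--out", out_dir,
    ]


cmd = build_inference_command(
    config="configs/inference/rgb_pose_motion_13b.yaml",
    prompt="The image depicts a residential urban road.",
    rgb_image="examples/inference/ego_motion/rgb_2.jpg",
    control_image="examples/inference/ego_motion/pose_2.jpg",
    reference_video="examples/inference/ego_motion/egomotion_2.mp4",
    out_dir="outputs/rgb_pose_motion",
)
# Print a shell-quoted version for inspection; nothing is executed here.
print(shlex.join(cmd))
```

To actually launch it, the list could be passed to `subprocess.run(cmd, check=True, env={**os.environ, "PYTHONPATH": "src"})` from the repository root, which matches the `PYTHONPATH=src` prefix of the shell command.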

Reproduce Training

Training has three stages:

1. Render intermediate motion representations from the released OpenDV pose files.
2. Cache LTX VAE latents and text embeddings.
3. Train the corresponding LoRA from configs/training/rgb_to_pose.yaml or configs/training/pose_to_rgb.yaml.

The commands and data-preparation notes are in docs/training.md.
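Before starting a long training run, a quick check that both training configs are present in the checkout can save time. The following is a minimal sketch, assuming it is run from the repository root; `missing_training_configs` is a hypothetical helper, and only the two config paths come from this README.

```python
from pathlib import Path
from typing import List

# Config paths as named in the training stages above.
TRAINING_CONFIGS = (
    "configs/training/rgb_to_pose.yaml",
    "configs/training/pose_to_rgb.yaml",
)


def missing_training_configs(repo_root: str = ".") -> List[str]:
    """Return the training config paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in TRAINING_CONFIGS if not (root / rel).is_file()]


missing = missing_training_configs(".")
if missing:
    print("Missing training configs:", ", ".join(missing))
else:
    print("All training configs found.")
```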

Evaluation

The evaluation entrypoints are in the evaluation directory. They cover video quality metrics and trajectory-based control-following metrics; the commands are documented in evaluation/README.md.

Citation

@article{rahimi2026mad,
  title={MAD: Motion Appearance Decoupling for efficient Driving World Models},
  author={Rahimi, Ahmad and Gerard, Valentin and Zablocki, Eloi and Cord, Matthieu and Alahi, Alexandre},
  journal={arXiv preprint arXiv:2601.09452},
  year={2026}
}

Acknowledgements

MAD-LTX builds on Lightricks' LTX-Video codebase and the broader ecosystem around Diffusers, Transformers, PEFT, OpenPifPaf, and DWPose.
