Commit 4401a5a

text to video ref implementation
1 parent 8999c4d commit 4401a5a

14 files changed: 1299 additions, 0 deletions

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -10,3 +10,6 @@
 [submodule "language/deepseek-r1/submodules/LiveCodeBench"]
 path = language/deepseek-r1/submodules/LiveCodeBench
 url = https://github.com/LiveCodeBench/LiveCodeBench
+[submodule "text_to_video/submodules/VBench"]
+path = text_to_video/submodules/VBench
+url = https://github.com/Vchitect/VBench
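
To check which submodules a checkout expects after this change, the entries above can be read programmatically. The following is a minimal sketch, assuming the `.gitmodules` content is close enough to INI syntax for Python's `configparser` to parse it; Git's own config parser is the authority, and the helper name `list_submodules` is hypothetical.

```python
import configparser

# The two entries visible in this hunk, copied as shown in the diff.
GITMODULES_TEXT = """\
[submodule "language/deepseek-r1/submodules/LiveCodeBench"]
path = language/deepseek-r1/submodules/LiveCodeBench
url = https://github.com/LiveCodeBench/LiveCodeBench
[submodule "text_to_video/submodules/VBench"]
path = text_to_video/submodules/VBench
url = https://github.com/Vchitect/VBench
"""


def list_submodules(text: str) -> dict[str, tuple[str, str]]:
    """Map each submodule name to its (path, url) pair."""
    # Interpolation is disabled so '%' characters in URLs are kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    prefix = 'submodule "'
    result = {}
    for section in parser.sections():
        if section.startswith(prefix) and section.endswith('"'):
            name = section[len(prefix):-1]
            result[name] = (parser[section]["path"], parser[section]["url"])
    return result


if __name__ == "__main__":
    for name, (path, url) in list_submodules(GITMODULES_TEXT).items():
        print(f"{name}: {path} <- {url}")
```

If the repository was cloned without `--recurse-submodules`, `git submodule update --init --recursive` fetches the submodules listed here.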

text_to_video/.gitignore

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Model weights (large files)
+models/
+
+# Cache directories
+.cache/
+
+# Generated outputs and evaluation results
+data/outputs/
+data/outputs_*/
+data/results/
+data/evaluation_results/
+data/*_eval*/
+data/prompt_mapping.json
+
+# Python cache
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.Python
+
+# Jupyter
+.ipynb_checkpoints/
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Logs and results
+*.log
+results/
+outputs/
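
The patterns above mix anchored paths (`data/outputs_*/`), directory names that match at any depth (`models/`), and basename globs (`*.pyc`). The sketch below is a simplified approximation of how such patterns apply to a file path, not Git's actual algorithm: it ignores negation (`!`), `**`, and escapes, and it assumes every input is a file path relative to `text_to_video/`. The function name `is_ignored` and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns added in text_to_video/.gitignore.
PATTERNS = [
    "models/",
    "data/outputs/",
    "data/outputs_*/",
    "data/*_eval*/",
    "data/prompt_mapping.json",
    "__pycache__/",
    "*.pyc",
    "*.log",
]


def is_ignored(file_path: str, patterns: list[str] = PATTERNS) -> bool:
    """Approximate check for whether a file path is covered by the patterns."""
    parts = PurePosixPath(file_path).parts
    for raw in patterns:
        dir_only = raw.endswith("/")
        # A slash before the end anchors the pattern to the .gitignore directory.
        anchored = "/" in raw.rstrip("/")
        pat_parts = raw.strip("/").split("/")
        n = len(pat_parts)
        # A directory-only pattern may match an ancestor directory,
        # but never the final component, which is assumed to be a file.
        last_start = len(parts) - n - (1 if dir_only else 0)
        starts = [0] if anchored else range(last_start + 1)
        for start in starts:
            if start > last_start:
                continue
            window = parts[start:start + n]
            # Matching per component keeps '*' from crossing a '/'.
            if all(fnmatchcase(c, p) for c, p in zip(window, pat_parts)):
                return True
    return False


if __name__ == "__main__":
    for path in ("data/outputs_run1/clip.mp4", "data/fixed_latent.pt"):
        print(path, "->", "ignored" if is_ignored(path) else "tracked")
```

For an authoritative answer inside a real checkout, `git check-ignore -v <path>` reports which pattern, if any, applies.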

text_to_video/README.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+# Text-to-Video Benchmark
+
+Text-to-video generation using Wan2.2 T2V-A14B-Diffusers model and VBench evaluation.
+
+## Quick Start
+
+```bash
+# Clone with submodules
+git clone --recurse-submodules https://github.com/mlcommons/inference.git
+cd inference/text_to_video
+
+# Build Docker
+./launch.sh --build
+
+# Download model
+./launch.sh python3 download_model.py
+
+# Run inference (supports data parallel)
+./launch.sh python -m torch.distributed.run --nproc_per_node=8 run_inference.py
+
+# Evaluate
+./launch.sh python run_evaluation.py
+```
+
+## Files
+
+- `inference_config.yaml` - Generation parameters (resolution, fps, seed, etc.)
+- `download_model.py` - Model download
+- `run_inference.py` - Video generation
+- `run_evaluation.py` - VBench evaluation
+- `launch.sh` - Docker launcher
+- `data/vbench_prompts.txt` - Input prompts
+- `data/fixed_latent.pt` - Optional fixed latent tensor for deterministic generation
+
+## References
+
+- Model: [Wan-AI/Wan2.2-T2V-A14B-Diffusers](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers)
+- VBench: [GitHub](https://github.com/Vchitect/VBench)
+- MLPerf: [Inference](https://github.com/mlcommons/inference)
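
The README's inference command starts 8 processes through `torch.distributed.run`, which sets the `RANK` and `WORLD_SIZE` environment variables for each one. `run_inference.py` is not shown in this part of the diff, so how it divides the prompts is not known here. The following is a minimal sketch of one common data-parallel scheme (round-robin sharding), assuming each process handles a disjoint slice of the prompt list; `shard_prompts` and `rank_from_env` are hypothetical names.

```python
import os
from typing import Mapping, Sequence


def rank_from_env(env: Mapping[str, str] = os.environ) -> tuple[int, int]:
    """Read (rank, world_size), falling back to a single-process run."""
    return int(env.get("RANK", "0")), int(env.get("WORLD_SIZE", "1"))


def shard_prompts(
    prompts: Sequence[str], rank: int, world_size: int
) -> list[tuple[int, str]]:
    """Return the (index, prompt) pairs assigned to one rank, round-robin."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    # Keeping the original index lets outputs be matched back to prompts later.
    return [(i, p) for i, p in enumerate(prompts) if i % world_size == rank]


if __name__ == "__main__":
    rank, world_size = rank_from_env()
    # Placeholder prompts; a real run would read data/vbench_prompts.txt.
    prompts = [f"prompt {i}" for i in range(50)]
    mine = shard_prompts(prompts, rank, world_size)
    print(f"rank {rank}/{world_size} handles {len(mine)} prompts")
```

With the 50 prompts in this commit and 8 processes, this scheme gives two ranks 7 prompts each and the remaining six ranks 6 each.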

text_to_video/data/vbench_prompts.txt

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+a shark is swimming in the ocean, black and white
+A person is shuffling cards
+palace
+aquarium
+pharmacy
+an umbrella and a handbag
+waterfall
+A person is washing dishes
+A person is playing chess
+desert
+a teddy bear and a frisbee
+A person is grooming dog
+a shark is swimming in the ocean, surrealism style
+a cow and an elephant
+A boat sailing leisurely along the Seine River with the Eiffel Tower in background, in cyberpunk style
+an elephant taking a peaceful walk
+A person is shaking hands
+courtyard
+A person is digging
+an apple and a cell phone
+a shark is swimming in the ocean, in cyberpunk style
+A boat sailing leisurely along the Seine River with the Eiffel Tower in background, watercolor painting
+a truck accelerating to gain speed
+a horse and a sheep
+An astronaut flying in space, in cyberpunk style
+alley
+A person is pushing cart
+a car slowing down to stop
+A person is playing trumpet
+A person is skydiving
+A person is shining shoes
+an airplane soaring through a clear blue sky
+sky
+campus
+An astronaut flying in space, surrealism style
+a knife and a tv
+a toilet and a hair drier
+An astronaut flying in space by Hokusai, in the style of Ukiyo
+cafeteria
+volcano
+a car stuck in traffic during rush hour
+a skateboard and a surfboard
+A person is knitting
+A person is push up
+a bear catching a salmon in its powerful jaws
+A person is counting money
+a cat drinking water
+A beautiful coastal beach in spring, waves lapping on sand by Hokusai, in the style of Ukiyo
+A person is ice skating
+a remote and a keyboard

text_to_video/data/fixed_latent.pt

9.23 MB
Binary file not shown.
