diff --git a/cookbooks/cosmos3/generator/transfer/.gitignore b/cookbooks/cosmos3/generator/transfer/.gitignore new file mode 100644 index 00000000..c7421d96 --- /dev/null +++ b/cookbooks/cosmos3/generator/transfer/.gitignore @@ -0,0 +1,8 @@ +# Generated ffmpeg preview transcodes (created by preview_helpers.make_preview) +assets/**/*_preview.mp4 + +# Notebook output directories +outputs/ + +# Local env setup helpers (generated by setup_env.py, not needed in repo) +.cache/ diff --git a/cookbooks/cosmos3/generator/transfer/README.md b/cookbooks/cosmos3/generator/transfer/README.md index 0c477056..8acd54c9 100644 --- a/cookbooks/cosmos3/generator/transfer/README.md +++ b/cookbooks/cosmos3/generator/transfer/README.md @@ -1,6 +1,7 @@ # Cosmos3 Generator Transfer Examples -Cosmos3-Nano video **transfer** examples on the native PyTorch (Cosmos Framework) path. +Cosmos3 video **transfer** examples — **Nano** (single GPU) and **Super** (multi-GPU, 32B) — on +the native PyTorch (Cosmos Framework) path. Sample assets under [`assets/`](./assets) cover spatial control signals paired with `prompt.json` files: @@ -33,69 +34,94 @@ come from the control video; see the spec field reference for how `fps` and | World scenario (WSM) | `assets/wsm/` | `control_wsm.mp4` + `prompt.json` | 101 frames @ 10 FPS | Transfer inference is selected automatically when any hint key is present in the spec. +The same spec files are used for both Nano and Super — model selection is controlled +entirely by `--checkpoint-path`. ## Run with Cosmos Framework ### Quickstart Set up the environment: [Cosmos Framework setup](../../README.md#cosmos-framework). -Activate the framework venv, then run inference (checked-in `specs/*.json` use paths -relative to `specs/`). Transfer on Nano looks like: +Run the commands below inside the **cosmos container** (e.g. `pytorch:25.09-py3`) — the same +environment used to install the venv and run the notebook. 
The commands mirror the notebook +exactly: `cd` into the framework repo first, then invoke the venv's Python or torchrun +(the system Python does not have `cosmos_framework`). ```bash -cd cookbooks/cosmos3/generator/transfer +# Set once — the cosmos-framework repo root (contains .venv/ and pyproject.toml). +# In this cosmos checkout: packages/cosmos3 (or packages/cosmos-framework). +export COSMOS_FRAMEWORK=/path/to/cosmos-framework # e.g. /packages/cosmos3 +export TRANSFER_ROOT=$(pwd)/cookbooks/cosmos3/generator/transfer -# edge -torchrun --nproc-per-node=1 \ - -m cosmos_framework.scripts.inference \ - --parallelism-preset=latency \ - -i specs/edge.json \ - -o ./output/ \ - --checkpoint-path Cosmos3-Nano \ - --seed 2026 +# NGC containers bundle libtorch in LD_LIBRARY_PATH which conflicts with Triton/CUDA. +unset LD_LIBRARY_PATH +``` -# blur -torchrun --nproc-per-node=1 \ - -m cosmos_framework.scripts.inference \ - --parallelism-preset=latency \ - -i specs/blur.json \ - -o ./output/ \ - --checkpoint-path Cosmos3-Nano \ - --seed 2026 +#### Cosmos3-Nano (single GPU) -# depth -torchrun --nproc-per-node=1 \ - -m cosmos_framework.scripts.inference \ - --parallelism-preset=latency \ - -i specs/depth.json \ - -o ./output/ \ - --checkpoint-path Cosmos3-Nano \ - --seed 2026 +```bash +cd "$COSMOS_FRAMEWORK" -# seg -torchrun --nproc-per-node=1 \ - -m cosmos_framework.scripts.inference \ +# edge — replace edge.json with blur.json / depth.json / seg.json / wsm.json for other controls +CUDA_VISIBLE_DEVICES=0 \ +.venv/bin/python -m cosmos_framework.scripts.inference \ --parallelism-preset=latency \ - -i specs/seg.json \ - -o ./output/ \ + -i "$TRANSFER_ROOT/specs/edge.json" \ + -o "$TRANSFER_ROOT/outputs/Cosmos3-Nano/" \ --checkpoint-path Cosmos3-Nano \ --seed 2026 +``` + +#### Cosmos3-Super (multi-GPU) + +```bash +cd "$COSMOS_FRAMEWORK" -# wsm -torchrun --nproc-per-node=1 \ +# edge — replace edge.json with other control specs as needed +CUDA_VISIBLE_DEVICES=0,1,2,3 \ 
+.venv/bin/torchrun --nproc-per-node=4 \ + --master-addr=127.0.0.1 --master-port=29500 \ -m cosmos_framework.scripts.inference \ --parallelism-preset=latency \ - -i specs/wsm.json \ - -o ./output/ \ - --checkpoint-path Cosmos3-Nano \ + -i "$TRANSFER_ROOT/specs/edge.json" \ + -o "$TRANSFER_ROOT/outputs/Cosmos3-Super/" \ + --checkpoint-path Cosmos3-Super \ --seed 2026 ``` +| | Cosmos3-Nano | Cosmos3-Super | +|---|---|---| +| `--checkpoint-path` | `Cosmos3-Nano` | `Cosmos3-Super` | +| Launcher | `.venv/bin/python` (from framework root) | `.venv/bin/torchrun --nproc-per-node=` (from framework root) | +| `--parallelism-preset` | `latency` | `latency` | +| GPUs | 1 | 4+ | + The input spec sets `prompt_path` and a hint block with `control_path` pointing at the checked-in assets under [`assets/`](./assets) via paths relative to [`specs/`](./specs). -Outputs are written under the directory passed to `-o`, with one subdirectory per sample name, -for example `output/transfer_edge/vision.mp4`. Batch size must be 1 for transfer. +Outputs are written under the directory passed to `-o`, with one subdirectory per sample +name, e.g. `outputs/Cosmos3-Nano/transfer_edge/vision.mp4`. + +### Notebook (self-contained) + +[`run_video_transfer_with_cosmos_framework.ipynb`](./run_video_transfer_with_cosmos_framework.ipynb) +is a self-contained tutorial: it installs all dependencies (system packages, framework +clone, Python venv via `uv`), authenticates with Hugging Face, and runs all five controls +with previews. + +1. Open the notebook and edit **§2 (Configure)** — paste your `HF_TOKEN` and optionally + set cache/output paths. +2. Run **§9–§13** for Cosmos3-Nano (single GPU) or **§14–§18** for Cosmos3-Super (multi-GPU). + No model flag needed — each section uses its matching checkpoint explicitly. 
+ +To execute headlessly: + +```bash +cd cookbooks/cosmos3/generator/transfer +jupyter execute run_video_transfer_with_cosmos_framework.ipynb +``` + +Outputs land under `outputs/notebooks//transfer_/vision.mp4`. ### Spec field reference @@ -134,13 +160,12 @@ Key fields: - **`num_frames`** — number of video frames. - ### Cookbook entrypoints - [`run_video_transfer_with_cosmos_framework.ipynb`](./run_video_transfer_with_cosmos_framework.ipynb) — - full tutorial on a **GPU host**: environment setup, `nvidia-smi` check, then five inference blocks - (edge, blur, depth, seg, wsm) with previews. See [Cosmos3 environment setup](../../README.md). + self-contained notebook: §9–§13 Nano (single GPU), §14–§18 Super (multi-GPU). Edit §2, run top-to-bottom. - [`specs/`](./specs) — checked-in Framework input JSON per control (paths relative to `specs/`). + Shared by both Nano and Super. ### Troubleshooting diff --git a/cookbooks/cosmos3/generator/transfer/preview_helpers.py b/cookbooks/cosmos3/generator/transfer/preview_helpers.py index 9b054813..16e439bf 100644 --- a/cookbooks/cosmos3/generator/transfer/preview_helpers.py +++ b/cookbooks/cosmos3/generator/transfer/preview_helpers.py @@ -95,10 +95,18 @@ def make_preview(src: Path, crf: int = 28) -> Path: return preview -def preview_transfer(control: str) -> None: +def preview_transfer(control: str, *, model: str | None = None) -> None: + """Preview control input and generated output for *control*. + + *model* selects which output directory to read (``Cosmos3-Nano`` uses + ``//…``; ``Cosmos3-Super`` uses + ``/_super/…``). Defaults to the + ``COSMOS3_MODEL`` environment variable, falling back to ``Cosmos3-Nano``. 
+ """ + resolved_model = model or os.environ.get("COSMOS3_MODEL", "Cosmos3-Nano") spec = load_transfer_spec(control) control_path = resolve_spec_path(spec[control]["control_path"]) - vision_path = _output_root() / control / f"transfer_{control}" / "vision.mp4" + vision_path = _output_root() / resolved_model / f"transfer_{control}" / "vision.mp4" if not control_path.is_file(): raise FileNotFoundError(f"missing control video: {control_path}") if not vision_path.is_file(): diff --git a/cookbooks/cosmos3/generator/transfer/run_video_transfer_with_cosmos_framework.ipynb b/cookbooks/cosmos3/generator/transfer/run_video_transfer_with_cosmos_framework.ipynb index faf6ad3a..a7c3fc4b 100644 --- a/cookbooks/cosmos3/generator/transfer/run_video_transfer_with_cosmos_framework.ipynb +++ b/cookbooks/cosmos3/generator/transfer/run_video_transfer_with_cosmos_framework.ipynb @@ -7,39 +7,46 @@ "source": [ "" - ] + ], + "outputs": [], + "execution_count": null }, { "cell_type": "markdown", "id": "transfer-title", "metadata": {}, "source": [ - "# Cosmos3 Nano Transfer with Cosmos Framework\n", + "# Cosmos3 Transfer with Cosmos Framework\n", "\n", - "This notebook runs Cosmos3-Nano **video transfer** inference through the native Cosmos Framework PyTorch entrypoint:\n", + "This notebook runs Cosmos3 **video transfer** inference through the native Cosmos Framework PyTorch entrypoint:\n", "\n", "```bash\n", "python -m cosmos_framework.scripts.inference\n", "```\n", "\n", - "Transfer generates a target clip from a caption (`prompt.json`) and a spatial control video on the hint block (`control_path`). Supported cookbook controls:\n", + "Transfer generates a target clip from a caption (`prompt.json`) and a spatial control video. 
Supported controls:\n", "\n", - "- **edge** \u2014 Canny edge map (`control_edge.mp4`)\n", - "- **blur** \u2014 blurred reference (`control_blur.mp4`)\n", - "- **depth** \u2014 depth map (`control_depth.mp4`)\n", - "- **seg** \u2014 segmentation map (`control_seg.mp4`)\n", - "- **wsm** \u2014 world-scenario map (`control_wsm.mp4`)\n", + "- **edge** — Canny edge map (`control_edge.mp4`)\n", + "- **blur** — blurred reference (`control_blur.mp4`)\n", + "- **depth** — depth map (`control_depth.mp4`)\n", + "- **seg** — segmentation map (`control_seg.mp4`)\n", + "- **wsm** — world-scenario map (`control_wsm.mp4`)\n", "\n", - "vLLM-Omni does not expose transfer controls today; use this Cosmos Framework path only.\n", + "Run all Cosmos3-Nano examples first (§9–§13), then run the Cosmos3-Super examples (§14–§18) with the same control assets.\n", "\n", - "Sections **8\u201312** each run one control (inference + preview). Run only the blocks you need.\n", + "| Model | Launcher | Parallelism | GPUs |\n", + "|---|---|---|---|\n", + "| Cosmos3-Nano | `python` (single GPU) | `latency` | 1 |\n", + "| Cosmos3-Super | `torchrun` (multi-GPU) | `throughput` | 4+ |\n", "\n", - "> **GPU required.** Run on a host where \u00a73 (`nvidia-smi`) and \u00a77 (`cuda available: True`) both pass.\n", + "> **GPU required.** Run on a host where §3 (`nvidia-smi`) and §7 (`cuda available: True`) both pass.\n", "\n", - "**Self-contained setup:** everything needed to run this notebook (system packages, clone, Python venv) is in \u00a72\u2013\u00a77 below \u2014 no external bootstrap scripts required.\n", + "**Self-contained setup:** everything needed (system packages, clone, Python venv) is in §2–§7 — no external bootstrap scripts required.\n", "\n", - "Workflow: \u00a72 configure \u2192 \u00a73 GPU check \u2192 \u00a74 system packages \u2192 \u00a75 clone framework \u2192 \u00a76 install \u2192 \u00a77 verify \u2192 \u00a78 review specs \u2192 \u00a79\u2013\u00a713 inference and preview (run only the 
controls you need).\n" - ] + "Workflow: §2 configure → §3 GPU check → §4 system packages → §5 clone framework → §6 install → §7 verify → §8 review specs → §9–§13 Nano inference → §14–§18 Super inference.\n" + ], + "outputs": [], + "execution_count": null }, { "cell_type": "markdown", @@ -48,31 +55,64 @@ "source": [ "## 1. Prerequisites\n", "\n", - "1. Linux with NVIDIA GPU access (`nvidia-smi` visible where you run this notebook).\n", - "2. `git` (\u00a75 clones Cosmos Framework; \u00a74 installs `git-lfs` when `apt-get` is available).\n", - "3. Outbound network for `git clone`, PyPI (`uv sync` in \u00a76), and Hugging Face checkpoints (`HF_TOKEN` or `uvx hf auth login`).\n", - "4. Sample inputs under [`assets/`](./assets) and specs under [`specs/`](./specs) (shipped with this cookbook).\n", - "5. \u00a72 picks `COSMOS3_UV_GROUP` from your CPU/arch: `cu130-train` for CUDA 13 or `aarch64`; `cu128-train` for CUDA 12 on x86_64 (override if your driver does not match).\n", + "1. Linux machine with an NVIDIA GPU.\n", + "2. A [Hugging Face account](https://huggingface.co) with access to the Cosmos3 model repos — paste your token into **§2 below**.\n", + "3. `git` available on PATH (§5 clones Cosmos Framework when missing).\n", + "4. Outbound internet for `git clone`, PyPI (`uv sync` in §6), and HF checkpoint downloads.\n", "\n", - "\u00a74\u2013\u00a76 install system libraries, clone the framework when missing, install [`uv`](https://docs.astral.sh/uv/) if needed, run `uv sync`, and create `packages/cosmos3/.venv`. Caches default to `generator/transfer/.cache/`.\n" - ] + "Everything else — system packages, framework clone, Python venv — is installed automatically by §4–§6.\n" + ], + "outputs": [], + "execution_count": null }, { "cell_type": "markdown", "id": "transfer-config-md", "metadata": {}, "source": [ - "## 2. Configure Paths\n", + "## 2. Configure\n", "\n", - "Defaults assume this `cosmos` checkout with the framework at `packages/cosmos3`. 
Override in the next cell or via the environment variables below.\n", + "**Edit the cell directly below** — it is the only cell you need to change before running the notebook top to bottom.\n", "\n", - "```bash\n", - "export COSMOS3_REPO=/path/to/cosmos-framework # or packages/cosmos3 in this repo\n", - "export COSMOS3_UV_GROUP=cu130-train # cu128-train on x86 + CUDA 12.x; \u00a72 picks aarch64 defaults\n", - "export COSMOS3_CACHE_ROOT=/path/to/cache # optional; else ~/.cache/{uv,huggingface}\n", - "export COSMOS3_TRANSFER_OUTPUT_ROOT=/path/to/outputs\n", - "export CUDA_VISIBLE_DEVICES=0\n", - "```\n" + "| Setting | What it controls |\n", + "|---|---|\n", + "| `HF_TOKEN` | Hugging Face token for downloading model weights |\n", + "| `COSMOS3_CACHE_ROOT` | Path for uv + HF caches (leave `\"\"` to use default under this cookbook) |\n", + "| `COSMOS3_TRANSFER_OUTPUT_ROOT` | Where generated videos are saved (leave `\"\"` for default) |\n" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aabc169f", + "metadata": {}, + "outputs": [], + "source": [ + "# ── Edit here, then run all cells ──────────────────────────────────────────\n", + "\n", + "# Hugging Face token — required to download Cosmos3 weights.\n", + "# Get yours at https://huggingface.co/settings/tokens\n", + "HF_TOKEN = \"\" # e.g. \"hf_xxxxxxxxxxxxxxxxxxxxxxxx\"\n", + "\n", + "# Cache root for uv and Hugging Face downloads.\n", + "# Set to a large disk, e.g. \"/lustre/scratch/cache\". Leave \"\" for default.\n", + "COSMOS3_CACHE_ROOT = \"\"\n", + "\n", + "# Output directory for generated videos. 
Leave \"\" for default.\n", + "COSMOS3_TRANSFER_OUTPUT_ROOT = \"\"\n", + "\n", + "# ── Push to environment (do not edit below this line) ───────────────────────\n", + "import os\n", + "if HF_TOKEN:\n", + " os.environ[\"HF_TOKEN\"] = HF_TOKEN\n", + "if COSMOS3_CACHE_ROOT:\n", + " os.environ[\"COSMOS3_CACHE_ROOT\"] = COSMOS3_CACHE_ROOT\n", + "if COSMOS3_TRANSFER_OUTPUT_ROOT:\n", + " os.environ[\"COSMOS3_TRANSFER_OUTPUT_ROOT\"] = COSMOS3_TRANSFER_OUTPUT_ROOT\n", + "print(f\"cache: {COSMOS3_CACHE_ROOT or '(default)'}\")\n", + "print(f\"HF_TOKEN: {'set' if HF_TOKEN else 'not set — using existing hf login session'}\")\n" ] }, { @@ -82,12 +122,14 @@ "source": [ "## 3. Confirm GPU access\n", "\n", - "Run this **before** install (\u00a76) or inference (\u00a79+). If it fails, fix GPU allocation or driver setup before continuing." - ] + "Run this **before** install (§6) or inference (§9+). If it fails, fix GPU allocation or driver setup before continuing." + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "transfer-config-code", "metadata": { "execution": { @@ -97,24 +139,7 @@ "shell.execute_reply": "2026-06-09T04:27:32.323165Z" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos root: /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos\n", - "transfer cookbook: /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer\n", - "framework: /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/packages/cosmos3\n", - "controls: edge, blur, depth, seg, wsm\n", - "output root: /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/notebooks\n", - "checkpoint: Cosmos3-Nano\n", - "UV_CACHE_DIR: 
/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/.cache/uv\n", - "HF_HOME: /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/.cache/huggingface\n", - "COSMOS3_UV_GROUP: cu130-train\n", - "CUDA_VISIBLE_DEVICES: 0,1,2,3\n" - ] - } - ], + "outputs": [], "source": [ "from pathlib import Path\n", "import json\n", @@ -167,13 +192,32 @@ "COSMOS3_SPECS_DIR = COSMOS3_TRANSFER_ROOT / \"specs\"\n", "TRANSFER_CONTROLS = (\"edge\", \"blur\", \"depth\", \"seg\", \"wsm\")\n", "\n", + "\n", + "def _detect_gpu_count() -> str:\n", + " \"\"\"Count visible GPUs via nvidia-smi; fall back to 4.\"\"\"\n", + " import subprocess\n", + " try:\n", + " out = subprocess.check_output(\n", + " [\"nvidia-smi\", \"--query-gpu=name\", \"--format=csv,noheader\"],\n", + " timeout=10, stderr=subprocess.DEVNULL,\n", + " ).decode()\n", + " count = len([l for l in out.strip().splitlines() if l])\n", + " if count > 0:\n", + " return str(count)\n", + " except Exception:\n", + " pass\n", + " return \"4\"\n", + "\n", + "\n", + "COSMOS3_NUM_GPUS = os.environ.get(\"COSMOS3_NUM_GPUS\") or _detect_gpu_count()\n", + "\n", "os.environ[\"COSMOS_ROOT\"] = str(COSMOS_ROOT)\n", "os.environ[\"COSMOS3_TRANSFER_ROOT\"] = str(COSMOS3_TRANSFER_ROOT)\n", "os.environ[\"COSMOS3_REPO\"] = str(COSMOS3_REPO)\n", "os.environ[\"COSMOS3_GIT_URL\"] = COSMOS3_GIT_URL\n", "os.environ[\"COSMOS3_UV_GROUP\"] = COSMOS3_UV_GROUP\n", "os.environ[\"COSMOS3_TRANSFER_OUTPUT_ROOT\"] = str(COSMOS3_TRANSFER_OUTPUT_ROOT)\n", - "os.environ.setdefault(\"COSMOS3_CHECKPOINT_PATH\", \"Cosmos3-Nano\")\n", + "os.environ[\"COSMOS3_NUM_GPUS\"] = COSMOS3_NUM_GPUS\n", "\n", "\n", "def default_cache_path(name: str) -> str:\n", @@ -187,7 +231,9 @@ "os.environ[\"HF_HOME\"] = os.environ.get(\"COSMOS3_HF_HOME\", default_cache_path(\"huggingface\"))\n", "# NGC PyTorch images: clear bundled libtorch from LD_LIBRARY_PATH 
before inference.\n", "os.environ.pop(\"LD_LIBRARY_PATH\", None)\n", - "os.environ.setdefault(\"CUDA_VISIBLE_DEVICES\", \"0\")\n", + "# Default CUDA_VISIBLE_DEVICES to all detected GPUs so both Nano and Super cells work.\n", + "_all_gpus = \",\".join(str(i) for i in range(int(COSMOS3_NUM_GPUS)))\n", + "os.environ.setdefault(\"CUDA_VISIBLE_DEVICES\", _all_gpus)\n", "os.environ.setdefault(\"COSMOS3_MASTER_ADDR\", \"127.0.0.1\")\n", "os.environ.setdefault(\"COSMOS3_MASTER_PORT\", free_local_port())\n", "\n", @@ -196,16 +242,16 @@ "print(\"framework:\", COSMOS3_REPO)\n", "print(\"controls:\", \", \".join(TRANSFER_CONTROLS))\n", "print(\"output root:\", COSMOS3_TRANSFER_OUTPUT_ROOT)\n", - "print(\"checkpoint:\", os.environ[\"COSMOS3_CHECKPOINT_PATH\"])\n", "print(\"UV_CACHE_DIR:\", os.environ[\"UV_CACHE_DIR\"])\n", "print(\"HF_HOME:\", os.environ[\"HF_HOME\"])\n", "print(\"COSMOS3_UV_GROUP:\", os.environ[\"COSMOS3_UV_GROUP\"])\n", + "print(\"COSMOS3_NUM_GPUS:\", os.environ[\"COSMOS3_NUM_GPUS\"])\n", "print(\"CUDA_VISIBLE_DEVICES:\", os.environ[\"CUDA_VISIBLE_DEVICES\"])\n" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "7e649ded", "metadata": { "execution": { @@ -215,54 +261,14 @@ "shell.execute_reply": "2026-06-09T04:27:32.577151Z" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "hostname: nvl72D130-T03\n", - "CUDA_VISIBLE_DEVICES=0,1,2,3\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "GPU 0: NVIDIA GB200 (UUID: GPU-76f8de40-8e91-e8e0-a3bc-eeb10770c592)\n", - "GPU 1: NVIDIA GB200 (UUID: GPU-" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "02153708-5e2c-816b-e8a2-e82617afaf3e)\n", - "GPU 2: NVIDIA GB200 (UUID: GPU-6be031fc-b525-6441-18f7-854d446" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "73407)\n", - "GPU 3: NVIDIA GB200 (UUID: GPU-3e3f578d-3e18-6667-0b90-8e708132b49b)\n" - ] - }, - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "OK: 4 GPU(s) visible on nvl72D130-T03\n" - ] - } - ], + "outputs": [], "source": [ "%%bash\n", "set -euo pipefail\n", "echo \"hostname: $(hostname)\"\n", "echo \"CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-}\"\n", "if ! command -v nvidia-smi >/dev/null 2>&1; then\n", - " echo \"ERROR: nvidia-smi not found. Run on a GPU host (see \u00a71).\"\n", + " echo \"ERROR: nvidia-smi not found. Run on a GPU host (see §1).\"\n", " exit 1\n", "fi\n", "nvidia-smi -L\n", @@ -281,14 +287,16 @@ "source": [ "## 4. Install system packages (Linux)\n", "\n", - "Framework guardrails and previews need **ffmpeg**, **git-lfs**, and graphics libraries (`libxcb1`, `libgl1`, \u2026). On hosts with `apt-get` (NGC PyTorch container, many training images), run the next cell to install them.\n", + "Framework guardrails and previews need **ffmpeg**, **git-lfs**, and graphics libraries (`libxcb1`, `libgl1`, …). On hosts with `apt-get` (NGC PyTorch container, many training images), run the next cell to install them.\n", "\n", - "If `apt-get` is unavailable, install the same packages with your OS package manager \u2014 see [Cosmos3 cookbooks README \u2014 System packages](../../README.md#system-packages-required-for-framework-guardrails).\n" - ] + "If `apt-get` is unavailable, install the same packages with your OS package manager — see [Cosmos3 cookbooks README — System packages](../../README.md#system-packages-required-for-framework-guardrails).\n" + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "320ddc9d", "metadata": { "execution": { @@ -298,130 +306,7 @@ "shell.execute_reply": "2026-06-09T04:27:36.081698Z" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Installing system packages via apt-get...\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Reading package lists..." 
- ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Building dependency tree..." - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Reading state information..." - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "curl is already the newest version (8.5.0-2ubuntu10.9).\n", - "ffmpeg is already the newest version (7:6.1." - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1-3ubuntu5).\n", - "git-lfs is already the newest version (3.4.1-1ubuntu0.4).\n", - "libgl1 is already the newest " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "version (1.7.0-1build1).\n", - "libglib2.0-0t64 is already the newest version (2.80.0-6ubuntu3.8).\n", - "libx11-d" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ev is already the newest version (2:1.8.7-1build1).\n", - "libxcb1 is already the newest version (1.15-1ubu" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ntu2).\n", - "tree is already the newest version (2.1.1-2ubuntu3.24.04.2).\n", - "wget is already the newest versi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "on (1.21.4-1ubuntu4.1).\n", - "0 upgraded, 0 newly installed, 0 to remove and 117 not upgraded.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "OK: apt packages installed\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Git LFS initialized.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "git-lfs: OK\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "System package check complete.\n" - ] - } - ], + "outputs": [], "source": [ "%%bash\n", "set -euo pipefail\n", @@ -461,11 +346,13 @@ "## 5. 
Clone Cosmos Framework\n", "\n", "Clones `COSMOS3_GIT_URL` into `COSMOS3_REPO` when the tree is not already present." - ] + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "transfer-clone-code", "metadata": { "execution": { @@ -475,54 +362,7 @@ "shell.execute_reply": "2026-06-09T04:27:36.788823Z" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Using existing framework checkout: /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/trungp/repos/cosmos/packages/cosmos3\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "## feature/transfer-control-guidance...fork/feature/transfer-control-guidance\n", - " M uv.lock\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "fork\thttps://github.com/trungtpham/cosmos-framework.git (fetch)\n", - "fork\thttps://github.com/trungtpham/c" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "osmos-framework.git (push)\n", - "origin\thttps://github.com/NVIDIA/cosmos-framework.git (fetch)\n", - "origin\thttp" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "s://github.com/NVIDIA/cosmos-framework.git (push)\n" - ] - } - ], + "outputs": [], "source": [ "%%bash\n", "set -euo pipefail\n", @@ -552,18 +392,20 @@ "source": [ "## 6. Install Cosmos Framework Dependencies\n", "\n", - "Installs [`uv`](https://docs.astral.sh/uv/) if missing, then runs `uv sync` to create `packages/cosmos3/.venv`. Uses `COSMOS3_UV_GROUP` from \u00a72.\n", + "Installs [`uv`](https://docs.astral.sh/uv/) if missing, then runs `uv sync` to create `packages/cosmos3/.venv`. Uses `COSMOS3_UV_GROUP` from §2.\n", "\n", "**Skip:** if `.venv` already imports `cosmos_framework` with CUDA available, the next cell skips `uv sync` (fast re-runs). 
Set `COSMOS3_FORCE_UV_SYNC=1` to force a full re-sync.\n", "\n", - "For `jupyter execute` on a GPU node, set `COSMOS3_UV_CACHE_DIR` / `COSMOS3_HF_HOME` (or `COSMOS3_CACHE_ROOT`) in \u00a72 first.\n", + "For `jupyter execute` on a GPU node, set `COSMOS3_UV_CACHE_DIR` / `COSMOS3_HF_HOME` (or `COSMOS3_CACHE_ROOT`) in §2 first.\n", "\n", "If you change `COSMOS3_UV_GROUP`, **re-run this cell** before inference.\n" - ] + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "transfer-install-code", "metadata": { "execution": { @@ -573,182 +415,7 @@ "shell.execute_reply": "2026-06-09T04:27:46.483864Z" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "uv 0.8.17\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "UV_CACHE_DIR=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/c" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ookbooks/cosmos3/generator/transfer/.cache/uv\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "COSMOS3_UV_GROUP=cu130-train\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "UV_HTTP_TIMEOUT=600\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "venv ready at /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "packages/cosmos3/.venv \u2014 skipping uv sync (set COSMOS3_FORCE_UV_SYNC=1 to re-sync)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\u001b[1m\u001b[33mwarning\u001b[39m\u001b[0m\u001b[1m:\u001b[0m \u001b[1mFailed to parse `\u001b[36mpyproject.toml\u001b[39m` during settings di" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "scovery:\n", - " TOML parse error at line 328, column 10\n", - " |\n", - " 
328 | [tool.uv.audit]\n", - " | " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "^^^^^\n", - " unknown field `audit`, expected one of `required-version`, `native-tls`, `offline`, `no-cach" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "e`, `cache-dir`, `preview`, `python-preference`, `python-downloads`, `concurrent-downloads`, `concur" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "rent-builds`, `concurrent-installs`, `index`, `index-url`, `extra-index-url`, `no-index`, `find-link" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "s`, `index-strategy`, `keyring-provider`, `allow-insecure-host`, `resolution`, `prerelease`, `fork-s" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "trategy`, `dependency-metadata`, `config-settings`, `config-settings-package`, `no-build-isolation`," - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " `no-build-isolation-package`, `extra-build-dependencies`, `extra-build-variables`, `exclude-newer`," - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " `exclude-newer-package`, `link-mode`, `compile-bytecode`, `no-sources`, `upgrade`, `upgrade-package" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "`, `reinstall`, `reinstall-package`, `no-build`, `no-build-package`, `no-binary`, `no-binary-package" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "`, `python-install-mirror`, `pypy-install-mirror`, `python-downloads-json-url`, `publish-url`, `trus" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "ted-publishing`, `check-url`, `add-bounds`, `pip`, `cache-keys`, `override-dependencies`, `constrain" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "t-dependencies`, `build-constraint-dependencies`, `environments`, `required-environments`, `conflict" - ] - }, - { 
- "name": "stderr", - "output_type": "stream", - "text": [ - "s`, `workspace`, `sources`, `managed`, `package`, `default-groups`, `dependency-groups`, `dev-depend" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "encies`, `build-backend`\n", - "\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\u001b[2mAudited \u001b[1m2 packages\u001b[0m \u001b[2min 218ms\u001b[0m\u001b[0m\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "venv OK (skipped sync)\n" - ] - } - ], + "outputs": [], "source": [ "%%bash\n", "set -euo pipefail\n", @@ -766,7 +433,7 @@ "export GIT_LFS_SKIP_SMUDGE=1\n", "mkdir -p \"$UV_CACHE_DIR\" \"$HF_HOME\"\n", "cd \"$COSMOS3_REPO\"\n", - "export UV_CACHE_DIR=\"${UV_CACHE_DIR:?set paths in \u00a72 (run the configure cell first)}\"\n", + "export UV_CACHE_DIR=\"${UV_CACHE_DIR:?set paths in §2 (run the configure cell first)}\"\n", "export UV_PROJECT_ENVIRONMENT=\"${UV_PROJECT_ENVIRONMENT:-$COSMOS3_REPO/.venv}\"\n", "export UV_HTTP_TIMEOUT=\"${UV_HTTP_TIMEOUT:-600}\"\n", "echo \"UV_CACHE_DIR=$UV_CACHE_DIR\"\n", @@ -776,13 +443,13 @@ "if [ -z \"${COSMOS3_FORCE_UV_SYNC:-}\" ] && [ -x \".venv/bin/python\" ]; then\n", " if env -u LD_LIBRARY_PATH .venv/bin/python -c \\\n", " 'import cosmos_framework, torch; assert torch.cuda.is_available()'; then\n", - " echo \"venv ready at $COSMOS3_REPO/.venv \u2014 skipping uv sync (set COSMOS3_FORCE_UV_SYNC=1 to re-sync)\"\n", + " echo \"venv ready at $COSMOS3_REPO/.venv — skipping uv sync (set COSMOS3_FORCE_UV_SYNC=1 to re-sync)\"\n", " uv pip install imageio imageio-ffmpeg\n", " env -u LD_LIBRARY_PATH .venv/bin/python -c \\\n", " 'import cosmos_framework, torch; print(\"venv OK (skipped sync)\")'\n", " exit 0\n", " fi\n", - " echo \"Existing .venv failed CUDA/framework check \u2014 running full uv sync...\"\n", + " echo \"Existing .venv failed CUDA/framework check — running full uv sync...\"\n", "fi\n", "\n", "attempt=1\n", @@ -804,17 
+471,63 @@ "echo \"Install complete: $COSMOS3_REPO/.venv\"\n" ] }, + { + "cell_type": "markdown", + "id": "transfer-hf-auth-md", + "metadata": {}, + "source": [ + "## 6.5. Authenticate with Hugging Face\n", + "\n", + "Cosmos3 weights are downloaded from Hugging Face during the first inference run. This cell uses `HF_TOKEN` from §2 if set, or falls back to an existing `hf login` session.\n" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "execution_count": null, + "id": "transfer-hf-auth-code", + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\n", + "set -euo pipefail\n", + "\n", + "HF_BIN=\"$COSMOS3_REPO/.venv/bin/huggingface-cli\"\n", + "[ -x \"$HF_BIN\" ] || HF_BIN=$(command -v huggingface-cli 2>/dev/null || echo \"\")\n", + "\n", + "if [ -n \"${HF_TOKEN:-}\" ]; then\n", + " echo \"Logging in with HF_TOKEN...\"\n", + " if [ -n \"$HF_BIN\" ]; then\n", + " \"$HF_BIN\" login --token \"$HF_TOKEN\" --add-to-git-credential 2>/dev/null || true\n", + " fi\n", + " echo \"HF_TOKEN: set\"\n", + "else\n", + " echo \"HF_TOKEN not set — checking for existing login session...\"\n", + " if [ -n \"$HF_BIN\" ] && \"$HF_BIN\" whoami >/dev/null 2>&1; then\n", + " echo \"Already logged in as: $(\"$HF_BIN\" whoami 2>/dev/null | head -1)\"\n", + " else\n", + " echo \"WARNING: No HF_TOKEN and no active login session.\"\n", + " echo \"Inference will fail when downloading weights. Fix by either:\"\n", + " echo \" 1. Setting HF_TOKEN in §2 and re-running from the top, or\"\n", + " echo \" 2. Running: uvx hf@latest auth login\"\n", + " fi\n", + "fi\n" + ] + }, { "cell_type": "markdown", "id": "transfer-verify-md", "metadata": {}, "source": [ "## 7. 
Verify GPU Environment\n" - ] + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "transfer-verify-code", "metadata": { "execution": { @@ -824,20 +537,7 @@ "shell.execute_reply": "2026-06-09T04:27:51.151279Z" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "uv group (env): cu130-train\n", - "torch: 2.10.0+cu130\n", - "torch cuda: 13.0\n", - "cuda available: True\n", - "device count: 4\n", - "device 0: NVIDIA GB200\n" - ] - } - ], + "outputs": [], "source": [ "import subprocess\n", "\n", @@ -852,7 +552,7 @@ "if torch.cuda.is_available():\n", " print(\"device 0:\", torch.cuda.get_device_name(0))\n", "else:\n", - " print(\"FIX: set COSMOS3_UV_GROUP in \u00a72 (cu130-train or cu128-train), re-run \u00a76 install, then this cell.\")\n", + " print(\"FIX: set COSMOS3_UV_GROUP in §2 (cu130-train or cu128-train), re-run §6 install, then this cell.\")\n", " sys.exit(1)\n", "'''\n", "result = subprocess.run(\n", @@ -862,7 +562,7 @@ ")\n", "if result.returncode != 0:\n", " raise RuntimeError(\n", - " \"CUDA not available. Pass \u00a73 first, then re-run \u00a76 install with the correct COSMOS3_UV_GROUP.\"\n", + " \"CUDA not available. Pass §3 first, then re-run §6 install with the correct COSMOS3_UV_GROUP.\"\n", " )\n" ] }, @@ -873,18 +573,27 @@ "source": [ "## 8. Input Specs and Preview Helpers\n", "\n", - "Checked-in [`specs/.json`](./specs) use paths relative to `specs/`. 
Previews use [`preview_helpers.py`](./preview_helpers.py) and `imageio-ffmpeg` installed in \u00a76.\n", + "Checked-in [`specs/.json`](./specs) are model-agnostic — the same spec runs with Nano and Super.\n", + "\n", + "Inference writes videos to:\n", + "\n", + "```text\n", + "//transfer_/vision.mp4\n", + "```\n", "\n", - "Inference (\u00a79\u2013\u00a713) writes videos to:\n", + "For example:\n", "\n", "```text\n", - "//transfer_/vision.mp4\n", + "outputs/notebooks/Cosmos3-Nano/transfer_edge/vision.mp4\n", + "outputs/notebooks/Cosmos3-Super/transfer_edge/vision.mp4\n", "```\n" - ] + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "transfer-specs-code", "metadata": { "execution": { @@ -894,15 +603,7 @@ "shell.execute_reply": "2026-06-09T04:27:51.160553Z" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Using specs: edge.json, blur.json, depth.json, seg.json, wsm.json\n" - ] - } - ], + "outputs": [], "source": [ "missing = [c for c in TRANSFER_CONTROLS if not (COSMOS3_SPECS_DIR / f\"{c}.json\").is_file()]\n", "if missing:\n", @@ -912,7 +613,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "id": "transfer-spec-summary", "metadata": { "execution": { @@ -922,19 +623,7 @@ "shell.execute_reply": "2026-06-09T04:27:51.180287Z" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "edge: frames=121 fps=30 guidance=3.0 control_guidance=1.5 control=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/assets/edge/control_edge.mp4\n", - "blur: frames=121 fps=30 guidance=3.0 control_guidance=1.5 control=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/assets/blur/control_blur.mp4\n", - "depth: frames=121 fps=30 guidance=3.0 control_guidance=1.5 
control=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/assets/depth/control_depth.mp4\n", - "seg: frames=121 fps=30 guidance=3.0 control_guidance=2.0 control=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/assets/seg/control_seg.mp4\n", - "wsm: frames=101 fps=10 guidance=1.0 control_guidance=3.0 control=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/assets/wsm/control_wsm.mp4\n" - ] - } - ], + "outputs": [], "source": [ "from preview_helpers import load_transfer_spec, resolve_spec_path\n", "\n", @@ -950,1398 +639,118 @@ }, { "cell_type": "markdown", - "id": "transfer-edge-md", + "id": "edb56a21", "metadata": {}, "source": [ - "## 9. Edge (Canny) Transfer\n", + "## Nano Inference\n", "\n", - "Run after \u00a77 reports `cuda available: True`.\n", + "Run cells §9–§13 for **Cosmos3-Nano** (single GPU, `latency` preset). Outputs go to `/Cosmos3-Nano/transfer_/vision.mp4`.\n" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "b9622547", + "metadata": {}, + "source": [ + "## 9. Nano: Edge (Canny) Transfer\n", "\n", "Precomputed edge control (`control_edge.mp4`) + caption. 
Output:\n", "\n", "```text\n", - "/edge/transfer_edge/vision.mp4\n", + "/Cosmos3-Nano/transfer_edge/vision.mp4\n", "```\n" - ] + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 9, - "id": "1dcfe689", - "metadata": { - "execution": { - "iopub.execute_input": "2026-06-09T04:27:51.181593Z", - "iopub.status.busy": "2026-06-09T04:27:51.181480Z", - "iopub.status.idle": "2026-06-09T04:33:54.293524Z", - "shell.execute_reply": "2026-06-09T04:33:54.293102Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "control=edge spec=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cos" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "mos/cookbooks/cosmos3/generator/transfer/specs/edge.json output=/lustre/fsw/portfolios/cosmos/projec" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ts/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/noteb" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ooks/edge checkpoint=Cosmos3-Nano\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:13|job=|INFO|cosmos_framework/inference/common/init.py:127:_init_log_files] Console log" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " saved to /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cook" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "books/cosmos3/generator/transfer/outputs/notebooks/edge/console.log\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:13|job=|INFO|cosmos_framework/inference/common/init.py:128:_init_log_files] Debug log s" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "aved to 
/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "oks/cosmos3/generator/transfer/outputs/notebooks/edge/debug.log\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:13|job=|INFO|cosmos_framework/scripts/inference.py:46:inference] Loaded 1 samples\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:27|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos-Guardrail1 --repo-type model --revision d6d4bfa899a71454a70090766" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "4f3e88f503950cf --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:33|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos3-Nano --repo-type model --revision main --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:34|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:71:__init__] OmniMoTModel: co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nfig {'tokenizer': {'bucket_name': 'bucket', 'object_store_credential_path_pretrained': 'credentials" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/gcp_training.secret', 'vae_path': 'pretrained/tokenizers/video/wan2pt2/Wan2.2_VAE.pth', 'chunk_dura" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tion': 93, 'keep_decoder_cache': False, 'use_streaming_encode': False, 'encode_chunk_frames': {'256'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": 
68, '480': 24, '720': 12}, 'encode_exact_durations': [17, 61, 73], 'spatial_compression_factor': 1" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "6, 'temporal_compression_factor': 4, 'temporal_window': None, 'encode_bucket_multiple': None, '_targ" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "et_': 'cosmos_framework.model.vfm.tokenizers.wan2pt2_vae_4x16x16.Wan2pt2VAEInterface'}, 'net': None," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'ema': {'enabled': False, 'rate': 0.1, 'iteration_shift': 0, '_type': 'cosmos_framework.configs.bas" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e.defaults.ema.EMAConfig'}, 'parallelism': {'data_parallel_shard_degree': 1, 'data_parallel_replicat" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e_degree': 1, 'context_parallel_shard_degree': 1, 'cfg_parallel_shard_degree': 1, 'enable_inference_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "mode': True, 'fsdp_master_dtype': 'float32', '_type': 'cosmos_framework.configs.base.defaults.parall" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "elism.ParallelismConfig'}, 'compile': {'enabled': True, 'compiled_region': 'all', 'compile_dynamic':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " True, 'use_cuda_graphs': False, 'max_autotune_pointwise': False, 'coordinate_descent_tuning': False" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ", '_type': 'cosmos_framework.configs.base.defaults.compile.CompileConfig'}, 'activation_checkpointin" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "g': {'mode': 'none', 'preserve_rng_state': True, 'determinism_check': 'default', 'save_ops_regex': [" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "'fmha'], '_type': 
'cosmos_framework.configs.base.defaults.activation_checkpointing.ActivationCheckpo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "intingConfig'}, 'precision': 'bfloat16', 'lora_enabled': False, 'lora_rank': 16, 'lora_alpha': 32, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "lora_target_modules': 'q_proj_moe_gen,k_proj_moe_gen,v_proj_moe_gen,o_proj_moe_gen', 'rectified_flow" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "_training_config': {'shift': {'256': 3, '480': 5, '720': 10}, 'use_dynamic_shift': False, 'train_tim" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e_image_distribution': 'logitnormal', 'train_time_video_distribution': 'waver', 'train_time_action_d" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "istribution': 'logitnormal', 'train_time_sound_distribution': 'logitnormal', 'train_time_weight': 'u" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "niform', 'loss_scale': 10.0, 'image_loss_scale': None, 'sound_loss_scale': 2.0, 'use_high_sigma_stra" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tegy': False, 'high_sigma_ratio': 0.05, 'high_sigma_timesteps_min': 995, 'high_sigma_timesteps_max':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 1000, 'use_discrete_rf': False, 'action_loss_weight': 10.0, 'independent_action_schedule': False, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shift_action': None, 'use_high_sigma_strategy_action': False, 'independent_sound_schedule': False, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shift_sound': None, 'use_high_sigma_strategy_sound': False, 'normalize_loss_by_active': False, '_typ" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e': 
'cosmos_framework.configs.base.defaults.model_config.RectifiedFlowTrainingConfig'}, 'rectified_f" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "low_inference_config': {'scheduler_type': 'unipc', 'num_train_timesteps': 1000, 'shift': 1, 'use_dyn" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "amic_shifting': False, '_type': 'cosmos_framework.configs.base.defaults.model_config.RectifiedFlowIn" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ferenceConfig'}, 'fixed_step_sampler_config': None, 'vlm_config': {'model_name': 'nvidia/Cosmos3-Nan" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "o-Reasoner', 'safetensors_path': '', 'pretrained_weights': {'enabled': True, 'backbone_path': 's3://" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "bucket/cosmos3/pretrained/huggingface/Cosmos-Reason/Cosmos3-Nano-Reasoner-bb9c6f5/', 'credentials_pa" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "th': 'credentials/gcp_checkpoint.secret', 'enable_gcs_patch_in_boto3': True, 'checkpoint_format': No" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ne, '_type': 'cosmos_framework.configs.base.defaults.vlm.PretrainedWeightsConfig'}, 'model_instance'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": {'_target_': 'cosmos_framework.model.vfm.mot.unified_mot.Qwen3VLTextForCausalLM', 'config': {'_tar" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "get_': 'cosmos_framework.configs.base.defaults.vlm.create_vlm_config', 'base_config': {'_target_': '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos_framework.model.vfm.mot.unified_mot.Qwen3VLMoTConfig.from_json_file', 'json_file': 'cosmos_fr" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"amework/model/vfm/vlm/qwen3_vl/configs/Qwen3-VL-8B-Instruct.json'}, 'include_visual': True, 'qk_norm" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "_for_text': True}}, 'tokenizer': {'repository': 'nvidia/Cosmos3-Nano', 'revision': 'main', 'subdir':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " '', '_target_': 'cosmos_framework.data.vfm.processors.build_processor_lazy'}, 'layer_module': None," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'qk_norm': False, 'tie_word_embeddings': False, 'use_system_prompt': False, '_type': 'cosmos_framew" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ork.configs.base.defaults.vlm.VLMConfig'}, 'diffusion_expert_config': {'timestep_range': 1.0, 'load_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "weights_from_pretrained': False, 'patch_spatial': 2, 'max_vae_latent_side_after_patchify': 20, 'posi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tion_embedding_type': 'unified_3d_mrope', 'rope_h_extrapolation_ratio': 1.0, 'rope_w_extrapolation_r" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "atio': 1.0, 'rope_t_extrapolation_ratio': 1.0, 'enable_fps_modulation': True, 'base_fps': 24, 'unifi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ed_3d_mrope_reset_spatial_ids': True, 'unified_3d_mrope_temporal_modality_margin': 15000, '_type': '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos_framework.configs.base.defaults.model_config.DiffusionExpertConfig'}, 'input_video_key': 'vid" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "eo', 'input_image_key': 'images', 'input_caption_key': 'ai_caption', 'state_ch': 48, 'state_t': 300," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'latent_downsample_factor': 16, 'resolution': 
'720', 'max_num_tokens_after_packing': 74000, 'joint_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "attn_implementation': 'two_way', 'natten_parameter_list': None, 'video_temporal_causal': False, 'cau" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "sal_training_strategy': 'none', 'lbl': {'method': 'local', 'coeff_und': None, 'coeff_gen': None, '_t" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ype': 'cosmos_framework.configs.base.defaults.model_config.LBLConfig'}, 'vision_gen': True, 'action_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "gen': True, 'max_action_dim': 64, 'num_embodiment_domains': 32, 'sound_gen': True, 'sound_tokenizer'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": {'bucket_name': 'bucket', 'object_store_credential_path_pretrained': 'credentials/gcp_training.sec" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ret', 'avae_path': 'pretrained/tokenizers/audio/avae/avae_48k_noncausal_25hz_64ch.ckpt', 'avae_confi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "g_path': '', 'sample_rate': 48000, 'audio_channels': 2, 'io_channels': 64, 'hop_size': 1920, 'normal" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ize_latents': False, 'normalization_type': 'none', 'tanh_input_scale': 1.5, 'tanh_output_scale': 3.5" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ", 'tanh_clamp': 0.995, 'latent_mean': None, 'latent_std': None, '_target_': 'cosmos_framework.model." 
- ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vfm.tokenizers.audio.avae.AVAEInterface'}, 'sound_dim': 64, 'sound_latent_fps': 25, 'log_enc_time_ev" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ery_n': 100, '_type': 'cosmos_framework.configs.base.defaults.model_config.OmniMoTModelConfig'}\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:34|job=|WARNING|cosmos_framework/model/vfm/omni_mot_model.py:96:set_precision] OmniMoTM" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "odel: precision torch.bfloat16\n", - "[06-08 21:28:34|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ":_hf_download] uvx hf@1.16.4 download --format=json nvidia/Cosmos3-Nano --repo-type model --revision" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " main --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:36|job=|INFO|cosmos_framework/data/vfm/processors/base.py:122:__init__] Successfully lo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "aded processor from local cache\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:36|job=|INFO|cosmos_framework/utils/checkpoint_db.py:320:download] Downloading checkpoi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nt Wan2.2/vae(8e849928a45549bcb83bf5a3dec753cc)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:36|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json Wan-AI/Wan2.2-TI2V-5B --repo-type model --revision 921dbaf3f1674a56f47e83fb80a3" - ] - }, - { - "name": "stdout", - "output_type": "stream", - 
"text": [ - "4bac8a8f203e Wan2.2_VAE.pth\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:39|job=|INFO|cosmos_framework/model/vfm/tokenizers/wan2pt2_vae_4x16x16.py:1015:_video_v" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ae] loading /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "okbooks/cosmos3/generator/transfer/.cache/huggingface/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/9" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "21dbaf3f1674a56f47e83fb80a34bac8a8f203e/Wan2.2_VAE.pth\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:39|job=|INFO|cosmos_framework/utils/checkpoint_db.py:320:download] Downloading checkpoi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nt AVAE(5f5bb062ea3e473c80c53c5944a55d34)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:39|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos3-Nano --repo-type model --revision main --include 'sound_tokenize" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "r/*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:42|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:168:set_up_tokenizers] Sound " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tokenizer initialized: AVAEInterface\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:42|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on OmniMoTModel: set_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"up_tokenizers: 7.53 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:44|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on meta to cuda and b" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "roadcast model states: 1.28 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:44|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on Creating PyTorch m" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "odel and ema if enabled: 2.18 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:28:44|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on OmniMoTModel: set_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "up_model: 2.18 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:29:13|job=|INFO|cosmos_framework/inference/inference.py:1588:_generate_transfer_batch] [RA" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "NK 0] Saved sample args to '/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/notebooks/edge/transfer_edge/sample_args." 
- ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "json'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:29:13|job=|INFO|cosmos_framework/inference/transfer.py:111:load_transfer_control_frames] L" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "oaded pre-computed edge control from /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/use" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "rs/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/specs/../assets/edge/control_edge.mp4\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:29:17|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:2533:generate_samples_from_ba" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tch] Using sampler: UniPC (shift=10.0, num_steps=50)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\r", - "Sampling: 0%| | 0/50 [00:00/Cosmos3-Nano/transfer_blur/vision.mp4\n", + "```\n" + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 10, - "id": "transfer-edge-preview", - "metadata": { - "execution": { - "iopub.execute_input": "2026-06-09T04:33:54.295211Z", - "iopub.status.busy": "2026-06-09T04:33:54.295078Z", - "iopub.status.idle": "2026-06-09T04:33:54.606356Z", - "shell.execute_reply": "2026-06-09T04:33:54.606025Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "edge control: control_edge.mp4 (678 KB -> 570 KB preview)\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "edge generated: vision.mp4 (25660 KB -> 187 KB preview)\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - 
"output_type": "display_data" - } + "execution_count": null, + "id": "000a6336", + "metadata": {}, + "outputs": [], + "source": "%%bash\nset -euo pipefail\nunset LD_LIBRARY_PATH\nCONTROL=blur\nSPEC=\"$COSMOS3_TRANSFER_ROOT/specs/${CONTROL}.json\"\nOUT_DIR=\"$COSMOS3_TRANSFER_OUTPUT_ROOT/Cosmos3-Nano\"\nmkdir -p \"$OUT_DIR\"\necho \"control=$CONTROL model=Cosmos3-Nano spec=$SPEC output=$OUT_DIR\"\ncd \"$COSMOS3_REPO\"\nCUDA_VISIBLE_DEVICES=\"${CUDA_VISIBLE_DEVICES}\" \\\n.venv/bin/python -m cosmos_framework.scripts.inference \\\n --parallelism-preset=latency \\\n -i \"$SPEC\" \\\n -o \"$OUT_DIR\" \\\n --checkpoint-path Cosmos3-Nano \\\n --seed 2026\n" + }, + { + "cell_type": "markdown", + "id": "dc9aa146", + "metadata": {}, + "source": [ + "### Preview Blur (Nano)\n" ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "execution_count": null, + "id": "95cc1806", + "metadata": {}, + "outputs": [], "source": [ + "import os\n", "import sys\n", "from pathlib import Path\n", "\n", @@ -2357,2796 +766,245 @@ "\n", "from preview_helpers import preview_transfer\n", "\n", - "preview_transfer(\"edge\")\n" + "preview_transfer(\"blur\", model=\"Cosmos3-Nano\")\n" ] }, { "cell_type": "markdown", - "id": "transfer-blur-md", + "id": "f1c66c3f", "metadata": {}, "source": [ - "## 10. Blur Transfer\n", + "## 11. Nano: Depth Transfer\n", "\n", - "Blurred-reference control (`control_blur.mp4`) + caption. Output: `.../blur/transfer_blur/vision.mp4`.\n" - ] + "Depth map control (`control_depth.mp4`) + caption. 
Output:\n", + "\n", + "```text\n", + "/Cosmos3-Nano/transfer_depth/vision.mp4\n", + "```\n" + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 11, - "id": "aedc3648", - "metadata": { - "execution": { - "iopub.execute_input": "2026-06-09T04:33:54.607627Z", - "iopub.status.busy": "2026-06-09T04:33:54.607505Z", - "iopub.status.idle": "2026-06-09T04:39:06.831652Z", - "shell.execute_reply": "2026-06-09T04:39:06.831219Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "control=blur spec=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cos" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "mos/cookbooks/cosmos3/generator/transfer/specs/blur.json output=/lustre/fsw/portfolios/cosmos/projec" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ts/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/noteb" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ooks/blur checkpoint=Cosmos3-Nano\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:08|job=|INFO|cosmos_framework/inference/common/init.py:127:_init_log_files] Console log" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " saved to /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cook" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "books/cosmos3/generator/transfer/outputs/notebooks/blur/console.log\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:08|job=|INFO|cosmos_framework/inference/common/init.py:128:_init_log_files] Debug log s" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "aved to /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbo" - ] - }, - { - "name": 
"stdout", - "output_type": "stream", - "text": [ - "oks/cosmos3/generator/transfer/outputs/notebooks/blur/debug.log\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:08|job=|INFO|cosmos_framework/scripts/inference.py:46:inference] Loaded 1 samples\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:12|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos-Guardrail1 --repo-type model --revision d6d4bfa899a71454a70090766" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "4f3e88f503950cf --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:16|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos3-Nano --repo-type model --revision main --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:17|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:71:__init__] OmniMoTModel: co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nfig {'tokenizer': {'bucket_name': 'bucket', 'object_store_credential_path_pretrained': 'credentials" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/gcp_training.secret', 'vae_path': 'pretrained/tokenizers/video/wan2pt2/Wan2.2_VAE.pth', 'chunk_dura" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tion': 93, 'keep_decoder_cache': False, 'use_streaming_encode': False, 'encode_chunk_frames': {'256'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": 68, '480': 24, '720': 12}, 'encode_exact_durations': [17, 61, 73], 'spatial_compression_factor': 1" - ] - }, - { - 
"name": "stdout", - "output_type": "stream", - "text": [ - "6, 'temporal_compression_factor': 4, 'temporal_window': None, 'encode_bucket_multiple': None, '_targ" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "et_': 'cosmos_framework.model.vfm.tokenizers.wan2pt2_vae_4x16x16.Wan2pt2VAEInterface'}, 'net': None," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'ema': {'enabled': False, 'rate': 0.1, 'iteration_shift': 0, '_type': 'cosmos_framework.configs.bas" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e.defaults.ema.EMAConfig'}, 'parallelism': {'data_parallel_shard_degree': 1, 'data_parallel_replicat" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e_degree': 1, 'context_parallel_shard_degree': 1, 'cfg_parallel_shard_degree': 1, 'enable_inference_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "mode': True, 'fsdp_master_dtype': 'float32', '_type': 'cosmos_framework.configs.base.defaults.parall" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "elism.ParallelismConfig'}, 'compile': {'enabled': True, 'compiled_region': 'all', 'compile_dynamic':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " True, 'use_cuda_graphs': False, 'max_autotune_pointwise': False, 'coordinate_descent_tuning': False" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ", '_type': 'cosmos_framework.configs.base.defaults.compile.CompileConfig'}, 'activation_checkpointin" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "g': {'mode': 'none', 'preserve_rng_state': True, 'determinism_check': 'default', 'save_ops_regex': [" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "'fmha'], '_type': 'cosmos_framework.configs.base.defaults.activation_checkpointing.ActivationCheckpo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - 
"text": [ - "intingConfig'}, 'precision': 'bfloat16', 'lora_enabled': False, 'lora_rank': 16, 'lora_alpha': 32, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "lora_target_modules': 'q_proj_moe_gen,k_proj_moe_gen,v_proj_moe_gen,o_proj_moe_gen', 'rectified_flow" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "_training_config': {'shift': {'256': 3, '480': 5, '720': 10}, 'use_dynamic_shift': False, 'train_tim" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e_image_distribution': 'logitnormal', 'train_time_video_distribution': 'waver', 'train_time_action_d" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "istribution': 'logitnormal', 'train_time_sound_distribution': 'logitnormal', 'train_time_weight': 'u" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "niform', 'loss_scale': 10.0, 'image_loss_scale': None, 'sound_loss_scale': 2.0, 'use_high_sigma_stra" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tegy': False, 'high_sigma_ratio': 0.05, 'high_sigma_timesteps_min': 995, 'high_sigma_timesteps_max':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 1000, 'use_discrete_rf': False, 'action_loss_weight': 10.0, 'independent_action_schedule': False, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shift_action': None, 'use_high_sigma_strategy_action': False, 'independent_sound_schedule': False, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shift_sound': None, 'use_high_sigma_strategy_sound': False, 'normalize_loss_by_active': False, '_typ" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e': 'cosmos_framework.configs.base.defaults.model_config.RectifiedFlowTrainingConfig'}, 'rectified_f" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "low_inference_config': 
{'scheduler_type': 'unipc', 'num_train_timesteps': 1000, 'shift': 1, 'use_dyn" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "amic_shifting': False, '_type': 'cosmos_framework.configs.base.defaults.model_config.RectifiedFlowIn" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ferenceConfig'}, 'fixed_step_sampler_config': None, 'vlm_config': {'model_name': 'nvidia/Cosmos3-Nan" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "o-Reasoner', 'safetensors_path': '', 'pretrained_weights': {'enabled': True, 'backbone_path': 's3://" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "bucket/cosmos3/pretrained/huggingface/Cosmos-Reason/Cosmos3-Nano-Reasoner-bb9c6f5/', 'credentials_pa" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "th': 'credentials/gcp_checkpoint.secret', 'enable_gcs_patch_in_boto3': True, 'checkpoint_format': No" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ne, '_type': 'cosmos_framework.configs.base.defaults.vlm.PretrainedWeightsConfig'}, 'model_instance'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": {'_target_': 'cosmos_framework.model.vfm.mot.unified_mot.Qwen3VLTextForCausalLM', 'config': {'_tar" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "get_': 'cosmos_framework.configs.base.defaults.vlm.create_vlm_config', 'base_config': {'_target_': '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos_framework.model.vfm.mot.unified_mot.Qwen3VLMoTConfig.from_json_file', 'json_file': 'cosmos_fr" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "amework/model/vfm/vlm/qwen3_vl/configs/Qwen3-VL-8B-Instruct.json'}, 'include_visual': True, 'qk_norm" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "_for_text': True}}, 'tokenizer': {'repository': 'nvidia/Cosmos3-Nano', 
'revision': 'main', 'subdir':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " '', '_target_': 'cosmos_framework.data.vfm.processors.build_processor_lazy'}, 'layer_module': None," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'qk_norm': False, 'tie_word_embeddings': False, 'use_system_prompt': False, '_type': 'cosmos_framew" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ork.configs.base.defaults.vlm.VLMConfig'}, 'diffusion_expert_config': {'timestep_range': 1.0, 'load_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "weights_from_pretrained': False, 'patch_spatial': 2, 'max_vae_latent_side_after_patchify': 20, 'posi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tion_embedding_type': 'unified_3d_mrope', 'rope_h_extrapolation_ratio': 1.0, 'rope_w_extrapolation_r" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "atio': 1.0, 'rope_t_extrapolation_ratio': 1.0, 'enable_fps_modulation': True, 'base_fps': 24, 'unifi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ed_3d_mrope_reset_spatial_ids': True, 'unified_3d_mrope_temporal_modality_margin': 15000, '_type': '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos_framework.configs.base.defaults.model_config.DiffusionExpertConfig'}, 'input_video_key': 'vid" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "eo', 'input_image_key': 'images', 'input_caption_key': 'ai_caption', 'state_ch': 48, 'state_t': 300," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'latent_downsample_factor': 16, 'resolution': '720', 'max_num_tokens_after_packing': 74000, 'joint_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "attn_implementation': 'two_way', 'natten_parameter_list': None, 'video_temporal_causal': False, 'cau" - ] - }, - { - 
"name": "stdout", - "output_type": "stream", - "text": [ - "sal_training_strategy': 'none', 'lbl': {'method': 'local', 'coeff_und': None, 'coeff_gen': None, '_t" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ype': 'cosmos_framework.configs.base.defaults.model_config.LBLConfig'}, 'vision_gen': True, 'action_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "gen': True, 'max_action_dim': 64, 'num_embodiment_domains': 32, 'sound_gen': True, 'sound_tokenizer'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": {'bucket_name': 'bucket', 'object_store_credential_path_pretrained': 'credentials/gcp_training.sec" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ret', 'avae_path': 'pretrained/tokenizers/audio/avae/avae_48k_noncausal_25hz_64ch.ckpt', 'avae_confi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "g_path': '', 'sample_rate': 48000, 'audio_channels': 2, 'io_channels': 64, 'hop_size': 1920, 'normal" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ize_latents': False, 'normalization_type': 'none', 'tanh_input_scale': 1.5, 'tanh_output_scale': 3.5" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ", 'tanh_clamp': 0.995, 'latent_mean': None, 'latent_std': None, '_target_': 'cosmos_framework.model." 
- ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vfm.tokenizers.audio.avae.AVAEInterface'}, 'sound_dim': 64, 'sound_latent_fps': 25, 'log_enc_time_ev" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ery_n': 100, '_type': 'cosmos_framework.configs.base.defaults.model_config.OmniMoTModelConfig'}\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:17|job=|WARNING|cosmos_framework/model/vfm/omni_mot_model.py:96:set_precision] OmniMoTM" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "odel: precision torch.bfloat16\n", - "[06-08 21:34:17|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ":_hf_download] uvx hf@1.16.4 download --format=json nvidia/Cosmos3-Nano --repo-type model --revision" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " main --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:19|job=|INFO|cosmos_framework/data/vfm/processors/base.py:122:__init__] Successfully lo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "aded processor from local cache\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:19|job=|INFO|cosmos_framework/utils/checkpoint_db.py:320:download] Downloading checkpoi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nt Wan2.2/vae(fc102efeb13c462b97a1747aacad26c7)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:19|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json Wan-AI/Wan2.2-TI2V-5B --repo-type model --revision 921dbaf3f1674a56f47e83fb80a3" - ] - }, - { - "name": "stdout", - "output_type": "stream", - 
"text": [ - "4bac8a8f203e Wan2.2_VAE.pth\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:20|job=|INFO|cosmos_framework/model/vfm/tokenizers/wan2pt2_vae_4x16x16.py:1015:_video_v" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ae] loading /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "okbooks/cosmos3/generator/transfer/.cache/huggingface/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/9" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "21dbaf3f1674a56f47e83fb80a34bac8a8f203e/Wan2.2_VAE.pth\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:20|job=|INFO|cosmos_framework/utils/checkpoint_db.py:320:download] Downloading checkpoi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nt AVAE(e78076483acc4f1db8ded53aabbbb9f7)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:20|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos3-Nano --repo-type model --revision main --include 'sound_tokenize" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "r/*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:23|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:168:set_up_tokenizers] Sound " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tokenizer initialized: AVAEInterface\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:23|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on OmniMoTModel: set_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"up_tokenizers: 5.58 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:25|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on meta to cuda and b" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "roadcast model states: 1.39 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:25|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on Creating PyTorch m" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "odel and ema if enabled: 2.31 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:25|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on OmniMoTModel: set_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "up_model: 2.31 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:29|job=|INFO|cosmos_framework/inference/inference.py:1588:_generate_transfer_batch] [RA" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "NK 0] Saved sample args to '/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/notebooks/blur/transfer_blur/sample_args." 
- ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "json'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:30|job=|INFO|cosmos_framework/inference/transfer.py:111:load_transfer_control_frames] L" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "oaded pre-computed blur control from /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/use" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "rs/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/specs/../assets/blur/control_blur.mp4\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:34:33|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:2533:generate_samples_from_ba" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tch] Using sampler: UniPC (shift=10.0, num_steps=50)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\r", - "Sampling: 0%| | 0/50 [00:00/Cosmos3-Nano/transfer_seg/vision.mp4\n", + "```\n" + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 12, - "id": "d3899c5b", - "metadata": { - "execution": { - "iopub.execute_input": "2026-06-09T04:39:06.833397Z", - "iopub.status.busy": "2026-06-09T04:39:06.833263Z", - "iopub.status.idle": "2026-06-09T04:39:07.135464Z", - "shell.execute_reply": "2026-06-09T04:39:07.135118Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "blur control: control_blur.mp4 (399 KB -> 180 KB preview)\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "blur generated: vision.mp4 (27315 KB -> 220 KB preview)\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": 
"display_data" - } + "execution_count": null, + "id": "a1185f0c", + "metadata": {}, + "outputs": [], + "source": "%%bash\nset -euo pipefail\nunset LD_LIBRARY_PATH\nCONTROL=seg\nSPEC=\"$COSMOS3_TRANSFER_ROOT/specs/${CONTROL}.json\"\nOUT_DIR=\"$COSMOS3_TRANSFER_OUTPUT_ROOT/Cosmos3-Nano\"\nmkdir -p \"$OUT_DIR\"\necho \"control=$CONTROL model=Cosmos3-Nano spec=$SPEC output=$OUT_DIR\"\ncd \"$COSMOS3_REPO\"\nCUDA_VISIBLE_DEVICES=\"${CUDA_VISIBLE_DEVICES}\" \\\n.venv/bin/python -m cosmos_framework.scripts.inference \\\n --parallelism-preset=latency \\\n -i \"$SPEC\" \\\n -o \"$OUT_DIR\" \\\n --checkpoint-path Cosmos3-Nano \\\n --seed 2026\n" + }, + { + "cell_type": "markdown", + "id": "dbdf451b", + "metadata": {}, + "source": [ + "### Preview Segmentation (Nano)\n" ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f17565f", + "metadata": {}, + "outputs": [], "source": [ + "import os\n", + "import sys\n", + "from pathlib import Path\n", + "\n", + "_root = Path.cwd()\n", + "if not (_root / \"preview_helpers.py\").is_file():\n", + " for p in [_root, *_root.parents]:\n", + " cand = p / \"cookbooks\" / \"cosmos3\" / \"generator\" / \"transfer\"\n", + " if (cand / \"preview_helpers.py\").is_file():\n", + " _root = cand\n", + " break\n", + "if str(_root) not in sys.path:\n", + " sys.path.insert(0, str(_root))\n", + "\n", "from preview_helpers import preview_transfer\n", "\n", - "preview_transfer(\"blur\")\n" + "preview_transfer(\"seg\", model=\"Cosmos3-Nano\")\n" ] }, { "cell_type": "markdown", - "id": "transfer-depth-md", + "id": "866eeb89", "metadata": {}, "source": [ - "## 11. Depth Transfer\n", + "## 13. Nano: WSM Transfer\n", "\n", - "Depth-map control (`control_depth.mp4`) + caption. Output: `.../depth/transfer_depth/vision.mp4`.\n" - ] + "World-scenario map control (`control_wsm.mp4`) + caption. 
Output:\n", + "\n", + "```text\n", + "/Cosmos3-Nano/transfer_wsm/vision.mp4\n", + "```\n" + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 13, - "id": "2ea745d4", - "metadata": { - "execution": { - "iopub.execute_input": "2026-06-09T04:39:07.136809Z", - "iopub.status.busy": "2026-06-09T04:39:07.136661Z", - "iopub.status.idle": "2026-06-09T04:47:18.715780Z", - "shell.execute_reply": "2026-06-09T04:47:18.715346Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "control=depth spec=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "smos/cookbooks/cosmos3/generator/transfer/specs/depth.json output=/lustre/fsw/portfolios/cosmos/proj" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ects/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/not" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ebooks/depth checkpoint=Cosmos3-Nano\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:00|job=|INFO|cosmos_framework/inference/common/init.py:127:_init_log_files] Console log" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " saved to /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cook" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "books/cosmos3/generator/transfer/outputs/notebooks/depth/console.log\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:00|job=|INFO|cosmos_framework/inference/common/init.py:128:_init_log_files] Debug log s" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "aved to /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbo" - ] - }, - { - "name": 
"stdout", - "output_type": "stream", - "text": [ - "oks/cosmos3/generator/transfer/outputs/notebooks/depth/debug.log\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:00|job=|INFO|cosmos_framework/scripts/inference.py:46:inference] Loaded 1 samples\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:06|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos-Guardrail1 --repo-type model --revision d6d4bfa899a71454a70090766" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "4f3e88f503950cf --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:12|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos3-Nano --repo-type model --revision main --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:13|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:71:__init__] OmniMoTModel: co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nfig {'tokenizer': {'bucket_name': 'bucket', 'object_store_credential_path_pretrained': 'credentials" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/gcp_training.secret', 'vae_path': 'pretrained/tokenizers/video/wan2pt2/Wan2.2_VAE.pth', 'chunk_dura" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tion': 93, 'keep_decoder_cache': False, 'use_streaming_encode': False, 'encode_chunk_frames': {'256'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": 68, '480': 24, '720': 12}, 'encode_exact_durations': [17, 61, 73], 'spatial_compression_factor': 1" - ] - }, - { - 
"name": "stdout", - "output_type": "stream", - "text": [ - "6, 'temporal_compression_factor': 4, 'temporal_window': None, 'encode_bucket_multiple': None, '_targ" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "et_': 'cosmos_framework.model.vfm.tokenizers.wan2pt2_vae_4x16x16.Wan2pt2VAEInterface'}, 'net': None," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'ema': {'enabled': False, 'rate': 0.1, 'iteration_shift': 0, '_type': 'cosmos_framework.configs.bas" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e.defaults.ema.EMAConfig'}, 'parallelism': {'data_parallel_shard_degree': 1, 'data_parallel_replicat" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e_degree': 1, 'context_parallel_shard_degree': 1, 'cfg_parallel_shard_degree': 1, 'enable_inference_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "mode': True, 'fsdp_master_dtype': 'float32', '_type': 'cosmos_framework.configs.base.defaults.parall" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "elism.ParallelismConfig'}, 'compile': {'enabled': True, 'compiled_region': 'all', 'compile_dynamic':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " True, 'use_cuda_graphs': False, 'max_autotune_pointwise': False, 'coordinate_descent_tuning': False" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ", '_type': 'cosmos_framework.configs.base.defaults.compile.CompileConfig'}, 'activation_checkpointin" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "g': {'mode': 'none', 'preserve_rng_state': True, 'determinism_check': 'default', 'save_ops_regex': [" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "'fmha'], '_type': 'cosmos_framework.configs.base.defaults.activation_checkpointing.ActivationCheckpo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - 
"text": [ - "intingConfig'}, 'precision': 'bfloat16', 'lora_enabled': False, 'lora_rank': 16, 'lora_alpha': 32, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "lora_target_modules': 'q_proj_moe_gen,k_proj_moe_gen,v_proj_moe_gen,o_proj_moe_gen', 'rectified_flow" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "_training_config': {'shift': {'256': 3, '480': 5, '720': 10}, 'use_dynamic_shift': False, 'train_tim" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e_image_distribution': 'logitnormal', 'train_time_video_distribution': 'waver', 'train_time_action_d" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "istribution': 'logitnormal', 'train_time_sound_distribution': 'logitnormal', 'train_time_weight': 'u" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "niform', 'loss_scale': 10.0, 'image_loss_scale': None, 'sound_loss_scale': 2.0, 'use_high_sigma_stra" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tegy': False, 'high_sigma_ratio': 0.05, 'high_sigma_timesteps_min': 995, 'high_sigma_timesteps_max':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 1000, 'use_discrete_rf': False, 'action_loss_weight': 10.0, 'independent_action_schedule': False, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shift_action': None, 'use_high_sigma_strategy_action': False, 'independent_sound_schedule': False, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shift_sound': None, 'use_high_sigma_strategy_sound': False, 'normalize_loss_by_active': False, '_typ" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e': 'cosmos_framework.configs.base.defaults.model_config.RectifiedFlowTrainingConfig'}, 'rectified_f" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "low_inference_config': 
{'scheduler_type': 'unipc', 'num_train_timesteps': 1000, 'shift': 1, 'use_dyn" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "amic_shifting': False, '_type': 'cosmos_framework.configs.base.defaults.model_config.RectifiedFlowIn" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ferenceConfig'}, 'fixed_step_sampler_config': None, 'vlm_config': {'model_name': 'nvidia/Cosmos3-Nan" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "o-Reasoner', 'safetensors_path': '', 'pretrained_weights': {'enabled': True, 'backbone_path': 's3://" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "bucket/cosmos3/pretrained/huggingface/Cosmos-Reason/Cosmos3-Nano-Reasoner-bb9c6f5/', 'credentials_pa" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "th': 'credentials/gcp_checkpoint.secret', 'enable_gcs_patch_in_boto3': True, 'checkpoint_format': No" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ne, '_type': 'cosmos_framework.configs.base.defaults.vlm.PretrainedWeightsConfig'}, 'model_instance'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": {'_target_': 'cosmos_framework.model.vfm.mot.unified_mot.Qwen3VLTextForCausalLM', 'config': {'_tar" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "get_': 'cosmos_framework.configs.base.defaults.vlm.create_vlm_config', 'base_config': {'_target_': '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos_framework.model.vfm.mot.unified_mot.Qwen3VLMoTConfig.from_json_file', 'json_file': 'cosmos_fr" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "amework/model/vfm/vlm/qwen3_vl/configs/Qwen3-VL-8B-Instruct.json'}, 'include_visual': True, 'qk_norm" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "_for_text': True}}, 'tokenizer': {'repository': 'nvidia/Cosmos3-Nano', 
'revision': 'main', 'subdir':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " '', '_target_': 'cosmos_framework.data.vfm.processors.build_processor_lazy'}, 'layer_module': None," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'qk_norm': False, 'tie_word_embeddings': False, 'use_system_prompt': False, '_type': 'cosmos_framew" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ork.configs.base.defaults.vlm.VLMConfig'}, 'diffusion_expert_config': {'timestep_range': 1.0, 'load_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "weights_from_pretrained': False, 'patch_spatial': 2, 'max_vae_latent_side_after_patchify': 20, 'posi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tion_embedding_type': 'unified_3d_mrope', 'rope_h_extrapolation_ratio': 1.0, 'rope_w_extrapolation_r" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "atio': 1.0, 'rope_t_extrapolation_ratio': 1.0, 'enable_fps_modulation': True, 'base_fps': 24, 'unifi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ed_3d_mrope_reset_spatial_ids': True, 'unified_3d_mrope_temporal_modality_margin': 15000, '_type': '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos_framework.configs.base.defaults.model_config.DiffusionExpertConfig'}, 'input_video_key': 'vid" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "eo', 'input_image_key': 'images', 'input_caption_key': 'ai_caption', 'state_ch': 48, 'state_t': 300," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'latent_downsample_factor': 16, 'resolution': '720', 'max_num_tokens_after_packing': 74000, 'joint_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "attn_implementation': 'two_way', 'natten_parameter_list': None, 'video_temporal_causal': False, 'cau" - ] - }, - { - 
"name": "stdout", - "output_type": "stream", - "text": [ - "sal_training_strategy': 'none', 'lbl': {'method': 'local', 'coeff_und': None, 'coeff_gen': None, '_t" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ype': 'cosmos_framework.configs.base.defaults.model_config.LBLConfig'}, 'vision_gen': True, 'action_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "gen': True, 'max_action_dim': 64, 'num_embodiment_domains': 32, 'sound_gen': True, 'sound_tokenizer'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": {'bucket_name': 'bucket', 'object_store_credential_path_pretrained': 'credentials/gcp_training.sec" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ret', 'avae_path': 'pretrained/tokenizers/audio/avae/avae_48k_noncausal_25hz_64ch.ckpt', 'avae_confi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "g_path': '', 'sample_rate': 48000, 'audio_channels': 2, 'io_channels': 64, 'hop_size': 1920, 'normal" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ize_latents': False, 'normalization_type': 'none', 'tanh_input_scale': 1.5, 'tanh_output_scale': 3.5" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ", 'tanh_clamp': 0.995, 'latent_mean': None, 'latent_std': None, '_target_': 'cosmos_framework.model." 
- ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vfm.tokenizers.audio.avae.AVAEInterface'}, 'sound_dim': 64, 'sound_latent_fps': 25, 'log_enc_time_ev" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ery_n': 100, '_type': 'cosmos_framework.configs.base.defaults.model_config.OmniMoTModelConfig'}\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:13|job=|WARNING|cosmos_framework/model/vfm/omni_mot_model.py:96:set_precision] OmniMoTM" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "odel: precision torch.bfloat16\n", - "[06-08 21:42:13|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ":_hf_download] uvx hf@1.16.4 download --format=json nvidia/Cosmos3-Nano --repo-type model --revision" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " main --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:15|job=|INFO|cosmos_framework/data/vfm/processors/base.py:122:__init__] Successfully lo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "aded processor from local cache\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:15|job=|INFO|cosmos_framework/utils/checkpoint_db.py:320:download] Downloading checkpoi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nt Wan2.2/vae(30234714af5443c6aaf34529aa011928)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:15|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json Wan-AI/Wan2.2-TI2V-5B --repo-type model --revision 921dbaf3f1674a56f47e83fb80a3" - ] - }, - { - "name": "stdout", - "output_type": "stream", - 
"text": [ - "4bac8a8f203e Wan2.2_VAE.pth\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:18|job=|INFO|cosmos_framework/model/vfm/tokenizers/wan2pt2_vae_4x16x16.py:1015:_video_v" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ae] loading /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "okbooks/cosmos3/generator/transfer/.cache/huggingface/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/9" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "21dbaf3f1674a56f47e83fb80a34bac8a8f203e/Wan2.2_VAE.pth\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:18|job=|INFO|cosmos_framework/utils/checkpoint_db.py:320:download] Downloading checkpoi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nt AVAE(6396625cf77342e8a62a07db1da2ad21)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:18|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos3-Nano --repo-type model --revision main --include 'sound_tokenize" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "r/*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:21|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:168:set_up_tokenizers] Sound " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tokenizer initialized: AVAEInterface\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:21|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on OmniMoTModel: set_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"up_tokenizers: 7.36 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:23|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on meta to cuda and b" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "roadcast model states: 1.34 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:23|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on Creating PyTorch m" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "odel and ema if enabled: 2.26 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:23|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on OmniMoTModel: set_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "up_model: 2.26 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:35|job=|INFO|cosmos_framework/inference/inference.py:1588:_generate_transfer_batch] [RA" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "NK 0] Saved sample args to '/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/notebooks/depth/transfer_depth/sample_arg" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "s.json'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:36|job=|INFO|cosmos_framework/inference/transfer.py:111:load_transfer_control_frames] L" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "oaded pre-computed depth control from /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/us" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"ers/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/specs/../assets/depth/control_depth.mp4" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:42:39|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:2533:generate_samples_from_ba" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tch] Using sampler: UniPC (shift=10.0, num_steps=50)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\r", - "Sampling: 0%| | 0/50 [00:00/Cosmos3-Super/transfer_/vision.mp4`.\n" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "d0b5887b", + "metadata": {}, + "source": [ + "## 14. Super: Edge (Canny) Transfer\n", + "\n", + "Precomputed edge control (`control_edge.mp4`) + caption. Output:\n", + "\n", + "```text\n", + "/Cosmos3-Super/transfer_edge/vision.mp4\n", + "```\n" + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 14, - "id": "05775489", - "metadata": { - "execution": { - "iopub.execute_input": "2026-06-09T04:47:18.717459Z", - "iopub.status.busy": "2026-06-09T04:47:18.717327Z", - "iopub.status.idle": "2026-06-09T04:47:19.161650Z", - "shell.execute_reply": "2026-06-09T04:47:19.161308Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "depth control: control_depth.mp4 (1027 KB -> 236 KB preview)\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "depth generated: vision.mp4 (47734 KB -> 1178 KB preview)\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } + "execution_count": null, + "id": "4c7cf243", + "metadata": {}, + "outputs": [], + 
"source": [ + "%%bash\nset -euo pipefail\nunset LD_LIBRARY_PATH\nCONTROL=edge\nSPEC=\"$COSMOS3_TRANSFER_ROOT/specs/${CONTROL}.json\"\nOUT_DIR=\"$COSMOS3_TRANSFER_OUTPUT_ROOT/Cosmos3-Super\"\nmkdir -p \"$OUT_DIR\"\necho \"control=$CONTROL model=Cosmos3-Super spec=$SPEC output=$OUT_DIR\"\ncd \"$COSMOS3_REPO\"\nCUDA_VISIBLE_DEVICES=\"${CUDA_VISIBLE_DEVICES}\" \\\n.venv/bin/torchrun \\\n --nproc-per-node=\"${COSMOS3_NUM_GPUS}\" \\\n --master-addr=\"${COSMOS3_MASTER_ADDR}\" \\\n --master-port=\"${COSMOS3_MASTER_PORT}\" \\\n -m cosmos_framework.scripts.inference \\\n --parallelism-preset=latency \\\n -i \"$SPEC\" \\\n -o \"$OUT_DIR\" \\\n --checkpoint-path Cosmos3-Super \\\n --seed 2026\n" + ] + }, + { + "cell_type": "markdown", + "id": "65054cf3", + "metadata": {}, + "source": [ + "### Preview Edge (Canny) (Super)\n" ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cc85a424", + "metadata": {}, + "outputs": [], "source": [ + "import os\n", "import sys\n", "from pathlib import Path\n", "\n", @@ -5162,1396 +1020,53 @@ "\n", "from preview_helpers import preview_transfer\n", "\n", - "preview_transfer(\"depth\")\n" + "preview_transfer(\"edge\", model=\"Cosmos3-Super\")\n" ] }, { "cell_type": "markdown", - "id": "transfer-seg-md", + "id": "880ada8f", "metadata": {}, "source": [ - "## 12. Segmentation Transfer\n", + "## 15. Super: Blur Transfer\n", "\n", - "Segmentation-map control (`control_seg.mp4`) + caption. Output: `.../seg/transfer_seg/vision.mp4`.\n" - ] + "Blurred-reference control (`control_blur.mp4`) + caption. 
Output:\n", + "\n", + "```text\n", + "/Cosmos3-Super/transfer_blur/vision.mp4\n", + "```\n" + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 15, - "id": "4f82a330", - "metadata": { - "execution": { - "iopub.execute_input": "2026-06-09T04:47:19.164637Z", - "iopub.status.busy": "2026-06-09T04:47:19.164512Z", - "iopub.status.idle": "2026-06-09T04:52:51.843155Z", - "shell.execute_reply": "2026-06-09T04:52:51.842702Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "control=seg spec=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosm" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "os/cookbooks/cosmos3/generator/transfer/specs/seg.json output=/lustre/fsw/portfolios/cosmos/projects" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/noteboo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ks/seg checkpoint=Cosmos3-Nano\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:33|job=|INFO|cosmos_framework/inference/common/init.py:127:_init_log_files] Console log" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " saved to /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cook" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "books/cosmos3/generator/transfer/outputs/notebooks/seg/console.log\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:33|job=|INFO|cosmos_framework/inference/common/init.py:128:_init_log_files] Debug log s" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "aved to /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbo" - ] - }, - { - "name": 
"stdout", - "output_type": "stream", - "text": [ - "oks/cosmos3/generator/transfer/outputs/notebooks/seg/debug.log\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:33|job=|INFO|cosmos_framework/scripts/inference.py:46:inference] Loaded 1 samples\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:37|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos-Guardrail1 --repo-type model --revision d6d4bfa899a71454a70090766" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "4f3e88f503950cf --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:41|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos3-Nano --repo-type model --revision main --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:42|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:71:__init__] OmniMoTModel: co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nfig {'tokenizer': {'bucket_name': 'bucket', 'object_store_credential_path_pretrained': 'credentials" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/gcp_training.secret', 'vae_path': 'pretrained/tokenizers/video/wan2pt2/Wan2.2_VAE.pth', 'chunk_dura" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tion': 93, 'keep_decoder_cache': False, 'use_streaming_encode': False, 'encode_chunk_frames': {'256'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": 68, '480': 24, '720': 12}, 'encode_exact_durations': [17, 61, 73], 'spatial_compression_factor': 1" - ] - }, - { - 
"name": "stdout", - "output_type": "stream", - "text": [ - "6, 'temporal_compression_factor': 4, 'temporal_window': None, 'encode_bucket_multiple': None, '_targ" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "et_': 'cosmos_framework.model.vfm.tokenizers.wan2pt2_vae_4x16x16.Wan2pt2VAEInterface'}, 'net': None," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'ema': {'enabled': False, 'rate': 0.1, 'iteration_shift': 0, '_type': 'cosmos_framework.configs.bas" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e.defaults.ema.EMAConfig'}, 'parallelism': {'data_parallel_shard_degree': 1, 'data_parallel_replicat" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e_degree': 1, 'context_parallel_shard_degree': 1, 'cfg_parallel_shard_degree': 1, 'enable_inference_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "mode': True, 'fsdp_master_dtype': 'float32', '_type': 'cosmos_framework.configs.base.defaults.parall" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "elism.ParallelismConfig'}, 'compile': {'enabled': True, 'compiled_region': 'all', 'compile_dynamic':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " True, 'use_cuda_graphs': False, 'max_autotune_pointwise': False, 'coordinate_descent_tuning': False" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ", '_type': 'cosmos_framework.configs.base.defaults.compile.CompileConfig'}, 'activation_checkpointin" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "g': {'mode': 'none', 'preserve_rng_state': True, 'determinism_check': 'default', 'save_ops_regex': [" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "'fmha'], '_type': 'cosmos_framework.configs.base.defaults.activation_checkpointing.ActivationCheckpo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - 
"text": [ - "intingConfig'}, 'precision': 'bfloat16', 'lora_enabled': False, 'lora_rank': 16, 'lora_alpha': 32, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "lora_target_modules': 'q_proj_moe_gen,k_proj_moe_gen,v_proj_moe_gen,o_proj_moe_gen', 'rectified_flow" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "_training_config': {'shift': {'256': 3, '480': 5, '720': 10}, 'use_dynamic_shift': False, 'train_tim" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e_image_distribution': 'logitnormal', 'train_time_video_distribution': 'waver', 'train_time_action_d" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "istribution': 'logitnormal', 'train_time_sound_distribution': 'logitnormal', 'train_time_weight': 'u" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "niform', 'loss_scale': 10.0, 'image_loss_scale': None, 'sound_loss_scale': 2.0, 'use_high_sigma_stra" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tegy': False, 'high_sigma_ratio': 0.05, 'high_sigma_timesteps_min': 995, 'high_sigma_timesteps_max':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 1000, 'use_discrete_rf': False, 'action_loss_weight': 10.0, 'independent_action_schedule': False, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shift_action': None, 'use_high_sigma_strategy_action': False, 'independent_sound_schedule': False, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shift_sound': None, 'use_high_sigma_strategy_sound': False, 'normalize_loss_by_active': False, '_typ" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e': 'cosmos_framework.configs.base.defaults.model_config.RectifiedFlowTrainingConfig'}, 'rectified_f" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "low_inference_config': 
{'scheduler_type': 'unipc', 'num_train_timesteps': 1000, 'shift': 1, 'use_dyn" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "amic_shifting': False, '_type': 'cosmos_framework.configs.base.defaults.model_config.RectifiedFlowIn" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ferenceConfig'}, 'fixed_step_sampler_config': None, 'vlm_config': {'model_name': 'nvidia/Cosmos3-Nan" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "o-Reasoner', 'safetensors_path': '', 'pretrained_weights': {'enabled': True, 'backbone_path': 's3://" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "bucket/cosmos3/pretrained/huggingface/Cosmos-Reason/Cosmos3-Nano-Reasoner-bb9c6f5/', 'credentials_pa" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "th': 'credentials/gcp_checkpoint.secret', 'enable_gcs_patch_in_boto3': True, 'checkpoint_format': No" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ne, '_type': 'cosmos_framework.configs.base.defaults.vlm.PretrainedWeightsConfig'}, 'model_instance'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": {'_target_': 'cosmos_framework.model.vfm.mot.unified_mot.Qwen3VLTextForCausalLM', 'config': {'_tar" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "get_': 'cosmos_framework.configs.base.defaults.vlm.create_vlm_config', 'base_config': {'_target_': '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos_framework.model.vfm.mot.unified_mot.Qwen3VLMoTConfig.from_json_file', 'json_file': 'cosmos_fr" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "amework/model/vfm/vlm/qwen3_vl/configs/Qwen3-VL-8B-Instruct.json'}, 'include_visual': True, 'qk_norm" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "_for_text': True}}, 'tokenizer': {'repository': 'nvidia/Cosmos3-Nano', 
'revision': 'main', 'subdir':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " '', '_target_': 'cosmos_framework.data.vfm.processors.build_processor_lazy'}, 'layer_module': None," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'qk_norm': False, 'tie_word_embeddings': False, 'use_system_prompt': False, '_type': 'cosmos_framew" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ork.configs.base.defaults.vlm.VLMConfig'}, 'diffusion_expert_config': {'timestep_range': 1.0, 'load_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "weights_from_pretrained': False, 'patch_spatial': 2, 'max_vae_latent_side_after_patchify': 20, 'posi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tion_embedding_type': 'unified_3d_mrope', 'rope_h_extrapolation_ratio': 1.0, 'rope_w_extrapolation_r" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "atio': 1.0, 'rope_t_extrapolation_ratio': 1.0, 'enable_fps_modulation': True, 'base_fps': 24, 'unifi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ed_3d_mrope_reset_spatial_ids': True, 'unified_3d_mrope_temporal_modality_margin': 15000, '_type': '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos_framework.configs.base.defaults.model_config.DiffusionExpertConfig'}, 'input_video_key': 'vid" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "eo', 'input_image_key': 'images', 'input_caption_key': 'ai_caption', 'state_ch': 48, 'state_t': 300," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'latent_downsample_factor': 16, 'resolution': '720', 'max_num_tokens_after_packing': 74000, 'joint_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "attn_implementation': 'two_way', 'natten_parameter_list': None, 'video_temporal_causal': False, 'cau" - ] - }, - { - 
"name": "stdout", - "output_type": "stream", - "text": [ - "sal_training_strategy': 'none', 'lbl': {'method': 'local', 'coeff_und': None, 'coeff_gen': None, '_t" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ype': 'cosmos_framework.configs.base.defaults.model_config.LBLConfig'}, 'vision_gen': True, 'action_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "gen': True, 'max_action_dim': 64, 'num_embodiment_domains': 32, 'sound_gen': True, 'sound_tokenizer'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": {'bucket_name': 'bucket', 'object_store_credential_path_pretrained': 'credentials/gcp_training.sec" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ret', 'avae_path': 'pretrained/tokenizers/audio/avae/avae_48k_noncausal_25hz_64ch.ckpt', 'avae_confi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "g_path': '', 'sample_rate': 48000, 'audio_channels': 2, 'io_channels': 64, 'hop_size': 1920, 'normal" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ize_latents': False, 'normalization_type': 'none', 'tanh_input_scale': 1.5, 'tanh_output_scale': 3.5" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ", 'tanh_clamp': 0.995, 'latent_mean': None, 'latent_std': None, '_target_': 'cosmos_framework.model." 
- ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vfm.tokenizers.audio.avae.AVAEInterface'}, 'sound_dim': 64, 'sound_latent_fps': 25, 'log_enc_time_ev" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ery_n': 100, '_type': 'cosmos_framework.configs.base.defaults.model_config.OmniMoTModelConfig'}\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:42|job=|WARNING|cosmos_framework/model/vfm/omni_mot_model.py:96:set_precision] OmniMoTM" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "odel: precision torch.bfloat16\n", - "[06-08 21:47:42|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ":_hf_download] uvx hf@1.16.4 download --format=json nvidia/Cosmos3-Nano --repo-type model --revision" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " main --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:43|job=|INFO|cosmos_framework/data/vfm/processors/base.py:122:__init__] Successfully lo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "aded processor from local cache\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:43|job=|INFO|cosmos_framework/utils/checkpoint_db.py:320:download] Downloading checkpoi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nt Wan2.2/vae(8eb09fc43b664e239b546c5d5241782e)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:43|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json Wan-AI/Wan2.2-TI2V-5B --repo-type model --revision 921dbaf3f1674a56f47e83fb80a3" - ] - }, - { - "name": "stdout", - "output_type": "stream", - 
"text": [ - "4bac8a8f203e Wan2.2_VAE.pth\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:45|job=|INFO|cosmos_framework/model/vfm/tokenizers/wan2pt2_vae_4x16x16.py:1015:_video_v" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ae] loading /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "okbooks/cosmos3/generator/transfer/.cache/huggingface/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/9" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "21dbaf3f1674a56f47e83fb80a34bac8a8f203e/Wan2.2_VAE.pth\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:45|job=|INFO|cosmos_framework/utils/checkpoint_db.py:320:download] Downloading checkpoi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nt AVAE(591169f40db14c12b91f1c1d73311e37)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:45|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos3-Nano --repo-type model --revision main --include 'sound_tokenize" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "r/*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:48|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:168:set_up_tokenizers] Sound " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tokenizer initialized: AVAEInterface\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:48|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on OmniMoTModel: set_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"up_tokenizers: 5.76 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:50|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on meta to cuda and b" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "roadcast model states: 1.26 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:50|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on Creating PyTorch m" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "odel and ema if enabled: 1.87 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:47:50|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on OmniMoTModel: set_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "up_model: 1.87 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:48:12|job=|INFO|cosmos_framework/inference/inference.py:1588:_generate_transfer_batch] [RA" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "NK 0] Saved sample args to '/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/notebooks/seg/transfer_seg/sample_args.js" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "on'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:48:14|job=|INFO|cosmos_framework/inference/transfer.py:111:load_transfer_control_frames] L" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "oaded pre-computed seg control from /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/user" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"s/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/specs/../assets/seg/control_seg.mp4\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:48:16|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:2533:generate_samples_from_ba" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tch] Using sampler: UniPC (shift=10.0, num_steps=50)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\r", - "Sampling: 0%| | 0/50 [00:00 425 KB preview)\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "seg generated: vision.mp4 (31289 KB -> 207 KB preview)\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "execution_count": null, + "id": "06dba23b", + "metadata": {}, + "outputs": [], "source": [ + "import os\n", "import sys\n", "from pathlib import Path\n", "\n", @@ -6567,1397 +1082,115 @@ "\n", "from preview_helpers import preview_transfer\n", "\n", - "preview_transfer(\"seg\")\n" + "preview_transfer(\"blur\", model=\"Cosmos3-Super\")\n" ] }, { "cell_type": "markdown", - "id": "transfer-wsm-md", + "id": "f8425e18", "metadata": {}, "source": [ - "## 13. World Scenario (WSM) Transfer\n", + "## 16. Super: Depth Transfer\n", "\n", - "World-scenario control (`control_wsm.mp4`) + caption. Output: `.../wsm/transfer_wsm/vision.mp4`.\n" - ] + "Depth map control (`control_depth.mp4`) + caption. 
Output:\n", + "\n", + "```text\n", + "/Cosmos3-Super/transfer_depth/vision.mp4\n", + "```\n" + ], + "outputs": [], + "execution_count": null }, { "cell_type": "code", - "execution_count": 17, - "id": "cb8b2527", - "metadata": { - "execution": { - "iopub.execute_input": "2026-06-09T04:52:52.204226Z", - "iopub.status.busy": "2026-06-09T04:52:52.204103Z", - "iopub.status.idle": "2026-06-09T04:55:42.396578Z", - "shell.execute_reply": "2026-06-09T04:55:42.396144Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "control=wsm spec=/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosm" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "os/cookbooks/cosmos3/generator/transfer/specs/wsm.json output=/lustre/fsw/portfolios/cosmos/projects" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/cosmos_base_training/users/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/noteboo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ks/wsm checkpoint=Cosmos3-Nano\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:06|job=|INFO|cosmos_framework/inference/common/init.py:127:_init_log_files] Console log" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " saved to /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cook" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "books/cosmos3/generator/transfer/outputs/notebooks/wsm/console.log\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:06|job=|INFO|cosmos_framework/inference/common/init.py:128:_init_log_files] Debug log s" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "aved to /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/cookbo" - ] - }, - { - "name": 
"stdout", - "output_type": "stream", - "text": [ - "oks/cosmos3/generator/transfer/outputs/notebooks/wsm/debug.log\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:06|job=|INFO|cosmos_framework/scripts/inference.py:46:inference] Loaded 1 samples\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:11|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos-Guardrail1 --repo-type model --revision d6d4bfa899a71454a70090766" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "4f3e88f503950cf --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:16|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos3-Nano --repo-type model --revision main --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:17|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:71:__init__] OmniMoTModel: co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nfig {'tokenizer': {'bucket_name': 'bucket', 'object_store_credential_path_pretrained': 'credentials" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/gcp_training.secret', 'vae_path': 'pretrained/tokenizers/video/wan2pt2/Wan2.2_VAE.pth', 'chunk_dura" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tion': 93, 'keep_decoder_cache': False, 'use_streaming_encode': False, 'encode_chunk_frames': {'256'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": 68, '480': 24, '720': 12}, 'encode_exact_durations': [17, 61, 73], 'spatial_compression_factor': 1" - ] - }, - { - 
"name": "stdout", - "output_type": "stream", - "text": [ - "6, 'temporal_compression_factor': 4, 'temporal_window': None, 'encode_bucket_multiple': None, '_targ" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "et_': 'cosmos_framework.model.vfm.tokenizers.wan2pt2_vae_4x16x16.Wan2pt2VAEInterface'}, 'net': None," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'ema': {'enabled': False, 'rate': 0.1, 'iteration_shift': 0, '_type': 'cosmos_framework.configs.bas" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e.defaults.ema.EMAConfig'}, 'parallelism': {'data_parallel_shard_degree': 1, 'data_parallel_replicat" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e_degree': 1, 'context_parallel_shard_degree': 1, 'cfg_parallel_shard_degree': 1, 'enable_inference_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "mode': True, 'fsdp_master_dtype': 'float32', '_type': 'cosmos_framework.configs.base.defaults.parall" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "elism.ParallelismConfig'}, 'compile': {'enabled': True, 'compiled_region': 'all', 'compile_dynamic':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " True, 'use_cuda_graphs': False, 'max_autotune_pointwise': False, 'coordinate_descent_tuning': False" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ", '_type': 'cosmos_framework.configs.base.defaults.compile.CompileConfig'}, 'activation_checkpointin" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "g': {'mode': 'none', 'preserve_rng_state': True, 'determinism_check': 'default', 'save_ops_regex': [" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "'fmha'], '_type': 'cosmos_framework.configs.base.defaults.activation_checkpointing.ActivationCheckpo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - 
"text": [ - "intingConfig'}, 'precision': 'bfloat16', 'lora_enabled': False, 'lora_rank': 16, 'lora_alpha': 32, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "lora_target_modules': 'q_proj_moe_gen,k_proj_moe_gen,v_proj_moe_gen,o_proj_moe_gen', 'rectified_flow" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "_training_config': {'shift': {'256': 3, '480': 5, '720': 10}, 'use_dynamic_shift': False, 'train_tim" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e_image_distribution': 'logitnormal', 'train_time_video_distribution': 'waver', 'train_time_action_d" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "istribution': 'logitnormal', 'train_time_sound_distribution': 'logitnormal', 'train_time_weight': 'u" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "niform', 'loss_scale': 10.0, 'image_loss_scale': None, 'sound_loss_scale': 2.0, 'use_high_sigma_stra" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tegy': False, 'high_sigma_ratio': 0.05, 'high_sigma_timesteps_min': 995, 'high_sigma_timesteps_max':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 1000, 'use_discrete_rf': False, 'action_loss_weight': 10.0, 'independent_action_schedule': False, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shift_action': None, 'use_high_sigma_strategy_action': False, 'independent_sound_schedule': False, '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shift_sound': None, 'use_high_sigma_strategy_sound': False, 'normalize_loss_by_active': False, '_typ" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "e': 'cosmos_framework.configs.base.defaults.model_config.RectifiedFlowTrainingConfig'}, 'rectified_f" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "low_inference_config': 
{'scheduler_type': 'unipc', 'num_train_timesteps': 1000, 'shift': 1, 'use_dyn" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "amic_shifting': False, '_type': 'cosmos_framework.configs.base.defaults.model_config.RectifiedFlowIn" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ferenceConfig'}, 'fixed_step_sampler_config': None, 'vlm_config': {'model_name': 'nvidia/Cosmos3-Nan" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "o-Reasoner', 'safetensors_path': '', 'pretrained_weights': {'enabled': True, 'backbone_path': 's3://" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "bucket/cosmos3/pretrained/huggingface/Cosmos-Reason/Cosmos3-Nano-Reasoner-bb9c6f5/', 'credentials_pa" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "th': 'credentials/gcp_checkpoint.secret', 'enable_gcs_patch_in_boto3': True, 'checkpoint_format': No" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ne, '_type': 'cosmos_framework.configs.base.defaults.vlm.PretrainedWeightsConfig'}, 'model_instance'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": {'_target_': 'cosmos_framework.model.vfm.mot.unified_mot.Qwen3VLTextForCausalLM', 'config': {'_tar" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "get_': 'cosmos_framework.configs.base.defaults.vlm.create_vlm_config', 'base_config': {'_target_': '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos_framework.model.vfm.mot.unified_mot.Qwen3VLMoTConfig.from_json_file', 'json_file': 'cosmos_fr" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "amework/model/vfm/vlm/qwen3_vl/configs/Qwen3-VL-8B-Instruct.json'}, 'include_visual': True, 'qk_norm" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "_for_text': True}}, 'tokenizer': {'repository': 'nvidia/Cosmos3-Nano', 
'revision': 'main', 'subdir':" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " '', '_target_': 'cosmos_framework.data.vfm.processors.build_processor_lazy'}, 'layer_module': None," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'qk_norm': False, 'tie_word_embeddings': False, 'use_system_prompt': False, '_type': 'cosmos_framew" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ork.configs.base.defaults.vlm.VLMConfig'}, 'diffusion_expert_config': {'timestep_range': 1.0, 'load_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "weights_from_pretrained': False, 'patch_spatial': 2, 'max_vae_latent_side_after_patchify': 20, 'posi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tion_embedding_type': 'unified_3d_mrope', 'rope_h_extrapolation_ratio': 1.0, 'rope_w_extrapolation_r" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "atio': 1.0, 'rope_t_extrapolation_ratio': 1.0, 'enable_fps_modulation': True, 'base_fps': 24, 'unifi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ed_3d_mrope_reset_spatial_ids': True, 'unified_3d_mrope_temporal_modality_margin': 15000, '_type': '" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cosmos_framework.configs.base.defaults.model_config.DiffusionExpertConfig'}, 'input_video_key': 'vid" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "eo', 'input_image_key': 'images', 'input_caption_key': 'ai_caption', 'state_ch': 48, 'state_t': 300," - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 'latent_downsample_factor': 16, 'resolution': '720', 'max_num_tokens_after_packing': 74000, 'joint_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "attn_implementation': 'two_way', 'natten_parameter_list': None, 'video_temporal_causal': False, 'cau" - ] - }, - { - 
"name": "stdout", - "output_type": "stream", - "text": [ - "sal_training_strategy': 'none', 'lbl': {'method': 'local', 'coeff_und': None, 'coeff_gen': None, '_t" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ype': 'cosmos_framework.configs.base.defaults.model_config.LBLConfig'}, 'vision_gen': True, 'action_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "gen': True, 'max_action_dim': 64, 'num_embodiment_domains': 32, 'sound_gen': True, 'sound_tokenizer'" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ": {'bucket_name': 'bucket', 'object_store_credential_path_pretrained': 'credentials/gcp_training.sec" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ret', 'avae_path': 'pretrained/tokenizers/audio/avae/avae_48k_noncausal_25hz_64ch.ckpt', 'avae_confi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "g_path': '', 'sample_rate': 48000, 'audio_channels': 2, 'io_channels': 64, 'hop_size': 1920, 'normal" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ize_latents': False, 'normalization_type': 'none', 'tanh_input_scale': 1.5, 'tanh_output_scale': 3.5" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ", 'tanh_clamp': 0.995, 'latent_mean': None, 'latent_std': None, '_target_': 'cosmos_framework.model." 
- ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vfm.tokenizers.audio.avae.AVAEInterface'}, 'sound_dim': 64, 'sound_latent_fps': 25, 'log_enc_time_ev" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ery_n': 100, '_type': 'cosmos_framework.configs.base.defaults.model_config.OmniMoTModelConfig'}\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:17|job=|WARNING|cosmos_framework/model/vfm/omni_mot_model.py:96:set_precision] OmniMoTM" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "odel: precision torch.bfloat16\n", - "[06-08 21:53:17|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - ":_hf_download] uvx hf@1.16.4 download --format=json nvidia/Cosmos3-Nano --repo-type model --revision" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " main --include '*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:18|job=|INFO|cosmos_framework/data/vfm/processors/base.py:122:__init__] Successfully lo" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "aded processor from local cache\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:18|job=|INFO|cosmos_framework/utils/checkpoint_db.py:320:download] Downloading checkpoi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nt Wan2.2/vae(d80827022798480db7f2cf133d38d2b4)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:18|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json Wan-AI/Wan2.2-TI2V-5B --repo-type model --revision 921dbaf3f1674a56f47e83fb80a3" - ] - }, - { - "name": "stdout", - "output_type": "stream", - 
"text": [ - "4bac8a8f203e Wan2.2_VAE.pth\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:20|job=|INFO|cosmos_framework/model/vfm/tokenizers/wan2pt2_vae_4x16x16.py:1015:_video_v" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ae] loading /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp/repos/cosmos/co" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "okbooks/cosmos3/generator/transfer/.cache/huggingface/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/9" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "21dbaf3f1674a56f47e83fb80a34bac8a8f203e/Wan2.2_VAE.pth\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:20|job=|INFO|cosmos_framework/utils/checkpoint_db.py:320:download] Downloading checkpoi" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "nt AVAE(efab207a0365473d8aa57ed87b0e2e35)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:20|job=|INFO|cosmos_framework/utils/checkpoint_db.py:156:_hf_download] uvx hf@1.16.4 do" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wnload --format=json nvidia/Cosmos3-Nano --repo-type model --revision main --include 'sound_tokenize" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "r/*'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:23|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:168:set_up_tokenizers] Sound " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tokenizer initialized: AVAEInterface\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:23|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on OmniMoTModel: set_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"up_tokenizers: 5.66 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:25|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on meta to cuda and b" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "roadcast model states: 1.07 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:25|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on Creating PyTorch m" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "odel and ema if enabled: 2.01 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:25|job=|INFO|cosmos_framework/utils/timer.py:138:_log] Time spent on OmniMoTModel: set_" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "up_model: 2.01 s\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:28|job=|INFO|cosmos_framework/inference/inference.py:1588:_generate_transfer_batch] [RA" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "NK 0] Saved sample args to '/lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/users/trungp" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/repos/cosmos/cookbooks/cosmos3/generator/transfer/outputs/notebooks/wsm/transfer_wsm/sample_args.js" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "on'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:29|job=|INFO|cosmos_framework/inference/transfer.py:111:load_transfer_control_frames] L" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "oaded pre-computed wsm control from /lustre/fsw/portfolios/cosmos/projects/cosmos_base_training/user" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"s/trungp/repos/cosmos/cookbooks/cosmos3/generator/transfer/specs/../assets/wsm/control_wsm.mp4\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[06-08 21:53:32|job=|INFO|cosmos_framework/model/vfm/omni_mot_model.py:2533:generate_samples_from_ba" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tch] Using sampler: UniPC (shift=10.0, num_steps=50)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\r", - "Sampling: 0%| | 0/50 [00:00/Cosmos3-Super/transfer_seg/vision.mp4\n", + "```\n" + ], + "outputs": [], + "execution_count": null + }, { "cell_type": "code", - "execution_count": 18, - "id": "94f7a729", - "metadata": { - "execution": { - "iopub.execute_input": "2026-06-09T04:55:42.398576Z", - "iopub.status.busy": "2026-06-09T04:55:42.398435Z", - "iopub.status.idle": "2026-06-09T04:55:42.676439Z", - "shell.execute_reply": "2026-06-09T04:55:42.676093Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wsm control: control_wsm.mp4 (3527 KB -> 191 KB preview)\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "wsm generated: vision.mp4 (23384 KB -> 387 KB preview)\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } + "execution_count": null, + "id": "d0f6bb46", + "metadata": {}, + "outputs": [], + "source": [ + "%%bash\nset -euo pipefail\nunset LD_LIBRARY_PATH\nCONTROL=seg\nSPEC=\"$COSMOS3_TRANSFER_ROOT/specs/${CONTROL}.json\"\nOUT_DIR=\"$COSMOS3_TRANSFER_OUTPUT_ROOT/Cosmos3-Super\"\nmkdir -p \"$OUT_DIR\"\necho \"control=$CONTROL model=Cosmos3-Super spec=$SPEC output=$OUT_DIR\"\ncd \"$COSMOS3_REPO\"\nCUDA_VISIBLE_DEVICES=\"${CUDA_VISIBLE_DEVICES}\" \\\n.venv/bin/torchrun \\\n 
--nproc-per-node=\"${COSMOS3_NUM_GPUS}\" \\\n --master-addr=\"${COSMOS3_MASTER_ADDR}\" \\\n --master-port=\"${COSMOS3_MASTER_PORT}\" \\\n -m cosmos_framework.scripts.inference \\\n --parallelism-preset=latency \\\n -i \"$SPEC\" \\\n -o \"$OUT_DIR\" \\\n --checkpoint-path Cosmos3-Super \\\n --seed 2026\n" + ] + }, + { + "cell_type": "markdown", + "id": "662e2bf6", + "metadata": {}, + "source": [ + "### Preview Segmentation (Super)\n" ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "execution_count": null, + "id": "825845a5", + "metadata": {}, + "outputs": [], "source": [ + "import os\n", "import sys\n", "from pathlib import Path\n", "\n", @@ -7973,16 +1206,70 @@ "\n", "from preview_helpers import preview_transfer\n", "\n", - "preview_transfer(\"wsm\")\n" + "preview_transfer(\"seg\", model=\"Cosmos3-Super\")\n" ] }, + { + "cell_type": "markdown", + "id": "6f09421d", + "metadata": {}, + "source": [ + "## 18. Super: WSM Transfer\n", + "\n", + "World-scenario map control (`control_wsm.mp4`) + caption. 
Output:\n", + "\n", + "```text\n", + "/Cosmos3-Super/transfer_wsm/vision.mp4\n", + "```\n" + ], + "outputs": [], + "execution_count": null + }, { "cell_type": "code", "execution_count": null, - "id": "03f75e85", + "id": "bba08873", "metadata": {}, "outputs": [], - "source": [] + "source": [ + "%%bash\nset -euo pipefail\nunset LD_LIBRARY_PATH\nCONTROL=wsm\nSPEC=\"$COSMOS3_TRANSFER_ROOT/specs/${CONTROL}.json\"\nOUT_DIR=\"$COSMOS3_TRANSFER_OUTPUT_ROOT/Cosmos3-Super\"\nmkdir -p \"$OUT_DIR\"\necho \"control=$CONTROL model=Cosmos3-Super spec=$SPEC output=$OUT_DIR\"\ncd \"$COSMOS3_REPO\"\nCUDA_VISIBLE_DEVICES=\"${CUDA_VISIBLE_DEVICES}\" \\\n.venv/bin/torchrun \\\n --nproc-per-node=\"${COSMOS3_NUM_GPUS}\" \\\n --master-addr=\"${COSMOS3_MASTER_ADDR}\" \\\n --master-port=\"${COSMOS3_MASTER_PORT}\" \\\n -m cosmos_framework.scripts.inference \\\n --parallelism-preset=latency \\\n -i \"$SPEC\" \\\n -o \"$OUT_DIR\" \\\n --checkpoint-path Cosmos3-Super \\\n --seed 2026\n" + ] + }, + { + "cell_type": "markdown", + "id": "db859493", + "metadata": {}, + "source": [ + "### Preview WSM (Super)\n" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f392cf44", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import sys\n", + "from pathlib import Path\n", + "\n", + "_root = Path.cwd()\n", + "if not (_root / \"preview_helpers.py\").is_file():\n", + " for p in [_root, *_root.parents]:\n", + " cand = p / \"cookbooks\" / \"cosmos3\" / \"generator\" / \"transfer\"\n", + " if (cand / \"preview_helpers.py\").is_file():\n", + " _root = cand\n", + " break\n", + "if str(_root) not in sys.path:\n", + " sys.path.insert(0, str(_root))\n", + "\n", + "from preview_helpers import preview_transfer\n", + "\n", + "preview_transfer(\"wsm\", model=\"Cosmos3-Super\")\n" + ] } ], "metadata": {