docs: clarify PAIBench-C reproduction seed and prompt format #211

Open
Muneerali199 wants to merge 1 commit into NVIDIA:main from Muneerali199:patch-transfer-readme

Conversation

@Muneerali199

Closes NVIDIA/cosmos-framework#14

Adds a clarifying note to the transfer cookbook README addressing the two remaining questions from bhack on the PAIBench-C reproducibility issue:

  1. Seed: All clips use --seed 2026 as the canonical reference seed
  2. Prompt format: Prompts follow the structured prompt.json format shown in assets/*/

The evaluation non-determinism concern is tracked separately at SHI-Labs/physical-ai-bench#7.

Signed-off-by: Muneerali199 <alimuneerali245@gmail.com>
@bhack

bhack commented Jun 13, 2026

But how are the structured prompts generated for the dataset?

Also, the problem is not strictly about reproducibility alone: if you compare against the official PAIBench-C precomputed dataset segmentation GT, the result is not reproducible. Have you recomputed the source segmentation for your paper/model card?

@lfengad

lfengad commented Jun 15, 2026

Collaborator

@trungtpham for review? THX!

@Muneerali199

Author

Thanks for looking at this. The structured prompts follow the format in assets/*/prompt.json: load that template and fill in the scene parameters for each clip. The generation code is in cookbooks/cosmos3/generator/transfer/.
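As a rough illustration of the "load the template and fill in per-clip parameters" step, here is a minimal sketch. It is not the cookbook's actual generation code: the function names, the flat top-level key layout, and the `<clip_id>/prompt.json` output layout are all assumptions made for the example, and the real prompt.json schema may be nested or use different fields.

```python
import copy
import json
from pathlib import Path


def fill_prompt_template(template: dict, scene_params: dict) -> dict:
    """Return a copy of `template` with top-level keys overridden by `scene_params`.

    Hypothetical helper: assumes the per-clip parameters map onto top-level
    keys of the template. Unknown keys are rejected so typos surface early.
    """
    unknown = set(scene_params) - set(template)
    if unknown:
        raise KeyError(f"parameters not in template: {sorted(unknown)}")
    prompt = copy.deepcopy(template)  # never mutate the shared template
    prompt.update(scene_params)
    return prompt


def write_clip_prompts(template_path: Path, clips: dict, out_dir: Path) -> list:
    """Write one prompt.json per clip under out_dir/<clip_id>/ (assumed layout)."""
    template = json.loads(template_path.read_text(encoding="utf-8"))
    written = []
    for clip_id, params in sorted(clips.items()):
        clip_dir = out_dir / clip_id
        clip_dir.mkdir(parents=True, exist_ok=True)
        target = clip_dir / "prompt.json"
        filled = fill_prompt_template(template, params)
        # sort_keys keeps the files byte-stable across runs, which helps diffing
        target.write_text(json.dumps(filled, indent=2, sort_keys=True), encoding="utf-8")
        written.append(target)
    return written
```

Writing the files deterministically (sorted clips, sorted keys) means the per-clip prompts can be checked into a repository or hashed, which is what a reproduction recipe would need.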

About the source segmentation: I haven't compared against the official PAIBench-C precomputed GT yet. I'll add a note in the cookbook saying that's still pending and link to the non-determinism tracker (#7) for now. Will follow up once I've done the validation.

@bhack

bhack commented Jun 15, 2026

I think we are still quite far from reproducing the model card results.

The remaining blocker for PAIBench-C reproduction is the prompt artifact.

PAIBench-C public prompts are natural-language captions in metadata.csv / captions/*.json, while the Cosmos3 cookbook uses a structured prompt.json schema.

Could you clarify exactly how the PAIBench-C captions were converted into Cosmos3 structured prompt.json files for Table 16?

In particular:

  1. Were the public PAIBench-C captions used directly, or converted into structured Cosmos3 prompt.json?
  2. If converted, was the input metadata.csv caption_text, captions/{task_id}.json, the source video, or some combination?
  3. Is the conversion script / system prompt / model available?
  4. Are the per-clip structured prompt.json files used for the 600 PAIBench-C examples available?
  5. Did the reported Table 16 segmentation result use those structured prompts plus official HF sam2_vids/sam2_pkls, or were source segmentations recomputed?

Without those prompt files or the conversion recipe, the released specs/seed/control settings define the inference shape, but not an exact reproduction of the PAIBench-C table, because the prompt conditioning differs from the public PAIBench-C dataset.
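To make the gap concrete, a small check like the one below could report which PAIBench-C examples have no released structured prompt. This is only a sketch under stated assumptions: it assumes metadata.csv has a `task_id` column (the thread only confirms `caption_text` and the `captions/{task_id}.json` naming), and the `<prompt_root>/<task_id>/prompt.json` layout and the function name are hypothetical.

```python
import csv
from pathlib import Path


def missing_structured_prompts(
    metadata_csv: Path,
    prompt_root: Path,
    id_column: str = "task_id",  # assumed column name, not confirmed in the thread
) -> list:
    """List task ids from metadata.csv that have no <prompt_root>/<id>/prompt.json."""
    with metadata_csv.open(newline="", encoding="utf-8") as handle:
        task_ids = [row[id_column] for row in csv.DictReader(handle)]
    # Preserve the CSV order so the report lines up with the dataset listing
    return [
        task_id
        for task_id in task_ids
        if not (prompt_root / task_id / "prompt.json").is_file()
    ]
```

An empty result for all 600 examples would indicate that the per-clip prompt files asked about in question 4 are at least present; it says nothing about whether they match the ones used for Table 16.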

Development

Successfully merging this pull request may close these issues.

Need released-code recipe to reproduce Cosmos3 PAIBench-C transfer results

3 participants