docs: clarify PAIBench-C reproduction seed and prompt format #211
Muneerali199 wants to merge 1 commit into
Signed-off-by: Muneerali199 <alimuneerali245@gmail.com>
But how are the structured prompts generated for the dataset? Also, the problem is not only about reproducibility in the strict sense: if you compare against the official PAIBench-C precomputed dataset seg GT, the result is not reproducible. Have you recomputed the source segmentation for your paper/model card?
@trungtpham for review? THX!
Thanks for looking at this. The structured prompts follow the format in About the source segmentation: I haven't compared against the official PAIBench-C precomputed GT yet. I'll add a note in the cookbook saying that's still pending and link to the non-determinism tracker (#7) for now. Will follow up once I've done the validation.
I think we are quite far from reproducibility of the model card. The remaining blocker for PAIBench-C reproduction is the prompt artifact. PAIBench-C public prompts are natural-language captions in metadata.csv / captions/*.json, while the Cosmos3 cookbook uses a structured prompt.json schema. Could you clarify exactly how the PAIBench-C captions were converted into Cosmos3 structured In particular:
Without those prompt files or the conversion recipe, the released specs/seed/control settings define the inference shape, but not an exact reproduction of the PAIBench-C table, because the prompt conditioning differs from the public PAIBench-C dataset.
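To make the gap concrete, here is a rough sketch of how one might list which public captions have no matching structured prompt. Everything specific in it is an assumption for illustration: the `video_id` column name, the sample ids, and the single `prompt` key are hypothetical and are not taken from PAIBench-C's `metadata.csv` or from the Cosmos3 `prompt.json` schema.

```python
import csv
import io
import json


def load_caption_ids(csv_text, id_column="video_id"):
    """Return sample ids from a metadata.csv-style table.

    The id column name is a hypothetical placeholder, not the real schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[id_column] for row in reader]


def missing_structured_prompts(caption_ids, structured_prompts):
    """List caption ids that have no structured prompt entry."""
    return [cid for cid in caption_ids if cid not in structured_prompts]


# Toy stand-in for the public captions (hypothetical ids and text).
csv_text = (
    "video_id,caption\n"
    "clip_0001,A robot arm picks up a cup.\n"
    "clip_0002,A car turns left.\n"
)

# Toy stand-in for released structured prompts, keyed by sample id.
# The "prompt" key is illustrative only.
structured = {
    "clip_0001": json.loads('{"prompt": "A robot arm picks up a cup."}'),
}

missing = missing_structured_prompts(load_caption_ids(csv_text), structured)
print(missing)
```

Any id reported as missing is a sample whose conditioning cannot be reproduced from the public captions alone.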
Closes NVIDIA/cosmos-framework#14
Adds a clarifying note to the transfer cookbook README addressing the two remaining questions from bhack on the PAIBench-C reproducibility issue:
- `--seed 2026` as the canonical reference seed
- `prompt.json` format shown in `assets/*/`

The evaluation non-determinism concern is tracked separately at SHI-Labs/physical-ai-bench#7.
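As a small illustration of what a canonical seed does and does not guarantee, the sketch below uses Python's standard `random` module as a stand-in for the sampler. It is not the Cosmos3 inference code; `sample_noise` is a hypothetical helper, and only the value 2026 comes from this PR.

```python
import random

REFERENCE_SEED = 2026  # value passed via --seed in this PR's note


def sample_noise(seed, n=4):
    """Draw n Gaussian samples from a generator seeded with `seed`.

    Hypothetical stand-in for the initial noise of a sampler.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


run_a = sample_noise(REFERENCE_SEED)
run_b = sample_noise(REFERENCE_SEED)
run_c = sample_noise(REFERENCE_SEED + 1)

print(run_a == run_b)  # same seed, same draws
print(run_a == run_c)  # different seed, different draws
```

A fixed seed pins only the random draws. It does not pin the prompt conditioning or any non-deterministic evaluation step, so those concerns remain separate.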