action dataloader: episode-shuffle stream (fix DROID grad-norm instability)#37
Open
fwd4 wants to merge 3 commits into
f786168 to
8eec346
Compare
Collaborator
LGTM
lfengad
previously approved these changes
Jun 12, 2026
      "action_modality_embed",
  ],
- lr=2.0e-04,  # matches internal droid_lerobot_8b_policy submit (--lr 2e-4)
+ lr=1.0e-04,  # sqrt-scaled for 2048 global batch (internal 2e-4 was for 8192 = 4x)
Collaborator
Is this change intended? Our internal ablation showed that fixing lr at 2.0e-4 is key to a high policy success rate.
Collaborator
Author
It was accidentally introduced in one of the resource-constrained experiments; now reverted.
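For context, the arithmetic behind the "sqrt-scaled" comment in the reverted line can be sketched as follows. This is only an illustration of the square-root batch-size scaling heuristic the comment names; the helper name `sqrt_scaled_lr` is hypothetical and not part of the repository.

```python
import math


def sqrt_scaled_lr(ref_lr: float, ref_batch: int, new_batch: int) -> float:
    """Scale a reference learning rate by sqrt(new_batch / ref_batch)."""
    return ref_lr * math.sqrt(new_batch / ref_batch)


# Reference values taken from the diff comment: lr 2e-4 at 8192 global batch.
lr_2048 = sqrt_scaled_lr(ref_lr=2.0e-4, ref_batch=8192, new_batch=2048)
print(f"{lr_2048:.1e}")  # 1.0e-04
```

A 4x smaller batch halves the learning rate under this heuristic, which is where 1.0e-04 came from. The review thread settles on keeping 2.0e-4 regardless.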
…ebased on main)
Rebased onto current main. main NVIDIA#34 upstreamed the DROID dataset (joint_pos, use_state, keep-ranges filter, action_space) so droid_lerobot_dataset.py now carries only the get_shuffle_blocks helper grafted onto main's version; NVIDIA#29's recipe change (dropped /cluster override) is incorporated.
Remaining contribution: action_policy_droid_nano recipe (mode=policy, lr=2e-4 @ 8192 global, max_num_tokens_after_packing=-1, scrubbed comments), the episode-shuffle stream (action_sft_dataset.py), the multi-node-capable SFT launcher (NNODES/NODE_RANK/MASTER_ADDR passthrough + EXTRA_TAIL_OVERRIDES), and the post-train doc.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Hao Liang <haolia@nvidia.com>
ae3db20 to
f34031a
Compare
…None
joint_pos uses raw (un-normalized) joint actions, so DROIDLeRobotDataset sets action_normalization=None, but _build_result called normalize_action() unconditionally, which raises 'Unknown normalization method: None'. Guard it so None means raw actions (caught by a 2-node sanity run on the rebased branch).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Hao Liang <haolia@nvidia.com>
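The guard described in this commit might look roughly like the sketch below. The `normalize_action` body, the `"min_max"` method name, and the `build_action` wrapper are hypothetical stand-ins for illustration; only the error message and the "None means raw actions" behaviour come from the commit message.

```python
from typing import List, Optional, Sequence


def normalize_action(action: Sequence[float], method: Optional[str]) -> List[float]:
    """Hypothetical stand-in for the repo's normalize_action()."""
    if method == "min_max":  # assumed method name, for illustration only
        lo, hi = min(action), max(action)
        span = (hi - lo) or 1.0  # avoid dividing by zero on constant actions
        return [2.0 * (a - lo) / span - 1.0 for a in action]
    raise ValueError(f"Unknown normalization method: {method}")


def build_action(action: Sequence[float], action_normalization: Optional[str]) -> List[float]:
    # Guard: None means "use raw actions", so normalization is skipped entirely.
    if action_normalization is None:
        return list(action)
    return normalize_action(action, action_normalization)


raw = build_action([0.1, -0.3], None)                 # passes through unchanged
scaled = build_action([0.0, 1.0, 2.0], "min_max")     # mapped into [-1, 1]
```

Without the guard, the unconditional call would reach the `raise` branch whenever `action_normalization` is `None`.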
…reference
The bare recipe trained with the NANO default loss_scale=1.0, weighting the vision flow-matching loss 10x lower than the Cosmos3-Nano-Policy-DROID reference (which uses 10.0). Set it post-construction so the recipe reproduces without launcher overrides.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Hao Liang <haolia@nvidia.com>
Problem
The DROID action SFT dataloader trained with an unstable, slow-settling grad-norm (and a noisy action-loss plateau) compared with the internal reference. Root cause: the DROID action dataset is map-style and, unlike the iterable vision SFTDataset (which self-shuffles), does not shuffle, and RankPartitionedDataLoader wraps it in a DataLoader with no shuffle, i.e. a SequentialSampler. Every rank then iterates the same consecutive, overlapping windows, so the all-reduced global batch is effectively ~1 episode, which gives high gradient variance.
(Forward + gradients were verified numerically equivalent to the internal model on identical input, so this was a data-path issue, not the model/loss/optimizer.)
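A toy model of the failure mode, as a rough sketch: it assumes every rank reads the same sequential slice of a flat window index and that each episode contributes a fixed number of consecutive windows. The function names and the sizes (8 ranks, 16 samples per rank, 200 windows per episode) are made up for illustration.

```python
from typing import List


def episode_of(index: int, windows_per_episode: int) -> int:
    """Map a flat window index to its episode id (fixed-length episodes assumed)."""
    return index // windows_per_episode


def sequential_global_batch(step: int, world_size: int, per_rank_batch: int) -> List[int]:
    """Every rank reads the same sequential slice: no shuffle, no sharding."""
    start = step * per_rank_batch
    rank_batch = list(range(start, start + per_rank_batch))
    # The global batch is the concatenation of identical per-rank batches.
    return [idx for _ in range(world_size) for idx in rank_batch]


batch = sequential_global_batch(step=3, world_size=8, per_rank_batch=16)
episodes = {episode_of(i, windows_per_episode=200) for i in batch}
print(len(batch), len(episodes))  # 128 1
```

Although the global batch holds 128 samples, they all come from a single episode, so the all-reduced gradient carries roughly one episode's worth of information.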
Fix
- ActionIterableShuffleDataset (iterable_shuffle=True): an IterableDataset view of the map-style dataset that streams rank × worker-sharded, episode-order-shuffled, sequential-within-episode samples. This gives decorrelated batches with sequential reads (preserves I/O locality and copy-on-write; a plain shuffle=True / RandomSampler instead does random-access I/O, leading to ~11 min/iter and OOM from broken COW). Mirrors the internal iterable dataset's per-worker episode assignment.
- DROIDLeRobotDataset.get_shuffle_blocks() (per-episode/segment flat-index blocks the iterable streams).
- No DataLoader/sampler change needed: IterableDataset is handled natively (sampler=None).
Validation (8192 global batch)
Per-component action loss converges to ~0.0055 (matches internal ~0.005; the no-shuffle run plateaued noisily at 0.03–0.07). Builds on #24 (recipe + FusedAdam optimizer).
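The episode-shuffle stream described under Fix can be sketched in plain Python as below. This is not the PR's ActionIterableShuffleDataset, only a minimal illustration of the idea under stated assumptions: `blocks` plays the role of what get_shuffle_blocks() is described as returning (one list of flat indices per episode), and the function name, seed handling, and strided sharding scheme are all hypothetical.

```python
import random
from typing import Iterator, List, Sequence


def episode_shuffle_stream(
    blocks: Sequence[Sequence[int]],
    rank: int,
    world_size: int,
    worker_id: int,
    num_workers: int,
    seed: int,
    epoch: int = 0,
) -> Iterator[int]:
    """Yield flat indices: episode order shuffled, windows sequential within an episode."""
    order = list(range(len(blocks)))
    # The same seed on every rank/worker makes all shards agree on one permutation.
    random.Random(seed + epoch).shuffle(order)
    shard = rank * num_workers + worker_id
    num_shards = world_size * num_workers
    # Strided slice: each (rank, worker) pair gets a disjoint subset of episodes.
    for block_id in order[shard::num_shards]:
        yield from blocks[block_id]


# Six toy episodes with three windows each, split across 2 ranks x 1 worker.
blocks: List[List[int]] = [list(range(e * 3, e * 3 + 3)) for e in range(6)]
streams = [
    list(episode_shuffle_stream(blocks, rank=r, world_size=2, worker_id=0, num_workers=1, seed=0))
    for r in range(2)
]
```

Each stream reads whole episodes front to back, which keeps reads sequential, while different ranks see different episodes in a shuffled order. In a real PyTorch IterableDataset the worker id and worker count would typically come from torch.utils.data.get_worker_info().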
🤖 Generated with Claude Code
Added commits (recipe correctness)
- mode="policy" default: DROIDLeRobotDataset defaulted to mode="joint" (random forward_dynamics/inverse_dynamics/policy per sample), so the policy recipe was silently training multi-task. inverse_dynamics zeros the vision loss and forward_dynamics zeros the action loss, diluting each per-task loss by ~1/3 vs the policy-only internal run. Now defaults to policy (matching i4's DROIDLeRobotDataset); mode is also threaded through get_action_droid_sft_dataset.
- max_num_tokens_after_packing=-1: uncaps the packed-sequence length (NANO default 45056) to match the internal droid_lerobot_8b run, so the full vision sequence is processed per step. Does not change the per-token loss; widens the effective vision context per step.
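The ~1/3 dilution under mode="joint" can be checked with a short calculation. The sketch assumes the three tasks are drawn uniformly per sample and that policy samples keep both losses; the PR text says only "random", so both points are assumptions, and the names below are illustrative.

```python
from fractions import Fraction

MODES = ("forward_dynamics", "inverse_dynamics", "policy")

# Which loss each task zeroes, as described in the PR text.
# Assumption: policy samples keep both the action and the vision loss.
ZEROED_LOSS = {
    "forward_dynamics": "action",
    "inverse_dynamics": "vision",
    "policy": None,
}


def active_fraction(loss_name: str) -> Fraction:
    """Fraction of samples that contribute to a loss under uniform task sampling."""
    active = sum(1 for mode in MODES if ZEROED_LOSS[mode] != loss_name)
    return Fraction(active, len(MODES))


print(active_fraction("action"), active_fraction("vision"))  # 2/3 2/3
```

Under these assumptions each loss is active on two of every three samples, so its average contribution drops by about one third relative to a policy-only run.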