Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion pufferlib/ocean/drive/drive.py
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@ def __init__(
self.episode_length = episode_length
self.termination_mode = termination_mode
self.resample_frequency = resample_frequency
self._rng = np.random.default_rng(seed)
self.dynamics_model = dynamics_model
Comment on lines 127 to 129
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._rng is seeded from the constructor argument seed, but reset(seed=...) also accepts a seed and is used by PufferLib to control per-run determinism. Because self.tick staggering draws from self._rng, changing the reset() seed will not affect the stagger offset, which can be surprising for reproducibility. Consider deriving the stagger RNG from the reset() seed (or re-seeding _rng on reset) so reset(seed=...) fully controls stochastic reset behavior.

Copilot uses AI. Check for mistakes.
# reward randomization bounds
self.reward_bound_goal_radius_min = reward_bound_goal_radius_min
Expand Down Expand Up @@ -423,7 +424,13 @@ def __init__(

def reset(self, seed=0):
binding.vec_reset(self.c_envs, seed)
self.tick = 0
# Stagger initial tick so workers don't all resample maps at the same step.
# The first episode will be shorter than resample_frequency, but this
# desynchronizes resets across workers for the rest of training.
if self.resample_frequency > 0:
self.tick = int(self._rng.integers(self.resample_frequency))
Comment on lines +427 to +431
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset() now randomizes self.tick on every reset, but the PR description says to stagger only on the first reset per worker. If reset() is called at every episode boundary, this will repeatedly shorten the post-reset interval until the next resample (and may change training dynamics beyond the intended one-time desync). Consider gating this with a flag (e.g., only stagger once, then reset tick to 0 on subsequent resets).

Copilot uses AI. Check for mistakes.
else:
self.tick = 0
Comment on lines +430 to +433
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces new behavior (staggering map resampling via randomized initial tick) that is central to training stability, but there’s no automated test asserting workers/env instances start with different ticks when resample_frequency > 0. Adding a small pytest regression test (e.g., create multiple Drive instances with different seeds and assert their initial tick values aren’t all equal, and/or that only the first reset staggers if that’s the intended behavior) would help prevent accidental re-synchronization regressions.

Copilot uses AI. Check for mistakes.
self.truncations[:] = 0
return self.observations, []

Expand Down
Loading