Description
DeepMind recently released DiscoRL, a framework for automatically discovering reinforcement learning algorithms.
It aims to produce general-purpose update rules that transfer across different tasks and environments, rather than being hand-tailored to a single setup.
References:
🔗 DeepWiki: https://deepwiki.com/google-deepmind/disco_rl/5-contributing
📄 Original paper (Nature, 2025)
👉 GitHub: https://github.com/google-deepmind/disco_rl
Motivation
Currently, ML-Agents provides several built-in RL algorithms (PPO, SAC, etc.), but all are manually designed.
Integrating or experimenting with DiscoRL could:
- Enable meta-RL or algorithm discovery within Unity environments.
- Provide a new research direction for users exploring automated RL.
- Potentially lead to more general, data-efficient learning agents.
This aligns with ML-Agents’ goal of being a flexible platform for both game AI and reinforcement learning research.
Proposed Implementation Ideas
- Add a new trainer module under mlagents/trainers/disco_rl/ following the pattern of the existing trainers (PPO, SAC).
- Allow users to toggle DiscoRL mode in trainer_config.yaml (e.g., trainer_type: disco_rl); a config sketch follows this list.
- Provide a simple Unity demo environment (such as GridWorld or Walker) to test its behavior.
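To make the second idea concrete, here is a minimal sketch of what a trainer_config.yaml entry might look like, following ML-Agents' existing behaviors schema. The trainer_type value disco_rl and the hyperparameters shown are assumptions; DiscoRL-specific settings would only be defined once a trainer actually exists.

```yaml
behaviors:
  GridWorld:
    trainer_type: disco_rl      # hypothetical value; built-in types include ppo and sac
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
    hyperparameters:
      learning_rate: 3.0e-4     # placeholders; DiscoRL-specific knobs are unknown
      batch_size: 1024
      buffer_size: 10240
    network_settings:
      hidden_units: 128
      num_layers: 2
```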
Possible Challenges
- DiscoRL is still experimental and may require adapting its meta-learning infrastructure (a conceptual sketch of what that involves follows this list).
- Integration would depend on PyTorch version compatibility and on reproducing the published results.
- Meta-training ("algorithm search") may require substantially more compute and new environment abstractions than the existing trainers.
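To unpack the first point: in DiscoRL, the update rule itself is a learned network that maps an agent's predictions and transition data to update targets, and the agent is trained to match those targets. The PyTorch sketch below illustrates only that inner loop; every name in it is an illustrative assumption rather than DiscoRL's actual API, and the outer loop that meta-trains the rule across many environments is omitted.

```python
import torch
import torch.nn as nn

class MetaUpdateRule(nn.Module):
    """Learned update rule: maps (agent outputs, reward, done) to targets."""
    def __init__(self, in_dim: int, out_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, agent_out, reward, done):
        return self.net(torch.cat([agent_out, reward, done], dim=-1))

# Inner loop only: the agent regresses its outputs toward targets produced
# by a frozen meta-network; meta-training of the rule itself is omitted.
agent = nn.Linear(8, 4)                 # stand-in for a policy/value network
rule = MetaUpdateRule(in_dim=4 + 2, out_dim=4)
opt = torch.optim.Adam(agent.parameters(), lr=3e-4)

obs = torch.randn(32, 8)                # dummy batch of 32 observations
reward = torch.randn(32, 1)
done = torch.zeros(32, 1)

pred = agent(obs)
with torch.no_grad():                   # targets are treated as constants
    target = rule(pred, reward, done)
loss = ((pred - target) ** 2).mean()    # agent matches the rule's targets
opt.zero_grad()
loss.backward()
opt.step()
```

Supporting the outer loop (meta-gradients aggregated across many environments in parallel) is where most of the extra compute and environment-abstraction cost mentioned above would come from.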
Additional Context
If accepted, I’d be happy to help draft an initial implementation plan or contribute a prototype to explore feasibility.