RidwanAdebosin (Contributor) commented Oct 31, 2025

DQN GridWorld Demo Notebook with Visualization

This PR adds a step-by-step Markdown notebook demonstrating how to set up, train, and visualize a Deep Q-Network (DQN) agent in Fehu using a simple GridWorld environment.

Features:

  • Clear instructions for environment setup, agent creation, and training
  • Automatic video recording of agent behavior before and after training
    • Videos are generated in the [/run] directory and embedded in the notebook for quick visualization
    • Agent behavior video (before training)
    • Agent behavior video (after training)
  • Metrics tracking: episode rewards, episode length, loss, epsilon, and average Q-value
  • Standalone OCaml plotting script for visualizing all key metrics
  • Notebook hosted at www.raven.com/docs/fehu/dqn-demo.md
  • Troubleshooting and summary sections for reproducibility
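To give a feel for the setup before opening the notebook, here is a minimal sketch of a GridWorld of the kind the demo uses. The demo itself is written in OCaml against Fehu; this Python version is purely illustrative, and its grid size, reward scheme, and action encoding are assumptions, not the demo's actual values.

```python
import random

class GridWorld:
    """Tiny N x N grid: agent starts at (0, 0), goal at (N-1, N-1).

    Rewards are illustrative: -0.01 per step, +1.0 on reaching the goal.
    """
    MOVES = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}  # right, left, down, up

    def __init__(self, size=5):
        self.size = size
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        dx, dy = self.MOVES[action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        done = self.pos == (self.size - 1, self.size - 1)
        reward = 1.0 if done else -0.01
        return self.pos, reward, done

# A random rollout -- roughly what the "before training" video shows.
env = GridWorld()
state, total, done, steps = env.reset(), 0.0, False, 0
while not done and steps < 200:
    state, reward, done = env.step(random.randrange(4))
    total += reward
    steps += 1
```

A trained agent would instead pick the greedy action from its Q-network at each step, reaching the goal in far fewer moves.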

Visualizations:

  • Episode rewards plot
  • Episode length plot
  • Loss curve
  • Epsilon schedule
  • Average Q-value plot

All code and documentation follow Raven’s philosophy of minimalism, clarity, and principled design.

Visualizations in the Demo

Episode Rewards


This plot shows the total reward per episode.
A rising or stable reward curve indicates successful learning.

Episode Length


This plot shows the number of steps taken in each episode.
A decreasing or stable episode length indicates the agent is learning to reach the goal more efficiently.

Loss Curve


This plot shows the DQN loss over episodes.
A decreasing loss suggests the Q-network is learning to predict better action values.
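As a reminder of what that loss is: DQN minimizes the mean-squared temporal-difference error between the online network's Q(s, a) and a bootstrapped target built from the target network. The demo computes this in OCaml inside Fehu; the NumPy sketch below is only illustrative, and the discount factor and batch values are made up.

```python
import numpy as np

gamma = 0.99  # discount factor (illustrative, not the demo's setting)

def dqn_loss(q_values, q_target_next, actions, rewards, dones):
    """Mean-squared TD error for a batch.

    q_values:      (batch, n_actions) online-network outputs for states s
    q_target_next: (batch, n_actions) target-network outputs for states s'
    """
    batch = np.arange(len(actions))
    q_sa = q_values[batch, actions]  # Q(s, a) for the actions actually taken
    # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), cut off at episode end
    target = rewards + gamma * q_target_next.max(axis=1) * (1.0 - dones)
    return np.mean((q_sa - target) ** 2)

q = np.array([[0.5, 1.0], [0.2, 0.3]])
q_next = np.array([[0.0, 2.0], [1.0, 0.0]])
loss = dqn_loss(q, q_next, actions=np.array([1, 0]),
                rewards=np.array([0.0, 1.0]), dones=np.array([0.0, 1.0]))
```

In practice the gradient is only taken through `q_sa`; the target is treated as a constant, which is why a separate, periodically synced target network is used.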

Epsilon Schedule


This plot shows the epsilon value used for exploration.
Epsilon decays over time, meaning the agent explores less and exploits more as training progresses.
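The PR does not spell out the concrete schedule, but a common choice (used here only as an illustration, in Python rather than the demo's OCaml) is exponential decay with a floor:

```python
def epsilon(episode, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Exploration rate for a given episode: exponential decay with a floor.

    All three constants are illustrative defaults, not the demo's settings.
    The agent takes a random action with probability epsilon, and the
    greedy action (argmax of Q) otherwise.
    """
    return max(eps_end, eps_start * decay ** episode)
```

The floor `eps_end` keeps a small amount of exploration even late in training, which is why the plotted curve flattens out rather than reaching zero.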

Average Q-value


This plot shows the average Q-value per episode.
Tracking average Q helps diagnose learning stability and value estimation quality.
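One common way to compute this metric, sketched below in Python for illustration (the demo's OCaml implementation may aggregate differently), is to average the greedy Q-value over the states visited in an episode:

```python
import numpy as np

def average_q(q_fn, states):
    """Mean over visited states of max_a Q(s, a).

    A steadily rising curve that then stabilizes is the healthy pattern;
    unbounded growth often signals value overestimation.
    """
    return float(np.mean([np.max(q_fn(s)) for s in states]))

# Toy Q-function over two actions (purely illustrative).
q_fn = lambda s: np.array([s * 0.1, s * 0.2])
avg = average_q(q_fn, states=[1, 2, 3])  # greedy values are roughly 0.2, 0.4, 0.6
```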


Implementation Notes

  • Due to MDX limitations, plot images are generated with the Hugin library in a standalone OCaml script or in utop, not directly inside MDX code blocks.
  • Workflow:
    • The notebook contains the plotting code, but users must run it outside MDX to generate the image file.
    • The image should be placed in the same directory as the notebook for markdown preview.
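Outside MDX, any CSV-aware tool can consume the metrics file the training run writes; the demo's own script does this with Hugin in OCaml. As an illustration of the workflow only (the column names below are assumptions, not the demo's actual schema), a Python reader might look like:

```python
import csv
import io

# Illustrative metrics CSV; the real file is produced during training.
sample = """episode,reward,length,loss,epsilon,avg_q
0,-1.2,200,0.91,1.00,0.05
1,-0.8,150,0.74,0.95,0.11
"""

rows = list(csv.DictReader(io.StringIO(sample)))
rewards = [float(r["reward"]) for r in rows]
losses = [float(r["loss"]) for r in rows]
# These series can then be handed to any plotting tool
# (Hugin in the demo's OCaml script, or matplotlib in Python).
```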

Summary of Changes

- Implemented a new demo for training a DQN agent in a GridWorld environment.
- Added a script to plot training metrics from CSV data.
- Removed the old DQN training example.
- Updated DQN algorithm files to support new features.
- Created tests for the new plotting functionality.
- Added documentation for the DQN GridWorld demo, including setup and visualization instructions.
- Included sample videos of agent behavior before and after training.
Closes: Add a step-by-step pedagogical demo for DQN, including training and visualization