RidwanAdebosin (Contributor) commented Oct 31, 2025

DQN GridWorld Demo Notebook with Visualization

This PR adds a step-by-step Markdown notebook demonstrating how to set up, train, and visualize a Deep Q-Network (DQN) agent in Fehu using a simple GridWorld environment.

Features:

  • Clear instructions for environment setup, agent creation, and training
  • Automatic video recording of agent behavior before and after training
    • Videos are generated in the [/run] directory and embedded in the notebook for quick visualization
    • Agent behavior video (before training)
    • Agent behavior video (after training)
  • Metrics tracking: episode rewards, episode length, loss, epsilon, and average Q-value
  • Standalone OCaml plotting script for visualizing all key metrics
  • Notebook hosted at www.raven.com/docs/fehu/dqn-demo.md
  • Troubleshooting and summary sections for reproducibility
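To give a feel for the setup before opening the notebook, here is a minimal sketch of a GridWorld of the kind the demo uses. The demo itself is written in OCaml against Fehu; this Python version is purely illustrative, and its grid size, reward scheme, and action encoding are assumptions, not the demo's actual values.

```python
import random

class GridWorld:
    """Tiny N x N grid: agent starts at (0, 0), goal at (N-1, N-1).

    Rewards are illustrative: -0.01 per step, +1.0 on reaching the goal.
    """
    MOVES = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}  # right, left, down, up

    def __init__(self, size=5):
        self.size = size
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        dx, dy = self.MOVES[action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        done = self.pos == (self.size - 1, self.size - 1)
        reward = 1.0 if done else -0.01
        return self.pos, reward, done

# A random rollout -- roughly what the "before training" video shows.
env = GridWorld()
state, total, done, steps = env.reset(), 0.0, False, 0
while not done and steps < 200:
    state, reward, done = env.step(random.randrange(4))
    total += reward
    steps += 1
```

A trained agent would instead pick the greedy action from its Q-network at each step, reaching the goal in far fewer moves.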

Visualizations:

  • Episode rewards plot
  • Episode length plot
  • Loss curve
  • Epsilon schedule
  • Average Q-value plot

All code and documentation follow Raven’s philosophy of minimalism, clarity, and principled design.

Visualizations in the Demo

Episode Rewards


This plot shows the total reward per episode.
A rising or stable reward curve indicates successful learning.

Episode Length


This plot shows the number of steps taken in each episode.
A decreasing or stable episode length indicates the agent is learning to reach the goal more efficiently.

Loss Curve


This plot shows the DQN loss over episodes.
A decreasing loss suggests the Q-network is learning to predict better action values.
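As a reminder of what that loss is: DQN minimizes the mean-squared temporal-difference error between the online network's Q(s, a) and a bootstrapped target built from the target network. The demo computes this in OCaml inside Fehu; the NumPy sketch below is only illustrative, and the discount factor and batch values are made up.

```python
import numpy as np

gamma = 0.99  # discount factor (illustrative, not the demo's setting)

def dqn_loss(q_values, q_target_next, actions, rewards, dones):
    """Mean-squared TD error for a batch.

    q_values:      (batch, n_actions) online-network outputs for states s
    q_target_next: (batch, n_actions) target-network outputs for states s'
    """
    batch = np.arange(len(actions))
    q_sa = q_values[batch, actions]  # Q(s, a) for the actions actually taken
    # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), cut off at episode end
    target = rewards + gamma * q_target_next.max(axis=1) * (1.0 - dones)
    return np.mean((q_sa - target) ** 2)

q = np.array([[0.5, 1.0], [0.2, 0.3]])
q_next = np.array([[0.0, 2.0], [1.0, 0.0]])
loss = dqn_loss(q, q_next, actions=np.array([1, 0]),
                rewards=np.array([0.0, 1.0]), dones=np.array([0.0, 1.0]))
```

In practice the gradient is only taken through `q_sa`; the target is treated as a constant, which is why a separate, periodically synced target network is used.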

Epsilon Schedule


This plot shows the epsilon value used for exploration.
Epsilon decays over time, meaning the agent explores less and exploits more as training progresses.
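The PR does not spell out the concrete schedule, but a common choice (used here only as an illustration, in Python rather than the demo's OCaml) is exponential decay with a floor:

```python
def epsilon(episode, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Exploration rate for a given episode: exponential decay with a floor.

    All three constants are illustrative defaults, not the demo's settings.
    The agent takes a random action with probability epsilon, and the
    greedy action (argmax of Q) otherwise.
    """
    return max(eps_end, eps_start * decay ** episode)
```

The floor `eps_end` keeps a small amount of exploration even late in training, which is why the plotted curve flattens out rather than reaching zero.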

Average Q-value


This plot shows the average Q-value per episode.
Tracking average Q helps diagnose learning stability and value estimation quality.
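One common way to compute this metric, sketched below in Python for illustration (the demo's OCaml implementation may aggregate differently), is to average the greedy Q-value over the states visited in an episode:

```python
import numpy as np

def average_q(q_fn, states):
    """Mean over visited states of max_a Q(s, a).

    A steadily rising curve that then stabilizes is the healthy pattern;
    unbounded growth often signals value overestimation.
    """
    return float(np.mean([np.max(q_fn(s)) for s in states]))

# Toy Q-function over two actions (purely illustrative).
q_fn = lambda s: np.array([s * 0.1, s * 0.2])
avg = average_q(q_fn, states=[1, 2, 3])  # greedy values are roughly 0.2, 0.4, 0.6
```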


Implementation Notes

  • Due to MDX limitations, plot images are generated with the Hugin library in a standalone OCaml script or in utop, not directly inside MDX code blocks.
  • Workflow:
    • The notebook contains the plotting code, but users must run it outside MDX to generate the image file.
    • The image should be placed in the same directory as the notebook for markdown preview.
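Outside MDX, any CSV-aware tool can consume the metrics file the training run writes; the demo's own script does this with Hugin in OCaml. As an illustration of the workflow only (the column names below are assumptions, not the demo's actual schema), a Python reader might look like:

```python
import csv
import io

# Illustrative metrics CSV; the real file is produced during training.
sample = """episode,reward,length,loss,epsilon,avg_q
0,-1.2,200,0.91,1.00,0.05
1,-0.8,150,0.74,0.95,0.11
"""

rows = list(csv.DictReader(io.StringIO(sample)))
rewards = [float(r["reward"]) for r in rows]
losses = [float(r["loss"]) for r in rows]
# These series can then be handed to any plotting tool
# (Hugin in the demo's OCaml script, or matplotlib in Python).
```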

Summary of Changes

- Implemented a new demo for training a DQN agent in a GridWorld environment.
- Added a script to plot training metrics from CSV data.
- Removed the old DQN training example.
- Updated DQN algorithm files to support new features.
- Created tests for the new plotting functionality.
- Added documentation for the DQN GridWorld demo, including setup and visualization instructions.
- Included sample videos of agent behavior before and after training.
Closes: Add a step-by-step pedagogical demo for DQN, including training and visualization