Skip to content

Rainforests training plugins#2244

Open
lukehoffmann wants to merge 25 commits into
metoppv:masterfrom
lukehoffmann:rainforests_training
Open

Rainforests training plugins#2244
lukehoffmann wants to merge 25 commits into
metoppv:masterfrom
lukehoffmann:rainforests_training

Conversation

@lukehoffmann
Copy link
Copy Markdown

Description

Adds some plugins to perform the basic tasks for training RainForests calibration models.

  • TrainRainForestsModel to generate the LightGBM tree, and
  • CompileRainForestsModel to compile these with Treelite

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

CLA

  • If a new developer, signed up to CLA

@codecov
Copy link
Copy Markdown

codecov Bot commented Nov 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.22%. Comparing base (84a8944) to head (fb7bce7).
⚠️ Report is 152 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2244      +/-   ##
==========================================
- Coverage   98.39%   95.22%   -3.17%     
==========================================
  Files         124      151      +27     
  Lines       12212    15237    +3025     
==========================================
+ Hits        12016    14510    +2494     
- Misses        196      727     +531     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lukehoffmann lukehoffmann added the BoM review required PRs opened by non-BoM developers that require a BoM review label Dec 7, 2025
@lukehoffmann lukehoffmann marked this pull request as ready for review December 7, 2025 23:30
Copy link
Copy Markdown
Contributor

@benowen-bom benowen-bom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a brief look over your PR. I think this is a really good place to build up from and the logic you have there is sound.

I have some broad comments which would be good to put up front before committing to a detailed review. The first is a suggestion, and the other question, both relating to structural considerations which would be good to pin down now.

Suggestion:
the calibration works by building a series of models defined over a set of thresholds. I would suggest that threshold argument in process function within TrainRainForestsModel become a list of thresholds thresholds and add a loop over these thresholds to write out a series of files. I appreciate that producing a series of files is not the IMPROVER norm, but ideally we get a series of models to map out the CDF across multiple values

Question:
do you think there is any value in bringing the compilation within the TrainRainForestsModel as add a flag to the process function to compile? That would simplify the issue of having to run the compilation over many files and would ensure there is a 1-1 mapping for the Light-GBM files and the compiled treelite files.

Comment thread improver/calibration/rainforest_compilation.py Outdated
@lukehoffmann
Copy link
Copy Markdown
Author

lukehoffmann commented Jan 21, 2026

I've had a brief look over your PR. I think this is a really good place to build up from and the logic you have there is sound.

Thanks @benowen-bom, I've finally got back to this now.

Suggestion: the calibration works by building a series of models defined over a set of thresholds. I would suggest that threshold argument in process function within TrainRainForestsModel become a list of thresholds thresholds and add a loop over these thresholds to write out a series of files. I appreciate that producing a series of files is not the IMPROVER norm, but ideally we get a series of models to map out the CDF across multiple values

That was easy enough to change and looks okay. I've just had to add the threshold to each filename to make them unique.

Question: do you think there is any value in bringing the compilation within the TrainRainForestsModel as add a flag to the process function to compile? That would simplify the issue of having to run the compilation over many files and would ensure there is a 1-1 mapping for the Light-GBM files and the compiled treelite files.

It still makes sense to me to keep a dedicated class for compilation. I've tried adding an option to pass this object into the training class and optionally compile at the same time as training. Let me know what you think.
Update: I've ended up removing this option to train and compile in one step.. It didn't make sense to me when I went through the worked example in Forecast-Improvement/raincal_lite.

Copy link
Copy Markdown
Contributor

@benowen-bom benowen-bom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this is only a partial review: I'll review the tests next, but want to make sure I don't lose comments so far.

Comment thread improver/calibration/rainforest_compilation.py
Comment thread improver/calibration/rainforest_compilation.py Outdated
Comment thread improver/calibration/rainforest_compilation.py Outdated
if output_path.suffix.lower() != TREELITE_EXTENSION:
raise ValueError(f"Output path must have extension {TREELITE_EXTENSION}")

Path.mkdir(output_path.parent, parents=True, exist_ok=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether this should occur here, or whether it should be assumed that the dir exists; generally directory creation sits outside of improver (ie handled by the suite). That being said, the paradigm of rainforests is somewhat exceptional, and this approach simplifies things significantly, particularly when files across lead-times or threshold reside in separate folders.

Leave this as is, but I would be interested to hear what other think.

Comment thread improver/calibration/rainforest_training.py Outdated
Comment thread improver/calibration/rainforest_training.py Outdated
Comment thread improver/calibration/rainforest_training.py Outdated
Comment thread improver/calibration/rainforest_training.py Outdated
Copy link
Copy Markdown
Contributor

@benowen-bom benowen-bom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've finally finished reviewing the rest of the PR. You've done a remarkable job at taking what was in raincal and producing something that is generic and flexible for general calibration problems.

I've got a few comments that would be worth considering. Once we've worked through these I'm happy to approve, but I think it would be worthwhile getting someone else (more technically savvy) to review.

Comment thread improver_tests/calibration/rainforests_training/conftest.py
Comment thread improver_tests/calibration/rainforests_training/test_CompileRainForestsModel.py Outdated
@lukehoffmann
Copy link
Copy Markdown
Author

During the course of @benowen-bom's review, I've also gone through the exercise of testing the plugins in a raincal-like workflow. This has resulted in a highly simplified version of the raincal training scripts, which I've added to this repo Forecast-Improvement/raincal_lite.

During this process, I changed the design of the plugins slightly. They now accept the same type of "model_config_dict" object as the ApplyRainForestsCalibration plugins. This specifies the full set of lead times and thresholds, and the full paths for all of the trained and compiled model files.

After this PR is merged, we will need further changes to provide the acceptance tests and documentation for these plugins. I've intentionally kept those out of this PR to limit the amount of changes to be reviewed at once, and to allow the design to continue to change as needed during this review.

Comment thread improver/calibration/rainforest_compilation.py
Comment thread improver/calibration/rainforest_training.py
Copy link
Copy Markdown
Contributor

@benowen-bom benowen-bom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lukehoffmann for working through my suggestions. I've had a look over the changes and I'm happy to approve the PR now, particularly the additions to the tests. I appreciate that the nature of these routines make it tricky to test, but I think where this has landed is a good balance between thorough without being excessive.

Great work! I think it would be good to have another review (I see that you have requested a review from @dmentipl).

@benowen-bom
Copy link
Copy Markdown
Contributor

benowen-bom commented May 19, 2026

Also, I think @2384 address the failing check above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

BoM review required PRs opened by non-BoM developers that require a BoM review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants