Rainforests training plugins#2244
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2244 +/- ##
==========================================
- Coverage 98.39% 95.22% -3.17%
==========================================
Files 124 151 +27
Lines 12212 15237 +3025
==========================================
+ Hits 12016 14510 +2494
- Misses 196 727 +531 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
benowen-bom
left a comment
There was a problem hiding this comment.
I've had a brief look over your PR. I think this is a really good place to build up from and the logic you have there is sound.
I have some broad comments which would be good to put up front before committing to a detailed review. The first is a suggestion, and the other question, both relating to structural considerations which would be good to pin down now.
Suggestion:
the calibration works by building a series of models defined over a set of thresholds. I would suggest that threshold argument in process function within TrainRainForestsModel become a list of thresholds thresholds and add a loop over these thresholds to write out a series of files. I appreciate that producing a series of files is not the IMPROVER norm, but ideally we get a series of models to map out the CDF across multiple values
Question:
do you think there is any value in bringing the compilation within the TrainRainForestsModel as add a flag to the process function to compile? That would simplify the issue of having to run the compilation over many files and would ensure there is a 1-1 mapping for the Light-GBM files and the compiled treelite files.
Thanks @benowen-bom, I've finally got back to this now.
That was easy enough to change and looks okay. I've just had to add the threshold to each filename to make them unique.
It still makes sense to me to keep a dedicated class for compilation. |
This is simpler than generating file paths within the plugin
benowen-bom
left a comment
There was a problem hiding this comment.
Sorry, this is only a partial review: I'll review the tests next, but want to make sure I don't lose comments so far.
| if output_path.suffix.lower() != TREELITE_EXTENSION: | ||
| raise ValueError(f"Output path must have extension {TREELITE_EXTENSION}") | ||
|
|
||
| Path.mkdir(output_path.parent, parents=True, exist_ok=True) |
There was a problem hiding this comment.
I'm not sure whether this should occur here, or whether it should be assumed that the dir exists; generally directory creation sits outside of improver (ie handled by the suite). That being said, the paradigm of rainforests is somewhat exceptional, and this approach simplifies things significantly, particularly when files across lead-times or threshold reside in separate folders.
Leave this as is, but I would be interested to hear what other think.
benowen-bom
left a comment
There was a problem hiding this comment.
I've finally finished reviewing the rest of the PR. You've done a remarkable job at taking what was in raincal and producing something that is generic and flexible for general calibration problems.
I've got a few comments that would be worth considering. Once we've worked through these I'm happy to approve, but I think it would be worthwhile getting someone else (more technically savvy) to review.
|
During the course of @benowen-bom's review, I've also gone through the exercise of testing the plugins in a raincal-like workflow. This has resulted in a highly simplified version of the raincal training scripts, which I've added to this repo Forecast-Improvement/raincal_lite. During this process, I changed the design of the plugins slightly. They now accept the same type of " After this PR is merged, we will need further changes to provide the acceptance tests and documentation for these plugins. I've intentionally kept those out of this PR to limit the amount of changes to be reviewed at once, and to allow the design to continue to change as needed during this review. |
benowen-bom
left a comment
There was a problem hiding this comment.
Thanks @lukehoffmann for working through my suggestions. I've had a look over the changes and I'm happy to approve the PR now, particularly the additions to the tests. I appreciate that the nature of these routines make it tricky to test, but I think where this has landed is a good balance between thorough without being excessive.
Great work! I think it would be good to have another review (I see that you have requested a review from @dmentipl).
|
Also, I think @2384 address the failing check above. |
Description
Adds some plugins to perform the basic tasks for training RainForests calibration models.
TrainRainForestsModelto generate the LightGBM tree, andCompileRainForestsModelto compile these with TreeliteTesting:
CLA