[BIO-303] support LoRA in evo2 mbridge #1550
2b8e0ea to 306f0e4
306f0e4 to b1e54ff
jstjohn
left a comment
Looks good, but the biggest gap I see is around documenting how to ship and ingest a LoRA checkpoint. Is it separate from the rest of the model, or is it a combined checkpoint with everything in it, including the original weights? I'd say call this out in the docs so people know what to expect.
The other gap I see is the head-swap use case, which is pretty common for fine-tuning. For now you could describe how you would do it in theory. I think it would end up being a new model type or a new training script that has the desired structure in code. Beyond that, I'm not 100% sure how to load the existing weights into it while gracefully handling the missing new head weight or a changed shape; maybe it means writing a new checkpoint converter that does the necessary weight munging? Anyway, calling out in the README how that would be done will probably save us support time in the long term.
Thanks!
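To make the head-swap question above concrete, here is a minimal sketch of the "weight munging" step: keep only the pretrained entries whose name and shape match the new model, and report the rest. This is an illustration under stated assumptions, not the converter used in this PR. The function name `split_compatible_weights` and the parameter names in the example are hypothetical, and the mappings are assumed to go from parameter name to any object with a `.shape` attribute (for example a tensor).

```python
from typing import Any, Dict, List, Mapping, Tuple


def split_compatible_weights(
    pretrained: Mapping[str, Any],
    target: Mapping[str, Any],
) -> Tuple[Dict[str, Any], List[str], List[str]]:
    """Keep only pretrained entries whose name and shape match the target model.

    Hypothetical helper: both mappings go from parameter name to any object
    exposing a ``.shape`` attribute.
    """
    compatible: Dict[str, Any] = {}
    mismatched: List[str] = []
    for name, weight in pretrained.items():
        if name not in target:
            continue  # no counterpart in the new model; drop it
        if tuple(weight.shape) == tuple(target[name].shape):
            compatible[name] = weight
        else:
            mismatched.append(name)  # e.g. a head whose output size changed
    # Target parameters that will keep their fresh initialization.
    missing = [name for name in target if name not in compatible]
    return compatible, mismatched, missing
```

With plain PyTorch modules, the `compatible` dict could then be passed to `model.load_state_dict(compatible, strict=False)` so the new head keeps its initialization. Whether the same approach carries over to the distributed checkpoint format used here is an open question that the eventual documentation would need to answer.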
Signed-off-by: Bruno Alvisio <balvisio@nvidia.com>
b1e54ff to 37ee6ef
Signed-off-by: Bruno Alvisio <balvisio@nvidia.com>
affb95a to 66da18e
Signed-off-by: Bruno Alvisio <balvisio@nvidia.com>
66da18e to 7775b41
@jstjohn I added the documentation on how LoRA adapters are saved and how to use them along with the base checkpoint for inference. I will add the guidance on modifying the model head as part of the next PR, which contains a LoRA-focused Jupyter notebook.
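For readers unfamiliar with how an adapter combines with a base checkpoint, the sketch below shows the standard LoRA update, W' = W + (alpha / r) * B A, applied to a single weight matrix. It assumes the adapter is stored as two low-rank factors separate from the frozen base weight and uses the common alpha / r scaling; the function name `merge_lora` is hypothetical, and plain Python lists are used only to keep the example dependency-free. It does not describe the on-disk layout or loading code in this PR.

```python
from typing import List

Matrix = List[List[float]]


def merge_lora(base: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Return base + (alpha / r) * (lora_b @ lora_a) without mutating the inputs.

    base:   d_out x d_in frozen weight
    lora_a: r x d_in down-projection
    lora_b: d_out x r up-projection
    """
    rank = len(lora_a)
    if rank == 0 or any(len(row) != rank for row in lora_b):
        raise ValueError("lora_b must have one column per row of lora_a")
    scale = alpha / rank
    merged: Matrix = []
    for i, base_row in enumerate(base):
        merged.append([
            base_row[j] + scale * sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            for j in range(len(base_row))
        ])
    return merged
```

Keeping the factors separate means only the small adapter needs to be shipped, while merging them into the base weight yields a single matrix for inference. Which of the two this PR's inference path does is described in its own documentation, not here.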
Description
Usage
Type of changes
CI Pipeline Configuration
Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.
Unit tests marked as @pytest.mark.multi_gpu or @pytest.mark.distributed are not run in the PR pipeline.
For more details, see CONTRIBUTING
Note
By default, only basic unit tests are run. Add the appropriate labels to enable additional test coverage.
Authorizing CI Runs
We use copy-pr-bot to manage authorization of CI runs on NVIDIA's compute resources.
… automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
… /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.
Triggering Code Rabbit AI Review
To trigger a code review from CodeRabbit, comment on a pull request with one of these commands:
See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.
Pre-submit Checklist