Add OSV recidivism enrichment pipeline, repository mirror tooling, and INI-based local configuration #1
Agent-Logs-Url: https://github.com/VulnerabilityHistoryProject/recidivism/sessions/e751bef4-2b38-43c5-84cc-0419afe5499e Co-authored-by: andymeneely <341847+andymeneely@users.noreply.github.com>
@copilot Have the scripts get their settings from a file called "recidivism.ini" that will have any local locations of inputs and outputs. Git should ignore recidivism.ini but we should have a recidivism.default.ini that has everything we need to configure. If you run the code without a recidivism.ini, output a message about what they should do to configure this locally.
Agent-Logs-Url: https://github.com/VulnerabilityHistoryProject/recidivism/sessions/a059b408-391c-467d-b495-6d9d3789015b Co-authored-by: andymeneely <341847+andymeneely@users.noreply.github.com>
@copilot Don't actually try to calculate recidivism. Instead, put a placeholder there and remove the code that was guessing what vulnerability recidivism actually is.
Implemented in commit d146354. Both scripts now load settings from |
andymeneely left a comment
This is in the ballpark of what I'm looking for in terms of making our own copy of OSV data with recidivism in it. Let's start with running these and refining them.
This PR adds scripts for OSV-based research workflows: enriching vulnerability records with a recidivism-derived severity metric and materializing local clones of referenced source repositories. It also introduces shared parsing/scoring utilities, INI-driven local configuration, and focused unit coverage for both metric and configuration primitives.
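To make the repository-mirroring step concrete, here is a minimal sketch of how GitHub repositories might be discovered in an OSV record's `references` and mapped to a per-owner directory. It is not the code from `scripts/clone_osv_repositories.py`: the function names `github_repos` and `mirror_path`, the regular expression, and the choice to lowercase names are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Hypothetical pattern: https://github.com/<owner>/<repo>, optionally followed by
# ".git" or a deeper path such as /commit/<sha>. Other github.com URLs with two
# path segments (for example advisory pages) would also match and need filtering.
GITHUB_REPO_RE = re.compile(
    r"^https?://github\.com/(?P<owner>[A-Za-z0-9_.-]+)/"
    r"(?P<repo>[A-Za-z0-9_.-]+?)(?:\.git)?(?:[/?#].*)?$"
)


def github_repos(osv_record: dict) -> set[tuple[str, str]]:
    """Collect (owner, repo) pairs from the record's reference URLs."""
    repos = set()
    for ref in osv_record.get("references", []):
        match = GITHUB_REPO_RE.match(ref.get("url", ""))
        if match:
            # Assumption: lowercase both parts so the same repository is not
            # mirrored twice under differently cased names.
            repos.add((match["owner"].lower(), match["repo"].lower()))
    return repos


def mirror_path(target_dir: Path, owner: str, repo: str) -> Path:
    """Place each clone under <target-dir>/<owner>/<repo>."""
    return target_dir / owner / repo


record = {
    "references": [
        {"type": "FIX", "url": "https://github.com/curl/curl/commit/abc123"},
        {"type": "WEB", "url": "https://github.com/Other/curl.git"},
        {"type": "WEB", "url": "https://example.com/advisory"},
    ]
}
for owner, repo in sorted(github_repos(record)):
    print(mirror_path(Path("mirrors"), owner, repo))
```

Keeping the owner in the path means two forks that share a repository name, such as the two `curl` entries in the sample record, end up in separate directories.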
OSV ingestion + recidivism enrichment
- scripts/enrich_osv_recidivism.py to: database_specific.recidivism, RECIDIVISM and RECIDIVISM_ADJUSTED.

Shared vulnerability analysis primitives
- scripts/osv_common.py with reusable functions for: 0.0..10.0),

Repository cloning for cluster-local mirrors
- scripts/clone_osv_repositories.py to discover GitHub repos in OSV references and clone/update them locally.
- <target-dir>/<owner>/<repo> to avoid cross-owner name collisions.

INI-based local configuration
- recidivism.default.ini.
- scripts/recidivism_config.py to load settings from recidivism.ini with fallback to defaults.
- .gitignore to ignore recidivism.ini.
- If recidivism.ini is missing, scripts print guidance to copy and edit recidivism.default.ini.

Documentation + focused tests
- README.md with config setup and script usage.
- tests/test_osv_common.py covering extraction behavior and recidivism score math edge cases.
- tests/test_recidivism_config.py covering fallback behavior, path resolution, and required-value validation.
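The configuration behavior described above (local `recidivism.ini`, fallback to `recidivism.default.ini`, guidance when the local file is missing) could look roughly like the sketch below. It uses only the standard-library `configparser`; the function names `load_config` and `require_path`, the `[paths]` section, and the `osv_dir` key are hypothetical and are not taken from `scripts/recidivism_config.py`.

```python
import configparser
import sys
from pathlib import Path

CONFIG_NAME = "recidivism.ini"
DEFAULT_NAME = "recidivism.default.ini"


def load_config(root: Path) -> configparser.ConfigParser:
    """Read defaults first, then let recidivism.ini override them if present."""
    local = root / CONFIG_NAME
    if not local.exists():
        # Guidance for first-time setup; the defaults are still loaded below.
        print(
            f"{CONFIG_NAME} not found in {root}. Copy {DEFAULT_NAME} to "
            f"{CONFIG_NAME} and edit the paths for your machine.",
            file=sys.stderr,
        )
    parser = configparser.ConfigParser()
    # ConfigParser.read skips files that do not exist; later files win.
    parser.read([root / DEFAULT_NAME, local])
    return parser


def require_path(
    parser: configparser.ConfigParser, section: str, key: str, root: Path
) -> Path:
    """Return a configured path, resolving relative values against root."""
    value = parser.get(section, key, fallback="").strip()
    if not value:
        raise ValueError(f"[{section}] {key} must be set in {CONFIG_NAME}")
    path = Path(value).expanduser()
    return path if path.is_absolute() else root / path
```

A script would then call something like `require_path(load_config(repo_root), "paths", "osv_dir", repo_root)` at startup, where `repo_root` is whatever directory holds the two INI files. Whether a missing `recidivism.ini` should only warn, as in this sketch, or stop the script is a design choice the conversation leaves open.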