-
-
Notifications
You must be signed in to change notification settings - Fork 699
feat: add batch processing for Trivy vulnerabilities and new vulnerab… #5498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: add batch processing for Trivy vulnerabilities and new vulnerab… #5498
Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
ab7d82c to
609d22f
Compare
…ility retrieval method Signed-off-by: Arjav <[email protected]>
Signed-off-by: Arjav <[email protected]>
Signed-off-by: Arjav <[email protected]>
Signed-off-by: Arjav <[email protected]>
Signed-off-by: Arjav <[email protected]>
Signed-off-by: Arjav <[email protected]>
Signed-off-by: Arjav <[email protected]>
350b850 to
5a85de4
Compare
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferencesFootnotes
|
nscuro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. A few change requests.
Also, it would be great if you could at least mention that AI was used for this, and to what capacity. The PR description was clearly written by AI, and makes various claims that are not reflected in the code.
| handleAllComponentsBatchOptimized(vulnsByComponent); | ||
| } | ||
|
|
||
| private void handleAllComponentsBatchOptimized(final Map<Component, List<trivy.proto.common.Vulnerability>> vulnsByComponent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a method name that is a bit more concise? "Batch omtimized" is an implementation detail.
| Component persistent = qm.getObjectByUuid(Component.class, uuid); | ||
| if (persistent != null) { | ||
| persistentComponents.put(uuid, persistent); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not fetch the components in batches as well?
| public List<Vulnerability> getVulnerabilityByVulnIds(Map<String, Set<String>> sourceToVulnIds, | ||
| boolean includeVulnerableSoftware) { | ||
| return getVulnerabilityQueryManager().getVulnerabilitiesBySourceAndVulnIds(sourceToVulnIds, includeVulnerableSoftware); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method names are different: getVulnerabilityByVulnIds vs. getVulnerabilitiesBySourceAndVulnIds.
| List<Vulnerability> fetchedVulns = qm.getVulnerabilityByVulnIds(vulnIds, false); | ||
|
|
||
| for (Vulnerability vuln : fetchedVulns) { | ||
| existingVulns.put(vuln.getSource() + ":" + vuln.getVulnId(), vuln); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use VulnIdAndSource as key instead.
|
|
||
| @SuppressWarnings("unchecked") | ||
| List<Vulnerability> fetched = (List<Vulnerability>) query.executeWithMap(paramValues); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @SuppressWarnings("unchecked") | |
| List<Vulnerability> fetched = (List<Vulnerability>) query.executeWithMap(paramValues); | |
| query.setNamedParameters(paramValues); | |
| List<Vulnerability> fetched = executeAndCloseList(query); |
| @SuppressWarnings("unchecked") | ||
| List<Vulnerability> detached = (List<Vulnerability>) pm.detachCopyAll(fetched); | ||
|
|
||
| // Safely apply aliases | ||
| for (Vulnerability vuln : detached) { | ||
| vuln.setAliases(getVulnerabilityAliases(vuln)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aliases are not required for the Trivy analysis task. Detaching has a negative effect for the Trivy task because it then has to immediately attach the records back in order to perform persistence operations (e.g., adding a component) with them.
| Map<String, Set<String>> sourceToVulnIds, | ||
| boolean includeVulnerableSoftware) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider accepting Collection<VulnIdAndSource> here instead, and omitting the includeVulnerableSoftware parameter. VulnerableSoftware records are not required for the Trivy analysis task.
Description
New Batch Implementation Improvements (with approximate numbers)
Batch Mode Behavior
1. Component UUID lookups:
Instead of one lookup per component inside the loop → performed once
200 component queries → now 1 batched loop (≈200 ops total) (unchanged count but centralized)
2. Vulnerability existence checks:
Old: 3,000 “source + vulnId” lookups
New: Build a map of unique
(source, vulnId)pairs, e.g. 3,000 unique →Single
getVulnerabilityByVulnIds()callReplaces 3,000 DB queries with 1 batched query
3. Creating new vulnerabilities:
Still O(#missing vulns), but performed in one transaction scope.
Index event dispatched just once.
4. Adding vulnerability–component mappings:
Still requires writes, but all done in a tight loop inside one transaction.
No repeated component fetches
No repeated vulnerability lookups
Estimated totals (new code)
Grand total (new code)
~200 component lookups (no change)
1 DB query instead of 3,000 for vuln existence checks
~3,300 writes (same as old, but in one transaction block)
Significant transaction overhead reduction
Impact Summary
Overall effect
Addressed Issue
closes #5497
Checklist