Skip to content

Conversation

@arjavdongaonkar
Copy link
Contributor

@arjavdongaonkar arjavdongaonkar commented Nov 5, 2025

Description

New Batch Implementation Improvements (with approximate numbers)

Batch Mode Behavior

1. Component UUID lookups:
Instead of one lookup per component inside the loop → performed once
200 component queries → now 1 batched loop (≈200 ops total) (unchanged count but centralized)

2. Vulnerability existence checks:
Old: 3,000 “source + vulnId” lookups
New: Build a map of unique (source, vulnId) pairs, e.g. 3,000 unique →
Single getVulnerabilityByVulnIds() call
Replaces 3,000 DB queries with 1 batched query

3. Creating new vulnerabilities:
Still O(#missing vulns), but performed in one transaction scope.
Index event dispatched just once.

4. Adding vulnerability–component mappings:
Still requires writes, but all done in a tight loop inside one transaction.
No repeated component fetches
No repeated vulnerability lookups

Estimated totals (new code)

Operation Old New
Vulnerability lookups 3,000 queries 1 query
Component lookups 200 queries 200 lookups in a single block
Add vulnerability to component 3,000 writes 3,000 writes (unavoidable)
Severity updates 300 writes 300 writes
Index commits N commits 1 commit

Grand total (new code)

~200 component lookups (no change)
1 DB query instead of 3,000 for vuln existence checks
~3,300 writes (same as old, but in one transaction block)
Significant transaction overhead reduction


Impact Summary

Metric Old Code New Code
Vulnerability lookups ~3,000 DB roundtrips 1 DB roundtrip
IndexEvent dispatches many 1
Transactions opened ~200 (one per component) 1
Total DB operations ~6,500+ ~3,500 but batched and far cheaper

Overall effect

  • Dramatically less DB round-trip latency
  • Higher throughput during large Trivy scans
  • Lower CPU + I/O load on the database
  • More predictable performance for large SBOMs

Addressed Issue

closes #5497

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have added corresponding update logic
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

@owasp-dt-bot
Copy link

owasp-dt-bot commented Nov 5, 2025

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@arjavdongaonkar arjavdongaonkar force-pushed the optimize-trivy-analyzer branch from 350b850 to 5a85de4 Compare November 6, 2025 05:08
@codacy-production
Copy link

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
Report missing for 72dfb1a1 81.63% (target: 70.00%)
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (72dfb1a) Report Missing Report Missing Report Missing
Head commit (5a85de4) 24144 19529 80.89%

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#5498) 98 80 81.63%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Footnotes

  1. Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

@nscuro nscuro added enhancement New feature or request integration/trivy Related to the Trivy integration labels Dec 4, 2025
Copy link
Member

@nscuro nscuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. A few change requests.

Also, it would be great if you could at least mention that AI was used for this, and to what capacity. The PR description was clearly written by AI, and makes various claims that are not reflected in the code.

handleAllComponentsBatchOptimized(vulnsByComponent);
}

private void handleAllComponentsBatchOptimized(final Map<Component, List<trivy.proto.common.Vulnerability>> vulnsByComponent) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a method name that is a bit more concise? "Batch omtimized" is an implementation detail.

Comment on lines +386 to +389
Component persistent = qm.getObjectByUuid(Component.class, uuid);
if (persistent != null) {
persistentComponents.put(uuid, persistent);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not fetch the components in batches as well?

Comment on lines +814 to +817
public List<Vulnerability> getVulnerabilityByVulnIds(Map<String, Set<String>> sourceToVulnIds,
boolean includeVulnerableSoftware) {
return getVulnerabilityQueryManager().getVulnerabilitiesBySourceAndVulnIds(sourceToVulnIds, includeVulnerableSoftware);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method names are different: getVulnerabilityByVulnIds vs. getVulnerabilitiesBySourceAndVulnIds.

List<Vulnerability> fetchedVulns = qm.getVulnerabilityByVulnIds(vulnIds, false);

for (Vulnerability vuln : fetchedVulns) {
existingVulns.put(vuln.getSource() + ":" + vuln.getVulnId(), vuln);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use VulnIdAndSource as key instead.

Comment on lines +439 to +441

@SuppressWarnings("unchecked")
List<Vulnerability> fetched = (List<Vulnerability>) query.executeWithMap(paramValues);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@SuppressWarnings("unchecked")
List<Vulnerability> fetched = (List<Vulnerability>) query.executeWithMap(paramValues);
query.setNamedParameters(paramValues);
List<Vulnerability> fetched = executeAndCloseList(query);

Comment on lines +447 to +453
@SuppressWarnings("unchecked")
List<Vulnerability> detached = (List<Vulnerability>) pm.detachCopyAll(fetched);

// Safely apply aliases
for (Vulnerability vuln : detached) {
vuln.setAliases(getVulnerabilityAliases(vuln));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aliases are not required for the Trivy analysis task. Detaching has a negative effect for the Trivy task because it then has to immediately attach the records back in order to perform persistence operations (e.g., adding a component) with them.

Comment on lines +395 to +396
Map<String, Set<String>> sourceToVulnIds,
boolean includeVulnerableSoftware) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider accepting Collection<VulnIdAndSource> here instead, and omitting the includeVulnerableSoftware parameter. VulnerableSoftware records are not required for the Trivy analysis task.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request integration/trivy Related to the Trivy integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Trivy Analyzer Performs Excessive Repeated DB Queries During Vulnerability Ingestion

3 participants