Save attached outputs for compile-only cache entries#394
Conversation
Updates Based on Reviewer FeedbackI've improved this PR to address the concerns raised by @AlexanderAshitkin in PR #176's review: Code Improvements
Addressing "Save/Restore Consistency"The reviewer's concern about consistency between save and restore operations is addressed: Save side (this PR):
Restore side (existing code):
This design ensures:
TestingThe integration test validates the complete workflow: ./mvnw -Prun-its -Dit.test=Issue393CompileRestoreTest verifyTest coverage includes:
|
55c6e7a to
489ff03
Compare
| final boolean hasPackagePhase = project.hasLifecyclePhase("package"); | ||
|
|
||
| attachGeneratedSources(project); | ||
| attachOutputs(project); |
There was a problem hiding this comment.
Conditioning this logic on package served two main purposes:
-
It prevented premature caching of outputs if the corresponding lifecycles have not run. Doing otherwise could lead to costly and incorrect caching or restoration of the previous branch's compilation output. Here’s a problematic sequence:
- Compile Branch A and then checkout Branch B.
- Run
process-resourcesin Branch B. Result: the compiled sources of Branch A are cached, under checksum for Branch B. - Run the compilation in Branch B. Result: the compiled classes from Branch A are restored in Branch B, potentially interfering with the compilation of Branch B.
Although such cached entries can be replaced later, it is still not desirable.
-
Conditioning on
packagealso reduces the number of caching and unpacking operations. Specifically, it avoids the need to zip, download, or upload files during every compilation, which helps maintain performance. When an engineer is actively working on the code, is it really beneficial to cache intermediate results? It is challenging to estimate how useful this would be in practice, and I assume it won’t always be desirable.
Just thinking aloud, from the user's perspective, intentionally invoking restoration seems to offer better usability. In that sense, Gradle cache model is also flawed - saving a zip of classes with every compile task seems excessive and results in the io/cpu spent on creation of unnecessary cache entries that contain intermediate compilation results. In that sense package is not that bad - it allows to throttle costly operation and run it together with other costly IO work.
There was a problem hiding this comment.
Ah! That makes much more sense. Thank you for clarifying why this logic was conditioned no the package phase.
To tackle this problem, I've combined timestamp-based filtering with per-project thread-safe isolation to prevent both the branch-switching scenario and race conditions in multi-threaded builds.
How It Works
1. Timestamp Filtering
The cache now only captures outputs that were either:
- Modified during the current build (timestamp >= buildStartTime), OR
- Restored from cache during this build (tracked explicitly)
This elegantly handles your branch-switching scenario:
- mvn compile on Branch A at 10:00:00 → target/classes modified at 10:00:01
- git checkout branch-b
- mvn process-resources on Branch B starts at 10:01:00
- Check target/classes: lastModified (10:00:01) < buildStartTime (10:01:00)
AND not restored this build → SKIPPED ✓
2. Handling Cache Hits
Your question about mvn package with a compile cache hit is crucial. The solution tracks restored outputs:
- First build: mvn compile at 10:00:00
- Compiles classes, caches them ✓
- Second build: mvn package at 10:05:00
- Compile phase: Cache hit, restores to target/classes
- Restoration tracked: classifier added to state.restoredOutputClassifiers
- Package phase save() checks:
- Fresh package outputs: timestamp >= 10:05:00 → include ✓
- Restored compile outputs: in restoredOutputClassifiers → include ✓
- Stale old outputs: timestamp < 10:05:00 AND not restored → exclude ✓
3. Thread Safety for Multi-Threaded Builds
Per your comment about multi-threaded builds, I've also fixed the existing thread safety issues:
Problem: CacheControllerImpl is @SessionScoped (one instance shared across all modules). The original code used:
private final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
private int attachedResourceCounter = 0;With mvn -T 4, calling clear() in one thread would affect other threads' modules.
Solution: Per-project isolation using ConcurrentHashMap:
private static class ProjectCacheState {
final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
int attachedResourceCounter = 0;
final Set<String> restoredOutputClassifiers = new HashSet<>();
}
private final ConcurrentMap<String, ProjectCacheState> projectStates = new ConcurrentHashMap<>();Each module gets isolated state. Cleanup happens per-project in save()'s finally block.
Implementation Details
In CacheControllerImpl.java:
- Capture build start time:
final long buildStartTime = session.getRequest().getStartTime().getTime(); - Track restored outputs in restoreProjectArtifacts():
state.restoredOutputClassifiers.add(attachedArtifactInfo.getClassifier()); - Check timestamps in
attachDirIfNotEmpty():
long lastModified = Files.getLastModifiedTime(candidateSubDir).toMillis();
boolean isRestoredThisBuild = state.restoredOutputClassifiers.contains(classifier);
if (lastModified < buildStartTime && !isRestoredThisBuild) {
// Skip stale outputs
return;
}Testing
All existing tests pass, and the approach has been validated through:
- Compilation verification
- Full test suite execution
- Thread safety analysis of concurrent access patterns
Let me know if you have any concerns about this approach or if you'd like me to add additional safeguards or test cases.
Thread Safety Improvements: - Replace HashMap with ConcurrentHashMap for thread-safe access - Implement per-project isolation for attachedResourcesPathsById - Implement per-project counters to prevent race conditions - Remove clear() pattern that caused races in multi-threaded builds Each project now has isolated state (project key → resources map), preventing cross-module contamination and race conditions in `mvn -T` builds. Configuration Property: - Add saveCompileOutputs property (default: true) - Allows users to control compile-phase caching behavior - Provides opt-out for users wanting reduced I/O during development - Default fixes JPMS module descriptor restoration bug Addresses reviewer feedback on PR apache#394: 1. Thread safety concern with shared HashMap 2. Performance/design concerns about compile-phase caching
2bf5f01 to
92822da
Compare
Addressing Performance ConcernsThank you for raising the important question about performance overhead during active development. You're absolutely right that caching compile outputs on every
Solution: Configurable Compile CachingI've added a new property # Disable compile caching during active development:
mvn clean compile -Dmaven.build.cache.cacheCompile=false
# Enable for CI/multi-module scenarios (default):
mvn clean compileDesign RationaleDefault: true - Maintains the fix for issue #393 without breaking changes. Users experiencing compile-only cache restoration issues get the fix automatically. Opt-out available - Developers actively editing code can disable compile caching to avoid overhead, while still benefiting from package-phase caching. Use CasesWhen to keep enabled (default):
When to disable:
ImplementationThe property gates the calls to if (cacheConfig.isCacheCompile()) {
attachGeneratedSources(project, state, buildStartTime);
attachOutputs(project, state, buildStartTime);
}This means:
TestingAdded
Ready for your review! |
26af745 to
3126fe1
Compare
|
If I understand correctly, all of this is for being able to use the cache with |
elharo
left a comment
There was a problem hiding this comment.
need to run mvn spotless:apply
- Upgrade spotless-maven-plugin from 2.44.5 to 3.0.0 - Upgrade palantir-java-format from 2.56.0 to 2.81.0 for Java 25 support - Run mvn spotless:apply to format code Addresses review feedback on PR apache#394.
7681019 to
a421356
Compare
|
I fixed the build failures. At the same time, I introduced a change that requires your review. To protect against caching stale artifacts caused by:
I implemented a "staging directory" approach. The performance impact should be minimal as the staging operation only occurs for uncached builds and uses fast filesystem moves rather than copies. Cached builds are unaffected. How it worksBefore the build: Pre-existing artifacts are moved to After This approach is more robust than timestamp comparison, as it physically separates potentially stale artifacts rather than relying on heuristics. |
The null/ directory was accidentally created and incorrectly added to .gitignore. This removes the incorrect gitignore entry. Addresses review: apache#394 (review)
Remove blank lines flagged as 'not needed' in code review: - BuildCacheMojosExecutionStrategy.java: after restoreProjectArtifacts call - LifecyclePhasesHelper.java: after for loop opening brace Addresses: apache#394
The logging level was changed from INFO to DEBUG for cache-related messages.
Tests that verify log output need to enable debug logging with --debug flag
to see these messages.
Added verifier.addCliOption("--debug") to:
- MandatoryCleanTest.simple()
- MandatoryCleanTest.disabledViaProperty() (after each getCliOptions().clear())
- BuildExtensionTest.skipSaving()
|
Thanks @erik-meuwese-topicus! Fixed. Can you please rerun the build? |
elharo
left a comment
There was a problem hiding this comment.
Hmm, is this whole cache system assuming git? Does it it work if a different source management system is used?
Address reviewer feedback that the cache system shouldn't assume git. The staging directory mechanism works with any SCM (or no SCM at all). Changes: - Rename GitCheckoutStaleArtifactTest → StaleArtifactTest - Rename GitCheckoutStaleMultimoduleTest → StaleMultimoduleArtifactTest - Rename test project directories (git-checkout-* → stale-*) - Update comments to say 'source changes' instead of 'git checkout' - Update JavaDoc to use generic terms like 'version A/B' and 'branch switch' instead of git-specific terminology
|
@elharo As far as I can tell, nothing in the PR's implementation is git-specific but I've committed an update that removes any git-specific terminology. Please let me know if I've missed anything. Thanks! |
There was a problem hiding this comment.
Pull request overview
This PR fixes incomplete cache restoration from compile-only builds by ensuring that configured outputs (e.g., classes, test-classes, module-info) are always attached and saved to the cache. The key change removes the conditional that previously limited output attachment to the package phase, enabling compile-only builds to create restorable cache entries.
- Introduces per-project cache state management for thread safety
- Implements staging directory mechanism to prevent caching stale artifacts
- Adds
maven.build.cache.cacheCompileflag to control compile-phase caching
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/main/java/org/apache/maven/buildcache/CacheControllerImpl.java | Core changes to save logic, artifact staging/restoration, and state management |
| src/main/java/org/apache/maven/buildcache/xml/CacheConfig.java | Adds isCacheCompile() method to configuration interface |
| src/main/java/org/apache/maven/buildcache/xml/CacheConfigImpl.java | Implements CACHE_COMPILE property with default true |
| src/main/java/org/apache/maven/buildcache/CacheController.java | Adds interface methods for staging/restoring artifacts |
| src/main/java/org/apache/maven/buildcache/BuildCacheMojosExecutionStrategy.java | Integrates artifact staging into build execution flow |
| src/main/mdo/build-cache-build.mdo | Adds isDirectory field to artifact metadata |
| src/test/java/org/apache/maven/buildcache/its/Issue393CompileRestoreTest.java | Integration test validating compile-only cache restoration |
| src/test/java/org/apache/maven/buildcache/its/CacheCompileDisabledTest.java | Tests behavior when cacheCompile flag is disabled |
| src/test/java/org/apache/maven/buildcache/its/StaleArtifactTest.java | Tests prevention of stale artifact caching |
| src/test/java/org/apache/maven/buildcache/its/StaleMultimoduleArtifactTest.java | Tests stale artifact handling in multi-module projects |
| src/test/java/org/apache/maven/buildcache/its/MandatoryCleanTest.java | Adds debug logging to existing test |
| src/test/java/org/apache/maven/buildcache/its/BuildExtensionTest.java | Adds debug logging to existing test |
| src/test/projects/issue-393-compile-restore/* | Test project files for Issue 393 integration test |
| src/test/projects/stale-artifact/* | Test project files for stale artifact test |
| src/test/projects/stale-multimodule-artifact/* | Test project files for stale multimodule artifact test |
Comments suppressed due to low confidence (1)
src/main/java/org/apache/maven/buildcache/CacheControllerImpl.java:351
- The error message 'Cannot restore file' should include the restoration path to help diagnose issues. Add restorationPath to the error message for better debugging context.
if (cacheConfig.isRestoreOnDiskArtifacts() && MavenProjectInput.isRestoreOnDiskArtifacts(project)) {
Path restorationPath = project.getBasedir().toPath().resolve(artifact.getFilePath());
final AtomicBoolean restored = new AtomicBoolean(false);
return file -> {
// Set to restored even if it fails later, we don't want multiple try
if (restored.compareAndSet(false, true)) {
verifyRestorationInsideProject(project, restorationPath);
try {
restoreArtifactToDisk(file, artifact, restorationPath);
} catch (IOException e) {
LOGGER.error("Cannot restore file " + artifact.getFileName(), e);
throw new RuntimeException(e);
}
}
return restorationPath.toFile();
};
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
This is a never-ending review cycle. What will it take to get this PR merged ...? |
- Differentiate cache skip message when cacheCompile=false vs true - Change empty directory artifact log from warn to info level - Add artifact examples to stagePreExistingArtifacts javadoc - Add proper javadoc to collectCachedArtifactPaths method - Fix resource leak: wrap Files.walk() in try-with-resources
Bukama
left a comment
There was a problem hiding this comment.
The implementation looks valuable, but I only have limited knowledge about the BCE. But what I recognize that no documentation has been update - even with the fact that there is a new configuration parameter.
|
@olamy Can you please investigate the latest build failure? It seems to be unrelated to this PR. I see:
and
|
|
Thank you for the review @Bukama! You're right - I've added documentation for the new
See commit e3d8e58. |
Document the new cacheCompile configuration parameter that controls whether compile-only builds create cache entries: - Add parameter to the command line flags table in parameters.md - Add how-to section explaining when to disable compile caching
e3d8e58 to
6a65421
Compare
|
Thanks @olamy! |
|
@cowwoc Running a build with these changes and Running it together with Would it be possible to include this use case as well, as this is an official configuration option? Thanks! |
When running with -Dmaven.build.cache.enabled=false, the build would fail with IllegalStateException: "Cache is not initialized. Actual state: DISABLED" because stagePreExistingArtifacts() was being called unconditionally. This commit adds a check for cacheState == INITIALIZED before calling stagePreExistingArtifacts() and restoreStagedArtifacts(), ensuring these cache-dependent operations are only performed when the cache is properly initialized. Fixes the regression reported in PR apache#394 comment.
Summary
This PR fixes incomplete cache restoration from compile-only builds by ensuring that configured outputs (e.g.,
classes,test-classes,module-info) are always attached and saved to the cache, regardless of the highest lifecycle phase executed.Changes Made
Reset attached resource bookkeeping per save invocation
attachedResourcesPathsByIdand resetsattachedResourceCounterat the start of eachsave()callAttach outputs for all lifecycle phases
attachGeneratedSources()andattachOutputs()outside thehasLifecyclePhase("package")conditionalAdd comprehensive integration test
module-info.classfiles are restored correctly after cache hitcompile-Only Builds #393Addressing Reviewer Feedback from PR #176
In PR #176#discussion_r1732039886, @AlexanderAshitkin suggested:
This PR implements exactly that suggestion:
attachGeneratedSources()isRestoreGeneratedSources())Root Cause Analysis
Before this PR:
Problem: Compile-only builds (
mvn clean compile) never calledattachOutputs(), creating cache entries without critical files likemodule-info.class.After this PR:
Result: Compile-only builds save configured outputs, enabling proper restoration in subsequent builds.
Testing
Run the integration test:
The test validates:
mvn clean compilecreates cache entry withmodule-info.classmvn clean verifyrestores from cache includingmodule-info.classRelated Issues
Fixes #393 - Incomplete Cache Restoration from compile-only Builds
Fixes #259 - Cannot restore cache after
mvn cleancommandsFixes #340 - Cache fails to restore compiled classes with
mvn clean testMigration Notes
No configuration changes required. Existing projects will automatically benefit from this fix on their next build.
Note on Implementation Philosophy: This PR prioritizes correctness and consistency over optimization. All lifecycle phases save configured outputs to ensure cache integrity. Future optimizations could selectively skip certain outputs based on phase, but this would require careful analysis to avoid reintroducing the bugs fixed here.