"Modifiable" input files for incremental builds #1707

ag-eitilt · 2025-08-26T19:36:14Z

Much as we'd like to have Wake be the only build system managing compilation, that's not a realistic assumption to make when working with languages that have strong ecosystems. In those cases, a solution for passing multiple files at once to a build tool and having that tool determine which need to be recompiled is often optimal, but Wake's sandboxing prevents that determination from occurring. This allows Runners to selectively poke holes in the sandbox to pick up artifacts from previous compilation, and to write to those same files if the tool determines the artifact needs to be recompiled. Crucially, that's only exposed at the level of the Runner, and user-facing Plans don't have any way of adding modifiable files -- the intent is to allow defining an incremental runner for each tool (a la incrementalCRunner for C+Make in the test¹), but not to allow this knob as a free-for-all footgun.

Additionally, the PR properly marks files in the previous visible list as not being writable within the sandbox, regardless of their permissions outside it. That is the existing behaviour as enforced by the FUSE implementation, and why simply having the Runners adding files to RunnerInputs.Visible rather than the new RunnerInputs.Modifiable wasn't sufficient.

Side note: I wouldn't have had any hope of finding the second tier of sandbox file in 289bdc1 if it weren't for the AI assistant.

I'd still want to go back through before merging this to add documentation, double-check the generated C test code meets my coding style, etc., but I wanted to get this posted first to get discussion going; if we determine that this is too unsafe, I wouldn't want to have to throw out all the time to polish it up.

C+Make isn't actually something that should be implemented using this method; that setup, at the least, is well-served by just using Wake's caching -- better-served, actually, since it enables hash-based recompilation rather than timestamp-based. It is, however, a very simple model to build a test around. ↩

At a guess, it wasn't liking the failed jobs when I was previously testing it.

ag-eitilt · 2025-08-26T19:40:12Z

share/wake/lib/system/remote_cache_runner.wake

+def mkCacheAllowedRequest ((RunnerInput label cmd vis _ env dir stdin _res _prefix _ isAtty _ _): RunnerInput) (output: RunnerOutput) (hidden: String) =
    def Usage status runtime cputime mem _ibytes obytes = output.getRunnerOutputUsage

    CacheAllowedRequest label cmd dir env hidden isAtty stdin vis status runtime cputime mem obytes


Most of the existing runners seem fine enough ignoring the modifiable inputs. This along with mkJobCacheRunner are the pair I'm most concerned about. So long as the invariant that the same outputs are produced from the same inputs holds up, it should be fine to cache, but it feels a bit too easy to have a buggy/malicious artifact slip through and poison the cache.

AbrarQuazi

Don't have any issue with this first pass for implementation, other than some questions. I don't think that this is an unsafe change, and am ok with you going further along with this, as I see this as a value add to what the sandbox should be allowed to do (ignoring or writing to incremental compile collaterals) as long as we properly gate what gets to be defined as a Modifiable input file

AbrarQuazi · 2025-09-23T20:22:24Z

share/wake/lib/system/runner.wake

    export Command: List String
    export Visible: List Path
+    # Files which the Runner exposes to the Job as additional inputs, but which may be modified as
+    # needed.  These *must not* have any effect on the output files (i.e. for the same inputs, the


Is there any guarantees for this? What prevents someone from adding a file that affects the outputs?

One question I have is, how are we handling caching of individual steps of an incremental compilation? Do we want to cache these individual steps, or only cache the final output?

Is there any guarantees for this? What prevents someone from adding a file that affects the outputs?

Entirely down to this comment. This is an instruction to whoever's writing the Runner about what they need to be sure of themselves, rather than a description of any internal processes that get run for them.

One question I have is, how are we handling caching of individual steps of an incremental compilation? Do we want to cache these individual steps, or only cache the final output?

If I'm reading the question correctly, I think the "incremental" wording is maybe slightly misleading. So far as Wake is concerned, there's only one layer of output, which should be cached or not according to the standard decision tree. The "incremental" comes in when the Job gets run later on as part of a separate invocation,¹ at which point the only difference is that this field allows pre-populating the FUSE sandbox with files without requiring they be read-only within the sandbox, nor that they be tracked in the job_hash. After the second instance of the Job returns, it's processed and cached identically to the first instance.²

Footnotes

Or, technically, it might be part of the same invocation if the Plan specifies ReRun. That's obviously not going to have an impact on caching. I'll have to look at how well it deals with that case; I think the answer is "just as poorly as if a non-incremental Job gets run multiple time at once", but it's worth actual exploration. ↩

I know I was planning to enforce that anything in the modifiable list avoids output conflicts by removing anything listed here from the output-paths list before handing it over to the Plan's FnOutputs. I'm pretty sure that made it in, but I can't remember whether it actually did or not, and meanwhile the test filters them out manually. Another thing to look into, and which will need fixing to at least some degree, even if it's just removing an extraneous editRunnerOutputOutputs. ↩

AbrarQuazi · 2025-09-23T20:36:56Z

tools/fuse-waked/main.cpp

  if (res == -1) res = -errno;
+
+  // If file is only visible (not writeable), remove write permissions from the reported mode
+  if (!it->second.is_writeable(key.second)) {


What is the behavior like before?Did all non-writeable (readonly) files have write permissions? That seems odd if true.

Writes were always being blocked; I don't have the error message handy any more, but was something about "permission denied", and I'm pretty sure it was just the standard one you'd get from any write-protected file. This was entirely fixing a rendering bug along the lines of

$ ls -l input.ext -rw-r--r-- 1 root root 0 Sep 23 15:39 input.ext $ echo test >> input.ext Permission denied $ whoami root

AbrarQuazi · 2025-09-23T20:52:37Z

tools/fuse-waked/main.cpp


  for (auto &x : files_wrote) files_read.erase(x);
+  // Although files_modifiable are conceptually inputs as well, they should be pruned from the
+  // reported inputs to ensure idempotent behavior; besides, the primary use of files_read is to


Can you explain why we need to remove the files_modifiable from the visible list, specifically how it breaks idempotency if we keep it? As far as I know, these modifiable files are not considered when hashing, so it doesnt break our determinism

Good question. I'll need to try to remember why that was; this might be a remnant of an earlier effort where I was trying to treat them as additional inputs internally, rather than this final form where it turns out they work better if you treat them as if they're files that got written by the Job in the process of setting up the sandbox.

I think the code's good since it's following the same model as the files_wrote erasures, but the comment can definitely be improved -- though it's actually maybe redundant depending on the relation between files_modifiable and files_wrote earlier on.

AbrarQuazi · 2025-09-23T21:29:04Z

tools/fuse-waked/main.cpp

+
+  auto j = job.files_modifiable.lower_bound(dir + "/");
+  return j != job.files_modifiable.end() && j->size() > dir.size() &&
+         0 == j->compare(0, dir.size(), dir);


this function is used to help remove directories from fuse right? So incremental build files may exist from a previous run, but we need this to allow wakefuse to allow it to unlink and succeed. With the incremental file that is left over, is Wake responsible for cleaning that up or the other build tool?

I should probably have a test targetting this. I don't think the other build tool comes into play here, but I've lost whatever knowledge I gleaned from this in context. Another place where it would really help if the code was properly commented (ironically despite the fact that it does have a comment -- the only new info that gives us is that this function is somehow connected to deletion and to is_writeable).

ag-eitilt added 6 commits August 25, 2025 18:46

Add modifiable files as a category of RW inputs

71ddddf

Fix display bug in ls -l permission bits

760eb83

Add test to be sure everything gets implemented correctly

d10deea

Update RunnerInput to allow specifying modifiable files

fb9a698

Expose the modifiable files in wakefs as well.

289bdc1

wake --clean is working again

a277662

At a guess, it wasn't liking the failed jobs when I was previously testing it.

ag-eitilt requested review from AbrarQuazi and ngraybeal August 26, 2025 19:36

ag-eitilt commented Aug 26, 2025

View reviewed changes

AbrarQuazi reviewed Sep 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

"Modifiable" input files for incremental builds #1707

"Modifiable" input files for incremental builds #1707

Uh oh!

ag-eitilt commented Aug 26, 2025

Uh oh!

ag-eitilt Aug 26, 2025

Uh oh!

AbrarQuazi left a comment

Uh oh!

AbrarQuazi Sep 23, 2025

Uh oh!

AbrarQuazi Sep 23, 2025

Uh oh!

ag-eitilt Sep 23, 2025

Uh oh!

AbrarQuazi Sep 23, 2025

Uh oh!

ag-eitilt Sep 23, 2025 •

edited

Loading

Uh oh!

AbrarQuazi Sep 23, 2025

Uh oh!

ag-eitilt Sep 23, 2025

Uh oh!

AbrarQuazi Sep 23, 2025

Uh oh!

ag-eitilt Sep 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

"Modifiable" input files for incremental builds #1707

Are you sure you want to change the base?

"Modifiable" input files for incremental builds #1707

Uh oh!

Conversation

ag-eitilt commented Aug 26, 2025

Footnotes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AbrarQuazi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Footnotes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ag-eitilt Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ag-eitilt Sep 23, 2025 •

edited

Loading