CI: UPX-compress Linux release binaries (150 MB → 49 MB)#1701

Draft
zlav wants to merge 6 commits into master from zach/binary-size-upx-and-macos-split-sections

@zlav zlav commented Apr 23, 2026

Overview

Enables UPX compression on the Linux release binaries. macOS and Windows intentionally skipped (reasons below).

Still draft pending a decision on whether Linux-only is worth shipping on its own; the spike answer is "yes, UPX works on Linux; no, macOS is not worth the risk surface."

What ships

UPX on Linux amd64 only (LinuxARM is deferred, see below)

  • Install upx via apk add --no-cache upx inside the existing Alpine build container (Linux runs in fossa/haskell-static-alpine:ghc-9.8.4).
  • Run upx --best --lzma against fossa, diagnose, and millhone after strip.
  • rendergraph skipped — its main reads from stdin, and UPX's self-modifying stub is more brittle for piped tools.
  • Post-UPX smoke-launch verifies the compressed binaries boot before they get uploaded as artifacts.
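The steps above can be sketched as a small driver. This is an illustration only, not the workflow's actual step: the function names are hypothetical, and using `--version` as the smoke-launch for `diagnose` and `millhone` is an assumption (the PR only names `./fossa --version`).

```python
import subprocess

# Binaries named in the list above; rendergraph is deliberately left out.
PACKED = ("fossa", "diagnose", "millhone")


def upx_command(binaries):
    """Build the UPX invocation described above: best ratio with LZMA."""
    return ["upx", "--best", "--lzma", *binaries]


def smoke_command(binary):
    """Build a smoke-launch command (assumes each binary accepts --version)."""
    return [f"./{binary}", "--version"]


def pack_and_verify(binaries=PACKED, run=subprocess.run):
    """Compress all binaries in one UPX call, then launch each one once.

    check=True makes any non-zero exit raise, so a binary that fails to
    boot stops the job before the artifact upload.
    """
    run(upx_command(binaries), check=True)
    for binary in binaries:
        run(smoke_command(binary), check=True)
```

Passing `run` as a parameter keeps the sketch testable without UPX installed; in CI it would simply default to `subprocess.run`.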

Real compression (measured on this branch's green CI run):

| Binary | Pre-UPX | Post-UPX | Reduction |
|---|---|---|---|
| fossa | 150 MB | 49 MB | 67.6% |
| millhone | 12 MB | 2.4 MB | 79.8% |
| diagnose | 3 MB | 870 KB | 70.7% |
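For reference, the reduction column is `1 - post/pre`. The sketch below recomputes it from the rounded sizes in the table, so the results differ slightly from the reported percentages, which were presumably taken from exact byte counts (an assumption; the exact sizes are not in this PR).

```python
def reduction_pct(before, after):
    """Percentage size reduction; both sizes must use the same unit."""
    return (1 - after / before) * 100


# Rounded sizes from the table, in MB (870 KB taken as 0.87 MB).
sizes = {"fossa": (150, 49), "millhone": (12, 2.4), "diagnose": (3, 0.87)}

for name, (pre, post) in sizes.items():
    print(f"{name}: {reduction_pct(pre, post):.1f}%")
```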

UPX compression itself takes ~60 seconds in CI.

What doesn't ship (and why)

macOS UPX — dropped after first CI run

UPX 5.x (brew's current version) refuses Mach-O with CantPackException: macOS is currently not supported (try --force-macos). Stacking UPX's own experimental macOS path on top of the hardened-runtime notarization question made failures diagnostically ambiguous — if --force-macos produced a binary Apple's notary rejected, we couldn't tell if UPX, the runtime, or the notary was the culprit. Separate spike if macOS matters enough.

macOS split-sections — tried, didn't work

Original plan was to also enable split-sections: True in cabal.project.ci.macos, matching Linux/Windows. CI showed why this was never set:

[GHC-74335] [-Winconsistent-flags, Werror=inconsistent-flags]
    -fsplit-sections is not useful on this platform since it uses subsections-via-symbols. Ignoring.

Mach-O's ld64 already does function-granular dead-code elimination via subsections-via-symbols by default. The ~10–20 MB I projected for macOS was imaginary — that saving was already baked in. Replaced with an explanatory comment in the cabal project file so nobody repeats the mistake.

Windows UPX — excluded on principle

AV false-positive surface at our distribution scale: Defender ML (Trojan:Win32/Wacatac.H!ml), corporate EDR (CrowdStrike/SentinelOne), Zscaler-class proxies, and SmartScreen all flag UPX-packed Windows binaries regardless of contents. Across thousands of CI environments the support burden would outweigh the ~100 MB saving.

LinuxARM UPX — deferred

The gate (matrix.os-name == 'Linux') doesn't include Linux-arm, so the current workflow compresses only Linux amd64. ARM would need the equivalent apk install inside its own container step; boring but real work.

Acceptance criteria

  • Linux amd64 fossa artifact is noticeably smaller (observed 49 MB vs 150 MB baseline).
  • ./fossa --version launches successfully against the UPX-packed binary (smoke-launch step).
  • Windows and macOS artifacts unchanged.

Testing plan

CI

  • Build job matrix all green (verified).
  • Smoke-launch step on Linux after UPX passes.

Manual (before un-drafting)

  • Download the Linux-binaries artifact from the latest build on this branch.
  • file fossa should report a UPX-packed executable.
  • Run on a fresh Linux box (Docker: docker run --rm -v "$(pwd)":/w -w /w alpine:3.19 ./fossa --version or equivalent).
  • Measure cold-start latency: UPX LZMA adds ~300–700 ms at decompression time; tolerable for a CLI invoked once per CI job.
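One way to take the latency measurement in the last step is a small timing helper. This is a minimal sketch; `launch_ms` is a hypothetical name, and wall-clock timing of a subprocess includes process-spawn overhead, so compare packed against unpacked builds rather than reading the absolute numbers.

```python
import subprocess
import sys
import time


def launch_ms(cmd, runs=5):
    """Return wall-clock launch times in milliseconds for `cmd`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append((time.perf_counter() - start) * 1000)
    return samples


# Harmless example that works anywhere: time the Python interpreter itself.
baseline = launch_ms([sys.executable, "-c", "pass"], runs=3)
print(f"baseline: min {min(baseline):.0f} ms, max {max(baseline):.0f} ms")
```

Running `launch_ms(["./fossa", "--version"])` against both the packed artifact and an unpacked build would give the difference attributable to UPX. Only the first sample is close to a true cold start, since later runs benefit from the page cache.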

Risks

  • UPX + AV on Linux: Far lower surface than Windows but not zero. ClamAV can flag packed binaries. Mitigation: low prior incidence for our Linux customer base; watch for support tickets after first release.
  • UPX startup latency: ~300–700 ms cold-start for LZMA decompression. Visible if someone scripts fossa --version in a hot loop (unusual), invisible in normal CI use.
  • Binary identification: tools like ldd, strings, and antivirus heuristics see the UPX stub, not the inner binary. Debug workflow changes: need upx -d to inspect.
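For the binary-identification point, a quick heuristic check for a packed file is to look for the `UPX!` marker that UPX normally embeds in its pack header. This is a sketch under that assumption: a modified stub could lack the marker, and an unpacked file could contain the bytes by coincidence, so `upx -t` or `upx -d` remains the authoritative check.

```python
def looks_upx_packed(path, chunk_size=1 << 20):
    """Heuristic: True if the file contains the b"UPX!" marker."""
    marker = b"UPX!"
    tail = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            # Prepend the previous chunk's last bytes so a marker that
            # straddles a chunk boundary is still found.
            if marker in tail + chunk:
                return True
            tail = chunk[-(len(marker) - 1):]
    return False
```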

Out of scope

  • --force-macos UPX spike (separate issue if pursued).
  • Windows UPX + code-signing reputation warmup.
  • Dep audit and HTTP stack consolidation (separate PRs).

Checklist

  • CI workflow change — validated via green CI.
  • No user-visible behavior change (internal CI + packaging).
  • No .fossa.yml / fossa-deps / subcommand changes.
  • Manual bare-box Linux launch verification before un-drafting.
  • Product/release decision: ship Linux-only size reduction, or hold for macOS parity?

zlav and others added 4 commits April 23, 2026 12:58
Two layered size-reduction changes for the non-Windows release binaries.
Windows is intentionally excluded from UPX because the AV false-positive
surface (Defender, CrowdStrike, corporate proxies, SmartScreen) would
disrupt distribution at our scale.

1. Enable split-sections on macOS.
   Linux and Windows cabal project files already set split-sections: True;
   macOS was the outlier. With split-sections the compiler emits each
   top-level binding into its own ELF/Mach-O section, which lets the
   linker GC unused code at function granularity instead of per-module.
   Expected savings: 10–20 MB on macOS.

2. UPX compression on macOS and Linux (amd64).
   Install upx via brew on macOS and apt on ubuntu-latest. Run
   upx --best --lzma against fossa, diagnose, and millhone after strip
   and before codesign (so the signature covers the compressed binary).
   rendergraph is skipped because UPX's self-modification can conflict
   with its stdin-reading main. LinuxARM is skipped because the runner
   doesn't have apt; revisit if this ships.

   Post-UPX smoke-launch verifies the compressed binary boots before
   any signing attempt.

3. Temporary signing trigger on this branch.
   The Sign and Notarize step previously ran only on tag pushes. To
   validate that codesign + Apple notarization survive UPX-packed
   Mach-O, this branch is added as a secondary trigger. MUST be
   reverted before merge — tag pushes should be the only production
   signing trigger.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Linux job runs inside fossa/haskell-static-alpine via container: at
job level; every step runs in Alpine where apt-get does not exist
and the user is root. First CI run failed with apt-get: command not
found at this step and cancelled the whole matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UPX 5.x refuses to pack Mach-O without --force-macos. Stacking UPX's
own experimental macOS support on top of the hardened-runtime
notarization question makes failures ambiguous — if the end-to-end
chain breaks we can't tell whether UPX, the runtime, or the notary
is the culprit.

Keep the macOS size reduction via split-sections only (already in
cabal.project.ci.macos on this branch). Drop the temporary
signing-on-this-branch trigger since there is no UPX on macOS to
sign-test. Revisit macOS UPX as its own spike if/when Linux UPX is
proven.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GHC-74335: -fsplit-sections is not useful on this platform since it
uses subsections-via-symbols. Ignoring.

macOS's Mach-O linker already does function-granular dead-code
elimination via ld64's subsections-via-symbols feature, so GHC emits
-Winconsistent-flags when -fsplit-sections is requested. Our CI has
-Werror, so that warning kills the build.

The original assumption that macOS was 'missing' split-sections was
wrong — macOS doesn't need it. The ~10-20 MB savings I'd projected
for the macOS binary were already baked in by default.

Replace the setting with a comment recording the reason so the next
person looking at this doesn't repeat the mistake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zlav zlav changed the title from "CI: Shrink release binaries via split-sections + UPX (spike)" to "CI: UPX-compress Linux release binaries (150 MB → 49 MB)" on Apr 23, 2026
zlav and others added 2 commits April 23, 2026 15:44
Two CI-time optimizations for the Linux UPX step.

1. Cache the packed output keyed on the pre-UPX content hash.
   When a build produces the same uncompressed fossa/diagnose/millhone
   triple as a prior build — same deps, same source, same strip
   output — we skip re-running UPX and restore the packed copies
   straight from the cache. actions/cache saves at end-of-job on a
   cache miss, so future matching builds hit. On cache hit, Install
   UPX and the compression step are both skipped; the packed binaries
   are ready after the restore step.

   Key is `sha256(sha256(fossa) || sha256(diagnose) || sha256(millhone))`
   of the uncompressed inputs; any one of them changing invalidates
   the triple. Single combined key keeps the yaml simple and the
   three-binary UPX step is coupled anyway.

2. Drop --lzma in favour of UPX's default NRV compression.
   Cache misses now pay ~20s instead of ~60s per job, at the cost of packed
   output larger by ~2.5% of the uncompressed size. On a 150 MB fossa that
   is ~4 MB, cheap insurance given LZMA's 3× slowdown on every cache miss.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
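The combined cache key described in the commit message above can be sketched as follows. The function names are hypothetical, and concatenating the raw digests (rather than hex strings, as a shell pipeline around sha256sum would likely do) is an assumption; the workflow's real key bytes may differ.

```python
import hashlib


def sha256_file(path):
    """Return the raw SHA-256 digest of a file, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.digest()


def upx_cache_key(paths):
    """sha256 over the concatenated per-file sha256 digests, in order.

    Changing any one input changes the key, which invalidates the
    cached packed triple as a unit.
    """
    outer = hashlib.sha256()
    for path in paths:
        outer.update(sha256_file(path))
    return outer.hexdigest()
```

Calling `upx_cache_key(["fossa", "diagnose", "millhone"])` on the stripped, uncompressed binaries would yield a single key for the whole triple, matching the one-key design the commit describes.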
Prefer the best compression ratio over CI time on cache miss. With
the packed-binary cache in place, the ~60s LZMA cost is paid only
when source actually changes; cache hits pay zero regardless.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>