ANE-1036: Glob file matching for exclusion filters in .fossa.yml #1703

Draft
zlav wants to merge 4 commits into master from ane-1036-glob-exclusion-filters

@zlav zlav commented Apr 24, 2026

Overview

Extend paths.only and paths.exclude in .fossa.yml so entries may be glob patterns as well as concrete directory paths. An entry is treated as a glob if it contains any of *, ?, or [; otherwise it keeps its existing "match this directory and all of its children" semantics, so the change is backward-compatible.

Glob matching goes through the existing Data.Glob wrapper around System.FilePattern, which is already a dependency used by license-scan path filters and .gitignore-style handling in the Node.js strategy.
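
The classification rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the constructor names `ConcretePath` and `GlobPattern` and the functions `classifyEntry` and `partitionEntries` are hypothetical stand-ins, not the PR's actual definitions of `PathFilter` or `partitionPathFilters`.

```haskell
-- Sketch only: the constructor and function names below are hypothetical
-- and may not match the definitions in this PR.
data PathFilter
  = ConcretePath FilePath -- keeps "this directory and all of its children" semantics
  | GlobPattern String -- handed to the glob matcher
  deriving (Eq, Show)

-- Characters whose presence marks an entry as a glob.
globMetachars :: [Char]
globMetachars = "*?["

-- An entry is a glob if it contains any metacharacter, else a concrete path.
classifyEntry :: String -> PathFilter
classifyEntry entry
  | any (`elem` globMetachars) entry = GlobPattern entry
  | otherwise = ConcretePath entry

-- Split a mixed list into (concrete paths, glob patterns), preserving order.
partitionEntries :: [String] -> ([FilePath], [String])
partitionEntries = foldr step ([], [])
  where
    step entry (concretes, globs) = case classifyEntry entry of
      ConcretePath p -> (p : concretes, globs)
      GlobPattern g -> (concretes, g : globs)
```

Under this sketch, `partitionEntries ["src", "**/vendor/**"]` puts `src` in the concrete list and `**/vendor/**` in the glob list, so existing configurations without metacharacters take the same path they did before.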

Acceptance criteria

Users can write globs like the following in .fossa.yml and have them prune scanning the way they expect:

paths:
  exclude:
    - "**/vendor/**"
    - "**/node_modules/**"
    - "build/generated/*"

Testing plan

  1. Unit tests added in test/Discovery/FiltersSpec.hs cover:
    • **/name/** excludes match nested occurrences and prune the directory itself.
    • Single-segment wildcards (node_modules/*) do not cross path boundaries.
    • Glob excludes compose with concrete-path excludes.
    • Include-globs reject paths that don't match.
    • Exclude wins over include when both match.
    • Documented limitation: ? is treated as a literal character (not a single-character wildcard).
    • Documented limitation: [...] is treated as literal characters (not a character class).
    • Trailing-slash normalization regression guard: node_modules/* matches a Path Rel Dir whose toString yields a trailing /.
    • Root-level globs (build*, *.lock) match at the repo root and not at nested depths.
    • Four-way mix in one filter set: include-glob + include-concrete + exclude-glob + exclude-concrete, asserting accept/reject across representative paths.
    • partitionPathFilters splits a mixed list as expected.
  2. cabal build / cabal build test:unit-tests both succeed on this branch.
  3. Manual check: write a .fossa.yml with paths.exclude: ["**/vendor/**"] against a repo containing nested vendor/ directories and confirm the walker skips them in fossa analyze --debug output.
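
To make the semantics exercised by these tests concrete, here is a simplified, dependency-free matcher. It is not `System.FilePattern` or `Data.Glob`; it is a hypothetical reimplementation that assumes `*` matches within one path segment, a whole-segment `**` matches zero or more segments, and every other character (including `?` and `[`) is literal. It also drops empty segments, so unlike the real engine it is insensitive to a trailing slash.

```haskell
-- Simplified stand-in for the glob engine; all names here are hypothetical.

-- Split on '/', dropping empty segments (so "a/b/" and "a/b" are the same).
splitSegments :: String -> [String]
splitSegments s = case break (== '/') s of
  (seg, []) -> [seg | not (null seg)]
  (seg, _ : rest) -> [seg | not (null seg)] ++ splitSegments rest

-- Match one segment: '*' matches any run of characters, the rest is literal.
matchSegment :: String -> String -> Bool
matchSegment [] cs = null cs
matchSegment ('*' : ps) cs =
  matchSegment ps cs || case cs of
    [] -> False
    (_ : rest) -> matchSegment ('*' : ps) rest
matchSegment (p : ps) (c : cs) = p == c && matchSegment ps cs
matchSegment _ [] = False

-- Match segment lists: a whole-segment "**" consumes zero or more segments.
matchSegments :: [String] -> [String] -> Bool
matchSegments [] segs = null segs
matchSegments ("**" : ps) segs =
  matchSegments ps segs || case segs of
    [] -> False
    (_ : rest) -> matchSegments ("**" : ps) rest
matchSegments (p : ps) (s : ss) = matchSegment p s && matchSegments ps ss
matchSegments _ [] = False

globMatches :: String -> String -> Bool
globMatches pat path = matchSegments (splitSegments pat) (splitSegments path)
```

With this sketch, `globMatches "**/vendor/**" "a/vendor"` holds (the directory itself is matched, so it can be pruned), `globMatches "node_modules/*" "node_modules/a/b"` does not (the single `*` cannot cross a `/`), and `globMatches "build?/out" "build1/out"` does not (the `?` is literal).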

Note: I could not run the full unit-test binary locally because the test startup reads LFS-backed tar fixtures under test/Container/testdata/ that aren't materialized in this environment. The filter tests themselves compile cleanly with cabal build test:unit-tests.

Risks

  • Supported glob metacharacters: only * and ** are honored. System.FilePattern (the underlying engine via Data.Glob) does not implement ? single-character wildcards or [...] character classes. Patterns containing ? or [ still work, but those characters are matched literally — e.g. build?/out only matches a directory literally named build?/out, not build1/out. Two unit tests document this behavior so a future engine swap that adds the missing semantics is caught.
  • Classification heuristic: any entry containing *, ?, or [ is classified as a glob and routed through the glob matcher. Because the engine matches ? and [ literally, an entry that contains only those characters matches just its literal form, which is harmless but slightly redundant. A directory whose literal name contains one of these characters is therefore misclassified as a glob, which seems an acceptable trade-off for .fossa.yml.
  • Trailing-slash normalization: Path.toString on a Path Rel Dir appends /, which breaks System.FilePattern for patterns like node_modules/*. globMatchesDir strips the trailing slash before matching. I chose to normalize locally rather than change Data.Glob.matches, which is shared with the Node.js and Walk code paths.
  • Telemetry encoding: FilterCombination gains a third field for glob patterns, which slightly changes the JSON encoding used for telemetry. The new field is only populated for filters that actually specify globs.
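
The local normalization described in the trailing-slash bullet, together with the backslash handling added in a follow-up commit on this branch, might look roughly like the following. `normalizeDirForGlob` is a hypothetical name used for illustration; the PR's `globMatchesDir` may be structured differently.

```haskell
-- Hypothetical helper illustrating the normalization applied to a rendered
-- directory path before it is handed to the glob matcher.
normalizeDirForGlob :: String -> String
normalizeDirForGlob = dropTrailingSlashes . map toForwardSlash
  where
    -- Windows-style separators become '/', so "node_modules/*" can match.
    toForwardSlash '\\' = '/'
    toForwardSlash c = c
    -- A rendered directory path ends in '/'; strip it before matching.
    dropTrailingSlashes = reverse . dropWhile (== '/') . reverse
```

Keeping this in the filter code rather than in the shared matcher means other callers of `Data.Glob.matches` see no behavior change, which is the trade-off the bullet above describes.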

Metrics

Not currently tracked.

References

  • ANE-1036: Glob file matching for exclusion filters in .fossa.yml

Checklist

  • I added tests for this PR's change.
  • I added user-visible documentation (updated docs/references/files/fossa-yml.md).
  • I updated Changelog.md under ## Unreleased.
  • I updated docs/references/files/fossa-yml.v3.schema.json to document that paths.only / paths.exclude accept globs.
  • No subcommand option changes.

zlav and others added 4 commits April 24, 2026 13:29
`paths.only` and `paths.exclude` entries that contain `*`, `?`, or `[`
are now parsed as System.FilePattern globs via the existing Data.Glob
wrapper. Entries without glob metacharacters keep their prior
"directory and all children" semantics, so this change is
backward-compatible.

Adds a PathFilter sum type at the config layer, threads a parallel
list of glob patterns through FilterCombination, and extends
pathAllowed / applyComb to include-or-exclude directories whose
relative path matches a glob. Matching normalizes the trailing slash
that Path.toString appends to Dir paths so patterns like
`node_modules/*` match as users expect.

Docs and fossa-yml.v3.schema.json updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Normalize backslashes to forward slashes before glob matching so
user-supplied patterns like `node_modules/*` match the backslash-
separated paths produced by `Path Rel Dir` on Windows.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cover '?' wildcards, '[...]' character classes, root-anchored single-segment
globs, an explicit trailing-slash normalization regression guard, and a
four-way mix of include/exclude globs and concrete paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ests

System.FilePattern only implements `*` and `**`; `?` and `[...]` are
matched literally rather than as wildcards/character classes. The two
new tests asserted wildcard semantics and were red on CI. Flip the
expectations so they document the actual behavior and serve as a
regression guard if the engine ever gains those features.