ANE-1036: Glob file matching for exclusion filters in .fossa.yml #1703
Draft
`paths.only` and `paths.exclude` entries that contain `*`, `?`, or `[` are now parsed as System.FilePattern globs via the existing Data.Glob wrapper. Entries without glob metacharacters keep their prior "directory and all children" semantics, so this change is backward-compatible.

Adds a PathFilter sum type at the config layer, threads a parallel list of glob patterns through FilterCombination, and extends pathAllowed / applyComb to include-or-exclude directories whose relative path matches a glob. Matching normalizes the trailing slash that Path.toString appends to Dir paths so patterns like `node_modules/*` match as users expect. Docs and fossa-yml.v3.schema.json updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
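As a rough illustration of the classification rule this commit describes, here is a minimal sketch using only `base`. It is not the PR's code: the constructor names `ConcretePath` and `GlobPattern`, the helper `classifyEntry`, and the signature given to `partitionPathFilters` are all assumptions made for the example, since the real definitions are not shown in this conversation.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical stand-in for the PathFilter sum type mentioned above.
-- The constructor names and field types are assumptions for illustration.
data PathFilter
  = ConcretePath FilePath -- matches this directory and all of its children
  | GlobPattern String    -- handed to the glob matcher
  deriving (Eq, Show)

-- An entry counts as a glob if it contains any of '*', '?', or '['.
isGlobEntry :: String -> Bool
isGlobEntry = any (`elem` "*?[")

-- Classify one raw entry from paths.only / paths.exclude.
classifyEntry :: String -> PathFilter
classifyEntry entry
  | isGlobEntry entry = GlobPattern entry
  | otherwise = ConcretePath entry

-- Split a mixed list into (concrete paths, glob patterns), preserving order.
-- The name appears in the PR; this signature is assumed.
partitionPathFilters :: [PathFilter] -> ([FilePath], [String])
partitionPathFilters = partitionEithers . map toEither
  where
    toEither (ConcretePath p) = Left p
    toEither (GlobPattern g) = Right g
```

Under this sketch, `classifyEntry "vendor"` yields `ConcretePath "vendor"`, while `classifyEntry "build?/out"` yields `GlobPattern "build?/out"`: the `?` alone is enough to route the entry through the glob matcher, regardless of how the engine later interprets it.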
Normalize backslashes to forward slashes before glob matching so user-supplied patterns like `node_modules/*` match the backslash-separated paths produced by `Path Rel Dir` on Windows.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
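The two normalization steps described so far (backslashes to forward slashes, and dropping the trailing slash of a rendered directory path) could look roughly like the sketch below. The function names are hypothetical and the order of the two steps is an assumption; the PR's own `globMatchesDir` is not shown here and may differ.

```haskell
import Data.List (dropWhileEnd)

-- Replace Windows-style backslashes with forward slashes.
toForwardSlashes :: String -> String
toForwardSlashes = map (\c -> if c == '\\' then '/' else c)

-- Drop any trailing '/' characters, such as the one a rendered
-- directory path ends with.
stripTrailingSlash :: String -> String
stripTrailingSlash = dropWhileEnd (== '/')

-- Hypothetical helper: normalize a rendered relative directory path
-- before handing it to the glob matcher. Converting separators first
-- means a trailing backslash is stripped as well.
normalizeForGlob :: String -> String
normalizeForGlob = stripTrailingSlash . toForwardSlashes
```

With these definitions, `normalizeForGlob "node_modules\\foo\\"` evaluates to `"node_modules/foo"`, which is the shape a pattern like `node_modules/*` is written against. Note that this sketch removes every trailing slash, so an input consisting only of `/` becomes the empty string.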
Cover '?' wildcards, '[...]' character classes, root-anchored single-segment globs, an explicit trailing-slash normalization regression guard, and a four-way mix of include/exclude globs and concrete paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ests

System.FilePattern only implements `*` and `**`; `?` and `[...]` are matched literally rather than as wildcards/character classes. The two new tests asserted wildcard semantics and were red on CI. Flip the expectations so they document the actual behavior and serve as a regression guard if the engine ever gains those features.
Overview
Extend `paths.only` and `paths.exclude` in `.fossa.yml` so entries may be glob patterns as well as concrete directory paths. An entry is treated as a glob if it contains any of `*`, `?`, or `[`; otherwise it keeps its existing "match this directory and all of its children" semantics, so the change is backward-compatible.

Glob matching goes through the existing `Data.Glob` wrapper around `System.FilePattern`, which is already a dependency used by license-scan path filters and `.gitignore`-style handling in the Node.js strategy.

Acceptance criteria
Users can write globs like the following in `.fossa.yml` and have them prune scanning the way they expect:

Testing plan
- Tests in `test/Discovery/FiltersSpec.hs` cover:
  - `**/name/**` excludes match nested occurrences and prune the directory itself.
  - Single-`*` globs (`node_modules/*`) do not cross path boundaries.
  - `?` is treated as a literal character (not a single-character wildcard).
  - `[...]` is treated as literal characters (not a character class).
  - `node_modules/*` matches a `Path Rel Dir` whose `toString` yields a trailing `/`.
  - Root-anchored single-segment globs (`build*`, `*.lock`) match at the repo root and not at nested depths.
  - `partitionPathFilters` splits a mixed list as expected.
- `cabal build` / `cabal build test:unit-tests` both succeed on this branch.
- Use a `.fossa.yml` with `paths.exclude: ["**/vendor/**"]` against a repo containing nested `vendor/` directories and confirm the walker skips them in `fossa analyze --debug` output.

Note: I could not run the full unit-test binary locally because the test startup reads LFS-backed tar fixtures under `test/Container/testdata/` that aren't materialized in this environment. The filter tests themselves compile cleanly with `cabal build test:unit-tests`.

Risks
- Only `*` and `**` are honored. `System.FilePattern` (the underlying engine via `Data.Glob`) does not implement `?` single-character wildcards or `[...]` character classes. Patterns containing `?` or `[` still work, but those characters are matched literally: for example, `build?/out` only matches a directory literally named `build?/out`, not `build1/out`. Two unit tests document this behavior so a future engine swap that adds the missing semantics is caught.
- Any entry containing `*`, `?`, or `[` is treated as a glob. Because `?` and `[` are matched literally by the engine, a path containing those characters is still classified as a glob and routed through the glob matcher, but will then only match its literal form (harmless, just slightly redundant). A directory whose literal name contains one of those characters would be misclassified, which seems acceptable for `.fossa.yml`.
- `Path.toString` on a `Path Rel Dir` appends `/`, which breaks `System.FilePattern` for patterns like `node_modules/*`. `globMatchesDir` strips the trailing slash before matching. I chose to normalize locally rather than change `Data.Glob.matches`, which is shared with the Node.js and Walk code paths.
- `FilterCombination`'s third field slightly changes the JSON encoding used for telemetry; the new field is only populated for filters that actually specify globs.

Metrics
Not currently tracked.
References
Checklist
- Docs (`docs/references/files/fossa-yml.md`).
- `Changelog.md` under `## Unreleased`.
- `docs/references/files/fossa-yml.v3.schema.json`, to document that `paths.only` / `paths.exclude` accept globs.