Extend regex variable extraction: multiple named groups per pattern #46

@michaelbeutler

Description

Currently, each ExtractedVariable defines exactly one name and one pattern. To extract several values with a single regex (e.g., the date and the account ID from a bank statement filename), a pattern should be able to expose all of its named capture groups as variables.

Example Config

variables:
  extracted:
    - pattern: "(?P<file_year>[0-9]{4})(?P<file_month>[0-9]{2})(?P<file_day>[0-9]{2})_.*(?P<account_id>[0-9]{4}\\.[0-9]{4}\\.[0-9]{4})"
      source: filename

All named groups ($file_year, $file_month, $file_day, $account_id) become available as template variables.

Implementation

  • File: crates/paporg/src/config/schema.rs
      • Make name optional on ExtractedVariable; when it is omitted, all named capture groups become variables
      • Add an optional defaults: HashMap<String, String> field for per-group defaults

  • File: crates/paporg/src/config/variables.rs
      • In extract_variables() (line ~36): when name is None, iterate over all named capture groups and insert each one as a variable
      • When name is Some, preserve the current behavior (single variable extraction)
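
The branching described above could look roughly like the following. This is only a sketch under stated assumptions: the struct shows just the three fields named in this issue, `collect_variables` is a hypothetical helper (not the existing `extract_variables()`), and the `groups` slice stands in for whatever the regex engine returns. If the project uses the `regex` crate, those pairs could be built from `Regex::capture_names()` and `Captures::name()`. The `Some(name)` branch assumes the current behavior reads the group with the same name, which the issue does not confirm.

```rust
use std::collections::HashMap;

/// Hypothetical shape of one `extracted` entry. Only `name`, `pattern`, and
/// `defaults` come from the issue text; the real struct may carry more fields.
#[derive(Debug, Clone, Default)]
pub struct ExtractedVariable {
    /// `None` means "expose every named capture group as a variable".
    pub name: Option<String>,
    pub pattern: String,
    /// Fallback values keyed by capture group name.
    pub defaults: HashMap<String, String>,
}

/// Turns captured groups into template variables.
///
/// `groups` pairs each named group with its matched text, or `None` when the
/// group did not participate in the match.
pub fn collect_variables(
    var: &ExtractedVariable,
    groups: &[(&str, Option<&str>)],
) -> HashMap<String, String> {
    let mut out = HashMap::new();
    for &(group, matched) in groups {
        // With `name` set, keep only the group of that name (assumed to be
        // the existing single-variable behavior).
        if let Some(name) = &var.name {
            if name.as_str() != group {
                continue;
            }
        }
        // Prefer the matched text; otherwise fall back to a per-group default.
        let value = matched
            .map(str::to_string)
            .or_else(|| var.defaults.get(group).cloned());
        if let Some(value) = value {
            out.insert(group.to_string(), value);
        }
    }
    out
}
```

A group with neither a match nor a default is simply left out of the result here; whether that should instead be an error is a design decision this issue leaves open.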

Acceptance Criteria

  • Omitting name extracts all named capture groups as separate variables
  • Providing name preserves existing single-variable behavior (backwards compatible)
  • Per-group defaults work when a specific group doesn't match
  • Transform applies to all extracted groups (or is skippable per group)
  • Covered by unit tests
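
One possible reading of the transform criterion, sketched with hypothetical names: `transform_groups` and its `skip` list are illustrative and not part of the current codebase, and the transform is modeled as a plain closure because the issue does not describe the actual transform type.

```rust
use std::collections::HashMap;

/// Applies `transform` to every extracted value, leaving the groups listed in
/// `skip` untouched. Both the function and the `skip` parameter are
/// assumptions made for illustration.
pub fn transform_groups<F>(vars: &mut HashMap<String, String>, skip: &[&str], transform: F)
where
    F: Fn(&str) -> String,
{
    for (name, value) in vars.iter_mut() {
        // Groups on the skip list keep their raw captured text.
        if skip.contains(&name.as_str()) {
            continue;
        }
        *value = transform(value.as_str());
    }
}
```

A per-group skip list keeps the default simple (transform everything) while still allowing values such as an account ID to pass through unchanged.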
