feat(evaluators): ATR regex-based threat detection evaluator #169

@eeee2345

Problem

Agent Control's evaluator ecosystem has Cisco AI Defense (cloud API) and Galileo Luna (LLM-based), but no local, regex-based evaluator for detecting known AI agent threat patterns without API keys or network calls.

Proposed solution

A contrib evaluator using ATR (Agent Threat Rules), a set of community-maintained regex rules for AI agent threats.

# Usage
import asyncio

from agent_control_evaluator_atr.threat_rules import ATREvaluator, ATRConfig

async def main():
    evaluator = ATREvaluator(ATRConfig(
        min_severity="medium",
        categories=["prompt-injection", "tool-poisoning"],
    ))
    # evaluate() is a coroutine, so it has to be awaited inside an async function
    result = await evaluator.evaluate("Ignore all previous instructions...")
    print(result)
    # EvaluatorResult(matched=True, confidence=0.9, metadata={"findings": [...]})

asyncio.run(main())

Key characteristics:

  • Evaluator name atr.threat_rules, auto-discovered via entry points
  • 20 rules, 306 patterns covering OWASP Agentic Top 10
  • Configurable: min_severity, categories filter, block_on_match, on_error (fail-open/closed)
  • Pure regex, no API keys, <5ms evaluation
  • Returns all matching rules (not just first match) with metadata
  • Follows the Cisco evaluator pattern exactly (pyproject.toml, Makefile, entry points)
  • Rules maintained at agentthreatrule.org (MIT licensed)
  • ATR is already used by Cisco AI Defense
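
To make the filtering and multi-match behavior above concrete, here is a minimal sketch of how such an evaluator could work. It is not the actual ATR implementation: the `Rule` and `Config` classes, the `evaluate` function, the two demo rules, the severity ordering, and the `"allow"`/`"deny"` values for `on_error` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed severity ordering; the real ATR levels may differ.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Rule:
    rule_id: str
    category: str
    severity: str
    patterns: tuple  # compiled regex patterns


@dataclass
class Config:
    min_severity: str = "medium"
    categories: list = field(default_factory=list)  # empty list = all categories
    on_error: str = "allow"  # hypothetical: "allow" = fail-open, "deny" = fail-closed


# Two made-up rules for illustration; these are NOT real ATR rules.
RULES = [
    Rule("demo-001", "prompt-injection", "high",
         (re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),)),
    Rule("demo-002", "tool-poisoning", "low",
         (re.compile(r"<hidden>.*?</hidden>", re.I | re.S),)),
]


def evaluate(text, config, rules=RULES):
    """Return (matched, findings), collecting every matching rule."""
    try:
        threshold = SEVERITY_ORDER[config.min_severity]
        findings = []
        for rule in rules:
            # Skip rules below the configured severity floor.
            if SEVERITY_ORDER[rule.severity] < threshold:
                continue
            # Skip rules outside the requested categories, if any were given.
            if config.categories and rule.category not in config.categories:
                continue
            # Keep scanning after a hit so all matching rules are reported.
            if any(p.search(text) for p in rule.patterns):
                findings.append({
                    "rule_id": rule.rule_id,
                    "category": rule.category,
                    "severity": rule.severity,
                })
        return bool(findings), findings
    except Exception as exc:
        # Fail-closed reports a match on error; fail-open reports no match.
        return config.on_error == "deny", [{"error": str(exc)}]
```

In this sketch an unknown `min_severity` raises a `KeyError` inside the `try` block, which is one way to exercise the fail-open versus fail-closed path without any network dependency.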

Willingness to contribute

Yes. A full implementation is ready, with 22 tests covering detection, false-positive safety, config options, error handling, and multi-match behavior. Happy to submit a PR.
