Conversation

@andrewm4894 andrewm4894 commented Aug 26, 2025

Summary

  • split anomaly agent into modular components (models, tools, nodes, graph) and slim agent API
  • document module layout and add lint/format helpers to Makefile
  • update tests for new imports

Testing

  • make lint
  • make tests (fails: The api_key client option must be set either by passing api_key to the client or setting the OPENAI_API_KEY environment variable)

https://chatgpt.com/codex/tasks/task_e_68ae24bd3e2c83289020eeff52c3f7e1

Summary by CodeRabbit

  • New Features
    • Introduced a high-level AnomalyAgent for detecting and optionally verifying anomalies.
    • Streamlined package imports (import AnomalyAgent, Anomaly, AnomalyList from the package root).
    • Consistent anomaly DataFrame output in long/wide formats.
  • Documentation
    • Added a Module Overview section and improved formatting across docs.
  • Refactor
    • Internal architecture simplified for better reusability and clarity (no API breakage).
  • Tests
    • Updated imports to use the package root.
  • Chores
    • Added make targets: lint and format for easier local workflows.
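The long/wide anomaly DataFrame output mentioned above can be illustrated with a toy pivot. The column names used here (`timestamp`, `variable_name`, `variable_value`) are assumptions for illustration, not necessarily the package's actual schema:

```python
import pandas as pd

# Toy anomaly records in "long" format: one row per (timestamp, variable) anomaly.
long_df = pd.DataFrame(
    {
        "timestamp": ["2024-01-01", "2024-01-02", "2024-01-01"],
        "variable_name": ["temp", "temp", "pressure"],
        "variable_value": [5.0, 7.2, -4.1],
    }
)

# "Wide" format: one column per variable, NaN where no anomaly was flagged.
wide_df = long_df.pivot(
    index="timestamp", columns="variable_name", values="variable_value"
)
print(wide_df)
```

Going back from wide to long is the inverse `melt`/`stack` operation, which is presumably what the consistent-output refactor standardizes.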

Copilot AI review requested due to automatic review settings August 26, 2025 21:40

coderabbitai bot commented Aug 26, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Warning

Rate limit exceeded

@andrewm4894 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 24 minutes and 6 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between bdda392 and 4f38626.

📒 Files selected for processing (5)
  • anomaly_agent/graph.py (1 hunks)
  • anomaly_agent/models.py (1 hunks)
  • anomaly_agent/nodes.py (1 hunks)
  • anomaly_agent/tools.py (1 hunks)
  • tests/conftest.py (1 hunks)

Walkthrough

Refactors anomaly detection into a class-based API using a reusable LangGraph. Adds new models, nodes, tools, and graph modules. Updates package exports to source models from a new module. Expands Makefile with lint/format targets. Refreshes documentation with module overviews and formatting fixes. Tests updated to import from package root.

Changes

Cohort / File(s) Summary
Agent refactor
anomaly_agent/agent.py
Replaces procedural graph wiring with AnomalyAgent class using a reusable graph; removes in-file models/state/helpers; updates methods for detection and DataFrame output.
Graph and nodes
anomaly_agent/graph.py, anomaly_agent/nodes.py
Introduces StateGraph construction via create_graph and node factories (detection/verification) with routing via should_verify.
Models
anomaly_agent/models.py
Adds Pydantic models Anomaly and AnomalyList with validators for timestamp, values, and descriptions.
Tools (LLM chains)
anomaly_agent/tools.py
Adds factory functions to build detection/verification chains using structured outputs.
Public API/export adjustments
anomaly_agent/__init__.py
Re-exports unchanged public surface; Anomaly/AnomalyList now sourced from .models instead of .agent.
Build/automation
Makefile
Adds phony targets lint and format; updates .PHONY to include examples, lint, format.
Documentation
ARCHITECTURE.md, README.md
Adds module overviews; formatting/whitespace tidy-ups; no behavioral/API changes.
Tests
tests/test_agent.py, tests/test_prompts.py
Update imports to use package root (anomaly_agent) without changing behavior.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant AA as AnomalyAgent
    participant G as Graph (StateGraph)
    participant D as Detection Node
    participant V as Verification Node

    User->>AA: detect_anomalies(DataFrame)
    AA->>G: app.invoke({time_series, variable_name})
    G->>D: run detection chain
    D-->>G: detected_anomalies
    alt verify enabled and anomalies exist
        G->>V: run verification chain
        V-->>G: verified_anomalies
        G-->>AA: result (verified_anomalies)
    else
        G-->>AA: result (detected_anomalies)
    end
    AA-->>User: {column -> AnomalyList}
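The control flow in the diagram can be sketched with plain dicts standing in for the LangGraph state. Names mirror the walkthrough (`detection_node`, `verification_node`, `should_verify`), but the bodies are stand-ins, not the package's actual code:

```python
def detection_node(state):
    # Stand-in for the LLM detection chain: flag large absolute values.
    anomalies = [v for v in state["time_series"] if abs(v) > 3]
    step = "verify" if state["verify"] and anomalies else "end"
    return {**state, "detected_anomalies": anomalies, "current_step": step}

def verification_node(state):
    # Stand-in for the LLM verification chain: keep only positive spikes.
    verified = [v for v in state["detected_anomalies"] if v > 0]
    return {**state, "verified_anomalies": verified, "current_step": "end"}

def should_verify(state):
    # Conditional edge: route to verification only when detection asked for it.
    return "verify" if state["current_step"] == "verify" else "end"

def invoke(state):
    # Linearized version of the graph's detect -> (maybe) verify -> end path.
    state = detection_node(state)
    if should_verify(state) == "verify":
        state = verification_node(state)
        return state["verified_anomalies"]
    return state["detected_anomalies"]

print(invoke({"time_series": [0.1, 5.0, -4.2], "verify": True, "current_step": "detect"}))
# → [5.0]
```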

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I thump my paw—new graphs arise,
Nodes hop in sequence, sleek and wise.
Models groomed, their whiskers neat,
Chains align with structured beat.
Makefile squeaks “lint, format!” too—
I sniff the diffs: fresh paths to chew.
Onward, anomalies—caught on cue! 🐇✨



@andrewm4894 andrewm4894 requested a review from Copilot August 26, 2025 21:42

codecov bot commented Aug 26, 2025

Codecov Report

❌ Patch coverage is 86.66667% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.56%. Comparing base (65cb7aa) to head (4f38626).

Files with missing lines Patch % Lines
anomaly_agent/models.py 78.26% 6 Missing and 4 partials ⚠️
anomaly_agent/nodes.py 92.59% 1 Missing and 1 partial ⚠️
anomaly_agent/tools.py 75.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main       #5      +/-   ##
==========================================
+ Coverage   62.39%   63.56%   +1.16%     
==========================================
  Files           6       10       +4     
  Lines         234      247      +13     
  Branches       39       36       -3     
==========================================
+ Hits          146      157      +11     
- Misses         82       84       +2     
  Partials        6        6              



Copilot AI left a comment


Pull Request Overview

This PR refactors the anomaly detection agent from a monolithic implementation into a modular architecture, splitting functionality across separate modules for better maintainability and organization.

Key Changes:

  • Split the main agent.py into specialized modules (models, tools, nodes, graph)
  • Updated imports in test files to use the new module structure
  • Added documentation for the modular layout and new Makefile helpers

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/test_prompts.py Updated import path from anomaly_agent.agent to anomaly_agent
tests/test_agent.py Updated import path from anomaly_agent.agent to anomaly_agent
anomaly_agent/tools.py New module containing LLM chain creation utilities
anomaly_agent/nodes.py New module with graph nodes for detection and verification
anomaly_agent/models.py New module with Pydantic models extracted from agent.py
anomaly_agent/graph.py New module with graph construction utilities
anomaly_agent/agent.py Refactored to use new modular components, significantly simplified
anomaly_agent/__init__.py Updated exports to import from new module locations
README.md Added module overview documentation
Makefile Added lint and format helper targets
ARCHITECTURE.md Added module structure documentation


)
anomaly_description: str = Field(description="A description of the anomaly")

@validator("timestamp") # type: ignore

Copilot AI Aug 26, 2025


The @validator decorator is deprecated in Pydantic v2. Consider using @field_validator instead for forward compatibility.

Suggested change
@validator("timestamp") # type: ignore
@field_validator("timestamp")

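For reference, a minimal Pydantic v2 model using `@field_validator` as suggested above. This is a sketch of the migration, not the package's exact `Anomaly` model; the field set is abbreviated:

```python
from pydantic import BaseModel, Field, field_validator

class Anomaly(BaseModel):
    timestamp: str = Field(description="When the anomaly occurred")
    variable_value: float = Field(description="The anomalous value")

    # Pydantic v2 replacement for the deprecated @validator decorator.
    # mode="before" runs the check prior to Pydantic's own float coercion.
    @field_validator("variable_value", mode="before")
    @classmethod
    def validate_variable_value(cls, v):
        if not isinstance(v, (int, float)):
            raise ValueError("variable_value must be a number")
        return float(v)

a = Anomaly(timestamp="2024-01-01", variable_value=3)
print(a.variable_value)  # → 3.0
```

Note that in v2 the decorated method should also be a `classmethod`, which the v1 `@validator` applied implicitly.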
Comment on lines 43 to 49
@validator("variable_value") # type: ignore
def validate_variable_value(cls, v: float) -> float:
if not isinstance(v, (int, float)):
raise ValueError("variable_value must be a number")
return float(v)

@validator("anomaly_description") # type: ignore

Copilot AI Aug 26, 2025


The @validator decorator is deprecated in Pydantic v2. Consider using @field_validator instead for forward compatibility.

Suggested change
@validator("variable_value") # type: ignore
def validate_variable_value(cls, v: float) -> float:
if not isinstance(v, (int, float)):
raise ValueError("variable_value must be a number")
return float(v)
@validator("anomaly_description") # type: ignore
@field_validator("variable_value")
def validate_variable_value(cls, v: float) -> float:
if not isinstance(v, (int, float)):
raise ValueError("variable_value must be a number")
return float(v)
@field_validator("anomaly_description")

Comment on lines 43 to 49
@validator("variable_value") # type: ignore
def validate_variable_value(cls, v: float) -> float:
if not isinstance(v, (int, float)):
raise ValueError("variable_value must be a number")
return float(v)

@validator("anomaly_description") # type: ignore

Copilot AI Aug 26, 2025


The @validator decorator is deprecated in Pydantic v2. Consider using @field_validator instead for forward compatibility.

Suggested change
@validator("variable_value") # type: ignore
def validate_variable_value(cls, v: float) -> float:
if not isinstance(v, (int, float)):
raise ValueError("variable_value must be a number")
return float(v)
@validator("anomaly_description") # type: ignore
@field_validator("variable_value", mode="before")
def validate_variable_value(cls, v: float) -> float:
if not isinstance(v, (int, float)):
raise ValueError("variable_value must be a number")
return float(v)
@field_validator("anomaly_description", mode="before")


anomalies: List[Anomaly] = Field(description="The list of anomalies")

@validator("anomalies") # type: ignore

Copilot AI Aug 26, 2025


The @validator decorator is deprecated in Pydantic v2. Consider using @field_validator instead for forward compatibility.

Suggested change
@validator("anomalies") # type: ignore
@field_validator("anomalies") # type: ignore


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
tests/test_agent.py (1)

102-114: Tests currently hit live LLMs and fail without OPENAI_API_KEY; mock LLM/graph calls

Running make tests fails with “The api_key client option must be set...” because AnomalyAgent() instantiates ChatOpenAI even if we don't invoke the model. Per repo guidelines, tests should mock LLM calls. Add an autouse pytest-mock fixture (preferably in tests/conftest.py) to stub ChatOpenAI so no network or API key is required. The stub should also return timestamps present in the provided time series to satisfy timestamp assertions in irregular/sub-second tests.

Add this new file:

# tests/conftest.py
import re
import pytest
from anomaly_agent import Anomaly, AnomalyList

@pytest.fixture(autouse=True)
def _mock_llm(mocker):
    class _FakeRunnable:
        def invoke(self, inputs):
            ts_str = inputs.get("time_series", "")
            # Find first timestamp in the table string (supports sub-second)
            m = re.search(r"\d{4}-\d{2}-\d{2}(?:\s+\d{2}:\d{2}:\d{2}(?:\.\d{3,6})?)?", ts_str)
            ts = m.group(0) if m else "2024-01-01 00:00:00"
            var = inputs.get("variable_name", "value")
            return AnomalyList(
                anomalies=[
                    Anomaly(
                        timestamp=ts,
                        variable_value=3.14,
                        anomaly_description=f"synthetic anomaly in {var}",
                    )
                ]
            )

    class _FakeLLM:
        def with_structured_output(self, _schema):
            return _FakeRunnable()

    # Prevent real client construction and network usage
    mocker.patch("anomaly_agent.agent.ChatOpenAI", return_value=_FakeLLM())

Optionally, to harden determinism of data-generation in this module (random noise), seed numpy at module level:

# at top of the module
np.random.seed(0)

If you want me to push a patch across both test files (and add conftest.py), say the word.

Also applies to: 116-127, 129-141, 143-174, 176-183, 185-195, 197-207, 209-216, 218-242, 244-258, 260-279, 281-297, 299-317, 319-340, 342-376, 383-397, 398-417, 419-438, 440-462, 464-487, 511-527

tests/test_prompts.py (1)

122-136: Add an autouse fixture to mock ChatOpenAI in prompt‐integration tests

All of the prompt tests in tests/test_prompts.py (that cover lines 122–136, 138–156, 158–176, 178–196, 198–220, 222–238, and 240–257) call AnomalyAgent(), which in turn constructs a real ChatOpenAI client in its constructor. Without a pytest fixture to stub out ChatOpenAI, these tests will attempt real API calls (and fail when OPENAI_API_KEY is unset).

Please add a tests/conftest.py with an automatic fixture that replaces the real LLM with a dummy stub. For example:

# tests/conftest.py
import pytest
from anomaly_agent import agent

class DummyLLM:
    def __init__(self, *args, **kwargs):
        pass
    def __call__(self, *args, **kwargs):
        # return minimal valid response shape for your tests
        return {}

@pytest.fixture(autouse=True)
def mock_chat_llm(monkeypatch):
    """
    Prevent real ChatOpenAI instantiation in all tests.
    """
    monkeypatch.setattr(agent, "ChatOpenAI", DummyLLM)

• Ensure this fixture is applied globally (no need to import it in individual test files).
• After adding it, verify locally with no OPENAI_API_KEY:

unset OPENAI_API_KEY
pytest -q

This will guarantee that all prompt‐integration tests use the dummy LLM and never hit the real OpenAI API.

anomaly_agent/models.py (1)

56-66: Adopt Pydantic v2 field validator for list field and add docstring

Ensures consistency with v2 and satisfies D102.

 class AnomalyList(BaseModel):
     """Represents a list of anomalies."""
 
     anomalies: List[Anomaly] = Field(description="The list of anomalies")
 
-    @validator("anomalies")  # type: ignore
-    def validate_anomalies(cls, v: List[Anomaly]) -> List[Anomaly]:
-        if not isinstance(v, list):
-            raise ValueError("anomalies must be a list")
-        return v
+    @field_validator("anomalies")
+    def validate_anomalies(cls, v: List[Anomaly]) -> List[Anomaly]:
+        """Ensure anomalies is a list of Anomaly instances."""
+        if not isinstance(v, list):
+            raise ValueError("anomalies must be a list")
+        return v
README.md (1)

231-243: Enforce clear separation of unit vs. integration tests

Our Makefile’s default make test target (lines 30–33) currently runs all tests—including those that require an OPENAI_API_KEY—and none of the existing tests are marked with @pytest.mark.integration. As a result, make test will fail without an API key, and pytest -m integration won’t actually pick up any tests.

Required changes:

  • Update the Makefile so that make test excludes integration tests by default.
  • Add a new integration-tests target (or update the existing test alias) to run only integration tests.
  • Mark any tests relying on the OpenAI client or OPENAI_API_KEY with @pytest.mark.integration.
  • Revise README.md to reflect:
    • make test → unit tests only (no API key needed)
    • make integration-tests (or uv run pytest -m integration) → requires OPENAI_API_KEY

Pinpointed locations:

  • Makefile (around lines 30–33):

 tests:
-	uv run pytest tests/ -v --cov=anomaly_agent --cov-branch --cov-report=term-missing --cov-report=xml
+	uv run pytest tests/ -v --cov=anomaly_agent --cov-branch --cov-report=term-missing --cov-report=xml -m "not integration"

 test: tests

+integration-tests:
+	uv run pytest tests/ -v -m integration
  • tests/: Prepend @pytest.mark.integration to any test functions or classes that require OPENAI_API_KEY.
  • README.md (around lines 231–243):
-# Run all tests with coverage
-make test
+# Run all unit tests with coverage (no API key needed)
+make test

-# Run specific test categories
-uv run pytest tests/test_agent.py -v                    # Core functionality
-uv run pytest tests/test_prompts.py -v                  # Prompt system
-uv run pytest tests/test_graph_architecture.py -v       # Advanced architecture
+# Run specific unit-test categories
+uv run pytest tests/test_agent.py -v                    # Core functionality
+uv run pytest tests/test_prompts.py -v                  # Prompt system
+uv run pytest tests/test_graph_architecture.py -v       # Advanced architecture

-# Integration tests (requires OPENAI_API_KEY in .env)
-uv run pytest tests/ -m integration -v
+# Integration tests (requires OPENAI_API_KEY in .env)
+make integration-tests

This will ensure that make test never demands an API key, while preserving a clear path for running integration tests when a key is available.
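Marking a test as integration-only could look like the sketch below. The marker name follows the suggestion above; the test name and body are hypothetical, and the marker should be registered (in pytest.ini or pyproject.toml) to avoid unknown-marker warnings:

```python
import pytest

@pytest.mark.integration
def test_detects_anomalies_with_live_llm():
    # Would exercise the real OpenAI API; excluded from unit runs
    # via `pytest -m "not integration"`, included via `pytest -m integration`.
    pass
```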

anomaly_agent/agent.py (1)

16-26: Expose Required Streaming and Parallel Anomaly Detection APIs

We’ve confirmed that the following public methods are currently absent from anomaly_agent/agent.py and must be exposed to comply with our coding guidelines and preserve backward compatibility:

• detect_anomalies_streaming
• detect_anomalies_parallel (async)
• detect_anomalies_streaming_async (async)

Please add these methods immediately after the existing detect_anomalies() implementation in anomaly_agent/agent.py. Below is the minimal skeleton you can merge in; adjust imports and helper calls as needed:

 class AnomalyAgent:
     …  # existing methods

     def detect_anomalies_streaming(
         self,
         df: pd.DataFrame,
         timestamp_col: Optional[str] = None,
         verify_anomalies: Optional[bool] = None,
     ):
         """Yield (column, step, AnomalyList) updates as they become available."""
         # …implementation as per guidelines

+    async def detect_anomalies_parallel(
+        self,
+        df: pd.DataFrame,
+        timestamp_col: Optional[str] = None,
+        verify_anomalies: Optional[bool] = None,
+    ) -> Dict[str, AnomalyList]:
+        """Run per-column anomaly detection concurrently."""
+        # …implementation as per guidelines
+
+    async def detect_anomalies_streaming_async(
+        self,
+        df: pd.DataFrame,
+        timestamp_col: Optional[str] = None,
+        verify_anomalies: Optional[bool] = None,
+    ):
+        """Async streaming variant yielding (column, step, AnomalyList)."""
+        # …implementation as per guidelines

Bullet-point checklist for updates:

  • Add detect_anomalies_streaming under the public API.
  • Add async def detect_anomalies_parallel.
  • Add async def detect_anomalies_streaming_async.
  • Ensure imports (asyncio, Dict, Any, etc.) are present.
  • Update __all__ or other export lists to include the new methods if applicable.

These additions are critical to avoid breaking downstream code that depends on our streaming and parallel interfaces.
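A hedged sketch of what detect_anomalies_parallel could look like, fanning out per-column work with asyncio.gather. Everything here is a stand-in: the real implementation would invoke the compiled graph per column rather than the toy threshold used below:

```python
import asyncio

async def _detect_column(name, values):
    # Stand-in for an async LLM/graph invocation on one column.
    await asyncio.sleep(0)
    return name, [v for v in values if abs(v) > 3]

async def detect_anomalies_parallel(columns):
    # Fan out one task per column and collect results in input order.
    pairs = await asyncio.gather(
        *(_detect_column(name, vals) for name, vals in columns.items())
    )
    return dict(pairs)

result = asyncio.run(
    detect_anomalies_parallel({"temp": [0.1, 5.0], "pressure": [-4.2, 0.3]})
)
print(result)  # → {'temp': [5.0], 'pressure': [-4.2]}
```

Because each column's detection is independent, gather gives concurrency without any shared mutable state.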

🧹 Nitpick comments (19)
Makefile (1)

4-4: Add conventional phony targets and satisfy checkmake warnings

Checkmake flags missing "all", "clean", and "test" phony targets. Declare them and provide minimal implementations. Also mark the already-present test alias as phony to avoid target/file name collisions.

Apply this diff:

-.PHONY: examples lint format
+.PHONY: examples lint format all clean test
+
+# Conventional aggregate/default targets
+all: lint tests
+
+# Clean build/test artifacts
+clean:
+	rm -rf build dist .pytest_cache .coverage coverage.xml *.egg-info
ARCHITECTURE.md (4)

63-70: Module structure section is clear and matches the refactor

The listed modules align with the package split (models, tools, nodes, graph, agent). Consider linking each bullet to the actual file path for quick navigation in GitHub.


270-274: Quantified performance claim needs evidence or qualification

“80% Performance Improvement” is a strong claim. Either add a brief note on methodology (dataset size, hardware, baseline) or soften the language (e.g., “significant improvement observed in internal benchmarks”).

Proposed wording tweak:

- **⚡ 80% Performance Improvement**: Eliminates redundant graph creation
+ **⚡ Significant Performance Improvement** (internal benchmarks): Eliminates redundant graph creation

360-364: Phase status indicators should reflect current implementation

Phase 2 is marked ✅ for Streaming & Parallel Processing. If async/streaming isn’t in this PR, mark as “Planned” or “In Progress” to avoid confusion.

- **Phase 2: Streaming & Parallel Processing** ✅
+ **Phase 2: Streaming & Parallel Processing** (Planned)

403-409: Minor grammar/flow polish in Summary section

There’s a small flow break between the bullet list and the following sentence. Join them with proper punctuation.

Apply this diff:

-This foundation enables robust anomaly detection while maintaining the flexibility to evolve with changing requirements and technological advances.
+This foundation enables robust anomaly detection while maintaining the flexibility to evolve with changing requirements and technological advances.

Note: If you intended a paragraph break, ensure there’s an empty line above and remove any trailing spaces that might trigger linters.

README.md (5)

70-77: Clarify module paths to the new public API modules

List fully qualified paths to reduce ambiguity and align with the PR’s “modular public API” intent.

-### 📦 Module Overview
-
-- `models.py` – Pydantic data models for anomalies
-- `tools.py` – LLM chains used by the graph nodes
-- `nodes.py` – Detection and verification node implementations
-- `graph.py` – Utilities for assembling the LangGraph state machine
-- `agent.py` – High-level interface for end users
+### 📦 Module Overview
+
+- `anomaly_agent/models.py` – Pydantic data models for anomalies
+- `anomaly_agent/tools.py` – LLM chains used by the graph nodes
+- `anomaly_agent/nodes.py` – Detection and verification node implementations
+- `anomaly_agent/graph.py` – Utilities for assembling the LangGraph state machine
+- `anomaly_agent/agent.py` – High-level interface for end users

29-29: Minor punctuation/style consistency in bullets

Maintain consistent punctuation in the feature list (all bullets end with a period or none do). Also resolves a LanguageTool warning.

-- 🔄 **Two-Stage Pipeline**: Detection and optional verification phases to reduce false positives
+- 🔄 **Two-Stage Pipeline**: Detection and optional verification phases to reduce false positives.

141-146: Note that examples require an API key when they hit the model

Users following examples that invoke the LLM may encounter “api_key must be set.” Add a reminder and a non-LLM example command.

 # Try real-world sensor data scenario
 python examples/examples.py --example real-world --plot
 
 # Custom model and plotting
 python examples/examples.py --model gpt-4o-mini --example multiple --plot
+#
+# Note: Examples that call the LLM require OPENAI_API_KEY in your .env.
+# For offline plotting-only runs, pass --no-detect (if supported) or use datasets that skip LLM calls.

204-209: Insert a blank line before the header for proper Markdown rendering

Fixes a LanguageTool warning and ensures the header renders as a header, not inline text.

-### 🔬 Science & Engineering
+
+### 🔬 Science & Engineering

295-299: Emphasize Pydantic v2 field validators to match coding guidelines

The repo standard is Pydantic v2 with field validators. Update wording to avoid confusion with v1 “validator”.

-- **✅ Pydantic Validation**: Type-safe data models throughout the pipeline
+- **✅ Pydantic v2 Validation**: Type-safe models with field validators throughout the pipeline
anomaly_agent/tools.py (1)

1-11: Import/style layout looks good; keep Black happy by trimming extra blank lines

Pre-commit flagged Black changes. One common cause is superfluous blank lines around imports. The following retains readability and satisfies Black.

-"""Utility chains used by anomaly detection nodes."""
-from langchain_core.runnables import Runnable
-from langchain_openai import ChatOpenAI
-
-from .models import AnomalyList
-from .prompt import (
+"""Utility chains used by anomaly detection nodes."""
+from langchain_core.runnables import Runnable
+from langchain_openai import ChatOpenAI
+from .models import AnomalyList
+from .prompt import (
     DEFAULT_SYSTEM_PROMPT,
     DEFAULT_VERIFY_SYSTEM_PROMPT,
     get_detection_prompt,
     get_verification_prompt,
 )
-
+
anomaly_agent/nodes.py (3)

20-35: Correct type hints for node factories to reflect partial state updates

Node functions return partial updates (dict fragments), not full AgentState instances. Update hints to reduce confusion and improve mypy accuracy.

-def create_detection_node(
-    llm: ChatOpenAI, detection_prompt: str
-) -> Callable[[AgentState], AgentState]:
+def create_detection_node(
+    llm: ChatOpenAI, detection_prompt: str
+) -> Callable[[AgentState], dict[str, Any]]:
@@
-    def detection_node(state: AgentState) -> AgentState:
+    def detection_node(state: AgentState) -> dict[str, Any]:
-        result = chain.invoke(
+        result = chain.invoke(
             {
-                "time_series": state["time_series"],
-                "variable_name": state["variable_name"],
+                "time_series": state.time_series if hasattr(state, "time_series") else state["time_series"],
+                "variable_name": state.variable_name if hasattr(state, "variable_name") else state["variable_name"],
             }
         )
         return {"detected_anomalies": result, "current_step": "verify"}

38-67: Mirror type hints and accessor pattern in verification node; guard for Pydantic or dict state

Keeps compatibility whether LangGraph passes a dict or a Pydantic instance.

-def create_verification_node(
-    llm: ChatOpenAI, verification_prompt: str
-) -> Callable[[AgentState], AgentState]:
+def create_verification_node(
+    llm: ChatOpenAI, verification_prompt: str
+) -> Callable[[AgentState], dict[str, Any]]:
@@
-    def verification_node(state: AgentState) -> AgentState:
-        if state["detected_anomalies"] is None:
+    def verification_node(state: AgentState) -> dict[str, Any]:
+        anomalies = (
+            state.detected_anomalies
+            if hasattr(state, "detected_anomalies")
+            else state.get("detected_anomalies")  # type: ignore[union-attr]
+        )
+        if anomalies is None:
             return {"verified_anomalies": None, "current_step": "end"}
 
-        detected_str = "\n".join(
+        detected_str = "\n".join(
             [
                 (
                     f"timestamp: {a.timestamp}, "
                     f"value: {a.variable_value}, "
                     f"Description: {a.anomaly_description}"
                 )
-                for a in state["detected_anomalies"].anomalies
+                for a in anomalies.anomalies
             ]
         )
 
-        result = chain.invoke(
+        ts = state.time_series if hasattr(state, "time_series") else state["time_series"]
+        var = state.variable_name if hasattr(state, "variable_name") else state["variable_name"]
+        result = chain.invoke(
             {
-                "time_series": state["time_series"],
-                "variable_name": state["variable_name"],
+                "time_series": ts,
+                "variable_name": var,
                 "detected_anomalies": detected_str,
             }
         )
         return {"verified_anomalies": result, "current_step": "end"}

71-73: Conditional router remains correct; tighten type for clarity if switching to Pydantic

Readable and concise. If migrating to Pydantic-only state, consider state.current_step == "verify".

-def should_verify(state: AgentState) -> Literal["verify", "end"]:
-    """Determine if we should proceed to verification."""
-    return "verify" if state["current_step"] == "verify" else "end"
+def should_verify(state: AgentState) -> Literal["verify", "end"]:
+    """Determine if we should proceed to verification."""
+    step = state.current_step if hasattr(state, "current_step") else state["current_step"]
+    return "verify" if step == "verify" else "end"
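The hasattr-based dual access repeated in these diffs can be factored into one helper. A minimal sketch (the helper and class names here are illustrative, not from the codebase):

```python
from typing import Any


def get_field(state: Any, name: str) -> Any:
    """Read a field from either attribute-style (Pydantic) or dict-style state."""
    return getattr(state, name) if hasattr(state, name) else state[name]


class PydanticLikeState:
    """Stands in for a Pydantic model exposing fields as attributes."""

    current_step = "verify"


# Works for both shapes without branching at every call site.
assert get_field(PydanticLikeState(), "current_step") == "verify"
assert get_field({"current_step": "end"}, "current_step") == "end"
```

Once the state is migrated to a single Pydantic model, the helper (and the hasattr branches) can be deleted.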
anomaly_agent/agent.py (5)

3-3: Type imports: prepare for stricter typing and literals.

To support more precise typing in this module (e.g., Literal for format, Any for state), extend the typing imports.

Apply this diff:

-from typing import Dict, Optional
+from typing import Any, Dict, Optional, Literal

34-36: Avoid compiling an unused app in init; cache compiled apps by verify flag.

You compile self.graph/self.app here but later re-create and compile a new graph in detect_anomalies. That wastes time and memory. Prefer lazy caching keyed by verify_anomalies.

Apply this diff:

-        self.graph = create_graph(
-            self.llm, verify_anomalies, detection_prompt, verification_prompt
-        )
-        self.app = self.graph.compile()
+        # Lazily compile and cache apps keyed by verify flag
+        self._apps: Dict[bool, Any] = {}
+        self._graph_prompts = (detection_prompt, verification_prompt)

Add this helper method (outside changed ranges):

def _get_app(self, verify: bool):
    app = self._apps.get(verify)
    if app is None:
        graph = create_graph(self.llm, verify, *self._graph_prompts)
        app = graph.compile()
        self._apps[verify] = app
    return app
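The lazy-cache pattern behind _get_app can be sketched generically (names are illustrative; the expensive build step stands in for create_graph(...).compile()):

```python
from typing import Any, Callable, Dict


def make_cached_getter(build: Callable[[bool], Any]) -> Callable[[bool], Any]:
    """Build at most one app per verify flag; repeat calls reuse the cached object."""
    cache: Dict[bool, Any] = {}

    def get(verify: bool) -> Any:
        if verify not in cache:
            cache[verify] = build(verify)
        return cache[verify]

    return get


calls = []
# list.append returns None, so `or object()` yields a fresh sentinel per build.
get_app = make_cached_getter(lambda verify: calls.append(verify) or object())
a1, a2 = get_app(True), get_app(True)
assert a1 is a2                 # second call reuses the cached object
get_app(False)
assert calls == [True, False]   # build ran exactly once per flag
```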

45-45: Docstring is fine, but consider documenting token/row limits.

Given LLM context limits, documenting recommended max rows/columns (and any sampling) helps users avoid large payloads.


53-55: Recreating the graph on every call is unnecessary; reuse a cached compiled app.

Use the cached app based on the effective verify setting.

Apply this diff:

-        graph = create_graph(
-            self.llm, verify_anomalies, self.detection_prompt, self.verification_prompt
-        )
-        app = graph.compile()
+        app = self._get_app(verify_anomalies)

91-91: Use Literal for 'format' and validate early.

Narrowing the type improves mypy/IDE help and catches typos at call sites.

Apply this diff:

-    def get_anomalies_df(
-        self, anomalies: Dict[str, AnomalyList], format: str = "long"
-    ) -> pd.DataFrame:
+    def get_anomalies_df(
+        self,
+        anomalies: Dict[str, AnomalyList],
+        format: Literal["long", "wide"] = "long",
+    ) -> pd.DataFrame:
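For context, the long/wide distinction the format parameter controls looks like this (toy data; the real method consumes AnomalyList objects, not tuples):

```python
import pandas as pd

anomalies = {
    "cpu": [("2024-01-01 00:00:00", 0.99)],
    "mem": [("2024-01-01 00:05:00", 0.87)],
}

# Long: one row per (variable, anomaly) pair.
long_df = pd.DataFrame(
    [(var, ts, val) for var, rows in anomalies.items() for ts, val in rows],
    columns=["variable_name", "timestamp", "value"],
)

# Wide: one column per variable, indexed by timestamp (NaN where absent).
wide_df = long_df.pivot(index="timestamp", columns="variable_name", values="value")
```

With Literal["long", "wide"] on the parameter, a typo like format="wid" fails type checking instead of silently falling through at runtime.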
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 65cb7aa and bdda392.

📒 Files selected for processing (11)
  • ARCHITECTURE.md (12 hunks)
  • Makefile (2 hunks)
  • README.md (8 hunks)
  • anomaly_agent/__init__.py (1 hunks)
  • anomaly_agent/agent.py (4 hunks)
  • anomaly_agent/graph.py (1 hunks)
  • anomaly_agent/models.py (1 hunks)
  • anomaly_agent/nodes.py (1 hunks)
  • anomaly_agent/tools.py (1 hunks)
  • tests/test_agent.py (1 hunks)
  • tests/test_prompts.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
anomaly_agent/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

anomaly_agent/**/*.py: Format Python code with Black using a maximum line length of 79 characters for files under anomaly_agent/
Sort imports with isort for files under anomaly_agent/
Enforce flake8 linting for files under anomaly_agent/
Run mypy type checking for files under anomaly_agent/
Use Pydantic v2 models (with field validators) for state and configuration instead of TypedDict

Files:

  • anomaly_agent/tools.py
  • anomaly_agent/graph.py
  • anomaly_agent/nodes.py
  • anomaly_agent/__init__.py
  • anomaly_agent/models.py
  • anomaly_agent/agent.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock to mock LLM calls in tests when appropriate

Files:

  • tests/test_prompts.py
  • tests/test_agent.py
anomaly_agent/agent.py

📄 CodeRabbit inference engine (CLAUDE.md)

anomaly_agent/agent.py: AgentConfig must validate constraints: max_retries between 0–10 and timeout between 30–3600 seconds
Expose and maintain streaming/parallel APIs: detect_anomalies_streaming(), detect_anomalies_parallel() (async), and detect_anomalies_streaming_async()

Files:

  • anomaly_agent/agent.py
🧠 Learnings (8)
📚 Learning: 2025-08-24T11:39:39.518Z
Learnt from: CR
PR: andrewm4894/anomaly-agent#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T11:39:39.518Z
Learning: Applies to anomaly_agent/prompt.py : Implement a validated, customizable prompt system for detection and verification prompts

Applied to files:

  • tests/test_prompts.py
📚 Learning: 2025-08-24T11:39:39.518Z
Learnt from: CR
PR: andrewm4894/anomaly-agent#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T11:39:39.518Z
Learning: Applies to anomaly_agent/**/*.py : Sort imports with isort for files under anomaly_agent/

Applied to files:

  • anomaly_agent/__init__.py
  • README.md
📚 Learning: 2025-08-24T11:39:39.518Z
Learnt from: CR
PR: andrewm4894/anomaly-agent#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T11:39:39.518Z
Learning: Applies to anomaly_agent/constants.py : Set DEFAULT_MODEL_NAME to gpt-5-nano as the cost-optimized default model

Applied to files:

  • ARCHITECTURE.md
  • README.md
📚 Learning: 2025-08-24T11:39:39.518Z
Learnt from: CR
PR: andrewm4894/anomaly-agent#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T11:39:39.518Z
Learning: Applies to anomaly_agent/**/*.py : Use Pydantic v2 models (with field validators) for state and configuration instead of TypedDict

Applied to files:

  • ARCHITECTURE.md
  • anomaly_agent/models.py
  • README.md
📚 Learning: 2025-08-24T11:39:39.518Z
Learnt from: CR
PR: andrewm4894/anomaly-agent#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T11:39:39.518Z
Learning: Applies to anomaly_agent/**/*.py : Run mypy type checking for files under anomaly_agent/

Applied to files:

  • README.md
📚 Learning: 2025-08-24T11:39:39.518Z
Learnt from: CR
PR: andrewm4894/anomaly-agent#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T11:39:39.518Z
Learning: Applies to anomaly_agent/**/*.py : Format Python code with Black using a maximum line length of 79 characters for files under anomaly_agent/

Applied to files:

  • README.md
📚 Learning: 2025-08-24T11:39:39.518Z
Learnt from: CR
PR: andrewm4894/anomaly-agent#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T11:39:39.518Z
Learning: Applies to anomaly_agent/**/*.py : Enforce flake8 linting for files under anomaly_agent/

Applied to files:

  • README.md
📚 Learning: 2025-08-24T11:39:39.518Z
Learnt from: CR
PR: andrewm4894/anomaly-agent#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T11:39:39.518Z
Learning: Applies to anomaly_agent/agent.py : Expose and maintain streaming/parallel APIs: detect_anomalies_streaming(), detect_anomalies_parallel() (async), and detect_anomalies_streaming_async()

Applied to files:

  • anomaly_agent/agent.py
🧬 Code graph analysis (7)
anomaly_agent/tools.py (2)
anomaly_agent/models.py (1)
  • AnomalyList (56-65)
anomaly_agent/prompt.py (2)
  • get_detection_prompt (72-96)
  • get_verification_prompt (99-125)
anomaly_agent/graph.py (1)
anomaly_agent/nodes.py (4)
  • AgentState (10-17)
  • create_detection_node (20-35)
  • create_verification_node (38-68)
  • should_verify (71-73)
anomaly_agent/nodes.py (2)
anomaly_agent/models.py (1)
  • AnomalyList (56-65)
anomaly_agent/tools.py (2)
  • create_detection_chain (14-20)
  • create_verification_chain (23-29)
tests/test_prompts.py (2)
anomaly_agent/agent.py (1)
  • AnomalyAgent (16-119)
anomaly_agent/models.py (1)
  • AnomalyList (56-65)
tests/test_agent.py (2)
anomaly_agent/models.py (2)
  • Anomaly (9-53)
  • AnomalyList (56-65)
anomaly_agent/agent.py (1)
  • AnomalyAgent (16-119)
anomaly_agent/__init__.py (2)
anomaly_agent/agent.py (1)
  • AnomalyAgent (16-119)
anomaly_agent/models.py (2)
  • Anomaly (9-53)
  • AnomalyList (56-65)
anomaly_agent/agent.py (3)
anomaly_agent/graph.py (1)
  • create_graph (8-26)
anomaly_agent/models.py (1)
  • AnomalyList (56-65)
anomaly_agent/nodes.py (1)
  • AgentState (10-17)
🪛 GitHub Actions: pre-commit
anomaly_agent/tools.py

[error] 1-1: Black formatting changed file anomaly_agent/tools.py.

anomaly_agent/graph.py

[error] 1-1: Black formatting changed file anomaly_agent/graph.py.

anomaly_agent/nodes.py

[error] 1-1: Black formatting changed file anomaly_agent/nodes.py.


[error] 3-3: F401 'typing.List' imported but unused.

anomaly_agent/models.py

[error] 1-1: D100 Missing docstring in public module.


[error] 44-44: D102 Missing docstring in public method.


[error] 50-50: D102 Missing docstring in public method.


[error] 62-62: D102 Missing docstring in public method.

🪛 checkmake (0.2.2)
Makefile

[warning] 4-4: Missing required phony target "all"

(minphony)


[warning] 4-4: Missing required phony target "clean"

(minphony)


[warning] 4-4: Missing required phony target "test"

(minphony)

🪛 Ruff (0.12.2)
anomaly_agent/nodes.py

2-2: typing.List imported but unused

Remove unused import: typing.List

(F401)

anomaly_agent/models.py

39-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 LanguageTool
ARCHITECTURE.md

[grammar] ~360-~360: There might be a mistake here.
Context: ...e 2: Streaming & Parallel Processing** ✅ - Real-time anomaly detection streaming - ...

(QB_NEW_EN)


[grammar] ~403-~403: There might be a mistake here.
Context: ...gent caching and optimization strategies - 🛡️ Reliability: Comprehensive error h...

(QB_NEW_EN)


[grammar] ~404-~404: There might be a mistake here.
Context: ...prehensive error handling and validation - 📈 Scalability: Designed for growth an...

(QB_NEW_EN)


[grammar] ~405-~405: There might be a mistake here.
Context: ...ned for growth and enterprise deployment - 🔄 Extensibility: Plugin architecture ...

(QB_NEW_EN)


[grammar] ~406-~406: There might be a mistake here.
Context: ...gin architecture for custom requirements This foundation enables robust anomaly d...

(QB_NEW_EN)

README.md

[grammar] ~29-~29: There might be a mistake here.
Context: ...ication phases to reduce false positives - 📊 Multi-Variable Support: Analyze m...

(QB_NEW_EN)


[grammar] ~204-~204: There might be a mistake here.
Context: ... anomalies ### 🔬 Science & Engineering - 🌡️ IoT Sensors: Temperature, humidity...

(QB_NEW_EN)


[grammar] ~295-~295: There might be a mistake here.
Context: ...safe data models throughout the pipeline - 🎯 GraphManager Caching: Optimized per...

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (3.10)
  • GitHub Check: test (3.11)
🔇 Additional comments (15)
Makefile (1)

48-51: Nice: first-class lint/format shortcuts via pre-commit

Having lint and format forward to pre-commit is clean and consistent with the repo’s workflow.

tests/test_agent.py (1)

11-11: Importing from package root is correct after re-exports in init

Good move to import Anomaly, AnomalyAgent, and AnomalyList from anomaly_agent. Keeps tests aligned with the public API surface.

tests/test_prompts.py (1)

12-12: Importing from package root is consistent with the new public API

Matches the re-exports in anomaly_agent/__init__.py. Good.

anomaly_agent/__init__.py (1)

7-8: Re-exporting models and agent from the package root is the right API shape

The new imports cleanly expose AnomalyAgent, Anomaly, and AnomalyList at the package root. This matches the tests and simplifies user imports. Ensure isort keeps this block grouped and stable.

README.md (3)

56-61: Mermaid formatting LGTM

The added spacing improves Mermaid readability and does not affect rendering.


189-191: Model table and defaults look good

Demonstrates non-default model override while the documented default remains gpt-5-nano. No action needed.


256-259: Quality commands section LGTM

Explicit isort/flake8/mypy entries align with repo guidelines.

anomaly_agent/graph.py (2)

15-26: Graph compilation is already centralized in agent.py, no change needed

I’ve verified that every call to create_graph in anomaly_agent/agent.py is immediately followed by a .compile() call (lines 37 and 56), ensuring the graph is compiled before use. Returning the raw StateGraph from create_graph is intentional and safe in the current code paths.


1-27: Apply Black Formatting to anomaly_agent/graph.py

Please format this file with Black to satisfy the project’s style settings (line-length = 88 in pyproject.toml). After adjusting the import wrapping as discussed, run:

black anomaly_agent/graph.py

This will automatically apply the correct line breaks and spacing. Once formatted, commit the changes to clear the pre-commit failure.

anomaly_agent/tools.py (3)

14-21: Factory function LGTM

Clear single-responsibility chain builder with typed output via structured parsing to AnomalyList.


23-30: Verification chain LGTM

Mirrors detection chain; consistent defaults and output type.


1-30: Run Black locally to fix formatting

Pre-commit indicates that Black would reformat this file, but Black isn’t available in this environment. Please verify and apply the required formatting changes by running locally:

# Ensure Black is installed (if needed)
pip install black

# Format the file
black anomaly_agent/tools.py
anomaly_agent/models.py (1)

9-17: Class docstring and fields read well

Model fields are clear and documented.

anomaly_agent/nodes.py (1)

1-74: Run pre-commit hooks to apply formatting and linting

Pre-commit is configured with Black, isort, and flake8 in .pre-commit-config.yaml. After updating your code, please run the following to fix any formatting or lint issues:

# Format with Black
pre-commit run black --all-files

# Sort imports with isort
pre-commit run isort --all-files

# Lint with flake8
pre-commit run flake8 --all-files

Or simply run all hooks at once:

pre-commit run --all-files
anomaly_agent/agent.py (1)

1-1: Good concise module docstring.

No issues here.

Comment on lines +9 to +13
from .constants import DEFAULT_MODEL_NAME, DEFAULT_TIMESTAMP_COL
from .graph import create_graph
from .models import AnomalyList
from .nodes import AgentState
from .prompt import DEFAULT_SYSTEM_PROMPT, DEFAULT_VERIFY_SYSTEM_PROMPT

🛠️ Refactor suggestion

AgentState import violates guideline to use Pydantic models (v2) for state.

Per coding_guidelines, state should be a Pydantic v2 model, not a TypedDict. Here, importing AgentState (TypedDict) from nodes couples this module to a non-compliant type and makes stricter mypy configuration harder.

Short-term in this file: avoid the TypedDict dependency and type the state as Dict[str, Any]. Longer-term: migrate AgentState in anomaly_agent/nodes.py to a Pydantic v2 model with field validators, then update graph.py’s StateGraph state type accordingly.

Apply this diff to decouple agent.py now:

-from .nodes import AgentState

And adjust the state annotation in detect_anomalies (see suggested diff on Lines 75-79).

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In anomaly_agent/agent.py around lines 9-13 and in the detect_anomalies function
around lines 75-79, remove the import of AgentState from nodes and stop
depending on the TypedDict; instead import typing.Dict and typing.Any and change
any state type annotations to Dict[str, Any] (including the detect_anomalies
parameter/type hints). Ensure you also remove the unused AgentState import line
and update imports so mypy no longer sees the TypedDict dependency; keep the
longer-term migration of AgentState in anomaly_agent/nodes.py to a Pydantic v2
model for later.

Comment on lines +27 to 33
        self.llm = ChatOpenAI(model=model_name)
        self.timestamp_col = timestamp_col
        self.verify_anomalies = verify_anomalies
        self.detection_prompt = detection_prompt
        self.verification_prompt = verification_prompt

        # Create the graph

🛠️ Refactor suggestion

⚠️ Potential issue

Inject LLM and validate config (fixes tests without OPENAI_API_KEY; enforces max_retries/timeout constraints).

Tests currently fail with “api_key client option must be set...”. Allow injecting an LLM (for tests/mocks) and/or an explicit api_key. Also enforce required constraints: max_retries ∈ [0,10], timeout ∈ [30,3600] (coding_guidelines).

Apply this diff to the constructor signature and initialization:

-    def __init__(
-        self,
-        model_name: str = DEFAULT_MODEL_NAME,
-        timestamp_col: str = DEFAULT_TIMESTAMP_COL,
-        verify_anomalies: bool = True,
-        detection_prompt: str = DEFAULT_SYSTEM_PROMPT,
-        verification_prompt: str = DEFAULT_VERIFY_SYSTEM_PROMPT,
-    ):
-        """Initialize the AnomalyAgent with a specific model."""
-        self.llm = ChatOpenAI(model=model_name)
+    def __init__(
+        self,
+        model_name: str = DEFAULT_MODEL_NAME,
+        timestamp_col: str = DEFAULT_TIMESTAMP_COL,
+        verify_anomalies: bool = True,
+        detection_prompt: str = DEFAULT_SYSTEM_PROMPT,
+        verification_prompt: str = DEFAULT_VERIFY_SYSTEM_PROMPT,
+        *,
+        api_key: Optional[str] = None,
+        llm: Optional["BaseChatModel"] = None,
+        max_retries: int = 2,
+        timeout: int = 120,
+    ):
+        """Initialize the AnomalyAgent with a specific model and config."""
+        # Validate config per guidelines
+        if not (0 <= max_retries <= 10):
+            raise ValueError("max_retries must be between 0 and 10")
+        if not (30 <= timeout <= 3600):
+            raise ValueError("timeout must be between 30 and 3600 seconds")
+        # Allow injection for tests and custom LLMs
+        if llm is not None:
+            self.llm = llm
+        else:
+            self.llm = ChatOpenAI(
+                model=model_name,
+                api_key=api_key,
+                max_retries=max_retries,
+                timeout=timeout,
+            )

Add these imports at the top of the file (near other imports):

+from langchain_core.language_models.chat_models import BaseChatModel

Optionally, to satisfy “use Pydantic v2 models for configuration”, introduce AgentConfig (add this outside the changed ranges):

from pydantic import BaseModel, field_validator

class AgentConfig(BaseModel):
    max_retries: int = 2
    timeout: int = 120

    @field_validator("max_retries")
    @classmethod
    def _vr_retries(cls, v: int) -> int:
        if not (0 <= v <= 10):
            raise ValueError("max_retries must be between 0 and 10")
        return v

    @field_validator("timeout")
    @classmethod
    def _vr_timeout(cls, v: int) -> int:
        if not (30 <= v <= 3600):
            raise ValueError("timeout must be between 30 and 3600 seconds")
        return v

And accept config: Optional[AgentConfig] = None in init, wiring its values to ChatOpenAI if provided. If you want, I can provide the exact patch.

🤖 Prompt for AI Agents
In anomaly_agent/agent.py around lines 27 to 33, the constructor currently
constructs ChatOpenAI directly and doesn't allow injecting a mock LLM or
explicit api_key and doesn't enforce max_retries/timeout constraints; change the
__init__ signature to accept llm: Optional[Any] = None, api_key: Optional[str] =
None, and config: Optional[AgentConfig] = None (or separate max_retries:int=2,
timeout:int=120) and if llm is provided use it otherwise instantiate ChatOpenAI
with api_key (if provided) and the config values; add imports for Optional and
AgentConfig (or add the AgentConfig pydantic BaseModel shown in the comment) and
validate max_retries ∈ [0,10] and timeout ∈ [30,3600] (either via AgentConfig
validators or explicit checks) before passing them to ChatOpenAI so tests can
inject mocks and the constraints are enforced.

Comment on lines +75 to 79
                "time_series": df_str,
                "variable_name": col,
                "current_step": "detect",
            }


🛠️ Refactor suggestion

State typing and payload size: decouple from TypedDict and shrink per-call input.

  • Type as Dict[str, Any] to avoid the TypedDict import and ease migration to a Pydantic state.
  • Instead of passing the whole DataFrame for every column, pass only the two relevant columns as CSV. This reduces tokens and improves performance.

Apply this diff:

-            state: AgentState = {
-                "time_series": df_str,
+            state: Dict[str, Any] = {
+                "time_series": df[[self.timestamp_col, col]].to_csv(index=False),
                 "variable_name": col,
                 "current_step": "detect",
             }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
            state: AgentState = {
                "time_series": df_str,
                "variable_name": col,
                "current_step": "detect",
            }
            state: Dict[str, Any] = {
                "time_series": df[[self.timestamp_col, col]].to_csv(index=False),
                "variable_name": col,
                "current_step": "detect",
            }
🤖 Prompt for AI Agents
In anomaly_agent/agent.py around lines 75 to 79, the state is currently typed as
AgentState and embeds the full DataFrame for each column; change the type to
Dict[str, Any] (remove or stop using the TypedDict import) and replace the
per-column DataFrame payload with a CSV string containing only the two relevant
columns (time column and the current variable column) to shrink per-call input —
build the CSV via df[[time_col, col]].to_csv(index=False) (or equivalent) and
assign that string to "time_series" in the state object, keeping keys
"variable_name" and "current_step" unchanged.
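To illustrate the payload reduction (toy data; column names assumed):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": ["2024-01-01", "2024-01-02"],
        "cpu": [0.2, 0.9],
        "mem": [0.5, 0.6],
    }
)

# Full-frame payload repeats every column for every per-column call.
full_payload = df.to_csv(index=False)

# Two-column payload carries only what the detection node needs.
slim_payload = df[["timestamp", "cpu"]].to_csv(index=False)

assert "mem" in full_payload and "mem" not in slim_payload
assert len(slim_payload) < len(full_payload)
```

The savings scale with column count, since detect_anomalies loops once per variable.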

Comment on lines +1 to +6
from datetime import datetime
from typing import List

from pydantic import BaseModel, Field, validator

from .constants import TIMESTAMP_FORMAT

🛠️ Refactor suggestion

Adopt Pydantic v2 field validators and add module docstring

Repo standard is Pydantic v2. Switch from validator to field_validator and add a short module docstring to satisfy docstring checks.

+"""Pydantic models for anomaly detection results."""
 from datetime import datetime
 from typing import List
 
-from pydantic import BaseModel, Field, validator
+from pydantic import BaseModel, Field, field_validator
+# Optional: enforce strict schema (no extra keys)
+# from pydantic import ConfigDict
📝 Committable suggestion


Suggested change
from datetime import datetime
from typing import List

from pydantic import BaseModel, Field, validator

from .constants import TIMESTAMP_FORMAT
"""Pydantic models for anomaly detection results."""
from datetime import datetime
from typing import List

from pydantic import BaseModel, Field, field_validator
# Optional: enforce strict schema (no extra keys)
# from pydantic import ConfigDict

from .constants import TIMESTAMP_FORMAT
🧰 Tools
🪛 GitHub Actions: pre-commit

[error] 1-1: D100 Missing docstring in public module.

🤖 Prompt for AI Agents
In anomaly_agent/models.py lines 1-6, add a short module-level docstring at the
top and migrate any use of Pydantic v1's validator to Pydantic v2's
field_validator: update the import to include field_validator (from pydantic
import BaseModel, Field, field_validator), replace @validator decorators with
@field_validator and adjust method signatures to accept the new v2 style
(validate per-field, use mode="after" or "before" as needed), and update any
references to the validator API accordingly so the file conforms to Pydantic v2
and the repo docstring check.

Comment on lines +18 to +42
    @validator("timestamp")  # type: ignore
    def validate_timestamp(cls, v: str) -> str:
        """Validate that the timestamp is in a valid format."""
        try:
            datetime.strptime(v, TIMESTAMP_FORMAT)
            return v
        except ValueError:
            try:
                dt = datetime.fromisoformat(v.replace("Z", "+00:00"))
                if "." in v:
                    return dt.strftime(TIMESTAMP_FORMAT)
                return dt.strftime("%Y-%m-%d %H:%M:%S")
            except ValueError:
                try:
                    dt = datetime.strptime(v, "%Y-%m-%d")
                    return dt.strftime("%Y-%m-%d %H:%M:%S")
                except ValueError:
                    try:
                        datetime.strptime(v, "%Y-%m-%d %H:%M:%S")
                        return v
                    except ValueError:
                        raise ValueError(
                            f"timestamp must be in {TIMESTAMP_FORMAT} format, ISO format, or YYYY-MM-DD format"
                        )


🛠️ Refactor suggestion

Switch timestamp validator to Pydantic v2 and use exception chaining

Use field_validator in v2. Also prefer raise ... from None (or from err) in except blocks (Ruff B904) for clearer error origins.

-    @validator("timestamp")  # type: ignore
-    def validate_timestamp(cls, v: str) -> str:
-        """Validate that the timestamp is in a valid format."""
-        try:
-            datetime.strptime(v, TIMESTAMP_FORMAT)
-            return v
-        except ValueError:
-            try:
-                dt = datetime.fromisoformat(v.replace("Z", "+00:00"))
-                if "." in v:
-                    return dt.strftime(TIMESTAMP_FORMAT)
-                return dt.strftime("%Y-%m-%d %H:%M:%S")
-            except ValueError:
-                try:
-                    dt = datetime.strptime(v, "%Y-%m-%d")
-                    return dt.strftime("%Y-%m-%d %H:%M:%S")
-                except ValueError:
-                    try:
-                        datetime.strptime(v, "%Y-%m-%d %H:%M:%S")
-                        return v
-                    except ValueError:
-                        raise ValueError(
-                            f"timestamp must be in {TIMESTAMP_FORMAT} format, ISO format, or YYYY-MM-DD format"
-                        )
+    @field_validator("timestamp")
+    def validate_timestamp(cls, v: str) -> str:
+        """Validate and normalize supported timestamp formats."""
+        try:
+            datetime.strptime(v, TIMESTAMP_FORMAT)
+            return v
+        except ValueError:
+            pass
+        # ISO-like: '...Z' or with offset
+        try:
+            dt = datetime.fromisoformat(v.replace("Z", "+00:00"))
+            return dt.strftime(TIMESTAMP_FORMAT if "." in v else "%Y-%m-%d %H:%M:%S")
+        except ValueError:
+            pass
+        # Date-only
+        try:
+            dt = datetime.strptime(v, "%Y-%m-%d")
+            return dt.strftime("%Y-%m-%d %H:%M:%S")
+        except ValueError:
+            pass
+        # Already normalized?
+        try:
+            datetime.strptime(v, "%Y-%m-%d %H:%M:%S")
+            return v
+        except ValueError:
+            raise ValueError(
+                f"timestamp must be in {TIMESTAMP_FORMAT} format, ISO 8601, or YYYY-MM-DD"
+            ) from None
📝 Committable suggestion


Suggested change
    @validator("timestamp")  # type: ignore
    def validate_timestamp(cls, v: str) -> str:
        """Validate that the timestamp is in a valid format."""
        try:
            datetime.strptime(v, TIMESTAMP_FORMAT)
            return v
        except ValueError:
            try:
                dt = datetime.fromisoformat(v.replace("Z", "+00:00"))
                if "." in v:
                    return dt.strftime(TIMESTAMP_FORMAT)
                return dt.strftime("%Y-%m-%d %H:%M:%S")
            except ValueError:
                try:
                    dt = datetime.strptime(v, "%Y-%m-%d")
                    return dt.strftime("%Y-%m-%d %H:%M:%S")
                except ValueError:
                    try:
                        datetime.strptime(v, "%Y-%m-%d %H:%M:%S")
                        return v
                    except ValueError:
                        raise ValueError(
                            f"timestamp must be in {TIMESTAMP_FORMAT} format, ISO format, or YYYY-MM-DD format"
                        )
    @field_validator("timestamp")
    def validate_timestamp(cls, v: str) -> str:
        """Validate and normalize supported timestamp formats."""
        try:
            datetime.strptime(v, TIMESTAMP_FORMAT)
            return v
        except ValueError:
            pass
        # ISO-like: '...Z' or with offset
        try:
            dt = datetime.fromisoformat(v.replace("Z", "+00:00"))
            return dt.strftime(TIMESTAMP_FORMAT if "." in v else "%Y-%m-%d %H:%M:%S")
        except ValueError:
            pass
        # Date-only
        try:
            dt = datetime.strptime(v, "%Y-%m-%d")
            return dt.strftime("%Y-%m-%d %H:%M:%S")
        except ValueError:
            pass
        # Already normalized?
        try:
            datetime.strptime(v, "%Y-%m-%d %H:%M:%S")
            return v
        except ValueError:
            raise ValueError(
                f"timestamp must be in {TIMESTAMP_FORMAT} format, ISO 8601, or YYYY-MM-DD"
            ) from None
🧰 Tools
🪛 Ruff (0.12.2)

39-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In anomaly_agent/models.py around lines 18 to 42, the timestamp validator is
written with Pydantic v1's @validator and uses bare raises inside nested
excepts; convert it to a Pydantic v2 field validator (use
@field_validator("timestamp", mode="before") or appropriate mode and update
imports) and refactor the nested try/except blocks to re-raise with explicit
exception chaining (e.g., raise ValueError(...) from err or from None) so each
caught exception preserves its origin; keep the same parsing logic/formatting
but update the decorator/imports and replace bare raises with chained raises.
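The fallback chain in this validator can be exercised standalone. A sketch (TIMESTAMP_FORMAT is assumed here; the real constant lives in anomaly_agent/constants.py, and the function name is mine):

```python
from datetime import datetime

# Assumed for this sketch; the real value lives in anomaly_agent/constants.py.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def normalize_timestamp(v: str) -> str:
    """Try each supported format in turn, normalizing to a canonical string."""
    try:  # fractional-seconds form passes through unchanged
        datetime.strptime(v, TIMESTAMP_FORMAT)
        return v
    except ValueError:
        pass
    try:  # ISO 8601, including trailing 'Z' (fromisoformat also accepts bare dates)
        dt = datetime.fromisoformat(v.replace("Z", "+00:00"))
        return dt.strftime(TIMESTAMP_FORMAT if "." in v else "%Y-%m-%d %H:%M:%S")
    except ValueError:
        pass
    try:  # date-only, kept for parity with the validator's explicit branch
        dt = datetime.strptime(v, "%Y-%m-%d")
        return dt.strftime("%Y-%m-%d %H:%M:%S")
    except ValueError:
        pass
    try:  # already in seconds-precision form
        datetime.strptime(v, "%Y-%m-%d %H:%M:%S")
        return v
    except ValueError:
        raise ValueError("unsupported timestamp format") from None
```

All paths end in a normalized string or a single chained ValueError, which is what the B904 fix above is about.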

Comment on lines 43 to 48
    @validator("variable_value")  # type: ignore
    def validate_variable_value(cls, v: float) -> float:
        if not isinstance(v, (int, float)):
            raise ValueError("variable_value must be a number")
        return float(v)


🛠️ Refactor suggestion

Switch numeric validator to Pydantic v2 and document purpose

Also minor docstring addition satisfies D102.

-    @validator("variable_value")  # type: ignore
-    def validate_variable_value(cls, v: float) -> float:
-        if not isinstance(v, (int, float)):
-            raise ValueError("variable_value must be a number")
-        return float(v)
+    @field_validator("variable_value")
+    def validate_variable_value(cls, v: float) -> float:
+        """Coerce numeric inputs to float; reject non-numerics."""
+        if not isinstance(v, (int, float)):
+            raise ValueError("variable_value must be a number")
+        return float(v)
📝 Committable suggestion


Suggested change
    @validator("variable_value")  # type: ignore
    def validate_variable_value(cls, v: float) -> float:
        if not isinstance(v, (int, float)):
            raise ValueError("variable_value must be a number")
        return float(v)
    @field_validator("variable_value")
    def validate_variable_value(cls, v: float) -> float:
        """Coerce numeric inputs to float; reject non-numerics."""
        if not isinstance(v, (int, float)):
            raise ValueError("variable_value must be a number")
        return float(v)
🧰 Tools
🪛 GitHub Actions: pre-commit

[error] 44-44: D102 Missing docstring in public method.

Comment on lines 49 to 54
    @validator("anomaly_description")  # type: ignore
    def validate_anomaly_description(cls, v: str) -> str:
        if not isinstance(v, str):
            raise ValueError("anomaly_description must be a string")
        return v


🛠️ Refactor suggestion

Switch description validator to Pydantic v2 and add docstring

Adds the missing method docstring flagged by the docstring checker.

-    @validator("anomaly_description")  # type: ignore
-    def validate_anomaly_description(cls, v: str) -> str:
-        if not isinstance(v, str):
-            raise ValueError("anomaly_description must be a string")
-        return v
+    @field_validator("anomaly_description")
+    def validate_anomaly_description(cls, v: str) -> str:
+        """Ensure the anomaly description is a string."""
+        if not isinstance(v, str):
+            raise ValueError("anomaly_description must be a string")
+        return v
🧰 Tools
🪛 GitHub Actions: pre-commit

[error] 50-50: D102 Missing docstring in public method.

🤖 Prompt for AI Agents
In anomaly_agent/models.py around lines 49 to 54, the validator uses the
Pydantic v1 @validator decorator and lacks a docstring; replace
@validator("anomaly_description") with Pydantic v2's
@field_validator("anomaly_description") (import field_validator from pydantic),
adjust the function signature to the v2 convention (accepting the field value as
the first parameter) and add a concise docstring describing that the method
ensures anomaly_description is a string and raises ValueError otherwise.
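Taken together, the two validator suggestions above follow one v1→v2 migration pattern. A minimal standalone sketch (the model and field set here are illustrative, not the package's actual `Anomaly` model; `mode="before"` is chosen so the `isinstance` checks run on the raw input rather than on values Pydantic has already coerced):

```python
from pydantic import BaseModel, field_validator


class AnomalySketch(BaseModel):
    """Illustrative model mirroring the fields discussed in the review."""

    variable_value: float
    anomaly_description: str

    @field_validator("variable_value", mode="before")
    @classmethod
    def validate_variable_value(cls, v: object) -> float:
        """Coerce numeric inputs to float; reject non-numerics."""
        if not isinstance(v, (int, float)):
            raise ValueError("variable_value must be a number")
        return float(v)

    @field_validator("anomaly_description", mode="before")
    @classmethod
    def validate_anomaly_description(cls, v: object) -> str:
        """Ensure the anomaly description is a string."""
        if not isinstance(v, str):
            raise ValueError("anomaly_description must be a string")
        return v
```

Note that a `ValueError` raised inside a `field_validator` is collected into Pydantic's `ValidationError`, so callers should catch `ValidationError`, not `ValueError`.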

Comment on lines 2 to 8
from typing import Callable, List, Literal, Optional, TypedDict

from langchain_openai import ChatOpenAI

from .models import AnomalyList
from .tools import create_detection_chain, create_verification_chain


🛠️ Refactor suggestion

Remove unused import and prepare for Pydantic-based state

Ruff/flake8 flagged typing.List as unused. Also, repo standard prefers Pydantic v2 models for state over TypedDict; see follow-up diff below.

-from typing import Callable, List, Literal, Optional, TypedDict
+from typing import Any, Callable, Literal, Optional, TypedDict
🧰 Tools
🪛 Ruff (0.12.2)

2-2: typing.List imported but unused

Remove unused import: typing.List

(F401)

🪛 GitHub Actions: pre-commit

[error] 3-3: F401 'typing.List' imported but unused.

🤖 Prompt for AI Agents
In anomaly_agent/nodes.py around lines 2 to 8, remove the unused import
typing.List and replace the TypedDict-based state types with a Pydantic v2
model: delete List and TypedDict from the imports, add "from pydantic import
BaseModel", define a small BaseModel subclass for the state (with the same
keys/types previously declared in the TypedDict), and update any references to
use that Pydantic model type; keep the other imports (Callable, Literal,
Optional, ChatOpenAI, AnomalyList, create_detection_chain,
create_verification_chain) unchanged.

Comment on lines +10 to +18
class AgentState(TypedDict, total=False):
    """State for the anomaly detection agent."""

    time_series: str
    variable_name: str
    detected_anomalies: Optional[AnomalyList]
    verified_anomalies: Optional[AnomalyList]
    current_step: str


🛠️ Refactor suggestion

Migrate AgentState from TypedDict to a Pydantic v2 model (repo standard)

Using Pydantic for state enables validation and better editor support, and aligns with the coding guidelines. If you’d prefer to stage this, at least introduce a Pydantic wrapper and plan a follow-up refactor.

Proposed inline refactor (minimizes behavioral change; functions still return partial dict updates as LangGraph expects):

-from typing import Callable, Any, Literal, Optional, TypedDict
+from typing import Any, Callable, Literal, Optional
+from pydantic import BaseModel, Field
@@
-class AgentState(TypedDict, total=False):
-    """State for the anomaly detection agent."""
-
-    time_series: str
-    variable_name: str
-    detected_anomalies: Optional[AnomalyList]
-    verified_anomalies: Optional[AnomalyList]
-    current_step: str
+class AgentState(BaseModel):
+    """State for the anomaly detection agent."""
+
+    time_series: str
+    variable_name: str
+    detected_anomalies: Optional[AnomalyList] = None
+    verified_anomalies: Optional[AnomalyList] = None
+    current_step: Literal["detect", "verify", "end"] = "detect"

Committable suggestion skipped: line range outside the PR's diff.
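The proposed `AgentState` model can be exercised without LangGraph itself. A minimal sketch, where `AnomalyList` is a hypothetical stand-in for the package's model and `model_copy(update=...)` mirrors how a node's partial dict update would be folded back into the state (note `model_copy` does not re-validate the updated fields):

```python
from typing import Literal, Optional

from pydantic import BaseModel


class AnomalyList(BaseModel):
    """Stand-in for the package's AnomalyList model; illustrative only."""

    anomalies: list[str] = []


class AgentState(BaseModel):
    """State for the anomaly detection agent."""

    time_series: str
    variable_name: str
    detected_anomalies: Optional[AnomalyList] = None
    verified_anomalies: Optional[AnomalyList] = None
    current_step: Literal["detect", "verify", "end"] = "detect"


# A node returns a partial update, as LangGraph expects; folding it into a
# Pydantic state yields a new typed object rather than a mutated bare dict.
state = AgentState(time_series="1,2,3,99", variable_name="cpu")
update = {
    "detected_anomalies": AnomalyList(anomalies=["t=3"]),
    "current_step": "verify",
}
state = state.model_copy(update=update)
```

The `Literal` type on `current_step` also catches typo'd step names at construction time, which the plain `str` field in the TypedDict version would silently accept.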
