Skip to content

Conversation

@Hoper-J
Copy link

@Hoper-J Hoper-J commented Aug 21, 2025

Summary

This PR adds historical data storage to keep usage statistics beyond Claude's 30-day limit.

Note: This feature was developed for personal use and I thought it might be useful for others too. No pressure to merge - just wanted to share! Thank you for creating claude-monitor! 😊

Key Features

🗄️ Historical Data Management

  • HistoryManager: New module for saving/loading historical daily data
  • Automatic data preservation on each --view daily execution
  • Smart merging: Current data (within 30 days) takes priority over historical
  • Data stored in ~/.claude-monitor/history/daily/ organized by year/month

📅 Date Filtering

  • --start-date and --end-date parameters for filtering data ranges
  • Supports formats: YYYY-MM-DD, YYYY.MM.DD, YYYY/MM/DD
  • Works with both current and historical data

🔄 History Modes

New --history parameter with four modes:

  • auto (default): Save and load historical data automatically
  • off: Disable history functionality completely
  • readonly: Only load historical data, don't save new data
  • writeonly: Only save new data, don't load historical

📊 Enhanced Monthly View

  • Monthly statistics now aggregate from complete historical daily data
  • No longer limited to 30-day window
  • Ensures consistency between daily and monthly totals

Try It Out

If you'd like to test this feature:

git clone https://github.com/Hoper-J/Claude-Code-Usage-Monitor.git
cd Claude-Code-Usage-Monitor
git checkout feature/historical-data-storage
pip install -e .

Usage Examples

# View daily usage with automatic history (default)
claude-monitor --view daily

# View specific date range
claude-monitor --view daily --start-date 2024-01-01 --end-date 2024-12-31

# View monthly aggregated from all historical data
claude-monitor --view monthly --history auto

# Only read historical data without saving
claude-monitor --view daily --history readonly

Implementation Details

Modified files:

  • cli/main.py - Added history integration
  • core/settings.py - New parameters
  • data/aggregator.py - Date filtering
  • data/history_manager.py - New history module

Data Structure

Historical data is stored as JSON files:

~/.claude-monitor/history/daily/
└── 2025/
    ├── 07/
    │   ├── 2025-07-30.json
    │   └── 2025-07-31.json
    └── 08/
        ├── 2025-08-01.json
        ├── 2025-08-02.json
        └── ...

Testing

  • Various date ranges and formats ✅
  • Historical data loading and merging ✅
  • Monthly aggregation from historical data ✅
  • All four history modes (auto/off/readonly/writeonly) ✅

Benefits

  1. Long-term tracking: View usage beyond 30-day limit
  2. Complete statistics: Monthly totals based on all historical data
  3. Data preservation: Automatic backup of daily statistics

Hope this helps! Feel free to reach out if you have any questions.

Summary by CodeRabbit

  • New Features

    • CLI date-range parsing and validation (timezone-aware; accepts YYYY-MM-DD / YYYY.MM.DD / YYYY/MM/DD; errors if start > end). End date is inclusive. Aggregation accepts date ranges and applies local-time filtering. History support for daily/monthly views: load, merge with current data, optionally save history, and derive monthly summaries from combined daily data. New Settings: start_date, end_date, history (auto/off/readonly/writeonly) with validation and CLI propagation.
  • Tests

    • Added tests for date-filtered aggregations, timezone grouping, history lifecycle, and settings validation.

- Add HistoryManager to save/load daily usage data beyond Claude's 30-day limit
- Support date filtering with --start-date and --end-date parameters
- Add --history parameter with modes: auto, off, readonly, writeonly
- Monthly view now aggregates from historical daily data for complete stats
- Store history in ~/.claude-monitor/history/daily/ organized by year/month
- Smart merging of current and historical data (current data takes priority)
- Automatic data saving on each --view daily/monthly execution

This allows users to maintain long-term usage history and view statistics
beyond Claude's 30-day data retention limit.
@coderabbitai
Copy link

coderabbitai bot commented Aug 21, 2025

Walkthrough

Adds timezone-aware CLI date-range parsing/validation, threads start/end datetimes into UsageAggregator.aggregate (end inclusive), introduces HistoryManager for persistent daily history (load/merge/save/aggregate), extends Settings with start_date/end_date/history and validator, and updates tests for date filters, timezone grouping, and history behavior.

Changes

Cohort / File(s) Change Summary
CLI table view & history orchestration
src/claude_monitor/cli/main.py
Parse CLI --start-date/--end-date into timezone-aware start_dt/end_dt (supports YYYY-MM-DD / YYYY.MM.DD / YYYY/MM/DD); validate range; call aggregator.aggregate(start_dt, end_dt); orchestrate HistoryManager load/merge/save flows and emit runtime messages; branch daily vs monthly history composition.
Settings: date range & history mode
src/claude_monitor/core/settings.py
Add start_date: Optional[str], end_date: Optional[str], and history: Literal["auto","off","readonly","writeonly"]; add a before-field validator to normalize/validate history; propagate fields into CLI namespace via to_namespace.
Aggregator API: date-aware aggregation
src/claude_monitor/data/aggregator.py
Change UsageAggregator.aggregate signature to aggregate(self, start_date: Optional[datetime]=None, end_date: Optional[datetime]=None) and thread range into aggregate_daily/aggregate_monthly; perform timezone-aware normalization, coerce naive timestamps, and apply inclusive end-of-day filtering; use TimezoneHandler(timezone) and adjust period-key extraction to local timestamps.
Historical storage & utilities
src/claude_monitor/data/history_manager.py
New HistoryManager class storing per-day JSON under history/daily/YYYY/MM/; adds save_daily_data, load_historical_daily_data (range / days_back), merge_with_current_data, cleanup_old_data, aggregate_monthly_from_daily, and get_statistics; includes safe writes, conflict-resolution rules, and robust load/parse handling.
Tests: aggregator, history, settings
src/tests/test_aggregator.py, src/tests/test_history_manager.py, src/tests/test_settings.py
Add tests for inclusive end-date filters, timezone grouping behavior, HistoryManager lifecycle and edge-cases (save/load/merge/cleanup/aggregate), and Settings new fields/validator; include mocks for entry loading in aggregator tests.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI (table view)
  participant Settings
  participant Aggregator
  participant History as HistoryManager

  User->>CLI: Run with --start-date/--end-date/--history
  CLI->>Settings: Parse & validate settings
  CLI->>CLI: Normalize start_dt/end_dt (tz-aware, make end inclusive)
  CLI->>Aggregator: aggregate(start_dt, end_dt)
  Aggregator-->>CLI: current_data (daily or monthly-pre)
  alt history != "off"
    CLI->>History: load_historical_daily_data(start_dt, end_dt)
    History-->>CLI: historical_daily
    CLI->>History: merge_with_current_data(current_daily, historical_daily)
    History-->>CLI: merged_daily
    alt monthly view requested
      CLI->>History: aggregate_monthly_from_daily(merged_daily)
      History-->>CLI: monthly_data
      CLI-->>CLI: Use monthly_data for display
    else daily view
      CLI-->>CLI: Use merged_daily (or current) for display
    end
    alt history in ["auto","writeonly"]
      CLI->>History: save_daily_data(current_daily)
      History-->>CLI: saved_count
    end
  else
    CLI-->>CLI: Display current_data
  end
  CLI-->>User: Render table
Loading
sequenceDiagram
  autonumber
  participant CLI
  participant Aggregator
  participant DailyAgg as Daily Aggregator
  participant MonthlyAgg as Monthly Aggregator

  CLI->>Aggregator: aggregate(start_dt, end_dt_inc)
  alt aggregation_mode == "daily"
    Aggregator->>DailyAgg: aggregate(range)
    DailyAgg-->>Aggregator: daily_rows
    Aggregator-->>CLI: daily_rows
  else aggregation_mode == "monthly"
    Aggregator->>DailyAgg: aggregate(range)
    DailyAgg-->>Aggregator: daily_rows
    Aggregator->>MonthlyAgg: aggregate(daily_rows)
    MonthlyAgg-->>Aggregator: monthly_rows
    Aggregator-->>CLI: monthly_rows
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

I nibble timestamps, tuck tokens in a hole,
Days stacked like carrots, each with a role.
I merge yesterday’s crumbs with today’s bright sight,
Save burrowed memories, then hop out of the night.
Hop! A table, a month — the burrow hums with light. 🥕🐇

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (9)
src/claude_monitor/data/aggregator.py (3)

269-279: Document date-filter semantics (inclusive bounds) and timezone expectations.

aggregate now accepts start_date/end_date, but the docstring doesn’t clarify inclusivity or timezone assumptions. The filtering logic includes both bounds (entry <= end_date), and CLI passes end_date + 1 day to include the whole end day. Please make this explicit to prevent misuse.

Apply this docstring tweak:

 def aggregate(
     self,
     start_date: Optional[datetime] = None,
     end_date: Optional[datetime] = None
 ) -> List[Dict[str, Any]]:
-    """Main aggregation method that reads data and returns aggregated results.
+    """Main aggregation method that reads data and returns aggregated results.
+
+    Notes:
+        - Date filters are treated as inclusive. If you want to include a whole
+          calendar day, pass an end_date at the end of that day, or pass the
+          next day’s midnight (as the CLI does by adding 1 day).
+        - start_date/end_date should be timezone-aware to avoid ambiguous
+          comparisons. Aggregation compares aware datetimes directly.

299-304: Align period-key grouping with the user-selected timezone (not just filtering).

Filtering happens against aware datetimes from the CLI, which is good. However, period grouping uses entry.timestamp.strftime(...) directly, and entries are only localized when naive—defaulting to UTC because TimezoneHandler() is constructed without the requested timezone. This can produce UTC-based day/month keys while filtering is done in another tz, leading to off-by-one-day grouping near boundaries.

Two parts:

  1. Initialize the handler with the chosen timezone, or set it in init.
  2. Convert to that timezone before computing the period key.

Outside the changed hunk, consider this refactor:

# __init__
self.timezone = timezone
self.timezone_handler = TimezoneHandler(default_tz=self.timezone)

# _aggregate_by_period
for entry in entries:
    # ... existing date range checks ...
    local_ts = self.timezone_handler.to_timezone(entry.timestamp, self.timezone)
    period_key = period_key_func(local_ts)

And keep the filtering against aware datetimes (as it already does). This preserves consistent “what day/month is this?” semantics across filtering and grouping.


269-304: Minor API consistency: consider adding date filters to aggregate_from_blocks.

Not a blocker, but since aggregate now takes date filters, aggregate_from_blocks could accept the same to keep the public surface coherent, particularly for tests or future CLI paths that aggregate pre-extracted blocks.

src/claude_monitor/core/settings.py (1)

141-149: Clarify accepted date formats in help text; optionally normalize early.

The CLI accepts YYYY-MM-DD, YYYY.MM.DD, and YYYY/MM/DD, but the field description only mentions YYYY-MM-DD. Please update the description to reduce user confusion. Optional: add a validator to normalize to ISO (YYYY-MM-DD) early.

Apply this small description tweak:

-    start_date: Optional[str] = Field(
-        default=None,
-        description="Start date for filtering data (YYYY-MM-DD format)"
-    )
+    start_date: Optional[str] = Field(
+        default=None,
+        description="Start date for filtering data (YYYY-MM-DD, YYYY.MM.DD, or YYYY/MM/DD)"
+    )

And same for end_date:

-    end_date: Optional[str] = Field(
-        default=None,
-        description="End date for filtering data (YYYY-MM-DD format)"
-    )
+    end_date: Optional[str] = Field(
+        default=None,
+        description="End date for filtering data (YYYY-MM-DD, YYYY.MM.DD, or YYYY/MM/DD)"
+    )

Optional normalization validator (outside this hunk):

@field_validator("start_date", "end_date", mode="before")
@classmethod
def normalize_dates(cls, v: Any) -> Any:
    if isinstance(v, str):
        v = v.strip()
        # Accept dotted/slashed and convert to ISO
        v_iso = v.replace(".", "-").replace("/", "-")
        # Basic YYYY-MM-DD sanity check
        if len(v_iso) == 10:
            return v_iso
    return v
src/claude_monitor/cli/main.py (2)

394-404: Improve the invalid-date message to reflect all supported formats.

The parser accepts three formats but the warning mentions only YYYY-MM-DD. Align the message to reduce confusion.

-            print_themed(f"Invalid date format: {date_str}. Use YYYY-MM-DD.", style="warning")
+            print_themed(
+                f"Invalid date format: {date_str}. Use one of: YYYY-MM-DD, YYYY.MM.DD, YYYY/MM/DD.",
+                style="warning"
+            )

421-424: Avoid redundant aggregation when view_mode == 'monthly'.

You eagerly aggregate once (monthly) and then rebuild monthly from daily + history, overwriting the initial result. This doubles I/O and CPU. Only aggregate upfront for daily; let the monthly branch handle its own flow.

-        # Get aggregated data with date filters
-        logger.info(f"Loading {view_mode} usage data...")
-        aggregated_data = aggregator.aggregate(start_dt, end_dt_inclusive)
+        # Get aggregated data with date filters
+        logger.info(f"Loading {view_mode} usage data...")
+        if view_mode == "daily":
+            aggregated_data = aggregator.aggregate(start_dt, end_dt_inclusive)
+        else:
+            aggregated_data = []
src/claude_monitor/data/history_manager.py (3)

58-123: Safe writes and update heuristics are good; consider richer “freshness” criteria.

You skip overwrites if existing_data == new or if existing_tokens >= new_tokens. Token count is a decent proxy, but you might miss improvements where tokens are equal but other fields differ (e.g., entries_count, total_cost, model_breakdowns).

Consider extending the comparison:

-                    existing_tokens = existing_data.get("input_tokens", 0) + existing_data.get("output_tokens", 0)
-                    new_tokens = day_data.get("input_tokens", 0) + day_data.get("output_tokens", 0)
-                    
-                    if existing_tokens >= new_tokens:
+                    existing_tokens = existing_data.get("input_tokens", 0) + existing_data.get("output_tokens", 0)
+                    new_tokens = day_data.get("input_tokens", 0) + day_data.get("output_tokens", 0)
+                    existing_entries = existing_data.get("entries_count", 0)
+                    new_entries = day_data.get("entries_count", 0)
+                    existing_cost = existing_data.get("total_cost", 0.0)
+                    new_cost = day_data.get("total_cost", 0.0)
+
+                    # Prefer data with more tokens; tie-break by entries, then cost
+                    if (
+                        existing_tokens > new_tokens
+                        or (existing_tokens == new_tokens and existing_entries >= new_entries)
+                        or (existing_tokens == new_tokens and existing_entries == new_entries and existing_cost >= new_cost)
+                    ):
                         self._saved_dates.add(date_str)
                         continue

240-284: Cleanup routine is careful and removes empty directories.

Looks good. Minor nit: consider logging the cutoff date for traceability.


1-409: Add targeted tests for HistoryManager behaviors.

To lock in correctness across refactors, consider tests for:

  • save_daily_data overwrite heuristics,
  • load_historical_daily_data range filtering and sorting,
  • merge_with_current_data precedence,
  • aggregate_monthly_from_daily correctness,
  • cleanup_old_data deletion and empty-dir removal.

I can draft these if helpful.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 06f0fe1 and 874df4c.

📒 Files selected for processing (4)
  • src/claude_monitor/cli/main.py (2 hunks)
  • src/claude_monitor/core/settings.py (4 hunks)
  • src/claude_monitor/data/aggregator.py (2 hunks)
  • src/claude_monitor/data/history_manager.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/claude_monitor/cli/main.py (5)
src/claude_monitor/utils/time_utils.py (2)
  • TimezoneHandler (348-458)
  • ensure_timezone (412-416)
src/claude_monitor/terminal/themes.py (1)
  • print_themed (692-700)
src/claude_monitor/data/aggregator.py (2)
  • aggregate (269-305)
  • UsageAggregator (92-305)
src/claude_monitor/data/history_manager.py (5)
  • HistoryManager (18-409)
  • load_historical_daily_data (124-202)
  • merge_with_current_data (204-238)
  • save_daily_data (58-122)
  • aggregate_monthly_from_daily (285-366)
src/claude_monitor/ui/table_views.py (1)
  • display_aggregated_view (318-384)
src/claude_monitor/data/aggregator.py (1)
src/tests/test_aggregator.py (2)
  • test_aggregate_daily_with_date_filter (346-360)
  • test_aggregate_monthly_with_date_filter (391-401)
🔇 Additional comments (10)
src/claude_monitor/core/settings.py (3)

183-187: History mode field looks good.

The Literal plus dedicated validator provide a clear UX and error message. No issues here.


258-271: Validator is correct and resilient.

Lowercasing and enumerated check provide precise feedback on invalid values. Good defensive programming.


382-385: Propagation to CLI namespace is correct.

start_date, end_date, and history are now available to CLI paths. Matches the PR objectives.

src/claude_monitor/cli/main.py (3)

408-410: End-date inclusivity approach is correct.

Adding one day to make the end-date inclusive aligns with the aggregator’s “<= end_date” check and avoids 23:59:59 edge cases. Nice touch.


455-502: Monthly-from-daily flow is solid and aligns with the persistence strategy.

The sequence—build current daily, load historical daily, optional save, merge (current wins), then aggregate to monthly—looks correct and makes monthly consistent with daily totals. Nice.


472-476: Great, fetching the context around the save logic to confirm the weekly/monthly flows. We'll review whether the “writeonly” mode should indeed apply in both places.

src/claude_monitor/data/history_manager.py (4)

124-203: Date-range loading is robust; inclusive comparisons are consistent with CLI.

The tz-naive comparison after stripping tzinfo from aware bounds is a pragmatic choice given YYYY-MM-DD filenames. The logic pairs well with the CLI’s “end_date + 1 day” strategy.


204-239: Merge semantics are correct and predictable.

Historical first, then overwrite with current ensures freshest data wins for overlaps. Sorted output is helpful for downstream display.


285-366: Monthly aggregation from daily is implemented well.

Correct field names, proper set-to-list conversion, and deterministic sorting. Nice.


368-409: Useful stats helper.

No issues; the size and oldest/newest summaries will aid support/debug.

- Fixed 76 ruff linting errors (W293, W292, I001)
- Removed trailing whitespace from blank lines
- Added missing newline at end of file
- Fixed import ordering

Apologies for the lint issues in the initial PR. All checks should pass now.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
src/claude_monitor/data/history_manager.py (3)

148-158: Normalize date comparisons once and compare by date-only

  • Treat days_back=0 explicitly by checking is not None.
  • Compute comparison dates once.
  • Compare date() to avoid time-of-day edge cases.
-        if days_back:
+        if days_back is not None:
             end_date = datetime.now()
             start_date = end_date - timedelta(days=days_back)
         elif not start_date:
             # Default to loading all available data
             start_date = datetime(2020, 1, 1)  # Arbitrary old date
-
-        if not end_date:
-            end_date = datetime.now()
+        if not end_date:
+            end_date = datetime.now()
+        # Normalize to dates for comparison (strip tz if present)
+        def _to_naive_date(dt):
+            if dt is None:
+                return None
+            if getattr(dt, "tzinfo", None):
+                dt = dt.replace(tzinfo=None)
+            return dt.date()
+        compare_start_date = _to_naive_date(start_date)
+        compare_end_date = _to_naive_date(end_date)
@@
-                            # Make file_date timezone-naive for comparison
-                            # Convert start_date and end_date to naive if they're aware
-                            compare_start = (
-                                start_date.replace(tzinfo=None)
-                                if start_date and start_date.tzinfo
-                                else start_date
-                            )
-                            compare_end = (
-                                end_date.replace(tzinfo=None)
-                                if end_date and end_date.tzinfo
-                                else end_date
-                            )
+                            # Compare by date-only
+                            file_date_date = file_date.date()
@@
-                            if compare_start and file_date < compare_start:
+                            if compare_start_date and file_date_date < compare_start_date:
                                 continue
-                            if compare_end and file_date > compare_end:
+                            if compare_end_date and file_date_date > compare_end_date:
                                 continue

Also applies to: 181-198


261-279: Use date-only cutoff to avoid off-by-hours deletions

Comparing naive datetimes can delete files exactly N days old depending on current time. Compare date() values.

-        cutoff_date = datetime.now() - timedelta(days=days_to_keep)
+        cutoff_date = (datetime.now() - timedelta(days=days_to_keep)).date()
@@
-                        file_date = datetime.strptime(date_str, "%Y-%m-%d")
+                        file_date_date = datetime.strptime(date_str, "%Y-%m-%d").date()
@@
-                        if file_date < cutoff_date:
+                        if file_date_date < cutoff_date:
                             file_path.unlink()

88-90: Specify UTF-8 encoding and avoid ASCII-escaping JSON (nit)

This makes file IO explicit and preserves non-ASCII in model names, etc.

-                    with open(file_path, "r") as f:
+                    with open(file_path, "r", encoding="utf-8") as f:
                         existing_data = json.load(f)
@@
-                with open(temp_file, "w") as f:
-                    json.dump(day_data, f, indent=2, default=str)
+                with open(temp_file, "w", encoding="utf-8") as f:
+                    json.dump(day_data, f, indent=2, ensure_ascii=False, default=str)
@@
-                            with open(file_path, "r") as f:
+                            with open(file_path, "r", encoding="utf-8") as f:
                                 data = json.load(f)

Also applies to: 114-116, 201-204

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 874df4c and 1521544.

📒 Files selected for processing (4)
  • src/claude_monitor/cli/main.py (2 hunks)
  • src/claude_monitor/core/settings.py (4 hunks)
  • src/claude_monitor/data/aggregator.py (2 hunks)
  • src/claude_monitor/data/history_manager.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/claude_monitor/cli/main.py
  • src/claude_monitor/core/settings.py
  • src/claude_monitor/data/aggregator.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/claude_monitor/data/history_manager.py (3)
src/claude_monitor/core/settings.py (2)
  • exists (82-84)
  • load (55-71)
src/claude_monitor/utils/notifications.py (1)
  • NotificationManager (9-121)
src/claude_monitor/data/analysis.py (1)
  • analyze_usage (18-100)
🔇 Additional comments (1)
src/claude_monitor/data/history_manager.py (1)

239-247: Merge strategy LGTM

Merging historical first and letting current overwrite by date is correct and keeps the latest authoritative values.

- Prevent path traversal attacks with strict date format validation
- Improve data comparison logic to preserve comprehensive information
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
src/claude_monitor/data/history_manager.py (3)

49-66: Strict date validation: good; add final path containment check

Regex + strptime is solid. Add a resolve().relative_to(self.daily_dir.resolve()) check to guarantee the computed path stays under the base dir (belt-and-suspenders against any edge cases).

Apply:

@@ def _get_daily_file_path(self, date_str: str) -> Path:
-        month_dir = self.daily_dir / year / month
-        month_dir.mkdir(parents=True, exist_ok=True)
-
-        return month_dir / f"{date_str}.json"
+        month_dir = self.daily_dir / year / month
+        month_dir.mkdir(parents=True, exist_ok=True)
+        target = month_dir / f"{date_str}.json"
+        base = self.daily_dir.resolve()
+        resolved = target.resolve()
+        try:
+            resolved.relative_to(base)
+        except ValueError as e:
+            raise ValueError(f"Resolved path escapes history base dir: {resolved}") from e
+        return resolved

87-90: Don’t short-circuit same-session updates

This skips potentially richer data in the same run. Let on-disk comparison decide; only mark _saved_dates after a successful write.

Apply:

-            # Skip if already saved in this session (unless overwrite)
-            if not overwrite and date_str in self._saved_dates:
-                continue
+            # Let on-disk comparison decide; only guard after we actually write.

Note: Updating this will require adjusting tests that rely on the current short-circuit behavior.


100-135: Compare richness across fields, not just summed tokens

Use field-wise “greater-or-equal across key metrics” to avoid discarding updates where cost, cache tokens, or entries increase without raising total tokens.

Apply:

-                    existing_total_tokens = (
-                        existing_data.get("input_tokens", 0)
-                        + existing_data.get("output_tokens", 0)
-                        + existing_data.get("cache_creation_tokens", 0)
-                        + existing_data.get("cache_read_tokens", 0)
-                    )
-                    new_total_tokens = (
-                        day_data.get("input_tokens", 0)
-                        + day_data.get("output_tokens", 0)
-                        + day_data.get("cache_creation_tokens", 0)
-                        + day_data.get("cache_read_tokens", 0)
-                    )
-
-                    # Compare entries count and cost
-                    existing_entries = existing_data.get("entries_count", 0)
-                    new_entries = day_data.get("entries_count", 0)
-                    existing_cost = existing_data.get("total_cost", 0.0)
-                    new_cost = day_data.get("total_cost", 0.0)
-
-                    # Keep existing only if it has more total tokens, more entries, AND higher cost
-                    # This ensures we don't lose any valuable information
-                    if (
-                        existing_total_tokens > new_total_tokens
-                        and existing_entries >= new_entries
-                        and existing_cost >= new_cost
-                    ):
+                    fields = [
+                        "input_tokens",
+                        "output_tokens",
+                        "cache_creation_tokens",
+                        "cache_read_tokens",
+                        "entries_count",
+                    ]
+                    existing_cost = existing_data.get("total_cost", 0.0)
+                    new_cost = day_data.get("total_cost", 0.0)
+                    # Keep existing only if it's richer-or-equal across all fields and cost
+                    if all(existing_data.get(k, 0) >= day_data.get(k, 0) for k in fields) and existing_cost >= new_cost:
                         self._saved_dates.add(date_str)
                         continue
🧹 Nitpick comments (5)
src/tests/test_settings.py (1)

720-732: Avoid testing internal validators directly

Calling Settings.validate_history directly couples tests to implementation details. Prefer validating via instantiation (already covered) and remove or downgrade this test.

src/claude_monitor/data/history_manager.py (1)

141-147: Durable writes: fsync temp before atomic replace

Flush and fsync the temp file prior to replace to reduce risk of data loss on crashes.

Apply:

-                with open(temp_file, "w", encoding="utf-8") as f:
-                    json.dump(day_data, f, indent=2, default=str, ensure_ascii=False)
+                import os
+                with open(temp_file, "w", encoding="utf-8") as f:
+                    json.dump(day_data, f, indent=2, default=str, ensure_ascii=False)
+                    f.flush()
+                    os.fsync(f.fileno())
src/tests/test_history_manager.py (3)

82-109: Test relies on same-session short-circuit

This asserts no overwrite in the same run. If we remove the short-circuit (recommended), update this test to instead verify on-disk comparison behavior.


178-199: Stabilize time-based test

Using datetime.now() can be flaky across DST/clock changes. Consider freezing time (e.g., freezegun) or anchoring to a fixed date in UTC.


154-177: Add boundary case for end_date

Include a file with a date equal to end_date and assert whether it’s included/excluded to lock the contract.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1521544 and 0a1400b.

📒 Files selected for processing (5)
  • src/claude_monitor/cli/main.py (2 hunks)
  • src/claude_monitor/data/history_manager.py (1 hunks)
  • src/tests/test_aggregator.py (2 hunks)
  • src/tests/test_history_manager.py (1 hunks)
  • src/tests/test_settings.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/claude_monitor/cli/main.py
🧰 Additional context used
🧬 Code graph analysis (4)
src/tests/test_history_manager.py (1)
src/claude_monitor/data/history_manager.py (8)
  • HistoryManager (18-454)
  • _get_daily_file_path (35-65)
  • save_daily_data (67-158)
  • load_historical_daily_data (160-246)
  • merge_with_current_data (248-280)
  • aggregate_monthly_from_daily (327-411)
  • cleanup_old_data (282-325)
  • get_statistics (413-454)
src/tests/test_aggregator.py (2)
src/claude_monitor/data/aggregator.py (4)
  • UsageAggregator (92-303)
  • aggregate_daily (157-179)
  • aggregate_monthly (181-203)
  • aggregate (269-303)
src/claude_monitor/core/models.py (1)
  • UsageEntry (20-31)
src/claude_monitor/data/history_manager.py (1)
src/claude_monitor/data/aggregator.py (4)
  • UsageAggregator (92-297)
  • AggregatedStats (20-48)
  • AggregatedPeriod (52-89)
  • add_entry (62-72)
src/tests/test_settings.py (1)
src/claude_monitor/core/settings.py (4)
  • Settings (87-384)
  • validate_history (258-268)
  • to_namespace (364-384)
  • Settings (87-354)
🔇 Additional comments (5)
src/tests/test_aggregator.py (2)

625-707: Good coverage for date-range filtering

Solid slice tests for daily and monthly windows with sorted assertions. LGTM.


708-744: aggregate signature matches test usage
Verified that UsageAggregator.aggregate defines start_date: Optional[datetime] = None and end_date: Optional[datetime] = None (src/claude_monitor/data/aggregator.py:269–271), so the tests’ calls align with the implementation.

src/tests/test_settings.py (2)

733-761: to_namespace includes the new fields
Lines 380–382 in src/claude_monitor/core/settings.py map start_date, end_date, and history to the namespace.


670-696: Unable to locate the _parse_date function implementation in src/claude_monitor/cli/main.py. Please verify that _parse_date correctly parses and normalizes all supported separators (YYYY-MM-DD, YYYY.MM.DD, YYYY/MM/DD) before comparing; if it currently only handles one format, add explicit parsing logic or validation to cover the others.

src/tests/test_history_manager.py (1)

396-439: Great “richer update” coverage

Good assertion that newer data with more info replaces the old record. LGTM.

@Hoper-J Hoper-J force-pushed the feature/historical-data-storage branch from 0a1400b to 364f572 Compare September 6, 2025 14:00
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
src/claude_monitor/cli/main.py (2)

403-406: Clarify accepted date formats in the warning message.

Message lists only YYYY-MM-DD while code accepts ".", "/". Update for accuracy.

-            print_themed(
-                f"Invalid date format: {date_str}. Use YYYY-MM-DD.", style="warning"
-            )
+            print_themed(
+                f"Invalid date format: {date_str}. Use YYYY-MM-DD, YYYY.MM.DD, or YYYY/MM/DD.",
+                style="warning",
+            )

458-465: Save only current-session days to history to avoid redundant rewrites.

After merging, aggregated_data may include already-saved historical days. Filter to save only new/current ones (auto) and keep writeonly behavior unchanged.

-                if aggregated_data and history_mode in ["auto", "writeonly"]:
-                    saved_count = history_manager.save_daily_data(aggregated_data)
+                if aggregated_data and history_mode in ["auto", "writeonly"]:
+                    data_to_save = aggregated_data
+                    if history_mode == "auto" and historical_data:
+                        hist_dates = {d.get("date") for d in historical_data}
+                        data_to_save = [d for d in aggregated_data if d.get("date") not in hist_dates]
+                    saved_count = history_manager.save_daily_data(data_to_save)
                     if saved_count > 0:
                         print_themed(
                             f"Saved {saved_count} days to history", style="success"
                         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a1400b and 364f572.

📒 Files selected for processing (1)
  • src/claude_monitor/cli/main.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/claude_monitor/cli/main.py (4)
src/claude_monitor/utils/time_utils.py (2)
  • TimezoneHandler (348-458)
  • ensure_timezone (412-416)
src/claude_monitor/terminal/themes.py (1)
  • print_themed (692-700)
src/claude_monitor/data/aggregator.py (4)
  • aggregate (269-303)
  • UsageAggregator (92-303)
  • UsageAggregator (92-297)
  • aggregate_daily (157-179)
src/claude_monitor/data/history_manager.py (4)
  • load_historical_daily_data (160-246)
  • merge_with_current_data (248-280)
  • save_daily_data (67-158)
  • aggregate_monthly_from_daily (327-411)
🔇 Additional comments (1)
src/claude_monitor/cli/main.py (1)

526-538: Nice: monthly derived from merged daily ensures consistency.

Using HistoryManager.aggregate_monthly_from_daily after merging current+history is the right move to align daily and monthly totals.

- Fix path traversal vulnerability with strict regex validation
- Implement unified inclusive date boundaries across components
- Remove problematic session-level skip check in history manager
- Add date range validation in CLI
- Update test expectations for new boundary semantics
- Remove complex date conversion logic from main.py
@Hoper-J Hoper-J force-pushed the feature/historical-data-storage branch from 4aeb3ea to 8da35a4 Compare September 7, 2025 08:06
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
src/claude_monitor/data/aggregator.py (2)

138-149: Hoist constant computation for end boundary

Compute next_day once outside the loop to avoid per-entry allocation. Behavior stays the same.

         norm_end = (
             self.timezone_handler.to_timezone(end_date, self.timezone)
             if end_date
             else None
         )
 
+        next_day = norm_end + timedelta(days=1) if norm_end else None
-
-        for entry in entries:
+        for entry in entries:
             # Convert entry timestamp to the configured timezone for filtering
             # and period-key extraction.
             ts_local = self.timezone_handler.to_timezone(entry.timestamp, self.timezone)
 
             # Apply date filters (inclusive boundaries in local timezone)
             if norm_start and ts_local < norm_start:
                 continue
             # For end_date, include all entries up to the end of that day.
             # Exclude entries >= next day's midnight in local timezone.
-            if norm_end:
-                next_day = norm_end + timedelta(days=1)
-                if ts_local >= next_day:
-                    continue
+            if next_day and ts_local >= next_day:
+                continue

Also applies to: 151-165


319-323: Avoid mutating input entries in place

You’re normalizing tzinfo by mutating entry.timestamp. Safer to avoid side effects and use a local when possible. Not critical here, but reduces surprises for callers reusing the list.

src/claude_monitor/cli/main.py (1)

430-433: Minor: skip redundant monthly aggregation when using history

You compute monthly via UsageAggregator, then recompute from daily+history and replace it. Skip the initial call when view_mode=="monthly" and history is not "off" to avoid wasted work.

-        logger.info(f"Loading {view_mode} usage data...")
-        aggregated_data = aggregator.aggregate(start_dt, end_dt)
+        logger.info(f"Loading {view_mode} usage data...")
+        aggregated_data = (
+            [] if (view_mode == "monthly" and getattr(args, "history", "auto") != "off")
+            else aggregator.aggregate(start_dt, end_dt)
+        )
src/claude_monitor/data/history_manager.py (2)

208-227: Compare by date only to avoid time-of-day skew

Files are per-day; if a caller passes a datetime with a time component, current naive comparison can exclude the boundary day unintentionally. Normalize to dates before comparing.

-                            # Make file_date timezone-naive for comparison
-                            # Convert start_date and end_date to naive if they're aware
-                            compare_start = (
-                                start_date.replace(tzinfo=None)
-                                if start_date and start_date.tzinfo
-                                else start_date
-                            )
-                            compare_end = (
-                                end_date.replace(tzinfo=None)
-                                if end_date and end_date.tzinfo
-                                else end_date
-                            )
-
-                            # Check if within range (inclusive on both ends)
-                            if compare_start and file_date < compare_start:
+                            file_day = file_date.date()
+                            compare_start = start_date.date() if start_date else None
+                            compare_end = end_date.date() if end_date else None
+
+                            # Check if within range (inclusive on both ends)
+                            if compare_start and file_day < compare_start:
                                 continue
-                            if (
-                                compare_end and file_date > compare_end
-                            ):  # > means end_date is inclusive
+                            if compare_end and file_day > compare_end:
                                 continue

281-325: Minor: cutoff by date for cleanup

Switch to date-based comparison to avoid partial-day behavior; not urgent.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 364f572 and 8da35a4.

📒 Files selected for processing (6)
  • src/claude_monitor/cli/main.py (2 hunks)
  • src/claude_monitor/core/settings.py (4 hunks)
  • src/claude_monitor/data/aggregator.py (5 hunks)
  • src/claude_monitor/data/history_manager.py (1 hunks)
  • src/tests/test_aggregator.py (3 hunks)
  • src/tests/test_history_manager.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/tests/test_history_manager.py
  • src/claude_monitor/core/settings.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/tests/test_aggregator.py (2)
src/claude_monitor/data/aggregator.py (4)
  • UsageAggregator (92-330)
  • aggregate_daily (184-206)
  • aggregate_monthly (208-230)
  • aggregate (296-330)
src/claude_monitor/core/models.py (1)
  • UsageEntry (20-31)
src/claude_monitor/cli/main.py (5)
src/claude_monitor/utils/time_utils.py (2)
  • TimezoneHandler (348-458)
  • ensure_timezone (412-416)
src/claude_monitor/terminal/themes.py (1)
  • print_themed (692-700)
src/claude_monitor/data/aggregator.py (2)
  • aggregate (296-330)
  • UsageAggregator (92-330)
src/claude_monitor/data/history_manager.py (5)
  • HistoryManager (18-453)
  • load_historical_daily_data (153-245)
  • merge_with_current_data (247-279)
  • save_daily_data (66-151)
  • aggregate_monthly_from_daily (326-410)
src/claude_monitor/ui/table_views.py (1)
  • display_aggregated_view (318-384)
src/claude_monitor/data/aggregator.py (2)
src/tests/test_data_reader.py (1)
  • timezone_handler (470-471)
src/claude_monitor/utils/time_utils.py (2)
  • TimezoneHandler (348-458)
  • to_timezone (441-445)
🔇 Additional comments (10)
src/claude_monitor/data/aggregator.py (1)

108-111: Good: timezone handler now respects user-selected TZ

Initializing TimezoneHandler with the configured timezone is correct and aligns all downstream conversions.

src/tests/test_aggregator.py (5)

352-354: End-date inclusivity clarified in tests

Comment and inputs now reflect whole-day inclusive semantics.


624-659: Solid coverage for inclusive daily filters

The range Jan 3–7 with boundary assertions and token checks looks good.


660-707: Monthly range test matches intended semantics

Using end_date=2025-01-01 to include only December 2024 works and is unambiguous.


708-745: aggregate(start,end) wiring validated

Patch/mocked load verifies both filtered and unfiltered paths; assertions are tight.


746-791: Timezone grouping test is precise

Boundary behavior under UTC vs America/New_York is correctly asserted with totals.

src/claude_monitor/cli/main.py (2)

393-411: Date parsing is robust and TZ-aware

Supporting three formats and localizing via TimezoneHandler is correct.


412-419: Range validation: good early guard

Clear user-facing message and early return prevent confusing results.

src/claude_monitor/data/history_manager.py (2)

48-65: Good hardening: strict YYYY-MM-DD and safe path construction

Regex + datetime.strptime + fixed layout eliminates traversal risks.


326-411: Monthly aggregation from daily looks correct

Accumulators, models_used set→list, and breakdown merges are sound.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/claude_monitor/cli/main.py (1)

482-488: Apply the same end-of-day fix to daily aggregation feeding monthly composition

Without this, the last day can be dropped from monthly totals.

Apply:

-                    current_daily = daily_aggregator.aggregate(start_dt, end_dt)
+                    current_daily = daily_aggregator.aggregate(start_dt, end_dt_eod)
🧹 Nitpick comments (3)
src/claude_monitor/data/history_manager.py (3)

165-175: Validate days_back and handle non-positive values

Avoid surprising ranges when days_back is 0/negative.

Apply:

-        if days_back:
-            end_date = datetime.now()
-            start_date = end_date - timedelta(days=days_back)
+        if days_back is not None:
+            if days_back <= 0:
+                logger.warning("days_back must be > 0; ignoring days_back=%r", days_back)
+            else:
+                end_date = datetime.now()
+                start_date = end_date - timedelta(days=days_back)

198-209: Minor perf/readability: compute compare_start/end once

These are constant across files; move them out of the inner loop.

Apply:

         # Scan for files in the date range
-        for year_dir in sorted(self.daily_dir.iterdir()):
+        # Normalize bounds once
+        compare_start = (
+            start_date.replace(tzinfo=None) if start_date and start_date.tzinfo else start_date
+        )
+        compare_end = (
+            end_date.replace(tzinfo=None) if end_date and end_date.tzinfo else end_date
+        )
+
+        for year_dir in sorted(self.daily_dir.iterdir()):
@@
-                            # Make file_date timezone-naive for comparison
-                            # Convert start_date and end_date to naive if they're aware
-                            compare_start = (
-                                start_date.replace(tzinfo=None)
-                                if start_date and start_date.tzinfo
-                                else start_date
-                            )
-                            compare_end = (
-                                end_date.replace(tzinfo=None)
-                                if end_date and end_date.tzinfo
-                                else end_date
-                            )

Also applies to: 176-176


437-443: Expose both history root and daily path in stats

“data_directory” points to the daily subdir; also returning the history root can help users locate everything.

Apply:

         return {
             "total_files": total_files,
             "oldest_date": oldest_date.strftime("%Y-%m-%d") if oldest_date else None,
             "newest_date": newest_date.strftime("%Y-%m-%d") if newest_date else None,
             "total_size_mb": round(total_size / (1024 * 1024), 2),
-            "data_directory": str(self.daily_dir),
+            "data_directory": str(self.daily_dir),
+            "history_directory": str(self.data_dir),
         }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8da35a4 and 10df28e.

📒 Files selected for processing (2)
  • src/claude_monitor/cli/main.py (2 hunks)
  • src/claude_monitor/data/history_manager.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/claude_monitor/cli/main.py (1)
src/claude_monitor/data/aggregator.py (7)
  • UsageAggregator (92-297)
  • aggregate_daily (157-179)
  • add_entry (62-72)
  • AggregatedPeriod (52-89)
  • _aggregate_by_period (110-155)
  • aggregate (269-297)
  • aggregate_monthly (181-203)
src/claude_monitor/data/history_manager.py (1)
src/claude_monitor/data/aggregator.py (4)
  • UsageAggregator (92-297)
  • add_entry (62-72)
  • AggregatedStats (20-48)
  • AggregatedPeriod (52-89)
🔇 Additional comments (3)
src/claude_monitor/data/history_manager.py (1)

46-65: Good: strict path validation + atomic writes

Regex + strptime prevent traversal; temp-file swap is safe. Looks solid.

Also applies to: 125-133

src/claude_monitor/cli/main.py (2)

441-472: Daily view history flow reads/merges/saves correctly

Behavior aligns with modes and inclusive history semantics. Nice.


430-433: Confirm aggregator.aggregate(...) accepts start/end args

The call passes two args; ensure the Aggregator API matches to avoid runtime errors.

Run:

#!/bin/bash
# Verify aggregate signature and parameter names
rg -nP 'class\s+UsageAggregator\b' -n src/claude_monitor/data/aggregator.py -C2
rg -nP 'def\s+aggregate\s*\(' src/claude_monitor/data/aggregator.py -n -C2
rg -nP 'aggregate\([^)]*(start|end)_date' src/claude_monitor/data/aggregator.py -n -C2

- Add detailed comments explaining whole-day inclusive end_date behavior
- Document that end_date includes all entries up to 23:59:59
- Clarify that aggregator excludes entries >= next day's midnight
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/claude_monitor/cli/main.py (2)

389-394: Nit: move imports to module scope (or add justification).

Local imports are fine for cold-start, but unless startup cost is a concern here, prefer top-level imports for clarity and tooling. If you keep them local for perf, add a brief comment.


395-408: Date parsing looks good; trim whitespace to be forgiving.

Minor robustness: strip incoming strings before parsing to avoid user typos causing “invalid format.”

-        def _parse_date(date_str: Optional[str]):
+        def _parse_date(date_str: Optional[str]):
             if not date_str:
                 return None
+            date_str = date_str.strip()
src/claude_monitor/data/aggregator.py (2)

126-137: Clarify “whole-day inclusive” applies to both start and end.

Doc mentions end-day inclusivity explicitly; mirror that for start to avoid ambiguity.

-            Both start_date and end_date are inclusive. If end_date is provided,
-            all entries from that entire day are included (up to 23:59:59.999999).
+            Both start_date and end_date are inclusive by calendar day in the
+            configured timezone. The filter includes all entries on start_date
+            and end_date (i.e., [start_of_day(start_date), start_of_day(end_date+1))).

321-325: Minor: avoid mutating entries in place.

If entries are reused elsewhere, prefer deriving a local variable for tz-adjusted timestamp to keep dataclasses immutable in spirit.

-        for entry in entries:
-            if entry.timestamp.tzinfo is None:
-                entry.timestamp = self.timezone_handler.ensure_timezone(entry.timestamp)
+        for entry in entries:
+            if entry.timestamp.tzinfo is None:
+                entry.timestamp = self.timezone_handler.ensure_timezone(entry.timestamp)
+            # Keep entry.timestamp as-is beyond tz assignment; use ts_local below exclusively.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10df28e and fac3903.

📒 Files selected for processing (2)
  • src/claude_monitor/cli/main.py (2 hunks)
  • src/claude_monitor/data/aggregator.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/claude_monitor/cli/main.py (2)
src/claude_monitor/ui/table_views.py (1)
  • display_aggregated_view (318-384)
src/tests/test_aggregator.py (1)
  • TestUsageAggregator (282-622)
src/claude_monitor/data/aggregator.py (1)
src/tests/test_aggregator.py (3)
  • test_aggregate_daily_with_date_filter (346-360)
  • test_aggregate_monthly_with_date_filter (391-401)
  • TestUsageAggregator (282-622)
🔇 Additional comments (9)
src/claude_monitor/cli/main.py (8)

412-418: Good: early range validation.

Clear message and early exit prevent confusing results.


430-436: Confirm end-bound semantics match aggregator.

You rely on aggregator’s “whole-day inclusive” handling. Ensure aggregator normalizes end to day-start before adding +1 day; otherwise passing an end with time (e.g., 23:59:59) can overshoot. See comment in aggregator.py.


444-467: Daily: correct load→merge flow.

Inclusive day-range for history load and merging order (current wins) align with spec.


468-475: Daily: write path honors writeonly.

Save gating for ["auto","writeonly"] looks right; ensure save is idempotent on existing days.

Would you like me to add a quick guard in HistoryManager.save_daily_data to skip files that already exist with identical content?


476-485: Monthly: build from daily first — good.

Using daily aggregation as the canonical source for monthly is the right call.


486-501: Monthly: mode gating for load/save is consistent.

Readonly avoids writes; writeonly avoids loads. Good.


503-534: Helpful composition messaging.

Nice UX showing current vs history provenance.


536-549: Monthly aggregation from daily is correct.

Replacing initial monthly data with aggregation from daily ensures consistency.

src/claude_monitor/data/aggregator.py (1)

326-331: LGTM: single entry point honors filters per mode.

Clean delegation to daily/monthly helpers with bounds.

Comment on lines +140 to 167
# Normalize filter boundaries into the configured timezone for
# consistent, intuitive "whole-day inclusive" semantics.
norm_start = (
self.timezone_handler.to_timezone(start_date, self.timezone)
if start_date
else None
)
norm_end = (
self.timezone_handler.to_timezone(end_date, self.timezone)
if end_date
else None
)

for entry in entries:
# Apply date filters
if start_date and entry.timestamp < start_date:
continue
if end_date and entry.timestamp > end_date:
# Convert entry timestamp to the configured timezone for filtering
# and period-key extraction.
ts_local = self.timezone_handler.to_timezone(entry.timestamp, self.timezone)

# Apply date filters (inclusive boundaries in local timezone)
if norm_start and ts_local < norm_start:
continue
# For end_date, include all entries up to the end of that day.
# Exclude entries >= next day's midnight in local timezone.
if norm_end:
next_day = norm_end + timedelta(days=1)
if ts_local >= next_day:
continue

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Bug: end_date with time-of-day over-includes the next day. Normalize to day-start before +1 day.

Currently next_day = norm_end + 1 day. If norm_end is 23:59:59, you include the entire following day until 23:59:59. Normalize start/end to the start of their days, then use an exclusive upper bound = end_day_start + 1 day.

-        norm_start = (
-            self.timezone_handler.to_timezone(start_date, self.timezone)
-            if start_date
-            else None
-        )
-        norm_end = (
-            self.timezone_handler.to_timezone(end_date, self.timezone)
-            if end_date
-            else None
-        )
+        norm_start = (
+            self.timezone_handler.to_timezone(start_date, self.timezone)
+            if start_date
+            else None
+        )
+        norm_end = (
+            self.timezone_handler.to_timezone(end_date, self.timezone)
+            if end_date
+            else None
+        )
+        # Day-normalized inclusive bounds in local time
+        start_bound = (
+            norm_start.replace(hour=0, minute=0, second=0, microsecond=0)
+            if norm_start
+            else None
+        )
+        end_exclusive = (
+            norm_end.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
+            if norm_end
+            else None
+        )
@@
-            # Apply date filters (inclusive boundaries in local timezone)
-            if norm_start and ts_local < norm_start:
+            # Apply date filters (inclusive by calendar day)
+            if start_bound and ts_local < start_bound:
                 continue
-            # For end_date, include all entries up to the end of that day.
-            # Exclude entries >= next day's midnight in local timezone.
-            if norm_end:
-                next_day = norm_end + timedelta(days=1)
-                if ts_local >= next_day:
-                    continue
+            # Exclude entries on or after start_of_day(end_date + 1)
+            if end_exclusive and ts_local >= end_exclusive:
+                continue
#!/bin/bash
# Expect to see only two days (2024-01-15 and 2024-01-31) in tests;
# grep current logic and test inputs to confirm potential overshoot.
rg -n -C2 'next_day\s*=\s*norm_end\s*\+\s*timedelta\(days=1\)' src/claude_monitor/data/aggregator.py
rg -n -C2 'aggregate_daily\(.+end_date' src/tests | sed -n '1,120p'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant