-
-
Notifications
You must be signed in to change notification settings - Fork 280
feat: Add historical data storage for daily and monthly usage tracking #145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Add historical data storage for daily and monthly usage tracking #145
Conversation
- Add HistoryManager to save/load daily usage data beyond Claude's 30-day limit - Support date filtering with --start-date and --end-date parameters - Add --history parameter with modes: auto, off, readonly, writeonly - Monthly view now aggregates from historical daily data for complete stats - Store history in ~/.claude-monitor/history/daily/ organized by year/month - Smart merging of current and historical data (current data takes priority) - Automatic data saving on each --view daily/monthly execution This allows users to maintain long-term usage history and view statistics beyond Claude's 30-day data retention limit.
WalkthroughAdds timezone-aware CLI date-range parsing/validation, threads start/end datetimes into UsageAggregator.aggregate (end inclusive), introduces HistoryManager for persistent daily history (load/merge/save/aggregate), extends Settings with start_date/end_date/history and validator, and updates tests for date filters, timezone grouping, and history behavior. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant CLI as CLI (table view)
participant Settings
participant Aggregator
participant History as HistoryManager
User->>CLI: Run with --start-date/--end-date/--history
CLI->>Settings: Parse & validate settings
CLI->>CLI: Normalize start_dt/end_dt (tz-aware, make end inclusive)
CLI->>Aggregator: aggregate(start_dt, end_dt)
Aggregator-->>CLI: current_data (daily or monthly-pre)
alt history != "off"
CLI->>History: load_historical_daily_data(start_dt, end_dt)
History-->>CLI: historical_daily
CLI->>History: merge_with_current_data(current_daily, historical_daily)
History-->>CLI: merged_daily
alt monthly view requested
CLI->>History: aggregate_monthly_from_daily(merged_daily)
History-->>CLI: monthly_data
CLI-->>CLI: Use monthly_data for display
else daily view
CLI-->>CLI: Use merged_daily (or current) for display
end
alt history in ["auto","writeonly"]
CLI->>History: save_daily_data(current_daily)
History-->>CLI: saved_count
end
else
CLI-->>CLI: Display current_data
end
CLI-->>User: Render table
sequenceDiagram
autonumber
participant CLI
participant Aggregator
participant DailyAgg as Daily Aggregator
participant MonthlyAgg as Monthly Aggregator
CLI->>Aggregator: aggregate(start_dt, end_dt_inc)
alt aggregation_mode == "daily"
Aggregator->>DailyAgg: aggregate(range)
DailyAgg-->>Aggregator: daily_rows
Aggregator-->>CLI: daily_rows
else aggregation_mode == "monthly"
Aggregator->>DailyAgg: aggregate(range)
DailyAgg-->>Aggregator: daily_rows
Aggregator->>MonthlyAgg: aggregate(daily_rows)
MonthlyAgg-->>Aggregator: monthly_rows
Aggregator-->>CLI: monthly_rows
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (9)
src/claude_monitor/data/aggregator.py (3)
269-279: Document date-filter semantics (inclusive bounds) and timezone expectations.aggregate now accepts start_date/end_date, but the docstring doesn’t clarify inclusivity or timezone assumptions. The filtering logic includes both bounds (entry <= end_date), and CLI passes end_date + 1 day to include the whole end day. Please make this explicit to prevent misuse.
Apply this docstring tweak:
def aggregate( self, start_date: Optional[datetime] = None, end_date: Optional[datetime] = None ) -> List[Dict[str, Any]]: - """Main aggregation method that reads data and returns aggregated results. + """Main aggregation method that reads data and returns aggregated results. + + Notes: + - Date filters are treated as inclusive. If you want to include a whole + calendar day, pass an end_date at the end of that day, or pass the + next day’s midnight (as the CLI does by adding 1 day). + - start_date/end_date should be timezone-aware to avoid ambiguous + comparisons. Aggregation compares aware datetimes directly.
299-304: Align period-key grouping with the user-selected timezone (not just filtering).Filtering happens against aware datetimes from the CLI, which is good. However, period grouping uses entry.timestamp.strftime(...) directly, and entries are only localized when naive—defaulting to UTC because TimezoneHandler() is constructed without the requested timezone. This can produce UTC-based day/month keys while filtering is done in another tz, leading to off-by-one-day grouping near boundaries.
Two parts:
- Initialize the handler with the chosen timezone, or set it in init.
- Convert to that timezone before computing the period key.
Outside the changed hunk, consider this refactor:
# __init__ self.timezone = timezone self.timezone_handler = TimezoneHandler(default_tz=self.timezone) # _aggregate_by_period for entry in entries: # ... existing date range checks ... local_ts = self.timezone_handler.to_timezone(entry.timestamp, self.timezone) period_key = period_key_func(local_ts)And keep the filtering against aware datetimes (as it already does). This preserves consistent “what day/month is this?” semantics across filtering and grouping.
269-304: Minor API consistency: consider adding date filters to aggregate_from_blocks.Not a blocker, but since aggregate now takes date filters, aggregate_from_blocks could accept the same to keep the public surface coherent, particularly for tests or future CLI paths that aggregate pre-extracted blocks.
src/claude_monitor/core/settings.py (1)
141-149: Clarify accepted date formats in help text; optionally normalize early.The CLI accepts YYYY-MM-DD, YYYY.MM.DD, and YYYY/MM/DD, but the field description only mentions YYYY-MM-DD. Please update the description to reduce user confusion. Optional: add a validator to normalize to ISO (YYYY-MM-DD) early.
Apply this small description tweak:
- start_date: Optional[str] = Field( - default=None, - description="Start date for filtering data (YYYY-MM-DD format)" - ) + start_date: Optional[str] = Field( + default=None, + description="Start date for filtering data (YYYY-MM-DD, YYYY.MM.DD, or YYYY/MM/DD)" + )And same for end_date:
- end_date: Optional[str] = Field( - default=None, - description="End date for filtering data (YYYY-MM-DD format)" - ) + end_date: Optional[str] = Field( + default=None, + description="End date for filtering data (YYYY-MM-DD, YYYY.MM.DD, or YYYY/MM/DD)" + )Optional normalization validator (outside this hunk):
@field_validator("start_date", "end_date", mode="before") @classmethod def normalize_dates(cls, v: Any) -> Any: if isinstance(v, str): v = v.strip() # Accept dotted/slashed and convert to ISO v_iso = v.replace(".", "-").replace("/", "-") # Basic YYYY-MM-DD sanity check if len(v_iso) == 10: return v_iso return vsrc/claude_monitor/cli/main.py (2)
394-404: Improve the invalid-date message to reflect all supported formats.The parser accepts three formats but the warning mentions only YYYY-MM-DD. Align the message to reduce confusion.
- print_themed(f"Invalid date format: {date_str}. Use YYYY-MM-DD.", style="warning") + print_themed( + f"Invalid date format: {date_str}. Use one of: YYYY-MM-DD, YYYY.MM.DD, YYYY/MM/DD.", + style="warning" + )
421-424: Avoid redundant aggregation when view_mode == 'monthly'.You eagerly aggregate once (monthly) and then rebuild monthly from daily + history, overwriting the initial result. This doubles I/O and CPU. Only aggregate upfront for daily; let the monthly branch handle its own flow.
- # Get aggregated data with date filters - logger.info(f"Loading {view_mode} usage data...") - aggregated_data = aggregator.aggregate(start_dt, end_dt_inclusive) + # Get aggregated data with date filters + logger.info(f"Loading {view_mode} usage data...") + if view_mode == "daily": + aggregated_data = aggregator.aggregate(start_dt, end_dt_inclusive) + else: + aggregated_data = []src/claude_monitor/data/history_manager.py (3)
58-123: Safe writes and update heuristics are good; consider richer “freshness” criteria.You skip overwrites if existing_data == new or if existing_tokens >= new_tokens. Token count is a decent proxy, but you might miss improvements where tokens are equal but other fields differ (e.g., entries_count, total_cost, model_breakdowns).
Consider extending the comparison:
- existing_tokens = existing_data.get("input_tokens", 0) + existing_data.get("output_tokens", 0) - new_tokens = day_data.get("input_tokens", 0) + day_data.get("output_tokens", 0) - - if existing_tokens >= new_tokens: + existing_tokens = existing_data.get("input_tokens", 0) + existing_data.get("output_tokens", 0) + new_tokens = day_data.get("input_tokens", 0) + day_data.get("output_tokens", 0) + existing_entries = existing_data.get("entries_count", 0) + new_entries = day_data.get("entries_count", 0) + existing_cost = existing_data.get("total_cost", 0.0) + new_cost = day_data.get("total_cost", 0.0) + + # Prefer data with more tokens; tie-break by entries, then cost + if ( + existing_tokens > new_tokens + or (existing_tokens == new_tokens and existing_entries >= new_entries) + or (existing_tokens == new_tokens and existing_entries == new_entries and existing_cost >= new_cost) + ): self._saved_dates.add(date_str) continue
240-284: Cleanup routine is careful and removes empty directories.Looks good. Minor nit: consider logging the cutoff date for traceability.
1-409: Add targeted tests for HistoryManager behaviors.To lock in correctness across refactors, consider tests for:
- save_daily_data overwrite heuristics,
- load_historical_daily_data range filtering and sorting,
- merge_with_current_data precedence,
- aggregate_monthly_from_daily correctness,
- cleanup_old_data deletion and empty-dir removal.
I can draft these if helpful.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
src/claude_monitor/cli/main.py(2 hunks)src/claude_monitor/core/settings.py(4 hunks)src/claude_monitor/data/aggregator.py(2 hunks)src/claude_monitor/data/history_manager.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/claude_monitor/cli/main.py (5)
src/claude_monitor/utils/time_utils.py (2)
TimezoneHandler(348-458)ensure_timezone(412-416)src/claude_monitor/terminal/themes.py (1)
print_themed(692-700)src/claude_monitor/data/aggregator.py (2)
aggregate(269-305)UsageAggregator(92-305)src/claude_monitor/data/history_manager.py (5)
HistoryManager(18-409)load_historical_daily_data(124-202)merge_with_current_data(204-238)save_daily_data(58-122)aggregate_monthly_from_daily(285-366)src/claude_monitor/ui/table_views.py (1)
display_aggregated_view(318-384)
src/claude_monitor/data/aggregator.py (1)
src/tests/test_aggregator.py (2)
test_aggregate_daily_with_date_filter(346-360)test_aggregate_monthly_with_date_filter(391-401)
🔇 Additional comments (10)
src/claude_monitor/core/settings.py (3)
183-187: History mode field looks good.The Literal plus dedicated validator provide a clear UX and error message. No issues here.
258-271: Validator is correct and resilient.Lowercasing and enumerated check provide precise feedback on invalid values. Good defensive programming.
382-385: Propagation to CLI namespace is correct.start_date, end_date, and history are now available to CLI paths. Matches the PR objectives.
src/claude_monitor/cli/main.py (3)
408-410: End-date inclusivity approach is correct.Adding one day to make the end-date inclusive aligns with the aggregator’s “<= end_date” check and avoids 23:59:59 edge cases. Nice touch.
455-502: Monthly-from-daily flow is solid and aligns with the persistence strategy.The sequence—build current daily, load historical daily, optional save, merge (current wins), then aggregate to monthly—looks correct and makes monthly consistent with daily totals. Nice.
472-476: Great, fetching the context around the save logic to confirm the weekly/monthly flows. We'll review whether the “writeonly” mode should indeed apply in both places.src/claude_monitor/data/history_manager.py (4)
124-203: Date-range loading is robust; inclusive comparisons are consistent with CLI.The tz-naive comparison after stripping tzinfo from aware bounds is a pragmatic choice given YYYY-MM-DD filenames. The logic pairs well with the CLI’s “end_date + 1 day” strategy.
204-239: Merge semantics are correct and predictable.Historical first, then overwrite with current ensures freshest data wins for overlaps. Sorted output is helpful for downstream display.
285-366: Monthly aggregation from daily is implemented well.Correct field names, proper set-to-list conversion, and deterministic sorting. Nice.
368-409: Useful stats helper.No issues; the size and oldest/newest summaries will aid support/debug.
- Fixed 76 ruff linting errors (W293, W292, I001) - Removed trailing whitespace from blank lines - Added missing newline at end of file - Fixed import ordering Apologies for the lint issues in the initial PR. All checks should pass now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/claude_monitor/data/history_manager.py (3)
148-158: Normalize date comparisons once and compare by date-only
- Treat
days_back=0explicitly by checkingis not None.- Compute comparison dates once.
- Compare
date()to avoid time-of-day edge cases.- if days_back: + if days_back is not None: end_date = datetime.now() start_date = end_date - timedelta(days=days_back) elif not start_date: # Default to loading all available data start_date = datetime(2020, 1, 1) # Arbitrary old date - - if not end_date: - end_date = datetime.now() + if not end_date: + end_date = datetime.now() + # Normalize to dates for comparison (strip tz if present) + def _to_naive_date(dt): + if dt is None: + return None + if getattr(dt, "tzinfo", None): + dt = dt.replace(tzinfo=None) + return dt.date() + compare_start_date = _to_naive_date(start_date) + compare_end_date = _to_naive_date(end_date) @@ - # Make file_date timezone-naive for comparison - # Convert start_date and end_date to naive if they're aware - compare_start = ( - start_date.replace(tzinfo=None) - if start_date and start_date.tzinfo - else start_date - ) - compare_end = ( - end_date.replace(tzinfo=None) - if end_date and end_date.tzinfo - else end_date - ) + # Compare by date-only + file_date_date = file_date.date() @@ - if compare_start and file_date < compare_start: + if compare_start_date and file_date_date < compare_start_date: continue - if compare_end and file_date > compare_end: + if compare_end_date and file_date_date > compare_end_date: continueAlso applies to: 181-198
261-279: Use date-only cutoff to avoid off-by-hours deletionsComparing naive datetimes can delete files exactly N days old depending on current time. Compare
date()values.- cutoff_date = datetime.now() - timedelta(days=days_to_keep) + cutoff_date = (datetime.now() - timedelta(days=days_to_keep)).date() @@ - file_date = datetime.strptime(date_str, "%Y-%m-%d") + file_date_date = datetime.strptime(date_str, "%Y-%m-%d").date() @@ - if file_date < cutoff_date: + if file_date_date < cutoff_date: file_path.unlink()
88-90: Specify UTF-8 encoding and avoid ASCII-escaping JSON (nit)This makes file IO explicit and preserves non-ASCII in model names, etc.
- with open(file_path, "r") as f: + with open(file_path, "r", encoding="utf-8") as f: existing_data = json.load(f) @@ - with open(temp_file, "w") as f: - json.dump(day_data, f, indent=2, default=str) + with open(temp_file, "w", encoding="utf-8") as f: + json.dump(day_data, f, indent=2, ensure_ascii=False, default=str) @@ - with open(file_path, "r") as f: + with open(file_path, "r", encoding="utf-8") as f: data = json.load(f)Also applies to: 114-116, 201-204
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/claude_monitor/cli/main.py(2 hunks)src/claude_monitor/core/settings.py(4 hunks)src/claude_monitor/data/aggregator.py(2 hunks)src/claude_monitor/data/history_manager.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/claude_monitor/cli/main.py
- src/claude_monitor/core/settings.py
- src/claude_monitor/data/aggregator.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/claude_monitor/data/history_manager.py (3)
src/claude_monitor/core/settings.py (2)
exists(82-84)load(55-71)src/claude_monitor/utils/notifications.py (1)
NotificationManager(9-121)src/claude_monitor/data/analysis.py (1)
analyze_usage(18-100)
🔇 Additional comments (1)
src/claude_monitor/data/history_manager.py (1)
239-247: Merge strategy LGTMMerging historical first and letting current overwrite by date is correct and keeps the latest authoritative values.
- Prevent path traversal attacks with strict date format validation - Improve data comparison logic to preserve comprehensive information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
src/claude_monitor/data/history_manager.py (3)
49-66: Strict date validation: good; add final path containment checkRegex + strptime is solid. Add a
resolve().relative_to(self.daily_dir.resolve())check to guarantee the computed path stays under the base dir (belt-and-suspenders against any edge cases).Apply:
@@ def _get_daily_file_path(self, date_str: str) -> Path: - month_dir = self.daily_dir / year / month - month_dir.mkdir(parents=True, exist_ok=True) - - return month_dir / f"{date_str}.json" + month_dir = self.daily_dir / year / month + month_dir.mkdir(parents=True, exist_ok=True) + target = month_dir / f"{date_str}.json" + base = self.daily_dir.resolve() + resolved = target.resolve() + try: + resolved.relative_to(base) + except ValueError as e: + raise ValueError(f"Resolved path escapes history base dir: {resolved}") from e + return resolved
87-90: Don’t short-circuit same-session updatesThis skips potentially richer data in the same run. Let on-disk comparison decide; only mark
_saved_datesafter a successful write.Apply:
- # Skip if already saved in this session (unless overwrite) - if not overwrite and date_str in self._saved_dates: - continue + # Let on-disk comparison decide; only guard after we actually write.Note: Updating this will require adjusting tests that rely on the current short-circuit behavior.
100-135: Compare richness across fields, not just summed tokensUse field-wise “greater-or-equal across key metrics” to avoid discarding updates where cost, cache tokens, or entries increase without raising total tokens.
Apply:
- existing_total_tokens = ( - existing_data.get("input_tokens", 0) - + existing_data.get("output_tokens", 0) - + existing_data.get("cache_creation_tokens", 0) - + existing_data.get("cache_read_tokens", 0) - ) - new_total_tokens = ( - day_data.get("input_tokens", 0) - + day_data.get("output_tokens", 0) - + day_data.get("cache_creation_tokens", 0) - + day_data.get("cache_read_tokens", 0) - ) - - # Compare entries count and cost - existing_entries = existing_data.get("entries_count", 0) - new_entries = day_data.get("entries_count", 0) - existing_cost = existing_data.get("total_cost", 0.0) - new_cost = day_data.get("total_cost", 0.0) - - # Keep existing only if it has more total tokens, more entries, AND higher cost - # This ensures we don't lose any valuable information - if ( - existing_total_tokens > new_total_tokens - and existing_entries >= new_entries - and existing_cost >= new_cost - ): + fields = [ + "input_tokens", + "output_tokens", + "cache_creation_tokens", + "cache_read_tokens", + "entries_count", + ] + existing_cost = existing_data.get("total_cost", 0.0) + new_cost = day_data.get("total_cost", 0.0) + # Keep existing only if it's richer-or-equal across all fields and cost + if all(existing_data.get(k, 0) >= day_data.get(k, 0) for k in fields) and existing_cost >= new_cost: self._saved_dates.add(date_str) continue
🧹 Nitpick comments (5)
src/tests/test_settings.py (1)
720-732: Avoid testing internal validators directlyCalling
Settings.validate_historydirectly couples tests to implementation details. Prefer validating via instantiation (already covered) and remove or downgrade this test.src/claude_monitor/data/history_manager.py (1)
141-147: Durable writes: fsync temp before atomic replaceFlush and fsync the temp file prior to replace to reduce risk of data loss on crashes.
Apply:
- with open(temp_file, "w", encoding="utf-8") as f: - json.dump(day_data, f, indent=2, default=str, ensure_ascii=False) + import os + with open(temp_file, "w", encoding="utf-8") as f: + json.dump(day_data, f, indent=2, default=str, ensure_ascii=False) + f.flush() + os.fsync(f.fileno())src/tests/test_history_manager.py (3)
82-109: Test relies on same-session short-circuitThis asserts no overwrite in the same run. If we remove the short-circuit (recommended), update this test to instead verify on-disk comparison behavior.
178-199: Stabilize time-based testUsing
datetime.now()can be flaky across DST/clock changes. Consider freezing time (e.g., freezegun) or anchoring to a fixed date in UTC.
154-177: Add boundary case for end_dateInclude a file with a date equal to end_date and assert whether it’s included/excluded to lock the contract.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/claude_monitor/cli/main.py(2 hunks)src/claude_monitor/data/history_manager.py(1 hunks)src/tests/test_aggregator.py(2 hunks)src/tests/test_history_manager.py(1 hunks)src/tests/test_settings.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/claude_monitor/cli/main.py
🧰 Additional context used
🧬 Code graph analysis (4)
src/tests/test_history_manager.py (1)
src/claude_monitor/data/history_manager.py (8)
HistoryManager(18-454)_get_daily_file_path(35-65)save_daily_data(67-158)load_historical_daily_data(160-246)merge_with_current_data(248-280)aggregate_monthly_from_daily(327-411)cleanup_old_data(282-325)get_statistics(413-454)
src/tests/test_aggregator.py (2)
src/claude_monitor/data/aggregator.py (4)
UsageAggregator(92-303)aggregate_daily(157-179)aggregate_monthly(181-203)aggregate(269-303)src/claude_monitor/core/models.py (1)
UsageEntry(20-31)
src/claude_monitor/data/history_manager.py (1)
src/claude_monitor/data/aggregator.py (4)
UsageAggregator(92-297)AggregatedStats(20-48)AggregatedPeriod(52-89)add_entry(62-72)
src/tests/test_settings.py (1)
src/claude_monitor/core/settings.py (4)
Settings(87-384)validate_history(258-268)to_namespace(364-384)Settings(87-354)
🔇 Additional comments (5)
src/tests/test_aggregator.py (2)
625-707: Good coverage for date-range filteringSolid slice tests for daily and monthly windows with sorted assertions. LGTM.
708-744: aggregate signature matches test usage
Verified thatUsageAggregator.aggregatedefinesstart_date: Optional[datetime] = Noneandend_date: Optional[datetime] = None(src/claude_monitor/data/aggregator.py:269–271), so the tests’ calls align with the implementation.src/tests/test_settings.py (2)
733-761: to_namespace includes the new fields
Lines 380–382 insrc/claude_monitor/core/settings.pymapstart_date,end_date, andhistoryto the namespace.
670-696: Unable to locate the_parse_datefunction implementation insrc/claude_monitor/cli/main.py. Please verify that_parse_datecorrectly parses and normalizes all supported separators (YYYY-MM-DD, YYYY.MM.DD, YYYY/MM/DD) before comparing; if it currently only handles one format, add explicit parsing logic or validation to cover the others.src/tests/test_history_manager.py (1)
396-439: Great “richer update” coverageGood assertion that newer data with more info replaces the old record. LGTM.
0a1400b to
364f572
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (2)
src/claude_monitor/cli/main.py (2)
403-406: Clarify accepted date formats in the warning message.Message lists only YYYY-MM-DD while code accepts ".", "/". Update for accuracy.
- print_themed( - f"Invalid date format: {date_str}. Use YYYY-MM-DD.", style="warning" - ) + print_themed( + f"Invalid date format: {date_str}. Use YYYY-MM-DD, YYYY.MM.DD, or YYYY/MM/DD.", + style="warning", + )
458-465: Save only current-session days to history to avoid redundant rewrites.After merging, aggregated_data may include already-saved historical days. Filter to save only new/current ones (auto) and keep writeonly behavior unchanged.
- if aggregated_data and history_mode in ["auto", "writeonly"]: - saved_count = history_manager.save_daily_data(aggregated_data) + if aggregated_data and history_mode in ["auto", "writeonly"]: + data_to_save = aggregated_data + if history_mode == "auto" and historical_data: + hist_dates = {d.get("date") for d in historical_data} + data_to_save = [d for d in aggregated_data if d.get("date") not in hist_dates] + saved_count = history_manager.save_daily_data(data_to_save) if saved_count > 0: print_themed( f"Saved {saved_count} days to history", style="success" )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/claude_monitor/cli/main.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/claude_monitor/cli/main.py (4)
src/claude_monitor/utils/time_utils.py (2)
TimezoneHandler(348-458)ensure_timezone(412-416)src/claude_monitor/terminal/themes.py (1)
print_themed(692-700)src/claude_monitor/data/aggregator.py (4)
aggregate(269-303)UsageAggregator(92-303)UsageAggregator(92-297)aggregate_daily(157-179)src/claude_monitor/data/history_manager.py (4)
load_historical_daily_data(160-246)merge_with_current_data(248-280)save_daily_data(67-158)aggregate_monthly_from_daily(327-411)
🔇 Additional comments (1)
src/claude_monitor/cli/main.py (1)
526-538: Nice: monthly derived from merged daily ensures consistency.Using HistoryManager.aggregate_monthly_from_daily after merging current+history is the right move to align daily and monthly totals.
- Fix path traversal vulnerability with strict regex validation - Implement unified inclusive date boundaries across components - Remove problematic session-level skip check in history manager - Add date range validation in CLI - Update test expectations for new boundary semantics - Remove complex date conversion logic from main.py
4aeb3ea to
8da35a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
src/claude_monitor/data/aggregator.py (2)
138-149: Hoist constant computation for end boundaryCompute next_day once outside the loop to avoid per-entry allocation. Behavior stays the same.
norm_end = ( self.timezone_handler.to_timezone(end_date, self.timezone) if end_date else None ) + next_day = norm_end + timedelta(days=1) if norm_end else None - - for entry in entries: + for entry in entries: # Convert entry timestamp to the configured timezone for filtering # and period-key extraction. ts_local = self.timezone_handler.to_timezone(entry.timestamp, self.timezone) # Apply date filters (inclusive boundaries in local timezone) if norm_start and ts_local < norm_start: continue # For end_date, include all entries up to the end of that day. # Exclude entries >= next day's midnight in local timezone. - if norm_end: - next_day = norm_end + timedelta(days=1) - if ts_local >= next_day: - continue + if next_day and ts_local >= next_day: + continueAlso applies to: 151-165
319-323: Avoid mutating input entries in placeYou’re normalizing tzinfo by mutating entry.timestamp. Safer to avoid side effects and use a local when possible. Not critical here, but reduces surprises for callers reusing the list.
src/claude_monitor/cli/main.py (1)
430-433: Minor: skip redundant monthly aggregation when using historyYou compute monthly via UsageAggregator, then recompute from daily+history and replace it. Skip the initial call when view_mode=="monthly" and history is not "off" to avoid wasted work.
- logger.info(f"Loading {view_mode} usage data...") - aggregated_data = aggregator.aggregate(start_dt, end_dt) + logger.info(f"Loading {view_mode} usage data...") + aggregated_data = ( + [] if (view_mode == "monthly" and getattr(args, "history", "auto") != "off") + else aggregator.aggregate(start_dt, end_dt) + )src/claude_monitor/data/history_manager.py (2)
208-227: Compare by date only to avoid time-of-day skewFiles are per-day; if a caller passes a datetime with a time component, current naive comparison can exclude the boundary day unintentionally. Normalize to dates before comparing.
- # Make file_date timezone-naive for comparison - # Convert start_date and end_date to naive if they're aware - compare_start = ( - start_date.replace(tzinfo=None) - if start_date and start_date.tzinfo - else start_date - ) - compare_end = ( - end_date.replace(tzinfo=None) - if end_date and end_date.tzinfo - else end_date - ) - - # Check if within range (inclusive on both ends) - if compare_start and file_date < compare_start: + file_day = file_date.date() + compare_start = start_date.date() if start_date else None + compare_end = end_date.date() if end_date else None + + # Check if within range (inclusive on both ends) + if compare_start and file_day < compare_start: continue - if ( - compare_end and file_date > compare_end - ): # > means end_date is inclusive + if compare_end and file_day > compare_end: continue
281-325: Minor: cutoff by date for cleanupSwitch to date-based comparison to avoid partial-day behavior; not urgent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/claude_monitor/cli/main.py(2 hunks)src/claude_monitor/core/settings.py(4 hunks)src/claude_monitor/data/aggregator.py(5 hunks)src/claude_monitor/data/history_manager.py(1 hunks)src/tests/test_aggregator.py(3 hunks)src/tests/test_history_manager.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/tests/test_history_manager.py
- src/claude_monitor/core/settings.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/tests/test_aggregator.py (2)
src/claude_monitor/data/aggregator.py (4)
UsageAggregator(92-330)aggregate_daily(184-206)aggregate_monthly(208-230)aggregate(296-330)src/claude_monitor/core/models.py (1)
UsageEntry(20-31)
src/claude_monitor/cli/main.py (5)
src/claude_monitor/utils/time_utils.py (2)
TimezoneHandler(348-458)ensure_timezone(412-416)src/claude_monitor/terminal/themes.py (1)
print_themed(692-700)src/claude_monitor/data/aggregator.py (2)
aggregate(296-330)UsageAggregator(92-330)src/claude_monitor/data/history_manager.py (5)
HistoryManager(18-453)load_historical_daily_data(153-245)merge_with_current_data(247-279)save_daily_data(66-151)aggregate_monthly_from_daily(326-410)src/claude_monitor/ui/table_views.py (1)
display_aggregated_view(318-384)
src/claude_monitor/data/aggregator.py (2)
src/tests/test_data_reader.py (1)
timezone_handler(470-471)src/claude_monitor/utils/time_utils.py (2)
TimezoneHandler(348-458)to_timezone(441-445)
🔇 Additional comments (10)
src/claude_monitor/data/aggregator.py (1)
108-111: Good: timezone handler now respects user-selected TZInitializing TimezoneHandler with the configured timezone is correct and aligns all downstream conversions.
src/tests/test_aggregator.py (5)
352-354: End-date inclusivity clarified in testsComment and inputs now reflect whole-day inclusive semantics.
624-659: Solid coverage for inclusive daily filtersThe range Jan 3–7 with boundary assertions and token checks looks good.
660-707: Monthly range test matches intended semanticsUsing end_date=2025-01-01 to include only December 2024 works and is unambiguous.
708-745: aggregate(start,end) wiring validatedPatch/mocked load verifies both filtered and unfiltered paths; assertions are tight.
746-791: Timezone grouping test is preciseBoundary behavior under UTC vs America/New_York is correctly asserted with totals.
src/claude_monitor/cli/main.py (2)
393-411: Date parsing is robust and TZ-awareSupporting three formats and localizing via TimezoneHandler is correct.
412-419: Range validation: good early guardClear user-facing message and early return prevent confusing results.
src/claude_monitor/data/history_manager.py (2)
48-65: Good hardening: strict YYYY-MM-DD and safe path constructionRegex + datetime.strptime + fixed layout eliminates traversal risks.
326-411: Monthly aggregation from daily looks correctAccumulators, models_used set→list, and breakdown merges are sound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/claude_monitor/cli/main.py (1)
482-488: Apply the same end-of-day fix to daily aggregation feeding monthly compositionWithout this, the last day can be dropped from monthly totals.
Apply:
- current_daily = daily_aggregator.aggregate(start_dt, end_dt) + current_daily = daily_aggregator.aggregate(start_dt, end_dt_eod)
🧹 Nitpick comments (3)
src/claude_monitor/data/history_manager.py (3)
165-175: Validate days_back and handle non-positive valuesAvoid surprising ranges when days_back is 0/negative.
Apply:
- if days_back: - end_date = datetime.now() - start_date = end_date - timedelta(days=days_back) + if days_back is not None: + if days_back <= 0: + logger.warning("days_back must be > 0; ignoring days_back=%r", days_back) + else: + end_date = datetime.now() + start_date = end_date - timedelta(days=days_back)
198-209: Minor perf/readability: compute compare_start/end onceThese are constant across files; move them out of the inner loop.
Apply:
# Scan for files in the date range - for year_dir in sorted(self.daily_dir.iterdir()): + # Normalize bounds once + compare_start = ( + start_date.replace(tzinfo=None) if start_date and start_date.tzinfo else start_date + ) + compare_end = ( + end_date.replace(tzinfo=None) if end_date and end_date.tzinfo else end_date + ) + + for year_dir in sorted(self.daily_dir.iterdir()): @@ - # Make file_date timezone-naive for comparison - # Convert start_date and end_date to naive if they're aware - compare_start = ( - start_date.replace(tzinfo=None) - if start_date and start_date.tzinfo - else start_date - ) - compare_end = ( - end_date.replace(tzinfo=None) - if end_date and end_date.tzinfo - else end_date - )Also applies to: 176-176
437-443: Expose both history root and daily path in stats“data_directory” points to the daily subdir; also returning the history root can help users locate everything.
Apply:
return { "total_files": total_files, "oldest_date": oldest_date.strftime("%Y-%m-%d") if oldest_date else None, "newest_date": newest_date.strftime("%Y-%m-%d") if newest_date else None, "total_size_mb": round(total_size / (1024 * 1024), 2), - "data_directory": str(self.daily_dir), + "data_directory": str(self.daily_dir), + "history_directory": str(self.data_dir), }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/claude_monitor/cli/main.py(2 hunks)src/claude_monitor/data/history_manager.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/claude_monitor/cli/main.py (1)
src/claude_monitor/data/aggregator.py (7)
UsageAggregator(92-297)aggregate_daily(157-179)add_entry(62-72)AggregatedPeriod(52-89)_aggregate_by_period(110-155)aggregate(269-297)aggregate_monthly(181-203)
src/claude_monitor/data/history_manager.py (1)
src/claude_monitor/data/aggregator.py (4)
UsageAggregator(92-297)add_entry(62-72)AggregatedStats(20-48)AggregatedPeriod(52-89)
🔇 Additional comments (3)
src/claude_monitor/data/history_manager.py (1)
46-65: Good: strict path validation + atomic writesRegex + strptime prevent traversal; temp-file swap is safe. Looks solid.
Also applies to: 125-133
src/claude_monitor/cli/main.py (2)
441-472: Daily view history flow reads/merges/saves correctlyBehavior aligns with modes and inclusive history semantics. Nice.
430-433: Confirm aggregator.aggregate(...) accepts start/end argsThe call passes two args; ensure the Aggregator API matches to avoid runtime errors.
Run:
#!/bin/bash # Verify aggregate signature and parameter names rg -nP 'class\s+UsageAggregator\b' -n src/claude_monitor/data/aggregator.py -C2 rg -nP 'def\s+aggregate\s*\(' src/claude_monitor/data/aggregator.py -n -C2 rg -nP 'aggregate\([^)]*(start|end)_date' src/claude_monitor/data/aggregator.py -n -C2
- Add detailed comments explaining whole-day inclusive end_date behavior - Document that end_date includes all entries up to 23:59:59 - Clarify that aggregator excludes entries >= next day's midnight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/claude_monitor/cli/main.py (2)
389-394: Nit: move imports to module scope (or add justification).Local imports are fine for cold-start, but unless startup cost is a concern here, prefer top-level imports for clarity and tooling. If you keep them local for perf, add a brief comment.
395-408: Date parsing looks good; trim whitespace to be forgiving.Minor robustness: strip incoming strings before parsing to avoid user typos causing “invalid format.”
- def _parse_date(date_str: Optional[str]): + def _parse_date(date_str: Optional[str]): if not date_str: return None + date_str = date_str.strip()src/claude_monitor/data/aggregator.py (2)
126-137: Clarify “whole-day inclusive” applies to both start and end.Doc mentions end-day inclusivity explicitly; mirror that for start to avoid ambiguity.
- Both start_date and end_date are inclusive. If end_date is provided, - all entries from that entire day are included (up to 23:59:59.999999). + Both start_date and end_date are inclusive by calendar day in the + configured timezone. The filter includes all entries on start_date + and end_date (i.e., [start_of_day(start_date), start_of_day(end_date+1))).
321-325: Minor: avoid mutating entries in place.If entries are reused elsewhere, prefer deriving a local variable for tz-adjusted timestamp to keep dataclasses immutable in spirit.
- for entry in entries: - if entry.timestamp.tzinfo is None: - entry.timestamp = self.timezone_handler.ensure_timezone(entry.timestamp) + for entry in entries: + if entry.timestamp.tzinfo is None: + entry.timestamp = self.timezone_handler.ensure_timezone(entry.timestamp) + # Keep entry.timestamp as-is beyond tz assignment; use ts_local below exclusively.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/claude_monitor/cli/main.py(2 hunks)src/claude_monitor/data/aggregator.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/claude_monitor/cli/main.py (2)
src/claude_monitor/ui/table_views.py (1)
display_aggregated_view(318-384)src/tests/test_aggregator.py (1)
TestUsageAggregator(282-622)
src/claude_monitor/data/aggregator.py (1)
src/tests/test_aggregator.py (3)
test_aggregate_daily_with_date_filter(346-360)test_aggregate_monthly_with_date_filter(391-401)TestUsageAggregator(282-622)
🔇 Additional comments (9)
src/claude_monitor/cli/main.py (8)
412-418: Good: early range validation.Clear message and early exit prevent confusing results.
430-436: Confirm end-bound semantics match aggregator.You rely on aggregator’s “whole-day inclusive” handling. Ensure aggregator normalizes end to day-start before adding +1 day; otherwise passing an end with time (e.g., 23:59:59) can overshoot. See comment in aggregator.py.
444-467: Daily: correct load→merge flow.Inclusive day-range for history load and merging order (current wins) align with spec.
468-475: Daily: write path honors writeonly.Save gating for ["auto","writeonly"] looks right; ensure save is idempotent on existing days.
Would you like me to add a quick guard in HistoryManager.save_daily_data to skip files that already exist with identical content?
476-485: Monthly: build from daily first — good.Using daily aggregation as the canonical source for monthly is the right call.
486-501: Monthly: mode gating for load/save is consistent.Readonly avoids writes; writeonly avoids loads. Good.
503-534: Helpful composition messaging.Nice UX showing current vs history provenance.
536-549: Monthly aggregation from daily is correct.Replacing initial monthly data with aggregation from daily ensures consistency.
src/claude_monitor/data/aggregator.py (1)
326-331: LGTM: single entry point honors filters per mode.Clean delegation to daily/monthly helpers with bounds.
| # Normalize filter boundaries into the configured timezone for | ||
| # consistent, intuitive "whole-day inclusive" semantics. | ||
| norm_start = ( | ||
| self.timezone_handler.to_timezone(start_date, self.timezone) | ||
| if start_date | ||
| else None | ||
| ) | ||
| norm_end = ( | ||
| self.timezone_handler.to_timezone(end_date, self.timezone) | ||
| if end_date | ||
| else None | ||
| ) | ||
|
|
||
| for entry in entries: | ||
| # Apply date filters | ||
| if start_date and entry.timestamp < start_date: | ||
| continue | ||
| if end_date and entry.timestamp > end_date: | ||
| # Convert entry timestamp to the configured timezone for filtering | ||
| # and period-key extraction. | ||
| ts_local = self.timezone_handler.to_timezone(entry.timestamp, self.timezone) | ||
|
|
||
| # Apply date filters (inclusive boundaries in local timezone) | ||
| if norm_start and ts_local < norm_start: | ||
| continue | ||
| # For end_date, include all entries up to the end of that day. | ||
| # Exclude entries >= next day's midnight in local timezone. | ||
| if norm_end: | ||
| next_day = norm_end + timedelta(days=1) | ||
| if ts_local >= next_day: | ||
| continue | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: end_date with time-of-day over-includes the next day. Normalize to day-start before +1 day.
Currently next_day = norm_end + 1 day. If norm_end is 23:59:59, you include the entire following day until 23:59:59. Normalize start/end to the start of their days, then use an exclusive upper bound = end_day_start + 1 day.
- norm_start = (
- self.timezone_handler.to_timezone(start_date, self.timezone)
- if start_date
- else None
- )
- norm_end = (
- self.timezone_handler.to_timezone(end_date, self.timezone)
- if end_date
- else None
- )
+ norm_start = (
+ self.timezone_handler.to_timezone(start_date, self.timezone)
+ if start_date
+ else None
+ )
+ norm_end = (
+ self.timezone_handler.to_timezone(end_date, self.timezone)
+ if end_date
+ else None
+ )
+ # Day-normalized inclusive bounds in local time
+ start_bound = (
+ norm_start.replace(hour=0, minute=0, second=0, microsecond=0)
+ if norm_start
+ else None
+ )
+ end_exclusive = (
+ norm_end.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
+ if norm_end
+ else None
+ )
@@
- # Apply date filters (inclusive boundaries in local timezone)
- if norm_start and ts_local < norm_start:
+ # Apply date filters (inclusive by calendar day)
+ if start_bound and ts_local < start_bound:
continue
- # For end_date, include all entries up to the end of that day.
- # Exclude entries >= next day's midnight in local timezone.
- if norm_end:
- next_day = norm_end + timedelta(days=1)
- if ts_local >= next_day:
- continue
+ # Exclude entries on or after start_of_day(end_date + 1)
+ if end_exclusive and ts_local >= end_exclusive:
+ continue#!/bin/bash
# Expect to see only two days (2024-01-15 and 2024-01-31) in tests;
# grep current logic and test inputs to confirm potential overshoot.
rg -n -C2 'next_day\s*=\s*norm_end\s*\+\s*timedelta\(days=1\)' src/claude_monitor/data/aggregator.py
rg -n -C2 'aggregate_daily\(.+end_date' src/tests | sed -n '1,120p'
Summary
This PR adds historical data storage to keep usage statistics beyond Claude's 30-day limit.
Note: This feature was developed for personal use and I thought it might be useful for others too. No pressure to merge - just wanted to share! Thank you for creating claude-monitor! 😊
Key Features
🗄️ Historical Data Management
--view dailyexecution~/.claude-monitor/history/daily/organized by year/month📅 Date Filtering
--start-dateand--end-dateparameters for filtering data ranges🔄 History Modes
New
--historyparameter with four modes:auto(default): Save and load historical data automaticallyoff: Disable history functionality completelyreadonly: Only load historical data, don't save new datawriteonly: Only save new data, don't load historical📊 Enhanced Monthly View
Try It Out
If you'd like to test this feature:
Usage Examples
Implementation Details
Modified files:
cli/main.py- Added history integrationcore/settings.py- New parametersdata/aggregator.py- Date filteringdata/history_manager.py- New history moduleData Structure
Historical data is stored as JSON files:
Testing
Benefits
Hope this helps! Feel free to reach out if you have any questions.
Summary by CodeRabbit
New Features
Tests