
[Feature]: Support Bundle Generation - Automated Diagnostics Collection #1197

@crivetimihai

📦 Epic: Support Bundle Generation - Automated Diagnostics Collection

Goal

Implement a support bundle generation feature that allows administrators and users to collect comprehensive diagnostic information for troubleshooting MCP Gateway issues. The feature should automatically sanitize sensitive data (passwords, tokens, API keys, secrets) while providing all necessary technical details for support teams.

Why Now?

As MCP Gateway deployments grow in complexity and scale, troubleshooting production issues becomes increasingly challenging:

  1. Manual Diagnostic Collection is Error-Prone: Support teams currently ask users to manually collect logs, configuration files, and system information, leading to incomplete or inconsistent data
  2. Sensitive Data Exposure Risk: Users may accidentally share passwords, API keys, or tokens when sending logs or configuration files
  3. Time-Consuming Troubleshooting: Without standardized diagnostics, support teams spend significant time requesting additional information
  4. Missing Context: System metrics, platform details, and service status are often missing from user-provided diagnostics
  5. Operational Efficiency: A standardized support bundle accelerates issue resolution and improves user experience

By implementing automated support bundle generation with built-in sanitization, we enable:

  • One-click diagnostic collection
  • Automatic redaction of known sensitive-data patterns
  • Comprehensive troubleshooting context
  • Faster issue resolution
  • Better user experience

📖 User Stories

US-1: Platform Admin - Generate Support Bundle via CLI

As a Platform Administrator
I want to generate a support bundle from the command line
So that I can collect diagnostics for troubleshooting without accessing the UI

Acceptance Criteria:

Given I have CLI access to the MCP Gateway server
When I run the command:
  mcpgateway --support-bundle --output-dir /tmp --log-lines 1000
Then a ZIP file should be created at /tmp/mcpgateway-support-YYYY-MM-DD-HHMMSS.zip
And the bundle should contain:
  - Version information
  - System diagnostics
  - Configuration (sanitized)
  - Last 1000 lines of logs (sanitized)
  - Platform details
  - Service status
And I should see a success message with the bundle path
And I should see a security notice about reviewing before sharing

Technical Requirements:

  • CLI flag: --support-bundle
  • Optional parameters: --output-dir, --log-lines, --no-logs, --no-env, --no-system
  • Exit with status 0 on success, 1 on failure
  • Timestamped filename format: mcpgateway-support-YYYY-MM-DD-HHMMSS.zip
  • Display bundle size after generation
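The flag set and filename format above could be wired up roughly as follows. This is a minimal sketch using only the standard library: `build_parser` and `bundle_filename` are hypothetical helper names, and using UTC for the timestamp is an assumption, since the issue does not say which time zone the filename uses.

```python
import argparse
from datetime import datetime, timezone
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the support-bundle flags listed above."""
    parser = argparse.ArgumentParser(prog="mcpgateway")
    parser.add_argument("--support-bundle", action="store_true")
    parser.add_argument("--output-dir", default="/tmp")
    parser.add_argument("--log-lines", type=int, default=1000)
    parser.add_argument("--no-logs", action="store_true")
    parser.add_argument("--no-env", action="store_true")
    parser.add_argument("--no-system", action="store_true")
    return parser


def bundle_filename(now: Optional[datetime] = None) -> str:
    """Format mcpgateway-support-YYYY-MM-DD-HHMMSS.zip (UTC is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("mcpgateway-support-%Y-%m-%d-%H%M%S.zip")


args = build_parser().parse_args(
    ["--support-bundle", "--output-dir", "/tmp", "--log-lines", "1000"]
)
print(args.support_bundle, args.output_dir, args.log_lines)
print(bundle_filename(datetime(2025, 1, 9, 12, 0, 0)))
```

argparse converts dashes to underscores, so `--no-logs` is read back as `args.no_logs`.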

US-2: Support Engineer - Download Bundle via Admin UI

As a Support Engineer
I want to download a support bundle from the Admin UI
So that I can provide easy instructions to users without CLI access

Acceptance Criteria:

Given I am logged into the Admin UI
When I navigate to the Diagnostics tab
Then I should see a "Support Bundle" card with:
  - Description of bundle contents
  - Security notice about data sanitization
  - "Download Support Bundle" button
When I click the "Download Support Bundle" button
Then a ZIP file should download to my browser
And the filename should be mcpgateway-support-YYYY-MM-DD-HHMMSS.zip
And the download should complete within 10 seconds for typical deployments

Technical Requirements:

  • Located in Diagnostics/Version tab of Admin UI
  • Visual design consistent with existing UI components
  • Download button with loading state
  • Shows CLI alternative command for reference
  • Displays bundle contents checklist with green checkmarks

US-3: API Consumer - Generate Bundle via REST API

As an API Consumer
I want to generate support bundles programmatically via API
So that I can integrate diagnostics collection into monitoring/alerting systems

Acceptance Criteria:

Given I have valid authentication credentials
When I make a GET request to /admin/support-bundle/generate
Then I should receive:
  - HTTP 200 status code
  - Content-Type: application/zip
  - Content-Disposition header with filename
  - ZIP file containing sanitized diagnostics
And the request must be authenticated (valid JWT or Basic Auth required)

Technical Requirements:

  • Endpoint: GET /admin/support-bundle/generate
  • Query parameters: log_lines, include_logs, include_env, include_system
  • Authentication required (JWT bearer token or Basic Auth)
  • Response headers:
    • Content-Type: application/zip
    • Content-Disposition: attachment; filename="..."
    • Content-Length: <size>
    • X-Content-Type-Options: nosniff
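To make the header requirements concrete, here is a small sketch that builds the four response headers for a ZIP payload. `bundle_response_headers` is a hypothetical helper, not a function from the codebase, and it is independent of any web framework.

```python
from typing import Dict


def bundle_response_headers(filename: str, body: bytes) -> Dict[str, str]:
    """Build the download headers listed above for a ZIP payload."""
    return {
        "Content-Type": "application/zip",
        # Quote the filename so browsers save it under the timestamped name.
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(len(body)),
        # Stops browsers from guessing a different content type.
        "X-Content-Type-Options": "nosniff",
    }


headers = bundle_response_headers("mcpgateway-support-2025-01-09-120000.zip", b"PK\x05\x06")
for name, value in headers.items():
    print(f"{name}: {value}")
```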

US-4: Security Officer - Verify Data Sanitization

As a Security Officer
I want to ensure all sensitive data is automatically redacted
So that users can safely share support bundles without exposing credentials

Acceptance Criteria:

Given a support bundle has been generated
When I extract and review the bundle contents
Then the following should be redacted with "*****":
  - Passwords (DATABASE_PASSWORD, BASIC_AUTH_PASSWORD, etc.)
  - API keys (API_KEY, OPENAI_API_KEY, etc.)
  - Tokens (JWT_SECRET_KEY, BEARER tokens, etc.)
  - Secrets (AUTH_ENCRYPTION_SECRET, etc.)
  - Database URLs with passwords (postgresql://user:PASS@host → postgresql://user:*****@host)
  - Redis URLs with passwords
  - JWT tokens (eyJ... patterns)
  - Authorization headers
And public configuration values should remain visible:
  - HOST, PORT, LOG_LEVEL
  - Feature flags (UI_ENABLED, etc.)
  - Transport settings

Technical Requirements:

  • Regex-based pattern matching for sensitive data
  • URL credential sanitization (preserve username, remove password)
  • Environment variable filtering based on naming patterns
  • Log line-by-line sanitization
  • Configuration field exclusion (sensitive Pydantic fields)
  • Test coverage for all sanitization patterns
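The sanitization requirements can be illustrated with a few standard-library helpers. This is a sketch only: the function names, the name hints, and the regexes below are assumptions made for illustration and are not the project's actual `SENSITIVE_PATTERNS`. The patterns deliberately err toward over-redaction.

```python
import re
from urllib.parse import urlsplit, urlunsplit

REDACTED = "*****"

# Assumed substrings that mark an environment variable name as secret.
SECRET_NAME_HINTS = ("PASSWORD", "SECRET", "TOKEN", "API_KEY", "PRIVATE_KEY")

# Illustrative patterns, applied in order (Bearer first so the token after it is caught).
LINE_PATTERNS = [
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer " + REDACTED),
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"), REDACTED),
    (re.compile(r"(?i)\b([a-z][a-z0-9+.-]*://[^\s:/@]*):[^\s@/]+@"), r"\1:" + REDACTED + "@"),
    (
        re.compile(
            r"""(?i)([\w-]*(?:password|secret|token|api[_-]?key)[\w-]*["']?\s*[:=]\s*)"""
            r"""["']?[^\s"',;]+["']?"""
        ),
        r"\1" + REDACTED,
    ),
]


def is_secret(key: str) -> bool:
    """Return True when an env var name looks like it holds a secret."""
    upper = key.upper()
    return any(hint in upper for hint in SECRET_NAME_HINTS)


def sanitize_url(url: str) -> str:
    """Replace the password in a URL's userinfo, keeping user, host, and port."""
    parts = urlsplit(url)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    user, colon, _password = userinfo.partition(":")
    if not at or not colon:
        return url  # no embedded password
    return urlunsplit(parts._replace(netloc=f"{user}:{REDACTED}@{hostport}"))


def sanitize_line(line: str) -> str:
    """Apply every pattern to a single log line."""
    for pattern, replacement in LINE_PATTERNS:
        line = pattern.sub(replacement, line)
    return line


print(sanitize_url("postgresql://user:PASS@host:5432/db"))
print(sanitize_line("DATABASE_PASSWORD=secret123"))
```

Name-based matching will miss secrets stored under unusual variable names, which is why a manual review before sharing remains advisable.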

US-5: DevOps - Automate Bundle Collection on Errors

As a DevOps Engineer
I want to programmatically collect support bundles when errors occur
So that I can attach diagnostics to incident reports automatically

Acceptance Criteria:

Given I have monitoring/alerting configured
When an error threshold is exceeded
Then I can call the API to generate a support bundle
And store it in an incident management system
And attach it to the alert/ticket

Technical Requirements:

  • Scriptable API endpoint
  • Configurable bundle contents
  • Fast generation time (< 10 seconds typical)
  • Predictable error handling
  • Machine-readable success/failure responses
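An alerting hook could call the endpoint along these lines. This is a sketch under assumptions: `build_bundle_request` and `fetch_bundle` are hypothetical names, the base URL and token are supplied by the caller, and only the documented query parameters are used. `urllib.request.urlopen` raises `urllib.error.HTTPError` on 401 or 500 responses, which gives the caller a machine-readable failure signal.

```python
import shutil
import urllib.request
from urllib.parse import urlencode


def build_bundle_request(
    base_url: str,
    token: str,
    *,
    log_lines: int = 1000,
    include_logs: bool = True,
    include_env: bool = True,
    include_system: bool = True,
) -> urllib.request.Request:
    """Build an authenticated GET request for the support-bundle endpoint."""
    query = urlencode(
        {
            "log_lines": log_lines,
            "include_logs": str(include_logs).lower(),
            "include_env": str(include_env).lower(),
            "include_system": str(include_system).lower(),
        }
    )
    url = f"{base_url.rstrip('/')}/admin/support-bundle/generate?{query}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def fetch_bundle(request: urllib.request.Request, dest: str, timeout: float = 30.0) -> str:
    """Stream the ZIP response to disk; HTTP errors propagate as HTTPError."""
    with urllib.request.urlopen(request, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


request = build_bundle_request("http://localhost:4444", "your-jwt-token", log_lines=500)
print(request.full_url)
```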

🏗 Architecture

Component Overview

graph TB
    subgraph "Entry Points"
        A[CLI Command]
        B[Admin UI Button]
        C[REST API Endpoint]
    end

    subgraph "Support Bundle Service"
        D[SupportBundleService]
        E[Data Collection]
        F[Sanitization Engine]
        G[ZIP Generator]
    end

    subgraph "Data Sources"
        H[Version Info]
        I[System Metrics]
        J[Configuration]
        K[Logs]
        L[Services Status]
    end

    A --> D
    B --> C
    C --> D
    D --> E
    E --> H
    E --> I
    E --> J
    E --> K
    E --> L
    E --> F
    F --> G
    G --> M[ZIP File Output]

Data Flow

sequenceDiagram
    participant User
    participant UI/CLI/API
    participant Service as SupportBundleService
    participant Sanitizer as Sanitization Engine
    participant FS as File System

    User->>UI/CLI/API: Request support bundle
    UI/CLI/API->>Service: generate_bundle(config)

    Service->>Service: Collect version info
    Service->>Service: Collect system metrics
    Service->>Service: Collect configuration
    Service->>Service: Collect logs

    Service->>Sanitizer: Sanitize environment vars
    Sanitizer-->>Service: Redacted env vars

    Service->>Sanitizer: Sanitize log lines
    Sanitizer-->>Service: Redacted logs

    Service->>Sanitizer: Sanitize URLs
    Sanitizer-->>Service: Redacted URLs

    Service->>Service: Create manifest
    Service->>Service: Generate ZIP
    Service->>FS: Write ZIP file

    Service-->>UI/CLI/API: Return bundle path
    UI/CLI/API-->>User: Download/display bundle

📋 Implementation Tasks

Phase 1: Core Service Implementation ✅

  • Create Service Module

    • Create mcpgateway/services/support_bundle_service.py
    • Define SupportBundleService class
    • Define SupportBundleConfig Pydantic model
    • Implement service initialization with timestamp and hostname
  • Data Collection Methods

    • _collect_version_info(): app name, version, MCP protocol, Python version, platform
    • _collect_system_info(): CPU, memory, disk (using psutil if available)
    • _collect_env_config(): environment variables with secret filtering
    • _collect_settings(): Pydantic settings with sensitive field exclusion
    • _collect_logs(): log file reading with size limits and line tailing
  • Sanitization Engine

    • Define SENSITIVE_PATTERNS regex list:
      • Password patterns: password[:=]"value"
      • Token patterns: token[:=]"value"
      • API key patterns: api_key[:=]"value"
      • Secret patterns: secret[:=]"value"
      • Bearer token patterns: Bearer <token>
      • Authorization headers
      • Database URL patterns: postgresql://user:pass@host
      • JWT token patterns: eyJ...
    • Implement _is_secret(key): detect secret env var names
    • Implement _sanitize_url(url): remove passwords from URLs
    • Implement _sanitize_line(line): apply regex patterns to log lines
  • Bundle Generation

    • Implement _create_manifest(): bundle metadata and warnings
    • Implement generate_bundle(): orchestrate collection and ZIP creation
    • Create ZIP file with timestamped filename
    • Add all collected data as JSON files
    • Add logs/ directory with sanitized logs
    • Add README.md with bundle description
    • Return Path to generated ZIP file
  • Configuration

    • SupportBundleConfig fields:
      • include_logs: bool = True
      • include_env: bool = True
      • include_system_info: bool = True
      • max_log_size_mb: float = 10.0
      • log_tail_lines: int = 1000 (0 = all)
      • output_dir: Optional[Path] = None (default: /tmp)
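The configuration fields and the log-tailing behaviour can be sketched together. The issue specifies a Pydantic model; the dataclass below is a standard-library stand-in with the same fields and defaults, and `BundleConfigSketch` and `tail_log` are hypothetical names. Skipping an oversized log file is one reading of the "should warn/skip" edge case, not confirmed behaviour.

```python
from collections import deque
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class BundleConfigSketch:
    """Dataclass stand-in for the SupportBundleConfig fields listed above."""

    include_logs: bool = True
    include_env: bool = True
    include_system_info: bool = True
    max_log_size_mb: float = 10.0
    log_tail_lines: int = 1000  # 0 means "all lines"
    output_dir: Optional[Path] = None


def tail_log(path: Path, config: BundleConfigSketch) -> List[str]:
    """Return the last N lines of a log file, honouring the config limits."""
    if not config.include_logs or not path.is_file():
        return []
    if path.stat().st_size > config.max_log_size_mb * 1024 * 1024:
        return []  # assumption: oversized files are skipped
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        if config.log_tail_lines == 0:
            return [line.rstrip("\n") for line in handle]
        # deque with maxlen keeps only the newest N lines while iterating.
        return [line.rstrip("\n") for line in deque(handle, maxlen=config.log_tail_lines)]


print(BundleConfigSketch())
```

Using a bounded `deque` means the file is read once from start to end while holding at most `log_tail_lines` lines in memory.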

Phase 2: CLI Integration ✅

  • CLI Command Implementation

    • Add --support-bundle flag to mcpgateway/cli.py
    • Implement _handle_support_bundle() function
    • Parse command-line options:
      • --output-dir <path>
      • --log-lines <n>
      • --no-logs
      • --no-env
      • --no-system
    • Call SupportBundleService.generate_bundle()
    • Display success message with bundle path and size
    • Display security notice
    • Exit with proper status codes (0=success, 1=failure)
  • Error Handling

    • Catch and display user-friendly error messages
    • Handle permission errors (output directory)
    • Handle disk space errors
    • Handle log file access errors
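The exit-code and error-handling tasks above might look like this in practice. It is a sketch, not the actual `_handle_support_bundle()` implementation: `run_support_bundle` is a hypothetical wrapper that takes any callable returning the bundle path, and the message wording is illustrative.

```python
import sys
from pathlib import Path
from typing import Callable


def run_support_bundle(generate: Callable[[], Path]) -> int:
    """Run a bundle generator and map the outcome to a process exit code."""
    try:
        bundle_path = generate()
        size_kb = bundle_path.stat().st_size / 1024
    except PermissionError as exc:
        print(f"Error: permission denied while writing bundle: {exc}", file=sys.stderr)
        return 1
    except OSError as exc:
        # Covers disk-full and unreadable-log failures, among other I/O errors.
        print(f"Error: I/O failure while generating bundle: {exc}", file=sys.stderr)
        return 1
    except Exception as exc:  # last resort: still exit with status 1
        print(f"Error: bundle generation failed: {exc}", file=sys.stderr)
        return 1
    print(f"Support bundle created: {bundle_path}")
    print(f"Bundle size: {size_kb:.2f} KB")
    print("Security Notice: review the bundle before sharing.")
    return 0
```

A CLI entry point would then finish with `sys.exit(run_support_bundle(...))`, so scripts can rely on the 0/1 status.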

Phase 3: API Endpoint ✅

  • REST API Endpoint

    • Add route: GET /admin/support-bundle/generate
    • Implement admin_generate_support_bundle() handler in mcpgateway/admin.py
    • Query parameters:
      • log_lines: int = 1000
      • include_logs: bool = True
      • include_env: bool = True
      • include_system: bool = True
    • Require authentication: user=Depends(get_current_user_with_permissions)
    • Generate bundle in temporary directory
    • Read ZIP file contents
    • Return Response with:
      • content: bytes (ZIP file)
      • media_type: "application/zip"
      • headers: Content-Disposition, Content-Length, X-Content-Type-Options
    • Clean up temporary file after response
  • Error Responses

    • HTTP 401 if not authenticated
    • HTTP 500 if generation fails
    • Include error message in JSON response

Phase 4: Admin UI Integration ✅

  • UI Component

    • Add support bundle card to mcpgateway/templates/version_info_partial.html
    • Create card with sections:
      • Header: "Troubleshooting Support"
      • Description of bundle contents
      • Bundle contents checklist (6 items with checkmarks)
      • Security notice (yellow warning box)
      • Download button (prominent, centered)
      • CLI alternative command (code block)
    • Style with Tailwind CSS classes (consistent with existing UI)
    • Download button links to: /admin/support-bundle/generate?log_lines=1000
    • Add SVG icons (download icon, checkmarks, warning icon)
  • Dark Mode Support

    • Use Tailwind dark mode classes: dark:bg-gray-800, dark:text-gray-200
    • Test in both light and dark themes

Phase 5: Testing ✅

  • Unit Tests (tests/unit/mcpgateway/services/test_support_bundle_service.py)

    • Test service initialization
    • Test _is_secret() detection (10+ cases)
    • Test _sanitize_url() with various URL formats
    • Test _sanitize_line() with various patterns
    • Test _collect_version_info() structure
    • Test _collect_system_info() structure
    • Test _collect_env_config() sanitization
    • Test _collect_settings() field exclusion
    • Test _collect_logs() with missing files
    • Test _create_manifest() structure
    • Test generate_bundle() creates valid ZIP
    • Test ZIP contents (files present)
    • Test custom configuration (exclusions)
    • Test convenience function create_support_bundle()
    • Test end-to-end sanitization in bundle
    • Target: 15+ tests, 90%+ coverage
  • Integration Tests

    • Test CLI command execution
    • Test API endpoint (authenticated request)
    • Test API endpoint (unauthenticated request → 401)
    • Test bundle download via browser simulation
  • Edge Cases

    • Test with log file > max_log_size_mb (should warn/skip)
    • Test with missing log directory
    • Test with read-only output directory (should fail gracefully)
    • Test with 0 log lines (all logs)
    • Test with all features disabled (minimal bundle)

Phase 6: Documentation ✅

  • User Documentation (CLAUDE.md)

    • Add "Generating Support Bundles" section
    • Document CLI usage with examples
    • Document API endpoint with curl examples
    • Document Admin UI location
    • List bundle contents
    • Highlight security features (automatic sanitization)
  • Code Documentation

    • Comprehensive docstrings with examples
    • Doctests in service methods
    • Type hints for all functions
    • README.md in bundle (generated content)
  • API Documentation

    • OpenAPI/Swagger docs for /admin/support-bundle/generate endpoint
    • Parameter descriptions
    • Response schema

Phase 7: Quality Assurance ✅

  • Code Quality

    • Run make autoflake isort black (formatting)
    • Run make flake8 (linting, pass with 0 errors)
    • Run make pylint (static analysis)
    • Run make doctest (verify docstring examples)
    • Pass make verify (comprehensive checks)
  • Security Review

    • Verify all SENSITIVE_PATTERNS catch real-world patterns
    • Test with production-like .env files (real secret values replaced with dummy values)
    • Verify no sensitive data leaks in any scenario
    • Review for path traversal vulnerabilities
    • Review for zip bomb vulnerabilities
  • Performance Testing

    • Measure bundle generation time (target: < 10 seconds)
    • Test with large log files (100MB+)
    • Test with many environment variables (1000+)
    • Verify memory usage is bounded

⚙️ CLI Usage Examples

Basic Usage

# Generate with default settings
mcpgateway --support-bundle

# Custom output directory
mcpgateway --support-bundle --output-dir /var/tmp

# Limit log lines
mcpgateway --support-bundle --log-lines 500

# Exclude components
mcpgateway --support-bundle --no-logs
mcpgateway --support-bundle --no-env --no-system

# Get all logs (no limit)
mcpgateway --support-bundle --log-lines 0

API Usage

# Using curl with JWT token
export TOKEN="your-jwt-token"
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:4444/admin/support-bundle/generate?log_lines=1000" \
  -o support-bundle.zip

# Using curl with Basic Auth
curl -u admin:password \
  "http://localhost:4444/admin/support-bundle/generate" \
  -o support-bundle.zip

# Customized bundle
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:4444/admin/support-bundle/generate?log_lines=500&include_system=true" \
  -o support-bundle.zip

Python Usage

from pathlib import Path
from mcpgateway.services.support_bundle_service import (
    SupportBundleService,
    SupportBundleConfig,
    create_support_bundle
)

# Using convenience function
bundle_path = create_support_bundle()
print(f"Bundle created: {bundle_path}")

# Using service with custom config
config = SupportBundleConfig(
    output_dir=Path("/tmp"),
    log_tail_lines=500,
    include_logs=True,
    include_env=True,
    include_system_info=True,
    max_log_size_mb=20.0
)

service = SupportBundleService()
bundle_path = service.generate_bundle(config)
print(f"Bundle created: {bundle_path}")

📦 Bundle Structure

mcpgateway-support-2025-01-09-120000.zip
├── MANIFEST.json                 # Bundle metadata and warnings
├── README.md                     # Usage instructions
├── version.json                  # App version, Python, FastAPI versions
├── system_info.json              # CPU, memory, disk, platform details
├── settings.json                 # Application settings (sanitized)
├── environment.json              # Environment variables (secrets redacted)
└── logs/
    └── mcpgateway.log            # Application logs (sanitized)
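A layout like the one above can be produced with the standard `zipfile` module. The sketch below is illustrative: `write_bundle` is a hypothetical helper, the README text is a placeholder, and the caller is assumed to pass already sanitized data.

```python
import io
import json
import zipfile
from typing import Any, Dict, List


def write_bundle(target: Any, sections: Dict[str, Any], log_lines: List[str]) -> None:
    """Write JSON sections, a README, and a log file into a ZIP archive.

    `target` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in sections.items():
            # default=str keeps non-JSON types (paths, datetimes) serializable.
            archive.writestr(name, json.dumps(payload, indent=2, default=str))
        archive.writestr("README.md", "Support bundle. Review before sharing.\n")
        archive.writestr("logs/mcpgateway.log", "\n".join(log_lines))


buffer = io.BytesIO()
write_bundle(
    buffer,
    {"MANIFEST.json": {"bundle_version": "1.0"}, "version.json": {"app_version": "0.8.0"}},
    ["INFO started", "INFO ready"],
)
with zipfile.ZipFile(buffer) as archive:
    print(sorted(archive.namelist()))
```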

MANIFEST.json

{
  "bundle_version": "1.0",
  "generated_at": "2025-01-09T12:00:00+00:00",
  "hostname": "mcp-gateway-prod-01",
  "app_version": "0.8.0",
  "configuration": {
    "include_logs": true,
    "include_env": true,
    "include_system_info": true,
    "log_tail_lines": 1000
  },
  "warning": "This bundle may contain sensitive information. Review before sharing."
}
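A manifest with this shape could be assembled as follows. This is a sketch: `create_manifest` here is a standalone hypothetical function rather than the service's `_create_manifest()` method, and its parameters are assumptions derived from the fields shown in the example.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def create_manifest(
    hostname: str,
    app_version: str,
    *,
    include_logs: bool = True,
    include_env: bool = True,
    include_system_info: bool = True,
    log_tail_lines: int = 1000,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build the manifest dictionary with an ISO 8601 UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return {
        "bundle_version": "1.0",
        "generated_at": now.isoformat(timespec="seconds"),
        "hostname": hostname,
        "app_version": app_version,
        "configuration": {
            "include_logs": include_logs,
            "include_env": include_env,
            "include_system_info": include_system_info,
            "log_tail_lines": log_tail_lines,
        },
        "warning": "This bundle may contain sensitive information. Review before sharing.",
    }


manifest = create_manifest(
    "mcp-gateway-prod-01",
    "0.8.0",
    now=datetime(2025, 1, 9, 12, 0, 0, tzinfo=timezone.utc),
)
print(json.dumps(manifest, indent=2))
```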

✅ Success Criteria

  • Functionality

    • CLI command generates valid ZIP bundles
    • API endpoint returns downloadable ZIP files
    • Admin UI button initiates bundle download
    • All three methods produce identical bundle structure
  • Security

    • 100% of tested secret patterns are redacted
    • No actual passwords, tokens, or API keys in generated bundles
    • Security notice displayed/included in all interfaces
    • Bundle README includes security warning
  • Performance

    • Bundle generation completes in < 10 seconds for typical deployments
    • Memory usage bounded (< 100MB for generation)
    • Works with large log files (100MB+)
  • Usability

    • One-command CLI usage
    • One-click UI download
    • Clear error messages on failures
    • Timestamped filenames for easy identification
  • Quality

    • 15+ unit tests with 90%+ coverage
    • Pass all linting/formatting checks
    • Comprehensive documentation
    • No known security vulnerabilities after security review

🏁 Definition of Done

  • SupportBundleService implemented with all collection methods
  • Sanitization engine with 8+ regex patterns
  • CLI command --support-bundle with optional parameters
  • API endpoint GET /admin/support-bundle/generate
  • Admin UI card in Diagnostics tab with download button
  • 15+ unit tests with 90%+ coverage
  • All tests passing (pytest)
  • Code passes make verify checks
  • Documentation updated (CLAUDE.md)
  • Security review completed (no sensitive data leaks)
  • Performance benchmarked (< 10 seconds)
  • Manual testing on dev/staging environment
  • Team review and approval

📝 Additional Notes

🔹 Security Considerations

Automatic Sanitization:

  • Regex patterns cover common secret naming conventions
  • URL parsing removes passwords while preserving connection info
  • Environment variable filtering based on naming patterns
  • Log line-by-line sanitization for in-line credentials

Residual Risk:

  • Custom/unusual secret naming may not be caught
  • Secrets in freeform text (comments, descriptions) may leak
  • Users should still review bundles before sharing publicly

Mitigation:

  • Clear warnings in UI, CLI, and bundle README
  • Documentation emphasizes review before sharing
  • Consider allow-list approach for production deployments

🔹 Performance Optimization

Bundle Generation Time:

  • Version info: < 1ms (in-memory)
  • System metrics: < 100ms (psutil calls)
  • Configuration: < 10ms (Pydantic serialization)
  • Environment: < 10ms (dictionary filtering)
  • Logs: < 5 seconds (depends on file size, tail optimization)
  • ZIP creation: < 1 second (compression)
  • Total: 3-10 seconds typical, dominated by log collection

Memory Usage:

  • Stream log files instead of loading entirely (when possible)
  • Use iterative ZIP writing (no double-buffering)
  • Bound log tail lines (default 1000)
  • Peak memory: 50-100MB typical
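The streaming point can be sketched with `zipfile.ZipFile.open(..., "w")`, which returns a writable handle for a single archive member so the log never has to be held in memory as a whole. `stream_log_into_zip` is a hypothetical helper, and the `sanitize` callable is supplied by the caller.

```python
import io
import zipfile
from typing import Callable


def stream_log_into_zip(
    archive: zipfile.ZipFile,
    log_path: str,
    arcname: str,
    sanitize: Callable[[str], str],
) -> int:
    """Copy a log file into the archive line by line, sanitizing as it goes.

    Returns the number of lines written. Only one line is held in memory at a time.
    """
    count = 0
    with archive.open(arcname, "w") as dest, open(
        log_path, "r", encoding="utf-8", errors="replace"
    ) as source:
        for line in source:
            dest.write(sanitize(line).encode("utf-8"))
            count += 1
    return count
```

While the member handle is open, no other entry can be written to the same archive, so this step would run after (or before) the small JSON files are added.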

🔹 Future Enhancements

  • Selective Bundling: UI checkboxes to include/exclude components
  • Bundle History: Store last N bundles, allow re-download
  • Scheduled Bundles: Cron/timer to generate bundles periodically
  • Incident Integration: Auto-attach bundles to incident tickets
  • Bundle Analysis: Built-in diagnostics scanner (log error patterns, config issues)
  • Multi-Server Bundles: Aggregate bundles from federated gateways
  • Custom Sanitization Rules: User-defined regex patterns via config
  • Compression Levels: Configurable ZIP compression (speed vs size)

🔹 Testing Strategy

Unit Tests (15+):

  • Individual method testing (collection, sanitization)
  • Edge case coverage (missing files, permissions, etc.)
  • Configuration validation

Integration Tests:

  • CLI end-to-end (run command, verify ZIP)
  • API endpoint (auth, download, error responses)
  • UI interaction (Playwright/Selenium)

Manual Testing Checklist:

  • Generate bundle in development environment
  • Extract and review all files
  • Verify no actual secrets present
  • Test with large log files
  • Test with missing log directory
  • Test with read-only filesystem
  • Test CLI from different directories
  • Test API with invalid auth
  • Test UI download in Chrome, Firefox, Safari

🔗 Related Issues


📊 Implementation Progress

Estimated Effort

  • Phase 1 (Core Service): 4-6 hours
  • Phase 2 (CLI): 1-2 hours
  • Phase 3 (API): 1-2 hours
  • Phase 4 (UI): 2-3 hours
  • Phase 5 (Testing): 3-4 hours
  • Phase 6 (Documentation): 1-2 hours
  • Phase 7 (QA): 2-3 hours

Total: 14-22 hours (2-3 days)

Risks & Mitigation

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| Sensitive data leaks | High | Low | Comprehensive test cases, security review |
| Performance issues with large logs | Medium | Medium | Log tailing, size limits, streaming |
| Platform-specific errors | Medium | Medium | Cross-platform testing, psutil fallbacks |
| User confusion | Low | Medium | Clear documentation, in-app guidance |

🎯 Acceptance Testing

Test Scenario 1: Happy Path

Given I am a platform admin with CLI access
When I run: mcpgateway --support-bundle --output-dir /tmp
Then I see: "✅ Support bundle created: /tmp/mcpgateway-support-2025-01-09-120000.zip"
And I see: "📦 Bundle size: 9.27 KB"
And I see: "⚠️ Security Notice: The bundle has been sanitized..."
And the ZIP file contains: MANIFEST.json, version.json, system_info.json, settings.json, environment.json, logs/mcpgateway.log, README.md
And no sensitive data is present in any file

Test Scenario 2: API Download

Given I have a valid JWT token
When I make a GET request to /admin/support-bundle/generate
Then I receive HTTP 200 with Content-Type: application/zip
And the response body is a valid ZIP file
And the Content-Disposition header includes a timestamped filename

Test Scenario 3: Sanitization Verification

Given my .env file contains DATABASE_PASSWORD=secret123
And my logs contain "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
When I generate a support bundle
Then environment.json shows DATABASE_PASSWORD: "*****"
And logs/mcpgateway.log shows "Bearer *****"
And no occurrence of "secret123" or "eyJhbGciOi" exists in any file
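The last step of this scenario can be automated by scanning every archive member for the known secret values. The sketch below uses a hypothetical `find_leaks` helper and searches raw bytes, so it only detects secrets that appear as plain UTF-8 text.

```python
import io
import zipfile
from typing import Any, List, Tuple


def find_leaks(bundle: Any, forbidden: List[str]) -> List[Tuple[str, str]]:
    """Return (member name, secret) pairs for each forbidden string found."""
    hits = []
    with zipfile.ZipFile(bundle) as archive:
        for name in archive.namelist():
            data = archive.read(name)
            for needle in forbidden:
                if needle.encode("utf-8") in data:
                    hits.append((name, needle))
    return hits


# Build a tiny in-memory bundle to demonstrate the check.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("environment.json", '{"DATABASE_PASSWORD": "*****"}')
    archive.writestr("logs/mcpgateway.log", "Authorization: Bearer *****\n")

print(find_leaks(sample, ["secret123", "eyJhbGciOi"]))
```

An empty list means none of the listed values were found. It does not prove the bundle is free of other secrets.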

Labels: documentation, enhancement, python, security
