Skip to content

Multi-Kubernetes Cluster Task Runner Support #18641

@FrankChen021

Description

@FrankChen021

Multi-Kubernetes Cluster Task Runner Support

1. Background

Current Problem

Druid clusters deployed across multiple IDCs (within the same Availability Zone) face a critical issue with task management when using Kubernetes-based task runners:

Scenario:

  • Druid cluster spans multiple IDCs (AZ1, AZ2, etc.)
  • Each IDC has its own Kubernetes cluster
  • Overlord leader runs in AZ1 and submits tasks to K8s cluster in AZ1
  • When leader failover occurs to an Overlord in AZ2:
    • New leader cannot restore running task information from K8s cluster in AZ1
    • New leader starts fresh tasks in K8s cluster in AZ2
    • Result: Duplicate tasks running simultaneously in both clusters
    • Realtime queries go to these two tasks, causing incorrect query result
Image

However, for middle manager deployment, there's no such problem, because overlords in different AZs are able to communicate with middle managers in different AZs.
So I would like to see this is a feature missing for K8S-based task deployment.

Root Cause

The current KubernetesTaskRunner implementation has a fundamental limitation:

  • Single-cluster assumption: Only connects to one Kubernetes cluster
  • Local restoration: Only restores tasks from the local K8s cluster during startup
  • No cross-cluster visibility: Cannot discover or manage tasks running in remote clusters

Impact

  • Data inconsistency: Duplicate indexing tasks process the same data
  • Resource waste: Unnecessary compute resources consumed
  • Data corruption risk: Conflicting writes to the same segments
  • Operational complexity: Manual intervention required to resolve conflicts

2. Solution

High-Level Architecture

2.1 Multi-Cluster Client Management

  • Cluster Registry: Maintain a registry of all available Kubernetes clusters across IDCs
  • Client Pool: Create and manage Kubernetes client connections to multiple clusters
  • Health Monitoring: Continuously monitor cluster availability and connectivity
  • Failover Support: Gracefully handle cluster unavailability

2.2 Cross-Cluster Task Discovery

  • Unified Task View: Aggregate running tasks from all registered clusters
  • Cluster Tagging: Tag each task with its originating cluster information
  • State Synchronization: Maintain consistent view of task states across clusters
  • Conflict Detection: Identify and prevent duplicate task submissions

2.3 Intelligent Task Placement

  • Cluster Selection Strategy: Implement algorithms to select optimal cluster for new tasks
  • Load Balancing: Distribute tasks across clusters based on capacity and load
  • Affinity Rules: Support cluster-specific constraints and preferences
  • Resource Awareness: Consider cluster resources and availability

2.4 Enhanced Task Lifecycle Management

  • Cross-Cluster Monitoring: Monitor task status across all clusters
  • Unified Control Plane: Provide single interface for task management regardless of cluster
  • Graceful Migration: Support task migration between clusters when needed
  • Cleanup Coordination: Ensure proper cleanup of tasks across all clusters

Key Components

2.5 Configuration Management

  • Multi-Cluster Config: Extend configuration to support multiple cluster definitions
  • Context Management: Handle different authentication contexts for each cluster
  • Namespace Strategy: Define namespace usage across clusters
  • Security Policies: Implement cluster-specific security and access controls

2.6 Monitoring and Observability

  • Cross-Cluster Metrics: Aggregate metrics from all clusters
  • Unified Logging: Provide consolidated view of task execution across clusters
  • Alerting: Alert on cluster-specific or cross-cluster issues
  • Dashboard Integration: Extend existing monitoring to show multi-cluster view

2.7 Fault Tolerance

  • Cluster Isolation: Ensure failure in one cluster doesn't affect others
  • Task Recovery: Implement recovery mechanisms for tasks in failed clusters
  • Leader Election: Ensure only one Overlord manages tasks at a time
  • Consistency Guarantees: Maintain data consistency across cluster boundaries

Benefits

2.8 Operational Benefits

  • High Availability: Tasks continue running even if individual clusters fail
  • Geographic Distribution: Leverage multiple IDCs for better performance
  • Resource Optimization: Better utilization of compute resources across clusters
  • Disaster Recovery: Natural disaster recovery through geographic distribution

2.9 Technical Benefits

  • Scalability: Scale beyond single cluster limitations
  • Flexibility: Choose optimal cluster for each task based on requirements
  • Resilience: Improved fault tolerance and recovery capabilities
  • Efficiency: Better resource utilization and load distribution

Implementation Considerations

Implements a new K8S task runner MultiKubernetesClusterTaskRunner which inherits existing KubernetesTaskRunner and overrides the task restore/submit behaviour.

this task runner initializes different k8s client apis by provided kubeconfig for different k8s clusters, and restores tasks from all these client apis.

when a new task is submitted, pick up a live k8s cluster and use corresponding client api to create task on target k8s cluster.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions