Multi-Kubernetes Cluster Task Runner Support
1. Background
Current Problem
Druid clusters deployed across multiple IDCs (within the same Availability Zone) face a critical issue with task management when using Kubernetes-based task runners:
Scenario:
- Druid cluster spans multiple IDCs (AZ1, AZ2, etc.)
- Each IDC has its own Kubernetes cluster
- Overlord leader runs in AZ1 and submits tasks to K8s cluster in AZ1
- When leader failover occurs to an Overlord in AZ2:
- New leader cannot restore running task information from K8s cluster in AZ1
- New leader starts fresh tasks in K8s cluster in AZ2
- Result: Duplicate tasks running simultaneously in both clusters
- Realtime queries are routed to both tasks, producing incorrect query results
MiddleManager-based deployments do not have this problem, because Overlords in any AZ can communicate with MiddleManagers in every AZ. This is therefore a missing feature specific to Kubernetes-based task deployment.
Root Cause
The current KubernetesTaskRunner implementation has a fundamental limitation:
- Single-cluster assumption: Only connects to one Kubernetes cluster
- Local restoration: Only restores tasks from the local K8s cluster during startup
- No cross-cluster visibility: Cannot discover or manage tasks running in remote clusters
Impact
- Data inconsistency: Duplicate indexing tasks process the same data
- Resource waste: Unnecessary compute resources consumed
- Data corruption risk: Conflicting writes to the same segments
- Operational complexity: Manual intervention required to resolve conflicts
2. Solution
High-Level Architecture
2.1 Multi-Cluster Client Management
- Cluster Registry: Maintain a registry of all available Kubernetes clusters across IDCs
- Client Pool: Create and manage Kubernetes client connections to multiple clusters
- Health Monitoring: Continuously monitor cluster availability and connectivity
- Failover Support: Gracefully handle cluster unavailability
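The registry and client-pool ideas above can be sketched as follows. This is a minimal illustration, not the proposed implementation: `ClusterRegistry`, `ClusterEntry`, and the method names are all hypothetical, and the real runner would hold actual Kubernetes client objects built from each kubeconfig rather than just the path.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical sketch: a registry of Kubernetes clusters keyed by cluster
// name, with a health flag flipped by a periodic health monitor. A real
// implementation would store a live client built from the kubeconfig.
public class ClusterRegistry
{
  public static final class ClusterEntry
  {
    final String kubeconfigPath;
    volatile boolean healthy = true;

    ClusterEntry(String kubeconfigPath)
    {
      this.kubeconfigPath = kubeconfigPath;
    }
  }

  private final Map<String, ClusterEntry> clusters = new ConcurrentHashMap<>();

  public void register(String clusterName, String kubeconfigPath)
  {
    clusters.put(clusterName, new ClusterEntry(kubeconfigPath));
  }

  // Called by the health monitor; marks a cluster up or down.
  public void setHealthy(String clusterName, boolean healthy)
  {
    ClusterEntry e = clusters.get(clusterName);
    if (e != null) {
      e.healthy = healthy;
    }
  }

  // Clusters currently eligible for task submission.
  public List<String> liveClusters()
  {
    return clusters.entrySet().stream()
                   .filter(e -> e.getValue().healthy)
                   .map(Map.Entry::getKey)
                   .sorted()
                   .collect(Collectors.toList());
  }
}
```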
2.2 Cross-Cluster Task Discovery
- Unified Task View: Aggregate running tasks from all registered clusters
- Cluster Tagging: Tag each task with its originating cluster information
- State Synchronization: Maintain consistent view of task states across clusters
- Conflict Detection: Identify and prevent duplicate task submissions
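One way to picture the unified view with conflict detection, under the assumption that each cluster exposes some "list running task ids" call (here the hypothetical `TaskLister`): merge all listings into a single task-to-cluster map, and treat the same task id appearing in two clusters as the duplicate-task condition described in the background.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: aggregate task listings from every registered cluster
// into one view, tagging each task with its originating cluster.
public class CrossClusterDiscovery
{
  public interface TaskLister
  {
    List<String> listRunningTaskIds();
  }

  // Returns taskId -> clusterName. Throws if the same task id is running in
  // two clusters, which is exactly the duplicate-task failure mode above.
  public static Map<String, String> unifiedView(Map<String, TaskLister> listersByCluster)
  {
    Map<String, String> view = new HashMap<>();
    for (Map.Entry<String, TaskLister> e : listersByCluster.entrySet()) {
      for (String taskId : e.getValue().listRunningTaskIds()) {
        String prev = view.putIfAbsent(taskId, e.getKey());
        if (prev != null && !prev.equals(e.getKey())) {
          throw new IllegalStateException(
              "Task " + taskId + " is running in both " + prev + " and " + e.getKey()
          );
        }
      }
    }
    return view;
  }
}
```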
2.3 Intelligent Task Placement
- Cluster Selection Strategy: Implement algorithms to select optimal cluster for new tasks
- Load Balancing: Distribute tasks across clusters based on capacity and load
- Affinity Rules: Support cluster-specific constraints and preferences
- Resource Awareness: Consider cluster resources and availability
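A simple selection strategy combining the affinity and load ideas above might look like this. The class, the stats shape, and the "most free slots" heuristic are all illustrative assumptions; the real strategy would presumably be pluggable.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: honor an affinity hint if that cluster is live,
// otherwise fall back to the live cluster with the most free capacity.
public class ClusterSelector
{
  public static final class ClusterStats
  {
    final int capacity;
    final int runningTasks;

    public ClusterStats(int capacity, int runningTasks)
    {
      this.capacity = capacity;
      this.runningTasks = runningTasks;
    }

    int freeSlots()
    {
      return capacity - runningTasks;
    }
  }

  public static String select(Map<String, ClusterStats> liveClusters, Optional<String> affinityHint)
  {
    if (affinityHint.isPresent() && liveClusters.containsKey(affinityHint.get())) {
      return affinityHint.get();
    }
    return liveClusters.entrySet().stream()
        .max(Comparator.comparingInt((Map.Entry<String, ClusterStats> e) -> e.getValue().freeSlots()))
        .map(Map.Entry::getKey)
        .orElseThrow(() -> new IllegalStateException("no live clusters"));
  }
}
```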
2.4 Enhanced Task Lifecycle Management
- Cross-Cluster Monitoring: Monitor task status across all clusters
- Unified Control Plane: Provide single interface for task management regardless of cluster
- Graceful Migration: Support task migration between clusters when needed
- Cleanup Coordination: Ensure proper cleanup of tasks across all clusters
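The cleanup-coordination point can be sketched as a best-effort sweep over every cluster, so no orphaned job survives a failover. `ClusterOps` and `deleteTaskJob` are hypothetical stand-ins for whatever per-cluster job-deletion call the client API provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of cross-cluster cleanup: attempt deletion on every
// cluster and report which clusters actually held a job for the task
// (normally zero or one; more than one indicates a past duplicate).
public class LifecycleManager
{
  public interface ClusterOps
  {
    boolean deleteTaskJob(String taskId); // true if a job existed and was removed
  }

  private final Map<String, ClusterOps> opsByCluster;

  public LifecycleManager(Map<String, ClusterOps> opsByCluster)
  {
    this.opsByCluster = opsByCluster;
  }

  public List<String> cleanupEverywhere(String taskId)
  {
    List<String> cleaned = new ArrayList<>();
    for (Map.Entry<String, ClusterOps> e : opsByCluster.entrySet()) {
      if (e.getValue().deleteTaskJob(taskId)) {
        cleaned.add(e.getKey());
      }
    }
    return cleaned;
  }
}
```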
Key Components
2.5 Configuration Management
- Multi-Cluster Config: Extend configuration to support multiple cluster definitions
- Context Management: Handle different authentication contexts for each cluster
- Namespace Strategy: Define namespace usage across clusters
- Security Policies: Implement cluster-specific security and access controls
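The multi-cluster configuration could plausibly follow Druid's existing `druid.indexer.runner.*` runtime-properties pattern. The property names and paths below are purely illustrative, not existing Druid configuration:

```properties
# Hypothetical properties, for illustration only -- one kubeconfig and
# namespace per IDC cluster.
druid.indexer.runner.clusters=az1,az2
druid.indexer.runner.cluster.az1.kubeconfigPath=/etc/druid/kube/az1.yaml
druid.indexer.runner.cluster.az1.namespace=druid-tasks
druid.indexer.runner.cluster.az2.kubeconfigPath=/etc/druid/kube/az2.yaml
druid.indexer.runner.cluster.az2.namespace=druid-tasks
```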
2.6 Monitoring and Observability
- Cross-Cluster Metrics: Aggregate metrics from all clusters
- Unified Logging: Provide consolidated view of task execution across clusters
- Alerting: Alert on cluster-specific or cross-cluster issues
- Dashboard Integration: Extend existing monitoring to show multi-cluster view
2.7 Fault Tolerance
- Cluster Isolation: Ensure failure in one cluster doesn't affect others
- Task Recovery: Implement recovery mechanisms for tasks in failed clusters
- Leader Election: Ensure only one Overlord manages tasks at a time
- Consistency Guarantees: Maintain data consistency across cluster boundaries
Benefits
2.8 Operational Benefits
- High Availability: Tasks continue running even if individual clusters fail
- Geographic Distribution: Leverage multiple IDCs for better performance
- Resource Optimization: Better utilization of compute resources across clusters
- Disaster Recovery: Natural disaster recovery through geographic distribution
2.9 Technical Benefits
- Scalability: Scale beyond single cluster limitations
- Flexibility: Choose optimal cluster for each task based on requirements
- Resilience: Improved fault tolerance and recovery capabilities
- Efficiency: Better resource utilization and load distribution
Implementation Considerations
Implement a new Kubernetes task runner, MultiKubernetesClusterTaskRunner, which extends the existing KubernetesTaskRunner and overrides the task restore/submit behaviour.
This task runner initializes a separate Kubernetes client API per cluster from the provided kubeconfigs, and restores tasks from all of these clients.
When a new task is submitted, it picks a live Kubernetes cluster and uses the corresponding client API to create the task on the target cluster.
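A greatly simplified sketch of that restore/submit behaviour, assuming a per-cluster `ClusterClient` handle (a hypothetical stand-in for the client API built from each kubeconfig; the real class would extend KubernetesTaskRunner):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of MultiKubernetesClusterTaskRunner's core idea.
public class MultiKubernetesClusterTaskRunner
{
  public interface ClusterClient
  {
    boolean isLive();
    List<String> listTaskIds();     // tasks currently running in this cluster
    void launchTask(String taskId); // create the task job in this cluster
  }

  private final Map<String, ClusterClient> clients;                 // clusterName -> client
  private final Map<String, String> taskLocations = new HashMap<>(); // taskId -> clusterName

  public MultiKubernetesClusterTaskRunner(Map<String, ClusterClient> clients)
  {
    this.clients = clients;
  }

  // Restore: scan *all* clusters, not just the local one, so a new leader in
  // AZ2 rediscovers tasks still running in AZ1 instead of resubmitting them.
  public void restoreTasks()
  {
    for (Map.Entry<String, ClusterClient> e : clients.entrySet()) {
      for (String taskId : e.getValue().listTaskIds()) {
        taskLocations.put(taskId, e.getKey());
      }
    }
  }

  // Submit: never duplicate a restored task; otherwise pick any live cluster.
  public String submit(String taskId)
  {
    String existing = taskLocations.get(taskId);
    if (existing != null) {
      return existing; // already running somewhere; do not start a duplicate
    }
    for (Map.Entry<String, ClusterClient> e : clients.entrySet()) {
      if (e.getValue().isLive()) {
        e.getValue().launchTask(taskId);
        taskLocations.put(taskId, e.getKey());
        return e.getKey();
      }
    }
    throw new IllegalStateException("no live Kubernetes cluster available");
  }
}
```

This directly addresses the failover scenario in the background: after restoreTasks() sees the task still running in AZ1, a re-submit of the same task id becomes a no-op rather than a duplicate launch.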