Asya is a queue-based actor framework for orchestrating AI/ML workloads on Kubernetes with:
- Independent scaling: Each actor scales 0→N based on its own queue depth
- Zero infrastructure code: Pure Python functions, no dependencies for queues/routing/retries
- Dynamic pipelines: Routes are data, not code - modify at runtime
- Cost efficiency: KEDA autoscaling from zero to max, pay only for active processing
Core idea: Write pure Python functions. Asya handles queues, routing, scaling, and monitoring.
📖 Documentation • 🚀 Quick Start • 🏗️ Architecture • 💡 Concepts
Battle-tested at Delivery Hero for global-scale AI-powered image enhancement. Now powering LLM and agentic workflows.
Multi-step AI/ML pipelines:
- Document processing (OCR → classification → extraction → storage)
- Image pipelines (resize → detect → classify → tag)
- LLM workflows (retrieval → prompt → generate → judge → refine)
- Video analysis (split → transcribe → summarize → translate)
Event-driven workloads:
- Webhook processing (GitHub, Stripe, Twilio events)
- Batch predictions (scheduled model inference)
- Async API backends (user uploads → background processing)
Cost-sensitive deployments:
- GPU inference (scale to zero between batches, avoid idle costs)
- Bursty traffic (10x scale-up for peak hours, zero off-peak)
- Dev/staging environments (minimize resource waste)
Not a good fit:
- Real-time inference with <100 ms latency budgets: queue overhead adds latency (use KServe/Seldon instead)
- Training jobs: use Kubeflow, Ray Train, or native Kubernetes Jobs instead
See: Motivation | Core Concepts | Use Cases
Write pure Python functions - no decorators, no DAGs, no infrastructure code:
```python
# handler.py
def process(payload: dict) -> dict:
    return {
        **payload,  # Keep existing data
        "result": my_model.predict(payload["input"]),
    }
```

Class handlers for stateful initialization (model loading):
```python
class MyActor:
    def __init__(self, model_path: str = "/models/default"):
        self.model = load_model(model_path)  # Loaded once at pod startup

    def process(self, payload: dict) -> dict:
        return {
            **payload,
            "prediction": self.model.predict(payload["text"]),
        }
```

Envelope mode for dynamic routing (agents, LLM judges):
```python
class LLMJudge:
    def __init__(self, threshold: float = 0.8):
        self.model = load_llm("/models/judge")
        self.threshold = float(threshold)

    def process(self, envelope: dict) -> dict:
        payload = envelope["payload"]
        score = self.model.judge(payload["llm_response"])
        payload["judge_score"] = score

        # Dynamically modify route based on LLM judge score
        route = envelope["route"]
        if score < self.threshold:
            route["actors"].insert(route["current"] + 1, "llm-refiner")
        route["current"] += 1
        return envelope
```

Pattern: Enrich payload with your results, pass it to next actor. Full pipeline history preserved.
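To make the envelope shape concrete, here is a sketch of what an envelope might look like as it reaches the judge. Only `payload`, `route`, `route["actors"]`, and `route["current"]` appear in the handlers above; all values (and any further fields the real schema defines) are illustrative assumptions:

```python
# Illustrative envelope arriving at "llm-judge" (values are hypothetical).
envelope = {
    "payload": {
        "input": "Summarize the incident report...",
        "llm_response": "The outage was caused by...",  # added by an upstream actor
        # ...each actor enriches this dict, so pipeline history accumulates here
    },
    "route": {
        "actors": ["retriever", "llm-generator", "llm-judge", "storage"],
        "current": 2,  # index of the actor currently holding the envelope
    },
}
```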
See: Quickstart for Data Scientists | Handler Examples
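Because handlers are plain Python, they can also be unit-tested with no queues or cluster involved. A minimal sketch for the `LLMJudge` above, with a stub model in place of a real one (the module and test names are hypothetical):

```python
# test_handler.py -- run with pytest (assumes LLMJudge is importable from handler.py)
from handler import LLMJudge

class StubModel:
    def judge(self, text: str) -> float:
        return 0.5  # always below the 0.8 threshold

def test_low_score_inserts_refiner():
    judge = LLMJudge.__new__(LLMJudge)  # bypass __init__ so no real model is loaded
    judge.model = StubModel()
    judge.threshold = 0.8
    envelope = {
        "payload": {"llm_response": "draft answer"},
        "route": {"actors": ["llm-judge", "storage"], "current": 0},
    }
    out = judge.process(envelope)
    assert out["route"]["actors"][1] == "llm-refiner"  # refiner spliced in after the judge
    assert out["route"]["current"] == 1
```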
Deploy actors via Kubernetes CRDs:
```yaml
apiVersion: asya.sh/v1alpha1
kind: AsyncActor
metadata:
  name: text-classifier
spec:
  transport: sqs  # or rabbitmq
  scaling:
    enabled: true
    minReplicas: 0
    maxReplicas: 100
    queueLength: 5  # Target: 5 messages per pod
  workload:
    kind: Deployment
    template:
      spec:
        containers:
          - name: asya-runtime
            image: my-classifier:latest
            env:
              - name: ASYA_HANDLER
                value: "classifier.TextClassifier.process"
            resources:
              limits:
                nvidia.com/gpu: 1
```

What happens:
- Operator creates queue `asya-text-classifier`
- Operator injects sidecar for message routing
- KEDA monitors queue depth, scales 0→100 pods
- Sidecar routes messages: Queue → Unix socket → Your code → Next queue
Transports: SQS (AWS), RabbitMQ (self-hosted), Kafka/NATS (planned)
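To trigger a pipeline manually, you can drop a message on the first actor's queue. A minimal sketch for the SQS transport with boto3, assuming the `asya-text-classifier` queue created by the operator; the bare-payload body here is an assumption about the wire format, so prefer the Gateway for submitting envelopes:

```python
import json
import boto3

sqs = boto3.client("sqs")  # credentials and region come from the environment
queue_url = sqs.get_queue_url(QueueName="asya-text-classifier")["QueueUrl"]

# Hypothetical message body; the canonical envelope format is documented elsewhere.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"text": "Is this review positive or negative?"}),
)
```

For scaling intuition: with `queueLength: 5`, KEDA steers toward roughly one pod per five queued messages, so a backlog of ~40 messages drives the actor toward ~8 pods (bounded by `maxReplicas`), and an empty queue lets it fall back to zero.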
See: Quickstart for Platform Engineers | Installation Guides | AsyncActor Examples
Asya uses a sidecar pattern for message routing:
- Operator watches AsyncActor CRDs, injects sidecars, configures KEDA
- Sidecar handles queue consumption, routing, retries (Go)
- Runtime executes your Python handler via Unix socket
- Gateway (optional) provides MCP HTTP API for envelope submission and SSE streaming
- KEDA monitors queue depth, scales actors 0→N
Message flow: Queue → Sidecar → Your Code → Sidecar → Next Queue
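As a mental model only (the real sidecar is written in Go and speaks to the runtime over a Unix socket; none of that is shown here), one hop can be pictured like this:

```python
import json

def publish(queue: str, body: str) -> None:
    """Stand-in for a transport-specific send (SQS, RabbitMQ, ...)."""
    print(f"-> {queue}: {body}")

def handle_one_message(body: str, handler) -> None:
    """Conceptual sketch of one sidecar hop; not the actual implementation."""
    envelope = json.loads(body)
    envelope["payload"] = handler(envelope["payload"])  # your pure Python handler
    route = envelope["route"]
    route["current"] += 1  # advance the route to the next actor
    if route["current"] < len(route["actors"]):
        next_queue = f"asya-{route['actors'][route['current']]}"
        publish(next_queue, json.dumps(envelope))
```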
See: Architecture Documentation for system diagram, component details, protocols, and deployment patterns
New to Asya? Start here: Getting Started Guide (5 min read)
Then choose your path:
See also: AWS EKS Installation | Local Kind Installation | Helm Charts
We welcome contributions! See CONTRIBUTING.md for:
- Development setup (Go, Python, Docker, Make)
- Testing workflow (unit, component, integration, E2E)
- Code standards and linting
- Pull request process
Prerequisites: Go 1.24+, Python 3.13+, Docker, Make, uv
Quick commands:
```bash
make build             # Build all components
make test-unit         # Unit tests (Go + Python)
make test-integration  # Integration tests (Docker Compose)
make test-e2e          # E2E tests (Kind cluster)
make lint              # Linters with auto-fix
```

Copyright © 2025 Delivery Hero SE
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Alpha software under active development. APIs may change. Production use requires thorough testing.
Maintainers:
- Artem Yushkovskiy (@atemate, @atemate-dh)
Roadmap (see GitHub Discussions):
- Stabilization and API refinement
- Additional transports (Kafka, NATS, Google Pub/Sub)
- Fast pod startup (PVC for model storage)
- Integrations: KAITO, Knative
- Enhanced observability (OpenTelemetry tracing)
- Multi-cluster routing
Feedback: Open an issue or discussion on GitHub ❤️