Asya🎭

Asya🎭 is a queue-based actor framework for orchestrating AI/ML workloads on Kubernetes with:

  • Independent scaling: Each actor scales 0→N based on its own queue depth
  • Zero infrastructure code: Pure Python functions, no dependencies for queues/routing/retries
  • Dynamic pipelines: Routes are data, not code, and can be modified at runtime
  • Cost efficiency: KEDA autoscaling from zero to max, pay only for active processing

Core idea: Write pure Python functions. Asya handles queues, routing, scaling, and monitoring.

📘 Documentation • 🚀 Quick Start • 🏗️ Architecture • 💡 Concepts

Delivery Hero

Battle-tested at Delivery Hero for global-scale AI-powered image enhancement. Now powering LLM and agentic workflows.


When to Use Asya🎭

✅ Ideal For

Multi-step AI/ML pipelines:

  • Document processing (OCR → classification → extraction → storage)
  • Image pipelines (resize → detect → classify → tag)
  • LLM workflows (retrieval → prompt → generate → judge → refine)
  • Video analysis (split → transcribe → summarize → translate)

Event-driven workloads:

  • Webhook processing (GitHub, Stripe, Twilio events)
  • Batch predictions (scheduled model inference)
  • Async API backends (user uploads → background processing)

Cost-sensitive deployments:

  • GPU inference (scale to zero between batches, avoid idle costs)
  • Bursty traffic (10x scale-up for peak hours, zero off-peak)
  • Dev/staging environments (minimize resource waste)

❌ Not Ideal For

  • Real-time inference (< 100 ms latency): queue overhead adds latency (use KServe/Seldon instead)
  • Training jobs: Use Kubeflow, Ray Train, or native Kubernetes Jobs instead

See: Motivation | Core Concepts | Use Cases


For Data Scientists 🧑‍🔬

Write pure Python functions - no decorators, no DAGs, no infrastructure code:

# handler.py
def process(payload: dict) -> dict:
    return {
        **payload,  # Keep existing data
        "result": my_model.predict(payload["input"])
    }
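
Because handlers are plain functions with no framework imports, they can be unit-tested without any queue or cluster. A minimal sketch, assuming a stubbed model (the `StubModel` class, payload fields, and return values here are illustrative, not part of Asya):

```python
# Illustrative, infrastructure-free unit test; StubModel stands in for a real model.
class StubModel:
    def predict(self, text: str) -> str:
        return "positive"

my_model = StubModel()

def process(payload: dict) -> dict:
    return {
        **payload,  # Keep existing data
        "result": my_model.predict(payload["input"]),
    }

out = process({"input": "great service", "request_id": "abc"})
assert out["result"] == "positive"
assert out["request_id"] == "abc"  # original fields are preserved
```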

Class handlers for stateful initialization (model loading):

class MyActor:
    def __init__(self, model_path: str = "/models/default"):
        self.model = load_model(model_path)  # Loaded once at pod startup

    def process(self, payload: dict) -> dict:
        return {
            **payload,
            "prediction": self.model.predict(payload["text"])
        }

Envelope mode for dynamic routing (agents, LLM judges):

class LLMJudge:
    def __init__(self, threshold: float = 0.8):
        self.model = load_llm("/models/judge")
        self.threshold = float(threshold)

    def process(self, envelope: dict) -> dict:
        payload = envelope["payload"]
        score = self.model.judge(payload["llm_response"])
        payload["judge_score"] = score

        # Dynamically modify route based on LLM judge score
        route = envelope["route"]
        if score < self.threshold:
            route["actors"].insert(route["current"] + 1, "llm-refiner")

        route["current"] += 1
        return envelope

Pattern: Enrich the payload with your results and pass it on to the next actor. The full pipeline history is preserved.
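
The route manipulation in the `LLMJudge` example can be exercised locally as ordinary dict handling. A sketch, assuming the envelope schema shown above (`payload`, `route`, `route["actors"]`, `route["current"]`); the scores and actor names are illustrative:

```python
# Sketch of how a judge-style handler rewrites its envelope's route.
def judge_step(envelope: dict, score: float, threshold: float = 0.8) -> dict:
    envelope["payload"]["judge_score"] = score
    route = envelope["route"]
    if score < threshold:
        # Queue a refinement pass immediately after this actor
        route["actors"].insert(route["current"] + 1, "llm-refiner")
    route["current"] += 1
    return envelope

env = {
    "payload": {"llm_response": "draft answer"},
    "route": {"actors": ["llm-generator", "llm-judge", "sink"], "current": 1},
}
out = judge_step(env, score=0.5)
assert out["route"]["actors"] == ["llm-generator", "llm-judge", "llm-refiner", "sink"]
assert out["route"]["current"] == 2  # next hop is the refiner
```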

See: Quickstart for Data Scientists | Handler Examples


For Platform Engineers ⚙️

Deploy actors via Kubernetes CRDs:

apiVersion: asya.sh/v1alpha1
kind: AsyncActor
metadata:
  name: text-classifier
spec:
  transport: sqs  # or rabbitmq
  scaling:
    enabled: true
    minReplicas: 0
    maxReplicas: 100
    queueLength: 5  # Target: 5 messages per pod
  workload:
    kind: Deployment
    template:
      spec:
        containers:
        - name: asya-runtime
          image: my-classifier:latest
          env:
          - name: ASYA_HANDLER
            value: "classifier.TextClassifier.process"
          resources:
            limits:
              nvidia.com/gpu: 1

What happens:

  1. Operator creates queue asya-text-classifier
  2. Operator injects sidecar for message routing
  3. KEDA monitors queue depth, scales 0→100 pods
  4. Sidecar routes messages: Queue → Unix socket → Your code → Next queue

Transports: SQS (AWS), RabbitMQ (self-hosted), Kafka/NATS (planned)
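
One way to kick off a pipeline is to place an envelope on the first actor's queue. A hedged sketch for the SQS transport: the envelope schema mirrors the handler examples above and is an assumption, and the queue name follows the `asya-<actor>` convention the operator uses:

```python
import json

# Illustrative envelope for the text-classifier actor defined above.
envelope = {
    "payload": {"input": "the quick brown fox"},
    "route": {"actors": ["text-classifier", "sink"], "current": 0},
}
body = json.dumps(envelope)

# With AWS credentials configured, this could be submitted via boto3, e.g.:
#   import boto3
#   sqs = boto3.client("sqs")
#   url = sqs.get_queue_url(QueueName="asya-text-classifier")["QueueUrl"]
#   sqs.send_message(QueueUrl=url, MessageBody=body)
assert json.loads(body)["route"]["current"] == 0
```

In practice the optional Gateway's HTTP API (see Architecture) is the supported front door for envelope submission; writing to the queue directly is shown only to make the message shape concrete.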

See: Quickstart for Platform Engineers | Installation Guides | AsyncActor Examples


Architecture

Asya uses a sidecar pattern for message routing:

  • Operator watches AsyncActor CRDs, injects sidecars, configures KEDA
  • Sidecar handles queue consumption, routing, retries (Go)
  • Runtime executes your Python handler via Unix socket
  • Gateway (optional) provides MCP HTTP API for envelope submission and SSE streaming
  • KEDA monitors queue depth, scales actors 0→N

Message flow: Queue → Sidecar → Your Code → Sidecar → Next Queue

See: Architecture Documentation for system diagram, component details, protocols, and deployment patterns


Quick Start

New to Asya? Start here: Getting Started Guide (5 min read)

Then choose your path: Quickstart for Data Scientists or Quickstart for Platform Engineers.

See also: AWS EKS Installation | Local Kind Installation | Helm Charts


Contributing

We welcome contributions! See CONTRIBUTING.md for:

  • Development setup (Go, Python, Docker, Make)
  • Testing workflow (unit, component, integration, E2E)
  • Code standards and linting
  • Pull request process

Prerequisites: Go 1.24+, Python 3.13+, Docker, Make, uv

Quick commands:

make build              # Build all components
make test-unit          # Unit tests (Go + Python)
make test-integration   # Integration tests (Docker Compose)
make test-e2e           # E2E tests (Kind cluster)
make lint               # Linters with auto-fix

License

Copyright © 2025 Delivery Hero SE

Licensed under the Apache License, Version 2.0. See LICENSE for details.


Project Status

Alpha software under active development. APIs may change. Production use requires thorough testing.

Maintainers:

  • Artem Yushkovskiy 🍕 (@atemate, @atemate-dh)

Roadmap (see GitHub Discussions):

  • Stabilization and API refinement
  • Additional transports (Kafka, NATS, Google Pub/Sub)
  • Fast pod startup (PVC for model storage)
  • Integrations: KAITO, Knative
  • Enhanced observability (OpenTelemetry tracing)
  • Multi-cluster routing

Feedback: Open an issue or discussion on GitHub ❤️
