RerankReturnType not JSON serializable - Worker crashes when returning rerank results #37

@Khalid-J02

@michaelfeil

Description

When a request is sent to a reranker model such as BAAI/bge-reranker-v2-m3, the worker processes the rerank request successfully but crashes when it tries to return the result, and the client times out after 60 seconds. The error indicates that RerankReturnType objects from the infinity-emb library are not JSON serializable.

Error Message

Error while returning job result. | Object of type RerankReturnType is not JSON serializable

Environment

  • RunPod Serverless: Yes
  • infinity-emb version: 0.0.76
  • runpod version: ~1.7.0
  • Base Image: nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
  • Python: 3.11

Steps to Reproduce

  1. Deploy the worker-infinity-embedding to RunPod Serverless
  2. Send a rerank request with the following structure:
{
  "input": {
    "query": "your search query",
    "docs": ["doc1", "doc2", "doc3"],
    "model": "your-rerank-model",
    "return_docs": true
  }
}
  3. Worker processes successfully but crashes when returning results
  4. Client receives timeout after 60 seconds
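
For convenience, the request from step 2 can be built with the Python standard library. This is a minimal sketch: `build_rerank_request`, `endpoint_url`, and `api_key` are hypothetical names, and the bearer-token header is an assumption about how the endpoint is authenticated. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_rerank_request(endpoint_url, api_key, query, docs, model):
    """Build (but do not send) a POST request carrying the rerank payload.

    endpoint_url and api_key are placeholders supplied by the caller;
    the Authorization header format is an assumption.
    """
    payload = {
        "input": {
            "query": query,
            "docs": docs,
            "model": model,
            "return_docs": True,
        }
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_rerank_request(
    "https://example.invalid/your-endpoint",  # placeholder URL
    "YOUR_API_KEY",                           # placeholder key
    "your search query",
    ["doc1", "doc2", "doc3"],
    "your-rerank-model",
)
```

Passing `req` to `urllib.request.urlopen(req, timeout=90)` would send it; a client-side timeout longer than 60 seconds makes the server-side failure easier to tell apart from a slow response.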

Root Cause

handler.py returns the result of embedding_service.infinity_rerank() directly, and that result is a RerankReturnType Pydantic model object. RunPod's serverless framework requires JSON-serializable return values, such as plain Python dictionaries.
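
The same class of error can be reproduced in isolation, without the worker. In the sketch below, `FakeRerankResult` is a hypothetical stand-in for `RerankReturnType` (its fields are illustrative, not taken from infinity-emb), and `try_serialize` is a helper made up for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeRerankResult:
    # Hypothetical stand-in for infinity-emb's RerankReturnType.
    relevance_score: float
    document: str
    index: int


def try_serialize(obj):
    """Attempt to JSON-encode obj and report (ok, message)."""
    try:
        json.dumps(obj)
        return True, "ok"
    except TypeError as exc:
        # json.dumps raises TypeError for objects it cannot encode.
        return False, str(exc)


ok, message = try_serialize([FakeRerankResult(0.9, "doc1", 0)])
print(ok, message)
```

The message produced here has the same "is not JSON serializable" form as the one in the worker log, which suggests the failure happens at the point where the return value is JSON-encoded.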

The issue is in handler.py at this code block:

if job_input.get("query"):
    call_fn, kwargs = embedding_service.infinity_rerank, {
        "query": job_input.get("query"),
        "docs": job_input.get("docs"),
        "return_docs": job_input.get("return_docs"),
        "model_name": job_input.get("model"),
    }

And later:

try:
    out = await call_fn(**kwargs)
    return out  # ❌ This returns a Pydantic model, not a dict
except Exception as e:
    return create_error_response(str(e)).model_dump()

Proposed Solution

Convert all Pydantic model responses to dictionaries before returning:

try:
    out = await call_fn(**kwargs)
    # Convert Pydantic models to dicts
    if hasattr(out, 'model_dump'):
        return out.model_dump()
    elif hasattr(out, 'dict'):
        return out.dict()
    return out
except Exception as e:
    return create_error_response(str(e)).model_dump()
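
A slightly more defensive variant of the conversion is sketched below. It assumes the result might be a Pydantic model, a plain dataclass, or a list of either; which of these `RerankReturnType` actually is has not been verified here. `to_jsonable` and `Hit` are hypothetical names, and Pydantic is detected by duck typing so the sketch runs with the standard library alone.

```python
import dataclasses
import json


def to_jsonable(obj):
    """Recursively convert model-like objects into plain dicts and lists."""
    if hasattr(obj, "model_dump"):  # Pydantic v2-style model
        return obj.model_dump()
    if hasattr(obj, "dict") and callable(obj.dict):  # Pydantic v1-style model
        return obj.dict()
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # asdict handles nested dataclasses; recurse for anything else inside.
        return to_jsonable(dataclasses.asdict(obj))
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item) for item in obj]
    return obj


@dataclasses.dataclass
class Hit:
    # Hypothetical result item used only for this example.
    relevance_score: float
    document: str
    index: int


converted = to_jsonable({"results": [Hit(0.9, "doc1", 0), Hit(0.1, "doc2", 1)]})
encoded = json.dumps(converted)
```

In the handler this would replace the two `hasattr` branches with a single `return to_jsonable(out)`. The difference is that lists of result objects and dataclass-based results are also covered.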

Alternatively, ensure each route handler explicitly converts its response:

if job_input.get("query"):
    call_fn, kwargs = embedding_service.infinity_rerank, {
        "query": job_input.get("query"),
        "docs": job_input.get("docs"),
        "return_docs": job_input.get("return_docs"),
        "model_name": job_input.get("model"),
    }
    result = await call_fn(**kwargs)
    return result.model_dump() if hasattr(result, 'model_dump') else result

Additional Context

  • Embedding requests work fine (possibly because they return serializable structures)
  • Error responses work correctly (they use .model_dump())
  • This affects all rerank operations
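
Since the failure surfaces only when the result is returned, a guard around the handler could turn it into an immediate error instead of a client timeout. The following is a sketch under stated assumptions: `json_safe` is a hypothetical decorator, the error dictionary shape is made up for illustration (it is not `create_error_response`), and the handlers are toy examples.

```python
import asyncio
import functools
import json


def json_safe(handler):
    """Wrap an async handler so non-serializable results become error dicts."""
    @functools.wraps(handler)
    async def wrapper(job):
        out = await handler(job)
        try:
            json.dumps(out)  # same check the framework would fail on later
        except TypeError as exc:
            return {"error": f"handler returned a non-serializable result: {exc}"}
        return out
    return wrapper


class Opaque:
    """Toy object that json.dumps cannot encode."""


@json_safe
async def bad_handler(job):
    return Opaque()


@json_safe
async def good_handler(job):
    return {"scores": [0.9, 0.1]}


bad = asyncio.run(bad_handler({"input": {}}))
good = asyncio.run(good_handler({"input": {}}))
```

The guard does not fix the conversion itself. It only makes sure that a future non-serializable return type is reported to the client right away.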
