[Enhancement] Add LangChain-compatible DocumentLoader for audio transcription #685

@deepgram-robot

Description
Summary

Add a DeepgramAudioLoader class that implements the LangChain BaseLoader interface, enabling developers to load audio files as LangChain Document objects with Deepgram transcription — ready for use in RAG pipelines, agent tools, and chain workflows.

Problem it solves

Developers using LangChain for AI agent and RAG workflows currently have no built-in way to ingest audio content via Deepgram. They must write custom glue code to transcribe audio and convert results into LangChain Document format. A native loader in the SDK (or as a companion utility) would make Deepgram a first-class audio source in the LangChain ecosystem, matching patterns developers already use for PDF, CSV, and web loaders.

Proposed API

from deepgram.integrations.langchain import DeepgramAudioLoader

loader = DeepgramAudioLoader(
    file_paths=["meeting.mp3", "call.wav"],
    api_key="...",  # or from env
    model="nova-3",
    smart_format=True,
    diarize=True,
)

documents = loader.load()
# Each Document has:
#   page_content = transcript text
#   metadata = {"source": "meeting.mp3", "duration": 120.5, "speakers": 3, ...}
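A minimal sketch of what the loader's core could look like. To keep the example runnable without LangChain or the Deepgram SDK installed, it uses a local `Document` stand-in and a stubbed `transcribe` callable; the real implementation would subclass `langchain_core.document_loaders.BaseLoader`, yield `langchain_core.documents.Document`, and call the Deepgram client in place of the stub.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator

@dataclass
class Document:
    # Stand-in for langchain_core.documents.Document
    page_content: str
    metadata: dict = field(default_factory=dict)

class DeepgramAudioLoader:
    """Hypothetical loader: yields one Document per input file."""

    def __init__(self, file_paths: list[str], transcribe: Callable[[str], dict]):
        # `transcribe` stands in for the actual Deepgram SDK transcription
        # call; it should return a dict with the transcript text and any
        # metadata fields to surface on the Document.
        self.file_paths = file_paths
        self.transcribe = transcribe

    def lazy_load(self) -> Iterator[Document]:
        for path in self.file_paths:
            resp = self.transcribe(path)
            yield Document(
                page_content=resp["transcript"],
                metadata={"source": path, **resp.get("metadata", {})},
            )

    def load(self) -> list[Document]:
        # Mirrors BaseLoader.load(), which is list(self.lazy_load())
        return list(self.lazy_load())

# Usage with a fake transcription backend:
fake = lambda path: {"transcript": "hello world",
                     "metadata": {"duration": 120.5, "speakers": 3}}
docs = DeepgramAudioLoader(["meeting.mp3"], transcribe=fake).load()
```

Implementing `lazy_load` as the primary method keeps memory flat for large batches, since `load` comes for free by exhausting the iterator.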

Acceptance criteria

  • Implements LangChain BaseLoader interface (or BaseBlobParser)
  • Supports file paths, URLs, and raw bytes as input
  • Populates Document metadata with Deepgram response fields (duration, speakers, confidence, model)
  • Works with both sync and async LangChain patterns
  • Documented with usage example
  • Compatible with existing SDK API
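The metadata criterion above could be met with a small mapping helper. The field paths in this sketch (`results.channels[0].alternatives[0]`, `metadata.duration`, per-word `speaker` labels when diarization is on) are assumptions based on the typical shape of a Deepgram prerecorded response and may differ by API version:

```python
def to_document_metadata(source: str, response: dict) -> dict:
    """Map a Deepgram-style prerecorded response dict into LangChain
    Document metadata. Field paths are assumptions; verify against the
    response shape of the SDK version being targeted."""
    alt = response["results"]["channels"][0]["alternatives"][0]
    # With diarize=True, each word carries a "speaker" index; count the
    # distinct ones to report the number of speakers.
    speakers = {w["speaker"] for w in alt.get("words", []) if "speaker" in w}
    return {
        "source": source,
        "duration": response["metadata"].get("duration"),
        "confidence": alt.get("confidence"),
        "speakers": len(speakers) or None,
    }

# Example with a hand-built response fragment:
sample = {
    "metadata": {"duration": 120.5},
    "results": {"channels": [{"alternatives": [{
        "transcript": "hi there",
        "confidence": 0.98,
        "words": [{"word": "hi", "speaker": 0},
                  {"word": "there", "speaker": 1}],
    }]}]},
}
meta = to_document_metadata("meeting.mp3", sample)
```

Using `.get()` for optional fields keeps the loader tolerant of responses where diarization or confidence scores are absent.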

Raised by the DX intelligence system.
