Get started using the Soniox audio transcription loader in LangChain.

Setup

Install the package:
pip install langchain-soniox

Credentials

Get your Soniox API key from the Soniox Console and set it as an environment variable:
export SONIOX_API_KEY=your_api_key
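The loader picks up SONIOX_API_KEY from the environment automatically. If you prefer to manage the key yourself, you can read it in Python and pass it explicitly via the api_key constructor parameter; a minimal sketch:

```python
import os

# The loader reads SONIOX_API_KEY on its own; reading it yourself is only
# needed if you want to validate it or pass it explicitly as api_key=...
api_key = os.environ.get("SONIOX_API_KEY", "")
print("API key configured:", bool(api_key))
```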

Usage

Basic transcription

The example below transcribes an audio file with the SonioxDocumentLoader and generates a summary with an LLM.
from langchain_soniox import SonioxDocumentLoader
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

audio_file_url = "https://soniox.com/media/examples/coffee_shop.mp3"
loader = SonioxDocumentLoader(file_url=audio_file_url)

print(f"Transcribing {audio_file_url}...")
docs = loader.load()

transcript_text = docs[0].page_content
print(f"Transcript: {transcript_text}")

# Create a chain to summarize the transcript
prompt = ChatPromptTemplate.from_template(
    "Write a concise summary of the following speech:\n\n{transcript}"
)

chain = prompt | ChatOpenAI(model="gpt-5-mini") | StrOutputParser()
summary = chain.invoke({"transcript": transcript_text})
print(summary)
You can also load audio from a local file or from bytes:
# Using a local file path
loader = SonioxDocumentLoader(file_path="/path/to/audio.mp3")

# Using binary data
with open("/path/to/audio.mp3", "rb") as f:
    audio_bytes = f.read()
loader = SonioxDocumentLoader(file_data=audio_bytes)

Async transcription

For async operations, use aload() or alazy_load():
import asyncio
from langchain_soniox import SonioxDocumentLoader

async def transcribe_async():
    loader = SonioxDocumentLoader(
        file_url="https://soniox.com/media/examples/coffee_shop.mp3"
    )

    docs = [doc async for doc in loader.alazy_load()]
    print(docs[0].page_content)

asyncio.run(transcribe_async())
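If you want all documents in one call rather than an async iterator, aload() is the async counterpart of load(). A sketch (requires a valid SONIOX_API_KEY and network access):

```python
import asyncio
from langchain_soniox import SonioxDocumentLoader

async def transcribe_with_aload():
    loader = SonioxDocumentLoader(
        file_url="https://soniox.com/media/examples/coffee_shop.mp3"
    )
    # aload() gathers all documents at once instead of streaming them
    docs = await loader.aload()
    print(docs[0].page_content)

asyncio.run(transcribe_with_aload())
```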

Advanced usage

Language hints

Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide language_hints to improve accuracy by biasing recognition toward those languages. Language hints do not restrict recognition—they only bias the model toward the specified languages, while still allowing other languages to be detected if present.
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        language_hints=["en", "es"],
    ),
)

docs = loader.load()
For more details, see the Soniox language hints documentation.

Speaker diarization

Enable speaker identification to distinguish between different speakers:
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_speaker_diarization=True,
    ),
)

docs = loader.load()

# Access speaker information in the metadata
current_speaker = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_speaker != token["speaker"]:
        current_speaker = token["speaker"]
        output += f"\nSpeaker {current_speaker}: {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)

# Analyze the conversation
prompt = ChatPromptTemplate.from_template(
    """
    Analyze the following conversation between speakers.
    Identify the intent of each speaker.

    Conversation:
    {conversation}
    """
)

chain = prompt | ChatOpenAI(model="gpt-5-mini") | StrOutputParser()
analysis = chain.invoke({"conversation": output})
print(analysis)

Language identification

Enable automatic language detection and identification:
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_language_identification=True,
    ),
)

docs = loader.load()

# Access language information in the metadata
current_language = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_language != token["language"]:
        current_language = token["language"]
        output += f"\n[{current_language}] {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)

Context for improved accuracy

Provide domain-specific context to improve transcription accuracy. Context helps the model understand your domain, recognize important terms, and apply custom vocabulary. The context object supports four optional sections:
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    StructuredContext,
    StructuredContextGeneralItem,
    StructuredContextTranslationTerm,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        context=StructuredContext(
            # Structured key-value information (domain, topic, intent, etc.)
            general=[
                StructuredContextGeneralItem(key="domain", value="Healthcare"),
                StructuredContextGeneralItem(
                    key="topic", value="Diabetes management consultation"
                ),
                StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
            ],
            # Longer free-form background text or related documents
            text="The patient has a history of...",
            # Domain-specific or uncommon words
            terms=["Celebrex", "Zyrtec", "Xanax"],
            # Custom translations for ambiguous terms
            translation_terms=[
                StructuredContextTranslationTerm(
                    source="Mr. Smith", target="Sr. Smith"
                ),
                StructuredContextTranslationTerm(source="MRI", target="RM"),
            ],
        ),
    ),
)

docs = loader.load()
For more details, see the Soniox context documentation.

Translation

Translate from any detected language to a target language:
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    TranslationConfig,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        translation=TranslationConfig(
            type="one_way",
            target_language="fr",
        ),
        language_hints=["en"],
    ),
)

docs = list(loader.lazy_load())

translated_text = ""
original_text = ""

for token in docs[0].metadata["tokens"]:
    if token["translation_status"] == "translation":
        translated_text += token["text"]
    else:
        original_text += token["text"]

print(original_text)
print(translated_text)
You can also transcribe and translate between two languages simultaneously using the two_way translation type. See the Soniox translation documentation for details.
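A configuration sketch of a two-way setup, where speech in each language is translated into the other. The language_a and language_b field names are assumptions based on the Soniox API; verify them against the Soniox translation documentation:

```python
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    TranslationConfig,
)

# Hypothetical two-way configuration: English speech is translated to
# Spanish and Spanish speech to English. language_a / language_b are
# assumed field names; check the Soniox translation docs.
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        translation=TranslationConfig(
            type="two_way",
            language_a="en",
            language_b="es",
        ),
    ),
)
```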

API reference

Constructor parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file_path | str | No* | None | Path to local audio file to transcribe |
| file_data | bytes | No* | None | Binary data of audio file to transcribe |
| file_url | str | No* | None | URL of audio file to transcribe |
| api_key | str | No | SONIOX_API_KEY env var | Soniox API key |
| base_url | str | No | https://api.soniox.com/v1 | API base URL (see regional endpoints) |
| options | SonioxTranscriptionOptions | No | SonioxTranscriptionOptions() | Transcription options |
| polling_interval_seconds | float | No | 1.0 | Time between status polls (seconds) |
| timeout_seconds | float | No | 300.0 (5 minutes) | Maximum time to wait for transcription |
| http_request_timeout_seconds | float | No | 60.0 | Timeout for individual HTTP requests |
* You must specify exactly one of: file_path, file_data, or file_url.
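For long recordings you may want to poll less often and allow more total time. A configuration sketch combining several of the parameters above (the explicit api_key overrides the SONIOX_API_KEY environment variable):

```python
from langchain_soniox import SonioxDocumentLoader

# Tuned for a long recording: slower polling, longer overall timeout.
loader = SonioxDocumentLoader(
    file_path="/path/to/audio.mp3",
    api_key="your_api_key",
    polling_interval_seconds=2.0,     # poll status every 2 seconds
    timeout_seconds=1800.0,           # wait up to 30 minutes
    http_request_timeout_seconds=120.0,
)
```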

Transcription options

The SonioxTranscriptionOptions class supports these parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| model | str | Async model to use (see available models) |
| language_hints | list[str] | Language hints for transcription (ISO language codes) |
| language_hints_strict | bool | Enforce strict language hints |
| enable_speaker_diarization | bool | Enable speaker identification |
| enable_language_identification | bool | Enable language detection |
| translation | TranslationConfig | Translation configuration |
| context | StructuredContext | Context for improved accuracy |
| client_reference_id | str | Custom reference ID for your records |
| webhook_url | str | Webhook URL for completion notifications |
| webhook_auth_header_name | str | Custom auth header name for webhook |
| webhook_auth_header_value | str | Custom auth header value for webhook |
Browse the API documentation for a full list of supported options.

Return value

The lazy_load() and alazy_load() methods yield a single Document object:
Document(
    page_content=str,  # The transcribed text
    metadata={
        "source": str,  # File URL, path, or "file_upload"
        "transcription_id": str,  # Unique transcription ID
        "audio_duration_ms": int,  # Audio duration in milliseconds
        "model": str,  # Model used for transcription
        "created_at": str,  # ISO 8601 timestamp
        "tokens": list[dict],  # Detailed token-level information
    }
)
The tokens array in metadata includes detailed information for each transcribed word:
  • text: The transcribed text
  • start_ms: Start time in milliseconds
  • end_ms: End time in milliseconds
  • speaker: Speaker ID (if diarization enabled), for example "1", "2", etc.
  • language: Detected language (if identification enabled), for example "en", "fr", etc.
  • translation_status: Translation status ("original", "translation", or "none")
For full details, see the Soniox API reference.
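Because each token carries timing and (optionally) speaker information, you can post-process the metadata without further API calls. A sketch using an illustrative sample of tokens shaped like the fields above; a real transcription returns these in docs[0].metadata["tokens"]:

```python
# Illustrative sample tokens matching the documented token fields.
tokens = [
    {"text": "Hello", "start_ms": 0, "end_ms": 400, "speaker": "1"},
    {"text": " there", "start_ms": 400, "end_ms": 700, "speaker": "1"},
    {"text": " Hi", "start_ms": 900, "end_ms": 1100, "speaker": "2"},
]

# Total speaking time per speaker, in milliseconds
speaking_time: dict[str, int] = {}
for token in tokens:
    duration = token["end_ms"] - token["start_ms"]
    speaking_time[token["speaker"]] = (
        speaking_time.get(token["speaker"], 0) + duration
    )

print(speaking_time)  # {'1': 700, '2': 200}
```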