Get started using the Soniox audio transcription loader in LangChain.

Setup

Install the package:
npm install @soniox/langchain

Credentials

Get your Soniox API key from the Soniox Console and set it as an environment variable:
export SONIOX_API_KEY=your_api_key
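If the key is not set, the loader has nothing to authenticate with, so it can help to fail fast at startup. A minimal sketch of such a check (`requireApiKey` is an illustrative helper, not part of @soniox/langchain; you can also pass the key explicitly via the loader's `apiKey` parameter):

```typescript
// Fail fast with a clear error when the API key is missing.
// `requireApiKey` is an illustrative helper, not part of @soniox/langchain.
function requireApiKey(
  env: Record<string, string | undefined> = process.env
): string {
  const key = env.SONIOX_API_KEY;
  if (!key) {
    throw new Error("SONIOX_API_KEY is not set");
  }
  return key;
}
```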

Usage

Basic transcription

The following example shows how to transcribe an audio file with the SonioxAudioTranscriptLoader and generate a summary with an LLM.
import { SonioxAudioTranscriptLoader } from "@soniox/langchain";
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const audioFileUrl = "https://soniox.com/media/examples/coffee_shop.mp3";
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    language_hints: ["en"],
    // Any other transcription parameters are also accepted; see
    // https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription
  }
);

console.log(`Transcribing ${audioFileUrl}...`);
const docs = await loader.load();

const transcriptText = docs[0].pageContent;
console.log(`Transcript: ${transcriptText}`);

// Create a chain to summarize the transcript
const prompt = ChatPromptTemplate.fromTemplate(
  "Write a concise summary of the following speech:\n\n{transcript}"
);

const chain = prompt
  .pipe(new ChatOpenAI({ model: "gpt-5-mini" }))
  .pipe(new StringOutputParser());

const summary = await chain.invoke({ transcript: transcriptText });
console.log(summary);
You can also transcribe audio from binary data:
// Fetch the file
const response = await fetch("https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3");
const audioBuffer = await response.bytes(); // Uint8Array

const loader = new SonioxAudioTranscriptLoader({
    audio: audioBuffer,
})

const docs = await loader.load();
console.log(docs[0].pageContent); // Transcribed text
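Binary input also makes local files straightforward. Node's `Buffer` is a `Uint8Array` subclass, so the result of `readFileSync` can be passed as `audio` without conversion (a sketch; the helper name and path are illustrative):

```typescript
import { readFileSync } from "node:fs";

// Read a local audio file; the returned Buffer is a Uint8Array subclass,
// so it can be passed directly as the loader's `audio` parameter.
function loadAudioFile(path: string): Uint8Array {
  return readFileSync(path);
}
```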

Translation

Translate from any detected language to a target language:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    translation: {
      type: "one_way",
      target_language: "fr",
    },
    language_hints: ["en"],
  }
);

const docs = await loader.load();

let originalText = "";
let translatedText = "";

for (const token of docs[0].metadata.tokens) {
  if (token.translation_status === "translation") {
    translatedText += token.text;
  } else {
    originalText += token.text;
  }
}

console.log(originalText);
console.log(translatedText);
You can also transcribe and translate between two languages simultaneously using the two_way translation type. Learn more about translation here.
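With two_way translation, tokens for both languages arrive in the same stream, so you typically separate them by the token's `language` field rather than by `translation_status`. A sketch assuming the `SonioxTranscriptToken` shape from the API reference (`textByLanguage` is an illustrative helper):

```typescript
// Minimal token shape; mirrors the relevant SonioxTranscriptToken fields.
type Token = { text: string; language?: string | null };

// Concatenate token text per detected language.
function textByLanguage(tokens: Token[]): Map<string, string> {
  const grouped = new Map<string, string>();
  for (const token of tokens) {
    const lang = token.language ?? "unknown";
    grouped.set(lang, (grouped.get(lang) ?? "") + token.text);
  }
  return grouped;
}
```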

Language hints

Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide language_hints to improve accuracy by biasing recognition toward those languages. Language hints do not restrict recognition—they only bias the model toward the specified languages, while still allowing other languages to be detected if present.
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    language_hints: ["en", "es"],
  }
);

const docs = await loader.load();
For more details, see the Soniox language hints documentation.

Speaker diarization

Enable speaker identification to distinguish between different speakers:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    enable_speaker_diarization: true,
  }
);

const docs = await loader.load();

// Access speaker information in the metadata
let currentSpeaker = null;
let output = "";
for (const token of docs[0].metadata.tokens) {
  if (currentSpeaker !== token.speaker) {
    currentSpeaker = token.speaker;
    output += `\nSpeaker ${currentSpeaker}: ${token.text.trimStart()}`;
  } else {
    output += token.text;
  }
}
console.log(output);

// Analyze the conversation
const prompt = ChatPromptTemplate.fromTemplate(
  `Analyze the following conversation between speakers.
Identify the intent of each speaker.

Conversation:
{conversation}`
);

const chain = prompt
  .pipe(new ChatOpenAI({ model: "gpt-5-mini" }))
  .pipe(new StringOutputParser());

const analysis = await chain.invoke({ conversation: output });
console.log(analysis);
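The speaker-grouping loop above can also be written as a reusable helper that returns structured turns instead of a single string, which is easier to feed into downstream chains. A sketch (`toTurns` and `Turn` are illustrative; field names follow `SonioxTranscriptToken`):

```typescript
// Minimal token shape; mirrors the relevant SonioxTranscriptToken fields.
type Token = { text: string; speaker?: number | string | null };
type Turn = { speaker: string; text: string };

// Merge consecutive tokens from the same speaker into one turn each.
function toTurns(tokens: Token[]): Turn[] {
  const turns: Turn[] = [];
  for (const token of tokens) {
    const speaker = String(token.speaker ?? "unknown");
    const last = turns[turns.length - 1];
    if (last && last.speaker === speaker) {
      last.text += token.text;
    } else {
      turns.push({ speaker, text: token.text.trimStart() });
    }
  }
  return turns;
}
```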

Language identification

Enable automatic language detection and identification:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioFileUrl,
  },
  {
    enable_language_identification: true,
  }
);
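When language identification is enabled, each token in the metadata carries a `language` field. A sketch for listing the distinct languages found, in order of first appearance (`detectedLanguages` is an illustrative helper):

```typescript
// Minimal token shape; mirrors the relevant SonioxTranscriptToken fields.
type Token = { text: string; language?: string | null };

// Collect distinct language codes from the token stream,
// preserving the order in which each language first appears.
function detectedLanguages(tokens: Token[]): string[] {
  const seen = new Set<string>();
  for (const token of tokens) {
    if (token.language) {
      seen.add(token.language);
    }
  }
  return [...seen];
}
```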

Context for improved accuracy

Provide domain-specific context to improve transcription accuracy:
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    context: {
      general: [
        { key: "industry", value: "healthcare" },
        { key: "meeting_type", value: "consultation" }
      ],
      terms: ["hypertension", "cardiology", "metformin"],
      translation_terms: [
        { source: "blood pressure", target: "presión arterial" },
        { source: "medication", target: "medicamento" }
      ]
    }
  }
);
For more details, see the Soniox context documentation.

API reference

Constructor parameters

SonioxLoaderParams (required)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audio | Uint8Array \| string | Yes | Audio file as buffer or URL |
| audioFormat | SonioxAudioFormat | No | Audio file format |
| apiKey | string | No | Soniox API key (defaults to SONIOX_API_KEY env var) |
| apiBaseUrl | string | No | API base URL (defaults to https://api.soniox.com/v1) |
| pollingIntervalMs | number | No | Polling interval in ms (min: 1000, default: 1000) |
| pollingTimeoutMs | number | No | Polling timeout in ms (default: 180000) |

SonioxLoaderOptions (optional)

| Parameter | Type | Description |
| --- | --- | --- |
| model | SonioxTranscriptionModelId | Model to use (default: "stt-async-v4") |
| translation | object | Translation configuration |
| language_hints | string[] | Language hints for transcription |
| language_hints_strict | boolean | Enforce strict language hints |
| enable_speaker_diarization | boolean | Enable speaker identification |
| enable_language_identification | boolean | Enable language detection |
| context | object | Context for improved accuracy |
Browse the documentation for a full list of supported options.

Supported audio formats

  • aac - Advanced Audio Coding
  • aiff - Audio Interchange File Format
  • amr - Adaptive Multi-Rate
  • asf - Advanced Systems Format
  • flac - Free Lossless Audio Codec
  • mp3 - MPEG Audio Layer III
  • ogg - Ogg Vorbis
  • wav - Waveform Audio File Format
  • webm - WebM Audio

Return value

The load() method returns an array containing a single Document object:
type Document = {
  pageContent: string; // The transcribed text
  metadata: SonioxTranscriptResponse; // Full transcript with metadata
};
The metadata includes transcribed text, speaker information (if diarization enabled), language information (if identification enabled), translation data (if translation enabled), and timing information.
type SonioxTranscriptResponse = {
  id: string;
  text?: string | null;
  tokens?: SonioxTranscriptToken[] | null;
};
Token type:
type SonioxTranscriptToken = {
  text: string;
  start_ms?: number | null;
  end_ms?: number | null;
  confidence?: number | null;
  speaker?: number | string | null;
  language?: string | null;
  translation_status?: string | null;
};
You can learn more about the SonioxTranscriptResponse type in the Soniox REST API Reference.