Audio Extractors

Extracting data from audio is another important use case. Audio data is more complex than text because of the medium and the nature of the information it carries. Indexify lets you choose among several extractors depending on your use case and the source of your data. To learn more about extractors, their design, and their usage, read the Indexify documentation.

Extractor Name      | Use Case                                   | Supported Input Types
--------------------|--------------------------------------------|----------------------
ASR Diarization     | Speech recognition and speaker diarization | audio, audio/mpeg
Whisper             | Audio transcription                        | audio, audio/mpeg
Speaker Diarization | Speaker identification in transcriptions   | audio, audio/mpeg
WhisperGroq         | Whisper ASR using GROQ                     | audio, audio/mpeg
Whisper MLX         | Whisper ASR on Apple MLX                   | audio, audio/mpeg

ASR Diarization

Description

The ASR and diarization pipelines are implemented as separate modules, with diarization built on top of the ASR output. Pyannote is recommended for state-of-the-art diarization. Speculative decoding is used to speed up inference: a smaller assistant model proposes tokens, and the larger main model validates them. This requires the two models to have matching decoder architectures and a batch size of 1. For Whisper, a distilled version of the model is suggested as the assistant model.
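The draft-then-verify idea behind speculative decoding can be illustrated with a toy, greedy-only sketch. This shows the generic technique, not the extractor's code: the "models" are hypothetical stand-in functions that return the next token for a given prefix, not Whisper. Because every kept token is the target model's own choice, the output is identical to plain greedy decoding with the target model. In practice the speedup comes from the target scoring all draft tokens in a single forward pass, which this loop does not model.

```python
from typing import Callable, List

# A toy "model": given the tokens so far, return the next token (greedy choice).
NextToken = Callable[[List[str]], str]
EOS = "<eos>"


def greedy_decode(model: NextToken, prompt: List[str], max_new_tokens: int) -> List[str]:
    """Reference: ordinary greedy decoding with a single model."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        t = model(tokens)
        tokens.append(t)
        if t == EOS:
            break
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[str],
                       max_new_tokens: int, k: int = 4) -> List[str]:
    """Greedy speculative decoding: draft proposes k tokens, target verifies them."""
    tokens = list(prompt)
    budget = max_new_tokens
    while budget > 0:
        # 1. The small draft model proposes up to k tokens autoregressively.
        proposal: List[str] = []
        ctx = list(tokens)
        for _ in range(min(k, budget)):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
            if t == EOS:
                break
        # 2. The large target model checks the proposal left to right.
        #    (A real implementation scores all positions in one forward pass.)
        accepted: List[str] = []
        ctx = list(tokens)
        for t in proposal:
            expected = target(ctx)
            accepted.append(expected)  # the target's own choice is always kept
            ctx.append(expected)
            if expected != t or expected == EOS:
                break  # first mismatch (or end): discard the rest of the draft
        else:
            # Every draft token matched: the target contributes one extra token.
            if len(accepted) < budget:
                accepted.append(target(ctx))
        tokens.extend(accepted)
        budget -= len(accepted)
        if accepted[-1] == EOS:
            break
    return tokens


# Hypothetical toy models for illustration.
SENTENCE = "the quick brown fox jumps over the lazy dog".split() + [EOS]


def target(tokens: List[str]) -> str:
    # "Large" model: recites a fixed sentence.
    return SENTENCE[len(tokens)] if len(tokens) < len(SENTENCE) else EOS


def draft(tokens: List[str]) -> str:
    # "Small" model: agrees with the target except that it guesses "sleepy" for "lazy".
    t = target(tokens)
    return "sleepy" if t == "lazy" else t


print(" ".join(speculative_decode(target, draft, [], max_new_tokens=20)))
```

The draft's wrong guess is caught and replaced by the target's token, so the final transcript is exactly what the target alone would have produced.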

Input Data Types

["audio", "audio/mpeg"]
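The input data types are MIME types. The following minimal sketch checks a local file against this list before you submit it. It assumes (our reading, not stated in these docs) that the bare "audio" entry accepts any audio/* subtype, while "audio/mpeg" must match exactly. The helper name is hypothetical; it uses only Python's standard mimetypes module.

```python
import mimetypes

# Input types listed for this extractor in the documentation.
SUPPORTED_INPUT_TYPES = ["audio", "audio/mpeg"]


def is_supported(path: str, supported=SUPPORTED_INPUT_TYPES) -> bool:
    """Return True if the file's guessed MIME type is accepted.

    Assumption: a top-level entry like "audio" accepts every audio/* subtype,
    and a full entry like "audio/mpeg" must match exactly.
    """
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        return False  # unknown extension: cannot decide, so reject
    top_level = mime_type.split("/", 1)[0]
    return mime_type in supported or top_level in supported


print(is_supported("interview.mp3"), is_supported("notes.txt"))
```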

Class Name

ASRExtractor

Download Command

indexify-extractor download tensorlake/asrdiarization

Whisper

Description

This extractor extracts transcriptions from audio. The full transcript, along with text chunks and their timestamps, is stored as metadata on the content.
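As an example of consuming the timestamped chunks, the sketch below turns them into SRT subtitles. The metadata shape, a "text" field plus a "chunks" list of {"timestamp": (start, end), "text": ...} entries in seconds, is an assumption modeled on the Hugging Face transformers ASR pipeline output, not a documented Indexify schema; check the actual metadata before relying on it.

```python
from typing import List, Optional, Tuple, TypedDict


class Chunk(TypedDict):
    # Assumed shape: (start, end) in seconds; end may be None for a final chunk.
    timestamp: Tuple[float, Optional[float]]
    text: str


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: List[Chunk]) -> str:
    """Render timestamped transcript chunks as numbered SRT subtitle cues."""
    cues = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            end = start  # no end time reported: fall back to a zero-length cue
        cues.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(cues)


# Hypothetical metadata in the assumed shape.
metadata = {
    "text": " Welcome to the meeting. Let's get started.",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": " Welcome to the meeting."},
        {"timestamp": (1.5, 3.25), "text": " Let's get started."},
    ],
}

print(chunks_to_srt(metadata["chunks"]))
```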

Input Data Types

["audio", "audio/mpeg"]

Class Name

WhisperExtractor

Download Command

indexify-extractor download tensorlake/whisper-asr
docker run -d tensorlake/whisper-asr

Speaker Diarization

Description

This extractor identifies the speaker for each sentence in the transcription generated by Whisper.
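One common way to combine a transcription with diarization output is to give each sentence the speaker whose turn overlaps it the most in time. The sketch below shows that generic technique; it is not necessarily how this extractor works. The Turn and Sentence types, the overlap rule, and the "SPEAKER_00"-style labels are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Turn:
    """One diarization turn: a speaker label active between start and end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Sentence:
    """One transcribed sentence with its start and end time (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(sentences: List[Sentence],
                    turns: List[Turn]) -> List[Tuple[Optional[str], str]]:
    """Label each sentence with the speaker of the single turn overlapping it most.

    Sentences that overlap no turn are labeled None.
    """
    labeled = []
    for sentence in sentences:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            o = overlap(sentence.start, sentence.end, turn.start, turn.end)
            if o > best_overlap:
                best_speaker, best_overlap = turn.speaker, o
        labeled.append((best_speaker, sentence.text))
    return labeled


# Hypothetical diarization turns and transcribed sentences.
turns = [Turn(0.0, 4.0, "SPEAKER_00"), Turn(4.0, 9.0, "SPEAKER_01")]
sentences = [
    Sentence(0.2, 3.5, "Thanks for joining us."),
    Sentence(3.8, 8.0, "Happy to be here."),
    Sentence(9.5, 10.0, "Is anyone there?"),
]
for speaker, text in assign_speakers(sentences, turns):
    print(f"{speaker}: {text}")
```

The second sentence starts slightly before the speaker change, but most of it falls in the second turn, so it is attributed to SPEAKER_01.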

Input Data Types

["audio", "audio/mpeg"]

Class Name

WhisperDiarizationExtractor

Download Command

indexify-extractor download tensorlake/whisper-diarization
docker run -d tensorlake/whisper-diarization

WhisperGroq

Description

Runs Whisper automatic speech recognition (ASR) on GROQ's hosted inference platform.

Input Data Types

["audio", "audio/mpeg"]

Class Name

WhisperExtractor

Download Command

indexify-extractor download tensorlake/whispergroq

Whisper MLX

Description

Runs Whisper automatic speech recognition (ASR) locally on Apple silicon using Apple's MLX framework.

Input Data Types

["audio", "audio/mpeg"]

Class Name

WhisperExtractor

Download Command

indexify-extractor download tensorlake/whisper-mlx