ASR Diarization

The ASR and diarization pipelines are implemented as separate modules, with diarization built on top of the ASR output. Pyannote is recommended for state-of-the-art diarization. To speed up inference, the pipeline supports speculative decoding: a smaller assistant model proposes tokens, and the larger model verifies them, so the final transcription is the one the larger model would produce on its own. Note that this requires the assistant and main models to have matching decoder architectures, and it only works with a batch size of 1. For Whisper, a distilled version of Whisper is suggested as the assistant model.
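Speculative decoding can be illustrated without any real models. The sketch below assumes greedy decoding and replaces both models with hypothetical toy functions that map a token prefix to the next token; it is not the extractor's implementation. Every token the loop keeps is the larger model's own choice, which is why the result matches plain greedy decoding, while the number of verification rounds drops when the draft model guesses well.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a decoder: maps a token prefix to its greedy next
# token. Real Whisper models also condition on the audio encoder output, and
# the draft and target models must share the same token vocabulary.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int,
                  eos: Optional[int] = None) -> List[int]:
    """Plain greedy decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = model(tokens)
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, num_draft: int = 4,
                       eos: Optional[int] = None) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns the tokens and the number of
    verification rounds; each round stands in for one batched forward pass
    of the large (target) model."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < limit:
        rounds += 1
        # 1. The small draft model proposes a few tokens autoregressively.
        proposal: List[int] = []
        for _ in range(num_draft):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks each proposed position. Every appended
        #    token is the target's own choice, so output equals greedy_decode.
        all_accepted = True
        for proposed in proposal:
            expected = target(tokens)
            tokens.append(expected)
            if expected == eos or len(tokens) >= limit:
                return tokens, rounds
            if expected != proposed:
                # First mismatch: keep the target's correction, drop the rest.
                all_accepted = False
                break
        # 3. If every draft token was accepted, the same pass yields a bonus token.
        if all_accepted:
            bonus = target(tokens)
            tokens.append(bonus)
            if bonus == eos:
                return tokens, rounds
    return tokens, rounds


# Toy models (hypothetical): the draft is occasionally wrong.
toy_target = lambda p: (p[-1] + 1) % 10
toy_draft = lambda p: 0 if len(p) % 5 == 0 else (p[-1] + 1) % 10

out, rounds = speculative_decode(toy_target, toy_draft, [0], max_new_tokens=20)
assert out == greedy_decode(toy_target, [0], max_new_tokens=20)
print(rounds)
```

In a real implementation the speedup comes from step 2: the larger model scores all proposed positions in a single forward pass instead of one pass per token. The loop above calls it per position for clarity, so `rounds` only counts the passes a batched implementation would need.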

indexify-extractor download tensorlake/asrdiarization

Whisper

This extractor extracts transcriptions from audio. The full transcript and the timestamped chunks are stored as metadata of the content.
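As a sketch of what this metadata could look like, assume the transcription follows the output format of the Hugging Face `transformers` ASR pipeline called with `return_timestamps=True`: a dict with a `text` string and a `chunks` list whose items carry `text` and a `(start, end)` `timestamp` tuple. The hypothetical helper `to_metadata` below flattens that into JSON-friendly metadata; its output key names are assumptions, not the extractor's documented schema.

```python
import json
from typing import Any, Dict


def to_metadata(asr_output: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: flatten Whisper-style pipeline output into
    JSON-serializable metadata. The output keys are assumptions."""
    chunks = []
    for chunk in asr_output.get("chunks", []):
        # The end timestamp may be None if no final timestamp was predicted.
        start, end = chunk["timestamp"]
        chunks.append({"start": start, "end": end, "text": chunk["text"].strip()})
    return {"text": asr_output["text"].strip(), "chunks": chunks}


example = {
    "text": " Hello there. How are you?",
    "chunks": [
        {"timestamp": (0.0, 1.2), "text": " Hello there."},
        {"timestamp": (1.2, 2.8), "text": " How are you?"},
    ],
}
print(json.dumps(to_metadata(example), indent=2))
```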

Speaker Diarization

This extractor identifies the speaker of each sentence in the transcription generated by Whisper.
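Building diarization on top of ASR output means aligning two timelines: timestamped text chunks from Whisper and speaker turns from the diarization model. One common approach, shown here as a sketch and not necessarily what this extractor does, labels each chunk with the speaker whose turn overlaps it the longest. The tuple layouts below are assumptions; with pyannote, speaker turns could be read from the returned `Annotation` via `itertracks(yield_label=True)`.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # (start, end, speaker) from the diarizer (assumed layout)
Chunk = Tuple[float, float, str]    # (start, end, text) from ASR (assumed layout)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(chunks: List[Chunk],
                    segments: List[Segment]) -> List[Tuple[Optional[str], str]]:
    """Label each text chunk with the speaker whose turn overlaps it most.
    Chunks that overlap no speaker turn get None."""
    labelled = []
    for c_start, c_end, text in chunks:
        best_speaker, best_overlap = None, 0.0
        for s_start, s_end, speaker in segments:
            ov = overlap(c_start, c_end, s_start, s_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        labelled.append((best_speaker, text))
    return labelled


segments = [(0.0, 1.5, "SPEAKER_00"), (1.5, 3.0, "SPEAKER_01")]
chunks = [(0.0, 1.2, "Hello there."), (1.2, 2.8, "How are you?")]
for speaker, text in assign_speakers(chunks, segments):
    print(f"{speaker}: {text}")
```

Maximum overlap is robust to small timestamp drift between the two models; a chunk that straddles a speaker change is assigned to whichever speaker covers more of it, which is why sentence-level chunks tend to give cleaner labels than long ones.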
