Audio Extraction
Real-Time Speech Recognition Pipelines
You can build real-time pipelines with Indexify that incorporate speech and power applications that retrieve information from audio. Below we describe the possible tasks and provide some examples.
What Can You Achieve with Indexify?
With Indexify, you can accomplish the following with your audio files:
- Speech-to-Text: Convert spoken words into written text so you can analyze and search your audio content without manual transcription.
- Audio Indexing: Build indexes on vector stores and structured stores by combining audio extractors with chunking, embedding, and structured data extractors. The result is a searchable knowledge base that's always at your fingertips!
- Audio Q&A: Use LLMs to query your audio indexes and get answers grounded in the content of your recordings. It's like having a personal assistant that knows your audio files inside out!
The Extraction Pipeline: A Three-Stage Journey
To unlock the full potential of your audio files, we've designed a three-stage extraction pipeline that takes you from raw audio to actionable insights:
- Content Extraction Stage: Start by extracting raw content from your audio files using extractors like `tensorlake/whisper-mlx`, `tensorlake/whisper-asr`, or `tensorlake/asrdiarization`. These extractors convert spoken words into text, laying the foundation for further analysis.
- Content to Chunk Extraction Stage: Break the extracted text into manageable chunks using extractors like `text/chunking`. This stage organizes your content into coherent and contextually relevant pieces, making it easier to process and understand.
- Chunk to Embedding Extraction Stage: Convert the chunks into vector embeddings using extractors like `embedding/minilm-l6` or `embedding/arctic`. Transforming your content into numerical representations enables similarity search and retrieval.
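To make the chunking stage concrete, here is a minimal word-window chunker of the kind a chunking extractor applies to a transcript. It is an illustrative sketch, not the implementation of `text/chunking`; the function name `chunk_words` and its `chunk_size`/`overlap` parameters are hypothetical. Consecutive windows share a few words so that text near a boundary appears in both neighboring chunks.

```python
def chunk_words(transcript: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split a transcript into windows of `chunk_size` words, where consecutive
    windows share `overlap` words. (Hypothetical helper, not the text/chunking extractor.)"""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = transcript.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window already reached the end
            break
    return chunks


transcript = "welcome to the weekly sync today we review the budget and hiring plans"
for chunk in chunk_words(transcript, chunk_size=6, overlap=2):
    print(chunk)
```

Larger chunks keep more context per piece, while smaller ones make retrieval more precise; the right values depend on your transcripts.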
By chaining these stages together, you can build a pipeline that supports question answering with the RAG (Retrieval-Augmented Generation) approach: the chunks most similar to a question are retrieved and passed to an LLM as context for its answer. Your audio files come to life, ready to answer any question you throw at them!
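The retrieval step of RAG can be sketched in plain Python. This toy version substitutes word-count vectors for real embeddings (an embedding extractor such as `embedding/minilm-l6` would produce dense vectors instead), ranks transcript chunks by cosine similarity to the question, and assembles a prompt for an LLM. All function names here are hypothetical and no Indexify API is used.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    # Toy stand-in for an embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    # Rank chunks by similarity to the question and keep the top k.
    q = vectorize(question)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # Augment the question with the retrieved transcript chunks before sending it to an LLM.
    lines = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this transcript context:\n{lines}\n\nQuestion: {question}"


chunks = [
    "the quarterly budget was approved on monday",
    "the team discussed hiring two engineers",
    "lunch will be served at noon",
]
question = "When was the budget approved?"
print(build_prompt(question, retrieve(question, chunks)))
```

Word counts only match exact terms; dense embeddings are what let a real pipeline match paraphrases.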
Explore the Audio Extractor Landscape
We offer a range of audio extractors to suit your specific needs. Here's a quick overview of our pre-built extractors:
| Extractor | Output Type | Best For | Example Usage |
|---|---|---|---|
| tensorlake/whisper-mlx | text | macOS devices | |
| tensorlake/whisper-asr | text | Regular devices | Audio RAG, Audio Transcription |
| tensorlake/asrdiarization | text | Multi-speaker conversations, assisted generation (speculative decoding) | ASR Diarization Colab Notebook |
Choosing the Right Extractor
When selecting an audio extractor, consider your specific requirements and the nature of your audio files:
- If you need advanced features like speaker diarization and speculative decoding, `tensorlake/asrdiarization` is the way to go. It uses state-of-the-art models to handle complex audio such as multi-speaker conversations, attributing each part of the transcript to a speaker.
- For straightforward speech-to-text conversion on macOS devices, `tensorlake/whisper-mlx` is a great choice. It runs on MLX, Apple's machine learning framework for Apple silicon, so its transcription is optimized for Macs.
- If you're working with general-purpose audio files on regular devices, `tensorlake/whisper-asr` offers reliable and efficient transcription. It's compatible with a wide range of operating systems and can handle various audio qualities.
Remember, you can always experiment with different extractors and compare their results to find the one that best suits your needs. Indexify provides the flexibility to switch between extractors seamlessly, allowing you to explore and leverage the strengths of each one.
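The guidance above can be condensed into a small decision helper. This is a hypothetical sketch, not part of Indexify: the function `choose_audio_extractor` and its parameters are illustrative, and the Apple silicon check (`platform.machine() == "arm64"`) is an added assumption based on MLX targeting Apple silicon.

```python
import platform
from typing import Optional


def choose_audio_extractor(
    needs_diarization: bool = False,
    needs_speculative_decoding: bool = False,
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> str:
    """Hypothetical helper: map requirements to one of the pre-built extractor names."""
    # Advanced features: speaker diarization and/or speculative decoding.
    if needs_diarization or needs_speculative_decoding:
        return "tensorlake/asrdiarization"
    system = system or platform.system()     # "Darwin" on macOS
    machine = machine or platform.machine()  # assumed "arm64" for native Python on Apple silicon
    # MLX targets Apple silicon, so only choose whisper-mlx there.
    if system == "Darwin" and machine == "arm64":
        return "tensorlake/whisper-mlx"
    # General-purpose default for other operating systems and hardware.
    return "tensorlake/whisper-asr"


print(choose_audio_extractor())
print(choose_audio_extractor(needs_diarization=True))
```

The returned name is what you would place in the `extractor:` field of an extraction policy, so comparing extractors amounts to creating graphs that differ only in that field.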
Get Started with Audio Extraction
To test an audio extractor locally and see what it extracts from your audio files:
- Download and Start an Audio Extractor:
```bash
indexify-extractor download tensorlake/whisper-asr
indexify-extractor join-server whisper-asr.whisper_extractor:WhisperExtractor
```
- (Optional) Load it in a notebook or terminal:
```python
from indexify_extractor_sdk import load_extractor, Content

# Load the downloaded extractor together with its config class
extractor, config_cls = load_extractor("indexify_extractors.whisper-asr.whisper_extractor:WhisperExtractor")

# Wrap a local audio file and run the extractor on it
content = Content.from_file("/path/to/audio.mp3")
results = extractor.extract(content, params={})
print(results)
```
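To run the extractor over more than one recording, you might first collect the audio files in a folder. This standard-library sketch does that; the helper name `find_audio_files` and the set of extensions are assumptions (check which formats your chosen extractor actually accepts), and the SDK calls are left as comments that reuse the calls shown above.

```python
from pathlib import Path

# Assumed set of formats; check what your chosen extractor supports.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}


def find_audio_files(root: str) -> list[Path]:
    """Hypothetical helper: recursively collect audio files under `root`, sorted for a stable order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


# Usage with the SDK objects from the previous step (not executed here):
# for path in find_audio_files("/path/to/audio_dir"):
#     content = Content.from_file(str(path))
#     results = extractor.extract(content, params={})
#     print(path.name, results)
```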
Continuous Audio Extraction for Applications
Integrating Indexify into your application's workflow takes only a few steps. Get ready to supercharge your audio processing capabilities!
- Start the Indexify Server:
```bash
curl https://getindexify.ai | sh
./indexify server -d
```
- Start a Long-Running Audio Extractor:
```bash
indexify-extractor download tensorlake/whisper-asr
indexify-extractor join-server
```
- Create an Extraction Graph:
```python
from indexify import IndexifyClient, ExtractionGraph

client = IndexifyClient()

# A graph with a single policy that transcribes uploaded audio with whisper-asr
extraction_graph_spec = """
name: 'audioknowledgebase'
extraction_policies:
  - extractor: 'tensorlake/whisper-asr'
    name: 'my-audio-extractor'
"""

extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)
```
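- Upload Audio Files from Your Application:
```python
from indexify import IndexifyClient

client = IndexifyClient()
# Upload a file to the 'audioknowledgebase' graph; the returned ID identifies the ingested content
content_id = client.upload_file("audioknowledgebase", "/path/to/audio.mp3")
```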
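- Inspect the Extracted Content:
```python
# Reuses `client` and `content_id` from the upload step; fetches the output
# of the 'my-audio-extractor' policy in the 'audioknowledgebase' graph
extracted_content = client.get_extracted_content(content_id, 'audioknowledgebase', 'my-audio-extractor')
print(extracted_content)
```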
With just a few lines of code, you can put the data locked in audio files to work in your applications. Example use cases include automated transcription, intelligent audio search, and question answering.
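The graph created in the steps above runs only the transcription stage. To build the full three-stage pipeline described earlier, each policy can consume the output of the one before it. The sketch below renders such a YAML spec; it assumes Indexify's extraction graph format accepts a `content_source` key naming the upstream policy, which you should confirm against the Indexify documentation for your version. The helper `chained_graph_spec`, the graph name `audiorag`, and the chunker and embedding policy names are hypothetical.

```python
def chained_graph_spec(graph_name: str, stages: list[tuple[str, str]]) -> str:
    """Hypothetical helper: render an extraction-graph YAML string in which each
    policy reads the output of the previous one via the assumed `content_source` key."""
    lines = [f"name: '{graph_name}'", "extraction_policies:"]
    previous = None
    for extractor, policy_name in stages:
        lines.append(f"  - extractor: '{extractor}'")
        lines.append(f"    name: '{policy_name}'")
        if previous is not None:
            # Assumption: `content_source` points a policy at an upstream policy's output.
            lines.append(f"    content_source: '{previous}'")
        previous = policy_name
    return "\n".join(lines) + "\n"


spec = chained_graph_spec("audiorag", [
    ("tensorlake/whisper-asr", "my-audio-extractor"),   # audio -> transcript
    ("text/chunking", "transcript-chunker"),            # transcript -> chunks
    ("embedding/minilm-l6", "transcript-embeddings"),   # chunks -> embeddings
])
print(spec)
```

You could then pass `spec` to `ExtractionGraph.from_yaml` and `client.create_extraction_graph`, as in the Create an Extraction Graph step.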
Explore More Examples
Check out this example that showcases audio extraction:
- ASR Diarization Colab Notebook: Experience state-of-the-art ASR combined with speaker diarization and speculative decoding.