Chunking and Indexing
- The extraction graph creates an endpoint that accepts audio files and transcribes them using OpenAI's Whisper model.
- The transcription is passed to a chunking function that splits it into smaller segments.
- The chunks are embedded and indexed.
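To make the chunking step concrete, here is a minimal sketch of splitting a transcript into overlapping, word-based segments. It is an illustration only and not the chunking extractor used by the graph: the function name `chunk_text` and the parameters `chunk_size` and `overlap` are hypothetical, and the real extractor may split on characters, sentences, or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    (Hypothetical helper; not part of Indexify.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the transcript.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    transcript = "the quick brown fox jumps over the lazy dog and runs away"
    for i, chunk in enumerate(chunk_text(transcript, chunk_size=5, overlap=2)):
        print(i, chunk)
```

Overlapping windows are a common choice for retrieval because each chunk is embedded independently; a small overlap reduces the chance that a relevant phrase is split across two chunks and missed by both.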
Code Reference
graph.yaml
- Contains the extraction graph.
setup_graph.py
- Sets up the extraction graph in Indexify Server.
upload_and_retrieve.py
- Uploads audio into the extraction graph, waits for extraction to finish, and then retrieves the results from the endpoint.
Download & Start Indexify Server
Download & Join Indexify Extractors
Terminal 2
virtualenv ve
source ve/bin/activate
pip install indexify-extractor-sdk
indexify-extractor download tensorlake/whisper-asr
indexify-extractor download tensorlake/minilm-l6
indexify-extractor join-server
Setup the Graph
Upload Data and Retrieve
The next step is to upload an audio file and retrieve the transcript.