Chunking and Indexing

The extraction graph creates an endpoint that accepts audio files and transcribes them using OpenAI's Whisper model.
A chunking function then splits the transcript into smaller segments.
The chunks are embedded and indexed.
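
To make the chunking step concrete, here is a minimal sketch of a word-based chunker with overlap. It is not the chunker used by the extraction graph; the function name chunk_text and the parameters max_words and overlap are assumptions chosen for illustration.

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    All names and defaults here are illustrative assumptions.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the window has reached the end of the transcript.
        if start + max_words >= len(words):
            break
    return chunks
```

Each returned chunk would then be passed to the embedding extractor. A larger overlap tends to improve retrieval across chunk boundaries at the cost of more stored chunks.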

Code Reference

graph.yaml - Contains the extraction graph.
setup_graph.py - Sets up the extraction graph in Indexify Server.
upload_and_retrieve.py - Uploads an audio file to the extraction graph, waits for extraction to finish, and then retrieves the results from the endpoint.
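
The "wait for extraction, then retrieve" pattern can be sketched as a generic polling helper. This is not the Indexify client API and not the content of upload_and_retrieve.py; wait_for_result and its fetch callback are hypothetical names, and fetch stands in for whatever call returns the extracted content (or None while extraction is still running).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_result(
    fetch: Callable[[], Optional[T]],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> T:
    """Call fetch() repeatedly until it returns a non-None value.

    fetch is a hypothetical callback: it should return None while the
    extraction is pending and the extracted content once it is ready.
    Raises TimeoutError if nothing is available within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("extraction did not finish in time")
        time.sleep(interval_s)  # avoid hammering the server between checks
```

In a script, fetch would wrap the actual retrieval call, so the polling logic stays separate from the client code.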

Download & Start Indexify Server

Terminal 1

curl https://getindexify.ai | sh
./indexify server -d

Download & Join Indexify Extractors

Terminal 2

virtualenv ve
source ve/bin/activate

pip install indexify-extractor-sdk
indexify-extractor download tensorlake/whisper-asr
indexify-extractor download tensorlake/minilm-l6
indexify-extractor join-server

Set Up the Graph

Terminal 3

python setup_graph.py

Upload Data and Retrieve

The next step is to upload an audio file and retrieve the transcript.

Terminal 3

python upload_and_retrieve.py