Indexify complements DSPy by providing a robust platform for indexing large volume of multi-modal content such as PDFs, raw text, audio and video. It provides a retriever API to retrieve context for LLMs.

We provide a Indexify Retreival Module for DSPy, that works with DSPy.

It will act as a wrapper that internally uses your indexify client for indexing and querying but externally leverages the modularity and design of DSPy.

Install the Indexify DSPy retriever package -

pip install indexify-dspy indexify

Import the necessary libraries

import dspy
from indexify import IndexifyClient
from indexify_dspy import IndexifyRM

Instantiate the Retreival Model

You can create a Retreival Model to retrieve from an index mantained by Indexify. Use the DSPy settings to confgiure the retriever model.

turbo = dspy.OpenAI(model="gpt-3.5-turbo")
indexify_client = IndexifyClient()
indexify_retriever_model = IndexifyRM("index_name", indexify_client, k=3)

dspy.settings.configure(lm=turbo, rm=indexify_retriever_model)

Using the Retreival Model is very simple

retrieve = dspy.Retrieve(k=3)
question = "Who are the NBA Finals MVPs"
topK_passages = retrieve(question).passages

Create an indexify client and populate it with some documents

indexify_client = IndexifyClient()

extraction_graph_spec = """
name: 'myextractiongraph'
extraction_policies:
  - extractor: 'tensorlake/minilm-l6'
    name: 'minilml6'
"""
extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)  

indexify_client.add_documents(
    "myextractiongraph",
    [
        "Indexify is amazing!",
        "Indexify is a retrieval service for LLM agents!",
        "Steph Curry is the best basketball player in the world.",
    ],
)

Initialize the IndexifyRM class

Using the RM class

retrieve = IndexifyRM(indexify_client)
topk_passages = retrieve("Sports", "myextractiongraph.minilml6.embedding", k=2).passages
print(topk_passages)

Setting up DSPy Module with Indexify

You can use IndexifyRM like any other DSPy module or build your own wrapper for retrieval using the Indexify client following this example.

class RAG(dspy.Module):
    def __init__(self, num_passages=2):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        ...

    def forward(self, question):
        context = self.retrieve(question).passages
        ...

Was this page helpful?