Image Retrieval
Image Retrieval based on natural language is typically done in the following manner today -
- Embed images with CLIP, embed the query using the same model and do KNN search to retrieve semantically similar images.
- Describe an image using a Visual LLM such as LLava and GPT-V, embed the description and retrieve images by searching the descriptions.
While these work, semantic search on descriptions or CLIP based algorithms retrieve semantically similar images so they can be less accurate. In addition to that, running these models are expensive. To make retrieval more accurate and cheaper, Indexify in addition to supporting CLIP and VLLM based extractors, also supports SQL based querying of images which are far more cheaper and accurate. Indexify automatically exposes structured data extracted by object detection and tracking models with a SQL interface.
For SQL based retrieval, you will -
- Run object detection extractors powered by efficient models like YoloV9 or Grounding Dino(if you want prompt based extraction).
- You will make SQL queries with predicates to find relevant images from your applications.
In this tutorial we will show you how to do all three image retrieval. We will upload some images of New York City, and query them with Natural Language to find images with skateboards.
Download Indexify
Download and start Indexify!
Download and Run Yolo Extractor
(Optional) How to test Yolo Extractor Locally
Load Yolo Extractor in a notebook or terminal
from indexify_extractor_sdk import load_extractor, Content
extractor, config_cls = load_extractor("indexify_extractors.yolo.yolo_extractor:YoloExtractor")
content = Content.from_file("/path/to/file.jpg")
results = extractor.extract(content)
print(results)
Create an Extraction Policy
extraction_graph_spec = """
name: 'imageknowledgebase'
extraction_policies:
- extractor: 'tensorlake/yolo-extractor'
name: 'object_detection'
"""
extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)
Upload Files
Let's add some files stored remotely, which we can use for the rest of the tutorial.
file_names=["skate.jpg", "congestion.jpg", "bushwick-bred.jpg", "141900.jpg", "132500.jpg", "123801.jpg","120701.jpg", "103701.jpg"]
file_urls = [f"https://extractor-files.diptanu-6d5.workers.dev/images/{file_name}" for file_name in file_names]
for file_url in file_urls:
client.ingest_remote_file(file_url, "image/png", {})
SQL Based Retrieval
Search using SQL
Lets find a image with a skateboard!
Make OpenAI find the images using Langchain!
We can make OpenAI generate the SQL query based on a language -
Semantic Search with CLIP Embeddings
OpenAI's CLIP embedding model allows searching images with semantically similar description of images.
Download and start the Clip Embedding Extractor
Create an Extraction Graph
extraction_graph_spec = """
name: 'imageknowledgebase'
extraction_policies:
- extractor: 'tensorlake/clip-extractor'
name: 'clip_embedding'
"""
extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)
imageknowledgebase.clip_embedding.embedding
. You can also find the index name via the following API -
Upload Images
Upload some images or search on the images which were already uploaded