Indexing Images Based on Visual Description
In this notebook we show how you can index images by visual description. We use a small visual description model called Moondream.
Once you set up Indexify, it continuously extracts visual descriptions with Moondream and indexes each description as images are ingested. This lets you build reliable applications that have to react to images in real time. Such pipelines are used in security, retail, and robotics.
Setup
In [ ]:
%pip install indexify indexify-extractor-sdk
# Download Indexify Server
!curl https://getindexify.ai | sh
# Download Extractors
!indexify-extractor download tensorlake/moondream
!indexify-extractor download tensorlake/minilm-l6
After installing the libraries and downloading the server and the extractors, restart the runtime. Then run the Indexify server together with the extractors.
Open 2 terminals and run the following commands:
# Terminal 1
./indexify server -d
# Terminal 2
indexify-extractor join-server
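Before running the cells that follow, it can help to confirm that the server started in Terminal 1 is accepting connections. The sketch below is a generic TCP readiness check built only on the standard library. `wait_for_port` is a hypothetical helper, not part of Indexify, and the host and port you pass are assumptions that must match your own setup.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused) if nothing listens yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8900)` would poll a server on port 8900, assuming that is the port your Indexify server listens on. A successful connection only shows that something is listening, not that the extractors have joined.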
Indexing Images
In [34]:
from indexify import IndexifyClient
client = IndexifyClient()
In [35]:
file_names = [
    "skate.jpg", "congestion.jpg", "bushwick-bred.jpg",
    "141900.jpg", "132500.jpg", "123801.jpg",
    "120701.jpg", "103701.jpg",
]
file_urls = [f"https://extractor-files.diptanu-6d5.workers.dev/images/{file_name}" for file_name in file_names]
# "image" names the extraction graph defined in the next cell; it presumably has to exist before ingestion.
for file_url in file_urls:
    # Ingest the remote file into the graph, then block until its extraction has finished.
    content_id = client.ingest_remote_file("image", file_url, "image/png", {})
    client.wait_for_extraction(content_id)
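The loop above passes "image/png" for files that carry a .jpg extension. One way to avoid hard-coding the MIME type is to derive it from the file name with the standard library, as in this minimal sketch. `build_ingest_requests` is a hypothetical helper, and guessing from the extension assumes the extension reflects the actual file contents.

```python
import mimetypes

BASE_URL = "https://extractor-files.diptanu-6d5.workers.dev/images"


def build_ingest_requests(file_names, base_url=BASE_URL):
    """Return (url, mime_type) pairs, guessing each MIME type from the file extension."""
    requests = []
    for file_name in file_names:
        mime_type, _ = mimetypes.guess_type(file_name)
        # Reject anything the standard library does not recognise as an image.
        if mime_type is None or not mime_type.startswith("image/"):
            raise ValueError(f"not a recognised image file: {file_name}")
        requests.append((f"{base_url}/{file_name}", mime_type))
    return requests


for url, mime_type in build_ingest_requests(["skate.jpg", "congestion.jpg"]):
    print(mime_type, url)
```

Each pair could then be passed to `client.ingest_remote_file` in place of the fixed URL and MIME type used in the loop.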
In [ ]:
from indexify import ExtractionGraph

# Moondream describes each image; MiniLM-L6 then embeds those descriptions into an index.
extraction_graph_spec = """
name: "image"
extraction_policies:
  - extractor: "tensorlake/moondream"
    name: "image_descriptions"
  - extractor: "tensorlake/minilm-l6"
    name: "image_description_embedding_index"
    content_source: "image_descriptions"
"""
extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)
In [ ]:
client.indexes()
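The search cell below queries an index named "image.image_description_embedding_index.embedding". That string appears to combine the extraction graph name, the policy name, and an output name. The sketch below composes it under that assumption. `embedding_index_name` is a hypothetical helper, and the "embedding" suffix is inferred from that one string rather than from documented behavior, so check the output of `client.indexes()` for the actual names.

```python
def embedding_index_name(graph: str, policy: str, output: str = "embedding") -> str:
    """Join graph, policy, and output names with dots (assumed naming pattern)."""
    return ".".join((graph, policy, output))


print(embedding_index_name("image", "image_description_embedding_index"))
```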
In [40]:
result = client.search_index(
    name="image.image_description_embedding_index.embedding",
    query="skateboard",
    top_k=5
)
result
Out[40]:
[{'content_id': 'c969a5b1844d73b7', 'text': 'The image captures a bustling city street scene. A man in a blue striped shirt and a woman in a yellow jacket are walking on the sidewalk, while a man in a black shirt and a woman in a white shirt are riding a skateboard. The street is filled with cars and taxis, and the buildings lining the street are tall and modern, with a green awning visible on one of them. The sky is clear and blue, and the sun is shining brightly, casting a warm glow over the scene.', 'mime_type': 'text/plain', 'confidence_score': 1.2513796, 'labels': {}}, {'content_id': 'e5c109fe9d24f7dc', 'text': 'The image depicts a vibrant underwater coral reef scene. The coral, a mix of white and brown, is teeming with life, including several fish in shades of black and white. The fish are scattered throughout the coral, some swimming near the top and others closer to the bottom. The background is a deep blue color, likely the water surrounding the coral reef.', 'mime_type': 'text/plain', 'confidence_score': 1.6255214, 'labels': {}}, {'content_id': 'b5112d5b60be581a', 'text': 'The image depicts a lively street scene with a group of people gathered on the sidewalk, engaged in various activities. A large mural on the side of a building features a man and a woman, adding a touch of artistry to the urban landscape. The street is lined with trees, providing a natural element to the urban setting. A black fence can be seen in the background, and a bicycle is parked on the sidewalk.', 'mime_type': 'text/plain', 'confidence_score': 1.6457492, 'labels': {}}, {'content_id': '42ea119700e2d319', 'text': 'The image captures a serene harbor scene with a large, three-masted sailing ship with a red and white flag on its deck, sailing towards the right side of the frame. In the foreground, a smaller sailboat with a white hull and red sails is also sailing towards the right side of the image. 
The harbor is surrounded by buildings, and the sky is filled with clouds, creating a calm and peaceful atmosphere.', 'mime_type': 'text/plain', 'confidence_score': 1.6898127, 'labels': {}}, {'content_id': 'd3dc0d541281cdb', 'text': 'The image depicts a serene harbor scene with a variety of boats of different sizes and colors docked in the calm water. The boats are scattered across the harbor, with some closer to the shore and others further out. The sky above is a clear blue, providing a beautiful backdrop to the scene. In the distance, the city skyline can be seen, adding depth to the image.', 'mime_type': 'text/plain', 'confidence_score': 1.7620384, 'labels': {}}]
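In the output above, the scores rise down the list and the only description that mentions a skateboard comes first. This suggests, as an assumption, that a smaller `confidence_score` means a closer match. The sketch below condenses a result list into short rows ordered that way. `summarize_hits` is a hypothetical helper, and the sample holds two hits from the output with their text shortened.

```python
def summarize_hits(hits, max_chars=60):
    """Return (content_id, score, shortened text) tuples ordered by ascending score."""
    ordered = sorted(hits, key=lambda hit: hit["confidence_score"])
    summary = []
    for hit in ordered:
        text = hit["text"]
        if len(text) > max_chars:
            text = text[:max_chars].rstrip() + "..."
        summary.append((hit["content_id"], hit["confidence_score"], text))
    return summary


# Two hits taken from the output above, text shortened to the first sentence.
sample = [
    {"content_id": "e5c109fe9d24f7dc",
     "text": "The image depicts a vibrant underwater coral reef scene.",
     "confidence_score": 1.6255214},
    {"content_id": "c969a5b1844d73b7",
     "text": "The image captures a bustling city street scene.",
     "confidence_score": 1.2513796},
]
for content_id, score, text in summarize_hits(sample):
    print(f"{score:.3f}  {content_id}  {text}")
```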
In [41]:
# Each search hit is a text description; its parent content is the image it was extracted from.
content_ids = [r['content_id'] for r in result]
content_metadata_list = [client.get_content_metadata(content_id) for content_id in content_ids]
parent_ids = [cm['content_metadata']['parent_id'] for cm in content_metadata_list]
In [42]:
parent_ids
Out[42]:
['WR2CGS6wh3Wwiyz2', 'HGqVUfdV62ahZwdx', 'P8J9Gbt1QMl9p4Yq', 'sLBgvOCB1NAmWldK', 'qDj1PuTCispXZM5C']
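Here every hit maps to a different parent image. If several descriptions ever shared a parent, or a hit had no parent at all, the list could be cleaned before downloading. The sketch below does that. `unique_parent_ids` is a hypothetical helper, and the sample dictionaries contain only the `content_metadata` / `parent_id` keys used in the cell above; any other fields in the real metadata are not assumed.

```python
def unique_parent_ids(content_metadata_list):
    """Return parent ids in first-seen order, skipping empty values and repeats."""
    seen = set()
    parents = []
    for cm in content_metadata_list:
        parent_id = cm["content_metadata"].get("parent_id")
        if parent_id and parent_id not in seen:
            seen.add(parent_id)
            parents.append(parent_id)
    return parents


# Minimal stand-ins for the metadata dictionaries, using parent ids from the output above.
sample_metadata = [
    {"content_metadata": {"parent_id": "WR2CGS6wh3Wwiyz2"}},
    {"content_metadata": {"parent_id": "HGqVUfdV62ahZwdx"}},
    {"content_metadata": {"parent_id": "WR2CGS6wh3Wwiyz2"}},
    {"content_metadata": {"parent_id": ""}},
]
print(unique_parent_ids(sample_metadata))
```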
In [43]:
image_bytes = [client.download_content(parent_id) for parent_id in parent_ids]
In [44]:
from IPython.display import Image, display
In [45]:
display(*[Image(data=b) for b in image_bytes])
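Outside a notebook, the downloaded bytes can be written to disk instead of displayed. The sketch below saves each image under its parent id. `save_images` is a hypothetical helper, the `.jpg` suffix is an assumption based on the source file names, and the bytes in the demo are a placeholder rather than a real image.

```python
import tempfile
from pathlib import Path


def save_images(image_bytes, ids, out_dir, suffix=".jpg"):
    """Write each bytes object to out_dir/<id><suffix> and return the written paths."""
    if len(image_bytes) != len(ids):
        raise ValueError("image_bytes and ids must have the same length")
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    paths = []
    for data, content_id in zip(image_bytes, ids):
        path = out_path / f"{content_id}{suffix}"
        path.write_bytes(data)
        paths.append(path)
    return paths


# Demo with placeholder bytes in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    written = save_images([b"placeholder"], ["WR2CGS6wh3Wwiyz2"], tmp)
    print(written[0].name)
```

With the variables from the cells above, the call would be `save_images(image_bytes, parent_ids, "downloads")`, where "downloads" is an arbitrary directory name.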