Image Extractors

Images are another form of unstructured data. Indexify lets you choose an extractor based on your use case and the type of images you are working with. To learn more about extractors, their design, and their usage, read the Indexify documentation.

| Extractor Name | Use Case | Supported Input Types |
|---|---|---|
| Florence Image Extractor | Image analysis and captioning | image/jpeg, image/png, image/gif |
| Yolo | Object detection | image/jpeg, image/png |
| GroundingDino | Zero-shot object detection | image/jpeg, image/png |
| Moondream | Image question answering | image/bmp, image/gif, image/jpeg, image/png, image/tiff |
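As a quick way to read the table, the sketch below looks up which extractors list a given MIME type as supported. The mapping is copied from the table; the helper names `extractors_for_mime` and `extractors_for_file` are hypothetical and not part of Indexify.

```python
import mimetypes

# Supported input types, copied from the table above.
SUPPORTED_INPUT_TYPES = {
    "Florence Image Extractor": {"image/jpeg", "image/png", "image/gif"},
    "Yolo": {"image/jpeg", "image/png"},
    "GroundingDino": {"image/jpeg", "image/png"},
    "Moondream": {"image/bmp", "image/gif", "image/jpeg", "image/png", "image/tiff"},
}


def extractors_for_mime(mime_type):
    """Return the names of extractors that list this MIME type, sorted alphabetically."""
    return sorted(
        name for name, types in SUPPORTED_INPUT_TYPES.items() if mime_type in types
    )


def extractors_for_file(path):
    """Guess the MIME type from the file extension, then look it up."""
    mime_type, _ = mimetypes.guess_type(path)
    return extractors_for_mime(mime_type)


print(extractors_for_mime("image/gif"))
print(extractors_for_file("photo.png"))
```

A GIF is accepted only by the Florence and Moondream extractors, while a PNG is accepted by all four.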

Florence Image Extractor

Description

FlorenceImageExtractor analyzes images with Microsoft's Florence-2 vision-language model (Florence-2-large by default). It handles a range of image understanding tasks, from simple captioning to more complex visual reasoning.

Input Parameters

  • model_name (default: 'microsoft/Florence-2-large'): Name of the Florence model to use
  • task_prompt (default: ''): Task prompt for the model
  • text_input (optional): Additional text input for the task
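The sample code on the Florence-2 model card builds the model prompt by appending the optional text to a task token such as `<CAPTION>` or `<CAPTION_TO_PHRASE_GROUNDING>`. The sketch below mirrors that composition for task_prompt and text_input. It assumes the extractor combines the two parameters the same way, which this page does not confirm; `build_prompt` is a hypothetical helper, not part of the extractor.

```python
from typing import Optional

# Task tokens taken from the Florence-2 model card. Whether the extractor
# accepts every one of them is an assumption.
TASKS_REQUIRING_TEXT = {"<CAPTION_TO_PHRASE_GROUNDING>"}


def build_prompt(task_prompt: str, text_input: Optional[str] = None) -> str:
    """Combine a task token with optional extra text, as in the model card sample."""
    if task_prompt in TASKS_REQUIRING_TEXT and not text_input:
        raise ValueError(f"{task_prompt} needs a text_input to ground")
    if text_input is None:
        return task_prompt
    return task_prompt + text_input


print(build_prompt("<CAPTION>"))
print(build_prompt("<CAPTION_TO_PHRASE_GROUNDING>", "a green car"))
```

Captioning and object detection tasks typically need only the task token, while grounding tasks use text_input to name what should be located.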

Input Data Types

["image/jpeg", "image/png", "image/gif"]

Class Name

FlorenceImageExtractor

Download Command

indexify-extractor download tensorlake/florence

Yolo

Description

Extracts bounding boxes and class names from images using an Ultralytics YOLO model. The default weights file, 'yolov8n.pt', is the YOLOv8 nano checkpoint.

Input Parameters

  • model_name (default: 'yolov8n.pt'): Name of the YOLO model to use
  • conf (default: 0.25): Confidence threshold for detections
  • iou (default: 0.7): IoU threshold for non-maximum suppression
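To show what conf and iou control, here is a minimal pure-Python sketch of confidence filtering followed by non-maximum suppression. It is a simplified, class-agnostic illustration and not the Ultralytics implementation; the function names and the (x1, y1, x2, y2) box format are assumptions made for the example.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf=0.25, iou=0.7):
    """Keep (box, score) pairs that pass the confidence threshold and survive NMS."""
    # Step 1: drop low-confidence detections, then sort best-first.
    candidates = sorted(
        (d for d in detections if d[1] >= conf), key=lambda d: d[1], reverse=True
    )
    # Step 2: greedily keep a box unless it overlaps a better kept box too much.
    kept = []
    for box, score in candidates:
        if all(box_iou(box, kept_box) <= iou for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),    # kept: highest score
    ((0, 0, 10, 9), 0.8),     # suppressed: IoU 0.9 with the first box
    ((20, 20, 30, 30), 0.6),  # kept: no overlap with the first box
    ((50, 50, 60, 60), 0.1),  # dropped: below conf
]
print(filter_detections(detections))
```

Raising conf removes more weak detections; lowering iou makes suppression more aggressive, so fewer overlapping boxes survive.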

Input Data Types

["image/jpeg", "image/png"]

Class Name

YoloExtractor

Download Command

indexify-extractor download tensorlake/yolo-extractor

GroundingDino

Description

This extractor uses the Grounding DINO model and takes an (image, text prompt) pair as input. It is based on IDEA-Research's GroundingDINO and can detect objects from categories it was not specifically trained on, guided by the prompt.

Input Parameters

  • prompt (default: "person"): Text prompt for object detection
  • box_threshold (default: 0.35): Threshold for bounding box detection
  • text_threshold (default: 0.25): Threshold for matching prompt text to detected boxes
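The two thresholds can be illustrated with a small sketch modeled on the reference GroundingDINO inference code, where each candidate box carries one similarity score per prompt token. A box is kept when its best token score exceeds box_threshold, and its label is built from the tokens scoring above text_threshold. This is an assumption about how the extractor applies them, and `select_detections` and its input layout are hypothetical.

```python
def select_detections(token_scores, tokens, box_threshold=0.35, text_threshold=0.25):
    """Return (box_index, phrase, score) for each candidate box that passes.

    token_scores: one list of per-token similarity scores per candidate box.
    tokens: the prompt tokens, in the same order as each score list.
    """
    results = []
    for index, scores in enumerate(token_scores):
        best = max(scores)
        # box_threshold: discard boxes whose best token match is too weak.
        if best <= box_threshold:
            continue
        # text_threshold: label the box with every token that matches well enough.
        phrase = " ".join(t for t, s in zip(tokens, scores) if s > text_threshold)
        results.append((index, phrase, best))
    return results


tokens = ["red", "car"]
token_scores = [
    [0.10, 0.60],  # kept, labelled "car"
    [0.30, 0.20],  # dropped: best score 0.30 is below box_threshold
    [0.40, 0.50],  # kept, labelled "red car"
]
print(select_detections(token_scores, tokens))
```

Under this reading, box_threshold decides how many boxes are returned, while text_threshold decides how much of the prompt ends up in each box's label.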

Input Data Types

["image/jpeg", "image/png"]

Class Name

GroundingDinoExtractor

Download Command

indexify-extractor download tensorlake/groundingdino

Moondream

Description

Moondream2 is a small vision-language model with 1.86B parameters, initialized with weights from SigLIP and Phi 1.5. This extractor uses Moondream to answer questions about images.

Input Parameters

  • prompt (default: "Describe this image."): Question or prompt about the image

Input Data Types

["image/bmp", "image/gif", "image/jpeg", "image/png", "image/tiff"]

Class Name

MoondreamExtractor

Download Command

indexify-extractor download tensorlake/moondream