Service Configuration

Indexify is configured by a YAML configuration file. The easiest way to start is by generating it with the CLI or by downloading a sample configuration file, and then tweaking it to fit your needs.

Generate with CLI

Unable to find ./indexify?

Don't forget to download the Indexify binary before running the command below. You can do so by running the command curl | sh.

This downloads the relevant binary to the relative path ./indexify.

./indexify init-config --config-path /tmp/indexify.yaml

Configuration Reference

Network Configuration

api_port: 8900
coordinator_port: 8950
coordinator_http_port: 8960
raft_port: 8970
  • listen_if: The interface on which the servers listen. Typically, you would want to listen on all interfaces.
  • api_port: The port on which the application-facing API server is exposed. This is the HTTP port on which applications upload data, create extraction policies, and retrieve extracted data from indexes.
  • coordinator_port: The port on which the coordinator is exposed. This is a separate setting because in dev mode we expose both the API server and the coordinator server in the same process.
  • coordinator_http_port: The port used to access coordinator metrics.
  • raft_port: The port on which internal messages between coordinator nodes are transmitted. This is only needed if Indexify is started either as a coordinator or in dev mode.
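
Since dev mode runs the API server and the coordinator in one process, each of the ports above needs its own value. The following is a minimal sketch of a pre-flight check, not something Indexify ships; the helper name find_port_conflicts and the use of a plain dict to hold the settings are assumptions made for illustration.

```python
from collections import defaultdict

# Port keys taken from the network configuration above.
PORT_KEYS = ("api_port", "coordinator_port", "coordinator_http_port", "raft_port")


def find_port_conflicts(config: dict) -> dict:
    """Return {port: [keys]} for every port assigned to more than one key."""
    keys_by_port = defaultdict(list)
    for key in PORT_KEYS:
        port = config.get(key)
        if port is None:
            continue  # key not set, nothing to compare
        if not 0 < int(port) < 65536:
            raise ValueError(f"{key}={port} is not a valid TCP port")
        keys_by_port[int(port)].append(key)
    return {port: keys for port, keys in keys_by_port.items() if len(keys) > 1}


# Values from the sample configuration above.
sample = {
    "api_port": 8900,
    "coordinator_port": 8950,
    "coordinator_http_port": 8960,
    "raft_port": 8970,
}
print(find_port_conflicts(sample))
```

An empty result means no two settings share a port; any entry in the result names the keys that collide.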

Don't forget to configure a volume

Indexify stores all of the Extraction Graphs you've configured, and the data it has processed, locally. This is configured in indexify.yaml as seen below:

  path: <state store path>

Don't forget to configure a persistent volume at this location if you'd like to make sure you don't lose your data when your server restarts.
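
Before starting the server, it can help to confirm that the state store location is actually writable. The sketch below is one way to do that with the Python standard library; is_writable_dir is a hypothetical helper, and it only tests writability. It cannot tell whether the directory is backed by a persistent volume.

```python
import tempfile


def is_writable_dir(path: str) -> bool:
    """Return True if a file can be created (and removed) inside `path`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        # Covers a missing directory, a non-directory path, and denied access.
        return False


# The system temp directory stands in for your configured state store path.
print(is_writable_dir(tempfile.gettempdir()))
```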

Blob Storage Configuration

Blob storage holds the raw bytes of unstructured data. For instance, if you're splitting your text data into chunks, these text chunks will be stored at the location you specify below.

We support two forms of blob storage at the moment: Disk and S3 Storage.


Disk Storage

A common use case for disk storage is a shared volume used to replicate or share data between different processes.

  backend: disk
    path: /tmp/indexify-blob-storage

S3 Storage

For S3 Storage, you also need to make sure the two following environment variables are configured. Once they are set, our S3 integration will take care of the rest.

  backend: s3
    bucket: indexifydata
    region: us-east-1

Vector Index Storage

  • index_store: (Default: LanceDb): Name of the vector backend. Possible values: LanceDb, Qdrant, PgVector

Qdrant Config

addr: Address of the Qdrant HTTP endpoint

  index_store: Qdrant
    addr: ""

Pg Vector Config

addr: Address of Postgres

  index_store: PgVector
    addr: postgres://postgres:postgres@localhost/indexify
    m: 16
    efconstruction: 64
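
A malformed addr is an easy mistake to make, so it can be worth checking the connection string before starting the server. The sketch below uses only the Python standard library to split the sample address into its parts. The helper describe_postgres_addr is hypothetical, and accepting both postgres and postgresql as schemes is an assumption rather than documented Indexify behavior.

```python
from urllib.parse import urlsplit


def describe_postgres_addr(addr: str) -> dict:
    """Split a postgres:// URL into the parts a connection needs."""
    parts = urlsplit(addr)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL omits it
        "database": parts.path.lstrip("/"),
    }


# The sample address from the configuration above.
info = describe_postgres_addr("postgres://postgres:postgres@localhost/indexify")
print(info["host"], info["database"])
```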

LanceDb Config

path: Path of the database

Caching


  backend: none
  backend: memory
    max_size: 1000000
  backend: redis
    addr: redis://localhost:6379

API Server TLS

To set up mTLS for the Indexify server, you first need to create a root certificate, a client certificate and key pair, and a server certificate and key pair. The commands below will generate the certificates and keys and store them in a folder called .dev-tls.

local-dev-tls-insecure: ## Generate local development TLS certificates (insecure)
    @mkdir -p .dev-tls && \
    openssl req -x509 -newkey rsa:4096 -keyout .dev-tls/ca.key -out .dev-tls/ca.crt -days 365 -nodes -subj "/C=US/ST=TestState/L=TestLocale/O=IndexifyOSS/CN=localhost" && \
    openssl req -new -newkey rsa:4096 -keyout .dev-tls/server.key -out .dev-tls/server.csr -nodes -config ./client_cert_config && \
    openssl x509 -req -in .dev-tls/server.csr -CA .dev-tls/ca.crt -CAkey .dev-tls/ca.key -CAcreateserial -out .dev-tls/server.crt -days 365 -extensions v3_ca -extfile ./client_cert_config && \
    openssl req -new -nodes -out .dev-tls/client.csr -newkey rsa:2048 -keyout .dev-tls/client.key -config ./client_cert_config && \
    openssl x509 -req -in .dev-tls/client.csr -CA .dev-tls/ca.crt -CAkey .dev-tls/ca.key -CAcreateserial -out .dev-tls/client.crt -days 365 -extfile ./client_cert_config -extensions v3_ca

Once you have the certificates and keys generated, add the config below to your server config and provide the paths to where you have stored the root certificate and the server certificate and key pair.

  api: true
  ca_file: .dev-tls/ca.crt        # Path to the CA certificate
  cert_file: .dev-tls/server.crt  # Path to the server certificate
  key_file: .dev-tls/server.key   # Path to the server private key
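
On the client side, mTLS means trusting the root certificate and presenting the client certificate and key generated earlier. The following is a rough sketch using Python's standard ssl module, assuming the files sit in .dev-tls under the names produced by the commands above. The helper names missing_tls_files and build_client_context are illustrative and are not part of any Indexify client.

```python
import ssl
from pathlib import Path


def missing_tls_files(tls_dir: str, names=("ca.crt", "client.crt", "client.key")) -> list:
    """Return the expected file names that are absent from `tls_dir`."""
    return [name for name in names if not (Path(tls_dir) / name).is_file()]


def build_client_context(tls_dir: str) -> ssl.SSLContext:
    """Build a client context that trusts the CA and presents the client cert."""
    base = Path(tls_dir)
    # Passing cafile means only this root certificate is trusted.
    context = ssl.create_default_context(
        ssl.Purpose.SERVER_AUTH, cafile=str(base / "ca.crt")
    )
    # Present the client certificate and key for mutual TLS.
    context.load_cert_chain(
        certfile=str(base / "client.crt"), keyfile=str(base / "client.key")
    )
    return context


# Only build the context when all three files are present.
missing = missing_tls_files(".dev-tls")
if missing:
    print("missing TLS files:", missing)
else:
    client_context = build_client_context(".dev-tls")
```

The resulting context can be passed to any standard-library HTTPS client, for example through the context argument of http.client.HTTPSConnection.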

HA Configuration

To set up multiple coordinator nodes for a high-availability configuration, start with a single node, called the seed node. Create a separate configuration file for each additional coordinator instance. Each node should have a unique node_id field in its configuration file. The seed_node field should be set to the IP address and port of the original coordinator node.

Seed node:

raft_port: 8970
node_id: 0
seed_node: localhost:8970

New node (replace with the actual seed node IP address; 8970 should match the configured raft_port of the seed node):

node_id: 1
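
The two rules above (unique node_id values, and a seed_node port that matches the seed's raft_port) are easy to check mechanically before rolling out new nodes. The following is a minimal sketch under stated assumptions: each node's settings are held in a plain dict, the first entry is the seed node, and check_cluster is a hypothetical helper. The address 192.0.2.10 is a placeholder for the seed node's real IP address.

```python
def check_cluster(nodes: list) -> list:
    """Return a list of human-readable problems found in the node configs."""
    problems = []
    ids = [node["node_id"] for node in nodes]
    for node_id in sorted(set(ids)):
        if ids.count(node_id) > 1:
            problems.append(f"node_id {node_id} is used by more than one node")
    # Assumption: the first entry is the seed node.
    seed_raft_port = str(nodes[0]["raft_port"])
    for node in nodes[1:]:
        port = node["seed_node"].rsplit(":", 1)[-1]
        if port != seed_raft_port:
            problems.append(
                f"node {node['node_id']}: seed_node port {port} "
                f"does not match seed raft_port {seed_raft_port}"
            )
    return problems


seed = {"node_id": 0, "raft_port": 8970, "seed_node": "localhost:8970"}
# 192.0.2.10 is a placeholder, not a value from the Indexify documentation.
new_node = {"node_id": 1, "seed_node": "192.0.2.10:8970"}
print(check_cluster([seed, new_node]))
```

An empty list means both rules hold for the given nodes.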