Server Configuration

Indexify is configured by a YAML configuration file. The easiest way to start is by generating it with the CLI or by downloading a sample configuration file, and then tweaking it to fit your needs.

Generate with CLI

Don’t forget to download our Indexify binary before running the command below. You can do by running the command curl https://getindexify.ai | sh.

This will in turn download the relevant binary at the relative path ./indexify.

./indexify init-config --config-path /tmp/indexify.yaml

Configuration Reference

Network Configuration

listen_if: 0.0.0.0
api_port: 8900
coordinator_port: 8950
coordinator_http_port: 8960
raft_port: 8970
coordinator_addr: 0.0.0.0:8950

listen_if: The interface on which the servers listens on. Typically you would want to listen on all interfaces.
api_port: The port in which the application facing API server is exposed. This is the HTTP port on which applications upload data, create extraction policies and retrieved extracted data from indexes.
coordinator_port: Port on which the coordinator is exposed. This is available as a separate configuration becasue in the dev mode, we expose both the api server and the coordinator server in the same process.
coordinator_http_port Port to access coordinator metrics
raft_port: Port on which internal messages across coordinator nodes are transmitted. This is only needed if Indexify is either started as a coordinator or in dev mode.

“Don’t forget to configure a volume”

Indexify stores all of the the Extraction Graphs you’ve configured and data it has processed locally. This is configured in indexify.yaml as seen below

state_store:
  path: <state store path>

Don’t forget to configure a persistent volume at this location if you’ll like to make sure you don’t lose your data when your server restarts.

Blob Storage Configuration

Blob Storage Configuration refers to the raw bytes of unstructured data. For instance if you’re splitting your text data into chunks, these text chunks will be stored at the location you specify below.

We support two forms of blob storage at the moment - Disk and S3 Storage.

Disk

A common use-case for disk storage is if you’re using a shared volume to replicate/share data between different processes.

blob_storage:
  backend: disk
  disk:
    path: /tmp/indexify-blob-storage

S3 Storage

For S3 Storage, you’ll need to also ensure you have the two following environment variables configured. Once you’ve configured these environment variables, our S3 integration will take care of the rest

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

blob_storage:
  backend: s3
  s3:
    bucket: indexifydata
    region: us-east-1

Vector Index Storage

index_store: (Default: LanceDb): Name of the vector be, possible values: LanceDb, Qdrant, PgVector

Qdrant Config

addr: Address of the Qdrant http endpoint

index_config:
  index_store: Qdrant
  qdrant_config:
    addr: "http://127.0.0.1:6334"

Pg Vector Config

addr: Address of Postgres

index_config:
  index_store: PgVector
  pg_vector_config:
    addr: postgres://postgres:postgres@localhost/indexify
    m: 16
    efconstruction: 64

LanceDb Config

path: Path of the database

Caching

cache:
  backend: none

cache:
  backend: memory
  memory:
    max_size: 1000000

cache:
  backend: redis
  redis:
    addr: redis://localhost:6379

API Server TLS

To set up mTLS for the indexify server, you first need to create a root certificate along with a client certificate and key pair along with a server certificate and key pair. The commands below will generate the certificates and keys and store them in a folder called .dev-tls.

local-dev-tls-insecure: ## Generate local development TLS certificates (insecure)
	@mkdir -p .dev-tls && \
	openssl req -x509 -newkey rsa:4096 -keyout .dev-tls/ca.key -out .dev-tls/ca.crt -days 365 -nodes -subj "/C=US/ST=TestState/L=TestLocale/O=IndexifyOSS/CN=localhost" && \
	openssl req -new -newkey rsa:4096 -keyout .dev-tls/server.key -out .dev-tls/server.csr -nodes -config ./client_cert_config && \
	openssl x509 -req -in .dev-tls/server.csr -CA .dev-tls/ca.crt -CAkey .dev-tls/ca.key -CAcreateserial -out .dev-tls/server.crt -days 365 -extensions v3_ca -extfile ./client_cert_config && \
	openssl req -new -nodes -out .dev-tls/client.csr -newkey rsa:2048 -keyout .dev-tls/client.key -config ./client_cert_config && \
	openssl x509 -req -in .dev-tls/client.csr -CA .dev-tls/ca.crt -CAkey .dev-tls/ca.key -CAcreateserial -out .dev-tls/client.crt -days 365 -extfile ./client_cert_config -extensions v3_ca

Once you have the certificates and keys generated, add the config below to your server config and provide the paths to where you have stored the root certificate and the server certificate and key pair.

tls:
  api: true
  ca_file: .dev-tls/ca.crt        # Path to the CA certificate
  cert_file: .dev-tls/server.crt  # Path to the server certificate
  key_file: .dev-tls/server.key   # Path to the server private key

HA configuration

To run multiple coordinators in a high availability configuration, you’ll want to have a way for them to discover themselves. Set seed_node to the address that can be used for this. Note that the only requirement is that the returned node is “ready” as defined by localhost:8960/status. For dynamic environments, put a load balancer in front of all coordinator nodes and enable/disable their endpoints based off the status of the raft cluster.

Note: it is important that one of the nodes is started with --initialize the first time the cluster is started. This provides an initial leader to form the cluster around. You’ll likely want to no longer use this flag after the cluster is formed.

raft_port: 8970
node_id: 0
seed_node: my-dicovery-address:8970

Get Started

Overview

Use Cases

CLI & UI

Pre-Built Extractors

LLM Frameworks

Deployment and Operation

Server Configuration

Generate with CLI

Configuration Reference

Network Configuration

Blob Storage Configuration

Disk

S3 Storage

Vector Index Storage

Qdrant Config

Pg Vector Config

LanceDb Config

Caching

API Server TLS

HA configuration

Get Started

Overview

Use Cases

CLI & UI

Pre-Built Extractors

LLM Frameworks

Deployment and Operation

​Generate with CLI

​Configuration Reference

​Network Configuration

​Blob Storage Configuration

​Disk

​S3 Storage

​Vector Index Storage

​Qdrant Config

​Pg Vector Config

​LanceDb Config

​Caching

​API Server TLS

​HA configuration

Generate with CLI

Configuration Reference

Network Configuration

Blob Storage Configuration

Disk

S3 Storage

Vector Index Storage

Qdrant Config

Pg Vector Config

LanceDb Config

Caching

API Server TLS

HA configuration