Server Configuration
Indexify is configured by a YAML configuration file. The easiest way to start is by generating it with the CLI or by downloading a sample configuration file, and then tweaking it to fit your needs.
Generate with CLI
Don’t forget to download our Indexify
binary before running the command below. You can do by running the command curl https://getindexify.ai | sh
.
This will in turn download the relevant binary at the relative path ./indexify
.
./indexify init-config --config-path /tmp/indexify.yaml
Configuration Reference
Network Configuration
listen_if: 0.0.0.0
api_port: 8900
coordinator_port: 8950
coordinator_http_port: 8960
raft_port: 8970
coordinator_addr: 0.0.0.0:8950
- listen_if: The interface on which the servers listens on. Typically you would want to listen on all interfaces.
- api_port: The port in which the application facing API server is exposed. This is the HTTP port on which applications upload data, create extraction policies and retrieved extracted data from indexes.
- coordinator_port: Port on which the coordinator is exposed. This is available as a separate configuration becasue in the dev mode, we expose both the api server and the coordinator server in the same process.
- coordinator_http_port Port to access coordinator metrics
- raft_port: Port on which internal messages across coordinator nodes are transmitted. This is only needed if Indexify is either started as a coordinator or in dev mode.
“Don’t forget to configure a volume”
Indexify stores all of the the Extraction Graphs you’ve configured and data it has processed locally. This is configured in indexify.yaml
as seen below
state_store:
path: <state store path>
Don’t forget to configure a persistent volume at this location if you’ll like to make sure you don’t lose your data when your server restarts.
Blob Storage Configuration
Blob Storage Configuration refers to the raw bytes of unstructured data. For instance if you’re splitting your text data into chunks, these text chunks will be stored at the location you specify below.
We support two forms of blob storage at the moment - Disk and S3 Storage.
Disk
A common use-case for disk storage is if you’re using a shared volume to replicate/share data between different processes.
blob_storage:
backend: disk
disk:
path: /tmp/indexify-blob-storage
S3 Storage
For S3 Storage, you’ll need to also ensure you have the two following environment variables configured. Once you’ve configured these environment variables, our S3 integration will take care of the rest
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
blob_storage:
backend: s3
s3:
bucket: indexifydata
region: us-east-1
Vector Index Storage
- index_store: (Default: LanceDb): Name of the vector be, possible values:
LanceDb
,Qdrant
,PgVector
Qdrant Config
addr
: Address of the Qdrant http endpoint
index_config:
index_store: Qdrant
qdrant_config:
addr: "http://127.0.0.1:6334"
Pg Vector Config
addr
: Address of Postgres
index_config:
index_store: PgVector
pg_vector_config:
addr: postgres://postgres:postgres@localhost/indexify
m: 16
efconstruction: 64
LanceDb Config
path
: Path of the database
Caching
cache:
backend: none
cache:
backend: memory
memory:
max_size: 1000000
cache:
backend: redis
redis:
addr: redis://localhost:6379
API Server TLS
To set up mTLS for the indexify server, you first need to create a root certificate along with a client certificate and key pair along with a server certificate and key pair. The commands below will generate the certificates and keys and store them in a folder called .dev-tls
.
local-dev-tls-insecure: ## Generate local development TLS certificates (insecure)
@mkdir -p .dev-tls && \
openssl req -x509 -newkey rsa:4096 -keyout .dev-tls/ca.key -out .dev-tls/ca.crt -days 365 -nodes -subj "/C=US/ST=TestState/L=TestLocale/O=IndexifyOSS/CN=localhost" && \
openssl req -new -newkey rsa:4096 -keyout .dev-tls/server.key -out .dev-tls/server.csr -nodes -config ./client_cert_config && \
openssl x509 -req -in .dev-tls/server.csr -CA .dev-tls/ca.crt -CAkey .dev-tls/ca.key -CAcreateserial -out .dev-tls/server.crt -days 365 -extensions v3_ca -extfile ./client_cert_config && \
openssl req -new -nodes -out .dev-tls/client.csr -newkey rsa:2048 -keyout .dev-tls/client.key -config ./client_cert_config && \
openssl x509 -req -in .dev-tls/client.csr -CA .dev-tls/ca.crt -CAkey .dev-tls/ca.key -CAcreateserial -out .dev-tls/client.crt -days 365 -extfile ./client_cert_config -extensions v3_ca
Once you have the certificates and keys generated, add the config below to your server config and provide the paths to where you have stored the root certificate and the server certificate and key pair.
tls:
api: true
ca_file: .dev-tls/ca.crt # Path to the CA certificate
cert_file: .dev-tls/server.crt # Path to the server certificate
key_file: .dev-tls/server.key # Path to the server private key
HA configuration
To run multiple coordinators in a high availability configuration, you’ll want
to have a way for them to discover themselves. Set seed_node
to the address
that can be used for this. Note that the only requirement is that the returned
node is “ready” as defined by localhost:8960/status
. For dynamic environments,
put a load balancer in front of all coordinator nodes and enable/disable their
endpoints based off the status of the raft cluster.
Note: it is important that one of the nodes is started with
--initialize
the first time the cluster is started. This provides an
initial leader to form the cluster around. You’ll likely want to no longer use
this flag after the cluster is formed.
raft_port: 8970
node_id: 0
seed_node: my-dicovery-address:8970
Was this page helpful?