Deployment
Indexify can be deployed in the following ways:
- Bare Metal and VMs
- Docker Compose
- Kubernetes (or any other container orchestrator)
Bare Metal
Indexify doesn’t depend on Kubernetes or Docker; you can run the server and executors on any VM or bare-metal machine.
Start Server
Start the server on one machine. Read the configuration reference to understand how to customize the server to use blob stores for storing function outputs.
indexify-server
We have a replicated mode for the server, based on the Raft consensus protocol. It’s not public yet because we are still figuring out how to make it easy for developers to configure, operate, and use. If you are interested in using it, please reach out to us.
Start Executor
Start as many executors as you want on different machines.
indexify-cli executor --server-addr <server-ip>:<server-port>
Docker Compose
You can spin up the server and executor using docker compose, and deploy and run in a production-like environment. Copy the docker-compose.yaml file from here.
docker compose up
This starts the server and two replicas of the executor in separate containers.
Change the replicas field for the executor in the docker compose file to add more executors (i.e., more parallelism) to the workflow.
This uses a default executor container based on Debian and a vanilla Python installation. We generally provide docker compose files for local testing of every example project in the repository.
Kubernetes
We provide some basic Helm charts to deploy Indexify on Kubernetes. If you’d like to try with your own cluster, check out the instructions.
Components
- Server - This is where all your requests go. There’s an ingress which exposes / by default.
- Executor - Extractors can take multiple forms; this example is generic and works for all the extractors which are distributed by the project.
Dependencies
Blob Store
We recommend using an S3-like service for the blob store. Our ephemeral example uses MinIO for this. See the environment variable patch for how this gets configured.
The API server and coordinator will need an AWS_ENDPOINT env var pointing to where your S3 solution is hosted. Extractors need a slightly different env var: AWS_ENDPOINT_URL.
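As a minimal sketch of the two variable names described above, the snippet below exports both from a single value. The endpoint http://minio:9000 and the helper variable S3_ENDPOINT are assumptions made for illustration; they are not taken from the project's manifests, so substitute the address of your own S3-compatible service.

```shell
# Hypothetical address of an S3-compatible service (for example a MinIO
# container); replace it with your own endpoint.
S3_ENDPOINT="http://minio:9000"

# Read by the API server and coordinator.
export AWS_ENDPOINT="$S3_ENDPOINT"

# Read by the extractors.
export AWS_ENDPOINT_URL="$S3_ENDPOINT"
```

Deriving both variables from one value keeps the two components pointed at the same store if the endpoint changes later.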
GCP
- You’ll want to create an HMAC key to use as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
- Set AWS_ENDPOINT_URL to https://storage.googleapis.com/
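The GCP settings above might look like the following in a shell environment. This is only a sketch: the key values are placeholders rather than real credentials, and how the variables reach your containers (a Kubernetes secret, a compose file, and so on) depends on your setup.

```shell
# Placeholder values: substitute the access ID and secret of the HMAC key
# you created in Google Cloud. Avoid committing real secrets to version control.
export AWS_ACCESS_KEY_ID="your-hmac-access-id"
export AWS_SECRET_ACCESS_KEY="your-hmac-secret"

# Endpoint for Google Cloud Storage, as given in the list above.
export AWS_ENDPOINT_URL="https://storage.googleapis.com/"
```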
Other Clouds
Not all clouds expose an S3 interface. For those that don’t, check out the s3proxy project. However, we’d love help implementing your native blob storage of choice! Please open an issue so that we can discuss how that would look for the project.