Skip to main content

🐳 Docker, Deploying LiteLLM Proxy

You can find the Dockerfile to build litellm proxy here

Quick Start

See the latest available ghcr docker image here: https://github.com/berriai/litellm/pkgs/container/litellm

docker pull ghcr.io/berriai/litellm:main-latest
docker run ghcr.io/berriai/litellm:main-latest

That's it ! That's the quick start to deploy litellm

Options to deploy LiteLLM

DocsWhen to Use
Quick Startcall 100+ LLMs + Load Balancing
Deploy with Database+ use Virtual Keys + Track Spend
LiteLLM container + Redis+ load balance across multiple litellm containers
LiteLLM Database container + PostgresDB + Redis+ use Virtual Keys + Track Spend + load balance across multiple litellm containers

Deploy with Database

We maintain a seperate Dockerfile for reducing build time when running LiteLLM proxy with a connected Postgres Database

docker pull docker pull ghcr.io/berriai/litellm-database:main-latest
docker run --name litellm-proxy \
-e DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname> \
-p 4000:4000 \
ghcr.io/berriai/litellm-database:main-latest

Your OpenAI proxy server is now running on http://0.0.0.0:4000.

LiteLLM container + Redis

Use Redis when you need litellm to load balance across multiple litellm containers

The only change required is setting Redis on your config.yaml LiteLLM Proxy supports sharing rpm/tpm shared across multiple litellm instances, pass redis_host, redis_password and redis_port to enable this. (LiteLLM will use Redis to track rpm/tpm usage )

model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/<your-deployment-name>
api_base: <your-azure-endpoint>
api_key: <your-azure-api-key>
rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-small-ca
api_base: https://my-endpoint-canada-berri992.openai.azure.com/
api_key: <your-azure-api-key>
rpm: 6
router_settings:
redis_host: <your redis host>
redis_password: <your redis password>
redis_port: 1992

Start docker container with config

docker run ghcr.io/berriai/litellm:main-latest --config your_config.yaml

LiteLLM Database container + PostgresDB + Redis

The only change required is setting Redis on your config.yaml LiteLLM Proxy supports sharing rpm/tpm shared across multiple litellm instances, pass redis_host, redis_password and redis_port to enable this. (LiteLLM will use Redis to track rpm/tpm usage )

model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/<your-deployment-name>
api_base: <your-azure-endpoint>
api_key: <your-azure-api-key>
rpm: 6 # Rate limit for this deployment: in requests per minute (rpm)
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/gpt-turbo-small-ca
api_base: https://my-endpoint-canada-berri992.openai.azure.com/
api_key: <your-azure-api-key>
rpm: 6
router_settings:
redis_host: <your redis host>
redis_password: <your redis password>
redis_port: 1992

Start litellm-databasedocker container with config

docker run --name litellm-proxy \
-e DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname> \
-p 4000:4000 \
ghcr.io/berriai/litellm-database:main-latest --config your_config.yaml

Best Practices for Deploying to Production

1. Switch of debug logs in production

don't use --detailed-debug, --debug or litellm.set_verbose=True. We found using debug logs can add 5-10% latency per LLM API call

Advanced Deployment Settings

Customization of the server root path

info

In a Kubernetes deployment, it's possible to utilize a shared DNS to host multiple applications by modifying the virtual service

Customize the root path to eliminate the need for employing multiple DNS configurations during deployment.

👉 Set SERVER_ROOT_PATH in your .env and this will be set as your server root path

Setting SSL Certification

Use this, If you need to set ssl certificates for your on prem litellm proxy

Pass ssl_keyfile_path (Path to the SSL keyfile) and ssl_certfile_path (Path to the SSL certfile) when starting litellm proxy

docker run ghcr.io/berriai/litellm:main-latest \
--ssl_keyfile_path ssl_test/keyfile.key \
--ssl_certfile_path ssl_test/certfile.crt

Provide an ssl certificate when starting litellm proxy server

Platform-specific Guide

AWS Cloud Formation Stack

LiteLLM AWS Cloudformation Stack - Get the best LiteLLM AutoScaling Policy and Provision the DB for LiteLLM Proxy

This will provision:

  • LiteLLMServer - EC2 Instance
  • LiteLLMServerAutoScalingGroup
  • LiteLLMServerScalingPolicy (autoscaling policy)
  • LiteLLMDB - RDS::DBInstance

Using AWS Cloud Formation Stack

LiteLLM Cloudformation stack is located here - litellm.yaml

1. Create the CloudFormation Stack:

In the AWS Management Console, navigate to the CloudFormation service, and click on "Create Stack."

On the "Create Stack" page, select "Upload a template file" and choose the litellm.yaml file

Now monitor the stack was created successfully.

2. Get the Database URL:

Once the stack is created, get the DatabaseURL of the Database resource, copy this value

3. Connect to the EC2 Instance and deploy litellm on the EC2 container

From the EC2 console, connect to the instance created by the stack (e.g., using SSH).

Run the following command, replacing <database_url> with the value you copied in step 2

docker run --name litellm-proxy \
-e DATABASE_URL=<database_url> \
-p 4000:4000 \
ghcr.io/berriai/litellm-database:main-latest

4. Access the Application:

Once the container is running, you can access the application by going to http://<ec2-public-ip>:4000 in your browser.

Extras

Run with docker compose

Step 1

Here's an example docker-compose.yml file

version: "3.9"
services:
litellm:
build:
context: .
args:
target: runtime
image: ghcr.io/berriai/litellm:main-latest
ports:
- "4000:4000" # Map the container port to the host, change the host port if necessary
volumes:
- ./litellm-config.yaml:/app/config.yaml # Mount the local configuration file
# You can change the port or number of workers as per your requirements or pass any new supported CLI augument. Make sure the port passed here matches with the container port defined above in `ports` value
command: [ "--config", "/app/config.yaml", "--port", "4000", "--num_workers", "8" ]

# ...rest of your docker-compose config if any

Step 2

Create a litellm-config.yaml file with your LiteLLM config relative to your docker-compose.yml file.

Check the config doc here

Step 3

Run the command docker-compose up or docker compose up as per your docker installation.

Use -d flag to run the container in detached mode (background) e.g. docker compose up -d

Your LiteLLM container should be running now on the defined port e.g. 4000.