Tutorial

Python FastAPI production deployment — complete guide

The Reflex Team · 9 min · 17 May 2026

FastAPI has earned its place as the default choice for new Python APIs. It is fast, type-safe, and the developer experience is genuinely good. But the gap between uvicorn main:app --reload on your laptop and a production deployment that survives real traffic is wider than most tutorials acknowledge.

This guide covers the full path: ASGI server configuration, process management with systemd, nginx reverse proxy, SSL termination, and environment management. No Docker required — though we will mention where containers help and where they add unnecessary complexity.

Gunicorn + Uvicorn: the production ASGI stack

Uvicorn is an ASGI server. Gunicorn is a process manager that can spawn and supervise multiple Uvicorn workers. In production, you almost always want both:

gunicorn main:app \
  --worker-class uvicorn.workers.UvicornWorker \
  --workers 4 \
  --bind 0.0.0.0:8000 \
  --timeout 120 \
  --graceful-timeout 30 \
  --max-requests 1000 \
  --max-requests-jitter 50 \
  --access-logfile - \
  --error-logfile -
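The same options can also live in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the flags above one for one; it assumes the file is saved as gunicorn.conf.py in the directory Gunicorn is started from, where recent Gunicorn versions look for it by default.

```python
# gunicorn.conf.py: the command-line flags above expressed as settings.
bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"
workers = 4

timeout = 120            # seconds of worker silence before it is killed
graceful_timeout = 30    # seconds a worker gets to finish during a restart

max_requests = 1000      # recycle a worker after this many requests
max_requests_jitter = 50

accesslog = "-"          # "-" sends the access log to stdout
errorlog = "-"           # "-" sends the error log to stderr
```

With this file in place the command shortens to gunicorn main:app, and the settings can be reviewed and versioned like any other code.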

Worker count: the classic formula is (2 × CPU_CORES) + 1. For I/O-bound APIs (database queries, external HTTP calls), this is a reasonable starting point. For CPU-bound workloads (ML inference, image processing), match worker count to core count and offload heavy computation to background tasks.
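The sizing rule above can be written down as a small helper. This is a sketch only: the function name recommended_workers and the cpu_bound flag are illustrative and not part of Gunicorn, and the result is a starting point to validate under real load.

```python
import os


def recommended_workers(cores: int, cpu_bound: bool = False) -> int:
    """Starting-point worker count for a given number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # CPU-bound: one worker per core. I/O-bound: the classic (2 x cores) + 1.
    return cores if cpu_bound else (2 * cores) + 1


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    print(f"I/O-bound: {recommended_workers(cores)} workers")
    print(f"CPU-bound: {recommended_workers(cores, cpu_bound=True)} workers")
```

On a 2-core machine this gives 5 workers for an I/O-bound API and 2 for a CPU-bound one.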

--max-requests is your memory leak safety net. After serving 1000 requests (plus random jitter to avoid thundering herd), the worker gracefully restarts. This is not a performance tuning knob — it is insurance against the slow leak in that one dependency you cannot control.
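To see why the jitter matters, the sketch below models the behaviour described above: each worker gets its own restart threshold somewhere between max-requests and max-requests plus the jitter. It is an illustration of the idea, not Gunicorn's source, and the function name restart_thresholds is made up for this example.

```python
import random


def restart_thresholds(max_requests: int, jitter: int, workers: int,
                       seed=None) -> list:
    """Per-worker request counts at which each worker would restart."""
    rng = random.Random(seed)
    # Each worker draws its own limit in [max_requests, max_requests + jitter].
    return [max_requests + rng.randint(0, jitter) for _ in range(workers)]


if __name__ == "__main__":
    for worker_id, limit in enumerate(restart_thresholds(1000, 50, workers=4)):
        print(f"worker {worker_id} restarts after {limit} requests")
```

With a jitter of 0 every worker shares the same threshold, so evenly loaded workers would all recycle at about the same moment.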

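--timeout is a liveness check: the Gunicorn master kills and restarts any worker that has been silent for 120 seconds. With async Uvicorn workers this is not a per-request limit; a worker only goes silent when its event loop is blocked, for example by synchronous code inside a handler. Set the value higher than your slowest legitimate blocking operation. If your API genuinely needs more than 120 seconds for a request, that request should be a background job.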
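Because Gunicorn's limit applies to the worker as a whole, a time budget for an individual slow call is easiest to enforce inside the application. A minimal sketch using asyncio.wait_for from the standard library; run_with_budget and slow_query are hypothetical names, and slow_query stands in for a database query or external HTTP call.

```python
import asyncio


async def run_with_budget(coro, seconds: float):
    """Await `coro`, cancelling it if it runs longer than `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # In a real handler this is where you would return an error
        # response or hand the work to a background job instead.
        return None


async def slow_query(delay: float) -> str:
    # Hypothetical stand-in for a slow I/O operation.
    await asyncio.sleep(delay)
    return "done"


if __name__ == "__main__":
    print(asyncio.run(run_with_budget(slow_query(0.01), seconds=1.0)))
    print(asyncio.run(run_with_budget(slow_query(1.0), seconds=0.05)))
```

The first call finishes inside its budget and returns the result; the second is cancelled and returns None.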

systemd service configuration

Do not run Gunicorn in a tmux session. systemd gives you automatic restarts, log management via journald, dependency ordering, and resource limits:

[Unit]
Description=FastAPI Application
After=network.target postgresql.service redis.service
Wants=postgresql.service

[Service]
Type=notify
User=deploy
Group=www-data
WorkingDirectory=/var/www/api
Environment="PATH=/var/www/api/.venv/bin:/usr/bin"
EnvironmentFile=/var/www/api/.env
ExecStart=/var/www/api/.venv/bin/gunicorn main:app \
  --worker-class uvicorn.workers.UvicornWorker \
  --workers 4 \
  --bind unix:/run/api/gunicorn.sock \
  --timeout 120 \
  --max-requests 1000 \
  --max-requests-jitter 50
ExecReload=/bin/kill -s HUP $MAINPID
Restart=always
RestartSec=5
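# Creates /run/api for the socket on every start; /run is a tmpfs and is
# read-only to the service under ProtectSystem=strict.
RuntimeDirectory=api
PrivateTmp=true
ProtectSystem=strict
ReadWritePaths=/var/www/api/storage /var/log/api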

[Install]
WantedBy=multi-user.target

Key decisions: Unix socket instead of TCP for local nginx communication — eliminates TCP overhead and simplifies firewall rules. PrivateTmp and ProtectSystem=strict are systemd security hardening that costs nothing and limits blast radius. EnvironmentFile loads your .env without embedding secrets in the unit file.

After creating the file at /etc/systemd/system/api.service:

sudo systemctl daemon-reload
sudo systemctl enable api
sudo systemctl start api
sudo journalctl -u api -f
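Once the service is running, it can be useful to confirm that Gunicorn is actually listening on the Unix socket before pointing nginx at it. A minimal sketch using only the standard library; the helper name socket_is_accepting is illustrative, and the path is the one from the unit file above.

```python
import socket


def socket_is_accepting(path: str, timeout: float = 1.0) -> bool:
    """Return True if something is listening on the Unix socket at `path`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        try:
            client.connect(path)
        except OSError:
            # Missing file, permission denied, or nothing listening.
            return False
        return True


if __name__ == "__main__":
    path = "/run/api/gunicorn.sock"  # socket path from the unit file
    print(f"{path}: {'up' if socket_is_accepting(path) else 'down'}")
```

Run it as a user that is allowed to reach the socket; a permission error is reported as "down", the same as a missing socket.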

nginx reverse proxy

nginx handles SSL termination, static file serving, request buffering, and connection management — all things Gunicorn should not waste worker time on:

upstream api_backend {
    server unix:/run/api/gunicorn.sock fail_timeout=0;
}

server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;

    client_max_body_size 10M;
    proxy_read_timeout 120s;
    proxy_connect_timeout 5s;

    add_header X-Content-Type-Options nosniff always;
    add_header X-Frame-Options DENY always;
    add_header Referrer-Policy strict-origin-when-cross-origin always;

    location / {
        proxy_pass http://api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect off;
        proxy_buffering on;
    }

    location /health {
        proxy_pass http://api_backend;
        access_log off;
    }
}

server {
    listen 80;
    server_name api.example.com;
    return 301 https://$server_name$request_uri;
}

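proxy_buffering on is the default and the right choice: nginx buffers the upstream response, so Gunicorn workers are freed immediately instead of waiting for slow clients to receive every byte. fail_timeout=0 tells nginx never to mark the upstream as unavailable after failed attempts, so it keeps retrying the socket while Gunicorn restarts; this matches the nginx example in Gunicorn's own deployment documentation. Open-source nginx has no active health checks, so health checking is left to your own monitoring, for example by polling /health.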
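On the application side, the X-Forwarded-For header set in the config above is what carries the real client address, since every connection now arrives from nginx. The sketch below shows one way to read it safely; client_ip and trusted_proxies are illustrative names, and the code assumes every proxy in the chain appends the address it saw, as $proxy_add_x_forwarded_for does.

```python
def client_ip(x_forwarded_for: str, trusted_proxies: int = 1) -> str:
    """Pick the client address out of an X-Forwarded-For header value.

    Each trusted proxy appends the address it saw, so with one nginx in
    front the last entry is the one nginx wrote. Entries further left were
    supplied by the client and can be forged.
    """
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    if trusted_proxies < 1 or len(hops) < trusted_proxies:
        raise ValueError("header has fewer entries than trusted proxies")
    return hops[-trusted_proxies]


if __name__ == "__main__":
    # The client sent a forged first entry; nginx appended the real peer.
    print(client_ip("198.51.100.9, 203.0.113.7"))
```

Counting from the right is the important design choice: taking the first entry would trust whatever the client chose to send.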

SSL with Let's Encrypt

Certbot handles certificate issuance and automatic renewal:

sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d api.example.com

Certbot modifies your nginx config to add SSL directives and installs a systemd timer for renewal. Verify the timer is active:

sudo systemctl list-timers | grep certbot

For automated environments, use the DNS challenge with your provider's plugin — this avoids port 80 availability requirements and works behind load balancers.
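Checking the renewal timer tells you renewal is scheduled, not that it succeeded. As a complement, the sketch below reads the expiry date from the certificate a server actually presents, using only the standard library. The function names are illustrative, and any alerting threshold you build on top is your own choice.

```python
import socket
import ssl
import time


def days_until_expiry(not_after: str, now=None) -> float:
    """Days left before a certificate's notAfter timestamp."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read notAfter from the certificate a live server presents."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Usage against a live host (needs network access):
#   days_until_expiry(fetch_not_after("api.example.com"))
```

A negative result means the certificate has already expired.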

Environment management

Python virtual environments are non-negotiable in production. System Python's site-packages is a shared mutable resource that will eventually conflict:

python3 -m venv /var/www/api/.venv
source /var/www/api/.venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Pin every dependency version in requirements.txt. Use pip-compile from pip-tools to generate locked dependencies from a requirements.in file — this is the Python equivalent of composer.lock:

pip install pip-tools
pip-compile requirements.in --output-file requirements.txt
pip-sync requirements.txt
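A cheap guard for CI is to fail the build when requirements.txt contains anything that is not pinned. The helper below is a rough text check rather than a full requirements parser: unpinned_requirements is a made-up name, and URL or editable requirements are outside what it understands.

```python
def unpinned_requirements(text: str) -> list:
    """Return requirement specs that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        line = line.rstrip("\\").strip()      # drop line continuations
        # Skip blanks and option lines such as "-r base.txt" or "--hash=...".
        if not line or line.startswith("-"):
            continue
        # Ignore environment markers after ";" when looking for the pin.
        spec = line.split(";", 1)[0].strip()
        if "==" not in spec:
            unpinned.append(spec)
    return unpinned


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        loose = unpinned_requirements(handle.read())
    if loose:
        raise SystemExit("unpinned requirements: " + ", ".join(loose))
```

Output generated by pip-compile should always pass; the check is there to catch hand edits.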

For environment variables, use a .env file loaded by systemd's EnvironmentFile directive. Never hardcode secrets. Never commit .env to version control. Validate critical variables at application startup:

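from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # pydantic-settings v2 style; the inner `class Config` is deprecated.
    model_config = SettingsConfigDict(env_file=".env")

    # Required: startup fails if any of these is missing.
    database_url: str
    redis_url: str
    secret_key: str
    # Optional, with a safe default.
    debug: bool = False

# Instantiate once at import time so bad configuration fails at startup.
settings = Settings()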

Pydantic validates types and raises clear errors on startup if required variables are missing — fail fast rather than discovering a missing DATABASE_URL on the first request.

Deployment workflow

A minimal zero-downtime deploy for FastAPI:

cd /var/www/api
git pull origin main
source .venv/bin/activate
pip-sync requirements.txt
alembic upgrade head
sudo systemctl reload api

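The reload sends HUP to Gunicorn's master process, which starts new workers with the updated code and gracefully shuts down the old ones. In-flight requests get up to --graceful-timeout (30 seconds by default) to complete, so connections are not dropped as long as requests finish within that window. This relies on the app not being preloaded: with --preload, HUP does not pick up new code.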

For atomic deployments with rollback, use a symlink strategy (same as Capistrano or Reflex Pipeline): build in a fresh release directory, flip the symlink, reload Gunicorn.
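The symlink flip is the only step that has to be atomic, and it can be sketched in a few lines. The layout here is an assumption for illustration: release directories somewhere under the application root and a current symlink that the service runs from. The function name activate_release is hypothetical.

```python
import os


def activate_release(release_dir: str, current_link: str) -> None:
    """Point `current_link` at `release_dir` in one atomic step."""
    tmp_link = f"{current_link}.tmp"
    # Clear a leftover temp link from an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    # os.replace is a rename, which POSIX guarantees to be atomic: readers
    # see either the old target or the new one, never a missing link.
    os.replace(tmp_link, current_link)


# Example with hypothetical paths:
#   activate_release("/var/www/api/releases/20260517-1", "/var/www/api/current")
# Rolling back is the same call with the previous release directory.
```

Creating the link under a temporary name and renaming it over the old one avoids the brief gap that deleting and recreating the symlink would leave. Reload Gunicorn after the flip so new workers start from the new release.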

How Reflex fits

Reflex's reflexd agent monitors the Gunicorn process tree, tracks worker memory and restart patterns, watches nginx upstream health, and alerts on the signals that matter: worker crash loops, memory growth trends, and upstream timeout spikes. The Brain can trigger a graceful Gunicorn reload when workers enter an unhealthy state — the same HUP signal you would send manually, but at 3am when you are asleep.