retriva-hybrid-rag-engine

Deployment Guide

Retriva ships as a Dockerized FastAPI app. The saved demo indexes in outputs/embeddings/ are copied into the image, so the API can start without rebuilding indexes on every deploy.

Required secret

Set this as a hosting-platform environment variable/secret:

GEMINI_API_KEY=your_google_ai_studio_key

Get a free Gemini key from https://aistudio.google.com/app/apikey. Do not commit real keys.

Optional environment variables:

MODEL_NAME=gemini-2.5-flash
RAG_ENABLE_RERANKER=0
RAG_DISABLE_EMBEDDINGS=0
CORS_ORIGINS=*
LOG_LEVEL=INFO
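
For illustration, the variables above could be read at startup along these lines. This is a minimal sketch, not the project's actual configuration code: the Settings class and load_settings function are hypothetical names, the values listed above are assumed to be the defaults, the 0/1 flags are assumed to be booleans, and CORS_ORIGINS is assumed to accept a comma-separated list.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


def _flag(value: str) -> bool:
    """Treat "1", "true", "yes", "on" (any case) as True; anything else as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Hypothetical container for the variables named in this guide.
    gemini_api_key: str
    model_name: str
    enable_reranker: bool
    disable_embeddings: bool
    cors_origins: List[str]
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast if the key is missing."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    # Assumption: multiple origins are separated by commas; "*" allows any origin.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "*").split(",") if o.strip()]
    return Settings(
        gemini_api_key=key,
        model_name=env.get("MODEL_NAME", "gemini-2.5-flash"),
        enable_reranker=_flag(env.get("RAG_ENABLE_RERANKER", "0")),
        disable_embeddings=_flag(env.get("RAG_DISABLE_EMBEDDINGS", "0")),
        cors_origins=origins or ["*"],
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Failing fast on a missing key surfaces a misconfigured deploy at startup rather than on the first request.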

Local Docker smoke test

cp .env.example .env   # then edit GEMINI_API_KEY
docker build -t retriva-rag .
docker run --env-file .env -p 8000:8000 retriva-rag
curl http://localhost:8000/health

Open http://localhost:8000/docs for Swagger UI.
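
The container may need a few seconds to load the saved indexes, so a single curl can fail even when the image is fine. A small polling helper is one way to script the smoke test. This is a sketch using only the standard library; wait_for_health and fetch_status are hypothetical helper names, and the retry count and delay are arbitrary choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses still carry a status code


def wait_for_health(
    url: str,
    attempts: int = 10,
    delay: float = 2.0,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it returns 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            pass  # connection refused or timed out: the server is not up yet
        if attempt < attempts:
            sleep(delay)
    return False
```

Calling wait_for_health("http://localhost:8000/health") after docker run returns True once the API answers with 200, and False if it never does within the allotted attempts.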

Option A — Render Web Service (simple/free-to-start)

1. Push the repo to GitHub.
2. Render → New → Web Service → connect this repository.
3. Runtime: Docker.
4. Start command: leave blank; the Dockerfile starts uvicorn and respects Render’s PORT.
5. Add environment variables:
   - GEMINI_API_KEY: your key
   - MODEL_NAME: gemini-2.5-flash
   - RAG_ENABLE_RERANKER: 0 for lower memory/latency
   - CORS_ORIGINS: your UI domain or * for API-only demos
6. Deploy.
7. Verify:

curl https://<render-service>.onrender.com/health

Notes:

- Render free instances may cold-start and may have memory limits. Keep reranking off for the first deploy.
- If you replace the document corpus, rebuild indexes locally and commit the updated outputs/embeddings/ artifacts, or add a deploy build step that runs python scripts/build_index.py.

Option B — Railway/Fly.io/other Docker hosts

Use the same Docker image settings:

- Exposed port: 8000 inside the container, or the platform-provided $PORT.
- Health path: /health.
- Required secret: GEMINI_API_KEY.
- Start command: none needed; it is already in the Dockerfile.

Verify /health and /docs after deploy.
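
The port rule above (use the platform's $PORT when present, otherwise 8000) can be expressed in a few lines. This is only a sketch of the rule; how the Dockerfile actually implements it is not shown in this guide, and resolve_port is a hypothetical helper name.

```python
import os
from typing import Mapping

DEFAULT_PORT = 8000  # the internal port named in this guide


def resolve_port(env: Mapping[str, str] = os.environ) -> int:
    """Return the platform-provided PORT if set, otherwise the default 8000."""
    raw = env.get("PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError if PORT is not numeric
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

A launcher script could pass the result to uvicorn, for example uvicorn.run(app, host="0.0.0.0", port=resolve_port()), where app stands for the project's FastAPI application object.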

Option C — AWS App Runner via GitHub Actions

This repo includes:

- .github/workflows/ecr-push.yml: builds and pushes the Docker image to ECR on main.
- .github/workflows/deploy-apprunner.yml: manually deploys/updates App Runner via CloudFormation.
- aws/app-runner-service.yaml: App Runner service template.

Prerequisites:

- An AWS account and an ECR repository named rag in eu-west-1 (or edit the workflow env values to match your own repository and region).
- A GitHub repository variable AWS_ROLE_TO_ASSUME holding an OIDC role ARN.
- A GitHub secret GEMINI_API_KEY.
- OIDC role permissions for ECR push/read, CloudFormation deploy, App Runner create/update, and scoped iam:PassRole.

Deploy flow:

1. Push to main to run Build and Push to ECR.
2. GitHub Actions → Deploy App Runner → Run workflow.
3. Use the latest image tag or provide a specific tag.
4. Verify the output service URL:

curl https://<apprunner-url>/health
curl https://<apprunner-url>/docs
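
When choosing between latest and a specific tag, it can help to see the full image reference the tag ends up in. The sketch below builds an ECR image URI from the repository name and region given in the prerequisites. The account ID is a placeholder you must supply, ecr_image_uri is a hypothetical helper, and the workflows in this repo may assemble the URI differently.

```python
import re


def ecr_image_uri(
    account_id: str,
    region: str = "eu-west-1",
    repository: str = "rag",
    tag: str = "latest",
) -> str:
    """Build an ECR image reference: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
```

Pinning a specific tag makes a deploy reproducible and easy to roll back, whereas latest always follows the most recent push to main.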

Health verification

To check the app without starting a public server, run:

python scripts/health_check.py

This imports the FastAPI app, triggers startup, loads saved indexes, then checks /health and /stats.
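
An in-process check of this kind can be sketched with FastAPI's TestClient, which runs the app's startup handlers when used as a context manager. The code below is an assumed shape, not the contents of scripts/health_check.py: check_endpoints and run_in_process_check are hypothetical names, and a 200 response is assumed to mean healthy.

```python
from typing import Callable, Dict, Iterable


def check_endpoints(
    get_status: Callable[[str], int],
    paths: Iterable[str] = ("/health", "/stats"),
) -> Dict[str, int]:
    """Request each path and raise if any does not return HTTP 200."""
    results = {path: get_status(path) for path in paths}
    failed = {path: code for path, code in results.items() if code != 200}
    if failed:
        raise RuntimeError(f"unhealthy endpoints: {failed}")
    return results


def run_in_process_check(app) -> Dict[str, int]:
    """Start `app` in-process (no network socket) and check its endpoints."""
    # Imported lazily so check_endpoints stays usable without FastAPI installed.
    from fastapi.testclient import TestClient

    # Entering the context manager triggers the app's startup handlers,
    # which is where the saved indexes would be loaded.
    with TestClient(app) as client:
        return check_endpoints(lambda path: client.get(path).status_code)
```

A script built this way would import the project's FastAPI app object (its module path is not stated in this guide) and call run_in_process_check(app), exiting non-zero if it raises.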

Portfolio launch checklist

- Confirm CI is green on GitHub.
- Deploy the Docker API and verify /health.
- Add the live /docs URL to README.md.
- Record a short GIF/screenshot of /docs or the Streamlit UI and add it under docs/assets/.
- Pin the repo and add GitHub topics: rag, llm, fastapi, faiss, bm25, gemini, docker, llmops.
