Retriva ships as a Dockerized FastAPI app. The saved demo indexes in outputs/embeddings/ are copied into the image, so the API can start without rebuilding indexes on every deploy.
Set this as a hosting-platform environment variable/secret:
GEMINI_API_KEY=your_google_ai_studio_key
Get a free Gemini key from https://aistudio.google.com/app/apikey. Do not commit real keys.
Optional environment variables:
MODEL_NAME=gemini-2.5-flash
RAG_ENABLE_RERANKER=0
RAG_DISABLE_EMBEDDINGS=0
CORS_ORIGINS=*
LOG_LEVEL=INFO
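As an illustration of how these variables might be consumed, the sketch below merges environment values over the defaults listed above. The helper name `load_settings`, the returned keys, the "1 means enabled" flag convention, and the comma-separated format for `CORS_ORIGINS` are all assumptions made for this example; Retriva's actual settings code may differ.

```python
import os
from typing import Mapping

# Defaults mirror the optional values listed above.
DEFAULTS = {
    "MODEL_NAME": "gemini-2.5-flash",
    "RAG_ENABLE_RERANKER": "0",
    "RAG_DISABLE_EMBEDDINGS": "0",
    "CORS_ORIGINS": "*",
    "LOG_LEVEL": "INFO",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: merge environment values over the documented defaults."""
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "model_name": raw["MODEL_NAME"],
        # Assumption: "1" enables a flag, any other value disables it.
        "enable_reranker": raw["RAG_ENABLE_RERANKER"] == "1",
        "disable_embeddings": raw["RAG_DISABLE_EMBEDDINGS"] == "1",
        # Assumption: multiple origins are given as a comma-separated list.
        "cors_origins": [o.strip() for o in raw["CORS_ORIGINS"].split(",") if o.strip()],
        "log_level": raw["LOG_LEVEL"].upper(),
        # The required key has no default; None signals that it is missing.
        "gemini_api_key": env.get("GEMINI_API_KEY"),
    }
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check without touching the real environment, for example `load_settings({"RAG_ENABLE_RERANKER": "1"})`.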
cp .env.example .env # then edit GEMINI_API_KEY
docker build -t retriva-rag .
docker run --env-file .env -p 8000:8000 retriva-rag
curl http://localhost:8000/health
Open http://localhost:8000/docs for Swagger UI.
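The container can take a few seconds to load the saved indexes, so a single curl right after `docker run` may fail. A minimal sketch of a retrying check, using only the standard library, is shown here; the function name `wait_for_health` and its retry parameters are illustrative and not part of the repo, and the only thing it assumes about the API is that `/health` answers with HTTP 200 when ready.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Return True once GET {base_url}/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or a non-2xx reply: the service
            # may still be starting, so fall through and retry.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For the local container above, `wait_for_health("http://localhost:8000")` would be the equivalent of repeating the curl command until it succeeds or the attempts run out.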
uvicorn and respects Render’s PORT.
GEMINI_API_KEY: your key
MODEL_NAME: gemini-2.5-flash
RAG_ENABLE_RERANKER: 0 for lower memory/latency
CORS_ORIGINS: your UI domain or * for API-only demos
curl https://<render-service>.onrender.com/health
Notes:
outputs/embeddings/ artifacts, or add a deploy build step that runs python scripts/build_index.py.
Use the same Docker image settings:
8000 internally, or platform-provided $PORT.
/health.
GEMINI_API_KEY.
Verify /health and /docs after deploy.
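The port rule above (8000 by default, or whatever the platform injects as `PORT`) can be sketched as a small resolver. This is an assumption about how such logic could look, not a copy of the repo's startup code; `resolve_port` and `DEFAULT_PORT` are hypothetical names.

```python
import os
from typing import Mapping

DEFAULT_PORT = 8000  # the internal port named in the settings above


def resolve_port(env: Mapping[str, str] = os.environ) -> int:
    """Use the platform-provided PORT when set, else fall back to 8000."""
    value = env.get("PORT", "").strip()
    if not value:
        return DEFAULT_PORT
    port = int(value)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

The result could then be handed to the server, for example as the `port` argument of `uvicorn.run`. Failing loudly on a malformed `PORT` is a design choice here: a misconfigured platform variable surfaces at startup instead of as a service that silently listens on the wrong port.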
This repo includes:
.github/workflows/ecr-push.yml: builds and pushes the Docker image to ECR on main.
.github/workflows/deploy-apprunner.yml: manually deploys/updates App Runner via CloudFormation.
aws/app-runner-service.yaml: App Runner service template.
Prerequisites:
rag in eu-west-1, or edit workflow env values.
AWS_ROLE_TO_ASSUME with an OIDC role ARN.
GEMINI_API_KEY.
iam:PassRole.
Deploy flow:
main to run Build and Push to ECR.
curl https://<apprunner-url>/health
curl https://<apprunner-url>/docs
Without starting a public server:
python scripts/health_check.py
This imports the FastAPI app, triggers startup, loads saved indexes, then checks /health and /stats.
/health.
/docs URL to README.md.
/docs or the Streamlit UI and add it under docs/assets/.
rag, llm, fastapi, faiss, bm25, gemini, docker, llmops.