Production-ready Retrieval-Augmented Generation for private document Q&A.
Retriva is a portfolio-grade hybrid RAG system that answers questions over private documents using BM25 + FAISS retrieval, optional cross-encoder reranking, source-grounded prompts, a FastAPI API, and a lightweight Streamlit demo UI.
It is designed to show production AI engineering judgment: reproducible setup, saved indexes, health checks, Docker deployment, CI, evaluation reports, and honest notes about API keys/secrets.
Live demo: add the deployed URL here after following DEPLOY.md.
Screenshot/GIF placeholder: add docs/assets/retriva-demo.gif or a screenshot once the API/UI is deployed. Do not commit API keys or private documents.
Local API docs are available at http://localhost:8000/docs after startup.
- Optional cross-encoder reranking with RAG_ENABLE_RERANKER=1.
- API endpoints: /query, /health, /stats, and OpenAPI /docs.
- Evaluation reports in outputs/evaluations/.

User Query
│
▼
[FastAPI / Streamlit]
│
▼
[Semantic Cache] ── hit ───────────────────────────────► Cached Answer
│ miss
▼
[Query Normalization]
│
├──► [Dense Retrieval] SentenceTransformers → FAISS
│
├──► [Sparse Retrieval] BM25 keyword index
│
▼
[Hybrid Result Fusion]
│
▼
[Optional Cross-Encoder Reranking]
│
▼
[Prompt Builder + Source Formatting]
│
▼
[Gemini LLM]
│
▼
Answer + citations + source metadata
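
The "Hybrid Result Fusion" step in the diagram merges the dense (FAISS) and sparse (BM25) rankings into one list. This README does not say which fusion method Retriva uses, so the sketch below shows reciprocal rank fusion (RRF) as one common option. The function name, the `k=60` constant, and the chunk ids are illustrative assumptions, not code from this repository.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one ranking.

    Each chunk earns 1 / (k + rank) from every list it appears in,
    so chunks ranked well by both retrievers rise to the top.
    NOTE: illustrative only; Retriva's actual fusion may differ.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk ids returned by each retriever, best first.
dense_hits = ["c3", "c1", "c7"]
sparse_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because RRF works on ranks rather than raw scores, it avoids having to normalize BM25 scores against cosine similarities, which is why it is a frequent default for hybrid retrieval.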
Saved indexes are loaded from outputs/embeddings/ at API startup.

git clone https://github.com/gauthambinoy/retriva-hybrid-rag-engine.git
cd retriva-hybrid-rag-engine
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
Edit .env and set:
GEMINI_API_KEY=your_google_ai_studio_key
Get a free Gemini key from Google AI Studio: https://aistudio.google.com/app/apikey.
This repo includes saved demo indexes in outputs/embeddings/. If you change documents under data/raw_documents/, rebuild:
python scripts/build_index.py
For lightweight local/CI runs without dense embedding downloads:
RAG_DISABLE_EMBEDDINGS=1 python scripts/build_index.py
uvicorn app.api:app --reload --host 0.0.0.0 --port 8000
Then open http://localhost:8000/docs in a browser, or send a test query:
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"question":"What is the transformer architecture?","top_k":5,"min_score":0.3}'
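
The same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names `build_query_request` and `ask` are hypothetical, and the URL assumes the local server started with the uvicorn command above.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/query"  # local address used in this README


def build_query_request(question, top_k=5, min_score=0.3, url=API_URL):
    """Build a POST request carrying the same JSON body as the curl example."""
    payload = {"question": question, "top_k": top_k, "min_score": min_score}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_query_request(question, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With the API running, `ask("What is the transformer architecture?")["answer"]` should return the generated answer text.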
streamlit run app/dashboard.py
cp .env.example .env # add GEMINI_API_KEY
docker build -t retriva-rag .
docker run --env-file .env -p 8000:8000 retriva-rag
Verify:
curl http://localhost:8000/health
POST /query

Request:
{
"question": "What is the transformer architecture?",
"top_k": 5,
"min_score": 0.3
}
Response shape:
{
"question": "What is the transformer architecture?",
"answer": "The Transformer is a neural network architecture... [C1]",
"sources": ["Attention_is_all_you_need.pdf"],
"num_chunks": 5,
"relevance_scores": [0.44, 0.43, 0.40, 0.39, 0.38],
"model": "gemini-2.5-flash",
"provider": "gemini",
"tokens_used": {"total_tokens": "N/A"}
}
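
A client may want to check the response before using it. The sketch below mirrors the field names from the response shape above in a dataclass; the class name `QueryResult` and its helper methods are illustrative assumptions, not part of the Retriva codebase.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class QueryResult:
    """Typed view of the /query response shown above (hypothetical helper)."""

    question: str
    answer: str
    sources: List[str]
    num_chunks: int
    relevance_scores: List[float]
    model: str
    provider: str
    tokens_used: Dict[str, Any]

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "QueryResult":
        # Fail loudly if the API response lacks any documented field.
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in data]
        if missing:
            raise ValueError(f"response is missing fields: {missing}")
        # Ignore any extra keys so newer API versions do not break the client.
        return cls(**{name: data[name] for name in names})

    def top_score(self) -> Optional[float]:
        """Highest relevance score, or None when no chunks were returned."""
        return max(self.relevance_scores, default=None)
```

Ignoring unknown keys while requiring the documented ones keeps the client tolerant of additive API changes.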
| Endpoint | Purpose |
|---|---|
| GET / | API metadata |
| GET /health | Readiness check; returns 503 until the pipeline is loaded |
| GET /stats | Retriever/model stats |
| GET /docs | Swagger/OpenAPI UI |
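
Because /health returns 503 until the pipeline is loaded, a deployment script can poll it before sending traffic. The following is a sketch under two assumptions: that /health answers 200 once ready, and that a short fixed delay between attempts is acceptable. The function names are hypothetical and this is not the repository's scripts/health_check.py.

```python
import time
import urllib.error
import urllib.request


def probe_health(url):
    """Return the HTTP status of a GET to `url`, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 while the pipeline is still loading
    except (urllib.error.URLError, OSError):
        return None  # server is not accepting connections yet


def wait_until_ready(url, attempts=30, delay=2.0,
                     probe=probe_health, sleep=time.sleep):
    """Poll `url` until it returns 200 (assumed ready) or attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next attempt
    return False
```

For example, `wait_until_ready("http://localhost:8000/health")` returns `True` once the local API reports ready, or `False` after roughly a minute of failed attempts.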
| Variable | Required | Default | Description |
|---|---|---|---|
| GEMINI_API_KEY | yes for generation | unset | Google Gemini API key. API starts without it, but /query returns a missing-key error. |
| MODEL_NAME | no | gemini-2.5-flash | Gemini model preference. |
| RAG_ENABLE_RERANKER | no | 0 | Enable cross-encoder reranking (1) for higher precision and more latency. |
| RAG_DISABLE_EMBEDDINGS | no | 0 | Use BM25-only mode; useful in CI or low-memory environments. |
| CORS_ORIGINS | no | * | Comma-separated allowed origins for browsers. Use exact domains in production. |
| LOG_LEVEL | no | INFO | Logging verbosity for deployments. |
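
To illustrate how these variables and defaults fit together, here is a minimal sketch of reading them from the environment. The `Settings` class and `load_settings` function are hypothetical names, and treating only the exact value `1` as "enabled" is an assumption based on the table; the repository's actual configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    """Snapshot of the environment variables documented in the table above."""

    gemini_api_key: Optional[str]
    model_name: str
    enable_reranker: bool
    disable_embeddings: bool
    cors_origins: Tuple[str, ...]
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to os.environ), applying defaults."""
    env = os.environ if env is None else env
    # CORS_ORIGINS is comma-separated; drop empty entries and stray spaces.
    raw_origins = env.get("CORS_ORIGINS", "*")
    origins = tuple(o.strip() for o in raw_origins.split(",") if o.strip())
    return Settings(
        gemini_api_key=env.get("GEMINI_API_KEY") or None,  # unset by default
        model_name=env.get("MODEL_NAME", "gemini-2.5-flash"),
        # Assumption: only the literal value "1" turns a flag on.
        enable_reranker=env.get("RAG_ENABLE_RERANKER", "0").strip() == "1",
        disable_embeddings=env.get("RAG_DISABLE_EMBEDDINGS", "0").strip() == "1",
        cors_origins=origins,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in tests without touching the real environment.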
# Unit/integration tests
python -m pytest tests/ -q
# API readiness without starting a public server
python scripts/health_check.py
CI runs dependency installation and pytest on pushes/PRs to main via .github/workflows/ci.yml.
retriva-hybrid-rag-engine/
├── app/
│ ├── api.py # FastAPI app
│ └── dashboard.py # Streamlit demo UI
├── src/
│ ├── pipeline.py # End-to-end RAG orchestration
│ ├── loaders/ # PDF, DOCX, XLS/XLSX loaders
│ ├── preprocessing/ # Normalization and chunking
│ ├── retrieval/ # FAISS, BM25, reranking, caching
│ ├── generation/ # Gemini interface and prompts
│ └── evaluation/ # Metrics utilities
├── scripts/
│ ├── build_index.py # Build saved retrieval indexes
│ └── health_check.py # Deployment readiness check
├── outputs/ # Saved indexes and evaluation reports
├── data/raw_documents/ # Demo/private document corpus
├── .github/workflows/ # CI and deployment workflows
├── Dockerfile
├── DEPLOY.md
└── requirements.txt
See DEPLOY.md for realistic free/low-cost deployment steps.
Recommended portfolio path:
- Set GEMINI_API_KEY as a platform secret/environment variable.
- Verify /health and /docs.
- Topics: rag, fastapi, faiss, bm25, gemini, llmops.

MIT © Gautham Binoy