System Overview Diagram
RAG SYSTEM ARCHITECTURE
═══════════════════════════════════════════════════════════════════════════════
┌─────────────────────────────────────────────────────────────────────────────┐
│                              OFFLINE INDEXING                               │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Source Docs │     │  Chunking   │     │  Embedding  │     │   Vector    │
│  PDF/DOCX/  │────▶│  Strategy   │────▶│    Model    │────▶│    Store    │
│  HTML/TXT   │     │ 512 tokens  │     │  e5-large   │     │  (Milvus)   │
└──────┬──────┘     │ 50 overlap  │     └─────────────┘     └─────────────┘
       │            └─────────────┘
       ▼                                                    ┌─────────────┐
┌─────────────┐     ┌─────────────┐                         │    BM25     │
│ Extraction  │────▶│  Metadata   │────────────────────────▶│    Index    │
│ + Cleaning  │     │ Enrichment  │                         │  (Elastic)  │
└──────┬──────┘     └─────────────┘                         └─────────────┘
       │
       ▼
┌─────────────┐
│   Quality   │
│    Gates    │
└─────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                              ONLINE RETRIEVAL                               │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────┐     ┌─────────────┐     ┌─────────────┐      ┌─────────────┐
│    User     │     │    Query    │  ┌─▶│   Vector    │────┐ │   Hybrid    │
│    Query    │────▶│  Embedding  │──┤  │   Search    │    ├▶│   Fusion    │
└─────────────┘     │  e5-large   │  │  │  (top 100)  │    │ │    (RRF)    │
                    └─────────────┘  │  └─────────────┘    │ └──────┬──────┘
                                     │                     │        │
                                     │  ┌─────────────┐    │        │
                                     └─▶│    BM25     │────┘        │
                                        │   Search    │             │
                                        │  (top 100)  │             │
                                        └─────────────┘             │
                                                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                 GENERATION                                  │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Hybrid    │     │  Reranker   │     │   Context   │     │     LLM     │
│   Results   │────▶│ bge-reranker│────▶│  Assembly   │────▶│  (Claude)   │
│  (top 100)  │     │ top 20 → 5  │     │  + Prompt   │     │             │
└─────────────┘     └─────────────┘     └─────────────┘     └──────┬──────┘
                                                                   │
                                                                   ▼
                                                            ┌─────────────┐
                                                            │  Response   │
                                                            │ + Citations │
                                                            └──────┬──────┘
                                                                   │
                                 ┌─────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                FEEDBACK LOOP                                │
│            User Clicks │ Corrections │ Ratings │ Latency Metrics            │
└─────────────────────────────────────────────────────────────────────────────┘
Mermaid Diagram (for rendered viewing)
flowchart TB
subgraph Offline["OFFLINE INDEXING"]
direction LR
Docs[("Source Docs<br/>PDF/DOCX/HTML")] --> Extract["Extraction<br/>+ Cleaning"]
Extract --> Chunk["Chunking<br/>512 tokens, 50 overlap"]
Chunk --> Embed["Embedding<br/>e5-large"]
Embed --> VectorStore[("Vector Store<br/>Milvus")]
Extract --> Meta["Metadata<br/>Enrichment"]
Meta --> BM25Store[("BM25 Index<br/>Elasticsearch")]
end
subgraph Online["ONLINE RETRIEVAL"]
direction LR
Query["User Query"] --> QueryEmbed["Query<br/>Embedding"]
QueryEmbed --> VectorSearch["Vector Search<br/>top 100"]
QueryEmbed --> BM25Search["BM25 Search<br/>top 100"]
VectorSearch --> Fusion["Hybrid Fusion<br/>RRF"]
BM25Search --> Fusion
end
subgraph Generation["GENERATION"]
direction LR
Fusion --> Rerank["Reranker<br/>top 20 → 5"]
Rerank --> Context["Context<br/>Assembly"]
Context --> LLM["LLM<br/>Claude"]
LLM --> Response["Response<br/>+ Citations"]
end
VectorStore -.-> VectorSearch
BM25Store -.-> BM25Search
Response --> Feedback["Feedback Loop<br/>Clicks, Ratings, Latency"]
style Offline fill:#e1f5fe
style Online fill:#fff3e0
style Generation fill:#e8f5e9
Key Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Chunk size | 512 tokens, 50 overlap | Balance retrieval precision vs context coherence |
| Embedding | e5-large-v2 | Best quality/latency tradeoff for English |
| Vector DB | Milvus | Self-hosted, scales to 100M+ vectors |
| Reranker | bge-reranker-large | Cross-encoder accuracy, acceptable latency |
| Hybrid weight | 0.7 vector / 0.3 BM25 | Tuned on held-out queries |
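
The chunking decision (512 tokens with a 50-token overlap) can be illustrated with a sliding window. This is a minimal sketch, not the system's actual implementation: the function name `chunk_tokens` is hypothetical, and it assumes the input has already been tokenized, ideally with the embedding model's own tokenizer so that 512 matches the model's notion of a token.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[str],
    chunk_size: int = 512,  # from the design table
    overlap: int = 50,      # from the design table
) -> List[List[str]]:
    """Split tokens into windows of `chunk_size` where consecutive
    windows share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    stride = chunk_size - overlap  # how far each new window advances
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
        start += stride
    return chunks
```

With the defaults, each window advances by 462 tokens, so a 1,000-token document yields three chunks, the last of which is shorter than 512 tokens.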
Data Flow Summary
- Ingestion: Source docs → Extraction → Cleaning → Metadata enrichment → Quality gates
- Indexing: Chunking → Embedding → Vector store + BM25 index
- Retrieval: Query embed → Vector search + BM25 search (top 100 each) → Hybrid fusion (RRF) → Rerank top 20 → top 5
- Generation: Context assembly → Prompt template → LLM → Response with citations
- Feedback: User clicks, corrections, ratings, and latency metrics feed back into system improvement
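
The retrieval step can be sketched as weighted Reciprocal Rank Fusion followed by a rerank cut. This is an illustration under stated assumptions rather than the system's code: the document does not say how the 0.7 / 0.3 hybrid weights combine with RRF, so the sketch assumes each weight scales its list's reciprocal-rank contribution; `k = 60` is the commonly used RRF constant, not a value given here; and `rerank_score` is a hypothetical stand-in for the cross-encoder, taking a document ID and returning a relevance score for the current query.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def weighted_rrf(
    rankings: Dict[str, Sequence[str]],
    weights: Dict[str, float],
    k: int = 60,  # assumed smoothing constant, not from the document
) -> List[str]:
    """Fuse ranked ID lists: each list adds weight / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weights[name] / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def retrieve(
    vector_hits: Sequence[str],
    bm25_hits: Sequence[str],
    rerank_score: Callable[[str], float],  # hypothetical cross-encoder scorer
    per_source: int = 100,  # top 100 from each retriever
    rerank_in: int = 20,    # candidates passed to the reranker
    final_k: int = 5,       # chunks kept for context assembly
) -> List[str]:
    fused = weighted_rrf(
        {"vector": vector_hits[:per_source], "bm25": bm25_hits[:per_source]},
        {"vector": 0.7, "bm25": 0.3},  # hybrid weights from the design table
    )
    candidates = fused[:rerank_in]
    return sorted(candidates, key=rerank_score, reverse=True)[:final_k]
```

For example, fusing the vector ranking `["a", "b", "c"]` with the BM25 ranking `["c", "a", "d"]` under these weights gives `["a", "c", "b", "d"]`: documents found by both retrievers rise above those found by only one.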