Complete AET-RAG System Architecture
End-to-end view of the AET-RAG system showing user interaction, LangChain processing, Vertex AI models, and data persistence layers.
👤 Data Scientists / End Users → 🌐 Flask Web UI (Chat Interface) → ☁️ Cloud Run (Container Hosting) → 🔗 LangGraph (Research Workflow) ↔ 🤖 Vertex AI (Gemini Models) ↔ 🗄️ ChromaDB (Vector Database)
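The flow above can be exercised end to end with a single HTTP call to the deployed Cloud Run service. The /chat route and JSON payload shape shown here are assumptions about the chat interface, not documented API details; only the service URL appears in the deployment status later in this document.

```python
import requests

# Hypothetical chat endpoint on the deployed service; the /chat route and payload
# shape are assumptions, only the service URL comes from the deployment status below.
resp = requests.post(
    "https://aet-rag-service-946801466441.us-east1.run.app/chat",
    json={"question": "example question"},
    timeout=300,  # matches the 300-second Cloud Run request timeout
)
print(resp.json())
```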
LangChain Framework Architecture
LangChain components powering the RAG system with embeddings, retrievers, and chat models.
🟢 LangChain Core Components
- 📝 ChatVertexAI (LLM Interface)
- 🔍 VertexAIEmbeddings (text-embedding-005)
- 📋 ChatPromptTemplate (Prompt Engineering)
- 🔄 EnsembleRetriever (Hybrid Search)
- ✂️ TextSplitter (Document Chunking)
- 🔗 LCEL Chains (Pipeline Orchestration)
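A minimal sketch of how these components can be wired together with LangChain 0.3-style imports. The sample document, chunk sizes, retriever weights, persist directory, and prompt text are illustrative assumptions rather than values taken from the AET-RAG codebase; the hybrid search is assumed to pair BM25 keyword retrieval with Chroma vector retrieval.

```python
from langchain.retrievers import EnsembleRetriever
from langchain_chroma import Chroma
from langchain_community.retrievers import BM25Retriever  # keyword retriever, needs rank_bm25
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Document chunking (sample document and chunk sizes are illustrative).
raw_docs = [Document(page_content="Example corpus text for the AET knowledge base ...")]
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(raw_docs)

# Embeddings + persistent ChromaDB vector store.
embeddings = VertexAIEmbeddings(model_name="text-embedding-005")
vectorstore = Chroma.from_documents(chunks, embedding=embeddings, persist_directory="./chroma_db")

# Hybrid search: BM25 keyword retrieval blended with dense vector retrieval.
retriever = EnsembleRetriever(
    retrievers=[BM25Retriever.from_documents(chunks), vectorstore.as_retriever(search_kwargs={"k": 5})],
    weights=[0.4, 0.6],
)

# Prompt engineering + LLM interface.
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the provided context:\n\n{context}"),
    ("human", "{question}"),
])
llm = ChatVertexAI(model="gemini-2.0-flash-001")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# LCEL pipeline orchestration: retrieve -> assemble prompt -> generate -> parse.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print(rag_chain.invoke("example question"))
```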
LangGraph Research Workflow
Advanced multi-step research workflow using LangGraph for deep document analysis and response generation.
🔍 analyze_query (Intent & Entity Extraction) → 📋 plan_research (Strategy Planning) → 📚 retrieve_documents (Multi-Strategy Retrieval) → 🎯 filter_and_rank (Relevance Scoring) → 🏗️ build_context (Context Assembly) → ✍️ generate_answer (Response Generation) → ✅ validate_response (Quality Assurance)
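One way the seven steps could be expressed as a LangGraph StateGraph. Only the node names and their linear ordering come from the workflow above; the state fields and the stub node bodies are placeholders standing in for the real LLM and retrieval calls.

```python
from typing import List, TypedDict

from langgraph.graph import END, START, StateGraph

class ResearchState(TypedDict, total=False):
    question: str
    intent: str
    plan: str
    documents: List[str]
    context: str
    answer: str
    validated: bool

# Stub node bodies; in the real workflow each step calls Gemini and/or the retrievers.
def analyze_query(state: ResearchState):       # intent & entity extraction
    return {"intent": "definition_lookup"}

def plan_research(state: ResearchState):       # strategy planning
    return {"plan": "semantic + keyword retrieval"}

def retrieve_documents(state: ResearchState):  # multi-strategy retrieval
    return {"documents": ["doc snippet 1", "doc snippet 2"]}

def filter_and_rank(state: ResearchState):     # relevance scoring
    return {"documents": state["documents"][:1]}

def build_context(state: ResearchState):       # context assembly
    return {"context": "\n\n".join(state["documents"])}

def generate_answer(state: ResearchState):     # response generation
    return {"answer": f"Answer grounded in: {state['context']}"}

def validate_response(state: ResearchState):   # quality assurance
    return {"validated": bool(state["answer"])}

steps = [analyze_query, plan_research, retrieve_documents, filter_and_rank,
         build_context, generate_answer, validate_response]

graph = StateGraph(ResearchState)
for fn in steps:
    graph.add_node(fn.__name__, fn)
graph.add_edge(START, steps[0].__name__)
for current, nxt in zip(steps, steps[1:]):
    graph.add_edge(current.__name__, nxt.__name__)
graph.add_edge(steps[-1].__name__, END)

workflow = graph.compile()
result = workflow.invoke({"question": "example question"})
print(result["answer"])
```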
Google Vertex AI Integration
Comprehensive integration with Google Vertex AI services for embeddings and language models.
🔴 Vertex AI Models
🤖 Gemini Models (Language Generation)
- Gemini 2.0 Flash (Default)
- Gemini 2.0 Flash Lite
- Gemini 1.5 Flash
- Gemini 1.5 Pro
- Gemini 2.5 Flash (us-central1)
- Gemini 2.5 Pro (us-central1)
🔍 Embedding Models (Vector Generation)
- text-embedding-005
- Multimodal Support
- High Dimensional Vectors
- Semantic Understanding
🔵 GCP Configuration
- 🔑 Authentication: Service Account
- 🌍 Primary Region: us-east1
- 📊 GCP Project: aethrag2
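A minimal initialization sketch tying this configuration together: service-account credentials, the aethrag2 project, the us-east1 primary region, the default Gemini chat model, and text-embedding-005. The key-file path is a placeholder, not a path from the repository.

```python
import os

import vertexai
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings

# Service-account authentication; the key-file path is a placeholder.
os.environ.setdefault("GOOGLE_APPLICATION_CREDENTIALS", "service-account.json")

# GCP project and primary region from the configuration above.
vertexai.init(project="aethrag2", location="us-east1")

# Default Gemini chat model and the text-embedding-005 embedding model.
llm = ChatVertexAI(model="gemini-2.0-flash-001", project="aethrag2", location="us-east1")
embeddings = VertexAIEmbeddings(model_name="text-embedding-005", project="aethrag2", location="us-east1")
```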
Google Cloud Platform Deployment
Complete deployment architecture on Google Cloud Platform with Docker containers and Cloud Run.
🔵 GCP Infrastructure
🐳 Docker Image (Python Flask) → 📦 Artifact Registry (aet-rag-repo-east) → 🌐 Cloud Run (aet-rag-service)
Container Architecture
- 🔗 LangChain (RAG Framework)
- 📊 LangGraph (Workflow Engine)
- 🤖 Vertex AI SDK (AI Integration)

🔗 Cloud Run URL (aet-rag-service-*.us-east1.run.app) → 🐳 Container Instance (Auto-scaling)
Automated CI/CD Pipeline
GitHub Actions-powered deployment pipeline with Docker containerization and Google Cloud deployment.
🔄 Deployment Pipeline
📚 GitHub Repository (Source Code) → ⚙️ GitHub Actions (CI/CD Trigger) → 🐳 Docker Build (--platform linux/amd64) → 📦 Artifact Registry (Image Push) → 🚀 Cloud Run Deploy (Service Update)
Build Process
- Trigger: Push to main branch
- Environment: ubuntu-latest
- Docker: Multi-stage build
- Platform: linux/amd64
- Registry: us-east1-docker.pkg.dev
- Authentication: Workload Identity
Deployment Features
- Auto-scaling: 0-10 instances
- Memory: 2Gi per instance
- CPU: 1 vCPU per instance
- Timeout: 300 seconds
- Port: 8080
- Health Check: /health endpoint (see the Flask sketch below)
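A minimal Flask entrypoint matching the deployment features above: it exposes the /health probe and binds to the Cloud Run port (8080 by default). The /chat route and its placeholder response are assumptions about the service's web interface, not code from the repository.

```python
import os

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/health")
def health():
    # Probed after deployment to confirm the new revision is serving traffic.
    return jsonify(status="ok"), 200

@app.route("/chat", methods=["POST"])
def chat():
    question = request.get_json(force=True).get("question", "")
    # The real service would run the LangGraph research workflow here.
    return jsonify(answer=f"placeholder answer for: {question}")

if __name__ == "__main__":
    # Cloud Run injects PORT (8080 here); bind to all interfaces.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```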
System Status & Configuration
Current deployment status, URLs, and configuration details for the AET-RAG system.
Deployment Status ✅ Active
- Service URL: aet-rag-service-946801466441.us-east1.run.app
- Region: us-east1
- Project: aethrag2
- Runtime: Python 3.9 + Flask
- Container: Docker via Artifact Registry
- Auto-scaling: 0-10 instances
- Memory: 2Gi per instance
AI Models Status ✅ Active
- Default Model: gemini-2.0-flash-001
- Available Models: 6 models configured
- Embeddings: text-embedding-005
- Region Support: us-east1 optimized
- Fallback Logic: Automatic model switching (sketched below)
- Temperature: 0.7 (configurable)
- Max Tokens: 8192 output
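A sketch of how the automatic model switching could work: try the default gemini-2.0-flash-001 first, then fall back to other configured Gemini models if it is unavailable. Only the default model ID, temperature, and output-token limit come from the configuration above; the remaining model IDs, their ordering, and the probing approach are assumptions.

```python
from langchain_google_vertexai import ChatVertexAI

# Fallback order: the documented default model first, then other configured
# Gemini models. IDs after the first entry and their ordering are assumptions.
MODEL_CANDIDATES = [
    "gemini-2.0-flash-001",
    "gemini-2.0-flash-lite-001",
    "gemini-1.5-flash-002",
    "gemini-1.5-pro-002",
]

def build_llm(project: str = "aethrag2", location: str = "us-east1") -> ChatVertexAI:
    """Return a ChatVertexAI client, switching to the next model if one fails."""
    last_error = None
    for model in MODEL_CANDIDATES:
        try:
            llm = ChatVertexAI(
                model=model,
                project=project,
                location=location,
                temperature=0.7,
                max_output_tokens=8192,
            )
            llm.invoke("ping")  # cheap probe; raises if the model is unavailable in this region
            return llm
        except Exception as exc:  # any failure triggers fallback to the next model
            last_error = exc
    raise RuntimeError("No configured Gemini model is available") from last_error
```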
Key Features & Capabilities
- 📚 Document Processing: PDF, Text, Multi-format
- 🔍 Advanced RAG: Multi-strategy Retrieval
- 📊 LangGraph Workflow: 7-step Research Process
- 🤖 Gemini Integration: 6 Model Options
- 🗄️ Vector Database: ChromaDB Persistence
- 💬 Interactive Chat: Web-based Interface
Technical Specifications
Framework Stack
- LangChain 0.3+ (Core Framework)
- LangGraph (Workflow Engine)
- Flask (Web Framework)
- ChromaDB (Vector Database)
- Google Vertex AI SDK
Infrastructure
- Google Cloud Run (Serverless)
- Artifact Registry (Container Storage)
- GitHub Actions (CI/CD)
- Docker (Containerization)
- Workload Identity (Authentication)