Complete AET-RAG System Architecture
End-to-end view of the AET-RAG system showing user interaction, LangChain processing, Vertex AI models, and data persistence layers.
👤 Data Scientists / End Users → 🌐 Flask Web UI (Chat Interface) → ☁️ Cloud Run (Container Hosting) → 🔗 LangGraph (Research Workflow) ↔ 🤖 Vertex AI (Gemini Models) ↔ 🗄️ ChromaDB (Vector Database)
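The flow above can be exercised end to end with a single HTTP call to the deployed Cloud Run service. The /chat route and JSON payload shape shown here are assumptions about the chat interface, not documented API details; only the service URL appears in the deployment status later in this document.

```python
import requests

# Hypothetical chat endpoint on the deployed service; the /chat route and payload
# shape are assumptions, only the service URL comes from the deployment status below.
resp = requests.post(
    "https://aet-rag-service-946801466441.us-east1.run.app/chat",
    json={"question": "example question"},
    timeout=300,  # matches the 300-second Cloud Run request timeout
)
print(resp.json())
```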
LangChain Framework Architecture
LangChain components powering the RAG system with embeddings, retrievers, and chat models.
🟢 LangChain Core Components
- 📝 ChatVertexAI (LLM Interface)
- 🔍 VertexAIEmbeddings (text-embedding-005)
- 📋 ChatPromptTemplate (Prompt Engineering)
- 🔄 EnsembleRetriever (Hybrid Search)
- ✂️ TextSplitter (Document Chunking)
- 🔗 LCEL Chains (Pipeline Orchestration)
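A minimal sketch of how these components can be wired together with LangChain 0.3-style imports. The sample document, chunk sizes, retriever weights, persist directory, and prompt text are illustrative assumptions rather than values taken from the AET-RAG codebase; the hybrid search is assumed to pair BM25 keyword retrieval with Chroma vector retrieval.

```python
from langchain.retrievers import EnsembleRetriever
from langchain_chroma import Chroma
from langchain_community.retrievers import BM25Retriever  # keyword retriever, needs rank_bm25
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Document chunking (sample document and chunk sizes are illustrative).
raw_docs = [Document(page_content="Example corpus text for the AET knowledge base ...")]
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(raw_docs)

# Embeddings + persistent ChromaDB vector store.
embeddings = VertexAIEmbeddings(model_name="text-embedding-005")
vectorstore = Chroma.from_documents(chunks, embedding=embeddings, persist_directory="./chroma_db")

# Hybrid search: BM25 keyword retrieval blended with dense vector retrieval.
retriever = EnsembleRetriever(
    retrievers=[BM25Retriever.from_documents(chunks), vectorstore.as_retriever(search_kwargs={"k": 5})],
    weights=[0.4, 0.6],
)

# Prompt engineering + LLM interface.
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the provided context:\n\n{context}"),
    ("human", "{question}"),
])
llm = ChatVertexAI(model="gemini-2.0-flash-001")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# LCEL pipeline orchestration: retrieve -> assemble prompt -> generate -> parse.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print(rag_chain.invoke("example question"))
```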
LangGraph Research Workflow
Advanced multi-step research workflow using LangGraph for deep document analysis and response generation.
🔍 analyze_query (Intent & Entity Extraction) → 📋 plan_research (Strategy Planning) → 📚 retrieve_documents (Multi-Strategy Retrieval) → 🎯 filter_and_rank (Relevance Scoring) → 🏗️ build_context (Context Assembly) → ✍️ generate_answer (Response Generation) → ✅ validate_response (Quality Assurance)
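One way the seven steps could be expressed as a LangGraph StateGraph. Only the node names and their linear ordering come from the workflow above; the state fields and the stub node bodies are placeholders standing in for the real LLM and retrieval calls.

```python
from typing import List, TypedDict

from langgraph.graph import END, START, StateGraph

class ResearchState(TypedDict, total=False):
    question: str
    intent: str
    plan: str
    documents: List[str]
    context: str
    answer: str
    validated: bool

# Stub node bodies; in the real workflow each step calls Gemini and/or the retrievers.
def analyze_query(state: ResearchState):       # intent & entity extraction
    return {"intent": "definition_lookup"}

def plan_research(state: ResearchState):       # strategy planning
    return {"plan": "semantic + keyword retrieval"}

def retrieve_documents(state: ResearchState):  # multi-strategy retrieval
    return {"documents": ["doc snippet 1", "doc snippet 2"]}

def filter_and_rank(state: ResearchState):     # relevance scoring
    return {"documents": state["documents"][:1]}

def build_context(state: ResearchState):       # context assembly
    return {"context": "\n\n".join(state["documents"])}

def generate_answer(state: ResearchState):     # response generation
    return {"answer": f"Answer grounded in: {state['context']}"}

def validate_response(state: ResearchState):   # quality assurance
    return {"validated": bool(state["answer"])}

steps = [analyze_query, plan_research, retrieve_documents, filter_and_rank,
         build_context, generate_answer, validate_response]

graph = StateGraph(ResearchState)
for fn in steps:
    graph.add_node(fn.__name__, fn)
graph.add_edge(START, steps[0].__name__)
for current, nxt in zip(steps, steps[1:]):
    graph.add_edge(current.__name__, nxt.__name__)
graph.add_edge(steps[-1].__name__, END)

workflow = graph.compile()
result = workflow.invoke({"question": "example question"})
print(result["answer"])
```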
Google Vertex AI Integration
Comprehensive integration with Google Vertex AI services for embeddings and language models.
🔴 Vertex AI Models
🤖 Gemini Models (Language Generation)
- Gemini 2.0 Flash (Default)
- Gemini 2.0 Flash Lite
- Gemini 1.5 Flash
- Gemini 1.5 Pro
- Gemini 2.5 Flash (us-central1)
- Gemini 2.5 Pro (us-central1)
🔍 Embedding Models (Vector Generation)
- text-embedding-005
- Multimodal Support
- High Dimensional Vectors
- Semantic Understanding
🔵 GCP Configuration
- 🔑 Authentication: Service Account
- 🌍 Primary Region: us-east1
- 📊 GCP Project: aethrag2
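A minimal initialization sketch tying this configuration together: service-account credentials, the aethrag2 project, the us-east1 primary region, the default Gemini chat model, and text-embedding-005. The key-file path is a placeholder, not a path from the repository.

```python
import os

import vertexai
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings

# Service-account authentication; the key-file path is a placeholder.
os.environ.setdefault("GOOGLE_APPLICATION_CREDENTIALS", "service-account.json")

# GCP project and primary region from the configuration above.
vertexai.init(project="aethrag2", location="us-east1")

# Default Gemini chat model and the text-embedding-005 embedding model.
llm = ChatVertexAI(model="gemini-2.0-flash-001", project="aethrag2", location="us-east1")
embeddings = VertexAIEmbeddings(model_name="text-embedding-005", project="aethrag2", location="us-east1")
```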
Google Cloud Platform Deployment
Complete deployment architecture on Google Cloud Platform with Docker containers and Cloud Run.
🔵 GCP Infrastructure
🐳 Docker Image (Python Flask) → 📦 Artifact Registry (aet-rag-repo-east) → 🌐 Cloud Run (aet-rag-service)
Container Architecture
- 🔗 LangChain (RAG Framework)
- 📊 LangGraph (Workflow Engine)
- 🤖 Vertex AI SDK (AI Integration)

🔗 Cloud Run URL (aet-rag-service-*.us-east1.run.app) → 🐳 Container Instance (Auto-scaling)
Automated CI/CD Pipeline
GitHub Actions-powered deployment pipeline with Docker containerization and Google Cloud deployment.
🔄 Deployment Pipeline
📚 GitHub Repository (Source Code) → ⚙️ GitHub Actions (CI/CD Trigger) → 🐳 Docker Build (--platform linux/amd64) → 📦 Artifact Registry (Image Push) → 🚀 Cloud Run Deploy (Service Update)
Build Process
- Trigger: Push to main branch
- Environment: ubuntu-latest
- Docker: Multi-stage build
- Platform: linux/amd64
- Registry: us-east1-docker.pkg.dev
- Authentication: Workload Identity
Deployment Features
- Auto-scaling: 0-10 instances
- Memory: 2Gi per instance
- CPU: 1 vCPU per instance
- Timeout: 300 seconds
- Port: 8080
- Health Check: /health endpoint (see the Flask sketch below)
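A minimal Flask entrypoint matching the deployment features above: it exposes the /health probe and binds to the Cloud Run port (8080 by default). The /chat route and its placeholder response are assumptions about the service's web interface, not code from the repository.

```python
import os

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/health")
def health():
    # Probed after deployment to confirm the new revision is serving traffic.
    return jsonify(status="ok"), 200

@app.route("/chat", methods=["POST"])
def chat():
    question = request.get_json(force=True).get("question", "")
    # The real service would run the LangGraph research workflow here.
    return jsonify(answer=f"placeholder answer for: {question}")

if __name__ == "__main__":
    # Cloud Run injects PORT (8080 here); bind to all interfaces.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```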
System Status & Configuration
Current deployment status, URLs, and configuration details for the AET-RAG system.
Deployment Status ✅ Active
- Service URL: aet-rag-service-946801466441.us-east1.run.app
- Region: us-east1
- Project: aethrag2
- Runtime: Python 3.9 + Flask
- Container: Docker via Artifact Registry
- Auto-scaling: 0-10 instances
- Memory: 2Gi per instance
AI Models Status ✅ Active
- Default Model: gemini-2.0-flash-001
- Available Models: 6 models configured
- Embeddings: text-embedding-005
- Region Support: us-east1 optimized
- Fallback Logic: Automatic model switching (sketched below)
- Temperature: 0.7 (configurable)
- Max Tokens: 8192 output
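A sketch of how the automatic model switching could work: try the default gemini-2.0-flash-001 first, then fall back to other configured Gemini models if it is unavailable. Only the default model ID, temperature, and output-token limit come from the configuration above; the remaining model IDs, their ordering, and the probing approach are assumptions.

```python
from langchain_google_vertexai import ChatVertexAI

# Fallback order: the documented default model first, then other configured
# Gemini models. IDs after the first entry and their ordering are assumptions.
MODEL_CANDIDATES = [
    "gemini-2.0-flash-001",
    "gemini-2.0-flash-lite-001",
    "gemini-1.5-flash-002",
    "gemini-1.5-pro-002",
]

def build_llm(project: str = "aethrag2", location: str = "us-east1") -> ChatVertexAI:
    """Return a ChatVertexAI client, switching to the next model if one fails."""
    last_error = None
    for model in MODEL_CANDIDATES:
        try:
            llm = ChatVertexAI(
                model=model,
                project=project,
                location=location,
                temperature=0.7,
                max_output_tokens=8192,
            )
            llm.invoke("ping")  # cheap probe; raises if the model is unavailable in this region
            return llm
        except Exception as exc:  # any failure triggers fallback to the next model
            last_error = exc
    raise RuntimeError("No configured Gemini model is available") from last_error
```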
Key Features & Capabilities
- 📚 Document Processing: PDF, Text, Multi-format
- 🔍 Advanced RAG: Multi-strategy Retrieval
- 📊 LangGraph Workflow: 7-step Research Process
- 🤖 Gemini Integration: 6 Model Options
- 🗄️ Vector Database: ChromaDB Persistence
- 💬 Interactive Chat: Web-based Interface
Technical Specifications
Framework Stack
- LangChain 0.3+ (Core Framework)
- LangGraph (Workflow Engine)
- Flask (Web Framework)
- ChromaDB (Vector Database)
- Google Vertex AI SDK
Infrastructure
- Google Cloud Run (Serverless)
- Artifact Registry (Container Storage)
- GitHub Actions (CI/CD)
- Docker (Containerization)
- Workload Identity (Authentication)