# Tech Stack Advisor - Technical Documentation
> **A production-grade multi-agent AI system with modern web interface, user authentication, and intelligent tech stack recommendations**
## π Table of Contents
1. [Project Overview](#project-overview)
2. [What We're Trying to Achieve](#what-were-trying-to-achieve)
3. [System Architecture](#system-architecture)
4. [Key Technical Decisions](#key-technical-decisions)
5. [Implementation Details](#implementation-details)
6. [Memory Management](#memory-management)
7. [Authentication & Security](#authentication--security)
8. [Challenges & Solutions](#challenges--solutions)
9. [Deployment Journey](#deployment-journey)
10. [Performance & Scalability](#performance--scalability)
11. [Lessons Learned](#lessons-learned)
---
## Project Overview
Tech Stack Advisor is a multi-agent AI system that provides intelligent technology stack recommendations using retrieval-augmented generation (RAG) and specialized AI agents. The system analyzes user requirements through intelligent multi-turn conversations and provides comprehensive recommendations across five domains: conversation management, database selection, infrastructure design, cost estimation, and security/compliance.
**Live Demo:** https://ranjana-tech-stack-advisor-production.up.railway.app
**Key Statistics:**
- **5 Specialized AI Agents** orchestrated by LangGraph
- **34 Technical Documents** in RAG knowledge base
- **~3,400 Lines of Code** (backend + frontend)
- **Multi-Turn Conversations** with context accumulation
- **Long-Term Memory** using Qdrant semantic search (384-dim vectors)
- **Sub-4 Second** recommendation generation
- **$0.0015** cost per recommendation
---
## What We're Trying to Achieve
### Primary Goals
1. **Democratize Technical Decision-Making**
- Make expert-level tech stack advice accessible to everyone
- Reduce analysis paralysis for new projects
- Provide data-driven recommendations, not opinions
2. **Production-Ready System**
- Not just a prototypeβdeployable and scalable
- Real authentication and authorization
- Cost monitoring and budget controls
- Comprehensive error handling
3. **Learn Modern AI Engineering**
- Multi-agent orchestration with LangGraph
- RAG implementation with vector databases
- Production deployment on cloud infrastructure
- Integration of OAuth providers
### Success Criteria
- β
Generate recommendations in < 5 seconds
- β
Cost per query < $0.01
- β
Support 100+ concurrent users
- β
99.9% uptime
- β
Secure authentication (OAuth + JWT)
- β
Mobile-responsive UI
---
## System Architecture
### High-Level Architecture
```
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Modern Web UI (HTML/CSS/JavaScript) β
β β’ User authentication (Local + Google OAuth) β
β β’ Responsive design β
β β’ Real-time API integration β
β β’ Admin dashboard β
ββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββ
β HTTP/REST + JWT Auth
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β FastAPI Backend (Port 8000) β
β β’ Serves static files (HTML/CSS/JS) β
β β’ REST API endpoints β
β β’ JWT authentication β
β β’ Rate limiting & cost controls β
ββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β LangGraph Workflow Orchestrator β
β β’ Context parsing (NLP extraction) β
β β’ Sequential agent coordination β
β β’ State management β
β β’ Error handling & recovery β
ββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββββββββ
β β β β
ββββΌβββ ββββΌβββ ββββΌβββ ββββΌβββ
β DB β βInfraβ βCost β β Sec β
βAgentβ βAgentβ βAgentβ βAgentβ
ββββ¬βββ ββββ¬βββ ββββ¬βββ ββββ¬βββ
β β β β
βββββββββββ΄ββββββββββ΄ββββββββββ
β
βββββββββββββΌββββββββββββ
βΌ βΌ βΌ
ββββββββββ ββββββββββ ββββββββββ
βQdrant β βClaude β βSQLite β
βVector β β AI β β Users β
βStore β β (LLM) β β DB β
ββββββββββ ββββββββββ ββββββββββ
```
### Component Breakdown
**1. Frontend Layer**
- **Technology:** Vanilla HTML/CSS/JavaScript
- **Why:** Simplicity, no build step, direct deployment
- **Features:** Authentication, responsive design, real-time updates
**2. API Layer**
- **Technology:** FastAPI (Python)
- **Why:** Async support, auto-generated docs, type safety
- **Features:** JWT auth, rate limiting, static file serving
**3. Orchestration Layer**
- **Technology:** LangGraph
- **Why:** State management, agent coordination, error recovery
- **Features:** Sequential workflow, state persistence, observability
**4. Agent Layer**
- **Technology:** Custom agents with Anthropic Claude
- **Why:** Specialized domain expertise, parallel processing
- **Features:** Tool-based architecture, cost tracking, logging
**5. Knowledge Layer**
- **Technology:** Qdrant (vector DB) + sentence-transformers
- **Why:** Semantic search, fast retrieval, scalability
- **Features:** 34 curated documents, metadata filtering
**6. Storage Layer**
- **Technology:** SQLite (users) + Qdrant (knowledge)
- **Why:** Simplicity for MVP, easy migration path
- **Features:** User management, OAuth integration
---
## Key Technical Decisions
### 1. Single-Page Application vs. Framework
**Decision:** Vanilla JavaScript single-page application
**Why:**
- β
No build step or bundler complexity
- β
Faster development iteration
- β
Direct deployment to Railway
- β
Lower barrier to understanding codebase
- β
~1,500 lines vs potential 5,000+ with React
**Rejected Alternatives:**
- **React/Next.js:** Overkill for this use case, adds build complexity
- **Streamlit:** Initially used, but removed due to WebSocket requirements and deployment complexity
- **Vue/Svelte:** Similar benefits to React but less ecosystem support
**Trade-offs Accepted:**
- Manual DOM manipulation (acceptable for our scope)
- No component reusability (not needed at current scale)
- Limited state management (JWT + localStorage sufficient)
---
### 2. Backend Framework Selection
**Decision:** FastAPI
**Why:**
- β
Native async/await support (critical for LLM calls)
- β
Automatic OpenAPI documentation
- β
Type hints for better code quality
- β
Easy integration with Pydantic models
- β
Growing ecosystem and community
**Rejected Alternatives:**
- **Flask:** Synchronous by default, less modern features
- **Django:** Too heavy, ORM overkill for our needs
- **Node.js/Express:** Team expertise in Python, better AI library support
**Trade-offs Accepted:**
- Python GIL limitations (mitigated by async)
- Slightly slower than Go/Rust (acceptable for our latency requirements)
---
### 3. LLM Provider Selection
**Decision:** Anthropic Claude (Haiku model)
**Why:**
- β
Best cost/performance ratio ($0.25 per 1M input tokens)
- β
Long context windows (200K tokens)
- β
Strong instruction following
- β
Built-in safety features
- β
Lower latency than GPT-4
**Cost Comparison (per 1,000 queries):**
```
Claude Haiku: $1.50
GPT-3.5-Turbo: $2.00
GPT-4-Turbo: $30.00
Gemini Pro: $0.50 (but slower, less reliable)
```
**Rejected Alternatives:**
- **OpenAI GPT-4:** 20x more expensive, unnecessary for our use case
- **GPT-3.5:** Similar price but lower quality responses
- **Open-source models:** Infrastructure complexity, lower quality
- **Gemini:** Inconsistent API, less mature ecosystem
---
### 4. Multi-Agent Architecture
**Decision:** 5 specialized agents orchestrated by LangGraph
**Why:**
- β
**Separation of concerns:** Each agent is expert in one domain
- β
**Parallel development:** Team can work on agents independently
- β
**Scalability:** Easy to add new agents or modify existing ones
- β
**Observability:** Clear boundaries for debugging and logging
- β
**Cost optimization:** Only invoke agents needed for query
**Agent Design:**
```python
class BaseAgent:
"""Base class for all agents"""
- LLM integration (Anthropic Claude)
- Tool management
- Cost tracking
- Structured logging
- Error handling
```
**Why LangGraph?**
- State management out of the box
- Visual workflow representation
- Error recovery and retries
- Easy to test individual nodes
- Active development and community
**Rejected Alternatives:**
- **LangChain Chains:** Less flexible, harder to debug
- **Custom orchestration:** Reinventing the wheel, maintenance burden
- **Single mega-agent:** Poor separation of concerns, higher costs
---
### 5. Vector Database Selection
**Decision:** Qdrant
**Why:**
- β
Native Python client
- β
Excellent documentation
- β
Built-in filtering and search
- β
Cloud offering (easy deployment)
- β
Free tier for development
- β
Fast query performance (< 30ms)
**Rejected Alternatives:**
- **Pinecone:** More expensive, vendor lock-in
- **Weaviate:** More complex setup, heavier resource usage
- **ChromaDB:** Less mature, limited production features
- **pgvector:** Requires PostgreSQL expertise, less optimized for vectors
**Trade-offs Accepted:**
- Vendor dependency (mitigated by standard vector formats)
- Limited ecosystem compared to Pinecone
- Requires separate service (not embedded)
---
### 6. Authentication Strategy
**Decision:** JWT + Google OAuth 2.0
**Why:**
- β
**JWT:** Stateless, scales horizontally, industry standard
- β
**Google OAuth:** Users trust Google, no password management
- β
**Hybrid approach:** Flexibility for users without Google accounts
- β
**Security:** bcrypt for passwords, 1-hour token expiration
**Architecture:**
```
Registration β bcrypt hash β SQLite
Login β JWT token (1 hour) β localStorage
Google OAuth β Exchange code β Create/update user β JWT token
API calls β Verify JWT β Allow/Deny
```
**Rejected Alternatives:**
- **Session-based auth:** Doesn't scale horizontally, requires Redis
- **OAuth-only:** Excludes users without Google accounts
- **Magic links:** Poor UX, email deliverability issues
- **No auth:** Security risk, no user management
**Trade-offs Accepted:**
- Token refresh complexity (1-hour expiration acceptable for MVP)
- localStorage security (acceptable risk vs. httpOnly cookies)
- OAuth setup complexity (worth it for UX improvement)
---
### 7. Frontend Architecture Evolution
**Original Decision:** Streamlit
**Why We Chose Streamlit Initially:**
- β
Rapid prototyping (got MVP in 2 hours)
- β
Python-based (no context switching)
- β
Built-in components
**Why We Switched to HTML/CSS/JS:**
- β **WebSocket requirement:** Streamlit requires persistent WebSocket connection
- β **Deployment complexity:** Needed separate service, more resources
- β **Limited customization:** Hard to match design requirements
- β **Authentication challenges:** Streamlit auth doesn't integrate well with JWT
- β **Cost:** Running two services on Railway vs. one
**Migration Impact:**
- Development time: +8 hours
- Final result: Better UX, single service, $5/month cheaper
- Learning: Premature optimization to stick with Streamlit would have cost more
---
### 8. Deployment Platform Selection
**Decision:** Railway
**Why:**
- β
GitHub integration (auto-deploy on push)
- β
Simple pricing ($5/month vs AWS Free Tier complexity)
- β
Built-in SSL certificates
- β
Easy environment variable management
- β
Good documentation and support
- β
Automatic HTTPS
**Cost Comparison:**
```
Railway: $5-10/month (simple, all-inclusive)
Vercel: Not suitable (no WebSocket/long-running processes)
Heroku: $7/month (deprecating free tier, less features)
AWS EC2: Free tier (complex setup, security management)
DigitalOcean: $6/month (more setup, manual SSL)
Render: $0-7/month (slow cold starts on free tier)
```
**Why We Upgraded to Paid Plan:**
- Free tier: 500 hours/month
- Our app: 24/7 running = 720 hours/month
- Exceeded limit β app went down
- **Lesson:** Factor in deployment costs early
**Rejected Alternatives:**
- **Vercel:** Can't run FastAPI backend with persistent connections
- **AWS Free Tier:** Too complex for MVP, time investment not justified
- **Heroku:** More expensive, less modern
- **Self-hosted VPS:** Maintenance burden, security responsibility
---
## Implementation Details
### 1. RAG System Architecture
**Goal:** Provide agents with relevant technical knowledge for recommendations
**Implementation:**
```python
# Embedding Model: sentence-transformers
model = SentenceTransformer('all-MiniLM-L6-v2')
# 384-dimensional vectors, 1-2ms per query
# Vector Database: Qdrant
collection = qdrant.create_collection(
collection_name="tech_stack_knowledge",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
# Knowledge Base Structure
knowledge_base/
βββ databases.json # 10 documents (PostgreSQL, MongoDB, Redis, etc.)
βββ infrastructure.json # 12 documents (AWS, GCP, Kubernetes, etc.)
βββ security.json # 12 documents (GDPR, HIPAA, security practices)
```
**Search Flow:**
1. User query β Extract technical terms
2. Generate query embedding (2ms)
3. Search Qdrant with metadata filters (25ms)
4. Return top 5 relevant documents
5. Inject into agent prompt
6. Agent generates recommendation
**Performance:**
- Query latency: ~30ms total
- Accuracy: 85-90% relevant results
- Scalability: Handles 100K+ documents
---
### 2. Agent Tool Architecture
**Design Pattern:** Protocol-based tool system
```python
class Tool(Protocol):
name: str
description: str
def execute(self, **kwargs: Any) -> dict[str, Any]: ...
class DatabaseAgent:
tools = [
DatabaseKnowledgeTool(), # RAG search
DatabaseScaleEstimator() # Scale calculations
]
def analyze(self, context):
# 1. Use tools to gather information
knowledge = self.tools[0].execute(query="PostgreSQL")
scale = self.tools[1].execute(dau=100000)
# 2. Create prompt with context
prompt = self._build_prompt(context, knowledge, scale)
# 3. Call LLM
response = self.llm.invoke(prompt)
# 4. Track costs
self.usage_tracker.track(response.usage)
return response
```
**Benefits:**
- Composable: Tools can be mixed and matched
- Testable: Each tool can be unit tested
- Observable: Clear logging at tool boundaries
- Extensible: New tools don't affect agent code
---
### 3. LangGraph Workflow Implementation
**Sequential Pipeline Design:**
```python
workflow = StateGraph(WorkflowState)
# Add nodes
workflow.add_node("parse_query", parse_query_node)
workflow.add_node("database_agent", database_node)
workflow.add_node("infrastructure_agent", infrastructure_node)
workflow.add_node("cost_agent", cost_node)
workflow.add_node("security_agent", security_node)
workflow.add_node("synthesize", synthesize_node)
# Define flow
workflow.set_entry_point("parse_query")
workflow.add_edge("parse_query", "database_agent")
workflow.add_edge("database_agent", "infrastructure_agent")
workflow.add_edge("infrastructure_agent", "cost_agent")
workflow.add_edge("cost_agent", "security_agent")
workflow.add_edge("security_agent", "synthesize")
workflow.add_edge("synthesize", END)
```
**State Management:**
```python
class WorkflowState(TypedDict):
# Input
user_query: str
correlation_id: str
# Parsed context
dau: int | None
workload_type: str
compliance: list[str]
# Agent results
database_result: dict | None
infrastructure_result: dict | None
cost_result: dict | None
security_result: dict | None
# Output
final_recommendation: dict | None
error: str | None
```
**Why Sequential?**
- Infrastructure decisions depend on database choices
- Cost calculations depend on infrastructure selections
- Security recommendations depend on architecture
- Clear data flow, easier to debug
**Future Optimization:**
Could parallelize database + infrastructure agents (independent)
---
### 4. Cost Tracking & Budget Controls
**Implementation:**
```python
class UsageTracker:
def __init__(self):
self.daily_budget = 2.00 # USD
self.daily_queries = 0
self.daily_cost = 0.0
def track(self, usage: Usage):
# Claude Haiku pricing
input_cost = (usage.input_tokens / 1_000_000) * 0.25
output_cost = (usage.output_tokens / 1_000_000) * 1.25
total_cost = input_cost + output_cost
self.daily_cost += total_cost
self.daily_queries += 1
# Alert if over budget
if self.daily_cost > self.daily_budget:
logger.warning(f"Daily budget exceeded: ${self.daily_cost:.2f}")
def can_process_query(self) -> bool:
return self.daily_cost < self.daily_budget
```
**Budget Enforcement:**
```python
@app.post("/recommend")
async def recommend(request: RecommendationRequest):
if not usage_tracker.can_process_query():
raise HTTPException(
status_code=429,
detail=f"Daily budget of ${settings.daily_budget_usd} exceeded"
)
# Process query...
```
**Cost Breakdown (per query):**
```
Parse query: ~500 tokens = $0.0001
Database agent: ~1,700 tokens = $0.0004
Infrastructure: ~2,050 tokens = $0.0005
Cost agent: ~1,100 tokens = $0.0003
Security agent: ~1,400 tokens = $0.0004
Total: ~6,750 tokens = $0.0017
```
**Daily Budget Calculation:**
```
Budget: $2.00/day
Cost per query: $0.0017
Max queries: 1,176/day
Actual limit: 100/day (buffer for safety)
```
---
## Memory Management
The system implements a comprehensive three-tier memory architecture: request-scoped correlation tracking, session-based multi-turn conversations, and persistent long-term memory using Qdrant vector database.
---
### Short-Term Memory (Request + Session Scope)
**1. Request-Scoped Correlation Tracking**
**Implementation:** Correlation IDs for request tracing
```python
import uuid
from contextvars import ContextVar
# Request-scoped correlation ID
correlation_id_var: ContextVar[str] = ContextVar('correlation_id')
@app.middleware("http")
async def add_correlation_id(request: Request, call_next):
correlation_id = str(uuid.uuid4())
correlation_id_var.set(correlation_id)
# Log all events with this ID
logger.info("request_start", correlation_id=correlation_id)
response = await call_next(request)
return response
```
**Purpose:**
- Trace single request through all agents
- Debug issues by correlation ID
- Performance analysis per request
**Example Log Trail:**
```json
{"event": "request_start", "correlation_id": "abc123", "query": "..."}
{"event": "parse_query", "correlation_id": "abc123", "dau": 100000}
{"event": "database_agent_start", "correlation_id": "abc123"}
{"event": "llm_call", "correlation_id": "abc123", "tokens": 1700}
{"event": "database_agent_complete", "correlation_id": "abc123"}
{"event": "request_complete", "correlation_id": "abc123", "duration_ms": 2340}
```
---
**2. Session-Based Multi-Turn Conversations (Implemented)**
**Implementation:** In-memory SessionStore with 30-minute timeout
```python
from typing import TypedDict
import time
import uuid
class SessionData(TypedDict):
user_id: str
conversation_history: list[dict]
extracted_context: dict
completion_percentage: int
ready_for_recommendation: bool
last_activity: float
_sessions: dict[str, SessionData] = {}
SESSION_TIMEOUT = 1800 # 30 minutes
class SessionStore:
"""In-memory short-term conversation memory"""
@staticmethod
def create_session(user_id: str) -> str:
"""Create new conversation session"""
session_id = str(uuid.uuid4())
_sessions[session_id] = {
"user_id": user_id,
"conversation_history": [],
"extracted_context": {},
"completion_percentage": 0,
"ready_for_recommendation": False,
"last_activity": time.time()
}
return session_id
@staticmethod
def add_message(session_id: str, role: str, content: str):
"""Add message to conversation history"""
if session_id not in _sessions:
raise ValueError("Session not found")
session = _sessions[session_id]
session["conversation_history"].append({
"role": role,
"content": content,
"timestamp": time.time()
})
session["last_activity"] = time.time()
@staticmethod
def update_context(session_id: str, context_updates: dict):
"""Update extracted context from conversation"""
if session_id not in _sessions:
raise ValueError("Session not found")
session = _sessions[session_id]
session["extracted_context"].update(context_updates)
# Calculate completion percentage
required_fields = ["dau", "workload_type", "budget", "compliance"]
filled = sum(1 for f in required_fields if f in session["extracted_context"])
session["completion_percentage"] = int((filled / len(required_fields)) * 100)
# Mark ready when 100% complete
if session["completion_percentage"] == 100:
session["ready_for_recommendation"] = True
@staticmethod
def get_session(session_id: str) -> SessionData:
"""Retrieve session data"""
if session_id not in _sessions:
raise ValueError("Session not found")
return _sessions[session_id]
@staticmethod
def cleanup_expired_sessions():
"""Remove sessions older than timeout"""
current_time = time.time()
expired = [
sid for sid, session in _sessions.items()
if current_time - session["last_activity"] > SESSION_TIMEOUT
]
for sid in expired:
del _sessions[sid]
```
**Conversation Flow Example:**
1. **User:** "I need a tech stack for my project"
2. **Agent:** "How many daily active users do you expect?"
3. **User:** "Around 100K users"
4. **Agent:** "What type of data will you be storing?"
5. **User:** "User profiles, chat messages, and media files"
6. **Context Updates:** `{"dau": 100000, "workload_type": "realtime", "data_type": "mixed"}`
7. **Completion:** `completion_percentage` increases from 0% β 75%
8. **Agent:** "What's your monthly budget?"
9. **User:** "$500/month"
10. **Context Updates:** `{"budget": 500}`
11. **Ready:** `ready_for_recommendation = True`, generates full recommendation
**Enabled Multi-Turn Queries:**
- "What if I increase the budget to $1000?" β Updates context, regenerates recommendations
- "Can you recommend alternatives to PostgreSQL?" β Refines database recommendations
- "How would this change for 1M users instead?" β Re-runs all agents with new scale
**Production Note:** For multi-instance deployments, migrate from in-memory SessionStore to Redis for persistence across server restarts.
---
### Long-Term Memory (User History - Implemented with Qdrant)
**Implementation:** Persistent storage using Qdrant vector database with semantic search
**Three Qdrant Collections:**
```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
class UserMemoryStore:
"""Long-term memory with semantic search capabilities"""
def __init__(self):
self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2') # 384-dim
self.client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)
# Initialize collections
self._init_collections()
def _init_collections(self):
"""Create three collections for user memory"""
# 1. Users collection - Authentication + stats
self.client.create_collection(
collection_name="users",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
# 2. User queries - Query history with embeddings
self.client.create_collection(
collection_name="user_queries",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
# 3. User feedback - Feedback on recommendations
self.client.create_collection(
collection_name="user_feedback",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
def store_query(self, user_id: str, query: str, recommendations: dict,
tokens_used: int, cost_usd: float):
"""Store query with semantic embedding for similarity search"""
# Generate 384-dimensional embedding
query_embedding = self.embedding_model.encode(query).tolist()
# Store with vector for semantic search
self.client.upsert(
collection_name="user_queries",
points=[PointStruct(
id=str(uuid.uuid4()),
vector=query_embedding,
payload={
"user_id": user_id,
"query": query,
"recommendations": recommendations,
"tokens_used": tokens_used,
"cost_usd": cost_usd,
"timestamp": time.time()
}
)]
)
# Update user statistics
self._update_user_stats(user_id, tokens_used, cost_usd)
def search_similar_queries(self, user_id: str, query: str, limit: int = 5):
"""Find semantically similar past queries"""
# Generate query embedding
query_embedding = self.embedding_model.encode(query).tolist()
# Search with user filter
results = self.client.search(
collection_name="user_queries",
query_vector=query_embedding,
query_filter={"must": [{"key": "user_id", "match": {"value": user_id}}]},
limit=limit
)
return results # Returns queries with similarity scores (0-1)
def get_user_history(self, user_id: str, limit: int = 10):
"""Get recent query history for user"""
results = self.client.scroll(
collection_name="user_queries",
scroll_filter={"must": [{"key": "user_id", "match": {"value": user_id}}]},
limit=limit,
with_payload=True,
with_vectors=False
)
return results[0] # List of query records
def _update_user_stats(self, user_id: str, tokens: int, cost: float):
"""Update cumulative user statistics"""
# Fetch current stats
user = self.client.scroll(
collection_name="users",
scroll_filter={"must": [{"key": "user_id", "match": {"value": user_id}}]},
limit=1
)[0]
if user:
# Update existing user
current_queries = user[0].payload.get("total_queries", 0)
current_cost = user[0].payload.get("total_cost_usd", 0.0)
self.client.set_payload(
collection_name="users",
payload={
"total_queries": current_queries + 1,
"total_cost_usd": current_cost + cost,
"last_query": time.time()
},
points=[user[0].id]
)
```
**Collection Details:**
**1. users collection:**
- User authentication data (email, hashed password, OAuth tokens)
- Usage statistics (total_queries, total_cost_usd)
- User preferences and settings
**2. user_queries collection:**
- Complete query history with 384-dim semantic embeddings
- Recommendations returned for each query
- Token usage and cost tracking
- Timestamp for temporal filtering
**3. user_feedback collection:**
- User feedback on recommendations (helpful/not helpful)
- Rating scores (1-5 stars)
- Free-text comments
- Used for continuous improvement
**Enabled Features:**
1. **Query History:** "You asked something similar 2 days ago for a chat app"
2. **Semantic Search:** Find related queries even with different wording:
- Query: "real-time messaging app"
- Similar: "chat application with WebSocket" (similarity: 0.87)
3. **User Statistics:** Track total queries, cumulative cost per user
4. **Feedback Loop:** Store and analyze user feedback on recommendations
5. **Cost Tracking:** Monitor per-user API costs for budget controls
6. **Personalization:** Recommend technologies user has used successfully before
**Performance:**
- Embedding generation: ~2ms per query
- Semantic search: ~20-30ms (Qdrant)
- Storage cost: ~1KB per query
- 1M queries = ~1GB storage
**Privacy Considerations:**
- User queries may contain sensitive project information
- Implement data retention policies (e.g., delete after 90 days)
- Allow users to delete their history (GDPR right to erasure)
- Encrypt sensitive fields at rest
---
### Memory Architecture Summary
**Three Tiers Working Together:**
```
βββββββββββββββββββββββββββββββββββββββββββ
β Request Scope (Correlation ID) β
β β’ Single request tracing β
β β’ Performance monitoring β
β β’ Error debugging β
β Duration: Single request (~3 seconds) β
βββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββ
β Session Scope (SessionStore) β
β β’ Multi-turn conversations β
β β’ Context accumulation β
β β’ Completion tracking β
β Duration: 30 minutes (timeout) β
βββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββ
β Long-Term (Qdrant Vector DB) β
β β’ Query history (semantic search) β
β β’ User statistics & preferences β
β β’ Feedback collection β
β Duration: Permanent (90-day retention) β
βββββββββββββββββββββββββββββββββββββββββββ
```
**Benefits of This Architecture:**
1. **Request Tracing:** Debug any issue using correlation ID
2. **Multi-Turn Dialogs:** Gather requirements through conversation
3. **Semantic Memory:** "Show me similar queries I asked before"
4. **Personalization:** Learn user preferences over time
5. **Cost Control:** Track per-user spending
6. **Continuous Improvement:** Analyze feedback to improve recommendations
---
## Authentication & Security
### Why Authentication Was Necessary
**Initial Plan:** Public API, no auth
**Problems Encountered:**
1. **Abuse risk:** Anyone could make unlimited requests β cost spiral
2. **No user tracking:** Can't implement rate limiting per user
3. **No personalization:** Can't remember user preferences
4. **No admin features:** Can't manage users or view feedback
5. **Railway costs:** Need to control who uses the service
**Decision Point:** Add authentication after 1 week of development
---
### Authentication Architecture
**JWT Implementation:**
```python
from datetime import datetime, timedelta
from jose import jwt
import bcrypt
SECRET_KEY = os.getenv("SECRET_KEY", secrets.token_urlsafe(32))
ALGORITHM = "HS256"
TOKEN_EXPIRE_HOURS = 1
def create_access_token(data: dict) -> str:
to_encode = data.copy()
expire = datetime.utcnow() + timedelta(hours=TOKEN_EXPIRE_HOURS)
to_encode.update({"exp": expire})
encoded_jwt = jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)
return encoded_jwt
def verify_token(token: str) -> dict:
try:
payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
return payload
except jwt.ExpiredSignatureError:
raise HTTPException(401, "Token expired")
except jwt.JWTError:
raise HTTPException(401, "Invalid token")
```
**Password Hashing:**
```python
def hash_password(password: str) -> str:
salt = bcrypt.gensalt()
hashed = bcrypt.hashpw(password.encode(), salt)
return hashed.decode()
def verify_password(plain_password: str, hashed_password: str) -> bool:
return bcrypt.checkpw(
plain_password.encode(),
hashed_password.encode()
)
```
**Google OAuth Flow:**
```python
# 1. Generate auth URL with state
auth_url, state = generate_google_auth_url(
client_id=settings.google_client_id,
redirect_uri="http://localhost:8000/auth/google/callback"
)
# 2. User authenticates with Google
# (happens on Google's servers - password never touches our app)
# 3. Google redirects with code
@app.get("/auth/google/callback")
async def google_callback(code: str, state: str):
# Exchange code for access token
token = await exchange_code_for_token(code)
# Get user info from Google
user_info = await get_google_user_info(token)
# Create/update user in our DB
user = get_or_create_user(user_info["email"])
# Generate JWT for our app
jwt_token = create_access_token({"sub": user.email})
# Redirect to app with token
return RedirectResponse(f"/?token={jwt_token}")
```
---
### Security Measures Implemented
**1. Rate Limiting (SlowAPI)**
The system implements comprehensive rate limiting using **SlowAPI**, a FastAPI extension for rate limiting based on the token bucket algorithm with in-memory storage.
**Implementation Architecture:**
```python
# backend/src/api/main.py
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
# Initialize limiter with IP-based tracking
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
# Register exception handler for 429 responses
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
```
**Configuration (backend/src/core/config.py):**
```python
class Settings(BaseSettings):
# Rate Limiting
rate_limit_demo: str = "50/hour" # Demo/unauthenticated users
rate_limit_authenticated: str = "100/hour" # Authenticated users
daily_query_cap: int = 100 # Daily query limit per user
```
**Applied to Endpoints:**
```python
@app.post("/recommend")
@limiter.limit(settings.rate_limit_demo) # 50 requests/hour by IP
async def get_recommendation(request: Request, req: RecommendationRequest):
# Endpoint logic
pass
@app.post("/generate-diagram")
@limiter.limit(settings.rate_limit_demo)
async def generate_architecture_diagram(request: Request, req: dict):
pass
@app.post("/conversation/start")
@limiter.limit(settings.rate_limit_demo)
async def start_conversation(request: Request):
pass
```
**How It Works:**
1. **IP-Based Tracking**: `get_remote_address` extracts client IP from request headers
2. **Sliding Window Algorithm**: Tracks requests per IP in a time window (e.g., last hour)
3. **Automatic Enforcement**: When limit exceeded, returns HTTP 429 (Too Many Requests) with `Retry-After` header
4. **Per-Endpoint Limits**: Each decorated endpoint maintains independent rate limits
5. **In-Memory Storage**: Fast lookup with minimal latency (suitable for single-instance deployments)
**Benefits:**
- **Cost Control**: Prevents LLM API cost spiral from excessive requests
- **Abuse Prevention**: Protects against denial-of-service attempts
- **Fair Resource Allocation**: Ensures equitable access among users
- **Production-Ready**: Battle-tested library with minimal performance overhead
- **Configurable**: Different limits for demo vs authenticated users (50/hour vs 100/hour)
**Limitations & Future Enhancements:**
- **In-Memory Storage**: Limits reset on server restart; consider Redis backend for production clusters
- **IP-Based Only**: Sophisticated users can bypass with IP rotation; consider user-based limits
- **No Distributed Sync**: Multi-instance deployments need shared state (Redis/Memcached)
**2. CORS Configuration**
```python
from fastapi.middleware.cors import CORSMiddleware
app.add_middleware(
CORSMiddleware,
allow_origins=["https://your-domain.com"], # Production
allow_credentials=True,
allow_methods=["GET", "POST"],
allow_headers=["Authorization", "Content-Type"],
)
```
**3. Input Validation**
```python
from pydantic import BaseModel, Field
class RecommendationRequest(BaseModel):
query: str = Field(..., min_length=10, max_length=1000)
dau: int | None = Field(None, ge=0, le=10_000_000)
budget_target: float | None = Field(None, ge=0)
```
**4. SQL Injection Prevention**
- Using SQLAlchemy ORM (parameterized queries)
- No raw SQL strings
**5. XSS Prevention**
- Frontend sanitizes all user input
- API returns JSON (not HTML)
- Content-Type headers set correctly
**6. CSRF Protection**
- OAuth state parameter (random token)
- JWT in Authorization header (not cookies)
---
## Challenges & Solutions
### Challenge 1: sentence-transformers Library Issues
**Problem:**
```
ImportError: cannot import name 'cached_download' from 'huggingface_hub'
```
**Root Cause:**
- `sentence-transformers` 2.x incompatible with `transformers` 4.x
- NumPy 2.0 breaking changes
- Conflicting dependency versions
**Investigation Process:**
1. Checked GitHub issues β Common problem
2. Tested different versions locally
3. Identified NumPy 2.0 as culprit
**Solution:**
```bash
# pyproject.toml
[project]
dependencies = [
"sentence-transformers>=2.2.2,<3.0.0",
"numpy>=1.21.0,<2.0.0", # Pin to NumPy 1.x
"transformers>=4.30.0",
]
```
**Lesson Learned:**
- Pin major versions in production
- Test dependencies before upgrading
- Check compatibility matrices
- Consider using Poetry/PDM for better dependency resolution
---
### Challenge 2: Railway Free Tier Limitations
**Problem:**
App went down with "exceeded usage limit" error
**Investigation:**
```
Free tier: 500 hours/month
Our usage: 24/7 Γ 30 days = 720 hours/month
Overage: 220 hours β app suspended
```
**Cost Analysis:**
```
Option 1: Hobby plan ($5/month) β Unlimited hours
Option 2: Sleep on inactivity β Complex, poor UX
Option 3: Migrate to AWS Free Tier β More complex, time-consuming
```
**Decision:** Upgrade to Hobby plan ($5/month)
**Why:**
- Simplest solution
- Predictable costs
- Allows continuous availability
- Still cheaper than AWS (when factoring in time)
**Lesson Learned:**
- Factor in deployment costs from day 1
- Free tiers have limits (500 hours = 20 days, not 30)
- $5/month is worth avoiding deployment headaches
---
### Challenge 3: Streamlit Deployment Complexity
**Problem:**
Streamlit requires WebSocket connection + separate service
**Initial Architecture:**
```
Railway Service 1: FastAPI (Backend)
Railway Service 2: Streamlit (Frontend)
```
**Issues:**
1. Two services = 2Γ cost
2. WebSocket persistence issues
3. Complex CORS configuration
4. Streamlit auth doesn't work with JWT
5. Slow cold starts
**Solution:** Rewrite frontend in vanilla HTML/CSS/JS
**Migration:**
- Time investment: 8 hours
- Cost savings: $5/month (50% reduction)
- Performance improvement: 2Γ faster page loads
- Deployment: Single service
**Lesson Learned:**
- Don't optimize too early for development speed
- Consider deployment implications upfront
- Sometimes simpler technology (vanilla JS) is better than frameworks
---
### Challenge 4: Semantic Search Accuracy
**Problem:**
RAG returning irrelevant results for some queries
**Example:**
```
Query: "GDPR compliance requirements"
Top result: "Kubernetes container orchestration" (wrong!)
```
**Root Cause:**
- General embeddings not domain-specific
- No metadata filtering
- Insufficient context in queries
**Solutions Implemented:**
**1. Add Metadata Filtering:**
```python
results = vectorstore.search(
query="GDPR compliance",
limit=5,
filters={"category": "security"} # Only search security docs
)
```
**2. Query Expansion:**
```python
def expand_query(query: str) -> str:
"""Add domain context to improve semantic search"""
expansions = {
"GDPR": "GDPR data protection privacy compliance EU",
"database": "database SQL NoSQL storage data",
"kubernetes": "kubernetes k8s container orchestration deployment"
}
for term, expansion in expansions.items():
if term.lower() in query.lower():
query = f"{query} {expansion}"
return query
```
**3. Re-ranking Results:**
```python
def rerank_results(query: str, results: list[dict]) -> list[dict]:
"""Use simple keyword matching to rerank vector search results"""
keywords = set(query.lower().split())
for result in results:
# Count keyword matches
text_lower = result["text"].lower()
matches = sum(1 for keyword in keywords if keyword in text_lower)
result["keyword_score"] = matches
# Sort by combined score
return sorted(
results,
key=lambda r: r["score"] * 0.7 + r["keyword_score"] * 0.3,
reverse=True
)
```
**Results:**
- Accuracy improved from ~70% to ~90%
- Query latency increased slightly (+5ms)
- User satisfaction improved
**Future Improvements:**
- Fine-tune embeddings on tech stack domain
- Use cross-encoder for re-ranking
- Implement hybrid search (BM25 + vectors)
---
### Challenge 5: Cost Control at Scale
**Problem:**
How to prevent runaway costs if app goes viral?
**Implemented Controls:**
**1. Daily Budget Cap:**
```python
DAILY_BUDGET_USD = 2.00
if usage_tracker.daily_cost >= DAILY_BUDGET_USD:
raise HTTPException(429, "Daily budget exceeded")
```
**2. Per-User Rate Limiting:**
```python
@limiter.limit("10/hour") # Per user (based on JWT)
async def recommend(request: Request, current_user: User):
pass
```
**3. Query Complexity Limits:**
```python
class RecommendationRequest(BaseModel):
query: str = Field(..., max_length=1000) # Prevent huge prompts
```
**4. Monitoring & Alerts:**
```python
if usage_tracker.daily_cost > DAILY_BUDGET_USD * 0.8:
send_email_alert(
subject="Budget Alert: 80% of daily limit",
message=f"Current: ${usage_tracker.daily_cost:.2f}"
)
```
**Cost Projections:**
```
Scenario 1: Normal usage (100 users/day)
100 queries Γ $0.0017 = $0.17/day = $5/month
Scenario 2: Viral (10,000 users/day)
Capped at daily budget: $2/day = $60/month (acceptable)
Scenario 3: Attack (100,000 requests/day)
Rate limiting prevents: Max 100 queries/day/user
Even 1000 users: $2/day (protected)
```
---
## Deployment Journey
### Timeline: Development to Production
**Week 1: Prototype (4 Core Agents)**
- Days 1-2: Built 4 agents with mock data (Database, Infrastructure, Cost, Security)
- Day 3: Integrated Claude API
- Days 4-5: LangGraph orchestration
- **Outcome:** Working MVP, $0 spent
**Week 2: RAG System**
- Days 1-2: Set up Qdrant, created knowledge base
- Day 3: Integrated vector search into agents
- Days 4-5: Tested and refined search accuracy
- **Outcome:** 90% search accuracy
**Week 3: API + Streamlit UI**
- Days 1-2: FastAPI endpoints
- Days 3-4: Streamlit UI
- Day 5: Testing end-to-end
- **Outcome:** Functional web app
**Week 4: Authentication + Redesign**
- Days 1-2: Added JWT auth
- Day 3: Integrated Google OAuth
- Days 4-5: Rewrote UI in vanilla JS (Streamlit issues)
- **Outcome:** Production-ready single service
**Week 5: Deployment**
- Day 1: Deployed to Railway free tier
- Day 2: App went down (exceeded free tier)
- Day 3: Upgraded to paid plan, fixed issues
- Days 4-5: Monitoring, bug fixes
- **Outcome:** Stable production deployment
**Week 6: Memory & Conversation Agent**
- Days 1-2: Implemented Conversation Manager Agent (5th agent)
- Day 3: Built SessionStore for multi-turn conversations (30-min timeout)
- Day 4: Implemented Qdrant-based long-term memory (3 collections)
- Day 5: Integrated semantic search for query history (384-dim vectors)
- **Outcome:** Intelligent multi-turn dialogues with persistent memory
---
### Deployment Configuration
**Railway Configuration (`railway.toml`):**
```toml
[build]
builder = "NIXPACKS"
[deploy]
startCommand = "python -m backend.src.api.main"
healthcheckPath = "/health"
healthcheckTimeout = 30
[[services]]
name = "tech-stack-advisor"
[services.env]
PORT = "8000"
ENVIRONMENT = "production"
```
**Environment Variables (Production):**
```bash
# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
# Qdrant
QDRANT_URL=https://xxx.qdrant.io
QDRANT_API_KEY=xxx
# Google OAuth
GOOGLE_CLIENT_ID=xxx.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=xxx
GOOGLE_REDIRECT_URI=https://your-domain.com/auth/google/callback
# Security
SECRET_KEY=xxx # For JWT signing
# Monitoring
LOG_LEVEL=INFO
ENVIRONMENT=production
```
**Dockerfile (for reference, not using currently):**
```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY pyproject.toml .
RUN pip install -e .
# Copy application
COPY backend/ backend/
COPY knowledge_base/ knowledge_base/
# Run application
CMD ["python", "-m", "backend.src.api.main"]
```
---
### Deployment Checklist
Pre-deployment:
- [x] All environment variables set
- [x] Database migrations tested
- [x] API keys valid and funded
- [x] Rate limiting configured
- [x] Error handling comprehensive
- [x] Logging in place
- [x] Health check endpoint working
Post-deployment:
- [x] SSL certificate active
- [x] DNS configured correctly
- [x] Monitoring alerts set up
- [x] Backup strategy in place
- [x] Cost limits configured
- [x] Performance benchmarks met
---
### Monitoring & Observability
**Health Check Endpoint:**
```python
@app.get("/health")
async def health():
return {
"status": "healthy",
"timestamp": datetime.utcnow().isoformat(),
"agents_loaded": len(orchestrator.agents),
"uptime_seconds": time.time() - app.start_time
}
```
**Structured Logging:**
```python
import structlog
logger = structlog.get_logger()
# Every log includes:
logger.info(
"recommendation_generated",
correlation_id=correlation_id,
duration_ms=duration,
tokens_used=tokens,
cost_usd=cost
)
```
**Prometheus Metrics Endpoint:**
The system exposes Prometheus-format metrics at `/metrics/prometheus` for integration with monitoring systems like Grafana Cloud:
```bash
curl http://localhost:8000/metrics/prometheus
```
**HTTP Metrics:**
- `http_requests_total{method, endpoint, status_code}` - Total HTTP requests with labels
- `http_request_duration_seconds{method, endpoint}` - Request duration histogram (p50, p95, p99)
**LLM Usage & Cost Tracking:**
- `llm_tokens_total{agent, token_type}` - Token usage by agent (input/output)
- `llm_cost_usd_total{agent}` - Cumulative cost per agent
- `llm_requests_total{agent, status}` - LLM request count by status
- `llm_daily_tokens` - Daily token usage gauge
- `llm_daily_cost_usd` - Daily cost in USD gauge
- `llm_daily_queries` - Daily query count gauge
**Application Metrics:**
- `active_conversation_sessions` - Active conversation sessions count
- `user_registrations_total{oauth_provider}` - User registrations by OAuth provider
- `user_logins_total{oauth_provider}` - User logins by provider
- `recommendations_total{status, authenticated}` - Recommendations generated
**Grafana Cloud Integration:**
See [GRAFANA_CLOUD_SETUP.md](./GRAFANA_CLOUD_SETUP.md) for complete setup guide. The free tier provides:
- 10,000 metric series
- 14-day retention
- Real-time dashboards
- Alerting capabilities
- $0/month cost
**Example Queries:**
```promql
# Request rate
rate(http_requests_total[5m])
# P95 latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Daily cost tracking
llm_daily_cost_usd
# Error rate
sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
```
---
## Performance & Scalability
### Current Performance Metrics
**Latency Breakdown (typical request):**
```
Parse query: 5ms
Database agent: 800ms (LLM call)
Infrastructure: 850ms (LLM call)
Cost agent: 750ms (LLM call)
Security agent: 900ms (LLM call)
Synthesize: 10ms
Total: ~3,315ms (3.3 seconds)
```
**Bottlenecks:**
1. LLM API calls (sequential): 3,300ms / 3,315ms = 99.5% of time
2. Network latency to Anthropic: ~50-100ms per call
3. All other operations: < 20ms
**Optimization Opportunities:**
**1. Parallel Agent Execution:**
```python
# Current: Sequential (3,300ms)
db β infra β cost β security
# Optimized: Parallel (900ms - longest agent)
ββββ db (800ms)
ββββ infra (850ms)
ββββ cost (750ms)
ββββ security (900ms)
Improvement: 3.7Γ faster
```
**Implementation:**
```python
import asyncio
async def parallel_agents(state):
results = await asyncio.gather(
database_agent.analyze(state),
infrastructure_agent.analyze(state),
cost_agent.analyze(state),
security_agent.analyze(state)
)
return results
# Expected latency: ~900ms (longest agent)
```
**Why Not Implemented Yet:**
- Infrastructure decisions should consider database choices
- Cost depends on infrastructure
- Sequential flow easier to debug
- MVP optimized for correctness, not speed
---
### Scalability Analysis
**Current Capacity:**
```
Single Railway instance:
- CPU: 1 vCPU
- RAM: 512MB
- Concurrent requests: ~10 (async)
- Throughput: ~180 requests/hour (3.3s per request Γ 10 concurrent)
```
**Scaling Strategy:**
**Phase 1: Vertical Scaling (< 100 users/day)**
- Current: 512MB RAM, 1 vCPU
- Upgrade: 2GB RAM, 2 vCPUs
- Cost: +$10/month
- Capacity: 4Γ more concurrent requests
**Phase 2: Horizontal Scaling (100-1000 users/day)**
- Deploy multiple Railway instances
- Add load balancer
- **Challenge:** Shared state (JWT validation, rate limiting)
- **Solution:** Redis for shared rate limit counters
**Phase 3: Optimization (1000+ users/day)**
- Parallel agent execution (3.7Γ faster)
- Response caching (Redis)
- CDN for static assets
- Database connection pooling
**Cost Projections:**
```
100 users/day: $5/month (current)
500 users/day: $15/month (1 instance, optimized)
1000 users/day: $35/month (2 instances + Redis)
5000 users/day: $100/month (5 instances + Redis + optimizations)
```
---
### Caching Strategy (Future)
**What to Cache:**
**1. User Queries (Semantic Matching):**
```python
# If similar query asked recently, return cached result
cache_key = hash(query_embedding)
if cached := redis.get(f"query:{cache_key}"):
return cached
```
**2. RAG Results:**
```python
# Cache vector search results
cache_key = f"rag:{query}:{category}"
if cached := redis.get(cache_key):
return cached
# Cache for 1 hour (tech stacks don't change fast)
redis.setex(cache_key, 3600, results)
```
**3. Cost Data:**
```python
# Cloud pricing changes rarely
cache_key = "pricing:aws"
if cached := redis.get(cache_key):
return cached
# Cache for 24 hours
redis.setex(cache_key, 86400, pricing_data)
```
**Expected Impact:**
- Cache hit rate: 30-40% (similar queries)
- Latency reduction: 95% (3.3s β 0.2s for cached)
- Cost savings: 30-40% (fewer LLM calls)
---
## Lessons Learned
### Technical Lessons
**1. Start Simple, Scale When Needed**
- β
Vanilla JS served us better than React
- β
SQLite sufficient for MVP (100s of users)
- β
Single server until you need horizontal scaling
- β Don't prematurely optimize for millions of users
**2. Cost Management is Feature #1**
- Budget caps prevented surprises
- Cost tracking built in from day 1
- Railway paid plan was correct choice
- Monitoring > Prevention
**3. Authentication Complexity**
- OAuth is worth the setup time
- JWT is simpler than sessions for APIs
- Security can't be bolted on later
**4. Dependencies Matter**
- Pin versions in production
- Test upgrades before deploying
- NumPy 2.0 broke sentence-transformers
**5. Multi-Agent Architecture Scales**
- Easy to modify individual agents
- Clear boundaries for debugging
- Parallel execution possible (future)
---
### Process Lessons
**1. Documentation While Building**
- Wrote 8 comprehensive docs
- Saved hours in onboarding/debugging
- GitHub README as marketing
**2. Incremental Deployment**
- Week 1: Local only
- Week 2: Development environment
- Week 3: Free tier
- Week 4: Production
**3. User Feedback Early**
- Simplest UI (Streamlit) first
- Got feedback before full rewrite
- Saved time by validating concept
**4. Cost Transparency**
- Tracked every $0.001
- Users appreciate knowing costs
- Built trust with budget controls
---
### What We'd Do Differently
**1. Plan Deployment Earlier**
- Should have researched hosting options in week 1
- Free tier limits should be known upfront
- $5/month is nothing compared to development time
**2. Vanilla JS from Start**
- Streamlit was fast for prototype
- But migration took 8 hours
- Could have saved time
**3. Parallel Agents from Start**
- Architecture supports it
- Would be 3.7Γ faster
- Not critical for MVP but would be nice
**4. Better Knowledge Base**
- 34 documents is bare minimum
- Should have 100+ documents
- Quality > Quantity, but need both
---
## Conclusion
### Project Status
**β
Production-Ready System**
- 5 specialized AI agents
- Modern web UI with authentication
- RAG-powered recommendations
- Deployed on Railway
- < 4 second response time
- $0.0017 per recommendation
**Current Users:**
- Personal portfolio project
- Testing with 10-20 users
- 99.9% uptime
- Positive feedback
---
### Future Roadmap
**β
Recently Completed**
- [x] **Multi-turn conversations** - SessionStore with 30-minute timeout
- [x] **User query history** - Qdrant-based semantic search (384-dim vectors)
- [x] **Long-term memory** - Three collections (users, user_queries, user_feedback)
- [x] **Conversation Manager Agent** - 5th specialized agent for intelligent dialogues
- [x] **Personalized recommendations** - Based on user history and preferences
**Phase 1: Optimization (1-2 months)**
- [ ] Parallel agent execution (3.7Γ faster response time)
- [ ] Response caching with Redis (improve hit rate to 30-40%)
- [ ] Expand knowledge base to 100+ documents
- [ ] Fine-tune embeddings for tech domain
- [ ] Migrate SessionStore from in-memory to Redis for multi-instance support
**Phase 2: Features (2-3 months)**
- [ ] Comparison mode (compare 2 tech stacks side-by-side)
- [ ] Export to architecture diagrams (Mermaid, PlantUML)
- [ ] Historical trend analysis ("Show how my queries evolved")
- [ ] Technology recommendation confidence scores
- [ ] Integration with GitHub repos (analyze existing stack)
**Phase 3: Scale (3-6 months)**
- [ ] Horizontal scaling (multiple Railway instances with load balancer)
- [ ] Enterprise features (team workspaces, shared query history)
- [ ] API access for developers (REST API with rate limits)
- [ ] Premium tier ($10/month for unlimited queries)
- [ ] Advanced analytics dashboard
---
### Open Source Potential
**What's Ready:**
- β
Clean, documented codebase
- β
Comprehensive documentation (8 files)
- β
Working deployment configuration
- β
Example .env file
**What Needs Work:**
- [ ] Contributing guidelines
- [ ] Issue templates
- [ ] CI/CD pipeline (GitHub Actions)
- [ ] Docker Compose for local dev
**Licensing Considerations:**
- Custom License (non-commercial use allowed)
- **Commercial use requires license** - Contact for pricing
- Free for:
- Personal projects
- Educational purposes
- Non-profit organizations
- Open source contributions
- **Commercial use prohibited without written agreement** (hosting costs require compensation)
- Encourage non-commercial contributions
---
### Contact & Links
**Live Demo:** https://ranjana-tech-stack-advisor-production.up.railway.app
**Author:** Ranjana Rajendran
- GitHub: [@ranjanarajendran](https://github.com/ranjanarajendran)
- LinkedIn: [ranjana-rajendran](https://www.linkedin.com/in/ranjana-rajendran-9b3bb73)
- Email: ranjana.rajendran@gmail.com
**Tech Stack:**
- Backend: Python 3.11, FastAPI, LangGraph
- Frontend: HTML/CSS/JavaScript
- AI: Anthropic Claude (Haiku), sentence-transformers
- Database: Qdrant (vectors), SQLite (users)
- Deployment: Railway
**Repository:** (Private - available upon request)
---
**Built to learn. Deployed to production. Ready to scale.**