# Tech Stack Advisor - FastAPI Implementation
## Completed: Production-Ready REST API
### Overview
A FastAPI-based REST API that exposes the multi-agent tech stack recommendation system with rate limiting, monitoring, and comprehensive error handling.
---
## Files Created
### 1. **API Models** (`backend/src/api/models.py`)
Pydantic models for request/response validation:
**Request Model:**
```python
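from pydantic import BaseModel, Field

class RecommendationRequest(BaseModel):
    query: str = Field(..., min_length=10, max_length=1000)  # 10-1000 chars
    dau: int | None = Field(None, ge=0)  # Optional DAU override
    budget_target: float | None = Field(None, ge=0)  # Optional budget target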
```
**Response Model:**
```python
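from typing import Any

from pydantic import BaseModel

class RecommendationResponse(BaseModel):
    status: str  # "success" or "error"
    query: str
    correlation_id: str
    parsed_context: ParsedContext | None = None  # ParsedContext: another model in this module (not shown)
    recommendations: dict[str, Any] | None = None
    error: str | None = None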
```
**Monitoring Models:**
- `HealthResponse` - Service health status
- `MetricsResponse` - Usage statistics and cost tracking
---
### 2. **FastAPI Application** (`backend/src/api/main.py`)
Complete FastAPI application with:
- Lifespan management for orchestrator initialization
- Rate limiting with `slowapi`
- CORS middleware
- Request logging middleware
- Global exception handling
---
## API Endpoints
### **Public Endpoints**
#### **GET /** - Root / Web UI
Serves the main HTML/CSS/JavaScript web application, including its login UI.
```bash
curl http://localhost:8000/
```
---
#### **GET /health** - Health Check
Returns service status, version, number of loaded agents, and uptime
```bash
curl http://localhost:8000/health
```
**Response:**
```json
{
"status": "healthy",
"version": "0.1.0",
"agents_loaded": 4,
"uptime_seconds": 123.45
}
```
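A minimal sketch of how such a payload could be assembled, assuming the app records a start time once at launch and knows which agents the orchestrator registered. `START_TIME` and `build_health` are hypothetical names, not taken from `main.py`.

```python
import time

# Hypothetical: captured once when the process starts (e.g. in the lifespan handler)
START_TIME = time.monotonic()


def build_health(agents: list[str], version: str = "0.1.0") -> dict:
    """Return a /health-style payload; field names mirror the example response."""
    return {
        "status": "healthy",
        "version": version,
        "agents_loaded": len(agents),
        # monotonic clock is unaffected by wall-clock changes (NTP, DST)
        "uptime_seconds": round(time.monotonic() - START_TIME, 2),
    }


print(build_health(["database", "infrastructure", "cost", "security"]))
```

Using `time.monotonic()` rather than `time.time()` keeps `uptime_seconds` from jumping if the system clock is adjusted.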
---
### **Authentication Endpoints**
#### **POST /auth/register** - User Registration
Create a new user account
```bash
curl -X POST http://localhost:8000/auth/register \
-H "Content-Type: application/json" \
-d '{
"email": "user@example.com",
"password": "securepassword123"
}'
```
**Response:**
```json
{
"message": "User registered successfully",
"email": "user@example.com"
}
```
---
#### **POST /auth/login** - User Login
Authenticate and receive JWT token
```bash
curl -X POST http://localhost:8000/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "user@example.com",
"password": "securepassword123"
}'
```
**Response:**
```json
{
"access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"token_type": "bearer"
}
```
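The sample `access_token` starts with `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9`, which base64url-decodes to `{"alg":"HS256","typ":"JWT"}`, i.e. an HMAC-SHA256-signed JWT. The stdlib sketch below shows what verifying such a token involves. The secret, the `sub`/`exp` claims, and all helper names are assumptions for illustration; a maintained library such as PyJWT is the safer choice in the real service.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT (used here only to produce a test token)."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Check algorithm, signature and `exp`; return the claims or raise ValueError."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    # constant-time comparison avoids leaking how many bytes matched
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


secret = b"hypothetical-secret"
token = sign_hs256({"sub": "user@example.com", "exp": int(time.time()) + 3600}, secret)
print(verify_hs256(token, secret)["sub"])  # user@example.com
```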
---
#### **POST /auth/logout** - User Logout
Log out the user (the client discards its stored token)
```bash
curl -X POST http://localhost:8000/auth/logout \
-H "Authorization: Bearer <token>"
```
---
#### **GET /auth/google/login** - Google OAuth
Initiate Google OAuth 2.0 flow
```bash
curl http://localhost:8000/auth/google/login
```
Redirects to Google login page.
---
#### **GET /auth/google/callback** - Google OAuth Callback
Handles the redirect back from Google after sign-in (invoked by Google, not called directly)
---
### **Protected Endpoints** (Require JWT Token)
#### **POST /recommend** - Get Recommendations
Main endpoint for tech stack recommendations (requires authentication)
**Request:**
```bash
curl -X POST http://localhost:8000/recommend \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <your-jwt-token>" \
-d '{
"query": "Building a real-time chat app for 100K daily users",
"dau": 100000,
"budget_target": 500
}'
```
**Response (Success):**
```json
{
"status": "success",
"query": "Building a real-time chat app for 100K daily users",
"correlation_id": "uuid-here",
"parsed_context": {
"dau": 100000,
"qps": 25,
"data_type": "structured",
"workload_type": "realtime",
"data_sensitivity": "medium",
"compliance": []
},
"recommendations": {
"database": {
"agent": "database",
"scale_info": {
"tier": "medium",
"cache_recommended": true
},
"recommendations": "..."
},
"infrastructure": { "..." },
"cost": { "..." },
"security": { "..." }
}
}
```
**Response (Error):**
```json
{
"status": "error",
"query": "...",
"correlation_id": "uuid",
"error": "Security agent error: ..."
}
```
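Because success and failure share one envelope, clients should branch on `status` rather than on the HTTP code alone. A small client-side sketch, assuming the JSON shapes shown above; `extract_recommendations` is a hypothetical helper name.

```python
import json


def extract_recommendations(body: str) -> dict:
    """Return per-agent results from a /recommend response, or raise on an error envelope."""
    data = json.loads(body)
    if data.get("status") != "success":
        # correlation_id lets the failure be matched against server logs
        raise RuntimeError(f"{data.get('error')} (correlation_id={data.get('correlation_id')})")
    return data.get("recommendations") or {}


ok = (
    '{"status": "success", "query": "q", "correlation_id": "abc",'
    ' "recommendations": {"database": {"agent": "database"}}}'
)
print(sorted(extract_recommendations(ok)))  # ['database']
```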
---
#### **GET /metrics** - Usage Metrics (JSON)
Get usage statistics and cost tracking in JSON format (requires authentication)
```bash
curl http://localhost:8000/metrics \
-H "Authorization: Bearer <your-jwt-token>"
```
**Response:**
```json
{
"total_requests": 42,
"total_tokens": 15230,
"total_cost_usd": 0.0234,
"daily_queries": 12,
"daily_cost_usd": 0.0234,
"budget_remaining_usd": 1.9766
}
```
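In the sample, `budget_remaining_usd` equals the default `DAILY_BUDGET_USD` of 2.00 minus `daily_cost_usd` (2.00 - 0.0234 = 1.9766). Assuming that is how the field is derived, the arithmetic looks like this; the function name is hypothetical.

```python
def budget_remaining(daily_budget_usd: float, daily_cost_usd: float) -> float:
    """Remaining daily budget, floored at zero and rounded to 4 decimal places."""
    return round(max(daily_budget_usd - daily_cost_usd, 0.0), 4)


metrics = {"daily_queries": 12, "daily_cost_usd": 0.0234}
print(budget_remaining(2.00, metrics["daily_cost_usd"]))  # 1.9766
# average LLM cost per query today
print(round(metrics["daily_cost_usd"] / metrics["daily_queries"], 5))  # 0.00195
```

Rounding matters here because binary floats make `2.00 - 0.0234` slightly inexact.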
---
#### **GET /metrics/prometheus** - Prometheus Metrics
Get metrics in Prometheus format for monitoring systems like Grafana Cloud (requires authentication)
```bash
curl http://localhost:8000/metrics/prometheus \
-H "Authorization: Bearer <your-jwt-token>"
```
**Response** (text/plain in Prometheus format):
```
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{endpoint="/recommend",method="POST",status_code="200"} 42.0
http_requests_total{endpoint="/health",method="GET",status_code="200"} 15.0
# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{endpoint="/recommend",method="POST",le="0.005"} 0.0
http_request_duration_seconds_bucket{endpoint="/recommend",method="POST",le="0.01"} 0.0
http_request_duration_seconds_bucket{endpoint="/recommend",method="POST",le="0.025"} 2.0
http_request_duration_seconds_sum{endpoint="/recommend",method="POST"} 45.3
http_request_duration_seconds_count{endpoint="/recommend",method="POST"} 42.0
# HELP llm_tokens_total Total LLM tokens used
# TYPE llm_tokens_total counter
llm_tokens_total{agent="database",token_type="input"} 2340.0
llm_tokens_total{agent="database",token_type="output"} 1250.0
# HELP llm_cost_usd_total Total LLM cost in USD
# TYPE llm_cost_usd_total counter
llm_cost_usd_total{agent="database"} 0.0234
# HELP llm_daily_cost_usd Daily LLM cost in USD
# TYPE llm_daily_cost_usd gauge
llm_daily_cost_usd 0.0234
# HELP active_conversation_sessions Active conversation sessions
# TYPE active_conversation_sessions gauge
active_conversation_sessions 5.0
```
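Reading this text programmatically is handy for quick checks outside Grafana. The minimal stdlib parser below assumes simple lines without timestamps (escape sequences in label values are left as-is) and shows how the histogram's `_sum` and `_count` series give the mean `/recommend` latency: 45.3 s / 42 requests ≈ 1.08 s in the sample. For anything beyond a sketch, the official `prometheus_client` package ships its own parser (`prometheus_client.parser`).

```python
import re

# Matches `name{label="v",...} value` or `name value` (no timestamp support)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m:
            name, raw_labels, value = m.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def mean_duration(samples, endpoint: str) -> float:
    """Average request duration = histogram _sum / _count for one endpoint."""
    def pick(suffix: str) -> float:
        return next(v for n, lbl, v in samples
                    if n == f"http_request_duration_seconds_{suffix}"
                    and lbl.get("endpoint") == endpoint)
    return pick("sum") / pick("count")


text = '''
http_request_duration_seconds_sum{endpoint="/recommend",method="POST"} 45.3
http_request_duration_seconds_count{endpoint="/recommend",method="POST"} 42.0
'''
print(round(mean_duration(parse_exposition(text), "/recommend"), 3))  # 1.079
```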
**Available Metrics:**
**HTTP Metrics:**
- `http_requests_total{method, endpoint, status_code}` - Total HTTP requests
- `http_request_duration_seconds{method, endpoint}` - Request duration histogram
**LLM Usage Metrics:**
- `llm_tokens_total{agent, token_type}` - Token usage by agent
- `llm_cost_usd_total{agent}` - Cumulative cost by agent
- `llm_requests_total{agent, status}` - LLM request count
- `llm_daily_tokens` - Daily token usage gauge
- `llm_daily_cost_usd` - Daily cost gauge
- `llm_daily_queries` - Daily query count gauge
**Application Metrics:**
- `active_conversation_sessions` - Active sessions count
- `user_registrations_total{oauth_provider}` - User registrations
- `user_logins_total{oauth_provider}` - User logins
- `recommendations_total{status, authenticated}` - Recommendations generated
**Use Cases:**
- Grafana Cloud monitoring (see [GRAFANA_CLOUD_SETUP.md](./GRAFANA_CLOUD_SETUP.md))
- Prometheus scraping
- Custom monitoring dashboards
- Cost tracking and alerting
---
#### **POST /feedback** - Submit Feedback
Submit user feedback (requires authentication)
```bash
curl -X POST http://localhost:8000/feedback \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <your-jwt-token>" \
-d '{
"message": "Great recommendations, very helpful!"
}'
```
---
### **Admin Endpoints** (Require Admin Role)
#### **GET /admin/users** - List All Users
View all registered users (admin only)
```bash
curl http://localhost:8000/admin/users \
-H "Authorization: Bearer <admin-jwt-token>"
```
**Response:**
```json
{
"users": [
{
"email": "user@example.com",
"role": "user",
"created_at": "2025-11-20T10:00:00Z"
}
]
}
```
---
#### **GET /admin/feedback** - View All Feedback
View all user feedback (admin only)
```bash
curl http://localhost:8000/admin/feedback \
-H "Authorization: Bearer <admin-jwt-token>"
```
**Response:**
```json
{
"feedback": [
{
"user": "user@example.com",
"message": "Great recommendations!",
"timestamp": "2025-11-20T10:00:00Z"
}
]
}
```
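Admin routes need a role check on top of authentication. A framework-free sketch, assuming the user's `role` (shown as `"user"` in the listing above) is available from the token claims or user record; `require_role` and `Forbidden` are hypothetical. In FastAPI this logic would typically live in a dependency that raises `HTTPException(status_code=403)`.

```python
class Forbidden(Exception):
    """Raised when an authenticated user lacks the required role (maps to HTTP 403)."""


def require_role(claims: dict, role: str = "admin") -> dict:
    """Return the claims unchanged if they carry `role`, otherwise raise Forbidden."""
    if claims.get("role") != role:
        raise Forbidden(f"{claims.get('sub', 'unknown user')} is not {role}")
    return claims


print(require_role({"sub": "admin@example.com", "role": "admin"})["sub"])  # admin@example.com
```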
---
## Security Features
### 1. **Rate Limiting**
Protects against abuse with per-IP limits:
- **Demo mode** (no API key): 5 requests/hour per IP
- **Authenticated mode**: 50 requests/hour
Implemented with `slowapi`:
```python
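# slowapi reads the client IP from `request`, so the endpoint must accept it;
# the route decorator goes above @limiter.limit
@app.post("/recommend")
@limiter.limit(settings.rate_limit_demo)
async def get_recommendation(request: Request, req: RecommendationRequest):
    ...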
```
**Rate limit response** (429):
```json
{
"error": "Rate limit exceeded: 5 per 1 hour"
}
```
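slowapi delegates counting to the `limits` library, which uses a fixed-window strategy by default. To make the per-IP idea concrete, here is a stdlib sketch of a sliding-window-log limiter, a closely related but different technique; the class name and in-memory storage are illustrative only.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` hits per `window_s` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def allow(self, key: str) -> bool:
        now = self.clock()
        q = self.hits[key]
        # drop timestamps that have slid out of the window
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False  # caller would answer 429
        q.append(now)
        return True


limiter = SlidingWindowLimiter(limit=5, window_s=3600)  # "5/hour" demo limit
print([limiter.allow("203.0.113.7") for _ in range(6)])
# [True, True, True, True, True, False]
```

Unlike a fixed window, a sliding log cannot admit a burst of 2 × limit across a window boundary, at the cost of storing one timestamp per accepted request.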
---
### 2. **Cost Controls**
Daily budget and query caps to prevent runaway costs:
```python
# In config.py
daily_budget_usd: float = 2.00
daily_query_cap: int = 100
```
**Budget exceeded response** (429):
```json
{
"detail": "Daily budget of $2.0 exceeded. Current cost: $2.15"
}
```
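A stdlib sketch of a daily guard combining both caps, assuming counters live in process memory and reset at the calendar-day boundary; `DailyCostGuard` and `BudgetExceeded` are hypothetical names. Its error message reproduces the format of the sample response.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


class BudgetExceeded(Exception):
    """Maps to the HTTP 429 'Daily budget ... exceeded' response."""


@dataclass
class DailyCostGuard:
    daily_budget_usd: float = 2.00
    daily_query_cap: int = 100
    day: date = field(default_factory=date.today)
    cost_usd: float = 0.0
    queries: int = 0

    def _roll_over(self, today: date) -> None:
        # counters reset when the calendar day changes
        if today != self.day:
            self.day, self.cost_usd, self.queries = today, 0.0, 0

    def check(self, today: date | None = None) -> None:
        """Call before running the agents; raises if either cap is already reached."""
        self._roll_over(today or date.today())
        if self.cost_usd >= self.daily_budget_usd:
            raise BudgetExceeded(
                f"Daily budget of ${self.daily_budget_usd} exceeded. "
                f"Current cost: ${self.cost_usd:.2f}"
            )
        if self.queries >= self.daily_query_cap:
            raise BudgetExceeded(f"Daily query cap of {self.daily_query_cap} reached")

    def record(self, cost_usd: float, today: date | None = None) -> None:
        """Call after a request completes, with its measured LLM cost."""
        self._roll_over(today or date.today())
        self.cost_usd += cost_usd
        self.queries += 1
```

Because the check runs before the next request's cost is known, the running total can overshoot the budget by one request, which is consistent with the sample's $2.15 against a $2.00 cap.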
---
### 3. **Input Validation**
Pydantic automatically validates:
- Query length (10-1000 chars)
- Numeric ranges (DAU >= 0, budget >= 0)
- Required fields
**Validation error response** (422):
```json
{
"detail": [
{
"loc": ["body", "query"],
"msg": "ensure this value has at least 10 characters",
"type": "value_error.any_str.min_length"
}
]
}
```
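Clients usually want these errors as short, field-level messages. A sketch of a hypothetical `summarize_422` helper; it relies only on the `loc` and `msg` keys, which FastAPI emits under both Pydantic v1 and v2, although the message wording differs (v2 says "String should have at least 10 characters").

```python
def summarize_422(payload: dict) -> list[str]:
    """Flatten FastAPI's 422 `detail` list into 'field: message' strings."""
    messages = []
    for err in payload.get("detail", []):
        # loc starts with where the value came from ("body", "query", ...); keep the rest
        field_path = ".".join(str(part) for part in err["loc"][1:]) or str(err["loc"][0])
        messages.append(f"{field_path}: {err['msg']}")
    return messages


sample = {
    "detail": [
        {
            "loc": ["body", "query"],
            "msg": "ensure this value has at least 10 characters",
            "type": "value_error.any_str.min_length",
        }
    ]
}
print(summarize_422(sample))  # ['query: ensure this value has at least 10 characters']
```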
---
### 4. **CORS**
CORS middleware enabled for browser access:
```python
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Replace with an explicit origin list in production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
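One way to replace the wildcard in production is to read an allow-list from the environment. `CORS_ALLOWED_ORIGINS` is a hypothetical variable name, not one the project defines. Trailing slashes are stripped because browsers send the `Origin` header as scheme, host, and port with no path, and Starlette compares origins as exact strings.

```python
import os


def cors_origins(env_var: str = "CORS_ALLOWED_ORIGINS") -> list[str]:
    """Read a comma-separated origin allow-list; fall back to localhost for development."""
    raw = os.environ.get(env_var, "")
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    return origins or ["http://localhost:8000"]
```

The result would be passed as `allow_origins=cors_origins()`.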
---
## Observability
### 1. **Structured Logging**
All requests are logged as structured JSON events (recommendation requests are additionally tagged with a correlation ID):
```json
{
"event": "http_request",
"method": "POST",
"path": "/recommend",
"status_code": 200,
"duration_ms": 2340
}
```
### 2. **Request Tracking**
Every recommendation gets a unique correlation ID for tracing through all agents.
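With several agents running concurrently, a `contextvars.ContextVar` is a common way to make the current correlation ID visible to every agent without passing it explicitly. A sketch under that assumption; `correlation_id`, `run_agent`, and `handle_request` are hypothetical names, and `asyncio.sleep(0)` stands in for an LLM call.

```python
import asyncio
import contextvars
import uuid

# Each asyncio task runs in its own copy of the context, so concurrent requests don't mix IDs
correlation_id = contextvars.ContextVar("correlation_id", default="-")


async def run_agent(name: str) -> str:
    await asyncio.sleep(0)  # stand-in for an LLM call
    return f"[{correlation_id.get()}] {name} done"


async def handle_request(agents: list[str]) -> tuple[str, list[str]]:
    cid = str(uuid.uuid4())
    correlation_id.set(cid)
    # gather() wraps coroutines in tasks that copy the current context, so agents see cid
    results = await asyncio.gather(*(run_agent(a) for a in agents))
    return cid, list(results)


async def main() -> None:
    (cid1, r1), (cid2, r2) = await asyncio.gather(
        handle_request(["database", "security"]),
        handle_request(["cost"]),
    )
    print(cid1 != cid2, all(cid1 in line for line in r1))  # True True


asyncio.run(main())
```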
### 3. **Usage Monitoring**
Built-in metrics endpoint tracks:
- Total requests
- Token consumption
- Cost accumulation
- Budget remaining
---
## Testing Results
The following endpoints and features passed testing:
```
✅ GET / - Root endpoint
✅ GET /health - Health check
✅ GET /metrics - Usage metrics
✅ POST /recommend - Tech stack recommendations
✅ Rate limiting - 5 requests/hour per IP
✅ Input validation - Pydantic schema enforcement
```
**Test Performance:**
- Root endpoint: < 10ms
- Health check: < 10ms
- Metrics: < 10ms
- Recommendation: 660ms (with orchestrator)
---
## Running the API
### Local Development
```bash
cd tech-stack-advisor  # repository root
source .venv/bin/activate
# Run with uvicorn
python -m backend.src.api.main
# Or use uvicorn directly
uvicorn backend.src.api.main:app --reload --port 8000
```
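Server starts on: `http://0.0.0.0:8000` (when launched via `python -m`). Plain `uvicorn` binds to `127.0.0.1` by default; add `--host 0.0.0.0` to listen on all interfaces.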
---
### Using Docker (Future)
```bash
# Build
docker build -t tech-stack-advisor .
# Run
docker run -p 8000:8000 \
-e ANTHROPIC_API_KEY=your-key \
-e QDRANT_URL=your-url \
tech-stack-advisor
```
---
## Interactive Documentation
FastAPI auto-generates interactive API docs:
- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
Both provide:
- Request/response schemas
- Try-it-out functionality
- Model definitions
- Authentication flows (future)
---
## Configuration
All settings are configurable via environment variables (typically loaded from `.env`):
```bash
# API Settings
API_HOST=0.0.0.0
API_PORT=8000
ENVIRONMENT=development
LOG_LEVEL=INFO
# Rate Limiting
RATE_LIMIT_DEMO=5/hour
RATE_LIMIT_AUTHENTICATED=50/hour
DAILY_QUERY_CAP=100
# Cost Controls
DAILY_BUDGET_USD=2.00
ALERT_EMAIL=your-email@example.com
# External Services
ANTHROPIC_API_KEY=sk-ant-xxxxx
QDRANT_URL=https://xxxxx.qdrant.io
QDRANT_API_KEY=xxxxx
```
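A stdlib sketch of loading these variables with typed defaults, plus parsing the `N/period` rate-limit strings. The project's `config.py` likely uses its own settings class; `Settings`, `from_env`, and `parse_rate` here are hypothetical, secrets are omitted, and only the `N/period` form used above is handled (the `limits` library behind slowapi also accepts forms such as `5 per hour`).

```python
import os
from dataclasses import dataclass

_PERIODS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate(spec: str) -> tuple[int, int]:
    """Turn '5/hour' into (5, 3600): max requests and window length in seconds."""
    count, period = spec.split("/")
    return int(count), _PERIODS[period.strip().lower()]


@dataclass(frozen=True)
class Settings:
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    rate_limit_demo: str = "5/hour"
    rate_limit_authenticated: str = "50/hour"
    daily_query_cap: int = 100
    daily_budget_usd: float = 2.00

    @classmethod
    def from_env(cls) -> "Settings":
        env = os.environ
        # environment values are strings, so convert numeric settings explicitly
        return cls(
            api_host=env.get("API_HOST", cls.api_host),
            api_port=int(env.get("API_PORT", cls.api_port)),
            rate_limit_demo=env.get("RATE_LIMIT_DEMO", cls.rate_limit_demo),
            rate_limit_authenticated=env.get("RATE_LIMIT_AUTHENTICATED", cls.rate_limit_authenticated),
            daily_query_cap=int(env.get("DAILY_QUERY_CAP", cls.daily_query_cap)),
            daily_budget_usd=float(env.get("DAILY_BUDGET_USD", cls.daily_budget_usd)),
        )


print(parse_rate(Settings.from_env().rate_limit_demo))
```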
---
## Performance Characteristics
### Latency
With real LLM calls (estimated):
- Parse query: 1-5ms
- Agent orchestration: 2-4 seconds
- Response formatting: 1-5ms
**Total: ~2-4 seconds per request**
### Throughput
Single instance:
- Rate limited to 5 req/hour (demo) or 50 req/hour (authenticated) per client
- Estimated capacity of ~200 req/hour without rate limiting
- Async processing allows concurrent requests
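Async concurrency helps because requests spend most of their 2-4 seconds waiting on LLM responses. The sketch below simulates that with `asyncio.sleep`; `fake_agent` and `serve` are hypothetical. The benefit assumes the agents await non-blocking I/O: a blocking SDK call inside an `async def` would stall the event loop and serialize requests.

```python
import asyncio
import time


async def fake_agent(name: str, latency_s: float) -> str:
    # asyncio.sleep stands in for awaiting an LLM HTTP call; the event loop is free meanwhile
    await asyncio.sleep(latency_s)
    return name


async def serve(n_requests: int, latency_s: float) -> float:
    """Run n simulated requests concurrently and return the wall-clock time taken."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_agent(f"req-{i}", latency_s) for i in range(n_requests)))
    return time.perf_counter() - start


elapsed = asyncio.run(serve(n_requests=20, latency_s=0.2))
print(f"{elapsed:.2f}s")  # close to 0.2s rather than 20 * 0.2 = 4s
```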
### Scalability
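- **Horizontal**: Deploy multiple instances behind a load balancer
- **Vertical**: Run more uvicorn worker processes (`--workers N`) on a larger machine
- **Caching**: Add Redis for repeated queries

slowapi stores rate-limit counters in memory by default, so each worker or instance enforces its own limits; the same holds for any cost counters kept in process memory. A shared backend such as Redis keeps limits and budgets global when scaling out.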
---
## Future Enhancements
### 1. **Authentication**
```python
from fastapi import Depends
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

security = HTTPBearer()

@app.post("/recommend")
async def get_recommendation(
    credentials: HTTPAuthorizationCredentials = Depends(security),
):
    # Verify API key (the raw token is in credentials.credentials)
    # Apply authenticated rate limit
    ...
### 2. **Response Caching**
```python
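from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache

# At startup: FastAPICache.init(RedisBackend(redis_client)), where redis_client is a redis.asyncio client
# Note: fastapi-cache skips caching for non-GET requests, so POST /recommend may need a custom cache layer
# Cache recommendations for 1 hour
@app.post("/recommend")
@cache(expire=3600)
async def get_recommendation(request: Request, req: RecommendationRequest):
    ...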
```
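Because `/recommend` is a POST endpoint whose input lives in the request body, a cache needs a key derived from that body. A sketch under that assumption: `cache_key` normalizes the query and hashes canonical JSON, and `TTLCache` is an in-memory stand-in for Redis. Both names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
import time


def cache_key(query: str, dau: int | None = None, budget_target: float | None = None) -> str:
    """Stable key for a /recommend body: normalize the query, then hash canonical JSON."""
    normalized = " ".join(query.lower().split())  # case- and whitespace-insensitive
    canonical = json.dumps(
        {"query": normalized, "dau": dau, "budget_target": budget_target},
        sort_keys=True, separators=(",", ":"),
    )
    return "recommend:" + hashlib.sha256(canonical.encode()).hexdigest()


class TTLCache:
    """In-memory stand-in for Redis: entries expire `ttl_s` seconds after being set."""

    def __init__(self, ttl_s: float = 3600, clock=time.monotonic):
        self.ttl_s, self.clock, self._data = ttl_s, clock, {}

    def get(self, key: str):
        hit = self._data.get(key)
        if hit is None or self.clock() >= hit[0]:
            self._data.pop(key, None)  # missing or expired
            return None
        return hit[1]

    def set(self, key: str, value) -> None:
        self._data[key] = (self.clock() + self.ttl_s, value)
```

Sorting keys and fixing separators makes logically identical bodies hash to the same key regardless of field order.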
### 3. **Webhooks**
```python
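from fastapi import BackgroundTasks

@app.post("/recommend/async")
async def async_recommendation(
    req: RecommendationRequest,
    webhook_url: str,
    background_tasks: BackgroundTasks,
):
    # Process in background (e.g. background_tasks.add_task(...))
    # POST result to webhook_url
    ...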
```
### 4. **Streaming Responses**
```python
from fastapi.responses import StreamingResponse

@app.post("/recommend/stream")
async def stream_recommendation(req: RecommendationRequest):
    async def generate():
        # Stream agent results as they complete (placeholders for per-agent output)
        yield database_result
        yield infrastructure_result
        ...
    return StreamingResponse(generate())
```
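"As they complete" suggests yielding in completion order rather than a fixed order. A runnable sketch of that idea using `asyncio.as_completed` and newline-delimited JSON (NDJSON, a framing choice assumed here); `run_agent` and `stream_results` are hypothetical, with sleeps standing in for agent latency. An async generator like `stream_results` can be passed directly to `StreamingResponse`, e.g. with `media_type="application/x-ndjson"`.

```python
import asyncio
import json


async def run_agent(name: str, delay_s: float) -> dict:
    await asyncio.sleep(delay_s)  # stand-in for the agent's LLM call
    return {"agent": name, "recommendations": "..."}


async def stream_results(agents: dict[str, float]):
    """Yield one NDJSON line per agent, in completion order rather than request order."""
    tasks = [asyncio.create_task(run_agent(name, d)) for name, d in agents.items()]
    for next_done in asyncio.as_completed(tasks):
        result = await next_done
        yield json.dumps(result) + "\n"


async def main() -> list[str]:
    agents = {"database": 0.06, "infrastructure": 0.02, "cost": 0.04}
    return [line async for line in stream_results(agents)]


for line in asyncio.run(main()):
    print(line, end="")  # infrastructure, then cost, then database
```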
### 5. **Prometheus Metrics** ✅ IMPLEMENTED
Prometheus metrics are now available via the `/metrics/prometheus` endpoint.
See the endpoint documentation above for all available metrics, including:
- HTTP request metrics (counter, histogram)
- LLM usage and cost tracking
- Application-level metrics
For setup guide, see [GRAFANA_CLOUD_SETUP.md](./GRAFANA_CLOUD_SETUP.md)
---
## API Usage Examples
### Python
```python
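import asyncio

import httpx

async def main() -> None:
    # Recommendations take a few seconds; httpx's default timeout is 5 s
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.post(
            "http://localhost:8000/recommend",
            headers={"Authorization": "Bearer <your-jwt-token>"},
            json={
                "query": "Real-time chat app for 100K users",
                "dau": 100_000,
            },
        )
        response.raise_for_status()
        result = response.json()
        print(result["recommendations"])

asyncio.run(main())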
```
### JavaScript
```javascript
const response = await fetch('http://localhost:8000/recommend', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer <your-jwt-token>'
  },
  body: JSON.stringify({
    query: 'Real-time chat app for 100K users',
    dau: 100000
  })
});
const data = await response.json();
```
### cURL
```bash
curl -X POST http://localhost:8000/recommend \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-jwt-token>" \
  -d '{"query":"Real-time chat app for 100K users"}'
```
---
## Summary
| Feature | Status | Implementation |
|---------|--------|----------------|
| REST API | ✅ | FastAPI with async support |
| Rate Limiting | ✅ | slowapi (5/hour demo) |
| Cost Controls | ✅ | Daily budget + query cap |
| Validation | ✅ | Pydantic models |
| Logging | ✅ | Structured logs + correlation IDs |
| Monitoring | ✅ | /health + /metrics endpoints |
| Prometheus Metrics | ✅ | /metrics/prometheus endpoint |
| Documentation | ✅ | Auto-generated Swagger/ReDoc |
| Testing | ✅ | Comprehensive test suite |
| CORS | ✅ | Middleware configured |
| Error Handling | ✅ | Global exception handler |
---
**Status:** ✅ FastAPI Implementation Complete
**Date:** 2025-11-20
**Server:** http://0.0.0.0:8000
**Docs:** http://localhost:8000/docs
**Next Steps:**
1. Set up Qdrant RAG for real knowledge retrieval
2. Build Streamlit frontend
3. Add authentication
4. Deploy with Docker
5. Set up CI/CD pipeline