Chapter 5: Setting Up Your Agent Stack

Building Production-Ready Agent Infrastructure

Overview

An autonomous agent is not a single model. It is a carefully composed stack of prompts, models, memory systems, tools, and orchestration logic. This chapter builds that stack from the ground up, emphasizing separation of concerns, governable boundaries, and production-ready abstractions.

Key Insight

Unlike traditional applications where behavior is encoded in deterministic logic, agent behavior emerges from how instructions, context, and constraints are framed. In production systems, prompts should be treated as versioned configuration rather than ad-hoc strings.

Agent Stack Architecture

The complete agent stack integrates multiple layers working together to produce governed, scalable agent behavior:

🎯 Prompt Engineering

Declarative prompt schemas with system/task separation and versioned templates

🤖 Model Selection

Multi-model routing with generalist and specialist models based on task requirements

🧠 Memory Architecture

Tri-level memory: episodic, semantic, and procedural with governed retrieval

🔍 Vector Databases

Hybrid search with metadata-driven routing and consistency guarantees

⚙️ Orchestration

Graph-based orchestration with state management and observability

🔧 Tool Integration

Federated tool servers with MCP and schema-driven service interfaces

Core Stack Components

1. Prompt Engineering & Template Systems

Prompting is the primary control surface of an agent. Production systems separate system prompts (role, constraints) from task prompts (objectives, context).

system_prompt:
  role: "planner"
  invariants:
    - cost_budget
    - tool_scope
    - safety_policy

task_prompt:
  objective: ${user_goal}
  context: ${retrieved_memory}
  expected_output: plan

Key Patterns:

  • Declarative Schemas: Treat prompts as structured, versioned artifacts
  • Pattern-Based Prompting: Reusable templates for planner, critic, router, executor
  • System vs Task Separation: Stable role definitions vs dynamic context
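The system/task separation above can be sketched as a small versioned template renderer. This is a minimal sketch: the prompt registry, version keys, and `render_prompt` helper are illustrative assumptions, not a specific library's API.

```python
from string import Template

# Hypothetical versioned prompt registry: prompts as configuration, not ad-hoc strings.
PROMPTS = {
    ("planner", "v3"): {
        "system": "You are a planner. Respect: cost_budget, tool_scope, safety_policy.",
        "task": Template(
            "Objective: $user_goal\nContext: $retrieved_memory\nExpected output: plan"
        ),
    }
}

def render_prompt(role: str, version: str, **context) -> dict:
    """Resolve a versioned prompt pair and substitute dynamic task context."""
    entry = PROMPTS[(role, version)]
    return {
        "system": entry["system"],                    # stable role definition
        "task": entry["task"].substitute(**context),  # dynamic, per-request context
    }

msgs = render_prompt(
    "planner", "v3",
    user_goal="ship weekly report",
    retrieved_memory="last report sent Monday",
)
```

Because the `(role, version)` pair is an explicit key, a prompt change becomes a reviewable artifact in version control rather than an in-place string edit.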

2. LLM Model Selection

Agent stacks benefit from intentional heterogeneity rather than reliance on a single model.

if task.type == "classification":
    model = lightweight_model
elif task.requires_reasoning:
    model = generalist_llm
else:
    model = fallback_model

Model Routing Patterns:

  • Generalist Models: Handle reasoning, planning, summarization
  • Specialist Models: Excel at classification, extraction, code generation
  • Dynamic Routing: Based on task intent, latency budget, cost constraints
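A routing policy like the one sketched above can be made explicit with budgets attached. The model names, costs, and latencies below are placeholders for illustration, not real endpoints or pricing.

```python
from dataclasses import dataclass

@dataclass
class Task:
    type: str
    requires_reasoning: bool = False
    latency_budget_ms: int = 2000
    cost_budget_usd: float = 0.05

# Hypothetical model catalog: (cost per call in USD, typical latency in ms).
MODELS = {
    "lightweight": (0.001, 200),
    "generalist": (0.02, 1500),
    "fallback": (0.005, 600),
}

def route(task: Task) -> str:
    """Pick a model by task intent, then degrade if it exceeds a budget."""
    if task.type == "classification":
        candidate = "lightweight"
    elif task.requires_reasoning:
        candidate = "generalist"
    else:
        candidate = "fallback"
    cost, latency = MODELS[candidate]
    # Budget enforcement: fall back rather than silently overspend.
    if cost > task.cost_budget_usd or latency > task.latency_budget_ms:
        candidate = "fallback"
    return candidate
```

Making the budget check part of routing (rather than a post-hoc alert) keeps cost and latency constraints enforceable per task.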

3. Memory Architecture

Memory enables agents to improve over time. Production systems use layered memory with distinct lifecycles:

Tri-Level Memory Design

  • Episodic Memory: Concrete events, interactions, execution outcomes (write-heavy)
  • Semantic Memory: Stabilized knowledge, facts, documents, embeddings (retrieval-optimized)
  • Procedural Memory: Executable know-how, skills, workflows (callable modules)

episodic.log(event, outcome)
summary = summarize(episodic_window)
semantic.store(summary)

if plan.requires_skill:
    procedural.invoke(skill_id)

Critical Design Principle

Mixing short-term and long-term memory without clear boundaries leads to context overflow (stale information overload) and behavioral drift (deviation from intended behavior).
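The boundary between layers can be kept explicit in code. The sketch below assumes simple in-memory stores, and `summarize` stands in for an LLM summarization call; none of these names come from a particular framework.

```python
class AgentMemory:
    """Tri-level memory: episodic (write-heavy log), semantic (retrieval store),
    procedural (callable skills). Consolidation is the only path from episodic
    to semantic, which keeps the short-term/long-term boundary governed."""

    def __init__(self):
        self.episodic = []      # concrete events and execution outcomes
        self.semantic = []      # stabilized summaries and facts
        self.procedural = {}    # skill_id -> callable

    def log(self, event: str, outcome: str) -> None:
        self.episodic.append((event, outcome))

    def consolidate(self, summarize, window: int = 10) -> None:
        """Promote a recent episodic window into semantic memory."""
        if self.episodic:
            self.semantic.append(summarize(self.episodic[-window:]))

    def invoke(self, skill_id: str, *args):
        return self.procedural[skill_id](*args)
```

Because raw episodic entries never flow into the semantic store directly, the context an agent retrieves stays curated rather than accumulating stale detail.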

4. Vector & Graph Databases

Hybrid retrieval patterns balance semantic similarity with policy constraints:

# Step 1: Vector-based candidate retrieval
candidates = vector_db.search(
    query_embedding,
    top_k=50
)

# Step 2: Graph-based constraint filtering
filtered = graph_db.filter(
    nodes=candidates,
    constraints={
        "jurisdiction": "EU",
        "document_status": "approved",
        "policy_version": ">= v3"
    }
)

# Step 3: Ranked handoff to agent
results = rank(filtered)
return results

Key Considerations:

  • Metadata-Driven Routing: Domain scoping, versioning, trust boundaries
  • Consistency & Freshness: Time-aware metadata and controlled reindexing
  • Hybrid Search: Combine keyword + vector + graph constraints
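The three-step handoff above can be sketched with plain data structures. The candidate format, scores, and constraint check below are illustrative stand-ins for the vector and graph stores, not any database's API.

```python
def hybrid_retrieve(candidates, constraints, top_k=5):
    """Filter vector-search candidates by metadata constraints, then rank by score.

    `candidates`: dicts with a similarity `score` and a `meta` dict.
    `constraints`: metadata keys -> required values (a stand-in for graph filtering).
    """
    filtered = [
        c for c in candidates
        if all(c["meta"].get(k) == v for k, v in constraints.items())
    ]
    return sorted(filtered, key=lambda c: c["score"], reverse=True)[:top_k]

docs = [
    {"id": "a", "score": 0.91, "meta": {"jurisdiction": "EU", "document_status": "approved"}},
    {"id": "b", "score": 0.95, "meta": {"jurisdiction": "US", "document_status": "approved"}},
    {"id": "c", "score": 0.88, "meta": {"jurisdiction": "EU", "document_status": "draft"}},
]
results = hybrid_retrieve(docs, {"jurisdiction": "EU", "document_status": "approved"})
```

Note that the highest-scoring candidate is excluded here by a policy constraint: similarity alone never overrides metadata-driven trust boundaries.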

5. Orchestration Runtimes

Modern orchestration defines how agent logic is composed, executed, and observed:

node(plan)
node(retrieve)
node(execute)
node(evaluate)

plan -> retrieve
retrieve -> execute
execute -> evaluate
evaluate -> execute  # retry loop

Key Frameworks:

  • LangChain: Chain-based composition, standardized abstractions
  • CrewAI: Role-based collaboration and delegation
  • AutoGen: Conversational agent coordination
  • LangGraph: Graph-based control flow with branching and state

Production Requirements

  • Structured logs of prompts, tools, outputs
  • Execution traces and version pinning
  • Partial or full replay capability
  • Cost-aware and budgeted execution
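The plan → retrieve → execute → evaluate graph, including its retry edge, can be sketched as a tiny node runner that also records an execution trace and bounds total steps. This is a toy executor under those assumptions, not any framework's API.

```python
def run_graph(nodes, edges, start, state, max_steps=10):
    """Walk a node graph: each node updates state; each edge picks the next node.

    `nodes`: name -> fn(state) -> state.
    `edges`: name -> fn(state) -> next node name, or None to stop.
    `max_steps` doubles as a crude execution budget; `trace` supports replay.
    """
    current, trace = start, []
    for _ in range(max_steps):
        if current is None:
            break
        state = nodes[current](state)
        trace.append(current)
        current = edges[current](state)
    return state, trace

nodes = {
    "plan": lambda s: {**s, "plan": "ready"},
    "execute": lambda s: {**s, "attempts": s.get("attempts", 0) + 1},
    "evaluate": lambda s: {**s, "ok": s["attempts"] >= 2},
}
edges = {
    "plan": lambda s: "execute",
    "execute": lambda s: "evaluate",
    "evaluate": lambda s: None if s["ok"] else "execute",  # retry loop
}
state, trace = run_graph(nodes, edges, "plan", {})
```

The returned `trace` is exactly the structured execution record the production requirements above call for, and the step bound is the simplest form of budgeted execution.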

6. Tool Use, Function Calling & MCP

Federated tool servers with explicit contracts enable scalable, governed tool integration:

  • Independent Deployment: Tools versioned and deployed separately
  • Centralized Authentication: Credential management via service gateways
  • Policy Enforcement: Controls through existing infrastructure
  • Clear Ownership: Lifecycle management and responsibility

Model Context Protocol (MCP)

MCP, the Model Context Protocol, standardizes how agents discover and invoke externally hosted tools and resources through schema-driven interfaces (JSON Schema). It extends tool federation so that entire workflows can be exposed as callable, externally governed components.
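A tool exposed over such a schema-driven interface might be declared as below. The schema shape follows JSON Schema conventions, but the tool name, fields, and the minimal validator are hypothetical illustrations, not the MCP specification itself.

```python
# Illustrative schema-driven tool contract, as a federated tool server might
# advertise it. The tool name and fields are hypothetical.
TOOL_DEF = {
    "name": "search_documents",
    "description": "Search approved documents within a jurisdiction.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "jurisdiction": {"type": "string", "enum": ["EU", "US"]},
        },
        "required": ["query"],
    },
}

def validate_args(tool_def: dict, args: dict) -> bool:
    """Minimal structural check of call arguments against the declared schema."""
    schema = tool_def["inputSchema"]
    if not all(k in args for k in schema.get("required", [])):
        return False
    for key, value in args.items():
        prop = schema["properties"].get(key)
        if prop is None:
            return False  # reject arguments outside the contract
        if prop["type"] == "string" and not isinstance(value, str):
            return False
        if "enum" in prop and value not in prop["enum"]:
            return False
    return True
```

Validating against the published contract before dispatch is what lets policy enforcement and auditing live at the gateway rather than inside each agent.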

Key Design Patterns

Separation of Concerns

Clear boundaries between prompts, models, memory, tools, and orchestration enable independent evolution and testing.

Versioned Configuration

Treat prompts, schemas, and policies as versioned artifacts in version control with audit trails.

Governed Retrieval

Memory access mediated by orchestration logic, not wholesale context injection, prevents drift.

Hybrid Memory Topologies

Combine vector, graph, and structured storage with appropriate consistency models for each layer.

Observable Execution

Structured logging, execution traces, and replay capabilities enable debugging and auditing.

Cost-Aware Operations

Budget tracking for tokens, tool calls, and latency with explicit enforcement mechanisms.

Why This Matters

Building the agent stack correctly is not about maximizing model capability. It is about creating a production system that is observable, governable, and maintainable at enterprise scale.