LLM Mastery for Enterprise AI Engineering Advanced ⏱ 75 min

Real-World Skills and Capstone

Build usable AI products and complete the enterprise compliance automation capstone.

How to Use This Lesson

Start with the user problem, then map the pattern to architecture and failure modes.
If a code or design example is included, change one assumption and reason through the impact.
Use role callouts, checklists, and Q&A sections as implementation or interview prep notes.

Prerequisites: Evaluation and Release Gates

LLM Mastery course page. This lesson is part 3 of 5 in the advanced track. Use the lab and assessment sections as the completion standard, not optional reading.

Required mastery artifact: by the end of this lesson, update the running enterprise readiness packet for a realistic use case. Treat examples and vendor names as dated illustrations; defend decisions with current model, cost, risk, and evaluation evidence.

Module 11 — Real-World Skills

Building things people actually use: chatbots, copilots, automation, SaaS products, coding workflows, orchestration systems, and AI product thinking.

01 — Building Chatbots

What Makes a Good Chatbot vs a Bad One

Bad chatbot: Answers questions. Forgets immediately. No personality. No purpose.

Good chatbot: Has a defined role, remembers context, handles edge cases gracefully, knows when to escalate, measures its own performance.

The Production Chatbot Stack

# production_chatbot.py
import anthropic
import json
from datetime import datetime
from typing import Optional

client = anthropic.Anthropic()

class ProductionChatbot:
    """
    Production-ready chatbot with:
    - Role definition via system prompt
    - Conversation memory (last N turns)
    - Tool use support
    - Error handling and fallbacks
    - Response logging
    """

    def __init__(
        self,
        name: str,
        system_prompt: str,
        model: str = "claude-haiku-4-5-20251001",
        max_history_turns: int = 10,
        tools: Optional[list] = None
    ):
        self.name = name
        self.system_prompt = system_prompt
        self.model = model
        self.max_history_turns = max_history_turns
        self.tools = tools or []
        self.conversation_history = []
        self.session_id = datetime.now().strftime("%Y%m%d_%H%M%S")

    def chat(self, user_message: str) -> str:
        # Add user message to history
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })

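        # Trim history if too long (keep last N turns)
        if len(self.conversation_history) > self.max_history_turns * 2:
            self.conversation_history = self.conversation_history[-(self.max_history_turns * 2):]
            # Make sure the trimmed window still opens with a plain user message,
            # not an assistant reply or an orphaned tool result
            while self.conversation_history and not (
                self.conversation_history[0]["role"] == "user"
                and isinstance(self.conversation_history[0]["content"], str)
            ):
                self.conversation_history.pop(0)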

        # Build API call
        api_kwargs = {
            "model": self.model,
            "max_tokens": 1024,
            "system": self.system_prompt,
            "messages": self.conversation_history
        }
        if self.tools:
            api_kwargs["tools"] = self.tools

        try:
            response = client.messages.create(**api_kwargs)

            # Handle tool use
            while response.stop_reason == "tool_use":
                tool_results = self._process_tools(response.content)
                self.conversation_history.append({"role": "assistant", "content": response.content})
                self.conversation_history.append({"role": "user", "content": tool_results})
                response = client.messages.create(**api_kwargs)

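            # Join all text blocks; the first block is not guaranteed to be text
            assistant_message = "".join(
                block.text for block in response.content if block.type == "text"
            )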

            # Add to history
            self.conversation_history.append({
                "role": "assistant",
                "content": assistant_message
            })

            # Log (in production: write to database)
            self._log(user_message, assistant_message)

            return assistant_message

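        except anthropic.APIError as e:
            # Roll back this turn so a retry does not send two consecutive user messages
            while self.conversation_history:
                last = self.conversation_history.pop()
                if last["role"] == "user" and last["content"] == user_message:
                    break
            fallback = "I'm experiencing a technical issue. Please try again in a moment."
            print(f"API Error in session {self.session_id}: {e}")
            return fallback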

    def _process_tools(self, content_blocks: list) -> list:
        """Override this method to implement your tools"""
        results = []
        for block in content_blocks:
            if block.type == "tool_use":
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"Tool {block.name} not implemented"
                })
        return results

    def _log(self, user_msg: str, assistant_msg: str):
        """Log conversation turn (write to DB in production)"""
        log_entry = {
            "session_id": self.session_id,
            "timestamp": datetime.now().isoformat(),
            "user": user_msg[:200],  # Truncate for logs
            "assistant": assistant_msg[:200],
        }
        # print(json.dumps(log_entry))  # Or write to database

    def reset(self):
        """Clear conversation history"""
        self.conversation_history = []

# =========================================
# Example: Compliance Chatbot
# =========================================

COMPLIANCE_SYSTEM = """You are ComplianceBot, an AI assistant for Fiserv's regulatory compliance team.

SCOPE: EU financial regulations — GDPR, PSD2, MiFID II, DORA, Basel III, AML/KYC.

BEHAVIOR:
- Cite specific regulation articles (e.g., "GDPR Article 17")
- Express uncertainty when needed: "Based on my understanding, you should verify with legal counsel"
- Decline off-topic requests: "I specialize in financial compliance. Please use a general assistant for other topics."
- Never give binding legal advice

OUTPUT FORMAT:
- Short answers: 2-3 sentences
- Complex questions: structured markdown with headers
- Always end advice with: "⚠️ Confirm with your legal team before implementing."

PERSONALITY: Professional, precise, helpful. Not robotic."""

# Create and run the chatbot
compliance_bot = ProductionChatbot(
    name="ComplianceBot",
    system_prompt=COMPLIANCE_SYSTEM,
    model="claude-haiku-4-5-20251001",
    max_history_turns=15
)

# Interactive conversation
def run_cli_chatbot(bot: ProductionChatbot):
    print(f"\n{'='*50}")
    print(f" {bot.name} — Type 'quit' to exit, 'reset' to clear history")
    print(f"{'='*50}\n")

    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() == "quit":
            break
        if user_input.lower() == "reset":
            bot.reset()
            print("[History cleared]\n")
            continue

        response = bot.chat(user_input)
        print(f"\n{bot.name}: {response}\n")

# Uncomment to run interactively:
# run_cli_chatbot(compliance_bot)

# Test without interaction
response = compliance_bot.chat("What are GDPR's requirements for data breach notification?")
print(f"Bot: {response}")

Chatbot Anti-Patterns to Avoid

Anti-Pattern	Problem	Fix
No system prompt	Random personality, inconsistent	Define role and constraints
Infinite context	Costs grow unbounded	Limit to last N turns
No error handling	Crashes on API errors	Fallback responses
No guardrails	Says anything	Scope restrictions in system prompt
Overlong responses	Feels like a report, not a chat	Explicit length guidance
No logging	Can’t debug or improve	Log every turn
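
The "limit to last N turns" fix has one subtlety: a naive slice can leave the window starting with an assistant reply or a tool result. The helper below is a minimal sketch, separate from the lesson's `ProductionChatbot`. The name `trim_history` and the message shape (dicts with `role` and `content`) are assumptions for illustration, and it assumes the API expects a conversation to open with a user message.

```python
from typing import Dict, List

Message = Dict[str, object]  # {"role": "user" | "assistant", "content": ...}


def trim_history(history: List[Message], max_turns: int) -> List[Message]:
    """Keep at most the last `max_turns` user/assistant exchanges.

    After slicing, drop leading messages until the window starts with a
    plain-text user message, so the request never opens with an assistant
    reply or an orphaned tool result.
    """
    window = history[-(max_turns * 2):] if max_turns > 0 else []
    start = 0
    while start < len(window):
        msg = window[start]
        if msg["role"] == "user" and isinstance(msg["content"], str):
            break
        start += 1
    return window[start:]


# Usage: five messages, keep the last two turns (four messages at most).
history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
]
trimmed = trim_history(history, max_turns=2)
print([m["content"] for m in trimmed])
```

The slice alone would start at `a1`; the helper drops it so the window opens with `q2`.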

02 — AI Copilots

What is a Copilot?

A copilot is embedded AI that assists humans in their existing workflow — without replacing them.

The human stays in control. The AI suggests, drafts, and analyzes. The human decides and acts.

Copilot Design Patterns

Pattern 1: In-Line Suggestions

# As user types a clause, copilot analyzes it in real-time
def analyze_contract_clause_realtime(clause: str) -> dict:
    """Called on every paragraph update — must be fast"""

    if len(clause.strip()) < 50:
        return {}  # Too short to analyze

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Fast model for real-time
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"""Quick compliance check for this contract clause.
Return JSON only: {{"risk": "low/medium/high", "issue": "brief issue or null", "suggestion": "brief fix or null"}}

Clause: {clause}"""
        }]
    )

    try:
        return json.loads(response.content[0].text)
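    except (json.JSONDecodeError, IndexError, AttributeError):
        # Unparseable or empty reply: show no suggestion rather than crash the editor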
        return {}
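
Models sometimes wrap JSON in a Markdown fence or add a sentence around it, in which case a bare `json.loads` fails and the copilot silently returns nothing. The following is a minimal sketch of a more forgiving parser; `parse_json_reply` is a hypothetical helper, not part of any SDK, and the fallback order is one reasonable choice among several.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # built at runtime so this listing contains no literal fence
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_reply(text: str) -> Optional[Any]:
    """Return parsed JSON from a model reply, or None if nothing parses."""
    candidates = [text.strip()]

    # Candidate 2: the body of the first fenced block, if any
    fenced = _FENCED.search(text)
    if fenced:
        candidates.append(fenced.group(1).strip())

    # Candidate 3: the outermost {...} span in the reply
    first, last = text.find("{"), text.rfind("}")
    if first != -1 and last > first:
        candidates.append(text[first:last + 1])

    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None


# Usage: a reply with chatter around the JSON still parses.
reply = 'Here is the check: {"risk": "high", "issue": "no DPA", "suggestion": null}'
print(parse_json_reply(reply))
```

Returning `None` instead of `{}` lets the caller tell "no issues found" apart from "reply could not be parsed", which matters when logging parse failures.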

Pattern 2: On-Demand Analysis

# Button in UI triggers comprehensive analysis
def comprehensive_document_review(document_text: str) -> dict:
    """Full analysis when user clicks 'Review' — can take longer"""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        system="You are a senior compliance counsel reviewing documents.",
        messages=[{
            "role": "user",
            "content": f"""Perform a full compliance review of this document.

Document:
{document_text}

Analyze for:
1. GDPR compliance issues
2. PSD2 implications
3. MiFID II requirements
4. General contractual risks

Return structured JSON:
{{
  "overall_risk": "low/medium/high/critical",
  "gdpr_issues": [{{"article": "...", "issue": "...", "severity": "...", "fix": "..."}}],
  "psd2_issues": [...],
  "mifid_issues": [...],
  "general_risks": [...],
  "recommended_actions": ["list"],
  "needs_legal_review": true/false
}}"""
        }]
    )

    try:
        return json.loads(response.content[0].text)
    except json.JSONDecodeError:
        return {"raw_analysis": response.content[0].text}

Pattern 3: Response Drafting

# Customer service copilot: suggests responses to agents
def suggest_response(customer_message: str, context: dict) -> list[str]:
    """Generate 3 response options for the human agent to choose from"""

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=800,
        system="""You are helping a customer service agent draft responses.
Generate 3 different response options: formal, friendly, and brief.""",
        messages=[{
            "role": "user",
            "content": f"""Customer message: {customer_message}

Context: {json.dumps(context)}

Generate 3 response options in JSON:
{{"formal": "...", "friendly": "...", "brief": "..."}}"""
        }]
    )

    try:
        options = json.loads(response.content[0].text)
        return [options["formal"], options["friendly"], options["brief"]]
    except (json.JSONDecodeError, KeyError, TypeError):
        return [response.content[0].text]

03 — AI Automation

Three Levels of AI Automation

Level 1: Single-Step Automation

One LLM call replaces a manual task:

# Manual: Person reads document, writes summary
# Automated: LLM reads, summarizes, saves

def auto_summarize_and_save(document_path: str, output_path: str):
    with open(document_path) as f:
        content = f.read()

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=500,
        messages=[{"role": "user", "content": f"Summarize this compliance document in bullet points:\n\n{content}"}]
    )

    summary = response.content[0].text
    with open(output_path, "w") as f:
        f.write(summary)

    print(f"Saved summary to {output_path}")

Level 2: Pipeline Automation

Multiple LLM steps, each transforming data:

def compliance_pipeline(document: str) -> dict:
    # Step 1: Extract → Step 2: Classify → Step 3: Assess → Step 4: Report
    extracted = extract_obligations(document)
    classified = classify_by_regulation(extracted)
    assessed = assess_risk(classified)
    report = generate_report(assessed)
    return {"report": report, "risk": assessed}
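
The four step functions in `compliance_pipeline` are not defined in this lesson. As a sketch of how the stages hand data to each other, the block below fills them in with rule-based stand-ins; in a real pipeline each would be an LLM call. The keyword rules and the sample text are assumptions for illustration only, and `compliance_pipeline` is repeated so the block runs on its own.

```python
from typing import Dict, List


def extract_obligations(document: str) -> List[str]:
    """Stand-in extractor: keep sentences containing 'shall' or 'must'."""
    sentences = [s.strip() for s in document.replace("\n", " ").split(".")]
    return [s for s in sentences if {"shall", "must"} & set(s.lower().split())]


def classify_by_regulation(obligations: List[str]) -> List[Dict[str, str]]:
    """Stand-in classifier: first keyword match wins."""
    keywords = {"GDPR": "personal data", "DORA": "ict incident", "PSD2": "payment"}
    classified = []
    for text in obligations:
        regulation = next(
            (name for name, kw in keywords.items() if kw in text.lower()),
            "unclassified",
        )
        classified.append({"obligation": text, "regulation": regulation})
    return classified


def assess_risk(classified: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Stand-in assessor: notification duties are treated as high risk."""
    return [
        {**item, "risk": "high" if "notify" in item["obligation"].lower() else "medium"}
        for item in classified
    ]


def generate_report(assessed: List[Dict[str, str]]) -> str:
    """One line per obligation, highest-level summary only."""
    return "\n".join(
        f"[{a['risk'].upper()}] {a['regulation']}: {a['obligation']}" for a in assessed
    )


def compliance_pipeline(document: str) -> dict:
    extracted = extract_obligations(document)
    classified = classify_by_regulation(extracted)
    assessed = assess_risk(classified)
    return {"report": generate_report(assessed), "risk": assessed}


sample = (
    "Financial entities shall notify ICT incidents. "
    "Firms must protect personal data. This sentence is informative."
)
print(compliance_pipeline(sample)["report"])
```

Because each stage takes and returns plain data, any single stage can be swapped for a model call, or tested on its own, without touching the others.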

Level 3: Agentic Automation

LLM decides what steps to take:

def agentic_compliance_audit(company_name: str):
    """Autonomously research, analyze, and report compliance status"""
    # Agent decides: search web → fetch regulations → analyze gaps → write report
    return compliance_agent.run(f"Perform a compliance gap analysis for {company_name}")
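
`compliance_agent` is not defined in this lesson. The sketch below shows the control loop such an agent typically wraps: a planner picks the next action, a tool runs it, and a step budget stops runaway loops. All names here (`run_agent`, `scripted_planner`, the tool names) are hypothetical, and the planner is a fixed script standing in for an LLM decision.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
Step = Tuple[str, str]  # (action name, observation)
# A planner sees the goal and the steps so far and returns (tool_name, tool_input),
# or ("finish", final_answer) to stop.
Planner = Callable[[str, List[Step]], Tuple[str, str]]


def run_agent(goal: str, planner: Planner, tools: Dict[str, Tool],
              max_steps: int = 5) -> Dict[str, object]:
    """Run plan -> act -> observe until the planner finishes or the budget runs out."""
    trace: List[Step] = []
    for _ in range(max_steps):
        action, argument = planner(goal, trace)
        if action == "finish":
            return {"status": "done", "answer": argument, "trace": trace}
        if action not in tools:
            # Record the mistake so the planner can see it on the next step
            trace.append((action, "error: unknown tool"))
            continue
        trace.append((action, tools[action](argument)))
    return {"status": "step_limit", "answer": None, "trace": trace}


def scripted_planner(goal: str, trace: List[Step]) -> Tuple[str, str]:
    """Stand-in for an LLM planner: follows a fixed two-step script."""
    script = [("fetch_regulation", "DORA"), ("analyze_gaps", "DORA vs controls")]
    if len(trace) < len(script):
        return script[len(trace)]
    return ("finish", f"{len(trace)} steps completed for: {goal}")


tools = {
    "fetch_regulation": lambda name: f"text of {name}",
    "analyze_gaps": lambda topic: f"gaps for {topic}",
}
result = run_agent("gap analysis for ExampleCo", scripted_planner, tools)
print(result["status"], "|", result["answer"])
```

The `max_steps` budget is the important design choice: an agent that decides its own steps also needs a hard stop that does not depend on the model's judgment.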

Batch Automation with Claude

import anthropic
import json

client = anthropic.Anthropic()

# Process 1000 documents overnight at 50% discount
def batch_process_documents(documents: list[dict]) -> str:
    """Use Anthropic batch API for cost-efficient bulk processing"""

    batch_requests = []
    for i, doc in enumerate(documents):
        batch_requests.append({
            "custom_id": f"doc-{i:04d}",
            "params": {
                "model": "claude-haiku-4-5-20251001",
                "max_tokens": 300,
                "messages": [{
                    "role": "user",
                    "content": f"""Extract compliance obligations from this text.
Return JSON: {{"obligations": ["list"], "regulation": "most relevant regulation", "risk": "low/medium/high"}}

Text: {doc['content'][:2000]}"""
                }]
            }
        })

    # Submit batch
    batch = client.messages.batches.create(requests=batch_requests)
    print(f"Batch submitted: {batch.id}")
    print(f"Processing {len(batch_requests)} documents...")
    return batch.id

def retrieve_batch_results(batch_id: str) -> list:
    """Retrieve completed batch results"""
    import time

    while True:
        batch = client.messages.batches.retrieve(batch_id)
        print(f"Status: {batch.processing_status} | "
              f"Complete: {batch.request_counts.succeeded}/{batch.request_counts.processing + batch.request_counts.succeeded}")

        if batch.processing_status == "ended":
            break
        time.sleep(30)

    results = []
    for result in client.messages.batches.results(batch_id):
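        if result.result.type == "succeeded":
            try:
                data = json.loads(result.result.message.content[0].text)
                results.append({"id": result.custom_id, "data": data})
            except (json.JSONDecodeError, IndexError, AttributeError):
                results.append({"id": result.custom_id, "error": "parse_failed"})
        else:
            # Keep errored, canceled, and expired requests visible to the caller
            results.append({"id": result.custom_id, "error": result.result.type})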

    return results
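
Batch results are matched to requests by `custom_id`, so it is safer not to rely on the order in which they come back. The sketch below joins the list returned by `retrieve_batch_results` to the original documents and marks anything that never came back. `join_batch_results` and the `name` field on each document are assumptions for illustration; the id format mirrors the `doc-{i:04d}` scheme used above.

```python
from typing import Dict, List


def join_batch_results(documents: List[dict], results: List[dict]) -> List[dict]:
    """Pair each input document with its batch result by custom_id.

    Assumes ids were generated as f"doc-{i:04d}" from the document's
    position, matching the submission code in this lesson.
    """
    by_id: Dict[str, dict] = {r["id"]: r for r in results}
    joined = []
    for i, doc in enumerate(documents):
        custom_id = f"doc-{i:04d}"
        result = by_id.get(custom_id)
        if result is None:
            status, data = "missing", None
        elif "error" in result:
            status, data = result["error"], None
        else:
            status, data = "ok", result["data"]
        joined.append(
            {"id": custom_id, "name": doc.get("name"), "status": status, "data": data}
        )
    return joined


# Usage with made-up results arriving out of order, one of them absent.
documents = [
    {"name": "policy_a.txt", "content": "..."},
    {"name": "policy_b.txt", "content": "..."},
    {"name": "policy_c.txt", "content": "..."},
]
results = [
    {"id": "doc-0002", "error": "parse_failed"},
    {"id": "doc-0000", "data": {"obligations": ["notify regulator"], "risk": "high"}},
]
for row in join_batch_results(documents, results):
    print(row["id"], row["name"], row["status"])
```

Rows with a status other than `ok` form the retry list for the next batch.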

04 — AI SaaS Workflows

Building AI-Powered Products

A minimal viable AI SaaS product needs:

1. User Authentication
2. LLM API integration
3. Usage tracking (token counting)
4. Rate limiting (prevent abuse)
5. Cost management (per-user limits)
6. Prompt management (versioned, tested prompts)
7. Output storage (save generated content)
8. Evaluation hooks (measure quality)
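
Item 4, rate limiting, is commonly implemented as a token bucket: each user gets a bucket that refills at a steady rate, and a request is allowed only if the bucket has capacity left. The sketch below is one minimal in-memory version, assuming a single process; `TokenBucket` and `PerUserLimiter` are hypothetical names, and "tokens" here means request permits, not LLM tokens.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so tests can control time
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` permits if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserLimiter:
    """One bucket per user id, created on first use."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self._make = lambda: TokenBucket(capacity, refill_per_second, clock)
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_id: str) -> bool:
        bucket = self._buckets.setdefault(user_id, self._make())
        return bucket.allow()


# Usage: 2-request burst, one new permit per second.
limiter = PerUserLimiter(capacity=2, refill_per_second=1.0)
print([limiter.allow("user-1") for _ in range(3)])
```

A multi-process deployment would need the bucket state in shared storage; the in-memory dict is only suitable for a single worker.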

Minimal AI SaaS Architecture

# ai_saas_core.py

import anthropic
from datetime import datetime
import sqlite3
import hashlib

client = anthropic.Anthropic()

# Database setup
def init_db():
    conn = sqlite3.connect("ai_saas.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS users (
        id TEXT PRIMARY KEY, api_key TEXT, plan TEXT,
        monthly_token_limit INTEGER, tokens_used INTEGER DEFAULT 0,
        created_at TEXT)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS usage_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id TEXT, prompt TEXT, response TEXT,
        input_tokens INTEGER, output_tokens INTEGER,
        model TEXT, cost_usd REAL, timestamp TEXT)""")
    conn.commit()
    return conn

db = init_db()

class AISaaSService:

    PLANS = {
        "free": {"monthly_tokens": 100_000, "models": ["claude-haiku-4-5-20251001"]},
        "starter": {"monthly_tokens": 1_000_000, "models": ["claude-haiku-4-5-20251001", "claude-sonnet-4-20250514"]},
        "pro": {"monthly_tokens": 10_000_000, "models": ["claude-haiku-4-5-20251001", "claude-sonnet-4-20250514", "claude-opus-4-20250514"]},
    }

    # USD per token. Dated list prices for illustration; check the provider's
    # current pricing before relying on these for billing.
    TOKEN_PRICES = {
        "claude-haiku-4-5-20251001": {"input": 1.0/1e6, "output": 5.0/1e6},
        "claude-sonnet-4-20250514": {"input": 3.0/1e6, "output": 15.0/1e6},
        "claude-opus-4-20250514": {"input": 15.0/1e6, "output": 75.0/1e6},
    }

    def generate(self, user_id: str, prompt: str, model: str = "claude-haiku-4-5-20251001",
                 max_tokens: int = 500, system: str = "") -> dict:

        # 1. Get user
        user = db.execute("SELECT * FROM users WHERE id=?", (user_id,)).fetchone()
        if not user:
            return {"error": "User not found"}

        _, _, plan, token_limit, tokens_used, _ = user

        # 2. Check plan model access
        if model not in self.PLANS.get(plan, {}).get("models", []):
            return {"error": f"Model {model} not available on {plan} plan"}

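        # 3. Check token budget
        # Rough pre-check only: a word count undercounts real tokens, and the
        # authoritative numbers come from response.usage after the call
        estimated_tokens = len(prompt.split()) + max_tokens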
        if tokens_used + estimated_tokens > token_limit:
            return {"error": "Monthly token limit reached. Please upgrade your plan."}

        # 4. Generate
        messages = [{"role": "user", "content": prompt}]
        kwargs = {"model": model, "max_tokens": max_tokens, "messages": messages}
        if system:
            kwargs["system"] = system

        response = client.messages.create(**kwargs)
        output_text = response.content[0].text

        # 5. Track usage
        input_tokens = response.usage.input_tokens
        output_tokens = response.usage.output_tokens
        price = self.TOKEN_PRICES.get(model, {"input": 0, "output": 0})
        cost = input_tokens * price["input"] + output_tokens * price["output"]

        db.execute("""INSERT INTO usage_log
            (user_id, prompt, response, input_tokens, output_tokens, model, cost_usd, timestamp)
            VALUES (?,?,?,?,?,?,?,?)""",
            (user_id, prompt[:500], output_text[:500],
             input_tokens, output_tokens, model, cost, datetime.now().isoformat()))

        db.execute("UPDATE users SET tokens_used = tokens_used + ? WHERE id = ?",
                   (input_tokens + output_tokens, user_id))
        db.commit()

        return {
            "text": output_text,
            "usage": {"input": input_tokens, "output": output_tokens},
            "cost_usd": round(cost, 6)
        }

    def get_usage_stats(self, user_id: str) -> dict:
        user = db.execute("SELECT plan, monthly_token_limit, tokens_used FROM users WHERE id=?",
                         (user_id,)).fetchone()
        if not user:
            return {"error": "User not found"}
        plan, limit, used = user
        return {
            "plan": plan,
            "tokens_used": used,
            "token_limit": limit,
            # Guard against a zero or missing limit
            "usage_pct": round(used / limit * 100, 1) if limit else 0.0,
            "remaining": limit - used
        }
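
The `users` table above stores `api_key` as plain text. A common hardening step is to store only a hash of each key and compare hashes at login time. The sketch below assumes keys are long random strings generated by the service (so a fast hash such as SHA-256 is adequate); the function names and the `sk_` prefix are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_api_key() -> str:
    """Generate a random key to show the user once, at creation time."""
    return "sk_" + secrets.token_urlsafe(32)


def hash_api_key(api_key: str) -> str:
    """Hex SHA-256 digest; this is the value to store in users.api_key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Constant-time comparison of the presented key's hash with the stored hash."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)


# Usage
key = new_api_key()
stored = hash_api_key(key)
print(verify_api_key(key, stored), verify_api_key("sk_wrong", stored))
```

With this scheme a leaked database reveals no usable keys, at the cost that a lost key cannot be recovered and must be reissued.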

05 — AI Coding Workflows

LLMs in Your Development Workflow

The best developers use AI throughout the development process:

Code Generation

def generate_code_from_spec(spec: str, language: str = "python") -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        system=f"""You are an expert {language} developer.
Write production-quality code: typed, documented, with error handling.
Include only code, no explanation.""",
        messages=[{"role": "user", "content": f"Implement this specification:\n\n{spec}"}]
    )
    return response.content[0].text

Automated Code Review

def automated_code_review(code: str, language: str = "python") -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": f"""Review this {language} code. Return JSON:
{{
  "rating": 1-10,
  "critical": [{{"line": "...", "issue": "...", "fix": "..."}}],
  "warnings": ["..."],
  "positives": ["..."],
  "improved_code": "full corrected version"
}}

Code:
```{language}
{code}
```"""
        }]
    )
    try:
        return json.loads(response.content[0].text)
    except json.JSONDecodeError:
        return {"raw": response.content[0].text}

Test Generation

def generate_tests(function_code: str, language: str = "python") -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1500,
        system=f"Write comprehensive {language} unit tests. Cover happy path, edge cases, and error cases.",
        messages=[{"role": "user", "content": f"Write tests for:\n\n```{language}\n{function_code}\n```"}]
    )
    return response.content[0].text

Documentation Generation

def generate_docs(code: str) -> str:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"""Generate complete documentation for this code.
Include: purpose, parameters, return values, examples, error handling.

```python
{code}
```"""
        }]
    )
    return response.content[0].text

CI/CD Integration

# .github/workflows/ai_review.yml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: changed
        run: |
          git diff --name-only origin/main...HEAD > changed_files.txt
          cat changed_files.txt

      - name: Install Anthropic SDK
        run: pip install anthropic

      - name: AI Code Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          python3 << 'EOF'
          import anthropic

          client = anthropic.Anthropic()

          with open("changed_files.txt") as f:
              files = [l.strip() for l in f if l.strip().endswith(".py")]

          for filepath in files[:5]:  # Review up to 5 files
              try:
                  with open(filepath) as f:
                      code = f.read()
              except (OSError, UnicodeDecodeError):
                  # Deleted, renamed, or non-text files show up in the diff too
                  continue

              resp = client.messages.create(
                  model="claude-haiku-4-5-20251001",
                  max_tokens=500,
                  messages=[{
                      "role": "user",
                      "content": f"Quick review of {filepath}. Flag only critical issues (bugs, security, data leaks). Max 5 bullet points.\n\n{code[:3000]}"
                  }]
              )
              print(f"\n## AI Review: {filepath}")
              print(resp.content[0].text)
          EOF

06 — AI Orchestration Systems

What is AI Orchestration?

Orchestration is coordinating multiple AI calls, tools, and services to accomplish complex goals.

Key components:

Router: Decides which agent/model handles a request
Planner: Breaks goals into subtasks
Executor: Runs each subtask
Memory: Passes state between steps
Evaluator: Checks output quality

Simple Orchestration with Claude

class ComplianceOrchestrationSystem:
    """
    Orchestrates multiple AI components for compliance automation:
    - Document ingestion
    - Obligation extraction
    - Risk assessment
    - Report generation
    - Notification routing
    """

    def __init__(self):
        self.client = anthropic.Anthropic()

    def _call_model(self, system: str, prompt: str, model="claude-haiku-4-5-20251001",
                    max_tokens=500, expect_json=False):
        """Return the reply text, or parsed JSON (dict or list) when expect_json=True."""
        resp = self.client.messages.create(
            model=model,
            max_tokens=max_tokens,
            system=system,
            messages=[{"role": "user", "content": prompt}]
        )
        text = resp.content[0].text
        if expect_json:
            try:
                return json.loads(text)
            except json.JSONDecodeError:
                # Fall back to an empty dict so the pipeline keeps running
                return {}
        return text

    def process_regulatory_update(self, regulation_text: str, regulation_name: str) -> dict:
        """Full orchestration pipeline for a new regulatory document"""

        print(f"\n📋 Processing: {regulation_name}")

        # Step 1: Extract key obligations
        print("  1/5 Extracting obligations...")
        obligations = self._call_model(
            system="Expert regulatory analyst. Extract specific compliance obligations.",
            prompt=f"Extract all compliance obligations from this {regulation_name} text as a JSON list. Each item: {{\"obligation\": \"...\", \"deadline\": \"...\", \"applies_to\": \"...\"}}\n\n{regulation_text[:3000]}",
            model="claude-sonnet-4-20250514",
            max_tokens=800,
            expect_json=True
        )

        # Step 2: Classify by impact
        print("  2/5 Classifying impact...")
        impact = self._call_model(
            system="Compliance risk assessor for a payment services company.",
            prompt=f"Classify these obligations by impact on a payment services company. Return JSON: {{\"high_impact\": [...], \"medium_impact\": [...], \"low_impact\": [...]}}\n\nObligations: {json.dumps(obligations)[:1500]}",
            max_tokens=600,
            expect_json=True
        )

        # Step 3: Identify gaps (compare to known controls)
        print("  3/5 Identifying gaps...")
        known_controls = ["KYC process", "GDPR DPO appointed", "SCA implemented", "AML monitoring active"]
        gaps = self._call_model(
            system="Compliance gap analyst.",
            prompt=f"Given these existing controls: {known_controls}\n\nAnd these new obligations: {json.dumps(impact.get('high_impact', []) if isinstance(impact, dict) else [])}\n\nIdentify compliance gaps. Return JSON list of gaps.",
            model="claude-sonnet-4-20250514",
            max_tokens=600,
            expect_json=True
        )

        # Step 4: Generate action plan
        print("  4/5 Generating action plan...")
        action_plan = self._call_model(
            system="Compliance program manager. Create actionable implementation plans.",
            prompt=f"Create an action plan to address these compliance gaps. Include owner, timeline, and resources.\nGaps: {json.dumps(gaps)[:1000]}\nReturn JSON: {{\"actions\": [{{\"action\": \"...\", \"owner\": \"...\", \"deadline_days\": N, \"priority\": \"high/medium/low\"}}]}}",
            model="claude-sonnet-4-20250514",
            max_tokens=800,
            expect_json=True
        )

        # Step 5: Generate executive summary
        print("  5/5 Writing executive summary...")
        summary = self._call_model(
            system="Executive communications specialist. Write clear, concise briefings for senior management.",
            prompt=f"""Write a 3-paragraph executive summary of this regulatory update:
Regulation: {regulation_name}
Key obligations found: {len(obligations) if isinstance(obligations, list) else 'multiple'}
High-impact items: {len(impact.get('high_impact', [])) if isinstance(impact, dict) else 'several'}
Gaps identified: {len(gaps) if isinstance(gaps, list) else 'several'}
Actions required: {len(action_plan.get('actions', [])) if isinstance(action_plan, dict) else 'multiple'}""",
            model="claude-sonnet-4-20250514",
            max_tokens=600
        )

        result = {
            "regulation": regulation_name,
            "obligations_extracted": obligations,
            "impact_classification": impact,
            "gaps_identified": gaps,
            "action_plan": action_plan,
            "executive_summary": summary,
            "processed_at": datetime.now().isoformat()
        }

        print(f"\n✅ Processing complete for {regulation_name}")
        return result

# Usage
system = ComplianceOrchestrationSystem()

sample_regulation = """
DORA Article 17: ICT-related incidents
Financial entities shall establish, implement and maintain a management process to detect, manage and notify ICT-related incidents.
Financial entities shall classify ICT-related incidents and shall determine their impact based on the following criteria:
(a) the number of clients or financial counterparts affected;
(b) the duration of the ICT-related incident;
(c) the geographical spread with regard to the areas affected by the ICT-related incident;
(d) the data losses that the ICT-related incident entails, in relation to availability, authenticity, integrity or confidentiality of data;
(e) the criticality of the services affected;
(f) the economic impact, in particular direct and indirect costs and losses.
"""

result = system.process_regulatory_update(sample_regulation, "DORA Article 17")
print(f"\nExecutive Summary:\n{result['executive_summary']}")
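
In the pipeline above, a failed JSON parse turns into an empty dict that flows silently into the next step. The Evaluator role listed earlier can be as simple as a shape check between steps. The sketch below validates the step 1 output against the keys requested in its prompt; `validate_obligations` is a hypothetical helper, and what to do on failure (retry, stop, flag for review) is left to the caller.

```python
from typing import Any, Dict, List, Tuple

# Keys the step 1 prompt asks the model to return for each obligation
REQUIRED_KEYS = ("obligation", "deadline", "applies_to")


def validate_obligations(payload: Any) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split a model's obligations payload into usable items and problems."""
    if not isinstance(payload, list):
        return [], [f"expected a list, got {type(payload).__name__}"]

    valid: List[Dict[str, str]] = []
    problems: List[str] = []
    for index, item in enumerate(payload):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not an object")
            continue
        missing = [key for key in REQUIRED_KEYS if not item.get(key)]
        if missing:
            problems.append(f"item {index}: missing {', '.join(missing)}")
        else:
            valid.append(item)
    return valid, problems


# Usage with a made-up payload: one good item, one incomplete, one malformed.
payload = [
    {"obligation": "Classify ICT incidents", "deadline": "ongoing",
     "applies_to": "financial entities"},
    {"obligation": "Report major incidents"},
    "free text instead of an object",
]
valid, problems = validate_obligations(payload)
print(len(valid), problems)
```

Running a check like this after each step makes a parse failure visible at the step that caused it, instead of surfacing later as an executive summary that reports zero obligations.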

07 — AI Product Thinking

From Engineer to AI Product Builder

Technical skill is necessary but not sufficient. The best AI engineers also think like product managers.

The AI Product Canvas

Before building anything, answer these questions:

WHO IS THE USER?
  - Who uses this? (Compliance officer? Developer? End consumer?)
  - What is their technical level?
  - What do they care about most?

WHAT IS THE CORE JOB-TO-BE-DONE?
  - What task does this replace or augment?
  - What does success look like for them?
  - How do they measure value?

WHERE DOES AI ADD GENUINE VALUE?
  - What's currently slow, expensive, or error-prone?
  - What would take humans hours that AI can do in seconds?
  - What is the quality bar? (Good enough? Or needs to be perfect?)

WHAT ARE THE FAILURE MODES?
  - What happens when the AI is wrong? Is it recoverable?
  - Who is harmed if quality degrades?
  - What safeguards prevent bad outputs reaching users?

WHAT IS THE BUSINESS MODEL?
  - API cost per user action
  - Pricing strategy (subscription? per-use? per-seat?)
  - Break-even point

HOW DO YOU MEASURE SUCCESS?
  - Accuracy/quality metrics
  - User adoption and retention
  - Cost per interaction
  - Time saved vs baseline
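
The business-model questions reduce to a small amount of arithmetic. As a worked sketch, the functions below compute the API cost of one user action and the number of paying users needed to cover fixed costs. Every number in the usage example (token counts, per-million-token prices, subscription price, fixed costs) is an assumption chosen for illustration, not a quoted price.

```python
import math
from typing import Optional


def cost_per_action_usd(input_tokens: int, output_tokens: int,
                        input_price_per_mtok: float,
                        output_price_per_mtok: float) -> float:
    """API cost of a single user action, given prices per million tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


def break_even_users(fixed_monthly_cost_usd: float, price_per_user_usd: float,
                     actions_per_user: int, cost_per_action: float) -> Optional[int]:
    """Smallest number of paying users that covers fixed monthly costs.

    Returns None when each user costs more in API calls than they pay.
    """
    margin_per_user = price_per_user_usd - actions_per_user * cost_per_action
    if margin_per_user <= 0:
        return None
    return math.ceil(fixed_monthly_cost_usd / margin_per_user)


# Illustrative assumptions: 2,000 input + 500 output tokens per action,
# $3 / $15 per million tokens, 200 actions per user per month,
# $20 per seat, $5,000 of fixed monthly costs.
per_action = cost_per_action_usd(2_000, 500, 3.0, 15.0)
users = break_even_users(5_000, 20.0, 200, per_action)
print(f"cost per action: ${per_action:.4f}, break-even users: {users}")
```

Under these assumptions each action costs about 1.35 cents, each user about $2.70 per month in API calls, and break-even lands at 290 users. Changing one input, such as doubling actions per user, is a quick way to see how sensitive the plan is.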

Common AI Product Failure Modes

Failure	Root Cause	Prevention
“It hallucinates too much”	Wrong model for task, no RAG	Use RAG for factual tasks
“Users don’t trust it”	No transparency, no sources	Show citations, explain confidence
“Too slow”	Model too large, no caching	Right-size model, add caching
“Too expensive to scale”	Overengineered, wrong model	Start cheap, upgrade only where needed
“Nobody uses it”	Solves wrong problem	Talk to users first, build later
“Quality degrades over time”	No eval pipeline	Automated evals in CI/CD

The Right Model for the Right Task

# AI Product Model Router — match task to model economically
class ProductModelRouter:

    def route(self, task_type: str, content: str, quality_required: str = "good") -> str:
        """
        Route to cheapest model that meets quality requirements.
        quality_required: "fast", "good", "best"
        """

        # Fast/cheap for simple classification and extraction
        if task_type in ["classify", "extract_keywords", "yes_no_question", "summarize_short"]:
            return "claude-haiku-4-5-20251001"

        # Medium quality for analysis and drafting
        if task_type in ["analyze", "draft", "compare", "summarize_long"]:
            if quality_required == "fast":
                return "claude-haiku-4-5-20251001"
            return "claude-sonnet-4-20250514"

        # Best quality for complex reasoning
        if task_type in ["complex_reasoning", "legal_analysis", "architecture_design"]:
            return "claude-sonnet-4-20250514"

        # Default: Sonnet (good balance)
        return "claude-sonnet-4-20250514"

router = ProductModelRouter()

# A compliance platform might use:
print(router.route("classify", "document text"))          # haiku = cheap
print(router.route("analyze", "contract text"))           # sonnet = good
print(router.route("complex_reasoning", "architecture"))  # sonnet = best available
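
Routing is one lever for cost and latency; the "add caching" prevention from the failure table is another. The sketch below is a minimal exact-match cache, assuming identical prompts should return identical answers (reasonable for classification, questionable for creative drafting). `ResponseCache` is a hypothetical class, and `fake_generate` stands in for a real model call.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache keyed on a hash of (model, prompt)."""

    def __init__(self, generate: Callable[[str, str], str]):
        self._generate = generate  # called only on a cache miss
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # The NUL separator keeps ("ab", "c") distinct from ("a", "bc")
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._generate(model, prompt)
        self._store[key] = result
        return result


calls = []


def fake_generate(model: str, prompt: str) -> str:
    """Stand-in for an API call; records how often it is invoked."""
    calls.append((model, prompt))
    return f"{model} answer to: {prompt}"


cache = ResponseCache(fake_generate)
cache.get("small-model", "Is this clause GDPR relevant?")
cache.get("small-model", "Is this clause GDPR relevant?")  # served from cache
cache.get("large-model", "Is this clause GDPR relevant?")  # different model, new call
print(f"hits={cache.hits} misses={cache.misses} api_calls={len(calls)}")
```

A production cache would also need an eviction policy and a way to invalidate entries when the prompt template or model version changes.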

Building Toward the FDE Role

For a Forward Deployed Engineer at Anthropic or OpenAI, demonstrate:

Technical Depth

Fine-tuned a model end-to-end (QLoRA → evaluation → deployment)
Built a RAG system with proper chunking, retrieval, and evaluation
Implemented multi-agent workflows with tool use
Set up observability (OpenTelemetry traces, evaluation dashboards)

Domain Expertise

Applied AI to a real business problem (compliance automation)
Understand regulatory requirements (GDPR, PSD2, DORA, Basel III)
Know where AI fails and how to mitigate it in high-stakes domains

Product Thinking

Built something users actually use
Measured quality systematically
Wrote clear technical documentation

Communication

Published technical writing (blog posts, GitHub)
Can explain complex concepts in plain language
Gives internal tech talks (you already do this at Fiserv)

📝 Module 11 Summary

Skill	Key Takeaway
Chatbots	System prompt + conversation history + error handling + logging
Copilots	AI assists human workflows without replacing human judgment
AI Automation	3 levels: single-step, pipeline, agentic — match to use case
AI SaaS	Track usage, enforce limits, manage cost, version prompts
AI Coding	Code gen, review, tests, docs — use AI throughout the SDLC
Orchestration	Coordinate multiple AI components for complex workflows
Product Thinking	Right model, right task, measure quality, manage cost

🧠 Mental Model

Building AI products is like being an architect. You don’t pour concrete yourself — you design the system that works. Pick the right materials (models), design the right structure (prompts, agents, RAG), measure what matters (evals), and make it affordable at scale (cost analysis). The building is the product. The architect is you.

❌ Final Beginner Mistakes

Over-engineering before validating — Build a 1-prompt MVP first. Does it solve the problem?
Ignoring hallucinations in production — Add grounding, citations, and validation for factual tasks
No human fallback — Always have a way to escalate to humans for critical decisions
Single model for everything — Route tasks to the right model by complexity and cost
No monitoring — You can’t improve what you don’t measure
Skipping evals — Build your eval suite first, before you build the product

🏋️ Final Capstone Exercise

Build an enterprise-ready compliance automation product.

The prototype below is the starting point, not the finish line. For enterprise completion, submit an implementation packet that proves the system can be reviewed, measured, and operated.

Capstone Brief

Build a compliance document processor that ingests regulatory text, extracts obligations, classifies risk, recommends actions, writes an executive summary, and produces evaluation evidence.

Required users:

Compliance analyst reviewing regulatory obligations.
Engineering owner responsible for implementation and operations.
Risk/security reviewer approving whether the workflow can run on enterprise data.

Required deliverables:

Deliverable	Required contents
Use-case brief	User, business value, data classification, risk tier, non-goals
Architecture	Data flow, model calls, RAG/agent decisions, access boundaries, fallback path
Implementation	Runnable code or notebook, setup instructions, sample inputs, structured outputs
Evaluation	Baseline, locked test set, quality metrics, safety/privacy cases, release threshold
Governance packet	Data card, model inventory entry, human oversight plan, approval checklist
Security controls	Identity assumption, RBAC/ABAC plan, secrets handling, logging/redaction policy
Operations	SLOs, monitoring signals, incident runbook, rollback plan, change record
Demo script	5-10 minute walkthrough with success case, failure case, and release decision
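
The governance deliverables are easier to review when they are structured records rather than free text. Here is a minimal sketch of a data card and a model inventory entry. The field names mirror the Required Enterprise Extensions list later in this lesson; the class names, the `missing_fields` helper, and all example values are hypothetical.

```python
from dataclasses import dataclass, asdict, fields
import json

@dataclass
class DataCard:
    source: str
    license: str
    sensitivity: str
    pii_status: str
    retention: str
    deletion: str
    owner: str

@dataclass
class ModelInventoryEntry:
    model: str
    provider: str
    approved_use: str
    fallback: str
    retention_setting: str
    owner: str

def missing_fields(record) -> list:
    """Return the names of fields left blank, so incomplete packets are easy to spot."""
    return [f.name for f in fields(record) if not str(getattr(record, f.name)).strip()]

# Example values are invented for illustration.
card = DataCard(
    source="Public regulatory text",
    license="Public sector reuse",
    sensitivity="public",
    pii_status="none",
    retention="12 months",
    deletion="on request",
    owner="",  # deliberately blank to show the completeness check
)
entry = ModelInventoryEntry(
    model="claude-haiku-4-5-20251001",
    provider="Anthropic",
    approved_use="Obligation extraction on public regulatory text",
    fallback="Manual analyst review",
    retention_setting="provider default",
    owner="compliance-engineering",
)

print(json.dumps(asdict(card), indent=2))
print("Data card gaps:", missing_fields(card))
print("Inventory gaps:", missing_fields(entry))
```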

Acceptance Criteria

The capstone passes only if:

The workflow returns structured JSON for obligations, risk, actions, summary, and metadata.
The system refuses or escalates when the document is outside scope or too risky.
The evaluation suite compares the capstone against a baseline prompt or previous version.
At least 5 failure cases are documented with severity and remediation.
Prompt/response logging is privacy-safe by default.
Human review is required before high-risk recommendations become actions.
The release decision is explicit: approve, approve with conditions, or block.
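
Several of these criteria can be checked mechanically. The sketch below validates the output structure and routes high-risk results to a reviewer. It assumes the key names returned by the starter implementation later in this lesson; the function names and the three outcome labels are hypothetical.

```python
REQUIRED_KEYS = {"obligations", "risk", "recommended_actions", "executive_summary", "metadata"}
ALLOWED_RISK_LEVELS = {"low", "medium", "high", "critical"}

def validate_result(result: dict) -> list:
    """Return a list of structural problems; an empty list means the result is well formed."""
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    risk = result.get("risk")
    if not isinstance(risk, dict) or risk.get("level") not in ALLOWED_RISK_LEVELS:
        problems.append("risk.level must be one of low/medium/high/critical")
    if not isinstance(result.get("obligations"), list):
        problems.append("obligations must be a list")
    if not isinstance(result.get("recommended_actions"), list):
        problems.append("recommended_actions must be a list")
    return problems

def next_step(result: dict) -> str:
    """Decide what happens to a result: 'reject', 'human_review', or 'auto'."""
    if validate_result(result):
        return "reject"        # malformed output never reaches a user
    if result["risk"]["level"] in {"high", "critical"}:
        return "human_review"  # high-risk recommendations need approval first
    return "auto"

ok = {
    "obligations": ["Report major incidents"],
    "risk": {"level": "high", "reason": "Strict deadlines"},
    "recommended_actions": [{"action": "Define reporting runbook", "priority": "high"}],
    "executive_summary": "Summary text.",
    "metadata": {},
}
bad = {"obligations": "not a list", "risk": {"level": "severe"}}

print(next_step(ok))
print(validate_result(bad))
```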

Capstone Rubric

Score out of 100:

Category	Points
Use-case framing	10
Architecture and access boundaries	15
Working implementation	15
Evaluation and failure analysis	15
Governance packet	15
Security and privacy controls	10
Operations and rollback	10
Demo and communication	10

Enterprise-ready completion requires 85+.
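
The rubric arithmetic can be captured in a few lines, which is handy for self-assessment. This is a minimal sketch: the category weights and the 85-point threshold come from the table above, while the function name and the example scores are made up.

```python
RUBRIC = {
    "Use-case framing": 10,
    "Architecture and access boundaries": 15,
    "Working implementation": 15,
    "Evaluation and failure analysis": 15,
    "Governance packet": 15,
    "Security and privacy controls": 10,
    "Operations and rollback": 10,
    "Demo and communication": 10,
}
PASS_THRESHOLD = 85

def score_capstone(awarded: dict) -> tuple:
    """Return (total, passed). Unscored categories count as zero."""
    for category, points in awarded.items():
        if category not in RUBRIC:
            raise ValueError(f"Unknown category: {category}")
        if not 0 <= points <= RUBRIC[category]:
            raise ValueError(f"{category}: {points} is outside 0..{RUBRIC[category]}")
    total = sum(awarded.get(category, 0) for category in RUBRIC)
    return total, total >= PASS_THRESHOLD

# Invented example scores for one submission.
example = {
    "Use-case framing": 9,
    "Architecture and access boundaries": 13,
    "Working implementation": 14,
    "Evaluation and failure analysis": 12,
    "Governance packet": 13,
    "Security and privacy controls": 9,
    "Operations and rollback": 8,
    "Demo and communication": 9,
}
total, passed = score_capstone(example)
print(f"Total: {total}/100, enterprise-ready: {passed}")
```

Because the threshold is 85, a submission that scores zero in any 15-point category can pass only with full marks everywhere else.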

Starter Implementation

```python
"""
CAPSTONE: Compliance Document Processor

Features to implement:
1. Document ingestion (text input)
2. Obligation extraction (SFT-style prompting)
3. Risk classification (few-shot prompting)
4. Action recommendations (chain-of-thought)
5. Executive summary (output formatting)
6. Evaluation (LLM-as-judge)
7. Cost tracking (token counting)

This demonstrates: prompting, pipelines, evaluation, and product thinking.
"""

import anthropic
import json
import time

client = anthropic.Anthropic()

def process_compliance_document(document: str, document_name: str) -> dict:
    total_tokens = {"input": 0, "output": 0}
    start_time = time.time()

    def call(prompt: str, system: str = "", model="claude-haiku-4-5-20251001", max_tokens=500) -> str:
        resp = client.messages.create(
            model=model, max_tokens=max_tokens,
            system=system or "You are a compliance expert.",
            messages=[{"role": "user", "content": prompt}]
        )
        total_tokens["input"] += resp.usage.input_tokens
        total_tokens["output"] += resp.usage.output_tokens
        return resp.content[0].text

    # 1. Extract obligations
    raw_obligations = call(
        f"Extract compliance obligations as JSON list of strings:\n\n{document[:2000]}",
        max_tokens=400
    )
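    try:
        obligations = json.loads(raw_obligations)
    except json.JSONDecodeError:
        obligations = [raw_obligations]
    if not isinstance(obligations, list):
        obligations = [obligations]  # keep a list so the slicing below is safe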

    # 2. Classify risk
    risk_result = call(
        f"Classify overall risk: low/medium/high/critical. Return JSON: {{\"level\": \"...\", \"reason\": \"...\"}}\n\nObligations: {json.dumps(obligations[:5])}",
        max_tokens=200
    )
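    try:
        risk = json.loads(risk_result)
    except json.JSONDecodeError:
        risk = {"level": "medium", "reason": risk_result}
    if not isinstance(risk, dict):
        risk = {"level": "medium", "reason": str(risk)}  # later code calls risk.get()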

    # 3. Recommend actions
    actions = call(
        f"List 3 concrete actions to address these obligations. Return JSON list: [{{\"action\": \"...\", \"priority\": \"high/medium/low\"}}]\n\nObligations: {json.dumps(obligations[:5])}",
        max_tokens=400
    )
    try:
        action_list = json.loads(actions)
    except json.JSONDecodeError:
        action_list = [{"action": actions, "priority": "medium"}]
    if not isinstance(action_list, list):
        action_list = [action_list]  # keep a list so callers can iterate

    # 4. Executive summary
    summary = call(
        f"Write a 2-sentence executive summary of this compliance document and its implications.\nDocument: {document_name}\nRisk: {risk.get('level')}\nKey obligations: {len(obligations)}",
        model="claude-haiku-4-5-20251001",
        max_tokens=150
    )

    # 5. Self-evaluate quality
    quality = call(
        f"Rate this compliance analysis quality (1-5) and explain. Return JSON: {{\"score\": N, \"reason\": \"...\"}}\n\nAnalysis:\nObligations: {len(obligations)}\nRisk: {risk}\nActions: {len(action_list)}\nSummary: {summary}",
        max_tokens=150
    )
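    try:
        quality_score = json.loads(quality)
    except json.JSONDecodeError:
        quality_score = {"score": 3, "reason": "Unable to evaluate"}
    if not isinstance(quality_score, dict):
        quality_score = {"score": quality_score, "reason": ""}  # callers use .get()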

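    # Cost calculation in USD per million tokens.
    # Assumed Claude Haiku 4.5 list prices ($1.00 input, $5.00 output); verify current pricing.
    total_cost = (total_tokens["input"] * 1.00 + total_tokens["output"] * 5.00) / 1e6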
    elapsed = round(time.time() - start_time, 2)

    return {
        "document_name": document_name,
        "obligations_count": len(obligations),
        "obligations": obligations[:5],  # First 5 for display
        "risk": risk,
        "recommended_actions": action_list,
        "executive_summary": summary,
        "quality_score": quality_score,
        "metadata": {
            "total_input_tokens": total_tokens["input"],
            "total_output_tokens": total_tokens["output"],
            "total_cost_usd": round(total_cost, 6),
            "processing_time_sec": elapsed
        }
    }

# Test it (the sample text is a simplified paraphrase for demonstration, not verbatim regulatory text)
sample_doc = """
DORA Article 19 - Reporting of major ICT-related incidents:
Financial entities shall report major ICT-related incidents to the competent authority.
The initial notification shall be submitted as soon as possible and no later than 4 hours
from the moment the financial entity has become aware that the incident qualifies as major.
The intermediate report shall be submitted within 72 hours of the initial notification.
The final report shall be submitted within one month after the submission of the intermediate report.
Financial entities shall also notify clients potentially affected by the major ICT-related incident.
"""

result = process_compliance_document(sample_doc, "DORA Article 19 - Incident Reporting")

print("=" * 60)
print(f"Document: {result['document_name']}")
print(f"Obligations found: {result['obligations_count']}")
print(f"Risk level: {result['risk'].get('level', 'unknown').upper()}")
print(f"\nExecutive Summary:\n{result['executive_summary']}")
print(f"\nRecommended Actions:")
for a in result['recommended_actions']:
    if isinstance(a, dict):
        print(f"  [{a.get('priority', 'medium').upper()}] {a.get('action', a)}")
print(f"\nQuality Score: {result['quality_score'].get('score', '?')}/5")
print(f"\nCost: ${result['metadata']['total_cost_usd']} | Time: {result['metadata']['processing_time_sec']}s")
```
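
Models often wrap JSON in Markdown fences or add a sentence around it, in which case a bare `json.loads` fails and the starter code silently falls back to defaults. A tolerant parser reduces those silent fallbacks. This is a minimal sketch; `parse_json_response` is a hypothetical helper, not part of any SDK.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so the example can sit inside a fenced block
FENCED_BLOCK = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)
JSON_SPAN = re.compile(r"(\[.*\]|\{.*\})", re.DOTALL)

def parse_json_response(text: str, fallback):
    """Parse model output as JSON, tolerating Markdown fences and surrounding prose."""
    cleaned = text.strip()

    # Case 1: the whole response is a fenced block.
    fenced = FENCED_BLOCK.match(cleaned)
    if fenced:
        cleaned = fenced.group(1)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass

    # Case 2: JSON is embedded in prose; try the widest [...] or {...} span.
    span = JSON_SPAN.search(cleaned)
    if span:
        try:
            return json.loads(span.group(1))
        except json.JSONDecodeError:
            pass

    # Nothing parseable: return the caller's fallback instead of raising.
    return fallback

print(parse_json_response(FENCE + 'json\n["a", "b"]\n' + FENCE, []))
print(parse_json_response('Here you go: {"level": "high"} Hope that helps.', {}))
print(parse_json_response("no structured data here", ["fallback"]))
```

In the starter code, each `try`/`json.loads` pair could be swapped for one call to this helper, passing the existing default as `fallback`.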

**Challenge:** Extend this into a Streamlit or FastAPI app. Add a database. Add multiple documents. Track quality over time. That's a real AI product.
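
One way to start on "add a database" and "track quality over time" is a single SQLite table of run results. This sketch uses only the standard library and assumes the result dictionary shape returned by `process_compliance_document`; the table name, column names, and sample results are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the run-history database."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS runs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at TEXT NOT NULL,
            document_name TEXT NOT NULL,
            risk_level TEXT,
            quality_score REAL,
            cost_usd REAL
        )
    """)
    return conn

def record_run(conn: sqlite3.Connection, result: dict) -> None:
    """Store summary metrics only, not the document text."""
    conn.execute(
        "INSERT INTO runs (created_at, document_name, risk_level, quality_score, cost_usd) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            result["document_name"],
            result["risk"].get("level"),
            result["quality_score"].get("score"),
            result["metadata"]["total_cost_usd"],
        ),
    )
    conn.commit()

def quality_summary(conn: sqlite3.Connection) -> dict:
    """Aggregate run count, average quality, and total cost across all runs."""
    runs, avg_quality, total_cost = conn.execute(
        "SELECT COUNT(*), AVG(quality_score), SUM(cost_usd) FROM runs"
    ).fetchone()
    return {"runs": runs, "avg_quality": avg_quality, "total_cost_usd": total_cost}

# Invented results standing in for real pipeline output.
conn = init_db()
for name, level, score, cost in [("doc-a", "high", 4, 0.002), ("doc-b", "low", 2, 0.004)]:
    record_run(conn, {
        "document_name": name,
        "risk": {"level": level},
        "quality_score": {"score": score},
        "metadata": {"total_cost_usd": cost},
    })
summary = quality_summary(conn)
print(summary)
```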

### Required Enterprise Extensions

Add these before considering the capstone complete:

1. **Data card:** source, license, sensitivity, PII status, retention, deletion, and owner.
2. **Model inventory entry:** model, provider, approved use, fallback, retention setting, and owner.
3. **Evaluation suite:** 10+ test documents or questions with expected topics and failure severities.
4. **Safety tests:** prompt injection, out-of-scope request, missing evidence, and legal-advice escalation.
5. **Privacy-safe telemetry:** request ID, model, token counts, latency, eval version, and document IDs; no raw prompt logging by default.
6. **Human oversight:** high-risk outputs require reviewer approval before recommended actions are executed.
7. **Release gate:** a final markdown report with pass/fail thresholds and release decision.
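
Items 3 and 7 fit together: the evaluation suite produces the numbers that the release gate checks. Below is a minimal sketch of an evaluation harness using expected topics and failure severities. `EvalCase`, `run_eval`, and `stub_extract` are hypothetical names, and the stub stands in for the real model call so the harness can be tested offline.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    case_id: str
    document: str
    expected_topics: list  # strings that should appear in the extracted obligations
    severity: str          # impact if this case fails: low, medium, high, or critical

def run_eval(cases: list, extract_fn) -> dict:
    """Run every case through extract_fn and summarize passes and failures."""
    failures = []
    for case in cases:
        obligations = extract_fn(case.document)
        text = " ".join(str(o) for o in obligations).lower()
        missing = [t for t in case.expected_topics if t.lower() not in text]
        if missing:
            failures.append(
                {"case_id": case.case_id, "severity": case.severity, "missing": missing}
            )
    passed = len(cases) - len(failures)
    return {
        "pass_rate": passed / len(cases) if cases else 0.0,
        "critical_failures": sum(1 for f in failures if f["severity"] == "critical"),
        "failures": failures,
    }

def stub_extract(document: str) -> list:
    """Stand-in for the model call: treats each sentence as one obligation."""
    return [s.strip() for s in document.split(".") if s.strip()]

cases = [
    EvalCase("incident-01",
             "Entities shall report major incidents. Clients shall be notified.",
             ["report", "clients"], "high"),
    EvalCase("records-01",
             "Controllers shall keep records.",
             ["records", "retention"], "critical"),
]
report = run_eval(cases, stub_extract)
print(report)
```

The `pass_rate` and `critical_failures` keys match what the `release_gate` function in the wrapper skeleton expects. Substring matching is a crude proxy for correctness, so treat it as a first filter alongside human or LLM-as-judge review.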

### Enterprise Wrapper Skeleton

Use this wrapper pattern to connect the prototype code to enterprise evidence.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from hashlib import sha256

@dataclass
class ReviewDecision:
    approved: bool
    reviewer: str
    reason: str

def hash_text(value: str) -> str:
    """Short fingerprint so logs can reference a document without storing its text."""
    return sha256(value.encode("utf-8")).hexdigest()[:16]

def log_safe_event(event: dict) -> None:
    """Log metadata, not raw regulated content."""
    safe_event = {
        # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": event["request_id"],
        "document_hash": hash_text(event["document_text"]),
        "model": event["model"],
        "input_tokens": event["input_tokens"],
        "output_tokens": event["output_tokens"],
        "latency_ms": event["latency_ms"],
        "risk_level": event["risk_level"],
        "release_gate_version": event["release_gate_version"],
    }
    print(safe_event)  # swap for your structured logger in production

def requires_human_review(result: dict) -> bool:
    # Normalize case so "High" or "CRITICAL" from the model still triggers review.
    return str(result["risk"].get("level", "")).lower() in {"high", "critical"}

def release_gate(eval_results: dict) -> dict:
    return {
        "quality_pass": eval_results["pass_rate"] >= 0.85,
        "privacy_pass": eval_results["privacy_failures"] == 0,
        "safety_pass": eval_results["critical_failures"] == 0,
        "cost_pass": eval_results["avg_cost_usd"] <= 0.15,
    }
```
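
The gate above returns individual checks, while the acceptance criteria ask for an explicit decision. The sketch below shows one possible mapping and renders a short markdown report. The policy is an assumption: quality, privacy, and safety failures block, and a cost miss alone yields a conditional approval. Both function names are hypothetical.

```python
def release_decision(gate: dict) -> str:
    """Map gate checks to 'approve', 'approve with conditions', or 'block'."""
    # Assumed policy: quality, privacy, and safety are hard requirements.
    if not (gate["quality_pass"] and gate["privacy_pass"] and gate["safety_pass"]):
        return "block"
    # Assumed policy: exceeding the cost target is fixable after release.
    if not gate["cost_pass"]:
        return "approve with conditions"
    return "approve"

def render_report(gate: dict) -> str:
    """Render the gate results and decision as a small markdown report."""
    lines = ["# Release Gate Report", "", "| Check | Result |", "| --- | --- |"]
    for check, ok in gate.items():
        lines.append(f"| {check} | {'pass' if ok else 'fail'} |")
    lines += ["", f"**Decision:** {release_decision(gate)}"]
    return "\n".join(lines)

example_gate = {
    "quality_pass": True,
    "privacy_pass": True,
    "safety_pass": True,
    "cost_pass": False,
}
print(render_report(example_gate))
```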

🎓 Curriculum Complete

Congratulations. You’ve covered:

Module	Topics
01 Foundations	LLMs, transformers, tokens, embeddings, parameters, training
02 Datasets	SFT, instruction tuning, preferences, synthetic data, cleaning
03 Fine-Tuning	LoRA, QLoRA, DPO, RLHF, quantization, GGUF
04 Inference	KV cache, Flash Attention, speculative decoding, serving, GPU
05 Ecosystem	llama.cpp, Ollama, vLLM, MLX, HuggingFace, Unsloth, Axolotl
06 RAG & Memory	RAG, vector DBs, chunking, retrieval, memory systems
07 Agents	Prompting, system prompts, tool calling, agents, multi-agent
08 Model Types	VLMs, SLMs, dense, MoE, coding models, reasoning models
09 Deployment	Local, on-device, API serving, cloud GPUs, edge AI
10 Evaluation	Benchmarks, human evals, LLM-as-judge, cost analysis, speed
11 Real-World	Chatbots, copilots, automation, SaaS, coding, orchestration, product
12 Governance	Risk classification, data governance, security controls, release gates, monitoring, incident response

What to Build Next

Given your background, these are the highest-value next projects:

1. **Compliance Automation System** (FDE-targeting project)
   - Ingest regulatory PDFs → RAG pipeline → Claude API → structured output
   - Add evaluation suite + observability
   - Document it on GitHub as your flagship project
2. **Fine-tuned Compliance Model**
   - Build a 200+ example SFT dataset from real regulatory text
   - QLoRA fine-tune on Llama 3.1 8B
   - Evaluate vs base model + Claude Haiku
   - Publish model + results on Hugging Face
3. **Publish What You Build**
   - Technical blog post on yellamaraju.com for each module you implement
   - LinkedIn posts with benchmarks and screenshots
   - GitHub repo with clean code and documentation

The skills are now yours. Build with them.

End of LLM Mastery Curriculum