Cycles & Reflection: Advanced | Praveen Srinag Yellamaraju

This lesson focuses on Cycles & Reflection at the advanced level. Use it to move from definition to implementation-ready explanation.

Concept

Advanced patterns. LATS (Language Agent Tree Search) combines reflection with Monte Carlo Tree Search: generate multiple candidates via Send API fan-out, score each, and expand the most promising. Confidence-threshold routing stops the loop once the LLM reports high confidence via structured output. Cost control is critical: use cheap models for critique and an expensive model only for final generation.

Key Facts

LATS: Send API fan-out + scoring + tree pruning for planning problems
Confidence threshold: route to END when structured-output confidence exceeds a threshold (e.g., 0.85), with an iteration cap as a fallback
Parallel critique: fan-out to multiple critics, aggregate weighted scores
Cost control: cheap model for critique, expensive model for final generation only
Streaming reflection: astream_events() streams intermediate drafts to UI
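
The "aggregate weighted scores" step in the parallel-critique fact can be sketched in plain Python. This is a minimal illustration, not a LangGraph API: the function name `aggregate_scores`, the critic names, and the weights are all hypothetical, and the choice of a weighted mean is an assumption (other aggregations, such as taking the minimum score, are equally plausible).

```python
from typing import Dict


def aggregate_scores(critic_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of critic scores; critics missing from `weights` get weight 1.0."""
    total_weight = sum(weights.get(name, 1.0) for name in critic_scores)
    if total_weight == 0:
        raise ValueError("at least one critic must have a non-zero weight")
    weighted_sum = sum(score * weights.get(name, 1.0) for name, score in critic_scores.items())
    return weighted_sum / total_weight


# Hypothetical critics and weights: accuracy counts twice as much as the others.
critic_scores = {"accuracy": 0.9, "style": 0.6, "safety": 1.0}
weights = {"accuracy": 2.0, "style": 1.0, "safety": 1.0}
combined = aggregate_scores(critic_scores, weights)
```

The combined score can then feed the same threshold check used for confidence routing.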

Reference Implementation

from langgraph.types import Send
from typing import TypedDict, Annotated, List
import operator

def add_int(old: int, new: int) -> int:
    return old + new

class LATSState(TypedDict):
    task: str
    candidates: Annotated[List[str], operator.add]
    scores: Annotated[List[float], operator.add]
    iteration: Annotated[int, add_int]
    best_candidate: str

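def generate_candidates(state: LATSState):
    # Fan-out: generate 3 diverse candidates in parallel.
    # This returns Send objects, so wire it as a conditional-edge function
    # (add_conditional_edges) rather than as a regular node.
    return [Send("gen_one", {"task": state["task"], "seed": i}) for i in range(3)]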

def gen_one(state: dict):
    draft = f"Candidate {state['seed']}: {state['task'][:30]}"
    return {"candidates": [draft]}

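def score_all(state: LATSState):
    # Score only candidates that have no score yet, so the two reducer-appended
    # lists (candidates, scores) stay index-aligned across iterations.
    # Stub scoring; in production, call judge_llm.invoke on each new candidate.
    start = len(state["scores"])
    new_scores = [0.6 + i * 0.1 for i in range(len(state["candidates"]) - start)]
    all_scores = state["scores"] + new_scores
    best_idx = all_scores.index(max(all_scores))
    return {
        "scores": new_scores,  # appended to existing scores by the operator.add reducer
        "best_candidate": state["candidates"][best_idx],
        "iteration": 1,  # added to the running count by the add_int reducer
    }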

def should_continue(state: LATSState) -> str:
    if state["iteration"] >= 3 or max(state["scores"], default=0) > 0.85:
        return "end"
    return "generate_candidates"
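
To see why the Annotated reducers matter here, the following plain-Python sketch imitates how partial node updates are merged into state: keys with a reducer are combined with the old value, and other keys are overwritten. It is an illustration only, not LangGraph's actual merge code; `REDUCERS` and `apply_update` are hypothetical names.

```python
import operator

# Mirrors the reducer annotations on LATSState; keys not listed are overwritten.
REDUCERS = {
    "candidates": operator.add,
    "scores": operator.add,
    "iteration": operator.add,
}


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update: reduce annotated keys, overwrite the rest."""
    merged = dict(state)
    for key, value in update.items():
        if key in REDUCERS and key in merged:
            merged[key] = REDUCERS[key](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"task": "plan a trip", "candidates": [], "scores": [], "iteration": 0, "best_candidate": ""}

# Three parallel gen_one branches each contribute a one-element list.
for seed in range(3):
    state = apply_update(state, {"candidates": [f"Candidate {seed}"]})

# The scoring node returns 1 for iteration, which the reducer adds to the count.
state = apply_update(
    state,
    {"scores": [0.6, 0.7, 0.8], "best_candidate": "Candidate 2", "iteration": 1},
)
```

Because list reducers append rather than replace, a node that returns values for every candidate on every pass makes the lists grow faster than expected, which is worth checking when a loop runs more than once.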

Interview Q&A

Q1. How would you implement a confidence-based loop that stops when certain enough?

Add a confidence field to state. In your generation node, prompt the LLM to output confidence 0-1 alongside the answer using with_structured_output. In the routing function: if confidence > threshold (e.g., 0.85) route to END; else route back to generate with previous result as context. Calibrate the threshold empirically using your eval dataset.
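
A minimal sketch of the routing function described above, written without any library dependency. The state shape `ReflectState`, the labels "end" and "generate", and the iteration cap are assumptions for illustration; in a graph, the returned labels would be mapped to END and to the generation node.

```python
from typing import TypedDict

CONFIDENCE_THRESHOLD = 0.85  # calibrate against an eval dataset
MAX_ITERATIONS = 3           # assumed safety cap so the loop always terminates


class ReflectState(TypedDict):
    answer: str
    confidence: float  # self-reported by the LLM via structured output, in [0, 1]
    iteration: int


def route_on_confidence(state: ReflectState) -> str:
    """Return 'end' when confident enough or out of budget, else 'generate'."""
    if state["confidence"] > CONFIDENCE_THRESHOLD:
        return "end"
    if state["iteration"] >= MAX_ITERATIONS:
        return "end"
    return "generate"


decision_low = route_on_confidence({"answer": "draft", "confidence": 0.6, "iteration": 1})
decision_high = route_on_confidence({"answer": "final", "confidence": 0.92, "iteration": 1})
decision_capped = route_on_confidence({"answer": "draft", "confidence": 0.6, "iteration": 3})
```

The iteration cap matters because self-reported confidence is not guaranteed to rise; without it, a model that stays below the threshold would loop indefinitely.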

Q2. Explain LATS and when to use it over simple reflection.

LATS generates multiple candidate responses, evaluates each, expands the most promising, and backtracks from dead ends, much like MCTS. Use it when the answer space is large and diverse, when simple reflection keeps converging to the same bad local optimum, and when you have budget for 10-50 LLM calls per query. Standard reflection suffices for most production use cases.
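
The expand-and-backtrack idea can be sketched with a simplified best-first search. This is deliberately not full LATS or MCTS (there is no UCT selection or value backpropagation); `best_first_search`, `expand`, and `score` are hypothetical names, and the stubs stand in for LLM generation and judge calls.

```python
import heapq
from typing import Callable, List, Tuple


def best_first_search(
    root: str,
    expand: Callable[[str], List[str]],
    score: Callable[[str], float],
    max_expansions: int = 4,
    threshold: float = 0.85,
) -> Tuple[str, float]:
    """Repeatedly expand the highest-scoring unexpanded node; stop early above threshold."""
    root_score = score(root)
    best, best_score = root, root_score
    # heapq is a min-heap, so scores are negated; the counter breaks ties.
    frontier = [(-root_score, 0, root)]
    counter = 1
    for _ in range(max_expansions):
        if not frontier or best_score > threshold:
            break
        _, _, node = heapq.heappop(frontier)
        for child in expand(node):
            child_score = score(child)
            if child_score > best_score:
                best, best_score = child, child_score
            heapq.heappush(frontier, (-child_score, counter, child))
            counter += 1
    return best, best_score


# Stubs: each node has two children, and "b" steps are the good moves.
def expand(node: str) -> List[str]:
    return [node + "a", node + "b"]


def score(node: str) -> float:
    return min(1.0, 0.3 * node.count("b"))


best, best_score = best_first_search("", expand, score)
```

Weaker siblings stay in the frontier, so if a promising branch stops improving, the search falls back to them. That fallback is the backtracking behaviour simple reflection lacks.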

Q3. How do you control costs in production reflection loops?

Use a cheap, fast model for critique (GPT-4o-mini, Claude Haiku) and an expensive model only for final generation. Cap iterations at 2-3 and measure quality uplift per iteration; returns often diminish after round 2. Track cost per query in LangSmith and set budget alerts. Cache critiques for identical drafts.
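
Caching critiques for identical drafts could look like the sketch below. The class `CachedCritic`, the stub critique function, and the per-call cost figure are hypothetical; a real setup would wrap a cheap judge model and take prices from the provider.

```python
import hashlib
from typing import Callable, Dict


class CachedCritic:
    """Wraps a critique function and skips the call when the draft was seen before."""

    def __init__(self, critique_fn: Callable[[str], str], cost_per_call: float):
        self._critique_fn = critique_fn
        self._cost_per_call = cost_per_call
        self._cache: Dict[str, str] = {}
        self.calls = 0  # number of real (uncached) critique calls

    def critique(self, draft: str) -> str:
        # Hash the draft so the cache key stays small for long texts.
        key = hashlib.sha256(draft.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._critique_fn(draft)
            self.calls += 1
        return self._cache[key]

    @property
    def total_cost(self) -> float:
        return self.calls * self._cost_per_call


# Stub critic and an assumed cost of 0.002 per call, for illustration only.
critic = CachedCritic(lambda draft: f"Tighten: {draft[:20]}", cost_per_call=0.002)
first = critic.critique("Draft about reflection loops")
repeat = critic.critique("Draft about reflection loops")  # served from cache
other = critic.critique("A different draft")
```

An exact-match cache only helps when a loop regenerates the same draft, so it is a complement to iteration caps rather than a replacement.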

Q4. Why combine Send with reflection?

Send lets you generate or critique multiple candidates in parallel, then reduce their scores before choosing the next branch. This gives reflection breadth without hiding all work inside one opaque node.
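
As a library-independent analogue of this fan-out and reduce shape, the sketch below critiques candidates in parallel with a thread pool and then reduces the scores to pick a winner. It does not use Send itself; `critique` is a stub with a made-up scoring rule, and `fan_out_and_reduce` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def critique(candidate: str) -> Tuple[str, float]:
    # Stub critic (hypothetical rule): longer drafts score higher, capped at 1.0.
    return candidate, min(1.0, len(candidate) / 40)


def fan_out_and_reduce(candidates: List[str]) -> Tuple[str, float]:
    """Critique every candidate in parallel, then reduce to the best (candidate, score)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        scored = list(pool.map(critique, candidates))  # map step: one critique each
    return max(scored, key=lambda pair: pair[1])  # reduce step: keep the top score


candidates = [
    "short draft",
    "a somewhat longer draft",
    "the longest and most detailed draft here",
]
best_candidate, best_score = fan_out_and_reduce(candidates)
```

In a graph, each critique would be its own visible node invocation, which is the observability benefit the answer above describes.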

Q5. What makes LATS expensive?

LATS expands multiple candidates over multiple iterations, so model calls grow quickly. Use strict depth limits, candidate pruning, cached scores, and cheaper judge models to keep search from dominating cost.
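
A rough back-of-the-envelope estimate shows how quickly calls grow and how much pruning helps. The cost model is an assumption: one generation call and one judge call per candidate, with `beam_width` (a hypothetical parameter) limiting how many candidates are expanded at the next level.

```python
from typing import Optional


def estimate_llm_calls(branching: int, depth: int, beam_width: Optional[int] = None) -> int:
    """Count generation + judge calls for a search of the given branching and depth."""
    calls = 0
    frontier = 1  # number of nodes being expanded at the current level
    for _ in range(depth):
        candidates = frontier * branching
        calls += 2 * candidates  # one generation call and one judge call per candidate
        # Without pruning every candidate is expanded; with pruning only the top few are.
        frontier = candidates if beam_width is None else min(candidates, beam_width)
    return calls


unpruned = estimate_llm_calls(branching=3, depth=3)              # 6 + 18 + 54 calls
pruned = estimate_llm_calls(branching=3, depth=3, beam_width=1)  # 6 + 6 + 6 calls
```

Under these assumptions, keeping only the single best candidate per level cuts a 3-wide, 3-deep search from 78 calls to 18, and cached scores or a cheaper judge reduce the cost of each remaining call.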

Practice Task

Explain when this LangGraph pattern is safer than a linear chain, then name one production failure it prevents.
