Deep Dive: learn-claude-code

A nano Claude Code–like agent, built from 0 to 1 in 12 progressive sessions

shareAI-lab • MIT License • Python • ~7,000 LOC across both repos

🔍 Overview

This repo is a teaching project that reverse-engineers how Claude Code works internally. It builds a minimal agent from scratch in 12 progressive sessions — each session adds exactly one mechanism, and the core agent loop never changes.

🧠 Core Philosophy

"The model IS the agent. Our job is to give it tools and stay out of the way." — The entire agent is a while loop that calls tools until the model's stop_reason is no longer "tool_use". Everything else layers on top without modifying this loop.

The progression is clean and well-structured.

📊 At a Glance

  • 12 sessions
  • 1 capstone (s_full.py)
  • ~900 lines in the capstone
  • 23 tools in the full agent
  • 3 languages (EN/ZH/JA)
  • 1 sister repo (claw0)

Also includes: a Next.js web platform with interactive visualizations, step-through diagrams, and a source viewer.

📚 The 12 Sessions

s01 The Agent Loop Phase 1

"One loop & Bash is all you need"

The entire agent is 30 lines of Python. A while True loop sends messages + tool definitions to the LLM, checks if stop_reason == "tool_use", executes tools if yes, appends results, and loops back. When the model stops calling tools, the function returns.

Key Takeaways

  • The core loop is the same in every session — everything else is layered on top
  • One tool (bash) + one loop = a functioning agent
  • messages[] is the accumulating state — both user and tool results live here
  • The exit condition (stop_reason) is what makes it autonomous rather than one-shot
def agent_loop(messages):
    while True:
        response = client.messages.create(model=MODEL, system=SYSTEM,
            messages=messages, tools=TOOLS)
        messages.append({"role": "assistant", "content": response.content})
        # The exit condition: when the model stops asking for tools, we're done
        if response.stop_reason != "tool_use":
            return
        # Execute every requested tool and feed the results back as a user turn
        results = [execute_tool(block) for block in response.content
                   if block.type == "tool_use"]
        messages.append({"role": "user", "content": results})
📌 Relevance to Our Work

This is exactly how OpenClaw works under the hood. Our agent loop is the same pattern — we just have 30+ tools instead of 1. Understanding this loop demystifies every "magic" behavior.

s02 Tool Use Phase 1

"Adding a tool means adding one handler"

Introduces the dispatch map pattern: a simple dict mapping tool names to handler functions. Adding a new tool = adding one entry to the dict + one schema entry. The loop itself never changes.

Key Takeaways

  • TOOL_HANDLERS = {"bash": run_bash, "read_file": run_read, ...}
  • Path sandboxing via safe_path() — prevents workspace escape
  • Dedicated tools (read/write/edit) are safer and more predictable than shelling out for everything
  • The dispatch map scales linearly — no if/elif chains
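A minimal sketch of the dispatch-map pattern, including the `safe_path()` sandboxing idea. The handler names, argument shapes, and the stubbed `run_bash` are illustrative assumptions, not the repo's exact code:

```python
from pathlib import Path

WORKSPACE = Path("workspace").resolve()

def safe_path(rel: str) -> Path:
    # Resolve against the workspace root; reject anything that escapes it
    p = (WORKSPACE / rel).resolve()
    try:
        p.relative_to(WORKSPACE)
    except ValueError:
        raise PermissionError(f"path escapes workspace: {rel}")
    return p

def run_bash(args):
    return f"(stub: would run {args['command']!r})"  # stub for the sketch

def run_read(args):
    return safe_path(args["path"]).read_text()

# Adding a tool = one handler + one entry here; the loop never changes
TOOL_HANDLERS = {"bash": run_bash, "read_file": run_read}

def execute_tool(name: str, args: dict):
    return TOOL_HANDLERS[name](args)
```

The point is that `execute_tool` is the only seam the agent loop touches; growing from 2 to 23 tools changes the dict, not the loop.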
📌 Relevance to Our Work

Our brain-v2.py CLI is essentially a big dispatch map. Same pattern, different scale. The path sandboxing concept maps to our security clearance enforcement design.

s03 TodoWrite Phase 2

"An agent without a plan drifts"

Adds a TodoManager — a structured task list where only one item can be in_progress at a time. Plus a nag reminder: if the model goes 3+ rounds without updating its todos, a <reminder> gets injected into the next tool result.

Key Takeaways

  • "One in_progress at a time" forces sequential focus — prevents the agent from wandering
  • The nag reminder is a simple but effective pattern: inject text into tool_result to redirect attention
  • System prompts fade as context fills up — in-context reminders fight this signal decay
  • This is Claude Code's actual TodoWrite mechanism
💡 Interesting Pattern

The "nag injection" technique — inserting reminders into tool results to maintain focus — is something we could use in our heartbeat system. When the agent gets tunnel vision on a task and forgets meta-work (logging, patterns), a nag injector could bring it back.

s04 Subagents Phase 2

"Break big tasks down; each gets a clean context"

The parent agent gets a task tool that spawns a subagent with fresh messages=[]. The child does all its work (potentially 30+ tool calls), then only the final text summary returns to the parent as a tool_result. The child's entire message history is discarded.

Key Takeaways

  • Context isolation: child's work (file reads, bash outputs) never pollutes parent context
  • No recursive spawning: children get all tools except task
  • This is the pattern behind Claude Code's /task command and OpenClaw's sessions_spawn
  • 30-iteration safety limit prevents infinite loops
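The summary-only return can be sketched like this (the function signature and block shapes are assumptions; the fresh `messages=[]` context, 30-iteration cap, and discard-all-but-the-summary behavior are from the session):

```python
MAX_ITERATIONS = 30  # safety limit against runaway subagents

def run_subagent(task_prompt: str, agent_loop) -> str:
    # Fresh context: the child starts from only the task prompt
    messages = [{"role": "user", "content": task_prompt}]
    agent_loop(messages, max_iterations=MAX_ITERATIONS)
    # Only the final assistant text returns to the parent as a tool_result;
    # the child's entire message history is discarded here
    final = messages[-1]
    return "".join(b["text"] for b in final["content"]
                   if b.get("type") == "text")
```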
📌 Relevance to Our Work

This is exactly our sessions_spawn pattern. We use it heavily for coding agents, overnight work, and research tasks. Key learning: the summary-only return is what keeps the parent context clean — same reason we use sub-agents for big ELWS prototype work.

s05 Skills Phase 2

"Load knowledge when you need it, not upfront"

Two-layer skill injection: Layer 1 puts skill names/descriptions in the system prompt (~100 tokens each). Layer 2 loads the full skill body via tool_result when the model calls load_skill("name") (~2000 tokens each).

Key Takeaways

  • 10 skills × 2000 tokens = 20,000 tokens wasted if all loaded upfront
  • The two-layer pattern (cheap metadata + expensive on-demand body) is elegant and token-efficient
  • Skills are SKILL.md files with YAML frontmatter in directories
  • This is exactly how OpenClaw's available_skills system works
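A sketch of the two layers, with a hypothetical skill standing in for a SKILL.md file (the registry shape and function names are assumptions):

```python
SKILLS = {
    # Hypothetical entry standing in for a SKILL.md file with YAML frontmatter
    "git-workflow": {
        "description": "branching and commit conventions",
        "body": "Full ~2000-token skill body, loaded only on demand...",
    },
}

def skill_index() -> str:
    # Layer 1: names + one-line descriptions, ~100 tokens each, always present
    lines = [f"- {name}: {meta['description']}" for name, meta in SKILLS.items()]
    return "Available skills (use load_skill(name)):\n" + "\n".join(lines)

def load_skill(name: str) -> str:
    # Layer 2: the full body, returned as a tool_result only when requested
    return SKILLS[name]["body"]
```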
💡 Direct Parallel to CAG

This two-layer skill pattern is architecturally identical to what we're designing for CAG profiles! Layer 1 = profile names in system prompt (cheap). Layer 2 = full context loaded on-demand via rule engine (expensive). Our profile design doc already follows this pattern — this validates the approach.

s06 Context Compact Phase 2

"Context will fill up; you need a way to make room"

Three-layer compression strategy:

  • Layer 1 — micro_compact (every turn): Replace old tool results (3+ turns back) with "[Previous: used {tool_name}]"
  • Layer 2 — auto_compact (at token threshold): Save full transcript to .transcripts/, LLM summarizes, replace all messages with summary
  • Layer 3 — manual compact: Model explicitly calls compact tool

Key Takeaways

  • Micro-compact is the killer feature: silently cleaning old tool results every turn is low-cost, high-impact
  • Transcripts on disk = nothing truly lost, just moved out of active context
  • The token estimate is rough (len(json.dumps(messages)) // 4) — good enough for a threshold trigger
  • This is how OpenClaw handles compaction internally
📌 Relevance to Our Work

Directly relevant to our compaction capping task (tsk-e6642449bc6f). Their micro-compact is what we want — silently pruning old tool results before they accumulate. Our current problem is that OpenClaw only does Layer 2 (full compaction) but lacks Layer 1 (continuous micro-pruning). This is the missing piece for our "cap conversation messages" design.

s07 Task System Phase 3

"Break big goals into small tasks, order them, persist to disk"

Promotes s03's flat checklist into a file-based task graph. Each task is a JSON file with status, blockedBy, and blocks fields. Completing a task automatically clears its ID from all dependents' blockedBy lists.

Key Takeaways

  • Task graph answers 3 questions: What's ready? What's blocked? What's done?
  • Survives compression and restarts — state lives on disk, not in context
  • DAG (Directed Acyclic Graph) pattern enables parallel execution in later sessions
  • 4 tools: task_create, task_update, task_list, task_get
📌 Relevance to Our Work

Our Sakura DB's tasks and projects tables serve the same purpose but are more sophisticated (DB-backed, with tags, priorities, assignment). Their file-based approach is simpler but the dependency graph concept (blockedBy/blocks) is something we don't have yet — our "Task Dependency Graph" project in backlog is exactly this.

s08 Background Tasks Phase 3

"Run slow operations in the background; the agent keeps thinking"

Uses daemon threads for long-running shell commands. Results go into a notification queue that gets drained before each LLM call — injected as <background-results> blocks.

Key Takeaways

  • The agent loop stays single-threaded — only subprocess I/O is parallelized
  • Notification queue pattern: producer (bg thread) → queue → consumer (agent loop)
  • Enables "install deps AND write config simultaneously"
  • 300s timeout per background task as safety valve
💡 Interesting Pattern

The "drain notifications before each LLM call" pattern is how OpenClaw's exec tool with yieldMs and background mode works. Same concept, different implementation.

s09 Agent Teams Phase 4

"When the task is too big for one, delegate to teammates"

Introduces persistent teammates with lifecycle management (spawn → working → idle → shutdown) and JSONL mailboxes for inter-agent communication. Each teammate runs its own agent loop in a daemon thread.

Key Takeaways

  • JSONL mailboxes: append-only files per agent, drain-on-read — simple, crash-safe IPC
  • Teammates check inbox before each LLM call — messages are context, not interrupts
  • Team roster in config.json tracks who exists and their status
  • Unlike subagents (s04), teammates persist and have identity
📌 Relevance to Our Work

Our sessions_spawn with mode="session" is the persistent teammate equivalent. The JSONL mailbox pattern is interesting — it's file-based inter-process communication. Simpler than our sessions_send approach but same concept. Their "drain inbox before LLM call" is exactly what OpenClaw does with queued messages.

s10 Team Protocols Phase 4

"Teammates need shared communication rules"

Adds two structured protocols on top of the mailbox system:

  • Shutdown protocol: request_id handshake — lead requests shutdown, teammate approves/rejects
  • Plan approval: teammate submits plan with request_id, lead reviews and approves/rejects

Key Takeaways

  • Both protocols share the same FSM: pending → approved | rejected
  • Correlation via request_id — same pattern as HTTP request/response correlation
  • One pattern (request→response with ID) handles any structured negotiation
  • Without shutdown protocol, killing a thread leaves files half-written
💡 Interesting Pattern

Plan approval gating is a safety mechanism we don't have. When our sub-agents tackle risky tasks (deploying, modifying configs), there's no approval gate — they just do it. Worth considering for our "external verification" rule.

s11 Autonomous Agents Phase 4

"Teammates scan the board and claim tasks themselves"

Teammates become self-organizing: instead of being assigned tasks, they scan the task board and auto-claim unclaimed work. Work phase → idle phase (poll every 5s for messages/tasks) → timeout → shutdown.

Key Takeaways

  • Idle cycle: poll inbox + scan task board → 60s timeout → auto-shutdown
  • Identity re-injection: after context compression, the agent might forget who it is — inject identity block when len(messages) <= 3
  • Self-organization scales better than lead-assigned work
  • Task claiming: find pending + unowned + unblocked tasks
📌 Relevance to Our Work

The identity re-injection pattern is something we need. After compaction, our agent loses context about active work. Their solution: detect short message lists and re-inject identity + current task context. We should do this in our brain compact recovery flow — and it directly relates to our "force context reread timer" task.

s12 Worktree + Task Isolation Phase 4

"Each works in its own directory, no interference"

Each task gets its own git worktree directory. Task board tracks what to do, worktrees track where to do it. Bound by task ID. Lifecycle events emitted to events.jsonl.

Key Takeaways

  • Control plane (.tasks/) + execution plane (.worktrees/) = clean separation
  • Two agents can work on different modules simultaneously without file conflicts
  • Worktree lifecycle: absent → active → removed | kept
  • Event stream (events.jsonl) enables audit, recovery, and monitoring
  • State on disk survives crashes — conversation memory is volatile, file state is durable
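The event-stream half of this is easy to sketch (the record fields and function names are assumptions; the `events.jsonl` stream and the task-ID-to-worktree binding are from the session; the actual `git worktree add` call is omitted):

```python
import json
import time
from pathlib import Path

def emit_event(log: Path, task_id: str, event: str) -> None:
    # Append-only event stream: an audit trail that survives crashes
    record = {"ts": time.time(), "task": task_id, "event": event}
    with open(log, "a") as f:
        f.write(json.dumps(record) + "\n")

def worktree_path(root: Path, task_id: str) -> Path:
    # Control plane (.tasks/) names the work; execution plane (.worktrees/)
    # gives each task its own directory, bound by task ID
    return root / ".worktrees" / task_id
```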
💡 This Solves a Real Problem We've Had

Remember when sub-agents merged editor code that broke terrain.js exports? Two agents working in the same directory caused conflicts. Git worktree isolation per task would prevent this entirely. Each sub-agent gets its own branch + directory, and merges happen explicitly.

🏆 Capstone: s_full.py

The capstone combines all mechanisms from s01–s11 into a single ~900-line reference implementation. It's not a teaching session — it's the "put it all together" reference.

| Mechanism | Implementation | Tools |
|---|---|---|
| Agent Loop (s01) | while True + stop_reason | — |
| Tool Dispatch (s02) | TOOL_HANDLERS dict, 23 handlers | bash, read, write, edit |
| TodoWrite (s03) | TodoManager + nag after 3 rounds | TodoWrite |
| Subagents (s04) | run_subagent() with Explore/general types | task |
| Skills (s05) | SkillLoader — two-layer injection | load_skill |
| Compression (s06) | micro_compact + auto_compact + manual | compress |
| Task System (s07) | File-based DAG with dependencies | task_create, task_get, task_update, task_list |
| Background (s08) | Daemon threads + notification queue | background_run, check_background |
| Teams (s09) | TeammateManager + JSONL mailboxes | spawn_teammate, list_teammates, send_message, read_inbox, broadcast |
| Protocols (s10) | Shutdown handshake + plan approval FSM | shutdown_request, plan_approval |
| Autonomy (s11) | Idle cycle + auto-claim + identity re-inject | idle, claim_task |

REPL commands: /compact, /tasks, /team, /inbox

🐾 Sister Repo: claw0

claw0 is the companion repo that builds a minimal AI agent gateway (like OpenClaw) from scratch in 10 sessions. Where learn-claude-code focuses on the agent internals, claw0 focuses on the infrastructure around it.

| learn-claude-code (agent) | claw0 (gateway) |
|---|---|
| Agent loop + tools | Agent loop + tools |
| Planning (TodoWrite) | Sessions & persistence (JSONL) |
| Subagents | Channel pipelines (Telegram, Feishu) |
| Skills loading | Gateway routing (5-tier binding) |
| Context compression | Intelligence (soul, memory, skills, 8-layer prompt) |
| Task system | Heartbeat & cron |
| Background tasks | Delivery (write-ahead queue + backoff) |
| Agent teams | Resilience (retry onion, auth rotation) |
| Protocols (shutdown, plan) | Concurrency (named lanes) |
| Autonomous agents | — |
| Worktree isolation | — |
🧠 Key Insight

claw0's s06 (Intelligence) describes the prompt as "8 layers of files on disk — swap files, change personality." This is exactly our architecture: SOUL.md, USER.md, HEARTBEAT.md, etc. The claw0 workspace even ships with the same file set we use: SOUL.md, IDENTITY.md, TOOLS.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md, AGENTS.md, MEMORY.md, CRON.json.

📌 Relevance to Our Work

claw0 essentially reverse-engineers OpenClaw's architecture. The fact that they identify the same building blocks we use validates our approach. Their "5-tier routing" and "named lanes for concurrency" are worth studying for our CAG rule engine design — it's the same routing problem we're solving.

🎯 Relevance to Our Work

Things We Already Do (Validation)

  • Agent loop pattern — OpenClaw's core is the same while+stop_reason loop ✅
  • Tool dispatch maps — our brain-v2.py, pi.py are exactly this ✅
  • Subagents — sessions_spawn is their s04 pattern ✅
  • Two-layer skill loading — OpenClaw's available_skills is identical ✅
  • Context compaction — OpenClaw does auto_compact ✅
  • Heartbeat — OpenClaw has 30s heartbeat (claw0 s07) ✅
  • Cron — OpenClaw cron system (claw0 s07) ✅
  • Soul/personality files — SOUL.md, USER.md, etc. (claw0 s06) ✅

Things We Should Adopt (New Ideas)

  • Micro-compact (s06 Layer 1) — Silent per-turn pruning of old tool results. We only do full compaction; this would extend session lifetime dramatically. Priority: High — maps to our compaction capping task.
  • Nag reminder injection (s03) — When the agent goes N rounds without updating todos/doing meta-work, inject a reminder into tool results. Priority: Medium — would fix our "tunnel vision" pattern.
  • Identity re-injection (s11) — After compaction, detect short message lists and re-inject identity + task context. Priority: Medium — maps to our "brain compact" recovery.
  • Task dependency graph (s07) — blockedBy/blocks fields for task ordering. Priority: Low — our backlog project "Task Dependency Graph" should look at this.
  • Git worktree isolation (s12) — Per-task directories for parallel sub-agent work. Priority: Low but valuable — would prevent the terrain.js merge conflict class of bugs.
  • Plan approval gate (s10) — Sub-agents submit plans for approval before executing risky operations. Priority: Low — nice safety layer for deployments.

Things We Do Better

  • Memory — Their task/team state is file-based JSON; we have a full SQLite DB with embeddings, search, relationships, and typed entities. Way more sophisticated.
  • Context injection — They use static files; we're building DB-driven CAG profiles with rule engines, security clearances, and dynamic context sizing.
  • Multi-channel — OpenClaw supports 13+ platforms; their approach is terminal-only (learn-claude-code) or Telegram+Feishu (claw0).
  • Personality depth — Their SOUL.md is a template; ours is a living document refined over months of real interaction.
  • Tool ecosystem — They have 4-23 tools; we have 30+ native tools plus Miro MCP, calendar, email, Cloudflare, GitHub, and more.
  • Product intelligence — They have nothing like our PI database (concepts, features, experiments, observations).

⚖️ Final Verdict

🌸 Nao's Take

Quality: Excellent teaching material. Clean progressive structure, good ASCII diagrams, minimal code that actually works. Best "build an AI agent from scratch" resource I've seen.

Novelty for us: Moderate. We already do most of this via OpenClaw. But the micro-compact pattern, nag injection, and identity re-injection are concrete techniques we should adopt.

For Kim specifically: Worth a skim of s01 (the core loop), s06 (context compression — directly relevant to our compaction work), and s12 (worktree isolation — solves our sub-agent collision problem). Skip s02-s03 unless you want the full picture. The claw0 sister repo is interesting because it reverse-engineers OpenClaw's own architecture.

Actionable takeaway: The single most valuable idea is micro-compact — replacing old tool results with placeholders every turn. If we implement this one thing at the CAG level, it would likely double our effective session length before compaction triggers.

TL;DR: Great learning resource, validates our architecture, and has 2-3 concrete techniques worth stealing. Not a threat to what we're building — we're well ahead in terms of memory, context intelligence, and tool ecosystem.