Claude Code Session Patterns
A working playbook for getting useful work out of Claude Code: model selection, session structure, parallelism, and the trade-offs each pattern hides.
Claude Code is Anthropic's terminal-based coding agent. The product surface is small — a CLI plus a few hooks — but the way you structure a session has an outsized effect on output quality, latency, and cost. This module collects patterns that hold up across research, coding, and ops sessions, and notes where each pattern breaks.
1. Model selection: pick by task shape, not by reflex
The default reflex is "use the strongest model." That's wrong. The strongest model is slow and expensive, and most tasks don't need it. A better heuristic is to match model to task shape.
| Task shape | Model | Why |
|---|---|---|
| Deep research, architecture, multi-file refactor with cross-cutting reasoning | Opus (1M-context tier when files span a large repo) | Reasoning depth and long context tolerance dominate latency |
| Daily coding, single-file edits, mechanical bug fixes, doc generation | Sonnet | Fast enough to keep flow state, capable enough for ~90% of edits |
| Bulk transformations, simple tool calls, scripted glue | Haiku | Latency dominates; quality difference is invisible at this task size |
A few practical rules that emerge from running this heuristic for a while:
- Switch down, not up, mid-session. If a Sonnet session is going well, finishing on Sonnet is faster than swapping to Opus for a "polish pass." Polish passes rarely change correctness; they mostly burn tokens.
- Switch up when you hit a real reasoning wall, not when output looks slightly off. "Slightly off" is usually a prompt problem.
- 1M-context Opus is for when the repo is the prompt. If you can describe the relevant files in two paragraphs, you don't need 1M context — you need a better selection of files.
- Avoid model A/B mid-task. Switching models mid-task resets the implicit working memory the model has built up about your codebase conventions.
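These rules reduce to a small dispatch heuristic. A minimal sketch in Python, assuming the CLI's non-interactive `-p` mode and a `--model` flag that accepts short aliases; the task-shape labels are made up for illustration:

```python
import subprocess

# Illustrative task-shape labels, mapped per the table above.
MODEL_BY_SHAPE = {
    "deep_research": "opus",        # reasoning depth dominates
    "multi_file_refactor": "opus",
    "daily_coding": "sonnet",       # fast enough to keep flow state
    "doc_generation": "sonnet",
    "bulk_transform": "haiku",      # latency dominates
}

def run_task(shape: str, prompt: str) -> str:
    """Route a one-shot prompt to the model matching its task shape."""
    model = MODEL_BY_SHAPE.get(shape, "sonnet")  # default to the middle tier
    result = subprocess.run(
        ["claude", "-p", prompt, "--model", model],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

If your CLI version rejects the short aliases, substitute full model names.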
Related: openrouter for cross-provider routing, openclaw-optimization for similar trade-offs in a different agent harness.
2. Session structure: research vs. coding flows have different shapes
Research and coding sessions look superficially alike (both are "ask Claude things, get text back") but the right structure differs.
Research session
- Broad scope query — frame the question, ask for a landscape, accept that the first answer will be shallow.
- Depth pass — pick 2–4 threads, ask follow-ups with specific entities or claims to verify.
- Adversarial pass — ask for the strongest counter-argument, the weakest assumption, what would falsify the thesis. This is where most of the value comes from.
- Synthesis — request a structured output (table, comparison, ranked list). Specify the structure; do not let the model pick.
- Save — markdown to disk, not chat. Anything not written down is lost at session end.
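The flow scripts cleanly if you want it repeatable. A sketch using the CLI's non-interactive `-p` mode; each call here is a stateless one-shot, so earlier output is passed back in explicitly, and all prompts and file names are placeholders:

```python
import subprocess
from pathlib import Path

def ask(prompt: str) -> str:
    """One stateless query via non-interactive mode."""
    out = subprocess.run(["claude", "-p", prompt],
                         capture_output=True, text=True, check=True)
    return out.stdout

# 1. Broad scope: accept that this answer will be shallow.
landscape = ask("Survey current approaches to X. Breadth over depth.")

# 2. Depth pass: pick threads, name specific claims to verify.
depth = ask("Go deep on threads A and B from this survey; verify the "
            f"specific claims.\n\n{landscape}")

# 3. Adversarial pass: where most of the value comes from.
adversarial = ask("Strongest counter-argument, weakest assumption, and what "
                  f"would falsify this:\n\n{depth}")

# 4. Synthesis: specify the structure; don't let the model pick.
synthesis = ask("Synthesize as a markdown table with columns: approach, key "
                f"claim, main risk.\n\n{depth}\n\n{adversarial}")

# 5. Save: anything not written to disk is lost at session end.
Path("research-notes.md").write_text(synthesis)
```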
Coding session
- Requirements + constraints — what to build, what not to break, what existing files matter.
- Plan first, code second — ask for a plan with file-level granularity before any code is generated. Cheap to revise a plan; expensive to revise 800 lines.
- Iterate in slices — one component or one function at a time. Long generations are where quality drops fastest.
- Test as you go — generated code that compiles is not generated code that works. The difference shows up in the first test.
- Document last — README/comments after the code stabilizes, not before.
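Plan-first gating can be enforced mechanically rather than by discipline. A sketch, again via `-p`; the spec, the plan format, and the bullet-per-step convention are all assumptions for illustration:

```python
import subprocess

def ask(prompt: str) -> str:
    out = subprocess.run(["claude", "-p", prompt],
                         capture_output=True, text=True, check=True)
    return out.stdout

SPEC = ("Add rate limiting to the API client. Do not break the retry logic. "
        "Relevant files: client.py, config.py.")

# Plan first: cheap to revise a plan, expensive to revise 800 lines.
plan = ask(f"{SPEC}\n\nProduce a file-level plan only, one '- ' bullet per "
           "step. No code yet.")
print(plan)
if input("Plan OK? [y/N] ").strip().lower() != "y":
    raise SystemExit("Revise the plan before generating any code.")

# Iterate in slices: one step at a time, testing between slices.
for step in (l for l in plan.splitlines() if l.startswith("- ")):
    print(ask(f"{SPEC}\n\nPlan:\n{plan}\n\nImplement only this step: {step}"))
    # Run the test suite here before moving to the next slice.
```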
The common failure mode is using the research flow for coding ("just give me a plan and code in one shot") or the coding flow for research ("just give me the answer"). Both produce mediocre output.
3. Parallelism via subagents
Claude Code can spawn subagents. The patterns that hold up:
- Fan out for independent work: searching multiple directories, summarizing several documents, checking N hypotheses against the same codebase. Subagents run in parallel and return text to the parent.
- Don't fan out for sequential work: each subagent loses the parent's context, so multi-step reasoning that needs continuity gets worse, not better, when split.
- Treat subagent output as untrusted: the parent agent should verify, or at least read through, the result before acting. Subagents hallucinate file paths and APIs at the same rate as the parent.
- Budget the fan-out: 3–5 parallel subagents is usually the sweet spot. Beyond that, the parent's job becomes "merge a lot of slightly-conflicting summaries," which is its own failure mode.
The latency win is real — a 5-way fan-out for codebase exploration finishes in roughly the time of one slow exploration — but only when the subtasks are genuinely independent.
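Subagent fan-out happens inside the session through Claude Code's own mechanism, but the same shape is easy to reproduce externally for batch exploration. A sketch with a thread pool over independent one-shot queries, capped at five; the questions are placeholders:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Independent questions: no answer depends on another.
HYPOTHESES = [
    "Does anything under src/auth/ still read the legacy SESSION_KEY env var?",
    "Is the retry logic in src/net/ exercised by any test?",
    "Are there callers of deprecated_api() outside tests/?",
]

def explore(question: str) -> str:
    out = subprocess.run(["claude", "-p", question],
                         capture_output=True, text=True, check=True)
    return out.stdout

# Fan out in parallel, capped at the 3-5 sweet spot.
with ThreadPoolExecutor(max_workers=5) as pool:
    answers = list(pool.map(explore, HYPOTHESES))

# Treat output as untrusted: read every answer before acting on it.
for question, answer in zip(HYPOTHESES, answers):
    print(f"## {question}\n{answer}\n")
```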
4. Todo-tracking and structured planning
Claude Code maintains a todo list as part of its loop. Two patterns matter:
- Make the todo list explicit early. Asking the agent to write down its plan as todos, before doing any of them, surfaces ambiguity at the cheapest possible point. Bad plans are obvious; bad executions are not.
- Don't let the list drift. When new tasks emerge mid-session, they should land in the todo list, not become silent extra work. A task list that doesn't match what was done is worse than no list.
This is the same discipline as a human running a long task — the value is in the plan being visible and revisable, not in the formatting.
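If the plan lives in a file, drift is cheap to catch mechanically. A sketch assuming a hypothetical todo.md that uses markdown checkboxes:

```python
import re
from pathlib import Path

def report_drift(todo_file: str = "todo.md") -> None:
    """At session end, flag items the list promised but never closed."""
    text = Path(todo_file).read_text()
    done = re.findall(r"^- \[[xX]\] (.+)$", text, flags=re.MULTILINE)
    still_open = re.findall(r"^- \[ \] (.+)$", text, flags=re.MULTILINE)
    print(f"{len(done)} done, {len(still_open)} still open")
    for item in still_open:
        print(f"  unfinished: {item}")

report_drift()
```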
5. Hooks and harness configuration
Claude Code supports lifecycle hooks (UserPromptSubmit, Stop, etc.) configured in settings.json. Useful patterns:
- Logging hooks to capture session transcripts for later review. Sessions are otherwise ephemeral.
- Pre-flight hooks to inject project context (current branch, recent failing tests, open todo file). Cheaper than re-typing the same context every session.
- Stop hooks to run linters, formatters, or post-session summaries automatically.
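A minimal settings.json sketch wiring two of these patterns: a UserPromptSubmit hook that injects the current branch and a Stop hook that runs the formatter. The schema below matches recent Claude Code releases but does change, so treat the field names as assumptions and check the hooks documentation for your version; substitute your project's own formatter command:

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "git branch --show-current" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "npm run format" }
        ]
      }
    ]
  }
}
```

Both example commands are read-only or idempotent on purpose; anything that mutates state belongs behind an explicit, visible step.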
Hooks are not magic — they're shell commands the harness runs at fixed points. The trap is over-engineering: a hook that silently rewrites your prompt or auto-commits is more dangerous than helpful. Keep hooks observable and idempotent.
Related: openclaw-optimization for analogous hook patterns in the openclaw / hermes-agent family.
6. Output discipline
Sessions produce text. The text is worthless if it doesn't survive the session.
- Markdown to disk for anything reusable. Chat output disappears; files don't.
- Tables for comparisons, code blocks for code, prose for arguments. Mixing these inside a single answer is harder to scan and harder to extract.
- Ask for the format up front. "Compare X and Y" yields prose; "compare X and Y as a table with columns A, B, C" yields a table. The model will give you what you ask for if you ask precisely.
- Treat numbers as suspect. Any benchmark, price, or stat in a model's output should be cross-checked before it goes into a public document. Speeds and prices change weekly; the model's training data does not.
7. Cost and latency observations
A few rough patterns that recur:
- Latency on simple queries is dominated by network and token-decode time, not reasoning. Switching from a strong model to a fast one on a one-shot query saves seconds, not minutes.
- Latency on complex reasoning is dominated by output length. Asking for shorter, more structured output is a bigger win than picking a faster model.
- Cost scales with context, not just output. Long sessions with large context windows get expensive even if each individual response is short; the sketch after this list shows the arithmetic. Compaction or starting a fresh session is cheaper than letting context grow forever.
- Provider routing matters when budget is tight. Same model name, different provider, different cost and latency. See hermes-openrouter-models.
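To make the context-scaling point concrete: if every turn resends the full history as input, billed input grows roughly quadratically with turn count. A toy model with placeholder prices, ignoring prompt caching (which softens but does not remove the effect):

```python
# Toy model: every turn resends the whole history as input.
PRICE_IN = 3.00 / 1_000_000    # $/input token  (placeholder price)
PRICE_OUT = 15.00 / 1_000_000  # $/output token (placeholder price)

def session_cost(turns: int, tokens_per_turn: int = 1_000) -> float:
    """Cost when turn n carries n * tokens_per_turn of accumulated context."""
    cost, context = 0.0, 0
    for _ in range(turns):
        context += tokens_per_turn           # history grows every turn
        cost += context * PRICE_IN           # full history billed as input
        cost += tokens_per_turn * PRICE_OUT  # plus the new output
    return cost

print(f"10 turns:  ${session_cost(10):.2f}")   # ~$0.32
print(f"100 turns: ${session_cost(100):.2f}")  # ~$16.65: 10x turns, ~50x cost
```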
8. When Claude Code is the wrong tool
Worth naming explicitly:
- GUI-heavy work. Native desktop or browser automation belongs in a computer-use or browser-control tool, not a CLI agent.
- Real-time / interactive debugging. Stepping through a debugger or watching a live log is a human-in-the-loop task; a coding agent is too slow in that loop.
- Tasks where one wrong action is catastrophic. Migrations, force-pushes, financial transactions. Use the agent to draft the command, then run it yourself.
- Tiny edits. A two-character typo fix doesn't need an agent. Cost of the round-trip exceeds the cost of the edit.
The general rule: if the bottleneck is your typing speed, an agent helps. If the bottleneck is judgment, an agent is a draft tool, not the final actor.
Related
- openclaw-optimization — agent harness optimization, hook, and lifecycle patterns in adjacent harnesses
- openrouter — model routing across providers
- deep-research-workflow — research-flow patterns at a wider scope