Claude Code agent teams run multiple AI agents in parallel on a shared filesystem. Each agent has its own context window and no shared state by default. I built a hook-based coordination layer that addresses eight documented pain points using five hooks and 1,555 lines of production code. This case study describes the reconnaissance, design, implementation, and testing of that system.
1. The Problem
Claude Code’s agent teams feature enables parallelism: a lead agent delegates tasks to teammate agents, each working independently. This is powerful for throughput but introduces coordination challenges that the platform does not solve natively.
The official documentation identifies these challenges across its Architecture, Best Practices, Troubleshooting, and Use Cases sections. I distilled them into eight concrete pain points:
| # | Pain Point | Source Section |
|---|---|---|
| 1 | No visibility into what each agent is thinking or planning | Architecture |
| 2 | File conflicts when agents edit the same files | Best Practices |
| 3 | No reliable way to know which tasks are complete | Limitations |
| 4 | Lead agent implements instead of delegating | Best Practices |
| 5 | Agents stop on errors instead of recovering | Troubleshooting |
| 6 | No cross-agent context — each starts from zero | Architecture |
| 7 | No quality gates before task completion | Best Practices |
| 8 | Anchoring bias when agents commit to early hypotheses | Use Cases |
The fundamental tension: parallelism enables speed, but coordination requires shared understanding. Each agent operates in its own context window. There is no built-in mechanism for agents to declare what they intend to do, see what others are doing, or verify what was done before marking work complete.
2. Phase 0: Reconnaissance
Before designing anything, I needed to understand what information hooks actually receive. Claude Code’s documentation describes hook events, but I could not assume which fields would be present in each event’s hook_input.
Debug Hook Collection
I built seven debug hooks — one for each lifecycle event (SessionStart, UserPromptSubmit, PreToolUse, PostToolUseFailure, TaskCompleted, TeammateIdle, SessionEnd). Each hook captured everything: timestamp, PID, PPID, all hook_input keys and values, environment variables, and argv. Output was dumped to .nsp/agent_recon/ as timestamped JSON files.
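A minimal sketch of what such a debug hook body might look like (field names and the stdin entry point are illustrative, not the exact NSP implementation):

```python
"""Hypothetical debug hook body: dump everything one hook invocation sees."""
import json
import os
import sys
import time
from pathlib import Path

def dump_invocation(hook_input: dict, out_dir: Path) -> Path:
    """Capture hook_input plus process context as a timestamped JSON file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.time(),
        "pid": os.getpid(),
        "ppid": os.getppid(),
        "argv": sys.argv,
        # Only CLAUDE_* vars are interesting for the recon questions.
        "env": {k: v for k, v in os.environ.items() if k.startswith("CLAUDE")},
        "hook_input": hook_input,
    }
    path = out_dir / f"dump_{int(time.time() * 1000)}_{os.getpid()}.json"
    path.write_text(json.dumps(record, indent=2, default=str))
    return path

# A real hook would receive its payload on stdin, roughly:
#   dump_invocation(json.load(sys.stdin), Path(".nsp/agent_recon"))
```

Capturing PID, PPID, and environment alongside hook_input is what made the later findings (shared PPID, no CLAUDE_TEAMMATE_NAME variable) directly observable.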
I ran a two-agent team (structure-reader and todo-finder) on a real Swift project (nexusnative). The task was deliberately simple: one agent reads codebase structure, another lists TODO comments. The goal was data collection, not useful output.
Result: 63 dump files from a single team run.
Key Findings
| Question | Finding |
|---|---|
| Does PreToolUse carry teammate_name? | No. 53 PreToolUse events, none with teammate identity. |
| Which events carry teammate identity? | Only TaskCompleted and TeammateIdle. |
| Can PPID distinguish teammates? | No. All 53 PreToolUse calls shared PPID 54548. |
| Is there a CLAUDE_TEAMMATE_NAME env var? | No. Not present in any hook invocation. |
| Do teammates share session_id? | Yes. All teammates in one team share the same session_id. |
| Is each hook a fresh process? | Yes. 53 unique PIDs for 53 PreToolUse events. |
The core constraint was clear: PreToolUse — the only hook where I can gate agent actions — does not know which agent is acting. TaskCompleted and TeammateIdle know, but they fire at task boundaries, not during execution.
PPID Correlation Attempt (Phase 0.5)
I investigated whether PPID (parent process ID) could distinguish teammates, since each agent might have a different parent process. The dump data showed this was not the case: all PreToolUse events shared the same PPID regardless of which agent triggered them. This closed the process-correlation approach.
Design Implication
Since the platform would not tell us who is acting during execution, agents would need to tell us themselves. This led to the core design principle: self-identification before action.
3. Design: Progressive Trust
The reconnaissance findings led to a design built on a single principle: require agents to declare their cognitive state before allowing them to act. This is not a novel idea — it extends the same pattern used in solo Claude Code sessions, where NSP’s PreToolUse hook gates write operations until the agent fills an understanding.yaml declaring what it understands and plans to do.
For agent teams, the same principle applies, but with additional gates:
```
Identity Gate --> Understanding Gate --> Code View Gate --> Quality Gate
    "Who?"         "What & why?"         "What files?"      "Done well?"
```
Each gate represents an increasingly specific cognitive declaration:
- Identity Gate: The agent must declare who it is (name, team, role) before any tool use. In solo sessions, identity is implicit. In teams, it must be explicit.
- Understanding Gate: The agent must declare what it understands about its task, its confidence level, and its plan — before any write operation. Read-only tools (Read, Glob, Grep, etc.) are allowed pre-understanding to let agents orient themselves.
- Code View Gate: Before editing code files, the agent must declare which files it intends to modify and how it sees them. This declaration is cross-checked against other agents’ declarations to detect conflicts.
- Quality Gate: Before marking a task complete, the agent’s declarations are verified: Is the understanding filled? Are code view entries complete? Are declared uncertainties resolved?
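For concreteness, here is a hypothetical understanding.yaml assembled from the fields these gates check. The field names (identity, I_understand_as, confidence_reason, plan, code_view, uncertainties) appear in the gate descriptions; the file path and contents are illustrative, not NSP's exact schema:

```yaml
# Hypothetical .nsp/agents/sm2-fixer/understanding.yaml (illustrative schema)
identity:
  name: sm2-fixer
  team: word-memory-team
  role: Fix the near-miss multiplier bug in the SM2 algorithm
I_understand_as: >
  The SM2 ease-factor update applies the wrong multiplier in the
  near-miss grading path.
confidence_reason: Traced the update against the SM2 reference behavior.
plan:
  - Reproduce the near-miss case with a failing test
  - Correct the multiplier and re-run the suite
code_view:
  - path: app/sm2.py        # hypothetical path
    view: Contains the ease-factor update to be fixed
uncertainties:
  - Whether existing review intervals need migration
```

The Identity Gate checks the identity block, the Understanding Gate checks I_understand_as, confidence_reason, and plan, the Code View Gate checks code_view, and the Quality Gate verifies all of them plus uncertainties at completion time.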
Key Design Decisions
Self-identification over platform-identification. Rather than working around the missing teammate_name with fragile heuristics alone, I made self-identification an explicit requirement. This turns a platform limitation into a feature: agents must think about their role before acting.
Fail-open on errors. Every hook is wrapped in try/except. On any error, the hook allows the action rather than blocking it. A crashing hook would break the entire agent — an unacceptable failure mode for a coordination layer.
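The fail-open pattern can be sketched as a decorator (illustrative; the real hooks wrap their main body directly, and the decision payload shape here is an assumption):

```python
import functools

ALLOW = {"decision": "allow"}  # hypothetical decision payload shape

def fail_open(hook_fn):
    """On any internal error, allow the action instead of blocking it.

    A crashing hook would break the agent's whole tool pipeline, so a
    coordination layer must degrade to "no coordination", never to "no agent".
    """
    @functools.wraps(hook_fn)
    def wrapper(hook_input):
        try:
            return hook_fn(hook_input)
        except Exception:
            # Swallow everything: malformed YAML, missing dirs, odd state.
            return ALLOW
    return wrapper

@fail_open
def pre_tool_check(hook_input):
    # Deliberately fragile example: assumes a key that may be missing.
    if hook_input["tool_name"] in {"Write", "Edit"}:
        return {"decision": "block", "reason": "no understanding declared"}
    return ALLOW
```

With this wrapper, a malformed payload (missing tool_name) produces an allow decision rather than an unhandled KeyError that would take the agent down with it.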
Audit everything. Every hook decision (allow, block, and why) is logged to .nsp/team/hook_log.yaml with timestamp, event type, agent name, and PID. This audit trail was invaluable for debugging the E2E runs.
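An audit entry per decision might be appended like this (a sketch, assuming PyYAML, the hooks' one external dependency; the exact hook_log.yaml schema is NSP's, and the field names here are illustrative):

```python
import os
import time
from pathlib import Path

import yaml  # PyYAML, the hooks' only external dependency

def log_decision(log_path: Path, event: str, agent: str,
                 decision: str, reason: str) -> dict:
    """Append one decision record to the shared audit log."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "event": event,
        "agent": agent,
        "pid": os.getpid(),
        "decision": decision,
        "reason": reason,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Appending a single-item YAML list per call keeps each write small
    # and keeps the whole file parseable as one list of entries.
    with log_path.open("a") as f:
        yaml.safe_dump([entry], f, sort_keys=False)
    return entry
```

Because each append emits one more top-level list item, the accumulated file still loads with a single yaml.safe_load call, which is what makes post-run debugging a matter of reading one file.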
Cross-validation at boundaries. When TaskCompleted fires, it carries the platform-provided teammate_name. I cross-validate this against the agent’s self-declared identity in understanding.yaml. Mismatches are logged as warnings. This provides a correctness check for the self-identification mechanism at the one point where the platform does provide identity.
4. Implementation
Architecture
Five hooks, zero external dependencies beyond PyYAML, zero imports from the parent NSP core library:
| Hook | Event | LOC | Purpose |
|---|---|---|---|
| pre_tool_check.py | PreToolUse | 417 | Identity + understanding + code view gates |
| task_completed.py | TaskCompleted | 238 | Cross-validation, quality gates, archival |
| teammate_idle.py | TeammateIdle | 170 | Error recovery (block premature idle) |
| session_start.py | SessionStart | 74 | Directory initialization, team context |
| session_end.py | SessionEnd | 66 | Archive all agent state |
| common.py | (shared) | 375 | Utilities, constants, agent detection |
| install.py | (setup) | 215 | Hook registration in settings.json |
Total: 1,555 lines of production code.
The PreToolUse Pipeline
PreToolUse is where most coordination happens. It runs a five-step pipeline:
Step 1 — NSP writes pass: Any write to the .nsp/ directory is immediately allowed. Agents need to write their understanding declarations without being blocked by the gates that require those declarations.
Step 2 — Identity gate: The hook calls find_active_agent() to determine which agent is invoking. If no agent is identified:
- Write tools are blocked with a message instructing the agent to create its understanding.yaml with an identity section.
- Read-only tools are allowed, with the same instruction injected as context (the agent can orient itself while setting up identity).
- The block message includes a “Team Context” section showing what other agents are doing.
Step 2b — Session binding: When an agent writes its understanding.yaml, the hook captures the current session_id and writes it to the agent’s directory. This creates a deterministic session-to-agent binding that makes future identification reliable.
Step 2c — One-time context injection: On the first tool call after identity is established (tracked by a marker file), cross-agent context is injected via additionalContext.
Step 3 — Read-only tools pass: Once identity is established, read-only tools (Read, Glob, Grep, Bash, WebSearch, WebFetch, Task) are allowed without further checks.
Step 4 — Understanding gate: Write tools require a filled understanding declaration: I_understand_as, confidence_reason, and at least one plan step with a non-placeholder value.
Step 5 — Code view gate: Code file edits (.py, .js, .ts, .html, .css, .swift, etc.) require a code_view section with filled entries. Additionally, the hook scans all other agents’ code_view sections for path conflicts.
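The five steps can be condensed into an ordered dispatch. This is a sketch with hypothetical names and a simplified state representation (the real pre_tool_check.py is 417 lines):

```python
from pathlib import PurePosixPath

READ_ONLY = {"Read", "Glob", "Grep", "Bash", "WebSearch", "WebFetch", "Task"}
CODE_EXTS = {".py", ".js", ".ts", ".html", ".css", ".swift"}

def check(tool, path, agent, other_agents):
    """Ordered PreToolUse pipeline (sketch). `agent` is the identified
    agent's state dict, or None if unknown; `other_agents` carry the
    code_view paths declared by the rest of the team."""
    # Step 1: writes into .nsp/ always pass, so agents can declare state.
    if path and path.startswith(".nsp/"):
        return {"decision": "allow"}
    # Step 2: identity gate. Read-only tools pass with setup guidance
    # injected as context; write tools block.
    if agent is None:
        if tool in READ_ONLY:
            return {"decision": "allow",
                    "context": "declare identity in understanding.yaml"}
        return {"decision": "block",
                "reason": "unknown agent: declare identity first"}
    # Step 3: identified agents use read-only tools without further checks.
    if tool in READ_ONLY:
        return {"decision": "allow"}
    # Step 4: understanding gate for write tools.
    if not agent.get("understanding_filled"):
        return {"decision": "block", "reason": "fill understanding before writing"}
    # Step 5: code view gate plus cross-agent conflict scan.
    if path and PurePosixPath(path).suffix in CODE_EXTS:
        if path not in agent.get("code_view", []):
            return {"decision": "block",
                    "reason": f"{path} not declared in code_view"}
        for other in other_agents:
            if path in other.get("code_view", []):
                return {"decision": "block",
                        "reason": f"conflict: {other['name']} also declared {path}"}
    return {"decision": "allow"}
```

Note how the ordering encodes the design: declarations (step 1) and orientation (steps 2–3) are cheap and permissive, while the expensive cross-agent conflict scan only runs for identified agents editing code files.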
Agent Detection: Two-Phase Strategy
Since PreToolUse lacks teammate_name, the hook must infer which agent is acting. The find_active_agent() function uses two phases:
Phase 1 (deterministic): Check each agent’s session_id file against the current hook’s session_id. If a match is found, that agent is definitively identified. This works because session_id files are written at the moment an agent creates its understanding — providing a reliable binding.
Phase 2 (heuristic): If no session_id match, fall back to understanding.yaml modification times. The most recently modified understanding within a 60-second window is assumed to be the active agent. Agents already identified via Phase 1 are excluded from this heuristic to prevent misattribution.
Phase 1 was added after Run 2 revealed that the mtime heuristic alone was insufficient when two agents wrote understanding files close together.
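The two-phase strategy can be sketched as follows (hypothetical file layout mirroring the directories described in this section; the real find_active_agent() also handles the shared-session subtlety discussed in Lessons Learned):

```python
import time
from pathlib import Path

MTIME_WINDOW_S = 60  # recency window for the Phase 2 heuristic

def find_active_agent(agents_dir: Path, session_id: str):
    """Two-phase agent identification (sketch of the strategy above)."""
    # Collect existing session bindings: agent directory -> recorded session_id.
    bound = {}
    for agent_dir in sorted(p for p in agents_dir.iterdir() if p.is_dir()):
        sid_file = agent_dir / "session_id"
        if sid_file.exists():
            bound[agent_dir.name] = sid_file.read_text().strip()
    # Phase 1 (deterministic): an exact session binding wins outright.
    for name, sid in bound.items():
        if sid == session_id:
            return name
    # Phase 2 (heuristic): most recently modified understanding within the
    # window, excluding agents already bound via Phase 1 to prevent
    # misattribution.
    best, best_mtime = None, 0.0
    now = time.time()
    for agent_dir in agents_dir.iterdir():
        if not agent_dir.is_dir() or agent_dir.name in bound:
            continue
        und = agent_dir / "understanding.yaml"
        if und.exists():
            m = und.stat().st_mtime
            if now - m <= MTIME_WINDOW_S and m > best_mtime:
                best, best_mtime = agent_dir.name, m
    return best
```

The exclusion in Phase 2 matters: once an agent is deterministically bound, it can no longer "steal" attribution from an unbound agent whose understanding file happens to be fresher.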
Lead Agent Detection
The lead agent is distinguished from teammates by the location of its understanding file. Teammates write to .nsp/agents/{name}/understanding.yaml. If a filled understanding.yaml exists at the root .nsp/ level (the solo-session location), the hook identifies the current invocation as the lead. The lead is allowed read-only tools and non-code writes, but code file edits are blocked with a delegation message.
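A sketch of the two lead-specific rules (hypothetical helper names; the extension set and block wording are illustrative):

```python
from pathlib import Path, PurePosixPath

CODE_EXTS = {".py", ".js", ".ts", ".html", ".css", ".swift"}
WRITE_TOOLS = {"Write", "Edit"}

def is_lead(nsp_root: Path) -> bool:
    """The lead reuses the solo-session location .nsp/understanding.yaml;
    teammates write under .nsp/agents/{name}/ instead."""
    f = nsp_root / "understanding.yaml"
    return f.exists() and f.stat().st_size > 0

def lead_gate(tool: str, path: str) -> dict:
    """Leads may read and write non-code files, but code edits are
    blocked with a delegation nudge."""
    if tool in WRITE_TOOLS and PurePosixPath(path).suffix in CODE_EXTS:
        return {"decision": "block",
                "reason": "leads delegate: assign this edit to a teammate"}
    return {"decision": "allow"}
```

This location-based rule is what lets one hook serve both roles: the same PreToolUse invocation first checks whether it is acting for the lead, and only then falls back to teammate detection.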
5. Testing: Three E2E Runs
I tested on word-memory-system, a Python backend implementing SM2 spaced repetition. The team task: sm2-fixer fixes a near-miss multiplier bug in the SM2 algorithm, pagination-dev adds pagination to the study history endpoint.
Run 1: Baseline
Result: 4 PASS / 4 PARTIAL / 0 FAIL
| # | Pain Point | Result | Evidence |
|---|---|---|---|
| 1 | Agent visibility | PASS | Both agents created understanding.yaml with identity, intent, plan, code_view |
| 2 | File conflicts | PARTIAL | Gate exists (unit tested). Agents naturally stayed in separate files. |
| 3 | Task status | PARTIAL | pagination-dev missed from completed_tasks — used TeamDelete before TaskCompleted fired. TeammateIdle fallback eventually caught it, but primary mechanism failed. |
| 4 | Lead delegation | PARTIAL | Lead invisible in audit trail. Never created root understanding.yaml; detection not triggered. |
| 5 | Error recovery | PASS | Both .done markers present. TeammateIdle: 4 allow, 0 block. |
| 6 | Cross-agent context | PASS | Both .team_context_injected markers. Log confirms context injection. |
| 7 | Quality gates | PASS | TaskCompleted fired for sm2-fixer with all gates passing. |
| 8 | Uncertainty anchoring | PARTIAL | Neither agent listed uncertainties. Mechanism exists but not triggered. |
Critical issue: Identity storm. In the first 48 seconds, the two agents between them triggered 29 blocked PreToolUse events. They were trying to read files (Read, Glob, Grep) to orient themselves, but the identity gate was blocking all tools for unknown agents — including read-only ones.
The agents needed to understand the codebase before they could write a meaningful identity declaration, but the identity gate demanded the declaration before allowing any tool use. The deadlock resolved only when the agents eventually guessed they should write to .nsp/agents/.
Run 2: After Identity Storm Fix
Fix applied: Modified the identity gate to allow read-only tools for unknown agents, with identity setup instructions injected as additionalContext. Write tools remained blocked.
Result: 5 PASS / 3 PARTIAL / 0 FAIL
| Metric | Run 1 | Run 2 | Change |
|---|---|---|---|
| Identity blocks | 29 | 15 | -48% |
| Readonly pre-identity allows | 0 | 11 | New: agents can orient first |
| completed_tasks entries | 1 | 2 | Fallback recording fixed |
| Total hook events | 81 | 62 | -23% (less thrashing) |
Pain point #3 (task status) improved from PARTIAL to PASS: both agents now appeared in completed_tasks without relying on the TeammateIdle fallback.
Remaining issue: sm2-fixer was never identified during PreToolUse. Its session_id file and .team_context_injected marker were missing. Root cause: both agents wrote understanding files within the same recency window, and the mtime heuristic consistently favored pagination-dev. The find_active_agent() function needed a more deterministic identification strategy.
Run 3: After Session Tracking Fix
Fix applied: Added _record_session_from_write() — when any agent writes its understanding.yaml, the hook captures the current session_id and writes it to the agent’s directory. This provides Phase 1 (deterministic) identification for all subsequent PreToolUse events from that agent.
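A sketch of this write-time binding (hypothetical signature; the real helper lives inside the PreToolUse hook):

```python
from pathlib import Path

def record_session_from_write(write_path: Path, session_id: str):
    """When an agent writes its understanding.yaml, persist the current
    session_id next to it as a deterministic agent-to-session binding."""
    if write_path.name != "understanding.yaml":
        return None
    sid_file = write_path.parent / "session_id"
    if not sid_file.exists():  # bind once, at declaration time
        sid_file.parent.mkdir(parents=True, exist_ok=True)
        sid_file.write_text(session_id)
    return sid_file
```

Binding only on the first write keeps the association stable: later touches to the understanding file cannot rebind an agent to a different session.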
Result: 6 PASS / 2 PARTIAL / 0 FAIL
| Metric | Run 2 | Run 3 | Change |
|---|---|---|---|
| sm2-fixer session_id | MISSING | EXISTS | Fixed |
| sm2-fixer .team_context_injected | MISSING | EXISTS | Fixed |
| sm2-fixer named in PreToolUse log | 0 events | 3 events | Fixed |
| Identity blocks | 15 | 8 | -47% further reduction |
| Total hook events | 62 | 50 | -19% (cleaner execution) |
Final results:
| # | Pain Point | Run 1 | Run 2 | Run 3 |
|---|---|---|---|---|
| 1 | Agent visibility | PASS | PASS | PASS |
| 2 | File conflicts | PARTIAL | PARTIAL | PARTIAL |
| 3 | Task status | PARTIAL | PASS | PASS |
| 4 | Lead delegation | PARTIAL | PARTIAL | PASS |
| 5 | Error recovery | PASS | PASS | PASS |
| 6 | Cross-agent context | PASS | PASS | PASS |
| 7 | Quality gates | PASS | PASS | PASS |
| 8 | Uncertainty anchoring | PARTIAL | PARTIAL | PARTIAL |
The progression: the fixes validated in Run 2 resolved the identity storm (read-only tools allowed pre-identity) and task status recording (both agents properly in completed_tasks); the fix validated in Run 3 made agent identification deterministic (session_id-based detection), which also gave the audit trail clean agent attribution, enabling lead delegation to score PASS (lead behavior confirmed as delegation-appropriate with full visibility).
The two remaining PARTIAL results are for scenarios that did not arise organically during testing:
- File conflicts (#2): Both agents naturally worked on different files. The conflict detection mechanism is fully implemented and unit-tested (PreToolUse tests include conflict scenarios), but no real conflict occurred.
- Uncertainty anchoring (#8): Neither agent listed uncertainties in their understanding. The gate is implemented and unit-tested (Gate 3), but the trigger condition requires agents to declare uncertainties, which depends on the nature of the task.
6. Lessons Learned
Self-identification is a natural extension of “think before you act”
The self-identification requirement initially felt like a workaround for a missing platform feature. In practice, it produced a beneficial side effect: agents that must declare their identity, role, and plan before acting produce more focused work. The identity declaration is not just coordination metadata — it is a cognitive checkpoint.
Deterministic signals beat heuristics for multi-agent identification
The mtime-based agent detection (Phase 2) worked in simple cases but failed when two agents wrote understanding files within the same time window. The session_id-based detection (Phase 1) is deterministic and reliable. The general lesson: in multi-agent systems, avoid heuristics when deterministic signals are available.
Fail-open design is not optional
During development, a YAML parsing error in a hook would have crashed the agent’s entire tool pipeline. Every hook wraps its logic in try/except and defaults to allowing the action on error. In three E2E runs (193 total hook events), zero crashes occurred. The hooks handled malformed data, missing directories, and unexpected state without interrupting any agent.
The audit trail pays for itself
hook_log.yaml records every decision with timestamp, agent name, event type, and PID. Every fix between runs was diagnosed by reading this log. The identity storm in Run 1 was visible as 29 consecutive “block / unknown teammate” entries in 48 seconds. The sm2-fixer identification failure in Run 2 was visible as zero entries with agent="sm2-fixer". Without the audit trail, diagnosing multi-agent coordination issues from agent output alone would have been far more difficult.
In-process mode shares session_id (platform constraint)
All teammates in a team share the same session_id. This means session_id alone cannot distinguish teammates — it can only confirm that a process belongs to the team (vs. a solo session). My session_id-based identification works because I write the session_id to a specific agent’s directory at the moment that agent creates its understanding. The binding is agent-directory-to-session, not session-to-agent.
In tmux mode (separate terminal windows), each agent gets its own session_id, which would simplify identification. The in-process mode constraint forced a more robust design.
7. Current State and Next Steps
Current state
- 1,555 LOC production code across 5 hooks + shared utilities
- 171 unit and integration tests, all passing
- 3 E2E test runs with progressive improvement, from 4 PASS / 4 PARTIAL to 6 PASS / 2 PARTIAL
- 0 FAIL across all test categories and all runs
- Standalone installation (single command, no nsp-core dependency)
- Full audit trail for every hook decision
Next steps
- Viewer integration: A team panel in the NSP viewer showing per-agent state, file ownership, and completion status in real time.
- A/B experiment framework: Run the same task with and without coordination hooks to measure impact on output quality, conflict rate, and completion time.
- Adversarial debate: Part B of pain point #8 — structuring cross-agent challenge/response for competing hypotheses.
Feature proposals
Based on what I built and tested, I have identified specific platform features that would make coordination layers simpler, more reliable, and more effective. These are detailed in Agent Teams Coordination: Feature Proposal. The single most impactful change would be adding teammate_name to PreToolUse hook_input.