I built a hook-based coordination layer for Claude Code agent teams to address eight documented pain points from the official agent-teams documentation. After three end-to-end test runs on a real project, I have concrete observations about what works, what required workarounds, and what would be better as native platform features. This document presents those observations as feature proposals.
The Core Constraint
PreToolUse is the only hook where coordination logic can gate agent actions — deciding whether to allow or block a file edit, inject cross-agent context, or enforce quality requirements. But PreToolUse does not carry teammate_name. Of the seven hook events I tested during reconnaissance (63 dump files from a two-agent run), only TaskCompleted and TeammateIdle include teammate identity. This means the most fundamental coordination operation — knowing which agent is acting — requires an indirect workaround at the exact point where direct identification matters most.
Eight Pain Points, Eight Proposals
Summary
| # | Pain Point | Our Solution | Native Feature Proposal |
|---|---|---|---|
| 1 | No visibility into agent thinking | Per-agent understanding.yaml via self-identification | Add teammate_name to PreToolUse hook_input |
| 2 | File conflicts between agents | Cross-agent code_view scanning before edits | Built-in file ownership registry |
| 3 | Task completion status lag | Filesystem-based completion records | Expose completed_tasks in hook_input or via API |
| 4 | Lead agent jumps into implementation | Detect lead via root understanding, block code edits | delegate_only mode flag for lead agent |
| 5 | Teammates stopping on errors | Plan-step check + .done marker gates idle | Richer TeammateIdle context (plan/progress, not just name) |
| 6 | No cross-agent context | Two injection points with team awareness formatting | Provide teammates’ declared state in hook_input |
| 7 | No quality gates on completion | Three-gate check before TaskCompleted allows | Configurable pre-completion checks |
| 8 | Anchoring bias / premature commitment | Uncertainty resolution gate blocks completion | First-class uncertainties field that blocks completion until resolved |
#1: Agent Visibility
What happens: Multiple agents work in parallel on a shared filesystem. Each has its own context window. The lead agent and other teammates have no way to see what any given agent is thinking, planning, or about to do. The official docs note: “Each teammate has its own context window. When spawned, a teammate loads the same project context as a regular session.”
What I built: Each agent writes a structured understanding.yaml declaring its identity, intent, plan, and code view before taking any action. A PreToolUse hook gates all write operations until this declaration is complete. The hook uses a two-phase agent detection strategy (session_id file lookup, then mtime heuristic) to associate the current hook invocation with the correct agent — because PreToolUse lacks teammate_name.
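The two-phase detection can be sketched as follows. The directory layout (.nsp/agents/{name}/) and file names are taken from this write-up; the function shape and the session_id file format are illustrative assumptions, not the exact hook code.

```python
from pathlib import Path
from typing import Optional

def detect_agent(agents_dir: Path, session_id: str) -> Optional[str]:
    """Map a hook invocation to an agent name, since PreToolUse lacks teammate_name.

    Phase 1: deterministic -- an agent directory whose session_id file
    matches the current session.
    Phase 2: probabilistic fallback -- the agent whose understanding.yaml
    was modified most recently.
    """
    candidates = [d for d in agents_dir.iterdir() if d.is_dir()]

    # Phase 1: session_id file lookup.
    for d in candidates:
        sid_file = d / "session_id"
        if sid_file.exists() and sid_file.read_text().strip() == session_id:
            return d.name

    # Phase 2: mtime heuristic over declared agents.
    declared = [d for d in candidates if (d / "understanding.yaml").exists()]
    if declared:
        newest = max(declared,
                     key=lambda d: (d / "understanding.yaml").stat().st_mtime)
        return newest.name
    return None
```

The mtime fallback is exactly the kind of probabilistic inference that a native teammate_name field would make unnecessary.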
What would be better natively: Add teammate_name to PreToolUse hook_input. This single field would eliminate our entire self-identification mechanism: the two-phase detection logic, the session_id file tracking, the mtime heuristic, and the identity gate that blocks agents until they declare who they are. Every other coordination primitive becomes straightforward when you know which agent is acting.
#2: File Conflicts
What happens: Two agents independently decide to edit the same file. The second write silently overwrites the first. The official docs advise: “Be specific about which files each teammate should work on” — a prompt-level suggestion with no enforcement.
What I built: Each agent declares which files it intends to modify in a code_view section. Before allowing a code file edit, the PreToolUse hook scans all other agents’ code_view sections. If another agent has already claimed the same file path, the edit is blocked with a message identifying the conflict and the other agent.
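A minimal sketch of the conflict scan. Here `claims` stands in for the already-parsed code_view sections of all agents (the real hook reads these from each agent's understanding.yaml); the decision/reason response keys are assumptions, not the exact PreToolUse output schema.

```python
import os
from typing import Dict, List, Optional

def find_conflict(claims: Dict[str, List[str]], agent: str, path: str) -> Optional[str]:
    """Return the name of another agent that has already claimed `path`."""
    norm = os.path.normpath(path)
    for other, files in claims.items():
        if other == agent:
            continue
        if any(os.path.normpath(f) == norm for f in files):
            return other
    return None

def gate_edit(claims: Dict[str, List[str]], agent: str, path: str) -> dict:
    """Block the edit if the file belongs to another agent's code_view."""
    owner = find_conflict(claims, agent, path)
    if owner:
        return {"decision": "block",
                "reason": f"{path} is claimed by {owner}; coordinate before editing."}
    return {"decision": "allow"}
```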
What would be better natively: A built-in file ownership registry where agents declare files at task assignment time and the platform enforces exclusive access. This would be more reliable than our approach (which depends on agents filling code_view honestly) and would catch conflicts at the filesystem level rather than at the hook level.
#3: Task Status Lag
What happens: The lead agent cannot reliably know which tasks are complete. Task completion events fire asynchronously, and there is no queryable task status API available to hooks. In our first E2E run, one agent’s completion was missed entirely because it used a different shutdown path before TaskCompleted fired.
What I built: TaskCompleted writes a .done marker file and records completion in a shared team/state.yaml file with timestamp and task details. TeammateIdle has a fallback: if .done exists but no completed_tasks entry, it records the completion then. This gives hooks ground truth independent of the platform’s task list.
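The marker-plus-fallback pattern can be sketched like this. The .done and state file names follow the text; the state is handled as a plain dict here to keep the example dependency-free (the real code persists it to team/state.yaml via PyYAML).

```python
import time
from pathlib import Path

def record_completion(state: dict, agent_dir: Path, teammate: str, task_id: str) -> None:
    """TaskCompleted handler: write the .done marker and a completion record."""
    (agent_dir / ".done").write_text(task_id)
    state.setdefault("completed_tasks", []).append(
        {"task_id": task_id, "teammate_name": teammate, "ts": time.time()})

def idle_fallback(state: dict, agent_dir: Path, teammate: str) -> bool:
    """TeammateIdle fallback: if .done exists but no record, record it now.

    Covers the case where an agent shut down without TaskCompleted firing.
    Returns True when a missing record was backfilled.
    """
    done = agent_dir / ".done"
    recorded = any(e["teammate_name"] == teammate
                   for e in state.get("completed_tasks", []))
    if done.exists() and not recorded:
        record_completion(state, agent_dir, teammate, done.read_text())
        return True
    return False
```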
What would be better natively: Expose completed_tasks (a list of {task_id, teammate_name, status} objects) in hook_input for all events, or provide a queryable API endpoint. Hooks should not need to maintain their own task completion registry.
#4: Lead Agent Delegation
What happens: The lead agent, which should be coordinating and reviewing, sometimes starts implementing code directly — defeating the purpose of having a team. The official docs suggest using “Shift+Tab” delegate mode, but this is a UI-level control, not an enforcement mechanism.
What I built: The PreToolUse hook detects the lead agent by checking for a root-level understanding.yaml (teammates write to .nsp/agents/{name}/ subdirectories). When the lead is detected, read-only tools and non-code writes are allowed, but code file edits are blocked with a message: “As lead agent, delegate code changes to your teammates.”
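A simplified sketch of the gate (the real hook combines this check with the agent-detection step). The tool names, code-file extensions, and response keys are illustrative assumptions.

```python
from pathlib import Path

CODE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".java"}
WRITE_TOOLS = {"Edit", "Write"}

def gate_lead(project_root: Path, tool_name: str, file_path: str) -> dict:
    # Heuristic: the lead's understanding.yaml sits at the project root;
    # teammates write under .nsp/agents/{name}/ instead.
    is_lead = (project_root / "understanding.yaml").exists()
    is_code_edit = (tool_name in WRITE_TOOLS
                    and Path(file_path).suffix in CODE_EXTS)
    if is_lead and is_code_edit:
        return {"decision": "block",
                "reason": "As lead agent, delegate code changes to your teammates."}
    # Read-only tools and non-code writes pass through.
    return {"decision": "allow"}
```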
What would be better natively: A delegate_only mode flag that the lead agent (or the user) can set, making code-editing tools unavailable to the lead while preserving read and coordination capabilities. This would be cleaner than our heuristic detection and could integrate with the existing permission mode system.
#5: Error Recovery
What happens: Agents stop working after encountering errors instead of recovering. The official docs note this in Troubleshooting: “Teammates may stop after encountering errors instead of recovering.” The current TeammateIdle hook provides the teammate’s name but no context about what the agent was working on or how far it got.
What I built: A two-part mechanism. First, TaskCompleted writes a .done marker — the positive signal that work is complete. Second, TeammateIdle checks whether the agent has unfinished plan steps. If plan steps exist but .done does not, exit code 2 blocks the idle transition and feeds back the unfinished steps via stderr, effectively telling the agent: “You declared you would do X, Y, Z — you are not done yet.”
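The idle gate can be sketched as a function returning the exit code and stderr message; exit code 2 blocks the idle transition. Plan steps are passed in as an already-parsed list for illustration (the real hook reads them from the agent's declaration).

```python
from pathlib import Path
from typing import List, Tuple

def check_idle(agent_dir: Path, plan_steps: List[str]) -> Tuple[int, str]:
    """Return (exit_code, stderr_message) for a TeammateIdle event.

    If the agent declared plan steps but never produced the .done marker,
    block the idle transition and feed the unfinished steps back.
    """
    done = (agent_dir / ".done").exists()
    if plan_steps and not done:
        steps = "\n".join(f"- {s}" for s in plan_steps)
        return 2, f"You declared plan steps that are not finished:\n{steps}"
    return 0, ""
```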
What would be better natively: TeammateIdle should include the agent’s declared plan or progress context in hook_input, not just teammate_name. If the platform knew an agent had declared 4 plan steps and completed 2, that context would make error recovery hooks far more targeted. Currently, I maintain this state myself on the filesystem.
#6: Cross-Agent Context
What happens: Each agent starts with a blank context window. It loads project files but has zero awareness of what other agents are doing, have done, or plan to do. The official docs describe this as architectural: “The lead’s conversation history does not carry over to teammates.”
What I built: Two injection points. First, when a new agent hits the identity gate (unknown agent, first tool call), the block message includes a “Team Context” section showing all existing agents’ roles, tasks, plans, and uncertainties. The agent sees the full team landscape before even creating its identity. Second, on the first tool call after identity is established, cross-agent context is injected via additionalContext in the allow response. A marker file prevents duplicate injection.
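The injection side can be sketched as a formatter plus a marker-file guard. Agent state arrives as parsed dicts here; the field names and the marker file name are illustrative, while additionalContext is the response field named above.

```python
from pathlib import Path
from typing import Dict, Optional

def format_team_context(agents: Dict[str, dict]) -> str:
    """Render a compact team landscape for injection into an agent's context."""
    lines = ["## Team Context"]
    for name, info in sorted(agents.items()):
        lines.append(f"- {name}: role={info.get('role', '?')}, "
                     f"task={info.get('task', '?')}, "
                     f"uncertainties={len(info.get('uncertainties', []))}")
    return "\n".join(lines)

def inject_once(agent_dir: Path, agents: Dict[str, dict]) -> dict:
    """Attach additionalContext on the first allow only; a marker file
    prevents duplicate injection on later tool calls."""
    marker = agent_dir / ".context_injected"
    if marker.exists():
        return {"decision": "allow"}
    marker.write_text("1")
    return {"decision": "allow",
            "additionalContext": format_team_context(agents)}
```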
What would be better natively: Include other teammates’ declared state in hook_input (or make it queryable). If each agent had a structured state field that hooks could read — containing what they are working on, what files they have claimed, what they have completed — cross-agent context injection would be a simple formatting step rather than a filesystem coordination problem.
#7: Quality Gates
What happens: An agent declares a task complete, and the platform accepts it without verification. There is no built-in mechanism to check whether the agent actually did what it said it would do. The official docs suggest using hooks but provide no quality-checking framework.
What I built: Three quality gates in the TaskCompleted handler, all checked before allowing completion:
- Understanding completeness: Did the agent declare what it understood and why (confidence reason)?
- Code view integrity: If the agent declared file modifications, are those declarations filled out?
- Uncertainty resolution: If the agent listed uncertainties, are they resolved?
Failing any gate returns exit code 2 with feedback listing the specific failures.
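The three gates can be sketched as one check over a parsed understanding declaration. The field names (confidence_reason, code_view, uncertainties) mirror the text, but the exact keys in the real understanding.yaml may differ.

```python
from typing import List

def quality_gate_failures(understanding: dict) -> List[str]:
    """Return a list of gate failures; non-empty means exit code 2
    with this list as feedback, blocking TaskCompleted."""
    failures = []
    # Gate 1: understanding completeness.
    if not understanding.get("confidence_reason"):
        failures.append("understanding incomplete: no confidence reason")
    # Gate 2: code view integrity -- declared modifications must be filled out.
    for entry in understanding.get("code_view", []):
        if not entry.get("purpose"):
            failures.append(f"code_view entry {entry.get('path', '?')} not filled out")
    # Gate 3: uncertainty resolution.
    unresolved = [u for u in understanding.get("uncertainties", [])
                  if not u.get("resolved")]
    if unresolved:
        failures.append(f"{len(unresolved)} unresolved uncertainties")
    return failures
```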
What would be better natively: Configurable pre-completion checks that run before a task is marked complete. This could be a quality_gates section in team configuration, with toggles for common checks (understanding documented, tests passing, files declared) and hooks for custom validation. The platform already has the task lifecycle — adding a validation step before completion would be a natural extension.
#8: Uncertainty Anchoring
What happens: An agent forms a hypothesis early, works toward confirming it, and marks the task complete — even if it listed open questions or uncertainties along the way. The official docs describe a “competing hypotheses” use case but rely entirely on prompt engineering to prevent premature commitment.
What I built: Gate 3 in the quality gates: if an agent wrote uncertainties in its understanding declaration, those uncertainties must be resolved (section removed or entries cleared) before TaskCompleted allows. An agent that wrote “Is the token refresh the actual root cause?” cannot mark complete until it either investigated and removed the uncertainty or explicitly decided it was no longer relevant.
What would be better natively: A first-class uncertainties field in the agent’s task context that the platform tracks. When an agent declares uncertainties, completion should require resolution — either by the agent explicitly closing them or by a configurable policy (e.g., lead review, peer validation). This turns uncertainty tracking from a prompt-level suggestion into a structural guarantee.
The Most Impactful Single Change
If I could request one platform change, it would be: add teammate_name to PreToolUse hook_input.
This single field would eliminate the hardest part of our implementation — the entire self-identification mechanism. Our two-phase agent detection (session_id file lookup with mtime fallback), the identity gate that blocks agents until they declare who they are, and the session-to-agent binding logic are all workarounds for one missing field. With teammate_name in PreToolUse, per-agent gating becomes a direct lookup rather than a probabilistic inference. File conflict detection, cross-agent context injection, lead agent enforcement, and quality gates all simplify dramatically when the hook knows which agent is calling.
The identity storm I observed in our first E2E run — 29 blocks in 48 seconds as agents tried to orient themselves before they could identify — was a direct consequence of this gap. I fixed it by allowing read-only tools pre-identity, but the underlying problem remains: coordination logic in PreToolUse is working with incomplete information.
E2E Evidence
These proposals are grounded in three end-to-end test runs on a real project (word-memory-system, Python backend with SM2 spaced repetition), using a 2-agent team (sm2-fixer and pagination-dev):
| | Run 1 | Run 2 | Run 3 |
|---|---|---|---|
| Pain points PASS | 4 | 5 | 6 |
| Pain points PARTIAL | 4 | 3 | 2 |
| Pain points FAIL | 0 | 0 | 0 |
| Identity blocks | 29 | 15 | 8 |
| Total hook events | 81 | 62 | 50 |
Progressive improvement: task status recording fixed between Run 1 and Run 2 (both agents in completed_tasks), agent identification fixed between Run 2 and Run 3 (session_id-based detection gave clean audit trails and confirmed lead delegation behavior). The two remaining PARTIAL results (file conflicts, uncertainty anchoring) are for scenarios that did not occur organically in testing — the mechanisms are implemented and unit-tested, but the specific trigger conditions were not met during the E2E runs. See the case study for detailed per-run analysis.
Unit tests: 171 tests, all passing, covering 1,555 lines of production code (1.7:1 test-to-code ratio).
Implementation Reference
The full implementation is in nsp-agent-teams/:
- hooks/: 5 production hooks + shared utilities (1,555 LOC)
- tests/: 171 unit and integration tests (2,668 LOC)
- recon/: Phase 0 reconnaissance (7 debug hooks, 63 dump files analyzed)
- design.md: Architecture design document
- docs/agent-teams-case-study.md: Detailed case study with E2E test progression
The system is standalone (no external dependencies beyond PyYAML) and installs via a single command to any Claude Code project’s .claude/settings.json.