SDD Process — Working Notes

Companion to spec_writing_guide.md. That doc = how to write one spec. This doc = the process, tooling, and team-cycle decisions around specs. Living doc; tracks decisions + open problems.


Workspace architecture (decided)

10 product repos + 1 dev-env repo, all siblings. Boss vetoes nesting repos under dev-env (best option, blocked by external path assumptions).

Launch Claude from dev-env, always. Consequences:

  • Skills in dev-env/.claude/skills/ load (cwd), so no plugin/marketplace machinery is needed. Launching from dev-env deletes the skill-sharing problem.
  • dev-env/CLAUDE.md loads as the overarching spec.
  • Sibling repo specs do NOT auto-load (discovery walks cwd + ancestors, never siblings).

Repo-local specs: reference-map in dev-env/CLAUDE.md. A map of repo → spec file paths + an imperative rule: "when the user names a repo, read its references BEFORE planning or editing." Loads only the named repo's spec → keeps context local (matches the locality finding; precise task-relevant context beats whole-repo). Degrades gracefully ("read repo_c's spec" is a one-line fix). List each repo's refs in dependency order.
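
The reference-map lookup described above can be sketched as a plain mapping plus one resolver. This is a minimal illustration, not the actual dev-env/CLAUDE.md block: the repo names, spec file names, and the `specs_for` helper are all hypothetical, and it assumes the product repos sit next to dev-env as siblings.

```python
from pathlib import Path

# Hypothetical reference map: repo name -> spec paths relative to the
# workspace root, listed in dependency order. All names are placeholders.
REFERENCE_MAP: dict[str, list[str]] = {
    "repo_a": ["repo_a/specs/core.md"],
    "repo_c": ["repo_c/specs/overview.md", "repo_c/specs/api.md"],
}


def specs_for(repo: str, dev_env: Path) -> list[Path]:
    """Return only the named repo's spec paths, in the listed order."""
    if repo not in REFERENCE_MAP:
        raise KeyError(f"no reference-map entry for {repo!r}")
    workspace = dev_env.parent  # sibling repos share dev-env's parent directory
    return [workspace / rel for rel in REFERENCE_MAP[repo]]
```

Only the named repo's entries are returned, which is the locality property the map is meant to preserve; an unknown repo fails loudly instead of silently loading nothing.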

Rejected alternatives: symlink farm (path-resolution risk, unverified); @import of all repo specs (context bloat); per-repo settings propagation via git hook (worked, but unnecessary once launching from dev-env).


Modern SDD workflow

Staged pipeline, gate between stages:

intent → requirements → design → tasks → implement

  • Don't one-shot from a paragraph. Catch errors at the cheap stage (a requirement is one sentence to fix).
  • Decompose into independently verifiable, dependency-ordered task units; mark parallel-safe.
  • Auto-generate the task list, but human approves the list (cheap — it's bullets) before implementation. Gate on the plan = insurance against generate-faster-than-you-verify.
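
One way to get dependency-ordered task units with parallel-safe groups is a topological sort that emits tasks in batches. The sketch below uses Python's standard-library `graphlib`; the task names are made up, and "parallel-safe" here only means "no declared dependency between them". It does not detect tasks that touch the same files.

```python
from graphlib import TopologicalSorter


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into dependency-ordered batches.

    `deps` maps each task to the set of tasks it depends on. Tasks that
    land in the same batch have no declared dependency on each other.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical task list for illustration.
tasks = {
    "schema": set(),
    "api": {"schema"},
    "cli": {"schema"},
    "docs": {"api", "cli"},
}
```

The batched list is the kind of artifact a human can approve cheaply before implementation starts, since each batch reads as a few bullets.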

Status: this is the converging industry pattern, encoded in shipping tools. Spec-as-source-of-truth is a working bet, not a proven result.


Team cycle changes

Boss's direction: replace sprints with something lighter; standups at spec/feature level ("I implemented spec A, want to tackle B"), not ticket status. Assessment: right direction, with corrections.

What dies: implementation-effort estimation (story points sizing "how long to code"); fixed sprint as delivery batch; implementation-task tickets as human work units.

What persists / intensifies: intent capture, traceability, prioritization, audit. More code volume → these matter more.

What's new:

  • Spec subsumes the ticket. Epic (stable intent) → Spec (human work unit, source of truth) → Task (LLM-generated, regenerable leaf). Authorship gradient: higher tier = more human.
  • Plan at spec granularity; verify at task granularity (tasks demote to the review surface).
  • Continuous flow > time-boxed sprints. WIP limit moves to review, not coding.
  • Estimation shifts: size "how hard to specify + verify," not "how hard to build." Trivial-to-build/hard-to-verify is now the expensive task.
  • Standup must surface the bottleneck (generated-but-unverified, stuck-in-review, integration risk), not just declared intent. Done-status is in the system; standup = decisions + blockers.
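
The Epic → Spec → Task tiers and the review-side WIP limit can be modeled with a few dataclasses. This is an illustrative sketch only: the state names, the limit value, and the helper functions are assumptions, not an existing tracker schema.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    state: str = "generated"  # hypothetical states: generated, in_review, verified


@dataclass
class Spec:
    name: str
    tasks: list[Task] = field(default_factory=list)


@dataclass
class Epic:
    intent: str
    specs: list[Spec] = field(default_factory=list)


REVIEW_WIP_LIMIT = 2  # hypothetical team setting


def review_load(epic: Epic) -> int:
    """Count tasks currently in review across all specs of the epic."""
    return sum(t.state == "in_review" for s in epic.specs for t in s.tasks)


def may_enter_review(epic: Epic) -> bool:
    """The WIP limit applies to review, not to generation."""
    return review_load(epic) < REVIEW_WIP_LIMIT


def unverified_by_spec(epic: Epic) -> dict[str, int]:
    """Standup view: number of not-yet-verified tasks per spec."""
    return {s.name: sum(t.state != "verified" for t in s.tasks) for s in epic.specs}


epic = Epic("faster onboarding", [
    Spec("spec A", [Task("t1", "verified"), Task("t2", "in_review")]),
    Spec("spec B", [Task("t3", "in_review"), Task("t4")]),
])
```

With this shape, a standup can read `unverified_by_spec(epic)` to see where generated work is piling up, instead of collecting declared status per person.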


Industry landscape (grounding)

The bottleneck moved to review + product validation:

  • Output/engineer ~+60% 2025→2026; many teams ship same pace or slower.
  • AI-generated PRs wait ~4.6× longer for reviewer pickup.
  • ~78% of companies use AI; ~80% report no significant business impact ("generative AI paradox"): features without product validation.
  • "Should we build it?" overtook "can AI build it?"; product became the limiting role.

What leading teams do: SDD as named practice (Thoughtworks, McKinsey); spec as standup/planning/review reference; AI first-pass review (CodeRabbit, Copilot review) to widen the bottleneck; PR-queue visibility + risk scoring; context engineering (precise per-task context > whole repo).

Source quality: mostly consultancy/vendor/practitioner, not peer-reviewed. Direction well-supported; specific figures indicative.


Tooling

| Tool | npm/dist | Niche |
|---|---|---|
| OpenSpec | @fission-ai/openspec | lightweight spec layer; propose→apply→archive; folder-per-change; 20+ assistants. The one of interest. |
| Spec Kit | specify CLI (GitHub) | Spec→Plan→Tasks→Implement; 84k★; test-before-impl ordering |
| Task Master | task-master-ai | parse-prd → dependency-ordered tasks; editor-native |
| BMAD-METHOD | bmad-method | role-playing agent team (PM/architect/dev/QA) |
| AWS Kiro | IDE | EARS requirements → design → tasks |

What they solve: the intent→agent gap. Force explicit spec before code; decompose into agent-sized units; scope context; stage with gates; traceability. They are context/intent harnesses around the model, not model improvements.

What they DON'T solve:

  • Review bottleneck: they worsen it (more code, faster); only help via reviewability (smaller, spec-traceable units).
  • Drift (spec↔code sync). OpenSpec sidesteps via immutable archived changes; doesn't reconcile a live spec against built code.
  • Spec correctness (spec vs intent): still human judgment.
  • Prioritization ("should we build it"): untouched.
  • Granularity sweet spot: they decompose, don't tell you the grain.


TDD enforced off specs

Core trap: one agent writing tests + code from one spec/context = a self-consistent pair encoding the same misunderstanding twice. TDD's value is test independence from implementation. Enforcement must protect that independence.

Layered enforcement:

  1. Spec gate: acceptance criteria must be testable (EARS). Reject vague criteria.
  2. Test-first sequencing: tests generated from criteria as a separate reviewed artifact; halt before implementation.
  3. Review tests-vs-spec, not code: the high-leverage inversion.
  4. Red gate: tests must fail against stub/empty impl (kills tautological tests).
  5. CI hard gate: block merge unless tests exist, pass, and every acceptance criterion maps to ≥1 test (criterion coverage, not line coverage); add mutation testing.
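
The criterion-coverage and red-gate checks can be sketched as two small functions. This is a rough illustration under stated assumptions: it assumes each test file tags the criteria it covers with a comment like `# covers: AC-3`. Both the tag format and the `AC-n` ID scheme are hypothetical, not something these notes or any listed tool prescribes.

```python
import re

# Assumed convention (hypothetical): tests declare the acceptance criteria
# they cover with a comment such as "# covers: AC-3".
TAG = re.compile(r"#\s*covers:\s*(AC-\d+)")


def uncovered(criteria: set[str], test_sources: list[str]) -> set[str]:
    """Return acceptance criteria that no test claims to cover."""
    covered = {cid for src in test_sources for cid in TAG.findall(src)}
    return criteria - covered


def red_gate_violations(results_against_stub: dict[str, bool]) -> list[str]:
    """Return tests that PASSED against the stub implementation.

    A test that passes before any real code exists cannot be checking
    behavior, so it is flagged as likely tautological.
    """
    return sorted(name for name, passed in results_against_stub.items() if passed)
```

A CI job could fail the build when either function returns a non-empty result. Note that a tag only records a claim of coverage; whether the test really exercises the criterion is still a review question.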

Stack placement:

  • Claude Code PreToolUse hook: block Write/Edit on impl files lacking a matching test (authoring-time enforcement, committed/shared).
  • /spec-to-tests dev-env skill: criteria → tests → halt → review → implement.
  • CI: criterion-coverage + mutation + red-first (uncheatable backstop).
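
The decision logic such a PreToolUse hook would need can be sketched as a pure function. Several things here are assumptions: that the hook receives an event with `tool_name` and `tool_input.file_path`, that exit code 2 blocks the tool call, that paths are repo-relative, and that tests follow a `src/pkg/mod.py` → `tests/pkg/test_mod.py` naming convention. The convention in particular is hypothetical. Reading the event from stdin and exiting with the returned code is left out.

```python
from pathlib import PurePosixPath

ALLOW = 0
BLOCK = 2  # assumed hook contract: exit code 2 blocks the tool call


def expected_test(impl_path: str) -> str:
    """Hypothetical convention: src/pkg/mod.py -> tests/pkg/test_mod.py."""
    rel = PurePosixPath(impl_path).relative_to("src")
    return str(PurePosixPath("tests") / rel.parent / f"test_{rel.name}")


def decide(event: dict, existing_files: set[str]) -> tuple[int, str]:
    """Return (exit_code, message) for one tool-call event."""
    if event.get("tool_name") not in {"Write", "Edit"}:
        return ALLOW, ""
    path = event.get("tool_input", {}).get("file_path", "")
    if not (path.startswith("src/") and path.endswith(".py")):
        return ALLOW, ""  # not an implementation file under this convention
    test = expected_test(path)
    if test in existing_files:
        return ALLOW, ""
    return BLOCK, f"write {test} before editing {path}"
```

Keeping the decision separate from the I/O makes the rule itself testable, which matters for a gate that is committed and shared across the team.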

Limits: criterion coverage ≠ correctness; spec-vs-intent still human; independence must be actively protected (separate test-author from code-author, or rely on red gate + mutation to recover it).


Open problems (carry forward)

  • Drift / spec↔code reconciliation. Central unsolved problem. Esp. spec changes mid-flight: regenerate task list while preserving review state of already-built/verified tasks. No tool reconciles cleanly.
  • Granularity sweet spot. Too coarse → variance; too fine → spec as big as code, no leverage. (= this repo's standing minimum-cost-spec conjecture; GPO's ~9-dim effective space hints specs sit far below code complexity.)
  • Spec-vs-intent verification. Tests verify code-vs-spec; nothing verifies spec-vs-intent. Loop only half-closed.

Status — outdated as of 2026-05-28

This doc is the original SDD process notes from llm_tools. The pipeline has since shipped; see sdd_plan_working_notes.md and the /sdd-* skill set. Leaving this file as reference for the original framing.

The original "TODO / not yet built" section listed:

  • dev-env/CLAUDE.md reference-map block: shipped (see developers-environment/CLAUDE.md).
  • /spec-to-tests skill: superseded by /sdd-test-write. The PreToolUse test-gate hook was not built; the spec-first model handles this differently via /sdd-spec-gap driving the implementer.
  • OpenSpec vs. build-our-own: built our own per sdd_plan_working_notes.md §G2.