MTK is an opinionated toolkit that turns any AI coding assistant into a disciplined engineering partner. Evidence-first workflows, adversarial review gates, and deterministic linters — built for enterprise teams shipping regulated software.
┌─ 01 · Brainstorm & spec
   ✓ spec written → .claude/tasks/0042-2fa-login.spec.json
   ✓ engineer approved → security impact: HIGH · security-and-hardening loaded
┌─ 02 · Batched TDD (3 batches)
   ✓ batch 1/3 · totp service · 12 tests · 12 pass · build 0
   ✓ batch 2/3 · login handler · 18 tests · 18 pass · build 0
   ✓ batch 3/3 · recovery codes · 9 tests · 9 pass · build 0
┌─ 03 · Spec-drift & adversarial review
   ✓ spec-drift · files match · contracts match
   ! compliance-reviewer · 2 findings
     HIGH · rule S7.3 · totp secret logged at auth_service.cs:84 [conf 100]
     MED · rule S4.2 · missing rate-limit on verify endpoint [conf 92]
   ✓ test-reviewer · coverage 94% · assertions strong
┌─ 04 · Verification gate
   ✗ stop-hook blocked completion — 1 HIGH finding unresolved
     fix required before commit
AI assistants follow CLAUDE.md rules about 80% of the time. The other 20% ships to production. A mocked test that was supposed to be real. A secret logged in plaintext. A migration that "looked fine" but deadlocks under load. When you're building regulated software — finance, healthcare, critical infrastructure — 80% is not a number you can afford.
Vibes-based completion. AI claims success. Nobody cites evidence. Reviewer trusts the output. A week later, prod breaks.
Instructions buried in context drift. Under pressure, the model cuts corners — disables tests, swallows errors, skips review.
Hooks block unverified claims. Linters catch known-bad patterns. Adversarial agents review findings with rule citations before merge.
MTK industrialises the engineering practices enterprise teams already trust — specification, verification, adversarial review — and makes the harness enforce them, not the reviewer.
Written intent before code. Scope, files, security, and tests approved up front.
Failing test first. Green build per batch. No test, no merge path.
Adversarial reviewers in isolated, forked contexts — running in parallel on orthogonal axes. The author never reviews alone.
Exit codes, test counts, citations. Completion is a receipt, not a feeling.
Hooks and linters enforce the rules. 100% compliance, not 80%.
Every feature flows through four stages. Each stage has its own skills, its own gates, its own adversarial checks. Your AI doesn't get to skip steps — the harness enforces them.
Before a single line of code, the engineer approves a spec sidecar — scope, files touched, security impact, test strategy. No spec, no implementation.
Large work is decomposed into verifiable batches. Each batch: failing test first, then code, then green. Between batches: build, test, checkpoint.
Deterministic linters first — confidence 100 on known-bad patterns. Then three reviewer agents in isolated context: compliance, test, architecture. Findings cite rule IDs and file:line.
The stop hook blocks any "done" claim without cited build output, test count, and exit codes. Spec-drift detection compares the diff against the approved manifest. No evidence, no completion.
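As a rough sketch of how such a gate can work (the helper and regex patterns below are illustrative, not MTK's shipped hook), a completion claim is scanned for cited evidence and rejected with a nonzero exit when any is missing:

```python
import re
import sys

# Illustrative evidence gate, not MTK's actual stop hook.
# A completion claim passes only if it cites a zero build exit code
# and a full "passed/total" test count.
EVIDENCE = {
    "build_exit_0": re.compile(r"build\b.*\bexit(?:ed)?(?:\s+code)?\s+0\b", re.IGNORECASE),
    "test_count": re.compile(r"(\d+)\s*/\s*(\d+)\s+tests?\s+pass", re.IGNORECASE),
}

def gate(claim: str) -> list[str]:
    """Return labels of missing evidence; an empty list means the claim may pass."""
    missing = [name for name, pattern in EVIDENCE.items() if not pattern.search(claim)]
    counts = EVIDENCE["test_count"].search(claim)
    if counts and counts.group(1) != counts.group(2):  # 46/47 is not "done"
        missing.append("all_tests_passing")
    return missing

if __name__ == "__main__":
    missing = gate(sys.stdin.read())
    if missing:
        print(f"completion blocked, missing evidence: {', '.join(missing)}")
        sys.exit(1)  # the nonzero exit is what blocks the claim
```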
Nine capabilities that separate MTK from prompt-and-pray workflows.
The stop hook rejects completion claims without cited build output, test counts, and exit codes. "I think it works" doesn't pass.
Linter packs catch known-bad patterns — secrets, SQLi, disabled tests — at confidence 100. AI handles the judgment calls. One unified finding schema. (A minimal rule sketch follows this list.)
Changes to auth, secrets, audit trails, or external inputs auto-trigger the security-and-hardening skill. Security lives in design — not final polish.
Every review outputs JSON with severity, confidence, rule citation, and file:line. Teams audit and track over time. Auditors can read it.
Before review, the diff is compared against the approved spec sidecar. Scope creep, surprise files, and undeclared contract changes surface automatically. (Sketched after this list.)
Every skill ships with a "Common Rationalizations" table — the exact excuses AI uses to skip steps ("just this once", "small change") — pre-rebutted in-context.
Language-agnostic core + first-class .NET, Python, and TypeScript packs. EF Core, MediatR, SQLAlchemy, FastAPI, React, Next.js, Tauri.
Correction-capture turns every "no, not like that" into a persistent lesson. Handoff preserves state across sessions. Your AI gets better over time.
Native config generation for Cursor, Copilot, Windsurf, Gemini, and Cline — not just Claude Code. One source of truth, six target harnesses.
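To make the deterministic layer concrete, here is a minimal sketch of a single linter-pack rule; the regex, scanned paths, and rule wiring are illustrative, not MTK's shipped secrets pack. It emits findings in the unified schema shown further down this page:

```python
import json
import re
from pathlib import Path

# Illustrative linter-pack rule, not MTK's shipped secrets pack.
# Known-bad pattern: a secret-like variable interpolated into a log call.
SECRET_IN_LOG = re.compile(r'_log\.\w+\(\$".*\{(secret|token|password)\w*\}', re.IGNORECASE)

def scan(path: Path) -> list[dict]:
    findings = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if SECRET_IN_LOG.search(line):
            findings.append({
                "severity": "HIGH",
                "confidence": 100,  # deterministic: pattern match, no judgment call
                "rule": "S7.3",
                "source": "linter.secrets",
                "title": "Secret logged in plaintext",
                "file": str(path),
                "line": lineno,
                "snippet": line.strip(),
                "guidance": "Never log secrets. Redact at the sink.",
                "blocks_merge": True,
            })
    return findings

if __name__ == "__main__":
    for source_file in Path("src").rglob("*.cs"):
        for finding in scan(source_file):
            print(json.dumps(finding))
```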
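And a companion sketch for spec-drift detection, under similar assumptions: the approved sidecar declares a file manifest under a "files" key (the key name and sidecar shape are assumptions for illustration), and any diff that touches an undeclared file is surfaced before review:

```python
import json
import subprocess
from pathlib import Path

# Illustrative spec-drift check, not MTK's implementation.
# Assumes the approved sidecar lists the files the change may touch.
def drift(spec_path: str, base_ref: str = "main") -> list[str]:
    approved = set(json.loads(Path(spec_path).read_text())["files"])
    diff = subprocess.run(
        ["git", "diff", "--name-only", base_ref],
        capture_output=True, text=True, check=True,
    )
    changed = {line for line in diff.stdout.splitlines() if line}
    return sorted(changed - approved)  # surprise files = scope creep

if __name__ == "__main__":
    surprises = drift(".claude/tasks/0042-2fa-login.spec.json")
    if surprises:
        print("spec drift detected, undeclared files:")
        for path in surprises:
            print(f"  {path}")
        raise SystemExit(1)
```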
The same engineering milestones, compared across stock AI workflows and MTK-enforced pipelines.
| Dimension | Stock AI assistant | With MTK |
|---|---|---|
| Completion claim | "Done. It should work." | Cited build exit 0 · 47/47 tests pass · no lint findings |
| Security review | Skipped for "small" changes | Auto-triggered on auth / secrets / audited state |
| Findings format | Prose suggestions, no IDs | JSON · severity · confidence · rule · file:line |
| Scope discipline | Drifts during implementation | Diff vs. approved spec · drift surfaced pre-review |
| Critical rules | In CLAUDE.md · ~80% compliance | Enforced by hooks · 100% deterministic |
| Review pass | Same model reviews own work | Forked context · adversarial persona · tools restricted · parallel Stage 2 fan-out |
| Cross-harness | Per-tool config drift | One source · six native configs generated |
MTK is composed of small, pressure-tested pieces that snap together. Invoke one entry point — the rest orchestrate themselves.
| Entry point | Purpose |
|---|---|
| `/mtk-setup` | Bootstrap a repo — detect stack, pull guidelines, generate CLAUDE.md |
| `/mtk-setup --audit` | Refresh architecture-principles.md after a structural change |
| `/mtk-setup --merge` | Unify per-repo audits into one team-wide standard |
| `/mtk <description>` | Plain English. Routes to fix, implement, pre-commit-review, or context-report. |

Reviewer agents: compliance-reviewer · test-reviewer · architecture-reviewer · silent-failure-hunter
MTK reviews don't output prose. They output findings. Every finding has a severity, a confidence score, a rule citation, and a precise file:line reference. Pipe them into a dashboard, an audit log, or a PR bot. Auditors can trace every rejection back to a written standard.
{
"severity": "HIGH",
"confidence": 100,
"rule": "S7.3",
"source": "linter.secrets",
"title": "Secret logged in plaintext",
"file": "src/auth/totp.cs",
"line": 84,
"snippet": "_log.Info($\"totp: {secret}\")",
"guidance": "Never log secrets. Redact at the sink.",
"blocks_merge": true
}
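Because the schema is uniform, consuming it takes only a few lines. A hedged sketch of a CI step that fails on merge-blocking findings; findings.jsonl (one finding per line) is a hypothetical path, not an MTK artifact name:

```python
import json
import sys

# Sketch: fail the CI step when any finding blocks merge.
# "findings.jsonl" (one JSON finding per line) is a hypothetical path.
blockers = []
with open("findings.jsonl") as fh:
    for line in fh:
        finding = json.loads(line)
        if finding["blocks_merge"]:
            blockers.append(finding)

for f in blockers:
    print(f'{f["severity"]} · {f["rule"]} · {f["file"]}:{f["line"]} · {f["title"]}')

sys.exit(1 if blockers else 0)
```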
MTK ships as a Claude Code plugin and works with every major AI coding assistant. Install once, bootstrap any repo, stay disciplined.
$ /plugin marketplace add moberghr/mtk-agent-toolkit
$ /plugin install mtk
$ /mtk-setup → detects stack · audits architecture · generates CLAUDE.md
$ /mtk "add 2FA to login endpoint" ✓ spec approved · ✓ batched TDD · ✓ adversarial review · ✓ evidence gate
Join the teams who replaced "it compiles" with "it compiles, tests pass, findings resolved, spec approved." Your AI has never been more capable — make sure it's also disciplined.
Does MTK only work with Claude Code? No. MTK is Claude-Code-native (plugin marketplace, skills, agents) but generates native config for Cursor, GitHub Copilot, Windsurf, Gemini CLI, and Cline. The rules, standards, and review criteria stay consistent across all six harnesses.
Is MTK only for fintech? No — MTK is tech-stack-agnostic and domain-flexible. A finance supplement ships with it (regulated state, sensitive data, audit requirements) because it was born in a fintech team, but any serious-software team benefits. Healthcare, infrastructure, and compliance-heavy domains fit the same workflow.
Doesn't all this process slow delivery down? Per-feature wall-clock time is slightly longer. Net throughput is dramatically higher because the drift tax — rework, production incidents, review rounds, "that looked fine to me" postmortems — drops. Engineers stop babysitting AI output and start shipping.
Can MTK adapt to an existing codebase? Yes. /mtk-setup audits your repo, extracts architecture principles from existing code, pulls team coding guidelines, and generates a project-specific CLAUDE.md. Run /mtk-setup --merge across multiple repos to produce one team-wide standard.
Is MTK open source? Yes — MIT licensed. Fork it, extend it, add your own tech-stack pack. Contributions welcome via PR on GitHub.