Serious engineering, infinite possibilities

AI-assisted development
you can actually ship to production.

MTK is an opinionated toolkit that turns any AI coding assistant into a disciplined engineering partner. Evidence-first workflows, adversarial review gates, and deterministic linters — built for enterprise teams shipping regulated software.

MIT licensed · 31 skills · 4 review agents · 6 AI assistants supported
claude-code — /mtk "add 2FA to login endpoint"
┌─ 01 · Brainstorm & spec
 spec written      → .claude/tasks/0042-2fa-login.spec.json
 engineer approved → security impact: HIGH · security-and-hardening loaded

┌─ 02 · Batched TDD (3 batches)
 batch 1/3 · totp service       12 tests · 12 pass · build 0
 batch 2/3 · login handler      18 tests · 18 pass · build 0
 batch 3/3 · recovery codes     9 tests · 9 pass · build 0

┌─ 03 · Spec-drift & adversarial review
 spec-drift                     files match · contracts match
! compliance-reviewer            2 findings
  HIGH  rule S7.3 totp secret logged at auth_service.cs:84 [conf 100]
  MED   rule S4.2 missing rate-limit on verify endpoint [conf 92]
 test-reviewer                  coverage 94% · assertions strong

┌─ 04 · Verification gate
 stop-hook blocked completion — 1 HIGH finding unresolved
  fix required before commit
Works with
Claude Code · Cursor · GitHub Copilot · Windsurf · Gemini CLI · Cline
01 The problem

Instructions are advisory. Production is not.

AI assistants follow CLAUDE.md rules about 80% of the time. The other 20% ships to production. A mocked test that was supposed to be real. A secret logged in plaintext. A migration that "looked fine" but deadlocks under load. When you're building regulated software — finance, healthcare, critical infrastructure — 80% is not a number you can afford.

Without MTK

"It compiles."

Vibes-based completion. AI claims success. Nobody cites evidence. Reviewer trusts the output. A week later, prod breaks.

The 80% problem

Rules in CLAUDE.md are mostly followed.

Instructions buried in context drift. Under pressure, the model cuts corners — disables tests, swallows errors, skips review.

With MTK

"Tests pass, evidence cited, review clear."

Hooks block unverified claims. Linters catch known-bad patterns. Adversarial agents review findings with rule citations before merge.

Foundations

No new methodology. Proven disciplines, enforced on the AI.

MTK industrialises the engineering practices enterprise teams already trust — specification, verification, adversarial review — and makes the harness enforce them, not the reviewer.

F.01

Spec-driven

Written intent before code. Scope, files, security, and tests approved up front.

F.02

Test-driven

Failing test first. Green build per batch. No test, no merge path.

F.03

Peer review

Adversarial reviewers in isolated, forked contexts — running in parallel on orthogonal axes. The author never reviews alone.

F.04

Evidence-based

Exit codes, test counts, citations. Completion is a receipt, not a feeling.

F.05

Deterministic gates

Hooks and linters enforce the rules. 100% compliance, not 80%.

02 How it works

A pipeline, not a prompt.

Every feature flows through four stages. Each stage has its own skills, its own gates, its own adversarial checks. Your AI doesn't get to skip steps — the harness enforces them.

Figure 01 · Request → Commit · 4 stages · 4 artifacts · 1 evidence trail
[Diagram: the MTK request-to-commit pipeline. An engineer's plain-English request (/mtk "add 2FA") flows through four stages — 01 Spec (brainstorm, plan, engineer approves → spec approved; artifact: spec.json), 02 Implement (batched TDD, build + test per batch → green build; artifact: 39/39 tests pass), 03 Review (linters · compliance · test · architecture → findings = 0; artifact: findings.json), 04 Gate (spec-drift check, evidence citation → evidence cited; artifact: evidence.log) — ending in a merge-ready commit with receipts. Open findings loop the work back to stage 02.]
01

Brainstorm & spec

Before a single line of code, the engineer approves a spec sidecar — scope, files touched, security impact, test strategy. No spec, no implementation.

brainstorming spec-driven-development planning-and-task-breakdown
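A spec sidecar might look like the sketch below — the field names here are illustrative assumptions in the spirit of the pipeline above, not MTK's actual schema:

```json
{
  "id": "0042-2fa-login",
  "scope": "Add TOTP-based 2FA to the login endpoint",
  "files": ["src/auth/totp.cs", "src/auth/login_handler.cs"],
  "security_impact": "HIGH",
  "test_strategy": "Failing test first per batch; unit tests for the TOTP service, integration tests for the login flow",
  "approved_by": "engineer"
}
```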
02

Batched implementation with TDD

Large work is decomposed into verifiable batches. Each batch: failing test first, then code, then green. Between batches: build, test, checkpoint.

test-driven-development incremental-implementation source-driven-development
03

Adversarial two-stage review

Deterministic linters first — confidence 100 on known-bad patterns. Then three reviewer agents in isolated context: compliance, test, architecture. Findings cite rule IDs and file:line.

compliance-reviewer test-reviewer architecture-reviewer
04

Evidence gate before completion

The stop hook blocks any "done" claim without cited build output, test count, and exit codes. Spec-drift detection compares the diff against the approved manifest. No evidence, no completion.

verification-before-completion spec-drift-detection pre-commit-review
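The evidence-gate idea can be sketched in a few lines — this is an illustration of the concept, not MTK's actual hook implementation, and the field names are assumptions:

```python
# Sketch of an evidence gate: a completion claim must cite a green build
# and a full test count before it is allowed through.
def gate_completion(claim: dict) -> bool:
    """Return True only if the claim carries verifiable evidence."""
    evidence = claim.get("evidence", {})
    has_build = evidence.get("build_exit_code") == 0
    has_tests = (
        evidence.get("tests_passed", 0) > 0
        and evidence.get("tests_passed") == evidence.get("tests_total")
    )
    return has_build and has_tests

# "I think it works" carries no evidence and is blocked:
assert gate_completion({"evidence": {}}) is False
# A cited exit-0 build with 47/47 tests passes the gate:
assert gate_completion({"evidence": {
    "build_exit_code": 0, "tests_passed": 47, "tests_total": 47}}) is True
```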
03 Why enterprise teams choose MTK

Built for teams that can't afford "mostly right."

Nine capabilities that separate MTK from prompt-and-pray workflows.

Evidence-first, not trust-first

The stop hook rejects completion claims without cited build output, test counts, and exit codes. "I think it works" doesn't pass.

Deterministic + AI layering

Linter packs catch known-bad patterns — secrets, SQLi, disabled tests — at confidence 100. AI handles the judgment calls. One unified finding schema.
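A deterministic secret-logging check can be as simple as a regex scan emitting a confidence-100 finding — a minimal sketch assuming a pattern and a finding shape like the example later on this page:

```python
import re

# Hypothetical known-bad pattern: interpolating a variable named *secret*
# into a log call. Deterministic match, so confidence is always 100.
SECRET_LOG = re.compile(r'_log\.\w+\(.*\{.*secret.*\}', re.IGNORECASE)

def lint_secrets(path: str, source: str) -> list[dict]:
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SECRET_LOG.search(line):
            findings.append({
                "severity": "HIGH", "confidence": 100, "rule": "S7.3",
                "source": "linter.secrets", "file": path, "line": lineno,
                "title": "Secret logged in plaintext", "blocks_merge": True,
            })
    return findings

found = lint_secrets("src/auth/totp.cs", '_log.Info($"totp: {secret}")')
assert found[0]["confidence"] == 100 and found[0]["blocks_merge"]
```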

Security embedded, not bolted on

Changes to auth, secrets, audit trails, or external inputs auto-trigger the security-and-hardening skill. Security lives in design — not final polish.

Structured findings, not vibes

Every review outputs JSON with severity, confidence, rule citation, and file:line. Teams audit and track over time. Auditors can read it.

Spec-drift detection

Before review, the diff is compared against the approved spec sidecar. Scope creep, surprise files, and undeclared contract changes surface automatically.
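Conceptually, drift detection is a set comparison between the diff and the approved manifest — a sketch under assumed field names, not MTK's implementation:

```python
def detect_drift(spec_files: list[str], diff_files: list[str]) -> dict:
    """Compare files touched by the diff against the approved spec manifest."""
    approved, touched = set(spec_files), set(diff_files)
    return {
        "surprise_files": sorted(touched - approved),      # scope creep
        "untouched_declared": sorted(approved - touched),  # declared but not done
        "drift": bool(touched - approved),
    }

report = detect_drift(
    spec_files=["src/auth/totp.cs", "src/auth/login_handler.cs"],
    diff_files=["src/auth/totp.cs", "src/billing/invoice.cs"],
)
assert report["drift"] is True
assert report["surprise_files"] == ["src/billing/invoice.cs"]
```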

Anti-rationalization by design

Every skill ships with a "Common Rationalizations" table — the exact excuses AI uses to skip steps ("just this once", "small change") — pre-rebutted in-context.

Pluggable tech stacks

Language-agnostic core + first-class .NET, Python, and TypeScript packs. EF Core, MediatR, SQLAlchemy, FastAPI, React, Next.js, Tauri.

Compound learning

Correction-capture turns every "no, not like that" into a persistent lesson. Handoff preserves state across sessions. Your AI gets better over time.

Zero vendor lock-in

Native config generation for Cursor, Copilot, Windsurf, Gemini, and Cline — not just Claude Code. One source of truth, six target harnesses.

04 The difference

What "done" means, before and after.

The same engineering milestones, compared across stock AI workflows and MTK-enforced pipelines.

Dimension        | Stock AI assistant             | With MTK
Completion claim | "Done. It should work."        | Cited build exit 0 · 47/47 tests pass · no lint findings
Security review  | Skipped for "small" changes    | Auto-triggered on auth / secrets / audited state
Findings format  | Prose suggestions, no IDs      | JSON · severity · confidence · rule · file:line
Scope discipline | Drifts during implementation   | Diff vs. approved spec · drift surfaced pre-review
Critical rules   | In CLAUDE.md · ~80% compliance | Enforced by hooks · 100% deterministic
Review pass      | Same model reviews own work    | Forked context · adversarial persona · tools restricted · parallel Stage 2 fan-out
Cross-harness    | Per-tool config drift          | One source · six native configs generated
05 Components

31 skills. 4 agents. 16 hooks. One discipline.

MTK is composed of small, pressure-tested pieces that snap together. Invoke one entry point — the rest orchestrate themselves.

Entry points — two commands, everything else routes
/mtk-setup → Bootstrap a repo: detect stack, pull guidelines, generate CLAUDE.md
/mtk-setup --audit → Refresh architecture-principles.md after a structural change
/mtk-setup --merge → Unify per-repo audits into one team-wide standard
/mtk <description> → Plain English; routes to fix, implement, pre-commit-review, or context-report
Reviewer agents (adversarial, isolated context)
compliance-reviewer
Stage 1 — security, correctness, standards.
test-reviewer
Stage 2 — coverage, assertion quality, test smells.
architecture-reviewer
Stage 2 — boundaries, dependencies, naming, coupling.
silent-failure-hunter
Stage 1 (conditional) — swallowed catches, silenced promises, optimistic fallbacks, test erosion. Dispatched when the diff touches error-handling tokens.
Tech stacks
tech-stack-dotnet tech-stack-python tech-stack-typescript + pluggable
06 What a finding looks like

Structured, cited, auditable.

MTK reviews don't output prose. They output findings. Every finding has a severity, a confidence score, a rule citation, and a precise file:line reference. Pipe them into a dashboard, an audit log, or a PR bot. Auditors can trace every rejection back to a written standard.

severity → HIGH · MED · LOW · INFO
confidence → 0–100; linters default to 100
rule → ID from project standards (e.g. S7.3)
location → file:line, every time
review-findings.json
{
  "severity":   "HIGH",
  "confidence": 100,
  "rule":       "S7.3",
  "source":     "linter.secrets",
  "title":      "Secret logged in plaintext",
  "file":       "src/auth/totp.cs",
  "line":       84,
  "snippet":    "_log.Info($\"totp: {secret}\")",
  "guidance":   "Never log secrets. Redact at the sink.",
  "blocks_merge": true
}
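Because findings are plain JSON, wiring them into a gate or PR bot takes a few lines — a sketch of one possible consumer, not an official MTK tool:

```python
import json

# Hypothetical consumer: fail the pipeline if any finding blocks the merge.
def merge_blockers(findings_json: str) -> list[dict]:
    findings = json.loads(findings_json)
    return [f for f in findings if f.get("blocks_merge")]

raw = ('[{"severity": "HIGH", "rule": "S7.3", "blocks_merge": true},'
       ' {"severity": "LOW", "rule": "S4.9", "blocks_merge": false}]')
blockers = merge_blockers(raw)
assert [f["rule"] for f in blockers] == ["S7.3"]
```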
07 Get started

Install in one command.

MTK ships as a Claude Code plugin and works with every major AI coding assistant. Install once, bootstrap any repo, stay disciplined.

1 · Install the plugin (claude-code)
$ /plugin marketplace add moberghr/mtk-agent-toolkit
$ /plugin install mtk
2 · Bootstrap your repo (one-time)
$ /mtk-setup
  → detects stack · audits architecture · generates CLAUDE.md
3 · Ship disciplined work (every feature)
$ /mtk "add 2FA to login endpoint"
 ✓ spec approved · ✓ batched TDD · ✓ adversarial review · ✓ evidence gate
View on GitHub · Read the docs

Stop shipping vibes.
Start shipping evidence.

Join the teams who replaced "it compiles" with "it compiles, tests pass, findings resolved, spec approved." Your AI has never been more capable — make sure it's also disciplined.

Get MTK · About Moberg
08 FAQ

Common questions.

Does MTK only work with Claude Code?

No. MTK is Claude-Code-native (plugin marketplace, skills, agents) but generates native config for Cursor, GitHub Copilot, Windsurf, Gemini CLI, and Cline. The rules, standards, and review criteria stay consistent across all six harnesses.

Is this finance-specific?

No — MTK is tech-stack-agnostic and domain-flexible. A finance supplement ships with it (regulated state, sensitive data, audit requirements) because it was born in a fintech team, but any serious-software team benefits. Healthcare, infrastructure, and compliance-heavy domains fit the same workflow.

Doesn't this slow the AI down?

Per-feature wall-clock time is slightly longer. Net throughput is dramatically higher because the drift tax — rework, production incidents, extra review rounds, "that looked fine to me" postmortems — drops. Engineers stop babysitting AI output and start shipping.

Can I use it on an existing codebase?

Yes. /mtk-setup audits your repo, extracts architecture principles from existing code, pulls team coding guidelines, and generates a project-specific CLAUDE.md. Run /mtk-setup --merge across multiple repos to produce one team-wide standard.

Is it open source?

Yes — MIT licensed. Fork it, extend it, add your own tech stack pack. Contributions welcome via PR on GitHub.