MTK is an opinionated toolkit that turns any AI coding assistant into a disciplined engineering partner. Evidence-first workflows, adversarial review gates, and deterministic linters — built for enterprise teams shipping regulated software.
┌─ 01 · Brainstorm & spec
   ✓ spec written → .claude/tasks/0042-2fa-login.spec.json
   ✓ engineer approved → security impact: HIGH · security-and-hardening loaded
┌─ 02 · Batched TDD (3 batches)
   ✓ batch 1/3 · totp service · 12 tests · 12 pass · build 0
   ✓ batch 2/3 · login handler · 18 tests · 18 pass · build 0
   ✓ batch 3/3 · recovery codes · 9 tests · 9 pass · build 0
┌─ 03 · Spec-drift & adversarial review
   ✓ spec-drift · files match · contracts match
   ! compliance-reviewer · 2 findings
     HIGH · rule S7.3 · totp secret logged at auth_service.cs:84 [conf 100]
     MED · rule S4.2 · missing rate-limit on verify endpoint [conf 92]
   ✓ test-reviewer · coverage 94% · assertions strong
┌─ 04 · Verification gate
   ✗ stop-hook blocked completion — 1 HIGH finding unresolved
     fix required before commit
AI assistants follow CLAUDE.md rules about 80% of the time. The other 20% ships to production. A mocked test that was supposed to be real. A secret logged in plaintext. A migration that "looked fine" but deadlocks under load. When you're building regulated software — finance, healthcare, critical infrastructure — 80% is not a number you can afford.
Vibes-based completion. AI claims success. Nobody cites evidence. Reviewer trusts the output. A week later, prod breaks.
Instructions buried in context drift. Under pressure, the model cuts corners — disables tests, swallows errors, skips review.
Hooks block unverified claims. Linters catch known-bad patterns. Adversarial agents review findings with rule citations before merge.
MTK industrialises the engineering practices enterprise teams already trust — specification, verification, adversarial review — and makes the harness enforce them, not the reviewer.
Written intent before code. Scope, files, security, and tests approved up front.
Failing test first. Green build per batch. No test, no merge path.
Adversarial reviewers in isolated, forked contexts — running in parallel on orthogonal axes. The author never reviews alone.
Exit codes, test counts, citations. Completion is a receipt, not a feeling.
Hooks and linters enforce the rules. 100% compliance, not 80%.
Every feature flows through four stages. Each stage has its own skills, its own gates, its own adversarial checks. Your AI doesn't get to skip steps — the harness enforces them.
Before a single line of code, the engineer approves a spec sidecar — scope, files touched, security impact, test strategy. No spec, no implementation.
Large work is decomposed into verifiable batches. Each batch: failing test first, then code, then green. Between batches: build, test, checkpoint.
Deterministic linters first — confidence 100 on known-bad patterns. Then three reviewer agents in isolated context: compliance, test, architecture. Findings cite rule IDs and file:line.
The stop hook blocks any "done" claim without cited build output, test count, and exit codes. Spec-drift detection compares the diff against the approved manifest. No evidence, no completion.
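As a rough sketch of how such a gate can work (the helper and regex patterns below are illustrative, not MTK's shipped hook), a completion claim is scanned for cited evidence and rejected with a nonzero exit when any is missing:

```python
import re
import sys

# Illustrative evidence gate, not MTK's actual stop hook.
# A completion claim passes only if it cites a zero build exit code
# and a full "passed/total" test count.
EVIDENCE = {
    "build_exit_0": re.compile(r"build\b.*\bexit(?:ed)?(?:\s+code)?\s+0\b", re.IGNORECASE),
    "test_count": re.compile(r"(\d+)\s*/\s*(\d+)\s+tests?\s+pass", re.IGNORECASE),
}

def gate(claim: str) -> list[str]:
    """Return labels of missing evidence; an empty list means the claim may pass."""
    missing = [name for name, pattern in EVIDENCE.items() if not pattern.search(claim)]
    counts = EVIDENCE["test_count"].search(claim)
    if counts and counts.group(1) != counts.group(2):  # 46/47 is not "done"
        missing.append("all_tests_passing")
    return missing

if __name__ == "__main__":
    missing = gate(sys.stdin.read())
    if missing:
        print(f"completion blocked, missing evidence: {', '.join(missing)}")
        sys.exit(1)  # the nonzero exit is what blocks the claim
```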
Nine capabilities that separate MTK from prompt-and-pray workflows.
The stop hook rejects completion claims without cited build output, test counts, and exit codes. "I think it works" doesn't pass.
Linter packs catch known-bad patterns — secrets, SQLi, disabled tests — at confidence 100. AI handles the judgment calls. One unified finding schema. (A minimal rule sketch follows this list.)
Changes to auth, secrets, audit trails, or external inputs auto-trigger the security-and-hardening skill. Security lives in design — not final polish.
Every review outputs JSON with severity, confidence, rule citation, and file:line. Teams audit and track over time. Auditors can read it.
Before review, the diff is compared against the approved spec sidecar. Scope creep, surprise files, and undeclared contract changes surface automatically. (Sketched after this list.)
Every skill ships with a "Common Rationalizations" table — the exact excuses AI uses to skip steps ("just this once", "small change") — pre-rebutted in-context.
Language-agnostic core + first-class .NET, Python, and TypeScript packs. EF Core, MediatR, SQLAlchemy, FastAPI, React, Next.js, Tauri.
Correction-capture turns every "no, not like that" into a persistent lesson. Handoff preserves state across sessions. Your AI gets better over time.
Native config generation for Cursor, Copilot, Windsurf, Gemini, and Cline — not just Claude Code. One source of truth, six target harnesses.
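To make the deterministic layer concrete, here is a minimal sketch of a single linter-pack rule; the regex, scanned paths, and rule wiring are illustrative, not MTK's shipped secrets pack. It emits findings in the unified schema shown further down this page:

```python
import json
import re
from pathlib import Path

# Illustrative linter-pack rule, not MTK's shipped secrets pack.
# Known-bad pattern: a secret-like variable interpolated into a log call.
SECRET_IN_LOG = re.compile(r'_log\.\w+\(\$".*\{(secret|token|password)\w*\}', re.IGNORECASE)

def scan(path: Path) -> list[dict]:
    findings = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if SECRET_IN_LOG.search(line):
            findings.append({
                "severity": "HIGH",
                "confidence": 100,  # deterministic: pattern match, no judgment call
                "rule": "S7.3",
                "source": "linter.secrets",
                "title": "Secret logged in plaintext",
                "file": str(path),
                "line": lineno,
                "snippet": line.strip(),
                "guidance": "Never log secrets. Redact at the sink.",
                "blocks_merge": True,
            })
    return findings

if __name__ == "__main__":
    for source_file in Path("src").rglob("*.cs"):
        for finding in scan(source_file):
            print(json.dumps(finding))
```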
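And a companion sketch for spec-drift detection, under similar assumptions: the approved sidecar declares a file manifest under a "files" key (the key name and sidecar shape are assumptions for illustration), and any diff that touches an undeclared file is surfaced before review:

```python
import json
import subprocess
from pathlib import Path

# Illustrative spec-drift check, not MTK's implementation.
# Assumes the approved sidecar lists the files the change may touch.
def drift(spec_path: str, base_ref: str = "main") -> list[str]:
    approved = set(json.loads(Path(spec_path).read_text())["files"])
    diff = subprocess.run(
        ["git", "diff", "--name-only", base_ref],
        capture_output=True, text=True, check=True,
    )
    changed = {line for line in diff.stdout.splitlines() if line}
    return sorted(changed - approved)  # surprise files = scope creep

if __name__ == "__main__":
    surprises = drift(".claude/tasks/0042-2fa-login.spec.json")
    if surprises:
        print("spec drift detected, undeclared files:")
        for path in surprises:
            print(f"  {path}")
        raise SystemExit(1)
```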
The same engineering milestones, compared across stock AI workflows and MTK-enforced pipelines.
| Dimension | Stock AI assistant | With MTK |
|---|---|---|
| Completion claim | "Done. It should work." | Cited build exit 0 · 47/47 tests pass · no lint findings |
| Security review | Skipped for "small" changes | Auto-triggered on auth / secrets / audited state |
| Findings format | Prose suggestions, no IDs | JSON · severity · confidence · rule · file:line |
| Scope discipline | Drifts during implementation | Diff vs. approved spec · drift surfaced pre-review |
| Critical rules | In CLAUDE.md · ~80% compliance | Enforced by hooks · 100% deterministic |
| Review pass | Same model reviews own work | Forked context · adversarial persona · tools restricted · parallel Stage 2 fan-out |
| Cross-harness | Per-tool config drift | One source · six native configs generated |
MTK is composed of small, pressure-tested pieces that snap together. Invoke one entry point — the rest orchestrate themselves.
| Entry point | Purpose |
|---|---|
| `/mtk-setup` | Bootstrap a repo — detect stack, pull guidelines, generate CLAUDE.md |
| `/mtk-setup --audit` | Refresh architecture-principles.md after a structural change |
| `/mtk-setup --merge` | Unify per-repo audits into one team-wide standard |
| `/mtk <description>` | Plain English. Routes to fix, implement, pre-commit-review, or context-report. |

Reviewer agents: compliance-reviewer · test-reviewer · architecture-reviewer · silent-failure-hunter
MTK reviews don't output prose. They output findings. Every finding has a severity, a confidence score, a rule citation, and a precise file:line reference. Pipe them into a dashboard, an audit log, or a PR bot. Auditors can trace every rejection back to a written standard.
{
"severity": "HIGH",
"confidence": 100,
"rule": "S7.3",
"source": "linter.secrets",
"title": "Secret logged in plaintext",
"file": "src/auth/totp.cs",
"line": 84,
"snippet": "_log.Info($\"totp: {secret}\")",
"guidance": "Never log secrets. Redact at the sink.",
"blocks_merge": true
}
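Because the schema is uniform, consuming it takes only a few lines. A hedged sketch of a CI step that fails on merge-blocking findings; findings.jsonl (one finding per line) is a hypothetical path, not an MTK artifact name:

```python
import json
import sys

# Sketch: fail the CI step when any finding blocks merge.
# "findings.jsonl" (one JSON finding per line) is a hypothetical path.
blockers = []
with open("findings.jsonl") as fh:
    for line in fh:
        finding = json.loads(line)
        if finding["blocks_merge"]:
            blockers.append(finding)

for f in blockers:
    print(f'{f["severity"]} · {f["rule"]} · {f["file"]}:{f["line"]} · {f["title"]}')

sys.exit(1 if blockers else 0)
```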
MTK ships as a Claude Code plugin and works with every major AI coding assistant. Install once, bootstrap any repo, stay disciplined.
$ /plugin marketplace add moberghr/mtk-agent-toolkit
$ /plugin install mtk
$ /mtk-setup → detects stack · audits architecture · generates CLAUDE.md
$ /mtk "add 2FA to login endpoint" ✓ spec approved · ✓ batched TDD · ✓ adversarial review · ✓ evidence gate
Join the teams who replaced "it compiles" with "it compiles, tests pass, findings resolved, spec approved." Your AI has never been more capable — make sure it's also disciplined.
Does MTK only work with Claude Code? No. MTK is Claude-Code-native (plugin marketplace, skills, agents) but generates native config for Cursor, GitHub Copilot, Windsurf, Gemini CLI, and Cline. The rules, standards, and review criteria stay consistent across all six harnesses.
Is MTK only for fintech? No — MTK is tech-stack-agnostic and domain-flexible. A finance supplement ships with it (regulated state, sensitive data, audit requirements) because it was born in a fintech team, but any serious-software team benefits. Healthcare, infrastructure, and compliance-heavy domains fit the same workflow.
Doesn't all this process slow delivery down? Per-feature wall-clock time is slightly longer. Net throughput is dramatically higher because the drift tax — rework, production incidents, review rounds, "that looked fine to me" postmortems — drops. Engineers stop babysitting AI output and start shipping.
Can MTK adapt to an existing codebase? Yes. /mtk-setup audits your repo, extracts architecture principles from existing code, pulls team coding guidelines, and generates a project-specific CLAUDE.md. Run /mtk-setup --merge across multiple repos to produce one team-wide standard.
Is MTK open source? Yes — MIT licensed. Fork it, extend it, add your own tech-stack pack. Contributions welcome via PR on GitHub.