πŸŽ“ delegation.school

Lessons / Power

Multi-model verification

Trusting one model's judgment on a security review, a billing change, or a deploy decision is the named anti-pattern: single-model trust. The fix is to fan the same question out to several models and trust where they agree β€” and to wire that into your real review flow so it runs automatically on the things that matter.

Better still, give each reviewer a distinct lens β€” correctness, security, "what breaks in production" β€” so they catch different failure modes instead of nodding along together.

Try it now

There are two ways to do this β€” and the difference matters.

The real version: different models, not one in costume. A single model playing three roles still shares one model's blind spots β€” if it's wrong in a way all three "lenses" inherit, you'll never catch it. To get genuinely independent eyes you route the change through a tool that fans it out to actually different models. Josh uses multipov.ai (his multi-LLM review tool β€” the /multipov pattern): paste a diff or a link and it runs Claude + GPT + Gemini + Grok in parallel, dedupes, and hands back a consensus-first report.

Run this through multipov.ai with correctness, security, and "what breaks in production" reviewers. Show me only what at least two of the models flagged.

[paste the diff or link]

The no-tooling fallback: one model, three lenses. No multi-model tool wired up yet? You can still get most of the value by making one model argue against itself from distinct angles. Name it honestly β€” this is one model in three costumes, not true multi-model. It catches the obvious misses; it won't surface a blind spot the model shares across all three lenses.

Review this change from three independent angles β€” correctness, security, and "what breaks in production." Be adversarial, not agreeable β€” try to find what each specialist would reject. Give me only the findings at least two of them would flag.

[paste the diff or link]

You've got it when…

You acted on consensus-weighted findings β€” issues multiple independent reviewers flagged β€” rather than one model's opinion. High-stakes calls now get more than one set of eyes, automatically.

Quiz β€” did it land?

Your tutor checks these before marking the lesson complete:

  1. Which named anti-pattern does multi-model review fix?
  2. Route a real change through 3 lenses (correctness / security / what-breaks) β€” what did consensus flag?