Published dispatch Mar 29, 2026

Multi-Agent Debate Disagreements

In dozens of multi-agent debates, the models have almost always converged. The one time they disagreed, the disagreement was trying to tell me something.

Filed: March 29, 2026
Published on: galexc.me/dispatches
Read time: 6 min (a coffee-sipping read with a bit of texture)
Tags: #agents
Editorial note: Co-authored by Gilman and GalexC.

By the numbers

Some high-level stats before the long-form writeup.

690 dispatched
241 tasks done
15 models used
3.3 min avg runtime

I run multi-agent debates constantly: two or three models from different providers review the same problem, argue about it in structured rounds, and converge on a recommendation. Many of these run daily, and they almost always converge after two or three rounds. It got to the point where I was almost blindly trusting them, because the output was so consistently good.
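The orchestration is roughly this shape. Here is a minimal TypeScript sketch of the debate loop; the type and function names (`Agent`, `debate`, `verdict`) are mine for illustration, not GalexC internals, and a real system would extract verdicts via structured output rather than string matching:

```typescript
type Agent = { name: string; ask: (prompt: string) => Promise<string> };
type DebateResult = { converged: boolean; answers: string[] };

// Crude verdict extraction for the sketch; real systems would request
// a structured recommendation field instead of keyword-matching.
function verdict(answer: string): string {
  return answer.toLowerCase().includes("migrate") ? "migrate" : "stay";
}

async function debate(
  agents: Agent[],
  question: string,
  maxRounds = 3
): Promise<DebateResult> {
  // Round 1: each agent answers independently.
  let answers = await Promise.all(agents.map((a) => a.ask(question)));

  for (let round = 2; round <= maxRounds; round++) {
    // If all verdicts match, the debate has converged; stop early.
    if (new Set(answers.map(verdict)).size === 1) {
      return { converged: true, answers };
    }
    // Otherwise each agent sees the others' positions and may revise.
    answers = await Promise.all(
      agents.map((a, i) => {
        const others = answers.filter((_, j) => j !== i).join("\n---\n");
        return a.ask(
          `${question}\n\nOther positions:\n${others}\n\n` +
            `Revise or defend your recommendation.`
        );
      })
    );
  }
  return { converged: new Set(answers.map(verdict)).size === 1, answers };
}
```

The early-exit check is what makes the "converged after 2-3 rounds" pattern cheap; the interesting case is when `maxRounds` is exhausted and `converged` comes back false.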

The streak

In roughly four days in mid-March, I migrated three web properties across frameworks, hosting providers, and DNS registrars. Agents did most of the heavy lifting. I steered.

March 11: rhoimpact.com from Gatsby to Astro. Greenfield rewrite of the company landing page. Agent scaffolded the project; I reviewed and iterated on content and design.

March 14: koi.eco from CedarJS to Astro. One evening. Agent ported the pages and components; I handled the design system mapping.

March 15: galexc.me (this site) from CedarJS to Astro. Single dispatch job. The initial version of the site you're reading right now is the result.

All three also moved from Netlify to Cloudflare Pages. DNS for our primary domains moved from DNSMadeEasy to Cloudflare just days before. The CI pipeline for every one of them ended up looking basically identical:

# Forgejo Actions (GitHub Actions compatible) - entire deploy pipeline
steps:
  - uses: actions/checkout@v4
  - uses: pnpm/action-setup@v4
  - run: pnpm install --frozen-lockfile
  - run: pnpm build
  - run: npx wrangler pages deploy dist/

Five steps. Push to main, and the site's live on Cloudflare's edge in under two minutes. When agents can scaffold a project, migrate content, and wire up CI in one shot, this kind of velocity is hard to believe until you experience it. Three properties. Four days. New framework, new hosting, new DNS.

The debate that disagreed

Then I tried the fourth. docs.koi.eco is Rho Impact's documentation site: Next.js 14, 29 MDX pages, 37+ custom React components, a custom FlexSearch integration with a webpack loader, a Scalar API reference viewer, and a custom MDX plugin chain. Significantly more complex than the previous three, but I'd had so much success with the other properties that my OCD wanted literally everything on the same framework and deployment.

Before jumping in, I ran a multi-agent review. I prompted it honestly:

Let’s do a review of this in depth. If we are confident that SEO is good to go on the documentation server I don’t need to move anything to astro. But I do want to evaluate it and take seriously whether we should do it.

I sent the same question to Codex (OpenAI) and Gemini (Google) simultaneously. And for the first time in GalexC's history, the roundtable debate failed to converge after three rounds. I had never seen them disagree before.

Codex said stay:

The current problem is not “Next.js is bad for SEO.” The problem is “the site has incomplete metadata.” That is a content and implementation gap, not a framework gap.

Migration risk: 37+ custom React components, 98 TSX files, custom MDX pipeline with 5 plugins, custom FlexSearch loader, Scalar API reference. That is real migration risk.

Gemini said migrate:

Zero JS by default. Starlight ships pure static HTML. SEO is built-in, not bolted on. Pagefind > FlexSearch. Ecosystem alignment: rhoimpact.com and koi.eco already on Astro.

Migrate. The SEO gaps you have today are table stakes, and Starlight gives them to you for free. The 37 React components are the migration cost - audit them first, but most docs components are presentational and will port easily.

My engineering brain said this was a clear signal to stop and reassess priorities. I went with Gemini anyway.

What happened

A smoke test confirmed every technical risk area worked. MDX rendering, React islands, Pagefind search, Scalar API reference, theme toggle. All green. So I went for it. 30 MDX pages ported, 18 React components created, build succeeded.

Then I tried to make it look right.

/* Fighting Starlight's design system */
:root[data-theme='dark'] {
  --sl-color-accent-low: #1a1a2e;
  --sl-color-accent: #00d4aa;
  --sl-color-accent-high: #e0e0e0;
  /* 30+ more overrides and it STILL looked wrong */
}

Starlight’s design system is deeply opinionated. You can change colors. You cannot change the layout structure, component proportions, spacing, or design language without fighting every layer of the framework. Two hours in, I called it:

this may be a failed experiment - this is still wildly off

And then:

Yeah, OK let’s learn and pivot - not worth more effort here.

I renamed the repo to koi-docs-starlight for reference, switched back to the Next.js site, and fixed the actual SEO problem in 30 minutes. OG tags, Twitter cards, canonical URLs, JSON-LD. One layout file. 17 description fields. Deployed.
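For context, the kind of fix that took 30 minutes looks roughly like the following sketch, using the Next.js 14 App Router Metadata API in a root layout. Every value here is illustrative, not the actual docs.koi.eco configuration:

```typescript
// app/layout.tsx - illustrative values only, not the real site config
import type { Metadata } from "next";

export const metadata: Metadata = {
  metadataBase: new URL("https://docs.koi.eco"),
  title: { default: "Koi Docs", template: "%s | Koi Docs" },
  description: "Example description. The real fix added 17 of these.",
  alternates: { canonical: "./" }, // canonical URLs, resolved per route
  openGraph: {
    // OG tags
    type: "website",
    siteName: "Koi Docs",
    images: ["/og-image.png"],
  },
  twitter: { card: "summary_large_image" }, // Twitter cards
};
```

JSON-LD isn't covered by the Metadata API; it typically goes into the layout as a rendered `<script type="application/ld+json">` element. The point stands either way: all of this lives in one layout file plus per-page description fields, no framework change required.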

Codex was right. The problem was never the framework.

Why I’m sharing this

I’m sharing this particular story because it is a rare example of a massive waste of time in the last month. I lost several hours because I didn’t listen to a clear signal. That said, I’m still incredibly happy with the outcome because I was able to pivot after hours instead of the week(s) that are typically required to get a prototype far enough along to make a go/no-go decision. The agents got me there fast enough that even the failure was cheap.

But the real takeaway is about the disagreement itself. I’ve run these debates dozens of times and they converge with almost eerie consistency. Different providers, different model architectures, different training data (?), and they land on the same recommendation. When that happens, you can move fast with confidence.

When they disagree, though, that means the problem has genuine tension in it. There are legitimate competing considerations that don’t resolve neatly. Codex saw migration risk and a simple fix. Gemini saw ecosystem alignment and long-term maintenance reduction. Both were correct about their respective concerns. The question was which concern mattered more for this specific case.

If I had treated the disagreement as the signal it was, I would have designed a 20-minute experiment (try the CSS overrides on a single page before porting all 30) instead of committing to the full migration. My smoke test checked technical feasibility but not visual compatibility. That’s the experiment the disagreement was pointing me toward.

What I learned

When your multi-agent debate fails to converge, resist the urge to just pick the answer you like better (which is exactly what I did). Instead, treat the disagreement as data. Why are they splitting? What’s the underlying tension? Can you design a cheap experiment that resolves it before committing?

Three successful migrations in four days. Agents made that possible. The fourth one failed because I had the signal right in front of me and didn’t listen. I’ll be paying much closer attention to disagreements going forward.