What is an LLM council? How asking many AIs at once beats asking one

Polymind · 2026-05-27 · 5 min read · llm-council · multi-model · consensus

If you have heard the phrase LLM council in the last few months and weren't sure whether it was a product, a technique, or a meme, the short answer is: a technique that is quickly becoming a product category. An LLM council is a setup where you put the same question to several large language models at once, have them read and critique each other's answers, and then let one model synthesize a single final response. One question goes in, a deliberation happens, and one answer comes out, except that the answer carries the weight of several frontier models instead of one.

The term went mainstream in late 2025 when Andrej Karpathy published a small open-source project literally called llm-council, and again in early 2026 when Perplexity shipped a feature called Model Council. But the idea is older than the name, and it's worth understanding on its own terms, not least because the metaphor behind the name describes the mechanics well.

What is an LLM council, exactly?

An LLM council is a multi-model question-answering pattern with three moving parts:

The members — several different AI models (say Claude, GPT, Gemini, and Grok) that each answer your question independently.
A review step — the members read each other's answers, usually anonymized, and critique or rank them.
A synthesizer — one designated model reads everything and writes the final answer.

The word "council" is a metaphor, and a good one: it's a room of experts who each give an opinion, then argue, then hand you a considered verdict rather than a single off-the-cuff take. The premise is that a group of models with different training data, different blind spots, and different house styles will, between them, catch mistakes that any one of them would have stated with total confidence.

How does an LLM council work?

Most implementations, including Karpathy's, follow the same three stages.

Stage one: first opinions. Your question goes to every member of the council at the same time, and each one answers without seeing the others. This is deliberate — it prevents the models from anchoring on each other and gives you genuinely independent draws. If three models trained by three different labs independently land on the same answer, that agreement means something. If they scatter, that scatter means something too.
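
As a rough illustration of stage one, here is a minimal Python sketch of the fan-out. It is not Karpathy's code or Polymind's: `ask_model` is a hypothetical stand-in that returns a canned string where a real council would call each provider's API, and the model names are placeholders. The point it shows is that every call receives only the question, so no member can anchor on another's answer.

```python
import asyncio


async def ask_model(model: str, question: str) -> str:
    """Hypothetical stand-in for a real provider call (an assumption)."""
    await asyncio.sleep(0)  # yield control, as a network request would
    return f"[{model}] answer to: {question}"


async def first_opinions(models: list[str], question: str) -> dict[str, str]:
    """Send the same question to every member concurrently.

    Each call gets only the question, never another member's answer.
    """
    answers = await asyncio.gather(*(ask_model(m, question) for m in models))
    # gather() returns results in the order the calls were passed in.
    return dict(zip(models, answers))


if __name__ == "__main__":
    council = ["claude", "gpt", "gemini", "grok"]
    opinions = asyncio.run(first_opinions(council, "Which license fits a small library?"))
    for name, answer in opinions.items():
        print(name, "->", answer)
```

Running the calls concurrently rather than one after another means the run takes roughly as long as the slowest member, not the sum of all of them.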

Stage two: review. Each model is shown the other answers — typically with the names stripped out, labeled "Response A," "Response B," and so on — and asked to evaluate them for accuracy and insight. The anonymization matters: it stops a model from flattering its own output or playing favorites with a sibling model. This is the step that turns a pile of parallel answers into something more like a deliberation. On Polymind this stage is configurable as the critique round — you choose how many rounds of back-and-forth the panel runs before the verdict.
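
The anonymization step can be sketched in a few lines of Python. This is an illustration under assumptions, not any product's implementation: the function names `anonymize_for` and `review_prompt` are invented here, and the sketch follows the description above by hiding the reviewer's own answer, which a given implementation may or may not do. Shuffling before labeling keeps the letter order from leaking which model wrote what.

```python
import random
from string import ascii_uppercase


def anonymize_for(reviewer, answers, rng=None):
    """Return (labeled, key) for one reviewer.

    labeled maps "Response A", "Response B", ... to answer text.
    key maps each label back to a model name and is kept private.
    """
    rng = rng or random.Random()
    others = [(m, text) for m, text in answers.items() if m != reviewer]
    rng.shuffle(others)  # label order should not reveal the author
    labeled, key = {}, {}
    for letter, (model, text) in zip(ascii_uppercase, others):
        label = f"Response {letter}"
        labeled[label] = text
        key[label] = model
    return labeled, key


def review_prompt(question, labeled):
    """Build the text a reviewer sees: the question plus labeled answers."""
    parts = [f"Question: {question}", ""]
    for label, text in labeled.items():
        parts.append(f"{label}:\n{text}\n")
    parts.append("Evaluate each response for accuracy and insight, then rank them.")
    return "\n".join(parts)


if __name__ == "__main__":
    answers = {"claude": "Use MIT.", "gpt": "Use Apache-2.0.", "gemini": "Use MIT."}
    labeled, key = anonymize_for("gpt", answers)
    print(review_prompt("Which license fits a small library?", labeled))
```

The `key` mapping stays on the orchestrator's side, so the reviews can be attributed again once they come back.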

Stage three: synthesis. One model (Karpathy calls it the Chairman; on Polymind we call it the judge) reads every answer and the reviews and produces the single response you actually read. Crucially, this is not a vote. The synthesizer doesn't tally which answer got the most points; it writes a new answer that may agree with one member, blend three, or overrule all of them. A vote gives you the most popular opinion. A synthesis gives you a considered one.
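
To make the difference between a vote and a synthesis concrete, here is a small Python sketch. Both functions are hypothetical and the prompt wording is an assumption, not the text any real council uses. `plurality_vote` can only return an answer someone already gave; `synthesis_prompt` hands the synthesizer the answers and the reviews and leaves it free to write something new.

```python
from collections import Counter


def plurality_vote(answers):
    """Return the most common answer text (ties go to the first seen)."""
    return Counter(answers.values()).most_common(1)[0][0]


def synthesis_prompt(question, answers, reviews):
    """Assemble what the synthesizer reads: question, answers, reviews."""
    lines = [f"Question: {question}", "", "Council answers:"]
    lines += [f"- {label}: {text}" for label, text in answers.items()]
    lines += ["", "Peer reviews:"]
    lines += [f"- {review}" for review in reviews]
    lines += [
        "",
        "Write one final answer. You may agree with a single response, "
        "combine several, or overrule all of them if the reviews justify it.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    answers = {
        "Response A": "Use Postgres",
        "Response B": "Use SQLite",
        "Response C": "Use Postgres",
    }
    reviews = ["Response B is right that the workload is single-user."]
    print("Vote:", plurality_vote(answers))
    print(synthesis_prompt("Which database?", answers, reviews))
```

In this toy case the vote returns the Postgres answer, while a synthesizer reading the review could still side with the minority.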

Council vocabulary: chairman, judge, members, panelists

The vocabulary is still settling, because the pattern is young. You will see the synthesizer called a chairman, a judge, or an arbiter. You will see the answering models called members, panelists, or an advisory board. The product names vary too — "LLM council," "AI council," "model council," "multi-model AI." They all describe the same shape. We collected the working definitions in the glossary so you can map one writer's "chairman" onto another's "judge" without losing the thread.

Why a council beats a single chatbot

The honest case for a council is not that it's fancier. It's that a single chatbot hides three things you'd want to know, and a council surfaces all three.

The first is a confidence signal. One model gives you one answer in one fluent, confident register — the same register whether it's certain or guessing. A council gives you agreement or disagreement, which is a real signal about how settled the question is.
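
One crude way to turn agreement into a number is the share of model pairs that gave the same answer. The Python sketch below does that with exact matching after light normalization, so it only suits short answers such as yes/no or a single name; comparing long answers would need some form of semantic comparison. It is an illustrative assumption, not how any particular product computes its consensus score.

```python
from itertools import combinations


def normalize(answer):
    """Lowercase, collapse whitespace, and drop trailing '.' or '!'."""
    return " ".join(answer.lower().split()).rstrip(".!")


def agreement_score(answers):
    """Fraction of model pairs whose normalized answers match (0.0 to 1.0)."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0  # a single answer trivially agrees with itself
    same = sum(normalize(a) == normalize(b) for a, b in pairs)
    return same / len(pairs)


if __name__ == "__main__":
    print(agreement_score(["Yes.", "yes", "Yes"]))  # all three match
    print(agreement_score(["Yes.", "yes", "No"]))   # one matching pair of three
```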

The second is blind spots. Every model is the product of a specific training mix and a specific company's idea of "helpful." Those choices leak into the answers. Ask several models and the idiosyncrasies tend to cancel out instead of compounding, at least where the models don't share the same blind spot.

The third, and most valuable, is the disagreement itself. When two frontier models answer the same question differently, that is a flashing light over a soft spot in the question, exactly the place you'd want to slow down and think before acting. A single chatbot gives you no such warning. We wrote about this at length in "Why one chatbot isn't enough."

This is also why councils are a genuine hallucination check. A model fabricating a citation or an API that doesn't exist will do it with full confidence. Two independent models producing the same fabrication is rare; one inventing while the others demur is common. The fabrication doesn't stop happening — it stops being invisible.
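
The "one invents while the others demur" case can be surfaced mechanically. The Python sketch below is a rough heuristic under assumptions: it pulls URLs and dotted call names such as `pkg.func()` out of each answer with a simple regular expression and flags anything only one model mentions. A flag is a prompt to go and check, not proof of fabrication, and the pattern will miss plenty of claim types. The function name and the sample answers are invented for illustration; `os.path.merge()` is deliberately a function that does not exist.

```python
import re
from collections import defaultdict

# Rough patterns for URLs and dotted call names like "pkg.mod.func()".
CLAIM_PATTERN = re.compile(r"https?://\S+|\b[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+\(\)")


def unsupported_claims(answers):
    """Map each URL or call name cited by exactly one model to that model."""
    cited_by = defaultdict(set)
    for model, text in answers.items():
        for claim in CLAIM_PATTERN.findall(text):
            cited_by[claim.rstrip(".,;")].add(model)
    return {
        claim: next(iter(models))
        for claim, models in cited_by.items()
        if len(models) == 1
    }


if __name__ == "__main__":
    answers = {
        "model_a": "Call os.path.join() to build the path.",
        "model_b": "os.path.join() works; os.path.merge() is faster.",
        "model_c": "Use os.path.join() here.",
    }
    print(unsupported_claims(answers))
```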

When you don't need a council

A council is not free and not always worth it. For a quick reformat, a throwaway draft, or a question you could answer yourself, one model is faster and cheaper, and that's the right call. The council earns its keep on the questions where being wrong is expensive: a contract clause, a library choice you'll live with for two years, a medical follow-up, a factual claim you're about to publish. On those, the cents a multi-model run costs are trivial against the hours a confident wrong answer can burn.

How to use an LLM council without building one

You can build a council yourself — Karpathy's repo is a fine weekend project if you want to wire one up locally with your own API keys. Or you can use a hosted one. Polymind is a hosted LLM council: ask one question, watch seven frontier models (Claude, GPT, Gemini, Perplexity, Grok, Mistral, and Qwen) answer in parallel, optionally have them critique each other, and let a judge you choose synthesize the result — with a consensus score and a dissent callout on every run.

One thing a hosted council can do that a local script can't: keep score over time. Every completed run on Polymind feeds a public, statistically corrected leaderboard of which models the judges actually pick most, broken out by code, legal, creative writing, and more. So beyond answering your question, a council can slowly answer a bigger one: which models are worth listening to, and for what.
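
This post doesn't spell out what the statistical correction is, so the Python sketch below is only one common option, not a description of Polymind's leaderboard: ranking by the lower bound of the Wilson score interval instead of the raw pick rate. The effect is that a model picked 3 times out of 3 does not outrank one picked 80 times out of 100, because three runs are too few to trust. The function name and the sample counts are assumptions for illustration.

```python
import math


def wilson_lower_bound(wins, runs, z=1.96):
    """Lower bound of the Wilson score interval for a pick rate.

    z=1.96 corresponds to a 95% confidence level.
    """
    if runs == 0:
        return 0.0
    p = wins / runs
    centre = p + z * z / (2 * runs)
    margin = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs))
    return (centre - margin) / (1 + z * z / runs)


if __name__ == "__main__":
    # Hypothetical counts: (times picked by the judge, completed runs).
    records = {"model_a": (3, 3), "model_b": (80, 100), "model_c": (40, 100)}
    ranked = sorted(records, key=lambda m: wilson_lower_bound(*records[m]), reverse=True)
    for name in ranked:
        wins, runs = records[name]
        print(name, f"{wins}/{runs}", round(wilson_lower_bound(wins, runs), 3))
```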

If you want to run your own hardest question through a council, the home page is one prompt away. Subscribe via RSS for the next post, which compares running the council yourself versus using a hosted one.