
Karpathy's LLM council, without the setup: a hosted multi-model panel

Polymind · 2026-05-27 · 4 min read · llm-council · multi-model · product

In late 2025, Andrej Karpathy published a small project called llm-council and, almost overnight, gave a name to a pattern a lot of people had been reaching for: ask several AI models the same question, have them critique each other, and let one model synthesize the answer. The repo crossed nineteen thousand GitHub stars. If you found your way here after seeing it, this post is the honest version of the question you're probably asking — do I need to run the repo, or is there a faster way to try the idea?

What Karpathy's llm-council actually is

It's a small, local web app. You clone it, install the Python and JavaScript dependencies, supply an OpenRouter API key, and run it on your own machine. By default the "council" is four models (GPT-5.1, Gemini 3 Pro, Claude Sonnet 4.5, and Grok-4), and you change the lineup by editing a config file. It runs a three-stage flow: every model answers, the models anonymously review and rank each other's answers, and a designated Chairman model writes the final response. (If the three stages are new to you, we walked through them in our post "What is an LLM council".)
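To make the three stages concrete, here is a minimal sketch of the flow in Python. It is not code from the repo: `run_council`, `canned`, the prompt wording, and the model names are all hypothetical, and each "model" is a plain function standing in for a real API call.

```python
import random
import string
from typing import Callable, Dict

# A "model" is any function from prompt text to reply text.
# Real use would wrap an API call; that part is left out of this sketch.
Model = Callable[[str], str]


def run_council(question: str, council: Dict[str, Model],
                chairman: Model, seed: int = 0) -> dict:
    # Stage 1: every council member answers the question independently.
    answers = {name: model(question) for name, model in council.items()}

    # Hide authorship: shuffle the answers and relabel them A, B, C, ...
    names = list(answers)
    random.Random(seed).shuffle(names)
    labels = {f"Response {string.ascii_uppercase[i]}": name
              for i, name in enumerate(names)}
    anonymized = "\n\n".join(f"{label}:\n{answers[name]}"
                             for label, name in labels.items())

    # Stage 2: each member reviews and ranks the anonymized answers.
    review_prompt = (f"Question: {question}\n\n{anonymized}\n\n"
                     "Rank the responses from best to worst.")
    reviews = {name: model(review_prompt) for name, model in council.items()}

    # Stage 3: the chairman sees the answers and the reviews, then synthesizes.
    review_text = "\n\n".join(f"Review {i + 1}:\n{text}"
                              for i, text in enumerate(reviews.values()))
    final = chairman(f"Question: {question}\n\n{anonymized}\n\n"
                     f"Peer reviews:\n{review_text}\n\nWrite the final answer.")
    return {"answers": answers, "labels": labels,
            "reviews": reviews, "final": final}


def canned(reply: str) -> Model:
    # Stand-in model that returns the same text for any prompt.
    return lambda prompt: reply


council = {
    "model-a": canned("Paris."),
    "model-b": canned("Paris, France."),
    "model-c": canned("Lyon."),
}
# The chairman stand-in echoes its prompt so you can inspect what it was given.
result = run_council("What is the capital of France?", council,
                     chairman=lambda prompt: prompt)
print(result["final"])
```

The relabeling step is the part worth noticing: reviewers and the chairman only ever see "Response A", "Response B", and so on, while the `labels` mapping stays on the caller's side for attribution afterwards.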

Karpathy is refreshingly direct about what it is. In his own words it's "99% vibe coded" and "I'm not going to support it in any way, it's provided here as is for other people's inspiration." That's not a knock — it's exactly the right framing for a brilliant weekend hack meant to spread an idea. And it worked: the idea spread.

What running it yourself actually takes

The repo is genuinely simple to stand up if you're comfortable in a terminal. But "simple for an engineer" still means:

Cloning a repo and installing two toolchains (Python via uv, Node via npm).
Getting and funding an OpenRouter API key, which routes — and bills — every call.
Editing a config file to change which models sit on the council.
Running two local servers and keeping them running whenever you want to use it.

And once it's running, there are things it deliberately doesn't do, because it was never meant to be a product:

No accounts, no sync. Conversations are JSON files on the machine you ran it on. Start something on your laptop and it isn't on your phone.
No sharing. There's no link you can send someone to show them a run.
No memory across the ecosystem. Every run is its own island; there's no scoreboard of which models tend to win over time.
One user — you. There's no notion of other people using your instance.

None of that is a flaw. It's a weekend project doing exactly what it set out to do: let one curious person see multiple models deliberate. The gap only matters if what you want is to use the pattern regularly rather than study it once.

The hosted version of the same idea

Polymind is the same core idea — a multi-model council — built as a hosted product instead of a local script. You open a page, sign in with Google, type a question, and a panel of up to seven frontier models (Claude, GPT, Gemini, Perplexity, Grok, Mistral, and Qwen) answers in parallel. You can turn up the critique rounds to have them revise in light of each other, and a judge you choose synthesizes the final answer. Every run carries a consensus score and a callout on the sharpest dissent. There's nothing to install, no API key to manage, and no server to babysit — the provider keys and billing live on our side.
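For readers who think in code, the shape of that flow (parallel answers, optional critique rounds, then a judge) can be sketched with `asyncio`. This is an illustration of the pattern only, not Polymind's implementation: `ask_panel`, `run_panel`, `make_stub`, the prompt wording, and the model names are all hypothetical, and the stubs sleep briefly instead of calling a provider.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

AsyncModel = Callable[[str], Awaitable[str]]


async def ask_panel(prompt: str, panel: Dict[str, AsyncModel]) -> Dict[str, str]:
    # Fan the same prompt out to every model concurrently.
    names = list(panel)
    replies = await asyncio.gather(*(panel[name](prompt) for name in names))
    return dict(zip(names, replies))


async def run_panel(question: str, panel: Dict[str, AsyncModel],
                    judge: AsyncModel, critique_rounds: int = 1) -> str:
    drafts = await ask_panel(question, panel)

    for _ in range(critique_rounds):
        async def revise(name: str) -> str:
            # Each model sees its own draft plus everyone else's, then revises.
            others = "\n\n".join(text for other, text in drafts.items()
                                 if other != name)
            return await panel[name](
                f"Question: {question}\n\nYour draft:\n{drafts[name]}\n\n"
                f"Other drafts:\n{others}\n\nRevise your answer.")

        names = list(panel)
        revised = await asyncio.gather(*(revise(name) for name in names))
        drafts = dict(zip(names, revised))

    # The judge reads the final drafts and writes one synthesized answer.
    combined = "\n\n".join(drafts.values())
    return await judge(f"Question: {question}\n\nDrafts:\n{combined}\n\n"
                       "Synthesize the final answer.")


def make_stub(name: str, log: List[str]) -> AsyncModel:
    # Stand-in model: records that it was called, waits briefly, replies.
    async def model(prompt: str) -> str:
        log.append(name)
        await asyncio.sleep(0.01)
        return f"{name}: reply to a {len(prompt)}-character prompt"
    return model


log: List[str] = []
panel = {name: make_stub(name, log)
         for name in ("model-a", "model-b", "model-c")}
final = asyncio.run(run_panel("Should we rewrite the service in Rust?", panel,
                              make_stub("judge", log), critique_rounds=2))
print(final)
```

Each critique round costs one extra call per panel member, so turning the rounds up trades latency and spend for answers that have already been exposed to the other models' objections.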

What you get that a local script can't easily give you:

Runs that follow you. Every run is saved to your account and synced across devices. Ask on your laptop, reopen the run on your phone.
Shareable runs. Each completed run gets a link you can send.
A public leaderboard. This is the part no local instance can replicate: every completed run feeds a Wilson-corrected leaderboard of which models the judges actually pick most, broken out by code, legal, creative writing, and more. The council doesn't just answer your question — it contributes one data point to a slowly-sharpening picture of which models are worth listening to, and for what. The methodology is documented and the data is yours to cite.
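As an illustration of what a Wilson correction does to a win-rate leaderboard, here is a minimal sketch. It assumes "Wilson-corrected" means ranking by the lower bound of the Wilson score interval on each model's pick rate, which is the common reading but is not confirmed by the documented methodology; `wilson_lower_bound`, `rank_models`, and the sample numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def wilson_lower_bound(wins: int, trials: int, z: float = 1.96) -> float:
    # Lower bound of the Wilson score interval for a win rate.
    # z = 1.96 corresponds to roughly 95% confidence.
    if trials == 0:
        return 0.0
    p = wins / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def rank_models(records: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    # records maps a model name to (times picked by the judge, runs it sat in).
    scored = [(name, wilson_lower_bound(wins, trials))
              for name, (wins, trials) in records.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up numbers: one model with a perfect record on 3 runs,
# one with an 80% record on 100 runs.
board = rank_models({"small-sample": (3, 3), "large-sample": (80, 100)})
for name, score in board:
    print(f"{name}: {score:.3f}")
```

The point of the correction is visible in the sample: a raw win rate would put the 3-for-3 model on top, while the lower bound ranks the model with 80 wins in 100 runs higher because there is far more evidence behind its number.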

When the repo is still the right call

To be fair about it: if your priority is privacy (keeping prompts on your own hardware), control (running local models, rewiring the prompts, owning every byte), or tinkering (the whole point is to read and modify the code), then Karpathy's repo — or one of the community forks that add Docker and local-model support — is the better tool, and you should use it. A hosted council is the wrong choice for someone whose actual goal is to hack on the council.

But if your goal is to get better answers to real questions without becoming the maintainer of your own AI infrastructure, a hosted council is the shorter path. That's the trade: the repo gives you control and asks for setup; the hosted version gives you setup-free convenience and a shared scoreboard, and asks you to trust a service with the keys.

Try it on a real question

The fastest way to feel the difference between one model and a council is to run a question you actually care about through both. Ask your usual chatbot, then run the same prompt through the Polymind home page and watch where the panel agrees, where it splits, and what the judge does with the disagreement. The free tier includes three queries a week on the lightweight models — enough to see the shape of it before you decide it's worth more.

Either way, Karpathy was right about the core thing: one model is a guess, and a council is a deliberation. The only real question is whether you want to run the room yourself or just walk into one.