Adversarial Research — Browser Guide

AI PARALLAX

Three models. Three different jobs. One honest answer.
No installs. No API keys. No terminal. Just tabs and a method that breaks the echo chamber.

Free to use · Browser only · No setup · Step by step
HOW TO RUN IT

The problem with asking three AI models the same question is that you don't get three opinions — you get one opinion in three costumes. Here's how to actually break that.

📅 Model versions — verified 7 April 2026
Free-tier models change frequently. The versions listed below were confirmed current as at 7 April 2026. If you're reading this later, check each platform's pricing page for the latest free-tier frontier model before starting your session.
PART 1 Open Three Tabs — Before You Do Anything Else
1. ChatGPT — Advocate role
Go to chat.openai.com — free account works. The free tier runs GPT-5.3 Instant (up to 10 messages per 5-hour window, then falls back to GPT-5.4 mini). Note: GPT-4o was retired on 3 April 2026. Hit New chat — start completely fresh, no prior conversation in this session.
2. Gemini — Critic role
Go to gemini.google.com — free account works. The free tier runs Gemini 2.5 Flash as the default; Gemini 3 Flash is rolling out as default in some regions — use whichever is available to you. Fresh conversation, no history.
3. Claude — Synthesiser role
Go to claude.ai — free account works. The free tier runs Claude Sonnet 4.5 (Opus is paid-only). New conversation. Claude adjudicates the other two outputs — it doesn't produce a third opinion of its own.
Write your question down first — before you paste anything
Open Notes, a doc, anything. Write your topic, why you're researching it, and what a good answer actually needs to do. This one step stops you sending vague prompts that all three models answer identically.
PART 2 Running the Session
A. Send the Advocate prompt to ChatGPT
Go to the Copy-Paste Prompts tab. Copy the ChatGPT prompt exactly, fill in your topic, send it. Wait for the full response. Copy the entire output into your notes — don't summarise it yourself.
B. Send the Critic prompt to Gemini
Copy the Gemini prompt, fill in the same topic, send it. The framing is deliberately opposite — it should feel like you're asking it to tear apart an idea. That friction is the whole point. Copy the full response to your notes.
C. Send the Synthesis prompt to Claude — with both outputs pasted in
Copy the Claude prompt. Where you see [PASTE CHATGPT OUTPUT HERE] and [PASTE GEMINI OUTPUT HERE] — paste the actual raw text from A and B. Don't trim it. Claude needs the full material to spot what each model dodged.
Read the synthesis — then check: did all three agree on something?
If Claude says all three models converged strongly on the same point — pause before you trust it. That's your monoculture warning. Ask Claude directly: "What would make this conclusion wrong?"
Optional: rotate roles next session
Next time you research the same topic: swap Gemini to Advocate, Claude to Critic, ChatGPT to Synthesiser. The outputs will differ — that variance is information, not noise.
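The rotation is easy to track if you script your sessions: it is a cyclic shift of the role assignment. A minimal TypeScript sketch (the helper is illustrative, not part of any AI Parallax package):

// Roles shift one position per session, so every model eventually plays
// every role. Session 0 matches the assignment used earlier on this page.
const MODELS = ["ChatGPT", "Gemini", "Claude"] as const;
const ROLES = ["Advocate", "Critic", "Synthesiser"] as const;

function roleAssignment(session: number): Record<string, string> {
  const out: Record<string, string> = {};
  ROLES.forEach((role, i) => {
    out[role] = MODELS[(i + session) % MODELS.length];
  });
  return out;
}

// roleAssignment(0) -> { Advocate: "ChatGPT", Critic: "Gemini", Synthesiser: "Claude" }
// roleAssignment(1) -> { Advocate: "Gemini", Critic: "Claude", Synthesiser: "ChatGPT" }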
💡 Beginner's Rule of Thumb
If your research took under 10 minutes, you didn't do adversarial research — you did fast-food research. The friction between Advocate and Critic outputs is where the value lives. Lean into the discomfort.
For developers — install as a skill or MCP
This page is the browser version for anyone. If you use Claude Code, OpenClaw, Cursor, or any MCP-compatible tool, you can install AI Parallax as a proper skill or MCP server and trigger it with a single command — no copy-pasting required. The SKILL.md works for both Claude and OpenClaw. The MCP server is published to npm. See the GitHub repo for full packaging instructions; a sketch of the shape follows below.
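For a rough sense of the MCP route, here is a minimal stdio server in TypeScript. It assumes the official @modelcontextprotocol/sdk and zod; the server name, tool name, and prompt text are placeholders rather than the published package, so check the repo for the real implementation:

// Minimal MCP server sketch, assuming an ESM entry point.
// Install (hypothetical layout): npm install @modelcontextprotocol/sdk zod
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "ai-parallax", version: "0.1.0" });

// One tool: hand back the three role prompts for a topic, so the client
// can route them to the right model without manual copy-pasting.
server.tool(
  "parallax_prompts",
  "Return the Advocate, Critic, and Synthesis prompts for a research topic",
  { topic: z.string().describe("The research question or claim to test") },
  async ({ topic }) => ({
    content: [
      {
        type: "text",
        text: [
          `ADVOCATE (ChatGPT): make the strongest possible case FOR: ${topic}`,
          `CRITIC (Gemini): assume this claim is flawed and find out why: ${topic}`,
          `SYNTHESIS (Claude): adjudicate both outputs on: ${topic}`,
        ].join("\n\n"),
      },
    ],
  })
);

await server.connect(new StdioServerTransport());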
Ready to use
COPY-PASTE PROMPTS

Four prompts. Copy each exactly, fill in your topic, paste into the right model. The structure forces real friction — don't simplify it.

ChatGPT · chat.openai.com · Role: Advocate / Steel-man
You are an Advocate. Your job is to make the strongest possible case FOR the following topic. Do not hedge. Find the best evidence, the most compelling arguments, and the most credible sources that support this claim.

TOPIC: [write your research question or claim here]

Respond with:

1. CORE ARGUMENT (2 sentences — what is the strongest version of this claim?)
2. TOP 3 SUPPORTING POINTS (each with a confidence level: HIGH / MEDIUM / LOW and why)
3. BEST EVIDENCE (what real-world data, studies, or examples back this up?)
4. WHAT WOULD FALSIFY THIS? (what evidence, if it existed, would destroy your argument?)

Do not soften your position. Commit to the strongest defensible version of this case.
Before you send: Replace the bracketed section with your actual topic. Keep everything else exactly as written — the structure is what forces a committed answer, not a hedged one.
Gemini · gemini.google.com · Role: Critic / Attack
You are a Critic. Your job is to assume the following claim is flawed, incomplete, or wrong — and find out why. You are not trying to be balanced. You are trying to break this argument.

TOPIC: [write the same research question or claim here]

Respond with:

1. CORE WEAKNESS (what is the most fundamental problem with this claim?)
2. TOP 3 ATTACK POINTS (where does the argument fall apart? Rate each: FATAL / SERIOUS / MINOR)
3. MISSING EVIDENCE (what would a strong version of this claim need to prove that it hasn't?)
4. ALTERNATIVE EXPLANATION (what else could explain the same evidence?)
5. WHAT WOULD CHANGE YOUR MIND? (what evidence would make you consider this claim valid?)

Be as specific as possible. Vague criticism is useless. Find the real cracks.
This prompt should feel aggressive. That's intentional. If Gemini gives you a balanced answer, follow up: "Stop being balanced. What is the single biggest problem with this claim?"
Claude · claude.ai · Role: Synthesiser / Adjudicator
You are a research adjudicator. Below are two analyses of the same topic — one from an Advocate, one from a Critic. They were produced independently. Your job is to map them honestly.

TOPIC: [write the same research question or claim here]

--- ADVOCATE OUTPUT (from ChatGPT) ---
[PASTE CHATGPT OUTPUT HERE]

--- CRITIC OUTPUT (from Gemini) ---
[PASTE GEMINI OUTPUT HERE]

Now produce:

1. AGREEMENTS — what do both analyses accept as true? (If they agree on everything, flag this as a potential monoculture warning)
2. CONTRADICTIONS — where do they directly conflict? List each conflict clearly.
3. SINGLE-SOURCE CLAIMS — what did only one analysis mention? (Treat these as unverified)
4. BLIND SPOTS — what did neither analysis address that matters?
5. CONCLUSION — based only on what survives both analyses, what can be asserted with confidence?
6. OVERALL CONFIDENCE: HIGH / MEDIUM / LOW — and your explicit reasoning for that rating.

Do not try to be diplomatic. A claim either survived the process or it didn't.
The two paste sections are critical. Don't summarise the other models' outputs — paste them in full. Claude needs the actual text to spot what each model avoided saying, not just what they said.
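If you script this step instead of hand-pasting, the same rule applies in code: interpolate the complete raw outputs, never summaries. A hypothetical TypeScript helper:

// Builds the synthesis prompt for Claude. Pass the complete, untrimmed
// Advocate and Critic outputs; trimming defeats the adjudication.
function buildSynthesisPrompt(topic: string, advocate: string, critic: string): string {
  return [
    "You are a research adjudicator. Below are two independent analyses of the same topic.",
    `TOPIC: ${topic}`,
    "--- ADVOCATE OUTPUT (from ChatGPT) ---",
    advocate, // full raw text, not a summary
    "--- CRITIC OUTPUT (from Gemini) ---",
    critic, // full raw text, not a summary
    "Map agreements, contradictions, single-source claims, and blind spots,",
    "then give a conclusion with an explicit HIGH / MEDIUM / LOW confidence rating.",
  ].join("\n\n");
}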
Any model · run after synthesis · Role: Red Flag Check
I've just run an adversarial research session using three AI models on this topic: [your topic]. The synthesis produced this conclusion: [paste Claude's conclusion here]

Now I want you to stress-test it.

1. STRONGEST OBJECTION — what is the best argument against this conclusion that none of the three models raised?
2. SHARED BLIND SPOTS — is there a way all three models might have been wrong in the same direction? (Same training data, same cultural bias, same framing?)
3. WHAT TYPE OF EXPERT WOULD DISAGREE — who, professionally, would push back and why?
4. DISCONFIRMING EVIDENCE — what real-world finding, if published tomorrow, would force this conclusion to be revised?
5. YOUR CONFIDENCE — on a scale of 0–10, how much should I trust this conclusion, and what would move that number up?
This is the final safety check. Run it on any model — or all three. If all three give the same disconfirming evidence, you've found your monoculture. If they give different answers, that's your research frontier.
What the labs say
LAB POSITIONS

Each lab has a different safety lens. The cards below show what each covers — and what none of them address.

OpenAI (ChatGPT) · External Threat Defence
Prompt injection is the core risk. Solution: AI-powered red teamers and rapid response loops. Acknowledged that prompt injection "may never be fully solved."
Coverage
✓ External attack hardening
✓ Prompt injection testing
✓ Agent-specific system cards
✗ Correlated failure detection
✗ Model diversity requirements
✗ LLM-as-judge independence
Threat framed as external. Monoculture collapse — a structural, internal property — is not on the radar.
Google (Gemini) · 🔬 Interoperability & Cross-Vendor
Agent2Agent protocol lets agents collaborate regardless of framework or vendor. Game Arena stress-tests via adversarial simulation. Most structurally diversity-aware of the three.
Coverage
✓ Cross-vendor comms (A2A)
✓ Adversarial agent simulation
✗ Reasoning lineage diversity
✗ Attribution in multi-agent chains
✗ Correlated failure detection
✗ Evaluation independence
A2A solves communication across vendors — not shared cognitive lineage. Agents can interoperate fluently and still form a monoculture.
Anthropic (Claude) · ⚖️ Human Oversight & Minimal Footprint
Agents should do less, check in more. Read-only permissions by default. Evaluates explicitly for sycophancy and deception. Most governance-oriented of the three labs.
Coverage
✓ Human-in-the-loop controls
✓ Sycophancy & deception testing
✓ Minimal footprint principle
✗ Multi-agent correlated failures
✗ Cross-model verification gaps
✗ System-level monoculture risk
Structural conservatism helps — but it doesn't address verification layer failure at the system level.
⚠️ The Shared Blind Spot
None of the three labs formally address correlated failure as a structural property of multi-agent systems. Safety is framed as an external attack problem, a communication problem, or a human oversight problem — but not as a verification layer failure. That gap is yours to manage.
Coverage matrix
THE SHARED GAP

A direct comparison of what each lab covers. The rows marked Gap are five capabilities no lab has built yet — and they're the ones that matter most.

Gap — covered by no lab:
✗ Correlated failure detection
✗ Model diversity requirements
✗ LLM-as-judge independence
✗ Reasoning lineage diversity
✗ Verification layer blind spots

Covered — and by whom:
✓ External attack hardening: OpenAI
✓ Adversarial stress testing: Google
✓ Human oversight controls: Anthropic
✓ Sycophancy / deception testing: Anthropic
✓ Cross-vendor agent comms: Google
✓ Agent-specific system cards: OpenAI
WHAT THE GAP MEANS IN PRACTICE
• You cannot rely on any lab's built-in governance to catch monoculture collapse
• Unanimous agent agreement has no official red-flag mechanism in any production framework
• LLM-as-judge pipelines carry unacknowledged correlation risk across all three vendors
• The verification layer is the blind spot — and it's yours to architect around
Six concrete actions
YOUR PLAYBOOK

Mapped to urgency. Each step expands into the What, Why, and How. Start with 01 — it costs nothing and applies right now.

THE SINGLE PRINCIPLE BEHIND ALL SIX STEPS
Corroboration is not evidence of correctness. It is evidence that multiple parties were asked. The quality of verification depends entirely on the independence of the verifiers — and independence cannot be assumed. It must be designed, tested, and documented. (A minimal sketch of the "tested" part follows the step list below.)
Design independence · Test agreement · Document provenance · Rotate roles · Add non-AI layers
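To make the "tested" part concrete: even a crude unanimity check helps. A minimal TypeScript sketch, assuming you have already pulled each model's key claims into plain string lists (the overlap heuristic is deliberately naive):

// Flags claims that every model asserts near-verbatim. Unanimity across
// models sharing training data is a warning to investigate, not proof.
function monocultureWarnings(claimsPerModel: string[][]): string[] {
  if (claimsPerModel.length === 0) return [];
  const normalise = (s: string) =>
    s.toLowerCase().replace(/[^a-z0-9 ]/g, "").replace(/\s+/g, " ").trim();
  const [first, ...rest] = claimsPerModel.map(
    (claims) => new Set(claims.map(normalise))
  );
  return [...first].filter((claim) => rest.every((set) => set.has(claim)));
}

// Any hits are your "pause before you trust it" list from Part 2.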