Pattern: Cross-Model Review

Category: Quality Assurance
Source: garrytan/gstack (/codex)
Status: Cataloged
Evaluation: RD-0013

When to Use

Use this pattern when a deliverable needs adversarial review and a single model's blind spots are a risk. Running a second AI model from a different provider as an independent reviewer catches issues that the primary model normalizes or overlooks. It is most valuable for high-stakes deliverables, security-sensitive code, and client-facing outputs.

How It Works

1. Primary review: The main agent (Claude) performs its standard review pass.
2. Independent review: A second model (e.g., OpenAI Codex/GPT) reviews the same artifact independently.
   - Uses a different system prompt optimized for adversarial challenge
   - Has no access to the primary review's findings
3. Comparative synthesis: Results from both reviews are merged.
   - Overlapping findings = high-confidence issues (both models agree)
   - Unique findings = potential blind spots worth investigating
   - Contradictions = areas needing human judgment
4. Adversarial modes: The second model can be prompted to specifically challenge assumptions, find edge cases, or stress-test the primary model's reasoning.
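
The comparative synthesis step can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by the source project: the Finding type, the synthesize function, and the idea of matching findings by an exact location key are all assumptions made for the example. Real findings from two models would likely need fuzzier matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one review finding (an assumption for this sketch)."""
    location: str  # part of the artifact the finding refers to
    verdict: str   # "issue" or "ok"
    note: str = ""


def synthesize(primary, secondary):
    """Bucket the findings of two independent reviews by location.

    - both reviews flag an issue        -> high_confidence
    - only one review mentions an issue -> blind_spots
    - the reviews disagree              -> needs_human
    """
    p = {f.location: f for f in primary}
    s = {f.location: f for f in secondary}
    buckets = {"high_confidence": [], "blind_spots": [], "needs_human": []}
    for loc in sorted(p.keys() | s.keys()):
        a, b = p.get(loc), s.get(loc)
        if a is not None and b is not None:
            if a.verdict == "issue" and b.verdict == "issue":
                buckets["high_confidence"].append(loc)
            elif a.verdict != b.verdict:
                buckets["needs_human"].append(loc)
            # both say "ok": nothing to report
        else:
            only = a if a is not None else b
            if only.verdict == "issue":
                buckets["blind_spots"].append(loc)
    return buckets


if __name__ == "__main__":
    primary = [Finding("scope", "issue"), Finding("naming", "ok")]
    secondary = [
        Finding("scope", "issue"),
        Finding("naming", "issue"),
        Finding("error-handling", "issue", "possible escalation loop"),
    ]
    print(synthesize(primary, secondary))
```

Keeping the synthesis as a pure function over two finished lists makes the independence requirement easy to enforce: neither review needs to see the other's output before this step runs.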

Example

A digital talent's agent.md is reviewed by Claude for completeness and correctness. In parallel, a GPT-4 review pass challenges the scope boundaries, looks for prompt injection vulnerabilities in the system prompt, and flags ambiguous instructions. The synthesis surfaces a potential escalation loop in the error handling that Claude missed and GPT-4 caught, and it shows that GPT-4's naming convention finding is a false positive. The real bug gets fixed; the false positive is dismissed.

Tradeoffs

Pro	Con
Catches blind spots specific to any single model	Roughly doubles API cost for each review
High-confidence when models agree on an issue	Contradictions require human triage
Adversarial framing finds issues polite review misses	Requires maintaining prompts for multiple model APIs
Builds confidence for high-stakes deliverables	Slower: adds a second API call, whether run sequentially or in parallel
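
The latency cost depends on how the two calls are scheduled. Because the reviews are independent, they can run concurrently, so the wall-clock time is close to that of the slower review and not the sum of both. The sketch below uses Python's standard concurrent.futures module; run_reviews and the two stub reviewers are hypothetical names standing in for real provider API calls, which are not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def run_reviews(artifact, reviewers):
    """Run every reviewer on the same artifact concurrently.

    reviewers maps a name to a callable that takes the artifact text and
    returns a list of findings. Each callable receives only the artifact,
    so no reviewer can see another reviewer's findings.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        futures = {name: pool.submit(fn, artifact) for name, fn in reviewers.items()}
        # result() blocks until that reviewer finishes and re-raises its errors
        return {name: future.result() for name, future in futures.items()}


# Stub reviewers (assumptions): a real version would call each provider's API.
def primary_review(artifact):
    return ["scope boundary unclear"] if "scope" in artifact else []


def adversarial_review(artifact):
    return ["possible escalation loop"] if "retry" in artifact else []


if __name__ == "__main__":
    text = "scope: support tickets; on error, retry and escalate"
    results = run_reviews(
        text, {"primary": primary_review, "adversarial": adversarial_review}
    )
    print(results)
```

Threads fit here because the work is dominated by waiting on network responses. Running in parallel reduces latency only; the cost of the second call is unchanged.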

Factory Usage

Quinn (QA Engineer): Use for final QA gate on client deliverables
Clara (CTO): Use for architectural review of production line designs
Future: Could become a standard gate in the quality pipeline for Tier 1 clients