Claude Fable 5 review: AI model crushes benchmarks but struggles with practical GTM tasks
The Gist
- Claude Fable 5 hits 80% on SWE-Bench Pro, outperforming GPT-4.5 and Gemini 3.1 Pro
- Costs $10/$50 per million input/output tokens and consumes tokens at twice the rate of other models
- Excels at vision tasks but produces unreadable specs and PRDs
Key Quotes
Fable 5 works like a 'seasoned engineer'—which is both its superpower and its Achilles’ heel.
The writing is nearly unreadable for specs and PRDs... It gets wrapped around the axle on details, creates big blocks of dense paragraphs with internal references, and makes it hard to see the forest for the trees.
Key Insights
- Claude Fable 5 excels in benchmarks (80% on SWE-Bench Pro) but struggles with practical tasks like design and writing readable specs.
- The model is expensive ($10/$50 per million input/output tokens) and consumes tokens at twice the rate of other models, so it needs to be deployed strategically.
- Fable 5 is exceptionally good at vision tasks (e.g., document formatting, PDF parsing) but produces poor design output for one-shot tasks.
- The model is overly conservative in execution because of built-in safety guardrails, often delivering MVPs that are minimal but not useful.
- Multi-agent orchestration is technically possible but unreliable, with frequent stalls and errors.
- Match model intelligence to task complexity: use Fable 5 for hard technical problems and vision tasks, but cheaper models for front-end work and design.
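The last insight amounts to a routing rule. Here is a minimal Python sketch of it, assuming hypothetical task categories and placeholder model identifiers (`fable-5` and `cheaper-model` are illustrative labels, not real API model names). The mapping only mirrors the review's recommendations.

```python
from enum import Enum


class TaskKind(Enum):
    # Hypothetical task categories, chosen to match the review's examples.
    HARD_TECHNICAL = "hard_technical"
    VISION = "vision"
    FRONT_END = "front_end"
    DESIGN = "design"
    SPEC_WRITING = "spec_writing"


# Placeholder identifiers (assumptions, not real model names).
PREMIUM_MODEL = "fable-5"
CHEAPER_MODEL = "cheaper-model"

# Task kinds the review says justify the expensive model.
PREMIUM_TASKS = frozenset({TaskKind.HARD_TECHNICAL, TaskKind.VISION})


def pick_model(task: TaskKind) -> str:
    """Return the premium model only for tasks where the review says it pays off."""
    return PREMIUM_MODEL if task in PREMIUM_TASKS else CHEAPER_MODEL


if __name__ == "__main__":
    for kind in TaskKind:
        print(f"{kind.value}: {pick_model(kind)}")
```

Keeping the premium set explicit makes the cost decision easy to audit: anything not listed defaults to the cheaper model.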
Actionable Takeaways
- Use Fable 5 strategically for high-complexity tasks (e.g., vision, technical problems) but opt for cheaper models for simpler tasks.
- Avoid relying on Fable 5 for design or spec-writing tasks due to its poor output quality in these areas.
- Monitor multi-agent workflows closely, as they are prone to stalls and errors.
- Leverage Fable 5’s vision capabilities for document parsing and formatting tasks.
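One way to monitor stall-prone multi-agent steps is a timeout-and-retry wrapper. The sketch below is an illustration under assumptions, not the review's method: `make_step` is a hypothetical zero-argument callable standing in for whatever launches one agent step, and the timeout and attempt counts are arbitrary defaults.

```python
import asyncio


async def run_with_stall_guard(make_step, timeout_s=30.0, max_attempts=3):
    """Run one agent step, treating a timeout as a stall and retrying.

    make_step: hypothetical zero-argument callable that returns a fresh
    coroutine for each attempt (a coroutine cannot be awaited twice).
    """
    last_error = None
    for _attempt in range(max_attempts):
        try:
            # wait_for cancels the step and raises a timeout error if it stalls.
            return await asyncio.wait_for(make_step(), timeout=timeout_s)
        except Exception as exc:  # covers both timeouts and ordinary step errors
            last_error = exc
    raise RuntimeError(
        f"agent step failed after {max_attempts} attempts"
    ) from last_error
```

A caller would wrap each agent step, for example `await run_with_stall_guard(lambda: some_agent_step(), timeout_s=60)`, where `some_agent_step` is again a placeholder, and log the final `RuntimeError` rather than letting the workflow hang.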
Data Points
- 80% (Fable 5's score on SWE-Bench Pro, outperforming Opus 4.8, GPT-4.5, and Gemini 3.1 Pro.)
- $10/$50 per million tokens (Cost of input/output tokens for Fable 5, making it more expensive than other models.)
- 95% (Share of sessions that do not trigger a safety-guardrail fallback to Opus 4.8.)
- 30-day retention policy (Anthropic's policy to catch misuse of Fable 5.)
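To make the pricing concrete, here is a rough cost estimator in Python. It uses only the quoted $10/$50 per million input/output token prices; the `token_multiplier` parameter is an assumption added to model the reported 2x token consumption, and the token counts in the example are made up for illustration.

```python
# Prices quoted in the review, in USD per million tokens.
FABLE5_INPUT_USD_PER_MTOK = 10.0
FABLE5_OUTPUT_USD_PER_MTOK = 50.0


def estimate_cost_usd(input_tokens, output_tokens, token_multiplier=1.0):
    """Estimate the cost of one task at the quoted Fable 5 prices.

    token_multiplier is a hypothetical knob: set it to 2.0 to scale token
    counts measured on another model by the reported 2x consumption.
    """
    billed_input = input_tokens * token_multiplier
    billed_output = output_tokens * token_multiplier
    total = (
        billed_input * FABLE5_INPUT_USD_PER_MTOK
        + billed_output * FABLE5_OUTPUT_USD_PER_MTOK
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative task: 200,000 input tokens and 20,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 20_000):.2f}")
```

For the illustrative task above, input costs $2.00 and output costs $1.00, so the total is $3.00. A task that uses 100,000 input and 10,000 output tokens on another model lands at the same figure once `token_multiplier=2.0` is applied.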
RevBots.ai View:
GTM teams should treat Fable 5 as a specialist tool for specific tasks rather than a general-purpose AI assistant.
Full Story:
Lenny's Newsletter