Claude Fable 5 review: AI model crushes benchmarks but struggles with practical GTM tasks

2d ago

Lenny's Newsletter · GTM strategy

The Gist

Claude Fable 5 hits 80% on SWE-Bench Pro, outperforming GPT-4.5 and Gemini 3.1 Pro
Costs $10/$50 per million input/output tokens and consumes tokens at twice the rate of other models
Excels at vision tasks but produces unreadable specs and PRDs

Key Quotes

Fable 5 works like a 'seasoned engineer'—which is both its superpower and its Achilles’ heel.

The writing is nearly unreadable for specs and PRDs... It gets wrapped around the axle on details, creates big blocks of dense paragraphs with internal references, and makes it hard to see the forest for the trees.

Key Insights

Claude Fable 5 excels on benchmarks (80% on SWE-Bench Pro) but struggles with practical tasks such as design and writing readable specs.
The model is expensive ($10/$50 per million input/output tokens) and consumes tokens at twice the rate of other models, so it needs to be deployed selectively.
Fable 5 is exceptionally good at vision tasks (e.g., document formatting, PDF parsing) but produces poor design output on one-shot tasks.
The model is overly conservative in execution: because of built-in safety guardrails, it often delivers MVPs that are minimal but not useful.
Multi-agent orchestration is technically possible but unreliable, with frequent stalls and errors.
Match model intelligence to task complexity: use Fable 5 for hard technical problems and vision tasks, but cheaper models for front-end work and design.
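The routing rule in the last insight can be sketched as a simple lookup. This is only an illustration: the task categories and the model labels ("fable-5", "cheaper-model") are hypothetical placeholders, not real API identifiers, and the split follows the summary above rather than any published guidance.

```python
# Hypothetical task categories, taken from the summary's recommendations.
PREMIUM_TASKS = {"hard_technical", "vision"}
CHEAPER_TASKS = {"front_end", "design", "spec_writing"}


def pick_model(task_type: str) -> str:
    """Return a placeholder model label for a task type.

    Unknown task types fall back to the cheaper tier, on the assumption
    that the expensive model should be opted into deliberately.
    """
    if task_type in PREMIUM_TASKS:
        return "fable-5"        # placeholder label, not a real model ID
    return "cheaper-model"      # placeholder label, not a real model ID


if __name__ == "__main__":
    for task in ("vision", "design", "hard_technical", "unknown"):
        print(task, "->", pick_model(task))
```

Defaulting to the cheaper tier is a design choice made for this sketch; a team could just as reasonably raise an error on unknown task types.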

Actionable Takeaways

Use Fable 5 strategically for high-complexity tasks (e.g., vision, technical problems) but opt for cheaper models for simpler tasks.
Avoid relying on Fable 5 for design or spec-writing tasks due to its poor output quality in these areas.
Monitor multi-agent workflows closely, as they are prone to stalls and errors.
Leverage Fable 5’s vision capabilities for document parsing and formatting tasks.
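One way to act on the monitoring takeaway is a basic stall check over agent heartbeats. The sketch below assumes each agent records a timestamp whenever it makes progress. The agent names, the 120-second timeout, and the `find_stalled` helper are all hypothetical and not part of any orchestration framework.

```python
def find_stalled(
    last_progress: dict[str, float],
    now: float,
    timeout_s: float = 120.0,
) -> list[str]:
    """Return the names of agents with no progress for more than timeout_s.

    last_progress maps an agent name to the time (in seconds) of its most
    recent progress event; now is the current time on the same clock.
    """
    return sorted(
        name for name, ts in last_progress.items() if now - ts > timeout_s
    )


if __name__ == "__main__":
    # Illustrative heartbeat timestamps, not real measurements.
    heartbeats = {"planner": 100.0, "coder": 290.0}
    print(find_stalled(heartbeats, now=300.0))
```

Passing `now` in explicitly keeps the check easy to test. In a live workflow it would typically come from `time.monotonic()`.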

Data Points

80% (Fable 5's score on SWE-Bench Pro, outperforming Opus 4.8, GPT-4.5, and Gemini 3.1 Pro.)
$10/$50 per million tokens (Cost of input/output tokens for Fable 5, making it more expensive than other models.)
95% (Share of sessions that do not trigger a safety-guardrail fallback to Opus 4.8.)
30-day retention policy (Anthropic's policy to catch misuse of Fable 5.)
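To make the pricing concrete, here is a rough cost estimate using the $10/$50 per million token rates quoted above. The token counts in the example are made up for illustration, and treating the "twice the rate" claim as a flat 2x multiplier on token usage is an assumption, not something the source specifies.

```python
INPUT_PRICE_PER_M = 10.0    # USD per million input tokens (from the summary)
OUTPUT_PRICE_PER_M = 50.0   # USD per million output tokens (from the summary)
TOKEN_RATE_MULTIPLIER = 2   # assumption: flat 2x token usage vs. other models


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one session at the quoted rates."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: what another model would do in 100k/20k tokens.
    baseline_in, baseline_out = 100_000, 20_000
    cost = session_cost(
        baseline_in * TOKEN_RATE_MULTIPLIER,
        baseline_out * TOKEN_RATE_MULTIPLIER,
    )
    print(f"Estimated session cost: ${cost:.2f}")
```

Under these assumptions, output tokens drive the bill: at a 5:1 price ratio, 40,000 output tokens cost as much as 200,000 input tokens.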

RevBots.ai View:

GTM teams should treat Fable 5 as a specialist tool for specific tasks rather than a general-purpose AI assistant.

Full Story: Lenny's Newsletter →
