Technology · Artificial Intelligence

Anthropic's Claude Opus 4 Sets New Benchmark on Reasoning Tasks

The latest model from the San Francisco lab posts state-of-the-art results on advanced reasoning evaluations, intensifying competitive pressure on OpenAI and Google.

Marcus Lindqvist · Senior Banking Correspondent, London

April 22, 2026 · 11 min read

Anthropic released Claude Opus 4 to enterprise customers this week, posting headline results on the GPQA Diamond and ARC-AGI benchmarks that exceed the previously published numbers from GPT-5 and Gemini Ultra 2.

Enterprise pricing has held firm at $15 per million input tokens and $75 per million output tokens, a notable refusal to join the price competition that has compressed margins across the inference market.

Customer concentration remains a vulnerability. Three customers, including Amazon Web Services and a major federal agency, account for more than 40 percent of Anthropic's annualised revenue.

The company's safety positioning continues to differentiate it commercially in regulated industries. Twelve of the top twenty global banks are now Claude customers, and uptake among federal contractors has accelerated since the model's most recent independent evaluation.

Global Business Journal
