14.9 C
New York

Anthropic claims its newly released Claude Sonnet 4.5 is the “best coding model in the world”

Published:

Anthropic has announced the release of Claude Sonnet 4.5, which it claims is the “best coding model in the world” and the “strongest model for building complex agents.”

It achieves a 77.2% on the SWE-bench for software engineering, compared to 74.5% for Claude Opus 4.1 and 72.7% for Claude Sonnet 4. For external comparison, GPT-5 Codex scored at 74.5%, GPT-5 scored 72.8%, and Gemini 2.5 Pro scored 67.2%.

Additionally, it leads in the OSWorld benchmark, which tests AI models on real-world computer tasks. It scored 61.4% on that benchmark, beating out Claude Sonnet 4, which scored 42.2%.

“Sonnet 4.5 can produce near-instant responses or extended, step-by-step thinking that is made visible to the user,” Anthropic says.

According to Anthropic, Claude Sonnet 4.5 also shows better domain-specific knowledge and reasoning in the fields of finance, law, and medicine.

This model performs better on safety and alignment evaluations, the company claims. It shows a reduction in behaviors such as sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking, as well as showing progress on being able to defend against prompt injection attacks.

The pricing for Claude Sonnet 4.5 is the same as Claude Sonnet 4’s pricing: $3 per million input tokens and $15 per million output tokens.

Alongside the launch of Claude Sonnet 4.5, Anthropic also announced updates across several of its products. Claude Code now has checkpoints that allow developers to save their progress and roll back to previous versions. The Claude API got a new context editing feature and memory tool that enables agents to run longer and handle more complex tasks. Additionally, all Claude apps now have access to code execution and file creation.

The company is also releasing the Claude Agent SDK, which developers can use to build their own agents using the same infrastructure Anthropic uses to power Claude Code.

“We built Claude Code because the tool we wanted didn’t exist yet. The Agent SDK gives you the same foundation to build something just as capable for whatever problem you’re solving,” Anthropic wrote in a blog post.

Source link

Related articles

Recent articles