GPT-4o and Gemini 1.5 Pro just got beat in the AI race

A screenshot of Claude 3.5 Sonnet, with an 8-bit crab
Anthropic

There’s a new leader, technically, in the race for AI assistant dominance, and it’s Anthropic’s new Claude 3.5 Sonnet. The newly released model outperforms both Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o across a range of benchmark tests, the company announced on Thursday.

This new iteration of Sonnet is the first in Anthropic’s upcoming line of 3.5 models. It significantly outperforms the larger Claude 3 Opus model, and does so at a fraction of that model’s energy cost. Compute efficiency is becoming an increasingly important aspect of AI system design, especially as the cost of powering and cooling AI data centers soars and the infrastructure pushes into the gigawatt range.

“Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus,” the Anthropic team wrote in a blog post. “This performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multistep workflows.”

The new model has reportedly set new highs on three standardized tests: graduate-level reasoning with GPQA, undergraduate-level knowledge with MMLU, and coding proficiency with HumanEval. It beat out Google’s Gemini 1.5 Pro, Meta’s Llama 3 400B, and OpenAI’s GPT-4o, though typically by only a couple of percentage points.

A table showing Claude 3.5 Sonnet's performance compared to other leading AI systems.
Anthropic

Claude 3.5 Sonnet is being billed as Anthropic’s “strongest vision model yet.” It can perform a number of vision-based tasks, such as interpreting charts and graphs or transcribing text from imperfect image sources like screenshots or scanned receipts, more accurately than Claude 3 Opus. In fact, Claude 3.5 Sonnet beat out Claude 3 Opus by anywhere from 6 to 17 points across industry-standard vision benchmarks. The new model is also reportedly much more competent at handling humor and can converse in a much more lifelike manner.

Claude 3.5 Sonnet will also be the first Anthropic model to offer the Artifacts feature to users. Rather than generating images or code snippets directly into the flow of the conversation, Artifacts will create that content in a dedicated space to the side of the chat. This allows users to create “a dynamic workspace where they can see, edit, and build upon Claude’s creations in real time, seamlessly integrating AI-generated content into their projects and workflows,” the Anthropic team claims. The company also announced that Claude will soon support team collaboration, wherein a company can store its data, documents and projects in a single, central silo, with Claude acting as an on-demand assistant.

You can try out Claude 3.5 Sonnet today for free on the Claude.ai website and the Claude iOS app (a Claude Pro or Team subscription will get you significantly higher rate limits). Third-party integration is also available through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Claude 3.5 Haiku and Claude 3.5 Opus are scheduled for release later in the year.

Andrew Tarantola
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…