Anthropic Rolls Out Claude 3.5 Sonnet Upgrade, New Claude 3.5 Haiku, and Computer Control Feature


Anthropic has just dropped some big updates, featuring the improved Claude 3.5 Sonnet and a brand-new model, Claude 3.5 Haiku. Plus, they’ve released a game-changing feature in public beta: computer use.

Key Announcements:

Claude 3.5 Sonnet Upgrade: Boosted performance, especially in coding, where it’s already a top contender.
Claude 3.5 Haiku: Matches Claude 3 Opus in many tests, with the same price and speed as the earlier Haiku model.
Computer Use Beta: Claude can now control a computer by moving the cursor, clicking, and typing. This beta feature is now available via API.

Claude's Computer Control:
With the computer use feature, Claude can handle tasks that need dozens or even hundreds of steps. Major players like Asana, Canva, Replit, and DoorDash are already testing it out. Replit is using this ability for key features in their Replit Agent product.

With the upgraded Claude 3.5 Sonnet, Anthropic is introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do, by looking at a screen, moving a cursor, clicking, and typing text.
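
To make the look, move, click, type cycle concrete, here is a minimal sketch of how an application might carry out such actions once the model requests them. It is not Anthropic's implementation and makes no API calls. The `FakeDesktop` class, the `run_steps` helper, and the scripted action list are all hypothetical, and the action names and fields (`screenshot`, `mouse_move`, `left_click`, `type`, `coordinate`, `text`) are assumptions modeled loosely on the beta's action vocabulary, so check the official documentation before relying on them.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real desktop: tracks cursor, clicks, and typed text."""
    cursor: tuple = (0, 0)
    clicks: list = field(default_factory=list)
    typed: list = field(default_factory=list)

    def apply(self, action: dict) -> str:
        """Carry out one requested action and return a text observation."""
        kind = action["action"]
        if kind == "screenshot":
            # A real integration would capture and return an image here.
            return f"screen: cursor at {self.cursor}, text so far {''.join(self.typed)!r}"
        if kind == "mouse_move":
            x, y = action["coordinate"]
            self.cursor = (x, y)
            return f"moved to {self.cursor}"
        if kind == "left_click":
            self.clicks.append(self.cursor)
            return f"clicked at {self.cursor}"
        if kind == "type":
            self.typed.append(action["text"])
            return f"typed {action['text']!r}"
        raise ValueError(f"unsupported action: {kind}")


def run_steps(desktop: FakeDesktop, actions: list) -> list:
    """Apply each action in order and collect the observations."""
    return [desktop.apply(a) for a in actions]


# In a real loop the model would pick each action after seeing the previous
# observation; here the actions are scripted (an assumption for illustration).
scripted = [
    {"action": "screenshot"},
    {"action": "mouse_move", "coordinate": [200, 150]},
    {"action": "left_click"},
    {"action": "type", "text": "hello"},
]

desktop = FakeDesktop()
for observation in run_steps(desktop, scripted):
    print(observation)
```

A task that needs dozens or hundreds of steps is the same loop repeated: each observation goes back to the model, which then decides the next action.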

Claude 3.5 Sonnet Benchmarks Breakdown

[Image: Claude benchmarks, as posted by Anthropic]

Coding: Claude 3.5 Sonnet scored 49% on software engineering tests (SWE-bench Verified), beating all other publicly available models, including reasoning models like OpenAI’s o1-preview and systems specialized for agentic coding. That is up from 33.4% for the previous Claude 3.5 Sonnet.
Math: The model nearly doubled its performance in high school math contests, showing better math reasoning.
Visual QA: Claude 3.5 Sonnet’s visual understanding improved by 2%, hitting top-tier levels. This is key for tasks where it analyzes screenshots.
Agentic Abilities: It scored 49% in agentic coding tests and 46% on the airline agentic tool use benchmark (TAU-bench), both well above the previous version’s scores.
Overall: Claude 3.5 Sonnet outperformed GPT-4o and Google’s Gemini 1.5 Pro in graduate-level reasoning and maintained the lead on MMLU Pro with a 3% increase.

Claude 3.5 Haiku: Fast and Affordable

Coding: Claude 3.5 Haiku does better than many top models, including the earlier Claude 3.5 Sonnet and GPT-4o, while also being the fastest and cheapest model in its class.
Agentic Tool Use: Claude 3.5 Haiku scored 51% on the retail benchmark and 22.8% on the airline benchmark.

According to Artificial Analysis:

Claude 3.5 Sonnet (October 2024) Comparison Overview

Quality: Claude 3.5 Sonnet (Oct) ranks higher than average, achieving a Quality Index of 80 across evaluations.
Price: It’s priced above average at $6.00 per 1M tokens (blended 3:1). Input tokens cost $3.00, while output tokens are priced at $15.00 per 1M.
Speed: Claude 3.5 Sonnet (Oct) is slower than average, generating 58.6 tokens per second.
Latency: It has a quicker response time than average, taking just 0.73 seconds to deliver the first token.
Context Window: The model supports a larger-than-average context window, holding up to 200k tokens.
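
The blended price above can be checked with a little arithmetic. The sketch below assumes "blended 3:1" means three parts input tokens to one part output tokens, and adds a rough response-time estimate that assumes output streams at a constant rate after the first token. Both function names are hypothetical, and the 500-token response length is only an example.

```python
def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for a given input:output mix."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


def estimated_response_seconds(output_tokens: int, tokens_per_second: float,
                               first_token_latency: float) -> float:
    """Rough wall-clock time: time to first token plus steady-state generation."""
    return first_token_latency + output_tokens / tokens_per_second


# Figures quoted above: $3.00 input, $15.00 output, 58.6 tokens/s, 0.73 s latency.
price = blended_price(3.00, 15.00)
print(f"Blended price: ${price:.2f} per 1M tokens")

seconds = estimated_response_seconds(500, 58.6, 0.73)
print(f"Estimated time for a 500-token reply: {seconds:.1f} s")
```

With the quoted prices, (3 × $3.00 + 1 × $15.00) / 4 comes to $6.00, which matches the listed blended figure.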

Availability

Claude 3.5 Sonnet: Available now.
Computer Use Beta: Ready for developers to test via the API, Amazon Bedrock, and Google Cloud’s Vertex AI.
Claude 3.5 Haiku: Coming later this month.

Published: Oct 23, 2024 at 12:00 PM