Boom! Claude 3.5 Sonnet Is Out!


Yesterday, Anthropic introduced its latest model, Claude 3.5 Sonnet.

In short, Claude 3.5 Sonnet appears to be smarter, faster, and more cost-effective than its predecessors and competitors. It is noticeably better at understanding complex ideas, solving coding problems, and working with visual information.

Benchmarks. Image courtesy of anthropic.com

How is the new model Claude 3.5 Sonnet better than GPT-4o and Claude 3 Opus?

Intelligence: Claude 3.5 Sonnet outperforms competitor models (including GPT-4o) and Claude 3 Opus on a wide range of evaluations, particularly in graduate-level reasoning, undergraduate-level knowledge, and coding proficiency.
Speed: It operates at twice the speed of Claude 3 Opus, making it much faster for complex tasks.
Cost-effectiveness: Claude 3.5 Sonnet offers this performance at Sonnet-tier pricing ($3 per million input tokens and $15 per million output tokens), well below the cost of top-tier models such as Claude 3 Opus.
Visual capabilities: It surpasses Claude 3 Opus on standard vision benchmarks, showing improved ability in visual reasoning tasks like interpreting charts and graphs.
Nuance and creativity: The model shows marked improvement in understanding nuance, humor, and complex instructions. It's also better at writing high-quality content with a natural, relatable tone.
Coding skills: In an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, compared with 38% for Claude 3 Opus.
Versatility: It excels in tasks like context-sensitive customer support, orchestrating multi-step workflows, and handling code translations.
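To put the cost-effectiveness point into numbers, here is a minimal Python sketch of a per-request cost estimate. It assumes Anthropic's launch-time list prices in USD per million tokens ($3 input / $15 output for Claude 3.5 Sonnet, $15 input / $75 output for Claude 3 Opus); the dictionary keys are just labels, not API model IDs, and the 2,000-token prompt with a 500-token reply is a hypothetical workload.

```python
# ASSUMPTION: launch-time list prices in USD per million tokens (input, output).
# Check Anthropic's current pricing page before relying on these numbers.
# The keys are human-readable labels, not official API model identifiers.
PRICES_PER_MTOK = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-opus": (15.00, 75.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request at the assumed prices."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: a 2,000-token prompt that gets a 500-token reply.
    for model in PRICES_PER_MTOK:
        print(f"{model}: ${estimate_cost(model, 2_000, 500):.4f}")
```

Under these assumed prices, the same request costs $0.0135 on Claude 3.5 Sonnet versus $0.0675 on Claude 3 Opus, a fivefold difference.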

Introducing Artifacts

What are Artifacts:
Artifacts are AI-generated content like code snippets, text documents, or website designs that appear in a dedicated window alongside the conversation with Claude.
How they work in practice:
- When a user asks Claude to create something, the result appears as an Artifact.
- Users can see, edit, and build upon these Artifacts in real-time.
- This creates a workspace where AI-generated content can be seamlessly integrated into projects and workflows.
Who can benefit from Artifacts:
- Individual users working on various projects
- Teams collaborating on shared tasks
- Eventually, entire organizations looking to centralize their knowledge and work
Practical uses:
- Developers can request, view, and modify code snippets directly in the workspace.
- Writers can generate and refine text documents with Claude's assistance.
- Designers can create and iterate on website designs collaboratively with the AI.

This new feature essentially turns Claude from a conversational AI into a more interactive and productive tool for content creation and project collaboration.

In the future, teams will be able to use this feature to securely centralize their knowledge, documents, and ongoing work in one shared space with Claude as an on-demand teammate in this collaborative environment.

Artifacts Demo

Anthropic showcases this in a video in which the new model generates SVG images. You can ask Claude to generate docs, code, Mermaid diagrams, vector graphics, or even simple games. Artifacts appear next to your chat, letting you see, iterate on, and build on your creations in real time.

Testing, testing, 1, 2, 3! Let's prompt the new model to draw us a cat...

Well, an artistic interpretation, I guess:

I suppose this is a shape one of my cats is trying to achieve: round.

Reprompted

Make it an 8-bit cat in color #5C23B3
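Claude's actual SVG for this prompt isn't reproduced here. As a rough illustration of what an "8-bit" SVG boils down to, here is a minimal Python sketch, assuming a hand-drawn 8x8 pixel grid: the `CAT_PIXELS` pattern and the `pixels_to_svg` helper are hypothetical and are not Claude's output. Each filled cell becomes one square `<rect>` filled with #5C23B3, and the markup is written to a plain text file with an .svg extension.

```python
from pathlib import Path

# Hypothetical 8x8 pixel-art cat: "#" is a filled pixel, "." is left transparent.
CAT_PIXELS = [
    "#......#",  # ear tips
    "##....##",  # ears
    "########",  # top of head
    "##.##.##",  # eyes
    "########",
    "###..###",  # mouth
    ".######.",  # chin
    "..####..",
]


def pixels_to_svg(rows: list[str], color: str = "#5C23B3", scale: int = 10) -> str:
    """Render a grid of '#'/'.' characters as an SVG built from square <rect> pixels."""
    height = len(rows)
    width = max(len(row) for row in rows)
    rects = [
        f'<rect x="{x * scale}" y="{y * scale}" width="{scale}" height="{scale}" fill="{color}"/>'
        for y, row in enumerate(rows)
        for x, cell in enumerate(row)
        if cell == "#"
    ]
    # crispEdges keeps the pixel boundaries sharp, which suits the 8-bit look.
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width * scale}" '
        f'height="{height * scale}" viewBox="0 0 {width * scale} {height * scale}" '
        'shape-rendering="crispEdges">\n  '
        + "\n  ".join(rects)
        + "\n</svg>\n"
    )


if __name__ == "__main__":
    # Save the markup as a .svg file that any browser can open.
    Path("cat.svg").write_text(pixels_to_svg(CAT_PIXELS), encoding="utf-8")
```

Representing the image as a grid of text characters keeps the "drawing" easy to edit by hand; each change to the grid maps directly to one added or removed square in the SVG.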

The best result so far was a wave SVG:

BTW... regarding SVGs and visual capabilities in general...

Anthropic's Earlier Models

Claude 3 Sonnet, the previous free model, could also generate SVG images, but only as code; they weren't rendered in the chat. Similarly, GPT-4 doesn't display SVGs but can write their code, which you can then paste into a text file and save with an .svg extension. Here's one example where I gave it a photo of a hand-drawn shape and asked it to recreate something similar.

It did a pretty good job, IMO:

I used Claude Opus about a month ago, and when I tried it for reading text from screenshots, there were some hiccups. Here's an example:

Claude Opus transcribing text from an image

There's actually another misidentified character there; can you spot it?

I mean, a couple of mix-ups between similar letters and digits is no big deal in general. But when the string is a key where every character, including its case, matters, even one typo makes the whole exercise useless: pinpointing exactly where the error happened takes about as long as retyping the key yourself.

I'm hoping this has been improved in the new models, so I'll be sure to resubscribe to Pro and test it, especially since Claude 3.5 Haiku and Claude 3.5 Opus are promised for release later this year.

Published: Jun 22, 2024 at 5:52 PM