AGEofLLMs.com

OpenAI Drops o3

OpenAI has announced its newest AI model, o3, sparking buzz in the AI community. Some observers see the release as a major step toward Artificial General Intelligence (AGI), pointing to its record scores on coding, math, and reasoning benchmarks.

OpenAI releases their latest and smartest o3 model; digital illustration

What Makes o3 Special?

o3 builds on the success of the earlier o-series models and focuses on solving problems step by step. It uses reinforcement learning to generate candidate solutions, then reviews and improves them with a verifier system. By shifting from predicting one word at a time to producing sequences that hold together logically, o3 changes how AI approaches problems.
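
The generate-then-verify idea can be illustrated with a toy best-of-n loop. This is a minimal sketch, not OpenAI's disclosed method: `propose`, `verify`, and `solve` are hypothetical names, random guessing stands in for the model, and a simple squaring check stands in for the verifier.

```python
import random


def propose(target, n, rng):
    """Stand-in for a model sampling n candidate answers (here: random integers)."""
    return [rng.randint(0, 20) for _ in range(n)]


def verify(target, candidate):
    """Stand-in for a verifier: score a candidate square root of `target`.

    Checking is cheap (square and compare). A score of 0 means an exact
    answer; more negative means further off.
    """
    return -abs(candidate * candidate - target)


def solve(target, n=32, seed=0):
    """Sample n candidates and keep the one the verifier scores highest."""
    rng = random.Random(seed)
    candidates = propose(target, n, rng)
    best = max(candidates, key=lambda c: verify(target, c))
    return best, candidates


best, candidates = solve(49)
print(best, verify(49, best))
```

Calling `solve(49)` returns the sampled candidate whose square lands closest to 49. The point of the sketch is that checking an answer is often cheaper than producing it, so sampling more candidates buys accuracy at the price of more compute.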

Notably, OpenAI skipped the name o2 to avoid a clash with the telecom company O2, and announced o3 alongside o3 Mini, a more affordable version with solid performance.

How Does It Perform?

o3 benchmarks; arcprize.org website screenshot

o3 sets new benchmarks in several fields:

  • Coding: It scored 71.7% on SWE-bench Verified (real-world coding tasks), outperforming both earlier OpenAI models and the competition.
  • Math: With 96.7% accuracy on the AIME 2024 competition exam, it’s close to perfect at solving complex problems.
  • Science: Achieving 87.7% on GPQA Diamond, a set of graduate-level science questions, o3 shows it can handle advanced challenges.
  • Novel Problem-Solving: On the ARC-AGI benchmark run by ARC Prize, it scored 87.5%, above the human-level baseline.

Is This AGI?

OpenAI defines AGI as AI that’s better than humans at most useful tasks. o3’s ability to tackle math, coding, and novel problem-solving hints that it might qualify in these areas. Still, it’s not perfect: it struggles with subjective tasks and spatial reasoning.

High Costs and How OpenAI Is Tackling Them

Hitting benchmarks like FrontierMath didn’t come cheap. OpenAI spent about $350,000 on "thinking time" (the compute the model uses while reasoning) for these efforts. That’s a lot, but it’s not expected to stay that way. With better GPUs and faster progress in tech, these costs should drop in the near future.

Sam Altman on X: "on many coding tasks, o3-mini will outperform o1 at a massive cost reduction! i expect this trend to continue, but also that the ability to get marginally more performance for exponentially more money will be really strange."

To make things more affordable, OpenAI launched o3 Mini. It offers similar performance but costs way less to run. Plus, it has a cool feature: adjustable thinking time. This means you can tweak how much effort the model puts into a task, so you save on resources for simpler problems. It’s a practical way to make advanced AI tools work for more people and businesses.

Buzz Around OpenAI's o3 Model

OpenAI’s o3 announcement has sparked excitement and skepticism.

  • Performance vs. Costs: o3 achieved 87.5% on ARC-AGI but required $350,000 in compute, raising concerns about practicality. Costs are estimated at $20-$3,500 per task depending on compute settings.
  • AGI Debate: While o3 excels at tough problems, critics argue it’s not AGI since it struggles with simple tasks. OpenAI says ARC-AGI isn’t an AGI test but shows progress.
  • Transparency and Concerns: Questions about overfitting and discrepancies in data presentations have raised skepticism.
  • Making It Affordable: o3 Mini offers similar performance with adjustable compute at lower costs, though pricing remains a concern.
  • Future Outlook: High costs may drop as technology advances, making such models more practical and competitive.

The buzz reflects both awe at the innovation and questions about its real-world use. [ Reddit ]
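
To see how per-task pricing adds up, here is a back-of-the-envelope calculation using the $20 and $3,500 per-task figures quoted above. The task count of 100 is an assumption chosen for illustration, and `total_cost` is a hypothetical helper.

```python
def total_cost(cost_per_task: float, num_tasks: int) -> float:
    """Total spend for running every task once at a fixed per-task cost."""
    return cost_per_task * num_tasks


NUM_TASKS = 100      # assumption for illustration, not a figure from the article
LOW_COST = 20        # dollars per task, low-compute setting (from the text)
HIGH_COST = 3_500    # dollars per task, high-compute setting (from the text)

low_total = total_cost(LOW_COST, NUM_TASKS)
high_total = total_cost(HIGH_COST, NUM_TASKS)
ratio = high_total / low_total

print(f"low: ${low_total:,.0f}  high: ${high_total:,.0f}  ratio: {ratio:.0f}x")
```

Under that assumption the high-compute run comes to $350,000, in line with the headline figure, and costs 175 times as much as the low-compute run.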

OpenAI is being cautious, ensuring o3 is thoroughly tested before wider use. Ethical questions, oversight, and responsible deployment are top priorities as AI grows smarter.

The company is already planning o4 and o5, with releases expected next year, though safety tests may delay them. Meanwhile, o3 Mini aims to bring high-level AI skills to more users at a lower cost.
