OpenAI drops GPT-4o's New and Wild Image Generator


GPT-4o integrates new image creation right into its model, making AI-generated visuals more accurate and realistic.

It nails text rendering in images, something AI struggled with before.
It's better at understanding detailed prompts and creating images that match what you ask for.
Users can upload images as inspiration or get completely original visuals from text descriptions.
It’s slowly rolling out to all ChatGPT users, but Sora is your best bet if you’re not seeing it yet.

OpenAI's new image generation model handling text, from OpenAI's X (Twitter)

OpenAI’s new GPT-4o model is breaking new ground by making AI image creation part of its standard toolkit. Instead of just handling text or images separately, it now does both in one go, producing better, more realistic visuals that actually match what you’re describing. It also gets better at understanding your prompts. You give it details, and it actually listens. Plus, if you upload a photo for inspiration, it can tweak or recreate it based on your description.

An image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.

The most mindblowing part is how it handles text in images. It actually writes stuff that’s readable and makes sense, which was a huge problem for AI before. And it’s not just about slapping text onto an image; it’s about making it fit the scene perfectly. Previously, I think the leaders in readable text in images were Ideogram and Recraft. Just yesterday, a new model, Reve, dropped, boasting great capabilities for longer readable text. But this latest release by OpenAI is mindblowing.

What is even more impressive is that it is going to have editing capabilities right in the chat, very similar to what Google has done with Gemini. I am excited for this, but also a bit sad that these groundbreaking shifts are once again being led by the largest players. It is always nice to see more competition and small businesses, and I hope they will not end up getting crushed by what the leaders of the game can offer.

The model can make all sorts of things, including step-by-step guides, memes and infographics:

Infographics example, sourced from top images on Sora

Prompt was just:

visualize an infographic explaining Newton's prism experiment in great detail, dark blue background

How Can You Try It?

GPT-4o’s image generation is slowly being rolled out to all ChatGPT users, even on the free tier, but not everyone has it yet. Turns out asking ChatGPT which image model it’s using might help: many accounts are still running DALL-E.