Incredible: ChatGPT's New 4o vs Gemini 2.0 Flash Experimental—Who Draws Best?
Yes, this post is all images! And yes, 4o created that cover image when I asked it how it felt about having the ability to create images now. I dive into the tech and do a side-by-side with Gemini!
I started out skeptical but I am amazed.
ChatGPT 4o came out with an image update, and it’s very, very good. I’ll get into the juicy details with lots of visuals down below, but let me say up front that the new technique it uses to generate these images (called autoregressive image generation) is really incredible. Text is crisp, and it follows your entire prompt. It’s remarkable.
But of course we’re not satisfied with vague platitudes here. I want specific prompts, across a range of specified tasks, and I want to make sure I compare 4o to Gemini, which wowed all of us only a couple of weeks ago. How do these two models stack up? Read on for a really detailed breakdown of how 4o works, plus an overall assessment of the model across a whole range of visual challenges, including object manipulation, text in images, and fine detail work.
Things you can try with 4o that you couldn’t try before: draw a picture of my pet in an anime style (you’ll see mine down below), write “this text” on a shirt and “this text” on a sign in a complicated scene, take a product and move it around, generate a photorealistic portrait of me. The sky is the limit!