Google Gemini’s Image Generation Has Just Been Supercharged


Google Gemini started as Google Bard, and back in those days, I saw it as a shameless attempt to replicate what ChatGPT was doing. I mean, it wasn’t the only shameless replica that popped up, so I don’t quite blame the tech giant for that. However, it surprises me that whenever I need an AI chatbot of any kind, I now default to Gemini.

The AI platform has improved shockingly fast over time, and now, one of its major updates is rolling out to users in the Gemini app. The platform is getting a bunch of updates to its image generation functionality, and if it weren’t so amazing how capable it has become, it would be scary. Let’s talk about it.

Google Gemini Improves on Several Facets of the Image Generation Experience

Hand holding Oppo Find X8 phone with Gemini activated on screen
Image: Oppo

It’s hard to say that any AI chatbot or platform is aiming for a sizable market share if it has yet to acquire image generation magic at this point in the AI journey. For that reason, we shouldn’t be surprised that Gemini has made its image-conjuring skills a big part of its focus. The best part is that all of this comes with the regular Gemini 2.0 Flash model. Let’s dive into some of its new tricks.

Generating Images Alongside Text in a Single Response

The way most AI models are built requires you to construct separate prompts for their text response and their image response. While that’s not the worst thing in the world, it means that responses aren’t as comprehensive as you might want.

Gemini now has the power to generate a response that includes both text and images, which is perfect for generating stories. What makes this superior to using multiple prompts to make something similar is that the story elements will remain a lot more consistent.

Image Editing Through Natural Language Dialogue

You know when you ask AI to generate an image for you, but it isn’t quite what you’re looking for? Maybe you want the fantasy monster to be hairier than the one it generated. And then after that, you’d like it to have longer horns. And perhaps after that, you want to fix its skin color. Gemini can handle that kind of natural dialogue and sequential editing without issues.

Gemini’s Image Generation Uses Advanced Reasoning

Google says Gemini now sets itself apart from most of the competition thanks to better “world knowledge” and “enhanced reasoning”.

Google’s demonstration of this might be the most impressive thing of the set to me. It shows someone asking Gemini to produce a recipe, which it does. But what makes it special is that with each step, it generates an appropriate image showing you what that step should look like.

Text Rendering Capabilities Level Up

Fingers were one place where we could see AI-generated images slip up, but those have pretty much been “fixed”. Text in an image used to look like garbled nonsense, but now that’s been “fixed” too. Google says that Gemini 2.0 Flash’s text rendering is a lot better than other options.




