Key Takeaways
- ChatGPT’s Advanced Voice Mode leverages the inference power of GPT-4o to hold natural conversations and in-depth discussions with users.
- Gemini Live by Google offers free access in 40-plus languages, and is available through the Gemini mobile apps.
- Microsoft’s Copilot Voice is also free but is currently available only in Australia, Canada, New Zealand, the UK, and the US.
Who needs text-based prompts when you can simply talk to your favorite AI? Voice interaction is the hot new feature that developers are scrambling to add to their models, with ChatGPT’s Advanced Voice Mode, Copilot’s Natural Voice Interaction, and Gemini Live leading the way.
Chatbots Are Growing Fast
It’s been less than two years since the debut of ChatGPT, and we’re already witnessing AI chatbots undergo a fundamental change in the way they communicate with humans. As these models have rapidly evolved and gained multimodal capabilities, they are no longer bound strictly to text-based prompts and replies. Today, they can converse with you as you would with another person and, in Gemini Live’s case, do so in more than 40 languages. Obviously, traditional written prompts still have their place (I mean, nobody’s sitting down and dictating thousands of lines of Python code to a chatbot), but voice interactions and conversational AIs are poised to further revolutionize how we interact with the modern world.
OpenAI was the first to bring the technology to market with Advanced Voice Mode, but it was quickly followed by Google’s Gemini Live and, more recently, Microsoft’s Copilot Voice. Each system offers its own unique set of capabilities and constraints. This guide will give you the information and insight you need to choose the best one for your specific needs.
ChatGPT Advanced Voice Mode
ChatGPT’s Advanced Voice Mode (AVM) leverages OpenAI’s latest large language model, GPT-4o, to facilitate more natural, back-and-forth conversations with you, the user. This makes it ideal for tasks that require real-time interaction, such as brainstorming or discussing complex topics. And, since it has GPT-4o under the hood, AVM is capable of competently discussing a wide range of topics, from biochemistry to 14th-century Japanese philosophy. What’s more, it can provide in-depth responses on those topics where other AIs will provide only brief summaries. Personally, I find that it offers a strong combination of natural language understanding, adaptability, and personalization, alongside a broad knowledge base.
AVM was the first conversational AI feature to reach the market. It debuted in May at OpenAI’s Spring Update event before being released as a beta to select ChatGPT Plus subscribers in July for testing and feedback. It eventually rolled out in late September to Plus and Team subscribers. It’s accessible through the ChatGPT mobile apps as well as the desktop portal, but unfortunately, it is not yet available if you use ChatGPT’s free tier. Nor is it yet available in the EU, the UK, Switzerland, Iceland, Norway, or Liechtenstein. If you live in one of those regions, you’ll have to keep typing.
Gemini Live
Gemini Live is Google’s answer to Advanced Voice Mode. It is built atop the Gemini 1.5 Pro model, which is Google’s most advanced to date. The company unveiled Live in May at I/O 2024 and initially trialed it with Gemini Advanced subscribers in August before releasing it to all users, free of charge, in late September. That alone gives Gemini Live a leg up over AVM in my opinion, because I don’t have to shell out $20 a month to try it.
While Gemini 1.5 Pro can’t post the same benchmarks as GPT-4o, it does offer a host of capabilities that AVM does not. I cannot overstate this: it’s free to use through either the Google app or the dedicated Gemini iOS and Android apps. There are no region restrictions for it as there are for AVM. The only place you can’t get Gemini Live is on the desktop, though Google is reportedly working on adding that capability in the future. Gemini Live is currently available in five languages beyond English: French, German, Portuguese, Hindi, and Spanish, and will expand to nearly four dozen languages in the coming weeks.
Copilot Voice
Copilot Voice is one of a host of new features that recently debuted alongside the revamped Copilot personal interface, which runs on a custom instance of GPT-4. Like AVM and Live, it enables you to converse naturally with the AI instead of typing out your queries. Like the others, Voice is primarily designed to answer general questions and act as a digital assistant, though because it does operate atop GPT-4, it has access to that model’s expansive training corpus. And unlike Live, Voice is available through the Copilot desktop portal.
Microsoft bills it as “the most intuitive and natural way to brainstorm on the go, ask a quick question or even just vent at the end of a tough day.” Because who needs real friends when you can just yell at your pocket computer on the subway ride home?
It is free to use, unlike AVM, though it is currently limited to conversations in English, and only if you live in Australia, Canada, New Zealand, the United Kingdom, or the United States. Microsoft is working to expand both the feature’s language capabilities and geographic availability in the coming weeks.
Which Voice AI Is Right for You?
That’s a question that depends on a number of variables, such as how much you’re willing to pay, what you intend to do with the AI, and what brand ecosystem you subscribe to. For me, I prefer Gemini Live. Not just because it’s free, but because I am already deeply integrated into the Google ecosystem. I mean, I use Gemini on an Android phone, and I’m writing this post on an Acer Chromebook.
If I were a Windows guy, I’d be more likely to use Voice, if only to minimize potential friction points with the rest of the apps I already use. If I ran iOS, well, I’d be patiently waiting for Apple Intelligence to arrive with its AI-enhanced and supremely upgraded Siri. If you, on the other hand, actually need the lake-boiling inference capabilities and performance that ChatGPT provides, and have $20 burning a hole in your pocket, Advanced Voice Mode is probably the way to go.