Android’s Live Captions feature is a boon for some content, but since the captions are generated on the fly, the feature only goes as far as transcribing speech as-is. Now, Expressive Captions are here to give them a little more pop.
Google has introduced a new feature called Expressive Captions for Android. This new feature, powered by AI, acts as an extension to Live Captions and goes beyond traditional captions by conveying not just the words spoken but also the tone, volume, and even ambient sounds. The AI models developed by Google and DeepMind analyze audio in real time, translating it into stylized captions that reflect the speaker’s emotions and the surrounding environment.
Imagine seeing captions that not only show someone shouting “HAPPY BIRTHDAY!” in all caps but also indicate laughter, applause, or even a sigh. This added layer of information can be crucial for understanding the nuances of live and social content. These kinds of expressions are commonly seen in manually transcribed captions, but having them in AI-generated captions will come in very handy, since you can pick up social cues that would otherwise only come across by listening to the audio. Being an extension of Live Captions, Expressive Captions are available at the system level for whatever you watch or listen to on your phone, whether that’s a live social media stream or video messages sent through IM services.
Since it’s AI, the captions inevitably won’t be 100% perfect. They might capture a cue that’s not there, or fail to capture others that actually are, and the feature will likely need fine-tuning once it’s available for everyone. Still, if you’d like to give it a shot, you can try it out now. Google says the feature is rolling out to any smartphone running Android 14 or above that has Live Captions, but it might take a few weeks to land for everyone.
Source: Google