OpenAI’s delay of ChatGPT’s impressive Voice Mode upset many fans of the AI chatbot, and the company may now have been scooped. French artificial intelligence developer Kyutai has introduced a real-time voice AI assistant named Moshi.
Moshi is designed to hold lifelike voice conversations with users, much like Alexa or Google Assistant, but it is powered by the kind of large language model that underlies ChatGPT and its rivals, in this case the Helium 7B model. According to Kyutai, Moshi can speak in various accents and has 70 different emotional and speaking styles. The AI can even handle two audio streams at once, allowing it to listen and talk at the same time.
Kyutai’s development of Moshi involved fine-tuning on more than 100,000 synthetic dialogues generated with text-to-speech (TTS) technology. The aim was to teach Moshi the nuances and tones of human communication. The company even collaborated with a professional voice artist to enhance Moshi’s voice quality.
The assistant is trained on both text and audio and is optimized for multiple backends, meaning it can run on devices like laptops without needing to interact with the cloud. The company pitches this as a way to maintain privacy and security by preventing the transmission of sensitive data over the internet.
Open Talk
Kyutai said that Moshi will be an open-source project, including the model’s code and framework, providing a foundation for further innovation. The open-source approach may also help Kyutai avoid the safety and ethics complaints that bigger AI companies face over their closed models. Kyutai’s backers, including French billionaire Xavier Niel, are championing the open-source approach.
Kyutai is also working on AI audio identification, watermarking, and signature tracking systems to be incorporated into Moshi. These features will help identify AI-generated audio, promoting accountability and traceability while ensuring that AI-generated content can be monitored and verified.
Moshi is still in development, but the voice mode shown in the presentation is impressive. Should Moshi catch on, its voice approach may act as a catalyst for voice-enabled versions of other ChatGPT rivals, or speed up the addition of LLMs to Alexa and other voice assistants.
If you want to try Moshi, a demo is available online, and you can sign up for early access to the complete chatbot there as well.