Nvidia has been instrumental in the current AI boom, but primarily as the manufacturer of the GPUs that power next-gen AI workloads. The company isn’t content with just selling shovels to the diggers, though. It has joined the fray with an AI model of its own that does something genuinely novel.
As reported by Ars Technica, Nvidia’s new AI model is called Fugatto, and it combines new training methods and technologies to transform music, voices, and other sounds in ways that haven’t been done before, producing entirely new soundscapes.
Fugatto is based on an advanced AI architecture with 2.5 billion parameters, trained on over 50,000 hours of annotated audio data. The model uses a technique called Composable ART (Audio Representation Transformation), which can combine and control different sound properties based on text or audio prompts. The result is completely new sound combinations that weren’t present in the training material.
For example, Fugatto can generate audio of a violin that sounds like a laughing child, or a factory machine that screams in metallic pain. The model also allows fine-tuning of specific characteristics, such as amplifying or reducing French accents or adjusting the degree of sadness in a voice recording.
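Nvidia hasn’t published a public API for Fugatto, but the core idea behind this kind of attribute control, blending several text-prompt conditions with adjustable weights so one property (say, “laughing child”) can be amplified or dialed back relative to another (“violin”), can be sketched in broad strokes. Everything below, including the stand-in text encoder, the function names, and the vector dimensions, is a hypothetical illustration and not Nvidia’s actual interface:

```python
import hashlib
import numpy as np

def embed(prompt: str, dim: int = 8) -> np.ndarray:
    """Stand-in for a real text encoder: maps a prompt to a
    deterministic pseudo-random conditioning vector."""
    seed = int(hashlib.sha256(prompt.encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def compose(prompts_with_weights: dict[str, float]) -> np.ndarray:
    """Blend several prompt embeddings with per-attribute weights.
    Raising a weight emphasizes that attribute in the combined
    conditioning signal; lowering it attenuates the attribute."""
    total = sum(prompts_with_weights.values())
    return sum(w * embed(p) for p, w in prompts_with_weights.items()) / total

# A violin timbre with a weaker "laughing child" attribute mixed in.
cond = compose({"violin": 1.0, "laughing child": 0.6})
print(cond.shape)  # (8,)
```

In a real system, a conditioning vector like `cond` would steer an audio-generation model during sampling; here it merely shows how weighted prompt composition lets a single attribute be turned up or down independently of the others.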
In addition to combining and transforming sounds, Fugatto can perform classic AI audio tasks, such as changing the emotion of a voice, isolating vocals in music, or adapting musical instruments to new sound sources.
For all the nitty-gritty details, you can read about Fugatto in Nvidia’s official white paper (PDF). Otherwise, check out the Fugatto page with examples of emergent sounds and emergent tasks.
This article originally appeared on our sister publication M3 and was translated and localized from Swedish.