Google Created An AI That Can Generate Music From Text Descriptions, But Won't Release It

Google's impressive new AI system can create music in any genre from a text description. But the company, fearing the risks, has no plans to release it in the near future.

The Google system, known as MusicLM, isn't the first AI-powered music generator. There have been other efforts, including Riffusion, an AI that composes music by visualizing it as spectrograms, as well as Dance Diffusion, Google's own AudioLM, and OpenAI's Jukebox. But owing to technical limitations and limited training data, none has been able to produce songs that are particularly complex or high-fidelity.

Perhaps MusicLM will be the first to do so.

As detailed in an academic paper, MusicLM was trained on a dataset of 280,000 hours of music to learn to generate coherent songs from descriptions of, as its creators put it, "incredible complexity" (for example, "a fascinating jazz song with an impressive saxophone solo and a solo singer" or "90s Berlin techno with a low bass and a powerful kick"). The resulting songs sound remarkably like something a human artist might compose, though not necessarily as inventive or musically cohesive.

It's hard to overstate how good the samples sound, given that there are no musicians or instrumentalists in the loop. Even when fed descriptions that are long and somewhat meandering, MusicLM manages to capture nuances such as instrumental riffs, melodies and moods.

The caption for one of the samples below, for example, includes the phrase "inducing the feeling of being lost in space," and the track definitely conveys that sense (at least to my ears):

Here's another example, generated from a description that starts with the phrase "the main soundtrack of an arcade game." Sounds about right, doesn't it?

MusicLM's capabilities extend beyond generating short clips of songs. Google's researchers show that the system can build on an existing melody, whether it's hummed, sung, whistled or played on an instrument. MusicLM can also take several descriptions written in sequence (e.g. "time to meditate," "time to wake up," "time to run," "time to give 100%") and create a sort of melodic "story" or narrative up to several minutes long, well suited to a movie soundtrack.

See below, "Electronic song playing video game", "Meditation song by the river", "Fire", "Fireworks".

And that's not all. MusicLM can also be prompted with a combination of a picture and a caption, or generate audio "played" by a specific type of instrument in a particular genre. Even the skill level of the AI "musician" can be set, and the system can create music inspired by a place, an era or a requirement (such as motivational music for workouts).

To be clear, MusicLM isn't flawless; far from it, honestly. Some samples have a distorted quality, an unavoidable side effect of the training process. And while MusicLM can technically generate vocals, including choral harmonies, they leave a lot to be desired. Most of the "lyrics" range from barely English to near gibberish, sung by synthesized voices that sound like a blend of several performers.

But the Google researchers note the many ethical problems posed by a system like MusicLM, including a tendency to incorporate copyrighted material from its training data into the songs it generates. In one experiment, they found that about 1 percent of the music the system produced was directly replicated from the songs it was trained on, a threshold apparently high enough to discourage them from releasing MusicLM in its current state.
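For a sense of what that 1 percent figure means in practice, here is a toy sketch of how one might estimate such a memorization rate. This is not the methodology from the MusicLM paper; the fingerprint-and-compare approach below is a simplified stand-in used only to illustrate the measurement.

```python
"""Toy estimate of a memorization rate: the fraction of generated clips that
exactly replicate a training clip. This is a simplified illustration, not the
matching procedure actually used in the MusicLM paper."""

import hashlib


def fingerprint(clip: bytes) -> str:
    """Stand-in for a robust audio fingerprint of a clip."""
    return hashlib.sha256(clip).hexdigest()


def memorization_rate(generated: list[bytes], training: list[bytes]) -> float:
    """Fraction of generated clips whose fingerprint appears in the training set."""
    train_prints = {fingerprint(c) for c in training}
    copied = sum(fingerprint(c) in train_prints for c in generated)
    return copied / len(generated) if generated else 0.0


# Toy data: 1 of 100 "generated" clips is an exact copy of a training clip,
# giving a rate of 0.01, i.e. roughly the 1 percent reported in the article.
training_clips = [f"train-{i}".encode() for i in range(1000)]
generated_clips = [f"gen-{i}".encode() for i in range(99)] + [b"train-42"]
print(memorization_rate(generated_clips, training_clips))  # 0.01
```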

"We acknowledge the risk of potential misappropriation of creative content associated with the use case," the paper's authors wrote. "We strongly emphasize the need for more future work in tackling these risks associated with music generation."

Assuming MusicLM or a similar system is one day made available, major legal issues seem inevitable, even if such systems are positioned as tools to assist artists rather than replace them. They've already arisen, albeit around simpler AI systems. In 2020, Jay-Z's record label filed copyright strikes against the YouTube channel Vocal Synthesis for using AI to create Jay-Z covers of songs such as Billy Joel's "We Didn't Start the Fire." After initially removing the videos, YouTube reinstated them, finding the takedown requests were "incomplete." But deepfaked music still rests on murky legal ground.

A white paper by Eric Sunray, now a legal adviser at the Music Publishers Association, argues that AI music generators like MusicLM violate music copyright by creating coherent audio out of the works they ingest during training, thereby infringing the reproduction right under U.S. copyright law. Following the release of Jukebox, critics also questioned whether training AI models on copyrighted musical material constitutes fair use. Similar concerns have been raised about the training data used in image-, code- and text-generating AI systems, which is often scraped from the web without its creators' knowledge.

From a user's perspective, Waxy's Andy Baio suggests that AI-generated music would be considered a derivative work, in which case only its original elements would be protected by copyright. Of course, it's unclear what counts as "original" in such music, and using it commercially means entering uncharted waters. Matters are simpler if the generated music is used for purposes protected under fair use, such as parody and commentary, but Baio expects courts will have to decide on a case-by-case basis.

Clarity may come soon. Several lawsuits making their way through the courts are likely to have a bearing on music-generating AI, including one concerning the rights of artists whose work has been used to train AI systems without their knowledge or consent. But only time will tell.