# Google’s New AI: DALL-E 2, But For Music!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=EggmA0g71xA
- **Date:** 20.03.2023
- **Duration:** 8:47
- **Views:** 124,600

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.com/papers 

📝 The paper "#MusicLM Generating Music From Text" is available here:
https://google-research.github.io/seanet/musiclm/examples/

My latest paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or this is the orig. Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Edward Unthank, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=EggmA0g71xA) Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today I am going to show you a paper and I am almost out of words. It is so good. You know what, let’s jump right in. This new AI is very much like DALL-E 2, which is a text-to-image AI. Our text goes in, an image comes out. And this new paper is not text to image, but text to music. Yes, a piece of text goes in, and some hopefully amazing music comes out.

Now, let’s see what it can do. Let’s ask for an epic soundtrack with orchestral instruments. And, do we let the AI get away with such a simple prompt? Of course not! The piece should build tension, and create a sense of urgency, and you know what? Add an a cappella chorus on top of it. There is no way that an AI could generate something like that, right? Well, hold on to your papers, Fellow Scholars, and listen. That was incredible. We will see in a moment how incredible exactly.

Now let’s give this AI an even harder time. Oh yes, we wish for a bass-and-drums-led reggae song, with electric guitar. Very difficult. But it gets even worse. Vocals. Expressive vocals with a laid-back style. There is absolutely no way that this is happening. But just in case, let’s listen. I can’t believe it. Electric guitars, checkmark. I loved the intonation and the personality, and the vocals were also something else. Wow.

Now, let’s ask for something completely different. We are real scholars, so we are going to ask for real, 8-bit arcade music. Holy mother of papers, now that’s a song that has momentum, intensity, a great guitar riff. I am truly stunned. The prompt also asked for unexpected sounds. I have to admit I am not sure about that part, but the rest was incredible. But not as incredible as these four other things that it can also do.

One, we already know that we can ask for some piano playing, but it gets better. We can ask it to be a beginner piano player, or a real pro. Let’s see. Now that’s super cool, loving it.

Two, these were only 30-second sequences. Most songs are significantly longer than that, and there is a problem with that. What is the problem? The problem is that these AI-based techniques have trouble with long-term coherence. That means that a couple of minutes into the song, they don’t quite remember what they played. And that is indeed a problem, because writing music is about building up a melody, and we have to remember the melody and embellish it over time. So, what can this one offer in this area? What’s this? You are kidding, right? A 5-minute-long song? Well, let’s give it a shot, we will listen to small snippets of this jazz song from different parts.

### [5:00](https://www.youtube.com/watch?v=EggmA0g71xA&t=300s) Segment 2 (05:00 - 08:00)

Now this was truly something. For about 3 minutes, it felt like one coherent song, and after that, it did not morph into another one, but I feel that it lost some coherence. However, it was not too far. Let’s say that this was 1.5 songs. That is an insane result because many previous techniques could barely stay coherent for even a few seconds. A few minutes is science fiction region. And all this, just one more paper down the line. My goodness.

Three, it can even imagine what paintings would sound like. I have to admit that I did not find these samples too convincing, however, the one for Starry Night was pretty good. This is a tranquil, soothing song, with a hint of mysteriousness. It’s not perfect, but I like it a great deal. And I think the capability is there for all of these paintings, it is truly up to us humans writing a good prompt to make them come alive.

But wait, four, what happens if we write a prompt and we don’t quite like the results? Well, it can generate several variants. In fact, as many as you wish.

So, I promised that I would tell you how good exactly it is. So how do we know that? Well, through a user study. Get some humans in the room, show them some real music, some AI music from this technique and from previous works, and see which one they like best. And the results are truly something else. MusicCaps is real music, so that should win most of the time. That is not surprising. However, what is really surprising is that it does not win all the time, and it gets even better. Look at that! The new technique, MusicLM, is rapidly closing in.

And remember, DALL-E 1 was an okay text-to-image AI, and just one more paper down the line, this is what DALL-E 2 was capable of. And with all this, I am convinced that we are at a DALL-E 2 moment for music generation. It only gets better from here on out. And just imagine what we will be able to do two more papers down the line. What a time to be alive! So, what do you think?
What prompts would you like to try in the future? And what paintings would you like to come to life? Let me know in the comments below! Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/13244*