This Shouldn’t Be Possible… Open Source AI Music (SUNO LEVEL)
23:02


MattVidPro · 22.01.2026 · 16,072 views · 724 likes


Video description
In this video, I explore an incredible open-source AI music generation model called HeartMula, which can generate high-quality, multimodal music entirely locally on a consumer-grade computer. I share my excitement about generating a three-minute song on my computer using this model and discuss the hardware specifications required to run it efficiently. I also compare HeartMula's performance to other top models like Suno v5, highlighting its strengths and unique features. Additionally, I provide a step-by-step tutorial on how to install and run HeartMula locally with the help of Google's Antigravity AI software, making it accessible even for those without advanced technical skills. Plus, I showcase different generated music samples with and without lyrics to demonstrate the model's capabilities. Don't forget to drop a comment below with your project ideas and share your creations on my Discord server!

Huge thanks to Minimax for sponsoring today's video! Check out M2.1: https://platform.minimax.io/subscribe/coding-plan?code=72cLC7uGeb&source=link

▼ Link(s) From Today's Video:
HeartMula: https://heartmula.net/
Project Page: https://heartmula.github.io/
HeartMula Github: https://github.com/HeartMuLa/heartlib
Antigravity: https://antigravity.google/

► MattVidPro Discord: https://discord.gg/mattvidpro
► Follow Me on Twitter: https://twitter.com/MattVidPro
► Buy me a Coffee! https://buymeacoffee.com/mattvidpro

▼ Extra Links of Interest:
General AI Playlist: https://www.youtube.com/playlist?list=PLrfI66qWYbW3acrBQ4qltDBsjxaoGSl3I
AI I use to edit videos: https://www.descript.com/?lmref=nA4fDg
Instagram: instagram.com/mattvidpro
Tiktok: tiktok.com/@mattvidpro
Gaming & Extras Channel: https://www.youtube.com/@MattVidProGaming

Let's work together!
- For brand & sponsorship inquiries: https://tally.so/r/3xdz4E
- For all other business inquiries: mattvidpro@smoothmedia.co

Thanks for watching Matt Video Productions! I make all sorts of videos here on YouTube! Technology, Tutorials, and Reviews! Enjoy your stay here, and subscribe! All suggestions, thoughts, and comments are greatly appreciated… because I actually read them.

00:00 Mind-Blowing AI Music Generation
02:27 Introducing HeartMula
03:38 Sponsor Message: Minimax
05:19 Comparing AI Music Models
10:33 Setting Up HeartMula
15:45 Generating AI Music with HeartMula
21:23 Final Thoughts and Future Prospects
22:27 Conclusion and Community Engagement

Table of contents (8 segments)

Mind-Blowing AI Music Generation

Okay, this is blowing my mind right now. I just generated a three-minute song with AI, entirely locally on my computer. — The sun seeps in across the floor. I hear the traffic outside the door. The coffee pot begins to hiss. It is another morning just like this. The world keeps spinning round and round. Feet are planted on the ground. I find my rhythm in [singing] the sound. Every day [singing] the light returns. Every day the fire burns. We keep on walking down the street, moving to the same steady beat. It is the ordinary magic that we need. — Tell me that's not incredible. We've seen music generation with AI before that is better quality than that, but I've never been able to fire it up on my computer and generate it for free, locally, even offline. My mind is so blown right now. And some of you might spot the RTX 5090 up here. You don't need a 5090 to run this. We'll talk about VRAM specifications, but a lot of consumer-grade GPUs support this. I'm also going to talk about some other open-source projects today, but I wanted to open up with this. And yes, I am also going to be showing you guys how to install and run this locally for yourselves today. This is blowing my mind on multiple levels, too, because not only are we generating music of this caliber locally, but to actually make this setup work, to run it locally, I've been interacting with Google's Antigravity AI software. And I'm going to recommend that you guys do exactly the same, because it really is that easy and almost even better than doing everything manually yourself. So

Introducing HeartMula

this is HeartMula. It's fully open-source, LLM-based song generation with multimodal inputs. Apparently there's even capability for section-specific style prompts. The benchmarks even claim that it tops Suno v5 and Udio 1.5 on lyric clarity. While there aren't too many audio music generation AI models out there, comparing the ones that are, we can see HeartMula definitely scores towards the top. In almost every aspect, it is pretty much right up there with Suno v5, which would be my go-to. And this is open source, weights and all. It's pretty amazing. It's basically an open-source Suno v5, really, or a Suno v4 or v3.5. All these models are a little bit different, but you heard the intro. It sounds good. Yes, HeartMula's got a full product page and a full paper that breaks things down. Here you can see the HeartMula backbone, text tokenizer, audio encoder, HeartCodec tokenizer, and through the local decoder, you end up with generated music. A very simplified explanation. Before we dive into the

Sponsor Message: Minimax

rest of today's video, I've got a quick word from today's sponsor. If you like the idea of a Claude-level LLM that builds the whole app in one shot, but want something open source that you can actually deploy locally and tinker with, MiniMax just dropped M2.1. With the M2.1 update, the whole pitch is better performance on real-world coding tasks. It's one thing to ship something that technically works. It's another when it actually builds something that looks good and that you can use no problem right off the bat. They specifically call out stronger design comprehension and UI interaction quality for web and mobile. They also optimized this model to be more concise and efficient, delivering the most necessary information to you upfront with less of what people call token burn. This is exactly the type of model you want if you're running agentic workflows or iterating over time on a codebase. This is not your novel writer, but your quick and efficient logic coder. M2.1 is positioned to play nice with all of the scaffolding that agents are accustomed to: context rules, slash commands, MCP connectors. It's open source, meaning you can use it as a base to build all kinds of different custom workflows. And when I say open source, I mean weights and all. You can grab them on Hugging Face, or you can use the API, or if you want something that behaves like ChatGPT out of the box, check out their MiniMax Agent. As always, everything is linked down in the description below. Drop a comment down below with your various project ideas using this LLM. Huge thanks to MiniMax for sponsoring today's video. Now, back to your regularly scheduled content.

Comparing AI Music Models

Welcome back, guys. This product page is loaded with comparisons. Let's try R&B: keyboard, drum machine, guitar. Here's a poor-performing music gen model: — On the podium with lights in my eyes. Finally broke the ceiling and touched the skies. They said I wouldn't make it, but here I stand. — You can hear what it's going for, but a lot of it just feels missing. Now, compare that to Suno v4.5, a closed-source, top-performing model: — Standing on the podium, lights in my eyes. Finally broke the ceiling and touched the skies. You said [singing] I wouldn't make it, but here I stand. The whole world is waiting in the palm of my [singing] hand. Sweat on the concrete and blood on the floor. Knocking down my knuckles on the door. [singing] — Very — now compare it to HeartMula: — On the podium with lights in my eyes. Finally broke the ceiling and touched the skies. They said I wouldn't make it, but here I stand. The whole world is waiting in the palm of my hand. Sweat on the concrete and blood on the floor. Knocking till my knuckles bled on the door. Every time I fell down, I got up twice. Paid the heavy toll. Yeah, I paid the price. Now the doubt is fading out. Hear the crowd begin to shout. This is what it's all about. I'm a winner, baby. Look at me now. Wearing the gold [singing] and taking the bow. Nothing can stop this runaway train. Dancing through the fire and the [singing] rain. — There's definitely a focus on lyrics from this model, I think, over the instrumentation. Suno v4.5 or Suno v5, these top-of-the-line closed models, seem to be very much all-encompassing. They give off big-model vibes, at least for music generators. HeartMula, on the other hand, we know is a smaller model, 3 billion params, and it runs locally. I think there is going to be a quality difference here compared to Udio or Suno. But being open source, open weight, it's kind of like comparing apples and oranges in a way.
We're going to be able to fine-tune HeartMula, hopefully, and do way more with it. Let's try Mureka 7.6. This is for electronic, synthesizer, self-discovery. — I mean, that's a very good model as well. — But the neon [singing] lights are flashing in my eyes. Cutting through the static and the old disguise. — Now HeartMula. — I was playing just a steady signal with a hollow... — Man, that's good though. — Cutting through the static and the old disguise. Turn the voltage [singing and music] up. Watch the meter redline. I am stepping out of the rigid design. The bass is kicking in, shaking off the rust. Turning all the shadows into diamond dust. [singing] I'm finding my frequency loud and clear. Nothing left to hide, nothing left to fear. — Man, it is right up there with the closed-source models. Honestly, if you want to hear even more examples, it's all linked down below. This model is also multilingual, though. As you can see, it's proficient in Chinese as well as Japanese, Korean, and Spanish. It's probably still not going to be as diverse as a Suno or a Udio for those multilingual frameworks, but I can see that there is a direction here pointed towards that, which is good, because if there is an upgrade of HeartMula, a HeartMula 2, we would hopefully expect it to support even more languages. Anyways, let's go to GitHub,

Setting Up HeartMula

check out that code. You can see the license is Apache 2.0. Doesn't get any better than that. There are still a few to-dos that will enhance the project, like releasing scripts for inference acceleration and streaming inference, useful for potentially building websites or competitors based off a model like this. This really could be the Stable Diffusion of AI music. They're also going to add support for reference audio conditioning, fine-grained controllable music generation, and hot-song generation, whatever that means. They've also got a larger 7B variant. It's clearly going to require more VRAM and resources to run, but it will be much closer to competing with a full Suno v5. Regardless, awesome stuff. And like I said, we've already got weights. We've already got code. You can run it locally today. Here is their full tutorial for getting it up and running. But I'm not going to recommend that you sit through and read the readme here for how to install this. If you want to get this up as quickly as possible, click on the green Code button on GitHub, then move down and download the ZIP. We'll literally download a ZIP file that contains this repository. Next up, I recommend you download Google Antigravity. It's not just going to be useful for setting up this project; it can handle pretty much anything from GitHub. This is AI integrated with your computer in a fundamental and core way. It is a game-changer, and it is totally free to try, but you do need to sign in with your Google account. Regardless, go ahead and download it. And before we move further, I should mention the requirements for your system at home. Not just any computer, of course, can run a local AI music generation model like this, but you'd be surprised how many traditional gaming PCs or workstations can. First of all, you're going to want an Nvidia GPU, relying on that CUDA framework.
But yeah, this is going to be like any modern Nvidia GPU, and you're going to need a minimum amount of VRAM, at least 10 to 12 gigabytes. You may even need to use something called lazy loading to help keep it stable. The recommended amount of VRAM is 16+ GB, a mid- to high-end card, I know, but it should be very smooth and stable at that level. If you have less than 10 GB, or have, like, an AMD GPU, you can run this on CPU, but it's probably going to take forever. If you're on Windows and you want to figure out how much VRAM you have, it's very easy: hold the Windows key, press X, click on Task Manager, then Performance on the left-hand side, down to GPU, and you will be able to see exactly how much VRAM your GPU has. Regardless, once you open up Antigravity, it's probably not going to look like this. But we want to direct Antigravity to the folder that we're putting HeartMula in, the ZIP we downloaded earlier. What I recommend you do is make a new folder anywhere you want on your computer. Label it "repos". Put all the GitHub repos you download locally in there. Regardless, wherever you decide this folder is going to live, you can place your heartlib-main.zip into it. And really, that's all you need to do. Next, in Antigravity, go to File, Open Folder, select repos, and then click Select Folder. Or if you made a specific HeartMula folder, select that. In Antigravity, in the chat interface on the right-hand side, once the AI is directed at the HeartMula ZIP, just say, "Hey, I need help setting up this repo for local testing on my computer." That simple. Just explain it: "I need help setting up this repo. Thanks." It's going to go to town. I recommend leaving it on fast mode with Gemini 3 Flash. You don't need anything more potent than that. If you're wondering what the AI is going to do, well, it's going to take a look at that readme inside of the HeartMula folder.
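The VRAM guidance above (16 GB+ comfortable, 10 to 12 GB minimum with lazy loading, CPU fallback below that) reduces to a small decision rule. This helper is my own illustrative sketch of that rule, not part of heartlib:

```python
def pick_run_mode(vram_gb: float, has_cuda: bool = True) -> str:
    """Map the video's VRAM guidance to a run mode (illustrative only)."""
    if not has_cuda or vram_gb < 10:
        return "cpu"        # works (e.g. AMD GPU or low VRAM), but very slow
    if vram_gb < 16:
        return "gpu-lazy"   # 10-12 GB: enable lazy loading to stay stable
    return "gpu-full"       # 16 GB+: smooth and stable

print(pick_run_mode(32.0))  # e.g. an RTX 5090
```

On Windows, the Task Manager route described above shows dedicated VRAM; on any platform with Nvidia drivers, `nvidia-smi` in a terminal reports the same number.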
The GitHub readme contains all of the information you need, albeit very technical and system-specific information, to get stuff like this running. But every system can be a little bit different. Things need to be installed and set up, and it can take a long time. Since I had the RTX 5090, a newer Nvidia GPU, top-of-the-line stuff, we ended up needing to install a custom version of PyTorch, the nightly variant. Antigravity, in my scenario, even applied a small fix to the code to use soundfile for saving the audio, because the experimental torchaudio build was having some trouble with the MP3 encoding on Windows. This is really incredible stuff. This kind of cutting-edge AI technology has totally loosened up what is possible for the average person on a PC. Antigravity can use my computer like a college-graduate expert. It can take that downloaded GitHub ZIP and get it working for us, no problem. You don't have to be a tech coding genius, and you don't have to pay for a closed-source AI music generation model. In fact, I'm willing to bet we can even prompt for AI music right inside of Antigravity. Of course, though, a big part of running the model locally is the ability to not have to rely on APIs like Antigravity.
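The fix Antigravity applied swapped torchaudio's MP3 writer for the soundfile package (`sf.write(path, data, samplerate)` is soundfile's one-liner). The same idea, writing generated samples straight to disk, can be sketched dependency-free with Python's stdlib `wave` module; this is my own illustration, not the actual patch:

```python
import math
import struct
import wave

def save_wav(path: str, samples: list[float], sample_rate: int = 44100) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV (stdlib only)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 16-bit samples
        wf.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", max(-32768, min(32767, int(s * 32767))))
            for s in samples
        )
        wf.writeframes(frames)

# A one-second 440 Hz test tone stands in for the model's output here.
tone = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
save_wav("tone.wav", tone)
```

WAV sidesteps the MP3-encoder dependency entirely, which is the same trade the soundfile fix made: a plain, widely supported container instead of a codec that needs extra libraries on Windows.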

Generating AI Music with HeartMula

Let's generate a few more songs. So, in the heartlib folder, inside of the assets folder, you'll see lyrics.txt and tags.txt. Lyrics.txt is going to be your lyrics for the model. You can see it takes lyrics in exactly like Suno AI, or really any AI music generator for that matter. I wonder what happens if you just leave it blank. Maybe we'll get an instrumental. Tags.txt is for what your song's actually going to sound like. We've got "piano" and "happy" in here. These look comma-separated, CSV style. Ooh: piano, happy, slow, romantic. Sure, why not? I'm going to try to generate without any lyrics and see what the model does right out of the gate. As you can see, the model is loading up into memory. It is not trying to be sparing whatsoever. We're using about 21 gigs. Let's see how fast we can generate a 3-to-4-minute song. Oh wow. Okay, that was a very quick generation. Look at that. You can even see Antigravity working with us, explaining everything as we go. So darn cool. Let's listen to the output. All right, here is the output. No lyrics. — The sun seeps in across the floor. — Okay, it definitely kept lyrics, obviously. — What happened? I thought I wrote... Oh, you know why? It's because this isn't even saved. How silly of me. Run it again. Oh crap. It looks like it unloaded all the weights from the model. It takes way longer to load the model into the GPU than it does to actually generate the song. While that generates, I'm going to dig up some old lyrics. This is a rap alternative-rock song about counting sand. That's going to be amazing. All right, the model is loaded. As you can see, it is generating the song right above me. It does it in like 20 seconds. That's how fast we get a song on the RTX 5090. It's insane. To load up the 3B model? Maybe about three and a half minutes. Oh no. See, it just automatically unloads all the checkpoints. Very aggravating. All right, it's reloaded the model for me. I'm going to go into lyrics and paste our new lyrics in really quick.
We're going to save it. I'm going to grab some tags also for the style here, and we'll save the tags as well. All right. Well, while our third generation loads up and goes through, let's listen to output number three. This time, it really should have zero lyrics. — [singing] — It's mixing instruments and voice. — How strange. — You can tell it's meant to be prompted. It's like this relaxing digital robot singing, almost. This is what you would expect to get out of an AI model if you were in 2018. — Weirdly catchy. — Oh, there it goes. Generating the actual music. This is the part. Oh my gosh. There it is. It's actually insane how fast it comes through. That is crazy. Oh, here it is. — One, two, three, four. Somebody stop me before I lose count. I hit the beach with a notebook. Stopwatch rage. Sunburnt scalp locked in a sand-sized cage. Every single grain got a name. I'm insane. Scribbling numbers till they bleed through the page. Shovel in my skull, digging pits in my brain. Microscopic math. Yo, [singing] it's friction. It's pain. Seagull laughing. I'm attacking every fraction. My sanity collapsing, but I'm stacking satisfaction. — Grain by grain, vein by vein. Sweat dripping, ticking. I'm addicted to the pain. Million, billion, zillion numbers slip through my hand. I'm the lunatic statistician trying to measure the sand. Sun rises, eyelids like rusted blinds. But I won't quit till I conquer time. Every speck's a threat. [singing] I dissect a set like a deck of regrets. Can't collect what's time like a cosmic troll. Wave wipes out, but I won't let [singing] go. I restart, recharge, keep spinning these bars like Eminem.
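The lyrics.txt and tags.txt edits above are just plain-text file writes, so they're easy to script. The filenames and the comma-separated tag style follow what the video shows in the assets folder; the exact paths in your checkout may differ, so treat this as a sketch:

```python
from pathlib import Path

# Assumed layout from the video: an assets folder inside the unzipped repo.
# Adjust the path to wherever your checkout actually lives.
assets = Path("heartlib-main") / "assets"
assets.mkdir(parents=True, exist_ok=True)

# Comma-separated style tags, matching the "piano, happy" example.
tags = ["piano", "happy", "slow", "romantic"]
(assets / "tags.txt").write_text(", ".join(tags), encoding="utf-8")

# Lyrics go in as plain text; leave this file empty to try an instrumental.
lyrics = "One, two, three, four\nSomebody stop me before I lose count\n"
(assets / "lyrics.txt").write_text(lyrics, encoding="utf-8")

print((assets / "tags.txt").read_text(encoding="utf-8"))
```

Scripting this also sidesteps the "this isn't even saved" mistake from the video: the files are written and flushed before you kick off a generation.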

Final Thoughts and Future Prospects

— This just makes me even more excited for that 7B model. We're probably going to get some distilled versions of it that are going to run on, you know, consumer-grade hardware like this and sound even better. Man, this is so cool. I don't think you can beat the original Suno version. — One, two, three, four. Somebody stop me before I lose count. — I hit the beach with a notebook. Stopwatch rage... a sand-sized cage. Every single grain got a name, I'm insane. Scribbling numbers till they bleed through the page. Shovel in my skull, digging pits in my brain. Microscopic math, yo, it's friction, it's pain. Seagull laughing. I'm attacking every fraction of my sanity collapsing, but I'm stacking satisfaction. Count it up. Grain by grain. Pain by pain. Sweat dripping, ticking. I'm addicted to the pain. Million, billion, zillion numbers slip through my head. I'm the lunatic statistician trying to measure the sand. — Count it up... sweat dripping... — Oh my god, AI music is nuts, bro. Okay

Conclusion and Community Engagement

well, thanks so much for watching, guys. Yeah, this is all great stuff. I've got even more open-source advancements coming in tomorrow's AI news roundup. I wanted to do a deep dive on this, though, because it really is like an open-source Suno v3.5 or something. I'm sure this is also going to be integrated into node workflows or things like Pinokio, right? But for now, this was my little guide on how to get this working on your machine today. Thanks so much, guys. If you generate anything cool, feel free to share it on my Discord server, linked down below. Well, I'll see you in the next one. Goodbye.
