# Udio, the Mysterious GPT Update, and Infinite Attention

## Metadata

- **Channel:** AI Explained
- **YouTube:** https://www.youtube.com/watch?v=QASOCG5QLUM
- **Date:** April 11, 2024
- **Duration:** 14:08
- **Views:** 119,775

## Description

It’s been a strange 48 hours in the world of AI, with the ‘ChatGPT moment for Music’ from Udio, which has reminded millions of what AI is capable of, and papers from Google showing that models can give infinite attention to text. But we also got befuddling updates from OpenAI that suggest not all is smooth sailing. We’ll begin with the quirky new tool on Udio.com and how musicians are reacting to it, then cover the strange manner of the release of GPT-4 Turbo with Vision, quickly touch on Mixtral 8x22B and Command R+, and finally turn to a fascinating new ‘Infinite Context’ paper from Google. One of the authors worked on Gemini, but that may or may not be relevant…

https://www.assemblyai.com/?utm_source=youtube&utm_medium=social&utm_campaign=universal1_philip

AI Insiders: https://www.patreon.com/AIExplained

Udio Intro: https://www.udio.com/
https://twitter.com/udiomusic/status/1778045322654003448
‘The Site Is ****ing Down’ https://twitter.com/udiomusic/status/1778093021378089240
Musicians React: https://www.reddit.com/r/Music/comments/1c0mjkg/udio_ai_music_generation_is_scary/
Investors: https://www.udio.com/about-us
Will.i.am: https://twitter.com/iamwill?lang=en
https://suno.com/
Mixtral 8 x 22B and Command R+ Benchmarked: https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1/discussions/4?s=09
LiveCodeBench Leaderboard: https://livecodebench.github.io/leaderboard.html
Majorly Improved: https://twitter.com/OpenAI/status/1777772582680301665 
MATH Benchmark: https://twitter.com/GanjinZero/status/1777926220132626753
Function-calling Usable with Vision: https://twitter.com/OpenAIDevs/status/1777769463258988634
GPT-4 Turbo Vision Benchmarked on GPQA: https://twitter.com/EpochAIResearch/status/1778463039932584205
Hassabis Chafes: https://www.theinformation.com/articles/googles-demis-hassabis-chafes-under-new-ai-push?rc=sy0ihq
Robot Football Simulation Paper: https://www.science.org/doi/10.1126/scirobotics.adi8022
Video: https://www.youtube.com/watch?v=dOBfnOMuTz4
Udio Origin Story: https://www.theinformation.com/articles/more-google-deepmind-staff-depart-to-launch-an-ai-startup?rc=sy0ihq
Leave No Context Behind: https://arxiv.org/pdf/2404.07143.pdf
Manaal Faruqui: https://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=W-CxZCgAAAAJ&sortby=pubdate&citation_for_view=W-CxZCgAAAAJ:PoWvk5oyLR8C
Gemini 1.5: https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf
Llama 3 Coming: https://www.theinformation.com/articles/meta-platforms-to-launch-small-versions-of-llama-3-next-week?rc=sy0ihq


Non-Hype, Free Newsletter: https://signaltonoise.beehiiv.com/

## Contents

### [0:00](https://www.youtube.com/watch?v=QASOCG5QLUM) Segment 1 (00:00 - 05:00)

It's been a strange 48 hours in the world of AI, with releases like Udio that have reminded millions of people what AI is capable of, and models that can pay you infinite attention. But we also got befuddling updates from OpenAI that suggest that not all is smooth sailing. I'll start, of course, with the new model on udio.com and how musicians are reacting, then cover the perplexing manner of the release of GPT-4 Turbo with Vision, and touch on a fascinating new infinite-context paper from Google.

But now, let's hear three 20-second extracts from Udio to give you an inkling, if you haven't heard it already, of what it's capable of. Here's 'Dune: The Broadway Musical': [sung excerpt: "...the greatest leader we've ever seen... eyes bright blue and hair black, you should see him ride on a sandworm's back, up to victory..."]. And now for some, quite frankly, amazing AI-generated classical music [excerpt]. And next, something I'm going to bleep a little bit, but it represents the reaction of Uncharted Labs, who are behind Udio, to their servers going down (getting hugged to death): "...but the site is ****ing down. We appreciate you, and we want your sign-ups, but the site is ****ing down."

And of course, I have been playing about with Udio, like almost everyone has. Did you know it can do stand-up comedy, and even Brits saying hello? [excerpt: "Yeah, we're talking posh... don't let it baffle you, and have a cup of tea."] It's odd to hear AI talk like that. I'm not sure if this guy's talking about me, but I thought I'd let you know that this kind of thing is possible. And how about a quick direct comparison between Udio and Suno's V3? [sung excerpts]. Now, I prefer Udio there, but you do sometimes get complete misses [garbled sung excerpt].

Now, will.i.am calls Udio "the best tech on Earth", and Uncharted Labs, the company behind Udio, he says is really aiming to be an ally for creatives and artists. It should of course be noted that will.i.am is an investor in Udio, but again, they repeat that Udio is about building AI tools to enable the next generation of music creators.

Of course, everyone has their own opinion, but let's now get a taste of the reaction from some musicians. One says: "It's pretty scary thinking what is going to exist a year or two from now, and what it means for musicians, listeners, and the industry as a whole." The top comment says: "I would buy a band t-shirt, but never buy a shirt for an AI," which makes sense. But here are two more common reactions. "I'm a music professional, producer/composer. This is highly advanced, and I thought this stuff was years away." And one more: "I've already gone full circle with it, past the confusion and devastation, and now I'm just curious what Gregorian chant would sound like with" — I can't even pronounce that — "and blast beats." So, definitely a mixed reaction from musicians.

Personally, I don't think it's too much of an exaggeration to call this the ChatGPT moment for music generation. Suno often has a slight tinniness that gives it away for those not following AI, but with Udio, I think you could convince many people that they're listening to human music, just like ChatGPT felt like human text if you didn't look too closely. I could well see, before the end of this year, hundreds of millions of people using this for entertainment. Imagine every schoolchild in the world walking out of their lesson, in whichever language, with a catchy tune about what they learned. So yes, I do believe

### [5:00](https://www.youtube.com/watch?v=QASOCG5QLUM&t=300s) Segment 2 (05:00 - 10:00)

that Udio is the biggest news of this week. But of course, we had the mysterious release of a new GPT-4 Turbo model from OpenAI. And why do I call it mysterious? Well, not because it wasn't named GPT-4.5; they probably thought it wasn't enough of a step forward to give it that name. The strangeness was the repeated emphasis on it being better than previous iterations, but without any detail. They called it "majorly improved" — where are the benchmarks, though? And now here's some more mystery: all the top players at OpenAI, like Greg Brockman and Mira Murati, tweeted out the news of the new model, but strangely, for the first time, Sam Altman didn't. Now, this isn't about reading any tea leaves; it's just a very strange announcement from OpenAI. I ran my own maths and logic benchmarks, and I couldn't see much of a difference: it failed the same questions that the January version of GPT-4 Turbo failed. Of course, the functionality improved, with function calling now usable with vision, but what intrigued me was the repeated claim that GPT-4's reasoning had been further improved. Naturally, on this channel, that's what I was most focused on: the cutting edge of intelligence.

Here, though, is some of the best benchmarking work that I could find, on the noted MATH benchmark from Dan Hendrycks. You can see a bump in performance on the hardest tier of questions, from 35% to around 45%; even one level down, performance bumped up from 57% to 66%. The difference on the easier questions wasn't nearly as pronounced. It seems pretty clear that the dataset got augmented with some high-level mathematics and code; otherwise, not too much changed. Here's another example, LiveCodeBench. You can't complain about contamination, because they source their questions from after the training date of the models, and again, as you can see, performance has increased, particularly for harder questions. These are sourced from contests like LeetCode, and that applies not just to code generation but to self-repair. Again, though, we're not talking about massive leaps, just small bumps.

Here, though, is the clearest assessment, from Epoch AI. The diamond set of GPQA contains the hardest kind of graduate questions: we're talking Google-proof STEM questions that even PhDs find hard. And yes, there was a bump, maybe by 2 or 3%, but GPT-4 Turbo, April edition, is still lower-performing than Claude 3 Opus. Of course, the deeper question is whether or not this indicates some inherent limitation in simply training on more and more advanced data, as if the current paradigm can only go so far, even with better data. You can watch any of my other videos to see why I don't think that will be much of a bottleneck for much longer.

Now, it would be remiss of me not to spend a few seconds touching on two releases from the open-weights community. I'm not going to call it the open-source community, because they're not releasing their training datasets. I'm talking about the new Mixtral 8x22B mixture-of-experts model and Cohere's Command R+. You can judge for yourself, but they land around the level of Claude 3 Sonnet, which is the medium-sized model; of course, that is a proprietary model. Some people may have expected the open-weights community to have caught up to GPT-4 by now, but that's not quite the case. Let's wait to see if Llama 3 can further bridge that gap.

Now, before we get to Google, there was one more announcement of a model I want to touch on. This time, though, as I've done once before on this channel, I reached out to the company to ask about a sponsorship. I've probably turned down thousands of sponsorship offers, but I'm happy to say that this part of the video is sponsored by AssemblyAI. So what happened? They released Universal-1, and basically, the reason I reached out to them is because it's really darn good. I'm often transcribing videos, and rarely do models get characters like "GPT" correct, let alone names like Satya Nadella; Universal-1 did. So yes, Universal-1 is the model I personally use, and you can see some comparisons to other models in this chart. It does seem to hallucinate less than Whisper, and it takes 38 seconds to process an hour of audio. Anyway, Universal-1 only came out like a week ago, and I think it's epic, but let me know what you think; the link, of course, will be in the description.

But now, from yesterday, a quite fascinating paper from Google. It's about Transformer models that could have infinite context: not 1 million or 10 million tokens, but infinite. I must say, unusually for this channel, I haven't had a chance to finish the paper before talking about it, but I wanted to include it in this video for a reason. Of course, the prospect of feeding in entire

### [10:00](https://www.youtube.com/watch?v=QASOCG5QLUM&t=600s) Segment 3 (10:00 - 14:00)

libraries is fascinating, but my theory is that this approach might be behind Gemini 1.5's long-context ability. If you remember, Gemini 1.5, whose API is now widely available, was able to process up to "at least" 10 million tokens — notice the phrase "at least" there. If you're not familiar with tokens, think of 10 million tokens as being around 8 million words; and if that's a daunting number, think eight entire sets of the Harry Potter novels. Now, on the day that Gemini 1.5 came out, I called it the biggest development of that day, despite it being the same day that Sora came out, and I would still stick to that to this day. Gemini 1.5 could find metaphorical needles in videos 3 hours long, or audio 22 hours long, and the performance just kept improving up to and beyond 10 million tokens.

But back to yesterday's paper: why do I think there's any link? One hint is that one of the authors, Manaal Faruqui (and sorry if I'm mispronouncing your name), was also an author on the original Gemini papers. And the other hint comes from the paper itself, where they call their approach a "plug-and-play long-context adaptation capability" with which they can "continually pre-train existing LLMs". In other words, it appears you can take existing LLMs and just continue pre-training them with this approach to make them great at long context, or indeed infinite context. Is that part of what happened to Gemini 1.0 Pro to turn it into Gemini 1.5 Pro? Anyway, it is interesting that Google published this while still being a bit coy about some crucial details. They do conclude, though, that this approach enables LLMs to process infinitely long contexts, even though they've got bounded memory and computation resources. Now, I am going to consult with some colleagues before I say much more about this paper, but just think about some of the possibilities: imagine a model being able to process every film made by a particular director, or every work of French literature from a particular period, or every email that you've ever sent since birth.

But let's not get too far ahead of ourselves, because it's not like Google don't have their own issues. This week we learned that, apparently, Demis Hassabis said he thought it would be especially difficult for Google to catch up to its rival OpenAI on generated video. He also apparently mused about leaving Google and raising billions of dollars to start a new research lab; if he did leave to start his own lab, that would swiftly become a very competitive lab. To bring us back to the start, that's actually how Udio was born. We learned from The Information that Udio is the work of Uncharted Labs, made up primarily of former Google DeepMind staff. Those researchers had created the model Lyria back in the spring of last year, which could be a very similar model to what we now have in Udio, but the company didn't unveil it until November of last year, and Google still hasn't made it available to the public. It seems like Demis Hassabis isn't the only one with some frustration at Google.

But before I end the video, I must give Google great credit for this release within the last 24 hours. With deep learning, of course, they trained these ultra-cute football players (and yes, I'm calling it football). These two players weren't manually designed to do the moves they're doing: through deep reinforcement learning, they learned to anticipate ball movements and block opponent shots, and they were trained in simulation, which I talked about in my recent Nvidia video. Compared to a pre-scripted baseline, these agents walked three times faster, turned four times faster, and kicked the ball 30% faster. Soon, therefore, we could have our own mini Erling Haalands. So, quite the rollercoaster 48 hours in AI. As always, let me know what you think in the comments, and feel free to hop on board my Patreon; but regardless, thank you so much for watching, and have a wonderful day.
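The "bounded memory, infinite context" claim from the Infini-attention paper can be made concrete with a toy sketch. This is a minimal, single-head NumPy illustration of the paper's compressive-memory idea as I understand it (linear update without the delta rule, and omitting the local attention and gating that the real method combines it with); all names here are illustrative, not from any official implementation:

```python
import numpy as np

def sigma(x):
    # ELU + 1, the nonlinearity the paper uses for its linear-attention kernel
    return np.where(x > 0, x + 1.0, np.exp(x))

def process_segment(Q, K, V, M, z):
    """One compressive-memory step over a segment (single head, no delta rule).

    M : (d_k, d_v) memory matrix carried across segments
    z : (d_k,)     normalization vector carried across segments
    Returns the memory retrieval for this segment plus the updated (M, z).
    """
    sQ, sK = sigma(Q), sigma(K)
    # Retrieve what earlier segments wrote into memory
    A_mem = (sQ @ M) / (sQ @ z + 1e-8)[:, None]
    # Fold this segment's key-value associations into the fixed-size memory
    M = M + sK.T @ V
    z = z + sK.sum(axis=0)
    return A_mem, M, z

# Toy run: stream many segments; the memory footprint never grows
d_k, d_v, seg_len = 4, 4, 8
rng = np.random.default_rng(0)
M, z = np.zeros((d_k, d_v)), np.zeros(d_k)
for _ in range(1000):  # arbitrarily long context, constant memory
    Q, K, V = (rng.standard_normal((seg_len, d_k)) for _ in range(3))
    A_mem, M, z = process_segment(Q, K, V, M, z)
print(M.shape, z.shape)  # memory stays (4, 4) and (4,) no matter the length
```

The point of the sketch is the shape arithmetic: after 1,000 segments (8,000 "tokens"), the carried state is still a fixed 4x4 matrix and a length-4 vector, which is why the paper can talk about unbounded context under bounded memory and compute.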

---
*Source: https://ekstraktznaniy.ru/video/12547*