# Googles New STUNNING AGI Breakthrough "Genie 1.0" (Bigger Than You Think)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=8GhUwSfgDo4
- **Date:** 27.02.2024
- **Duration:** 20:54
- **Views:** 22,608

## Description

✉️ Join Our Weekly Newsletter - https://mailchi.mp/6cff54ad7e2e/theaigrid
🐤 Follow us on Twitter https://twitter.com/TheAiGrid
🌐 Checkout Our website - https://theaigrid.com/

Links From Today's Video:
https://twitter.com/_rockt/status/1762027814369267901

Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos.

Was there anything we missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=8GhUwSfgDo4) Intro

so Google just released a research paper that will definitely change how things are done in the future, and it's a major step towards AGI. Without wasting any more time, let's take a look at why this research paper got everybody talking and why it's so special.

Ladies and gentlemen, this is Genie, an AI system that essentially lets you go from text to a generative AI game. You can see here that you enter your text prompt and you get a completely AI-generated game that you can move around in: it basically first does text-to-image, and then you can control the agent within that game, move around and do different things. If you're wondering why the resolution is so low, it's because they trained this on a dataset that was really low resolution and then upscaled it to 360p, which is why what we're seeing isn't exactly the best quality. And remember, this is of course the first iteration of this work; we could potentially get a Genie 2 and a Genie 3, and remember just how small some of the early large language models were, and this is an entirely different kind of system.

They state: "We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs and even sketches. At 11 billion parameters, Genie can be considered a foundation world model." This is why it has everyone talking: apparently this is going to be the year of world models, world models are a very important ingredient for AGI, and that's why this paper is so key in leading us to that next level.

You can see right here in the paper, on the left-hand side, a text-to-image output, a hand-drawn sketch and a real-world photo. I guess these aren't really different modalities, just different styles, but all of them actually work with this kind of system, and the crazy thing

### [2:14](https://www.youtube.com/watch?v=8GhUwSfgDo4&t=134s) The Paradigm

is no exaggeration: the paper literally proposes generative interactive environments as a new paradigm for generative AI. This shows we're moving towards a fascinating area of generative AI, because this genuinely is a new paradigm, something we haven't seen before, and now that Genie is out of the bottle (excuse the pun) we could be seeing some crazy things in the future. The paper asks: "What if, given a large training corpus of videos from the Internet, we could not only train models capable of generating novel images or videos, but entire interactive experiences?"

The reason they claim a new paradigm is that generative AI right now lives in video, text, images and sound, and it looks like we're about to enter the paradigm of generative experiences. That idea is going to be increasingly talked about, partly because of the sci-fi we've seen in TV shows like Westworld, and because certain industries and games have presented it as a really futuristic idea. Research papers like this lay the foundation for what we will experience in the future as they get cited, iterated upon and reproduced in different variations.

They also state that this is a new class of generative model. Table 1 positions Genie as a novel video and world model that is controllable on a frame-by-frame basis while requiring only video data at training time:

| Model class | Training data | Controllability |
| --- | --- | --- |
| Video models | video + text | via text prompt |
| World models | video + actions | via ground-truth actions |
| Genie | video only | frame-level latent actions |

There's also a Twitter thread about this from the paper's authors. Tim essentially says their model can convert any image into a playable 2D world: Genie can bring to life human-designed creations such as sketches, for example beautiful artwork from Seneca and Caspian, two of the youngest-ever world creators. This lays the groundwork for future things, and later in the video I'll show you why it's really cool, because there are some things in the paper that look really promising in terms of further exploration.

In addition, they highlighted meaningful actions: Genie's learned latent action space is not just diverse and consistent but also interpretable, and after a few turns humans generally figure out a mapping to semantically meaningful actions like going left or jumping right. (I was actually supposed to play that clip right there.) It also shows that playing from the same starting frame can generate diverse trajectories, which seems fun, since you can go pretty much anywhere. So now this

### [5:07](https://www.youtube.com/watch?v=8GhUwSfgDo4&t=307s) AGI

slide right here is where they talk about how this is a step towards AGI. They state that Genie is general and not constrained to 2D, which is absolutely incredible, because world models that are not constrained to 2D can do things that would be really useful for us. They say: "We also train Genie on robotics data (RT-1) without actions, and we demonstrate that we can learn an action-controllable simulator there too. We think this is a promising step towards general world models for AGI." RT-1 is part of Google's robotics work; they've had the RT-1 and RT-2 robots, which have been able to do some really cool stuff, and they decided to train Genie on that data.

Why is this such a good thing? Describing the Genie model as general implies it isn't limited to a specific type of data or application, and in machine learning that suggests the model has a wide range of potential uses and can be applied to various types of data beyond just 2D images or environments. The mention of not being constrained to 2D indicates it can handle data in more complex formats, like the 3D examples we just saw. And the reason this is so crazy is that training on the RT-1 robotics data means the model was exposed to data from real-world sensors or simulated robotics environments, and because this was done without actions, the training data didn't include specific action commands or outcomes; the model had to focus on understanding the environment from sensory input alone. That's what leads us towards AGI, because this could be the kind of data used to train robots in the future.

They state that "we think this is a promising step towards general world models for AGI", and a general world model is a comprehensive framework that can simulate and understand various aspects of the real world, which allows an AGI to predict outcomes, plan actions and learn from interactions in a way humans would understand. So essentially this tweet and this post show how capable this system is, and that while it might be something we use to play completely generative games, the 3D element means this is a real step towards AGI, because it could definitely serve as a framework that helps get us there. And here we have

### [7:42](https://www.youtube.com/watch?v=8GhUwSfgDo4&t=462s) Another Tweet

another tweet from another author on the paper, which stated: "When we started this project, the idea of training world models exclusively from internet videos did seem wild, but it turns out latent actions are the key, and the bitter lesson holds. Now we have a viable path to generating the rich diversity of environments we need for AGI." He's basically saying this is a really big step towards AGI, because we now have an action simulator that goes beyond 2D, essentially a general world model, and once they scale it up it should deliver some really impressive results. In the paper (I don't have the screenshot here) they actually talk about how they could potentially scale this up, because as they increased the model's compute they didn't see any decrease in performance. So I do hope Google manages to do some crazy stuff with this, because it's pretty cool.
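The "latent actions are the key" idea in that tweet means inferring a small set of discrete action codes purely from pairs of consecutive video frames, with no action labels at all. The sketch below is a deliberately crude, NumPy-only caricature of that idea, not the paper's actual method (which uses a VQ-VAE-style latent action model): the toy column-mean "encoder" and the random codebook are illustrative assumptions.

```python
import numpy as np

NUM_CODES = 8  # a small discrete codebook of latent actions

def encode(frame: np.ndarray) -> np.ndarray:
    """Toy frame encoder: mean intensity per pixel column (a real model uses a deep net)."""
    return frame.mean(axis=(0, 2))

def infer_latent_action(frame_t: np.ndarray, frame_t1: np.ndarray,
                        codebook: np.ndarray) -> int:
    """Assign the transition frame_t -> frame_t1 to the nearest 'effect' code.

    No action labels are needed: the code is inferred from the frames alone.
    """
    delta = encode(frame_t1) - encode(frame_t)
    distances = np.linalg.norm(codebook - delta, axis=1)
    return int(np.argmin(distances))

rng = np.random.default_rng(0)
codebook = rng.normal(size=(NUM_CODES, 64))  # learned jointly with the model in practice
frame_t = rng.integers(0, 256, size=(64, 64, 3)).astype(float)
frame_t1 = np.roll(frame_t, shift=3, axis=1)  # pretend the player moved
action = infer_latent_action(frame_t, frame_t1, codebook)
print(action)
```

Because the codes come from the videos themselves, the same machinery works on game footage and on action-free robotics footage alike, which is exactly why the "without actions" training the video keeps mentioning is possible.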

### [8:36](https://www.youtube.com/watch?v=8GhUwSfgDo4&t=516s) What is a World Model

and of course we did have Yann LeCun's definition of a world model, and I'm going to give you the layman's version, because this is really hard to follow if you're not a little more on the technical side. Think of a world model as a little AI brain that tries to mimic the rules of how the actual real world works. Just like kids learn by playing and watching the world, a world model learns by being fed information, which includes:

- **What it sees right now:** a snapshot of the environment, like a picture or a short video clip.
- **What happened before:** a memory of how things were in the past.
- **What action it could take:** an instruction like "move left" or "jump".
- **The unknown factor:** this is where things get crazy, because it's a bit of random information representing things the AI can't perfectly predict, like whether a coin toss will come up heads or tails.

The world model uses all of this to do one important thing: predict what happens next. World models work because they have a translator, the part that turns the snapshot of the world into a special code the AI can understand (think of it like translating a picture into a secret language), and a predictor, the brains of the operation, which takes the translated code, the memory of the past, the action and the random information to make its best guess about the future. The secret ingredient is training the model without letting it cheat: it would be easy for the translator to learn to ignore the picture and always give the same code, and a world model that did that wouldn't actually learn about the world, so special training techniques prevent this from happening.

So essentially what we have here is a really big step towards artificial general intelligence. We've spoken about world models before, in previous videos about Sora, because Sora is arguably a kind of world model: it predicts things that happen in the real world. I do think that now we're moving towards these world models, it shows how crazy AI is about to get, because once world models get more and more accurate we're going to see a huge performance increase in video models like Sora (even though it's already really good) and especially in general world models like Genie.
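The "translator plus predictor" picture above maps directly onto a standard latent world-model step: encode the observation into a code, then predict the next code from the current code, the memory, the chosen action and some noise. The sketch below is a generic illustration of that structure only, not Genie's architecture; every dimension, the random linear layers and the one-hot action encoding are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, NUM_ACTIONS = 64, 16, 4

# The "translator": turns a raw observation into a compact code.
W_enc = rng.normal(size=(OBS_DIM, LATENT_DIM)) * 0.1

def translate(observation: np.ndarray) -> np.ndarray:
    return np.tanh(observation @ W_enc)

# The "predictor": code + memory of the past + chosen action + randomness -> next code.
W_dyn = rng.normal(size=(2 * LATENT_DIM + NUM_ACTIONS, LATENT_DIM)) * 0.1

def predict_next(latent, memory, action_id, noise):
    action = np.eye(NUM_ACTIONS)[action_id]  # one-hot for "move left", "jump", ...
    x = np.concatenate([latent, memory, action])
    return np.tanh(x @ W_dyn) + noise  # noise = the part the model can't predict

observation = rng.normal(size=OBS_DIM)   # what it sees right now
memory = np.zeros(LATENT_DIM)            # how things were before
latent = translate(observation)
next_latent = predict_next(latent, memory, action_id=2,
                           noise=rng.normal(size=LATENT_DIM) * 0.01)
print(next_latent.shape)
```

Note how the "don't let it cheat" point shows up here: if `translate` ignored its input and always returned the same vector, `predict_next` would still run, but the model would have learned nothing about the world; real training objectives are designed to penalise exactly that collapse.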

### [10:40](https://www.youtube.com/watch?v=8GhUwSfgDo4&t=640s) Training Agents

another thing they talk about in the paper is training agents. They state: "We believe that Genie could one day be used as a foundation world model for training generalist agents. In Figure 14 we show that the model can already be used for generating diverse trajectories in unseen reinforcement learning environments given starting frames," and they go on to investigate whether the latent actions the model has learned actually transfer. Let me explain, because this is the part of the paper that made me go "this is pretty crazy". They think Genie, their entire world model, has the potential to become the training ground for future super-smart AI agents, and that's wild, because when you see my next video about OpenAI, things are really going to start clicking together.

So first, Genie went exploring: they effectively said "Genie, we're going to put you in a new place" and showed it a picture of a level it had never seen before, from a game called CoinRun. Genie could generate lots of new, creative ways to move through this new level, which shows it understands basic world rules even in unfamiliar places. Second, they asked: can Genie copy others? They used the part of Genie that extracts actions from videos; it watched videos of skilled players in a specific environment and tagged them with its own internal action codes. Then came imitation time: they trained a separate AI agent to predict which of Genie's actions an expert player would likely choose based on what was on the screen. And finally, only a tiny bit of coaching was needed: with just a little extra information connecting Genie's actions to the real game controls, this new AI agent learned to play as well as the expert.

This is really important because it means Genie's actions aren't actually random at all: they translate to meaningful concepts across different game situations, even ones Genie hadn't seen directly. And instead of needing real-world robots, which can be expensive and dangerous to train, Genie might let us create safe simulated worlds for AI agents to practise in. That's why this training-agents area is really cool and really crazy, and this is

### [12:43](https://www.youtube.com/watch?v=8GhUwSfgDo4&t=763s) Will Google Do Something

why I think this entire project is just so good: it shows how crazy and how quickly we are moving now. There was also something I wanted to show you all, and a question I really do want to ask, because there's a lot more I want to show you; this Genie thing is absolutely incredible. I want to put the question to Google, and I know Google probably won't watch this video (maybe some of them will, who knows, Google is a giant company): is Google actually going to do something?

The reason I pose this question is that Google made the technology that ChatGPT is basically built on. That's a layman's way of putting it, and of course there's a lot more to it, but "Attention Is All You Need" is the very famous paper that made everything possible, and that's why I'm asking whether Google will actually do something. The point I'm trying to make is that Google has now developed this really cool thing called Genie, an incredible world model they could use as the foundation for other great products and innovations. But the thing is, they had the Transformer in 2017, and ChatGPT was released in late 2022. Google was five years ahead, and they waited for ChatGPT to come out before actually shipping a product, because until then it didn't affect them. So are they actually going to do something this time and build a product on top of this? I do think they will, because in a recent interview Demis Hassabis said Google is going to be shipping loads and loads of products now, and we've seen that recently. I hope this is going to be the start of Google actually shipping products, because they of course need to.

### [14:17](https://www.youtube.com/watch?v=8GhUwSfgDo4&t=857s) The Broader Impact

okay, and essentially here they also talk about the broader impact. The paper says Genie could enable a large number of people to generate their own game-like experiences, and this could be positive for those wishing to express their creativity in a new way, for example children, who could design and step into their own imagined worlds. This is not impossible at all, because I actually did something like this recently: I wanted to test the limits of our current AI systems and see what they could be if combined with Genie. So I want to show you something that's really crazy. If you go to DALL-E 3, ChatGPT or any AI art generator and try to generate an image from a game, this is what you get. You might be thinking, "why is this even cool?" Think about this kind of image, then imagine we translated it into something like Genie: we could use the Genie system to immediately get into this game and start playing. How cool would that be? You could literally go from a screenshot to a game.

I've got a lot more screenshots, because, strikingly enough, these AI image generators are really good at this if you use a certain kind of prompt (if you want, I'll leave the prompt in the description). You can see you're able to get a real first-person game experience just based on screenshots, and imagine if we had really good consistency; I'm pretty sure Sora could use this and give us some kind of game we could most likely explore. You can see here one that's an underwater-style game, with the first-person HUD, and while some of the text is not great, that's something that would be improved on in the future. I just think your own creativity is the only thing limiting you, because these things are really crazy.

That one was DALL-E 3, but if we take a look at the Midjourney ones, imagine playing a game as someone in this world. The technology does exist, and when we look at these screenshots of what these systems are capable of, it's important to realise that AI is advancing across the board. It's not just LLMs, it's not just generative imagery; things across the board, including robotics, are all improving, and cohesively everything is moving forward at an insane pace, because when one technology improves, what we're seeing is an entire field moving forward at an exponential rate. When you look at how these systems (they're not actually game engines, they're generative AI systems) can produce images of first-person shooters with striking accuracy, even getting the hard elements right: this was one I did in Midjourney, and you can see it gets the map, the first-person view, the motion blur and some other details on the left and right. Let me know if you think this stuff is actually cool, or, I guess you could say, not that impressive, because I feel this is some really impressive stuff. When I was prompting it, I didn't know how good it was going to be, but Midjourney really surprised me, and so did DALL-E, because a lot of this is pretty much photorealistic. And with Sora as well: imagine putting this into Sora once it's there. Are we going to have gameplay of games that don't exist yet? Is this going to be good for concept art? There are just so many possibilities in the next two years. Now, something that I did

### [17:35](https://www.youtube.com/watch?v=8GhUwSfgDo4&t=1055s) The Gaming Industry

actually want to add to this video is a trend I've started to see across the gaming industry: the world's largest PC games platform has adjusted its review policy for AI content, saying it will release the vast majority of games that use it. You can see here that Steam has adjusted its policy to allow the vast majority of games to use AI, and this is really great for the industry, because it opens the floodgates to what could be a very interesting new paradigm in the generative AI landscape. Whichever game manages to capture the collective consciousness of gamers will most certainly reap the rewards, because we haven't truly seen a generative game just yet. There was a game that was kind of generative, though really it was procedurally generated: No Man's Sky, which procedurally generated solar systems, planets, weather systems, the flora and fauna on those planets, the behaviour of those creatures, and artificial structures like buildings and spacecraft. But that was a while ago, and the technology they were using is quite dated compared to the tools we now have available. So it will be interesting to see how the industry changes as a whole, because the generative AI technology that's here and expanding has capabilities that haven't been explored beyond some very initial concepts, and I would love to see them at a really big scale. And remember, this is an 11-billion-parameter model; what happens when they scale it up to something like 175 billion parameters, or even the 1.7 trillion that GPT-4 is rumoured to be?

Now, one thing I do want to add, and this isn't just a dig at Google, but I think it's always important to present all the information: someone commented, "I wonder how much of it is actually happening versus how much was spliced together to make it appear as if it was happening in real time, like what they did with Gemini." The reply was: "We're transparent about this. The model is currently running at 1 FPS, so right now it's far away from real-time playable." Of course, remember that these very small models don't always perform that well in an actual use case, but with larger, more efficient models, we know this will become possible in the future.

Another thing that was rather strange, and I'm not sure if I should add this, but I think it's important to take note of it, is that AI hate is still here. If you do see AI hate, definitely tweet it at me and let me see the comments as well, because I'm trying to gauge how the entire industry is reacting to AI developments. Someone said, "God, I hope it stays that way. This product should be canned." I have no idea why they said that, but if you aren't familiar with AI hate, there was recently a ton of backlash when Sora was released, because a lot of people just hate the technology and said it wasn't actually good for humanity at all, and I think that trend is starting to pick up even on related products. So I think it's important to look at where people's heads are at. Let me know what you think about AI hate. Do you think this is really cool? Are you excited for future generative AI games? Do you think you'll get the holodeck, where you can just put on a VR headset and step into any world you want? Do you think it's cool that we're now rapidly moving towards a level where AGI is just around the corner? I think this new paradigm of generative AI world models is going to be really cool, but let me all know what you think.
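To make the contrast with No Man's Sky concrete: procedural generation is deterministic expansion of a seed through hand-written rules, whereas a learned world model like Genie infers its rules from video. Here is a toy, purely illustrative seeded generator (all attribute names and rules are invented for this example, not taken from any real game):

```python
import random

def generate_planet(seed: int) -> dict:
    """Deterministically expand a seed into planet attributes via fixed, hand-authored rules.

    Unlike a learned world model, nothing here was learned from data:
    a human wrote every rule, and the same seed always yields the same world.
    """
    rng = random.Random(seed)  # seeded RNG: fully reproducible
    return {
        "biome": rng.choice(["desert", "ocean", "tundra", "jungle"]),
        "gravity": round(rng.uniform(0.5, 2.0), 2),
        "fauna_species": rng.randint(0, 30),
    }

# Determinism means worlds never need to be stored, only their seeds.
p1 = generate_planet(42)
p2 = generate_planet(42)
print(p1 == p2)  # True: identical seed, identical planet
```

That determinism is exactly why a 2016-era engine could ship quintillions of planets, and also why each one feels rule-bound; a generative world model trades that reproducibility for open-ended variety learned from data.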

---
*Source: https://ekstraktznaniy.ru/video/14502*