# Elon Musk STUNS The Industry With GROK 2

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=6LHjlZJnKJI
- **Date:** 14.08.2024
- **Duration:** 17:53
- **Views:** 48,196
- **Source:** https://ekstraktznaniy.ru/video/14129

## Description

Prepare for AGI with me - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Check out my website - https://theaigrid.com/


Links From Today's Video:
https://x.ai/blog

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Transcript

### Segment 1 (00:00 - 05:00) [0:00]

So today's update actually did just stun the entire AI community, and by that I mean those of you who are focused on the space and following the updates on a day-to-day basis. For those of you unfamiliar with what I'm talking about: xAI recently announced their new chatbot, Grok 2, and confirmed that this very capable chatbot is actually the model that appeared as sus-column-r. If you don't know why this was such an incredible statement on Twitter, it's because many people were speculating that sus-column-r was actually Strawberry, and considering that this chatbot was grouped with the OpenAI chatbots on the Chatbot Arena, many people thought it was an unannounced OpenAI model capable of advanced reasoning. Today we got the news that this chatbot is actually a model from xAI, and you can see clearly here that Elon Musk is confirming this by tweeting "r r sus". That tweet got 38.7 million views, and I think it's one of those tweets that is not really a slap in the face but more of an "I told you so", because Elon Musk has been working on his chatbots for quite some time. One thing people should never do, regardless of whatever industry Elon Musk is in, is count him out; there are many naysayers and critics, but one thing you can't dispute is results.

Now, if we take a look at sus-column-r on the leaderboards for the Chatbot Arena, we can see that this model actually performed rather well. I have to be honest: when looking at the leaderboards, especially the LMSYS Arena, one thing you do have to note is that sometimes things aren't exactly as they seem, because in some of the arena battles models don't get the right context length or the right kind of queries. But when I've used sus-column-r, it has consistently been basically state-of-the-art. You may remember that in my recent video we actually tested this model against some of the others, and I said that it was on par with the state of the art but wasn't really crazy. One thing I did notice, though, was that it seemed to be trained in a way that makes it reason a lot; its problem-solving capabilities seemed stronger than the other chatbots', just from the way it responded. I think we've moved from a stage where we had these standard chatbots to one where models are trying to reason about their questions and queries in order to make themselves more helpful. If you haven't followed the recent updates from Anthropic, they were one of the first labs to incorporate this natively into a chatbot: one of the early things we saw from Claude, via a leaked prompt, was that you could actually look internally at what Claude was thinking before it responded to a message. I'm guessing that this, combined with extra data, more training, better post-training and pre-training, and of course an entirely new model, is how we got Claude 3.5 Sonnet, which is just remarkably smarter than anything else on the market.

Now, you can see that Grok 2 currently manages to apparently beat Claude 3.5 Sonnet. The only problem is that we don't have access to Grok 2 just yet, because it is slowly being rolled out. You can see that it beats Gemini 1.5 Pro, it betters Llama 3.1, and of course Claude 3.5 Sonnet. I would question that Claude 3.5 Sonnet result, because on the leaderboards, consistently, from what I've heard from people diagnosing the hardest issues, the best chatbot in terms of raw intelligence is Claude 3.5 Sonnet. You have to remember that the LMSYS Arena board doesn't just measure intelligence; it measures things like how a chatbot actually forms its responses and how helpful those responses are in comparison to another chatbot's. So always take that into consideration when you're deciding which model you want to use, because this leaderboard, while useful, doesn't serve every purpose you might think it does. Once again, they released the overall Elos for the Chatbot Arena; this was their image, and I've got to be honest, it's kind of annoying having to tilt my head to read it. I mean, of course I don't have to tilt my head, but there are easier ways to present data. But that's

### Segment 2 (05:00 - 10:00) [5:00]

not part of this video. The point here is that what we are looking at is a remarkable release from this team, because they came into the AI game pretty late compared to everyone else, and a lot of people did count out xAI because they're a much smaller team and they don't have the billions of dollars and the infrastructure that Meta, Google, and of course OpenAI have, considering those companies have existed for quite some time. So the fact that a much smaller team on a shorter timeline has managed to reach the state of the art shows us that there are still innovations to be had among the many different teams working on different products. I think this is rather fascinating, because what we see here is essentially their GPT-4-level model, and honestly, when I used sus-column-r, it definitely did feel like a state-of-the-art model; it didn't feel comparably worse in many scenarios, and later in the video I'm going to show you a small demo of some of the things this chatbot actually does well.

What we can also see here is the win rate of Grok 2 against competing models on the Chatbot Arena, and it manages to beat pretty much every single model apart from Gemini 1.5 Pro, their very new experimental one, which is rather interesting, because some people have tested that model and found it to be not great, while others, like myself, have found it to be absolutely amazing. So one thing I would say, and this is something I've been saying in certain communities and to people who have asked me about this, is that these large language models all have different areas of expertise, and those will differ for every single person. For some of your queries and requests it might be best to test out all of the top models to see which one edges out the competition, because there isn't one model for absolutely everything, although you might think that's Claude 3.5. Some models, like Gemini, are more creative; some, like GPT-4, are more structured in their responses; and models like Claude 3.5 Sonnet are just raw intelligence. Interestingly, some people have run their own benchmarks on things like Mistral Large, which is an extremely underrated chatbot that I personally use for my daily queries, and Llama 3.1 405B, and found that they have enough reasoning to handle a lot of low-level tasks really well, tasks that previously you would have sent to GPT-4. That essentially decreases your overall cost: rather than putting all of your queries through Claude, you put only the most demanding ones through Claude or Grok 2 and route the rest to something like Mistral Large or Llama 3.1 405B without losing much in terms of quality.

Now, people have been arguing over benchmarks for quite some time, but you can see here that the results are a decent improvement over Grok 1.5. There was a huge jump from Grok 1.5 to Grok 2: on GPQA there's a 15% jump, on MMLU a 6 to 7% jump, on MMLU-Pro around a 25% jump, on the MATH benchmark a 26% jump, on HumanEval another huge jump, and on MMMU, MathVista, and DocVQA there are big jumps as well. So on these first impressions we can see some stark improvements in how it performs relative to these other models, and we can see exactly where the model ranks compared to its counterparts. I've got to be honest: for xAI this is remarkable, considering they've been somewhat behind in terms of their starting position. When we look at the other benchmarks it is quite hard to see what is first, but going by these blue bars, this one is second, this one is third, this one is second, this is third, this is first, and this is second. So overall it's definitely a mixed bag; it's not completely state-of-the-art across the board, but it is a model that is on par with the state of the art, and as someone who's tested it, it does do well.

Now, there are some unique features of Grok 2 that most people are going to miss, because they don't realize this model is becoming increasingly capable. One of the standout features is how it handles images.
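The cost-based routing idea mentioned earlier in this segment (send only the hardest queries to an expensive model like Claude or Grok 2, and route the rest to a cheaper one like Mistral Large or Llama 3.1 405B) can be sketched in a few lines. Everything here is illustrative: the model identifiers and the keyword heuristic are made-up placeholders, not real API calls, and production routers typically use a small trained classifier rather than keywords.

```python
# Minimal model-routing sketch: send "hard" queries to an expensive model
# and everything else to a cheaper one. The model names and the difficulty
# heuristic are hypothetical placeholders, not a real API.

EXPENSIVE_MODEL = "grok-2"        # hypothetical identifier
CHEAP_MODEL = "mistral-large"     # hypothetical identifier

HARD_HINTS = ("prove", "debug", "optimize", "step by step", "analyze")

def estimate_difficulty(query: str) -> str:
    """Crude keyword heuristic; real routers often use a small classifier."""
    q = query.lower()
    if len(q) > 400 or any(hint in q for hint in HARD_HINTS):
        return "hard"
    return "easy"

def route(query: str) -> str:
    """Return the model identifier a query should be dispatched to."""
    return EXPENSIVE_MODEL if estimate_difficulty(query) == "hard" else CHEAP_MODEL

print(route("What is the capital of France?"))          # -> mistral-large
print(route("Debug this race condition step by step"))  # -> grok-2
```

The payoff of this pattern is purely economic: easy traffic never touches the expensive model, which is the cost reduction the video describes.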

### Segment 3 (10:00 - 15:00) [10:00]

It can actually look at an image and completely understand what's going on; you can see here it's able to look at this image, where it says "and that is the original processor", and write down exactly what is happening. What we can also see is that if you are currently on X (I know a lot of people don't use the platform, and that's completely up to you), this is, like I said before, Elon's way of getting people to use his social media site, and I can't hate him for that, because if you had spent billions of dollars on a social media site, you'd be doing anything to get people to use it. So if you do want to use Grok 2 and Grok 2 mini, which is a smaller, lightweight version of the model, you'll want to sign up at x.com, though I'm not entirely sure you'll get instant access, because previously it did take a bit of time to get verified and then gain access to the model. Essentially they've now got Grok 2 and Grok 2 mini available, and one of the most interesting things about Grok 2 mini is that it has some genuinely cool reasoning capabilities in areas where other models struggle, and it's strange how they managed that. The models have text and vision understanding, which is really nice, and they integrate real-time information from the X platform. What's really cool is that, in collaboration with Black Forest Labs, the recent company behind FLUX.1, a remarkable model in terms of prompt adherence, quality, and photorealism, that model is going to be natively built into Grok's capabilities on Twitter. So if you want to use FLUX and don't know how to access it, there are a million ways, but if you're on Twitter, this will be an easy one.

Now, I want to show you some of the capabilities of Grok 2 mini, because it is rather surprising, and you might laugh at the tests, but I think they're pretty humorous. One thing we do know is that large language models cannot reliably count the number of letters in a word, due to tokenization. Essentially, large language models don't view words the way we do: we see words as broken up into letters, but language models see words as tokens. A word might be split into "wo" and "rd", or into "let" and "ter"; it completely depends. I wish I had the image on screen to explain how it works, but long story short, each token isn't one letter, which means that when models count letters they can sometimes get it wrong. But somehow, and I'm not entirely sure how they've done this, because it doesn't seem to be a neurosymbolic approach, it seems rather fast, intuitive, and quick, when I ask this model how many instances of a certain letter are in a certain word, it gets it right every single time. For example, I can write "how many letters a are in the word Andrew", and you can see right here I've put three a's in, because I wanted to confuse the model and see if it would count the extra a's, and it manages to handle that rather well. So this is a model that seems to have an engine that reasons in a way I don't understand just yet, because it didn't use step-by-step prompting to reason that way. Usually, with step-by-step prompting, for example with GPT-4o, if you ask this kind of question, it will give you the wrong answer first, and then you have to say "write it out step by step". That is exactly what I had to do with GPT-4o in another video: I asked how many l's there are in the word "laloa", and you can see that it claimed six instances when it's just one, two, three, and four. Then, through prompt engineering, I said "write out each letter of the word, verify whether it is an l or not, and then count", and you can see it manually outputs the word again, classifies each letter, and then produces its response. You can do this for a variety of different AI tasks, which is how prompt engineers manage to provide value in various ways, and it's still something people are gaining ground on, because LLMs are weird and tricky systems, so we're going to keep figuring out ways to make these chatbots more effective.
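The letter-counting weakness described above is a quirk of tokenization, not of computation in general: ordinary code counts letters trivially, which is exactly why the test is a good probe of token-based models. Here is a minimal sketch of both the plain count and the write-it-out-and-classify procedure the step-by-step prompt asks the model to perform; the function names and the example word are mine, purely for illustration.

```python
def count_letter(word: str, letter: str) -> int:
    """Plain count: trivial in code, unreliable for token-based LLMs."""
    return word.lower().count(letter.lower())

def count_letter_verbose(word: str, letter: str) -> int:
    """Mimic the 'write out each letter and verify it' prompting strategy."""
    total = 0
    for position, char in enumerate(word, start=1):
        match = char.lower() == letter.lower()
        # Enumerate every character and classify it, as the prompt demands.
        print(f"{position}. {char} -> {'yes' if match else 'no'}")
        total += match
    return total

print(count_letter("strawberry", "r"))          # -> 3
print(count_letter_verbose("strawberry", "r"))  # lists each letter, then -> 3
```

The verbose version is the code analogue of chain-of-thought prompting: forcing the intermediate enumeration into the output is what lets the model (or here, the reader) verify the final count.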

### Segment 4 (15:00 - 17:00) [15:00]

But of course, with prompt engineering, once you have figured out how to get the model to output the right response, you can then share that method with everyone else. Now, what's cool here is that this behavior is potentially natively built in. I think the reason I don't see this model doing the step-by-step calculations is that this is Grok 2 mini, a fast and lightweight model, which means it's likely the model could have done all of those previous steps within a split second. So essentially what we may have here is a model with an internal chain of thought, or an internal prompting strategy, before the final output. If that is true, then this model would definitely be smarter than other models of a similar size, because in my experience, when models are able to do that, they output better responses. So I wouldn't be surprised if this model has a really interesting system prompt that says something like "think carefully through your answer and write it down first". It will be interesting to see what that system prompt looks like; there are definitely going to be people on Twitter trying to crack it over time. I'm not sure if it will come out, but if it does, I would really love to see exactly what the model thinks before it produces its responses, because you really do want to understand how these models are able to respond with such high accuracy, especially in certain scenarios. That is something I find absolutely incredible.

And of course, you can say "make an image of London", so I'm just going to put in "make an image of London". I know that's super boring and probably not what you wanted to see, but I just wanted to show you that you can actually use this now. Wow, that looks remarkably photorealistic; it actually looks like a picture and not something that was AI generated, which is rather strange and uncanny. Nonetheless, you can see here that this is Grok 2, or rather Grok 2 mini, so I'm excited for when we get the full Grok 2, which is probably going to be even more capable, but Grok 2 mini seems rather effective at the moment.

Now, let me know if this completely surprised you. Did sus-column-r completely throw you off guard? It seems the information we were getting from that Twitter account unfortunately might not be entirely true, which just raises the question of why Sam Altman responded to that tweet. I guess some mysteries we will never know, but for now it seems that sus-column-r, the model everyone was trying to figure out, belongs to xAI, and it's a rather capable model from the cracked xAI team. With that being said, if you did enjoy this video and you want me to run any other tests on the model, leave a comment down below, and I'll see you guys in the next one.
