xAI’s next Grok model is quietly appearing in public evaluation arenas.
In this video, I run early tests on Grok 4.2, focusing on web design, coding improvements, and how it stacks up against previous versions.
For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building with these models: @intheworldofai
🔗 My Links:
📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com
🔥 Become a Patron (Private Discord): /worldofai
🧠 Follow me on Twitter: /intheworldofai
🌐 Website: https://www.worldzofai.com
🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://intheworldofai.com/
Grok 4.2, Grok AI, xAI Grok, xAI AI model, Grok early tests, Grok 4.2 testing, Grok vs GPT, Grok vs Gemini, LM Arena, Design Arena, AI model testing, AI benchmarks, AI web design, AI coding, large language models, frontier AI, AI news, AI updates, artificial intelligence, Universe of AI, AI evaluation, AI performance, new AI models, Grok 2026, xAI updates
#Grok42 #xAI #grokai #artificialintelligence #aimodels #aibenchmarks #lmarena #designarena #UniverseOfAI #aiupdates
0:00 - Intro
1:42 - Web Design Tests
5:09 - Coding Games! (Snake, Tetris & Battleship)
8:55 - Outro
The end of last year has been unusually busy in AI. We've seen Google move fast with Gemini updates, OpenAI continue iterating across GPT models, and overall the pace of releases has felt faster than ever. Almost every few weeks there has been something new to test out or play with. But one company has been noticeably quieter, at least publicly, and that is xAI. And whenever xAI goes quiet, it usually doesn't mean nothing is happening. It usually means something is being tested behind the scenes. Today, it looks like we're starting to see signs of that, because a new Grok model, often referred to as Grok 4.2, is beginning to show up inside public evaluation platforms like Design Arena and LMArena. So, today I'm going to walk you through these stealth models, named Obsidian and Vortex Shade. So, let's get into it.
Now, I want to be very clear with the wording here. This is not an official launch, and there's no blog post or press release confirming that the models we're about to see today are actually Grok 4.2. But I have good confidence that these stealth models we're about to test out are actually Grok 4.2. And what is interesting is that when models start appearing inside arena-style evaluation systems, it usually means the company is testing real-world performance, preference, and reliability before making anything official. And that's exactly how we've seen early Gemini checkpoints, GPT variants, and even earlier Grok versions surface in the past. Now, I've made videos in the past where models leaked on LMArena and Design Arena, I tested them, and it turned out they actually were the stealth models used to test the actual model releases. So, I just wanted to say that before we actually get into the video. So the
first update I have been noticing after testing the model out on Design Arena for a bit is that the new model might be much better than the previous model, 4.1, at web design and creating beautiful dashboards like the ones that you're seeing right now. So I asked the model to create a financial dashboard for a finance company, and as you can see, it created a bunch of things over here that are not only visually appealing but make sense. For example, on the left here we have this company called FinTrust Wealth Management Company. And we have a dashboard section here. We have a schedule, the accounts, transactions, investments, reports, and then the typical settings and sign out. So, this looks very close to what an actual dashboard would look like. And it looks like we have these hover cards here that actually move as we hover over them. So, you can see over here, if I move over these, they move up and down, which is a good touch when it comes to UI elements. Then we also have this amazing icon that shows our balance and everything like that. So visually speaking, this is pretty good, and it's able to code all of this, which is really nice. And even this chart over here is much better than the static chart that we might be used to with previous versions of the model. So once again, we're seeing many more upgrades when it comes to visual design and UI. So this is pretty good. And even, for example, the credit here and everything like that. I really like that touch and the elements on here as I scroll over them. Then we also have this portfolio distribution, which has these elements that look much more visually appealing than previous versions. So this is not bad at all.
And here's another example of the capabilities of Obsidian, also known as Grok 4.2, to code amazing websites. So I asked Design Arena to build me a website for a SaaS company. What you're seeing here is that Obsidian actually beat Claude Sonnet 4.5, Gemini 3 Flash Preview, and GPT-5.2.
So, if you look at Obsidian here, you're looking at a SaaS company. Once again, I like these hover card elements that it's adding throughout all of its web design, which I think adds a professional touch and something more modern, like the websites that you're seeing nowadays, which is a good thing to see. And then on a SaaS page, it has your 14-day trial thing and a watch demo button, which is nice. Then you have these stats. And as you go down, you see these elements appearing. You know, it's intuitive dashboards, real-time collaboration, advanced analytics. All these things seem legit and make sense for a SaaS company. And then you saw how it naturally appeared on the screen, like the simple pricing and real business value. So it's adding beautiful elements. Once again, as I'm scrolling down, the pricing features are coming up, and these look pretty nice. So I personally think the biggest upgrade I've seen so far is in its website development, and we'll take a look at more examples.
So here's another example. I asked Design Arena to create a weather application. So I put in New York, and I can search any city up. So I searched up New York, and it looks like it's able to fetch the data for what the temperature looks like. So currently it's -2°, feels like -7. So, all this data is fetched from online, and we can see overcast and everything like that. It has also created a 7-day forecast. Pretty minimalistic, but it gets the job done. It tells you the highs and the lows. So, this is not bad at all once again. And, you know, it's able to pull this. We can also convert our data into Fahrenheit, which is a good touch. It added that by itself. And the prompt was pretty minimal. The prompt was simply, build me a website for a weather checker. So, it's not bad at
all. Now, what you're looking at here is a little bit more advanced than just web design. It's creating something that's playable, something that's usable. And this is a typical snake game. And also, this is one-shot coded. I just said, make me a snake game. And this is what Obsidian came up with. So, everything we're going to look at was done with one simple prompt, which is about four or five words, and it's calling it Welcome to Neon Snake. Controls are the typical keyboard arrows. So, let's start the game. Test this out. So, the snake looks not bad. Oh, looks like I failed. So, sorry, guys, if I have to show you my embarrassing snake gameplay. Uh, it is harder than it looks. Like, jeez. Okay, there we go. We finally got that. Is our thing growing? Yes, it is. Let's get this one. I'm not going to bore you guys with more of my gameplay, but let me just try it out one more time. I just wanted to make sure. Yeah, it is growing. Like, it was three and it grows. So, it's pretty good. It's not bad at all. And the snake looks good. The interface is good. The movements are pretty natural. So, it's pretty decent at coding as well.
And this is the Tetris game that it has created. So, the first thing I'm going to tell you is that visually speaking, out of all the games that I tested out, this one looks pretty good. It looks pretty similar to how a Tetris layout should look. And we have our typical controls over here, moving up and down, and then soft drop, and then rotate and rotate left. Let's test this out. So, it added a sound effect as well. So you can hear the sound effects over here. Let's drop this down. Let's do a hard drop. So space is hard drop. Put this. Yeah, we can rotate and everything like that. Can I... Why is it not letting me? Oh, okay. This is embarrassing. Okay, but you guys get the point. I'm just going to test out if it works. Okay. Yeah, if I die, it does say game over. It's a little bit glitchy.
Like, I wasn't able to fully rotate all the pieces. And you saw at the end, I still had like one line, I guess, of area to cover, but I guess it already assumed I died. And it does keep track of our score. Like, it has the final score as 258. So pretty good. Not bad when it comes to movement and everything, but obviously it's not 100% there yet. Actually, when I tested this out compared to other models, it came third. GPT-5.2 at a medium level came first, which makes sense. It's a much stronger model. And then DeepSeek V3.1 came second, in my opinion. But, you know, these are not bad. And funnily enough, the old version, Grok 4.1 Fast, came behind Obsidian. So, we definitely know that this is an upgraded model compared to the older one.
Then I also tested Obsidian on creating a simple battleship game. So, we can start our mission, and we can see we can place our options here. So, we can convert it into vertical. Place our ship and everything like that. So, let's do that. Now, we have to put a four-piece. Let's make it horizontal. Put that there. Then the cruiser, we'll put it vertical. And you also actually have an auto-place feature. So, I'm just going to click on that, and then we can go to complete deployment. And then you can play against the computer, I guess. So, if I were to place this, I missed. Oh, damn. The enemy got me. Miss. Okay, it looks like I'm about to get destroyed right now. But oh, I got it. God damn. Why am I bad at all the games that I'm showing today? But anyways, you guys get the point. Even when it comes to coding, Obsidian is pretty good.
So to wrap this up, the biggest improvement I'm seeing here is in web design and UI structure. The layouts are cleaner. The hierarchy also makes more sense, and it feels much closer to how an actual product designer would think. On the coding side, this version is also noticeably stronger than previous Grok models.
The code is more structured, more readable, and it breaks less often compared to earlier versions, which is a real step forward. That said, I don't think this is a model that's going to take the top spot just yet. Right now, it feels more like a competitive model rather than a clear leader, but this is also a stealth version. And historically, the final release can look very different once optimization and tuning are finished. So, if this is what Grok looks like before an official drop, it'll be interesting to see how far it goes once it's actually fully released.
If you enjoyed this video, this is what we do here: fast, clear updates on the biggest moves in AI. If you want to stay ahead of everything happening in this space, make sure you're subscribed. And if you want the hands-on side, demos, tools, workflows, and everything developers can actually build, check out World of AI. We also run a simple, no-noise newsletter that gives you the most important AI tools and updates in just a couple of minutes. Subscribe here. Follow World of AI. Join the newsletter.