# Surprising developers with GPT-5

## Metadata

- **Channel:** OpenAI
- **YouTube:** https://www.youtube.com/watch?v=-gXmWYQtv5o
- **Date:** 07.08.2025
- **Duration:** 6:34
- **Views:** 306,761
- **Source:** https://ekstraktznaniy.ru/video/11281

## Description

Five developers—Simon Willison, Claire Vo, Theo Browne, Shawn Wang, and Ben Hylak—are invited to OpenAI, not knowing who else is coming or why. Before GPT-5’s public release, they get full creative freedom to push its coding skills, design sense, and personality to the limit.

## Transcript

### Segment 1 (00:00 - 05:00)

Hello.

Welcome to Dev Island.

All right. Well, thank you so much for coming here today. We brought you all here to test GPT-5.

You're among the first people externally to see it and to get your hands on it. Your first task here on Dev Island is to open up ChatGPT with Canvas, switch to GPT-5 Thinking, and create a personal website.

Oh, it's dark and pink, but it is quite girly, so I'm happy with that.

Yours is cuter.

Oh my god. Let's see what you got.

I said, "Mac OS 9." Nice prompting.

It actually has some windows here. This is one-shotted. That's actually pretty cool.

I've got it to build me a drawing app where I can draw in step five.

And it gives me an SVG of the drawing as it goes.

All right, let's move on to individual testing. You're welcome to test the model. Do whatever you want with it. Build your own app with it. Put it in your existing apps if you like. See how it works for you. Please have at it.

There's the legendary ball test. It's a fun test of physics and its knowledge of Python and, like, weird game mechanics. Yeah, that's one of the best ball tests I've seen to date.

So, I've got my classic pelican SVG benchmark, where I ask it to draw me a picture of a pelican riding a bicycle. What I've never done before, though, is have it do that in a loop. So it draws the pelican, then looks at what it's drawn and has another go.

So, I made this little platformer. It was probably, like, three prompts, and I haven't touched the code at all. So that's been pretty cool. You have to kind of stomp on the enemies. I'm not very good at this game. Since I added the lava, I've not beaten the game yet. So, yeah, I'm going to have to make some adjustments in order to actually beat it.

Keep dying.

I'm coming to it from a practical user point of view, trying to solve an actual problem with this model. What set of problems would I solve? What's the experience?

The original ball prompt was way too easy, so I gave it a significantly harder one. It went a little crazy with it, but it doesn't have many opinions of its own. Steering it makes it do what you tell it to.

I do like how the pelican's feet are on the pedals. That's a rare detail that most of the other models I've tried this on have missed.

I'm taking my product, ChatPRD, which is an AI for product managers, and I'm updating the key parts to use the new model. And then I'm comparing inputs and outputs.

The first time, it really nailed all the mechanics, but visually it wasn't that interesting. So I just said, "Make it look better." And it added mountains, it added stars, it added gradients, it added some bloom effects.

The point is that GPT-5 is more agentic. Let's see what it does. So, this is "make a tic-tac-toe game," and this is o3. It does a pretty basic tic-tac-toe game. This is the same prompt with GPT-5. It's a lot more flashy. It's got sound and an AI, which, I mean, it just beat me. Yeah. I mean, we're going to draw. Oh, I lost again.

I'd love to hear what you feel about GPT-5.

It ran so fast that I didn't think it was actually running the tests.

You know, we had, like, 30 minutes. I made a personal website. I have a platformer game. I used it a bunch to improve our actual app. It's a pretty insane amount of stuff.

I have this Chrome extension I've been working on. I primarily focus on agentic coding, so I wanted to test its maximum capability. I think it was much better than other models in terms of adding telemetry, using that telemetry to make inferences, and then fixing those bugs. I think that was the key breakthrough.

I dumped the documentation of my own little Python library for working with LLMs into it and then told it to build me a script, based on this documentation, that runs my pelican benchmark in a loop. The bicycle was flawless. Even today, most models have trouble drawing bicycles because most humans have trouble drawing bicycles.

The thing that stood out the most to me was that it kind of just did what you told it to, and it wouldn't insert its own things. The system prompt really shapes the behavior.

It is an engineer. Like, you read the outputs. It's exceptional at code generation. I also had the same experience on the front end: none of the current models, I think, have good taste. They all do, like, the AI slop aesthetic, and this one actually kind of one-shotted things that I thought looked nice from a front-end perspective.

It's really good at Tailwind and, like, good styles. I just handed it an ugly app and said, "Make this less ugly." The

### Segment 2 (05:00 - 06:00) [5:00]

first try did that really well.

So it's unopinionated, except for gradients and making your websites look nice.

Yeah, it's very tasteful with design. I found, in general, when I had it do this little drawing app, it implemented a color picker, and the color picker worked, and then it had a thickness slider, and the thickness slider worked. Usually I don't see that.

When I was making a platformer game, at some point I said, "Add a jetpack." It added a jetpack, but I had to actually pick up the jetpack, and it added a little fuel level. That's pretty cool.

Up until very recently, I've had zero trust in these things. I know that they will hallucinate all sorts of made-up stuff. So I love feeding in raw source code and asking for documentation without giving it any of the existing documentation. And it nailed it. It gave me the exact information I needed. It gave me a full architectural overview. It was clearly very good at consuming a quarter of a million tokens. Trust, because for me, as a sort of constituent code reviewer, my trust issues are beginning to fall away.

Yeah.

He's saying GPT-5 is so good, he's turning into a vibe coder. That's what I heard.

There we go. That's exactly it.

As a product builder, that was my exact takeaway as well. I was like, this is the first one I would trust to go wild in a pretty large codebase.

As an AI collaborative coder, you can actually trust it as a coworker. I think that's ultimately what all of us want.

It's definitely by engineers, for engineers. Highly technical, exceptional, instructable, can execute, solves the hard problems. Everybody needs one of these on their team.

Any other parting thoughts? What's next?

Let's see GPT-6.
