# Codex checks its work for you

## Metadata

- **Channel:** OpenAI
- **YouTube:** https://www.youtube.com/watch?v=dHCNpcNyoFM
- **Date:** 11.02.2026
- **Duration:** 2:24
- **Views:** 24,145
- **Source:** https://ekstraktznaniy.ru/video/11113

## Description

Javi walks through a logging refactor and shows why Codex's self-verification is a step change: the model runs the app, finds the right session, and proves logs still flow.

Takeaways:
- Codex can validate its work by running tests and launching the app.
- It excels at broad refactors that touch many files.
- The model can find session IDs and query tools on its own.
- Verification collapses a risky manual loop into minutes.

When the agent can prove correctness, you can move faster with less risk.

Chapters:
00:00 Why Codex has been a step change
00:18 Self-verification: run tests and launch the app
00:52 The task: a logging refactor across many files
01:10 The risk: do not break observability
01:28 How this used to be verified manually
01:35 Ask the model to verify logs end-to-end
01:50 It finds the session ID and queries logs MCP
02:03 Proof: logs still pipe, task done fast

## Transcript

### Why Codex has been a step change [0:00]

I've been a huge fan of Codex for a lot of the last year. It really dramatically changed how I work and how I build software, and the app has been another step change; it's made my job even more fun. I trust that it's going to make a lot more progress in one go without, you know, babysitting or handholding. And especially its

### Self-verification: run tests and launch the app [0:18]

improved ability to validate the work that it's done: to write the code and then, like, automatically run tests or even launch the app and do checks like that. It means that when I get back to that session and it says that it's done, a lot more of the time it's not just, hey, I wrote a bunch of code and now, you know, you have to fix compiler errors or whatever it is. But actually, this code works and, you know, might need some refactoring or polishing, but I can immediately start testing this thing that I asked it to build. And that's been just transformative for all sorts of work. So this is a
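The verification loop described here — write the code, run the tests, then launch the app and check it — can be sketched as a small script. This is a minimal illustration, not the actual Codex tooling; the commands you pass in would be whatever your project uses.

```python
import subprocess


def verify_change(test_cmd, smoke_cmd, timeout=60):
    """Self-verification loop: run the test suite, then smoke-test the app.

    test_cmd and smoke_cmd are command lists (e.g. ["pytest", "-q"]);
    both are placeholders for the real project's commands.
    Returns True only if both steps exit cleanly.
    """
    # Step 1: run the automated tests.
    tests = subprocess.run(test_cmd, capture_output=True, text=True)
    if tests.returncode != 0:
        return False
    # Step 2: launch the app briefly and make sure it comes up without crashing.
    app = subprocess.run(smoke_cmd, capture_output=True, text=True, timeout=timeout)
    return app.returncode == 0
```

A hypothetical invocation might be `verify_change(["pytest", "-q"], ["./run_app.sh", "--smoke-test"])`.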

### The task: a logging refactor across many files [0:52]

task where I've been doing a little refactoring related to logging. And this is one of those tasks where Codex can really excel, because it's not a complicated task, but it did require modifying a lot of files. And there was also a bit of risk, because you're modifying, you know, sort of a crucial component of

### The risk: do not break observability [1:10]

the app, where a regression in this case would have meant our logs stop working and, you know, our observability pipeline breaks, right? So our ability to see the logs in the beta version of the app, so we can diagnose bug reports, would break. So the way that I would have done this before Codex is, you know, I've

### How this used to be verified manually [1:28]

made a change. I'm going to compile the app and run it and look if the logs are there, right? So in this case, I just told the model and we can

### Ask the model to verify logs end-to-end [1:35]

give that a try. I can see it using our logs tool and it's querying some logs. It ran the app and then it tried to find the session ID by

### It finds the session ID and queries logs MCP [1:50]

writing some Python code. It found it right here. Nice. And then now it's using the logs MCP to go and query that. Yeah, so I just came back to our conversation, and the
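The "find the session ID by writing some Python code" step might look something like this regex scan over captured app output. The `session_id=` line format is an assumption for illustration; the real app's log format may differ.

```python
import re


def find_session_id(app_output):
    """Return the first session ID found in captured app output, or None.

    Assumes a line like 'session_id=3f2a9c' appears in the output;
    this format is hypothetical.
    """
    match = re.search(r"session_id=([0-9a-f-]+)", app_output)
    return match.group(1) if match else None
```

For example, `find_session_id("boot ok\nsession_id=3f2a9c\nready")` would return `"3f2a9c"`.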

### Proof: logs still pipe, task done fast [2:03]

model's telling me that it ran the command that I told it to, it found the session ID, and then it ran this and found some log statements. So I can tell that after our refactor, you know, logs are still being piped. So awesome. That's a piece of work that just took, like, 10 minutes on this task. To me, that's very cool.
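The final proof — query the logging backend for that session and confirm log entries still arrive — reduces to a check like the one below. The `query_logs` callable is a stand-in for whatever the actual logs tool exposes; it is assumed to take a session ID and return that session's log lines.

```python
def logs_still_flowing(query_logs, session_id, expected_marker):
    """Return True if the logging pipeline produced entries for the session.

    query_logs is a placeholder for the real logs-query tool: a callable
    taking a session ID and returning a list of log lines for it.
    expected_marker is a substring we expect to see in at least one line.
    """
    entries = query_logs(session_id)
    return any(expected_marker in line for line in entries)
```

With a working pipeline this returns True for the refactored app's session, confirming the refactor didn't break observability.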
