This image started as pure noise

Steve Mould 08.05.2026 455,991 views 23,618 likes

Video description

You can buy my books here: https://stevemould.com/books
You can support me on Patreon and get access to the exclusive Discord: https://www.patreon.com/stevemould
just like these amazing people: Glenn Watson Peter Turner Joël van der Loo Matthew Cocke Mark Brouwer Deneb
Twitter: http://twitter.com/moulds
Instagram: https://www.instagram.com/stevemouldscience/
Facebook: https://www.facebook.com/stevemouldscience/
TikTok: https://www.tiktok.com/stevemould
Buy nerdy maths things: http://mathsgear.co.uk

Table of contents (1 segment)

Segment 1 (00:00 - 02:00)

The way you train a diffusion model neural network is you take some images. That's the training data. And you tweak the RGB values of all the pixels randomly. That's called adding noise. You then ask the model to tell you how it thinks you tweaked all of those pixels. Like, "I think this pixel is too green by this much, this pixel is not red enough by a small amount," and so on. And it gives you that as an array of values. In other words, you're asking it what the noise was that you added to the image.

And remember, this is still the training stage. So, at this point, you say, "No, that's terrible, what you came up with. This was the actual noise that I added. Now, go away and fix all of the parameters in your model so it does a better job next time." But actually, you're doing it with millions of images at the same time. So, you're saying, "Look, you got all these pixels wrong by this much in this image, and millions of images. Tweak all of the parameters inside your neural network so that it's better for all of these images at the same time." I told you it was wild.

So anyway, you train it on noisier and noisier images. In other words, images where you've messed about with the values of all the pixels more and more, until eventually you can give it an image that's just pure noise, completely random pixels. And so then when it gives you a prediction of the noise that was added to an image, when you remove that prediction, it will give you an image that looks like something recognizable from the training images. Now, at this point, you have no control over what comes out at all. It's just going to be random what you get.

Well, the model is able to figure out what noise needs to be removed from an image because it has learned to understand features of images.
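The training loop described above can be sketched in a few lines of Python. This is only a toy under loud assumptions: the "images" are four-pixel lists, the noise is simply added on top of the pixels, and the "model" is a single learned template rather than a neural network. The names `add_noise`, `predict_noise`, `train`, `denoise`, and `TRAINING_IMAGES` are made up for illustration, and the loop uses one image per step instead of millions at once.

```python
import random

def add_noise(image, sigma, rng):
    """Tweak every pixel randomly; return the noisy image and the noise used."""
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    noisy = [p + sigma * e for p, e in zip(image, noise)]
    return noisy, noise

def predict_noise(template, noisy, sigma):
    """Toy 'model': whatever differs from the learned template is noise."""
    return [(v - t) / sigma for v, t in zip(noisy, template)]

def train(images, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    template = [0.0] * len(images[0])   # the model's parameters
    n = len(template)
    for _ in range(steps):
        image = rng.choice(images)
        sigma = rng.uniform(0.5, 2.0)   # noise level for this example
        noisy, noise = add_noise(image, sigma, rng)
        predicted = predict_noise(template, noisy, sigma)
        for i in range(n):
            # "This was the actual noise that I added": compare, then adjust.
            error = predicted[i] - noise[i]
            # Gradient of the mean squared error with respect to template[i].
            grad = (2.0 / n) * error * (-1.0 / sigma)
            template[i] -= lr * grad
    return template

def denoise(template, noisy, sigma):
    """Remove the predicted noise from an image."""
    predicted = predict_noise(template, noisy, sigma)
    return [v - sigma * e for v, e in zip(noisy, predicted)]

# Hypothetical training data: three slightly different four-pixel images.
TRAINING_IMAGES = [
    [0.9, 0.1, 0.1, 0.9],
    [0.8, 0.2, 0.2, 0.8],
    [1.0, 0.0, 0.0, 1.0],
]

template = train(TRAINING_IMAGES)

# Start from pure noise and remove the model's noise prediction.
noise_rng = random.Random(1)
sigma = 2.0
pure_noise = [sigma * noise_rng.gauss(0.0, 1.0) for _ in range(4)]
generated = denoise(template, pure_noise, sigma)
print([round(v, 2) for v in generated])
```

In this toy, removing the predicted noise from pure noise lands on the learned template, which sits among the training images. A real diffusion model replaces the template with a neural network and removes noise over many small steps.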
And in some ways it's very similar to the way a language model understands the meaning of text, which is to say it goes through several iterations where its understanding of image features becomes more and more nuanced. So maybe in the first iteration it's just picking up on where the edges are in the image, and then in later iterations maybe it's picking up on concepts like furriness, or maybe light glinting off a shiny surface, and then later on maybe that it's in the style of a Monet. It would just be a long list of values that represent these things. And again, if you actually interrogate the values, it might be very difficult to actually pick out specific things that we would understand.

And that's where a model called CLIP comes in. CLIP was trained on millions of images from the internet and their captions. So during training, it was learning the meaning of the text and the features of the image, and the vectors it was producing for both of those things it was putting in the same place in a shared vector space of semantic meaning and image features. And so the model will now guess what noise needs to be removed in a way that pushes the image towards having features that are encoded in this vector. Yeah.
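The idea of a shared vector space can be illustrated with a small sketch. Everything here is an assumption made for illustration: the three-number vectors are hand-written stand-ins, not real CLIP outputs, and the readable axes (furry, shiny, Monet-like) are hypothetical, since the transcript notes that real values are hard to interpret. `nudge_towards` is only a cartoon of steering features toward a caption, not how a real denoiser is conditioned on text.

```python
import math

def cosine_similarity(a, b):
    """How closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical caption vectors; pretend axes: (furry, shiny, Monet-like).
CAPTION_VECTORS = {
    "a fluffy cat": [0.9, 0.1, 0.0],
    "a chrome teapot": [0.1, 0.9, 0.0],
}

# Hypothetical image feature vectors living in the same space.
cat_photo = [0.8, 0.2, 0.1]
teapot_photo = [0.0, 1.0, 0.1]

def best_caption(image_vector):
    """Pick the caption whose vector lies closest to the image's vector."""
    return max(
        CAPTION_VECTORS,
        key=lambda c: cosine_similarity(image_vector, CAPTION_VECTORS[c]),
    )

def nudge_towards(features, target, strength):
    """Move a feature vector part of the way toward a target vector."""
    return [f + strength * (t - f) for f, t in zip(features, target)]

print(best_caption(cat_photo))
print(best_caption(teapot_photo))

# Steering: features of a half-denoised image get pulled toward the caption.
target = CAPTION_VECTORS["a fluffy cat"]
vague_features = [0.3, 0.4, 0.5]
before = cosine_similarity(vague_features, target)
steered = nudge_towards(vague_features, target, strength=0.5)
after = cosine_similarity(steered, target)
print(round(before, 3), round(after, 3))
```

Because captions and images share one space, matching a caption to an image is a matter of comparing directions, and steering an image toward a prompt amounts to increasing that similarity.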
