# Visualizing Neural Network Internals

## Metadata

- **Channel:** sentdex
- **YouTube:** https://www.youtube.com/watch?v=ChfEO8l-fas
- **Date:** 14.02.2024
- **Duration:** 53:40
- **Views:** 54,802
- **Source:** https://ekstraktznaniy.ru/video/11401

## Description

Visualizing some of the internals of a neural network during training and inference.

Starting and full code: https://github.com/Sentdex/neural-net-internals-visualized

Neural Networks from Scratch book: https://nnfs.io
Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Reddit: https://www.reddit.com/r/sentdex/ 
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Instagram: https://instagram.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex

## Transcript

### Segment 1 (00:00 - 05:00)

What is going on everybody, welcome to a bit of an old-style sentdex video. A little bit ago I released two animations visualizing what is happening inside of neural networks: one showing the various layer outputs, all the way from the input to the actual output confidences for a classification model, and another showing the live weights changing inside the model as it trains. I really made these purely for visualization's sake, but people were asking how it was done. It's mostly just a lot of matplotlib code, but I can go ahead and show how it's done. The code is not the greatest (it wasn't really meant for public consumption), but my code never is, so I think we'll be okay. I'm going to start with the following code already in place. Now, you might see this and think, hey, that's basically everything, isn't it? It's really not. The way I've made these animations is with the Neural Networks from Scratch architecture itself. I did that because it's the one I know best, like, from scratch; it's the lowest-level and easiest one to truly understand, so that's what I'm using here. You don't need to have followed along with, or bought, the book to follow along here, because all we're doing is utilizing that library. If you want to learn more about how the library actually works, go for it, but otherwise you can treat it like you'd treat TensorFlow or PyTorch or whatever your favorite machine learning library is (which had better be PyTorch). Anyway, this first little bit of code is just going to download the Fashion-MNIST dataset, and that
Fashion-MNIST dataset, if you don't know it, is like MNIST, only rather than digits it's articles of clothing: t-shirts, pants, shoes, bags, things like that. It has 10 total classes and it's a little bit harder to learn than regular MNIST, but still a very easy one for a neural network model to learn, and it's a little more interesting because everyone just does MNIST, so it's a tiny step above. You can feel free to do this sort of project with your own data and your own models, though. So this first bit of code simply downloads that dataset. I'm going to host this notebook in starting form and then in completed form once I'm completely done, so one of the notebooks will say "start" or something like that, and the other will be totally complete, because I'll be utilizing Copilot as we write code and I might copy and paste some other code; I don't want anybody freaking out thinking they have to keep up with everything that's happening. Instead I'll just explain everything as I go through it. Anyway, this is, I think, four cells: it grabs the data, downloads it, and extracts it. Down here is basically all that Neural Networks from Scratch code; let's see if I can't zoom out a little bit. We've got some layer definitions, we've got dropout, activations, all that stuff. The only modification, if you have followed along (I don't know why my brain keeps wanting to say "follow in along") with the Neural Networks from Scratch book and made it to the end: the only things I've added here are the training dictionary definition
and then the actual utilization of it. For every epoch we add an empty dictionary, and at every step, inside that dictionary, for each layer we have another dictionary where we're adding the per-layer weights data, bias data, the derivatives of the weights and biases, and the weight and bias momentums, all of which are very interesting pieces of information that could be visualized. It's kind of funny: everybody calls those internal layers of a neural network "hidden layers," and honestly no one ever looks at them, but they're not really hidden. All anybody ever cares about is that final output layer, running loss and accuracy on it, but really there are all these other layers, and lots of stuff is going on there, and if you never actually look, you're kind of missing out. Those layers are anything but hidden; the values are definitely there.
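A minimal sketch of how such a per-step training dictionary could be filled is below. The attribute names (`weights`, `dweights`, `weight_momentums`, etc.) follow the NNFS-style layer objects described in the video, but this is an illustrative reconstruction, not the exact code from the notebook.

```python
# Hedged sketch: snapshot each trainable layer's parameters and gradients per
# training step. Attribute names follow the NNFS-style architecture and are
# assumptions, not the video's exact code.
import numpy as np

training_dict = {}

def record_step(training_dict, epoch, step, layers):
    """Store copies of each layer's weights, biases, gradients, and momentums."""
    epoch_log = training_dict.setdefault(epoch, {})
    step_log = epoch_log.setdefault(step, {})
    for i, layer in enumerate(layers):
        if not hasattr(layer, "weights"):
            continue  # skip activation/dropout layers that have no parameters
        step_log[i] = {
            "weights": layer.weights.copy(),
            "biases": layer.biases.copy(),
            "dweights": layer.dweights.copy(),
            "dbiases": layer.dbiases.copy(),
            # momentums may not exist before the optimizer's first update
            "weight_momentums": getattr(layer, "weight_momentums",
                                        np.zeros_like(layer.weights)).copy(),
            "bias_momentums": getattr(layer, "bias_momentums",
                                      np.zeros_like(layer.biases)).copy(),
        }
```

Copying (rather than referencing) the arrays matters: the layer objects are mutated in place on every step, so without `.copy()` every snapshot would end up pointing at the final values.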

### Segment 2 (05:00 - 10:00)

Anyways, it's kind of a misnomer and kind of a sad thing that we call them hidden layers, and really, to this day, people pay very little attention to the internals of a network. As you'll see, this is obviously a really basic example, but you can derive a lot of very useful insights from not just analyzing that output layer, but from why the output layer is doing what it's doing: why is the model getting confused on certain things, and which things is it confused on? It becomes really easy to tell if you simply pop open the hood just a little bit and look. So the only addition I've made is this training dictionary, and then basically, as the model trains, we grab these attribute values for each layer and save them into a dictionary. That way, every time we have a visualization idea, we don't have to retrain the model and re-grab these values as it trains. Coming down here, first we create and split the data, so I'll run that cell before I forget. Okay, I guess it's only three cells, so this third cell simply defines the actual model itself: in this case two hidden layers of 32 units each, and then the final output layer. We're going to use the Adam optimizer and categorical cross-entropy for loss, because we're going to have an output of, I'm sorry, 10 classifications, where each value is a sort of degree of confidence. We'll train with a batch size of 128, cool. When everything's done, we save that training dictionary, so let's go ahead and run that and let this bad boy train. You can see that before epoch 1 was even over we were at about 75% accuracy, and by the final epoch it looks like we're at about 85, 82, 80; it gets a little wild actually at the end.
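The architecture just described (784 inputs, two hidden layers of 32, a 10-class output) can be sketched in plain NumPy. This stands in for the NNFS library classes used in the video; all names here are illustrative, and with random weights the confidences are meaningless until trained.

```python
# Hedged, pure-NumPy sketch of the forward pass for a 784 -> 32 -> 32 -> 10
# network. This is a stand-in for the NNFS classes, not the video's code.
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    return x @ w + b

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=1, keepdims=True)

# Randomly initialized parameters for the three dense layers
w1, b1 = 0.01 * rng.standard_normal((784, 32)), np.zeros((1, 32))
w2, b2 = 0.01 * rng.standard_normal((32, 32)), np.zeros((1, 32))
w3, b3 = 0.01 * rng.standard_normal((32, 10)), np.zeros((1, 10))

def forward(x):
    """Forward pass: returns a vector of 10 class confidences per sample."""
    a1 = relu(dense(x, w1, b1))
    a2 = relu(dense(a1, w2, b2))
    return softmax(dense(a2, w3, b3))
```

Each dense layer and its activation are separate steps, which is exactly why the model later reports six layer outputs rather than three.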
There, is it done? No, still running. So while we wait on that, let me close this and we'll actually begin to write some code. If you're following along I'll do my best, but at this point, as I've evolved, I hope my viewers have somewhat evolved: if it says you're missing a package, download the package. But just in case, with pip (Python 3.10) we'll be installing tqdm, matplotlib if you don't have it (and if you don't have matplotlib and you're watching this video, you're screwing up life), and same with opencv-python, honestly. The only other thing: if you're following along on Ubuntu with a notebook, we're going to want both matplotlib inline and the Tk backend, and I want to say on Ubuntu you have to apt-get something like python3-tk (python3-tkinter). Google it, GPT it, whatever you want to do, figure out what the right one is. Anyway, there will be little things like that you'll probably have to grab and make sure you have. Okay, so we've got the model, it's trained. We can view the scrollable element, and... I'm not even sure what... oh, this is the whole... oh my goodness, that's the whole training dictionary. I'm going to regret having done that; did we really print it? Oh my gosh, we did. I'm going to get rid of that, I shouldn't have viewed the scrollable element; I think it's just going to give us lag. So the first thing we'll do is get a quick visualization of the data we're working with: import matplotlib.pyplot as plt (that's fine, I hit my caps lock), and then we'll just do plt.imshow and show X_test[0]. I already can't type and I just started; it's been too long since I've done an old-style video, apparently. Okay, so we'll plt.imshow it; this should show inline because we're in a notebook, very good.
It looks like a pullover or some sort of jacket, something like that, and it's obviously giant; in fact, let's make it even more giant so y'all can read my text, hopefully. So this is our input data, or actually a reshaped version of our input data: 28x28 here, but really it's just a vector of 784 values. Not only that, we can also run model.predict on X_test[0], and any time we do anything with this model object, all the model attributes get modified. So when we do model.predict, it outputs that output layer, and that's really all classification models are doing, right? They output a vector of 10 values, and whichever one is the argmax is the predicted value. So we can say np.argmax and run model.predict again, and we can see that class 2 is the predicted value.
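The inspection steps just described can be sketched as one helper. `model`, `X_test`, and `y_test` are assumed to come from the training notebook; the function name and structure are illustrative, not the video's exact code.

```python
# Hedged sketch: show one test sample and return the predicted class.
# `model` is assumed to expose .predict() returning 10 class confidences.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe; interactively the default backend is fine
import matplotlib.pyplot as plt

def show_and_predict(model, X_test, y_test, sample_num=0):
    """Display one flattened test sample and print/return the predicted class."""
    sample = X_test[sample_num]                      # flat vector of 784 values
    plt.imshow(sample.reshape(28, 28), cmap="gray")  # reshape for display
    plt.show()
    confidences = model.predict(sample)              # vector of 10 confidences
    predicted = int(np.argmax(confidences))          # highest-confidence class
    print(f"predicted: {predicted}, truth: {y_test[sample_num]}")
    return predicted
```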

### Segment 3 (10:00 - 15:00)

We can also print out the correct value, so y_test[0]: okay, the correct value was 2. Not only did we see that full output layer here, which is generally where people stop, but like I was just saying, you can go very deep into these models. For example, we can do `for layer in model.layers`: every time you do a forward pass through this model (and the backward pass), it updates the various values on that model object, so at any point we can iterate through the layers and grab information. I can print layer.output (we don't actually want .shape), and these are all of our outputs. As you can see we got one, two, three, four, five, six. Now you might be thinking: we should only have three layers, right? You've got your input, then your hidden layers, then output; so really, if we're excluding the input layer, which is just your input data, we should only have three. The reason we have six is that we have each layer and then the activation of that layer, which in this case is rectified linear. So these are the output values of each layer, and then the output values after activation, which is why in this vector here we can see zeros: those zeros replace essentially any instance of a negative number, because of that rectified linear activation function. Okay, so now what we can actually do is visualize essentially all these values. The way we're going to do that: the most important thing whenever you're doing matplotlib is `from matplotlib import style`, and the style we almost always want to use is not the one your editor keeps recommending; it is actually dark_background, that
is the style you want to use, not fivethirtyeight (it always wants to use fivethirtyeight). Now we're going to define visualize_the_layers and pass two parameters: it could be "sample," but I'm going with sample_num since that's what my notes have, and fig. The only reason I'm passing the figure in, rather than defining it here, or clearing or updating it inside the function, is that eventually I want to be able to animate this, or use plt.pause, and I think in both instances, but for sure with animation, you want the figure to be outside the function. So this might do a thing, but this isn't actually where we're going yet; let's code for ourselves for once. input_data will equal X_test[sample_num]; then output_truth (turth or truth, whichever, oh my gosh, truth) equals y_test[sample_num]; and finally the actual prediction, yes sir, just like that, very good. Then we're going to change some colors based on whether or not the prediction was correct. If I come over here you can see (maybe not those, those are averages; let's go over here): as it's predicting, sometimes it shows red, meaning it got the label wrong. That's all I'm trying to mimic. So we're going to say `if` (and there's probably a better way; you'd probably just check the prediction first before doing what I'm about to do, but again, I did not write this code thinking I was going to show it off, and I'm definitely not showing it off, I am showing it though) np.argmax is correct. Man, sometimes Copilot is pretty crazy; that's not exactly my notes, but it's close enough. So if it was correct, we'll say title_text equals an f-string, "Correct, prediction was" something, and in fact we really should have saved the prediction to a variable, darn it. Let's say pred_class; I just have to fix it, it's causing me too much pain every time. Correct, prediction pred_class; truth should be output_truth, that's fine. So if that's true, color will be green. We actually don't really even need to set color, so in fact what I'm going to say here is fig.sup

### Segment 4 (15:00 - 20:00)

title with the title text; font size will be 14 (I did font size 20; you can tinker with this as you please). I'm going to save the line and just say color equals green. Okay, so that's if it was correct. If it's incorrect, hopefully Copilot will just solve the problem for me: "Wrong, prediction ... truth ...", cool, very good. Now that we've done that, looking back at the chart, we essentially have one axis here, 2, 3, 4, 5, 6, seven total axes that we want to graph, and everything is workable as a vertical chart, so that's what we're going for next. We'll say ax0 = plt.subplot2grid; we can take the starting code here but modify it: because we're essentially doing those seven verticals, it's going to be a 1x7 grid, this one starts at the (0, 0) spot, and rowspan is 1, colspan also 1. Hey man, one, okay. So that's axis zero. Then we're going to do a quick speed test for Copilot and see how fast it goes; it's on the ball today, I love it, keep it up. Some days it's just too slow and it would be faster for me to copy-paste, but that worked out, and Copilot is smart enough to understand that we are out of axes: this is all we want, right, because it's a 1x7 and we filled them up, I love it. Okay, now that we have our axes, we can actually plot to them. We're going to call these values what they really are, so layer_one is going to be equal to model.layers[0].output; is that layer one? Now, what do we want to do to layer_one? Right now it's effectively a vector, right, so if we were to graph it, it would be a horizontal vector. We really want to graph it vertically, and to do that we're going to use NumPy to rotate it: np.
rotate, or rot, and in fact I think it's rot90, right, yeah, np.rot90; we want to rotate that vector. Next, how many times do we want to rotate it? You might be thinking that to go from horizontal to vertical we'd rotate one time, but actually we need to rotate three times, effectively 270°, because if we rotate just once, 90° counterclockwise, it's actually upside down. If we didn't do this, you probably would never know, unless you were paying attention like I was when I started with k=1: I was paying attention to the output layer and going, no, the correct classification was 2, this was accurate, yet it's showing, I don't know, what would it be, a 7, because it's flipped. That's the only reason I realized this; I'm not a big brain, I just happened to notice something was wrong. And the axes we want to perform this on, I believe, are 0 and 1, because we're rotating across both rows and columns. Okay, that's layer_one. Next, layer_one_activated is going to be basically the exact same thing; all we're doing is giving it a slightly more appropriate name that I think just makes more sense, to me anyway. Then I'm basically going to let Copilot do all this for us. It even suggests layer_three_activated; how smart is Copilot? Pretty smart. We do actually want to show that; I don't want to let it get too far ahead of us, but we do want it. So we set the data, we've got everything correct, now we're going to run through it: ax0 is going to imshow the input data reshaped, kind of like we just did above, to be honest, and then we want to imshow layer_one, layer_one_activated, and so on. So
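The rotation trick described above is easy to verify on a tiny array: `k=1` leaves the vector vertical but upside down, while `k=3` (270°) preserves the order. This is a minimal sketch, not the notebook's code.

```python
# Hedged sketch: turning a horizontal vector vertical with np.rot90.
import numpy as np

row = np.array([[1, 2, 3, 4]])     # shape (1, 4): a horizontal vector

upside_down = np.rot90(row, k=1)   # one 90° CCW turn: vertical but reversed
vertical = np.rot90(row, k=3)      # three turns (270°): vertical, order kept

print(upside_down.ravel())         # [4 3 2 1]
print(vertical.ravel())            # [1 2 3 4]
```

`np.rot90` rotates in the plane of `axes=(0, 1)` by default, which is exactly the row/column rotation the video describes.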
it's a little different order than I have in my notes, but I think we'll be okay; I bet Copilot will figure it out. Actually, for ax0 I don't think I ever did a title. Let's see if it starts doing ax1; it does, but it's not quite what I want. ax1.imshow(layer_one, cmap=...): that might be acceptable, but really the color mapping I want for layer_one is red-yellow-green, because layer_one could be anything from negative to positive. For the others, the activated layers, let's just go down one more.

### Segment 5 (20:00 - 25:00)

Cool, layer_one_activated. For these, because it's rectified linear, you can't have a negative value, so we won't actually have red; it will instead just be yellow to green. Now, "yellow to green" sure looks a whole lot to me like white to yellowish to green, but anyways, I think that makes way more sense for the activated layers, which are basically this one and the final output layer. That's why I went with it, but don't be afraid to change the colors to whatever the heck you want; you could go with viridis, or however that's pronounced, I don't even know. So then ax1, ax2, ax3, ax4, cool, we'll show those; ax5, that's fine, and ax6, that's fine; axis zero, axes off. That actually might look better than what I have, where the axes are on. Do we want to turn the axes off? It might be nice to turn off the x-axis, potentially, but ah, that's enough; we've done more than enough, I'd say. Okay, so that can be our function, I'm happy with that. So I'm going to say fig = plt.
figure, with figsize, we'll go with 12x12; it might look a little goofy because of, you know, the place I'm recording in, but this should be fine. Okay, so let's run it: visualize_layers. We don't need a plt.show because we're just going to do this. Okay, yeah, that is quite large, cool. All right, so in my version I did actually give titles to each one, like "L1 activated"; we could do that if we want, but it's not totally necessary, I think you can tell what's what here. So that looks pretty good, but of course you can see here that it's very confident; it seems to be predicting class 2 quite well. But this is essentially very anecdotal, it's just one sample, and instead we could iterate through the above. So now I'm going to go %matplotlib tk (I will zoom in, thank you for reminding me, person yelling at their screen). Next, import os, because we're going to save them, and then from matplotlib we're going to import animation. I'm going to set a limit of a thousand; I don't want to do more than a thousand samples. And we want to save these so we don't have to keep making them, because it does take a little while, and later, if you wanted to make cool animations, you'd want to have them already saved and then use FFmpeg or something else to, you know, compile them. So that's the directory name; if it doesn't exist, yep, cool, make it, thank you. I love Copilot; I've heard good things about Cursor, one day maybe I'll try Cursor, but I just don't know why I would. So fig = plt.figure, okay, we'll keep it 12x12. Next we're going to animate; i is essentially the frame, but that's all right. Are you kidding me? Yes, that's basically what we want to do, so cool. We want to animate based on the frame, which will also serve as our sample number. Every time we animate, we want to clear the whole figure, and
then we want to visualize into that figure. And in fact, are we defining... we're not, right? No. Seems a little sloppy, something seems goofy there, but I think we're fine, because we redefine the axes every time, so that'll be fine. So: fig.clear, visualize_layers, plt.savefig the figure. This is actually acceptable, but the way I did it in my notes, I also added the actual truth in front of the filename, so later you could go back and inspect, I suppose, the ones that aren't accurate. Cool. Okay, once we've done that, we can actually run the animation: animation.FuncAnimation, yes, with the figure, animate, and the frames; how many frames do we want? We just set the limit. Interval of 100; in fact, I'd probably just go with whatever the heck the default interval is, honestly, and that's good. Then we just need a plt.show, yep, too good, and we can go ahead and run it. So here it's just iterating through the samples, and as you can see, it's not the fastest thing in the world, which is why it could be convenient to go ahead and save them. But one thing we can already see: two, four, and six seem to be often misclassified, and whatever the heck we just saw was all over the charts. So it's clear that, despite such a high accuracy for this model, at least this particular class seems to have some confusion at least some of the time: it's clearly a trend to often predict two, four, or six for class 2, and you could probably guess that four and six also often classify as two, and possibly as the other one of six or
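The clear-then-redraw animation loop described above can be sketched as follows. `visualize_layers` is stood in for by a placeholder here, since the real function is built earlier in the video; names and parameters are assumptions.

```python
# Hedged sketch of the FuncAnimation loop: clear the figure each frame and
# redraw everything for sample i. The visualize_layers body is a placeholder.
import matplotlib
matplotlib.use("Agg")  # headless here; the video uses the Tk backend
import matplotlib.pyplot as plt
from matplotlib import animation

fig = plt.figure(figsize=(12, 12))

def visualize_layers(sample_num, fig):
    # Placeholder standing in for the per-layer plotting built in the video
    ax = fig.add_subplot(1, 1, 1)
    ax.set_title(f"sample {sample_num}")

def animate(i):
    fig.clear()               # wipe the previous frame's axes
    visualize_layers(i, fig)  # redraw everything; i doubles as the sample number

anim = animation.FuncAnimation(fig, animate, frames=1000, interval=100)
# plt.show() would run it interactively; anim.save(...) would export it
```

Clearing the whole figure each frame is the "sloppy but fine" approach the video settles on: since the axes are redefined on every call, nothing stale survives between frames.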

### Segment 6 (25:00 - 30:00)

four. So we can start digging into that, and once we do, we can also look at the internal layers from there. So the next thing I really want to do: rather than going sample by sample, what if we look at the average for the entire sample truth? I think that will bring a little more attention to the common mispredictions, but also to what's happening internally in the model: are there actual differences, at what point in the layers do things go awry? This can help inform whether we need more layers, or bigger or smaller layers sometimes; it really depends, because if your model is too big, it will just overfit to all your samples, for example. From there you could also pay attention to the changes in these values, how active things are, and especially the weights, for example, which can kind of tell you when the model is about done training. Anyways, let's close this out, and instead of going sample by sample, let's go for an entire average. So: from tqdm import tqdm, and then we'll define layer_data_by_class. This is going to store all the classes of potential truths, and each truth will have another dictionary of each layer and all the predictions. Did Copilot just suggest that? Yeah, that's basically what I want, but it goes all the way to nine, so I guess it's not the most perfect; for the inner dictionaries we actually only want to go to five. The reason for this: for a sample class of zero, we want the zeroth layer, first, second, third, fourth, fifth, right, all six layers; for every single sample, every single layer, and then later we can average every single layer to get average values. So that's what we want for class zero, and then we
just basically want to do the same thing for class one, two... I'm not sure why Copilot ended there; why would it think it should end there? That's kind of weird. We want three, four... what happened? We'll figure something out here. Five... oh, there's no comma, I see. Okay, well, I thought we were saving time; we are not saving time. Okay, so five, then 6, 7, 8, 9, and finally we fully close off this dictionary. I think that's how we want it. So now we want to run through all of our testing data and start populating that dictionary: for data_n, so data number, in tqdm(range(len(X_test))). Why are you so ugly, what happened, did I screw something up here? Oh, X test... oh no, tried it again, okay. So for that data number, we essentially just need to run a predict, model.predict, because again, as soon as you do the predict it modifies all those attributes, so we can just run the prediction. And then we can grab... actually, of course we need the truth, god, stupid. The truth will be y_test[data_n]. Then we want to start adding all the layer data to whichever class it's supposed to be; we want to populate basically all of these for every sample. To do that: layer_data_by_class, then truth, then, yes, the layer, so zeroth, and then we append model.layers, that zeroth layer, and its output; I believe that is what we want. Okay, so then we'll do the same thing, yes, cool, all the way down. Come on, why does it keep wanting to stop at two? Oh no, this is also wrong. So
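The collect-then-average workflow described here can be sketched as two helpers. `model`, `X_test`, and `y_test` are assumed from the notebook; the loop shape (10 classes, six layer outputs per sample) follows the video, but the function names are illustrative.

```python
# Hedged sketch: group every test sample's per-layer outputs by true class,
# then average per class/layer. Names are assumptions, not the video's code.
import numpy as np

def collect_layer_data(model, X_test, y_test, n_layers=6):
    # {true_class: {layer_index: [per-sample output vectors]}}
    data = {c: {l: [] for l in range(n_layers)} for c in range(10)}
    for n in range(len(X_test)):      # the video wraps this range in tqdm
        model.predict(X_test[n])      # predict() updates each layer's .output
        truth = y_test[n]
        for l in range(n_layers):
            data[truth][l].append(model.layers[l].output)
    return data

def average_layer_data(data, n_layers=6):
    # mean over samples (axis 0) for each class/layer; skip empty classes
    return {c: {l: np.mean(data[c][l], axis=0)
                for l in range(n_layers) if data[c][l]}
            for c in range(10)}
```

Keying by the *truth* (not the prediction) is the point: a loop over layer indices also avoids the copy-paste bug the video runs into when writing the six appends by hand.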

### Segment 7 (30:00 - 35:00)

this is why you don't have messy code. In my head I'm thinking this is for every class, right, but it's not for every class: it's for every data sample. You figure out what the truth is, and then all we want to do is layer_data_by_class of whatever that truth is, and save every layer; we have these six layers, and we want to save essentially this information. This is probably why doing a loop instead of this makes a little more sense; `for n in range(number_of_layers)` or something like that would probably work a little better than what I've done here. Anyway, let's go ahead and fix this: three, four, and five. So again, given a specific truth, we access whatever that truth class was; given that class, we then populate and append the vector of values for that particular layer, which is the layer output and then also the activated layer output, and that's why we have six of them. Okay, once we've done that, let's go ahead and run it, just to make sure. Okay, cool, it runs, that's a good start. Now what we want to do is actually average this, so I'm going to say layer_data_by_class_averages, and that's going to start out as basically the exact same structure. Then: for class_n in range(10), since we already know we have 10, and for layer_n in range(6), because we have six different layers, we calculate the average: layer_data_by_class_averages[class_n][layer_n] (I'm going to go ahead and accept Copilot's suggestion) = np.
mean, and we want the mean of layer_data_by_class[class_n][layer_n], yes, on axis 0, which is the rows, and yes, I think that's everything. So now we'll have all the averages, hopefully; we'll find out really quick. Now what we want to do is graph the averages, so I'm going back to %matplotlib inline, thank you. Actually, I think we have everything we need; I don't think we have to import anything anymore. So the directory name we'll call class_averages, cool, and we'll go ahead and make that path just in case. There's also probably a much better way to do what I'm about to do, but we'll go ahead and set the figure as well: fig = plt.figure, cool, 12x12. Then class_sample_dict; there's got to be a better way, someone comment below the superior way. So this will just start empty, I suppose, 0, 1, hopefully Copilot will finish all the way up to class 9; right now this is just an empty dictionary with each class. Then we actually set it: we'll say class_sample_dict[0], and here we won't append, we'll just set it, equals X_test[3328] for example, and I think I might just copy and paste these. So what is going on here: all I'm doing is grabbing a representative sample of each classification. The test samples are in order by thousands: there are 10,000 samples, and every thousand is a new classification, and for some reason it doesn't go 0 through 9; it starts with, like, samples 0 to 999 being class 2, for example, which is why the prediction and ground truth for X_test[0] is actually a 2 and not a 0. Anyway, okay, once we have that, I'm also going to copy and paste this; it's only going to be for writing,
you don't really need to copy and paste it; for these you could even put in a random sample if you want. `class_description_dict` is just the actual text description for each classification: 0 is T-shirt/top, 1 is Trouser, 2 is Pullover, 3 is Dress, 4 is Coat, and so on. Okay, so once we have those, we are ready to
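The collection-and-averaging step described in this segment can be sketched roughly as follows. Note that the helper names (`record_sample`, `average_by_class`) and the exact dictionary layout are my own assumptions for illustration, not the repo's actual code:

```python
import numpy as np

# Assumed layout: for each sample we save 6 vectors (3 dense-layer
# outputs plus their 3 activated outputs), keyed by the sample's true class.
N_CLASSES = 10
N_LAYERS = 6

layer_data_by_class = {c: {l: [] for l in range(N_LAYERS)}
                       for c in range(N_CLASSES)}

def record_sample(truth, layer_outputs):
    """Append each layer's output vector under the sample's true class."""
    for layer_n, out in enumerate(layer_outputs):
        layer_data_by_class[truth][layer_n].append(out)

def average_by_class():
    """Average the saved vectors per class and layer (axis 0 = samples)."""
    averages = {}
    for c in range(N_CLASSES):
        averages[c] = {}
        for l in range(N_LAYERS):
            vecs = layer_data_by_class[c][l]
            if vecs:  # skip class/layer slots that never got data
                averages[c][l] = np.mean(vecs, axis=0)
    return averages
```

The `axis=0` is the key detail: each class/layer slot holds a stack of per-sample vectors, and averaging over axis 0 collapses them into one mean vector per layer.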

### Segment 8 (35:00 - 40:00) [35:00]

graph every single one of these averages. So, `for class_n in range(10)`: first `fig.clear()`; we should probably clear at the end instead, and in my notes I actually redefine the figure further down anyway, so I'll just redefine it again there; we're not going to worry about that yet. Then `fig.suptitle` will be "Average values for class" plus whatever `class_n` is, with the class description for `class_n` in parentheses. Cool, that'll be our title. Next we want those exact same axes, so I'm just going to copy them, make some more space here, tab over, and paste. Then we copy and paste all the previous plotting code too and just change the values, because the values are now the averages: rather than `model.layers[...].output`, it's `layer_data_by_class_averages[class_n]` and then the index of the value we want to graph; it's not `.output` anymore because we didn't save it as an object. So this replaces this, and this becomes a 1, then a 2 (it's going to be a miracle if I get away with this), three, four, and five. Cool. Then `ax.imshow`: it won't be the input data, it'll be `class_sample_dict[class_n]` reshaped to 28x28, `cmap='gray'`, axis off. Very good. And then `plt.savefig`; we'll save into the `class_averages` directory (we did make it, so we're fine) as `f"{class_n}.png"`.

Okay, now that we've done that, it should graph all of these for us right below. Let's see if we get an error... oh, it's actually going to work. Oh my goodness, incredible. How wonderful. Very cool. So by taking the averages we can see, for example, that class 0, a T-shirt, often misclassifies as a 3, which I think is a dress if I recall right, or a 6. For Trouser, the model is definitely very confident almost all the time. Class 2, Pullover: we can see a faint 0, mostly 2, which is the right one, but also a lot of 4 and a lot of 6. Then we get to the Dress, that 3, which is sometimes getting confused with a shirt, for example. Coming down to class 4, a Coat: it's very confused, a 2 a lot, a 6 sometimes, even a 3. Continuing along, class 6, Shirt, is potentially the worst one we've seen yet; it's all over the place. You get the idea. Bag: very confident. Ankle boot: pretty confident. Then what we could do is look at one of the ones that's frequently wrong, like 6. Let's look at 6 and 4. So if we pull up the images of 6 and
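The per-class plotting loop from this segment can be sketched like this. The stand-in data, the `dirname`, and the exact subplot layout are assumptions for illustration (the video's own axes layout differs in detail), but the suptitle/imshow/savefig flow follows what is described:

```python
import os
import numpy as np
import matplotlib
matplotlib.use("Agg")  # file-only backend; the video plots inline
import matplotlib.pyplot as plt

# Stand-in data so the sketch runs on its own; in the video these come
# from the trained model's averaged layer outputs and the test set.
N_LAYERS = 6
averages = {c: {l: np.random.rand(32) for l in range(N_LAYERS)}
            for c in range(10)}
class_sample_dict = {c: np.random.rand(784) for c in range(10)}
class_description_dict = {0: "T-shirt/top", 1: "Trouser", 2: "Pullover",
                          3: "Dress", 4: "Coat", 5: "Sandal", 6: "Shirt",
                          7: "Sneaker", 8: "Bag", 9: "Ankle boot"}

dirname = "class_averages"
os.makedirs(dirname, exist_ok=True)

fig = plt.figure(figsize=(12, 12))
for class_n in range(10):
    fig.clear()
    fig.suptitle(f"Average values for class {class_n} "
                 f"({class_description_dict[class_n]})")
    # One row per saved layer, drawn as a 1-pixel-tall heatmap
    for layer_n in range(N_LAYERS):
        ax = fig.add_subplot(N_LAYERS + 1, 1, layer_n + 1)
        ax.imshow(averages[class_n][layer_n][np.newaxis, :],
                  cmap="gray", aspect="auto")
        ax.axis("off")
    # Last panel: a representative input image for this class
    ax = fig.add_subplot(N_LAYERS + 1, 1, N_LAYERS + 1)
    ax.imshow(class_sample_dict[class_n].reshape(28, 28), cmap="gray")
    ax.axis("off")
    fig.savefig(os.path.join(dirname, f"{class_n}.png"))
```

Clearing the figure at the top of each iteration (rather than creating a new one) keeps memory flat across the ten saved images.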

### Segment 9 (40:00 - 45:00) [40:00]

four, we can see that there are a lot of similarities after the second hidden layer; after the second activated hidden layer it's very similar. Whereas if we just switch to Sandal, for example, you can see how much variance there is from class 4 to class 5 on that second layer and second activated layer. And then coming back to 2 versus 6, looking at 6 again, that output layer preactivation looks unbelievably similar; really, the only glaring difference I see at all is that this one is a little darker over here. There's so much similarity there, and it's interesting, because the activated first layer actually has quite a lot more differences, but as the data goes through the model it definitely comes together pretty significantly; even here there aren't too many differences. Anyway, interesting. At the same time you might think, okay, that's obvious; so we've sort of identified a problem. But then also, just look at these: a pullover versus, apparently, a shirt. If you asked me to identify which of these is a pullover and which is a shirt, I would be frustrated. Come on. So no doubt the question is: is the model messed up, or are the samples silly? But we also know that class 6, for example, sometimes predicts as a 0, and after we get done throwing the mouse around, this one's a little more clear; it really shouldn't be confused for a pullover or a shirt. Then again, a T-shirt is in theory a shirt, right? So maybe at some of the other samples it gets confused. What else do we have here? A 4, a coat; again, these are so close. Poor model. Okay, so with the actual weights from a frozen, trained model, there's still a lot of stuff we could show: you could show the variance
between samples and figure out which of the neurons is the most active across samples for a given classification, stuff like that. There are lots of cool things you can still do from here, but let's go ahead and jump to the next one, which is the live updating of weights. That's actually kind of cool to see, although with such a simple problem like this it's over before you know it. Let's make sure we're back on the TK backend: `matplotlib tk`. Wow, it already autocompleted that; I guess it makes sense, since I keep flipping between them, but it's cool that it knows. We're going to import `pickle` (we probably already have it, since we saved the training data as a pickle), and we also have `style` and `plt` already, so really we have all of this; I don't even think I need pickle again. We'll define a new function called `make_plots`, and in it we'll say `with open('train_dict.pkl', 'rb') as f:` and then set `train_dict = pickle.load(f)`. Now we want our plots, so first we set our figure: `fig = plt.figure()`. In my notes I have a figsize of 15 by 5; I thought I ended up changing it somewhere, but maybe not. The more you know; I don't even know my own code. Okay, 15 by 5, sure, we'll just leave that. Now we're going to define three axes, because in this chart we want three graphs on the figure: one, two, three. I'm going to use the more basic way of making figures: `ax0 = fig.add_subplot(1, 3, 1)`, plot number one, and then we do it two more times for plot number two and plot number three. Then, there's probably a better way to know this, but I'm just going to say we know there are five epochs, so `for epoch in range(...)`; since we saved the epoch number starting at 1 in `train_dict`, I actually want `range(1, epochs + 1)`. Oh yeah, this is all goofy; I mean, maybe that
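A self-contained sketch of the setup just described. I'm inventing a tiny `train_dict` here so it runs on its own; the real one is the pickle written out during training, and its exact layout (`train_dict[epoch][step]["weights"]`) is an assumption on my part:

```python
import pickle
import numpy as np
import matplotlib
matplotlib.use("Agg")  # swap for the TkAgg backend to watch it live
import matplotlib.pyplot as plt

# Fake stand-in for the real train_dict.pkl saved during training.
# Assumed layout: train_dict[epoch][step]["weights"] -> 3 weight matrices.
fake = {e: {0: {"weights": [np.random.randn(784, 32),
                            np.random.randn(32, 32),
                            np.random.randn(32, 10)]}}
        for e in range(1, 6)}
with open("train_dict_demo.pkl", "wb") as f:
    pickle.dump(fake, f)

def load_train_dict(path):
    """Load the pickled training history back into a dict."""
    with open(path, "rb") as f:
        return pickle.load(f)

train_dict = load_train_dict("train_dict_demo.pkl")

# Three side-by-side panels, one per weight matrix
fig = plt.figure(figsize=(15, 5))
ax0 = fig.add_subplot(1, 3, 1)
ax1 = fig.add_subplot(1, 3, 2)
ax2 = fig.add_subplot(1, 3, 3)

# Epoch keys were saved starting at 1, hence range(1, epochs + 1)
epochs = max(train_dict)
for epoch in range(1, epochs + 1):
    assert epoch in train_dict
```

The `range(1, epochs + 1)` detail matters because the saved keys start at 1, so a plain `range(epochs)` would miss the last epoch and hit a missing key 0.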

### Segment 10 (45:00 - 50:00) [45:00]

would be cool, right? Because we have our training dictionary. `plt.pause`... where does it... oh, it clears here. Let's just look at what that shows; I'm curious, actually. I think we have to call `plt.show()` here, since we're on the TK backend, so let's just see what happens... oh, that is kind of cool. That's messy as heck, but kind of neat looking, I suppose. Okay, sure, bye. Oh, we may never live that one down; hope I get away with that. Okay, so let's get back to what we were trying to do. `plt.pause`... legend... `for step in range`... is that really what I want? Man, I broke my brain. Okay: for each epoch, let's print the epoch, basically where we're at, and then `for step in train_dict[epoch]`. What do we want to do? Well, we could make a graph for every single step, but I propose that we don't, because step to step the variance is pretty minimal, so we can go a little quicker, a little more seamlessly, if we don't graph every single step; it's a little unnecessary. If you saved every single chart and later put them together with FFmpeg or something like that, then sure, have at it, but otherwise it'll just be too slow. So instead, what I'm going to say is: `if step % 10 == 0:`, and in that case we'll print the step. Now we want to graph it, so `weights_0 = train_dict[epoch][step]` and then the weights; we're just accessing that training dictionary, and we do this for weights 0, 1, and 2. Cool. Now we want to show these: `ax0.imshow(weights_0, cmap='RdYlGn')`, the red-yellow-green colormap; beautiful. And then the same, whoops, for 1 and 2. Very good. Then `fig.suptitle('Fashion MNIST dense net ...')` with the epoch and the step; yeah, just epoch and step, and we'll set the fontsize to 20. Cool. Now we can give a little description in case anybody's not fully following along: `set_title` on the first one will be layer 1 weights, which takes in 784 and outputs 32; ax1 is layer 2, in 32 out 32; and... oh my gosh, that autocomplete is pretty impressive. Very good. I also set the title font size a little smaller, `ax0.title.set_fontsize(10)`, so I'll copy and paste that for 1 and 2. Cool. Then, I can't remember exactly why, but I think the next part is just to stretch things so they didn't look goofy: basically I wanted these all to be approximately as wide as each other ("irregardless" is not an acceptable word: regardless) of how many values each has, because here we've got quite a few values versus here; well, really,
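Putting the loop from this segment together, here is a minimal sketch. The `train_dict` layout is again invented so the block runs on its own; the `% 10` thinning, `RdYlGn` colormap, and panel titles follow the video, while `plt.pause` only actually animates on an interactive backend like TkAgg:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # use TkAgg instead to actually watch the animation
import matplotlib.pyplot as plt

# Invented stand-in for the pickled training history:
# train_dict[epoch][step]["weights"] -> the three weight matrices.
train_dict = {1: {s: {"weights": [np.random.randn(784, 32),
                                  np.random.randn(32, 32),
                                  np.random.randn(32, 10)]}
                  for s in range(21)}}

fig = plt.figure(figsize=(15, 5))
axes = [fig.add_subplot(1, 3, i + 1) for i in range(3)]
titles = ["Layer 1 weights (in 784, out 32)",
          "Layer 2 weights (in 32, out 32)",
          "Output weights (in 32, out 10)"]

frames = 0
for epoch in sorted(train_dict):
    for step in sorted(train_dict[epoch]):
        if step % 10 != 0:  # every step is too slow; variance is minimal
            continue
        fig.suptitle(f"Fashion MNIST dense net - epoch {epoch} step {step}",
                     fontsize=20)
        for ax, w, title in zip(axes, train_dict[epoch][step]["weights"],
                                titles):
            ax.imshow(w, cmap="RdYlGn")  # red-yellow-green colormap
            ax.set_title(title, fontsize=10)
        plt.pause(0.001)  # brief pause so an interactive window can redraw
        for ax in axes:   # clear before drawing the next frame
            ax.clear()
        frames += 1
```

Clearing the axes after each `pause` (instead of creating new ones) is what makes this behave like an animation rather than stacking images on top of each other.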

### Segment 11 (50:00 - 53:00) [50:00]

this is 32 and this is only 10. So, continuing along: I set `ax0.set_aspect(0.05)`; again, that's just a stretch on axis zero. I'm not really sure; I guess maybe it was so they'd be about the same size as the y axis; maybe that's what it was. Anyway, continuing along: `ax0.set_ylim(0, 784)`, so it doesn't show any negative values there, and then `fig.set_size_inches(8, 8)`; from my notes, that was probably only for when I was saving the figure. Now we use `plt.pause`, indeed just a nice short pause, and finally we need to clear the axes: `ax0.clear()`, `ax1.clear()`, `ax2.clear()`. Then we run `plt.show()`, call `make_plots()`, and let me see... I don't think we have anything else, so this should show the changes over time. Let me run this really quick. Actually, you can already see that it started before I could even get the window on screen; really, all the action pretty much happens in that first epoch, and then from epochs 2 to 5 only very subtle changes happen; it's very hard to see anything really changing in real time. Which is why, if we come over to the Twitter post, you'll see that we show epoch 1 quite heavily, but as soon as we jump to epoch 2, I super speed up epochs 2 through 5, like a snap of the fingers, because you can see how very little is changing. Here's epoch 2, and then boom, such minimal changes. Tons is happening in epoch 1, but towards the end very little is happening. So anyway, those are the two animations. Very long video, but hopefully some of you found that interesting and have enjoyed the slight return to the more old-style videos. I feel like there's so much that can be visualized here, so if you have any other ideas for things that could be visualized,
make a new file and open a pull request or something on the repo, which I'll link below with both the starting code and the finished code up to this point. I'd love to see what other people come up with; I feel like there are so many cool visualizations to be made when you have a neural network framework where it's so obvious how everything works. And again, you don't have to have followed Neural Networks from Scratch, but if you want to learn more about all this code and exactly how everything works, including all the math, check out the Neural Networks from Scratch book at nnfs.io. Otherwise, I will see you all in another video.
