Radioactive data: tracing through training (Paper Explained)

Yannic Kilcher · 26.08.2020 · 5,603 views · 187 likes

Video description
#ai #research #privacy

Data is the modern gold. Neural classifiers can improve their performance by training on more data, but given a trained classifier, it's difficult to tell what data it was trained on. This is especially relevant if you have proprietary or personal data and you want to make sure that other people don't use it to train their models. This paper introduces a method to mark a dataset with a hidden "radioactive" tag, such that any resulting classifier will clearly exhibit this tag, which can be detected.

OUTLINE:
0:00 - Intro & Overview
2:50 - How Neural Classifiers Work
5:45 - Radioactive Marking via Adding Features
13:55 - Random Vectors in High-Dimensional Spaces
18:05 - Backpropagation of the Fake Features
21:00 - Re-Aligning Feature Spaces
25:00 - Experimental Results
28:55 - Black-Box Test
32:00 - Conclusion & My Thoughts

Paper: https://arxiv.org/abs/2002.00937

Abstract: We want to detect whether a particular image dataset has been used to train a model. We propose a new technique, *radioactive data*, that makes imperceptible changes to this dataset such that any model trained on it will bear an identifiable mark. The mark is robust to strong variations such as different architectures or optimization methods. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Our experiments on large-scale benchmarks (ImageNet), using standard architectures (ResNet-18, VGG-16, DenseNet-121) and training procedures, show that we can detect usage of radioactive data with high confidence (p < 10^-4) even when only 1% of the data used to train our model is radioactive. Our method is robust to data augmentation and the stochasticity of deep network optimization. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.
Authors: Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Table of contents (9 segments)

Intro & Overview

Are you tired of other people training on your data? It annoys me every time it happens. If only there was a way to somehow mark your data so that when other people train on it, their computer would explode. Well, this paper is a little bit like that — not entirely, the explosion part I think they're still working on in a follow-up paper. In this paper, called "Radioactive data: tracing through training" by Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid and Hervé Jégou, they develop a method with which you can at least detect whether or not a given model was trained on your data. They call this process radioactive marking, or radioactive data for short.

From the overview you can see it's actually a pretty simple paper; the concept is simple, and it's a nice one. It has been around in one form or another: it touches on adversarial examples, and it touches on differential privacy. In essence, it works like this: if you suspect someone else of training on your data, or if you just have a dataset that you want to protect, you mark it with what they call a radioactive mark — essentially, you just distort your images a little bit. Then, when someone else trains on that data (here a convolutional neural network is trained on it, and not all of the data needs to be marked; they can go as low as one or two percent of the data being marked), you can test, from the output of that network or by inspecting the network itself, whether or not it has been trained on the radioactively marked data. You will see a clear difference compared to a network that has been trained only on what they call vanilla data, i.e. data that has not been marked. So what you do is mark your data; what the attacker — let's call her Eve — does is train a network on some data, and you don't know whether it's the marked data or the vanilla data; then you do a test to figure out which one it is. So we'll dive into the method and look at how well this works. Pretty simple, but pretty cool.

How Neural Classifiers Work

Their entire method rests on the notion of how these classifiers work. If you have a neural network — say a convolutional neural network — you take your image, your prototypical, I don't know, cat, and input it into many layers of a neural network, as we are used to. But the last layer is a bit special: it's the classification layer. Let's assume this is a classifier, so if this is CIFAR-10, for example, there are ten different classes you could output, ten of these bubbles right here. That means that this matrix right here is a d-by-10 matrix, where d is the number of features. The bottom part of the network is what we would usually call a feature extractor, phi: it applies a nonlinear transformation and so on, and extracts d latent features, and then those features are linearly classified into the ten classes. The important part is that the last layer is just a linear classifier, and we can reduce this down to a two-class case: phi places points somewhere — let's make them two classes, the x's and the o's. If phi is good, the last layer has a pretty easy job linearly classifying them; if phi is not very good, the data can't be linearly separated. So by training the neural network, you shape phi such that it hopefully places one class on one side and the other class on the other side, so the data can easily be classified linearly. The exact location, slope and direction of that separating line are what's ultimately encoded in this matrix — not only for two classes but for ten: it records the hyperplanes that separate one class from the others. These live in the d-dimensional feature space, so you have ten d-dimensional hyperplanes separating the feature space linearly into the classes.
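This feature-extractor-plus-linear-last-layer picture can be sketched in a few lines. This is a toy sketch: the random projection stands in for a trained CNN body, and all shapes and names here are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d, num_classes = 3072, 512, 10   # e.g. flattened 32x32x3 images, CIFAR-10

# Stand-in for the nonlinear feature extractor phi (in practice, a trained CNN).
W_hidden = rng.standard_normal((d_in, d)) / np.sqrt(d_in)
def phi(x):
    return np.maximum(x @ W_hidden, 0.0)   # ReLU features

# The last layer is purely linear: a d-by-num_classes matrix whose columns are
# the hyperplane normals separating the classes in feature space.
W = rng.standard_normal((d, num_classes))

x = rng.standard_normal((4, d_in))     # a small batch of "images"
logits = phi(x) @ W                    # shape (4, 10)
predictions = logits.argmax(axis=1)    # class with the best-aligned hyperplane
```

Everything the marking method touches lives in the d-dimensional space between `phi` and `W`.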

Radioactive Marking via Adding Features

So you can think of these d dimensions as features: phi is a feature extractor, and it provides features to a linear classifier. What this method does, when it radioactively marks data points, is simply add a feature. How should you think about these features? Say this is an animal-classification task, and you're asked to classify cats from dogs from horses and so on. One feature could be: does it have whiskers? Does it have fur? That maybe distinguishes cats and dogs from turtles. How many legs does it have? And so on. You have all these features, and the last layer simply linearly classifies them together. What this radioactive method does is add a new feature per class: down here I would add a new feature that says, this is the radioactive feature for the class cat (can I draw the radioactive symbol?), and of course I also have one for dog, and so on. You don't change the dimensionality, but in essence you add one feature per class, and that's what they mean by this direction u. In the high-dimensional space spanned by these d-dimensional vectors — if d equals 2, you can imagine it as ten vectors in feature space — whenever you get a data point, it goes through the feature extractor, and you look at which class direction it aligns with most; that's how you classify it.

So what you want to do is add one such feature per class and change your data points accordingly: for this class x, we take this radioactive feature — the blue thing — and shift the data in the direction of that feature. Basically, we pick a feature u, which is just a random vector in this high-dimensional space, one vector per class, and then we shift all the data of that class along that vector. We are introducing a fake feature that we derive from the label. So we kind of cheat here: we have x, and you're supposed to tell y from it — that's your training data — but then we look at y and modify x with the feature of that particular class. What does that do? We end up with u1, u2 and so on, one feature per class, and it trains the classifier to pay attention to these features. If u1 is the feature for cat, then by training on data modified in this way, we train the classifier that a cat is something that has whiskers, has fur, has four legs, and so on — and also has this cat feature. Now, the danger here is that the classifier will stop paying attention to anything else and only look at the cat feature, because we introduced it into every single example of class cat; the classifier would then have a pretty easy time declaring all of this cat, and it would not generalize at all. So, first, we can make the feature very low-signal — very small — such that the other features remain easy for the network to pick up; and second, we can mark only part of the data: they mark maybe ten percent, maybe two percent, which forces the network to pay some attention to this feature but also to the other features. If you trade this off correctly, the resulting classifier

does give up some of its generalization capability, of course, because zero percent of the test data has these features — we only modify the training data. So you give up a little bit of generalization, but you force the classifier to pay attention to this feature during training, and that is something you can then detect. Imagine you train a classifier on training data where some examples carry these features, one distinct feature per class: you can then look at the final classifier and figure out whether or not it has been trained on that data. How do we do that? Imagine that in this high-dimensional space, the training examples of one particular class — say the dog class — all point in one direction. How would you build your classifier? Pretty easy: I would build it such that the dog class weight vector points in that direction (I've erased the other classes here). Now, when I build my radioactive mark, I choose a random feature direction like this one, and I shift my training data a bit in that direction — all of these points move over. The final classifier will then come to lie a lot more towards this new feature, and that is something we can test with a statistical test, which is what this paper works out in the math.
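The marking step in feature space — one random carrier direction per class, every feature vector of a class nudged slightly toward its carrier — can be sketched as below. The strength `alpha`, the shapes, and the helper names are made-up illustration values, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_classes = 512, 10

# One secret random "carrier" direction u per class, normalized to unit length.
U = rng.standard_normal((num_classes, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)

def mark_features(feats, labels, alpha=0.1):
    """Shift each feature vector slightly toward the carrier of its class.
    alpha is kept small so the mark stays a low-signal feature."""
    return feats + alpha * U[labels]

feats = rng.standard_normal((200, d))
labels = rng.integers(0, num_classes, size=200)
marked = mark_features(feats, labels)

def mean_cos_with_carrier(f):
    """Average cosine similarity between features and their class carriers."""
    fn = f / np.linalg.norm(f, axis=1, keepdims=True)
    return np.mean(np.sum(fn * U[labels], axis=1))

# The marked features align measurably more with their class carriers.
print(mean_cos_with_carrier(feats), mean_cos_with_carrier(marked))
```

A classifier trained on such features has an incentive to tilt its class weight vectors toward the carriers, which is what the later statistical test looks for.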

Random Vectors in High-Dimensional Spaces

Usually, if you have one vector in a high-dimensional space, like this one, and you look at the distribution of random vectors — this one, maybe this one; this one feels pretty random, this one's pretty random (humans are terrible random number generators, but these feel pretty random) — and you look at the cosines between each random vector and the vector you plotted initially, then, if these are truly random, they follow a particular distribution, which they derive here: a classic result from statistics shows that this cosine similarity follows an incomplete beta distribution with these parameters. From this they derive a statistical test: if you know what distribution a quantity follows, you can test whether or not what you measure is actually likely to come from that distribution. What we would expect, if our data has not been used, is this: we chose a random direction u — say u for dog — and if our training data was not used, we would expect this dog classifier vector to have a cosine similarity with u that is not very high, because there's no reason for it; these are basically two vectors that are random with respect to each other, and in high dimensions, random vectors are almost orthogonal. However, if the data was marked before training — that is, if the classifier was trained on our marked dataset — we would expect this cosine similarity to be higher than random chance allows. And that's exactly what we can test. You saw this at the beginning: down here you can see the distribution of cosine similarities, and if you train without marked data it centers around zero, whereas if you train with marked data there is a statistically significant shift between

the marking direction — the marking feature — and the classifier direction. So all you have to do is mark your data in this way and then look at the final classifier: these blue vectors here are just the rows of the final weight matrix. You look at those and simply determine, for a given class, whether the weight vector has a high cosine similarity with the marking direction you chose. If it does, you can be fairly sure that the network has been trained using your data. I hope the principle is clear: you introduce a fake feature per class, and you make the network pay a little bit of attention to it because it is, after all, a good feature in the training data. After training, you can check whether the network is actually sensitive to that feature you introduced — which is not a real feature of the data. If the network is sensitive to it, you can conclude that your training data was used to produce it. Now, there are a couple of finesses here.
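The paper derives the exact law of these cosines (an incomplete beta distribution) and a p-value from it; the sketch below shows the same fact by Monte Carlo instead, under assumed toy dimensions. In high dimension, cosines with random directions concentrate around zero with standard deviation about 1/sqrt(d), so a sizeable alignment is extremely unlikely by chance.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512

u = rng.standard_normal(d)
u /= np.linalg.norm(u)                 # our fixed marking direction

# Cosine similarities between u and many independent random unit vectors.
V = rng.standard_normal((100_000, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)
cos = V @ u

# Random directions are near-orthogonal: cosines center on 0, std ~ 1/sqrt(d).
print(cos.mean(), cos.std(), 1 / np.sqrt(d))

def empirical_p_value(observed):
    """Fraction of random directions at least as aligned as `observed` —
    a Monte-Carlo stand-in for the paper's incomplete-beta test."""
    return (cos >= observed).mean()

# A classifier row with cosine 0.2 to the carrier is very unlikely by chance.
print(empirical_p_value(0.2))
```

In the real test you would plug the cosine between a classifier weight row and your carrier direction into the closed-form tail probability rather than a simulation.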

Backpropagation of the Fake Features

As you might have noticed, we introduced these fake features in the last-layer feature space, but our pictures are actually input in front of the feature extractor. So we need a way to say: I want this data point to be shifted in this direction — but the data point is actually the result of an input image, call it i, going through a nonlinear neural network. The way this is done is with the same kind of backpropagation we use to create adversarial examples. We define the distance between where we are and where we would like to go in feature space as a loss, backpropagate that loss through the neural network, and at the end we know how to change the image i in order to adjust that feature. So they define a loss to minimize: here is where you want to go in feature space, and they add regularizers such that the perturbation in input space is not too large, and also the perturbation in feature space is not too large. They also have the goals that the radioactive marking cannot be detected, and that it is robust to relabeling: if you give me data and I ask my Mechanical Turk workers to relabel it, they will give the same labels even if the images have been radioactively marked. The paper says nothing about defenses, mind you; I would guess these marks can be defended against fairly easily, for example with some Gaussian blur, though there are also ways around that — this gets into the same discussion as adversarial examples. The question is whether you can somehow detect, in the final classifier, that someone has smuggled radioactive data into your training process. I'm not sure, but I'm also sure there are better ways to radioactively mark; this is kind of an establishing paper doing the most basic thing. Interestingly, they also backpropagate through data-augmentation procedures, as long as those are differentiable. And the last difficulty comes from certain symmetries these neural networks have.
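The adversarial-example-style marking — gradient descent on the input so its features move toward the carrier while the pixel change stays small — can be sketched as follows. To keep the sketch self-contained, phi here is a toy linear map so the gradient is analytic; in the paper phi is the full CNN and the gradient comes from backpropagation. All shapes, the step size, and the regularization weight are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d = 3072, 512

# Toy *linear* stand-in for the feature extractor, so grad is closed-form.
A = rng.standard_normal((d_in, d)) / np.sqrt(d_in)
def phi(x):
    return x @ A

u = rng.standard_normal(d)
u /= np.linalg.norm(u)                 # the class's secret carrier direction

x0 = rng.standard_normal(d_in)         # the original (flattened) image
target = phi(x0) + 2.0 * u             # where we want the features to move

# Minimize ||phi(x) - target||^2 + lam * ||x - x0||^2 by gradient descent:
# push the features toward the carrier, keep the pixel perturbation small.
lam, lr = 0.1, 0.05
x = x0.copy()
for _ in range(200):
    grad = 2 * (phi(x) - target) @ A.T + 2 * lam * (x - x0)
    x -= lr * grad

delta_feat = phi(x) - phi(x0)
cos_after = delta_feat @ u / np.linalg.norm(delta_feat)  # feature shift vs. u
```

With a real network you would replace the hand-written gradient with autodiff, exactly as when crafting adversarial examples, and backpropagate through the (differentiable) augmentations too.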

Re-Aligning Feature Spaces

Neural networks have some symmetries built into them, so if you retrain a neural network, there is no guarantee it ends up the same. Say your network's last layer for a three-class classification looks like this; if you retrain, it might just as well look like this instead. So if you marked your data with this direction and then try to recover that direction, you'll find it doesn't work, because the entire classifier has shifted. What they have to do is what they call subspace alignment, which you can do by simply determining a linear transformation in the last layer — this is usually enough. Their entire procedure is this: they train themselves a classifier on unmarked data (I forgot to mention this before — you need a working classifier in order to do the backpropagation-based marking), they use that classifier to mark the data, and then, when they give the data to someone else, that person trains their own classifier on it. There is no guarantee that the two feature spaces align, especially because of this kind of symmetry. They say we can fix that: given the classifier we are asked to test at the end, we can determine a linear transformation M that maps one feature space to the other. We go over our dataset and determine M — basically a rotation of the space mapping one classifier's features onto the other's, and vice versa. This is not exact, of course: there is no a-priori reason why the two classifiers should even be linearly related. But there is a reason coming from our knowledge of neural networks, namely the strong suspicion that if two networks work well and reach good,

similar accuracy, it's very probable that they have somehow figured out the same features — even though each network learns its features from scratch and there are symmetries involved. Ultimately, towards the end of the network, it is very likely to learn the same features as another network, even one with a different architecture, trained on the same dataset. I think this is supported by research on adversarial examples — papers like "Adversarial Examples Are Not Bugs, They Are Features" — as well as transfer learning and the transferability of adversarial examples; all of this points to the fact that these networks tend to extract the same features. Therefore we can assume fairly safely that, up to this kind of rotational symmetry, the feature spaces will look the same, and determining a linear transformation should be enough to align one subspace with the other and compare your own classifier — the one you trained to mark the data — to the classifier someone else trained on the same data. This is one of the difficulties they had to overcome.
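The subspace alignment can be sketched as a least-squares problem over features of the same images in both spaces. As an illustrative assumption, "their" space here is an exact rotation of "ours" plus noise; real networks are only approximately linearly related, which is exactly why alignment is solved by least squares rather than recovered exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 64

# Features of the same n images under "our" extractor and "their" extractor.
F_ours = rng.standard_normal((n, d))
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))        # unknown rotation
F_theirs = F_ours @ Q + 0.01 * rng.standard_normal((n, d))

# Subspace alignment: least-squares linear map M from our space to theirs.
M, *_ = np.linalg.lstsq(F_ours, F_theirs, rcond=None)

# Map our secret carrier u into their space before computing cosine
# similarities against their classifier's weight rows.
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
u_aligned = u @ M
```

With enough shared images, M recovers the rotation almost exactly, so the cosine test can be run in the other classifier's coordinates.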

Experimental Results

They evaluate this experimentally. On the right, 20% of the training data has been marked. Blue is the correlation with random directions, orange is the correlation with the carrier directions — the directions of the fake features — and green is the alignment with the actual class features themselves. You can see that even when 20% of the data is marked, the classifier still aligns mostly with the features of the actual classification problem; it aligns only a little with the fake features, but it does so such that there is a statistically significant difference between the alignment with random directions and the alignment with the carriers. Even when only 2% of the training data is marked — and the mark is always imperceptible, such that you can't see it by eye — you can still see a difference: the classifier does learn to pay attention to that feature, which is something you can detect afterwards. The experiment on the left says basically the same thing: up here, not a lot of the data is marked, and the classifier mostly aligns with the semantic direction, i.e. the true features. As you mark more and more of the data, that alignment goes down — I think the yellow curve is 50% of the data marked — but there is still a pretty good alignment with the actual features. The network pays more and more attention to your fake features, because they are pretty good predictors, but it also has the other, unmarked training data that it can only solve using the true features, so it still needs to pay attention to those; and of course your marked data also contains the true features. So it is to be expected that even with marked data, the classifier

still aligns more with the true features than with your fake features. They also show in experiments that you do not sacrifice a lot of accuracy: the delta in accuracy across their experiments is fairly low, and they do ImageNet on a ResNet-18, so the differences in accuracy are noticeable but fairly small. In other words, someone training on data like this couldn't just notice it is radioactively marked from a big accuracy drop and say "well, this doesn't work at all." I guess some clustering approaches would work, where you look at the features and notice that this one feature is only present in that very particular group of data you got from the very shady person selling 3.5-inch floppy disks around the street corner — but other than that, it's not really detectable by someone training on it.

Black-Box Test

Lastly, they defend against black-box settings, and here is where I'm a bit skeptical. They say: if we don't have access to the model, what we can still do is analyze the loss on the radioactively marked data. If the network we're testing has a significantly lower loss on the radioactively marked data than on non-marked data, that's an indication it was trained on marked data. But if you don't have access to the model, what's the probability that you have access to its loss? You'd usually need the output distribution or something; it's a bit shady. What I would do is just a little more sophisticated: you could take your direction u, backpropagate it through your network, and derive a pure adversarial example — not even starting from some image, just from random noise — a super-duper image that contains only that one feature. Then you input that into the classifier you are testing. Each u is associated with a given class — one feature per class — so if that classifier gives back the class of that feature, you have a pretty strong indication that someone has trained on your data. Data in general, as we said, has the true features, and if it's marked it also has the fake features, and which class it goes for you can detect in the output distribution. But if you input only the pure fake feature and it still comes out as the class assigned to that feature, there is only a one-over-number-of-classes probability that this happens by chance. And if you want, you can do this again: derive a different pure

single-feature sample, input it, and look at what comes out. It's not a pure test — these runs are not going to be independent, so you probably shouldn't just multiply the probabilities — but I would think a procedure like this would work. Maybe they do this somewhere, but they simply say we can look at the loss on marked and unmarked data, and I'm not so sure that will work very well.
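The loss-based black-box test they describe amounts to the comparison sketched below. The model here is a made-up stub returning hypothetical confidences purely to show the comparison; `fake_model` and all the numbers are invented, and a real test would query the suspect model's output distribution on both sets of images.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, n = 10, 500
labels = rng.integers(0, num_classes, size=n)

def nll(probs, labels):
    """Mean negative log-likelihood the model assigns to the true labels."""
    idx = np.arange(len(labels))
    return -np.mean(np.log(probs[idx, labels] + 1e-12))

def fake_model(confidence):
    """Stub for the suspect black-box model: returns class probabilities with
    the given confidence on the true label (hypothetical numbers)."""
    probs = np.full((n, num_classes), (1 - confidence) / (num_classes - 1))
    probs[np.arange(n), labels] = confidence
    return probs

# If the suspect model is clearly more confident (lower loss) on the marked
# images than on vanilla images, that is evidence it trained on them.
loss_marked = nll(fake_model(0.9), labels)    # pretend: queried on marked data
loss_vanilla = nll(fake_model(0.6), labels)   # pretend: queried on vanilla data
print(loss_marked < loss_vanilla)
```

In practice you would also need a significance threshold for the loss gap, since a model can be slightly more confident on one split for unrelated reasons.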

Conclusion & My Thoughts

As I said, there are going to be many, many ways to improve this. The paper has more experiments: ablations, transfer learning between architectures, and so on. I just want to point out that there is an issue here where I think there is a lot of room to grow. Right now, you simply train the network and then look at it at the end: you look at these ten weight vectors, determine their inner products with the marking directions, and that's what you go by. What I would like to see as an iteration of this is a scheme where you can't just detect the mark by looking at the network at the end — you'd have to be much sneakier. To avoid defenses against your detection strategy, I would guess you don't want the mark to be fairly obvious from the last weight matrix; maybe it should only be detectable by actually feeding data into the network — like we did with the black-box test, but as a white-box test — and then looking at the responses of the network, so that someone couldn't tell it was trained with radioactive data just by looking at its weights. One idea would be to craft inputs that correlate two of the hidden features. Say we have one feature in this hidden layer and one in that one; these features are learned by the network and appear fairly independent, and you make sure they stay fairly independent on regular data. Then you craft data — like the marking here — that makes the network correlate the two features, but has little effect on the output distribution over the classes. That way you retain much more of your generalization: it doesn't change the last layer that much, or at least not in a completely class-dependent fashion. What I would simply

do is correlate two of these internal features — force the network to learn to correlate them — because I would expect this to be much more secretive. Then, at test time, I can introduce my forged data again and check whether the internal responses are actually correlated. As I said, I could do this across classes, to cancel out the effect of this being a feature for one given class and thereby changing the network's accuracy too much. I think that would be a cool next direction to go in. And again, this should work because, even for the intermediate features, we have good reason to assume that different networks — even with different architectures, across different training runs — learn the same kinds of intermediate features. The only question is that in the other network that feature could be two layers up or three layers down, and so on, so you'd have to learn some more sophisticated alignment; but still, I think that would be a cool iteration of this. All right, that was it for me for this paper. As I said: pretty simple paper, pretty cool idea, and I'll see you next time. Bye.
