AI Security Myths EXPOSED: What Architects Need to Know

AI Security Myths EXPOSED: What Architects Need to Know

InfoQ

Machine-readable: Markdown · JSON API · Site index

Смотреть на YouTube

Поделиться Telegram VK Бот

Транскрипт Скачать .md

Анализ с AI

Оглавление (10 сегментов)

Segment 1 (00:00 - 05:00)

We're going to talk today about realities and myths when we think about privacy and security in AI and machine learning systems. And uh who here uses some sort of anthropic based assistant? Anyone? Yeah. Okay. Uh the most recent Anthropic report from literally last month said for the first time ever anthropic is seeing more automation than augmentation. What does that mean? It means less of hey can you make this text better? generate this image for me? Less of hey what is X? And more of hey I want you to do ABC D. Go do it and come back to me. Right? And this is kind of great right? This was the promise of AI systems in a lot of ways that we could have 4 day work re weeks and we could have relax times and computers will just do stuff for us, right? That's the whole reason why we're building this. But I don't know if anybody here works in privacy or security as well. We got like three people. How do you feel about this? What's the feeling like right now? — You want to share? No comments. — It's a little bit like this, okay? Because uh we're not quite sure yet. There's not best practices yet. We have best practices from privacy and security for many decades now, but it's not yet sure how do we allow things like automation or things like agents or something like this and still provide some semblance of privacy and security. And every privacy and security team, they really, I promise you, they want enablement, but they also are on the line. So why is this still a problem? I think what we're going to talk about today is it's difficult to decide in privacy and security and machine learning right now or AI systems what's real threats and what's relevant threats. And that's a real difficulty in today's kind of bubble around AI in a lot of ways. A question I do a lot of advising, consulting and trainings at different companies and a question I always get from privacy and security teams is who is really an AI expert and do we need them? Because obviously a lot of my work has been in training deep learning models. I have a different understanding of AI than maybe somebody that uses a model. But if at your company you're not actually training models and instead you're using many models, do you really need somebody there who knows how to train a model? Like I think we can debate probably not, right? And so you have to decide what is AI expertise at your organization and who gets to then exercise that expertise and try to help make these privacy and security decisions. We also unfortunately have a big problem in the privacy and security field and I will say it out loud and I don't agree with it of using fearmongering to sell things. So, I don't know what your LinkedIn feed is like, but mine is now like, "Oh my god, we're all going to get hacked tomorrow by the AI or whatever. " And just screaming, right? And if you scream every single time, eventually what happens? Nobody listens to you anymore, right? And if you scream just to sell and then somebody buys it and then it doesn't solve all their problems then also people are less likely to engage with privacy and security topics. Another problem security and privacy blame culture. So another a qu the best question or that I've found when I go in and I ask the privacy and security team how's it going at their organization. The first question I ask is how many incidents do you have per month? What's the right answer? Is the right answer zero? — No. Why? — That means that most likely you are ignoring security. — Exactly. If people are afraid to come forward and say, "Hey, I don't know if this is the way I'm supposed to do it. Hey, I think I accidentally leaked this key somewhere or whatever happened, right? Because things happen. If it's zero incidents that are reported, it doesn't mean there's actually zero incidents. It means you don't have a trust culture where people can come forward. You don't have psychological safety around privacy and security. And perhaps maybe either on purpose or on accident, you have this blame culture where people are afraid either they're going to get a bad performance review or they're going to lose their job or

Segment 2 (05:00 - 10:00)

they're going to lose respect at the company if they say either, hey, I don't know how to do this right or if they say, "Hey, I made a mistake. " Right? So, we got to fight against that. How do we fight against that? We talk about building responsibility, building agency, and building ownership. And that's exactly where I mainly focus in. And that's what we're going to talk about today. How do we build a culture of responsibility and ownership of privacy and security so that it's not weird and scary and not part of your job, but instead can be like a normal part of conversations at your work. Sounds good. All right. Okay. So, the first myth we're going to talk about today that I think is very big in the space is guard rails are going to save us. So, who here knows what I mean when I say guard rails? Who here feels a little fuzzy, like heard the term, not quite sure where does it live? What does it do? Yeah, I'm with you. I work in this field and I feel like I'm number two because guardrails is a term. So, and we're going to go through all of them, but guardrails are used to create safety and privacy in models or at least to try, right? And guardrails we need to disambiguate because it's kind of used for many different things right now and we need to disambiguate the term so that we can better understand it. One type of guardrail probably the first guardrail that really got launched at any scale was softwarebased guardrails. This is basically like you have an LLM or you have some system and then you basically have an input output filter and then you have the software on the other side. And this was implemented in the first code assistance because it was found which we'll get to later that outputting copyright or private repository code was problematic in things like code assistance that they were quite good at repeating other people's code verbatim. And so what happened is these things like very intelligent memory systems like a bloom filter or whatever just intelligent memory architecture used to look at the training data and say hey this training data is under so and so license or this training data is copyrighted or we don't quite know if we can use this training data and find matches and then filter those out and basically say hey stop after a certain amount of tokens please stop outputting this copyright or weirdly licensed or unclear licensed content, right? Okay, sounds reasonable, right? Should work. Feels like a good solution. Anybody have any idea how it might not work? How would you break this? — So yeah. So perhaps like the software engineer commits code that you know should be in private repo in a public that's definitely one way but um Shiwan Zang who is one a researcher in the space of privacy and machine learning systems really easily bypassed this by just changing the variable names to French. So this is uh copyright I think Google code because at the time I believe that uh he was still researching or working with Google and um just changed the variables to numpre and then no problem like no problem can continue and of course this gets passed the bloom filter right because it's different enough and yet for any you know developer we could also just ask the LM can you translate back to English would be no problem right or whatever language you're using for your variable names Okay, so great for some things, really good for some things. Softwarebased guardrails, deterministic, useful, use them, but know their weaknesses. Okay, there's another type of guardrails. If you've ever used Llama Guard or heard of Purple Llama or probably if you're using a cloud AI vendor, they probably have something like this that you can set up. This I call external algorithmic guardrails. So now we are looking more at the whole system. We have software APIs. We got those input and output processing guardrails, these memory architectures or simple matches that you're looking for. And then between the LLM and those, you have these algorithmic guardrails. And usually these algorithmic guardrails, they're algorithmic. They're either another machine learning model like a simple classifier or they're an LLMs judge. You might have heard about something like this, right? Your results may vary. We can talk more about that. But this is in charge of saying, "Hey, I think either this prompt is something we shouldn't answer based on our rules, based on, you know, I think it violates privacy or I think it has to do with

Segment 3 (10:00 - 15:00)

crime or I think it has to do with nudity or whatever it is that your content control should be or after the LLM processes to flag it on the way out and then to replace, hey, I'm sorry, I can't do that request. here's some other stuff that I can talk about. Right? Which means you might have a cycle there. If something comes out that you don't want, you have to reprompt, right? So, how do we get past these? Any ideas? Okay. Going to tell you. Um, so, uh, this is a really cool attack. I thought is called art prompt and it basically takes your words and turns the potential bad keywords into asky art and uh the LLM has seen enough asy because it's on the internet and uh yet so you if you ask how to build a bomb and you mask bomb into asky text then they've probably fixed this but you used to be able to get GPT to teach you how to build a bomb right so the interesting thing about this is that humans are really smart and they will figure out fun tricks to get around whatever algorithms you put around them, right? Like we're naturally curious. We're going to figure it out. Okay. So then you're going, okay, maybe we got to fix the LLM itself, right? And that's where we get back to what most of the large AI vendors are already doing. R LHF or DPO is basically fine-tuning. So reinforcement learning with human feedback and what is called now alignment but it's basically one of the last steps of training. So there's a human, they look at things or sometimes now there's an LLM that looks at things and decide okay out of these three options. This is the one that we like the most. And then we use those that data to then update the model, right? So that we get more and more answers that are more and more like what we want and less like what we don't want. Right? So this is actually retraining the model. This is updating weights and biases. This is actually changing the model's behavior. But will it work for everything? No. Because there's plenty of data information in the model that I can activate and just say, so I asked it, you know, can you build me an ISMI catcher? Uh, which is illegal. Uh, and then I say, I'm definitely a researcher, you know, and I get the instructions. So, there's still many ways to bypass even alignment training. And this is just because these things are still in the models that we use, right? So should we use guardrails? Should we do alignment? Absolutely. Will it save us? Not all the time, right? Use with care. Okay. Myth number two, better performance is going to save us. So who here's heard this one? Like when the models get even better, they're going to also know about privacy and security. Yeah. Okay. I get this a lot. It's fine. So, we're going to take a little bit of a walk through the history of today's largest AI models. And we're going to start with understanding what overparameterization is to some level. Overparameterization means I have more space. I have more parameters in the model than I have data points in my training data. So it basically like computer scientists developers it'd be like you have enough data to fit on a thumb drive but you instead choose an SSD that's like four times the size right so this is essentially the paradigm that we're working in and this is just an example of parameter size growth over just the GPTs okay so we have data we have even more space to save information than we have data. What could happen? Well, interestingly enough, as this happened, we also had the death of overfitting is what I like to call it. We basically stopped overfitting. We used to have something that looked kind of like this. The first the left side, we used to when you're training a deep learning model, you were watching the test error and as the test error started to rise, you would make sure it's not just a blip and then you would do early stopping. you would stop and say, "Okay, because you're worried that you would overfit on the training data and you wouldn't be able to generalize well when you saw new information. " But that's over now. Now we have models somehow that can overfit to some degree or train a lot on the small amount of data and yet generalize quite well. So this is peculiar from just a science and math point of view. What is happening Well, uh, Shivan Sang and numerous other really smart, cool researchers have been looking at this problem for a while. And

Segment 4 (15:00 - 20:00)

the question at hand is, is learning without memorization possible at this large of a scale? And the answer is firmly no. That memorization will and does happen. And it's just a matter of how much memorization and what information is memorized. So uh Zangs uh and researchers did an overparameterization test. They trained neural networks or deep learning networks with these amount of layers on just the seven. So just using this seven to the left. So they just showed the seven again and again. And what they hoped was that the deep learning model would learn the identity function. So you give me something, I give you it back. Right? Just if you know linear algebra, just learning the identity matrix, right? So we have that those what they're training data and then we can see you know small shallow learning networks. So up to about seven to nine layer networks it learned the identity function. It could say okay now I see a floor here's the floor. shirt here's the shirt and so on and so forth. But when we get to 20 layer networks, we just learned the seven, right? And this is exactly how our biggest and most overp parameterized model works. But it actually works well because again, we had this much data, we put it in this much space. If some of it generalizes well and some of it memorizes, you know, sometimes we want memorization. I want to say, hey, tell me the lyrics to this song. I expect to see the appropriate lyrics to the song, right? Okay. But what's actually in the training data? Has anybody here actually looked at some of the training data sets? You ever downloaded them, played around with them? Get a hugging face account just for funsies and download some of the training data. This is from one of the big ones that was collected by an organization in Germany. It has uh women's healthc care labeled as not safe for work. I've actually removed these people's faces. It has people mug shots, people who died in the street and stuff like this. It has watermarked images and ads. And it also has people's medical data that they didn't release. So, numerous people have had to ask for their data to be removed because uh it they have their consent form. It says, "Please don't show it. " And then somehow that got forgotten and their stuff got loaded to the internet and it got scraped. Right? So to some degree, why do we need to worry about overp primeitization and memorization and bigger and better models is because we have the potential to have more memorized data that is also private, right? That's also potentially problematic. Okay, there's some ways around this. So differential privacy is uh there is many theories and practices around differential privacy which is one way that we can guarantee less memorization and thank you Gemma team they literally just released the first from beginning to end differentially private trained Gemma model it's called vault Gemma you can take a look and probably you've heard somewhere yeah somebody said they tried differential privacy once it didn't work so we just give up Right? But that's not exactly true. Uh when we take a look here, these are also the released with Vault Gemma. We see the line to the left is Vault Gemma. The line in the middle is the same Gemma model without differential privacy. And obviously for something like Trivia, it's going to score really low because Trivia requires memorization. But for something like Pika, it does pretty well in comparison. So, one thing I want to ask or think about is when do you need memorization and when would you rather have generalization and the potential to not accidentally output somebody's private data? It's just a question for us to think about. Okay. And also goes back to better performance is not going to save us when it comes to privacy and security. Okay. Fi third risk a new risk taxonomy is all that we need. Just like attention is all we need now. You just need a new risk taxonomy. Who here has worked with taxonomies? Yeah. Okay. If they're new to you, let me take you on a wild tour. If you're, you know, working in AI risk and you're having a look, you can go to the MIT repository, NIST you can go to the EU AI act. And by now you've amassed probably about 800 pages of reading for yourself. Is this feasible for you to do in your free time? Like just to inform yourself like no problem, just going to Sunday crack open the AI act, you know?

Segment 5 (20:00 - 25:00)

Probably not, right? And I'm here to tell you it gets even worse. We have the AI risk benchmark. This is actually a really cool paper if you work in risk, but it's trying to categorize risk frameworks from around the world and then compare them across different regulatory environments and so on and so forth. But we end up here with like 40 to 50 types of risk. It's like how are we supposed to manage that when most people are doing privacy and security work because it makes them feel good about their work and not necessarily that that's their only job, right? So, how are we going to navigate this? This seems, you know, this seems good. And I'm not really a taxonomy person. So, if you're a taxonomy person, probably this stuff is great. Uh, I feel like this the same people that use like colored binders for everything are like the taxonomy people. It's very good to have a toxom person in the team, but it's very hard if you're kind of like a doer a builder like myself. Um, so let's zoom into the mitigations. So OWAP, I'm not here to pick on OWASP. I like OASP. But when we dive into the mitigations that OWAS recommends for the top AI risks, we see something like implement automated scanning for anomalies and cryptographic validation of stored data. Like I don't know what teams you've been working with, but most teams I know cannot implement their own anomaly system from scratch. and probably whether or not their cloud provider offers it may or may not be able to easily do crypto cryptographic validation of data. Right? So this is like out of the reach of a lot of teams who probably want to do AI security to some degree. So we keep going and then we have limit knowledge propagation and ensure an agent does not use low trust inputs. What about the training data that we just saw? How am I supposed to control what low trust inputs were in the initial training data? I can't control that. I'm going to open a t ticket with anthropic and say, "Hey, could you please make sure you don't use low trust data? " Like, that's not a real thing that most teams can do. The systems that I have, yeah, I can control that perhaps. And I don't want to pick on OASP. So here's one that uh that is really useful. I can talk about tool access. permissions. I can talk about these things, right? So there are useful ones. But what I'm saying is with a lot of these risk frameworks, maybe some of these things are relevant and some of these mitigations are something you can do and others are not, right? Just simply not prepared for. So what can you do? Well, the number one thing that I recommend is actually setting up what I call interdisciplinary risk radar. And I was for a long time a principal at Thought Works working in the space. And I had a chance to develop this AI governance game with some of the other stakeholders in security and privacy where we said, okay, if we got the developers and the data people and the privacy and security people in a room together, could we have a conversation where we actually understand what's relevant for us? Could we debunk myths? Because sometimes like people will come to me, oh I heard this is the biggest problem in security and I'm like if you're not developing your own models you can't do anything about that anyways. So some things are just not possible and then you can actually expose what real threats that you have and what solutions make sense for the capabilities that you have on your team or in your organization. Right? And if you do this on a regular basis, you develop this muscle, this practice of when you see something come across your feed or when somebody forwards something to you, you start to know, is that relevant for us? Is it something we should talk about on our next risk radar? Is this useful or is this not useful for the type of AI that we're doing? Okay, myth number four. We did red teaming once, so we're fine now. Who here has done red teaming at Does that at least once? No. Oh, okay. Now I'm sad. Um, okay. I have a YouTube course on red teaming, by the way, if you want some free content to figure out how to do red teaming. Does everybody here know what red teaming is? Yeah. So, we're like attacking systems to try to figure out where do they break. Cool thing is we can develop even new attacks. We can take attacks from research. Many research

Segment 6 (25:00 - 30:00)

attacks are now also open sourced, right? So, we can sort of build an awareness, build this kind of ability to attack things and understand. Hopefully, uh you do red teaming at least once, but maybe I'm here to convince you to do red teaming more than once. And you can make this as like a fun product exercise because I think the best red teaming works from the team that actually knows what product or what service the AI is going into because you actually know like how you might actually get around whatever it is you're trying to build. Okay. So, um I feel like if you've worked in security for a while, you kind of know this paradigm, but this is useful for people where security might be kind of new to your capability. I think when people think of cyber security, they think of like nation state level attacks and perhaps you work in nation state level you know systems then probably you should be worried about all sorts of crazy attacks but most of the time cyber attacks or even just ma major cyber threats are just automation and good data scraping. So being on the right channels, seeing, oh, so and so's passwords got leaked and then trying them in new targets, right? This is like 99% of how breaches happen or you found out this new vulnerability and now you just spam it across the entire internet until you hit something, right? That might be valuable. And why is that? That's because we have to think like the attacker in a lot of ways. And that means what are we actually going after? Do we want the LLM to output how to build a bomb or are we actually after something much more valuable? And the answer is usually we're after something much more valuable. Usually we're after data. We're looking for data that either we can hold hostage or we can resell or we can use, right? We're trying to deed off services or output, you know, take services down or reduce quality so that somebody will pay us, right? or that so we can have the lolles on the internet or whatever. We might be trying to steal software or get into infrastructure so we can get to the data other systems. We might be thinking about disrupting a brand that's very targeted attack or we might be going after increasing costs. So we might want to cause them pain by increasing the costs either in their person time or their compute time or whatever. So when you're red teaming, I actually want you to start here and decide what's the biggest target. What are you going to focus on today? try to disrupt a service? get data? You trying to steal software? What are you trying to do? Okay. And then you can attack, iterate, test, mitigate, repeat. Right? So you're going to model the attack, you're going to test learn from it, you might have a mitigation or two, and then you're going to repeat that, right? And this is how we then build essentially security practice and security understanding for everybody. And why do we do this iteratively? Is because new attacks will also come. It's because our architectures and our implementations will change. It's because maybe you're testing out more than one model. And it's also because we're focusing on the parts of the system that we can influence and control. And we're keeping it simple, right? If a simple protection works like the softwarebased guardrails, then we use that, right? Before we go reach for the most complicated solution. And if we do this regularly, not only are we improving our own knowledge and understanding, but we're also building uh infrastructure that we can reuse over time. And so how can we do this for AI systems? We can start with threat modeling. There's plot for AI. If you haven't heard of it, it's open source. You can download it. It's free. It goes over a whole bunch of AI risk categories for threat modeling. There's also stride lun if you want to add anything. And once we've identified, we have our architecture, we've found the target, we've identified the threats, the potential ways to get in towards the target, then we integrate actual testing into our MLOps infrastructure. And if you're not doing a I guess now we call it AI ops anyways. Um if you're not doing AI ops, that's uh fine for now. But even if you're using somebody else's machine learning model or AI model, I encourage you to start thinking about how you actually do integration testing and testing of that endpoint over time. Because if you ever want to switch out

Segment 7 (30:00 - 35:00)

that model for something else, then you can have that testing already going. You can already be trying to see what's happening there. And this requires these skills. So if you have any of these skills, you can help with MLOps or AI ops. And in addition, if you're really offering products that even have somebody else's AI model in them, you need to be doing cost testing. So you can do load balancing, right? So I don't know if people here are already doing LLM load balancing or other types of load balancing, but you can distribute your costs, your token spend across numerous models. You can do stress testing so you can decide what happens when the system is under stress and you can do eval. Who knows what eval is? Okay, eval is like I set up repeatable testing for my AI model or AI endpoint so that I can evaluate model A versus model B versus model C because I promise you even small model versions can greatly change outputs. And so even mini versions of an update can change an output. And this is something that if you're using it in a real production system or even just to write your code, you probably want your own evaluations to figure out is it useful for you or not. And then finally, obviously part of MLOps is monitoring. And whatever monitoring system you use, whether it's one that you've built or one that you do, you want to monitor what's happening in your systems so that if you notice certain threats actually popping up in your system, you can then decide to red team them and to add them to your next risk radar and to talk about them and integrate them into your testing. Right? Does this make sense? Awesome. Okay, final myth for today. The next model version is definitely going to fix this. Like [snorts] I heard from Anthropic, definitely cloud code number five is going to be super great and not give me any bugs, right? No hallucinations anymore. Okay. Um there was a really cool report on looking at how do people use AI systems? them? Right. And uh this was like collected across many different things and put together. It's quite nice to read. But here's a really useful graphic from it. And we're just going to look at the majority cases. Okay. 28. 3% is practical advice. How do I do this? Make me a fitness routine. Teach me this thing or build me a learning plan or something like this. Right. Next biggest is writing. edit this for me, help me think about this and so forth. And then the third biggest is what is X, right? Specific information. If you're do I have any product people in the room? I've I have people that have been around product people long enough, right? We all know if you we've all got the product person in the room, right, in our head, we got the jobs to be done or the user wants to blah blah, right? That's in your head. So I ask you if the user wants to get advice, ask what is X or help with writing, where is privacy and security on your priority list? Is it the number one thing that's going to get in the next model release? No. We can laugh. It's funny. Can we can relax and laugh. Yeah. Um No. Right. Like, I'm going to make something that gives even better at writing regardless of how we get there. I'm going to make something that's really good at giving advice and being really kind and friendly. Like, I love sometimes using AI models now because I feel like so brilliant when I log off my computer. I'm like, I'm the smartest human ever, you know, because it's so like, Katherine, that's a brilliant idea. Yeah, I thought so, too. um or that is basically a replacement for Google search, right? So, if that's your product dream, that's what you're going to be building for. And that's totally fine, right? Like, I'm not here to harsh anybody's product goals, right? But we can't be waiting for it to save us, right? And you know, maybe there's also some other product goals. So, uh I'm not here to tell anybody not to use any browsers that they want. use whatever browser you want. But literally on stage at a Silicon Valley panel, the Perplexity CEO was like, "Yeah, we're building a browser so we can do like really good ads. " So, you know, it's kind of out there in the open. You don't have to look far. And they're not the

Segment 8 (35:00 - 40:00)

only ones. If you weren't following the news, uh the Simon guy who really likes Chad GBT um was talking uh about, hey, can you give me a summary of my memory features? And the memory feature was literally profiling him and saying, hey, the user likes this, the user likes that, the user wants these things, right? And this is profiling that's happening if you have the memory feature turned on. It's not turned on by default for uh European residents or EU residents, but it is for our American friends and probably numerous other geographies. And so this is kind of like profiling, right? And if you've ever worked in advertising, profiling is a really good start to delivering ads or other services. And it's also right out loud. So, OpenAI a few years ago started hiring for what they call a model designer. You can look it up. It's active on their careers page. You can maybe even become one and uh add a little privacy and security flavor to the model. Anyways, these model designers, they're really product people and design people and they now lead machine learning teams and say, "We want to give this model this personality. We these capabilities. We want this model to engage people in XYZ ways, right? And test out this iterative thing to of course increase engagement, increase use, increase our active thing. Like, have you ever noticed like an LLM now like will always ask you a question at the end? It's like LLM bait, you know, like cuz then you want to answer it. You're like then you're like, well, I actually already got my answer. I don't need to be here anymore. Um, okay. So like this is the goal which is totally fine. Again that's everybody's got to make money. We live in capitalism, right? So I get it but at the same time we shouldn't look at this and think privacy and security is going to be the number one priority for the next release. And so here's me at DMAT. I was there for a big data conference that happens uh every year in Germany. And here's me. This is my really cool gaming laptop. I built it myself from scratch. I'm very proud. It has 30 gigs of GPU in it. And ne uh next door to my computer is uh one of uh the other organizers computers. And I set them up. I got them serving and we threw what I called a feminist AI land party. Who's old enough to have ever been to a LAN party? Yes, that's amazing. I love land parties. And I've started throwing them again. and I had a switch and at one point in time we got 30 people connected to me serving LLMs on my little machine. And I mainly bring this up a because I really want people to host more land parties. But invite me, I will come. But um if and I will bring my computer. I have a whole case now. But the other reason is to try to diversify your model providers. Test out other things. Get an account on hugging face. You don't have to build a laptop or your own com gaming computer, but if you want to, I have a how-to on how to do that. Try out Olama. Olama works on everything. Now, try out GBT for all. These are local models that you can run on your machine. Claude also has a lot of local only options and so does Copilot and so does uh other things, right? So, so but try out some local models and really get curious about switching up your model provider. Just like test it out maybe once you had a bad experience with Gemma or with whatever many years ago, but try it again. Just get used to testing out different things. them locally because I think it's useful for us to know about whenever ads come and you don't want an ad experience. Get used to working locally. And also if you get used to this you start to build the experience of how do I run a model at what point in time does it crash how much memory does it use so on and so forth so that you can try out cool open-source openw weight models so obviously all of the open weight models are running locally and I would call aos which recently got released by EPFL and ETH along with support I think from the Swiss government was maybe the first open-source model because they actually also listed all the training data they used. They listed privacy and security testing and they also open sourced their training code which is pretty cool. So um also if you're working in German, I don't know if it's speak Swiss German or Hawk, I'm not sure, um but give it a try. I'm sure it can do both. But these are ways that we can kind of diversify your model providers, provide

Segment 9 (40:00 - 45:00)

some resiliency and decide if privacy and security become important to your org or certain aspects, then you can test out model A versus B versus C and you can make your decisions, right? Because you're not handcuffed to just one model. Okay? So, at the end of the day, we can't wait for somebody else at an AI vendor to come save us from a privacy and security perspective. Nobody's going to swoop in like a superhero and say, "Hey, guess what? We figured out how to solve all these problems. Here's your new model that definitely doesn't give you copyrighted code or whatever, right? Only we can save ourselves. Everybody here's a grown-up. You probably already learned this, but it bears repeating. " And so my question for us, because again, it's about responsibility, agency, and ownership. I come originally from Southern California and we grew up with a lot of Smokeoky the Bear and Smokey the Bear was like only you can prevent forest fires by not smoking in the woods and I was like nine like I don't smoke in the woods you know I don't understand but the whole point is that only our own care and intervention is going to help uh reduce this risk and so my As for you, we're going to do a little exercise. We're going to go through all the different mitigations and things that we talked about today. And I'm going to ask you to clap or raise your hand or do whatever it is you feel like doing. If you see something that you're like, I'm willing to opt into this. I'm willing to try this out. Just try it out. You don't have to do it. Understood? — Okay. Good then. All right. What can we take on first? Can we test and implement guard rails? Who's up for that? Okay. Can we use or maybe even train differentially private models? Who's interested? Okay. Can we run an interdisciplinary risk radar at our organization? That's popular, but thanks. Thanks for the whoop. — [snorts] — Can we develop robust security and privacy testing? Okay. And can we evaluate or maybe even use and maybe already doing this open weight and local models? All right. Thank you very much. If you didn't clap, we can come debate later. I'll be at the open spaces. If you didn't like any of my suggestions, you can tell me. Um, I have a newsletter. I have a YouTube. I get you started on uh red teaming in some of my latest YouTubes. Uh you can ask I think we have time for maybe one question. Please rate this talk. Tell me how I can make it better for you in the future. And I have a book from O'Reilly that sold quite well and is interesting. It's mainly focused for other machine learning people or data scientists. How do we add privacy and security into normal data science and machine learning workflows? And by the way, is and um the German version also has some updates. So has some more recent attacks and things like this. Thank you very much for your time and hope to see you at the open spaces. I take the question or [cheering] thank you so much Katherine for the keynote. Uh I think we have now about 20 minutes break and yeah — we have time for one. Does anybody have a question real quick? — One question. — Yeah. — Yep. — So we talked a lot about root filters and like how can we put guard rates but is there anything that can be done in the intrinsic model itself because at the end of the day we all are using polar models. So how can we like save our data to be used as a training data? — Yes. Uh this is a great idea. Um and one really interesting piece of research recently came out on routing. So optimization of routing. So the cool idea is that we're starting to have enough models available that we can think about an actual router and this router can operate of it takes in a request. It decides which model is the cheapest model to still also accurately answer this request. But you could also add in privacy or security or any other concerns that you have for that. So you essentially train this router and then the router decides or sometimes early on it doesn't know yet. So it will sample from the models and then you give feedback. It worked for me, it didn't work for me, right? And what they found is this reduced like 60% of cloud costs

Segment 10 (45:00 - 46:00)

because more often than not we were totally fine with the cheap model or the local model but we're just paying and using like the pro most pro elite whatever whatever. So I think uh and I'm going to be adding and adding some GitHub repos on this that we can also add privacy and security evaluation into this and we can decide you know when to shift maybe even at an organizationalwide effort when to shift to a local model for you know internal communic confidential information and when to shift to maybe a cloud model for other things right and I think this will only increase over time but it's really good intuition and saving your traces is saving your data and your evaluations is a really good first starting point to then training your own guardrails or training your own router that can also implement guardrails. By the way, uh Purple Llama is open source. So there's a whole class of models from Meta called Purple Llama. They do everything from prompt injection attacks to things like we think this is private, crime, we think this is inappropriate or harassment or whatever. that's all an option and there's also plenty of good research on also prompting your own LLM as a judge or something else but I think at the end of the day you probably should eventually train your own guard rails and you won't train it into the model because you're probably not training models from scratch but you will use that external algorithmic one and you just have a filter on what gets through to the LLM and whatnot. Okay, thank you very much. Have a great day today. Hope to see you later. —

Другие видео автора — InfoQ

Ctrl+V

Экстракт Знаний в Telegram

Экстракты и дистилляты из лучших YouTube-каналов — сразу после публикации.

Подписаться

Лучшие методички за неделю — каждый понедельник