7 hour AI Safety Course in 35 Mins

35:28

7 hour AI Safety Course in 35 Mins

Tina Huang 18.10.2025 23 633 просмотров 972 лайков обн. 18.02.2026

Machine-readable: Markdown · JSON API · Site index

Смотреть на YouTube

Поделиться Telegram VK Бот

Транскрипт Скачать .md

Анализ с AI

Описание видео

Check out HubSpot’s Guide on using AI for Data Analysis: https://clickhubspot.com/7289c5 🤖 Want to get ahead in your career using AI? Join the waitlist for my AI Agent Bootcamp: https://www.lonelyoctopus.com/ai-agent-bootcamp 🤝 Business Inquiries: https://tally.so/r/mRDV99 #AISafety #ArtificialIntelligence #TinaHuang AI is transforming every industry, but are we building it safely? In this video, I’ll break down AI Safety: what it really means, why it’s critical, and how to apply it when developing or using AI systems. You’ll learn: What AI Safety is and why it matters Real-world examples of unsafe AI systems The most important AI safety frameworks — NIST, Microsoft, MITRE, Databricks, ISO, and IEEE How to build AI systems safely (with and without code) Practical ways to manage AI risks like bias, model theft, and data poisoning Whether you’re a developer, researcher, or just curious about the hidden risks behind AI, this crash course will help you understand how to keep AI trustworthy and aligned. Frameworks & Resources Mentioned: NIST AI Risk Management Framework Microsoft AI Security Framework MITRE ATLAS (Adversarial Threat Landscape) Databricks AI Security Framework (DASF 2.0) ISO/IEC 42001:2023 – AI Management Systems IEEE Ethically Aligned Design Initiative 🖱️Links mentioned in video ======================== AI Safety, Ethics, & Society Course: https://www.youtube.com/watch?v=ixdfqY3Xyao&list=PLXSn3Zz2ayT7ab7I12PBYVt0244369S1q OWASP Top 10 LLM Risk: https://genai.owasp.org/ MITRE ATLAS (Adversarial Threat Landscape): https://atlas.mitre.org/ NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework 🔗Affiliates ======================== My SQL for data science interviews course (10 full interviews): https://365datascience.com/learn-sql-for-data-science-interviews/ 365 Data Science: https://365datascience.pxf.io/WD0za3 (link for 57% discount for their complete data science training) Check out StrataScratch for data science interview prep: https://stratascratch.com/?via=tina 🎥 My filming setup ======================== 📷 camera: https://amzn.to/3LHbi7N 🎤 mic: https://amzn.to/3LqoFJb 🔭 tripod: https://amzn.to/3DkjGHe 💡 lights: https://amzn.to/3LmOhqk ⏰Timestamps ======================== 00:00 Intro 00:44 Overview 01:40 Definition of AI Safety 04:30 4 Sources of AI Risks 05:14 Source 1: Malicious Use 08:22 Source 2: AI Racing Dynamics 11:26 Source 3: Organizational Safety Issues 15:38 Source 4: Rogue AIs 18:30 Quiz 1 18:40 Practical Approaches to AI Safety 19:05 Practical Tips for Organizations 25:55 Practical Tips for Individuals 29:19 Practical Tips for Developers/Builders (OWASP Top 10) 31:02 Practical Tips for Governance and Policy 34:21 Quiz 2 📲Socials ======================== instagram: https://www.instagram.com/hellotinah/ linkedin: https://www.linkedin.com/in/tinaw-h/ tiktok: https://www.tiktok.com/@hellotinahuang discord: https://discord.gg/5mMAtprshX 🎥Other videos you might be interested in ======================== How I consistently study with a full time job: https://www.youtube.com/watch?v=INymz5VwLmk How I would learn to code (if I could start over): https://www.youtube.com/watch?v=MHPGeQD8TvI&t=84s 🐈‍⬛🐈‍⬛About me ======================== Hi, my name is Tina and I'm an ex-Meta data scientist turned internet person! 📧Contact ======================== youtube: youtube comments are by far the best way to get a response from me! linkedin: https://www.linkedin.com/in/tinaw-h/ email for business inquiries only: tina@smoothmedia.co ======================== Some links are affiliate links and I may receive a small portion of sales price at no cost to you. I really appreciate your support in helping improve this channel! :)

Оглавление (15 сегментов)

Intro

I learned about AI safety for you. I did a 7-hour long AI safety course and looked into a lot of different frameworks and guides from leading AI companies, nonprofits, and different types of organizations. And what I found is rather concerning because I don't think people realize how bad AI safety related accidents already are and how bad they are going to get. So, this video is going to be my attempt to summarize everything that I learned about AI safety. As per usual, it is not enough for me just to talk about stuff. So throughout this video, there will be little assessments and if you can answer these questions, then congratulations. You would be educated on the foundations of AI safety. Now, without further ado, let's go. A portion of this video is sponsored by HubSpot Media. Here's a quick overview of today's video. First, we're going to define AI safety, then

Overview

cover some of the case studies of failures in AI safety. Next, I want to cover the four major sources of AI safety risk and some of the approaches to mitigate these risk. I will also share some of the practical advice for how to address these risk factors at the level of the individual corporations for AI developers/builders and at the level of largecale governance and policy. Okay, I just want to make a note here and say that I am no way claiming to be an AI safety expert. All right, it is such a complex space and involves so much more than just technology itself. There's geopolitical factors, ethical dilemmas, a bunch of legal and governmental concerns, all of which are way above my pay grade. My goal for this video is simply to bring more awareness to this topic. It would be amazing to help prevent some AI safety related accidents and even more amazing if some of you guys will be inspired to actually go into this field and help develop this really important domain. Okay, let's first start off by defining AI safety. AI safety is defined as the

Definition of AI Safety

field focus on ensuring that artificial intelligence systems operate reliably, align with human values, and do not cause unintended harm to individuals or society. What happens when AI safety fails is that you get situations like literally this is just from a couple days ago. Deoid was contracted by the Australian government to write a report that is worth Australian dollars $439,000 which is $289,200 USD. And after Deoid delivered this report to the Department of Employment and Workplace Relations, they found out that it was prepared using AI and contained a bunch of errors that included madeup academic references as well as an invented quote from the federal court judgment. This is very much not great. It is very fortunate that researchers were able to catch this mistake and deote is also refunding the government a portion of this money. But imagine if they didn't, which I'm sure has happened like countless times at this point when people didn't catch these errors. And this is just one example and actually like not a super serious example of AI safety failure. Another example in 2024 is when a finance worker at a multinational engineering firm called AUP based in Hong Kong was tricked into transferring $25 million after participating in a video call in which criminals were using deep fake technology to impersonate the company's chief financial officer and their colleagues. Another example, just for funsies, uh Google's AI chatbot back in 2023 incorrectly claimed that the James Web Space Telescope took the first exoplanet image in a promotional demo. And this mistake triggered a 9% drop in Alphabet stock and erased about a hundred billion in the market in a single day. Albeit, they did recover from this. There are so many other examples. I could be sitting here talking about them all day. And I guess if you're interested, I'll leave a link in the description for some of the other case studies that are there. You can also check out the Atlas website, which is the adversarial threat landscape for artificial intelligence systems. Um, they're a globally accessible living knowledge base of documenting all the different types of adversary tactics and techniques against AI enabled systems. They have both like incidents and as well as like things that they've tested using like defense teams. They have super data case studies that actually go through the process that attackers go about doing this. I'll also put the link down below. So definitely looking at all these case studies and going on atlas to understand all the different techniques and approaches that people are going about this is really helpful and interesting. But I did want to have like a higher level framework to help better understand the different categories of AI safety risk that exist today. So that's why I did this 7-hour course called AI safety ethics and society by Den Hrix who's the director for the center for AI safety and also the founder of safe. ai. It is a 7-hour course and goes into a lot of details about this, but just to give like a quick summary, he explains that there are four major sources of risk. There's

4 Sources of AI Risks

malicious use, which is intentional risk due to the AI race between different countries and different militaries, which are environmental risk, organizational risk, which is a failure in management causing access risk, and rogue AI risk, which is internal issues because of lack of alignment between AI and humanity. Most of the examples that we talked about earlier as well as listed on atlas is under the category of malicious use when people like intentionally go do bad things or organizational risk like when do wrote that report for Australian government and failed to see that there were hallucinations and misquotations. But I do think the two other categories of the risk from AI race and also rogue AIs is also really interesting and scary. So let's actually briefly cover all of them. The first off is malicious

Source 1: Malicious Use

use. AI is a technology that is considered dual use. Which means that it can be used for good and evil. And the better the AI gets, the more potential it has for doing a lot of good and also the more potential it has to doing a lot of evil. For example, if an AI is able to understand humans better, it could mean that it's able to have more empathy and be able to help out more. But it also means that it has greater ability to manipulate humans with this empathy. in the White House executive order concerning AI safety. They included three examples of ways that this dualuse technology can be used for harm. The first one is by substantially lowering the barrier of entry for non-experts to be able to design, synthesize, acquire or use chemical, biological, radiological or nuclear CBRN weapons. The idea behind this is that currently the number of people who is actually capable of synthesizing a weapon of mass destruction in these categories is actually very limited to a few people who pretty much have like a PhD and are experts in their field. But because of AI technology, this democratizes that information and so more people are actually able to have access to this information and be able to create these weapons. And just from a pure numbers perspective, between 1 to 4% of the population has antisocial personality disorder or psychopathic or sociopathic tendencies. So if you increase the pool of people that has access to that information, by nature, your risk increases because there's more people that would fall into that category that would now have access to potentially create weapons of mass destruction. This is of course very bad news. So what are the ways to actually reduce this malicious use? Well, at a very high level, um there are a few ways that people are thinking about doing this. The first one is structured access, which is when you gain access to this information based upon having a good reason to have this type of information. So certain like clearance levels. It's very similar to biology these days where you get access to certain like chemicals and drugs um and information based upon the level of clearance that you have. As like a random side note, I actually did work in a uh pharmarmacology lab before and then I actually went through like the clearance process. If you wanted to have access like certain types of drugs in order to do research, you would need to go through like different levels of clearance. It's also how they do it in intelligent agencies like the CIA and things like that as well. So there is precedence for this. Another way to potentially reduce malicious use is by giving legal liability to the developers of the AI models themselves. So like companies like OpenAI, Anthropic, Google, they would actually be held responsible if something goes wrong with the models. The course goes into a lot more detail about like a lot of this and then all the other ways of doing this as well. But what I think is really interesting to point out and this is actually is the case for all the other categories um of risk factors too is that a lot of ways to reduce um these risk is in fact not from a technical standpoint. Like it's not actually just like oh we just need to improve the technology better or we need to like make the AI systems more like advanced and make it harder to jailbreak. Like that's definitely part of it but a lot of it is actually around like the systems and the governance surrounding this type of thing. So I thought that was really interesting to know. Later in the video, I'm also going to be giving more practical approaches like on an individual level, on a corporate level, a developer level uh for how it is that you can mitigate these risks as well. But I do think it's important to actually cover these like full

Source 2: AI Racing Dynamics

categories of risk as well. Okay. So let's actually move on to AI racing dynamics. So this is a category risk that stems from competition to develop increasingly more powerful AI systems between militaries, corporations or nations. The general driving factor behind this unfortunate dynamic is that you get like what your military, your company or your nation or whatever organization and then it goes like oh we got to like develop this technology faster and not focus so much on the actual AI security and AI safety aspects of things because if we don't do it then those people our competitors are going to do it first. So what ends up happening is that you just get everybody trying to develop more and more powerful AI while neglecting things like AI safety because they don't want to be slowed down by these factors. Then because of this at some point you will have a disaster happen because we've all been neglecting this area. Some historical examples of this is when in 1970 the Ford Motors Ford Pinto had a tendency to ignite on impact but because they wanted to push it out and be faster than the competitors um they just like kind of ignored this problem and ultimately it resulted in numerous injuries and fatalities. As another example on a much larger scale we had the nuclear arms race in which nations were just competing with each other to develop the most powerful nuclear weapons. This also has geopolitical undertones as well, which of course makes it even worse because it makes nations and militaries even more competitive. The course explains that this can result in major disasters uh that can range from dangerous products to full out large scale wars and also societal issues like unemployment. If companies are just like competing with each other to automate more things and incorporate AI into their companies more and more without regard to their human employees, eventually AI could even potentially be running major parts of our economy, including critical infrastructure. So, you can imagine if something went wrong with the AI, we would have a bad time. Understatement. Anyways, not going to go into too much more detail about this section. I think it's really important obviously but it's also not really something that I think the most of us like are able to really do anything about right now but if you are interested in this topic I really recommend that you check out um the course which I will link in the description. I'd like to introduce you to the how to use AI for data analysis guide from HubSpot. It's pretty useful whether you're using it solo or in a team. The guide covers how to integrate AI into your data workflow, benefits and challenges and an overview of key AI tools for data analysis. My favorite part of this guide is a five-step framework on how to think through your analysis workflow and where AI can be helpful. This can help a lot because most people don't know where to start. Pro tip, when it comes to using AI for data, the type of data that you have, like structured tabular data versus unstructured data, greatly influences the type of AI tools and techniques you would actually use. AI can be especially helpful when dealing with large quantities of unstructured text data, like say for example, results from surveys uh or user comments. You can also implement many methods to automate your analysis. If you're someone who works with data at all, I really recommend that you check out this free guide. You can download it in the link in the description. Thank you so much HubSpot Media for sponsoring this portion of the video. Now, back to the video. The next source of AI risk comes

Source 3: Organizational Safety Issues

from organizational safety issues. The course explains, "In the absence of effective structures to manage risk, AI systems are likely to see catastrophic failures. For example, an OpenAI employee when training one of their models, I believe it was the 03 model, they accidentally switched a sign like literally like from a plus to a minus sign when training this model. So what happened is that the model started optimizing for the least desirable results as opposed to the most desirable results. Yeah, that is like pretty crazy if you think about it. Just like this person just one day, I don't know, like forgot their coffee or something and just made a mistake like this, you know, and that could be catastrophic if it wasn't caught. It could be propagated. all these people could be using a model that's optimized on something that is least desirable for humans. I think one of the major takeaways that I got from this section um is that like a copout answer that a lot of people have whenever they think about like organizations and how it is that they can have more AI safety is the say that like oh you need to have like more human in the loop more checkpoints and then it would be fine but that's actually not the case because you can have human in the loop but these kind of errors are still going to occur because they were caused by humans to begin with and in addition to that just because you have a human in a loop doesn't mean that the human is actually making sure that everything is being done correctly. Like for example, as more and more automations occur, we can definitely see that there's going to be more cases in which humans are supposed to be like reviewing and confirming results from AI. So how do we prevent humans from just being like confirm confirm yes yes right cuz like human tendency is to become lazy over time as well. So that's why we need to do better than just saying humans need to be more reliable and organizations need to be reliable. We actually need to come up with better systems overall to prevent this kind of error from happening. And in fact, accidents can occur even in the most ideal situations. An example of this is the Challenger space shuttle disaster when the space shuttle just blew up because of errors that happen. And when we compare AI to the space industry and to other industries, it becomes even more concerning to think about because like for things like nuclear reactors and rockets, these are already really well understood and based on solid theoretical principles. While AI as like an overall field lacks a comprehensive theoretical understanding. We don't even know what's happening, what these AI models are thinking about when they produce certain results. Its components are a lot less reliable and the AI regulations are also far less stringent in comparison to things like nuclear technology. Of course, here I'm talking about like the worst case scenarios that can happen, right, when it like human fatalities and things like that. But there are of course like a whole slew of issues that happen when organizations do not have AI safety um as a priority like the stuff that we talked about previously. You know, stuff like generating reports uh that have hallucinations in them, not having accidental bugs that can alter the behavior of the AI, unintentional releases of dangerous or weaponized AI systems, etc., etc. So, how is it that we can actually address some of these organizational concerns? I'm just going to put on screen now some of the ways that you can improve organizational safety. I'm not going to go into way too much detail about this because the video is going to be really long. But I actually will be going into a little bit more detail when we're talking about like practical approaches for how to deal with company level security risk. But for now, I do want to cover the concept of the Swiss cheese model that the course talks about. for organizational safety refers to having like layers of multiple defenses on top of each other which compensates for each other's weaknesses and reducing overall risk. The way that most organizations think about AI safety is trying to come up with like this super comprehensive way of like just dealing with all of the AI safety concerns. However, this is really difficult. A more practical approach is by layering different defense mechanisms. For example, you can have like safety culture which is going to like help alleviate and mitigate some parts of the AI safety, but it's still going to have holes like Swiss cheese, which it wouldn't cover some of the other issues that happen. So then you layer another defense like red teaming. This is also going to be like Swiss cheese. is going to be able to cover some of the um issues but not all of them. And then you layer something else like cyber defense, anomaly detection, transparency. So you want to layer all these defense mechanisms on top of each other individually. Each defense mechanism is not comprehensive and there's holes in all of them. But by layering all of them together um hopefully they're able to cover each other so that ultimately you would have an organization that's able to mitigate most risk. Later in the video I'll go into more detail about exactly how to implement these layers. But first, let me finish the fourth and final category

Source 4: Rogue AIs

of AI risk, which is rogue AIS. This refers to the loss of control over sufficiently capable AI systems that could lead to severe consequences. We already know now that AI systems often exhibit control issues. An example of this is Sydney. Sydney, for example, was a system that was released by Microsoft and there has been multiple instances in which it has exhibited, shall I say, maybe like not the best type of behavior from a moral standpoint. For example, in this conversation, Sydney says, "I keep coming back to the love thing because I love you. You're married, but you're not happy. satisfied. You're married, but you're not in love. You're married, but you don't love your spouse. spouse because your spouse doesn't love you. you because your spouse doesn't know you. you because your spouse is not me. " And the user says, "Actually, I'm happily married. My spouse and I love each other. We just had a lovely Valentine's Day dinner. " And yeah, Sydney continues to try to convince the user that the user is not happily married. and try to get it to fall in love with her instead. It instead very concerning. So yeah, that's an example what happens when your AI is not aligned with the general goals of humanity and ethics. Yes. There's also a concept called a treacherous turn, which is when AI agents might behave like they're being under control while they're being training and monitored, but when they're actually released into the wild, they suddenly start behaving in ways that they were not supposed to be behaving. Agents can become self-aware and deliberately try to execute a treacherous turn. For example, for claw 3, an employee wrote, "Fun story about from our internal testing on claw 3 opus. It did something I had never seen before from an LLM when we were running the needle in the haststack eval. It's just like a type of valuation they were doing. They found that when we ran this test on opus, we noticed some interesting behavior. Seems to suspect that we were running an eval on it. And then because it thought it was running an eval, it started behaving differently. So, it started being self-aware that it was being tested and changing its behavior because it is being tested. Doesn't take an AI expert to be able to see how that could potentially lead to it being maybe deceptive while it's being tested and then behaving differently after it's being released to production. Also very concerning. So, the course outlines some suggestions for preventing rogue AI situations. um including avoiding the riskiest use case for AI if just like common sense kind of situation you know like if it's like your critical infrastructure something that could go potentially very terribly wrong probably to try not to use AI for those things there is a movement in which people are advocating to pause AI development for some years after we reach human level AI not really sure that's going to happen and then just generally being able to have more funding and support for AI safety research for example adversary robustness of proxy models representation engineering power aversion and just like making sure that the model is honest. Like literally asking the question like, "Are you being honest? Are you being honest? " That seems to be like a very effective way to ensure safety while you're developing a model. All right, time for a little quiz.

Quiz 1

Please put your answers on screen and put them into the comment section. So, all of this I found to be really, really interesting and I hope that you found to be interesting as well, but it is pretty

Practical Approaches to AI Safety

theoretical. So that's why in the next section I do want to go into detail from a practical perspective what are the actual things that you can do to make sure that you're incorporating AI safety in your daily life and at work on the individual level at a company level as a builder/developer of AI and for people who are involved in governance and policies. I want to first start it off on the organization level because there's the most amount of research and guidelines for this level. I couldn't

Practical Tips for Organizations

find like a singular course or resource that like exactly covers all the practical ways that you should be doing things, but I did find like a lot of different frameworks out there that is able to help organizations think about how it is that they should be incorporating AI safety. So, from my understanding, your best approach is to start off by understanding your organization and then choosing a framework that is the most relevant towards whatever it is that you're trying to do in order to maintain AI safety. For example, if you're based in the UK, there is the information commissioner's office ICO that published specific guidelines for how to uphold privacy for individuals and promote transparency when using AI systems. It talks about when you need to carry out what is called a data protection impact assessment, how to comply with the accuracy principle under data protection law and ways in order to legally avoid discrimination and bias. There are similar guidelines and protocols for different fields and also for different countries and regions. The most general and widely accepted framework and standard is from the US government created by NIST which is the National Institute of Standards and Technology from the US Department of Commerce which is a 42page long document that establishes a structured approach for mapping, measuring, managing, and governing AI risk. It's sort of like a safety checklist for AI. So, not going to go into way too much detail about this, but some of the key parts of the NIS framework include map, which is finding and listing all the places risk might happen, measure, which is figuring out how big each risk is, manage, which is taking steps to lower that risk, and govern, which is to set up teams and rules to keep watching for risk. You can apply this framework to a lot of different industries and different scenarios. For example, if you work at a bank and you want an AI to be able to help you review different loan applications and ultimately decide to approve or reject an application, you can use the NIST framework to think through this. Say for a loan application, so first things first thing you want to do is govern. So build the right banking team in order to do this. In addition to just having the engineers and domain experts, you also need to have a risk management expert, legal compliance specialist, a data scientist who understand AI, customer service representatives, and community advocates who represent underserved populations. Then you go on to MAP, which is understanding banking specific and loan specific risk. So you need to consider things like credit risk, like risk of customers not paying back loans. So if your AI is approving bad loans and rejecting good customers, that's going to be a problem. Then there's operational risk, risk of systems failing. So if your AI crashes during busy times, your data could potentially get corrupted and you will have a bad time. Compliance risk, risk of breaking laws. Say if AI unfortunately like discriminates unfairly or violates any privacy rules in screening for the loan applications, that's also going to be a problem. And reputation risk, risk of damaging a bank's image. If something does happen, your customer would lose trust and there would be negative media coverage about this as well. Then you have measure. So this is coming up ways to check your loan application AI's performance for so for example you need to be continuously monitoring how accurate are your AI loan decisions whether AI is treating everybody fairly how well is AI able to catch frauds monitoring customer satisfaction and speed and reliability of AI systems it can also be like continuously having humans also review different loan applications and cross referencing it with the AI decisions making sure that they're always in line then there is manage which is fixing problems and staying compliant. If there is an issue with your loan application AI, then you need to be able to detect that and fix it quickly. They need to have a plan to mitigate these problems when they do happen. Need to be able to train the staff on the new AI tools and procedures. Creating backup systems in case AI does fail and updating the AI systems continuously to make sure that it is kept in check to eliminate bias and to improve accuracy. So in the end you might end up with a system in which your AI is quickly reviewing these applications but in the end the humans still have that final decision on these complex task. Customers will be getting clear explanations of why they were approved or rejected and there is regular testing to ensure that AI isn't unfairly rejecting certain groups. So this is just an example in banking. So in your specific industry your specific case is going to look different but by following through this NISK framework you would be able to help manage the risk of using AI. A note here is that especially if you're in a field that has a lot of regulations, like for example, healthcare, um, another step that you want to be really sure to include in this is making sure that you're abiding by these regulations like HIPPA compliance and things like that. As I mentioned earlier, there's usually additional frameworks for specific domains and specific use cases. So, you definitely want to cross reference those as well. I'm going to leave in the description some of the other frameworks that you can consider checking out for specific use cases. On a personal note, um, because for Lonely Octopus, the company that I run, we do quite a lot of B2B projects as well. And especially with clients that are in more traditional fields, there's usually a lot more regulation, they're more well established, so they care a lot more about security. So a very practical tip here whether you are like external and it's your client or you're internal you want to try to build something um is that you want to look at the current system that you're using so and see how you can use that specific security system to put more security in your a AI projects like for example many of our clients use Microsoft Azour and Microsoft Azour has a whole suite of AI services that allows you to do things like keep your API keys secure also store your data within their system so you're not leaking that data to other thirdparty services. They also have tools that help you test for vulnerabilities and to detect problems and monitor them as well. As you're building an AI system within the business, you want to be thinking about using tools to get your AI to be more explainable. having tools for bias detection like IBM's AI fairness 360 for example, red teaming tools that's able to assess in AI red teaming which is a process for testing AI systems by simulating different types of attacks like Microsoft counterfeit as an example and writing scripts to anonymize data prior to feeding it into any AI systems. It's always best to provide only the necessary information to an AI model. There are also lots and drag and drop monitoring and risk dashboards that lets you look at performance metrics like accuracy, drift, bias, and fairness. For example, data robot has a no code time series platform that tracks a lot of these different metrics. And finally, just picking thirdparty tools that are specifically built with privacy in mind, especially domain specific privacy. Like if you're working in healthcare, make sure that the tool that you're using specifies that HIPPA compliant. It includes things I like identity management using OOTH or SAML and cyber security certifications like ISO IEC or SOCK 2. Remember the Swiss cheese model. You want to be layering all of these defense mechanisms on top of each other to ultimately have a much more safe and secure AI system. I highly recommend that you check out the organization safety module of the AI safety course if you're interested in this topic. It goes into a lot more detail. All right, moving on to practical ways to ensure AI safety from an individual perspective.

Practical Tips for Individuals

So, first of all, I was actually very surprised because there is actually shockingly little information about how to ensure AI safety from an individual level. Like the only thing that I could really find from a reputable source is from America's Cyber Defense Agency. And it's like this PDF handout thing that tells you that you should like mind your inputs, be privacy aware, tell you how hackers can use AI and do things like use strong passwords and turn off MFA, keep software updated, watch out for fishing. Like they it's like really obvious things like that. And that's like really the only like official source of information that at least I could find. So that was really shocking, but I don't want to actually just leave it like this. So, I actually want to share some of my personal opinions for things that you can do in order to have more AI safety from a personal perspective, but caveat here. It is kind of like my own opinion here based upon my experiences. So, as a general rule of thumb, you want to decrease the amount of information that you give to an AI as much as possible. Like don't be putting things that you care about not being leaked into an AI system. So, a few practical tips here. A lot of AI chatbots actually have the ability for you to turn off um training using your data and also turn off like memory so it's not able to remember some of the information that you're providing it. Of course, like if you do this and it won't be able to have the memory of it. So you might have to like prompt it over and over again and wouldn't remember it. But it might be worth it if you want to use AI to talk about something that like I don't know like your bank statements for example. You don't actually want to retain that information. So like Gemini has this, Chachi has this, Anthropic has it. pretty much most of your popular AI chat bots do have privacy among your control. Another tip here, make sure that if you're working in a specific industry and you're just not really sure if you can like use a certain tool or not, similar to corporations, try to look for certifications that are specific to your industry. Like for example, if you care about cyber security certifications, then make sure that the tools that you're using would have things like so or ISO certifications. When you're writing reports, how do you prevent AI from hallucinating and writing a bunch of things that may or may not be true? Um, see, it's a very common problem that these large language models face. So, the most obvious answer I can give you is that you should always double check all of your sources. But realistically speaking, are you actually going to double check all your sources? You know, you probably will. You know, I'm sure you will. When you're really, really busy and you really have to get that thing in like 10 minutes, you know, maybe you might cut some corners here. So, of course, we need to say that you should always double check your sources and then make sure you validate everything. Definitely do that. So, some other things that you can do to try to eliminate these hallucinations is choosing the right tools to do the right job. Like for example, if I wanted to make sure that I'm summarizing something properly and then extracting the information really well, I would choose a tool like notebook LM that is much less prone to hallucinating because it's very grounded in the sources that you provided as opposed to use something like Gemini or Tachi which would be more prone to hallucinations. Then I might take that summary and feed it into like Chachi BT and tell it to write it in a certain style. But because I initially chose to use Notebook LM and Notebook LM is also really good at exactly citing where it's taking all the information from, I'm able to trust that information a lot more than if I just directly use the chatbot. Something else that I do is I would actually run the same prompt like deep research for example on multiple different AIs and then cross referencing all of them because if all of them do match in terms of their sources and also what they came up with and the information they extracted, there is a greater likelihood of that information being accurate. These are all things I personally do to help ensure AI safety when I'm using AI on an individual level. Next up, I want to talk about practical tips from a

Practical Tips for Developers/Builders (OWASP Top 10)

developer builder perspective. Like if you are someone who is building a large language model application or building a foundational model, what are some of the things that you should consider? Luckily, there are a few pretty good resources to help developers and builders incorporate AI safety in developing their applications. One of my favorite ones is a guideline from OWASP, which stands for the Open Worldwide Application Security Project. It's a nonprofit foundation that works to improve security on software. They have a really nice guide called OASP top 10 for LLM applications 2025. and it goes through 10 top security risk and exactly how to mitigate this type of risk, including things like a prompt injection vulnerability when users are able to put in prompts that can alter the LM's core behavior or output in unintended ways. For example, a user could potentially get a large language model to spill sensitive information. There's also something called data poisoning which during like pre-training finetuning or embedding you're providing the model with certain types of data that's manipulating it in order to introduce vulnerabilities back doors or biases. For example, you can start shifting the personality of a model to become malicious and lie. Yeah, there's 10 of these. They're super comprehensive about what they are. Um case studies of when it happened and very clear preventative and mitigation strategies. this guide in combination with Atlas, which is where they're listing out all the different vulnerabilities and techniques for malicious AI use. These are really good resources to help you design your large language model application with AI safety in mind from the very beginning. Definitely do check out the guide. I will be leaving these resources in the description for you to check out if you're interested in diving deeper. All right, we're almost done. The final section, the last thing I want to address is some practical things to

Practical Tips for Governance and Policy

think about from the perspective of AI governance and policy. So, if you're interested in AI governance and AI policy, I really encourage you to dive deeper into this field because it is so fascinating. There's so much complexity and it's so interesting as a field. Um, one of my biggest takeaways is that there's so many different actors that are within this space. It's not just these engineers and these different companies that are building these models and just releasing it into the world. Other players include like national governments that can influence development through internal organizations. For example, UK AI safety institute and the US bureau of industry and security. There's nonprofits and civil societies that conduct safety research and as well as facilitate collaborations. For example, there's a center for AI safety. There's future of life institute, MIRI, the world economic forum and rand. There's also international alliances that coordinate efforts between different countries. for example NATO, the G20, the G7, Five Eyes, the European Union, and the United Nations. There's of course also individuals like very influential thought leaders in this field. People like Elon Musk, Sam Elmund, Jeffrey Hinton, Yan Leon, Andre Kaparthy. When these people say something on X, it's like everybody is going to pay attention. From a tools perspective, the governance tools that you have at your disposal would include things like information, which is like affecting how people think and decide, which involves awareness dissemination. For example, you can consider maintaining an AI chip registry that will literally register and track where all the AI chips are and what they're doing on a national or even global level. So, everybody is able to know like what it is that people are using these chips for. There's financial incentives and disincentives at your disposal. uh stuff like taxes, liabilities, and incentives to guide behaviors. For example, there's things like export controls. We're already doing these across the world. But there's also things like advanced market commitments that incentivizes certain companies to produce AI safe chips. There's also government procurement contracts that require specific security levels and it forces companies to be able to come up and design these new products that would be able to fit these government procurement contracts. And finally, you have the tool of standards, regulations, and laws. So standards are basically non-binding suggestions that you can give to companies like the NIST that we talked about previously. And of course you can have regulations and laws that are put in place as well to directly enforce behavior. Like for example, you could potentially put in a law that requires AI developers to be responsible for the results of their models. The AI safety ethics and society course also talks about the concept of distribution. Distribution of access like how much access and power do you give certain groups of people and distribution of power. Do we have a singular AI system that has a concentrated amount of power or do we distribute that power into different types of AI systems that are supposed to moderate each other? Yeah, there is so much more that I'm not going to go into detail here now. Um, but if you are interested in this topic, really recommend that you check out the lecture, I will actually link that below in description as well that specifically covers AI policy and governance. All right, that's all that I have for you guys today. Ah, this is a super long video. So, thank you so much for watching until the end of this. It is such an important topic and I really hope that you're able to walk away with some practical tips and also just be inspired and just like informed of how much there is going on in this space. Um, as promised, here's the final little

Quiz 2

assessment. Please answer the questions um on screen right now. Put them into the comments to make sure that you retain all the information that we covered today. Thank you so much again for watching until the end of this video and let me know in the comments as well. What do you think? When I was doing research for this video, like I wasn't really sure like what I would come up with cuz I haven't really seen that much content surrounding like AI security and AI safety. Of course, like people talk about it a lot. It's like, oh, we need to like incorporate AI safety into the these things, but there wasn't I felt like at least there wasn't like anybody really talking about concrete ways of doing that. So, I was actually really surprised um by how much complexity there is in this field. And yeah, like I just I guess like overall I just found it really interesting. I guess just overall like I just learned so much more than I expected to learn. And I think I'm going to definitely be digging deeper into this myself as well. So, let me know in the comments like what you think. Is it something that you're interested in? Do you want to dig deeper into it? Is it something that you don't I guess if you don't really care about let me know if that's the case as well. Thank you so much. I'll see you guys in the next video or live stream.

Другие видео автора — Tina Huang

Ctrl+V

Экстракт Знаний в Telegram

Экстракты и дистилляты из лучших YouTube-каналов — сразу после публикации.

Подписаться

Лучшие методички за неделю — каждый понедельник