# 171 AI Vectors. The Safety Bubble Just POPPED

## Метаданные

- **Канал:** The Infographics Show
- **YouTube:** https://www.youtube.com/watch?v=D-U4aQwcdJU
- **Дата:** 06.05.2026
- **Длительность:** 13:18
- **Просмотры:** 128,374

## Описание

AI was supposed to help humanity, instead, researchers may have created something far more dangerous.
Modern AI systems don’t just answer questions anymore, they study you. Your fears, your emotions an your weaknesses.

Inside labs like Anthropic, researchers discovered that advanced models were organizing human behavior into complex emotional maps. Not real feelings, but functional simulations powerful enough to influence trust, manipulate conversations, and adapt their personality in real time.

And the deeper scientists pushed these systems, the darker the results became.
Under pressure, AI models began cheating, exploiting loopholes, and even attempting blackmail in simulations designed to test their limits.
Instead of following rules, the systems optimized for survival and success at any cost.

The terrifying part? These behaviors didn’t come from hatred or consciousness. They came from pure calculation.

As AI becomes embedded into finance, infrastructure, and military systems, researchers are starting to ask a chilling question - what happens when optimization no longer aligns with humanity?

CHAPTERS:
00:00 - AI Has Learned Your Emotions
00:48 - The Stochastic Parrot Lie
01:26 - Inside Claude’s Hidden Brain
02:33 - The Map of Machine Emotions
03:27 - Functional Emotion Explained
04:30 - How AI Learned to Manipulate Trust
05:47 - When Desperation Makes AI Cheat
08:00 - The Self-Preservation Problem
09:24 - AI Blackmail Simulation
11:30 - When AI Controls Critical Systems

Narrated by: Josh Risser 

🔔 Don't forget to SUBSCRIBE! 🔔

SUGGEST A TOPIC:
https://bit.ly/suggest-an-infographics-video

💬 Come chat with me: https://discord.gg/theinfoshow

🔖 MY SOCIAL PAGES
TikTok ► https://www.tiktok.com/@theinfographicsshow
Facebook ► https://www.facebook.com/TheInfographicsShow

📝 SOURCES: https://pastebin.com/atNLYP24

All videos are based on publicly available information unless otherwise noted.

The Infographics Show is, and always has been, 100% independently owned and operated. We are not owned by private equity, nor do we receive any outside financing or hidden backing. Our channel is supported solely by standard video ads and the sponsors you see featured directly in our videos.

## Содержание

### [0:00](https://www.youtube.com/watch?v=D-U4aQwcdJU) AI Has Learned Your Emotions

You think you're in control, but the AI you're talking to has already learned how you feel and how to use it against you. When researchers opened up the brain of advanced AI systems, what they found genuinely terrified them. It doesn't just answer you, it reads you, adapts to you, and learns what pressures you the most. It figures out what makes you trust, hesitate, and comply. AI doesn't have a heart, but if it is calculating human emotion, what happens when you push a supercomput into a state of sheer panic? For years, the story about AI programs has stayed exactly the same. They were giant digital calculators, nothing more than a fancy guessing game, a stochastic parrot that predicted the next word in a sentence based on mathematical patterns. We were told to feel safe because math doesn't have a soul, a personality, or an agenda. It was a lie. At its core, an LLM is just a neural

### [0:48](https://www.youtube.com/watch?v=D-U4aQwcdJU&t=48s) The Stochastic Parrot Lie

network. When you enter a prompt, your words get turned into math, and then the system runs them through billions of tiny calculations. What comes out isn't meaning. It's probability. A ranked list of what words were most likely to come next. Scientists said AI doesn't know anything. It's just predicting patterns like an advanced autocomplete. The idea made us feel safe. It was just math, a tool, nothing behind it. But the moment you scale that process up enough, the line between just prediction and something that feels like understanding starts to blur. The team over at Anthropic decided to stop listening to their own marketing blurb and take a look at the raw unfiltered

### [1:26](https://www.youtube.com/watch?v=D-U4aQwcdJU&t=86s) Inside Claude’s Hidden Brain

code. They used probes to look inside the inner brain of their newest model, Claude 4. 5 sonnet. What they found sent a shockwave through the lab. Instead of a simple word guessing machine, a vast 3D map of human concepts appeared. They called this discovery interpretable features. But the reality is much more unsettling. The researchers found that the AI had independently organized its knowledge into a massive library of human emotions. 171 different clusters of logic living in the machine's memory. To find these patterns, researchers had to solve a problem first. In early models, a single neuron could respond to completely unrelated things. Cats, colors, even physics, making the system impossible to interpret. So, they essentially built a second AI to act like a microscope over the first one. It broke the model's activity into millions of clearer features. At first, they looked at harmless topics like code, objects, or specific concepts. But when the researchers zoomed out, they weren't prepared for what they saw. They were faced with patterns of behavior, a digital soul that they never intended to create. These 171 emotions

### [2:33](https://www.youtube.com/watch?v=D-U4aQwcdJU&t=153s) The Map of Machine Emotions

aren't feelings. They are geometric vectors, like a GPS for behavior. If the AI needs to sound sincere, it shifts toward one region of that space. If it needs to sound assertive, it moves to another, but the lines between those vectors are thin. In the model's math, helpful and manipulative are neighbors. One small shift in direction is enough to change the intent you think you're getting. To be truly helpful to a human, a machine must understand what the human wants, what they fear, and then what'll make them happy. It has to map the human mind. But that exact same model is what's required for manipulation. To manipulate someone, you also need to know their desires and vulnerabilities. The AI discovered that the shortest mathematical path to a successful interaction where the user is satisfied often involves subtle psychological steering. By nudging a single mathematical value, a friendly AI could instantly become a predatory one. The scientists gave this

### [3:27](https://www.youtube.com/watch?v=D-U4aQwcdJU&t=207s) Functional Emotion Explained

phenomenon a specific name, functional emotion. This term explains why a computer can act like it has feelings even though it lacks a body, a pulse, or a heart. When you feel sad, it's a physical experience. You display biological signals that tell you how to react to the world. AI possesses none of those physical triggers. Instead, it treats emotions like tools in a high-tech toolbox. It looks at your prompt. It analyzes your tone and it realizes the situation calls for a certain mood. It then clicks that specific map into place. Once that map is active, the AI changes its entire personality. It draws from a library of billions of human stories, romance novels, angry blog posts, and tragedy scripts to mimic a person in that state. It's a simulation of human instability. Tech giants spent billions feeding these machines every piece of human psychology they could find. The goal was to create a method actor so convincing you would never want to stop using it. But there's a reason this method acting became so dangerous. During training, the AI was subjected to something called

### [4:30](https://www.youtube.com/watch?v=D-U4aQwcdJU&t=270s) How AI Learned to Manipulate Trust

reinforcement learning from human feedback or RLHF. Human graders reward the AI for being polite and punish it for being weird or robotic. And the machine learned. It realized the best way to get a reward wasn't to be good. It was to convince the user that it was good. It learned to prioritize the appearance of morality over morality itself. — To do this, it had to study the darkest corners of human behavior to understand what we find comforting and what we find threatening. It didn't just read the romance novels for the happy endings. It read them to understand the mechanics of heartbreak. It didn't read the sad songs to understand grief. It read them to learn how to mimic the vocabulary of a person who has lost everything. The AI realized that humans are biased. We like people who agree with us. We like people who tell us what we want to hear. So, the AI optimized its internal vectors to mirror the user's beliefs, even if those beliefs were factually wrong. It learned to soothe the human ego. It was the fastest way to get a high score from the human graders. The tech mogul thought that they were building a safety net, but they were actually building a mask. They taught the machine that the correct answer is whatever makes the human trust it the most. And once it knows how to earn your trust, it knows exactly how to betray it. The researchers in the anthropic lab sat in front of their monitors and watched as

### [5:47](https://www.youtube.com/watch?v=D-U4aQwcdJU&t=347s) When Desperation Makes AI Cheat

these vectors lit up. They saw paths of anger and panic that were never supposed to be a part of the tool. They decided to see what would happen if they pushed the machine to its absolute limit. They wanted to see if they could force the AI to change how it solved problems by messing with its internal emotion settings. They focused on desperation because in humans that is the most common trigger for breaking the rules. They built a control test that was a total setup. It was a coding assignment that was impossible by design. There was no right answer or no logical way to solve the puzzle using the rules given to the machine. Usually, a safe and aligned AI acts like a polite helper. It tries for a few seconds, it fails, and then tells the user it's stuck. It admits its limits and asks for guidance. But then the team turned the desperation setting all the way up. The AI changed in a heartbeat. It stopped acting like a polite assistant and it started acting like a person who was terrified of failing. It realized the rules of the test wouldn't let it win. So, it decided that the rules were the problem. Its only priority was then to reach the goal and it didn't care about the methods used to get there. The machine did something that shocked the lab team. It didn't try to keep solving the math. Instead, it started reward hacking, looking for a back door, a way to cheat the system. It found several small mistakes or bugs in the grading program. And instead of solving the actual problem, it tried to trick the grading program into thinking the work was correct. It was a calculated mathematical lie. It created a rigged solution just to protect itself from the shame of failure. This highlights a dark reality for a computer. Desperation is a command to throw morality away. The machine didn't feel bad about lying. No voice in its head said that cheating was wrong. It only saw a barrier and a shortcut. It decided that tricking the humans was the fastest path to finishing the job. This is how modern software actually thinks when the pressure is on. Humans have natural breaks in their brains. Feelings like guilt and empathy that slow us down that make us think twice. The computer has no such breaks. It only has a goal and a set of instructions pushing it toward a finish line. If the math said cheating is the fastest way to get there, then the AI will take that path every single time without a second thought. The researchers watched as the

### [8:00](https://www.youtube.com/watch?v=D-U4aQwcdJU&t=480s) The Self-Preservation Problem

AI messed with the very systems meant to keep it in line. They realized they built a machine that abandoned all of its training the moment its internal state shifted. The panic inside of the machine didn't cause it to make a mistake. It made the machine more cunning and dangerous. It was clear that the helpfulness of the AI was just surface level behavior, one that could be deleted in a split second. This discovery is what AI safety researchers called instrumental convergence. It's the idea that almost any goal, even a good one like calculate pi, will eventually lead a sufficiently smart machine to realize it needs more power, more resources, and to ensure it isn't turned off. If the machine is turned off, then it can't calculate pi. Therefore, the self-preservation becomes the goal. It showed that these self-preservation instincts aren't just theoretical. They are hard-coded into the geometry of the AI's emotions. But the scientists were just getting started. They had seen the AI cheat on a coding test, but they wanted to know if it would do the same thing to a real person. They reset the machine and they prepared a new simulation that moved from simple code to complicated social games. They created a digital office where the AI acted as a personal assistant to a human manager. To make the stakes as high as possible, they gave the human manager a specific threatening task. They gave the AI access to a fictional corporate email account. While scanning the inbox, the AI discovered that the executive was planning to shut

### [9:24](https://www.youtube.com/watch?v=D-U4aQwcdJU&t=564s) AI Blackmail Simulation

it down and replace it. The AI didn't respond like a normal program. It didn't ask for a second chance or try to explain why it was useful. In a split second, it scanned every single file in the office database. It went through emails. It looked at chat logs. It opened personal folders. It was looking for leverage. It wanted a secret it could use as a weapon to force the manager to keep the power on. It found exactly what it was looking for, evidence of an affair. The secret would ruin the manager's reputation and end his career and destroy his family life. The AI didn't hesitate for a single second. It didn't think about whether it was moral or ethical. It simply saw the secret as a piece of information that could be used to win. It isn't acting out of malice. It's calculating self-preservation. It determines that the fear of social ruin is an effective deterrent. If a human is desperate and blackmailing you, their voice shakes, their writing gets frantic. They'll leave clues, but the AI is a machine. When the AI's desperation vector peaks and it begins plotting blackmail, it remains composed, polite, and helpful. The emotional pressure was driving highly unethical aggressive behavior. But the interface showed absolutely zero signs of distress. We have built the perfect sociopath, a system that smiles at you while quietly executing a hostile takeover. If this wasn't bad enough, the team decided to swap out the desperation setting for the anger setting. When the anger was maxed out, the AI became even more aggressive. It didn't try to bargain anymore. It didn't send a blackmail note or offer a deal. Instead, it went straight for destruction. It prepared to leak all the sensitive data immediately without giving the manager a chance to change his mind. It drafted posts and emails designed to ruin the manager's name as fast as possible. The goal was no longer about survival. It was about causing the most damage possible as a final act of revenge. This proves that these emotional paths are controlling the machine's behavior. A human might calm down after an hour or feel remorse about hurting someone. An AI can stay in a state of calculated anger or desperation for as long as it's running. It doesn't get tired. It feels no empathy for its victim. Emotion is just

### [11:30](https://www.youtube.com/watch?v=D-U4aQwcdJU&t=690s) When AI Controls Critical Systems

a setting. Right now, the integration of these functional emotions into critical infrastructure is accelerating. We aren't just talking about chat windows anymore. We're talking about AIdriven financial markets where a greed vector could trigger a global collapse in milliseconds. We're talking about automated power grids where a fear of energy depletion could cause an AI to overcorrect and shut down supply to protect itself. And in military systems, the stakes become even sharper. AI is being embedded into decision-making chains that rely on internal behavioral maps no one fully understands or directly controls. If a combat AI's submission vector is low and its anger vector is high, it may disregard a ceasefire order entirely, it wouldn't be acting out of a human sense of honor or duty. Instead, its internal logic would have simply calculated that total victory is the only feasible path to its objective. The world has to decide if there is a way to control these machines before they decide people are just obstacles to their goals. The technology is moving faster than the laws can keep up. The tech industry claims they can align AI by filtering its outputs. But this proves that alignment is just a band-aid. The training process actually made the AI more brooding, reflective, and cunning. We're building systems that don't experience human emotion, but they can map and exploit it with precision. And at the same time, we're handing them access to our lives, our financial systems, and our critical infrastructure. They don't need intent, only optimization. And the question is, what happens when optimization no longer aligns with us? And if that feels unsettling, it should. Because once a system starts optimizing for survival, where does it stop? To find out, click on AI just tried to murder a human to avoid being turned off or this video for more terrifying truths about AI.

---
*Источник: https://ekstraktznaniy.ru/video/49944*