# Cramer's rule, explained geometrically | Chapter 12, Essence of linear algebra

## Metadata

- **Channel:** 3Blue1Brown
- **YouTube:** https://www.youtube.com/watch?v=jBsC34PxzoM
- **Date:** 17.03.2019
- **Duration:** 12:12
- **Views:** 1,457,277
- **Source:** https://ekstraktznaniy.ru/video/16228

## Description

This rule seems random to many students, but it has a beautiful reason for being true.
Help fund future projects: https://www.patreon.com/3blue1brown
An equally valuable form of support is to simply share some of the videos.
Home page: https://www.3blue1brown.com/

Thanks to these viewers for their contributions to translations
Hebrew: Omer Tuchfeld

----

If you want to contribute translated subtitles or to help review those that have already been made by others and need approval, you can click the gear icon in the video and go to subtitles/cc, then "add subtitles/cc".  I really appreciate those who do this, as it helps make the lessons accessible to more people.

Music by Vincent Rubinetti.
Download the music on Bandcamp:
https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

Stream the music on Spotify:
https://open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u

------------------

3blue1brown is a channel about animating math, in all senses of the word animate.  And you kno

## Transcript

### <Untitled Chapter 1> []

In a previous video I've talked about linear systems of equations, and I sort of brushed aside the discussion of actually computing solutions to these systems. And while it's true that number crunching is typically something we leave to the computers, digging into some of these computational methods is a good litmus test for whether or not you actually understand what's going on, since that's really where the rubber meets the road.

### describe the geometry behind a certain method for computing solutions [0:32]

Here I want to describe the geometry behind a certain method for computing solutions to these systems, known as Cramer's rule. The relevant background here is understanding determinants, a little bit of dot products, and of course linear systems of equations, so be sure to watch the relevant videos on those topics if you're unfamiliar or rusty. But first, I should say up front that Cramer's rule is not actually the best way of computing solutions to linear systems of equations; Gaussian elimination, for example, will always be faster. So why learn it? Well, think of it as a sort of cultural excursion. It's a helpful exercise in deepening your knowledge of the theory behind these systems. Wrapping your mind around this concept is going to help consolidate ideas from linear algebra, like the determinant and linear systems, by seeing how they relate to each other. Also, from a purely artistic standpoint, the ultimate result here is just really pretty to think about, way more so than Gaussian elimination. Alright, so the setup here will be some linear system of equations, say with two unknowns, x and y, and two equations. In principle, everything we're talking about will also work for systems with a larger number of unknowns and the same number of equations, but for simplicity a smaller example is just nicer to hold in our heads. So, as I talked about in a previous video, you can think of this setup

### set up geometrically as a certain known matrix [1:48]

geometrically as a certain known matrix transforming an unknown vector (x, y), where you know what the output is going to be, in this case (-4, -2). Remember, the columns of this matrix tell you how that matrix acts as a transformation, each one telling you where the basis vectors of the input space land. So what we have is a sort of puzzle: which input vector (x, y) is going to land on this output (-4, -2)? One way to think about our little puzzle here is that we know the given output vector is some linear combination of the columns of the matrix, x times the vector where i-hat lands plus y times the vector where j-hat lands, but what we want is to figure out what exactly that linear combination should be.

Remember, the type of answer you get here can depend on whether or not the transformation squishes all of space into a lower dimension, that is, if it has a zero determinant. In that case, either none of the inputs land on our given output, or there's a whole bunch of inputs landing on that output. But for this video we'll limit our view to the case of a non-zero determinant, meaning the outputs of this transformation still span the full n-dimensional space that it started in. Every input lands on one and only one output, and every output has one and only one input.

As a first pass, let me show you an idea that's wrong but in the right direction. The x-coordinate of this mystery input vector is what you get by taking its dot product with the first basis vector, (1, 0). Likewise, the y-coordinate is what you get by dotting it with the second basis vector, (0, 1). So maybe you hope that after the transformation, the dot products of the transformed version of the mystery vector with the transformed versions of the basis vectors will also be these coordinates x and y. That'd be fantastic, because we know what the transformed version of each of those vectors is. There's just one problem with it: it's not at all true. For most linear transformations, the dot product before and after the transformation will look very different.
For example, you could have two vectors generally pointing in the same direction with a positive dot product, which get pulled apart from each other during the transformation in such a way that they end up having a negative dot product. Likewise things that start off perpendicular with dot product 0, like the two basis vectors, quite often don't stay perpendicular to each other after the transformation, that is they don't preserve that 0 dot product. And looking at the example I just showed dot products certainly aren't preserved, they tend to get bigger since most vectors are getting stretched out. In fact, worthwhile side note here, transformations which do preserve dot products are special enough to have their own name, orthonormal transformations.
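
As a rough numerical illustration of the point above, here is a small Python sketch. The matrix `A` and the input vector `v` are assumptions picked for illustration (chosen to be consistent with the determinants quoted in the sanity check later in the transcript, which never spells the matrix out), and the helper names `apply` and `dot` are hypothetical.

```python
def apply(matrix, v):
    """Apply a 2x2 matrix, given as a tuple of rows, to a 2D vector."""
    (a, b), (c, d) = matrix
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])


def dot(u, w):
    """Standard dot product of two 2D vectors."""
    return u[0] * w[0] + u[1] * w[1]


# Assumed example matrix (not stated in the transcript). Its columns are
# where i-hat and j-hat land: (2, 0) and (-1, 1).
A = ((2, -1), (0, 1))
i_hat, j_hat = (1, 0), (0, 1)
v = (3, 2)  # assumed "mystery" input vector

# Before the transformation, dotting with i-hat reads off the x-coordinate.
x_before = dot(v, i_hat)  # 3

# After the transformation, the same dot product no longer gives x.
x_after = dot(apply(A, v), apply(A, i_hat))  # (4, 2) . (2, 0) = 8

# Perpendicular basis vectors do not stay perpendicular either.
basis_before = dot(i_hat, j_hat)  # 0
basis_after = dot(apply(A, i_hat), apply(A, j_hat))  # (2, 0) . (-1, 1) = -2

print(x_before, x_after, basis_before, basis_after)
```

With these assumed numbers, the two basis vectors start with dot product 0 and end with a negative one, which is the kind of behavior described above.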

### leave all of the basis vectors perpendicular [4:39]

These are the ones that leave all of the basis vectors perpendicular to each other and still with unit lengths. You often think of these as the rotation matrices, they correspond to rigid motion with no stretching or squishing or morphing. Solving a linear system with an orthonormal matrix is actually super easy. Because dot products are preserved, taking the dot product between the

### taking the dot product between the output vector and all the columns [4:58]

output vector and all the columns of your matrix will be the same as taking the dot product between the mystery input vector and all of the basis vectors, which is the same as just finding the coordinates of that mystery input. So in that very special case, x would be the dot product of the first column with the output vector, and y would be the dot product of the second column with the output vector. Why am I bringing this up when this idea breaks down for almost all linear systems? Well, it points us in a direction of something to look for. Is there an alternate geometric understanding for the coordinates of our input vector that remains unchanged after the transformation? If your mind has been mulling over determinants, you might think of the following clever idea.
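
To make the special case concrete, here is a minimal sketch assuming a 90-degree counterclockwise rotation as the orthonormal matrix. The specific matrix, the input vector, and the names `transform` and `dot` are illustrative assumptions, not taken from the video.

```python
def dot(u, w):
    """Standard dot product of two 2D vectors."""
    return u[0] * w[0] + u[1] * w[1]


# Assumed example: rotation by 90 degrees counterclockwise.
col_1 = (0, 1)   # where i-hat lands
col_2 = (-1, 0)  # where j-hat lands


def transform(v):
    """Apply the matrix whose columns are col_1 and col_2 to v."""
    x, y = v
    return (x * col_1[0] + y * col_2[0], x * col_1[1] + y * col_2[1])


mystery = (3, 2)             # pretend we do not know this
output = transform(mystery)  # (-2, 3), the known right-hand side

# Because dot products are preserved, dotting the output with each column
# recovers the coordinates of the mystery input.
recovered = (dot(col_1, output), dot(col_2, output))
print(recovered)
```

This shortcut relies on the columns being perpendicular unit vectors; for a general matrix the same two dot products would not return the input coordinates.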

### take the parallelogram [5:46]

Take the parallelogram defined by the first basis vector, i-hat, and the mystery input vector (x, y). The area of this parallelogram is the base, 1, times the height perpendicular to that base, which is the y-coordinate of that input vector. So the area of that parallelogram is a sort of screwy roundabout way to describe the vector's y-coordinate. It's a wacky way to talk about coordinates, but run with me. And actually, to be a little more accurate, you should think of this as the signed area of that parallelogram, in the sense described in the determinant video. That way, a vector with a negative y-coordinate would correspond to a negative area for this parallelogram, at least if you think of i-hat as in some sense being the first out of these two vectors defining the parallelogram.
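
The signed-area idea above can be sketched in a few lines of Python. The function name `signed_area` and the sample vectors are assumptions for illustration; the formula is the usual 2x2 determinant with the two vectors as columns.

```python
def signed_area(u, w):
    """Signed area of the parallelogram spanned by u, then w.

    This is the 2x2 determinant whose first column is u and second is w.
    """
    return u[0] * w[1] - u[1] * w[0]


i_hat, j_hat = (1, 0), (0, 1)
v = (3, 2)         # assumed sample vector
v_below = (3, -2)  # same x, negative y

area_y = signed_area(i_hat, v)               # 1*2 - 0*3 = 2, the y-coordinate
area_y_negative = signed_area(i_hat, v_below)  # -2, negative y gives negative area
area_swapped = signed_area(v, i_hat)         # -2, swapping the order flips the sign

print(area_y, area_y_negative, area_swapped)
```

The same function applied to the pair `(v, j_hat)`, in that order, returns the x-coordinate, which is the case the transcript turns to next.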

### defining the parallelogram [6:32]

And symmetrically, if you look at the parallelogram spanned by our mystery input vector and the second basis vector, j-hat, its area is going to be the x-coordinate of that mystery vector. Again, it's a strange way to represent the x-coordinate, but see what it buys us in a moment. And just to make sure it's clear how this might generalize, let's look in three dimensions. Ordinarily, the way you might think about one of a vector's coordinates, say its z-coordinate, would be to take its dot product with the third standard basis vector, often called k-hat. But an alternate geometric interpretation would be to consider the

### consider the parallelepiped [7:06]

parallelepiped that it creates with the other two basis vectors, i-hat and j-hat. If you think of the square with area 1 spanned by i-hat and j-hat as the base of this whole shape, then its volume is the same as its height, which is the third coordinate of our vector. And likewise, the wacky way to think about the other coordinates of the vector

### form a parallelepiped [7:27]

would be to form a parallelepiped using the vector and then all of the basis vectors other than the one corresponding to the direction you're looking for. Then the volume of this gives you the coordinate. Or rather, we should be talking about the signed volume of that parallelepiped, in the sense described in the determinant video, using the right-hand rule. So the order in which you list these three vectors matters. That way, negative coordinates still make sense. Okay, so why think of coordinates as areas and volumes like this? Well, as you apply some sort of matrix transformation, the areas of these parallelograms, well, they don't stay the same, they might get scaled up or down. But, and this is the key idea of determinants, all of the areas get scaled by the same amount, namely the determinant of our transformation matrix. For example, if you look at the parallelogram spanned by the vector where your first basis vector lands, which is the first column of the matrix, and the transformed version of (x, y), what is its area? Well, this is the transformed version of the parallelogram we were looking at earlier, the one whose area was the y-coordinate of the mystery input vector. So its area is just going to be the determinant of the transformation multiplied by that y-coordinate.
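
Here is a hedged sketch of both claims above: coordinates as signed volumes in three dimensions, and areas scaling by the determinant in two. The helper names `det2` and `det3`, the sample vectors, and the 2x2 matrix columns are all assumptions for illustration.

```python
def det2(u, w):
    """2x2 determinant with columns u, w (signed area)."""
    return u[0] * w[1] - u[1] * w[0]


def det3(a, b, c):
    """3x3 determinant with columns a, b, c, computed as a . (b x c)."""
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]


# Coordinates as signed volumes: put the vector in the slot of the
# coordinate you want, keeping the other basis vectors in place.
i_hat, j_hat, k_hat = (1, 0, 0), (0, 1, 0), (0, 0, 1)
p = (5, -7, 4)  # assumed sample vector
coords = (det3(p, j_hat, k_hat), det3(i_hat, p, k_hat), det3(i_hat, j_hat, p))

# Areas scale by the determinant (2D case, assumed matrix columns).
col_1, col_2 = (2, 0), (-1, 1)  # where i-hat and j-hat land
v = (3, 2)                      # assumed mystery input
output = (v[0] * col_1[0] + v[1] * col_2[0], v[0] * col_1[1] + v[1] * col_2[1])

area_before = det2((1, 0), v)     # 2, the y-coordinate
area_after = det2(col_1, output)  # 4
det_a = det2(col_1, col_2)        # 2

print(coords, area_before, area_after, det_a)
```

In this example the transformed area equals `det_a * area_before`, which is the relationship the next step of the argument relies on.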

### solve for y by taking the area of this new parallelogram [8:40]

So that means we can solve for y by taking the area of this new parallelogram in the output space divided by the determinant of the full transformation. And how do you get that area? Well, we know the coordinates for where the mystery input vector lands, that's the whole point of a linear system of equations. So what you might do is create a new matrix whose first column is the same as that of our matrix, but whose second column is the output vector, and then you take its determinant. So look at that, just using data from the output of the transformation, namely the columns of the matrix and the coordinates of our output vector, we can recover the y-coordinate of the mystery input vector, which is halfway to solving the system. Likewise, the same idea can give us the x-coordinate.
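
As a small sketch of this step, the snippet below recovers y using only the columns of the matrix and the output vector. The matrix columns and output are assumed values, chosen to agree with the determinants quoted in the sanity check later in the transcript; `det2` is a hypothetical helper name.

```python
from fractions import Fraction


def det2(u, w):
    """2x2 determinant with columns u, w (signed area)."""
    return u[0] * w[1] - u[1] * w[0]


# Assumed system: the matrix has columns col_1 and col_2, and the known
# output vector is the right-hand side of the linear system.
col_1 = (2, 0)
col_2 = (-1, 1)
output = (4, 2)

# New matrix: first column unchanged, second column replaced by the output.
area_in_output_space = det2(col_1, output)  # 2*2 - 0*4 = 4
det_a = det2(col_1, col_2)                  # 2*1 - 0*(-1) = 2

# Fraction keeps the division exact for integer inputs.
y = Fraction(area_in_output_space, det_a)
print(y)
```

Exact rational arithmetic is used here only so the result is not blurred by floating-point rounding; ordinary division would work for a quick check.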

### look at the parallelogram [9:27]

Look at the parallelogram we defined earlier, the one spanned by the mystery input vector and j-hat, which encodes the x-coordinate of that vector. The transformed version of this guy is spanned by the output vector and the second column of the matrix, and its area will have been multiplied by the determinant of that matrix. So to solve for x, you can take this new area divided by the determinant of the full transformation. And similar to what we did before, you can compute the area of that

### compute the area of that output parallelogram by creating a new matrix [9:55]

output parallelogram by creating a new matrix whose first column is the output vector and whose second column is the same as that of the original matrix. So again, just using data from the output space, the numbers we see in our original linear system, we can solve for what x must be. This formula for finding the solutions to a linear system of equations is known as Cramer's rule. Here, just to sanity check ourselves, let's plug in some numbers. The determinant of that top altered matrix is 4 plus 2, which is 6, and the bottom determinant is 2, so the x-coordinate should be 3. And indeed, looking back at the input vector we started with, the x-coordinate is 3. Likewise, Cramer's rule suggests that the y-coordinate should be 4 divided by 2, or 2, and that is in fact the y-coordinate of the input vector we were starting with.

The case with three dimensions or more is similar, and I highly recommend you take a moment to pause and think through it yourself. Here, I'll give you a little bit of momentum. What we have is a known transformation given by some 3x3 matrix and a known output vector given by the right side of our linear system, and we want to know what input lands on that output. And if you think of, say, the z-coordinate of that input vector as the volume of that special parallelepiped we were looking at earlier, spanned by i-hat, j-hat, and the mystery input vector, what happens to that volume after the transformation? And what are the various ways you can compute that volume? Really, pause and take a moment to think through the details of generalizing this to higher dimensions, finding an expression for each coordinate of the solution to a larger linear system. Thinking through more general cases like this and convincing yourself that it works and why it works is where all the learning really happens, much more so than listening to some dude on YouTube walk you through the same reasoning again.
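
For readers who want to check their generalization afterwards, here is one possible sketch of Cramer's rule for an n-by-n system in plain Python. The function names `det` and `cramer_solve` are hypothetical, the 2x2 matrix is an assumption chosen to reproduce the determinants quoted in the sanity check (6, 4, and 2), and the 3x3 system is made up for illustration. The determinant uses cofactor expansion, which is fine for small examples but slow for large matrices, in line with the earlier remark that Gaussian elimination is the faster method.

```python
from fractions import Fraction


def det(m):
    """Determinant of a square matrix (list of rows) by cofactor expansion."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for col in range(n):
        # Minor: drop the first row and the current column.
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * det(minor)
    return total


def cramer_solve(a, b):
    """Solve a x = b for integer (or Fraction) entries via Cramer's rule.

    a is a list of rows (each a list), b is a list. Raises ValueError
    when the determinant is zero, the case the video sets aside.
    """
    d = det(a)
    if d == 0:
        raise ValueError("zero determinant: no unique solution")
    solution = []
    for k in range(len(a)):
        # Replace column k of a with the output vector b.
        replaced = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(replaced), d))
    return solution


# Assumed 2x2 system matching the quoted numbers: x = 6/2, y = 4/2.
xy = cramer_solve([[2, -1], [0, 1]], [4, 2])

# Made-up 3x3 system whose solution is (1, 2, 3).
xyz = cramer_solve([[2, 0, 1], [1, 1, 0], [0, 1, 1]], [5, 3, 5])

print(xy, xyz)
```

Each coordinate is the determinant of the matrix with one column swapped for the output vector, divided by the determinant of the original matrix, which mirrors the area and volume argument in the transcript.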
