# (ML 2.4) Growing a classification tree (CART)

## Метаданные

- **Канал:** mathematicalmonk
- **YouTube:** https://www.youtube.com/watch?v=S51plSJBC2g
- **Дата:** 06.06.2013
- **Длительность:** 14:06
- **Просмотры:** 37,730

## Описание

How to build a decision tree for classification using the CART approach.


A playlist of these Machine Learning videos is available here:
http://www.youtube.com/my_playlists?p=D0F06AA0D2E8FFBA

## Содержание

### [0:00](https://www.youtube.com/watch?v=S51plSJBC2g) Segment 1 (00:00 - 05:00)

we just saw how to grow a regression tree and now we'll talk about growing a classification tree so we'll see that we follow a very similar greedy procedure as for the regression case so let's so we've got some you know our abstract space Rd here and all of our points in the classification case they I've all got labels so maybe we've got some points some red dots and blue I don't know maybe make them X's we got some blue x's and a little bit of notation to make our lives easier let's say if we got some arbitrary subset of Rd called R so we have some subset R then we'll Define e subr to be the fraction of so we we're going to count the number of points of each class in R and if we wanted to assign a label to this region R we would take a majority vote so in this case we've got four blues and only two four blue x's and one and two red dots so we would classify a new point in this region say this one as a blue X so it's the fraction of points XI in our data set remember we have some data set as usual X one up to y1 up to xn YN so it's the fraction of points x i in our training data set that would be misclassified points XI in the region r that would be misclassified by a majority vote in the region r so in this case how many would we get wrong we we're classifying them as blue x's and there's two Reds so we would get two out of six wrong so that would be one third wrong for this R here and we could make this a little more formal if we said this is the so if n r is the number of points in R this would be the we would kind of to make this look similar to the regression case so we would take the Min over all the possible classes y of the fraction of points and that can be written as I'll say what NR is in a second the number of points which we can write using this indicator function that are not equal to Y so this tells us the number of points in R for which the label is not equal to Y and NR we'll say is the total number of points so that's just 12 and R is just well I can just write it as the number of I's such that XI is in R total number of points in our trading set that is in R so this is exactly what this is by taking a majority vote and now we've got this nice little notation we can very easily Define our procedure for growing a tree classification tree so I'm drawing in higher give us one more Dimension to visualize this is our Rd it's got more than three dimensions in general but we'll just think about since we can only draw three and what we will do is for the first split so we're growing this binary tree and for the first split in the tree we will

### [5:00](https://www.youtube.com/watch?v=S51plSJBC2g&t=300s) Segment 2 (05:00 - 10:00)

will choose j a dimension J and a split s to minimize so I need a little bit more tation here I guess so I'm going to minimize e r JS plus e Prime JS where so little more notation here so if J is one and S is this point then we would we're going to make a and we might make a split here and let me call R JS so we'll call rjs the points this is the set of all the points XI in our data set such that x i j is greater than S and we'll call R Prime JS the other side so XI J is less or equal to S so this would be our JS and Prime JS if we took J equal to one and S equal to this point so we're going to choose the split that minimizes this quantity using this definition here for the error so this is like the error on the RP Prime region and this is the error on the r region and now for the general case we've got so we have our first split in the tree and let me call this let me call let me associate a region let's associate a region with each node in the tree so the first region we'll call R1 and that'll just be the whole space and then we'll so we'll number these nodes maybe this is node one node two node three every node in the tree will give a number and we will associate a region associate with that node so that would be R2 this would be R3 and so on and R2 so let's call this R1 since this was our first split this is R1 and this is R1 Prime JS and so R2 will be for the JNS that minimized this it will be R1 Js and R3 is going to be R1 Prime JS for the JNS that minimize this on the first split so right we're just you know we're just dividing this up and we're assigning one of these regions to this next node down and we're assigning the other region to this other node and now for the in general so we need to Define how we're going to split at a general node so in general we'll be at some RK maybe it's one of these but in general some RK and so we need to split r k so splitting RK to do that we choose just like before choose JNS to minimize e r k JS plus e r Prime k j s so if K so if r k so if K was two for example then here we had this was R1 Prime then this would also be R2 this right hand side and R2 JS if J was two say and S was this was this quantity we're going to split R2 at this s and we get splitting along this plane and r k

### [10:00](https://www.youtube.com/watch?v=S51plSJBC2g&t=600s) Segment 3 (10:00 - 14:00)

JS just like before here I'm going to Define it well let me just modify our previous definition we can just generalize this to RK and this time it's only going to be the points in RK so this does indeed generalize the previous definition because when it was R1 we were considering all the points XI in our data set so this is these are also points in the data set that also lie in r k and the first case we r k was one so it was all of Rd so this is this agrees with our earlier definition so that should make it to be precise these are not all points in RK but these are points in RK that are in the data set so at the general step we need to split our K we we're here and we look over all Dimensions split points and we've choose the plane that minimizes this quantity Where RK would be like this region here and RK Prime might be this other region and we're minimizing the sum of the classific of the misclassification error that would would arise by classifying the points in each of those two regions according to a majority vote in that region this gives us a procedure for building this tree so we get this infinite tree if we were to just continue doing this forever and but we'll use a stopping Criterion just like before in the regression case so the our preferred or a better stopping criteria then we could stop if there's only one point but another option would be stopping when or only considering splits when the resulting regions would result in some minimum number of points say five or in the classification case another option C would be to stop when so we would stop at RK when RK contains points only of one class so this would be another possible stopping criteria so this defines a procedure for building a classification tree and it can be generalized so we use the misclassification error here but this can be generalized to use other types of quantities to minimize such as so one would be the entropy I have undefined entropy but one another possibility which people often use some may give better performance is use the entropy or another possibility is to use What's called the genie index

---
*Источник: https://ekstraktznaniy.ru/video/52874*