# Course outline: "Master Machine Learning with scikit-learn"

## Metadata

- **Channel:** Data School
- **YouTube:** https://www.youtube.com/watch?v=mN51Tp02RNk
- **Source:** https://ekstraktznaniy.ru/video/23712

## Transcript

### Segment 1 (00:00 - 04:00)

Let's talk about what we're going to cover in each chapter.

In chapter 1, which is this chapter, we're getting you ready for the course. In chapter 2, we'll walk through the basic Machine Learning workflow, from loading a dataset to building a model to making predictions. In chapter 3, we'll focus on one of the most important data preprocessing steps, which is the encoding of categorical features. In chapter 4, we'll see how to use ColumnTransformer and Pipeline to make our workflow more powerful and efficient. In chapter 5, we'll review the workflow we've built so far to make sure you understand the key concepts before we start adding additional complexity.

In chapter 6, we'll learn how to create features from unstructured text data. In chapter 7, we'll discuss missing values and explore a few different ways to handle them. In chapter 8, we'll see what problems arise when we expand the size of our dataset, and then we'll figure out how to handle those problems. In chapter 9, we'll review our workflow again and discuss how it's helping us to prevent data leakage. In chapter 10, we'll take a deep dive into how to efficiently tune our Pipeline for maximum performance.

In chapter 11, we'll try out a non-linear model called "random forests" and figure out how to tune it without overextending our computing resources. In chapter 12, we'll learn how to ensemble our different models two different ways and how to tune the ensemble for even better performance. In chapter 13, we'll discuss the benefits of feature selection and then try out a handful of different automated methods for selecting features. In chapter 14, we'll experiment with standardizing our features to see if that improves our model performance. In chapter 15, we'll create a variety of new features within our Pipeline and discuss why you might want to do all of your feature engineering using scikit-learn rather than pandas.

In chapter 16, we'll do one final review of the workflow that we created throughout the course. In chapter 17, we'll experiment with different ways of handling categorical features with lots of unique values. In chapter 18, we'll thoroughly explore the problem of class imbalance and the processes you can use to work around it. In chapter 19, we'll walk through my complete workflow for handling class imbalance so that you can see a demonstration of the best practices. And finally, in chapter 20, we'll discuss how you can keep learning and improving your skills on your own.

I recommend watching the chapters in order, because each chapter builds on the material from previous chapters. Also, you might have noticed that most of the chapters end with a series of lessons marked "Q&A". These lessons answer common questions that may have come up in your mind while watching the rest of the lessons in that chapter. Thus, the Q&A lessons should help you to understand the core material from each chapter in greater depth. However, you can still ask your own questions by posting a comment underneath any video, and I'll do my best to help.

Finally, I wanted to mention that this course won't be focusing on high-level algorithm selection, such as whether you should use a logistic regression model or a random forests model for your particular problem. That's because I've found that the workflow is more important and will have a greater impact on your overall Machine Learning results than your ability to pick between algorithms. In fact, once you've mastered the workflow, you can iterate through different algorithms quickly even if you don't deeply understand them. And even if you did understand all of the algorithms, it's hard to know in advance which one will work best for a given problem, which is why it's so important to build a reusable workflow that enables you to switch between algorithms easily.

The bottom line is that understanding the algorithms is still useful, but the workflow is even more important, and so that's the focus of this course.
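
To make the idea of a reusable workflow a bit more concrete, here is a minimal sketch of what switching between algorithms might look like with scikit-learn's `ColumnTransformer` and `Pipeline`. It is not taken from the course itself: the tiny dataset, the column names (`embarked`, `fare`, `survived`), and the `build_pipeline` helper are all made up for illustration, and it assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data, invented purely for this illustration.
df = pd.DataFrame({
    "embarked": ["S", "C", "S", "Q", "C", "S", "Q", "S"],
    "fare": [7.25, 71.28, 7.92, 8.46, 30.07, 8.05, 7.75, 53.10],
    "survived": [0, 1, 1, 0, 1, 0, 0, 1],
})
X = df[["embarked", "fare"]]
y = df["survived"]


def build_pipeline(model):
    """Hypothetical helper: same preprocessing, any final estimator."""
    # One-hot encode the categorical column; pass the numeric column through.
    preprocessor = make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), ["embarked"]),
        remainder="passthrough",
    )
    return make_pipeline(preprocessor, model)


# Only this dictionary changes when a different algorithm is tried.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

predictions = {}
for name, model in models.items():
    pipe = build_pipeline(model)
    pipe.fit(X, y)
    predictions[name] = pipe.predict(X)
```

The preprocessing steps are defined once and reused, so trying another algorithm only means adding another entry to `models`. Predicting on the training data here just confirms that each pipeline runs end to end; it says nothing about how well either model would generalize.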