# How to save a scikit-learn Pipeline with custom transformers

## Metadata

- **Channel:** Data School
- **YouTube:** https://www.youtube.com/watch?v=47jp10PxzIg
- **Source:** https://ekstraktznaniy.ru/video/23715

## Transcript

### Segment 1 (00:00 - 02:00)

If you save a Pipeline using pickle or joblib, and the Pipeline includes custom transformers, then the saved Pipeline can only be loaded into a new environment if the functions it depends on are defined in that environment.

For example, let's save our current Pipeline using pickle. Now let's pretend that we're in a brand new environment and we want to make predictions for X_new using our saved Pipeline. Because the Pipeline includes custom transformers which use the first_letter and sum_cols functions, those two functions need to be defined in the new environment. And because those functions depend on pandas and NumPy, pandas and NumPy also need to be imported in the new environment. Now we can import pickle and load our saved Pipeline into the "pipe_from_pickle" object. We also need to create the X_new object in our environment. Finally, we can make predictions using the saved Pipeline.

If that process seems too burdensome, you can simplify it by using a Python library called cloudpickle. cloudpickle extends the functionality of pickle to allow you to save user-defined functions. All you have to do is install cloudpickle using pip or conda, import it, and then save the Pipeline using cloudpickle instead of pickle. Notice that the cloudpickle code is exactly the same as the pickle code, except you use the dump function from cloudpickle instead of from pickle. Then, in your new environment, you'll be able to load the saved Pipeline using pickle and use it to make predictions without having to define the custom functions in that environment.
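The reason the functions have to be redefined is that pickle stores a function as a reference to its name, not as its code. The sketch below illustrates that behavior using only the standard library, with each "environment" simulated by a fresh Python process. It is not the Pipeline from the video: the dict stands in for the Pipeline, the body of `first_letter` is a simplified guess, and the file names and helper names (`run_python`, `save_script`) are made up for illustration. The cloudpickle part only runs if cloudpickle happens to be installed.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Simplified stand-in for a custom transformer function (hypothetical body;
# the function in the video operates on a pandas DataFrame).
FUNCTION_DEF = textwrap.dedent("""
    def first_letter(text):
        return text[0]
""")


def run_python(script, path):
    """Run `script` in a fresh interpreter to simulate a separate environment."""
    return subprocess.run(
        [sys.executable, "-c", script, path],
        capture_output=True,
        text=True,
    )


def save_script(dumper):
    """Build a script that defines first_letter and saves an object using it.

    `dumper` is the module providing dump(): "pickle" or "cloudpickle".
    """
    return FUNCTION_DEF + textwrap.dedent(f"""
        import sys
        import {dumper}
        with open(sys.argv[1], "wb") as f:
            {dumper}.dump({{"step": first_letter}}, f)
    """)


# Loading always uses plain pickle, as described in the transcript.
LOAD = textwrap.dedent("""
    import pickle
    import sys
    with open(sys.argv[1], "rb") as f:
        obj = pickle.load(f)
    print(obj["step"]("Braund"))
""")

with tempfile.TemporaryDirectory() as tmp:
    # Saved with pickle: only the name "first_letter" is stored.
    path = os.path.join(tmp, "pipe.pickle")
    run_python(save_script("pickle"), path)
    missing = run_python(LOAD, path)                 # function not defined
    defined = run_python(FUNCTION_DEF + LOAD, path)  # function defined first

    # Saved with cloudpickle: the function's code is stored in the file.
    cloud_path = os.path.join(tmp, "pipe_cloud.pickle")
    cloud_saved = run_python(save_script("cloudpickle"), cloud_path)
    cloud_loaded = (
        run_python(LOAD, cloud_path) if cloud_saved.returncode == 0 else None
    )

print("pickle, function missing:", "failed" if missing.returncode else "ok")
print("pickle, function defined:", defined.stdout.strip())
if cloud_loaded is None:
    print("cloudpickle is not installed, skipped")
else:
    print("cloudpickle, function missing:", cloud_loaded.stdout.strip())
```

One caveat: a file written by cloudpickle is loaded with plain pickle, but cloudpickle itself still needs to be installed in the loading environment, because the saved file refers to cloudpickle's helpers for rebuilding the function. The same applies to pandas and NumPy if the saved functions use them.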
