# Hyperparameter tuning in Fabric Data Science

## Metadata

- **Channel:** Azure Synapse Analytics
- **YouTube:** https://www.youtube.com/watch?v=eXZLxRPOzzQ
- **Source:** https://ekstraktznaniy.ru/video/44741

## Transcript

### Segment 1 (00:00 - 05:00) [0:00]

Hey everyone, welcome back to our channel dedicated to Microsoft Fabric. This is the special series about data integration, data engineering, and data science within Microsoft Fabric. Today Misha joins us to discuss another topic related to data science and machine learning: hyperparameter tuning. Misha, thanks for joining us.

Yeah, thanks for having me.

So how should we start? Maybe just with a question: why do I need hyperparameter tuning? What is a hyperparameter, and what are the methods of tuning?

Yeah. Hyperparameter tuning is often a really complex process. Typically, in the machine learning workflow, you get your data and do a lot of work to prepare it. Then you spend a lot of time figuring out the right models and the right way to train them, you go through an evaluation step, and finally you do the deployment and inferencing steps. But what we often see is that a lot of projects fail in this training and evaluation process, and a lot of times they fail because you don't know how best to optimize your model to fit your data.

To drill into this a bit further: in the machine learning workflow there are often a lot of different decisions you have to make to tune your model's performance. The first is that you're selecting a model, and there are so many different frameworks out there that figuring out which one is best is a challenge in itself. But even once you've selected a framework (say you've decided you want a LightGBM model), there are often a lot of different hyperparameters to tune. You can think of hyperparameters as the different toggles within a model that help you fit it better to your data.

So let's take an example. One of the first things I have to do when training a model is select the model, and so here are a few different
frameworks that are most popular within the machine learning workflow: things like scikit-learn, XGBoost, and LightGBM. These are all model frameworks you can use to start your model training process. But even once you've selected the model, you often have a lot of different hyperparameters to pick from. In the case of a LightGBM model, I can tune things like the number of estimators, the number of leaves, and the learning rate. These are all settings on the model that tweak the performance you get and fit it better to your dataset. For example, do you want your decision tree to be very broad, with a lot of different nodes for things to be classified into? Or do you want it to be a bit more succinct (smaller and tighter, with fewer levels) so that it might apply more broadly to some of the other kinds of data that you have? This whole process of figuring out how to set these hyperparameters for the kind of model you want and the data you have is, again, where we see a lot of projects fail. That's where we want to give customers and data scientists a tool that helps them optimize and make some of these decisions.

That makes sense. Last time we recorded an Espresso episode about automated machine learning. Just to bring some clarity, how does hyperparameter tuning relate to AutoML?

Yeah, great question. AutoML and hyperparameter tuning are both capabilities available through FLAML, but they serve different kinds of use cases. With hyperparameter tuning, you already know the model you want to start with. In this case I might start with a LightGBM model, and I just optimize that specific model for my data. With AutoML, you're starting with a dataset and a task, which could be regression, forecasting,
or classification, and you're saying, "Just tell me what the best model is and do the tuning for me." So they're really two different tools in your data science toolbox, and you can pick whichever is best suited to your particular use case and where you're starting from.

Got it. Last time we discussed AutoML, I asked what the building block, the heart of AutoML, is, and the answer was FLAML. So what's at the heart of the methods, the algorithm, the solution behind hyperparameter tuning in Microsoft Fabric Data Science?

Yeah, so again, FLAML is an open-source library that came out of MSR (Microsoft Research). They've done a lot of work to figure out the different ways you can explore and optimize your hyperparameter search spaces, and one of the things they've really focused on is a very fast, economical, lightweight approach to searching all of these spaces. What we've done is take this open-source project, FLAML, with both its AutoML and hyperparameter tuning functionality, and deeply integrate it into the Fabric product and the Data Science

### Segment 2 (05:00 - 10:00) [5:00]

experiences. Here are a few of the things we've added and contributed. First, we've taken this framework and deeply integrated it into the runtime. When you use any of the runtimes with Spark 3.4 and above, you'll see that this hyperparameter tuning functionality is automatically available through the FLAML library, which really helps automate the process of optimizing your machine learning models. We've also added support for parallelizing your hyperparameter trials. A lot of times you're training a bunch of single-node models, and instead of tuning one at a time, our thinking was: if you have access to a whole Spark cluster, why not use all of the nodes in the cluster to parallelize your training? So now you can train and optimize multiple models at the same time. We've also added the ability to easily tune SynapseML and Spark ML based models. Sometimes your model isn't small enough to fit on a single node, and this is where support for Spark-based models is really important. You can now tune your models at Spark scale, so you don't need to worry about fitting your Spark DataFrame into a pandas DataFrame: with the new additions to the hyperparameter tuning functionality in FLAML, you can take a Spark DataFrame and tune on it automatically as well. And lastly, we've added support for integrating all of this with MLflow. As you explore different hyperparameters (different properties of the model, if you will), the hyperparameter tuning capabilities automatically capture all of the metrics and parameters in MLflow.

Can you help us understand what SynapseML is? Because it's the very first time that
viewers of this channel may hear that term. What is this library, and what is this tool about?

Yeah, let me show you. SynapseML is an open-source library, and it's one of the primary Spark-based machine learning libraries. It contains a lot of different capabilities, such as support for Cognitive Services: it provides different interfaces for you to access Cognitive Services. One of the most popular things we see is its Spark-based implementation of LightGBM. It's really our Spark-based library, with a lot of functionality for Spark-based machine learning scenarios, so definitely take a look and try it out. There are a lot of fun things you can do with SynapseML. And because it's open source, a lot of the capabilities you access in Fabric you can also access locally on your machine.

Now is it time for the demo to see hyperparameter tuning in action?

Yeah, let's take a look.

Awesome.

Now, here's how you can tune the hyperparameters of your machine learning models in Fabric. First, let's take a look at our data. We'll start by loading it: in this tutorial we'll be using the scikit-learn California Housing dataset, which contains information about housing values across various districts. Here we'll load our data into a Spark DataFrame and display it to get a quick visual of what it looks like. Now let's get ready to train our model. We'll split our data into a training and a test dataset, and then set up our MLflow experiment, which will let us track all the different iterations attempted by our tuning trial. In this tutorial we'll be working with the SynapseML LightGBM model. In this cell I have a training function that takes in the alpha value, the learning rate, the number of leaves, and the number
of iterations: all parameters we can use to tune our SynapseML LightGBM regressor. The function also takes in our training and test datasets. As you can see, it creates a LightGBM regressor with the corresponding hyperparameters, fits the model to our training dataset, and then generates the predictions and the final evaluation metric, which in this case is the R² value. Now let's take a look at our baseline model. It uses a designated set of hyperparameters that we'll compare our tuning results against. Here I'm passing in some arbitrary parameters, and we'll log the R² value. Once training completes, we can look at the corresponding metrics, and we see that with these roughly guessed parameters we get an R² value of about 51%. Not great, but obviously an opportunity for us to tune and improve the results. So now let's tune this model. We'll first define a tuning function. This function takes a dictionary of configs and passes it to the training function we saw earlier. It also returns our evaluation metric, which we can then use to select the best hyperparameters.

### Segment 3 (10:00 - 13:00) [10:00]

We'll also define a search space, which specifies all the parameters we want to explore and the range of values each can take. You can learn more about the different configs and how to set up your search space in the FLAML docs. Finally, we'll define our hyperparameter trial, which sets all the settings we want to use when actually executing it: things like the budget, the metric, and the mode I want to run the trial in. Here I want to maximize the R² value. Using the inline exploration tools, we can investigate the metrics and parameters used for each trial that was attempted, and here I can see that I was able to increase my R² value up to 81% with the following set of configurations. I can also explore all the other configurations that were tried. We can also use the visualization module in Fabric to compare and visualize our results. Here I can create a parallel coordinates plot, which lets me see the sets of parameters that yielded improved results in my hyperparameter trial. Finally, I want to compare the results on my final test dataset, so I'll train the model on the final set of configs produced by the tuning trial. On our test dataset, our initial baseline model returned an R² value of about 51%, while our final FLAML-tuned model achieved an R² value of about 81%, a significant improvement from the tuning process. Once we've completed our hyperparameter trial, we'll save the final tuned model as a machine learning model in Fabric, which allows it to be versioned and tracked seamlessly throughout the Fabric lifecycle.

Thanks for the demo. So something super complicated has been solved in just a
single feature like hyperparameter tuning. Now I'd love to ask: what's the stage of that functionality? Is it GA-ready?

Yeah, it's currently available in public preview, and you should be able to access it on any of your runtimes using Spark 3.4 and above. We'd love to hear your feedback through the Ideas channel. Let us know if there are missing features or things you want to see supported, so we can really improve the preview before our GA release.

Awesome. Misha, thanks a lot for joining. And for those who are watching, please remember to leave a comment and hit the like button (I'm looking for my like icon). Yes, hit the like button if you liked this episode, and share it with colleagues and co-workers who are working on data science solutions and can discover the capabilities that come out of the box in Microsoft Fabric. Until next time, happy effortless hyperparameter tuning for your ML models. Thanks a lot for watching.

Thanks.
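The demo's workflow has five steps. A training function returns R², and a tuning function takes a config dict and returns the metric. A search space defines the ranges to explore, and a trial maximizes R² within a budget. Finally, the baseline and tuned configs are compared on held-out test data.

The steps can be sketched outside Fabric. What follows is a minimal, hypothetical sketch in plain Python, not the code shown in the video, and it makes several assumptions:

- It replaces SynapseML's LightGBM regressor with a toy gradient-descent linear regression on synthetic data. The number of leaves has no analogue there.
- It replaces FLAML's search strategy with plain random search.
- It uses a thread pool to stand in for Spark's parallel trials.

The names `train`, `SEARCH_SPACE`, `random_search`, `points_to_evaluate` and `n_concurrent_trials` are illustrative only and are not FLAML's API. FLAML's `flaml.tune.run` also takes an evaluation function, a search space, `metric` and `mode`, but check the FLAML docs for its actual signature.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def r2_score(y_true, y_pred):
    """Coefficient of determination: R² = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def train(alpha, learning_rate, num_iterations, train_data, eval_data):
    """Hypothetical stand-in for the demo's training function: fits y = w*x + b
    by gradient descent with an L2 penalty `alpha` on w, returns R² on eval_data."""
    w, b = 0.0, 0.0
    n = len(train_data)
    for _ in range(num_iterations):
        grad_w, grad_b = 2 * alpha * w, 0.0
        for x, y in train_data:
            err = w * x + b - y
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    xs, ys = zip(*eval_data)
    return r2_score(ys, [w * x + b for x in xs])


# Hypothetical search space: each entry samples one value from its range.
SEARCH_SPACE = {
    "alpha": lambda rng: rng.uniform(0.0, 1.0),
    "learning_rate": lambda rng: 10 ** rng.uniform(-4.0, -0.5),  # log-uniform
    "num_iterations": lambda rng: rng.randint(10, 300),
}


def random_search(evaluate, space, metric, mode, num_samples,
                  points_to_evaluate=(), n_concurrent_trials=1, seed=0):
    """Plain random search: evaluate the starting points plus `num_samples`
    random configs (concurrently), then return the best (config, result) pair."""
    rng = random.Random(seed)
    configs = [dict(p) for p in points_to_evaluate]
    configs += [{name: sample(rng) for name, sample in space.items()}
                for _ in range(num_samples)]
    with ThreadPoolExecutor(max_workers=n_concurrent_trials) as pool:
        results = list(pool.map(evaluate, configs))  # keeps input order
    sign = 1.0 if mode == "max" else -1.0
    return max(zip(configs, results), key=lambda cr: sign * cr[1][metric])


# Synthetic data (not California Housing): y = 3x + 1 plus Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.3)))
train_data, val_data, test_data = data[:100], data[100:150], data[150:]

baseline = {"alpha": 0.5, "learning_rate": 0.001, "num_iterations": 20}


def evaluate(config):
    # Like the demo's tuning function: pass the config dict to the training
    # function and return the metric the search should optimize.
    return {"r2": train(**config, train_data=train_data, eval_data=val_data)}


best_config, best_result = random_search(
    evaluate, SEARCH_SPACE, metric="r2", mode="max", num_samples=30,
    points_to_evaluate=[baseline], n_concurrent_trials=4)
baseline_result = evaluate(baseline)

# Final comparison on held-out test data, as in the demo.
baseline_test_r2 = train(**baseline, train_data=train_data, eval_data=test_data)
tuned_test_r2 = train(**best_config, train_data=train_data, eval_data=test_data)
print(f"baseline test R²: {baseline_test_r2:.3f}, tuned test R²: {tuned_test_r2:.3f}")
```

The sketch evaluates the baseline as the first trial, so the returned configuration scores at least as well as the baseline on the validation split. Random search by itself gives no such guarantee. Here the budget is a fixed number of samples; a time budget is another common choice. The interview's point is that FLAML replaces this naive sampling with a faster, more economical search.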