# ML 101: Welcome Overview and a Bonkers No-Code Demo Using Orange! 🍊

## Metadata

- **Channel:** Brandon Foltz
- **YouTube:** https://www.youtube.com/watch?v=iBmTKxYKhpg
- **Date:** 10.01.2024
- **Duration:** 21:43
- **Views:** 1,460
- **Source:** https://ekstraktznaniy.ru/video/52872

## Description

🚀 Dive into the world of Machine Learning with ease! 🌟 Welcome to our exciting new video series, where we demystify machine learning for beginners and enthusiasts alike. 

00:00 Introduction
00:37 Series Overview
02:40 Installing Orange
05:35 Opening Orange
07:07 Loading Data with Dataset Widget in Orange
08:18 Viewing Data with Datatable Widget in Orange
09:58 Loading the Test and Score Widget in Orange
10:57 Running Logistic Regression in Orange
12:22 Running a Classification Tree in Orange
13:00 Machine Learning Madness in Orange
15:52 Creating a Confusion Matrix in Orange
18:18 AUC / ROC Analysis in Orange
20:10 Wrapping Things Up


In this first episode, we introduce you to the powerful, user-friendly tool - Orange! 🍊 No coding? No problem! Whether you're a student, a professional, or just curious about machine learning, this series is crafted for you. Join us on this interactive and fun journey, and start building your machine-learning superpowers today! 🤖✨ Follow along, experime

## Transcript

### Introduction [0:00]

Hello, and a very warm welcome to my brand new video series, Machine Learning for Mere Mortals. We are embarking on an exciting journey so you can build your machine learning superpowers. This first video is split into two parts. First, I'll briefly outline what you can expect from this series, and then we'll jump into a hands-on demonstration. I encourage you to follow along and not just watch. Don't wait to start. Start now. Plus, it's going to be short, interactive, and fun.

### Series Overview [0:37]

First, an overview of what you can expect. Are you someone with a basic grasp of math, and maybe some statistics under your belt (from my STATS 101 videos, for example), but you find machine learning a bit scary? If the complexity of coding or not having enough time has stopped you in the past, this series will help you overcome those challenges. We are here to make machine learning accessible, easy, and yes, enjoyable for everyone. And guess what? No coding. Instead, we will leverage a powerful tool called Orange. Orange is free, it's user-friendly, and it's extremely helpful for anyone working with data. And it's perfect for Windows and macOS users alike. Orange, with its visual programming interface built on top of Python, simplifies machine learning and makes learning it more intuitive. Plus, it is backed by over a decade of dedication to education from the University of Ljubljana's Bioinformatics Lab in Slovenia. Finally, the online Orange community is thriving and helpful, offering resources on GitHub, Discord, YouTube, and social media. So whether you are a university student from anywhere in the world, be it Cal Fullerton, University of Phoenix, WGU, University of Sydney, IIT in India, or UNILAG in Nigeria, or a highly-driven high school student, or a busy professional balancing career and batches of chicken nuggets for your family, this series is for you. Ready to make machine learning approachable and fun? I hope so. Let's go ahead and get started.

### Installing Orange [2:40]

So the first thing we will need to do is download and install Orange. You can find their website at orangedatamining.com, or you can just search for Orange Data Mining in your search engine, and it will take you here to their homepage. Usually, they will detect what system you are using and go ahead and recommend the proper download for you using this big orange button here on their home screen. However, you can go to the download menu here at the top and then select what you might need. For example, I am using Windows, so I would select the standalone installer for Windows right here, or I would select this button here that it gives me. If you are on macOS, keep in mind that it depends on whether you are using Apple Silicon or an Intel chip, so make sure you download the correct one based on the hardware that is in your system. So I am going to go ahead and download Orange. Actually, I have already done that, so I am not going to download it again. I will go ahead and click the installer because there is something very important I want to point out to you, especially if you are using Windows and have OneDrive and stuff like that. So here I am in the Orange setup. For the most part, we are going to accept the defaults except for one recommendation I have for you. So I will go ahead and click next, accept the agreement. I will install just for me in this case. Select the defaults for choose components. Here I am going to make a change. It wants to install Orange in my OneDrive folder. I don't want it to do that. Programs like Orange, and R for that matter, and other types of programs tend to have issues if you install that software in a folder that syncs with some sort of cloud-based service like OneDrive or what have you. What I am going to do is install this directly onto my C: drive. That way it is just there and not syncing with the cloud service.
So I am going to select browse, I am going to go to this PC, to my C: drive, and I already have an Orange folder set up. So I am just putting it right into my C: drive so I don't have any sync issues with OneDrive. Go ahead and click that folder. Click OK. So we can see it is in C:\Orange. Next, go ahead and click install and it is going to install. So this will take several minutes because it is installing a bunch of packages, it is installing Python of course. So this might take between two and five minutes depending on your system. So we will go ahead and let it do its thing and then we will come back when Orange is fully installed. OK so Orange has finished installing, we will go ahead and click next and we will go ahead and start Orange.

### Opening Orange [5:35]

So here we are in Orange for the very first time. There are three things I want to point out before we get started. Number one, I am on a Windows PC, so if you are on macOS, the window you are looking at might look a bit different in terms of the borders and stuff, but the underlying functionality of Orange is exactly the same. So don't worry about any differences in how it looks, those are simply aesthetic. Number two is that this will not be a comprehensive overview of what Orange can do in terms of its menus and so on and so forth, we will be doing that in later videos. And thirdly, this will not be proper data science or machine learning. This is simply testing out the tool, sort of taking the new car out on the road and putting the hammer down and then just kind of seeing what it can do. That is what we are doing here. This is not sort of proper technique or anything like that, this is just a demo of what we can do with it. I will also assume that you are on one screen, so I will be very explicit about what I am pointing at, what I am clicking on and what I am doing so you can hopefully follow along just by listening to what I am saying. On the left hand side you will see several panels, and inside these panels are widgets. So we have Data, Transform, Visualize, so on and so forth. You might not have all of these because I think I might have installed a couple of additional add-ons, and I will get to add-ons at a later time, but everything you need to do this demonstration is already in Orange by default. Let's give it a whirl and put your CPU to the test. So on the left hand side you will see several panels, click on data and then click on datasets.

### Loading Data with Dataset Widget in Orange [7:07]

That will put the datasets widget onto your workflow. Double click datasets and here you can see the built-in datasets that are in Orange. We can use these to learn, like we are doing here for a demo, or to test certain workflows on datasets similar to the ones we are working with, and so on and so forth. It is just a widget that contains built-in datasets. Make sure you double click and select attrition train. I have already done so, that is why the dot to the left is solid and green. That lets us know that we have loaded this dataset into this widget. Be careful, there are a couple of other similar ones. There is attrition predict, we don't want that, we want attrition train. There is also one called employer attrition, these are all related by the way, but we don't want that either. We are going to do attrition train for this demo. Once we have done that and loaded it in, we will go ahead and close that window. There are several ways to add widgets onto the workflow in Orange. So we can click and it will put it there, we can also drag it on. So go to data table and we will drag it on to the right of datasets.

### Viewing Data with Datatable Widget in Orange [8:18]

Right now, nothing is happening. That is because Orange is a workflow-based visual programming interface. We have to connect things and send the data across the workflow for anything to happen. So if you click on the right side of datasets where the dotted line is and just connect it to data table, we are done. What is data table? Double click it and you will see that data table is just our data represented in tabular format like an Excel spreadsheet. So in this case, the target variable that we are trying to predict or model is employee attrition. In other words, whether or not the employee left the company. So if they did leave, it's a Yes, if they did not leave, it's a No. Then to the right of that, we have all of our features or all of our variables such as the employee's age, whether they did business travel rarely, frequently and so on, the department they were in and so on and so forth all the way down. So what we are trying to do is use all of these features, all of these variables, to predict whether or not an employee left the company. You can think of this attrition as our dependent variable and everything else as our independent variables if you are familiar with that terminology in statistics. Go ahead and close that. So remember, we are trying to predict employee attrition. To the right of data table, click on the dotted half circle there, drag out and then you will have a search for widget box come up.
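For readers who are curious what the target-versus-features split looks like outside the Orange canvas, here is a minimal Python sketch. It is not part of the video, which stays code-free. The rows, column names, and values below are invented purely for illustration and are not taken from the real attrition dataset; `split_features_target` is a hypothetical helper.

```python
# Toy rows only: column names and values are invented for illustration
# and do not come from the actual attrition dataset used in the video.
rows = [
    {"Attrition": "Yes", "Age": 41, "BusinessTravel": "Rarely", "Department": "Sales"},
    {"Attrition": "No", "Age": 49, "BusinessTravel": "Frequently", "Department": "Research"},
    {"Attrition": "No", "Age": 33, "BusinessTravel": "Rarely", "Department": "Research"},
]

TARGET = "Attrition"  # the dependent variable we want to predict


def split_features_target(rows, target):
    """Separate each row into its features (independent variables) and its target value."""
    X = [{name: value for name, value in row.items() if name != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y


X, y = split_features_target(rows, TARGET)
print("target values:", y)
print("feature names:", sorted(X[0]))
```

Every model added later in the demo receives the same two pieces: the feature columns as input and the Yes/No target as the thing to predict.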

### Loading the Test and Score Widget in Orange [9:58]

In this one, either search for or find test and score. Click that. I am going to drag this out pretty far because we are going to need the room. So here is test and score. If I double click test and score, there is nothing in it, it's blank. So why is that? Well, we haven't connected any machine learning models to it yet, which we will do here in a second. All you need to know right now is to make sure that cross validation is selected, number of folds is five and stratified is also selected. We will get into what all this stuff on the side means later. For this demo, just make sure yours looks like mine. Cross validation, five folds, stratified. And then we will get to all this later as well. Close that. So now we need to attach some machine learning models to our test and score widget. So I am going to go to the left side of test and score, drag out the connection

### Running Logistic Regression in Orange [10:57]

and then select logistic regression. You can search for it if yours isn't at the top. Mine is. I will select logistic regression. And then once I do that, watch the test and score widget in the blue there. There we go. Now it's running. So what we did is we just connected the logistic regression machine learning model to our test and score widget because the data is being piped into that test and score widget. Now if I double click test and score, there is our logistic regression model. Boom. That easy. I will make these columns so we don't have the zeros floating. And there we go. So you might have heard of the AUC, which is the area under the curve. We will sort of use that for our measure when we compare models in this demo. But we will get to all the other stuff later. So here we can see that we have an AUC of 0.795, which is actually pretty good. Let's go ahead and close this test and score window for now. Now here's where Orange becomes just crazy fun and crazy exciting and stuff like that. What I'm going to do, I'm actually going to select all my widgets and move them down to the middle because we're going to need some space. Okay. So let's go ahead and add another model to our test and score widget. I'm going to select the left-hand side, drag a connection out, and this time we'll

### Running a Classification Tree in Orange [12:22]

do a decision tree. So put tree and it runs. We double-click test and score. Now we have our tree. See how easy this is to do? It is so crazy easy and very visual. You can see what's going on. So here we can see that logistic regression did quite a bit better than the tree when it made predictions in the machine learning training and prediction process. We can just keep going. Let's just go crazy. Like I told you, here we're just giving Orange a test run. So let's just do really bad machine learning and data science and just connect a bunch of stuff and just have some fun. All right?

### Machine Learning Madness in Orange [13:00]

So we're going to drag out another one under tree and it doesn't really matter. We can do all kinds of stuff. Let's do support vector machine. So SVM looks like SVM. Let it run. We won't open test and score quite yet. Let's add another one. Let's add a k Nearest Neighbor. So kNN. We'll add that. Let it run. Let's add random forests. So we'll do random forest. Let it run. There's that. Make some room here. And let's keep going. What else can we add? Let's select. What haven't we done yet? Gradient boosting. There we go. Do gradient boosting. There's that. We'll connect naive Bayes. There we go. We'll connect AdaBoost right there. Let's connect. What else? Stochastic gradient descent. We'll do that. And finally, what we're going to do now is add a neural network. But keep in mind, because of the nature of neural networks, this will take a few seconds to run. So on my machine, which is pretty powerful, it might run pretty quickly. But on yours, depending on what you have in your computer, it might take you a minute or two. So on mine, it should run pretty quickly, but we'll see. So we'll connect to this a neural network and let it run. So as you can see here on my system, it's not nearly as instantaneous as some of the other machine learning models. It is going to take my computer maybe five, 10 seconds to run. And there we go. So here we have, let's just stop and just say what we have here. We're loading in the employee attrition data set. We looked at it as a table, like in an Excel format so we could see the variables and kind of how the data is structured and so forth. We connect that to a test and score widget. And then to that test and score, we piped in learners. So in Orange, a learner is a machine learning model. Now if we double click test and score, we have all of our models in here. Look at that. So I'm going to go ahead and see if I can make this window. I think I can. So here we have all of our models.
We want to make sure that AUC is sorted in descending order so we can see that logistic regression seems to be the best, then gradient boosting, naive Bayes, random forest and so on and so forth, with kNN and tree being the bottom two. This chart right here, I'll explain in a later video. It's basically just a way of comparing models so you can tell whether or not there's a significant difference between two models, but we'll get to that in a later video. So here we can kind of see how all of our models did, again just looking at AUC for now. And logistic regression, the good old-fashioned logistic regression, seems to do the best on this data. We'll go ahead and close that.
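The Test and Score settings chosen earlier (cross validation, five folds, stratified) are left for later videos, but the idea behind "stratified" can be sketched in a few lines of Python. This is a simplified illustration and not necessarily how Orange assigns folds internally; `stratified_folds` is a hypothetical helper, and the class counts (1,233 No and 237 Yes) are the ones reported for this dataset in the transcript.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign row indices to folds so each fold keeps roughly the overall class ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled rows of this class out to the folds like cards.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Class counts as reported in the transcript: 1,233 No and 237 Yes.
labels = ["No"] * 1233 + ["Yes"] * 237
folds = stratified_folds(labels)

for number, fold in enumerate(folds, start=1):
    counts = Counter(labels[i] for i in fold)
    print(f"fold {number}: {len(fold)} rows, {counts['No']} No, {counts['Yes']} Yes")
```

Each fold ends up with 246 or 247 No rows and 47 or 48 Yes rows, so no fold is accidentally starved of the rarer Yes class. In five-fold cross validation, each fold takes one turn as the held-out test set while the other four are used for training.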

### Creating a Confusion Matrix in Orange [15:52]

To the right of test and score, we're going to drag out again. Here put a confusion matrix in, so you can search for it or you can just select it if it's visible in the list, confusion matrix, double click this. What this is telling us is how well our model did in terms of the actual data versus what the model predicted. So for logistic regression, make sure we have that selected, we can see that in our data set in total, there were 1,233 Noes in our actual data set, we're looking across the row here, and there were 237 Yeses in our data set. So quite an imbalance, but we selected stratified, so hopefully this helped the model. Now out of those 1,233 Noes in the actual data set, our logistic regression model predicted no for 1,203 of those, and then predicted yes for 30 of those. So you can think of it that it got 30 wrong when the employee was a no, the employee did not leave the company. Our model said no for 1,203, but yes for 30. Same thing is true for the bottom row. So in our data set, there were 237 Yeses. The model here predicted yes for 76 of those, but predicted incorrectly for 161 of those. So this is what's called a confusion matrix, it just tells you how well the model did relative to the actual data that we fed into it. Okay, and we'll get into all this again in later videos. So it did really well on the Noes, but not all that great on the Yeses. And again, part of this could be because we have so few Yeses in the actual data set. But you can look at all these, kNN, Random Forest, and see how each one did relative to each other. So we can close that, and then I'm going to make some more room here by selecting everything and just moving it over. We'll do one more thing in this demo. Next we're going to add one more widget to this, we'll go to the evaluate panel on the left hand side, and we'll click ROC analysis.
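To put numbers on "good on the Noes, not so great on the Yeses", the four counts read out above can be turned into a few standard summary metrics. The sketch below is an addition to the transcript; it uses only the counts quoted in the video and treats Yes (the employee left) as the positive class, which is an assumption made for illustration.

```python
# Counts quoted in the transcript for logistic regression
# (rows = actual class, columns = predicted class).
actual_no_pred_no = 1203    # true negatives
actual_no_pred_yes = 30     # false positives
actual_yes_pred_no = 161    # false negatives
actual_yes_pred_yes = 76    # true positives

total = (actual_no_pred_no + actual_no_pred_yes
         + actual_yes_pred_no + actual_yes_pred_yes)

# Share of all rows the model classified correctly.
accuracy = (actual_no_pred_no + actual_yes_pred_yes) / total
# Of the employees who actually stayed, how many were predicted No.
recall_no = actual_no_pred_no / (actual_no_pred_no + actual_no_pred_yes)
# Of the employees who actually left, how many were predicted Yes.
recall_yes = actual_yes_pred_yes / (actual_yes_pred_yes + actual_yes_pred_no)
# Of the rows predicted Yes, how many actually left.
precision_yes = actual_yes_pred_yes / (actual_yes_pred_yes + actual_no_pred_yes)

print(f"total rows:      {total}")
print(f"accuracy:        {accuracy:.3f}")
print(f"recall on No:    {recall_no:.3f}")
print(f"recall on Yes:   {recall_yes:.3f}")
print(f"precision (Yes): {precision_yes:.3f}")
```

The model is right on about 87% of all rows and catches about 98% of the employees who stayed, but it finds only about 32% of the employees who actually left. That gap is why overall accuracy alone can be misleading on an imbalanced data set like this one.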

### AUC / ROC Analysis in Orange [18:18]

That stands for receiver operating characteristic, but we won't get into that in this video. I'm going to do something wrong, so I'm going to show you what Orange will do if you try to connect two widgets that don't go together. If I try to connect confusion matrix to ROC analysis, it won't do it, okay, it won't do it. That's because the input of one widget has to match the output of another, and we'll get into all that again later on. So I'm going to move ROC analysis above confusion matrix, go to test and score, click on the right side of test and score, connect that to ROC analysis, and then it will go ahead and connect. If I double click on the ROC analysis, here you can see all of our ROC curves. All of them are selected by default, but we can go ahead and just click, you know, the ones that we want to see. So here we can see logistic regression. It was the best that we had. If we click the worst, so I think, was it, neural network was pretty good. What was one of the bad ones? Oh, kNN was pretty bad. So here's kNN. You can see the difference here. So ideally what we want this to look like is for the curve to technically go straight up and straight over. But the more the curve is bent towards the upper left, the better it is. You should also keep in mind here that we selected target no, so this is looking at whether or not the target variable was a no. We can also select yes, to see the different performance based on that target variable here, right? Just like that, and we'll go ahead and close this for now.
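To make "straight up and straight over" a little more concrete, here is a small Python sketch of how an ROC curve and its AUC can be computed from predicted scores. It is an illustration added to the transcript and not Orange's own implementation. The six labels and scores are invented toy values, and `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(labels, scores, positive):
    """Return (false positive rate, true positive rate) points, sweeping the threshold from high to low."""
    n_pos = sum(1 for label in labels if label == positive)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Rows with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented toy data: actual classes and a model's predicted probability of "Yes".
labels = ["Yes", "Yes", "No", "Yes", "No", "No"]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores, positive="Yes")
print("ROC points:", curve)
print("AUC:", round(auc(curve), 3))

# A model that ranks every Yes above every No goes straight up, then straight over.
perfect_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
print("perfect AUC:", round(auc(roc_points(labels, perfect_scores, positive="Yes")), 3))
```

In the toy data one No row is scored above one Yes row, so the curve takes a single step to the right before reaching the top and the AUC comes out at 8/9, roughly 0.889. The second set of scores ranks every Yes above every No and reaches an AUC of 1.0, which is the "straight up and straight over" shape described above.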

### Wrapping Things Up [20:10]

So here, what we have done, of course, I've been explaining the whole time, but if I were just doing this without all the explanation, we have imported a data set, looked at it in tabular form, connected it to the test and score widget, and then in a matter of a few seconds, connected 10 machine learning models to it, got all the evaluation metrics, and looked at those, not only in the test and score widget, but also in the confusion matrix and in the ROC curve analysis. So if I were doing this without any speaking, this could all be done in probably what? A minute, two minutes tops, and that's the beauty of Orange. So of course, there's so much more that can happen in Orange, and we'll get into that obviously in many future videos, but I just wanted you to see how easy it is to use, how visual it is, because you know exactly what you're connecting to what, and how you can do a lot of different things based on very specific nodes that connect to each other without a whole lot of back and forth, and of course, there's no code in this, right? So again, this is just a demo of Orange, and of course, we will get into all these nuances and other different things in future videos. So I hope you enjoyed this first video in this series, and I look forward to seeing you again in the next one. Take care. Bye-bye.