Using Streamlit To Make GUIs for your Kedro Parameters!


Table of Contents (3 segments)

Segment 1 (00:00 - 05:00)

This video is brought to you by kedro.community — stick around to the end to find out more. What's up, data pipeliners! Welcome back to another episode on writing data pipelines with Kedro. In this episode we have something super cool: we're going to connect Streamlit with your Kedro project. Let's go ahead and get started.

If you haven't heard of Streamlit, it's one of the fastest ways to build data web applications — a super cool library that lets you construct web apps for interacting with your data science and data modeling using very simple Python code. They have a great example on their website, streamlit.io, and I highly recommend taking a look at the documentation, because there's some fantastic stuff this thing is capable of. In today's video we're going to use Streamlit to control the parameters inside our Kedro pipeline. This lets us interact with the pipeline in a way we haven't been able to before: previously we would have to manually modify parameters in the parameters.yml file, but with Streamlit we can have a web interface that modifies the parameters for us instead.

The pipeline we're using today is very simple and is based on the Titanic dataset. I'll show you quickly what it looks like, and then we'll get into the Streamlit code that controls it. I have here a set of Titanic nodes: we have the ability to filter by age and then filter by fare. If you're not familiar with the Titanic dataset, it's basically a dataset that contains all of the people who perished in the Titanic disaster and all of the people who survived — their ticket prices, their ages, their gender, and so on. What we're doing here is creating very simple filters that let us filter the dataset by an age range and by a fare range (how much they paid for the trip). These two nodes go into the pipeline in order. We have the Titanic data — our beloved Titanic dataset, this one provided to us by Kaggle — and we have two parameters, the minimum age and the maximum age, as inputs; then we just connect them together. It's a very simple pipeline.

Moving to the parameters.yml file, we have these four parameters: max_age, min_age, max_fare, and min_fare. The pipeline is run using the Titanic create_pipeline — we just call `kedro run --pipeline titanic` and the Titanic pipeline runs. Now, to work with Streamlit: Streamlit comes with its own command-line interface, which means we're actually going to forego the Kedro CLI and use the Streamlit one instead. To write Streamlit code you have to create an entry point — the file you pass to the Streamlit CLI. In this case I've written one called streamlit_entry.py. To get Streamlit running, we first of course import Streamlit (`import streamlit as st`); for this case we also import Plotly to help with our plotting, as well as yaml and load_context from Kedro.

The way Streamlit works is that the web page is built in the order of the library calls you make. For example, when I call st.title and make it "Titanic", that's the first thing that shows up in Streamlit. I'll run it to show you what that looks like: `streamlit run streamlit_entry.py` takes our Streamlit code and turns it into a web page, and here on the right you can see our Titanic title. Let's add a few more things: some sliders that will let us adjust the minimum fare, the maximum fare, the minimum age, and the maximum age. We can just save the file, go over to Streamlit, and click Rerun — it parses the file again and shows us the new output. So here we have Titanic with the min fare, the max fare, the min age, and the max age. We can also create arbitrary text output — in this case a message telling us exactly what we're filtering by. Re-running Streamlit, we can see what we have so far.

Segment 2 (05:00 - 10:00)

Right now we're filtering fares between zero and zero — obviously we don't have the correct fare and age ranges yet, but this is where Kedro comes in, and it's very easy for us to grab that knowledge from our Kedro project catalog. Using load_context, I create a new Kedro context inside this streamlit_entry file and then load the Titanic data; in our catalog.yml file we have a titanic_data key that points to the Titanic training set. I do a very simple bit of analysis to get the minimum and maximum values of the age and the fare, then add those ranges to our sliders and rerun the Streamlit application. As you can see in the bottom left-hand corner, we are loading our Titanic dataset — so it really is using our Kedro project to grab data for our Streamlit application — and here on the right-hand side the minimum and maximum fare already have the new values, as do the ages. The oldest person on the Titanic was 80 years old, and the youngest was less than a year old. Of course, the max fare slider looks a little awkward, so we set the slider's value to the found maximum fare, and the same goes for max age. Save, rerun Streamlit, and here we go: the sliders have been updated with a proper range between minimum and maximum, and you can see it modifies the message as well. Streamlit auto-updates these things every time you make a change, which means we can rerun the pipeline automatically every time we want to change any one of our parameters. The next thing I'm going to do is actually run the pipeline. The pipeline, again, is the Titanic pipeline, and the output is going to be titanic_data_filtered — so again, this will filter our Titanic data based on the age and fare ranges.
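That catalog step can be sketched like this. The Kedro calls are shown as comments because they need a full project on disk (the `load_context` / `context.catalog.load` API matches the Kedro 0.16 era the video comes from); the `age_fare_bounds` helper is a name I'm introducing for illustration, and the tiny stand-in DataFrame uses the Kaggle Titanic column names `Age` and `Fare`.

```python
import pandas as pd

def age_fare_bounds(df: pd.DataFrame):
    """Return (min_age, max_age, min_fare, max_fare) from the Titanic frame."""
    return (int(df["Age"].min()), int(df["Age"].max()),
            int(df["Fare"].min()), int(df["Fare"].max()))

# In the real app, the frame comes from the Kedro catalog (Kedro ~0.16 API):
#   from kedro.framework.context import load_context
#   context = load_context(".")
#   titanic = context.catalog.load("titanic_data")
# Stand-in frame with the same columns, just for illustration:
titanic = pd.DataFrame({"Age": [0.42, 29.0, 80.0],
                        "Fare": [0.0, 32.0, 512.0]})

min_age, max_age, min_fare, max_fare = age_fare_bounds(titanic)

# The discovered bounds then feed the sliders; passing the found maximum
# as the slider's default value is what fixes the "awkward" max-fare slider:
#   max_fare_sel = st.slider("Max fare", min_fare, max_fare, value=max_fare)
```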
To run this pipeline inside of here, all we need to do is call context.run and give it the pipeline name, titanic. What's interesting is that when you run a pipeline like this, if your output is an in-memory dataset — and in our case that's true, since we don't have a titanic_data_filtered entry inside the catalog, only titanic_data — the context.run function returns that data. We get back a dictionary containing all of the in-memory datasets the run was left with, so our filtered data is the titanic_data_filtered entry that we pull out of the output dictionary.

Now we're going to plot this data. Let's do a very simple plot showing how many people survived and how many died: first we get the survivor count and the deceased count, then we create a Plotly figure. Using Streamlit's generic write function, I write that figure directly to the page. When we save the file and rerun our Streamlit application, we see the plot of the people who survived and died at the very bottom — the total survivor count and deceased count. However, if you notice, when we modify any of these parameters, nothing on the graph changes. That's because we haven't connected the parameters with our Kedro pipeline. So here's the secret sauce: to modify the parameters in the Kedro pipeline, all we need to do is open up the parameters.yml file located in the local configuration. We're not modifying the parameters in our base configuration, only in local, meaning we don't overwrite anything unnecessarily. Then we use yaml to dump an object that includes the fare and age parameters — and again, these parameters come from the sliders, so every time we move the sliders we get new min/max values for the ages and the fares. Now that we're dumping into the local parameters.yml, the context should automatically pick it up.

Segment 3 (10:00 - 13:00)

The context reads this local parameters.yml file on each run, and when it does, it changes the parameters that are available. So let's give this a try: save, rerun the Streamlit application, and here we go — we actually do see a difference. We're filtering between the ages of 11 and 27, and we can see that most of the people in that age bracket were able to survive. Let's try increasing the maximum fare and see if the graph changes — yes, it actually shows more people who died, so it's interesting that the more expensive tickets were apparently not helpful. And if we move the maximum age over to 80 and the minimum age up to about 60, we see that most people who were too old weren't able to survive the disaster. As you can see, this is super cool, and it works perfectly well. In my opinion, using Streamlit in combination with Kedro is one of the coolest and best ways to play around with your parameters. Of course, there's the caveat that you have to rerun the pipeline every time — but you probably would have had to do that anyway if you were modifying your parameters in a YAML file. By having Streamlit as a nice little web interface, it's clean and easy to play around with your parameters, and since we're using Plotly for plotting, the number of ways to showcase your data is effectively limitless.

Now, this code is going to be left available on kedro.community. This is a brand-new website — a Discourse forum that allows users of Kedro to congregate, come together, and discuss Kedro topics. It was started by a small group of us in the Kedro community, and we really hope it serves as a place you can come to if you want to ask questions about Kedro, talk about Kedro, or share some of your own ideas and tutorials. So why don't you come on by and take a look at kedro.community, where we can answer your questions, you can answer ours, and we can all play around — we actually have a ton of fun on this community. As you can see here, in a new post I made, I actually created a Snapchat filter for Kedro pipelines — so if you want to show off your Kedro love to your friends, you can use this filter, even in Zoom calls if you download Snap Camera. And check this out: this video by Min Yusuke is just phenomenal — I didn't know he did this — he created an interface for his Switch Ring Fit ring to talk to a Mega Man game, so he can play Mega Man using this exercise ring, which is truly remarkable. There are a lot of really fantastic things people have done, and having this community lets us share some of the cool things we're doing. Thank you very much for watching today — I hope to see you at the Kedro community. And if you enjoy this content, make sure you button that like, sub that scribe, and ring that ding if you want to know when we are pipelining. I'll see you next time — take care, bye!

More videos by this author — DataEngineerOne
