# Forecasting with Prophet and TidyModels

## Metadata

- **Channel:** Andrew Couch
- **YouTube:** https://www.youtube.com/watch?v=OIQPIefDxx0

## Contents

### [0:00](https://www.youtube.com/watch?v=OIQPIefDxx0) Segment 1 (00:00 - 05:00)

Hey y'all, it's Andrew Couch here, and in today's video I'm going to go over forecasting with Prophet. I've made a few videos on forecasting, and I wasn't very happy with the content I created, partly because I often couldn't find a good data set and partly because it didn't really represent what I think of as modern forecasting. In my previous forecasting videos, with modeltime or fable, I mostly went over ARIMA or ARMA models. Now that I've done a decent amount of forecasting, I don't actually use these traditional ARIMA models; I use Prophet. In fact, I'd recommend basically always using Prophet unless you have something specific you really want to work on. So I figured I'd go over Prophet and how to use the tidymodels framework around it. This should be a short, quick video showing how simple it is to forecast data with Prophet.

I'm going to open up an R Markdown document and call it "TidyTuesday Prophet". I'm using one of Prophet's own example data sets, called `example_retail_sales`. I believe it doesn't have any exogenous variables, so we're just predicting retail sales: there's a date column called `ds` and a `y` column we're trying to predict, which I'm assuming is monthly retail sales. One package I like to load when I'm forecasting is lubridate, which comes with some nice date helper functions. Converting the date column with `as.Date()` doesn't work here, so instead we can use `mdy()` (month, day, year) to get a proper date. Cool, so now we have our date, and I'll call the result `clean_df`.

Let's do some plots. A simple line chart shows a nice trend in our sales. There's obviously a seasonal component, but overall our average monthly sales are also increasing, and that's something we see a lot in forecasting. When people have a forecasting problem, it's generally about sales, products, views, and things like that, and the quantity being forecast is usually moving, or non-stationary. That's a problem for models like ARIMA, which assume that the series (after any differencing) is stationary, meaning its average isn't changing over time. If I add a linear smoother to the plot, we can see the linear model says this trend is increasing through time, which is a problem if we're trying to forecast this kind of data. Luckily, with Prophet we don't need to assume stationarity; the trend is baked into the model, and we can even change the method it uses to model the trend.

So let's see what we'd want to do when approaching a time series problem. One thing to think about with `clean_df` is what happens with a basic split. (I forgot to load tidymodels at first, so let me do that.) Let's try the wrong way first: a regular random split with `initial_split()`.

### [5:00](https://www.youtube.com/watch?v=OIQPIefDxx0&t=300s) Segment 2 (05:00 - 10:00)

If we run `initial_split()` on `clean_df`, pull out the training and test sets, label them with `type = "train"` and `type = "test"`, bind them together, and make the same plot, we can see this isn't appropriate for the kind of data we're working with. Generally we want randomly sampled data sets so that our models see representative features, but with time series problems we can't split the data the traditional way; we always have to account for the time component. That's where we really need to think about data leakage: if I fit a generic linear model on this split, I'm sure it would overfit very quickly, because it would have access to future data while trying to predict earlier periods. Luckily, tidymodels is aware of this, and rsample provides `initial_time_split()`. We do the same thing with `clean_df` and call the result `correct_split`. This is the correct way to split your data: if we arrange by `ds`, we can see that the model trains only on data from before the test period, so there's no leakage. One more thing I want to point out: even if you're not forecasting a univariate time series like sales, the time component is still worth thinking about when you're assessing your features.

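
The difference between the two splits can be sketched in a few lines. In this Python sketch (standard library only), a hypothetical list of monthly dates stands in for the rows; `time_split` follows the idea behind rsample's `initial_time_split()` (keep the earliest 3/4 of the data, rsample's default `prop`, for training), while `random_split` mimics a plain random split. The helper names are illustrative, not rsample's API.

```python
import random
from datetime import date


def monthly_dates(n, start_year=1992):
    """n first-of-month dates from January of start_year (hypothetical data)."""
    return [date(start_year + i // 12, i % 12 + 1, 1) for i in range(n)]


def random_split(rows, prop=0.75, seed=1):
    """Random split: ignores time order, so test rows can predate training rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * prop)
    return shuffled[:cut], shuffled[cut:]


def time_split(rows, prop=0.75):
    """Chronological split: the earliest `prop` share trains, the rest tests."""
    ordered = sorted(rows)
    cut = int(len(ordered) * prop)
    return ordered[:cut], ordered[cut:]


def leaks_future(train, test):
    """True if any training date falls after the earliest test date."""
    return max(train) > min(test)


dates = monthly_dates(100)
print("random split leaks:", leaks_future(*random_split(dates)))
print("time split leaks:", leaks_future(*time_split(dates)))
```
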
When I create models, I generally look at the features I'm using, such as age, and ask whether there's a time component I'm missing that's causing leakage. So you might end up using `initial_time_split()` even when the data doesn't have a date you want to model directly. But let's get into Prophet. First I want to show what the Prophet workflow looks like. When you create a forecast, you give Prophet the data set, here `clean_df`, which is assumed to be sorted by date. What's interesting about forecasting is that you're generally refitting the model to predict the next period. This data set runs into 2016, so over the next few months we'd refit it as we observed more data, because that's how it learns to forecast the next period. What's useful about Prophet is that once we have our model, `m`, we use it to create the frame we want to forecast with `make_future_dataframe()`. We pass it the model `m` and `periods`, the number of periods to forecast into the future; say six months out. We also specify the frequency with `freq`: the options include day, week, month, quarter, and year, and you can also go down to seconds, minutes, or hours. In this case we want month. Lastly there's `include_history`, and I'll include the history for now. We usually call the result `future`. Then we make the forecast with `predict(m, future)`, since we're using our model and the data set we want to predict on, and we usually call that `forecast`.
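
Conceptually, `make_future_dataframe(m, periods = 6, freq = "month", include_history = ...)` extends the date column past the last observation. The Python sketch below reproduces only that idea for first-of-month dates; `add_months` and `future_dates` are hypothetical helpers, not part of Prophet.

```python
from datetime import date


def add_months(d, k):
    """Shift a first-of-month date by k months (k may be negative)."""
    total = d.year * 12 + (d.month - 1) + k
    return date(total // 12, total % 12 + 1, 1)


def future_dates(history, periods, include_history=False):
    """Monthly dates following the last observed date, optionally preceded by
    the observed dates themselves. Assumes every date is the 1st of a month."""
    future = [add_months(max(history), k) for k in range(1, periods + 1)]
    return sorted(history) + future if include_history else future


history = [date(2016, m, 1) for m in range(1, 6)]  # Jan-May 2016, hypothetical
print(future_dates(history, periods=6))
```
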

### [10:00](https://www.youtube.com/watch?v=OIQPIefDxx0&t=600s) Segment 3 (10:00 - 15:00)

If we look at `forecast`, we can see it contains a lot: the date, the trend, the additive terms, the yearly component, and so on, along with the lower and upper bounds and the actual prediction. It's nice to plot this. With `prophet_plot_components(m, forecast)`, we can see the trend component, i.e. what's driving the trend once the seasonality is taken out, as well as the seasonal components. We can also just call `plot(m, forecast)`: it shows the model's fit against the past observations, and it fits pretty well. The part on the right is what we're forecasting, and it looks sensible. This data set is extremely seasonal, and the seasonality is very consistent, so it's a perfect data set for this model. Besides the components, we can also add the changepoints to the plot. Changepoints are where the model sees a change in the trend, and they help with diagnosis: maybe the model is picking up something weird we don't want, and we should regularize it or reduce the number of changepoints; or maybe a changepoint makes a lot of sense, and we can ask where else there should be one that the model isn't picking up.

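
The changepoints on that plot are the points where Prophet's piecewise-linear trend is allowed to change slope. Following the formulation in the Prophet paper (Taylor and Letham), the trend is g(t) = (k + a(t)^T δ) t + (m + a(t)^T γ), where a_j(t) switches on once t passes changepoint s_j and γ_j = -s_j δ_j keeps the line continuous. The Python sketch below evaluates that formula at a single time point with made-up parameters; it is an illustration, not Prophet's fitting code.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Prophet-style piecewise linear trend evaluated at time t.

    k: base growth rate, m: base offset,
    changepoints: times s_j where the rate may change,
    deltas: rate adjustments delta_j that apply from s_j onward.
    """
    rate, offset = k, m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            rate += delta
            offset -= s * delta  # gamma_j = -s_j * delta_j keeps the trend continuous
    return rate * t + offset


# Slope 1 until t = 10, then slope 1 + 2 = 3 afterwards.
for t in (5, 10, 20):
    print(t, piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[10], deltas=[2.0]))
```

In Prophet, `changepoint.prior.scale` is the scale of the sparse (Laplace) prior on the δ_j: a larger value allows bigger rate changes and a more flexible trend. By default Prophet places 25 potential changepoints (`n.changepoints`) in the first 80% of the history (`changepoint.range`).
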
Prophet is pretty useful here because it detects changepoints automatically, which requires little to no tuning. What's also nice is that we can build our own knowledge about upcoming changes into the model. If we're expecting something like a big marketing campaign for a product, we can tell the model that the series we're forecasting is going to change, hopefully with an increase in product sales. Say we're going to run a huge Black Friday sale, push all of our salespeople, and pay big commissions so they sell a lot of the product: we can anticipate that and put it into the model pretty quickly. So how would we tune a model? First, we can look at the model's arguments, such as `changepoints`, `n.changepoints`, and so on. In general, you tune the prior scales, and there are three of them. The first is `changepoint.prior.scale`, which I'll set to 10.

Next is `seasonality.prior.scale`, which I'll also set to 10, and then `holidays.prior.scale`. There are a few other arguments too. `seasonality.mode` can be additive or multiplicative, and here we might want to consider changing it, because the data trends a bit differently. But in general, the three you want to tune are the changepoint, seasonality, and holidays prior scales; the way I remember it is that they all end in "prior scale". Seasonality mode is something where you look at the data set and choose it yourself. There's also `changepoints`, a vector of dates at which the trend is allowed to change. Prophet requires those dates to fall within the training data, so an anticipated future change, like a major change in the business, is usually entered through the `holidays` argument or an extra regressor instead. Again, that's pretty easy to do. Now let's look at how we'd assess our models in a machine learning framework. Looking at our data set, we have our training and test sets, but what if we want to try multiple parameters? Assessing them all on this single test set doesn't seem like the best way to do it.
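
The choice of `seasonality.mode` comes down to how the seasonal term combines with the trend. In additive mode Prophet's prediction is trend plus seasonality; in multiplicative mode the seasonal terms are scaled by the trend, i.e. trend × (1 + seasonality). The Python sketch below uses a made-up sine wave in place of Prophet's Fourier terms to show the practical difference: additive swings keep the same size as the level rises, while multiplicative swings grow with it, which is one reason multiplicative mode can suit a growing sales series.

```python
import math


def seasonal(t, amplitude):
    """Made-up 12-period seasonal pattern standing in for Prophet's Fourier terms."""
    return amplitude * math.sin(2 * math.pi * t / 12)


def additive(trend, t):
    # The seasonal effect is a fixed number of units, whatever the trend level.
    return trend + seasonal(t, 10.0)


def multiplicative(trend, t):
    # The seasonal effect is a fraction (here up to 10%) of the trend level.
    return trend * (1 + seasonal(t, 0.1))


def swing(model, trend_level):
    """Peak-to-trough range over one 12-period cycle at a fixed trend level."""
    values = [model(trend_level, t) for t in range(12)]
    return max(values) - min(values)


for level in (100.0, 1000.0):
    print(level, round(swing(additive, level), 1), round(swing(multiplicative, level), 1))
```
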

### [15:00](https://www.youtube.com/watch?v=OIQPIefDxx0&t=900s) Segment 4 (15:00 - 20:00)

It'd be nice to have multiple folds to tune a model and assess its performance, and luckily tidymodels has that: it's called `rolling_origin()`. With `rolling_origin()` we give it our data, `initial`, and `assess`, and the default is `cumulative = TRUE`, which means the training set keeps growing. For example, we could train on everything from 2005 and earlier and then predict the next year. Here we'll give `initial` a year and `assess` six months. To show what that looks like, I map `analysis()` over the splits to get the training data and `assessment()` to get the test data, `pivot_longer()` the result, count the ids, and filter to `Slice001`, `Slice002`, and `Slice003`. Now you can see it's essentially rolling: the first resample trains on 52 observations and assesses the next six periods, 58 rows in total. If we count the ids, we can see the total increases by one each time: 58, then one more increment to 59.

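
The expanding-window scheme can be sketched by row index alone. This Python sketch assumes rsample's `rolling_origin()` defaults of `skip = 0` and `lag = 0`, with `cumulative = TRUE` pinning the start of the analysis window at the first row; the function is a hypothetical illustration, not rsample's implementation.

```python
def rolling_origin(n, initial, assess, cumulative=True):
    """Return (analysis, assessment) row-index lists for an n-row series.

    Each resample moves the forecast origin forward by one row. With
    cumulative=True the analysis window always starts at row 0 (expanding);
    otherwise it keeps a fixed length of `initial` rows (sliding).
    """
    splits = []
    for start in range(n - initial - assess + 1):
        end = start + initial  # first row after the analysis window
        first = 0 if cumulative else start
        splits.append((list(range(first, end)), list(range(end, end + assess))))
    return splits


for analysis, assessment in rolling_origin(60, initial=52, assess=6):
    print(len(analysis), len(assessment), len(analysis) + len(assessment))
```

With 52 analysis rows and a six-row assessment window, the per-resample totals come out as 58, 59, 60, the same counts described above.
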
One more gives 60, and so on, so as you can see, that's pretty useful for what we're doing. There are also other ways to do this. Instead of `rolling_origin()`, which moves forward one observation at a time, we can use `sliding_period()`, which moves by a period. In `sliding_period()`, the `index` is the date column we base the periods on, `period` is the unit (here `"year"`), `lookback` can be `Inf`, and `assess_stop` can be 1. Now we're resampling by year rather than by row. There are different ways to skin a cat, and there's also `sliding_index()`, which works directly with the values of the index if you want to specify a certain amount. Again, nothing too interesting, but let me show the training data (`analysis()`) and test data (`assessment()`) for these resamples the same way: filter to `Slice001`, `Slice002`, and `Slice003`, and make the same plots.
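
A year-based scheme like `sliding_period(index = ds, period = "year", lookback = Inf, assess_stop = 1)` can be pictured as: train on every year up to year i, assess on year i + 1. The Python sketch below implements that expanding-by-year idea on hypothetical monthly dates; it assumes no calendar year is missing and ignores rsample details such as `complete`, `step`, and `skip`.

```python
from datetime import date


def yearly_expanding_splits(dates):
    """(analysis, assessment) pairs: analysis = all dates through year i,
    assessment = all dates in the following year. Assumes consecutive years."""
    years = sorted({d.year for d in dates})
    splits = []
    for i in range(len(years) - 1):
        analysis = [d for d in dates if d.year <= years[i]]
        assessment = [d for d in dates if d.year == years[i + 1]]
        splits.append((analysis, assessment))
    return splits


dates = [date(1992 + i // 12, i % 12 + 1, 1) for i in range(36)]  # 1992-1994
for analysis, assessment in yearly_expanding_splits(dates):
    print(analysis[0], analysis[-1], "->", assessment[0], assessment[-1])
```

On data spanning 25 calendar years this yields 24 resamples, consistent with the 24 resamples used later in the video.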

### [20:00](https://www.youtube.com/watch?v=OIQPIefDxx0&t=1200s) Segment 5 (20:00 - 25:00)

All right, you can see it's going by a year. This might be a lot, but the functions are pretty self-explanatory; the trick I learned is setting `lookback = Inf`. Now let's go over training a model and some model metrics. I'll use these year-based resamples, since there are about 24 of them, and call them `time_series_kfolds`. With them we can tune two models, so I'll create a helper function, `tune_prophet(splits)`. Inside it, `train_data` is `analysis(splits)` and `test_data` is `assessment(splits)`. The first model, `m1`, is fit on `train_data` with `seasonality.mode = "additive"`, and the second, `m2`, with `seasonality.mode = "multiplicative"`. Then we make the future frame with `make_future_dataframe(m1, periods = nrow(test_data), freq = "week", include_history = FALSE)` and predict with each model. I always select just `ds` and `yhat`, the date and the prediction. To keep it tidy, I add `type = "additive"` and `type = "multiplicative"`, bind the rows, and finally left join `test_data` by date. There are other ways to do this: we could write a generic tuner function, say `tune_prophet(splits, season_type)`, and pass `season_type` to `seasonality.mode` in place of the hard-coded value.

Both versions do the same thing. I'm hard-coding it because I know which parameters I want to tune; the generic version is cleaner, so it depends on how you want to do it. We'll just use `tune_prophet` since it's the simplest, and map it over the resamples: `ts_tune` maps `tune_prophet` over the splits in `time_series_kfolds`, and then we wait for the results with the two seasonality modes to come in.
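
The shape of `tune_prophet()` and the `map()` over resamples can be sketched without Prophet. In this Python sketch, `last_value` and `drift` are hypothetical stand-in forecasters playing the role of the two seasonality configurations; `tune_split` builds monthly dates after the training window (like `make_future_dataframe(..., include_history = FALSE)`), tags each prediction with its configuration, and left-joins the actuals by date, and `tune_all` applies it to every resample with rsample-style `Slice001` ids.

```python
from datetime import date


def add_months(d, k):
    """Shift a first-of-month date by k months."""
    total = d.year * 12 + (d.month - 1) + k
    return date(total // 12, total % 12 + 1, 1)


# Hypothetical stand-ins for the two Prophet fits; each maps the training
# values and a horizon to that many point forecasts.
def last_value(train_y, horizon):
    return [train_y[-1]] * horizon


def drift(train_y, horizon):
    slope = (train_y[-1] - train_y[0]) / (len(train_y) - 1)
    return [train_y[-1] + slope * h for h in range(1, horizon + 1)]


def tune_split(split_id, train, test, forecasters):
    """Forecast one resample with every configuration and left-join the actuals.

    train and test are date-sorted lists of (date, value) pairs.
    """
    train_y = [y for _, y in train]
    future_ds = [add_months(train[-1][0], k) for k in range(1, len(test) + 1)]
    actual = dict(test)  # date -> observed value
    rows = []
    for name, forecast in forecasters.items():
        for ds, yhat in zip(future_ds, forecast(train_y, len(future_ds))):
            # y stays None when a forecast date has no matching test date.
            rows.append({"id": split_id, "type": name, "ds": ds,
                         "yhat": yhat, "y": actual.get(ds)})
    return rows


def tune_all(splits, forecasters):
    """Apply tune_split to every (train, test) resample and stack the rows."""
    rows = []
    for i, (train, test) in enumerate(splits, start=1):
        rows.extend(tune_split(f"Slice{i:03d}", train, test, forecasters))
    return rows


data = [(add_months(date(1992, 1, 1), i), 100.0 + i) for i in range(30)]
splits = [(data[:end], data[end:end + 6]) for end in range(20, 25)]
results = tune_all(splits, {"last_value": last_value, "drift": drift})
print(len(results), results[0])
```

If the future dates were generated at the wrong frequency, most rows would come back with `y` set to `None`; that is the kind of misalignment that shows up in the next segment.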

### [25:00](https://www.youtube.com/watch?v=OIQPIefDxx0&t=1500s) Segment 6 (25:00 - 30:00)

Hopefully it'll be done soon. Another thing we could have done is add a progress bar, but there are only 24 resamples in `time_series_kfolds`, and now we have everything. You'll get some messages about the seasonality; you can wrap the call in `suppressMessages()` or `suppressWarnings()` to keep them from printing. Now I'll select `id` and the results, and... uh-oh, something's off. Let me check the splits: the analysis set of the first split covers 1992 and its assessment set covers 1993, which makes sense. So what's going on in `tune_prophet()`? It calls `predict()`, then `select(ds, yhat)`, `mutate(type = ...)`, and joins `test_data`. That's a little odd. `time_series_kfolds` is built from `clean_df` with `index = ds`, `period = "year"`, and `lookback = Inf`. Let me bind the columns, run `tune_prophet()` again, and this time cut it down to `n = 10` rows.

Yep, it's not lining up. What's in `test_data`? Oh, `freq` is `"week"`, but this is monthly data; that's why. It's a small typo: I usually work with weekly data and forgot this isn't weekly. So we set `freq = "month"` and run it again. Sorry about that, but this video is a good example of live coding and tracking down where I made errors. Okay, cool, now we have our results. One thing I like to do is group by `id` and `type` (the parameters), arrange by date, and create a `forecast` column with `row_number()`, which records how far out each forecast is. After that we keep `forecast`, `type`, `ds`, `yhat`, and `y`. There are different ways to assess this; one is to group by `forecast` and `type` and compute `rmse()` with `truth = y` and `estimate = yhat`.
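
The `group_by(id, type)`, `arrange(ds)`, `mutate(forecast = row_number())` step followed by a grouped RMSE can be sketched in plain Python. Rows here are dictionaries with hypothetical `id`, `type`, `ds`, `y`, and `yhat` keys, with integers standing in for dates; this mirrors the dplyr and yardstick logic rather than calling those packages.

```python
import math
from collections import defaultdict


def add_horizon(rows):
    """Number forecasts 1, 2, ... within each (id, type) group, ordered by date."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["id"], row["type"])].append(row)
    numbered = []
    for members in groups.values():
        for h, row in enumerate(sorted(members, key=lambda x: x["ds"]), start=1):
            numbered.append({**row, "forecast": h})
    return numbered


def rmse_by(rows, keys=("forecast", "type")):
    """Root mean squared error of yhat against y for each combination of keys."""
    squared = defaultdict(list)
    for row in rows:
        squared[tuple(row[k] for k in keys)].append((row["yhat"] - row["y"]) ** 2)
    return {key: math.sqrt(sum(v) / len(v)) for key, v in squared.items()}


rows = [
    {"id": "Slice001", "type": "additive", "ds": 2, "y": 10.0, "yhat": 12.0},
    {"id": "Slice001", "type": "additive", "ds": 1, "y": 10.0, "yhat": 11.0},
    {"id": "Slice002", "type": "additive", "ds": 3, "y": 10.0, "yhat": 7.0},
    {"id": "Slice002", "type": "additive", "ds": 2, "y": 10.0, "yhat": 10.0},
]
print(rmse_by(add_horizon(rows)))
```
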

### [30:00](https://www.youtube.com/watch?v=OIQPIefDxx0&t=1800s) Segment 7 (30:00 - 35:00)

Then we can group by `forecast`, use `slice_min()` on the estimate, and count which type wins: the multiplicative model has less error. Another thing I like to do is try to beat a specific baseline. In classification, the baseline is predicting the class with the highest frequency; in forecasting, it's the naive forecast, where you just predict the previous period's value. We can build that pretty simply from `clean_df`: I create it with `mutate(naive = lag(y, n = 1, order_by = ds))`, select `ds` and `naive`, and left join it to the results by `ds`. This is a monthly naive forecast. Then we can use the mean absolute scaled error (MASE), which is the mean absolute error of your forecast divided by the mean absolute error of a naive forecast. I like doing it this way because it gives a bit more control over what counts as a naive forecast. Right now I'm using a simple naive forecast that predicts the previous month's value. But if we're predicting six months out, say for December, we don't have November's data, so which lagged value the naive forecast uses depends on your setup. Here we're taking the most conservative route by giving the naive forecast the most power, since it's essentially peeking at data it wouldn't have at forecast time.

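
The baseline and the ratio described above fit in a few lines. This Python sketch uses made-up values with integer stand-ins for dates; `lag1_naive` plays the role of `lag(y, n = 1, order_by = ds)`, and `mase` divides the model's MAE by the naive forecast's MAE over the same points, as in the video. Hyndman and Koehler's original MASE instead scales by the in-sample one-step naive MAE on the training data, so the two numbers are not directly comparable.

```python
def lag1_naive(series):
    """Map each date to the previous date's value, like
    mutate(naive = lag(y, n = 1, order_by = ds)); the first date gets None."""
    ordered = sorted(series)
    naive = {ordered[0][0]: None}
    for (_, prev_y), (ds, _y) in zip(ordered, ordered[1:]):
        naive[ds] = prev_y
    return naive


def mase(y, yhat, naive):
    """MAE of the model divided by MAE of the naive forecast on the same points."""
    mae_model = sum(abs(p - a) for p, a in zip(yhat, y)) / len(y)
    mae_naive = sum(abs(n - a) for n, a in zip(naive, y)) / len(y)
    return mae_model / mae_naive


series = [(1, 10.0), (2, 12.0), (3, 11.0), (4, 15.0)]  # (date index, y), made up
naive = lag1_naive(series)
dates, y = [2, 3, 4], [12.0, 11.0, 15.0]
yhat = [12.0, 12.0, 14.0]  # hypothetical model predictions
print(mase(y, yhat, [naive[d] for d in dates]))
```

Values below 1 mean the model beats the naive baseline; here the ratio is 2/7, about 0.29.
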
There are other options; for example, we can group by `id` and use the most recent observed month as the naive value. For now, though, we group by `forecast` and `type` and summarize `mase = mean(abs(yhat - y)) / mean(abs(naive - y))`, which is the mean absolute scaled error. Then we can `pivot_wider(names_from = type, values_from = mase)` and make a comparison plot with `x = additive` and `y = multiplicative`, adding `geom_abline()` as a y = x reference line, `geom_point()`, and `coord_obs_pred()`, so we can see where each model's forecasts do well. Grouping by `type` and taking the median MASE, we see the multiplicative model has a slight edge, which makes sense. We can also define the naive forecast another way: group by `id` and `type`, and set the naive date to the month before the first forecast date, `naive_ds = min(ds) - months(1)`.
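
The more realistic baseline can be sketched the same way. In this Python sketch (hypothetical rows and values), every forecast in a resample is compared against the value observed one month before that resample's first forecast date, so the naive forecast no longer peeks at months it wouldn't have; MASE is then computed per horizon and type and summarized by its median per type. A weaker baseline tends to enlarge the denominator, which is why MASE usually drops when switching to it.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def add_months(d, k):
    """Shift a first-of-month date by k months (k may be negative)."""
    total = d.year * 12 + (d.month - 1) + k
    return date(total // 12, total % 12 + 1, 1)


def anchored_naive(rows, history):
    """Give every row of an (id, type) group the value observed one month before
    the group's first forecast date, i.e. naive_ds = min(ds) - 1 month.
    `history` maps date -> observed y."""
    first_ds = {}
    for row in rows:
        key = (row["id"], row["type"])
        first_ds[key] = min(first_ds.get(key, row["ds"]), row["ds"])
    return [
        {**row, "naive": history[add_months(first_ds[(row["id"], row["type"])], -1)]}
        for row in rows
    ]


def median_mase_by_type(rows):
    """MASE for each (forecast horizon, type), then the median over horizons per type."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["forecast"], row["type"])].append(row)
    per_type = defaultdict(list)
    for (_, model_type), members in groups.items():
        model_err = sum(abs(r["yhat"] - r["y"]) for r in members)
        naive_err = sum(abs(r["naive"] - r["y"]) for r in members)
        per_type[model_type].append(model_err / naive_err)  # equal counts cancel
    return {model_type: median(v) for model_type, v in per_type.items()}


history = {date(2000, 1, 1): 100.0, date(2000, 2, 1): 104.0, date(2000, 3, 1): 108.0}
rows = [
    {"id": "Slice001", "type": "multiplicative", "ds": date(2000, 2, 1),
     "forecast": 1, "y": 104.0, "yhat": 103.0},
    {"id": "Slice001", "type": "multiplicative", "ds": date(2000, 3, 1),
     "forecast": 2, "y": 108.0, "yhat": 106.0},
]
print(median_mase_by_type(anchored_naive(rows, history)))
```
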

### [35:00](https://www.youtube.com/watch?v=OIQPIefDxx0&t=2100s) Segment 8 (35:00 - 38:00)

If we arrange by `id` and `ds`, we can see the month being forecast next to its naive date. We then join the naive values onto the naive date: instead of joining by `ds`, we join by `c("naive_ds" = "ds")`. (The join failed at first because I forgot to `ungroup()`.) That gives us a more realistic naive value, and then we do the same thing as before: group by `forecast` and compute the MASE. It makes sense that the mean absolute scaled error improves a bit, because we're now using a different, more realistic naive baseline. Now let's look at our final models and see how they compare. I'll go back up and grab the Prophet plot: this is our original forecast, and this is the multiplicative model, which had slightly better performance. It adds a few more changepoints, but it looks like it fits a little better. That's basically it for this video. I know we had some problems with the joins and with the frequency of the periods, but this is a pretty straightforward approach to most time series forecasting problems. You'll run into more complexity when you add exogenous variables or specific holiday effects, but for assessing a basic forecast, Prophet has a lot of cool and useful tools, and tidymodels handles the fancier resampling methods for time series. It's good to have these in your arsenal whether or not you do a lot of forecasting.

With that being said, I'll see you next time, and tidy on.

---
*Source: https://ekstraktznaniy.ru/video/44693*