In this video, I present Toto 2. In the world of observability and DevOps, we are constantly dealing with telemetry: huge streams of data coming from servers, applications, databases, networks, and infrastructure. Things like CPU usage, memory consumption, request latency, error rates, network traffic, and so much more. All of this is called time series data: measurements recorded at regular time intervals. Being able to accurately predict what these metrics will do in the next few minutes or hours is incredibly valuable.
Today, I'm exploring Toto 2 from Datadog. Toto stands for Time Series Optimized Transformer for Observability. It's a new family of open-source time series foundation models designed specifically for this kind of observability telemetry. We are going to install this model and build a real-time telemetry dashboard, just like you would use in Datadog or any other such tool. Before I go into that, allow me just one minute to talk about what exactly this model is made of. There are three key ideas here: contiguous patch masking, multivariate forecasting, and a quantile output head. These are the three things we'll be talking about.
I will kick off the installation. I'm using this Ubuntu system with one GPU card, an Nvidia RTX A6000 with 48 GB of VRAM. It's a very small model, and they have even smaller models than this one, so you can even run it on CPU, including any consumer CPU. If you're looking to rent a GPU at a very good price, you can find the link to Vast Compute in the video's description, with a coupon code for a 50% discount on a range of GPUs. And now let's install all the prerequisites, which is going to take a few minutes.
While that happens, let's talk about the architecture and the concepts I referred to earlier. Toto 2 has three smart design choices that make it quite powerful for observability data. First, it uses contiguous patch masking, or CPM.
The model breaks the time series into small patches and can predict an entire future window at once, in a single forward pass, instead of predicting one step at a time. Second, it supports multivariate forecasting, which means it can analyze multiple related metrics together, like CPU usage, memory, and network traffic, and learn how they influence each other. Third, instead of giving just one predicted value, it has a quantile output head that produces a full range of possible outcomes, so you get uncertainty bands showing, for example, the 10th to 90th percentiles. And that is all there is to it.
In terms of comparison with other models, this diagram shows how Toto 2 performs on both BOOM, which is an observability benchmark, and GIFT-Eval, which is a general time series benchmark. As the model size increases from 4 million to 2.5 billion parameters, Toto 2's rank gets better and better, placing it among the top models overall, especially the larger versions.
Now that everything is installed, let's test it out. First up, I'm going to download the model with this simple Python script, which goes to Hugging Face and gets the model. I think it is just over 1 GB in size. The model is downloaded, and you can see that it is showing us the parameter count plus the patch size. The patch size means that the model doesn't look at your time series one number at a time. Instead, it breaks the data into small chunks, patches of 32 time steps each. So every patch contains 32 consecutive values, for example 32 minutes of CPU data if you sample once per minute.
Okay, so now the model is downloaded. For the first test, I have written this code that creates some fake data with a trend and daily seasonality, which is common in real telemetry. We give the model the last 512 points as context and ask it to forecast the next 96 steps.
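The synthetic test setup described above can be sketched roughly as follows. This is not the script from the video: the trend slope, amplitude, noise level, and the choice of 96 samples per day are all assumptions made for illustration, and the helper names are hypothetical. Only the 512-point context, the 96-step horizon, and the 32-step chunk size come from the video.

```python
import math
import random

CONTEXT_LEN = 512   # history points given to the model (from the video)
HORIZON = 96        # steps to forecast (from the video)
PATCH_LEN = 32      # chunk size reported when the model loaded (from the video)
STEPS_PER_DAY = 96  # assumption: one sample every 15 minutes


def make_series(n, seed=0):
    """Fake telemetry: baseline + linear trend + daily sine wave + Gaussian noise."""
    rng = random.Random(seed)
    series = []
    for t in range(n):
        trend = 0.01 * t
        daily = 5.0 * math.sin(2 * math.pi * t / STEPS_PER_DAY)
        noise = rng.gauss(0.0, 0.5)
        series.append(50.0 + trend + daily + noise)
    return series


def split_context_target(series, context_len=CONTEXT_LEN, horizon=HORIZON):
    """Hold out the last `horizon` points; the `context_len` points before them are the input."""
    context = series[-(context_len + horizon):-horizon]
    target = series[-horizon:]
    return context, target


def to_patches(values, patch_len=PATCH_LEN):
    """Cut a series into consecutive fixed-size chunks, dropping any trailing remainder."""
    usable = len(values) - len(values) % patch_len
    return [values[i:i + patch_len] for i in range(0, usable, patch_len)]


series = make_series(1000)
context, target = split_context_target(series)
patches = to_patches(context)
print(len(context), len(target), len(patches))
```

With these numbers, the 512-point context splits into 16 chunks of 32 values each, and the held-out 96 points are what a forecast would be compared against.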
Finally, we plot the last part of the history plus the median forecast with uncertainty bands. Let me now run this in front of you
Segment 2 (05:00 - 10:00)
and it is just running at the moment. I will quickly show you the VRAM consumption while it runs. So, you see, this is the VRAM: it was consuming just under 2 GB. You can easily run it on CPU. And this is the plot it has generated. If you look at this plot, the black line on the left is the recent history plus 96 steps of our synthetic data. The blue line shows the model's predicted median, its central forecast for the next 96 steps. The light blue shaded area represents the 80% prediction interval, which means the model expects the actual values to fall inside this band about 80% of the time. Now you can go ahead and create your own Datadog or Dynatrace or Sumo Logic or whatever observability tool you like. I'm just kidding.
Let's check out another example. For this next test, I'm going to use this Toto model for multivariate forecasting on actual system telemetry. Instead of using fake data, I'm going to collect four real metrics at the same time from my own machine: CPU usage, memory usage, GPU usage, and network traffic. Then I will feed all four metrics together to the model, which is called multivariate forecasting, and ask Toto 2 to predict the next 96 steps, roughly 1.6 minutes if collected every second. I have already started this, as you can see here. It is going to take about 8.5 minutes (512 samples at one per second), because it is collecting the data from my system. So let's wait for it to come back. Okay, it's already almost there. Meanwhile, you can follow me on X if you're looking for AI updates, and consider becoming a member if you want to support the channel. Okay, so let's wait to see what it does here. And there you go, the model has come back with a response. If you look through it, the code worked correctly and the model has done wonderfully well.
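The collection step described above can be sketched as a simple polling loop. This is only an illustration, not the script from the video: the `collect` and `duration_minutes` helpers are hypothetical, and the readers below return made-up numbers so the sketch runs anywhere. On a real machine they could be swapped for calls such as `psutil.cpu_percent()`, which is an assumption about tooling rather than something shown in the video.

```python
import time


def collect(samplers, steps, interval_s=1.0, sleep=time.sleep):
    """Read every sampler once per step and return {name: [values]}."""
    history = {name: [] for name in samplers}
    for step in range(steps):
        for name, read in samplers.items():
            history[name].append(float(read()))
        if step < steps - 1:  # no need to wait after the final sample
            sleep(interval_s)
    return history


def duration_minutes(steps, interval_s):
    """Wall-clock time covered by `steps` samples taken every `interval_s` seconds."""
    return steps * interval_s / 60.0


# Hypothetical stand-in readers that return made-up values.
counter = iter(range(10_000))
samplers = {
    "cpu": lambda: 10.0 + next(counter) % 5,
    "memory": lambda: 40.0,
    "gpu": lambda: 0.0,
    "network": lambda: 1.5,
}

# interval_s=0.0 only keeps this demo instant; the video samples once per second.
history = collect(samplers, steps=8, interval_s=0.0)
variates = [history[name] for name in samplers]  # one row per metric

print(len(variates), len(variates[0]))
print(round(duration_minutes(512, 1.0), 1), "minutes to collect 512 samples")
print(round(duration_minutes(96, 1.0), 1), "minutes covered by a 96-step horizon")
```

Keeping one row per metric gives a variables-by-time layout, which is the natural shape for feeding several metrics to a multivariate forecaster together. The two duration lines make the arithmetic explicit: at one sample per second, 512 steps take about 8.5 minutes to collect and 96 forecast steps cover about 1.6 minutes.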
It successfully collected 512 steps of four real metrics as multivariate input: CPU, memory, GPU, and network, as you can see. From there, it not only ran error-free but also produced a forecast of this shape, as you can see in the output. This is correct: nine quantiles, four variables, next 96 steps. And the plot was generated fine. These are the real system metrics. One thing I'm noticing is that the predicted values are much smoother and less variable than the actual history. There is still some variation, but I think the system is pretty much idle. If you give it very noisy, high-frequency data like raw CPU, it might do a bit more in terms of these fluctuations. Okay, so that is good.
Now, in this final example, I'm going to build a dashboard that collects system telemetry in real time. Let me go back and run the script, which is going to launch my Streamlit dashboard locally. It is going to collect all the GPU, memory, and network usage after loading the model. Let's wait. It is going to load the model, hopefully shortly, and everything will be shown in the browser. So it is loading the model. It's a small model, but it still takes a bit of time, a few seconds I would say. There you go, the model is loaded. Let's start the live dashboard. Right now, in another terminal, I'm running a script to put some load on the CPU, so you can see that the CPU load is quite high. There is some network traffic going on because I'm accessing the machine remotely. Some memory consumption is there, and this is the graph. On the left-hand side, you can see I'm refreshing it every 8 seconds. You can of course increase or decrease that. You can also reduce the forecast horizon; you just have to run the live dashboard again. There you go. See? So this is Toto 2 running live on real observability-style data. Pretty cool to see it working end-to-end on our own systems.
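To make the live loop and the quantile output a little more concrete, here is a minimal sketch of one metric's rolling window plus the step of picking a median and a band out of a nine-quantile forecast. Everything model-related here is a stand-in: `naive_quantile_forecast` is a hypothetical placeholder that just repeats the last value, not Toto 2, and the quantile levels 0.1 through 0.9 are an assumption about what the nine quantiles are.

```python
from collections import deque

CONTEXT = 512  # history points kept per metric (from the video)
HORIZON = 96   # forecast steps (from the video)
# Assumed levels for the nine quantiles; the video does not list them.
QUANTILES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def naive_quantile_forecast(history, horizon=HORIZON):
    """Placeholder for the model call: returns values indexed as [quantile][step].

    Repeats the last observed value and spreads the quantiles around it
    in proportion to the range of the recent history.
    """
    last = history[-1]
    spread = (max(history) - min(history)) or 1.0
    return [[last + (q - 0.5) * spread for _ in range(horizon)] for q in QUANTILES]


def median_and_band(forecast, lo=0.1, hi=0.9):
    """Pick the median line and the lower/upper edges of the band to shade."""
    median = forecast[QUANTILES.index(0.5)]
    lower = forecast[QUANTILES.index(lo)]
    upper = forecast[QUANTILES.index(hi)]
    return median, lower, upper


# Rolling window: once full, each new sample pushes the oldest one out.
buffer = deque(maxlen=CONTEXT)
for t in range(600):
    buffer.append(float(t % 50))  # made-up readings standing in for a live metric

forecast = naive_quantile_forecast(list(buffer))
median, lower, upper = median_and_band(forecast)
print(len(buffer), len(forecast), len(median))
```

A dashboard would repeat the last three lines on every refresh and redraw the median with the band shaded between `lower` and `upper`. With four metrics, the same indexing applies per variable, giving the nine by four by 96 shape mentioned in the multivariate test.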
So, as I said, you can just build your own telemetry tool. Just visualize it, use graphs and whatever you want, do some forecasting, and have fun. Let me know what you think about this model.
Segment 3 (10:00 - 10:00)
Again, please follow me on X if you are looking for AI updates. Thank you for all the support.