💻️ Need some help with a project or some consulting? Contact me here: https://www.neuralnine.com/services
🐍 The Python Bible Book: https://www.neuralnine.com/books/
💻 The Algorithm Bible Book: https://www.neuralnine.com/books/
Оглавление (4 сегментов)
Segment 1 (00:00 - 05:00)
What is going on, guys? Welcome back. In this video today, we're going to take a look at ONNX, which is Open Neural Network Exchange, basically a PDF-like format for machine learning models, or to be precise, for neural networks. So, the idea is we train a model in PyTorch, TensorFlow, we download a model from Hugging Face, and instead of needing all these dependencies to actually use the model, we can just export it to the ONNX format, and we can serve it using the ONNX runtime or something similar. And that is basically exactly what we're going to do in this video today. I'm going to train a simple PyTorch model, a simple TensorFlow model, we're going to export them into the same format, and we're going to see that the code for the inference is the same. It doesn't need to PyTorch, it doesn't need TensorFlow. I'm also going to show it with a more complex model, an MNIST classifier that is actually being trained, and then we can use it. And I'm also going to show you how we can pull Hugging Face models in this format already, so we can serve them without needing all the dependencies. I think this is super interesting, so if you like this video, let me know by hitting the like button and subscribing, but now let us get right into it. All right. So, as always, let us jump right into it. Here we have the Wikipedia page of the ONNX format, which is, as I mentioned in the introduction, a little bit like a PDF format for models, for neural networks. So, we can train them in PyTorch, we can export them from PyTorch or from TensorFlow, it doesn't matter. At the end of the day, they're running on an ONNX runtime or something similar, and we can use them with the same code, no matter what they have been trained in, and we don't need all the dependencies. Originally, this was developed by Facebook and Microsoft, but it's open, so it's not uh in any way proprietary or corporate at this point. And now what we're going to do is we're going to navigate to a directory that we want to be working in. In my case, that's going to be the tutorial directory, and we're going to install some packages. Now, you can do that with your ordinary Python installation. You can just say pip or pip three install, and then the packages. I don't like to do that. I like to work in isolated environments, and I like the Rust-based package manager UV. So, if you want to use that as well, you can install UV and run UV init. Otherwise, don't feel the need to use UV. This is just my way of working. So, you can say pip install. I'm going to say UV add, which has the same effect. So, UV init to create a project here, and then UV add, which is your pip install. And then, what we're going to need is PyTorch, because the first example that I want to show you is how we can build a very, very simple neural network in PyTorch, export it to this ONNX format, and then use it using the ONNX runtime, which does not depend on PyTorch. So, I'm going to say UV add torch. Then, I'm also going to install the ONNX runtime. I'm going to install also ONNX and ONNX script. So, these are the first dependencies that we're going to install. Later on, we're also going to do the same thing with TensorFlow, but for now, these are the packages that we're going to add here. And then, I'm going to give you a very, very minimal example of how we can do this. So, I'm going to open up a script called main. py, and we're going to start by importing torch. Now, the example I want to use here, let me zoom in a little bit. The example I want to use here is a very, very trivial one. I'm going to hardcode the functionality of this neural network. We're not going to have a training loop, no evaluation, no dataset. I'm just going to have a very simple forward function that adds one to input. So, it's going to take an input, plus one, output. That's it. The reason is I just want to show you as a proof of concept that this works, and then we're going to do a real example where we train a model to do MNIST data classification. So, basically, handwritten digits. But for now, let's keep it simple. I'm going to say class, and then at one. I'm going to inherit from torch. nn. Module, and it's just going to have a forward function. So, forward is going to be taking an input, so x, and of course, we need the self keyword here. And then, what we do is we just return x + 1. That is the entire functionality of this module. So, then I can say model is equal to at one. I can instantiate it. I can put it to eval mode, and that is basically my model. And now, the only thing that we actually need to do here now is we need to export it into ONNX, into the format ONNX with a simple function call. However, in PyTorch, this requires also some dummy data, because in TensorFlow, we can infer the computation graph, so all the information that we need from the metadata. In PyTorch, we do this by passing some sample data to infer the computation graph. So, what I'm going to do here is I'm going to say X is torch tensor, and we're going to provide some sample data here, which is just going to be one, two, and three as floats. That's it. And that is the dummy data that we're going to feed through the network, so it gets the computation graph. In this case, very trivial, but we still need to do that. And the actual function that we need here is the ONNX export function. So, torch. onnx. export. The first thing that we pass here is the model. The second thing we pass is the dummy data. So, I'm going to pass this as a tuple, X {comma}. Then, we can choose a name. Let's call this add1. onnx. Oops. And then, we choose input names, specifically to input, because these things, the metadata, can change a little bit
Segment 2 (05:00 - 10:00)
depending on what you use. The default values can be different. So, if you want to really keep the code exactly the same, you need to have the same input and output names for all the export systems, TensorFlow, PyTorch, whatever you're using. But, besides that, it's intertain- interchangeable. So, we also have output names. And then, in order to use the modern exporter, we need to provide dynamo or dynamo is equal to true. That is the entire thing that we need to export an ONNX file from a simple PyTorch module. So, we can close this. I can say UV run. In your case, if you're not using UV, you say python3 or python, and then main. py. And there you go, you can see obtain model graph. It passes the stuff through the network, and there you go, we have the output, add1. onnx. So, that's all we need to do to create the model in this format. Now, let's open up a second file, let's call it inference. py. And here we're going to actually use this model. So, we're actually going to use the ONNX runtime to do inference using this file. For this, we're going to start by saying import numpy as np and import ONNX runtime as ort. Then we're going to say session is equal to ort inference session. And here we need to provide now the name of the model in ONNX format, so the file basically. And now for the input and output name, you can either hardcode it or you can get it from the actual uh from the actual information, from the actual metadata, by saying session. get_inputs. And then you can say index zero. name. This is going to give you the input name, and the output name is basically session. get_outputs. Then zero. name. And now to test this, we can create a numpy array with some data that we want to see if it works. So, in this case, we wanted to add one to all the values. So, we pass, let's say 10, let's say 15, and let's say 23. And we expect all of these to be increased by one by passing them through the neural network. And the actual inference now is done by saying output is equal to session. run. Then we pass here in a list the output name, and as a dictionary we pass the input name as key pointing to the input, which is X. And then this entire object here we get the first uh element, so index zero. And then we get print that the inputs was X, and the output is now the output. So, that's basically what we do. We load the file, the ONNX format file. We just get the name, so we target the correct fields. Again, you can hardcode this if you want to, but then you might have problems with TensorFlow. It's not interchangeable. In this case, you're just reading it from the file, and that's it. You don't need to anything else. But, what we do here is we load the file, we get some data, we run it through the network, we get the output. So, actually everything is happening here and here. Or here, if we're honest. This is the inference. So, let's close this, and let's say you UV run inference. py. Uh and it says I have some duplicate name of the input. Oh, sorry. We need to go into the training script again and make sure this is output, not input. Otherwise, we have a duplicate. So, let's run first of all the training script again. UV run main. py. This is going to export the model again. And also in our inference. py, we need to make sure that we pass the proper data type. So, that's going to be NP float 32. Once this is done, we can go and run the inference again. And we can see we get the input, and the output is just everything plus one. So, there's no PyTorch here, no TensorFlow, nothing. This is just the ONNX runtime. I can copy the ONNX file, and I can install just the ONNX dependencies, no PyTorch, and it will work. So, now I want to show you that the exact same thing works if I export the model from TensorFlow. I can have the same functionality written in TensorFlow, and you can imagine this would be a training process, and it will have the exact same effect. I can also use the inference. py script. Nothing's going to change. I can do the same thing. For this, let's go and say UV add TensorFlow and also TF2ONNX, which is what we need for the export, because TensorFlow doesn't have it by default in the main package. So, this is installed. Now, let's go and call this main2. py. And we're going to import TensorFlow as TF, and we're also going to import TF2ONNX. And basically, same idea. We have a class called add_one. This is going to inherit from TF module. Then we're going to have here the decorator TF function. The input signature, we define it here explicitly, is going to be a list TF tensor spec. We pass here none and none. TF float 32 is a data type and the name here is also input. So, the naming is the same. Then, the actual functionality is going to be just a dunder method call. So, when we call what we're going to do here is we're going to add plus one. And of course, here we have uh X as a parameter. So
Segment 3 (10:00 - 15:00)
we're just going to return X plus one. All right. And then we say model is equal to add one. We instantiate it. And then we say ONNX model and a placeholder because we don't need the value that is returned, the second one. And we just say TF2ONNX. convert. from_function. Then we can pass model_call. We can copy the input signature here. So, that is up until here. Copy that. Paste it here also as the same argument. And the output path is add one TF. ONNX. So, we can run this now too. And everything worked. As you can see, we also have now the add one TFONNX. We also didn't need dummy data to infer the computation graph. This is just because we have enough uh specification and metadata here in TensorFlow. And now we can go ahead into the inference script. Keep everything the same. The only thing we're going to change is the file that we're using. So, add one TF. And then we're going to do inference. py. There you go. Same functionality. So, we have one model trained in PyTorch, one in TensorFlow. I mean, in this case not trained but defined. And we can use both of them just with the ONNX runtime because they have this interchangeable PDF-like format. Now, I want to show you that this also works with a proper example. For this, we're going to add another package called TorchVision just so we can get the MNIST data set. And I'm also not going to write the whole code here from scratch because I don't want to make this a PyTorch tutorial, but let's say this is going to be MNIST example. py and I'm going to copy-paste my prepared code here. All of this is super basic if you know any PyTorch. There's nothing special here. We just have a basic PyTorch module. We have a simple sequential neural network, flatten layer, linear layer, then we have a ReLU activation function, linear layer with 10 outputs. We have a simple forward call or forward pass. We have a training loop, optimizer, loss function, nothing fancy. We just train a model five epochs to classify handwritten digits using the MNIST data set. So, we you see that this is loaded here. And that is everything we do. Nothing fancy. But, the final thing here again is we have dummy data. We provide a random 28 by 28 pixels because that is the format of the MNIST images. We provide some random noise here. We feed it through the network and we export as ONNX. That is the only thing that is new in this video. The rest is just basic PyTorch stuff. So, we can also run this now. say UV run MNIST example. py. This downloads the data set and will start training in a second. There you go. We can see how the epochs progress and everything is now exported. We have the MNIST MLP, multi-layer perceptron, in ONNX format. And now we can say inference MNIST. py. I'm also going to copy-paste this one here. Basically, all we do here is the same. We do an inference session on the MNIST. We load the test data set here again just so we have actual data. I don't want to feed some random noise in. I will just use the test data set. We're going to get the image data, the 28 * 28 pixels, and then we're going to make a prediction in ONNX. So, using the ONNX runtime, the only torch dependency here is the data set. Nothing else is running on PyTorch. We're going to get a prediction and then we're going to see the ground truth, the actual label, and the prediction. Hopefully, it's correct. So, UV run inference MNIST and you can see seven and seven, which means seven is the actual digit and our neural network predicted seven as well. Now, the final thing I want to show you is how we can pull models from hugging face in this format. For this, of course, you need to find a repo. model that offers this format. You can just search there. And once you have it, we can install it with hugging face hub. So, I can say UV at hugging face {dash} hub. And then we can use the download command, so the HF command. Uh in this case, I have to preface it with UV run. So, I have to basically say UV run HF download because that is only installed in this particular project here. But the basic idea is UV run HF download, then the path to the model, which has to be in ONNX format. You say include only this specific one, which is a very optimized uh model, quantized and everything, for CPU and mobile. And it has this uh ONNX structure uh or format. And then we load this or we download this into the local directory, which we're going to call 534ONNX. This then starts to download, as you can see. And once it's downloaded, we can see the directory here, the 5 directory in my case. And for this now, we need to actually install a different ONNX package, which is ONNX and genAI. Or to be precise, ONNX runtime genAI. So, UV add ONNX runtime {dash} genAI. And then we can open up another Python script. Let's call it genAI example. py.
Segment 4 (15:00 - 18:00)
And we're going to say import ONNX runtime genAI as OG. And we're also going to use the JSON package because we need to adhere to a certain uh chat template. But essentially, we can just say model is equal to OG model and provide the actual path. And path here means on our system, so 535 on an X CPU in mobile and then CPU in four whatever. Now, the tokenizer is going to also be part of the model, so tokenizer is going to be OG. tokenizer from the model. Then we can do a basic message. So, we can say here we have a history. I'm going to have a dictionary. Role is going to be the user and the user will ask the question, "What is 2 + 2? " Answer only with the number. Now, then we need to turn this actually into a prompt that adheres to the chat template. So, I'm going to say prompt is equal to tokenizer. apply_chat_template onto message. Or since our message is a Python object, we need to turn it into a string. That's why we use the JSON package. So, json. dumps message and then at generation prompt is equal to true. So, then to turn the prompt into actual tokens, we need to say input tokens is equal to tokenizer. encode the prompt. Then we define the generator parameters by saying OG. generator_params for the model. And then the generator itself will be the OG. generator given the model and the params. And then we can append our tokens. So, generator. append_tokens and our input tokens, so our prompt basically. And then we can just generate output tokens. So, while our generator, while not generator. is_done, we're going to keep generating the next token and we're going to extend the output tokens list by that. By generator. get_next_tokens. And finally, once this is done, we're going to just print the output tokens by saying and then we're going to strip everything from the string. And that's going to be it. So, let's see if this works or if I forgot something, but that is the gen AI example, so I can just run this. Okay, and of course, this crashed my PC because I forgot a very important line. I now added it. I'm going to show it to you. Uh this line here, params. set_search_options_max_length and then length input tokens plus 20, limits the output length, the sequence length that the model can use. And without this, it basically blows up my RAM or I don't have VRAM on this computer. It crashes my system, so this is very important. If you don't add this, you might run into problems. So, with this now, hopefully it won't crash again. genai_example. py, let's run this. I can still move my mouse, and there you go. Four. 2 + 2 is 4. So, that is also how you can work with ONNX formats on Hugging Face. So, that's it for this video today. I hope you enjoyed it and hope you learned something. If so, let me know by hitting a like button and leave your comment in the comment section down below. Also, in case you're interested, on my website you'll find a services tab and a tutoring tab. There you can contact me if you need help with a project, if you need a freelancer, consultant, contact me via LinkedIn or email at the bottom of the pages. Besides that, don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching. See you in the next video, and bye.