Protobuf vs JSON Performance, Size & Comparison (2026)




Table of Contents (3 segments)

Segment 1 (00:00 - 05:00)

In this video, we'll compare Protobuf vs. JSON. We'll use small, medium, and large payloads of each and compare how fast each can be serialized and deserialized, how many operations per second each format can sustain, and finally the size of the serialized object.

The primary difference between the two is that JSON uses key-value pairs and nested structures to define attributes and their associated data. If you need to send the data over the network to a client, every single time you must send not only the values you care about, like a name, an address, and maybe a phone number, but also the keys for each piece of data. Depending on the JSON, those keys can be a significant part of the overall size.

Let's say we want to make it more efficient. The obvious way is to decouple the schema from the data and send only the data over the network. But now the client needs that schema in order to decode our message. So by decoupling the schema from the data, we reduce the payload size but make it a little harder to work with and debug. Reducing the data size also reduces the time needed to serialize it and the time it takes to transmit it to the client. For example, the standard MTU is around 1500 bytes, which is the number of bytes you can send from server to client in one packet. Because Protobuf messages are smaller, you can fit more of the same data into each packet and increase overall throughput.

Protobuf is also widely adopted, and from a single Protobuf message definition you can generate code stubs for any language you want. For example, if you have a Rust server and a Python client, you can generate both SDKs and easily integrate your applications. It's actually very easy to work with Protobuf, and in the next video I'll cover and compare gRPC vs. REST.

Another cool feature of Protobuf is that you can start with one message definition, evolve it over time by adding more fields, and still maintain backward and forward compatibility. This means that old clients generated from the old schema will still be able to read the original fields, while new clients generated from the new schema will be able to read both old and new messages.

Now, to be on the same page: when you serialize data, you convert an in-memory data structure, such as a struct or a class, into a format that can be stored in a file or transmitted over the network to another server. When you deserialize, you read those bytes from the file or the network and reconstruct the object in memory so that your program can use it.

Let me give you a simple example of how Protobuf encoding works. First, we need to define a schema. Let's keep it very simple and use a string for the symbol and an integer for the price. Say we have Apple stock, which is $280 per share right now. We need a tag that indicates the proto field, such as symbol, and separates the values in the message. A tag is a unique integer identifier assigned to each field so the parser knows where each field starts and how to extract its value. The tag combines the field number with the wire type; together they play roughly the same role as a JSON key but take far less space. The primary wire types are 0 (varint), 1 (64-bit), 2 (length-delimited), and 5 (32-bit). In our example, the first field, symbol, is a string, which corresponds to wire type 2.

To encode the tag, the generic formula is (field_number << 3) | wire_type. First we shift the field_number 3 bits to the left, then we combine the shifted field_number with the wire_type by setting the lowest 3 bits to the wire_type value. Let's go over a simple example. The field number 1 in binary is 0001; shifting it left by 3 bits gives 1000, which is 8. This reserves the bottom 3 bits for the wire_type, while the higher bits hold the field_number. Next, we perform a bitwise OR operation, which returns 1 if at least one of the corresponding bits is 1 and 0 otherwise. The wire type 2 in binary is 010, so ORing the two values gives 1010. The result, 10 (0x0A), is a single integer, the tag, that gets encoded as a varint in the binary stream.
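The tag formula above can be checked in a few lines of plain bit arithmetic; no Protobuf library is involved. The field numbers match the example schema from the video (symbol = 1, price = 2):

```python
# Encode a Protobuf field tag: (field_number << 3) | wire_type.
# Field 1 ("symbol") is a string, so it uses wire type 2 (length-delimited).
field_number, wire_type = 1, 2

shifted = field_number << 3   # 0b0001 -> 0b1000 = 8; bottom 3 bits now free
tag = shifted | wire_type     # 0b1000 | 0b0010 = 0b1010 = 10 (0x0A)

print(shifted)  # 8
print(tag)      # 10

# Field 2 ("price") is an integer, so it uses wire type 0 (varint):
price_tag = (2 << 3) | 0      # 16 (0x10)
print(price_tag)  # 16
```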

Segment 2 (05:00 - 10:00)

Next, we need to concatenate the tag plus payload for each set field. Let's encode AAPL in UTF-8 and combine everything: the tag, which is 10 (0x0A), then the length of the string, which is 4 in this case, and then the encoded AAPL bytes. We use hex to represent it instead of binary simply because it's shorter and easier to read. Next, let's encode the second field, the price. The field number is 2, we shift it left 3 bits, and the wire type is 0. After performing the bitwise OR operation, we get 16 (0x10). Then we encode the value 280 itself, which as a varint takes two bytes. When we combine both fields, we get a message of only 9 bytes. Let's encode the same JSON message in hex as well: it comes out to 29 bytes. So you can see that, depending on the schema and the length of the JSON keys, the difference in size can be huge.

This video is sponsored by TestSprite. Whether you're writing code on your own or with AI, this tool can be very helpful for finding bugs in your code, especially if much of it was generated. You can start for free and get 150 credits, which is enough to try it out. Let's go over a simple example, and I'll show you how to use it and what kinds of bugs it can find. Let's use Cursor and generate a very simple Golang application with a single users endpoint that you can use to create, list, and delete users. In a few seconds, we'll get a fully working example, but that does not mean it is immediately production-ready. Now, to run the tests locally, we first need to install TestSprite. First, we generate a key; let's give it a test name and copy the key. Next, you can install it by clicking "Add to Cursor." You can add your API key here during the installation as well. And that's all; you can close the settings. Now, to start testing, you can just ask TestSprite to run the tests.
When you run it for the first time, you'll need to configure it as well. You can test the backend or frontend, select the scope, and add the same API key. Finally, you need to upload the docs for your project: what features you have, and so on. In my case, I'll just upload the README that was generated in the first place. Now TestSprite will generate tests and run them. In a few minutes, you'll get test results and some findings. For example, tests like "get users," "get user by ID," and "delete user" passed, but there are some issues as well: there is no email validation, and since we're using an in-memory store, it detected that all data is wiped out when you restart the app. So the key finding to pay attention to is email validation; for now, our app would accept any email string. You can also find the test report on the TestSprite website. Alright, let's fix the email validation first. You can see that this fixes the test case we had for email validation. If you want, you can ask TestSprite to rerun the tests to make sure the bug is gone.

To run the benchmark, I created a simple application in C++ that uses Protobuf as well as the fastest JSON library. It supports writes and reads. As always, I run this benchmark, like all the others, on AWS using the same instance types I would use in real production environments. Alright, let's run the test. First, let's compare the smallest JSON and Protobuf messages I have. In the first graph, you can see that it takes around 40 nanoseconds to serialize JSON and around 25 for Protobuf: quite a significant difference. In the second graph, when we deserialize bytes, we still see a big gap between the two. Based on this, you can see how many operations per second we can perform with each format: around 6 and 5 million.
And in the last graph, you can see the size of each serialized object: 75 bytes for JSON and only 25 for the Protobuf message. Next, let's compare medium sizes, which better illustrate real-world use cases. We still have a significant difference: around 190 nanoseconds for JSON and 110 for Proto. For the medium size, it takes twice as long to deserialize JSON as Protobuf. We also get more operations per second with Protobuf, and its serialized size is less than half that of JSON. And finally, I decided to test large payloads
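For reference, the nine-byte encoding walked through earlier in this segment can be reproduced by hand. Below is a minimal sketch with a hand-rolled varint encoder in plain Python (no Protobuf library assumed), plus the compact JSON equivalent for the size comparison:

```python
import json

def varint(n: int) -> bytes:
    """Encode a non-negative integer as a Protobuf varint:
    7 bits per byte, high bit set on every byte except the last."""
    out = bytearray()
    while True:
        n, low = n >> 7, n & 0x7F
        out.append(low | (0x80 if n else 0))
        if not n:
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)

# Field 1: symbol = "AAPL" (wire type 2: tag, length, UTF-8 bytes)
symbol = tag(1, 2) + varint(len(b"AAPL")) + b"AAPL"
# Field 2: price = 280 (wire type 0: tag, then the value as a varint)
price = tag(2, 0) + varint(280)
message = symbol + price

print(message.hex())  # 0a044141504c109802
print(len(message))   # 9 bytes

# The same record as compact JSON:
payload = json.dumps({"symbol": "AAPL", "price": 280}, separators=(",", ":"))
print(len(payload))   # 29 bytes
```

Note how 280 becomes the two varint bytes 0x98 0x02: the low 7 bits (0x18) get the continuation bit set, and the remaining bits (0x02) go in the second byte.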

Segment 3 (10:00 - 10:00)

which you would almost never see in real life but are still interesting to test. I was expecting the difference to be even bigger, but for some reason there is a threshold for Proto messages after which serialization becomes significantly slower, though it still performs better than JSON. In the case of JSON, the slowdown relative to Protobuf grows roughly linearly with payload size. In this test, the messages have a lot of fields that need to be processed, including many string fields. You can find the source code, as well as the JSON and Protobuf examples, in my public GitHub repository. Let me know what I should test next.

Other videos by the author: Anton Putra

