ZeroMQ vs Aeron: Best for Market Data? Performance (Latency & Throughput)

Anton Putra

Segment 1 (00:00 - 05:00)

In this video, we’ll compare ZeroMQ and Aeron in their native environment: ingesting and distributing market data. In trading, latency plays a huge role and can be the deciding factor in whether you make or lose money, so we’ll measure latency, throughput, and saturation. In the second test, we’ll replicate the same setup over the IPC protocol to give you an idea of how much performance improves when all components run on a single server.
In trading, speed is very important. If you can colocate your application near the exchange and receive L2 and L3 data, you ingest that market data and send it to an in-memory order book application that builds and constantly updates the order book. If you receive only L1 data, the top of the book, you ingest just the best bid and offer prices along with trades. In that case, you might distribute that data across strategies, the OMS, the executor, and other parts of the system. If those components run on different servers, you would mostly use UDP multicast to reduce latency.
Here, we have the very popular Aeron framework, which does exactly this. It can use UDP unicast (one-to-one), UDP multicast (one-to-many), and other transports. More importantly, it adds a layer on top of UDP that verifies no messages are lost and all of them are delivered to the client. On the other hand, we have ZeroMQ, which was originally created specifically for market data handling. It supports a lot of transports but has no reliable UDP. Instead, you can use the PGM protocol for reliable multicast, which is similar to what Aeron does with UDP.
Now, if you can fit all your components on a single server, you will get much higher throughput and lower latency with the IPC protocol, as you will see in the second test. That way, you avoid the overhead of the TCP/IP stack, which really improves latency and throughput.
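To make the order book step concrete, here is a minimal C++ sketch of an in-memory book that applies L2 price-level updates and exposes the top of the book (the L1 view). It is not taken from the video's repository: the names `OrderBook`, `Side`, and `apply`, and the choice of integer tick prices, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical names throughout. Prices are integer ticks so they can be
// used as exact map keys.
enum class Side { Bid, Ask };

class OrderBook {
public:
    // Apply one L2 update: set the total quantity resting at a price level.
    // A quantity of zero removes the level.
    void apply(Side side, std::int64_t price, std::int64_t qty) {
        assert(qty >= 0);
        if (side == Side::Bid) {
            update(bids_, price, qty);
        } else {
            update(asks_, price, qty);
        }
    }

    // Top of the book. Each returns false when that side is empty.
    bool best_bid(std::int64_t& price) const { return front(bids_, price); }
    bool best_ask(std::int64_t& price) const { return front(asks_, price); }

private:
    template <class Levels>
    static void update(Levels& levels, std::int64_t price, std::int64_t qty) {
        if (qty == 0) {
            levels.erase(price);
        } else {
            levels[price] = qty;
        }
    }

    template <class Levels>
    static bool front(const Levels& levels, std::int64_t& price) {
        if (levels.empty()) return false;
        price = levels.begin()->first;
        return true;
    }

    // Bids are sorted highest price first, asks lowest price first, so the
    // best level on each side is always begin().
    std::map<std::int64_t, std::int64_t, std::greater<std::int64_t>> bids_;
    std::map<std::int64_t, std::int64_t> asks_;
};
```

Calling `apply(Side::Bid, 10010, 3)` sets the bid level at 10010 ticks to a quantity of 3, and `apply(Side::Bid, 10010, 0)` removes it again. `std::map` keeps the sketch short; a latency-sensitive book may prefer a flatter, more cache-friendly layout.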
For simplicity, in the first test I’ll use Aeron with UDP unicast and ZeroMQ pub/sub over TCP, which is also a 1:1 relationship. Keep in mind that both Aeron over UDP and ZeroMQ over TCP verify delivery of each message, so the comparison is fair. In the second test, both use the IPC protocol. I use C++, and you can find the source code in my public GitHub repository.
ZeroMQ is much simpler to set up: you just have a publisher and a subscriber on different servers. For Aeron with C++, you need to compile and run the aeronmd media driver daemon on each server, and the traffic between applications goes through that component; it’s like a service mesh. SBE (Simple Binary Encoding) is a format for encoding and decoding messages. It’s very common in trading because the messages are very small and very efficient to decode. For example, Binance provides market data over WebSocket in the same SBE format.
I run all my tests on AWS to show the real-world performance you would get in production. I use m7a.large instances for the producers and consumers, and EKS for monitoring.
Alright, let’s go ahead and run the test. I’ll slowly increase the message rate until both applications fail to process any more. From the very start, you can see that the Aeron subscriber uses 50% of the CPU. Since the instance has 2 CPUs, that means a single thread is consuming one whole CPU. This is intentional: it’s called a busy-spin idle strategy, and you usually use it when you need the lowest latency possible. ZeroMQ, on the other hand, slowly increases its CPU usage throughout the test.
You can see that ZeroMQ with regular TCP actually performs better. In trading, latency is the most important metric because you compete with other people and algos; higher latency means more slippage, and you might lose money.
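SBE decodes cheaply because fields sit at fixed offsets, so decoding is a bounds check plus a copy. The sketch below illustrates that idea by hand in C++. The 8-byte header (block length, template ID, schema ID, version) follows the standard SBE message header, but the `BestBidOffer` body, its field names, and the template ID are hypothetical; this is not the message layout used in the test, and real projects generate these codecs from an XML schema with the SBE tool rather than writing them manually. It also assumes a little-endian host, which matches SBE's default byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Standard SBE message header: four uint16 fields, 8 bytes in total.
struct MessageHeader {
    std::uint16_t block_length;  // size of the fixed-length body in bytes
    std::uint16_t template_id;   // identifies the message type
    std::uint16_t schema_id;
    std::uint16_t version;
};

// Hypothetical top-of-book body; prices are integer mantissas.
struct BestBidOffer {
    std::uint64_t timestamp_ns;
    std::int64_t bid_price;
    std::int64_t bid_qty;
    std::int64_t ask_price;
    std::int64_t ask_qty;
};

constexpr std::size_t kHeaderSize = 8;
constexpr std::size_t kBodySize = 40;
constexpr std::size_t kMessageSize = kHeaderSize + kBodySize;
constexpr std::uint16_t kTemplateId = 1;  // hypothetical

static_assert(sizeof(MessageHeader) == kHeaderSize, "unexpected header padding");
static_assert(sizeof(BestBidOffer) == kBodySize, "unexpected body padding");

// Writes header + body into buf. Returns bytes written, or 0 if buf is too small.
// Assumes a little-endian host, so native byte order matches the wire format.
std::size_t encode(const BestBidOffer& bbo, unsigned char* buf, std::size_t capacity) {
    assert(buf != nullptr);
    if (capacity < kMessageSize) return 0;
    const MessageHeader hdr{static_cast<std::uint16_t>(kBodySize), kTemplateId, 1, 0};
    std::memcpy(buf, &hdr, kHeaderSize);
    std::memcpy(buf + kHeaderSize, &bbo, kBodySize);
    return kMessageSize;
}

// Reads a message back. No parsing: validate the header, then copy the body.
bool decode(const unsigned char* buf, std::size_t length, BestBidOffer& out) {
    assert(buf != nullptr);
    if (length < kMessageSize) return false;
    MessageHeader hdr;
    std::memcpy(&hdr, buf, kHeaderSize);
    if (hdr.template_id != kTemplateId || hdr.block_length != kBodySize) return false;
    std::memcpy(&out, buf + kHeaderSize, kBodySize);
    return true;
}
```

The encoded message here is 48 bytes: 8 for the header and 40 for the five 8-byte fields. There is no text parsing and no per-field allocation, which is where the decoding efficiency comes from.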
Aeron has a paid version with kernel bypass, which is actually faster, but I use the open-source version that any of you can use as well. With the open-source version, ZeroMQ actually performs better. I also measure network usage. Each SBE-encoded message is 70 bytes, but for some reason the Aeron subscriber shows higher network usage, maybe due to an abstraction layer

Segment 2 (05:00 - 10:00)

on top of UDP to confirm delivery. You can also notice higher memory usage for Aeron, mostly because you need to run both the application and the aeronmd media driver on each server. Moving forward, you might also notice that Aeron’s latency drops a little as the volume grows. Alright, let me run the test for 1 more minute, and then we’ll go over each graph one by one.
First, we have latency. For some reason, Aeron’s latency is lower once there is enough volume. You can see a spike at the end of the test because I completely remove the delay and each publisher produces as many messages as it can. Next, we have throughput, which is similar for both. Take note of this number: around 900,000 messages per second can be delivered from one server to another. Compared to REST APIs and previous benchmarks, it’s a huge number. It would actually be interesting to compare Aeron with WebSockets. Next, we have CPU usage. Memory usage. And finally, network usage.
So, ZeroMQ actually performed better than Aeron, and it’s much easier to set up and maintain. It’s just a library that you use in your application; that’s something to keep in mind. The latency was around 80 microseconds.
Let’s go ahead and run the second test. I just want to clarify that there are no source code changes for the second test; I only use an environment variable to set the channel for ZeroMQ and Aeron. So it’s identical to the first test, except that we place the subscriber and publisher on the same server. For ZeroMQ, you can see that the latency fell from 80 microseconds to 40, which is a nice improvement, and every microsecond counts in HFT. For Aeron, though, it fell all the way down to nanoseconds, which is a huge improvement with no source code changes. One microsecond is 1,000 nanoseconds, and this is where Aeron shines. I also use exactly the same busy-spin strategy to keep the loop running. Alright, let me run this test for 1 more minute.
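The constant CPU usage seen on the Aeron subscriber comes from how the loop waits between polls. Below is a rough, library-free C++ sketch of a poll loop with two interchangeable idle strategies: one that spins and one that sleeps when a poll comes back empty. `poll_until`, `BusySpinIdle`, and `SleepingIdle` are hypothetical names that only illustrate the idea; they are not Aeron's or ZeroMQ's API, and the 50-microsecond sleep is an arbitrary value.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Busy-spin: never hand the core back to the scheduler, so the next
// poll happens as soon as possible.
struct BusySpinIdle {
    void idle(int work_count) { (void)work_count; }
};

// Sleeping: give up the core for a fixed period whenever a poll was empty.
struct SleepingIdle {
    std::chrono::microseconds period{50};  // arbitrary value for illustration
    std::uint64_t sleeps = 0;

    void idle(int work_count) {
        if (work_count > 0) return;  // there was work, poll again immediately
        ++sleeps;
        std::this_thread::sleep_for(period);
    }
};

// Polls until `target` messages have been handled.
// `poll` is any callable returning how many messages it handled (0 if none).
// Returns the number of polls that found nothing to do.
template <class Poll, class Idle>
std::uint64_t poll_until(Poll&& poll, Idle& idle, std::uint64_t target) {
    assert(target > 0);
    std::uint64_t handled = 0;
    std::uint64_t empty_polls = 0;
    while (handled < target) {
        const int work = poll();
        assert(work >= 0);
        if (work == 0) ++empty_polls;
        handled += static_cast<std::uint64_t>(work);
        idle.idle(work);
    }
    return empty_polls;
}
```

With `BusySpinIdle`, the loop keeps one core fully busy even when no messages arrive, which on a 2-CPU instance shows up as 50% total CPU usage. With `SleepingIdle`, the core is free while idle, at the cost of extra latency whenever a message arrives during a sleep.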
For ZeroMQ, throughput is still around 900,000 messages per second, but for Aeron, it goes up to 20 million messages per second. You could argue that with a queue based on a ring buffer and shared memory, you could perhaps even double the throughput and reduce latency even more. But that would require you to build a monolith and share data between threads. The IPC approach works great if you have standalone applications running on the same server.
First, we have latency. Throughput. CPU usage. Memory usage. Well, as you can see, latency really

Segment 3 (10:00 - 10:00)

matters in trading, and if you can achieve lower latency than others, you have an edge and can make more money. There are FPGA cards as well, which I might cover in the future.
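The in-process alternative raised during the second test, a queue based on a ring buffer shared between threads, could look roughly like the sketch below. It is a minimal single-producer, single-consumer ring in C++, not code from the video and not how Aeron implements IPC internally; the name `SpscRing` and the 64-byte alignment (a common cache-line size) are assumptions.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Single-producer / single-consumer ring buffer: exactly one thread may call
// try_push and exactly one other thread may call try_pop.
template <class T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        assert(tail - head <= Capacity);  // producer is never more than one lap ahead
        if (tail - head == Capacity) return false;
        slots_[tail & (Capacity - 1)] = value;
        // Release: the slot write becomes visible before the new tail does.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = slots_[head & (Capacity - 1)];
        // Release: the slot read completes before the producer may reuse it.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    // Separate cache lines so producer and consumer do not false-share.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

Both indices only ever grow, and masking with `Capacity - 1` maps them onto slots, which is why the capacity must be a power of two. The trade-off is the one described in the second test: both ends have to be threads of the same process, whereas IPC lets standalone applications on the same server talk to each other.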
