Stop Using Sagas for Everything

CodeOpinion 13.05.2026 4,288 views 304 likes

Video description
Are your workflows constantly getting stuck in a "Pending" state? In a distributed system, we often lean on Sagas to coordinate complex processes, but they have a fatal flaw: a Saga is not the source of truth. In this video, I’ll show you why treating a Saga as the ultimate authority leads to data inconsistency, especially when dealing with external API timeouts and "uncertain" failures. Using a common checkout and payment example, we’ll explore why adding more edge-case logic to your Saga isn’t the answer. Instead, the solution is Reconciliation. 🔗 Kurrent https://kurrent.io 🔔 Subscribe: https://www.youtube.com/channel/UC3RKA4vunFAfrfxiJhPEplw?sub_confirmation=1 💥 Join this channel to get access to a private Discord Server and any source code in my videos. 🔥 Join via Patreon https://www.patreon.com/codeopinion ✔️ Join via YouTube https://www.youtube.com/channel/UC3RKA4vunFAfrfxiJhPEplw/join 📝 Blog: https://codeopinion.com 👋 Twitter: https://twitter.com/codeopinion ✨ LinkedIn: https://www.linkedin.com/in/dcomartin/ 📧 Weekly Updates: https://mailchi.mp/63c7a0b3ff38/codeopinion 0:00 Intro 1:12 Saga 3:37 Reconcile

Contents (3 segments)

Intro

So, you built out an elaborate system that has commands, queues, an event-driven architecture, retries, timeouts, and most importantly, compensating actions. But do we really have it all covered? Because we get a call from support that we have this order where the payment's pending, and it's been pending for 48 hours now. So you look into it more and see that the payment provider did charge the customer, but our system shows that the payment didn't go through. So, which is true? Clearly, the payment provider.

Now, the first and obvious answer is, let's just make it smarter so we have different code paths that handle different use cases. We possibly need to add another retry. Or, in my example, there was a timeout: the payment provider timed out and we didn't get a result, but it actually did work. We just need to make it smarter and add more logic.

But this is what it turns into: a mess of logic trying to understand all the different use cases. So we do our capture of our payment, but if there's a timeout, maybe we have to check again or retry later. Or if there's a payment provider exception, what's the reason for it? We're handling all those different cases. Is it a duplicate? Then we have to check the provider, because we know we've already sent this. Or have we, for that exact order? If it's an unknown error, maybe we have to flag something for review. We have to handle all this different logic, these edge cases and flows, in our code. Now, you might think a saga is

Saga

the answer to this problem. And it's not, exactly. A saga is great for coordinating workflow, but it's not the source of truth. It can tell you: what step am I on? What message did I receive? What command should I send next? Back to the checkout flow, a saga is perfect for the workflow. That's where you understand what the workflow is: an order's placed, we reserve inventory, we capture payment, we confirm the order. But there's a giant gap here: the saga is not the point of truth for an external system like our payment provider.

What you really want as a solution in this situation, alongside a saga, is reconciliation. You want to know what should be true, what is actually true, and what corrective or compensating actions you can safely apply. Anytime you're dealing with a network boundary to a third party that you do not control, you're going to have uncertainty. Did the call work? Did it time out but actually work behind the scenes? This is where reconciliation comes in.

I'm going to show how a saga and reconciliation go hand in hand, but first I'd like to thank Kurrent for sponsoring this video. Kurrent's an event-native data platform that feeds real-time business-critical data with historical context in fine-grained streams from origination to destination, enhancing data analytics and AI outcomes. For more on Kurrent, check out the link in the description.

So here's the example of the checkout flow in a saga, and it's incredibly simple, as it should be. First things first: we start off by handling the order placed event. And what are we doing? We're reserving inventory. From there, if inventory is reserved and we're handling that event, then we're going to actually try to capture the payment. What we're also going to do is set a timeout here, a payment capture timeout. Now, we'll get back to that in a second.
Then, if the payment captured event actually occurs, we can mark our saga as complete, and we're also marking the order as confirmed. And if the timeout does happen, because we didn't get the payment captured event back within 15 minutes, then we can send a request for reconciliation.

Now, a couple of things to note. The first is that the timeout doesn't mean anything failed. It just means we stopped waiting. We had the expectation that the payment captured event would occur within a certain amount of time, 15 minutes, and it didn't. So we want to handle it at that point. You'll also notice that I didn't have any other compensating actions about reversing the inventory or anything like that. There weren't any compensating actions for a failure, because we don't know if we have one.
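
The saga walked through above can be sketched roughly as follows. This is a minimal illustration, not the code shown in the video: the `Outbox` class is a hypothetical stand-in for whatever messaging library you use, the command names are guesses based on the spoken description, and moving to a `RECONCILING` state on timeout is an assumption.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from enum import Enum, auto

# 15 minutes is the timeout mentioned in the walkthrough.
PAYMENT_TIMEOUT = timedelta(minutes=15)


class SagaState(Enum):
    STARTED = auto()
    AWAITING_INVENTORY = auto()
    AWAITING_PAYMENT = auto()
    RECONCILING = auto()  # assumption: the saga hands off instead of failing
    COMPLETED = auto()


@dataclass
class Outbox:
    """Hypothetical stand-in for a message bus; it only records requests."""
    sent: list = field(default_factory=list)
    timeouts: list = field(default_factory=list)

    def send(self, command: str, order_id: str) -> None:
        self.sent.append((command, order_id))

    def schedule_timeout(self, name: str, order_id: str, delay: timedelta) -> None:
        self.timeouts.append((name, order_id, delay))


@dataclass
class CheckoutSaga:
    order_id: str
    bus: Outbox
    state: SagaState = SagaState.STARTED

    def on_order_placed(self) -> None:
        self.bus.send("ReserveInventory", self.order_id)
        self.state = SagaState.AWAITING_INVENTORY

    def on_inventory_reserved(self) -> None:
        self.bus.send("CapturePayment", self.order_id)
        # We only decide how long we are willing to wait for an answer.
        self.bus.schedule_timeout("PaymentCaptureTimeout", self.order_id, PAYMENT_TIMEOUT)
        self.state = SagaState.AWAITING_PAYMENT

    def on_payment_captured(self) -> None:
        self.bus.send("ConfirmOrder", self.order_id)
        self.state = SagaState.COMPLETED

    def on_payment_capture_timeout(self) -> None:
        if self.state is not SagaState.AWAITING_PAYMENT:
            return  # the payment was already resolved; nothing to do
        # A timeout is not a failure, so no compensation happens here.
        self.bus.send("RequestPaymentReconciliation", self.order_id)
        self.state = SagaState.RECONCILING
```

Note that the timeout handler neither releases inventory nor cancels the order. It only asks for reconciliation, because at that point the outcome is unknown.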

Reconcile

To visualize that flow, really what was happening in our saga is we send the capture payment, it's received, but we're waiting. The third party, that payment provider, ultimately never responds to us, so we get a timeout. But because we have that timeout, our saga, when it fires after 15 minutes, sends that request for payment reconciliation, which I'll show here in a second, where it can see, "Yes, actually, third party, you did process that payment. I'm using your API." And then I can go and update my order status, because I know, "Yes, payment isn't pending anymore." From the payment provider's perspective, everything worked. This isn't a saga problem. This is a data drift problem about the point of truth, and that's where reconciliation comes in.

So, what does it look like? It's simple, and it should be. When we actually need to reconcile, we get our order out, and really what we do is just check the status: if it's already confirmed or canceled, then we can just exit. But what we really need to do is go to the point of truth, go to the payment provider, and get the actual status. If the payment processor, the payment gateway, confirms that the payment actually was processed, then we can just mark the payment on our order as captured and send off our confirm order. If it actually failed on the other side (again, because it timed out, we didn't know that), then we can just mark the payment as failed and cancel the order.

Just a heads-up, everything in this video is in a blog post. I'll have a link down below.

So, here's the pattern. The first step is knowing that there's a drift. The way we did that was with a timeout. Then it's going to the source of truth, our payment provider, and comparing it against our data to see if there is something different. And if there is, then apply some safe action. And there are so many different ways that you can trigger the reconciliation. In my example, I was just using the saga timeout. That can work great.
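
As a rough sketch of the reconciliation handler described above, under some assumptions: the `Order` shape, the status enums, and the `fetch_provider_status` callable are all hypothetical, and a real handler would call your payment provider's actual API and send a confirm or cancel command rather than set the status directly. The "still unknown" branch is an addition for the case where the provider itself has no definite answer yet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class OrderStatus(Enum):
    PAYMENT_PENDING = "payment_pending"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"


class ProviderStatus(Enum):
    CAPTURED = "captured"
    FAILED = "failed"
    UNKNOWN = "unknown"


@dataclass
class Order:
    order_id: str
    status: OrderStatus = OrderStatus.PAYMENT_PENDING
    payment_captured: Optional[bool] = None  # None means "we don't know yet"


def reconcile_payment(
    order: Order,
    fetch_provider_status: Callable[[str], ProviderStatus],
) -> str:
    """Compare our view of the order with the provider's and apply a safe action."""
    # What we think is true: already settled orders need no work.
    if order.status in (OrderStatus.CONFIRMED, OrderStatus.CANCELLED):
        return "no_action"

    # What is actually true: ask the point of truth.
    actual = fetch_provider_status(order.order_id)

    if actual is ProviderStatus.CAPTURED:
        order.payment_captured = True
        order.status = OrderStatus.CONFIRMED
        return "confirmed"
    if actual is ProviderStatus.FAILED:
        order.payment_captured = False
        order.status = OrderStatus.CANCELLED
        return "cancelled"

    # Assumption: if the provider has no definite answer, change nothing.
    return "still_unknown"
```

Because the handler exits early for settled orders, running it more than once for the same order is harmless, which is what lets any trigger (timeout, button, or poller) call it freely.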
But it might just be something as simple as adding a button, like the "Check payment status" one I have here, that an end user can invoke themselves. Now, polling might not seem like the greatest option, but it might be a good option for your system. In my example here, that could be the trigger. We have a background service that, every 5 minutes, looks at our database for pending payments that are older than, for example, 50 minutes. Then we just iterate through those and call the request payment reconciliation. It really could just be polling.

Reconciliation is not a cleanup job. It's about consistency. I get the sense there's this feeling of, well, I have to run this job, or reconcile with some third-party system because of a timeout, when everything should magically always be consistent. It's not. It's not going to be. There's nothing wrong with doing reconciliation. It's not some cleanup job. Sagas are great at coordinating workflows. Reconciliation is about verifying that what you think the state of the system is actually is the state of the system, because there are so many things that can fail. And trying to shove all these use cases and logic and edge cases into sagas or other parts of your system to handle every possible failure just isn't going to work. You don't need all this different branching logic when something like capture payment gets invoked. You can reconcile. You can do this safely. Is something wrong? Yes? Perform some type of safe action.

And my favorite part: get in the comments, because I know you've probably felt this pain of something trying to do too much, where you have so much logic handling all these different edge cases when all you really needed was probably some reconciliation. Get in the comments and let me know what you're doing.
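
The polling trigger mentioned earlier (a background service that periodically looks for stale pending payments) might look something like this. It is only a sketch: `PendingPayment`, `find_stale_pending`, and `poll_once` are hypothetical names, the list stands in for a database query, and the age threshold is left as a parameter rather than fixed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List


@dataclass
class PendingPayment:
    order_id: str
    pending_since: datetime


def find_stale_pending(
    payments: Iterable[PendingPayment],
    now: datetime,
    older_than: timedelta,
) -> List[str]:
    """Return order ids whose payment has been pending at least `older_than`."""
    cutoff = now - older_than
    return [p.order_id for p in payments if p.pending_since <= cutoff]


def poll_once(
    payments: Iterable[PendingPayment],
    now: datetime,
    older_than: timedelta,
    request_reconciliation: Callable[[str], None],
) -> List[str]:
    """One polling pass: request reconciliation for every stale pending payment."""
    stale = find_stale_pending(payments, now, older_than)
    for order_id in stale:
        request_reconciliation(order_id)
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        PendingPayment("order-old", now - timedelta(hours=2)),
        PendingPayment("order-new", now - timedelta(minutes=1)),
    ]
    # Only "order-old" has been pending longer than the threshold.
    poll_once(sample, now, timedelta(minutes=50), print)
```

A scheduler of your choice would call `poll_once` on an interval, such as the 5 minutes mentioned in the example. The poller only requests reconciliation and leaves the comparison with the provider to the reconciliation handler.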
And you got this far so you clearly like videos on software architecture and design and topics like this. You can join my channel and get access to a private Discord server. The link's in the description on how to join. I really do appreciate everybody that supports my channel and joined it. Thanks again. And if you enjoyed this video, please give it a thumbs up. If you have any other thoughts or questions, make sure to leave a comment and please subscribe for more videos on software architecture and design. Thanks.
