Event-driven, cloud-native architectures promise ultimate scalability, but highly regulated industries like fintech and healthcare introduce strict boundaries where you cannot afford to lose or duplicate a single event. In this InfoQ video, discover the foundational principles of lean events vs. fat events, the critical difference between commands and events, and why you don’t need event sourcing to succeed.
Learn how to implement structural guardrails like Inbox/Outbox patterns to stop data duplication, maintain event contracts across domain boundaries, and manage event ordering without tanking your system's scale.
⏱️ Video Timestamps (For Navigation)
00:00 - The Reality of Cloud-Native Architecture in Banking
01:12 - Foundations: Lean Events vs. Fat Events
02:05 - Commands vs. Events: The Mistake That Screws You Later
03:00 - The Myth: Do You Actually Need Event Sourcing?
04:32 - 5 Core Benefits of Event-Driven Systems in Fintech
04:55 - Deep Dive: Decoupling Transaction Monitoring & Payments
06:10 - Creating an Immutable Activity Log That Auditors Trust
07:02 - Fan-Out Patterns & Architecting 3-Tier Fault Tolerance
09:33 - The Human Cost: Why Your Engineers Struggle with EDA
10:45 - Scaling Success: Developer Platforms & Paved Roads
11:58 - Preventing Disasters: Outbox vs. Inbox Patterns Explained
14:40 - How to Avoid Breaking Event Contracts (Versioning Strategies)
16:51 - Event Ordering vs. Architectural Scale: The Tradeoffs
18:50 - Putting It Together: End-to-End Banking Architecture Diagram
20:45 - Q&A: Handling Audits, Deduplication Overhead, and Chaos
🔗 Transcript & slides available on InfoQ: https://bit.ly/4uTh904
#EventDrivenArchitecture #SoftwareArchitecture #FintechEngineering #SystemDesign
Table of contents (15 segments)
The Reality of Cloud-Native Architecture in Banking
So firstly, I want to get an idea of where everyone's at. Hands up if you've built, or just been part of building, an event-driven architecture. Nice. Cool. That's definitely over half. Keep your hands up if that event-driven architecture was in the cloud. Okay, we lost a few people. And then keep your hands up if it was in some kind of highly regulated industry: banking, healthcare, etc. Someone's bouncing. That's interesting. You consider regulation as an up and down. Aviation. Okay. Yeah, we'll put that one as regulated. Okay, cool. So, we've got a bit of a variety of experience in the room, which is fine. I'm going to talk through the foundations and the principles first. So anyone who hasn't worked with event-driven architectures, you will be fine. We'll go through the foundations. We'll all get up to speed so that we can get into the detail. If you have done event-driven architectures before, that's fine. These might just be
Foundations: Lean Events vs. Fat Events
reminders. Please don't leave. If you can, lock the doors. It might just be a reminder. It might be some things that actually you haven't considered before. Once we've done the foundations, we'll then get on to why we want to do event-driven architectures in this highly regulated environment. Why would we put ourselves through that? What are the benefits? And then we'll get into what hurts: what are the challenges that you need to be considering if you're building systems like this? And I'm not just going to leave you with the pain. We will go through what helps as well. Okay. So, foundations first. If we take our title, event-driven cloud-native banking, we'll break that down and define each part. So an event essentially is a change in state somewhere in the system, and that could be caused by a user's action, an asynchronous background task, an
Commands vs. Events: The Mistake That Screws You Later
external entity, an external system to the platform that you're building. It may carry data, and we might call that a fat event. Or it might simply be a notification, which would be a thin event. Now, I haven't made the dietary definition of an event up. This is something that's discussed out in the world. There's actually quite a famous paper that talks about putting your events on a diet. I would tend to aim to have your events lean. So essentially, all of the data that pertains to the event, put it in there. Anything else, leave it out. But yeah, you do have these levels of an event. Before we get on to anything else, I am quickly going to discuss commands versus events, because this is a conversation I get into time after time. If you build an event-driven system and then you start pumping
The Myth: Do You Actually Need Event Sourcing?
commands around it, you're not getting all the benefits that you would like to get from an event-driven system, and actually you're going to screw yourself over in the future. So, a very simple differentiation. A command is me saying I want something to happen, and I'm explicitly asking you to do that thing, and I'm going to wait because I'm expecting some kind of result. Even if it's asynchronous, I'm expecting a result. An event is me shouting into the world, saying that something happened. I'm not expecting anything to happen off the back of that. In fact, I'm not necessarily expecting anyone to be listening to me. I could be shouting into the ether, with no one subscribed to that event, and that is fine. This differentiation comes up a lot. So get it burnt into your brains if you've not worked with these architectures before. Understand what each thing is and when to use which one. Cool. So we know what an event is. It's a change in state. So an event-driven architecture is quite simple. It's where we combine multiple systems that are reacting to events. It tends to consist of producers, so systems that publish events, and consumers, systems that receive an event. Nice and simple. A quick shout on event sourcing. This tends to end up in the same conversations as event-driven architectures. When you talk to people and you talk about event-driven
5 Core Benefits of Event-Driven Systems in Fintech
architectures, a lot of them will think of event sourcing. These are not the same thing. And please spread this to your teams. You do not need to do event sourcing in order to do an event-driven architecture. Event sourcing is about how the state of your application is represented: it's represented as an immutable sequence of events. So, if we considered a shopping
Deep Dive: Decoupling Transaction Monitoring & Payments
cart online, if we weren't doing event sourcing, we might represent the state of that shopping cart as, okay, I have four hats in my shopping cart. I don't know why I'm buying four hats. We'll go with that. I've got four hats in my shopping cart. And if I went to look at my state in my database, I would see a record there that says hats times four. With event sourcing, the state of my shopping cart is represented, probably in this case, as four events. You would have four records, and each one represents me adding a hat to the shopping cart. In order for me to know the state of an application when it's event sourced, I need to play back those events in order to know, cool, at this point in time there are four hats in my shopping cart. It's a complicated pattern to apply. I've seen people really struggle with understanding it, and it takes people time to learn it. So understand that you do not need to do event sourcing to do event-driven architectures. The reason they come hand in hand is that if you have done event sourcing, adding that little extra bit of
Creating an Immutable Activity Log That Auditors Trust
subscribing to an event is much easier. So that's why they tend to come hand in hand. You do not need to do this, and understand there are dragons here. Okay. Next up, cloud native. I hopefully don't need to spend too much time on this. It's essentially designing, constructing and operating workloads in the cloud. Technically, the cloud has any operating model in it. We could spin up virtual machines. I could SSH into a virtual machine, copy a zip file over and run a manual service on that VM. So actually, when we're talking cloud native, we tend to be talking about modern engineering practices that are highly scalable. I have put microservice-based in here, although I realize there are other approaches, modular monoliths for example. They exist and they are
good patterns to use. These are systems that would be deployed using modern DevOps principles and CI/CD practices. Hopefully, cloud native is something that you are already fairly comfortable with.
Okay, so our title was event-driven cloud-native banking. We've got one more thing left to define, and that is banking. These are large, slow, highly regulated organizations that promise to keep your cash safe under their mattress so that you don't have to put it under yours. They also tend to be terrified of any of the modern principles that we've just been discussing. Lots of them will use fax machines as integration mechanisms. Fortunately, Investec, the bank that I work for, is not one of them. We are a pretty modern, agile organization, and so we've been doing a lot of the more modern engineering practices like the ones that we're going to be talking about today. Okay, so there's our foundations set. Lovely. Let's get into the details.
So why do we want to do event-driven things? There are actually many different reasons. I've picked out a few because they apply to real situations, real use cases that we have had to solve for at the bank. You can read online about many different benefits and drawbacks of event-driven architectures. I'm just going to pick out some that make sense. Decoupling is an obvious one once you get into using events. We have a very real use case here: transaction monitoring at a bank. For anyone who's not aware, transaction monitoring essentially means that everything that happens on a client's account, we need to be paying attention to, monitoring, looking for anything that's strange. If you think about times where you've traveled to a new country, we want to be able to see that and work out if that's something abnormal or if that's something that we would expect of you as a client. To solve for transaction monitoring, they need lots of data from our payment system. So, we've got two options here.
We could couple the two things. Now, payments at a bank is a very, very important thing. Highly regulated,
The Human Cost: Why Your Engineers Struggle with EDA
PSD2, if you want to go read about that regulation. It's a lot of fun. Payments is crucial, and we have to build it with reliability at the core. Transaction monitoring is not something that has to come in a payment flow. It's something that happens behind the scenes. You do fraud checks on a payment, but you don't necessarily have to monitor transactions actively in order for a payment to go out the door. So if we couple these two things, we've got two ways of coupling. We could determine that payments has to hit an API on transaction monitoring, so it's going to push that data to that service. Or maybe we determine that it's transaction monitoring's responsibility, so it's going to pull from an API on payments. Either way, we are now coupling these two systems, and we're coupling two systems that actually should be independent of one another, with very different reliability expectations and very different expectations from the organization. So, not ideal. By moving to an event-driven architecture, we can split these two things. So in the decoupled version, you can see that payments has no idea that
transaction monitoring exists. Payments gets to focus on its flow and it publishes events. In this case, I've called out two: that a payment was initiated, which has some data about the location of the user, the channel that they were coming in on, the creditor, the data, and we also pump out an event to say that the payment was processed and which gateway it went down. Transaction monitoring now gets to be independent, and it gets to look at the event stream of payments and say, cool, for my use case of monitoring transactions, I'm going to pull these two events. In the future it could pull new events, and it could go and find new data that it wants to use. But it completely decouples these two things, and now transaction monitoring can go down without taking payments with it. Perfect. So decoupling is a very important benefit.
The second benefit is an immutable activity log. Before we moved to an event-driven architecture for payments, we obviously had payments running through the organization, but it was hard to know where a payment was in all of the
Preventing Disasters: Outbox vs. Inbox Patterns Explained
many flow points in a bank. You have lots of different things that happen. Fraud checks, sanctions, you choose gateways. Actually, payment gateways themselves tell you lots of different things as responses when you've sent a payment out. And we were struggling to see that. When we moved to an event-driven model, we now had this immutable activity log of the events that were powering a payment. And that's the crucial bit. It wasn't some audit log off to the side. It wasn't us explicitly sending logs to a log aggregator that we then needed to correlate. The events we saw, we trusted, because that's how the system was running. Now, here I've picked out a couple of events. There are way more in the flow. But we can now see, as a business, with very nicely business-oriented event names, which is something that's important to do within your domain design, that a payment was initiated and a fraud check was completed. Or maybe we're actually still waiting on that: it's fallen into a manual operational process where someone needs to do additional fraud checks on something. So that's a huge benefit that you get from using events to power your system, and again a very real thing that we have running in production.
The third one is fan-out, and I guess I haven't called it out, but there's also fan-in as the alternative. Let's take another payment-related one, where off the back of a payment we need to do two things. We need to update our payment limits. So, for example, you might only be able to spend £10,000 a day. Gosh, I wish I could. We'll have that limit assigned to you as a client, and we need to update your payment limits when you've made a payment so that we know where you are against that limit. We also want to send comms. So, maybe a push notification, an SMS, an email, a pigeon, to say that your payment has completed. Now, without an event-driven architecture, we of course can solve for this problem.
We can do these things, but we end up kind of wrapping them together. We end up saying, okay, so we need to update the payment limits and we need to send comms, but if payment limits fails, we need to handle that failure somehow. Do we wait for that before we send out our comms? Do we still send out our comms and then go and fix the payment limits issue? We can avoid all of that
How to Avoid Breaking Event Contracts (Versioning Strategies)
by having a simple event fan-out. So, one single event that says this payment was processed, and then two independent processes (in reality there'll be way more than this) that go off and do the things that they need to do. Client comms does not care about the payment limits service. It shouldn't need to, and it can just work independently. We'll get on to fault tolerance in a second, but it also means that each of these can handle their faults, their retries, their fallouts independently of one another.
So, fault tolerance, which I briefly mentioned: a huge benefit of an event-driven architecture. And when we're talking about highly regulated industries, we have to be tolerant to all faults. We are not talking about, no disrespect, some IoT processing or big data analytics or anything like that. We are talking about vital things that must happen. And again, a very real situation. I'm not going to mention specifics about the fraud engine. We have a fraud engine. It's an external vendor, and we're not going to say anything more, who have some reliability issues, and we can't fix those reliability issues. We didn't build their software, but we do need to be able to handle them. And so with an event-driven architecture, we kind of have three places that we can handle faults. And you can customize these however you like, based on the domain and the use case that you're solving for. So the first level is that transient box. Now, actually, this is no different to the in-process retries that you'll probably all have written in your code. Think about something like Polly in .NET, where you just define, look, we're happy to retry five times. We'll add a bit of jitter. We'll wait a couple of seconds, and hopefully that transient issue, that network issue, is solved and our request goes through. No different to normal.
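As a rough illustration of that first, in-process tier, here is a minimal Python sketch of a retry loop with a fixed wait plus jitter. It is not the bank's implementation and it does not use Polly; `TransientError`, `retry_with_jitter` and `flaky_fraud_check` are hypothetical names invented for this example, and the five attempts and two-second wait simply mirror the numbers mentioned above.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying, e.g. a network blip."""


def retry_with_jitter(operation, max_attempts=5, base_delay=2.0, max_jitter=0.5,
                      sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is used up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Out of in-process retries: re-raise so the next tier can take over.
                raise
            # Wait a couple of seconds plus a little randomness so that many
            # consumers retrying at once do not all hit the dependency together.
            sleep(base_delay + random.uniform(0, max_jitter))


# Usage: a fake fraud-engine call (hypothetical) that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fraud_check():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("fraud engine unavailable")
    return "fraud check passed"


# The sleep function is swapped for a no-op so the example runs instantly.
result = retry_with_jitter(flaky_fraud_check, sleep=lambda seconds: None)
```

Passing `sleep` in as a parameter is a design choice made only for this sketch: it keeps the retry logic testable without real waiting.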
Event Ordering vs. Architectural Scale: The Tradeoffs
The only additional benefit you get is that, because it's event-driven, this is asynchronous. This is eventually consistent. So you might be able to extend those transient retries out a little bit longer than you might otherwise have. Then the second level. So, the fraud engine's still down. Our transient retries are still failing. We can now actually back off to our eventing tech. And it really doesn't matter which cloud-native eventing tech you're using. It could be Kinesis. It could be Azure Event Hubs. It could be some kind of managed Kafka instance. Doesn't matter. You can configure this on all of them. And this is where you would say, right, we know things are problematic, and we're still going to retry, but we're going to back off a bit more. And we can back off for whatever the organization is happy for us to back off to, until we eventually say, "Right, things have gone really bad. We need to dead letter this thing." Dead lettering is pretty important, mainly for the problem of poisonous messages, poisonous events. So, if some naughty person pumps out an event into your system and it breaks your eventing contract, or it has bad data that just cannot be processed, you need a way for that to escape from your architecture eventually. Otherwise, it will continue retrying forever and you'll have some fun screwing around in databases to fix that. So, we have to dead letter. And we have our third level of fault tolerance there, because we can alert some human. We will wake someone up at 2 a.m. and they will have to go and look, and replay that event if they determine that they want to replay it and that it's not a poisonous message. So, fault tolerance is a huge benefit of event-driven architectures, and we have really benefited from that within our highly regulated use cases.
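The second and third tiers described above can be sketched in a few lines of Python. This is an illustrative model only: a real broker (Kinesis, Event Hubs, Kafka) would own redelivery, back-off timing and the dead-letter destination through its own configuration. `Consumer`, `PoisonEvent`, `deliver` and `check_fraud` are hypothetical names, and `max_deliveries=3` is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


class PoisonEvent(Exception):
    """Hypothetical: the event breaks its contract and can never be processed."""


@dataclass
class Consumer:
    handler: Callable[[dict], None]
    max_deliveries: int = 3                       # tier 2 budget: broker redeliveries
    dead_letters: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def on_delivery(self, event, delivery_count):
        """Handle one delivery; return 'done', 'retry_later' or 'dead_lettered'."""
        try:
            self.handler(event)
            return "done"
        except PoisonEvent:
            # Retrying cannot help, so let the event escape immediately.
            return self._dead_letter(event, "poison event")
        except Exception as error:
            if delivery_count < self.max_deliveries:
                return "retry_later"              # tier 2: ask for a later redelivery
            return self._dead_letter(event, f"retries exhausted: {error}")

    def _dead_letter(self, event, reason):
        # Tier 3: park the event and alert a human, who decides whether to replay it.
        self.dead_letters.append(event)
        self.alerts.append(reason)
        return "dead_lettered"


def deliver(consumer, event):
    """Stand-in for the broker: redeliver until the consumer stops asking."""
    delivery_count = 1
    while True:
        outcome = consumer.on_delivery(event, delivery_count)
        if outcome != "retry_later":
            return outcome
        delivery_count += 1   # a real broker would wait (back off) before this


# Usage with a hypothetical fraud-check handler.
def check_fraud(event):
    if "payment_id" not in event:
        raise PoisonEvent("missing payment_id")
    if event.get("engine_down"):
        raise ConnectionError("fraud engine unavailable")


consumer = Consumer(handler=check_fraud)
outcomes = [
    deliver(consumer, {"payment_id": "p1"}),
    deliver(consumer, {"payment_id": "p2", "engine_down": True}),
    deliver(consumer, {"amount": 10}),
]
```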
Putting It Together: End-to-End Banking Architecture Diagram
The fifth one that I'm going to call out is plug and play. The example we're talking about here is the build-out of a new capability, rewards. We want to offer rewards to you, and we need to actually build out that capability. We don't have it. With some mature platforms like payments, accounts and client, where they're now publishing well-defined, ideally domain-designed events out into the world, we actually have a really nice benefit here: rewards might be able to be built without bugging any of them. If the events are good, we can slot this capability in. It just needs permissions to those events, permissions to the event streams, and bang, it now knows when a client's onboarded and when an account is created. It knows when payments are processed, etc., etc. Once you reach this level of maturity with your event-driven architecture, you can plug in new capabilities really nicely. Okay, I keep pressing the wrong button on this. So, we're going to get into some of the pain now. What hurts. And yes, I will talk about what helps. I also have some lovely AI-generated images on this one, because it wouldn't be a presentation without them. I found them hilarious. I debated just deleting them, but I can't get over some of the things that AI decides to generate. Okay, so the first one is not a tech problem. Event-driven architectures are hard for people, mainly people who have not yet worked on those architectures before. So those of you who didn't throw your hands up, apologies. This is hard and we
Q&A: Handling Audits, Deduplication Overhead, and Chaos
see it. We see it in our architects and engineers who needed to learn these new concepts and patterns. We've seen it with new joiners. In one of our spaces, where we had event sourcing as well as event-driven architecture, it took about 6 months for a new joiner to get to the point where they were delivering at the same pace as the engineers already in that team. That is a very real consideration, and I think we very easily look at the technical tradeoffs, but this is a real organizational impact: people are going to be slower to deliver, and it's a different paradigm when you're designing those solutions. When teams just step into this world, they may almost forget that they have a different paradigm, and they'll start solving for problems they don't need to solve for. And they might forget about the problems that they really do need to solve for now, i.e. eventual consistency, the fault tolerance that we've talked about, and some of the other things that I'll get on to in a second. So your people will find it hard, and that should not be ignored.
Now, what helps? There are things that help. Hopefully you have a developer platform. If you don't, just quickly create one of those. Hopefully you have some concept of paved roads in your organization. Get event-driven artifacts into your developer platform, which look like service templates. So, as an engineer, I can now step in and go, "Right, here's a well-shaped template of an event-driven microservice." That will help people get started quicker. Then there are application modules that take away a lot of the problems that we'll talk about coming up, so that not every single engineer is having to solve the same problem over and over again. Do that and do it early. We did, and we found it much easier for multiple teams to start building out these architectures.
Now, your developer platform can have all these lovely artifacts, but you do still need to train your people. In fact, it's a dangerous world if you've given them the keys to developing and smashing an event-driven system into production, but you've not focused on training them. At 2 a.m. when that thing falls over, they're not going to have any idea what the lovely magic that you've written into that developer platform does. So, we need to train them. One thing that we did: we had an enablement team, and a delivery team essentially came to us and said, we keep seeing people building event-driven architectures. We'd like some of that, but we've never done it before. And so we managed to book out an entire week with that team and our enablement team. This doesn't really scale, but it really did work. We ran through some training materials, probably some stuff that's similar to what we're going through today. And we also designed and built an event-driven system. It was very small, in their space, and it actually ended up in production. By the end of the 5 days it wasn't quite in production, but they had a working system. And I'm calling that out because I think it's a shift from how some of us think about training materials. This wasn't just a "please go and read a bunch of documentation" or "go and watch a video". We sat with them, we taught them some stuff, we then got in, designed their system with them, built the system with them, found the problems, solved them, and that team are now off confidently building event-driven systems.
Third call-out: aligning on standards and principles across the estate. The earlier you do this, the better. It doesn't need to be that you define everything, but define things like your event contracts and the permissions model that you want on your event streams. Ideally, define what technology drives those event streams.
All of those things, write them down and agree them, so that when you go off and you want to consume someone else's events, it's not completely different to the other system that you've already consumed events from. If you end up in that world, you're not going to find pace, ever. Okay? And also coming to this talk, so well done. You're all going to be legends after this.
Okay, second pain. Again, my prompt for this image was myself and my clone spending lots of my money. Tada. So, two ends of a spectrum here: duplicating events and losing events. Both are things that in highly regulated industries, including aviation, we do not want. If you go off to pay your rent and we just happen to lose that event, and your landlord never gets your rent payment, you're not going to do very well off that. Alternatively, you go off, you buy your new property, you put down your deposit and we pay it twice. You're going to be pretty angry. So, we cannot have this. And it's an important call-out, because in some event-driven architectures, again with big data analytics or IoT devices, this may not be a problem. You can afford to lose an event every 100,000 events. We can't. We are a bank. We cannot have that happen. So this requires design and build upfront. You cannot leave this until later. You will hurt yourself. So what helps? Two things. Inbox patterns and outbox patterns, and building both of those into that developer platform, into the frameworks that we've just talked about. Build them in immediately so that people get this stuff for free and they don't stumble into paying two deposits. Okay. So what's an inbox pattern and what's an outbox pattern? Let's look at the outbox first. An outbox pattern protects you from losing events when you publish them. In our example here, we're looking at onboarding.
We're onboarding Aurelia, who just happens to have the name of my daughter, because I was writing this slide with her sat next to me. We're onboarding Aurelia. And when we're making that modification to the clients table, we're saving that state. We draw a little transaction around it along with an outbox, and in that outbox we put the event that a client was onboarded, with a unique ID. We now know that we have updated our state and recorded an event for publishing at the same time, within the same transactional boundary. Without that, you can very easily end up in a situation where you've updated the state, so we know Aurelia exists, but then something falls over when we try to publish that event. Not going to be good. And we're going to lose any of the benefits that we wanted to get from our event-driven architecture. So cool, we save that record to our outbox. And we then just need a little dispatcher pattern. The dispatcher goes off, and maybe it's just polling that outbox table. That's fine. It's going to take that event and actually publish it onto whichever technology you've chosen: Kafka, Kinesis, Event Hubs, whatever it is that you've decided to use. So fine, using an outbox, we've now protected ourselves from losing events. Crucially, we haven't actually protected ourselves from duplicating events. That dispatcher could still publish something twice. Not ideal. Also, the eventing technology that we publish to may just do some kind of at-least-once delivery, and we're going to end up getting duplicate events. So, we still need to handle those. That's okay, because we're going to use an inbox. The inbox is on the consumer side, where we're now receiving that client onboarded event. And do we go straight off and do our business logic, dealing with that event however it is that we're dealing with it, where we could fail for real business validation reasons or we could run into some transient issue that we've talked about before?
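To make the moving parts concrete, here is a minimal Python sketch of the outbox write and dispatcher described above, together with the consumer-side inbox check that the talk goes on to describe. It uses two in-memory SQLite databases purely for illustration. The table layouts, the `ClientOnboarded` event shape and the function names are all assumptions made for this example, not the bank's schema or framework.

```python
import json
import sqlite3
import uuid

# Producer side: state table and outbox live in the same database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clients (id TEXT PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE outbox (event_id TEXT PRIMARY KEY, type TEXT, "
           "payload TEXT, dispatched INTEGER DEFAULT 0)")
db.commit()


def onboard_client(name):
    """Save the client and its event in ONE transaction: both rows or neither."""
    client_id, event_id = str(uuid.uuid4()), str(uuid.uuid4())
    with db:  # commits on success, rolls back if either insert raises
        db.execute("INSERT INTO clients (id, name) VALUES (?, ?)", (client_id, name))
        db.execute("INSERT INTO outbox (event_id, type, payload) VALUES (?, ?, ?)",
                   (event_id, "ClientOnboarded",
                    json.dumps({"clientId": client_id, "name": name})))
    return event_id


def dispatch_pending(publish):
    """Dispatcher: poll the outbox and publish anything not yet sent."""
    rows = db.execute(
        "SELECT event_id, type, payload FROM outbox WHERE dispatched = 0").fetchall()
    for event_id, event_type, payload in rows:
        publish({"id": event_id, "type": event_type, "data": json.loads(payload)})
        # A crash between publish and this update means the event is sent again
        # on the next poll: no loss, but duplicates are possible.
        with db:
            db.execute("UPDATE outbox SET dispatched = 1 WHERE event_id = ?", (event_id,))
    return len(rows)


# Consumer side: the inbox records every event ID it has accepted.
consumer_db = sqlite3.connect(":memory:")
consumer_db.execute("CREATE TABLE inbox (event_id TEXT PRIMARY KEY, payload TEXT)")
consumer_db.commit()
welcomed = []


def receive(event):
    """Return True if the event was new, False if it was a duplicate."""
    try:
        with consumer_db:
            consumer_db.execute("INSERT INTO inbox (event_id, payload) VALUES (?, ?)",
                                (event["id"], json.dumps(event["data"])))
    except sqlite3.IntegrityError:
        return False  # primary key clash: this event ID was seen before
    welcomed.append(event["data"]["name"])  # stand-in for the real business logic
    return True


# Usage: one onboarding, one dispatch, and the broker delivering the event twice.
onboard_client("Aurelia")
published = []
dispatch_pending(published.append)
results = [receive(published[0]), receive(published[0])]
```

In this sketch the business logic runs straight after the inbox insert. A fuller implementation would also need to decide what happens if that logic fails after the event has been recorded, for example by tracking a processed flag on the inbox row.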
Instead, we immediately pump that event into an inbox, and the inbox just states, "Here's the ID of the event. Here's the data. We received it." Off the back of that, you then go and do your business logic. Fine. Perfect. What this avoids now is that if our at-least-once delivery eventing tech pumps out the same event, that's absolutely fine. We're protected by our inbox. We're going to check the ID and say, "I've seen that event before. I'm not doing it again." So, we're nicely protected. Cool.
The third painful thing to deal with is breaking event contracts. Someone, and I don't know if they're here, commented on my LinkedIn post about doing this talk and wanted to discuss this. So, I hope you're here. We talked about coupling and how by using events we can decouple systems. However, you are coupled by your events. These are a contract that you have promised to the world. And crucially, you can't take them back. One of the things about event-driven architectures is that you'll publish events onto an event stream, and it's an immutable event stream, and it goes back all the way to the beginning of time. And someone has the right to go back to the beginning of time and replay all of those events. So once you have published something, some data point on your events, you don't get to take it back. Consumers failing because of a change in that event data is a really painful remediation process. We talked about events being immutable. They're not immutable if you're off editing events in your data store or in your event stream. It's a real thing that I have seen companies doing. Please don't do it. Please don't put yourself in that pain. So we need to really care about our event contracts here. Now, what helps? I find that thinking of your event as being like an API contract helps people, probably just because we're more comfortable with what API contracts are and how important it is to not bring in breaking changes.
So consider your events like you would consider your APIs. Design them carefully. Be aware that any property that you put on that contract is out there in the world. And if you want to remove it, that's a breaking change. Ideally, avoid those breaking changes if you can. If you can't avoid them, version them like you would an API. On a REST API, you would bring in a V2. If you can't avoid that breaking change, you can do that same thing with events. Something that you'll see on some event standards is the concept of a data version property: just some metadata that you put on your event that says this is actually version two of this event. Now, what this allows your consumers to do, and you can just imagine it as a really simple if-else statement in their code, is check that version. They can see, okay, for V1 we're going to do this event handling. For V2, this property has been removed, or the data type's changed, or the whole structure of this event has changed (ideally not, as that would almost be a different event), so we can go and branch off and handle that event differently. And so now we can replay from the beginning of time, because we've got V1, V1, oh, V2. It's fine, safe.
The other thing that will help is separating your domain and integration events. I'll go on to what I mean by domain and integration events. I have a little picture, but you will thank yourself later. Essentially, if we draw our bounded context, we draw our domain, you may well have an event-driven architecture within your domain. So, if we take payments, you may well have some internal events in that domain. That's cool. The key thing is to model your integration events, the events that tie multiple domains together, differently, because this allows you to protect yourself from bleeding domain concepts out that you will then be tied to.
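The "really simple if-else" on a data version property might look like the following Python sketch. The property name `dataVersion`, the payload fields and the change between V1 and V2 (a decimal amount replaced by an amount in minor units) are all hypothetical, chosen only to show a consumer replaying a stream that contains both versions.

```python
def handle_payment_processed(event):
    """Normalise a PaymentProcessed event, whichever contract version it uses."""
    # Assumption: events published before versioning was introduced count as v1.
    version = event.get("dataVersion", 1)
    data = event["data"]
    if version == 1:
        # Hypothetical v1 contract: amount as a decimal number of pounds.
        amount_minor = round(data["amount"] * 100)
    elif version == 2:
        # Hypothetical v2 contract: "amount" was removed in favour of minor units.
        amount_minor = data["amountMinor"]
    else:
        # Unknown versions are rejected rather than guessed at.
        raise ValueError(f"unsupported dataVersion {version}")
    return {"paymentId": data["paymentId"], "amountMinor": amount_minor}


# Usage: replaying from the beginning of time, V1, V1, then V2.
stream = [
    {"dataVersion": 1, "data": {"paymentId": "p1", "amount": 12.5}},
    {"data": {"paymentId": "p2", "amount": 3.0}},
    {"dataVersion": 2, "data": {"paymentId": "p3", "amountMinor": 2000}},
]
replayed = [handle_payment_processed(event) for event in stream]
```

Raising on an unknown version is one possible choice. It would typically send the event down the retry and dead-letter path rather than silently mishandling it.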
You've now contractually said, "I'm accidentally pumping this domain concept out, and you're now consuming it, and oh god, I want to change my domain and I can't." We'll get on to that in a little bit more detail in a sec.

The fourth thing that needs considering, and it's not necessarily an immediate pain, is event ordering. Unless you explicitly configure it, cloud-native eventing tech does not care about the order of your events. They're usually built for scale. They'll make massive statements like, "Yeah, we can handle a million events a second." That's because they don't care about the order of your events. Your retries don't either. We talked about fault tolerance. You could be retrying. You could be backing off. You're playing these events through independently of other events. So there is no ordering within your technology, within your architecture. You can introduce it, but just know that it carries more risk. Allowing a client to make two $1 million payments because we hadn't updated our balance yet is slightly career limiting. We should not do that. So there is more risk as soon as we require ordering on our events. It's not a no-go, though.

We have two approaches that you can follow for event ordering. Firstly, we can bring in an order. We can stamp our events with a version property, which for our first event of an aggregate would say this is version one, for our second one version two, then version three, version four. You can call it whatever you like. I like to call it version. And that can enforce ordering within your inbox pattern. Then you can add code to that lovely framework that everyone's using, code that checks for ordering. I've used the word aggregate, and I realize that wasn't in the foundational definitions. Sorry. Let's say our aggregate is a shopping cart. For our shopping cart, we now have ordering.
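An ordering check inside the inbox could be sketched roughly as follows. This is a minimal in-memory illustration, not the bank's framework: the class name `OrderedInbox`, the method `accept`, and the three return values are hypothetical. A real inbox would typically be a database table, updated in the same transaction as the business logic.

```python
from dataclasses import dataclass, field


@dataclass
class OrderedInbox:
    """In-memory stand-in for an inbox table that also enforces per-aggregate order."""

    seen_ids: set[str] = field(default_factory=set)
    last_version: dict[str, int] = field(default_factory=dict)

    def accept(self, event_id: str, aggregate_id: str, version: int) -> str:
        # At-least-once delivery: the same event may arrive more than once.
        if event_id in self.seen_ids:
            return "duplicate"
        expected = self.last_version.get(aggregate_id, 0) + 1
        if version < expected:
            # A version we have already moved past for this aggregate.
            return "duplicate"
        if version > expected:
            # An earlier event has not been seen yet: back off and retry later.
            return "retry_later"
        self.seen_ids.add(event_id)
        self.last_version[aggregate_id] = version
        return "process"


inbox = OrderedInbox()
results = [
    inbox.accept("e2", "cart-1", 2),  # version 1 not seen yet
    inbox.accept("e1", "cart-1", 1),  # first event for this cart
    inbox.accept("e2", "cart-1", 2),  # the retry now succeeds
    inbox.accept("e2", "cart-1", 2),  # redelivery of an event already handled
]
```

Every `retry_later` result means an event waiting on another event, which is where the cost in scale comes from.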
And we know the order: event one, event two, event three, event four for adding those hats. Now, maybe we care about the order that those hats were added, in which case, within our inbox, within our ordering check, we say, "Look, for this shopping cart I haven't seen the version one event yet, therefore I'm not going to process event two. I'm going to back off. I'm going to go back to the event stream." And hopefully, eventually (we're living in an eventually consistent world here), event one comes in and we process it. We then get back to retrying event two and we say, "Cool, I've seen event one. I'm going to go off and do it." The key thing here is that it is less scalable if you bring in that kind of ordering, because you've just seen what's happened: we've now essentially built a queue into our event-driven architecture without using queuing technologies. So it will scale less. It will still work. We have very real implementations of this within the bank where we need that kind of ordering, and it works. You just have to be aware of the impact on scale.

The second option here is that you introduce implicit ordering, where your domain handles the types of events that it can process without necessarily saying, "I'm going to process them one after another." It may well be that, in this example, we can't pay a beneficiary until we've seen that the beneficiary was created. That makes sense. We don't know the beneficiary details yet. We've implicitly added ordering to our system by having that domain validation. We haven't had to stamp the events. This is a very real approach to the problem. We have a platform within the bank that does this, and it works absolutely fine. They've never needed to introduce ordering version stamps on their events. Two very valid options.

Okay, good timing. Bringing it all together with a very scary number of boxes. I'm not expecting you to immediately understand this, but I want to try to bring together all of the concepts that we talked about here.
I appreciate there was a lot, so let's put it into a very real, again, banking use case where we've got payments and we've got communications. We talked about them before. We now have our domain and integration events. We have our two domains: we have payments and we have communications. Let's follow the flow through. So we have our API. Someone has gone to create a payment. It flows through into our outbox. We have our outbox to avoid losing the event. So we save the payment to our payments database, and we also save our domain event to our outbox. Maybe we have some internal domain event handling. I'm not going to go into any more detail on that, but maybe we do. Maybe we need to do some stuff. That's fine. We have an inbox on that event handler that avoids processing that event multiple times. Perfect. Our event is called something really funky, because we own our domain and so we've been really verbose with our event naming: SWIFT FPS payment processed. Sweet. We'll name it whatever we like, because we now have our integration event publisher.

This is where I was talking about the difference between domain events and integration events. You build this into your service template so that people will just have this available to them and they won't accidentally bleed this very specific event out into the world. So we have our publisher. Our publisher does three things: it filters, it aggregates, and it transforms. Not all domain events will become integration events. Fine. Sometimes you might have a fan-in, where multiple domain events become just one integration event out into the world. And crucially, transformation, where we say our domain event has all of these properties and we only want to publish these. Perfect. We've got our nice protection here. Some people might see some similarities here with ACLs, anti-corruption layers. You are protecting your boundary. Okay. So we pump out our payment processed event. Look, we removed all of our silly domain language.
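The filter and transform steps of such a publisher might look like the sketch below. It is a minimal illustration under stated assumptions, not the bank's service template: the `PUBLISHED` allow-list, the event type names, and every property name are hypothetical. Aggregation (the fan-in case) is left out to keep it short.

```python
from typing import Any, Optional

# Hypothetical allow-list:
# domain event type -> (integration event type, properties allowed to leave the domain)
PUBLISHED: dict[str, tuple[str, tuple[str, ...]]] = {
    "SwiftFpsPaymentProcessed": ("PaymentProcessed", ("paymentId", "amount", "currency")),
}


def to_integration_event(domain_event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Filter and transform one domain event at the domain boundary."""
    mapping = PUBLISHED.get(domain_event["type"])
    if mapping is None:
        # Filter: this domain event never becomes an integration event.
        return None
    integration_type, exposed = mapping
    # Transform: rename the event and copy only the allowed properties.
    return {
        "type": integration_type,
        "data": {key: domain_event["data"][key] for key in exposed},
    }


domain_event = {
    "type": "SwiftFpsPaymentProcessed",
    "data": {
        "paymentId": "p-123",
        "amount": "25.00",
        "currency": "GBP",
        "swiftMessageRef": "hypothetical-ref",      # internal detail, stays in the domain
        "fpsSchemeStatus": "hypothetical-status",   # internal detail, stays in the domain
    },
}
integration_event = to_integration_event(domain_event)
internal_only = to_integration_event({"type": "PaymentLedgerRowLocked", "data": {}})
```

Using an allow-list rather than a block-list means a new domain property stays private until someone deliberately adds it to the contract.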
We now just know that a payment was processed, and that goes into our other domain. Now, for communications, we have our integration event handler. It handles integration events, aptly named. So we handle our payment processed event. We have another inbox. Lovely. And we have maybe some filtering, aggregation, and transformation here too, to move into our domain events. For brevity, I didn't want to fill this with the same thing, but you have the same thing here as you had in the other domain. We then go off and do our work after the inbox has protected us from sending multiple SMSs to you. And again, we can follow that same flow. SMS delivered gets transformed to communication sent, out into the integration events.

I'm not expecting you to immediately be able to go off and build this, but these are all there deliberately. This is how you can do event-driven architectures within a highly regulated industry with all of these protections in place. And what we found, as I stated, is that by building this stuff into our developer platform, teams haven't needed to solve these problems. They've not needed to run into these issues at 2 a.m. And we've managed to build quite a few things, quite a few platforms, in the cloud, in Azure. Sorry to all of you lovely AWS people. I'm sure there are many more AWS people here. And there we go. We'll finish up there.

I'm happy to take a few questions. I know we've only got a few minutes left. I will leave that up. You can burn it into your skull, take pictures, feel free to ask questions outside. I'm around all day today and tomorrow, and I could talk about this stuff forever.

— On the previous slide, regarding the ordering.

— Yeah.

Where you stamp events with a version.
How do we define what version to assign to these events? It might be based on a database, where we store a record version, read it, and add one, or it might be a timestamp-based version. But in my mind both of them have pros and cons. If it is time-based, how do you know whether there is a previous event which is not yet processed? And if it is database-based and I'm doing two reads simultaneously and adding one, those will have the same version.

Yeah, really good question. Both of those approaches are done out in the world. There are people who have fights about this online. We have followed the version approach of 1, 2, 3, 4, 5, 6, 7, with the understanding that you've now introduced that issue with competing writes. What that means is that you're probably going to have some kind of unique index on your database that says we can't have duplicate versions for this aggregate ID. So we can't have two events with the number two as their version. We know what that means, right? It will scale worse.

— No,

— but you can introduce it with that.

— Thank you.

— Yes. So I'm in a very similar situation to you, also doing event-driven architecture, also in a bank. We are actually SOX compliant, and our auditor is requiring us to prove completeness on our event stream.

— Okay.

Which is of course very much a batch concept, but it's now being put on an infinite stream, and I'm having a hard time even grasping it philosophically. Have you encountered this, and do you have any suggestions or tips for how we might prove completeness?

Maybe we're lucky that I haven't been asked that question before, because we are audited in the same way. We've not had that question, I guess because we've been able to just show that we have an immutable log of our events. We can show that they haven't been tampered with. We've not needed to go any further than that.
But interesting, that sounds like a chat we should have, maybe for many hours over a beer.

— We're in a similar industry, where privacy compliance and auditing are a main priority and we can't afford duplication. But at the same time, for example, in an outbox pattern, the lookup in the inbox table is very expensive for us.

— What should our trade-off be?

I think the trade-off is probably as you just stated. If you really can't afford to have duplicate events, you kind of have two options, right? You either do the inbox, or, if you really can't follow an inbox pattern, you're now relying on idempotency on the actions that you're doing off the back of that event. If all of your downstreams have solid idempotency logic, you can add your X-Idempotency-Key header to every single API call that you're making, and you know that they're implementing that, then you could probably get away with not having an inbox. But without either of those, there's just risk, and you're going to have to manage that risk. There's no magic bullet, unfortunately. Either idempotency on your downstreams or building in the inbox. Ideally, do both.

— When I look at your diagram, I see two boxes, payments and communications. Probably in the real world you have way more boxes.

Yeah.

— And way more messages, way more integration events. Do you manage who can create integration events? I'm just thinking, when you have 200 or 300 different types of integration events, it could be a little bit chaotic. Or do you just accept the chaos?

Yeah, it's a good question. For ease of fitting this onto a slide, I didn't introduce that there is actually a third level that we tend to have at the bank. We're trying to build out platforms, with varying levels of success, and within a platform is where you'd probably have those integration events.
There is another level, where a platform itself is going to publish out events, and we've been naming those public events, although they're not public to the world, they're public to the organization. And again, you have the same filtering, aggregation, etc. on that public level. It really doesn't get that noisy. I think if all of your teams understand the implications of publishing an event, they don't want to publish too much data on them. So yeah, you can kind of handle it by having those layers of filtering: the domain level of events is going to be really noisy, integration events are going to be less noisy, and then your platform-level events are going to be even less noisy. If you still have too much going on, that's ideally where you need to think about the topics that you're exposing from a platform, or from whichever level you need to look at this from. So maybe now you're not just saying, "Here's a single event stream of everything that happened on this platform." You're saying, "We have a topic specifically for this," to avoid people having to come and ignore 99% of the events because they're not actually interested. So there are approaches.

— I believe you recommended using slim events, but then you'd have to fill in the data from somewhere. Would that be a blocking integration or

— So I'm not going to try and go all the way back. I would recommend using lean events. The difference being: fat events are where you're essentially carrying your entire entity state around your system in an event. Thin would be nothing at all; it's just a notification saying something happened. And you're right to call out that I didn't talk about it here. If you are pumping out notification-style events, people are going to have to go somewhere to find the data, and you're going to end up coupling on whatever that thing is.
So it's a really good question, and that's why I like lean events, where it's in the middle. It makes it less likely that they still need to go off to your API to get more data, because as long as you've designed it carefully, you've included all the data that makes sense to be included on this type of event, nothing more and nothing less. It's a really good call-out, and I think it's certainly something that I could try and add into these slides.
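To make the thin, lean, and fat distinction concrete, the three shapes could look something like this for one and the same payment. It is purely an illustration: every field name and value below is hypothetical, and where exactly "lean" stops is a design decision for each event type.

```python
# Thin: a bare notification. Consumers must call back somewhere to learn anything more.
thin_event = {"type": "PaymentProcessed", "paymentId": "p-123"}

# Lean: the data that makes sense for this type of event, nothing more and nothing less.
lean_event = {
    **thin_event,
    "amount": "25.00",
    "currency": "GBP",
    "beneficiaryId": "b-9",
}

# Fat: essentially the whole entity state travelling around the system in the event.
fat_event = {
    **lean_event,
    "payerAccount": {"id": "a-1", "balance": "975.00"},
    "beneficiary": {"id": "b-9", "name": "Example Ltd"},
    "history": ["created", "screened", "processed"],
}
```

Each extra property on the fat event is one more thing consumers can come to depend on, and so one more thing that cannot be removed without a breaking change.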