# The Right Way to Scale Long-Running API Requests

## Metadata

- **Channel:** Milan Jovanović
- **YouTube:** https://www.youtube.com/watch?v=U40HzU_KkDY
- **Date:** 12.05.2026
- **Duration:** 9:08
- **Views:** 10,615
- **Source:** https://ekstraktznaniy.ru/video/51637

## Description

Get the source code for this video here → https://the-dotnet-weekly.kit.com/scaling-slow-apis
Want to master Clean Architecture? Go here: https://dub.sh/clean-architecture
Want to master Modular Monoliths? Go here: https://dub.sh/modular-monolith

Join the .NET Architects Club: https://www.skool.com/mj-tech-community-5418/about
Get the 2026 .NET Developer roadmap here → https://the-dotnet-weekly.ck.page/2026-roadmap

Long-running API requests can quietly destroy your API’s scalability.

At first, the solution seems obvious: accept the request, do the work, return the result.

But what happens when that work takes several minutes?

Now your users are waiting. Your API is holding connections open. And during traffic spikes, your server can quickly run out of capacity just handling slow requests.

In this video, I’ll show you the right way to scale long-running API requests using a step-by-step architecture progression.

We’ll start with the naive synchronous approach, then improve it by 

## Transcript

### Segment 1 (00:00 - 05:00) [0:00]

Imagine you have an API endpoint that takes several minutes to complete, which is slow by any measure. This could be generating a report, uploading a large file, or talking to several external services and doing a lot of database work. Now, we have two problems here. The first is that our users need to wait a long time to get a response back from our API. The second is that the API needs to hold the request open for a long time, and this limits the API's ability to handle concurrent requests. In this video, I want to show you a progression for how we can fix this and make our API more scalable.

Let's start from our initial setup. We've got a user, an application server, and a database. Let me zoom in a little so that we can see all of this better. Our user sends a request to the application server. The application server could be talking to the database or doing some other work, and the problem is that this request takes several minutes to complete. Now, this is bad for several reasons. First, it's a bad user experience to keep the user waiting for a long time. It's also a problem during traffic spikes: if all of the requests take several minutes to complete, they might overwhelm the server and limit its ability to function correctly and serve regular API requests.

So, whenever you run into long-running requests like this, you've got several options for how you can improve the overall design, improve the responsiveness of your server, and also give yourself more flexibility in how you want to process the request. Your first step in the progression should be to prevent the user from having to wait several minutes to get a response back, and this significantly improves the responsiveness of your API. We're going to achieve this by defining a jobs table inside of our database. This is going to represent the work that we are trying to complete.
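As a rough illustration of what such a jobs table could look like, here is a minimal Python sketch using an in-memory SQLite database. The video does not specify a schema, so every column name, the status values, and the helper names `open_db` and `create_job` are assumptions made for this example.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical schema: the transcript only says that a "jobs table" exists.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    kind       TEXT NOT NULL,                    -- e.g. 'report'
    payload    TEXT NOT NULL,                    -- request data, stored as JSON text
    status     TEXT NOT NULL DEFAULT 'pending',  -- pending | running | completed | failed
    attempts   INTEGER NOT NULL DEFAULT 0,       -- how many times processing was tried
    error      TEXT,                             -- last error message, if any
    created_at TEXT NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and make sure the jobs table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute(SCHEMA)
    return conn


def create_job(conn: sqlite3.Connection, kind: str, payload: str) -> str:
    """Record a unit of work to be done later and return its id."""
    job_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO jobs (id, kind, payload, created_at) VALUES (?, ?, ?, ?)",
        (job_id, kind, payload, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return job_id
```

Each row stands for one piece of long-running work. The `status` and `attempts` columns are what a processor would update as it picks the job up.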
Then, we're going to update our API to immediately respond to incoming requests with a 202 Accepted response. This means that the API acknowledges that it received the request and that it's going to begin processing it behind the scenes. What this involves in practice is just storing the job in the database and returning immediately from the API endpoint. So, we've essentially fixed two issues here. We've made our API more responsive, so our user experience is somewhat improved, and we've also largely solved the problem with traffic spikes, because accepting an incoming request and immediately writing it to the database isn't a lot of work compared to processing the request for several minutes.

But there's still a question of whether this remains scalable, because we now need to introduce another component into our system. So, let me add this component here, and this is going to be our background processor. This component essentially runs inside of our API, and it's going to pick up any jobs that need to be processed from our jobs table and start handling them in the background.

Now, we're going to need either a new API endpoint for the user to check the status of the background processing, which means the client needs to poll our backend to figure out the status of the job, or we could notify the clients asynchronously using WebSockets, emails, or Server-Sent Events. So, let's say "notify users async", and I'm going to outline a couple of options: WebSockets, Server-Sent Events, email, or anything else that comes to mind.

So, while this is an improvement on our initial design, it still runs into a bottleneck because the background processor consumes the same resources as the application server. In case of traffic spikes, we could run into a situation where the background processor exhausts the resources on our server, and the API will just stop working.
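The flow described in this part (store the job, answer with 202, process in the background, let the client poll for status) can be sketched in a few plain Python functions. This is not the code from the video: the function names, the response shapes, the `/jobs/{id}` status URL, and the status values are all assumptions, and the HTTP layer is reduced to functions that return a status code and a body.

```python
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    """In-memory stand-in for the database that holds the jobs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id TEXT PRIMARY KEY, payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending', result TEXT)"
    )
    return conn


def submit_request(conn, payload):
    """API handler: store the job and answer right away with 202 Accepted."""
    job_id = str(uuid.uuid4())
    conn.execute("INSERT INTO jobs (id, payload) VALUES (?, ?)", (job_id, payload))
    conn.commit()
    # The status URL is hypothetical; it tells the client where to poll.
    return 202, {"jobId": job_id, "statusUrl": f"/jobs/{job_id}"}


def get_status(conn, job_id):
    """Status endpoint that the client polls until the job is finished."""
    row = conn.execute(
        "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row is None:
        return 404, {"error": "unknown job"}
    return 200, {"jobId": job_id, "status": row[0], "result": row[1]}


def process_next_job(conn, handler):
    """Background processor step: pick up one pending job and run it.

    Returns False when there is nothing to do, True otherwise.
    """
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY rowid LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    job_id, payload = row
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
    conn.commit()
    try:
        result = handler(payload)  # the slow work happens here
        conn.execute(
            "UPDATE jobs SET status = 'completed', result = ? WHERE id = ?",
            (result, job_id),
        )
    except Exception as exc:
        conn.execute(
            "UPDATE jobs SET status = 'failed', result = ? WHERE id = ?",
            (str(exc), job_id),
        )
    conn.commit()
    return True
```

In a real service, `process_next_job` would be called in a loop by a hosted background task. Because that loop lives in the same process as the API, it competes with request handling for the same CPU and memory, which is the bottleneck described above.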
So, if you want to take this further, you need to decouple the background processor from the application server, which is essentially our API. So, here's how you can do it. We're going to need another component inside of our system, which is going to be our queue. Let me update the name here. And how this is going to work now, I'll have to just update the UI a bit. We can now either store the job in the database and publish a message to the queue, or we could completely alter the flow to just immediately publish to the queue, and we're going to return a 202 response to our client. Now, as I said, we're doing this to decouple our background processor from our application server. So, essentially, I can now take the background processor, let me detach this request, and move it into its own component. From here, the background processor can pick up the messages from our queue. So, it's going to dequeue the job, and it's going to start processing. Now, this gives a lot more flexibility

### Segment 2 (05:00 - 09:00) [5:00]

to our application server to handle traffic spikes, as our queue is going to act as a buffer, and it's going to allow us to smooth out incoming requests while processing them in the background. Another big advantage is that our background processor is no longer tied to the resources of our API, which means we have an additional way to scale our system by simply introducing more background processors. So, this could look something like this. Let me just update this slightly. We would effectively have multiple background processor instances that are all dequeuing from our message queue and processing these requests in parallel. This is effectively the competing consumers pattern, which is a standard way to scale a queue-based system.

So, we've gone from our initial simple client-server setup, where the application server was handling all of the incoming requests, to storing the incoming requests inside of our jobs table, and then using a background processor that runs within our API to pick up jobs from this table and handle them one by one. This significantly improved the responsiveness of our API, and it also improves the user experience, but it gives us several additional advantages that you should be aware of.

For example, we now get the ability to retry the processing of a request without the user having to manually kick it off. We also get the ability to pause and resume processing. For example, if the API stops suddenly or the background processor stops for any reason, we could store the current progress information inside of the jobs table and just pick up from where we left off when the background processor starts up again. This also gives us the ability to get some very good error handling on the entire process, and we could even implement a dead letter queue for any incoming requests that we aren't able to process within a reasonable number of attempts.
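To make the competing consumers idea, automatic retries, and the dead letter queue a bit more concrete, here is a small in-process Python sketch. It uses `queue.Queue` and threads as a stand-in for a real message broker and separate worker processes, which is a simplification of what the video describes. The retry budget `MAX_ATTEMPTS` and the names `worker` and `run` are assumptions for illustration only.

```python
import queue
import threading

MAX_ATTEMPTS = 3  # assumed retry budget; the video does not name a number


def worker(jobs, dead_letters, results, handler):
    """One competing consumer: dequeue a message, process it, retry or dead-letter."""
    while True:
        message = jobs.get()
        if message is None:  # sentinel value: shut this worker down
            jobs.task_done()
            return
        job_id, payload, attempts = message
        try:
            results[job_id] = handler(payload)
        except Exception as exc:
            if attempts + 1 < MAX_ATTEMPTS:
                # Put the message back so any free worker can retry it.
                jobs.put((job_id, payload, attempts + 1))
            else:
                # Out of attempts: park the message for later inspection.
                dead_letters.put((job_id, payload, str(exc)))
        finally:
            jobs.task_done()


def run(payloads, handler, worker_count=3):
    """Process all payloads with several workers reading from one shared queue."""
    jobs = queue.Queue()
    dead_letters = queue.Queue()
    results = {}  # job_id -> result; each key is written by one worker only
    for job_id, payload in enumerate(payloads):
        jobs.put((job_id, payload, 0))

    threads = [
        threading.Thread(target=worker, args=(jobs, dead_letters, results, handler))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()

    jobs.join()  # returns once every message, including retries, is handled
    for _ in threads:
        jobs.put(None)  # one shutdown sentinel per worker
    for thread in threads:
        thread.join()
    return results, list(dead_letters.queue)
```

Each message is taken by exactly one worker, so adding workers raises throughput without changing the producer. A message that keeps failing ends up in the dead letter queue instead of blocking the others.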
When the system reaches its limits, you can offload this to a queue, which is an external component that's also going to smooth out incoming requests, basically acting as a buffer. You move the background processor into a separate process to handle messages from the queue, and it's going to process the same job, storing the result inside of our database. This gives us the ability to introduce competing consumers to process a larger number of messages from our queue.

Now, additionally, we are going to need another lightweight component here, which I'm going to call the notification service. This will essentially be a component inside of our application server, and it's going to be responsible for notifying the users about any request processing updates. So, with the design that we have right now, we are able to scale to a much larger number of processed requests while not sacrificing the overall performance of our API, which is crucial to our users.

And lastly, a brief comment on where this type of design is a good fit. It's a good fit when you have very long-running API requests. This also assumes that your users don't need the final results immediately. Now, when you think about it, this was already the case, because they would just be waiting on the UI for several minutes to get the result of the processing. So, from the users' perspective, not much changes. They just get a more pleasant user experience, as we make it transparent that we're doing the processing behind the scenes. This is also a good fit when you have traffic spikes that you need to smooth out, as we can use the queue as a buffer and handle the processing behind the scenes as the available resources on our background workers allow. And lastly, it's a good fit whenever you need the ability to scale out your background processing independently from your API.
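The notification service mentioned in this segment could take many forms. As one possible sketch, the Python class below keeps a list of callbacks per job and calls them when a status update arrives. In a real system each callback would wrap a WebSocket connection, a Server-Sent Events stream, or an email sender; here they are plain functions. The class name, method names, and message shape are all assumptions.

```python
from collections import defaultdict


class NotificationService:
    """Hypothetical in-process notifier: fans job updates out to subscribers."""

    def __init__(self):
        # job_id -> list of callbacks that want updates for that job
        self._subscribers = defaultdict(list)

    def subscribe(self, job_id, callback):
        """Register interest in a job, e.g. when a client connects."""
        self._subscribers[job_id].append(callback)

    def publish(self, job_id, status):
        """Send a status update to every subscriber and return how many got it."""
        delivered = 0
        for callback in self._subscribers.get(job_id, []):
            callback({"jobId": job_id, "status": status})
            delivered += 1
        if status in ("completed", "failed"):
            # Terminal states: nothing more will be sent for this job.
            self._subscribers.pop(job_id, None)
        return delivered
```

A background processor would call `publish` whenever it changes a job's status, which removes the need for clients to poll a status endpoint.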
If you want to see an example of how this is implemented, you can grab the source code for this completely for free from the pinned comment right below this video. And if you actually want to see how this is built from scratch, then I suggest that you watch this video next. If you enjoyed this discussion and want me to make more videos like this one, then gently tap the like button to let me know. Thanks a lot for watching, and until next time, stay awesome.