# The Better Way to Build AI Apps

## Metadata

- **Channel:** Josh tried coding
- **YouTube:** https://www.youtube.com/watch?v=s1Dkk48PFmw
- **Date:** 18.11.2025
- **Duration:** 13:34
- **Views:** 13,224

## Description

I've been experimenting with durable AI streams for a few months and figured it's really useful. 

If you have suggestions how to improve this even more let me know!! :]

--- links
live demo: https://upstashrealtime.vercel.app/
open-source code: https://github.com/joschan21/durable-llm-streams

-- my links
second channel (in depth videos): @Joshtriedupstash 
github: https://github.com/joschan21

thanks for watching, appreciate ya

## Contents

### [0:00](https://www.youtube.com/watch?v=s1Dkk48PFmw) Segment 1 (00:00 - 05:00)

Yo gang, the trick I'm going to show you in this video is used by some of the best AI providers in the world, like OpenAI. What it does is, if you have an ongoing AI generation, a running stream, it becomes unbreakable. The user can close their laptop, they can reload the page, they can have a network interruption or five of them. It doesn't matter. Your AI stream will keep working in any case. This is super useful, man. And if you're building an AI app of any kind, your users will probably love you for it. And the thing is, this sounds fancy and complicated, but we're going to take a look at the infrastructure and it's really easy to do, man. It's fully deployable to Vercel. It works on the newest Next.js 16. Let me just show you.

So, what's that trick, Josh? How do some of the best AI providers in the world, like Anthropic, OpenAI, T3 Chat, man, I'm telling you, use this trick I'm showing you in this video to make their user experience extremely good? Well, let me show you. Let's head over to the network tab. "Write me a small poem." This is Anthropic with Sonnet, one of the best models in the world. And that's going to trigger a completion request right here. What kind of protocol are we even using here? What does Anthropic use for their AI that we might be able to use for our AI as well? And the answer is right here: content type `text/event-stream`. So this is SSE, a server-sent events connection.

And if we go into the response, we can actually see the protocol that Anthropic uses for their own web UI. So we know this is going to be kind of good, because this is one of the best AI providers in the world. What can we learn from this? First off, they have their own kind of protocol here, like a content block delta, and each content block contains a chunk of text, like here, here. So these are individual tokens that Anthropic sends along through their SSE protocol, and they are streamed in on the front end.
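To make the `text/event-stream` format concrete, here is a minimal sketch of reading such a response body. The event name `content_block_delta` follows what is visible in the video; the JSON payload shape (`delta.text`) and the function name `parseSse` are assumptions for illustration, not Anthropic's documented web UI protocol.

```typescript
// One parsed server-sent event: an event name plus its data payload.
interface SseEvent {
  event: string;
  data: string;
}

// Parse a text/event-stream body into events.
// Events are separated by a blank line; each line is "field: value".
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let eventName = "message"; // default event type when no "event:" line is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        eventName = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    // Blocks without any data line (for example the trailing empty one) are skipped.
    if (dataLines.length > 0) {
      events.push({ event: eventName, data: dataLines.join("\n") });
    }
  }
  return events;
}

// Hypothetical sample body: two token chunks, each in its own event.
const sampleBody =
  'event: content_block_delta\ndata: {"delta":{"text":"Roses "}}\n\n' +
  'event: content_block_delta\ndata: {"delta":{"text":"are red"}}\n\n';

// Concatenate the text of every delta event, as a front end would while streaming.
const poemText = parseSse(sampleBody)
  .filter((e) => e.event === "content_block_delta")
  .map((e) => JSON.parse(e.data).delta.text as string)
  .join("");

console.log(poemText);
```

In a browser the `EventSource` API or a streamed `fetch` body would deliver these frames incrementally; the sketch parses a complete string only to keep the format visible.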
Now that is not the trick, but it really helps to understand how these providers implement what we're about to do. If I ask ChatGPT "write me a long poem" and hit enter, we are able to refresh the website and the text will continue streaming. So I could refresh midstream and the generation will not stop. It will go on no matter what happens. And the same happens in T3 Chat, for example here: "How does AI work?" I can just refresh the page right here and the stream will not care. It will not break.

And that makes for a really good user experience, because what are some scenarios where a stream could break? For example, an AI stream can break if the user has a mobile data interruption, right? If they're walking around with their mobile phone in their hand and something goes wrong, the entire generation might be lost. Or if the user closes their laptop. Man, it can be really mundane things that make an AI generation break.

The infrastructure behind how these uninterruptible, durable streams, whatever you want to call them, work is extremely interesting, because we use two of my favorite primitives in Redis. Now, really quick, what is Redis? Redis is an open-source, in-memory data structure store used as a database, cache, and message broker. Very fancy words. Basically, it's open-source technology, like a really fast database, right? Redis is super nice. It's one of my favorite tools of 2025. I really enjoy it.

So, let me just pull in the Redis instance and show you. One of my favorite primitives of all time, that we can use in Next.js 16, 15, whatever, and even deploy on Vercel, is `redis.subscribe`, right? So, I'm going to pull in my Redis instance and we can subscribe to, let's call it "channel". This is the channel we're going to subscribe to, and later on we're going to send the AI chunks through it. And let's just call that `sub` right here. And this `sub` has an `on` method that we can use.
For example, on "subscribe": as soon as we are subscribed to the channel, we're going to log out "subscribed". And now comes the really cool part, man. If we do `sub.on`, there's a message handler. For example, we can say "received message". Beautiful. Again, this works in any Next.js server component, route handler, server action. This is what makes it so powerful, because as you're going to see, implementing these durable streams in Next.js is really, really easy.

So, we are implementing a pub/sub pattern in Next.js. Let's publish to a channel right here. And we can just publish data, "my data", any message that we want to publish in Redis. This is the most important primitive that makes durable AI streams possible. If I run the demo file right here, we are going to subscribe to this channel. Perfect. And as you can see, the file has not stopped executing. This is still running, because this is a persistent connection that we're establishing to our Redis database. And as soon as we're publishing anything to that Redis database, like "my data" here, let's run that file. Here we go. We're going to see "received message", right?

In traditional AI messaging setups, we always have a persistent connection between client and server. And technically I have been lying to you, because Anthropic doesn't actually use the persistent protocol that, for example, OpenAI does, right? So if I say "write me a long poem" and then I refresh the page, the user has a network interruption, closes the laptop, anything happens to this connection on

### [5:00](https://www.youtube.com/watch?v=s1Dkk48PFmw&t=300s) Segment 2 (05:00 - 10:00)

Anthropic. Maybe because they're not the best engineers, man. Maybe because they're using Angular. I don't know what they're doing. The generation is gone. So, I just went ahead and refreshed the page. This is the only large AI provider I could find that doesn't implement this really, really nice thing. I don't know why, to be honest, but they use a setup like this, right, between client and server. And if anything goes wrong in the client connection, the server can't transmit data anymore. And we also don't have a message history replay, right? With OpenAI, that's different. If we refresh, then first off it replays all the messages up until the point that we last saw, and then it streams in all the other chunks. So this works really well together.

So we have a client, and then we split responsibility. Because right now this server is responsible for two things: it's generating the AI stream, for example using the Vercel AI SDK, and the other responsibility of the server is streaming to the client. And that's not great, because now we have one service with two responsibilities, and we want to split that up. That is probably going to be a bit smelly code. So the easy thing we can do is just have our client, and we're going to implement one publisher and one subscriber. So the only thing the client has to do, right, the user, it's really stupid: it just needs to trigger the publisher once, and that's going to run the AI generation, and then it needs to read the current state of the AI stream from the subscriber. That's it. Like, architecturally this is really easy. Now the question is, how do the subscriber and the publisher communicate? Under the hood these are just two Next.js routes, and the answer is really easy.
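The split into one publisher and one subscriber can be sketched with an in-memory channel standing in for whatever connects the two routes. The `Broker` class and the channel name `chat:123` are hypothetical; the sketch only shows that the publisher never talks to the client directly, and that plain publish/subscribe is fire-and-forget, so a chunk published while nobody is listening is not delivered later.

```typescript
type Handler = (message: string) => void;

// In-memory stand-in for a pub/sub broker (hypothetical, for illustration only).
class Broker {
  private handlers = new Map<string, Set<Handler>>();

  // Register a handler and return a function that unsubscribes it.
  subscribe(channel: string, handler: Handler): () => void {
    const set = this.handlers.get(channel) ?? new Set<Handler>();
    set.add(handler);
    this.handlers.set(channel, set);
    return () => {
      set.delete(handler);
    };
  }

  // Deliver to current subscribers only; returns how many received the message.
  publish(channel: string, message: string): number {
    const set = this.handlers.get(channel);
    if (!set) return 0;
    set.forEach((h) => h(message));
    return set.size;
  }
}

const broker = new Broker();
const receivedByClient: string[] = [];

// "Subscriber route": forwards whatever arrives on the channel to the client.
const unsubscribe = broker.subscribe("chat:123", (chunk) => {
  receivedByClient.push(chunk);
});

// "Publisher route": runs the generation and publishes each chunk.
broker.publish("chat:123", "Roses ");
unsubscribe(); // the client reloads the page mid-generation
const deliveredWhileAway = broker.publish("chat:123", "are "); // nobody is listening
broker.subscribe("chat:123", (chunk) => {
  receivedByClient.push(chunk); // the client reconnects
});
broker.publish("chat:123", "red");

console.log(receivedByClient, deliveredWhileAway);
```

The generation keeps running while the client is away, but the middle chunk never reaches it. That gap is why a replayable history is needed in addition to the live channel.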
They communicate through Redis, right? So the publisher publishes to Redis with the same primitive I just showed you, and then all the subscriber needs to do is subscribe to those Redis events, so it's notified in real time when a new AI chunk arrives. So all the client needs to do is trigger the publisher using a regular fetch call. The publisher publishes AI chunks to Redis. The subscriber reads them from Redis using subscribe, and then the client can read them from the subscriber via SSE.

So this is the app I built to demo this for you, "what are the latest trends in AI", just with that pattern implemented. We can actually refresh the page right here during the generation. It's going to continue streaming in the data, and it's extremely fast. If I say "write me a long poem", watch how fast this data streams in. This is not a persistent connection between client and server. This is proxied through Redis, and that's what allows us this real-time-looking stream, but technically that's all going through Redis.

The final piece: if I have a chat, "write a long poem", how does the stream know which messages to replay up until the point where we refreshed the page? Let's get rid of all the code here, just to kind of clean up this mess. I'm going to rename the publish file to xrevrange. Sounds fancy, but it's actually going to be really, really easy. And the demo file I'm going to rename to xadd. So, what we can do in Redis is use a really, really cool primitive called Redis Streams. This is one of my favorite discoveries of the year 2025 in development. Redis Streams are extremely OP. Basically, they're a data structure you can use to store absolutely any amount of information. They're used for things like event sourcing, tracking user actions in analytics software, sensor monitoring for IoT devices, notifications. You can insert so much data per second into this, and the entries are extremely memory-efficient pairs of a timestamp and data.
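The "timestamp plus data" pairs can be sketched as follows. Redis Streams entry IDs have the form `<unix ms>-<sequence>`, and an auto-generated ID bumps the sequence number when two entries land in the same millisecond. The names `StreamEntry`, `nextId`, and `appendEntry` are made up for this sketch, and the array is only an in-memory stand-in for a real stream.

```typescript
interface StreamEntry {
  id: string; // "<unix ms>-<sequence>"
  data: string; // the chunk payload
}

// Generate the next ID the way an auto-ID append behaves:
// use the current time, or bump the sequence if the clock has not advanced.
function nextId(lastId: string | null, nowMs: number): string {
  if (lastId === null) return `${nowMs}-0`;
  const [lastMs, lastSeq] = lastId.split("-").map(Number);
  return nowMs > lastMs ? `${nowMs}-0` : `${lastMs}-${lastSeq + 1}`;
}

// Append a chunk to the in-memory stream and return the stored entry.
function appendEntry(stream: StreamEntry[], data: string, nowMs: number): StreamEntry {
  const lastId = stream.length > 0 ? stream[stream.length - 1].id : null;
  const entry: StreamEntry = { id: nextId(lastId, nowMs), data };
  stream.push(entry);
  return entry;
}

const poemStream: StreamEntry[] = [];
appendEntry(poemStream, "Roses ", 1700000000000);
appendEntry(poemStream, "are ", 1700000000000); // same millisecond: sequence becomes 1
appendEntry(poemStream, "red", 1700000000005);

console.log(poemStream.map((e) => e.id));
```

Because IDs only ever increase, the entries stay in insertion order, which is what makes "give me everything from this ID onwards" a cheap query.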
I don't know, imagine this is a Unix timestamp, and the data is something like "my data". So the point I want to get to is: with Redis Streams, if we have something like an `onChunk` handler that runs for every token in the AI stream, we should store that token in a Redis stream over here. And because these are sorted by Unix timestamp, we get replay functionality, right? We can say, give me all chunks that the AI generated from this Unix timestamp and onwards, for example. If we have some delay here, we get extremely memory-efficient replay functionality.

Vercel made their own package for this called resumable-stream. And I think that's at like, I don't know, 50,000 downloads per week or something, even 100,000 downloads per week. Now, v0 uses it, OpenAI uses it at extreme scale, right? All these big providers need what I'm showing you in this video.

It's actually really easy to implement. All we need to do is say `await redis.xadd`, which is the command to add to a stream, and this takes a couple of things: the key, let's do "channel"; the message ID, where we're going to let it automatically generate an ID, which is going to be the Unix timestamp; and then lastly we can enter, I don't know, "my data" for example. This would be the actual content of the chunk from the AI. If you now go ahead and run that with tsx, let's run this file, we just added

### [10:00](https://www.youtube.com/watch?v=s1Dkk48PFmw&t=600s) Segment 3 (10:00 - 13:00)

an AI chunk, metaphorically, to this Redis stream, right? And all we need to do to get replay functionality is actually query it. So we can say `const data` is going to be equal to `await redis.xrevrange`. So basically we are taking the entire stream in reverse order. "channel", because that's the stream that we want to get the data from, and we're going to go from minus to plus. So from beginning to end, we want all the data in the stream, and let's just log out that data here. That's literally all we need to do, man. Let's run this file. Oops. I switched up the order here. It needs to be plus and minus. Bam. So we get replay functionality, right?

If we try this again, if we added one more chunk to the AI stream, and in reality you would add like five chunks per second, this would be really fast as the AI is generating, then let's try the query again. We would get all the data from the stream. So we have two messages now: this AI chunk and this AI chunk. And if for any reason we wanted to only query AI chunks from a specific time onwards, these are already sorted. Because streams are so memory efficient, we can say, give us only stream entries starting from this Unix timestamp, from this ID that you generated. If I run this again, it's only going to get entries from this point onwards.

So, we get this replaying-of-messages functionality, and that's the missing piece. If I go ahead to a new chat, "write me a long poem", and I hit enter, it's going to generate an ID. And as soon as we refresh, all chunks up until this very point right now are replayed from the Redis stream. And all the new incoming chunks are then submitted in real time via Redis pub/sub, the first primitive I showed you. And just with that pattern, you can implement an extremely reliable AI stream that survives basically anything you throw at it: if the mobile data interrupts, if the user closes their laptop. It is absolutely crazy what you can do with this.
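The "only entries from this ID onwards" query can be sketched against an in-memory list of entries. In Redis itself, XRANGE takes the start first (`-` then `+`) and returns oldest first, while XREVRANGE takes the end first (`+` then `-`) and returns newest first, which matches the argument mix-up in the demo. The function names and sample IDs below are illustrative assumptions.

```typescript
interface Entry {
  id: string; // "<unix ms>-<sequence>"
  data: string;
}

// Compare two "<ms>-<seq>" IDs numerically:
// negative if a < b, zero if equal, positive if a > b.
function compareIds(a: string, b: string): number {
  const [aMs, aSeq] = a.split("-").map(Number);
  const [bMs, bSeq] = b.split("-").map(Number);
  return aMs !== bMs ? aMs - bMs : aSeq - bSeq;
}

// Entries with id >= start, oldest first ("-" means from the very beginning).
function rangeFrom(stream: Entry[], start: string): Entry[] {
  if (start === "-") return stream.slice();
  return stream.filter((e) => compareIds(e.id, start) >= 0);
}

// Entries strictly after lastSeenId, for a client resuming where it left off.
function replayAfter(stream: Entry[], lastSeenId: string): Entry[] {
  return stream.filter((e) => compareIds(e.id, lastSeenId) > 0);
}

const poemEntries: Entry[] = [
  { id: "1700000000000-0", data: "Roses " },
  { id: "1700000000000-1", data: "are " },
  { id: "1700000000200-0", data: "red" },
];

// Full replay after a page reload, then a resume from the last chunk the client saw.
const fullReplay = rangeFrom(poemEntries, "-").map((e) => e.data).join("");
const missedChunks = replayAfter(poemEntries, "1700000000000-0").map((e) => e.data).join("");

console.log(fullReplay, "|", missedChunks);
```

IDs are compared as two numbers rather than as strings, since a plain string comparison would misorder sequence numbers such as `9` and `10`.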
And if you want, you can make a lot of money with this. So I think the principle behind this is just super interesting. It works really well. And if I show you the entire code that I used in this example right here, the fully working OpenAI-like version that's also extremely fast, by the way, you're going to see this is comically easy to implement. Man, this is it. Right? If I zoom out a little bit, the entire thing is like 64 lines of code, and it's just the AI SDK, man. So we have a result stream, and for every chunk of the stream, for every AI chunk, you just emit that message. And under the hood, this does two things. First is the xadd, so it adds the chunk to the Redis stream history. Then the second one is the publish, so that we can subscribe to new events right here. So we replay the history of the AI chunk events, and then we just enqueue that data to the client using SSE, server-sent events. And that's how we get extremely durable AI streams, by just changing up our infrastructure a little bit to this pattern right here: from the client to the publisher to Redis, and then the subscriber can read from Redis and publish to the client.

Seems confusing, maybe. I hope it's not. It's kind of a difficult concept, but I hope I explained it pretty well. Thanks so much for listening, man. I really hope you enjoyed this video. This was my first video in like a year, man. Kind of crazy to think about, and that I went for such a difficult technical concept. I don't know. I hope you enjoyed it, man. I really like this pattern. I hope I could share that enthusiasm with you. Thanks so much for watching, and I'm going to see you in the next video. Until then, have a good one and bye-bye.
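Putting both primitives together, the emit step (store, then publish) and the subscriber step (replay history, then forward live chunks as SSE frames) might look like the sketch below. `DurableStream` is a hypothetical in-memory stand-in for one chat's Redis stream plus pub/sub channel, with simple numeric IDs; it is not the code from the linked repository.

```typescript
interface Chunk {
  id: number;
  text: string;
}

// In-memory stand-in for one chat's stream history plus live channel (hypothetical).
class DurableStream {
  private history: Chunk[] = [];
  private listeners = new Set<(chunk: Chunk) => void>();

  // Publisher side: called once per AI chunk. Store first, then notify.
  emit(text: string): void {
    const chunk: Chunk = { id: this.history.length + 1, text };
    this.history.push(chunk); // plays the role of the xadd step
    this.listeners.forEach((listener) => listener(chunk)); // plays the role of publish
  }

  // Subscriber side: replay everything after lastSeenId, then forward live chunks.
  // Returns a function that detaches this client.
  connect(lastSeenId: number, send: (frame: string) => void): () => void {
    const forward = (chunk: Chunk): void => {
      send(`id: ${chunk.id}\ndata: ${JSON.stringify(chunk.text)}\n\n`);
    };
    this.history.filter((chunk) => chunk.id > lastSeenId).forEach(forward);
    this.listeners.add(forward);
    return () => {
      this.listeners.delete(forward);
    };
  }
}

const chat = new DurableStream();

const firstTab: string[] = [];
const disconnect = chat.connect(0, (frame) => {
  firstTab.push(frame);
});
chat.emit("Roses ");
disconnect(); // page reload mid-generation
chat.emit("are "); // the generation keeps going with no client attached

const secondTab: string[] = [];
chat.connect(0, (frame) => {
  secondTab.push(frame); // fresh page: replay from the start, then live chunks
});
chat.emit("red");

console.log(firstTab.length, secondTab.length);
```

In this single-threaded sketch nothing can arrive between the replay and the live subscription. With a real Redis stream and channel, a chunk could land in that window, so a real implementation would likely need to subscribe first and drop duplicates by ID, or otherwise close that gap.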

---
*Source: https://ekstraktznaniy.ru/video/49190*