us on. Okay, so let's get to the heart of it. Why don't you show us what solution you built? How does it work? During this demo, I'm going to ask you some questions as well.

Sounds good. Thank you. So, this is our learning studio that we built as part of Raj's hackathon. I think we had about a week and a half to come up with the MVP. We had a lot of ideas to go with, and we chose something that's close to heart for Taran and myself. We both have family members who have dyslexia. For people coming from India, you might have seen Taare Zameen Par. In that, Ishaan actually has dyslexia. That's where all the words are jumping around. People with dyslexia have a lot of difficulty reading normal text as paragraphs, as books with 100 or 200 pages. And that's where I thought this would be the right support that 20% of the population would need. Let's get started. I will sign in.

Is this font also dyslexia-friendly?

Yes. Sorry, I forgot to mention. The font that we are using is OpenDyslexic, and it is a free font available for anybody to use, but unfortunately nobody has decided to focus on this problem that 20% of our population has. And this is our entry point. Once you sign in, we have flashcards, with questions and answers behind them. And each of the questions can be spoken into. I have a few documents already preloaded, so that's why we have flashcards that are preloaded. It's an interface for people coming back in to essentially revise what they have already learned. We've got focus mode, which is really important for people with ADHD, where they need to just focus on one question, figure out the answer, hyperfocus, and move on from that. And we are keeping track of all of this. So, as you go through, you keep saying "got it", "okay", "no", whatever it is, based on your understanding of the question, right? It's basically like a game at this point, which is something ADHD folks enjoy. And we get a full scorecard with an instantaneous response.
And I feel like that's important. And here's our uploading facility, which is basically taking all the PDFs that we have.

That the user wants to upload.

Yeah. So, here's a live example of the value we want to provide. Taran's younger daughter is dyslexic, so we wanted to see from the customer side what is going to help them, and she gave us this example. Like, hey, we have these questions, and these questions are not really readable because, as I mentioned earlier, dyslexia creates a lot of issues for reading. And we said, "Okay, let's see if we can make this readable and more interactive, and gamify the entire learning procedure that the kid has to go through." So, I'm going to drop in the PDF. Once we have the PDF, I can choose how many flashcards I want and hit generate here. What happens in the back end is it's going to take this, create embeddings out of it, and then use the embeddings to create a RAG pipeline for us. Once you have the RAG pipeline, we're able to create the flashcards and return them immediately. As soon as you upload and you get a response, we have the entire flashcard set here, ready to go for study, in less than 30 seconds.

So, this can be used for literally any topic, like even for science and stuff. If you have a PDF copy of the book, you can literally generate a bunch of flashcards before your exam and whatnot.

Yes. Yes, that's correct. I would just like to add: this is where the user inputs whatever extra knowledge they want to bring, whenever they upload a PDF, share a YouTube video link, or give an MP4 file. So, they're actually bringing the knowledge base.

Did you guys and girls do anything to evaluate the ideal model? How did you all navigate it?
Yeah, we ended up doing quite a bit of testing with respect to all the models that were available within the Bedrock environment, because we wanted to keep everything as secure as possible and try to get this to the highest standards, and nothing's higher than HIPAA. So, we decided to choose HIPAA, and I have some experience in that. When we think of this from an SA perspective, everybody would say, "Oh yeah, let's just go for the highest reasoning model to get the exact value." But that's not the outcome we're trying to target here. We're trying to make this as fast as possible so the user is not going to lose attention. And when it comes to neurodivergence and ADHD, attention is the key parameter from a customer's perspective that we need to look at. That's why we ended up going with Amazon Nova Micro, and from a hackathon perspective, it kind of helped us because it's one of the cheapest models available.

So, the principles of architecture remain the same when we move to GenAI. I remember the time when we used to run VMs on-prem, and the solution used to be, "Hey, let's get the biggest possible VM. Hey, give it the biggest Java heap." But we have to go from a customer-first perspective and look at what enriches the customer. From the customer's perspective, it is the latency that matters, not whether you have the biggest model out there. We focused on that, and also, from a Well-Architected Framework perspective, cost optimization is important. You might be getting 99.9% accuracy, but if the cost is 10 times higher, we don't deliver the solution, right? So, in this case, we looked at all the trade-offs and we went with the Titan model.

Got it.
Okay, so I wanted to add: the objective was to build this out as an MVP for the hackathon, but as we scale and start adding more users, we're going to repeat our testing process to see what makes more sense, and we may choose a different model depending on whether it's meeting the customer's needs or not. So, it's always a changeable process. We're not stuck on any one model versus any other. We'll revisit that when we have to.

Excellent. Okay, so I see some of the other buttons like history and ride, but before we go there, we are all
architects here. Let's talk a little bit about the architecture. Can one of you walk us through the actual design of this application?

Sure, yeah, I can talk about that. We'll be sharing our architecture diagram as well, and then I'll give a quick walk-through of what happens when a user uploads a PDF. I would say the best way to think about it is in three stages: the user's browser, our serverless backend, and the AI layer. And I do want to mention that none of this runs on a traditional server. So, when a user uploads a PDF, the first thing that happens is the browser asks our upload-URL Lambda for a pre-signed URL. Basically, a time-limited permission slip to write directly to S3. So, the file never really touches Lambda. It goes straight from the browser to S3, which matters because Lambda has payload limits and is billed per millisecond. Once the file lands in S3, the browser calls our generate Lambda with just a file key, which essentially becomes a pointer to where the file is. The Lambda reads it, extracts the text, and sends it to Amazon Nova Micro via Bedrock with a prompt that says, "Please generate me flashcards, keep the language simple, and return structured JSON." And the model responds. We save the session in DynamoDB and send the cards back to the browser. The whole thing also runs inside a private VPC, and the entire stack is serverless.

Yeah, which got me curious. Why serverless over containers or EC2?

Yeah, do you want to take that over, Taran?

Thanks, Raj. That's a very important question, an interesting question. It all comes back to the customer perspective. What is important to the customer? I would say it is the latency that is important. The customer does not care if you're running a cluster or serverless over there. So, for us, it was looking at the customer first and optimizing for the best experience as well as cost. Think of it from our use-case perspective.
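
The upload flow from the architecture walkthrough above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the key layout, the function names, and the `fileKey` and `cardCount` field names are all assumptions made for the example. The only real library call is boto3's `generate_presigned_url`, which is imported lazily so the rest of the sketch runs without AWS libraries installed.

```python
import json
import uuid


def build_object_key(user_id: str, filename: str) -> str:
    """Hypothetical key layout: a per-user prefix plus a random id to avoid collisions."""
    safe_name = filename.replace("/", "_")
    return f"uploads/{user_id}/{uuid.uuid4().hex}/{safe_name}"


def create_upload_url(bucket: str, key: str, expires_seconds: int = 300) -> str:
    """Return a time-limited URL the browser can PUT the file to directly.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # lazy import: only needed when a URL is actually requested

    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_seconds,
    )


def build_generate_request(key: str, card_count: int) -> str:
    """Body for the generate call: a pointer to the file, never the file bytes."""
    return json.dumps({"fileKey": key, "cardCount": card_count})
```

The point of the design is visible in `build_generate_request`: the second call carries only the object key, so the PDF itself never passes through a Lambda payload.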
A user logs in, does what they do, and goes to bed at night, hopefully. And then, if the servers are running on EC2 machines, they're running idle and we're paying for all of that cost. Also, geographically our customers are in North America at this point, and if they are not using it, again your compute resources are idle. And the cost to maintain a large containerized infrastructure and a pipeline would be high. So, it was not just the easiest route that we picked. It was a well-thought-out decision to go with the serverless architecture. From an architecture perspective, we are using Lambda as a full function-as-a-service, as well as serverless resources such as S3 and DynamoDB. So, our ongoing cost is really minimal, right? We only pay for the usage as it goes. Except for our NAT gateway, which we have to pay for.

Are you using Bedrock's native knowledge bases, or did you build out something of your own? How did you solve that part?

So, GenAI is the core of what the product does, right? It's not bolted on. Let me walk you through how it actually works. When the user uploads content, our Lambda sends the text to Amazon Nova Micro via Bedrock. So, the data itself is secure within the AWS network across the region, right? With a carefully engineered prompt that we created, it generates X number of cards and keeps the questions under 15 words, which is really important because we're looking for speed when we're trying to give back the flashcards. And then it has to answer within the 30-second limit that API Gateway enforces. We can talk about how we can get that changed through a service request, but we want to work with what we have at this point for the hackathon. And we're using plain language only.
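
The prompt constraints described above (a fixed number of cards, questions under 15 words, plain language, structured JSON back) might look something like the sketch below. The prompt wording, the function names, and the `question`/`answer` JSON shape are assumptions for illustration; the team's actual prompt is not shown in this session.

```python
import json


def build_flashcard_prompt(source_text: str, card_count: int,
                           max_question_words: int = 15) -> str:
    """Assemble a prompt that encodes the constraints as explicit instructions."""
    return (
        f"Generate exactly {card_count} flashcards from the text below.\n"
        f"Keep each question under {max_question_words} words.\n"
        "Use plain, simple language suitable for young students.\n"
        'Return only JSON: a list of objects with "question" and "answer" keys.\n\n'
        f"TEXT:\n{source_text}"
    )


def parse_flashcards(model_output: str, max_question_words: int = 15) -> list[dict]:
    """Parse the model's JSON and keep only cards that satisfy the constraints."""
    cards = json.loads(model_output)
    if not isinstance(cards, list):
        raise ValueError("expected a JSON list of flashcards")
    valid = []
    for card in cards:
        if not isinstance(card, dict):
            continue
        question = str(card.get("question", "")).strip()
        answer = str(card.get("answer", "")).strip()
        # Drop cards with a missing field or a question at or over the word limit.
        if question and answer and len(question.split()) < max_question_words:
            valid.append({"question": question, "answer": answer})
    return valid
```

Validating on the way back, rather than trusting the model to follow the prompt, keeps malformed or overlong cards out of whatever gets saved and shown to the student.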
We're trying to make sure it is readable, understandable, and comprehensible by students of a younger age. At the end, we get a JSON object back from the model itself. We parse that response and save it into DynamoDB for our history tab, which I can walk through. For larger documents, we use RAG, which is retrieval-augmented generation. I'm sure everyone knows about this. The analogy I use is: instead of cramming an entire textbook into the AI's context window at once, and we have massive textbooks, right? I can't even imagine how big they can get. The content gets chunked into 512-token pieces, and then we end up storing that in a Bedrock knowledge base backed by S3 Vectors storage. That's something new that came out at re:Invent. We wanted to see if that fit our idea, and it really worked well for us. At generation time, we retrieve only the chunks that match the specific user's file. That is filtered by a metadata sidecar we write at upload time. YouTube is different, right? Because those are small enough to fit within a prompt. So we skip the knowledge base entirely and go directly to the model. Knowing when not to use RAG is more important than knowing when to use it. So we save on the costs associated with it, and we're able to deliver value faster to the customer at the end.

Got it. No, I really like that you all have implemented a real-world way of doing RAG, because in all projects no one just ingests a full document. You have to do chunking, then semantic search or similarity, reranking if needed, metadata, right? Very good. So, I don't know, can thousands of people use this application? What would you do when this