📚 Join the #1 community for AI entrepreneurs, get all my resources & connect with 280k+ members: https://bit.ly/48GDTsB
📈 Become a Wildly Profitable AI Entrepreneur: https://bit.ly/3KVHTvu
🤝 Ready to transform your business with AI? Let's talk: https://bit.ly/3KUveZA
📋 Get our FREE 14-day playbook for finding high-impact AI opportunities in any business: https://bit.ly/14-day-playbook-
Voice AI is one of the few AI channels delivering predictable ROI right now. With Brendan Jowett, we break down how to architect, test, and ship production-grade AI voice agents—covering platform choices (Retell, Vapi, and Voiceflow), precision prompting, and functions/integrations for CRM/calendar, appointment booking, transfers, and SMS. You’ll see the 8-week rollout and automated QA workflow that cuts time-to-production while improving conversion rates and ad ROI. We also share pricing and retainer models agencies use to sell and support reliable AI receptionists and speed-to-lead systems.
Connect with Brendan:
https://www.linkedin.com/in/brendanjowett/
https://www.youtube.com/@UCzIsviqoJc-VcWqF5Pp8iLw
⏱️ Timestamps:
00:00 What We're Covering
01:17 How to build a Voice Agent 101
05:24 Downside of conversational AI Prompts
08:30 Visual Call Flow Diagrams
11:45 Testing Blueprint
23:33 Relyable Demo: simulations, monitoring, and scoring with Retell/Vapi integrations
Chapters (6 segments)
What We're Covering
Despite all of the hype and bubble talk around AI, there is one area that is proving time and time again to deliver a positive ROI for clients, and that is voice AI. So today I wanted to bring on one of my good friends, Brendan Jowett, who is one of the pioneers of the voice AI space, especially here on YouTube. Over the past two years he's gone from a complete business and AI beginner to running his own successful agency, his own education business, and his own SaaS, which we're going to dive into when it comes to testing your voice agents, along with a live demo of the voice AI testing software he uses at his agency to drastically shrink the time to production. Brendan has been kind enough to come on and give us a masterclass on what it really takes to build production-grade voice agents that provide a positive ROI for your clients and let you build a successful agency with some juicy retainer income coming in each month. There's so much in here. I hope you guys enjoy. Brendan, mate, it's great to see you again. Thank you for coming on. — Yeah, Ro, thanks for having me. — Obviously you're one of the pioneers of the voice AI space, so I was interested in your take on what people should be building these days, the best way to build a voice agent, and, as we'll get to, how to test it. As we both know, there's a big difference between your average voice agent and one that's ready for production, and I think a lot of people trying to build an agency around voice are going to struggle to get to production, as I bet you have. So I'm excited to get into it.
How to build a Voice Agent 101
— Cool. So I'll just jump into how to build a voice agent 101, just to set the stage as to what it takes to get one of these systems up and running, whether you're brand new to the space or already building these systems. There's really three main things you need to do — there's obviously a bit more to it, but these are the big three. First, pick a platform. That could be Retell, could be Vapi — there are plenty of others, but these are the two I use the most. Prompting is the second thing: building out an instruction set for that voice agent is pretty key, because you're guiding it on how to act. That could be something super simple, but as I'll show, these things can get quite complex and you need to be very specific when building them. Third, functions: essentially getting data in and out of external apps — appointment booking, sending text messages, ending the call, transferring the call. Once you've got those three main things, you'll seemingly have something up and running. One thing to note: building these voice agents seems very easy, but don't get trapped by that. I think I contribute to that a little bit, creating my 20-minute videos on YouTube on how to build a voice agent. It seems quite easy to do with a basic framework, but obviously it's not that easy and it takes a bit of time. I think it's like anything, right? If you'd told me up front how hard it would be to make money online or build a business, I probably still would have done it, but you know what I mean?
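The three building blocks above can be made concrete. Here's a minimal sketch of the "functions" piece, using the OpenAI-style JSON-schema tool format that platforms in this space broadly follow — all names and fields here are illustrative, not Vapi's or Retell's exact API:

```python
# Hypothetical tool definitions for a voice agent, in the OpenAI-style
# JSON-schema format. Every name and field below is illustrative only.

BOOK_APPOINTMENT = {
    "name": "book_appointment",
    "description": "Book a slot in the business's Google Calendar.",
    "parameters": {
        "type": "object",
        "properties": {
            "caller_name": {"type": "string"},
            "phone": {"type": "string"},
            "start_time": {"type": "string", "description": "ISO 8601 datetime"},
        },
        "required": ["caller_name", "phone", "start_time"],
    },
}

TRANSFER_CALL = {
    "name": "transfer_call",
    "description": "Live-transfer the caller to a human.",
    "parameters": {
        "type": "object",
        "properties": {"department": {"type": "string", "enum": ["sales", "support"]}},
        "required": ["department"],
    },
}

SEND_SMS = {
    "name": "send_sms",
    "description": "Text the caller a confirmation or a booking link.",
    "parameters": {
        "type": "object",
        "properties": {"phone": {"type": "string"}, "body": {"type": "string"}},
        "required": ["phone", "body"],
    },
}

TOOLS = [BOOK_APPOINTMENT, TRANSFER_CALL, SEND_SMS]
print([t["name"] for t in TOOLS])
```

The point is that each "function" is just a named, typed contract the LLM can call; the platform wires the call to your automation or CRM.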
If you tell them how hard it is up front — sometimes you've just got to tackle those walls one after the other. — Yeah. — Here's how to start it in 20 minutes, then they get a bit of experience, and from there they take on the next challenge and the next, and before they know it they're building in production. So sometimes the naivety is actually a benefit. That gives a bit of context behind building one of these systems. Popular use cases, at least for us as an agency: the inbound receptionist system. That could just be answering incoming phone calls, and booking appointments is probably one of the biggest things — booking into Google Calendar, especially if they're a service business. Live transfer of calls is a pretty big thing as well, so a call-routing system. I think there are different layers to these systems. You could have one system that just does call transfers right out of the gate, which is kind of like an advanced AI IVR. You're really not taking on the entire voice channel — you're just starting at the start with an advanced IVR system. So that's one use case. Outbound, there's a lot of use cases. We don't dive into cold calling at all — there's a lot of regulation around it at the moment. In the US it's mostly banned, because you need consent for the call. Otherwise there are still some good outbound use cases. Like I mentioned earlier, the speed-to-lead use case is something we've done quite a bit of and seen quite good results from, and it's something I would recommend as well. For people who are maybe new to this, that just means: as soon as a lead arrives in the business, whether it's through ads or content or any form of inbound lead.
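The speed-to-lead idea boils down to a tiny dispatch rule: the fresher the lead, the more aggressive the outreach. A sketch, with made-up thresholds and action names:

```python
from datetime import datetime, timedelta, timezone

# Illustrative speed-to-lead dispatcher: decide what to do the moment a
# lead lands. Thresholds and action names are assumptions, not a spec.

def speed_to_lead_action(lead_created_at: datetime, now: datetime) -> str:
    age = now - lead_created_at
    if age <= timedelta(minutes=5):
        return "call_now"          # voice agent dials immediately
    if age <= timedelta(hours=1):
        return "call_then_sms"     # call, fall back to an SMS opener
    return "sms_requalify"         # lead has gone cold; restart via SMS

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(speed_to_lead_action(now - timedelta(seconds=30), now))  # call_now
```

In practice this would sit behind a CRM webhook so the agent dials within seconds of the form submission.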
Usually those leads will sit in the database or the CRM for 6, 12, or 24 hours before anyone gets back to them. The whole speed-to-lead thing is very popular right now, and it's very easy to create value with a system like this: a voice agent immediately reaches out, does a quick qualification, and books them in if they're qualified, or does a human handoff — at the very least it reaches out to them. It might even be SMS, where you start a conversation with the lead — "hey, thanks for this, tell me a bit more about yourself" — and they go through the qualification process. Shrinking that response time from 6 to 12 hours down to a few minutes, or even 30 seconds, gives a massive lift in conversion. And if you're looking at companies running paid traffic to a landing page, getting that speed to lead down is going to massively increase the ROI of their ads, because they'll perform better overall — which gives them more budget to spend and better-performing ads. — I think Alexi has a good case study on it. I think he put it on his LinkedIn once: if you call them up in the first 30 seconds or so, the chance of closing is just radically higher. So that's always a good pitch to make with these systems. Once you've got your use case,
Downside of conversational AI Prompts
and once you've built out your system, putting it into production is a whole other thing. The downside of conversational AI prompts — and really generative AI in general — is that we can't predict what the AI is going to say next every time. As an agency, the fact that we can't verify with certainty what it's going to say each time isn't necessarily what the client wants to hear. Very early on as an agency building these voice agents, beyond the few manual tests we were conducting ourselves, we were essentially launching these agents blind. We didn't fully understand where the weak points in the prompt were, and that did lead to a bit of frustration from our clients initially, because there was a lack of clear visibility into how well it was performing at scale, and it just wasn't consistent. We didn't even have an idea of how well it would be able to perform. A lot of the time it would be a client giving us feedback, or even the users calling it, which is not a good situation to be in. — Yeah, it's just not feasible for you to ring this thing and put it through 300 different calls and really test it before putting it out — at least when we first started this. — I'll throw in this quick quote here: "The question isn't whether your application will go off the rails, it's when and how much." This is from Principles of AI Agents — I'd recommend it, a good read on AI agent architecture. It's pretty clear that our goal isn't to get to 100% with these systems; we aren't going to be able to guarantee 100% performance at any time. Our goal is just to mitigate how severely it goes off the rails — like with humans as well.
You know, you're not going to get a flawless team of human support either — a human receptionist is also going to mess up sometimes. — Probably a little bit more often on the AI side, unfortunately. — And to give some examples of what could go wrong if you haven't really pushed one of these systems as far as you can — and this is only a fraction of what could possibly go wrong, even with super simple systems that ask a couple of qualifying questions and book an appointment. Rewording a greeting message is a big thing. Just rewording stuff might not seem like a big deal, but some clients want it read out a specific way. Forgetting to read an AI disclaimer is a big one: in some states you need to mention that it's an AI at the start of the call, and if it doesn't, that's a compliance issue. In some states you also need to mention that the call is being recorded; if you're recording the call and it skips that, that's a compliance issue as well, and something we need a system in place to track at the bare minimum. I get a lot of comments on my videos asking how complex these systems can even get. As an agency, we never used to really map out our agents. We did a very rough ask of the client: what are you looking for? Appointment booking system? Cool. You want to ask some questions? Great. We mapped it out in a document, and that process wasn't the best for covering all of the edge cases and areas of an agent — we were just YOLOing it. And so this right here, just to quickly skim through it, are a
Visual Call Flow Diagrams
couple of visual call flow diagrams, as we call them, that we've mapped out for some of our clients for these types of voice systems. As you can see, the complexity definitely gets there. If you're not just building a simple, straightforward booking system asking a couple of questions, these systems can reach different levels, with deeper integrations into the CRM. So if you are going to be building these systems for clients, just understand it can get a bit more complex. In terms of how we actually map out our diagrams — this specific one here has a bit of a legend at the top. It's a bit blurry, but this is confidential anyway. The blue is the message: whenever we're going to speak anything specific — a message, a question, anything they want read out — that's the blue areas. Orange is API calls: if we're sending any data anywhere, communicating with a CRM, maybe transferring the call, that's the orange. — So every one of those orange ones is a tool call. — Yeah. — Damn. How many do you get on a production-grade voice agent, say an inbound receptionist — how many tools have you equipped it with? — Max, we've probably had about 10 tool calls. We probably haven't gone any further than that. And in this particular case, some of these tool calls are actually the same tool — they're just run under different conditions. — Exactly. Yeah. — And what do you think — this has obviously got quite a lot of different if/else paths and so on — when you look at something like Voiceflow, where they've got a much more deterministic approach and you can control the flow?
You can tell it exactly what to say, but they also have more agentic, LLM-based answering as well — in contrast to something like Retell or Vapi. As far as I know, Bland had those conversational pathways, which was kind of good, splitting it down the middle. What's the current state of the industry? Are you using platforms that have a mix of these? When is it best to use the more deterministic tools like Voiceflow to deliver systems that maybe need a little more control, or are you just fully doing this with conversational AI and tool calling? — Yeah, it's a pretty common debate at the moment. We do stick to single-prompt systems quite a bit. On Retell and Vapi we've built the majority of our systems using the single prompt — literally building out an entire instruction set that guides the agent down all those different pathways. We have built some systems on the conversational pathways, the conversational flows — I think this one actually might have been on the conversational flow. — Vapi and Retell have that as well now. — They do. Retell and Vapi both have visual builders, with blocks and connections between those blocks. To be honest, I still prefer the single-prompt systems for the flexibility they give in moving all around the conversation; it can still be a bit rigid when you build with those block tools. And Voiceflow is definitely an interesting one. I think their approach is actually slightly different from the others, and it's something we'll probably still be assessing. At the moment I think they've got single-prompt systems with their agents, but with the ability to call out to a flow a bit more conditionally, and that gives you a little bit more control.
But I think Retell and Vapi are a little bit better on the voice side of it anyway, so we stick to them. So, once you've got your
Testing Blueprint
agent up and running — you've got something the client's using — there's a testing process involved, no doubt about it. This is a screenshot from one of our client docs showing the timeline of our key milestones. It's an example of a project that ran for a total of eight weeks: two weeks of development, building out the initial instruction set and automations, and every week after that was literally dedicated to testing. We kicked off with two weeks of internal testing, then jumped into two weeks of live testing, before moving on to automated testing, which I'll add more context on later. Given our understanding of the importance of it, as I just mentioned, this is the level of testing we've found is pretty much necessary. — Two weeks of dev and six weeks of testing. — Exactly. Yeah, this is our all-out approach, the maximum amount of testing. — You guys are technically a voice-agent-testing agency at this point, right? — Should change the name. — Yeah. Wow, that's crazy. I didn't realize it'd be that split — you've got literally a quarter of it on the dev. I mean, that's a good sign for people looking to get into this, because a lot of the time people are going to want to build those POCs and at least get something up and running. So if it's two weeks to get something ready for internal testing, that's at least a deliverable a beginner agency could hope to ship, right? — Yeah, exactly. Building the systems themselves is actually not the hard part, we've found — the testing is. Obviously we still do development during all of those testing periods: if there are things we need to add or fix, we'll do development throughout. But the focus is on testing.
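The eight-week rollout described above can be sketched as a simple schedule generator — the phase names and two-week durations come from the example project; the start date and output format are illustrative:

```python
from datetime import date, timedelta

# Illustrative 8-week rollout: 2 weeks of development, then 2-week
# phases of internal, live, and automated testing, as described.
PHASES = [("development", 2), ("internal testing", 2),
          ("live testing", 2), ("automated testing", 2)]

def rollout_schedule(start: date):
    plan, cursor = [], start
    for name, weeks in PHASES:
        end = cursor + timedelta(weeks=weeks)
        plan.append((name, cursor.isoformat(), end.isoformat()))
        cursor = end
    return plan

for name, s, e in rollout_schedule(date(2024, 1, 1)):
    print(f"{name}: {s} -> {e}")
```

Laying the plan out like this makes the dev-to-testing split explicit when you scope a project for a client.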
So usually, like I said, that's probably our maximum amount of testing. We'll either do live testing first, or automated testing before live testing — it just depends on what the client wants. Obviously putting the system in front of live calls and real people is a bit risky, so if it's a use case with sensitive flows, we wouldn't do that right away — we'd do automated testing first, then live testing. But usually we like to do live testing earlier, just because we get a bit more data on what the automated testing period should look like. I'll jump into that more later. — Let's just go through internal testing, because that's the first thing most agencies starting to deliver these things will be looking at. What does that process look like? Is it one of your devs just bashing away at it, calling it over and over? Are you going to the client and asking, hey, can you give us an idea of what these people are even going to say? Can you give us some prior transcripts so we can build a couple of example personas? What's that internal testing process, just so people know? — It's going to be us calling it and testing it, making sure the main flows do what they should, at least from a very small set of test calls. But I think the biggest thing we get out of it is working with the team and the client: getting them to test it and give their feedback. That's on the reliability side, but it's also just getting them familiar with actually talking to it. So it's less about how well it's performing and more about getting them comfortable with the system overall — if they have any feedback on the voice, they can change it, stuff like that.
I think that's what we mostly get out of the internal testing period. I can scroll down a little here — I've got this nice little meme, "without testing" and "with testing". It goes back to: if you don't test it at all, you call it a couple of times, it sounds amazing, and you think you're an awesome prompt engineer because you've called it three times and it worked every time. But it turns out, as you start to test it more, it gets a little worse. And then once you've tested it the maximum amount, you're at the top of the game. So this is a sheet from some real client internal testing that we did — just a small snippet of a sheet we have. This is pretty much what came through in those two weeks of internal testing: the issues and adjustments. This is the client coming in and mentioning what's going wrong, what needs to be fixed, changed, and updated — from their feedback as well as our own, coming in and making suggestions. Some of these are bugs, some are just wording things. I guess this is to help you understand what you can expect from a client if you're going to put one of these systems out — the kinds of adjustments and expectations involved in getting it up and running. — Say they've left some feedback: you're going to pull up the call recording on Vapi or Retell, listen through it, analyze it, and I assume that right-hand column is for whether it's been fixed or patched, is still an issue, or has been escalated to you to look into. — Exactly. Yeah. We've got a phone number and call time in this list right here, so if there's a specific call we have to look at, we'll jump into it. Otherwise, yeah, we'll just jump in.
We'll add a note that we've either fixed it or have a question about it. And all the way on the right-hand side we mark whether it's completed, whether no action is needed for whatever reason, or whether it needs clarification because it's not clear. — Okay, so that's internal testing, and then you go to live testing, actually putting it into production. When it comes to evaluating those calls, are you pulling the transcripts and analyzing them with AI? Are you getting your team to listen to them one by one? How are you picking up on things like hang-up rates, dropped calls, things like that? There's obviously a lot more to it — what's the process? Or are you getting your client to go through, listen to them, and let you know? There are so many different ways you could do that. — Well, one, we do still have the clients reaching out and telling us about adjustments — we can't really avoid that. That's always going to come through, as well as users maybe mentioning a thing here or there. But the main thing we do is use our own platform, which I'll jump into in a second, and that monitors every single call and gives us a score. Otherwise, I'll quickly run through what we actually learn from testing — why we even do this and what we get out of it. There are really three key things. Prompt improvements: looking at where the prompt lacks context. So if a client gives us an issue — why is this an issue, why is this happening? Usually the prompt does have everything it needs.
It's just that it hasn't been worded properly to give the model enough context to understand exactly what to do, or we haven't given it enough structure to understand it properly. Otherwise, we're just looking at why it's getting confused. We're improving function call responses as well. Prompt engineering is a big part of it, but the function calls are a big part too — improving what those responses look like. It's not just about running an automation; it's making sure the structure of the info coming back into the agent is something it can actually understand, so the result is conversationally good to speak with. User experience is something we look at as well. It's not just about whether it's broken, but whether the experience is actually good. Are we able to speak with it naturally? Is it something people want to speak with? Is everybody just hanging up? Are people getting mad at it? That's all stuff we'll still evaluate. — On that last one, do you also attach some kind of dashboard for your clients so they know what's going on? Like, hey, you've received 100 calls, 20 of them resulted in a booking, 10% of them hung up. How do you give that reporting back to the client? — Yeah, we actually just use — so Retell, for example, has an analytics page, and the way we work is we hand over the Retell accounts to our clients, which might be a little different from others, but we hand over the entire account. And on there you can set up the analytics. — You're not worried they'll figure it out and ditch you? — Yeah, exactly.
— There's enough difficulty with it that you're not really bothered if they have it. — Yeah, exactly. We've got clients on a retainer with us, and all of our clients own the entire Retell account — they've got access to it. On there, there's your call history you can look through, and there's also an analytics dashboard you can custom-configure to look at things like end-call reason and the function calls that run for appointment booking. You can get a dashboard that tells you all of that, so they get those insights. — While we're on retainers, what's the structure you're usually doing? There's probably an upfront — 5 to 10k, or I believe you said over the weekend around 8,000 Aussie — and then a retainer based on usage? Or is there a fixed part? Is it pure maintenance, or is there some improvement and optimization? There are a few different ways to cut a retainer deal — how are you doing it at the moment? — Yeah, our setup fee is about $6,000 to $12,000 at the moment for the setup of an agent. And because the way I get leads is inbound, through YouTube, people are familiar with Retell and Vapi and these platforms, and a lot of them do want to own the platform, or at least own the account — they don't want it locked away or upsold. In terms of a retainer on top of that, we have optional support packages. It's not absolutely required for them to go forward, but we heavily recommend it, and when I get into monitoring and why that's important, we stress to them that you're probably going to have some issues later if you don't have us at least monitoring it to some degree.
— So is it a fixed 1K, or do you have usage on top of that? — Yeah, it can range from about 1K and go up to four or five K a month, but that would include actual adjustments every month. So if you want a set amount of development hours — Yep — for us to jump in, add workflows, add stuff — we've got a series of packages to offer them, depending on whether they want to be more aggressive or more lenient with it. — And just quickly, people are probably thinking at this point: how did you learn all this? How did you get to the point where you're selling multiple-thousand-per-month retainers to these companies? Your background wasn't in this sort of stuff before, right? — No, I didn't have any formal technical background. I have been playing around with things for quite a long time — random things here and there as a hobbyist — but nothing academic. I didn't study computer science or anything. — And you're 20, right? — 20 years old, yeah. — Oh my god. You guys should be so inspired by what Brendan's done here: 20 years old, running a voice agency at this scale with this depth of expertise. I'm blown away by what you've done, man. Let's keep going. — I mean, YouTube has been the biggest thing that helped me learn this, no doubt about it — jumping on YouTube, watching videos. — It's a good feedback cycle, right? When you make your own videos, you get to learn and get clients at the same time. Once you get stuck in that loop — learn stuff, build things, share it, get leads from it, build a personal brand — — That was the loop I got stuck in, and it's led to this point. — That's so good. Cool, all right.
We'll jump into automated testing now. I went through it briefly, but let's dive a little deeper into what it is, how it works, and why it matters. This is probably the best thing we do to these systems — it's how we get the most out of them — and it's essentially using AI to fix AI. The way we pitch this to our clients is that we're going to build other AI callers to call your AI. So, for example, if we're working with a home services company, which we do quite a bit — say a plumbing company — we can simulate real-world scenarios. We can build an AI that is specifically calling in about a leaking tap, send those calls to our agent, and they'll literally talk to each other over the phone about that situation. We can assess its performance, and we can customize the caller's voice and persona — it could be a very old person, an elderly woman, a young man, really anything. — Sick. — So that's what automated testing is. The goal is to break it: put it against the hardest edge cases we can and see how it holds up to the real world. So, jumping into Relyable —
Relyable Demo: simulations, monitoring, and scoring with Retell/Vapi integrations
this is what we've been working on for the past months. Relyable is a simulation and monitoring platform for AI voice agents. The goal is to help you ship high-performing agents quicker — that's the ultimate thing. At a higher level, our mission is to contribute to safer and more reliable AI systems; we want this to be the central hub to test, monitor, and improve these systems. Obviously with generative AI, like we've been through, there's a level of unpredictability that software hasn't really seen before, and it needs to be controlled within our limits — we think this is the way to do it. So before I jump into the actual platform: automated monitoring is something I think everybody needs to understand the importance of. This is what we run on our retainers, and I think it's equally if not more important than the pre-production testing. Automated monitoring means having a system — there are a couple of screenshots here from the actual platform — that analyzes every single conversation that comes through and scores it: attaching an actual number that says how well the call performed, something we can reference, jump into, and track over time, and that alerts us. That's something you just can't do manually. — So it's deeper insights — for you as an agency, or do you also turn that around into more of a client dashboard? — Both, yes. As an agency, this is obviously great for our development team — something that works for them. It just runs by itself and notifies them if they actually need to do anything.
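The per-call scoring idea just described can be sketched as a simple weighted rubric — the checks and weights below are illustrative, not Relyable's actual scoring model:

```python
# Illustrative call-scoring rubric: each call gets a numeric score so
# regressions can be tracked and alerted on. Weights are assumptions.
RUBRIC = {
    "greeted_correctly": 20,
    "ai_disclaimer_given": 20,
    "booked_or_transferred": 40,
    "no_critical_failure": 20,
}

def score_call(checks: dict) -> int:
    """Sum the weights of every rubric item the call passed."""
    return sum(weight for name, weight in RUBRIC.items() if checks.get(name))

# Example: a call that greeted and disclosed correctly but didn't convert.
call = {"greeted_correctly": True, "ai_disclaimer_given": True,
        "booked_or_transferred": False, "no_critical_failure": True}
print(score_call(call))  # 60
```

In a real pipeline the boolean checks would come from an LLM judging each transcript, and scores below a threshold would page the dev team.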
Rather than them jumping into the Retell calls and manually checking whether it's working, which just isn't feasible, this is a proactive approach instead of reacting to issues later on. Why is automated monitoring important? A few examples. Instantly detecting critical failures: if you're building a system for, say, a home services company, they're likely to have an emergency pathway, and that's something you want to pick up on. I'll run through exactly how we handle that exact situation in a minute, but it's something you want a system for. Compliance detection as well: there's a lot of compliance and regulation coming up around these systems, and you want to be ahead of it; this helps with that too. Component failures: the platform itself could fail and the calls could go down, and you want to be aware of that. The transcriber, the LLM: there are a lot of components in these systems that you want to make sure are performing at all times. You also want to be able to send periodic test calls to get a pulse on the agent, rather than just waiting for it to break. If the client doesn't get calls every day, it's hard to tell how well the agent is performing, but if we can send fake calls, we can keep a pulse on it. Cool. So this is Relyable. The ultimate goal is to help you test AI voice agents, as you can see here. This right here is the dashboard. It currently plugs into both Retell and Vapi; we've got native integrations with them, so you can just add your API key and assistant ID and plug the agent directly in. So if I click over here and click on create new agent.
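Stepping back to the monitoring idea covered above: a minimal sketch of that loop, with hypothetical helper names and thresholds (not Relyable's actual internals), is to score every finished call and raise an alert on any critical failure or low score, instead of waiting for a human to review recordings:

```python
ALERT_THRESHOLD = 70  # hypothetical cutoff; mirrors the sub-70% band he mentions later

def review_call(call: dict, alerts: list) -> None:
    """Score-check one finished call and flag it proactively."""
    if call.get("critical_failure"):  # e.g. an emergency pathway was missed
        alerts.append(f"CRITICAL failure on call {call['id']}")
    if call["score"] < ALERT_THRESHOLD:
        alerts.append(f"Low score ({call['score']}) on call {call['id']}")

alerts: list = []
for call in [
    {"id": "c1", "score": 92, "critical_failure": False},
    {"id": "c2", "score": 55, "critical_failure": False},  # weak conversation
    {"id": "c3", "score": 88, "critical_failure": True},   # missed emergency route
]:
    review_call(call, alerts)
# alerts now flags c2 (low score) and c3 (critical failure); c1 passes silently
```

A periodic "pulse" test, as described, is the same check pointed at a fake inbound call on a schedule, so a quiet phone line doesn't hide a broken agent.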
You'll see here that Vapi and Retell are both natively integrated. You only need those two things to plug in and connect the agent right away. Essentially, the system prompt gets automatically populated in here, so you can see the entire prompt that's been pulled in. You can also sync the prompt, so it automatically updates if something gets changed in the other platform. The first thing I'll cover is test cases. Test cases are one of the key parts of this tool and how we evaluate performance. The way they work is that you can think of it as splitting the prompt up into multiple different sections. A prompt will have many different specific behaviors in it. In this case, it's a sort of real estate agent; this is just a mock demo, not a real client at all. It has appointment booking functions and questions about specific properties. It's very basic, but it should get the point across. The test cases split that prompt into its different areas, and then we evaluate the agent against each of those areas. — So if you were to click generate test case, it's going to look at the system prompt that's in there, pick out elements of that prompt, and go, okay, they obviously want this to happen, so how can we make a test case? And I assume that leads to it creating a system prompt for the test agent. Correct? — Yeah, exactly. These are all AI-generated down here; you don't have to create them manually at all. You can hit generate test cases, it looks at that system prompt, you give it the number of test cases you'd like, and it will automatically generate all the test cases necessary.
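What such generated test cases might look like as data, including the priority levels discussed next, can be sketched like this (a hypothetical structure for illustration, not Relyable's actual schema):

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    """One behavior carved out of the agent's system prompt."""
    name: str
    expectation: str  # what the agent should do in this area
    priority: str     # "low" | "medium" | "critical"

# Splitting the mock real-estate prompt into evaluable areas:
test_cases = [
    TestCase("informal_language",
             "agent should appropriately use informal expressions", "low"),
    TestCase("property_questions",
             "agent answers questions about specific properties", "medium"),
    TestCase("appointment_booking",
             "agent executes the appointment booking function", "critical"),
]

critical = [t.name for t in test_cases if t.priority == "critical"]
# critical == ["appointment_booking"]
```

Each simulated call is then graded against this whole list, which is what makes the results comparable from run to run.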
One thing on the roadmap is to automatically detect how many test cases you need, rather than having to put in a number here, but that's coming soon. You can also create them manually, just by typing in here and adding them yourself. A key thing to note here is test case priority. Obviously, not every part of your prompt is as critical as the rest; some things are not a big deal. In this case, it picked out using informal expressions, which is more to do with the agent's speaking personality: the agent should appropriately use informal language. If it doesn't do that, it's not a big deal, so it's set to low. But coming down a bit, we've got some more critical ones, like executing the appointment booking function. That's critical: if it fails to do that for whatever reason, we should probably know about it, so it's been set to critical. — So once you've got all these fleshed out, how do you run it? — Yeah, so simulation, if we scroll down a little here, is the main part of the platform. There are three components to running a simulation. One, you need your test cases done. Next, you need a persona. These personas mimic the different types of people who might call into the system: somebody with a British accent, Australian, American, Indian, whatever. You can add custom ElevenLabs voices and create these custom personas, or generate them with AI. They really just map out a specific person. A persona is not specific to the conversation; it's specific to that person, who they are and how they speak: whether they're super annoying, talk a lot, are very patient in their speech, whatever. And you can see an example here.
It goes into a bit of detail, this guy's entire life story almost, but it helps shape the personality behind the caller. — So you've got a lot of these pre-loaded as test personas we can use, and it's going to take that prompt and adapt it to each of the test cases we've got from the previous step. — Exactly. Yeah. These ones right here are all pre-created and can be used by anybody. But you can also jump in and create your own, just by prompting it with a specific personality, or you can create a personality here, select a voice, and really tailor it for whatever situation. So once you've got your persona, we've now got our scenarios. The scenarios are specific to your actual client and your actual system prompt. We can hit the generate test button, select one of the personalities we've created, and assign it to our specific scenario. So if we want Nigel to call in, we can generate a scenario for him based on the prompt. I've got three examples here. This one's been generated, and it says: "You're calling on behalf of your niece who lives in America and is interested in the Maple Grove property in Minnesota. You find the idea of speaking to an AI receptionist a bit of a faff, but you're determined to help her." That's generated from the personality plus the prompt, with a bit of AI thrown in, and then we attach test cases down here. We can select which test cases we want to hit specifically, or just select them all and see how it goes. — Sweet. And then you just select one on the left there, check the box, and hit run. — Exactly. Yeah. Super easy. You just go ahead, select these, and hit the run button.
It sends those calls directly off to the platform you've chosen, and then all of that gets populated on the results page right here. These will all start to populate. — Interesting. — And then you start to get a score, which is a pretty key thing, especially for our development team: being able to look at a verifiable number to see how well it's performing is a really big time saver. I'll jump into one of these and run through exactly how that score is calculated in a bit. — You go deep. — Yeah, going deep. Average score. — How long does it take? You said the duration there is 11 minutes. If you're running a big batch, like your initial batch of testing for an agent, does it take a few hours? You're loading up the test cases and you sort of set and forget? — Bit of a set and forget at the moment. We don't necessarily do like 100 at a time; I'd say we run maybe 10 to 20, get enough feedback out of that, fix things up, and do it again, again. We don't really do 200 at a time. But yeah, probably around 10 minutes. We've got an average score, and there's a bit of a legend at the bottom here. Greater than 90% is excellent; that's a good position to be in. Around 80% is not terrible; that's fine to be in. But less than 70%, we'd like it to be better than that, just based on what we're seeing. Average latency is in here as well. — The score is evaluated in terms of the test case against the prompt, like did it perform as expected, did it satisfy? So is that an aggregate of all of them? I suppose you've got them down there, right? — Yeah, so I'll jump in and give a bit more context.
So at the bottom here, we've got the specific calls that have been run. I'll jump into one, and we can look at the details for that specific AI call. — Bro, you've got layers on this thing. Holy moly. — Yeah, we've got layers. Call recording right here, so you can listen to the AIs speaking to each other. They can be a bit painful to listen to, to be honest, but that's the point: these callers are meant to be a bit annoying and long-winded, so we can see how our agent responds to them. So we've got test cases here; this is the main thing we use in calculating that percentage. — Cool. — Each of our test cases gets evaluated against that call, and it can be categorized in one of three ways: it either succeeded at doing what it should be doing, it failed, or it wasn't relevant to that call. Obviously, not every test case from your prompt is relevant to every call, because you have so many different flows. So a test case can be categorized as a fail, a success, or, at the bottom here, just not relevant to what that call was about. The way we calculate the actual score is that we essentially total up all of the failed test cases along with the successful ones and take the percentage that failed. They're also weighted by priority. — Yeah. If a critical test case fails, then that's weighted much more heavily than something that's low or medium. — Cool, bro. That is freaking awesome. I'm very, very impressed; I didn't think you had it built out to this depth yet. For you to be 20 years old and have built the agency, built your channel up, got a community as well, and now built this that fits like a glove around everything you're doing, I'm a massive fan, bro. I think you're absolutely crushing it.
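The score calculation he walks through, successes versus failures weighted by priority, with not-relevant cases excluded, can be sketched like this. The specific weight values are made-up illustrations, not Relyable's actual numbers:

```python
# Hypothetical priority weights: a critical failure drags the score down
# far more than a low-priority one.
WEIGHTS = {"low": 1, "medium": 2, "critical": 5}

def call_score(results: list) -> float:
    """results: (priority, outcome) pairs, where outcome is
    'success', 'fail', or 'not_relevant'. Returns a 0-100 score."""
    scored = [(p, o) for p, o in results if o != "not_relevant"]  # drop irrelevant cases
    if not scored:
        return 100.0
    total = sum(WEIGHTS[p] for p, _ in scored)
    passed = sum(WEIGHTS[p] for p, o in scored if o == "success")
    return 100.0 * passed / total

score = call_score([
    ("low", "fail"),             # informal language missed: barely matters
    ("medium", "success"),
    ("critical", "success"),     # booking function fired: heavily weighted
    ("medium", "not_relevant"),  # this flow never came up in the call
])
# one low-weight failure out of total weight 8 -> 87.5
```

By the legend mentioned earlier, 87.5 lands in the "not terrible" band; flip the critical case to a fail and the same call drops to 25.0, which is the point of the weighting.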
And this is something I think a lot of people can get a lot of value out of. I mean, you know it firsthand, right? It's only a matter of time until people wake up and realize this is out there. So I think we can probably wrap on that, unless you've got anything else saucy to show. If people want to jump in, I'll leave a link in the description; you guys can get in there and have a play around. — Appreciate it. Yeah, if anyone wants to check this out, give it a go. You can book a demo on the website, jump on a call with me, and I can run you through it. — Brendan, mate, as always, always a pleasure. It was great to see you over the weekend and great to talk again here. I love what you're doing in the space, mate. We need more people like you, so keep being awesome and keep helping people out there learn this stuff, because the number of voice agents that need to be built between now and the next decade is ridiculous, and there's a huge amount of money to be made there. It's one of those niches where people like yourself, once they get into it, really find their stride. It's quite specialized, but also general enough to be used in multiple different ways; there really are so many different lanes you can occupy in the voice space. So I'd really recommend you guys check out Brendan's channel; I'll link it down below. Mate, looking forward to talking to you soon. — Yeah, I appreciate it. Thank you so much. — So that's all for this episode of the podcast, guys. If you want to see something similar that I really think you'd like, you can click up here to watch another one. And remember, if you think you have a story worth telling and some valuable insight to share with the community, you can fill out my podcast application form in the description below.
I'd love to have a chat with you and get some exposure for your business. Aside from that, guys, that's all for the video. Thank you so much for watching, and I'll see you in the next one.