Building Quso.ai: Autonomous social media, the death of traditional SaaS, and founder lessons
19:22

AssemblyAI, 25.02.2026, 283 views, 4 likes

Video description
Vedant, co-founder of Quso.ai, shares the journey from a simple long-video-to-shorts tool to a full autonomous social media engine, plus why speech-to-text quality was non-negotiable from day one.

CONNECT
🖥️ Website: https://www.assemblyai.com
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers

#MachineLearning #DeepLearning

Contents (4 segments)

Segment 1 (00:00 - 05:00)

Host: Hi everyone, it's Mart here from AssemblyAI. I'm joined today by Vedant from Quso.ai. Vedant, thank you so much for joining us. We wanted to ask Vedant some questions and have him share about Quso.ai, his company. Hopefully we can dig into how he's building with voice AI and how AssemblyAI has been a partner for him, but we'll also get to see his product, what he's building, the problem he's solving, and learn more about him. Vedant, thanks for joining. Maybe you could start with a short intro about yourself and your company.

Vedant: Hey Mark, thanks so much for having me here. I'm Vedant, one of the founders of Quso.ai. Our software allows people who have absolutely no social media expertise to show up on social media every single day without burning out and without having to think about the nitty-gritty of what it takes to have a great social presence. We've built software that combines an AI-powered video editor, a social media manager, and the other pieces anybody would need to show up on social media, and put it all together in one place. Over the last four years, more than 4 million people have signed up and used our software, and we have a large customer base spread across pretty much every country in the world.

Host: Very nice. Maybe you could start by telling me the story of Quso.ai. I know it used to be called vidyo.ai, but how did the idea come about, and what was the problem you were really trying to solve?

Vedant: Yeah. The problem was very simple. I had spent 6 years working as a social media manager for a media company in India. The biggest challenge we faced was the amount of time, effort, and human resources we needed just to maintain a presence on social media.
Each post we created had to go through an elaborate process that involved multiple people, and that meant turnaround times were really high. The bottleneck in all cases was that you needed someone who understood how complex software like a video editor works in order to create the content people are consuming, and people are obviously consuming a lot of video. This has been true for 10 years. It was a bottleneck for me, one of the biggest challenges in my job and my career. So after working with that company for 4 years, I thought, hey, I've got to do something about this and give solving this problem a shot myself, with my understanding of my company, and that's how the software came to be. Yes, it was previously called vidyo.ai, but then we changed the name to Quso.ai to truly reflect what we're trying to do, which is quick social.

Host: Nice. Oh, now I understand. Quick social. Okay, cool. What was the first version like? That was four years ago, I guess, and speech-to-text has come a long way since. Was the first version of vidyo.ai much simpler? Are there features you've been able to build now that speech-to-text has gotten so accurate? What was the core feature set at launch?

Vedant: Yeah, of course, software has evolved massively in the last four years.
When we started, the software world looked very different. There was no AI; we launched pre-AI, so we launched the AI product before ChatGPT. At that point AI wasn't as cool as it is today. We launched with a very simple workflow where you could upload a long video and get a bunch of short videos from it. That's it. There was no social media, there was no captioning. It was just one long video that you upload; it takes you through a linear process and you get multiple short clips that you can choose from and post to your social media. So it instantly takes one long video and creates multiple assets from it. And at that point, if you remember, Reels and short videos had just started taking off, so our timing was right. In terms of speech-to-text, yes, the market has evolved massively between what was the status quo then and what's available today. But even the very first version of our software did have speech-to-text. There was no GPT, like I said, so we had built manual pipelines to take the speech-to-text data and apply intelligence to it, without ChatGPT, to make short clips and go from there.

Host: I remember one of the most interesting workflows your team was working on was lip syncing to see who was speaking and then panning the camera accordingly. We would produce the speaker labels and then you would run facial recognition.

Vedant: Yes.

Host: And clip according to who our

Segment 2 (05:00 - 10:00)

speaker labels said was speaking.

Vedant: Absolutely.

Host: Three years ago that was wild, but now we have speaker identification built into the API. So I imagine the product has changed a lot over time as technology has gotten much better. It's interesting to look back on how we used to do things, our old methods and paradigms, and how technology has simplified so much in such a short span of time.

Vedant: Absolutely, that's true. I also remember that at that time there was auto-detection of language, but it just wasn't as good as it is today. Today we have one model for so many languages; that was not the case then. So you had to specify, hey, this is the language, and then provide a little bit of, you know... and if that didn't happen, everything got messed up. It's evolved massively since then. Absolutely.

Host: Yeah. And on that topic, when did transcription quality start to make the product? Was there a specific moment where you realized this technology is real, it's really working, this product is going to work?

Vedant: For a software like ours, a service like ours, where the end product is derived from the transcription, quality is extremely important. Not only is the transcript used in the intelligence, in processing what the output looks like, but it actually shows up on the output as captions. The words that have been spoken are burnt onto the videos as captions so that people can follow along.
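Burning spoken words onto a video as captions, as described above, typically starts from word-level timestamps in the transcript. The following is a minimal sketch of that step, not Quso.ai's actual pipeline: the `Word` type, the millisecond fields, and the fixed `max_words` grouping are all assumptions made for illustration, and no specific provider's response format is implied. It groups words into short cues and writes them in the SubRip (`.srt`) caption format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with timing (hypothetical shape, for illustration)."""
    text: str
    start_ms: int
    end_ms: int


def format_timestamp(ms: int) -> str:
    """Convert milliseconds to the SRT timestamp form HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: list[Word], max_words: int = 5) -> str:
    """Group words into cues of at most max_words and render them as SRT text."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w.text for w in chunk)
        start = format_timestamp(chunk[0].start_ms)
        end = format_timestamp(chunk[-1].end_ms)
        # Each cue: running index, time range, caption text, trailing newline.
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Because the caption text is copied straight from the transcript, any recognition error appears on screen verbatim, which is why accuracy matters so much for this use case.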
So from the start, accuracy was really important for our use case, and there was no getting around that. Like every other team would, we tested out a bunch of what was available in the market at the time, including the big tech speech-to-text models, which are the most obvious choice when you're thinking about speech-to-text and most of your infrastructure is already on GCP or AWS; that's where you look when adding more services as well. So we did our test runs with a bunch of those services. They were never up to the mark in terms of accuracy, and that became a core bottleneck. Like I said, if there are no accurate subtitles, the entire process breaks for us, so it has to be the best quality from the start. That's when we started looking at other vendors and comparing internally: if we were to run the same video on five different services, which one gives us the best results, and so on. That's how we ultimately settled on AssemblyAI.

Host: Nice. Maybe you can also give us a bit of a sneak peek. What are you building now? What are you thinking about on the roadmap?
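The vendor comparison Vedant describes (running the same video through several services and checking which gives the best result) is commonly scored with word error rate (WER) against a human-checked reference transcript. The interview does not say which metric the team used, so this is only a sketch under that assumption; the vendor names and the `rank_vendors` helper are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def rank_vendors(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Score each vendor's transcript and sort from lowest to highest WER."""
    scores = {name: word_error_rate(reference, text) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

A real evaluation would also normalize punctuation and numerals before scoring and would average over many videos rather than a single one.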
Vedant: Like I said, software has evolved massively over the last four years, and today SaaS tools in general are becoming less about what features you have and more about what job you can get done for your customer. That's essentially where we are. The current version of our software, what we had been building so far, was about: how can we give you as much content as we possibly can, repurposed from your material, and then you make the decision to post it. Now we're moving to completing the entire life cycle: when you upload something or give something to us, we will have agents, or a system, that autonomously creates and schedules social media content for you and publishes it on your behalf as well. So the entire loop is completed, as opposed to you intervening inside the loop, which used to happen all the time with any SaaS tool, where you had to click to get the work done. The tool offers some other arbitrage. I think that's completely shifting to a done-for-you, autonomous loop completion.

Host: Yeah, it does spark some thoughts for me. It's no longer about just one workflow, or this tool being one thing I use to get my whole job done. I want something that does everything end to end. I'm thinking of OpenCode or Claude Code: one tool, one place, that's my workstation for this particular slice of work. I'm guessing Quso.ai would be something similar for social media marketers, even for someone like myself, a technical product marketing manager. I can just upload everything I've got in long form and get lots of little shorts out that I can put on X, on LinkedIn, on YouTube Reels, and on Instagram.

Vedant: You don't want to stop at just the videos. What we understood is that there's a large part of the content you're creating that is not videos

Segment 3 (10:00 - 15:00)

right? For example, this interview: we could create short videos out of it, but you could also take a one-liner quotation or a testimonial and put that out as a photo. At the same time, you could also make an infographic of how the Quso.ai team decided to start using AssemblyAI and what they used. Creating those assets would have taken hours, and it would never even have come to your mind pre-AI. Post-AI, if you're able to capture this conversation, you're able to repurpose it for any kind of content, even newsletters, blog posts, you name it. So we want to build those entire workflows for you. All you have to do is record your Zoom call, or your Riverside for that matter, drop it into our software, and we repurpose it into social media content for the next 30 days.

Host: It's cool that a lot of the work gets done over these conversations, and we're just capturing people talking: natural, unscripted, honestly not even that prepared. But there's quality and an exchange of value in these conversations that shorts and videos are able to capture really well.

Vedant: Yeah.

Host: Well, thanks for answering those questions. All I really had next was whether you want to give some advice for founders, project managers, maybe even social media managers watching who want to use AI-powered video and audio tools like Quso.ai. If you had anything to say to them, what would you say?

Vedant: There's never been a better time to innovate, either by using AI to get your work done or by using AI to be seen more. No matter what your goal is today, when you're building product or shipping marketing, AI is there at every part of the workflow, and a new AI tool comes up every day. There are probably 50 AI tools launched every single day.
What that means for product managers and product builders is that software as a moat is collapsing. Anyone with a computer and a cloud subscription can now build software, and that means building software, or seeing your ideas through into actual real work, is not going to be a barrier anymore. It's the same thing that happened to content creation when everybody got an iPhone, a camera phone. Everybody could create, and see how that's changed everything. The same thing is going to happen for software using AI in the next 3 to 5 years. It's so easy to build today that nothing is going to stop you from trying something, experimenting, and putting something out there that's worth noticing. Even if it changes the lives of a very small set of people, it's still valid. Those equations, the power laws of software, don't hold true anymore. So I think everyone should go and try creating something of their own using AI as fast as possible, using voice AI for that matter.

Host: I keep seeing on X that SaaS is dead, and honestly I think the application layer especially is having such a great time because adoption is increasing. People are more open to trying AI.

Vedant: Yes.

Host: Not everyone is going to vibe code a Quso.ai over the weekend; rather, they're more open to trying these things out and seeing the real value that comes from using these tools.

Vedant: Yeah, I agree. I think the overall market is going to change. When people say SaaS is dead, they mean that SaaS in the traditional sense, what it used to be and the way it was sold, is dead, because anyone else can create.
The pricing model was always per seat, not per unit of value, and that's changed completely, because someone else can come in just as easily and start charging per unit of value, and then your entire model falls flat. So that's definitely changing, but it's also expanding the market. It's great news for all of us in the business, because more and more people want to use these tools.

Host: Yeah. Great. That's all I really had. All I wanted to do next was see it in action. Maybe you could show me around the dashboard, show me how to create these snippets, and we can learn more about your product that way.

Vedant: Yeah. So this is what the software looks like. By the way, this is going to change in the next couple of months, but I'm just going to show you how it works. You can bring in video links from YouTube, Instagram, or Facebook, or drop a file from your computer, and so on. Then you can ask the software to do a bunch of things with those videos: create short videos, add captions, write copy around the video, and use some other features. I'm just going to show you an example of what it does. So this is an example of a video that I got on

Segment 4 (15:00 - 19:00)

YouTube. It's created a bunch of short videos from this video and given each of them a score. You can see this video scored a 97, so it's likely to do very well. Now see what it's done to the video: it's understood every frame and automatically reframed it.

[Demo clip audio]: This one is the most technical tool out of the list here, which is n8n. This is an AI orchestration tool. What does that mean? It allows you to link all of these different AI

Vedant: So it's automatically reframed at every scene.

[Demo clip audio]: that we're talking about today.

Vedant: When it detects that somebody else is talking, it'll automatically change the entire layout. Like in this case:

[Demo clip audio]: 500 AI tools to discover the top nine to blow up your business, help you make money if you got no money,

Vedant: And now, if you want to change stuff. So it's created all of these short clips, and many are likely good enough to be posted directly online. In that case, you can just click on this button that says Share, link your social media accounts, and post the video directly to social media. Or if you want to change some things, we have a powerful editor that lets you do that: change the style, cut parts, remove stuff. All the typical video editing options are available here too. So you can connect your social media, drop a video, and you can also use this nifty little tool called Viddy, which is like ChatGPT for your video. You can ask Viddy to create timestamps, show notes, summaries, quotes, titles.
So if I ask it to generate a summary of this video, it will go through the entire video and create a summary, and if I wanted to create other content around the video, like SEO blog posts, newsletters, things like that, it'll do all of that as well.

Vedant: Yeah, this is a video that you shared with me. So here you go. So it's automatically

[Demo clip audio]: What we think is super cool about this is, you know, because you have the control with prompting, for different use cases this may be good and this may be bad, right? If I was

Vedant: So now this is a layout that it automatically assigned, because it noticed that Ryan was speaking in one corner, so it gave Ryan this. Now I can change this. If I want to change the layout a little and make Ryan bigger, I can do something like this. Then I can crop this, so I can ensure that, hey, I'm in the frame here. So this is what we've been up to. This is going to be live in a month or so. Sorry, in a week actually. So now I can even say I don't want to talk about that, I just want this part. So it's automatically done all of this work for you as well.

Host: Viddy kind of reminded me of uploading shorts on our YouTube channel. To be honest, it's kind of like a game of finding the perfect caption and the perfect description. Because you guys have all the experience in social media (obviously I'm no expert, you guys are the experts), you're able to transfer that expertise to the user, because you already know what works. You have the cheat codes. So you share that with us and I get to benefit from all of that knowledge. That's great.

Vedant: Yeah, absolutely.

Host: That was really exciting. Thanks for your demo. It was great to see what you guys are building. It's amazing what AI is able to do these days.
I'm quite fascinated by the workflows and by the platform. I'm also really glad I got to hear your story in building Quso.ai, how you evaluated your speech-to-text provider, and why it was important. Overall I really enjoyed this conversation. I'm really glad you took the time to share this with us today, Vedant.

Vedant: Of course. Thank you so much.

Host: Thanks for your time. See ya.

Vedant: See you.
