Building Earmark: Real-time voice AI, privacy by design, and founder lessons
25:14

AssemblyAI 09.02.2026 228 views 4 likes

Video description
Mark Barbir and Sanden Gocka, founders of Earmark, share what it's really like building a real-time voice AI product for product managers.

We cover:
- The origin story of Earmark
- Dogfooding their product internally
- Their experience building with Voice AI
- Advice that they would give other founders

Earmark: https://www.tryearmark.com/

Timestamps:
0:47 - Earmark intro & origin story
2:47 - Dogfooding Earmark internally
6:10 - Why Earmark switched to AssemblyAI
8:56 - What Earmark is building now
13:05 - Earmark demo
21:29 - Advice for founders building voice AI products

CONNECT
Website: https://www.assemblyai.com
Twitter: https://twitter.com/AssemblyAI
Discord: https://discord.gg/Cd8MyVJAXd
Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
We're hiring! Check our open roles: https://www.assemblyai.com/careers

#MachineLearning #DeepLearning

Table of contents (6 segments)

Earmark intro & origin story

without all the manual follow-up.

— Nice. Was that how Earmark started? Could you walk me through the origin story and what the first version looked like?

— Yeah. So the origin story and the first version were radically different from what we're working on now. We actually started Earmark as a Vision Pro product. Essentially it was an AR/VR rehearsal experience, where we wanted to help product and engineering leaders be better prepared as they're influencing. A lot of folks are leaders, and for a lot of people in product, nobody reports to them. So the concept was: how do we make people effective in their ability to influence? We started with this idea of real-time speech coaching. You could go into an immersive environment in your Vision Pro, you'd have your Google Slides deck presented, and then you would have real-time feedback on whether or not you're breathing, whether you should enunciate a little better, whether you should speak up more, or maybe you're speaking too quickly. We provided real-time feedback around those concepts. Then we decided to conduct a bunch of user research around that particular solution. And it turns out, and this was the key insight, that nobody really prepares for anything. So we had made a rehearsal product for people who don't prepare for presentations. That was the key learning. Then we essentially just pivoted: we took the idea of a real-time feedback experience and put it on the web. The concept there was, could we enable product folks to be more informed in the moment? That was the first thread we started pulling as a service.
And that evolved into automated creation of artifacts and deliverables of work while people were having conversations. We're about five iterations in since our Vision Pro solution, and we're in market, having conversations with customers and prospects every day.

Dogfooding Earmark internally

— That's really cool. I like that. I guess it developed and matured over time, and now you're building something people want. That's always a good place to be in. Do you dogfood Earmark internally at all? Does your own team use it day-to-day?

— Yeah, we use it every day. For Sanden and me, and Dylan, our go-to-market leader, what we'll do is have unstructured conversations, which could be brainstorming. We'll talk through sales status, customer sentiment, things about the product, and then we'll actually go through ideation in real time. The thing about Earmark that's really powerful is that it'll take these unstructured conversations and turn them into structured artifacts first and foremost: requirements, or support documentation for our customers, or go-to-market messaging in terms of the type of language we're using, maybe based on customer conversations we've had. That's been really useful. But something that's actually more useful in our most recent iteration of the product is the ability to push to Cursor, push to v0, push to Codex, and have conversations lead to prototypical flows. For us, the true problem we're trying to solve is: can we author the deliverables that you would normally create a day or several days after, in that 30-minute or hour increment of the meeting you're actually in? The real unlock there is, if we're talking through concepts and then actually looking at prototypical flows, that's a great way to close that cycle time up.
You know, R&D teams are just overtaxed. Mart, I'm sure you understand how hard just the business of R&D is. For us, the biggest thing is: can we help cycle times for everybody? And also, for product managers and a lot of folks who aren't in engineering, R&D is moving so much faster with AI. Can we enable product folks to keep up with, and maybe even stay ahead of, their engineers who are running five times, ten times faster?

— Yeah. I think my favorite use case is, imagine you're in a meeting, as Mark was mentioning, you're brainstorming on features, you're 30 minutes in, 40 minutes in, and then suddenly you share your screen and say, "Is this what you were talking about?" Just having that there in that context is so cool, because everyone's mind is already thinking about it. Then you can continue to riff, and the quality is so much better. Reducing that cycle time and not having to do those follow-ups is such a huge benefit.

— Honestly, the thought of working by just talking is a dream come true for me, and it has been for me personally now that I use Superwhisper on Mac. Most of the things I write now are hardly with the keyboard; it's all with voice. Having an interactive platform to do that on, meetings obviously, but having artifacts come out of those conversations, that's a game changer.

— Yes.

Why Earmark switched to AssemblyAI

— So you guys use Assembly under the hood. Could you tell me more about why you chose Assembly? What did that decision look like? Did you do an eval? Did you do a vibe eval instead? Did you have another solution prior to Assembly? Why did you change? Maybe walk me through the story of how Assembly got into the picture.

— Yeah, there were two big reasons why we chose Assembly. We were using another provider for transcription before, but we were running into two big issues. The first issue was the plumbing. We had to build a ton of abstractions just to get it to work: things like microphone management, the WebSocket lifecycle, a lot of reconnection logic, and just tons of work to make it reliable. But I think the biggest piece was that we were getting slammed on the concurrency limits, and it was really hard for us to predict scale, or even a launch. When we did a Product Hunt launch, it was like, "We're not sure, we seem to be around the edge, but if we want to get over this we have to sign this really expensive enterprise contract." So it was a lot of unknowns. And actually, funny story, right before launch we discovered AssemblyAI. We did some quick tests with it and discovered that the transcription is not only really fast but also really accurate, more so than what we were using. We actually swapped it out in four days, right before launch. We launched with that, and it's been great ever since.

— That must have been a mad rush. Four days before launch, you're trying to get all the plumbing moved over from one provider to another. I want to double tap on that concurrency limit conversation.
— Do you open one session for everyone on the call, or does each person get one stream?

— Every person gets one stream. If there's one person on the call who's actually using Earmark, they'll get one stream, but if four people are using it simultaneously, that will be four streams. So even one meeting at one workplace could be using four streams at the same time. Then multiply that out within that one workspace, and that's a lot. And then of course across different companies and whatnot. So that's a ton of concurrent streams. What's really cool about Assembly is having this unlimited concurrency, where there's almost a backoff policy: you get up to a certain threshold, and then it just continues to add on based off of that. That's been fantastic for us scaling, and I'm not sure how we would have survived without that.
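As an illustration of the "reconnection logic" plumbing mentioned in this segment, here is a minimal Python sketch of retrying a dropped streaming connection with capped exponential backoff and jitter. It is not Earmark's or AssemblyAI's code: `backoff_delays`, `connect_with_retry`, and the `connect` callable are hypothetical names, and a real client would pass in a function that opens its WebSocket.

```python
import random
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 0.5, cap: float = 30.0,
                   attempts: int = 6) -> Iterator[float]:
    """Yield capped exponential delays: base, 2*base, 4*base, ... up to cap."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))


def connect_with_retry(connect: Callable[[], object],
                       sleep: Callable[[float], None],
                       jitter: Callable[[], float] = random.random,
                       **backoff_kwargs: float) -> object:
    """Call connect() until it succeeds, sleeping after each failed attempt.

    `connect` is a hypothetical stand-in for opening a streaming session;
    it is assumed to raise ConnectionError when the attempt fails.
    """
    last_error: Optional[ConnectionError] = None
    for delay in backoff_delays(**backoff_kwargs):
        try:
            return connect()
        except ConnectionError as err:
            last_error = err
            # Sleep a random fraction of the delay so that many clients
            # dropped at once do not all reconnect at the same instant.
            sleep(delay * jitter())
    raise ConnectionError("gave up after repeated failures") from last_error
```

In a real client `sleep` would be `time.sleep` (or an async equivalent); passing it in as a parameter keeps the sketch testable without waiting.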

What Earmark is building now

— Nice. And what are you guys focused on building now? Where's the product headed?

— Yeah. So, Mark, the one thing we're trying to focus on is this idea of a true chief of staff for product teams that is predictive in terms of what individuals need and what teams need: the ability to not only task or delegate work to your chief of staff, but also to have proactive tasking. Think about a multiplayer setting. Let's say there's a delivery team that's offshore and had a blocker last Tuesday. Could you be proactively notified of what those blockers are? Day in, day out, what should I truly pay attention to as a product or engineering leader? So for us it's the idea of this co-presence that is really helping with every aspect of your work. We know that product leaders and engineering leaders are so overtaxed, in terms of not only meeting schedules but also the types of deliverables that result from them. One thing we've learned is that every 30 minutes is different in terms of the audiences they're speaking to. Every audience requires a different type of artifact or deliverable, and oftentimes they require different levels of fidelity of those artifacts based on seniority or whatever the immediate need is. That's a huge contextual lift for folks who are working 60 hours a week. So a big part of that is: can we help those folks in their roles create capacity to be more strategic?
For a lot of customers we speak to, oftentimes they haven't talked to customers in months, because they're beholden to the needs of internal teams, the deliverables around those teams, and keeping their teams mobilized and fed. So the idea for us is this chief of staff that creates capacity for folks to do the things that got them into product, or engineering, in the first place. That's the concept of the tool. The other piece is this idea of a second brain, which is similar to the idea of a chief of staff. Could we have a second brain for product teams, which is a queryable pool of context organized by project? And can you essentially have a system of action, relative to a system of record in its more traditional form?

— Like a brain. Honestly, it kind of sounds like a game changer for founders, where they're coming in and out of these meetings with different people, with customers, with employees of the company. Not every meeting looks the same, and not every meeting needs the same artifact to come out of it. So they could really benefit from Earmark by being productive all day when they're on those calls talking to people, and getting things done even though their calendar is full.

— Yeah.
The unlock for our customers, which has been a really cool thing to see, is this idea of unlimited task agents that are running in real time in the background as conversations and the workday progress. A lot of folks we speak to can't imagine not having unlimited task agents operating in the background as conversations progress. So that's a cool unlock for a lot of people.

— It kind of reminds me of my coding workflow, where nowadays I have four different tabs open in my terminal with OpenCode chipping away at different projects. But since Earmark is a real-time product, those terminals are constantly being opened, and we can have way more than just four. It could in theory be unlimited.

— Correct. Yeah. We haven't actually tested the unlimited part, but we're fairly certain that you can put quite a few task agents toward work.

— Really nice. Maybe you guys

Earmark demo

could show me around the product itself, and we can give the audience an idea of what it looks like and how you can use it.

— Yeah, let me pull that up for you real quick. So, welcome to Earmark. I'm on my main page. I have a meeting that I've already pre-recorded here, one of our retros. Basically, what we wanted to do is design almost a utility-type tool that is super straightforward to use and really quick to get into, where a user can start capturing their meeting, and during the meeting or when they're done with it, they can create essentially any artifacts they want, which are completed outputs. As we were talking about a little earlier, one of my favorite use cases is to come up with engineering specs. Based on the meeting and the transcript, it pulls in what it thinks are actual work items that engineers can work on. For example, in this meeting that I have here, the users were talking about a missing 404 page. What's really neat is that I have quick actions like "Build in Cursor", which would open it right up in the external app, and I could literally just start going and get that running from here. Or, if I didn't want to jump into Cursor right now, maybe I need to save this for later. Why not add this right into Linear? Cool. This looks good. I'll create the issue, and now that's on my tracker. Those are examples of getting to action really quickly. For more communication examples, we have a bunch of different templates. If I want to follow up with my team on Slack, I have this nice template that has emojis. It's really short and condensed.
It has all of the action items in here. Or maybe I need a more traditional PRD that I just want to get started on and get to a first draft. All right. So here's a traditional PRD that someone might see. I think most people might be used to chatting to make changes. We're introducing this concept called vibe doc-ing, where on the left is essentially the format of what you see on the right. What's really cool about this is that I can go in here and, say, add emojis under this executive summary section, and it's going to regenerate live for me based on what I tweak in here. Or maybe I add another section, like, I don't know, customer quotes, if there were any quotes in this particular meeting. Let's see what it comes up with. Okay, cool. So yeah, indirect feedback quotes. It's neat because you can really adjust and fine-tune on the spot, and then send this out to your platform of choice when you're done. But yeah, that's Earmark.

— That's really neat. Thanks so much for sharing this with us. I want to take the conversation in another direction and start to think about how the future way we interface with computers and with AI is with voice. You mentioned this chief of staff that lives on your computer. Maybe you guys can share a bit more of a vision of how you think we would use this tech 5 to 10 years from now, when ASR becomes really accurate, really perfect, super fast, and of course we start to build up a lot of this context from work that we've done and artifacts that we've already generated.

— Yeah.
One of our goals is to turn knowledge work from being really reactive to proactive. Imagine a true chief of staff: as you come into work in the morning, as you're walking down the halls, imagine someone saying, "Hey, overnight this vendor might have renegotiated the deal," or maybe there was an engineering team in a different time zone from you and they ran into a blocker, and here's that blocker. Just surfacing those things for you, so you don't have to find out later in the day or have to ask, "What's the status on this?" Those things just come to you, so you can then decide where you want to take action today. It's almost like coming into work every day and knowing the top three important things to work on, and those are all in real time. And the extra kicker is, since we know the context, now imagine if Earmark could actually take action on that. So imagine, okay, now I actually want to delegate some of this work off, or maybe I want to take on some of this work. That's kind of our five-year grand vision.

— Yeah, and the other piece is, we talk a lot about the relevance, or maybe slight irrelevance, of systems of record today, where a lot of work for knowledge workers is essentially entry: making sure that records are kept, so that you can have a credible report to your key stakeholders because everybody is entering whatever is required within their system of record.
We think the future of work is going to evolve to where systems of record might be a little less important, at least in their traditional state, because if you capture all conversational context, nobody has to play scribe and enter these things in order to have visibility into what's happening in R&D, as an example. That's a really powerful unlock. And then, to Sanden's point about proactive agents, or it being smart enough to self-task, this evolution to systems of action is something we're really optimistic about as well. It's not entry, which is sort of passive, where you still have to go and execute whatever the work is within the system of record. The idea that things will basically self-seed in terms of tasking is a really powerful concept and something we really look forward to.

— Yeah, it's really nice.

— Sorry, if I could just add one more thing. I think one of the reasons why we're really bullish on voice is because that's where like 90% of the conversations happen at work. If you imagine you have all of your documents, Slacks, that's great. That's a lot of data. But there's still so much that happens in conversations in meetings or side conversations. Just imagine that also captured in your second brain, and what you could do with it.

— It's really nice. I like the idea of a daily brief coming into work and just being caught up on everything, including conversations that happened while I was asleep, and leveraging that to be more productive in my day. That's a really nice vision for the future. So do you guys have any advice for founders who are building in general, but also with voice AI? I know your product has matured and grown, so maybe you can share some insights from there.
— Yeah, there are two big pieces of advice, I think. One is privacy by design. Voice data is very sensitive by default, so you need to be really intentional about what you store, how long you store it, whether you encrypt it, and whether you avoid storing it. And actually, for us at Earmark, because we view privacy

Advice for founders building voice AI products

so strongly, we do have an option on all of our plans called temporary mode, where we actually don't store the transcript or any of your data at all. There's no retention plan. It literally just bypasses our database completely. So really designing around that and thinking about that is really important. One of the lessons we learned early on, too, for voice AI products is to make the UX really forgiving. When a user is using a voice AI product, they're actually taking action in something else. They could be in a meeting, on a phone call, or in a conversation with someone. The product is almost secondary to them. So if you are trying to capture a conversation and starting that capture takes four button clicks or different configurations, the user is just not going to use it. It needs to be dead obvious: can it be one click, or can it do it for you? And if there's a blip, can it resolve itself or figure things out itself? Removing those decisions from people while they're using something else is a huge thing that's easy to overlook when building voice AI products.

— Yeah. And on more general founder experience: one thing that Sanden and I evolved to is, there's so much out there in the form of best practices and founder content. And there's so much, what's the word I'm looking for, there's so much dogma associated with just being a founder. I think the pro tip is that perspectives are nice, and there are some frameworks that are helpful, but there's nothing more helpful and impactful than lived experience: just charging through, being a founder, and not thinking the dogma is what it really is.
You learn that everybody has an opinion, but you have to figure out what works best for you and your organization in terms of driving it to success. A lot of folks who are on the precipice of becoming founders, or who are in it really early, get caught up in that groupthink in a way that I think is less than productive. So the pro tip there is: advice is great, but chart your own path.

— It's really funny that you mention that, because one of our unofficial company values is "beware the dogma". We were going through a phase where we really grew as a company, and we immediately reached for that enterprise software growth playbook that's common. Part of the reason you guys have things like unlimited concurrency with no enterprise agreement is that we realized that's not how our customers pay for and buy AI products. They want flexibility. They want to know that this is a company they can scale and grow with. Letting go of that traditional playbook has allowed us to serve our customers and keep them really happy. So I think that's really great advice. Thank you so much for sharing that with us, Mark.

— Yeah, absolutely.

— Well, that's all I really had. Thanks for your time. I'll leave your contact information down below if you want to try Earmark yourself or get in touch with Mark and Sanden. Everything will be in the description. Thanks, guys, for your time. This was really such a great conversation.

— Yeah, thank you, Mark. Really appreciate the opportunity.
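The "temporary mode" described in this segment (nothing is stored, the database is bypassed) could be modeled roughly as below. This is a hedged Python sketch, not Earmark's implementation: `TranscriptStore`, `Session`, `handle_segment`, and `end_session` are hypothetical names, and the in-memory dictionary stands in for a real database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TranscriptStore:
    """Hypothetical stand-in for a database of stored transcripts."""
    rows: Dict[str, List[str]] = field(default_factory=dict)

    def append(self, session_id: str, text: str) -> None:
        self.rows.setdefault(session_id, []).append(text)


@dataclass
class Session:
    session_id: str
    temporary: bool  # True means nothing from this session is persisted
    live_buffer: List[str] = field(default_factory=list)


def handle_segment(session: Session, text: str, store: TranscriptStore) -> None:
    """Keep each transcript segment in memory; persist only if allowed."""
    session.live_buffer.append(text)
    if not session.temporary:
        store.append(session.session_id, text)


def end_session(session: Session) -> None:
    """Drop the in-memory transcript when the session closes."""
    session.live_buffer.clear()
```

The design choice in this sketch is that the persistence decision sits in one place (`handle_segment`), so a temporary session never reaches the store at all rather than being written and deleted later.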
