Universal-3 Pro Technical Overview

5:14

AssemblyAI · 03.02.2026 · 274 views · 13 likes

Video description

▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬ 🖥️ Website: https://www.assemblyai.com 🐦 Twitter: https://twitter.com/AssemblyAI 🦾 Discord: https://discord.gg/Cd8MyVJAXd ▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1 🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ #MachineLearning #DeepLearning

Table of contents (2 segments)

Segment 1 (00:00 - 05:00)

Hi, Ryan here from AssemblyAI. I'm excited to announce our newest speech-to-text model, Universal-3 Pro. This is the first model that allows you to add a text prompt input next to your audio files to generate a completely customized transcription output for your particular use case and customers.

Now, let's actually jump in and see what some of these prompting capabilities look like. Our prompt engineering guide walks you through some of the ways that prompts can influence the output of your transcripts. Some things that prompts can do off the bat: increase disfluencies, change style and formatting, add context-aware clues and improve entity accuracy, add speaker attribution and different audio event tags, make sure the model code-switches, and a number of other capabilities that we're still working on documenting and discovering.

With this, I actually want to walk you through some of these example prompts and behaviors so that you can see how the model reacts and changes as we're actually prompting it. To help with this, we're going to be using this GitLab Unfiltered Sec, Growth, and Data Science staff meeting as the sample file for our comparisons.

I whipped up this quick Lovable app to do speech-to-text model comparisons between the different AssemblyAI models. On the left, we're going to have Universal-2. This is our current production model, which has the best price-to-performance of any speech-to-text model on the market. On the right, we're going to have Universal-3 Pro. For this comparison, though, we are not going to add a prompt. The reason is I want to highlight very quickly just Universal-2 versus Universal-3 Pro out of the box, with no prompt customization: what some of the differences are between the models and some of the things that we've improved. You'll see below we're actually marking some of the differences between the two. I'm going to go ahead and play the first little bit of this audio file so you can see some of the differences.
[Sample audio] So it is the Sec, meaning Secure and Govern, Growth, and Data Science, meaning Applied ML, MLOps, and Anti-Abuse, team meeting. That's a big mouthful. We might get a better name over time. Um, and uh, that's our meeting for September 14th, or 15th in APAC. And hi Alan, glad you're here. Why are you here when it's midnight? We could talk. Uh, glad you're here. domain.

[Ryan] So, really interestingly, you can see immediately we had some corrections in the first sentence to fix some broken words. We've capitalized some of the proper nouns. We've also completely fixed the meaning of this sentence: "We could talk." The original actually had that as a completely different meaning. And so just out of the box, Universal-3 Pro has done a bunch of things to make our transcript better.

Now, let's actually start prompting to see what we can do in terms of customization. Since we've already done kind of a simple prompt, I'm going to go down and start doing sample prompt two so that we can see some of the differences when we go ahead and use this prompt. Let's go ahead and plug this into the tool and compare it to no prompt and see what the results look like. With this done, you'll see that the nuances and differences in this file are quite subtle. You can quickly see this "it may" later on in that transcript. Let's actually go and highlight that particular point so we can see what the difference is.

[Sample audio] Glad you're here. Don't make it a habit to come to this meeting since it's really late for you. Uh, and so, but I'm glad. Thank

[Ryan] So you can actually see "it may": when he said that, it was like a stutter and speech hesitation, and now we've properly transcribed that with this simple prompt. This prompt, however, could be a lot more verbose and follow some of the best practices in our prompt engineering guide. Let's go ahead and test an additional prompt to see how we can improve these results.
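The difference marking shown in the comparison app can be approximated with a word-level diff. The sketch below is only an illustration using Python's standard `difflib`; it is not how the app in the video is built, and the function name `word_diff` and the two sample strings are made-up stand-ins rather than real model output.

```python
import difflib


def word_diff(baseline, candidate):
    """Return (tag, baseline_words, candidate_words) for each changed span.

    tag is one of "replace", "delete", or "insert", as reported by difflib.
    """
    a, b = baseline.split(), candidate.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the spans that differ
    ]


if __name__ == "__main__":
    # Hypothetical transcripts standing in for the left and right panels.
    left = "glad you're here we could talk"
    right = "uh glad you're here we could talk"
    for tag, old, new in word_diff(left, right):
        print(f"{tag}: {old!r} -> {new!r}")
```

Splitting on whitespace keeps punctuation attached to words, so a change in punctuation alone is reported as a replaced word. A real comparison tool would likely normalize case and punctuation first.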
Something I noticed when we were listening to that audio is it seems like there are a lot of false starts and hesitations. So I'm actually going to go down to the verbatim section and try one of the different prompts here to see if we can tease out some of those capabilities in the audio file. I've gone ahead and moved the initial prompt that we used to. And now we have this new prompt running on the right. Let's go ahead and compare these results to see what we get. With this new verbatim prompt being used, you can actually see quite quickly how many ums and uhs we've actually added in here. Let's go ahead and scroll to that part of the audio just so we can see what this actually looks like here.

[Sample audio] Mouthful. Um, and uh, that's our meeting, hi Alan, glad you're here. Why are you here when it's m-midnight? We could talk. Uh, glad you're here. Don't make it a habit to come to this meeting since.

[Ryan] So, with that, you can see very quickly how we've customized our transcript and gotten completely different results based on the prompt that we've used. So, there you have it: completely customized transcripts with Universal-3 Pro and prompting. If you're new to AssemblyAI, please reference our quick start guide. You can use the speech models parameter under request to request Universal-3 Pro, and feel free to include the prompt

Segment 2 (05:00 - 05:14)

parameter to start experimenting with the different capabilities of the model. We're really excited to see what you build and looking forward to your feedback so we can keep making the model more and more robust for our different customers' use cases. Thanks.
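As a rough illustration of the request described at the end of the video, the sketch below builds (but does not send) a transcription request using only the Python standard library. The field names `speech_models` and `prompt` follow the wording in the video, but their exact spelling, the model identifier `universal-3-pro`, the example prompt text, and the helper name `build_request` are all assumptions here; check the quick start guide for the authoritative request shape.

```python
import json
import os
import urllib.request

# AssemblyAI's transcript submission endpoint.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_request(audio_url, prompt=None, api_key=""):
    """Build a POST request for a transcription job without sending it."""
    # Assumption: "speech_models" takes a list and "universal-3-pro" is the
    # model identifier; neither is spelled out in the video.
    payload = {"audio_url": audio_url, "speech_models": ["universal-3-pro"]}
    if prompt:
        # Assumption: the text prompt goes in a top-level "prompt" field.
        payload["prompt"] = prompt
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(
        "https://example.com/meeting.mp3",  # placeholder audio URL
        prompt="Transcribe verbatim, keeping filler words and false starts.",
        api_key=os.environ.get("ASSEMBLYAI_API_KEY", ""),
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To submit the job, pass req to urllib.request.urlopen and then poll
    # the transcript id returned in the JSON response.
```

Leaving out the `prompt` argument corresponds to the no-prompt comparison at the start of the video, while passing a verbatim-style prompt corresponds to the final example.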
