# Universal: The Most Powerful Speech-to-Text Ever | Demo & Tutorial

## Metadata

- **Channel:** AssemblyAI
- **YouTube:** https://www.youtube.com/watch?v=w09b30BI0lk
- **Date:** 30.10.2024
- **Duration:** 3:23
- **Views:** 167,073

## Description

Universal: A next-gen speech-to-text model pushing beyond traditional WER (word error rate) metrics. Built on Universal-1's industry-leading performance in just 6 months.

Key results:

- 24% better at recognizing proper nouns
- 21% improvement in alphanumeric accuracy
- 15% enhanced text formatting
- 73% of users prefer Universal-2 compared to Universal-1
- Overall more accurate and robust model, especially on real-world speech complexity
- Sets new standards across human and technical benchmarks

Architecture:

- Smart architecture choices prioritized over simply scaling model size
- Universal-2 uses a 660M parameter Conformer RNN-T model
- Built an innovative all-neural formatting pipeline
- Solved critical challenges like repeated token handling in RNN-T

- Announcement landing page: https://www.assemblyai.com/universal-2
- Try it yourself: https://www.assemblyai.com/playground
- Google Colab: https://colab.research.google.com/drive/1IP_RFufO_-iQVICDEtTbqqHSTgqWPNmD?usp=sharing


#MachineLearning #DeepLearning

## Contents

### [0:00](https://www.youtube.com/watch?v=w09b30BI0lk) Introduction

In the world of speech recognition, accuracy isn't just a metric, it's everything. Today, we're proud to announce Universal-2, our most accurate speech-to-text model yet, which has been trained on over 12.5 million hours of audio data. Here's what's improved in Universal-2. There is a 24% improvement in the recognition of rare words like names, brands, and locations. There's also been a 15% improvement in transcript structure, with proper punctuation and casing across things like emails, dates, and dollar amounts, and a 21% improvement in alphanumeric accuracy, so higher accuracy across critical data like phone numbers, zip codes, and other numerical identifiers. Here's how Universal-2 performs across those three key areas when compared to other speech-to-text models. Universal-2 has the lowest word error rate across these three key areas.
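Since the comparison above is stated in terms of word error rate (WER), here is a minimal sketch of how that metric is commonly computed: the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function name and the text normalization (lowercasing and whitespace splitting) are assumptions made for illustration; the video does not describe how the benchmark figures were normalized or scored.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

For example, a hypothesis that drops one word from a six-word reference scores 1/6, and a perfect transcript scores 0.0.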

### [0:52](https://www.youtube.com/watch?v=w09b30BI0lk&t=52s) Demo

Now, let's see Universal-2 in action. In the description box below, you'll see a link to this Google Colab, so you too can try it out and test how Universal-2 can be deployed. The very first thing we're doing is importing AssemblyAI and defining our AssemblyAI API key. You can also check out the link in the description box below to get your free AssemblyAI API key to test this out. This code snippet right here helps us do speech recognition with AssemblyAI. We're making use of this audio file on hand, but feel free to replace that with whatever audio file you want to use. We're also making use of the AssemblyAI transcriber object and its transcribe function, which we pass the audio file to. Once we do that, we simply print out the transcript to see it.

### [1:39](https://www.youtube.com/watch?v=w09b30BI0lk&t=99s) Speaker Diarization

With Universal-2, you can also do speaker diarization in just a few lines of code. So here's exactly how you would go about doing it. The main thing is, of course, to set speaker labels to true in our transcription config object. Once you do that, you can print out each speaker as well as what they're uttering. And this is exactly what our printed-out transcript would look like.
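The two steps described so far (basic transcription, then turning on speaker labels) can be sketched with the AssemblyAI Python SDK roughly as follows. This is not the code from the linked Colab: the function names, the `ASSEMBLYAI_API_KEY` environment variable, and the audio path in the usage note are assumptions chosen for illustration, and the SDK is imported inside the function so the formatting helper can be used without it installed.

```python
import os


def format_utterances(utterances):
    """Render speaker-labelled utterances as 'Speaker X: text' lines."""
    return [f"Speaker {u.speaker}: {u.text}" for u in utterances]


def transcribe_with_speakers(audio_path):
    """Transcribe a local file or URL with speaker labels turned on."""
    # Requires `pip install assemblyai`; imported lazily on purpose.
    import assemblyai as aai

    # Assumption: the API key is stored in this environment variable.
    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

    config = aai.TranscriptionConfig(speaker_labels=True)
    transcript = aai.Transcriber().transcribe(audio_path, config=config)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return transcript
```

With a key set, `transcript = transcribe_with_speakers("./my_audio.mp3")` (a hypothetical path) gives the full text as `transcript.text`, and `print("\n".join(format_utterances(transcript.utterances)))` prints one line per speaker turn.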

### [2:05](https://www.youtube.com/watch?v=w09b30BI0lk&t=125s) Audio Intelligence Tasks

Universal-2 also enables you to do a wide range of audio intelligence tasks at high accuracy: things like sentiment analysis, summarization, PII redaction, and many more. So here's an example of how you would do summarization with AssemblyAI. All you have to do is modify the transcription config: set summarization to true, select a summarization model, in this case informative, and then also set the summarization type. Once you print out your transcript summary, this is exactly what it would look like. We have a summary in bullet points.
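A minimal sketch of the summarization config described above, using the AssemblyAI Python SDK. The video names the `informative` model but not the summary type; `bullets` is assumed here because the output shown is a bulleted summary. The function names, the `ASSEMBLYAI_API_KEY` environment variable, and the `summary_bullets` helper are hypothetical and are not taken from the Colab.

```python
import os


def summarize(audio_path):
    """Transcribe audio and return the generated summary string."""
    # Requires `pip install assemblyai`; imported lazily on purpose.
    import assemblyai as aai

    # Assumption: the API key is stored in this environment variable.
    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

    config = aai.TranscriptionConfig(
        summarization=True,
        summary_model=aai.SummarizationModel.informative,
        # Assumed type, inferred from the bullet-point output in the video.
        summary_type=aai.SummarizationType.bullets,
    )
    transcript = aai.Transcriber().transcribe(audio_path, config=config)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return transcript.summary


def summary_bullets(summary):
    """Split a bullet-style summary string into a list of plain points."""
    return [
        line.lstrip("-•* ").strip()
        for line in summary.splitlines()
        if line.strip()
    ]
```

The helper is only a convenience for post-processing; printing the summary string returned by `summarize` directly matches what the video shows.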

### [2:43](https://www.youtube.com/watch?v=w09b30BI0lk&t=163s) Sentiment Analysis

Next up is sentiment analysis. Similarly, you turn on the sentiment analysis model by setting it to true in the transcription config. When printing the results, you can print out things like the text, the sentiment, the confidence score, and the timestamp at which it was uttered. So, here's exactly what your printed-out transcript would look like. To find out more about Universal-2 and all the major improvements, check out the link in the description box below. And to learn more about all the audio intelligence tasks that you can use AssemblyAI for, check out our documentation page.
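The sentiment analysis step above can be sketched in the same way. As before, the function names, the `ASSEMBLYAI_API_KEY` environment variable, and the output format are assumptions for illustration rather than the Colab's code. The formatter also assumes that `start` and `end` are in milliseconds, which is how the API reports timestamps to my understanding.

```python
import os


def format_sentiment(result):
    """Format one sentiment result as '[start-end] LABEL (confidence): text'."""
    # The SDK returns an enum for the label; plain strings are accepted too.
    label = getattr(result.sentiment, "value", result.sentiment)
    start_s = result.start / 1000  # assumed milliseconds -> seconds
    end_s = result.end / 1000
    return f"[{start_s:.2f}s-{end_s:.2f}s] {label} ({result.confidence:.2f}): {result.text}"


def analyze_sentiment(audio_path):
    """Transcribe audio with sentiment analysis and return the results."""
    # Requires `pip install assemblyai`; imported lazily on purpose.
    import assemblyai as aai

    # Assumption: the API key is stored in this environment variable.
    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

    config = aai.TranscriptionConfig(sentiment_analysis=True)
    transcript = aai.Transcriber().transcribe(audio_path, config=config)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return transcript.sentiment_analysis
```

Looping over `analyze_sentiment(...)` and printing `format_sentiment(result)` for each item yields one line per analyzed segment, covering the four fields mentioned in the video.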

---
*Source: https://ekstraktznaniy.ru/video/12555*