Universal: The Most Powerful Speech-to-Text Ever | Demo & Tutorial

AssemblyAI, 30.10.2024, 167,073 views, 104 likes

Video description

Universal: A next-gen speech-to-text model pushing beyond traditional WER (word error rate) metrics. Built on Universal-1's industry-leading performance in just 6 months.

Key results:
24% better at recognizing proper nouns
21% improvement in alphanumeric accuracy
15% enhanced text formatting
73% of users prefer Universal-2 compared to Universal-1
Overall more accurate and robust model especially on real-world speech complexity
Sets new standards across human and technical benchmarks

Architecture:
Smart architecture choices prioritized over simply scaling model size
Universal-2 uses a 660M parameter Conformer RNN-T model
Built an innovative all-neural formatting pipeline
Solved critical challenges like repeated token handling in RNN-T

Announcement Landing Page: https://www.assemblyai.com/universal-2
Try it yourself: https://www.assemblyai.com/playground
Google Colab: https://colab.research.google.com/drive/1IP_RFufO_-iQVICDEtTbqqHSTgqWPNmD?usp=sharing

CONNECT
🖥️ Website: https://www.assemblyai.com
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers

#MachineLearning #DeepLearning

Table of contents (5 segments)

Introduction

In the world of speech recognition, accuracy isn't just a metric, it's everything. Today, we're proud to announce Universal-2, our most accurate speech-to-text model yet, which has been trained on over 12.5 million hours of audio data. Here's what's improved in Universal-2. There is a 24% improvement in the recognition of rare words like names, brands, and locations. There's also been a 15% improvement in transcript structure, with proper punctuation and casing across things like emails, dates, and dollar amounts, and a 21% improvement in alphanumeric accuracy, so higher accuracy across critical data like phone numbers, zip codes, and other numerical identifiers. Here's how Universal-2 performs across those three key areas when compared to other speech-to-text models. Universal-2 has the lowest word error rate across these three key areas.
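For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below shows that textbook definition only; the function name and the lowercase/whitespace normalization are assumptions for illustration, and it is not AssemblyAI's evaluation pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Simplistic normalization (an assumption): lowercase, split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better, and a value above 1.0 is possible when the output contains many extra words.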

Demo

Now, let's see Universal-2 in action. In the description box below, you'll see a link to this Google Colab, so you too can try this out and test how Universal-2 can be deployed. The very first thing we're doing is importing AssemblyAI and defining our AssemblyAI API key. You can also check out the link in the description box below to get your free AssemblyAI API key to test this out. This code snippet right here helps us do speech recognition with AssemblyAI. We're making use of this audio file on hand, but feel free to replace that with whatever audio file you want to use. We're also making use of the AssemblyAI transcriber object and the transcribe function, which we pass the audio file to. Once we do that, we simply print out the transcript to see it.

Speaker Diarization

With Universal-2, you can also do speaker diarization in just a few lines of code. So here's exactly how you would go about doing it. The main thing is, of course, to set speaker labels to true in our transcription config object. Once you do that, you can print out each speaker as well as what they're uttering. And this is exactly what our printed-out transcript would look like.
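The notebook code is not reproduced in the transcript, so what follows is only a sketch of the two steps described: a plain transcription, then a run with speaker labels enabled. It assumes the `assemblyai` Python SDK is installed; the environment variable name `ASSEMBLYAI_API_KEY` and the helper function names are hypothetical.

```python
import os


def format_utterances(utterances):
    """Render diarized utterances as 'Speaker X: text' lines."""
    return [f"Speaker {u.speaker}: {u.text}" for u in utterances]


def _client():
    # Imported lazily so format_utterances stays usable without the SDK installed.
    import assemblyai as aai

    # ASSEMBLYAI_API_KEY is an assumed environment variable name.
    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
    return aai


def transcribe_plain(audio):
    """Return the transcript text for a local file path or URL."""
    aai = _client()
    transcript = aai.Transcriber().transcribe(audio)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return transcript.text


def transcribe_with_speakers(audio):
    """Return one 'Speaker X: text' line per utterance."""
    aai = _client()
    # speaker_labels=True is the setting the video describes turning on.
    config = aai.TranscriptionConfig(speaker_labels=True)
    transcript = aai.Transcriber().transcribe(audio, config=config)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return format_utterances(transcript.utterances)
```

With a valid key and network access, `print(transcribe_plain("interview.mp3"))` would print the full text, and `print("\n".join(transcribe_with_speakers("interview.mp3")))` would print one line per speaker turn; the file name here is a placeholder.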

Audio Intelligence Tasks

Universal-2 also enables you to do a wide range of audio intelligence tasks at high accuracy: things like sentiment analysis, summarization, PII redaction, and many more. So here's an example of how you would do summarization with AssemblyAI. All you have to do is modify the transcription config: set summarization to true, select a summarization model, in this case informative, and then also set the summarization type. Once you print out your transcript summary, this is exactly what it would look like. We have a summary in bullet points.
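A minimal sketch of the summarization settings described above, assuming the `assemblyai` Python SDK. The informative model comes from the video; the bullets type is inferred from the bullet-point output it shows. The helper names, the environment variable name, and the assumption that each bullet arrives on its own line with a leading marker are illustrative only.

```python
import os


def split_bullets(summary):
    """Split a bullet-style summary string into a list of plain items."""
    items = []
    for line in summary.splitlines():
        # Assumes one bullet per line with a leading "-", "*" or "•" marker.
        item = line.strip().lstrip("-*•").strip()
        if item:
            items.append(item)
    return items


def summarize(audio):
    """Transcribe with summarization enabled and return the summary items."""
    # Imported lazily so split_bullets stays usable without the SDK installed.
    import assemblyai as aai

    # ASSEMBLYAI_API_KEY is an assumed environment variable name.
    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

    config = aai.TranscriptionConfig(
        summarization=True,
        summary_model=aai.SummarizationModel.informative,
        summary_type=aai.SummarizationType.bullets,
    )
    transcript = aai.Transcriber().transcribe(audio, config=config)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return split_bullets(transcript.summary)
```

Printing `transcript.summary` directly, as the video does, is enough for a quick look; splitting it into items is only useful if the bullets feed into something else.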

Sentiment Analysis

Next up is sentiment analysis. Similarly, you turn on the sentiment analysis model by setting it to true in the transcription config. You can then print out things like the text, the sentiment, the confidence score, and the timestamp at which that text was uttered. So, here's exactly what your transcript would look like when it's printed out. To find out more about Universal-2 and all the major improvements, check out the link in the description box below. And to learn more about all the audio intelligence tasks that you can use AssemblyAI for, check out our documentation page.
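The sentiment analysis step can be sketched the same way, again assuming the `assemblyai` Python SDK. The output layout, the helper names, and the environment variable name are assumptions; the fields printed (text, sentiment, confidence, timestamps) are the ones the video mentions, with timestamps converted from milliseconds to seconds.

```python
import os


def format_sentiment(result):
    """Render one sentiment result as '[start-end] LABEL (confidence): text'."""
    # The label may be an enum member or a plain string; use .value when present.
    label = getattr(result.sentiment, "value", result.sentiment)
    start_s = result.start / 1000  # timestamps are in milliseconds
    end_s = result.end / 1000
    return f"[{start_s:.2f}s-{end_s:.2f}s] {label} ({result.confidence:.2f}): {result.text}"


def analyze_sentiment(audio):
    """Transcribe with sentiment analysis enabled and return formatted lines."""
    # Imported lazily so format_sentiment stays usable without the SDK installed.
    import assemblyai as aai

    # ASSEMBLYAI_API_KEY is an assumed environment variable name.
    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

    config = aai.TranscriptionConfig(sentiment_analysis=True)
    transcript = aai.Transcriber().transcribe(audio, config=config)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return [format_sentiment(r) for r in transcript.sentiment_analysis]
```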
