# Conformer-1: a new large scale/robust speech recognition model

## Metadata

- **Channel:** AssemblyAI
- **YouTube:** https://www.youtube.com/watch?v=hkChdbq7IQI
- **Date:** March 22, 2023
- **Duration:** 5:04
- **Views:** 302,362

## Description

We're introducing Conformer-1, a state-of-the-art speech recognition model trained on 650K hours of audio data that achieves near human-level performance and robustness across a variety of data.

Our results demonstrate that Conformer-1 is more robust on real-world data than popular ASR models, making up to 43% fewer errors on noisy data, and achieving state-of-the-art results on a wide variety of academic and real-world datasets compared to other ASR models.

Conformer-1 release blog: https://www.assemblyai.com/blog/conformer-1/?utm_source=youtube&utm_medium=referral&utm_campaign=conformer1

AssemblyAI Playground: https://www.assemblyai.com/playground/

▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬

🖥️ Website: https://www.assemblyai.com?utm_source=youtube&utm_medium=referral&utm_campaign=conformer1
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️  Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#MachineLearning #DeepLearning

## Contents

### [0:00](https://www.youtube.com/watch?v=hkChdbq7IQI) Segment 1 (00:00 - 05:00)

AssemblyAI just released a new speech recognition model: Conformer-1. Conformer-1 achieves near human-level performance and robustness across a variety of data. It was trained on 650,000 hours of audio, which corresponds to a 60-terabyte dataset. To put that into perspective, most production ASR systems are trained on 50,000 to 100,000 hours of data, which makes Conformer-1's training set nearly 10 times bigger. Because it was trained on so much data, Conformer-1 is highly robust and accurate across a number of different datasets: it makes 43% fewer errors on noisy data and achieves state-of-the-art results on a wide range of academic and real-world datasets compared to other ASR models. Here are some examples: "Can you describe two years?" "When I came back to bring a championship to the city, I gave you everything that I had. I brought my heart, my blood, my sweat, my tears to this game."

Now let's see how it works. The Conformer-1 model combines the Conformer architecture with the findings of the recent Chinchilla paper on compute-optimal model training. Conformer is an architecture developed by Google Brain for speech recognition back in 2020.
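The ordering of sub-modules inside a Conformer block can be sketched in NumPy. This is a deliberately simplified toy: single-head attention instead of multi-head with relative positional encoding, a bare depthwise convolution instead of the full convolution module (pointwise conv, GLU, batch norm), and random weights — only the Macaron-style sandwich structure (half-step FFN → self-attention → convolution → half-step FFN → LayerNorm, each with a residual connection) follows the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 20, 16                           # toy sequence length and model dimension

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def feed_forward(x, W1, W2):
    # pointwise feed-forward with swish activation
    h = x @ W1
    h = h * (1 / (1 + np.exp(-h)))      # swish(x) = x * sigmoid(x)
    return h @ W2

def self_attention(x, Wq, Wk, Wv):
    # single-head scaled dot-product attention (stand-in for MHSA)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v

def depthwise_conv(x, kernel):
    # per-channel 1-D convolution over time, "same" padding
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        out[t] = (xp[t:t + k] * kernel[:, None]).sum(0)
    return out

def conformer_block(x, p):
    # Macaron sandwich: ½FFN -> attention -> conv -> ½FFN -> LayerNorm,
    # each sub-module wrapped in a residual connection
    x = x + 0.5 * feed_forward(layer_norm(x), p["W1a"], p["W2a"])
    x = x + self_attention(layer_norm(x), p["Wq"], p["Wk"], p["Wv"])
    x = x + depthwise_conv(layer_norm(x), p["kernel"])
    x = x + 0.5 * feed_forward(layer_norm(x), p["W1b"], p["W2b"])
    return layer_norm(x)

params = {
    "W1a": rng.normal(0, 0.1, (d, 4 * d)), "W2a": rng.normal(0, 0.1, (4 * d, d)),
    "Wq": rng.normal(0, 0.1, (d, d)), "Wk": rng.normal(0, 0.1, (d, d)),
    "Wv": rng.normal(0, 0.1, (d, d)), "kernel": rng.normal(0, 0.1, 5),
    "W1b": rng.normal(0, 0.1, (d, 4 * d)), "W2b": rng.normal(0, 0.1, (4 * d, d)),
}

x = rng.normal(size=(T, d))             # e.g. a sequence of audio-frame features
y = conformer_block(x, params)
print(y.shape)                          # output keeps the input's shape
```

The convolution gives each frame access to its local neighborhood while attention mixes information across the whole sequence, which is the "local plus global" combination the transcript describes.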
It combines the strengths of convolutional neural networks and the very famous Transformers: it captures both local and global dependencies while being a relatively size-efficient architecture. Even though the Conformer architecture has shown state-of-the-art performance in automatic speech recognition, it has a serious disadvantage: computational and memory inefficiency. The attention mechanism inside the Conformer architecture is essential for capturing and retaining long-term information in an input sequence, yet it is known to cause a computational bottleneck. This makes the Conformer slower during training and inference than other architectures and poses a serious engineering challenge for deploying the model inside large-scale ASR systems, where speed is one of the key priorities.

With Conformer-1, a couple of steps were taken to overcome these issues: using the Efficient Conformer, a faster and more robust variant of the Conformer architecture, as the base model, and using sparse attention to improve the model's performance on noisy data. To learn more about the architectural details, take a look at the in-depth model release blog.

All right, but why train on that much data? Because recent research shows that most neural nets are undertrained. Over the last couple of years we've seen a lot of large language models come out, such as BERT, GPT-4, ChatGPT, PaLM, and Megatron, and even more are on the way. The main focus of these models has been to increase the model size, where size means the number of parameters. I'm sure you've seen deep learning companies highlight the number of parameters when they're talking about a new model they're launching. This practice of increasing the model size to achieve better performance is rooted in the paper "Scaling Laws for Neural Language Models" by Kaplan et al., which proposes that for every 10x increase in compute budget, the model size should increase 5.5x, whereas the number of training tokens should only increase 1.8x.

Recent research shows, however, that the amount of data a model is trained on should double every time the model size doubles. To arrive at this conclusion, a team of researchers at DeepMind trained over 400 different models with varying numbers of parameters and training tokens, ranging from 70 million to 16 billion parameters and from 5 billion to 500 billion training tokens. Their findings suggest that current models are considerably oversized given their respective compute budgets, because they were trained by blindly following the earlier scaling laws of Kaplan et al.

By adopting and improving the Conformer architecture, scaling the training data in accordance with the findings of the Chinchilla paper, and training on human-labeled and noisy data, Conformer-1 achieves robustness and state-of-the-art accuracy. Conformer-1 makes 43% fewer errors on average on noisy data compared to popular commercially available ASR models as well as open-source models like Whisper. This comparison was done using more than 60 hours of human-labeled data covering domains such as webinars, call centers, broadcasts, and podcasts. Conformer-1 also generalizes well and maintains high accuracy, in terms of word error rate, across a number of academic datasets. Conformer-1 is available through AssemblyAI's API for free. If you'd like to try it out right away, head to AssemblyAI's playground at assemblyai.com/playground.
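The two scaling recipes differ in how a bigger compute budget is split between parameters and data. A quick back-of-the-envelope comparison, using only the figures quoted in the video (Kaplan: 5.5x parameters and 1.8x tokens per 10x compute; Chinchilla: tokens double whenever parameters double, so each grows roughly as the square root of compute, since training compute is roughly proportional to parameters times tokens):

```python
import math

def kaplan_split(compute_mult):
    # Kaplan et al.: per 10x compute, params x5.5 and tokens x1.8,
    # i.e. params ~ C^log10(5.5), tokens ~ C^log10(1.8)
    return compute_mult ** math.log10(5.5), compute_mult ** math.log10(1.8)

def chinchilla_split(compute_mult):
    # Chinchilla: data doubles whenever model size doubles, so
    # params and tokens each grow ~ sqrt(compute)
    s = math.sqrt(compute_mult)
    return s, s

for c in (10, 100):
    kp, kt = kaplan_split(c)
    cp, ct = chinchilla_split(c)
    print(f"{c:>4}x compute | Kaplan: params x{kp:.1f}, tokens x{kt:.1f} | "
          f"Chinchilla: params x{cp:.1f}, tokens x{ct:.1f}")
```

Note that 5.5 × 1.8 ≈ 10, so both recipes spend the full 10x budget; they just allocate it differently. At 100x compute the gap is stark: Kaplan puts almost all of the budget into parameters (about 30x) with little extra data (about 3x), while Chinchilla grows both 10x, which is why models trained under the older recipe come out undertrained for their size.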

### [5:00](https://www.youtube.com/watch?v=hkChdbq7IQI&t=300s) Segment 2 (05:00 - 05:00)

Stay tuned for more to come from AssemblyAI.
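The transcript mentions that Conformer-1 is served through AssemblyAI's API. Below is a sketch of a transcription request using only the Python standard library. The endpoint shapes (`POST /v2/transcript` with an `audio_url`, then polling `GET /v2/transcript/{id}` until the status is `completed`) are taken from AssemblyAI's public docs as I recall them and should be checked against the current documentation; the API key and audio URL are placeholders.

```python
import json
import time
import urllib.request

API_BASE = "https://api.assemblyai.com/v2"   # assumed from AssemblyAI's public docs

def build_transcript_request(api_key, audio_url):
    # Submission request: POST /v2/transcript with a JSON body
    # pointing at a publicly reachable audio file.
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=json.dumps({"audio_url": audio_url}).encode(),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )

def transcribe(api_key, audio_url, poll_seconds=3):
    # Submit the job, then poll GET /v2/transcript/{id} until it finishes.
    with urllib.request.urlopen(build_transcript_request(api_key, audio_url)) as r:
        job = json.load(r)
    while True:
        poll = urllib.request.Request(
            f"{API_BASE}/transcript/{job['id']}",
            headers={"authorization": api_key},
        )
        with urllib.request.urlopen(poll) as r:
            job = json.load(r)
        if job["status"] in ("completed", "error"):
            return job
        time.sleep(poll_seconds)

# Usage (requires a real API key from assemblyai.com):
# result = transcribe("YOUR_API_KEY", "https://example.com/audio.mp3")
# print(result["text"])
```

Polling is the simplest integration pattern; for production use the docs also describe webhook callbacks so you don't have to poll.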

---
*Source: https://ekstraktznaniy.ru/video/12663*