Best Free Speech-To-Text APIs and Open Source Libraries
7:33


AssemblyAI · 05.01.2022 · 205,428 views · 1,379 likes

Video description
In this video, we have a look at the best free speech-to-text APIs and also at the top open source libraries for speech recognition!
Get your Free Token for AssemblyAI Speech-To-Text API 👇 https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_pat_6
Converting speech to text is an exciting but also challenging task. Luckily, there are existing solutions available that we can use. We can either use a speech-to-text API or an existing open source engine. Before we have a look at the best free solutions, we also go over the advantages and disadvantages of both approaches.
APIs: Google Speech to Text, AssemblyAI, AWS Transcribe
Open Source Libraries: DeepSpeech, Kaldi, Wav2Letter, SpeechBrain, Coqui

Table of contents (7 segments)

Intro

Do you want to convert speech to text in your own project but don't know where to get started? Then look no further, because in this video we have a look at the best free speech-to-text APIs and also at the top open source libraries for speech recognition. Converting speech to text is

Overview

an exciting but also a challenging task. Luckily, there are existing solutions out there that we can use. Basically, we have two options: we can either use an API or we can use an existing open source library. So in this video we have a look at the best free solutions. Of course, normally you have to pay for an API, but all the services listed in this video also come with a free tier that might be enough for a simple project or to get started with your MVP. So before we have

Advantages

a look at each service and library, let's go over the advantages and disadvantages of both approaches. With an API, it's much easier to get started. You don't even need any deep learning knowledge of how the underlying model actually works. APIs usually offer a well-trained, state-of-the-art language model, so the accuracy is much better, and they can offer additional out-of-the-box features like entity detection or sentiment analysis. But on the downside, you have to pay for the service, and you always need an internet connection to access it.

On the other hand, open source libraries are completely free, and with open source you can see what's going on under the hood, and you can even contribute and help to improve it. Also, by working with open source libraries you learn a lot. But on the downside, they can be difficult to set up, and oftentimes you need a lot of prerequisites. For example, a lot of libraries require a Linux build system, you need a good GPU, and you need programming skills and oftentimes also deep-learning-specific knowledge for a speech-to-text library. So now that we

Google SpeechToText

know about the different pros and cons of each approach, let's go over the different options we have. First, let's have a look at the different speech-to-text APIs that also come with a free tier. Google's Speech-to-Text API is probably the most popular API for speech recognition. They offer 60 minutes of free transcription per month, and as a new user you also get $300 in free credits for Google Cloud. After that, it costs $0.006 per 15 seconds or $0.009 per 15 seconds, depending on the options you choose. Their API has good accuracy and support for over 60 different languages. On the downside, you need to sign up for a Google Cloud account and create a project in there, and it's surprisingly

AssemblyAI

complicated to get started with it. Next, we have a look at AssemblyAI. AssemblyAI offers a state-of-the-art speech-to-text API which is built for developers. Their API documentation is great, and they also provide a lot of tutorials, so you can get started and integrate speech recognition into your app in under five minutes. With the free tier, you can transcribe three hours of audio content each month, and after that pricing is very straightforward: transcribing simply costs $0.00025 per second. This results in $0.00375 per 15 seconds, compared to the $0.006 per 15 seconds we have with Google. Additional optional audio intelligence features cost $0.000… per second on top, which makes the total amount still pretty cheap. And these features are awesome: you can get sentiment analysis, content summarization, topic detection, entity detection, and much more, and all of this can be obtained with a few simple API calls.

Now, on the downside, as of today AssemblyAI only supports English transcription, but more language models will be available soon. Also, their SDKs are still a little bit limited, but their API is so easy to work with that it allows for a quick setup with native HTTP libraries in any programming language. So out of all options in this video, I think this

AWS Transcribe

is the easiest one to set up. And the last API option I want to show you is the AWS Transcribe service. The free tier offers one hour free per month for the first 12 months of use. Pricing can vary depending on different options, but in the first category it is, for example, $0.024 per minute, which is $0.006 per 15 seconds, so the same that we have with Google. Getting started in the AWS ecosystem can be a complex process, but once you have set this up, this is also a reliable API. And if you're looking for a specific feature like medical transcription, AWS has some intriguing options, for example the Transcribe Medical API, a medical-focused speech recognition service.

Now let's move on to explore some completely free open source libraries. DeepSpeech is an open source embedded speech-to-text engine designed to run offline in real time on a range of devices, from high-power GPU servers to a Raspberry Pi. The DeepSpeech library uses an end-to-end model architecture pioneered by Baidu, and the implementation is based on TensorFlow. DeepSpeech has decent out-of-the-box accuracy and is relatively easy to tune and train on your own data.

Kaldi is a speech recognition toolkit written in C++ that has been widely popular in the research community for many years. Like DeepSpeech, Kaldi has good out-of-the-box accuracy and supports the ability to train your own models. I leave it up to you if you like their documentation pages, but if you know your way around the toolkit and are comfortable with C++, it's one of the best production-ready open source libraries out there.

Wav2Letter is Facebook AI's automatic speech recognition toolkit, also written in C++. Wav2Letter has been moved and consolidated into another repository, namely into the Flashlight project, which is a C++ standalone library for machine learning. Like DeepSpeech, Wav2Letter is decently accurate for an open source library and is easy to work with on a small project, and I also like their documentation on the GitHub pages, which is easy to follow.

SpeechBrain is a PyTorch-based all-in-one conversational AI toolkit. The goal is to create a single flexible and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognition, speech enhancement, speech separation, and many others. Getting started is simpler than in many other open source speech libraries, and it offers various pre-trained models nicely integrated with Hugging Face. So if you

Coqui

like PyTorch, then this is my recommendation for you. And the final open source library is Coqui. Coqui STT is a fast, multi-platform deep learning toolkit for training and deploying speech-to-text models. It's battle-tested in both production and research and has support for over 20 different languages.

Alright, I hope I could give you a nice overview of the different options you have, and if you know any other good APIs or free open source libraries, then let us know in the comments below. In the end, it's up to you which one you want to use. I personally love open source libraries, and it's amazing how far we've come there, but sometimes I don't have the computational resources or the time to set this up, so APIs are a pretty good alternative here. I also recommend watching this video, where you learn how to build an app with the AssemblyAI API in under five minutes. It's free to get started and really simple to set up, so why not give it a try? And if you enjoyed this video, then leave us a like, and then I hope to see you in the next video. Bye!
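The per-15-second prices and free tiers quoted for the three APIs can be compared with a short calculation. The following is a minimal Python sketch, not an official pricing tool: the numbers are the ones stated in the video (as of early 2022) and may no longer be current, the function names `cost_per_15_seconds` and `monthly_cost` are hypothetical, and the sketch assumes simple per-second billing with no rounding to billing increments and only the base pricing category of each service.

```python
# Per-second prices derived from the figures quoted in the video (early 2022).
# Illustrative only; real pricing has tiers and options not modeled here.
PRICE_PER_SECOND = {
    "Google Speech-to-Text": 0.006 / 15,  # $0.006 per 15 seconds
    "AssemblyAI": 0.00025,                # $0.00025 per second
    "AWS Transcribe": 0.024 / 60,         # $0.024 per minute
}

# Monthly free tiers in seconds, as stated in the video.
FREE_SECONDS_PER_MONTH = {
    "Google Speech-to-Text": 60 * 60,     # 60 minutes
    "AssemblyAI": 3 * 60 * 60,            # 3 hours
    "AWS Transcribe": 60 * 60,            # 1 hour (first 12 months only)
}


def cost_per_15_seconds(service: str) -> float:
    """Price of 15 seconds of audio, the unit used for comparison in the video."""
    return PRICE_PER_SECOND[service] * 15


def monthly_cost(service: str, audio_seconds: float) -> float:
    """Estimated monthly cost in dollars after subtracting the free tier.

    Assumption: billing is strictly per second, with no rounding up to
    15-second or other increments.
    """
    billable = max(0.0, audio_seconds - FREE_SECONDS_PER_MONTH[service])
    return round(billable * PRICE_PER_SECOND[service], 2)


if __name__ == "__main__":
    ten_hours = 10 * 60 * 60
    for name in PRICE_PER_SECOND:
        print(f"{name}: ${cost_per_15_seconds(name):.5f} per 15 s, "
              f"${monthly_cost(name, ten_hours):.2f} for 10 h in one month")
```

Under these assumptions, any usage that stays inside a service's free tier comes out at zero, and above it the gap between the providers grows linearly with the amount of audio.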
