How to Use OpenAI's Whisper for Perfect Transcriptions (Speech to Text)
8:13

Teacher's Tech | 08.10.2025 | 47,998 views | 1,135 likes | updated 18.02.2026
Video description

In this step-by-step tutorial, I show you how to use OpenAI's Whisper AI to get incredibly accurate transcriptions from any audio or video file, completely for free. Turn your audio files into text. Stop paying for expensive services or wasting time transcribing manually! Whisper AI is a powerful tool that can handle 99 different languages with near human-level precision. I'll walk you through the entire process using a free Google Colab notebook, which means you don't need a powerful computer to follow along. We'll cover how to choose the right model, run the transcription command, and understand the different output files, including the .srt file you can use for video captions. Whether you're a content creator, student, researcher, or professional who needs to transcribe meetings, this guide will show you everything you need to know to get started with one of the best AI tools available today.

⬇️ COPY & PASTE COMMANDS FOR GOOGLE COLAB ⬇️

1. Install Whisper AI & FFmpeg (run this cell first to set up the environment):

    !pip install git+https://github.com/openai/whisper.git
    !sudo apt update && sudo apt install ffmpeg

2. Run Whisper AI (change the file name and model size to fit your needs):

    !whisper "ENTER FILE NAME HERE" --model base.en

3. View All Commands (Help Menu) (see all possible arguments and options):

    !whisper -h

TIMESTAMPS:
0:00 - Introduction
0:32 - Getting Started with Google Colaboratory
1:23 - Configuring Your Google Colab Environment
1:58 - Installing Whisper AI & FFmpeg
2:35 - Uploading Your Audio or Video File
3:02 - Running Whisper AI (Choosing Your Model)
4:52 - Reviewing the Output Files (.txt, .srt, .vtt, .tsv)
5:57 - How to Transcribe Another File
6:55 - Exploring Additional Parameters with -h
7:50 - Final Thoughts & Wrap Up

Table of contents (10 segments)

  1. 0:00 Introduction (95 words)
  2. 0:32 Getting Started with Google Colaboratory (161 words)
  3. 1:23 Configuring Your Google Colab Environment (132 words)
  4. 1:58 Installing Whisper AI & FFmpeg (123 words)
  5. 2:35 Uploading Your Audio or Video File (104 words)
  6. 3:02 Running Whisper AI (Choosing Your Model) (353 words)
  7. 4:52 Reviewing the Output Files (.txt, .srt, .vtt, .tsv) (212 words)
  8. 5:57 How to Transcribe Another File (192 words)
  9. 6:55 Exploring Additional Parameters with -h (192 words)
  10. 7:50 Final Thoughts & Wrap Up (66 words)
0:00

Introduction

Hey, I'm Jamie and welcome back to Teacher's Tech. Tired of slowly transcribing audio or paying for expensive services? Today I'm going to show you how you can get incredibly accurate transcriptions in almost 100 languages, completely for free. That's the power of OpenAI's Whisper. And today I'm going to show you how to use it. We'll run it for free using Google Colab to get highly accurate text from an audio file. By the end of this video, you'll be able to get a full transcription and even a timed subtitle file in minutes. Let's dive
0:32

Getting Started with Google Colaboratory

in. Let's head over to Google Drive. I'm already logged in here with a completely free account. I'm just going to go to my Google Drive. And the first thing we're going to need to do is install Google Colaboratory. Head over to the New drop-down, to where it says More, and you can see Connect more apps. When you select this, you're going to need to do a search for Google Colaboratory. Even if you just start typing Colaboratory, you'll see it come up right here. This is what we want to install: Colaboratory, by the Colaboratory team. We're just going to click Install on this. Go ahead and continue. You'll get to a point where you have to continue again. We'll click OK here. And let's just close out of here. And now if we go back to New and More, you're going to see Google Colaboratory right here. Let's go ahead and click on this.
1:23

Configuring Your Google Colab Environment

Now we're inside Google Colaboratory. First thing: if this looks completely new to you, don't get intimidated by it. I'll walk you through step by step and you'll see that it's actually pretty easy. I'm going to give this a title to begin with, right up top here, just so it's easy to understand what we are doing. I'm just going to call this one transcribe. Now, another change that we have to make is to go over to Runtime and choose Change runtime type. We're going to select the graphics one, the GPU, which runs Whisper a lot faster. So, when I go ahead and click on it, I'm going to hit Save. Now, we're going to install Whisper,
1:58

Installing Whisper AI & FFmpeg

and I'm going to paste this code in. This is going to be down below in the description of this video, so you can just go ahead and copy and paste it in as well. This right here is going to install Whisper. It's going to grab the code files from GitHub, and then it is also going to install FFmpeg, which will allow it to work with audio and video files. Once this is all ready, we can go ahead and hit run. It will take about 30 to 40 seconds to get this all installed. And look at that: installed in 27 seconds. Let's go over to the folder on the left here and we'll just click it. This is where
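Once the install cell has finished, a quick check that the tools are really there can save some confusion later. The sketch below is not from the video. It is a small Python cell, using only the standard library, that looks for the `ffmpeg` and `whisper` commands on the PATH. It also looks for `nvidia-smi` as a rough hint that a GPU runtime is attached, which is an assumption about how the Colab machine is set up rather than a guarantee. The function name `check_tools` is made up for this example.

```python
import shutil


def check_tools(tools=("ffmpeg", "whisper", "nvidia-smi")):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    # Print one line per tool so a missing install is easy to spot.
    for name, path in check_tools().items():
        print(f"{name}: {path if path else 'not found on PATH'}")
```

If `whisper` or `ffmpeg` shows up as not found, re-running the install cell is probably the first thing to try. A missing `nvidia-smi` most likely means the runtime type is still set to CPU.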
2:35

Uploading Your Audio or Video File

we're going to drag or upload the files to. So, I'm going to go and grab an MP3 that I'm going to drag and place in. I could upload a video as well. So, if I add an MP4, it will transcribe that as well. This notice is just saying that the runtime files will be deleted afterwards. I'm going to hit OK. Let's go ahead now and extract the text from the file that we uploaded. We'll need to add a line of code. So, I'm just going to move up and click on this. And it gives me this new line down
3:02

Running Whisper AI (Choosing Your Model)

here. I'm going to paste this in. Again, this is down below in the description, and it's quick to just copy and paste it up here. So, this is going to call on Whisper. We're going to enter the file name here, and I'm going to change this in a moment. And this shows me the model. In this example, I'm going to be using base, but there are other models you can choose from. Okay, before we hit run, we need to make one quick decision: which Whisper model to use. Think of it like choosing the engine size for a car. Whisper comes in five main sizes: tiny, base, small, medium, and large. The trade-off is really simple. The larger the model, the more accurate the transcription. The large model is incredibly precise, but it's also the slowest. The tiny model is super fast, but it might make more mistakes. There are also English-only versions of the smaller models. So, if you know your audio is only in English, choosing a model like base.en can be a bit faster. For this video, we're going to choose the base model. It gives you a fantastic balance of speed and accuracy, and it's the perfect place to start. All right, let's start selecting our model and get transcribing. I have my model chosen. Don't forget to put the .en at the end. Now, I have to put my file name here. So, look what I called this. This is probably not a very good naming convention, with the hyphens and everything. I'm going to rename it just to make this easier. I'm just going to call this Gemini.mp3. So now it's easy to type in over there. Or, if I wanted to make sure that I typed everything correctly, I could just select everything in here, Ctrl+C to copy, and go over here and Ctrl+V to paste. So now I know it's letter for letter. Let's go ahead and run this now.
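The file name and the model name are the two things that change from run to run, and they are also where typos creep in. As a minimal sketch (not something shown in the video), the Python cell below builds the same `whisper "FILE" --model NAME` command from the description, quoting the file name so spaces or brackets do not break it and rejecting model names that are not in the list. The helper name `build_whisper_command` is hypothetical, and the model list is an assumption based on the five sizes named above plus the `.en` variants of the four smaller ones.

```python
import shlex

# Sizes named in the video; ".en" variants are assumed for the four smaller sizes.
MULTILINGUAL_MODELS = ("tiny", "base", "small", "medium", "large")
ENGLISH_ONLY_MODELS = ("tiny.en", "base.en", "small.en", "medium.en")


def build_whisper_command(file_name, model="base.en"):
    """Return the shell command string for transcribing one file with one model."""
    if model not in MULTILINGUAL_MODELS + ENGLISH_ONLY_MODELS:
        raise ValueError(f"Unknown model name: {model!r}")
    # shlex.quote adds quotes only when the name contains characters the shell treats specially.
    return f"whisper {shlex.quote(file_name)} --model {model}"


if __name__ == "__main__":
    print(build_whisper_command("Gemini.mp3"))
    print(build_whisper_command("my interview (final).mp3", model="small"))
```

In a Colab cell, the resulting string can usually be run with IPython's `!{command}` form after assigning it to a variable called `command`. Pasting the printed line after a `!` works as well.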
4:52

Reviewing the Output Files (.txt, .srt, .vtt, .tsv)

Now look at all this transcription right here. But I want to take a look over on the left at these files. Let's start with this one, the TXT file. This is the most straightforward one. It's a plain text file containing the entire transcription. You can also see there's an SRT file and a VTT file. These are both caption files used for videos. They have the text synced with timestamps, so you can upload the SRT file directly to YouTube to instantly add perfect captions. Finally, there's the TSV file. This is a spreadsheet format. It gives you a super detailed breakdown of the transcription, with start and end times for each segment. It's great if you need to do deeper analysis. If you want to download these, which is what we'll be doing with this one, we can just go ahead and click on the ellipsis. We have the download option. Then it will be downloaded and we can go ahead and open it up. Let me bring this over and take a look at it. This is the text file. It has proper punctuation, capitals, everything. And this is a free tool. So, what happens if you want to do another file? I
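For the "deeper analysis" the TSV file is meant for, a few lines of Python are enough to load the segments. This is a sketch under assumptions, not something shown in the video: it assumes the file has a header row with `start`, `end` and `text` columns and that the times are whole milliseconds, so check the first lines of your own file before relying on it. The sample text and the name `read_segments` are invented for the example.

```python
import csv
import io


def read_segments(tsv_text):
    """Parse TSV text into a list of (start_seconds, end_seconds, text) tuples."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    segments = []
    for row in reader:
        # Assumption: times are stored as integer milliseconds.
        start_seconds = int(row["start"]) / 1000
        end_seconds = int(row["end"]) / 1000
        segments.append((start_seconds, end_seconds, row["text"].strip()))
    return segments


# Made-up sample in the assumed layout; replace with open("Gemini.tsv").read() for a real file.
sample = "start\tend\ttext\n0\t2500\tHello and welcome.\n2500\t6000\tToday we look at Whisper.\n"
segments = read_segments(sample)
longest = max(segments, key=lambda seg: seg[1] - seg[0])
print(f"{len(segments)} segments, longest: {longest[2]!r}")
```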
5:57

How to Transcribe Another File

can go grab another one, drag it over, drop it in here, and then once this gets uploaded, I'm going to do the same thing. I'll just do the rename. I do this because I want to make sure that I have everything letter for letter and don't have any mistakes in the typing. So, there I go right here. And I'm going to run this one now. And just like that, everything here is transcribed. I have more files on the side. This time, I'm going to download the SRT file so that I can show you. I'm going to open it up right away here, in Notepad. It gives you an idea of what happens here. You can see how everything is timed through it. And this is what you can upload directly to your videos in YouTube if you want. YouTube's a lot better now with its transcription than it used to be, but this helps if you need to give somebody the exact times in the video. One other thing I want to
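To make the timing in an SRT file concrete: each caption is a numbered block with a line like `00:00:02,500 --> 00:00:06,000` followed by the caption text, and blocks are separated by a blank line. The sketch below, which is not from the video, parses that layout into start and end times in seconds. The sample captions and the function names `to_seconds` and `parse_srt` are invented for illustration, and real files may have quirks this does not handle.

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"Not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, caption_text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip blocks without an index, a timing line and some text
        start, end = lines[1].split("-->")
        cues.append((to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues


# Made-up sample; replace with open("Gemini.srt", encoding="utf-8").read() for a real file.
sample_srt = (
    "1\n00:00:00,000 --> 00:00:02,500\nHello and welcome.\n\n"
    "2\n00:00:02,500 --> 00:00:06,000\nToday we look at Whisper.\n"
)
cues = parse_srt(sample_srt)
for start, end, caption in cues:
    print(f"{start:>6.1f}s to {end:>6.1f}s  {caption}")
```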
6:55

Exploring Additional Parameters with -h

point out, and I'm going to add another line of code here. If you want to find more parameters, more of what you can do with Whisper, go ahead and type this in. This is going to be down below in the description as well. I'm just going to run this. That -h that we typed in right up here simply stands for help. When we run this, it doesn't do any transcription. Instead, it just prints out a complete user manual for Whisper. You'll see a list of every single setting you can tweak, like how to specify the language, change the output format, and a bunch of other advanced features. It's a great little trick to remember because it lets you see everything the tool is capable of with one simple command. Now, we have our files all transcribed. I have them downloaded, and it's important to download them when you're working in Google Colaboratory, because once I leave here, I won't be able to get these back. So, make sure you download any of the files that you've transcribed and that you need.
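Since the runtime files disappear when the session ends, it can be handy to gather every output into one archive and download that single file instead of clicking through each one. The cell below is a rough sketch of that idea and is not part of the video. It only uses the standard library, the helper name `bundle_outputs` and the archive name `transcripts.zip` are made up, and it assumes the outputs sit in one folder with the four extensions mentioned earlier.

```python
import zipfile
from pathlib import Path

# The four output types named in the video.
OUTPUT_SUFFIXES = {".txt", ".srt", ".vtt", ".tsv"}


def bundle_outputs(folder=".", archive_name="transcripts.zip"):
    """Zip every transcript output in a folder; return the archive path and the file names added."""
    folder = Path(folder)
    archive_path = folder / archive_name
    outputs = sorted(
        path for path in folder.iterdir()
        if path.is_file() and path.suffix in OUTPUT_SUFFIXES
    )
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in outputs:
            # Store only the file name so the archive has no nested folders.
            archive.write(path, arcname=path.name)
    return archive_path, [path.name for path in outputs]
```

Calling `bundle_outputs()` in the notebook's working folder should leave a `transcripts.zip` in the file panel on the left, which can then be downloaded from the same ellipsis menu as any other file.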
7:50

Final Thoughts & Wrap Up

So, there you have it. In just a few minutes, we used OpenAI's Whisper to get a full, incredibly accurate transcription from our audio file. We saw how to choose a model, run the code, and get different output files like plain text and timed captions. This is such a game changer for transcribing meetings and lectures or creating subtitles. And the best part? It is completely free.
