How to Use OpenAI's Whisper for Perfect Transcriptions (Speech to Text)

Teacher's Tech, 08.10.2025, 8:13, 47,998 views, 1,135 likes, updated 18.02.2026

Video description

In this step-by-step tutorial, I show you how to use OpenAI's Whisper AI to get incredibly accurate transcriptions from any audio or video file, completely for free. Turn your audio files into text. Stop paying for expensive services or wasting time transcribing manually! Whisper AI is a powerful tool that can handle 99 different languages with near human-level precision. I'll walk you through the entire process using a free Google Colab notebook, which means you don't need a powerful computer to follow along. We'll cover how to choose the right model, run the transcription command, and understand the different output files, including the .srt file you can use for video captions. Whether you're a content creator, student, researcher, or professional who needs to transcribe meetings, this guide will show you everything you need to know to get started with one of the best AI tools available today.

⬇️ COPY & PASTE COMMANDS FOR GOOGLE COLAB ⬇️

1. Install Whisper AI & FFmpeg: (Run this cell first to set up the environment)
!pip install git+https://github.com/openai/whisper.git
!sudo apt update && sudo apt install ffmpeg

2. Run Whisper AI: (Change the file name and model size to fit your needs)
!whisper "ENTER FILE NAME HERE" --model base.en

3. View All Commands (Help Menu): (See all possible arguments and options)
!whisper -h

TIMESTAMPS:
0:00 - Introduction
0:32 - Getting Started with Google Colaboratory
1:23 - Configuring Your Google Colab Environment
1:58 - Installing Whisper AI & FFmpeg
2:35 - Uploading Your Audio or Video File
3:02 - Running Whisper AI (Choosing Your Model)
4:52 - Reviewing the Output Files (.txt, .srt, .vtt, .tsv)
5:57 - How to Transcribe Another File
6:55 - Exploring Additional Parameters with -h
7:50 - Final Thoughts & Wrap Up
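Once the install cell from step 1 has finished, a quick check like the one below can confirm that the pieces the tutorial relies on are in place before you run a transcription. This is only a sketch and not something shown in the video: the helper name check_environment is made up for illustration, and it assumes that finding the whisper Python package, the whisper command, and ffmpeg on the PATH is a good enough sign that the install worked.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether Whisper and FFmpeg appear to be installed.

    Hypothetical helper for illustration; it only looks for the tools,
    it does not load a model or touch any audio.
    """
    return {
        # The Python package installed by the pip command.
        "whisper_package": importlib.util.find_spec("whisper") is not None,
        # The command-line entry point used in the "!whisper ..." cell.
        "whisper_cli": shutil.which("whisper") is not None,
        # FFmpeg, which Whisper uses to read audio and video files.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If anything is reported as missing, rerunning the install cell is the first thing to try.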

Table of contents (10 segments)

Introduction

Hey, I'm Jamie and welcome back to Teacher's Tech. Tired of slowly transcribing audio or paying for expensive services? Today I'm going to show you how you can get incredibly accurate transcriptions in almost 100 languages completely for free. That's the power of OpenAI's Whisper. And today I'm going to show you how to use it. We'll run it for free using Google Colab to get highly accurate text from an audio file. By the end of this video, you'll be able to get a full transcription and even a timed subtitle file in minutes. Let's dive

Getting Started with Google Colaboratory

in. Let's head over to Google Drive. I'm already logged in here with a completely free account. I'm just going to go to my Google Drive. And the first thing we're going to need to do is install Google Colaboratory. Head over to the New drop-down, to where it says More. And you can see Connect more apps. When you select this, you're going to need to do a search for Google Colaboratory. Even if you just start typing Colaboratory, you'll see it come up right here. This is what we want to install: Colaboratory, by the Colaboratory team. And we're just going to go and click Install on this. Go ahead and continue. You'll get to a point where you have to continue again. We'll click OK here. And let's just close out of here. And now if we go back to New and More, you're going to see Google Colaboratory right here. Let's go ahead and click on this.

Configuring Your Google Colab Environment

Now we're inside Google Colaboratory. First thing: if this looks completely new to you, don't get intimidated by it. I'll walk you through step by step and you'll see that it's actually pretty easy. I'm going to give this a title to begin with, right up top here, just so it's easy to understand what we are doing. And I'm just going to call this one Transcribe. Now, another change that we have to make is to go over to Runtime and choose Change runtime type. We're going to select the graphics one, the GPU, which runs Whisper a lot better. So, once I click on it, I'm going to hit Save. Now, we're going to install Whisper,

Installing Whisper AI & FFmpeg

and I'm going to paste this code in. This is going to be down below in the description of this video, so you can just go ahead and copy and paste it in as well. This right here is going to install Whisper. It's going to grab the code files from GitHub, and then it is also going to install FFmpeg, which will allow it to work with audio and video files. Once this is all ready, we can go ahead and hit run. And it will take about 30 to 40 seconds to get this all installed. And look at that. Installed in 27 seconds. Let's go over to the folder on the left here and we'll just click it. This is where

Uploading Your Audio or Video File

we're going to drag or upload the files to. So, I'm going to go and grab an MP3 that I'm going to drag and place in. I could upload a video as well. So, if I add an MP4, it will transcribe that as well. This message is just saying that the runtime files will be deleted afterwards. I'm going to hit OK. Let's go ahead now and extract the text from the file that we uploaded. We'll need to add a line of code. So, I'm just going to move up and click on this. And it gives me this new line down

Running Whisper AI (Choosing Your Model)

here. I'm going to paste this in. Again, this is down below in the description, and it's quick to just copy and paste it up here. So, this is going to call on Whisper. We're going to enter the file name here, and I'm going to change this in a moment. And this shows me the model. And in this example, I'm going to be using base, but there are other models you can choose from. Okay, before we hit run, we need to make one quick decision: which Whisper model to use. Think of it like choosing the engine size for a car. Whisper comes in five main sizes: tiny, base, small, medium, and large. The trade-off is really simple. The larger the model, the more accurate the transcription. The large model is incredibly precise, but it's also the slowest. The tiny model is super fast, but it might make more mistakes. There are also English-only versions of the smaller models. So, if you know your audio is only in English, choosing a model like base.en can be a bit faster. For this video, we're going to choose the base model. It gives you a fantastic balance of speed and accuracy, and it's the perfect place to start. All right, let's start selecting our model and get transcribing. I have my model chosen. Don't forget to put the .en at the end. Now, I have to put my file name here. So, look what I called this. This is probably not a very good naming convention here with the hyphens and everything. I'm going to change the name. I'm just going to rename it to make this easier. I'm just going to call this Gemini.mp3. So now it's easy to type in over there. Or if I wanted to make sure that I typed everything correctly, I could just go and select everything in here, Ctrl+C to copy, and go over here and Ctrl+V to paste. So now I know it's letter for letter. Let's go ahead and run this now.
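The model choice and the file-name worry above can be sketched in a few lines of Python. This is an illustration rather than anything run in the video: the function names model_name and colab_cell are made up, and the second file name is an invented example. It assumes the five sizes named above, with the .en variants available for every size except large, and it uses shell quoting so a name with spaces or brackets does not have to be renamed first.

```python
import shlex

# The five sizes named in the video.
SIZES = ("tiny", "base", "small", "medium", "large")
# English-only ".en" variants exist for the smaller models, not for large.
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")


def model_name(size: str, english_only: bool = False) -> str:
    """Return the value to pass to --model, for example "base.en"."""
    if size not in SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only:
        if size not in ENGLISH_ONLY_SIZES:
            raise ValueError(f"no English-only variant for {size!r}")
        return f"{size}.en"
    return size


def colab_cell(audio_file: str, size: str = "base", english_only: bool = True) -> str:
    """Build the line to paste into a Colab cell (hypothetical helper)."""
    # shlex.quote adds quotes only when the name contains characters
    # the shell would otherwise misread, such as spaces or brackets.
    return f"!whisper {shlex.quote(audio_file)} --model {model_name(size, english_only)}"


print(colab_cell("Gemini.mp3"))
print(colab_cell("my audio - final (v2).mp3", size="small", english_only=False))
```

Copying the printed line into a new cell has the same effect as typing the command by hand, so renaming the file stays a perfectly good alternative.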

Reviewing the Output Files (.txt, .srt, .vtt, .tsv)

Now look at all this transcription right here. But I want to take a look over on the left at these files here. Let's start with this one, the TXT file. So this is the most straightforward one. It's a plain text file containing the entire transcription. You can also see there's an SRT file and a VTT file. These are both caption files used for videos. They have the text synced with timestamps, so you can upload the SRT file directly to YouTube to instantly add perfect captions. Finally, there's the TSV file. This is a spreadsheet format. It gives you a super detailed breakdown of the transcription, with the start and end time for each segment. It's great if you need to do deeper analysis. If you want to download these, which is what we'll be doing with this one, we can just go ahead and click on those ellipses. We have the download option. And then it will have it downloaded and we can go ahead and open it up. Let me bring this over and take a look at this. This is the text file. It has perfect punctuation, capitals, everything. And this is a free tool. So, what happens if you want to do another file? I

How to Transcribe Another File

can go grab another one, drag it over, drop it in here, and then once this gets uploaded, I'm going to do the same thing. I'll just do the rename. I just do this because I want to make sure that I have everything letter for letter and don't have any mistakes in the typing. So, there I go right here. And I'm going to go and run this one now. And just like that, everything here is transcribed. I have more of these files on the side. And this time, I'm going to go and download the SRT file so that I can show you. I'm going to open it up right away here. Open it up in Notepad. And it gives you an idea of what happens here. You can see how everything is timed through it. And this is what you can upload directly to your videos in YouTube if you want. YouTube's a lot better now with their transcription than it used to be, but this is handy if you need to give somebody the exact times in the video. One other thing I want to

Exploring Additional Parameters with -h

point out, and I'm going to add another line of code here. If you want to find more parameters, more of what you can do with Whisper, go ahead and type this in. And this is going to be down below in the description as well. I'm just going to run this. That -h that we typed in right up here simply stands for help. When we run this, it doesn't do any transcription. Instead, it just prints out a complete user manual for Whisper. You'll see a list of every single setting you can tweak, like how to specify the language, change the output format, and a bunch of other advanced features. It's a great little trick to remember because it lets you see everything the tool is capable of with one simple command. Now, we have our files all transcribed. I have them downloaded, and it's important to download them when you're working in Google Colaboratory, because once I leave here, I won't be able to get these back. So, make sure you download any of the files that you've already transcribed and that you need.
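As a rough illustration of the settings that the help output lists, the sketch below puts together a command that sets the language and asks for just one output format. The options --language, --output_format, and --output_dir are ones the Whisper command line accepts; the helper name whisper_args and the default values chosen here are assumptions made for this example, not something shown in the video.

```python
import shlex

# Output formats accepted by --output_format; "all" writes every file type.
OUTPUT_FORMATS = ("txt", "vtt", "srt", "tsv", "json", "all")


def whisper_args(audio_file, model="base", language=None,
                 output_format="all", output_dir="."):
    """Build the argument list for a whisper run (hypothetical helper)."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    args = [
        "whisper", audio_file,
        "--model", model,
        "--output_format", output_format,
        "--output_dir", output_dir,
    ]
    # Leaving the language out lets Whisper detect it from the audio.
    if language:
        args += ["--language", language]
    return args


# Example: English audio, captions only, written next to the notebook.
args = whisper_args("Gemini.mp3", language="en", output_format="srt")
print("!" + shlex.join(args))
```

The printed line can be pasted into a Colab cell. Asking for only the .srt file keeps the file list on the left shorter when captions are all you need.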

Final Thoughts & Wrap Up

So, there you have it. In just a few minutes, we used OpenAI's Whisper to get a full, incredibly accurate transcription from our audio file. We saw how to choose a model, run the code, and get different output files like plain text and timed captions. This is such a game changer for transcribing meetings and lectures or creating subtitles. And the best part: it is completely free.
