Python Audio Processing Basics - How to work with audio files in Python
24:59

AssemblyAI · 26.05.2022 · 79,103 views · 1,241 likes

Video description
Learn how to work with audio files in Python in this Python Audio Processing Tutorial.

Learn about:
- mp3, wave, flac file
- sampling rate
- wave module Python
- plot waveform with matplotlib
- record microphone with Python
- PyAudio Tutorial
- PyDub Tutorial

Get your Free Token for AssemblyAI Speech-To-Text API 👇
https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_pat_38

Resources:
PyAudio: http://people.csail.mit.edu/hubert/pyaudio/
M1 PyAudio Installation command: python -m pip install --global-option='build_ext' --global-option='-I/opt/homebrew/Cellar/portaudio/19.7.0/include' --global-option='-L/opt/homebrew/Cellar/portaudio/19.7.0/lib' pyaudio

CONNECT
🖥️ Website: https://www.assemblyai.com
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers

Timestamps:
00:00 Intro
00:27 signal parameters
03:18 wave module
10:44 plot waveform matplotlib
15:40 record microphone with PyAudio
21:50 load mp3 with PyDub

Headphones icons created by photo3idea_studio - Flaticon: https://www.flaticon.com/free-icons/headphones
Microphone icons created by Freepik - Flaticon: https://www.flaticon.com/free-icons/microphone

#Python #PyAudio

Contents (6 segments)

Intro

Hi everyone. In this video I teach you some audio processing basics in Python. We briefly touch on different audio file formats, then we have a look at different audio signal parameters. Then I show you how to use the wave module to load and save a WAV file, and how to plot a wave signal. We also learn how to record with your microphone in Python, and finally I'll show you how to load other file formats like MP3 files. So let's get started. So first of

signal parameters

all, before we write some code, let's talk briefly about different audio file formats. Here I've listed three of the most popular ones: MP3, FLAC, and WAV. MP3 is probably the most popular one that you may know, and it is a lossy compression format. This means it compresses the data, and during this process we can lose information. FLAC, on the other hand, is a lossless compression format: it also compresses the data, but it allows the original data to be reconstructed perfectly. WAV is an uncompressed format, so it stores the data in an uncompressed way. The audio quality here is the best, but the file size is also the largest. WAV is commonly used for CD-quality audio.

We focus on WAV in the first part because it's very easy to work with in Python: there is a built-in wave module, so we don't have to install anything. By the way, WAV stands for Waveform Audio File Format.

Before we load our first WAV file, let's understand a few parameters. First, we have the number of channels. This is usually one or two: one is known as mono and two as stereo. It is the number of independent audio channels. Stereo, for example, has two independent channels, which gives you the impression that the audio is coming from two different directions.

Then we have the sample width, which is the number of bytes for each sample. This will become clearer later when we look at an example.

Then we have the frame rate, also known as the sample rate or sampling frequency. This is a very important parameter: it is the number of samples per second. You may have seen the number 44,100 Hz, or 44.1 kHz, a lot. This is the standard sampling rate for CD quality, and it means we get 44,100 sample values each second.

Then we have the number of frames, which is the total number of frames in the file, and the values in each frame. When we load these they are in a binary format, but we can convert them to integer values later.
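As a small worked example of how these parameters relate, the sketch below computes the clip duration and the raw data size from the values that come up later in the video (1 channel, 2 bytes per sample, 16,000 Hz, 80,000 frames). The helper names duration_seconds and raw_size_bytes are hypothetical and only used for illustration.

```python
def duration_seconds(n_frames, frame_rate):
    """Length of the audio in seconds: total frames / frames per second."""
    return n_frames / frame_rate


def raw_size_bytes(n_frames, n_channels, sample_width):
    """Size of the uncompressed audio data (excluding the file header)."""
    return n_frames * n_channels * sample_width


# Values reported for the example file in the video.
n_channels = 1      # mono
sample_width = 2    # bytes per sample (16-bit)
frame_rate = 16000  # samples per second
n_frames = 80000    # total number of frames

clip_duration = duration_seconds(n_frames, frame_rate)
clip_bytes = raw_size_bytes(n_frames, n_channels, sample_width)
print(clip_duration)  # 5.0
print(clip_bytes)     # 160000

# CD quality for comparison: 44,100 Hz, stereo, 2 bytes per sample.
cd_bytes_per_second = raw_size_bytes(44100, 2, 2)
print(cd_bytes_per_second)  # 176400
```

The 160,000 bytes match the length of the bytes object that the wave module returns for this file in the next segment.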

wave module

Now let's have a look at how to load a file with the wave module. Here I prepared a simple WAV file that is five seconds long, so let's listen to it: "Hi, my name is Patrick and I'm a developer advocate at AssemblyAI." Here we also see a few parameters already.

Now let's go back to the code and load this file. For this we create an object and simply say wave.open. We have to give it the file name, patrick.wav, and we open it in read-binary mode, "rb". Now we can extract all these different parameters. Let's print the number of channels, which we get with obj.getnchannels(). We also want to print the sample width, which we get with obj.getsampwidth(). Then let's print the frame rate with obj.getframerate() and the number of frames with obj.getnframes(). Lastly, we can get all the parameters at once with obj.getparams(), so let's print that too.

If we run this with python wave_example.py, we see that we have only one channel, so this is a mono format. We have a sample width of two, so two bytes for each sample, a frame rate of 16,000, and 80,000 frames. Here we also have all the parameters as a _wave_params object.

Now we can, for example, calculate the duration of the audio. As I said, the frame rate is the number of samples per second, so if we divide the total number of frames by the frame rate, we get the time in seconds. If we print t_audio and run this, we get 5.0, so five seconds. This is the same as we see here, so this works.

Now let's get the actual frames with frames = obj.readframes(...). We can give it the number of frames, or I think we can pass in -1, which reads all frames. Let's print the type of this to see what it is, then print the type of frames[0], and then print the length of frames. If we run this, we see that frames is a bytes object (class bytes), and when we extract the first byte we see that it is an integer. The length of the frames object is 160,000, so this is not the same as the number of frames. The number of frames is 80,000, but the length here is twice as much. If you listened carefully in the beginning, I mentioned the sample width: we have two bytes per sample. So if we divide this by 2, we again get our 80,000 frames. This is how easily we can read a WAV file and then work with the frames.

To save the data again, we also open a file. Let's call the object obj_new and say wave.open, give it a new name, say patrick_new.wav, and open it in write-binary mode, "wb". Now we can call basically all those functions as setters instead of getters: obj_new.setnchannels(1), since this is only one channel, then obj_new.setsampwidth(2), and obj_new.setframerate(16000.0), passed as a float here. These are all the parameters we should set. Then we can write the frames with obj_new.writeframes(frames). Here we have the original frames, so basically we duplicate the file.

What I forgot: when we are done with opening the file and reading all the information we want, we should also call obj.close(), and the same for the new file with obj_new.close(). This closes the file objects. If we save this and run it, we see that we have the duplicated file, and if we play it ("Hi, my name is Patrick and I'm a developer advocate at AssemblyAI") we see that it works and has the same data in it. So this is how to work with a WAV file and the wave module.
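The read and write steps from this segment can be sketched roughly as follows. Because patrick.wav is not available here, the sketch first writes a one-second 440 Hz test tone as a stand-in; that tone, the file names tone.wav and tone_new.wav, and the helper names are assumptions for illustration, not the code from the video. It also uses with blocks, which close the files automatically instead of calling close() by hand.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 16000  # frames per second, as in the video
N_CHANNELS = 1       # mono
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
DURATION_S = 1       # assumption: a 1-second tone instead of the 5-second clip


def write_tone(path):
    """Write a 440 Hz test tone (hypothetical stand-in for patrick.wav)."""
    n_frames = SAMPLE_RATE * DURATION_S
    samples = [
        int(10000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
        for i in range(n_frames)
    ]
    data = struct.pack(f"<{n_frames}h", *samples)  # little-endian int16
    with wave.open(path, "wb") as obj:
        obj.setnchannels(N_CHANNELS)
        obj.setsampwidth(SAMPLE_WIDTH)
        obj.setframerate(SAMPLE_RATE)
        obj.writeframes(data)


def read_wav(path):
    """Return (params, frames): the header parameters and the raw bytes."""
    with wave.open(path, "rb") as obj:
        params = obj.getparams()
        frames = obj.readframes(obj.getnframes())
    return params, frames


def copy_wav(src, dst):
    """Duplicate a WAV file by reading it and writing it with the setters."""
    params, frames = read_wav(src)
    with wave.open(dst, "wb") as obj_new:
        obj_new.setnchannels(params.nchannels)
        obj_new.setsampwidth(params.sampwidth)
        obj_new.setframerate(params.framerate)
        obj_new.writeframes(frames)


with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, "tone.wav")
    dst_path = os.path.join(tmp_dir, "tone_new.wav")
    write_tone(src_path)
    copy_wav(src_path, dst_path)
    params, frames = read_wav(dst_path)

t_audio = params.nframes / params.framerate
print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
print("duration in seconds:", t_audio)
print(type(frames), type(frames[0]), len(frames))
# The byte length is nframes * sampwidth * nchannels, which is why the video
# sees 160,000 bytes for 80,000 two-byte mono frames.
```

With the real file, read_wav("patrick.wav") would be the equivalent of the getter calls shown in the video.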

plot waveform matplotlib

So now let's see how we can plot a WAV file. Plotting a wave signal is actually not too difficult. For this we just need to install matplotlib and numpy. Then we import all the modules we need: again we need wave, then matplotlib.pyplot as plt, and numpy as np.

Again I want to read the WAV file, so I say wave.open with patrick.wav in read-binary mode. Then I read all the parameters I need. The sample frequency is obj.getframerate(), and the number of samples is obj.getnframes(). I also need the actual signal, so I say signal_wave = obj.readframes(-1), which reads all the frames, and then I can call obj.close(). Now we can calculate the length of the signal in seconds, which I call t_audio. If you remember from the first example, this is the number of samples divided by the sample frequency. Let's print t_audio, save, and run this as a test with python plot_audio.py. We get 5.0, so this works so far.

Now I want to create the plot. The signal is a bytes object, so we can create a numpy array out of it very easily: signal_array = np.frombuffer(signal_wave, dtype=np.int16). We specify int16 as the data type, which matches the sample width of two bytes. Now we need an array for the x-axis, the time axis, so we say times = np.linspace(...). The start is zero, the end is the length of the signal, so t_audio or five seconds, and the num parameter is the number of samples, so we get one point in time for each sample.

Now we want to plot this. We create a figure with plt.figure and give it a fixed size of 15 by 5. Then we call plt.plot to plot the times against the signal array. We give it a title with plt.title, let's call it "Audio signal". I also set plt.ylabel to "Signal wave" and plt.xlabel to "Time (s)", then use plt.xlim to limit the x-axis to between zero and t_audio, so five seconds. Then we say plt.show(), and this is all we need. If we run this, it opens our plot, and here we have our audio signal plotted as a waveform. This is how easily we can do it with matplotlib and the wave module.
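A minimal sketch of the plotting steps, under some assumptions: a synthetic five-second 440 Hz tone stands in for the bytes returned by obj.readframes(-1), and the plotting is wrapped in a hypothetical plot_signal function that imports matplotlib only when called, so the array-building part runs without a display.

```python
import numpy as np

SAMPLE_FREQ = 16000  # obj.getframerate() in the video
N_SAMPLES = 80000    # obj.getnframes() in the video

# Assumption: synthetic int16 tone as a stand-in for
# signal_wave = obj.readframes(-1) on patrick.wav.
t = np.arange(N_SAMPLES) / SAMPLE_FREQ
signal_wave = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16).tobytes()

# Length of the signal in seconds.
t_audio = N_SAMPLES / SAMPLE_FREQ

# Interpret every two bytes as one 16-bit integer sample.
signal_array = np.frombuffer(signal_wave, dtype=np.int16)

# One point on the time axis per sample, from 0 to t_audio.
times = np.linspace(0, t_audio, num=N_SAMPLES)


def plot_signal(times, signal_array, t_audio):
    """Plot the waveform as described in the video (opens a window)."""
    import matplotlib.pyplot as plt  # imported lazily; needs matplotlib

    plt.figure(figsize=(15, 5))
    plt.plot(times, signal_array)
    plt.title("Audio signal")
    plt.ylabel("Signal wave")
    plt.xlabel("Time (s)")
    plt.xlim(0, t_audio)
    plt.show()
```

Calling plot_signal(times, signal_array, t_audio) should open the figure. Note that np.linspace includes the end point by default, so the spacing between time values is marginally larger than 1 / SAMPLE_FREQ. For a plot this difference is negligible.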

record microphone with PyAudio

Now let's learn how we can record with our microphone and capture the microphone input in Python. For this we use PyAudio, a popular Python library that provides bindings for PortAudio, the cross-platform audio I/O library. With this we can easily play and record audio, and it works on Linux, Windows, and Mac. For each platform there's a slightly different installation command that they recommend. For example, on Windows you should use this command. On Mac you also have to install PortAudio first, so if you use Homebrew you can say brew install portaudio and then pip install pyaudio. On Linux you can use this command. I already did this: I'm on a Mac, so I used brew install portaudio and then pip install pyaudio.

Now I can import pyaudio, and I also import wave to save the recording later. Then I set up a few parameters. For FRAMES_PER_BUFFER I say 3200; you can play around with this a little. Then I specify the format, FORMAT = pyaudio.paInt16. This is basically the same as what we used before: there we used numpy's int16, and here we have paInt16. I also specify the number of channels, here one, so simply a mono format, and the frame rate: for RATE I again say 16,000. Again, you can use a different rate and play around with this.

Then we create our PyAudio object with p = pyaudio.PyAudio(), and a stream object with stream = p.open(...). Here we put in all the parameters: format=FORMAT, channels=CHANNELS, rate=RATE, input=True because we want to capture the input, and lastly frames_per_buffer=FRAMES_PER_BUFFER.

Now that we have our stream object, we can print "start recording". We want to record for a number of seconds, here five seconds, and we store the frames in a list. We iterate with for i in range(0, int(RATE / FRAMES_PER_BUFFER * seconds)), converting the result to an integer rather than a float, and with this we record for five seconds. In each iteration we read one chunk with data = stream.read(FRAMES_PER_BUFFER), which reads that many frames at once, and then we call frames.append(data).

Now we can close everything again: stream.stop_stream(), stream.close(), and p.terminate(). With that, everything is correctly shut down.

Now we can save the frames in a WAV file again. For this I say obj = wave.open with output.wav in write-binary mode. Then we set all the parameters: obj.setnchannels(CHANNELS), obj.setsampwidth(p.get_sample_size(FORMAT)), and obj.setframerate(RATE). Then we write all the frames with obj.writeframes. We need to write this as binary data, so we create a bytes string with b"".join(frames), which combines all the elements of our frames list into one bytes string. Then we call obj.close(), and this is everything we need to do.

Now we can run python record_mic.py and test this: "Hi, I'm Patrick. This is a test, one, two, three." And now it's done. Here we have our new file, so let's play it and see if it works: "Hi, I'm Patrick. This is a test, one, two, three." It worked, awesome. As a last step, I also want to show you how to load MP3 files and not only WAV files.
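The recording steps above can be sketched as follows. This is an outline under assumptions rather than the exact script from the video: the function names n_chunks, record, and save_wav are hypothetical, and pyaudio is imported inside record so that the rest of the block can run on a machine without PortAudio or a microphone.

```python
import wave

FRAMES_PER_BUFFER = 3200  # frames read per stream.read() call
CHANNELS = 1              # mono
RATE = 16000              # frames per second
SECONDS = 5               # recording length


def n_chunks(rate, frames_per_buffer, seconds):
    """Number of read() calls needed to cover the requested duration."""
    return int(rate / frames_per_buffer * seconds)


def record(seconds=SECONDS):
    """Record from the default microphone; returns (frames, sample_width)."""
    import pyaudio  # third-party; requires PortAudio and an input device

    p = pyaudio.PyAudio()
    sample_width = p.get_sample_size(pyaudio.paInt16)
    stream = p.open(
        format=pyaudio.paInt16,
        channels=CHANNELS,
        rate=RATE,
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
    )
    print("start recording")
    frames = []
    for _ in range(n_chunks(RATE, FRAMES_PER_BUFFER, seconds)):
        frames.append(stream.read(FRAMES_PER_BUFFER))
    # Shut everything down in the order used in the video.
    stream.stop_stream()
    stream.close()
    p.terminate()
    return frames, sample_width


def save_wav(path, frames, sample_width):
    """Join the recorded chunks into one bytes object and write a WAV file."""
    with wave.open(path, "wb") as obj:
        obj.setnchannels(CHANNELS)
        obj.setsampwidth(sample_width)
        obj.setframerate(RATE)
        obj.writeframes(b"".join(frames))


print(n_chunks(RATE, FRAMES_PER_BUFFER, SECONDS))  # 25
```

On a machine with a microphone, frames, width = record() followed by save_wav("output.wav", frames, width) would perform the actual recording. With the values above, 25 chunks of 3,200 frames give 80,000 frames, which is five seconds at 16,000 Hz.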

load mp3 with PyDub

Let's do this. To load MP3 files we need an additional third-party library, and I recommend PyDub. This is a very simple-to-use library that provides an easy, high-level interface to load and also to manipulate audio. In order to use it we also need to install ffmpeg. On the Mac I use Homebrew, so I said brew install ffmpeg, and after this you can simply say pip install pydub. Here it's already satisfied.

Now we can say from pydub import AudioSegment. Then we can say audio = AudioSegment.from_mp3(...) if we have an MP3. In my case I only have a WAV file right now, so I use from_wav and load patrick.wav. Then we can very easily manipulate this. For example, audio + 6 increases the volume by 6 dB. We can also repeat the clip with audio = audio * 2. Then we can use a fade-in, for example audio = audio.fade_in(2000), so 2,000 milliseconds or two seconds of fade-in. The same works with fade_out.

Then we can say audio.export. I want to export this as, let's call it, mashup.mp3, and I have to pass format="mp3" as a string. Now I could load this again with audio2 = AudioSegment.from_mp3("mashup.mp3"), and then print "done" so that we see it arrives at this point. Now let's run the load-MP3 script with python, and this works. Here we have our MP3 file, and we could also load it like this. That's how we can use the PyDub module to load other file formats.

That's all I wanted to show you in this tutorial. I hope you really enjoyed it. If you liked it, please leave us a thumbs up and consider subscribing to our channel. I hope to see you next time. Bye!
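The PyDub steps from this segment might look roughly like the sketch below. The function name make_mashup and the helper db_to_amplitude_ratio are hypothetical. pydub is imported inside the function because it is a third-party package that also needs ffmpeg for MP3 support. The helper is only there to illustrate what a 6 dB gain means for the signal amplitude.

```python
def db_to_amplitude_ratio(gain_db):
    """Amplitude ratio for a gain in decibels: 10 ** (dB / 20)."""
    return 10 ** (gain_db / 20)


def make_mashup(src_wav, dst_mp3):
    """Load a WAV file, modify it, export it as MP3, and reload the MP3."""
    from pydub import AudioSegment  # third-party; MP3 export needs ffmpeg

    audio = AudioSegment.from_wav(src_wav)
    audio = audio + 6            # increase the volume by 6 dB
    audio = audio * 2            # repeat the clip twice
    audio = audio.fade_in(2000)  # 2,000 ms fade-in

    audio.export(dst_mp3, format="mp3")

    audio2 = AudioSegment.from_mp3(dst_mp3)
    print("done")
    return audio2


# A +6 dB gain roughly doubles the amplitude.
print(round(db_to_amplitude_ratio(6), 3))  # 1.995
```

With pydub and ffmpeg installed, make_mashup("patrick.wav", "mashup.mp3") would correspond to the steps described in the video.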
