Hello everyone and welcome to another episode of Coding Adventures. Today I'd like to try synthesizing the sounds of some musical instruments. Now, we recently worked on this little spectrogram tool, which allows us to visualize the strength of different frequencies in a sound and how those change over time. And we can even paint in our own values here, like maybe a line of increasing and then decreasing frequency, and listen to the result. I find this pretty fun to play around with, like just scribbling random stuff and seeing what comes out of it. We even tried last time roughly tracing over a voice recording as a very crude sort of attempt at synthesizing it. I'll definitely be exploring speech synthesis in the future, but let's turn our attention to music for the moment. So, I found this nice repository of various instrument recordings where we can download individual notes played on a piano, for instance, and plop one of those into our spectrogram to have a look at it. Okay, we can see here a whole sort of stack of evenly spaced frequencies which seem to roughly get weaker the higher up they go. But there's clearly quite a lot going on, so I imagine we'll be returning to this frequently to examine it in more detail. For now, let's just get started by creating a simple system for generating these different frequencies. I mean, we could use our existing little drawing tool for this, but I think we'd be better off with a more precise approach this time. All right, so I've been setting some stuff up in the code here. Like, we have this function, first of all, that's responsible for filling in this empty array of audio data. It actually just delegates that job to whatever audio source has been assigned. But just in case that accidentally blasts out something outrageously loud, I've added a final safety step here to strictly clamp the values within a reasonable range. Okay.
Then I've also created a little note class here, or naughty as my syntax highlighter is insisting for some reason. And this just holds some information, like the frequency of the note and how long it's been held down for. Then to actually make some noise, we have this synthesizer function, which I've assigned as the source to our audio system. And in here, if the note is pressed down, we can increment its timer and then use that to figure out the phase, which is to say where along the sound wave we are at this moment in time. So, that's finally just passed through a sine function to get the actual value. All right, I've just set up a little button now to trigger the note, meaning we can sit back and enjoy the delightful sound we've no doubt created. Okay, that was a bit strange actually. I don't know if there's a bug in the code or just in my ears, but I could swear the sound changed slightly towards the end there. So, I've been snooping around the spectrogram here, and it sure does look suspicious. In particular, I mean, these little jumps at the 2- and 4-second marks where things suddenly look ever so slightly different. I want to hear just this little section again. All right, my best guess for what's going on is that 2 and 4 seconds are both values at which we lose a bit of floating-point precision, since the float obviously needs to represent the integer part of the value. And I'm accustomed to this becoming a graphical or physics problem at really huge numbers, like when a spaceship is far from the center of the universe and so you need workarounds like having the universe move around the spaceship instead. But I wasn't expecting the precision loss to cause issues with our ears pretty much right from the get-go. For your listening pleasure though, here's how the precision loss sounds at various hour marks. Let me know if you'd like an uncut version of this to fall asleep to.
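As an aside, the precision story here can be checked directly: the gap between adjacent 32-bit floats doubles every time the value crosses a power of two, which is exactly why the 2- and 4-second marks stand out. A small sketch, pure Python with no audio involved:

```python
import struct

def float32_spacing(value):
    """Gap between `value` and the next representable 32-bit float.
    This is how coarse a float32 time counter is near that value."""
    # Reinterpret the float's bits as an integer, add one ulp, convert back.
    packed = struct.unpack("<I", struct.pack("<f", value))[0]
    next_up = struct.unpack("<f", struct.pack("<I", packed + 1))[0]
    return next_up - value
```

Near half a second the spacing is about 6e-8, but by the 4-second mark it has already grown by a factor of eight. That's still tiny next to one sample period at typical audio rates, but once it feeds into the phase calculation the rounding evidently becomes audible.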
Anyway, I guess what we can do is store the phase of the wave inside of the note rather than the time, because so long as we stick to integer frequencies, they will perfectly repeat once per second, meaning that each time the value goes past one, we can shift it back to zero to maintain maximum precision. And with that little change, it seems our spectrogram is looking a lot healthier. Okay, so we're now able to create reliable individual tones from which we can hopefully build some interesting sounding things. We do need to be able to control the loudness of the tone over time though, otherwise it's just a continuous blaring bleep, which is not particularly pleasant. Like, just looking at the waveform of that piano note, we can see how it takes at least a fraction of a second for the amplitude to ramp up at the start, and how it then starts to gradually fade away. So I've been working on a simple tool here that allows us to specify, in milliseconds, this initial ramping-up time, known as the attack, as well as how long it then takes to decay away. Then on a piano, for instance, we might let go of the key at any moment, causing the dampers to slam down and stop the vibrations very quickly. So we can control the duration of that fade-out with this release
parameter here. This is known in general as an envelope. So we have a little envelope class here now holding those three parameters along with a function for figuring out the current value based on how long the note has been held or released for. So just testing this out quickly. I'll hold down the note and we can see the amplitude shoot up to one and then gradually decay. And of course, if I release the note, well, that's not quite what I was expecting. I think probably these here need to be clamped between zero and one to behave properly. All right, let's try that out again. So, I'll let go of the note. And that seems to be working well. Now, one thing that might still be buggy, though, if I set this to have a fairly slow release duration, is when we re-trigger a note while it's fading out. In that case, we can see there's a hard jump back to zero amplitude for the start of the new attack phase, which would probably sound quite unnatural. So, that's something we should fix at some point, but I'm not quite sure yet how I want to handle it. Anyway, I've just been applying this amplitude to the synthesizer quickly. So, we have the note timers being incremented over here. And with those set, we can then look up the amplitude from the envelope and of course scale the wave down based on that. All right, let's give it a go. I'm just going to set some longer durations here to start with. And then let's give this button a press and release. That seems good. And so let's hear this with some shorter values as well. All right, not too bad. But I feel like these straight lines aren't really leaving us much room for finesse. So I've been coding up a little curve editor over here. I don't know what kind of curve is most appropriate for this sort of thing. So, I've just gone with a basic bezier since that's what I'm most familiar with. Uh there's maybe still a few kinks to iron out. Okay. 
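Putting the pieces so far together, here's a minimal sketch of the envelope and the phase-wrapped synthesizer described above, with straight-line attack/decay/release segments (the 48 kHz sample rate and the function shapes are assumptions, not the video's exact code):

```python
import math

SAMPLE_RATE = 48_000  # assumed sample rate

class Envelope:
    """Attack / decay / release amplitude envelope (durations in seconds).
    Linear segments only; the video later swaps these for editable curves."""
    def __init__(self, attack, decay, release):
        self.attack, self.decay, self.release = attack, decay, release

    def value(self, held_time, released_time=None):
        if released_time is None:
            if held_time < self.attack:
                return held_time / self.attack          # ramp up toward 1
            t = (held_time - self.attack) / self.decay
            return max(0.0, 1.0 - t)                    # decay toward 0
        # After release: fade whatever level we were at down to zero.
        level = self.value(held_time)
        return level * max(0.0, 1.0 - released_time / self.release)

def synthesize(frequency_hz, envelope, held_seconds, total_seconds):
    """Sine wave scaled by the envelope, accumulating and wrapping the
    phase each cycle so precision doesn't degrade as time grows."""
    samples, phase = [], 0.0
    for i in range(int(total_seconds * SAMPLE_RATE)):
        t = i / SAMPLE_RATE
        released = None if t < held_seconds else t - held_seconds
        amplitude = envelope.value(min(t, held_seconds), released)
        samples.append(amplitude * math.sin(2 * math.pi * phase))
        phase += frequency_hz / SAMPLE_RATE
        if phase >= 1.0:   # wrap once per cycle to keep precision
            phase -= 1.0
    return samples
```

The clamping via `max(0.0, ...)` corresponds to the fix mentioned above, keeping the envelope from going negative after long holds or releases.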
After a few fixes, I've now been working on a function to search for the y-coordinate of the curve that corresponds to a given x-coordinate, so we can actually use it in our envelope. This function isn't fabulously fast, though. So whenever the curve is edited, the search gets run at evenly spaced x intervals and the results stored in this discretized curve class. That way we can sacrifice a bit of accuracy to enable evaluating the curve much more efficiently. So our envelope now has access to three of these, for attack, decay, and release, from which it looks up the values rather than simply using straight lines. All righty then. Let's see if we can do anything interesting with this. The curves are straight at the moment, so we can hear how that sounds again. And now let's try tweaking the attack curve a little so that it starts out rising more rapidly, which will hopefully add a bit of punch to the sound. Okay, that's sounding good, I think. So, let's also try tweaking the decay to fall off a little faster. Let's try it faster still. Something like this, perhaps. And I'll turn the pitch down as well, just for some variety. I'm pretty happy with this setup for now, I'd say. We're a bit limited, I guess, in how sharp we can make the curves. But for the most part, I think this gives us a good amount of control. What is annoying at the moment, though, is having to manually change the frequency like this if we want to play some sort of tune. So, I'm excited to present a groundbreaking new feature I've been cooking up: multiple notes. They do currently all play the same thing, though. So, we need to figure out which frequencies we actually want to assign to these things. A straightforward solution would be to just pick some base frequency to start with, such as 250 Hz, and then go up in equal increments of, let's say, 50 Hz from there. The synthesizer then handles this by simply looping over all these notes and summing their waves together.
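The discretized-curve idea might look something like this: pre-sample the slow curve at evenly spaced x values once, then answer lookups with a cheap linear interpolation between neighbours. The resolution and the normalized 0-to-1 x range are assumptions; any slow curve function, bezier or otherwise, can be plugged in:

```python
class DiscretizedCurve:
    """Pre-samples a slow-to-evaluate curve at evenly spaced x values so
    later lookups are just a linear interpolation between neighbours."""
    def __init__(self, curve_fn, resolution=256):
        self.resolution = resolution
        self.samples = [curve_fn(i / (resolution - 1)) for i in range(resolution)]

    def evaluate(self, x):
        x = min(max(x, 0.0), 1.0)              # clamp to the sampled range
        scaled = x * (self.resolution - 1)
        lo = int(scaled)
        hi = min(lo + 1, self.resolution - 1)
        t = scaled - lo
        return self.samples[lo] * (1 - t) + self.samples[hi] * t
```

This trades a little accuracy (bounded by the resolution) for constant-time evaluation, which matters when the envelope is queried once per audio sample.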
Okay, it's pretty exciting that we can actually make some sort of music now. Like how's this for a tune? Or let me try something where multiple notes are held at once. You do have to be careful not to play too many notes at once though because if the sum of their waves gets too big then our clipping will come into effect of course and that's not exactly easy on the ears. This should really only happen as a last resort. So, I've been tinkering with a little function to run our audio through first, which looks at how much an input value exceeds some specified threshold and if there's no excess, just returns the input unaltered, but otherwise reduces the excess amount by a given
factor. I also added the option for some smoothing around the threshold. So, we can specify that threshold over here in decibels and tell it how much to bring the result down beyond that. And here's what the smoothing looks like. I still need to hook this up to the synthesizer. So, I've quickly added in a little loop now over all of the audio data we're generating to convert the values to decibels as an approximation of loudness, and then reduce those if they exceed the threshold. Then finally, the result is converted back to raw amplitudes. Okay, with that in place, let's play some notes. And that is sounding absolutely horrible. Though maybe it's good for obnoxious alarm noises or something. I think altering each audio value individually was maybe a mistake. We should rather keep track of the loudest part of the sound encountered so far and how much our curve says to bring that down, and then apply the same reduction in decibels to all the values, though with some smoothing applied to avoid any harsh jumps. We should also fade this reduction back to zero so that it returns to full volume after a little while, at least until another overly loud sound is encountered. All right, that's my attempt at coding a so-called compressor. Not in the sense of fewer bytes, but rather of squashing the loud parts of an audio signal down. Let's give it a try. I think that sounds fine now. Just a bit on the soft side. So, I'm going to move the threshold to only kick in at a louder level. So now individual notes should be mostly unaffected. On the other hand, playing a whole bunch at once should cause the volume to drop a lot more, preventing the sound from clipping. And it seems to be doing its job pretty well. Okay, I think it's time to revisit our choice of note frequencies. They're going up in equal frequency steps at the moment. But if we have a listen, you might agree it feels like the progression kind of slows down as we go.
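Here's a rough sketch of that compressor logic: loudness measured in decibels, the excess above the threshold reduced by a ratio, the gain change smoothed per sample, and the reduction slowly relaxing back to zero. The parameter names, values, and smoothing scheme are guesses, not the video's exact code:

```python
import math

def to_db(amplitude):
    """Amplitude to decibels, guarded against log of zero."""
    return 20 * math.log10(max(abs(amplitude), 1e-9))

def from_db(db):
    return 10 ** (db / 20)

class Compressor:
    def __init__(self, threshold_db=-6.0, ratio=4.0, smoothing=0.01, recovery=0.0001):
        self.threshold_db = threshold_db
        self.ratio = ratio          # e.g. 4:1 -> excess dB reduced to a quarter
        self.smoothing = smoothing  # how quickly the reduction kicks in
        self.recovery = recovery    # how quickly it fades back to full volume
        self.reduction_db = 0.0     # current smoothed gain reduction

    def process(self, samples):
        out = []
        for s in samples:
            excess = to_db(s) - self.threshold_db
            target = excess * (1 - 1 / self.ratio) if excess > 0 else 0.0
            if target > self.reduction_db:
                # Move toward the larger reduction fairly quickly...
                self.reduction_db += (target - self.reduction_db) * self.smoothing
            else:
                # ...and relax back toward zero slowly.
                self.reduction_db = max(0.0, self.reduction_db - self.recovery)
            # The same dB reduction is applied to every sample, as described above.
            out.append(s * from_db(-self.reduction_db))
        return out
```

Quiet signals below the threshold pass through untouched; a sustained loud signal gets pulled down smoothly rather than clipped.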
Apparently, humans perceive pitch logarithmically, meaning if we want it to sound like we're taking equal steps, then the frequencies should rather be in equal ratios. So, I'll define a ratio in the code, like 2:1 for instance, meaning that each note will be double the frequency of the last. I'm going to lower the starting frequency for this a bit. And then let's have a listen to how that sounds. All right. I don't have the best ear for this sort of thing, but I would say that feels like a much more consistent progression. Now suddenly, if we compare it to the same range with equal frequency spacing again, that doesn't sound uniform at all. Also of interest here is that playing those notes together doesn't sound particularly amazing, whereas with our new version it sounds a lot more pleasant. In fact, out of any combination, it seems most people agree that these doubled frequencies fit together especially well. And so it's nice to think of them not as forming different notes, but rather the same note just shifted up a level. We should probably squeeze some extra notes in between though, otherwise our music will sound quite repetitive. So I've been reworking the frequency function slightly to now take in the number of unique notes we want to create. If that's set to one, then the frequencies will double on every step, as we've just been looking at. But if it's set to, say, five for instance, then the frequencies will go up by the fifth root of two, meaning that they double after every fifth step instead. And here's how that sounds. We could squeeze in as many notes as we like, of course, but it's worth weighing the practicalities of actually playing the instrument, and how many of those notes even sound nice together. So, there's heaps of history behind this sort of thing, but a common choice today is 12 unique notes in this perceptually even spacing we've implemented, with a base frequency chosen such that one of them lands on a standardized 440 Hz. Here's how that setup sounds.
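The equal-ratio spacing boils down to a one-liner. A sketch, with a base frequency of 55 Hz chosen here (one possibility) so that three octaves up lands exactly on the standardized 440 Hz:

```python
def note_frequency(step, notes_per_octave=12, base_hz=55.0):
    """Frequency of the note `step` steps above the base, with equal
    perceptual (i.e. ratio) spacing: frequencies double every octave."""
    return base_hz * 2 ** (step / notes_per_octave)
```

With `notes_per_octave=1` each step doubles the frequency; with 5 it goes up by the fifth root of two, doubling after every fifth step, exactly as described above.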
And after the 12th note, we land on double the frequency. So, in a sense, back where we started. An interesting convention, though, is to pick just seven of these 12 to limit oneself to in a piece of music. More or less, at least. And one popular approach to picking them is a pattern that goes like this. That's the pattern of the so-called minor scale. So let's actually align our little keyboard with convention here, and rather than QWERTY, label the seven notes alphabetically as A, B, C, D, E, F, G. And then the eighth, or octave, is back to A again. Meanwhile, the ones we skipped over are called accidentals, which I'll label A#, C#, D#, F#, and G#.
Presumably, this was the preferred pattern of whoever named the notes. But another super popular pattern goes like this instead. And that is the major scale. These patterns can be applied to any starting note. Like, this was A major, but if we apply it starting on C, then that is of course called C major instead. All right, let's arrange our keys more compactly into the familiar piano layout, where the accidentals are moved up into these little black keys, which both saves space and gives a nice guide to where each note can be found. Like, if we look for a cluster of three black keys, the A, for instance, is nestled in on the right. Let's try playing something on here. Okay, playing on a regular keyboard is a little awkward, unsurprisingly, and you don't get any information about how hard you're hitting the notes. So, I'd love to swap it out for this little MIDI keyboard, MIDI meaning Musical Instrument Digital Interface. I was happy to come across this plugin here to help with listening out for these MIDI messages. So, we basically just need to loop through all connected devices, skipping over anything non-musical, and then subscribe to these note-on and note-off events, which will send us a specific number each time a note is pressed or released. I've made a little helper function here to just extract some information from that note number, like which octave it falls into, whether it's an accidental, and what its letter name is. And with that, we can quickly test if this is working properly. Okay, that's looking promising. I'll try pressing down middle C then. And that's being detected. So, let's try D and E as well. And those work, too. Along with the note number, we also get a velocity value between 0 and 1 for how hard the key was pressed. So, I'm just testing that out here quickly, too. But it seems to be working fine. All right, let's increase the size of our virtual keyboard to match the physical one. Ah, that's too many keys.
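A sketch of that note-number helper, using the standard MIDI convention where note 60 is middle C and note 69 is A4 at 440 Hz (the function names here are mine, not the video's):

```python
# Standard MIDI numbering: note 60 is middle C (C4), note 69 is A4 (440 Hz).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def describe_midi_note(note_number):
    """Split a MIDI note number into (letter name, octave, is_accidental)."""
    name = NOTE_NAMES[note_number % 12]
    octave = note_number // 12 - 1   # offset so that note 60 lands in octave 4
    return name, octave, "#" in name

def midi_to_frequency(note_number):
    """12-tone equal temperament anchored at A4 = 440 Hz (note 69)."""
    return 440.0 * 2 ** ((note_number - 69) / 12)
```

The velocity byte that arrives alongside the note number is separate from this; here we only decode the pitch information.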
It should be two octaves, starting on C and ending on C. Okay, and that's looking a little cramped. So, I'll increase the width of it as well. Now, these keys will be triggered by the MIDI events, of course. And I've also gone and scaled the amplitude of the sine wave we're generating by that velocity value we're given. So let's see how that sounds. Okay, that's working nicely. So let's try to work some more on the sound itself. Thus far, we've just been using a single frequency for each note. But if we return to our spectrogram at long last, we saw, of course, that a real note contains many different frequencies. The lowest of these is known as the fundamental, and we can select that in the spectrogram to listen to it in isolation. So that's essentially what we're synthesizing at the moment. But let's try now isolating the other frequencies instead to hear what those are contributing. And then let's hear it all together again as well. All right. Clearly those extra frequencies are adding quite a lot to the sound. And as we noted before, they all seem to be spaced at equal intervals. So whatever the frequency of our fundamental is, the one above that must be twice as much, the one above that three times as much, then four times as much, and so on. This makes some sense if we think about how the piano, and many other instruments, actually make a noise: by way of a string fixed at either end and caused to vibrate. The lowest frequency that could exist on the string would look like this, since any lower and it would no longer fit the fixed constraints. And a higher frequency also wouldn't work, at least not until we got all the way up to precisely double the fundamental frequency. And of course, the next one that would work would be at triple the fundamental, then at quadruple, quintuple, and so on.
I'd love to explore the physics of these standing waves in the future, like how the length and thickness and tightness of the strings affect the frequency, and how it might deviate from the perfect simple sinusoids we're assuming here. But let's keep things easy for now. So, I've been working on a little widget here that lets us control the amplitudes of these various waves. And the waves above the fundamental are known generally as overtones, by the way. Though in the special case where they're at these integer multiples, we can also refer to them as harmonics. Anyway, in the code, the synthesizer now loops over all these harmonics we've defined and increments a phase value for
each of them. And then all of those are simply summed up over here. All right, let's see how it goes. So, I'll play a little tune, like this one from Outer Wilds. And then let's bring in the overtone editor and make some changes, like maybe adding in the second harmonic here. Okay, let's bring the rest of these in as well. That is sounding very buzzy though. So I think I'll bring the upper harmonics down a little. We could also try thinning out the sound a bit, like this. I believe some instruments actually naturally produce this sort of pattern of odd harmonics, I think when there's a tube that's only open at one end, or something like that. Anyway, it's fun to mess about with this and hear the different kinds of noises we can make. But let's return finally to our simple sine wave again for comparison. I want to go back to the reference piano quickly to have a closer look at the strengths of its harmonics. And to help with that, I've added a feature where we can now mouse over a frequency to get a little plot of its amplitude. So we can see the fundamental here has a peak of one, since it's the loudest in the normalized signal. But going up to the next harmonic, we can see that it peaks at around 0.3 instead. And the one above that is about 0.1, then even lower and lower still from there. I've roughly noted these peaks down and transferred them over into our editor here, just out of curiosity for how close this will get us to the real sound. All right, so let's try playing our note. And then here is the real piano. I feel like how the loudness of our sound is changing over time is not quite right. And just comparing the waveforms here does confirm that ours is getting quiet too quickly, and is too curvy. We have the decay time set to 5 seconds at the moment, but it should really be persisting much longer than that. Like maybe 50 seconds, perhaps. The trouble though is that we don't really have enough control with this curve to get the shape that I want.
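The additive loop described a moment ago, sketched out: each harmonic keeps its own wrapped phase and is scaled by its own amplitude before everything is summed. The sample rate and the example amplitudes (loosely based on the peaks read off the piano spectrogram) are assumptions:

```python
import math

SAMPLE_RATE = 48_000  # assumed sample rate

def additive_note(fundamental_hz, harmonic_amplitudes, duration):
    """Sum sine waves at integer multiples of the fundamental, each with
    its own amplitude. harmonic_amplitudes[0] is the fundamental."""
    num_samples = int(duration * SAMPLE_RATE)
    phases = [0.0] * len(harmonic_amplitudes)
    samples = []
    for _ in range(num_samples):
        value = 0.0
        for i, amp in enumerate(harmonic_amplitudes):
            value += amp * math.sin(2 * math.pi * phases[i])
            phases[i] += (i + 1) * fundamental_hz / SAMPLE_RATE
            phases[i] %= 1.0   # wrap each harmonic's phase separately
        samples.append(value)
    return samples

# e.g. roughly piano-like relative strengths for the first few harmonics:
# wave = additive_note(220.0, [1.0, 0.3, 0.1, 0.05], duration=0.5)
```

Note the summed output can exceed 1, which is exactly where the compressor from earlier earns its keep.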
So I've been working on a more free form alternative here where we can just plop down points wherever we please. I've set this up to roughly match the shape we saw in the waveform. And then we have the ability to squish these points down to the first few seconds. I also added a nonlinear view mode to this so we can fine-tune the quieter areas as well since otherwise the values are very tiny. With a bit of tweaking, I was then able to take our waveform from looking completely wrong to having the right sort of idea at least. So, if we play some notes now, it's sounding at least somewhat like a piano. Well, unless we play too high up, that is, at which point it sounds very bright. Let's focus on this upper C here for a second and compare how that sounds to our reference piano. We can see from the spectrogram that the upper harmonics past the fifth or so, let's say, barely exist at all. They're very short and quiet. Whereas in our synthesized sound, they're still pretty prominent. So, the first thing I want to try is just taking those harmonics we set up for the low E, and let me actually switch this to a nonlinear view as well, since the amplitudes are pretty tiny. And then let's create a new version of this to use for our higher notes since the piano kind of becomes a whole different instrument up there. All right, let's see how that sounds. Starting with what we had before and then switching to our new version. And that's definitely sounding better. But taking a look at the spectrogram, it's still not quite right. It looks like the first few harmonics we have should be fading away a fair bit faster. I guess higher frequencies just naturally get smoothed out more rapidly. So, I've been tinkering with another curve here to control how fast different frequencies should decay. At the bottom end, the multiplier is just one. But as we start getting into the upper frequencies, then maybe we want the decay to be five times faster or whatever. 
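One way to sketch that frequency-dependent decay multiplier; the shape here is just as arbitrary as the hand-drawn curve, ramping from 1x at the low end toward 5x at the top (the 8 kHz ceiling and quadratic ease-in are my own guesses):

```python
import math

def decay_multiplier(frequency_hz, max_hz=8000.0, max_multiplier=5.0):
    """How much faster a given frequency should decay: 1x at the bottom,
    ramping smoothly toward max_multiplier at max_hz and above."""
    t = min(frequency_hz / max_hz, 1.0)
    return 1.0 + (max_multiplier - 1.0) * t * t   # ease in toward the top

def harmonic_amplitude(base_amp, harmonic_hz, time, decay_time):
    """Exponential-style decay, sped up for higher harmonics."""
    rate = decay_multiplier(harmonic_hz) / decay_time
    return base_amp * math.exp(-rate * time)
```

With this, a harmonic up near the ceiling dies away several times faster than the fundamental, which is the "less shrill" effect being aimed for.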
I've based the shape of this curve on nothing at all, by the way. So, don't read anything into it. But let's have a listen again to how it sounds without this. And then here's with our new curve enabled. The difference is kind of subtle, but it's definitely a bit less shrill. We can also compare on the spectrogram again, and it seems at least a little
closer to the reference. Now, listening to the reference again, though, there's clearly still something missing from our version. So, I want to zoom in a little here, and we can see that there's quite a lot going on in between the harmonics. I'm curious how this would sound if we actually go ahead and cut out the harmonic frequencies, so we have just this noisy looking stuff spread across the rest of the spectrum. It sounds like what's left behind is, I guess, the thock of the actual key, and maybe hammer, or whatever else they put in pianos. So, I've tried setting up a version of our widget here where we can set the amplitudes of specific frequencies rather than just the integer harmonics. And as we can hear, that lets us make duller sorts of atonal sounds. Still, even after messing about with this for a while, I really struggle to get anywhere close to what we're looking for. Of course, we could simply use what we extracted from the recording as a sample at the start of our synthesized sound, but I think it's more fun to recreate everything ourselves. So, I'm going to leave this as a challenge for another time, and we'll just remain sadly thock-less for now. What we do still need to do, though, is take these harmonics we've defined for the middle and upper C and fill in the information for all the notes in between. So I've just been writing a quick function here that takes in a note, figures out where it lies between those two we've defined, and sets the note's own harmonics to be a blend between them. Okay, let's try that out. And I think that sounds reasonable, but maybe a little bit bland. Like, if we take a look at the spectrogram of this D on our reference piano, for instance, we can see it actually has a whole character of its own. Like, I was surprised to see that the fundamental frequency is not actually the loudest one here. Instead, the second harmonic is a bit stronger.
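That blending function might look like this, written so it also generalizes to more than two defined notes, since more get added later. The property names and dict layout are assumptions, not the video's exact code:

```python
def blend(a, b, t):
    """Linear blend; lists (e.g. harmonic amplitudes) blend element-wise."""
    if isinstance(a, list):
        return [blend(x, y, t) for x, y in zip(a, b)]
    return a * (1 - t) + b * t

def interpolate_keyframes(keyframes, note_number):
    """Find the nearest defined notes below and above `note_number` and
    blend their properties. `keyframes` maps note number -> property dict."""
    positions = sorted(keyframes)
    if note_number <= positions[0]:     # clamp below the lowest defined note
        return keyframes[positions[0]]
    if note_number >= positions[-1]:    # and above the highest
        return keyframes[positions[-1]]
    for lo, hi in zip(positions, positions[1:]):
        if lo <= note_number <= hi:
            t = (note_number - lo) / (hi - lo)
            return {key: blend(keyframes[lo][key], keyframes[hi][key], t)
                    for key in keyframes[lo]}
```

Notes outside the defined range just reuse the nearest definition rather than extrapolating.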
I don't know why that is exactly, but everything from the piano's shape and size to the type of wood plays a role in amplifying or damping certain frequencies. So maybe that has something to do with it. Anyway, let's stick with the interpolation for convenience today, but we do need to have a look at the lower octave still. I'll load up this low C from our reference. And wow, that is a lot of harmonics. But let's have a listen. Okay. So, there are techniques that can efficiently handle loads of harmonics like this, but with the approach we're using today, where we generate them one by one, this will be very expensive. So, we're going to have to cut some of them out. Let's isolate these upper frequencies here and hear what those sound like. Okay. Obviously, they are contributing something, but I think if we just stick to this lowest thousand hertz or so, we still get a decent, if somewhat duller, sound. All right, I'm going to zoom in here and just start noting down the peak amplitudes of these waves. The fundamental seems to be very weak in this case, and even the second harmonic is not so strong. In fact, it's only all the way up at harmonic number seven where we actually find the strongest one. It's kind of interesting the way this bounces between loud and soft, and you may have noticed a bit of this going on in the other notes as well. I think this happens because each note is played not by a single string, in fact, but rather two or three, to help reinforce the sound. And if those are a tiny bit out of tune, then the subtle difference in frequency will result in interference, making the volume fade in and out like we saw. So that's certainly an effect we could incorporate if we wanted. I want to have a look at some more of these harmonics here, though. This is an interesting looking one, with its amplitude oscillating very rapidly.
But anyway, it looks like many of the frequencies here are decaying significantly slower than we've seen before, which I imagine is on account of lower notes actually using thicker strings, which can sustain the vibrations for longer. So, I've just been expanding our interpolation function to now allow us to specify information about whichever notes we want, which I've called our key frames. It then just finds the nearest key frame above and below the given note and interpolates its properties between them. The properties being the harmonics, of course, but now also a decay multiplier, so we can say that all the frequencies of a particular note should last a little longer, for instance, to crudely emulate a thicker string. All right. So here's what our low C was sounding like before. And with our new tweaks, here's how it's sounding now. Definitely an improvement, I'd say. Though, if we listen to the real one again, ours does still sound quite synthetic. Here's a side-by-side of the real and synthesized spectrograms for interest's sake. And I think it'll be fun in the
future to try and make something that really sounds convincing, but I'm happy enough with the humble start we've made today. Let's just try out the rest of the notes quickly. And this is with key frames on each of the C's. All right, not too bad. And I want to switch over to a zoomed-out view of all the little widgets we've set up, so we can appreciate everything we've put into this simple sound. Okay, to end with today, I'd just like to mess about a bit and see what little tweaks we can make to get some different sounding things out of this. In fact, I was cleaning up the code a bit behind the scenes and somehow ended up sticking the sine waves we're generating back into the sine function again. So the next time I ran it, it sounded like this. That is a much brighter tone than before, which is obviously not intended. If we take a quick look at the spectrogram of our original synthesized note, followed by now the funky double-sine version, this is the effect of adding a whole host of extra harmonics. Albeit not in a very controllable fashion, so it's probably not terribly useful. But still, it was a fun little accident. Anyway, I want to just mess about with the harmonic editor a bit longer and make some more noises. After playing around for a while, I ended up with this little setup, modeled very loosely after a marimba. There's plenty of room for improvement, but it at least sounds like a different instrument, which I think is cool. Another way we could make different sounds: currently the decay curve takes us from full amplitude down to zero, but some instruments are able to sustain their sound for as long as you have breath to blow, for instance. So we could set a sustain level, meaning that the sound could get a little softer, if we wanted, after the initial attack, but will then maintain that level until the note is released. This lets us create something like a flute, for instance.
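A small aside on the double-sine accident from a moment ago: feeding a sine wave back through sin() is a form of waveshaping, and it spills energy into the odd harmonics, which is why the result sounds brighter. A self-contained check using a naive direct DFT (the signal lengths and thresholds here are just for the demo):

```python
import math

def dft_amplitude(samples, harmonic, fundamental_cycles):
    """Amplitude of one harmonic via a direct DFT (slow but fine for a demo).
    Assumes the signal contains `fundamental_cycles` whole cycles."""
    n = len(samples)
    k = harmonic * fundamental_cycles
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * math.sqrt(re * re + im * im) / n

n, cycles = 1024, 8
pure = [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]
double = [math.sin(s) for s in pure]   # the accidental sin(sin(x))
```

The pure tone has essentially no third harmonic, while sin(sin(x)) gains a measurable one (around 0.04 in amplitude), along with smaller contributions at the fifth, seventh, and so on.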
And I actually had a look at the spectrogram for a flute, and there's a lot of strange stuff going on. Like, this harmonic is pretty reasonable: it gets loud, then decays a bit and kind of levels out, as we expect. But then this other one decays steadily over the whole duration, while yet another gets slowly louder the whole time. You flautists have a lot of explaining to do. But here's how it's supposed to sound. And here's my quick attempt at synthesizing it, ignoring all that complexity. Unsurprisingly, not super realistic. But anyway, having this new sustain setting does open up a lot of new possibilities. All right. Now, what we've been doing today, where we simply add together sine waves, is a technique known as additive synthesis. But there are a number of different techniques beyond this, doubtless more than I'm listing here. These are just the ones I've heard of. And obviously, there's much more we could still do with this additive approach as well. So, I'd love to hear any ideas you might have for future explorations. For now, though, let's just put all our crude little instruments together for one final performance.