Bayesian Maximum A Posteriori Estimation (MAP): Extending Maximum Likelihood Estimation




Contents (3 segments)

Segment 1 (00:00 - 05:00)

Welcome back. We have been talking a lot about the maximum likelihood estimator, which lets us estimate the unknown parameters of a probability distribution using data: it is a method in statistics for estimating parameters given data. We have hinted that the maximum likelihood estimate is related to a Bayesian formulation, a Bayesian version of the same problem, so today I want to flesh that out and make the connection concrete, so you can see how to go between an MLE and a Bayesian estimate of those parameters. In future lectures we are actually going to code this up and do some examples.

Remember what the maximum likelihood estimation problem looks like: we take our probability density function and form the likelihood function. Essentially, we take our PDF and plug our measurement data in for the variables. If I have a Gaussian, proportional to e^(-(x - mu)^2 / (2 sigma^2)), I take the actual numbers I collected and plug them in for the variable x, and now I have something that is only a function of my unknown parameters. If I take the logarithm of that probability density function, I get the log-likelihood function, which tells me, roughly, the likelihood of observing this specific data given those specific parameters. What we are trying to do is tweak, or optimize, those parameters to maximize the likelihood: we maximize over all theta to find the parameters that are most likely, that is, most consistent with the measurement data we have access to.

Now, there is a big downside to maximum likelihood estimation: it is fragile to bad data, and it does not let me include any prior knowledge or beliefs about the parameters. Let me give you a really simple example of that downside. Say I am trying to estimate the probability of a coin coming up heads versus tails, so there is a single parameter theta, and, you know,
for a fair coin it would be 0.5, so theta would be 0.5 for a fair coin, and for a biased coin it would be somewhere between 0 and 1. Suppose it actually is a fair coin, say I know it is fair, and I flip it three times, and each of those three times I just get unlucky. It is not even unlucky; it just happens sometimes: I flip a coin three times and get heads, heads, heads, three heads in a row. That is the actual data I have. With three heads in a row, the maximum likelihood estimate theta-hat equals one: it says the probability of getting heads is one. You can go through and actually calculate this. These are Bernoulli random variables, so you can compute the estimate for n = 3 Bernoulli coin flips and convince yourself that, with this data, the maximum likelihood estimator says the coin will come up heads on every future flip. That is a really bad failing of the MLE: if I give it a little bit of data and I happen to get an unlucky draw of that data, I get a really bad estimate. This is a problem with lots of estimation techniques, not just MLE; it is a problem with what we might call deterministic estimation techniques. The solution is a Bayesian formulation: use Bayes' theorem to incorporate some prior knowledge about the parameter theta. Literally, I might have a prior distribution for what I think theta is, maybe a fair coin. For example, my strong prior belief might be that the coin is fair, so theta is close to 1/2, and, being very loose here, I might say that it is a normally
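The all-heads failure mode above can be sketched numerically. This is a minimal sketch, not code from the lecture: it assumes three observed heads, finds the MLE by a simple grid search over theta (the closed form is just k/n), and shows the estimate being pushed to the boundary.

```python
import numpy as np

def bernoulli_log_likelihood(theta, flips):
    """Log-likelihood of i.i.d. coin flips (1 = heads) given P(heads) = theta."""
    k = np.sum(flips)                 # number of heads observed
    n = len(flips)                    # total number of flips
    return k * np.log(theta) + (n - k) * np.log(1 - theta)

flips = np.array([1, 1, 1])           # three heads in a row, the unlucky draw
thetas = np.linspace(0.001, 0.999, 999)
theta_mle = thetas[np.argmax(bernoulli_log_likelihood(thetas, flips))]
# theta_mle sits at the top of the grid (~0.999): with all-heads data the
# MLE pushes P(heads) to 1, matching the closed form k/n = 3/3 = 1.
```

The grid search stands in for the calculus: with all heads, the log-likelihood 3·log(theta) is monotonically increasing, so the maximizer is at the boundary.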

Segment 2 (05:00 - 10:00)

distributed variable with mean 1/2 and some variance. There are lots of ways of encoding this prior knowledge; that is a whole deeper set of lectures. The idea here is that maximum likelihood estimation has this kind of canonical fault (this is a cartoon example showing how bad it can get), but if we use Bayesian statistics we may be able to incorporate some prior knowledge about theta and make the estimate more robust to these bad, unlucky draws.

So let's write down what that looks like. In MLE we use the probability of the data given the parameters, P(x | theta). We have talked about this a lot. But suppose we also have a prior on theta, often written simply P(theta): a distribution for what I think theta is. That could literally be a normal distribution around 1/2, a pretty tight prior if I think the coin is fair. If we have a prior on theta, we can multiply the two together to get P(x | theta) P(theta), and if I divide by P(x), so it looks exactly like what we are used to from Bayes' theorem, this equals the probability of theta given x:

P(theta | x) = P(x | theta) P(theta) / P(x).

In a lot of circumstances this is much more useful, for a few reasons. First, optimizing the likelihood over theta can be messy; it can be a hard optimization problem. And P(theta | x) is really what I am trying to do: given data, what is the probability of this value of theta? So I also want to maximize this quantity over theta; that would be a useful thing to maximize. To label the terms: P(theta) is my prior, P(theta | x) is my posterior distribution, and the rest are the other distributions in Bayes' theorem. So if we have a prior, we get something that looks a whole lot like Bayes' theorem. But we do not always know the probability of x; we do not
always have that quantity. The thing that makes this nice for optimization is that if we are optimizing over theta, which is what we are trying to do, P(x) does not depend on theta. So, and this is a hand-wavy argument that you can make precise, the posterior P(theta | x) is proportional to P(x | theta) P(theta): it varies with theta in the same way. Roughly speaking, if all I am trying to do is find the theta that maximizes the posterior, then instead of maximizing P(theta | x) I can maximize P(x | theta) P(theta), and I will get the same theta. That is really useful. Let me write this out so it is really clear. Instead of computing the classic maximum likelihood estimate,

theta-hat_MLE = argmax over theta of log P(x | theta),

we compute the maximum of the log of the posterior, which, plugging in the expression above, is

argmax over theta of log P(theta | x) = argmax over theta of log [ P(x | theta) P(theta) ],

that is, the log of the probability of the data given the parameters times the prior distribution on the parameters. These are the two equivalents: if the first is the MLE, the second is the MAP, the maximum
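The log-posterior maximization above can be sketched in the same grid-search style. This is an illustrative sketch, not the lecture's own code: it assumes a Gaussian prior centered at 1/2, matching the fair-coin discussion, with a prior width sigma = 0.1 that is my own choice; the theta-independent terms, including log P(x), are dropped, which does not change the argmax.

```python
import numpy as np

def log_posterior_unnormalized(theta, flips, mu=0.5, sigma=0.1):
    """log P(x|theta) + log P(theta), dropping the theta-independent log P(x).
    The Gaussian prior N(mu, sigma^2) encodes the fair-coin belief; the width
    sigma is an assumed value, not one given in the lecture."""
    k, n = np.sum(flips), len(flips)
    log_likelihood = k * np.log(theta) + (n - k) * np.log(1 - theta)
    log_prior = -0.5 * ((theta - mu) / sigma) ** 2   # up to an additive constant
    return log_likelihood + log_prior

flips = np.array([1, 1, 1])           # the same unlucky three heads
thetas = np.linspace(0.001, 0.999, 999)
theta_map = thetas[np.argmax(log_posterior_unnormalized(thetas, flips))]
# theta_map lands between the prior mean 0.5 and the MLE value 1.0:
# the prior pulls the estimate back toward the fair-coin belief.
```

With this prior width the maximizer comes out a little above 0.5, rather than at the boundary: the data nudges the estimate away from the prior mean, but three flips are not enough to overwhelm a tight fair-coin prior.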

Segment 3 (10:00 - 13:00)

a posteriori estimate, the MAP estimate; I am going to write that out: maximum a posteriori. Essentially, what we have done is replace the log-likelihood function in the MLE with a slightly more informative log-likelihood that is informed by the prior knowledge about theta, in this case my belief that the coin is fair. We can code this up, and we are actually going to do examples: coin flips, this exact example, and also a least-squares estimation, trying to estimate the slope of a scatter of data points using maximum likelihood estimators and Bayesian-informed estimation with some prior knowledge baked in, maybe that the intercept is zero or something like that. So this is a really clever and simple way of incorporating prior knowledge into maximum likelihood estimation to make it more robust to bad, unlucky draws of the data, and to other things: outliers, malicious attacks, and so on. It is a really useful idea.

Now, this relies on Bayesian statistics, and although it sounds like it solves all of our problems, it has some issues. You had better have a good prior: if your prior is bad, the MAP estimate will be bad. You need a good prior for this to actually improve things; that is one issue. Also, often you do not really have an unlucky draw or such a small data set, so MLE is not as bad as I made it sound; usually it works really well. You need a good prior for this Bayesian version, this maximum a posteriori estimate, to be good. And there is an interesting connection: the maximum likelihood estimate can be thought of as a special case of the maximum a posteriori estimate when your prior is maximally uninformative. If you are Bayesian in spirit, you will know what that means: if my prior on theta has essentially infinite variance, meaning it is a super uninformative, super weak prior, then the MAP estimate will converge to the MLE. So
the MLE is a special case of the Bayesian version, the MAP estimator, when my prior is maximally uninformative. But if I have a good prior, I can do better: I can incorporate that prior knowledge into the estimation problem using this Bayesian analog. We will see more of this later; it is a big topic in machine learning, optimization, and statistics, so we will get into it. It might take me a few lectures, but stay tuned for that. All right, thank you.
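The limiting connection between MAP and MLE can be checked numerically. This is a small sketch under the same assumed Gaussian fair-coin prior as before; both prior widths below are my own illustrative choices, not values from the lecture.

```python
import numpy as np

def map_estimate(flips, sigma, mu=0.5):
    """Grid-search MAP estimate of P(heads) under a Gaussian N(mu, sigma^2) prior."""
    thetas = np.linspace(0.001, 0.999, 999)
    k, n = np.sum(flips), len(flips)
    log_post = (k * np.log(thetas) + (n - k) * np.log(1 - thetas)
                - 0.5 * ((thetas - mu) / sigma) ** 2)
    return thetas[np.argmax(log_post)]

flips = np.array([1, 1, 1])                # three heads in a row
tight = map_estimate(flips, sigma=0.05)    # strong fair-coin prior
weak = map_estimate(flips, sigma=50.0)     # nearly flat, uninformative prior
# tight stays near 0.5, while weak approaches the MLE value of 1: as the
# prior variance grows, the MAP estimate converges to the MLE, illustrating
# that the MLE is the MAP estimate with a maximally uninformative prior.
```

Sweeping sigma between these extremes traces a continuous path from the prior mean to the MLE, which is exactly the trade-off the lecture describes: a good prior helps with unlucky small samples, and a flat prior recovers plain maximum likelihood.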

More videos by the author: Steve Brunton

