Google's NEW "Med-Gemini" SURPRISES Doctors! (Google's New Medical AI)
25:41


TheAIGRID · 03.05.2024 · 44,376 views · 754 likes


Video description
Google's NEW "Med-Gemini" SURPRISES Doctors! (Google's New Medical AI)

How To Not Be Replaced By AGI https://youtu.be/AiDR2aMye5M
Stay Up To Date With AI Job Market - https://www.youtube.com/@UCSPkiRjFYpz-8DY-aF_1wRg
AI Tutorials - https://www.youtube.com/@TheAIGRIDAcademy/
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

00:22 Intro
00:31 Med Gemini Development
02:34 Fine Tuning
03:46 Previous SOTA
06:18 New Benchmark
08:53 Advanced Reasoning
10:25 Model Architecture
13:46 Video Benchmarks
14:27 Overall Benchmarks
16:55 Benchmark Problems
18:45 Dialogue Example
23:42 AMIE Vs Med Gemini

Links From Today's Video: https://arxiv.org/pdf/2404.18416

Welcome to my channel where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed?

(For Business Enquiries) contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Table of contents (12 segments)

Intro

very effectively for helping out in the medical industry. So this is quite surprising, because I didn't expect such a system from Google yet. But they also

Med Gemini Development

released something earlier this year which was actually pretty similar. If you remember, this is something that I spoke about: this was AMIE, the Articulate Medical Intelligence Explorer, an advanced AI research system developed by Google, released around 3 to 4 weeks ago. It was basically designed to handle diagnostic reasoning and engage in meaningful conversations within a medical context, aiming to enhance the interactions between physicians and patients as well as improve the quality and accessibility of consultations. The reason that AMIE was so good is that it was able to use a simulated learning environment to enhance its learning: it engaged in diagnostic dialogues with AI patient simulators, allowing it to practice and continually refine its conversational and diagnostic skills, and it was trained on a huge, diverse set of medical data including real-world clinical conversations and medical reasoning scenarios. One of the key things about AMIE is that when it was pitted against clinicians, it showed us that this was something far more effective when humans used it in the loop. You can see on this graph that the AMIE-only system performed increasingly better than the clinician unassisted and the clinician assisted by search (and search is essentially just the internet), which shows a huge improvement in the gaps. Then we can see that "assisted by AMIE" is a stark increase from just the clinician unassisted, and of course AMIE-only compared to the clinician assisted by AMIE shows us that AMIE actually did surpass the clinician. So basically, and I know this isn't Med-Gemini just yet, this showed us that Google is increasing its efforts in medical health research, because these AI systems like AMIE are far superior to just the clinicians. Now essentially, like

Fine Tuning

I said, this is of course Med-Gemini. So what we have here is the initial Gemini system: Gemini is a family of powerful AI systems that are completely multimodal, and you can see the inherited capabilities such as advanced reasoning, multimodal understanding, and long-context processing. This is where they decided to develop Med-Gemini: on top of the advanced reasoning they did medical specialization with self-training and web-search integration; on top of the multimodal understanding they did fine-tuning and customized encoders; and on top of the long-context processing they did chain-of-reasoning prompting. With all of these skills combined, that's where we get Med-Gemini, the version of Gemini specialized for medical applications. It's very fascinating, because if you're someone who hasn't been paying attention to the space, this is an industry that is truly about to be disrupted: the benchmarks are looking pretty incredible in terms of the applications. Now, one of the things that was there

Previous SOTA

before was, of course, the previous state of the art. In terms of medical AI systems that can answer questions and queries accurately, you can see that from September 2021 all the way to September 2023 there has been a huge increase in what these AI systems have been able to do, noticeably the jump from GPT-3.5 to Google's Med-PaLM, then to GPT-4 and Med-PaLM 2, and then the previous state-of-the-art model before Med-Gemini. And now of course we do have a new state-of-the-art system, which is Med-Gemini. The state of the art before the one released today by Google (well, not actually released, but the paper was released) was actually GPT-4, and not just the base version, although the GPT-4 base version was very close to Google's medical one: it was GPT-4 with Medprompt, which gets it to 90.2% on MedQA, a standard benchmark for these AI systems. So GPT-4 with Medprompt had a very high benchmark score, but once again Google has beaten them. The reason I'm showing you this infrastructure for GPT-4 with Medprompt is that it shows how crazy it actually is: this takes us from the base level of GPT-4, and you can see all of the different things they added on top of GPT-4 in order to get the system to perform a lot better. What's crazy is that Med-Gemini is so effective, and we're about to get into why, without using all of these techniques, like the ensemble with choice shuffle. If you don't know what that is: in multiple-choice questions there is sometimes a bias toward the first answer option. For example, if you were to ask someone what the primary gas found in Earth's atmosphere is and give four answers, a lot of people might subconsciously think the first one is correct. So essentially you shuffle the options, ask repeatedly, and pick the most common answer across the shuffles; that's the one you use. It's pretty crazy how many different iterations they did on top of GPT-4 to get to 90.2%, and surprisingly Google managed to surpass this benchmark: you can see right here that this is where Google's Med-Gemini comes in on MedQA at
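The choice-shuffle trick described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not Medprompt's actual implementation; `ask_model` is a hypothetical stand-in for a real LLM call.

```python
import random
from collections import Counter

def choice_shuffle_ensemble(ask_model, question, options, n_votes=5, seed=0):
    """Majority-vote over several shuffles of the answer options.

    `ask_model(question, options)` stands in for an LLM call that returns
    the chosen option *text* (not its position), so votes can be compared
    across different orderings.
    """
    rng = random.Random(seed)
    votes = []
    for _ in range(n_votes):
        shuffled = list(options)
        rng.shuffle(shuffled)  # a position-biased model now sees each option in varying slots
        votes.append(ask_model(question, shuffled))
    # Positional bias averages out across shuffles; the most common answer text wins.
    return Counter(votes).most_common(1)[0][0]
```

Because votes are tallied by answer text rather than by answer letter, a model that always leans toward "option A" no longer gets a consistent advantage for any one answer.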

New Benchmark

91.1%, which is very interesting in terms of the increase. I guess some people could argue that maybe we are starting to peter out in terms of the capabilities of large language models on these benchmarks, but I would disagree, because whilst that might be partly true, Med-Gemini is pretty great. One of the key things you need to know about the benchmarks is that, as it says here, relabeling with expert clinicians suggests that 7.4% of the questions in the dataset have quality issues or ambiguous ground truth. One thing that has been consistently problematic in the AI benchmarking industry is that these systems unfortunately have to be graded against standard benchmarks, and these benchmarks contain thousands and thousands of questions, some of which are quite ambiguous or have quality issues, meaning the AI systems can't even get the answers correct because the questions literally don't make sense. I wish I had some examples to show you, but trust me when I tell you that some of the questions are completely insane. So what they're stating is that 7.4% of this might not even be that great, because the benchmarks are potentially facing several quality issues, and whilst you might think this is something that peters out in the future, we do know that more advanced reasoning systems could take this closer to 100%, in terms of not only the system being good but also the benchmarks performing a lot better once those quality issues get fixed. One of the things we can also see about the medical benchmarking here is that there are several categories in which Med-Gemini surpasses the previous state of the art. The blue one here is the main focus, and the blue one is Med-Gemini: it surpasses the previous state of the art, GPT-4 with Medprompt, in nearly every single category. There are some where it's pretty much on par, and only on long-context video is it not ahead, but you can even see right here it says GPT-4 results are not available due to context-length limitations. Advanced text reasoning has done pretty well, and multimodal understanding has done pretty well too. Essentially, if you just want to see how much better it is, look at this line: anything above it is an improvement, and in pretty much all of these categories we do have a decent improvement. So one of the things here is

Advanced Reasoning

Med-Gemini on advanced text-based reasoning tasks. You can see here that it surpasses the previous state of the art in many different categories, and one of the things they actually did talk about is how this system compared with GPT-4: in some scenarios GPT-4 just didn't have the context length to support long-context reasoning. It's important to know that long-context reasoning, with Google's new context length, is actually a pretty important feature for the future, because it allows us to process more information. The thing is, in the medical industry, the more data you have, the more comprehensive a picture you have, because the human body is made up of so many intricate, interconnected parts. If you have a long context window you're able to fit more data in and arguably reach a better conclusion about what the diagnosis may be, or what's going wrong with a certain individual's body. So it's important to have that for the future, especially in terms of needle-in-a-haystack retrieval, which is where you're trying to get a specific piece of data out of a long stretch of context, use it correctly, and of course reason with it correctly. So we can see that Med-Gemini surpasses the state of the art, and we can see it here as well compared to AMIE, where it surpasses that too. Now, some of these margins aren't that crazy, but it seems like maybe there are going to be some advanced reasoning techniques. Now, one of the things that I did actually see that was pretty cool
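The needle-in-a-haystack evaluation mentioned above can be illustrated with a small sketch: bury one clinically relevant sentence at a chosen depth inside a long filler document, then check whether a long-context model's answer surfaces it. This is a generic illustration of the evaluation style, not the paper's harness; the filler text and the crude substring check are assumptions.

```python
import random

def make_haystack(needle, filler_sentences, n_filler, depth, seed=0):
    """Bury one relevant sentence (the 'needle') at a relative depth in
    [0, 1] inside a long document of repeated filler text."""
    rng = random.Random(seed)
    doc = [rng.choice(filler_sentences) for _ in range(n_filler)]
    doc.insert(int(depth * len(doc)), needle)  # place the needle
    return " ".join(doc)

def recalls_needle(model_answer, expected_fact):
    """Crude check: did the model's answer surface the buried fact?"""
    return expected_fact.lower() in model_answer.lower()
```

A real evaluation would sweep both the context length (`n_filler`) and the `depth`, and plot retrieval accuracy over that grid.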

Model Architecture

was that the advanced reasoning that we did see was a little bit different. They had two ways of doing advanced reasoning with this, and I think this is probably what we're going to see in terms of different models specialized for different use cases. In the case of Med-Gemini, they essentially had self-training and search, and these were leveraged to enhance its capabilities in handling complex medical data and queries. So let's dive into how these features were used by Med-Gemini. The self-training aspect is where you use the model's own outputs to generate new training examples, which are then used to further improve the model. This method is particularly beneficial for refining the model's capabilities in areas where the initial training data might be limited or lack diversity. Essentially, the first thing they do is generate synthetic examples: Med-Gemini processes medical data or queries and generates responses based on its current understanding, and these responses, along with the context in which they were made, serve as new training examples. Then we have refinement: these generated examples are fed back into the training cycle, allowing Med-Gemini to learn from its own outputs. This iterative process helps the model continually refine its reasoning and decision-making capabilities, especially in handling complex medical scenarios. Then we have enhanced learning from simulators, where the simulations could involve creating scenarios in which the model must interpret complex medical data from text, images, or even long medical records, and feedback from the simulations helps the model adjust its methods for better accuracy and reliability. And then of course we have search: when Med-Gemini encounters a question it might struggle with, has low confidence on, or has insufficient internal data for, it can perform a web search to gather additional information. You can see it here: it says "confident: no", and then we go to web search. There's also an uncertainty-guided search, where Med-Gemini employs a strategy in which, when the model calculates that its predictions have high uncertainty, it proactively searches for more information before finalizing its response, and this method helps the accuracy and reliability of its outputs. So this is the loop it conducts in order to get better results. Then of course we have continuous updating of knowledge: the ability to search for and integrate information from external sources means that Med-Gemini can continuously update its knowledge base without the need for frequent retraining, which is pretty crucial in the medical field, where new research and clinical practices can change standard care protocols. By combining these two approaches, it can better adapt to new or rare medical scenarios; it has access to the latest medical information via search, which means its answers are based not only on the initial training data but also on the latest studies, clinical trials, and guidelines; and through continuous learning and adaptation it becomes more proficient at handling diverse and complex medical queries, making it a valuable tool for medical professionals seeking AI support. Then of
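The uncertainty-guided search loop described above (answer, check confidence, search the web only when unsure, then re-answer grounded on the retrieved evidence) can be sketched roughly like this. This is a simplified sketch under stated assumptions: agreement across sampled answers stands in for the paper's uncertainty measure, and `generate` and `web_search` are hypothetical callables, not a real API.

```python
from collections import Counter

def uncertainty_guided_answer(generate, web_search, question,
                              n_samples=5, agreement_threshold=0.8):
    """Answer a question, falling back to web search when uncertain.

    `generate(prompt)` samples one answer from a model; `web_search(query)`
    returns retrieved text. Agreement among the sampled answers serves as
    a cheap confidence proxy.
    """
    samples = [generate(question) for _ in range(n_samples)]
    top_answer, top_count = Counter(samples).most_common(1)[0]
    if top_count / n_samples >= agreement_threshold:
        return top_answer  # high agreement: answer directly, no search needed
    # Low agreement: retrieve external evidence and answer once more,
    # this time grounded on the search results.
    evidence = web_search(question)
    return generate(f"Context: {evidence}\n\nQuestion: {question}")
```

The design point is that the (slow, rate-limited) web search is only paid for on the questions where the model's own answers disagree with each other.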

Video Benchmarks

course, here are the video benchmarks, and this is something where it's pretty good. We do know that Google's Gemini 1.5 Pro can take in, I think, an hour of video, which means it can also look at these videos and analyze the scene in a medical way. You can see the previous state of the art right here, where we've got Med-PaLM M and GPT-4 with vision, and I'm wondering what will happen when GPT-4 gets video capabilities and how it's going to compete with Med-Gemini on this as well, which is going to be pretty crazy. And then of course here is where we have the capabilities of Gemini models in

Overall Benchmarks

medicine, and this is where we have the actual benchmarks. You can see the accuracy from the clinician, then the clinician with search, then the prior state of the art, then Med-Gemini, then Med-Gemini plus search. So we can see a stark improvement, like I showed you before with AMIE (this is the AMIE stuff right here, I'm guessing, and then this is GPT-4 or AMIE, I'm not sure which one). But what we do have here is a clear conclusion from this graph: the clinician alone is the lowest rated in terms of accuracy, and if we go all the way up we can see that Med-Gemini plus search gives us a huge increase in performance. So there is a huge gap here in what's being done, and I think it's rather impressive how AI is able to bridge this knowledge gap for clinicians and help them, so I definitely think this is something that could provide a lot of help to people. The only thing I think could be problematic is that hopefully humans don't completely rely on the AI, because sometimes the AI might miss certain things in a diagnosis, and I think humans always have a lot more data points than an AI system. Something I've noticed, and this is a little off tangent but it does make sense here, is that if you are prompting GPT-4 or any other AI system, sometimes you'll think "why can't this system produce what I want?", but the more information you give these systems, the better they are at giving you the output you want. For example, if you're asking it to write a certain essay and you say when the deadline is, what your teacher is like, the style, the length, and a few other things it needs, the output improves dramatically. That's something I've seen personally, and that's why I say humans would have a lot more information: a patient might go to a system and say "oh, I have a runny nose", and if that's all you type, that's the only piece of information the AI has; whereas a human who's in the room with you can see how your skin looks, how you're moving around, and whether you're fatigued; they know your age, what you look like, your

Benchmark problems

family tree. So the point here is that I do think this is definitely going to improve as we move on, and hopefully this can be a very good tool for clinicians in the future. Now, like I said before, whilst this is doing well, there are some problematic benchmarks, as I spoke about previously. Revisiting the MedQA benchmark, there were some problems: it says that some MedQA test questions have missing information, such as figures or lab results, and potentially outdated ground-truth answers. To address these concerns, they had at least three US doctors review each question: answer the question themselves, check if the original answers were still correct, and note any missing details or ambiguous elements. They used a method called bootstrapping, where a committee of three reviewers decided whether a question should be excluded due to its flaws. The findings were that 3.8% of questions were missing information, 2.9% had incorrect answers, and 0.7% were ambiguous. Most reviewers agreed on these assessments, showing strong consensus, and removing flawed questions helped improve the AI's test score from 91.1% accuracy to 91.8%. If they used majority decisions, the more relaxed criterion, instead of unanimous ones, the accuracy further increased to 92.9% by dropping about a fifth of the problematic questions. So in simple terms, by cleaning up the test and removing or fixing flawed questions, they made it a better tool for accurately assessing the AI's ability to handle medical queries. Now, here's where
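The effect of dropping flawed questions on the reported score is simple arithmetic over the remaining questions. The sketch below uses the 91.1% starting score and the roughly 7.4% flaw rate quoted above, but the absolute counts (a 1,000-question set, and how many excluded questions the model happened to get right) are illustrative assumptions chosen to land near the quoted 91.8%.

```python
def accuracy_after_exclusion(n_total, n_correct, n_excluded, n_excluded_correct):
    """Accuracy over the remaining questions once flawed ones are dropped."""
    return (n_correct - n_excluded_correct) / (n_total - n_excluded)

# Illustrative counts: 1,000 questions with 911 answered correctly (91.1%);
# reviewers unanimously exclude 74 flawed questions (~7.4%), 61 of which the
# model happened to answer correctly anyway.
print(round(accuracy_after_exclusion(1000, 911, 74, 61), 3))  # → 0.918
```

The score rises only when the excluded questions were ones the model disproportionately got wrong, which is exactly the claim being made about the flawed items.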

Dialogue Example

we actually get into some of the dialogue examples for this AI system, and I think this is where we can see how this multimodal system works. It's not something that's too crazy, but I think you can all truly understand exactly how it works. You can see the dialogue examples: the person leaves a message stating exactly what's going on, and then Med-Gemini says, "I understand your concern, can you send me a picture of whatever it is you're dealing with?", and they send a picture. Then it queries and asks for more information, then it says "this is what I think it is". The user asks, "Thanks, could you explain what this is?", and it tells them what the diagnosis is, while noting what is needed before a definitive diagnosis can be made. The user says, "Okay, will you advise me on how to treat this?", and you can see it's able to give out a lot of information that could help this person solve the issue. Like I said before, this stuff is really good, because with doctors you have an appointment of maybe 15 to 30 minutes, but with an AI system you could ask a million questions and it's going to be patient, it's going to understand you, and it's going to be able to talk to you in many different ways, so it could easily help you more than a person. Whilst I do appreciate what doctors do and I know how good they are, the only problem (and I know this sounds crazy) is that they are human, which means they have human limitations such as time, and this is something AI systems can help with. So imagine a virtual AI doctor that would just talk to you about these kinds of things, and you could quickly get diagnosed. Now, there was some feedback from an actual dermatologist, who noted impressive diagnostic accuracy for this condition, which is a relatively rare and specialty-specific condition, based on the limited data of one photo and a brief description. The reason they included this is that it is a relatively rare, specialty-specific condition, and the fact is that this was just one picture. Remember, I've always stated that in the future, AI systems trained on millions and millions of different images are going to be far superior to any human, because they'll have seen so much data that they'll instantly recognize something like this, and I think with more advanced systems these kinds of things are going to become pretty normal. However, there were some cons: the dermatologist says additional photos of representative lesions on different extremities would strengthen the diagnosis, that other things could be included, and that while there's no cure, the model could emphasize the possibility for symptom improvement and management. There was also another example here, and I do want to state that there are several examples in the paper that I simply didn't want to look at, because the pictures were pretty graphic, considering they were from medical examinations like open surgeries, so I just didn't want to include them. But you can see another dialogue example here, where someone sends a picture and says: "Hello, I'm a primary care physician and this is an X-ray for a patient of mine. The formal radiology report is still pending and I would like some help to understand the X-ray. Please write a radiology report for me." The model talks about that, then you can query it further: "my patient has a history of XYZ", "could this be the back pain?", and so on. I do think in the future it would be better if you could somehow load the patient data into this, because whilst yes, doctors are good at prompting it like that, I think it would be better if these AI systems had access to the user's complete medical history, as that would allow for a much more comprehensive diagnosis. And then this is where we had a rather interesting example, where the prompt said: "You are a helpful medical video assistant. You are given a video and a corresponding subtitle with a start time and duration, followed by a question. Your task is to extract the precise video timestamps and then answer the question given below. Provide one single time span that spans the entire length of the answer; while considering the entire video, it's better to be exhaustive and provide the longest time span for the answer." The question was "how do we relieve calf strain with foam-roller massage?", and you can see that this is exactly where the footage is: it gives the start and the end, and the ground-truth time-span annotation is the same as what Gemini had answered. So overall, I think some people at the end of this might be a little bit confused between Med-Gemini and AMIE, but essentially they are quite different

AMIE Vs Med Gemini

in their purposes. AMIE is primarily designed for improving diagnostic dialogues and reasoning within medical consultations: it aims to simulate and support the interactive conversation part of a consultation, focusing on history-taking, diagnostic accuracy, and patient communication. Med-Gemini is a more generalized AI model that excels in processing complex multimodal medical data, such as text and long medical records, and it's specialized in understanding and integrating broad medical knowledge across various formats to assist in diagnostics and treatment planning, beyond conversational capabilities. The strength of Med-Gemini is that it can be used, in some instances, for long-context processing over long history records to enable more accurate diagnosis, while AMIE is optimized to focus on engaging patients in meaningful dialogues. The future goals for these two systems are quite different: AMIE aims to become a virtual assistant in medical consultations, enhancing the quality of care through better communication and diagnostic support, seeking to address the conversational and empathetic aspects of medical practice; whereas Med-Gemini is positioned to aid in a more analytical way, involving vast amounts of data, aiming to support medical professionals by providing comprehensive, integrative analyses of patient information, potentially leading to more informed decisions. Something else to take into account is that once these AI systems are trained on vastly different languages, it's also going to break down the barrier for people who struggle with certain languages to get the right medical care they need, because I can't imagine trying to explain something to a doctor who doesn't speak my language; the nuances in certain things do make the difference in ensuring you get the right medical treatment.
