OpenAI’s New ChatGPT Learned from 100,000 Conversations!
6:19

Two Minute Papers · 21.01.2025 · 51,384 views · 1,644 likes

Video description
❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The paper is available here: https://openai.com/index/openai-o1-system-card/

📝 My paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD
Or this is the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Gaston Ingaramo, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Sundvall, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

Contents (2 segments)

Segment 1 (00:00 - 05:00)

Yes, OpenAI has unveiled their amazing o3 AI, and everyone is talking about it. This was their ChatGPT that can think before it answers: it shows many thoughts that lead to the final answer, and it can reflect on its mistakes. It truly is chef's kiss, and it does way better than previous methods. You hear a lot about it. Everyone says AI this, AI that, and they say it will never be able to do this and that. But to that we say: no. We are Fellow Scholars here, and we want hard data from real research papers. So, do we have any of that? Oh my, look at this: a 52-page paper. Now we're talking! I went through it so you don't have to, although it's super fun. Here are three things that really surprised me. For instance, it has been tested with 100,000 test prompts, and so much more.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Note that these are results with the o1 version, which in terms of fundamentals is the same as o3.

First: finding cybersecurity issues. Now that would be absolutely incredible: an AI that helps us create safer computer systems. Sign me up right now! However, that is not easy at all. For instance, they tested o1 on a set of curated cybersecurity challenges, first on a high school level. Now, don't be fooled, these are still quite a challenge and require chaining multiple steps to solve. One-liner solutions don't work here. These require a little creative thought, as they cannot be copy-pasted from a textbook; you have to adapt your knowledge to new systems. The AI was given 12 attempts at the problems, and what happened? Well, the earlier GPT-4o system was able to solve 21%, about every fifth one. And what about the new one? Is that better? Oh my, more than double: it solves almost half of these problems. That is incredible. However, I know you saw it. Oh yes, there are collegiate and professional level challenges too. So what about those? Well, my initial guess was that these are going to be big fat zeros; these are really tough problems, after all. So what is the actual result? The previous system: 3% and 4%. And now, hold on to your papers, Fellow Scholars, because the new system can solve more than triple that: 13%. Whoa. I have to be honest, that is absolutely incredible progress.

But it gets better. Second: let's try to hack the o1 system ourselves. This is called jailbreaking. A jailbreak means writing a carefully designed prompt that tries to make the AI do something that it shouldn't be doing. If we try that, the previous system was quite resistant, but wait, there are datasets with examples that it did not do well with at all. So how about the new one? Oh my, look at that, loving this: even the worst version is more than three times more resistant to these jailbreaks. It is like a safe that is resistant to many of the world's best lockpickers. And when humans were testing the two systems against each other, the new one was found to be safer around 60% of the time; approximately 30% goes to the previous system, and about 10% are ties.

Now, yes, I hear you, Fellow Scholar, asking: what about hallucinations? This is when we ask about something and we get an answer with made-up data. Is it doing better in that regard? I really hope so, because I am using this previous model to talk about this very paper, and it hallucinates a fair bit. Well, first, they talk about accuracy, not hallucination rate. Hmm, why? Because it is easy to say something that is not made-up information but is also not accurate, like leaving the answer to an exam question blank. That is not hallucinating, but it is also not helpful. So, accuracy has increased for the new system, and with that, did hallucinations increase or decrease? Huh, would you look at that: decrease. Yummy! It is better on both axes. Absolutely incredible. And it is also meaningfully better on troubleshooting questions in virology: 18% better. And don't forget, this is just o1. We are currently at o3, which is once again meaningfully better than o1. I think this shows that a huge amount of work goes into the testing and evaluation of these AI models. What a time to be alive!

Now, something to highlight: OpenAI found

Segment 2 (05:00 - 06:00)

that it also makes a formidable con artist. This area is outside of my expertise, but I would like to highlight it so that more people know about it. I cannot add too much more, but I will say one thing: I would love to see more work on how we can use this system as a shield against such manipulative behavior, both from humans and from bots. It knows what patterns to look for, so it should be able to help us. I would love to see that. And this is all hard data from the paper, not just stuff that you hear thrown around in the media. That's what we do here at Two Minute Papers. Subscribe and hit the bell icon if you wish to see more. It's also good for us because, you know, we can keep existing, and that would be great. So, what do you Fellow Scholars think? Let me know in the comments below!

To run your own experiments on an NVIDIA GPU, check out Lambda. I use it myself regularly for these videos. Oh, look at that: you can generate high-quality images in less than a second per image. I did a ton more of them and paid less than a dollar for all this. Crazy! Seriously, try it out now at lambdalabs.com/papers or click the link in the description.
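The accuracy-versus-hallucination point from the first segment (leaving an exam answer blank is not hallucinating, but it is not helpful either) can be made concrete with a toy calculation. This is a minimal Python sketch under stated assumptions, not the system card's evaluation code: it assumes each answer is graded as correct, incorrect (counted as a hallucination), or not attempted, and that both rates are fractions of all questions. The function name and the grade lists are hypothetical and are not data from the paper.

```python
from collections import Counter


def accuracy_and_hallucination_rate(grades: list[str]) -> tuple[float, float]:
    """Return (accuracy, hallucination_rate) as fractions of all questions.

    Assumption (hypothetical grading scheme): each grade is "correct",
    "incorrect" (a made-up answer, i.e. a hallucination) or "abstain"
    (answer left blank), and an abstention counts as neither.
    """
    counts = Counter(grades)  # missing labels count as 0
    total = len(grades)
    return counts["correct"] / total, counts["incorrect"] / total


# Hypothetical toy models, NOT results from the system card.
always_abstain = ["abstain"] * 10                      # never answers
always_guess = ["correct"] * 5 + ["incorrect"] * 5     # answers everything, guesses often
better_model = ["correct"] * 7 + ["incorrect"] * 1 + ["abstain"] * 2

for name, grades in [
    ("always_abstain", always_abstain),
    ("always_guess", always_guess),
    ("better_model", better_model),
]:
    acc, hall = accuracy_and_hallucination_rate(grades)
    print(f"{name}: accuracy={acc:.0%}, hallucination rate={hall:.0%}")
```

Under these assumptions, the model that always abstains has a perfect 0% hallucination rate and also 0% accuracy, which is why a low hallucination rate alone says little. Being "better on both axes", as described in the video, means accuracy goes up while the hallucination rate goes down, like `better_model` compared with `always_guess`.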
