DeepSeek R1 vs ChatGPT o1 - Ultimate Test

19:05


Skill Leap AI · January 30, 2025 · 99,109 views · 1,840 likes

Video description

Prompts used in the video:

Multi-Step Reasoning & Logic: You have a row of 100 light bulbs, all initially off. When you pass through the row the first time, you toggle every bulb (turn them all on). On your second pass, you toggle every 2nd bulb. On your third pass, you toggle every 3rd bulb, and so on, up to the 100th pass. Which bulbs end up turned on?

Math Word Problem With a Twist: A horse costs $50, a chicken costs $20, and a goat costs $40. You bought 4 animals for a total of $140. Which animals (and how many of each) did you buy?

Advanced Domain Knowledge (Physics): A spaceship traveling at 0.8c (where c is the speed of light) launches a probe forward at 0.3c (relative to the spaceship). According to special relativity, how fast is the probe moving relative to an outside observer at rest? (Use the relativistic velocity addition formula.)

Creative Interpretation & Consistency: Write a short fable involving a mischievous fox and a wise crow. Then, without repeating any phrases or exact sentences from the first story, retell the same fable from the perspective of an onlooker. Keep the key events consistent between the two versions.

Thought Experiment: If you have a sealed glass bottle full of water sitting in a freezer, why might it break?

Chain-of-Thought / Step-by-Step Explanation: The restaurant bill for 3 people was $45. They each paid $15, so they paid $45 in total. The waiter put $5 in his pocket and gave $5 back to them. Therefore, they each ended up paying $14, which sums to $42, plus the $5 in his pocket equals $47. What happened to the missing $3?

Ambiguity & Language Nuance: Consider the sentence: “I didn’t say she stole my money.” Interpret this sentence in at least four distinct ways by emphasizing different words, and explain how each emphasis changes the meaning.

How many R's are in "strawberry"?

Which came first, the chicken or the egg?

Which number is bigger: 9.11 or 9.9?
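The physics prompt above names the relativistic velocity addition formula, w = (u + v) / (1 + uv/c²). The following is a minimal Python sketch of that calculation, with speeds expressed as fractions of c. The function name `add_velocities` is only an illustrative choice and does not come from the video.

```python
def add_velocities(u: float, v: float) -> float:
    """Relativistic velocity addition; u and v are fractions of c (hypothetical helper)."""
    return (u + v) / (1.0 + u * v)


ship = 0.8   # spaceship speed relative to the outside observer, in units of c
probe = 0.3  # probe speed relative to the spaceship, in units of c

result = add_velocities(ship, probe)
print(f"Probe speed for the outside observer: {result:.4f}c")
```

This works out to 1.1 / 1.24, roughly 0.887c, which stays below c, unlike the naive sum of 1.1c.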

Contents (4 segments)

Segment 1 (00:00 - 05:00)

I compare the DeepSeek R1 reasoning model versus the ChatGPT o1 reasoning model side by side. I ran them across 10 different reasoning prompts, and the results were completely surprising to me. Now, I'm going to specifically use prompts that require reasoning. You wouldn't use these types of models for every single task; you would use one when the task needs reasoning, when the model needs to think through a problem step by step before it just gives you an answer.

Now, a couple of things to mention before I start with the prompting. On the left side here we're at chat.deepseek.com. This is the official DeepSeek chat website, but to use R1 you have to make sure you turn this on right here. If you don't turn that on, it uses the V3 model, which is more comparable to the ChatGPT 4o model. I'll save that for a different video; right now I want to really focus on the reasoning model.

The second thing I want to mention is that DeepSeek is an open-source large language model, meaning you could actually download versions of it and run it privately and locally. The version on the website is not going to be private; the data will be stored by this company, and I will cover the terms of service at the very end. We're actually going to use both of these models and compare them side by side, so you see how DeepSeek uses your data and how ChatGPT uses your data. But ChatGPT o1, especially if you use it for work, especially if you're going to upload data here with this little paperclip to analyze, opts out of training data by default because it is a paid upgrade. I'm using the Team plan, but any of their paid upgrades opts you out of data training. DeepSeek's free version and ChatGPT's free version do use your data for training their models by default. Keep in mind that access to ChatGPT o1 requires a plan that starts at $20 a month; DeepSeek is free.

Now, for the very first prompt, we're going to run both of them through a logic and multi-step reasoning question: you have a row of 100 light bulbs, all initially off. When you pass through the row for the first time, you toggle every bulb (turn them all on). On the second pass you toggle every second bulb, on the third every third bulb, and so on up to the 100th pass. Which bulbs end up turned on?

Now they're both going to go to work. DeepSeek is a little bit slower right now just because it's going completely viral, so the server load on the DeepSeek website is pretty significant, and I'm not going to compare them on speed right now. I want to show you why these models are a lot different from regular ChatGPT or other large language models: they actually go through a process of thinking through all the steps. This whole gray part inside of DeepSeek, for example, is all the steps that it had to think through before it gave you an answer. If I go all the way down, look at all this thinking it had to do, and here is the actual answer in white. I could leave this whole area collapsed and only see the answer. Here's the answer it gave us, and ChatGPT's answer is right over here: 1, 4, 9, 16, 25, 36, 49. Yep, they both got this right. With ChatGPT's o1 model you can also click here to see the step-by-step process. It looks like there are only a couple of different things that it thought through. In this case DeepSeek took 45 seconds of thinking, so you can see it had a lot more; ChatGPT o1 here thought for 10 seconds.

Okay, this one is going to be a little bit of a math problem with a twist to it. A horse costs $50, a chicken costs $20, and a goat costs $40. You bought four animals for a total of $140. Which animals, and how many of each, did you buy? Again, I have R1 turned on. The ChatGPT reasoning model can be turned on a couple of different ways. You can pick it from the model dropdown (again, you need the Plus, Team, or Pro plan; I'm using the Team plan), or you can turn it on here to use thinking and send the prompt out that way. You just need one of those turned on for it to go through the thinking process.

Okay, this time DeepSeek is actually working a little bit faster, because it's showing you the thinking process here; this one hides the thinking process until you click on it. 17 seconds. This one is still thinking, so it looks like DeepSeek thinks longer. Okay, this is very interesting: DeepSeek got a different answer than o1. DeepSeek says there are two valid combinations: two horses and two chickens, or one chicken and three goats, with really nice formatting down here with the answer. o1 says to buy one chicken and three goats, which is one of the answers, but it did not come up with the second combination. Now, I'm pretty sure this is the right answer here. Let me just show you DeepSeek's thinking. Look at the thinking process that it went through. How long did this take? 76 seconds. It went through every single step and ran so many different math equations to come up with that. Okay, so that's obviously a point for DeepSeek. Let's get to the
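Both puzzles in this segment are small enough to check by brute force. The sketch below is not how either model reasoned; it simply simulates the 100 passes over the bulbs and enumerates every way to buy four animals. The helper names `bulbs_left_on` and `animal_purchases` are made up for illustration.

```python
from itertools import product

# Prices taken from the prompt in the video description.
PRICES = {"horse": 50, "chicken": 20, "goat": 40}


def bulbs_left_on(n=100):
    """Simulate n passes over n bulbs; pass k toggles every k-th bulb."""
    on = [False] * (n + 1)  # index 0 is unused so bulb numbers match indices
    for step in range(1, n + 1):
        for bulb in range(step, n + 1, step):
            on[bulb] = not on[bulb]
    return [i for i in range(1, n + 1) if on[i]]


def animal_purchases(count=4, total=140):
    """Return every mix of animals with the given head count and total cost."""
    names = list(PRICES)
    solutions = []
    for qty in product(range(count + 1), repeat=len(names)):
        cost = sum(q * PRICES[name] for q, name in zip(qty, names))
        if sum(qty) == count and cost == total:
            solutions.append(dict(zip(names, qty)))
    return solutions


print(bulbs_left_on())
for purchase in animal_purchases():
    print(purchase)
```

The bulbs left on are the perfect squares from 1 to 100, and the enumeration finds exactly two purchases: one chicken plus three goats, or two horses plus two chickens.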

Segment 2 (05:00 - 10:00)

next one. Now, this one is domain-specific problem solving, and it's related to physics. Again, I got the answer key from someone who specializes in physics just to be able to verify it, because I actually have no idea how to solve this one, but I do know the answer. So let's send this one out. Okay, this one again is the right answer, and ChatGPT also got the right answer. Let me just show you the difference in the way they had to think through it, though. Look at DeepSeek right here: it's incredible how much it has to go through to get to the answer. ChatGPT does it a lot quicker, and if I click over here there are pretty much no details on the steps. It used to give me a lot more details when I was looking at this before. So 137 seconds versus 7 seconds. ChatGPT came to the right one, but again, it did miss one so far. They both got this one right; let's get to the next one.

Now, this one's a famous one in the thought experiment category: which came first, the chicken or the egg? Let's see what we get for that. Okay, let's go through this in real time. "This is a classic question I've heard before. First, I know the chicken came from the egg. Wait, maybe evolution plays a role here. But then again, the egg is a chicken egg if it contains a chicken. Alternatively, if you define a chicken egg as an egg laid by a chicken, then the chicken would have to come first." This is really interesting; it's thinking about pretty much every scenario there can be. DeepSeek says that from a scientific standpoint the egg came first, as the first chicken emerged from an egg laid by a nearly-chicken ancestor. Okay. And over here: while this can be playful, the straightforward explanation is that eggs existed before what we formally classify as chickens, so the egg came first. So they both said the egg came first, and I think DeepSeek just does a little better job explaining it in simpler terms, but it's the same answer. This is, I think, a pass for both.

Okay, with this one I want to test their step-by-step reasoning, and it's kind of a trick question. The restaurant bill for three people was $45. They each paid $15, so they paid $45 in total. The waiter put $5 in his pocket and gave $5 back to them. Therefore each ended up paying $14, which sums to $42. It's kind of a trick question; let's see what we get here. Okay, so 80 seconds for DeepSeek, 7 seconds for ChatGPT. Let's get to the bottom to see the answer. ChatGPT says, "In short, there is no actual money missing," and that was what I was testing for. I was trying to confuse it, and it says all the money is accounted for. And DeepSeek, "the missing..." (oh, strange formatting here) "all the money is accounted for." So it's the same conclusion; ChatGPT just gets to it much faster, 80 seconds versus 7 seconds.

Okay, this next one will have a variable answer, so there's no right or wrong. I just want to see how each model deals with something more ambiguous. Consider the sentence "I didn't say she stole my money." Interpret the sentence in at least four distinct ways by emphasizing different words, and explain how each emphasis changes the meaning. Now, they both came to a similar conclusion here. DeepSeek says: emphasis on "I," emphasis on "didn't," emphasis on "she," and emphasis on "stole." ChatGPT: emphasis on "I," same thing; emphasis on "didn't" and "she," same thing; but the last one's a little different, emphasis on "my money" instead of "stole." Again, both are accurate. There's no right or wrong here; I just wanted to see if they could figure out this kind of ambiguous question that has multiple answers, and they both passed this one.

Now, this next one seems very obvious to us as humans, but large language models usually have a very hard time with it: which one is bigger, 9.11 or 9.9? Let's see if they get this one right. Wow, ChatGPT got that one wrong: "9.11 is larger than 9.9." DeepSeek gives the right answer, 9.9. Okay, so far DeepSeek has not failed once; ChatGPT has now failed on two different questions.

And here's another classic question: how many R's are in "strawberry"? Let's see if they get this one right. Again, it's very easy for us to figure out but hard for large language models. Okay: "strawberry" contains three instances of R, in positions three, eight, and nine. Okay, interesting: a few seconds versus 24 seconds this time, but they did both get it right. I'm going to try to confuse them a little bit. Let me start a new chat, and I'm going to misspell "strawberry" this time. Okay, I misspelled it, so there are one, two, three, four R's now in "strawberry." Oh, look at this thinking right here: S, no; T, no; R, yes; and then these other R's, yes, yes. Okay,
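The two "easy for humans" questions in this segment can be checked directly. The Python sketch below compares the numbers as decimals and lists the 1-based positions of a letter in a word. The helper names are hypothetical, and the misspelling `strawberrry` is an assumption: the transcript only indicates that the misspelled word had four R's, three of them consecutive.

```python
from decimal import Decimal


def larger(a: str, b: str) -> str:
    """Return whichever numeric string has the larger decimal value."""
    return a if Decimal(a) > Decimal(b) else b


def letter_positions(word: str, letter: str) -> list:
    """Return the 1-based positions where `letter` occurs in `word` (case-insensitive)."""
    target = letter.lower()
    return [i for i, ch in enumerate(word.lower(), start=1) if ch == target]


print(larger("9.11", "9.9"))                 # 9.9
print(letter_positions("strawberry", "r"))   # [3, 8, 9]
print(letter_positions("strawberrry", "r"))  # [3, 8, 9, 10] (assumed misspelling)
```

The positions for the correct spelling match the "three, eight, and nine" that was read out in the video.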

Segment 3 (10:00 - 15:00)

the letter R appears four times in the misspelled word "strawberry," and then right here: the word "strawberry," which contains three consecutive R's, contains a total of four R's. Okay, so this one didn't tell us it was misspelled, but it did reach the same conclusion: four and four. 7 seconds versus, again, 21 seconds. So clearly ChatGPT thinks a lot less and just gives you the answer quicker. DeepSeek thinks longer, but I'd rather wait and get the right answer.

Now, a couple of things when it comes to the functionality of these websites. One thing that I really like about the DeepSeek website is that you can turn on search, and then you can ask something that requires both search and deep thinking. Otherwise, the training data it has is actually older than even ChatGPT's training data, back to like 2023, but with this search icon it does have up-to-date information. ChatGPT, on the other hand, does have search, but you see it's grayed out right now, because search for some reason doesn't work with the o1 model.

Now, the one really nice update, which I made a different video about, is that the website perplexity.ai, which is an AI search engine, does allow you to use the reasoning models, R1 and o1, if you want to combine them with search. That does solve the limitation with o1, but getting access to this is another $20-a-month upgrade that has nothing to do with the subscription for o1. That subscription doesn't give you this; Perplexity is a totally different company that uses that model. But there is a way to combine o1 with search, which is through Perplexity right now, and hopefully OpenAI rolls that out in ChatGPT so you can do the same kind of thing I'm showing you inside of R1 with search.

Now, DeepSeek also has this other problem with how often the server is busy. But again, it's going completely viral right now and the usage is through the roof, so hopefully this is something that gets solved, or other companies add DeepSeek to their platforms. Because it is open source, they could technically do that: download it and provide servers for it, so it doesn't have this issue. But while I was recording this video, this message right here, "the servers are busy," was pretty consistent.

Now, I just want to try one last thing here. ChatGPT o1 has another model called o1 pro, so I switched my account to ChatGPT Pro, which is $200 a month. With $200 a month you get unlimited o1; the other plan actually has a limit, which again is probably a plus for R1, because you can just use the website as long as it's up and running. o1 pro in this $200 plan is supposed to beat even o1, so let me run the couple of questions that o1 missed and see if it can solve those. Now, this one it got right: 9.9 is larger than 9.11. So o1 pro solved that, and o1 did not. And our math problem with a little bit of a twist, the one that had two different answers, let's run that. Okay, o1 pro got it right: two valid ways to buy, three goats and one chicken, or two horses and two chickens. Two possible solutions, the same thing we got out of DeepSeek, but o1 only gave us half of that answer. So again, o1 pro is matching DeepSeek, but it is $200 a month, so it's kind of not fair to compare it to a free model.

Now, for this next one we'll do a two-in-one. We'll see how each model analyzes text and gives you useful information out of it, and in that same step we'll compare the DeepSeek privacy policy and the OpenAI privacy policy side by side, so you see exactly how they compare. One of the main reasons I'm doing this goes beyond protection of your own personal data, which some people may not really be concerned about. A lot of people upload documents at work that are not their own data, to analyze work-related tasks, and their company may have data protection rules. So I want to mention that to make sure you understand the privacy policies of both at your company, because the version of ChatGPT I'm using has an option to opt out of some of these things, where DeepSeek does not have that option. With free chatbots you typically don't have an option to opt out, even though the free version of ChatGPT does have that option; you have to go find it in settings and toggle it on.

I kept my prompt really simple: compare these two privacy policies and create a table comparing pros and cons for each one. And I literally copied and pasted both of those documents into both of these chatbots. Okay, they both gave us detailed tables. ChatGPT did not do a great job with formatting; it kept these HTML tags in the table for some reason. I'll just focus on the cons of each one. DeepSeek: collects keystroke patterns, a privacy risk. OpenAI: broad collection of content, files, audio, and images. And it's

Segment 4 (15:00 - 19:00)

saying DeepSeek is vague about how they use the input to train their models; OpenAI kind of has the same issue there. Then it says the data may lack factual accuracy. That's just the risk of large language models and is not really relevant to the privacy policy; every large language model makes stuff up sometimes and has factual issues. Now, it says they share data with advertising partners, which is typical of most companies that have a free service.

Now, one key thing that they both mentioned relates to data storage, and this again is going to be relevant if you use this for work and you have a policy at work against this kind of thing. The data with DeepSeek is stored in China, and that is very clear in their privacy policy; they make that very transparent. ChatGPT's table says the same thing: DeepSeek stores data in China. OpenAI is obviously a US-based company, so it stores data in various jurisdictions, often in the US. If you are a European user and have to comply with things like GDPR, it discloses US storage with a GDPR compliance effort. From some of this, I saw DeepSeek does not follow some of this, and it's actually unclear how compliant they are with international frameworks like GDPR.

Another thing is how they use your data for training. They will use it to improve their models by default: DeepSeek does this, and ChatGPT's free version does this. The one reason that I usually like to use models like ChatGPT, even though it fell short in this initial test I did, is that they have paid upgrades that let you opt out of that. In fact, with this Team plan that I have, the Plus plan, any paid plan, you are opted out by default. At least that's what they say, right? So if you're using this for work, you might say, "Well, I'm using the Team plan, which is designed to keep our data private."

Now, the way you get around that, if it's more sensitive data that's not your own, it's the company's data and you're trying to analyze it, is to use DeepSeek's local install. Then you're not using anyone's server and you're not on a website; you download it to your computer and run it that way. DeepSeek lets you do that because it's open source. ChatGPT does not have that; there is no way to use it completely privately, so you have to take their word for it, with their upgrade, that they're not using your information for any type of training data. Typically with sensitive data, especially if it's not your own and you're trying to analyze it for work, locally installed models are usually a safer bet because you're not relying on the servers of these companies.

So this is a side-by-side comparison, and in DeepSeek right here, here's the quick takeaway directly from their chatbot: DeepSeek emphasizes compliance with Chinese regulation and offers straightforward data storage clarity, but is less transparent about model training and global user rights. OpenAI provides stronger user control (data correction, opting out), and this is the point that I really wanted to emphasize: you can opt out of some of this, and the settings for that just don't exist inside of DeepSeek. It also provides detailed disclosure for US and EU users, but has complicated compliance with cross-border data transfers and technical jargon. Privacy-conscious users might prefer OpenAI for its opt-out options and accuracy correction, while those in China may find DeepSeek's localized approach more accessible.

Now, all the prompts I used are in the description, but if you've used other prompts or got different results, please let me know in the comment section below this video. I have different videos about running DeepSeek locally on your computer and the different models that are available for that. The full-blown version that is running on the website is not really going to be able to run on a local computer right now. Even though you can download it, and I tried to do it, you're going to need a whole lot of computing power, and I don't know of any consumer computers that could do it right now. But the other versions are actually pretty good too and can run locally and privately on your computer, so I'll link that video below as well. Thanks for watching; I'll see you next time.
