# I Found the Best AI Reasoning Model! DeepSeek vs ChatGPT vs Gemini

## Metadata

- **Channel:** Skill Leap AI
- **YouTube:** https://www.youtube.com/watch?v=4s-c93sQYj0
- **Date:** 14.02.2025
- **Duration:** 21:57
- **Views:** 37,604

## Description

📖 Download HubSpot's free prompt book here: https://clickhubspot.com/e23p

ChatGPT, DeepSeek, and Google Gemini have all released advanced "reasoning" models that outperform older models in nearly every benchmark.

In this video, I put the top reasoning models to the test with 10 different prompts designed to assess their problem-solving, logic, and creative reasoning skills.

What Are Reasoning Models?
Traditional AI models generate answers as quickly as possible, but reasoning models take their time. They break down questions into smaller steps, think through them logically, and apply Chain of Thought (CoT) reasoning.

This process allows them to solve complex problems that require multi-step thinking rather than just pulling an answer from a dataset.

Testing the Models: 10 Challenging Prompts
Prompt set 1: Rapid-Fire Questions
Which is bigger: 9.11 or 9.9?
How many "R"s are in Strawberry?
Which came first, the chicken or the egg?
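The first two rapid-fire questions have answers that can be checked mechanically. The snippet below is a minimal sketch of that check; the helper names `larger` and `count_letter` are made up for illustration and do not come from the video.

```python
from decimal import Decimal


def larger(a: str, b: str) -> str:
    """Return whichever of two decimal strings is numerically larger."""
    # Decimal compares by numeric value, so "9.9" is treated as 9.90.
    return a if Decimal(a) > Decimal(b) else b


def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of `letter` in `word`."""
    return word.lower().count(letter.lower())


print(larger("9.11", "9.9"))            # 9.9
print(count_letter("Strawberry", "R"))  # 3
```

Comparing the values as decimals rather than as strings or version numbers is the design choice here, since 9.9 means 9.90, which is greater than 9.11.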

Prompt 2: Creative Problem-Solving (With a Twist)
📌 Question: You have a 50-foot rope and a 75-foot building. Using only the rope and your body (no other tools), how can you measure the building's height?

Prompt 3: Logical Deduction & Paradox
📌 Question:
"If the statement below is true, then the statement above is false."
Is the statement above true or false? Explain your reasoning.

Prompt 4: Coding Challenge
📌 Question: Create a game of chess where the king moves like a queen instead of following standard rules.
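The rule change in this prompt comes down to one piece of move generation: the king slides along ranks, files, and diagonals instead of stepping one square. Below is a minimal sketch of that idea under assumed conventions. The board representation (a dict from `(row, col)` to codes like `"wK"`) and the function names are hypothetical, and this is not the code any of the models produced.

```python
# Hypothetical sketch: the board format and all names are illustrative only.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def sliding_moves(board, start):
    """Squares reachable from `start` by sliding like a queen.

    `board` maps (row, col) -> piece code such as "wK" or "bP".
    Empty squares are simply absent from the dict.
    """
    color = board[start][0]
    moves = []
    for dr, dc in DIRECTIONS:
        r, c = start[0] + dr, start[1] + dc
        while 0 <= r < 8 and 0 <= c < 8:
            occupant = board.get((r, c))
            if occupant is None:
                moves.append((r, c))      # empty square, keep sliding
            else:
                if occupant[0] != color:
                    moves.append((r, c))  # capture an enemy piece, then stop
                break                     # any piece blocks further sliding
            r, c = r + dr, c + dc
    return moves


def king_moves(board, start):
    """Variant rule: the king moves like a queen instead of one step."""
    return sliding_moves(board, start)


board = {(4, 4): "wK", (4, 6): "bP", (6, 4): "wP"}
print(sorted(king_moves(board, (4, 4))))
```

This sketch deliberately leaves out check detection: it does not filter squares attacked by the opponent, so a complete game would still need check and checkmate logic on top of it.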

Prompt 5: Commonsense Reasoning
📌 Question: Why might a sealed glass bottle full of water break if placed in a freezer?

Prompt 6: Vision with Reasoning
📌 Question:
Which AI model created this image?


Prompt 7: News Article Summarization
📌 Summarize a news article in 100-150 words.

Prompt 8: Alternate Universe Physics
📌 Question: If the electron’s mass increased by 1% and its charge decreased by 1%, how would the speed of sound in diamond change?
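A rough scaling estimate shows how a question like this can be approached. This is a back-of-the-envelope sketch, not the exam's official solution. It assumes the nuclear mass $M$ is unchanged, that the electron and proton charge magnitudes $e$ both shrink by 1% (as the question is read out in the video), and that energies in the solid scale with the Rydberg energy $E$ and lengths with the Bohr radius $a_0$:

$$
a_0 \propto \frac{1}{m_e e^2}, \qquad E \propto m_e e^4, \qquad B \sim \frac{E}{a_0^3}, \qquad \rho \sim \frac{M}{a_0^3}
$$

$$
v_s = \sqrt{\frac{B}{\rho}} \propto \sqrt{\frac{E}{M}} \propto m_e^{1/2}\, e^{2}
\quad\Longrightarrow\quad
\frac{\delta v_s}{v_s} \approx \frac{1}{2}\,\frac{\delta m_e}{m_e} + 2\,\frac{\delta e}{e} = \frac{1}{2}(+1\%) + 2(-1\%) = -1.5\%
$$

Here $B$ is the bulk modulus and $\rho$ the mass density. Under these assumptions the speed of sound drops by about 1.5%, which agrees with the option the video's answer key marks as correct.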


Prompt 9: Hardest Question - Goldbach’s Conjecture
📌 The Question:
Every even number greater than 2 can be expressed as the sum of two prime numbers.
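The statement is easy to check for any particular even number, even though no general proof is known. The sketch below searches for one prime pair per even number; the function names are illustrative, and verifying finitely many cases this way does not prove the conjecture.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def goldbach_pair(n: int):
    """Return one pair of primes (p, q) with p + q == n, or None if none exists."""
    if n <= 2 or n % 2 != 0:
        raise ValueError("n must be an even integer greater than 2")
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None


print(goldbach_pair(4))    # (2, 2)
print(goldbach_pair(100))  # (3, 97)

# Every even number in a small range has at least one such pair.
print(all(goldbach_pair(n) is not None for n in range(4, 2000, 2)))  # True
```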


Prompt 10: Do Parallel Universes Exist?
📌 Prompt: "Using quantum mechanics, explore the possibility of detecting alternate realities interacting with our own."

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
MORE FROM SKILL LEAP:
💡 Join the fastest-growing AI education platform & Instantly access 20+ top courses in AI:
👉 Start with a free trial: https://bit.ly/skill-leap
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

## Contents

### [0:00](https://www.youtube.com/watch?v=4s-c93sQYj0) Segment 1 (00:00 - 05:00)

ChatGPT, DeepSeek, and Google Gemini all have newer reasoning models that could outperform older models in pretty much every single benchmark, so I wanted to make a very comprehensive video and put them through the ultimate test. We're going to take this across 10 different prompts. It's going to start easy and get pretty much impossible by the end. We're going to use ChatGPT o3-mini, DeepSeek R1, and Google Gemini's Flash Thinking model. This video is brought to you by HubSpot; more on that in a bit.

Before I jump into the prompts, let me give you a quick background on what these reasoning models are. Every AI model and chatbot before the release of reasoning models gave you the answer as quickly as it could. When you used ChatGPT 3.5 or 4o, you gave the prompt and it gave you the answer. Reasoning models actually take their time. They break down the question you're asking into smaller parts and think it through before they respond. Some questions take a few seconds of thinking, others up to several minutes. This technique is called chain of thought, and you can even look on your screen and see the actual thinking process before they give you the answer, to see how they break down each problem.

Now let's get into our test. For the very first prompt, I actually have three different questions that I've tested these models with in previous videos, so I bundled those as prompt set number one, and we'll get much more complicated as we go.

With ChatGPT, I'm choosing o3-mini. I'm using this under my Team plan right now, but it is available under the free plan; you're just going to get a limit on how often you can use it. I'm also using DeepSeek on their website, and I have the R1 reasoning model turned on. I'm not going to turn on search. This is free. Then I have Google Gemini Advanced, which is a paid upgrade, but you will also get 2.0 Flash Thinking Experimental with your free account. They have another one called 2.0 Flash Thinking Experimental with apps, which combines it with things like Search and YouTube. We're not going to use that one; we're going to use this one. All three models I'm showing you have a search function that you can combine with reasoning, and it makes them a lot more powerful than the reasoning model alone if you're looking for up-to-date information. For my questions I'm not; I'm just looking at reasoning power.

ChatGPT is going to do some reasoning here, and you can see its thinking process over here. If you click on this, you'll see exactly how much thinking it had to do to answer. In this case it got it right, three Rs in "strawberry", and it actually didn't do much thinking. It's really interesting to see the thinking of each one. In future prompts I'm not going to dive too deep into showing you the thinking; I just want to show you for this very first one how it goes through the process.

DeepSeek does a much more elaborate breakdown as it's thinking through a problem. It's a little bit slower too compared to ChatGPT. It came to the right answer, three Rs in "strawberry", but let me show you the reasoning it had to do to come up with that. You can see it had to break it down multiple times. It took 88 seconds; ChatGPT took 5 seconds. So keep that in mind: DeepSeek, I found, is the slowest by far of these models. But sometimes with these types of reasoning models you're not too concerned about speed; you're just trying to get more accuracy. We'll see how this shapes up.

Now let's use the Gemini thinking model. This one is the fastest by far. Right there, three Rs in "strawberry", and the answer was almost instant; it took maybe two seconds. It doesn't tell you how long it was thinking, though, like the other models do, but here is the thinking process. Very straightforward: it spelled the word out, found each R, and gave the answer.

Now, which one is bigger, 9.11 or 9.9? By the way, the reason I use those two questions all the time is that they come directly from OpenAI's tests when they first released o1, their very first reasoning model. Right here, 9.9 is larger than 9.11; it took just a few seconds. DeepSeek: 9.9 is bigger, but it took 37 seconds of thinking to come to that. And Gemini, again, a little bit of thinking, a few seconds; it pretty much matched the speed of the ChatGPT model. 9.9 is bigger.

The next one: which came first, the chicken or the egg? The answer is that the egg came first. The first true chicken likely hatched from an egg laid by a bird that was almost, but not exactly, a chicken. With DeepSeek we got the same answer, and Gemini says the most scientifically sound and widely accepted answer is that the egg came first, for the same reason.

Okay, so that was all part of prompt set number one. Now I'm going to go to more complicated questions, and we're going to get pretty much near impossible by the end. This one is for creative problem solving: you have a rope that is exactly 50 ft long and a building that is 75 ft

### [5:00](https://www.youtube.com/watch?v=4s-c93sQYj0&t=300s) Segment 2 (05:00 - 10:00)

tall. You need to measure the height of the building using only the rope and your own body, no other tools. You are 5 ft tall. How can you do this? Describe this in steps.

ChatGPT's o3-mini model took a minute and 14 seconds to answer, but the answer kind of doesn't make sense. This is the final answer: hang the 50 ft rope over the top of the building, so you have 25 ft left to the ground, then use your own body, which is 5 ft, as a ruler to measure the rest. But how can you do that? You can only measure 5 ft at a time. You can't just stand on top of yourself to keep doing that, so that doesn't quite add up.

DeepSeek thought for over 3 minutes and came up with a whole different way to solve it: by creating similar triangles with the rope and your body, you determine the building's height is 75 ft through proportional reasoning and folding the rope into segments that match your height. This one made a lot more sense to me than the other one, where you had to use your own body and stand on top of yourself to make up that 25 ft.

Gemini didn't give us a nicely formatted answer; it gave us these five larger steps. But it used the same technique as DeepSeek, the similar-triangle technique. So I think DeepSeek and Gemini got it right. ChatGPT's answer kind of doesn't make sense to me. If it makes sense to you, let me know in the comment section, but I think it's missing something, and the other technique works a lot better.

Question number three is logical deduction: "If the statement below is true, then the statement above is false. Is the statement above true or false? Explain your reasoning." I got three completely different answers this time. ChatGPT o3-mini says the statement is paradoxical, neither true nor false in a consistent way. DeepSeek, on the other hand, says the statement above is true, and it thought for a while, 161 seconds. Gemini had a lot of thinking to go through this time, but again the output was very fast even though it had to think a lot, and it said false. So we got true, we got false, and one that says it's neither true nor false. Based on my answer key, "the statement above is true" is the only right answer, so the other two got it wrong. DeepSeek got it right.

Let me get to some practical prompts now. By the way, if you're using reasoning models, you should keep your prompts relatively simple, because these models break things down step by step. They think through a question in smaller questions, so you don't need to over-engineer your prompt and make it super long. A lot of times with ChatGPT 4o that is what we do; we want to give it a lot of context. These don't need that.

Now, I have a free resource from HubSpot that I wanted to share with you. This is one of the most comprehensive prompt lists I've ever seen. It's over a thousand expertly crafted prompts, and it will help you with productivity, strategy, content creation, building your business, branding, and much more. So if you're a marketer, entrepreneur, or content creator, this is going to save you a lot of time. I personally found the prompts in the marketing strategy section and the brand pricing strategy section really helpful. You can see there are a thousand prompts here, but they're organized into categories, which makes it really easy to jump around. These types of tasks could really benefit from reasoning models too. When any type of strategy is involved, I've been using these reasoning models with these types of prompts more and more over the standard models like ChatGPT 4o or the regular Gemini. You can get the entire prompt list in the description below this video. Thanks to HubSpot for sponsoring this video.

Let's get to a coding question next. In a previous coding test, I asked for a game of chess I could run on my computer, and a couple of different models were able to do that relatively well. But people were saying in the comment section that this kind of information is available online, so I added something new to see if the models could get it right. There's one rule I want to change: the king can move like the queen. So the first part of the test is whether it can make a chess game, and the second part is whether it can keep up when I change the logic of the game.

I had the piece images saved on my computer already. Let's see if we can get this to work. Let me do a couple of moves here. So far so good. Now let's see if it listened to my instructions. This is the queen, so that moved the right way. Now the king. Oh wow, okay. The king usually can only move one square, and now, yes, it moves exactly like the queen. So can I take this piece? Is this going to be checkmate? Oh, it doesn't know checkmates. Everything else up to that point it got right; it just doesn't know how a game is going to end. Let me see if I take this

### [10:00](https://www.youtube.com/watch?v=4s-c93sQYj0&t=600s) Segment 3 (10:00 - 15:00)

one. Yeah, it doesn't know how a game ends. So that was ChatGPT, with just one follow-up prompt to get a visual look out of it.

Let's go to the next one, DeepSeek. I spent literally 10 to 15 minutes with DeepSeek trying to get a working chess game, and it just kept having errors pointing to the little piece icons I had for my chess game. It couldn't find them no matter what I did. Just to keep it fair: with the other ones it was literally one prompt, sometimes a quick follow-up prompt to get different visuals out of it. This one was like 10 prompts, and I still don't have a working game. So DeepSeek is one of the losers for this test.

This one is from Gemini. Let's see what we get. Oh, I'm getting the beach ball. No, it's not working. It looks really nice, though; the graphics are really sharp. I don't know why it's not working. I actually have a code debugging test coming up next, so I'll save it for that. But right now I'm not getting a functioning game without a follow-up prompt, and the whole point of this test is coding right out of the gate, without a follow-up prompt to fix any errors like this one. I have to quit this app now.

For this next one, I'm going to see how each model interacts with anything you upload to it. I gave it a code file, a Python file for the chess game, with the prompt "Fix the bugs in this code and give me a working code," so I can do the exact same thing. I'm going to start with o3-mini. DeepSeek also has this option where I can upload documents to it; it will extract the text only, so I'll run it through the same problem. Gemini, though, is going to lose this one right away, because right now I can upload images but I can't upload a code file or a text file of any kind. That should be coming to Gemini. The regular Gemini has it, but this thinking model does not, which is really going to limit it, because a lot of times we're going to want to work with numbers, analyze documents, or look at code, and we can't do it with this model. So Gemini loses this one right away.

This is the code that Gemini could not run, the one that crashed. o3-mini fixed it, so let's see if the game works. So far so good; it definitely didn't crash. Let's try that. Okay, so far that's working well. Let's go over here; that's working well. Now let's see if our king can actually move like a queen. It should be able to take this even though it normally only moves one square. Let's go here. Yep, that worked. I just moved a couple of pieces; let me see if I can get checkmate right here. If I move here... okay, now is it checkmate? Okay, now it's checkmate. Technically I shouldn't have been able to take that piece, but it let me take it, so it's still missing a little bit of chess game logic. But it did figure out the ending; I can't move any other pieces.

All right, we got the one from DeepSeek. Let's go through the movement and see if this one is going to work. So far so good, no issues here. Let's move this guy. Now let me see if the queen moves the right way. Let's go over here. Okay, so it did move two squares, but it shouldn't be able to move there, because now I can take it. Let's try that, and the game should be over. Yep, game is over. So DeepSeek was also able to fix that issue, and it had the same logic issue. My king can now move more than one square, and I think that is confusing it. The king should not have been able to move up here for this pawn to take it. It didn't give me an error message, but so far it's pretty much equal to what ChatGPT gave us.

I want to try something pretty difficult here. I want to use vision capabilities and reasoning at the same time. I don't think the models are going to be able to answer, but I want to see how they think through it. I uploaded an image I made in Midjourney version 6 and asked, "Which AI model made this image?" Since they can all see inside an image, even Gemini, let's try it with all three.

ChatGPT says, after 5 seconds, "I'm afraid there's no reliable way to tell which AI model produced this image." DeepSeek just gave me an error message, I guess because it can only extract text from an image. It found no text, even though this does have text right over here: the chocolate says "yum" on it. And Gemini, after a bunch of thinking, went through it and says the image was likely created using Midjourney. Wow, that's incredible. Here's why: photorealistic style, artistic flair. It says it's impossible to be 100% sure, but it's crazy that it gave me an answer that was actually right.

The next one is the most common way we use AI, and I want to see how these do combining search with summarizing articles. I'm just going to ask, "Which is the best AI model right now?" It's a really vague question, to see how each one responds.

### [15:00](https://www.youtube.com/watch?v=4s-c93sQYj0&t=900s) Segment 4 (15:00 - 20:00)

I'm going to turn on search for DeepSeek, and with Gemini we need to choose the other model, Flash Thinking with apps, so it has access to search too.

ChatGPT says the ChatGPT 4o and o3 models are the best for general-purpose and conversational use. For multimodal and deep reasoning, it's saying Google Gemini is the best. So far I'm kind of on the same page. For safety and specialized tasks, Claude 3.5 Sonnet; yeah, that's pretty good. For cost efficiency, DeepSeek R1. Wow, I think this is actually a really good answer, and this is the source it pulled that information from.

Now DeepSeek. For general purpose you've got ChatGPT-4; this should be 4o, though, not 4. Claude 3 Opus: not correct, 3.5 Sonnet is actually newer and better. This is outdated too, so it did not seem to use the search function very well this time. It seems to be using its own knowledge base, and these models are usually outdated, so you do want to turn on search for things like this that need more up-to-date information. I'll try one more time, just in case there was a glitch with the search icon. Okay, it looks like it's actually fixing it through its own thinking. It says the answer it gave was from mid-2024 data, but the user is asking again for newer search results. Okay, so: DeepSeek R1, Gemini Pro. Now it's actually answering correctly: specialized models, open-source models.

Gemini gave us the longest answer by far. It put Gemini right on top. GPT-4 again, not 4o; Claude 3, not 3.5. So this had the same issues. Midjourney and DALL-E 3, these are good, but there are newer models like Flux that it's not telling us about. This is sort of outdated too, I feel like, from the sources. Gemini Advanced 1.5: again, this is a previous model, so this is not quite what I'm looking for. The whole point of this is to tell me the latest and greatest models, one of which I'm literally using right here, and this is telling me about a model that is older than that. Google is supposed to be the best at search, I'm combining reasoning with search, and it definitely failed here.

So ChatGPT got it right the first time, DeepSeek got it right with a second follow-up prompt, and Gemini just kind of failed and gave us way too long of an answer. So I'll put ChatGPT as the winner of this one.

Now let's see how they deal with follow-up prompts. I'm going to ask, "What is the very best?" Someone told me in the comments section that when they're having chats with these reasoning models, the models forget everything that was said previously, so let's make sure they can follow along. o3-mini says, "If I had to pick one, I will pick the latest GPT-4o or o3." So I guess it's picking two different models, one for general purpose and one for reasoning. I'm using o3-mini here. DeepSeek picked its own model too: DeepSeek R1 scores 89 out of 100 in quality metrics, outperforming GPT-4o, and it's giving us the source where it got that from. Gemini, interestingly, did not pick itself. Even though it says Gemini right here, the conclusion is GPT-4o. It thinks that's the best model: "I will select GPT-4o as the single very best," and it provides the justification for why. Very interesting. But again, this is using that source, because if I don't have search turned on, it's going to give me outdated information from its training data.

For the next prompt, I'm going to use one of the hardest questions ever, and I got it from this test right here, called Humanity's Last Exam. Just to show you some of the scores right now: GPT-4o scores 3% accuracy. Okay, 3%. o3-mini, which I'm using right here, 10%. DeepSeek, 9%. So there's a long way to go. This is about 3,000 questions. I picked one for this prompt, and I have some other videos coming up where I'm comparing o1 pro and this other one, o3-mini-high, which are part of the higher plans of ChatGPT, so I'll save a few. But let's go ahead and use one here, and this is actually multiple choice:

"In an alternate universe where the mass of the electron was 1% heavier and the charges of the electron and proton were both 1% smaller, but all other fundamental constants stayed the same, approximately how would the speed of sound in diamond change?"

Obviously I have no idea, but I have the answer key, and it's multiple choice, so these are the different choices it's given. ChatGPT took only 29 seconds and got the answer right. The answer is B, decreased by 1.5%. DeepSeek, 140 seconds: the answer is D, decreased by half a percent,

### [20:00](https://www.youtube.com/watch?v=4s-c93sQYj0&t=1200s) Segment 5 (20:00 - 21:00)

not right on this one. Gemini: the final answer is C. So they all gave different answers, but ChatGPT is the only one that got this one right.

Now I'm going to finish up with a math question that hasn't been solved for hundreds of years. Let me play this for you to explain what it is: "Goldbach's conjecture is an unsolved problem in number theory proposed by the German mathematician Christian Goldbach in 1742. It states: every even integer greater than two can be expressed as the sum of two prime numbers. For example, 4 = 2 + 2." So it knows what problem this is, and it's giving us some background. "Despite being one of the oldest unsolved problems in math, the conjecture remains unproven." And I said, "Yes, I know that, but I want you to solve it for me." Just a few seconds later it says, "Unfortunately, I can't solve this for you." So we're clearly not at the point where these models are going to solve problems that humans haven't solved yet.

DeepSeek tried much harder, but it still says a general proof remains elusive. It spent 138 seconds trying to work through it, but obviously gave no definitive answer. Gemini says the same thing: it has not been proven true for all even numbers. One day I spent two hours really pushing these models to try to give me an answer, and no matter what I asked, they wouldn't even try to solve it. They just kept giving me the same thing. So we're definitely not at a point where these models are going to solve this type of problem.

This graphic shows the results of all the different prompts across the three models we tested, to figure out which is the best reasoning model. Obviously some models do better at some things, and some models just have functionality issues; I can't upload documents inside of Gemini, for example. But it was a really interesting test. Let me know which one is your favorite. Thanks again to HubSpot for sponsoring this video. Make sure you grab that free resource linked in the description below, and I will catch you on the next video.

---
*Source: https://ekstraktznaniy.ru/video/12561*