Join AI Startup School & learn to vibe code and get paying customers for your apps ⤵️
https://www.skool.com/ai-startup-school
—— MY APPS ——
💬 MindDeck, an advanced frontend for LLMs: https://minddeck.ai/
- Use coupon code 1JYEN9RH for 50% off
📲 Tensor AI: Never Miss the AI News
- on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746
- on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai
—————
CONNECT WITH ME
📸 Instagram: https://www.instagram.com/theramjad/
👨💻 LinkedIn: https://www.linkedin.com/in/rayamjad/
🌍 My website/blog: https://www.rayamjad.com/
—————
Links:
- DeepSeek Announcement: https://x.com/deepseek_ai/status/1958417062008918312
Timestamps:
00:00 - Intro
01:06 - What We'll Be Doing
01:45 - Planning with DeepSeek on MindDeck
04:22 - Running DeepSeek via Claude Code
04:55 - Comparing Responses
06:30 - Conclusion
So yesterday, DeepSeek released a brand new model, DeepSeek v3.1, and we'll be trying it out to see how good it is at coding. They say it's a first step towards the agent era, which means the model should be really good at tool calling, like many other agent-focused models. And we can see over here that it's scoring better than DeepSeek v3 and DeepSeek r1 on the software engineering benchmark, scoring 66, which is pretty close to, or better than, what Sonnet 3.7 was scoring. It's still not quite at the level of Sonnet 4 and Opus 4, but it beats several of the other coding models as well. And you can see how much of an improvement it is over the DeepSeek r1 model, scoring much better across the benchmarks, especially on agentic tasks. The model is also much more token efficient than DeepSeek r1 was, and I'm sure they'll release DeepSeek r2 in a couple of months, so it'll be interesting to compare those benchmarks too. But of course, benchmarks aren't everything, and every model has its own feel to it, so we'll be trying it out on a real-world codebase. We're going to run it via Claude Code, because one new thing DeepSeek added is an Anthropic-compatible API endpoint, which means it now works with Claude Code. There are some setup instructions over here that we'll go through shortly. So what we'll be doing is adding a feature to my
application, Tensor AI over here, which basically helps you stay up to date with the latest AI news. You can download it for free using the link in the description down below. The thing I want to add is a For You tab over here, so the app knows which articles are relevant to a user and they don't have to manually filter by category. I'll probably use signals like which articles they've engaged with previously, plus the interests they chose, and weight them in some way. That will require embeddings, so I'll be using the OpenAI embedding models, because I'm already using OpenAI in this codebase. Now, the very first thing we're going to do with DeepSeek is have it plan the approach we're going to take. I have a rough idea of what it will look like, but I want it planned out in detail. And I also want to compare the approach it suggests against these other models. I'll be comparing the models via my application called
MindDeck.ai. There's a coupon code down below if you want to use it to get access. It's basically 100% offline: you just enter your own API keys, and everything is stored locally on your device; it never leaves your device. So we can go to Settings and insert a DeepSeek API key. We go to platform.deepseek.com and copy the key over here (this key will have been revoked by the time you're watching this). Then press Save API Keys. Then we can select DeepSeek Chat as one model and DeepSeek Reasoner as another. I also want to use the older DeepSeek v3 and DeepSeek r1 to see how they compare when it comes to planning. So I'm going to go to Models over here and find both of those models. Here's DeepSeek v3; we can press Add over here, then search for DeepSeek r1 and press Add Model. So now I have all the models loaded up: the non-thinking, the thinking, and the older models. And I'm basically going to say: hey, I want to implement a For You feed in my application. The application is an AI news application with articles covering lots of different AI news. I want to use OpenAI's embedding models to embed the articles and then recommend relevant articles to users. Users enter their interests during signup, and because their interests may change over time, I need a way of accounting for this. What would you recommend? I'm using Supabase as my database, and I want to use pgvector to store the embeddings. Maybe I need an API endpoint for this. Think through all the different approaches, and then recommend the best one for this particular application. Then press Enter, and it will send the prompt to all the models in parallel. And you can already see which models are faster and which are slower.
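To make the idea in the prompt concrete — weighting a user's stated interests against the articles they've engaged with, with recent engagement counting more — here's a rough TypeScript sketch. All the function names, weights, and the 14-day half-life are hypothetical illustrations, not the approach any of the models actually produced:

```typescript
// Sketch of the "For You" profile idea: blend a user's stated-interest
// embedding with embeddings of articles they engaged with, decaying older
// engagement so the profile drifts as interests change.

type Vec = number[];

// Weighted average of vectors; weights are normalized internally.
function weightedAverage(vecs: Vec[], weights: number[]): Vec {
  const dim = vecs[0].length;
  const out = new Array<number>(dim).fill(0);
  const total = weights.reduce((a, b) => a + b, 0);
  vecs.forEach((v, i) => {
    for (let d = 0; d < dim; d++) out[d] += (weights[i] / total) * v[d];
  });
  return out;
}

// Cosine similarity, the usual ranking metric for embedding search
// (pgvector exposes it as the <=> cosine-distance operator).
function cosineSimilarity(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let d = 0; d < a.length; d++) {
    dot += a[d] * b[d];
    na += a[d] * a[d];
    nb += b[d] * b[d];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Build a profile vector: stated interests get a fixed share of the
// weight; each engaged article decays with age (14-day half-life is an
// arbitrary choice here).
function buildProfile(
  interestEmbedding: Vec,
  engagements: { embedding: Vec; ageDays: number }[],
  interestWeight = 0.4,
): Vec {
  const halfLifeDays = 14;
  const vecs = [interestEmbedding, ...engagements.map((e) => e.embedding)];
  const weights = [
    interestWeight,
    ...engagements.map(
      (e) => (1 - interestWeight) * Math.pow(0.5, e.ageDays / halfLifeDays),
    ),
  ];
  return weightedAverage(vecs, weights);
}
```

In practice you'd store the article embeddings in a pgvector column and let Postgres do the nearest-neighbor ranking; the profile vector above is the thing you'd pass into that query.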
Now, after reading through all the responses to the same prompt, I actually prefer DeepSeek Chat, the non-thinking model they released yesterday, the most. It starts with a simple approach, builds on it, and then combines it into a hybrid approach, and it gives some example code as well. The Reasoner model, the thinking model they released yesterday, only partially answered the question and then dumped a bunch of code on me, including HTML code I did not need. Compared to the other approaches, I think DeepSeek Chat, which is running DeepSeek v3.1, was the most comprehensive. Anyways, if you're an LLM power user and you want a privacy-first way of running many different models in parallel, to compare outputs like I just did, with access to literally hundreds of models and all the data stored locally on your device rather than in the cloud somewhere, then check out MindDeck. There's a link in the description down below, plus a coupon code for the first 25 users. It's a one-time purchase, so you don't have to worry about recurring monthly fees or anything like that. Anyways, to run DeepSeek via Claude Code, you want to make sure you have it
installed, then run this command over here, replacing the API key with your own. Run it, then launch Claude Code, and you should see a note at the top saying that the base URL has been overridden, along with some of these other environment variables. So we can say, "Hi, who are you?", and you can see it says it's Claude Code, because that's baked into the system prompt. Anyways, we can give it the instructions and the plan we have from earlier and see how it performs in executing it. So I'll paste in the plan, switch to plan mode, press Enter, and see what it comes up with. Alright, so it seems that
DeepSeek is done over here, and it cost around 17 cents. I also decided to run Claude Code with Opus in another folder, which is an exact duplicate, so we can see how the solutions compare; of course, Opus should give the better solution, but we want to see how close DeepSeek got. One thing DeepSeek did better is that it put the functions and the tables in separate files in the schema, like I said in my CLAUDE.md file, whereas Claude Opus just made a single migration file over here. But looking through the schema itself, I actually prefer what Opus did, because it used a duration-in-seconds column to track how long the user spent on an article, as a signal for how interested they are in it. And when it comes to actually generating the embeddings, for some reason DeepSeek just makes an API endpoint over here, whereas Claude Opus integrates into an existing workflow: after the articles have been generated, right at the bottom, it triggers an embedding step to create the embeddings. It does still have to resolve the types, but I prefer Claude Opus's overall solution here. DeepSeek also seems to have added the API endpoints to the wrong folder; it should have added them to the mobile API folder over here, which Claude Opus did. But comparing the API endpoints themselves, Claude Opus went a bit over the top, because it made a GET endpoint, which I don't think is required for this particular use case; we would just need a POST endpoint. As for the recommendations endpoint, I think Claude Opus was more comprehensive, because it used a fallback feed, and it also added a recommendation reason, which I thought was quite good. But neither model actually integrated the feature into the frontend. So my overall conclusion is that the new DeepSeek model is definitely
a step up. DeepSeek Chat seems to give better outputs than DeepSeek Reasoner, at least; I think DeepSeek Reasoner is overcomplicating the problem, so maybe my prompt has to be adjusted for that particular model. DeepSeek seems to do okay in Claude Code. It will probably perform better in much smaller codebases; this is a monorepo with many different segments to it, so it probably doesn't perform as well, both because DeepSeek's context window is smaller and because the Claude Code system prompt is probably not tuned for DeepSeek itself. When it comes to pricing, the model is pretty good value for money, and they also have discounted pricing during off-peak hours, which I find quite interesting. So ultimately, I don't think I'll be using DeepSeek for coding much, but I will use it for planning, because I thought DeepSeek Chat's planning capabilities were quite interesting, and it did come up with a pretty good plan. Right now I mostly use o3 for planning, so what I'll probably do is have o3 running on one side, open a new window, and have DeepSeek Chat running on the other side. Then whatever plan or idea I'm thinking of, I'll paste into both and use two hopefully quite different perspectives to come up with a good plan. I suspect DeepSeek was trained on a lot of Chinese content, from Chinese social media, the Chinese internet, books, and so forth, so it probably has a slightly different way of looking at things. Anyways, if you do want to try MindDeck, there's a coupon code down below in the description. It will get more expensive over time as more features are added and development costs increase, but the license is for life, so if you buy it now, you'll save money in the long run.
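For anyone curious what the parts I liked in Opus's recommendations endpoint — ranking by similarity, attaching a recommendation reason, and falling back to a default feed for users without a profile — could look like, here's a rough TypeScript sketch. Every name and threshold here is illustrative; it's not the code either model actually wrote:

```typescript
// Sketch of a recommendations step: rank candidate articles by their
// similarity to the user's profile vector, attach a human-readable
// "reason", and fall back to a recent-articles feed when there is no
// profile yet (e.g. a brand-new user with no engagement history).

type Article = { id: string; title: string; publishedAt: string };
type Scored = Article & { score: number; reason: string };

function recommend(
  profile: number[] | null,
  candidates: { article: Article; similarity: number }[],
  recentFallback: Article[],
  limit = 10,
): Scored[] {
  if (!profile || candidates.length === 0) {
    // Fallback feed: newest articles, flagged as such in the reason.
    return recentFallback
      .slice(0, limit)
      .map((a) => ({ ...a, score: 0, reason: "Trending now" }));
  }
  return candidates
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit)
    .map(({ article, similarity }) => ({
      ...article,
      score: similarity,
      reason:
        similarity > 0.8
          ? "Closely matches your interests"
          : "Related to articles you've read",
    }));
}
```

A POST handler would compute the similarities with a pgvector query and then run something like this over the results before returning them to the app.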