We Ranked AI Models by Their Performance in n8n


n8n · 02.02.2026 · 2,888 views · 135 likes


Video description
n8n now has an Official AI Benchmark: a free community resource for choosing the best model for your use cases. Link to the benchmark: https://go.n8n.io/benchmark-ty-launch

@liammcgarrigle tested over 60 AI models across 8 categories. The entire benchmark was built and runs inside of n8n, and it scores models on actual n8n use cases rather than conversational style or subjective preference. Smaller models often outperform larger models on specific tasks. We also found that a model's list price doesn't tell the whole story: one model priced at half the cost of competitors ended up being 10x more expensive in practice because it was so verbose in its outputs. No single model dominates every category, so use the category filter on the benchmark page to find the best fit for your specific workflow. Whether you're building AI agents, automating data extraction, or generating code, this benchmark helps you make more informed decisions and build more cost-effective solutions.

Time Stamps:
00:00 - Announcement
00:20 - What is it?
00:35 - How it works
01:00 - Made to be adapted for each use case
01:18 - Manual usage example
01:45 - Copy and paste benchmark results
01:54 - Per execution cost estimates
02:24 - Make focused, limited scope agents for best results
02:54 - Specialized agents are better AND faster
03:07 - Want a behind the scenes deep dive? Leave a comment
03:21 - Go try it out!
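The claim above that a model's list price doesn't tell the whole story comes down to simple per-million-token arithmetic: a "cheap" model that emits far more output tokens can cost more per run. Here is a minimal sketch; the prices and token counts below are invented for illustration and are not taken from the benchmark.

```python
# Why list price alone can mislead: per-run cost depends on how many
# tokens a model actually consumes and emits, not just its rate card.

def cost_per_run(in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Cost of one execution, given token counts and per-million-token prices."""
    return in_tokens / 1e6 * in_price_per_m + out_tokens / 1e6 * out_price_per_m

# Model A: higher list price, concise output (all numbers are made up).
a = cost_per_run(in_tokens=2_000, out_tokens=300,
                 in_price_per_m=3.0, out_price_per_m=15.0)

# Model B: half the list price, but 20x more verbose output.
b = cost_per_run(in_tokens=2_000, out_tokens=6_000,
                 in_price_per_m=1.5, out_price_per_m=7.5)

print(f"A: ${a:.4f}/run, B: ${b:.4f}/run")
# B's cheaper rate card is swamped by its verbose output.
```

With these made-up numbers, the "half-price" model B ends up several times more expensive per execution than model A, which is the effect the benchmark's average-price-per-run column is designed to surface.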

Contents (11 segments)

Announcement

How do you pick which AI models to use? Are you sure it's the best one for each use case? I've noticed that a lot of people just pick a model and then pretty much stick with it. If that's you, then we made something you should see. I'm proud to announce the official n8n AI benchmark. The AI benchmark is a

What is it?

table that helps you pick the best model for your use case. Unlike some other benchmarks, this does not include any subjective or non-deterministic scoring, so writing and style are completely out of scope here. This is powered by

How it works

thousands of pass-or-fail challenges, broken up by category, that we ran dozens of models through. That means the scores come from pure deterministic logic, with the AI running in the same engine where you actually use the models: inside n8n, with the agent nodes. This allows us to say pretty confidently which AI models may be best for different use cases. Speaking of

Made to be adapted for each use case

different use cases, the entire system was designed around customizing the score for your specific needs. Don't just look at the overall score and assume that it will be best for you. Go in and actually pick which categories your agent needs. For instance, let's say we have a tag suggester for a

Manual usage example

blog site. You'll probably need structured output, classification, speed, and likely also cost. That actually recalculates the overall number to give you a new personalized benchmark just for your use case. To make this even easier to use, you can just describe or even paste a workflow into this text box and our agent will pick the categories for you and explain why

Copy and paste benchmark results

it did. Then once you decide on a model, you can just copy the node right from the copy icon and paste it directly into your workflow. Something else I find

Per execution cost estimates

pretty hard is understanding how much custom agents will cost me. I don't know about you, but my brain can't really do quick math with per-million-token pricing. So we added a dead simple way to get an instant cost estimate for each model. Just grab your token usage from the n8n logs and put it in the fields in the pricing dropdown. That recalculates the average-price-per-run column, so you can get a really good estimate of how much you're spending per workflow execution. With AI, the smaller the scope is, the more

Make focused, limited scope agents for best results

reliable the output will be. This benchmark was made to help you know which models to use for those more specialized agents, which is how you should be building for reliable real-world systems. Each category was tested in isolation with as few variables as possible, which gives better scores for specialized models and a bit worse scores for general models. That's why you might see some very good general-purpose models ranked lower than you'd expect them to be. But this is

Specialized agents are better AND faster

great, because the best model for your focused use case is usually cheaper and much faster than the big general-purpose models you're used to chatting with. I spent many hours

Want a behind the scenes deep dive? Leave a comment

building the system behind this, so I really want to do a deep dive into how it works (which, by the way, is 100% built in n8n; I even use data tables instead of an external database), but that will need to wait for another video. The

Go try it out!

benchmark page is now live and it is the first link down in the description.
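The category-based recalculation described earlier (pick only the categories your agent needs, and the overall number updates) can be sketched roughly as follows. This assumes the personalized score is a plain average over the selected categories; the benchmark's actual weighting isn't specified in the video, and the category names and scores below are invented for illustration.

```python
# Rough sketch: recompute an "overall" score from only the categories a
# workflow actually needs. Assumes a simple average; the real benchmark's
# formula may differ, and these numbers are invented.

def personalized_score(category_scores, selected):
    """Average a model's scores over just the selected categories."""
    return sum(category_scores[c] for c in selected) / len(selected)

# Hypothetical per-category scores for one model.
model = {"structured_output": 92, "classification": 88,
         "code_generation": 71, "tool_use": 64}

# A blog-tag-suggester agent mostly needs structured output and classification,
# so its personalized score ignores the weaker unrelated categories.
score = personalized_score(model, ["structured_output", "classification"])
print(score)  # 90.0, versus 78.75 averaged over all four categories
```

This is why a model that looks mediocre overall can still be the best pick for a narrow agent: the categories dragging its average down may be ones your workflow never touches.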
