# We Ranked AI Models by Their Performance in n8n

## Metadata

- **Channel:** n8n
- **YouTube:** https://www.youtube.com/watch?v=lcVdzwjyslg
- **Date:** 02.02.2026
- **Duration:** 3:29
- **Views:** 2,888
- **Source:** https://ekstraktznaniy.ru/video/15125

## Description

n8n now has an official AI benchmark: a free community resource for choosing the best model for your use cases.

Link to the benchmark: https://go.n8n.io/benchmark-ty-launch

@liammcgarrigle tested over 60 AI models across 8 categories. The entire benchmark was built and runs inside of n8n and scores models on actual use cases in n8n rather than conversational style or subjective preference.
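The scoring approach described above (thousands of deterministic pass-or-fail challenges grouped by category) boils down to simple pass-rate arithmetic. A minimal sketch, assuming made-up category names and results rather than the benchmark's actual data:

```python
# Hypothetical pass/fail results per category: True = challenge passed.
# Category names and counts are illustrative, not the real benchmark data.
results = {
    "structured_output": [True, True, False, True],
    "classification": [True, False, True, True],
    "tool_calling": [True, True, True, False],
}

def category_scores(results):
    """Score each category as the fraction of challenges passed."""
    return {cat: sum(passed) / len(passed) for cat, passed in results.items()}

def overall_score(results):
    """Overall score: mean of the per-category pass rates."""
    scores = category_scores(results)
    return sum(scores.values()) / len(scores)

print(category_scores(results))  # every category here passes 3 of 4 → 0.75
print(overall_score(results))    # 0.75
```

Because every challenge is binary pass/fail, no subjective grading enters the score at any point.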

Smaller models often outperform larger models when it comes to specific tasks. We also found that a model's list price doesn't tell the whole story. One model priced at half the cost of competitors ended up being 10x more expensive in practice because it was so verbose in its outputs.
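The "cheaper list price, more expensive in practice" effect falls straight out of per-token arithmetic: a verbose model burns far more output tokens per run. A minimal sketch with hypothetical prices and token counts (none of these numbers come from the benchmark):

```python
def cost_per_run(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one execution given token usage and per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Hypothetical models: B's list price is half of A's, but B is far more verbose.
a = cost_per_run(2_000, 500, input_price_per_m=10.0, output_price_per_m=30.0)
b = cost_per_run(2_000, 12_000, input_price_per_m=5.0, output_price_per_m=15.0)
print(f"model A: ${a:.4f} per run")  # ≈ $0.0350
print(f"model B: ${b:.4f} per run")  # ≈ $0.1900 — pricier in practice despite the lower list price
```

This is why the benchmark's average-price-per-run column is a better comparison point than list price alone.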

No single model dominates every category, so use the category filter on the benchmark page to find the best fit for your specific workflow. Whether you're building AI agents, automating data extraction, or generating code, this benchmark helps you make more informed decisions and build more cost-effective solutions.
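Re-ranking by category filter, as described above, amounts to averaging only the categories a workflow needs. A sketch with invented models and scores, showing how the winner flips when the filter changes:

```python
# Hypothetical per-category pass rates for two models (not real benchmark data).
models = {
    "model_a": {"structured_output": 0.92, "classification": 0.88, "code_generation": 0.61},
    "model_b": {"structured_output": 0.78, "classification": 0.80, "code_generation": 0.95},
}

def personalized_score(scores, selected):
    """Average only the categories the workflow actually needs."""
    return sum(scores[c] for c in selected) / len(selected)

def rank(models, selected):
    """Re-rank models using only the selected categories."""
    return sorted(models, key=lambda m: personalized_score(models[m], selected), reverse=True)

# An agent doing structured extraction favors model_a...
print(rank(models, ["structured_output", "classification"]))  # ['model_a', 'model_b']
# ...while a code-generation workflow favors model_b.
print(rank(models, ["code_generation"]))                      # ['model_b', 'model_a']
```

No single model tops both filters, which is exactly why the overall score alone can mislead.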

## Transcript

### Announcement [0:00]

How do you pick which AI models to use? Are you sure it's the best one for each use case? I noticed that a lot of people just pick a model and then pretty much stick with it. If that's you, then we made something you should see. I'm proud to announce the official n8n AI benchmark. The AI benchmark is a

### What is it? [0:20]

table that helps you pick the best model for your use case. Unlike some other benchmarks, this does not include any subjective or non-deterministic scoring, so writing and style are completely out of scope here. This is powered by

### How it works [0:35]

thousands of pass-or-fail challenges, broken up by category, that we ran dozens of models through. That means pure deterministic logic creates the scores, with the AI running in the same engine where you actually use the models: inside n8n with the agent nodes. This allows us to say pretty confidently which AI models may be best for different use cases. Speaking of

### Made to be adapted for each use case [1:00]

different use cases, the entire system was designed around customizing the score for your specific needs. Don't just look at the overall score and assume that it will be best for you. Go in and actually pick which categories your agent needs. For instance, let's say we have a tag suggester for a

### Manual usage example [1:18]

blog site. You'll probably need structured output, classification, speed, and likely also cost. Selecting those recalculates the overall number to give you a new personalized benchmark just for your use case. To make this even easier, you can just describe or even paste a workflow into the text box, and our agent will pick the categories for you and explain why

### Copy and paste benchmark results [1:45]

it did. Then once you decide on a model, you can just copy the node right from the copy icon and paste it directly into your workflow. Something else I find

### Per execution cost estimates [1:54]

pretty hard is understanding how much custom agents will cost me. I don't know about you, but my brain can't really do quick math with per-million-token pricing. So we added a dead simple way to get an instant cost estimate for each model. Just grab your token usage from the n8n logs and put it in the fields in the pricing dropdown. That recalculates the average price-per-run column, so you can get a really good estimate of how much you're spending per workflow execution. With AI, the smaller the scope is, the more

### Make focused, limited scope agents for best results [2:24]

reliable the output will be. This benchmark was made to help you know which models to use for those more specialized agents, which is how you should be building for reliable real-world systems. Each category was tested in isolation with as few variables as possible, which gives better scores for specialized models and slightly worse scores for general models. That's why you might see some very good general-purpose models score lower than you'd expect. But this is

### Specialized agents are better AND faster [2:54]

great, because the best model for your focused use case is usually cheaper and much faster than the big general-purpose models that you're usually used to chatting with. I spent many hours

### Want a behind the scenes deep dive? Leave a comment [3:07]

building the system behind this, so I really want to deep dive into how it works, which, by the way, is 100% built in n8n. I even use data tables instead of an external database, but that will need to wait for another video. The

### Go try it out! [3:21]

benchmark page is now live and it is the first link down in the description.
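The per-execution cost estimate described at [1:54] is simple arithmetic once you have token counts from the execution logs. A sketch with hypothetical logged usage and prices (the numbers here are made up, not pulled from n8n):

```python
def estimate_cost(runs, input_price_per_m, output_price_per_m):
    """Average cost per execution from logged token usage.

    `runs` is a list of (input_tokens, output_tokens) pairs taken
    from workflow execution logs.
    """
    total = sum(
        inp / 1_000_000 * input_price_per_m + out / 1_000_000 * output_price_per_m
        for inp, out in runs
    )
    return total / len(runs)

# Hypothetical token usage from three logged executions.
logged = [(1_800, 400), (2_200, 550), (2_000, 450)]
avg = estimate_cost(logged, input_price_per_m=3.0, output_price_per_m=12.0)
print(f"~${avg:.4f} per execution, ~${avg * 10_000:.2f} per 10,000 executions")
```

Multiplying the average out by expected run volume turns per-million-token pricing into a number you can actually budget against.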
