# ‘Principles for Building with AI’ - from the Berlin Meetup (May 2025)

## Metadata

- **Channel:** n8n
- **YouTube:** https://www.youtube.com/watch?v=cxQE2LaWaWw
- **Date:** 12.06.2025
- **Duration:** 14:22
- **Views:** 2,596
- **Source:** https://ekstraktznaniy.ru/video/15381

## Description

David Roberts, VP of Product at n8n, shares some key insights he learned while working on the AI features of n8n:

- AI is a new puzzle piece
- Start fast, then go deep
- Explicit structure works best
- Iterate

Keep an eye on our community calendar for upcoming events around the world: https://lu.ma/n8n-events

Interested in hosting a community event in your area? Join our Ambassador program: https://n8n.io/ambassadors

#n8n #community #ai #agents #lowcode #nocode

## Transcript

### Intro

Really great to be here. Really great to see so many faces. I look after product for n8n. I've been working with Jan for about four years. It's humbling to see this many people, because when I started, you guys would have been a large proportion of our cloud user base, to be honest. So things have changed, for sure. I actually want to talk about three things. I'm going to start off by sharing a few stats to set the scene. Then I'm going to go into these principles about building with AI. They're things that we've just observed or deduced; they're not things we've invented, I'd say. And then finally, I want to give you a little product demo that's related to those principles. So, let's get stuck in. If you go to Google Trends and you

### The beginning of the year [0:45]

look up n8n, you'll see a graph that looks like this. That inflection point is basically the beginning of the year; it's the beginning of January. When that happened, we spent a long time trying to disprove it, trying to prove that it was an analytics bug or fraud or something. And we finally had to admit: this seems to be real. This is really going on. I'd love to check whether this graph is reflected in this room. So, how many people here use n8n? Hands up. Okay, a lot of hands. Keep your hand up if you were using n8n two years ago. Okay, we got some emojis. And what about one year ago? A month ago? Yeah. Okay, well, that's some kind of proof, I think, that this graph is real. And you might be asking yourself: what did you do here? What was the thing you changed? And the truth is, not that much. Honestly, we didn't have any big product launch. We didn't, I think, have any huge marketing push at that time. The real point of interest is back here, because in October 2023, that's when we released our AI nodes for the first time. When we released them, they were very MVP; they had a lot of rough edges. We spent a lot of time refining them and making them better, and things were growing quite nicely. What looks flat now felt quite nice at the time. And then, for some unknown reason, in January this happened. I guess it was just some alignment of the planets or something. But the other thing I should mention that also happened back in the day is that we really invested in the community. Community is very important to Jan personally, as you can tell from his speech just now. But it's also been very, very helpful for n8n: the appreciation, the spreading the word, the making the videos, like Jan says, that everyone's done. That's really what lit the touch paper, if you like, for us.
Okay, so what was going on during that time? This is the percentage of people who put AI in the first workflow they build with n8n. And you can see that while we were on that sort of

### The graph [3:09]

flattish part of the graph, this was going up all the time. The takeaway I take from this graph is that although we predate ChatGPT (n8n is five years old, and was originally a workflow automation platform), we are essentially an AI tool now, as cheesy as it sounds. I know there's a lot of skepticism around that kind of thing, but it's a little bit undeniable when 80% of people are using you for AI. When people use AI in n8n, about two-thirds of the time it's agents. I guess that shouldn't be surprising; that's the most powerful way of using AI. About half of those agent workflows use a chat interface. That could be the built-in chat, but it could also be Telegram or WhatsApp or Slack. And about half of them call other workflows, too. I think that's very important, because that's where a lot of the richness and the power is in n8n: your agents can call other workflows as tools. So we've established that AI is an important puzzle piece. And this is where I start repeating Jan (thanks, Jan, for stealing my speech). You have these three pieces of the puzzle: humans, code, and AI. The metaphor I'm going to use is that they're colors on your palette that you can paint with. And if you think about it, to do any process, these are your only three options, at least that I can think of. We've had humans forever. We've had code for what, 70 years or so.
AI is obviously very new, and as Jan said, I think the art, the value, the subtlety is in how you mix these three things together. That's basically the strategy of n8n: to be a great place for mixing these things together and using the right puzzle piece in the right position. When I say code in the context of n8n, I mean deterministic logic, the classic nodes that we have. We've also been investing in human-in-the-loop, so you can blend human steps into your workflow, and there's definitely more to come there; we have another David, also a product manager, who is looking after that. And of course, AI. So, on to those principles of building with AI. The first one: start fast, but then go deep. We've all seen these amazing demos of AI functionality, and they're often jaw-dropping. And we've all seen these demos fail to make it into real products, into production. I think that's because of a fundamental truth of developing with AI: the time to first demo is shorter, because you can get there much faster, but the time to production is actually longer than with normal development. That's because it's hard to reason about AI. It's a black box; you have to discover performance. You can't just reason about it logically. And

### The big model [6:00]

that brings us on to the second principle. There's a bit of a debate raging in AI circles right now between two camps, which you could call "big model" and "big workflow". Big model is epitomized by a company like OpenAI, because what they'll say is: you've got something complicated to do, just give it to the agent. The agent's going to call some tools, figure some stuff out, and give you a result, and that's the way you should be architecting. They've got a bit of a vested interest, obviously, because you'd be putting it into ChatGPT or something. On the other side, the big workflow side (companies like LangChain would be there, for example), they say: that just doesn't work; you need to explicitly architect AI; you need to put AI as a puzzle piece into something bigger. Who's going to win this debate? Hard to know, hard to predict the future. But what I can tell you is that today, if you want to do production-level things with AI, and it's not something that will then be reviewed by a human (like a draft of an email), you need explicit structure. This is what we've seen building ourselves and working with customers: if you don't give a structure to your AI, you're going to struggle. So what does that mean, giving structure? With that demo, you'll probably start off putting everything in one prompt, and it might be quite long.
It might have "if this, do that, then do this" kinds of logic. When you want to make it more reliable, it's very helpful to focus the AI on very small, constrained pieces of the puzzle, and maybe have multiple steps with multiple agents, each doing simple things. Just like with humans, the more tightly you define your task, the clearer your instructions, and the smaller you keep things, the better your results are going to be. But there's also code, the other puzzle piece we talked about. If you can use code, you probably should: it's more reliable, faster, and cheaper than AI. And finally, you often need checks or approvals. You might want to check that code the AI generated runs properly, or use a human step to verify things, or even use another AI step to verify things. All of this leads to this structure. It's great to start off with a single prompt, but as soon as you really get into the details, you need this explicit structure.

And finally, the third principle is iteration. Iteration has long been an important principle for general development, for coding, but I would argue that with AI it's even more important. The reason is discovering performance: your AI is going to perform differently depending on what input it's given. If it's a chatbot answering questions, it's going to perform differently on different questions, and you as the author don't even know what kind of questions it's going to be asked, let alone which questions it's going to perform badly or well on.
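Going back to the structure principle for a moment, the "use code where you can, and check what the AI produced" idea can be sketched in a few lines. This is a hypothetical illustration, not n8n's implementation: the category and priority sets, field names, and fallback values are all assumptions, standing in for whatever a small, constrained AI classification step might return.

```python
# A minimal sketch of "explicit structure": a small AI step does one job
# (classify a ticket), and a deterministic code step validates its output.
# The allowed values and fallbacks below are illustrative assumptions.

ALLOWED_CATEGORIES = {"billing", "bug", "feature_request", "other"}
ALLOWED_PRIORITIES = {"low", "medium", "high"}

def validate_classification(result: dict) -> dict:
    """Deterministic check after the AI step: reliable, fast, and cheap."""
    if result.get("category") not in ALLOWED_CATEGORIES:
        result["category"] = "other"    # fall back rather than fail
    if result.get("priority") not in ALLOWED_PRIORITIES:
        result["priority"] = "medium"
    return result

# Example: the AI returned a malformed priority; the code step repairs it.
print(validate_classification({"category": "billing", "priority": "URGENT!!"}))
# → {'category': 'billing', 'priority': 'medium'}
```

In an n8n workflow, this kind of check would typically live in a Code node placed right after the agent, so malformed AI output never reaches the rest of the flow.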
So, what we found is that when you launch an AI feature or an AI product, your work is really just beginning, because you have to monitor how people are using it, figure out where it's gone wrong, change the workflow to fix those things, check that you haven't accidentally broken anything, and then repeat the whole cycle again. That really is the definition of iteration: these cycles. And n8n has been built from the ground up with iteration in mind. There are a lot of little things that make iteration easier in n8n. For example, when you double-click a node to open it, you get the data right next to the settings, so you can quickly tweak something, run it, see the result, and do that little loop faster. And if you want to re-execute just a part of a workflow, you don't have to start from the beginning. All these things shorten those iteration loops. And that brings me on to the third part of this short presentation, which is about a feature: workflow evaluation. We actually talked about it at a community meetup a while ago, but then we realized it wasn't ready; the user experience wasn't really there. So we went back to the drawing board, and we've come up with a new version, which, fingers crossed, will be released as beta on Monday. The reason we went back to the drawing board is that we think this is a super important part of building with AI. So I'm just going to play this little video for you, which unfortunately also includes me speaking, to explain what it does. Evaluation is a great way

### Evaluation [11:00]

for getting confidence in your AI workflow. It's essentially a form of testing, and it's the technique that helps you move from an impressive proof of concept to a solid, production-ready workflow. In fact, it's pretty much essential for doing anything reliable with AI. And if you don't believe me, take these guys' word for it: Anthropic calls it the crucial secret behind good prompting, and Vercel goes as far as to say your prompt is broken if it doesn't have evaluations. It's important because LLMs can't be reasoned about like code can, so they have to be measured instead. Evaluation provides a way of doing that measurement. In evaluation, you run a test data set through your workflow to see how it performs on a range of different inputs. So let's look at an example. I'm building a workflow for classifying support tickets. It takes an incoming ticket, in the format of a subject and a body, and using AI it calculates a category and a priority. I've got it working end to end, and now I want to get a sense of how reliable it is. This Google Sheet has test data I can use for that: a handful of support questions ready for classification, and I've also added the correct answers by hand. By running them through my workflow, I can get a sense of where things are going wrong and get ideas for improvements. So I've added a trigger here that reads in the rows from the data set, and a set-outputs node that writes the results back to it. You can see that it's running multiple times in a row, once for each row in my data set, and we can see the results coming in here. The red cells are where the AI got it wrong, and I can look at these to get a sense of where I need to improve things. After I've made my changes, I can try again and see what's changed. Once my data set starts getting bigger, I'll need a way of measuring performance, not just eyeballing it. And that's where metrics come in.
Metrics are essentially a score that measures the quality of the workflow. Metrics can be code-based, or they can use AI to judge performance. Here I've got a very simple metric: I'm comparing the expected priority from my Google Sheet with the actual priority output by the workflow, and doing the same for the category. But you can also calculate lots of other metrics, like correctness or RAG document relevance, with this feature. I can use the Evaluations tab to run my evaluation, and once it's finished, I can see a roll-up of my metrics over all my inputs, as well as results for individual rows. I can also see how those metrics have evolved over time as I've worked on my workflow. So, in summary, you should use evaluation when building your workflow, to get an idea of how it performs over a handful of different cases; when deploying to production, to check that the edge cases you know about are covered; and when making changes, to check that you haven't accidentally broken anything. It's a feature we're very excited about, and I hope you find it useful. Yeah, that's it. So, coming to an instance near you from Monday, hopefully. Thank you very much.
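The simple code-based metric described above (expected label vs. actual label, rolled up into a score) can be sketched as follows. The test rows, field names, and labels here are illustrative assumptions, not n8n's actual data format; the point is just the exact-match comparison and the roll-up.

```python
# A sketch of a code-based evaluation metric: exact-match comparison of
# hand-labeled expected values against the workflow's actual outputs,
# rolled up into one score per field. Rows and field names are hypothetical.

test_rows = [
    {"expected_category": "billing", "actual_category": "billing",
     "expected_priority": "high",    "actual_priority": "high"},
    {"expected_category": "bug",     "actual_category": "bug",
     "expected_priority": "medium",  "actual_priority": "low"},
]

def exact_match(rows, field):
    """Fraction of rows where the workflow's output matches the hand label."""
    hits = sum(r[f"expected_{field}"] == r[f"actual_{field}"] for r in rows)
    return hits / len(rows)

for field in ("category", "priority"):
    print(f"{field}: {exact_match(test_rows, field):.0%}")
# → category: 100%
# → priority: 50%
```

A roll-up like this is what makes a growing data set manageable: instead of eyeballing red cells, you watch one number per metric move as you iterate on the workflow.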
