# n8n: Using Siri with n8n, and 'Nodes as Tools' - from the Prague Meetup (Sept 2024)

## Metadata

- **Channel:** n8n
- **YouTube:** https://www.youtube.com/watch?v=96XWx-7WZM0
- **Date:** 18 Oct 2024
- **Duration:** 32:05
- **Views:** 2,035

## Description

In this talk from the n8n Meetup in Prague, n8n AI Engineer Oleg Ivanov shares how he connected Siri on his MacBook to an AI workflow, and gives a sneak peek of the new AI 'Nodes as Tools'.

Links:

- Read the full report on the Prague meetup here: https://community.n8n.io/t/prague-september-2024-meetup-report/56598
- Interested in hosting a community event in your area? Become an n8n Ambassador: https://n8n.io/ambassadors

## Contents

### [0:00](https://www.youtube.com/watch?v=96XWx-7WZM0) Segment 1 (00:00 - 05:00)

I've been working at n8n for two years now; I work in the AI team. I originally started in Berlin, but right now I'm based in Czechia. So I have two workflows that I prepared, and they go like this.

The first one is pretty basic. It's this very simple workflow, and what it does is... "Hey Siri, what's on the picture?" — "The image appears to be a screenshot of a computer screen displaying a flowchart or diagram, which seems to represent a process or workflow. The specific details are not entirely clear due to the resolution and the angle of the photo." As you can see, this was not scripted. I'm going to show you on Prague Castle that this really works. "Hey Siri, what's on the picture?" — "Sorry, I'm having trouble with the connection. Please try again in a moment." "Hey Siri, what's on the picture?" Come on, Siri. "Sorry, I'm having trouble with the con—" Look, it really works, okay? Sure, sometimes it times out, like when it's thinking for too long. You can see that it's still thinking — oh, there are two executions running.

But let me show you how this actually works. As you can see, I'm using the Ollama chat model — that's also one of the reasons it's taking so long, because it's running on my local machine — with the LLaVA 7B model, which is a multimodal model, so it supports image input. I have a very simple setup: a webhook that accepts the data — it's called /siri-image — to which we POST the image. I convert the image from `body.image` base64 to an actual image file, and then I use this Basic LLM Chain to pass that image to my model. If we go here — yeah, you see, now it finished the execution, but unfortunately Siri times out after five seconds or so when it takes too long. It actually provided the... that's not the right one; well, this one's still running. Seems like my Ollama is down. Okay, now it's back up. I guarantee this is going to work. "Hey Siri, what's on the image?" — "Finder hasn't added support for that with Siri." I guarantee... Maybe you shouldn't use Siri, you know, it's not that great anyway. "Hey Siri, what's on the image?" — "Finder hasn't added support for that with Siri." "Hey Siri, what's on the picture?" Yeah, thank you. "Hey Siri, what's on the picture?" — "I'm having some trouble with the connection. Please try again in a moment." Son of a... Did it execute? Oh no, it actually failed. Okay, look, I'm just going to replace Ollama, because that's what's so slow. Well, that's the cool thing about n8n: you can switch models very easily. So now I can connect the OpenAI Chat Model, select the credentials, and we'll use GPT-4o mini — that should be much quicker. Let me just save it. "Hey Siri, what's on the picture?" — "Sorry, I'm having trouble with the connection." "Hey Siri, what's on the picture?" Come on. — "What do you want to search for?" Okay, last time, come on. "Hey Siri, what's on the picture?"
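The webhook step described above can be sketched outside n8n. Below is a hypothetical stand-in (the function and file names are mine, not from the workflow) for what the workflow does with the Shortcut's payload: read the base64 string from `body.image` and turn it into an actual image file for the multimodal model.

```python
import base64
import json
from pathlib import Path

def handle_siri_image(request_body: str, out_path: str = "siri_screenshot.png") -> Path:
    """Simulate the /siri-image webhook handler: parse the JSON body
    posted by the Shortcut, decode the base64 `image` field, and save
    it as an actual image file for the model to consume."""
    payload = json.loads(request_body)
    image_bytes = base64.b64decode(payload["image"])
    path = Path(out_path)
    path.write_bytes(image_bytes)
    return path

# Example: a fake byte string standing in for the screenshot Siri sends.
fake_image = base64.b64encode(b"\x89PNG...").decode()
body = json.dumps({"image": fake_image})
saved = handle_siri_image(body)
```

In the real workflow this decoding is done by n8n's built-in file-conversion step rather than hand-written code.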

### [5:00](https://www.youtube.com/watch?v=96XWx-7WZM0&t=300s) Segment 2 (05:00 - 10:00)

"I'm having some trouble with the connection. Please try again in a moment." Okay, but it does return stuff — you see, "the image shows Prague Castle" — it's just not picking it up for some reason.

Okay, but let me show you how I built it anyway. As you can see, this is a simple webhook; we pass the data to whatever model is connected — it has to support image input — and it's using Shortcuts to basically get that image from my desktop to my local instance of n8n. But I could also be using a production or self-hosted instance. The shortcut looks like this. It's called "What's on the picture". After I say this prompt, it takes an interactive screenshot — that was the thingy you saw me selecting the image with — converts the area I've selected to base64, and POSTs it to this webhook I've created, then waits for the webhook to execute. Here you can see how I pass it inside the JSON, and I add a static prompt — "What's on the picture?" — to have the LLM describe the picture. Finally, once it gets the output, or once it times out, I just extract the text from the output of the webhook and, ideally, speak it. Well, let's try it from here — now I just play it without saying "Hey Siri". See, it's running here... — "I can't see images or describe them. If you need something specific related to the image, feel free to ask." Okay then. Creepy. Now it's saying it can't describe images, but I'm pretty sure it can with the right model. Maybe let's tone down the temperature a bit and save it. (Is that the usual voice you work with?) — "The image shows code in TypeScript. It defines an asynchronous function for making requests, which appears to be part of a larger application." Yeah, okay, that's actually correct. So you see, it read the picture and then explained it in a creepy way, and we can see here "the image shows a code snippet written in TypeScript" — that's all correct.

And that brings me to another workflow I have. Now I have very little confidence that this one will work, but maybe it will. It's called "Use current file agent" — another shortcut, slightly more complicated from the shortcut point of view. What it does is: I say "use current file agent", it asks me what I want to pass to the agent, and it also lets me take a screenshot again, but this time of the whole window. I do that to get the path of the file I'm currently working on. There is some AppleScript API you can use to get that from VS Code, but I wasn't able to get it to work — and ChatGPT neither — so I fell back to this solution: I take that screenshot and use a regex to extract the path of the file I'm currently editing — it's this thing here. Then I send it to my webhook, /siri-file, as the current file together with a query, and I just speak the output... maybe let's not speak it, and we can try it now. Oh, maybe let me show you the workflow first. So here's the workflow: it's using a Tools Agent. We're passing the current file that we get from the webhook request, together with the query, as a prompt. We're telling it that it's a helpful coding assistant and that the response should be very concise, because it's going to be

### [10:00](https://www.youtube.com/watch?v=96XWx-7WZM0&t=600s) Segment 3 (10:00 - 15:00)

passed through a text-to-speech engine, so it doesn't go off on tangents explaining something in Markdown and so on.

These are the two tools it has access to. Read File allows it to read a file from my file system. It has one static argument — the tool name — so that we can have two different tools in the same workflow. Then we use "Specify Input Schema" to tell it what parameters this tool expects. Here I'm just using a JSON example, but you could also use a JSON schema, where you could additionally provide a description for the tool. Let me just refresh to make sure I don't save it with the schema. Then we also have Update File, which is pretty much the same, but it also accepts new content as a parameter, allowing the LLM to overwrite the file at that path.

The tools are implemented like this. Here I have the Execute Workflow Trigger. I pinned some data to have an example while developing this workflow, so I can play with it quickly. It has these two items — read file and update file — with a path and new content. We have a Switch node here to route it. Reading the file is pretty simple: we just use Execute Command to `cat` the file at the given path. For writing, we convert the text to a text file — take it from the query's new content, put it into data — and write it to disk at the location that was passed to this workflow, then respond with success.

Let's test it out. What's the name of the shortcut... "use current file agent". "Hey Siri, use current file agent." — "What's your prompt to pass to the agent?" Explain what this file does. So it's asking me to take a screenshot... — "The file contains TypeScript code for making API requests to the flow API, including functions for handling authentication and pagination." — "What's your prompt to pass to the agent?" And now, because I have this... well, whoops, stop it, please — it's going to ask me four times. Theoretically I could also connect memory to my agent and then ask follow-up questions in the same context. But let's try a different question, asking it to modify this file. "Hey Siri, use current file agent." — "Finder hasn't added support for that." What was that? "Siri, use current file agent... Hey Siri, use current file agent." — "What's your prompt to pass to the agent?" Add detailed comments to this file explaining the functionality of the individual methods. There's a high chance this will time out, because now it needs to do two tool operations, but after a while we should actually see the comments appearing. — "Sorry, I'm having trouble with the connection. Please try again in a moment." Oh, there you go — now it added these comments. See, these were not here before; it documented the function based on its parameters. If I go back — these were not there, this was there — okay, so it just added these headers.

Yeah, so these examples are definitely not perfect, but they show you a way to use n8n for some local stuff combined with Siri. Although, after playing with this for a while, I wouldn't really recommend Siri, because its transcription engine is not that good — it would often mislabel things. You really have to say everything with precisely the correct pronunciation, especially when using, for example, file paths — it's very easy to mess up. The timing is also sometimes hard to get: you should start speaking after it makes the sound, but if you start sooner, it won't pick up part of the sentence and will just send you the second half. So yeah, that would be the Siri stuff. Any questions?
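The two file tools behind that sub-workflow can be sketched as plain code. This is a hypothetical stand-in (the names `run_tool`, `read_file`, and `update_file` are mine) for the Switch-node routing plus the Execute Command (`cat`) and write-to-disk branches, assuming a Unix-like system:

```python
import subprocess
from pathlib import Path

def run_tool(item: dict) -> dict:
    """Route on the static tool name (the Switch node in the workflow)
    and perform the corresponding file operation."""
    tool, query = item["tool"], item["query"]
    if tool == "read_file":
        # The workflow uses an Execute Command node running `cat <path>`.
        out = subprocess.run(["cat", query["path"]], capture_output=True, text=True)
        return {"content": out.stdout}
    if tool == "update_file":
        # Convert the new content to a file and write it to the given path,
        # letting the LLM overwrite the file, then respond with success.
        Path(query["path"]).write_text(query["new_content"])
        return {"success": True}
    raise ValueError(f"unknown tool: {tool}")
```

As in the talk, giving the agent only these two operations (and no delete) bounds what the LLM can do to the file system.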

### [15:00](https://www.youtube.com/watch?v=96XWx-7WZM0&t=900s) Segment 4 (15:00 - 20:00)

Yeah, go ahead. (Great talk. Firstly, are you making any of this publicly available? It would be great to get a copy of the shortcuts and things like that. And secondly, do you think this would work on mobile with Siri as well?) Yeah, I've got one to work — a very simple one — but yeah, you can definitely send requests to webhooks on mobile. The JavaScript automation, I'm not sure if that one works, so you probably wouldn't be running the same shortcut on mobile, but you can definitely do most of this stuff, and also make it speak on mobile. The shortcut — I wasn't planning on making it available, it's pretty simple, but I can share it somewhere.

And now the second part: a new feature we've been working on in the AI team, and that is using nodes as tools. You've seen this whole process where I wanted to have two very simple tools, but I needed to add the workflow trigger, add a Switch, route it, and the passing of those parameters is a bit awkward. So what's coming in the next few weeks is nodes as tools, and it's going to look like this. Here I have a Tools Agent, and I want it to get the three latest articles from Hacker News about OpenAI, send the digest to my email, and then log to Google Sheets that this email was sent. If you were doing this as a sub-workflow, there would potentially be a lot of steps, but now you can solve it with three easy ones. In Tools we now support some of our nodes — among them Hacker News, Google Sheets, and Gmail — so let's add those right now: just put them here, add them, and then we'll configure them.

First we want to get the three latest articles from Hacker News about OpenAI. When we open this node, we see it supports multiple resource-and-operation combinations. As the resource we want to get all articles, and we want to be able to specify the keyword to search for and the limit. These parameters — we don't want to set them to fixed values (although we could, and it would still get executed); we want them to be provided by the LLM. So what's going to be available in a few weeks is this new way to specify a dynamic parameter in a node-as-tool, as an expression. There's this new function called `fromAI`. It supports four arguments — key, description, type, and default value — and the last three are optional, so in most cases you'll be fine just specifying the key. Let's say this one is the amount of articles; you see that you don't need to follow the specific labels we already have — you can name it yourself. You could also provide whatever description you want, "the amount of articles", but again, you don't need a description. Let's copy this, because we also want to specify the keyword to search for. Well, now we have this node set up, so we can close it.

We said we also want to send an email and log to Google Sheets. So now let's send it as an email. I connect my Gmail — let me just make sure I'm connected, okay — and we're going to send an email to... so again, let's do `fromAI` for the email address, and the same for the subject. That's it for this node — but for the content, you see that we've set the email type to HTML. We could name the key "html_content" to let it know from the key that it should be HTML, but we could also add a description: "content of the message, formatted as HTML". Now it has more context about what kind of argument it should pass to this tool. And finally, let's set up the Google Sheets node.
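Based on the talk, a dynamic tool parameter would look something like the fragment below. This is a sketch of the expression syntax as described (a key, then optional description, type, and default value); the exact form in the released feature may differ:

```
// Limit parameter of the Hacker News tool -- all four arguments
{{ $fromAI('articles_count', 'the amount of articles to fetch', 'number', 3) }}

// Keyword parameter -- the key alone is enough in most cases
{{ $fromAI('search_query') }}
```

The key and description are what the LLM sees, so descriptive names ("articles_count") do double duty as documentation for the model.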

### [20:00](https://www.youtube.com/watch?v=96XWx-7WZM0&t=1200s) Segment 5 (20:00 - 25:00)

Cool. I have this "n8n logs" sheet, which should be empty — yeah, it has two columns, key and payload. I'm going to select that sheet and add an append-row operation. Now it maps the available columns. For this one I'm going to say `fromAI` — email address — and here it could just be the action that was performed, I don't know, a summary or something. Cool, so now we have these three nodes configured, and we should be able to... yeah, now let's execute it.

You could see that it first fetched the Hacker News articles, but then it did two actions at the same time: it sent a message and it appended data to Google Sheets. That's because these are not dependent on each other — they can be done simultaneously. It seems like it worked; it told me that it did all this stuff. We can take a look at the logs view to see exactly the inputs and outputs of each tool. For Hacker News, you see that we got the amount of articles and the search query, and we got some Hacker News results corresponding to that. For the email, it sent it to my address, added a subject, and formatted it as HTML. And finally, for Google Sheets, it added the email address and a summary. If you open the tool now, the expression is resolved, and you can also see the values it resolved to from here. And if I open my email, I should have this digest — yeah, here.

So there are more node tools coming. In the first batch we're going to support these, but theoretically there's nothing stopping us from implementing this for more nodes; once we have the basic structure, it should be pretty straightforward. And that's it — if you have any questions, please ask now.

(So how does pagination work inside of this? Does it do that automatically, or do you have to set it up like you normally would for HTTP?) Can you repeat the question? (How does the pagination work?) So right now we're sending all the data to the LLM, but in some cases that might not be ideal — for example, with Hacker News, you see there's a bunch of stuff you don't really need to send to the LLM. The feature coming next for this is being able to specify include or exclude fields, or to specify a clean-up function for the response before sending it to the LLM. We have something similar for the HTTP Request tool, where there's an "optimize response" option. But pagination — I guess that would mostly be for HTTP requests, right? (Yeah, my thought was: if I had a request that went to Gmail, or to my very busy Google Calendar, it would probably be over Google's single-request limit, and it could fetch a bit more from there and maybe store it somewhere, like it normally would with memory. I don't know if that's built in or planned.) No — that would be up to the LLM to decide, "okay, I did not get all the results", then ask the user whether they want to fetch more results, and then we just pass different parameters. (I have one more question, or request: would this be open to any kind of community-built tools? If there were some community node-building framework, like we have for regular nodes, so we could add our own nodes as node tools — that seems like something that would really bring this to the next level.) Yeah, for sure. This is the PR where I'm working on this feature, and you can see that for the actual node there's very little you need to do to make your node supported — it's literally one toggle, "usable as tool", set to true, and that would

### [25:00](https://www.youtube.com/watch?v=96XWx-7WZM0&t=1500s) Segment 6 (25:00 - 30:00)

add support for the node. (So you're saying this is supported out of the box once this PR is merged?) Yeah.

(Got a question: for the AI agent — if you open up, for example, the Hacker News tool — what context is the agent getting? Does it know that this `fromAI` statement is inside something called "limit"? Does it know "limit"?) Okay, yeah, that's a good question. You see, if we open the regular Hacker News node, it looks almost the same, but without this description type. For the tool we have a description type: you can either let it be set automatically — that's for the cases where we have a resource and operation, because then we can say with a high degree of certainty what the tool does; we're able to describe it based on those two fields — or you can provide a manual description of what the tool is doing. And it doesn't know explicitly about the node's fields: it only knows about the ones you've defined using this `fromAI` expression. So we only tell it "you can fill out these two parameters: amount of articles and search query". Because the type is not specified, it's going to be a string, but you could also make it, I don't know, a number, date, JSON — whatever you want. But the LLM doesn't know about the full node type description.

(As a quick follow-up: is it one action per node? So if I want a Gmail search and a Gmail send-email, am I adding two Gmail nodes to the canvas to configure this, or can I access all those actions?) Can you repeat it, please? (Let's say I want to have a Gmail delete-email action and a send-email action. Am I dragging on one node and setting that up, then dragging on another one for the other?) Well, yeah, you would, because the resource and operation fields are not expressions, so you can't define them based on a parameter the LLM would send. If they were expressions, you could theoretically let the LLM pass those too, but chances are the fields you'd want to populate would be different anyway, so it probably makes sense to have two separate tools. (The main reason I was asking is that, with the way it's designed right now, it's very easy to not allow the LLM to delete things.) Yeah, that's for sure.

(Cool. How many OpenAI API calls do you have in this flow?) This was three API calls. We can see them in the logs: the first one was just us giving it the available tools and the user prompt. Then it told us that we need to call Hacker News first, so we executed the Hacker News node and sent it the response of that node — that's what's happening here. Then it decided it needs to call two more tools, the Gmail and Google Sheets ones, so we called them and provided it with the responses of both tools at the same time — a single API call. And then it finally gave us the final output: that it had successfully done all these things. (Okay, and a follow-up on the pagination thing: what happens if the response is bigger than your context window?) Right now it would throw an error, but there are going to be ways to optimize the responses before they're sent to the model.

(And actually one more: do you plan to support multiple agents being used as tools, for example?) Yeah, let me show you this thing — I mean, don't tell anyone that I showed you. This is how that would look. I tried to play with it a bit yesterday, and it works, but it was actually performing worse than just connecting the tools directly, because then you need to be very specific about what each agent does, and you need to provide the description of each agent to the parent agent so that it knows what to call it with. Otherwise the child agent responds with "okay, you didn't provide me with this stuff", and then — especially with the smaller models — they have a hard time figuring out what the right input to the child agent is, because the parent doesn't have the whole schema; to the parent, the child agent

### [30:00](https://www.youtube.com/watch?v=96XWx-7WZM0&t=1800s) Segment 7 (30:00 - 32:00)

only has a prompt as a parameter, while the child agent itself has the parameters of all its tools. So in this case it wasn't really worth it — I was just testing that it worked, and it did — but I'm sure there are scenarios where this could be extremely powerful. That's not going to be available out of the box, though, because there's some issue with how we do these connections. Even here you can see there are still main connections, which shouldn't be there, but they're hard to get rid of: we're using dynamic connectors, so these are calculated via expressions, and overriding them based on this flag is slightly trickier than for the regular nodes. But it might come. Thank you.

(Yeah, what you could do today as well, if you want to experiment with multi-agent stuff, is use the sub-workflow tool. You can have an agent living at the sub-workflow level; by the time it goes back to the parent workflow, you're going to have a standard response. One example I had: I was using Claude Opus to do some validation checks. Opus is really expensive, so I used it as a tool that my cheaper LLM decided when to call, and that way I was calling Opus a lot less than in a serialized setup where Opus checks every run.) Yeah.

Oh, and if you're interested in running stuff locally, especially the AI part: we've recently released this Self-hosted AI Starter Kit on GitHub. It's essentially a Docker setup with all these services already configured for n8n. You can run it and you get Qdrant and Ollama — running on CPU, or on GPU if you have one — plus n8n, all of it preconfigured for you. That's available in our repo, "self-hosted-ai-starter-kit", just as a side note. And yeah, that's it. Thank you very much.
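The three-API-call exchange described in the Q&A — tool definitions plus prompt, then tool calls, then batched tool results, then the final answer — can be sketched as a loop. Everything here is hypothetical: the `llm` callable, message shapes, and tool names are illustrative, not n8n's internals.

```python
def run_tools_agent(llm, tools, user_prompt):
    """Minimal sketch of a tools-agent loop. `llm` is any callable that
    returns either {"type": "calls", "tool_calls": [{"name": ..., "args": {...}}]}
    or {"type": "final", "content": ...}. Returns (answer, api_call_count)."""
    messages = [{"role": "user", "content": user_prompt}]
    api_calls = 0
    while True:
        reply = llm(messages, tools=sorted(tools))  # one API round-trip
        api_calls += 1
        if reply["type"] == "final":
            return reply["content"], api_calls
        messages.append({"role": "assistant", "content": reply})
        # Run every requested tool, then return ALL results in the next
        # single call -- independent tools (Gmail + Sheets) share one round.
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})

# Demo: a scripted fake model reproducing the three-call trace from the talk.
script = iter([
    {"type": "calls", "tool_calls": [{"name": "hacker_news", "args": {"query": "openai", "limit": 3}}]},
    {"type": "calls", "tool_calls": [{"name": "gmail", "args": {"to": "me@example.com"}},
                                     {"name": "sheets", "args": {"summary": "digest sent"}}]},
    {"type": "final", "content": "Done: digest emailed and logged."},
])
def fake_llm(messages, tools):
    return next(script)

demo_tools = {
    "hacker_news": lambda query, limit: [f"article {i}" for i in range(limit)],
    "gmail": lambda to: "sent",
    "sheets": lambda summary: "appended",
}
answer, rounds = run_tools_agent(fake_llm, demo_tools, "Send me an OpenAI digest")
```

The batching is why two independent tools cost one API call in the demo: their results go back to the model together.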

---
*Source: https://ekstraktznaniy.ru/video/15564*