# Day 10: Notion Database Assistant, Email Scraper collab with @workfloows [Update #05]

## Metadata

- **Channel:** n8n
- **YouTube:** https://www.youtube.com/watch?v=ugikPbDF4-I
- **Date:** 11.09.2024
- **Duration:** 34:21
- **Views:** 4,049
- **Source:** https://ekstraktznaniy.ru/video/15613

## Description

I've made some major progress on my Notion Database Assistant—an AI-powered tool that lets you chat directly with your Notion database. Next, I'm teaming up with @workfloows to build EmailSpy—a tool that finds public email addresses for any domain.

0:00 - Intro & AI Sprint Day 10 Recap
A quick recap of the AI Sprint covering StarLens, EmailSpy, and the Notion Database Assistant (still needs a name!).

2:15 - Brainstorming with @workfloows
Collaborating with Oskar to brainstorm ideas and kick off our new project.

11:29 - Starting the Notion Database Assistant (Day 9)
Diving into the early stages of building the Notion Database Assistant on Day 9 of the sprint.

15:40 - Building Out the Assistant (Day 10)
Day 10 progress as we expand on the functionality of the Notion Database Assistant.

27:58 - Cloning the Assistant for a Different Database
Using a ReAct agent, we successfully clone the assistant for the Support team’s database schema—second time’s the charm!

31:49 - Adding 'Search Inside Page' to the Notion Assistant
Extending the assistant so it can also search inside database records, not just database properties.

## Transcript

### Intro & AI Sprint Day 10 Recap [0:00]

It's the end of day 10 of my AI Sprint, and there are a lot of exciting updates. First off, StarLens had a great launch on Product Hunt: we got over 180 upvotes, and we also got featured in Product Hunt's email, which was a really nice surprise. I checked the cache and there have been over 1,600 unique requests already at this point, so I'm very proud of the first project that came out of the AI Sprint.

I also got to meet with Oskar from Workfloows, and we had a great brainstorming session on what we should build as we collaborate. We're going to be building an email tool that takes a domain name and returns the public emails available for it. We're going to use AI for some of the fuzzy language things around that, and for some of the scraping use cases. Later in this video I've got a cut-down of us during our brainstorming session as we came up with that idea, and we should probably start on that next week.

There's been major progress on my Notion Database Assistant, which is the project I picked up after launching StarLens. At this point I've got a working version that works with a knowledge base in Notion, so I can ask it about individual records and it will search for and return those. It's got multiple tools: one for searching the database itself and one for searching within a page itself. It's handling the peculiarities of the Notion API. It might not hold up for 25,000 records when you're doing semantic search across those, but so far it's been great, and once I added the search-within-a-page tool I also got it to stop hallucinating.

But the real headline here, I think, is that I made a workflow that makes other workflows. I basically took that knowledge base workflow, copied its workflow JSON, fed that into a ReAct agent in n8n, and then also fed in the database schema of another database, and it worked. So today I posted on our n8n Slack about that and had a few folks interested. I cloned our support team's knowledge base, and on the second attempt to generate
it worked; on the first one it didn't. What I'm working on now is to improve the success rate of generation by having some validation on the workflow JSON that comes out, and if

### Brainstorming with @workfloows [2:15]

it fails, looping that back into the AI and having it try again. That's what I'll probably show in the next update. So for the rest of this video, check out some of the more detailed highlights from the last couple of days.

Hi, I'm Oskar. I run a YouTube channel called Workfloows, where I share tips, tricks, and tutorials about workflow automation, especially with n8n.

So I've been tracking some ideas in here, but I was wondering, before we jump into that, Oskar: is there anything that you've wanted to build that you maybe didn't have time for, that you'd like some support on, especially when it comes to AI use cases?

I actually have something prepared; if you want, I can share this too. I have several ideas, four of them: three are AI-based and the fourth one is quite a simple app, I would say. Idea number one is something that I was working on in my early days with n8n. Back then there were no advanced AI tools, and I think we can level this up with AI. What the tool is supposed to do is: you enter the domain you want to research, in this case workfloows.com, and in the output you should receive the list of emails that are publicly available for it. So this is super simple. I see the AI agent going through websites and search results and retrieving potential email addresses that can be used for purposes such as research. So yeah, that's idea number one.

Here's the number two option, which is an Emoji API. Sometimes, I think, it may be useful when you have plain text and you want to enrich it with emojis, or you want to automatically search for some kind of emoji and you don't know which one to use, so you can do it programmatically. I have it in the form of an API, but we could create some kind of UI for it. There are some endpoints like emoji-to-text, text-to-emoji, and so on, and for example searching for an emoji.

The third idea is something you already built something on top of: GitHub. Some apps, for example n8n, are releasing weekly or quite often, you know, and it's sometimes quite difficult to catch up on all the changes that have been made in a specific tool, a specific repository. So basically an automation that creates for you, on a weekly basis, some kind of brief of the changes that have been made in the repositories you're following. It could be super cool to pick up those changes and send you a weekly digest.

I think that's useful as well, because I could see that being very configurable, with settings to modify things like level of detail, or errors only. I really like this idea too because I think it has a lot of utility. There are whole products that do this already, and I love the examples where we can show how to recreate, in n8n, solutions that exist as a whole product. A lot of n8n users use different tools and packages and open-source products, so this is probably a real pain point as well. So I guess the question is, you've made it harder, Oskar, because you've got four great ideas here, and I think we should probably just focus on one of these four, because these are some great ideas, but I
think the hard problem now is which to pick. Oh yeah. So with that in mind, I can provide quite a lot of build support, because for the rest of the month I'm just focused on this full-time. What's your schedule like, how much time could you dedicate to collaborating on something? I've been recently quite tight on work, but I think doing something like an hour to two hours a day should be doable, and I think this project should not be super intense. Yeah, and I think also the goal here is that we're trying to create stuff that people could recreate themselves, or download these templates and tweak from there; if we make it too complicated, it'll be a little less accessible. What's your favorite? I would go with the first one, because I see the proper value for a lot of companies here, doing their research and so on, and it should be relatively quick to implement, especially with n8n. So this one, I would say, is my pick. The second one, some kind of emoji API or app, would also be cool. The other two are also great ideas, but with idea number four you need to create more extended infrastructure, because you need some kind of scripts to process those images, and when it comes to idea three, you already did something with GitHub. So I think going with number one or two is the sweet spot.

So for EmailSpy, let's talk through it quickly. I see that it could have a few different data sources for this information, maybe that's configurable. What were the data sources you were considering, before I bias you? My big one, the initial one, and most probably the one that will provide the highest number of results and data, is simply Google search results: going through the websites, picking the content from them, extracting email addresses. There are exceptions here, because in many cases email addresses may not be in an obvious format, like here. And stuff like that, exactly. So we have plenty of possibilities here, where both custom scripts and AI
can play a significant role. I think search results, and going one by one through the websites, is the most important. We can most probably use some kind of open databases, which we can connect with some custom scrapers, Puppeteer, wherever it's available, for those terms and so on, because we don't want to go to the shady side. I think going with the Google search results and maybe something additional would be cool, but I don't have the specific data sources in mind beside search results.

So how we could structure it is, it's modular. Maybe one of the tools, or sub-workflows, we'll figure that out, is the scraping of the sites via Google Search, right? And then we'll see how long that takes. We could potentially add another tool to it, so we can also show users how they could add other tools. Maybe someone pays for People Data Labs, which provides this kind of information; they'd want to check there first and then scrape additionally. And then what we could also build is: whatever sources you get results from, it deduplicates them, so if two of the sources come back with the same address, it only outputs it once. What I like about it is, if we build this as a backend, it's headless, and then with some workflows I could, in 10 minutes, create a form-trigger workflow for people to demo it easily, and we launch it with that as well: Slack bot, Teams bot, whatever. At that point I think we could ship it as a headless workflow, with maybe one or two example workflows on how to consume it, and then from there hopefully people build their own ones and kind of remix it their own way.

Yeah, totally. Basically, I think this is the perfect workflow to build in the form of some kind of request-response, API-style thing, because eventually most of this data will end up in some kind of spreadsheet or other database, you know, so people don't want to go through copy-pasting from some kind of
interface. Of course it's nice to have, but eventually going the API way is super handy. I think it decouples it; we see it in e-commerce, headless e-commerce is all the rage these days, because your front end, sometimes it's a mobile app, sometimes it's something else, so it stays flexible. But also with the advent of AI... nice mug, by the way, I like that. On the mug, that's Clippy, the unofficial mascot. It's unofficial, and the mascot. Back to the point. So with the advent of AI, right, and AI agents, and it being easier for people to build those themselves, I feel like building in a headless way also makes it future-proof, to be used by other AI agents if it's a RESTful endpoint. Multi-agentic workflows are something we're seeing, because even the n8n AI assistant that's on Cloud right now used a multi-agent approach, because we found that one agent solving all problems is a much harder thing to do, it's less accurate. So it's better to instead triage what the problem is and have specific agents for specific types of problems, and this could, in future, be used by something like that. Basically, do we see any reason not to build it in a headless way? Absolutely not, I think building it headless is a perfect approach. And then we could also do, once it's out... Oskar has a great tutorial on basically putting a little service on top of your n8n workflows to really make them a production API endpoint. Perhaps we can launch this as a service on Product Hunt, but I recommend we build it first, see how it's looking, and then we can get together and figure out how to launch it. How does that sound? Yeah, totally, I'm looking forward to this. Cool, have a great time building. And a few moments later...
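The extraction-and-deduplication step discussed above could be sketched roughly like this. This is a hypothetical illustration, not the actual EmailSpy implementation (which wasn't built yet in this video): the bracketed "[at]"/"[dot]" obfuscation patterns are assumptions about what the non-obvious formats might look like.

```python
import re

# Naive email regex; real-world extraction needs more care (TLDs, unicode).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def deobfuscate(text: str) -> str:
    """Rewrite clearly bracketed obfuscations like 'name [at] site [dot] com'."""
    text = re.sub(r"\s*(?:\[at\]|\(at\))\s*", "@", text, flags=re.I)
    text = re.sub(r"\s*(?:\[dot\]|\(dot\))\s*", ".", text, flags=re.I)
    return text

def extract_emails(pages: list[str]) -> list[str]:
    """Extract emails from several scraped sources, deduped case-insensitively."""
    seen: set[str] = set()
    result: list[str] = []
    for page in pages:
        for email in EMAIL_RE.findall(deobfuscate(page)):
            key = email.lower()
            if key not in seen:  # same address from two sources -> keep one
                seen.add(key)
                result.append(email)
    return result
```

In a headless n8n build, a function like `extract_emails` would sit behind the webhook as one modular tool, with each data source (Google results, open databases) feeding pages into the same deduplication step.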

### Starting the Notion Database Assistant (Day 9) [11:29]

I'm starting on my chat-with-Notion-database template here. What this is going to do is allow users to use the n8n chat trigger to chat with a Notion database, but not as a vector store: we're going to use an AI agent and teach it how to interact with a Notion database as a tool. One thing I want to do with this template is make it a bit more sophisticated, to where the user just has to type in the ID of the database when they're setting it up, and the workflow will dynamically fetch everything and create the schemas it needs in the AI agent, so users don't have to remap all that themselves.

A neat thing I'm doing is using ChatGPT to create a lot of the mock data when I'm testing things. For example, take a look at this: "I'm creating an AI assistant, propose some knowledge source types." So it was even giving me ideas on what this use case could be. I really like this automated knowledge base one; I think that's a very relatable use case, probably a lot of people have knowledge sources that they'd like to interact with via an AI agent. Then we went through the columns it should have, and from there basically fleshed that out to create my mock data in JSON, because I was going to first try to copy it into Notion, and I figured I'd do this with n8n because it'll be faster.

Where I'm at now: I've created my different tables in Notion, and I've connected to my Notion. When you're using n8n (mine's called Max Bot), it's already added, so it's here, but make sure it's added there. And then in my workflow, this is the workflow that I started building for my Notion database assistant, but here I'm just creating a quick little flow to load this data in. So I added a Set node; from there I double-clicked it, hit the edit button, and pasted in what I got from ChatGPT. You've got to make sure that you have objects in a top-level array; that's how ChatGPT happened to output this, but if it doesn't, you'll have to instruct ChatGPT to do that, or add these in yourselves.

So as you can see, I've got this all in there. I'll click on here, I'll add Notion, we want to add a database page, we'll select from list, it's called My Knowledge Source. For the title of the document, something to remember is that each Notion page needs to have a title by default, but you can rename that column; I've called it Question. So for the title I'm going to drag and drop the question. We've got 14 items coming in, so I'll expect to create 14 knowledge source entries in here. Then for the properties: for the answer we'll drag and drop that in there, and we'll add another property, and we'll do this for all of them. We'll add another property, and here we want to map the department. Now, department I have modeled as tags, so we don't have drift here, and this is a common case: by default there's a dropdown, right, so if I pick Operations I can't necessarily map to this. If I flip to Expressions, we can see that the tags are just in plain text, like this, so if I write Operations with a capital letter here, it will match to that tag. So what we could do is drag and drop this in here, and then we just have to double-check in my data that we have the same capitalization. Of course we could pre-process that if we wanted to, but since this is mock data I have a bit more flexibility. So it's capital HR and capital Operations and Product; that is how we have it in here, great, it will map to the department select type in Notion. In Notion we can see we're populating these, that's great; I'm seeing it all being added here, that's perfect. Now, I didn't add too much prompting to ChatGPT about being more consistent with my tags or anything, so that might be something we could improve, but since I'm building a template that should work with any schema, it doesn't really matter for this use case. So now we've successfully mapped those in, and we can build my actual workflow now that we have some representative mock data.
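The exact-capitalization issue described above (Notion select/multi-select options match on exact text) could be handled with a small pre-processing step instead of eyeballing the mock data. This is a sketch under assumptions: the option names `HR`/`Operations`/`Product` follow the demo database, and `normalize_tag` is a hypothetical helper, not part of the actual workflow.

```python
# Options exactly as defined on the Notion select property (demo database).
NOTION_OPTIONS = ["HR", "Operations", "Product"]

def normalize_tag(value: str, options: list[str] = NOTION_OPTIONS) -> str:
    """Map a free-text department value onto the exact Notion option name,
    ignoring case and surrounding whitespace."""
    cleaned = value.strip().lower()
    for option in options:
        if option.lower() == cleaned:
            return option
    raise ValueError(f"No matching Notion option for {value!r}")
```

Running mock records through something like this before the Notion node would remove the dependency on ChatGPT happening to emit the right capitalization.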

### Building Out the Assistant (Day 10) [15:40]

Six and a half hours later: I'm building out the AI assistant. You can see an example of it I just ran here: I asked "what is our VPN" and the workflow responded with the correct information and also the relevant pages. Through chat prompting, I was able to get it to always reference the pages it used to inform the answer, and if I click on these we can see those are valid links to the particular documents it used.

So how does it work? If I go in here, it starts off with a chat trigger. This node handles the UI interface, and it's also generating a session ID which gets fed into my AI agent step. Now, I haven't attached memory yet, but once I do, along with the session ID it will allow messages in the thread to be fed to the AI as a thread, because right now, if I ask another question, it runs another execution of this workflow; whatever gets output here is returned to the chat, and it's stateless, so if I send another message it won't know that this happened. To do that, from where my workflow is right now, I basically would just need to attach some memory and it would work out of the box, because I'm using a chat trigger. If you were using a different kind of trigger, for example a webhook, you would need to be sending in your own session ID, and as long as the data arrives here with a top-level session ID key, the AI agent will pick that up.

So what's happening in these two nodes? I'm using Edit Fields nodes, and they're basically just setting some static data. If I go in here, it's setting my database name, which I use later on. I often have a Set node like this after my trigger node, with certain variables, because it makes it easier if I want to swap my trigger node later, say to a webhook trigger, or if I also want to be able to run this workflow another way; you can have multiple trigger nodes in an n8n workflow. What happens now is, when I need to reference these variables from this node, let's say in here, I'm referencing the name of the database, and that's not the Notion name of it, it's the custom name that I set in my Set node. If I change my triggers now, this won't be affected. So that's just a good strategy for workflows: add some sort of variable node right at the beginning.

The next node I add here basically resets the structure going into the agent to what it was going in here, so it makes it look like this again, because if I go in here, the input data is from the previous node, right, which is setting that database name. So I'm basically just resetting that, and it's really nice with this drag-and-drop feature: I can take this, for example, drop it here, and it sets the key and the value. Then that information gets fed into my AI agent; let's open that up quickly. In here I have it set to a Tools Agent, because I am using tools: I'm using the "search notion database" tool which I created, an HTTP Request tool. And then for the text, we're using "Define below". We could in this case just use "Take from previous node automatically", because it's in the right structure going in, the same structure as from the chat trigger, but I was experimenting with adding some suffixes, even things like "I love you" at the end. It wasn't really working, I don't recommend it, but this is a way you could combine what the user is actually sending with some other strings.

For the system message, I actually used ChatGPT to help set this up. Essentially there are role, behavior, error handling, and output sections, and it's really the output where I was iterating. You can see at the end here: "do not output links twice, only in relevant pages section." I sort of slapped this in because it was showing the relevant links twice; since adding it, I haven't seen that. So I was iterating on this quite a bit. Then we have the chat model itself, nothing too crazy in here; I've just got a shorter timeout set right now, which I added from the options, because I was finding that if it was taking a long time it was failing and attempting to use my tool multiple times, so I set it lower to reduce how long I'm waiting and how many tokens I'm spending while configuring the HTTP Request tool.

Then I have the HTTP Request tool I added from the nodes panel. If I open this up, it's got a description: "use this tool to search the...", and again I'm referencing the name of that database. So far I've found it very helpful to use the exact same name, especially when it appears in two different contexts: if I'm referencing something in my tool, I try as much as possible to have the exact same name in my AI node, because at the end of the day these LLMs are finding weights between certain characters and text, right? I'm not an expert at this, but I assume it's better to use the exact same name. This is a POST request I need to make; here's the URL with the database ID. For a lot of this stuff I was using ChatGPT, just asking "hey, give me a cURL request" or "give me body parameters for such-and-such call". We're using "JSON below" for specifying this. Previously I was experimenting with "let model specify entire body"; it was working some of the time, actually. What I had done is let the model specify the entire body and piped in a lot of context on how the Notion API works. But this approach, specifying the body and then using placeholders for what the AI is allowed to modify, is better. So here "keyword" is in curly brackets, and I've defined it as a placeholder below; that's what tells the AI it can manipulate this part of the query. I found that specifying this and locking down what the AI can do leads to more predictable results.

You can see I've added two things for the AI to control. The keyword itself: this is affecting the Question property here, so that would be this one. Now, because Notion has a strict search, not a fuzzy search, I told the AI that it should think of this as a keyword: "searches questions of the record, use one keyword at a time." I changed this because it was searching for more specific phrases and getting very few matches; now it searches for something like "VPN", pulls in all the issues around VPN, and searches through those. I think this strategy is going to work when you're talking about dozens and hundreds of records; it might be a little less efficient for thousands. It can also search by tag; here are the options for the tag, and this is something I could automate, because obviously users can update these tags, so I want to keep that fresh. And the important thing here, the way Notion works: make sure this is an "or" filter, or at least be conscious of whether you're doing an "or" or an "and", because I had this set to "and" and was getting no results back.

Another thing I haven't done yet, but which could be useful: when you get a no-results query, copy that response from the output and perhaps, as context for your AI, describe in the tool description what it looks like, because my AI wasn't able to identify that it was getting a no-results response; it was treating it as a problem. It wasn't a problem, there were just no results for that specific query, which would be good context for the AI so that it tries something different, perhaps a synonym. So that's definitely something I could optimize.

Then what I did next, now that the workflow is working: if we go to the project here, I created a workflow that takes a different Notion database as input, or rather the URL for it, and outputs, using a ReAct agent, a new workflow that works with that new schema. So let's try this out. Copy this, go here. "Enter a Notion database URL"; you can see we even have a placeholder here. We'll copy this. Now, the nice thing is I set up some pretty quick error validation, so if I do this: "sorry, that doesn't look like a valid Notion database." If I send in a valid one, it starts generating, and while it does that, let me show you the workflow. After the chat trigger, the first thing we do is try to extract that Notion URL. I pipe the input in as an expression, and I had GPT generate a match function that filters out anything that isn't a valid Notion URL. So if the user types something like "here is my URL <link>", this would still work, it wouldn't fail, it would just extract the URL; but if they type in something else, nothing gets extracted and this node errors out, because that isn't a valid parameter. Because I've got "on error" set to "continue using error output", it routes to here, and in this node I make sure I carry along the same session ID, to return to the same session, and "output" is the key that the chat trigger expects: for whichever node executes last, if it has an "output" key, its value will be sent to the chat trigger. So I'm exploiting that very simply by having a static message here: "sorry, that doesn't look like a valid Notion URL." We could add a Switch node here with some conditional logic checking the error that comes out (you do get some key values on the type of error) and have multiple static messages for better feedback, but since I'm still in build mode, I'll do that later on.

Let's check the workflow. Great, it's generated, so let's copy that, go to a new workflow, save it, and paste it in. Okay, it's a valid workflow JSON. This new database is about companies, so let's ask: which companies are from Germany? And there we have it. So, no tricks: I ran this workflow, it took my original flow, it took the context of the new database schema, and it output a valid workflow JSON. If I go into the generate workflow: what's happening is I get the Notion database and strip it down a little, because it has too much information that's not relevant; it's going into an AI, so we want it to not be too heavy. I then have the workflow JSON from my previous workflow being output in a Code node; I'm doing it that way because the workflow itself is JSON, so I needed a way to not break it. Then in my AI agent I have a prompt: "modify this original workflow, here are the new Notion database details" (basically the schema I got from the first node, the Notion node), plus the original workflow; we add toJsonString here to stringify it. And then a pretty simple prompt: "you are an expert developer." There was some trouble with the JSON output, because it was outputting it with line breaks and such, so I actually pasted the broken JSON it was producing into ChatGPT, asked it to give me a prompt modification to fix that, and it's been working so far.

So yeah, we now have a workflow that creates other workflows. I'm not so confident it'll work for more complicated properties in a Notion table; that's the next thing to test, and then basically tweak the prompt or give it a bit more context. Then we can launch, basically, a generator that takes my template, takes your table, and outputs an already-made working workflow for you, where all you have to do is swap out the credentials. A few minutes later:

### Cloning the Assistant for a Different Database [27:58]

Our support team heard about the Notion database workflow, and they also heard that it can be cloned with AI. I just duplicated their support knowledge base. This is n8n's official support knowledge base that the support team is using; I duplicated it and added my Max Bot as a connection. What we can do now is copy this here and open it in a new tab. If I copy this and paste it in, the workflow is now scraping the schema: name, tags, updated by, all these different ones. Let's see if this works. It does not know, at least from my workflow, about this structure for checkboxes; maybe it can figure it out, and if not, I'll ask ChatGPT to output some context for each type of Notion database property and see if that improves the results. Exciting stuff. Jeez, if this works out of the box without any more tweaks I'll be super chuffed, and if not, I reckon a few more tweaks and we'll get it to work.

Okay, got the workflow, let's copy it. What I could do next is also have this bot create the workflow on the instance directly, because we have an n8n node in n8n that can create workflows, so it's getting very Inception. But let's create a new workflow, name it "Good Eggs Assistant", and paste it in. Okay, it's valid workflow JSON; let's just check this out real quick. Okay, that did not work: it did update the tags correctly, but not the JSON object. Let's give it one more go. It looks like we've got a bit more work to do, well, at least for it to work automatically for every single type of Notion database, which I think is quite a feat, and if I can get that done by end of day I'm going to be thoroughly impressed.

It's crazy to me, you know: this is being represented by these three dots, but what's happening under the hood while it's running always kind of humbles me. I've caught myself sometimes, when it's taking 15 seconds, thinking "jeez, it's taking so long", and then I realize: wait a second, it's automatically generating a workflow JSON that can automate a process for you. Copy this, put it in here, delete that, paste it in, and run it. "How do I set up LDAP?", because we saw LDAP is in the support base, it's in the name. Let's see. Yeah, it worked, awesome. So we need to figure out the reliability of that a little bit. What I could try is adding some validation checks: we saw in the first example it was just outputting "body", so the first thing I'm going to try is creating some checks that catch that and basically reject it. We'll try doing it outside the agent first so it's deterministic, checking for that part of the workflow, and we'll rerun the agent until it works, because we saw that it does work. That's going to be the first approach I try. But again: within 24 hours I built a workflow that can understand a Notion database, and another workflow that can take a different Notion database and customize that workflow to work with it, and I've got that working right now 50 to 70% of the time. So let's increase that number. Great progress within one day, I reckon. A few inches later:
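The deterministic validate-and-retry idea described above could look roughly like this. It's a sketch under assumptions: `generate` stands in for the ReAct-agent call (hypothetical here), and the required keys `nodes` and `connections` are the top-level fields an n8n workflow JSON carries.

```python
import json

# Top-level keys an importable n8n workflow JSON is expected to have.
REQUIRED_KEYS = {"nodes", "connections"}

def valid_workflow(text: str) -> bool:
    """Deterministic check, done outside the agent: does the generated
    text parse as JSON and have the expected workflow shape?"""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return False
    # Rejects e.g. an output that is only a {"body": ...} fragment.
    return isinstance(doc, dict) and REQUIRED_KEYS <= doc.keys()

def generate_with_retry(generate, max_attempts: int = 3) -> dict:
    """Rerun the agent until its output passes validation."""
    for _ in range(max_attempts):
        text = generate()
        if valid_workflow(text):
            return json.loads(text)
    raise RuntimeError("agent failed to produce a valid workflow JSON")
```

Since the generation already succeeds 50-70% of the time, even two or three retries behind a check like this should push the end-to-end success rate much higher.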

### Adding 'Search Inside Page' to the Notion Assistant [31:49]

I've updated the workflow to now also be able to search inside database records, because the support team's records mostly had the content, the how-to stuff, the useful things, inside the page, not as properties in the Notion database. Now the agent can search across the Notion database by things like tags and names, so we can do that rather efficiently and not have to download every single page in the database; then, for the ones it thinks are relevant, for example when the user asks to summarize something, it can fetch that page itself and pull the block content from it in Notion. I haven't even really formatted it much, just stripped it down a little inside the HTTP Request tool. So here I'm optimizing the response: I asked ChatGPT what the response from Notion looks like; "results" is the relevant part, so I popped that in there, and then I asked GPT which fields might be relevant. What this is doing is: Notion sends back a heavy payload, and this strips it before it touches the LLM. It's a more efficient way to pipe data into the LLM, because it's sending only the more relevant information, and I could probably optimize it further, but I'm moving quickly, building in public, and using LLMs to build LLM-powered things.

So let's give it a go: "how do I set up LDAP, summarize it." And now, we'll see this in the preview on the right. Yeah, it called multiple tools. If I look at the tools, we can see that it searched the Notion database first, for "LDAP" and tags, and since this is using an "or" it came up with anything matching; there were some responses there, looks like more than one. Then the model ran again and decided to search inside a database record, so it used the second tool, which is why it had the context to summarize this information. If we open it up, we can see this is from the support knowledge database. Twelve seconds later: so that's the end of this update, but I'm on Twitter and LinkedIn every day, giving updates throughout the day as I continue to sprint. I'll catch you
tomorrow, on the next day of the AI Sprint.
