# Remove Duplicates Node Update: Remove Duplicates from Previous Runs!

## Metadata

- **Channel:** n8n
- **YouTube:** https://www.youtube.com/watch?v=zXhuQj_YJ5I
- **Date:** 19.11.2024
- **Duration:** 6:01
- **Views:** 4,820
- **Source:** https://ekstraktznaniy.ru/video/15555

## Description

Our Remove Duplicates Node now supports filtering data from previous executions, letting you process only new items in your data stream—no more need for home-rolled solutions using external databases or spreadsheets!

Learn more: https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.removeduplicates/

00:00 - Intro: Poll-based approach
00:48 - New Remove Duplicates operation
02:57 - Configuring options
03:40 - Running the workflow
04:46 - Other deduplication options

#Automation #n8n #lowcode #nocode

## Transcript

### Intro: Poll-based approach [0:00]

For many automation scenarios, answering the question "have we seen this item of data before?" is imperative to implementing business logic. A classic example is when you don't have any event-based trigger, so there is no "on new contact created" action. Instead, you typically take a poll-based approach: every so often, say every hour, you fetch a collection of records (with a "get many" operation, for example) and then handle deduplication, which is where you identify whether you've seen and processed an item before, or whether it's unique and new. That's been possible to do in n8n for years, but the deduplication part had to be home-rolled; that is, you had to bring your own data store, for example a Postgres database, and implement that logic yourself. Thankfully, the n8n team has made that much easier with
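To make the "home-rolled" approach concrete, here is a minimal Python sketch of what you previously had to build yourself: persist the IDs you have processed and filter each polled batch against them. The store format (`seen.json`) and function names are hypothetical, chosen just for illustration.

```python
import json
import os

STORE = "seen.json"  # hypothetical local stand-in for an external database


def load_seen() -> set:
    """Load the set of previously processed IDs, if any."""
    if os.path.exists(STORE):
        with open(STORE) as f:
            return set(json.load(f))
    return set()


def filter_new(records: list[dict], seen: set) -> list[dict]:
    """Keep only records whose 'id' has not been processed before,
    then persist the updated history."""
    new = [r for r in records if r["id"] not in seen]
    seen.update(r["id"] for r in new)
    with open(STORE, "w") as f:
        json.dump(sorted(seen), f)
    return new
```

The new node removes the need for this boilerplate, but the underlying idea is the same: a persistent set of seen values, checked on every run.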

### New Remove Duplicates operation [0:48]

the new Remove Duplicates node, which, in short, handles the majority of deduplication use cases in an automation context and has some related actions that help you identify whether an item is unique or whether you've seen it before. Let's take a look.

In this scenario I'm fetching articles from Hacker News. If I run this workflow, I can see that I'm outputting 10 news articles, because I've got a limit set for now. In my use case I never want to double-process an item: if this workflow has seen an item before, I don't want to see it or process it again. To do that, we can click the plus button here, which opens the nodes panel, and in the data transformation section we can find the new Remove Duplicates functionality. In here we're going to choose the "Remove Items Processed in Previous Executions" operation; there are a few other actions available, and we'll go over those in a moment. I'll click to add that.

Here's how this node works. There's the operation, which we just picked, and "Keep Items Where", which is where we decide under what conditions we keep or filter out the items coming into this node. For this case we want to make sure we never run an item twice, so we're going to go with the default, "Value Is New". There are a few other options, for example "Value Is Higher than Any Previous Value". This is super helpful if you have an incrementing database ID: it might be that you've seen item 20 before but not items 21 and 22, and if item 25 comes in, you know that's the latest item, because you have chronological IDs. And there's one that works with date values.

The way this functionality works is that for each incoming item you define the value to dedupe on. That value gets saved to a database under the hood in n8n, and each time the node sees a new value to dedupe on, it checks against this database to know whether it's seen it before. So an ID, a unique ID, a UUID: these are the types of values you would typically put in here, though you could also combine multiple values. Let's collapse some of this other information and look in this Hacker News article for some sort of unique ID. Here, "objectID" is that unique ID, so I can just drag and drop it on here. This sets an n8n expression, and for each incoming item (there are 10 items coming in) it'll set the ID there. There are also a few different options
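The comparison modes described above can be sketched as small predicates over a persistent state. This is only a conceptual model of the node's behavior, assuming an in-memory store; n8n persists this state for you under the hood.

```python
def is_new(value, seen: set) -> bool:
    """'Value Is New': keep only values never seen before."""
    if value in seen:
        return False
    seen.add(value)
    return True


def is_higher(value, state: dict) -> bool:
    """'Value Is Higher than Any Previous Value':
    useful for incrementing IDs, where anything above the
    highest ID seen so far must be new."""
    if value > state.get("max", float("-inf")):
        state["max"] = value
        return True
    return False
```

For example, with chronological IDs, `is_higher` would keep 25 after having seen 20, without needing to remember every individual ID in between.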

### Configuring options [2:57]

available. One of the main ones I'm going to call out is the scope. Right now the default scope is at the node level, which means this specific node maintains its own data store of what it's seen before; if I have another Remove Duplicates node in the workflow, the two will be disconnected and won't share a source of truth. We can also set workflow-level deduplication, which is what I'm going to do in this case, because at my workflow level I only need one dedup step in my flow, so I'm going to set the scope to "Workflow". There's also a history size option, whose default is 10,000. This means the data store under the hood will keep up to 10,000 items before you need to clear them, and there is an action to clear the deduplication history. Okay, so I've got my options set and my value to dedupe on;
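A rough mental model of the scope and history-size options: each scope key (per node, or per workflow) gets its own bounded store of seen values, and once the cap is hit the oldest entries are dropped. This is a sketch of the idea only; the class and its behavior on eviction are assumptions, not n8n's actual implementation.

```python
from collections import OrderedDict


class DedupeStore:
    """One bounded seen-value store per scope key
    (e.g. "node:<id>" or "workflow:<id>")."""

    def __init__(self, history_size: int = 10_000):
        self.history_size = history_size
        self.scopes: dict[str, OrderedDict] = {}

    def check_and_record(self, scope: str, value) -> bool:
        """Return True if value is new within this scope, recording it."""
        store = self.scopes.setdefault(scope, OrderedDict())
        if value in store:
            return False
        store[value] = True
        if len(store) > self.history_size:
            store.popitem(last=False)  # evict the oldest entry
        return True

    def clear(self, scope: str):
        """Mirrors the 'clear deduplication history' action."""
        self.scopes.pop(scope, None)
```

Note what node-level scoping implies here: the same value checked under two different scope keys is treated as new in each, which is exactly why two node-scoped Remove Duplicates nodes don't see each other's history.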

### Running the workflow [3:40]

let's run this workflow. As expected, I have 10 items going out the kept branch and zero going out the discarded branch, because these are all new items I've never seen before. We can look at the canvas and see those 10 items on the kept branch. Now I'm just going to add a No Operation node, simply to help visualize this, and run it again. Since we're running this a second time, those 10 items are now all in the discarded output, and that makes sense: the first run added those object IDs to our data store, so on the second run we'd already seen them. We can test that by going into the Hacker News node and increasing the limit; let's actually fetch 15 articles. Now when I run Remove Duplicates, only five items come out, because the 10 I'd already seen got discarded and the five new ones are kept. That's how we handle a very classic deduplication case. There are a few other actions and modes in the node, but they all work on the same premise and concepts I just showed you, so let me quickly call those out. We've got "Remove Items Processed in Previous Executions", the one we just did. Then there's remove items repeated within current
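The three runs just described can be traced with a simple seen-set: run 1 (limit 10) keeps all 10, run 2 discards all 10, and run 3 (limit raised to 15) keeps only the 5 unseen items. The integer IDs here are stand-ins for the Hacker News `objectID` values.

```python
def run(items, seen):
    """Split a batch into kept (new) and discarded (already seen),
    then record the kept items."""
    kept = [i for i in items if i not in seen]
    discarded = [i for i in items if i in seen]
    seen.update(kept)
    return kept, discarded


seen = set()
articles = list(range(15))                   # stand-in article IDs

kept1, disc1 = run(articles[:10], seen)      # run 1: limit 10
kept2, disc2 = run(articles[:10], seen)      # run 2: same 10 again
kept3, disc3 = run(articles, seen)           # run 3: limit raised to 15
```

The counts match the video: (10 kept, 0 discarded), then (0, 10), then (5, 10).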

### Other deduplication options [4:46]

input, which is very similar, but what it's basically doing is taking all the input items and making sure there are no duplicates among them. That could be really useful if you're, say, ingesting a database that did not have a good deduplication strategy and does contain duplicates. When comparing, you can compare all fields, so an item has to be an exact match to count as a duplicate, or you can exclude certain fields, or compare only by specific fields. Here, if I just wanted to compare by title, I could drag that in. You'll notice that we haven't mapped an expression here; the node just expects the name of the key, but n8n handles that under the hood for you, so you can always drag and drop and n8n will pick the right format to insert. Now, in the rest of the workflow, you know that you have unique items you've never seen before; you can assume that in your logic and continue doing whatever you need with them: loading them somewhere, updating them, etc. This is a new node in n8n, and there's definitely more functionality to be added, which happens based on your feedback. So once you get a chance to check out the node, if you have any product requests or feedback for the n8n team, head to community.n8n.io. Happy flowgramming!
