How to Scrape Any Website in Make.com (2026)


Nick Saraev · 26.02.2024 · 196,299 views · 4,511 likes


Video description
Update: Redfin has since added anti-scraping protections that invalidate the specific method I showed in this video. Instead, scrape it using a service like Firecrawl.dev, Apify, or Browserless. The HTTP request process I showed still works for the majority of static sites, so don't let that discourage you.

WATCH ME BUILD MY $300K/mo BUSINESS LIVE WITH DAILY VIDEOS ⤵️ https://www.youtube.com/@nicksaraevdaily

GET THE FREE MAKE.COM BLUEPRINT 🙏 https://leftclicker.gumroad.com/l/zcolpr

JOIN MY AUTOMATION COMMUNITY & GET YOUR FIRST CUSTOMER, GUARANTEED 👑 https://www.skool.com/makerschool/about?ref=e525fc95e7c346999dcec8e0e870e55d

SUMMARY ⤵️
In this video, I'll show you how to scrape any website using Make.com and AI! First, I'll demonstrate how to gather data from virtually any source and transform it into structured information that you can use for various purposes using AI. You can customize outbound emails, build simple parasite SEO campaigns, etc., in minutes. Then, I'll show you how to scrape a large multinational data source like Redfin. We'll build custom parsers, apply sneaky headers to avoid being detected, and I'll show you how to dump the data to a Google Sheet for later usage!

MY SOFTWARE, TOOLS, & DEALS (some of these give me kickbacks—thank you!)
🚀 INSTANTLY: https://link.nicksaraev.com/instantly-short
📧 ANYMAIL FINDER: https://link.nicksaraev.com/amf-short
👻 PHANTOMBUSTER: https://link.nicksaraev.com/pb-short
✅ CLICKUP: https://link.nicksaraev.com/clickup-short
📈 RIZE: https://link.nicksaraev.com/rize-short (use promo code NICK for an additional 25% off)

WHAT TO WATCH NEXT 🍿
HOW I HIT $25K/MO SELLING AUTOMATION: https://youtube.com/watch?v=T7qAiuWDwLw
MY $21K/MO MAKE.COM PROPOSAL SYSTEM: https://youtube.com/watch?v=UVLeX600irk
GENERATE CONTENT AUTOMATICALLY WITH AI: https://youtube.com/watch?v=P2Y_DVW1TSQ

FOLLOW ME
✍🏻 My content writing agency: https://1secondcopy.com
🦾 My automation agency: https://leftclick.ai
🕊️ My Twitter/X: https://twitter.com/nicksaraev
🤙 My blog (followed by the founder of HubSpot!): https://nicksaraev.com

WHY ME?
If this is your first watch—hi, I'm Nick! TLDR: I spent five years building automated businesses with Make.com (most notably 1SecondCopy, a content company that hit 7 figures). Today a lot of people talk about automation, but I've noticed that very few have practical, real-world success making money with it. So this channel is me chiming in and showing you what *real* systems that make *real* revenue look like! Hopefully I can help you improve your business, and in doing so, the rest of your life :-) Please like, subscribe, and leave me a comment if you have a specific request! Thanks.

Timestamps
00:00 Introduction
01:19 What we'll be covering
03:21 Scraping a website with the HTTP module
07:03 Text processing & turning HTML into plaintext for AI
08:37 Using AI to summarize a scraped website
13:37 Building a Google Sheets integration
16:08 Scraping a structured data source (Redfin)
23:13 Using Regex to parse Redfin
34:17 Advanced Make.com scraping—two HTTP calls in one
37:15 Using cookies in request headers
46:27 Parsing & splitting HTTP call results
55:05 Testing Google Sheets integration on real data
1:07:40 Final test of Redfin flow from end to end

Table of contents (13 segments)

Introduction

Hey guys, welcome to another video in our course on Make.com, titled "Make.com for People Who Want to Make Real Money." In this video I'm going to dive deep into one of the most popularly requested features of Make, judging by the number of comments I get about it every day: how to scrape any website you want with Make.com, pull information, structure it, and then use it in basically any other flow. I've seen this used for everything from email personalization to scraping competitor websites; you can pull a bunch of info off e-commerce platforms and use it for your own product research. There are a million and one use cases for what I'm about to show you, and the systems I'm going to build in the next 20 or 30 minutes are systems I actually go out and build for clients who pay me usually between $2,000 and $5,000 a pop, sometimes more, sometimes a little less. I want you to know these are basically production-ready systems that you can start deploying literally today, the second you finish watching the video. You don't need crazy programming or scripting skills; anybody with a reasonable level of Make.com experience, certainly everything you've learned up to this point, can do what I'm about to show you. So if that sounds like something you're interested in, stay tuned and let's get into it. Now, one of the reasons why I'm so

What we'll be covering

excited for this video is because this was personally one of those mindset shifts with Make where, once I figured it out, everything else in website design, business, and obviously building systems got a million and one times easier as a result. So what I'm going to do here is start by scraping a random website, just one of my own, and then look at ways we could structure that data: what we get when we do the scrape, and various tools and techniques to turn that data into something usable. Then I'm going to scrape a more listing-based website. I'll scrape Redfin, which is basically a big real estate compendium, and we're going to look at how to turn Redfin into usable data, as well as a couple of sneaky ways you can get around things like rate limits using what are called hidden APIs. This is super in demand, tons of people are using it right now, and it's basically how you get through to websites like Redfin or Uber, all of these historically unscrapable websites. Then, if I have the time, I'll scrape some other Shopify website or something; this is just one I pulled up when I was searching, so we can use this one or maybe another one. By the end of it, you'll know everything you need to know about scraping. You'll be better than 99% of the rest of the world at scraping sites, and you don't even really need to know HTML or anything like that, because we're going to use AI to help us along with the process. So without further ado, let's get into it. This is just a little flow that I built out previously that I'm going to discard, and I'll build a new one here and call it "how to scrape websites with Make.com." I'm calling it this, as opposed to something more functional like I usually do, because I'm going to be building this, then exporting the blueprint, then rebuilding it, a couple of times depending on the website. In order to scrape any website with Make.com, you need to use the request module. We've used the request module before in a previous video, but what matters here is to go down to HTTP. So if you don't have this,

Scraping a website with the HTTP module

just search for HTTP. I've already added it to my little favorites list here, so I have it down there; then click "Make a request." Remember, back when you were using API calls and that sort of thing, we used the exact same module. So I'm going to unlink this, drag it over here, and boom, I've got my HTTP Make-a-request module. I'm going to name it something like "get leftclick.ai," assuming we're building a scraper whose sole purpose is just to ping this website and maybe look for updates or something like that. So we have a request module called "get leftclick.ai." The URL we're going to call is simple: all we do is paste in the website URL of the place we want to scrape. The method is GET; we're just performing a GET request to that website. You don't need to worry about the body type, or about "parse response" because we'll handle that in a second, and you don't need to worry about any of these advanced settings. Simplest module ever. Let's run it and see what happens. We received a 200 status code, which is positive; that means the request went through. Sometimes, on particularly high-volume websites, when you call with Make.com as the originator or source of the request, the site will just cancel you out, because it wants to be safe and wants you to be a proper user browsing normally, not an automated bot system. But you can see what we ended up getting as a result: this huge wall of HTML, and it's pretty long. Scrolling all the way down to the bottom, you see the file size is 62,420 bytes, I think. Obviously that can be pretty intimidating, so how do we actually get useful data out of it? First of all, what is that data? It's just the HTML, the CSS, and the JavaScript used to build this page. When you GET a web resource, you're getting all of that code. When your browser gets a resource, like when you open up Chrome and type google.com, it's getting the exact same thing; it just has what's called a rendering engine, so it can turn that code into the pretty elements you see on the page. For instance, if I go back and look at this data, this huge long string, and find something like "automate your business," if I Command-F that, you'll see that all of the text on the website is actually present in the request bundle we received from leftclick.ai when we did the GET request. It's just wrapped in a bunch of formatting, spans, divs, and all that, which isn't hyper-relevant unless you're a web developer, so don't worry too much about it. What I mean to say is that we've just pulled all of the contents of the website, and the reason it looks different in your browser is that the browser has a rendering engine whose sole job is to turn that code into the pretty images you see on the screen. Okay, great. Now that we understand at least some of those basics, how do we turn this into something usable? Certainly, if you tried to use this data for something, you'd have to parse the hell out of it. Even if we just fed this whole string into AI, it would probably do a reasonable job, but it's so long that it would cost a lot of money; not something we want to get in the habit of doing. What you need to do is go down to the text parser, where there are a bunch of transformers that can take HTML as input and turn it into some type of raw text. I'm going to use the simplest example, by far the simplest example ever; this probably isn't what you actually want to do in practice, and we'll build in complexity as we get further along in the video. All we're going to do here is take all of the HTML that we just received, all of this stuff, and then
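Under the hood, the Make.com HTTP "Make a request" module is just issuing a plain GET. A minimal Python sketch of the same request, under the assumption that the URL and User-Agent value are illustrative placeholders (Make.com fills all of this in for you through the module's form):

```python
from urllib.request import Request

# Build the same kind of GET request the Make.com HTTP module sends.
# The URL and header value here are illustrative placeholders.
req = Request(
    "https://leftclick.ai",
    headers={"User-Agent": "Mozilla/5.0"},  # present yourself as a normal browser
    method="GET",
)

print(req.get_method())               # GET
print(req.get_header("User-agent"))   # Mozilla/5.0
# Actually sending it would be: urllib.request.urlopen(req)
# -> response.status == 200 on success, like the 200 seen in the video
```

The point is just that there is nothing magic about the module: one URL, one method, optional headers, and a status code plus a body coming back.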

Text processing & turning HTML into plaintext for AI

we're just going to strip away anything that's not text. What do I mean by that? Well, if you look at the HTML here, there's something like span class="inline h6 mb4," and inside it, it says "automate your business." What we're going to do is keep only the "automate your business" part, not all the stuff that wraps around it, and that's what the HTML-to-text module does. I'll click OK. You'll see I get a warning because I'm now using a transformer as the last step in a flow and Make doesn't really like that, so I'll click "ignore warnings" and run anyway. If we click on the output and then on text, you'll see that all of that HTML, or at least the vast majority of it, has been stripped away, and we're left with a much shorter string: instead of 62,420 bytes, it's probably around 5,000, so maybe a tenth or a fifteenth of the length, and it's just text. At this point you could already take this text and feed it into AI and have it, say, tell you something about the website. That's what I'm going to do here, because I'm going to pretend we're not getting leftclick.ai specifically; we just have a list of websites in Google Sheets, and for every row we're pinging a website, getting a bunch of information about it, then calling AI and telling it to tell us something about that website. That's a pretty common use case, usually in the run-up to an email campaign or some type of value-add like that. So, pretending that we're doing that, I'll go down to Add, then to OpenAI; specifically I'm going to be using
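What the HTML-to-text transformer does can be sketched with Python's built-in HTMLParser: keep the text nodes, drop every tag (plus script and style bodies). This is a simplified stand-in for Make.com's module, not its actual implementation:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only text content, discarding tags plus <script>/<style> bodies."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

# The span example from the video: only the inner text survives.
print(html_to_text('<span class="inline h6 mb4">Automate your business</span>'))
# -> Automate your business
```

That is the whole trick behind the 62,420-byte page collapsing to a few thousand bytes of plain text.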

Using AI to summarize a scraped website

the completions endpoint, which hopefully we understand well by now. For the model, let's use GPT-4. For the system message, let's say "You're a helpful, intelligent..."; what exactly is this doing? Let's say "web scraping assistant." For the user message, I'll keep it super simple, nothing fancy: "Tell me about this website. Use the JSON format below." Let's have it output an "about" field, which is just a description of what this website is. We should get some other interesting information too: let's do a one-line icebreaker, and maybe we'll look for the year updated as well, because usually on websites there's a little snippet down at the very bottom that says something like "copyright 2023" or "copyright 2024." As you can see, I haven't actually updated my website since 2023, so I should probably get on that, because it's now 2024. You may be watching this in 2027, in which case head over to my website and confirm, and if I haven't done that, yell at me. All we're doing is saying "Tell me about this website, use the JSON format below," and having it give us an "about" (a brief description of what the website is), a one-line icebreaker that I'm going to use in sales (you can think of that as the first line of an email), and the year updated. Let me make sure the formatting of this is correct; I think I added an extra double quote there. Yeah, I did. Okay, great. Then we're going to feed in one final user prompt, which is just the text output, and I'll say "using the plain-text scrape below" and "use the following JSON format." Let's give this a quick run. We scrape the website with all the code, parse that to plain text, and now we're calling the OpenAI endpoint with that long string. It's taking a second because we're feeding it a fair amount of data. Let's go down to choices, message, content. We got an "about" that says the website is for a company called LeftClick that provides automations and growth systems for B2B businesses; LeftClick operates on a subscription model, and offerings include lead gen, blah blah blah; a founder named Nick (that's me) seems heavily involved in both operations and client interactions; the website also features positive client testimonials. For the one-line icebreaker it said "LeftClick is making B2B business growth hands-off with customized and automated business operation services." So it didn't really understand what I wanted there; what I wanted was something I could put into an email. But consider that we didn't train this at all, we didn't provide any examples whatsoever, and it still did a pretty good job at creating a one-line icebreaker. So why don't I try to one-shot this: let's rename the field "customized one-line email intro," and say "write a simple one-line email introduction saying something like: I noticed your website was about XYZ." That should work. I'm also going to go down to Add, type JSON, grab the Parse JSON module, and stick that at the very end; then I'll feed in this choices string, because I want to force the output into a structure I can use. So: we're getting the website again (received a 200), parsing that into the same string we had before, feeding that into OpenAI GPT-4, and then parsing the JSON. What we get is a bundle of information with an "about," which is pretty long, and then that customized one-line email intro: "I noticed your website provides services to automate businesses and systems and scaling towards eight figures." Not the best email intro in the whole wide world, but you can already see how this would be useful. We haven't even spent five minutes building this flow or writing any sort of real prompt. If we had the time to put a bunch of examples in place, or had a specific email sequence we wanted to fit this into, you can imagine how, for the cost of maybe a couple of cents in tokens (we did feed in quite a lot of data), you could automatically customize the hell out of some big email outreach sequence. So it starts to get pretty valuable. Anyway, that is the simplest scraping setup you could ever imagine. For instance, what you could do with this information is add a Google Sheets module at
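The "use the JSON format below" trick works because you hand the model a skeleton and then parse whatever comes back, which is exactly what the Parse JSON module does at the end of the flow. A hedged sketch: the field names mirror the ones used in the video, but `model_reply` is a canned stand-in for the real `choices[].message.content` string, since nothing here actually calls the OpenAI API:

```python
import json

# Prompt skeleton, mirroring the system + user messages set up in the module.
system = "You are a helpful, intelligent web scraping assistant."
user = (
    "Tell me about this website. Use the JSON format below.\n"
    '{"about": "", "customized_one_line_email_intro": "", "year_updated": ""}'
)

# Stand-in for what would come back in choices[0].message.content.
model_reply = (
    '{"about": "LeftClick builds automations and growth systems for B2B businesses.",'
    ' "customized_one_line_email_intro": "I noticed your website was about automation.",'
    ' "year_updated": "2023"}'
)

# Equivalent of Make.com's Parse JSON module: string in, structured data out.
data = json.loads(model_reply)
print(data["year_updated"])  # 2023
```

Once parsed, each field is individually mappable into later modules, which is why forcing JSON output beats free-form text.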

Building a Google Sheets integration

the end. Maybe instead of just sending an email, you want to add a row to a pre-existing spreadsheet with a website URL column, an "about" column, a one-line email icebreaker column, and a year-updated column. You can take that information and hand it off to your sales team, or somebody else in the company responsible for business development, and you can do all of this in five minutes, which is what makes it so valuable. Okay, I'm going to save this and call it "scrape websites with make.com 1," and the reason I'm doing that is because I want to save this blueprint for all of you later, so I'm going to export this blueprint. That looks pretty good. Now I want to take this in a more structured direction. Now that we know how to scrape one simple web page using an HTML-to-text parser, I want to go a little deeper and look at how to build a scraper that automatically scrapes product listings on a website like Redfin. This is an extremely common use case: a lot of people want to constantly ping a data resource and use it to fill up a spreadsheet that somebody else on their team manages, or can use for lead gen. In my specific case, I actually built a Redfin scraper for a client, which is why I wanted to show you this; I wanted it to be as relevant as humanly possible to ways you can make money. They want to buy up properties that are below a certain price per square foot, and Redfin allows you (well, they don't technically allow you, but if you're sneaky you can) to ping Redfin, get a list of URLs, and then ping every one of those URLs separately to get a bunch of data about each specific web page. The way this is going to work is it's going to use a very sneaky principle of web development that not too many people understand, called hidden APIs. I'm going to look at how to retrieve data from a hidden API, which is like a regular API except you're not really supposed to be able to use it, and then I'm going to show you how to transform that into a format you can use to populate a spreadsheet or something like that. First things first, I'm going to clone this puppy and call it "how to scrape websites with make.com 2," so I have the same flow I had earlier. What I'm going to do now is delete all of this, and delete this too, and
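The Google Sheets "add a row" step maps one bundle of parsed fields to one spreadsheet row. Outside Make.com, the same shape is just an append to tabular storage; a sketch using an in-memory CSV as a stand-in for the sheet (the column names come from the video, the row values are invented):

```python
import csv
import io

# Columns described in the video: website URL, about, icebreaker, year updated.
FIELDS = ["website_url", "about", "icebreaker", "year_updated"]

buf = io.StringIO()  # stand-in for the Google Sheet
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerow({
    "website_url": "https://leftclick.ai",
    "about": "Automations and growth systems for B2B businesses.",
    "icebreaker": "I noticed your website was about automation.",
    "year_updated": "2023",
})

print(buf.getvalue().splitlines()[0])  # the header row
```

The real module targets a spreadsheet and worksheet by ID and maps module outputs to columns, but the one-bundle-one-row shape is identical.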

Scraping a structured data source (Redfin)

then I'm just going to check out my data source. I basically want to go on redfin.com and pretend we're scraping data for San Francisco. On redfin.com, when you type in San Francisco and press the little search icon, you'll see the URL at the top change: now it says redfin.com/city/17151/CA/San-Francisco. Understandably, they probably have some data structure in their backend where, when you call this URL, the server receives the request and parses out: okay, we're looking for city 17151 in California with the title San Francisco. It then, just guessing here, calls the database with that information, receives some list ordered by recency, pulls it, and displays it on the web page. This is pretty basic; you don't need to be a developer to understand that part, it's just how the servers work under the hood. What you end up with is a couple of views: a map view on the left and this listing view on the right, and to me, if I were building a Redfin scraper, the stuff on the right is what I'd care about. You see there's a bunch of summary information: you get the price of the listing (wow, 1.65 million, Jesus, and I thought Vancouver homes were pricey), six beds, three baths, 2,920 sq ft, and so on. Then, presumably, if you click into the listing, you get way more information: an "about this home," how many days it's been on Redfin, what type of home it is, when it was built (it's like 18 years old, Jesus). And then I'm going to stop being surprised at San Francisco real estate prices. If you go back to the URL, you'll see we now have another URL pattern: it's redfin.com/CA/San-Francisco/ followed by the actual address, then /home/ and some number. So presumably what we're going to do is split this into two parts: the first part calls just this general San Francisco page and receives a list of links; then we call every one of those links individually so we can get all of that good, tasty data like what we have down here. We'll then parse that data and pull out a couple of specific fields we care about. I care about the price, so I'll parse the price; maybe I care about the type (multi-family or something); maybe I care about the square footage. I'll keep it really simple and only do one or two fields so this video isn't an hour long. Then I'll take that data and dump it into a Google Sheet that I'll set up for the purpose. That's what the flow is going to look like, and you can imagine how you'll be able to massage this flow into literally any use case you could ever want: very high-ROI stuff. I'm probably going to make a couple of mistakes while doing this; it's the first time I've done something like this in quite a while, but I want this to be as informative as possible and to show you what somebody who builds these sorts of things for a living is thinking about and looking for as they work. (This is a cup one of my clients just gave me, by the way; it's a Yeti tumbler. Beautiful.) Okay, first things first, let's call this resource and see what happens. I just pasted in that San Francisco link and I'm going to call it. You'll see this request took a little while. It returned a 200 status, which is nice, but it still took a while, which tells me the response is probably pretty big. And yeah, it is really big; I'm scrolling all the way down here. Good God, how big is this resource? 2,685,718 bytes, and you can see how it even had to truncate that a little, so the actual amount of data is a bit longer. If you remember back when I scraped my own website, I was at about 60,000 bytes if I'm not misremembering, which is something like a fortieth of the length of this page. The first thing that tells me is that it's probably going to be unreasonably long if we just use the text-parse method I was using before. I'm going to do HTML-to-text anyway and see what happens when I stick that text in there. Again, the transformer shouldn't be the last module on the route; that's okay, don't worry about it. Let's see what comes out. We receive a bunch of "login," "sign up" (the nav bar information), a bunch more info, "San Francisco popular searches," blah blah, and then it looks like after a certain point we actually start getting into the listings: "all filters," "there are 1,130 homes," "you can sort by recommended," "photos," "table." This is all just the text that's on this website over here, presumably. And then it looks like you already have a link: in square brackets, /CA/San-Francisco/ then the address, /unit-702/home/ and an ID. That is probably all we need as-is, which is great. What I'm going to do is open up redfin.com and just try pasting this path in to see what happens. Okay, great, so now we actually have the link, which is what we wanted initially: a list of links, and then for every link we do a bunch of fun stuff with it. So that's pretty simple. The question is how you parse this out of a presumably giant sea of links. This is really long: we got one link up there, we got another link right over here. So what I'm looking for now is commonalities between these links, and it looks to me like they all have a square bracket followed by /CA/San-Francisco. So I can probably build out a regex (if you guys remember from previous videos) for this information using something like that, and then maybe I'll have a dot-star, and then another square bracket to mark the end; the dot-star will match any text between "San-Francisco/" and that closing square bracket. Then I should be able to export all of those results. Let's see how many we got: it looks like we get about 40 every time we call, which is good, so that seems like a pretty solid data source. Off the top of my head, the San Francisco market is probably active enough that around 30 to 40 listings go up live every day at minimum, maybe more. We didn't apply any filters to the search on Redfin; you see up at the top that these listing websites usually have a bunch of filters you can muck around with, and we didn't touch any of that, so this is literally any home in San Francisco. So 30 to 40 homes is probably okay. I imagine we'll want to call this resource maybe once every two or three hours to start, and then see whether or not we're doubling up on queries. Great. What I'm going to do then is call this module "parse HTML as text"; let's do plain text, I want it to be as simple as
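The listing links come back as relative paths like /CA/San-Francisco/.../home/(id), so each one has to be glued back onto the redfin.com origin before it can be fetched, which is what pasting the path into the browser bar was doing manually. A sketch with Python's urljoin; the paths below are made-up examples, not real listings:

```python
from urllib.parse import urljoin

BASE = "https://www.redfin.com"

# Relative hrefs as scraped from the listings page (illustrative values).
paths = [
    "/CA/San-Francisco/123-Example-St-94110/home/1234567",
    "/CA/San-Francisco/456-Sample-Ave-94117/unit-702/home/7654321",
]

full_urls = [urljoin(BASE, p) for p in paths]
print(full_urls[0])
# -> https://www.redfin.com/CA/San-Francisco/123-Example-St-94110/home/1234567
```

In the Make.com flow, the same join happens in the URL field of the second HTTP module: a hardcoded domain prefix plus the iterated path from the parser.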

Using Regex to parse Redfin

possible get all SF listings let's just call H yeah no let's leave it at that and then I'm going to use the text parser module and go down to match pattern here so it's going to be a regular expression and then what I usually do when I'm designing Rex's is I will take a big list usually at least two examples of what I'm looking for and then I'll go down to a website called Rex 101 which is just a way that uh you can build them out really quickly what you do is you paste the test string down here and this is the string that we are going to be searching and then I'm going to try and build out the Rex over here you could of course also just ask chat gbt to do this um I know enough about is that I can probably build one out in about the same amount of time or faster and I think it'll be like reasonably good okay so we actually don't uhbe we should escape this and I want to escape all of these and then let's see we have two matches here which is what we want and then I'm just going to do period and then this actually don't think I need to escape this do I oh yeah I do need to escape this okay and then what I'm going to want is this so um we have an incomplete group structure I don't really know what that means if I'm honest see if maybe I can do that an unescape delimiter must be escaped in most languages with a back slash right okay so that's going a little too far you see how that's selecting from the beginning of this bracket and it's going all the way to the end of that bracket I believe that's because it's on a greedy match I think it's called so if I wanted to be UNG greedy I go back here okay great and then I have the URL and really what I'm looking for here is this group one and you see how I've got two results from this I've got one with the ca San Francisco whatever slome then I've got another one here with this other Home URL just so just for my own sanity I'm going to take this in this make sure that works okay good I'm going to do the same thing down here 
paste this okay great so I got another home so that's great so these are two um you know I've basically called the general listings once and then I've gotten just a list of all of the listings I'm now going to parse it out so I have like an array of just the URLs and then I'm going to use an iterator to iterate through that list of URLs and then just call the same sort of idea I'm going to call every one of those URLs once extract a bunch of info and then use that to update a Google sheet so may seem complicated but stay with me on this one um it'll all make sense quite shortly so I have two here why don't I test this out on the rest of this data so I'm just going to go all the way down to the bottom copy that in and again just sanity check myself the reason why I always like to sanity check myself over and over and over again is because I've been in the habit of building these systems out um and then like realizing that my Rex doesn't work on all the data or whatever but I just paste it in presumably like another big chunk and it says it that it's found 32 uh which is great I'm just going to randomly select one again just make sure I'm not wasting my time here okay great yeah looks good I've tested three out of these randomly and uh that seems to be okay uh one thing I need to keep in mind is that I need to make sure that my quantifiers are lazy so that's that may be an issue that I run into at some point so now we're going to say extract URLs let's say extract individual URLs and then I'm going to go up here and then check out the pattern I don't know if there's an option for greedy yeah it doesn't look like so we'll have to play it by ear I'm going to take this long string paste this in here um case sensitive yes multi-line yes single line yeah no continue the execution right the module finds no matches I'll say yes for now the text I'm going to feed in is going to be this text and then let's give this a try now um ideally what you probably should do actually 
why don't I do it: ideally, while we're testing, we shouldn't call this link over and over. I'm probably going to call it 20 or 30 times over the course of the next 15 or 20 minutes just to make sure this works, but when you call one resource repeatedly you can run into issues where they isolate your IP and block you, or something like that. So to avoid that, I'm going to set a variable here called HTML, and for the value I'll hardcode the page source for now — eventually I'll swap in the actual HTTP output. (I'm having trouble with this — I just bought myself a new Magic Trackpad, which is quite nice but also very slow.) I can't copy it directly from the module output, so I'm going to download the output bundles and grab all of the HTML from there. Unfortunately, when you copy something this massive — it's 2-point-something million bytes — it takes a while, so just be wary of that. I'll paste it all in, scroll to the top, and remove this. Hmm, it doesn't actually look like it all copied — I probably hit some character limit — but we copied over the vast majority of the website, so I'm fine with that. What this prevents you from having to do is call that resource at all, as long as you use this variable as the source you pull your data from. Then, when we're ready to use this in a real flow, we just replace the hardcoded test data with the actual output we're looking
for here. So that's what I'm going to do, just for my own sanity and to avoid having to re-record this video later. Now that we have all that text, I'm going to parse the HTML to plain text and try to extract the individual URLs, and we'll see how that goes. Called it — and you see this is empty, which tells me there's some issue with the regex that's preventing it from actually matching. Let's find out what the problem is. Looking at these URLs, there's actually a quote there, and I don't remember there being quotes before — this may be an artifact of dumping in text here. Let me not go crazy; I'll call it one more time and see if we get results. Oh, okay — that actually did work. So what's happening is that when you download the results from this Get All SF Listings module, it escapes all of the quotes with a backslash, and when you turn that into a string variable in Make.com, those escapes come along. A very sneaky, pesky error: because of it, our regex didn't work on the pasted data, and we'd have had to account for the extra characters. I'm lazy, so I'm not going to do that — forget everything I said a minute ago — but ideally, on your own end, you'd make sure you're not querying the API or the resource unnecessarily. Okay, it also looks like we got some additional information we didn't want, and the reason is that we couldn't set an ungreedy flag. If you check it out, the output is a bunch of bundles, and each bundle is basically a JSON (JavaScript Object Notation) object with a value called "i" — which I imagine is just the index — and a "$1", which is the first capture group from the regex. Now, if you look at the link here, we're not just getting the link; we're dragging in a bunch of additional text. The link I'm interested in runs from "CA" up to here, but the match extends well past it, because greedy regex tries to match the longest possible string, and we only want it to grab one URL. So a quick and easy way around this — one I'd consider a little dirtier than usual — is to take that text and split it on the presence of that right square bracket, then grab everything to the left of it. Because I've examined five or six of these at this point, as you saw me scrolling through, I'm pretty confident this will get just the URL, and then we should be good. So, in practice: we've gotten all the San Francisco listings, we've parsed the HTML to plain text, we've extracted the individual URLs — some of the URLs are too long, so
what we're going to do now — let me actually just test this first — is set a variable called Test URL. I'm going to select the $1 value, split it on that right bracket, and grab the first result. This will take each of the strings in that $1 variable — all 30 or 40 of them, however many we're pulling, since it iterates through them one by one — and split them on the presence of the right square bracket. If the bracket exists, we'll have two elements and it grabs the first one, the one on the left; if it doesn't exist, the value passes through unchanged. That's intended behavior for us. Let's click Run Once and see what happens. Okay, great, this is looking good — it chugged through 40 times: first test URL worked, second worked, third, fourth, fifth, sixth, seventh... it's obviously done what we wanted. Let me do one sanity check at the end here — okay, great, and we'll know pretty quickly if it doesn't work. Let's pump that URL in. Good, awesome. So now we're at the point where we have the individual URLs that we're going to call again, and what I'm going to do now is test what it looks like when I call a specific one. So I'm going
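(Before the next section — the split-and-take-first trick just described, as a tiny Python sketch. In Make.com this is roughly `first(split(…; "]"))`; the over-matched string below is invented for illustration.)

```python
def clean_url(raw_match: str) -> str:
    """Keep only what sits to the left of the first right square bracket."""
    return raw_match.split("]")[0]

# An over-greedy regex match that dragged in trailing junk (fabricated):
over_matched = "/CA/San-Francisco/123-Example-St/home/111]extra,junk"
print(clean_url(over_matched))  # /CA/San-Francisco/123-Example-St/home/111

# With no bracket present, split() returns the whole string as element 0,
# so a clean value passes through untouched — the intended behavior above.
print(clean_url("/CA/clean-url/home/333"))  # unchanged
```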

Advanced Make.com scraping—two HTTP calls in one

to type — oops, I just called the same URL twice; let me stop that. We just want redfin.com plus one of these URLs, so let's grab this one — I already have it in my URL bar. Let's see what sort of data we get when we call an individual listing, and then we can decide what to scrape specifically. Okay — I got a 403. Not a good sign. For those unfamiliar, a 403 error essentially means the request was refused: we could be missing a cookie or header that we need, we could have gotten the URL wrong, or we could have been rate limited — basically, our request doesn't look real. Let me try the same URL from my own browser — yeah, it works from my own call. Let's paste it in here and try one more time, in case it was a local rate limit... no. Usually when a request is answered as quickly as this, it just means you've been denied, which is understandable. There are a couple of ways to get around this, and I'm glad it happened, because now I get to show you how to sneakily divide and conquer these limits. First, we need to put a bunch of headers (and cookies) in our request that mimic our browser. The simplest and easiest way to do this is to head over to the page you want to hit — this one right here — open Inspect Element (I'll zoom out because I'm way too far in), and go to the Network tab. If you're unfamiliar, the Network tab is a real-time display of every request being sent or received by your browser while you're accessing a web resource of whatever kind. In our case, every time we enter this URL, a bunch of requests
are being made for specific resources — the header image, for instance, maybe some other listing images, some CSS, some JavaScript on the page — all these tiny things you normally never have to think about because the browser does them automatically. In order for those requests to check out, you usually need what are called headers on the requests you're sending. Headers are basically a unique identifier, like a fingerprint, showing the server that you're a person and not a robot — like the one we're attempting to design right now. The good part is that a lot of servers are very easily tricked: you can just reuse the same headers your browser uses, over and over again, without any issues. The problem with our request right now — if you go back here — is that we have no headers whatsoever; we're not actually sending anything. So we're going to get around that and be a little sneaky. I'm going to turn off Preserve Log, so I'm going to clear out all of

Using cookies in request headers

these network requests, press Return, and scroll all the way up to the very top of the page to see exactly what I'm calling here. As you'll see, the Network tab has another tab called Headers, which shows you what you're sending to the server resource. Here are the request headers: it looks like we're sending one called Accept with text/html and all this stuff. I'm just going to copy a bunch of these over, including the cookie (we're not actually signed in), and see what happens. "Platform: Android"? That sounds odd to me, but we'll try pumping these in. Back in my Make module, I'll go to Headers — you'll see there's a two-part format, a name and a value. If you look on the left-hand side here, the name is the text like "server" and the value is "nginx", for instance. Those are response headers, which just means we're receiving them from the server; these are request headers, the ones we're sending. What we want is to mimic all of the ones we're sending. You see it says Accept: text/html,application/whatever — so we want a header called Accept with all of this information. I'll add another header, Accept-Encoding. Some of these are added by default to every outgoing request, and I don't remember whether you actually need to include this colon, so we'll try both with and without. The cookie is probably the most important one, because a lot of the time they'll specifically fingerprint your browser and store the result in the cookie — some really long string — and if they find that string, you're good. User-Agent is another really common one too, so I'm going to copy
that one in. User-Agent — and then we'll paste in everything we had here. You'll see there's some additional info that doesn't look relevant; I don't know where it came from, so I'll scroll through and make sure that data wasn't pasted anywhere else. Okay, great. If you scroll down, there are sometimes a few other settings that set headers for you — for instance, choosing a body type of application/whatever would add that as a header. I'm just hardcoding things for now because I want to see if this works. So we're going to call this URL again, but from Make.com this time. It's taking a fair amount of time now, which is usually a good sign — and as you can see, the status code is now 200, and we actually get all the text you'd otherwise get on this page. That's more or less how to sneakily navigate a lot of these simple 400-class errors, the unauthorized-resource type things. I do this all the time, and I usually just copy over my own browser credentials. You can be even sneakier and generate header fingerprints — I think that's what it's called — and there are a bunch of scripts out there that do it for you (I thought there was an online service, but I guess I was mistaken). Essentially these services randomize the browser you appear to be using — maybe Firefox instead of Chrome — the page size, which a lot of the time gets passed in as part of the cookie, and a bunch of other stuff, making it basically impossible for servers to fingerprint you. Something to keep in mind — you don't necessarily need it, but it's worth thinking about. Then, since I've gone through this whole rigmarole — I've done a test URL and run through this three or four times — I'm also going to add a wait. Let's do this directly in here: every new individual URL that comes in, we're going to sleep, and for a fair amount of time. For testing purposes, should I sleep after? Actually, let's sleep about 3 seconds before, so there's a little delay ahead of the request we're making, and then afterwards sleep, I don't know, 30 seconds or so. I'll auto-align this and check the URL: it's redfin.com plus /CA/whatever — we split the source string on that right bracket character and take the first record. That looks good to me. We've got all our cookies and everything else here, which is nice. Then I'm going to stick this at the very end: we also want to parse the HTML to plain text and see what cool fields we get as a result. I'll pass in the data from this and run once as a test. Presumably we'll get the plain text of a particular page, and then we can do more or less what we just did and dump the results into a Google Sheet. Again, you can use this exact same approach on any e-commerce website you want — in fact, e-commerce sites are way easier than a very popular global resource like Redfin, simply because they usually don't have the data-security protocols in place to verify your headers or anything like that. So you're probably not going to need most of this, but I'd always get in the habit of adding some type of sleep. You can get pretty fancy with this if
you really want to scrape completely undetected: you could, for instance, add a random wait time on a normal distribution modeled on human traffic visits — base it on actual human data rather than a flat 30 seconds. But we're keeping it pretty simple; I want this to be as accessible as humanly possible. I called this one "Sleep 30" — I usually like writing the length of the sleep directly in the module title, because it keeps things really simple to understand from a bird's-eye view. Let me double-check all this... okay, good, this looks good. Let's try running it: Get All SF Listings up top returned a 200, got them all successfully, sleeping 3 seconds, getting the SF listing from one of the individual links, and now sleeping 30 seconds. Let's check the parsed plain text while we wait and see what sort of information we can get. This looks to be the text of that specific page. Let me turn this off, because I think we've demonstrated this works. We could grab the Google Maps entry, it looks like — maybe that's relevant information for our sales team or something. (Hmm — I'm missing a parameter called "size", so I may not be selecting this correctly.) We could grab the price, we could grab the estimated mortgage payments... let's think about the simplest thing we can do to demonstrate how this would work. We could obviously grab all of this text; we just have to build an appropriate regex for it. I like the property tab — that's easy, because we have tabs like Property Facts, Sale History, Tax History. We could do Transit, we could do Walk Score — that's neat. What would be the most valuable here for us?
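(As an aside, the two ideas from this stretch — browser-mimicking headers and a randomized, human-looking delay — can be sketched together in a few lines of Python. Everything here is illustrative: the header values are examples you'd replace with your own copies from the DevTools Network tab, and `polite_fetch` is a hypothetical stand-in for the Make.com HTTP module, not Redfin's actual requirements.)

```python
import random
import time
import urllib.request

# Example browser-mimicking headers — copy your own from DevTools,
# Cookie included, rather than using these placeholder values.
HEADERS = {
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
    "User-Agent": ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0 Safari/537.36"),
    # "Cookie": "<paste your own session cookie here>",
}

def human_delay(mean=30.0, sd=8.0, floor=3.0):
    """Normally distributed wait in seconds, clipped to never drop below `floor`."""
    return max(floor, random.gauss(mean, sd))

def polite_fetch(url):
    """One throttled, header-carrying GET — roughly what the HTTP module does."""
    time.sleep(human_delay())
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A flat `time.sleep(30)` works too; the Gaussian jitter is just the "model it on human traffic" idea made concrete.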
It looks like there are also some other links afterwards — I imagine if you go on a particular page and scroll all the way down, there'll be some recommended similar listings. Yeah, that's what these are, and we don't want those. So basically everything up until this point is good; we don't need to keep scrolling, because everything after it is just other listings. Of this, I'm going to grab... let's do the payment calculator, so estimated mortgage payment; then a line called tax history; and there should be a cost as well, where we can get just the dollar value. Unfortunately Make doesn't really let me drag this out any further, which is a shame, so why don't I just paste this in here and look at the information. Okay, I want this — paste that in. So it looks like the price is sandwiched between a number, a space, then "beds", and then a left bracket, and I imagine,

Parsing & splitting HTTP call results

just based on what I'm seeing here, that this is very consistently formatted, so that pattern will probably work in the vast majority of cases. I'm going to build out a regex for that data, starting with price; once I have all the variables, I'll add them to a Set Multiple Variables module and then update a Google Sheet — which means I'll have to create one as well. So: Set Multiple Variables. The first variable I'll call Price, and for now I'll just leave it at that. The other information I wanted — I'm not remembering it off the top of my head — I think it was the estimated mortgage payment, estimated whatever dollars per month. So next up is Estimated Mortgage Payment; let's call it by its full name. And the last thing — what else might be useful? Let's copy over the sale history; that looks nice, and we'll just copy over all the text, so we get in the habit of doing that. Okay, great. There are multiple ways to do this, but now I'm going to create three regex modules, one for each thing I want: one for the price, one for the estimated mortgage payment, and the last one for the sale history. Let's auto-align this, then name them Extract Price, Extract Estimated Payment, and Sale History. Awesome. We'll then set the outputs of these — price here, estimated payment, sale history — and I'll also add a Google Sheets module; it's good to scaffold this stuff out ahead of time. We'll say Add New Row to Sheet. This looks pretty self-explanatory. I'm going to go through now and manually build out the regexes — and I don't know how long this video has gotten, so I'll just look at the time,
and then, if this is getting a little too long, I'll jump to all the regexes being done in one click. Assuming I don't do that, let's see how we approach this. There was a million-dollar mark here, so I'll copy this text over, pump it into regex101, and it looks like I'm looking for a space character, then a dollar sign, then an amount. I think that would be \d* — although I'm not entirely sure. What matches a digit? \d, any digit. Oh, you know what, that's probably not going to work, because the amount will sometimes include a comma. So why don't we use a character class of digits and commas, followed by "beds" — yeah, that should work, I think. Oh, actually, no — sorry, wrong one. We want a dollar character and then the number — what we want is to extract just this number here, but it's not picking it up for some reason. Let's see why... there we go: the dollar sign is a special character in regex, so it has to be escaped. Okay, great, we've extracted it. I'll go back to the Price module and pump this in, then take my example, which is over here, and run it — I have a million and one tabs open, so it's probably getting pretty complicated to follow what's going on. Let's paste this in and see if our regex found the price successfully — it did, which is nice. Now let's look for the estimated payment: it's right over here, so it'll be a space character, then a dollar sign, then the amount, then "/mo". It's going greedy, which isn't what we want; presumably there's a space character immediately after it, so we can use that to bound the match. That's the regex for estimated payment — I'll copy it in and run another test. Looks like it selected the estimated payment correctly. And then sale history — we've got to do another test, which is unfortunate, since I have to ping the API again
unnecessarily. Let's just see if they all have sale history — okay, let's call it Most Recent Sale History. You know what, let's run this whole thing. I think I know what this will look like: it'll be a newline and then I want everything after it, so I think what we're looking for is this — and this is me being sneaky; I'm not necessarily saying it's the right thing to do, but it should possibly work, so I'll stick it in and give it a try. What we're looking for is an output — and I just realized something was wrong previously: this is currently referencing module 8, so I'll go back, reference module 17 instead, copy that in, and paste it into the other three modules. Copied — oops, let me go back; that one's right, this one's still lacking, and this one looks good. Okay, great. Now that everything's being pulled correctly, I'll run this again. It's taking some time, so I imagine this will be a 200 — nice. Sleeping 3 seconds, calling the individual listing, which looks good; we have the three regexes here. Oh — we're running this too often, so I'm going to pause now before we start getting a bunch of 400 errors. Usually with APIs you can do anywhere from five to ten calls a minute, but I like to be safe. So we parsed the HTML — let's see if the price was extracted correctly. It was, on the first match — but the second match also says "last sold price", so we'll want a way to select only the first entry as opposed to all of the entries, and if I check this out, there'll be other records that are similar. Extract Estimated Payment is pulling in a lot of prices here, which is not good. Extract Sale History — let's see if there's anything: "listed on February", "last sold on", "last sold" —
okay, actually, yeah — turns out my totally random pick here worked. So now I'm going to create a new Google Sheet (you can do that really easily just by typing sheets.new, by the way). I'll call it "Redfin Scrape Data" and add a few columns: the first is URL, the second is Price, the third is Estimated Payments for the mortgage, and then Sale History. Now you can imagine
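(Recapping the three extractions from this section as one Python sketch. The page text below is fabricated to mimic a listing's plain text, and the patterns are my own approximations of the ones built in regex101 — note `beds?` with an optional "s", which sidesteps the one-bed bug that comes up later, and `re.search`, which behaves like Make's "Global match: off" by returning only the first hit.)

```python
import re

# Fabricated plain text of a single listing page (illustrative only).
page_text = (
    "123 Example St, San Francisco, CA $1,234,567 3 beds 2 baths "
    "Est. $8,210/mo Payment calculator ... "
    "Sale history Last sold on Feb 2024 for $1,100,000 ... "
    "Last sold price: $1,100,000"
)

# re.search = first match only, like "Global match: off" in Make.
price   = re.search(r"\$([\d,]+)\s+\d+\s+beds?\b", page_text)  # price sits before "N bed(s)"
payment = re.search(r"\$([\d,]+)/mo", page_text)               # estimated monthly payment
history = re.search(r"Last sold on (.*?) for \$([\d,]+)", page_text)  # lazy .*?

# For contrast: a global match drags in every dollar figure on the page.
all_prices = re.findall(r"\$[\d,]+", page_text)

print(price.group(1), payment.group(1), history.group(1))
print(len(all_prices))  # 4 — why first-match-only matters here
```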

Testing Google Sheets integration on real data

how you could build this into an error table or something extremely easily. It doesn't have to be Google Sheets — I'm just using it because it's always the simplest place to get started. A lot of the time I'll actually scaffold my flows in Google Sheets, verify that they work from a data-first perspective, and then dump everything into a more enriched CRM or project manager like Airtable or ClickUp. It's just a simple, easy way of testing your flow with simpler tools — particularly important if you don't want to spend money on these. Okay — because I have a bunch of accounts set up, I'm going to share this with one of my other accounts, Nick leftclick, and give him full editing access ("run for the hills, Nick"). Then down here I'll select that user — I'll choose Select by Path, go to Shared With Me (since I just shared it with myself from another email address), and the one I'm looking for is Redfin Scrape Data, which is right here. The sheet name is Sheet1; the table does contain headers, which lets us map each of these variables. The first thing I'm looking for is the URL, so I can take the extracted individual URL here and just prepend redfin.com to it. The price I can pull from this value, the estimated mortgage payments from this value, and the sale history from this value. The sale history will be plain text; the other two will be numbers — I think we can format them correctly. After that I'll sleep 30 seconds. Let me make sure all of this is formatted the way I want: for Price and Estimated Payment I'll go to Format, Number, and choose Accounting, which should automatically account for those little commas — otherwise commas can be a massive pain. If it doesn't handle them, we can pass the value through another step that replaces the comma with nothing, so it renders as a number, but I do like to do this as a sanity check. Oh — and let's turn Global Match off for all of these regex modules. That means each will only select the first entry on the page with the text we want. That's the right call on these modules because we're looking at a single page, so odds are the text isn't repeating; we needed Global Match back in the listings step because we were scanning a giant listing page full of similarly formatted links. Okay, great — now that everything's good to go, let's click Run Once and see if there are any errors. (I kind of hope there are, because I love debugging these things, and I've received a lot of feedback that the debugging is valuable.) Let's look over here... it doesn't look like we dumped in any of the variables, which is not a good sign. Oh — "JavaScript is disabled". Okay, so we're actually getting hit with a rate limit: we got a 202 here, and the response says JavaScript is disabled, please enable JavaScript and reload the page. This is another example of automated anti-scraping behavior — they're doing it because we've tested so often and
in such a short period of time that we're hitting the upper limit again. A few ways to solve this: first, obviously, when you test, call a stored variable instead of hitting the URL or API over and over — just like I was showing you before, when I was too lazy to deal with the escaping. Second, make sure you have authorized-looking request headers, like I'm doing over here. Third — I've actually built this out on Redfin elsewhere, so let me log into that and see whether there was another call I made that would help get around this issue. My gut tells me probably... yeah, it looks like I was just stuffing a few additional headers there. Let me try adding those and see if that works. A few headers I have there that I don't have here: Accept-Language, which I'll paste in, and Cache-Control — this might be valuable; it may just discard any cached data. It also looks like I was requesting text/plain as opposed to HTML, so that might have something to do with it. I'll give that a try, though I think it may muck up the rest of our parsing — whatever, we'll try it. I also want a bigger sleep between these two calls — let's do 10 seconds — so we're not calling Redfin twice in quick succession from the same place, because the first call worked and it's the second one that's a little suspect. And for my sanity, I'll go back, extract one of these URLs — let's do this one here — and test just that one, so I don't have to call a million and one URLs. Okay, let's run this module. We received
a 200 response — though we'll get nothing useful here, because I didn't actually call a link. Why don't I do one specific URL: this one, pasted into the URL field. This looks good — we should get information on Liberty Street. The price should show up as 1.7 mil. Wait — one bed, four baths? Am I living in another world or something? That's crazy. We should also get some sale history information — at least one of these entries: the price change, that it's listed, active. Ideally we'd get the most recent one, because we chose to stop after the first match. We'll give it a try. You'll see this URL isn't going to work — the reason is that we're hardcoding this in, so let's fix that and run once. We received a 200 there, so I imagine the additional headers I put in probably made a difference. Looks like we couldn't find the price — something we'll need to debug — but we could find everything else, which is nice: we got the URL over here, we got the estimated mortgage payments. The formatting doesn't seem correct, so let me jump in and make everything Accounting format — you can do that pretty quickly on a Mac, well, when you know the keyboard shortcuts (unlike me), with Control-Option-O, then N, then A. Going back up here, you'll see the price is now tabulated correctly. And this URL, I guess, was the first one we ran — it doesn't look correct, so I'm just going to delete it. So the last thing we have to do is figure out how to get these price entries into this database successfully. Let's see what the issue was: we fed in a price... I'm actually just going to copy all of this, paste it in here, then go grab the regex. Now, it may be because this listing only had one bed whereas the other ones had multiple beds — yeah, looks like it. You see how I wrote "beds" previously, and I
just told you that this place only had one bed and then four bathrooms um so because I was searching for beds with an S essentially it didn't exact match that correctly so my actual reject should just be bed so I'm going to go back here change that to bed presumably not every place is going to have more than one bed um because we don't have that filter set up ahead of time so this is just a good way of ensuring that you know um everything you need is uh is parsed correctly so I'm going to go through here and then just delete this row and then um just because this Rex thing or this uh red fin API looks to be a little bit I don't know it's just a little um uh sensitive I want to say I'm just going to add like a very generous sleep of 60 seconds you can imagine how you could just schedule this to r on once a day maybe at I don't know like 6:30 a. m. or something like that and then um you know 60 seconds on 40 records would take you about 40 minutes from start to finish it's important to note that make times out after 45 so that should allow us enough time for the rest of these calls to be made and stuff like that um but yeah you know you can obviously run it more often than not and then oh one thing that I didn't check was we need to check to see if the record actually exists in the Google sheet first before we add it that's important and that's the last thing and I'll do so let me go down to search rows first and then we only want to add this if oh and you know what I kind of want another column here I'll call this like a date created let's do that yeah not having this would be silly okay so I'm going to go back here to Nick left click. 
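The "beds" versus "bed" mismatch I just fixed is easy to reproduce outside Make. Here is a minimal Python sketch; the listing strings are made up for illustration, but the pattern logic transfers directly to Make's text parser, which uses standard regex syntax:

```python
import re

# Hypothetical plaintext snippets pulled from listing pages
listings = ["1 bed 4 baths", "3 beds 2 baths", "2 beds 1 bath"]

# Matching the plural "beds" silently misses single-bed listings:
plural = [re.search(r"(\d+)\s*beds", s) for s in listings]
print([m.group(1) if m else None for m in plural])   # [None, '3', '2']

# Matching the singular "bed" works for both, because "beds" begins with "bed":
singular = [re.search(r"(\d+)\s*bed", s) for s in listings]
print([m.group(1) for m in singular])                # ['1', '3', '2']
```

The general lesson: when you only tested your parser against multi-bed listings, the plural pattern looked correct; always probe the edge case (here, a count of one) before trusting a regex.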
I'll go to "Shared with me", I guess, and open the Redfin scrape data sheet; this one does contain headers. The filter I'm going to use is basically the URL: I want to ensure the URL does not already exist anywhere in this sheet. So I'm going to go over here and use get and split again. As for the value to check against, we should really be putting it in a variable instead of dumping it directly in there; that's a lot smarter, because then we can reuse the variable. Let's do that: we'll call it "individual URL", paste the value in, and then I get to reuse it back at the very end. So we're going to check whether any row has a URL equal to the URL we just set, except I think we'll also have to prepend redfin.com. If a row is found, we don't proceed. In other words, if the total number of bundles is greater than zero, we stop; only if the total number of bundles equals zero (using a numeric operator there, not a text operator) do we proceed with the flow, because zero bundles means the search returned nothing and the URL is not already in our database. We should also move the sleep before this step, so that when the URL is found we're not dumping duplicates in; you can imagine how annoying that would be. Okay, I'm going to auto-align this and name the step "set URL". Oh, actually it looks like I already prepended the redfin.com part here, so I don't need any of that; I can just use the individual URL variable there. Nice. Okay, that was a lot of jumping around.
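That Search Rows plus "total number of bundles equals zero" filter is really just a membership check. Here's a minimal sketch of the same logic in Python; the row data and the `url` column name are invented stand-ins for what Make's Search Rows module returns:

```python
# Each dict stands in for one row bundle returned by Make's Search Rows module
existing_rows = [
    {"url": "https://www.redfin.com/CA/San-Francisco/123-Main-St"},
]

def should_append(individual_url, rows):
    """Proceed only when no row already holds this URL, i.e. the
    'total number of bundles' from the search equals zero."""
    bundles = [r for r in rows if r["url"] == individual_url]
    return len(bundles) == 0

print(should_append("https://www.redfin.com/CA/San-Francisco/456-Liberty-St",
                    existing_rows))                           # True  -> add the row
print(should_append(existing_rows[0]["url"], existing_rows))  # False -> skip it
```

Comparing the bundle count numerically, as in the flow above, matters because comparing "0" as text can behave differently from comparing 0 as a number.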

Final test of Redfin flow from end to end

should work. Why don't we give it a try? Only one way to find out, right? We're getting all the SF listings over here, converting the HTML to plaintext, and setting the URL; this URL looks good. We sleep a little, then call the HTTP request module; I think I set that sleep to 10 seconds, yeah I did, hence why it took a bit longer. We received another 200, which is nice. We're parsing all of these fields: looks like we got the price correctly, plus the mortgage payment and sale history. We set these three variables, and then we sleep for a full minute, so that's unfortunate, but whatever; we'll wait the minute as those records get dumped in. Everything here looks good, and I think the scraper is basically good to go. There's only one final change I'd make, and that's to ensure the scraper never errors out or runs into rate-limit issues, using what are called error handling modules. I haven't discussed error handling before, but it's basically a way to put in contingencies so that if your Make flow ever hits a 400 error, or a rate limit like we saw earlier, or that JavaScript warning comes up again, it can intelligently retry the flow until it goes through. There's a quick and dirty way to do this, so let me cancel this run. Looks like the records were dumped incorrectly... oh, and I didn't update the "date created" field. Silly Nick. For the purposes of exporting this, I'll just put in a date value using now(), and we'll format the column as a date; that should technically be correct. Anyway, what I'm going to do is click this little gear icon, click "Allow storing of incomplete executions", and click OK. Then, on the left-hand side of the tools modal, under directives, there are a bunch of different error handlers you can use, and the one we want is called Break. What Break does: if a module throws a warning or an error, then, depending on your settings, Make stores the execution history of that specific run (all its variables) and waits a predetermined amount of time before using them to try to finish the execution again. What do I mean by this? Any time you're calling an API or using a request module, you have to understand that the internet isn't perfect; it functions pretty damn well, but uptime is never going to be 100%. If you don't have an error handler like this, then at some point over the days, weeks, or months your scraper is running, the website will have some issue, the flow will error out, and if it errors out three times in a row Make will just shut the whole scenario down. Break modules let you retry instead. You can see there's a "number of attempts" selector here, so I could say: retry 10 times, with an interval of 15 minutes between attempts. You can make that interval as long as you want; you could literally make it 44,640 minutes if you wanted to be really sure the Redfin API isn't going to lock you out. Obviously we don't need anything that long, but I'd recommend having some number in place. I usually stick with the default of 15 minutes; if we want to be really safe, let's do 30 minutes and three attempts. That sounds good to me, so I'll leave it there. Now, you don't need a Break module on every single module in your flow, but you do need it any time you're calling an API or a web resource, so I'm going to put one in. I'll also add one to my Google Sheets modules; I don't expect Google Sheets to break (I don't think it ever has), but any time you're interacting with some external system it's a good habit, especially when the scenario is something you depend on for revenue and whose uptime matters. So that's more or less it; I hope it was educational and informative (just imagine there's a little "date created" in there). I'll share the blueprint I developed for both builds: the first, where we scrape a single resource and use AI to turn it into parsable data, and the second, where we build our own text parsers using regex. Obviously the second example is much higher throughput. These are the systems people usually pay me between $2K and $5K to build, and I built this one in, I think, under an hour, so you can imagine how good the ROI gets. Although, if I were building this for a client, I would have taken a couple of additional steps to sanitize the data and make sure everything was hunky-dory before shipping. Hope that was valuable. If you've got any questions, leave them down below in the comments; otherwise, like and subscribe. All of your support for the channel has been incredible; we've been averaging between 200 and 300 subs a day, which is mind-blowing, and I'd love to keep that up. Thanks so much for watching, and I'll see you in the next one. Bye-bye.
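For reference, the store-and-retry behavior of the Break directive described above can be approximated in plain code. This is a rough sketch, not Make's actual implementation; the attempt count and interval mirror the settings chosen in the video (three attempts, 30 minutes apart), and the flaky step is a fake stand-in for an HTTP call:

```python
import time

def call_with_break(step, attempts=3, interval_seconds=30 * 60):
    """Retry a failing step a fixed number of times, sleeping between
    attempts, similar to Make's Break error handler."""
    last_error = None
    for attempt in range(attempts):
        try:
            return step()
        except Exception as err:      # e.g. a 429 rate limit or 5xx response
            last_error = err
            if attempt < attempts - 1:
                time.sleep(interval_seconds)
    raise last_error

# Demo with a fake step that fails twice, then succeeds:
state = {"calls": 0}
def flaky_step():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("rate limited")
    return "200 OK"

print(call_with_break(flaky_step, attempts=3, interval_seconds=0))  # 200 OK
```

The key design point, as in Make, is that the retry wrapper keeps the step's inputs intact between attempts, so a transient failure costs you time but not data.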
