Technical SEO for AI: Robots.txt, GPTBot & llms.txt Explained | 3.4. AEO Course by Ahrefs
7:58

Ahrefs · 13.05.2026 · 160 views · 8 likes

Video description
Technical SEO might be the reason AI cannot see your site, and in minutes you will learn whether robots.txt, GPTBot, and llms.txt are quietly blocking you.

Additional Resources:
► https://www.youtube.com/watch?v=gReszNnykpg
► https://www.youtube.com/watch?v=6NFei1FbytM
► https://www.youtube.com/watch?v=vE_A4IhmkKQ

In this AEO lesson, we walk through six technical checks that help AI assistants actually access and understand your content. We start with robots.txt for AI, because it is not just Googlebot anymore. You will see how to spot and fix rules that block GPTBot, OAI-SearchBot, ClaudeBot, and Google-Extended, plus a heads-up on a Cloudflare default setting that can silently add AI blocks. We also show how to use Ahrefs' Site Audit to flag AI crawler issues fast. You will hear where llms.txt fits in today, why it is not a priority, and how to think about it alongside robots.txt.

Then we get practical with rendering and speed. If your content relies on JavaScript, ChatGPT's crawler will not see it, so we cover a quick no-JS test and why server-side rendering is the fix. We touch on page speed for AI retrieval too, since slow pages can get skipped while systems fetch and chunk them in real time.

From there, you will learn how clean HTML structure, clear headings, and atomic sections help AI parse your pages, and where schema markup fits for AEO. Finally, we tackle AI-hallucinated URLs: how to find AI referrer traffic that hits 404s, and the simple redirect play that turns lost clicks into engaged visits.

If AI visibility matters to your business, do these checks before the next crawl. Watch now to make sure AI can find you, understand you, and promote you.

Contents (2 segments)

Segment 1 (00:00 - 05:00)

Hey, it's Sam Oh, and welcome to the fourth lesson in this module, which is on the technical side of AEO. Now, I know technical can sound intimidating, but this lesson isn't about rewriting your site's code. It's about making sure that AI can actually access and understand your content. And while "access" and "understand" might sound rudimentary to some of you, the reality is that a lot of sites are accidentally blocking AI without even knowing it. According to our data, around 5.9% of 140 million websites are blocking GPTBot, which is OpenAI's crawler. That's millions of sites that are invisible to ChatGPT. So, in this lesson, I've got six technical checks and tips to make sure AI can find you so that it can promote you. Let's get started.

So, the first thing you need to check is your robots.txt file. Robots.txt is a file on your site that tells crawlers what they can and can't access. And the thing is, it's not just Google's crawler you need to think about anymore. There are now dozens of AI-specific bots that crawl the web. The main ones you should know about are GPTBot and OAI-SearchBot from OpenAI, ClaudeBot from Anthropic, and Google-Extended from Google. If any of these are blocked in your robots.txt, you're asking those AI platforms not to crawl your content. And assuming they obey your rules, they sure won't be recommending your pages if they don't know what's on them. Now, you might not have blocked these bots intentionally, but a lot of sites inherit robots.txt rules from templates or old configurations, and some platforms add blocks by default. For example, Cloudflare has a feature called "Instruct AI bot traffic with robots.txt" that's now enabled by default. When it's on, Cloudflare automatically updates your robots.txt to signal that your content shouldn't be used for AI training. So, if your site is on Cloudflare, you could be blocking AI crawlers without even realizing it. So, the first step is simple. Go to yourdomain.com/robots.txt and look for any lines that mention GPTBot, ClaudeBot, Google-Extended, or OAI-SearchBot. If you see a Disallow rule under any of those user agents, you're blocking that AI crawler from the paths it covers. You can also use Ahrefs' Site Audit to check this. Run a crawl on your site, and it'll flag any robots.txt rules that might be blocking AI crawlers.

Now, while we're on the topic of files AI reads, I want to make a quick note on something you might have heard of called llms.txt. This is a proposed standard, kind of like robots.txt, but designed specifically to tell AI systems about your site. The idea is that you create a file at yourdomain.com/llms.txt that gives AI a summary of who you are, what your site covers, and where to find your most important content. It's useful in theory, but as of right now, no major LLM provider officially supports it. OpenAI doesn't use it. Anthropic publishes one on its own site but hasn't confirmed that its crawlers actually read it, and Google hasn't adopted it either. So, should you create one? Well, I don't think it'll hurt you, but I wouldn't prioritize it over the other things we've talked about in this lesson. Robots.txt is still the file that matters most right now.

All right, the second thing to check is how your site handles JavaScript. Some AI platforms can render JavaScript and some can't. Without getting too technical, Gemini and Copilot can render JS, while ChatGPT's crawler does not. So if your content relies on JavaScript to load, which is common with single-page apps and some React or Angular frameworks, ChatGPT literally can't see your content. It visits the page and gets an empty shell. The fix here is server-side rendering, which means your server sends the fully rendered HTML to the crawler instead of relying on JavaScript to build the page in the browser. If you're already doing this for SEO, you're covered. If not, it's worth looking into, especially if AI visibility matters to you.
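The robots.txt check from the first tip can also be scripted. Below is a minimal sketch using Python's standard-library `urllib.robotparser`: for the four AI user agents named in this lesson, it reports whether a given robots.txt lets them fetch a URL. Be aware that this parser uses simple prefix matching and applies the first matching rule, so it may disagree with a crawler's own handling of wildcard patterns or overlapping Allow/Disallow rules; treat it as a quick first pass, not a replacement for a full Site Audit crawl.

```python
from urllib.robotparser import RobotFileParser

# AI crawler user-agent tokens mentioned in the lesson.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Google-Extended"]


def ai_crawler_access(robots_txt: str, url: str) -> dict:
    """Map each AI crawler token to whether robots.txt allows it to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines and marks the rules as loaded,
    # so can_fetch() works without a network call to read().
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks only GPTBot.
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    # GPTBot -> False; the other three fall through to the "*" group -> True.
    print(ai_crawler_access(sample, "https://example.com/blog/"))
```

To check a live site, fetch the file first (for example with `urllib.request.urlopen("https://yourdomain.com/robots.txt").read().decode()`) and pass the text in; `RobotFileParser` also offers `set_url()` and `read()` if you prefer to let it do the fetching.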
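The "empty shell" problem can also be checked from a script instead of a browser. This sketch assumes that a plain HTTP GET, which never executes JavaScript, approximates what a non-rendering crawler receives: it extracts the text from the raw HTML (ignoring `<script>` and `<style>` contents) and checks whether a key phrase from the rendered page is present. `fetch_raw_html`, the `no-js-check/0.1` user-agent string, and the key-phrase approach are illustrative choices, not anything documented about ChatGPT's crawler.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

SKIPPED_TAGS = {"script", "style"}  # their contents are code, not page text


class RawText(HTMLParser):
    """Collects text from HTML as served, without running any JavaScript."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def text_without_js(html: str) -> str:
    """Return the page text with whitespace collapsed."""
    parser = RawText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def content_visible_without_js(html: str, key_phrase: str) -> bool:
    """True if `key_phrase` appears in the raw (unrendered) page text."""
    return key_phrase.lower() in text_without_js(html).lower()


def fetch_raw_html(url: str) -> str:
    """Hypothetical helper: plain GET, no JavaScript execution."""
    request = Request(url, headers={"User-Agent": "no-js-check/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Pass `fetch_raw_html("https://yourdomain.com/some-page")` together with a sentence you can see on the rendered page; if `content_visible_without_js` returns `False`, that text is most likely being injected by JavaScript, and server-side rendering is the fix the lesson recommends.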
A quick way to check whether your content depends on JavaScript is to disable JavaScript in your browser and visit your own site. If the content disappears, you have a rendering issue that's affecting AI crawlers, too. The third thing to consider is page speed. Now, you might be thinking that page speed is an SEO thing, not an AEO thing, but it can actually matter more for AI retrieval than for traditional search. When AI systems retrieve information in real time, they're fetching, parsing, and chunking your pages on the fly, and if your page takes too long to load, it can get dropped before it's even scored. So it won't make it into an AI response, even if the content is great. The good news is that if you've already optimized your Core Web Vitals for SEO, you're most of the way there. Fast-loading pages with clean HTML benefit

Segment 2 (05:00 - 07:00)

both Google and AI systems. And that brings us to the fourth tip: create a clean HTML structure. This one's straightforward. AI systems parse your content by following your HTML structure, so if your headings are logical, your sections are well organized, and each paragraph focuses on one idea, AI has an easier time extracting the right information. This ties directly back to the content principles we covered in lesson 3.1: BLUF (bottom line up front), atomic content, and entity-rich writing. Those principles aren't just about writing style; they're about making your content technically parsable for AI. So, when you're structuring your pages, use a proper heading hierarchy: H1 for the title, H2s for the main sections, and H3s for subsections. And make sure each section can stand on its own, because AI might chunk your content at any heading boundary.

The fifth tip is about schema markup. Schema markup, also called structured data, is code you add to your pages to help search engines understand your content: things like Article, FAQPage, HowTo, and LocalBusiness schema. Now, does it help with AEO? Honestly, the evidence is mixed. There's no confirmed data that adding schema directly improves your chances of being cited by AI, but it doesn't hurt, and if you're already using it for SEO, there's no reason to remove it. I wouldn't spend a ton of time on schema specifically for AEO, but if you're setting up a new page, adding the right schema types is a good habit that makes your content easier for any system to understand.

All right, the sixth tip I have for you is to optimize for AI-hallucinated URLs. AI assistants sometimes make up URLs that don't exist on your site. They'll recommend a page to a user, the user clicks it, and they hit a 404 error. This happens a lot more often than you'd expect. According to our data, AI assistants send visitors to 404 pages 2.87 times more often than Google Search does.
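Returning to the fourth tip, the heading hierarchy it recommends can be checked mechanically. This is a minimal sketch using Python's standard-library `html.parser`. The two rules it enforces (exactly one `<h1>`, and no jump of more than one level, such as `<h2>` straight to `<h4>`) are our reading of the lesson's H1/H2/H3 advice rather than a published requirement of any AI platform, and it only inspects headings present in the raw HTML, not ones added by JavaScript.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class HeadingOutline(HTMLParser):
    """Collects (level, text) for each <h1>..<h6> in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[Tuple[int, str]] = []
        self._level: Optional[int] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <h2><a>...</a></h2>) is kept too.
        if self._level is not None:
            self._text.append(data)


def heading_problems(html: str) -> List[str]:
    """Flag a missing or repeated <h1> and any skipped heading level."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    problems = []
    h1_count = sum(1 for level, _ in parser.headings if level == 1)
    if h1_count != 1:
        problems.append(f"expected exactly one <h1>, found {h1_count}")
    previous = None
    for level, text in parser.headings:
        if previous is not None and level > previous + 1:
            problems.append(f"<h{level}> {text!r} skips a level after <h{previous}>")
        previous = level
    return problems
```

An empty list means the outline follows the H1 → H2 → H3 pattern; anything else points to a section boundary that may chunk awkwardly.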
ChatGPT is the biggest offender, with about 1% of its clicked URLs leading to 404 pages. Now, rather than letting that 404 be the end of a visitor's browsing journey, you should either fix or optimize those pages to get more out of them. You can do that by checking your analytics for pages that get traffic from AI referrers but return a 404 status. If you spot a hallucinated URL that's getting consistent traffic, set up a redirect to the most relevant real page on your site. That way, you're capturing traffic that would otherwise be lost.

Now, while creating content and getting cited is a big part of AEO, it's only part of the picture. You also need to know whether it's actually working, and that's exactly what we'll be covering in module four, which is all about measuring and tracking your AI visibility. I'll see you there.
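The lesson suggests finding hallucinated URLs in your analytics. As an alternative data source, here is a minimal sketch that scans web server access logs in the Apache/Nginx "combined" format, keeps 404 responses whose referrer belongs to an AI assistant (or whose URL carries an AI `utm_source` tag), and ranks the missing paths by hits. The hostname list, the `min_hits` threshold, and the log format are all assumptions to adjust for your own setup.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Assumed AI assistant hostnames; extend or trim to match what your analytics shows.
AI_HOSTS = ("chatgpt.com", "chat.openai.com", "perplexity.ai",
            "gemini.google.com", "copilot.microsoft.com", "claude.ai")

# Captures path, status, and referrer from a "combined" log line:
# ... "GET /path HTTP/1.1" 404 512 "https://referrer/" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def _is_ai_host(host: str) -> bool:
    return any(host == h or host.endswith("." + h) for h in AI_HOSTS)


def is_ai_referral(referrer: str, path: str) -> bool:
    """True if the referrer host, or a utm_source tag on the URL, is an AI assistant."""
    if _is_ai_host(urlsplit(referrer).hostname or ""):
        return True
    sources = parse_qs(urlsplit(path).query).get("utm_source", [])
    return any(_is_ai_host(s.lower()) for s in sources)


def hallucinated_404s(log_lines, min_hits=3):
    """Return (path, hits) for AI-referred 404s seen at least min_hits times, most first."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None or match["status"] != "404":
            continue
        if is_ai_referral(match["referrer"], match["path"]):
            # Group by path only, so tagged and untagged hits on one URL add up.
            counts[urlsplit(match["path"]).path] += 1
    return [(path, hits) for path, hits in counts.most_common() if hits >= min_hits]
```

Picking the redirect target is a judgment call, so the sketch stops at the candidate list; for each path it returns, add a redirect to the most relevant real page in your CMS or server config.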
