Cloudflare Outage Incident Report

Segment 1 (00:00 - 01:00)

On the 18th of November 2025, Cloudflare had a major global outage. This is a summary of its incident report. Every request on Cloudflare passes through its core proxy system. This performs some basic security and sanity checks. One of these checks is called “Bot Management”, where multiple features are used to classify the request as coming from a bot or a human. This is done using a configuration file. And this is where things went wrong.

Cloudflare uses an automated query to ClickHouse to fetch all the features for the config file. This is done using a SELECT statement on the http_request_features table. In parallel, the database admins at Cloudflare were trying to improve the security of the system by introducing fine-grained user access control, meaning that every user’s access would be granted explicitly. This had the side effect of the system table listing every shard’s features as separate entries. As a result, the config file became much larger than expected: from an expected 60 entries to 200-plus entries! And as the file propagated across the network, the core proxy checks started failing.

As the core proxies started failing, they began heavy logging of every request, which further increased latency in the CDN. Since the file was being propagated slowly, the Cloudflare engineers were able to intermittently bring back some of these proxies. And when they failed again, the engineers suspected that this was a massive, coordinated DDoS attack. Coincidentally, the Cloudflare status page was also affected. And since the status page runs outside their infrastructure, the engineers were nearly convinced that this was a “planned attack”.

But the status page was down because of an unrelated issue. Eventually, when the file had propagated everywhere and taken down all the proxies, the Cloudflare engineers tracked down the root cause and reverted the config file to a previous stable state.
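The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Cloudflare's actual code or fix: the database names (`db_a` and so on), the `ColumnRow` type, the hard limit of 200 entries, and the fall-back-to-last-known-good behaviour are all hypothetical. It shows how a metadata query that starts returning one row per feature per visible database inflates the generated file, and how deduplicating and validating the size before loading would keep the old file in service.

```python
from dataclasses import dataclass

# Hypothetical limit; the transcript only says about 60 entries were expected.
MAX_FEATURES = 200


@dataclass(frozen=True)
class ColumnRow:
    """One row of a simulated database metadata query."""
    database: str
    table: str
    name: str


def metadata_rows(visible_databases, feature_names):
    """Simulate the metadata query: one row per feature per visible database."""
    return [
        ColumnRow(db, "http_request_features", name)
        for db in visible_databases
        for name in feature_names
    ]


def build_feature_file(rows):
    """Naive generator: assumes every row is a distinct feature."""
    return [row.name for row in rows]


def build_feature_file_deduped(rows):
    """Safer generator: keep each feature name once, preserving order."""
    return list(dict.fromkeys(row.name for row in rows))


def load_feature_file(candidate, last_known_good, limit=MAX_FEATURES):
    """Reject an oversized file and keep serving with the previous one."""
    if len(candidate) > limit:
        return last_known_good
    return candidate


features = [f"feature_{i}" for i in range(60)]
# Before the permission change only one database is visible to the query.
before = metadata_rows(["db_a"], features)
# Afterwards the same query also sees the underlying databases.
after = metadata_rows(["db_a", "db_b", "db_c", "db_d"], features)

good = build_feature_file(before)
bloated = build_feature_file(after)
print(len(good), len(bloated))                   # 60 240
print(len(build_feature_file_deduped(after)))    # 60
print(load_feature_file(bloated, good) is good)  # True
```

The design point is that the consumer treats a generated config file like untrusted input: an oversized file is rejected and the last known good file stays active, rather than the loader failing outright.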
The issue lasted only 6 hours, which is a testament to how good the engineering at Cloudflare is. Thanks for watching.

Author: Gaurav Sen
