The Internal Architecture of Amazon S3

Table of Contents (1 segment)

Segment 1 (00:00 - 01:00)

This is the internal architecture of Amazon S3. Amazon S3 is an object store that manages exabytes of data across millions of HDDs. When a user uploads a file, S3 breaks it into multiple parts and uploads each part in parallel. Every part has a checksum, which is used to verify its integrity after upload.

A monitoring process continuously checks the hard drives to ensure that no data is lost or corrupted. If a hard drive fails, it is replaced quickly, and because each chunk has multiple copies, read requests are not stalled.

If an entire region fails, another region answers the request. This is done through shuffle sharding: every part is assigned two randomly allocated regions, so even if one region fails, the other can serve the request. If a region is slow, the request to it can be canceled preemptively, keeping latency low.

Another benefit of S3 is auto-scaling. If a key prefix becomes too hot, more servers are added to the shard, allowing fast uploads and keeping latency manageable. An interesting aspect of this is that most of the data served from S3 is recent data. If a newly assigned shard were given only recent data, most requests would land on it, creating a hot spot. Instead, S3 populates the newly assigned shard with older data from other shards. This spreads the heat uniformly across shards, distributing read requests evenly and keeping latency under control.

For fault tolerance, S3 has a clever algorithm: it doesn't just naively replicate data. Instead, it uses error-correcting codes. For a file with five parts, S3 adds four additional parts of error-correcting codes. The benefit is that even if up to four parts are corrupted, the file can still be served using the remaining parts. So instead of a naive 3x redundancy, the total storage consumed is just 1.8x of the file size. With this, AWS is able to guarantee 11 nines of durability for every file uploaded to S3.
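The multipart-upload-with-checksums step described above can be sketched as follows. This is a minimal illustration, not S3's implementation: the 8 MiB part size and the SHA-256 checksum are assumptions made here for demonstration (real S3 supports several part sizes and checksum algorithms).

```python
import hashlib

PART_SIZE = 8 * 1024 * 1024  # assumed 8 MiB per part, for illustration only

def split_into_parts(data: bytes, part_size: int = PART_SIZE):
    """Split an object into fixed-size parts, each carrying its own checksum."""
    parts = []
    for offset in range(0, len(data), part_size):
        chunk = data[offset:offset + part_size]
        parts.append({
            "index": len(parts) + 1,
            "data": chunk,
            "checksum": hashlib.sha256(chunk).hexdigest(),
        })
    return parts

def verify_part(part) -> bool:
    """Re-compute the checksum after upload to detect corruption in transit."""
    return hashlib.sha256(part["data"]).hexdigest() == part["checksum"]

# A 20 MiB object splits into three parts (8 + 8 + 4 MiB), verifiable independently.
parts = split_into_parts(b"x" * (20 * 1024 * 1024))
assert all(verify_part(p) for p in parts)
```

Because each part is checksummed on its own, the parts can be uploaded in parallel and any corrupted part can be re-sent individually rather than re-uploading the whole object.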
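The shuffle-sharding idea — every object gets two randomly allocated regions, so one failure never takes out both — can be sketched like this. The region names and the seeding-by-key trick are assumptions for illustration, not how S3 actually assigns placement.

```python
import random

REGIONS = ["us-east-1", "us-west-2", "eu-west-1", "ap-south-1"]  # hypothetical pool

def assign_regions(object_key: str, k: int = 2):
    """Deterministically pick k distinct regions per object (shuffle sharding)."""
    rng = random.Random(object_key)  # seed by key so the assignment is stable
    return rng.sample(REGIONS, k)

def read(object_key: str, healthy_regions: set):
    """Serve from the first assigned region that is still healthy."""
    for region in assign_regions(object_key):
        if region in healthy_regions:
            return region
    raise RuntimeError("all assigned regions are down")
```

Because each object's pair of regions is chosen independently, a single region failure only degrades the objects assigned to that region, and each of those still has a healthy second region to fall back on. A slow region can likewise be abandoned mid-request in favor of the other assignment.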
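The heat-management trick — seeding a newly added shard with older, colder data instead of hot recent data — can be sketched as below. The timestamp-as-heat-proxy and the even-share target are assumptions made purely for illustration.

```python
def rebalance(shards: dict, new_shard: str) -> dict:
    """Move the oldest keys (lowest timestamp) into a newly added shard.

    shards maps shard name -> {key: timestamp}. Older keys are assumed
    colder, so giving them to the new shard avoids creating a hot spot.
    """
    all_keys = [(ts, key, shard)
                for shard, keys in shards.items()
                for key, ts in keys.items()]
    all_keys.sort()  # oldest first
    target = len(all_keys) // (len(shards) + 1)  # even share for the new shard
    shards[new_shard] = {}
    for ts, key, shard in all_keys[:target]:
        shards[new_shard][key] = shards[shard].pop(key)
    return shards
```

Had the new shard instead received the newest keys, the transcript's observation would apply: most reads target recent data, so the new shard would immediately become the hottest one.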
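The erasure-coding arithmetic from the transcript checks out and can be written down directly. With a maximum-distance-separable code such as Reed-Solomon (the specific code is an assumption; the transcript only says "error-correcting codes"), any five of the nine stored parts suffice to rebuild the file.

```python
DATA_PARTS = 5    # the file is split into 5 data parts
PARITY_PARTS = 4  # 4 extra error-correcting-code parts are added

total_parts = DATA_PARTS + PARITY_PARTS      # 9 parts stored in total
erasure_overhead = total_parts / DATA_PARTS  # 9 / 5 = 1.8x storage
naive_overhead = 3.0                         # 3 full copies = 3x storage

# An MDS code can rebuild the file from any DATA_PARTS of the stored parts,
# so up to PARITY_PARTS losses are tolerated.
max_tolerated_losses = total_parts - DATA_PARTS  # 4

assert erasure_overhead == 1.8
assert erasure_overhead < naive_overhead
```

The same durability target is met at 1.8x the storage cost instead of 3x, which at exabyte scale is an enormous saving.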
For more system design videos, you can check out the link below. Thanks for watching.

Other videos by this author: Gaurav Sen
