Kafka Replication and Min Insync Replicas | Learn Kafka | Kafka Tutorial


Segment 1 (00:00 - 05:00)

Hello and welcome to another video on the channel. Today we are going to talk about replication in Kafka.

So what is replication? It basically means keeping a copy of the same data on multiple machines that are connected over a network. Why do we even bother with replication? Well, it has a few very important advantages. First, replication reduces latency: you can keep the data geographically close to your users and that way reduce response times. Second, it increases the overall availability of your system: if some parts of the system are down for whatever reason, the system can still continue working if the data is properly replicated. Third, it helps increase the overall read throughput of an application. You might have noticed that in typical applications the ratio of reads to writes is very high, so far more read requests come in than write requests. We can create more replicas to help serve those read requests efficiently.

Coming back to replication in Kafka, let us look at a diagram. As you can see, we have something known as the Kafka cluster, which basically consists of multiple Kafka brokers. If you remember our Kafka introduction, Kafka brokers are where the topics are located. These topics have partitions, and we write data to these partitions. If a broker goes down, the partitions on it are gone and we can end up losing data. To prevent this loss of data we need replication, and to achieve replication in Kafka we need to replicate partitions across multiple brokers, because partitions are where the data is actually stored.

When a particular partition is replicated, it basically means that the partition is assigned to more than one broker. In the diagram, partition 0 of topic A is assigned to both broker 1 and broker 2, which are two separate brokers. When a partition is replicated across multiple brokers, one of the brokers acts as the leader for the partition and the other brokers are followers.
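To make the idea of assigning a partition to more than one broker concrete, here is a minimal sketch in plain Python. It is a toy model, not Kafka's actual assignment algorithm: the function name `assign_replicas` and the simple round-robin placement are assumptions made for illustration. It does reproduce the layout from the diagram, with broker 1 leading partition 0 and broker 2 leading partition 1.

```python
def assign_replicas(num_partitions, broker_ids, replication_factor):
    """Toy round-robin placement (hypothetical helper, not Kafka's real logic).

    Each partition gets `replication_factor` distinct brokers; the first
    broker in the list is treated as the leader, the rest as followers.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")

    assignment = {}
    for partition in range(num_partitions):
        replicas = [
            broker_ids[(partition + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
        assignment[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


# Two partitions of one topic, two brokers, every partition stored twice.
assignment = assign_replicas(num_partitions=2, broker_ids=[1, 2], replication_factor=2)
for partition, roles in assignment.items():
    print(partition, roles)
```

With two brokers and a replication factor of 2, losing either broker still leaves a full copy of every partition on the other one.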
So in this case broker 1 is the leader of partition 0 and broker 2 is the follower. The messages received by the leader are also copied over to the followers to achieve replication.

So what is the difference between leaders and followers? There is little difference on the consumer side, because consumers can consume messages from the leader broker or even from the followers (by default they read from the leader; reading from followers has to be enabled). Producers, however, can only publish messages to the leader broker. In the diagram, our producer publishes messages for partition 0 to broker 1 and for partition 1 to broker 2.

At this point you might be wondering what happens if the leader broker goes down for some reason. In that case a new leader is elected from among the remaining followers. This happens using consensus algorithms, which we will talk about in a later video. The main takeaway is that each partition has a single leader at a time.

Now, this concept of replication is incomplete without understanding one very important term: Kafka in-sync replicas. So what is a Kafka in-sync replica? Let us look at it with the help of another diagram. In this diagram, partition 1 is replicated across three brokers: broker 1, broker 2 and broker 3. If you look carefully, broker 1 and broker 2 have the complete data. Broker 3, on the other hand, has a couple of missing records; for some reason it has not been able to replicate the partition's data completely. So in this case broker 1 and broker 2 are in-sync replicas of partition 1.
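The situation in this second diagram can be sketched as a snapshot comparison of how far each replica's log reaches. This is a simplified illustration, not how a broker actually decides membership: the offsets are made up, the helper name `in_sync_replicas` is hypothetical, and broker 1 is assumed to be the leader because the diagram description does not say which broker leads partition 1.

```python
def in_sync_replicas(log_end_offsets, leader_id):
    """Return the brokers whose log has caught up with the leader's log.

    `log_end_offsets` maps broker id -> offset of the next record to be
    written on that broker (illustrative numbers, not real Kafka state).
    """
    leader_offset = log_end_offsets[leader_id]
    return sorted(
        broker
        for broker, offset in log_end_offsets.items()
        if offset >= leader_offset
    )


# Brokers 1 and 2 hold all five records; broker 3 is missing two of them.
offsets = {1: 5, 2: 5, 3: 3}
isr = in_sync_replicas(offsets, leader_id=1)
print(isr)  # [1, 2]
```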
Broker 3 is not an in-sync replica. Basically, Kafka in-sync replicas are the replicas of a partition that are currently in sync with the leader. In other words, a replica is considered in sync if it has fully caught up with the leader within a certain amount of time. It goes without saying that a leader is always an in-sync replica, because the leader has the complete data. A follower, however, is an in-sync replica only if it has fully caught up with the partition it is following; a follower cannot be behind on the latest records and still be called an in-sync replica.

So what happens if a follower fails for some reason? Followers replicate data from the leader to themselves by sending periodic fetch requests. If a follower fails, it will stop sending these requests, and after

Segment 2 (05:00 - 05:00)

a set amount of time it will be removed from the list of in-sync replicas. This is the reason why in our diagram we have labeled only broker 1 and broker 2 as in-sync replicas. Broker 3 is basically an out-of-sync replica of partition 1. Mind you, broker 3 is still a replica; it is only out of sync. It is possible for an out-of-sync replica to catch up and come back into the list of in-sync replicas.

So I hope the concepts of replication and in-sync replicas are sufficiently clear. We will be using both of these concepts in the next video, where we will talk about the Kafka producer acks property.

If you liked this video, consider sharing it with friends and colleagues. Also, don't forget to subscribe to the channel and press the bell icon. See you next time.
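The removal of a failed follower after a set amount of time, and its return once it catches up, can be sketched with a small time-based tracker. This is a toy model under stated assumptions: the class `IsrTracker`, its method names and the 30-second limit are all hypothetical, and timestamps are passed in by hand instead of being read from a clock. In Kafka itself the broker setting `replica.lag.time.max.ms` plays the role of this time limit.

```python
class IsrTracker:
    """Toy model of time-based in-sync tracking (not Kafka's implementation)."""

    def __init__(self, leader_id, follower_ids, max_lag_seconds):
        self.leader_id = leader_id
        self.max_lag_seconds = max_lag_seconds
        # Time at which each follower was last fully caught up with the leader.
        self.last_caught_up = {follower: 0.0 for follower in follower_ids}

    def record_fetch(self, follower_id, now, caught_up):
        """Note a fetch request; only a fully caught-up fetch resets the timer."""
        if caught_up:
            self.last_caught_up[follower_id] = now

    def isr(self, now):
        """The leader is always in sync; followers only if caught up recently."""
        members = [self.leader_id]
        for follower, seen in sorted(self.last_caught_up.items()):
            if now - seen <= self.max_lag_seconds:
                members.append(follower)
        return members


tracker = IsrTracker(leader_id=1, follower_ids=[2, 3], max_lag_seconds=30.0)
tracker.record_fetch(2, now=100.0, caught_up=True)
tracker.record_fetch(3, now=100.0, caught_up=True)
isr_before = tracker.isr(now=110.0)         # both followers are in sync

tracker.record_fetch(2, now=140.0, caught_up=True)
isr_after_failure = tracker.isr(now=145.0)  # broker 3 stopped fetching

tracker.record_fetch(3, now=150.0, caught_up=True)
isr_recovered = tracker.isr(now=155.0)      # broker 3 caught up again

print(isr_before, isr_after_failure, isr_recovered)
```

Broker 3 stays a replica of the partition the whole time; only its membership in the in-sync list changes.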

Author: ProgressiveCoder
