# Don't Let Your Microservices Run Wild: Learn Monitoring Basics! | Microservices Primer Course

## Metadata

- **Channel:** sudoCODE
- **YouTube:** https://www.youtube.com/watch?v=fQRf1d6QxqM
- **Source:** https://ekstraktznaniy.ru/video/38982

## Transcript

### Segment 1 (00:00 - 05:00) []

Talking about microservices architecture seems so fascinating, but what happens when there are outages in production? Let's take an example and understand the real scenario when things go wrong in a microservices architecture in production.

Let's say that you live in a house where there is just one washroom and one tap in the whole house. If there is a leak in that house, you will know where that leak is coming from, because there is just one tap. Now say you live in a huge penthouse where there are so many washrooms and so many taps. Whenever a leak happens, you have to figure out where it is happening: you have to go to different rooms and different washrooms and figure out what went wrong. A similar thing happens in the case of microservices as well.

This is a monolith. You have the whole application code on one machine with your database. It might be that you have the database on a different machine, but at the end of the day you have something like two different deployments. If anything goes wrong, you know which logs to check, which machine's CPU and memory to check, and where to go to find the issue, because it is just one huge deployment.

Whenever an outage happens in microservices, it is not easy to pinpoint just the right service, because you have so many services and so many deployments. Some of the deployments of microservices could look like this: you can have just one service on one instance, or you can have one service on different instances behind a load balancer, or you can have different services on different instances behind different load balancers and API gateways. If anything goes wrong, you need to figure out where the problem is. That is where the concept of monitoring and observability comes into the picture.

When we just talk about monitoring, it is very easy to implement monitoring when you have one deployment or a few services: you know what to monitor. Observability, on the other hand, is a different concept altogether. When we talk about
observability, you have to implement a system so that you can observe the system as a whole to see if it is working as expected or not. Some of the tools that could be used to bring observability into a system are these: you can have different metrics, like how the CPU, memory, reads, writes, and transactions are behaving on different machines and what the metrics of certain instances are reporting. You can have logging in your system. You can implement tracing to track the different flows of your system and see up to what point a certain flow is executing and at what point it is failing.

The important part about observability is that you shouldn't get lost in the tools. These are some of the tools that you can use to have observability, but you have to make use of the tools in order to build observability. Let's see how that can be done. It can be done through log aggregation, metric aggregation, implementing alerts in the system, and having some kind of tracing in the system. These are the fundamental tools that any company would use. There are also companies that do this for you, for example Datadog or New Relic. These are companies that will tie into your systems and provide all this data so that you have proper monitoring of your systems.

If you talk about log aggregation, we would have something like different deployments of one service. That service would be producing logs, and there would be some internal daemon collecting all the logs and forwarding them to a tool. That's how you collect all the logs. But you have to do that not just for one service but for all the services, and you have to build aggregated logs. So if an order is placed, a notification is sent, and a payment is made, you have to see all those logs in your system, where you have logs saying the order service placed or accepted the order,
the payment service made the payment, the notification service sent the notification, and so on.

Very similar to logs, you also need to aggregate the metrics so that you can correlate them. Let's say that the notification DB is down for some reason. You might want to know why it went down. If you have aggregated metrics of all the services and all the systems, you might notice that there was a huge throughput in orders: a lot of orders were placed in a small window of time, and the order service was able to handle that because it was scaled properly. However, the notification DB couldn't take that load from the order service, and hence it went down, or maybe it stopped accepting connections and started throwing errors. If you have aggregated metrics and you know that at a certain time the order throughput increased and the DB went down, that means there is a correlation between these two events. If you aggregate all the metrics and have different dashboards to track those metrics, it will be easier for you to find the issues in the system.

One more approach that is used in observability is tracing, where any time a service is implementing a business flow, it produces a certain trace for it. Sometimes logs can be used as traces as well; the data in the logs can be used as traces. And then there would again be one common tool to collect those traces, so that you can figure out the series of events that took place in

### Segment 2 (05:00 - 05:00) [5:00]

different systems, where the problem actually happened, and also how the different services are interacting with each other.

These are the fundamentals of monitoring and observability. In reality, there is a whole team and a whole domain dedicated to this. You may have heard of the domain known as SRE, site reliability engineering. That whole team does this for the other teams in the company, and it is a big domain in itself. If you want to know more about it in detail, I can create a dedicated video on it. Till then, take care. See you in the next video.
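
The log aggregation and tracing idea from the transcript (order placed, payment made, notification sent, then finding where a flow failed) can be sketched roughly as follows. This is a minimal illustration, not something shown in the video: the `LogRecord` fields, the `correlation_id` that ties one business flow together, the service names, and the sample records are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    timestamp: float       # illustrative values only
    service: str           # emitting service, e.g. "order-service" (hypothetical name)
    correlation_id: str    # hypothetical ID shared by every log line of one flow
    level: str             # "INFO" or "ERROR"
    message: str


def aggregate(records):
    """Group records from all services by correlation ID, ordered by time."""
    flows = defaultdict(list)
    for record in records:
        flows[record.correlation_id].append(record)
    for flow in flows.values():
        flow.sort(key=lambda r: r.timestamp)
    return dict(flows)


def first_failure(flow):
    """Return the first ERROR record in a time-ordered flow, or None."""
    return next((r for r in flow if r.level == "ERROR"), None)


# Made-up records, as per-service log-forwarding daemons might deliver them
# to a central tool (arrival order is not guaranteed to match event order).
records = [
    LogRecord(3.0, "notification-service", "order-42", "ERROR", "notification DB refused connection"),
    LogRecord(1.0, "order-service", "order-42", "INFO", "order accepted"),
    LogRecord(2.0, "payment-service", "order-42", "INFO", "payment made"),
    LogRecord(1.5, "order-service", "order-43", "INFO", "order accepted"),
]

flows = aggregate(records)
for correlation_id, flow in flows.items():
    path = " -> ".join(r.service for r in flow)
    failure = first_failure(flow)
    status = f"failed in {failure.service}: {failure.message}" if failure else "no errors"
    print(f"{correlation_id}: {path} ({status})")
```

Grouping by a shared ID is what lets logs double as a simple trace: the time-ordered records of one flow show which services it passed through and where it stopped. Hosted tools such as the ones named in the transcript do this at much larger scale and with their own data models.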