About the Channel:
DevOps and Cloud Labs empowers you to thrive in the dynamic world of DevOps and Cloud Computing. We offer comprehensive learning resources covering AWS, Azure, and essential DevOps tools like Ansible, Packer, SonarQube, Trivy, Jenkins, Datadog, GitHub, GitHub Actions, GitOps, Argo CD, JFrog Artifactory, Terraform, Kubernetes, and Docker.
Our content includes tutorials, shorts, realistic mock interviews for everyone from freshers to senior DevOps and cloud engineers, podcasts, downloadable PDFs, and books.
We focus on practical application through complex, industry-grade projects, encompassing 3-tier architectures, database management (RDS, DynamoDB), data warehousing, secrets management, Azure DevOps, CI/CD pipelines, and various deployment strategies.
Whether you're a fresher starting your journey or an experienced professional looking to deepen your expertise, DevOps and Cloud Labs is your trusted partner.
Prepare for DevOps interviews with our resources, including interview recordings, and gain insight into the future scope of DevOps. Explore AWS and DevOps practice projects, Azure, Terraform, Kubernetes (K8s), Docker, and IaC (Infrastructure as Code) with us. Subscribe now and unlock the power of DevOps and Cloud!
Contents (8 segments)
Segment 1 (00:00 - 05:00)
Okay. Thank you for joining. I'll be taking your interview today. Can you please briefly introduce yourself and your DevOps journey? After that, we'll proceed with the interview.
Yeah, sure. Thank you. I'm Somshekar. I graduated from BMS College of Engineering, and I have around five and a half years of experience as a DevOps engineer. I worked at Concentrix as a DevOps engineer and later at IBM as a DevOps engineer. My core focus has been on infrastructure automation, CI/CD implementation, and container orchestration. In my career I have worked extensively with AWS services such as EC2, VPC, S3, and EKS to build highly available and scalable solutions. I also have experience in containerization using Docker, where I write optimized Dockerfiles. I have also managed and monitored end-to-end CI/CD pipelines using Jenkins declarative syntax. These pipelines automate the build, test, and deployment of the applications. For infrastructure as code I use Terraform to provision and manage the cloud infrastructure, and for configuration management and automating server provisioning I use Ansible. I also ensure system reliability through Prometheus and Grafana monitoring. Coming to my day-to-day activities, I check the Prometheus dashboard and CloudWatch alerts for any incidents such as a pod crash or high CPU. I try to resolve them myself; otherwise I connect with the developers to resolve the issue. We follow Agile methodology with two-week sprints, so I check the Jira dashboard to review the tickets assigned to me and prioritize them based on story points and sprint goals. Within the sprint my work revolves around Kubernetes operations and pod management.
This includes managing the Kubernetes cluster, deploying and updating applications through Helm and Argo CD, tuning resource limits and the HPA for better performance, debugging errors like CrashLoopBackOff or OOMKilled, and configuring and reviewing the deployment and networking files. In addition, I monitor CI/CD pipelines in Jenkins and maintain Terraform configurations for automating infrastructure such as EC2, VPC, and S3. I work closely with the developers to improve the system, document the findings, and mentor juniors. That is my introduction and my roles and responsibilities.
Okay, perfect. Thank you for the detailed introduction. We'll start with AWS and then cover the other technologies. Can you tell me the difference between security groups and NACLs in AWS?
A security group is a virtual firewall at the instance level. It is stateful, so if inbound traffic is allowed, the corresponding outbound traffic is automatically allowed. A NACL is a virtual firewall at the subnet level. It is stateless, so we need to explicitly allow both inbound and outbound traffic.
Okay, perfect. Suppose you're storing a lot of files in S3, for example log files, and it is costing you a lot. What steps can you take to bring the cost down to a significant level?
To optimize the cost, I set up S3
Segment 2 (05:00 - 10:00)
policies. There are multiple storage classes, such as Standard, Standard-IA, One Zone-IA, Intelligent-Tiering, and S3 Glacier. Based on how often the logs or data are accessed, we move them across the storage classes. In this way we can reduce the cost.
Okay, got it. Do you know what lifecycle management rules are in AWS S3?
Yes. Once an object has been uploaded to an S3 bucket, we can configure a rule so that after X days the object moves to a lower-cost storage class. If versioning is enabled, we can define the number of versions to keep for X days, and we can delete all the versions after X days. This way we can do it.
Okay. So basically, for files that are not important to us and that we are not accessing frequently, we can put lifecycle rules in place that automatically switch the storage class from Standard to another class. Okay, perfect.
Now, you have been assigned to a new project where the client wants to deploy their critical application to the AWS cloud, and they want a multi-region setup that is highly available and fault tolerant. How would you design a three-tier architecture that is multi-region as well as highly available and fault tolerant? And what security services can you use to secure it?
Okay. To design a highly available and scalable solution in the cloud, I'll start with networking, where I'll take a VPC with one public subnet for the web layer, one private subnet for the application layer, and another private subnet for the database layer.
We can deploy this across multiple Availability Zones. We can add an IGW, that is, an internet gateway, for the public subnets and NAT gateways for outbound access from the private instances; in this way we establish internet access. As we know, security groups and NACLs control the inbound and outbound traffic to the instances and the subnets. I'll use a load balancer to distribute the load across the instances, which will be deployed in multiple zones, and I'll implement Auto Scaling groups so that instances can be added or terminated based on the traffic. This makes the solution highly available and scalable. We can use EC2 or EKS for the backend servers. We can use WAF, the web application firewall, which protects the application from web attacks. Then we can use Route 53 for the domain name system, S3 for application storage, and RDS for database storage
Segment 3 (10:00 - 15:00)
and RDS for the relational database. Then we can use CloudFront. Sorry. Yeah, okay, that's it.
Okay. We were talking about a multi-region setup, right? You've explained that you will create a lot of resources: your application load balancer, Route 53, S3, EKS, and everything. But how would you make sure that it is a multi-region setup? If the primary region goes down, how would you make sure the traffic automatically switches over to the secondary region? Where would you make that configuration?
We need to create the same resources in another region as well. Then we can change the domain name to route the traffic to that region, in the routing policies.
Okay, so where in Route 53 will you make this configuration?
We can use a routing policy; we can change it there.
Okay. Which routing policy can help you achieve it?
For this we can use a failover routing policy. There are two records, one primary and one secondary. If the primary goes down, the traffic will be routed to the secondary region.
Okay. So will it be an active-active setup or an active-passive setup? Which approach will you choose, and why?
I'm not sure. I need to check this.
Okay. Whenever a client comes to you and asks for a multi-region setup, there are two ways you can set it up. You can have active-active, where both regions receive traffic and serve users. So you will be maintaining a similar architecture in both regions.
For example, your EKS cluster will be running, and you'll have all the replicas of the application running in both regions, and users will be served from both regions. That is an active-active setup. Right?
Okay, so both regions will be in sync.
Right. Both regions will be synchronized, and they will be serving users in real time. This brings a lot of cost, because the application spans multiple regions and multiple instances of each layer get created, right? The other option is active-passive, where the active region is your primary region and the secondary region is passive. If your primary region goes down, traffic automatically switches over to the secondary region, where you maintain the minimum number of resources.
So where will we be updating the configuration to change to the secondary region?
It is in Route 53 only. Whenever you create a record for your application, you can create it with a failover routing policy, and there you can define the secondary region's application load balancer, if that is your front end. If you're using a CDN, you can configure the CDN's DNS name to map to the domain name. Also, Route 53 runs health checks at a regular interval, and as soon as it finds that the primary region is not accessible and is not able to serve requests, it automatically fails over to the secondary region. So this is the Route 53 intelligence that achieves it.
Okay, thank you.
Okay.
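As a rough illustration of the active-passive failover described above, here is a minimal Python sketch of the decision a failover routing policy makes. It is a toy model, not the Route 53 API: the `FailoverRecord` class, the `resolve` function, and the `example.com` host names are all hypothetical, and the case where both records are unhealthy is not modelled.

```python
from dataclasses import dataclass


@dataclass
class FailoverRecord:
    role: str      # "PRIMARY" or "SECONDARY"
    target: str    # hypothetical load balancer DNS name
    healthy: bool  # outcome of the latest health check


def resolve(records):
    """Pick the target a failover-style policy would answer with (toy model)."""
    by_role = {r.role: r for r in records}
    primary, secondary = by_role["PRIMARY"], by_role["SECONDARY"]
    # Serve the primary while it passes its health check, else fail over.
    return primary.target if primary.healthy else secondary.target


records = [
    FailoverRecord("PRIMARY", "alb-primary.example.com", healthy=True),
    FailoverRecord("SECONDARY", "alb-secondary.example.com", healthy=True),
]
print(resolve(records))      # primary region answers
records[0].healthy = False   # simulate the primary region going down
print(resolve(records))      # traffic now goes to the secondary region
```

In an active-passive setup only the health-check result changes the answer, which is why the secondary region can run with minimal resources until it is needed.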
So now your application is running on the AWS cloud, and you start getting malicious traffic from one public IP.
I'm sorry, I didn't get it. Can you come
Segment 4 (15:00 - 20:00)
again.
Yeah, I'm saying that your application is running on AWS, and you started getting malicious traffic from a specific IP. Now you want to block that IP at the firewall level. How can you do that?
We can blacklist that IP. We can enable WAF or Shield for the application, and we can blacklist that IP there.
Apart from WAF and Shield, can you think of any other service or native firewall in AWS that you can use to achieve it? We talked about it recently.
Okay. Security groups; we can block it there. Allow or deny.
A security group is not capable of blocking IPs. So this is NACL.
The network ACL would be able to block it. Yes.
Yeah, NACL is the right one. A NACL has the capability to block it, and with the help of a NACL we can block that IP. Okay. So now you are configuring monitoring for your application using AWS CloudWatch. Can you tell me the different metrics that you'll create for effectively monitoring your infrastructure, apart from CPU and memory?
Yes. One is application-specific logs, like the number of failed requests, the number of active connections, volume read operations and volume write operations, and network in and network out. It depends on the application; we can use some application-specific metrics as well.
Okay. What else can you think of from the infrastructure point of view?
Coming to S3, the number of objects in the bucket and the bucket size. There is also a metric for object access, but I forgot the name. Then EBS volumes are done, S3 is done, and EC2 is about CPU and memory usage. Yeah, that's it.
Okay. Besides CPU and memory, you can also configure a metric for the number of requests that are coming to the load balancer.
You can also create a metric for response time, that is, how much time it takes to respond. Right, so latency. Similarly, you can configure metrics for 5XX and 4XX error codes to get the count of requests that are failing with a 4XX or 5XX. That will help you monitor your infrastructure in a better way. Okay. Do you have some experience with Terraform?
Yes, I have some experience.
Okay. Can you tell me the different ways of managing your state file in Terraform?
Okay. We can store the state file locally as well as remotely, in S3. These are the two options we have to store the state file. And to manage different state files we can use workspaces, which let you use different workspaces while keeping the same Terraform configuration.
Okay, so why do we use Terraform workspaces? What benefits do they provide?
They let us reuse the same Terraform scripts by just changing the state file, so the configuration becomes reusable across different environments.
Okay. So you are saying that with the help of workspaces we can create multiple environments and use the same Terraform configuration that we have written.
Yes, yes.
You mentioned the state file, but Terraform workspaces allow us to create different environments. If our application has three environments, dev, stage, and prod, then with the help of Terraform workspaces we can create different
Segment 5 (20:00 - 25:00)
namespaces, or rather workspaces, and each will be an entirely different environment.
Yes, we can create them in different workspaces.
Yeah, so you can create a workspace for each specific environment. In this case, dev, stage, UAT, prod, and so on.
But in this case we need to use the same Terraform scripts, right?
Yeah, but we would have a different variables.tf maintained for each environment, and accordingly we can use the environment-specific variables.tf and provision the resources using the same Terraform code.
Okay, so there will be environment-specific variables.
Right, correct. Okay, perfect. Suppose a new engineer has joined your team, and by mistake he went to the AWS console and deleted the S3 bucket that was holding the state file. Can you tell me how you would recover your infrastructure or state file?
So he has deleted the S3 bucket manually.
Yeah, manually.
This is called drift in Terraform, where the actual infrastructure doesn't match the state that is recorded in the state file. When we run terraform plan, the output shows the drift, that is, which resources the changes will be applied to. We can use terraform import, which brings the manually changed resources under Terraform control. Or else we can use terraform apply, which will update the infrastructure.
Okay. So if you have no other backup and your state file has been deleted, then there is only one way to get your infrastructure back under management, which is terraform import.
First you have to import all the resources that were provisioned using Terraform, so that you can manage them further. Okay, fine. So what best practices can you implement so that this kind of event does not happen?
To prevent drift, we can avoid changes through the console. Every change should go through the script. That is the first thing. Second, we can use meta-arguments like ignore_changes, which is used when we want Terraform to ignore manual changes. And everything should go through the script only, through CI/CD pipelines.
Okay, no, I was talking more about the new team member who joined and deleted the state file. What best practices can you follow so that this kind of event can be prevented in the future?
Providing limited access to the AWS console for the newly joined engineer. We can also implement multi-factor authentication for deleting or changing resource configurations.
And what else? You can also enable versioning on your S3 bucket, so that if the state is deleted, you still have a backup version and can just recover it. Okay. So can you tell me what state locking is in Terraform, and why we use it?
When we enable state locking, it prevents multiple people from reading and writing the state at the same time. If it allowed that, then the state
Segment 6 (25:00 - 30:00)
file will be corrupted by the concurrent writes.
I'm getting a call from someone from your team.
You can ignore it for a while and pick it up after the interview.
Okay, sure.
Okay, got it. So how can you enable state locking? What do we use for it?
In our project, we have implemented state locking in the S3 bucket by setting use state lock equal to true in the configuration. That is how we have enabled state locking.
Okay, perfect. Do you have experience with Docker and Kubernetes?
Yes, yes.
Okay. Suppose you are setting up a new application and writing a Dockerfile for it, and there are two options, ENTRYPOINT and CMD. Can you tell me the difference between the two?
ENTRYPOINT and CMD both define the default command executed in the container. CMD can be overridden at run time, whereas ENTRYPOINT cannot. Any command passed at run time becomes a parameter for the ENTRYPOINT that is defined in the Dockerfile.
Okay. And what best practices can you follow while writing a Dockerfile?
First, we should use efficient base images, like Alpine or slim images. Second, we can combine multiple RUN commands into a single command. Third, we can place the frequently changing instructions at the end of the file. Then we can implement a multi-stage build, where one stage is used to build the application and in another stage we copy only the files required at run time. So this is what we can use.
Okay.
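The ENTRYPOINT and CMD behaviour discussed above can be sketched in a few lines of Python. This is only an illustrative model of how the exec forms are combined, not Docker's implementation; the function name `effective_command` is hypothetical. Note that `docker run` also has an `--entrypoint` flag that replaces the entrypoint itself, which this sketch does not model.

```python
def effective_command(entrypoint, cmd, run_args=()):
    """Combine exec-form ENTRYPOINT and CMD the way a container start does.

    Arguments given at run time replace CMD; ENTRYPOINT stays in front.
    """
    args = list(run_args) if run_args else list(cmd)
    return list(entrypoint) + args


# Image with ENTRYPOINT ["ping"] and CMD ["localhost"]
print(effective_command(["ping"], ["localhost"]))                   # ['ping', 'localhost']
print(effective_command(["ping"], ["localhost"], ["example.com"]))  # ['ping', 'example.com']
# Image with only CMD: run-time arguments replace the whole command
print(effective_command([], ["echo", "hi"], ["ls", "-l"]))          # ['ls', '-l']
```

The second call shows the point made in the answer: with an ENTRYPOINT set, whatever is passed at run time becomes its parameters instead of replacing it.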
In Kubernetes, you are going to deploy a fresh application, and you need to decide whether to use a Deployment or a StatefulSet. Which option will you choose, and why?
Which application?
You're setting up a new application that you need to deploy in Kubernetes. There are two options to configure it: one is a Deployment, and the other is a StatefulSet. Can you tell me which option you will choose, and why?
Okay. It depends on whether the application is stateful or stateless. If it is a stateless application, where the user session is not kept on the server side and each request is processed independently, I'll go with a Deployment. When the application is stateful, where the user session is maintained on the server side and it requires persistent storage and a sticky identity, I'll use a StatefulSet.
Okay, perfect. Now you're deploying your application, and the pod is stuck in the Pending state. What could be the possible reasons for it, and how would you debug it?
The Pending state comes when the scheduler can't find a node to schedule the pod on because of node constraints such as CPU or memory limits, or there might be taints and tolerations involved. So first I'll try to increase the node resources or reduce the resource requests of the pod.
Segment 7 (30:00 - 35:00)
If there is a tainted node, I'll add the matching tolerations to the pod. In this way I can clear this up.
Okay. All the reasons you mentioned are correct. But in our case, we don't have any taints or tolerations, and we don't have any resource-related issues either. We are still getting the Pending state. What else can you think of?
I'd check the networking, whether the kubelet is able to communicate with the API server or not. I think there may be a network connectivity issue.
No, that is not the case. If your pods are showing in the Pending state, they have not been scheduled, so the kubelet has not come into the picture yet, right? Think of it from the perspective of the Kubernetes components. Is there any component that may not be working as expected, due to which the pod cannot be scheduled?
The scheduler.
So how would you check whether the scheduler is not working? Can you tell me in which namespace we deploy the control plane components?
In kube-system. In kube-system, I'll check the scheduler pods and the scheduler logs there.
Correct. There could be the case that the scheduler is not working. That's why the pods are not able to come up and are showing in the Pending state. To check whether the scheduler is working, we list the pods in the kube-system namespace and verify whether the scheduler pod is running. If not, we can debug further by checking the logs of the scheduler pod to find the exact issue.
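The Pending-state causes covered in this exchange (insufficient resources, untolerated taints, and a scheduler that is not running) can be summarised in a small Python sketch. It is a deliberately simplified model rather than the real scheduler: taints are plain strings that must appear verbatim in the pod's tolerations, and the function name `why_pending` and all field names are hypothetical.

```python
def why_pending(pod, nodes, scheduler_running=True):
    """Return reasons a pod stays Pending; an empty list means some node fits."""
    if not scheduler_running:
        return ["scheduler is not running, so nothing assigns the pod to a node"]
    reasons = []
    for node in nodes:
        untolerated = [t for t in node["taints"] if t not in pod["tolerations"]]
        if untolerated:
            reasons.append(f"{node['name']}: untolerated taint {untolerated[0]}")
        elif node["free_cpu_m"] < pod["cpu_m"] or node["free_mem_mi"] < pod["mem_mi"]:
            reasons.append(f"{node['name']}: insufficient CPU or memory")
        else:
            return []  # this node passes every check
    return reasons or ["cluster has no nodes"]


pod = {"cpu_m": 500, "mem_mi": 256, "tolerations": []}
nodes = [
    {"name": "node-1", "free_cpu_m": 200, "free_mem_mi": 1024, "taints": []},
    {"name": "node-2", "free_cpu_m": 2000, "free_mem_mi": 1024, "taints": ["dedicated=db"]},
]
print(why_pending(pod, nodes))
print(why_pending(pod, nodes, scheduler_running=False))
```

The scheduler check comes first because, as the interviewer points out, no per-node reason matters if the component that assigns pods to nodes is down.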
Okay. So now you have two services running in your Kubernetes cluster, app A and app B. App A wants to communicate with app B, but when it tries, the communication fails. What could be the possible reasons for it?
I'll check the Services first. To enable pod communication within the same cluster, we need a ClusterIP Service. So first I'll check the networking files of the Services. I'll check the labels and the selectors first, whether they match or not.
What else would you check?
The port; I'll check whether it is correct.
You are saying that the labels and selectors are not correct, which is why the Service does not get its endpoints, and due to which the request is not able to reach the app B pods, right? A very genuine reason. But this is not the issue in our case.
So, ports. Whether the ports are defined correctly and whether they match.
Okay, correct. If the port is not correct, the issue can also occur. But the ports are also correct.
Okay. So then...
Okay. There could be the case that a network policy has been applied that is restricting the traffic. So we will need to check
Segment 8 (35:00 - 39:00)
what kind of network policy is assigned.
Where will this network policy be defined? I didn't get that. It will be in the Service YAML file, right?
No. A network policy is a different object in itself in Kubernetes.
Okay, that is what you are saying.
No, Ingress is a different thing. Just as we have Pods, Deployments, and Services, we have the NetworkPolicy object in Kubernetes. It allows you to control the communication among the applications. By default, all the applications deployed in the cluster can communicate with each other. But we may not want that. Suppose there are multiple namespaces, and two different teams are working in the same cluster. The team A application runs in the namespace team A, and there is a different namespace, team B, where the team B resources and application run. We don't want team A to be able to access the team B resources, or team B to access the team A resources. This is where we create a network policy that restricts the traffic, so that only team A can access the resources in the team A namespace, and similarly only team B can access the resources in the team B namespace.
So you are saying this is with respect to RBAC?
RBAC is also a different thing. That is role-based access control, where we restrict access; it helps to make access fine-grained. But a network policy is a different thing. Basically, it is a way to restrict the traffic among the applications running in the cluster. Okay?
Got it.
Perfect.
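The team A and team B example above can be modelled with a short Python sketch. It is a toy illustration of the default-allow behaviour and of namespace-scoped ingress restrictions, not the Kubernetes NetworkPolicy API; the function `is_allowed`, the `ingress_policies` mapping, and the namespace names are assumptions made for the example.

```python
def is_allowed(src_ns, dst_ns, ingress_policies):
    """Decide whether traffic from src_ns may reach pods in dst_ns (toy model).

    ingress_policies maps a namespace to the set of namespaces allowed in.
    A namespace with no entry has no policy, so everything is allowed.
    """
    if dst_ns not in ingress_policies:
        return True
    return src_ns in ingress_policies[dst_ns]


policies = {"team-a": {"team-a"}, "team-b": {"team-b"}}
print(is_allowed("team-a", "team-a", policies))  # True
print(is_allowed("team-a", "team-b", policies))  # False
print(is_allowed("team-a", "team-c", policies))  # True, team-c has no policy
```

The last call reflects the interviewer's point that, without any policy in place, applications in the cluster can reach each other by default.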
So we have covered three topics today: AWS, Terraform, and Docker and Kubernetes. I'll give you the feedback. On the cloud side, you are good, but you need to work on the system design part, where you design a better architecture for an application. You explained that you would use different AWS services in your architecture, but the sequence was not correct. Whenever you are designing an architecture, you should start from scratch, from how your application will be accessed. We said you need to design a three-tier architecture, so there is a web tier, then an application tier, and then a database tier. You start with Route 53, where you register your domain and configure your DNS records, adding records like CNAME or A records as per the requirement. After adding the record, your front end would be either CloudFront, if you are caching, in which case your domain name is mapped to the CloudFront DNS name, or, if you are not using a CDN, an application load balancer, right? You'll be using a public load balancer, and in it you'll define your host-based or path-based routing. After that, you create target groups for your applications if you have multiple applications deployed in the Kubernetes cluster. After creating the target groups, you create an EKS cluster where your different applications will be running. So explain the architecture in a sequential way.
From the security point of view, you can use WAF. You can create different WAF rules and then attach them to your CDN or application load balancer, so that any request matching one of your WAF rules is blocked. That ensures the security. So the sequence matters whenever you are explaining the architecture.
Got it.
That is one. The rest is all good. But work on the metrics point of view, how you can effectively monitor the infrastructure. Think about the different metrics that you can create. Mostly, when you ask people, they'll say, "Okay, I'll create CPU and memory metrics." But apart from that, what are the custom metrics that