AWS Explained for Beginners | Start Your Cloud Career Today


Table of Contents (8 segments)

Segment 1 (00:00 - 05:00)

are going to focus on these five topics, and out of those the most important is "Overview of AWS Services". However, before we get there we first need to cover some basics of AWS, and then all those services and the architecture that we are going to see will make much more sense. Now, I know that not every one of you has enough background in the cloud, or cloud computing in general, so let's begin with cloud computing and then gradually advance toward AWS services. So, cloud computing is much like the way you consume electricity today. There is a power plant somewhere, and you consume that electricity at home, paying for as much as you use, right? Now, if you replace this electricity with computing resources, which typically means servers, then that is cloud computing, and here all those computing resources are owned and managed by the cloud service provider, and you connect to those resources over the internet. So that, in a nutshell, is cloud computing. Right! I hope that is clear. Now you must be wondering, where are these data centers from which we can consume these cloud services? That really depends on the cloud service provider. As we are talking about AWS here, AWS has data centers across the globe, and all of that falls under the AWS Global Infrastructure. So in the next few minutes let's understand this AWS Global Infrastructure, and the most important parts of it are AWS Regions and Availability Zones. Right! Whenever you want to consume AWS services you need to select an AWS Region, and as of today there are 34 AWS Regions, and this number will grow over time. Out of all these Regions you can select which Region you want to use, right? So a Region is a geographic area, and AWS has many Regions out there.
Okay, now if we talk about a Region, it is a logical entity, which means if you double-click into a Region, every Region is actually made up of multiple Availability Zones, and every Region has three or more Availability Zones. Now you must be wondering, why are there so many AZs, or Availability Zones? For that we need to understand the design of an Availability Zone. So every Region has multiple AZs, and these AZs are typically located up to 100 km from each other. That is so that if there is any outage in one AZ, maybe due to a power failure or flooding, it does not impact another AZ. So they could be on different flood plains, and at the same time the distance between these AZs is not too large, say thousands of kilometers. That is because, if that were the case, the data between these AZs could not be replicated in real time. So in order to have single-digit millisecond latency between the AZs, they need to be located within, say, 100 km of each other. This is how AZs are designed. Further, every AZ has redundant power supplies, and all these AZs are connected with a high-bandwidth network. So having multiple AZs enables high availability, and it is your responsibility to build an application which can handle an AZ-level failure. Further, as I said, because these AZs are less than 100 km apart, you could have, say, a primary database in one AZ and a replica database in another AZ, so that there can be synchronous data replication between those databases. And if you talk about AWS services, for example S3 or Aurora, they provide the highest durability for your data, and that is because AWS keeps multiple copies of your data across these AZs, right? And once you sign up for your AWS account, you get access to all AWS Regions, and as you know a Region contains AZs, so you get access to all Regions, all AZs, and all AWS services.
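To see why the "up to 100 km" figure lines up with single-digit millisecond latency, here is a rough back-of-the-envelope sketch. The 200,000 km/s figure is the approximate speed of light in optical fiber (about two-thirds of its speed in vacuum), an assumed round number for illustration:

```python
# Rough inter-AZ latency estimate based on propagation delay alone.
# Light in optical fiber travels at roughly 200,000 km/s (assumed figure).

FIBER_SPEED_KM_PER_S = 200_000

def round_trip_ms(distance_km: float) -> float:
    """Ideal round-trip propagation delay in milliseconds."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000

# AZs about 100 km apart: roughly a millisecond of pure propagation
# delay, which is why synchronous replication between AZs is feasible.
print(round_trip_ms(100))

# AZs thousands of km apart: tens of milliseconds round trip,
# far too slow for synchronous replication.
print(round_trip_ms(3000))
```

Real latency also includes switching and queuing delays, so the actual numbers are higher, but the distance-driven floor already explains the design.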
Now, if you talk about different AWS services and their relation with an AZ or Region, there are services like EC2, which is a virtual machine service and operates at the AZ level; then, if you talk about AWS managed services like DynamoDB or S3, they are Regional services, and

Segment 2 (05:00 - 10:00)

further, there are services which work across Regions, and these are Global services; and finally, some AWS services are attached to your account, for example IAM, i.e. Identity and Access Management, using which you can create an IAM user in your account, and this user will also have access to all AWS Regions and services. Similarly the Billing service, where at the end of the month you will get a single bill for all the AWS services you consumed across all these Regions. So the point here is that the AWS account is the top-level entity, and once you have that account you get access to everything in AWS, right? I hope that is clear. Okay, now that we understand the AWS global infrastructure and the AWS account, it is time to look at AWS services. By category, there are tens of categories of AWS services, and within each category, for example Compute, there are tens of AWS services, and I am not kidding, I could just keep on scrolling. So rather than just showing you AWS icons and explaining what each service does, let's do this in a slightly different fashion, and for that let's try to build an architecture for a simplified version of a social media application, and later we will see how this translates into the AWS world. And before we move forward, just a disclaimer: nowhere am I claiming that this is how a social media application architecture should be. I am just putting it forward for our understanding, right? Okay, now let's imagine that we are living in a pre-cloud era, which means the application that we will be building needs to be hosted in our on-premises data center. So for this application the first thing we need is a private data center. Now every data center, or say every company, also has its own private network, so all the servers will reside inside this private network. Now, if as a developer you are building that social media application, you will first build a beta version of it, and you will probably host that on one of the servers.
Now this server has compute capacity and a disk, and you will open access to this server for your test users, who might be internal to the company or outside of it, and in the initial days you will just have some public IP address using which users can access your application, right? So this is where you will start. Now over time, you will realize that as you are adding more functionality, you need to split your functionality into different tiers. For example, the user session, the UI, all of that is managed by the web server, and all the business logic, as in connecting different users, managing those connections, saving data to the database, all that should be handled by the application server. So you split your server into the web tier and the app tier. And as I said, you should also have a separate database, and this is typically called a three-tier architecture; as of today, most applications still follow this architecture. Right! So far so good, we now have a three-tier architecture. Now imagine that you went into production and your application really got great traction, which means your application now needs to cater to more users and serve more load, and that is where you need to scale your application. For that there are two choices: one, vertical scaling, which means you add hardware capacity, for example CPU, memory, and disk, to your existing server. Now, this vertical scaling might work for a short period of time, but beyond a certain point even vertical scaling does not improve the performance of your application, and for this you have to scale horizontally, which means you bring in additional servers and you distribute the load across those servers. So as you can see here, we now have multiple web servers and multiple application servers, so this is now a highly available and scalable architecture. Now, while this might work, there is a small problem.
Because we have two web servers, there are two IP addresses, and we do not want a situation where we hand out multiple IP addresses for our application. This means we need some kind of intelligent router which can distribute the incoming traffic across these web servers, and

Segment 3 (10:00 - 15:00)

exactly for that we bring in the load balancer. If you have ever heard about NGINX or HAProxy, those are called load balancers, also known as reverse proxies. Basically, the load balancer is the single point of entry for your application. Now all the traffic from the end users will hit the load balancer, and the load balancer will distribute that traffic to your web servers, right? So far so good. Now, one more thing here: typically we do not access an application using an IP address, right? For that we use a domain name, for example awswithchetan.com. Now how does that happen? For that you need a DNS service, which translates your domain name to the IP address. For our application as well, we will have our own domain name, say myapp.com, and we will point that domain name to the load balancer's IP address. Once you do that, your end users can access your application using this domain name. I hope that is clear, right? Right, so we solved the scaling problem for our web servers and app servers, but what about the databases? Now, because databases have state, that is, data, they are the most difficult to scale horizontally, and here I am talking about relational databases. Typically databases can be scaled vertically, but again, as we discussed, we will hit a bottleneck beyond a certain point. So there are some strategies to reduce the load on your relational database, and one of those strategies is to introduce a database cache. The way it works is that frequently accessed data is cached in memory using this cache database, so that for all those queries the database is not hit. That reduces the load on your database, and application performance also improves, because these are in-memory databases and the response time is very fast. So for this purpose you can bring in a cache database.
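As a toy illustration of what a load balancer does, round-robin distribution across backends can be sketched in a few lines. The backend addresses here are made up:

```python
import itertools

# Hypothetical backend web servers sitting behind the load balancer.
web_servers = ["10.0.1.10", "10.0.1.11"]

# Round-robin: cycle through the backends forever.
rotation = itertools.cycle(web_servers)

def route_request(request_id: int) -> str:
    """Pick the next backend for an incoming request."""
    return next(rotation)

# Six incoming requests alternate between the two servers.
targets = [route_request(i) for i in range(6)]
print(targets)  # alternates: 10.0.1.10, 10.0.1.11, 10.0.1.10, ...
```

Real load balancers add health checks, connection draining, and smarter algorithms (least connections, weighted routing), but the core idea is exactly this rotation.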
Now, apart from that, cache databases can also be used for storing user sessions, so that you do not have a dependency on any particular web server. If one of the web servers goes down, you still have that session information stored in the cache database, and that is how you can easily scale in and scale out your web tier and app tier, because all the important state, like user sessions or, say, shopping-cart information, is stored externally in the cache database, right? So here you will bring in this cache database, which will improve the performance of your application as well as of your databases. Okay, now, as this is a social media application, there will be a lot of different types of information that you need to store, for example the profiles of all the users, then the posts the users are making, then who is liking those posts. If you look at all this information, it is of different kinds and might have different schemas, and relational databases are not a great choice for storing this kind of semi-structured data; for that you will typically use NoSQL databases, because NoSQL databases allow you to store different types of data in the same database. They are schema-less, and hence for our social media application we will also have a NoSQL database. Further, because it is a social media application, every user will have friends, and those friends will have friends, right? So it is all a mesh of connections, which is also called a graph, and in order to store this kind of information you need a graph database. So the bottom line is that you are now using purpose-built databases to store the relevant information. Okay, so far we have been talking about storing user-related information in databases, but there will always be a lot of media being generated. Whenever a user uploads a picture or video, it has to be stored somewhere.
So, do you think it needs to be stored in a database, or should it be on a disk attached to the application server? If you are thinking so, that is not the best place to store your media. Your media should actually go into some external storage, and that is because this external storage can be accessed directly from the internet. So in order to access the data from this external storage, you do not need to route that data through the web servers or app servers, right? That is the benefit of external storage. And the way you will store this data is: if some user is creating a post and he or

Segment 4 (15:00 - 20:00)

she posts a video or an image, then that media file will be stored in this external storage, but the metadata of that media, for example who the user is and the link to the media file in the external storage, will be stored in your NoSQL database. So now, using that metadata, the page can be rendered, where the actual image is pulled from the external storage but the metadata comes from the database. This is how you will typically store this information. I hope that is clear. Okay, so with this external storage you are efficiently serving your media content, but as you can see here, these are your web users, and in today's world there will be users who are accessing this application from mobile phones, and as you know, most of your videos or images need to be modified in order to serve them to mobile users. That is because the video format will be different, or the image resolution will be different. This means that as new media comes in, you need to convert that media into a mobile-friendly format, right? For that you will use some codec program to convert the media, and of course for that you need some computing resources. So some processing is required to convert your media, and as you do that, you will store the converted file in a different storage location, and all the mobile users will be served from that storage, right? Okay, so with that you can now serve both the web users and the mobile users, and these users are really happy, right? But not in reality! Now consider that this data center is located in the US and your users are accessing the application from, say, Asia. You can imagine that because of the long distance the laws of physics kick in, and users might experience some latency when videos are streamed from these storage locations, which means users will not have a great experience. Now how do you solve this problem?
The only way to solve this problem is to bring the media closer to those end users, and exactly for that there is something called a CDN, that is, a Content Delivery Network. The way a CDN works is that there are CDN devices across the world, and as a user watches a video, it is cached in the nearest CDN location, and then if any other user wants to watch the same video, they are served from that cache location, and that is how the latency is reduced. Apart from that, CDNs also optimize the network path all the way from the storage location to the content delivery network devices. So a CDN greatly improves the user experience by lowering latency, and we will also bring in a CDN service. Okay, so far so good; this is much better than where we started. Now, if you have ever used a social media application, you know that for any post you like, you will be recommended similar posts, or if you like some product, you will see that product everywhere, which is a little irritating, but that is how it is, right? Now how is that done? This is called clickstream analytics, which means every click that you make in the app and every activity you do is captured in real time and then analyzed. For this you need some kind of clickstream analytics engine, and all this continuously streamed data from the application servers and web servers should be stored for further analytics, so again we will be using some external storage, right? Now let's shift our focus toward data analytics. Any application in today's world cannot shine without data analytics. For this particular application, if you want to know which are the most popular posts, what is currently trending, what the age groups of the users are, what kind of games they are playing on your social media app, what kind of products or ads the users are watching, for all of that you need to collect data from various sources inside your app.
Now these data sources could be your relational database, NoSQL database, graph database, and even the clickstream data that is being generated. Also, you might want to look at the images and videos that users are posting, so that you can combine this information and analyze the data to make some business decisions. Now, if you want to collect this data from different sources, you need to extract that data and transform it into a specific format, and on top of that you might want to do some data cleaning; for example, you might want to remove duplicates or sensitive information from your data, such as users' phone numbers or personal information. So this

Segment 5 (20:00 - 25:00)

is called an ETL operation, that is, Extract, Transform, and then Load this data into central storage, and this central storage is called a data lake. Over time you will have a wide variety of data in this storage, and it will be massive. The thing to understand here is that it is not a data processing service, it is a data storage service. Now, if you want to analyze that data at scale, data of all types, say semi-structured, unstructured, images, videos, you need a distributed computing system, and if you have heard about Hadoop, that is one such distributed compute platform, which is also called a big data platform. With the Hadoop platform you can process and analyze massive amounts of data by distributing that data across multiple nodes. And depending on what kind of data analysis you want to do, you can use different frameworks, for example Apache Spark, Apache Hive, Apache Flink, and many others. So big data is all about processing vast amounts of different types of data. But often, all the structured data, which might come from this ETL process or from the big data processing, is loaded into a data warehouse. Once you have all your structured data there, your business owners can use BI tools, that is, Business Intelligence tools, to query the data from the data warehouse and get those business insights, right? So for this application there will be a data warehouse and some kind of business intelligence tool, okay? So for data analytics you will have this kind of data pipeline. Okay, now we are in 2024, and you know that these are the years of machine learning and generative AI, and for this application as well you might want to include some machine learning features; for example, if you want to recommend friends to users, you need machine learning.
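A toy version of the transform-and-clean step described above, deduplicating records and stripping a sensitive field before loading. All record and field names here are illustrative:

```python
# Toy ETL transform step: deduplicate records and drop sensitive fields
# before loading into the data lake. Field names are made up.

raw_records = [
    {"user_id": 1, "post": "hello", "phone": "555-0101"},
    {"user_id": 1, "post": "hello", "phone": "555-0101"},  # duplicate
    {"user_id": 2, "post": "hi",    "phone": "555-0202"},
]

SENSITIVE_FIELDS = {"phone"}

def transform(records):
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec["user_id"], rec["post"])
        if key in seen:
            continue                     # drop exact duplicates
        seen.add(key)
        cleaned.append({k: v for k, v in rec.items()
                        if k not in SENSITIVE_FIELDS})
    return cleaned

print(transform(raw_records))
# [{'user_id': 1, 'post': 'hello'}, {'user_id': 2, 'post': 'hi'}]
```

At real scale this logic would run inside an ETL engine rather than a single Python process, but the extract-transform-load shape is the same.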
Also, if you want to review the posts and comments that users are making, to check whether they contain any sensitive information or inappropriate language or content, or if you want to distinguish between real users and bot users, and so on, in order to detect fraud you have to have machine learning, and in order to build those machine learning models you need access to all this data. So basically, using all this data you will train your machine learning models, then you will deploy them, and your application can use these models for different features, and one such feature could be content moderation. As users upload a video or a picture, you need some ML or AI service which checks whether that image or video is appropriate, and only then will it be stored, right? And on top of that, nowadays many social media applications also allow you to create new posts automatically using GenAI-generated images, videos, or music, so your application will also have some kind of GenAI capability. Okay, so far so good; we have come a long way with respect to the application architecture, so let's just add a few more features and we are done. Moving on, you know that a social media app will also send notifications to end users in the form of SMS or mobile push notifications, so you need that functionality. You need email functionality, and also, if you are allowing users to chat with each other, some kind of message queuing functionality. Now let's look at some operational things. If we deploy this application in production, we need to make sure that all the components are healthy. For this, we will have to continuously monitor the health of our web servers, application servers, databases, the available disk space, and so on, so we will need some kind of monitoring system. Okay, we could go on adding more features, but I think this is good enough for us to now translate this architecture into AWS.
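The health-monitoring idea just mentioned usually boils down to alarm evaluation: a metric breaches a threshold for several consecutive periods, and only then does the alarm fire. Here is a sketch of that rule; the datapoints, the 90% threshold, and the three-period rule are made-up assumptions:

```python
# Alarm evaluation sketch: an alarm fires only when a metric breaches
# its threshold for N consecutive evaluation periods, so a single
# momentary spike does not page anyone. All numbers are illustrative.

THRESHOLD = 90.0        # e.g. disk usage percent
EVAL_PERIODS = 3        # consecutive breaching periods required

def alarm_state(datapoints) -> str:
    """Return 'ALARM' if the last EVAL_PERIODS datapoints all breach."""
    recent = datapoints[-EVAL_PERIODS:]
    if len(recent) == EVAL_PERIODS and all(v > THRESHOLD for v in recent):
        return "ALARM"
    return "OK"

print(alarm_state([85, 92, 91, 93]))  # ALARM: last three all above 90
print(alarm_state([92, 93, 88, 91]))  # OK: the dip to 88 resets it
```

This is the same shape of logic a managed monitoring service applies for you once you configure the metric, threshold, and evaluation periods.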
So now let's see: if you take the same application architecture and you want to deploy it on AWS, how will it look? Let's see what changes. Okay, the very first thing you have to make a choice about is where you want to deploy this application. So for your data center you have to choose AWS's data centers, and for that you have to choose an AWS Region. That would really depend on where your end users are, and based on that you can choose any of the available AWS Regions. Further, we talked about this private network; in the AWS world that private network is called a VPC, i.e. Virtual

Segment 6 (25:00 - 30:00)

Private Cloud, and it is a private network for your account, and unless you want your application to be accessed over the internet, no one can get access to it, right? So, like in your data center, this is your private network inside AWS. Further, all these virtual machines for your web servers and app servers are EC2 instances - Elastic Compute Cloud! And all those local disks, they are EBS - Elastic Block Store. This is very much like the SSD and HDD disks that you have in your laptop. Further, the relational database that we talked about, in AWS it is RDS, that is, Relational Database Service, and there you get a choice of different database engines; for example, you can have MySQL, PostgreSQL, Oracle, SQL Server, all those kinds of engines, and on top of that you get different configuration options. For example, if you want a highly available database, there can be a primary database in one AZ and a standby database in another AZ. Moving on to the NoSQL database, in AWS it is DynamoDB, and AWS guarantees that this database will provide millisecond latency at any scale. With respect to the in-memory database that we used as a cache, in AWS you can use the ElastiCache service, and for the graph database you can use the Neptune database. Likewise, there are many other databases, and we are going to dive deep into those in the course that I talked about, but as of now we are just using these four purpose-built databases. Moving on to the load balancer, in AWS there is a managed load balancer service called ELB - Elastic Load Balancing - and this load balancer will distribute the traffic across the backend EC2 instances. Now, in order to scale this, we talked about horizontal scaling; in AWS that can be done using an Auto Scaling group.
So if you have an Auto Scaling group, you can define your auto scaling policies; for example, if the average CPU utilization of your web tier goes beyond, say, 80%, you can ask the Auto Scaling group to add one more instance automatically, and the other way around, if the average utilization falls below, say, 20%, you may want to remove one of the EC2 instances. All that is part of the auto scaling configuration. Further, in order to point your domain name to the IP address, you will use the AWS Route 53 service. Okay, now let's move to the storage service, that is, the external storage that we talked about. In AWS that storage is the Amazon S3 service, and Amazon S3 is one of the most important services in AWS, because S3 provides unlimited storage and you can access that storage directly over the internet, right? Now, if you remember, we also talked about converting videos from one format to another for the web users and the mobile users. Looking at this, you might be thinking of having just a simple EC2 instance and using that instance to convert those videos. But imagine that throughout the day there were no uploads made by users. In that case, even if there is no upload, you are still running that EC2 instance and you are paying for it. In this case, in AWS you can instead use AWS Lambda. The way Lambda works is that only when there is a new picture or media file will the Lambda be triggered, and it will perform its job. So if there are no uploads throughout the day, no Lambda function will be triggered. This is called serverless computing, where you are not managing any servers; you just provide the code to convert the video, and AWS will trigger that function whenever there is an event, and in this case the event is uploading a new object to the S3 bucket, right? This is all part of serverless compute, and it greatly reduces your cost for event-driven applications.
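A minimal sketch of what such an S3-triggered Lambda handler could look like. The event parsing follows the standard S3 event shape, but the transcoding step is just a placeholder, and the bucket and key names are made up:

```python
# Minimal sketch of an S3-triggered Lambda handler. The real transcoding
# work is a placeholder; only the event parsing follows the standard
# S3 notification event shape.

def handler(event, context):
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Placeholder for the actual conversion, e.g. invoking a codec
        # or a managed media-conversion service.
        converted_key = "mobile/" + key
        results.append({"source": f"s3://{bucket}/{key}",
                        "output": f"s3://{bucket}/{converted_key}"})
    return results

# Simulate the event AWS would deliver when 'uploads/cat.mp4' lands
# in a (hypothetical) bucket.
fake_event = {"Records": [
    {"s3": {"bucket": {"name": "my-media-bucket"},
            "object": {"key": "uploads/cat.mp4"}}}
]}
print(handler(fake_event, None))
```

The key point of the serverless model is visible here: there is no server loop anywhere, just a function AWS invokes per event, so you pay nothing on days with no uploads.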
Okay, moving on to the CDN: in AWS there is the CloudFront service, which basically uses AWS's edge locations, spread across the globe, to optimize the network path from the end user to the AWS Region, as well as to cache static content, for example videos and images, in the edge location nearest to the user. So it greatly improves the overall latency of accessing your media and static content, right? Now let's move to the

Segment 7 (30:00 - 35:00)

data analytics part. For the clickstream you need to collect a continuous stream of data, and for that in AWS you can use the Kinesis service. Further, in order to do the ETL, you can use the AWS Glue service. Glue has many more features, but its primary feature is to connect to the source databases and run the ETL jobs. Further, the Hadoop system that you see, that is EMR in AWS - Elastic MapReduce - and the data warehouse is Amazon Redshift; then the BI tool that you see there is Amazon QuickSight. So as you can see, all the services and functionality map roughly one-to-one in AWS, and hopefully this gives you a better way to look at these AWS services. Okay, moving on to machine learning: you know that for building a machine learning model you need access to the data, and then you need to train your model, which typically requires a lot of computing resources, and for that there are also a lot of ML frameworks, for example PyTorch, MXNet, and many others. In AWS, if you want to build your own ML model, you can use Amazon SageMaker, and once you build this ML model you can deploy it on SageMaker endpoints, and your application can access this model over an API call. Further, as we said, you might also want to use some AI services, for example to check whether an uploaded image or video is appropriate. For this you could build your own ML model, but you should rather look at the AWS AI services, because there you do not need to build the model yourself; AWS provides pre-built models in the form of API calls, and there are a lot of AI services. For this example in particular we can use Amazon Rekognition, but there is a service for everything: for example, if you want to convert text to speech, or speech to text, or translate text from one language to another, for each of these there is a separate AI service.
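As an illustration of how an application might act on the labels such a moderation API returns: image-moderation services like Amazon Rekognition return label names with confidence scores in roughly this shape, but the labels below and the 80% threshold are made-up assumptions for the sketch:

```python
# Decide whether an upload is allowed, based on moderation labels of the
# kind an image-moderation API returns. The sample labels and the 80%
# threshold are illustrative assumptions, not service defaults.

CONFIDENCE_THRESHOLD = 80.0

def is_allowed(moderation_labels) -> bool:
    """Reject the upload if any unsafe label exceeds the threshold."""
    return all(label["Confidence"] < CONFIDENCE_THRESHOLD
               for label in moderation_labels)

clean_image = []  # no unsafe labels detected
flagged_image = [{"Name": "Violence", "Confidence": 96.5}]
borderline = [{"Name": "Violence", "Confidence": 42.0}]

print(is_allowed(clean_image))    # True
print(is_allowed(flagged_image))  # False
print(is_allowed(borderline))     # True, below the chosen threshold
```

Where you set the threshold is a product decision: lower it and you block more borderline content, raise it and fewer legitimate uploads are rejected.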
So the first choice should be to use an AWS AI service, but if it does not serve your need, then you should look at building your own ML model. Okay, so that is about all the features. Now, as we go to the left, in order to have push notifications you can use Amazon SNS, that is, Simple Notification Service, and for email you can use Simple Email Service. For the message queue you can use Amazon SQS, that is, Simple Queue Service, and for monitoring dashboards, collecting all the metrics, and making sure that all the instances and databases are healthy, you can use the Amazon CloudWatch service. All right! We could go on adding more features and more AWS services to this architecture, but I think it is good enough for us to make sure that you understand what the AWS cloud is and how different AWS services can be used to build an application on AWS. So our final architecture will look like this. Okay, now for a moment assume that you have been given the responsibility to deploy this solution on AWS from scratch. What do you think, how much time will you take to deploy this application on AWS? If you ask me, even with good AWS experience it will take at least a week to deploy this successfully if you are doing it manually, and that is not because you do not know AWS services, but because when you deploy this, you have to make sure that every little configuration for the different AWS services is correct; for example, EC2 itself has hundreds of configuration options, and even if you make a mistake with one simple configuration, your application may not work. So you can imagine that deploying this kind of application manually is not efficient, and for that you need some kind of automation; in the AWS world this is called infrastructure as code, where you write code for your infrastructure and hand that code over to an AWS service to deploy the infrastructure.
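To give a minimal taste of infrastructure as code, here is a CloudFormation-style template built as a Python dict and serialized to JSON. The CIDR range, instance type, and AMI ID are placeholder values, not recommendations:

```python
import json

# Minimal CloudFormation-style template built as a Python dict.
# The CIDR range, instance type, and AMI ID are placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "AppVpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": "10.0.0.0/16"},
        },
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": "t3.micro",
                "ImageId": "ami-0123456789abcdef0",  # placeholder AMI
            },
        },
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)

# This JSON string is what you would hand to CloudFormation, e.g.:
#   boto3.client("cloudformation").create_stack(
#       StackName="my-app", TemplateBody=template_body)
```

The value is that the template, not a human clicking through consoles, is the source of truth, so the same file deploys the same stack every time.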
With the AWS CloudFormation service, you just write a JSON or YAML template which basically defines which AWS services you want to use and how they connect with each other. You define that and hand over this template to the AWS CloudFormation service. In that template you will have defined your VPC, the CIDR ranges, how many EC2 instances

Segment 8 (35:00 - 37:00)

you want and all the configurations for those EC2 instances, then the load balancer, databases, everything, and then CloudFormation will deploy this entire architecture for you. The best part of this model is that you can just share your template with other teams, and they can deploy the same application in exactly the same manner. So you can replicate this for different environments: development, testing, staging, and even production. This is called infrastructure as code. Now, if you have heard about Terraform, that is another way to deploy infrastructure, but AWS CloudFormation is AWS's native service for deploying infrastructure. Right, so that is all about the infrastructure. Now, if you want to build a DevOps pipeline: you may have heard about CI/CD, where as soon as a developer checks in new code it is built, tested, and deployed into production. If you want to do that in AWS, there are a lot of DevOps services. The very first thing: everybody writes code and checks it into a CodeCommit repository, which is a Git repository. So the DevOps people will write the code for the infrastructure, the developers will write the code for the application running on the application servers, and similarly QA will write the code for the unit tests, so everybody checks their code into CodeCommit. Then this code will be built, which means compiled and unit tested, and there will be a build artifact; if you are using, say, Java, there will be JAR and WAR files. Once you have this build artifact, you have to deploy it to the application servers, and for that in AWS you can use the AWS CodeDeploy service, right? And further, if you want to automate all these steps, so that as soon as a developer checks in code it is built, unit tested, and then, if everything is okay, the new version of the application is deployed,
in that case you can use the AWS CodePipeline service. If you have heard about Jenkins, it does a very similar job, right? So this is how DevOps works in AWS. Okay, now when it comes to AWS, I could just go on talking for
