If you have followed all the chapters up until now, you should have an optimized machine learning model with an API, nicely wrapped up in a Docker container image. Excellent work! Now it is time for the last chapter of Vantage 101. In this chapter, you will learn how to share your work with the world, at scale, by leveraging cloud services.
Although the idea of 'the cloud' predates it, the company that arguably pioneered cloud services as we know them today is Amazon. In the early 2000s, Amazon, by then one of the largest online retailers, wanted a solution to the fact that it typically used only around 10% of its server capacity.
Their solution was to rent this space out: Amazon started Amazon Web Services (AWS) in 2006. Long story short: this turned out to be a hugely profitable business, much more profitable than retail. When Amazon first reported AWS operating numbers in 2015, AWS already had 9.6 billion dollars in revenue, with a huge margin of 17%.
AWS's Q4 2021 revenue was \$17.8 billion, with an operating profit of \$5.3 billion. As the operating profit of Amazon as a whole was only \$3.5 billion, this illustrates why AWS is one of Amazon's most important ventures at the moment: part of the reason why longtime AWS chief Andy Jassy replaced Jeff Bezos when the latter stepped down as Amazon's CEO.
Microsoft and Google quickly caught on to how insanely profitable this market was and started investing more into their own cloud services: Microsoft Azure and Google Cloud Platform (GCP). Nowadays, cloud computing is a 180 billion dollar industry:
As you see in this figure, AWS leads the market with a 33% market share, followed by Azure and GCP with 21% and 10% market shares, respectively. As a machine learning engineer, you would do well to familiarize yourself with all three of these market leaders.
Using the cloud allows you to provision infrastructure much faster than running on-premises infrastructure (i.e. your own server park). In addition, using cloud services for compute means you only pay for what you use instead of carrying a fixed cost. You can temporarily (elastically) provision a ton of servers when you need to train a complicated model or run a large query, then shut these servers down, and stop paying for them, as soon as the task is finished.
Storing data in the cloud is often safer than on-premises too, because your data can be replicated across data centers in different regions. You don't need to hire a team of engineers either: while most cloud services are highly configurable, they can also be provisioned in a few clicks.
In short: use the cloud for any job that takes too long to run on your local machine, or any dataset too big to store there. After all, cloud services are built to carry out operations as quickly as possible and to store your data as efficiently as possible.
Now, of course, you might be one of the lucky ones with access to a supercomputer with several petabytes of storage, 500 GB of RAM and an arsenal of RTX 3090s to train the heaviest neural networks on. But even then, processing your operations and data in the cloud is still more advantageous for a few reasons:
There are a few tasks you can think of that you would want to outsource to the cloud:
There are many more applications for the cloud, so grab some free credits from the different cloud providers and try them out.
The cloud is nothing more than a collection of computers (a.k.a. servers) and other hardware running in a data center, accessible to you online. These servers offer functionality similar to that of your personal computer, grouped into different services by the cloud provider: some store your data, some provide computing power, and many combine these building blocks with extra software on top (e.g. SageMaker for notebooks, distributed model training and MLflow tracking). Let's explore some of the most used services of AWS, what they do, and how you could use them in your next project.
We have talked about how the cloud, in essence, is just a computer. AWS EC2 is the best example of this analogy. Amazon Web Services Elastic Compute Cloud (EC2) is a service that lets you run virtual machines, called EC2 instances, in the cloud in a scalable way. You can configure the machine however you like. More RAM? No problem. A better CPU? Let's do it! On the software side, you install your preferred OS and can then use the instance as a virtual machine. Use EC2 when you need to run applications that take longer than 900 seconds (the maximum execution time of AWS Lambda), have a variable execution time, or need a lot of power (RAM, CPU, etc.). Otherwise, a serverless service will save you money. To get a better understanding of EC2, I recommend exploring this tutorial and using your free credits to do some testing yourself.
Your computer stores all of its files on a hard drive; S3 is the AWS service you can think of as that hard drive. You won't store structured relational databases on it, but it holds your raw data perfectly. Like all AWS services, S3 can be accessed via its SDK and from other AWS services (if you want, you could even access it from other cloud providers). S3 is typically used in pre- or post-processing, for instance to store raw data that still needs to go through an ETL (Extract, Transform and Load) process. To get a better understanding of S3, I recommend exploring this tutorial and using your free credits to do some testing yourself.