If you have followed all the chapters up until now, you should have an optimized machine learning model, with an API, nicely wrapped up in a Docker container image. Excellent work! Now it is time for the last chapter of Vantage 101. In this chapter you will learn how to share your work with the world, at scale, by leveraging cloud services.
10.1 A short history
Although the idea of 'the cloud' had been born before then, the company that arguably pioneered cloud services as we know them today is Amazon. In the early 2000s, Amazon, by then one of the largest online retailers, wanted a solution to the fact that it typically used only around 10% of its server capacity.
Their solution was to rent this spare capacity out: Amazon launched Amazon Web Services (AWS) in 2006. Long story short: this turned out to be a hugely profitable business, much more profitable than retail. When Amazon first reported AWS operating numbers in 2015, AWS already had \$9.6 billion in revenue, with a healthy operating margin of 17%.
AWS's Q4 2021 revenue was \$17.8 billion, with an operating profit of \$5.3 billion. As the operating profit of Amazon as a whole was only \$3.5 billion that quarter, this illustrates why AWS is one of Amazon's most important ventures at the moment: it is part of the reason why long-time AWS chief Andy Jassy replaced Jeff Bezos when Bezos stepped down as Amazon's CEO.
Microsoft and Google quickly caught on to how insanely profitable this market was and started investing more into their own cloud services: Microsoft Azure and Google Cloud Platform (GCP). Nowadays, cloud computing is a 180 billion dollar industry:
As you see in this figure, AWS leads the market with a 33% market share, followed by Azure and GCP with 21% and 10% market shares, respectively. Our recommendation for machine learning engineers is to familiarize yourself with all three of these market leaders.
10.2 Why use cloud services
Using the cloud allows you to provision infrastructure much faster than using on-premises infrastructure (i.e. your own servers). In addition, using cloud services for compute means you only pay for what you use instead of carrying a fixed cost. You can temporarily (elastically) provision a ton of servers when you need to train a complicated model or run a large query, then shut these servers down - and stop paying for them - as soon as the task is finished.
Storing data in the cloud is often safer than on-premises because your data can be replicated across data centers in different regions. You don't need to hire a team of engineers either; while most cloud services are highly configurable, they can also be provisioned in a few clicks.
10.3 When do you use cloud services?
In short: for any operation that takes too long on your local machine, or any data that is too big to store there. After all, cloud services are ultimately built to carry out operations as quickly as possible and store your data as efficiently as possible.
Now, of course, you might be one of the lucky ones with access to a supercomputer with several petabytes of storage, 500 GB of RAM and an arsenal of RTX 3090s to train the heaviest neural networks on. Even then, processing your operations and data in the cloud is still more advantageous, for a few reasons:
- The software on your (super)computer is not automatically maintained. Much of the software we machine learning engineers use is open source, which can be a lot of work to maintain and keep up to date. For managed services in the cloud, which we'll talk more about later in this chapter, this is often handled for you.
- The same goes for hardware. Your supercomputer will be up to date for a while, but with cloud computing you can be confident that your hardware stays future proof.
- Every single computer has limits on its capacity, which makes scaling difficult. The cloud does not have this problem: when a server reaches its physical limits, its workload is distributed across other servers, giving you virtually unlimited scalability.
- This flexibility also means your infrastructure can be located anywhere in the world. This is advantageous if, for example, war breaks out in the country where your infrastructure is hosted: at the push of a button you can move it to servers in another region, keeping your work safe.
There are a few kinds of tasks that you would typically want to outsource to the cloud:
- Tasks that should always be accessible: the cloud does not sleep! Tasks that must be reachable around the clock are a great fit for the cloud. Examples are scheduled jobs, APIs and databases.
- Training machine learning models: whether you have a heavy deep learning model that needs specific hardware, a lot of data to process, or a whole bunch of training parameters to monitor, the cloud has plenty of resources, often with ready-made environments for exactly these tasks.
- Processing "big data": lastly, processing big data is what the cloud was originally built for! With the most efficient databases available at the touch of a button, you can store whatever you want and access it at high speed.
There are many more applications for the cloud, so just grab some free credits from the different cloud providers and try them out!
10.4 What cloud services to use?
The cloud is nothing more than a collection of computers (a.k.a. servers) and other hardware running in a data center, which you access online. These servers offer functionality similar to that of your personal computer, grouped into different services by the cloud provider: some store your data, some provide computing power, and many others build software on top of those basics (e.g. SageMaker for notebooks, distributed model training and MLflow tracking). Let's explore some of the most-used AWS services, what they do and how you could use them in your next project.
EC2 (Azure: Azure Virtual Machines, GCP: Compute Engine)
We have talked about how the cloud, in essence, is just like a computer, and AWS EC2 is the best example of this analogy. AWS EC2 (Amazon Web Services Elastic Compute Cloud) is a service that lets you run virtual machines, called EC2 instances, in the cloud with full scalability: you can configure the machine as you like. Change the size of the RAM? No problem. A better CPU? Let's do it! In terms of software, you install your preferred OS and then use the instance as a virtual machine. You use EC2 when you need to run applications that run longer than 900 seconds (the maximum runtime of serverless functions like AWS Lambda), for applications with a variable execution time, or when you need a lot of power (RAM, CPU, etc.). Otherwise, a serverless service will save you money. To get a better understanding of EC2, I recommend exploring this tutorial and using your free credits to do some testing yourself.
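As a sketch of what working with EC2 looks like from code, the snippet below uses boto3, the official AWS SDK for Python, to launch and later terminate an instance. The AMI ID, region and instance type shown are placeholder assumptions you would replace with your own, and the API calls only succeed with AWS credentials configured:

```python
"""Minimal sketch of launching an EC2 instance with boto3. The AMI ID,
region and instance type are illustrative assumptions; the API calls
require configured AWS credentials."""


def build_run_instances_params(ami_id: str,
                               instance_type: str = "t3.micro",
                               count: int = 1) -> dict:
    """Build the keyword arguments for EC2's RunInstances API call."""
    return {
        "ImageId": ami_id,              # which OS image to boot
        "InstanceType": instance_type,  # hardware profile (CPU/RAM)
        "MinCount": count,
        "MaxCount": count,
    }


def launch_instance(ami_id: str, region: str = "eu-west-1") -> str:
    """Launch one instance and return its ID."""
    import boto3  # official AWS SDK for Python

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_run_instances_params(ami_id))
    return response["Instances"][0]["InstanceId"]


def terminate_instance(instance_id: str, region: str = "eu-west-1") -> None:
    """Shut the instance down so you stop paying for it."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.terminate_instances(InstanceIds=[instance_id])
```

The last function is the important one: because EC2 bills you for uptime, terminating instances as soon as a job finishes is how you realize the pay-for-what-you-use advantage described earlier.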
S3 (Azure: Blob Storage, GCP: Cloud Storage)
To store all your files on your computer, you need a hard drive, and S3 is the AWS service you could see as a hard drive in the cloud. You won't store structured relational databases on it, but it holds your raw data perfectly. Like all AWS services, S3 can be accessed via its SDK and from other AWS services (if you want, you can even access it from other cloud providers). S3 storage is typically used in the pre- or post-processing stage, for instance to store raw data that still needs to go through an ETL (Extract, Transform and Load) process. To get a better understanding of S3, I recommend exploring this tutorial and using your free credits to do some testing yourself.
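To make this concrete, here is a minimal sketch using boto3 (the AWS SDK for Python) to upload a raw data file to S3 and fetch it back. The bucket name and the `raw/<dataset>/<filename>` key convention are assumptions for illustration; the calls require AWS credentials and an existing bucket:

```python
"""Minimal sketch of S3 uploads and downloads with boto3. The bucket
name and the 'raw/<dataset>/<filename>' key layout are illustrative
assumptions; the API calls need AWS credentials and an existing bucket."""

import os


def raw_data_key(dataset: str, local_path: str) -> str:
    """S3 object keys are plain strings; slashes give a folder-like layout."""
    return f"raw/{dataset}/{os.path.basename(local_path)}"


def upload_raw_data(bucket: str, dataset: str, local_path: str) -> str:
    """Upload a local file to s3://<bucket>/raw/<dataset>/<filename>."""
    import boto3  # official AWS SDK for Python

    s3 = boto3.client("s3")
    key = raw_data_key(dataset, local_path)
    s3.upload_file(local_path, bucket, key)  # handles multipart uploads
    return key


def download_raw_data(bucket: str, key: str, destination: str) -> None:
    """Fetch an object back to disk, e.g. as input to an ETL job."""
    import boto3

    s3 = boto3.client("s3")
    s3.download_file(bucket, key, destination)
```

A consistent key convention like this is what keeps a raw-data bucket navigable once many datasets and pipelines start writing to it.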