About Vantage 101

Welcome to Vantage 101, the machine learning engineering course that bridges the gap between university and the business world. This course is designed to help you quickly get a grasp of the best practices you wish you had known at the beginning of your career.

Check out the GitHub repo here

Questions or feedback? Let us know!

Chapter 1: Case Introduction

Australian Rainfall Case - Domain Background

Australia has always struggled with water. Due to the arid climate and minimal freshwater resources, Australians have had to deal with an increasing number of droughts each summer. According to the Australian Bureau of Meteorology, 80% of the land receives less than 600 mm of rainfall annually, with 50% receiving less than 300 mm. To put that into perspective, the Netherlands got almost 900 mm of rainfall in 2019.

In a drastic turn of events, the national meteorological bureau has been hit with a shortage of hotdogs and hotdog buns, leading to a nationwide strike of meteorological experts. Without their usual weather forecasts, the nation's farmers turn to you for help. They have scavenged and stolen data from different weather stations throughout the country and have asked you to help them predict rainfall, so that they can collect water for their crops and showers.

Objective

To help Australia's farmers, we will forecast whether or not it will rain tomorrow, which makes this a binary classification problem. The provided dataset contains features relating to location, temperature, humidity, wind, and cloud cover. All of these inputs can be used to predict whether the locals should put out their water collection pans for tomorrow's rain.

Data Availability

Our dataset consists of ~145k rows and contains 22 different features, all of which can be used to model the target variable, RainTomorrow. The data was collected from 49 different locations in Australia between 2007 and 2017.
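
As a quick orientation, the sketch below shows one way to check the size of such a dataset and the balance of its target. It runs on a tiny inline sample rather than the real file. Only the RainTomorrow column name comes from the text above; the other column names (Date, Location, Humidity3pm), the "NA" placeholder, and all sample values are assumptions for illustration. It uses only the standard library to stay dependency-free; in a notebook, pandas would be the more typical choice.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the real dataset. Only the
# RainTomorrow column name is taken from the course text.
SAMPLE_CSV = """Date,Location,Humidity3pm,RainTomorrow
2016-01-01,Sydney,55,No
2016-01-02,Sydney,81,Yes
2016-01-01,Perth,30,No
2016-01-02,Perth,NA,No
2016-01-03,Perth,77,Yes
"""


def summarize(csv_text, target="RainTomorrow"):
    """Return the row count, feature count, and share of each target class."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n_features = len(rows[0]) - 1  # every column except the target
    counts = Counter(row[target] for row in rows)
    shares = {label: n / len(rows) for label, n in counts.items()}
    return len(rows), n_features, shares


n_rows, n_features, shares = summarize(SAMPLE_CSV)
print(n_rows, n_features, shares)
```

Checking the class shares early is useful because a rain/no-rain target is usually imbalanced, which affects how a model's accuracy should be read later on.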

Your Turn

Take a look at the notebook at notebooks/australian_rainfall.ipynb in the Vantage 101 GitHub repository. We created this Jupyter Notebook as the starting point for the course. It runs through an exploratory data analysis (EDA) and a data split, followed by building and evaluating a simple model.
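
To give a feel for the split, model, and evaluation steps before you open the notebook, here is a minimal sketch. It is not the notebook's code: the chronological split, the single-threshold rule on a hypothetical Humidity3pm feature, and the sample records are all assumptions chosen to keep the example small and dependency-free.

```python
from datetime import date

# Hypothetical records: (date, humidity at 3pm in %, did it rain the next day).
RECORDS = [
    (date(2016, 1, 1), 35, False),
    (date(2016, 1, 2), 82, True),
    (date(2016, 1, 3), 40, False),
    (date(2016, 1, 4), 75, True),
    (date(2016, 1, 5), 50, False),
    (date(2016, 1, 6), 90, True),
    (date(2016, 1, 7), 45, False),
    (date(2016, 1, 8), 78, False),
    (date(2016, 1, 9), 88, True),
    (date(2016, 1, 10), 30, False),
]


def chronological_split(records, train_fraction=0.7):
    """Train on the earliest records and test on the most recent ones."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def accuracy(records, threshold):
    """Share of records where 'humidity >= threshold' matches the label."""
    hits = sum((humidity >= threshold) == rained for _, humidity, rained in records)
    return hits / len(records)


def fit_threshold(train):
    """Pick the humidity threshold with the best accuracy on the training set."""
    candidates = sorted({humidity for _, humidity, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))


train, test = chronological_split(RECORDS)
threshold = fit_threshold(train)
test_accuracy = accuracy(test, threshold)
print(f"threshold={threshold}, test accuracy={test_accuracy:.2f}")
```

Splitting by date rather than at random is one reasonable choice for weather data, since it keeps future observations out of training. The threshold is fitted on the training records only, and the held-out records are used solely for the final score.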

While we are aware that there is much more we could do on the ML side, such as training a better model, our focus in this course is to industrialize and deploy an existing model, no matter its quality.

Once you're done running the notebook, head over to Chapter 2 for the first step in this course.