Before we start working on the case, it is important to configure and enhance your development environment, as this will make your life a lot easier in the long run. Think of it as sharpening your axe before chopping away at a big tree. This chapter covers how to set up your machine based on the Operating System (OS) it is running. Next, it introduces Integrated Development Environments (IDEs), recommends which IDE to use, and shows how to extend its functionality. It then shows you how to improve your experience in the terminal and explains the differences between, for example, zsh and Bash. Lastly, it shows you how to configure Git and the GitHub CLI so that you can version control the code you write throughout the case.
2.1 Operating System
UNIX-like compatibility is strongly preferred when doing Machine Learning Engineering work. The rest of this chapter, as well as the course more broadly, will assume that you are using such an environment. Therefore, this first section lays out the options you can choose from, based on the OS your machine is running.
There are, generally speaking, three broad operating system setups that ensure UNIX-like compatibility:
- macOS: macOS is a UNIX-based OS, so if that’s your operating system, then you have compatibility without extra steps. Lucky you!
- Linux distribution: Many distributions of Linux exist, but if you need us to suggest one, then Ubuntu is probably the best choice for you. It is also possible to install both Ubuntu and Windows on the same machine and choose which one to boot on startup (dual-booting); you can find the steps to do that here. Alternatively, if you want both Windows and Ubuntu to encrypt their data, which we would recommend, you can follow this guide.
- Windows with WSL (Windows Subsystem for Linux): WSL is a compatibility layer for Windows that allows it to execute Linux binaries. Many guides exist for setting up WSL, but here's one.
If your machine already has one of the three options above, you can continue to the next section about IDEs. If not, you are probably running on a Windows machine. In this case, you essentially have three choices: (1) Windows with WSL, (2) dual-boot Ubuntu+Windows, or (3) Ubuntu only. Each choice has pros and cons that depend heavily on your use case and may be difficult to anticipate, but here are some rules of thumb:
- If you are dabbling with this as a side project and you have no specific intrinsic interest in installing Ubuntu, there is probably little reason to go for a dedicated installation. You should be fine with a WSL setup. Which is not to say that a pro shouldn’t use WSL!
- If you do want a dedicated installation, whether you install only Ubuntu or dual-boot Ubuntu and Windows depends on what you intend to use this machine for. Ubuntu can handle your ML development and web browsing activity entirely and seamlessly. It can also handle some amount of office work and gaming, but it is still definitely better to do those things under Windows, so if these are significant use cases for your machine, it is probably best to go for a dual-boot setup.
Setting up your OS is “only one step”, but make no mistake: it can be finicky and error-prone. Your system’s hardware or software may differ in small ways that guides do not account for, forcing you to deviate from them, and a seemingly small mistake may turn out to be critical later, forcing you to redo the entire process. It truly pays to be patient with this and do it right.
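Once your setup is in place, a quick way to confirm which of the three environments your terminal is actually running in is to ask the kernel. The sketch below is only an illustration: `detect_env` is a helper name of our own, and it relies on WSL kernels mentioning "microsoft" in `/proc/version`.

```shell
# Hypothetical helper: classify the environment the current shell runs in.
detect_env() {
  case "$(uname -s)" in
    Darwin) echo "macos" ;;
    Linux)
      # WSL kernels identify themselves as Microsoft builds in /proc/version.
      if grep -qi microsoft /proc/version 2>/dev/null; then
        echo "wsl"
      else
        echo "linux"
      fi
      ;;
    *) echo "other" ;;
  esac
}

echo "Detected environment: $(detect_env)"
```

If this prints `other`, you are most likely in a plain Windows shell and still need one of the setups described above.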
2.2 Choosing your IDE
The IDE is the most fundamental programming tool. It typically comprises a code editor, debugger, compiler, and automation tooling. Many IDEs include real-time coding assistance, from error highlighting to intelligent code completion. Some popular IDEs and code editors are Visual Studio Code (VSCode), PyCharm, Spyder, Sublime, Atom and Vim. If you are transitioning from R, then Spyder is a good choice. Otherwise, VSCode and PyCharm are the most popular choices, and we ultimately recommend VSCode due to its extensive plugin support. We will show you our favourite extensions in the next paragraph, but first install VSCode from here. We will provide tips and shortcuts for VSCode in the remainder of this course; however, if you prefer to use another IDE, that is perfectly fine as well!
As mentioned before, VSCode supports a large number of extensions, which can be installed from the marketplace. To install some of our favourite ones, open the VSCode terminal (^ + ` or CTRL + `) and run the following commands:
code --install-extension emmanuelbeziat.vscode-great-icons
code --install-extension ms-python.python
code --install-extension KevinRose.vsc-python-indent
code --install-extension redhat.vscode-yaml
code --install-extension ms-azuretools.vscode-docker
code --install-extension bungcip.better-toml
code --install-extension njpwerner.autodocstring
code --install-extension 4ops.terraform
Here is a list of the extensions you are installing with a description of each one:
- VSCode Great Icons: Inserts a nice icon next to your filename based on its type.
- Python: Offers general support for the Python language such as IntelliSense (Pylance), linting, debugging, code navigation, code formatting, refactoring, variable explorer, test explorer, and more!
- Python Indent: Helps with correctly indenting your Python code.
- YAML: Detects whether the entire file is valid YAML. This helps a lot, as YAML errors are easy to make!
- Docker: Makes it easy to build, manage, and deploy containerized applications from Visual Studio Code. It also provides one-click debugging of Node.js, Python, and .NET Core inside a container.
- Better TOML: TOML files are configuration files used, for example, with Poetry. This extension adds syntax highlighting for them.
- autoDocstring: Writing your own docstrings in the right format is tedious without this extension. It allows you to quickly generate docstrings for Python functions.
- Terraform: Adds syntax support for the Terragrunt configuration language. Nice to have if you are using Terraform for your infrastructure configuration.
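To double-check that the installs went through, you can compare the extensions listed above against what `code --list-extensions` reports. The following is a minimal sketch: `missing_extensions` is a hypothetical helper of our own, and the comparison ignores case on the assumption that the listed casing can differ from the ID you typed.

```shell
# Hypothetical helper: print every wanted extension ID that is absent from
# the installed list read on stdin (one ID per line).
missing_extensions() {
  installed="$(cat)"
  for ext in "$@"; do
    # -i ignore case, -x match the whole line, -F treat the ID as plain text.
    if ! printf '%s\n' "$installed" | grep -qixF "$ext"; then
      echo "$ext"
    fi
  done
}

if command -v code >/dev/null 2>&1; then
  code --list-extensions | missing_extensions \
    emmanuelbeziat.vscode-great-icons ms-python.python \
    KevinRose.vsc-python-indent redhat.vscode-yaml \
    ms-azuretools.vscode-docker bungcip.better-toml \
    njpwerner.autodocstring 4ops.terraform
else
  echo "The 'code' command is not on your PATH"
fi
```

An empty result means every extension is installed; any ID that is printed can be installed again with `code --install-extension`.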
2.3 Terminal
There are many reasons why you would use a terminal to interact with your computer:
- You can type faster than you click
- To interact with a server
- It’s the entrypoint for shell scripting
- To interact with applications like Git, which we will talk about later
Instead of using the default bash shell, we recommend using zsh. bash and zsh are almost identical, but zsh is more interactive and customizable, as we will see when installing extra themes and plugins. Other advantages are:
- Automatic cd: Just type the name of the directory
- Recursive path expansion: For example “/u/lo/b” expands to “/usr/local/bin”
- Spelling correction and approximate completion: If you make a minor mistake typing a directory name, zsh will fix it for you
We will install zsh along with other useful tools, including git, a command-line tool that we use for version control. Open a VSCode terminal and copy-paste the following commands:
sudo apt update
sudo apt install -y vim tmux tree git ca-certificates curl jq unzip zsh apt-transport-https gnupg software-properties-common direnv sqlite3 make
These commands will ask for your password: type it in.
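After the installation finishes, it can be reassuring to confirm that the new tools are reachable from your shell. Here is a small sketch: `check_tools` is a hypothetical helper of our own, and it only covers the packages that ship a command of the same name (packages such as ca-certificates have no command to look up).

```shell
# Hypothetical helper: print each command that cannot be found on PATH
# and return non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools vim tmux tree git curl jq unzip zsh direnv sqlite3 make \
  && echo "All tools found"
```

Any tool reported as missing can be installed again with `sudo apt install -y` followed by its name.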
Oh My Zsh
You can make your terminal look nicer and more readable with Oh My Zsh! It comes with a bunch of different themes, but the default one already shows you at all times which git branch you are on.
Install it using:
sh -c "$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
To make Oh My Zsh look even better, you can use a theme such as https://github.com/romkatv/powerlevel10k.
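As a rough sketch of how switching themes works: Oh My Zsh reads the `ZSH_THEME` variable from your `~/.zshrc`, so enabling a theme comes down to editing that one line. The helper `set_zsh_theme` is a hypothetical name of our own, and the example runs against a scratch file rather than your real `~/.zshrc`. The clone command and theme value in the trailing comments are our reading of the powerlevel10k README and assume a default Oh My Zsh install, so check the README before relying on them.

```shell
# Hypothetical helper: rewrite the ZSH_THEME line of a zshrc-style file.
set_zsh_theme() {
  zshrc="$1"
  theme="$2"
  tmp="$(mktemp)"
  # Replace the existing ZSH_THEME line and keep every other line unchanged.
  sed "s|^ZSH_THEME=.*|ZSH_THEME=\"$theme\"|" "$zshrc" > "$tmp"
  cat "$tmp" > "$zshrc"
  rm -f "$tmp"
}

# Try it on a scratch file that mimics the default Oh My Zsh config.
scratch="$(mktemp)"
printf 'export ZSH="$HOME/.oh-my-zsh"\nZSH_THEME="robbyrussell"\n' > "$scratch"
set_zsh_theme "$scratch" "powerlevel10k/powerlevel10k"
cat "$scratch"

# On your own machine (assumption: Oh My Zsh lives in its default location):
# git clone --depth=1 https://github.com/romkatv/powerlevel10k.git \
#   "${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/themes/powerlevel10k"
# set_zsh_theme "$HOME/.zshrc" "powerlevel10k/powerlevel10k"
```

Open a new terminal afterwards so that zsh picks up the changed theme.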
Some other plugins that are recommended are:
2.4 Git & GitHub
Git is a free and open-source DevOps tool for version control. It is ubiquitous and handles projects from small to very large efficiently. The git manual is a nice reference guide for all the different git commands and for in-depth explanations of how it works. Git is best learned by practicing, but there are a few resources that we would like to highlight to get you started:
- This video is a great 15-minute introduction to the main concepts of Git
- This interactive course is a great way to get some hands-on experience
- After understanding the main building blocks of git, it is useful to know preferred workflows for it. This article provides a great explanation.
GitHub is a website and cloud-based service that helps developers store and manage their code, as well as track and control changes to it. To interact with your GitHub account from your terminal, we have to install GitHub's official Command Line Interface (CLI). In your terminal, copy-paste the following commands and type in your password if asked:
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null
sudo apt update
sudo apt install -y gh
To check that gh (GitHub CLI) has been successfully installed on your machine, you can run:
gh --version
✔️ If you see gh version X.Y.Z (YYYY-MM-DD), you’re good to go 👍
To login, copy-paste the following command in your terminal:
gh auth login -s 'user:email' -w
Answer the questions that gh asks, and check at the end that you are properly connected using:
gh auth status
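If you want to script this check, for instance at the top of a setup script, the exit code of `gh auth status` can be used directly. A minimal sketch, where `gh_ready` is a hypothetical helper of our own and we assume that `gh auth status` exits with a non-zero code when you are not logged in:

```shell
# Hypothetical helper: report whether the GitHub CLI is installed and logged in.
gh_ready() {
  if ! command -v gh >/dev/null 2>&1; then
    echo "gh is not installed"
    return 1
  fi
  if gh auth status >/dev/null 2>&1; then
    echo "gh is installed and authenticated"
  else
    echo "gh is installed but not logged in; run: gh auth login"
    return 1
  fi
}

gh_ready || true
```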
To delve further into GitHub, you can follow tutorials on GitHub Learning Lab, depending on your level. At the very least, it is important to understand pull requests, conflicts, and how to clone from and push to a remote repository.
It is now time to practice with Git:
- Initialize git in a new folder/project
- Create a new branch called development
- Add some files with a few lines of code
- Stage these changes using git add
- Commit the changes using git commit
- Observe the changes using
- Create a new feature branch from the development branch. Feel free to be creative with the name of the branch.
- Add changes in a way that you expect to cause a merge conflict, merge them, and solve the conflicts. If it seems like a dumb idea to intentionally create merge conflicts, good! Because it is. However, merge conflicts do occur from time to time, so it is good to practice with them.
- Finally, push your repo to your GitHub account (create an account if you do not have one; it’s free). From now on you can always make a commit at the end of every assignment and push it to your remote codebase! Congratulations!
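If you get stuck, the local part of the exercise above can be sketched as follows. This is one possible run-through under our own assumptions, not the only way to do it: the file name `app.py`, the branch name `feature/shout`, the identity values, and the use of `git log` to inspect the result are all placeholders we chose. Everything happens in a throwaway directory, and pushing to GitHub is left out because it needs your account.

```shell
# Work in a throwaway directory so nothing on your machine is touched.
repo="$(mktemp -d)"
cd "$repo"

git init -q
# Identity for this practice repo only (placeholder values).
git config user.name "Practice User"
git config user.email "practice@example.com"

# Create the development branch and make a first commit on it.
git checkout -q -b development
echo "greeting = 'hello'" > app.py
git add app.py
git commit -q -m "Add greeting"

# Branch off development and change the line one way...
git checkout -q -b feature/shout
echo "greeting = 'HELLO'" > app.py
git commit -q -am "Shout the greeting"

# ...then change the same line a different way on development.
git checkout -q development
echo "greeting = 'hi'" > app.py
git commit -q -am "Shorten the greeting"

# Both branches edited the same line, so the merge stops with a conflict.
if git merge feature/shout >/dev/null 2>&1; then
  merge_result="clean"
else
  merge_result="conflict"
fi
echo "Merge result: $merge_result"

# Resolve by writing the version you want, then stage and commit.
echo "greeting = 'HI'" > app.py
git add app.py
git commit -q -m "Merge feature/shout into development"

# Inspect the resulting history.
git log --oneline --graph
```

In a real conflict you would open the file, pick the lines to keep between the conflict markers, and only then stage and commit; overwriting the file here just keeps the sketch short.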