Chapter 4: Clean code

So far you’ve built and optimised a machine learning model in a notebook environment. Before deploying your model and making it available to other services, it is important to make sure the code you write is of high quality. This chapter describes best practices for writing clean Python code.

Python is an elegant language with a syntax that is relatively easy for humans to understand. However, that does not necessarily mean that the Python code you write is clean. Poor-quality code leads to technical debt, which can have a negative impact on the team you’re working in and on the organization as a whole. It can lead to bugs and to a poorer understanding of what the code does. Above all, future development can take a lot longer if you want to add a feature to an existing messy codebase.

[Figure: code quality measurement]

Writing good quality code is a core competence of a machine learning engineer, and it will certainly make your life and that of the others around you easier. Code is typically read much more often than modified, and clean code is especially important for large and complex programs. In this chapter, you will read about guidelines for writing production-grade code and tools that automate some of the stuff for you!

4.1 Guidelines for Writing Production-grade Code

When working on a Python project in a company setting, it is a good idea to agree on guidelines that code has to comply with before being merged into the main branch. Some of these guidelines can be specific to the team you are working in, but most of them apply to Python, or even to programming, in general. Here, we will focus on the latter.

PEP 8

To start with, for Python specifically, there is an official style guide with standards for how code should look. In practice, you will find that this guide is used as the basis for most codebases. It will of course not be the most exciting read of your life, but it is a good idea to read through it and adopt its principles in your own development. An alternative read to the official guide is this Real Python article.

Naming Conventions

Variables, and also files, should have descriptive and unambiguous names.
The following guidelines usually apply:

Use easy to pronounce names
Use snake_case for functions and variables and PascalCase for class names
Set magic numbers as constants instead of hardcoding them (e.g. ‘CUTOFF_THRESHOLD = 50’ instead of hardcoding ‘if n_products > 50’)
Avoid single letter variables, but also variable names that are too long
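As a small illustration of the conventions above, here is a minimal sketch. The class, the method and the threshold value are all made up for this example, so treat them as assumptions rather than as part of any real codebase.

```python
# Hypothetical example: names and threshold value are illustrative only.
CUTOFF_THRESHOLD = 50  # named constant instead of a magic number


class ProductCatalog:  # PascalCase for class names
    def __init__(self, product_names):
        self.product_names = list(product_names)

    def is_large(self):  # snake_case for functions and methods
        n_products = len(self.product_names)
        return n_products > CUTOFF_THRESHOLD
```

If the threshold ever changes, there is now exactly one place to update it, and the name tells the reader what the number means.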

DRY & Atomic Functions

To keep your codebase concise and efficient, apply the Don’t Repeat Yourself (DRY) principle: if you need the same logic in several places, wrap it in a function. When writing those functions, it is a good idea to make them atomic, meaning that each function does exactly one thing. For details and examples please refer to this article.

In general, Python functions should:

Be small and do only one thing.
Have few arguments (fewer than five). Should you need more, consider turning it into a class.
Use docstrings adhering to a consistent format (e.g. Google Docstrings).
Have no duplication.
Have descriptive names.
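The points above could look like this in practice. This is only a sketch: the function names and the preprocessing steps are hypothetical, chosen to show small single-purpose functions with Google-style docstrings that are combined by a third function.

```python
def fill_missing_values(values, fill_value=0.0):
    """Replace missing entries with a fixed value.

    Args:
        values: Sequence of numbers, where None marks a missing entry.
        fill_value: Value to use in place of each missing entry.

    Returns:
        A new list with every None replaced by fill_value.
    """
    return [fill_value if value is None else value for value in values]


def scale_to_unit_range(values):
    """Linearly rescale numbers so the smallest is 0 and the largest is 1.

    Args:
        values: Non-empty sequence of numbers.

    Returns:
        A new list of floats in the range [0, 1]. If all inputs are
        equal, every output is 0.0.
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


def preprocess(values):
    """Fill missing values, then scale the result to the unit range."""
    return scale_to_unit_range(fill_missing_values(values))
```

Because each function does one thing, each can be reused and tested on its own, and preprocess reads almost like a sentence.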

Imports

Imports should always be specific. Avoid wildcard imports (e.g. from sklearn import *), as they pollute the namespace and hide where a name comes from. Other best practices for import statements are described in this paragraph of the official PEP 8 style guide.
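To see what polluting the namespace means, here is a minimal sketch using the standard library's math module instead of sklearn, so that it runs anywhere. The wildcard import is executed in a separate dictionary (via exec, which you would not normally use in production code) purely so that its effect can be inspected.

```python
import math

# Explicit import: it is obvious where sqrt comes from.
from math import sqrt

# Simulate `from math import *` in an isolated namespace, so the effect
# can be inspected without polluting this module.
wildcard_namespace = {}
exec("from math import *", wildcard_namespace)

# The wildcard silently replaced the built-in pow with math.pow,
# which always returns a float.
shadowed_pow = wildcard_namespace["pow"]
```

After the wildcard import, pow no longer refers to the built-in function, and nothing in the code tells the reader so. With the explicit import of sqrt, there is no such surprise.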

Comments

Your code should be as self-explanatory as possible, so it is recommended to add comments only where they are strictly needed. Placing many comments in your code carries the risk of repeating yourself. Moreover, if you forget to update your comments after changing the code, they become outdated and misleading. Linters will not catch this kind of issue.
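The difference between a redundant comment and a useful one can be sketched as follows. The function and its discount rule are hypothetical and only serve as an illustration.

```python
def total_price(prices, discount_rate):
    # Avoid: "sum the prices" merely restates the next line.
    subtotal = sum(prices)
    # Prefer: explain why. The discount applies to the order as a whole,
    # not per item, so it is applied after summing.
    return subtotal * (1 - discount_rate)
```

The first comment adds nothing that the code does not already say. The second records a decision that a reader could not recover from the code alone.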

4.2 Tools to Make Your Life Easier

Being able to write clean code is a great skill for any programmer, but it should not be a goal on its own. Usually, the degree to which your code should be clean depends on a number of factors, of which the most important is the extent to which this codebase will be revisited later on. Fortunately, smart people have come up with a lot of tools to automate some of the previously mentioned stuff for you, so that you can apply guidelines in any project quickly and easily! In this section, some of the most common and helpful tools will be presented that you can use to your advantage.

The IDE

If you are not using an IDE for developing yet, definitely make sure to read chapter 2 first. While notebooks are a great means for data exploration and quick experiments, an IDE should be your weapon of choice to write production code. IDEs support a lot of the tools discussed in this section, built-in or downloadable in the form of extensions.

We will touch upon some specific tools later on, but as a start definitely make sure to familiarize yourself with the built-in methods for refactoring your code. Look up a shortcut cheatsheet for the IDE you work in and use those shortcuts to refactor code quickly and neatly. Some of my personal favorites are renaming a variable everywhere at once (fn+F2 or F2 in VS Code) and extracting selected code into a method (⌘+. or Ctrl+. in VS Code).

Autoformatters

Of course by now you have read the PEP 8 style guide from start to end and learned everything by heart. Are you not that person? Don’t worry, me neither! There are tools that automatically format your code so that the styling adheres to PEP 8. These so-called autoformatters take care of things like line breaks, blank lines and whitespace in the right positions in your codebase.

A popular autoformatter is Black, which calls itself ‘the uncompromising code formatter’: it deliberately offers few options, which keeps the number of discussions about style within a team to a minimum. If you find Black too strict, you can also go for an alternative like autopep8. Make sure to agree on using the same autoformatter when working in a team, and you will find that not only does your code look better and read more easily, but you will also encounter fewer merge conflicts!

Additionally, there is a tool called isort to automatically sort imports alphabetically, separated into sections and by type as well.

Autoformatters can typically be used via built-in buttons or shortcuts in the IDE, be configured to run on save, be applied from the command line or on commit (keep on reading for this one!).
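To give an idea of what an autoformatter does, here is a small before-and-after sketch. The function is made up, and the result is written in the style that a formatter such as Black produces. The exact output depends on the tool, its version and its configuration.

```python
# Before formatting (valid Python, but hard to read):
#
#   def describe( name,scores ):return {'name':name,'best':max( scores )}
#
# After formatting, in the style an autoformatter such as Black produces:
def describe(name, scores):
    return {"name": name, "best": max(scores)}
```

Both versions behave identically. Only the layout changes, which is exactly why this job can be left to a tool.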

Linters

Linters automatically analyze your code to spot potential problems, and sometimes even suggest solutions for those problems. Usually, different levels of issues with code quality can be checked against:

Errors that prevent your code from running, like syntax errors or the use of undefined variables.
Strong warnings, for instance variables or imports that are defined but remain unused.
Weak warnings, mainly for styling issues, like ‘line too long’.

Popular linters for Python are flake8 and pylint, where the latter even grades your code with a score out of 10 for the real back-to-school vibes. These linters are highly configurable and, like autoformatters, they can be applied in various ways.
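As an illustration, the following made-up snippet runs fine but contains two issues that a linter would typically point out. The codes in the comments are the ones flake8 commonly uses for these problems. The exact messages depend on your linter, its version and its configuration.

```python
import os  # flake8 typically reports F401: imported but unused


def mean(values):
    count = len(values)
    unused_total = 0  # flake8 typically reports F841: assigned but never used
    return sum(values) / count
```

Neither issue breaks the program, which is precisely why they tend to survive until a linter (or an attentive reviewer) spots them.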

Testing

Production-grade code usually also requires tests to ensure that your code does what is expected. Functions accompanied by unit tests immediately give other readers of your code an idea of what exactly a function is intended to do. See it as a form of documentation! pytest is a framework for writing and running your tests automatically. You can read more about the art of writing good Python tests in chapter 6.
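A minimal sketch of what such tests can look like is shown below. The clip function is a hypothetical example. The tests use plain assert statements, which is the style pytest encourages.

```python
def clip(value, lower, upper):
    """Limit value to the closed interval [lower, upper]."""
    return max(lower, min(value, upper))


# pytest collects functions whose names start with test_ and reports
# any failing assert.
def test_clip_returns_value_inside_range():
    assert clip(5, 0, 10) == 5


def test_clip_limits_value_outside_range():
    assert clip(-3, 0, 10) == 0
    assert clip(42, 0, 10) == 10
```

In a real project the tests would normally live in a separate file whose name starts with test_, so that running pytest from the command line finds them automatically. Reading the two test names already tells you what clip is supposed to do.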

Type checkers

Python is a dynamically typed language, meaning that type errors are not raised until the moment your system tries to execute the offending piece of code. Adding type hints (introduced in Python 3.5) to your code not only serves as documentation, but also allows the code to be machine-checked with a static type checker, like mypy. Type checkers help ensure that you’re using variables and functions in your code correctly, so they are able to find bugs in your programs without even running them.
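Here is a minimal sketch of type hints in action. The function is hypothetical, and the problematic call is left as a comment so that the snippet still runs.

```python
from typing import Sequence


def average(values: Sequence[float]) -> float:
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


result: float = average([1.0, 2.0, 3.0])

# A static type checker such as mypy would flag the call below as an
# argument type error without running it. At runtime it would only fail
# once executed (with a TypeError).
# average("123")
```

The hints tell the reader what goes in and what comes out, and the type checker verifies that every call site agrees with them.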

Automate the automation with pre-commit!

Programmers tend to be lazy in the sense that if there is still a step to do manually, they will find a way to automate that step as well. pre-commit is a great way to ensure that you will never again commit code to your git repo that is unworthy of production. You only have to configure pre-commit hooks once for things like autoformatting, linting or testing, and pre-commit then ensures that these checks pass whenever you commit your code. This blog with code examples shows you how you can tie everything together with pre-commit and, as a bonus, also gives a tour of another very helpful tool: the Makefile!

4.3 Assignment

Try to make the code as clean and maintainable as possible using the guidelines and tools discussed in this chapter.

In practical terms:

Move the code from your notebooks (including the code that was already there) into Python scripts.
Refactor the code into atomic functions.
Make your code compliant with the PEP 8 style guide.
Review your code with a linter of your choice and implement its suggestions.
Bonus: Set up a workflow with Pre-commit and a Makefile.
Bonus 2: Add type hints to your functions and classes and enforce them with mypy.
