
About Vantage 101

Welcome to Vantage 101, the machine learning engineering course that bridges the gap between university and the business world. This course is designed to help you quickly grasp the best practices you wish you had known at the beginning of your career.


Chapter 6: Testing

By now you have read quite a bit about how to write good, reliable code. While we can evaluate our code at the time of writing, it is a good idea to formalize this evaluation into tests. Tests check the correctness of your code and catch typos and bugs at every code revision. Moreover, tests can demonstrate that your code implements the desired solution. Read more about the relevance of unit testing in this blog.

Be aware that there are many different types of tests that you can use to make sure that your code functions as it should. Writing good tests is a field of expertise in its own right, but as an ML engineer it is important that you are familiar with setting up tests and writing unit tests for your project.

Testing code is a common practice in software development, and several testing frameworks are available for Python, for example unittest (built into Python), nose and doctest. In this chapter we will focus on the tool of our choice: PyTest.

Writing tests

A test within the PyTest suite is written as follows:

import time

def test_patience():
    time.sleep(10)
    assert "patience" not in "programmer"

A test is a function whose name starts with test. It begins with some setup (the arrangement) and ends with an assert statement. It's possible to have multiple assert statements in one test, but it is recommended to have just one assert per test. You have complete freedom in the part before the assert; however, following a common pattern such as Arrange-Act-Assert adds more structure.
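As a minimal sketch of the Arrange-Act-Assert pattern, here is a test for a small function. The apply_discount function is hypothetical and only serves as something to test; it is not part of the course code.

```python
def apply_discount(price: float, percentage: float) -> float:
    """Hypothetical function under test: reduce a price by a percentage."""
    return round(price * (1 - percentage / 100), 2)


def test_apply_discount_reduces_price():
    # Arrange: set up the inputs and the expected result.
    price = 80.0
    percentage = 25.0
    expected = 60.0

    # Act: call the code under test.
    result = apply_discount(price, percentage)

    # Assert: check a single outcome.
    assert result == expected
```

Keeping the three phases visually separate makes it easy to see what a test sets up, what it exercises and what it checks.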

You can run the test by executing pytest on the command line.

$ pytest
=================== test session starts ====================
platform win32 -- Python 3.7.11, pytest-6.2.4, py-1.11.0, pluggy-0.13.1
cachedir: tests\.pytest_cache
rootdir: C:\Users\...\testing_for_data_science
collected 1 item

tests\test_1.py .                                     [100%]
==================== 1 passed in 10.04s ====================

The output shows some details about the test environment and ends with a summary: the single test passed in 10.04 seconds, nearly all of which was spent in the time.sleep(10) call.

Test discovery

Test discovery is the process of finding the tests in your codebase. For pytest, it mainly comes down to:

  1. Look for files called test_*.py or *_test.py.
  2. Within those files, collect functions whose names start with test, and methods starting with test inside classes whose names start with Test (and that have no __init__ method).
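To make these rules concrete, here is a sketch of a test module. The file name tests/test_discovery.py and all function and class names are made up for illustration. With pytest's default settings, running pytest should collect exactly two tests from it: test_collected and TestStrings::test_upper.

```python
# Contents of a hypothetical file tests/test_discovery.py


def test_collected():
    # Collected: module-level function whose name starts with "test".
    assert "py" in "pytest"


def helper_not_collected():
    # Not collected: the name does not start with "test".
    return 42


class TestStrings:
    # Collected: the class name starts with "Test" and it has no __init__.
    def test_upper(self):
        # Collected: the method name starts with "test".
        assert "abc".upper() == "ABC"

    def check_lower(self):
        # Not collected: the method name does not start with "test".
        assert "ABC".lower() == "abc"


class StringChecks:
    # Not collected: the class name does not start with "Test".
    def test_strip(self):
        assert " a ".strip() == "a"
```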

Note: pytest can also run tests written in the unittest style, but not the other way around: unittest does not discover plain test functions and does not support pytest features such as fixtures.
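A small illustration of this note, with made-up test names: pytest runs both tests in the module below, whereas unittest's own runner only picks up the TestCase subclass.

```python
import unittest


class TestMean(unittest.TestCase):
    # unittest style: a TestCase subclass using assert* methods.
    # pytest collects and runs this class as well.
    def test_mean_of_three_numbers(self):
        values = [1.0, 2.0, 3.0]
        self.assertEqual(sum(values) / len(values), 2.0)


# pytest style: a plain function with a bare assert.
# unittest's runner does not discover module-level functions like this one.
def test_mean_of_two_numbers():
    values = [4.0, 6.0]
    assert sum(values) / len(values) == 5.0
```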

A common structure is a separate tests folder that mirrors the file structure of your package:

pyproject.toml
setup.cfg
mypkg/
    __init__.py
    app.py
    view.py
tests/
    test_app.py
    test_view.py
    ...

Workflow

It's essential to integrate testing into your workflow. Here are some steps you can take to achieve this:

  • Run tests in your IDE, for example in VS Code.
  • When a test fails, your debugger can help to identify the problem.
  • There is no strict rule for how many tests you should write. The tests should make you feel confident about the robustness of your code. A first guideline is to aim for a test coverage above 80%, which you can measure with the pytest-cov plugin (for example pytest --cov=mypkg).
  • Integrate tests in your CI/CD pipeline. Block merge requests if a test fails.

Advanced topics

If you want to learn how to write tests more efficiently, you can experiment with the following features and practices:

  • Marks can be seen as labels to quickly select or deselect (groups of) tests. In some cases they also change the behaviour of a test.
  • An example of this is parametrization (@pytest.mark.parametrize): a mark that enables you to run the same test on different inputs.
  • Fixtures are reusable pieces of setup for tests. They eliminate duplicate code, and fixtures with a wider scope can speed up tests by reusing expensive setup.
  • Fixtures placed in conftest.py (a reserved filename) are available to all test files in that directory and its subdirectories.
  • Fast tests get run often. To keep tests fast, mock code outside your project (such as external APIs and databases) and cache heavy computations.
  • Tests should be repeatable, so all the data used in tests should be part of the repository.
  • Use pytest.approx to compare floating-point numbers and arrays within a tolerance, and pandas.testing.assert_frame_equal to compare dataframes.
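A minimal sketch of marks, parametrization and fixtures, assuming a hypothetical square function as the code under test. The slow mark is a custom label chosen for this example, not a built-in pytest mark.

```python
import pytest


def square(x: float) -> float:
    """Hypothetical function under test."""
    return x * x


# Parametrization: the same test runs once per (value, expected) pair,
# and pytest reports each case as a separate test.
@pytest.mark.parametrize(
    "value,expected",
    [(0, 0), (2, 4), (-3, 9)],
)
def test_square(value, expected):
    assert square(value) == expected


# A custom mark used as a label. Register it under "markers" in your pytest
# configuration to avoid an unknown-mark warning, then deselect it with:
#   pytest -m "not slow"
@pytest.mark.slow
def test_square_of_large_number():
    assert square(10**6) == 10**12


# A fixture provides reusable setup. With scope="module" it is created once
# per test module instead of once per test, which helps with expensive setup.
@pytest.fixture(scope="module")
def sample_values():
    return [1.0, 2.0, 3.0, 4.0]


# pytest injects the fixture by matching the argument name.
def test_mean_of_sample(sample_values):
    assert sum(sample_values) / len(sample_values) == 2.5
```

Moving sample_values into a conftest.py file would make it available to every test file in that directory without any import.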
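A second sketch covers mocking and tolerant comparisons. The average_price function and the client's get_prices method are hypothetical stand-ins for code that calls an external API; Mock comes from Python's standard unittest.mock module, which works inside pytest tests. For dataframes you would use pandas.testing.assert_frame_equal in the same spirit.

```python
from unittest.mock import Mock

import pytest


def average_price(client, product_id: str) -> float:
    """Hypothetical function under test.

    `client` stands in for an object that talks to an external pricing API;
    its get_prices method is an assumption made for this example.
    """
    prices = client.get_prices(product_id)
    return sum(prices) / len(prices)


def test_average_price_with_mocked_client():
    # Replace the external client with a Mock: no network calls, so the
    # test is fast and repeatable.
    fake_client = Mock()
    fake_client.get_prices.return_value = [0.1, 0.2, 0.3]

    result = average_price(fake_client, "product-1")

    # The float sum 0.1 + 0.2 + 0.3 is not exactly 0.6,
    # so compare with a tolerance using pytest.approx.
    assert result == pytest.approx(0.2)


def test_approx_on_lists():
    # pytest.approx also compares sequences (and NumPy arrays) element-wise.
    assert [0.1 + 0.2, 0.2 + 0.4] == pytest.approx([0.3, 0.6])
```

Passing the client in as an argument is one design choice that makes mocking easy; patching with unittest.mock.patch or pytest's monkeypatch fixture are common alternatives.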

Assignment

For your Vantage 101 case you have (re)written several functions. Write unit tests for these functions until you reach a test coverage of at least 80%.

You are free to choose which library you want to use, but we highly recommend PyTest.
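As an illustration of the kind of tests the assignment asks for, here is a sketch for a hypothetical min_max_scale function (not part of the course code). Note how the edge cases, constant and empty input, get their own tests: those branches also count toward coverage.

```python
from typing import List

import pytest


def min_max_scale(values: List[float]) -> List[float]:
    """Hypothetical function: rescale values linearly to the range [0, 1]."""
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if low == high:
        # All values are equal; map them to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def test_min_max_scale_typical_input():
    assert min_max_scale([2.0, 4.0, 6.0]) == pytest.approx([0.0, 0.5, 1.0])


def test_min_max_scale_constant_input():
    assert min_max_scale([3.0, 3.0]) == [0.0, 0.0]


def test_min_max_scale_empty_input_raises():
    # pytest.raises checks that the expected exception is raised.
    with pytest.raises(ValueError):
        min_max_scale([])
```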