By now you have read quite a bit about how to write good, reliable code. While we can evaluate our code at the time of writing, it is a good idea to formalize this evaluation into tests. Tests help verify the correctness of your code by catching typos and bugs at each code revision. Moreover, tests can demonstrate that your code implements the desired solution. Read more about the relevance of unit testing in this blog.
Be aware that there are many different types of tests that you can use to make sure that your code is functioning as it should. Writing good tests is a complete field of expertise on its own, but as an ML engineer it is important that you are familiar with setting up tests and writing unit tests for your project.
Testing code is a common practice in software development, and for Python there are several testing suites available. For example:
- unittest (built into Python),
- doctest.
But in this blog we will focus on the tool of our choice: PyTest.
A test within the PyTest suite is written as follows:
    import time

    def test_patience():
        time.sleep(10)
        assert "patience" not in "programmer"
It is a function that starts with some arrangements and ends with an assert. It is possible to have multiple assert statements in one test, but it is recommended to have just one assert per test. You have complete freedom in the part before the assert; however, following a common pattern such as Arrange-Act-Assert adds more structure.
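As a minimal sketch of the Arrange-Act-Assert pattern, one common way to structure a test, the example below tests a hypothetical `normalize` helper. The function name and its behaviour are assumptions made purely for illustration.

```python
def normalize(values):
    """Hypothetical helper: scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # Arrange: prepare the input data
    values = [2, 3, 5]

    # Act: call the code under test
    result = normalize(values)

    # Assert: check a single expected outcome
    assert abs(sum(result) - 1.0) < 1e-9
```

Keeping the three phases visually separate makes it easier to see what a failing test was actually checking.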
You can run the test by executing pytest on the command line.
    $ pytest
    =================== test session starts ====================
    platform win32 -- Python 3.7.11, pytest-6.2.4, py-1.11.0, pluggy-0.13.1
    cachedir: tests\.pytest_cache
    rootdir: C:\Users\...\testing_for_data_science
    collected 1 item

    tests/test_marks.py tests\test_1.py .                 [100%]

    ==================== 1 passed in 10.04s ====================
The output shows some details about the test environment and ends with the one test that passed in 10.04 seconds.
Test discovery is the process of finding the tests in your codebase. Pytest's test discovery mainly comes down to:
- It looks for files called test_*.py or *_test.py.
- Within those files, it selects test-prefixed functions, or test-prefixed methods within a Test-prefixed test class.
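To illustrate these naming rules, here is a sketch of what a file named `tests/test_example.py` could contain. The filename and all function names are hypothetical. Assuming pytest's default settings, the test-prefixed function and the test-prefixed method inside `TestMath` would be collected, while the helpers would not.

```python
# Contents of a hypothetical file: tests/test_example.py

def make_numbers():
    # Not collected: the name does not start with "test"
    return [1, 2, 3]


def test_sum_of_numbers():
    # Collected: test-prefixed function
    assert sum(make_numbers()) == 6


class TestMath:
    # Test-prefixed class; pytest expects such classes not to define __init__
    def test_max_of_numbers(self):
        # Collected: test-prefixed method inside a Test-prefixed class
        assert max(make_numbers()) == 3

    def helper(self):
        # Not collected: the method name lacks the "test" prefix
        return None
```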
Note: Tests written in unittest style can also be run by Pytest, but not the other way around.
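For comparison, a unittest-style test could look like the sketch below. The class and method names are made up for illustration. Pytest can collect and run a `unittest.TestCase` subclass like this one, whereas the unittest runner does not pick up plain pytest-style test functions.

```python
import unittest


class TestStringMethods(unittest.TestCase):
    def test_upper(self):
        # unittest uses assertion methods instead of the bare assert statement
        self.assertEqual("patience".upper(), "PATIENCE")

    def test_not_in(self):
        self.assertNotIn("patience", "programmer")
```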
A common layout is to create a separate tests folder that mimics the file structure of your package.
    pyproject.toml
    setup.cfg
    mypkg/
        __init__.py
        app.py
        view.py
    tests/
        test_app.py
        test_view.py
        ...
It's essential to integrate testing in your workflow. Here are some steps you can take to achieve this:
- Run tests in your IDE, for example in VS Code.
- When a test fails, your debugger can help to identify the problem.
- There is not a strict set of rules to determine how many tests you should write. The tests should make you feel confident about the robustness of your code. A first guideline is to aim for a test coverage above 80%.
- Integrate tests in your CI/CD pipeline. Block merge requests if a test fails.
If you want to learn more about how to write tests more efficiently, you can experiment with the following features and guidelines in your tests:
- Marks can be seen as labels to quickly select or deselect (groups of) tests. In some cases they also change the behaviour of a test.
- An example of this is Parametrization: a mark that enables you to run the same test on different cases.
- Fixtures are reusable parts for tests. They eliminate duplicate code and can speed up tests.
- Fixtures placed in conftest.py (a reserved filename) are available in all files in that directory and its subdirectories.
- Making tests fast ensures that they are executed often. To achieve this you can mock all code outside your project and cache heavy computations.
- Tests should be repeatable, so all the data used in tests should be part of the repository.
- Use pytest.approx to compare floating-point numbers and arrays, and the pandas function assert_frame_equal (from pandas.testing) to compare dataframes.
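The sketch below combines several of these features in one hypothetical test module. The function `add_vat`, the fixture `prices`, and the 21% rate are assumptions chosen for illustration. `pytest.fixture`, `pytest.mark.parametrize`, and `pytest.approx` are part of pytest itself, and `unittest.mock.patch` comes from the standard library.

```python
import time
from unittest.mock import patch

import pytest


def add_vat(price, rate=0.21):
    """Hypothetical function under test: add VAT to a net price."""
    return price * (1 + rate)


@pytest.fixture
def prices():
    # Reusable test data; pytest injects it into tests that name it as an argument
    return [10.0, 20.0, 30.0]


@pytest.mark.parametrize(
    "price, expected",
    [(100.0, 121.0), (0.0, 0.0), (50.0, 60.5)],
)
def test_add_vat(price, expected):
    # The same test body runs once per (price, expected) pair
    assert add_vat(price) == pytest.approx(expected)


def test_add_vat_on_list(prices):
    result = [add_vat(p) for p in prices]
    # approx also compares sequences element-wise, within a tolerance
    assert result == pytest.approx([12.1, 24.2, 36.3])


def test_sleep_is_mocked():
    # Replace time.sleep with a mock so the test does not actually wait
    with patch("time.sleep") as fake_sleep:
        time.sleep(10)
    fake_sleep.assert_called_once_with(10)
```

Running pytest with the `-v` flag on such a file reports each parametrized case as a separate test, which makes it easy to see which input failed.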
For your Vantage 101 case you have (re)written several functions. Write unit tests for these functions until you reach a test coverage of at least 80%.
You are free to choose which library you want to use, but we highly recommend PyTest.