Contributor Guide#
There are multiple ways to contribute to this project. You can report bugs in this library or propose new ideas via Github issues. This guide describes how to contribute code or documentation.
Installation#
Contributors should not install the package from PyPI but from the Github repository to get the latest version and to be able to manipulate the code. First download the repository and install the project in developer mode with developer dependencies.
$ git clone git@github.com/cblearn/cblearn.git
$ cd cblearn
$ pip install -e.[tests,docs]
The -e option installs the package in developer mode such that changes in the code are considered directly without re-installation.
- tests
To run the unit tests, the
pytestpackage is required, which can be installed by adding thetestsoption to the install command.- docs
Building these docs requires the
sphinxpackage, which can be installed by adding the docs option to the install command.
Now you can run the tests and build the documentation:
$ python -m pytest --remote-data # should run all tests; this can take a while.
$ cd docs
$ make html # should generate docs/_build/html/index.html
The project directory contains the code directory cblearn/ and the documentation docs/.
In addition, the folder contains a readme, license, multiple configuration files, and an examples folder.
Changing Code#
The Python code is structured in Importing Modules. Each module contains
a tests folder with unit tests.
There should be such a test for every method and function.
Use pytest --cov to run these tests and to measure the coverage; no tests should fail.
The coverage indicates the tested fraction of code and should be close to 100%.
All Python code follows the PEP8 Style Guide. The style
of all code can be checked, running flake8 . and should print no warnings.
Every class, method, and function should have a docstring describing the functionality and parameters.
Please follow the Google Docstring Style.
The docstring will be added to the API Reference by adding the function name in docs/references/index.rst.
Check the syntax of the docstring by running make html in the docs/ folder.
Types should not be added to the docstring but in the code as type hints.
Typechecks can be performed using mypy cblearn.
Remote data tests#
Tests that require remote data, for example fetching a dataset from the internet, are marked with @pytest.mark.remote_data
or +REMOTE_DATA (docstring).
These tests are skipped by default but can be run by adding the --remote-data flag to pytest.
Scikit-learn estimator tests#
scikit-learn provides a test suite that should ensure the compatibility of estimators.
The estimator classes that require triplet data should return
‘triplets’=True in the _get_tags method.
Based on this tag, our test suite extends the sklearn estimator test to handle comparison-based estimators.
This modification is not unusual; sklearn internally modifies the data and skips individual tests silently based on different tags (e.g. pairwise).
The modifications are:
Monkey-patching of
check_estimatorfunction to create triplets instead of featurized data.Skipping
check_methods_subset_invarianceandcheck_methods_sample_order_invarianceThese tests require a 1-to-1 relationship for X -> .transform(X). This will never be true for our estimators (n-to-m). The alternative to skipping them here would be the ‘non_deterministic’ tag, which would trigger sklearn to skip these but also additional tests.
All sklearn estimator tests can be skipped with pytest -m "not sklearn.
Changing Documentation#
The documentation is contained in the docs/ folder.
It can be built by running make html.
Open docs/_build/html/index.html in a browser to view the local build of the documentation.
The documentation is structured in multiple folders and written as reStructuredText.
Excursion: Run Github Tests Locally#
Instead of running the different tests above independently, it is also possible to run the whole testing workflow, which is used on Github, locally.
Install nektos’ act and then run act -P ubuntu-latest=nektos/act-environments-ubuntu:18.04-full
act uses docker images with preinstalled software to provide almost the same test environment as Github. If it is not yet so, you have to install docker and, optionally, make it accessible for non-root users.
Note
The docker image requires about 18 GB disk space. The first start of act might take some time, because it downloads about 12 GB of image files.
Publish Changes#
Most contributions will change files in the code or the documentation directory, as described in the
sections below. Commit your changes to a separate git branch (do not commit to master).
After changing, push this branch to Github and open a pull request to the master branch there.
Once the request is opened, automated tests are run.
If these tests indicate a problem, you can fix this problem on your branch and push again.
Once the automated tests are successful, maintainers of cblearn will review the changes and provide feedback.
Usually, after some iterations, your changes will be merged into the main branch.
Versions should be semantic and follow PIP440: The version indicates major.minor.fix;
breaking changes are just allowed with major version steps.
A Github release tag indicates a new version, which triggers a continuous deployment to PyPI via Github Actions.