Contributor Guide
There are multiple ways to contribute to this project. You can report bugs in this library or propose new ideas via GitHub issues. This guide describes how to contribute code or documentation.
Getting Started
We assume you downloaded and installed cblearn as described in Contributor Installation.
The project directory contains the code directory cblearn/ and the documentation directory docs/.
In addition, the folder contains a readme, license, multiple configuration files, and an examples folder.
Changing Code
The Python code is structured in modules (see Importing Modules). Each module contains a tests folder with unit tests.
There should be such a test for every method and function.
Use pytest --cov to run these tests and to measure the coverage; no tests should fail.
The coverage indicates the tested fraction of code and should be close to 100%.
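As a rough illustration of what such a unit test can look like, the sketch below tests a made-up helper. The function normalize and both test names are hypothetical and not part of cblearn. pytest collects functions whose names start with test_ and reports a failure whenever an assert does not hold.

```python
def normalize(values):
    """Hypothetical function under test: scale values so they sum to one."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # The scaled values should add up to one (up to rounding).
    result = normalize([1, 1, 2])
    assert abs(sum(result) - 1.0) < 1e-12


def test_normalize_keeps_proportions():
    # The last value is twice as large as the others before and after scaling.
    result = normalize([1, 1, 2])
    assert result == [0.25, 0.25, 0.5]
```

In a real module, tests like these would live in a file inside the module's tests folder, so that pytest --cov picks them up together with the other tests.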
All Python code follows the PEP 8 Style Guide. Check the style of all code by running flake8 . in the project directory; it should print no warnings.
Every class, method, and function should have a docstring describing the functionality and parameters.
Please follow the Google Docstring Style.
The docstring will be added to the API Reference by adding the function name in docs/references/index.rst.
Check the syntax of the docstring by running make html in the docs/ folder.
Types should not be added to the docstring but to the code as type hints.
Type checks can be performed using mypy cblearn.
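The following sketch shows how a Google-style docstring and type hints can fit together. The function count_objects is a hypothetical example, not an existing cblearn function. Note that the types appear only in the signature, while the docstring describes the meaning of the parameters.

```python
from typing import Sequence


def count_objects(triplets: Sequence[Sequence[int]]) -> int:
    """Count the distinct objects referenced in a list of triplets.

    Args:
        triplets: Index triplets, one comparison per entry.

    Returns:
        The number of distinct object indices.

    Raises:
        ValueError: If an entry does not contain exactly three indices.
    """
    objects = set()
    for triplet in triplets:
        if len(triplet) != 3:
            raise ValueError("each triplet needs exactly three indices")
        objects.update(triplet)
    return len(objects)
```

A call such as count_objects([(0, 1, 2), (2, 3, 0)]) passes the type check, whereas passing a plain integer would be reported by mypy.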
Remote data tests
Tests that require remote data, for example fetching a dataset from the internet, are marked with @pytest.mark.remote_data or, in docstrings, with +REMOTE_DATA.
These tests are skipped by default but can be run by adding the --remote-data flag to pytest.
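A minimal sketch of both markers is shown below. The function fetch_example_dataset is a hypothetical stand-in that returns fixed values instead of downloading anything. The doctest directive is written in the form # doctest: +REMOTE_DATA, which assumes that the doctest plugin used by the project understands this flag.

```python
import pytest


def fetch_example_dataset():
    """Hypothetical stand-in for a function that downloads a dataset.

    >>> data = fetch_example_dataset()  # doctest: +REMOTE_DATA
    >>> len(data)
    3
    """
    # A real implementation would download the data here.
    return [(0, 1, 2), (1, 2, 0), (2, 0, 1)]


@pytest.mark.remote_data
def test_fetch_example_dataset():
    # Skipped unless pytest is started with the --remote-data flag.
    data = fetch_example_dataset()
    assert len(data) == 3
```

With this marker in place, a plain pytest run skips the test, and pytest --remote-data includes it.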
Scikit-learn estimator tests
scikit-learn provides a test suite that should ensure the compatibility of estimators.
We use this test suite to test our estimators, too, but have to skip some tests because they use artificial data that is incompatible with comparison data. Typically, cblearn estimators are compatible with scikit-learn estimators if comparisons are represented as numpy arrays. From an API perspective, comparison arrays look like discrete features and class labels; however, not all discrete features and class labels are valid comparisons.
In the future scikit-learn might simplify the usage of custom data generation routines during the compatibility tests. Otherwise, we might replace those incompatible tests with our own.
All sklearn estimator tests can be skipped with pytest -m "not sklearn".
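To illustrate why generic integer features are not always valid comparisons, the sketch below checks an array against one plausible set of rules. It assumes that a comparison is a triplet of three distinct object indices in the range 0 to n_objects - 1; this is an assumption for illustration, and the helper looks_like_triplets is hypothetical, not the validation that cblearn performs.

```python
import numpy as np


def looks_like_triplets(X, n_objects):
    """Hypothetical check: is X an array of index triplets with distinct entries?"""
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[1] != 3 or X.size == 0:
        return False
    if not np.issubdtype(X.dtype, np.integer):
        return False
    if X.min() < 0 or X.max() >= n_objects:
        return False
    # Assumption: the three indices of a comparison must refer to different objects.
    distinct = (X[:, 0] != X[:, 1]) & (X[:, 0] != X[:, 2]) & (X[:, 1] != X[:, 2])
    return bool(distinct.all())


# Rows with three distinct indices pass the check.
valid = np.array([[0, 1, 2], [3, 0, 1]])
# Discrete features as a generic test might generate them: indices repeat within a row.
generic = np.array([[2, 2, 5], [0, 1, 1]])
```

Both arrays have the same shape and dtype, so they are indistinguishable from an API perspective, but only the first one satisfies the assumed rules.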
Changing Documentation
The documentation is contained in the docs/ folder.
It can be built by running make html.
Open docs/_build/html/index.html in a browser to view the local build of the documentation.
The documentation is structured in multiple folders and written as reStructuredText.
Excursion: Run Github Tests Locally
Instead of running the different tests above independently, it is also possible to run the whole testing workflow, which is used on GitHub, locally.
Install nektos’ act and then run act -P ubuntu-latest=nektos/act-environments-ubuntu:18.04-full.
act uses Docker images with preinstalled software to provide almost the same test environment as GitHub. If Docker is not yet installed, you have to install it and, optionally, make it accessible for non-root users.
Note
The Docker image requires about 18 GB of disk space. The first start of act might take some time because it downloads about 12 GB of image files.
Publish Changes
Most contributions will change files in the code or the documentation directory, as described in the sections above. Commit your changes to a separate git branch (do not commit to master).
After committing, push this branch to GitHub and open a pull request to the master branch there.
Once the request is opened, automated tests are run.
If these tests indicate a problem, you can fix this problem on your branch and push again.
Once the automated tests are successful, maintainers of cblearn will review the changes and provide feedback.
Usually, after some iterations, your changes will be merged into the main branch.
Versions should be semantic and follow PEP 440: the version indicates major.minor.fix; breaking changes are only allowed with major version steps.
A GitHub release tag indicates a new version, which triggers a continuous deployment to PyPI via GitHub Actions.
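The versioning rule can be sketched as a small helper that computes the next version for a given kind of change. The names parse_version and next_version are hypothetical. The sketch only handles plain major.minor.fix strings, although PEP 440 allows further forms such as pre-releases, and mapping new features to a minor step is the usual semantic versioning convention rather than something stated in this guide.

```python
def parse_version(version):
    """Split a plain 'major.minor.fix' string into three integers."""
    major, minor, fix = (int(part) for part in version.split("."))
    return major, minor, fix


def next_version(current, change):
    """Return the next version for a 'breaking', 'feature', or 'fix' change."""
    major, minor, fix = parse_version(current)
    if change == "breaking":
        # Breaking changes are only allowed with a major version step.
        return f"{major + 1}.0.0"
    if change == "feature":
        # Assumption: backwards-compatible features increase the minor version.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{fix + 1}"
    raise ValueError(f"unknown kind of change: {change!r}")
```

For example, a breaking change on top of version 1.4.2 would lead to 2.0.0, while a bug fix would lead to 1.4.3.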