.. _getting_started: ================ Getting Started ================ ----- Setup ----- ``cblearn`` requires Python 3.9 or newer. The package is mainly tested on Linux, but Windows and Mac OS should work, too. Python environment ================== The easiest way to install Python and its dependencies is using a Anaconda_ environment or similar, because dependencies do not conflict with other Python packages you may have installed and the Python version can be specified. .. _Anaconda: https://docs.anaconda.com/anaconda/install/ .. code-block:: bash conda create -n cblearn python==3.9 conda activate cblearn Install cblearn =================== ``cblearn`` and can be installed using `pip`: .. code-block:: bash pip install cblearn This will install the minimal set of required packages, sufficient for most uses and saving disk space. However, some features require more packages that can be installed by adding an ``option`` to the install command. .. _extras_install: Install Extra Requirements ========================== Extra requirements can be installed by adding an ``option`` to the install command and enable more advanced features. Some of those extra dependencies need non-Python packages to be installed first. .. code-block:: bash $ pip install cblearn[torch,wrapper] h5py torch Most estimators provide an (optional) implementation using ``pytorch`` to run large datasets on CPU and GPU. This requires the ```pytorch`` `_ package to be installed manually or by adding the ``torch`` extras option to the install command. Note that ``pytorch`` might need about 1GB of disk space. wrapper The estimators in :ref:`references_embedding_wrapper` provide an Python interface to the original implementation in ``R``-lang. This requires the ``rpy2`` package to be installed by adding the ``wrapper`` option to the install command. Additionally, this requires an installed ``R`` interpreter whit must available be in the ``PATH`` environment variable. The ``R`` packages are installed automatically upon the first use of the estimators. h5py The function :func:`cblearn.datasets.fetch_imagenet_similarity` requires the ``h5py`` package to load the dataset. This can package can be installed with pip. Note that some platforms require additionally the ``hdf5`` libraries to be installed `manually `_. ----------- Quick Start ----------- `cblearn` is designed to be easy to use. The following example generates triplets from a point cloud, each specifying if point A is closer to point B or C, and fits an ordinal embedding model to the triplets. This ordinal embedding model is then used to predict the relative distances between the points. .. literalinclude:: quickstart.py :language: python :linenos: The output should show a trend similar to the following:: Triplets | Error (SSE) ---------------------- 25 | 0.913 100 | 0.278 400 | 0.053 1600 | 0.001 The Procrustes distance measures the sum of squared errors between points and embedding after aligning the embedding to the points (i.e., by optimizing rotating, translation, and scaling). The error approaches zero, demonstrating that the relative distances in the point cloud can be reconstructed from triplets only once enough are available. The triplet generator's `result_format` option specifies the expected data format of the triplets, as triplets can be represented in different ways. This example uses the `list-order` format, a list of triplets, containing the indices of an anchor, near, and far point. Learn more about data formats and other aspects of the library in the :ref:`user_guide`. Alternatively, you can find more code in the :ref:`examples` or get an overview of the :ref:`api`.