User Guide

Most machine learning algorithms use numerical training data (features) for inference, representing either points in a Euclidean space, similarities, or distances. There are settings, e.g. in human studies, where metric points are not available and only ordinal comparisons can be collected. Comparison-based learning algorithms are the machine learning algorithms applicable in this setting.

Triplet comparisons

Triplet comparisons are the most common form of ordinal comparisons. For the triplet of objects \((i, j, k)\) one can ask, “Is the object i more similar to the object j or k?”. For the unknown points \((x_i, x_j, x_k)\) and the distance metric \(\delta\), the question corresponds to the following inequality:

\[\delta(x_i, x_j) \le \delta(x_i, x_k).\]
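As a small worked example of this inequality, the following sketch checks a triplet for known points. The points and the helper name `triplet_holds` are hypothetical, and the Euclidean distance is assumed for \(\delta\); in a real comparison-based setting the points themselves are unknown.

```python
import numpy as np

# Hypothetical 2D points; in practice the points x_i are unknown.
points = np.array([
    [0.0, 0.0],  # object 0
    [1.0, 0.0],  # object 1
    [3.0, 4.0],  # object 2
])


def triplet_holds(points, i, j, k):
    """Return True if delta(x_i, x_j) <= delta(x_i, x_k) for Euclidean delta."""
    d_ij = np.linalg.norm(points[i] - points[j])
    d_ik = np.linalg.norm(points[i] - points[k])
    return bool(d_ij <= d_ik)


# Object 0 is at distance 1.0 from object 1 and 5.0 from object 2.
answer = triplet_holds(points, 0, 1, 2)  # True
```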

This library supports two representations of triplets: an array form and a sparse matrix form. The array form uses 2D numpy arrays with one triplet per row and one column each for i, j, and k. Instead of encoding the answer in the column order, an additional response array containing 1 or -1 can specify whether (i, j, k) is correct or wrong. The sparse matrix form is an alternative representation in which the triplets are specified by the matrix indices and the entries are 1 or -1.
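The two variants of the array form can be illustrated with plain numpy. This is only a sketch of the data layout described above; the helper `to_ordered` is a hypothetical name and not a function of this library, and the sparse matrix form is not shown.

```python
import numpy as np

# Ordered form: each row (i, j, k) states that i is closer to j than to k.
ordered = np.array([[0, 1, 2],
                    [1, 0, 2],
                    [2, 1, 0]])

# Response form: rows in arbitrary column order plus a response array,
# where 1 means the row is correct and -1 means it is wrong.
triplets = np.array([[0, 1, 2],
                     [1, 2, 0],
                     [2, 1, 0]])
responses = np.array([1, -1, 1])


def to_ordered(triplets, responses):
    """Swap columns j and k in every row whose response is -1."""
    result = triplets.copy()
    wrong = responses == -1
    result[wrong] = triplets[wrong][:, [0, 2, 1]]
    return result


# Both forms carry the same information.
converted = to_ordered(triplets, responses)
```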

Scikit-learn compatibility

All estimators in this library are compatible with the scikit-learn API and can be used in scikit-learn pipelines if comparisons are represented in the array format. The scikit-learn compatibility is achieved by deriving from the BaseEstimator class and implementing the fit, predict, and score methods.

The fit method is used to train the model, the predict method is used to predict the labels of the test data, and the score method is used to evaluate the model on the test data. In the case of ordinal embedding, for example, the predict method returns the triplet response according to the embedding and the score method returns the triplet accuracy (the fraction of correct triplet responses).
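To make the triplet accuracy concrete, here is a minimal numpy sketch of what predict and score compute conceptually for an ordinal embedding. It is not the library's implementation; the function names, the toy embedding, and the use of the Euclidean distance are assumptions for illustration.

```python
import numpy as np


def predict_triplets(embedding, triplets):
    """Return 1 where the embedding places i closer to j than to k, else -1."""
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    d_ij = np.linalg.norm(embedding[i] - embedding[j], axis=1)
    d_ik = np.linalg.norm(embedding[i] - embedding[k], axis=1)
    return np.where(d_ij <= d_ik, 1, -1)


def triplet_accuracy(embedding, triplets, responses):
    """Fraction of triplets whose predicted response equals the given one."""
    predicted = predict_triplets(embedding, triplets)
    return float(np.mean(predicted == responses))


# Toy 1D embedding of three objects at positions 0, 1, and 3.
embedding = np.array([[0.0], [1.0], [3.0]])
triplets = np.array([[0, 1, 2], [2, 0, 1], [1, 2, 0], [0, 2, 1]])
responses = np.array([1, 1, 1, -1])

accuracy = triplet_accuracy(embedding, triplets, responses)  # 0.5
```

In this toy example the embedding agrees with the first and last response and disagrees with the two in the middle, so the accuracy is 0.5.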

The Ordinal Embedding example shows how to use a scikit-learn cross validation function with an ordinal embedding estimator.

Pytorch backend (CPU/GPU)

The default backend for computations is the scipy stack, optimized for fast CPU computations and minimal overhead in both compute and disk space. However, this comes with limitations when implementing new methods and for calculations with very large data sets.

As an alternative, a pytorch implementation exists for some estimators. To use this implementation, pytorch must be installed (see Extra Requirements) and, if necessary, the option backend='torch' must be set (see the respective function documentation).

These estimators automatically handle the data transfer between numpy and torch (the internal data representation) and use a batched optimizer for faster convergence. If a CUDA GPU is available, the computations are automatically performed on the GPU.

However, pytorch itself needs a lot of disk space, and starting the optimization has a certain overhead (automatic differentiation, data transformation).

It is therefore advisable to use the scipy backend by default and only change if necessary.
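This advice could be encoded in a small helper like the one below. The function `choose_backend` and its threshold are hypothetical and not part of this library; the sketch only returns a backend name and checks whether torch is importable using the standard library.

```python
import importlib.util


def choose_backend(n_triplets, large_threshold=1_000_000):
    """Return 'torch' only for large inputs when torch is installed.

    The threshold is an arbitrary assumption for this sketch and should
    be tuned to the actual data and hardware.
    """
    torch_installed = importlib.util.find_spec("torch") is not None
    if torch_installed and n_triplets >= large_threshold:
        return "torch"
    return "scipy"


# Small problems stay on the default scipy backend.
backend = choose_backend(n_triplets=10_000)  # 'scipy'
```

The returned string can then be passed as the backend option of an estimator that supports it.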

Dataset loading utilities

Musician Similarity dataset

This dataset contains triplets gathered during the MusicSeer similarity survey in October 2002.

In a web-based survey or game, the user was presented with a target musician and multiple others and selected the one most similar to the target. Thus, for each user judgement, multiple triplets were created with the remaining musicians.

Data Set Characteristics:

Triplets: 131,970
Objects (Artists): 448
Dimensionality: unknown

This version is based on the original dataset used in the ISMIR paper referenced below, which contains 138,338 triplets and 413 artists, with some modifications. We drop triplets that are missing the third (other) entry. Some artists in the triplets are missing from the provided name list; we call them ‘unknown_0’, ‘unknown_1’, etc.

This dataset can be downloaded using cblearn.datasets.fetch_musician_similarity().

When using these triplets, please give credit to the original authors.

Food Similarity dataset

The food dataset contains triplets collected from Amazon Mechanical Turk in 2014.

The crowd workers were presented with a target and multiple other images from the set of 100 food images. They selected a fixed number of the other images that taste more similar to the target than the remaining ones. For each user selection, multiple triplet constraints were created.

Data Set Characteristics:

Triplets: 190376
Objects: 100
Dimensionality: unknown

This dataset can be downloaded using cblearn.datasets.fetch_food_similarity().

When using this data, please consider the fair use statement above and give credit to the original authors.

Car Similarity dataset

This dataset contains triplets of 60 car images, collected in an online survey. The participants chose one car out of three such that the following statement is true: “Object A is the most central object within the triple of objects (A,B,C)”.

All images were found on Wikimedia Commons and are assigned to one of four classes: ORDINARY CARS, SPORTS CARS, OFF-ROAD/SPORT UTILITY VEHICLES, and OUTLIERS.

The corresponding car images are available with the full dataset: http://www.tml.cs.uni-tuebingen.de/team/luxburg/code_and_data/index.php

Data Set Characteristics:

Triplets: 7097
Objects (Cars): 60
Query: 3 cars, most-central

This dataset can be downloaded using cblearn.datasets.fetch_car_similarity(). To use the most-central triplets with, e.g., ordinal embedding algorithms, convert them to standard triplets with cblearn.preprocessing.triplets_from_mostcentral().
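The idea behind this conversion can be sketched in numpy. This is not the library's conversion function; the helper name and the column layout (most central object first) are assumptions. The sketch reads "A is the most central object" as "the distance between B and C is the largest of the three", which yields two standard triplets per answer.

```python
import numpy as np


def mostcentral_to_triplets(queries):
    """Convert rows (a, b, c), with a the most central object, to triplets.

    Assumed reading: delta(b, c) is the largest distance, so b is closer
    to a than to c, and c is closer to a than to b.
    """
    queries = np.asarray(queries)
    a, b, c = queries[:, 0], queries[:, 1], queries[:, 2]
    return np.concatenate([np.column_stack([b, a, c]),
                           np.column_stack([c, a, b])])


# One answer: car 0 is the most central of the cars (0, 1, 2).
triplets = mostcentral_to_triplets([[0, 1, 2]])
# -> rows (1, 0, 2) and (2, 0, 1)
```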

Please cite the following paper if you use this dataset in publications.

Imagenet Similarity dataset

This dataset contains comparison trials of images from the ImageNet validation dataset (ILSVRC-2012). In a crowd-sourced experiment, subjects ranked the two out of 8 images that appeared most similar to a reference image. The trials were selected in an active learning routine, such that the images within a trial are already not too dissimilar.

There are two versions of this dataset: Version “0.2” has trials for all 50 ImageNet validation images per class, version “0.1” has trials for a single image per class.

The whole dataset is published under CC-By Attribution 4.0 International by Brett Roads.

Data Set Characteristics:

Trials v0.1/v0.2: 25,273 / 384,277
Objects (Images): 1,000 / 50,000
Classes: 1,000
Query: rank 2 from 8

This dataset can be downloaded using cblearn.datasets.fetch_imagenet_similarity(). To use the 8-rank-2 trials with, e.g., ordinal embedding algorithms, they can be converted to standard triplets with cblearn.preprocessing.triplets_from_multiselect().
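The following numpy sketch shows one way such ranked trials map to standard triplets. It is not the library's conversion function; the helper name and the assumed column layout (reference first, then the two selected images in rank order, then the six unselected images) are illustrative only.

```python
import numpy as np


def ranked_multiselect_to_triplets(trials, n_select=2):
    """Convert ranked multi-select trials to standard triplets (i, j, k).

    Assumed layout per row: reference, then the selected images in rank
    order, then the images that were not selected.
    """
    trials = np.asarray(trials)
    reference = trials[:, 0]
    blocks = []
    for rank in range(1, n_select + 1):
        # The image in column `rank` is judged more similar to the
        # reference than every image in a later column.
        for other in range(rank + 1, trials.shape[1]):
            blocks.append(np.column_stack(
                [reference, trials[:, rank], trials[:, other]]))
    return np.concatenate(blocks)


# One trial: reference 0, ranked selections 1 and 2, unselected 3..8.
trial = np.arange(9).reshape(1, -1)
triplets = ranked_multiselect_to_triplets(trial)
```

Under these assumptions, each rank-2-from-8 trial yields 13 triplets: 7 for the first selection and 6 for the second.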

Please cite the following paper if you use this dataset in publications.

Things Similarity dataset

This dataset contains odd-one-out trials of images from the Things image database. In a crowd-sourced experiment, subjects were asked to choose the one of three images that is the odd one out. Note: The trials used here are the test trials of the original paper. The train trials are not published.

The data is shared under CC-BY-4.0 by Hebart, M. N., Zheng, C. Y., Pereira, F., and Baker, C. I.

Data Set Characteristics:

Trials: 146,012
Objects (Things): 1,854
Query: 3 images, odd one out

This dataset can be downloaded using cblearn.datasets.fetch_things_similarity(). To use the odd-one-out trials with, e.g., ordinal embedding algorithms, they can be converted to standard triplets with cblearn.preprocessing.triplets_from_oddoneout().
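Conceptually, an odd-one-out answer says that the two remaining images are more similar to each other than either is to the odd one. The numpy sketch below illustrates this; it is not the library's conversion function, and the helper name and the convention that the odd image is stored in the first column are assumptions.

```python
import numpy as np


def oddoneout_to_triplets(trials):
    """Convert rows (odd, a, b) to the triplets (a, b, odd) and (b, a, odd)."""
    trials = np.asarray(trials)
    odd, a, b = trials[:, 0], trials[:, 1], trials[:, 2]
    return np.concatenate([np.column_stack([a, b, odd]),
                           np.column_stack([b, a, odd])])


# One trial: image 5 was picked as the odd one among (5, 1, 2).
triplets = oddoneout_to_triplets([[5, 1, 2]])
# -> rows (1, 2, 5) and (2, 1, 5)
```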

Please cite the following paper if you use this dataset in publications.

Nature and Vogue datasets

The nature and vogue datasets consist of odd-one-out triplets of the form “Out of three shown items pick one that appears to be different from the two others”.

The items were either images of natural scenes (forests, beaches, mountains, etc.) or covers of the Vogue magazine.

Data Set Characteristics:

Triplets (Covers): 1107
Objects (Covers): 60
Triplets (Scenes): 3355
Objects (Scenes): 120

These datasets can be downloaded using cblearn.datasets.fetch_nature_scene_similarity() and cblearn.datasets.fetch_vogue_cover_similarity(). To use the odd-one-out triplets with, e.g., ordinal embedding algorithms, convert them to standard triplets with cblearn.preprocessing.triplets_from_oddoneout().

Please cite the following paper if you use this dataset in publications.

Material Similarity dataset

This dataset contains triplets of 100 material images, gathered in a crowd-sourced experiment. For triplets of one reference and two candidate images, the subjects answered the question “Which of these two candidates has a more similar appearance to the reference?”. The trials were actively chosen such that they maximize the information gain (CKL algorithm).

Experimental code and the material images are available at the dataset author’s Github repository: https://github.com/mlagunas/material-appearance-similarity

Data Set Characteristics:

Triplets Train/Test: 22801 / 3000
Responses: 92892 / 11800
Objects (Materials): 100

This dataset can be downloaded using cblearn.datasets.fetch_material_similarity(). Most triplets were answered multiple times, often with contradictory responses.

Please cite the following paper if you use this dataset in publications.

Similarity Judgement Matrix datasets

This collection provides similarity matrices from human similarity judgments on various different stimuli. The collection was aggregated and published by Michael Lee.

Data Sets:

These are Michael Lee’s descriptions of the datasets (with minor modifications):

abstractnumbers: Human judgments of the numbers 0-9. From research described in Shepard, R. N., Kilpatrick, D. W., & Cunningham, J. P. (1975). The internal representation of numbers. Cognitive Psychology, 7, 82-138 (with thanks to Josh Tenenbaum).
auditory: Auditory confusions of 25 letters (all excluding ‘o’) and the numbers 0-9. From research reported in Kuennapas, T., & Janson, A-J. (1969). Multidimensional Similarity of Letters. Perceptual and Motor Skills, 28, 3-12.
bankwiring: A sociologist’s judgment of the relationships between 14 bank wiring workers. From research reported in Roethlisberger, F. J., & Dickson, W. J. (1939). Management and the worker. Cambridge, MA: Harvard University Press.
colours: Human judgments of 14 colours, specified by their wavelengths. From research reported in Ekman, G. (1954). Dimensions of color vision. The Journal of Psychology, 38, 467-474.
congress: Voting patterns of 14 members of congress on environmental bills. From raw data presented in Romesburg, H. C. (1984). Cluster analysis for researchers. Belmont, CA: Lifetime Learning Publications.
dotpatterns: Human judgments of 17 dot patterns. From research reported in Glushko, R. J. (1975). Pattern goodness and redundancy revisited: Multidimensional scaling and hierarchical cluster analysis. Perception & Psychophysics, 17(2), 158-162.
druguse: Reported adolescent use of 13 drug types. From research reported in Huba, G. L., Wingard, J. A., & Bentler, P. M. (1981). A comparison of two latent variable causal models for adolescent drug use. Journal of Personality and Social Psychology, 40(1), 180-193.
flowerpots: Human judgments of 16 drawings of flowerpots. From research reported in Gati, I., & Tversky, A. (1982). Representations of qualitative and quantitative dimensions. Journal of Experimental Psychology: Human Perception and Performance, 8(2), 325-340.
fruits: Human judgments of 21 fruits. From research reported in Tversky, A., & Hutchinson, J. W. (1986). Nearest Neighbor Analysis of Psychological Spaces. Psychological Review, 93(1), 3-22.
letters: Kindergarten children’s judgment of perceptual similarity of the 26 capital letters. From research reported in Gibson, E. J., Osser, H., Schiff, W., & Smith, J. (1963). An analysis of critical features of letters, tested by a confusion matrix. Cooperative Research Project No. 639, U.S. Office of Education.
morseall and morsenumbers: Confusion of Morse code letters and numerals (morseall) and of numerals only (morsenumbers). From research reported in Rothkopf, E. Z. (1957). A measure of stimulus similarity and errors in some paired-associate learning tasks. Journal of Experimental Psychology, 53, 94-101.
phonemes: Auditory confusion of 16 consonant phonemes. From research reported in Miller, G. A., & Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27, 338-352.
risks: Human judgments of 18 risks. From research reported in Johnson, E. J., & Tversky, A. (1984). Representations of Perceptions of Risks. Journal of Experimental Psychology: General, 113(1), 55-70.
rectangles: Human judgments of 16 rectangles. From research described in Chapter 15 of Borg, I., & Lingoes, J. (1987). Multidimensional similarity structure analysis. New York: Springer Verlag.

The following datasets also contain an empirical estimate of the precision of the similarity measurements:

country_robinsonhefner

Human judgments (in 1967) of 17 countries. From research reported in Robinson, J. P., & Hefner, R (1967). Multidimensional Differences in Public and Academic Perceptions of Nations. Journal of Personality and Social Psychology, 7(3), 251-259.

rectangles_kruschke

Human judgments of 8 rectangles with interior line segments. From research reported in Kruschke, J. K. (1993). Human category learning: Implications for backpropagation models. Connection Science, 5, 3-36.

kinship_rosenbergkim

Human judgments of 15 kinship terms. From research reported in Rosenberg, S., & Kim, M. P. (1975). The Method of Sorting as a Data-Generating Procedure in Multivariate Research. Multivariate Behavioral Research, 10, 489-502.

romney name datasets:

Human judgments of 21 bird names, 21 clothing names, 21 different clothing names, 21 fish names, 21 fruit names, 21 different fruit names, 21 furniture names, 21 different furniture names, 21 semantically unrelated words, 21 sport names, 21 tool names, 21 toy names, 21 vegetable names, 21 different vegetable names, 21 vehicle names, 21 different vehicle names, 21 weapon names, 21 different weapon names. All from research reported in Romney, A. K., Brewer, D. D., & Batchelder, W. H. (1993). Predicting Clustering from Semantic Structure. Psychological Science, 4(1), 28-34, with thanks to Devon Brewer.

birds_romney, clothing_romney, clothing2_romney, fish_romney, fruit_romney, fruit2_romney, furniture_romney, furniture2_romney, nonsense_romney, sport_romney, tools_romney, toys_romney, vegetables_romney, vegetables2_romney, vehicles_romney, vehicles2_romney, weapons_romney, and weapons2_romney.

lines_cohen, faces_busey, faces_steyvers, sizeangle_treat, and bodies_viken

Human judgments of 9 lines of different lengths, 60 faces, 7 ‘morphed’ faces, 9 shapes varying in size and angle, 24 bodies varying in “affect and body size”. Mark Steyvers kindly provided Michael Lee with all of these.

texturebrodatz_heaps and texturemit_heaps

Human judgments of 30 Brodatz textures, and 24 MIT textures. Both from research reported in Heaps, C., & Handel, S. (1999). Similarity and Features of Natural Textures. Journal of Experimental Psychology: Human Perception and Performance, 25(2), 299-320.

cartoonfaces, countriessim, and countriesdis

Human judgments of 10 cartoon faces, and forced-choice judgments of 16 countries in a similarity condition and a dissimilarity condition. From the research described in Navarro, D.J., & Lee, M.D. (2004). Common and distinctive features in stimulus representation: A modified version of the contrast model. Psychonomic Bulletin & Review, 11(6), 961–974, and Navarro, D.J., & Lee, M.D. (2002). Commonalities and distinctions in featural stimulus representations. In W.G. Gray & C. D. Schunn, (Eds.), Proceedings of the 24th Annual Conference of the Cognitive Science Society, pp. 685-690. Mahwah, NJ: Erlbaum.

animalpictures5, animalpictures11, and animalpictures21

Human judgments of 21 animals (presented as pictures on a 5 point scale), of 21 animals (presented as pictures on a 5 point scale), of 21 animals (presented as pictures on an 11 point scale). From (as yet; probably never-to-be) unreported research Michael Lee did a while back.

animalnames5, animalnames11

Human judgments of 21 animals (presented as words on a 5 point scale), of 21 animals (presented as words on an 11 point scale). From (as yet; probably never-to-be) unreported research Michael Lee did a while back.

faces5 and faces11

Human judgements of 25 faces (5 point scale), and of 25 faces (11 point scale). From (as yet; probably never-to-be) unreported research Michael Lee did a while back.

Please cite the dataset’s paper if you use it in publications.

These datasets can be downloaded using cblearn.datasets.fetch_similarity_matrix() with the corresponding name parameter. Triplet trials can be generated by using one minus the similarity matrix as a precomputed distance matrix: cblearn.datasets.make_random_triplets(1 - data.similarity, distance='precomputed').
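What generating triplets from a precomputed distance matrix means can be sketched with numpy alone. This is not the library's implementation; the helper name, the toy similarity matrix, and the sampling scheme (three distinct objects per triplet, ordered by distance) are assumptions for illustration.

```python
import numpy as np


def random_triplets_from_distances(distances, n_triplets, seed=0):
    """Sample triplets (i, j, k) with distances[i, j] <= distances[i, k]."""
    rng = np.random.default_rng(seed)
    n_objects = distances.shape[0]
    # Draw three distinct objects per triplet.
    triplets = np.array([rng.choice(n_objects, size=3, replace=False)
                         for _ in range(n_triplets)])
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    # Swap j and k wherever k is actually the closer object.
    swap = distances[i, j] > distances[i, k]
    triplets[swap] = triplets[swap][:, [0, 2, 1]]
    return triplets


# Hypothetical similarity matrix of three objects, with values in [0, 1].
similarity = np.array([[1.0, 0.8, 0.1],
                       [0.8, 1.0, 0.3],
                       [0.1, 0.3, 1.0]])
distances = 1 - similarity
triplets = random_triplets_from_distances(distances, n_triplets=5)
```

Every returned row satisfies the triplet inequality with respect to the given distance matrix, so the rows can be used as noise-free standard triplets.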