User Guide
Most machine learning algorithms use numerical training data (features) for inference, representing either points in a Euclidean space, similarities, or distances. There are settings, e.g. in human studies, where metric points are not available and only ordinal comparisons can be collected. Comparison-based learning algorithms are the machine learning algorithms applicable in this setting.
Triplet comparisons
Triplet comparisons are the most common form of ordinal comparisons. For the triplet of objects \((i, j, k)\) one can ask, “Is the object \(i\) more similar to the object \(j\) or \(k\)?”. For the unknown points \((x_i, x_j, x_k)\) and the distance metric \(\delta\), the answer “\(j\)” corresponds to the following inequality:
\[\delta(x_i, x_j) \le \delta(x_i, x_k)\]
This library supports two representation formats for triplets: an array form and a sparse matrix form.
The array form uses 2d numpy arrays, with one triplet per row and columns for i, j, k. As an alternative to encoding the answer in the column order, an additional response array containing 1 or -1 can specify whether (i, j, k) is correct or wrong.
The sparse matrix is an alternative representation, where triplets are naturally specified as the matrix indices, containing entries 1 or -1.
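The array form and the response encoding can be illustrated with plain numpy. This is a minimal sketch, assuming the convention that a response of 1 means \(\delta(x_i, x_j) \le \delta(x_i, x_k)\); the points and the helper `distance` are made up for illustration and are not part of the library.

```python
import numpy as np

# Hypothetical ground-truth points, used only to illustrate the inequality.
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.0, 5.0]])


def distance(a, b):
    """Euclidean distance between two rows of `points`."""
    return float(np.linalg.norm(points[a] - points[b]))


# Array form: one triplet (i, j, k) per row.
triplets = np.array([[0, 1, 2],
                     [0, 3, 1],
                     [2, 1, 3]])

# Response form: +1 if delta(x_i, x_j) <= delta(x_i, x_k), otherwise -1.
responses = np.array([1 if distance(i, j) <= distance(i, k) else -1
                      for i, j, k in triplets])

# Ordered form: swap j and k wherever the response is -1, so that every row
# reads "i is closer to j than to k" and no response array is needed.
ordered = triplets.copy()
wrong = responses == -1
ordered[wrong] = ordered[wrong][:, [0, 2, 1]]

print(responses)
print(ordered)
```

Both encodings carry the same information; the ordered form is more compact, while the response form keeps the triplets exactly as they were asked.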
Scikit-learn compatibility
All estimators in this library are compatible with the scikit-learn API and can be used in scikit-learn pipelines if comparisons are represented in the array format.
The scikit-learn compatibility is achieved by implementing the fit, predict, and score methods of the BaseEstimator class.
The fit method is used to train the model, the predict method is used to predict the labels of the test data, and the score method is used to evaluate the model on the test data.
In the case of ordinal embedding, for example, the predict method returns the triplet response according to the embedding and the score method returns the triplet accuracy (the fraction of correct triplet responses).
The Ordinal Embedding example shows how to use a scikit-learn cross validation function with an ordinal embedding estimator.
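The roles of predict and score for an ordinal embedding can be sketched with a toy class. This is not the library's implementation: `FixedEmbeddingTripletModel` is a hypothetical stand-in that skips the optimization in fit and only demonstrates how triplet responses and the triplet accuracy follow from an embedding.

```python
import numpy as np


class FixedEmbeddingTripletModel:
    """Toy stand-in for an ordinal embedding estimator (hypothetical)."""

    def __init__(self, embedding):
        self.embedding = np.asarray(embedding, dtype=float)

    def fit(self, X, y=None):
        # A real estimator would optimize the embedding from the triplets X;
        # this sketch keeps the embedding that was passed in.
        self.embedding_ = self.embedding
        return self

    def predict(self, X):
        # Response +1 if i is embedded at most as far from j as from k, else -1.
        X = np.asarray(X)
        i, j, k = X[:, 0], X[:, 1], X[:, 2]
        d_ij = np.linalg.norm(self.embedding_[i] - self.embedding_[j], axis=1)
        d_ik = np.linalg.norm(self.embedding_[i] - self.embedding_[k], axis=1)
        return np.where(d_ij <= d_ik, 1, -1)

    def score(self, X, y=None):
        # Triplet accuracy: fraction of responses the embedding reproduces.
        # Without y, the triplets are assumed ordered (all responses +1).
        y = np.ones(len(X), dtype=int) if y is None else np.asarray(y)
        return float(np.mean(self.predict(X) == y))


triplets = np.array([[0, 1, 2], [2, 1, 0], [1, 2, 0], [0, 2, 1]])
model = FixedEmbeddingTripletModel([[0.0], [1.0], [3.0]]).fit(triplets)
predictions = model.predict(triplets)
accuracy = model.score(triplets)
print(predictions, accuracy)
```

Because the class exposes fit, predict, and score with array inputs, the same call pattern applies when an actual estimator is plugged into scikit-learn utilities.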
Pytorch backend (CPU/GPU)
The default backend for computations is the scipy stack, optimized for fast CPU computations and minimal overhead in both compute and disk space.
However, this comes with limitations when implementing new methods and for calculations with very large data sets.
As an alternative for some estimators, a pytorch implementation exists. To use this implementation, pytorch must be installed (see Extra Requirements) and, if necessary, the option backend='torch' must be set (see the respective function documentation).
These estimators automatically take care of the data transfer between numpy and torch (internal data representation) and use a batched optimizer for faster convergence. If a CUDA GPU is available, the computations are automatically performed on the GPU.
pytorch itself needs a lot of hard disk space, and starting the optimization has a certain overhead (automatic differentiation, data transformation).
It is therefore advisable to use the scipy backend by default and only change if necessary.
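The recommendation to stay on the scipy backend unless the torch one is needed can be written as a small guard. The sketch below uses only the standard library; `choose_backend` is a hypothetical helper, not part of the library, and it only checks whether pytorch is importable.

```python
import importlib.util


def choose_backend(prefer_torch=False):
    """Return 'torch' only if it is requested and installed, else 'scipy'."""
    torch_installed = importlib.util.find_spec("torch") is not None
    return "torch" if (prefer_torch and torch_installed) else "scipy"


backend = choose_backend(prefer_torch=False)
# The string could then be passed as the `backend` option of an estimator
# that supports it (see the respective function documentation).
print(backend)
```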
Dataset loading utilities
Musician Similarity dataset
This dataset contains triplets gathered during the MusicSeer similarity survey in October 2002.
In a web-based survey or game, the user was presented with a target musician and multiple others, and selected the one most similar to the target. Thus, for each user judgement, multiple triplets were created with the remaining others.
Data Set Characteristics:
Triplets: 131,970
Objects (Artists): 448
Dimensionality: unknown
This dataset is based on the original dataset used in the ISMIR paper referenced below (with 138,338 triplets and 413 artists), but includes some modifications. We drop triplets that are missing the third (other) entry. Some artists in the triplets are missing from the provided name list; we call them ‘unknown_0’, ‘unknown_1’, etc.
This dataset can be downloaded using cblearn.datasets.fetch_musician_similarity().
When using these triplets, please give credit to the original authors.
Food Similarity dataset
The food dataset contains triplets collected from Amazon Mechanical Turk in 2014.
The crowd workers were presented with a target and multiple others of the 100 food images. They selected a fixed number of other images that taste more similar to the target than the remaining ones. Per user selection, multiple triplet constraints were created.
Data Set Characteristics:
Triplets: 190376
Objects: 100
Dimensionality: unknown
This dataset can be downloaded using cblearn.datasets.fetch_food_similarity().
When using this data, please consider the fair use statement above and give credit to the original authors.
Car Similarity dataset
This dataset contains triplets of 60 car images, collected in an online survey. Participants chose one car of three such that the following statement is true: “Object A is the most central object within the triple of objects (A,B,C)”.
All images were found on Wikimedia Commons and are assigned to one of four classes: ORDINARY CARS, SPORTS CARS, OFF-ROAD/SPORT UTILITY VEHICLES, and OUTLIERS.
The corresponding car images are available with the full dataset (http://www.tml.cs.uni-tuebingen.de/team/luxburg/code_and_data/index.php).
Data Set Characteristics:
Triplets: 7097
Objects (Cars): 60
Query: 3 cars, most-central
This dataset can be downloaded using cblearn.datasets.fetch_car_similarity().
To use the most-central triplets with e.g. ordinal embedding algorithms, you should convert them to standard triplets with cblearn.preprocessing.triplets_from_mostcentral().
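The idea behind converting a most-central answer can be sketched in numpy. This is an illustration under a stated assumption, not the library's code: "A is most central in (A,B,C)" is read as A lying between B and C, so that B is closer to A than to C and C is closer to A than to B. The function name `mostcentral_to_triplets` and the column order (most central object first) are hypothetical.

```python
import numpy as np


def mostcentral_to_triplets(trials):
    """Convert rows (a, b, c), with a the most central object, to triplets.

    Assumption: b is closer to a than to c, and c is closer to a than to b.
    Each trial therefore yields the two triplets (b, a, c) and (c, a, b).
    """
    trials = np.asarray(trials)
    a, b, c = trials[:, 0], trials[:, 1], trials[:, 2]
    return np.concatenate([np.column_stack([b, a, c]),
                           np.column_stack([c, a, b])])


triplets = mostcentral_to_triplets([[5, 2, 9]])
print(triplets)
```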
Please cite the following paper if you use this dataset in publications.
Imagenet Similarity dataset
This dataset contains comparison trials of images from the ImageNet validation dataset (ILSVRC-2012). In a crowdsourced experiment, subjects ranked the two out of 8 images that appeared most similar to a reference image. The trials were selected in an active learning routine, such that the images within a trial are already not too dissimilar.
There are two versions of this dataset: Version “0.2” has trials for all 50 ImageNet validation images per class, version “0.1” has trials for a single image per class.
The whole dataset is published under CC-By Attribution 4.0 International by Brett Roads.
Data Set Characteristics:
Trials v0.1/v0.2: 25,273 / 384,277
Objects (Images): 1,000 / 50,000
Classes: 1,000
Query: rank 2 from 8
This dataset can be downloaded using cblearn.datasets.fetch_imagenet_similarity().
To use the 8-rank-2 trials with e.g. ordinal embedding algorithms, they can be converted to standard triplets with cblearn.preprocessing.triplets_from_multiselect().
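To make the conversion concrete, the following sketch enumerates the triplets implied by a single rank-2-from-8 trial. It is an assumption-laden illustration, not the library's implementation: the function `ranked_selection_to_triplets` and its argument layout are hypothetical, and it assumes that a selected image is more similar to the reference than every unselected one and that the first-ranked image is more similar than the second.

```python
import numpy as np


def ranked_selection_to_triplets(reference, ranked_selected, not_selected):
    """Triplets (reference, closer, farther) implied by one ranked trial."""
    triplets = []
    # Ranking among the selected items: earlier means more similar.
    for pos, closer in enumerate(ranked_selected):
        for farther in ranked_selected[pos + 1:]:
            triplets.append((reference, closer, farther))
    # Every selected item is more similar than every item left unselected.
    for closer in ranked_selected:
        for farther in not_selected:
            triplets.append((reference, closer, farther))
    return np.array(triplets)


# One trial: reference 0, eight candidates 1..8, of which 3 and 7 were ranked.
triplets = ranked_selection_to_triplets(0, [3, 7], [1, 2, 4, 5, 6, 8])
print(triplets.shape)
```

Under these assumptions one trial yields 1 + 2 * 6 = 13 triplets, which is why a moderate number of trials expands into a much larger triplet set.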
Please cite the following paper if you use this dataset in publications.
Things Similarity dataset
This dataset contains odd-one-out trials of images from the Things image database. In a crowdsourced experiment, subjects were asked to choose the one of three images that is the odd one out. Note: The trials used here are the test trials of the original paper. Their train trials are not published.
The data is shared under CC-BY-4.0 by Hebart, M. N., Zheng, C. Y., Pereira, F., and Baker, C. I.
Data Set Characteristics:
Trials: 146,012
Objects (Things): 1,854
Query: 3 images, odd one out
This dataset can be downloaded using cblearn.datasets.fetch_things_similarity().
To use the odd-one-out trials with e.g. ordinal embedding algorithms, they can be converted to standard triplets with cblearn.preprocessing.triplets_from_oddoneout().
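A numpy sketch of what such a conversion amounts to is given here. It is not the library's code: `oddoneout_to_triplets` is a hypothetical name, the odd item is assumed to be stored in the last column, and choosing it is assumed to mean that the remaining two items form the most similar pair.

```python
import numpy as np


def oddoneout_to_triplets(trials):
    """Convert rows (i, j, odd) to standard triplets.

    Assumption: i is closer to j than to odd, and j is closer to i than to
    odd. Each trial therefore yields the triplets (i, j, odd), (j, i, odd).
    """
    trials = np.asarray(trials)
    i, j, odd = trials[:, 0], trials[:, 1], trials[:, 2]
    return np.concatenate([np.column_stack([i, j, odd]),
                           np.column_stack([j, i, odd])])


triplets = oddoneout_to_triplets([[4, 8, 1], [3, 0, 6]])
print(triplets)
```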
Please cite the following paper if you use this dataset in publications.
Nature and Vogue datasets
The nature and vogue datasets consist of odd-one-out triplets of the form “Out of three shown items pick one that appears to be different from the two others”.
The items were either images of natural scenes (forests, beaches, mountains, etc.) or covers of the Vogue magazine.
Data Set Characteristics:
Triplets (Covers): 1107
Objects (Covers): 60
Triplets (Scenes): 3355
Objects (Scenes): 120
These datasets can be downloaded using cblearn.datasets.fetch_nature_scene_similarity() and cblearn.datasets.fetch_vogue_cover_similarity().
To use the odd-one-out triplets with e.g. ordinal embedding algorithms, convert them to standard triplets with cblearn.preprocessing.triplets_from_oddoneout().
Please cite the following paper if you use this dataset in publications.
Material Similarity dataset
This dataset contains triplets of 100 material images, gathered in a crowdsourced experiment. For triplets of one reference and two candidate images, the subjects answered the question “Which of these two candidates has a more similar appearance to the reference?”. The trials were actively chosen such that they maximize the information gain (CKL algorithm).
Experimental code and the material images are available at the dataset author’s Github repository (https://github.com/mlagunas/material-appearance-similarity).
Data Set Characteristics:
Triplets Train/Test: 22801 / 3000
Responses: 92892 / 11800
Objects (Materials): 100
This dataset can be downloaded using cblearn.datasets.fetch_material_similarity().
Most triplets were answered multiple times, often with contradictory responses.
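One common way to deal with repeated, contradictory responses is a majority vote per triplet. The sketch below is only one possible preprocessing choice and is not prescribed by the dataset or the library; `majority_vote` is a hypothetical helper, and it assumes responses are encoded as 1 or -1 for rows with identical (i, j, k) order.

```python
import numpy as np


def majority_vote(triplets, responses):
    """Aggregate repeated triplets into one majority response each.

    Returns the unique triplets and, per triplet, the sign of the summed
    responses (+1, -1, or 0 for a tie).
    """
    triplets = np.asarray(triplets)
    responses = np.asarray(responses)
    unique, inverse = np.unique(triplets, axis=0, return_inverse=True)
    votes = np.zeros(len(unique), dtype=int)
    # Sum the responses of all rows that map to the same unique triplet.
    np.add.at(votes, inverse.ravel(), responses)
    return unique, np.sign(votes)


repeated = [[0, 1, 2], [0, 1, 2], [0, 1, 2], [3, 4, 5], [3, 4, 5]]
answers = [1, -1, 1, -1, 1]
unique_triplets, majority = majority_vote(repeated, answers)
print(unique_triplets, majority)
```

Ties come out as 0 and can be dropped or kept, depending on whether the downstream algorithm should see the disagreement.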
Please cite the following paper if you use this dataset in publications.
Similarity Judgement Matrix datasets
This collection provides similarity matrices from human similarity judgments on a variety of stimuli. The collection was aggregated and published by Michael Lee.
Data Sets:
These are Michael Lee’s descriptions of the datasets (with minor modifications):
- abstractnumbers
Human judgments of the numbers 0-9. From research described in Shepard, R. N., Kilpatrick, D. W., & Cunningham, J. P. (1975). The internal representation of numbers. Cognitive Psychology, 7, 82-138 (with thanks to Josh Tenenbaum).
- auditory
Auditory confusions of 25 letters (all excluding ‘o’) and the numbers 0-9. From research reported in Kuennapas, T., & Janson, A-J. (1969). Multidimensional Similarity of Letters. Perceptual and Motor Skills, 28, 3-12.
- bankwiring
A sociologist’s judgment of the relationships between 14 bank wiring workers. From research reported in Roethlisberger, F. J., & Dickson, W. J. (1939). Management and the worker. Cambridge, MA: Harvard University Press.
- colours
Human judgments of 14 colours, specified by their wavelengths. From research reported in Ekman, G. (1954). Dimensions of color vision. The Journal of Psychology, 38, 467-474.
- congress
Voting patterns of 14 members of congress on environmental bills. From raw data presented in Romesburg, H. C. (1984). Cluster analysis for researchers. Belmont, CA: Lifetime Learning Publications.
- dotpatterns
Human judgments of 17 dot patterns. From research reported in Glushko, R. J. (1975). Pattern goodness and redundancy revisited: Multidimensional scaling and hierarchical cluster analysis. Perception & Psychophysics, 17(2), 158-162.
- druguse
Reported adolescent use of 13 drug types. From research reported in Huba, G. L., Wingard, J. A., & Bentler, P. M. (1981). A comparison of two latent variable causal models for adolescent drug use. Journal of Personality and Social Psychology, 40(1), 180-193.
- flowerpots
Human judgments of 16 drawings of flowerpots. From research reported in Gati, I., & Tversky, A. (1982). Representations of qualitative and quantitative dimensions. Journal of Experimental Psychology: Human Perception and Performance, 8(2), 325-340.
- fruits
Human judgments of 21 fruits. From research reported in Tversky, A., & Hutchinson, J. W. (1986). Nearest Neighbor Analysis of Psychological Spaces. Psychological Review, 93(1), 3-22.
- letters
Kindergarten children’s judgment of perceptual similarity of the 26 capital letters. From research reported in Gibson, E. J., Osser, H., Schiff, W., & Smith, J. (1963). An analysis of critical features of letters, tested by a confusion matrix. Cooperative Research Project No. 639, U.S. Office of Education.
- morseall and morsenumbers
Confusion of Morse code numerals and numeral and letters. From research reported in Rothkopf, E. Z. (1957). A measure of stimulus similarity and errors in some paired-associate learning tasks. Journal of Experimental Psychology, 53, 94-101.
- phonemes
Auditory confusion of 16 consonant phonemes. From research reported in Miller, G. A., & Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27, 338-352.
- risks
Human judgments of 18 risks. From research reported in Johnson, E. J., & Tversky, A. (1984). Representations of Perceptions of Risks. Journal of Experimental Psychology: General, 113(1), 55-70.
- rectangles
Human judgments of 16 rectangles. From research described in Chapter 15 of Borg, I., & Lingoes, J. (1987). Multidimensional similarity structure analysis. New York: Springer Verlag.
The following datasets also contain an empirical estimate of the precision of the similarity measurements:
- country_robinsonhefner
Human judgments (in 1967) of 17 countries. From research reported in Robinson, J. P., & Hefner, R (1967). Multidimensional Differences in Public and Academic Perceptions of Nations. Journal of Personality and Social Psychology, 7(3), 251-259.
- rectangles_kruschke
Human judgments of 8 rectangles with interior line segments. From research reported in Kruschke, J. K. (1993). Human category learning: Implications for backpropagation models. Connection Science, 5, 3-36.
- kinship_rosenbergkim
Human judgments of 15 kinship terms. From research reported in Rosenberg, S., & Kim, M. P. (1975). The Method of Sorting as a Data-Generating Procedure in Multivariate Research. Multivariate Behavioral Research, 10, 489-502.
- birds_romney, clothing_romney, clothing2_romney, fish_romney, fruit_romney, fruit2_romney, furniture_romney, furniture2_romney, nonsense_romney, sport_romney, tools_romney, toys_romney, vegetables_romney, vegetables2_romney, vehicles_romney, vehicles2_romney, weapons_romney, and weapons2_romney (romney name datasets)
Human judgments of 21 bird names, 21 clothing names, 21 different clothing names, 21 fish names, 21 fruit names, 21 different fruit names, 21 furniture names, 21 different furniture names, 21 semantically unrelated words, 21 sport names, 21 tool names, 21 toy names, 21 vegetable names, 21 different vegetable names, 21 vehicle names, 21 different vehicle names, 21 weapon names, 21 different weapon names. All from research reported in Romney, A. K., Brewer, D. D., & Batchelder, W. H. (1993). Predicting Clustering from Semantic Structure. Psychological Science, 4(1), 28-34, with thanks to Devon Brewer.
- lines_cohen, faces_busey, faces_steyvers, sizeangle_treat, and bodies_viken
Human judgments of 9 lines of different lengths, 60 faces, 7 ‘morphed’ faces, 9 shapes varying in size and angle, 24 bodies varying in “affect and body size”. Mark Steyvers kindly provided Michael Lee with all of these.
- texturebrodatz_heaps and texturemit_heaps
Human judgments of 30 Brodatz textures, and 24 MIT textures. Both from research reported in Heaps, C., & Handel, S. (1999). Similarity and Features of Natural Textures. Journal of Experimental Psychology: Human Perception and Performance, 25(2), 299-320.
- cartoonfaces, countriessim, and countriesdis
Human judgments of 10 cartoon faces, and forced-choice judgments of 16 countries in a similarity condition and a dissimilarity condition. From the research described in Navarro, D.J., & Lee, M.D. (2004). Common and distinctive features in stimulus representation: A modified version of the contrast model. Psychonomic Bulletin & Review, 11(6), 961–974, and Navarro, D.J., & Lee, M.D. (2002). Commonalities and distinctions in featural stimulus representations. In W.G. Gray & C. D. Schunn, (Eds.), Proceedings of the 24th Annual Conference of the Cognitive Science Society, pp. 685-690. Mahwah, NJ: Erlbaum.
- animalpictures5, animalpictures11, and animalpictures21
Human judgments of 21 animals (presented as pictures on a 5 point scale), of 21 animals (presented as pictures on a 5 point scale), of 21 animals (presented as pictures on an 11 point scale). From (as yet; probably never-to-be) unreported research Michael Lee did a while back.
- animalnames5, animalnames11
Human judgments of 21 animals (presented as words on a 5 point scale), of 21 animals (presented as words on an 11 point scale). From (as yet; probably never-to-be) unreported research Michael Lee did a while back.
- faces5 and faces11
Human judgements of 25 faces (5 point scale), and of 25 faces (11 point scale). From (as yet; probably never-to-be) unreported research Michael Lee did a while back.
Please cite the dataset’s paper if you use it in publications.
These datasets can be downloaded using cblearn.datasets.fetch_similarity_matrix() with the corresponding name parameter. Triplet trials can be generated by using 1 - the similarity matrix as a precomputed distance matrix: cblearn.datasets.make_random_triplets(1 - data.similarity, distance='precomputed').
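What sampling triplets from a precomputed distance matrix involves can be sketched in plain numpy. This is a simplified illustration and not the library's implementation: `random_triplets_from_distances` and the small similarity matrix are hypothetical, and similarities are assumed to lie in [0, 1] so that 1 - similarity behaves like a distance.

```python
import numpy as np


def random_triplets_from_distances(distances, n_triplets, seed=0):
    """Sample ordered triplets (i, j, k) with d[i, j] <= d[i, k]."""
    rng = np.random.default_rng(seed)
    n_objects = len(distances)
    triplets = np.empty((n_triplets, 3), dtype=int)
    for row in range(n_triplets):
        # Draw three distinct objects, then order j and k by distance to i.
        i, j, k = rng.choice(n_objects, size=3, replace=False)
        if distances[i, j] > distances[i, k]:
            j, k = k, j
        triplets[row] = (i, j, k)
    return triplets


# Hypothetical 4x4 similarity matrix with values in [0, 1].
similarity = np.array([[1.0, 0.9, 0.4, 0.1],
                       [0.9, 1.0, 0.5, 0.2],
                       [0.4, 0.5, 1.0, 0.7],
                       [0.1, 0.2, 0.7, 1.0]])
distances = 1 - similarity
triplets = random_triplets_from_distances(distances, n_triplets=20)
print(triplets[:5])
```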