cblearn.datasets.fetch_things_similarity

cblearn.datasets.fetch_things_similarity(data_home=None, download_if_missing=True, shuffle=True, random_state=None, return_data=False)

Load the things similarity dataset (odd-one-out).

Trials: 146,012

Objects (Things): 1,854

Query: 3 images, odd one out

See Things Similarity dataset for a detailed description.

>>> from cblearn.datasets import fetch_things_similarity
>>> dataset = fetch_things_similarity(shuffle=True)
>>> dataset.word[[0, -1]].tolist()
['aardvark', 'zucchini']
>>> dataset.data.shape
(146012, 3)
Parameters:
  • data_home (PathLike | None) – optional, default: None. Specify another download and cache folder for the datasets. By default, all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.

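  • download_if_missing (bool) – optional, default: True. If False, raise an IOError when the data is not locally available instead of trying to download it.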

  • shuffle (bool) – optional, default: True. Shuffle the order of the odd-one-out queries.

  • random_state (RandomState | None) – optional, default: None. Initialization for the random generator used for shuffling.

  • return_data (bool) – optional, default: False. If True, return a numpy array instead of a Bunch object.

Returns:

Bunch

Dictionary-like object, with the following attributes.

data : ndarray, shape (n_query, 3)

Each row corresponds to an odd-one-out query; entries are object indices. The first column is the selected odd one.

word : (n_objects,)

Single word associated with the thing objects.

synset : (n_objects,)

WordNet synset associated with the thing objects.

wordnet_id : (n_objects,)

WordNet ID associated with the thing objects.

thing_id : (n_objects,)

Unique ID string associated with the thing objects.

DESCR : string

Description of the dataset.

data : numpy array, shape (n_query, 3)

Returned instead of the Bunch when return_data=True.

Return type:

dataset
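
Because the entries of data are object indices, they can be used to index the per-object arrays such as word. The following is a minimal sketch on a hand-made toy array rather than the real dataset; toy_word and toy_data are hypothetical stand-ins for dataset.word and dataset.data, and their values are made up for illustration.

```python
import numpy as np

# Toy stand-ins for dataset.word and dataset.data (hypothetical values).
toy_word = np.array(["aardvark", "abacus", "accordion", "acorn"])
toy_data = np.array([[2, 0, 3],
                     [1, 3, 0]])

# Integer-array indexing maps every object index to its word,
# keeping the (n_query, 3) shape of the query array.
toy_query_words = toy_word[toy_data]

# The first column holds the object that was selected as the odd one.
toy_odd_words = toy_query_words[:, 0]
print(toy_odd_words.tolist())
```

The same indexing pattern should apply to the other per-object attributes (synset, wordnet_id, thing_id), since they share the (n_objects,) shape.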

Raises:

IOError – If the data is not locally available and download_if_missing=False.
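
An odd-one-out answer is often read as a statement about the two remaining objects: they are more similar to each other than either is to the odd one. Under that assumption (an interpretation, not something this function does), each query row (odd, a, b) yields two triplets of the form (anchor, near, far). The sketch below uses plain NumPy on a toy array; the helper name oddoneout_to_triplets is hypothetical and not part of cblearn.

```python
import numpy as np

def oddoneout_to_triplets(queries):
    """Turn rows (odd, a, b) into triplets (anchor, near, far).

    Assumes the first column is the selected odd one.
    """
    queries = np.asarray(queries)
    odd, a, b = queries[:, 0], queries[:, 1], queries[:, 2]
    # a is closer to b than to odd, and b is closer to a than to odd.
    return np.concatenate([
        np.column_stack([a, b, odd]),
        np.column_stack([b, a, odd]),
    ])

# Toy queries with made-up object indices.
toy_queries = np.array([[2, 0, 3],
                        [1, 3, 0]])
toy_triplets = oddoneout_to_triplets(toy_queries)
print(toy_triplets.shape)
```

Each query contributes two rows, so n_query queries give 2 * n_query triplets.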