cblearn.datasets.fetch_imagenet_similarity

cblearn.datasets.fetch_imagenet_similarity(data_home=None, download_if_missing=True, shuffle=True, random_state=None, version='0.1', return_data=False)

Load the ImageNet similarity dataset (rank 2 from 8).

Trials v0.1/v0.2: 25,273 / 384,277
Objects (Images) v0.1/v0.2: 1,000 / 50,000
Classes: 1,000
Query: rank 2 from 8

See Imagenet Similarity dataset for a detailed description.

>>> from cblearn.datasets import fetch_imagenet_similarity
>>> dataset = fetch_imagenet_similarity(shuffle=True, version='0.1')
>>> dataset.class_label[[0, -1]].tolist()  
['n01440764', 'n15075141']
>>> dataset.n_select, dataset.is_ranked  
(2, True)
>>> dataset.data.shape  
(25273, 9)

Parameters:
  • data_home (PathLike | None) – optional, default=None. Specify another download and cache folder for the datasets. By default, all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.

  • download_if_missing (bool) – optional, default=True. If False, raise an IOError if the data is not locally available instead of trying to download it.

  • shuffle (bool) – default=True. Shuffle the order of the queries.

  • random_state (RandomState | None) – optional, default=None. Initialization of the random generator used for shuffling.

  • version (str) – Version of the dataset. ‘0.1’ contains one object per class, ‘0.2’ 50 objects per class.

  • return_data (bool) – default=False. If True, returns a numpy array instead of a Bunch object.
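
To illustrate what shuffle and random_state mean in practice, here is a minimal sketch on synthetic data. It is not cblearn's implementation: the helper shuffle_queries is hypothetical, the arrays are made up, and it assumes that per-query arrays such as data and rt_ms are permuted together. It shows that a fixed seed gives the same query order on every call.

```python
import numpy as np


def shuffle_queries(data, rt_ms, random_state=None):
    """Hypothetical helper: permute per-query arrays with one shared order."""
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        # None seeds from the OS, an int gives a reproducible generator.
        rng = np.random.RandomState(random_state)
    order = rng.permutation(len(data))
    return data[order], rt_ms[order]


# Synthetic stand-ins for dataset.data and dataset.rt_ms (values are made up).
data = np.arange(5 * 9).reshape(5, 9)
rt_ms = np.array([1200.0, 950.0, 1830.0, 700.0, 1410.0])

a_data, a_rt = shuffle_queries(data, rt_ms, random_state=42)
b_data, b_rt = shuffle_queries(data, rt_ms, random_state=42)

# Same seed, same order; rows and reaction times stay aligned.
print(np.array_equal(a_data, b_data) and np.array_equal(a_rt, b_rt))
```

Passing an integer seed is the usual choice when results need to be reproducible across runs.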

Returns:

Bunch

Dictionary-like object, with the following attributes.

data : ndarray, shape (n_query, 9)

Each row corresponds to a rank-2-of-8 query; entries are object indices. The first column is the reference, the second column is the most similar object, and the third column is the second most similar object.

rt_ms : ndarray, shape (n_query,)

Reaction time in milliseconds.

n_select : int

Number of selected objects per trial.

is_ranked : bool

Whether the selection is ranked in similarity to the reference.

session_id : (n_query,)

Ids of the survey session for query recording.

stimulus_id : (50000,)

Ids of the images.

stimulus_filepath : (50000,)

Filepaths of images.

class_id : (50000,)

ImageNet class assigned to each image.

class_label : (1000,)

WordNet labels of the classes.

DESCR : string

Description of the dataset.

data : numpy array, shape (n_query, 9)

Only present when return_data=True.
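
The column layout of data described above can be turned into triplet comparisons. The following is a rough sketch, not part of the cblearn API: query_to_triplets is a hypothetical helper, the row values are made up, and it assumes that columns four to nine hold the six objects that were shown but not selected.

```python
import numpy as np


def query_to_triplets(row, n_select=2, is_ranked=True):
    """Expand one query row into (reference, closer, farther) triplets."""
    row = np.asarray(row)
    reference = row[0]
    selected = row[1:1 + n_select]  # chosen objects, most similar first
    others = row[1 + n_select:]     # assumed: shown but not chosen
    triplets = []
    if is_ranked:
        # An earlier pick is closer to the reference than a later pick.
        for i in range(n_select):
            for j in range(i + 1, n_select):
                triplets.append((reference, selected[i], selected[j]))
    # Every chosen object is closer to the reference than every unchosen one.
    for s in selected:
        for o in others:
            triplets.append((reference, s, o))
    return np.array(triplets)


# Made-up object indices: reference 10, picks 11 then 12, the rest unchosen.
row = [10, 11, 12, 13, 14, 15, 16, 17, 18]
triplets = query_to_triplets(row)
print(triplets.shape)  # (13, 3)
```

Under these assumptions, each rank-2-of-8 query yields 13 triplets: one from the ranking of the two picks and twelve from comparing each pick with the six unchosen objects.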

Return type:

dataset

Raises:

IOError – If the data is not locally available and download_if_missing=False.
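
The class_id and class_label attributes listed under Returns can be combined to look up the WordNet label of every object in a query. The sketch below uses synthetic stand-ins and assumes that class_id[i] is a zero-based index into class_label, which the description above does not confirm; check the dataset description before relying on it.

```python
import numpy as np

# Synthetic stand-ins (assumed layout): class_id[i] indexes into class_label.
class_label = np.array(['n01440764', 'n15075141'])
class_id = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])  # one entry per image

# One made-up query row: reference first, then the eight candidates.
query = np.array([8, 0, 4, 1, 2, 3, 5, 6, 7])

# WordNet label of every object in the query.
labels = class_label[class_id[query]]

# Which candidates share the class of the reference object?
same_class = class_id[query[1:]] == class_id[query[0]]
print(labels[0], int(same_class.sum()))  # n15075141 4
```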