cblearn.datasets.fetch_car_similarity

cblearn.datasets.fetch_car_similarity(data_home=None, download_if_missing=True, shuffle=True, random_state=None, return_triplets=False)

Load the 60-car dataset (most-central triplets).

Triplets               7097
Objects (Cars)         60
Query                  3 cars, most-central
Sessions               146
Queries per Session    30-50
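As a quick consistency check, the figures listed above bound one another: 146 sessions of 30 to 50 queries each must add up to between 4380 and 7300 triplets. The sketch below uses only those published numbers and does not load the data.

```python
# Figures taken from the dataset summary.
n_triplets = 7097
n_sessions = 146
min_queries, max_queries = 30, 50

# Every session contributes between 30 and 50 queries, so the total is bounded.
lower_bound = n_sessions * min_queries
upper_bound = n_sessions * max_queries
assert lower_bound <= n_triplets <= upper_bound

# Average number of queries per session.
mean_queries = n_triplets / n_sessions
print(f"{lower_bound} <= {n_triplets} <= {upper_bound}, mean {mean_queries:.1f}")
```

The average of roughly 48.6 queries per session sits near the upper end of the 30-50 range.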

See the Car Similarity dataset documentation for a detailed description.

>>> import numpy as np
>>> from cblearn.datasets import fetch_car_similarity
>>> dataset = fetch_car_similarity(shuffle=False)
>>> dataset.class_name.tolist()
['OFF-ROAD / SPORT UTILITY VEHICLES', 'ORDINARY CARS', 'OUTLIERS', 'SPORTS CARS']
>>> dataset.triplet.shape
(7097, 3)
>>> rounds, round_count = np.unique(dataset.survey_round, return_counts=True)
>>> len(rounds), int(round_count.min()), int(round_count.max())
(146, 30, 50)
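
The per-round grouping in the example above extends to other per-response fields. The following is a minimal sketch that computes the number of responses and the mean reaction time per survey round with plain NumPy; the short arrays are hypothetical stand-ins for dataset.survey_round and dataset.rt, so it runs without downloading anything.

```python
import numpy as np

# Hypothetical stand-ins for dataset.survey_round and dataset.rt;
# with the real Bunch, use those attributes instead.
survey_round = np.array([0, 0, 1, 1, 1, 2])
rt = np.array([1.5, 2.5, 3.0, 1.0, 2.0, 4.0])

# Map every response to the position of its round among the sorted unique rounds.
rounds, round_index = np.unique(survey_round, return_inverse=True)

# Count responses per round, then divide summed reaction times by that count.
round_count = np.bincount(round_index)
mean_rt = np.bincount(round_index, weights=rt) / round_count

for r, n, m in zip(rounds, round_count, mean_rt):
    print(f"round {r}: {n} responses, mean rt {m:.2f} s")
```

np.bincount with weights sums the reaction times of each round in a single vectorized pass, which avoids a Python loop over the 146 rounds of the real data.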

Parameters:
  • data_home (PathLike | None) – optional, default=None. Specify another download and cache folder for the datasets. By default, all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.

  • download_if_missing (bool) – optional, default=True. If False, raise an IOError when the data is not locally available instead of downloading it.

  • shuffle (bool) – optional, default=True. Shuffle the order of the triplet constraints.

  • random_state (RandomState | None) – optional, default=None. Initialization of the random generator used for shuffling.

  • return_triplets (bool) – optional, default=False. If True, return only the triplets as a NumPy array instead of a Bunch object.

Returns:

dataset : Bunch

Dictionary-like object with the following attributes.

triplet : ndarray, shape (n_triplets, 3)

Each row corresponds to a triplet constraint. The columns hold the indices of the three cars shown in each most-central question.

response : ndarray, shape (n_triplets,)

For each question, the position (0, 1, or 2) of the car that was selected as “most-central”.

survey_round : ndarray of int, shape (n_triplets,)

Survey round of each response. A round groups the responses a participant gave during one session; some participants responded in multiple rounds at different times.

rt : ndarray of float, shape (n_triplets,)

Reaction time of each response, in seconds.

class_id : np.ndarray, shape (60,)

The class assigned to each object.

class_name : list of length 4

Names of the classes.

DESCR : str

Description of the dataset.

triplets : numpy array, shape (n_triplets, 3)

Only present when return_triplets=True, in which case this array is returned instead of the Bunch.
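The triplet, response, and class fields can be combined to look up which car, and which class, was picked in each question. This is a minimal sketch on hypothetical toy arrays, assuming that response indexes the columns of triplet (as its 0, 1, or 2 values suggest); with real data, substitute the corresponding attributes of the returned Bunch.

```python
import numpy as np

# Hypothetical toy arrays shaped like the Bunch fields
# (the real dataset has 7097 triplets over 60 cars in 4 classes).
triplet = np.array([[3, 7, 1],
                    [0, 2, 5],
                    [4, 6, 7]])
response = np.array([1, 0, 2])                    # selected column per question
class_id = np.array([0, 1, 1, 3, 2, 0, 3, 1])     # one class per object
class_name = ['OFF-ROAD / SPORT UTILITY VEHICLES', 'ORDINARY CARS',
              'OUTLIERS', 'SPORTS CARS']

# Assumption: response is a column index into triplet, so fancy indexing
# row by row yields the object index of the most-central car.
chosen = triplet[np.arange(len(triplet)), response]

# Translate each chosen object into its class name.
chosen_class = np.asarray(class_name)[class_id[chosen]]
print(chosen, chosen_class.tolist())
```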

Return type:

Bunch, or numpy.ndarray if return_triplets=True

Raises:

IOError – If the data is not locally available and download_if_missing=False.