cblearn.datasets.fetch_material_similarity

cblearn.datasets.fetch_material_similarity(data_home=None, download_if_missing=True, shuffle=True, random_state=None, return_triplets=False)

Load the material similarity dataset (triplets).

Triplets Train/Test    22801 / 3000
Responses              92892 / 11800
Objects (Materials)    100

See Material Similarity dataset for a detailed description.

>>> dataset = fetch_material_similarity(shuffle=True)  
>>> dataset.material_name[[0, -1]].tolist()  
['alum-bronze', 'yellow-plastic']
>>> dataset.triplet.shape, dataset.response.shape  
((92892, 3), (92892,))

Parameters:

data_home (PathLike | None) – optional, default=None. Download and cache folder for the dataset. By default, all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.
download_if_missing (bool) – optional, default=True. If False, raise an IOError when the data is not available locally instead of trying to download it.
shuffle (bool) – default=True. Whether to shuffle the order of the triplet constraints.
random_state (RandomState | None) – optional, default=None. Random state used to initialize the shuffle.
return_triplets (bool) – default=False. If True, return numpy arrays instead of a Bunch object.
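
The interplay of shuffle and random_state can be illustrated with a minimal sketch. It assumes that shuffling applies one shared permutation to the triplets and their responses so that rows stay aligned; this is not taken from the cblearn source. The helper name shuffle_together and the toy arrays are hypothetical.

```python
import numpy as np


def shuffle_together(triplet, response, random_state=None):
    """Shuffle triplets and responses with one shared permutation.

    Hypothetical helper: accepts an int seed or None, not a RandomState instance.
    """
    rng = np.random.RandomState(random_state)
    order = rng.permutation(len(triplet))
    return triplet[order], response[order]


# Toy data (not the real dataset): row i has response (i + 1) * 10.
triplet = np.arange(15).reshape(5, 3)
response = np.array([10, 20, 30, 40, 50])

shuffled_triplet, shuffled_response = shuffle_together(triplet, response, random_state=0)
```

Passing the same seed twice yields the same order, which is the usual reason to set random_state when results need to be reproducible.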

Returns:

Bunch

Dictionary-like object, with the following attributes.

triplet (ndarray, shape (n_triplets, 3)) – Each row corresponds to a triplet constraint. The columns hold the indices of the reference material and the two other materials.
response (ndarray, shape (n_triplets, )) – The count of subject responses that chose the first other (positive) or the second other (negative) material as more similar to the reference material.
test_triplet (ndarray, shape (n_test_triplets, 3)) – Triplets of the hold-out test set.
test_response (ndarray, shape (n_test_triplets, )) – Responses of the hold-out test set.
material_name (ndarray, shape (100, )) – Names of the materials.
DESCR (string) – Description of the dataset.

triplets, response – numpy arrays of shape (n_triplets, 3) and (n_triplets, )

Returned instead of the Bunch when return_triplets=True.
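
To make the triplet and response layout concrete, the following sketch reorders each triplet by the majority response. It assumes the sign convention suggested by the description (a positive count means the first other material was chosen, a negative count the second) and uses synthetic arrays rather than the real dataset. The helper name order_by_majority is hypothetical.

```python
import numpy as np


def order_by_majority(triplet, response):
    """Return triplets as (reference, more similar, less similar) plus absolute counts.

    Assumption: response > 0 favors column 1, response < 0 favors column 2.
    """
    triplet = np.asarray(triplet)
    response = np.asarray(response)
    ordered = triplet.copy()
    flip = response < 0
    # Swap the two "other" columns wherever the second one was preferred.
    ordered[flip, 1] = triplet[flip, 2]
    ordered[flip, 2] = triplet[flip, 1]
    return ordered, np.abs(response)


# Synthetic stand-ins for dataset.triplet and dataset.response.
triplet = np.array([[0, 1, 2],
                    [3, 4, 5],
                    [6, 7, 8]])
response = np.array([4, -2, 1])

ordered, counts = order_by_majority(triplet, response)
```

Only the second row is swapped here because its response is negative; rows with positive responses are left unchanged.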

Return type:

dataset

Raises:

IOError – If the data is not available locally and download_if_missing=False.