cblearn.preprocessing.query_from_columns

cblearn.preprocessing.query_from_columns(data, query_columns, response_columns=None, response_map=None, return_transformer=False)

Extract queries with indices from feature columns in a DataFrame.

Comparison-based data in this library is typically represented by a collection of unique object indices. For example, [[1, 0, 2], [0, 2, 3]] could encode two triplet comparisons between objects 0, 1, 2, and 3. Experimental data, however, often stores the objects as featurized columns in a dataframe that describe the presented stimuli. There, the same comparisons could be represented by two rows with the columns alpha1, tau1, alpha2, tau2, alpha3, tau3, and Response. The query_from_columns function extracts the comparison queries from such a dataframe by identifying the unique objects (here, e.g., each unique combination of alpha and tau) and replacing them with indices.

Note

If the dataframe already contains unique indices for the objects per query, consider accessing the indices directly, e.g. df[['anchor_ix', 'pos_ix', 'neg_ix']].values.astype(int), df['response'].values.astype(bool).

>>> import numpy as np
>>> import pandas as pd
>>> from cblearn.preprocessing import query_from_columns
>>> frame = pd.DataFrame({'alpha1': [0.1, 0.7, 0.1], 'tau1': [0, 0, 1],
...                       'alpha2': [0.3, 0.3, 0.7], 'tau2': [1, 0, 0],
...                       'alpha3': [0.7, 0.3, 0.7], 'tau3': [0, 1, 0], 'Response': [1, 0, 0]})
>>> q, r = query_from_columns(frame, ['alpha1', 'alpha2', 'alpha3'], 'Response', response_map={1: True, 0: False})
>>> q.tolist(), r.tolist()
([[0, 1, 2], [2, 1, 1], [0, 2, 2]], [True, False, False])
>>> q, r = query_from_columns(np.array(frame), [0, 2, 4], response_columns=-1, response_map={1: True, 0: False})
>>> q.tolist(), r.tolist()
([[0, 1, 2], [2, 1, 1], [0, 2, 2]], [True, False, False])
>>> q, r = query_from_columns(frame, [('alpha1', 'tau1'), ('alpha2', 'tau2'), ('alpha3', 'tau3')],
...                           response_columns='Response', response_map={1: True, 0: False})
>>> q.tolist(), r.tolist()
([[0, 3, 4], [4, 2, 3], [1, 4, 4]], [True, False, False])
>>> q, r = query_from_columns(frame, [('alpha1', 'tau1'), ('alpha2', 'tau2'), ('alpha3', 'tau3')],
...                           response_columns='Response')
>>> q.tolist(), r.tolist()
([[0, 3, 4], [4, 2, 3], [1, 4, 4]], [1, 0, 0])
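
The index assignment in the tuple-based calls above can be illustrated with a minimal pure-Python sketch. It is a conceptual illustration, not cblearn's implementation: the helper name index_objects is hypothetical, and numbering the unique objects in sorted order is an assumption that happens to reproduce the indices shown above.

```python
# Conceptual sketch only: not cblearn's implementation.
def index_objects(rows, column_groups):
    """Replace each group of feature values by the index of its unique object."""
    # Build one tuple of feature values per query entry.
    queries = [[tuple(row[col] for col in group) for group in column_groups]
               for row in rows]
    # Assumption: unique objects are numbered in sorted order.
    unique_objects = sorted({obj for query in queries for obj in query})
    index_of = {obj: ix for ix, obj in enumerate(unique_objects)}
    return [[index_of[obj] for obj in query] for query in queries], unique_objects


# Same values as the example frame, written row by row.
rows = [
    {'alpha1': 0.1, 'tau1': 0, 'alpha2': 0.3, 'tau2': 1, 'alpha3': 0.7, 'tau3': 0},
    {'alpha1': 0.7, 'tau1': 0, 'alpha2': 0.3, 'tau2': 0, 'alpha3': 0.3, 'tau3': 1},
    {'alpha1': 0.1, 'tau1': 1, 'alpha2': 0.7, 'tau2': 0, 'alpha3': 0.7, 'tau3': 0},
]
groups = [('alpha1', 'tau1'), ('alpha2', 'tau2'), ('alpha3', 'tau3')]

indices, objects = index_objects(rows, groups)
print(indices)  # [[0, 3, 4], [4, 2, 3], [1, 4, 4]]
print(objects)  # [(0.1, 0), (0.1, 1), (0.3, 0), (0.3, 1), (0.7, 0)]
```

Only five distinct (alpha, tau) combinations occur in the nine query entries, which is why the indices range from 0 to 4.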

The indices can be used to look up the object attributes from the dataframe, which can be helpful for visualizations and debugging. In the following example, the transformer that maps object features to object indices is returned as well, and its inverse_transform method recovers the object attributes from the object indices.

>>> (q,r), (q_transform, r_transform) = query_from_columns(
...     np.array(frame), [0, 2, 4], -1, {1: True, 0: False}, return_transformer=True)
>>> q_transform.inverse_transform(q).tolist()
[[0.1, 0.3, 0.7], [0.7, 0.3, 0.3], [0.1, 0.7, 0.7]]
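
Conceptually, the inverse step is a lookup from each index back to the object it stands for. The sketch below shows this with plain lists; it is not the transformer object that cblearn returns, and the variable names are illustrative only.

```python
# Conceptual sketch only: not the transformer cblearn returns.
unique_objects = [0.1, 0.3, 0.7]             # sorted unique alpha values
queries = [[0, 1, 2], [2, 1, 1], [0, 2, 2]]  # index representation of the queries

# Inverse step: replace every index by the object it stands for.
attributes = [[unique_objects[ix] for ix in query] for query in queries]
print(attributes)  # [[0.1, 0.3, 0.7], [0.7, 0.3, 0.3], [0.1, 0.7, 0.7]]
```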

Parameters:

data (ndarray | pandas.DataFrame) – Tabular query representation (n_queries, n_columns)
query_columns (List[str] | List[int]) – Column indices or labels in data, one per query entry. If multiple columns define an object, group them as tuples.
response_columns (List[str] | List[int] | str | int | None) – Column indices or labels in data, one per response entry.
response_map (Dict[str, bool | int] | None) – Dictionary mapping the response entries in data to {-1, 1} or {False, True}. If None, the original responses are used.
return_transformer (bool) – If True, the transformer objects for the queries and responses are returned as well.

Returns:

Tuple with arrays for the queries and responses.

If return_transformer=True, an additional tuple with the transformer objects is returned.

Return type:

Tuple[ndarray, ndarray] | Tuple[Tuple[ndarray, ndarray], Tuple[TransformerMixin, TransformerMixin]]