Triplet Formats#
cblearn supports triplet input data in two formats: as a triplet array (a matrix with three columns) or as a sparse matrix.
import time
from cblearn import datasets
from cblearn.utils import check_query_response
triplets_ordered = datasets.make_random_triplet_indices(n_objects=1000, size=1000000, repeat=False)
print(f"'triplets_ordered' is a numpy array of shape {triplets_ordered.shape}.")
'triplets_ordered' is a numpy array of shape (1000000, 3).
Triplet Array#
In the array format, the constraints are encoded by the index order.
The triplet [0 1 107] means that object 0 (1st) should be embedded closer to object 1 (2nd) than to object 107 (3rd).
Alternatively, the triplet array can be complemented by an answer array.
triplets_boolean, answers_boolean = check_query_response(triplets_ordered, result_format='list-boolean')
print(f"Is object {triplets_boolean[0, 0]} closer to object {triplets_boolean[0, 1]} "
      f"than to object {triplets_boolean[0, 2]}? {answers_boolean[0]}.")
triplets_numeric, answers_numeric = check_query_response(triplets_ordered, result_format='list-count')
print(f"Is object {triplets_numeric[0, 0]} closer to object {triplets_numeric[0, 1]} "
      f"than to object {triplets_numeric[0, 2]}? {answers_numeric[0]}.")
Is object 0 closer to object 1 than to object 107? True.
Is object 0 closer to object 1 than to object 107? 1.
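The relation between the ordered format and the boolean format can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not cblearn's implementation: the helper name `boolean_to_ordered` is hypothetical, and the two queries are made up for the example.

```python
import numpy as np


def boolean_to_ordered(triplets, answers):
    """Swap the last two columns wherever the answer is False (hypothetical helper)."""
    triplets = np.asarray(triplets)
    answers = np.asarray(answers, dtype=bool)
    ordered = triplets.copy()
    # For a False answer, object k is the closer one, so it moves to column 1.
    ordered[~answers] = triplets[~answers][:, [0, 2, 1]]
    return ordered


# Two queries (i, j, k): "Is i closer to j than to k?"
queries = np.array([[0, 1, 107],
                    [0, 107, 1]])
answers = np.array([True, False])

ordered = boolean_to_ordered(queries, answers)
print(ordered)
```

Both rows of `ordered` come out as [0 1 107]: a False answer to (0, 107, 1) encodes the same constraint as a True answer to (0, 1, 107).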
Sparse Matrix#
In the sparse matrix format, the object indices of a triplet constraint are the indices of an entry in a sparse array: the entry at [i, j, k] stores the answer for the triplet (i, j, k).
triplet_spmatrix = check_query_response(triplets_ordered, result_format='tensor-count')
print(f"triplet_spmatrix[i, j, k]="
      f"{triplet_spmatrix[triplets_numeric[0, 0], triplets_numeric[0, 1], triplets_numeric[0, 2]]} "
      f"is the same as answer(i,j,k)={answers_numeric[0]}.")
triplet_spmatrix[i, j, k]=1 is the same as answer(i,j,k)=1.
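The idea of storing answers only at the index triples that actually occur can be sketched without any sparse library. The dictionary-of-keys layout and the `lookup` helper below are assumptions for illustration. How cblearn stores or aggregates repeated answers internally is not shown in this example, so the sketch simply counts repetitions.

```python
import numpy as np

# Ordered triplets (i, j, k): i is closer to j than to k.
triplets = np.array([[0, 1, 2],
                     [0, 1, 2],   # the same answer given twice
                     [3, 0, 1]])

# Keep only the distinct index triples and count how often each occurs.
coords, counts = np.unique(triplets, axis=0, return_counts=True)
sparse_answers = {tuple(int(v) for v in c): int(n) for c, n in zip(coords, counts)}


def lookup(i, j, k):
    """Return the stored count, or 0 for an index triple without an answer."""
    return sparse_answers.get((i, j, k), 0)


print(lookup(0, 1, 2))  # answered twice
print(lookup(3, 0, 1))  # answered once
print(lookup(2, 3, 0))  # never answered
```

Only two entries are stored, although a dense array over four objects would need 4 x 4 x 4 cells. This is why a sparse layout is practical for 1000 objects.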
Conversion Time#
Converting between triplet and answer formats is not free; let’s measure the process time.
def time_convert_triplet(triplets, to_format):
    time_start = time.process_time()
    if len(triplets) == 2:
        triplets, answers = triplets
        check_query_response(triplets, answers, result_format=to_format)
    else:
        check_query_response(triplets, result_format=to_format)
    return (time.process_time() - time_start)
data = [triplets_ordered, (triplets_boolean, answers_boolean),
        (triplets_numeric, answers_numeric), triplet_spmatrix]
formats = ["list-order", "list-boolean", "list-count", "tensor-count"]
timings = [
    (time_convert_triplet(triplets, to_format),
     f"{from_format}->{to_format}")
    for from_format, triplets in zip(formats, data)
    for to_format in formats
]
for seconds, desc in sorted(timings):
    print(f"{seconds * 1000:.2f}ms {desc}")
1.66ms tensor-count->tensor-count
1.87ms list-order->list-boolean
2.87ms list-order->list-order
10.40ms list-order->list-count
11.27ms list-boolean->list-boolean
13.67ms tensor-count->list-boolean
16.79ms list-boolean->list-count
20.29ms tensor-count->list-count
20.34ms list-count->list-boolean
26.46ms list-boolean->list-order
26.79ms list-count->list-count
35.96ms list-order->tensor-count
41.97ms tensor-count->list-order
44.71ms list-boolean->tensor-count
47.83ms list-count->list-order
53.39ms list-count->tensor-count
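Each conversion above is timed once, so individual numbers can vary between runs. One common way to reduce that noise is to repeat the measurement and keep the shortest time. The sketch below is an assumption-laden illustration: `best_of` is a hypothetical helper, and the column swap is only a stand-in workload, not a cblearn conversion.

```python
import time

import numpy as np


def best_of(func, repeats=5):
    """Run func several times and return the shortest process time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.process_time()
        func()
        timings.append(time.process_time() - start)
    return min(timings)


# Stand-in workload (assumption): swapping two columns of a triplet array.
triplets = np.random.default_rng(0).integers(0, 1000, size=(100_000, 3))
seconds = best_of(lambda: triplets[:, [0, 2, 1]])
print(f"{seconds * 1000:.2f}ms")
```

The same helper could wrap a call such as `check_query_response(triplets, result_format=to_format)` to time a real conversion.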
Total running time of the script: (0 minutes 8.418 seconds)