Triplet Formats#

cblearn supports triplet input data in two formats: As a triplet array (or matrix with three columns) or as a sparse matrix.

import time

from cblearn import datasets
from cblearn.utils import check_query_response

triplets_ordered = datasets.make_random_triplet_indices(n_objects=1000, size=1000000, repeat=False)
print(f"'triplets_ordered' is a numpy array of shape {triplets_ordered.shape}.")
'triplets_ordered' is a numpy array of shape (1000000, 3).

Triplet Array#

In the array format, the constraints are encoded by the index order.

triplet = triplets_ordered[0]
print(f"The triplet {triplet} means, that object {triplet[0]} (1st) should be "
      f"embedded closer to object {triplet[1]} (2nd) than to object {triplet[0]} (3th).")
The triplet [  0   1 525] means, that object 0 (1st) should be embedded closer to object 1 (2nd) than to object 0 (3th).

Alternatively, the triplet array can be complemented by a answer array.

triplets_boolean, answers_boolean = check_query_response(triplets_ordered, result_format='list-boolean')
print(f"Is object {triplets_boolean[0, 0]} closer to object {triplets_boolean[0, 1]} "
      f"than to object {triplets_boolean[0, 2]}? {answers_boolean[0]}.")

triplets_numeric, answers_numeric = check_query_response(triplets_ordered, result_format='list-count')
print(f"Is object {triplets_numeric[0, 0]} closer to object {triplets_numeric[0, 1]} "
      f"than to object {triplets_numeric[0, 2]}? {answers_numeric[0]}.")
Is object 0 closer to object 1 than to object 525? True.
Is object 0 closer to object 1 than to object 525? 1.

Sparse Matrix#

In the sparse matrix format the object indices of the triplet constraints correspond to the row / column indices of a sparse matrix.

triplet_spmatrix = check_query_response(triplets_ordered, result_format='tensor-count')
print(f"triplet_spmatrix[i, j, k]="
      f"{triplet_spmatrix[triplets_numeric[0, 0], triplets_numeric[0, 1], triplets_numeric[0, 2]]} "
      f"is the same as answer(i,j,k)={answers_numeric[0]}.")
triplet_spmatrix[i, j, k]=1 is the same as answer(i,j,k)=1.

Conversation Time#

Converting between triplet and answer formats is not free, let’s measure the process time.

def time_convert_triplet(triplets, to_format):
    time_start = time.process_time()
    if len(triplets) == 2:
        triplets, answers = triplets
        check_query_response(triplets, answers, result_format=to_format)
    else:
        check_query_response(triplets, result_format=to_format)
    return (time.process_time() - time_start)


data = [triplets_ordered, (triplets_boolean, answers_boolean),
        (triplets_numeric, answers_numeric), triplet_spmatrix]
formats = ["list-order", "list-boolean", "list-count", "tensor-count"]

timings = [
    (time_convert_triplet(triplets, to_format),
     f"{from_format}->{to_format}")
    for from_format, triplets in zip(formats, data)
    for to_format in formats
]

for seconds, desc in sorted(timings):
    print(f"{seconds * 1000:.2f}ms {desc}")
1.66ms tensor-count->tensor-count
1.87ms list-order->list-boolean
2.61ms list-order->list-order
8.46ms list-order->list-count
11.50ms list-boolean->list-boolean
13.62ms tensor-count->list-boolean
15.73ms list-boolean->list-count
17.23ms tensor-count->list-count
19.66ms list-count->list-boolean
23.01ms list-count->list-count
24.76ms list-boolean->list-order
33.95ms list-order->tensor-count
38.11ms tensor-count->list-order
42.51ms list-boolean->tensor-count
44.82ms list-count->list-order
47.84ms list-count->tensor-count

Total running time of the script: (0 minutes 8.000 seconds)

Gallery generated by Sphinx-Gallery