cblearn.embedding.estimate_dimensionality_cv

cblearn.embedding.estimate_dimensionality_cv(estimator, queries, responses=None, test_dimensions=[1, 2, 3], n_splits=10, n_repeats=1, refit=True, alpha=0.05, param_name='n_components', n_jobs=-1, random_state=None)

Estimates the dimensionality of the embedding space.

The procedure fits an embedding for each of the provided test_dimensions and evaluates the fit (triplet accuracy) through cross-validation [1]. The estimated dimension is the lowest one that achieves the best fit for the provided data. The test compares the increase in accuracy between tested dimensions; if the increase is not significant, the lower dimension is considered sufficient. Testing a larger range of dimensions can reduce the sensitivity of the test because of the multiple-testing correction.
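
The selection rule described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the per-fold accuracies are made-up numbers, the helper names `sign_flip_p_value` and `select_dimension` are hypothetical, an exact paired sign-flip permutation test stands in for whatever hypothesis test the library applies, and a Bonferroni correction is assumed for the multiple-testing step.

```python
from itertools import product
from statistics import mean


def sign_flip_p_value(lower, higher):
    """One-sided exact sign-flip test: is the mean of (higher - lower) above 0?"""
    diffs = [h - l for l, h in zip(lower, higher)]
    observed = mean(diffs)
    count = 0
    total = 0
    # Under the null hypothesis, each paired difference is equally likely
    # to be positive or negative, so all sign patterns are enumerated.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if mean(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            count += 1
    return count / total


def select_dimension(scores_by_dim, alpha=0.05):
    """Return the lowest dimension whose successor is not significantly better."""
    dims = sorted(scores_by_dim)
    # Assumed Bonferroni correction: one comparison per pair of neighbours.
    corrected_alpha = alpha / max(len(dims) - 1, 1)
    for low, high in zip(dims, dims[1:]):
        p = sign_flip_p_value(scores_by_dim[low], scores_by_dim[high])
        if p >= corrected_alpha:
            return low
    return dims[-1]


# Made-up cross-validation accuracies (10 folds) for three tested dimensions.
fold_scores = {
    1: [0.70, 0.72, 0.69, 0.71, 0.68, 0.73, 0.70, 0.71, 0.69, 0.72],
    2: [0.85, 0.86, 0.84, 0.87, 0.83, 0.86, 0.85, 0.84, 0.86, 0.85],
    3: [0.84, 0.87, 0.84, 0.86, 0.84, 0.85, 0.86, 0.84, 0.85, 0.86],
}
estimated = select_dimension(fold_scores)
```

With these made-up scores, the step from 1 to 2 dimensions is significant and the step from 2 to 3 is not, so the sketch selects 2. Because the corrected threshold shrinks with every additional comparison, adding more candidate dimensions makes each individual step harder to call significant.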

cblearn.embedding.estimator

The embedding estimator to use.

cblearn.embedding.queries

The triplet queries to embed.

cblearn.embedding.responses

Optional responses, needed only if they are not already encoded in the triplets.

cblearn.embedding.test_dimensions

The dimensions to test, as a monotonically increasing list.

cblearn.embedding.n_splits

The number of splits to use for cross-validation.

cblearn.embedding.n_repeats

The number of repetitions of each cross-validation split. Use 1 for fast results, or 10 or more for more reliable results.

cblearn.embedding.refit

If True, fit the estimator on the entire dataset using the best dimensionality.

cblearn.embedding.alpha

The significance level for the hypothesis test.

cblearn.embedding.param_name

The name of the estimator parameter that describes the embedding dimensionality.

cblearn.embedding.n_jobs

The number of parallel jobs to use for cross-validation.

cblearn.embedding.random_state

The random state or seed to use for CV splits.

Returns:

A result object with the estimated dimension and other information.

Return type:

result

Examples:

>>> import numpy as np
>>> from cblearn.embedding import estimate_dimensionality_cv
>>> from cblearn.embedding import SOE
>>> from cblearn.datasets import make_random_triplets
>>> rs = np.random.RandomState(42)
>>> true_embedding = rs.rand(15, 2)  # 15 points in 2D
>>> triplets = make_random_triplets(true_embedding, result_format='list-order', size=1000, random_state=rs)
>>> estimator = SOE(n_components=1)
>>> dim_result = estimate_dimensionality_cv(estimator, triplets, test_dimensions=[1, 2, 3], n_splits=5, refit=True)
>>> dim_result.estimated_dimension
2
>>> true_embedding.shape == estimator.embedding_.shape
True
>>> dim_result.plot_scores()
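
The cost trade-off behind n_splits and n_repeats can be illustrated with a small, dependency-free sketch of repeated k-fold splitting. It assumes that the function follows the usual repeated k-fold scheme and fits one embedding per split and tested dimension; the helper `repeated_kfold_indices` is hypothetical and not part of cblearn.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=10, n_repeats=1, seed=None):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repetition
        for fold in range(n_splits):
            test = indices[fold::n_splits]  # every n_splits-th index forms one fold
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield train, test


# 1000 triplets, 5 folds, 10 repetitions, as a heavier variant of the example above.
splits = list(repeated_kfold_indices(n_samples=1000, n_splits=5, n_repeats=10, seed=42))

# Assumption: one embedding is fitted per split and per tested dimension.
n_fits = len(splits) * len([1, 2, 3])
```

Under these assumptions, raising n_repeats from 1 to 10 multiplies the number of fits by ten (here from 15 to 150), which is the price paid for more stable accuracy estimates.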

References