cblearn.embedding.estimate_dimensionality_cv

cblearn.embedding.estimate_dimensionality_cv(estimator, queries, responses=None, test_dimensions=[1, 2, 3], n_splits=10, n_repeats=1, refit=True, alpha=0.05, param_name='n_components', n_jobs=-1, random_state=None)

Estimates the dimensionality of the embedding space.

The procedure fits an embedding for each of the provided test_dimensions and evaluates the fit (triplet accuracy) through cross-validation [1]. The estimated dimension is the lowest one that achieves the best fit for the provided data. The test compares the increase in accuracy; if the increase is not significant, the lower dimension is considered sufficient. Testing a larger range of dimensions can reduce the sensitivity of the test because of the multiple-testing correction.

estimator: The embedding estimator to use.

queries: The triplet queries to embed.

responses: Optional responses, if they are not already encoded in the triplets.

test_dimensions: The dimensions to test, as a monotonically increasing list.

n_splits: The number of splits to use for cross-validation.

n_repeats: The number of repetitions of each cross-validation split. Use 1 for fast results, but 10 or more for more reliable results.

refit: If true, fit the estimator on the entire dataset using the best dimensionality.

alpha: The significance level for the hypothesis test.

param_name: The name of the estimator parameter that describes the embedding dimensionality.

n_jobs: The number of parallel jobs to use for cross-validation.

random_state: The random state or seed to use for the CV splits.

Returns: A result object with the estimated dimension and other information.
Return type: result

Examples:

>>> import numpy as np
>>> from cblearn.embedding import estimate_dimensionality_cv
>>> from cblearn.embedding import SOE
>>> from cblearn.datasets import make_random_triplets
>>> rs = np.random.RandomState(42)
>>> true_embedding = rs.rand(15, 2)  # 15 points in 2D
>>> triplets = make_random_triplets(true_embedding, result_format='list-order', size=1000, random_state=rs)
>>> estimator = SOE(n_components=1)
>>> dim_result = estimate_dimensionality_cv(estimator, triplets, test_dimensions=[1, 2, 3], n_splits=5, refit=True)
>>> dim_result.estimated_dimension
2
>>> true_embedding.shape == estimator.embedding_.shape
True
>>> dim_result.plot_scores()
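
The selection rule from the description (accept a higher dimension only while the gain in cross-validated accuracy is significant, with a correction for multiple tests) can be illustrated without cblearn. The following is a minimal sketch under stated assumptions, not the library's implementation: the statistical test that cblearn applies is not specified on this page, so an exact paired sign-flip permutation test and a Bonferroni correction are used as stand-ins. The names `sign_flip_pvalue`, `pick_dimension`, and `cv_scores` are hypothetical, and the fold accuracies are made-up illustration values.

```python
from itertools import product


def sign_flip_pvalue(diffs):
    """One-sided exact sign-flip permutation test for mean(diffs) > 0.

    Assumption: a stand-in for whatever paired test the library uses.
    """
    n = len(diffs)
    observed = sum(diffs) / n
    patterns = list(product((1, -1), repeat=n))
    # Count sign patterns whose mean is at least as large as the observed one.
    extreme = sum(
        1
        for signs in patterns
        if sum(s * d for s, d in zip(signs, diffs)) / n >= observed - 1e-12
    )
    return extreme / len(patterns)


def pick_dimension(scores_by_dim, alpha=0.05):
    """Return the lowest dimension after which accuracy no longer improves.

    scores_by_dim maps each tested dimension to its per-fold accuracies,
    measured on the same folds for every dimension.
    """
    dims = sorted(scores_by_dim)
    n_tests = max(len(dims) - 1, 1)
    corrected_alpha = alpha / n_tests  # Bonferroni correction (assumption)
    chosen = dims[0]
    for lower, higher in zip(dims, dims[1:]):
        diffs = [h - l for l, h in zip(scores_by_dim[lower], scores_by_dim[higher])]
        if sign_flip_pvalue(diffs) < corrected_alpha:
            chosen = higher  # significant gain: the lower dimension is too small
        else:
            break  # no significant gain: the current dimension is sufficient
    return chosen


# Hypothetical accuracies from 10 folds for each tested dimension.
cv_scores = {
    1: [0.70, 0.72, 0.69, 0.71, 0.70, 0.68, 0.73, 0.70, 0.71, 0.69],
    2: [0.90, 0.91, 0.89, 0.92, 0.90, 0.88, 0.91, 0.90, 0.92, 0.89],
    3: [0.91, 0.90, 0.89, 0.91, 0.90, 0.89, 0.90, 0.91, 0.91, 0.89],
}
print(pick_dimension(cv_scores))
```

In this sketch the corrected threshold is alpha divided by the number of adjacent comparisons, so every additional tested dimension lowers it. This mirrors the note above that a wider range of test_dimensions can make the test less sensitive, and it also suggests why more folds or repetitions help: with few paired scores, even a consistent improvement cannot reach a very small threshold.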

References