cblearn.embedding.estimate_dimensionality_cv
- cblearn.embedding.estimate_dimensionality_cv(estimator, queries, responses=None, test_dimensions=[1, 2, 3], n_splits=10, n_repeats=1, refit=True, alpha=0.05, param_name='n_components', n_jobs=-1, random_state=None)
Estimates the dimensionality of the embedding space.
The procedure fits an embedding for each of the provided test_dimensions and evaluates the fit (triplet accuracy) through cross-validation [1]. The estimated dimension is the lowest one that achieves the best fit for the provided data. The test compares the increase in accuracy; if the increase is not significant, the lower dimension is considered sufficient. Testing a larger range of dimensions can reduce the test sensitivity because of the multiple-testing correction.
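The selection rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helpers sign_flip_pvalue and pick_dimension are hypothetical, an exact paired sign-flip test stands in for whatever hypothesis test the library applies, Bonferroni stands in for its multiple-testing correction, and the per-fold accuracies are made up.

```python
from itertools import product


def sign_flip_pvalue(diffs):
    """One-sided exact sign-flip test: is the mean paired difference > 0?"""
    observed = sum(diffs)
    count = 0
    total = 0
    # Enumerate every way of flipping the sign of each paired difference.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            count += 1
    return count / total


def pick_dimension(scores_by_dim, alpha=0.05):
    """Return the lowest dimension whose successor brings no significant gain.

    scores_by_dim maps each tested dimension to its per-fold accuracies;
    the same folds are used for every dimension, so the scores are paired.
    """
    dims = sorted(scores_by_dim)
    n_tests = len(dims) - 1
    # Bonferroni correction (an assumption): more tests, stricter threshold.
    corrected_alpha = alpha / n_tests if n_tests else alpha
    for low, high in zip(dims, dims[1:]):
        diffs = [h - l for l, h in zip(scores_by_dim[low], scores_by_dim[high])]
        if sign_flip_pvalue(diffs) > corrected_alpha:
            return low  # the step up is not significant, so `low` suffices
    return dims[-1]


# Hypothetical cross-validated triplet accuracies (10 folds per dimension).
scores = {
    1: [0.70, 0.71, 0.69, 0.72, 0.70, 0.71, 0.68, 0.70, 0.72, 0.69],
    2: [0.85, 0.86, 0.84, 0.87, 0.85, 0.86, 0.84, 0.85, 0.86, 0.84],
    3: [0.85, 0.85, 0.85, 0.86, 0.86, 0.85, 0.84, 0.86, 0.85, 0.85],
}
print(pick_dimension(scores))  # 2
```

In this sketch the corrected threshold is alpha divided by the number of comparisons, so each extra tested dimension makes it harder for a real gain to count as significant. That is the loss of sensitivity mentioned above.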
- estimator
The embedding estimator to use.
- queries
The triplet queries to embed.
- responses
Optional responses, if not encoded in the triplets.
- test_dimensions
The dimensions to test, as a monotonically increasing list.
- n_splits
The number of splits to use for cross-validation.
- n_repeats
The number of repetitions of each cross-validation split. Use 1 for fast results, but 10 or more for more reliable results.
- refit
If True, fit the estimator on the entire dataset using the best dimensionality.
- alpha
The significance level for the hypothesis test.
- param_name
The name of the estimator parameter that describes the embedding dimensionality.
- n_jobs
The number of parallel jobs to use for cross-validation.
- random_state
The random state or seed to use for the CV splits.
- Returns:
A result object with the estimated dimension and other information.
- Return type:
result
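To make the cost of n_splits and n_repeats concrete, the following sketch builds repeated k-fold splits in plain Python. The helper repeated_kfold is hypothetical and not part of cblearn. It assumes that splitting behaves like a shuffled, repeated k-fold seeded by random_state and that one embedding is fit per split and tested dimension, neither of which this page confirms.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=1, seed=None):
    """Yield (train_idx, test_idx) pairs; each repeat reshuffles the data."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for k in range(n_splits):
            test = order[k::n_splits]  # every n_splits-th item forms fold k
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


test_dimensions = [1, 2, 3]
splits = list(repeated_kfold(n_samples=1000, n_splits=5, n_repeats=2, seed=0))

# Assumed cost model: one fit per split and per tested dimension.
n_fits = len(splits) * len(test_dimensions)
print(len(splits), n_fits)  # 10 30
```

Under these assumptions the number of fits grows with the product of n_splits, n_repeats, and the number of tested dimensions, which is why n_repeats=1 is the fast setting and larger values trade run time for more stable accuracy estimates.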
Examples:
>>> import numpy as np
>>> from cblearn.embedding import estimate_dimensionality_cv
>>> from cblearn.embedding import SOE
>>> from cblearn.datasets import make_random_triplets
>>> rs = np.random.RandomState(42)
>>> true_embedding = rs.rand(15, 2)  # 15 points in 2D
>>> triplets = make_random_triplets(true_embedding, result_format='list-order', size=1000, random_state=rs)
>>> estimator = SOE(n_components=1)
>>> dim_result = estimate_dimensionality_cv(estimator, triplets, test_dimensions=[1, 2, 3], n_splits=5, refit=True)
>>> dim_result.estimated_dimension
2
>>> true_embedding.shape == estimator.embedding_.shape
True
>>> dim_result.plot_scores()
References