cblearn.cluster.ComparisonHC
- class cblearn.cluster.ComparisonHC(n_clusters)
ComparisonHC.
ComparisonHC [1] is a hierarchical clustering algorithm that computes clusters from triplet data without computing an intermediate embedding. It does so with an adapted linkage algorithm that uses only the triplet information.
Because this algorithm produces its clusterings via a dendrogram that is built on the whole dataset, we do not provide a separate predict method. Call fit_predict directly with the complete dataset you want to cluster.
Keep in mind that this algorithm was optimized and developed for hierarchical clustering, and simply adapted to produce a flat clustering with the desired number of clusters. Thus, this algorithm might not have optimal performance in these settings when compared to other approaches.
- dendrogram_
numpy array, shape (n_clusters-1, 4). An array corresponding to the learned dendrogram. After iteration i, dendrogram[i,0] and dendrogram[i,1] are the indices of the merged clusters, and dendrogram[i,2] is the size of the new cluster. The last column is set to 0 (as in the original algorithm's implementation). The attribute is None until the fit method is called.
- cluster_
list of lists. Initial cluster information used for fitting.
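The column layout of dendrogram_ described above can be replayed by hand. The following is a minimal sketch on a hand-written array, not on real cblearn output. It assumes that each merge creates a new cluster index continuing after the initial ones (n, n+1, ...), as SciPy linkage matrices do; the text above does not confirm that numbering. The helper name replay_merges is hypothetical.

```python
import numpy as np

# Hypothetical dendrogram for four singleton starting clusters.
# Columns: merged index, merged index, size of the new cluster, unused 0.
dendrogram = np.array([
    [0, 1, 2, 0],  # clusters 0 and 1 merge into a cluster of size 2
    [2, 3, 2, 0],  # clusters 2 and 3 merge into a cluster of size 2
    [4, 5, 4, 0],  # the two new clusters merge into one of size 4
])


def replay_merges(dendrogram):
    """Replay the merges and return the surviving clusters and their members.

    Assumption (not confirmed by the documentation): the cluster created at
    iteration i receives the index n_start + i.
    """
    n_start = dendrogram.shape[0] + 1
    members = {i: [i] for i in range(n_start)}
    for step, (left, right, size, _) in enumerate(dendrogram):
        merged = members.pop(int(left)) + members.pop(int(right))
        if len(merged) != int(size):
            raise ValueError(f"size mismatch at iteration {step}")
        members[n_start + step] = merged
    return members


clusters = replay_merges(dendrogram)
print(clusters)
```

If the real numbering differs from this assumption, only the line that assigns the new index needs to change.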
Examples:
>>> from sklearn.datasets import make_blobs
>>> from sklearn.metrics import normalized_mutual_info_score
>>> from cblearn.datasets import make_random_triplets
>>> from cblearn.cluster import ComparisonHC
>>> import numpy as np
>>> means = np.array([[1,0], [-1, 0]])
>>> stds = 0.2 * np.ones(means.shape)
>>> xs, ys = make_blobs(n_samples=[10, 10], centers=means, cluster_std=stds,
...                     n_features=2, random_state=2)
>>> estimator = ComparisonHC(2)
>>> t = make_random_triplets(xs, result_format="list-order", size=5000, random_state=2)
>>> labels = estimator.fit_predict(t)
>>> normalized_mutual_info_score(labels, ys)
1.0
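To make the input of the example more concrete, here is a NumPy-only sketch of what a triplet array might look like. It assumes that in the "list-order" format each row (i, j, k) states that object i is closer to object j than to object k; that reading is an assumption here and should be checked against the cblearn data format documentation. The function sample_triplets is hypothetical and is not the implementation of make_random_triplets.

```python
import numpy as np


def sample_triplets(points, n_triplets, rng):
    """Draw random triplets (anchor, near, far) from point coordinates.

    Assumed convention: column 1 is at least as close to column 0 as
    column 2 is, measured by Euclidean distance.
    """
    n = len(points)
    triplets = np.empty((n_triplets, 3), dtype=int)
    for row in range(n_triplets):
        # Three distinct objects: one anchor and two candidates.
        anchor, a, b = rng.choice(n, size=3, replace=False)
        d_a = np.linalg.norm(points[anchor] - points[a])
        d_b = np.linalg.norm(points[anchor] - points[b])
        # Order the candidates so that the nearer one comes first.
        triplets[row] = (anchor, a, b) if d_a <= d_b else (anchor, b, a)
    return triplets


rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
triplets = sample_triplets(points, 20, rng)
print(triplets[:5])
```

Only the index array is passed to the estimator; the coordinates are used here solely to decide the answers.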
References
- __init__(n_clusters)
Initialize the estimator.
- Parameters:
n_clusters (int) – Number of clusters desired in the final clustering.
Methods
__init__(n_clusters) – Initialize the estimator.
fit(X[, y, init_clusters]) – Computes the dendrogram of a list of clusters.
fit_predict(X[, y]) – Perform clustering on X and return cluster labels.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
set_fit_request(*[, init_clusters]) – Request metadata passed to the fit method.
set_params(**params) – Set the parameters of this estimator.
- fit(X, y=None, init_clusters=None)
Computes the dendrogram of a list of clusters.
- Parameters:
X – Triplets, repeated responses will be ignored (majority vote)
y – optional responses
init_clusters – list of (list of examples), len(n_clusters) An optional list containing the initial clusters (list of examples).
- Returns:
object
- Return type:
self
- Raises:
ValueError – If the initial partition has fewer than n_examples.
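The note on X says that repeated responses are resolved by majority vote. The sketch below shows one way such a vote could work on (anchor, near, far) rows; it is an illustration under stated assumptions, not the library's code. In particular, the row layout is assumed, the helper majority_vote_triplets is hypothetical, and dropping tied questions is a choice made for this sketch only.

```python
import numpy as np


def majority_vote_triplets(triplets):
    """Collapse repeated and contradicting triplets by majority vote.

    Rows are assumed to be (anchor, near, far). Two rows ask the same
    question when they share the anchor and the unordered pair {near, far}.
    Tied questions are dropped (an assumption of this sketch).
    """
    votes = {}
    for anchor, near, far in np.asarray(triplets):
        anchor, near, far = int(anchor), int(near), int(far)
        # Canonical key: anchor plus the pair in ascending order.
        if near < far:
            key, vote = (anchor, near, far), 1
        else:
            key, vote = (anchor, far, near), -1
        votes[key] = votes.get(key, 0) + vote

    resolved = []
    for (anchor, a, b), score in sorted(votes.items()):
        if score > 0:
            resolved.append((anchor, a, b))  # majority says a is nearer
        elif score < 0:
            resolved.append((anchor, b, a))  # majority says b is nearer
    return np.array(resolved, dtype=int).reshape(-1, 3)


responses = [[0, 1, 2], [0, 1, 2], [0, 2, 1], [3, 4, 5], [3, 5, 4]]
print(majority_vote_triplets(responses))
```

Here the question about anchor 0 is answered twice one way and once the other, so one row survives; the question about anchor 3 is tied and is removed.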
- fit_predict(X, y=None, **kwargs)
Perform clustering on X and return cluster labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data.
y (Ignored) – Not used, present for API consistency by convention.
**kwargs (dict) – Arguments to be passed to fit.
Added in version 1.4.
- Returns:
labels – Cluster labels.
- Return type:
ndarray of shape (n_samples,), dtype=np.int64
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- set_fit_request(*, init_clusters='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance