X-Means
Synopsis
Clustering using X-Means. This operator implements the algorithm published by Dan Pelleg and Andrew Moore.
Description
X-Means is a clustering algorithm that determines a suitable number of centroids using a heuristic. It starts with a minimum set of centroids and then iteratively checks whether using more centroids is justified by the data. Whether a cluster is split into two sub-clusters is decided by the Bayesian Information Criterion (BIC), which balances the trade-off between precision and model complexity. Original publication: "X-means: Extending K-means with Efficient Estimation of the Number of Clusters" by Dan Pelleg and Andrew Moore, Proceedings of the Seventeenth International Conference on Machine Learning, 2000.
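The split decision described above can be illustrated with a small sketch. It assumes the spherical-Gaussian BIC formulation as given in the cited publication; the function names (`bic_score`, `squared_distance`) and the toy data are hypothetical, and the operator's internal computation may differ in detail.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bic_score(points, centroids, assignments):
    """BIC of a k-means model under an assumed spherical Gaussian per cluster.

    Higher is better. points: list of tuples, centroids: list of tuples,
    assignments: centroid index for each point.
    """
    r = len(points)        # number of points
    k = len(centroids)     # number of centroids
    m = len(points[0])     # number of dimensions
    if r <= k:
        raise ValueError("need more points than centroids")

    # Pooled variance estimate; the floor avoids log(0) for identical points.
    sse = sum(squared_distance(p, centroids[c]) for p, c in zip(points, assignments))
    variance = max(sse / (r - k), 1e-12)

    log_likelihood = 0.0
    for c in range(k):
        r_n = sum(1 for a in assignments if a == c)
        if r_n == 0:
            continue  # an empty cluster contributes nothing
        log_likelihood += (
            r_n * math.log(r_n)
            - r_n * math.log(r)
            - r_n / 2.0 * math.log(2.0 * math.pi)
            - r_n * m / 2.0 * math.log(variance)
            - (r_n - k) / 2.0
        )

    # Free parameters: k-1 class probabilities, m*k coordinates, 1 variance.
    free_params = (k - 1) + m * k + 1
    return log_likelihood - free_params / 2.0 * math.log(r)


# Toy data (hypothetical): two well-separated groups in one dimension.
points = [(0.0,), (0.1,), (-0.1,), (0.2,), (-0.2,),
          (10.0,), (10.1,), (9.9,), (10.2,), (9.8,)]

# Parent model: one centroid at the overall mean.
bic_parent = bic_score(points, [(5.0,)], [0] * 10)
# Child model: the cluster split into two sub-clusters.
bic_children = bic_score(points, [(0.0,), (10.0,)], [0] * 5 + [1] * 5)

split_is_better = bic_children > bic_parent
```

For this toy data the two-centroid model scores higher, so a split would be accepted. The penalty term grows with the number of centroids, which is what keeps the procedure from splitting indefinitely.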
Input
example set
This is an example set input port.
Output
cluster model
clustered set
Parameters
add cluster attribute
If enabled, a cluster id is generated as a new special attribute directly in this operator; otherwise, this operator does not add an id attribute. In the latter case, you have to use the Apply Model operator to generate the cluster attribute.
add as label
If true, the cluster id is stored in an attribute with the special role 'label' instead of 'cluster'.
remove unlabeled
Delete the unlabeled examples.
k min
The minimal number of clusters which should be detected.
k max
The maximal number of clusters which should be detected.
determine good start values
Determine the first k centroids using the K-Means++ heuristic described in "k-means++: The Advantages of Careful Seeding" by David Arthur and Sergei Vassilvitskii, 2007.
measure types
The measure type
mixed measure
Select measure
nominal measure
Select measure
numerical measure
Select measure
divergence
Select divergence
kernel type
The kernel type
kernel gamma
The kernel parameter gamma.
kernel sigma1
The kernel parameter sigma1.
kernel sigma2
The kernel parameter sigma2.
kernel sigma3
The kernel parameter sigma3.
kernel degree
The kernel parameter degree.
kernel shift
The kernel parameter shift.
kernel a
The kernel parameter a.
kernel b
The kernel parameter b.
clustering algorithm
Clustering Algorithm
max runs
The maximal number of runs of k-Means with random initialization that are performed.
max optimization steps
The maximal number of iterations performed for one run of k-Means.
use local random seed
Indicates if a local random seed should be used.
local random seed
Specifies the local random seed.
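To show how determine good start values, max runs, max optimization steps, and local random seed can interact, here is a minimal sketch of a k-Means routine with restarts. It is not the operator's implementation: all function and argument names are hypothetical and only mirror the parameter names above, and Euclidean distance is assumed as the measure.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sqdist(p, centroids[c]))


def kmeans_pp_seeds(points, k, rng):
    """K-Means++ style seeding: favor points far from the chosen centroids."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sqdist(p, c) for c in centroids) for p in points]
        total = sum(weights)
        if total == 0.0:
            # Every point coincides with a centroid; fall back to a uniform pick.
            centroids.append(rng.choice(points))
            continue
        threshold = rng.random() * total
        cumulative = 0.0
        for p, w in zip(points, weights):
            cumulative += w
            if cumulative > threshold:
                centroids.append(p)
                break
    return centroids


def kmeans_once(points, k, max_optimization_steps, good_start, rng):
    """One run of k-Means, capped at max_optimization_steps iterations."""
    centroids = kmeans_pp_seeds(points, k, rng) if good_start else rng.sample(points, k)
    assignments = [nearest(p, centroids) for p in points]
    for _ in range(max_optimization_steps):
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        new_assignments = [nearest(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # converged before reaching the cap
        assignments = new_assignments
    sse = sum(sqdist(p, centroids[a]) for p, a in zip(points, assignments))
    return sse, centroids, assignments


def kmeans(points, k, max_runs=10, max_optimization_steps=100,
           determine_good_start_values=True, local_random_seed=None):
    """Repeat k-Means max_runs times and keep the run with the lowest error."""
    rng = random.Random(local_random_seed)  # None means no fixed seed
    best = None
    for _ in range(max_runs):
        run = kmeans_once(points, k, max_optimization_steps,
                          determine_good_start_values, rng)
        if best is None or run[0] < best[0]:
            best = run
    return best


# Toy data (hypothetical): two well-separated groups in one dimension.
points = [(0.0,), (0.1,), (-0.1,), (0.2,), (-0.2,),
          (10.0,), (10.1,), (9.9,), (10.2,), (9.8,)]
sse, centroids, assignments = kmeans(points, k=2, max_runs=5,
                                     max_optimization_steps=50,
                                     local_random_seed=7)
```

In this sketch, fixing the seed makes repeated calls return the same result, while leaving it unset lets each call draw different start values. More runs reduce the chance of keeping a poor local optimum at the cost of proportionally more computation.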