X-Means

Synopsis

Clustering using X-Means. This operator implements the algorithm published by Dan Pelleg and Andrew Moore.

Description

X-Means is a clustering algorithm that estimates the number of centroids heuristically. It starts with a minimum set of centroids and then iteratively explores whether using more centroids is justified by the data. Whether a cluster is split into two sub-clusters is decided by the Bayesian Information Criterion (BIC), which balances the trade-off between precision and model complexity. Original publication: "X-means: Extending K-means with Efficient Estimation of the Number of Clusters" by Dan Pelleg and Andrew Moore, Proceedings of the Seventeenth International Conference on Machine Learning, 2000.
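The BIC-based split test can be sketched as follows. This is a minimal Python transcription of the BIC formulas as printed in Pelleg and Moore's paper, assuming identical spherical Gaussian clusters that share one pooled variance. The names `bic` and `should_split` are hypothetical, and this is not necessarily how the operator computes the score. A parent region, scored as a single cluster, is compared against the two children produced by a local 2-means split of the same points; the split is kept only if the children score higher.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def bic(clusters: List[List[Point]], centroids: List[Point]) -> float:
    """BIC of a k-means model, transcribed from Pelleg & Moore (2000)."""
    K = len(centroids)
    R = sum(len(c) for c in clusters)  # total number of points
    M = len(centroids[0])              # number of dimensions
    # Pooled variance estimate shared by all clusters.
    sse = sum(sq_dist(p, mu) for pts, mu in zip(clusters, centroids) for p in pts)
    if R <= K or sse == 0.0:
        raise ValueError("need more points than clusters and non-zero variance")
    var = sse / (R - K)
    log_likelihood = 0.0
    for pts in clusters:
        Rn = len(pts)
        if Rn == 0:
            continue
        log_likelihood += (
            -Rn / 2 * math.log(2 * math.pi)
            - Rn * M / 2 * math.log(var)
            - (Rn - K) / 2
            + Rn * math.log(Rn)
            - Rn * math.log(R)
        )
    # Free parameters: K-1 cluster weights, M*K centroid coordinates, 1 variance.
    p = (K - 1) + M * K + 1
    return log_likelihood - p / 2 * math.log(R)


def should_split(points: List[Point], child_a: List[Point], child_b: List[Point]) -> bool:
    """Keep the split if the two-child model has a higher BIC than the parent."""
    parent = bic([points], [mean(points)])
    children = bic([child_a, child_b], [mean(child_a), mean(child_b)])
    return children > parent
```

For two well-separated groups the children model wins despite its larger parameter penalty, while splitting a single compact group in half is rejected.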

Input

example set

This input port expects an example set.

Output

cluster model

clustered set

Parameters

add cluster attribute

If enabled, this operator generates a cluster id directly as a new special attribute. Otherwise no id attribute is added, and you have to use the Apply Model operator to generate the cluster attribute.

add as label

If true, the cluster id is stored in an attribute with the special role 'label' instead of 'cluster'.

remove unlabeled

Delete the unlabeled examples.

k min

The minimum number of clusters to detect.

k max

The maximum number of clusters to detect.

determine good start values

Determine the first k centroids using the K-Means++ heuristic described in "k-means++: The Advantages of Careful Seeding" by David Arthur and Sergei Vassilvitskii (2007).
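k-means++ seeding picks the first centroid uniformly at random and every further centroid with probability proportional to D(x)^2, the squared distance from point x to the nearest centroid already chosen. A minimal Python sketch of that seeding step follows; `kmeans_pp_init` is a hypothetical name, and the operator's own implementation may differ in details such as tie handling.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points: List[Point], k: int, rng: random.Random) -> List[List[float]]:
    """Choose k initial centroids with k-means++ seeding."""
    centroids = [list(rng.choice(points))]  # first centroid: uniform choice
    while len(centroids) < k:
        # D(x)^2: squared distance of each point to its nearest chosen centroid.
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0.0:
            # Every point coincides with a centroid; fall back to a uniform choice.
            centroids.append(list(rng.choice(points)))
            continue
        # Draw the next centroid with probability proportional to D(x)^2.
        centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids
```

Points that already serve as centroids have weight zero and cannot be drawn again, so distant, not yet covered regions are strongly favoured.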

measure types

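The type of measure used to compute distances between examples. The selected type determines which of the following measure parameters applies.

mixed measure

Selects the mixed measure to use.

nominal measure

Selects the nominal measure to use.

numerical measure

Selects the numerical measure to use.

divergence

Selects the divergence to use.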

kernel type

The type of kernel function used by kernel-based measures.

kernel gamma

The kernel parameter gamma.

kernel sigma1

The kernel parameter sigma1.

kernel sigma2

The kernel parameter sigma2.

kernel sigma3

The kernel parameter sigma3.

kernel degree

The kernel parameter degree.

kernel shift

The kernel parameter shift.

kernel a

The kernel parameter a.

kernel b

The kernel parameter b.
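Kernel-based distance measures typically compute the distance in the kernel's feature space as d(x, y)^2 = k(x, x) - 2 k(x, y) + k(y, y). The Python sketch below shows this for a radial kernel exp(-gamma * ||x - y||^2) and a polynomial kernel (x . y + shift)^degree. Which formula is used for each kernel type, and how kernel shift, kernel a, kernel b and the sigma parameters enter them, are assumptions that this page does not state; the function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def radial_kernel(gamma: float) -> Kernel:
    """Assumed radial (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    def k(x: Vector, y: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def polynomial_kernel(degree: int, shift: float) -> Kernel:
    """Assumed polynomial kernel: (x . y + shift) ** degree."""
    def k(x: Vector, y: Vector) -> float:
        return (dot(x, y) + shift) ** degree
    return k


def kernel_distance(k: Kernel, x: Vector, y: Vector) -> float:
    """Distance between x and y in the feature space induced by kernel k."""
    d2 = k(x, x) - 2 * k(x, y) + k(y, y)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors
```

With degree 1 and shift 0 the polynomial kernel is the plain dot product, so the kernel distance reduces to the ordinary Euclidean distance.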

clustering algorithm

The clustering algorithm used internally.

max runs

The maximum number of k-Means runs, each with a random initialization.

max optimization steps

The maximum number of iterations performed in a single k-Means run.

use local random seed

Indicates whether a local random seed should be used.

local random seed

Specifies the local random seed.
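The descriptions of max runs and max optimization steps suggest the usual restart scheme for k-Means: run Lloyd's algorithm several times from random initial centroids, cap each run at a fixed number of iterations, and keep the run with the lowest within-cluster sum of squared distances. A minimal Python sketch under those assumptions; `kmeans`, `max_runs`, `max_steps` and `seed` are hypothetical names, with `seed` playing the role of a local random seed that makes the result reproducible.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points: List[Point], k: int, max_steps: int,
          rng: random.Random) -> Tuple[List[List[float]], float]:
    """One k-Means run from k randomly chosen points, at most max_steps iterations."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_steps):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points; keep empty clusters in place.
        new = [
            [sum(c) / len(pts) for c in zip(*pts)] if pts else centroids[i]
            for i, pts in enumerate(clusters)
        ]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


def kmeans(points: List[Point], k: int, max_runs: int, max_steps: int,
           seed: Optional[int] = None) -> Tuple[List[List[float]], float]:
    """Best of max_runs randomly initialized runs, judged by within-cluster SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_runs):
        result = lloyd(points, k, max_steps, rng)
        if best is None or result[1] < best[1]:
            best = result
    return best
```

More runs lower the risk of returning a poor local optimum, at the cost of proportionally more computation.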
