Naive Bayes (Kernel)

Synopsis

This operator generates a Kernel Naive Bayes classification model using estimated kernel densities.

Description

A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be the 'independent feature model'. In simple terms, a Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class (i.e., an attribute) is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a Naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple. The Naive Bayes classifier often performs reasonably well even if the underlying independence assumption does not hold.
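
The independence assumption can be written compactly. The symbols here are introduced only for illustration: y stands for a label value and x_1, ..., x_n for the attribute values of one example.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The predicted label is the value of y that maximizes the right-hand side.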

The advantage of the Naive Bayes classifier is that it requires only a small amount of training data to estimate the means and variances of the variables necessary for classification. Because the variables are assumed to be independent, only the variances of the variables for each label need to be determined, not the entire covariance matrix. In contrast to the Naive Bayes operator, the Naive Bayes (Kernel) operator can be applied to numerical attributes.
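
As a rough illustration of why so few quantities are needed, the following Python sketch computes one mean and one variance per attribute and label. The function name `per_label_moments` and the toy data are hypothetical and are not part of the operator.

```python
from collections import defaultdict
from statistics import mean, pvariance

def per_label_moments(rows, labels):
    """Return {label: [(mean, variance), ...]} with one pair per attribute."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    moments = {}
    for label, members in grouped.items():
        columns = list(zip(*members))  # transpose: one tuple per attribute
        moments[label] = [(mean(col), pvariance(col)) for col in columns]
    return moments

# Toy data (made up for this sketch): two numerical attributes, two labels.
rows = [(1.0, 10.0), (3.0, 14.0), (6.0, 20.0), (8.0, 24.0)]
labels = ["a", "a", "b", "b"]
moments = per_label_moments(rows, labels)
print(moments)
```

With d attributes, this stores 2d numbers per label. A full covariance matrix would need d means plus d(d+1)/2 distinct covariance entries per label.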

A kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable.

Kernel density estimators belong to a class of estimators called non-parametric density estimators. Parametric estimators have a fixed functional form (structure), and the parameters of this function are the only information that needs to be stored. Non-parametric estimators, in contrast, have no fixed structure and depend on all the data points to reach an estimate.
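
A minimal Python sketch of these ideas follows. It assumes a Gaussian kernel and a single bandwidth shared by all attributes; the text above does not state which kernel the operator uses, and the names `gaussian_kernel`, `kde`, and `predict` are hypothetical. Note that the estimate needs every training value, which is what makes it non-parametric.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the weighting function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, bandwidth):
    """Kernel density estimate at x: average of kernels centred on each sample."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    total = sum(gaussian_kernel((x - s) / bandwidth) for s in samples)
    return total / (len(samples) * bandwidth)

def predict(x_row, training_rows, training_labels, bandwidth):
    """Pick the label maximising prior * product of per-attribute densities.

    Returns None if every label has zero density for x_row.
    """
    best_label, best_score = None, -math.inf
    for label in sorted(set(training_labels)):
        members = [r for r, l in zip(training_rows, training_labels) if l == label]
        # Work in log space so that many small densities do not underflow.
        log_score = math.log(len(members) / len(training_rows))
        for i, value in enumerate(x_row):
            density = kde(value, [m[i] for m in members], bandwidth)
            log_score += math.log(density) if density > 0 else -math.inf
        if log_score > best_score:
            best_label, best_score = label, log_score
    return best_label

# Toy data (made up for this sketch).
rows = [(1.0, 1.2), (1.5, 0.8), (5.0, 5.5), (5.5, 4.9)]
labels = ["low", "low", "high", "high"]
print(predict((1.2, 1.0), rows, labels, bandwidth=0.5))
```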

Input

training set

The input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output

model

The Kernel Naive Bayes classification model is delivered from this output port. This classification model can be applied to unseen data sets to predict the label attribute.

example set

The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

Laplace correction

This parameter indicates whether Laplace correction should be used to prevent zero probabilities from having a high influence. A single zero probability forces the whole product of probabilities for a label to zero, regardless of the other attributes. A simple trick avoids this: assuming the training set is large, adding one to each count changes the estimated probabilities only negligibly, yet guarantees that no probability is zero. This technique is known as Laplace correction.
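
The add-one idea can be sketched in Python for the counts of a nominal attribute within one label. This is the textbook form of the correction; how the operator applies it internally is not described here, and the function name `conditional_probabilities` and the data are hypothetical.

```python
from collections import Counter

def conditional_probabilities(values, categories, laplace=True):
    """Estimate P(category | label) from the values observed for one label."""
    counts = Counter(values)
    add = 1 if laplace else 0
    # Adding one to each count raises the denominator by the number of categories.
    total = len(values) + add * len(categories)
    return {c: (counts[c] + add) / total for c in categories}

# Toy data (made up): "yellow" never occurs for this label.
colors = ["red", "red", "green"]
categories = ["red", "green", "yellow"]
raw = conditional_probabilities(colors, categories, laplace=False)
smoothed = conditional_probabilities(colors, categories, laplace=True)
print(raw)
print(smoothed)
```

Without the correction, "yellow" gets probability zero; with it, every category keeps a small non-zero probability and the probabilities still sum to one.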

Estimation mode

This parameter specifies the kernel density estimation mode. Two options are available.

full: If this option is selected, the bandwidth is either chosen by a heuristic or specified as a fixed value.
greedy: If this option is selected, you have to specify the minimum bandwidth and the number of kernels.

Bandwidth selection

This parameter is only available when the estimation mode parameter is set to 'full'. This parameter specifies the method used to set the kernel bandwidth. The bandwidth can be chosen by a heuristic, or a fixed bandwidth can be specified. Please note that the bandwidth of the kernel is a free parameter that has a strong influence on the resulting estimate. A bandwidth that is too small produces a spiky estimate that follows individual data points, while a bandwidth that is too large smooths away the structure of the data, so it is important to choose an appropriate value.
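
The influence of the bandwidth can be seen in a small Python sketch. It assumes a Gaussian kernel and uses the normal-reference rule of thumb, h = 1.06 * sigma * n^(-1/5), as one example of a heuristic; the heuristic the operator actually uses is not stated in this document. The names `kde` and `rule_of_thumb_bandwidth` are hypothetical.

```python
import math
from statistics import stdev

def kde(x, samples, bandwidth):
    """Gaussian kernel density estimate at x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5)."""
    return 1.06 * stdev(samples) * len(samples) ** (-1 / 5)

# Toy data (made up): two clusters with a gap around 3.5.
samples = [1.0, 1.2, 1.9, 2.1, 4.8, 5.0, 5.3]
h = rule_of_thumb_bandwidth(samples)
for bandwidth in (0.05, h, 50.0):
    print(f"h={bandwidth:.3f}  density at 3.5 = {kde(3.5, samples, bandwidth):.6f}")
```

With the very small bandwidth, the density in the gap between the clusters is effectively zero; with the very large one, the estimate is nearly flat everywhere and the two clusters are no longer distinguishable.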

Bandwidth

This parameter is only available when the estimation mode parameter is set to 'full' and the bandwidth selection parameter is set to 'fix'. This parameter specifies the kernel bandwidth.

Minimum bandwidth

This parameter is only available when the estimation mode parameter is set to 'greedy'. This parameter specifies the minimum kernel bandwidth.

Number of kernels

This parameter is only available when the estimation mode parameter is set to 'greedy'. This parameter specifies the number of kernels.

Use application grid

This parameter indicates if the kernel density function grid should be used in the model application. It speeds up model application at the expense of the density function precision.
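
The trade-off can be illustrated with a Python sketch that evaluates the density once on a regular grid and then answers queries by linear interpolation. This is only one plausible way to build such a grid; the operator's actual implementation is not described here. The names `build_grid` and `grid_density`, the Gaussian kernel, and the choice to return zero outside the grid are all assumptions.

```python
import math

def kde(x, samples, bandwidth):
    """Gaussian kernel density estimate at x (cost grows with len(samples))."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

def build_grid(samples, bandwidth, grid_size):
    """Precompute the density at grid_size evenly spaced points."""
    if grid_size < 2:
        raise ValueError("grid_size must be at least 2")
    lo = min(samples) - 3 * bandwidth
    hi = max(samples) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    values = [kde(lo + i * step, samples, bandwidth) for i in range(grid_size)]
    return lo, step, values

def grid_density(x, grid):
    """Approximate the density at x by interpolating between two grid points."""
    lo, step, values = grid
    pos = (x - lo) / step
    if pos < 0 or pos > len(values) - 1:
        return 0.0  # assumption of this sketch: treat points outside the grid as zero
    i = min(int(pos), len(values) - 2)
    frac = pos - i
    return values[i] * (1 - frac) + values[i + 1] * frac

# Toy data (made up for this sketch).
samples = [1.0, 1.2, 1.9, 2.1, 4.8, 5.0, 5.3]
grid = build_grid(samples, bandwidth=0.5, grid_size=200)
print(kde(2.0, samples, 0.5), grid_density(2.0, grid))
```

Each lookup now costs a constant amount of work instead of one kernel evaluation per training value. A larger grid size reduces the interpolation error at the cost of more memory and a longer precomputation.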

Application grid size

This parameter is only available when the use application grid parameter is set to true. This parameter specifies the size of the application grid.
