Discretize

Discretizes continuous attributes from the input data set.

Channels

Inputs

Examples (ExampleTable)
Attribute-valued data set.

Outputs

Examples (ExampleTable)
Attribute-valued data set in which the continuous attributes of the input data set are replaced by their discretized versions.
Classified Examples (ExampleTableWithClass)
Same as above, but used only if the input data set includes a class attribute (discrete or continuous).

Description

The Discretize widget receives a data set on its input, finds the attributes that are continuous, and discretizes them using the selected method. It then outputs the same data set with the continuous attributes replaced by their discretized versions.

Discretize

Three discretization methods are supported. The first two split each continuous attribute into a user-defined number of intervals: Equal-Width Intervals uses intervals of the same size, while Equal-Frequency Intervals places the interval borders so that each interval covers approximately the same number of data instances.
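The difference between the two interval-based methods is easiest to see on data with an outlier. The following is a minimal sketch in plain Python, not the widget's own code; the function names and the convention that a value equal to a cut-off falls into the lower interval are assumptions made for illustration.

```python
import bisect


def equal_width_cutoffs(values, n_intervals):
    """Cut-offs that split the range [min, max] into intervals of equal size."""
    lo, hi = min(values), max(values)
    if lo == hi or n_intervals < 2:
        return []  # nothing to split
    width = (hi - lo) / n_intervals
    return [lo + i * width for i in range(1, n_intervals)]


def equal_frequency_cutoffs(values, n_intervals):
    """Cut-offs chosen so that each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    cutoffs = []
    for i in range(1, n_intervals):
        pos = i * n // n_intervals
        if pos <= 0 or pos >= n:
            continue
        left, right = ordered[pos - 1], ordered[pos]
        if left == right:
            continue  # equal values cannot be separated by a cut-off
        cut = (left + right) / 2
        if not cutoffs or cut > cutoffs[-1]:
            cutoffs.append(cut)
    return cutoffs


def interval_index(value, cutoffs):
    """Index of the interval a value falls into (assumed convention: lower interval on ties)."""
    return bisect.bisect_left(cutoffs, value)


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
width_cuts = equal_width_cutoffs(values, 3)
freq_cuts = equal_frequency_cutoffs(values, 3)
print(width_cuts)
print(freq_cuts)
```

With equal-width intervals the single outlier stretches the range, so all other values end up in the first interval; equal-frequency intervals spread the values evenly regardless of the outlier.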

A different technique is Entropy-based discretization, which works only if the input data set includes a discrete class. It finds intervals that minimize the entropy of the class variable (that is, each interval tends to contain instances of one prevailing class). The algorithm used is that of Fayyad and Irani (1992). One possible outcome of the algorithm is that no appropriate cut-off points are found; the attribute is then reduced to a constant and can be removed from the data set. Attributes of this kind are listed under Removed Attributes.
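To illustrate the idea, here is a minimal sketch of recursive entropy-based splitting with the minimum description length stopping criterion described by Fayyad and Irani. It is written in plain Python for illustration and is not the widget's implementation; the function names are hypothetical. An empty result corresponds to an attribute that would be reduced to a constant.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Return (weighted class entropy, split index) of the best binary split, or None."""
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or e < best[0]:
            best = (e, i)
    return best


def split(pairs):
    """Recursively collect cut-offs that pass the MDL acceptance test."""
    found = best_cut(pairs)
    if found is None:
        return []
    e_split, i = found
    n = len(pairs)
    labels = [c for _, c in pairs]
    l_labels, r_labels = labels[:i], labels[i:]
    ent = entropy(labels)
    gain = ent - e_split
    k, k1, k2 = len(set(labels)), len(set(l_labels)), len(set(r_labels))
    delta = log2(3 ** k - 2) - (k * ent - k1 * entropy(l_labels) - k2 * entropy(r_labels))
    if gain <= (log2(n - 1) + delta) / n:
        return []  # the split does not pay for itself: stop here
    cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return split(pairs[:i]) + [cut] + split(pairs[i:])


def entropy_cutoffs(values, classes):
    """Cut-offs for one continuous attribute given a discrete class."""
    pairs = sorted(zip(values, classes), key=lambda p: p[0])
    return split(pairs)


# A class that changes once along the attribute yields a single cut-off.
separable = entropy_cutoffs(list(range(1, 11)), ["a"] * 5 + ["b"] * 5)
# A class unrelated to the attribute yields no cut-off at all.
unrelated = entropy_cutoffs([1, 2, 3, 4], ["a", "b", "a", "b"])
print(separable, unrelated)
```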

Depending on the user's settings, the widget can display either the discretization intervals or the cut-off points, that is, interval borders.
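The two views carry the same information, since the intervals follow directly from the cut-off points. A small sketch of the conversion is given below; the label format and the choice of intervals closed on the right are assumptions for illustration and may differ from what the widget displays.

```python
def interval_labels(cutoffs):
    """Turn a sorted list of cut-off points into human-readable interval labels."""
    if not cutoffs:
        return ["all values"]  # no cut-offs: the attribute is a constant
    labels = [f"<= {cutoffs[0]:g}"]
    labels += [f"({lo:g}, {hi:g}]" for lo, hi in zip(cutoffs, cutoffs[1:])]
    labels.append(f"> {cutoffs[-1]:g}")
    return labels


labels = interval_labels([3.5, 6.5])
print(labels)
```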

Examples

The schema below shows the Iris data set with continuous attributes (as in the original data file) and with discretized attributes.

Schema with Discretize widget