Data Sampler

Selects a subset of data instances from the input data set.

Channels

Inputs

Examples (ExampleTable)
Attribute-valued data set.

Outputs

Examples (ExampleTable)
Attribute-valued data set as sampled from the input data.
Classified Examples (ExampleTable)
Same as above, but used only if the input data includes a class attribute (discrete or continuous).
Remaining Examples (ExampleTable)
Data instances from input data set that are not included in the sampled data.
Remaining Classified Examples (ExampleTable)
Same as above, but used only if the input data includes a class attribute (discrete or continuous).

Description

Data Sampler supports several ways of sampling data from the input channel. It outputs the sampled data set and the complementary data set (the instances from the input set that are not included in the sample). The output is updated when an input data set is given to the widget or after the Sample Data button is pressed.

Data Sampler

Sampling may be stratified: if the input data contains a class, sampling will try to match its class distribution in the output data sets.
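The idea of stratification can be illustrated outside the widget. The following is a minimal sketch in plain Python, not the widget's own code: the function name stratified_sample, the class_of accessor and the toy data are all hypothetical, and rounding the per-class count is an assumption made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(rows, class_of, proportion, seed=0):
    """Split rows into (sample, remaining), sampling each class separately.

    Hypothetical helper for illustration; it is not part of the widget.
    """
    rng = random.Random(seed)
    # Group row positions by their class value.
    by_class = defaultdict(list)
    for i, row in enumerate(rows):
        by_class[class_of(row)].append(i)
    # Draw the same proportion from every class (rounded per class).
    chosen = set()
    for indices in by_class.values():
        k = round(len(indices) * proportion)
        chosen.update(rng.sample(indices, k))
    sample = [row for i, row in enumerate(rows) if i in chosen]
    remaining = [row for i, row in enumerate(rows) if i not in chosen]
    return sample, remaining

# Toy data: six instances of class "a" and four of class "b".
rows = [("a", i) for i in range(6)] + [("b", i) for i in range(4)]
sample, remaining = stratified_sample(rows, class_of=lambda r: r[0], proportion=0.5)
```

With a proportion of 0.5, the sample keeps the 6:4 class ratio of the toy input (three "a" and two "b" instances), and the remaining set holds the rest.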

Several types of sampling are supported. Random sampling can draw a fixed number of instances, or it can create a data set whose size is set as a proportion of the instances in the input data set. In repeated sampling, a data instance may be included in the sampled data several times (as in bootstrap).
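The three random sampling variants can be sketched in plain Python as follows. This is an illustration under assumptions, not the widget's implementation: the function names are hypothetical, and treating the never-drawn instances as the "remaining" set in repeated sampling is an assumption made for the example.

```python
import random

def sample_fixed(rows, n, seed=0):
    """Draw exactly n distinct instances; return (sample, remaining)."""
    rng = random.Random(seed)
    picked = set(rng.sample(range(len(rows)), n))
    sample = [r for i, r in enumerate(rows) if i in picked]
    remaining = [r for i, r in enumerate(rows) if i not in picked]
    return sample, remaining

def sample_proportion(rows, proportion, seed=0):
    """Draw a share of the input, e.g. proportion=0.3 for 30 percent."""
    return sample_fixed(rows, round(len(rows) * proportion), seed)

def sample_repeated(rows, n, seed=0):
    """Draw n instances with replacement, so duplicates are possible."""
    rng = random.Random(seed)
    picked = [rng.randrange(len(rows)) for _ in range(n)]
    sample = [rows[i] for i in picked]
    # Assumption: instances that were never drawn form the remaining set.
    used = set(picked)
    remaining = [r for i, r in enumerate(rows) if i not in used]
    return sample, remaining

rows = list(range(20))
fixed, fixed_rest = sample_fixed(rows, 10)
share, share_rest = sample_proportion(rows, 0.3)
boot, boot_rest = sample_repeated(rows, len(rows))
```

In the first two variants the sample and the remaining set together always make up the input exactly once. In repeated sampling the sample has the requested size but may contain duplicates.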

Cross validation, leave-one-out, and sampling into multiple subsets of preset sizes relative to the input data set (as in random sampling) all create several data samples. Which one is sent to the output is determined by the data set index in Fold/group (indices start with 1).
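How a 1-based fold index picks one of several samples can be sketched like this. The names fold_assignments and select_fold are hypothetical, and the sketch deliberately returns both the selected fold and its complement without stating which of the two the widget sends to which output channel.

```python
import random

def fold_assignments(n_rows, n_folds, seed=0):
    """Assign every row to one of n_folds folds of near-equal size."""
    rng = random.Random(seed)
    folds = [i % n_folds for i in range(n_rows)]
    rng.shuffle(folds)
    return folds

def select_fold(rows, folds, fold_index):
    """Return (fold, rest) for a 1-based fold_index, as in Fold/group."""
    if not 1 <= fold_index <= max(folds) + 1:
        raise ValueError("fold_index must be between 1 and the number of folds")
    target = fold_index - 1  # convert to the 0-based label used internally
    fold = [r for r, f in zip(rows, folds) if f == target]
    rest = [r for r, f in zip(rows, folds) if f != target]
    return fold, rest

rows = list(range(10))

# Five-fold cross validation: every fold holds two of the ten instances.
cv_folds = fold_assignments(len(rows), 5)
fold, rest = select_fold(rows, cv_folds, 1)

# Leave-one-out is the special case with as many folds as instances.
loo_folds = fold_assignments(len(rows), len(rows))
single, others = select_fold(rows, loo_folds, 3)
```

Changing the index selects a different fold from the same assignment, so stepping through indices 1 to 5 visits every instance exactly once as part of a selected fold.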

Examples

The schema below samples 10 data instances from the Iris data set and presents the selection in the Scatterplot widget.

Schema with Data Sampler