Impute

Replaces unknown values in the data.

Channels

Inputs

Classified Examples (ExampleTableWithClass)
Data set.
Learner for Imputation
A learning algorithm to be used when values are imputed using a predictive model. If given, this algorithm replaces the default (1-NN learner).

Outputs

Classified Examples (ExampleTableWithClass)
The same data set as on the input, but with the missing values imputed.

Description

Some of Orange's algorithms and visualizations cannot handle unknown values in the data. This widget does what statisticians call imputation: it replaces unknown values with values computed from the data or set by the user.

Impute widget

In the top-most box, Default imputation method, the user can specify a general imputation technique for all attributes.

  • Average/Most-frequent uses the average value (for continuous attributes) or the most common value (for discrete attributes).
  • Model-based imputer constructs a model for predicting the missing value from the values of other attributes; a separate model is constructed for each attribute. The default model is the 1-NN learner, which takes the value from the most similar example (this is sometimes referred to as hot deck imputation). This algorithm can be replaced by one that the user connects to the input signal Learner for Imputation. Note, however, that if the data contains both discrete and continuous attributes, the algorithm needs to be capable of handling both; at the moment only the kNN learner can do that. (In the future, when Orange has more regressors, the Impute widget may have separate input signals for discrete and continuous models.)
  • Random values computes the distributions of values for each attribute and then imputes by picking random values from them.
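
The three default methods can be sketched in plain Python. This is only an illustration of the ideas, not Orange's implementation: the function names, the use of None for an unknown value, and the simple unnormalized distance used for the 1-NN step are all assumptions made for this example.

```python
import random
from collections import Counter

MISSING = None  # assumption: None stands in for an unknown value


def impute_average(column):
    """Average/Most-frequent: mean for numeric columns, mode for discrete ones."""
    known = [v for v in column if v is not MISSING]
    if all(isinstance(v, (int, float)) for v in known):
        fill = sum(known) / len(known)
    else:
        fill = Counter(known).most_common(1)[0][0]
    return [fill if v is MISSING else v for v in column]


def impute_random(column, rng):
    """Random values: draw from the observed distribution of the column."""
    known = [v for v in column if v is not MISSING]
    return [rng.choice(known) if v is MISSING else v for v in column]


def distance(a, b, skip):
    """Toy distance over all attributes except the one being imputed."""
    d = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j == skip:
            continue
        if x is MISSING or y is MISSING:
            d += 1.0  # treat unknowns as maximally different
        elif isinstance(x, (int, float)) and isinstance(y, (int, float)):
            d += abs(x - y)  # not normalized; a real kNN would scale this
        else:
            d += 0.0 if x == y else 1.0
    return d


def impute_1nn(rows):
    """Model-based with 1-NN (hot deck): copy the value from the most similar row."""
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            donors = [r for k, r in enumerate(rows)
                      if k != i and r[j] is not MISSING]
            if donors:
                nearest = min(donors, key=lambda r: distance(row, r, j))
                result[i][j] = nearest[j]
    return result


print(impute_average([1.0, MISSING, 3.0]))
print(impute_random(["x", MISSING, "y"], random.Random(0)))
print(impute_1nn([[1.0, "a"], [1.1, MISSING], [5.0, "b"]]))
```

Note that the 1-NN sketch handles discrete and continuous attributes in the same distance function, which is the property the text above requires of any learner connected to Learner for Imputation.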

It is also possible to specify individual treatment for each attribute. Besides the above options, one can decide not to impute the value at all (Don't impute) or to always impute a specific, manually defined value. These settings can be seen in the list of attributes: in the snapshot, we decided not to impute the values of "normalized-losses" and "make", the missing values of "aspiration" will be replaced by random values, while the missing values of "body-style" and "drive-wheels" are replaced by "hatchback" and "fwd", respectively. Values of all other attributes use the default method set above (model-based imputer, in our case).
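
The combination of a default method with per-attribute overrides can be sketched as a lookup table. This is a hypothetical illustration, not the widget's code: the method labels ("dont_impute", "random", "most_frequent", and the ("value", x) tuple), the function names, and the sample rows are invented for the example, and most-frequent imputation stands in for the model-based default to keep the sketch short.

```python
import random
from collections import Counter

MISSING = None  # assumption: None stands in for an unknown value


def impute_column(column, method, rng):
    """Apply one treatment to one column (list of values)."""
    known = [v for v in column if v is not MISSING]
    if method == "dont_impute":
        return list(column)
    if method == "most_frequent":
        fill = Counter(known).most_common(1)[0][0]
        return [fill if v is MISSING else v for v in column]
    if method == "random":
        return [rng.choice(known) if v is MISSING else v for v in column]
    if isinstance(method, tuple) and method[0] == "value":
        return [method[1] if v is MISSING else v for v in column]
    raise ValueError("unknown imputation method: %r" % (method,))


def impute_table(table, default, overrides, rng):
    """Use the per-attribute override if there is one, else the default."""
    return {name: impute_column(column, overrides.get(name, default), rng)
            for name, column in table.items()}


# Made-up sample data using attribute names from the text above.
table = {
    "make": ["audi", MISSING, "bmw"],
    "aspiration": ["std", MISSING, "turbo"],
    "body-style": ["sedan", MISSING, "sedan"],
    "drive-wheels": [MISSING, "rwd", "rwd"],
    "fuel-type": ["gas", "gas", MISSING],
}
overrides = {
    "make": "dont_impute",
    "aspiration": "random",
    "body-style": ("value", "hatchback"),
    "drive-wheels": ("value", "fwd"),
}

result = impute_table(table, "most_frequent", overrides, random.Random(0))
for name, column in result.items():
    print(name, column)
```

Resetting every attribute to the default then amounts to clearing the overrides table.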

The Set All to Default button resets the individual attribute treatments to the default.

Imputing class values is typically not a good practice, so it is off by default. It can be enabled by checking Impute class values.

All changes are committed automatically if Send automatically is checked. Otherwise, Apply needs to be clicked to apply any new settings.