Chi-Squared Selector

Selects categorical features to use for predicting a categorical label.

This operation is ported from Spark ML.

For a comprehensive introduction, see the Spark documentation.

For Scaladoc details, see the org.apache.spark.ml.feature.ChiSqSelector documentation.

Since: Seahorse 1.1.0

Input

Port Type Qualifier Description
0 DataFrame The input DataFrame.

Output

Port Type Qualifier Description
0 DataFrame The output DataFrame.
1 Transformer A Transformer that allows the operation to be applied to other DataFrames using a Transform.

Parameters

Name Type Description
num top features Numeric Number of features the selector will select, taken in descending order of their chi-squared statistic. If the input has fewer features than this number, all features are selected.
features column SingleColumnSelector The features column for model fitting.
output column String The output column name.
label column SingleColumnSelector The label column for model fitting.
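
The statistic used to rank features is, as far as the underlying Spark ChiSqSelector is concerned, Pearson's chi-squared independence statistic, computed separately for each feature against the label. As a reminder of the standard definition (the symbols below are notation introduced here, not taken from Seahorse), for a feature with categories i and a label with classes j:

```latex
\chi^2 = \sum_{i}\sum_{j} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}},
\qquad
E_{ij} = \frac{\left(\sum_{j'} O_{ij'}\right)\left(\sum_{i'} O_{i'j}\right)}{N}
```

Here O_ij is the number of rows with feature category i and label class j, E_ij is the count expected if the feature and the label were independent, and N is the total number of rows. A larger value indicates a stronger dependence between the feature and the label.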

Example

Parameters

Name Value
num top features 1.0
features column "features"
output column "selected_features"
label column "label"

Input

features label
[0.0,0.0,18.0,1.0] 1.0
[0.0,1.0,12.0,0.0] 0.0
[1.0,0.0,15.0,0.1] 0.0

Output

features label selected_features
[0.0,0.0,18.0,1.0] 1.0 [18.0]
[0.0,1.0,12.0,0.0] 0.0 [12.0]
[1.0,0.0,15.0,0.1] 0.0 [15.0]
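
The result above can be reproduced with a small pure-Python sketch. This is not the Spark or Seahorse implementation; it assumes that every distinct feature value is treated as a category, that features are ranked by descending Pearson chi-squared statistic, and that ties go to the feature with the lower index. The helper names chi_squared and select_top_features are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def chi_squared(feature_values, labels):
    """Pearson chi-squared statistic of one categorical feature against the label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    value_totals = Counter(feature_values)
    label_totals = Counter(labels)
    statistic = Fraction(0)
    for value, value_total in value_totals.items():
        for label, label_total in label_totals.items():
            # Count expected under independence of feature and label.
            expected = Fraction(value_total * label_total, n)
            statistic += (observed[(value, label)] - expected) ** 2 / expected
    return statistic


def select_top_features(rows, labels, num_top_features):
    """Return (selected feature indices in ascending order, per-feature scores)."""
    num_features = len(rows[0])
    scores = [
        chi_squared([row[i] for row in rows], labels)
        for i in range(num_features)
    ]
    # sorted() is stable, so tied features keep their original index order
    # (an assumption of this sketch about how ties are resolved).
    ranked = sorted(range(num_features), key=lambda i: -scores[i])
    # Slicing past the end simply returns every feature.
    return sorted(ranked[:num_top_features]), scores


# Rows and labels from the example above.
rows = [
    [0.0, 0.0, 18.0, 1.0],
    [0.0, 1.0, 12.0, 0.0],
    [1.0, 0.0, 15.0, 0.1],
]
labels = [1.0, 0.0, 0.0]

selected, scores = select_top_features(rows, labels, num_top_features=1)
selected_features = [[row[i] for i in selected] for row in rows]

print([float(s) for s in scores])  # [0.75, 0.75, 3.0, 3.0]
print(selected)                    # [2]
print(selected_features)           # [[18.0], [12.0], [15.0]]
```

Under these assumptions the third and fourth features tie at 3.0, and the third one (index 2) is kept, which matches the selected_features column shown in the output. Exact fractions are used instead of floats so that the tie is not disturbed by rounding.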