Chi-Squared Selector
Selects categorical features to use for predicting a categorical label.
This operation is ported from Spark ML.
For a comprehensive introduction, see the Spark documentation.
For Scala API details, see the org.apache.spark.ml.feature.ChiSqSelector documentation.
Since: Seahorse 1.1.0
Input
Port | Type Qualifier | Description |
0 | DataFrame | The input DataFrame. |
Output
Port | Type Qualifier | Description |
0 | DataFrame | The output DataFrame. |
1 | Transformer | A Transformer that applies this operation to other DataFrames using a Transform. |
Parameters
Name | Type | Description |
num top features | Numeric | Number of features to select, taken in descending order of the test statistic. If the input has fewer features than this number, all features are selected. |
features column | SingleColumnSelector | The features column for model fitting. |
output column | String | The output column name. |
label column | SingleColumnSelector | The label column for model fitting. |
Example
Parameters
Name | Value |
num top features | 1.0 |
features column | "features" |
output column | "selected_features" |
label column | "label" |
Input
features | label |
[0.0,0.0,18.0,1.0] | 1.0 |
[0.0,1.0,12.0,0.0] | 0.0 |
[1.0,0.0,15.0,0.1] | 0.0 |
Output
features | label | selected_features |
[0.0,0.0,18.0,1.0] | 1.0 | [18.0] |
[0.0,1.0,12.0,0.0] | 0.0 | [12.0] |
[1.0,0.0,15.0,0.1] | 0.0 | [15.0] |
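
To illustrate why the third feature is selected in the example above, the following Python sketch ranks the features by a Pearson chi-squared statistic and keeps the top ones. It is an illustration under stated assumptions, not the Spark ML implementation: the function names `chi_squared_statistic` and `select_top_features` are hypothetical, each distinct feature value is treated as its own category, and ties are assumed to be broken in favor of the lower feature index.

```python
from collections import Counter
from fractions import Fraction


def chi_squared_statistic(values, labels):
    """Pearson chi-squared statistic of one feature column against the labels.

    Assumption: every distinct feature value is treated as a category.
    Fractions keep the arithmetic exact, so tied statistics compare equal.
    """
    n = len(values)
    observed = Counter(zip(values, labels))  # joint (value, label) counts
    value_totals = Counter(values)
    label_totals = Counter(labels)
    statistic = Fraction(0)
    for value, value_count in value_totals.items():
        for label, label_count in label_totals.items():
            expected = Fraction(value_count * label_count, n)
            statistic += (observed[(value, label)] - expected) ** 2 / expected
    return statistic


def select_top_features(rows, labels, num_top_features):
    """Return (statistics, selected indices, reduced rows)."""
    num_features = len(rows[0])
    stats = [
        chi_squared_statistic([row[i] for row in rows], labels)
        for i in range(num_features)
    ]
    # Rank by statistic, descending. sorted() is stable, so on a tie the
    # lower feature index comes first (an assumption of this sketch).
    ranked = sorted(range(num_features), key=lambda i: -stats[i])
    # If fewer features exist than requested, all of them are kept.
    k = min(num_top_features, num_features)
    selected = sorted(ranked[:k])
    return stats, selected, [[row[i] for i in selected] for row in rows]


# Data from the example above.
rows = [
    [0.0, 0.0, 18.0, 1.0],
    [0.0, 1.0, 12.0, 0.0],
    [1.0, 0.0, 15.0, 0.1],
]
labels = [1.0, 0.0, 0.0]

stats, selected, reduced = select_top_features(rows, labels, 1)
print([float(s) for s in stats])  # per-feature statistics
print(selected)                   # indices of the kept features
print(reduced)                    # rows restricted to the kept features
```

Under these assumptions the features at indices 2 and 3 tie with a statistic of 3, ahead of indices 0 and 1 at 0.75, and the lower index wins, which gives [18.0], [12.0], [15.0] as in the output table. Requesting more features than exist (for example 10) returns all four columns unchanged.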