Quantile Discretizer
Takes a column with continuous features and outputs a column with binned categorical features.
This operation is ported from Spark ML.
For a comprehensive introduction, see the Spark documentation.
For Scaladoc details, see the org.apache.spark.ml.feature.QuantileDiscretizer documentation.
Since: Seahorse 1.1.0
Input
Port | Type Qualifier | Description |
0 | DataFrame | The input DataFrame. |
Output
Port | Type Qualifier | Description |
0 | DataFrame | The output DataFrame. |
1 | Transformer | A Transformer that can apply the operation to other DataFrames using a Transform. |
Parameters
Name | Type | Description |
input column | SingleColumnSelector | The input column name. |
output | SingleChoice | Output generation mode. Possible values: ["replace input column", "append new column"] |
num buckets | Numeric | Maximum number of buckets (quantiles or categories) into which the data points are grouped. Must be >= 2. |
Example
Parameters
Name | Value |
input column | "features" |
output | append new column |
output column | "discretized_features" |
num buckets | 3.0 |
Input
features |
1.0 |
2.0 |
3.0 |
4.0 |
5.0 |
6.0 |
Output
features | discretized_features |
1.0 | 0.0 |
2.0 | 1.0 |
3.0 | 1.0 |
4.0 | 2.0 |
5.0 | 2.0 |
6.0 | 2.0 |
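
The bucketing in the example above can be reproduced with a minimal pure-Python sketch. It is not the Spark ML implementation: it assumes exact, rank-based quantiles on the full data and lower-inclusive buckets, whereas Spark may compute its split points approximately, so results on larger inputs can differ. The helper names `quantile_splits` and `discretize` are hypothetical and exist only for this illustration.

```python
from bisect import bisect_right


def quantile_splits(values, num_buckets):
    """Return the inner split points for up to `num_buckets` quantile buckets.

    Assumption: exact rank-based quantiles (not Spark's approximate ones).
    """
    if num_buckets < 2:
        raise ValueError("num_buckets must be >= 2")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    splits = []
    for i in range(1, num_buckets):
        # 1-based rank of the i/num_buckets quantile, via integer ceiling division.
        rank = max(1, -(-i * n // num_buckets))
        splits.append(ordered[rank - 1])
    # Repeated split points would produce empty buckets, so keep distinct ones only.
    return sorted(set(splits))


def discretize(values, num_buckets):
    """Map each value to a bucket index; each bucket includes its lower split."""
    splits = quantile_splits(values, num_buckets)
    # bisect_right counts how many split points are <= the value.
    return [float(bisect_right(splits, v)) for v in values]


features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
discretized = discretize(features, 3)
for feature, bucket in zip(features, discretized):
    print(feature, bucket)
```

With `num buckets` set to 3, the sketch places its split points at 2.0 and 4.0, which yields the same `discretized_features` column as the Output table. Because duplicate split points are dropped, heavily repeated input values can result in fewer buckets than requested, which is consistent with `num buckets` being described as a maximum.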