Quantile Discretizer

Takes a column with continuous features and outputs a column with binned categorical features.

This operation is ported from Spark ML.

For a comprehensive introduction, see the Spark documentation.

For Scaladoc details, see the org.apache.spark.ml.feature.QuantileDiscretizer documentation.

Since: Seahorse 1.1.0

Input

Port Type Qualifier Description
0 DataFrame The input DataFrame.

Output

Port Type Qualifier Description
0 DataFrame The output DataFrame.
1 Transformer A Transformer that allows the operation to be applied to other DataFrames using a Transform.

Parameters

Name Type Description
input column SingleColumnSelector The input column name.
output SingleChoice Output generation mode. Possible values: ["replace input column", "append new column"]
num buckets Numeric Maximum number of buckets (quantiles or categories) into which the data points are grouped. Must be >= 2.

Example

Parameters

Name Value
input column "features"
output append new column
output column "discretized_features"
num buckets 3.0

Input

features
1.0
2.0
3.0
4.0
5.0
6.0

Output

features discretized_features
1.0 0.0
2.0 1.0
3.0 1.0
4.0 2.0
5.0 2.0
6.0 2.0
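
The bucket assignment in this example can be reproduced with a small standalone sketch. It is not the Spark ML implementation: it assumes split points are taken as nearest-rank quantiles of the input and that each bucket is a half-open interval [lower, upper). The function names `quantile_splits` and `discretize` are hypothetical and exist only for this illustration; the actual operation delegates to Spark, which may compute quantiles approximately, so results on larger datasets can differ.

```python
from bisect import bisect_right


def quantile_splits(values, num_buckets):
    """Return interior split points using nearest-rank quantiles (an assumption)."""
    if num_buckets < 2:
        raise ValueError("num_buckets must be >= 2")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    splits = set()
    for i in range(1, num_buckets):
        # Nearest-rank position ceil(i * n / num_buckets), in integer arithmetic.
        rank = -(-i * n // num_buckets)
        splits.add(ordered[max(rank, 1) - 1])
    # Duplicate split points collapse, so fewer buckets than requested may result.
    return sorted(splits)


def discretize(values, num_buckets):
    """Map each value to a bucket index, treating buckets as [lower, upper)."""
    splits = quantile_splits(values, num_buckets)
    return [float(bisect_right(splits, v)) for v in values]


features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
discretized_features = discretize(features, 3)
print(discretized_features)  # [0.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```

Under these assumptions the split points are 2.0 and 4.0, so the value 1.0 falls in bucket 0, the values 2.0 and 3.0 in bucket 1, and the values 4.0 through 6.0 in bucket 2, matching the output table above. Because identical split points are merged in this sketch, an input with many repeated values can yield fewer buckets than num buckets, which is why the parameter is described as a maximum.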