Binarize

Binarizes continuous features.

This operation is ported from Spark ML.

For a comprehensive introduction, see the Spark documentation.

For Scaladoc details, see the org.apache.spark.ml.feature.Binarizer documentation.

Since: Seahorse 1.0.0

Input

Port Type Qualifier Description
0 DataFrame The input DataFrame.

Output

Port Type Qualifier Description
0 DataFrame The output DataFrame.
1 Transformer A Transformer that can apply the operation to other DataFrames using a Transform.

Parameters

Name Type Description
threshold Numeric The threshold used to binarize continuous features. Feature values strictly greater than the threshold are binarized to 1.0. All remaining values, including those equal to the threshold, are binarized to 0.0.
operate on InputOutputColumnSelector The input and output columns for the operation.
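
The threshold rule described above can be sketched in a few lines of plain Python. This is only an illustration of the comparison, not the Seahorse or Spark implementation; the function name binarize is hypothetical.

```python
def binarize(values, threshold):
    """Map each value to 1.0 if it is strictly greater than threshold, else 0.0.

    Hypothetical helper for illustration; not part of Seahorse or Spark.
    """
    return [1.0 if v > threshold else 0.0 for v in values]


# A value equal to the threshold belongs to the "remaining values" group.
result = binarize([0.47, 0.5, 0.64], threshold=0.5)
print(result)  # [0.0, 0.0, 1.0]
```

Note that the comparison is strict, so the middle value 0.5 maps to 0.0.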

Example

Parameters

Name Value
threshold 0.5
operate on one column
input column "hum"
output append new column
output column "hum_bin"

Input

datetime windspeed hum temp
2011-01-03 21:00:00.0 0.1045 0.47 0.2
2011-01-03 22:00:00.0 0.1343 0.64 0.18
2011-01-03 23:00:00.0 0.1343 0.69 0.14
2011-02-11 07:00:00.0 0.0 0.68 0.1
2011-02-13 18:00:00.0 0.3284 0.28 0.42
2011-02-18 12:00:00.0 0.1642 0.72 0.44
2011-02-19 03:00:00.0 0.3881 0.13 0.44
2011-02-19 04:00:00.0 0.2985 0.14 0.42
2013-01-01 00:00:00.0 0.1343 0.65 0.26

Output

datetime windspeed hum temp hum_bin
2011-01-03 21:00:00.0 0.1045 0.47 0.2 0.0
2011-01-03 22:00:00.0 0.1343 0.64 0.18 1.0
2011-01-03 23:00:00.0 0.1343 0.69 0.14 1.0
2011-02-11 07:00:00.0 0.0 0.68 0.1 1.0
2011-02-13 18:00:00.0 0.3284 0.28 0.42 0.0
2011-02-18 12:00:00.0 0.1642 0.72 0.44 1.0
2011-02-19 03:00:00.0 0.3881 0.13 0.44 0.0
2011-02-19 04:00:00.0 0.2985 0.14 0.42 0.0
2013-01-01 00:00:00.0 0.1343 0.65 0.26 1.0
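
The example above can be approximated in plain Python to show what "append new column" does and how a reusable transformer (output port 1) might be thought of. This is a minimal sketch under stated assumptions: rows are modeled as dictionaries, and make_binarizer is a hypothetical helper, not a Seahorse or Spark API. The data is the first three rows of the example input.

```python
def make_binarizer(threshold, input_col, output_col):
    """Return a function that appends a binarized column to each row.

    Hypothetical stand-in for the Transformer output; not a real Seahorse API.
    """
    def transform(rows):
        # Copy each row and add output_col; the input rows are left unchanged.
        return [
            {**row, output_col: 1.0 if row[input_col] > threshold else 0.0}
            for row in rows
        ]
    return transform


# First three rows of the example input table.
rows = [
    {"datetime": "2011-01-03 21:00:00.0", "windspeed": 0.1045, "hum": 0.47, "temp": 0.2},
    {"datetime": "2011-01-03 22:00:00.0", "windspeed": 0.1343, "hum": 0.64, "temp": 0.18},
    {"datetime": "2011-01-03 23:00:00.0", "windspeed": 0.1343, "hum": 0.69, "temp": 0.14},
]

binarizer = make_binarizer(threshold=0.5, input_col="hum", output_col="hum_bin")
output = binarizer(rows)
print([row["hum_bin"] for row in output])  # [0.0, 1.0, 1.0]
```

Because the returned function holds the parameters, it can be applied to any other table that has a "hum" column, which mirrors the role of the Transformer output.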