Binarize
Binarizes continuous features.
This operation is ported from Spark ML.
For a comprehensive introduction, see the Spark documentation.
For Scaladoc details, see the org.apache.spark.ml.feature.Binarizer documentation.
Since: Seahorse 1.0.0
Input
Port | Type Qualifier | Description |
0 | DataFrame | The input DataFrame. |
Output
Port | Type Qualifier | Description |
0 | DataFrame | The output DataFrame. |
1 | Transformer | A Transformer that can apply the operation to other DataFrames using a Transform. |
Parameters
Name | Type | Description |
threshold | Numeric | The threshold used to binarize continuous features. Feature values strictly greater than the threshold are binarized to 1.0. All remaining values, including values equal to the threshold, are binarized to 0.0. |
operate on | InputOutputColumnSelector | The input and output columns for the operation. |
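The threshold rule described above can be sketched in a few lines of plain Python. This is only an illustration of the comparison, not the Seahorse or Spark implementation; the function name `binarize` and the sample values are assumptions made for this example.

```python
def binarize(value: float, threshold: float) -> float:
    """Return 1.0 if value is strictly greater than threshold, else 0.0."""
    return 1.0 if value > threshold else 0.0


# Hypothetical sample values: one below, one equal to, and one above the threshold.
samples = [0.47, 0.5, 0.64]
binarized = [binarize(v, threshold=0.5) for v in samples]
print(binarized)  # [0.0, 0.0, 1.0]
```

Note that a value exactly equal to the threshold falls into the "remaining values" case and maps to 0.0.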
Example
Parameters
Name | Value |
threshold | 0.5 |
operate on | one column |
input column | "hum" |
output | append new column |
output column | "hum_bin" |
Input
datetime | windspeed | hum | temp |
2011-01-03 21:00:00.0 | 0.1045 | 0.47 | 0.2 |
2011-01-03 22:00:00.0 | 0.1343 | 0.64 | 0.18 |
2011-01-03 23:00:00.0 | 0.1343 | 0.69 | 0.14 |
2011-02-11 07:00:00.0 | 0.0 | 0.68 | 0.1 |
2011-02-13 18:00:00.0 | 0.3284 | 0.28 | 0.42 |
2011-02-18 12:00:00.0 | 0.1642 | 0.72 | 0.44 |
2011-02-19 03:00:00.0 | 0.3881 | 0.13 | 0.44 |
2011-02-19 04:00:00.0 | 0.2985 | 0.14 | 0.42 |
2013-01-01 00:00:00.0 | 0.1343 | 0.65 | 0.26 |
Output
datetime | windspeed | hum | temp | hum_bin |
2011-01-03 21:00:00.0 | 0.1045 | 0.47 | 0.2 | 0.0 |
2011-01-03 22:00:00.0 | 0.1343 | 0.64 | 0.18 | 1.0 |
2011-01-03 23:00:00.0 | 0.1343 | 0.69 | 0.14 | 1.0 |
2011-02-11 07:00:00.0 | 0.0 | 0.68 | 0.1 | 1.0 |
2011-02-13 18:00:00.0 | 0.3284 | 0.28 | 0.42 | 0.0 |
2011-02-18 12:00:00.0 | 0.1642 | 0.72 | 0.44 | 1.0 |
2011-02-19 03:00:00.0 | 0.3881 | 0.13 | 0.44 | 0.0 |
2011-02-19 04:00:00.0 | 0.2985 | 0.14 | 0.42 | 0.0 |
2013-01-01 00:00:00.0 | 0.1343 | 0.65 | 0.26 | 1.0 |
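The example above, with "one column" selected and the result appended as a new column, can be approximated outside Seahorse with plain Python dictionaries standing in for DataFrame rows. This is a rough sketch under that assumption: the helper `append_binarized_column` is hypothetical and not part of any Seahorse or Spark API, and only three of the example rows are reproduced.

```python
from typing import Any, Dict, List


def append_binarized_column(
    rows: List[Dict[str, Any]],
    input_column: str,
    output_column: str,
    threshold: float,
) -> List[Dict[str, Any]]:
    """Return copies of the rows with output_column appended; inputs stay unchanged."""
    result = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the input row is not modified
        new_row[output_column] = 1.0 if row[input_column] > threshold else 0.0
        result.append(new_row)
    return result


# Three rows taken from the example input table.
rows = [
    {"datetime": "2011-01-03 21:00:00.0", "windspeed": 0.1045, "hum": 0.47, "temp": 0.2},
    {"datetime": "2011-01-03 22:00:00.0", "windspeed": 0.1343, "hum": 0.64, "temp": 0.18},
    {"datetime": "2011-02-13 18:00:00.0", "windspeed": 0.3284, "hum": 0.28, "temp": 0.42},
]

output_rows = append_binarized_column(rows, "hum", "hum_bin", threshold=0.5)
hum_bin = [row["hum_bin"] for row in output_rows]
print(hum_bin)  # [0.0, 1.0, 0.0]
```

The existing columns are carried over untouched and the new column is added last, which mirrors the "append new column" output mode used in the example.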