One Hot Encoder

Maps a column of category indices to a column of binary vectors.

This operation is ported from Spark ML.

For a comprehensive introduction, see the Spark documentation.

For Scaladoc details, see the org.apache.spark.ml.feature.OneHotEncoder documentation.

Since: Seahorse 1.0.0

Input

Port Type Qualifier Description
0 DataFrame The input DataFrame.

Output

Port Type Qualifier Description
0 DataFrame The output DataFrame.
1 Transformer A Transformer that can apply the operation to other DataFrames using a Transform.

Parameters

Name Type Description
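drop last Boolean Whether to drop the last category in the encoded vector. When true, a column with n categories is encoded as vectors of length n - 1, and the last category maps to the all-zeros vector.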
operate on InputOutputColumnSelector The input and output columns for the operation.
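
The effect of the drop last parameter can be illustrated with a minimal pure-Python sketch. This is not the Spark ML implementation; the function name one_hot_dense and its arguments are hypothetical, and it returns dense lists rather than Spark vectors.

```python
def one_hot_dense(index, num_categories, drop_last=True):
    """Encode a category index as a dense one-hot list (illustrative only)."""
    if not 0 <= index < num_categories:
        raise ValueError("index must be in [0, num_categories)")
    # With drop_last, the vector has one fewer slot than there are categories.
    size = num_categories - 1 if drop_last else num_categories
    vector = [0.0] * size
    # The last category has no slot when drop_last is set, so it stays all zeros.
    if index < size:
        vector[index] = 1.0
    return vector


print(one_hot_dense(1, 3, drop_last=False))  # [0.0, 1.0, 0.0]
print(one_hot_dense(1, 3, drop_last=True))   # [0.0, 1.0]
print(one_hot_dense(2, 3, drop_last=True))   # [0.0, 0.0]
```

Dropping the last category keeps the encoded columns linearly independent, since the dropped category is still identifiable as the all-zeros vector.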

Example

Parameters

Name Value
drop last true
operate on one column
input column "labels"
output append new column
output column "encoded"

Input

features labels
a 0.0
a 0.0
b 1.0
c 2.0
a 0.0
b 1.0
a 0.0
a 0.0
c 2.0

Output

features labels encoded
a 0.0 (2,[0],[1.0])
a 0.0 (2,[0],[1.0])
b 1.0 (2,[1],[1.0])
c 2.0 (2,[],[])
a 0.0 (2,[0],[1.0])
b 1.0 (2,[1],[1.0])
a 0.0 (2,[0],[1.0])
a 0.0 (2,[0],[1.0])
c 2.0 (2,[],[])
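
The encoded column above uses the sparse vector notation (size,[indices],[values]). The following pure-Python sketch reproduces those values for the example labels. It is an illustration only, not the Spark ML code path: the helpers one_hot_sparse and format_sparse are hypothetical, and the number of categories is assumed to be the largest index plus one.

```python
def one_hot_sparse(index, num_categories, drop_last=True):
    """Return (size, indices, values) for a one-hot encoded category index."""
    size = num_categories - 1 if drop_last else num_categories
    if index < size:
        return (size, [index], [1.0])
    # The dropped last category is encoded as an empty (all-zeros) vector.
    return (size, [], [])


def format_sparse(vector):
    """Render a (size, indices, values) triple as (size,[indices],[values])."""
    size, indices, values = vector
    index_text = ",".join(str(i) for i in indices)
    value_text = ",".join(str(v) for v in values)
    return f"({size},[{index_text}],[{value_text}])"


# The "labels" column from the example input.
labels = [0.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 2.0]

# Assumption: category count is inferred as the largest index plus one.
num_categories = int(max(labels)) + 1

encoded = [format_sparse(one_hot_sparse(int(label), num_categories))
           for label in labels]

for label, vector_text in zip(labels, encoded):
    print(label, vector_text)
```

With drop last set to true and three categories, every vector has size 2, and the rows labeled 2.0 come out as (2,[],[]), matching the output table.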