Standard Scaler

Standardizes features by removing the mean and scaling to unit variance using column summary statistics on the samples in the training set.

This operation is ported from Spark ML.

For a comprehensive introduction, see the Spark documentation.

For Scaladoc details, see the org.apache.spark.ml.feature.StandardScaler documentation.

Since: Seahorse 1.0.0

Input

Port Type Qualifier Description
0 DataFrame The input DataFrame.

Output

Port Type Qualifier Description
0 DataFrame The output DataFrame.
1 Transformer A Transformer that can apply the operation to other DataFrames using a Transform.

Parameters

Name Type Description
input column SingleColumnSelector The input column name.
output SingleChoice Output generation mode. Possible values: ["replace input column", "append new column"]
with mean Boolean Centers the data by subtracting the column mean before scaling.
with std Boolean Scales the data to unit standard deviation by dividing by the column standard deviation.
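
The effect of the two Boolean parameters can be written as a single formula. This is a sketch rather than a statement of the implementation: the symbols n (number of rows), x_ij (component j of the vector in row i), a and d_j are introduced here for illustration only, and the n - 1 divisor is the corrected sample standard deviation, which is what the values in the Example section are consistent with.

```latex
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},
\qquad
\sigma_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\mu_j\right)^2}

x'_{ij} = \frac{x_{ij} - a\,\mu_j}{d_j},
\qquad
a = \begin{cases} 1 & \text{if with mean is true} \\ 0 & \text{otherwise} \end{cases},
\qquad
d_j = \begin{cases} \sigma_j & \text{if with std is true} \\ 1 & \text{otherwise} \end{cases}
```

With with mean set to false and with std set to true, each component is only divided by its column's standard deviation, so zeros in the input stay zero.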

Example

Parameters

Name Value
input column "features"
output append new column
output column "scaled"
with mean false
with std true

Input

features
[-2.0,2.3,0.0]
[0.0,-5.1,1.0]
[1.7,-0.6,3.3]

Output

features scaled
[-2.0,2.3,0.0] [-1.0798984943120777,0.6168340914150375,0.0]
[0.0,-5.1,1.0] [0.0,-1.3677625505289963,0.5909681092664519]
[1.7,-0.6,3.3] [0.9179137201652661,-0.16091324123870546,1.9501947605792913]
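
The numbers above can be checked by hand. The following is a minimal plain-Python sketch, not the Spark or Seahorse implementation; the function name standard_scale and its arguments are hypothetical. It assumes the sample standard deviation (divisor n - 1), which is what statistics.stdev computes, and it assumes that a zero-variance column is mapped to 0.0.

```python
from statistics import mean, stdev


def standard_scale(rows, with_mean=False, with_std=True):
    """Scale each column of `rows`, a list of equal-length lists of floats.

    Hypothetical helper for illustration; needs at least two rows because
    the sample standard deviation is undefined for a single value.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    stds = [stdev(col) for col in columns]

    scaled = []
    for row in rows:
        out = []
        for x, m, s in zip(row, means, stds):
            if with_mean:
                x = x - m
            if with_std:
                # Assumption: a constant column (s == 0) is mapped to 0.0.
                x = x / s if s != 0 else 0.0
            out.append(x)
        scaled.append(out)
    return scaled


features = [
    [-2.0, 2.3, 0.0],
    [0.0, -5.1, 1.0],
    [1.7, -0.6, 3.3],
]

# Same settings as the example: with mean = false, with std = true.
scaled = standard_scale(features, with_mean=False, with_std=True)
for original, result in zip(features, scaled):
    print(original, result)
```

For the first column the mean is -0.1 and the sample standard deviation is sqrt(3.43), roughly 1.852, so -2.0 becomes roughly -1.0799, in agreement with the first row of the output table.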