Vector Indexer

Vector Indexer indexes categorical features in a Vector column. It decides which features are categorical based on the number of distinct values each feature takes (see the max categories parameter), and converts the values of those features to category indices.

This operation is ported from Spark ML.

For a comprehensive introduction, see the Spark documentation.

For Scaladoc details, see the org.apache.spark.ml.feature.VectorIndexer documentation.

Since: Seahorse 1.0.0

Input

| Port | Type Qualifier | Description |
|---|---|---|
| 0 | DataFrame | The input DataFrame. |

Output

| Port | Type Qualifier | Description |
|---|---|---|
| 0 | DataFrame | The output DataFrame. |
| 1 | Transformer | A Transformer that can apply the operation to other DataFrames using a Transform operation. |

Parameters

| Name | Type | Description |
|---|---|---|
| input column | SingleColumnSelector | The input column name. |
| output | SingleChoice | Output generation mode. Possible values: ["replace input column", "append new column"] |
| max categories | Numeric | The maximum number of distinct values a categorical feature can take. A feature with more distinct values than this threshold is treated as continuous. |
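
The effect of max categories can be sketched in plain Python. This is an illustration only, not the Seahorse or Spark implementation; the helper name `categorical_features` and the sample data are hypothetical.

```python
def categorical_features(rows, max_categories):
    """Return indices of features with at most max_categories distinct values."""
    num_features = len(rows[0])
    result = []
    for j in range(num_features):
        # Collect the distinct values seen for feature j across all rows.
        distinct = {row[j] for row in rows}
        if len(distinct) <= max_categories:
            result.append(j)  # few distinct values: treated as categorical
        # otherwise the feature is treated as continuous and left out
    return result


rows = [
    [1.0, 0.5, 0.0],
    [0.0, 1.5, 1.0],
    [1.0, 2.5, 0.0],
    [0.0, 3.5, 1.0],
]
print(categorical_features(rows, max_categories=2))  # [0, 2]
```

Here feature 1 takes four distinct values, which exceeds the threshold of 2, so it is treated as continuous.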

Example

Parameters

| Name | Value |
|---|---|
| input column | "vectors" |
| output | append new column |
| output column | "indexed" |
| max categories | 3.0 |

Input

vectors
[1.0,1.0,0.0,1.0]
[0.0,1.0,1.0,1.0]
[-1.0,1.0,2.0,0.0]

Output

| vectors | indexed |
|---|---|
| [1.0,1.0,0.0,1.0] | [2.0,0.0,0.0,1.0] |
| [0.0,1.0,1.0,1.0] | [0.0,0.0,1.0,1.0] |
| [-1.0,1.0,2.0,0.0] | [1.0,0.0,2.0,0.0] |
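
The output above can be reproduced with a minimal plain-Python sketch. It is not the Seahorse or Spark implementation, and the function names `fit_category_maps` and `index_vector` are hypothetical. It assumes that the distinct values of a categorical feature are indexed in ascending order, except that 0.0, when present, always receives index 0. This assumption is consistent with the example (in the first feature, 0.0, -1.0 and 1.0 map to 0.0, 1.0 and 2.0).

```python
def fit_category_maps(rows, max_categories):
    """Build a value -> index map for every feature judged categorical."""
    maps = {}
    for j in range(len(rows[0])):
        distinct = sorted({row[j] for row in rows})
        if len(distinct) > max_categories:
            continue  # too many distinct values: feature stays continuous
        # Assumption: 0.0 is placed first so that it maps to index 0.
        if 0.0 in distinct:
            distinct = [0.0] + [v for v in distinct if v != 0.0]
        maps[j] = {value: float(i) for i, value in enumerate(distinct)}
    return maps


def index_vector(vector, maps):
    """Replace categorical feature values by their indices; keep the rest."""
    return [maps[j][v] if j in maps else v for j, v in enumerate(vector)]


vectors = [
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [-1.0, 1.0, 2.0, 0.0],
]
maps = fit_category_maps(vectors, max_categories=3)
indexed = [index_vector(v, maps) for v in vectors]
for original, result in zip(vectors, indexed):
    print(original, result)
```

With max categories set to 3, all four features have at most three distinct values, so every feature is indexed. The second feature only ever takes the value 1.0, which therefore maps to index 0.0 in every row.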