Convert To n-grams

Converts arrays of strings into arrays of n-grams. Null values in the input arrays are ignored. Each n-gram is represented as a space-separated string of words. When the input array is empty, an empty array is returned. When the input array has fewer than n elements (n being the number of elements per n-gram), no n-grams are returned.
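The rules above (drop nulls, slide a window of n words, join each window with a space, return nothing when there are too few words) can be illustrated with a minimal pure-Python sketch. It mirrors only the documented behavior and is not Spark's implementation; the function name `to_ngrams` is hypothetical.

```python
from typing import List, Optional


def to_ngrams(tokens: List[Optional[str]], n: int) -> List[str]:
    """Hypothetical sketch of the documented n-gram semantics."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Null (None) entries are ignored before the n-grams are built.
    words = [t for t in tokens if t is not None]
    # Every contiguous window of n words becomes one space-separated n-gram.
    # If fewer than n words remain (including empty input), range() is empty
    # and an empty list is returned.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(to_ngrams(["a", None, "b", "c"], 2))  # ['a b', 'b c']
print(to_ngrams(["a", "b"], 3))             # []
print(to_ngrams([], 2))                     # []
```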

This operation is ported from Spark ML.

For a comprehensive introduction, see the Spark documentation.

For Scala API details, see the org.apache.spark.ml.feature.NGram documentation.

Since: Seahorse 1.0.0

Input

Port Type Qualifier Description
0 DataFrame The input DataFrame.

Output

Port Type Qualifier Description
0 DataFrame The output DataFrame.
1 Transformer A Transformer that can apply the same operation to other DataFrames using a Transform.
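The Transformer on port 1 keeps the parameters chosen for this operation, so the same conversion can be re-applied to other DataFrames. The following is a minimal sketch of that idea. It assumes DataFrames are modelled as lists of Python dicts and uses a hypothetical class, `NGramTransformerSketch`, which is not Seahorse's API.

```python
from typing import Dict, List


class NGramTransformerSketch:
    """Hypothetical stand-in for the Transformer produced on port 1.

    It stores n and the column names, then re-applies the conversion to
    other "DataFrames" (here: lists of row dicts).
    """

    def __init__(self, n: int, input_column: str, output_column: str) -> None:
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n
        self.input_column = input_column
        self.output_column = output_column

    def transform(self, rows: List[Dict]) -> List[Dict]:
        result = []
        for row in rows:
            # Ignore nulls, then build space-separated n-grams.
            words = [w for w in row[self.input_column] if w is not None]
            grams = [" ".join(words[i:i + self.n])
                     for i in range(len(words) - self.n + 1)]
            # Append the n-grams as a new column; the input row is not modified.
            result.append({**row, self.output_column: grams})
        return result


t = NGramTransformerSketch(n=2, input_column="words", output_column="output")
print(t.transform([{"label": 7, "words": ["new", "data", "here"]}]))
```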

Parameters

Name Type Description
n Numeric The number of elements (words) per n-gram.
operate on InputOutputColumnSelector The input and output columns for the operation.
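Because every n-gram has exactly n words, the number of n-grams in each output array follows directly from the array length. Assuming n ≥ 1 (as Spark's NGram requires) and writing L for the number of non-null elements in an input array:

```latex
\[
  \text{number of n-grams} = \max(0,\; L - n + 1)
\]
% Example: L = 7, n = 3 gives 7 - 3 + 1 = 5 n-grams;
%          L = 2, n = 3 gives max(0, 0) = 0 n-grams.
```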

Example

Parameters

Name Value
n 3.0
operate on one column
input column "words"
output append new column
output column "output"

Input

label words
0 [Hi,I,heard,about,Spark]
1 [I,wish,Java,could,use,case,classes]
2 [Logistic,regression,models,are,neat]

Output

label words output
0 [Hi,I,heard,about,Spark] [Hi I heard,I heard about,heard about Spark]
1 [I,wish,Java,could,use,case,classes] [I wish Java,wish Java could,Java could use,could use case,use case classes]
2 [Logistic,regression,models,are,neat] [Logistic regression models,regression models are,models are neat]
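The example above can be reproduced with a short pure-Python sketch that applies the same documented rules to each row with n = 3. The helper `to_ngrams` is hypothetical and only mimics the operation's described behavior; it does not call Spark or Seahorse.

```python
from typing import List, Optional


def to_ngrams(tokens: List[Optional[str]], n: int) -> List[str]:
    # Drop nulls, then join every contiguous window of n words with a space.
    words = [t for t in tokens if t is not None]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Input rows from the example: (label, words).
rows = [
    (0, ["Hi", "I", "heard", "about", "Spark"]),
    (1, ["I", "wish", "Java", "could", "use", "case", "classes"]),
    (2, ["Logistic", "regression", "models", "are", "neat"]),
]

# Example parameters: n = 3, result appended as the new column "output".
output = {label: to_ngrams(words, 3) for label, words in rows}
for label, words in rows:
    print(label, words, output[label])
```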