Convert To n-grams
Converts arrays of strings to arrays of n-grams. Null values in the input arrays are ignored. Each n-gram is represented by a space-separated string of words. When the input is empty, an empty array is returned. When the input array is shorter than n (number of elements per n-gram), no n-grams are returned.
This operation is ported from Spark ML.
For a comprehensive introduction, see the Spark documentation.
For Scaladoc details, see the org.apache.spark.ml.feature.NGram documentation.
Since: Seahorse 1.0.0
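The behaviour described above can be sketched in plain Python. This is a minimal illustration, not the Spark ML or Seahorse implementation: the function name `to_ngrams` is hypothetical, and it assumes that null values are dropped before the sliding window is applied.

```python
from typing import List, Optional, Sequence


def to_ngrams(words: Sequence[Optional[str]], n: int) -> List[str]:
    """Return space-separated n-grams built from consecutive words."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: nulls (None) are dropped before the sliding window is applied.
    tokens = [w for w in words if w is not None]
    # When len(tokens) < n the range is empty, so no n-grams are produced.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(to_ngrams(["Hi", "I", "heard", "about", "Spark"], 3))
# ['Hi I heard', 'I heard about', 'heard about Spark']
print(to_ngrams(["Hi", "I"], 3))  # [] because the input is shorter than n
print(to_ngrams([], 3))           # [] because the input is empty
```

An array of k non-null words yields k - n + 1 n-grams when k >= n, and none otherwise.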
Input
Port | Type Qualifier | Description |
0 | DataFrame | The input DataFrame. |
Output
Port | Type Qualifier | Description |
0 | DataFrame | The output DataFrame. |
1 | Transformer | A Transformer that applies the operation to other DataFrames using a Transform. |
Parameters
Example
Parameters
Name | Value |
n | 3.0 |
operate on | one column |
input column | "words" |
output | append new column |
output column | "output" |
Input
label | words |
0 | [Hi,I,heard,about,Spark] |
1 | [I,wish,Java,could,use,case,classes] |
2 | [Logistic,regression,models,are,neat] |
Output
label | words | output |
0 | [Hi,I,heard,about,Spark] | [Hi I heard,I heard about,heard about Spark] |
1 | [I,wish,Java,could,use,case,classes] | [I wish Java,wish Java could,Java could use,could use case,use case classes] |
2 | [Logistic,regression,models,are,neat] | [Logistic regression models,regression models are,models are neat] |
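The example can be reproduced with a small plain-Python model of the operation. This is only a sketch under stated assumptions: `NGramTransformer` is a hypothetical stand-in for the Transformer returned on output port 1, rows are modelled as dictionaries rather than a DataFrame, and n is given as the integer 3 instead of 3.0.

```python
from typing import Dict, List


class NGramTransformer:
    """Hypothetical stand-in for the Transformer returned on output port 1."""

    def __init__(self, n: int, input_column: str, output_column: str) -> None:
        self.n = n
        self.input_column = input_column
        self.output_column = output_column

    def transform(self, rows: List[Dict]) -> List[Dict]:
        # Append the output column to a copy of each row; input rows are not mutated.
        result = []
        for row in rows:
            # Assumption: nulls (None) are dropped before building n-grams.
            words = [w for w in row[self.input_column] if w is not None]
            grams = [" ".join(words[i:i + self.n])
                     for i in range(len(words) - self.n + 1)]
            result.append({**row, self.output_column: grams})
        return result


rows = [
    {"label": 0, "words": ["Hi", "I", "heard", "about", "Spark"]},
    {"label": 1, "words": ["I", "wish", "Java", "could", "use", "case", "classes"]},
    {"label": 2, "words": ["Logistic", "regression", "models", "are", "neat"]},
]

transformer = NGramTransformer(n=3, input_column="words", output_column="output")
for row in transformer.transform(rows):
    print(row["label"], row["output"])
```

Because the configuration lives in the transformer object, the same instance can be applied to other row sets that have a "words" column, which mirrors the purpose of the Transformer output.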