Python Transformation

Executes a Python function provided by the user on a DataFrame connected to its input port. Returns the results of the execution as a DataFrame.

Also returns a Transformer that can be later applied to another DataFrame with a Transform operation.

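The Python function that will be executed must be named transform, accept the input DataFrame as its only argument, and return the result of the transformation.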

Variables and functions from the operation's global scope, such as the spark object used in the example below, are available to this code.

Example Python code:

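from pyspark.sql.types import Row

def transform(dataframe):
    # Double every value in numbers_column and build a new DataFrame from the mapped rows.
    # spark is taken from the operation's global scope.
    return spark.createDataFrame(dataframe.rdd.map(lambda row: Row(row.numbers_column * 2)))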
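
The same doubling can also be written with DataFrame column expressions instead of mapping over the underlying RDD. The following is only a minimal sketch, assuming the input has a numeric column named numbers_column as in the example above. It names the output column explicitly and does not rely on the spark variable.

```python
def transform(dataframe):
    # Assumption: the input DataFrame has a numeric column called numbers_column.
    doubled = (dataframe["numbers_column"] * 2).alias("numbers_column")
    # select() already returns a DataFrame, so spark.createDataFrame is not needed here.
    return dataframe.select(doubled)
```

Whether this form or the Row-based one is preferable depends on the transformation; column expressions are usually enough for simple arithmetic, while the RDD route allows arbitrary Python logic per row.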

Since: Seahorse 1.0.0

Input

Port  Type       Qualifier  Description
0     DataFrame             The DataFrame that will be passed to the transform function.

Output

Port  Type         Qualifier  Description
0     DataFrame               The return value of the transform function.
1     Transformer             A Transformer that applies the operation to other DataFrames using Transform.

Parameters

Name  Type          Description
code  Code Snippet  The Python code to be executed. It must contain a Python function that complies with the signature given in the operation's description.

Example

Parameters

Name Value
code
def transform(df):
    return df.filter(df.temp > 0.4).sort(df.windspeed, ascending=False)

Input

datetime               windspeed  hum   temp
2011-01-03 21:00:00.0  0.1045     0.47  0.2
2011-01-03 22:00:00.0  0.1343     0.64  0.18
2011-01-03 23:00:00.0  0.1343     0.69  0.14
2011-02-11 07:00:00.0  0.0        0.68  0.1
2011-02-13 18:00:00.0  0.3284     0.28  0.42
2011-02-18 12:00:00.0  0.1642     0.72  0.44
2011-02-19 03:00:00.0  0.3881     0.13  0.44
2011-02-19 04:00:00.0  0.2985     0.14  0.42
2013-01-01 00:00:00.0  0.1343     0.65  0.26

Output

datetime               windspeed  hum   temp
2011-02-19 03:00:00.0  0.3881     0.13  0.44
2011-02-13 18:00:00.0  0.3284     0.28  0.42
2011-02-19 04:00:00.0  0.2985     0.14  0.42
2011-02-18 12:00:00.0  0.1642     0.72  0.44
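
The output above can be traced by hand without a Spark cluster. The sketch below is plain Python, not the Seahorse or PySpark API: it models each input row as a tuple and mimics the filter and descending sort from the example code. The name transform_rows is hypothetical and exists only for this illustration.

```python
# Rows copied from the Input table: (datetime, windspeed, hum, temp).
rows = [
    ("2011-01-03 21:00:00.0", 0.1045, 0.47, 0.2),
    ("2011-01-03 22:00:00.0", 0.1343, 0.64, 0.18),
    ("2011-01-03 23:00:00.0", 0.1343, 0.69, 0.14),
    ("2011-02-11 07:00:00.0", 0.0, 0.68, 0.1),
    ("2011-02-13 18:00:00.0", 0.3284, 0.28, 0.42),
    ("2011-02-18 12:00:00.0", 0.1642, 0.72, 0.44),
    ("2011-02-19 03:00:00.0", 0.3881, 0.13, 0.44),
    ("2011-02-19 04:00:00.0", 0.2985, 0.14, 0.42),
    ("2013-01-01 00:00:00.0", 0.1343, 0.65, 0.26),
]


def transform_rows(rows):
    # Mimics df.filter(df.temp > 0.4): keep rows whose temp (index 3) exceeds 0.4.
    kept = [r for r in rows if r[3] > 0.4]
    # Mimics .sort(df.windspeed, ascending=False): order by windspeed (index 1), descending.
    return sorted(kept, key=lambda r: r[1], reverse=True)


result = transform_rows(rows)
for r in result:
    print(*r)
```

Four rows pass the temp filter, and ordering them by windspeed from highest to lowest gives the same sequence as the Output table.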