Executes a Python function provided by the user on a DataFrame connected to its input port.
Returns the result of the execution as a DataFrame. Also returns a Transformer that can later be applied to another DataFrame with a Transform operation.
The Python function that will be executed must:

- be named `transform`,
- take exactly one argument of type DataFrame,
- return a DataFrame (or data that can be automatically converted to a Spark DataFrame: a pandas.DataFrame, a single value, a tuple/list of single values, or a tuple/list of tuples/lists of single values).
The following variables and functions are available in the operation's global scope:

- `dataframe()` - a function that returns the input DataFrame for this operation. Every time the input DataFrame changes, `dataframe()` returns the updated DataFrame.
- `sc` - Spark Context
- `spark` - Spark Session
- `sqlContext` - SQL Context
For example:

```python
from pyspark.sql.types import Row

def transform(df):
    # `spark` is the Spark Session from the operation's global scope
    return spark.createDataFrame(df.rdd.map(lambda row: Row(numbers_column=row.numbers_column * 2)))
```
Since: Seahorse 1.0.0
Input

Port | Type Qualifier | Description |
---|---|---|
0 | DataFrame | The DataFrame that will be passed to the `transform` function. |
Output

Port | Type Qualifier | Description |
---|---|---|
0 | DataFrame | The return value of the `transform` function. |
1 | Transformer | A Transformer that applies the same operation to other DataFrames using a Transform operation. |
Parameters

Name | Type | Description |
---|---|---|
code | Code Snippet | The Python code to be executed. It must contain a Python function that complies with the requirements described in the operation's description. |
Example

Parameters

Name | Value |
---|---|
code | `def transform(df): return df.filter(df.temp > 0.4).sort(df.windspeed, ascending=False)` |
Input DataFrame

datetime | windspeed | hum | temp |
---|---|---|---|
2011-01-03 21:00:00.0 | 0.1045 | 0.47 | 0.2 |
2011-01-03 22:00:00.0 | 0.1343 | 0.64 | 0.18 |
2011-01-03 23:00:00.0 | 0.1343 | 0.69 | 0.14 |
2011-02-11 07:00:00.0 | 0.0 | 0.68 | 0.1 |
2011-02-13 18:00:00.0 | 0.3284 | 0.28 | 0.42 |
2011-02-18 12:00:00.0 | 0.1642 | 0.72 | 0.44 |
2011-02-19 03:00:00.0 | 0.3881 | 0.13 | 0.44 |
2011-02-19 04:00:00.0 | 0.2985 | 0.14 | 0.42 |
2013-01-01 00:00:00.0 | 0.1343 | 0.65 | 0.26 |
Output DataFrame

datetime | windspeed | hum | temp |
---|---|---|---|
2011-02-19 03:00:00.0 | 0.3881 | 0.13 | 0.44 |
2011-02-13 18:00:00.0 | 0.3284 | 0.28 | 0.42 |
2011-02-19 04:00:00.0 | 0.2985 | 0.14 | 0.42 |
2011-02-18 12:00:00.0 | 0.1642 | 0.72 | 0.44 |