Join
Joins two DataFrames by performing an SQL join operation.
Depending on the join type parameter, an inner, outer, left outer or right outer join is performed.
The operation creates a new DataFrame that consists of all the columns of the left DataFrame
and those columns of the right DataFrame that are not used in the join conditions.
Two rows match when all of the equality conditions defined by the join columns parameter
are satisfied, that is, when the values in every pair of join columns are equal.
The order of the columns is preserved.
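The matching rule and the four join types can be sketched in plain Python. This is only an illustration, not Seahorse's implementation: tables are lists of dicts, `rows_match` and `join` are hypothetical helpers, and padding unmatched rows with `None` follows standard SQL outer-join semantics, which is an assumption about how Seahorse fills them.

```python
def rows_match(left_row, right_row, pairs):
    """True when every (left column, right column) equality condition holds."""
    return all(left_row[l] == right_row[r] for l, r in pairs)


def join(left, right, pairs, join_type="Inner"):
    """Join two tables given as lists of dicts (hypothetical helper)."""
    left_cols = list(left[0]) if left else []
    right_join_cols = {r for _, r in pairs}
    # Right-side columns used in the join conditions are left out of the output.
    right_cols = [c for c in (right[0] if right else []) if c not in right_join_cols]

    out = []
    matched_right = set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rows_match(lrow, rrow, pairs)]
        matched_right.update(hits)
        for i in hits:
            out.append({**lrow, **{c: right[i][c] for c in right_cols}})
        # Assumption: unmatched left rows are padded with None (SQL NULL).
        if not hits and join_type in ("Outer", "Left outer"):
            out.append({**lrow, **{c: None for c in right_cols}})
    if join_type in ("Outer", "Right outer"):
        for i, rrow in enumerate(right):
            if i not in matched_right:
                # Assumption: unmatched right rows get None in all left columns.
                out.append({**{c: None for c in left_cols},
                            **{c: rrow[c] for c in right_cols}})
    return out
```

The sketch assumes that the non-join column names of the two tables do not collide; the prefix parameters exist to resolve such collisions.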
The operation joins two DataFrames by the column pairs given in the join columns parameter.
For each given pair, both columns must be of the same type.
If a column from a pair is not present in its DataFrame
(the left DataFrame or the right DataFrame, respectively),
a ColumnDoesNotExistException is thrown.
If the columns from one pair have different types,
a WrongColumnTypeException is thrown.
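The checks described above could look roughly like the following. The exception classes are Python stand-ins that only borrow the names used in this document, `validate_join_columns` is a hypothetical helper, and representing a schema as a dict from column name to type name is an assumption made for the sketch.

```python
class ColumnDoesNotExistException(Exception):
    """Stand-in for the exception of the same name mentioned in the text."""


class WrongColumnTypeException(Exception):
    """Stand-in for the exception of the same name mentioned in the text."""


def validate_join_columns(left_schema, right_schema, pairs):
    """Check join column pairs against two schemas (dicts: name -> type name)."""
    if not pairs:
        # Per the join columns parameter description, an empty join
        # condition also results in ColumnDoesNotExistException.
        raise ColumnDoesNotExistException("empty join condition")
    for left_col, right_col in pairs:
        if left_col not in left_schema:
            raise ColumnDoesNotExistException(
                f"left DataFrame has no column {left_col!r}")
        if right_col not in right_schema:
            raise ColumnDoesNotExistException(
                f"right DataFrame has no column {right_col!r}")
        if left_schema[left_col] != right_schema[right_col]:
            raise WrongColumnTypeException(
                f"{left_col!r} is {left_schema[left_col]}, "
                f"{right_col!r} is {right_schema[right_col]}")
```

Existence is checked before type so that a missing column is never reported as a type mismatch.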
If values of left prefix and/or right prefix are provided,
the columns in the output DataFrame are renamed by prepending
the prefix of the table they come from.
If the column names in the resulting DataFrame would not be unique,
a DuplicatedColumnsException is thrown.
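A minimal sketch of how the output column names might be derived, assuming column names are plain strings. `output_column_names` is a hypothetical helper and the exception class is a Python stand-in that only borrows the name used in this document.

```python
class DuplicatedColumnsException(Exception):
    """Stand-in for the exception of the same name mentioned in the text."""


def output_column_names(left_cols, right_cols, join_pairs,
                        left_prefix="", right_prefix=""):
    """Return the output column names, or raise if they are not unique."""
    right_join_cols = {r for _, r in join_pairs}
    # All left columns come first, then the right columns not used in the join.
    names = [left_prefix + c for c in left_cols]
    names += [right_prefix + c for c in right_cols if c not in right_join_cols]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise DuplicatedColumnsException(f"duplicated columns: {duplicates}")
    return names
```

With the prefixes `"left_"` and `"right_"`, left columns `city`, `price` and right columns `city`, `beds` joined on `city` give `left_city`, `left_price`, `right_beds`.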
Since: Seahorse 0.4.0
Input
Port | Type Qualifier | Description |
0 | DataFrame | The left-side DataFrame. |
1 | DataFrame | The right-side DataFrame. |
Output
Port | Type Qualifier | Description |
0 | DataFrame | The DataFrame containing all the columns of the left DataFrame and the columns of the right DataFrame not used in the join condition (in the join columns parameter). |
Parameters
Name | Type | Description |
join type | Single Choice | The type of the join to perform. Possible values are: Inner, Outer, Left outer, Right outer. Default value: Inner. |
join columns | Parameters Sequence | The sequence of column pairs (left column: SingleColumnSelector, right column: SingleColumnSelector) defining the condition for the JOIN operation. An empty join condition is not supported; in that case ColumnDoesNotExistException is thrown. When a column selected by name or by index does not exist, ColumnDoesNotExistException is thrown. When the types of the columns to JOIN upon in the two DataFrames do not match, WrongColumnTypeException is thrown. |
left prefix | String | An optional prefix prepended to the names of the output columns that come from the left input table. |
right prefix | String | An optional prefix prepended to the names of the output columns that come from the right input table. |
Example
Parameters
Name | Value |
left prefix | "left_" |
right prefix | "right_" |
join columns | Join on left.city == right.city |
Input
Left DataFrame:
city | price |
CityA | 695611.0 |
CityC | 294691.0 |
CityB | 430784.0 |
CityB | 336677.0 |
CityA | 584639.0 |
CityA | 579560.0 |

Right DataFrame:
city | beds |
CityA | 4.0 |
CityC | 2.0 |
CityB | 3.0 |
Output
left_city | left_price | right_beds |
CityB | 430784.0 | 3.0 |
CityB | 336677.0 | 3.0 |
CityC | 294691.0 | 2.0 |
CityA | 695611.0 | 4.0 |
CityA | 584639.0 | 4.0 |
CityA | 579560.0 | 4.0 |
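The example can be reproduced with a short plain-Python sketch. It is not Seahorse code: `inner_join_on_city` is a hypothetical helper hard-wired to the `city` column, and it assumes the default join type (Inner), since the example parameters do not set one. Rows are compared irrespective of order, because the rows of the output table above do not follow the input order.

```python
left = [
    {"city": "CityA", "price": 695611.0},
    {"city": "CityC", "price": 294691.0},
    {"city": "CityB", "price": 430784.0},
    {"city": "CityB", "price": 336677.0},
    {"city": "CityA", "price": 584639.0},
    {"city": "CityA", "price": 579560.0},
]
right = [
    {"city": "CityA", "beds": 4.0},
    {"city": "CityC", "beds": 2.0},
    {"city": "CityB", "beds": 3.0},
]


def inner_join_on_city(left, right, left_prefix, right_prefix):
    """Inner join on left.city == right.city with prefixed output names."""
    out = []
    for lrow in left:
        for rrow in right:
            if lrow["city"] == rrow["city"]:
                row = {left_prefix + k: v for k, v in lrow.items()}
                # The right-side join column is not repeated in the output.
                row.update({right_prefix + k: v
                            for k, v in rrow.items() if k != "city"})
                out.append(row)
    return out


result = inner_join_on_city(left, right, "left_", "right_")

# Rows of the Output table above, as (left_city, left_price, right_beds).
expected = [
    ("CityB", 430784.0, 3.0),
    ("CityB", 336677.0, 3.0),
    ("CityC", 294691.0, 2.0),
    ("CityA", 695611.0, 4.0),
    ("CityA", 584639.0, 4.0),
    ("CityA", 579560.0, 4.0),
]
same_rows = sorted(tuple(r.values()) for r in result) == sorted(expected)
```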