Join
Synopsis
This Operator joins two ExampleSets using one or more Attributes of the input ExampleSets as
key_attributes.
Description
This Operator joins two ExampleSets using one or more Attributes of the input ExampleSets as key_attributes.
Identical values of the key_attributes indicate matching Examples. By default, the Attribute with the id role is used as the key, but an arbitrary set of one or more Attributes can be chosen instead. Four types of joins are possible: inner, left, right and outer. Each type is explained in the Parameters section.
Differentiation
Append
The Append Operator merges the Examples of the input ExampleSets into the resulting ExampleSet. Therefore, all input ExampleSets need to have the same structure (number of Attributes, Attribute names and value types).
Cartesian Product
The Cartesian Product Operator builds a cartesian product of the input ExampleSets, i.e. every Example from the left ExampleSet is joined with each Example of the right ExampleSet.
Union
The Union Operator combines both input ExampleSets in such a way that all Attributes and Examples are part of the resulting union ExampleSet.
Superset
The Superset Operator expects two ExampleSets as input and adds the Attributes of the first ExampleSet to the second ExampleSet and vice versa. Both resulting ExampleSets are delivered as output of the Superset Operator.
Input
left
The left input port expects an ExampleSet. This ExampleSet will be used as the left ExampleSet for the join.
right
The right input port expects an ExampleSet. This ExampleSet will be used as the right ExampleSet for the join.
Output
join
The output port delivers the joined ExampleSet.
Parameters
remove double attributes
This parameter indicates whether double Attributes should be removed or renamed. Double Attributes are Attributes that are present in both ExampleSets. If this parameter is checked, only the Attribute from the left ExampleSet is kept and the one from the right ExampleSet is discarded. If it is unchecked, the double Attributes from the right ExampleSet are renamed. The key_attributes are always taken from the left ExampleSet. Note that this check is applied only to regular Attributes. Special Attributes of the right ExampleSet that do not exist in the left ExampleSet are simply added; those that already exist are skipped.
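The handling of double Attributes described above can be sketched in plain Python. This is only an illustration of the rule for regular Attributes, not the Operator's implementation: the function name `combine_attributes`, the sample rows and especially the `_from_right` suffix are hypothetical, since the renaming scheme is not specified here.

```python
def combine_attributes(left_row, right_row, key,
                       remove_double_attributes=True,
                       suffix="_from_right"):
    """Combine one matching pair of rows (dicts of attribute name -> value).

    The suffix is a hypothetical renaming scheme chosen for this sketch.
    """
    combined = dict(left_row)
    for name, value in right_row.items():
        if name == key:
            continue  # the key is always taken from the left side
        if name in combined:
            if remove_double_attributes:
                continue  # checked: keep only the left Attribute
            name = name + suffix  # unchecked: rename the right Attribute
        combined[name] = value
    return combined


left_row = {"id": 1, "score": 0.9, "name": "Ann"}
right_row = {"id": 1, "score": 0.4, "city": "Oslo"}

removed = combine_attributes(left_row, right_row, "id", True)
renamed = combine_attributes(left_row, right_row, "id", False)
print(removed)
print(renamed)
```

With the parameter checked, the result has a single `score` column holding the left value; with it unchecked, both values survive under different names.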
join type
This parameter specifies which join should be performed. You can easily understand these joins by studying the tutorial Process. Four types of joins are supported:
- inner: The resulting ExampleSet will contain only those Examples where the key_attributes of both input ExampleSets match, i.e. have the same value.
- left: This is also called left outer join. The resulting ExampleSet will contain all Examples from the left ExampleSet. If no matching Examples were found in the right ExampleSet, then its Attributes will consist of missing values. Missing values or null values are shown as '?' in Altair RapidMiner. The left join will always contain the results of the inner join; however, it can also contain Examples that have no matching Examples in the right ExampleSet.
- right: This is also called right outer join. The resulting ExampleSet will contain all Examples from the right ExampleSet. If no matching Examples were found in the left ExampleSet, then its Attributes will consist of missing values. Missing values or null values are shown as '?' in Altair RapidMiner. The right join will always contain the results of the inner join; however, it can also contain Examples that have no matching Examples in the left ExampleSet.
- outer: This is also called full outer join. This type of join combines the results of the left and the right join. All Examples from both ExampleSets will be part of the resulting ExampleSet, whether the matching key_attributes value exists in the other ExampleSet or not. If no matching key_attributes value was found, the corresponding resulting Attributes will consist of missing values. Missing values or null values are shown as '?' in Altair RapidMiner. The outer join will always contain the results of the inner join; however, it can also contain Examples that have no matching Examples in the other ExampleSet.
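The four join types can be compared side by side with a minimal Python sketch. It is an illustration under stated assumptions rather than the Operator's implementation: ExampleSets are modelled as lists of dicts, `None` stands in for a missing value ('?'), there is a single key Attribute with the same name on both sides, and the non-key Attribute names do not overlap. The function name `join` and the sample data are hypothetical, and the order of the resulting rows is not meant to reflect the Operator's output order.

```python
def join(left, right, key, join_type="inner"):
    """Join two lists of dicts on one key; None marks a missing value."""
    # Non-key attribute names, taken from the first row of each side.
    left_attrs = [a for a in left[0] if a != key]
    right_attrs = [a for a in right[0] if a != key]

    # Index the right rows by key value for fast lookup.
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)

    result = []
    matched_keys = set()
    for l_row in left:
        matches = right_index.get(l_row[key], [])
        if matches:
            matched_keys.add(l_row[key])
            for r_row in matches:
                result.append({**l_row, **{a: r_row[a] for a in right_attrs}})
        elif join_type in ("left", "outer"):
            # Left row without a partner: right Attributes are missing.
            result.append({**l_row, **{a: None for a in right_attrs}})

    if join_type in ("right", "outer"):
        for r_row in right:
            if r_row[key] not in matched_keys:
                # Right row without a partner: left Attributes are missing.
                result.append({key: r_row[key],
                               **{a: None for a in left_attrs},
                               **{a: r_row[a] for a in right_attrs}})
    return result


left = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
right = [{"id": 2, "city": "Oslo"}, {"id": 3, "city": "Rome"}, {"id": 4, "city": "Lima"}]

for join_type in ("inner", "left", "right", "outer"):
    print(join_type, [row["id"] for row in join(left, right, "id", join_type)])
```

In this sample only the ids 2 and 3 occur on both sides, so the inner join yields two rows, the left and right joins three rows each, and the outer join four rows.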
use id attribute as key
This parameter indicates if the Attribute with the id role should be used as the key attribute. This option is checked by default. If unchecked, then you have to specify the key_attributes for both left and right ExampleSets. Identical values of the key_attributes indicate matching Examples.
key attributes
This parameter is available when the parameter use_id_attribute_as_key is unchecked. This parameter specifies Attribute(s) which are used as the key attributes. Identical values of the key_attributes indicate matching Examples. For each key attribute from the left ExampleSet a corresponding key attribute from the right ExampleSet has to be chosen. Choosing appropriate key_attributes is critical for obtaining the desired results.
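Pairing each left key attribute with a right key attribute can be sketched as follows. This is a hedged illustration, not the Operator's code: the function `inner_join_on_pairs`, the Attribute names and the sample data are hypothetical, and only an inner join is shown. Two Examples match only if every pair of key values is identical.

```python
def inner_join_on_pairs(left, right, key_pairs):
    """Inner join where key_pairs lists (left_attribute, right_attribute) pairs."""
    left_keys = [l for l, _ in key_pairs]
    right_keys = [r for _, r in key_pairs]

    # Index right rows by the tuple of their key values.
    index = {}
    for r_row in right:
        index.setdefault(tuple(r_row[k] for k in right_keys), []).append(r_row)

    result = []
    for l_row in left:
        for r_row in index.get(tuple(l_row[k] for k in left_keys), []):
            # Key values come from the left row; add the remaining right Attributes.
            extra = {a: v for a, v in r_row.items() if a not in right_keys}
            result.append({**l_row, **extra})
    return result


orders = [
    {"customer_id": 1, "year": 2023, "total": 50},
    {"customer_id": 1, "year": 2024, "total": 70},
    {"customer_id": 2, "year": 2024, "total": 20},
]
budgets = [
    {"cust": 1, "yr": 2024, "budget": 100},
    {"cust": 2, "yr": 2023, "budget": 30},
]

matched = inner_join_on_pairs(orders, budgets, [("customer_id", "cust"), ("year", "yr")])
print(matched)
```

Choosing only `customer_id` as the key would match more rows here, which illustrates why the choice of key_attributes determines the result.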
keep both join attributes
If checked, both Attributes of a join pair will be kept. Usually this is unnecessary since both Attributes are identical. It may be useful to keep such a column if there are missing values on one side.
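The effect of keeping both join Attributes when one side has no match can be sketched like this. The helper `key_columns` and the Attribute names `id` and `ref` are hypothetical. How the Operator fills a single key column for unmatched Examples is not stated here, so the fallback to the other side's value in the last branch is purely an assumption of this sketch.

```python
def key_columns(left_row, right_row, left_key, right_key,
                keep_both_join_attributes):
    """Return only the key column(s) for one result row.

    A side without a matching Example is passed as None.
    """
    left_value = left_row[left_key] if left_row is not None else None
    right_value = right_row[right_key] if right_row is not None else None
    if keep_both_join_attributes:
        # Both key columns are kept, so a missing value shows which side lacked a match.
        return {left_key: left_value, right_key: right_value}
    # Assumption of this sketch: use whichever key value is available.
    return {left_key: left_value if left_value is not None else right_value}


left_only = key_columns({"id": 1}, None, "id", "ref", True)
right_only = key_columns(None, {"ref": 4}, "id", "ref", True)
single = key_columns(None, {"ref": 4}, "id", "ref", False)
print(left_only, right_only, single)
```

For matched Examples the two key columns carry the same value, which is why keeping both is usually unnecessary.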