Skip to main content

Pivot

Synopsis

The

Pivot Operator creates a pivot table, summarizing the data in a larger table by reorganizing it into groups and calculating sums, averages, or other statistics for each group.

Description

The most common example of a pivot table is the sales summary report. Imagine that you sell goods in numerous stores, and you want to know the total sales in each store during every month of the year. You have a data table listing all the transactions that occur in all of the stores. For each transaction, a row in your table lists the store and the date of purchase, with separate columns for month and day, plus the cost of the transaction. The pivot table is a new table, created from your original table in the following way:

  • Create a unique row for each store.
  • Create a unique column for each month of the year.
  • Assign every transaction to the appropriate row and column, by adding the cost of the transaction.

Although in this case the cells of the pivot table display a aggregation_attributes.sum (total sales for that category), you can create other kinds of statistics. A aggregation_attributes.count will tell you how many transactions occurred in each store during each month. An aggregation_attributes.average will tell you the average cost of a transaction. See the parameter aggregation_attributes for the complete list of statistics.

Typically, a pivot table is defined by three Attributes from your original ExampleSet:

  • group_by_attributes - Usually categorical (nominal) values, they define the rows
  • column_grouping_attribute - Usually categorical (nominal) values, they define the columns
  • aggregation_attributes - Usually numeric values, they are summed or averaged

However, the actual number of Attributes may be more or less than three, depending on your choices. You can, for example, choose more than one Attribute in the set of aggregation_attributes.

Notice that in our original example, the group_by_attributes (store) and the column_grouping_attribute (month) have categorical values, while the aggregation_attributes (cost) have numeric values. If an Attribute you have chosen for the aggregation_attributes has categorical values, the range of statistics is more limited -- e.g., aggregation_attributes.count is still available, but not aggregation_attributes.average.

If an Attribute you have chosen for the group_by_attributes or the column_grouping_attribute has numeric values, you may find it convenient to first convert them to categorical values, by placing the numeric values into bins.

Differentiation

Aggregate

Just like the Pivot Operator, the Aggregate Operator has the parameters group_by_attributes and aggregation_attributes, but it does not support the column_grouping_attribute. You can think of the Pivot Operator as a generalization of the Aggregate Operator, with a more advanced concept of groups.

Transpose

The Transpose Operator exchanges the rows and columns of an ExampleSet, like the transpose of a matrix.

Input

input

This input port expects an ExampleSet.

Output

output

This output port delivers the pivot table.

original

The ExampleSet that was given as input is passed through without changes.

preprocessing model

This port delivers the preprocessing model. It can be used by the Apply Model Operator to perform the specified pivot on another ExampleSet. The result of the Apply Model then has exactly the same attributes as at the output port of the Pivot. This is helpful for example if the Pivot Operator is used during training and the same transformation has to be applied on test or actual data.

The preprocessing model can also be grouped together with other preprocessing models and prediction models by the Group Models Operator.

Parameters

group by attributes

This parameter specifies the rows in the pivot table. Click on "gui.action.attributes.label" and choose one or more Attributes. If you choose a single Attribute, the rows will correspond to the unique values of that Attribute. If you choose two or more Attributes, the rows will correspond to unique combinations of the Attribute values. All data points with the same values of the Attributes belong in the same row.

column grouping attribute

This parameter specifies the columns in the pivot table. Select an Attribute from the drop-down. The columns will correspond to the unique values of this Attribute. All data points with the same value of the Attribute belong in the same column.

aggregation attributes

Within each cell of the pivot table, defined by a row and a column, one or more statistics are calculated, using the aggregation functions. Click on "gui.action.list.label", and choose an Attribute and a function from the drop-downs. The complete list of aggregation functions is given below.

  • average: The average of the values.
  • concatenation: The concatenation of the values as strings.
  • count: The number of non-missing values.
  • count (fractional): The fraction of non-missing values in relation to the whole number of rows.
  • count (including missings): The number of values including missing values.
  • count (percentage): The percentage of non-missing values.
  • first: The first non-missing value.
  • least: The least occurring value.
  • log product: The log product.
  • maximum: The maximum value.
  • median: The median of the values.
  • minimum: The minimum value.
  • mode: The most occurring value.
  • product: The product of the values.
  • standard deviation: The standard deviation of the values.
  • sum: The sum of the values.
  • sum (fractional): The sum of the grouped values divided by the sum of all the values.
  • variance: The variance.

use default aggregation

If checked, this parameter allows you to define a default_aggregation_function for the remaining Attributes -- all those that are not used by the first three parameters. Any remaining Attributes whose data type is not compatible with the default_aggregation_function will be ignored; e.g., notice that aggregation_attributes.average is not available for categorical (nominal) Attributes.

default aggregation function

This parameter is available only when the parameter use_default_aggregation is set to true. It specifies the default aggregation function for the remaining Attributes -- all those that are not used by the first three parameters. The Attributes in question must be compatible with the default_aggregation_function. For example, if an ExampleSet has two nominal Attributes and all the rest are numeric, you can aggregate all of them by first including the two nominal Attributes in the aggregation_attributes, with aggregation_attributes.count as the aggregation function, then choosing aggregation_attributes.average as the default_aggregation_function, to take care of the rest.