Skip to main content

Pivot

Synopsis

The Pivot Operator creates a

pivot table, summarizing the data in a larger table by reorganizing it into groups and calculating sums, averages, or other statistics for each group.

Description

The most common example of a pivot table is the sales summary report. Imagine that you sell goods in numerous stores, and you want to know the total sales in each store during every month of the year. You have a data table listing all the transactions that occur in all of the stores. For each transaction, a row in your table lists the store and the date of purchase, with separate columns for month and day, plus the cost of the transaction. The pivot table is a new table, created from your original table in the following way:

  • Create a unique row for each store.
  • Create a unique column for each month of the year.
  • Assign every transaction to the appropriate row and column, by adding the cost of the transaction.

Although in this case the cells of the pivot table display a sum (total sales for that category), you can create other kinds of statistics. A count will tell you how many transactions occurred in each store during each month. An average will tell you the average cost of a transaction. See the parameter aggregation attributes for the complete list of statistics.

Typically, a pivot table is defined by three Attributes from your original ExampleSet:

  • group by attributes - Usually categorical (nominal) values, they define the rows
  • column grouping attribute - Usually categorical (nominal) values, they define the columns
  • aggregation attributes - Usually numeric values, they are summed or averaged

However, the actual number of Attributes may be more or less than three, depending on your choices. You can, for example, choose more than one Attribute in the set of aggregation attributes.

Notice that in our original example, the group by attributes (store) and the column grouping attribute (month) have categorical values, while the aggregation attributes (cost) have numeric values. If an Attribute you have chosen for the aggregation attributes has categorical values, the range of statistics is more limited -- e.g., count is still available, but not average.

If an Attribute you have chosen for the group by attributes or the column grouping attribute has numeric values, you may find it convenient to first convert them to categorical values, by placing the numeric values into bins.

Differentiation

Aggregate

Just like the Pivot Operator, the Aggregate Operator has the parameters group by attributes and aggregation attributes, but it does not support the column grouping attribute. You can think of the Pivot Operator as a generalization of the Aggregate Operator, with a more advanced concept of groups.

Transpose

The Transpose operator exchanges the rows and columns of an ExampleSet, like the transpose of a matrix.

Input

input

This input port expects an ExampleSet.

Output

output

This output port delivers the pivot table.

original

The ExampleSet that was given as input is passed through without changes.

Parameters

Group by attributes

This parameter specifies the rows in the pivot table. Click on "Select Attributes" and choose one or more Attributes. If you choose a single Attribute, the rows will correspond to the unique values of that Attribute. If you choose two or more Attributes, the rows will correspond to unique combinations of the Attribute values. All data points with the same values of the Attributes belong in the same row.

Column grouping attribute

This parameter specifies the columns in the pivot table. Select an Attribute from the drop-down. The columns will correspond to the unique values of this Attribute. All data points with the same value of the Attribute belong in the same column.

Aggregation attributes

Within each cell of the pivot table, defined by a row and a column, one or more statistics are calculated, using the aggregation functions. Click on "Edit List", and choose an Attribute and a function from the drop-downs. The complete list of aggregation functions is given below.

  • average
  • concatenation
  • count
  • count (fractional)
  • count (including missings)
  • count (percentage)
  • first
  • least
  • log product
  • maximum
  • median
  • minimum
  • mode
  • product
  • standard deviation
  • sum
  • sum (fractional)
  • variance

Use default aggregation

If checked, this parameter allows you to define a default aggregation function for the remaining Attributes -- all those that are not used by the first three parameters. Any remaining Attributes whose data type is not compatible with the default aggregation function will be ignored; e.g., notice that average is not available for categorical (nominal) Attributes.

Default aggregation function

This parameter is available only when the parameter use default aggregation is set to true. It specifies the default aggregation function for the remaining Attributes -- all those that are not used by the first three parameters. The Attributes in question must be compatible with the default aggregation function. For example, if an ExampleSet has two nominal Attributes and all the rest are numeric, you can aggregate all of them by first including the two nominal Attributes in the aggregation attributes, with count as the aggregation function, then choosing average as the default aggregation function, to take care of the rest.