Data mining, processing and its presentation are the key activities in a production environment. The views and analytic integrals of the information through multiple spectacles can help in better business forecasting and strategic implementations. This innovative exercise of creating varying views might involve aggregation, data transposing, or cross tabulation reporting.
Data Pivoting, one of the ways to implement and achieve above purposes, can evolve multiple views with varying combinations of columns as rows and vice versa.
Oracle supports data pivoting through Pivot aggregate operators, which can be used in SQL statements. Oracle 11g introduced two new keywords PIVOT and UNPIVOT in support of this operation. While the PIVOT operation refers to the conversion of rows into columns, UNPIVOT implies the reverse operation.
Prior to the induction of the Pivot operator, such cross tabular reports used to get generated through workaround SQLs using self joins. While it achieved the purpose in small database, it proved to be nightmare in large enterprise data warehouse database systems. Performance used to be the most cursed area where high volume of data was multi folded on self joins.
Pivot operator transposes an aggregated row of a table into a column. The distinct row values become the columns in the output and aggregated column value places itself under the appropriate pivoted column. The syntax of Pivot operator is as below.
FROM [TABLE NAME]
- [XML] – optional clause to convert the pivoted data into XML
- PIVOT_CLAUSE uses an aggregate function on one of the column of the table. This is the data which places itself against the pivoted column accordingly.
- PIVOT_FOR_CLAUSE and PIVOT_IN_CLAUSE specify a column and its distinct values which are to be pivoted. In a transposed report, the distinct values of the pivoted column appear as the header in the output. Both are mandatory clauses, so distinct values of the column must be in hand.
Examples and illustrations
A retail firm maintains the track of customer sales in three products (Product_A, Product_B, Product_C) for the months of January, February and March. The sales data is collected for three privileged customers C1, C2 and C3.
The relational table CUST_SALES stores the data for each customer against each product in each month. The table data is as below
Now, we shall pivot the table data to give a new analytic dimension. Refer the below cases.
Case 1: Customer sales in each month for each product
PRODUCT_ID column of the CUST_SALES table has been pivoted in the below query. Now observe the beauty and intelligence of Pivot operator; it retains the positions of remaining columns (i.e. CUSTOMER_ID and MONTH) and formats the sales data in accordance with the pivoted column. Also note that the PRODUCT_ID is no more a column but its distinct values i.e. Prod A, Prod B,
Prod C are transposed as the column header in the query output.
IN (‘Prod A’, ‘Prod B’, ‘Prod C’)
In the above query, note the aggregated function, FOR clause and IN clause of Pivot operator.
Now, check the Explain Plan of the pivot query.
The plan generated shows the pivot specific optimization of the SQL query in HASH GROUP BY function.
Case 2: Customer sales for each product in each month
Now, changing the angle of perception by replacing PRODUCT_ID with the MONTH at the header level. Distinct values of MONTH appear as column in the query output.
IN (‘Jan’ as “January”, ‘Feb’ as “February”, ‘Mar’ as “March”)
In the query, note that if the value(s) of the pivoted column had to have a customized title for better readability, the same can be specified using “AS” keyword. In the CUST_SALES table, abbreviated values of the MONTH column have been replaced with their complete names.
Similarly, if MONTH column has to be made as key analysis column, CUSTOMER can be raised to column header. Sales data would automatically find its place in the new matrix.
IN (‘Adam’, ‘Jones’, ‘Kanes’)
ORDER BY DECODE(MONTH,’Jan’,1,’Feb’,2,’Mar’,3)
Unpivot operator functions just at the opposite principle of Pivot. Very rare occasions exist when an element is created along with its anti counterpart. Unpivot brings data in pivoted form back to the normal form. Its syntax and usage is same as that of Pivot.
Here, UNPIVOT_CLAUSE is not an aggregated column, but an arbitrary column name, which appears in the query to accommodate the values of unpivoted columns.
UNPIVOT_FOR_CLAUSE is also an arbitrary column name to hold the unpivoted columns values.
UNPIVOT_IN_CLAUSE defines the distinct values of the columns to be unpivoted.
CUST_SALES_MONTH is the table which holds the data in binary format, to denote whether a product has been consumed by a customer in a specific month or not. It looks more like a spreadsheet.
Now, Unpivot operator can be used here to break the CUST_SALES_MONTH table into rows to show only CUSTOMER_ID, PRODUCT_ID, MONTH and its corresponding sales. Check the query below:
order by customer_id, product_id
Now observe the output.
For each customer and product, the „month‟ column is repetitive to list the corresponding „month_count‟ value. The UNPIVOT_FOR_CLAUSE and UNPIVOT_IN_CLAUSE are responsible to have columns of the table listed as value under MONTH column.
The UNPIVOT_CLAUSE (here, MONTH_COUNT) specifies fetches the corresponding value of the unpivoted column and places at correct position alongside its customer_id, product_id and month. This demonstrates the power intelligence of UNPIVOT operator.
The explain plan for the above query shows the UNPIVOT operation for the conversion of column headers as row values.
Applications of Pivoting
We have already seen the benefits of Pivot operator. On need basis, data can be pivoted in the required format. Still, from developers perspective, I would mention an application derived from the previous ones.
Conversion of columns to rows and vice versa.
For example, you have a test data of known values, which is required in rows. One way is to combine all the separate SELECT statements using UNION operator as below:
SELECT S2, „Football‟ FROM DUAL
SELECT S3, „Badminton‟ FROM DUAL
SELECT S4, „Tennis‟ FROM DUAL
Unpivot operator eases this overhead in efficient and simple way. Check the query below.
(SELECT ‘Cricket’ s1, ‘Football’ s2, ‘Badminton’ s3, ‘Tennis’ s4
FOR STRING IN (S1,S2,S3,S4)
Pivot and Unpivot have made an impactful entry into the SQL language. They have set aside the cumbersome work around solutions and efficiently transpose the data of a relational table.