Notes
Outline
IS660J: Data Warehousing
“Reporting & Analysis”
Lecture 6
Professor Burns
3 Ways to Present Information
Traditional Reports – focus of the chapter
Pre-defined vs Adhoc
Pivot Tables – OLAP; Drilling
Charts
Pies
Plots
Lines
Areas
Bars
Variations of the above
Plus – Portals
http://www.cognos.com/products/tours/index.html
Pivot Tables
Pivot Tables
Pivot Tables
Pivot Tables
Pivot Tables
Manipulating Pivot Tables
Manipulating Pivot Tables
Charts - Bar
Charts - Line
Charts - Pie
Charts - Plots
Bubble Charts
Charts - Combinations
Charts - Combinations
Traditional Reports
Simple Report – Report 13.1
Fully-Additive Measures – stored and aggregated in SQL statement – Body rows
Non-Additive Measures – components are fully-additive; calc in the same SQL statement or in the application
Summary Rows – Report 13.2
One dimension - Total
Multiple dimensions – multiple levels of aggregation
Sub-Totals and Total
Non-additive measures – always take the ratio of the sums of the additive components, not the sum of the ratios (Margin %)
Recall Chap 3 - Storing Margin Dollars vs. calc Margin Dollars – computation of summary rows for a non-additive measure requires access to its fully additive components, even if they will not be displayed
Traditional Reports
cont. Report 13.2
Computational Approach
Retrieve data for columns 1(Brand),2 (Product), 3 (Sales Dollars),4 (Margin Dollars)
Compute summary rows (Row Sub-Totals and Total)
Compute column 5 – non-additive measure Margin %
Ease of Use Test – does the tool manage these details?  In other words, does it provide functionality that masks this complexity from the end user?
Traditional Reports
What happens when the values summarized are not fully-additive, or not composed of fully-additive components? – Report 13.3
Computational Approach doesn’t work well here because we’re averaging in the body rows themselves and the domain of customers vary across brands and products
Re-query Approach
Role of the Application/Tool
Traditional Reports
cont. Report 13.3
Re-Query Approach
First, construct the body rows: query Product, Brand and Average Customer Sales.
Next, we query Brand and Average Customer Sales.
Lastly, we query and compute the Average Customer Sales for all products.
3 SQL queries.
Each query is successively simpler
If we use a well-designed set of aggregate tables, each succesive summary level will enjoy a performance boost over the previous.
Traditional Reports
More on averages
Recall Chap 3. – semi-additive measures require averages
Bank Account Balances are not additive over time
Basket counts are not additive across products
Averages are a ratio; non-additive
Average = sum of a qty / cardinality
SQL’s Average function often gets the cardinality wrong; therefore, averages are best constructed by retrieving the sum and cardinality components, then calculating the ratio.
"Suppose we are trying to..."
Suppose we are trying to calculate the average inventory for a product in all warehouses
SQL Average function would count up the rows in the fact table and that would be the cardinality; but is it?  What if a warehouse didn’t have a product?  This is an issue of sparsity.
Solution:  Load zero rows.
No.  We would balloon the size of the fact tbl.
What if we want the average inventory for a product across the company (assume our grain is inventory by product by month and we have 4 warehouses)?
Now we have another problem – a grain problem.  We want to compute an average at a higher level of summarization than our fact table.  In other words,  our cardinality using the SQL Average function will be 4 x 12 = 48 when we really want 12
Solution is retrieve the sum and cardinality components, then calculate the ratio
Cardinality could be relative (ie. period averages); cardinality could computed based on dimension data or based on facts (ie. Average product sales by Brand for 1/01 - number of products in a brand would need to be the cardinality in the average for each brand).  The latter would require 2 queries.
Constraining Queries
Where clause
Simple:  Adding a line to restrict Report 13.1 to only 1/98.
Note:  In situations using multiple fact tables (drill across), each query must be constrained in the same manner.
Constraining Queries
Based on Comparision to Summary or Rank
Avoid complex SQL query; let the application handle it.
Show me sales by Product for Jan, only for Products that account for more than 30% of their brand totals
Interrow comparisions – see Report 13.4
Filter out rows < 30%
Constraining Queries
Based on Group Membership
Show me the average balance of new savings accounts opened by recipient’s on January’s Super-Saver mailing
SQL
Create a study group table – households part of the Super-Saver mailing
Join the study group table to the fact table
Application
Set Operations – union, intersect, minus – reduce complexity and increase performance
Quiz Review