One less thing I have to do in SQL
“Can you tell me the average total monthly purchase amount our customers had month over month?” This seemingly simple question has driven many a Tableau user to madness or, even worse, SQL scripting. But now, with Tableau 9.0, Analysts have access to the powerful Level of Detail (LOD) Expressions.
LOD expressions could be compared to SQL Windowing (or OVER) functions. They allow for us to specify an additional layer of data manipulation (i.e. a sub-query) prior to our final data calculations.
In this article we will introduce the LOD expressions, discuss some Tableau history as to why these expressions are so important, then take a deep dive into the “Include” LOD expressions. We’ll walk through a use case were we can now get the average total monthly spend of our customers, in Tableau, without any data level manipulation to our transactional data.
I think this will be very informative to those ready to upgrade to Tableau 9.0, read on for more info or download the PDF file with the instructions below!
An Average of a Sum
Average of total customer spend by month is a question that inevitably gets asked in almost any transaction driven industry, and it seems to confound the very foundation Tableau was built upon – which is only bring in the data which is necessary into the visualization.
However, if I have a multiple transactions from multiple customers over multiple months, and I want to know the average of the total purchases by the customers each month, I would first need to sum the transactions by customer by month, then take the average of the amount spent each month. This means I have to use a dimension (i.e. customer) that I do not want to include in the visualization!
Here we have some transactions from three customers A, B, and C during the months of February and January of 2014.
Now let’s do some basic visualizations in my trusty Tableau 8.3:
But there is no easy way to sum up what each customer spent that month, then take the average of the total customers’ monthly purchases.
Why is that? We have to go back and see why Tableau was created in the first place…
In the beginning
Tableau began as a tool to make visualizations on top of structured databases. And it was good. Classification and Ordinal data were assigned to “Dimensions” while numeric data were assigned to “Measures” with little loading time as these were all pulled from the schema. Loading only occurred when a user dragged these dimensions and measures to the “shelf” and Tableau would construct the database query for you and pass it to the underlying database.
Although Tableau uses its own custom query language called VQL (Visual Query Language) I’ll give some examples with SQL based on the above visualizations.
Total Spend by Month (SQL):
Here we only have four total values to return – two months and the sum of amount spent for those months. The query would be as follows:
We can see the similarities between the Tableau “gems” we drag over to its shelf and the syntax in our SQL query. Our measures are aggregated (i.e. SUM([Spend])) and our dimensions are contained in the “GROUP BY” clause (i.e. GROUP BY MONTH([Date])).
Tableau has created a way for us to create visuals, directly from a database, without learning any SQL! Fantastic!
Average Total Spend by Customer by Month
But there is a problem… Now I want to answer the question from the beginning of the article “What is the average amount spent per month by customers?”
Ironically, this is somewhat easy to do in SQL
While this may look complicated, all we are doing is selecting from the items returned by another query, i.e. a sub query.
Our sub query returns the following table:
Then we repeat the same type of grouping operation in our previous query to get the average by month of the above table:
But here lies the problem, the data must be pre-aggregated in order to calculate the average of sums. Prior to 9.0 there was no way to communicate to Tableau that the data element you are dragging onto the shelf should not be used in the visual, but used to pre-aggregate the data before your final calculations.
The only solution was to either use custom SQL when connecting to a data source or have a new SQL View created, where both solutions have the data pre-aggregated by month and customer. This solution is very sub-optimal. We have limited our ability to divide the data into weeks or days and we can no longer access the individual customers.
But this has changed in Tableau 9.0!
Enter Level of Detail Expressions
Level of Detail (LOD) expressions let us tell Tableau to aggregate the data at another level before aggregating for the final visualization. There are three specific expressions in the LOD toolbox:
First we will review the “Include” expression. Here is where we will also answer the question posed at the beginning of the article “What is the average total amount spent by customers each month?”
That is all you need to do!
Now we take the average of this expression in the Tableau shelf and we have our answer:
All we had to do was tell Tableau to include the sum of spend by customer id in the aggregation calculation. For a little more detail as to what’s happening behind the scenes let’s look at our original SQL script and see which parts correspond to our visualization:
Green corresponds to our row expression, blue to column, and orange to the level of detail. You can see that everything in the subquery is either in the LOD expression or the column, i.e. nothing corresponding to the row expression is contained in the subquery. Our row expression is only used as a top level aggregation and the top level aggregation only contains values included on the shelf.
Notice something interesting, however, about our subquery. Not only does it include the aggregation specified by our LOD expression, but it also includes any other shelf aggregation (i.e. “month”) apart from our row aggregation (i.e. average of our LOD expression).
Okay, that’s good and all, but what if I don’t want to view the data month by month? What if I want a single number showing the average monthly spend of my customers? You would think just dragging the date value out of the equation would solve it, unfortunately, this is not the case:
Somehow we have a number higher than our two previous monthly averages. That shouldn’t happen! Let’s look at our code:
Ah, well there’s our problem, without including month, we are now taking the average each customer spent across all the data. We can fix this by including not only customer, but also month in our include expression:
Notice that we did have to make a calculated field indicating the month of the transaction, LOD expressions will not allow other custom expressions to the left of the colon; however, they will allow calculated fields.
Notice how our date aggregation is now orange and only contained in the subquery. This is exactly what we want and returns the appropriate value:
Fixed and Exclude
Include is probably going to be the LOD expression most used by Tableau Analysts; however there are Fixed and Exclude expressions as well. In an upcoming article we will explore these in depth, but the information contained here should be enough to get you started with LOD:
– Fixed: Used when multiple hierarchies are present in a visualization (e.g. year > month > week) will fix the calculation to a specified level of the hierarchy. Useful for percent of total calculations (e.g. % of monthly revenue by week)
– Exclude: The opposite of Include. Tells Tableau not to use a value on the shelf when performing a calculation. Useful for comparing overall totals to dynamic hierarchies.
Daniel Smith, MBA CPA CMA
Using Business Intelligence platforms to bridge the gap between Advanced Data Analytics and the efficient effective principles of accounting, Daniel applies technology and mathematics to make business faster and smarter.
Daniel has managed solutions for diverse client sectors such as as advertising, military, insurance, and oil & gas. These solutions include Business Intelligence Platform management, online key performance indicator identification and tracking, to full predictive data model construction. Although the analytic solutions are often mathematically complex, Daniel’s presentation and academic background ensures any insights delivered by solutions are relevant and simple to understand.