SSAS: Can I make my Distinct Count totals subtotal correctly?

I've seen this question come up a couple of times. What is usually being asked is "can we make the subtotals additive for a distinct count measure?" – and the short answer is a resounding No!

But the short answer does not really make for a compelling blog post, so let's dig a little deeper into this issue. The more complete answer is that while there are some techniques that can make a distinct count "appear" to be additive in some circumstances, they are fundamentally flawed and will produce inconsistent results as distinct count measures are inherently non-additive.

I believe that the real issue here is not a technology issue, it is a problem with user expectations. So the correct action here is not to try to change the results, but to educate your users as to what is happening.

Let's explore this by taking a look at a simple example. The following matrix shows the products bought by two different customers over a 3 month period:

   Month   Customer 1   Customer 2 
   =====   ==========   ==========

   Jan     Product A        -
   Feb     Product B    Product C
   Mar     Product A    Product D

 

So, obviously over this 3 month period, there were only 2 distinct customers. However, if we group the records by month and look at the distinct count we get the following output.

   Month  Count
   =====  =====

   Jan    1
   Feb    2
   Mar    2

   Total  5

 

Which, if we try to add up the distinct count measure, makes it look like, we had 5 customers, instead of 2 over the past 3 months.

Then if we look at the distinct count of customers by product over the 3 month period we get the following:

   Product    Count
   =========  =====
   Product A  1
   Product B  1
   Product C  1
   Product D  1

   Total      4

 

Which someone could interpret to meaning that we actually had 4 customers over this 3 month period. Instead, what really happened is that our 2 distinct customers bought 2 different products each. 

This issue is quite easy to see with a small dataset, but with a larger dataset the results can be more subtle, but the effects will still be there. The end result of this will be that this inconsistency will cause your users to loose confidence in your solution, which is the worst thing that can happen to a BI project.

So if you ever get asked to make a distinct count measure additive, tell them "no" and feel free to point the person making the request at this blog post.

Update: 22 Feb 2010

There still appears to be some confusion here so I thought it may help to post the "correct" output for both the queries above. (Which is what you would get if you created a distinct count measure over the CustomerID.)

   Month  Count
   =====  =====

   Jan    1
   Feb    2
   Mar    2

   Total  2

 

   Product    Count
   =========  =====
   Product A  1
   Product B  1
   Product C  1
   Product D  1

   Total      2

In both cases, irrespective of the attribute on the row axis the correct total is a distinct count of 2 customers. Trying to make this total any other figure apart from 2 would be incorrect.

Print | posted on Wednesday, January 13, 2010 5:29 AM