Group by an expression… and limitations of MDX vs SQL

Some queries can be easily expressed with MDX while it’s not possible to get the same result using SQL. For example, if we use Adventure Works sample cube and would like to display internet gross profits by country in a way so that we have separate column for each year it can be easily achieved with below MDX.

select
  non empty [Date].[Calendar Year].[Calendar Year] on 0,
  [Customer].[Country].[Country] * [Measures].[Internet Gross Profit] on 1
from [Adventure Works]

It displays column for each year which has data for internet gross profits. In SQL, however, all columns must be explicitly listed in a query (or derived from table structure in case of select *) – whether in select list or in pivot clause. Generically speaking, SQL is designed to return two-dimensional datasets with arbitrary number of rows and pre-defined number of columns while MDX can return data on multiple axes (dimensions) although it’s a tricky question how to visualize the output if we use more than 2 axes.

On the other hand, sometimes it quite challenging to implement the logic in MDX which can be expressed using SQL in a very trivial manner. Let’s consider a couple of cases.

First: Return aggregated measure by N first symbols of an attribute.

More specifically, we want to see internet gross profits aggregated by first symbol of each city.
It’s relatively straightforward if we can list all the resulting members in a query.

with 
set non_empty_cities as
  filter([Customer].[City].[City].allmembers, [Measures].[Internet Gross Profit])
member [Customer].[City].[A] as
  sum(filter(non_empty_cities, left([Customer].[City].currentmember.name, 1) = "A"))
member [Customer].[City].[B] as
  sum(filter(non_empty_cities, left([Customer].[City].currentmember.name, 1) = "B"))
member [Customer].[City].[C] as
  sum(filter(non_empty_cities, left([Customer].[City].currentmember.name, 1) = "C"))
select
  [Measures].[Internet Gross Profit] on 0,
  {[Customer].[City].[A], [Customer].[City].[B], [Customer].[City].[C]} on 1
from [Adventure Works]

This solution requires a lot of copy & paste efforts even for 1-symbol prefixes and it’s not reasonable at all when prefix length is a few symbols long.

Alternatively, we can get an aggregate for each prefix using generate function but this approach does not allow us to create a member for each prefix (like above solution with hard coded calculated members).

with 
set non_empty_cities as
  filter([Customer].[City].[City].allmembers, [Measures].[Internet Gross Profit])
set first_cities as
  generate(non_empty_cities as y,
           filter(y.currentmember,
                  rank(y.currentmember,
                       filter(non_empty_cities,
                              left([Customer].[City].currentmember.name, 1) = 
                              left(y.currentmember.name, 1)
                             )
                      ) = 1
                 )
          )
member [Measures].[X] as
  generate([Customer].[City].currentmember as y,
           sum(filter(non_empty_cities,
                      left([Customer].[City].currentmember.name, 1) =
                      left(y.currentmember.name, 1)
                     ), [Measures].[Internet Gross Profit]
              )
          )
member [Measures].[LeftN] as
  left([Customer].[City].currentmember.name, 1)
select
  { [Measures].[X], [Measures].[LeftN] } on 0,
  first_cities on 1
from [Adventure Works]

It gets one city for each prefix and calculates an aggregate for it. Prefix itself is exposed in additional measure. It would be much better if we could put new members named as prefixes into a dimension but it does not seem possible if we do not know all the new members in advance.

Second: Return a number of dimension members belonging to particular measure value.

More specifically, we allocate measure into buckets 1, 10, 100, 1000 etc

with
set non_empty_cities as
  filter([Customer].[City].[City].allmembers, [Measures].[Internet Gross Profit])
member [Measures].[X] as
  10 ^ int(vba![log]([Measures].[Internet Gross Profit])/vba![log](10))
select
  [Measures].[X] on 0,
  non_empty_cities on 1
from [Adventure Works]

and would like to know the number of cities belonging to each bucket.

Below query does the job if we know all buckets in advance.

with
set non_empty_cities as
  filter([Customer].[City].[City].allmembers, [Measures].[Internet Gross Profit])
member [Measures].[X] as
  10 ^ int(vba![log]([Measures].[Internet Gross Profit])/vba![log](10))
member [Customer].[City].[Bucket 1] as
  filter(non_empty_cities, [Measures].[X] = 1).count
member [Customer].[City].[Bucket 10] as
  filter(non_empty_cities, [Measures].[X] = 10).count
member [Customer].[City].[Bucket 100] as
  filter(non_empty_cities, [Measures].[X] = 100).count
member [Customer].[City].[Bucket 1000] as
  filter(non_empty_cities, [Measures].[X] = 1000).count
member [Customer].[City].[Bucket 10000] as
  filter(non_empty_cities, [Measures].[X] = 10000).count
member [Customer].[City].[Bucket 100000] as
  filter(non_empty_cities, [Measures].[X] = 100000).count
member [Measures].[cnt] as
  [Measures].[Internet Gross Profit], format_string = "General Number"
select
  {[Measures].[cnt]} on 0,
  {[Customer].[City].[Bucket 1], [Customer].[City].[Bucket 10],
   [Customer].[City].[Bucket 100], [Customer].[City].[Bucket 1000],
   [Customer].[City].[Bucket 10000], [Customer].[City].[Bucket 100000]} on 1
from [Adventure Works]

Generic solution is similar to the one for first quiz. We get a set containing one member for each bucket and calculate a measure for each member from that set.

with 
set non_empty_cities as
  filter([Customer].[City].[City].allmembers, [Measures].[Internet Gross Profit])
member [Measures].[X] as
  10 ^ vba!int(vba![log]([Measures].[Internet Gross Profit])/vba![log](10))
set buckets as
  filter(order(non_empty_cities, [Measures].[X]) as s,
         [Measures].[X] > ([Measures].[X], s.item(s.currentOrdinal-2))
        )
member [Measures].[cnt] as
/*
  generate([Customer].[City].currentmember as y, 
           filter(non_empty_cities, 
                  measures.[X] = (measures.[X], y.item(y.currentOrdinal-1))
                 ).count
          )
*/
  filter(union(non_empty_cities, [Customer].[City].currentmember as y),
         [Measures].[X] = ([Measures].[X], Y.item(0)) 
        ).count
select 
  { [Measures].[X], [Measures].[cnt] } on 0,
  buckets on 1
from [Adventure Works]

Actually, solution for the first quiz can be implemented in a similar manner without generate function. In the first step we can derive first city for each bucket in ordered set (named set declaration) and in the second step we can calculate measure for each of them (calculated measure declaration). There is no strong need to use generate function for considered tasks because we do not really need to generate any set but rather we need to filter a set in the first step and calculate a measure in the second step. What important here is using set aliases otherwise it would not be possible to implement filtering.

So first quiz required aggregation by an expression on attribute (by prefix) while second quiz required aggregation by measure value. Approach in both cases is very similar and with the same limitation – i.e. we cannot add dynamic number of members into particular dimension thus we can only apply some filtering in runtime and do calculation for filtered sets.

I started this post speaking about limitations of SQL, in particular we need to know all the columns in select list in advance. More specifically, all columns must be known during PARSE stage of a query. Similarly, in MDX if we want to add calculated members to a dimension all of them must be know in advance before execution.

It’s worth to mention that required members can be pre-generated if we construct MDX script based on the output of MDX query. Well… similarly we can dynamically construct SQL script if we want to deal with dynamic number of columns in the select list.

SQLMDX

Eventually it's all about data

Group by an expression… and limitations of MDX vs SQL

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply