Monday, March 25, 2013

Slowly Changing Dimensions

One of the more interesting aspects of BI is being able to track an entity (person, place or thing) that undergoes a name change.  How do you ensure that all of the activity and measurements are properly correlated before and after that name change?
Business Intelligence parlay calls this type of entity a ‘Slowly Changing Dimension’ (SCD). Perhaps a simple example can clarify the issue:
  1. Your company has grown through acquisition, and your divisions have non-standardized codes that identify the source of a sale (Internet, Direct sales, word of mouth, etc).
  2. This created a lot of reporting problems because you weren’t able to compare apples to apples, so you underwent a major standardization process so that every division now uses the same code value to equate to the same code outcome.
  3. Unfortunately, the standardization process did not include changing all of your historical transactions to reflect the standardized codes, so when you run a report including dates before the standardization, the initial problem remains.
Why didn’t the standardization solve the problem, and how do you solve the new problem that now exists?
The solution can be pretty simple when you plan properly for these slowly changing dimensions.  The data required to solve a simple SCD issue like the one above might include something like this:
Division IDCurrent Code IDMaster Code IDStart DateEnd Date
12B1A2000-01-012012-12-31
13A1A2013-01-019999-99-99
12B2X2013-01-019999-99-99

This simple table allows your BI functions to:
  1. Associate code 2B in Division 1 to master code 1A during the active dates of January 1, 2000 through December 31, 2012.  Transactions during this time period can be coded correctly by joining from the transaction [c1] to the SCD table on division, current code, and start/end date range.  This is the old code that is being replaced (by 3A).
  2. Associate code 3A in Division 1 to master code 1A during the active dates of January 1, 2013 through the current date.  In this example, non-standardized code 2B has been replaced with corporate standard code 3A.
  3. Associate code 2B in Division 1 to master code 2X during the active dates of January 1, 2013 through the current date.  9999-99-99 indicates the code does not have an end date, and all transactions for division 1, code 2B, for dates after January 1, 2013.  In this example, code 2B in division 1 has taken on a whole new meaning as of January 1, 2013.  2B hasn’t been used in this devision prior to January 1, 2013.
  4. The Master code in each example allows a many-to-one relationship, so multiple current codes can ‘roll-up’ to a single master code.  You may have more than one code in a division that fundamentally means the same thing at the corporate level.
Of course this example is a simple one, and there are many circumstances of much greater complexity.  Hopefully this example will serve to illustrate the problem, and help you to solve the simple problems now, and move to more difficult ones over time.
SCD’s are a fact of life in BI implementations, and having a practical solution is critical to your data accuracy!
One final thought – standardizing your codes across a large organization can be a truly painful project.  If you can properly use SCD’s, you may find that standardizing all your codes is not required after all!

No comments:

Post a Comment