Data Quality Strategy

June 26th, 2008

I was just creating a data quality strategy for a customer going through a conversion from multiple sources to one integrated database. While doing the conversion, new software will be introduced and propagated out to the organization. Not an easy task for any organization! Not only are they doing a conversion, but they purchase enhancement data from a third party, which means more quality and integration issues to address. So here is what I know:

  • There is incomplete data in all the sources
  • There is incorrect data
  • There is historical data that must be converted
  • They purchase the third-party files once a year, and those must be updated in the system

What I don’t know:

  • How far back in history do we need to convert?
  • How long do we keep history in the new system?
  • Will they require reporting over time? If so, then we are talking about a separate data warehouse
  • How bad is the data? Yep, I am scared

So here is the plan:

  • Profile the sources and analyze the relationships
  • Understand how the data should be converted and cover all those gaps
  • Apply data quality routines in the conversion programs where required
  • Profile the data after it is converted
  • Set up continual monitoring of the data at specific intervals
  • Set up a profiling routine for the third-party data, to be run before the load/integration program executes
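
As a rough illustration of the profiling and monitoring steps in the plan above, here is a minimal sketch. It assumes each source can be read into a list of dictionaries; the names `profile_source` and `columns_over_threshold`, the sample columns, and the 50% threshold are all made up for the example and do not come from the actual project or any particular profiling tool.

```python
from collections import Counter

MISSING = ("", None)  # values treated as "incomplete"; an assumption for this sketch


def profile_source(rows):
    """Summarize each column of a source: row count, missing count, distinct values."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # absent keys count as missing
        present = [v for v in values if v not in MISSING]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    return report


def columns_over_threshold(report, max_missing_rate):
    """Return columns whose share of missing values exceeds the given rate."""
    return [
        col for col, stats in report.items()
        if stats["rows"] and stats["missing"] / stats["rows"] > max_missing_rate
    ]


# Hypothetical sample standing in for one source extract.
sample = [
    {"customer_id": "1", "state": "CO", "email": "a@example.com"},
    {"customer_id": "2", "state": "", "email": None},
    {"customer_id": "2", "state": "CO"},
]

report = profile_source(sample)
flagged = columns_over_threshold(report, max_missing_rate=0.5)
print(flagged)  # ['email']
```

The same two functions could be run against each source before conversion, against the target afterwards, and against the third-party file before each yearly load, so that the numbers are comparable across all three.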

Do you think I have it all handled?

2 Responses to “Data Quality Strategy”

  1. Hi Joyce

    Sounds like you’ve got it well covered. As this appears to be something of a data migration project, I would probably add…

    1) Ensure there is a business process workshop on the target environment with a view to identifying potential changes of use or new pressures on the underlying data. For example, are they looking to automate processes? Will the legacy data hold up to this new level of integrity?

    2) Create a data quality rules process whereby you regularly (e.g. weekly) present the DQ issues in a business context to business sponsors, subject matter experts, etc., so the business takes ownership of the issues you are finding and you jointly arrive at a solution for moving forward.

    3) Create a customer/supplier contract of quality on the supplier’s data so they know what to expect of your quality control. Be proactive and show them how they could deliver the same checks and balances within the supplier information chain so they NEVER send dirty data. Focus on the financial benefits to them.

    4) I always take the approach that NO data is in scope unless the business can convince the project otherwise so you could build that into the cycle too.

    5) Instead of “profile the data after it is converted”, I would tighten that up somewhat. You should have a set of comprehensive data quality rules that dictate data quality levels across all phases of the project. These are not just the standard profiling checks, which are way too simplistic for most data migrations. I think you will almost certainly need some data quality rules specified in pseudocode and delivered either via scripts or via the kind of advanced DQ processing checks that profilers can’t provide (but I know DataFlux possesses). Basic profiling will provide broad-brush analytics, but you need to deliver proper data quality rules procedures to fully assess migration success (in my opinion).

    6) Also forgot to add that I prefer to take a top down approach as follows……

    Business function requirements (comes out of the BP workshop mentioned earlier)

    Conceptual modelling gap analysis - i.e. what are the broad conceptual entities and relationships in each environment, how do they differ, what are the gaps, and how will they be resolved?

    Logical modelling gap analysis - same as conceptual but much deeper

    Physical modelling gap analysis - this is where you explore internal inconsistencies in the legacy platforms first then compare these to the target platform.

    This may seem a bit long-winded (I don’t know how big your project is, so it may well be), but it is still advisable: if you take a bottom-up approach (very common in the DQ fraternity), you can miss some big issues that are not immediately apparent. It also gives you a great tool to carve up the workload, e.g. Jan looks at the “Customer” conceptual datasets, Bill looks at the “Equipment” conceptual datasets, and so on, which makes it a lot easier to manage.
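
To make point 5 a little more concrete, here is a minimal sketch of data quality rules delivered via a script and run in more than one phase. Everything in it is hypothetical: the `Rule` class, the two example rules, and the record fields are invented for illustration and are not taken from DataFlux or from any real project.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


# Hypothetical business rules; real ones would come from the business workshops.
RULES = [
    Rule("customer_id is present", lambda r: bool(r.get("customer_id"))),
    Rule(
        "closed accounts have a close date",
        lambda r: r.get("status") != "closed" or bool(r.get("close_date")),
    ),
]


def evaluate(records, rules):
    """Count, per rule, how many records fail it."""
    failures = {rule.name: 0 for rule in rules}
    for record in records:
        for rule in rules:
            if not rule.check(record):
                failures[rule.name] += 1
    return failures


# Made-up records standing in for the legacy source and the converted target.
legacy = [
    {"customer_id": "1", "status": "open"},
    {"customer_id": "", "status": "closed"},
]
converted = [
    {"customer_id": "1", "status": "open"},
    {"customer_id": "2", "status": "closed", "close_date": "2008-01-31"},
]

before = evaluate(legacy, RULES)
after = evaluate(converted, RULES)
print(before)  # each rule fails once in the legacy data
print(after)   # no failures in the converted data
```

Because the same rule set is evaluated against both the legacy and the converted data, the failure counts give a like-for-like measure of migration success, which simple column profiling does not.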

    Hope that helps, we’ve been covering a lot of this off over at DataMigrationPro.com so please feel free to take a look.

    Dylan Jones
    Founder - DataMigrationPro.com
    Global Professional Community

  2. Joyce Norris-Montanari said: June 27th, 2008 at 7:25 pm

    Fabulous Input!
    Did you create a business rules document? I just did for a client, and mapped it to all the processes. FUN STUFF!

The blog content appearing on this site does not necessarily represent the opinions of DataFlux