DataFlux - The Leader in Data Quality and Data Integration
  1. #1 by Charles Blyth at January 13th, 2010

    Jim

    Another great post. We have discussed before about how important ‘context’ is in the realm of data quality. Metadata is a key entity when it comes to defining and understanding context in data, and therefore I agree, it is an integral part of data quality.

    PS: No ‘olde English’ quotes here, are you feeling alright?

  2. #2 by Phil Simon at January 13th, 2010

    Interesting post, Jim.

    This line:

    “A man with one watch knows what time it is. A man with two watches is never sure.”

    …made me think of the NFL adage:

    If you have two starting quarterbacks, you have none.

  3. #3 by Rob Paller at January 13th, 2010

    Jim,

    I think metadata transcends many data initiatives. The trouble with metadata is that it can be very time consuming just to get consensus on what exactly a product, customer, or vendor is to a company, or what a student, alumnus, or benefactor is to a university. It requires that every part of the enterprise be represented, and with that come politics, ownership, and power struggles.

    How many enterprises come together with the full intention of doing it right when a new operational system comes online or a data quality program begins, only to let metadata fall by the wayside when things get heated and ugly in the conference room? It takes a special individual to facilitate (mediate?) such endeavors with sufficient support from management (C-level?) to keep things from stagnating and ultimately being abandoned because the project is falling behind schedule.

  4. #4 by Jim Harris at January 13th, 2010

    @Charles Yes, context is key in data quality. No ‘olde English’ quotes, but how about more Shakespearean paraphrasing:

    “What’s in a name? That which we call data
    By any other name would stink without good metadata.”

    @Phil Good (American, the only REAL kind – oh no he didn’t!) football analogy.

    @Rob Excellent points. I agree that “metadata transcends many data initiatives.” I believe that data quality also transcends data initiatives.

    I am coining a new term: Data Transcendentalism. Paraphrasing Ralph Waldo Emerson:

    “So shall we come to look at the world of data with new eyes. It shall answer the endless inquiry of business intelligence. What is a Single Version of the Truth? What is good data? Build, therefore, metadata that accurately depicts the wide world of your most decision-critical enterprise information. The faster you conform your organization to the best practices in your collective business context, the sooner you will realize its great potential.”

    Thanks for your comments, your feedback (as always) is greatly appreciated.

  5. #5 by Garnie Bolling at January 13th, 2010

    Jim, excellent “context” :)

    Back to the basics: since the data will “transcend” across the enterprise, it is the core definition, the all-serving “core” meta data, that should be the basis of your “golden view(s)”.

    Taking that to the next step, as we know, there will be a need for multiple “views” of that meta data, such as sending “gold information” to another application with its own meta data requirements.

    So I like your Transcendentalism philosophy for data.

    Oh and one more thing: Answering a question with a question…. yes always a good thing, we need to find the real question behind the question asked :)

    Thanks Jim for some great posts and insight.

  6. #6 by Monis Iqbal at January 13th, 2010

    Take the example of a database. From the perspective of the DB vendors and DB admins, the table statistics can be called meta-data.
    From the business domain perspective, the meta-data changes to what they expect in high level reports.

    I think it may or may not be a part of data quality, depending on what data (meta-data) it holds.

  7. #7 by Dalton Cervo at January 13th, 2010

    Hi Jim,

    Very good posting indeed! I think metadata is one of the most overlooked aspects of data management, and I think that’s because it is pretty darn hard!

    Metadata can potentially encompass so many levels. From a single data element on the database to a more complex entity, such as customer, for example, which will be a composite of other elements and/or entities. You mention revenue, which is another big one, with so many dependencies and context related issues.

    Metadata, in my opinion, is closely associated with quality of data and processes. I see lots of inconsistent workflows because of misinterpretation of the meaning of the data.

    You really got me thinking about metadata as a dimension. But, as much as I think it makes sense, I prefer to have it separate. Metadata is data as well, and as such, it is itself a fair game for data quality. You could potentially apply dimensions of quality to metadata. As much as I like recursive algorithms, I think that’s kind of a stretch here ;-)

    Furthermore, I see Metadata Management as a much bigger task when compared to other dimensions. You may even need a separate repository, track things such as source of data’s value, transformations performed, business rules applied, etc.

    I have more to say on this, but this comment is getting too long. Maybe I’ll add a post to complement your excellent thought provoking entry.

    Thanks!
    Dalton.

  8. #8 by Christopher Blotto at January 13th, 2010

    Jim, great post.

    I actually feel that your question could be reversed and looked at as: is DQ a dimension of metadata?

    Metadata is the hot commodity of 2010. The hot commodity evolution for Information Management has taken us through MDM, then Governance, to DQ, and now Metadata.

    Bottom line: users want all of the things that you led with: accuracy, completeness, transparency, and trust.

    Our methodology is to look at information processing then look at all of these critical components to achieve the desired outcomes. With the emergence of the semantic web, ontology modeling is going to bring knowing more about our data to the forefront of any information centric program. I see business processes driving information requirements, ontology modeling driving consensus across LOBs, metadata driving governance as well as information quality management (stewardship), with technology components such as data quality, integration, rules management, and MDM being enablers.
    Lineage traceability, to dynamically link metadata from an enterprise model aligned to process and repository (application, ODS, EDW…) centric metadata, will be imperative to truly enable operational stewardship, which is the ultimate enabler for DQ.

  9. #9 by Jim Harris at January 13th, 2010

    @Garnie Great points (and by the way, you have a really great name – or perhaps I just have a really boring name). Back on topic, I definitely agree with you that there are “core” metadata that the enterprise needs to share as a common foundation, as well as allowing multiple views providing the necessary flexibility for day-to-day operations – just as long as there are justifiable business reasons for doing so.

    You don’t want @Charles and me going off on our Battle of the “Single Version of the Truth” – well, if you do, then check out this link:
    http://www.ocdqblog.com/home/beyond-a-single-version-of-the-truth.html

    And I definitely like answering a question with a question – I was never able to shake the habit most of us developed as kids, where you keep asking “why?”

    @Monis (another really great name!) You make an excellent point about perspective and that not all metadata necessarily relates to data quality – such as the table statistics you mentioned, or operational metadata such as start, end, and duration for process runtimes.

    @Dalton (does everyone have a cooler name than mine?) Deep thought here: “Metadata is data as well, and as such, it is itself a fair game for data quality.” I call this the Cartesian Recursive Paradox – “I am meta-data, therefore I am data; therefore I exist within data quality.”

    @Christopher (okay, I might be able to compete with your first name, but your last name is way cooler than mine!) Metadata is not a dimension of data quality because data quality is a dimension of metadata. Ah – the ontological argument has arrived – wow, what’s with all the deep philosophy today? But seriously, ontology modeling is an excellent example of rich and pervasive enterprise metadata architecture sharing the collective business context needed to understand (and properly leverage) decision-critical enterprise information assets.

    Thanks everyone for your awesome comments – once again proving that your feedback is the best part of the blogging experience. :-)

  10. #10 by Monis Iqbal at January 13th, 2010

    @Jim thanks for the compliments on the name :)
    and btw don’t be too harsh on your name although I know you are kidding. After all, having a name like Joe doesn’t make you an Average Joe, right? :D

  11. #11 by Rayk at January 13th, 2010

    This may already have been mentioned, but I need to add my comment on this great post (and yes, I want warm feedback on my name as well).

    Currently I’m working on collecting requirements for a Meta data Management System (MDMS), because this is part of our relatively new Data Quality Program. I must say that I have some problems with this.

    IMHO it is not a solution to start collecting Meta data only to prove that something is wrong with your data. There must be something more and so I reversed the arguments:

    Let us introduce a MDMS, set up processes (including data owners etc.), maintain stuff properly, and spread expectations (valid values) and check routines into the relevant systems. The result is? Data quality.

    I totally agree with “quality is a dimension of metadata” (and would like to use this phrase). My expectation is that – among other things – Quality in Data is the result of active meta data management.

    Love you guys!

  12. #12 by Jim Harris at January 13th, 2010

    @Rayk (Yes, yet another fantastic name!) I wasn’t suggesting that the point of collecting metadata is simply to prove something is wrong with your data.

    It was more about clarifying the business context of the data in order to improve understanding and therefore, among other things, verify usage and help determine if an actual data quality issue exists – which, to do comprehensively, requires other “dimensions” of data quality beyond metadata.

    However, I don’t think we are really that far apart in what we are saying. Metadata and data quality are interrelated – and they are both intricately interrelated with all enterprise data initiatives.

    I have a tendency to see “data quality management” in all things and you, at least it appears to me, have a tendency to see “metadata management” in all things.

    Neither perspective is either right or wrong – but more likely, a matter of semantics.
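
As an illustration of the idea raised in comment #11 (metadata that carries expectations such as valid values, with check routines driven by it), here is a minimal Python sketch. It is not taken from any system discussed in this thread; the `FieldMetadata` class, the `check_record` function, and the sample customer fields are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """Hypothetical metadata record describing one data element."""
    name: str
    definition: str                          # agreed business meaning of the field
    required: bool = True
    valid_values: frozenset = frozenset()    # empty set means "any value allowed"


def check_record(record: dict, metadata: list) -> list:
    """Return human-readable issues found by checking a record against its metadata."""
    issues = []
    for meta in metadata:
        value = record.get(meta.name)
        if value is None or value == "":
            if meta.required:
                issues.append(f"{meta.name}: missing required value")
            continue
        if meta.valid_values and value not in meta.valid_values:
            issues.append(f"{meta.name}: {value!r} is not a valid value")
    return issues


# Assumed example metadata for a "customer" entity (illustrative only).
CUSTOMER_METADATA = [
    FieldMetadata("customer_type", "How the customer buys from us",
                  valid_values=frozenset({"retail", "wholesale"})),
    FieldMetadata("country", "Country of the billing address"),
    FieldMetadata("nickname", "Optional display name", required=False),
]

if __name__ == "__main__":
    good = {"customer_type": "retail", "country": "DE"}
    bad = {"customer_type": "VIP", "country": ""}
    print(check_record(good, CUSTOMER_METADATA))
    print(check_record(bad, CUSTOMER_METADATA))
```

The point of the sketch is only that the checks are derived from the metadata rather than hard-coded: changing an agreed definition or valid-value list changes what counts as a data quality issue, which is one way to read “quality is a dimension of metadata.”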
