DataFlux - The Leader in Data Quality and Data Integration

The DataFlux Community of Experts is a forum for industry thought leaders to provide perspective and insight and engage in discussions on issues surrounding data governance, data quality, data integration and master data management.

Alexander Pope, Indifference, and Data Atrocities Revisited

Alexander Pope once wrote, “To err is human, to forgive is divine.” Lamentably, making errors is all too human, as I wrote three months ago in a post about data atrocities. In this post, I’m going to take issue with Pope’s quote.

Poles Apart

A few years ago, I consulted on a system upgrade for a large organization. I worked primarily with two people: an incredibly diligent end user (call her Nina) and an incredibly aloof one (call her Maggie). Nina worked in Finance. She took her job seriously, identified issues, and did what she said she’d do. We worked really well together. On day one of the project, I taught a class on the new functionality of the system. She caught on quickly. Throughout the project, she raised legitimate issues and worked to understand the new version of the system. In all my years of consulting, I never worked with a better client end user than Nina.

Maggie did not share her counterpart’s enthusiasm or even general interest. She didn’t pay much attention during training, during testing, or (ultimately) when we went live.

Read the rest of this entry »

The Fiscal Calendar Effect

Since today is September 1, I decided to revisit my previous blog post Eternal September and Tacit Knowledge (ironically published in February), which was about the need to formalize your organization’s explicit knowledge as well as the more complex challenge of finding ways to share your organization’s tacit knowledge — the collective wisdom of employee experience.

Eternal September is so named because September marks the beginning of the academic year for most colleges and universities, and with it the arrival of new freshmen (i.e., first-year students) requiring orientation.

Eternal September is just one of many calendar effects, which are changes in behavior that appear to be related to changes on the calendar.

Calendar effects are sometimes referred to as seasonal tendencies. Although they can have other contexts, they are mostly discussed in an economic context, such as their effects on stock markets and retail prices.

For the purposes of this blog post, I want to focus on the Fiscal Calendar Effect, specifically how it affects enterprise information initiatives.

Read the rest of this entry »

Joins that Make Sense

I have had clients who created views just to bring together data from various databases (which does NOT work very well!).  I have also had clients who had absolutely no views at all. I choose to be in the middle. 

I create views when a view makes sense.  For example, if I create a view to join two tables that are always joined together in a SQL statement, it can actually give the query a bit of a performance kick.
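
To make the idea concrete, here is a minimal sketch of a view that wraps a join between two tables that are always queried together. Everything in it is an assumption for illustration: the table names (customers, orders), the view name (customer_orders), the sample rows, and the choice of SQLite through Python's built-in sqlite3 module. It shows only the convenience of writing the join once; whether a view changes query performance depends on the database engine, and a plain SQLite view is simply a stored query.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables and a view joining them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            amount      REAL NOT NULL
        );
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0);

        -- The join is written once, here, instead of in every query.
        CREATE VIEW customer_orders AS
            SELECT c.customer_id, c.name, o.order_id, o.amount
            FROM customers AS c
            JOIN orders AS o ON o.customer_id = c.customer_id;
        """
    )
    return conn


def totals_by_customer(conn: sqlite3.Connection) -> list:
    """Query the view as if it were a single table and total the order amounts per customer."""
    rows = conn.execute(
        "SELECT name, SUM(amount) FROM customer_orders GROUP BY name ORDER BY name"
    )
    return [(name, total) for name, total in rows]


if __name__ == "__main__":
    print(totals_by_customer(build_demo_db()))
```

Queries against customer_orders read like queries against one table, which is the point of a view that "makes sense": the join logic lives in one place and every report reuses it.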

Read the rest of this entry »

Data Governance and Data Quality: The HR Analogy

For many years there was no real talk of data governance. If you look at some of the leading data quality texts of the last 10 years, you could argue that data governance wasn’t covered in any real depth as a distinct discipline, yet today its acceptance is commonplace.

Entire conferences, careers and technologies are now devoted to data governance, but one of the recurring questions is: how does data governance relate to data quality management?

Sure, you can create a list of all the activities that exist in each discipline (trying to get an industry agreement is somewhat harder), but when explaining these disciplines to those outside the profession, a simple analogy becomes very important.

The easiest way I’ve found to explain this relationship is to compare it to the relationship between the Human Resources (HR) function and the other departments in any organisation.

Let’s look at the high-level goals of a typical HR department:

  • Increased productivity
  • Legal compliance
  • Competitive advantage

Those goals are manifested through a Human Resources policy framework that spans the entire organisation. Each department or organisational unit will receive a common set of HR policies as well as a tailored set of directives to cope with the particular human resources needs of that area. For example, the executive team might have to follow a very different set of policies compared to an engineering team that works down a mine shaft all day.

Read the rest of this entry »

Half Measures

I’ve written before on this site about Breaking Bad, the fascinating show on AMC about a high school science teacher with an interesting side job manufacturing crystal meth. In When Data Quality Breaks Bad, I described events and trends that cause data quality to, well, break bad.

Today, I’d like to revisit the show in the context of half measures, the title of one of the best episodes of the previous season. For the purposes of this post, half measures represent compromises made during data quality and cleanup endeavors. For example, let’s say that an organization’s systems contain years of suspect data. It intends to clean up what it needs, purge what it doesn’t, and archive all of it in the event of an unexpected issue. Of course, things break bad in the form of delays, exceeded budgets, politics, and all other sorts of fun stuff.

Specific half measures may include:

  • Cleaning up some of the data
  • Postponing parts of the data cleanup efforts
  • Taking a “wait and see” approach as more issues are unearthed

So, when are half measures appropriate?

Read the rest of this entry »

Data Quality in Medias Res

Planning and executing enterprise information initiatives is definitely not easy.

Building the business case involves identifying, documenting, verifying and refining a set of requirements that are representative of the various perspectives of the business and technical stakeholders throughout the organization.

Many such initiatives begin with the very best of intentions, and sometimes with a grand vision of delivering a data-driven solution to every business problem.

But the sobering reality of limited resources, especially financial resources, forces the practical compromises of selecting what can be delivered in a reasonable timeframe and within a reasonable budget.

Therefore, some requirements get deferred to future phases of the initiative, and the scope of the initial deliverable gets established. 

Read the rest of this entry »

Master Data Consolidation and Semantic MDM – The Final Chapter?

Actually, if you have been following this stream of blog posts, you might notice that even though the titles all refer to master data consolidation, I have really been talking about master data modeling. But to some extent, can’t we say that modeling and consolidation are tightly coupled? If we don’t have a reasonable model for representing master data, how can we develop our data integration strategy to migrate data from data sources into the master repository?

The ideal approach would be to identify the core entities, then the roles those entities can play, all within the business contexts associated with operational and analytical business processes. Of course, this transcends the typical master data consolidation approach, which is to dump data into a single repository. It requires some up-front considerations, largely focusing on a few key items:

  • Business process model, which describes the business processes, participants, and the roles they play, and
  • Semantics, which captures the meanings and hierarchies and ontologies and taxonomies and all sorts of other key principles associated with the ways the business engages those participants.

So master data modeling is less about the data dump and more about a considered representation of interacting parties within a well-defined information ecosystem. Is this really the final chapter? Actually, no, because it leads into deeper thoughts regarding business modeling. And from that perspective, a semantic approach to MDM yields more than a blueprint for data consolidation. Rather, it provides a business artifact that more accurately documents the inter-relationships among master data concepts and entities within the organization while capturing critical semantic knowledge that often gets lost in the shuffle.

More Data Please…

Leaving politics aside, the Dodd-Frank Wall Street Reform and Consumer Protection Act, passed here in the US in July, is going to require financial institutions to report more data to the federal government than they have reported in the past. After some Google-Kung-Fu, it looks like financial institutions have to comply with a newer version of a guideline called the Home Mortgage Disclosure Act (HMDA) (http://www.ffiec.gov/hmda/guide.htm), and smaller businesses are required to report data under a guideline called ECOA. A quick glance at the HMDA website suggests that the reporting guidelines have changed almost every year, so this shouldn’t be significant news to those folks who need to supply data to Uncle Sam each year.

Read the rest of this entry »

The Softer Side of Selling Data Quality

I was reading a post on a data quality forum recently in which someone was attempting to sell a data quality solution and was looking for tips on ROI models.

This is a tough question to answer, obviously, as it’s entirely subjective, but one thing often missing from the data quality sales process is a focus on the personal motivations of the decision-maker and stakeholders.

This plays out even when the pitch comes from an internal group rather than an external solution provider.

Do personal and corporate motivators align?

Take a typical customer manager in a large utilities organisation; he may be the ultimate decision-maker and stakeholder for data quality improvement.

Knowledge workers in the data boiler room may notice they’re spending an increasing amount of time removing duplicated customer records, so they flag up the need for either process improvement or perhaps technology acquisition.

Read the rest of this entry »

Big Data, Small Data

I’ve been thinking a great deal about data with respect to company size. In this post, I’ll address one question: Is it better to have big data and small tools or small data and big tools?

The Big Data, Small Tools Argument

To be sure, “small” tools like Microsoft Access and Excel have their limitations. That’s not to say that they can’t still be really useful. For example, on a massively complicated project a few years ago, I had to aggregate, dissect, then re-aggregate probably about 100,000,000 records from about ten different systems. I nearly broke Access and Bill Gates himself sent me a restraining order.

OK, I made that last part up.

I had to use Access because the company didn’t have a realistic alternative; I needed to create a simple, user-friendly front end with forms and buttons invoking complex macros and VBA working their magic behind the scenes. Of course, my audience didn’t need to know about the development of this mini-application; they just needed an answer to their dilemma. Access worked because they could not have handled running SQL statements - not that IT would have been remotely comfortable with that. All things considered, I had to use a “small” tool for really big data.

Read the rest of this entry »
