DataFlux - The Leader in Data Quality and Data Integration

Data Quality, Improvement, Pareto, and Diminishing Returns

David Loshin

September 23, 2009

Thought experiment: I am looking at a process that has 4 problems that all together reduce my efficiency by 50%. After doing some analysis I have come up with this assessment:

  • Problem 1 accounts for 25% of the efficiency reduction
  • Problem 2 accounts for 12% of the efficiency reduction
  • Problem 3 accounts for 8% of the efficiency reduction
  • Problem 4 accounts for 5% of the efficiency reduction

I have a solution for each one of these problems, each costing $10 to fix.

When I fix problem 1, my cumulative spend is $10, and I can account for fixing 50% of the problem (right? – 25/50!). Pretty good, huh?

When I fix problem 2, my cumulative spend is $20, and I can account for fixing 74% of the problem (37/50). Still pretty good.

When I fix problem 3, my cumulative spend is $30, and I can account for fixing 90% of the problem (45/50). But now that I am thinking about it, each time I fix the next problem, the relative investment goes up. Basically, fixing problem 3 cost the same as fixing problem 1, but accomplished about 1/3 of the value (8 points versus 25). In addition, the aggregate cost goes up linearly, but the average value goes down: fixing only the first problem got me a pretty good bang for my buck (50% of the inefficiency for $10), but that has eroded to an average of about 30% of the inefficiency for each $10 spent.

Yes, I am playing with some numbers here, but actually I am demonstrating the concept of diminishing returns and Pareto’s principle, which together suggest that there is some point where the incremental value you get is not worth the investment. This should be considered when reviewing options for instituting systemic improvements (such as data cleansing, improved matching, etc.), where the level of effort to get an incremental improvement may be greater than the value generated by having the improvement.
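For anyone who wants to replay the arithmetic, here is a minimal Python sketch of the thought experiment. The impact figures and the $10 cost come from the numbers above; the function name cumulative_returns and the row layout are just illustrative choices, not part of any tool.

```python
# Efficiency lost to each problem, in percentage points (figures from the post).
IMPACTS = [25, 12, 8, 5]
COST_PER_FIX = 10  # dollars; every fix costs the same in the thought experiment


def cumulative_returns(impacts, cost_per_fix):
    """Return one row per fix, applied in the order given.

    Each row is (fix number, cumulative spend, cumulative share of the
    inefficiency fixed, share fixed by this step alone, average share per fix).
    """
    total = sum(impacts)
    rows = []
    fixed = 0
    for n, impact in enumerate(impacts, start=1):
        fixed += impact
        spend = n * cost_per_fix
        share = fixed / total      # e.g. 37/50 after the second fix
        marginal = impact / total  # what this $10 bought on its own
        rows.append((n, spend, share, marginal, share / n))
    return rows


for n, spend, share, marginal, average in cumulative_returns(IMPACTS, COST_PER_FIX):
    print(
        f"fix {n}: spend ${spend}, fixed {share:.0%}, "
        f"this step {marginal:.0%}, average per fix {average:.0%}"
    )
```

The "this step" column is the one to watch: the spend rises by the same $10 each time, while the share of the inefficiency that each additional $10 buys keeps shrinking. Running it one step further than the walk-through, the fourth fix brings the total to 100% at $40 but contributes only 5/50 on its own.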


  1. #1 by Ken O'Connor at November 5th, 2009

    Hi David,

    Excellent post.

    One of the challenges we face in the Data Quality profession is building the business case for Data Quality improvement. In your example, you are fortunate enough to have monetary values on the cost of each of the 4 problems. It is difficult to argue against applying the 80:20 rule in the scenario you paint.

    But what about the impact on other parts of the Enterprise that may depend on the same data, such as:

    1. Data feeds into regulatory systems (e.g. Anti Money Laundering, Basel II, Solvency II etc.)
    2. Access from or data feed into CRM system
    3. Access from or data feed into Business Intelligence system
    4. Ad hoc provision of data to satisfy regulatory requests
    5. Increasingly – feeds to and from other organisations in the supply chain

    You could perhaps argue that the 80:20 principle extends beyond the specific process impacted, into the above processes…

    I believe we need to educate business managers on the need for data to become reusable and interchangeable, in fact to become “plug and play”.

    For more on the above – see: Lego Blocks and data quality

    http://kenoconnordata.wordpress.com/2009/10/22/lego-blocks-and-data-quality/
