Want to know where to get the most “bang-for-your-buck” with respect to data quality investment?
Dollar for dollar it has to be the source of the information “river” for every organisation – data entry.
The typical information river will start with small trickles of data from web forms, local applications and data entry processes which gradually merge into larger tributaries until they become wide rivers flowing into our data warehouses and other downstream data stores.
But is your organisation investing the right level of data quality resources at the creation points of these information flows?
Organisations regularly invest in downstream data quality solutions but pay scant regard to the quality of the processes, applications and people that create the information at source.
A good example of this is web forms.
Graham Rhind, one of the most active members in the data quality community, is presently waging a war on poorly designed web forms. These are a major source of information creation and are symptomatic of a problem that most organisations face: lack of standards, training and measurement in their data entry processes.
(Incidentally, Graham has recently released an excellent free ebook on how to manage web forms proactively; more details here. My advice is to download it and distribute it to those who create web forms in your business.)
Having worked on many data quality projects, I find the lack of involvement of data entry workers in the data quality life cycle frustrating, along with the general apathy towards data entry standards. I continue to witness organisations investing hundreds of thousands of dollars in data quality technology and absolutely nothing in root-cause prevention.
So, here is a simple checklist to see if your data entry processes are well managed. This is not exhaustive but it is certainly a starting point:
- Does every data entry worker receive basic data quality training?
- Are all key data entry gateways subject to data quality monitoring?
- Is there a feedback mechanism to help data entry workers raise concerns about the quality of the process, application, training, etc.?
- Are all key data entry gateways assigned to a data steward?
- Does your company have a data entry policy dictating standards for data entry process design?
- For external data entry (e.g. web forms), is there a simple means to gather feedback or provide support?
- Are downstream data quality defects traced back to the source or cleansed at their downstream location?
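To make the last checklist item concrete, here is a minimal sketch of tracing downstream defects back to their entry point. It assumes each defect logged downstream carries a `source` field naming the gateway that created the record; the field names and the sample gateways (`web_signup_form`, `call_centre_app`) are hypothetical and only there for illustration.

```python
from collections import Counter

def defects_by_source(defects):
    """Count defect records per originating entry point, most frequent first."""
    counts = Counter(d["source"] for d in defects)
    return counts.most_common()

# Hypothetical defect log: in practice this would come from your
# downstream data quality monitoring, not a hard-coded list.
sample = [
    {"source": "web_signup_form", "defect": "missing postcode"},
    {"source": "call_centre_app", "defect": "invalid phone"},
    {"source": "web_signup_form", "defect": "invalid email"},
]

for source, count in defects_by_source(sample):
    print(f"{source}: {count} defect(s)")
```

Even a simple tally like this shows which data entry gateways deserve a steward's attention first.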
If you’re about to invest in data quality, ensure you include root-cause prevention in your future plans.
Think about the people and process aspects of data entry.
Get the data entry workers engaged and transform them into a fledgling data quality community. Educate and support them. Listen to them: they will tell you more about your data and its issues than any data quality consultant ever could.
Look at your data entry processes and standards.
What standards?
Exactly – now is the time to create some.
Start with the basics – all forms must be tested by users and customers, not solely by software developers. The design phase must also have representation from user groups.
Build in prevention. If you have data quality tools then move them higher up the information chain. Get closer to the source.
By all means use cleansing and transformation in downstream processes; Rome wasn’t built in a day. However, ensure that you have a simple feedback loop so that these transformations are passed to the data entry designers in the form of preventative specifications.
Or what about re-use? Why not design web services using your cleansing and transformation functions to help prevent dirty data right at the point of creation?
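As a rough illustration of the re-use idea, the sketch below takes one cleansing rule and applies it at the point of entry as well as downstream. Everything here is an assumption made for the example: `cleanse_email` stands in for whatever cleansing logic you already run downstream, `check_at_entry` is a hypothetical function a web form or web service could call, and the email pattern is deliberately loose rather than a full address validator.

```python
import re

# Hypothetical cleansing rule, standing in for logic that already runs downstream.
def cleanse_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase the address."""
    return raw.strip().lower()

# Deliberately loose shape check: one "@", no spaces, a dot in the domain.
_EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def check_at_entry(raw: str) -> dict:
    """Apply the same rule at the point of creation and report the outcome."""
    cleaned = cleanse_email(raw)
    if not _EMAIL_SHAPE.match(cleaned):
        # Reject early so the person entering the data can correct it.
        return {"ok": False, "value": raw, "message": "Please check the email address."}
    return {"ok": True, "value": cleaned, "message": ""}

print(check_at_entry("  Jane.Doe@Example.COM "))
print(check_at_entry("jane.doe@"))
```

Because the form and the downstream process share a single function, a rule only has to be fixed in one place, and bad values are caught while the person who typed them can still correct them.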
Establish stewards for the data entry gateways. A steward should be accountable AND able to act. Don’t just select a nominal data owner who is powerless to enforce change or lacks the motivation to push back and improve the situation.
In my experience, data quality becomes far easier (and cheaper) to manage when you shift the focus upstream. It can also be a lot more rewarding, as you improve not only the quality of data but also the quality of people’s lives. If you think cleansing data in a downstream data warehouse is tedious, imagine having to deal with unwieldy data entry applications, irate customers and constant rework activities.
By shifting focus to upstream data quality you get a double whammy: satisfied staff = satisfied customers.
So if you’re looking for greater data quality ROI in a tough economy, moving your focus upstream could be a smart investment.


