Data quality management often concentrates on prevention – instituting inspection and monitoring for potential anomalies in order to eliminate the sources that introduce erroneous data. However, in some situations the organization does not exercise administrative control over the data used by its business applications – data sourced from third-party providers, data entered by external parties, or data generated by flawed automated processes.
In these cases, it is difficult, if not impossible, to prevent errors from entering the environment, and to maintain high-quality data the data management practitioner may need to rely on data cleansing techniques. Parsing and standardization are complementary techniques that match data values against known patterns in order to map those values to standard formats, identify (and potentially correct) errors, and ultimately normalize the data so that it can be used more effectively within business processes.
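As a simple illustration of the idea – not a DataFlux-specific implementation – the sketch below parses free-form U.S. phone numbers against a hypothetical known pattern and rewrites recognizable values into a single standard format, while values that cannot be parsed are flagged rather than silently passed through:

```python
import re

# Hypothetical pattern for U.S. phone numbers written in a variety of
# common formats: "919.447.3000", "(919) 447 3000", "9194473000", ...
PHONE_PATTERN = re.compile(r"^\s*\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})\s*$")

def standardize_phone(raw: str):
    """Return the value in the standard (NNN) NNN-NNNN form, or None if it cannot be parsed."""
    match = PHONE_PATTERN.match(raw)
    if match is None:
        return None  # unparseable: flag for review instead of loading as-is
    area, exchange, line = match.groups()
    return f"({area}) {exchange}-{line}"

if __name__ == "__main__":
    for value in ["919.447.3000", "(919) 447 3000", "9194473000", "call me"]:
        print(f"{value!r} -> {standardize_phone(value)!r}")
```

Real parsing and standardization engines work from much richer libraries of patterns and reference data, but the principle is the same: recognize the structure of an incoming value, then rewrite it into the agreed-upon standard representation.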
In this paper we look at common data error paradigms: descriptions, examples, and the ways data set quality is affected by the root causes that introduce errors. We then consider aspects of metadata management that help limit the scope of introduced errors, and how that metadata is used by parsing and standardization utilities to normalize data. Last, we’ll explore how parsing and standardization techniques can be integrated into the application framework as a “data quality firewall” that identifies potential errors as data enters the environment.
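To make the “data quality firewall” notion concrete, the following sketch shows one way a pattern-based check might be applied at the point where records enter the environment; the field names and ZIP-code pattern are hypothetical and chosen only for illustration:

```python
import re

# Hypothetical firewall check: incoming records whose ZIP code cannot be
# parsed are quarantined for review instead of being loaded downstream.
ZIP_PATTERN = re.compile(r"^\s*(\d{5})(?:-(\d{4}))?\s*$")

def admit_record(record: dict):
    """Return (accepted, record), standardizing the ZIP code on the way in."""
    match = ZIP_PATTERN.match(record.get("zip", ""))
    if match is None:
        return False, record  # fails the firewall check
    base, plus4 = match.groups()
    record["zip"] = base if plus4 is None else f"{base}-{plus4}"
    return True, record

incoming = [{"name": "Ann", "zip": " 27513 "}, {"name": "Bob", "zip": "275l3"}]
accepted = [rec for ok, rec in map(admit_record, incoming) if ok]
quarantined = [rec for ok, rec in map(admit_record, incoming) if not ok]
```

The second record, with a lowercase “l” typed in place of a “1”, never reaches the accepted set; it is routed to a quarantine area where it can be corrected or investigated.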