Validate the data that you have and understand whether existing data exceeds pre-set data quality limits.

DataFlux provides frequency counts and outlier detection techniques that automate data validation. By validating the data that you have, and finding data points that fall well outside acceptable limits, you can avoid the considerable cost typically spent on manual data validation. Frequency counts also reduce the amount of manual fault detection that business analysts must perform. In essence, these techniques highlight the data values that need further investigation.

Outlier detection helps you:

  • Gain insight into data values
  • Identify data values that may be considered incorrect
  • Drill down into the data to make a more in-depth determination about it

For example, a database of customer information might contain several representations of the same state. In many data sources, California is represented as “CA,” “CA.,” “Ca.,” and “California.” Non-standard representations complicate any future state-level analysis. DataFlux technology contains rules to recognize these state entries, so the software lets you consistently identify and contact individuals listed under any of these state representations.
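To make the idea of a frequency count concrete, here is a minimal sketch in Python. It is not the DataFlux implementation; the sample values and the helper names `frequency_report` and `rare_values` are hypothetical and chosen only for illustration. It tallies how often each raw state value occurs and lists the rarely used spellings as candidates for review.

```python
from collections import Counter

# Hypothetical sample of raw state values; not taken from any real data source.
states = ["CA", "CA.", "Ca.", "California", "CA", "NY", "CA", "N.Y.", "NY"]

def frequency_report(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()

def rare_values(values, max_count=1):
    """Return values occurring at most max_count times, sorted for review."""
    return sorted(v for v, n in Counter(values).items() if n <= max_count)

for value, count in frequency_report(states):
    print(f"{value!r}: {count}")

print("review:", rare_values(states))
```

In this sample, the dominant spelling "CA" rises to the top of the report, while the one-off variants surface in the review list. A real standardization step would then map those variants to a single representation.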

Look for outliers to find data outside of tolerance

Outlier detection also helps you pinpoint problematic data. Whereas frequency counts look at how often each value occurs, outlier detection examines the data values that are remarkably different from other values. An outlier report shows you the highest and lowest values for a set of data. This technique is useful for both numeric and character data.

This outlier report shows the 10 minimum and 10 maximum values for a field.

Consider this outlier report (showing the 10 minimum and 10 maximum values for the field). The field under analysis is product weight, measured in ounces, for individual-serving microwaveable meals. A business analyst might understand that the valid weights are between 16 and 80 ounces.

However, the report indicates that there are many outliers on both the low end and the high end. On the low end (0, 0.03, etc.), the values were likely entered in pounds instead of ounces. On the high end, values such as 715 or 15552 may be case or pallet weights instead of individual serving weights. Outlier detection from DataFlux allows you to quickly determine whether there are gross inconsistencies in certain data elements, and helps you drill through to the actual records and begin to create a defined process to correct the data. The first step to consistent, accurate, and reliable data is to understand the current anomalies in the data.
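The kind of report described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the DataFlux implementation. The function names `extremes` and `out_of_tolerance` are hypothetical, and the sample weights are illustrative: only 0, 0.03, 715, and 15552 come from the text, and the rest are made up. The 16 to 80 ounce tolerance is the analyst's range from the example.

```python
import heapq

# Illustrative product weights in ounces; the extreme values echo the ones
# mentioned in the text, the rest are made up for this sketch.
weights = [0, 0.03, 16, 18.5, 24, 32, 40, 64, 80, 715, 15552]

def extremes(values, n=10):
    """Return the n smallest and the n largest values."""
    return heapq.nsmallest(n, values), heapq.nlargest(n, values)

def out_of_tolerance(values, low=16, high=80):
    """Return values that fall outside the inclusive range [low, high]."""
    return [v for v in values if v < low or v > high]

lowest, highest = extremes(weights, n=3)
print("lowest:", lowest)
print("highest:", highest)
print("out of tolerance:", out_of_tolerance(weights))
```

Listing the extremes shows where to look, and the tolerance check turns the analyst's domain knowledge into a rule that flags the suspect records for correction.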