Use statistics about data characteristics to uncover trends and commonalities in corporate information.
You can learn a lot about your data just by reviewing some basic statistics about it. DataFlux data profiling solutions give you a variety of statistical information, including minimum and maximum values, mean, median, mode and standard deviation, to help you assess the validity of your data.
The figure above shows statistical data about personal home loan values from a financial organization. Personal home loans normally range from $20,000 to $1,000,000. A loan database with incorrect loan amounts can lead to many problems, from poor analysis results to incorrect billing of the loan customer. Let's take a look at some basic statistics from a loan amount column in the loan database.
- The minimum value of a loan is a negative value.
- The maximum value for a loan is $9,999,999.
- Two loans have missing values (null count).
- The median and standard deviation are unexpectedly large numbers.
All of these indicate potential problems in a personal home loan data file and would lead a business analyst to explore them in more detail and begin building data correction routines. Additionally, tracking basic statistics as new data enters your systems gives you insight into the characteristics of that data. This can alert you to inconsistent information and help prevent problematic data from being added to a data source.
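To make the idea concrete, here is a minimal sketch in Python of the kind of column profile described above. It is not DataFlux code: the function names `profile_column` and `out_of_range` and the sample values are hypothetical, invented for illustration. Only the expected range of $20,000 to $1,000,000 comes from the text.

```python
import statistics

# Expected range for personal home loans, as stated in the text.
EXPECTED_MIN = 20_000
EXPECTED_MAX = 1_000_000


def profile_column(values):
    """Return basic statistics for a column; None stands for a missing value.

    Assumes at least two non-null values, since the sample standard
    deviation is undefined otherwise.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        "std_dev": statistics.stdev(present),
    }


def out_of_range(values, low=EXPECTED_MIN, high=EXPECTED_MAX):
    """Return the non-null values that fall outside the expected range."""
    return [v for v in values if v is not None and not (low <= v <= high)]


# Hypothetical loan amounts, invented for illustration only.
loan_amounts = [150_000, 250_000, -5_000, None, 9_999_999, 250_000, None, 80_000]

profile = profile_column(loan_amounts)
suspect = out_of_range(loan_amounts)

for name, value in profile.items():
    print(f"{name}: {value}")
print("out of range:", suspect)
```

Running the same two functions on each new batch of records before it is loaded is one simple way to track these statistics over time: a negative minimum, a maximum far above the expected ceiling, or a jump in the null count signals that the batch needs review.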
