Data Quality Management
Data quality refers to how closely your organization’s data meets the standards it must satisfy to be considered usable, or “fit for purpose.” The closer your data is to that standard, the higher its quality. As a process, data quality management defines the steps necessary to make a company’s data operational, enabling individuals and organizations to draw insights from it.
For example, your company might require a specific format for customer Social Security numbers (e.g. xxx-xx-xxxx vs. xxxxxxxxx). You could then grade the quality of those records by the share of entries that are correctly formatted. In this scenario, a data quality management process could reformat the records so that they all fit the same standard.
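The grading and reformatting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a production validator: the function names `format_score` and `normalize_ssn` are hypothetical, the dashed layout is assumed to be the target standard, and the sample values are made up.

```python
import re

# Assumed target standard: three digits, two digits, four digits, dash-separated.
SSN_DASHED = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")


def format_score(values):
    """Return the fraction of entries that already match the target format."""
    if not values:
        return 0.0
    ok = sum(1 for v in values if SSN_DASHED.fullmatch(v))
    return ok / len(values)


def normalize_ssn(value):
    """Reformat a value to the dashed layout.

    Dashes and whitespace are stripped first. If what remains is not
    exactly nine digits, the value cannot be repaired and None is returned.
    """
    digits = re.sub(r"[-\s]", "", value)
    if not re.fullmatch(r"[0-9]{9}", digits):
        return None
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


# Hypothetical sample records: one correct, one unformatted, one unusable.
records = ["123-45-6789", "123456789", "12-3456"]
print(format_score(records))
print([normalize_ssn(r) for r in records])
```

Returning `None` for values that cannot be repaired keeps them visible for manual review rather than silently dropping or guessing at them.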
Once a company defines its ideal data quality standards, users will implement data governance and quality management procedures to ensure those standards are met. While definitions for data quality can vary from organization to organization, six dimensions create a holistic view of a data set’s quality:
- Completeness. Does your dataset have all the necessary fields filled? If one of your customer datasets is missing phone numbers but still has dates of birth and first and last names, that set would be considered incomplete.
- Validity. Customer phone numbers that are disconnected, phony email addresses, and incorrect postal addresses can all lower your data quality. Validity checks verify that the data conforms to a particular format, data type, and range of values.
- Timeliness. Timeliness has to do with how up-to-date your data is. Data that isn’t updated in real time or has been sitting in the data lake for too long can be considered unreliable and low quality.
- Uniqueness. Uniqueness measures how much duplicate data is in a given data set, either within any particular column or as whole records.
- Accuracy. Accuracy measures the extent to which recorded data represents the truth, that is, how few errors the data contains. What counts as 100% accuracy for your organization will usually be defined in your data governance program.
- Consistency. If you have conflicting data in two different systems, that data is considered inconsistent.
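Several of these dimensions can be expressed as simple ratios between 0 and 1. The sketch below is one possible way to score them with the Python standard library. All function names, field names, the email pattern, and the 90-day freshness threshold are assumptions made for illustration; real thresholds would come from your own standards. Accuracy is left out because measuring it requires a trusted reference source to compare against.

```python
import re
from datetime import date

# Deliberately loose email pattern, assumed for illustration only.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def completeness(records, fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    full = sum(
        1 for r in records if all(r.get(f) not in (None, "") for f in fields)
    )
    return full / len(records)


def validity(records, field, pattern):
    """Share of non-empty values in `field` that match the expected pattern."""
    values = [r[field] for r in records if r.get(field)]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


def uniqueness(records, key):
    """Share of distinct values in `key`; 1.0 means no duplicates."""
    values = [r.get(key) for r in records]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def timeliness(records, field, today, max_age_days):
    """Share of records whose date in `field` is at most `max_age_days` old."""
    if not records:
        return 0.0
    fresh = sum(1 for r in records if (today - r[field]).days <= max_age_days)
    return fresh / len(records)


def consistency(system_a, system_b):
    """Share of IDs present in both systems whose values agree."""
    shared = system_a.keys() & system_b.keys()
    if not shared:
        return 0.0
    return sum(1 for k in shared if system_a[k] == system_b[k]) / len(shared)


# Hypothetical sample data.
customers = [
    {"id": 1, "email": "ana@example.com", "phone": "555-0100", "updated": date(2024, 5, 1)},
    {"id": 2, "email": "not-an-email", "phone": "", "updated": date(2023, 1, 15)},
    {"id": 2, "email": "bo@example.com", "phone": None, "updated": date(2024, 4, 20)},
    {"id": 3, "email": "cy@example.com", "phone": "555-0102", "updated": date(2024, 5, 10)},
]

print("completeness:", completeness(customers, ["email", "phone"]))
print("validity:", validity(customers, "email", EMAIL_RE))
print("uniqueness:", uniqueness(customers, "id"))
print("timeliness:", timeliness(customers, "updated", date(2024, 6, 1), 90))
```

Scoring each dimension separately makes it easier to see which kind of problem dominates a dataset, rather than collapsing everything into a single number.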
The value of data quality management is best represented by the risks it helps your organization avoid. Having low-quality data can result in severe business, financial, and legal risks for any company.
Low-quality sales data can waste your marketing budget. You can incur expensive fines for failing to comply with regulations governing Personally Identifiable Information (PII). Your eCommerce system could fail due to inaccurate inventory data. A data quality management system typically proves much more affordable than the consequences of bad data.