What Is Data Quality and Why Is It Important?
It is tempting to believe that data and the management of its quality are something new, brought about by regulations such as ePrivacy and the EU GDPR. They are not. Data, its management, and its quality have been around since information was first recorded: when we started writing things down.
Let's start with a definition of data quality:
We could go further, talking about how data quality is a process by which data becomes operational, enabling individuals and organizations to draw insights from the data which will inform their decision-making.
The reason we describe data quality (DQ) as a process rather than a single item is that it comprises various elements that all contribute to making data “fit for purpose”. Sometimes people use the term Data Preparation to refer to these elements, though for now data prep should be considered separate.
What are the dimensions of data quality?
Sitting underneath the umbrella term of Data Management, DQ takes a holistic view of an entire dataset, combining these elements – often called the dimensions of Data Quality – to provide a snapshot of the quality of data held.
Are there gaps in the data and if so, where? Some gaps are worse than others and what is considered a gap depends on the process where the data is used. For example, if the billing department requires both phone number and email address, then no record missing one or the other can be considered complete. You can also measure completeness for any particular column. Profiling your data will uncover these gaps.
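As a rough illustration of the completeness check described above, here is a minimal Python sketch. The sample records, the column names, and the rule that `None` or a blank string counts as a gap are all assumptions made for the example, not part of any particular tool.

```python
# Hypothetical sample records; None or a blank string counts as a gap.
records = [
    {"name": "Ann", "phone": "555-0100", "email": "ann@example.com"},
    {"name": "Bob", "phone": None, "email": "bob@example.com"},
    {"name": "Cy", "phone": "555-0102", "email": ""},
    {"name": "Di", "phone": "555-0103", "email": "di@example.com"},
]


def is_filled(value) -> bool:
    """A value is present if it is not None and not blank."""
    return value is not None and str(value).strip() != ""


def column_completeness(rows, column) -> float:
    """Share of rows with a filled value in one column."""
    if not rows:
        return 0.0
    return sum(is_filled(r.get(column)) for r in rows) / len(rows)


def record_completeness(rows, required) -> float:
    """Share of rows in which every required column is filled."""
    if not rows:
        return 0.0
    complete = sum(all(is_filled(r.get(c)) for c in required) for r in rows)
    return complete / len(rows)


# Completeness of a single column.
phone_rate = column_completeness(records, "phone")

# Completeness as the billing department defines it: phone AND email.
billing_rate = record_completeness(records, ["phone", "email"])
```

Note that the two numbers answer different questions: the first measures one column, while the second reflects the billing process, where a record missing either field is not complete.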
Are the postcode records you hold in a valid format? How confident are you that the email and postal addresses in your database are capable of receiving mail? Validity checks verify that the data conforms to a particular format, data type, and range of values.
Since data-driven automation is so important nowadays, data has to be valid to be accepted by processes and systems that expect it.
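To make the three kinds of validity check (format, data type, range) concrete, here is a small Python sketch. The email pattern is deliberately simplified, and the 5-digit postcode format and the 0 to 120 age range are assumptions chosen for illustration; real rules depend on your country and your process.

```python
import re

# Deliberately simplified patterns, assumed for this example only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
POSTCODE_RE = re.compile(r"^\d{5}$")  # assumed 5-digit postcode format


def validate(record: dict) -> list:
    """Return a list of validity problems found in one record."""
    errors = []

    # Format checks
    if not EMAIL_RE.match(str(record.get("email", ""))):
        errors.append("email: bad format")
    if not POSTCODE_RE.match(str(record.get("postcode", ""))):
        errors.append("postcode: bad format")

    # Data type check, then range check
    age = record.get("age")
    if not isinstance(age, int) or isinstance(age, bool):
        errors.append("age: wrong data type")
    elif not 0 <= age <= 120:
        errors.append("age: out of range")

    return errors


good = validate({"email": "ann@example.com", "postcode": "90210", "age": 34})
bad = validate({"email": "not-an-email", "postcode": "1234", "age": 150})
```

A record that passes these checks is valid, but that says nothing yet about whether it is true; a well-formed email address can still belong to someone else.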
Is new information entering your CRM every day in real-time or are you manually importing it? How often is the data “refreshed”? Timeliness is a crucial dimension because of the increasing need for up-to-date data.
Similar to other dimensions, timeliness is user-defined. One kind of data needs to be available on a quarterly basis for financial reporting. Other data must not be older than 5 minutes for real-time analytics.
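Because timeliness is user-defined, one way to check it is to keep a maximum allowed age per kind of data and compare it with the time of the last refresh. The sketch below assumes two hypothetical data kinds and treats a quarter as roughly 92 days.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical, user-defined freshness limits per kind of data.
MAX_AGE = {
    "financial_report": timedelta(days=92),    # roughly one quarter
    "realtime_metrics": timedelta(minutes=5),
}


def is_timely(kind: str, last_refreshed: datetime, now: datetime) -> bool:
    """True if the data was refreshed within the limit for its kind."""
    return now - last_refreshed <= MAX_AGE[kind]


# A fixed "now" keeps the example reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

fresh = is_timely("realtime_metrics", now - timedelta(minutes=3), now)
stale = is_timely("realtime_metrics", now - timedelta(minutes=10), now)
quarterly_ok = is_timely("financial_report", now - timedelta(days=30), now)
```

The same 30-day-old data set is perfectly timely for quarterly reporting and hopelessly stale for real-time analytics.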
Do you have the same customer recorded twice in your data set? Uniqueness measures how much duplicate data there is in a given data set, either within any particular column or as whole records. For example, in the orders table, each order should have just one row. If, on the other hand, you encounter two records with the same order id, you have a duplicate. How did it get there? Someone could have mistyped the order number. This brings us to the next dimension: accuracy.
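For the orders example, a uniqueness check can be sketched in a few lines of Python. The table contents and the `order_id` column name are made up for illustration.

```python
from collections import Counter

# Hypothetical orders table; order 1002 appears twice.
orders = [
    {"order_id": 1001, "total": 25.0},
    {"order_id": 1002, "total": 40.0},
    {"order_id": 1002, "total": 40.0},
    {"order_id": 1003, "total": 15.5},
]


def duplicate_keys(rows, key) -> dict:
    """Map each key value that occurs more than once to its count."""
    counts = Counter(r[key] for r in rows)
    return {k: n for k, n in counts.items() if n > 1}


def uniqueness(rows, key) -> float:
    """Distinct key values divided by total rows (1.0 means no duplicates)."""
    if not rows:
        return 1.0
    return len({r[key] for r in rows}) / len(rows)


dupes = duplicate_keys(orders, "order_id")
ratio = uniqueness(orders, "order_id")
```

Here `dupes` tells you which order ids to investigate, while `ratio` gives a single number you can track over time.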
Perhaps the most important dimension, accuracy refers to the number of errors in the data. In other words, it measures to what extent recorded data represents the truth. Accuracy is tricky because data might be valid, timely, unique, complete, but inaccurate.
100% accuracy is an aspirational goal for many data managers. Once a high level of accuracy is achieved, the principles of data governance can be combined with DQ to keep the data from degrading and becoming inaccurate again.
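Accuracy can only be measured against something you trust. The sketch below assumes you have a verified reference for at least some records (for example, addresses confirmed by the customers themselves) and counts how many recorded values agree with it. The customer ids and addresses are hypothetical.

```python
# Hypothetical data: what the database holds vs. a trusted, verified source.
recorded = {
    "c1": "ann@example.com",
    "c2": "bob@example.com",      # valid format, but not Bob's real address
    "c3": "cy@example.com",
}
verified = {
    "c1": "ann@example.com",
    "c2": "robert@example.com",
    "c3": "cy@example.com",
}


def accuracy(recorded: dict, reference: dict) -> float:
    """Share of records, present in both sources, that match the reference."""
    shared = recorded.keys() & reference.keys()
    if not shared:
        return 0.0
    correct = sum(recorded[k] == reference[k] for k in shared)
    return correct / len(shared)


rate = accuracy(recorded, verified)  # 2 of 3 records match
```

Note that every recorded address here would pass a format check, which is exactly why data can be valid yet inaccurate.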
Do you have conflicting information about the same customer in two different systems? That means the data is inconsistent, which might lead to inconsistent reporting and poor customer service.
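A consistency check boils down to comparing the same customer across systems, field by field. In this sketch the two systems (`crm` and `billing`), the customer ids, and the fields are hypothetical.

```python
# Hypothetical extracts of the same customers from two systems.
crm = {
    "c1": {"email": "ann@example.com", "city": "Leeds"},
    "c2": {"email": "bob@example.com", "city": "York"},
}
billing = {
    "c1": {"email": "ann@example.com", "city": "Leeds"},
    "c2": {"email": "bob@example.com", "city": "Hull"},
    "c3": {"email": "cy@example.com", "city": "Bath"},
}


def find_conflicts(a: dict, b: dict, fields) -> list:
    """List (customer_id, field, value_in_a, value_in_b) for every mismatch."""
    conflicts = []
    # Only customers present in both systems can be compared.
    for cid in sorted(a.keys() & b.keys()):
        for field in fields:
            left, right = a[cid].get(field), b[cid].get(field)
            if left != right:
                conflicts.append((cid, field, left, right))
    return conflicts


conflicts = find_conflicts(crm, billing, ["email", "city"])
```

The check finds the conflict but cannot tell you which system is right; resolving that is an accuracy question.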
How important is data quality, and what is its value?
We believe an even more important dimension to data needs to be discussed here: value.
Our definition of the value of data quality is this: what are the business, risk, and financial values assigned to any piece of information? In this manner, data analysts and other practitioners of data management can quickly assign priorities to different data sources or specific data domains when they do data quality projects.
We recommend using a tool to assign literal values to your data such as:
Business - how valuable is, for instance, employee salary data to marketing? Chances are it has a much higher business value to the HR department, whereas customer emails are more useful for marketing.
Risk - are you holding Personally Identifiable Information (PII)? If so, you could be exposed to the risk of GDPR fines if this data is not adequately protected to ensure the individual’s privacy.
Financial - eCommerce companies are the best example of the financial value of data: typically an email address and a credit card number are all that is needed to transact with customers. Profiling this data, keeping its quality high, and reporting on it over time can therefore help eCommerce businesses understand the average value of a customer and of an accurate email address.
As you can see from these examples, data quality can quickly become mission-critical for your business, because your day-to-day operations depend on the quality of the data you hold.
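To show how assigned values can turn into priorities, here is a minimal Python sketch. The 1 to 5 scale, the example scores, and the choice to simply add the three values are all assumptions for illustration; a real tool may weight them differently.

```python
# Hypothetical scores on an assumed scale of 1 (low) to 5 (high).
domains = [
    {"name": "employee_salary", "business": 2, "risk": 5, "financial": 2},
    {"name": "customer_email", "business": 5, "risk": 4, "financial": 5},
    {"name": "newsletter_clicks", "business": 3, "risk": 1, "financial": 2},
]


def priority(domain: dict) -> int:
    """Simple unweighted sum of business, risk, and financial value."""
    return domain["business"] + domain["risk"] + domain["financial"]


# Highest-priority data domains first.
ranked = sorted(domains, key=priority, reverse=True)
names = [d["name"] for d in ranked]
```

With scores like these, a data quality project would start with customer emails before moving on to the other domains.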
What are the business costs and risks of poor data quality?
Data quality maturity curves are becoming more prevalent, and organizations can quickly ascertain whether they’re reactive or optimized and governed in their approach to data management.
An example of an organization that is immature in its capture and management of data is one that does not use validation fields or uses free-form capture fields on the contact forms of its website, allowing anyone to enter whatever they like.
Bad data should not be taken lightly as it poses significant risks and business costs. Below are several examples:
- Wasted marketing budget: if your organization is sending physical mail to your customers and marketing leads, but those addresses are out of date or invalid, you’ll be wasting precious marketing dollars and time.
- Non-compliant data: regulations such as the EU General Data Protection Regulation (GDPR) require a certain standard of data quality to be maintained in relation to the accuracy and integrity of data (Article 5). An organization whose data is found to be non-compliant can be fined up to 20 million euros or 4% of its total worldwide annual turnover, whichever is higher.
- Hindered IT modernization projects: when data moves from a source to a target system without correct mapping and data quality efforts, old dirty data can wreak havoc on the new system.
- Poor customer experience: If contact information is of poor quality, you cannot provide customers with a tailored customer experience and serve them via their preferred channel.
- Fines: In regulated industries such as healthcare and banking, enterprises risk miscalculating key statistics for regulatory reports and getting fined.
- Unreliable analytics and machine learning: Inaccurate or invalid data will provide inaccurate analytics and unreliable machine learning models.
- Strategic operational mistakes: Building a warehouse at the wrong location, not catching fraud, and producing the wrong alloy are all examples of what can happen when bad data is used for business decision-making.
And yes, you can put a number on data quality.
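As one example of putting a number on risk, the GDPR ceiling mentioned in the list above ("20 million euros or 4% of turnover, whichever is higher") is simple arithmetic. The sketch below computes only that upper limit; the function name is our own, and actual fines are set case by case by the supervisory authority.

```python
FIXED_CAP_EUR = 20_000_000   # 20 million euros
TURNOVER_SHARE_PERCENT = 4   # 4% of annual turnover


def max_gdpr_fine(annual_turnover_eur: int) -> int:
    """Upper limit of the fine: the higher of the two caps, in euros."""
    turnover_cap = annual_turnover_eur * TURNOVER_SHARE_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_cap)


# For a company with EUR 1 billion turnover, 4% exceeds the fixed cap.
large = max_gdpr_fine(1_000_000_000)

# For a company with EUR 100 million turnover, the fixed cap applies.
small = max_gdpr_fine(100_000_000)
```

The crossover point is a turnover of 500 million euros: below it the fixed 20 million cap is the higher figure, above it the 4% share is.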
What are the benefits of better data quality?
There are so many benefits to improving the quality of your information that it is impossible to list them all out, but some of the common ones include:
- Increased return on investment for marketing activity thanks to improved email and postal deliverability and more reliable targeting
- Less time spent fixing dirty data. This will save you $1-10 per record.
- Increased ability to personalize your service or product offerings
- Improved, faster decision-making
- Compliance with new and existing regulations and the creation of a consumer-centric data-driven culture
And many more. Ultimately, your business is unique, and therefore how you benefit from improved DQ is also unique.
What are must-have features to ensure data quality?
Before you do any data quality work, it’s important to examine your data at its source to better interpret and understand it. Data profiling does this faster and more efficiently than hand-written SQL queries. It helps define which transformations the data needs and which problems to track in the future.
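To give a feel for what a profiler reports, here is a minimal Python sketch that summarizes a single column. The function name, the statistics chosen, and the sample postcodes are assumptions for the example; dedicated profiling tools report far more.

```python
from collections import Counter


def profile_column(values) -> dict:
    """Summarize one column: size, gaps, distinct values, types, top value."""
    filled = [v for v in values if v is not None and str(v).strip() != ""]
    top = Counter(filled).most_common(1)
    return {
        "count": len(values),
        "missing": len(values) - len(filled),
        "distinct": len(set(filled)),
        "types": dict(Counter(type(v).__name__ for v in filled)),
        "most_common": top[0] if top else None,
    }


# Hypothetical postcode column with a gap, a blank, and a stray integer.
postcodes = ["90210", "10001", None, "90210", 12345, ""]
report = profile_column(postcodes)
```

Even this small report surfaces three things worth acting on: two missing values, a repeated postcode, and one value stored with the wrong data type.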
Data cleansing and transformation
Very often you need to transform data to improve its quality. This includes:
- Format standardization
- Parsing data and breaking it down into separate attributes (e.g., full name into first name and last name)
- Data enrichment: bringing additional data from external sources
- Data deduplication: remove duplicates from data
- Data masking: sometimes you need to obfuscate data for security reasons
It’s important to note that these processes need to happen automatically to any new data before it travels to other systems, reaches data analysts, and is used for business decision-making.
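Several of the transformations listed above can be sketched as small functions chained into one automatic step. Everything here is illustrative: the field names are hypothetical, splitting a name on the first space is naive, and enrichment is left out because it needs an external data source.

```python
def standardize_email(email: str) -> str:
    """Format standardization: trim whitespace and lowercase."""
    return email.strip().lower()


def split_full_name(full_name: str) -> tuple:
    """Parsing: first word is the first name, the rest is the last name."""
    parts = full_name.strip().split()
    if not parts:
        return "", ""
    return parts[0], " ".join(parts[1:])


def mask_card(number: str) -> str:
    """Masking: keep only the last four digits visible."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return "*" * max(len(digits) - 4, 0) + digits[-4:]


def deduplicate(rows, key) -> list:
    """Deduplication: keep the first row seen for each key value."""
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def cleanse(rows) -> list:
    """Apply all steps to incoming rows before they move downstream."""
    cleaned = []
    for row in rows:
        first, last = split_full_name(row["full_name"])
        cleaned.append({
            "first_name": first,
            "last_name": last,
            "email": standardize_email(row["email"]),
            "card": mask_card(row["card"]),
        })
    # Deduplicate after standardization so formatting differences don't hide duplicates.
    return deduplicate(cleaned, "email")


raw = [
    {"full_name": "Ann Lee", "email": " Ann@Example.com ", "card": "4111 1111 1111 1111"},
    {"full_name": "Ann Lee", "email": "ann@example.com", "card": "4111-1111-1111-1111"},
]
result = cleanse(raw)
```

The order of the steps matters: the two raw rows only become recognizable as duplicates once their email addresses have been standardized.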
That being said, it's even better to establish processes that validate and “treat” data before it enters any IT system. This is called a data quality firewall. An example is an algorithm that checks data entered into a web form, such as an email address or birth date, against a required format and alerts the user to fix it. But DQ firewalls can be embedded into complex enterprise applications as well.
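A very small data quality firewall for a web form might look like the sketch below. The form fields, the messages, the simplified email pattern, and the in-memory `store` list standing in for your IT system are all assumptions for illustration.

```python
import re
from datetime import date

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified on purpose


def check_form(form: dict, today: date) -> dict:
    """Return a message per field that the user needs to fix."""
    problems = {}

    if not EMAIL_RE.match(str(form.get("email", "")).strip()):
        problems["email"] = "Enter an email address such as name@example.com."

    try:
        born = date.fromisoformat(str(form.get("birth_date", "")))
    except ValueError:
        problems["birth_date"] = "Use the format YYYY-MM-DD."
    else:
        if born > today:
            problems["birth_date"] = "Birth date cannot be in the future."

    return problems


def submit(form: dict, store: list, today: date) -> dict:
    """Only let the record into the system if it passes every check."""
    problems = check_form(form, today)
    if not problems:
        store.append(form)
    return problems


store = []
today = date(2024, 1, 1)
rejected = submit({"email": "ann@", "birth_date": "31/12/1990"}, store, today)
accepted = submit({"email": "ann@example.com", "birth_date": "1990-12-31"}, store, today)
```

The key design choice is that bad data never reaches `store`; the user gets a message and fixes the problem at the point of entry, where it is cheapest to fix.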
Monitoring and reporting
A saying often attributed to Peter Drucker puts it best: “If you can’t measure it, you can’t improve it.” It’s as valid for data quality as it is for business in general. Tracking changes and improvements to data over time is crucial and is usually done through data quality dashboards.
First, it shows you whether you are moving in the right direction, i.e., whether the data quality metrics that you have defined are improving or not. Second, monitoring data quality helps catch unexpected influxes of bad data and track it to its source. And third, it helps with tracking compliance with regulatory requirements and more.
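The three uses of monitoring described above (direction of travel, unexpected influxes of bad data, and compliance with a required level) can be sketched over a simple history of daily scores. The dates, the scores, the 0.05 drop limit, and the 0.9 minimum are made-up values for illustration.

```python
# Hypothetical daily values of one data quality metric (e.g. completeness).
history = [
    ("2024-01-01", 0.91),
    ("2024-01-02", 0.93),
    ("2024-01-03", 0.94),
    ("2024-01-04", 0.78),  # an unexpected influx of bad data
]


def trend(history) -> float:
    """Change between the first and the last measurement."""
    return history[-1][1] - history[0][1]


def sudden_drops(history, max_drop: float = 0.05) -> list:
    """Days on which the metric fell by more than max_drop since the previous day."""
    alerts = []
    for (_, previous), (day, current) in zip(history, history[1:]):
        if previous - current > max_drop:
            alerts.append(day)
    return alerts


def below_threshold(history, minimum: float) -> list:
    """Days on which the metric was under a required minimum."""
    return [day for day, score in history if score < minimum]


direction = trend(history)
alerts = sudden_drops(history)
breaches = below_threshold(history, minimum=0.9)
```

An alert on a specific day gives you a starting point for tracing the bad data back to the load or source system that introduced it.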
How can I start improving my DQ?
An important first step is to profile your data to understand just what state it is in. There are several data management tools that you can use to do this, many of which offer free versions.