You can't guarantee that your data will be accessible or reliable just because your company collects it. Enterprise data landscapes are incredibly complex, so it's impossible to understand the state of data or work with data effectively without proper processes and tools.
Every organization dealing with a large amount of data without good data governance will face many problems. For example:
- Data scientists spend more than 80% of their time searching for data and cleansing it, and the rest building models.
- Business users complain to IT about poor data quality.
- Self-service BI tools fail because users do not understand the data behind them.
- Regulatory reports are prepared in panic mode.
What is data governance?
First, let's define data governance.
Data governance is a set of principles, policies, and processes put in place by an organization that ensures data availability, quality, and security for data consumers within that organization.
Our definition specifically mentions data consumers because they are the main beneficiaries of data governance. These consumers can be humans (very often referred to as "the business") or machines (think operational applications such as CRM and ERP, as well as external and internal APIs).
Usually, data governance is driven by the data governance office and executed by data stewards. They compile the rules and processes that govern how the organization handles its data and make that information available to all relevant parties.
What are the goals of data governance?
Data governance solutions put rules and processes in place to achieve several goals for your organization. Data consumers primarily care about data availability and data reliability. Let's break these two down.
Data availability
- Speed of search: A data user should be able to find the data they need, verify it is indeed the data they were looking for, and start using it. Users should also have a good idea about what is available in general.
- Metadata analysis: Understanding the meaning of data, different and similar business terms, and calculated values. A data user should be able to browse related metadata objects and view data lineage.
Data reliability (aka trust in data)
- Data authenticity: Understanding the origin and quality of a specific data set.
- Data quality transparency: Understanding what enterprise data can be used for a specific purpose at hand.
What are the components of data governance?
Now that we know the goals of data governance, let's discuss what functional components are necessary to achieve them.
Various data governance frameworks available today can include up to 10 components of data governance. You will find that "The DAMA Wheel" links the concept to every corner of the data management spectrum, from data quality down to data storage. You'll find things like data integration, data architecture, data modeling, data security, and others.
The data governance office manages these and some other components in conjunction with other departments, such as IT, Legal, Compliance, or Security. For this article, however, we'd like to focus on the components of data governance that no other function or department usually initiates or is interested in. These components are:
- Data Quality
- Master & Reference Data Management
- Metadata Management
These three directly cater to data consumers' needs by creating policies and processes that ensure data is available and of high quality on a continuous basis.
Data Quality
Data quality is about establishing processes and metrics to ensure that data is fit for processing and further use, such as analysis, reporting, BI, data science, etc. Data quality is a crucial component of data governance because it makes the state of data objectively measurable.
Your data governance strategy will help define metrics and processes for how your company ensures data quality.
At the initial stages, the data governance office (represented by data stewards) will collaborate with different lines of business to understand the most common data issues and set up rules for data validation, cleansing, and monitoring. For example, if your company does not serve clients under 18, you can define that as a data quality rule in your data governance program.
[Figure: example of a data quality monitoring project focused on PII]
Based on the most common data use scenarios (like financial reporting), the data governance office will also help define thresholds for high and poor data quality. It will set up processes for proactive data quality, such as capturing data quality issues early and automatically correcting them, assigning data stewards to more complicated matters, and educating users in general.
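To make the "no clients under 18" rule and the idea of a quality threshold more concrete, here is a minimal sketch in Python. It is not how any particular data quality tool implements rules. The `Rule` class, the `pass_rate` helper, the sample clients, and the 95% threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when a record passes


def age_on(birth_date, today):
    """Full years between birth_date and today."""
    years = today.year - birth_date.year
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday has not happened yet this year
    return years


def pass_rate(records, rule):
    """Share of records that satisfy the rule (1.0 for an empty list)."""
    if not records:
        return 1.0
    passed = sum(1 for r in records if rule.check(r))
    return passed / len(records)


TODAY = date(2024, 1, 1)  # fixed date so the example is reproducible
adult_rule = Rule(
    "client_is_adult",
    lambda r: age_on(r["birth_date"], TODAY) >= 18,
)

# Hypothetical sample data
clients = [
    {"id": 1, "birth_date": date(1990, 5, 17)},
    {"id": 2, "birth_date": date(2010, 3, 2)},    # under 18: violates the rule
    {"id": 3, "birth_date": date(2006, 1, 1)},    # turns 18 exactly on TODAY
    {"id": 4, "birth_date": date(1985, 12, 31)},
]

THRESHOLD = 0.95  # hypothetical cut-off between acceptable and poor quality
rate = pass_rate(clients, adult_rule)
status = "ok" if rate >= THRESHOLD else "needs steward review"
print(f"{adult_rule.name}: {rate:.0%} passed -> {status}")
```

In this sample, one of the four records fails the rule, so the pass rate falls below the assumed threshold and the data set would be routed to a data steward.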
What is the goal of data quality?
The primary goal of data quality is to ensure data reliability. For your company to process data and fully benefit from its insights, you need data that is complete, valid, not duplicated, accurate, and consistent. If the metrics and processes defined by your data governance program are adequate, you should have data quality that meets all of these criteria.
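Some of the criteria above can be expressed as simple, measurable metrics. The sketch below computes completeness, uniqueness, and validity for a small, made-up customer list. The function names, the sample records, and the list of allowed country codes are assumptions for illustration only. Accuracy and consistency usually need a trusted reference to compare against, so they are left out here.

```python
def _filled(record, field):
    """True when the field is present and not empty."""
    return record.get(field) not in (None, "")


def completeness(records, field):
    """Share of records where the field is filled in."""
    if not records:
        return 1.0
    return sum(1 for r in records if _filled(r, field)) / len(records)


def uniqueness(records, field):
    """Share of filled-in values that are distinct (1.0 means no duplicates)."""
    values = [r[field] for r in records if _filled(r, field)]
    return len(set(values)) / len(values) if values else 1.0


def validity(records, field, is_valid):
    """Share of filled-in values accepted by the is_valid check."""
    values = [r[field] for r in records if _filled(r, field)]
    return sum(1 for v in values if is_valid(v)) / len(values) if values else 1.0


# Hypothetical sample data
customers = [
    {"id": 1, "email": "ann@example.com", "country": "CZ"},
    {"id": 2, "email": "", "country": "US"},                 # missing email
    {"id": 3, "email": "ann@example.com", "country": "XX"},  # duplicate email, unknown country
    {"id": 4, "email": "bob@example.com", "country": "DE"},
]

ALLOWED_COUNTRIES = {"CZ", "US", "DE"}  # stand-in for a reference codebook

report = {
    "completeness(email)": completeness(customers, "email"),
    "uniqueness(email)": uniqueness(customers, "email"),
    "validity(country)": validity(customers, "country", ALLOWED_COUNTRIES.__contains__),
}
for metric, value in report.items():
    print(f"{metric}: {value:.2f}")
```

Tracking numbers like these over time is one way to check whether the metrics and processes defined by the governance program are adequate.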
Master & Reference Data Management
Master and reference data are both critical data types within any large organization. Both act as a single source of the truth for various data consumers. Master data includes information about people, things, and locations, while reference data is a set of codebooks for categorizing master and transactional data.
Data governance can help by developing rules and processes around managing master (MDM) and reference data (RDM). Master data governance can contain directions for forming the "golden record," especially when data from different sources conflict.
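As a rough illustration of such directions, the sketch below merges three conflicting source records into one golden record. It assumes a simple survivorship rule: the most trusted source wins, and the most recent update breaks ties. The source names, their ranking, and the record layout are hypothetical. Real MDM tools support much richer rules.

```python
from datetime import date

# Hypothetical trust ranking: a lower number wins when sources disagree.
SOURCE_PRIORITY = {"crm": 0, "billing": 1, "web_form": 2}


def build_golden_record(records):
    """Merge records describing the same entity into one golden record.

    For each attribute, take the non-empty value from the most trusted
    source (ties broken by the most recent update). Attributes on which
    sources disagree are also returned so a steward can review them.
    """
    ranked = sorted(
        records,
        key=lambda r: (SOURCE_PRIORITY[r["source"]], -r["updated"].toordinal()),
    )
    attributes = sorted({attr for r in records for attr in r["values"]})
    golden, conflicts = {}, {}
    for attr in attributes:
        candidates = [
            (r["source"], r["values"][attr]) for r in ranked if r["values"].get(attr)
        ]
        if not candidates:
            continue
        golden[attr] = candidates[0][1]  # value from the best-ranked source
        if len({value for _, value in candidates}) > 1:
            conflicts[attr] = candidates
    return golden, conflicts


# Hypothetical sample data: the same customer seen in three systems
records = [
    {"source": "web_form", "updated": date(2024, 3, 1),
     "values": {"name": "Jon Smith", "email": "jon@example.com", "phone": "555-0100"}},
    {"source": "crm", "updated": date(2023, 11, 5),
     "values": {"name": "John Smith", "email": "jon@example.com", "phone": ""}},
    {"source": "billing", "updated": date(2024, 1, 20),
     "values": {"name": "John Smith", "phone": "555-0199"}},
]

golden, conflicts = build_golden_record(records)
print(golden)
print("needs review:", sorted(conflicts))
```

Here the name and phone number differ between sources, so both attributes are flagged for review even though the rule has already picked a value for each.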
In some cases, data stewards need to review a particular golden record and decide on specific attribute values manually.
Other important aspects of data governance include defining rules for who can create, change, and delete master data and business workflows for approving these changes. You can also define rules about who can access what master data and how it can be used.
With reference data (especially externally procured), data governance helps identify the processes and systems that consume the same (but often duplicated) reference data. It then strives to establish a centralized reference data tool and store. This eliminates duplicate spending on the same reference data and simplifies updating code sets when they change or become outdated.
When it comes to internally authored reference data, data governance sets rules for approving changes (for example, via the four-eyes principle) so that changes do not cause problems for data consumers.
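The four-eyes principle simply means that a change must be approved by someone other than its author. A minimal sketch of that rule for a reference codebook could look like the following. The `ReferenceCodebook` and `ChangeRequest` classes and the user names are hypothetical and not taken from any specific product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    code: str
    new_label: str
    author: str
    approved_by: Optional[str] = None  # stays None until a second person approves


class ReferenceCodebook:
    def __init__(self, entries):
        self.entries = dict(entries)

    def propose(self, code, new_label, author):
        """Record a proposed change without applying it."""
        return ChangeRequest(code, new_label, author)

    def approve(self, request, reviewer):
        """Apply the change only if the reviewer is not the author."""
        if reviewer == request.author:
            raise PermissionError("four-eyes principle: authors cannot approve their own change")
        request.approved_by = reviewer
        self.entries[request.code] = request.new_label


codebook = ReferenceCodebook({"CZ": "Czech Republic"})
request = codebook.propose("CZ", "Czechia", author="alice")

try:
    codebook.approve(request, reviewer="alice")  # same person: rejected
    self_approval_blocked = False
except PermissionError:
    self_approval_blocked = True

codebook.approve(request, reviewer="bob")  # second pair of eyes: applied
print(codebook.entries, "| self-approval blocked:", self_approval_blocked)
```

The codebook entry changes only after a second person signs off, which is the behavior the approval rule is meant to guarantee.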
What is the goal of master and reference data management?
The goal of master and reference data management is, again, data reliability. Data authenticity and transparency are imperative here because master and reference data are the most critical data in the company. Therefore, the policies and processes your governance efforts define for them need to be well thought out and specific to your company's needs.
Metadata Management
Metadata management is about creating an enterprise-wide understanding of data to increase data availability and decrease "time to data." Achieving these goals requires setting up data governance tools like a data catalog, a business glossary, and data lineage.
[Figure: data assets listed in the data catalog]
The data governance office helps with setting up these tools, making them part of change management, and integrating them into vital data-related processes. It establishes procedures and provides the tools and methodology for filling in the data catalog and business glossary.
It is then up to specific business units and departments to create consistent definitions of essential business terms, document the calculation of various KPIs or metrics, and keep track of company policies. This way, they create a common point of reference for everyone.
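To show how a business glossary, a data catalog, and data lineage relate to each other, here is a small sketch in Python. The classes, asset names, and glossary definitions are invented for illustration. A real data catalog stores far more metadata and is typically populated automatically.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    name: str
    definition: str


@dataclass
class CatalogAsset:
    name: str
    system: str
    terms: list = field(default_factory=list)     # names of linked glossary terms
    upstream: list = field(default_factory=list)  # assets this one is derived from


# Hypothetical glossary and catalog content
glossary = {
    "Customer": GlossaryTerm("Customer", "A person or company with at least one active contract."),
    "Churn rate": GlossaryTerm("Churn rate", "Customers lost in a period divided by customers at its start."),
}

catalog = {
    a.name: a
    for a in [
        CatalogAsset("crm.customers", "CRM", terms=["Customer"]),
        CatalogAsset("dwh.dim_customer", "DWH", terms=["Customer"], upstream=["crm.customers"]),
        CatalogAsset("bi.churn_report", "BI", terms=["Churn rate"], upstream=["dwh.dim_customer"]),
    ]
}


def find_assets(term_name):
    """Names of catalog assets linked to a glossary term."""
    return sorted(a.name for a in catalog.values() if term_name in a.terms)


def lineage(asset_name):
    """All upstream assets, nearest first (breadth-first walk)."""
    seen = []
    queue = list(catalog[asset_name].upstream)
    while queue:
        name = queue.pop(0)
        if name not in seen:
            seen.append(name)
            queue.extend(catalog[name].upstream)
    return seen


print(glossary["Customer"].definition)
print(find_assets("Customer"))
print(lineage("bi.churn_report"))
```

Searching by a business term answers "where is this data?", while the lineage walk answers "where does this report's data come from?", which are the two questions that shorten time to data.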
It's worth noting that modern data catalogs integrate with data quality tools to provide transparency about the state of any data set that users find in a data catalog or any business term they find in the business glossary.
For example, users can review the overall quality of personal data and drill down to see which systems and tables contain it.
What is the goal of metadata management?
The ultimate goal of metadata management is data availability. By implementing a toolset that creates clarity and understanding of data, data governance supports regulatory compliance and helps data consumers find, understand, and gain access to data faster.
Frequently asked questions about data governance
Here is some more information about data governance if you're still curious.
When do you need to implement a data governance plan?
If your company is collecting a relatively small amount of data, you probably won't need a data governance framework, or at least not a very complex or stratified one.
As companies grow larger and their data systems more complex, they need to put these rules and processes in place to ensure their data remains useful and of high quality.
You might also need data governance policies if a new law or data regulation goes into effect. This way, you can know the type of data you're storing, how sensitive it is, and who has access to it.
You might also need data governance to align with existing regulations like the GDPR or the California Consumer Privacy Act.
Who should handle data governance?
Executing data governance effectively requires the collaboration of three parties:
- IT
- The business
- Data governance office or data governance committee
IT owns technologies, creates simple technical checks, ensures technical processes work (such as ETL), and monitors changes in data structures. At the same time, IT lacks the end-to-end understanding of business processes and looks at data from a purely technical perspective.
The business understands the meaning of data. They are instrumental in creating checks and business glossary content. In the end, they benefit the most from data governance. However, they have a view of data limited by their product or business process.
Finally, the data governance team looks at all enterprise data and owns methodologies and processes for effective management. This team's approach to issues in data is proactive. However, they lack specific knowledge of business processes and systems or specific data.
The three teams complement each other: each one's strengths cover the others' weaknesses. Once data governance is in place, it's primarily maintained and enforced by a data steward or a data custodian.
What is a data steward?
Data stewards are in charge of maintaining the data and ensuring that the rules set by the governance program are followed. They can help define and create data quality rules, monitor data quality, work with business teams to populate the business glossary, and resolve any data issues.
Data stewardship makes sure everyone plays within the boundaries of your governance initiative and follows the data's lifecycle from creation to its eventual analysis and, finally, its deletion or expiration.
How to ensure a data governance program is successful?
Not all data governance programs launch successfully, and those that do don't always keep growing. There are several factors to keep in mind before and during data governance implementation. We will briefly cover them here.
- Start with a business problem and solution. Having a business case for data governance is vital to its success. For example, instead of offering data governance with the goal of "having better data," sell it to the business as a solution to a problem they have: ineffective data science or lack of self-service data discovery.
- Secure funding and buy-in from upper management. Data governance is a complex end-to-end program that, upon its maturity, covers the entire organization and impacts many processes and systems. Therefore, after building the business case, it's vital to secure funding and get buy-in from key stakeholders, such as the CFO, COO, Head of IT, and others.
- Start small but keep an organization-wide perspective. It's wise to break down the data governance program into manageable steps and go one step at a time, solving practical issues and showing value at each stage. For example, you could start with improving the quality of data for an important regulatory report. It is, however, vital to set up roles and processes with the future perspective of data governance being an organization-wide program.
- Communicate often. It's essential to communicate the success of the data governance program along with the significant milestones of its implementation. Effective communication tools include product demos, dashboards with metrics, and presentations to executives.
Time to start governing your data
To summarize, data governance is a set of principles, policies, and processes put in place by a company to ensure data availability and reliability. While it can be linked to several components of any data strategy, its three key components are data quality, master and reference data management, and metadata management.
Remember the following:
- It's never too early to start a data governance plan.
- Data governance requires collaboration between business, the data governance team, and IT.
- Data stewards are pivotal to executing a data governance initiative.
- The success of data governance requires approaching it with the right mindset, getting buy-in, and executing it correctly.
- Software alone will not cure your data troubles, but data governance tools are an important technological component.
Utilizing data without proper governance is close to impossible. If your business has a lot of source systems and data types, then it's only a matter of time before you realize you cannot trust your data.
Some data governance programs fail because of unintuitive tools. Here at Ataccama, our goal is to help organizations implement data governance programs. We achieve this by building easy-to-use user interfaces and automating tedious, time-consuming tasks. If you'd like to learn more, take a look at our data governance demo.