What is Reference Data Management?

June 1, 2021

Welcome to Ataccama’s guide to the fundamentals of reference data management. This article outlines what reference data is, how it relates to reference data management (RDM), and why it matters for data analysts, data governance stewards, and other data professionals.

What is reference data management?

Before we can define reference data management, we must first define reference data. Reference data is data that defines the values used to classify and characterize other data. It is sometimes referred to, not always correctly, as master data, a golden record, a golden copy, or a single source of truth.

Reference data management is a crucial component of master data management (MDM) that has evolved into a mature data management discipline in its own right.

Think of the codes and descriptions managed through RDM as the shared vocabulary businesses use to identify information about their activities, such as financial transactions, locations, units of measurement, and inventories.

Why is reference data management important? 

As technologies evolve and mature, both the volume of data and the opportunities to use it effectively keep growing. As a result, data increasingly has an operational impact on organizations. Effective reference data management helps keep day-to-day operations running without disruption across businesses, charities, educational institutions, and other establishments.

As volumes of transactional data increase, reference data management becomes more important too. Without it, we would have difficulty organizing and sharing common information about millions of products and services. We would even have difficulty determining the correct delivery locations for those products and services, especially in the context of enterprise data governance.

Disorganized, incorrect, or inaccessible reference data can disrupt and delay business processes. Assigning the wrong code to a transaction can distort billing, inventory counts, and replenishment, or send shipments to the wrong destination.

Why reference data management matters 

Better management of data and its corresponding reference data can therefore deliver significant business value in the form of:

  • More accurate analytics (e.g., which type of engine is sold the most regardless of the specific car model?)
  • Reliable regulatory reporting
  • Effective sharing of datasets between organizations (mutual understanding thanks to using industry-standard reference data codes)
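
To illustrate the analytics point, here is a minimal Python sketch, assuming sales records store only an engine code and a separate reference table maps each code to its engine category. The sales records and model names are hypothetical sample data; the engine codes follow the Jaguar engine table later in this article.

```python
from collections import Counter

# Reference data: engine code -> engine category (subset of the engine table).
ENGINE_CATEGORY = {
    "D200-AWD-A-MHEV": "MHEV",
    "D300-AWD-A-MHEV": "MHEV",
    "P250-AWD-A": "PV",
    "P400e-AWD-A-PHEV": "PHEV",
}

# Hypothetical sales records spanning several car models.
SALES = [
    {"model": "Model A", "engine_code": "D200-AWD-A-MHEV"},
    {"model": "Model B", "engine_code": "D200-AWD-A-MHEV"},
    {"model": "Model A", "engine_code": "P400e-AWD-A-PHEV"},
    {"model": "Model C", "engine_code": "P250-AWD-A"},
    {"model": "Model B", "engine_code": "D300-AWD-A-MHEV"},
]


def sales_by_engine_category(sales, reference):
    """Count sales per engine category, regardless of the car model."""
    counts = Counter()
    for sale in sales:
        # Codes missing from the reference data are counted as "UNKNOWN"
        # so that gaps in the reference data stay visible.
        counts[reference.get(sale["engine_code"], "UNKNOWN")] += 1
    return counts


print(sales_by_engine_category(SALES, ENGINE_CATEGORY).most_common(1))  # [('MHEV', 3)]
```

Because the category lives in the reference data rather than in each sales record, reclassifying an engine means updating one reference row instead of every transaction.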

Ultimately, the above helps organizations serve their customers better, increase operating margin, and make their employees more productive and happier.

We will discuss more benefits later on in this article.

What are some examples of reference data?

In theory, reference data could apply to just about anything. One of the clearest reference data examples comes from the automotive industry. Just think about ordering a brand-new Jaguar because you’ve been so successful in your data career.

You’ll be able to customize the type of engine, wheels, seats, heating options, audio options, and more. Somewhere in its systems, Jaguar is likely to store this reference data in structured tables that categorize and validate each selection during the configuration process.

For example, here is a table that contains information about different available engines:

Code | Description | Main fuel type | Engine category
D165-AWD-A-MHEV | Ingenium 2.0 liter 4-cylinder 163PS Turbocharged Diesel MHEV (Automatic) | Diesel | MHEV
D200-AWD-A-MHEV | Ingenium 2.0 liter 4-cylinder 204PS Turbocharged Diesel MHEV (Automatic) | Diesel | MHEV
D300-AWD-A-MHEV | Ingenium 3.0 liter 6-cylinder 300PS Turbocharged Diesel MHEV (Automatic) | Diesel | MHEV
P250-AWD-A | Ingenium 2.0 liter 4-cylinder 250PS Turbocharged Petrol (Automatic) | Petrol | PV
P400e-AWD-A-PHEV | Ingenium 2.0 liter 4-cylinder 404PS Turbocharged Petrol PHEV (Automatic) | Petrol | PHEV

Attributes such as Main fuel type and Engine category are reference data examples that help categorize and describe broader product data. 

In this case, the system connects reference data with product master data to ensure consistent categorization and accurate product configuration.
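
A configurator could use such a table to reject invalid selections and enrich each order with consistent attributes. The Python sketch below assumes the table has been loaded into memory; `Engine` and `configure` are hypothetical names for illustration, not part of any Jaguar system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    code: str
    description: str
    main_fuel_type: str
    engine_category: str


# A subset of the engine reference table, keyed by code.
ENGINES = {
    e.code: e
    for e in [
        Engine("D200-AWD-A-MHEV",
               "Ingenium 2.0 liter 4-cylinder 204PS Turbocharged Diesel MHEV (Automatic)",
               "Diesel", "MHEV"),
        Engine("P250-AWD-A",
               "Ingenium 2.0 liter 4-cylinder 250PS Turbocharged Petrol (Automatic)",
               "Petrol", "PV"),
        Engine("P400e-AWD-A-PHEV",
               "Ingenium 2.0 liter 4-cylinder 404PS Turbocharged Petrol PHEV (Automatic)",
               "Petrol", "PHEV"),
    ]
}


def configure(engine_code: str) -> dict:
    """Validate a selected engine code and copy its reference attributes onto the order."""
    engine = ENGINES.get(engine_code)
    if engine is None:
        # Only codes defined in the reference data are accepted.
        raise ValueError(f"Unknown engine code: {engine_code!r}")
    return {
        "engine_code": engine.code,
        "main_fuel_type": engine.main_fuel_type,
        "engine_category": engine.engine_category,
    }


print(configure("P400e-AWD-A-PHEV"))
```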

Here is an example of the ICD-10 codeset for classifying diseases:

Code | Description | Category | Sub-category
A00 | Cholera | Certain infectious and parasitic diseases | Intestinal infectious diseases
A01 | Typhoid and paratyphoid fevers | Certain infectious and parasitic diseases | Intestinal infectious diseases
A02 | Other salmonella infections | Certain infectious and parasitic diseases | Intestinal infectious diseases
A03 | Shigellosis | Certain infectious and parasitic diseases | Intestinal infectious diseases
A04 | Other bacterial intestinal infections | Certain infectious and parasitic diseases | Intestinal infectious diseases

The list continues all the way to code Z98.89.

These codes are also grouped into taxonomies and hierarchies for easier consumption by users.
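
One simple way to derive such a hierarchy is to group the flat rows by their category columns. This Python sketch uses only the five ICD-10 rows shown above; a full codeset may have more levels, and the function names are hypothetical.

```python
from collections import defaultdict

CATEGORY = "Certain infectious and parasitic diseases"
SUB_CATEGORY = "Intestinal infectious diseases"

# Rows from the ICD-10 excerpt: (code, description, category, sub-category).
ROWS = [
    ("A00", "Cholera", CATEGORY, SUB_CATEGORY),
    ("A01", "Typhoid and paratyphoid fevers", CATEGORY, SUB_CATEGORY),
    ("A02", "Other salmonella infections", CATEGORY, SUB_CATEGORY),
    ("A03", "Shigellosis", CATEGORY, SUB_CATEGORY),
    ("A04", "Other bacterial intestinal infections", CATEGORY, SUB_CATEGORY),
]


def build_hierarchy(rows):
    """Group flat codeset rows into a category -> sub-category -> [codes] tree."""
    tree = defaultdict(lambda: defaultdict(list))
    for code, _description, category, sub_category in rows:
        tree[category][sub_category].append(code)
    return tree


def print_tree(tree):
    """Print the tree with one indentation level per hierarchy level."""
    for category, sub_categories in tree.items():
        print(category)
        for sub_category, codes in sub_categories.items():
            print(f"  {sub_category}: {', '.join(codes)}")


print_tree(build_hierarchy(ROWS))
```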

Given its purpose, reference data is typically a relatively static dataset: once defined, it governs the use of data in an organization until it is deliberately changed.

What are some common reference data management problems?

Like any other data, if ungoverned and unmanaged, reference data becomes a liability, rather than an asset.

Here are some examples of what actually happens in organizations:

  • Lack of a single source of truth for reference data, which results in:
    • Departments managing reference data in silos and synchronizing it slowly and irregularly
    • Departments using outdated versions
  • Lack of data governance principles, such as:
    • Rules for using external reference data
    • The four-eyes principle for approving changes, i.e., a business workflow
  • Using unsuitable tools, typically Excel (more on this in a separate section)

How does this impact data stewards?

  • Manually collecting requirements and reconciling differences across departments
  • Manually updating reference data in consuming systems
  • Involving IT and maintaining SQL queries or other technical solutions

How does this impact business and operations?

  • Fines in highly regulated industries such as pharmaceuticals or airlines.
  • Inconsistent financial reporting
  • Lack of alignment between different lines of business
  • Operational mistakes

Here is an example of what can happen when two different sets of codes represent the same concept:

Dept. 1 code | Dept. 1 description | Dept. 2 code | Dept. 2 description
DTS | Dental services | OD2 | Dental care
SK | Skin care | DRM | Dermatology
VIS | Vision specialist | OPM | Ophthalmologist
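
A common remedy is to agree on one canonical codeset and maintain a crosswalk (mapping) from each department’s local codes to it. The Python sketch below assumes that approach; the canonical codes such as `DENTAL` and the `to_canonical` helper are hypothetical.

```python
# Hypothetical canonical codes agreed on centrally, e.g. in an RDM hub.
CANONICAL = {
    "DENTAL": "Dental services",
    "DERMA": "Dermatology",
    "OPHTH": "Ophthalmology",
}

# Crosswalk from each department's local code to the canonical code.
CROSSWALK = {
    ("dept1", "DTS"): "DENTAL",
    ("dept2", "OD2"): "DENTAL",
    ("dept1", "SK"): "DERMA",
    ("dept2", "DRM"): "DERMA",
    ("dept1", "VIS"): "OPHTH",
    ("dept2", "OPM"): "OPHTH",
}


def to_canonical(department: str, local_code: str) -> str:
    """Translate a department-specific code into the shared canonical code."""
    try:
        return CROSSWALK[(department, local_code)]
    except KeyError:
        # Unmapped codes are surfaced instead of guessed.
        raise KeyError(f"no mapping for {local_code!r} in {department}") from None


# Records from both departments now roll up to the same concept.
for dept, code in [("dept1", "VIS"), ("dept2", "OPM")]:
    canonical = to_canonical(dept, code)
    print(dept, code, "->", canonical, CANONICAL[canonical])
```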

Why you shouldn’t manage reference data in Excel

Users typically run into problems with reference data because it has not been managed with a purpose-built tool. Instead, it has been built and maintained in general-purpose programs such as Excel, which are not fit for this purpose.

This is why reference data should never be managed in spreadsheets.

Spreadsheets lack the flexibility, validation capabilities, and integration features that modern reference data management tools provide:

  • They provide very limited search functionality.
  • They offer very limited ways to validate data or proactively alert users to changes.
  • They lack automated versioning and often exist as independent copies in separate silos.
  • Manually merging spreadsheet updates can take hours, and even simple tasks, such as adding a column to a codebook, can affect how the spreadsheet is exported and shared with other systems and users.
  • They lack the advanced approval workflow capabilities needed to update critical reference data.

Learn more about why managing reference data centrally is the only way to do it and why Excel is not the right tool to do that in this webinar.

How does reference data management (RDM) work?

RDM provides a governance architecture for centralizing and managing any reference data.

The data itself is typically stored in a single RDM hub or database repository, because most data stewards, architects, and reference data managers believe an RDM solution should be the single, trusted source for creating and distributing reference data across the company.

1. Reference data modeling

Reference data is modeled, or structured, by domain.

The model may be based on best practices or heavily influenced by external industry standards. HR, manufacturing, finance, and other departments will require different codeset models and approaches.
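
As a rough sketch of domain-based modeling, the Python classes below let each codeset declare the attributes its values must carry and, optionally, the external standard it follows. `CodeValue`, `Codeset`, and their fields are hypothetical and do not describe any particular RDM product’s data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CodeValue:
    code: str
    description: str
    # Domain-specific attributes, e.g. {"main_fuel_type": "Petrol"}.
    attributes: dict = field(default_factory=dict)


@dataclass
class Codeset:
    name: str
    domain: str                      # e.g. "HR", "Manufacturing", "Finance"
    required_attributes: tuple = ()  # attributes every value must carry
    standard: Optional[str] = None   # external standard followed, if any
    values: dict = field(default_factory=dict)

    def add(self, value: CodeValue) -> None:
        """Add a value after checking that it fits this codeset's model."""
        missing = set(self.required_attributes) - set(value.attributes)
        if missing:
            raise ValueError(f"{value.code}: missing attributes {sorted(missing)}")
        self.values[value.code] = value


# Two domains, two different models.
engines = Codeset("Engines", "Manufacturing", ("main_fuel_type", "engine_category"))
engines.add(CodeValue("P250-AWD-A",
                      "Ingenium 2.0 liter 4-cylinder 250PS Turbocharged Petrol (Automatic)",
                      {"main_fuel_type": "Petrol", "engine_category": "PV"}))

diseases = Codeset("Diseases", "Healthcare", ("category", "sub_category"), standard="ICD-10")
diseases.add(CodeValue("A00", "Cholera",
                       {"category": "Certain infectious and parasitic diseases",
                        "sub_category": "Intestinal infectious diseases"}))
```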

2. Reference data imports and mapping

Because codesets are relatively stable, importing reference data from outside the hub should not be a frequent occurrence. New codesets are typically designed and modeled within the system as part of a deliberate process. Existing codeset data may be copied and modified.

In more complex solution architectures involving legacy systems, centralized reference data authoring might not be possible.

In this case, reference data is authored in and synchronized between several systems, including the RDM solution itself, which holds the single version of the truth.
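
When several systems author the same reference data, the hub needs to detect where each copy has drifted from the single version of the truth. A minimal Python sketch, assuming each copy can be reduced to a code-to-description dictionary; the legacy values and the `diff_codesets` name are made up for illustration.

```python
def diff_codesets(hub: dict, system: dict) -> dict:
    """Compare a system's copy of a codeset (code -> description) with the hub.

    The hub is treated as the single version of the truth.
    """
    shared = hub.keys() & system.keys()
    return {
        # Codes the system has not received yet.
        "missing_in_system": sorted(hub.keys() - system.keys()),
        # Codes created locally that the hub does not know about.
        "unknown_to_hub": sorted(system.keys() - hub.keys()),
        # Codes present in both copies but with a different description.
        "outdated": sorted(code for code in shared if hub[code] != system[code]),
    }


hub = {
    "A00": "Cholera",
    "A01": "Typhoid and paratyphoid fevers",
    "A02": "Other salmonella infections",
}
# Hypothetical legacy copy that has drifted from the hub.
legacy = {
    "A00": "Cholera",
    "A01": "Typhoid fever",
    "Z-LOCAL-1": "Locally added code",
}
print(diff_codesets(hub, legacy))
```

How each difference is resolved (push the hub value, or review the local code for inclusion) is a governance decision rather than something the diff itself decides.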

3. Data quality

Data quality matters for reference data just as it does for any other data. It is therefore important to embed data validations into reference data authoring workflows, which helps catch mistakes before they are published.

Additionally, reference data management solutions can maintain the referential integrity of data automatically.
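
The Python sketch below illustrates both ideas: validation rules applied when an entry is authored, and a referential integrity check that flags codes pointing to parent values that no longer exist. The code-format rule is a hypothetical simplification, not the official ICD-10 syntax, and the function names and sample links are illustrative.

```python
import re

# Hypothetical, simplified format rule for ICD-10-style codes:
# one letter, two digits, then optionally a dot and 1-4 more characters.
CODE_PATTERN = re.compile(r"^[A-Z][0-9]{2}(\.[0-9A-Z]{1,4})?$")


def validate_entry(entry: dict, existing_codes: set) -> list:
    """Return validation errors for a new codeset entry (an empty list means valid)."""
    errors = []
    code = entry.get("code", "")
    if not CODE_PATTERN.match(code):
        errors.append(f"invalid code format: {code!r}")
    if code in existing_codes:
        errors.append(f"duplicate code: {code!r}")
    if not entry.get("description", "").strip():
        errors.append("description is required")
    return errors


def orphaned_codes(code_to_parent: dict, parent_values: set) -> list:
    """Referential integrity check: codes whose parent value does not exist."""
    return sorted(code for code, parent in code_to_parent.items() if parent not in parent_values)


existing = {"A00", "A01", "A02"}
print(validate_entry({"code": "A05", "description": "Example disease"}, existing))
print(validate_entry({"code": "a5", "description": ""}, existing))

sub_categories = {"Intestinal infectious diseases"}
# Hypothetical links; "Retired sub-category" no longer exists in the parent codeset.
links = {"A00": "Intestinal infectious diseases", "A99": "Retired sub-category"}
print(orphaned_codes(links, sub_categories))
```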

4. Workflows and approval

Because reference data has an organization-wide impact, changes to it are usually subject to a tightly controlled workflow. As part of RDM governance, a designated team of data stewards and subject matter experts takes part in this process to enforce the four-eyes principle before any change is published.

They collaboratively view, comment on, create, update, or even delete codesets. Once approved, changes are published for viewing and distributed across all relevant business systems.
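
A four-eyes approval step can be sketched as a small state machine in which the author of a change can never be its approver. In this Python sketch, the `ChangeRequest` class, its status names, and the users `alice` and `bob` are hypothetical; real RDM workflows are usually configurable and involve more roles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    """A proposed codeset change: draft -> submitted -> approved -> published."""
    summary: str
    author: str
    status: str = "draft"
    approver: Optional[str] = None

    def submit(self) -> None:
        self._require("draft")
        self.status = "submitted"

    def approve(self, reviewer: str) -> None:
        self._require("submitted")
        # Four-eyes principle: a second person must approve the change.
        if reviewer == self.author:
            raise PermissionError("authors cannot approve their own changes")
        self.approver = reviewer
        self.status = "approved"

    def publish(self) -> None:
        # Only approved changes reach consuming systems.
        self._require("approved")
        self.status = "published"

    def _require(self, expected: str) -> None:
        if self.status != expected:
            raise ValueError(f"expected status {expected!r}, got {self.status!r}")


change = ChangeRequest("Add code QIX-201-C", author="alice")
change.submit()
try:
    change.approve("alice")
except PermissionError as err:
    print("rejected:", err)
change.approve("bob")
change.publish()
print(change.status, "approved by", change.approver)
```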

5. Versioning

Anticipating required updates to reference data should be a priority, as policy changes, new products or parts, location changes, and other new procedures all require ongoing management.

RDM addresses this requirement through versioning. In the example below, the first row, indicating the current, or active, part specification QIX-102-A, is set to be automatically ‘retired’ on March 13, 2015. Upon retirement, RDM will simultaneously publish the replacement part specification, QIX-201-C.

ID | Part specification | Valid from | Valid to
33 | QIX-102-A | 2000-01-01 | 2015-03-13
34 | QIX-201-C | 2015-03-13 | 2099-01-01
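
Consumers typically need to ask which version was valid on a given date. The Python sketch below assumes half-open validity intervals, where a row stops being valid on its "valid to" date, which is the same day its replacement becomes valid; that convention and the `active_spec` name are assumptions, not a description of a specific RDM product.

```python
from datetime import date

# The versioned rows from the table above.
PART_SPECS = [
    {"id": 33, "spec": "QIX-102-A", "valid_from": date(2000, 1, 1), "valid_to": date(2015, 3, 13)},
    {"id": 34, "spec": "QIX-201-C", "valid_from": date(2015, 3, 13), "valid_to": date(2099, 1, 1)},
]


def active_spec(rows, as_of: date) -> str:
    """Return the specification valid on `as_of`, using [valid_from, valid_to) intervals."""
    for row in rows:
        if row["valid_from"] <= as_of < row["valid_to"]:
            return row["spec"]
    raise LookupError(f"no specification valid on {as_of}")


print(active_spec(PART_SPECS, date(2015, 3, 12)))  # QIX-102-A
print(active_spec(PART_SPECS, date(2015, 3, 13)))  # QIX-201-C
```

With half-open intervals, exactly one row is active on the switch-over date, so the retirement of QIX-102-A and the publication of QIX-201-C happen at the same moment without overlap.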

6. Sharing trusted reference data

In addition to promoting a centralized governance approach, RDM’s data management mission includes sharing data with all relevant business systems and users.

There are multiple ways governed reference data can be used and made available throughout an organization:

  • System/application integration: RDM can integrate directly with business systems, publishing data to local drives, cloud storage, or FTP servers used by HR, finance, sales, marketing, and other teams.
  • Data catalogs: Many modern data catalogs now offer reference data. A data catalog can become a trusted source for reference data, enabling users to search for and request codesets for business tasks and transactions.
  • Data warehouses: Companies can also use their data warehouses to store archived or historical reference data for analytical reporting. Data marts are likely to maintain commercially active versions of reference data to support customer-facing activities and applications.
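
How reference data is delivered to each system varies by product and integration. As a neutral illustration, the Python sketch below serializes only the currently valid rows of a versioned codeset to CSV text, which could then be placed on a drive, in cloud storage, or on an FTP server. The `export_active_csv` function and its column layout are assumptions.

```python
import csv
import io
from datetime import date

# A versioned codeset, as in the versioning example.
CODESET = [
    {"code": "QIX-102-A", "valid_from": date(2000, 1, 1), "valid_to": date(2015, 3, 13)},
    {"code": "QIX-201-C", "valid_from": date(2015, 3, 13), "valid_to": date(2099, 1, 1)},
]


def export_active_csv(rows, as_of: date) -> str:
    """Serialize the rows valid on `as_of` to CSV text for downstream systems."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["code", "valid_from", "valid_to"],
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Half-open validity interval: [valid_from, valid_to).
        if row["valid_from"] <= as_of < row["valid_to"]:
            # str() on a date yields ISO 8601 text, e.g. "2015-03-13".
            writer.writerow({key: str(value) for key, value in row.items()})
    return buffer.getvalue()


print(export_active_csv(CODESET, date(2021, 6, 1)), end="")
```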

Benefits of reference data management

Much like other data domains, the value of reference data depends entirely on the knowledge, skill, and understanding of the people who use it.

A mature approach to RDM can bring many benefits to organizations that invest in it, including:

  • Automated reference data governance
  • Reduced overhead by eliminating reliance on spreadsheets, data shared by email, and complex SQL queries
  • A governance environment that increases the quality and availability of data
  • A single source of reference data for designated users and systems
  • Increased trust in data and, as a result, better decision-making
  • Improved reporting for tracking policy and regulatory compliance
  • Increased productivity through accurate and accessible reference data
  • Synergy with master data management (MDM) to produce a 360-degree view of any business domain

And much more.

Start managing reference data centrally

Reference data management captures the “What, Where, When, Who, and How” of organizational data.

If you’re looking for more information about technology that can help you implement RDM, or if you’re interested in it as part of your data career, take a closer look at our modular reference data management platform, Ataccama One.

It’s the only solution that natively integrates RDM, master data management, data quality, and a data catalog. Get started in no time in our secure, scalable cloud.

Author

Ataccama

Our unified data trust platform helps organizations improve decision-making, enhance operational efficiency, and mitigate risks.

Published at 01.06.2021
Updated at 07.08.2025
