3 billion dollars. That's the average cost for a pharmaceutical company to bring a drug to market. The process takes between 10 and 15 years, and the success rate is a mere 12%. This means roughly 7 out of 8 drugs fail at a late stage of development, after years of work have already been invested.
During this long process, vast amounts of data are generated and exchanged between internal teams and external partners via dozens of specialized IT solutions. And while each solution plays an important role (managing clinical trials, managing compounds, creating models and simulations), the sheer number of them creates a level of data and technology complexity that pharma companies simply cannot ignore.
And many don't. They understand the importance and benefits of sound data management principles and data governance programs. As Sangeet Khullar, Director of Data Science & Engineering at Daiichi Sankyo, affirms, "The pharma industry is coming to a conclusion that we must have an MDM platform."
Companies that ignore this complexity risk letting their data become a huge mess, affecting everything from regulatory compliance and financial reporting to the duration of the drug development process. But with a robust enterprise MDM solution, who knows? Maybe those 10 to 15 years could shrink to 5 or 10.
Data management drivers in pharma
Let’s look at the drivers and opportunities for data management in the pharmaceutical industry, specifically for clinical study data.
Pharma companies face strict regulatory scrutiny that goes beyond reporting drug attributes in standardized formats. The later stages of drug development involve human subjects, which means patient data is collected and must be properly stored and protected to comply with GDPR and similar regulations.
That's not all, though. After a drug hits the market, pharmaceutical companies promote it to individual doctors and hospitals, often with incentives such as covering trade show tickets and travel expenses. In the US, these marketing activities are regulated under the Physician Payments Sunshine Act, which requires pharmaceutical companies to report the amount spent per doctor or institution.
Data volume and complexity
Drug discovery is a scientific process that involves the collection, exchange, and processing of large amounts of data. Internal teams send data to universities, labs, and clinical research organizations and get results back. The data they receive might be in a different format, contain invalid values, or use different (or missing) measurement units.
Internal teams cannot control the quality of the data they receive, but they must understand it before deciding whether to use it.
Moreover, pharma companies must onboard and use standardized industry dictionaries, such as MedDRA and ATC codes, along with other reference data.
All of this takes place within a data landscape of several dozen transactional systems that aren’t integrated (and integrating them isn’t always practical).
Another side of this data complexity is its effect on the operational efficiency of various teams. Without an established data governance program or data management practices, pharma enterprises might end up using inconsistent versions of the same reference data, creating duplicate studies in different systems, and wasting time harmonizing, mapping, and stitching this data together.
How data management disciplines help pharma companies calm the data storm
Data profiling
Data profiling is the process of analyzing data and inferring information about it, such as data patterns, numeric statistics, data domains, dependencies, and relationships.
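As a rough illustration of what a profiler infers, the following minimal Python sketch profiles a single column of raw string values: the missing-value ratio, the distinct count, the most common value patterns (letters shown as `A`, digits as `9`), and basic numeric statistics. The function names and the pattern notation are assumptions made for this example, not the API of any particular profiling tool.

```python
import re
import statistics
from collections import Counter
from typing import Optional


def value_pattern(value: str) -> str:
    """Map a value to a coarse pattern: letters become 'A', digits become '9'."""
    pattern = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", pattern)


def profile_column(values: list[Optional[str]]) -> dict:
    """Compute simple profiling statistics for one column of raw string values."""
    present = [v for v in values if v not in (None, "")]
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except ValueError:
            pass  # non-numeric values only contribute to the pattern statistics
    profile = {
        "count": len(values),
        "missing_ratio": 1 - len(present) / len(values) if values else 0.0,
        "distinct": len(set(present)),
        "top_patterns": Counter(value_pattern(v) for v in present).most_common(3),
    }
    if numeric:
        profile.update(min=min(numeric), max=max(numeric), mean=statistics.mean(numeric))
    return profile
```

Running `profile_column(["12.5", "7", None, "abc", ""])` reports a missing ratio of 0.4 and three distinct patterns, one of which (`AAA`) flags a non-numeric value in what otherwise looks like a numeric column.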
How data profiling helps pharma teams
At least two groups of users benefit from data profiling in pharma enterprises: scientists and data engineers.
Scientists can understand the quality and contents of the data they receive from partner institutions and decide whether to use it. They can immediately see which data domains the incoming data contains: PII, product codes, adverse events, gene symbols, etc. Additionally, if a reference data management (RDM) solution is connected to the profiler, they can see whether (and to what extent) the values match reference data.
As for data engineers, they can identify data quality issues or inconsistencies and resolve them before the data moves through the pipeline. Data profiling helps them create the right transformations to align incoming data with internal data standards and to automate issue resolution. After all, similar issues are likely to originate from the same source in the future. More about this in the Data quality section below.
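A transformation of this kind can be sketched as follows, assuming (hypothetically) that the internal standard stores all masses in milligrams. `UNIT_TO_MG` and `standardize_mass` are illustrative names; a real pipeline would take the allowed units from managed reference data rather than a hard-coded dictionary.

```python
from typing import Optional

# Hypothetical internal standard: all masses are stored in milligrams (mg).
# Keys are lowercase unit spellings received from partners; values are factors to mg.
UNIT_TO_MG = {
    "mg": 1.0, "milligram": 1.0, "milligrams": 1.0,
    "g": 1000.0, "gram": 1000.0, "grams": 1000.0,
    "mcg": 0.001, "ug": 0.001, "µg": 0.001, "microgram": 0.001,
}


def standardize_mass(value: float, unit: Optional[str]) -> tuple[Optional[float], Optional[str]]:
    """Convert a (value, unit) pair to mg; return (None, reason) if it cannot be aligned."""
    if unit is None or not unit.strip():
        return None, "missing unit"
    factor = UNIT_TO_MG.get(unit.strip().lower())
    if factor is None:
        return None, f"unknown unit: {unit!r}"
    return value * factor, None
```

`standardize_mass(0.5, "G")` returns `(500.0, None)`, while a missing or unrecognized unit returns `None` plus a reason, so the record can be routed for review instead of silently passing through the pipeline.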
Critical data profiling features for pharma
- Integration with RDM: Checking whether data matches reference data codebooks gives scientists an understanding of what sort of data they received for analysis.
- Drilldown to data: The ability to see data samples based on specific characteristics that profiling discovered. For example, 10% of values might be missing in a column containing measurement units. Can this data set be used?
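Both features can be approximated in a few lines. This sketch, with hypothetical names and toy product codes, computes the share of non-empty values that match a reference codebook and returns a sample of the non-matching values to drill into.

```python
from typing import Optional


def reference_match(values: list[Optional[str]], codebook: set[str], sample_size: int = 5) -> dict:
    """Share of non-empty values found in a codebook, plus a sample of non-matching values."""
    present = [v.strip() for v in values if v and v.strip()]
    matched = [v for v in present if v in codebook]
    unmatched = [v for v in present if v not in codebook]
    return {
        "match_ratio": len(matched) / len(present) if present else 0.0,
        "missing": len(values) - len(present),
        "unmatched_sample": unmatched[:sample_size],  # drill-down: values to inspect
    }
```

For the values `["P-001", "P-002", "P-999", "", None, " P-001 "]` and the toy codebook `{"P-001", "P-002"}`, the sketch reports a match ratio of 0.75, two missing values, and `["P-999"]` as the sample to inspect.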
Reference data management
Reference data management is the discipline of managing codebooks (or dictionaries) containing lists of allowed values for a specific data domain. This is slowly changing categorical data, which, in the case of pharma, includes substance codes, product codes, adverse events, gene codes, and diseases.
How reference data management benefits pharma companies
Reference data is at the core of pharmaceutical research and drug discovery. It gives meaning to transactional information and provides every department with a single source of truth, be it clinical operations, pharmacovigilance, or analytics.
The key is to manage reference data centrally and provide a consistent version of it to all teams and systems. Since many departments use the same external dictionaries, such as MedDRA, it's cheaper to procure them once and distribute the data to everyone. A central solution is also a single place for authoring internal reference data and for managing translations of internal codes to industry-standard ontologies. Besides saving money, it reduces the time spent resolving misunderstandings between teams.
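One way to picture a centrally managed translation is a single crosswalk object that every team queries instead of keeping its own mapping. The `Crosswalk` class and the placeholder codes below are hypothetical; they are neither real MedDRA terms nor the interface of any RDM product.

```python
from dataclasses import dataclass, field


@dataclass
class Crosswalk:
    """One shared mapping from internal codes to codes in an external standard."""
    target_standard: str
    mapping: dict[str, str] = field(default_factory=dict)

    def add(self, internal_code: str, standard_code: str) -> None:
        """Author a mapping; re-adding the same pair is allowed, a conflicting one is not."""
        existing = self.mapping.get(internal_code)
        if existing is not None and existing != standard_code:
            raise ValueError(f"{internal_code} is already mapped to {existing}")
        self.mapping[internal_code] = standard_code

    def translate(self, internal_code: str) -> str:
        """Translate an internal code, failing loudly when no mapping has been authored."""
        if internal_code not in self.mapping:
            raise KeyError(f"no {self.target_standard} mapping for {internal_code}")
        return self.mapping[internal_code]


# Placeholder codes for illustration only (not real MedDRA terms).
meddra = Crosswalk("MedDRA")
meddra.add("AE-HEADACHE", "PT-0001")
print(meddra.translate("AE-HEADACHE"))  # prints PT-0001
```

Refusing a conflicting remapping is the design choice that matters here: two teams cannot silently translate the same internal code to different standard terms.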
Finally, a centralized RDM solution enables data teams to create accurate, consistent regulatory reports. Their importance cannot be overstated in the pharmaceutical industry: submitting inconsistent data can cost a company its drug license.
Important RDM features for pharma
- Configurable approval workflow: Support for handling different types of reference data. Industry-standard reference data such as MedDRA requires minimal validation by stewards and can go through a simplified workflow. Creating and editing internal reference data, on the other hand, calls for a four-eyes validation process.
- Data versioning: The ability to schedule the publication of reference data changes that should take effect from a certain date.
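A minimal sketch of both features, assuming a simple policy in which external dictionaries need one approver and internal codebooks need two distinct approvers (the four-eyes principle). The classes, field names, and approval counts are illustrative assumptions rather than the behavior of a specific RDM tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ReferenceValue:
    code: str
    label: str
    valid_from: date  # the date from which this version takes effect


@dataclass
class Codebook:
    """A reference data codebook whose entries can be scheduled to take effect later."""
    name: str
    external: bool  # True for industry dictionaries (e.g. MedDRA), False for internal codebooks
    versions: dict[str, list[ReferenceValue]] = field(default_factory=dict)

    def required_approvals(self) -> int:
        # Assumed policy: simplified workflow for external data, four-eyes for internal data.
        return 1 if self.external else 2

    def publish(self, value: ReferenceValue, approvers: set[str]) -> None:
        """Store a new version once enough distinct people have approved it."""
        needed = self.required_approvals()
        if len(approvers) < needed:
            raise PermissionError(f"{self.name}: {needed} distinct approver(s) required")
        history = self.versions.setdefault(value.code, [])
        history.append(value)
        history.sort(key=lambda v: v.valid_from)

    def lookup(self, code: str, as_of: date) -> Optional[ReferenceValue]:
        """Return the version of `code` in effect on `as_of`, or None if none is valid yet."""
        in_effect = [v for v in self.versions.get(code, []) if v.valid_from <= as_of]
        return in_effect[-1] if in_effect else None
```

Because every version keeps its `valid_from` date, a change can be approved and published today while lookups before the scheduled date keep returning the previous label.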
Data quality
Data quality management is a set of practices aimed at ensuring that data is valid, accurate, complete, and consistent or, in practical terms, fit for the task at hand.
Why data quality is important for pharma
Pharma companies constantly form new partnerships with research institutions. Each new partner brings new data quality issues that data stewards need to deal with, as discussed in the data profiling section above. To automate this work, it is necessary to set up a library of data quality rules linked to specific data domains.
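A rule library linked to data domains can be as simple as a mapping from a domain name to named checks. The sketch below uses two hypothetical domains (an email address as an example of PII, and a gene symbol) with deliberately simplified rules; the gene symbol check only roughly approximates the uppercase letters-and-digits shape of human gene symbols. It reports the pass rate of each rule for a given data set.

```python
import re
from typing import Callable

# Hypothetical rule library: data domain -> rule name -> check returning True for valid values.
RULES: dict[str, dict[str, Callable[[str], bool]]] = {
    "email": {
        "not_empty": lambda v: bool(v.strip()),
        "email_format": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    },
    "gene_symbol": {
        "not_empty": lambda v: bool(v.strip()),
        "uppercase_alnum": lambda v: re.fullmatch(r"[A-Z][A-Z0-9-]*", v) is not None,
    },
}


def evaluate(values: list[str], domain: str) -> dict[str, float]:
    """Return the pass rate of every rule linked to `domain` over the given values."""
    rules = RULES[domain]
    if not values:
        return {name: 1.0 for name in rules}  # an empty data set trivially passes
    return {name: sum(rule(v) for v in values) / len(values) for name, rule in rules.items()}
```

For example, `evaluate(["BRCA1", "tp53", "EGFR", ""], "gene_symbol")` returns `{"not_empty": 0.75, "uppercase_alnum": 0.5}`. Because rules are attached to domains rather than to individual data sets, any new column that profiling tags with a known domain can be checked immediately.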
However, external data is not the only data that needs treatment; all internally generated data needs to be quality-controlled, too. To stay on top of data quality deterioration, data managers need data monitoring that provides on-demand dashboards and reports. Automated data standardization routines are equally important: they cleanse all new data and prevent data quality from deteriorating in the first place.
Important data quality capabilities for pharma
- DQ firewall: The ability to validate incoming data to prevent invalid data from getting into storage systems.
- Data transformation and enrichment: Ensuring that the best version of the data reaches data warehouses and data lakes.
- DQ evaluation: The ability to quickly check a given data set against a business rule. This can run automatically as part of data profiling or on demand when a scientist needs to check data before using it.
- DQ monitoring: Producing data quality reports per system, data domain, or view and making them available on demand or as scheduled exports.
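The firewall and monitoring capabilities fit together: records that fail validation are quarantined instead of loaded, and the failures feed a per-field report. This sketch assumes hypothetical field-level checks on `study_id` and `unit`; in practice, such checks would be defined centrally rather than hard-coded.

```python
from collections import defaultdict
from typing import Callable

Record = dict[str, str]

# Hypothetical field-level checks; each returns True when the value is acceptable.
CHECKS: dict[str, Callable[[str], bool]] = {
    "study_id": lambda v: v.startswith("STU-"),
    "unit": lambda v: v in {"mg", "g", "mcg"},
}


def firewall(records: list[Record]) -> tuple[list[Record], list[tuple[Record, list[str]]]]:
    """Accept records that pass every check; quarantine the rest with the failed check names."""
    accepted: list[Record] = []
    quarantined: list[tuple[Record, list[str]]] = []
    for record in records:
        # A missing field is checked as an empty string, so it fails like an invalid value.
        failed = [name for name, check in CHECKS.items() if not check(record.get(name, ""))]
        if failed:
            quarantined.append((record, failed))
        else:
            accepted.append(record)
    return accepted, quarantined


def monitoring_report(quarantined: list[tuple[Record, list[str]]], total: int) -> dict[str, float]:
    """Failure rate per checked field, the kind of figure a DQ dashboard might show."""
    failures: dict[str, int] = defaultdict(int)
    for _, failed in quarantined:
        for name in failed:
            failures[name] += 1
    return {name: failures[name] / total if total else 0.0 for name in CHECKS}
```

Only the accepted records would be written to the warehouse or lake; the quarantined ones keep the reason for rejection, which is what a steward needs to fix them.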
Master data management
Master data management is a discipline focused on creating complete, up-to-date, duplicate-free versions of critical enterprise data entities. For pharma, this typically includes products, compounds, studies, investigators, partners, and customers.
Why pharma companies should master clinical studies
Clinical studies are one of the key data entities for pharma companies. Typically, this data suffers from several issues that make mastering necessary.
First, since pharma companies are often large multinational corporations, data about clinical studies originates in different departments across countries. This, in turn, leads to duplicate entries in different systems, each holding only part of the information.
Second, these duplicate entries often use different naming conventions for studies, greatly complicating the use of this data for analytics, operations, and regulatory reporting.
This means that pharmas can end up not knowing how many studies are currently in progress, who the lead investigator is, or whether specific rules for study validity are met.
An MDM platform addresses these issues by harmonizing discrepancies in naming conventions and consolidating data from all systems into a golden record with complete information. The resulting MDM hub hosts an up-to-date, single source of truth for all clinical studies.
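The consolidation step can be sketched in Python under two simplifying assumptions: studies are matched on a normalized protocol ID (so `ABC-123`, `abc 123`, and `ABC_123` collapse to one key), and for each attribute the most recently updated non-empty value survives. The source system names, field names, and survivorship rule are hypothetical; real MDM platforms offer much richer matching and survivorship configuration.

```python
import re
from collections import defaultdict
from datetime import date

# Fields that describe the source record itself rather than the study.
META_FIELDS = {"source", "protocol_id", "updated"}


def match_key(protocol_id: str) -> str:
    """Harmonize naming conventions: 'ABC-123', 'abc 123', and 'ABC_123' all become 'ABC123'."""
    return re.sub(r"[^A-Z0-9]", "", protocol_id.upper())


def golden_records(source_records: list[dict]) -> dict[str, dict]:
    """Group records by match key; per attribute, the most recently updated non-empty value wins."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for rec in source_records:
        groups[match_key(rec["protocol_id"])].append(rec)

    result = {}
    for key, recs in groups.items():
        merged = {}
        # Oldest first, so newer non-empty values overwrite older ones.
        for rec in sorted(recs, key=lambda r: r["updated"]):
            for attr, value in rec.items():
                if attr not in META_FIELDS and value not in (None, ""):
                    merged[attr] = value
        merged["sources"] = sorted(r["source"] for r in recs)
        result[key] = merged
    return result


# Hypothetical source records from two systems describing the same study.
studies = [
    {"source": "CTMS", "protocol_id": "ABC-123", "updated": date(2024, 1, 10),
     "phase": "II", "lead_investigator": ""},
    {"source": "EDC", "protocol_id": "abc 123", "updated": date(2024, 3, 5),
     "phase": "", "lead_investigator": "Dr. Novak"},
]
print(golden_records(studies))
# prints {'ABC123': {'phase': 'II', 'lead_investigator': 'Dr. Novak', 'sources': ['CTMS', 'EDC']}}
```

Neither source record is complete on its own, but the golden record combines the phase from one system with the lead investigator from the other and keeps track of where each study was found.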
As soon as the MDM hub is established, not only does it provide data to everyone who requires it, but it also prevents connected systems from creating duplicate studies. It acts as an authoritative registry for all systems and users.
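The registry behavior can be sketched as a lookup that runs before any connected system creates a study. The `StudyRegistry` class, the `MS-00001` ID format, and the normalization rule are assumptions made for illustration.

```python
import re


class StudyRegistry:
    """Hypothetical authoritative registry: connected systems register studies here first."""

    def __init__(self) -> None:
        self._studies: dict[str, str] = {}  # normalized protocol ID -> master study ID

    @staticmethod
    def _key(protocol_id: str) -> str:
        # Ignore case and punctuation so naming variants map to the same study.
        return re.sub(r"[^A-Z0-9]", "", protocol_id.upper())

    def register(self, protocol_id: str) -> tuple[str, bool]:
        """Return (master_id, created); an existing study is returned instead of a duplicate."""
        key = self._key(protocol_id)
        if key in self._studies:
            return self._studies[key], False
        master_id = f"MS-{len(self._studies) + 1:05d}"
        self._studies[key] = master_id
        return master_id, True
```

Registering `"ABC-123"` and then `"abc 123"` returns the same master ID, with `created` set to `False` the second time, so the second system links to the existing study rather than creating a new one.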
Important MDM capabilities for mastering clinical studies
- Integration with RDM: Since reference data is so important in pharma, it makes sense to use it in record-matching logic. Up-to-date reference data is also available when creating a new master study.
- Stewardship features: These include manual review of AI-suggested or lower-confidence matches, DQ issues in records that need a steward's attention, and suggestions from business users. All of these are critical in the initial stages of an MDM project and when fine-tuning matching rules.
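A steward review queue usually sits between automatic merging and rejection. As a simple stand-in for the AI-based matching mentioned above, this sketch scores two study titles with Python's `difflib.SequenceMatcher` and routes each pair by confidence. The thresholds are hypothetical and would be tuned during the project.

```python
from difflib import SequenceMatcher

# Hypothetical confidence thresholds; in a real project they are tuned against steward decisions.
AUTO_MERGE_THRESHOLD = 0.9
REVIEW_THRESHOLD = 0.7


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route_match(candidate: str, master: str) -> str:
    """Merge confident matches, queue uncertain ones for a steward, keep the rest separate."""
    score = similarity(candidate, master)
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto-merge"
    if score >= REVIEW_THRESHOLD:
        return "steward review"
    return "no match"


def triage(pairs: list[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group candidate/master pairs by decision; 'steward review' is the steward's work queue."""
    buckets: dict[str, list[tuple[str, str]]] = {"auto-merge": [], "steward review": [], "no match": []}
    for candidate, master in pairs:
        buckets[route_match(candidate, master)].append((candidate, master))
    return buckets
```

Steward decisions on the review bucket are also the natural input for adjusting the thresholds as matching rules are fine-tuned.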