Data glossary

The vocabulary behind AI-ready data. From data quality and
lineage to agents, data trust, and the modern data stack.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A

AI agents

AI agents are autonomous software systems that perceive their environment, make decisions, and take actions to complete defined objectives without constant human input. In data management, AI agents automate data stewardship tasks like profiling, rule generation, classification, anomaly detection, and issue resolution.

AI readiness

AI readiness is an organization's ability to successfully adopt, scale, and govern AI initiatives, measured across strategy, data, talent, and infrastructure. Trusted, high-quality, well-governed data is the foundation of an AI-ready organization and a major factor in whether AI projects succeed.

AI-ready data

AI-ready data is data that is accurate, complete, contextualized, and governed enough to safely train, ground, or operate AI models and agents in production. It combines data quality, lineage, metadata, and policy enforcement in a single trust layer that AI systems can depend on.

Agentic AI

Agentic AI describes AI systems that operate autonomously to plan, reason, and execute multi-step tasks with limited human oversight. In data management, agentic AI handles work like rule creation, anomaly investigation, classification, and remediation across data assets at machine speed.

Agentic data trust

Agentic data trust is a platform approach that uses autonomous AI agents to deliver continuously trusted, AI-ready data. It unifies data quality, observability, governance, lineage, and master data on a single trust layer that operates without manual rule writing or cleanup.

Anomaly detection

Anomaly detection is the automated process of identifying data points, patterns, or events that deviate significantly from expected behavior or established norms. Often powered by machine learning, it surfaces outliers and silent data quality issues that fixed, rules-based checks would miss.
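
As a rough illustration of the idea, the sketch below flags outliers in a series of daily row counts using a modified z-score based on the median absolute deviation. This is a simple statistical stand-in, not a machine learning model, and the function name, sample data, and the 3.5 threshold are assumptions chosen for the example.

```python
from statistics import median

def find_anomalies(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure deviations against
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical daily row counts for one table; day 4 is a silent partial load.
daily_row_counts = [1020, 998, 1005, 1011, 140, 1003, 995]
print(find_anomalies(daily_row_counts))  # [(4, 140)]
```

A fixed rule such as "row count must be above zero" would pass every day in this series, which is the kind of gap anomaly detection is meant to close.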

Augmented data lineage

Augmented data lineage enriches traditional technical lineage with business context, data quality signals, ownership metadata, and AI-generated insights. It makes lineage usable for business analysts, compliance teams, and data stewards, not just data engineers and ETL developers.

Augmented data quality

Augmented data quality applies AI and machine learning to automate the creation, application, and monitoring of data quality rules at scale. Gartner uses the term in the title of its Magic Quadrant for Augmented Data Quality Solutions, which evaluates vendors that automate traditionally manual data quality work.

C

CCPA

CCPA, the California Consumer Privacy Act, is a data privacy law that gives California residents rights over how businesses collect, store, sell, and share their personal data. Compliance requires disclosures, data subject request workflows, governed access controls, and clear opt-out mechanisms.

Chief Data Officer (CDO)

A Chief Data Officer (CDO) is the senior executive accountable for an organization's data strategy, governance, quality, and business value realization. The CDO role bridges business outcomes and data infrastructure decisions, typically reporting to the CEO, COO, or CIO.

Customer 360

Customer 360, also called a single view of customer, is a unified and trusted view of a customer built by consolidating data from every system that interacts with them. It enables personalization, accurate segmentation, and consistent experience across sales, service, and marketing channels.

D

Data catalog

A data catalog is a centralized, searchable inventory of an organization's data assets, enriched with metadata, lineage, ownership, business glossary terms, and data quality signals. It helps data consumers find, understand, and trust the data they need for analytics and AI.

Data compliance

Data compliance is the practice of ensuring that data collection, handling, storage, sharing, retention, and deletion align with applicable laws, regulations, industry standards, and internal policies. Common compliance frameworks include GDPR, CCPA, HIPAA, SOX, and PCI DSS.

Data contracts

Data contracts are explicit, machine-readable agreements between data producers and consumers that define expected schema, quality, semantics, and service level agreements. They shift data quality enforcement left, into the pipelines and source systems that generate data.
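
A minimal sketch of the schema portion of a data contract, assuming a hypothetical "orders" feed. The `FieldSpec` class, the `ORDERS_CONTRACT` list, and the field names are invented for illustration. Real contracts usually also cover semantics, quality thresholds, and service levels, which this example leaves out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: type
    nullable: bool = False

# Hypothetical contract agreed between the producer and consumers of "orders".
ORDERS_CONTRACT = [
    FieldSpec("order_id", str),
    FieldSpec("amount", float),
    FieldSpec("coupon_code", str, nullable=True),
]

def contract_violations(record, contract):
    """List human-readable contract violations for one record."""
    problems = []
    for spec in contract:
        if spec.name not in record:
            problems.append(f"missing field: {spec.name}")
            continue
        value = record[spec.name]
        if value is None:
            if not spec.nullable:
                problems.append(f"null not allowed: {spec.name}")
        elif not isinstance(value, spec.dtype):
            problems.append(
                f"wrong type for {spec.name}: {type(value).__name__}"
            )
    return problems

print(contract_violations({"order_id": 7, "amount": None}, ORDERS_CONTRACT))
```

Running a check like this in the producing pipeline, before data is published, is what moves enforcement left.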

Data curation

Data curation is the end-to-end process of organizing, classifying, enriching, and maintaining data so it remains accurate, discoverable, and usable over time. It covers everything from ingestion and profiling through to access control, documentation, and eventual retirement.

Data democratization

Data democratization is the practice of giving every employee, not just technical users, safe and governed self-service access to the data they need to do their jobs. It depends on trustworthy data, clear business definitions, data literacy, and easy discovery through a catalog.

Data fabric

Data fabric is an architectural approach that uses active metadata, automation, knowledge graphs, and AI to deliver unified access to data across siloed systems, clouds, and applications. It connects and integrates data rather than physically consolidating it into one warehouse.

Data governance

Data governance is the framework of policies, roles, processes, and standards that ensures data is accurate, secure, compliant, and used responsibly across an organization. It defines who can take what action with which data, with accountability enforced through data stewardship.

Data integration

Data integration is the process of combining data from multiple sources into a unified, consistent view for analytics, operations, or AI. Common data integration techniques include ETL, ELT, streaming, change data capture, data virtualization, and API-based integration.

Data lineage

Data lineage tracks data's complete journey from its source, through every transformation and system, to its final destination. It provides visibility into how data flows and changes over time, powering impact analysis, root cause investigation, regulatory compliance, and audit readiness.
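
Lineage is commonly modeled as a directed graph of assets. The sketch below assumes a small, made-up set of lineage edges and shows how impact analysis (walking downstream) and root cause investigation (walking upstream) both reduce to graph traversal. All asset names are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["bi.churn_dashboard", "ml.churn_features"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["bi.churn_dashboard"],
}

def downstream(asset, edges):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen, queue, order = {asset}, deque([asset]), []
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

def upstream(asset, edges):
    """Return every asset that feeds `asset`, by traversing reversed edges."""
    reverse = {}
    for parent, children in edges.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(asset, reverse)

# Impact analysis: what is affected if crm.customers changes?
print(downstream("crm.customers", LINEAGE))
# Root cause investigation: what could have broken bi.churn_dashboard?
print(upstream("bi.churn_dashboard", LINEAGE))
```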

Data mesh

Data mesh is a decentralized data architecture where domain teams own and publish their data as products, governed by shared standards and federated governance. It is an alternative to centralized data lakes and warehouses, designed to scale data ownership across the business.

Data observability

Data observability is the practice of continuously monitoring data health across pipelines and platforms, using signals like freshness, volume, distribution, schema drift, and lineage. It helps teams detect, triage, and remediate data issues before they reach reports, dashboards, or AI models.
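
Two of the signals named above, freshness and volume, can be sketched as simple threshold checks. The function name, the 24-hour freshness window, and the 20% volume tolerance are assumptions for the example. In practice the thresholds are usually learned from history rather than hard-coded.

```python
from datetime import datetime, timedelta, timezone

def health_signals(last_loaded_at, row_count, expected_rows, now,
                   max_age=timedelta(hours=24), volume_tolerance=0.2):
    """Return freshness and volume flags for one table load."""
    age = now - last_loaded_at
    # Relative deviation of the observed row count from the expected one.
    drift = abs(row_count - expected_rows) / expected_rows
    return {
        "fresh": age <= max_age,
        "volume_ok": drift <= volume_tolerance,
    }

# Hypothetical load metadata: the table was last loaded 30 hours ago.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
print(health_signals(last_load, row_count=950, expected_rows=1000, now=now))
# {'fresh': False, 'volume_ok': True}
```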

Data profiling

Data profiling is the practice of analyzing a dataset to surface its structure, content, and quality characteristics, including null counts, value distributions, formats, patterns, and duplicates. It is the starting point for any data quality, governance, or catalog initiative.
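
A minimal profiling sketch over a list of dictionary records, reporting null counts and distinct values per column plus the number of fully duplicated rows. The `profile` function and the sample rows are illustrative assumptions. Dedicated profiling tools also report formats, patterns, and value distributions.

```python
from collections import Counter

def profile(rows):
    """Profile dict records: nulls and distinct values per column, plus duplicate rows."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    # Count rows that repeat an earlier row exactly.
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    report["_duplicate_rows"] = sum(n - 1 for n in row_counts.values())
    return report

# Hypothetical sample: one repeated row, two missing countries, mixed casing.
rows = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": None},
    {"id": 2, "country": None},
    {"id": 3, "country": "de"},
]
print(profile(rows))
```

Here the profile shows two nulls and two distinct spellings in `country` and one duplicated row, each of which would suggest a rule to write next.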

Data quality

Data quality is the degree to which data is fit for its intended business use, decisions, or AI workloads. High-quality data is accurate, complete, timely, consistent, valid, and unique, and forms the foundation for trustworthy analytics, reporting, and AI.

Data quality assurance

Data quality assurance is the structured discipline of measuring, monitoring, and continuously improving data against defined dimensions, rules, and thresholds. It turns data quality from a principle into an operational practice, with profiling, validation, monitoring, and remediation built into daily workflows.

Data quality dimensions

Data quality dimensions are the criteria used to measure data quality, most commonly accuracy, completeness, consistency, timeliness, uniqueness, and validity. They provide a shared vocabulary and measurable framework for assessing whether data is fit for purpose.
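
Three of these dimensions can be expressed as simple ratios over a column of values. The sketch below is one possible formulation, not a standard definition. The function names, the sample values, and the deliberately loose email pattern are assumptions, and each function expects at least one non-null value.

```python
import re

def completeness(values):
    """Share of values that are present (not None or an empty string)."""
    return sum(v not in (None, "") for v in values) / len(values)

def uniqueness(values):
    """Share of non-null values that are distinct."""
    present = [v for v in values if v is not None]
    return len(set(present)) / len(present)

def validity(values, pattern):
    """Share of non-null values that fully match a format rule."""
    present = [v for v in values if v is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in present) / len(present)

# Hypothetical email column with a null, a duplicate, and a malformed value.
emails = ["a@x.com", "b@x.com", None, "a@x.com", "not-an-email"]
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"  # loose check, for illustration only

print(completeness(emails))             # 0.8
print(uniqueness(emails))               # 0.75
print(validity(emails, EMAIL_PATTERN))  # 0.75
```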

Data quality for AI

Data quality for AI is the discipline of ensuring training, fine-tuning, and inference data is accurate, complete, representative, and well-governed enough for AI models and agents to operate safely in production. Poor data quality is one of the most frequently cited causes of failed enterprise AI projects.

Data quality gates

Data quality gates are checkpoints embedded in a data pipeline that validate incoming data against business rules before it is allowed to move downstream. They catch bad data in motion, preventing invalid records from reaching analytics, applications, or AI systems.
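
A gate can be sketched as a function that applies named rules to each record and only lets fully passing records continue. The `quality_gate` function, the rule names, and the currency list below are hypothetical, chosen only to show the shape of the checkpoint.

```python
def quality_gate(records, rules):
    """Split records into those passing every rule and those failing any."""
    passed, rejected = [], []
    for record in records:
        failures = [name for name, check in rules.items() if not check(record)]
        if failures:
            rejected.append((record, failures))  # keep reasons for triage
        else:
            passed.append(record)
    return passed, rejected

# Hypothetical business rules for an incoming payments feed.
RULES = {
    "amount_positive": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] > 0,
    "currency_known": lambda r: r.get("currency") in {"EUR", "USD", "GBP"},
}

incoming = [
    {"amount": 10, "currency": "EUR"},
    {"amount": -5, "currency": "XXX"},
]
passed, rejected = quality_gate(incoming, RULES)
print(passed)    # [{'amount': 10, 'currency': 'EUR'}]
print(rejected)  # [({'amount': -5, 'currency': 'XXX'}, ['amount_positive', 'currency_known'])]
```

Only `passed` would be handed to the next pipeline stage. Rejected records are typically quarantined along with the names of the rules they failed.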

Data quality issues

Data quality issues are problems in data that make it unfit for its intended use, including missing values, duplicate records, inconsistent formats, schema drift, and inaccurate or stale entries. They have direct downstream costs across analytics, operations, compliance, and AI.

Data quality monitoring

Data quality monitoring is the ongoing process of measuring data against defined rules and dimensions to detect issues as they emerge. AI-powered monitoring extends coverage with anomaly detection, surfacing unknown issues that fixed rules would never catch.

Data stack

A data stack is the collection of tools an organization uses to ingest, store, transform, govern, observe, and analyze data. The modern data stack typically refers to cloud-native, modular components such as Snowflake, dbt, Fivetran, and a data catalog or governance layer.

Data steward

A data steward is the person accountable for the quality, definition, governance, and appropriate use of data within a specific business domain. Data stewards bridge business and IT, owning the rules, glossary terms, and remediation workflows that keep data trustworthy.

Data trust

Data trust is the confidence that data is accurate, complete, secure, well-governed, and used appropriately by both people and AI systems. It is built and maintained through the combination of data quality, observability, governance, lineage, and transparent stewardship.

Data visualization

Data visualization is the practice of representing data graphically through charts, dashboards, infographics, and interactive diagrams to make patterns, trends, and outliers easier to understand. It is a core capability of business intelligence and self-service analytics tools.

E

EU AI Act

The EU AI Act is the European Union's regulation governing the development, deployment, and use of AI systems, with strict requirements for high-risk applications such as credit scoring, hiring, and biometrics. Article 10 sets data and data governance requirements for high-risk systems, covering the quality, provenance, and preparation of training, validation, and testing datasets.

Enterprise data quality fabric

Enterprise data quality fabric is an automated, metadata-driven approach to data quality that operates across the entire data estate. It combines a data catalog, AI-augmented rule generation, automated profiling, and continuous monitoring on a unified fabric architecture rather than in isolated tools.

F

Forrester Wave

The Forrester Wave is a comparative analyst evaluation of vendors in a defined technology market, scored across current offering, strategy, and customer feedback. Vendors are plotted as Leaders, Strong Performers, Contenders, or Challengers based on their performance against weighted criteria.

G

GDPR

GDPR, the General Data Protection Regulation, is the European Union's data privacy and protection law, governing how organizations collect, process, store, and protect the personal data of individuals in the EU. Non-compliance can trigger fines of up to €20 million or 4% of global annual turnover, whichever is higher.

Gartner Magic Quadrant

The Gartner Magic Quadrant evaluates technology vendors on completeness of vision and ability to execute, plotting them as Leaders, Challengers, Visionaries, or Niche Players. The Magic Quadrant for Augmented Data Quality Solutions is Gartner's annual evaluation of the data quality solutions market.

Generative AI

Generative AI is a class of AI models that produces new content such as text, code, images, audio, or structured data based on patterns learned from large training datasets. In data management, generative AI accelerates rule generation, documentation, classification, and discovery.

Golden record

A golden record is the single, authoritative, and trusted version of a business entity such as a customer, product, or supplier, created by reconciling and consolidating data from multiple source systems. It is the core output of master data management.
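
The consolidation step can be sketched with a simple survivorship rule: for each field, keep the most recently updated non-null value. This assumes the source records have already been matched to the same entity and that each carries an ISO-formatted `updated_at` field. Both the rule and the field names are assumptions for illustration, and real implementations often also weigh source reliability.

```python
def golden_record(source_records):
    """Merge matched records, keeping the newest non-null value per field."""
    merged = {}
    # Oldest first, so newer non-null values overwrite older ones.
    for record in sorted(source_records, key=lambda r: r["updated_at"]):
        for field, value in record.items():
            if field != "updated_at" and value is not None:
                merged[field] = value
    return merged

# Hypothetical records for the same customer from two source systems.
sources = [
    {"name": "Ana Lopez", "email": None, "phone": "555-0100",
     "updated_at": "2024-03-01"},
    {"name": "Ana López", "email": "ana@example.com", "phone": None,
     "updated_at": "2024-05-10"},
]
print(golden_record(sources))
```

The result takes the newer name and email from the second record but keeps the phone number from the first, since the newer record has no value for it.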

H

HIPAA

HIPAA, the Health Insurance Portability and Accountability Act, is a US federal law that sets standards for protecting sensitive patient health information. HIPAA compliance requires administrative, physical, and technical safeguards from covered entities and their business associates that handle protected health information (PHI).

M

Master data

Master data is the core, slow-changing business data that describes the entities used across an organization's processes, including customers, products, suppliers, employees, and locations. It is referenced by most transactions and forms the foundation of consistent reporting and operations.

Master data management (MDM)

Master data management (MDM) is the discipline of creating and maintaining a single, authoritative source of truth for an organization's core business entities. MDM combines data governance, matching, deduplication, stewardship, and distribution to deliver golden records across systems.

Metadata

Metadata is data about data, describing what a dataset contains, where it came from, who owns it, when it was last updated, and how it should be used. Metadata includes technical, business, and operational categories, and forms the foundation of every modern data management capability.

Metadata management

Metadata management is the practice of capturing, organizing, storing, and using metadata to power data discovery, governance, lineage, and automation across the data estate. Active metadata management uses AI to keep this context continuously up to date.

R

Reference data management (RDM)

Reference data management (RDM) is the discipline of centrally maintaining and distributing reference data, such as country codes, currency codes, status values, and product categories, so every system across the organization uses the same approved, version-controlled values.

Report catalog

A report catalog is a centralized inventory of an organization's reports, dashboards, and BI assets, enriched with metadata about owners, sources, business context, and quality. It does for analytics outputs what a data catalog does for the underlying datasets.

Root cause analysis

Root cause analysis (RCA) is the systematic process of identifying the underlying source of a data issue rather than just its visible symptom. In data management, RCA combines lineage, observability, ownership, and pipeline context to trace problems back to where they originated.

S

Shift-left data quality

Shift-left data quality is the practice of moving data validation and quality checks earlier in the pipeline, closer to where data is produced or ingested. It catches issues before they propagate downstream into reports, applications, and AI systems where remediation is far more expensive.

T

Trust layer

A trust layer is the architectural layer that ensures data feeding AI, analytics, and applications is accurate, governed, explainable, and traceable. It typically unifies data quality, observability, lineage, governance, and master data in a single, AI-ready foundation.