What is Data Profiling?

Data profiling is the first step in any data initiative. It is a series of checks and analyses undertaken to gain a better understanding of the data.

What exactly is data profiling?

Once you upload your source data, a data profiler generates information about data patterns, numeric statistics, data domains, dependencies, relationships, and anomalies.

Companies can then use this information to evaluate their data sets (or even single columns within a set, through column profiling) and proceed with the data initiative at hand, whether that is a simple data analysis or something more complex, such as building a data quality program, migrating data, designing or reviewing architecture, or creating a master data model. These use cases are covered in more detail below.

Anyone can benefit from using a data profiler because it provides essential information about any data set or data source. To better understand this benefit, let’s look at the types of information captured in a data profile.

What information can you get from data profiling?

Some of the critical insights a data profiling task can provide are:

Data Set Overview

This is an overall summary of your data set. The data profile viewer shows the number of records and attributes, the types of data stored there and how many of each type, any relationships discovered, and so on.
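As a rough illustration of what such an overview contains, here is a minimal sketch in plain Python. The `infer_type` and `overview` helpers and the sample records are hypothetical and written for this example only; they are not how any particular profiler works, and relationship discovery is left out.

```python
from collections import Counter

def infer_type(value):
    """Classify a raw string value as 'null', 'integer', 'decimal', or 'text'."""
    if value is None or value.strip() == "":
        return "null"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "text"

def overview(records):
    """Summarize record count, attribute count, and value types per attribute."""
    attributes = list(records[0].keys()) if records else []
    types = {a: Counter(infer_type(r.get(a)) for r in records) for a in attributes}
    return {"records": len(records), "attributes": len(attributes), "types": types}

# Hypothetical sample data: every value arrives as a string, as in a CSV file.
customers = [
    {"id": "1", "name": "Ann", "balance": "10.50"},
    {"id": "2", "name": "Bob", "balance": ""},
    {"id": "3", "name": "", "balance": "7"},
]
summary = overview(customers)
```

Here `summary["types"]["balance"]` counts one decimal, one null, and one integer value, which already hints that the column mixes formats.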

Basic Data Quality Information

Your profiler will also provide vital information about the quality of the data in your set, based on measures such as completeness (how complete each entry is, and whether it contains null or inaccurate values) and uniqueness (whether the set contains multiple entries for the same data).
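To make the two measures concrete, the sketch below computes them for a single column. The definitions used here (share of non-empty values, and share of non-empty values that are distinct) are one common choice and an assumption of this example; tools may define them differently. Note that a simple emptiness check cannot tell whether a filled-in value is accurate.

```python
def _filled(values):
    """Keep only values that are neither None nor blank."""
    return [v for v in values if v is not None and str(v).strip() != ""]

def completeness(values):
    """Share of values that are neither None nor blank."""
    if not values:
        return 0.0
    return len(_filled(values)) / len(values)

def uniqueness(values):
    """Share of non-empty values that are distinct."""
    filled = _filled(values)
    if not filled:
        return 0.0
    return len(set(filled)) / len(filled)

# Hypothetical sample column with one duplicate and two missing values.
emails = ["a@example.com", "b@example.com", "a@example.com", "", None]
email_completeness = completeness(emails)  # 3 of 5 values are filled in
email_uniqueness = uniqueness(emails)      # 2 of the 3 filled values are distinct
```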

Data Formats and Patterns

Data quality enthusiasts know that many attributes allow only a finite number of valid formats. Postcodes, for example, follow a small set of alphanumeric formats. Profilers can visualize the different formats and patterns found in an attribute so that you can see how many values deviate from the expected ones.
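One simple way to surface formats is to reduce each value to a mask and count the masks. The sketch below assumes a mask convention of `A` for letters and `9` for digits; the convention, the `mask` helper, and the sample postcodes are illustrative choices, not a description of a specific product.

```python
from collections import Counter

def mask(value):
    """Replace letters with 'A' and digits with '9'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)

# Hypothetical sample: the last value has the letter O typed instead of a zero.
postcodes = ["SW1A 1AA", "M1 1AE", "90210", "EC1A 1BB", "9021O"]
patterns = Counter(mask(p) for p in postcodes)
```

Rare masks, such as `9999A` in this sample, are good candidates for a closer look.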

Frequency Analysis

Profilers generate information about repeated values within a data attribute, showing you the most common values and how many distinct values there are.
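A frequency analysis of one attribute can be sketched with the standard library's `collections.Counter`. The `frequency_profile` helper and the sample values are hypothetical.

```python
from collections import Counter

def frequency_profile(values, top=3):
    """Return distinct count, the most common values, and values seen more than once."""
    counts = Counter(values)
    return {
        "distinct": len(counts),
        "most_common": counts.most_common(top),
        "duplicated": sorted(v for v, n in counts.items() if n > 1),
    }

# Hypothetical sample attribute.
cities = ["Prague", "Boston", "Prague", "Toronto", "Boston", "Prague"]
profile = frequency_profile(cities)
```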

Data Domains or Custom Data Tags

Advanced data profiling tools detect what kind of data is stored in a data set and label it. For example, you will see which attributes contain emails, PII, credit card data, or address information.
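The idea of labeling attributes by content can be approximated with pattern matching: if most values in a column look like emails, label the column as email. The sketch below is a deliberately simplified assumption of how that could work. The two regular expressions, the 80% threshold, and the `detect_domain` helper are all illustrative, and production detectors are far more thorough.

```python
import re

# Simplified, illustrative patterns (assumptions for this example only).
DOMAIN_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_zip": re.compile(r"^\d{5}(-\d{4})?$"),
}

def detect_domain(values, threshold=0.8):
    """Return the first label matched by at least `threshold` of non-empty values."""
    filled = [v.strip() for v in values if v and v.strip()]
    if not filled:
        return None
    for label, pattern in DOMAIN_PATTERNS.items():
        hits = sum(1 for v in filled if pattern.match(v))
        if hits / len(filled) >= threshold:
            return label
    return None

# Hypothetical columns from a customer table.
columns = {
    "contact": ["ann@example.com", "bob@example.org", "", "eve@example.net"],
    "postal": ["02139", "10001-0001", "94105"],
    "notes": ["call back", "vip", "n/a"],
}
labels = {name: detect_domain(vals) for name, vals in columns.items()}
```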

Other features include detecting data dependencies, checking data against a specific business rule, and slicing data (for example, by gender, zip code, or city) and analyzing the profiles of those particular slices.
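Rule checks and slicing combine naturally: evaluate a rule on every record, then compare pass rates across slices. In the sketch below, the `adult` rule, the field names, and the sample records are hypothetical.

```python
from collections import defaultdict

def rule_pass_rate_by_slice(records, slice_key, rule):
    """Return the share of records passing `rule` within each value of `slice_key`."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for record in records:
        key = record[slice_key]
        totals[key] += 1
        if rule(record):
            passed[key] += 1
    return {key: passed[key] / totals[key] for key in totals}

def adult(record):
    """Hypothetical business rule: the customer must be at least 18."""
    return record["age"] >= 18

# Hypothetical sample records.
people = [
    {"city": "Prague", "age": 34},
    {"city": "Prague", "age": 16},
    {"city": "Boston", "age": 45},
    {"city": "Boston", "age": 29},
]
rates = rule_pass_rate_by_slice(people, "city", adult)
```

A slice with a noticeably lower pass rate than the others points to where the problem records are concentrated.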

Why is Data Profiling Important?

It’s hard to understand whether or not a data set is useful or usable without profiling it first. Whatever the use case might be, using data without fully understanding its contents and quality is at best irresponsible.

Despite this, businesses often overlook data profiling because the service is usually packaged within a more comprehensive data quality platform. However, in many data-specific use cases, the relevance and usefulness of data profiling are striking.

Use Cases for Data Profiling

In all of these use cases, data profiling is the first step to secure vital information about a data set before moving on.

  • Starting a data quality or data governance initiative. Data profiling is very often the first step in building a data quality or data governance program. It uncovers recurring problems in the data that lead to data quality issues. It can also help data stewards create data rules for cleansing and monitoring data and establish data governance policies.
  • Building a master data model. The benefit of data profiling for master data management is twofold:
    • First, it gives an overview of where the data of interest is located, for example, which systems store customer data.
    • Next, it provides information about inconsistencies in formatting and values, which, if not standardized, would make the data matching process longer and more compute-intensive.
  • Performing data migration. Before a data migration project, profiling data lets data stewards correct errors and perform data cleaning before the data is transferred.
  • Evaluation of data suitability and usability. At some point, everyone works with data. Having a tool that gives you an overview of a data set is useful for anyone from digital marketers to rocket scientists.
  • AI and Machine Learning. Data profiling tools are also an important component of preparing data for AI or machine learning.

Data Profiling Tips

Here are several tips for planning and maximizing the efficiency of your data profiling activities: 

  • Separate priorities from the noise. When profiling a legacy system, you can run into massive walls of erroneous data. The question is whether you should care. You have to decide which data sets are the most important and need their quality addressed first: your critical data elements (CDEs).
  • Be careful about the conclusions you draw from profiling. There are different types of data (reference, transactional, and master), and the type affects both the way you should profile and the actions you take afterward. For example, a data quality issue in a transactional data set might affect only that one entry, whereas a single error in master data could impact thousands of records.
  • Try to narrow down the scope of your profiling as much as possible. If you know that 95% of profit comes from 10% of your sales, you can eliminate large sections of the data you would otherwise need to profile.
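The last tip can be turned into a quick calculation: rank segments by profit and keep only as many as are needed to cover a target share. The sketch below assumes you already have profit totals per segment; the `smallest_covering_set` helper, the segment names, and the figures are made up for illustration.

```python
def smallest_covering_set(profit_by_segment, target_share=0.95):
    """Pick the highest-profit segments until they cover `target_share` of the total."""
    total = sum(profit_by_segment.values())
    ranked = sorted(profit_by_segment.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for segment, profit in ranked:
        if running >= target_share * total:
            break
        selected.append(segment)
        running += profit
    return selected

# Hypothetical profit per segment.
profit = {"A": 900, "B": 60, "C": 25, "D": 10, "E": 5}
to_profile = smallest_covering_set(profit)
```

With these figures, segments A and B account for 960 of 1,000 in profit, so profiling could start with those two and leave the rest for later.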

Data Profiling Real-World Examples

If you’re still not sure about the importance of data profiling, look at these real-world examples.

Uncovering fraud in a bank

It might sound surprising, but if you know your banking business well, profiling might help you detect fraud. One of Ataccama’s users, while analyzing the profiling results of several data sets on banking transactions, found outliers in the frequency distribution of phone numbers.

After looking more closely at a few of them, she discovered that each of these phone numbers was associated with several clients. She passed the information to the fraud team, who confirmed several fraudulent transactions and set up measures to prevent this in the future.
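A check of this kind can be sketched in a few lines: group transactions by phone number and flag numbers linked to more than one client. This is only an illustration of the idea, not the analysis the user actually ran. The field names `phone` and `client_id`, the `shared_phone_numbers` helper, and the sample data are assumptions.

```python
from collections import defaultdict

def shared_phone_numbers(transactions, min_clients=2):
    """Return phone numbers linked to at least `min_clients` distinct clients."""
    clients_by_phone = defaultdict(set)
    for t in transactions:
        clients_by_phone[t["phone"]].add(t["client_id"])
    return {
        phone: sorted(clients)
        for phone, clients in clients_by_phone.items()
        if len(clients) >= min_clients
    }

# Hypothetical transactions.
transactions = [
    {"client_id": "c1", "phone": "555-0100"},
    {"client_id": "c2", "phone": "555-0100"},
    {"client_id": "c3", "phone": "555-0100"},
    {"client_id": "c4", "phone": "555-0101"},
    {"client_id": "c4", "phone": "555-0101"},
]
suspicious = shared_phone_numbers(transactions)
```

A shared number is not proof of fraud (family members may share a phone), so the output is a list of cases to review rather than a verdict.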

Ensuring data usability in the drug development process

Developing new drugs is a data-intensive process. Researchers collect and analyze data on thousands of combinations of compounds and cooperate with external laboratories to speed up the process. This means data is exchanged a lot.

So, when in-house researchers receive data from a cooperating party, they profile it to make sure the formatting is correct, verify the data contents, and check for other potential errors. Data profiling helps researchers work only with reliable, verified data.

Learn more about why data profiling and data management disciplines are important in the pharma industry in this in-depth article.

Data Profiling with Ataccama

As you can see, you don't have to be a data scientist to benefit from data profiling. It is a powerful tool that can be used in various situations by people whose main job is not necessarily analyzing sales data or building predictive models.

Try our free data profiling tools

If you want to get started with data profiling, try our free data profilers, trusted by 60,000+ users.
