The Situation

Most organizations today face a situation in which a critical business asset, the company's data, is unreliable and does not meet the expectations of its business users. Ultimately, this means the data fails to support users in meeting the organization's goals, and it costs money.

Inconsistent or missing information about an organization's key data (usually customers, products, and contact details) may turn into serious business issues, such as ineffective marketing and retention campaigns, inaccurate regulatory reporting, an unreliable basis for risk management, and many others. These issues also come into the spotlight when organizations merge and integrate their IT architectures through numerous data migrations.

To address these issues, organizations usually start with automated data cleansing, standardization, and enrichment, which resolve most of the painful data defects quickly.

The Solution

Data cleansing solutions vary widely, reflecting each customer's environment and requirements. In general, however, there are two options: a one-off cleansing project or a permanent data cleansing solution.

One-off data cleansing is used for activities such as migrating data from one system to another or running ad-hoc mass marketing campaigns. However, data quality tends to deteriorate unless it is maintained continuously. Therefore, to manage data quality in customer databases, data warehouses, and other systems, permanent data cleansing solutions are implemented to run in batch or online mode. In both one-off and permanent solutions, the key component is defining requirements and business rules in close cooperation with customer representatives.

The whole data cleansing process usually consists of the following steps:

  • Data recognition and standardization
    • Pattern-based parsing of whole data entities to identify information stored in the wrong fields
    • Transformation into a standard format and removal of typing errors, undesired abbreviations, etc.
  • Data validation
    • Content evaluation: comparison with relevant reference data (both internal and external)
    • Context-based evaluation: validation of business rules, business-relevant data consistency
  • Data enrichment
    • Completion of missing values based on either business rules or information retrieved from both internal and external data sources (e.g., company registers)
  • Data correction
    • Application of the automated corrections from the preceding steps, stored separately from the original values
    • A cleansing score and metadata indicating the quality of the original values and the corrections applied
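The four steps above can be sketched in Python as a single function that processes one customer record. This is a minimal illustration under stated assumptions, not the implementation or API of Ataccama DQC: the reference data (VALID_COUNTRIES, COMPANY_REGISTER, ABBREVIATIONS), the field names, the "at least one contact channel" business rule, and the penalty-based cleansing score are all hypothetical. What it does show is the structure described above: the original values stay untouched, the corrections are kept separately, and every change or issue is recorded as metadata contributing to a score.

```python
import re
from dataclasses import dataclass, field

# Hypothetical reference data; real solutions draw on curated internal and
# external sources (code lists, address files, company registers, ...).
VALID_COUNTRIES = {"CZ", "DE", "GB", "US"}
COMPANY_REGISTER = {"12345678": "Acme Ltd"}          # hypothetical ID -> name
ABBREVIATIONS = {"St.": "Street", "Rd.": "Road"}     # undesired abbreviations
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


@dataclass
class CleansingResult:
    original: dict                                # input values, never modified
    cleansed: dict                                # corrected values, stored separately
    metadata: list = field(default_factory=list)  # corrections applied / issues found
    score: int = 0                                # hypothetical penalty; 0 = clean


def cleanse(record: dict) -> CleansingResult:
    res = CleansingResult(original=dict(record), cleansed=dict(record))
    out = res.cleansed

    def flag(message: str, penalty: int) -> None:
        res.metadata.append(message)
        res.score += penalty

    # 1a. Recognition: an e-mail address typed into the phone field is moved
    phone = (out.get("phone") or "").strip()
    if EMAIL_RE.fullmatch(phone) and not out.get("email"):
        out["email"], out["phone"] = phone, None
        flag("email moved from phone field", 10)

    # 1b. Standardization: collapse whitespace, expand abbreviations,
    #     and upper-case the country code
    raw_street = out.get("street") or ""
    street = " ".join(ABBREVIATIONS.get(t, t) for t in raw_street.split())
    if street != raw_street:
        out["street"] = street
        flag("street standardized", 1)
    raw_country = out.get("country") or ""
    country = raw_country.strip().upper()
    if country != raw_country:
        out["country"] = country
        flag("country code standardized", 1)

    # 2a. Content validation: compare with reference data
    if country not in VALID_COUNTRIES:
        flag(f"unknown country code {country!r}", 20)
    # 2b. Context validation: hypothetical rule "at least one contact channel"
    if not out.get("email") and not out.get("phone"):
        flag("no contact channel", 20)

    # 3. Enrichment: complete a missing company name from the register
    company_id = out.get("company_id")
    if company_id in COMPANY_REGISTER and not out.get("company_name"):
        out["company_name"] = COMPANY_REGISTER[company_id]
        flag("company name completed from register", 5)

    # 4. Correction output: originals, cleansed values, metadata, and score
    return res


if __name__ == "__main__":
    record = {"street": "Main  St.", "country": " us", "phone": "jane@example.com",
              "email": "", "company_id": "12345678", "company_name": ""}
    result = cleanse(record)
    print(result.cleansed["street"], result.cleansed["country"])  # Main Street US
    print(result.score)                                           # 17
```

A per-issue penalty is only one possible way to build a cleansing score; the point of the sketch is that the score and metadata travel with each record, so the cleansed extract can be reviewed or propagated back to the source system.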

The major deliverable of both solution types is a substantial increase in data quality. In the case of one-off cleansing, a data extract containing the original and cleansed data, cleansing metadata, and the cleansing score is handed over, either for propagation back into the source system or for other uses. In the case of a permanent solution, a data quality tool (Ataccama DQC) is configured and implemented to cleanse the data in the connected system(s) automatically and to provide inputs for related data quality management processes, as well as optional statistics for Data Quality Monitoring and Reporting.

Ataccama supports the solution

Ataccama provides both the technology (specialized algorithms and predefined data cleansing and standardization plans ready to be adapted to customer requirements) and the business logic (knowledge bases, country-specific cleansing and standardization rules, and best practices in this area).

The Benefits

  • Higher data reliability/trustworthiness
  • Precise basis for all reporting purposes and risk management
  • Increased efficiency of marketing campaigns by targeting the right customers with correct contact data
  • Compliance with regulatory requirements
  • Higher overall efficiency of business processes, as scrap and rework are minimized