Customer Data Consolidation

Customer consolidation

The consolidation process aims to identify all records that belong to a single customer, address or other business entity. It consists of two consecutive steps:

  • Data cleansing - normalization, standardization, parsing, replacements (e.g. dictionary of typos);
  • Unification - grouping of similar records based on semantic proximity.

Data cleansing

Dirty data is penalized at both the attribute level and the whole-record (instance) level. The penalty is expressed as a so-called cleansing score, which reflects the severity of the cleansing procedures needed to bring the data into its standard form. The higher the score, the worse the quality of the processed record. Identified errors are reported as cleansing codes. This information on data quality deficiencies serves as ‘bottom-up’ input into DQM processes.
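As a rough illustration of how cleansing codes and a cleansing score could relate, the following Python sketch cleanses a single city attribute. The code names, their weights, the typo dictionary and the choice of lower case as the standard form are all assumptions made for this example; they are not taken from Ataccama DQC.

```python
import re
from dataclasses import dataclass, field

# Hypothetical typo dictionary and code weights (illustrative only).
TYPO_DICTIONARY = {"bratislva": "bratislava", "kosce": "kosice"}
CODE_WEIGHTS = {
    "WHITESPACE_TRIMMED": 1,
    "CASE_NORMALIZED": 1,
    "TYPO_REPLACED": 10,
    "VALUE_MISSING": 100,
}


@dataclass
class CleansedAttribute:
    value: str
    codes: list = field(default_factory=list)

    @property
    def score(self):
        # Attribute-level score: sum of the weights of all reported codes.
        return sum(CODE_WEIGHTS[code] for code in self.codes)


def cleanse_city(raw):
    """Return the standardized city name plus the codes of applied fixes."""
    if raw is None or not raw.strip():
        return CleansedAttribute("", ["VALUE_MISSING"])
    codes = []
    value = re.sub(r"\s+", " ", raw).strip()
    if value != raw:
        codes.append("WHITESPACE_TRIMMED")
    lowered = value.lower()
    if lowered != value:
        codes.append("CASE_NORMALIZED")
    if lowered in TYPO_DICTIONARY:
        lowered = TYPO_DICTIONARY[lowered]
        codes.append("TYPO_REPLACED")
    return CleansedAttribute(lowered, codes)


def record_score(attributes):
    # Record-level score: sum of the attribute-level scores.
    return sum(attribute.score for attribute in attributes)
```

In this sketch a clean value yields no codes and a score of zero, while each applied fix adds its weight, so heavier interventions push the record score higher.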

Unification

This step groups records by a ‘unification key’, which is composed of standardized attributes that are relevant to and sufficient for uniquely identifying the subject (e.g. a person or an address). Possible uncertainty in the result is expressed by grouping at multiple levels (in Ataccama DQC, candidate and client grouping levels are recognized for customers). To simplify interpretation of the results, the groups are then classified into several categories based on the data quality of the records within each group.
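A minimal Python sketch of grouping by a unification key at two levels might look as follows. The attribute names and the composition of both keys are assumptions, as is the choice to make the candidate level the looser of the two; the actual Ataccama DQC rules may differ.

```python
from collections import defaultdict


def client_key(record):
    # Assumed strict key: name, birth date and address must all match.
    return (record["name"], record["birth_date"], record["address"])


def candidate_key(record):
    # Assumed looser key: name and birth date only.
    return (record["name"], record["birth_date"])


def group_by(records, key_fn):
    """Group record ids by the unification key produced by key_fn."""
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record["id"])
    return dict(groups)


# Records are assumed to be cleansed (standardized) already.
records = [
    {"id": 1, "name": "jan novak", "birth_date": "1980-01-01",
     "address": "hlavna 1, bratislava"},
    {"id": 2, "name": "jan novak", "birth_date": "1980-01-01",
     "address": "hlavna 1, bratislava"},
    {"id": 3, "name": "jan novak", "birth_date": "1980-01-01",
     "address": "dlha 5, kosice"},
    {"id": 4, "name": "eva kovac", "birth_date": "1975-05-05",
     "address": "dlha 5, kosice"},
]

client_groups = group_by(records, client_key)
candidate_groups = group_by(records, candidate_key)
```

Here records 1 and 2 fall into one group at the strict level, while record 3 joins them only at the looser level, which is one way to express uncertainty about whether it is the same customer.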

Most of the current customer-centric ‘automated’ processes in TMSK are MSISDN-based. To implement the ‘customer consolidation’ approach, unified customers must first be identified across all systems that handle customer data. The following picture shows an example of the consolidation approach described above (implemented in Ataccama DQC), based on customer cleansing and unification:

Figure 1. Customer Consolidation Example

Identification is the ‘front-end oriented’ process of cleansing new input records and assigning them to their respective unified groups. It is typically executed in on-line mode, while unification is often implemented as a ‘back-end’ batch process due to its complexity.
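The split between batch unification and on-line identification can be sketched in Python as below. This is a simplified illustration under stated assumptions: the key is an exact match on a normalized name and birth date, and the function names unify_batch and identify are hypothetical. Real unification relies on similarity-based matching, which is what makes it expensive enough to run as a batch.

```python
def normalize(text):
    # Minimal cleansing: lower-case and collapse whitespace.
    return " ".join(text.lower().split())


def unification_key(record):
    # Assumed key: normalized name plus birth date.
    return (normalize(record["name"]), record["birth_date"])


def unify_batch(records):
    """Back-end step: build a key -> group id index over all records."""
    index = {}
    for record in records:
        index.setdefault(unification_key(record), len(index) + 1)
    return index


def identify(record, index):
    """Front-end step: cleanse one new record and return its group id,
    opening a new group if the key has not been seen before."""
    key = unification_key(record)
    if key not in index:
        index[key] = len(index) + 1
    return index[key]


existing = [
    {"name": "Jan Novak", "birth_date": "1980-01-01"},
    {"name": "Eva Kovac", "birth_date": "1975-05-05"},
]
index = unify_batch(existing)
group_id = identify({"name": "  jan   NOVAK ", "birth_date": "1980-01-01"}, index)
```

The on-line call only cleanses one record and performs a single lookup against the prepared index, which is why it can run at the front end while the index itself is rebuilt in the back end.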