DQC is the core module of the Ataccama ONE Data Quality & Governance platform. Two user-friendly applications for data stewards and business users run on top of the DQC engine.
The diagram below depicts a typical Data Quality process employing these key components:
DQC is an essential tool for complex data quality curation. Although the tool itself is domain-agnostic, it is bundled with specific sets of rules and localized dictionaries, and predefined business modules provide cleansing and identity matching for business entities such as Party, Address and Vehicle.
DQC can connect to virtually any database platform via JDBC, as well as to various file formats (flat files, XML files, MS Excel spreadsheets, JSON files, etc.). This includes formats and sources typically used in the Big Data world, such as Avro files, Parquet files, Hive tables and Kafka messages. DQC can deliver the results of data processing in several ways: it can write the standardized, cleansed or deduplicated data to the target data storage (e.g. a DWH), calculate data quality indicators (DQIs) for reporting purposes, or select records for manual resolution by data stewards or business users and send them to the DQIT application. DQC solutions can run in both batch and online mode, and any data processing flow can be published as a web service and integrated into various front-end applications, such as a Data Quality Firewall.
DQC is built on a “component” architecture designed to be flexible and customizable while providing ready-to-use data quality modules. Together with task-specific modules, DQC delivers cutting-edge Cleansing, Validation, Match & Merge and Reporting functionality.
DQC is designed to serve as the main hub for your data quality management, providing centralized management of rules and data quality from a single location. Whether data comes from external master data systems or other sources, DQC allows it to be integrated and managed on one data quality platform.
The solution is easily configured using the bundled administration applications and does not require any external tools or other third-party applications. DQC is platform independent, based on open standards (XML, Web Services), and uses data models that are portable across all existing database platforms.
DQC uses parallel data processing methods to ensure scalability and supports incremental data processing in both batch and online modes.
Advanced semantic profiling functionality enables fast data analysis.
A set of algorithms performs hierarchical unification of records by identifier keys, irrespective of internal data structures, and can apply approximate matching during record unification.
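To illustrate what approximate matching during record unification can look like, here is a minimal Python sketch. It is not DQC's algorithm or API: the record layout, the `unify` function and the 0.85 threshold are hypothetical, and the standard library's `difflib.SequenceMatcher` ratio stands in for whatever similarity measures DQC actually uses. Records sharing an identifier key are merged first, the remaining pairs are then merged when their names are similar enough, and a union-find structure keeps the resulting groups transitive.

```python
from difflib import SequenceMatcher

# Hypothetical records: (record_id, identifier_key or None, name).
RECORDS = [
    (1, "ID-100", "John Smith"),
    (2, "ID-100", "Jon Smith"),
    (3, None, "John Smyth"),
    (4, None, "Mary Jones"),
    (5, "ID-200", "Mary Jones"),
]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib is a stand-in, not DQC's matcher."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def unify(records, threshold=0.85):
    """Group record ids: exact key match first, then approximate name match."""
    parent = {rid: rid for rid, _, _ in records}

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest id becomes the root

    # Level 1: exact unification on the identifier key.
    first_by_key = {}
    for rid, key, _ in records:
        if key is None:
            continue
        if key in first_by_key:
            union(first_by_key[key], rid)
        else:
            first_by_key[key] = rid

    # Level 2: approximate matching on names (pairwise, quadratic).
    for i, (rid_a, _, name_a) in enumerate(records):
        for rid_b, _, name_b in records[i + 1:]:
            if similarity(name_a, name_b) >= threshold:
                union(rid_a, rid_b)

    groups = {}
    for rid, _, _ in records:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(groups.values())


print(unify(RECORDS))  # [[1, 2, 3], [4, 5]]
```

Records 1 and 2 merge on their shared key, record 3 joins them through its name similarity to record 1 (a ratio of 0.9), and records 4 and 5 merge on identical names. The pairwise loop is quadratic, so matching engines typically restrict comparisons to candidate blocks (for example, records sharing a postal code) before scoring.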
DQC can easily tap into external data sources to retrieve records. It uses name, organization, title and other dictionaries to verify and validate input data, and this functionality can be extended with custom variables tailored to individual needs.
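Dictionary-based verification of this kind can be sketched as follows. This is a hypothetical Python illustration, not DQC's rule syntax: the tiny `TITLES` and `FIRST_NAMES` sets, the `validate_name` function and its `extra_first_names` parameter (a loose stand-in for the custom extensions mentioned above) are all assumptions. The function strips a leading title, checks the next token against the first-name dictionary and records any issue instead of rejecting the input.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, tiny dictionaries; real ones would be far larger and localized.
TITLES = {"mr", "mrs", "ms", "dr", "prof"}
FIRST_NAMES = {"john", "mary", "anna", "peter"}


@dataclass
class NameCheck:
    title: Optional[str] = None
    first_name: Optional[str] = None
    rest: list[str] = field(default_factory=list)
    issues: list[str] = field(default_factory=list)


def validate_name(raw: str, extra_first_names: frozenset[str] = frozenset()) -> NameCheck:
    """Verify a free-text name against title and first-name dictionaries."""
    # Normalize tokens: drop trailing punctuation such as "Dr." and fold case.
    tokens = [t.strip(".,").casefold() for t in raw.split()]
    tokens = [t for t in tokens if t]
    result = NameCheck()

    if tokens and tokens[0] in TITLES:
        result.title = tokens.pop(0)
    if not tokens:
        result.issues.append("no name tokens")
        return result

    candidate = tokens.pop(0)
    # Custom entries extend the bundled dictionary without modifying it.
    if candidate in FIRST_NAMES | extra_first_names:
        result.first_name = candidate
    else:
        result.issues.append(f"unknown first name: {candidate}")
    result.rest = tokens
    return result


print(validate_name("Dr. John Smith"))
# NameCheck(title='dr', first_name='john', rest=['smith'], issues=[])
```

Flagging issues rather than failing fast fits the workflow described earlier, in which questionable records can be selected for manual resolution by data stewards.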
Ataccama DQC delivers rich functionality and easy implementation at a competitive cost.