DQC Features

DQC Manager

DQC Manager is a full-featured Eclipse-based application used to design, debug, and run DQC plans, and to manage online services running on a DQC server.

Ease of Use

DQC Manager has been designed with the user in mind. It has a streamlined interface, built-in samples, and video tutorials, as well as comprehensive reference documentation. A number of productivity features simplify the process of designing and testing data quality plans.

Performance & Scalability

The DQC runtime is designed for high performance and very large volumes of data coming from a large number of source systems. DQC projects typically process tens to hundreds of millions of records, or provide sub-second SLAs for online identification and unification services.

Data Profiling

DQC includes powerful profiling functionality that determines basic data statistics, uniqueness, value frequencies, and masks, and also lets you define custom business rules. Relationship analyses include primary and foreign key analysis. Drill-down data can be collected in a database so that the exact records falling into a profiled category can be displayed. Profiling results can be viewed inside the DQC application or exported to HTML or XML.
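To make the profiling terms concrete, here is a minimal Python sketch of what basic statistics, uniqueness, frequencies, and masks can look like for a single column. It is not DQC code: the function names `mask` and `profile_column`, the mask symbols (`D` for digit, `L` for letter), and the sample data are all assumptions chosen for illustration.

```python
from collections import Counter


def mask(value: str) -> str:
    """Map each character to a class symbol: digit -> 'D', letter -> 'L'.

    Any other character (dash, space, ...) is kept as-is.
    The symbols are an illustrative assumption, not DQC's mask syntax.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("D")
        elif ch.isalpha():
            out.append("L")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Return basic counts, uniqueness, value frequencies and mask frequencies."""
    non_null = [v for v in values if v not in (None, "")]
    freq = Counter(non_null)
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(freq),                                  # different values
        "unique": sum(1 for c in freq.values() if c == 1),      # values seen once
        "top_values": freq.most_common(3),
        "masks": Counter(mask(v) for v in non_null),
    }


# Hypothetical sample column
phones = ["555-1234", "555-9876", "555-1234", "", "55A-0000"]
report = profile_column(phones)
```

In this sample, the mask frequencies immediately expose the one value that does not follow the dominant `DDD-DDDD` pattern, which is the kind of finding a drill-down would then trace back to the exact record.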

Rich Connectivity Options

DQC includes a number of connection types, including database I/O (JDBC), text files (CSV, fixed width), XML files, and Excel files. iWay Software Connectors are also supported, giving you almost unlimited flexibility in accessing data in even the most exotic systems.

Batch Mode / Online Mode

Batch mode deployments typically process initial data loads or regular increments, possibly with very large data volumes.

Online mode publishes a prepared DQ process as an online service that source systems can call within a transaction (e.g. identifying a customer before a new record is created in the system).

Extensive Set of Algorithms

DQC contains over 100 algorithms, ranging from simple-to-use algorithms optimized for a specific operation (e.g. validation of credit card numbers), through general task-specific ones (e.g. Generic Parser or Profiling), to complex, powerful algorithms (e.g. Unification, Address Identifier). For details, see the DQC Algorithms reference included in the DQC installation.
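As an illustration of the "single specific operation" end of that range, credit card numbers are commonly validated with the Luhn checksum. The sketch below shows that standard check in Python. Whether DQC's own validation algorithm uses exactly this logic is not stated here, and the function name `luhn_valid` is an assumption.

```python
def luhn_valid(number: str) -> bool:
    """Check a card number with the standard Luhn checksum.

    Spaces and dashes are tolerated as separators; any other
    non-digit character makes the number invalid.
    """
    if any(not (ch.isdigit() or ch in " -") for ch in number):
        return False
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


# A widely used test number passes; changing its last digit fails.
ok = luhn_valid("4111 1111 1111 1111")
bad = luhn_valid("4111 1111 1111 1112")
```

The checksum only catches typing errors such as a single wrong digit; it says nothing about whether the card actually exists.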

Data Enrichment

DQC contains a number of algorithms that allow data enrichment against reference data and external data sources.

The Web Lookup algorithm is a powerful tool that enriches records with data obtained from Internet URLs. It is suitable for cases where a full reference database is not available.

Match & Merge

DQC is built on a best-of-breed unification engine that allows Match & Merge processing that typically cannot be achieved with other tools. Matching is defined explicitly by a set of parameters, and with each record DQC stores which business rule was used for matching as well as the reason for the match, expressed in business terms.

DQC also includes a unique hierarchical unification, which addresses the problem of missing matching keys: a secondary matching key is used, and splitting into groups can be controlled based on the uniqueness of the match. Matching via candidate groups is also one of the best performing solutions on the market today.
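The performance benefit of candidate groups can be sketched in a few lines of Python. The idea, often called blocking, is to compare records only within a group that shares a matching key instead of comparing every record with every other. This is a generic illustration, not the DQC engine: the helper names, the choice of `zip` as the matching key, and the sample records are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def candidate_groups(records, key):
    """Group records by a matching key; only records in one group are compared."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return groups


def candidate_pairs(records, key):
    """Yield every pair of records that share a candidate group."""
    for group in candidate_groups(records, key).values():
        yield from combinations(group, 2)


# Hypothetical sample data
records = [
    {"id": 1, "name": "John Smith", "zip": "10001"},
    {"id": 2, "name": "Jon Smith", "zip": "10001"},
    {"id": 3, "name": "Jane Doe", "zip": "94105"},
    {"id": 4, "name": "J. Smith", "zip": "10001"},
]

pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(records, key=lambda r: r["zip"])]
all_pairs = len(list(combinations(records, 2)))
```

With four records there are six possible pairs, but grouping by the key leaves only three to evaluate. On millions of records, that reduction is what keeps detailed matching rules affordable.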

Approximative Matching

Matching in DQC can be set up to use Levenshtein or Hamming distance. Tuning the allowed distance provides a controllable level of fuzziness or exactness in the matching.
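For readers unfamiliar with the two measures, the following Python sketch shows how each is computed and how a maximum allowed distance acts as the fuzziness control. These are the textbook definitions, not DQC's implementation, and the `is_match` helper and its `max_distance` parameter are assumptions for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def is_match(a: str, b: str, max_distance: int) -> bool:
    """Treat two values as matching when their edit distance is small enough."""
    return levenshtein(a, b) <= max_distance


close = is_match("Smith", "Smyth", max_distance=1)
far = is_match("Smith", "Schmidt", max_distance=1)
```

Hamming distance only counts substitutions and needs values of equal length, so it suits fixed-format codes. Levenshtein also handles inserted or missing characters, which makes it the more natural choice for names and free text.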

Address Identification

Address identification is one of the most complex tasks in data quality. DQC includes two separate algorithms that let the user identify addresses down to an address point and enrich the data with geolocation coordinates.

Powerful Business Rules Engine

DQC includes a very powerful business rules engine based on a high-performance expression evaluator. Most algorithms can be fine-tuned using custom functions and expressions, which removes the limitations of typical hard-coded functionality.

Data Cleansing Scoring and Clearing Codes

Any data cleansing that the DQC engine performs is recorded in the form of scores and clearing codes. This makes it easy to identify the changes made to individual attributes or records, and to perform additional operations on this information, such as profiling and statistical analyses.
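The general idea of recording each cleansing step can be sketched as follows. Everything specific in this Python example is hypothetical: the code names (`NAME_TRIMMED` and so on), the score weights, and the convention that a higher score means more intervention are assumptions for illustration, not DQC's actual codes or scoring scheme.

```python
def cleanse_name(raw: str) -> dict:
    """Cleanse a name and record what was changed.

    Returns the cleansed value, a list of codes naming each change,
    and a score that adds up the (hypothetical) weight of each change.
    """
    value, codes, score = raw, [], 0

    trimmed = value.strip()
    if trimmed != value:
        codes.append("NAME_TRIMMED")
        score += 10
        value = trimmed

    collapsed = " ".join(value.split())
    if collapsed != value:
        codes.append("NAME_WHITESPACE_COLLAPSED")
        score += 10
        value = collapsed

    titled = value.title()
    if titled != value:
        codes.append("NAME_CASE_FIXED")
        score += 50
        value = titled

    return {"value": value, "codes": codes, "score": score}


dirty = cleanse_name("  john   SMITH ")
clean = cleanse_name("John Smith")
```

Because the codes and score travel with the record, a later step can filter on them, for example to profile how often a given correction was applied or to route heavily modified records for review.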

Extensibility

Occasionally you encounter environments or problems so uncommon that the built-in set of algorithms and functions cannot address them. DQC is designed to be extensible: new algorithms or functions can be written in Java and added to the DQC engine.