Big Data Engine

A powerful technology architected for high performance, scalability, and rapid processing of huge volumes of data.

Big Data Engine (BDE) offers rich built-in features that address data quality needs end to end. It lets you quickly explore data, identify priority areas, and conduct detailed data quality analysis for each area of interest before running the necessary transformations.

Use Ataccama BDE for the following tasks:

  • Big data integration to and from Hadoop
  • General data processing tasks (both in online and batch modes)
  • Transformations, aggregations, data enrichment, and more
  • Quality control in transactional and analytical applications
  • Cleansing and unification in system migrations
  • Quality assurance in software integration projects
  • Data quality improvement in address and contact information
  • Continuous DQ monitoring with a business-friendly interface
  • Cleansing and unification of data for client identification purposes
  • Profile validation and correction of incomplete records
  • Customer input validation in self-service online applications
  • Profiling as a part of data integration project analysis
  • Detection of inconsistencies and irregular patterns for fraud prevention and more
  • Data preparation for further analytical use

Feature-rich and high-performance

BDE’s most important feature is the ability to execute all of the tasks listed above natively within a Hadoop cluster. All data inputs are processed directly, either by a series of MapReduce jobs or by Apache Spark. Because processing runs on the cluster itself, BDE’s functionality can scale with the size of the data.

Data integration

Using built-in connectors and readers, BDE can easily integrate existing data sources with Hadoop to provide seamless data movement whenever necessary.

Data processing

BDE allows users to perform any kind of transformation, aggregation, or modification in order to move data from one source to another, blend various sources together, or simply prepare the data for further analysis. This significantly reduces data preparation time and increases the efficiency of the discovery process.

Unique profiling

BDE provides fast data analysis and advanced semantic profiling functionality.
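To give a feel for what column profiling produces, here is a minimal sketch in plain Python. It is not BDE code and does not cover semantic profiling; the function names `mask` and `profile` and the sample phone values are hypothetical, chosen only to illustrate basic counts and format-pattern frequencies.

```python
from collections import Counter


def mask(value: str) -> str:
    """Reduce a value to its format pattern: digits become 9, letters become A."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)


def profile(values):
    """Return basic statistics and pattern frequencies for one column."""
    non_empty = [v for v in values if v and v.strip()]
    return {
        "count": len(values),
        "empty": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "patterns": Counter(mask(v) for v in non_empty),
    }


# Hypothetical sample column
phones = ["555-0100", "555-0101", "5550102", "", "555-0100"]
report = profile(phones)
```

A report like this makes irregular formats visible at a glance: here most values follow one pattern, while one value is missing its separator and one is empty.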

Data quality

BDE is designed to be used as the main hub for your data quality management. It enables data from all sources to be integrated and managed under one data quality platform.

Text analytics

Thanks to its flexibility and data-agnostic approach, BDE is also a good fit for text mining tasks. Use cases include named-entity extraction, sentiment analysis, and classification. All of these are supported natively within Hadoop, so they can run on large volumes of text.


Ataccama BDE provides rich functionality at a competitive cost, with fast and easy implementation.

Flexibility and open standards

The solution is easily configured using bundled administration applications. It does not require any external tools or other third-party applications. BDE is platform-independent, based on open standards (XML, Web Services), and uses data models portable across all existing database platforms.

Data enrichment with external sources

BDE easily taps into external data sources to retrieve records. It uses dictionaries of names, organizations, titles, and other reference values to verify and validate input data. This functionality can be extended with customized variables tailored to individual needs.
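As an illustration of dictionary-based validation, the sketch below checks a contact record against two small lookup sets. It is plain Python rather than BDE configuration; the dictionaries `TITLES` and `FIRST_NAMES`, the field names, and the function `validate_contact` are all hypothetical.

```python
# Hypothetical reference dictionaries; a real deployment would load far larger ones.
TITLES = {"mr", "mrs", "ms", "dr", "prof"}
FIRST_NAMES = {"john", "alice", "maria"}


def validate_contact(record, titles=TITLES, first_names=FIRST_NAMES):
    """Return a list of issues found by looking fields up in the dictionaries."""
    issues = []

    # Normalize the title: trim, drop a trailing period, lowercase.
    title = record.get("title", "").strip().rstrip(".").lower()
    if title and title not in titles:
        issues.append("unknown title")

    first = record.get("first_name", "").strip().lower()
    if first not in first_names:
        issues.append("unknown first name")

    return issues


ok = validate_contact({"title": "Dr.", "first_name": "Alice"})
bad = validate_contact({"title": "Sir", "first_name": "Jhon"})
```

Passing the dictionaries as parameters keeps the check extensible: a custom lookup set can be swapped in without changing the validation logic.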

Automatic documentation

BDE automatically generates documentation of every step in the plan, including all business rules. The whole solution is easily auditable.

Advanced core functionality

A set of algorithms (capable of hierarchical unification by identifier keys, irrespective of internal data structures) can perform approximate matching during record unification.
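The idea of approximate matching during unification can be sketched as follows. This is not BDE's algorithm: it swaps in the similarity ratio from Python's standard `difflib` module and a simple greedy grouping, and the `0.85` threshold and the function names are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not affect the score."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def unify(records, threshold=0.85):
    """Greedy grouping: a record joins the first group whose first member is similar enough."""
    groups = []
    for record in records:
        for group in groups:
            if similarity(record, group[0]) >= threshold:
                group.append(record)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([record])
    return groups


# Hypothetical sample data
names = ["John Smith", "john  smith", "Jon Smith", "Alice Brown"]
groups = unify(names)
```

With this sample, the three spelling variants of the same name end up in one group and the unrelated name in another. Comparing every record against every group is quadratic, so approaches of this kind usually need blocking keys or similar techniques to scale.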

Hadoop MapReduce and Apache Spark native support

All calculations and processing are executed directly on a cluster, with no need to take any data out of Hadoop. Based on the cluster characteristics, the solution is automatically translated into a series of MapReduce jobs, or directly utilizes Spark. BDE supports all major Hadoop distributions.
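For readers unfamiliar with the MapReduce model, the following single-process Python sketch shows the three phases a simple aggregation goes through. It only illustrates the programming model; it is not how BDE generates jobs, and the record layout and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (key, 1) pair for every record."""
    for record in records:
        yield record["country"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input records
records = [{"country": "CZ"}, {"country": "US"}, {"country": "CZ"}]
counts = reduce_phase(shuffle(map_phase(records)))
```

On a cluster, the map and reduce steps run in parallel on the nodes that hold the data, which is why the data does not need to leave Hadoop.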