Big Data Engine (BDE) offers rich built-in features that address all data quality needs. It lets you quickly explore data, identify priority areas, and perform detailed data quality analysis for each area of interest before executing the necessary transformations.
BDE’s most important feature is its ability to execute all of these tasks natively within a Hadoop cluster. All input data is processed directly by a series of MapReduce jobs or by Apache Spark. This approach lets BDE scale to virtually any data volume without performance degradation.
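To illustrate what processing data "by a series of MapReduce jobs" means for a data quality task, here is a minimal single-process sketch of a value-frequency profiling job split into map, shuffle, and reduce phases. It does not use Hadoop or BDE; the function names, column names, and the `<NULL>` placeholder for empty values are hypothetical and chosen only for illustration. On a real cluster, the framework runs many map tasks in parallel over data blocks and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Key = tuple[str, str]  # (column, value)


def map_phase(row: dict[str, str]) -> Iterator[tuple[Key, int]]:
    """Emit ((column, value), 1) for every cell; empty strings are profiled as <NULL>."""
    for column, value in row.items():
        yield (column, value if value else "<NULL>"), 1


def shuffle(pairs: Iterable[tuple[Key, int]]) -> dict[Key, list[int]]:
    """Group intermediate counts by key, as the framework does between map and reduce."""
    groups: dict[Key, list[int]] = defaultdict(list)
    for key, count in pairs:
        groups[key].append(count)
    return groups


def reduce_phase(key: Key, counts: list[int]) -> tuple[Key, int]:
    """Sum the partial counts collected for one (column, value) key."""
    return key, sum(counts)


def profile(rows: list[dict[str, str]]) -> dict[Key, int]:
    """Run map -> shuffle -> reduce over all rows and return value frequencies per column."""
    intermediate = (pair for row in rows for pair in map_phase(row))
    return dict(reduce_phase(key, counts) for key, counts in shuffle(intermediate).items())


rows = [
    {"country": "CZ", "email": "a@x.cz"},
    {"country": "CZ", "email": ""},
    {"country": "DE", "email": ""},
]
result = profile(rows)
print(result[("country", "CZ")], result[("email", "<NULL>")])  # prints "2 2"
```

Because each map call looks at a single row and each reduce call looks at a single key, both phases can be spread across cluster nodes independently, which is what allows this style of processing to scale with data volume.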
Using built-in connectors and readers, BDE can easily integrate existing data sources with Hadoop to provide seamless data movement whenever necessary.
BDE lets users perform any kind of transformation, aggregation, or modification to move data from one source to another, blend multiple sources, or simply prepare data for further analysis. This significantly reduces data preparation time and makes the discovery process more efficient.
BDE also provides fast data analysis and advanced semantic profiling functionality.
BDE is designed to serve as the central hub for data quality management, allowing data from all sources to be integrated and managed on a single data quality platform.
Thanks to its flexibility and data-agnostic approach, BDE is also a good fit for text mining tasks such as named entity extraction, sentiment analysis, and classification. All of these also run natively within Hadoop, so they can process large volumes of text.
Ataccama BDE provides rich functionality at a very competitive cost, with fast and easy implementation.
The solution is easily configured with bundled administration applications and does not require any external tools or third-party applications. BDE is platform-independent, based on open standards (XML, Web Services), and uses data models that are portable across all existing database platforms.
BDE easily taps into external data sources to retrieve records. It uses name, organization, title, and other dictionaries to verify and validate input data, and this functionality can be extended with customized variables tailored to individual needs.
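The idea of dictionary-based validation can be sketched in a few lines. This is a generic illustration, not BDE's implementation or configuration format: the dictionaries, field names, and the `validate_record` function are hypothetical, and real reference dictionaries would be far larger and loaded from curated sources. Extensibility is shown by registering an additional, user-defined dictionary for a new field.

```python
# Hypothetical reference dictionaries; real ones would be loaded from curated lists.
TITLE_DICTIONARY = {"mr", "mrs", "ms", "dr", "prof"}
FIRST_NAME_DICTIONARY = {"anna", "john", "maria", "peter"}


def normalize(value: str) -> str:
    """Trim whitespace, lowercase, and drop a trailing period (e.g. "Dr." -> "dr")."""
    return value.strip().lower().rstrip(".")


def validate_record(record: dict[str, str], dictionaries: dict[str, set[str]]) -> dict[str, bool]:
    """Return, per field, whether the normalized value is a known dictionary entry.

    Fields with no registered dictionary are skipped rather than flagged.
    """
    results: dict[str, bool] = {}
    for field, value in record.items():
        dictionary = dictionaries.get(field)
        if dictionary is None:
            continue
        results[field] = normalize(value) in dictionary
    return results


dictionaries = {"title": TITLE_DICTIONARY, "first_name": FIRST_NAME_DICTIONARY}
# Extending validation with a custom dictionary for a new (hypothetical) field.
dictionaries["department"] = {"finance", "sales"}

record = {"title": "Dr.", "first_name": "Jhon", "department": "Sales", "phone": "123"}
print(validate_record(record, dictionaries))
# prints {'title': True, 'first_name': False, 'department': True}
```

Flagging unknown values instead of rejecting them keeps the decision about typos such as "Jhon" with later cleansing or matching steps.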
BDE automatically generates documentation of every step in the plan, including all business rules. The whole solution is easily auditable.
During record unification, a set of algorithms can perform approximate matching. These algorithms support hierarchical unification by identifier keys, regardless of internal data structures.
All calculations and processing are executed directly on the cluster, with no need to move any data out of Hadoop. Depending on the cluster's characteristics, the solution is either automatically translated into a series of MapReduce jobs or runs directly on Spark. BDE supports all major Hadoop distributions.
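As a rough illustration of approximate matching during record unification, the sketch below links records that either share an identifier key or have sufficiently similar names, then groups linked records into clusters with union-find. This is not BDE's algorithm: the `id_key` and `name` fields, the 0.85 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] from difflib; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def unify(records: list[dict], threshold: float = 0.85) -> list[set[int]]:
    """Cluster record indices assumed to describe the same entity.

    Two records are linked if they share a non-empty identifier key ("id_key", a
    hypothetical field) or if their names are approximately equal. Links are
    transitive, so clusters are the connected components found by union-find.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        same_key = a.get("id_key") is not None and a.get("id_key") == b.get("id_key")
        if same_key or similarity(a["name"], b["name"]) >= threshold:
            union(i, j)

    clusters: dict[int, set[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


records = [
    {"name": "Jonathan Smith", "id_key": "C-001"},
    {"name": "Jonathon Smith", "id_key": None},  # typo variant, no key
    {"name": "J. Smith", "id_key": "C-001"},     # same key as record 0
    {"name": "Maria Garcia", "id_key": "C-002"},
]
print(sorted(sorted(c) for c in unify(records)))  # prints [[0, 1, 2], [3]]
```

Record 1 joins the cluster only through approximate name matching, and record 2 only through the shared key; transitivity merges all three. Comparing every pair is quadratic, so a large-scale implementation would typically restrict comparisons to candidate groups first.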