Improved logistics and optimized merchandising
The retailer leveraged Ataccama for in-place data lake processing with Spark to ensure efficient logistics, optimize merchandising, and produce reliable reports for business development based on high-quality data.
- 1.5 billion records checked in 99 seconds
- 2x faster than Apache Tez
- 250 Ataccama jobs run on the data lake each day
- 64-million-record address lookup returning data in real time
After several business initiatives failed because of unreliable data, the retailer made data quality a top priority. As a data-driven organization, X5 wanted to ensure accurate data was used for improving its business processes. Optimizing sales, logistics, and merchandising required reliable product, geolocation, and supplier data. At the same time, online customers were demanding more granular and complete product information to make purchasing decisions. Together, these factors led the company to look for a data quality solution.
When selecting a data quality vendor and tool, X5 looked for the following:
- A tool that is easy to learn, understand, and use. The team wanted to be able to hire a data engineer and easily teach them how to use the tool.
- The ability to visualize results. It was crucial for data quality engineers to present the results of their work and show the value they brought to the company.
- Flexibility and ease of integration. The retailer wanted a DQ tool that would easily integrate with their existing processes, infrastructure, and specifically the ERP system.
Solution & benefits
The retailer took full advantage of the flexibility of the Ataccama ONE platform, starting with ETL and DQ processing use cases on traditional data sources. The DQ team then built a data lake and reused existing configurations to scale their solution with big data processing capabilities leveraging Spark. Later, the team built a Kafka-based pilot to stream real-time data from cash desks in brick-and-mortar shops, and use this data for immediate decision making.
- Business development teams and management can reliably use company master data for driving business decisions.
- DQ engineers can clearly visualize and quantify data quality issues.
- A data quality culture is firmly established, with good data used for decision making across the company.
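The Kafka-based pilot mentioned above boils down to a familiar streaming pattern: consume transaction events as they arrive and update metrics incrementally, so a fresh snapshot is always available for decision making. The sketch below illustrates that pattern in plain Python; the event fields (`store_id`, `amount`) are hypothetical, and in the real pilot the loop would poll a Kafka consumer rather than iterate an in-memory list.

```python
from collections import defaultdict

def consume(events):
    """Incrementally aggregate revenue per store as events stream in.

    A stand-in for a Kafka consumer loop: each iteration processes one
    cash-desk event and yields an up-to-date snapshot of the totals.
    """
    revenue = defaultdict(float)
    for event in events:
        revenue[event["store_id"]] += event["amount"]
        yield dict(revenue)  # snapshot usable for immediate decisions

# Simulated cash-desk events (hypothetical shape)
stream = [
    {"store_id": "S1", "amount": 10.0},
    {"store_id": "S2", "amount": 5.0},
    {"store_id": "S1", "amount": 2.5},
]

for snapshot in consume(stream):
    print(snapshot)
```

The generator-based design means downstream consumers see each intermediate state, mirroring how a streaming job exposes continuously updated aggregates rather than a single batch result.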
Client Use Cases
DQ evaluation on a data lake
The team of 23 data quality engineers performs data quality checks on the data lake for various data-critical projects, such as launching a new product or creating a new report in the data warehouse. In terms of technical implementation, the data is processed with Spark jobs on a Hadoop cluster or a standalone server, depending on data volumes. The results are visualized in Tableau or exported to Excel for further analysis.
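X5's actual rule configurations aren't public, but a data quality check of the kind described typically applies per-record rules (completeness, validity) and aggregates pass rates for visualization. A minimal plain-Python sketch, with entirely hypothetical rules and field names (the production checks run as Ataccama jobs on Spark):

```python
import re

# Hypothetical DQ rules applied to each record
RULES = {
    "product_id_present": lambda r: bool(r.get("product_id")),
    "price_is_positive": lambda r: isinstance(r.get("price"), (int, float)) and r["price"] > 0,
    "ean_is_13_digits": lambda r: bool(re.fullmatch(r"\d{13}", str(r.get("ean", "")))),
}

def evaluate(records):
    """Apply every rule to every record; return pass counts per rule."""
    totals = {name: 0 for name in RULES}
    for record in records:
        for name, rule in RULES.items():
            if rule(record):
                totals[name] += 1
    return totals

records = [
    {"product_id": "P-1", "price": 9.99, "ean": "4006381333931"},
    {"product_id": "", "price": -1, "ean": "123"},
]
print(evaluate(records))  # each rule passed by exactly one record
```

The per-rule totals are the kind of aggregate that would then be pushed to Tableau or Excel, as the case study describes.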
An ETL tool for integrating with ERP
Ataccama ONE works as an ETL tool providing data to the company’s ERP system, moving 50 GB of data per day in this particular case. Additionally, DQ engineers use Ataccama ONE for fast ETL prototyping.
Address validation service
X5 also leveraged Ataccama ONE as an address validation solution with two use cases:
- Assisting 30+ business development specialists in completing address forms
- Providing validated address elements to various enterprise systems
To build the solution, the client team used an open-source, countrywide address lookup, built several data pipelines around it, and published them as web services.
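The exact pipeline design isn't described, but real-time lookup over a large address table generally hinges on normalizing the input into a canonical key and performing an indexed match. A minimal, hypothetical sketch (the real service indexes roughly 64 million records, not an in-memory dictionary):

```python
def normalize(address: str) -> str:
    """Canonicalize an address string so lookups are format-insensitive."""
    return " ".join(address.lower().replace(",", " ").split())

# Hypothetical extract from a countrywide address table, keyed by
# its normalized form for constant-time lookup.
ADDRESS_TABLE = {
    normalize("Moscow, Tverskaya St, 1"): {
        "city": "Moscow",
        "street": "Tverskaya St",
        "building": "1",
    },
}

def lookup(raw: str):
    """Return validated address elements, or None if no match is found."""
    return ADDRESS_TABLE.get(normalize(raw))

# Both use cases rest on the same lookup: form-completion assistance
# and supplying validated address elements to other systems.
print(lookup("moscow  tverskaya st 1"))
```

Wrapped in a web service, the same function would serve both the form-completion assistants and the enterprise systems consuming validated address elements.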