22,000+ databases and 5,000+ applications with over 8 PB of data to scan
Enterprise-wide PII Protection
Automated data discovery and cataloging of sensitive data
- $350 million in potential cost avoidance and consumer protection by eliminating the risk of PII leakage
- $50 million in savings through data reuse and removing redundant systems and databases
- $25 million in savings by reducing data preparation times for AI teams
- Always-on automated data scanning, classification, and protection for existing and new data sources
- A solution for Data Mesh metadata gathering and disciplined, comprehensive data management
Thousands of data sources
T-Mobile is the second-largest mobile telecommunications company in the US and the leading US provider of 5G. As America's Un-carrier, T-Mobile is redefining the way consumers and businesses buy wireless services through leading product and service innovation. Its purpose is to change wireless and be a force for good in the process.
After the 2020 merger with Sprint, the organization had to handle more data than ever and make sure the combined infrastructure was running smoothly.
The 2021 data breach gave T-Mobile a new mission: rethink data management by automating data classification, data privacy, and data protection at scale.
T-Mobile had a clear purpose in mind: to ensure all private data in its ecosystem was secure and compliant at all times.
The main goals were:
- Secure newly added data and re-secure stored data
- Monitor changes in schemas and the metadata catalog
- Automate as many scanning processes as possible
To achieve these goals, the data governance team launched the “Data Scanning at Scale” initiative. This meant continuously scanning an estimated 5,000 applications and 22,000 databases holding 8 petabytes of data.
“With Ataccama, we are able to do reviews of triple the number of databases in half the time.”
Implementing privacy at scale – scanning and classifying 22,000 databases
Because of T-Mobile's federated, distributed environment, the team had to work with multiple IT, engineering, and shadow IT organizations inside the company. This meant getting access to and scanning a large number of diverse data sources containing structured and unstructured data across on-premises and cloud environments.
Based on experience with previous implementations and tools, the team was prepared to face these common issues:
- Scans failing while running instead of completing successfully
- Delays in onboarding data, leaving scan teams waiting
- Numerous rescans due to a lack of thorough data quality remediation processes
- Production-level tables omitted from completed scans
- Difficulty scanning and cataloging structured data previously siloed at the edge of the company
Another challenge, directly connected to this complex approach, was finding a complete and versatile solution. The team needed something capable of onboarding different types of data sources, automating the data discovery and cataloging process, and rescanning databases easily.
Finding the right solution for an ever-growing infrastructure
The requirements were clear and called for a tool that could:
- Scan multiple systems at the same time
- Allow data stewards to create new data classifiers and labels so that future data sources are classified automatically and effectively
- Use the accepted or rejected status of classification results to improve accuracy in future scans
- Integrate with other systems and feed them results or issues
- Discover unknown applications and data stores and feed that information into routine scans
After evaluating multiple vendors, T-Mobile ran a 24-hour proof-of-concept project. The goal was to challenge the shortlisted vendors to scan multiple environments, such as Oracle, Azure, Snowflake, and AWS, and successfully parse as much data as possible. Ataccama scanned 138,972 tables and applied custom extraction rules to narrow the list to 22,494 tables containing sensitive data. Throughput has since increased to 800,000 tables in a 24-hour period.
Ataccama was ultimately selected for its scanning and classification capabilities, total cost of ownership, integration flexibility, and future-proofing potential, among many other criteria.
“Looking at all the vendors, a lot of them had some great solutions, but holistically, with all the things we were looking for in a tool, Ataccama was the most mature solution.”
Implementing a complex framework with metadata-driven and AI-powered automation
The “Data Scanning at Scale” initiative resulted in an automated, self-improving, closed-loop solution that onboards data sources, classifies data with automated rules and AI, integrates with ticketing systems to create data remediation tasks, and provides reporting.
Thanks to the end-to-end scope of the Ataccama ONE Gen2 platform, Ataccama has a significant impact on almost every step. Here is an overview of each step and the role Ataccama plays in it.
24x7 operations for scanning and remediation of data
The project is ongoing, and the data governance team is tackling each of the steps above with Ataccama ONE Gen2: continuously scanning systems, classifying data, and securing newly added sensitive assets. Through the “Data Scanning at Scale” initiative, the organization now has better processes in place and is better positioned to handle incoming data.
These are just a few of the results achieved so far:
- Potential savings of $350 million from avoiding a breach and sensitive data leakage similar to the 2021 incident.
- Estimated $50 million in savings through data reuse and removing redundant systems and databases.
- Estimated $25 million in savings by reducing data preparation times for AI teams thanks to the data shopping experience in the Ataccama Data Catalog.
- Ramped up to 24x7 operations for scanning and remediation.
- Continuously scanning 22,000 databases by automating the classification and discovery process.
“Ataccama allowed T-Mobile to influence its future product capabilities and add at-scale technology upgrades, data scanning at scale, classifying remediation, and quality at scale improvements. It also has added integrated tools such as catalog, glossary, and data observability, as well as an internal BI tool that will allow us to do all of our data management in one tool in the future, cutting down on the costs of other tools currently doing this work. This sets them up to be best in class as we move into 2023.”
Automated data protection and data quality solution
Besides the immediate goal of scanning all existing databases and apps, the data governance team is looking to:
- Proactively scan all incoming data and identify new PII.
- Build a comprehensive data asset inventory and protect all customer data.
- Identify opportunities to eliminate redundant data environments, processes, and teams.
- Create a centralized Data Marketplace to deliver unified data sets, reports, and analytics.
- Implement automated data quality with several million data quality rules in production, and develop a data-quality-as-a-service initiative through which every unit in the organization can track its data and make sure it is secure.