22,000+ databases and 5,000+ applications with over 8 PB of data to scan
Enterprise-wide PII Protection
Automated data discovery and cataloging of sensitive data
- $350 million in cost-avoidance and consumer protection by eliminating the risk of PII leakage
- $50 million in savings through data reuse and removing redundant systems and databases
- $25 million in savings by reducing data preparation times for AI teams
- Always-on automated data scanning, classification, and protection for existing and new data sources
- A solution for Data Mesh metadata gathering and disciplined, comprehensive data management
Thousands of data sources
T-Mobile is the 2nd largest mobile telecommunications company and the leading provider of 5G in the US. As America's Un-carrier, T-Mobile is redefining the way consumers and businesses buy wireless services through leading product and service innovation. Their purpose is to change wireless and be a force for good in the process.
After the 2020 merger with Sprint, the organization had to handle more data than ever and make sure the combined infrastructure was running smoothly.
The 2021 data breach gave T-Mobile a new mission: rethink data management by automating data classification, data privacy, and data protection at scale.
T-Mobile had a clear purpose in mind: to ensure all private data in its ecosystem was secure and compliant at all times.
The main goals were:
- Secure newly added data and re-secure stored data
- Monitor changes in schemas and the metadata catalog
- Automate as many scanning processes as possible
To achieve these goals, the data governance team launched the “Data Scanning at Scale” initiative: continuously scanning an estimated 5,000 apps and 22,000 databases holding 8 petabytes of data.
“With Ataccama, we are able to do reviews of triple the number of databases in half the time.”
Implementing privacy at scale – scanning and classifying 22,000 databases
Because of T-Mobile's federated, distributed environment, the team had to work with multiple IT, engineering, and shadow IT organizations inside the company. This meant getting access to, and scanning, a large number of varied data sources containing structured and unstructured data across on-prem and cloud environments.
Based on previous implementations and tools used, these were some of the most common issues they were prepared to face:
- Scans failing mid-run and never completing
- Delays in onboarding data would leave scan teams waiting
- Numerous rescans due to lack of thorough data quality remediation processes
- Omitting production-level tables from completed scans
- Difficulty scanning and cataloging structured data previously siloed at the edge of the company
Another challenge stemmed directly from this complexity: identifying a single, versatile solution. The team needed a tool capable of onboarding different types of data sources, automating data discovery and cataloging, and easily rescanning databases.
Finding the right solution for an ever-growing infrastructure
The requirements were clear and called for a tool that would have to:
- Scan multiple systems at the same time
- Allow data stewards to create new data classifiers and labels to automatically and effectively classify new data sources in the future
- Use accepted/rejected classification feedback to improve the accuracy of future scans
- Integrate with other systems and feed results or issues
- Discover unknown applications & data stores and feed the information into routine scans
After analyzing multiple vendors, T-Mobile ran a 24-hour proof of concept, challenging the shortlisted vendors to scan multiple environments, including Oracle, Azure, Snowflake, and AWS, and successfully parse as much data as possible. Ataccama scanned 138,972 tables and applied custom extraction rules to narrow the list to 22,494 tables containing sensitive data (throughput has since increased to 800,000 tables per 24-hour period).
Ataccama was ultimately selected for its scanning and classification capabilities, total cost of ownership, integration flexibility, and future-proofing potential, among many other criteria.
“Looking at all the vendors, a lot of them had some great solutions, but holistically with all the things we were looking for in a tool, Ataccama was the most mature solution.”
Implementing a complex framework with metadata-driven and AI-powered automation
The “Scanning at Scale” initiative resulted in an automated, self-improving, closed-loop solution that onboards data sources, classifies data with automated rules and AI, integrates with ticketing systems to create data remediation tasks, and provides reporting.
Ataccama is having a significant impact in almost all steps, thanks to the complete nature of the ONE Gen2 platform. Here is an overview of each step and what role Ataccama plays.
24x7 operations for scanning and remediation of data
The project is an ongoing initiative, and the data governance team is tackling each of the steps mentioned above with the help of Ataccama ONE Gen 2. They are continuously scanning systems, classifying data, and securing newly added sensitive assets. All in all, the organization now leverages better processes by implementing the “Data Scanning at Scale” initiative and is better positioned to deal with incoming data.
These are just a few of the results achieved so far:
- Avoiding a breach and sensitive data leakage similar to the 2021 incident, representing potential savings of $350 million.
- Estimated $50 million in savings through data reuse and removing redundant systems and databases.
- Estimated $25 million in savings by reducing data preparation times for AI teams thanks to the data shopping experience in the Ataccama Data Catalog.
- Ramped up to 24x7 operations for scanning and remediation.
- Continuously scanning 22,000 databases by automating the classification and discovery process.
“Ataccama allowed T-Mobile to influence its future product capabilities and add at-scale technology upgrades, data scanning at scale, classification remediation, and quality-at-scale improvements. It also has added integrated tools such as catalog, glossary, and data observability, as well as an internal BI tool that will allow us to do all of our data management in one tool in the future, cutting down on costs of other tools currently doing this work. This sets them up to be best in class as we move into 2023.”
Automated data protection and data quality solution
Besides the immediate goal of scanning all existing databases and apps, the data governance team is looking to:
- Proactively scan all incoming data and identify new PII
- Create and build a comprehensive data asset inventory and protect all customer data
- Identify data redundancy opportunities to eliminate duplicate data environments, processes, and teams
- Create a centralized Data Marketplace to deliver unified data sets, reports, and analytics
- Implement automated data quality with several million data quality rules in production, and develop a data-quality-as-a-service initiative where every unit inside the organization can track its data and make sure it is secured