Data for Good: Enabling Data-Driven Altruism with Data Governance

Data and the insights it provides are powerful tools used to identify, assess and resolve social and humanitarian problems in real-time. How can data professionals step up their social input? In this article, you will learn:

  • How data supports the social good
  • What steps data professionals and organizations can take to make productive use of data for the good of our community and greater society
  • How open data and activism help use data for the good of mankind
  • What the barriers to using data more effectively are, and what organizations must focus on to contribute their data to solving pressing social issues. Hint: building data-sharing agreements, protecting privacy, and collaborating with other stakeholders
  • Three pillars of data usability for the social good: why organizations must focus on data security, data accessibility, and data quality

You can sum up the current state of data management in three words: Never-ending growth. 

"We are in the age of data deluge now. Statistics indicate that nearly 1.1 trillion megabytes of data are getting generated every day, which means that by the end of this year around 70 percent of the globe's GDP would have undergone complete digitization."

Shirjana Cadavarmu, Senior Director of Artificial Intelligence Technology, Walmart

Some social problems can be readily solved using big data, such as traffic data to help ease highway traffic flow or weather data to predict the next hurricane. But what if we want to use data to help us solve our most human and critical social problems, such as homelessness, human trafficking, and education? And what if we not only want to solve these problems but do so in a way that the solutions are sustainable for the future?

It's no wonder organizations of all kinds are making immense efforts to ensure that data is stored, accessed, and managed properly. Today it’s essential to seek an opportunity to make productive use of data for the good of our community and greater society. Yet, an enormous gap exists between the potential of data-driven insights and their availability to solve social problems. What can organizations do to support social causes with data?

"To do that successfully, we need to understand the data we want to use and the steps we need to take to make it organized and classified. Then we need to improve its quality so that we can get the most value from it."

Afshin Lofti, CEO of Ataccama North America, when speaking at the Data for Good 2022 event

Every organization needs to ask itself these questions:

  • Which efforts can ensure that our data is of the highest possible quality while readily accessible?
  • What initiatives are we undertaking in our quest to improve the quality of our data and classify and organize it properly?
  • What are the current goals for data governance in our organization so that using data for good aligns with our business goals?

And finally: 

  • How can we leverage data to have a positive impact on our society?

4 criteria that define a “data for good” project

Before we jump into examples of how data is used for the good of mankind, let's take a look at the criteria for qualifying projects as a "data for good" effort.

Google Brain researcher Sara Hooker defined these four criteria:

  1. Data products are intended for non-profits or government agencies.
  2. Data products are developed and delivered by skilled volunteers.
  3. Tools and data are provided free of charge or at a heavily subsidized rate to organizations/individuals.
  4. Data usage skills of underserved communities are improved by providing relevant education.



How data supports the social good

Here are some other examples of cases when data could contribute to more efficient decision-making in solving critical social issues:

Healthcare

Human lifespans are increasing worldwide, posing a new challenge for treatment delivery methods today. Analytics in healthcare has the potential to reduce treatment costs, predict epidemic outbreaks, avoid preventable diseases, and improve the quality of life.

For instance, according to an Intel whitepaper, the French hospitals of Assistance Publique-Hôpitaux de Paris have been analyzing multiple data sources to predict how many patients each hospital is likely to see by day and by hour.

Kaiser Permanente has fully implemented a HealthConnect system that shares data across all of its facilities and makes it easier to use electronic health records. A McKinsey report on big data in healthcare states, "The integrated system has improved outcomes in cardiovascular disease and achieved an estimated $1 billion in savings from reduced office visits and lab tests."

By one calculation, overdoses from misused opioids have caused more accidental deaths in the US than road accidents. By analyzing decades of insurance and pharmacy data, data scientists at Blue Cross Blue Shield, working with analytics experts at Fuzzy Logix, identified 742 factors that predict whether someone will abuse opioids.

Homelessness

At the intersection of healthcare and homelessness, another burning issue, scientists have proposed combining data with 3D construction printers to build homes for low-income citizens:

  • Access to big data can determine how many people need accommodation
  • 3D construction printers can then build the required houses quickly

Combining data and 3D printing technology can help solve global housing crises more efficiently than traditional and unguided construction plans.

Education

By analyzing big data, professors can identify areas where students struggle or succeed, understand students' individual needs, and develop strategies for individualized learning. In the near future, intelligent chatbots could serve as teachers for those without access to other forms of education, and data visualization will make it easy to chart students’ educational progress. Poverty and lack of education are closely linked, but AI and data can help improve education levels in poorer areas.

Using data for solving social problems by area

Poverty

Tech giant IBM is looking at different ways to apply AI to poverty and other societal issues through its “Science for Social Good” program. In the “Emergency Food Best Practice” initiative, the team has partnered with St John's Bread & Life, a New York City non-profit that feeds the most vulnerable in the city, serving more than 2,000 meals a day. IBM develops a tool powered by the data from the non-profit’s distribution model and shares it with other similar non-profits.

Open data and activism to use data for the good of mankind

Human trafficking, global hunger, and poverty are all complex issues whose solutions depend on access to data, but you do not need to be a data professional to put data to good use. The general public can also create and analyze data, provided it is properly organized and managed. Open data platforms have made it possible for citizens to create new ideas and products through what has become known as "citizen science." Here’s one example.

In 2010, London opened the London Datastore, a repository of government data managed by the Greater London Authority. It gave citizens access to raw data released by city agencies and government employees, including crime and economic data and real-time transit data. An online map app of the London Underground, made by a local web developer, received more than 250,000 clicks within days, and a cycling enthusiast used the Datastore to create a bike map.

And there are more projects like that. Citizen activists are using data to build innovations, mostly apps, that address a particular social issue. To address the shortage of technical personnel for big data projects, DataKind matches scientists and statisticians with nonprofit organizations for pro bono data work. DrivenData, a newer socially focused competition platform, partners with non-profit organizations focused on solving social problems and making a real-world impact.

Barriers to Using Data for Global Good

Because social issues are often more complex than those in business or science, using data to address them can be difficult. The main reasons why too little structured data is available for social problems are:

  • Data is buried in administrative systems
  • Data is often unreliable (low quality)
  • Data governance standards aren't in place
  • It’s hard to achieve efficient data sharing

Data is buried in administrative systems

Organizations collect data for operational purposes, but it often gets buried in their administrative systems. To solve this problem, organizations are experimenting with building large datasets that can be used more widely. 

“Data is a strategic asset, and we want to be very careful about using it across the enterprise.”

Nancy Colton, Staff VP, Health Care Analytics, Anthem

Before datasets can be connected across organizations, data must be transformed and secured simultaneously. A good example is the health care industry, where administrative costs associated with inefficient data management range from $100 billion to $150 billion a year.

The greatest challenge in the health care industry is the sheer number of insurance plans, each with its own underwriting system, claims management, provider network contracts, and broker network management. Crucial data gets stored in many places and in different formats. According to the McKinsey Global Institute, if the US health care industry used big data to improve quality and efficiency, it could create more than $300 billion in value every year.

While several government agencies and nonprofit organizations are tackling these problems, collaboration and the use of data are limited. They lack adequate IT resources compared to people in the hard sciences or businesses with access to financial and product information.

Data is often unreliable

Data itself can be a problem. It is often missing, incomplete, or stored in silos or formats that cannot yield any value. For example, collecting primary data on human trafficking is challenging, and the information we do collect isn't always reliable.
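A first step toward judging reliability is simply measuring how much of a dataset is missing. The following is a minimal Python sketch of that idea, not a prescribed method. The field names, the sample records, and the list of placeholder strings treated as missing are all hypothetical.

```python
from typing import Any, Dict, Iterable, List, Mapping

# Placeholder strings treated as "missing" here (an assumption; adjust per dataset).
MISSING_STRINGS = {"", "n/a", "unknown"}


def is_missing(value: Any) -> bool:
    """Return True for None or for a string that is blank or a known placeholder."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_STRINGS


def completeness(records: Iterable[Mapping[str, Any]], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the share of records holding a usable value."""
    rows = list(records)
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(not is_missing(row.get(field)) for row in rows) / len(rows)
        for field in fields
    }


# Hypothetical case records, invented purely for illustration.
sample = [
    {"region": "North", "report_date": "2022-01-10", "age": 31},
    {"region": "", "report_date": "2022-02-03", "age": 24},
    {"region": "South", "report_date": None, "age": 45},
    {"region": "East", "report_date": "N/A", "age": 19},
]

print(completeness(sample, ["region", "report_date", "age"]))
```

Fields with a low score are candidates for fixing at the point of collection rather than during later cleanup.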

There aren't enough standards for data governance

Data governance standards, which define how data is collected, stored, and curated for accountability, are another challenge to organizations' ability to use data to solve social problems.

Inconsistencies can result when the captured data do not lend themselves to analysis. To be used, data must be transformed, which involves a lot of work. Analysts often struggle to integrate different datasets because of missing metadata (the data that describes data) and poor data quality.

“The greatest contribution that data gives to humankind is clearing the fog. It gives you the exact situation and a correct, current state. It doesn't give you a range – it tells you exactly where you are.”

Usri Roy Chaudhury, Regional Data Lead for Retail Credit Decision Engine, HSBC Bank

Data.gov, a U.S. government initiative to make its vast amounts of data available to the public for innovative purposes, illustrates such a hindrance: data published in unsuitable formats became a stumbling block. More and more government datasets are being published, but only a few are ever used.

Some agencies, like the Environmental Protection Agency, provide data in machine-readable formats, while others still use difficult-to-modify formats such as PDFs. It's not surprising that organizations that succeed in making data available also have good metadata, easy accessibility, and data governance and ownership standards.
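To make the contrast with PDFs concrete, here is a small Python sketch of publishing a dataset as CSV together with a metadata record that describes it. It is an illustration under stated assumptions: the function name, the metadata keys, and the sample rows are hypothetical and do not follow any particular agency's or catalog's schema.

```python
import csv
import io
import json
from typing import Any, Dict, List, Tuple


def to_machine_readable(rows: List[Dict[str, Any]], title: str, owner: str) -> Tuple[str, str]:
    """Serialize rows to CSV text and build a JSON metadata record describing them."""
    # Collect every column name that appears in any row, in a stable order.
    columns = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    # Hypothetical metadata keys: who owns the data, what it is, and its shape.
    metadata = {
        "title": title,
        "owner": owner,
        "format": "text/csv",
        "row_count": len(rows),
        "columns": columns,
    }
    return buffer.getvalue(), json.dumps(metadata, indent=2)


# Invented sample rows for illustration only.
readings = [
    {"station": "A", "pm25": 12},
    {"station": "B", "pm25": 7},
]
csv_text, metadata_json = to_machine_readable(readings, "Air quality sample", "Example agency")
print(csv_text)
print(metadata_json)
```

Any analyst can load the CSV directly, and the metadata record states ownership and structure up front, which are the two things a PDF typically hides.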

Failing to establish efficient data sharing

Also, there are policy and regulatory challenges to solve, such as building data-sharing agreements, protecting privacy, and collaborating with other stakeholders working on the same issues.

For example, most human trafficking data is kept in a way that meets organizational needs, but not global ones. Data privacy and security concerns mean that data held by various organizations is rarely shared raw, making it hard to create global datasets.

To make matters even worse, agencies fighting trafficking often compete with each other for scarce resources, whether grants and donations or media attention. Because of this competition, agencies rarely share data with each other or with the public.

For example, Polaris Project uses a comprehensive approach to fighting human trafficking, combining advocacy, client services, technical training and assistance, programs overseas, and a hotline. Polaris ran hotlines for human trafficking survivors between 2003 and 2006. In 2007, Polaris was designated by the US Department of Health and Human Services to run the nation's first national human trafficking hotline. Polaris has reportedly logged over 75,000 calls over the years, but access to the data is limited, and little is known about its reliability.

Imagine if the Polaris data was open to the public and integrated with other data sources, like economic indicators, transportation routes, education statistics, and victim services. 
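Privacy concerns do not have to block sharing entirely. One common compromise is to publish aggregated counts and withhold groups too small to be safe. The Python sketch below illustrates that idea only; the record layout and the threshold of 5 are assumptions chosen for the example, and small-group suppression on its own is not a complete privacy guarantee.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping, Tuple


def shareable_counts(
    records: Iterable[Mapping[str, Any]], key: str, min_count: int = 5
) -> Tuple[Dict[Any, int], int]:
    """Count records per value of `key`, withholding groups smaller than `min_count`.

    Returns the publishable counts and the number of records withheld.
    """
    counts = Counter(record[key] for record in records)
    published = {value: n for value, n in counts.items() if n >= min_count}
    withheld = sum(n for n in counts.values() if n < min_count)
    return published, withheld


# Hypothetical hotline records, invented for illustration.
calls = [{"region": "North"}] * 6 + [{"region": "South"}] * 2
published, withheld = shareable_counts(calls, "region")
print(published, withheld)
```

Aggregates like these can be joined with public sources such as economic indicators without exposing any individual record.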

Three pillars of data usability for the common good

To make their data valuable, organizations need to adjust their approach. “There are three pillars that we need to act on immediately.”

“The first is data protection because in the digital world coming, there are quite a lot of mergers between external and internal data. The entire industry becomes extremely responsible and liable to protect the data, especially because we will host it on clouds. That's the first one that is keeping us all alert.”

“The second one is data availability. Some people think that data is an undisclosed asset, so I need to hide it away and not let anybody touch it. That is not going to help. We need to have proper robust access management. We need to balance protecting it and making it available.”

“The third thing is making it [data quality] right the first time. We have always taken data the way it was given to us. However, it may be incomplete. Then we tried to clean it in the data journey within the organization. As the volume of data grows, this approach is no longer sufficient,” Usri Roy Chaudhury, Regional Data Lead for Retail Credit Decision Engine, HSBC Bank, told the audience at the Data for Good Summit.

Final Thoughts

Data-driven insights can significantly enhance the quality of life for people at risk. But simply having lots of data isn’t enough. To use data for good, organizations also need to have a way to get real insights and value from it. 

At Ataccama, we offer a full-stack data management platform, Ataccama ONE, that unifies data governance, quality, and master data management. Organizations across industries use Ataccama data products to holistically manage their data and produce data products that can be shared and reused by everyone.
