March 27, 2019

You Get Out What You Put In: Exploring the Role of High-Quality Data in AI

By Kathryn E. Horneffer | HGHI Research Assistant

“Who has heard of Gartner’s hype cycle?” Adam Landman asked his audience, pointing to an image on the slide behind him.

The IT research firm Gartner created the hype cycle to represent the typical trajectory a technology follows as society adopts it. Most artificial intelligence (AI) and digital technologies, Landman explained, are still in the early stages of the cycle, at the “peak of inflated expectations.” While health professionals and researchers are excited about the possibilities these technologies offer, there is limited evidence about the most effective ways to implement them.

This was the overarching takeaway from the first set of flash talks at “Hype vs. Reality: The Role of Artificial Intelligence in Global Health”: While AI has the potential to address a wide set of global health problems, it is not a magic bullet.  Through their discussion of problems related to data access, algorithm bias, and appropriate application of AI, speakers emphasized that high-quality data is crucial to effectively harnessing the power of AI.

Although the definition of AI can be ambiguous, the term has become almost synonymous with deep learning, a type of machine learning that teaches computers to make connections based on large amounts of data. Not just any data will do, however; as Andrew Beam explained, the technique requires massive amounts of labeled data, and that data must appear high-dimensional while actually possessing some kind of low-dimensional structure. This presents a unique challenge for applying AI to global health: in low-resource settings, data—especially high-quality data—is often limited.
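The idea of high-dimensional data with hidden low-dimensional structure can be made concrete with a small illustration (the numbers here are invented for the example, not drawn from the talk): data that nominally has 100 features but secretly lies on a 2-dimensional plane, which a singular value decomposition immediately reveals.

```python
import numpy as np

# Hypothetical illustration: 1,000 samples with 100 features each that
# secretly lie on a 2-dimensional plane -- the kind of hidden
# low-dimensional structure deep learning exploits.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))      # the true 2-D structure
embedding = rng.normal(size=(2, 100))    # maps it into 100-D space
data = latent @ embedding                # looks 100-dimensional

# The singular values expose the structure: only 2 are non-negligible,
# so the "100-dimensional" dataset is effectively 2-dimensional.
singular_values = np.linalg.svd(data, compute_uv=False)
print(int(np.sum(singular_values > 1e-8)))  # prints 2
```

Real health data is of course messier than this toy example, but the intuition carries over: deep learning works when the apparent complexity of the inputs hides a simpler underlying pattern.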

How, then, do we gather enough data to make AI work in these settings? Several speakers proposed solutions: perhaps community health workers, armed with mobile technologies, could gather basic data, like blood pressure measurements, and feed the information into a centralized database. Wearables and voice-based biomarkers offer the possibility of a participatory system, in which individuals contribute their data through a sensor on their body and receive individualized insights in return.

As we seek to expand the collection of useful global health data, we must also consider the intertwined questions of privacy and access. Health data, some speakers argued, is a public good. Open data and transparent algorithms will increase productivity and ensure that findings can be reproduced across research groups.

Merce Crosas presented the FAIR data principles, which aim to facilitate data reuse: data must be Findable, Accessible, Interoperable, and Reusable. Yet security must remain a concern. Crosas proposed a tiered data access system, segmenting datasets into varying levels of access protocols based on the sensitivity of the personal health information they contain. The issues raised by increasing access to high-quality data—power, politics, and privacy, as Phuong Pham put it—are crucial as researchers look to apply AI to contexts without strong infrastructure for data collection.
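One way to picture a tiered access system is as a mapping from a dataset's sensitivity level to the credentials a requester must hold. The sketch below is purely illustrative; the tier names and credential requirements are my assumptions, not Crosas's actual specification or any real repository's policy.

```python
from enum import Enum

class Tier(Enum):
    """Hypothetical sensitivity tiers for health datasets."""
    OPEN = 1        # de-identified aggregate data: public download
    RESTRICTED = 2  # record-level data: requires a data-use agreement
    SENSITIVE = 3   # identifiable health data: secure-enclave access only

# Assumed credential requirements per tier (illustrative, not a standard).
REQUIREMENTS = {
    Tier.OPEN: [],
    Tier.RESTRICTED: ["registered account", "data-use agreement"],
    Tier.SENSITIVE: ["registered account", "data-use agreement",
                     "IRB approval", "secure enclave session"],
}

def can_access(tier: Tier, credentials: set) -> bool:
    """A requester may access a dataset only if they hold every
    credential that the dataset's sensitivity tier demands."""
    return all(req in credentials for req in REQUIREMENTS[tier])

print(can_access(Tier.OPEN, set()))                        # prints True
print(can_access(Tier.SENSITIVE, {"registered account"}))  # prints False
```

The appeal of this design is that openness and protection are not an either/or choice: the least sensitive data stays maximally FAIR, while identifiable records sit behind progressively stronger safeguards.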

Using the wrong data can seriously undermine the effectiveness of AI. Fed low-quality or insufficient data, algorithms can develop bias. Scientists must think carefully about the implications of the data they feed their models and recognize those models' limitations. Researchers also need to consider the context in which their AI will be deployed: in a low- or middle-income country, the setting and available resources may look completely different, and the prevalence of certain diseases can vary greatly; an effective model must account for these factors.
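The prevalence point can be made with simple arithmetic. Using invented but realistic numbers, the example below applies Bayes' rule to show that a diagnostic model with fixed sensitivity and specificity gives very different positive predictive values when the same disease is rare in one setting and common in another, which is one reason a model validated in one country cannot simply be transplanted to another.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive prediction is a true case,
    computed via Bayes' rule from the model's error rates and
    the background disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same hypothetical model (90% sensitivity, 90% specificity),
# two settings with different disease prevalence:
print(round(positive_predictive_value(0.9, 0.9, 0.01), 3))  # prints 0.083
print(round(positive_predictive_value(0.9, 0.9, 0.20), 3))  # prints 0.692
```

At 1% prevalence, barely one in twelve positive predictions is a true case; at 20% prevalence, the same model is right about seven times in ten. Nothing about the algorithm changed, only the population it was applied to.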

AI has enormous potential to improve the field of global health and the efficacy of medical treatment.  The technology can consider contextual factors, past cases, and multiple sets of data in a way that humans cannot, leading to more effective clinical decision-making and diagnosis.  However, there are many unrealistic expectations surrounding the field, and researchers sometimes try to apply machine learning without fully understanding the type of data that is required.

Although there is still more work to be done in testing and validating AI, specifically regarding its applications to global health, the speakers agreed that the promise of AI is not just empty hype. The data available may be imperfect, but despite its limitations, AI is already being used as an important tool to support healthcare providers. As the focus on gathering and inputting high-quality data increases, the power of AI to revolutionize global health delivery will only grow.