Health data has long been touted as the key to a revolution in medical research, fueled by billions of dollars from tech investors and Silicon Valley giants eyeing new markets.
But in the wake of a botched study on the benefits of an anti-malarial drug against COVID-19, leading academics warn that big sets of health data need to be treated with caution — and can by no means replace tried-and-true scientific methods in the search for medical treatments.
The study, published in leading medical research journal the Lancet in May, linked the use of hydroxychloroquine to increased deaths in patients infected with COVID-19. It immediately led the World Health Organization to pause its own trial on the drug, while some countries went so far as to ban its use as a treatment for the coronavirus.
Yet as questions emerged over the quality of the data used to support the study’s conclusions, the authors withdrew their support and the journal took the rare step of yanking the paper.
The ensuing scandal put a damper on the idea of medical research based entirely on big sets of health data, at a time when the market for health data is booming and big industrial players are pushing for its use in medical research. Google this week formally notified the European Commission of its plans to buy wearable health tracking devices company Fitbit — and its troves of sensitive health data — in a deal that has alarmed privacy campaigners, while Palantir, a U.S. data mining company, recently gained access to U.K. health data.
At the heart of the scandal are concerns about the quality of the data, and the disproportionate risk to health from basing medical research on faulty data.
U.S.-based Surgisphere, the company that provided data for the botched study, claimed to have gleaned data from around 700 hospitals across six continents.
When challenged about the quality of the data, it refused to open up its databases for audit, citing privacy and confidentiality agreements with the hospitals.
It wasn’t the first study backed by troves of Surgisphere data. Sapan Desai, the company’s CEO and one of the study’s authors, has long pushed for AI and big data analytics to be used more in health research. “With data like this, do we even need a randomized controlled trial?” he was reported as saying of the hydroxychloroquine study before it was retracted.
However, researchers contacted by POLITICO cautioned against the idea of big data replacing tried-and-tested methods like randomized controlled trials, considered the gold standard in medical research. They also questioned Surgisphere’s assertion that it couldn’t open up the data for audit.
Neither Surgisphere nor Desai could be reached for comment.
In a media appearance before the paper was retracted, Desai played up his company’s role in the research, saying that a study of that “size and quality” had only been possible thanks to Surgisphere’s technology.
“If information passes into the hands of a company, what is then the company’s responsibility as steward of that data? My own opinion is that I don’t think there are yet enough clear guidelines and checks for consistency at this point,” said Charles Mayo, a professor at the University of Michigan and author of a paper on big data in clinical trials.
Mayo added that the Surgisphere scandal is likely to prompt most institutions to adopt additional practices and frameworks covering the use of patient data with commercial companies, noting that this type of oversight, by institutional review boards and compliance offices, has long been part of clinical research.
Big role for big data
Experts questioned Desai’s assertion that big data can replace randomized trials — but said the idea is nonetheless gaining traction as more data becomes available and AI capabilities improve.
“It needs a lot of effort to get the data to the point where it can become useful … we don’t see data replacing randomized trials. I would say that is misguided,” said Mayo.
Randomized controlled trials aim to reduce bias by randomly assigning subjects to two or more groups. One group receives the intervention, while the other receives nothing or a placebo. The results of the groups are then compared.
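The randomization step described above can be sketched in a few lines of Python. This is a minimal illustration, not a real trial protocol; the patient identifiers are hypothetical stand-ins, and real trials use far more sophisticated randomization schemes (stratification, blinding, and so on):

```python
import random

def assign_groups(subjects, seed=42):
    """Randomly split subjects into two equal arms: treatment and control.

    Random assignment means each subject has the same chance of landing in
    either arm, which balances out hidden differences between the groups.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical cohort of 100 patients
subjects = [f"patient-{i}" for i in range(100)]
treatment, control = assign_groups(subjects)
print(len(treatment), len(control))  # prints: 50 50
```

The point of the shuffle is that neither the researchers nor the patients choose who gets the intervention, which is precisely the bias-reducing property that observational big data sets lack.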
While vital for medical research, these kinds of trials can be expensive and take years to complete. In contrast, the appeal of using big data and AI is linked to speed and lower costs, with its promise of large troves of ready-to-go data, sitting in servers around the world.
But those at the coalface of scientific research say that though big data can be useful, it is no match for the rigor of trials under test conditions.
More perilously, big data can throw up misleading results — a risk inherent to an approach where data is often shoddy and ways of checking it are limited. As Tom Treasure, a University College London professor who co-authored a paper looking at the use of big data in research, put it: “You can make some ghastly mistakes from big data.”
For Mayo, a lack of funding for the infrastructure needed to ensure data quality is one risk of relying too heavily on big data in research.
“It is exceedingly hard for institutions to get funding for the infrastructure. People sell the idea that they can do all these things with the data, but in reality the data often isn’t good or well organized enough,” Mayo said.
He said that analyzing large data sets is useful in finding associations and validating the results of a study, but is “no substitute” for answering targeted questions in a way that trials do.
Treasure also poured cold water on the suggestion that data analytics can replace randomized trials.
“A new treatment has to be tested in a formal way, not just unearthed from a mass of data,” he said. “I think if you asked experienced people who have a good, rounded view, they will say we will still need trials to pick signal from the noise, but databases are still useful.”
Even so, with analytics capabilities and the amount of available data on the rise, belief in the irreplaceability of randomized controlled trials could be challenged.
“The idea that AI and big data will replace randomized trials is starting to become more standard,” Mayo said.
Theo Arvantis, professor of digital health innovation at the University of Warwick, said that AI and big data analytics are currently considered complementary to randomized controlled trials, but that the way clinical studies are done “might have to evolve.”
Questions over process
While experts are wary of the use of big data in medical studies, the Surgisphere scandal also exposed the extent to which scientific journals are ill-equipped to properly vet research based on an analysis of vast data sets.
In the case of the Lancet study, more than 100 researchers raised concerns after publication, citing the absence of an ethics review, inadequate adjustment for known and measured confounding variables, and data that was inconsistent with government reports of cases and deaths.
According to Natalie Banner, the Wellcome Trust’s lead for its Understanding Patient Data project, the scandal shows that journals haven’t yet caught up with the new challenges thrown up by the increasing use of big data in research.
“I think every part of the system that is involved in the research process has not quite caught up with the potential of big data research — all the necessary checks and balances that are needed to ensure that it is done ethically and robustly,” she said.
The ability of researchers to get a study based on questionable data published on a reputable platform has shone an unflattering light on the publishing process — and on the journals themselves.
UCL’s Treasure said it is “extraordinary” that the journals didn’t establish the source or the company’s permission to use the data, while Paul Elbers, an intensive care physician at Amsterdam UMC, said the scandal has highlighted the “inadequacy” of the peer review system.
“A limited number of people get to look at the paper before it’s published, and I’m sure they mean well, but they cannot possibly oversee all problems with it,” Elbers told POLITICO.
But the pressure to find a cure for COVID-19 has also contributed to shortcomings in the review process.
“The rush to publish, short-circuiting peer review, and the cognitive bias to fall for impressive news, are features of the times of COVID-19,” Treasure said.
Following the retraction of a paper containing data from Surgisphere, the prestigious New England Journal of Medicine announced it is assessing its guidelines on reporting of research based on big data, the Guardian reported.
A spokesperson for the journal said that they had “limited experience with reviewing or publishing studies like this one, which used a large database based on electronic medical records.” They revealed that none of the peer reviewers had seen the raw data when assessing the study.
“In the future, our review process of big data research will include reviewers with such specific expertise,” they said.
The Lancet also said that it was reviewing its “requirements for data sharing and validation among authors, and data sharing following publication.”
Confidentiality claims
Researchers were also skeptical of Surgisphere’s claims that confidentiality agreements barred it from opening its databases to audit or revealing the sources of the information.
Warwick University’s Arvantis said he could think of only one place where this argument may be valid — during the peer review process to avoid bias.
Others were even more forthright.
“I cannot understand a privacy argument for not releasing the names of the hospitals that have been involved,” said Wellcome Trust’s Banner. “Not being open and transparent about who’s being partnered with and when, and why and under what circumstances … it’s absolutely toxic to public trust.”
UCL’s Treasure called Surgisphere’s claim into question, too.
“Patient confidentiality is, rightly, sacrosanct. I don’t believe that extends to institutional sources. It is a bizarre claim. Usually, it is a point of honor that you give due credit by formally acknowledging the database and those who gave you access to it,” he said.
A spokesperson for Berlin’s Charité hospital said it was not usual for hospitals to insist on their identity remaining secret when they share data. “As a public research organization of excellence Charité seeks to be transparent about its processes as much as possible without endangering patient privacy rights.”
Researchers emphasized that getting access to databases is not straightforward. Michigan University’s Mayo said that it took him two years to get the sign-offs needed to share some data sets with other researchers. “When there is a company involved then there is even more scrutiny,” he said.
POLITICO emailed dozens of major European hospitals and health ministries asking whether they had ever provided data to Surgisphere. Of those that responded — including leading hospitals in Germany and France, and the Glasgow branch of the National Health Service — none had worked with Surgisphere or shared data with it. Some, such as the Dutch Hospitals Association, the Spanish Ministry of Health and Charité in Berlin, also emphasized the highly regulated nature of sharing patient data.
For Treasure, though, questions around the quality and provenance of data used in research aren’t new, and won’t go away.
“There have always been and continue to be cheats. People make up experimental data as well. And it looks as though, if it was fraudulent, [Desai] hasn’t got away with it.”