It’s impossible to contain covid-19 without knowing who’s infected: until a safe and effective vaccine is widely available, stopping transmission is the name of the game. While testing capacity has increased, it’s nowhere near what’s needed to screen patients without symptoms, who account for nearly half of the virus’s transmission.
Our research points to a compelling opportunity for data science to effectively multiply today’s testing capacity: if we combine machine learning with test pooling, large populations can be tested weekly or even daily, for as low as $3 to $5 per person per day.
In other words, for roughly the price of a cup of coffee per test, governments can safely reopen the economy and halt ongoing covid-19 transmission, all without building new labs and without new drugs or vaccines.
Most people get tested for the coronavirus because they experienced symptoms, or came in close contact with someone who did. But as offices and schools come under pressure to reopen, organizations will need to grapple with an unpleasant truth: relying on symptoms to guide testing will miss asymptomatic and pre-symptomatic cases, and put everyone at risk.
The current alternatives, though, are not appealing. Infrequent testing (monthly seems to be the default in many proposals) and haphazard screening both allow active cases to spread the virus for weeks before they’re caught. And the price is still high, at $100 to $200 or more per test.
Pooled testing, guided by machine-learning algorithms, can fundamentally change this calculus. In pooled testing, many people’s samples are combined into one. If no virus is detected in the combined sample, that means no one in the pool is infected. The entire pool can be cleared with just one test.
But there’s a catch: if anyone in the pool is infected, the test will be positive and more testing will be required to figure out who has the virus.
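To make that trade-off concrete, here is a minimal sketch of the simplest two-stage scheme: test the pool once, and retest every member individually only if the pool comes back positive. It assumes everyone has the same infection risk, infections are independent, and the test is perfectly accurate. The function name and the example numbers are illustrative, not taken from the underlying analysis.

```python
def expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected tests per person under simple two-stage pooling.

    Assumptions (illustrative only): identical, independent infection
    risk for every member, and a test with no false results.
    """
    if pool_size == 1:
        # A "pool" of one is just an individual test.
        return 1.0
    # The pool is negative only if every member is uninfected.
    p_pool_negative = (1.0 - prevalence) ** pool_size
    # One pooled test shared by the group, plus one retest per person
    # whenever the pool is positive.
    return 1.0 / pool_size + (1.0 - p_pool_negative)


# Hypothetical example: 1% prevalence, pools of 10.
tests_per_person = expected_tests_per_person(prevalence=0.01, pool_size=10)
print(f"{tests_per_person:.3f} tests per person")
```

Under these assumptions, a population with 1% prevalence needs roughly one test for every five people, because most pools clear on the first try.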
So a key part of knowing how to pool is knowing the likelihood that certain people in the group will be positive, and separating them from the rest. How do we know that risk? That’s where machine learning comes in.
The risk of infection is evolving rapidly in the United States–the relative odds in New York and Florida have reversed in a matter of weeks. Risk also differs significantly between people–compare a health-care worker with an employee working remotely. Estimating this risk for each person is a perfect job for machine learning.
Using publicly available data from employers and schools, epidemiological data on local infection and testing rates, and, where available, more sophisticated data on travel patterns, social contacts, or sewage, modelers can predict anyone’s risk of having covid-19 on a day-by-day basis. This allows highly flexible approaches to pooling that drive huge efficiency gains.
Another advantage: pooled testing gets more efficient when disease prevalence is lower. If a population–say, all students at a university–is tested daily, the risk of infection is dramatically lowered for everyone in the group, simply because testers remove positives from tomorrow’s pool when they diagnose them today. That means tomorrow’s pool can be even larger, which reduces the number of tests needed and thus the cost of testing the population. And with more frequent testing, people who are infected but don’t have symptoms can stay home, further reducing spread and making pooled testing even more efficient.
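The link between lower prevalence and larger pools can be sketched with the same simple two-stage scheme (one pooled test, then individual retests if the pool is positive). The code below searches for the pool size that minimizes expected tests per person at a few prevalence levels. It assumes identical, independent risk and a perfect test; the function names, the search limit, and the prevalence values are hypothetical choices for illustration.

```python
def expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected tests per person under two-stage pooling (illustrative)."""
    if pool_size == 1:
        return 1.0
    p_pool_negative = (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def best_pool_size(prevalence: float, max_pool_size: int = 100) -> int:
    """Pool size in 1..max_pool_size with the fewest expected tests per person."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda size: expected_tests_per_person(prevalence, size),
    )


# Hypothetical prevalence levels, from higher to lower.
for prevalence in (0.05, 0.01, 0.001):
    size = best_pool_size(prevalence)
    cost = expected_tests_per_person(prevalence, size)
    print(f"prevalence {prevalence:.3f}: pool size {size}, "
          f"{cost:.3f} tests per person")
```

In this stylized model the best pool size grows as prevalence falls, and the expected number of tests per person shrinks with it. Real pool sizes are also limited by how much dilution a lab test can tolerate, which this sketch ignores.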
As a result, high-frequency pooled testing with machine learning costs far less than you might think. According to our analysis, testing daily costs only twice as much as testing monthly. And daily testing can actively suppress the virus, whereas monthly testing really only allows us to see how badly things have gone.
This effect can be so powerful, in fact, that under some conditions–such as in meatpacking plants or nursing homes–increasing frequency can actually lower the number of tests needed, and thus the cost of testing a population, in a given time period. You read that right: testing more often can actually be less expensive for the health-care system.
The last pillar of prevention through testing requires accounting for the virus’s spread between people and, therefore, for risk that is correlated. Using machine learning to model social networks has been a growing focus for researchers in computer science, economics, and other fields. Such algorithms, combined with data on jobs, classrooms, university dorms, and many other settings, allow machine-learning tools to estimate the potential that different people will interact. Knowing this likelihood can make group testing even more powerful.
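A deliberately extreme example shows why correlated risk matters for grouping. Suppose people live in two-person households and, as a stylized assumption, a household is either entirely infected or entirely uninfected. Keeping housemates in the same pool then means fewer distinct sources of infection per pool. The household model, the infection probability, and the function name are all hypothetical and chosen only to illustrate the idea.

```python
def expected_tests_per_pool(household_infection_prob: float,
                            pool_size: int,
                            households_in_pool: int) -> float:
    """Expected tests for one pool under two-stage pooling.

    Stylized assumption: every member of a household shares the same
    infection status, and households are independent of one another.
    """
    # The pool is negative only if every household in it is uninfected.
    p_pool_negative = (1.0 - household_infection_prob) ** households_in_pool
    # One pooled test, plus a retest for each member if the pool is positive.
    return 1.0 + pool_size * (1.0 - p_pool_negative)


q = 0.02         # hypothetical chance that a given household is infected
pool_size = 10   # people per pool

# Housemates pooled together: 10 people come from 5 households.
together = expected_tests_per_pool(q, pool_size, households_in_pool=5)
# Housemates split across pools: 10 people come from 10 households.
split = expected_tests_per_pool(q, pool_size, households_in_pool=10)

print(f"housemates together: {together:.2f} expected tests per pool")
print(f"housemates split:    {split:.2f} expected tests per pool")
```

Real contact networks are far messier than this, but the direction of the effect is the point: pooling people whose risks move together concentrates positives in fewer pools.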
Is high-frequency pooled testing feasible in the real world? While we don’t want to minimize the logistical challenges, they are just that–challenges, not deal-breakers. The US Food and Drug Administration has just approved the first use of pooled testing, and research increasingly shows that this technique is sensitive enough to detect positive cases. So as long as labs are willing, testers can start pooling today.
Though some have called into question the feasibility of pooling given the scale of the current outbreak, this is only a challenge because we traditionally rely on coarse–and, as we show in our paper, potentially inaccurate–estimates of virus prevalence in large populations. Instead, machine learning can give us the precise individual-level estimates we need to make pooling work even at high prevalence, by identifying those likely to test positive and keeping them out of large pools.
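The value of individual-level estimates can be illustrated with a small sketch. Given a risk score for each person (here simply made up; in practice it would come from a predictive model), people above a threshold are tested individually and everyone else is pooled. The code compares that plan with pools that mix high-risk and low-risk people. The risk values, threshold, pool size, and function names are hypothetical, and the sketch assumes independent infections and a perfect test.

```python
from math import prod


def expected_tests_for_pool(risks):
    """Expected tests for one pool: a pooled test, then retests if positive."""
    if len(risks) == 1:
        return 1.0  # a single person is just tested individually
    p_pool_positive = 1.0 - prod(1.0 - r for r in risks)
    return 1.0 + len(risks) * p_pool_positive


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def stratified_plan(risks, threshold, pool_size):
    """Test people at or above `threshold` alone; pool everyone else."""
    high = [r for r in risks if r >= threshold]
    low = [r for r in risks if r < threshold]
    return [[r] for r in high] + chunk(low, pool_size)


def total_expected_tests(pools):
    return sum(expected_tests_for_pool(pool) for pool in pools)


# Made-up population: 10 high-risk people (30%) and 90 low-risk (0.5%).
risks = [0.30] * 10 + [0.005] * 90

# Risk-blind plan: each pool of 10 happens to contain one high-risk person.
mixed_pools = [[0.30] + [0.005] * 9 for _ in range(10)]
# Risk-aware plan: high-risk people tested alone, the rest in pools of 10.
stratified_pools = stratified_plan(risks, threshold=0.10, pool_size=10)

mixed_total = total_expected_tests(mixed_pools)
stratified_total = total_expected_tests(stratified_pools)
print(f"mixed pools:      {mixed_total:.1f} expected tests")
print(f"stratified pools: {stratified_total:.1f} expected tests")
```

In this made-up population, keeping the likely positives out of the large pools cuts the expected number of tests by nearly half, even though overall prevalence is unchanged.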
Frequency also pays huge dividends when virus prevalence is high. Before pooled testing is implemented–say, at a factory or school–the entire population could complete a one-time screening. Infected people would stay home until they recovered, and high-frequency pooled testing would keep prevalence low by catching disease early.
The logistics of sample collection and pooling in different settings must also be addressed. We’re encouraged by the increasing evidence for products, some approved by the FDA, that allow people to collect and submit their own test samples. One is based on saliva, which means collection costs can be kept low even at large scale.
It’s high time for high-frequency testing to become a core part of the US strategy to combat covid-19 and reopen the economy. Pooled testing that harnesses the power of machine learning makes paying the associated costs not only viable but, when weighed against the alternative of prolonged closures, a tremendous deal.
Ned Augenblick, Jonathan Kolstad, and Ziad Obermeyer are associate professors at the University of California, Berkeley. They are also cofounders of Berkeley Data Ventures, a consultancy that applies machine learning to health-care problems.