Digital Biosurveillance: From Wearables to Real-Time Genomics
Eric Topol has been making this argument for years: hundreds of millions of people already wear devices that continuously monitor their heart rate, sleep, and activity. These data streams could function as a distributed early warning network for infectious disease outbreaks. The Scripps DETECT study showed that wearable data combined with self-reported symptoms could identify COVID-19 cases with roughly 80% accuracy among symptomatic individuals, and Stanford found 63% of cases showed detectable physiological changes before symptom onset. Yet no operational biosurveillance system currently uses wearable data for outbreak detection. By contrast, AI-enabled genomic surveillance platforms like Nextstrain proved their value during COVID-19, when South Africa detected Omicron and alerted WHO about 16 days after the first sample was collected.
- Evaluate the evidence for wearable devices as population-level early warning systems for infectious disease outbreaks
- Describe AI-enabled genomic surveillance pipelines (Nextstrain, Pangolin) and their role in real-time variant detection
- Understand the infrastructure requirements for pathogen-agnostic metagenomic surveillance
- Assess the practical limitations of digital biosurveillance, including coverage bias, privacy concerns, and validation gaps
- Apply critical evaluation frameworks to emerging biosurveillance technologies
Introduction: The Missing Early Warning System
Eric Topol has been making this argument for years: we have hundreds of millions of people wearing devices that continuously monitor their heart rate, sleep, and activity. These devices could function as a distributed early warning network for infectious disease outbreaks. Yet we are not using them.
The concept is straightforward. Infectious illness causes measurable physiological changes: elevated resting heart rate, disrupted sleep, reduced activity. These changes often precede symptom onset by hours or days. If we could aggregate and analyze this data at population scale, we might detect outbreaks before the first patient walks into an emergency department.
The chapter Outbreak Detection and Surveillance covered traditional surveillance systems, and the chapter AI for Biosecurity Defense examined AI applications in biosecurity defense. This chapter focuses on emerging infrastructure that could transform biosurveillance: wearable-based early warning, AI-enabled genomic pipelines, and pathogen-agnostic environmental monitoring.
The evidence is mixed. Some applications are operational and proved their value during COVID-19. Others remain research concepts with significant validation gaps. Distinguishing between the two is essential for biosecurity practitioners.
Wearables and Population-Level Digital Biomarkers
The Topol Thesis
The core argument: consumer wearables generate continuous physiological data that could detect infectious disease outbreaks at population scale, potentially days before traditional surveillance systems.
Theoretical advantages:
- Pre-symptomatic detection: Physiological changes precede symptom awareness
- Continuous monitoring: 24/7 data collection without active participation
- Population coverage: Hundreds of millions of devices already deployed
- Geographic granularity: Location data enables local outbreak detection
The devices: Fitbit, Apple Watch, Garmin, Oura Ring, Whoop, and others track heart rate, heart rate variability (HRV), sleep duration and quality, blood oxygen saturation (SpO2), activity levels, and skin temperature.
Evidence from COVID-19 Studies
Scripps DETECT Study
The Digital Engagement and Tracking for Early Control and Treatment (DETECT) study, led by researchers at Scripps Research including Eric Topol, enrolled over 30,000 participants starting in March 2020.
Key findings:
| Metric | Result |
|---|---|
| Participants enrolled | 30,529 (March-June 2020) |
| Device types | 78% Fitbit, 31% Apple Watch |
| COVID-19 prediction accuracy | ~80% (symptomatic individuals) |
| Data used | Resting heart rate, sleep, activity |
The study demonstrated that combining wearable sensor data with self-reported symptoms could identify COVID-19 cases with approximately 80% accuracy among symptomatic individuals.
Follow-up findings: A subsequent analysis published in JAMA Network Open found that COVID-19 infection was associated with prolonged physiological changes. Resting heart rate remained elevated for an average of 79 days after symptom onset in COVID-positive participants, compared to 4 days in those with other respiratory illnesses.
Stanford Smartwatch Study
Researchers at Stanford University analyzed smartwatch data from 5,262 participants, including 32 individuals with confirmed COVID-19 infection.
Key findings:
- 81% of COVID-19 cases (26 of 32) showed detectable alterations in heart rate, steps, or sleep
- 63% of cases could theoretically have been detected before symptom onset
- Some cases showed physiological changes up to 9 days before symptom onset
Methodology: The study used a two-tiered warning system based on extreme elevations in resting heart rate relative to individual baselines.
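The idea of alerting on resting heart rate relative to an individual baseline can be illustrated with a minimal sketch. This is not the Stanford algorithm: the 28-day rolling baseline, the z-score thresholds, and the function name `rhr_alerts` are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def rhr_alerts(daily_rhr, baseline_days=28, yellow_z=2.0, red_z=3.0):
    """Label each day as None (no baseline yet), 'ok', 'yellow', or 'red'.

    Thresholds and window length are illustrative assumptions,
    not values taken from the published study.
    """
    labels = []
    for i, value in enumerate(daily_rhr):
        history = daily_rhr[max(0, i - baseline_days):i]
        if len(history) < baseline_days:
            labels.append(None)  # not enough personal history yet
            continue
        mu, sigma = mean(history), stdev(history)
        # z-score of today's value against this person's own recent baseline
        z = (value - mu) / sigma if sigma > 0 else 0.0
        if z >= red_z:
            labels.append("red")
        elif z >= yellow_z:
            labels.append("yellow")
        else:
            labels.append("ok")
    return labels


# Hypothetical series: four stable weeks, then two elevated days.
series = [60, 62] * 14 + [64, 68]
labels = rhr_alerts(series)
print(labels[-2:])
```

Comparing each person against their own history, rather than a population norm, is what lets a small absolute change register as a signal. It also means every new user needs weeks of data before any alert is possible.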
These studies demonstrate proof-of-concept, not operational readiness. Critical limitations:
Retrospective analysis: Both studies analyzed data after COVID-19 diagnosis was confirmed. Prospective, real-time detection is harder.
Selection bias: Participants were self-selected, tech-savvy individuals who owned wearables and chose to enroll in research.
Symptomatic cases: The 80% accuracy in DETECT applied to symptomatic individuals. Detecting asymptomatic spread is more challenging.
Individual vs. population: Detecting individual illness differs from detecting population-level outbreaks. The signal-to-noise ratio at population scale is unknown.
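To make the individual-versus-population distinction concrete, the sketch below aggregates individual flags by region and asks whether the flagged count exceeds what a baseline rate would predict. It uses a normal approximation to the binomial. The baseline rate, the threshold, the region names, and the counts are hypothetical, and a real system would need to model seasonality and reporting delays.

```python
import math


def regional_z_score(flagged_users, active_users, baseline_rate):
    """Normal-approximation z-score for the flagged-user count in one region."""
    expected = active_users * baseline_rate
    variance = active_users * baseline_rate * (1.0 - baseline_rate)
    if variance <= 0:
        raise ValueError("need active users and a baseline rate strictly between 0 and 1")
    return (flagged_users - expected) / math.sqrt(variance)


def regions_to_review(daily_counts, baseline_rate, z_threshold=3.0):
    """Return regions whose flagged count is unusually high for the baseline."""
    return sorted(
        region
        for region, (flagged, active) in daily_counts.items()
        if regional_z_score(flagged, active, baseline_rate) >= z_threshold
    )


# Hypothetical day: (flagged users, active users) per region, 2% baseline flag rate.
counts = {"region_a": (160, 5000), "region_b": (105, 5000)}
review = regions_to_review(counts, baseline_rate=0.02)
print(review)
```

Whether such a signal would be detectable in practice is exactly the open question: the baseline rate itself drifts with holidays, heat waves, and other non-infectious causes of elevated heart rate.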
Practical Limitations for Biosurveillance
Coverage bias: Wearable users are not representative of the general population.
| Factor | Wearable Users | General Population |
|---|---|---|
| Age | Skew younger | Full age range |
| Income | Higher income | Full range |
| Urban/rural | More urban | Mixed |
| Health-seeking | More engaged | Variable |
This means wearable-based surveillance would systematically underrepresent older, lower-income, and rural populations, often the groups most vulnerable to infectious disease outbreaks.
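One standard way to partially correct for this skew is post-stratification: reweighting group-level rates by each group's share of the real population instead of its share of the sample. The sketch below uses invented age groups, shares, and anomaly rates purely for illustration.

```python
def crude_rate(sample_share, rate_by_group):
    """Anomaly rate as observed, weighted by each group's share of wearable users."""
    return sum(sample_share[g] * rate_by_group[g] for g in rate_by_group)


def poststratified_rate(population_share, rate_by_group):
    """Anomaly rate reweighted to the general population's composition."""
    missing = [g for g in population_share if g not in rate_by_group]
    if missing:
        # Weighting cannot rescue a group with no wearable users at all.
        raise ValueError(f"no wearable data for groups: {missing}")
    return sum(population_share[g] * rate_by_group[g] for g in population_share)


# All numbers below are hypothetical.
sample_share = {"18-39": 0.60, "40-64": 0.30, "65+": 0.10}      # wearable users
population_share = {"18-39": 0.35, "40-64": 0.40, "65+": 0.25}  # census
rate_by_group = {"18-39": 0.02, "40-64": 0.03, "65+": 0.08}     # flagged fraction

observed = crude_rate(sample_share, rate_by_group)
adjusted = poststratified_rate(population_share, rate_by_group)
print(round(observed, 3), round(adjusted, 3))
```

Reweighting only helps when every group has some coverage and when wearable users within a group resemble non-users in that group. Neither assumption is guaranteed here.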
Privacy concerns: Aggregating continuous health data raises significant privacy issues. Even anonymized, location-tagged health data can potentially be re-identified. No consensus exists on appropriate governance frameworks for population-level wearable surveillance.
Validation gaps: No prospective study has demonstrated that wearable-based surveillance can detect an outbreak before traditional systems. The COVID-19 studies showed individual detection, not population-level early warning.
Commercial fragmentation: Data sits in siloed platforms (Apple Health, Fitbit, Garmin Connect) with no standardized public health interface.
Current Status
Research stage. No operational biosurveillance system currently uses wearable data for outbreak detection. Apple, Google, and Fitbit have pandemic response research programs, but these have not transitioned to operational public health tools.
The infrastructure is tantalizing: millions of sensors, continuous data streams, potential for geographic granularity. But the validation, governance, and integration challenges remain substantial.
AI-Enabled Genomic Surveillance Pipelines
Unlike wearables, AI-enabled genomic surveillance proved its value during COVID-19. This section examines the infrastructure that made real-time variant tracking possible.
The Transformation
During COVID-19, genomic surveillance shifted from an academic exercise to an operational public health tool. The infrastructure that enabled this transformation includes automated lineage assignment, real-time phylogenetic visualization, and global data sharing platforms.
Nextstrain: Real-Time Phylogenetics
Nextstrain provides open-source tools for real-time tracking of pathogen evolution.
What it does:
- Integrates genomic sequence data with geographic, temporal, and epidemiological metadata
- Generates interactive phylogenetic visualizations
- Enables tracking of pathogen spread and evolution in near-real-time
- Supports multiple pathogens (influenza, SARS-CoV-2, Ebola, Zika, and others)
COVID-19 impact: Nextstrain became the primary platform for visualizing SARS-CoV-2 evolution globally. Public health agencies, researchers, and journalists used Nextstrain dashboards to track variant emergence and spread.
I help maintain the Africa CDC Nextstrain instance, which provides genomic epidemiology visualization for the continent. During COVID-19, this infrastructure enabled tracking of variant introductions and local transmission chains across African countries.
Pangolin: Automated Lineage Assignment
Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages) automates SARS-CoV-2 lineage classification.
What it does:
- Assigns SARS-CoV-2 sequences to Pango lineages (B.1.1.7, BA.2, etc.)
- Uses machine learning for rapid classification
- Enables consistent lineage nomenclature globally
- Processes sequences in seconds rather than requiring manual phylogenetic analysis
Why it matters for biosurveillance:
| Task | Without Pangolin | With Pangolin |
|---|---|---|
| Lineage assignment | Manual phylogenetic analysis (hours) | Automated classification (seconds) |
| Consistency | Variable interpretation | Standardized nomenclature |
| Scale | Limited throughput | Millions of sequences classified |
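Once lineages are assigned automatically, tracking their relative frequency becomes a simple tabulation. The sketch below tallies lineage shares from a CSV report. The column names `lineage` and `qc_status` and the value `pass` are modeled on Pangolin's lineage report but are assumptions here and should be checked against the installed version. The sample rows are invented.

```python
import csv
import io
from collections import Counter


def lineage_shares(report_csv_text, lineage_col="lineage",
                   qc_col="qc_status", pass_value="pass"):
    """Return each lineage's share among sequences that passed QC."""
    reader = csv.DictReader(io.StringIO(report_csv_text))
    counts = Counter(
        row[lineage_col] for row in reader if row[qc_col] == pass_value
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lineage: n / total for lineage, n in counts.items()}


# Hypothetical report contents.
report = "\n".join([
    "taxon,lineage,qc_status",
    "sample_1,B.1.1.529,pass",
    "sample_2,AY.4,pass",
    "sample_3,B.1.1.529,pass",
    "sample_4,Unassigned,fail",
])
shares = lineage_shares(report)
print(shares)
```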
Case Study: Omicron Detection
The detection of the Omicron variant in November 2021 demonstrated functional genomic surveillance infrastructure.
Timeline:
- November 9, 2021: First Omicron case (later identified retrospectively) collected in South Africa
- November 22, 2021: South African scientists notice S-gene target failure (SGTF) in TaqPath PCR assays
- November 23, 2021: Sequences submitted to GISAID
- November 24, 2021: Nextstrain visualization shows distinct lineage with explosive growth
- November 25, 2021: South Africa reports new variant to WHO
- November 26, 2021: WHO designates Omicron as Variant of Concern
What enabled this speed:
- Automated SGTF alerts from diagnostic platforms
- Established sequencing capacity at South African laboratories
- Immediate GISAID submission (data sharing norms)
- Nextstrain visualization showing phylogenetic distinctiveness
- Pangolin assigning new lineage designation
The entire cycle, from first sample collection on November 9 to WHO designation on November 26, took 17 days. Without this infrastructure, detection could have taken months.
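The "explosive growth" seen in the timeline can be quantified from lineage frequencies over time. A common simplification is to fit a straight line to the log-odds of the lineage's frequency: the slope is the relative growth rate, and ln(2) divided by the slope is the doubling time of the odds. The sketch below is a minimal version of that idea with invented frequencies. It is not the method used by Nextstrain or by the South African teams.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def odds_doubling_time(days, frequencies):
    """Doubling time (in days) of a lineage's odds, from a logit-linear fit."""
    # Frequencies of exactly 0 or 1 have no finite log-odds, so skip them.
    points = [(d, math.log(f / (1.0 - f)))
              for d, f in zip(days, frequencies) if 0.0 < f < 1.0]
    if len(points) < 2:
        raise ValueError("need at least two usable frequency estimates")
    slope, _ = fit_line([d for d, _ in points], [y for _, y in points])
    if slope <= 0:
        return math.inf  # lineage is not growing relative to others
    return math.log(2) / slope


# Hypothetical share of sequenced samples belonging to the new lineage.
days = [0, 3, 6, 9]
frequencies = [0.02, 0.05, 0.12, 0.26]
print(odds_doubling_time(days, frequencies))
```

Estimates like this are sensitive to sampling bias: if SGTF-positive samples are preferentially sequenced, the apparent growth rate will be inflated.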
South Africa’s reward for transparent, rapid reporting was immediate travel bans from over 40 countries. The economic impact exceeded $63 million (R1 billion) in documented cancellations during the December-March tourism season, according to FEDHASA/SATSA industry surveys.
This created a dangerous precedent: countries that invest in surveillance capacity may face economic punishment, while countries that lack capacity (or choose not to report) face no consequences.
See Global Surveillance Equity for detailed discussion of surveillance equity and incentive structures.
Infrastructure Requirements
Functional genomic surveillance requires investment across multiple domains:
Laboratory capacity:
- Sequencing equipment (Illumina, Oxford Nanopore)
- Trained laboratory technicians
- Quality control protocols
- Biosafety infrastructure for handling pathogens
Bioinformatics:
- Computational infrastructure for sequence analysis
- Trained bioinformaticians
- Standardized pipelines (assembly, quality filtering, lineage assignment)
- Data storage and management systems
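As one small illustration of the quality filtering step in such a pipeline, the sketch below rejects consensus genomes that are too short or contain too many ambiguous bases (N). The thresholds are placeholder assumptions for a SARS-CoV-2-sized genome, not values from any particular pipeline.

```python
def n_fraction(sequence):
    """Fraction of ambiguous (N) bases in a consensus sequence."""
    seq = sequence.upper()
    return seq.count("N") / len(seq) if seq else 1.0


def passes_qc(sequence, min_length=29000, max_n_fraction=0.05):
    """Keep sequences that are long enough and mostly resolved.

    Default thresholds are illustrative assumptions only.
    """
    return len(sequence) >= min_length and n_fraction(sequence) <= max_n_fraction


# Toy sequences with a lowered length threshold so the example stays short.
print(passes_qc("ACGT" * 10, min_length=30))
print(passes_qc("ACGN" * 10, min_length=30))
```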
Data sharing:
- GISAID accounts and submission workflows
- Metadata standards and collection systems
- Institutional agreements enabling data release
Integration:
- Connection between diagnostic laboratories and sequencing centers
- Links to epidemiological investigation teams
- Communication channels to public health decision-makers
Most high-income countries have this infrastructure. Most low-income countries do not. This asymmetry creates global surveillance blind spots.
The Nucleic Acid Observatory: Pathogen-Agnostic Detection
The Concept
Current surveillance looks for known threats. What if we could detect unknown pathogens before they are characterized?
The Nucleic Acid Observatory (NAO) concept proposes continuous metagenomic monitoring of environmental samples (wastewater, airport air filters) to detect exponential growth of any nucleic acid sequence, regardless of identity.
The approach:
- Collect environmental samples regularly from sentinel sites
- Perform untargeted metagenomic sequencing (sequence everything)
- Use AI to detect sequences showing exponential growth over time
- Flag anomalies for investigation, even if the organism is unknown
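The detection step above can be sketched without any machine learning: fit a line to the log of each sequence's relative abundance over time and flag those with a steep, consistent upward slope. This is a deliberately simple stand-in for whatever analysis an operational system would use. The thresholds, sequence identifiers, and abundance values below are hypothetical.

```python
import math


def log_linear_fit(values):
    """Slope and R^2 of log(values) regressed on sample index (0, 1, 2, ...)."""
    xs = list(range(len(values)))
    ys = [math.log(v) for v in values]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, r2


def flag_growing_sequences(abundance_by_seq, min_slope=0.3, min_r2=0.9, min_points=4):
    """Return IDs of sequences whose abundance grows roughly exponentially."""
    flagged = []
    for seq_id, series in abundance_by_seq.items():
        # Skip short series and zeros (log undefined); a real system
        # would need a principled way to handle non-detections.
        if len(series) < min_points or any(v <= 0 for v in series):
            continue
        slope, r2 = log_linear_fit(series)
        if slope >= min_slope and r2 >= min_r2:
            flagged.append(seq_id)
    return sorted(flagged)


# Hypothetical weekly relative abundances at one sentinel site.
weekly = {
    "seq_A": [1e-7, 2e-7, 4.1e-7, 7.9e-7, 1.6e-6],     # roughly doubling weekly
    "seq_B": [3e-6, 2.5e-6, 3.2e-6, 2.8e-6, 3.1e-6],   # fluctuating, flat
}
flagged = flag_growing_sequences(weekly)
print(flagged)
```

Note that the flag says nothing about what the sequence is. That is the point of the pathogen-agnostic design, and also why every flag still requires follow-up investigation.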
This “pathogen-agnostic” approach could theoretically detect novel threats before they are identified, including engineered pathogens that might not match existing reference databases.
Current Status
Pilot stage. The NAO concept is being developed by researchers affiliated with the Secure DNA project and others. Proposals have outlined infrastructure requirements and cost estimates.
Challenges:
- Noise: Environmental samples contain vast amounts of irrelevant genetic material
- Cost: Untargeted sequencing at scale is expensive
- Interpretation: Distinguishing concerning signals from background variation requires sophisticated analysis
- Response: Even if anomalies are detected, investigation and confirmation require traditional public health infrastructure
The concept is promising but not yet operational. Investment in pilot projects could determine feasibility.
Integrating Digital Biosurveillance: Design Principles
The Integration Challenge
Digital biosurveillance tools fail when they exist as standalone dashboards disconnected from response capacity. Design principles that increase effectiveness:
1. Define the decision.
Every surveillance system should answer: “What decision does this inform, and who makes it?”
- Wearable anomaly detected → Trigger epidemiological investigation?
- New sequence lineage identified → Notify WHO? Update travel guidance?
- Metagenomic signal flagged → Dispatch field team?
Systems without clear decision pathways generate noise, not intelligence.
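One way to enforce this principle in software is to refuse to route any signal type that lacks a registered decision and owner. The sketch below is a hypothetical illustration: the signal names and owner roles are invented, and the decisions are the open questions listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionPathway:
    decision: str  # the decision this signal is meant to inform
    owner: str     # who is responsible for making it (hypothetical roles)


PATHWAYS = {
    "wearable_anomaly": DecisionPathway(
        "Trigger epidemiological investigation?", "regional epidemiology team"),
    "new_lineage": DecisionPathway(
        "Notify WHO? Update travel guidance?", "national public health institute"),
    "metagenomic_signal": DecisionPathway(
        "Dispatch field team?", "outbreak response unit"),
}


def route(signal_type: str) -> Optional[DecisionPathway]:
    """Return the registered pathway, or None if nobody owns this signal."""
    return PATHWAYS.get(signal_type)


print(route("new_lineage"))
print(route("social_media_trend"))  # no pathway registered
```

A `None` result is itself useful information: it marks a data stream that is being collected without anyone accountable for acting on it.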
2. Build for integration.
Outputs must connect to existing workflows. A genomic surveillance dashboard is useless if public health officers do not check it. Integration options:
- Direct alerts to designated personnel
- Integration with existing surveillance platforms
- Automated reporting to required notification systems
3. Plan for false positives.
Every sensitive detection system generates false alarms. Design must include:
- Triage protocols for initial signal assessment
- Defined escalation pathways
- Feedback loops to improve specificity over time
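The scale of the false alarm problem follows directly from Bayes' rule: when the condition is rare, even a reasonably accurate detector produces mostly false positives. The sketch below computes positive predictive value from sensitivity, specificity, and prevalence. The input values are illustrative assumptions and are not taken from the studies discussed earlier.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive alert is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        return 0.0  # the detector never fires
    return true_pos / (true_pos + false_pos)


# Illustrative values: 80% sensitivity, 90% specificity.
for prevalence in (0.001, 0.01, 0.10):
    ppv = positive_predictive_value(0.80, 0.90, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}")
```

At low prevalence most alerts will be false even with good sensitivity and specificity, which is why triage protocols and escalation pathways need to be designed before the system goes live.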
4. Maintain during peacetime.
Surge capacity requires baseline investment. Systems built during emergencies and defunded afterward will not be available for the next crisis.
When assessing a new digital biosurveillance technology, ask:
1. What is the evidence base?
   - Peer-reviewed validation studies?
   - Prospective or retrospective analysis?
   - Sample size and population representativeness?
2. What decision does this inform?
   - Clear use case or exploratory tool?
   - Defined thresholds for action?
   - Specified end-users?
3. How does this integrate with existing systems?
   - Standalone dashboard or integrated workflow?
   - Interoperability with current infrastructure?
   - Training and maintenance requirements?
4. What are the failure modes?
   - False positive rate and management protocol?
   - False negative consequences?
   - Performance during surge conditions?
5. What are the equity implications?
   - Who is covered and who is missed?
   - Does this widen or narrow existing surveillance gaps?
   - Privacy and consent frameworks?
Can wearable devices like Fitbit and Apple Watch detect disease outbreaks before traditional surveillance systems?
Research shows promise for individual illness detection but not yet population-level outbreak warning. The Scripps DETECT study found wearables could identify COVID-19 with 80% accuracy among symptomatic individuals, and Stanford research showed 63% of cases had detectable physiological changes before symptom onset. However, these were retrospective analyses. No prospective study has demonstrated outbreak detection before traditional systems, and wearable users skew young, affluent, and urban, creating significant coverage bias.
What is AI-enabled genomic surveillance and how does it work?
AI-enabled genomic surveillance uses automated pipelines to rapidly analyze pathogen sequences and track variant emergence in real time. Nextstrain provides interactive phylogenetic visualization integrating genomic, geographic, and epidemiological data. Pangolin uses machine learning to assign SARS-CoV-2 sequences to lineages in seconds rather than hours of manual analysis. During COVID-19, this infrastructure enabled South Africa to detect Omicron and alert WHO about 16 days after the first sample was collected.
What is pathogen-agnostic surveillance and is it operational?
Pathogen-agnostic surveillance uses metagenomic sequencing to detect exponential growth of any nucleic acid sequence, including unknown pathogens. The Nucleic Acid Observatory concept proposes continuous environmental sampling (wastewater, air filters) with AI analysis to flag anomalies before organisms are characterized. This could theoretically detect engineered pathogens not matching reference databases. However, it remains in pilot phase with challenges including noise from irrelevant genetic material, high sequencing costs, and interpretation complexity.
Which digital biosurveillance technologies are actually operational versus still in research phase?
AI-enabled genomic surveillance (Nextstrain, Pangolin) is operational and proved its value during COVID-19 for real-time variant tracking. Wearable-based early warning remains research-stage, with no operational public health deployment despite promising individual detection studies. Pathogen-agnostic metagenomic surveillance is in pilot phase, years from operational deployment. The critical gap is not technology but integration: the best detection system is worthless if its outputs don’t connect to response capacity and decision-making workflows.
This chapter is part of The Biosecurity Handbook. For related content, see Outbreak Detection and Surveillance, AI for Biosecurity Defense, and Global Surveillance Equity.