AI-Enabled Pathogen Design

In 2024, the Nobel Committee awarded its Chemistry Prize to David Baker, Demis Hassabis, and John Jumper for computational protein design and structure prediction - a recognition that AI has fundamentally changed how we engineer biology’s most important molecules. What once required years of crystallography now takes minutes on a server. This transformation from discovery science to engineering discipline creates both unprecedented opportunities for drug development and serious concerns: the same AI that designs therapeutic proteins can theoretically design custom toxins, though most AI-designed proteins still fail when tested in actual wet-lab conditions.

Learning Objectives
  • Distinguish between protein structure prediction (AlphaFold) and generative design (RFdiffusion, ESM3) in biosecurity context
  • Explain the “Screening Gap” and why traditional DNA screening may fail against AI-designed sequences
  • Analyze the three-tier risk framework from the 2025 NASEM report
  • Evaluate the dual-use potential of biological design tools using real case studies
  • Apply practical governance questions to assess emerging AI-biology platforms
  • Recognize the persistent wet-lab barriers that limit AI-enabled pathogen design today

Scope of This Chapter

This chapter discusses biosecurity risks at a conceptual level appropriate for education and policy analysis. Consistent with responsible information practices:

  • Omitted: Actionable protocols, specific synthesis routes, exact pathogen sequences
  • Included: Risk frameworks, governance mechanisms, policy recommendations

For detailed biosafety protocols, consult your Institutional Biosafety Committee and relevant regulatory guidance.

The Shift: While LLMs (see LLMs and Information Hazards) lower barriers to knowledge, Biological Design Tools (BDTs) lower barriers to expertise. We have moved from predicting structures to generating entirely new proteins from scratch.

The 2024 Nobel Prize Context: The Nobel Prize in Chemistry was awarded to David Baker, Demis Hassabis, and John Jumper for computational protein design and structure prediction - recognition that AI is fundamentally changing biological engineering.

Three Categories of Risk (NASEM 2025):

| Risk Category | Current AI Capability | Barrier Level |
| --- | --- | --- |
| Toxin design | Can assist | Low-Medium |
| Pathogen modification | May partially assist | Medium-High |
| De novo virus design | Far beyond current capabilities | Very High |

The MegaSyn Wake-Up Call: The 2022 Urbina experiment showed that drug discovery AI could generate 40,000 toxic molecules in 6 hours - including structures overlapping known chemical warfare agents (CWAs) such as VX, plus novel analogs - simply by inverting the optimization function. Synthesizability was not validated.

The Screening Gap: AI can design proteins with the same function but completely different sequences, bypassing current pattern-matching screens. This is the “Unknown Unknown” problem.

Reality Check: Most AI-designed proteins fail in the wet lab. Biology’s complexity - folding errors, expression failures, unexpected interactions - remains a significant barrier. AI has lowered design from “impossible” to “possible,” but not execution from “hard” to “easy.”

Bottom Line: These tools offer immense benefits for drug discovery while introducing dual-use concerns requiring ongoing vigilance rather than alarm.

Introduction: From Discovery to Engineering

In October 2024, the Nobel Committee awarded its Chemistry Prize to David Baker, Demis Hassabis, and John Jumper. The award recognized advances in computational protein design and structure prediction - much of it ML/AI-enabled - that are fundamentally changing how we understand and create biology’s most important molecules.

In the early days of structural biology, determining the 3D structure of a protein was a PhD-level project. It involved X-ray crystallography, years of trial and error, and often ended in failure. We treated biology as a discovery science - we went out and found things.

Today, biology is becoming an engineering discipline.

The first time researchers use AlphaFold is often revelatory. What once required months of experimental crystallography now takes minutes on a server. The predicted structures are not always perfect, but they are often good enough to design experiments around. That shift from months to minutes, from uncertain to actionable, represents both the promise and the concern at the heart of this chapter.

For a physician, this is a miracle (custom therapeutics). For a biosecurity expert, it is a concern (custom toxins). The same capabilities that accelerate drug discovery can theoretically be repurposed toward harmful ends.

The Revolution in Protein Structure Prediction

For over 50 years, predicting how a string of amino acids would fold into a three-dimensional protein structure was one of biology’s grand challenges. Then came AlphaFold.

AlphaFold2, developed by DeepMind, achieved accuracy comparable to experimental methods at the 2020 edition of the biennial Critical Assessment of Protein Structure Prediction (CASP) competition; its methods and code were published in 2021. By 2022, DeepMind and EMBL-EBI had released predicted structures for essentially all known proteins - roughly 200 million structures - transforming what was once a bottleneck into a freely available resource.

What AlphaFold Does and Does Not Do

Understanding the limitations is essential for calibrated risk assessment:

AlphaFold can:

  • Predict static protein structures from amino acid sequences with high accuracy
  • Map mutations onto structures to reason about binding or antigenicity
  • Accelerate drug discovery by revealing therapeutic targets

AlphaFold cannot:

  • Design proteins with specific functions from scratch
  • Reliably predict how mutations affect pathogenicity or transmissibility
  • Model the complex dynamics of viral assembly or host interactions
  • Generate novel harmful sequences without additional tools

AlphaFold2 also struggles with intrinsically disordered regions, conformational dynamics, and multimeric assemblies; predicted models are static snapshots, not full interaction or motion maps.

As DeepMind’s biosecurity analysis for AlphaFold3 notes, structure prediction is necessary but not sufficient for most dual-use scenarios of concern.

The “Unmasking” Risk

Structure prediction’s primary biosecurity concern is not about creating new weapons - it is about understanding existing ones better.

Target Identification: To optimize a toxin, you need to know exactly how it binds to human receptors. AlphaFold provides this “lock and key” map.

The Unknown Function Problem: Public genomic databases contain millions of sequences labeled “hypothetical protein.” AlphaFold can, in principle, reveal that an innocuous-looking sequence from soil bacteria is structurally similar to a lethal toxin. This “unmasks” potential threats hidden by our previous ignorance.

The marginal risk here is real but manageable - we already live in a world of mapped toxins.

Generative Design: From Prediction to Creation

If structure prediction was the first revolution, generative design is the second. Rather than predicting what natural proteins look like, generative models create entirely new proteins that have never existed in nature.

RFdiffusion and De Novo Protein Design

RFdiffusion, developed at the Baker Lab in 2023, adapts the diffusion model approach familiar from image generators to protein backbones. The tool starts with random noise and iteratively refines it into realistic protein structures, guided by user specifications.

Want a protein that binds to a specific target? Specify the binding site geometry and let the model design a scaffold around it. The results have been remarkable - RFdiffusion can design picomolar-affinity binders with ~10-15% experimental success (correctly folded binders meeting target affinity) per the 2023 Nature study.

ESM3: Sequence, Structure, and Function

ESM3, released by EvolutionaryScale in 2024, goes further by integrating sequence, structure, and function into a single generative model. Trained on 2.78 billion proteins, ESM3 can follow complex prompts combining multiple modalities.

In a striking demonstration, the team used ESM3 to generate a novel green fluorescent protein with only 58% sequence identity to any known fluorescent protein. The authors estimate - per their blog announcement - that this level of divergence would require over 500 million years of natural evolution.

Case Study: The MegaSyn Experiment (2022)

This is the clearest demonstration of dual-use potential from generative AI.

The Context: Collaborations Pharmaceuticals, a small drug discovery company, was invited to present at Spiez CONVERGENCE, a Swiss conference on emerging threats. Asked to discuss how AI might be misused, they decided to actually try inverting their drug discovery platform.

The Experiment: They took MegaSyn, a model trained to avoid toxicity in drug candidates, and simply flipped the reward function to maximize toxicity instead.

The Result: In less than 6 hours, on a consumer laptop, the model generated roughly 40,000 potentially toxic molecules. The model reproduced structures overlapping known CWAs (e.g., VX analogs) and proposed novel toxic analogs; synthesizability and ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties were not validated.

The Lesson: The same scoring functions that guide molecules away from toxicity can be trivially inverted to guide toward it. This is not a bug - it is how these systems work.

The researchers deliberately did not assess synthesizability or explore how to make the molecules. They recognized an ethical boundary and stopped. But the proof of concept was clear.
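The inversion at the heart of this experiment can be sketched abstractly. The toy below uses meaningless random vectors and an invented scoring function - nothing here corresponds to real chemistry - purely to show that the search machinery is identical in both directions; only the optimization sign changes.

```python
# Toy sketch of objective inversion: the candidates and "toxicity_score"
# are meaningless placeholders, not real molecules or a real predictor.
import random

random.seed(0)
candidates = [[random.random() for _ in range(4)] for _ in range(1000)]

def toxicity_score(x):
    """Toy stand-in for a learned property predictor."""
    return sum(x)

# Drug-discovery mode: steer AWAY from the scored property.
safest = min(candidates, key=toxicity_score)

# Inverted mode: the same search, with one sign flip, steers TOWARD it.
worst = max(candidates, key=toxicity_score)

print(toxicity_score(safest) < toxicity_score(worst))  # True
```

The point of the sketch is the symmetry: no new capability is needed to invert the objective, which is why the researchers describe this as inherent to how such systems work rather than a bug.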

The “Screening Gap”: Unknown Unknowns

This is the single most critical technical challenge in modern biosecurity.

How DNA Screening Works Today

When you order DNA from a synthesis company, they run your order through screening algorithms:

  • The Check: Does this sequence match smallpox? Ricin? Anything on controlled lists?
  • The Mechanism: Homology search - matching genetic letters against known dangerous sequences
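A minimal sketch of this homology-style check, using toy sequences and a toy window length (real screening pipelines rely on curated hazard databases and alignment tools such as BLAST, not this simplification):

```python
# Toy sketch of homology-based screening: flag an order if any k-mer
# (here k=12) appears in a blacklist built from known hazard sequences.
# Sequences and k are illustrative placeholders, not real parameters.

def build_blacklist(hazard_seqs, k=12):
    """Collect every length-k window from each known hazard sequence."""
    kmers = set()
    for seq in hazard_seqs:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers

def screen_order(order_seq, blacklist, k=12):
    """Return True (flag the order) if it shares any k-mer with the blacklist."""
    return any(order_seq[i:i + k] in blacklist
               for i in range(len(order_seq) - k + 1))

hazards = ["ATGGCTAAGGTTCCAGATCTGACT"]  # placeholder "hazard" sequence
bl = build_blacklist(hazards)

print(screen_order("ATGGCTAAGGTTCCAGATCTGACT", bl))  # exact copy -> True
print(screen_order("ATGTCTGCTGCTTCTAAGGCTGAA", bl))  # unrelated -> False
```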

How AI Breaks This Model

Generative AI can design a protein that has the same shape (and therefore the same function) as a known toxin, but a completely different genetic sequence.

The Result: The screening algorithm sees “Sequence X.” It does not look like ricin. It does not look like anything on the blacklist. It would approve the order. Most novel AI-generated sequences will not fold or function as intended, but the minority that does constitutes a screening blind spot.

The Problem: We have moved from a world of “known unknowns” to “unknown unknowns” (sequences that function dangerously but do not match anything in our databases).
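The gap is visible even in the simplest case, synonymous recoding. The toy below (a hand-picked subset of the standard codon table, with illustrative sequences) shows two DNA strings with no exact match that encode the identical protein; AI design can diverge far further while preserving shape and function:

```python
# Toy illustration of the screening gap: two DNA sequences that differ
# at the nucleotide level can encode the same protein, because the
# genetic code maps multiple codons to each amino acid. The table is a
# small subset of the standard codon table; sequences are placeholders.

CODONS = {"ATG": "M", "AAA": "K", "AAG": "K",          # Met, Lys
          "GTT": "V", "GTG": "V", "GCT": "A", "GCA": "A"}  # Val, Ala

def translate(dna):
    """Translate a DNA string codon-by-codon into an amino acid string."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))

seq_known = "ATGAAAGTTGCT"    # toy "listed" sequence
seq_recoded = "ATGAAGGTGGCA"  # synonymous recoding of the same protein

print(seq_known == seq_recoded)                        # False: DNA match fails
print(translate(seq_known) == translate(seq_recoded))  # True: same protein
```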

The Fix: Function-Based Screening

SecureDNA represents a new approach. Rather than relying on sequence similarity alone, it uses:

  • Random adversarial threshold search looking for exact matches to short functional subsequences
  • Predicted variants that would preserve function
  • Screening down to 30 base pairs - far shorter than traditional approaches
  • Cryptographic protections to preserve both customer privacy and the hazard database
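The idea can be caricatured as follows. This sketch is not SecureDNA’s actual algorithm - the “variant predictor” is a placeholder and the cryptographic layer is omitted entirely - it only illustrates screening fixed-length windows against known hazards plus predicted function-preserving variants:

```python
# Toy sketch of variant-aware window screening: scan every 30-base window
# of an order against known hazard windows AND predicted variants. All
# sequences and the "predictor" are placeholders for illustration only.

W = 30  # window length in base pairs

def windows(seq, w=W):
    """Every length-w window of a sequence."""
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}

def hazard_set(known_windows, predict_variants):
    """Union of known hazard windows and their predicted variants."""
    out = set()
    for win in known_windows:
        out.add(win)
        out.update(predict_variants(win))
    return out

def toy_variants(win):
    """Placeholder predictor: single substitutions at one fixed site."""
    return {win[:5] + b + win[6:] for b in "ACGT"}

known = {"A" * 30}
hs = hazard_set(known, toy_variants)

# A near-variant of the hazard window buried inside a longer order:
order = "C" * 10 + "A" * 5 + "G" + "A" * 24 + "C" * 10
print(any(w in hs for w in windows(order)))  # True: variant is caught
```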

Baker and Church’s 2024 Science commentary proposes extending this specifically to AI-designed proteins - maintaining records of AI-generated sequences and screening them against synthesis orders.

Note: Deployment of function-based screening is uneven; most synthesis providers still rely on homology-based screening, so coverage remains incomplete.

The NASEM 2025 Framework: Calibrated Risk Assessment

The 2025 NASEM report “The Age of AI in the Life Sciences”, commissioned by the Department of Defense, provides the most comprehensive assessment to date. The committee examined three categories of risk:

1. Design of Biomolecules (Proteins and Toxins)

AI Capability: Can assist with this today.

Concern: Novel toxins could be designed to evade existing screening, which typically looks for known hazardous sequences. The MegaSyn experiment demonstrated this directly.

Mitigation: Baker and Church’s proposal for universal screening that includes structure-based checks.

2. Modification of Existing Pathogens

AI Capability: May partially assist.

Concern: AI might help identify mutations that increase pathogenicity or transmissibility.

Limitation: Predicting how specific mutations affect complex phenotypes requires training data that largely does not exist. The biological mechanisms connecting sequence changes to pandemic potential are poorly understood even by experts.

3. De Novo Virus Design

AI Capability: Far beyond current capabilities.

Concern: Creating a functional virus from scratch.

Reality: Even with ideal AI tools, constructing replication-competent viruses would require experimental biology expertise that the tools themselves cannot provide. The “digital-to-physical” barrier remains substantial.

The Complexity Barrier

A recurring theme in biosecurity risk assessment is biological complexity:

  • Molecules have defined structures; viruses are dynamic machines
  • Toxicity can be predicted from binding affinity; transmission requires host interactions across multiple systems
  • Small molecule synthesis is routine; creating functional viral genomes requires specialized expertise
  • Drug activity is measurable in vitro; pathogen behavior can only be assessed in complex biological systems

This complexity is both a natural barrier and a reason for humility in prediction.

What We Know vs. What Remains Uncertain

Demonstrated (supported by published evidence):

  • AlphaFold predicts static protein structures with high accuracy
  • Generative design tools (RFdiffusion) can create novel proteins with ~10-15% experimental success
  • Drug discovery AI can be inverted to generate toxic molecules (MegaSyn - 40,000 in 6 hours)
  • ESM3 generated a novel GFP with only 58% sequence identity to known proteins
  • Most AI-designed proteins fail in wet-lab validation

Theoretical (plausible but not yet demonstrated):

  • AI-designed toxins evading function-based screening at scale
  • AI predicting gain-of-function mutations for pathogens with useful accuracy
  • AI-designed sequences being successfully synthesized and weaponized
  • AI substantially accelerating pathogen modification by sophisticated actors

Beyond current capabilities (no credible pathway with existing technology):

  • De novo design of replication-competent viruses
  • AI systems that can execute wet-lab work autonomously
  • AI replacing the tacit knowledge required for pathogen work
  • AI predicting pandemic potential from sequence alone

The NASEM framework reflects this hierarchy: toxin design assistance is possible today, pathogen modification may be partially assisted, and de novo virus design remains far beyond current AI capabilities.

A 2025 RAND Delphi study surveyed biology and AI experts on theoretical limits of AI-enabled pathogen design. Key findings: (1) limits are interdependent and context-dependent, (2) AI effectiveness depends heavily on biological data quality, and (3) no strong fundamental limit to AI capabilities was identified, though significant near-term barriers exist. Through 2027, AI is expected to remain an assistive tool rather than an autonomous driver of biological design.

A 2025 GovAI framework translates such capability findings into quantified risk estimates, suggesting that even modest capability increases could result in significant population-level harm when aggregated across potential actors.

Reality Check: The Wet Lab Failure Rate

We must pause here to temper the AI hype.

Just because an AI designs a protein does not mean it will work. In fact, most do not.

The Hallucinated Binder Problem

Recent preprints testing generative design tools found that while they could generate thousands of candidate designs, the vast majority failed to bind their targets in actual wet lab experiments. Biology is governed by physics, water dynamics, temperature, and cellular context - factors that AI models still struggle to simulate.

This is temporarily good news for biosecurity - the tacit knowledge and experimental skills required for pathogen work cannot be fully encoded in AI training data. Whether this remains a meaningful barrier as automation advances is an open question.

Responsible Development Practices

The AI biology community has increasingly adopted explicit biosecurity practices.

DeepMind: AlphaFold Biosecurity Assessment

Google DeepMind’s approach to AlphaFold3 included:

  • Pre-release biosecurity consultations with external experts
  • Publication of their risk analysis
  • Implementation of targeted server-side restrictions for potentially concerning use cases

EvolutionaryScale: Tiered Access

ESM3 was released with safeguards: a smaller open version (ESM3-open) for the research community, while the full model is available only through a controlled API. This tiered approach allows broad scientific use while maintaining oversight of more capable versions.

The “If-Then” Strategy

Perhaps the most practical recommendation from the 2025 NASEM report is the proposed “if-then” strategy for ongoing assessment:

  1. Track data availability: The quality and quantity of training data determines capabilities. Monitoring what biological data becomes available provides early warning.

  2. Define capability benchmarks: Rather than vague concerns, establish specific testable thresholds. Can the model predict gain-of-function mutations? Design immune-evasive proteins?

  3. Establish trigger thresholds: When capabilities cross defined thresholds, predetermined responses activate - enhanced access controls, expanded screening, updated oversight.

  4. Regular reassessment: This should be continuous, not a one-time evaluation. As AI and biology both advance, the intersection requires ongoing attention.
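One way to operationalize such a strategy is a simple trigger table. The benchmark names, threshold values, and responses below are invented for illustration and are not drawn from the NASEM report:

```python
# Hypothetical sketch of an "if-then" trigger table: measured capability
# benchmarks mapped to thresholds and predetermined responses. All
# entries are illustrative placeholders.

TRIGGERS = [
    # (benchmark, threshold, predetermined response)
    ("binder_design_success_rate", 0.30, "expand synthesis screening to AI-design registries"),
    ("screening_evasion_rate",     0.05, "require verified accounts for batch design APIs"),
    ("mutation_effect_accuracy",   0.50, "convene oversight review of model access tiers"),
]

def evaluate(measurements):
    """Return the responses activated by current benchmark measurements."""
    return [response for name, threshold, response in TRIGGERS
            if measurements.get(name, 0.0) >= threshold]

actions = evaluate({"binder_design_success_rate": 0.35,
                    "screening_evasion_rate": 0.01})
print(actions)  # only the first trigger fires
```

The value of a table like this is that the response is agreed in advance, so crossing a capability threshold activates governance changes rather than starting a debate.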

Practical Governance Questions

Most readers will not be training protein models or building BDTs. Instead, you might be:

  • Sitting on an ethics or DURC committee
  • Advising a health ministry on AI investments
  • Reviewing funding proposals that mention “AI for biological design”
  • Contributing to international discussions on biosecurity norms

In those roles, you can ask sharp, concrete questions:

Checklist: Evaluating AI Biology Platforms

1. What kind of tool is this? - Structure prediction service, generative design platform, or full BDT with synthesis integration? - What user groups is it intended for?

2. What guardrails exist at the system level? - Is there identity verification for users? - Are high-risk features (direct synthesis ordering, batch design) limited to vetted entities? - Are logs kept, and who can audit them? - Are prompts/designs logged with anomaly detection and auditable retention to trace potential misuse?

3. How does this interact with existing biosafety? - Does the system assume synthesis screening at downstream providers? - Are users given biosafety guidance when designing constructs?

4. How is misuse potential evaluated? - Has anyone tested whether the tool can bypass current screening? - Is there a plan for repeat evaluations as the system updates?

5. What is the escalation path? - Who can suspend accounts or notify authorities? - Are there clear thresholds for concern?

These questions are practical tools for steering investments toward platforms that take biosecurity seriously.

Benefits for Biosecurity

The same tools that create dual-use concerns can strengthen defense. This symmetry is often lost in discussions focused exclusively on risks.

Accelerated Countermeasures: AI-enabled medical countermeasure development could dramatically accelerate responses to novel pathogens. During COVID-19, the unprecedented speed of vaccine development still took nearly a year. AI tools could compress timelines further - though wet-lab throughput, regulatory timelines, and manufacturing scale-up remain rate-limiting.

Improved Biosurveillance: AI can analyze genetic sequences from environmental samples and clinical cases to detect emerging threats earlier. Pattern recognition across vast datasets could identify unusual clusters that human analysis might miss.

Enhanced Screening: SecureDNA itself relies on AI to predict which sequence variants would remain functional, enabling screening that catches not just known hazards but their predicted equivalents.

The challenge is ensuring that defensive applications keep pace with - or stay ahead of - potential offensive uses.

The next chapter examines how AI can be applied defensively for biosecurity - accelerating detection, attribution, and countermeasure development.

Frequently Asked Questions

What is the difference between AlphaFold and RFdiffusion?

AlphaFold predicts the 3D structure of existing proteins from their amino acid sequences. RFdiffusion generates entirely new protein designs from scratch based on user specifications. AlphaFold is a prediction tool; RFdiffusion is a creation tool. This distinction matters for biosecurity because generative design tools can create novel sequences that evade traditional screening.

Can AI currently design functional bioweapons?

No. While AI can assist with toxin design (NASEM Tier 1), it struggles with pathogen modification (Tier 2) and cannot design replication-competent viruses from scratch (Tier 3). Most AI-designed proteins fail in wet-lab validation, and the tacit knowledge required for pathogen work cannot be fully encoded in training data. The biological complexity barrier remains substantial, though this may change as automation advances.

What is the “screening gap” and why does it matter?

Traditional DNA screening works by matching sequences against known dangerous pathogens. AI can design proteins with the same function but completely different sequences, bypassing this pattern-matching approach. SecureDNA addresses this gap through function-based screening that predicts sequence variants preserving dangerous capabilities, but deployment remains incomplete across synthesis providers.

What was the MegaSyn experiment and what did it prove?

The 2022 MegaSyn experiment inverted a drug discovery AI’s reward function from avoiding toxicity to maximizing it. In 6 hours on a consumer laptop, it generated 40,000 potentially toxic molecules including VX analogs and novel structures. This demonstrated that dual-use tools can be trivially repurposed for harm, though synthesizability was not validated and most designs would likely fail in practice.


This chapter is part of The Biosecurity Handbook. For related content, see the previous chapters on AI as a Biosecurity Risk Amplifier and LLMs and Information Hazards.