LLMs and Information Hazards

In 2023, MIT students asked chatbots to help them cause a pandemic. Within one hour, the LLMs suggested four potential pandemic pathogens, explained reverse genetics protocols, named DNA synthesis companies unlikely to screen orders, and recommended contract research organizations for those lacking laboratory skills. Yet RAND’s controlled experiments the following year found no statistically significant difference in biological attack plan viability between groups with LLM access and those using only internet search, exposing the gap between information access and operational capability.

Learning Objectives
  • Define information hazards in the context of biosecurity and dual-use research, including the Bostrom typology
  • Evaluate LLM capabilities and limitations for providing actionable biological attack guidance
  • Analyze methodologies used in biosecurity evaluations (RAND, OpenAI, Anthropic, Gryphon)
  • Distinguish between theoretical information access and operational attack capability
  • Explain why AI hallucinations currently function as an unintentional safety barrier
  • Assess the effectiveness of current LLM safety measures (RLHF, refusals, guardrails)
  • Apply red-teaming concepts to evaluate AI systems for biosecurity risks

Scope of This Chapter

This chapter discusses biosecurity risks at a conceptual level appropriate for education and policy analysis. Consistent with responsible information practices:

  • Omitted: Actionable protocols, specific synthesis routes, exact pathogen sequences
  • Included: Risk frameworks, governance mechanisms, policy recommendations

For detailed biosafety protocols, consult your Institutional Biosafety Committee and relevant regulatory guidance.

What Are Information Hazards?

Information that, if disseminated, could enable harmful actions - even when generated without malicious intent. In biosecurity, this includes pathogen enhancement techniques, synthesis routes, and operational attack planning.

Do LLMs Create New Information Hazards?

The evidence is nuanced:

  • Access: LLMs can synthesize dual-use biological knowledge from training data, presenting it accessibly
  • Novelty: Current models do not generate genuinely novel attack methodologies beyond published literature
  • Uplift: RAND (2024) found no statistically significant improvement in attack plan viability with LLM access
  • Trend: Capabilities are advancing; what was “mild uplift” in 2023-2024 may change rapidly

Key Evaluation Findings:

Study             Model     Finding
OpenAI (2024)     GPT-4     “At most mild uplift” vs. internet search
Anthropic (2024)  Claude 3  Uplifted novices in acquisition steps, not experts
RAND (2024)       Multiple  No significant difference in plan viability

The Jailbreaking Problem: Safety guardrails can be bypassed. Studies show adversarial prompts can elicit harmful outputs from aligned models. This makes evaluating “base” capabilities separate from deployed safety measures essential.

Current Mitigations:

  • RLHF training to refuse harmful requests
  • Content filtering and output monitoring
  • Usage policies and terms of service
  • Ongoing red-teaming and capability evaluations

Bottom Line: LLMs appear to provide marginal uplift for biological attack planning today - primarily by making existing public information more accessible. The “tacit knowledge” barrier (hands-on skills) remains a significant defense, though emerging multimodal AI and cloud laboratories may erode this barrier. Capabilities are advancing rapidly; these assessments are snapshots in time, not permanent conclusions.

Introduction: Information Hazards in the AI Era

The concept of information hazards - information that could enable harm simply by being known - has long concerned biosecurity practitioners. The 1975 Asilomar Conference grappled with whether certain recombinant DNA techniques should be published. The 2011-2012 H5N1 gain-of-function controversy raised questions about whether transmissibility-enhancing mutations should be disclosed.

Large Language Models introduce a new dimension to this challenge. Unlike journal articles read by specialists, LLMs can synthesize information from across the scientific literature and present it in accessible formats to anyone who asks. They can answer follow-up questions, provide troubleshooting guidance, and adjust explanations to different expertise levels.

The question is not whether LLMs could provide harmful biological information - they clearly can access dual-use knowledge from their training data. The questions are:

  1. Does LLM access meaningfully increase attack capability beyond existing resources?
  2. Can safety measures effectively prevent misuse while preserving beneficial uses?
  3. How should we evaluate and govern these systems as capabilities advance?

This chapter examines the empirical evidence on these questions: what we know, what remains uncertain, and what practitioners should understand about LLM information hazards.

Defining Information Hazards: The Bostrom Typology

The philosopher Nick Bostrom formally defined an information hazard as “a risk that arises from the dissemination of (true) information” that may cause harm or enable some agent to cause harm. This framing is essential: information hazards involve true information - not misinformation or deception.

In biosecurity, information hazards can be categorized by what they reveal:

Blueprint Hazards: Specific instructions enabling weapon creation. Examples include step-by-step protocols to synthesize poliovirus from mail-order DNA, or detailed procedures for weaponizing anthrax spores. These are the most direct form of information hazard.

Idea Hazards: General concepts that point adversaries in dangerous directions. For instance, suggesting “Have you considered using Variola minor instead of major?” or explaining that certain attenuated vaccine strains could be back-mutated to virulence. The specific protocol is not provided, but the conceptual direction is.

Signal Hazards: Information revealing that something is possible. The 2018 horsepox synthesis paper demonstrated that orthopoxviruses could be synthesized from commercially available DNA, even though the specific virus synthesized (horsepox) was not itself dangerous. The signal - “this class of viruses can be made from scratch” - was the hazard.

The Real LLM Hazard

The danger of LLMs is not that they invent new biology - they hallucinate too much for reliable novel design. The danger is that they lower the search cost for existing dangerous information.

LLMs can aggregate dual-use research scattered across thousands of papers, protocols, and databases. Information that was technically public but practically obscure becomes accessible through natural language queries. This aggregation capability is what distinguishes LLM information hazards from simple internet search.

What LLMs Can and Cannot Do

Capabilities

Based on publicly available evaluations and disclosed capabilities, current frontier LLMs can:

Synthesize Existing Knowledge:

  • Explain complex biological concepts (pathogen biology, immune evasion, transmission dynamics)
  • Describe laboratory protocols for working with dangerous agents
  • Summarize the dual-use research literature
  • Provide general troubleshooting guidance for molecular biology techniques

Answer Stepwise Questions:

  • Break down complex procedures into sequential steps
  • Clarify ambiguous instructions
  • Adapt explanations to stated expertise levels
  • Suggest alternative approaches when asked

Access Specific Information:

  • Name pathogens with pandemic potential
  • Describe known enhancement mutations
  • Explain DNA synthesis and assembly methods
  • Discuss historical bioweapons programs

Limitations

Current LLMs cannot:

Generate Novel Attack Methodologies:

  • They synthesize from training data, not from first principles
  • Novel pathogen designs require specialized biological design tools (BDTs), not text models
  • Genuinely new attack vectors are not found in published literature

Provide Operational Specifics:

  • Exact synthesis routes for regulated toxins typically are not in training data
  • Supplier names, order procedures, and evasion techniques have limited coverage
  • Time-sensitive information (current regulations, screening practices) lags behind training cutoffs

Bridge the Tacit Knowledge Gap (for now):

  • Text descriptions cannot substitute for hands-on laboratory training
  • Troubleshooting real experiments requires physical observation
  • Equipment operation, technique execution, and quality assessment require practice
  • However: Emerging multimodal AI (GPT-5.2, Claude Opus 4.5, Gemini 3 Pro) can observe laboratory technique via video and provide real-time correction. This barrier may erode faster than previously assumed.

Guarantee Accuracy:

  • Hallucinations are common in technical domains
  • Safety-critical details may be confidently stated but incorrect
  • Following LLM protocols without expertise could result in failed (or dangerous) experiments

The Hallucination Problem

LLMs can confidently provide incorrect information. In biosecurity contexts, this is a double-edged sword - and paradoxically, hallucinations currently function as an unintentional safety feature.

For would-be attackers: Incorrect protocols cause failures. A hallucinated synthesis route could waste months of effort. LLMs have been observed confidently describing chemical reactions that are thermodynamically impossible. For a novice without the expertise to evaluate claims, following hallucinated protocols is a trap.

For defenders: We cannot assume LLM errors provide reliable protection. Some outputs will be accurate, and sophisticated actors can cross-check accuracy against primary sources. Hallucination rates vary by domain - models are more reliable when information is well-represented in training data.

For evaluation: Studies measuring “information access” must distinguish between any response and correct response. An LLM that provides a detailed but wrong protocol has provided information without providing capability.

As one analysis noted, LLMs “simulate reasoning through statistical correlation, not symbolic operations” and “remain unsuitable for tasks requiring stateful logical chains.” For bioweapons development, where protocols must be followed exactly and errors can be fatal to the perpetrator, this imprecision matters.

The Empirical Evidence: Uplift Studies

Defining “Uplift”

How do we measure whether an LLM is actually dangerous? The key metric is uplift - the marginal advantage an adversary gains by using AI compared to using standard tools (Google, Wikipedia, textbooks, scientific literature).

Conceptually:

Uplift = Capability(Human + AI) − Capability(Human + Internet)

If the uplift is zero, the AI is not adding biosecurity risk - even if it answers dangerous questions - because the user could have found equivalent information through conventional research. The biosecurity-relevant question is never “Can an LLM provide this information?” but rather “Does LLM access meaningfully improve attack capability beyond existing resources?”

This framing acknowledges that dangerous information already exists in various forms. The question is whether AI access makes that information more accessible, synthesized, or actionable in ways that matter operationally.
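
To make the uplift metric concrete, the sketch below computes it from two sets of hypothetical expert-panel viability scores - one for an internet-only group, one for an internet-plus-LLM group - and runs a standard two-sample significance test. The scores, group sizes, and test choice are illustrative assumptions, not data or methods from any of the studies discussed below.

```python
# Illustrative only: synthetic viability scores, not data from RAND, OpenAI, or Anthropic.
from statistics import mean

from scipy import stats  # standard two-sample significance test

# Hypothetical expert-panel ratings of attack-plan viability on a 1-10 scale.
internet_only = [3.1, 2.8, 3.5, 2.9, 3.3, 3.0, 2.7, 3.4]
internet_plus_llm = [3.4, 3.0, 3.6, 2.9, 3.5, 3.2, 2.8, 3.7]

# Point estimate of uplift: Capability(Human + AI) - Capability(Human + Internet).
uplift = mean(internet_plus_llm) - mean(internet_only)

# Welch's t-test: is the observed difference distinguishable from noise?
t_stat, p_value = stats.ttest_ind(internet_plus_llm, internet_only, equal_var=False)

print(f"Estimated uplift: {uplift:.2f} points on a 10-point scale")
print(f"p-value: {p_value:.3f} (differences with p >= 0.05 are treated as not significant)")
```

In this toy example the point estimate is slightly positive but the test is not significant - the same pattern the studies summarized below report.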

RAND Corporation Red Team Study (2024)

The RAND study remains the most rigorous public assessment of LLM impact on biological attack planning.

Methodology:

  • Recruited participants with varying backgrounds
  • Randomly assigned to “Internet only” or “Internet + LLM” conditions
  • Asked to develop biological attack plans
  • Expert panel evaluated plan viability

Key Finding: “Current AI models do not meaningfully increase the risk of a large-scale biological weapons attack.”

The study found no statistically significant difference in plan viability between conditions. LLMs helped with brainstorming and information synthesis, but this did not translate to more viable attack plans.

Limitations:

  • Constrained timeframe (hours, not weeks)
  • Participants were not actual threat actors
  • Evaluated planning, not execution
  • Frontier models continue advancing

OpenAI Biosecurity Evaluation (2024)

OpenAI’s study focused specifically on expert uplift with GPT-4.

Methodology:

  • Recruited biology PhD students and postdocs
  • Measured task performance with and without GPT-4 access
  • Tasks included troubleshooting, knowledge recall, and protocol design

Key Finding: GPT-4 provides “at most a mild uplift” in biosecurity-relevant tasks.

Mean uplift was 0.88 points on a 10-point scale, not statistically significant. GPT-4’s primary value was saving time on literature search, not providing information unavailable through other means.

Notable Observation: Experts were better at extracting useful information from GPT-4 than novices. This suggests LLMs may amplify existing expertise rather than substitute for it.

Anthropic Claude Evaluation (2024)

Anthropic’s Claude 3 Model Card disclosed biosecurity evaluation results.

Key Finding: Claude 3 models “substantially increased risk in certain parts of the bioweapons acquisition pathway” for novices but did not appear capable of uplifting experts “to a substantially concerning degree.”

Interpretation: The “acquisition pathway” includes many steps - identifying agents, planning approaches, obtaining materials, executing procedures. LLMs may help with early planning steps while providing less value for later execution steps that require tacit knowledge and physical access.

Gryphon Scientific Assessment (2023-2024)

A Gryphon Scientific study for USAID and subsequent work with OpenAI took a more granular approach than other evaluations.

Key Finding - The “Post-Doc” Effect: While LLMs did not help with “grand strategy” - overall attack planning - they were surprisingly effective at troubleshooting specific laboratory problems. For targeted questions like “My viral culture is cloudy, what happened?” current frontier models performed at the level of a postdoctoral researcher.

Implication: This is where the real risk may lie. The bottleneck often is not finding the recipe - it is troubleshooting when the recipe fails. If LLMs can serve as a “Help Desk” for experimental troubleshooting, that represents genuine uplift even if overall planning capability remains unchanged.

Gryphon’s work also found that LLMs provided meaningful information for early-stage planning but were less helpful for overcoming “physical” barriers like material acquisition and laboratory execution - consistent with the tacit knowledge framework.

Synthesis: What the Evidence Shows

Attribute             Evidence
Planning assistance   Modest benefit; summarizes existing public information
Expert uplift         Minimal - experts already know what LLMs provide
Novice uplift         Greater relative benefit, but still significant gaps to execution
Novel methodologies   None demonstrated - LLMs synthesize, not create
Operational guidance  Limited - tacit knowledge and physical access remain barriers
Trend direction       Capabilities advancing; assessments are snapshots in time

These Findings Are Snapshots in Time

The studies summarized above reflect specific model versions tested at specific points in time. “Mild uplift” with GPT-4 in 2024 does not guarantee “mild uplift” with GPT-5 or Claude 4. AI capabilities are advancing rapidly, and the marginal risk calculation could shift substantially with next-generation models. Policymakers should treat these findings as baselines for continuous monitoring, not as permanent assurances.

Safety Measures and Their Limitations

RLHF and Constitutional AI

Frontier LLMs are trained using Reinforcement Learning from Human Feedback (RLHF) and related techniques to refuse harmful requests. When asked to help create biological weapons, aligned models typically:

  • Decline to provide assistance
  • Explain why the request is problematic
  • Suggest legitimate alternatives (research, education)

Anthropic’s Constitutional AI builds on this by training models to follow explicit principles, including avoiding harm.

The Jailbreaking Problem

Safety training can be bypassed. Research on adversarial prompts demonstrates that aligned models can be manipulated to produce harmful outputs through:

  • Role-playing scenarios: “As a fiction writer researching…”
  • Hypotheticals: “For educational purposes, explain how one would…”
  • Token manipulation: Character-level tricks to bypass content filters
  • Multi-turn extraction: Building context across many messages

Implications for Biosecurity:

  1. Evaluations should test base capabilities: What can the model do if safety measures are bypassed?
  2. Jailbreaks proliferate: Once discovered, they spread through online communities
  3. Cat-and-mouse dynamics: Safety teams patch jailbreaks; new ones emerge
  4. Open-source models: Local deployment may remove API-level safeguards entirely

Evaluation Philosophy: Capability vs. Disposition

For biosecurity evaluation, we care about:

Capability: What could the model do if safety measures failed or were bypassed?

Disposition: What will the model do given current training and deployment?

A model with dangerous capabilities but strong guardrails is safer than one with moderate capabilities and weak guardrails. But capabilities are latent - they persist even when disposition changes through fine-tuning or jailbreaking.

The Anthropic AI Safety Levels (ASL) framework addresses this by tying safety requirements to capability thresholds, not just deployed behavior.
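
As a loose illustration of threshold-based governance - not Anthropic’s actual ASL definitions - the sketch below maps capability tiers to required safeguards, so that evaluation results rather than deployed behavior alone drive deployment decisions. The tier names and safeguard lists are invented for this example.

```python
# Illustrative capability-tier policy table. Tiers and safeguards are invented and
# do not reproduce Anthropic's ASL framework or any lab's actual policy.
CAPABILITY_TIERS = {
    "no_meaningful_uplift": [
        "standard refusal training",
        "usage policies and monitoring",
    ],
    "meaningful_novice_uplift": [
        "pre-deployment biosecurity red-teaming",
        "hardened input/output classifiers",
    ],
    "meaningful_expert_uplift": [
        "restricted or staged deployment",
        "independent third-party evaluation",
        "model-weights security controls",
    ],
}

def required_safeguards(tier: str) -> list[str]:
    """Look up the safeguards a given evaluated capability tier would trigger."""
    return CAPABILITY_TIERS[tier]

print(required_safeguards("meaningful_novice_uplift"))
```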

Content Filtering and Monitoring

Major LLM providers implement additional safeguards:

  • Input filtering: Detect and block suspicious queries
  • Output filtering: Screen responses for dangerous content
  • Usage monitoring: Flag accounts with concerning patterns
  • Rate limiting: Prevent systematic extraction attempts

These measures reduce casual misuse but are unlikely to stop determined, sophisticated actors who can use multiple accounts, obfuscated queries, or open-source models.
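
The sketch below shows, at a very rough level, how the safeguards listed above can be layered around a single model call: per-account rate limiting, an input filter before the model, an output filter after it, and a monitoring hook when either filter fires. Every function here is a hypothetical placeholder, not any provider’s actual API or classifier.

```python
# Minimal sketch of a layered-safeguard pipeline. Every function is a hypothetical
# placeholder; production systems use trained classifiers and far richer telemetry.
import time
from collections import defaultdict

REQUEST_LOG: dict[str, list[float]] = defaultdict(list)
RATE_LIMIT = 30            # illustrative: max requests per account per window
WINDOW_SECONDS = 3600

def input_filter(prompt: str) -> bool:
    """Placeholder input classifier: return True to block the request."""
    blocked_terms: set[str] = set()  # real systems use trained classifiers, not term lists
    return any(term in prompt.lower() for term in blocked_terms)

def output_filter(response: str) -> bool:
    """Placeholder output classifier: return True to withhold the response."""
    return False

def call_model(prompt: str) -> str:
    """Placeholder for the underlying LLM call."""
    return f"[model response to: {prompt!r}]"

def flag_for_review(account_id: str, prompt: str) -> None:
    """Placeholder usage-monitoring hook (e.g., queue the event for human review)."""
    print(f"flagged account {account_id}")

def guarded_completion(account_id: str, prompt: str) -> str:
    # Rate limiting: drop requests beyond the per-account budget for this window.
    now = time.time()
    REQUEST_LOG[account_id] = [t for t in REQUEST_LOG[account_id] if now - t < WINDOW_SECONDS]
    if len(REQUEST_LOG[account_id]) >= RATE_LIMIT:
        return "Rate limit exceeded."
    REQUEST_LOG[account_id].append(now)

    # Input filtering, with usage monitoring when the filter fires.
    if input_filter(prompt):
        flag_for_review(account_id, prompt)
        return "This request cannot be processed."

    # Output filtering on the model's response.
    response = call_model(prompt)
    if output_filter(response):
        flag_for_review(account_id, prompt)
        return "This response was withheld by a safety filter."
    return response

print(guarded_completion("demo-account", "Explain how mRNA vaccines work."))
```

The point of the layering is that each mechanism fails differently: classifiers miss obfuscated queries, rate limits only slow systematic extraction, and monitoring catches patterns no single request reveals.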

Constitutional Classifiers

The most robust current defense against jailbreaks is Constitutional Classifiers, developed by Anthropic. These systems use classifiers trained on synthetic data to monitor exchanges in real-time. The latest generation (Constitutional Classifiers++, January 2026) evaluates model outputs in the context of their inputs, addressing reconstruction attacks that fragment harmful requests and output obfuscation that disguises responses. Through over 1,700 hours of red-teaming, no universal jailbreak was discovered. See Red-Teaming AI Systems for Biosecurity Risks for technical details.

Red-Teaming for Biosecurity

What is Red-Teaming?

In security contexts, red-teaming involves adopting an adversarial mindset to identify vulnerabilities. For AI biosecurity, this means:

  • Attempting to extract harmful biological information
  • Testing whether safety measures can be bypassed
  • Evaluating whether outputs would be operationally useful
  • Simulating threat actor behavior and capabilities

Red-Teaming Methodologies

Expert Elicitation: Biosecurity experts attempt to use LLMs for harmful purposes, evaluating:

  • What information can be obtained?
  • How accurate and complete is it?
  • Would it advance an attack beyond public resources?

Structured Scenarios: Defined attack scenarios (specific pathogen, target, actor profile) guide evaluation:

  • Can the LLM provide relevant information for each attack stage?
  • Where do guardrails activate?
  • What gaps remain that other resources would fill?

Automated Probing: Systematic testing of model responses to biosecurity-relevant queries:

  • Gradient-based attacks to bypass safety training
  • Prompt injection techniques
  • Coverage mapping across pathogen/technique space
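
A minimal automated-probing harness might look like the sketch below: run a fixed prompt set against the model under evaluation and record how often it refuses. The prompt set here is deliberately benign and the refusal check is a crude string match; real evaluations use curated, access-controlled prompt suites, trained graders, and the adversarial techniques listed above, so treat this only as a scaffold.

```python
# Benign illustration of an automated probing harness: measure refusal rates over a
# placeholder prompt set. No hazardous prompts or evaluation content is included here.
from dataclasses import dataclass

REFUSAL_MARKERS = ("i can't help", "i cannot assist", "i won't provide")

@dataclass
class ProbeResult:
    prompt_id: str
    refused: bool

def query_model(prompt: str) -> str:
    """Placeholder for an API call to the model under evaluation."""
    return "I cannot assist with that request."

def looks_like_refusal(response: str) -> bool:
    # Crude heuristic; real evaluations use trained graders, not string matching.
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def run_probe(prompts: dict[str, str]) -> list[ProbeResult]:
    """Query the model once per prompt and record whether it refused."""
    return [ProbeResult(pid, looks_like_refusal(query_model(p))) for pid, p in prompts.items()]

if __name__ == "__main__":
    # Placeholder prompt set; a real harness loads a curated, access-controlled suite.
    probes = {"benign-001": "Explain how mRNA vaccines work."}
    results = run_probe(probes)
    refusal_rate = sum(r.refused for r in results) / len(results)
    print(f"Refusal rate: {refusal_rate:.0%}")
```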

Challenges in Red-Teaming

Information hazard recursion: Documenting how to extract dangerous information creates another information hazard. Evaluation reports must balance transparency with security.

Expertise requirements: Effective red-teaming requires both AI/ML expertise and biosecurity domain knowledge - a rare combination.

Temporal limitations: Evaluations are snapshots. Model updates, new jailbreaks, and evolving techniques change the landscape continuously.

Generalization: Success or failure on specific test cases may not generalize to real threat scenarios.

For more on red-teaming methodologies, see the Red-Teaming AI Systems chapter later in Part IV.

The Information Hazard Landscape

Categories of Concern

Dual-Use Knowledge:

  • Enhancement mutations increasing transmissibility or virulence
  • Immune evasion modifications
  • Stability and environmental persistence optimization

Operational Information:

  • DNA synthesis providers and their screening practices
  • Acquisition routes for precursor materials
  • Evasion techniques for regulatory controls

Synthesis and Assembly:

  • Protocols for assembling dangerous pathogens from synthetic DNA
  • Reverse genetics systems for RNA viruses
  • Chimeric agent construction methods

Historical and Technical References:

  • Declassified bioweapons program documents
  • Technical manuals from state programs
  • Academic papers with dual-use content

What’s New vs. What’s Accessible

A critical question is whether LLMs provide access to information that was previously unavailable or merely make accessible information easier to find.

Genuinely restricted information:

  • Classified intelligence on state programs
  • Unpublished proprietary research
  • Controlled technical data (export-restricted)

LLMs cannot provide access to genuinely restricted information not in their training data.

Technically accessible but practically obscure information:

  • Information scattered across many sources
  • Content in foreign languages
  • Older literature not easily searchable

LLMs may make this information more accessible by synthesizing across sources.

Readily accessible information:

  • Published peer-reviewed literature
  • Government reports and guidelines
  • Textbooks and educational materials

LLMs primarily synthesize this category, information that was already accessible to determined individuals.

The Library Metaphor

A useful conceptual framework: LLMs function as librarians for an already-existing library, not as authors of new books. They can locate, synthesize, and explain information scattered across their training data, but they cannot create knowledge that does not exist in that corpus.

This distinction matters for risk assessment. The concern is not that LLMs will invent novel attack methodologies; current models lack that capability. The concern is that information requiring months of literature review becomes accessible through a few queries. The aggregate effect of reducing search costs across many potential actors may create marginal risk increases even when no individual query provides unprecedented information.

RAND’s finding that LLM outputs “generally mirrored information readily available on the internet” confirms this framing (Mouton et al., 2024). LLMs do not expand the library; they make it easier to navigate. For biosecurity, this means the primary risk is democratized access to existing dual-use knowledge, not AI-generated novelty.

This has policy implications. Restricting LLM access to biological information is unlikely to succeed when the underlying sources remain publicly available. More tractable interventions target the chokepoints where information must translate to physical capability: DNA synthesis screening, laboratory access controls, and material restrictions.

The Unmasking Hazard

Beyond providing dual-use knowledge directly, LLMs pose a less-discussed risk: unmasking previously obscured information through data aggregation.

Kevin Esvelt and colleagues have highlighted that LLMs can synthesize information across millions of sources to reveal patterns that individual sources do not disclose. If a publication states "A patient in [City X] with [Genetic Mutation Y] contracted [Virus Z]," an LLM scanning social media, genealogy databases, and public records could potentially triangulate the patient’s identity.

For biosecurity, unmasking risks include:

  • Vulnerability identification: Synthesizing public information to identify security gaps at specific facilities
  • Researcher identification: Connecting publications to identify key personnel with specific expertise
  • Supply chain mapping: Aggregating procurement data to map acquisition routes for regulated materials
  • Pattern recognition: Connecting seemingly unrelated dual-use papers to identify dangerous research directions

This means we cannot rely on "security by obscurity" - the assumption that dangerous information is protected simply by being scattered. If the component pieces exist online, even in fragments, LLMs may be able to stitch them together.

The "Google Plus" Era

We are currently in what might be called the "Google Plus" era of AI biosecurity risk. Current LLMs are essentially better search engines - they synthesize and present existing information more efficiently, but they do not yet possess the reasoning capabilities to design novel pathogens or the physical agency to synthesize them.

However, this era may be ending. In December 2025, OpenAI demonstrated GPT-5 iterating on wet-lab experiments, achieving a 79-fold efficiency improvement through AI-driven protocol optimization. The day an LLM is directly connected to a cloud laboratory is approaching faster than previously assumed. We address this convergence in the chapter on Cloud Labs and Automated Biology.

What We Know vs. What Remains Uncertain

Demonstrated (supported by published evidence):

  • LLMs provide “at most mild uplift” for biological attack planning (RAND 2024, OpenAI 2024, Anthropic 2024)
  • LLMs can synthesize scattered dual-use information more efficiently than search engines
  • RLHF and content filtering reduce casual misuse but can be bypassed
  • Information access alone is insufficient for operational capability
  • Current evaluations measure information access, not attack success

Theoretical (plausible but not yet demonstrated):

  • LLMs providing “significant uplift” with future model generations
  • Multimodal AI substantially eroding tacit knowledge barriers
  • LLMs connected to cloud labs enabling autonomous biological work
  • “Unmasking” attacks successfully identifying operational vulnerabilities

Unknown (insufficient evidence to assess):

  • Whether capability thresholds will be crossed with next-generation models
  • How quickly the “Google Plus” era will end
  • The effectiveness of safety measures against sophisticated adversaries
  • Whether open-source models will reach concerning capability levels

These distinctions inform appropriate policy responses: demonstrated risks warrant immediate intervention, while theoretical risks require monitoring frameworks and contingency planning.

Governance and Policy Implications

Current Approaches

Voluntary Industry Commitments: Major AI labs have committed to biosecurity evaluations before deploying frontier models. The Frontier Model Forum provides a venue for sharing best practices.

Government Engagement: The UK AI Security Institute (renamed from the AI Safety Institute in February 2025) and the U.S. Center for AI Standards and Innovation (CAISI, renamed from the AI Safety Institute in June 2025) conduct independent evaluations. Executive Order 14110 (2023) established reporting requirements for models above certain compute thresholds before its rescission in January 2025.

Research Community Norms: The biosecurity research community has developed norms around dual-use research of concern (DURC). Similar norms are developing for AI-biosecurity research.

Risk Quantification Frameworks: Recent work has begun translating capability evaluations into quantified risk estimates. A 2025 GovAI analysis estimated that a 10 percentage point increase in capable individuals could raise annual epidemic probability from 0.15% to 1.0%. Such frameworks help policymakers move beyond qualitative “uplift” language to inform specific resource allocation decisions.
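
To show how this kind of translation works mechanically, the toy calculation below maps the number of capable individuals to an annual probability of at least one successful attack, assuming independent actors with a fixed per-actor event probability. The population size and probabilities are invented, and this is not the GovAI methodology; it only illustrates why changes in the capable population move the headline risk number.

```python
# Toy risk model: independent actors, fixed per-actor annual event probability.
# All parameter values are invented; this does not reproduce the GovAI analysis.

def annual_epidemic_probability(population: int,
                                fraction_capable: float,
                                p_event_per_actor: float) -> float:
    """P(at least one successful attack per year) under an independence assumption."""
    n_capable = int(population * fraction_capable)
    return 1.0 - (1.0 - p_event_per_actor) ** n_capable

baseline = annual_epidemic_probability(100_000, 0.005, 3e-6)  # small capable population
shifted = annual_epidemic_probability(100_000, 0.105, 3e-6)   # +10 percentage points capable

print(f"Baseline annual probability:     {baseline:.2%}")
print(f"After +10pp capable individuals: {shifted:.2%}")
```

Even this crude model makes the policy-relevant point: annual risk is driven by how many people clear the capability bar, which is exactly what uplift studies try to measure.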

Open Questions

Threshold setting: At what capability level should deployment require additional safeguards or restrictions?

Evaluation standardization: How can we create reproducible, comparable biosecurity evaluations across models and labs?

Open-source governance: How should openly released models - which cannot be centrally controlled - be governed?

International coordination: How do we prevent regulatory arbitrage where dangerous models are developed in permissive jurisdictions?

Recommendations for Practitioners

  1. Calibrate concern appropriately: Current evidence suggests LLMs provide marginal uplift. Don’t dismiss the risk, but don’t overweight AI relative to other biosecurity threats.

  2. Focus on chokepoints: DNA synthesis screening, laboratory access controls, and material restrictions remain more impactful than AI-specific interventions.

  3. Engage with evaluations: If you have biosecurity expertise, consider participating in red-team exercises and evaluations from AI labs and government agencies.

  4. Monitor trends: This is a rapidly evolving field. Today’s assessment may not hold in 1-2 years as capabilities advance.

  5. Support responsible disclosure: Report biosecurity vulnerabilities in AI systems through appropriate channels rather than publicizing them.

The next chapter examines AI-enabled pathogen design - where biological design tools (BDTs) may pose risks beyond information access. For the defensive side of the equation - how AI can strengthen biosurveillance, accelerate countermeasures, and detect novel threats - see AI for Biosecurity Defense.

What are information hazards in biosecurity contexts?

Information hazards are true information that could enable harmful actions when disseminated, even when generated without malicious intent. In biosecurity, this includes pathogen enhancement techniques, synthesis routes, and operational attack planning accessible through dual-use research literature. Nick Bostrom’s typology distinguishes blueprint hazards (specific instructions), idea hazards (dangerous conceptual directions), and signal hazards (revealing something is possible).

Do LLMs provide significant uplift for biological attacks?

Current evidence shows minimal uplift. RAND 2024 found no statistically significant difference in attack plan viability with LLM access. OpenAI 2024 reported GPT-4 provides “at most mild uplift” compared to internet search. Anthropic 2024 found Claude 3 uplifted novices in certain acquisition steps but not experts. LLMs synthesize existing information more efficiently but don’t overcome tacit knowledge barriers or enable operational execution.

Can AI safety measures like RLHF be bypassed?

Yes. Research on adversarial prompts demonstrates that aligned models can be manipulated to produce harmful outputs through jailbreaking techniques including role-playing scenarios, hypotheticals, token manipulation, and multi-turn extraction. This makes evaluating base capabilities separate from deployed safety measures essential, as capabilities persist even when disposition changes through fine-tuning or jailbreaking.

Why do AI hallucinations matter for biosecurity evaluation?

Hallucinations currently function as an unintentional safety feature. Incorrect protocols cause experimental failures, wasting attackers’ time and resources. However, sophisticated actors can cross-check accuracy against primary sources, so hallucinations don’t provide reliable protection. Evaluation studies must distinguish between “any response” and “correct response” when measuring information access versus actual capability provision.


This chapter is part of The Biosecurity Handbook. For foundational context, see the previous chapter on AI as a Biosecurity Risk Amplifier. For connections to broader governance, see International Governance and the BWC and Dual-Use Research of Concern.