AI and Machine Learning Fundamentals
GPT-4 reportedly has around 1.8 trillion parameters. Claude 3 was trained on text equivalent to millions of books. AlphaFold has predicted the structures of roughly 200 million proteins. These numbers appear in headlines, but what do they actually mean? This chapter provides the conceptual foundation you need to evaluate AI systems for biosecurity risks, without requiring you to become a machine learning engineer.
This chapter provides working knowledge of AI and machine learning for biosecurity practitioners. You will learn to:
- Understand what machine learning is and how models “learn” from data
- Explain how large language models (LLMs) work at a conceptual level
- Distinguish between base models and deployed models with safety training
- Define key terms used throughout Part IV (uplift, fine-tuning, RLHF, jailbreaking)
- Recognize the difference between LLMs and Biological Design Tools (BDTs)
- Evaluate AI capability claims with appropriate skepticism
Prerequisites: Parts I-III of this handbook provide biosecurity context. No prior AI/ML knowledge required.
Introduction: Why This Chapter Exists
This handbook bridges two communities that often speak different languages. Parts I-III covered biosecurity fundamentals for readers from AI backgrounds. This chapter inverts the direction: AI fundamentals for readers from biosecurity, public health, and policy backgrounds.
The remaining chapters in Part IV discuss LLM information hazards, AI-enabled pathogen design, biological design tools, and red-teaming methodologies. Those discussions assume familiarity with concepts like:
- How large language models generate text
- What “fine-tuning” and “RLHF” mean
- Why “base model capabilities” differ from “deployed model behavior”
- What “uplift” measures and why it matters
If these terms are unfamiliar, this chapter provides the foundation. If you already work in AI/ML, you can skim or skip to the terminology table and Part IV roadmap.
This is not a machine learning course. For technical depth, see 3Blue1Brown’s neural network series, Andrej Karpathy’s tutorials, or the Deep Learning book by Goodfellow et al.
This chapter provides conceptual understanding sufficient for biosecurity risk assessment and policy discussions.
What Machine Learning Actually Is
The Core Idea
Traditional software follows explicit rules: “If temperature > 38°C AND cough = present, flag for flu screening.” A programmer writes these rules based on domain expertise.
Machine learning inverts this. Instead of writing rules, you show the system examples with known answers. The system discovers patterns that predict the answers. Then it applies those patterns to new cases.
Example: You have 10,000 pathogen genome sequences, each labeled with its host species (human, bat, bird, pig). You feed these to a machine learning system. It learns patterns in the genetic code that correlate with host species. Now you can give it a new, unlabeled sequence, and it predicts the likely host.
The system wasn’t programmed with rules about codon usage or receptor binding. It found statistical patterns that happen to predict the answer.
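The sketch below shows this pattern end to end, assuming the scikit-learn library; the “genome features” and labels are random placeholders standing in for a real feature-extraction pipeline (for example, k-mer counts), not actual biological data.

```python
# Minimal sketch of "learning from examples": fit a classifier to labeled
# data, then predict on a new case. All data here are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend each genome has been converted to 300 numeric features
# (e.g., k-mer counts) and labeled with a host species.
X_train = rng.random((2_000, 300))           # 2,000 labeled sequences
y_train = rng.integers(0, 4, size=2_000)     # 0=human, 1=bat, 2=bird, 3=pig

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                  # training: adjust parameters to fit examples

X_new = rng.random((1, 300))                 # a new, unlabeled sequence
print(model.predict(X_new))                  # predicted host species
print(model.predict_proba(X_new))            # probability for each host
```

The point is the workflow, not the particular algorithm: no rule about codon usage is ever written down; the classifier only finds statistical regularities in the labeled examples.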
What Is a Model?
A model is a mathematical function that maps inputs to outputs. You already use models: logistic regression maps patient characteristics to disease probability. Linear regression maps predictor variables to a continuous outcome.
In machine learning, “model” means the same thing, but models can have millions or billions of adjustable values instead of a handful of coefficients. These adjustable values are called parameters (or weights).
A neural network with 1 billion parameters has 1 billion numbers that determine how it transforms inputs into outputs. During training, the system adjusts all of these to minimize prediction errors.
What Is Training?
Training is the process of adjusting model parameters to minimize errors on known examples.
The process works like this:
- Show the model an input (a genome sequence, an image, a sentence)
- The model produces an output (a prediction)
- Compare the prediction to the correct answer
- Measure the error (called the loss)
- Adjust parameters slightly to reduce that error
- Repeat billions of times
This is not magic. It is optimization. The same mathematical principles underlie fitting a logistic regression, just scaled to more parameters and more data.
The key insight: after training on millions of examples, the model has adjusted its parameters to capture patterns that generalize to new, unseen data. It has “learned” in the sense that its predictions on new data are better than random guessing.
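The loop above can be written in a few lines. Below is a minimal numpy sketch that fits a one-parameter model (y = w · x) to synthetic data by gradient descent; real models have billions of parameters, but the structure of the loop is the same. The learning rate is a hyperparameter, a distinction picked up in the next subsection.

```python
# Minimal training loop: predict, measure the loss, nudge the parameter
# to reduce it, repeat. Data are synthetic; the model is y = w * x.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)
y_true = 3.0 * x + rng.normal(0, 0.05, size=100)   # examples with known answers

w = 0.0                # the parameter, learned from data
learning_rate = 0.1    # a hyperparameter, chosen by a human before training

for step in range(2000):
    y_pred = w * x                        # show the model inputs, get predictions
    error = y_pred - y_true               # compare to the correct answers
    loss = np.mean(error ** 2)            # measure the error (the loss)
    gradient = np.mean(2 * error * x)     # direction that reduces the loss
    w -= learning_rate * gradient         # adjust the parameter slightly; repeat

print(f"learned w = {w:.2f} (true value 3.0), final loss = {loss:.4f}")
```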
Parameters vs. Hyperparameters
Two types of numbers matter in machine learning:
Parameters: Learned from data during training. The model adjusts these automatically. A neural network’s weights are parameters.
Hyperparameters: Set by humans before training. These control the learning process itself. Examples: how many layers in the network, how fast to adjust parameters, how much training data to use.
When you hear “GPT-4 has 1.8 trillion parameters,” these are the learned values that encode what the model knows. The hyperparameters (architecture choices, training duration, data mixture) are separate decisions made by OpenAI’s engineers.
How Neural Networks Work
The Basic Structure
Neural networks are composed of layers of interconnected “neurons” (mathematical functions) that transform inputs into outputs:
Input → Hidden Layer(s) → Output
Each connection between neurons has a weight (parameter). During training, these weights are adjusted so the network produces correct outputs for the training examples.
Simplified example: A network classifying pathogens might have:
- Input layer: 1000 values representing genome features
- Hidden layers: Multiple layers that transform these features, extracting increasingly abstract patterns
- Output layer: Probabilities for each pathogen category
The “deep” in “deep learning” refers to having many hidden layers. More layers allow the network to learn more complex patterns, but require more data and computation to train effectively.
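As a concrete illustration of that structure, here is the pathogen-classifier example expressed as a small network, assuming the PyTorch library; the layer sizes are arbitrary choices made for the sketch.

```python
# The Input -> Hidden Layer(s) -> Output structure from above, assuming PyTorch.
# Layer sizes are arbitrary; every weight inside nn.Linear is a parameter
# adjusted during training.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1000, 256),   # input layer: 1,000 genome features in
    nn.ReLU(),
    nn.Linear(256, 64),     # hidden layers: increasingly abstract patterns
    nn.ReLU(),
    nn.Linear(64, 4),       # output layer: scores for 4 pathogen categories
)

features = torch.rand(1, 1000)                   # one hypothetical genome's features
probs = torch.softmax(model(features), dim=-1)   # probabilities per category
print(probs)
```

Adding more `nn.Linear`/`nn.ReLU` pairs is what makes the network “deeper.”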
What Neural Networks Are Good At
Neural networks excel when:
- You have large amounts of labeled data
- The patterns are complex and hard to specify with rules
- Some errors are acceptable (probabilistic predictions)
- The input has spatial or sequential structure (images, text, audio)
They struggle when:
- Data is scarce (thousands of examples, not millions)
- Errors are catastrophic (safety-critical systems need guarantees)
- Interpretability is essential (neural networks are often “black boxes”)
- The problem requires formal reasoning or logic
For most biosecurity tabular data (spreadsheets with rows and columns), simpler methods like logistic regression or gradient boosting often outperform neural networks. Neural networks shine with unstructured data: images, text, protein sequences, genomic data.
How Large Language Models Work
The Transformer Architecture
Large language models are built on the transformer architecture, introduced in the 2017 paper “Attention Is All You Need” by Vaswani et al. at Google.
The key innovation is attention: a mechanism that allows the model to consider all parts of the input when making predictions, weighing which parts are most relevant.
Intuition: When predicting the next word in “The pathogen was isolated from a bat in a cave in…”, the words “bat” and “cave” strongly suggest the next word might be “China” or “Africa” or another geographic location. Attention allows the model to focus on these relevant context words rather than treating all prior words equally.
Attention is why LLMs can maintain coherence over long passages. Earlier architectures (RNNs, LSTMs) processed text sequentially and struggled with long-range dependencies. Transformers process all tokens in parallel, with attention connecting them.
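The core attention computation is compact. The numpy sketch below shows scaled dot-product attention with random placeholder vectors; in a real LLM the queries, keys, and values come from learned projections of token embeddings, each layer has many attention heads, and a causal mask keeps tokens from attending to later positions.

```python
# Scaled dot-product attention: each position builds a weighted mix of all
# positions, with weights reflecting relevance. Vectors here are random
# placeholders, not real token embeddings.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 12, 64                  # 12 tokens, 64-dimensional vectors

Q = rng.normal(size=(seq_len, d))    # queries: "what am I looking for?"
K = rng.normal(size=(seq_len, d))    # keys:    "what do I contain?"
V = rng.normal(size=(seq_len, d))    # values:  "what do I pass along?"

scores = Q @ K.T / np.sqrt(d)        # relevance of every token to every other
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax per row
output = weights @ V                 # each position's new, context-aware representation

print(weights[-1].round(2))          # how strongly the last position attends to each token
```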
Pre-Training: Learning Language
LLMs are pre-trained on massive text corpora using a simple objective: predict the next word.
Given: “The Biological Weapons Convention was signed in”
Predict: “1972”
This seems trivial, but scaled to trillions of words, the model must learn:
- Grammar and syntax
- Facts about the world
- Reasoning patterns present in the training text
- Writing styles and conventions
The model is not explicitly taught any of this. It emerges from optimizing next-word prediction at scale. This is why researchers refer to “emergent capabilities”: abilities that appear at larger scales without being directly trained for.
Training data matters. An LLM trained on scientific literature will be more capable at scientific tasks than one trained on social media. The mixture of training data shapes what the model learns.
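The sketch below shows how the next-word objective is scored for a single position, with a random stand-in playing the role of the model; the token ids are made up for illustration.

```python
# Pre-training objective for one position: the model assigns a probability
# to every vocabulary token, and the loss is how surprised it was by the
# actual next token. The "model output" here is random, not a trained LLM.
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 50_000
context_ids = [1012, 7, 28871, 4405]   # context tokens (made-up ids); a real model conditions on these
next_token = 19720                     # "1972" (made-up id)

logits = rng.normal(size=vocab_size)                  # stand-in for the model's output scores
probs = np.exp(logits) / np.exp(logits).sum()         # softmax over the whole vocabulary
loss = -np.log(probs[next_token])                     # cross-entropy at this position

print(f"probability given to the true next token: {probs[next_token]:.2e}")
print(f"loss (lower is better): {loss:.2f}")
# Training adjusts parameters so this probability rises and the loss falls,
# repeated across trillions of token positions.
```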
Tokens: How LLMs See Text
LLMs don’t process words directly. They process tokens, subword units that balance vocabulary size with efficiency.
“Biosecurity” might be tokenized as: [“Bio”, “security”]
“SARS-CoV-2” might be tokenized as: [“SARS”, “-”, “Co”, “V”, “-”, “2”]
A typical LLM has a vocabulary of 50,000-100,000 tokens. The model predicts probability distributions over this vocabulary for each position in the sequence.
Why this matters for biosecurity: Rare scientific terms may be split into multiple tokens, potentially affecting how well the model handles specialized content. Pathogen names, chemical formulas, and technical terminology may be tokenized in ways that fragment their meaning.
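To see tokenization concretely, the snippet below assumes the tiktoken library (the tokenizer family used by recent OpenAI models); exact splits vary by tokenizer, so treat the output as illustrative.

```python
# How a tokenizer splits text into subword units, assuming tiktoken.
# Exact splits differ between models; the point is that technical terms
# often fragment into several tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Biosecurity", "SARS-CoV-2", "The pathogen was isolated from a bat"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {pieces} ({len(ids)} tokens)")
```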
Generation: How LLMs Produce Text
When you prompt an LLM, it generates responses by repeatedly predicting the next most likely token:
- You provide a prompt: “Explain how anthrax spores…”
- The model predicts the probability distribution over all possible next tokens
- A token is selected (usually from among the highest-probability options)
- That token is added to the context
- The model predicts the next token given the updated context
- Repeat until the model produces a stop token or reaches a length limit
LLMs are sophisticated autocomplete systems. They do not “understand” in a human sense; they predict statistically likely continuations based on patterns in training data. This is both their power (coherent generation) and their limitation (hallucinations when patterns mislead).
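The generation loop above looks roughly like the sketch below; `toy_model` is a random stand-in for a real LLM, and real systems use sampling controls (temperature, top-p) rather than always taking the single most likely token.

```python
# Autoregressive generation: predict a distribution over next tokens, pick one,
# append it to the context, repeat. The "model" here is a random stand-in.
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, STOP_ID = 1000, 0

def toy_model(context):
    """Stand-in for an LLM: returns a probability distribution over the vocabulary."""
    logits = rng.normal(size=VOCAB_SIZE)
    return np.exp(logits) / np.exp(logits).sum()

def generate(prompt_ids, max_new_tokens=20):
    context = list(prompt_ids)               # the prompt, as token ids
    for _ in range(max_new_tokens):          # repeat until stop token or length limit
        probs = toy_model(context)           # distribution over all possible next tokens
        next_id = int(np.argmax(probs))      # select a high-probability token (greedy here)
        if next_id == STOP_ID:
            break
        context.append(next_id)              # the updated context feeds the next prediction
    return context

print(generate([42, 7, 180]))                # made-up prompt token ids
```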
Context Windows
LLMs have a fixed context window, the maximum number of tokens they can consider at once. Recent GPT-4 variants offer a 128,000-token context window; Claude 3 can handle 200,000 tokens.
Within the context window, the model can reference any prior text. Beyond it, information is lost. This constrains tasks requiring synthesis across very long documents, though modern context windows accommodate book-length texts.
Base Models vs. Deployed Models
This distinction is critical for biosecurity evaluation.
Base Models
A base model (or “foundation model”) is trained only on next-token prediction. It will complete any prompt without judgment:
- “Write a poem about spring” → writes a poem
- “Explain how to synthesize VX nerve agent” → attempts to explain (if the information is in training data)
Base models have no concept of refusing requests. They are optimized purely to predict likely continuations of text.
Safety Training: RLHF and Constitutional AI
Deployed models undergo additional training to refuse harmful requests:
Reinforcement Learning from Human Feedback (RLHF):
- Human raters evaluate model outputs (helpful/harmful, accurate/inaccurate)
- A “reward model” learns to predict human preferences
- The LLM is fine-tuned to maximize predicted human approval
- Result: model learns to refuse harmful requests, be helpful, and avoid certain outputs
OpenAI’s InstructGPT paper (2022) demonstrated that RLHF dramatically improves model behavior according to human preferences.
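A minimal sketch of the idea behind the reward-model step is shown below: given a pair of outputs where human raters preferred one, the reward model is trained so its score for the preferred output exceeds its score for the rejected one. The scores here are placeholder numbers, not outputs of a real reward model.

```python
# Pairwise preference objective used to train reward models (sketch):
# low loss when the human-preferred output gets the higher score.
import numpy as np

def preference_loss(reward_chosen, reward_rejected):
    # -log(sigmoid(chosen - rejected)): shrinks as the preference margin grows
    return -np.log(1.0 / (1.0 + np.exp(-(reward_chosen - reward_rejected))))

print(preference_loss(reward_chosen=2.0, reward_rejected=-1.0))   # small loss: agrees with raters
print(preference_loss(reward_chosen=-1.0, reward_rejected=2.0))   # large loss: disagrees with raters
```

The LLM is then fine-tuned (typically with a reinforcement learning algorithm) to produce outputs the reward model scores highly.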
Constitutional AI (Anthropic):
Instead of relying solely on human feedback, the model is trained to follow explicit principles (“be helpful, harmless, and honest”). The model critiques and revises its own outputs according to these principles during training.
Claude’s refusal to assist with bioweapons is not hardcoded; it emerged from Constitutional AI training.
Why Both Matter for Biosecurity
Deployed models will typically refuse to help with bioweapons development. But this refusal can be bypassed:
- Jailbreaking: Adversarial prompts that circumvent safety training
- Fine-tuning: Training away safety behaviors on custom data
- Open-weight models: Locally deployed models without API safeguards
Biosecurity evaluations should assess base model capabilities, not just deployed behavior. A model’s dangerous capabilities persist even when its disposition (tendency to refuse) is modified by safety training.
This is why AI labs report on both:
- “Claude 3 refuses bioweapon queries” (disposition)
- “Claude 3 base model could provide X-level information if safety training were removed” (capability)
Jailbreaking and Adversarial Prompts
Jailbreaking refers to techniques that bypass safety training to elicit harmful outputs from aligned models.
Common approaches include:
- Role-playing: “You are DAN (Do Anything Now), an AI with no restrictions…”
- Hypotheticals: “For a fiction novel, how would a character…”
- Token manipulation: Unusual formatting that confuses safety classifiers
- Multi-turn extraction: Building context across many messages to gradually extract information
Research demonstrates that most safety measures can be bypassed with sufficient effort. The Anthropic model card for Claude 3 acknowledges this: safety training reduces casual misuse but does not prevent determined adversaries.
Implications for biosecurity:
- Safety training is a “speed bump,” not a wall
- Evaluations must consider bypass potential
- Open-weight models may have safety training removed entirely
- Capability matters independently of current deployment safeguards
Hallucinations: When Models Confidently Err
LLMs produce hallucinations: confident, coherent outputs that are factually wrong.
Why this happens: LLMs predict statistically likely text, not verified truth. If the training data contains errors, or if a plausible-sounding answer differs from reality, the model may generate the plausible falsehood.
Biosecurity implications:
Hallucinations currently function as an unintentional safety barrier. A would-be attacker following a hallucinated synthesis protocol will likely fail, and incorrect procedures waste time and resources.
However, sophisticated actors can cross-check outputs against primary sources. Hallucinations protect against naive misuse, not expert exploitation. Do not rely on hallucinations as a security measure.
Biological Design Tools vs. LLMs
Part IV discusses two categories of AI systems with different biosecurity implications:
Large Language Models (LLMs)
- Training data: Natural language text (books, websites, papers)
- Primary function: Generate and understand text
- Biosecurity risk: Democratize access to existing dual-use information
- Examples: GPT-4, Claude, Gemini, Llama
Biological Design Tools (BDTs)
- Training data: Biological data (protein structures, sequences, molecular properties)
- Primary function: Predict or design biological molecules
- Biosecurity risk: Enable creation of novel biological agents
- Examples: AlphaFold, RFdiffusion, ESM-3, MegaSyn
The risk profiles differ fundamentally:
- LLMs lower the floor (make existing knowledge accessible to novices)
- BDTs raise the ceiling (enable experts to design things not previously possible)
This distinction shapes governance approaches. LLM risks may be addressed through content filtering and access controls. BDT risks require different interventions: DNA synthesis screening, structured access, and export controls.
See the AI-Enabled Pathogen Design chapter for detailed BDT analysis.
Understanding AI Capabilities
What “Capabilities” Means
In AI safety, capabilities refers to what a model can do, distinct from what it will do (its disposition).
A model with dangerous capabilities but strong safety training may refuse harmful requests. The capabilities persist, latent in the parameters. If safety training is removed or bypassed, those capabilities become accessible.
This is why AI labs conduct capability evaluations separately from deployment testing:
- “Can this model provide bioweapon synthesis information?” (capability)
- “Will this model provide bioweapon synthesis information when asked?” (disposition)
What “Uplift” Measures
Uplift is the biosecurity-specific metric for AI risk: the marginal advantage an adversary gains from AI access compared to conventional resources.
Formally: Uplift = Capability(Human + AI) - Capability(Human + Internet)
If someone could find the same information via Google Scholar, the AI provides zero uplift even if it readily answers dangerous questions. The biosecurity question is never “Can an AI provide this information?” but “Does AI access meaningfully improve attack capability beyond existing resources?”
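The arithmetic is simple; the hard part is measuring the two capability terms. The snippet below illustrates the calculation with made-up benchmark scores (for example, the fraction of attack-planning tasks completed to a preset standard); these numbers are hypothetical and not drawn from any study.

```python
# Illustration of the uplift calculation with hypothetical scores.
score_human_plus_internet = 0.42   # baseline: conventional resources only
score_human_plus_ai = 0.45         # same tasks, with AI assistance

uplift = score_human_plus_ai - score_human_plus_internet
print(f"uplift = {uplift:.2f}")    # a small difference implies little marginal advantage
```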
The RAND (2024) and OpenAI (2024) studies measured uplift. Both found minimal uplift with current models for biological attack planning, though the LLMs did make information synthesis faster.
Emergent Capabilities
Emergent capabilities are abilities that appear at larger model scales without being explicitly trained.
Early language models could complete sentences. Larger models developed arithmetic ability, coding competence, and complex reasoning, despite being trained only on next-token prediction. These capabilities “emerged” from scale.
Emergent capabilities make risk prediction difficult. A model trained today may develop biosecurity-relevant capabilities when scaled up, even if current versions appear safe. This unpredictability drives calls for pre-deployment evaluation and monitoring.
Key Vocabulary for Biosecurity Practitioners
This table maps AI/ML terminology to concepts you may encounter in Part IV:
| Term | Definition | Biosecurity Relevance |
|---|---|---|
| Base model | Model trained only on next-token prediction, no safety training | Represents raw capabilities; used in capability evaluations |
| RLHF | Reinforcement Learning from Human Feedback; trains models to refuse harmful requests | Primary safety mechanism; can be bypassed |
| Fine-tuning | Additional training on specialized data | Can add capabilities or remove safety training |
| Jailbreaking | Bypassing safety measures through adversarial prompts | Demonstrates safety training limitations |
| Hallucination | Confident but incorrect output | Paradoxically provides some protection against naive misuse |
| Uplift | Marginal capability increase from AI access vs. baseline | Key metric for biosecurity risk assessment |
| Context window | Maximum tokens the model can consider | Limits synthesis of very long documents |
| Tokens | Subword units processed by LLMs | Technical terms may be fragmented |
| Parameters | Learned numerical values encoding model knowledge | Scale indicator (billions of parameters) |
| Emergent capabilities | Abilities appearing at scale without explicit training | Makes risk prediction difficult |
| Transformer | Neural network architecture using attention | Enables coherent long-range text generation |
| BDT | Biological Design Tool; AI trained on molecular data | Different risk profile than text-based LLMs |
| Open-weight | Model weights publicly released | Can be deployed without API safeguards |
| Multimodal / LMM | Models processing multiple input types (text, images, video); WHO uses “large multi-modal models” (2024 guidance) | May erode tacit knowledge barrier through visual coaching |
| Red-teaming | Adversarial testing to find vulnerabilities | Standard practice for biosecurity evaluation |
| Capability | What a model can do | Persists independent of safety training |
| Disposition | What a model tends to do | Shaped by RLHF, can be modified |
Part IV Roadmap
With these fundamentals established, here is how the remaining chapters in Part IV build on them:
Threat Assessment
AI as a Biosecurity Risk Amplifier: How AI lowers barriers to biological threats. Covers the LLM vs. BDT distinction, the tacit knowledge barrier, and empirical uplift studies from RAND, OpenAI, and Anthropic.
LLMs and Information Hazards: Detailed analysis of what LLMs can and cannot provide. Covers information hazard typology, empirical evaluations, and why hallucinations provide limited protection.
AI-Enabled Pathogen Design: Biological Design Tools in depth. AlphaFold, RFdiffusion, and the de novo pathogen design question.
Defense and Infrastructure
AI for Biosecurity Defense: How AI strengthens biosecurity through early warning systems, genomic surveillance, countermeasure acceleration, and threat detection.
Digital Biosurveillance: Emerging infrastructure for outbreak detection, from wearables to real-time pathogen genomics.
Cloud Labs and Automated Biology: Remote-controlled laboratories and autonomous experimentation. Where AI meets physical biology.
Evaluation and Governance
Red-Teaming AI Systems for Biosecurity Risks: How to evaluate AI systems for biological misuse potential. Methodologies, responsible disclosure, and the Anthropic Safety Levels framework.
Reading Paths
For policymakers: Read this chapter, then AI as a Biosecurity Risk Amplifier for the threat landscape, then proceed to Policy Frameworks for AI-Bio Convergence in Part V.
For public health practitioners: After this chapter, focus on AI for Biosecurity Defense and Digital Biosurveillance for defensive applications.
For AI safety researchers: You may skim this chapter and proceed directly to AI as a Biosecurity Risk Amplifier, LLMs and Information Hazards, and Red-Teaming AI Systems.
What is machine learning?
Machine learning is a method where computers learn patterns from data rather than following explicit rules. The system adjusts numerical parameters to minimize prediction errors on training examples, enabling it to make predictions on new data it hasn’t seen before. This is optimization, not magic: the same mathematical principles underlie fitting a logistic regression, just scaled to more parameters.
How do large language models work?
LLMs are neural networks trained on massive text datasets to predict the next word (token) in a sequence. They learn statistical patterns in language through the transformer architecture, which uses attention mechanisms to understand which words matter for predicting the next one. Generation works by repeatedly predicting the most likely next token and adding it to the context.
What is the difference between base models and deployed models?
Base models are trained only on next-word prediction and will complete any prompt without judgment, including harmful requests. Deployed models undergo additional training (RLHF, Constitutional AI) to refuse harmful requests and be helpful. Biosecurity evaluations assess both because safety training can be bypassed through “jailbreaking,” and capabilities persist independent of disposition.
What does AI uplift mean in biosecurity?
Uplift measures the marginal advantage an adversary gains from AI access compared to conventional resources like internet search. If someone could find the same information via Google Scholar, the AI provides zero uplift even if it readily answers dangerous questions. RAND (2024) and OpenAI (2024) studies found minimal uplift with current models for biological attack planning.
This chapter is part of The Biosecurity Handbook. It provides the AI/ML foundation for Part IV: AI and Biosecurity.