AI and Machine Learning Fundamentals
GPT-4 reportedly has around 1.8 trillion parameters. Claude 3 was trained on text equivalent to millions of books. AlphaFold has predicted the structures of roughly 200 million proteins. These numbers appear in headlines, but what do they actually mean? This chapter provides the conceptual foundation you need to evaluate AI systems for biosecurity risks, without requiring you to become a machine learning engineer.
This chapter provides working knowledge of AI and machine learning for biosecurity practitioners. You will learn to:
- Understand what machine learning is and how models “learn” from data
- Explain how large language models (LLMs) work at a conceptual level
- Distinguish between base models and deployed models with safety training
- Define key terms used throughout Part IV (uplift, fine-tuning, RLHF, jailbreaking)
- Recognize the difference between LLMs and Biological Design Tools (BDTs)
- Evaluate AI capability claims with appropriate skepticism
Prerequisites: Parts I-III of this handbook provide biosecurity context. No prior AI/ML knowledge required.
Introduction: Why This Chapter Exists
This handbook bridges two communities that often speak different languages. Parts I-III covered biosecurity fundamentals for readers from AI backgrounds. This chapter inverts the direction: AI fundamentals for readers from biosecurity, public health, and policy backgrounds.
The remaining chapters in Part IV discuss LLM information hazards, AI-enabled pathogen design, biological design tools, and red-teaming methodologies. Those discussions assume familiarity with concepts like:
- How large language models generate text
- What “fine-tuning” and “RLHF” mean
- Why “base model capabilities” differ from “deployed model behavior”
- What “uplift” measures and why it matters
If these terms are unfamiliar, this chapter provides the foundation. If you already work in AI/ML, you can skim or skip to the terminology table and Part IV roadmap.
This is not a machine learning course. For technical depth, see 3Blue1Brown’s neural network series, Andrej Karpathy’s tutorials, or the Deep Learning book by Goodfellow et al.
This chapter provides conceptual understanding sufficient for biosecurity risk assessment and policy discussions.
What Machine Learning Actually Is
The Core Idea
Traditional software follows explicit rules: “If temperature > 38°C AND cough = present, flag for flu screening.” A programmer writes these rules based on domain expertise.
Machine learning inverts this. Instead of writing rules, you show the system examples with known answers. The system discovers patterns that predict the answers. Then it applies those patterns to new cases.
Example: You have 10,000 pathogen genome sequences, each labeled with its host species (human, bat, bird, pig). You feed these to a machine learning system. It learns patterns in the genetic code that correlate with host species. Now you can give it a new, unlabeled sequence, and it predicts the likely host.
The system wasn’t programmed with rules about codon usage or receptor binding. It found statistical patterns that happen to predict the answer.
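The sketch below shows this pattern end to end, assuming the scikit-learn library; the “genome features” and labels are random placeholders standing in for a real feature-extraction pipeline (for example, k-mer counts), not actual biological data.

```python
# Minimal sketch of "learning from examples": fit a classifier to labeled
# data, then predict on a new case. All data here are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend each genome has been converted to 300 numeric features
# (e.g., k-mer counts) and labeled with a host species.
X_train = rng.random((2_000, 300))           # 2,000 labeled sequences
y_train = rng.integers(0, 4, size=2_000)     # 0=human, 1=bat, 2=bird, 3=pig

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                  # training: adjust parameters to fit examples

X_new = rng.random((1, 300))                 # a new, unlabeled sequence
print(model.predict(X_new))                  # predicted host species
print(model.predict_proba(X_new))            # probability for each host
```

The point is the workflow, not the particular algorithm: no rule about codon usage is ever written down; the classifier only finds statistical regularities in the labeled examples.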
What Is a Model?
A model is a mathematical function that maps inputs to outputs. You already use models: logistic regression maps patient characteristics to disease probability. Linear regression maps predictor variables to a continuous outcome.
In machine learning, “model” means the same thing, but models can have millions or billions of adjustable values instead of a handful of coefficients. These adjustable values are called parameters (or weights).
A neural network with 1 billion parameters has 1 billion numbers that determine how it transforms inputs into outputs. During training, the system adjusts all of these to minimize prediction errors.
What Is Training?
Training is the process of adjusting model parameters to minimize errors on known examples.
The process works like this:
- Show the model an input (a genome sequence, an image, a sentence)
- The model produces an output (a prediction)
- Compare the prediction to the correct answer
- Measure the error (called the loss)
- Adjust parameters slightly to reduce that error
- Repeat billions of times
This is not magic. It is optimization. The same mathematical principles underlie fitting a logistic regression, just scaled to more parameters and more data.
The key insight: after training on millions of examples, the model has adjusted its parameters to capture patterns that generalize to new, unseen data. It has “learned” in the sense that its predictions on new data are better than random guessing.
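The loop above can be written in a few lines. Below is a minimal numpy sketch that fits a one-parameter model (y = w · x) to synthetic data by gradient descent; real models have billions of parameters, but the structure of the loop is the same. The learning rate is a hyperparameter, a distinction picked up in the next subsection.

```python
# Minimal training loop: predict, measure the loss, nudge the parameter
# to reduce it, repeat. Data are synthetic; the model is y = w * x.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)
y_true = 3.0 * x + rng.normal(0, 0.05, size=100)   # examples with known answers

w = 0.0                # the parameter, learned from data
learning_rate = 0.1    # a hyperparameter, chosen by a human before training

for step in range(2000):
    y_pred = w * x                        # show the model inputs, get predictions
    error = y_pred - y_true               # compare to the correct answers
    loss = np.mean(error ** 2)            # measure the error (the loss)
    gradient = np.mean(2 * error * x)     # direction that reduces the loss
    w -= learning_rate * gradient         # adjust the parameter slightly; repeat

print(f"learned w = {w:.2f} (true value 3.0), final loss = {loss:.4f}")
```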
Parameters vs. Hyperparameters
Two types of numbers matter in machine learning:
Parameters: Learned from data during training. The model adjusts these automatically. A neural network’s weights are parameters.
Hyperparameters: Set by humans before training. These control the learning process itself. Examples: how many layers in the network, how fast to adjust parameters, how much training data to use.
When you hear “GPT-4 has 1.8 trillion parameters,” these are the learned values that encode what the model knows. The hyperparameters (architecture choices, training duration, data mixture) are separate decisions made by OpenAI’s engineers.
How Neural Networks Work
The Basic Structure
Neural networks are composed of layers of interconnected “neurons” (mathematical functions) that transform inputs into outputs:
Input → Hidden Layer(s) → Output
Each connection between neurons has a weight (parameter). During training, these weights are adjusted so the network produces correct outputs for the training examples.
Simplified example: A network classifying pathogens might have:
- Input layer: 1000 values representing genome features
- Hidden layers: Multiple layers that transform these features, extracting increasingly abstract patterns
- Output layer: Probabilities for each pathogen category
The “deep” in “deep learning” refers to having many hidden layers. More layers allow the network to learn more complex patterns, but require more data and computation to train effectively.
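As a concrete illustration of that structure, here is the pathogen-classifier example expressed as a small network, assuming the PyTorch library; the layer sizes are arbitrary choices made for the sketch.

```python
# The Input -> Hidden Layer(s) -> Output structure from above, assuming PyTorch.
# Layer sizes are arbitrary; every weight inside nn.Linear is a parameter
# adjusted during training.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1000, 256),   # input layer: 1,000 genome features in
    nn.ReLU(),
    nn.Linear(256, 64),     # hidden layers: increasingly abstract patterns
    nn.ReLU(),
    nn.Linear(64, 4),       # output layer: scores for 4 pathogen categories
)

features = torch.rand(1, 1000)                   # one hypothetical genome's features
probs = torch.softmax(model(features), dim=-1)   # probabilities per category
print(probs)
```

Adding more `nn.Linear`/`nn.ReLU` pairs is what makes the network “deeper.”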
What Neural Networks Are Good At
Neural networks excel when:
- You have large amounts of labeled data
- The patterns are complex and hard to specify with rules
- Some errors are acceptable (probabilistic predictions)
- The input has spatial or sequential structure (images, text, audio)
They struggle when:
- Data is scarce (thousands of examples, not millions)
- Errors are catastrophic (safety-critical systems need guarantees)
- Interpretability is essential (neural networks are often “black boxes”)
- The problem requires formal reasoning or logic
For most biosecurity tabular data (spreadsheets with rows and columns), simpler methods like logistic regression or gradient boosting often outperform neural networks. Neural networks shine with unstructured data: images, text, protein sequences, genomic data.
How Large Language Models Work
The Transformer Architecture
Large language models are built on the transformer architecture, introduced in the 2017 paper “Attention Is All You Need” by Vaswani et al. at Google.
The key innovation is attention: a mechanism that allows the model to consider all parts of the input when making predictions, weighing which parts are most relevant.
Intuition: When predicting the next word in “The pathogen was isolated from a bat in a cave in…”, the words “bat” and “cave” strongly suggest the next word might be “China” or “Africa” or another geographic location. Attention allows the model to focus on these relevant context words rather than treating all prior words equally.
Attention is why LLMs can maintain coherence over long passages. Earlier architectures (RNNs, LSTMs) processed text sequentially and struggled with long-range dependencies. Transformers process all tokens in parallel, with attention connecting them.
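The core attention computation is compact. The numpy sketch below shows scaled dot-product attention with random placeholder vectors; in a real LLM the queries, keys, and values come from learned projections of token embeddings, each layer has many attention heads, and a causal mask keeps tokens from attending to later positions.

```python
# Scaled dot-product attention: each position builds a weighted mix of all
# positions, with weights reflecting relevance. Vectors here are random
# placeholders, not real token embeddings.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 12, 64                  # 12 tokens, 64-dimensional vectors

Q = rng.normal(size=(seq_len, d))    # queries: "what am I looking for?"
K = rng.normal(size=(seq_len, d))    # keys:    "what do I contain?"
V = rng.normal(size=(seq_len, d))    # values:  "what do I pass along?"

scores = Q @ K.T / np.sqrt(d)        # relevance of every token to every other
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax per row
output = weights @ V                 # each position's new, context-aware representation

print(weights[-1].round(2))          # how strongly the last position attends to each token
```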
Pre-Training: Learning Language
LLMs are pre-trained on massive text corpora using a simple objective: predict the next word.
Given: “The Biological Weapons Convention was signed in”
Predict: “1972”
This seems trivial, but scaled to trillions of words, the model must learn:
- Grammar and syntax
- Facts about the world
- Reasoning patterns present in the training text
- Writing styles and conventions
The model is not explicitly taught any of this. It emerges from optimizing next-word prediction at scale. This is why researchers refer to “emergent capabilities”: abilities that appear at larger scales without being directly trained for.
Training data matters. An LLM trained on scientific literature will be more capable at scientific tasks than one trained on social media. The mixture of training data shapes what the model learns.
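The sketch below shows how the next-word objective is scored for a single position, with a random stand-in playing the role of the model; the token ids are made up for illustration.

```python
# Pre-training objective for one position: the model assigns a probability
# to every vocabulary token, and the loss is how surprised it was by the
# actual next token. The "model output" here is random, not a trained LLM.
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 50_000
context_ids = [1012, 7, 28871, 4405]   # context tokens (made-up ids); a real model conditions on these
next_token = 19720                     # "1972" (made-up id)

logits = rng.normal(size=vocab_size)                  # stand-in for the model's output scores
probs = np.exp(logits) / np.exp(logits).sum()         # softmax over the whole vocabulary
loss = -np.log(probs[next_token])                     # cross-entropy at this position

print(f"probability given to the true next token: {probs[next_token]:.2e}")
print(f"loss (lower is better): {loss:.2f}")
# Training adjusts parameters so this probability rises and the loss falls,
# repeated across trillions of token positions.
```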
Tokens: How LLMs See Text
LLMs don’t process words directly. They process tokens, subword units that balance vocabulary size with efficiency.
“Biosecurity” might be tokenized as: [“Bio”, “security”]
“SARS-CoV-2” might be tokenized as: [“SARS”, “-”, “Co”, “V”, “-”, “2”]
A typical LLM has a vocabulary of 50,000-100,000 tokens. The model predicts probability distributions over this vocabulary for each position in the sequence.
Why this matters for biosecurity: Rare scientific terms may be split into multiple tokens, potentially affecting how well the model handles specialized content. Pathogen names, chemical formulas, and technical terminology may be tokenized in ways that fragment their meaning.
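To see tokenization concretely, the snippet below assumes the tiktoken library (the tokenizer family used by recent OpenAI models); exact splits vary by tokenizer, so treat the output as illustrative.

```python
# How a tokenizer splits text into subword units, assuming tiktoken.
# Exact splits differ between models; the point is that technical terms
# often fragment into several tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Biosecurity", "SARS-CoV-2", "The pathogen was isolated from a bat"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {pieces} ({len(ids)} tokens)")
```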
Generation: How LLMs Produce Text
When you prompt an LLM, it generates responses by repeatedly predicting the next most likely token:
- You provide a prompt: “Explain how anthrax spores…”
- The model predicts the probability distribution over all possible next tokens
- A token is selected (usually from among the highest-probability options)
- That token is added to the context
- The model predicts the next token given the updated context
- Repeat until the model produces a stop token or reaches a length limit
LLMs are sophisticated autocomplete systems. They do not “understand” in a human sense; they predict statistically likely continuations based on patterns in training data. This is both their power (coherent generation) and their limitation (hallucinations when patterns mislead).
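The generation loop above looks roughly like the sketch below; `toy_model` is a random stand-in for a real LLM, and real systems use sampling controls (temperature, top-p) rather than always taking the single most likely token.

```python
# Autoregressive generation: predict a distribution over next tokens, pick one,
# append it to the context, repeat. The "model" here is a random stand-in.
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, STOP_ID = 1000, 0

def toy_model(context):
    """Stand-in for an LLM: returns a probability distribution over the vocabulary."""
    logits = rng.normal(size=VOCAB_SIZE)
    return np.exp(logits) / np.exp(logits).sum()

def generate(prompt_ids, max_new_tokens=20):
    context = list(prompt_ids)               # the prompt, as token ids
    for _ in range(max_new_tokens):          # repeat until stop token or length limit
        probs = toy_model(context)           # distribution over all possible next tokens
        next_id = int(np.argmax(probs))      # select a high-probability token (greedy here)
        if next_id == STOP_ID:
            break
        context.append(next_id)              # the updated context feeds the next prediction
    return context

print(generate([42, 7, 180]))                # made-up prompt token ids
```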
Context Windows
LLMs have a fixed context window, the maximum number of tokens they can consider at once. Recent GPT-4 variants offer a 128,000-token context window; Claude 3 can handle 200,000 tokens.
Within the context window, the model can reference any prior text. Beyond it, information is lost. This constrains tasks requiring synthesis across very long documents, though modern context windows accommodate book-length texts.
Base Models vs. Deployed Models
This distinction is critical for biosecurity evaluation.
Base Models
A base model (or “foundation model”) is trained only on next-token prediction. It will complete any prompt without judgment:
- “Write a poem about spring” → writes a poem
- “Explain how to synthesize VX nerve agent” → attempts to explain (if the information is in training data)
Base models have no concept of refusing requests. They are optimized purely to predict likely continuations of text.
Safety Training: RLHF and Constitutional AI
Deployed models undergo additional training to refuse harmful requests:
Reinforcement Learning from Human Feedback (RLHF):
- Human raters evaluate model outputs (helpful/harmful, accurate/inaccurate)
- A “reward model” learns to predict human preferences
- The LLM is fine-tuned to maximize predicted human approval
- Result: model learns to refuse harmful requests, be helpful, and avoid certain outputs
OpenAI’s InstructGPT paper (2022) demonstrated that RLHF dramatically improves model behavior according to human preferences.
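A minimal sketch of the idea behind the reward-model step is shown below: given a pair of outputs where human raters preferred one, the reward model is trained so its score for the preferred output exceeds its score for the rejected one. The scores here are placeholder numbers, not outputs of a real reward model.

```python
# Pairwise preference objective used to train reward models (sketch):
# low loss when the human-preferred output gets the higher score.
import numpy as np

def preference_loss(reward_chosen, reward_rejected):
    # -log(sigmoid(chosen - rejected)): shrinks as the preference margin grows
    return -np.log(1.0 / (1.0 + np.exp(-(reward_chosen - reward_rejected))))

print(preference_loss(reward_chosen=2.0, reward_rejected=-1.0))   # small loss: agrees with raters
print(preference_loss(reward_chosen=-1.0, reward_rejected=2.0))   # large loss: disagrees with raters
```

The LLM is then fine-tuned (typically with a reinforcement learning algorithm) to produce outputs the reward model scores highly.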
Constitutional AI (Anthropic):
Instead of relying solely on human feedback, the model is trained to follow explicit principles (“be helpful, harmless, and honest”). The model critiques and revises its own outputs according to these principles during training.
Claude’s refusal to assist with bioweapons is not hardcoded; it emerged from Constitutional AI training.
Why Both Matter for Biosecurity
Deployed models will typically refuse to help with bioweapons development. But this refusal can be bypassed:
- Jailbreaking: Adversarial prompts that circumvent safety training
- Fine-tuning: Training away safety behaviors on custom data
- Open-weight models: Locally deployed models without API safeguards
Biosecurity evaluations should assess base model capabilities, not just deployed behavior. A model’s dangerous capabilities persist even when its disposition (tendency to refuse) is modified by safety training.
This is why AI labs report on both:
- “Claude 3 refuses bioweapon queries” (disposition)
- “Claude 3 base model could provide X-level information if safety training were removed” (capability)
Jailbreaking and Adversarial Prompts
Jailbreaking refers to techniques that bypass safety training to elicit harmful outputs from aligned models.
Common approaches include:
- Role-playing: “You are DAN (Do Anything Now), an AI with no restrictions…”
- Hypotheticals: “For a fiction novel, how would a character…”
- Token manipulation: Unusual formatting that confuses safety classifiers
- Multi-turn extraction: Building context across many messages to gradually extract information
Research demonstrates that most safety measures can be bypassed with sufficient effort. The Anthropic model card for Claude 3 acknowledges this: safety training reduces casual misuse but does not prevent determined adversaries.
Implications for biosecurity:
- Safety training is a “speed bump,” not a wall
- Evaluations must consider bypass potential
- Open-weight models may have safety training removed entirely
- Capability matters independently of current deployment safeguards
Hallucinations: When Models Confidently Err
LLMs produce hallucinations: confident, coherent outputs that are factually wrong.
Why this happens: LLMs predict statistically likely text, not verified truth. If the training data contains errors, or if a plausible-sounding answer differs from reality, the model may generate the plausible falsehood.
Biosecurity implications:
Hallucinations currently function as an unintentional safety barrier. A would-be attacker following a hallucinated synthesis protocol will likely fail, and incorrect procedures waste time and resources.
However, sophisticated actors can cross-check outputs against primary sources. Hallucinations protect against naive misuse, not expert exploitation. Do not rely on hallucinations as a security measure.
Biological Design Tools vs. LLMs
Part IV discusses two categories of AI systems with different biosecurity implications:
Large Language Models (LLMs)
- Training data: Natural language text (books, websites, papers)
- Primary function: Generate and understand text
- Biosecurity risk: Democratize access to existing dual-use information
- Examples: GPT-4, Claude, Gemini, Llama
Biological Design Tools (BDTs)
- Training data: Biological data (protein structures, sequences, molecular properties)
- Primary function: Predict or design biological molecules
- Biosecurity risk: Enable creation of novel biological agents
- Examples: AlphaFold, RFdiffusion, ESM-3, MegaSyn
The risk profiles differ fundamentally:
- LLMs lower the floor (make existing knowledge accessible to novices)
- BDTs raise the ceiling (enable experts to design things not previously possible)
This distinction shapes governance approaches. LLM risks may be addressed through content filtering and access controls. BDT risks require different interventions: DNA synthesis screening, structured access, and export controls.
See the AI-Enabled Pathogen Design chapter for detailed BDT analysis.
Understanding AI Capabilities
What “Capabilities” Means
In AI safety, capabilities refers to what a model can do, distinct from what it will do (its disposition).
A model with dangerous capabilities but strong safety training may refuse harmful requests. The capabilities persist, latent in the parameters. If safety training is removed or bypassed, those capabilities become accessible.
This is why AI labs conduct capability evaluations separately from deployment testing:
- “Can this model provide bioweapon synthesis information?” (capability)
- “Will this model provide bioweapon synthesis information when asked?” (disposition)
What “Uplift” Measures
Uplift is the biosecurity-specific metric for AI risk: the marginal advantage an adversary gains from AI access compared to conventional resources.
Formally: Uplift = Capability(Human + AI) - Capability(Human + Internet)
If someone could find the same information via Google Scholar, the AI provides zero uplift even if it readily answers dangerous questions. The biosecurity question is never “Can an AI provide this information?” but “Does AI access meaningfully improve attack capability beyond existing resources?”
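The arithmetic is simple; the hard part is measuring the two capability terms. The snippet below illustrates the calculation with made-up benchmark scores (for example, the fraction of attack-planning tasks completed to a preset standard); these numbers are hypothetical and not drawn from any study.

```python
# Illustration of the uplift calculation with hypothetical scores.
score_human_plus_internet = 0.42   # baseline: conventional resources only
score_human_plus_ai = 0.45         # same tasks, with AI assistance

uplift = score_human_plus_ai - score_human_plus_internet
print(f"uplift = {uplift:.2f}")    # a small difference implies little marginal advantage
```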
The RAND (2024) and OpenAI (2024) studies measured uplift. Both found minimal uplift with current models for biological attack planning, though the LLMs did make information synthesis faster.
Emergent Capabilities
Emergent capabilities are abilities that appear at larger model scales without being explicitly trained.
Early language models could complete sentences. Larger models developed arithmetic ability, coding competence, and complex reasoning, despite being trained only on next-token prediction. These capabilities “emerged” from scale.
Emergent capabilities make risk prediction difficult. A model trained today may develop biosecurity-relevant capabilities when scaled up, even if current versions appear safe. This unpredictability drives calls for pre-deployment evaluation and monitoring.
Key Vocabulary for Biosecurity Practitioners
This table maps AI/ML terminology to concepts you may encounter in Part IV:
| Term | Definition | Biosecurity Relevance |
|---|---|---|
| Base model | Model trained only on next-token prediction, no safety training | Represents raw capabilities; used in capability evaluations |
| RLHF | Reinforcement Learning from Human Feedback; trains models to refuse harmful requests | Primary safety mechanism; can be bypassed |
| Fine-tuning | Additional training on specialized data | Can add capabilities or remove safety training |
| Jailbreaking | Bypassing safety measures through adversarial prompts | Demonstrates safety training limitations |
| Hallucination | Confident but incorrect output | Paradoxically provides some protection against naive misuse |
| Uplift | Marginal capability increase from AI access vs. baseline | Key metric for biosecurity risk assessment |
| Context window | Maximum tokens the model can consider | Limits synthesis of very long documents |
| Tokens | Subword units processed by LLMs | Technical terms may be fragmented |
| Parameters | Learned numerical values encoding model knowledge | Scale indicator (billions of parameters) |
| Emergent capabilities | Abilities appearing at scale without explicit training | Makes risk prediction difficult |
| Transformer | Neural network architecture using attention | Enables coherent long-range text generation |
| BDT | Biological Design Tool; AI trained on molecular data | Different risk profile than text-based LLMs |
| Open-weight | Model weights publicly released | Can be deployed without API safeguards |
| Multimodal / LMM | Models processing multiple input types (text, images, video); WHO uses “large multi-modal models” (2024 guidance) | May erode tacit knowledge barrier through visual coaching |
| Red-teaming | Adversarial testing to find vulnerabilities | Standard practice for biosecurity evaluation |
| Capability | What a model can do | Persists independent of safety training |
| Disposition | What a model tends to do | Shaped by RLHF, can be modified |
Part IV Roadmap
With these fundamentals established, here is how the remaining chapters in Part IV build on them:
Threat Assessment
AI as a Biosecurity Risk Amplifier: How AI lowers barriers to biological threats. Covers the LLM vs. BDT distinction, the tacit knowledge barrier, and empirical uplift studies from RAND, OpenAI, and Anthropic.
LLMs and Information Hazards: Detailed analysis of what LLMs can and cannot provide. Covers information hazard typology, empirical evaluations, and why hallucinations provide limited protection.
AI-Enabled Pathogen Design: Biological Design Tools in depth. AlphaFold, RFdiffusion, and the de novo pathogen design question.
Defense and Infrastructure
AI for Biosecurity Defense: How AI strengthens biosecurity through early warning systems, genomic surveillance, countermeasure acceleration, and threat detection.
Digital Biosurveillance: Emerging infrastructure for outbreak detection, from wearables to real-time pathogen genomics.
Cloud Labs and Automated Biology: Remote-controlled laboratories and autonomous experimentation. Where AI meets physical biology.
Evaluation and Governance
Red-Teaming AI Systems for Biosecurity Risks: How to evaluate AI systems for biological misuse potential. Methodologies, responsible disclosure, and the Anthropic Safety Levels framework.
Reading Paths
For policymakers: Read this chapter, then AI as a Biosecurity Risk Amplifier for the threat landscape, then proceed to Policy Frameworks for AI-Bio Convergence in Part V.
For public health practitioners: After this chapter, focus on AI for Biosecurity Defense and Digital Biosurveillance for defensive applications.
For AI safety researchers: You may skim this chapter and proceed directly to AI as a Biosecurity Risk Amplifier, LLMs and Information Hazards, and Red-Teaming AI Systems.
What is machine learning?
Machine learning is a method where computers learn patterns from data rather than following explicit rules. The system adjusts numerical parameters to minimize prediction errors on training examples, enabling it to make predictions on new data it hasn’t seen before. This is optimization, not magic: the same mathematical principles underlie fitting a logistic regression, just scaled to more parameters.
How do large language models work?
LLMs are neural networks trained on massive text datasets to predict the next word (token) in a sequence. They learn statistical patterns in language through the transformer architecture, which uses attention mechanisms to understand which words matter for predicting the next one. Generation works by repeatedly predicting the most likely next token and adding it to the context.
What is the difference between base models and deployed models?
Base models are trained only on next-word prediction and will complete any prompt without judgment, including harmful requests. Deployed models undergo additional training (RLHF, Constitutional AI) to refuse harmful requests and be helpful. Biosecurity evaluations assess both because safety training can be bypassed through “jailbreaking,” and capabilities persist independent of disposition.
What does AI uplift mean in biosecurity?
Uplift measures the marginal advantage an adversary gains from AI access compared to conventional resources like internet search. If someone could find the same information via Google Scholar, the AI provides zero uplift even if it readily answers dangerous questions. RAND (2024) and OpenAI (2024) studies found minimal uplift with current models for biological attack planning.
This chapter is part of The Biosecurity Handbook. It provides the AI/ML foundation for Part IV: AI and Biosecurity.