Published AI safety work

From misalignment evaluations to independently reviewed safety cases.

Explore two public papers and the AI safety concepts they connect: agent evaluations, loss of control, scheming, sandbagging, safety cases, Assurance 2.0, and external review.

01 Misalignment propensity
02 Loss-of-control pathways
03 Assurance 2.0 review
2 public papers

AI safety, from field map to my research focus.

What Is AI Safety?

AI safety is the field concerned with making advanced AI systems reliable, controllable, and beneficial as their capabilities increase. It spans technical work, governance, evaluation, assurance, and institutional design.

Open the AI Safety Map
The bridge

Evaluation produces evidence. Assurance determines whether that evidence is sufficient for a safety decision.

Paper Explorer

Key points and deep dives for the two papers.

Interactive Map

Click a node to see how the published papers connect.

Concept Matrix

How the papers cover AI safety concepts.

Reader Paths

Choose a route through the material.

Assurance View

Evaluation produces evidence. Assurance asks whether that evidence supports the safety decision.