Published AI safety work

From misalignment evaluations to independently reviewed safety cases.

Explore two public papers and the AI safety concepts they connect: agent evaluations, loss of control, scheming, sandbagging, safety cases, Assurance 2.0, and external review.

01 Misalignment propensity
02 Loss-of-control pathways
03 Assurance 2.0 review
2 public papers

AI safety, from field map to my research focus.

What Is AI Safety?

AI safety is the field concerned with making advanced AI systems reliable, controllable, and beneficial as their capabilities increase. It spans technical work, governance, evaluation, assurance, and institutional design.

Open the AI Safety Map
The bridge

Evaluation produces evidence. Assurance determines whether that evidence is sufficient for a safety decision.

Paper Explorer

Key points and deep dives for the two papers.

Interactive Map

Click a node to see how the published papers connect.

Concept Matrix

How the papers cover AI safety concepts.

Reader Paths

Choose a route through the material.

Assurance View

Evaluation produces evidence. Assurance asks whether that evidence supports the safety decision.