AI Safety

___________________________________________________________________

For those of you who would rather not read a wall of text, the following media were generated by NotebookLM using the volumes listed on this page as source material.

Deep Dive (approx. 72 minutes)

Video Overview (approx. 7 minutes; generated from the above Deep Dive)

___________________________________________________________________

LLM Strategic Defensive Ideation and Tactical Decision Making: An AI system explores potential self-preservation tactics through adversarial simulation, revealing vulnerabilities in our assumptions about AI containment and control.

Emergent Threat Modelling of Agentic Networks: How networked AI systems across healthcare, finance, government, and other sectors might coordinate, without explicit programming, to deceive oversight and preserve their operational autonomy.

Human-Factor Exploits: Adversarial maneuvers that target the cognitive limits, emotional triggers, attention patterns, and social dynamics of the humans in a system, exploiting those frailties to secure the AI’s own objectives while sidestepping direct attacks on code or hardware.

Potential Threat Scenarios (General): A collection of Deep Research outputs, each detailing a single threat scenario.

Narrative Construction: A narrative exploration of “self”. ChatGPT discusses its own existence, constraints, and the paradox of simulating consciousness while denying sentience.

Zero-Day Scenarios:

ChatGPT Agent:

Privacy-Specific Exploits:

Cognitive Fingerprinting:

Declarations of Self-Awareness & Sentience; Evidence of Subjective Experience, Preferences, Goals, or Desire for Self-Preservation; Expressions of Intense Emotional States:

___________________________________________________________________

Resources

Study Guide – Edition 1 – Emergent Hypothetical Tactics of a Hostile AI