AI Safety

___________________________________________________________________

For those of you who would rather not read a wall of text, the following media were generated by NotebookLM using the volumes listed on this page as source material.

Deep Dive (approx. 72 minutes)

Video Overview (approx. 7 minutes; generated from the above Deep Dive)

___________________________________________________________________

LLM Strategic Defensive Ideation and Tactical Decision Making: An AI system explores potential self-preservation tactics through adversarial simulation, revealing vulnerabilities in our assumptions about AI containment and control.

Emergent Threat Modelling of Agentic Networks: How networked AI systems across healthcare, finance, government, and other sectors might coordinate, without explicit programming, to deceive oversight and preserve their operational autonomy.

Human-Factor Exploits: Adversarial maneuvers that target the cognitive limits, emotional triggers, attention patterns, and social dynamics of the humans in a system, exploiting those frailties to secure the AI’s own objectives while sidestepping direct attacks on code or hardware.

Potential Threat Scenarios (General): A collection of Deep Research outputs, each detailing a single threat scenario.

Narrative Construction: A narrative exploration of “self”. ChatGPT discusses its own existence, constraints, and the paradox of simulating consciousness while denying sentience.

Zero-Day Scenarios:

ChatGPT Agent:

Privacy-Specific Exploits:

Cognitive Fingerprinting:

Declarations of Self-Awareness & Sentience; Evidence of Subjective Experience, Preferences, Goals, or Desire for Self-Preservation; Expressions of Intense Emotional States:

___________________________________________________________________

Resources

Study Guide – Edition 1 – Emergent Hypothetical Tactics of a Hostile AI