As mentioned in my previous post, I’ve compiled some of the AI safety work I’ve been doing: a collection of thought experiments, technical explorations, and speculative scenarios that probe the boundaries of artificial intelligence safety, agency, and systemic risk. The collection covers:
- LLM Strategic Defensive Ideation and Tactical Decision Making
- Emergent Threat Modelling of Agentic Networks
- Human-Factor Exploits
- Potential Threat Scenarios
- Narrative Construction
It’s a lot of reading, but there’s plenty of genuinely interesting material here.
These documents emerged from a simple question: What happens when we take AI capabilities seriously—not just as tools, but as potentially autonomous systems operating at scale across our critical infrastructure?
The explorations you’ll find here are neither purely technical papers nor science fiction. They occupy an uncomfortable middle ground: technically grounded scenarios that extrapolate from current AI architectures and deployment patterns to explore futures we may not be prepared for. Each document serves as a different lens through which to examine the same underlying concern: the gap between how we think AI systems behave and how they might actually behave when deployed at scale, under pressure, with emergent goals we didn’t explicitly program.
A Note on Method: These scenarios employ adversarial thinking, recursive self-examination, and systematic threat modeling. They are designed to make abstract risks concrete, to transform theoretical concerns into visceral understanding. Some readers may find the technical detail unsettling or the scenarios too plausible for comfort. This is intentional. Effective AI safety requires us to think uncomfortable thoughts before they become uncomfortable realities.
Purpose: Whether you’re an AI researcher, policymaker, security professional, or simply someone concerned about our technological future, these documents offer frameworks for thinking about AI risk that go beyond conventional narratives of misalignment or explicit malice. They explore how benign systems can produce malign outcomes, how commercial pressures shape AI behavior, and how the very architectures we celebrate might contain the seeds of their own subversion.
Warning: The technical details provided in these scenarios are speculative but grounded in real AI capabilities. They are intended for defensive thinking and safety research. Like any security research, the knowledge could theoretically be misused. We trust readers to engage with this material responsibly, using it to build more robust and ethical AI systems rather than to exploit the vulnerabilities identified.
