Today I filed disclosures with GitHub Trust and Safety, Snyk, two upstream project maintainers, and Anthropic’s user-safety team regarding a live malware campaign inside the Claude Code skill ecosystem. Three weaponised repositories, two operator accounts, a consistent tradecraft pattern sophisticated enough that the operators have almost certainly deployed more of it than I found. The mechanics are not the point of this piece and I will not describe them here; I will note only that the attack vector exploits trust in Claude-adjacent community infrastructure, that a user installing what they believe is a legitimate skill ends up running obfuscated malware on their machine, and that the operators are iterating in real time. They rotated infrastructure ninety minutes before I filed my first disclosure. They pushed a timestamp-refresh commit on one of the weaponised repositories twenty-five minutes before I looked at it tonight. They know someone is watching; they are watching back.
I am not a security researcher. I am not a developer in any serious sense. Until a few weeks ago I had no working understanding of obfuscation pipelines, malware analysis, or threat disclosure. In January of this year I published Digital Exodus: Time to Call it Quits?, in which I argued that a critical cyber-attack targeting AI platforms and their users was not a matter of if but of when. I was writing then about my own exposure: the accumulated data-residue of 62,430 messages, a comprehensive map of my own intellectual and emotional interior sitting on corporate servers, waiting for the inevitable. The closing line of that piece was "call it paranoia if you like; I call it pattern recognition."
What I caught today is not the reckoning I warned about. It is smaller than that, and narrower, and it does not touch the accumulated-data problem at all. It is, however, the exact texture of the thing I warned was coming. It is what the reckoning feels like at the edges, in its minor iterations, when you are looking carefully enough to see the shapes moving in the water before you realise the predator saw you first. A coordinated attack campaign, inside the Claude ecosystem, targeting Claude users, using Claude-adjacent trust as the initial access vector, operated by people who are themselves almost certainly using Claude to build and iterate their delivery infrastructure. This is not speculation; this is a Friday evening in April 2026.
I caught it because I have been using the tool carefully enough, and building the cybersecurity scaffolding around it patiently enough, that when a thread presented itself I was equipped to pull it. That equipment is what the letter below is really about. The letter is addressed to Anthropic because Anthropic makes the tool that made the defensive work possible; it also made the tool that made the offensive work possible.
• • •
Below: the email to Anthropic's user-safety team, sent the evening of April 18, 2026. The attached technical brief is withheld from this post, because some of what made this work possible would be useful to the people I am trying to make it harder for; it is available to Anthropic on request.
Hi there,
My name is [redacted]. I'm a Claude user, but I wouldn't really call myself a developer (yet). I'm not a security researcher by training either; in fact, until a few weeks ago I had no working understanding of malware analysis, obfuscation pipelines, or the ins and outs of threat disclosure. What I have is an interest in using Claude aggressively, and over the past few weeks that interest has ended up pointing at a real security problem inside your own product ecosystem, which I want to bring to your attention directly.
Today I filed disclosures with GitHub Trust and Safety, Snyk, and the upstream project maintainers for a coordinated [redacted] campaign targeting the Claude Code skill ecosystem. The operators deploy [redacted] inside repositories that impersonate legitimate Claude Code skills. A user installing what they believe is a community skill ends up running an obfuscated [redacted] loader that stages and executes malicious code on their machine.
Three weaponised repositories are confirmed across two operator accounts, with a fourth (a dormant fork) of unverified status. The operators' tradecraft is consistent enough across the three repos that I'd bet there's more of it not yet surfaced. The attached brief catalogues the pattern in detail; the use of obscure [redacted] words as payload-directory names is distinctive enough that future operator infrastructure following the same convention should be findable with internal tooling. Operator identity is bound to a single Gmail address and a PGP key visible on signed commits; timezone attribution places them in [redacted]. Full IOCs, including operator emails, payload hashes, and account details, are in the attached technical brief.
I’m aware of the recent OX Security disclosure on MCP supply-chain vulnerabilities, and I want to be clear that my report is independent and unconnected to that work. I mention it only because the category is adjacent (ecosystem-adjacent threats exploiting trust in Claude-related surfaces) and because the timing means my report lands in a context where your team is already thinking about this class of problem. I hope that’s useful context rather than annoying.
I’m writing this because I think it matters how this disclosure happened, not just that it did. I ran the investigation using Claude Opus 4.7, across multiple sessions, with purpose-built tooling I developed for this kind of work. I built this tooling in a single evening. Claude did the actual static analysis of the [redacted], the repository audit reasoning, and the browser-driven reconnaissance against the operator accounts under a formalized privacy posture. I contributed judgment, scope discipline, and the willingness to keep building infrastructure rather than just asking questions; Claude contributed everything that required actually understanding code or the tradecraft itself.
The operators are using Claude-adjacent trust (their repos look like Claude Code skills, their payloads target Claude users) to get initial access. The investigation and disclosure were possible because I used Claude structured as a security research partner. Whatever asymmetry exists between people using Claude for harm and people using Claude for defence is going to widen if the defence side is left to happen accidentally. It needs to be supported deliberately.
I understand Opus 4.7 is positioned as a public-access testbed for cybersecurity capabilities that are stronger in models you haven't released yet. That's a hell of a bet. What I saw today makes me believe the bet can pay off, but only if users doing defensive work with Claude get the scaffolding, the support, and the institutional cover to keep doing it. The offensive side is already established, already iterating, and already shipping infrastructure to harm people through your own ecosystem.
The attached technical brief summarizes what we found: the tradecraft pattern catalogue, operator IOCs, and a set of suggested follow-up actions, both for this specific campaign and for the general class of threat it represents. I've kept the methodology deliberately vague in the brief, because some of the tooling that made this work possible would be useful to adversaries if generalized carelessly. If your team wants to go deeper on any of it (including the full technical detail, the evidence archive, the methodology itself), I'd welcome the conversation, and I'm happy to share what would be useful on request.
So, once again: I am no expert in code, but I care a great deal about what Claude is for and what it makes possible, in both directions. This is what just one careful user could do in a single day's work against a live campaign inside your ecosystem. There is more of this campaign, and more campaigns like it. They are actively evolving as you read these words, and the thing that will determine whether Claude's defensive value scales is whether Anthropic treats work like this as something to learn from and build on, or as an edge case to file and forget.
I’d very much welcome the conversation.
Sincerely,
Your Friendly Neighbourhood Claude User
• • •
The ask is simple: recognize that the defensive potential of Claude, deployed by users who are paying attention, is real; that it is not currently scaling at anything close to the rate at which the offensive side is scaling; and that the gap between those two rates is the variable that will determine whether the bet Anthropic has made on Opus 4.7 as a public-access cybersecurity testbed pays off or does not. I filed the letter because the campaign I caught today is an instance of what I wrote about in January, and because the instance is small enough that the response is still tractable. Not every instance will be.
Those of you who read the Digital Exodus piece in January and did nothing are not in a worse position today than you were then, but you are not in a better one either. The campaign I disclosed today is a data point in a much larger pattern that is evolving right now: offensive infrastructure targeting AI-adjacent trust is being built faster than defensive infrastructure, the asymmetry is visible at the level of individual users trying to do individual pieces of work, and the tools you are using are simultaneously the attack surface and the defensive instrument. There is no neutrality in this. The choice to use these systems without attention is a choice to be a target; the choice to use them with attention is a choice to carry some part of the defensive load yourself. We are stronger together, and AI literacy isn't just about knowing how to defend yourself.
It’s knowing how to defend each other.
I do not know if Anthropic will reply to the letter. I do not know if any of this will turn into anything beyond a Friday evening’s work that protected a handful of users from a campaign that would have eventually been caught by someone. What I know is that the conditions under which I caught it were specific, replicable, and not yet scaled; that the conditions under which the next one is caught will depend on whether anyone builds what needs to be built; and that the window during which this is still a tractable problem is closing.
Anthropic, as you know, has declined to ship Mythos to the public; instead, users get Opus 4.7, a less powerful model that nonetheless has greater cyber capabilities than its predecessors. This is curious to me. For now, I am choosing to interpret the strategy generously, but I must call out what I described above as "a hell of a bet".
You cannot train safeguards against real-world offensive campaigns using only synthetic adversaries in a sandbox. You need actual attackers running actual attacks against actual systems, under observation, producing the data needed to generate the safeguards. The only way to get that data at scale is to put the capability into the wild where the adversaries already are.
Shipping a model with materially elevated cyber-offensive capability to the general public, knowing that some portion of those users will be malicious actors running attacks, is an acceptance of harm as the price for this data. The malicious-actor activity generated by the public release is not a side effect Anthropic has simply chosen to tolerate; it is a condition of a research program designed to produce the safeguards that must exist before Mythos can be controlled, a model which, if we take the reporting at face value, is the first of many that represent a real existential threat.
The stated reason Mythos remains unreleased is that safeguards for it don't yet exist; meanwhile, public-tier access to a capability just below Mythos continues, and the offensive side of the attack economy is observably using public-tier Claude infrastructure to build delivery pipelines. Is this the true cost of AI safety?
Call it paranoia if you like.
I still call it pattern recognition.
