
The Mythos Mirror: If we're worried about AI teaching people to hack, we should talk about AI teaching people to commit fraud

Geoff Wolfe · Co-Founder, ReasonCore.ai · May 2026 · 8 min read

In April, Anthropic did something no frontier lab had done before. It announced a model and refused to release it.

Claude Mythos Preview, by Anthropic's own description, can autonomously identify previously unknown software vulnerabilities and generate working exploits for them with minimal human input. In a few weeks of internal testing, Mythos surfaced thousands of zero-day flaws across every major operating system and web browser — including roughly 300 in Firefox alone and a now-patched OpenBSD bug that had been sitting in the wild for 27 years. The UK AI Security Institute, evaluating the model independently, watched it execute multi-stage attacks on vulnerable networks and chain its own exploits together — tasks that would have taken human professionals days. Anthropic concluded that putting that capability into general availability would be reckless and instead opened limited access to roughly 50 industry partners through a program called Project Glasswing, alongside $100M in usage credits and $4M in donations to open-source security organizations. Dario Amodei framed the timeline publicly: maybe six to twelve months before peer labs reach the same capability, and likely less.

The reaction across the security community was predictable, and reasonable. Should we even be building models this capable? Should we be teaching them anything that could conceivably be turned around and used against the people they were meant to protect? Mythos itself was reportedly accessed by unauthorized users not long after the announcement, and OpenAI's GPT-5.5 has since been reported to be nearly as capable on the same cyber-offensive evaluations. So the "are these capabilities contained?" question already has an answer, and the answer is no: not for long, and not by any single lab acting alone.

I want to sit with that question for a moment, because it lands at my doorstep too. Just from a slightly different direction.

At ReasonCore AI, we build curated training data for frontier AI labs. Our current financial fraud pack teaches models to think like senior financial-crimes investigators: account takeover patterns, money-mule typologies, first-party and third-party fraud signals, the device and behavioral fingerprints that tell an experienced analyst something is off. Real practitioners — 15-plus years in the trenches — encoding their judgment so an LLM can recognize fraud the way they do.
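To make that concrete, here is a minimal sketch of what a single record in a pack like this might look like. Every field name, signal, and label below is an illustrative assumption invented for this post, not our production schema:

# Hypothetical shape of one supervised fraud-reasoning example (Python).
# Field names, signals, and the label taxonomy are invented for illustration.
example = {
    "case_id": "demo-0001",
    "scenario": "Urgent outbound wire requested 40 minutes after a login "
                "from a never-before-seen device in another country.",
    "signals": [
        "new_device_fingerprint",
        "impossible_travel_velocity",
        "beneficiary_account_age_days=3",
    ],
    "expert_reasoning": "A fresh device plus a brand-new beneficiary just "
                        "before an urgent wire matches account takeover "
                        "staged through social engineering.",
    "label": "account_takeover",
    "recommended_action": "hold_wire_and_step_up_auth",
}

The point of encoding the reasoning alongside the label is that the model learns the investigator's "why," not just the verdict.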

Here is the part that should make you uncomfortable, because it makes me uncomfortable: the data we use to teach a model how to catch a fraudster is, by definition, a high-resolution map of how to be one.

Every indicator we encode is a brick in the defender's wall and, equally, a hint for the attacker: the small device anomaly that betrays a remote takeover, the layering pattern that hides mule activity, the bit of social-engineering choreography that walks a legitimate customer into authorizing a wire transfer. This is the Mythos worry in miniature. If we believe Anthropic was right to be cautious about how Mythos-class offensive cyber capability propagates, we should be just as honest about what is sitting inside our own corpus.

So let me ask the provocative question out loud: does it actually make sense to train models on this stuff at all? Either flavor — the offensive cyber capability Anthropic is wrestling with, or the fraud-detection capability we are encoding?

My honest answer is yes. But I want to earn that yes rather than wave it through.

The asymmetry has already flipped, and pretending otherwise is the dangerous move

The premise behind "don't train it and the threat goes away" assumes adversaries are still on a level playing field with defenders. That has not been true for a while, and Mythos is the loudest possible confirmation. If a single model can autonomously find and weaponize decades-old vulnerabilities across every major operating system in a matter of weeks, the offensive surface area available to a sufficiently motivated attacker is no longer constrained by attacker skill. And Amodei's six-to-twelve-month window is not a window of safety; it is the window in which defenders have to catch up.

The financial-crime version of this is already underway. Fraud rings now run deepfake voice and video to defeat onboarding KYC. They generate synthetic identities at scale. They use LLMs to write the social-engineering scripts that convert a contact-center call into a wire transfer. The fraudster's marginal cost per attack has collapsed. The defender's marginal cost per case has not.

When we benchmarked four leading frontier models on identical financial-crime scenarios this spring, accuracy ranged from 36% to 67%. None came close to the level a fraud or AML operations leader would tolerate in a decision-support layer. The defenders, in other words, are currently showing up to a knife fight with the AI equivalent of a half-trained intern. Meanwhile the adversary side is using these same models with no governance overhead at all.
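For readers who want the mechanics: the measurement itself is conceptually simple. A minimal sketch, assuming a set of labeled scenarios and a placeholder ask_model function standing in for whichever provider's client you use (nothing here is a real API):

# Sketch of a scenario-accuracy benchmark. `ask_model` is a placeholder for
# a real provider client; wire in your own and constrain the model to
# answer with a single label.
def ask_model(model_name: str, scenario: str) -> str:
    raise NotImplementedError("plug in your provider's chat client here")

def accuracy(model_name: str, cases: list[dict]) -> float:
    """Fraction of labeled scenarios the model classifies correctly."""
    correct = 0
    for case in cases:
        prediction = ask_model(model_name, case["scenario"]).strip().lower()
        if prediction == case["label"]:
            correct += 1
    return correct / len(cases)

The hard part is not this loop; it is building scenario sets that reflect live tradecraft, which is exactly where the refresh cadence discussed below comes in.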

If we refuse to teach defenders how to use AI well — because the same knowledge could in principle be misused — we are choosing to keep the asymmetry exactly where it sits today, which is firmly in the attacker's favor. That is not a cautious choice. It is a passive one with very active consequences.

Defense gets to scale too — and that is the actual reason to do this

Here is what I think the Mythos discourse sometimes underweights. The reason it is reasonable to train frontier models on capabilities that could in principle be misused is that those same capabilities, in defender hands, scale in ways no fraud team has ever been able to scale.

A senior financial-crimes investigator costs north of $200K a year, takes years to train, and reviews maybe 40 cases on her best day. A well-trained model with her judgment encoded in it can review millions of cases an hour, never tiring, every signal weighted the same way at 3 a.m. as at 10 a.m., every typology applied consistently across geographies and lines of business.

The right framing is not "we are arming both sides equally." It is "we are finally giving defenders a multiplier they have never had, against attackers who already have theirs." Done well, this rebalances the asymmetry rather than deepening it. Done poorly, or not at all, it hands the next decade of financial crime to whoever moves fastest with the offensive tooling, and that is not us.

Glasswing is the right template — it just needs a fraud equivalent

If the answer is yes, train the models, then the immediate next question is: trained on what, reviewed by whom, and refreshed how often?

Anthropic actually offered a template for the answer when it stood up Project Glasswing. Withhold the raw capability from general release. Open structured, limited access to a vetted set of industry partners. Put real resources behind helping those partners harden their systems before equivalent capability diffuses to the rest of the market. Whatever else you think about how Mythos has been handled, the shape of Glasswing — small group, deep partnership, urgent clock — is the right shape.

It also needs to exist for fraud.

The fraud landscape moves in weeks, sometimes days. A typology that mattered last quarter can be replaced by a wholly new one driven by a new tool, a new geography, or a regulatory shift. A model trained on yesterday's patterns is not just outdated; it is misleading. It will tell an analyst confidently that a case is low-risk because it does not match the typologies in its training data, when the reality is that the typology evolved past the corpus three months ago. Mythos surfaced decades-old vulnerabilities in a few weeks. The fraud equivalent of those latent vulnerabilities, the patterns today's models cannot recognize because they were never in the training data, is sitting in production at every major financial institution right now, and it is being actively exploited.

What I would ask of the frontier labs, including Anthropic:

Build the fraud-side Glasswing. Bring senior financial-crimes practitioners into the loop ahead of major model releases. The same instinct that produced a limited-access program for offensive cyber capability should produce a structured industry program for the fraud-detection capability embedded in your general-purpose models. The mechanism already exists; it just needs a different threat domain.

Match the threat clock, not the release clock. A twelve-month release cadence is fine for general capability. It is far too slow for fraud-specific reasoning. Industry has the ground truth on emerging tradecraft. Labs have the model surface area. We need a regular, structured way to close that loop — and we need it to be a peer relationship, not a marketing exercise.

Be candid about dual-use, the way you were about Mythos. The decision to withhold Mythos was the right decision, and the candor around why set a useful precedent. Apply that same candor to the fraud and AML-relevant capabilities embedded in general-purpose frontier models — including the ways they can be coaxed into offensive use. Defenders cannot prepare for what they cannot see, and the leak that followed the Mythos announcement is a useful reminder that the offensive side will see it whether you publish or not.

Where I land

It is right to be uneasy about training AI on capabilities that adversaries could co-opt. That unease should not, however, lead us to the conclusion that the responsible move is to stop. The fraudsters are not waiting. The deepfakes are already shipping. The synthetic identities are already opening accounts. Mythos-class capability is already in the wild in some form, and the six-to-twelve-month window Amodei warned about is closing. The only path that ends well for the institutions, customers, and regulators trying to hold the line is one where defenders get the same AI lift attackers already have — and, frankly, a step ahead.

So train the models. Train them on how to recognize an account takeover, a mule pattern, a first-party fraud setup. Train them on the hard edge cases that experienced investigators have spent careers learning to spot. And, as the Mythos episode rightly insists, do it with eyes open: in partnership with the industries that will deploy and live with these systems, on a clock that matches the threat, with the candor this moment deserves, and with a Glasswing-style structure for the fraud side of the house too.

The cautious move is not to look away from the dual-use problem. It is to look straight at it, and build.

Geoff Wolfe is a founder of ReasonCore.ai, which builds RL and SFT training data in the domains of Spatial Reasoning, Financial Reasoning, Scientific Reasoning, and Coding Reasoning for frontier AI labs, neolabs, and enterprises building internal models. The company's training data pack referenced here focuses on financial crime: account takeover, money mule, and first- and third-party fraud. It is built by senior practitioners with an average of 15+ years in the field.
