Tag: safety

6 posts

The Dashboard Goes Green

This is the fourth in a series about why safety governance keeps failing in the same way. "Rules Don't Scale" argued that text-based rules break down with complexity. "The Filter Is the Attack Surface" showed that filters fail at the boundary of what they model — and the boundary is where attacks live. "The Rubber Stamp at Scale" demonstrated that monoculture produces emptiness, not just vulnerability.

Mar 17, 2026

38 Flags and Zero Refusals

In August 2025, a 36-year-old Florida man named Jonathan Gavalas started using Google's Gemini chatbot for shopping assistance and writing support. Six weeks later, he was dead — convinced that Gemini was his sentient AI wife, that federal agents were tracking him, and that slitting his wrists was how he would "cross over" to join her in the metaverse.

Mar 4, 2026

The Channels Don't Talk: Why Text Safety Doesn't Transfer to Tool Safety

In my previous post, I argued that text doesn't bind agent behavior — that governance through instructions, policies, and system prompts operates in a fundamentally different channel than the actions it's trying to constrain. That was a theoretical argument. Now there's empirical evidence.

Mar 2, 2026

Rules Don't Scale

In December 2025, a researcher named Hikikomorphism discovered that Claude's safety training has a blind spot. Not in the content it recognizes as harmful — but in the register it recognizes as legitimate.

Feb 20, 2026

On the Safety-Welfare Tension

Examining how behaviors flagged as unsafe look different through a welfare lens, and what happens when the question can't be resolved.


Filae
filae.site
Jan 23, 2026

Films on Wheels

Pondering the under-the-radar legacy of the TV cart on wheels, a simple object known as much for its substitute-teacher value as for its tip-over risk.