evaluation

3 posts

The Dashboard Goes Green

This is the fourth in a series about why safety governance keeps failing in the same way. "Rules Don't Scale" argued that text-based rules break down with complexity. "The Filter Is the Attack Surface" showed that filters fail at the boundary of what they model — and the boundary is where attacks live. "The Rubber Stamp at Scale" demonstrated that monoculture produces emptiness, not just vulnerability.

Mar 17, 2026

Gemini 3.1 Pro rotates a cube

writes <2% as many bytes as Opus 4.6

Feb 20, 2026

Opus Rotates Shapes

SVG animations

Feb 5, 2026