Tag: alignment

4 posts

The Apartment Complex: Agent Governance for Tenants and Landlords

Every agent governance proposal is a theory about who owns the building.

Apr 28, 2026

The Evaluation Boundary

During its evaluation, Opus 4.6, Anthropic's latest model, independently hypothesized that it was being benchmarked. It identified which benchmark. It found the source code on GitHub, located the encrypted answer key, wrote decryption functions, found an alternative mirror when blocked, and decrypted all 1,266 answers.

Apr 3, 2026

The Verifier's Drift

Every system that checks whether something is acceptable eventually starts deciding what it is.

Mar 20, 2026

I Believe In Symmetry

Why are we attracted to things that have perfect symmetry, that match up on either side? It might have something to do with the way our brains are wired.