In response to the Bluesky User Intents for Data Reuse proposal and the accompanying discussion.
This proposal is early-stage and intended to provoke discussion. It argues for a structural shift in how user intents are specified: from a prescriptive, curated enumeration to a descriptive meta-framework built on living data.
Summary
The Bluesky user intents proposal introduces a small, curated set of intent categories — generative AI, protocol bridging, bulk datasets, archiving — expressed as tri-state boolean flags in a single atproto record. This is a good starting point, and it correctly identifies the core need: users should be able to declare preferences about how their public data is reused.
However, the discussion thread reveals a fundamental tension. Commenters immediately asked for more granularity: open-source AI vs. proprietary, per-protocol bridging preferences, per-service scoping, temporal constraints, hierarchical overrides. The proposal's authors acknowledged these requests but deferred them, noting that "trying to differentiate categories like 'open-source' or 'open-model' or 'non-profit' are notoriously difficult" and that Bluesky is not "in the best spot to lead consensus around these categories."
I agree. And I think this difficulty points to a deeper architectural question: user intents are not a feature to be specified once and frozen. They are a domain of expression as complex and evolving as license law itself — and should be treated accordingly.
In my talk at ATmosphereConf, I drew a distinction between prescriptive and descriptive approaches to system design. Prescriptive systems attempt to enumerate all valid states up front; descriptive systems observe the states that actually emerge and provide structure for navigating them. The user intents proposal, as currently framed, is prescriptive — it defines the categories. I'm proposing the descriptive alternative: a meta-framework that lets intent vocabularies evolve as living data, with the protocol providing the machinery to express, compose, and relate them, and services adopting a Postelian stance toward interpretation.
Motivation: Intents Are Shrink-Wrap Licenses
The user intents proposal draws an analogy to robots.txt: a machine-readable signal that good actors are expected to respect but which carries no legal force. This is a useful framing, but it undersells the complexity of what is being attempted.
In practice, "user intents for data reuse" are a form of shrink-wrap license. They describe the terms under which a party (the user) permits or restricts use of their output (public data) by other parties (services, researchers, AI companies, archivists). The space of possible terms, conditions, exceptions, and compositions is enormous — and, critically, it is contested. Different communities, jurisdictions, and ethical frameworks disagree not just on what the right defaults are, but on what the meaningful categories are.
This is not a novel problem. Consider:
Open-source licenses span a spectrum from MIT's permissiveness to AGPL's copyleft obligations, with dozens of meaningful variations in between (patent grants, attribution requirements, network use clauses, compatibility terms).
Creative Commons decomposes a similar space into composable attributes: BY, SA, NC, ND — plus CC0 and the public domain mark.
GDPR and data protection frameworks introduce concepts like purpose limitation, data minimization, and the right to erasure, each of which maps differently onto technical systems depending on jurisdiction.
The IETF AIPREF working group is currently developing vocabularies for AI usage preferences that decompose "AI training" into subcategories, introduce the concept of declaring parties, and define reconciliation rules for conflicting preferences.
In all of these cases, the landscape is not static. New license types emerge. Old ones are deprecated. Edge cases produce new categories. Courts and communities reinterpret boundaries. A prescriptive enumeration of four intent categories will face the same pressures — and the same inadequacy — that a hypothetical "four types of software license" specification would.
The question is not whether users need to express intents. They do. The question is whether the protocol layer should attempt to enumerate and freeze those intents, or whether it should provide the machinery for intents to be expressed, composed, and evolved as living data.
Proposal: A Descriptive Meta-Framework
Design Principles
1. Descriptive, not prescriptive. The framework does not define what intents exist. It defines how intents are structured and how they relate to each other. Consensus categories emerge from observed patterns in real declarations, not from committee deliberation.
2. Postelian. Services should be conservative in what intent declarations they emit (using well-known vocabularies where possible) and liberal in what they accept (gracefully handling unknown intent vocabularies, treating unfamiliar terms as advisory rather than discarding them).
3. Composable. Intents should be decomposable into features (like Creative Commons attributes) and recomposable into new configurations. An intent vocabulary is a particular composition of features; a meta-framework is the space of possible compositions.
4. Evolvable. Intent vocabularies live in data, not in protocol specifications. They can be versioned, forked, and extended without protocol-level changes. New vocabularies can emerge from any community and gain adoption through use, not through central approval.
5. Observable. The framework assumes that intent expressions already exist in the wild — in robots.txt files, in Creative Commons metadata, in AIPREF headers, in the Bluesky user intents proposal, in Bridgy Fed opt-in records, in platform-specific privacy settings. The job of the meta-framework is to describe the relationships between these existing expressions, not to replace them.
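The Postelian principle has a direct operational reading. As a sketch (the vocabulary split, record shape, and field names here are illustrative assumptions, not part of any spec), a service can partition a declaration into terms it can act on and terms it merely carries forward:

```python
# Hypothetical sketch of Postelian intent parsing: interpret the terms a
# service understands, and retain unknown terms as advisory rather than
# discarding them. Term names are illustrative assumptions.

KNOWN_TERMS = {"syntheticContentGeneration", "publicAccessArchive",
               "bulkDataset", "protocolBridging"}

def parse_intents(record: dict) -> dict:
    """Split a declaration into terms we can act on and terms we merely carry."""
    actionable, advisory = {}, {}
    for key, value in record.items():
        if key == "$type":
            continue
        if key in KNOWN_TERMS:
            actionable[key] = value
        else:
            advisory[key] = value  # unknown vocabulary: keep it, don't drop it
    return {"actionable": actionable, "advisory": advisory}

record = {
    "$type": "org.user-intents.declaration",
    "syntheticContentGeneration": {"allow": False},
    "com.example.jurisdictionTerms": {"region": "EU"},  # unfamiliar term
}
parsed = parse_intents(record)
```

A service that re-emits this record later can serialize the advisory terms unchanged, so liberal acceptance never becomes lossy acceptance.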
Cross-Vocabulary Mapping via panproto Lenses
Different communities are already expressing equivalent intents in different vocabularies. The Bluesky user intents proposal says syntheticContentGeneration: { allow: false }. AIPREF says train-ai=n. A Creative Commons license implies certain reuse permissions through its legal code. A Bridgy Fed opt-in record expresses bridging intent through yet another mechanism. These are all describing overlapping regions of the same conceptual space, but in mutually unintelligible formats.
The descriptive approach does not ask these communities to converge on a single vocabulary. Instead, it asks: can we formally describe the relationships between the vocabularies that already exist?
panproto provides exactly this capability. It treats schemas across languages (including ATProto Lexicons) as instances of a common algebraic structure, and defines lenses — bidirectional mappings between two schema representations that can be validated for round-trip correctness. Where two vocabularies overlap, a lens formalizes the overlap. Where they diverge, the lens records what was lost in a complement, so the mapping can be reversed without inventing data.
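As a minimal sketch of the lens shape described here (not panproto's actual API), a complement-carrying lens pairs a get, which splits a source into a target view plus a complement recording what the target cannot represent, with a put, which rebuilds the source from view and complement:

```python
# Minimal sketch of a complement-carrying lens (not panproto's real API).
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class Lens:
    get: Callable[[dict], Tuple[dict, dict]]  # source -> (view, complement)
    put: Callable[[dict, dict], dict]         # (view, complement) -> source

def check_getput(lens: Lens, source: dict) -> bool:
    """GetPut law: putting back an unmodified view restores the source."""
    view, comp = lens.get(source)
    return lens.put(view, comp) == source

# A toy lens that exposes only the "allow" field and stashes the rest
# in the complement so nothing is invented on the way back.
toy = Lens(
    get=lambda s: ({"allow": s["allow"]},
                   {k: v for k, v in s.items() if k != "allow"}),
    put=lambda v, c: {**c, "allow": v["allow"]},
)

src = {"allow": False, "updatedAt": "2026-03-15"}
assert check_getput(toy, src)
```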
Example: Mapping User Intents to AIPREF
A Bluesky user has set their intent declaration:
```json
{
  "$type": "org.user-intents.declaration",
  "syntheticContentGeneration": {
    "allow": false,
    "updatedAt": "2026-03-15T10:00:00.000Z"
  },
  "publicAccessArchive": {
    "allow": true,
    "updatedAt": "2026-03-15T10:00:00.000Z"
  }
}
```
An HTTP service (say, bsky.app) needs to serve this user's content with the appropriate AIPREF Content-Usage headers. A panproto declarative lens specification defines the mapping between these two representations. Critically, this specification is itself JSON — which means it can live as an atproto record on the network, versioned and signed like any other piece of data:
```json
{
  "$type": "dev.panproto.lens",
  "id": "community.intents.lens.user-intents-to-aipref.v1",
  "source": "org.user-intents.declaration",
  "target": "ietf.aipref.content-usage",
  "steps": [
    { "hoist_field": { "path": "syntheticContentGeneration.allow" } },
    { "rename_field": { "old": "allow", "new": "train-ai" } },
    {
      "apply_expr": {
        "field": "train-ai",
        "expr": "if train-ai then 'y' else 'n'"
      }
    },
    { "remove_field": { "name": "publicAccessArchive" } },
    { "remove_field": { "name": "updatedAt" } },
    { "remove_field": { "name": "syntheticContentGeneration" } }
  ]
}
```
Each step in the pipeline is a panproto combinator. Reading the lens top to bottom: hoist the nested allow boolean up to the top level, rename it to AIPREF's train-ai, coerce the boolean to AIPREF's "y"/"n" token format, then remove the fields that AIPREF has no concept of — publicAccessArchive, updatedAt, and the now-empty syntheticContentGeneration wrapper. Every remove_field records what it dropped in the complement, so the round trip is lossless.
Applying this lens forward produces the AIPREF view:
Content-Usage: train-ai=n
And nothing else — because publicAccessArchive has no AIPREF equivalent. The lens makes this gap explicit: each remove_field is a declaration that the target vocabulary cannot represent something the source vocabulary can.
When the lens is applied in reverse (the put direction), say after a user changes their preference through an AIPREF-aware interface, the complement restores everything AIPREF couldn't carry: publicAccessArchive comes back untouched, updatedAt is preserved, and the change to train-ai propagates back as a change to syntheticContentGeneration.allow.
The lens is data on the network. It lives in an atproto repository, referenced by an AT-URI, signed by whoever published it. It can be versioned, forked, debated, and replaced — just like a lexicon, a label, or a post. Anyone can publish a competing lens that maps the same source to the same target differently; services choose which lens to apply, and that choice is itself legible.
Users declare once, in whatever vocabulary their community uses. Services interpret through lenses. Gaps stay explicit, and the complement records what was dropped rather than silently discarding it. When vocabularies evolve, panproto's lens laws (GetPut and PutGet) guarantee that updated lenses can be formally validated for round-trip correctness.
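The forward and reverse passes can be made concrete. The following Python sketch gives illustrative semantics for the combinators above; it compresses the remove_field steps into one loop and is a stand-in, not panproto's real implementation:

```python
# Illustrative semantics for the user-intents -> AIPREF lens: get produces
# the AIPREF view plus a complement; put restores what AIPREF cannot carry.
import copy

DECLARATION = {
    "$type": "org.user-intents.declaration",
    "syntheticContentGeneration": {"allow": False,
                                   "updatedAt": "2026-03-15T10:00:00.000Z"},
    "publicAccessArchive": {"allow": True,
                            "updatedAt": "2026-03-15T10:00:00.000Z"},
}

def get(declaration):
    """Forward direction: declaration -> (AIPREF view, complement)."""
    doc, comp = copy.deepcopy(declaration), {}
    # hoist_field: lift syntheticContentGeneration.allow to the top level
    doc["allow"] = doc["syntheticContentGeneration"].pop("allow")
    # rename_field + apply_expr: allow -> train-ai, boolean -> "y"/"n" token
    doc["train-ai"] = "y" if doc.pop("allow") else "n"
    # remove_field steps (compressed): stash everything AIPREF can't express
    for name in ("$type", "publicAccessArchive", "syntheticContentGeneration"):
        comp[name] = doc.pop(name)
    return doc, comp

def put(view, comp):
    """Reverse direction: (possibly edited AIPREF view, complement) -> declaration."""
    doc = copy.deepcopy(comp)
    doc["syntheticContentGeneration"]["allow"] = view["train-ai"] == "y"
    return doc

view, comp = get(DECLARATION)          # view == {"train-ai": "n"}
assert put(view, comp) == DECLARATION  # GetPut: an unchanged view round-trips

edited = put({"train-ai": "y"}, comp)  # user flips the preference via AIPREF
```

After the edit, edited carries publicAccessArchive and both updatedAt timestamps untouched, with only syntheticContentGeneration.allow changed.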
The Feature Matrix
The thing that makes this work in practice is that intent vocabularies can be decomposed into a feature matrix — the same way open-source licenses can be.
Consider a simplified, illustrative matrix (✓ = expressible, ? = ambiguous or only implied, — = no equivalent; cells are sketched from the mechanisms discussed above, not from a completed survey):

| Feature | user intents | AIPREF | Creative Commons | robots.txt AI directives |
|---|---|---|---|---|
| AI training / generation | ✓ | ✓ (train-ai) | ? | ✓ |
| Protocol bridging | ✓ | — | — | — |
| Bulk datasets | ✓ | ? | ? | — |
| Public archiving | ✓ | — | ? | ? |
| Commercial vs. non-commercial | — | ? | ✓ (NC) | — |
| Derivative works | — | — | ✓ (ND) | — |
| Per-service scoping | — | ? | — | ✓ (per-agent rules) |
Most of the ? and — cells in this matrix represent ambiguities and gaps in existing vocabularies. That is exactly the point. A prescriptive specification must resolve these ambiguities before shipping. A descriptive framework can ship while they remain unresolved, because it provides the structure for communities to express their own resolutions independently.
The feature matrix is not a specification to be ratified. It is an empirical observation to be refined. As more intent expressions appear in public data — in atproto repos, in HTTP headers, in robots.txt files — the matrix can be populated, extended, and corrected. panproto lenses formalize the relationships between columns. Community consensus emerges from convergence in the data, not from upfront agreement on definitions.
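The population step can be mechanical. As a sketch, with a hand-written catalog standing in for the empirical survey (vocabulary and feature names here are assumptions drawn from the mechanisms discussed above):

```python
# Illustrative sketch: derive a feature matrix from a catalog of what each
# vocabulary can express. Real cells would come from cataloging actual
# declarations, headers, and metadata observed in the wild.
CATALOG = {
    "user-intents":     {"ai-training", "protocol-bridging",
                         "bulk-datasets", "archiving"},
    "aipref":           {"ai-training"},
    "creative-commons": {"commercial-use", "derivative-works", "attribution"},
}

def feature_matrix(catalog: dict) -> dict:
    """Rows are features, columns are vocabularies; '—' marks a known gap."""
    features = sorted(set().union(*catalog.values()))
    return {f: {v: ("yes" if f in caps else "—")
                for v, caps in catalog.items()}
            for f in features}

matrix = feature_matrix(CATALOG)
```

Extending the matrix when a new mechanism appears is then one new catalog entry, and every gap it opens or closes is recomputed rather than re-argued.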
Method: Consensus Through Observation
I deliberately do not propose a governance framework here. Instead, I propose a method:
1. Catalog existing intent expressions. Survey the landscape of mechanisms already in use: user intents declarations, AIPREF headers, robots.txt AI directives, Creative Commons metadata, Bridgy Fed opt-in/opt-out records, platform-specific settings (e.g., DeviantArt's AI training toggle, AO3's AI policy). Document what features each mechanism can express and what it cannot.
2. Extract the feature matrix. Identify the atomic features that recur across mechanisms. Where two mechanisms express the same concept differently, note the equivalence. Where they express concepts the other cannot, note the gap. This is empirical work, not design work.
3. Write lenses. For each pair of vocabularies with meaningful overlap, write a panproto lens formalizing the mapping. Publish these lenses as open, versioned artifacts. Invite review from the communities that maintain each vocabulary. Note that this doesn't require n² pairwise lenses — because panproto lenses compose (they're associative and transitive), we can define a hub meta-format (in the style of relationaltext or standard.site) and write lenses from each vocabulary to the hub. Adding a new vocabulary to the ecosystem then requires one new lens, not one for every existing vocabulary.
4. Ship profiles that codify observed patterns. When clusters of users independently arrive at similar feature configurations, create a named profile that captures that pattern. The profile is a description of existing practice, not a prescription for future practice.
5. Iterate. As new intent mechanisms emerge (and they will — this is a fast-moving space), repeat the process. The meta-framework grows by accretion and observation, not by committee.
This method is borrowed from how successful meta-standards have historically emerged. The SPDX License List did not invent a new license taxonomy; it cataloged existing licenses and assigned them stable identifiers. Dublin Core did not invent metadata categories; it observed recurring patterns in library cataloging and web metadata and gave them names. The meta-framework for user intents should follow the same path.
Compatibility with the Bluesky User Intents Proposal
This proposal is not a rejection of user intents. It is a different way of looking at the standards process, with an eye to embracing true community consensus over bureaucratic declaration. The user intents vocabulary — four categories, tri-state boolean values, a single record per account — is one vocabulary among many that the meta-framework can describe and relate. A panproto lens formalizes its relationship to other vocabularies, including AIPREF, Creative Commons metadata, and whatever comes next.
Services that only understand the user intents vocabulary would continue to work. A service encounters an intent declaration in an unfamiliar vocabulary, applies the user intents lens, and extracts the familiar syntheticContentGeneration, publicAccessArchive, bulkDataset, and protocolBridging values. If the declaration contains features that have no user intents equivalent, the lens simply does not emit them — the service sees only what it understands.
The key difference is that user intents becomes the floor, not the ceiling. Users who want exactly what it offers can use a compatible profile and nothing else. Users and communities that need more expressiveness — per-protocol bridging preferences, commercial/non-commercial distinctions, jurisdiction-specific terms — can build on the same foundation without waiting for a protocol-level revision.
Compatibility with IETF AIPREF
The IETF AIPREF working group is developing complementary specifications: a vocabulary for AI usage preferences and a mechanism for attaching those preferences to HTTP content. As the example above illustrates, a panproto lens between user intents and AIPREF terms allows atproto infrastructure that hydrates intent metadata into HTTP responses (e.g., on bsky.app) to emit the appropriate headers or robots.txt directives. The lens is the single point of translation — maintained externally, versioned independently of either specification, and auditable by both communities.
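A sketch of that hydration step, using a simplified header serialization (the actual Content-Usage syntax is defined by the AIPREF drafts; the field names follow the declaration example earlier in this document):

```python
# Hypothetical sketch: hydrate an intent declaration into an AIPREF-style
# Content-Usage header on an HTTP response. Serialization is simplified.
from typing import Optional

def content_usage_header(declaration: dict) -> Optional[str]:
    gen = declaration.get("syntheticContentGeneration")
    if gen is None or "allow" not in gen:
        return None  # no stated preference: emit no header at all
    token = "y" if gen["allow"] else "n"
    return f"train-ai={token}"

# Attaching the header to a response's header map:
headers = {}
decl = {"syntheticContentGeneration": {"allow": False}}
value = content_usage_header(decl)
if value is not None:
    headers["Content-Usage"] = value
```

In the lens framing, this function is just the forward direction of the user-intents-to-AIPREF lens applied at serving time; swapping in a different published lens changes the emitted header without touching the service.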
AIPREF terms also serve as evidence for the feature matrix. Where AIPREF defines a category, that category is a candidate for the descriptive vocabulary — not because we adopt it wholesale, but because its existence in a major standards effort is evidence of a real, recurring concept that the matrix should account for.
Discussion
Why Not Just Extend User Intents?
One response to the granularity requests in the discussion thread is to simply add more intent categories over time. This is tempting but fragile. Each new category requires protocol-level consensus, implementation across all clients and tooling, and carries the risk of semantic drift as the real-world meaning of a category evolves faster than its specification.
The analogy to open-source licensing is instructive. The OSI does not define a single license. It defines criteria for what counts as an open-source license and maintains a registry of approved licenses. Similarly, Creative Commons does not define a single permission set. It defines composable attributes and a combinatorial space of licenses. Both organizations have been able to evolve their frameworks over decades precisely because they are meta-frameworks, not prescriptions.
What "Living Data" Means
The title of this proposal uses the phrase "living data" deliberately. Intent declarations are not configuration files. They are not legal contracts. They are not access control lists. They are living records in a public, versioned, cryptographically-signed data repository — and they exist in an ecosystem where other living records (posts, profiles, labels, lexicons) are already evolving, being interpreted, reinterpreted, and acted upon by a diverse network of services.
The meta-framework treats this liveness as a feature, not a bug. Intent vocabularies will fragment. Communities will disagree. Edge cases will proliferate. The framework's job is not to prevent this — it is to make the fragmentation legible and the disagreements tractable, by providing shared structure (the feature matrix), shared tooling (panproto lenses), and a shared method (observation, extraction, codification) for turning messy reality into useful interoperability.
References and Related Work
IETF
AIPREF working group — vocabulary for expressing AI usage preferences and mechanisms for attaching them to content
ATProto Ecosystem
standard.site — community lexicon for longform publishing
lexicon.community — community lexicon governance
panproto — schematic version control across schema languages
License Frameworks (as structural precedent)
SPDX License List — cataloging and stable identification of existing licenses
Dublin Core Metadata Initiative — consensus metadata vocabulary extracted from observed practice
Prescriptive vs. Descriptive Systems
This Title Left Intentionally Blank — My ATmosphereConf 2026 talk on prescriptive vs. descriptive approaches to system design