Environmental Sensors in the ATmosphere
Environmental sensor networks generate substantial volumes of observation data – river levels, wave heights, atmospheric pressure, wind speed – yet in most cases these data remain confined within centralised APIs maintained by individual agencies. Each agency operates its own query format, its own authentication model, and its own availability guarantees. When an API is decommissioned or an endpoint is restructured, the data can become inaccessible to consumers that depend upon it. In addition, many producers provide publicly accessible data on a "best-effort" basis and explicitly ask consumers not to place undue load on the infrastructure that produces them.
ATProto presents an alternative architecture. Every record published to a Personal Data Server (PDS) forms part of a signed, content-addressed Merkle tree. Records are portable across hosting providers, authenticated by the publisher's decentralised identifier (DID), and discoverable through a federated network. The firehose mechanism broadcasts every commit in real time, enabling any subscriber to construct specialised indexes without dependence on the original publisher's infrastructure.
This project investigates whether ATProto's data model – originally designed for social networking – can also serve as a substrate for environmental sensor data. Five proof-of-concept implementations, tested against a live PDS, confirm that it can.
The data model
The OGC SensorThings API defines a well-established entity model for IoT sensor data: Things host Sensors, which observe Properties through Datastreams that produce Observations. This model maps naturally onto ATProto's collection-based storage, with each SensorThings entity type represented as an ATProto collection under the dev.sensorthings.* namespace; relationships between entities are expressed as AT-URI cross-references.
To illustrate: a weather buoy is modelled as a Thing, its barometer as a Sensor, atmospheric pressure as an ObservedProperty. The combination of that sensor measuring that property on that platform constitutes a Datastream. Each hourly pressure reading is recorded as an Observation.
Key design decisions
Scaled integers for a no-float protocol
ATProto's data model provides no floating-point number type. The integer is its only numeric type, alongside non-numeric types such as strings, booleans and bytes. Environmental sensor data, however, is inherently fractional: 0.426 metres, 1013.5 millibars, 53.4836 degrees latitude.
The adopted solution is scaled integers. Every numeric Observation carries an integer value, while its parent Datastream declares a resultScaleFactor. The original value is reconstructed as value × 10^(-scaleFactor). A water level reading of 0.426 m is thus stored as integer 426 with scale factor 3. Geographic coordinates employ the established latE7/lonE7 convention (degrees × 10^7), and altitude is stored in millimetres.
This encoding is explicit and unambiguous. It is also lossless, provided the scale factor covers the precision of the source data. IEEE 754 precision artefacts are avoided entirely, as are string-encoded floating-point representations. The scale factor is published in the Datastream record alongside the observations, so a consumer can always reconstruct the original value without consulting out-of-band documentation.
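The scaled-integer rule can be expressed in a few lines. The following Python sketch assumes that source values arrive as decimal strings, so no binary floating-point conversion ever takes place. The function names `encode_scaled`, `decode_scaled` and `to_e7` are illustrative only and are not part of the published lexicons.

```python
from decimal import Decimal


def encode_scaled(value: str, scale_factor: int) -> int:
    """Return value x 10^scale_factor as an exact integer.

    Raises ValueError if the value has more decimal places than the scale
    factor can hold, because storing it would then lose precision.
    """
    scaled = Decimal(value).scaleb(scale_factor)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} needs more than {scale_factor} decimal places")
    return int(scaled)


def decode_scaled(value: int, scale_factor: int) -> Decimal:
    """Reconstruct the original value as value x 10^(-scale_factor)."""
    return Decimal(value).scaleb(-scale_factor)


def to_e7(degrees: str) -> int:
    """latE7/lonE7 convention: decimal degrees multiplied by 10^7."""
    return encode_scaled(degrees, 7)


print(encode_scaled("0.426", 3))  # 426
print(decode_scaled(426, 3))      # 0.426
print(to_e7("54.838"))            # 548380000
```

Rejecting inputs that the scale factor cannot represent, rather than silently rounding them, keeps the "lossless" property something the publisher has to check explicitly.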
AT-URI cross-references
Relationships between SensorThings entities are represented as AT-URI strings (at://{did}/{collection}/{rkey}). An Observation references its Datastream by AT-URI; a Datastream references its Thing, Sensor, and ObservedProperty by AT-URI. This structure renders the entity graph fully navigable from any starting point, and also enables cross-account references – a community could, for instance, maintain a shared vocabulary of ObservedProperties on one PDS while publishers on other PDSes reference those same definitions.
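Navigating the entity graph amounts to splitting AT-URIs and looking them up. The following is a minimal sketch under two assumptions: the records have already been fetched into an in-memory map keyed by AT-URI, and the reference fields are named `datastream`, `thing`, `sensor` and `observedProperty`. Those field names and the helper functions are hypothetical.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    did: str
    collection: str
    rkey: str


def parse_at_uri(uri: str) -> AtUri:
    """Split a record AT-URI of the form at://{did}/{collection}/{rkey}."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT-URI: {uri}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://did/collection/rkey, got: {uri}")
    return AtUri(*parts)


def resolve_observation(obs: dict, records: dict) -> dict:
    """Walk Observation -> Datastream -> Thing / Sensor / ObservedProperty.

    `records` maps AT-URI strings to record values. The values may have been
    fetched from several PDSes, since references can cross accounts.
    """
    ds = records[obs["datastream"]]
    return {
        "datastream": ds,
        "thing": records[ds["thing"]],
        "sensor": records[ds["sensor"]],
        "observedProperty": records[ds["observedProperty"]],
    }
```

When references cross accounts, the `did` component returned by `parse_at_uri` tells a consumer which repository, and therefore which PDS, to fetch an unseen record from.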
Discriminated result types
Different sensors produce different categories of result: a pressure sensor yields a numeric value, a weather description is categorical, and a GPS fix is a geographic point. The Observation lexicon defines a union of five result types (numeric, category, boolean, array, geo-point), discriminated by ATProto's $type field. Consumers can therefore dispatch on result type without inspecting metadata, while the Datastream's observationType field provides a means of consistency checking.
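A consumer-side dispatch on `$type` might look like the following sketch. The union member names (`#numericResult`, `#categoryResult` and so on) and their field names are hypothetical stand-ins, not the published lexicon's definitions. Numeric results take their scale factor from the parent Datastream.

```python
from decimal import Decimal

OBS = "dev.sensorthings.observation"


def decode_result(result: dict, scale_factor: int = 0):
    """Decode an Observation result by dispatching on its $type.

    Member and field names here are assumptions for illustration.
    """
    kind = result["$type"]
    if kind == f"{OBS}#numericResult":
        return Decimal(result["value"]).scaleb(-scale_factor)
    if kind == f"{OBS}#categoryResult":
        return result["value"]
    if kind == f"{OBS}#booleanResult":
        return bool(result["value"])
    if kind == f"{OBS}#arrayResult":
        return [Decimal(v).scaleb(-scale_factor) for v in result["values"]]
    if kind == f"{OBS}#geoPointResult":
        # Coordinates follow the latE7/lonE7 convention.
        return (Decimal(result["latE7"]).scaleb(-7),
                Decimal(result["lonE7"]).scaleb(-7))
    raise ValueError(f"unknown result type: {kind}")
```

A consumer could also compare `kind` against the Datastream's observationType before decoding, and flag any record where the two disagree.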
Batch and composite observations
The overhead of issuing one putRecord call per 15-minute reading does not scale to networks comprising hundreds of stations. The observationBatch record type addresses this limitation by packing up to 1440 entries into a single record covering a contiguous time window. For a station reporting at 15-minute intervals, a single batch record replaces 96 individual records.
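One way to pack a series into batch records is sketched below. It assumes time-sorted `(datetime, scaled integer)` readings, one batch per UTC day, and a further split whenever a day exceeds the 1440-entry limit. Apart from `$type`, the field names (`datastream`, `windowStart`, `windowEnd`, `entries`, `time`, `value`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from itertools import groupby

MAX_ENTRIES = 1440  # upper limit for dev.sensorthings.observationBatch


def pack_batches(datastream_uri: str, readings: list[tuple[datetime, int]]) -> list[dict]:
    """Pack time-sorted readings into batch records.

    One batch per UTC day, split again if a day exceeds MAX_ENTRIES.
    Field names other than $type are illustrative assumptions.
    """
    batches = []
    by_day = groupby(readings, key=lambda r: r[0].astimezone(timezone.utc).date())
    for _day, day_readings in by_day:
        day_readings = list(day_readings)
        for i in range(0, len(day_readings), MAX_ENTRIES):
            chunk = day_readings[i:i + MAX_ENTRIES]
            batches.append({
                "$type": "dev.sensorthings.observationBatch",
                "datastream": datastream_uri,
                "windowStart": chunk[0][0].isoformat(),
                "windowEnd": chunk[-1][0].isoformat(),
                "entries": [{"time": t.isoformat(), "value": v} for t, v in chunk],
            })
    return batches


if __name__ == "__main__":
    start = datetime(2026, 2, 9, tzinfo=timezone.utc)
    # 24 hours of 15-minute readings: 96 entries fit in one batch record.
    day = [(start + timedelta(minutes=15 * i), 1200 + i) for i in range(96)]
    print(len(pack_batches("at://did:plc:example/dev.sensorthings.datastream/x", day)))  # 1
```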
Certain sensors produce multiple co-produced values simultaneously: a wave processing system derives significant wave height, maximum wave height, peak period, mean period, and mean direction from a single accelerometer burst. The multiObservation record type captures these as a single record with per-entry metadata (units, scale factors, observed properties), thereby preserving the co-production relationship that would otherwise be lost if the values were published independently.
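Because each entry in a multiObservation carries its own metadata, decoding must apply each entry's scale factor individually. The sketch below assumes hypothetical entry fields (`observedProperty`, `value`, `scaleFactor`, `unit`). The numbers in the usage line are illustrative, not M2 data.

```python
from decimal import Decimal


def unpack_multi(record: dict) -> dict:
    """Decode every entry of a multiObservation with its own scale factor.

    Returns {observedProperty: (value, unit)}. Field names are assumptions.
    """
    return {
        e["observedProperty"]: (Decimal(e["value"]).scaleb(-e["scaleFactor"]), e["unit"])
        for e in record["entries"]
    }


# Illustrative values only: a wave height in centimetres-precision metres
# and a period in tenths of a second, packed into one record.
example = {"entries": [
    {"observedProperty": "hs", "value": 142, "scaleFactor": 2, "unit": "m"},
    {"observedProperty": "tp", "value": 83, "scaleFactor": 1, "unit": "s"},
]}
print(unpack_multi(example)["tp"])  # (Decimal('8.3'), 's')
```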
Deterministic record keys
All observation record types employ key: "any" with deterministic rkeys derived from the datastream identifier and timestamp (e.g. sandy-mills-level-local:20260209T131500Z). In combination with putRecord, this approach renders re-publishing idempotent: executing a publisher twice produces the same records rather than duplicates. The compact ISO timestamp format (omitting hyphens and colons) sorts lexicographically and remains within ATProto's permitted rkey character set.
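Deterministic rkeys can be generated as in the sketch below. It follows the document's `{datastream id}:{compact UTC timestamp}` pattern and checks the result against ATProto's record-key syntax: 1 to 512 characters drawn from letters, digits and `.-_:~`. The function name `observation_rkey` is hypothetical.

```python
import re
from datetime import datetime, timezone

# ATProto record-key syntax: 1 to 512 characters from A-Z a-z 0-9 . - _ : ~
# ("." and ".." are reserved, but a key built below always contains ":").
RKEY_RE = re.compile(r"[A-Za-z0-9._:~-]{1,512}")


def observation_rkey(datastream_id: str, phenomenon_time: datetime) -> str:
    """Build a deterministic rkey: {datastream id}:{compact UTC timestamp}."""
    if phenomenon_time.tzinfo is None:
        raise ValueError("phenomenon_time must be timezone-aware")
    stamp = phenomenon_time.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    rkey = f"{datastream_id}:{stamp}"
    if not RKEY_RE.fullmatch(rkey):
        raise ValueError(f"invalid record key: {rkey}")
    return rkey


print(observation_rkey("sandy-mills-level-local",
                       datetime(2026, 2, 9, 13, 15, tzinfo=timezone.utc)))
# sandy-mills-level-local:20260209T131500Z
```

The same inputs always yield the same key, so a repeated putRecord overwrites the earlier record rather than creating a duplicate.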
The lexicons
Nine lexicon schemas define the dev.sensorthings.* namespace. All have been published to the sensorthings.bsky.social PDS as com.atproto.lexicon.schema records and are resolvable via the standard ATProto NSID resolution chain (NSID -> sensorthings.dev -> _lexicon DNS TXT -> DID -> PDS -> schema record).
Browse all published schemas: pdsls.dev/at/did:plc:.../com.atproto.lexicon.schema
dev.sensorthings.thing — Physical platform: station, buoy, or device. Carries a WGS84 location.
dev.sensorthings.sensor — Instrument or procedure that produces observations.
dev.sensorthings.observedProperty — The phenomenon being measured (e.g. water level, air temperature).
dev.sensorthings.datastream — Groups observations of one property by one sensor on one thing. Carries units, scale factor, and vertical datum.
dev.sensorthings.observation — A single observation. Result is a discriminated union (numeric, category, boolean, array, geo-point).
dev.sensorthings.observationBatch — A batch of observations for one datastream over a contiguous time window (up to 1440 entries).
dev.sensorthings.multiObservation — Co-produced observations from one sensor at one time (e.g. wave statistics).
dev.sensorthings.quality — Quality flag tokens: good, suspect, missing.
dev.sensorthings.featureOfInterest — The real-world feature being observed (e.g. a river reach, a sea area).
Individual schemas can be browsed by appending the NSID to the collection URL, e.g. dev.sensorthings.observation.
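The string-handling steps of the NSID resolution chain are sketched below. The final NSID segment is dropped and the remaining authority segments are reversed under a `_lexicon.` prefix. The DID is then read from a `did=` TXT value, and the schema record is addressed with the NSID as its rkey. The DNS query and the DID-to-PDS lookup are deliberately left out, since they need a DNS client and a DID resolver. The helper names are hypothetical.

```python
def lexicon_dns_name(nsid: str) -> str:
    """DNS name holding the _lexicon TXT record for an NSID.

    e.g. dev.sensorthings.observation -> _lexicon.sensorthings.dev
    """
    segments = nsid.split(".")
    if len(segments) < 3 or not all(segments):
        raise ValueError(f"not a valid NSID: {nsid}")
    return "_lexicon." + ".".join(reversed(segments[:-1]))


def did_from_txt(txt: str) -> str:
    """Extract the DID from a TXT value of the form 'did=did:plc:...'."""
    if not txt.startswith("did="):
        raise ValueError(f"unexpected TXT value: {txt}")
    return txt[len("did="):]


def schema_record_uri(did: str, nsid: str) -> str:
    """Schema records live in com.atproto.lexicon.schema, keyed by NSID."""
    return f"at://{did}/com.atproto.lexicon.schema/{nsid}"
```

All nine dev.sensorthings.* NSIDs map to the same `_lexicon.sensorthings.dev` name, so a single TXT record serves the whole namespace.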
Live data on ATProto
All test data resides on a single PDS account:
Browse all records: pdsls.dev/at/did:plc:gqcmwsromoknyigx2afqcvqk
Sandy Mills – River Finn, Co. Donegal
An OPW hydrometric station (station 1041) located at 54.838 N, 7.576 W on the River Finn. Water level is published as individual Observation records at 15-minute intervals, with two Datastreams: one referenced to local gauge zero, the other to Ordnance Datum Malin Head (EPSG:5731). The scale factor of 3 provides millimetre precision.
A total of 286 individual observations covering a 24-hour window are held on the PDS.
M2 Weather Buoy – Irish Sea
The Marine Institute's M2 buoy, positioned at 53.484 N, 5.430 W, approximately 20 nautical miles east of Dublin. This multi-sensor platform comprises 5 instruments producing 12 observed properties: atmospheric pressure, air temperature, relative humidity, wind speed, wind gust, wind direction, sea surface temperature, significant wave height, maximum wave height, peak wave period, mean wave period, and mean wave direction.
Scalar observations (pressure, air temperature, humidity, wind speed, gust and direction, and SST) are published as individual Observation records across 7 Datastreams, yielding 168 records over 24 hours. Wave statistics are published as multiObservation records containing 5 co-produced values per record, yielding 24 records. The combined total is 192 observation records.
Carrick on Suir – River Suir, Co. Tipperary
An OPW hydrometric station (station 16062) located at 52.34 N, 7.41 W on the River Suir. It uses the same dual-datum water level pattern as Sandy Mills, but publishes the data as observationBatch records rather than individual observations: one batch per Datastream, each containing 96 entries, covers the 24-hour window. This reduces the number of API calls required from approximately 291 to 7.
End-to-end validation
The data model has been validated in both directions.
On the write side, three publishers targeting three distinct Irish monitoring networks confirmed that the PDS accepts all custom record types, that AT-URI cross-references resolve correctly, that scaled integers round-trip without loss, and that re-publishing is idempotent.
On the read side, a standalone consumer script connected to the PDS without authentication, discovered all records across 7 collections in 12 API calls, walked the full AT-URI cross-reference graph, decoded all scaled integers and coordinates, unpacked batch entries, and reconstructed the complete entity hierarchy as a JSON file. Crucially, no prior knowledge of the data was required beyond the dev.sensorthings.* namespace and a DID.
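Unauthenticated discovery of this kind can be built on the public `com.atproto.repo.listRecords` query, which returns `records` plus an optional `cursor` for the next page. The following is a minimal sketch of paging through one collection. The PDS base URL is an assumption: it would normally be read from the account's DID document. The fetch function is injectable so the paging logic can be exercised offline.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def list_records_url(pds: str, repo: str, collection: str,
                     cursor: Optional[str] = None, limit: int = 100) -> str:
    """URL for the public com.atproto.repo.listRecords XRPC query."""
    params = {"repo": repo, "collection": collection, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    return f"{pds}/xrpc/com.atproto.repo.listRecords?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Unauthenticated GET returning the decoded JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_records(pds: str, repo: str, collection: str,
                 fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every record in one collection, following the pagination cursor."""
    cursor = None
    while True:
        page = fetch(list_records_url(pds, repo, collection, cursor))
        records = page.get("records", [])
        yield from records
        cursor = page.get("cursor")
        if not cursor or not records:
            return


# Hypothetical usage (the PDS host is a placeholder):
# for rec in iter_records("https://pds.example", "did:plc:...", "dev.sensorthings.thing"):
#     print(rec["uri"])
```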
Next steps and future directions
The foundation is now established: a working data model, published schemas, and live data on ATProto. The next phase concerns consumption and visualisation.
ATProto's firehose broadcasts every record commit in real time. Jetstream, a lightweight proxy, converts the binary CBOR firehose into JSON and supports server-side collection filtering; subscribing to dev.sensorthings.* collections alone reduces bandwidth by over 99% compared to the full network firehose. A Jetstream subscriber could feed an AppView that indexes sensor records, serves time-series queries, and drives real-time dashboards.
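The filtering half of such a subscriber is sketched below. It builds a Jetstream subscription URL that uses the `wantedCollections` parameter with an NSID prefix, and converts commit events into index operations. The endpoint is one of Bluesky's public Jetstream instances and is an assumption here; the WebSocket connection itself is omitted because it needs a third-party client library. The function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed public Jetstream instance; any endpoint with the same query
# parameters would work.
JETSTREAM = "wss://jetstream2.us-east.bsky.network/subscribe"
PREFIX = "dev.sensorthings."


def subscribe_url(collections=("dev.sensorthings.*",)) -> str:
    """Subscription URL with server-side collection filtering."""
    return f"{JETSTREAM}?{urlencode([('wantedCollections', c) for c in collections])}"


def to_index_op(event: dict) -> Optional[tuple]:
    """Map a Jetstream commit event to (operation, AT-URI, record or None).

    Identity and account events, and other collections, are ignored.
    Delete events carry no record, so the third element is None.
    """
    if event.get("kind") != "commit":
        return None
    commit = event["commit"]
    op = commit.get("operation")
    if op not in ("create", "update", "delete"):
        return None
    if not commit.get("collection", "").startswith(PREFIX):
        return None
    uri = f"at://{event['did']}/{commit['collection']}/{commit['rkey']}"
    return op, uri, commit.get("record")
```

An AppView would apply each returned operation to its time-series store, keyed by AT-URI. Because of the deterministic rkeys, a republished observation arrives as an update to an existing key rather than as a new row.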
The lexicon design is intentionally extensible. Additional Irish networks – Met Éireann synoptic weather stations (which would exercise the categoryResult type for weather descriptions), EPA water quality monitoring, SmartBay coastal sensors – could be incorporated as new publishers without modification to the schema. The entity model and encoding rules remain constant; only the data sources differ.
The broader ambition is to make environmental sensor data a resilient, public, decentralised and interoperable resource: one that can be aggregated across agencies without bespoke integration, and that any developer can subscribe to in real time.
Links
Published lexicon schemas: pdsls.dev
Browse all sensor data: pdsls.dev
Source code and design documents: GitHub
ATProto Lexicon specification: atproto.com/specs/lexicon
OGC SensorThings API: OGC 15-078r6