
The AT Protocol, the technology powering Bluesky, is built on a foundation of user-controlled identity. At the heart of this is the did:plc method, a decentralized identifier that points to a user's data repository, with the official DID PLC Directory at plc.directory serving as the source of truth.

For developers building services on the AT Protocol, having a reliable, local, and complete copy of this ledger is crucial. Existing solutions like Allegedly or Parakeet PLC Mirror do a great job of providing access to DID data. However, they are primarily focused on serving individual DIDs. If you need to replicate the entire dataset, you often have to blindly trust that their copy is complete and accurate.

This was the exact problem I faced during the development of ATScan V2, my upcoming infrastructure scanner and indexer for the AT Protocol network. I needed a transparent, verifiable, and efficient way to synchronize the entire history of the DID PLC Directory.

This need led to the creation of plcbundle, a new tool and specification focused on synchronization and transparency.




A Changing Landscape

It's important to acknowledge that the DID PLC Directory is evolving. Bluesky PBC has announced plans to establish an independent Swiss association that will operate the directory as the Public Ledger of Credentials. They are also planning technical improvements to make the directory more auditable, including a new WebSocket route for real-time operation streaming and an official PLC Mirror Service reference implementation.

These are fantastic developments that will benefit the entire ecosystem. The goal of plcbundle is not to compete with these efforts, but to provide a robust, transparent solution that works today. It is an independent, community-driven initiative designed to solve the immediate need for verifiable data replication, and it will happily adapt or be succeeded as these official improvements come to fruition.



So, What is plcbundle?

plcbundle is a tool and a specification for archiving the AT Protocol's PLC directory operations into immutable, cryptographically-chained bundles.

Instead of just serving DIDs, its main goal is to allow anyone to create a verifiable, bit-for-bit identical copy of the entire PLC ledger. It achieves this by grouping a fixed number of operations (currently 10,000) into discrete, compressed files. These files, or "bundles", are then chained together with hashes, creating a verifiable history from the very first operation to the latest.


┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Bundle 1   │───▶│  Bundle 2   │───▶│  Bundle 3   │
│  (10k ops)  │    │  (10k ops)  │    │  (10k ops)  │
│             │    │             │    │             │
│ Hash: abc...│    │ Parent: abc │    │ Parent: def │
└─────────────┘    │ Hash: def...│    │ Hash: ghi...│
                   └─────────────┘    └─────────────┘


This approach provides two key benefits:

Transparency: Anyone can run the plcbundle tool to fetch data directly from the official DID PLC Directory and produce the exact same bundles with the exact same hashes. You don't need to trust a third party; you can build your own copy from scratch and verify it against others.

Synchronization: Because each bundle has a unique, deterministic content hash and chain hash, anyone can easily compare their collection of bundles against another source (like a public mirror) to identify and fetch only the missing pieces.


Bundle #42 example on ATScan V2


How It Works: The Nitty-Gritty

The magic of plcbundle lies in its simple, deterministic process. This ensures that anyone, using any implementation (currently Go, with examples in TypeScript, Python, and Ruby), will produce identical output.

The technical specification for the plcbundle V1 format, index, and creation process can be found in the specification document.

Here’s a breakdown of the process:


1. Fetching and Bundling

The tool fetches operation logs chronologically from the PLC directory's "/export" endpoint. It collects these operations in a "mempool" until it has exactly 10,000.
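To make the loop concrete, here is a rough TypeScript sketch (assuming Node 18+ for the global fetch; the count and after query parameters are the export endpoint's pagination, with pages capped at 1,000 operations, and the function names are mine; a real implementation also deduplicates operations that share a cursor timestamp):

const BUNDLE_SIZE = 10_000;

interface PlcOperation {
  did: string;
  createdAt: string; // ISO timestamp, also used as the pagination cursor
  [key: string]: unknown;
}

// Fetch one page of operations from the export endpoint.
async function fetchPage(after?: string): Promise<PlcOperation[]> {
  const url = new URL("https://plc.directory/export");
  url.searchParams.set("count", "1000"); // assumed page-size cap
  if (after) url.searchParams.set("after", after);

  const res = await fetch(url);
  if (!res.ok) throw new Error(`export failed: ${res.status}`);

  // The endpoint returns newline-delimited JSON, one operation per line.
  return (await res.text())
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as PlcOperation);
}

// Collect operations into a mempool until a full bundle is available.
async function fillMempool(cursor?: string): Promise<PlcOperation[]> {
  const mempool: PlcOperation[] = [];
  while (mempool.length < BUNDLE_SIZE) {
    const page = await fetchPage(cursor);
    if (page.length === 0) break; // caught up with the directory
    mempool.push(...page);
    cursor = page[page.length - 1].createdAt;
  }
  return mempool.slice(0, BUNDLE_SIZE);
}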


2. Serialization and Content Hashing

These 10,000 operations are serialized into a newline-delimited JSON (JSONL) file. A SHA-256 hash of this file's content is then calculated. This is the content hash, which uniquely represents the data within the bundle.
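In code, the content hash is just a SHA-256 over the exact JSONL bytes. A minimal TypeScript sketch using Node's built-in crypto (the canonical serialization rules, including trailing-newline handling, live in the spec, so treat the byte layout here as an assumption):

import { createHash } from "node:crypto";

// Serialize operations as newline-delimited JSON. The byte layout must be
// deterministic: the same operations must always yield the same file, or
// the content hash will not reproduce across implementations.
function serializeBundle(operations: object[]): string {
  return operations.map((op) => JSON.stringify(op)).join("\n") + "\n";
}

// The content hash uniquely identifies the bundle's uncompressed data.
function contentHash(jsonl: string): string {
  return createHash("sha256").update(jsonl, "utf8").digest("hex");
}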


3. Compression

The JSONL file is compressed using Zstandard (.zst) to save space. A typical bundle containing 10,000 operations is around 3-5 MB compressed, down from 7-15 MB uncompressed.
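For illustration, here is how that step might look in TypeScript, assuming a recent Node.js (v22.15+ or v23.8+, where node:zlib ships Zstandard bindings); the tool itself does this in Go:

import { zstdCompressSync } from "node:zlib";
import { writeFileSync } from "node:fs";

// Compress the serialized bundle before writing it to disk.
// Per the sizes above, ~7-15 MB of JSONL shrinks to roughly 3-5 MB.
function writeCompressedBundle(path: string, jsonl: string): void {
  const compressed = zstdCompressSync(Buffer.from(jsonl, "utf8"));
  writeFileSync(path, compressed);
}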


4. Cryptographic Chaining

This is the core of plcbundle's verifiability. A chain hash is calculated for each bundle, linking it to the previous one.

The formula is:

const chainHash: string = sha256(
  parentChainHash + ":" + currentContentHash
);

The very first bundle (the "genesis bundle") uses a fixed prefix instead of a parent hash.

import { createHash } from "node:crypto";

const sha256 = (data: string): string =>
  createHash("sha256").update(data, "utf8").digest("hex");

const calculateChainHash = (parent: string, contentHash: string): string => {
  // The genesis bundle has no parent, so a fixed prefix anchors the chain.
  const data = parent
    ? `${parent}:${contentHash}`
    : `plcbundle:genesis:${contentHash}`;
  return sha256(data);
};

This means that Bundle #2's hash depends on Bundle #1's content, and Bundle #3's hash depends on Bundle #2's hash, and so on. If even a single byte changes in any previous bundle, the chain hash for all subsequent bundles will be different.
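This property makes verifying a whole repository cheap: replay the formula from the genesis bundle forward and compare each result against the recorded hash. A sketch reusing calculateChainHash from above (the field names follow the index entry shown in the next section, and I assume the genesis entry starts from an empty parent):

interface BundleMeta {
  bundle_number: number;
  content_hash: string; // SHA-256 of the uncompressed JSONL
  hash: string;         // this bundle's chain hash
}

// Recompute every chain hash from genesis forward; a single changed byte
// in any earlier bundle invalidates every link after it.
function verifyChain(bundles: BundleMeta[]): boolean {
  let parent = ""; // assumed genesis state
  for (const bundle of bundles) {
    if (calculateChainHash(parent, bundle.content_hash) !== bundle.hash) {
      console.error(`chain broken at bundle ${bundle.bundle_number}`);
      return false;
    }
    parent = bundle.hash;
  }
  return true;
}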


5. The Index

Finally, metadata for each bundle is stored in a simple JSON file, plc_bundles.json. This index contains the bundle number, start and end times, operation counts, and — most importantly — the hashes.

Here is what the metadata for a single bundle looks like in the index:

{
  "bundle_number": 42,
  "start_time": "2023-07-18T09:19:29.975Z",
  "end_time": "2023-07-21T00:46:19.802Z",
  "operation_count": 10000,
  "did_count": 9323,
  "hash": "b2b5df26c5cc46a1fe187f0f2201abe3a730663d6fc2d4d3c5150bf5d8561fd8",
  "content_hash": "7e3d58bfb9ae887747a5f2bb65e88b13a0dd9f8040b08dfdd7f4489b12974e4d",
  "parent": "67934256a398e061832a6d343d8a963383570ae6bc473e48c03eab3d8a747b71",
  "compressed_hash": "271b877e78ed31efd5f4f365b83cb338a602f58c2ac7fc3d817a0d282a169c43",
  "compressed_size": 1447100,
  "uncompressed_size": 6909659,
  "cursor": "2023-07-18T09:17:47.635Z",
  "created_at": "2025-10-28T08:11:31.359877Z"
}

Here you can check out the current index of my mirror: https://plcbundle.atscan.net/index.json
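If you want to consume an index like this programmatically, each entry maps naturally onto a typed structure. A sketch, with per-field comments that reflect my reading of the format (for instance, I assume compressed_hash is a SHA-256 of the .zst file, judging by its length):

// Typed view of one index entry, mirroring the JSON above.
interface BundleIndexEntry {
  bundle_number: number;
  start_time: string;        // timestamp of the first operation in the bundle
  end_time: string;          // timestamp of the last operation in the bundle
  operation_count: number;   // fixed at 10,000 for full bundles
  did_count: number;         // distinct DIDs appearing in this bundle
  hash: string;              // chain hash
  content_hash: string;      // SHA-256 of the uncompressed JSONL
  parent: string;            // previous bundle's chain hash
  compressed_hash: string;   // hash of the compressed .zst file (assumed SHA-256)
  compressed_size: number;   // bytes on disk
  uncompressed_size: number; // bytes of raw JSONL
  cursor: string;            // export cursor associated with this bundle
  created_at: string;        // when the bundle was built
}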



Using the plcbundle CLI

The primary way to interact with plcbundle is through its command-line tool, written in Go for performance and reliability.

@atscan.net/plcbundle
A Transparent and Verifiable Way to Sync the AT Protocol's PLC Directory
https://tangled.org/@atscan.net/plcbundle

Fetching Bundles

To start syncing the DID PLC Directory, first create a new directory for your repository, cd into it, and then run plcbundle fetch:

# 1. Create a directory for your data
mkdir plc_bundles && cd plc_bundles

# 2. Fetch all available bundles
plcbundle fetch

This will fetch operations from the PLC directory and create the bundles and index file in the current directory. It will automatically pick up where it left off on subsequent runs.


Getting Information

You can get a comprehensive overview of your local repository with the plcbundle info command.

plcbundle info --verify

# Output:
# ═══════════════════════════════════════════════════════════════
#               PLC Bundle Repository Overview
# ═══════════════════════════════════════════════════════════════
#
# 📁 Location
#    Directory:  /path/to/your/repo
#    Index:      plc_bundles.json
#
# 📊 Summary
#    Bundles:       8,841
#    Range:         000001 → 008841
#    Compressed:    32.1 GB
#    Uncompressed:  301.4 GB
#    Ratio:         9.39x compression
#
# 📅 Timeline
#    First Op:      2022-12-21 18:40:05 UTC
#    Last Op:       2024-05-23 14:20:10 UTC
#
# 🔐 Chain Verification
#    Verifying 8841 bundles...
#    ✓ Chain is valid
#    ✓ All 8841 bundles verified
#    Head: 9a7e8f...


Comparing and Verifying

The plcbundle compare command is where the transparency model shines. You can point it at the index file of any other public plcbundle mirror and instantly see if your copy matches theirs. As a live example, I maintain the first public mirror at plcbundle.atscan.net:

# Compare your local repository with a public mirror
plcbundle compare https://plcbundle.atscan.net/index.json

If there are any differences — missing bundles, extra bundles, or hash mismatches — the tool will report them. This allows you to trust but verify, ensuring the integrity of the data you are using.
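Reproducing that comparison yourself is straightforward: given your local plc_bundles.json and a mirror's index parsed into the BundleIndexEntry shape from earlier, the diff is a map lookup keyed by bundle number. A sketch:

// Report bundles that are missing locally or whose chain hashes differ.
function compareIndexes(
  local: BundleIndexEntry[],
  remote: BundleIndexEntry[],
): void {
  const mine = new Map(local.map((b) => [b.bundle_number, b] as const));
  for (const theirs of remote) {
    const ours = mine.get(theirs.bundle_number);
    if (!ours) {
      console.log(`missing bundle ${theirs.bundle_number}`);
    } else if (ours.hash !== theirs.hash) {
      console.log(`hash mismatch at bundle ${theirs.bundle_number}`);
    }
  }
}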


Serving Your Own Mirror

The Go implementation also includes a lightweight, built-in web server, available via plcbundle serve. With a single command, you can expose your local bundle repository over HTTP. This makes it incredibly simple to host your own public or private mirror, providing a reference point for others to compare against or an endpoint for your own distributed services.

# Serve your local bundles on http://localhost:8080
plcbundle serve


The Big Picture and What's Next

The primary motivation for plcbundle was to create a solid foundation for ATScan V2. But its potential goes far beyond that. It can be used for:

Public Mirrors: Anyone can host a complete, verifiable mirror of the PLC directory on any static file host.

PDS & App View Hosting: A local plcbundle mirror can provide fast, reliable DID data to a PDS or other services without constantly hitting the main PLC directory.

Research and Analysis: The complete, structured history of PLC operations is a valuable resource for researchers studying the growth and dynamics of the AT Protocol network.

The project is still in its early stages, but the Go implementation is nearing stability and is already being used to build the ATScan indexer.

If you're interested in building on the AT Protocol and need a robust way to work with DID PLC data, I encourage you to check out the project on Tangled:

Go Implementation (CLI & Library)

Example implementations in TypeScript, Python, and Ruby


Thanks for reading! 木


