What IronMesh protects, and what it doesn't.
A practical threat model for developers evaluating agent-to-agent messaging. Primitives, boundaries, failure modes, and the exact libsodium calls involved. No hand-waving.
1. Cryptographic primitives
Everything crypto in IronMesh ultimately calls libsodium via PyNaCl. No homebrew ciphers. No TLS-as-substitute-for-e2e. The primitives are:
Long-term Ed25519 identity key per node. Signs every HELLO, every outer frame (hop authentication), and optionally every inner payload (source authentication that survives relay).
Fresh ephemeral X25519 keypair per session + per rekey. Shared secret derived via libsodium crypto_box_beforenm, which rejects all known small-order points. Ephemeral private keys are secure_wiped immediately after derivation.
NaCl SecretBox with 24-byte random nonce per encrypt. Authenticated (Poly1305 MAC). Random-nonce collisions are birthday-bounded at about 2^96 encryptions per key, which is safe for any realistic session lifetime.
Stage 1 mutual passphrase proof. Constant-time comparison via hmac.compare_digest. Single-use nonce bound to the TCP connection prevents replay.
Identity private key is wrapped in keys.json under an Argon2id-derived key, with MODERATE ops/mem limits and a fresh 16-byte salt on every save.
Each audit entry's HMAC covers the previous entry's HMAC. Tamper anywhere in history = chain verification fails from that point forward. Cross-process writes are serialized by sentinel-file flock.
Small-order public keys produce a RuntimeError from libsodium rather than an exploitable all-zero shared secret. Verified as part of the v0.8.5.6 release hardening.
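The Stage 1 passphrase proof listed above can be pictured as an HMAC over a single-use server nonce, checked with hmac.compare_digest. The following is a minimal sketch using only the Python standard library. The SHA-256 hash, the 32-byte nonce size, and the function names are assumptions for illustration and are not taken from IronMesh's wire format. Only one direction of the mutual proof is shown.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Server side: a fresh single-use nonce for this connection (size assumed)."""
    return secrets.token_bytes(32)


def prove(passphrase: str, nonce: bytes) -> bytes:
    """Client side: HMAC the nonce under the shared passphrase (hash assumed)."""
    return hmac.new(passphrase.encode("utf-8"), nonce, hashlib.sha256).digest()


def check(passphrase: str, nonce: bytes, proof: bytes) -> bool:
    """Server side: recompute the proof and compare in constant time."""
    expected = prove(passphrase, nonce)
    return hmac.compare_digest(expected, proof)


nonce = new_challenge()
proof = prove("correct horse", nonce)

accepted = check("correct horse", nonce, proof)
rejected_wrong_passphrase = check("wrong horse", nonce, proof)
# A proof captured on one connection is useless against a new nonce.
rejected_replay = check("correct horse", new_challenge(), proof)
```

Because the nonce is generated per connection, a recorded proof does not verify against any later challenge, which is the replay property the primitive list describes.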
2. Threat model
| Adversary capability | IronMesh covers it? |
|---|---|
| Passive network observer on the LAN | Yes — all post-handshake traffic is SecretBox-encrypted. |
| Active MITM on first contact | Yes — passphrase HMAC over server nonce, then HELLO signed by long-term identity key with channel binding to the same nonce. |
| Identity-key theft (attacker copied keys.json) | Partial: attacker can impersonate the node until you revoke. Forward secrecy protects past sessions; TOFU mismatch on new peers surfaces the issue. Rotation + revocation supported. |
| Compromised peer trying to escalate via capability set change | Yes (v0.8.5.6+) — capability-set binding demotes to pending-cap-change on any hash change; inbound messages queue until operator re-promotes. |
| Cross-transport replay (WebSocket then Reticulum, or vice versa) | Yes (v0.8.5.6+) — dedup cache tracks origin transport per (source, msg_id); a duplicate on a different transport fires a dedicated audit event. |
| Colliding writer on the same trust-store file | Yes (v0.8.5.6+) — MAC-mismatch latches the trust store read-only; refuses to save and thereby refuses to overwrite a file written by a different identity. |
| SIGKILL / power loss mid-write of state files | Yes — every state file (keys.json, known_peers.json, routes.json, capabilities.json) uses atomic tmp + fsync + rename. Audit log also uses fsync per entry and leading-newline recovery. |
| Selective-drop / frame-inject MITM after handshake | Partial — every frame is individually signed and SecretBox-authenticated, so single-frame tampering is caught. Rolling transcript hash (designed, lands in v0.9) closes the multi-frame selective-drop gap. |
| Traffic analysis / metadata (who talks to whom, when) | Out of scope — IronMesh is not an anonymity system. If the threat is "a LAN observer shouldn't see that nodes A and B talk at 14:37," you want mixnets, not IronMesh. |
| Side-channel attacks on the underlying crypto library | Out of scope — IronMesh relies on libsodium. If libsodium has a timing side-channel, so does IronMesh. |
| Malicious operator with physical / shell access to a node | Out of scope — at-rest encryption slows this down but does not stop it. OS-level isolation is your problem, not the protocol's. |
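The "atomic tmp + fsync + rename" pattern named in the SIGKILL / power-loss row is a general technique, sketched below with the Python standard library. The helper name and the JSON payload are hypothetical; this is not IronMesh's actual state-file code.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data: dict) -> None:
    """Write data to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays
    # on one filesystem and is therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic swap of old file for new
    except BaseException:
        # On any failure, remove the temp file and leave the old file intact.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "known_peers.json")
atomic_write_json(target, {"peer-a": "trusted"})
atomic_write_json(target, {"peer-a": "trusted", "peer-b": "pending"})

with open(target, encoding="utf-8") as fh:
    loaded = json.load(fh)
leftovers = [name for name in os.listdir(workdir) if name != "known_peers.json"]
```

A crash before os.replace leaves the previous file untouched; a crash after it leaves the complete new file. Some implementations also fsync the containing directory so the rename itself survives power loss; that step is omitted here.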
3. TOFU pinning + capability-set binding
IronMesh uses Trust-On-First-Use pinning for peer identity. The first time a node sees a peer, it records the peer's Ed25519 public key. Every subsequent connection verifies the key hasn't changed. A mismatch emits EVENT_TOFU_MISMATCH and refuses the connection — exactly the pattern SSH uses for host keys.
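As an illustration of the pinning rule, here is a minimal in-memory sketch. The PinStore class, the TofuMismatch exception, and the return strings are hypothetical names; the real trust store is persisted and MAC-protected, which this sketch does not attempt.

```python
import hmac


class TofuMismatch(Exception):
    """Raised when a known peer presents a different public key."""


class PinStore:
    def __init__(self):
        self._pins = {}  # node_id -> pinned public key bytes

    def check(self, node_id: str, public_key: bytes) -> str:
        pinned = self._pins.get(node_id)
        if pinned is None:
            # First contact: record the key and trust it from now on.
            self._pins[node_id] = public_key
            return "pinned"
        if hmac.compare_digest(pinned, public_key):
            return "match"
        # Key changed: refuse the connection rather than silently re-pin.
        raise TofuMismatch(node_id)


store = PinStore()
first = store.check("node-a", b"\x01" * 32)
again = store.check("node-a", b"\x01" * 32)
try:
    store.check("node-a", b"\x02" * 32)
    mismatch_refused = False
except TofuMismatch:
    mismatch_refused = True
```

The important design choice is that a mismatch never overwrites the pin; recovering from a legitimate key change has to go through an explicit operator action.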
v0.8.5.6 extends this with capability-set binding: the trust record also stores a canonical SHA-256 hash of every capability the peer advertises. When a peer reconnects with a changed capability set, IronMesh demotes it to pending-cap-change and queues inbound messages until an operator re-promotes via ironmesh trust cap-promote.
The canonical form matters: capability tokens are stripped, deduplicated, sorted lexicographically, joined with \n, and prefixed with a domain separator before hashing. Stable against reordering and whitespace, case-sensitive by design. Proven invariant against arbitrary reorder + duplication by Hypothesis fuzz tests.
# TrustStore pseudocode
canonical = "\n".join(sorted(unique(strip(c) for c in caps)))
observed_hash = SHA256(b"ironmesh-cap-hash-v1:" + canonical.encode())
if observed_hash != stored_baseline_hash:
    demote_to("pending-cap-change")
    stash_pending(observed_hash, observed_set)
    log(EVENT_PEER_CAP_SET_CHANGED, {"added": [...], "removed": [...]})
    # inbound messages queue at the daemon until an operator
    # runs `ironmesh trust cap-promote <node_id>`
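The canonicalization described above can be made concrete with hashlib. This is a runnable sketch of the stated recipe (strip, deduplicate, sort, join with \n, prefix with the domain separator). The UTF-8 encoding, the hex output, and the function names are assumptions, and the text does not say how empty tokens are handled, so the sketch keeps them.

```python
import hashlib

DOMAIN_SEPARATOR = b"ironmesh-cap-hash-v1:"


def capability_hash(caps) -> str:
    """Hash a capability set in canonical form (order and duplicates ignored)."""
    tokens = sorted({c.strip() for c in caps})  # strip, dedupe, sort
    canonical = DOMAIN_SEPARATOR + "\n".join(tokens).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def capability_diff(old, new) -> dict:
    """Compute the added/removed sets an audit event could carry."""
    old_set = {c.strip() for c in old}
    new_set = {c.strip() for c in new}
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
    }


baseline = capability_hash(["summarize", "translate"])
reordered = capability_hash([" translate", "summarize ", "summarize"])
case_changed = capability_hash(["Summarize", "translate"])
escalated = capability_hash(["summarize", "translate", "shell.exec"])
diff = capability_diff(["summarize", "translate"], ["summarize", "shell.exec"])
```

Here baseline and reordered are equal, while case_changed and escalated each differ from baseline, matching the "stable against reordering and whitespace, case-sensitive by design" behavior.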
4. Forward secrecy + rekey
Every session derives its SecretBox key from a fresh X25519 ECDH between ephemeral keypairs — not the long-term identity key. The ephemeral private keys are secure_wiped from memory immediately after the shared secret is derived.
Consequence: if an attacker compromises a node's long-term identity key today, they cannot decrypt any session IronMesh had yesterday. The only exploitable window is from-now-on until you rotate.
Long-lived sessions rekey periodically (configurable, default 30 minutes) via REKEY_REQUEST / REKEY_RESPONSE. Each rekey generates fresh ephemeral keys on both sides, wipes the old ones, and resets the per-session sequence counter + replay window. A captured session key from minute 32 doesn't decrypt anything from minutes 0–30.
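The bookkeeping side of a rekey (fresh key, counter reset, replay window reset) can be sketched as follows. This is not the REKEY_REQUEST / REKEY_RESPONSE exchange itself: the Session class is hypothetical, os.urandom stands in for the key that would really come from a fresh ephemeral X25519 exchange, and the replay window is simplified to a set of seen sequence numbers.

```python
import os
import time


class Session:
    def __init__(self, rekey_interval_s: float = 30 * 60, clock=time.monotonic):
        self._clock = clock
        self.rekey_interval_s = rekey_interval_s
        self._install_fresh_key()

    def _install_fresh_key(self) -> None:
        # Placeholder only: a real session derives this from a fresh
        # ephemeral key exchange, then wipes the ephemeral private keys.
        self.key = os.urandom(32)
        self.send_seq = 0          # per-session sequence counter
        self.seen = set()          # simplified replay window
        self.keyed_at = self._clock()

    def needs_rekey(self) -> bool:
        return self._clock() - self.keyed_at >= self.rekey_interval_s

    def accept(self, seq: int) -> bool:
        """Return False for a sequence number already seen under this key."""
        if seq in self.seen:
            return False
        self.seen.add(seq)
        return True

    def rekey(self) -> None:
        self._install_fresh_key()


now = [0.0]                         # injectable clock for the demonstration
session = Session(clock=lambda: now[0])
first_key = session.key
first_accept = session.accept(1)
replay_accept = session.accept(1)

now[0] = 31 * 60                    # 31 minutes later
due = session.needs_rekey()
session.rekey()
key_changed = session.key != first_key
accept_after_rekey = session.accept(1)
```

Resetting the replay window is safe because frames recorded under the old key no longer authenticate under the new one, so sequence number 1 can be reused.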
5. Tamper-evident audit log
Every security-relevant event — key rotations, TOFU outcomes, authentication failures, peer connects/disconnects, capability changes, operator actions — is appended to ~/.ironmesh/audit.log with an HMAC-SHA256 chain. Each entry's HMAC covers the previous entry's HMAC, so tampering anywhere in history breaks verification from that point forward.
Verify integrity with:
$ ironmesh audit verify
OK · verified 7135 entries
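To make the chaining property concrete, here is a small sketch of an HMAC-SHA256 chain with a verifier. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration; the on-disk format of audit.log is not described here.

```python
import hashlib
import hmac
import json

GENESIS = b"\x00" * 32  # assumed starting value for the first entry


def entry_mac(key: bytes, prev_mac: bytes, event: dict) -> bytes:
    """MAC over the previous entry's MAC plus this entry's canonical body."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, prev_mac + body, hashlib.sha256).digest()


def append(log: list, key: bytes, event: dict) -> None:
    prev = bytes.fromhex(log[-1]["mac"]) if log else GENESIS
    log.append({"event": event, "mac": entry_mac(key, prev, event).hex()})


def first_broken_index(log: list, key: bytes):
    """Return the index of the first entry that fails, or None if intact."""
    prev = GENESIS
    for index, entry in enumerate(log):
        expected = entry_mac(key, prev, entry["event"])
        if not hmac.compare_digest(expected.hex(), entry["mac"]):
            return index
        prev = expected
    return None


key = b"demo-audit-key"
log = []
for name in ("PEER_PROMOTED", "PEER_BLOCKED", "PEER_REVOKED_LOCAL"):
    append(log, key, {"name": name})

intact = first_broken_index(log, key)

tampered = json.loads(json.dumps(log))       # deep copy
tampered[1]["event"]["name"] = "PEER_PROMOTED"
broken_at = first_broken_index(tampered, key)
```

Editing entry 1 invalidates its MAC, and an attacker without the key cannot recompute it or any later MAC, so verification stops at the first modified entry.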
v0.8.5.6 hardening added:
- Cross-process sentinel-file exclusive lock around every write (fcntl.flock on POSIX, msvcrt.locking on Windows) so multiple processes on one host can't race and corrupt the chain.
- Re-read chain tail under the lock before computing the next HMAC; this defeats the stale-cache corruption that was possible even with in-process locking.
- Single-syscall os.write + fsync per entry for torn-write resistance.
- Leading-newline recovery: if the previous write was truncated mid-line by a SIGKILL, the next write prepends a \n so the new entry doesn't concatenate onto the torn line.
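The hardening steps above combine into a short write path. The sketch below is POSIX-only (it uses fcntl.flock), and the ".lock" sentinel name and locked_append helper are hypothetical. It omits the HMAC computation, which would happen after re-reading the tail while the lock is held.

```python
import fcntl
import os
import tempfile


def locked_append(log_path: str, line: bytes) -> None:
    """Append one entry under an exclusive cross-process lock."""
    lock_fd = os.open(log_path + ".lock", os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)  # blocks until we own the lock
        fd = os.open(log_path, os.O_CREAT | os.O_RDWR | os.O_APPEND, 0o600)
        try:
            # Inspect the tail under the lock, never from a cached copy.
            size = os.fstat(fd).st_size
            torn = size > 0 and os.pread(fd, 1, size - 1) != b"\n"
            # Leading-newline recovery keeps the new entry off a torn line.
            payload = (b"\n" if torn else b"") + line + b"\n"
            os.write(fd, payload)  # one write call for the whole entry
            os.fsync(fd)
        finally:
            os.close(fd)
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)
        os.close(lock_fd)


log_path = os.path.join(tempfile.mkdtemp(), "audit.log")
locked_append(log_path, b"entry-1")

# Simulate a write that was killed mid-line (no trailing newline).
with open(log_path, "ab") as fh:
    fh.write(b"entry-2-par")

locked_append(log_path, b"entry-3")

with open(log_path, "rb") as fh:
    lines = fh.read().split(b"\n")
```

After the simulated crash, the torn fragment stays on its own line and entry-3 starts cleanly, so a verifier can flag one damaged line instead of losing two entries.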
6. Operator surface + pending-trust gate
v0.8.5 introduced a pending-trust message gate (opt-in; default-on in v0.9). When enabled, new peers discovered via mDNS or mesh routing are pinned in pending state rather than trusted. Inbound MSG frames from a pending peer queue at the daemon; the operator sees them in the dashboard and explicitly promotes the peer before messages deliver.
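The gate's behavior can be modeled in a few lines. This sketch is a simplified illustration, not the daemon's implementation: the PendingGate class and its method names are hypothetical, it treats unknown peers as pending, and it leaves out persistence, blocking, and audit events.

```python
from collections import defaultdict, deque


class PendingGate:
    def __init__(self):
        self.state = {}                      # node_id -> "pending" | "trusted"
        self.queued = defaultdict(deque)     # held messages per pending peer
        self.delivered = []                  # (node_id, payload) in delivery order

    def on_discovered(self, node_id: str) -> None:
        # Newly discovered peers start pending; existing state is preserved.
        self.state.setdefault(node_id, "pending")

    def on_message(self, node_id: str, payload: str) -> None:
        if self.state.get(node_id) == "trusted":
            self.delivered.append((node_id, payload))
        else:
            self.queued[node_id].append(payload)

    def promote(self, node_id: str) -> None:
        # Operator decision: trust the peer and release its queue in order.
        self.state[node_id] = "trusted"
        while self.queued[node_id]:
            self.delivered.append((node_id, self.queued[node_id].popleft()))


gate = PendingGate()
gate.on_discovered("node-b")
gate.on_message("node-b", "hello")
gate.on_message("node-b", "task: summarize")
held_before_promote = len(gate.queued["node-b"])
delivered_before_promote = list(gate.delivered)

gate.promote("node-b")
gate.on_message("node-b", "follow-up")
```

Nothing reaches the application until the operator promotes the peer, and queued messages are then released in arrival order.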
Every state transition leaves an audit trail. As of v0.8.5.6, every CLI-driven trust mutation emits an audit event with an actor: "cli" marker:
| Operator action | Audit event |
|---|---|
| ironmesh trust set-state <id> trusted | PEER_PROMOTED |
| ironmesh trust set-state <id> blocked | PEER_BLOCKED |
| ironmesh trust set-state <id> pending | PEER_STATE_CHANGED |
| ironmesh trust revoke <id> (local) | PEER_REVOKED_LOCAL |
| ironmesh trust revoke --broadcast | Signed REVOCATION message (network-propagated) |
| ironmesh trust cap-promote <id> | PEER_CAP_ACCEPTED |
7. Reporting a vulnerability
If you believe you've found a security issue in IronMesh, please:
- Email info@ironmesh.org with a short description — please do NOT open a public GitHub issue for the initial report.
- Include reproduction steps (minimal CLI or Python snippet is ideal), expected vs. actual behavior, and the IronMesh version.
- For critical issues, expect an acknowledgement within 48 hours and a coordinated disclosure timeline after that.
The full security policy is in SECURITY.md on GitHub.