Earthstar + Willow in June '23
Published on June 9, 2023 by Sam Gwilym
Earthstar LAN sync
Earthstar v10.2.0 can now automatically discover other Earthstar peers on a local network and sync with them over an encrypted TCP connection. This feature is enabled by the new DNS-SD implementation which I wrote about in the last update, as well as some super-annoying streaming encryption stuff I wrote which uses WebCrypto APIs.
Earthstar's Peer class has a new discover method, which you can pass different discovery services (such as DiscoveryLAN). This returns an asynchronous iterator of discovered peers, which you can choose to sync with or not:
// Start a new discovery service, and advertise ourselves as "My Computer". const discoveryLan = new DiscoveryLAN({ name: "My Computer" }); // Wait for new discovery service events... for await (const discoveryEvent of peer.discover(discoveryLan)) { if (discoveryEvent.kind === "PEER DISCOVERED") { // We'll sync with everyone we find! discoveryEvent.sync(); } }
After banging my head against multicast DNS for two months, I thought building this would be a welcome change of pace. Unfortunately I almost instantly found myself implementing an encrypted TCP transport so that LAN sync sessions would not be carried out in the clear.
The encryption scheme is on well-trodden ground: both peers derive a common key from a Diffie Hellman key exchange and then use that to encrypt and decrypt messages with AES-GCM. This is good enough, but I have been told it is vulnerable to MITM attacks where peers sync over a network with hardware controlled by an adversary. So think before you sync™.
This is a feature I've wanted for a long time, and I hope that the new discovery APIs can open the door to other ways of finding peers (e.g. bluetooth, LoRa, DHTs).
With that done, Aljoscha and I have started working in earnest on Willow.
Willow progress: summarisable storage
(I'm not sure 'summarisable' is a word that is used, but let's do me a favour and say it is).
Willow is a new general-purpose sync protocol built upon ideas from Earthstar. Willow pays much closer attention to efficiency, is more flexible, and adds some exciting new features too. Earthstar 11 will in turn be built upon Willow!
Willow’s sync will work with range-based set reconciliation, which is the same technique used in the current version of Earthstar. This technique relies on peers comparing summaries of their data with each other: when summaries match, they know they’re in sync; when they don’t, they compare summaries of smaller and smaller slices of their data until they identify the discrepancy and exchange data.
Building something that can compute these summaries is not difficult, but building something which can compute them efficiently is quite a bit harder.
Earthstar currently uses a specialised data structure to efficiently compute these summaries. But this data structure — a monoid red black tree — is only kept in memory. This means that this special data structure has to be created from scratch fairly often! And the bigger the data set, the longer it takes. This can seriously impact how long sync takes in Earthstar 10.
Ideally we’d like to merge persistent data storage with summary data to create ‘summarisable storage’. This results in quicker sync, and less energy and memory used.
If we want to make this work for Earthstar someday (I do), the challenge is finding a data structure which lends itself well to efficiently computing data summaries and which can be persisted in commonly available storage APIs available in browsers, Deno, and Node. Urgh.
Monoid skip lists
A skip list is a probabilistic data structure made for fast search, which looks a bit like this:
The nodes which ‘skip’ over many nodes on the layer below are what enables the fast search. Instead of traversing 5 elements, you could traverse over 1 instead. It's... computer science!
The nodes which ‘skip’ over many nodes on the layer below are what enables the fast search. Instead of traversing 5 elements, you could traverse over 1 instead. It's... computer science!
Efficient summary computation relies on precomputation of some range’s summaries and reusing those over time. Then you can compute the same summary by combining fewer parts than you otherwise would.
What you want is a data structure which offers a intuitive way to represent groupings of many elements. For instance... a skip list node which skips over many elements below?
For every skip list item, we precompute a 'label' which can be used later to create a summary of any given range. So we can improve the speed of summary computation the same way a skip list speeds up its search!
And there's more! A skip list can be handily represented and accessed within a key-value store like LevelDB. Or IndexedDB. Or Deno.KV. So keeping parity with the runtimes Earthstar can run on is solved.
Best of all, I've now written one of these things in TypeScript, filling in one of the biggest gaps we had in our JavaScript Willow implementation.
There's still a lot to do, but the project is moving at a pace I feel good about. Until next time!