Open Collective
October 2023 update
Published on October 5, 2023 by Andre Staltz

Hi backers,

Thank you once again for being a vital (and reliable!) part of this project. First, I want to inform you of a small release we made of the app, containing a feature that prevents malicious use cases:

  • 🔷 Show dialog when pressing unrecognized links

Next, I want to talk about garbage collection in our new protocol, codenamed PPPPP. This is not a bonus feature of the protocol; it is in fact the main motivator, if you recall what I said back in March.

A system that is designed to grow indefinitely will inevitably lead to business models and then eventually to surveillance capitalism. The costs of storage grow to a point where you need to create an income stream to support them, and one obvious way of monetizing a large social database is to commoditize user data. Things just evolve beyond the capacity of end-user devices, and the only way to keep up is to manage bigger machines. As an example, in the early days of Bitcoin, it used to be possible to mine on a laptop, but now you need several specialized machines, and this only makes sense if you make income from it.

We want to design PPPPP to be always safely sustained by end-user devices, and the occasional hobbyist server with tiny storage requirements. Garbage collection ensures that data is frequently deleted. This is not merely about deleting data, which is something we already have in SSB to some degree. It also involves making sure that the data is not re-fetched. We shouldn't simultaneously delete data and request to replicate it.

So what I did this past month was to devise a system that controls what data should be fetched and what should be deleted. It starts by tracking goals: for each tangle, declare how much of it you want to replicate (if any at all). This is the basis for the replication module (tangle-sync) and the garbage collection module (gc). They function as a duo: when there is a goal with missing messages in the database, we perform replication; when the database has a message with no matching goal, we perform garbage collection. One example of a goal is "newest-100", which says that we're only interested in the 100 most recent messages for a particular tangle, say for instance Bob's emoji reactions. The dual system will then delete Bob's emoji reactions that are too old while replicating newer reactions, synchronizing the state of the database with this "newest-100" goal.
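To make the duo concrete, here is a minimal sketch of how a goal could classify a tangle's messages into "fetch" and "delete" sets. All names here (Goal, classify, the depth-based tangle model) are hypothetical illustrations, not the actual ppppp APIs:

```typescript
// Hypothetical goal types: replicate everything, only the newest N, or nothing.
type Goal =
  | { type: "all" }
  | { type: "newest"; count: number }
  | { type: "none" };

// Decide, per message position ("depth") in a tangle, what should be present.
// Depths wanted but missing locally => replicate (tangle-sync's job).
// Depths present locally but unwanted => delete (gc's job).
function classify(
  goal: Goal,
  tangleDepths: number[],   // depths known to exist in the tangle
  haveDepths: Set<number>,  // depths we already store locally
): { fetch: number[]; drop: number[] } {
  const maxDepth = Math.max(...tangleDepths);
  const wanted = new Set<number>(
    goal.type === "all"
      ? tangleDepths
      : goal.type === "newest"
        ? tangleDepths.filter((d) => d > maxDepth - goal.count)
        : [],
  );
  return {
    fetch: [...wanted].filter((d) => !haveDepths.has(d)),
    drop: [...haveDepths].filter((d) => !wanted.has(d)),
  };
}

// Example: a "newest-100" goal over a tangle with depths 1..150
// (say, Bob's reactions), where we currently store depths 1..120.
const depths = Array.from({ length: 150 }, (_, i) => i + 1);
const have = new Set(depths.slice(0, 120));
const { fetch, drop } = classify({ type: "newest", count: 100 }, depths, have);
console.log(fetch.length); // 30 newer messages to replicate (depths 121..150)
console.log(drop.length);  // 50 old messages to garbage-collect (depths 1..50)
```

The key property is that both decisions derive from the same goal, so the two modules can never fight each other: a message slated for deletion is by definition outside the goal, and therefore never re-requested.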

This is now implemented in the aforementioned modules, and I believe it'll work. To finish GC, I still need to work on the replication/deletion of "CRDT" messages for records and sets, which is slightly more involved algorithmically. But I'm confident that the "goals" system with the "dual" replication/gc modules is a robust way of approaching this problem. I don't foresee many problems here, which is good because we're approaching full implementation of the protocol.

Before I close this update, there are also a few minor things I coded that are worth mentioning:

  • Improvements in the db module
  • Upgrading ppppp-record and ppppp-set to the new db module after 3 months of not touching them
  • Changing/improving the API of the two modules above

Our updated roadmap/TODO diagram looks exciting:

Warm greetings,

– @andrestaltz