CTHULU beats CRUD 2016-05-04

The classic model for data in computing is known as CRUD, for:

Create
Read
Update
Delete

When you have a world model that has a single mutable current state on a single device, this makes a certain amount of sense; you need to be able to make things, read them back, change them and remove them.

As soon as you step outside this worldview, you immediately hit a host of problems. The first is atomicity: if more than one process is acting on the data, you are set up for race conditions. These are not just the simple create/read/delete ones, but the dreaded Read/Modify/Write race that is baked in by the common naïve implementation of Update, where two clients read the same state and the later write silently discards the earlier one.
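To make the Read/Modify/Write race concrete, here is a minimal Python sketch. The in-memory store, the two "clients" and the VersionedStore class are all hypothetical names made up for illustration; the version check shown is one common mitigation, not something the CRUD model itself provides.

```python
# Hypothetical in-memory store; all names here are illustrative only.
store = {"counter": 10}

# Naive Update: two clients each Read, Modify locally, then Write back.
a = store["counter"]          # client A reads 10
b = store["counter"]          # client B reads 10
store["counter"] = a + 1      # A writes 11
store["counter"] = b + 5      # B writes 15; A's change is silently lost
naive_result = store["counter"]


class VersionedStore:
    """One possible mitigation: reject writes based on a stale read."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version):
        # Accept the write only if nobody else has written since the read.
        if expected_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


vs = VersionedStore(10)
val_a, ver_a = vs.read()
val_b, ver_b = vs.read()
ok_a = vs.write(val_a + 1, ver_a)   # accepted
ok_b = vs.write(val_b + 5, ver_b)   # rejected: B must re-read and retry
```

In the naive case the store ends up holding only B's change. With the version check, B is told its read was stale instead of overwriting A.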

In addition, you hit a granularity and comprehensiveness problem. With CRUD, you are assuming that any process given permission has a consistent understanding of the data structure - the Read/Update cycle implicitly assumes that the reader has wholly parsed the structure, modified it, and sent back a consistent version.

This can interact badly, even when both sides of the transaction are able to do this, if the user model is inconsistent. An example of this is when Google Contacts was first integrated with the iPhone address book. The iPhone fetched all the Google Contacts entries and put them in the phone. However, Gmail added every email address you had previously sent mail to into the contacts list, for future autocomplete. The iPhone helpfully copied these across, filling the iPhone with stray addresses from years ago. Seeing this, iPhone users cleaned them up manually, grumbling about it as their phone app was no longer useful. Later, they tried to use Gmail on the desktop and found that it no longer autocompleted.

With a 'keep all the versions' worldview they could have rolled back.

This becomes even more difficult in the IndieWeb POSSE world, where you expect to post in one place and have it replicated elsewhere. In this environment actually deleting information is tricky; remote copies are often made by pulling feeds or lists as well as by point publishing or edit events, and communicating an absence in a list is difficult, especially when the common model is to only show the most recent few updates.
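A small Python sketch of why absence is hard to communicate, assuming a hypothetical feed that only shows the three most recent posts (the post ids and the window size are made up for illustration):

```python
def recent(posts, n):
    """Return the n most recent posts, as a 'latest few' feed would."""
    return posts[-n:]


published = ["p1", "p2", "p3", "p4", "p5"]

# A subscriber pulls the feed and keeps a copy of what it saw.
subscriber_copy = set(recent(published, 3))     # p3, p4, p5

# Later the author deletes p4 and publishes two new posts.
published.remove("p4")
published += ["p6", "p7"]

# The subscriber pulls again and compares against its copy.
window = recent(published, 3)                   # p5, p6, p7
absent = subscriber_copy - set(window)          # p3 and p4
```

Both p3 and p4 are missing from the second pull, but p3 merely scrolled off the end while p4 was deleted. From the feed alone the subscriber cannot tell the two cases apart.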

Tantek has proposed CDURU for this, but that still has problems propagating the deleted entities. What we need is to send tombstones as well as death notices.

A tombstone is what we call it when we replace an actual data entry with a notice that it has been deleted. It keeps the same position in the list and unique identifier, but replaces the contents with empty versions or warning text, and adds a deleted date.
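As a rough sketch of that description in Python: the Entry type, its field names and the sample feed below are assumptions for illustration, not a published format.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Entry:
    uid: str                              # unique identifier, never changes
    content: str
    deleted: Optional[datetime] = None    # set only on tombstones


def tombstone(entry: Entry, when: datetime) -> Entry:
    """Return a tombstone: same uid, emptied content, deletion date added."""
    return replace(entry, content="", deleted=when)


feed = [
    Entry("post-1", "Hello"),
    Entry("post-2", "Oops"),
    Entry("post-3", "Later"),
]

when = datetime(2016, 5, 4, tzinfo=timezone.utc)

# Replace post-2 in place so it keeps its position in the list.
feed = [tombstone(e, when) if e.uid == "post-2" else e for e in feed]
```

The list still has three entries in the same order; the middle one now carries only its identifier and the deletion date, which is what a subscriber needs in order to remove its own copy.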

This brings me to the stages of CTHULU:

Create (new entry)
Tombstone (replace the entry with a tombstone)
Hide (once the tombstone has propagated, suppress it from the list)
Unhide (if we want to restore the entry, revert it to the original)
List (the subscribed copy needs a list of entries to spot which ones are gone)
Update (old fashioned changing of the entry)
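The stages above can be sketched as operations on a small in-memory feed. This is only one possible reading, written in Python; the Feed and Record classes, the method names and the choice to keep every version (so that Unhide has an original to go back to) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    uid: str
    versions: List[str] = field(default_factory=list)  # oldest first
    deleted: Optional[str] = None    # deletion date, set when tombstoned
    hidden: bool = False


class Feed:
    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}   # dicts keep insertion order

    def create(self, uid: str, content: str) -> None:
        self.records[uid] = Record(uid, [content])

    def update(self, uid: str, content: str) -> None:
        # Append rather than overwrite, so older versions survive.
        self.records[uid].versions.append(content)

    def tombstone(self, uid: str, date: str) -> None:
        self.records[uid].deleted = date

    def hide(self, uid: str) -> None:
        rec = self.records[uid]
        if rec.deleted is None:
            raise ValueError("only tombstoned entries can be hidden")
        rec.hidden = True

    def unhide(self, uid: str) -> None:
        # Restore the entry: the stored versions were never discarded.
        rec = self.records[uid]
        rec.deleted = None
        rec.hidden = False

    def list_entries(self) -> List[Tuple[str, Optional[str], Optional[str]]]:
        """Published view: (uid, content, deleted date), tombstones included."""
        out = []
        for rec in self.records.values():
            if rec.hidden:
                continue
            if rec.deleted is not None:
                out.append((rec.uid, None, rec.deleted))
            else:
                out.append((rec.uid, rec.versions[-1], None))
        return out


feed = Feed()
feed.create("a", "first")
feed.create("b", "second")
feed.update("a", "first, edited")

feed.tombstone("b", "2016-05-04")
after_tombstone = feed.list_entries()   # "b" is listed as a tombstone

feed.hide("b")
after_hide = feed.list_entries()        # "b" no longer appears

feed.unhide("b")
after_unhide = feed.list_entries()      # "b" is back with its content
```

Nothing is ever destroyed in this sketch: Tombstone and Hide only change what the list shows, which is why Unhide can bring the entry back.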

For a concrete example, consider the common posting case for a personal site. You Create a post, and add it to the feed list on your homepage (and other feed formats, if you do that). Then you send out notice via PubSubHubbub or POSSE, and Webmention any linked pages so they see it too. Then Google, archive.org, woodwind.xyz and other crawlers fetch a copy of the post page and the feed page too.

In other words, your post now exists in many places online, cached by others. If you Update your post, you go through this process again, and the remote copies eventually get updated too (some places, like archive.org, keep a public history of them).