A True History of the Internet: March 2023

Thursday, March 30, 2023

moratorium bibliotecha

It is time for writers and publishers to pause in their ceaseless efforts to produce more and more books.

The Large Library Machine has reached 14M texts, each of which contains up to 100,000 words, all usually carefully structured in such a way as to convey many truths, much wisdom, but also so many ideas that are just plain wrong. But how is the poor reader supposed to separate the wheat from the chaff? Even the index would take a lifetime to read, and that too has gaps and repetitions.

There are so many of these books that the business of organising them has taken over from tasks like farming, tailoring, doctoring, house building and other essential tasks, such that the population is going unfed, in rags, and constantly sniffling, from the constant breezes blowing through the gaps in the walls.

But no-one can find the right instructions for when to sew, when to plant, when to harvest, what could be used to treat the common cold, or re-plaster the walls.

No, we must call a temporary end to the constant creation of more and more written material, until we have had time to have a fair and open debate on how to accommodate this scourge without the damage to the social fabric that it has caused.

We need metrics. For example, we have no idea how to tell if a particular book will fit in a particular person's head. Nor do we know what size of library would overwhelm any person in the three score years and ten of their natural lifetime, or what would be a delicate sufficiency (assuming occasional re-reading of timeless classics) for most of us. Time to temporarily suspend the generation of new tomes. No more novels for, shall we say, six months. After all, we know that there are only 9 plots and 3 kinds of hero, with a handful of possible counterfactual realities, and merely 4 theme tunes in the televised adaptation.

Indeed, we could probably use the time to thin the libraries down to those works that have proved their worth. The rest could be used to stuff the gaps in the walls, or to weave winter warming underwear, or even to blow one's nose.

databox 2.0 - the manifesto

databox and hat were two personal data store projects (sometimes "edge cloud", sometimes "self-sovereign data") that were trying to solve the practical and business challenges perceived nearly a decade ago, challenges also recognized in the web community in their Solid architecture.

while all these initiatives were somewhat successful in their own terms, what they've not managed to do is actually create a platform that others (e.g. mastodon or matrix) could "just" use.

Why not? What's missing?

Manageability. Compared with the cloud (centralised?), edge systems lack some properties which are really non-optional for storage and compute services today (30th March 2023)...

What follows is a start on a list of what data centers and cloud computing achieved, but which edge platforms have not really delivered. Centralised systems provide these properties using techniques that could be re-applied at the edge, but one of the key missing pieces is what makes cloud "scale out", which is amortising the cost of providing these mechanisms over many, many users. In the edge case you have precisely the opposite: the cost of participation increases as you add users, as opposed to (sorry :-) exploiting Metcalfe's law of net scaling, that the value of the net increases super-linearly with the addition of each "user", since they can also offer a "service", i.e. super-additive value, rather than just adding a burden on the overall community. Let's be specific: this isn't about content, this is about compute and storage demand. Of course, a decentralised or federated system could still exploit users' content, although that is counter-indicated by most people's motives for using such systems. But the fabric has to provide:

availability - access networks and edge devices fail. So do core and metro links, and system units and racks in data centers, but much less frequently. Masking failures typically involves adding redundancy, e.g. multihoming systems and running replicas. Replicas need to be synchronised, typically via some consensus scheme (or CRDT). In the decentralised case this is harder, because failures are more complex and the number of replicas may need to be higher (I have seen estimates that you need about 6-fold replication to be as good as a cloud provider, who might use at most 3-fold).
trustworthiness - a cloud provider makes a deal with you and can be trusted, or else goes out of business. Your peer group in an edge cloud may well include bad actors, so now you need Byzantine fault tolerance, which is a lot harder than just having majority consensus.
persistence - cloud providers have mostly shown that they are not "fly by night" (or the ones that are have gone bust). Personal devices come & go: I think I've seen estimates that people change smartphone about every 2-3 years (many contracts are designed around that). While newer devices have more storage, they typically arrive with very simple cloud-based sync with old devices (though Android & iOS can do device-to-device sync, of course). Users also have tablets, laptops and workplace systems that all need syncing, and they expect long-term survival of their content (e.g. personal content like email/messaging, photos/videos, even music, films, games etc.) as well as of any ongoing processing. Cloud-based systems have amortized the costs of backup/replicas and, indeed, of deploying new storage tech at a wholly different scale from the edge. New technologies (synthetic DNA, silica/glass storage) are already being looked at for the next steps in that context, but are years away from being affordable/usable for edge devices (if ever, given their slow write times... do edge users have the patience or personal persistence to take care of that?).
capacity - while newer devices have impressive storage, that just makes things worse, since one moves from just taking photos on the phone, to HD video and on to augmented reality and so on.
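
To make the replication estimate under "availability" concrete, here is a rough back-of-envelope sketch. The per-node failure probabilities (1 in 100 for a cloud node, 1 in 10 for an edge device) are purely illustrative assumptions, not measurements, and the model optimistically assumes replicas fail independently and that any one live replica is enough to serve the data. The function name is hypothetical.

```python
from fractions import Fraction


def replicas_needed(node_failure, target_failure):
    """Smallest n with node_failure**n <= target_failure.

    Assumes independent failures and that one live replica suffices.
    """
    if not (0 < node_failure < 1) or target_failure <= 0:
        raise ValueError("need 0 < node_failure < 1 and target_failure > 0")
    n = 1
    while node_failure ** n > target_failure:
        n += 1
    return n


# Illustrative assumptions only: exact fractions avoid float rounding.
cloud_node_failure = Fraction(1, 100)  # assumed chance a cloud node is down
edge_node_failure = Fraction(1, 10)    # assumed chance an edge device is down

# Chance that all three replicas of a 3-fold cloud deployment are down.
cloud_target = cloud_node_failure ** 3

print(replicas_needed(edge_node_failure, cloud_target))  # 6
```

With these made-up numbers the edge needs twice as many replicas as the cloud to match it, which is in the same ballpark as the estimate quoted above. Real edge failures are often correlated (same home network, same power cut), which would push the number higher.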

There are no doubt other things one could add (complexity of key management not the least, although recent thinking in that area might provide some solutions), but for me this needs a concerted effort to re-think the job. If I only depend on other users' edge devices, then how do I find the 5 other users, and what is in it for them, to replicate my services so that I can continue to see them wherever and whenever I need? How do we deal with continuous upgrades over vast numbers of disparate end users' systems, rather than vast numbers of systems in a small number of administrations? What are the ways one can seek redress in such a world, for data loss or breach of confidentiality? And on and on...

So one approach that is promising is what has happened with Mastodon, which is only "somewhat decentralised" in that each instance serves potentially quite a few people, and indeed, may not be running on the Raspberry Pi in someone's attic, but on a cloud infrastructure service. The key point is who controls access to applications & content, not where the bare metal compute is located. At least for now. Indeed, even if one distributed the computers, one is still largely dependent on centralised electricity generation (ok, so I have solar, but how many people do...?).

So decentralised keys, with somewhat centralised but migratable services, looks pretty ok to me... to be getting on with.

There's a temptation to claim that a centralised system can emulate the federated/decentralised properties concerning control/ownership/access by encrypting data in storage, transmission, and processing (enclaves, homomorphic encryption or secure multiparty computation), hence partitioning the data by key/function management rather than by spatial/device ownership. While this might work in terms of data use, it doesn't deal with several centralised system failure models, including (non-exhaustive list): denial of service (deletion, intentionally or by central organisation failure/business exit, change of T&C etc.); and a small gene pool of software, hence a likely vulnerability that strikes all customers of the central service at once (a data breach might not happen, but it might too, whereas such an attack on a large heterogeneous system would be far less likely to impact as many users). However, on the other side of that coin, if I want decentralised systems that use replication that is itself heterogeneous, I need to have the copies run on other people's systems, and those people might not be any more trustworthy than the cloud/central providers. So one might still want enclaves and homomorphic crypto on the social-net friends' replica systems, but the crypto then militates against some of the energy savings of being at the "edge". Indeed, edge-to-edge communications might be very much more complex than edge<->central, which the network has been optimised for. Replica consistency in the edge case is a relatively new thing, and consensus protocols haven't been stress-tested anywhere near as much as would be necessary to say they'd be as good as the simpler cloud/data center replication environment's use of algorithms like Raft etc.
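
The cost of running replicas on untrusted peers can be sketched with the standard textbook bounds: a crash-tolerant majority protocol (Raft-style) needs 2f+1 replicas to survive f crashed nodes, while classic Byzantine fault tolerant protocols need 3f+1 replicas to survive f arbitrarily misbehaving nodes. The function names below are hypothetical, and this is only the arithmetic, not a protocol.

```python
def crash_faults_tolerated(n):
    """Crashed replicas a majority-quorum protocol survives with n replicas."""
    return (n - 1) // 2


def byzantine_faults_tolerated(n):
    """Misbehaving replicas a classic BFT protocol survives with n replicas."""
    return (n - 1) // 3


def min_replicas_crash(f):
    """Replicas needed to survive f crash faults (majority quorum)."""
    return 2 * f + 1


def min_replicas_byzantine(f):
    """Replicas needed to survive f Byzantine faults."""
    return 3 * f + 1


for n in (3, 6):
    print(n, crash_faults_tolerated(n), byzantine_faults_tolerated(n))
```

Under these bounds, 3 trusted replicas survive one crash but no Byzantine fault at all, and even 6 replicas on friends' devices survive only one bad actor, so higher replication at the edge buys less than it first appears.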

So it is not quite so simple to compare decentralised high availability systems with centralised high confidentiality systems.

Monday, March 27, 2023

the seven laws of robotherm

Laws of Robots v Thermodynamics

with apologies to wikipedia.

1 A robot may not injure a human being or, through inaction, allow a human being to come to harm.

1. the total energy of an isolated system is constant; energy can be transformed from one form to another, but can be neither created nor destroyed.

Add human, and make it conservation of life

2 A robot must obey orders given it by human beings except where such orders would conflict with the First Law.

2. When two initially isolated systems in separate but nearby regions of space, each in thermodynamic equilibrium with itself but not necessarily with each other, are then allowed to interact, they will eventually reach a mutual thermodynamic equilibrium.

Add human, and make it irreversibility

(aka associative, commutative, etc)

3 A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

3. A system's entropy approaches a constant value as its temperature approaches absolute zero.

Add humans, make it equilibrium or perhaps, fairness

0 “A robot may not harm humanity, or, by inaction, allow humanity to come to harm.”

0 If two systems are both in thermal equilibrium with a third system, then they are in thermal equilibrium with each other.

Add human, and make it a scale-freeness property

(society is an ensemble of individuals)

Tuesday, March 21, 2023

federation, for interoperability, needs to be decentralized.

We're starting to see how open interoperability requirements might be a constructive way forward to reducing the harms potentially caused by oligopolies in the tech sector. This, especially in the EU, is one response to the DMA, and seems perhaps less painful than enforcing the break-up of big, successful, and sometimes innovative companies or their services.

However, if proposed platforms for interoperation are centralised, and they are in the least bit successful, for example in the domain of secure group messaging, as in this IETF work, then those platforms will need to scale to the same size as a significant subset of the users of the systems they interconnect. That raises the question, "what is to stop them just becoming another one of the big few?", or indeed replacing several of them, leading to an even smaller gene pool. If they can solve the security and UX challenges in bridging all the services (and their meta-protocols/data like key management/onboarding, mutual trust management, spam/abuse management, etc.), then they are as good as, or perhaps better than, the sum of all their parts.

We can prevent this trend by simply designing the interoperability platform as a pure federation service, which is infrastructureless: it can run in clients only, or it can run as a classic P2P or decentralized service piggybacking on relatively inexpensive servers that many users have already deployed (e.g. use of Raspberry Pi for Mastodon servers is already a thing). Given the separation of concerns, though, there's no reason not to run a decentralised application protocol service on a choice of centralised cloud providers (infrastructure-as-a-service providers, that is, not AAAS).

We have built another example of such a thing for digital identity called trustchain, as a proof-of-concept.
