Monday, January 29, 2024

Centralization, Decentralization, and Digital Public Infrastructures



with apologies to Mark Nottingham,

Through the Control/Power Lens

Governments have typically centralized control of the things that governments do: raising tax to fund provision of certain services such as education, health, transportation, defence, and even, in the not too distant past, telecommunications. Decentralised government (e.g. syndicalism) has been rare. On the other hand, most governments in recent history have left domains outside of government to markets or communities, although not without some (perhaps limited) regulation or control of governance.

In the past, communities have built cooperative ventures (shared barns, shared savings and loans) and more recently community networks and power grids.

Through the Economic Lens

Markets often espouse competition where multiple providers offer  equivalent products and services. Various models exist for central versus decentralised economies.

Government and economy interact through regulation, especially where there is a threat of monopoly, or even just oligopoly, and through coercion: government versus companies, with respect to making sure the market operates transparently, efficiently and fairly (see later, feds v. Apple, and in the UK, IPA v. GDPR).

Through law, government may also provide citizens with Agency, Representation and Redress. Of course, there are good and bad governments, and typically this shows up as poor practice, or as deliberate removal of rights (to agency or redress, e.g. concerning unfair treatment, including exclusion).

Governments may be good now, but bad later, or vice versa. It is no accident that Germany has the strictest privacy laws in the world: they are a result of the country's past experience of East Germany under the Stasi. Germans are not so naive as to believe that this couldn't happen again one day (sadly).

Through the Technology Lens

The Internet is probably the best example of something that had been largely a decentralised system for decades.

Horizontal - services - interoperation/federation

E-mail, web, name spaces

Vertical - stacks - silos

cloud, social media, online shopping, entertainment

Horizontal systems are somewhat decentralised (or at least distributed) whilst in some informational sense, vertical systems are somewhat centralised.

Through the Information Lens

Where data is, is orthogonal to who can read it and who can alter it. Ownership and control depend on access, access control, and legibility. So whether I can get at my data, or yours, depends on my role and my privilege level, but whether I can then actually decode your data also depends on my having the right software.

At some level, we can expect most data today to be encrypted: at rest, during transfer and even while processing. Protection through access control is not sufficient, since there are mistakes, insider attacks and coercion. Software has vulnerabilities. Hence we employ keys, and encryption/decryption depends on having both the data and the relevant keys.

One step further, who has accessed the data, and who has been able to decode it, is part of auditability (who can see who can see: Quis custodiet ipsos custodes? etc).

If the user controls the keys, they may not care too much where the data is (except for potential denial of access), since others copying the data will not be able to decode it. On the other hand, if the government keeps copies of keys, then they can access any relevant data, whether it is central or decentralised. Of course, if the government accesses my data on my computer, I may be aware of that (through audit trails), but that might not do me much good in the face of a "bad" government.

There are two separate aspects of identity systems where visibility of data matters, in terms of threats to citizens from bad actors. Firstly, foundational id provides linkability across surveillance of actions (voting, signing on to services, etc), so it exposes the individual's digital footprint to long term analysis. Secondly, functional id includes particular attributes (age, gender, race, religion, licenses to operate vehicles, medical records, academic qualifications, etc), which offer the opportunity to discriminate (treating some groups preferentially, or excluding or reducing the rights of other groups). A bad actor doesn't need the whole government (or its service providers) to misbehave, just for systems to be so poorly designed that insiders can exploit vulnerabilities. The perception of this possibility is enough to create distrust and disengagement, which itself will militate against vulnerable groups in society more than privileged ones.

Through the Efficiency Lens

We can put all the data in the world in one place, or we can leave it where it was originally gathered. This is a choice that represents two points on a spectrum of centralized versus decentralized data. One can also copy the data to multiple places.

There are efficiency considerations in making this choice: taking the centralised path entails more energy, higher latency, lower resilience, a larger attack surface, and the potential for catastrophic mistakes. The decentralised path reduces these risks, but still requires one to consider copies for personal data resilience.

These choices are orthogonal to the access choices, which merely concern who has rights and keys, and where they keep those, not who holds data where.

Conclusions, regarding Alternative Solutions in the Digital Identity Space

A digital public infrastructure such as an identity system needs to be trusted (so that people use it), and therefore considerations about whether or not the user base trusts the government matter.

If we don't trust the government, we might choose a decentralised system, or at least a system with decentralised keys (like the Apple iCloud eco-system).

The question of whether there should be one provider, or six, or 10 billion is orthogonal to this trust, although it does impact resilience and latency, i.e. efficiency. If the keys are owned by users, then this impacts governments' ability to use identity data (attributes, and identity usage) to plan, whether for good or bad. That said, some privacy technology (e.g. FHE or MCS) combined with decentralised learning might allow non-privacy-invasive statistics to be gathered by a centralised agency (i.e. government) without actual access to individually identifying attributes. A good example of this was the Google Apple Exposure Notification system designed for digital contact tracing during Covid, which could have been adapted to offer statistical information (e.g. differentially private) if necessary (though it wasn't used that way in practice).
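As a rough illustration of what "differentially private" statistical information could look like, here is a minimal sketch of the textbook Laplace mechanism applied to a simple count. It is not how the Exposure Notification system worked, and the names `laplace_noise`, `private_count`, `flags` and `epsilon` are hypothetical, chosen only for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(flags, epsilon, rng):
    """Return a noisy count of True flags (Laplace mechanism, sensitivity 1)."""
    true_count = sum(1 for flag in flags if flag)
    # Adding or removing one person changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2024)
    # Hypothetical data: 30 of 100 people hold some attribute.
    flags = [True] * 30 + [False] * 70
    print(private_count(flags, epsilon=0.5, rng=rng))
```

A smaller `epsilon` adds more noise and so gives more privacy. The agency sees only the noisy total, never the individual flags, and the total is still accurate on average.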

All of this leads to the question of who provides key management, and a related question of certification (i.e. why should we trust the key management software too). One solution to this is to provide a small (e.g. national scale) set of identity services, but a decentralised key management system that can also be used to federate across all the identity services (cross-border, or between state and private sector). One technology that we built to provide that independent key management for identity systems is trustchain[1], which is a prototype that serves to replace a (somewhat) centrally owned platform such as Web PKI.

An interesting oligopolistic system that offers somewhat decentralised certificates is the Certificate Transparency network (of about 6 providers) that sign keys for the Internet -- this arose because the previously centralised CAs were hit by attacks which caused major security breaches in the Internet. We would argue that a similar scale system for key management and certification for digital identity is evidently the bare minimum for acceptability for any trustworthy system.

Whether the system infrastructure itself is decentralised or not is a separate question which concerns efficiency, and, perhaps, some types of resilience (Estonian Digital Citizenship systems are distributed over several countries for backup/defensive/disaster recovery reasons).

[1] trustchain is a prototype that is based on ION and makes parsimonious use of the bitcoin proof-of-work network to provide decentralised trustworthy time, and then can create/issue keys in a way not dependent on any central provider or service, resilient to coercion, collusion and sybil attacks. We are currently investigating replacing the proof-of-work component with TimeFabric, which itself depends on a ledger, but can use a proof-of-stake or proof-of-authority and is therefore massively more sustainable.

Thursday, January 11, 2024

Replacing the Turing Test with the Menard Test

In Borges's short short tale, "Pierre Menard, Author of the Quixote", he reports on the astonishing tale of the early 20th-century author, who lives a life so exemplary, in the literal sense that it is exemplary of what the author of the classic work, Don Quixote, should be like, that when Menard produces the Quixote, it is not a copy of the work by Cervantes, but a better work, despite being word-for-word identical. It is not a copy: it was made through the creative efforts of Menard, based on his experiences, knowledge and skills.

Imagine an AI that was trained in the world, not on a large corpus of text, so that it didn't just acquire a statistical model of text, but acquired an inner life, and could then use that inner life to create new works.

Imagine such an AI was able to produce, for example, a book called Don Quixote, without having read the work by Cervantes.

That AI would necessarily contain a model of Cervantes, or at least something that had many of the same elements.

This model of a creative human is quite different from a model of lots of blocks of text, which can be regurgitated with many small variations, but is, of necessity, merely a stochastic parrot.

Were one to interrogate the true creative AI, it might respond with other works that Cervantes might have written, if he were still around.

A similar AI with an inner life, that modelled, say, Schubert, might be capable of completing symphony number 8, or another, with the "eye" of Jackson Pollock, might move from abstract expressionism, to hyper-realism one day.

Such AIs might be able to introspect (e.g. in the manner of Alfred Hitchcock, when interviewed by François Truffaut about why he used certain approaches for scenes in his films).

Such systems would really be interesting, and not rote learned in how to pass trivial Turing Tests.

Tuesday, January 02, 2024

AI predictions with the possibility of fairness?

There's a bunch of work on impossibility results associated with machine learning and trying to achieve "fairness". The bottom line is that if there is some characteristic that splits the population, and the sub-populations have different prevalence of some other characteristic, then designing a fair predictor that doesn't effectively discriminate against one or other sub-population isn't feasible.

One key paper on the impossibility result covers this (the alternative is to build a "perfect" predictor, which is kind of infeasible).
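To make the arithmetic behind this concrete, here is a small sketch using a standard confusion-matrix identity, FPR = p/(1-p) × (1-PPV)/PPV × (1-FNR), where p is the prevalence within a group. The group prevalences and error rates below are hypothetical numbers picked for illustration, and `false_positive_rate` is a name made up for this example.

```python
def false_positive_rate(prevalence, ppv, fnr):
    """False positive rate implied by a group's prevalence, PPV and FNR.

    Follows from PPV = TP / (TP + FP), with TP = p * (1 - FNR)
    and FP = (1 - p) * FPR as fractions of the group.
    """
    return (prevalence / (1.0 - prevalence)) * ((1.0 - ppv) / ppv) * (1.0 - fnr)


# Hypothetical predictor that is "equally fair" to both groups on two measures:
# the same positive predictive value and the same false negative rate.
ppv, fnr = 0.8, 0.2

# Hypothetical sub-populations with different prevalence of the predicted trait.
fpr_a = false_positive_rate(prevalence=0.5, ppv=ppv, fnr=fnr)
fpr_b = false_positive_rate(prevalence=0.2, ppv=ppv, fnr=fnr)

print(f"group A false positive rate: {fpr_a:.3f}")
print(f"group B false positive rate: {fpr_b:.3f}")
```

With these numbers, group A's false positive rate comes out at roughly four times group B's. Holding two of the fairness measures equal across groups forces the third apart whenever the prevalences differ, unless the predictor makes no errors at all.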

On the other hand, some empirical studies show that this can be mitigated by building a more approximate predictor/classifier, perhaps, for example, by employing split groups and even trying to achieve "fair affirmative action". This sounds like a plan, but (I think; please correct me if I am wrong) it assumes that you can

  • work out which group an individual should belong to
  • know the difference in prevalence between the sub-groups
This also suggests to me that it might be worth looking at causal inference over all the dimensions, to see if we can even determine some external factors that need policy intervention to, perhaps, move the sub-populations towards having equal prevalence of those other characteristics (high school grade outcomes, risk of re-offending, choose your use case).

I guess one very important  value of the work above is to make these things more transparent, however the policy/stats evolve.
