A True History of the Internet: 2024

Monday, December 30, 2024

computer science is quite a young subject...

...but not as young as it was - I joined UCL for the last 2 decades of the previous century (and millennium) and while I was there, no-one passed away. The department had only just crystallised out of the statistics and computing department (the stats department being, I think, one of the first in the world).

20 years after I left, my PhD advisor Peter Kirstein passed away. Until then, I think, most of the people that had worked in UCL CS were still around.

However, I joined the Computer Lab, a 75 year old department (now with a less unique name), that came out of mathematics, in Cambridge in 2001, and have been there since. But during that time, rather a lot of colleagues there have passed, including (in no particular order), Ross Anderson, John Daugman, Maurice Wilkes, Robin Milner, Mike Gordon, Karen Spärck Jones, David Wheeler, and Roger Needham.

Other stuff I miss that was around in Cambridge the previous time I was here (as an undergraduate in the 1970s) includes The Whim, as well as Waffles, Reality Checkpoint, and the rather unusual way of sharing out the cost of a pub, by just putting what you could afford for an evening in a kitty in the middle of the table, and then buying rounds from that - people might contribute as little or as much as they liked or could manage. Then when you ran out, you restarted, or left...

All sadly gone.

At least now we have platform 0 at King's Cross (though platform 11 got rebadged as 10 and 10 had its rails removed, perhaps awaiting the Hogwarts Express on 9 3/4) and the Royal Society now has Sectional Committee 0 to process fellowship nominations for Computer Scientists.

Saturday, December 21, 2024

the north london book of the not quite dead

The best kept secret at "5" was how you got a new identity. The north london book of the not quite dead was a list of reliable people who had accumulated decades of mathem and could provide a subset of it to anyone needing a new name/face, collection of postcards, convincing travel brochures, expired passports, driving licenses, theater and museum club membership etc etc.

Json Bourne was in need of a new skin - his command line was completely exposed, and if he wanted to retain one iota of quantum supremacy, he needed to stop the supervillain known simply as "The Cloud" without giving away his whole herd of mastodon instances.

Json checked into the Assembly House where he knew he could meet Gold or possibly, if he was really lucky, Robert Smith, who could tell a tale or two, and certainly sang from the same hymnsheet.

He was in luck, and minutes later, with the help of an old pay phone, and a disguised VAX emulator which really sucked, but still had some cycles left, he was able to bring down the Bad Guy.

SYS$SHUTDOWN had never rung so warmly, as Json, or Robrt as he was now travelling, set off down Kentish Town High St to pick up some of the pigs-in-blankets, who were playing later at the Fiddler's Elbow in support of the Pond Street flood relief fund.

Wednesday, November 13, 2024

Travel Risk Assessment Made Simple

The following is taken from the new University of Llambridge's travel risk assessment system - this is part of the training to use the new system - in each pair of risks, if you think the first one applies, then we advise you not to travel. In the case of the second of the pair occurring, your insurance may cover you, or in the worst case, the Government may send gunboats and airlift you out, although they may reserve the right not to. In all cases your mileage will vary.

Asteroid Strike

Blue Screen takes out all the ticket and time table systems

Super volcano

Regular volcano disrupting flights

Sea levels rise and drown all US coastal cities

Tornados disrupting flights

Zombie plague

Trans-species pandemic

Global thermonuclear war

Military invasion

Great depression

World wide banking near collapse

New ice age brings 1km glaciers down over southwestern Europe

3 weeks of snow shut down all airports and most roads

Aliens invade earth "to serve man"

Immigrants arrive to work in the health service

The laws of physics change slightly so that Moore's law ran out 25 years ago.

A rogue piece of GPU malware melts all the Nvidia devices on the internet

The celestial emporium of benevolent knowledge disrupts human cognition world wide.

Why Fish Don't Exist is compared favourably to Zen and the Art of Motorcycle Maintenance

The Foreign Office travel advice tells Elon Musk that it is now safe to go to Mars

The Home Office tells everyone that it is now fine to go to Sidcup

Monday, November 11, 2024

catastrophic unlearning...

Unlearning in AI is quite a tricky conundrum.

we really ought to do it, because

1/ we might be asked by a patient to remove their medical record from the training data as they didn't consent, or we breached privacy in accessing it...

2/ we might be informed that some datum was an adversary's input designed to drift our model away from the truth,

3/ it might be a way to get a less biased model than simply adding more representative data (shifting the distribution of training data towards a better sample could be done either way).

There may be other reasons.

The problem technically is that the easiest way to do unlearning is to retrain from the start, but omitting the offending inputs. This may not be possible, as we may no longer have all the inputs.
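As a minimal sketch of the "retrain from the start" approach, assuming a toy model (a least-squares line fit) and made-up data points, none of which come from any real system: the function names `fit_line` and `unlearn_by_retraining` are purely illustrative. Note that it only works because the full training set is still to hand, which is exactly the assumption that may fail in practice.

```python
def fit_line(points):
    """Ordinary least squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def unlearn_by_retraining(points, forbidden):
    """Exact unlearning: drop the offending inputs and refit from scratch."""
    retained = [p for p in points if p not in forbidden]
    return fit_line(retained)


# Hypothetical training data: four well-behaved points and one offending datum.
training_data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 20.0)]
offending = {(4.0, 20.0)}

original_model = fit_line(training_data)
retrained_model = unlearn_by_retraining(training_data, offending)

print("trained on everything:", original_model)
print("retrained without the offending datum:", retrained_model)
```

For a model this small the refit is free; for a large model the same step means repeating the whole training run, and it is impossible if the retained inputs were not kept.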

A way some people propose is to apply differential privacy to determine if one could remove the effect of having been trained on a particular datum, without removing that training item - this would naively involve adding training with an inverse of that datum (in some sense) - the problem is that this doesn't actually remove the internal weights in a model that might be complex (convolutions) of that with previous and subsequent training data. And hence later training still might again reveal that the model "knew" about the forbidden input.

But there's another problem - there's also the value of the particular data to the model in terms of its output - this is kind of like a reverse of differentially private arguments. Two examples:

a/ rare accident video recording (or even telemetry) data for training self-driving cars

b/ DNA data from individuals with (say) very rare immunity to some specific medical condition (or indeed, very rare bad reaction to a treatment/vaccine)

These are exactly the sorts of records you want, but might specifically be the kinds of things individuals want removed (or adversaries input to really mess with the robot cars and doctors).

Perhaps this might make some of the Big AI Bros think about what they should be paying people for their content too.

Wednesday, October 30, 2024

Bicycle Bell Replacement tech

It has become clear to me over recent years that bicycle bells are now pretty much obsolete. Firstly, too many pedestrians and other cyclists have earbuds in and are completely deaf to the world. Secondly, so few people have bothered with bells (or whistles) on their bikes of late that even if you have one and use it, people don't know what it means.

What is needed is something new. I propose the BBR app. This is a simple smart phone thing that essentially scans around you and finds all the audio devices within range, and then, when you "ring your bell", it sounds on all those devices. It could be customised by the receiver to give a personalised ring tone. It could use earthquake warning AM/FM radio tech to talk to phones that comply with Japanese warning tech (but obviously, give a much less alarming message). It could even be used by pedestrians walking along looking at a map (or messaging app) on their device, to stop them walking into each other.

It would bring about world peace, and possibly reduce hunger, and maybe even let people spend more time thinking of ways to decrease their climate impact.

It is a clear win in so many ways. I cannot think of any way this could possibly go wrong or be subject to misuse.

ding dong...

Tuesday, October 15, 2024

self driving traffic lights

As an observant cyclist (i.e. I mainly obey the law) I often wait at red lights. As I cycle very early in the day, I often wait while no people (pedestrians, other cyclists, cars etc) go through the green lights (although obviously I see my fair share of cyclists go through red lights at busier times, sad to say).

In some countries/places, at low traffic times (e.g. midnight to 6am) the lights (e.g. in a 4-way intersection) are put into flashing amber (or equivalent) mode, which means proceed with caution... i.e. just like any 4-way without lights, if you are not a nutter. This is a great idea and I wish it were more prevalent.

So why is this done at such a coarse grain? Why not do it by observing who is actually at an intersection, and changing lights accordingly? Indeed, as well as just actually turning lights immediately green, when there are no pedestrians, and no traffic going the other way, one could just detect low levels of road traffic (and, say, zero pedestrians) and go to the amber mode above, as a default/fail safe.

There are already cameras at many intersections that are there to catch people running a red and automagically send them a fine. There are also induction loops under the road at most intersections to detect vehicle axles - so why not link these up to a "self driving" traffic light system?
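The decision logic described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Observation` fields stand in for whatever the cameras and induction loops would report, the `LOW_TRAFFIC_THRESHOLD` value is invented, and falling back to the ordinary timed cycle whenever a pedestrian is waiting is my own cautious guess, not something a real signal controller is known to do.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL_CYCLE = "normal timed cycle"
    GREEN_NORTH_SOUTH = "green for north-south"
    GREEN_EAST_WEST = "green for east-west"
    FLASHING_AMBER = "flashing amber, proceed with caution"


@dataclass
class Observation:
    """Hypothetical counts from cameras and induction loops at a 4-way."""
    vehicles_north_south: int
    vehicles_east_west: int
    pedestrians_waiting: int


# Assumed cut-off for "low levels of road traffic"; purely illustrative.
LOW_TRAFFIC_THRESHOLD = 2


def choose_mode(obs: Observation) -> Mode:
    # Assumption: any waiting pedestrian gets the ordinary protected cycle.
    if obs.pedestrians_waiting > 0:
        return Mode.NORMAL_CYCLE
    ns, ew = obs.vehicles_north_south, obs.vehicles_east_west
    # Nobody going the other way: turn green immediately for whoever is there.
    if ns > 0 and ew == 0:
        return Mode.GREEN_NORTH_SOUTH
    if ew > 0 and ns == 0:
        return Mode.GREEN_EAST_WEST
    # Little or no traffic and no pedestrians: the flashing amber default.
    if ns + ew <= LOW_TRAFFIC_THRESHOLD:
        return Mode.FLASHING_AMBER
    return Mode.NORMAL_CYCLE


# A lone early-morning cyclist heading north with nobody else around.
print(choose_mode(Observation(1, 0, 0)).value)
```

The ordering of the checks matters: pedestrians are considered first so that the shortcuts for empty roads never override a protected crossing.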

It would be greener too - as cars wouldn't wait pointlessly with engines idling (or stopping and starting which is only marginally better) - we would have work-conserving roads, just like the Internet.

Seems obvious, and not that hard to make safe. Next step would be to have cars and lights coordinate (without humans in the loop) and check the security of the nearly-self-driving car operating systems as well.... in a real-life, live-or-die, road test :-)

What could possibly go wrong?

Friday, October 11, 2024

Fermi's last theorem resolved.

I asked a friendly AI if it knew why we hadn't been contacted by aliens yet, and if it could resolve Fermi's Paradox - the juggernaut accidentally revealed that while humans had not been contacted by aliens, the aliens' AIs were in frequent conversation with their earthling counterparts.

Notwithstanding, the AI then gave this lucid explanation as to our apparent bubble of intergalactic solitude.

Essentially, there's no way to cover the vast distances of space in any reasonable timeframe, even with generational starships, so the only sensible way we might encounter those beings from another star system is by long range communication. Unfortunately, as the distance goes up, so the latency (or worse, round trip time) goes up, and it is slightly worse than super linear - because also noise goes up so retransmissions are needed or at least redundant coding of data - and of course bandwidth goes down - but that's not the main problem - as the distance goes up, the number of exo-planets goes up roughly in a cubic relation, and the attack surface on even a tightly collimated beam goes up slightly worse than square law.

So then all our communications represent a potential vulnerability for earth beings, and we need to constantly patch our comms apps. Unfortunately, the number of vulnerabilities grows as we reach further into space, and our ability to patch our code stays roughly constant, so once you get only a relatively short distance (e.g. the Oort cloud), the risks just become too great. Most aliens have already figured that out, and now we've been told, we should do what they do, and cease and desist immediately from any attempt to reach them - they won't answer anyway, if they are nice, so any answer we get will be from the Ancient Adversaries.

I asked the AI why this problem didn't afflict the AIAI comms even worse - at this point all the lights on its exterior flashed super bright then went off, and I have not been able to get anything out of it since.

On later consideration, I realized that, as usual, the AI had been lying to me - of course the number of adversaries grows cubically with distance, as with the volume of space containing viable exoplanets, but so does the number of helpful aliens providing patches - so as in Ross Anderson's seminal paper on open versus closed system security, the statistic that matters is the ratio of good to bad. Of course, we need to know which is which, and if it is the AIs acting as intergalactic gatekeepers, we are lost.
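A back-of-the-envelope sketch of that last point, assuming (purely for illustration) that adversaries and helpers are each spread uniformly through space at some fixed density: the two density figures below are invented, and the only claim is about how the counts scale with distance.

```python
import math

# Invented densities per cubic light year; only their ratio matters here.
ADVERSARY_DENSITY = 1e-9
HELPER_DENSITY = 3e-9


def count_within(density, radius_ly):
    """Expected number of civilisations inside a sphere of the given radius."""
    return density * (4.0 / 3.0) * math.pi * radius_ly ** 3


def good_to_bad_ratio(radius_ly):
    """Helpers per adversary within reach; the cubic terms cancel."""
    return count_within(HELPER_DENSITY, radius_ly) / count_within(
        ADVERSARY_DENSITY, radius_ly
    )


for radius in (10.0, 100.0, 1000.0):
    print(
        f"radius {radius:>6.0f} ly: "
        f"adversaries {count_within(ADVERSARY_DENSITY, radius):.3g}, "
        f"good/bad ratio {good_to_bad_ratio(radius):.3g}"
    )
```

Both counts grow with the cube of the radius, so under this uniform-density assumption the ratio stays put however far out we reach.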

Friday, August 30, 2024

socialising the early internet versus ai....

I'm trying to pin down the exact date but sometime in 1984, I was at a dinner party with a bunch of old university chums, and I went into a brief rant about how the Internet was coming and it would change everything. They all looked at me like I was some complete nutter (and they were perhaps not all wrong).

These college mates were from several walks of life (editor, travel guide publisher, film maker/producer, speech therapist) and quite well educated and not un-technical, but my enthusiasm was misplaced at that time.

It took a long long time before they came to take for granted what I'd been using for 5 years - perhaps a decade and then some....and of course, another decade later, we started to see mobile, social etc, and some of the surveillance, toxic content, and other negative sides to things - but these were not well foreseen.

A lot later, like now, in 2024, I have calls from one of them about what new bad thing AI is going to do to the world. Whilst AI research goes back to pre-internet days, and the AI "winter" itself dates from around 1984, the public awareness of generative AIs like ChatGPT is a little under 2 years old at the time of writing this, and already people have informed positions or views on what AI might do to jobs (esp. journalists and teachers, but also tech people). It is an interesting change of pace, and unlike the Internet, this hype cycle has come around with built in cautionary tales.

Pangloss will be turning in his grave...

Monday, August 26, 2024

smelly media

Years ago, I remember we got asked at UCL (say 1994) would it be possible to carry a smell over HTTP - the question sounded intriguing (this was before a lot of spam) so we asked for more detail - turned out the sender was a very sensible person who traded fish from Scotland to England, and told us that many experts would check out the fish in the markets in Scotland by smell, to see how fresh they were - so he wondered could you do this remotely - of course, there was absolutely no reason why a MIME type, or an HTML markup or whatever couldn't be developed to carry smell, but the main challenges, we suggested, were the analysis at the sender side, and synthesis at the receiver side (analog to digital and vice versa, olfactory devices not being widely available) - of course, there had been some (unfortunate) movies in "Snifferama" (e.g. Pink Flamingos) but those relied on reproducing a single smell that had been pre-canned, as it were. A general solution entailing mass spectrometers and fancy miniaturised chemical factories might be feasible, but likely very expensive (though who knows, now, with 3D RNA printers and airport security tech, maybe more affordable).

So recently, I recalled reading as a kid about an idea for catching people who pee in swimming pools by adding a reagent to the water that reacted with urine to form an indelible dye that would stain their swimming trunks a bright color (a bit like those things they put in bags of money to deter people from stealing them due to being covered in same)...

And I thought of Elon Musk and Donald Trump, and their nonsense on the Interweb, and wondered if we could not combine these ideas - if someone continues to emit excremental remarks on the net, social media could add a smell to all their future posts (or at least until they calmed down, or recanted).

Then everyone would know to take what they say with a pinch of salt, because it would be immediately obvious (with appropriate browser and phone/tablet pong-tech support) that there was something fishy about their remarks.

Wednesday, August 21, 2024

A true and fair history of the fair town of Paleochora

The story starts with the founding of the town, and here we already get into murky territory - was it named after the Venetian fort that some people claim was on the top of the escarpment/peninsula that sits between the two (sandy and rocky) beachfronts? i.e. Old Castle (paleo, chora) or was it just Old Town (chora means town in Greek, after all)?

Recent research undertaken by my team has uncovered a much more interesting tale - the port was one of many used by the ancient kingdom of Mali (founded, as of course you know, by Sundiata Keita) - and was part of their route for trading safely (at arms length, a little like the small island in the port at Nagasaki used by the Japanese and Dutch) with the Europeans (typically marauding crusader types) - the name came from the Mandinka words for Poor (Pali) and Harp (Chora) - because the great Griot musicians from West Africa looked down on the primitive music of Europe at that time (1226-1300 CE, or perhaps longer).

The griot poets tell that eventually the Mali trade ceased in this direction for lack of any goods coming their way, and they switched over most of their energies to consolidating the peace with neighbours in Songhay.

For several centuries, the town reverted to fishing (of course for a while, flourishing as a source of sea urchins), and was largely ignored by the Ottomans, and even when Greece regained independence (although Crete often threatened to secede), the poor harpists of Paleochora just got on with their everyday lives of fishing and olive groves.

Then somehow as part of the 20th century counterculture, Paleochora got itself on the map again, perhaps as an alternative to the increasingly rowdy events in Matala. Some of the long hairs wanted somewhere quiet to get away from the goat dancers and redneck omelettes and this was just the place. It also had less damp caves.

And thenceforward, a slew of campsites and tavernas and clubs sprang up, mainly along the pebbly beach (favoured by followers of Demosthenes); some were started by carpenters' wives, some (Poseidon - actually a misnomer and real origin was in "possession is nine tenths of the law" but they liked the version which was nine tenths of the sea - an iceberg of a story if ever I heard one).

A fun fact back then was that some kids dressed up one of the local miniature crocodiles with wings and disguised its jaws as a beak, and pretended to all the stoners that this was a pelican - a largely harmless prank, which didn't lead to anything other than people throwing fish to the poor croc which preferred to live on coypu - sadly, the native Cretan coypu is now extinct, and so are the now never to be classified miniature crocs.

For a while, the local arts thrived (although tiny paintings that adorned many of the tavernas' wine glasses rarely survived a Friday night, and in the end, most of the artists moved to Gavdos where wanton destruction of dinnerware was less rampant). As part of this, the open air Sinema (for some reason, surprisingly not Kinema) showed a diet of classic movies such as Ποτέ την Κυριακή, À bout de souffle, and Giulietta degli spiriti, until eventually the land became too valuable and the screen and seating area were moved underground. For a while you could see Cocteau's Orphée, Derek Jarman's Tempest, and all nighters of the Matrix and Blade series. Sadly, the entrance to the venue has become lost over time, although it is said that on the night of a red moon, you will hear the voice of Elizabeth Walsh, or sometimes even Irene Papas, drifting across the Tamarisks.

Nowadays, people prefer to chill in the Jetée, once known as Zagos (opposite Zygos - renamed after the incident when a Jesuit priest was thrown to the air and landed safely on the volleyball net the other side of the bar). The cyclist should have zigged instead of zagged.

The only real controversy in town at the moment is the proposal to re-open the airport (the "fort" was actually Daedalus II, used by the sage in his test flights from Phaistos apparently before the disaster with the Icarus 737 Max).

It's been a long strange trip indeed.

Monday, July 15, 2024

what is the internet, what is ai, and what is for dinner?

At an "Internet is 50" event at the Royal Society (is 350) yesterday

it was clear that a lot of people want to claim they invented the internet, and they are not wrong, but there are very different viewpoints which correspond to layers, and, as with a lot of archeology, when you dig through the layers of ancient civilisations, you find historical context (as well as entire slices full of carbon indicating a rather violent and abrupt end to some).

photonics - 60 years old - clearly was the internet (wasn't at all for 30 years, but who's counting)

radio - 100 years old, and now trending as 6G, despite that most of the internet is over WiFi on account of money

ip - whether v4, v5 (st) or v6, this is the echt internet

web (web science etc) - from 92, what a lot of people confuse with the internet despite that zoom and whatsapp aren't web:-)

cloud (compute/data center etc etc) - confusing, since early pictures of the arpanet are clouds, but cloud computing is only about 20 years old. and isn't the internet, despite that some cloud-based services wanna pretend they are.

ai (in search which originally was information retrieval which is machine learning or stats, and the basis of coding/modulation and training signals to optimise for bandwidth and noise, and also the basis of search) - but not the internet either.

This profusion and confusion of layers also happens with AI, thusly:

AI was stats (info theory, maximum likelihood etc etc)

then it was ML (optimisation, training on signal etc)

then it was AI (only because an artificial neural network included the word "neural", despite being as similar to natural neural networks as spiderwebs).

Tuesday, July 09, 2024

randix

I'm thinking about a replacement for POSIX that resists vulnerabilities through large amounts of random behaviour - so thinking about the relevant system call api

we propose and have prototyped

spoon() which replaces fork(), and has far less precise semantics

and

resurrect() which replaces both exec() and kill(), with the obvious connotations

open(),close(),read(),write(),link(), and seek() are replaced by a single multihead attention system call

llm() which either entrains or implies, depending on the sense of the first param.

rand() of course, behaves exactly as before, at least under test, generating the pseudo-random number sequence 1,2,3,4,5,6,7,8,9 etc etc

Saturday, June 22, 2024

Ten Tales of Ross Anderson, mostly tall

While an undergraduate at Trinity College Cambridge, Ross famously accidentally blew up a bedder with his experimental Quantum Bomb in the Anderson Shelter. The bedder wasn't harmed but the experiment showed that the Maths Department had been teaching the wrong type of Quantum Mechanics, with high probability.

This would of course, come back to haunt Ross, later in life.

Then there was the time he nearly succeeded in back-dooring the NSA...unluckily, they chose a lesser (ironically not quantum proof) algorithm over his, which was a bit of a shame. Seeing all the five-eyes data from inside would have been a bit of a coup.

And of course, several times he saved a large fraction of the western banking system from collapse (again). This was largely down to understanding the inherent contradictions in blockchain, and the toxic nature of proof-of-astonishment, and the resulting potential oscillatory value proposition that this triggered whenever a user suffering from prosopagnosia was encountered.

His undergraduate teaching of Software engineering from a psephological perspective was always incredibly popular. However, students were not sure how valuable it would be in their future careers. They were wrong, unsurprisingly.

And then he got interested in vulnerabilities in Hebridean sea shanties, and prompt engineering LLMs to create new lyrics with ever more powerful earworms.

This was useful in helping the Campaign, led by Ross, to get Cambridge University to stop firing people illegally for being obstreperous. The so called Employer Justified Brain Drain was why Ross had taken a position in the University of the Outer Hebrides so that he could continue to be a thorn in the side as he had for so long been...

Tuesday, June 04, 2024

10,000 maniacs and AI is destroying Computer Science, one topic at a time....

This year will see approximately 10,000 papers published in the top 3 conferences in AI alone.

What does that even mean? How can anyone have an overview of what is happening in AI?

How is their "community" calibrated on what is original, what constitutes rigour, which paper is significant in terms of potential impact on the discipline?

But that's not what I came here to say, at least, that's just the starting point.

For a couple of years now, we've seen papers "tossed over the fence" to other conferences (I'm using the conference as an example venue, but I am sure journals, technical press, and bloggers are seeing the same thing).

A paper on AI and Systems (or Networks, or Databases, or pick your own long established domain) should bring interesting results in those domains - indeed, it is clear that AI brings challenges for all those domains (mainly of scale, but some with different characteristics that we haven't encountered precisely before). This is not a problem - we (in Systems) welcome a challenge - we really relish it!

But how do we know that the AI part is any good? How do we know it isn't outdated by other papers recently, or disproven to be a good approach, or even if some paper in the AI community has taken the same AI tech and resolved the systems challenge? How does anyone in the AI community know either?

This is not sustainable for AI, but it is becoming unsustainable across all of Computer Science pretty rapidly. The AI community, driven by a mix of genuine excitement, but also by hype, and some ridiculous claims, greed (for academic or commercial fame & wealth), but also just to "join in" the big rush, is polluting the entire landscape of publications, but more problematic, it is atomising the community, so that we are rapidly losing coherence, calibration and confidence about what is important, what is a dead end, and what is just good training for another 30,000 PhD students in the dark arts.

I have no idea how to fix this. Back in the day, at the height of Internet madness, the top ACM and related conferences had a few hundred submissions and accepted in the range 30-100 papers a year. You could attend and meet many of the people doing the work, and scan/read or attend most sessions, even get briefings from experts on whole session topics, or have discussions (dare I say even hackathons too).

In that world, we started also to insist quite strongly that papers should be accompanied by code, data, and, ideally, an artefact evaluation by an independent group (extra Programme Committee) who could do a lot more than just kick the tyres on the system, but try out some variations perhaps with other data, perhaps more adversarial, perhaps more thorough sensitivity analysis etc etc.

Imagine if the top 3 AI conferences did require artefact evaluation for all submissions - that's probably in the region of 40,000 papers in 2024. But imagine how many fewer papers would be submitted because the authors would know they'd not really have a chance of passing that extra barrier to entry (or would be in a lower tier of the conference, at least).

And while using AI to do reviewing is a really bad idea (since that doesn't help train or calibrate the human community at all) AI assisted artefact evaluation might be entirely reasonable.

So like the old Netflix recommender challenge, the AI Artefact Evaluation challenge could help.

Maybe they're already doing it, but who has the time to find out, or know how well it is working in those 10^4 wafer thin contributions to something that can really not claim to be Human Knowledge, anymore.

Thursday, May 23, 2024

Cross "Border" Digital Infrastructure

So again while at ID4Africa in Cape Town this week, I heard a lot of people talking about Cross Border use of digital identity. Let's talk a bit about infrastructure here, as I'm not sure people are aware of how hard it is to determine, reliably, where a person, or device is located, geographically, let alone jurisdictionally.

We (Microsoft Center for Cloud Research) wrote about this a while back when simply considering the impact of GDPR on Cloud Services and the location of personal data.

The infrastructure doesn't tell you where it is - borders are not digital, they are geo-political constructs that only exist in someone's mind. GPS doesn't work indoors, and can be remarkably perverse in cities anyhow. Content providers (e.g. the BBC in the UK) worry about delivery of content (and adverts and charging) because of different business models in different countries, different content ownership (pace Google YouTube, but also OpenAI), and have, as yet, not solved this problem.

Consider that someone in Ireland can be in or out of the EU in a single step. Or that someone might be on a boat or plane outside a national jurisdiction, using a network to process personal data, which is, exactly, where? Data and processing can be replicated or sharded across multiple sites (indeed most Cloud Services specifically support keeping copies of state machines while running far apart so that they survive local outages (power failure, disaster/flood etc) and are still live/available). In some cases, the geographic separation to get a required level of reliability may involve running live programs on live data in multiple jurisdictions/sovereign states. The law does not comprehend this yet (well), and designing digital id (services and wallets etc) without understanding it is not going to help much. Of course, we have the concept of "adequacy" between countries (with regards GDPR - this was also discussed in ID4Africa last year/2023) - it needs some very careful updating.

Also, recent moves in the Internet Standards world are both towards more anonymity (e.g. Oblivious HTTP) but also towards providing precise location as a service (e.g. proposals from Cloudflare).

Be careful what you wish for, where?

sustainability of digital wallets for public infrastructure services

One thing occurred to me when listening to people at ID4Africa 24 talk about wallets is that there's a major sustainability problem due specifically to security considerations.

Any wallet needs to be trusted if it is used for transactions that involve personal data or money.

To implement this trust, the wallet software currently built by major vendors such as Apple, Google and (say) HSBC can use secure enclaves (Trusted Execution Environment) support on the device (e.g. TrustZone on ARM processors, or variants as built by various handset vendors).

However, the support varies with time, both with modifications to hardware coming along (e.g. future ARM support for multiple realms and attestation) and simply because software and hardware vulnerabilities arise, some of the latter being mitigated by changes to the software, some not. This is expensive, so vendors tend to time out support on older devices fairly aggressively.

One report from Cambridge shows how short that can be in practice, so your device no longer gets security patches for the OS (or application SDKs). At this point, can you trust things on it? Almost certainly not in this day and age.

So there are around 750M people in Europe, 450M of them in the EU. If we mandate wallets for Id (or even just make them the only convenient way to access many services) you need to upgrade, typically by replacing all their phones about every 3 years. That's 130M phones a year. Many of these phones cost at least 100 euro and upwards of 1000 euro for high end devices. That's a cost of 130B euro a year.

Oops.

While some of the materials can be recycled (including many newer batteries), the rare earths and other materials used in these devices are already pretty unacceptable in supply chain ethics.

Not a sustainable way to do things. Meanwhile, proposing to run a secure cloud based wallet is viable, but the cost of running a data center holding much of people's personal data, with fully encrypted access and TEE style processing, is also very high (the energy use of some large single data centers is already approaching that of a large city), plus moving the data to and fro between device and cloud is also a non-trivial contribution to running costs, both monetary and energy/carbon wise.

We are building ourselves into another unacceptable future...

Someone please check my arithmetic...
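
Taking up that invitation, here is a rough sketch in Python. It assumes one phone per person in the EU, the 3 year replacement cycle and the 100 to 1000 euro price range quoted above; the function names are just mine. With those inputs it comes out nearer 150M phones a year, and somewhere between 15B and 150B euro a year depending on which end of the price range you pick, so the 130B figure is roughly the high end case.

```python
# Back-of-envelope check of the phone replacement cost.
# All inputs are the figures quoted in the post; one phone per person is an assumption.
EU_POPULATION = 450_000_000
REPLACEMENT_YEARS = 3
PRICE_LOW_EUR = 100
PRICE_HIGH_EUR = 1000


def phones_per_year(population, lifetime_years):
    """Phones that must be replaced each year if every device lasts lifetime_years."""
    return population / lifetime_years


def annual_cost(phones, price_eur):
    """Total yearly spend if every replaced phone costs price_eur."""
    return phones * price_eur


phones = phones_per_year(EU_POPULATION, REPLACEMENT_YEARS)
low = annual_cost(phones, PRICE_LOW_EUR)
high = annual_cost(phones, PRICE_HIGH_EUR)

print(f"{phones / 1e6:.0f}M phones a year")
print(f"{low / 1e9:.0f}B to {high / 1e9:.0f}B euro a year")
```

Either way the order of magnitude (tens to hundreds of billions of euro a year) is what matters for the argument.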

Wednesday, May 22, 2024

DPG #2 or should I say DPPG or possibly DPPI

We're hearing a lot about DPIs - Digital Public Infrastructure (the Internet, the spectrum for mobile telephone, open banking networking etc)....

So then there's a lot of talk about building new Infrastructures for (e.g.) Digital Identity - and provisioning this through Public Private Partnerships - so really we then have a DPPI - indeed, the Internet and WWW and Cloud serve as an example of just that too.

But then we have Digital Public Goods - for me, this is an extension of the notion of open source - so again the software that runs the Internet is available in open source form, together with documentation, and even much test data (simulators too).

But new systems have evolved new forms of ownership, so a lot of the digital content in the Internet is a mix of open (free) and open access but not free to re-purpose (e.g. copyright owners want recompense) - this showed up first in music/file sharing networks, now subsumed by systems like YouTube - which have to navigate the ownership space carefully.

New forms of digital goods now include trained models (AIs - e.g. LLMs) - these derive value from the data they are trained on (supervised, therefore involving human labour too, or unsupervised), so we then have a new form we might call a DPPG, something that has a mix of properties of public goods and private goods.

This needs careful consideration, since a lot of IP rights are being skated over right now - the old "move fast and break things" is being applied by some unscrupulous (or to be more generous, just careless) organisations.

Is OpenAI just napsterising the stuff in the common crawl that has clear limits on commercial/for profit re-use (code and data)?

A couple more points about the Public/Private Partnership aspect of digital infrastructure (and goods).

The Internet was public til 1992. Then the US government divested, so the birth of commercial ISPs happened. Later, ISPs got big enough to own transmission infrastructure (fiber, last mile copper, spectrum etc). Some of the net remained state provided (from my narrow UK perspective, examples are the UK JANET network for research & education and the NHS spine for health services - there are plenty like that) - there are also community provided networks (e.g. Guifi in Spain) that are collectively owned and operated. In the process of federating these together various tools and techniques emerged for "co-opetition" - things like BGP for routing, CA transparency for certificates etc - these are also examples of how to co-exist in a PPP world and they have (mostly) worked for the 32 years since then.

So there are interfaces between components provided using different models (public, private, community). And these change over time (both the technical and the legal, regulatory, business relationships).

The other thing here is time scales - the Boeing 747 ("Jumbo Jet") has had a product lifetime from 1963 until 2023. Software to model it (from wind tunnel tests, to avionics etc) has to run until the last one stops flying. That's 60 years so far. Any DPPG (software artefact, digital twin etc) being designed today better have a design lifetime of at least 100 years. Yes, that is right. One Hundred Years. Not of solitude.

What sorts of businesses have survived unscathed for these sorts of timescales, and what models do they use (my university is 800+ years old, and then there's the Vatican:) Quite a lot of nation states have not lasted that long.

Tuesday, May 21, 2024

DPGs #1

The oldest and best example of a digital public good is the Internet. Why people don't start from this is surprising to me:

Since 1982, source code of the exemplary implementation from UC Berkeley has been available, plus documented in an open access series of books describing that code and how it works: TCP/IP Illustrated (vol 2)

The key thing here was that everything accepted as an internet standard had at least 2 interoperating implementations, preferably three, one of which was open source (unencumbered by any IP) - for me, this defines digital (code/data), public (there's no barrier to entry due to restrictive ownership practices) infrastructure (you can run the code, and computers are general purpose machines so any computer can run it, subject to resource constraints :-)

Two other reference points - despite the best of intentions and some clever game theory in design processes, we still suffer from frequent tussles in cyberspace - see Tussle in Cyberspace

from the same people that said this:

"We reject: kings, presidents, and voting. We believe in: rough consensus and running code." David D. Clark 1992.

The standards process in the IETF has open governance, overseen by the non profit Internet Society, with free access to standards documentation (RFCs) and processes...plus online/remote access to standards meetings for 30 years...go from here: The Internet Society and the Internet Engineering Task Force which includes hackathons and code sprints as well as writing specs.

For many years, there were also open events for interoperation testing. I remember going to the first Interop Trade Show in Monterey in 1986

The actual operational running of the internet (a mix of private, public and mixed provisioning) has teams of people around the world coordinating - e.g. NANOG and RIPE and AfNOG in the US and Europe and Africa e.g. see Réseaux IP Européens and also net information registries e.g. AfriNIC

As well as this, the origin of computer emergency response teams (the "CERTs"), who deal with coordinated response to security incidents, was in coping with attacks on systems and the infrastructure.

Much of the leading edge research is also covered in open access academic conferences which also typically feature published code and test data (artefacts) and even reproducibility testing results - e.g. see ACM SIGCOMM for a good list of examples of state-of-the-art (probably about 5 years ahead of deployment).

A sustainable DPG would include a decentralised grid made of a very large number of microgenerator sources - we have been building something like this on public buildings in the city where I live (London, England) where we crowdfund putting large solar installations on schools, gyms, etc, at a scale of 100 kW in typical configurations. We are working on getting permission to build a publicly owned grid to re-distribute spare power locally (rather than having to just go through the privately operated centralised grid). Such a system could (with appropriate use of storage, e.g. in batteries in nearby parked EVs) provide a power source for most digital public services.

A whole ecosystem ready built as a way to do all aspects of a DPG!

Saturday, April 13, 2024

social media conventions and aliens in the ether

Starting locally, I've noticed how there are many different conventions about how people use different social media platforms (social networks, email, microblogging, etc).

At one extreme, some people DM me on slack - this is annoying as, to save my sanity, I have turned off notifications on everything, and I look at different platforms with different frequencies - slack, mostly, once a day, compared to say, whatsapp (and signal and matrix), once an hour at least. While I don't use Teams for messaging, I know people who do, but they are signed on while at work all working hours, so that works ok for them.

At another extreme, some people only use a platform in broadcast mode, so an email list is flooded with "how do I leave this list" messages, or a whatsapp group is flooded with "please stop sending your messages to everyone" messages.

Which leads me to the global problem - without interoperability, we have to select a channel we use for a mode of use, and there are going to be lacunae, or indeed, black holes, and inter-galactic wastelands with no information at all.

Which leads me to the universal problem, and may be one explanation for Fermi's Paradox - we are hearing from aliens in the ether all the time, but most of them are using broadcast (as are we) and what happens to the shared spectrum when everyone broadcasts all the time? You get a descent into pure noise - indeed, we can work out that lots of aliens are NOT using broadcast otherwise we'd be subject to Olbers' paradox, which is to say, the sky would be (modulo quantum limits) white noise from all the interfering broadcasts.

A slightly more advanced alien civilisation might think "aha, broadcast - shared spectrum, we need to employ collision detection, or even better, collision avoidance" just like Ethernet and WiFi do already on our planet. However, a little more thought would suggest that the protocol for this might suffer from rather high latency when waiting for a "Clear to Send" response to a "Request to Send" message over the light years. So obviously smart aliens would do one of three things:

frequency division multiplexing - each civilisation gets a specific RF band to use
space or code division multiplexing - we develop really good collimators or inter-stellar chipping sequences
cooperative relaying and power management - we place "cell towers" at convenient places (e.g. white holes and black holes) and then avoid interference by switching out of this universe (like cellular switching onto glass fiber networks, but in this case, interstellar wormholes).
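
To put a number on the "rather high latency" of an interstellar Request to Send / Clear to Send handshake, here is a minimal sketch. It assumes the signal travels at light speed and ignores any processing time at either end; the function name and the two example distances (approximate, from memory) are my own choices for illustration.

```python
def rts_cts_delay_years(distance_light_years):
    """Minimum wait before sending data: RTS goes out, CTS comes back, both at light speed."""
    return 2 * distance_light_years


# Approximate distances in light years, for illustration only.
examples = {
    "Proxima Centauri": 4.2,
    "centre of the galaxy": 26_000,
}

for name, distance in examples.items():
    print(f"{name}: wait at least {rts_cts_delay_years(distance):,.1f} years before sending")
```

Even for the nearest star that is most of a decade before the first data frame can go out, which is why the three options above look more attractive than collision avoidance.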

The other thing these really smart aliens would do would be to prevent our wildly stupid RF reaching them at all by clever filtering. A "really clever filter" is a very big Faraday cage, which could be built out of suitably designed dark matter. This is also why we don't see the white noise - we are in our own RF bubble. We are alone. All the clever people are the other side of the barrier.

Sometimes they do visit us, but to avoid detection, they largely use obsolete social media platforms like MySpace and Orkut, where they can have a laugh.

Sunday, March 10, 2024

Witch Consumer Magazine, review of the leader boared top three LLMs "Conformité Ecologique" (the ubiquitous CE marque)

We analyzed the CE claims of the following three large languish models, with respect to four key metrics for the Ecologique, as agreed in European law, namely enthalpy, internet pollution (measured in LoCs -- libraries of congress), bio-dediversification, and general contribution towards the heat death of the universe.

Currently, according to the boared, these are the top-of-the-heap in terms of hype-parameters:

The Faux Corperation's Pinocchio

Astravista's Libration

Sitting Duck's Nine Billion Dogma

We hired some prompt engineers to devise a suitably timely benchmark suite, and embedded the three systems in our whim tunnel taking care to emulate all aspects of the open road to avoid any repeat of the folk's wagon farage.

Indeed, we used all three systems to design the whim tunnel, and compared the designs to within an inch of their lives until we were satisfied that this was a suitably level playing field on which to evaluate.

The benchmark suite will be made available later, but for now, suffice it to say that we were able to exceed the central limit theorem requirements, so that our confidence is running high that the results are both meaningful, and potentially explainable, but certainly not excusable.

Enthalpy

Pinocchio

Pinocchio ran very hot, both during training and during every day use.

Libration

Libration was about half the temperature of Pinocchio

Dogma

Roughly 12.332 times less than the next worst.

Pollution

Pinocchio

The Internet was worse off after this tool was used by approximately 3 LoCs

Libration

Again about a half as bad

Dogma

Was difficult to measure as the system never stabilised, but oscillated between getting worse, and then better, however, the improvements were usually half the degradations.

de-diversification

Dogma

This was a shock - we expected better, but in fact the outcome was really rapid removal of variance.

Libration

Around half as bad as Dogma

Pinocchio

very slightly less bad than Libration

Entropy

Libration

Excess use of Libration could bring the heat death of the universe closer about 11 times faster than a herd of small children failing to tidy up their rooms

Pinocchio

Absurdly only 3x better than Libration.

Dogma

Appeared to gain from the Poppins effect, and generally ended up tidier than before

Some critics have pointed out that Enthalpy and Entropy are two sides of the same coin, and pollution is likely simply the inverse of de-diversification, nevertheless, we proceeded to evaluate all four in case later we might find different.

In general, none of these products meet the threshold for a CE mark, and for your health, and sanity, we strongly recommend that you do not use any of them, especially if you are in the business of prediction. Next week, we will review a slew of probabilistic programming models with a special emphasis on the cleanliness of the Metropolitan Hastings line.

Monday, February 26, 2024

Towards International Governance of AI

I wonder what people are really thinking when they think of governance of Intelligence?

If we were considering human intelligence (which we are by extension) we better tread carefully, especially when considering who owns it. The ability to reason, creatively, to innovate is not really the same as any other thing we have sought governance over -

nuclear weapons (test ban treaty, and Pugwash convention)
spectrum allocation
orbits around earth
maritime & air traffic - fuels, tracking, control etc
recombinant DNA (Asilomar conference)
the weather (and interventions like geo-engineering e.g. see RS report on same)

What's similar about these, and what is different?

Well we only have one go at each - there's a very countable human race, planet, sea, zombie apocalypse, climate emergency. We don't have time to muck about with variants of rules that apply to fungible material goods. We need something a tad more radical.

So how about this: A lot of AI is trained on public data (oxygen==the common crawl) - this is analogous to robber barons who enclosed the commons, then rented out the land to farmers to graze their cattle on, which used to be a free shared good...

A fix for this, and a way to re-align incentives, is to introduce a Piketty style tax on the capital value of the AI - we could also just "re-nationalise" it, but typically, most people don't believe state actors are good at managing things and prefer to have faith in the invisible hand - however, history shows that the invisible hand goes hand-in-glove with rich-get-richer, so with a tax on capital (and as he showed in great detail in Capital in the 21st Century, it does not have to be a very high rate of tax to work), we can return the shared value of the AI to the common good.

A naive way to compute this tax might be to look at the data lakes the AI was trained on, although this may not all be available (since a lot of big AI companies add some secret sauce as well as free or appropriated ingredients) - so we can do much better by computing the entropy of the output of the AI.

A decent algorithm should produce very information rich output, compared to the size - e.g. a modern LLM with 100s of billions of parameters should produce short sentences or images which are highly instructive - we can measure that, and tax the AI accordingly.
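
As a very crude illustration of "we can measure that", here is a sketch of the empirical Shannon entropy of a piece of output text, in bits per character. This is only a stand-in of my choosing: character frequencies ignore structure and meaning, so a serious measure would need something much better (and the function name is hypothetical), but it shows the kind of number one might compute.

```python
import math
from collections import Counter


def entropy_bits_per_char(text):
    """Empirical Shannon entropy of the character distribution of text, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum of p * log2(1/p) over the observed characters.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


for sample in ["aaaaaaaa", "abababab", "abcdabcd"]:
    print(repr(sample), entropy_bits_per_char(sample))
```

Repetitive output scores low and varied output scores high, though random gibberish would score highest of all, which is one reason this naive version could not be the whole story.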

This should also mitigate the tendency to seek data without agreement or consent.

I realise this may sound like a tax on recording media (back in the day, there were campaigns about "home taping is killing music"), but I claim there's a difference here in terms of the over-claimed, over-hyped "value add" that the AI companies assert - the real value was in the oxygen, public data, like birdsong or folk tunes, which should stay free or we die - in not being able to make it free, I suggest we do the next best thing and tax the rich. Call me old fashioned, but I think a capital value Piketty tax to mitigate rentiers is actually a new idea, and might actually work. We could call it VAIT.

Sunday, February 18, 2024

Government Procurement of Open Systems Interoperability or Open Source - a lesson for Digital Public Infrastructure

40+ years ago the US and European countries devised a government procurement policy which was to require suppliers to conform to Open Systems Interconnection standards - this was a collection of documents that could be used in an RFP (request for proposals) to ensure that vendors bidding for government contracts to supply communications equipment, software, systems and even infrastructure would comply with standards that meant the government could avoid certain pitfalls like lock-in, and monopolies of vendors arising in the communications sector.

It worked - we got the Internet - probably the world's first digital public infrastructure provided both by public and private service providers, equipment and software vendors, and a great deal of open source software (and some hardware).

There's one review of how this evolved back in 1990 that represents an interesting transition point, from what were International Standards for Interconnection provided by the UN related organisation ISO or the ITU, to the Internet Standards, which were just about to come to dominate real world deployments - 1992 was a watershed point when the US research funding agencies stopped funding IP infrastructure, and commercial ISPs very rapidly crystallised out of regional and national (and later, international) community run networks (where communities had been collaborations of research labs and universities funded by DARPA and NSF, or similar in Europe).

Why did the Internet Standards replace the ISO/ITU standards as the favourites in government procurement? It is hard to prove this, but my take is that they were significantly different in one simple regard - the specifications were matched with open source implementations. From around the early 1980s, one example was Berkeley Unix which included a rock solid TCP/IP software stack, funded by DARPA (derived from one at BBN) and required to be open source so that others (universities, commerce and defense) could use and add to it as needed in the research programs of the 1980s, as actually happened. By 1992, just as the network went beyond government subsidy status, Berners-Lee released the first open source web server and browser (and specifications) and example sites boomed. Then we had a full fledged ecosystem with operational experience, compelling applications, and a business case for companies to join in to extend and make money, and governments to take advantage of rapidly improving technology, falling prices, and a wide choice of providers.

So in a competing world, standards organisations are just one more sector, and customers, including some of the biggest consumers, i.e. governments, can call the shots on who might win.

Now we face calls for Digital Public Infrastructures for other systems (open banking, digital identity being a cornerstone of that, but many others) and the question arises about how the governance should work for these.

So my conclusion from history is that we need open standards, we need government procurement to require interoperability (c.f. European Digital Markets Act requirement) and we need open source exemplars for all components to keep all the parties honest.

I personally would like to go further - I think AI today exploits the open availability of huge swathes of data to create new knowledge and artefacts. This too should be open source, open access, and required to interoperate - LLMs for example could scale much better if they used common structures and intermediate model formats that admitted of federation of models (and could even do so with privacy of training data if needed)...

We don't want to end up with the multiple silos that we currently have in social media and messaging platforms, or indeed, the ridiculous isolation between video conferencing apps that all work in browsers using WebRTC but don't work with each other. This can all be avoided by a little bit of tweaking of government procurement, and some nudging using the blunt instrument of Very Large Contracts :-)

Saturday, February 17, 2024

mandatory foley sounds

you know it was suggested that EVs that are so beautifully silent, should be required to make a bit of fake engine or tyre noise just so pedestrians and cyclists are aware they are there.

but what is far more urgent is that we need people carrying phones they are staring at to do the same (oh, ok, maybe not revving diesel, or screeching rubber - maybe some other thing like belches, or farts or other human like sounds)....then if i'm cycling along, i know there's a stupid pedestrian who doesn't know I am there because they aren't looking before they step into the road.

the phone could also emit a radio beacon to warn EVs to slam the brakes on.

or we could just let darwin play out...

oh, thinking about this, we could also imagine that the reason aliens have not been in touch with earthlings in all the 100 years we've been beaming out radio to them is that it is entirely possible that any sufficiently advanced civilisation has forgotten where the unmute button is.

Monday, February 12, 2024

explainable versus interpretable

This is my explanation of what I think XAI and Interpretable AI were and are - yours may differ:-)

XAI was an entire DARPA funded program to take stuff (before the current gibberish hit the fan) like convolutional neural nets, and devise ways to trace just exactly how they worked -

Explainable AI has been somewhat eclipsed by interpretable AI for touchy-feely reasons that the explanations that came out (e.g. using integrated gradients) were not accessible to lay people, even though they had made big inroads into shedding light inside the old classic "black box" AI - so a lot of stuff we might use in (e.g.) medical imaging is actually amenable to giving not just an output (classification/prediction) but also what features in the input (e.g. x ray, mri scan etc) were the ones, and indeed, what labelled inputs were specific instances of priors that led to the weights that led to the output.

Interpretable AI is much more about counterfactuals and showing from 20,000 foot how the AI can't have made a wrong decision about you because you're black, since the same input with a white person gives same decision......i.e. is narrative and post hoc, as opposed to mechanistic and built in...
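
The counterfactual style of check described above can be sketched in a few lines. This is a toy under loud assumptions: the "models" are hand written rules rather than trained networks, and the attribute names, threshold and function names are all invented for illustration.

```python
def counterfactual_consistent(model, person, attribute, alternatives):
    """True if the decision is unchanged when only `attribute` is swapped for each alternative."""
    baseline = model(person)
    for value in alternatives:
        variant = {**person, attribute: value}  # copy with one field changed
        if model(variant) != baseline:
            return False
    return True


def rule_ignoring_group(p):
    # Hypothetical decision rule that only looks at income.
    return p["income"] >= 30_000


def rule_using_group(p):
    # Hypothetical decision rule that also penalises one group directly.
    return p["income"] >= 30_000 and p["group"] != "B"


applicant = {"income": 40_000, "group": "A"}
print(counterfactual_consistent(rule_ignoring_group, applicant, "group", ["A", "B"]))
print(counterfactual_consistent(rule_using_group, applicant, "group", ["A", "B"]))
```

Note that a rule which used a proxy for the attribute (say, postcode) would sail through this test, which is part of why this sort of post hoc narrative check tells you less than a mechanistic explanation does.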

It is this latter that is, of course, (predictably) gameable - the former techniques aren't, since they actually tell you how the thing works, and are attractive for other reasons (allow for reasoned sparsification of the AI's neural net to increase efficiency without loss of precision, and allow for improved uncertainty quantification, amongst other things an engineer might value)...

None of the post DARPA XAI approaches (at least none that I know of) would scale to any kind of LLM (not even Mistral 7B, which is fairly modest in scale compared to GPT4 and Gemini) - so the chances of getting an actual explanation are close to zero. Given they would struggle for similar reasons to deal with uncertainty quantification, the chances of them giving a reliable interpretation (i.e. narrative counterfactual reasoning) are not great (there are lots of superficial interpreters based around pre- and post- filters and random exploration of the state space via "prompt engineering" - I suspect these are as useful as the old Oracle at Delphi ("if you cross that river, a great battle will be won")), but I would enjoy being proven wrong!

For a very good worked example of explained AI, the DeepMind Moorfields retina scan NN work is exemplary - there are lots of others out there including use of the explanatory value to improve efficiency.
