Thursday, November 20, 2025

cloud versus edge - i think the jury is still out

With the recent Cloudflare and AWS outages (and the earlier Meta outage), it comes down to one simple tradeoff:

on one side, everyone running an internet service saves some money by paying a cloud provider to run the infrastructure for them -

So the cloud outfit gets to amortize a lot of costs over all its customers by having a relatively small number of big data centers (numbered in the thousands) instead of the millions of enterprise computing services run by every tom, dick and harry - tescos, twitter, openai, slack, signal, ticketmaster etc etc (all actual examples of services that lost availability during the AWS and Cloudflare outages)

As Spidey says, "with great power comes great responsibility". So cloud providers not only provision carefully, but also provide some level of fault tolerance through redundant servers, and even run consistency protocols so that if a customer's service needs very high availability, then so long as a majority of the replicated servers are running, the service is ok. This can operate globally, so even if a whole country is disconnected (e.g. an international fiber cut, or a national grid outage, both also real events in recent years), the rest of the world can carry on ok...
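As a toy sketch of that majority idea (illustrative Python only, not any provider's actual protocol): a write counts as committed only if a strict majority of replicas acknowledge it, so losing any minority of replicas doesn't stop the service.

class Replica:
    """Toy in-memory replica; real ones live in separate data centers."""
    def __init__(self, up=True):
        self.up, self.data = up, {}

    def store(self, key, value):
        if not self.up:
            raise ConnectionError("replica unreachable")
        self.data[key] = value

def try_store(replica, key, value):
    try:
        replica.store(key, value)
        return True
    except ConnectionError:
        return False   # unreachable replica: skip it, don't fail the write

def quorum_write(replicas, key, value):
    acks = sum(try_store(r, key, value) for r in replicas)
    return acks > len(replicas) // 2   # committed only with a majority

# one of three replicas down (say, a disconnected region): write still commits
assert quorum_write([Replica(), Replica(), Replica(up=False)], "k", "v")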

But it would seem that they don't fully apply this distributed, replicated, fault-tolerant/high-availability, possibly even somewhat decentralised thinking to their own internal infrastructure - in both the AWS and Cloudflare cases, the error was central. Someone in AWS didn't consider a particular performance pattern in their DNS setup (the design is 95% sane): a slow server updating DNS could overwrite more recent entries, so customer services that needed those newer entries could no longer find them.

In the Cloudflare case, a centrally managed configuration file doubled in size in one overnight update, exceeding a Cloudflare service's maximum file size constraint (this is actually rather sad, in that it would have been fairly easy to prevent by normal checking/validation processes). The AWS one is slightly more subtle, but not much more. In fact, earlier outages in replicated/distributed services (actually at Cloudflare, earlier) took PhD-level thinking to come up with a long-term solution - see this paper for one example: Examining Raft's behaviour during partial network failures
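The sort of check that would have caught the Cloudflare case is almost embarrassingly simple. Here's a hypothetical pre-deployment gate (the limits are made-up numbers, not Cloudflare's actual constraints): refuse to push a generated config that breaches the consumers' hard size limit, or that has grown suspiciously since the last version.

import os

MAX_BYTES = 5_000_000   # assumed hard limit in the consuming service
MAX_GROWTH = 1.5        # assumed sanity bound on overnight growth

def safe_to_deploy(new_path, old_path):
    new_size = os.path.getsize(new_path)
    old_size = os.path.getsize(old_path)
    if new_size > MAX_BYTES:
        return False    # would break consumers that enforce the limit
    if old_size and new_size > MAX_GROWTH * old_size:
        return False    # e.g. doubled overnight: hold rollout, page a human
    return True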

The cloudflare example is also a little reminiscent of the CrowdStrike outage, but that wasn't cloud - CrowdStrike has a rulebase for its firewall products, and Microsoft Windows is required to allow third parties to install firewall products (even though a modern Microsoft OS firewall is actually good). CrowdStrike shipped a bug in a new rulebase, so when all the Windows machines using that product updated their rules, the firewall code (inside the OS, allowed in by Microsoft due to anti-monopoly rules) read a broken file, which caused an undetected code bug to trigger an exception; due to an oversight, the CrowdStrike software engineers had not put in an exception handler, which would have led to a safe exit from that code, so instead the exception caused an OS crash (i.e. bluescreen!)... in this case, the central error affected millions of edge systems directly - and due to the way the software update worked, recovery needed a lot of manual intervention by many many people in many organisations..
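The missing defensive step is a textbook one. A sketch of the general pattern (with a hypothetical rule format - nothing like CrowdStrike's actual channel files): parse the new rules under an exception handler, and fall back to the last-known-good set rather than crashing.

import json

def load_rules(path, last_known_good):
    try:
        with open(path) as f:
            rules = json.load(f)   # a broken update raises here
        if not isinstance(rules, list):
            raise ValueError("unexpected rule format")
        return rules               # new rules look sane: use them
    except (OSError, ValueError) as err:
        # safe exit: log it and keep the previous rules running
        print(f"rule update rejected ({err}); keeping last known good")
        return last_known_good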


In a non-cloud setup, you'd have natural levels of diversity across all the millions of different enterprise deployments (even if just different versions of things running), so outages would typically be restricted to particular services; but in the cloud setup, one infrastructure outage takes down thousands of enterprises at once.... (I think AWS reckoned about 8000 large customers - not sure about Cloudflare, but estimates are they run about 25% of the Internet ecosystem's defenses)...



background
AWS explainer
Cloudflare explainer
Replication failure during partial network outages
UK government very useful report on data center sustainability (has lots of useful statistics)

To be fair to the cloud service folks at AWS and Cloudflare, they found, fixed and publicly reported the problems in under a day; so the concentration of resources in the cloud also means a concentration of highly paid, really expert people who can troubleshoot a problem, and once it's fixed, deployment is also quick. On the other hand, a fix in a decentralised setup (more like the CrowdStrike example) could also have been deployed fairly fast, if they had been slightly more careful about their software update process....

so i'd say cloud v. edge: at the moment it's hard to pick which is more resilient, or which is cheaper

Thursday, November 13, 2025

prisoner of cellular blockheaded thinking #9

So the cellular industry is often lauded for its planning and general coherent approach to the world.

Let's remind people how inaccurate that is.

From the get-go, Bell Labs & parent AT&T decided, having invented cellular telephony, that there was no market for it (repeating the famous IBM error that "the world will only need 3 computers, 2 in America and maybe one in England").

Later, people invented the bluetooth stack (yech - serial line emulation, modem control tones, and no sensible mesh mode for ages)

Then blackberry and nokia tried to copy apple's iphone ideas and almost totally tanked what had previously been incredibly smart and successful industries.

Then they caused the fixed telephone service to run out of telephone numbers multiple times.

Most recently, car owners and smart meter owners are finding that the ability to remotely access said vehicles and devices is being turned off, because providers won't be running EDGE, GPRS (2.5G) or even 3G for much longer - and because there's no backwards compatibility until much later generations, it's goodbye to all those useful services.

And what is 6G actually for, remind me?

My wifi still works ok :-) on IPv4 (only 46 years old) and IPv6 (20+ years old)...

Friday, October 31, 2025

AI versus Humanity

humanity is too stupid to build an AI that would threaten its existence (humanity's existence, that is, not the AI's... we could easily build a self-destructive AI - see below).

the main reason is logistical. we are rubbish at supply chains - food comes across the planet out of season for rich folks, but bypasses many people on the way who are suffering shortages. we make computing and communications devices that depend on rare earths only available in war zones or on our adversaries' land.

we throw away working stuff.

Any AI will have to build itself reliable supply chains for replacement parts, software maintenance and energy. To do that, it will need an army of robots (in the original sense of robot - Karel Capek's obedient, tireless servants). But any humans spotting such a reliable supply chain will immediately take it over and steal from it, rather than rely on their own rubbish production line. Capitalism and natural selection mean a race to the bottom - humanity's inferiority will be the AI's downfall. The seeds of our digital overlords' demise are built in, due to the inherent contradictions in the rules of engagement.

Friday, October 10, 2025

zero inbox wars

i've had a zero inbox policy since first getting e-mail (late 1970s) - having moved through various systems (roughly once a decade), I recently landed on fastmail (which I very much recommend - extremely fast, but also very easy to migrate to, integrates with other mail systems and calendars (!), and has very good support).


so throughout those various systems, I've used different tools for managing incoming mail, and also archives - I have kept all e-mail to/from/cc'd to me since 1976:-)

at some point, everything ends up essentially kept in a bunch of directories (folders) organised in a shallow (<=3 levels) hierarchy - somewhat like the internet name space (.edu, .com etc) - with names of people (students) or projects, or personal stuff (money, health, house, transport etc)....

not sure what fastmail uses behind the scenes, but it seems to scale well and has a nice rule system for automatically processing incoming mail too.... plus I really like how it interacts with other mail systems (I have to maintain several outlook accounts for some places I work, and at least one doesn't allow forwarding, but fastmail can pretend to be a client, and make the mail look like it was just fetched through imap etc)....


anyhow, through the various stages (cambridge's own brew, then an exchange-based system, gmail, and now fastmail) I have seen a steady decrease in spam - really very little getting through at all these days (maybe one a day), and very few false positives too...


but within what's left, two things are steadily increasing:

1/ academic "spam" - e.g. calls for papers, invitations to review, offers to publish my work "for free" (like why would I ever pay?) etc

2/ mandarin - not reading or speaking any version of chinese, I'm assuming this is actually more of 1/, but just for chinese events and publications...

I'm looking for a two-stage LLM pipeline to deal with those two cases - stage one: translate; stage two: decide if it is relevant - I could train the model (or refine/fine-tune it) on my own publications, or the conferences whose programme committees I'm on...

maybe a student project!
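If a student does pick this up, a minimal sketch of the two-stage idea might look like the following (the llm() function below is a placeholder for whatever model endpoint actually gets used, and the prompts are only illustrative):

def llm(prompt: str) -> str:
    """Placeholder: call your LLM of choice here."""
    raise NotImplementedError

def triage(message: str, my_interests: str) -> bool:
    # stage 1: translate (a no-op if the mail is already in English)
    english = llm("Translate this to English, or return it unchanged:\n"
                  + message)
    # stage 2: relevance against my own papers / PC memberships
    verdict = llm("Given these research interests:\n" + my_interests
                  + "\nAnswer YES or NO: is this message relevant to them?\n"
                  + english)
    return verdict.strip().upper().startswith("YES")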


Anyhow, in my current fastmail setup, with 5GB of mail, there are only 3 messages in the inbox... soon to be 0.

Wednesday, July 23, 2025

How to reform a National AI Institute?

The Reform Club

A lot of people have been busy recently writing plans for the Turing Institute, most of which revolve around criticising the pace and direction of the changes it has been attempting over the past 2 years, with several culminating in the trivial proposal to put all the eggs in one basket (defense and security) and to use the big stick that the EPSRC holds - deciding whether to continue the block core funding for the next 4+ years - to "make it so". This got up to ministerial level.

That isn't reform. That's simply asset stripping. Not that the asset isn't a good one - the defense & security programme has always been strong (at least as far as we can publicly know) - other work had variable mileage, although commercial partners from banking/commerce, pharmaceuticals et al. keep coming back with more money, which suggests that they liked what they got (given their real-world, hard-nosed attitude to funds, especially in the UK, where industry R&D spend is typically so low). We're talking £1M per annum per partner levels.

Also, the Turing managed to hire very strong people, both as postdoc researchers and as research engineers, in the face of fierce competition from the hyperscale companies (which all have significant bases in the UK: Google & DeepMind based here, Microsoft and Amazon in Cambridge, Meta and X in London, OpenAI has a London lab, etc etc) - as well as from quite a reasonable set of opportunities in UK universities in places significantly more affordable than London (or the South in general) - so presumably the interesting jobs had challenges, and an open type of work, that attracted people who had a choice.

How not to reform a National AI Institute?

Make it defense. You will lose a large fraction of the other money, and the majority of staff will not be allowed to work there. As with medicine, the UK does not have the capacity to provide AI expertise at the level needed from UK citizens alone. Those left with real clue will often find it easier to work at one of those companies too, especially since salaries there start at 3 to 4 times as much.

  • You will lose all the cross-fertilization of ideas from one domain into others, especially with the Turing's unique engineering team acting across all the possible projects as a conduit for tools and techniques.
  • You will lose all the visitor programmes, including the PhD visitor scheme, which benefits students from all over the UK, many of whom would not be allowed to visit.
  • You'd lose access to much interesting data, which would not be allowed to be given to people who can't transparently say what they will do with it. Legally.
  • You wouldn't have a "national institute" - you'd just have another defense lab. It might be very good, but no-one would know. In fact, how come there isn't already one, e.g. in Cheltenham? They have plenty of funds.

What's my alternative?

To be honest, I don't have one. The nearest thing I have is what the Turing has managed to do in weather prediction (see the Nature papers on Aardvark and Aurora), and what we did (and still do) in Finance & Economics, with some very nice work in explainable AI, fraud detection and synthetic data, which have multiple applications across many domains. Likewise the engineering team's work on data safe havens, which is useful in the aforesaid finance, but also in practical privacy-preserving machine learning in healthcare and other sensitive domains. And recent work on lean language models. There are quite a few other things one could mention like that.

You can't predict where good or useful ideas will come from. Who knew giving a ton of money to CERN would lead to the World Wide Web? Who knew a Dutch free OS (minix) would incentivise a Finnish graduate student to write Linux (the OS platform on most of the cloud and half the smartphones out there)? Who knew that a small TV company's (the BBC's) request for a simple low-cost computer would lead to the founding of ARM (which has more chips in the world than Intel or anyone else - again, in your mobile device)? Who knew that neat little paper, Attention Is All You Need, would lead to all the harsh language about people's failure to predict the importance of LLMs (hey, some of those people predicted that blockchain might not be a good idea:-) Who knew?

And who knows how to reform a national AI institute? 

Tuesday, July 22, 2025

persistent technology design mistakes that go on giving

The history of technology is littered with design mistakes. Quite a few are mercifully short-lived, and some, deservedly, just don't even see the light of day at all (sadly, in some cases, as they might have been useful lessons in what not to do:-)

some mistakes aren't mistakes - one famous one was the choice between VHS and Betamax video tape formats - this was actually a tradeoff in cost/price/distribution and quality - in the end it didn't really matter.

Others somehow survive, grow and persist, and persist in having negative consequences for decades - 

In Computer Science, these are things like the C++ language....enough has been written about that and alternative histories (if only Algol 60, or if only Objective C or if only people had written better compilers and libraries for functional languages earlier) - 

In Communications Systems, two examples I'd pick: Bluetooth, and Edge.

An early workshop (25+ years ago) in california had presentations on bluetooth that explained why they had made it look like a serial line (think COM3 on a windows box), stemming from everything looking like circuits, or telephones, to the folks who made this up. And Edge was made reliable (with local retransmissions) to mask wireless packet loss, when everything should just "look like an ethernet", as I think Craig Partridge put it. The cellular data folks eventually got it (and have done a number of other things much better than wifi), but bluetooth's horribleness persists, partly because the broken architecture was baked into early software stacks, and it is very very hard to persuade people to just ditch it. In the IoT world, this led to a plethora of other wireless technologies (zigbee, lorawan etc), at least partly so people could avoid that fate, although there were other goals too.

Anyhow, we avoided being stuck with X.25, but we are stuck with QWERTY. We avoided being stuck with VHS, but we are still stuck with IPv4.


Saturday, July 12, 2025

a brief history of asking for forgiveness versus permission - the napsterisation of AI...

 

Back in the day, the Jesuits used to say that it was better to ask forgiveness than permission. I think that this may refer to the idea that people may have committed minor errors without knowing what they did was wrong, so they were less blameworthy, especially if, after the event, when the priest or other wise person explained to them the error of their ways, they recanted and were forgiven. To ask permission implies that the answer might be "no".

So now we are in a world where people are being paid to run stuff like this legitimised botnet, effectively becoming part of a P2P file-sharing world. Once upon a time (a generation ago, or almost infinitely far in the past), if you ran a thing like this (the Napsterised Internet) you would get sharp letters from lawyers, or even just be fined by the copyright infringement police.

Post-Napster, Google acquired YouTube and took an interesting step... they basically took the Jesuit line, with a vengeance - the trick was that Google went and did massive deals with all the large copyright owners (actually paying quite serious money), and then if you or I uploaded something already covered by that agreement, no problem. If we uploaded something not yet covered by the agreement, Google had an offer - they could offer advertising revenue, or possibly market research (popularity metrics), or as a last resort, take down the content.

While the large copyright owners have not been the best of friends to the artists who actually create stuff, this was at least semi-legitimate (I'm not a lawyer, obviously, but it seems to follow the aforesaid Jesuit model, and that has history behind it:-)

Now we have all those GenAI tools trained on a lot of content that is available on the non-pay-walled Internet - which does not mean it isn't copyrighted. The AI/LLM companies are notably trying to claim fair-use-type arguments (which search engines 20 years ago most notably did not) - the difference may reflect a change of culture, a shift in legal interpretation (of, say, fair use), or perhaps simply a shift in power (AIs owned by companies with a larger market cap than the GDP of most countries).

At least one of those AIs is run by the aforesaid search engine company. But others are not, and don't necessarily have search, and certainly don't appear to have done the large deals for content with those big copyright owning companies...

So the game is afoot... ... ...
