<p><span style="font-size: large;">A True History of the Internet</span></p><p>yes it's true, all of it - the internet doesn't really exist, so it must be. (jon crowcroft)</p>
<p><span style="font-size: large;">Witch Consumer Magazine, review of the leader boared top three LLMs' "Conformité Ecologique" (the ubiquitous CE marque)</span> (2024-03-10)</p>
<p>We analyzed the CE claims of the following three large languish models, with respect to four key metrics for the Ecologique, as agreed in European law, namely enthalpy, internet pollution (measured in LoCs - libraries of congress), bio-dediversification, and general contribution towards the heat death of the universe.</p>
<p>Currently, according to the boared, these are the top-of-the-heap in terms of hype-parameters:</p>
<ul><li>The Faux Corporation's Pinocchio</li><li>Astravista's Libration</li><li>Sitting Duck's Nine Billion Dogma</li></ul>
<p>We hired some prompt engineers to devise a suitably timely benchmark suite, and embedded the three systems in our whim tunnel, taking care to emulate all aspects of the open road to avoid any repeat of the folk's wagon farage.</p>
<p>Indeed, we used all three systems to design the whim tunnel, and compared the designs to within an inch of their lives until we were satisfied that this was a suitably level playing field on which to evaluate.</p>
<p>The benchmark suite will be made available later, but for now suffice it to say that we were able to exceed the central limit theorem requirements, so our confidence is running high that the results are both meaningful and potentially explainable, but certainly not excusable.</p>
<p>Enthalpy</p>
<ul><li>Pinocchio: ran very hot, both during training and during everyday use.</li><li>Libration: about half the temperature of Pinocchio.</li><li>Dogma: roughly 12.332 times less than the next worst.</li></ul>
<p>Pollution</p>
<ul><li>Pinocchio: the Internet was worse off after this tool was used, by approximately 3 LoCs.</li>
<li>Libration: again, about half as bad.</li><li>Dogma: difficult to measure, as the system never stabilised, but oscillated between getting worse and then better; however, the improvements were usually half the degradations.</li></ul>
<p>De-diversification</p>
<ul><li>Dogma: this was a shock - we expected better, but in fact the outcome was a really rapid removal of variance.</li><li>Libration: around half as bad as Dogma.</li><li>Pinocchio: very slightly less bad than Libration.</li></ul>
<p>Entropy</p>
<ul><li>Libration: excess use of Libration could bring the heat death of the universe closer about 11 times faster than a herd of small children failing to tidy up their rooms.</li><li>Pinocchio: absurdly, only 3x better than Libration.</li><li>Dogma: appeared to gain from the Poppins effect, and generally ended up tidier than before.</li></ul>
<p>Some critics have pointed out that enthalpy and entropy are two sides of the same coin, and that pollution is likely simply the inverse of de-diversification; nevertheless, we proceeded to evaluate all four in case we might later find differently.</p>
<p>In general, none of these products meets the threshold for a CE mark, and for your health and sanity we strongly recommend that you do not use any of them, especially if you are in the business of prediction. Next week, we will review a slew of probabilistic programming models, with a special emphasis on the cleanliness of the Metropolitan Hastings line.</p>
<p><span style="font-size: large;">Towards International Governance of AI</span> (2024-02-26)</p>
<p>I wonder what people are really thinking of when they think of governance of Intelligence?</p>
<p>If we were considering human intelligence (which, by extension, we are) we had better tread carefully, especially when considering who <i><b>owns</b></i> it. 
The ability to reason, to create, to innovate is not really the same as any other thing we have sought governance over:</p>
<ul><li>nuclear weapons (test ban treaty, and Pugwash convention)</li><li>spectrum allocation</li><li>orbits around earth</li><li>maritime &amp; air traffic - fuels, tracking, control etc</li><li>recombinant DNA (Asilomar conference)</li><li>the weather (and interventions like geo-engineering - e.g. see the <a href="https://royalsocietypublishing.org/doi/10.1098/rspa.2019.0255">RS report on same</a>)</li></ul>
<p>What's similar about these, and what is different? Well, we only have one go at each - there's a very countable human race, planet, sea, zombie apocalypse, climate emergency. We don't have time to muck about with variants of the rules that apply to fungible material goods. We need something a tad more radical.</p>
<p>So how about this: a lot of AI is trained on public data (oxygen == the common crawl) - this is analogous to the robber barons who enclosed the commons, then rented out the land to farmers to graze their cattle on, when it used to be a free shared good...</p>
<p>A fix for this, and a way to re-align incentives, is to introduce a Piketty-style tax on the <i>capital</i> value of the AI. We could also just "re-nationalise" it, but typically most people don't believe state actors are good at managing things and prefer to have faith in the invisible hand; however, history shows that the invisible hand goes hand-in-glove with rich-get-richer, so with a tax on capital (and as he showed in great detail in <a href="https://en.wikipedia.org/wiki/Capital_in_the_Twenty-First_Century">Capital in the 21st Century</a>, it does not have to be a very high rate of tax to work) we can return the shared value of the AI to the common good.</p>
<p>A naive way to compute this tax might be to look at the data lakes the AI was trained on, although these may not all be visible (since a lot of big AI companies add some secret sauce as well as free or appropriated ingredients) - so we can do much better by computing the entropy of the output of the AI.</p>
<p>A decent algorithm should produce very information-rich output compared to its size - e.g. a modern LLM with 100s of billions of dimensions should produce short sentences or images which are highly instructive - we can measure that, and tax the AI accordingly.</p>
<p>This should also mitigate the tendency to seek data without agreement or consent.</p>
<p>I realise this may sound like a tax on recording media (back in the day, there were campaigns about "home taping is killing the music industry"), but I claim there's a difference here, in terms of the over-claimed, over-hyped "value add" that the AI companies assert - the real value was in the oxygen, the public data, like birdsong or folk tunes, which should stay free or we die. In not being able to make it free, I suggest we do the next best thing and tax the rich. Call me old fashioned, but I think a capital value Piketty tax to mitigate rentiers is actually a new idea, and might actually work. We could call it VAIT.</p>
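<p>As a back-of-the-envelope illustration only: here is a minimal sketch of how such an entropy-based levy might be estimated, assuming one can read off the model's per-token output distributions. The function names and the rate_per_gigabit tax rate are invented for the example, not a worked-out proposal.</p>
<pre>
import math

def output_entropy_bits(token_probs):
    """Shannon entropy (in bits) of one predicted-token distribution."""
    return -sum(p * math.log2(p) for p in token_probs if p > 0)

def vait_levy(per_token_distributions, rate_per_gigabit=0.01):
    """Hypothetical VAIT: a levy proportional to the total information the
    model emits. rate_per_gigabit is an invented rate (currency per Gbit)."""
    total_bits = sum(output_entropy_bits(d) for d in per_token_distributions)
    return (total_bits / 1e9) * rate_per_gigabit

# toy run: three emitted tokens, each drawn from a 4-way distribution (invented)
dists = [[0.7, 0.1, 0.1, 0.1],
         [0.25, 0.25, 0.25, 0.25],
         [0.9, 0.05, 0.03, 0.02]]
print(vait_levy(dists))
</pre>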
<p><span style="font-size: large;">Government Procurement of Open Systems Interoperability or Open Source - a lesson for Digital Public Infrastructure</span> (2024-02-18)</p>
<p>40+ years ago, the US and European countries devised a government procurement policy which was to require suppliers to conform to Open Systems Interconnection standards - a collection of documents that could be used in RFPs (requests for proposals) to ensure that vendors bidding for government contracts to supply communications equipment, software, systems and even infrastructure would comply with standards, so that the government could avoid certain pitfalls, like lock-in and monopolies of vendors arising in the communications sector.</p>
<p>It worked - we got the Internet - probably the world's first digital public infrastructure, provided both by public and private service providers, equipment and software vendors, and a great deal of open source software (and some hardware).</p>
<p>There's <a href="https://doi.org/10.1016/0169-7552(90)90084-6">one review</a> of how this evolved back in 1990 that represents an interesting transition point, from the International Standards for Interconnection provided by the UN-related organisations ISO and the ITU, to the Internet Standards, which were just about to come to dominate real-world deployments - 1992 was a watershed point, when the US research funding agencies stopped funding IP infrastructure, and commercial ISPs very rapidly crystallised out of regional, national (and later, international) community-run networks (where the communities had been collaborations of research labs and universities funded by DARPA and NSF, or similar in Europe).</p>
<p>Why did the Internet Standards replace the ISO/ITU standards as the favourites in government procurement? It is hard to prove this, but my take is that they were significantly different in one simple regard - the specifications were matched with open source implementations. From around the early 1980s, one example was Berkeley Unix, which included a rock-solid TCP/IP software stack, funded by DARPA (derived from one at BBN, and required to be open source so that others - universities, commerce and defense - could use and add to it as needed in the research programs of the 1980s, as actually happened). By 1992, just as the network went beyond government-subsidy status, Berners-Lee released the first open source web server and browser (and specifications), and example sites boomed. Then we had a full-fledged ecosystem with operational experience, compelling applications, and a business case for companies to join in to extend and make money, and for governments to take advantage of rapidly improving technology, falling prices, and a wide choice of providers.</p>
<p>So in a competing world, standards organisations are just one more sector, and customers, including some of the biggest customers, i.e. governments, can call the shots in who might win.</p>
<p>Now we face calls for Digital Public Infrastructures for other systems (open banking, with digital identity being a cornerstone of that, but many others), and the question arises of how the governance should work for these.</p>
<p>So my conclusion from history is that we need open standards, we need government procurement to require interoperability (c.f. 
the European Digital Markets Act requirements), and we need open source exemplars for all components, to keep all the parties honest.</p>
<p>I personally would like to go further - I think AI today exploits the open availability of huge swathes of data to create new knowledge and artefacts. This too should be open source, open access, and required to interoperate - LLMs, for example, could scale much better if they used common structures and intermediate model formats that admitted of federation of models (and could even do so with privacy of training data if needed)...</p>
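<p>For a flavour of what such federation might look like, here is a minimal sketch, assuming only that participants share a common intermediate parameter format: plain federated averaging over named parameter arrays. The layer names, weights, and the "hospitals" in the toy run are all invented for illustration.</p>
<pre>
import numpy as np

def federated_average(local_models, weights=None):
    """Merge models that share a common intermediate format.

    local_models: one dict per organisation, mapping layer name to a numpy
    parameter array (the shared format is what makes federation possible).
    weights: optional per-participant weighting, e.g. local dataset sizes.
    """
    if weights is None:
        weights = [1.0] * len(local_models)
    total = sum(weights)
    return {name: sum(w * m[name] for w, m in zip(weights, local_models)) / total
            for name in local_models[0]}

# toy run: two "hospitals" federate a tiny linear model without sharing data
a = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
b = {"w": np.array([3.0, 0.0]), "b": np.array([1.5])}
print(federated_average([a, b], weights=[100, 300]))
</pre>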
<p>We don't want to end up with the multiple silos that we currently have in social media and messaging platforms, or indeed the ridiculous isolation between video conferencing apps that all work in browsers using WebRTC but don't work with each other. This can all be avoided by a little bit of tweaking of government procurement, and some nudging using the blunt instrument of Very Large Contracts :-)</p>
<p><span style="font-size: large;">mandatory foley sounds</span> (2024-02-17)</p>
<p>you know, it was suggested that EVs, being so beautifully silent, should be required to make a bit of fake engine or tyre noise, just so pedestrians and cyclists are aware they are there.</p>
<p>but what is far more urgent is that we need people carrying phones they are staring at to do the same (oh, ok, maybe not revving diesel or screeching rubber - maybe some other thing like belches, or farts, or other human-like sounds)....then if i'm cycling along, i know there's a stupid pedestrian who doesn't know I am there, because they aren't looking before they step into the road.</p>
<p>the phone could also emit a radio beacon to warn EVs to slam the brakes on.</p>
<p>or we could just let darwin play out...</p>
<p>oh, thinking about this, we could also imagine that the reason aliens have not been in touch with earthlings in all the 100 years we've been beaming out radio to them is that it is entirely possible that any sufficiently advanced civilisation has forgotten where the unmute button is.</p>
<p><span style="font-size: large;">explainable versus interpretable</span> (2024-02-12)</p>
<p>This is my explanation of what I think XAI and Interpretable AI were and are - yours may differ :-)</p>
<p>XAI was an entire DARPA-funded program to take stuff (before the current gibberish hit the fan) like convolutional neural nets, and devise ways to trace just exactly how they worked. <i>Explainable</i> AI has been somewhat eclipsed by <i>interpretable</i> AI, for the touchy-feely reason that the explanations that came out (e.g. using integrated gradients) were not accessible to lay people, even though they had made big inroads into shedding light inside the old classic "black box" AI - so a lot of stuff we might use in (e.g.) medical imaging is actually amenable to giving not just an output (classification/prediction) but also what features in the input (e.g. 
x-ray, MRI scan etc) were the ones that mattered, and indeed what labelled inputs were the specific instances of priors that led to the weights that led to the output.</p>
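<p>To make "integrated gradients" concrete, here is a minimal sketch of the idea on a toy differentiable model - a two-feature logistic regression standing in for a real CNN; the weights, baseline and input are all invented for the example.</p>
<pre>
import numpy as np

W, B = np.array([1.5, -2.0]), 0.1  # toy model parameters (invented)

def model(x):
    """Toy differentiable classifier: logistic regression on two features."""
    return 1.0 / (1.0 + np.exp(-(x @ W + B)))

def grad(x):
    """Analytic gradient of the model output w.r.t. the input features."""
    y = model(x)
    return y * (1.0 - y) * W

def integrated_gradients(x, baseline=None, steps=50):
    """Attribute the prediction to input features by averaging gradients
    along the straight-line path from a baseline input to the actual input."""
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = np.linspace(0.0, 1.0, steps)
    avg_grad = np.mean([grad(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad  # sums roughly to model(x) - model(baseline)

x = np.array([2.0, 1.0])
print(integrated_gradients(x), model(x) - model(np.zeros(2)))
</pre>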
<p>Interpretable AI is much more about counterfactuals, and showing from 20,000 feet how the AI can't have made a wrong decision about you because you're black, since the same input for a white person gives the same decision... i.e. it is narrative and post hoc, as opposed to mechanistic and built in...</p>
<p>It is this latter that is, of course, (predictably) gameable - the former techniques aren't, since they actually tell you how the thing works, and are attractive for other reasons (they allow for reasoned sparsification of the AI's neural net, to increase efficiency without loss of precision, and allow for improved uncertainty quantification, amongst other things an engineer might value)...</p>
<p>None of the post-DARPA XAI approaches (at least none that I know of) would scale to any kind of LLM (not even Mistral 7B, which is fairly modest scale compared to GPT4 and Gemini) - so the chances of getting an actual explanation are close to zero. Given they would struggle for similar reasons to deal with uncertainty quantification, the chances of them giving a <i>reliable</i> interpretation (i.e. narrative counterfactual reasoning) are not great. (There are lots of superficial interpreters based around pre- and post-filters and random exploration of the state space via "prompt engineering" - I suspect these are about as useful as the old Oracle at Delphi... "if you cross that river, a great battle will be won" - but I would enjoy being proven wrong!)</p>
<p>For a very good worked example of explained AI, the DeepMind Moorfields retina scan neural network work is exemplary - there are lots of others out there, including uses of the explanatory value to improve efficiency.</p>
<p><span style="font-size: large;">standards and interoperable AI and the lesson from the early internet...</span> (2024-02-04)</p>
<p>Back in the day (e.g. 1980), when we were deploying IP networks, there were a ton of other comms stacks around, from companies (DEC, IBM, Xerox etc) and from international standards orgs like the ITU (was CCITT - X.25 nets) and ISO (e.g. TP4/CLNP). They all went away because we wanted something that was a) open and free, including code and documentation, b) worked on any system, no matter who you bought it from, whether very small (nowadays, think raspberry pi etc) or very large (8000 cores, terabytes of ram, loads of 100s-Gbps NICs etc), and c) co-existed in a highly federated, global-scale system of systems.</p>
<p>So how come AI platforms can't be the same? We have some decent open source, but I don't see much in the way of interoperability right now, yet for a lot of global problems we would like to federate at coarse grain/large scale - e.g. for healthcare or environmental models, or for energy/transportation - so we get the benefit of e.g. 
better precision/recall, longer prediction horizons, more explainability, and, indeed, more sustainable AI, at the very least, since we won't all be running our own silos doing the same training again and again.</p>
<p>We should have an IETF for AI, and an Interop trade show for AI, and we should shun products and services that don't play - we could imagine an equivalent of what happened with European and US GOSIP (Government Open Systems Interconnection Procurement), which evolved into "just buy Internet, you know it makes sense, and it should be the law".</p>
<p><span style="font-size: large;">Centralization, Decentralization, and Digital Public Infrastructures</span> (2024-01-29)</p>
<p>with apologies to Mark Nottingham, https://www.rfc-editor.org/rfc/rfc9518.html</p>
<p><span style="font-size: medium;">Through the Control/Power Lens</span></p>
<p>Governments have typically centralized control of the things that governments do - raising tax to fund the provision of certain services: education, health, transportation, defence, and, even in the not too distant past, telecommunications. Decentralised government (e.g. syndicalism) has been rare. On the other hand, most governments in recent history have left domains outside of government to markets or communities, although not without some (perhaps limited) regulation or control of governance.</p>
<p>In the past, communities have built cooperative ventures (shared barns, shared savings and loans) and, more recently, community networks and power grids.</p>
<p><span style="font-size: medium;">Through the Economic Lens</span></p>
<p>Markets often espouse competition, where multiple providers offer equivalent products and services. Various models exist for central versus decentralised economies.</p>
<p>There's interaction between government and the economy through regulation, especially when there is a threat of monopoly, or even just oligopoly, or through coercion - government v. companies w.r.t. making sure the market operates transparently, efficiently and fairly (see later, the Feds v. Apple, and, in the UK, the IPA v. GDPR).</p>
<p>Through law, government may also provide citizens with agency, representation and redress. Of course, there are good and bad governments, and typically this can show up in terms of poor practice, or deliberate removal of rights (to agency or redress, e.g. concerning unfair treatment, including exclusion etc).</p>
<p>Governments may be good now, but bad later, or vice versa. It is not an accident that Germany has the strictest privacy laws in the world; it was a result of their past experience of East Germany under the Stasi. 
They are not so naive as to believe that that couldn't happen again one day (sadly).</p>
<p><span style="font-size: medium;">Through the Technology Lens</span></p>
<p>The Internet is probably the best example of something that had been a largely decentralised system for decades.</p>
<ul><li>Horizontal - services - interoperation/federation: e-mail, web, name spaces</li><li>Vertical - stacks - silos: cloud, social media, online shopping, entertainment</li></ul>
<p>Horizontal systems are somewhat decentralised (or at least distributed), whilst, in some informational sense, vertical systems are somewhat centralised.</p>
<p><span style="font-size: medium;">Through the Information Lens</span></p>
<p>Where data is, is orthogonal to who can read it and who can alter it. Ownership and control depend on access, access control, and legibility. So whether I can get at my, or your, data depends on my role and my privilege level, but whether I can then actually decode your data depends on my having the right software.</p>
<p>At some level, we can expect most data today to be encrypted: at rest, during transfer, and even while processing. Protection through access control is not sufficient, since there are mistakes, insider attacks and coercion, and software has vulnerabilities. Hence we employ keys, and encryption/decryption depends on having both the data and the relevant keys.</p>
<p>One step further: who has accessed the data, and who has been able to decode the data, is part of auditability (who can see who can see - Quis custodiet ipsos custodes? etc).</p>
<p>If the user controls the keys, they may not care too much where the data is (except for potential denial of access), since others copying the data will not be able to decode it. On the other hand, if the government keeps copies of the keys, then it can access any relevant data, whether it is centralised or decentralised. Of course, if the government accesses my data on my computer, I may be aware of that (through audit trails), but that might not do me much good in the face of a "bad" government.</p>
<p>There are two separate aspects of identity systems where visibility of data matters, in terms of threats to citizens from bad actors: firstly, foundational id provides linkability across surveillance of actions (voting, signing on to services, etc), so it exposes the individual's digital footprint to long-term analysis; secondly, functional id includes particular attributes (age, gender, race, religion, licenses to operate vehicles, medical and academic qualifications etc), which offers the opportunity to discriminate (treating groups preferentially, or excluding or reducing the rights of other groups etc). A bad actor doesn't need the whole government (or its service providers) to misbehave - just systems that are poorly designed, so that insiders can exploit vulnerabilities. The perception of this possibility is enough to create distrust and disengagement, which itself will militate against vulnerable groups in society more than the privileged.</p>
<p><span style="font-size: medium;">Through the Efficiency Lens</span></p>
<p>We can put all the data in the world in one place, or we can leave it where it was originally gathered. 
This is a choice that represents two points on a spectrum of centralized versus decentralized data. One can also copy the data to multiple places.</p>
<p>There are efficiency considerations in making this choice: the centralised path entails more energy, higher latency, lower resilience, a worse attack surface, and the potential for catastrophic mistakes. The decentralised path reduces these risks, but still requires one to consider copies, for personal data resilience.</p>
<p>These choices are orthogonal to the access choices, which merely concern who has rights and keys, and where they keep those, not who holds which data where.</p>
<p><span style="font-size: medium;">Conclusions, regarding Alternative Solutions in the Digital Identity Space</span></p>
<p>A digital public infrastructure such as an identity system needs to be trusted (so people use it), and therefore considerations about whether the user base trusts the government or not matter.</p>
<p>If we don't trust the government, we might choose a decentralised system, or at least a system with decentralised keys (like the Apple iCloud eco-system).</p>
<p>The question of whether there should be one provider, or six, or 10 billion is orthogonal to this trust, although it does impact resilience and latency, i.e. efficiency. If the keys are owned by users, then this impacts governments' ability to use identity data (attributes, and identity usage) to plan, whether for good or bad. That said, some privacy technology (e.g. FHE or MPC) combined with decentralised learning might allow non-privacy-invasive statistics to be gathered by a centralised agency (i.e. government) without actual access to individually identifying attributes. A good example of this was the Google Apple Exposure Notification system, designed for digital contact tracing during Covid, which could have been adapted to offer statistical information (e.g. differentially private) if necessary (though it wasn't used that way in practice).</p>
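<p>As a minimal sketch of the kind of differentially private aggregate such a system could publish (the region names, counts and epsilon below are invented for illustration; this is just the standard Laplace mechanism, not the GAEN design itself):</p>
<pre>
import numpy as np

def dp_count(true_count, epsilon=0.5, sensitivity=1.0):
    """Laplace mechanism: add noise scaled to sensitivity/epsilon, so the
    published count reveals little about any one individual's presence."""
    return true_count + np.random.laplace(0.0, sensitivity / epsilon)

# toy run: exposure-notification-style daily counts per region (invented)
true_counts = {"region_a": 120, "region_b": 45}
print({k: round(dp_count(v), 1) for k, v in true_counts.items()})
</pre>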
<p>All of this leads to the question of who provides key management, and the related question of certification (i.e. why should we trust the key management software, too). One solution is to provide a small (e.g. national scale) set of identity services, but a decentralised key management system that can also be used to federate across all the identity services (cross-border, or between state and private sector). One technology that we built to provide that independent key management for identity systems is <a href="https://github.com/alan-turing-institute/trustchain">trustchain</a> [1], a prototype that serves to replace a (somewhat) centrally owned platform such as Web PKI.</p>
<p>An interesting oligopolistic system that offers somewhat decentralised certificates is the Certificate Transparency network (of about 6 providers) that signs keys for the Internet - this arose because the previously centralised CAs were hit by attacks which caused major security breaches in the Internet. We would argue that a similar-scale system for key management and certification for digital identity is evidently the bare minimum for acceptability for any trustworthy system.</p>
<p>Whether the system infrastructure itself is decentralised or not is a separate question, which concerns efficiency and, perhaps, some types of resilience (Estonian digital citizenship systems are distributed over several countries, for backup/defensive/disaster recovery reasons).</p>
<p>[1] trustchain is a prototype that is based on ION and makes parsimonious use of the bitcoin proof-of-work network to provide decentralised trustworthy time, and can then create/issue keys in a way that is not dependent on any central provider or service, and is resilient to coercion, collusion and sybil attacks. We are currently investigating replacing the proof-of-work component with <a href="https://svr-sk818-web.cl.cam.ac.uk/keshav/wiki/images/1/1a/Time_Fabric_FAB_21.pdf">TimeFabric</a>, which itself depends on a ledger, but can use proof-of-stake or proof-of-authority, and is therefore massively more sustainable.</p>
<p><span style="font-size: large;">Replacing the Turing Test with the Menard Test</span> (2024-01-11)</p>
<p>In Borges' short short tale, <a href="https://interestingliterature.com/2021/02/jorge-luis-borges-pierre-menard-author-of-the-quixote-summary-analysis">"Pierre Menard, Author of the Quixote"</a>, he reports the astonishing tale of the 19th-century author who lives a life so exemplary - in the literal sense that it is exemplary of what the author of the classic work Don Quixote should be like - that when Menard produces the Quixote, it is not a copy of the work by Cervantes, but a better work, despite being word-for-word identical. It is not a copy; it was made through the creative efforts of Menard, based on his experiences and knowledge and skills.</p>
<p>Imagine an AI that was trained in the world, not on a large corpus of text, so that it didn't just acquire a statistical model of text, but acquired an inner life, and then could use that inner life to create new works.</p>
<p>Imagine such an AI was able to produce, for example, a book called Don Quixote, without having read the work by Cervantes.</p>
<p>That AI would necessarily contain a model of Cervantes, or at least something that had many of the same elements.</p>
<p>This model of a creative human is quite different from a model of lots of blocks of text, which can be regurgitated with many small variations, but which are, of necessity, merely stochastic parrots.</p>
<p>Were one to interrogate the true creative AI, it might respond with other works that Cervantes might have written, if he were still around.</p>
<p>A similar AI with an inner life that modelled, say, Schubert might be capable of completing symphony number 8; another, with the "eye" of Jackson Pollock, might move from abstract expressionism to hyper-realism one day.</p>
<p>Such AIs might be able to introspect (e.g. 
in the manner of Alfred Hitchcock, when interviewed by François Truffaut about why he used certain approaches for scenes in his films).</p>
<p>Such systems would really be interesting, and not rote-learned in how to pass trivial Turing Tests.</p>
<p><span style="font-size: large;">AI predictions with the possibility of fairness?</span> (2024-01-02)</p>
<p>There's a bunch of work on impossibility results associated with machine learning and trying to achieve "fairness" - the bottom line is that if there is some characteristic that splits the population, and the sub-populations have different prevalences of some other characteristic, then designing a fair predictor that doesn't effectively discriminate against one or other sub-population isn't feasible.</p>
<p><a href="https://arxiv.org/pdf/1609.05807.pdf">One key paper on the impossibility result</a> covers this (the alternative is to build a "perfect" predictor, which is kind of infeasible).</p>
<p>On the other hand, some <a href="https://arxiv.org/abs/2012.02972">empirical studies</a> show that this can be mitigated by building a more approximate predictor/classifier, perhaps, for example, employing <a href="https://arxiv.org/pdf/1707.06613.pdf">split groups</a>, and even trying to achieve <a href="https://arxiv.org/abs/1104.3913">"fair affirmative action"</a> - this sounds like a plan, but (I think - please correct me if I am wrong) it assumes that you can</p>
<ul><li>work out which group an individual should belong to</li><li>know the difference in prevalence between the sub-groups</li></ul>
<p>This suggests to me that it might also be worth looking at causal inference over all the dimensions, to see if we can even determine some external factors that need policy intervention to, perhaps, move the sub-populations towards having equal prevalence of those other characteristics (high school grade outcomes, risk of re-offending, choose your use case)....</p>
<p>I guess one very important value of the work above is to make these things more transparent, however the policy/stats evolve.</p>
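<p>To see the base-rate problem concretely, here is a toy numeric sketch (all numbers invented): the same classifier behaviour applied to two groups with different prevalence necessarily yields different predictive values, which is the heart of the impossibility result.</p>
<pre>
def ppv(prevalence, tpr=0.8, fpr=0.2, n=100_000):
    """Positive predictive value, P(actually positive | flagged positive),
    for a classifier with fixed TPR/FPR applied to a group with the
    given prevalence of the target characteristic."""
    pos = prevalence * n
    neg = n - pos
    tp, fp = tpr * pos, fpr * neg
    return tp / (tp + fp)

# identical classifier behaviour (same TPR/FPR) in both groups...
print(round(ppv(0.1), 3))  # group A, prevalence 10% -> PPV ~ 0.31
print(round(ppv(0.3), 3))  # group B, prevalence 30% -> PPV ~ 0.63
# ...gives different predictive values; equalising PPV instead would force
# different TPR/FPR per group, i.e. the discrimination shows up somewhere.
</pre>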
<p><span style="font-size: large;">Just what is autonomy (in an automated system like an embodied AI, robot, or even human proxy)?</span> (2023-12-04)</p>
<p>Something or someone (a proxybot) carries out an action "on behalf of someone", at some distance in space and time from the person issuing the instructions. They were given instructions on what to do, including contingencies for varying circumstances. What level of autonomy does this represent, if the proxybot can vary what they do when the circumstances are not precisely any of those foreseen?</p>
<p>(If this, then that, otherwise...)</p>
<p>Of course, the proxybot programmer could try to foresee the universe of possibilities, or could include "failsafe" alternates, or describe overall/overarching principles for decision-making in the presence of novel situations (ethical envelopes, so to speak).</p>
<p>But the instrument is still an instrument, and not really autonomous. It is an agent of the originator. Just because it is removed in space and/or time does not reduce the agency of the programmer, surely? Unless the programmer and proxybot "agree" to hand over agency: but what would such a handover look like? How would you know?</p>
<p><span style="font-size: large;">scholastic parrots</span> (2023-11-20)</p>
<p>having a conversation in the Turing with my mentor, discussing whether an LLM is just AGI, because AGI is "just" statistics, and has also "just passed the Turing Test"....and we both observed that most interactions we have with other GIs (human intelligences) are pretty dumb.</p>
<p>so my main concern with this is the usual repetition of the Theodore Sturgeon exchange: told that most SF is pretty terrible, he responded with "most everything is pretty terrible". Intelligence is rare - most GIs can exhibit it, but only do so very occasionally, as intelligence is really not often very useful - habit is much more useful (thinking fast, rather than slow, is a survival trait, according to Kahneman and Tversky).</p>
<p>so, like many things, smartness is Zipf/heavy-tailed -</p>
<p>the title of this entry refers to scholarly works - most papers are cited less than once. A few papers get tens of thousands of citations.</p>
<p>So you train an LLM on the common crawl, or on the library of congress, and the majority of the stuff you've trained it on isn't even second class; it is just variations of the same thing.</p>
<p>This isn't model collapse - this is an accurate recording of a model of what most people's visible output looks like. Dim, dumb, and dumber. So what?</p>
<p>Well, going back to the Turing test: if you, an Average Joe, pick an LLM at random, prompt it with some average prompts, and compare it to the average GI, you will unsurprisingly conclude the LLM has passed the Turing test.</p>
<p>But what if you had Alan Turing (assuming he were still alive) at the other end of the GI teletype, I ask? And what if you got Shakespeare and Marie Curie and Onora O'Neill to ask some questions of it and the LLM?</p>
<p>Then I suspect you'd find your LLM was a miserable failure, like the rest of us. Except that, every now and then, we rise to the occasion and actually engage our brains, which it cannot do.</p>
<p><span style="font-size: large;">In-network processing - do we ever really need it?</span> (2023-10-24)</p>
<p>We've looked at this problem from several sides now - to solve the "incast", to do aggregation for map/reduce or any federated learning platform, to aggregate acknowledgements for PGM.</p>
<p>When we say "in-network", we're talking about in-switch processing - borrowing resources from the poor P4 switch to store and process multiple application-layer packets' worth of stuff, so that only one actual packet (or at least a lot less) needs to be sent on its way.</p>
<p>So how about we compare with multicast (in-network copying) and its (largely) replacement by CDNs/overlays.</p>
<p>The key point is branches in the net - this is where the "implosion" (for incast) or "explosion" (for multicast) happens.</p>
<p>So do we have a server nearby? 
Or can we just put one there (or just connect one there)?</p>
<p>The answer (for multicast) is yes:</p>
<p>netflix/pops in the wide area - use a distribution tree to all pops, and caches.</p>
<p>So in the data center: use servers, not switches, and build a sink forest of trees.</p>
<p>In a Clos system, connect servers to the local switch, top-of-rack, and spine switch/server... then, for servers at some level, use a node at the next level up as an aggregation server (note Clos even has redundancy, so this will survive edge/switch outages).</p>
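<p>A minimal sketch of that server-side alternative to in-switch aggregation (the fan-in, the sum combiner and the worker values are invented for illustration): each aggregation server receives only a handful of partial results instead of an incast of all of them, and forwards a single value up its tree.</p>
<pre>
def aggregate_tree(leaf_values, fanin=4, combine=sum):
    """Aggregate worker results up a sink tree of servers: each parent
    combines `fanin` children's partial results into one value, so no
    node ever sees the full incast."""
    level = list(leaf_values)
    while len(level) > 1:
        level = [combine(level[i:i + fanin]) for i in range(0, len(level), fanin)]
    return level[0]

# toy run: 64 workers each contribute a partial sum (invented values)
print(aggregate_tree([1.0] * 64, fanin=4))  # 64.0, via 3 levels of servers
</pre>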
<p><span style="font-size: large;">Unseeing like a State</span> (2023-10-13)</p>
<p>Just read <a href="https://yalebooks.co.uk/book/9780300246759/seeing-like-a-state">Seeing like a State</a>, by James C. Scott. Incredible scope and vision about what is often (but not always) wrong with "tech"-led solutions - though in a very broad sense. It looks at the imposition of regularised/normalized villages, farming, transport, city structures and so on, especially by the "developed" world on the (frequently) completely inappropriate contexts of colonies, but also post-colonial and self-imposed. From Russian collective farms, to modernist cities like Brasilia, from mono-culture farming to single-minded, wrong-headed cultural impositions - an amazing read!</p>
<p>It basically makes it pretty obvious why the following stuff happens:</p>
<p>Tim Wu's <a href="https://www.theguardian.com/books/2016/dec/26/the-attention-merchants-tim-wu-review">Eyeball Bandits</a></p>
<p>Ian Hislop's <a href="https://www.bbc.co.uk/programmes/m00095hv">Fake News</a></p>
<p>Doctorow's <a href="https://www.wired.com/story/tiktok-platforms-cory-doctorow/">Drain Overflow</a></p>
<p>Basically, the Internet users are the hunters and gatherers that just got fenced in and collectively farmed, like ants. Grate.</p>
<p><span style="font-size: large;">boxing clever with AI</span> (2023-09-25)</p>
<p>There was this AI creative challenge where you had to figure out things to do with 4 objects, as follows: a box, a candle, a pencil and a rope.</p>
<p>Here are my 3 proposals:</p>
<p>1. Draw a still life on the box, of the candle and the rope, so that it looks 3D (i.e. draw on all 6 sides of the cube, with the pencil).</p>
<p>2. Make a clock out of setting fire to the candle, the rope and the pencil - they will burn at different rates, and you could mark out the seconds, minutes and hours in box lengths, then sit on the box, passing time.</p>
<p>3. Have a boxing match between the pencil and the candle, in a ring made by the rope.</p>
<p><span style="font-size: large;">dangerous AI piffle...</span> (2023-09-21)</p>
<p>So what's a dangerous model?</p>
<p>The famous equation, E=mc^2, is dangerous - it tells you about nuclear power, but it tells you about A-bombs too.</p>
<p><a href="https://blogger.googleusercontent.com/img/a/AVvXsEiQzakmhgDT9V2qX6p0cnAlJNTlwLIVdqnwXkkUZvuLyuuy-2AOX4jvV-6oyl_1vnlFWsfm4MObXxiOmkrPMS7WsA42oN3gmUQKjdtM1jHOYTHtnAKYHRmRmEXZHX8AHRhKWOhTiZO7bD58hR6GvQalcBWvOZY79hJDiTxFtuD64anbc4hyAE5U"><img alt="DNA double helix" height="120" src="https://blogger.googleusercontent.com/img/a/AVvXsEiQzakmhgDT9V2qX6p0cnAlJNTlwLIVdqnwXkkUZvuLyuuy-2AOX4jvV-6oyl_1vnlFWsfm4MObXxiOmkrPMS7WsA42oN3gmUQKjdtM1jHOYTHtnAKYHRmRmEXZHX8AHRhKWOhTiZO7bD58hR6GvQalcBWvOZY79hJDiTxFtuD64anbc4hyAE5U=w65-h120" width="65" /></a></p>
<p>This famous molecular structure is dangerous too - it tells you about DNA damage, but it tells you about eugenics too.</p>
<p>[picture credit: By Zephyris, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6285050]</p>
<p>So we had Pugwash and Asilomar, to convene consensus not to work on A-bombs and not to work on recombinant DNA. Another example: the regulator has just approved exploiting the <a href="https://www.bbc.co.uk/news/live/uk-66933804">Rosebank</a> UK oilfield, despite the fact that solar and wind power are now cheaper than fossil fuel, and that <a href="https://www.un.org/en/climatechange/cop26">COP26</a> made some pretty clear recommendations about not heating the planet (or losing biodiversity) any more.</p>
<p>What would a similar convention look like for AI? Are we talking about not using Generative AI (LLMs, Stable Diffusion etc) to create misinformation? Really? Seriously? That's too late - we didn't need that tech to flood the internet and social media with effectively infinite amounts of nonsense.</p>
<p>So what would be actually bad? Well, a non-explainable AI that was used to model climate interventions and led to false confidence about (say) some geo-engineering project, and so made things worse than doing nothing - that would be bad. Systems that could be inverted to reveal all our personal data - that would be bad. Systems that were insecure and could be hacked to break all the critical infrastructure (power, water, transportation, etc) - that would be bad. So the list of things to fix isn't new - it is the same old things, just applied to AI like they should have been applied to all our tech (clean energy, conserving bio-diversity, building safe resilient critical infrastructures, verifiable software just like aircraft designs, etc etc)...</p>
<p>n.b. 
the trivial Excel error that led to the UK decision to impose austerity - that was exactly incorrect. Recall the Reinhart-Rogoff error: <a href="https://theconversation.com/the-reinhart-rogoff-error-or-how-not-to-excel-at-economics-13646">https://theconversation.com/the-reinhart-rogoff-error-or-how-not-to-excel-at-economics-13646</a></p>
<p>So dangerous AI is a red herring. Indeed, the danger is that we get distracted from the real problems and solutions at hand.</p>
<p>Late addition: "There's no art / to find the mind's construction in the face."</p>
<p>said Duncan, ironically, not about Macbeth...</p>
<p>So, without embodiment, AI interacts with us through very narrow channels - when connected to decision support systems, it is either via text, images or actuators, but there is (typically) no representation of the AI itself (its internal workings, for example), so we construct a theory of mind about it without any of the usual evidence that we rely on (construction in the face...) to infer intent (humour, irony, truth, lie etc).</p>
<p>We then often err on the side of imparting seriousness (truth, importance) to the AI, without any supporting facts. 
This is where the Turing test, an idea devised by a person somewhat on the spectrum by many accounts, fails to give an account of how we actually interact in society.</p>
<p>This means that we fall foul of outputs that are biased, or deliberate misinformation, or dangerous movements, far more easily than we might with a human agent, where our trust would have to be earned, and our model of their mental state would be acquired over some number of interactions, involving a whole body (pun intended) of meta-data.</p>
<p>Of course, we could fix AIs so they did this too - embody them, and have them explain their "reasoning", "motives" and "intents"... That would be fun.</p>
<p><span style="font-size: large;">AI4IP</span> (2023-08-21)</p>
<p>Plenty can be, and has been, said about networks (&amp; systems) for AI, but AI for nets, not so much.</p>
<p>The recent hype (dare one say regulatory capture plan?) by various organisations for generative AI [SD], and in particular LLMs, has not helped. LLMs are few-shot learners that make use of the attention mechanism to create what some have called a slightly better predictive text engine. Fed a (suitably "engineered") prompt, they match an extensive database of training data, and emit remarkably coherent, and frequently cogent, text, at length. The most famous LLMs (e.g. ChatGPT) were trained on the Common Crawl, which is pretty much all the publicly linked data on the Internet. Of course, just because content is in the Common Crawl doesn't necessarily mean it isn't covered by IP (Intellectual Property - patents, copyrights, trademarks etc), or indeed isn't actually private data (e.g. covered by GDPR), which causes problems for LLMs.</p>
<p>Also, initial models were very large (350B dimensions), which means most of the tools &amp; techniques for XAI (eXplainable AI) won't scale, so we have no plausible reason to believe their outputs, or to interpret why they are wrong when they err. Generally, this creates legal, technical and political reasons why they are hard to sustain. Indeed, liability, responsibility and resilience are all at risk.</p>
<p>But why would we even think of using them in networking? What AI tools make sense in networking?</p>
<p><b>ML</b></p>
<p>Well, we've used machine learning for as long as comms has existed - for example, training modulation/coding against signal &amp; noise often uses Maximum Likelihood Estimation to compute the received data with the best match. This comes out of information theory and basic probability and statistics.</p>
<p>Of course, there are a slew of simple machine learning tools, like linear regression, random forests and so on, that are also good for analysing statistics (e.g. 
performance, fault logs etc).</p>
<p><b>NN</b></p>
<p>Traffic engineering has also profited from basic ideas of optimisation - TCP congestion control can be viewed as distributed optimisation (basically Stochastic Gradient Descent) coordinated by feedback signals, as sketched below. And more classical traffic engineering can be carried out a lot more efficiently than by simply using ILP formulations on edge weights for link-state routing, or indeed load balancers. Neural networks can be applied to learning these directly, based on past history of traffic assignments. Such neural nets may be relatively small, and so explainable via SHAP or Integrated Gradients.</p>
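<p>A toy rendering of that "congestion control as distributed optimisation" view (capacity, rates and constants all invented; real TCP is far richer): each sender sees only a binary congestion signal, which acts like a noisy gradient of a shared objective, and AIMD-style updates drive the rates towards a fair share.</p>
<pre>
CAPACITY = 100.0  # invented shared-link capacity

def aimd_step(rate, congested, incr=1.0, decr=0.5):
    """One AIMD update: additive increase, multiplicative decrease."""
    return rate * decr if congested else rate + incr

rates = [10.0, 40.0, 80.0]  # three senders, invented starting rates
for t in range(200):
    congested = sum(rates) > CAPACITY  # the only feedback each sender gets
    rates = [aimd_step(r, congested) for r in rates]
print([round(r, 1) for r in rates])  # rates oscillate around a fair share
</pre>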
<p>Footnote SD: we're not going to discuss Stable Diffusion technologies here - tools such as Midjourney and the like are quite different, though they often use text prompts to seed/boot the image generation process, so are not unconnected with LLMs.</p><div><br /></div>jon crowcrofthttp://www.blogger.com/profile/05692091803072506710noreply@blogger.com0tag:blogger.com,1999:blog-19062127.post-80265760047315367722023-08-07T06:32:00.001-07:002023-08-07T06:32:26.178-07:00re-identification oracle<p> surely, chatgpt should be a standard piece of any attempt to show whether allegedly anonymised data really is anonymised?</p><p>effectively it is a vantage point from which to triangulate (any and almost every angle)...</p>jon crowcrofthttp://www.blogger.com/profile/05692091803072506710noreply@blogger.com0tag:blogger.com,1999:blog-19062127.post-66347346756905428992023-08-04T00:11:00.002-07:002023-08-04T00:11:19.838-07:00postman pidge<p> I'm getting very tired of the infestation of sky rats (as Germans call pigeons) in London - they make a mess, are unbearably stupid at getting in the way of cars and cyclists and pedestrians, and serve no obvious use - apparently, they taste so awful that none of the cats or urban foxes in our area will devour them. We need a solution fast.</p><p><br /></p><p>I asked folks about putting up a hawk silhouette, but apparently this would scare off all birds indiscriminately, and we have meadow grass for the express purpose of having nice critters like our garden space, which many others do, when not flocked out by the aforesaid grey menace.</p><p><br /></p><p>I'm also not a fan of drone delivery systems - ok for crop spraying or parcels going across to the Orkneys, that's fine, but in urban spaces those quad copters are just too noisy.</p><p><br /></p><p>I've considered getting a slingshot, to practice taking out both the pigeons and drones (2 birds with 1 stone, even - if one was lucky, one could crash the drone into the pigeon or vice versa) - it could even be a game, but then there are the neighbours' windows, and the people down below, to worry about, so that probably doesn't fly (ha ha).</p><p>so then I thought about building drones with wings instead of rotors, and then designing the drones to tackle the pigeons - even further, could we use pigeon as a form of biofuel for the drone, fitting them into the ecosystem in a special sustainable postal niche? seemed possible, but tricky bio-engineering.</p><p>So then it occurred to me that the answer was much more obvious, and more obviously Darwinian.</p><p>What we need is a hawk that looks like a pigeon, can carry more than a pigeon, finds its way like a pigeon, and lives on pigeons. Hopefully, the cross breeding programme can just be done right away and doesn't need any GM flocks, though in this case, I am not against it.</p><p>I can imagine a society of hawks (or perhaps falcons or some other raptor) living in a very aristocratic manner, serving humans as friends, not slaves, whilst the "cattle" are bred and kept high up on rooftops as fuel.
Cities would once more be adroned with beautiful creatures instead of ugly grey winged rodents, and the postal service would be quiet, prompt, and free, if occasionally stained with pigeon blood.</p><p><br /></p><p>I can see no downsides.</p>jon crowcrofthttp://www.blogger.com/profile/05692091803072506710noreply@blogger.com1tag:blogger.com,1999:blog-19062127.post-10757702957327114572023-08-02T08:56:00.002-07:002023-08-09T00:54:22.966-07:00The Enigma Variationals<p> After many years of study, scientists at the Alan Tuning Institute have finally decoded this machine, and we are now ready to show you, or indeed play to you, what it was originally intended for.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-nqSGeXgyeKeS7yhI6t41Uk7s0IPt6YkebWoOvG5-dMeVQDS8OQjkJ721zp99t7PTpipzax1nfAp1w3OF4tk7IYkb37QQykoDI6kcLRwAqnXntI8ARW0C6VAnWyH_BiNUhDTZBUW-KJRNTQIZTiii3NFRXaU362Ht_dvKpGyM5zoGtm6SqkjI/s4032/PXL_20230802_145014439.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="4032" data-original-width="3024" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-nqSGeXgyeKeS7yhI6t41Uk7s0IPt6YkebWoOvG5-dMeVQDS8OQjkJ721zp99t7PTpipzax1nfAp1w3OF4tk7IYkb37QQykoDI6kcLRwAqnXntI8ARW0C6VAnWyH_BiNUhDTZBUW-KJRNTQIZTiii3NFRXaU362Ht_dvKpGyM5zoGtm6SqkjI/s320/PXL_20230802_145014439.jpg" width="240" /></a></div><br /><p><br /></p><p>Many years ago, Edward Elgar the Elder was struggling to complete his final symphony and turned to his friend Curt Yödel, who was only able to contribute a theory that suggested that some compositions could be finished, but wrong, while others would be perfect, but unfinished. Of course, there was one famous prior, Tomas Albinionini, whose unfinished work, the Adagio Al Fresco, was found written in the margins of the remains of the library of Eberbach, possibly scrawled there by the long dead monk, George Borgesi.</p><p>Alan Tuning found this keyboard in the belongings of Edward the Elder after his demise, and being familiar with Yödel's Unfinished Therem, devised his own approach to figuring out what El Gar may have been finguring out. His inspiration was that whilst the dominant and tonic notational semantics in use at the time relied on letters (A, B, C, D, E, F, G, H and so on), or even entire words ("doh", "ray" etc), these could easily be represented by numbers - for example 1, 2, 3, or in the latter case 646F68, if you didn't mind risking the wrath of the coven. Given this, one could work through all the combinations and pernotations that could be played on the keyboard, and evaluate whether they sounded plausible - this could be "fed back" to the player via a small electric shock system, devised to deliver a higher voltage if the sound was sufficiently unpleasant, or a lower voltage if the direction of travel (gradient) was promising. This method of learning to play pleasing sequences became known as "voltage scaling" and was in use in the best sanitoria and conservatories such as the Sheboygan until relatively recently, when the Muskatonic link became more popular.</p><p>I've transcribed the piece here for the guitar, as it is easier to play than the old Enigma Keyboard, which frankly has atrocious action, and makes too much fret noise too.
I've taken the liberty also of transposing it to the Allen Key.</p><p><a href="https://youtu.be/g421ifPPL64">Here</a> is my modest attempt at the piece. I do hope you like the results - I had a super conductor.</p><p>You'll note that this is in Sonata form, and features several themes with recapitulations.</p>jon crowcrofthttp://www.blogger.com/profile/05692091803072506710noreply@blogger.com0tag:blogger.com,1999:blog-19062127.post-65033584040137735132023-08-01T06:21:00.003-07:002023-08-01T06:21:23.432-07:00teaching CS topic X top down for X={networks, graphics, databases, operating systems...} but what about AI?<p> computer science textbooks have often been written bottom up - start with hardware (here's a CPU, here's a disc, here's a link) and move from physical characteristics, through low level representation of data and processing properties (ISA, memory, errors, coding & modulation, etc), up through the layers of abstraction.</p><p><br /></p><p>Then along came the pedagogic idea of teaching a couple of CS topics top down. Famous examples are the Kurose/Ross book on networks, and Mel Slater and Anthony Steed's book on graphics (start with the web, start with ray tracing, etc).</p><p>Other books have tried to do this for databases, operating systems, and (to some extent) PL.</p><p><br /></p><p>So what would a top-down approach to AI look like? eh? eh, Chat-bard, llamadharma, out with it.</p><p><br /></p><p><br /></p>jon crowcrofthttp://www.blogger.com/profile/05692091803072506710noreply@blogger.com0tag:blogger.com,1999:blog-19062127.post-50304201823335695972023-07-25T02:29:00.003-07:002023-07-25T03:01:01.466-07:00differentially private high dimensional data publication - perhaps a common case<p> imagine you have data about 100M people, that has around 1000 dimensions,</p><p>some binary, some of other types statistically distributed in various ways - but let's just say kind of uniform random.</p><p>so a given person has a pretty clear signature, even if it is all binary - 2^1000 is a big space, i.e. a key that is very likely different for each person.</p><p>but imagine 10 of the dimensions are not binary but (say) a value gaussian distributed, and 990 dimensions are basically 0 for most people, but 1 (or a small number) for each person, each in a different dimension.</p><p>so the 10 dimensions are fairly poor at differentiating between individuals in the 100M population,</p><p>but the remaining 990 still work really well, i.e. these are rare things for most people but different for different people, so still a very good signature.</p><p>but say we want to publish data that doesn't allow that re-identification, but retains the distribution in the 990 dimensions -</p><p>so what if we just permute those values between all the individuals? we leave the 10 values alone, but swap (at random) the very few 1s between fields with other fields (mostly 0s, a few 1s), for all 100M members of the population?</p>
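<p>a minimal sketch of one reading of that scheme (sizes scaled way down, and I'm assuming "permute between individuals" means shuffling each sparse column independently across the population - that choice is mine): per-column marginals survive exactly, while per-person signatures get scrambled.</p>
<pre>
import numpy as np

# Toy version: 10 dense gaussian dimensions left alone, 990 sparse 0/1
# dimensions each independently permuted across a (scaled-down)
# population. Column sums - the marginals - are preserved exactly,
# while row signatures are destroyed.
rng = np.random.default_rng(0)
n_people, n_sparse = 10_000, 990

dense = rng.normal(size=(n_people, 10))     # published unaltered
sparse = rng.binomial(1, 0.003, size=(n_people, n_sparse)).astype(np.int8)

anon = sparse.copy()
for col in range(n_sparse):
    anon[:, col] = anon[rng.permutation(n_people), col]

assert (anon.sum(axis=0) == sparse.sum(axis=0)).all()  # marginals intact
overlap = (anon & sparse).sum() / max(sparse.sum(), 1)
print(f"fraction of 1s still in their original cell: {overlap:.4f}")
</pre>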
<p>what's the information loss?</p><p>basically, we're observing that, published unaltered, the data in the higher but sparsely occupied dimensions has very strong identifying power but very poor explanatory power... so messing with it this way massively reduces the identification facet, but shouldn't alter the overall distributions over these dimensions (w.r.t. the densely populated fewer (10) dimensions).</p><p><br /></p><p>does this make any sense to try?</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikz0Mhn8cWCOe-X1vFMthxZyXlAGefj945ePbNQG8ig5--WHlxfKkA2R3__ePEyo5sjog5lmP2AXGyYWcD9je_Daqe94T9lcxOSQYKuKQVO1x8gLIBtES5A6Xo9HAbXU4Q9QwpWuLV8UgucoyTaBxJ6iJUcdninkqP2jV6OYFNxLxqxyW3gFMp/s2306/PXL_20230725_094149600.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1730" data-original-width="2306" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikz0Mhn8cWCOe-X1vFMthxZyXlAGefj945ePbNQG8ig5--WHlxfKkA2R3__ePEyo5sjog5lmP2AXGyYWcD9je_Daqe94T9lcxOSQYKuKQVO1x8gLIBtES5A6Xo9HAbXU4Q9QwpWuLV8UgucoyTaBxJ6iJUcdninkqP2jV6OYFNxLxqxyW3gFMp/s320/PXL_20230725_094149600.jpg" width="320" /></a></div>ref: <a href="http://wrap.warwick.ac.uk/92273/">PrivBayes</a><br /><p> Another way to think of this is that the low occupancy dimensions are unlikely to be part of causation, coz they have poor correlation with anything else, mostly.</p>jon crowcrofthttp://www.blogger.com/profile/05692091803072506710noreply@blogger.com0
tag:blogger.com,1999:blog-19062127.post-85053968020311049152023-07-17T04:00:00.006-07:002023-07-17T04:00:44.160-07:00National Infrastructure for AI, ML, Data Science<p>There's been a tension between super expensive HPC clusters and on-prem cloud style data centers for large scale computation since the e-Science programme 20+ years ago (just noting that, as part of that, we (Cambridge University Computer Lab) developed the Xen Hypervisor, subsequently used by Amazon in their Cloud setup for quite a while, so there).</p><p>The High Energy Physicists and folks with similar types of computation have favoured buying expensive systems that have astronomical size RAM and a lot of cores very close to the memory. Not only are these super expensive (because they are not commodity compute hardware), they are almost always dedicated to one use and are almost always used flat out by those groups, perfectly justifiably, since the data they process keeps flowing.</p>
<p>Meanwhile, most people have tasks that can be classified as either small (work on a fast laptop these days) or what we call "embarrassingly parallel", which means they trivially split into lots of small chunks of data that can be independently processed to (e.g.) create models which can then be aggregated (or federated). These work really well in Cloud Computing platforms (AWS, Azure etc).</p>
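<p>As a toy illustration of that pattern (a sketch only - the per-chunk "model" here is just a mean, and the size-weighted, federated-averaging style aggregation is my assumption, not a prescription):</p>
<pre>
from multiprocessing import Pool

import numpy as np

# Embarrassingly parallel: fit each chunk independently (no sharing,
# no coordination), then aggregate the per-chunk models afterwards.
def fit_chunk(chunk):
    # stand-in for real per-chunk training
    return chunk.mean(axis=0), len(chunk)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    chunks = np.array_split(rng.normal(size=(100_000, 8)), 16)

    with Pool() as pool:                      # fan out across cores
        models = pool.map(fit_chunk, chunks)

    # aggregate: size-weighted average of the per-chunk models
    total = sum(n for _, n in models)
    global_model = sum(m * n for m, n in models) / total
    print(global_model)
</pre>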
<p>However, public cloud is a pay-per-use proposition, which is fine for a few short term goes, but not great if you have things that run for a while, or frequently. Or if you are a member of a large community (e.g. UK academics and their friends) who can outright buy and operate their own cloud platforms in house (aka "on prem", short for on premises). This is also true for any data intensive organisation (health, finance etc).</p><p>There are operational costs obviously (but these are already in the price of public pay-per-use clouds) that include energy, real-estate, and staffing at relatively high levels of expertise. However, most universities have got more than one such service in house already. And all are connected to the JANET network (which is about to upgrade to 800Gbps, which continues to be super reliable and just about the fastest operational national network in the world). So they are sharable.</p>
<p>They also often feature state of the art accelerators (GPUs etc) - these are also coordinated nationally in terms of getting remote access as part of collaborating projects, so that sign-on is fairly straightforward to achieve for folks funded from UKRI - see <a href="https://www.ukri.org/councils/epsrc/facilities-and-resources/using-epsrc-facilities-and-resources/apply-for-access-to-high-performance-computing-facilities/">UKRI facilities</a> for current lists etc.</p><p>There are good reasons to continue this federated system of work, including:</p><ul style="text-align: left;"><li>better resource utilisation and better cost aggregation, as well as</li><li>potentially higher availability, and</li><li>lower latency and lower power consumption than nationally centralised systems.</li></ul><ul style="text-align: left;"><li>The other reason that a widely distributed approach is good is that it continues to support teams of people with requisite state of the art computing skills, who are not distanced from their user communities, so understand needs and changing demands much better than a remote, specialised and elite, but narrow facility.</li></ul>
<p>Since a principal use of such facilities is around discovery science, it is unlikely to be successful in that role if based on pre-determined designs with 10-20 year project cycles such as the large scale computational physics community embark on. This is not, however, an either/or proposition. We need both.</p>
<p>But we need the bulk of spending to target the place where most new things will happen, which is within the wider research community.</p><p>We have a track record of nearly 4 decades of having a national comms infrastructure which is pretty much the best in the world - we can quite easily do as well for a compute/storage setup too.</p>jon crowcrofthttp://www.blogger.com/profile/05692091803072506710noreply@blogger.com0tag:blogger.com,1999:blog-19062127.post-44590122449611257362023-07-11T23:34:00.004-07:002023-07-12T00:41:23.578-07:00Why the design principles of the Internet are like Climate Interventions are like bicycle helmet laws<p></p><ol style="text-align: left;"><li>For a long time, people argued about whether the Internet should have reliable, flow controlled link layers. In olden times, physical transmission systems were not as good as today's, so residual errors and multiplexing contention led to all sorts of performance problems. There were certainly models that suggested that, for some regimes of delay/loss, you were better off with a hop-by-hop flow control and retransmission mechanism. As the physical network technologies (access links like WiFi, 4G, Fibre to the home) and switches got faster and more reliable, end-to-end flow control & reliability, plus congestion control, came to look like the better solution (I'm tempted to add security here too!). But here's the key point I want to deliver - if we had built a lot of switches with the additional costs of hop-by-hop mechanisms (e.g. <a href="https://ieeexplore.ieee.org/document/1543917">just one of many</a>), we would have added a lot of latency, and the network would have taken far longer to reach the operating point at which a pure <a href="https://web.mit.edu/saltzer/www/publications/endtoend/endtoend.pdf">end-to-end</a> set of solutions wins - indeed, the sunk cost in deploying and maintaining much more complex switches and NICs would lean against the removal of such tech.</li><li>So how is this like climate? Well, <a href="https://fivetimesfaster.org/">people</a> are now sufficiently worried about global heating, and the failure to slow our emissions to anything approaching the level necessary to prevent even 2C of warming, and worse, that chain-reaction effects may be imminent, that we are now re-visiting arguments for <a href="https://royalsocietypublishing.org/doi/10.1098/rspa.2019.0255">geoengineering</a>, or what I sometimes call re-terraforming the Earth.
One such mechanism involves seeding the upper atmosphere so that it reflects a lot more <i>sunlight</i> than it currently does - an affordable approach exists and could mitigate 1-2C of global heating almost right away. Aside from the direct downsides (for example, you might catastrophically interfere with precipitation, so that things like the Monsoon could move by thousands of kilometers and by months), any such technology would also reduce the effectiveness of actually viable long term solutions like <i>solar</i> power generation. So the short term fix directly messes up the better answer.</li><li>And how on earth can this be like bicycle helmet laws? The arguments for wearing bicycle helmets are good - in the event of an accident, they definitely can save your life, or reduce the risk of serious brain injury. No question there. There is a small amount of plausible evidence that cyclists who wear more visible safety gear attract a slightly higher risk from drivers, who drive closer, based on an (unconscious bias) perception that the cyclist is less likely to do something random. But that's not the main problem. Statistics from countries that make cycling helmets mandatory conclusively show a large scale reduction in the number of people that cycle, and this leads to a reduction in population health, both from reduced opportunities for exercise and from increased pollution from other modes of transport. Some of those people that stop cycling will, in some sense, actually die as a result of the helmet law. So the long term solution is to make cycling safer, and to remove the need for personal, unsafe cars, or their drivers, who are the root cause of the risk. Autonomous vehicles and segregated bike lanes seem like things one should continue to argue for, rather than forcing on people a short term solution that is counter productive (i.e. reduces the inherent, healthy, actual demand for cycling).
</li></ol><div>So there you have it - the Internet Architecture is like Geoengineering and Helmets - as easy as falling off your bike.</div><p></p><p></p>jon crowcrofthttp://www.blogger.com/profile/05692091803072506710noreply@blogger.com0tag:blogger.com,1999:blog-19062127.post-37614707732242903632023-07-11T01:39:00.001-07:002023-07-11T01:39:10.890-07:00AI everyday life skillz<p> This extremely useful report from <a href="https://www.adalovelaceinstitute.org/report/public-attitudes-ai/">Ada Lovelace et al</a> has lists of "AI" stuff that the public actually encounter - it just predates the hysteria about LLMs, so it might change (a bit) if people were re-surveyed (though I doubt it, as this was well constructed, being about lived experience more than hearsay and fiction).</p><p><br /></p><p>nevertheless, it suggests we might want to assess the public's readiness to cope with various new AI tech as it (slowly) deploys....</p><p><br /></p><p>we can look at it through several lenses - the everyday lens includes smart devices (home, phone, health/fitness) and services (cloud/social/media - recommenders etc), and the workplace (better software that reduces slog on boring tasks and integrates things nicely - especially stupid stuff like travel/expense claims, meeting & document org/sharing, fancy tricks to improve virtual meeting experiences etc); then there's state interventions (in the report above, face recog, but what about tax surveillance and the like).</p><p>of course, there's the trivial lens - that of your camera phone :-) enhanced by some clever lightfield tricks etc etc...</p><p><br /></p><p>but if we are thinking longer term (5-50 years), what are the key lessons people should be internalising to reduce future shock?</p><p><br /></p><p>to be honest, I have no idea, and I think climate is far more important than worrying about the LLM taking your job. unless you are a really bad wordsmith.</p>jon crowcrofthttp://www.blogger.com/profile/05692091803072506710noreply@blogger.com0tag:blogger.com,1999:blog-19062127.post-89007320254586225142023-07-10T08:38:00.001-07:002023-07-10T08:38:22.950-07:00Existential threads<p> People who like being in headlines are clutching at straws when they talk about existential threats.</p><p>The latest in a long line of "we're all doomed" was triggered by the hype surrounding a new chatbot, mostly similar to the old chatbot, but with a slightly smoother line of patter. LLMs are not AI, or even AGI; they are giant pattern matchers.</p><p>In order of threats to things, my list is quite short</p><p></p><ul style="text-align: left;"><li>LLMs are a threat to journalists, as they reveal how few journalists actually do their job - that job, therefore, is at risk of being replaced by a script, just like workers in call centers. Threat? tiny. When? Right now.</li><li>Nuclear Fusion Reactors - these actually could save the planet, and the tech is now mere engineering away from being deployable - the main problem is that that engineering is very, very serious - more complex than, say, a 747/Jumbo Jet, which typically has a 20 year lead time. Nevertheless, these are a threat to the fossil fuel industry. Threat: modest. When? 10-20 years off.</li><li>Quantum Computers - these are a threat to some old cryptographic algorithms, for which we already have replacements. However, decoherence and noise are a threat to QC, so these may never happen. Someone clever might solve that, so let's say 5-50 years, or not at all.
Threat: minuscule.</li><li>Climate. catastrophe. already. right now. Threat: total; When: yesterday.</li></ul>So there's my list. AGIs might happen if we survive all the above, or at least 3. You choose.<p></p><p></p>jon crowcrofthttp://www.blogger.com/profile/05692091803072506710noreply@blogger.com0