Plenty has been (and can be) said about networks (& systems) for AI, but about AI for networks, not so much.
The recent hype (dare one say regulatory capture plan?) from various organisations around generative AI [SD], and in particular LLMs, has not helped. LLMs are few-shot learners that make use of the attention mechanism to create what some have called a slightly better predictive text engine. Fed a (suitably "engineered") prompt, they match it against an extensive database of training data, and emit remarkably coherent, and frequently cogent, text at length. The most famous LLMs (e.g. ChatGPT) were trained largely on the Common Crawl, which is pretty much all the publicly linked data on the Internet. Of course, just because content is in the Common Crawl doesn't mean it isn't covered by IP (Intellectual Property - patents, copyrights, trademarks etc) or indeed isn't actually private data (e.g. covered by GDPR), which causes problems for LLMs.
Also, the initial models were very large (hundreds of billions of parameters), which means most of the tools & techniques for XAI (eXplainable AI) won't scale, so we have no plausible reason to believe their outputs, or any way to interpret why they are wrong when they err. Together, these give legal, technical and political reasons why they are hard to sustain. Indeed, liability, responsibility and resilience are all at risk.
But why would we even think of using them in networking?
What AI tools make sense in networking?
Well, we've used machine learning for as long as comms has existed - for example, a receiver that has learned the statistics of the signal & noise often uses Maximum Likelihood Estimation to pick the transmitted data that best matches what was received.
This comes out of information theory and basic probability and statistics.
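To make the maximum likelihood idea concrete, here is a minimal sketch, assuming a BPSK-style mapping (bit 0 sent as -1.0, bit 1 as +1.0) and additive Gaussian noise. Neither assumption comes from the text above, and the function names are made up for illustration. Under Gaussian noise, the most likely transmitted symbol is simply the nearest one.

```python
import random

def ml_detect(received, constellation):
    """Return the candidate symbol that maximises the Gaussian likelihood.

    With additive Gaussian noise the likelihood of symbol s given sample r
    is proportional to exp(-(r - s)**2 / (2 * sigma**2)), so maximising the
    likelihood is the same as minimising the distance |r - s|.
    """
    return min(constellation, key=lambda s: abs(received - s))

def simulate(bits, noise_std, rng):
    """Send bits over a noisy channel and decode each sample with ml_detect."""
    # Assumed mapping for illustration only: symbol -> bit.
    symbol_to_bit = {-1.0: 0, 1.0: 1}
    decoded = []
    for bit in bits:
        sent = 1.0 if bit else -1.0
        received = sent + rng.gauss(0.0, noise_std)  # channel adds noise
        decoded.append(symbol_to_bit[ml_detect(received, symbol_to_bit)])
    return decoded

if __name__ == "__main__":
    rng = random.Random(42)
    bits = [rng.randint(0, 1) for _ in range(20)]
    decoded = simulate(bits, noise_std=0.5, rng=rng)
    errors = sum(a != b for a, b in zip(bits, decoded))
    print("bit errors:", errors, "of", len(bits))
```

Real receivers estimate the noise statistics and work with richer constellations and codes, but the decision rule has the same shape.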
Of course, there are a slew of simple machine learning tools like linear regression, random forests and so on, that are also good for analysing statistics (e.g. performance, fault logs etc).
But also traffic engineering has profited from basic ideas of optimisation - TCP congestion control can be viewed as distributed optimisation (basically Stochastic Gradient Descent) coordinated by feedback signals. But more classical traffic engineering can be carried out a lot more efficiently than simply using ILP formulations on edge weights for link state routing, or indeed, load balancers.
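The "distributed optimisation coordinated by feedback" view can be sketched with the classic additive-increase/multiplicative-decrease (AIMD) rule. This is a toy, assuming a single bottleneck link and perfectly synchronised congestion signals (both simplifications, as are the parameter names). It is AIMD rather than a literal SGD implementation, but it shows flows converging on a fair share using nothing except a shared feedback signal.

```python
def aimd(rates, capacity, alpha=1.0, beta=0.5, rounds=500):
    """Simulate flows sharing one bottleneck link under AIMD.

    rates    -- initial sending rate of each flow
    capacity -- bottleneck capacity (same units as the rates)
    alpha    -- additive increase applied when there is no congestion
    beta     -- multiplicative decrease factor applied on congestion
    """
    rates = list(rates)
    for _ in range(rounds):
        if sum(rates) > capacity:
            # Congestion feedback: every flow backs off multiplicatively.
            rates = [r * beta for r in rates]
        else:
            # No congestion: every flow probes for bandwidth additively.
            rates = [r + alpha for r in rates]
    return rates

if __name__ == "__main__":
    # Two flows with very unequal starting rates end up nearly equal.
    print(aimd([1.0, 80.0], capacity=100.0))
```

Each multiplicative decrease halves the gap between the flows while the additive increase leaves it unchanged, which is why the rates equalise without any flow knowing about the others.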
Neural Networks can be applied to learning these directly, based on the past history of traffic assignments. Such neural nets may be relatively small, and so explainable via SHAP or Integrated Gradients.
These are useful for describing/predicting traffic, but perhaps even more exciting are Neural Processes, which combine stochastic functions and neural networks, are fast/scalable, and are already being used in climate modelling, so perhaps in communications networks now? Related to this is Bayesian optimisation.
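As a rough illustration of why small models are tractable to explain, here is a minimal sketch of Integrated Gradients on a one-layer logistic model. The model, its weights and the "traffic feature" interpretation are all hypothetical, and the path integral is approximated with a midpoint Riemann sum from an all-zero baseline. A real deployment would more likely use an existing library than hand-rolled code like this.

```python
import math

def overload_probability(x, w, b):
    """Hypothetical tiny model: sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x, w, b):
    """Gradient of the model output with respect to each input feature."""
    p = overload_probability(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients attributions for input x.

    Averages the gradient along the straight line from baseline to x
    (midpoint rule) and scales by (x_i - baseline_i) per feature.
    """
    attributions = [0.0] * len(x)
    for k in range(steps):
        a = (k + 0.5) / steps
        point = [bi + a * (xi - bi) for xi, bi in zip(x, baseline)]
        g = gradient(point, w, b)
        for i in range(len(x)):
            attributions[i] += g[i] * (x[i] - baseline[i]) / steps
    return attributions

if __name__ == "__main__":
    w, b = [2.0, -1.0, 0.5], -0.5          # made-up weights
    x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    print(integrated_gradients(x, baseline, w, b))
```

A handy sanity check is the completeness property: the attributions should sum to (approximately) the difference between the model output at the input and at the baseline.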
Causal inference (even via probabilistic programming) can be used for fault diagnosis, and has the fine property that it is explainable, and even reveals latent variables (and confounders) that the users didn't think of - this is very handy for large, complicated systems (e.g. cellular phone data services) and has been demonstrated in the real world too.
Genetic Programming (GP) can also be applied to protocol generation, and has been - depending on the core language design, this can be quite successful. Generally, coupled with some sort of symbolic AI, you can even reason about the code that you get.
Of course, we'd like networks to run unattended, and we'd like our data to stay private, so this suggests unsupervised learning. And where there is some goal in mind, reinforcement learning in particular seems like a useful tool for the things that are being optimised.
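For a flavour of goal-driven learning without labelled data, here is a minimal sketch of about the simplest reinforcement learning setup there is: an epsilon-greedy bandit choosing between paths, rewarded with the negative of the observed delay. The paths, their mean delays and the noise model are invented for illustration. Real network control would have state and delayed effects that a bandit ignores.

```python
import random

def epsilon_greedy_paths(mean_delays_ms, steps=5000, epsilon=0.1, seed=0):
    """Learn which path has the lowest delay by trial and error.

    mean_delays_ms -- hypothetical true mean delay of each path (hidden
                      from the learner, used only to simulate measurements)
    Returns (counts, estimates): how often each path was chosen and the
    learned mean reward (negative delay) for each path.
    """
    rng = random.Random(seed)
    n = len(mean_delays_ms)
    counts = [0] * n
    estimates = [0.0] * n
    for _ in range(steps):
        if 0 in counts or rng.random() < epsilon:
            path = rng.randrange(n)  # explore: pick a path at random
        else:
            path = max(range(n), key=lambda i: estimates[i])  # exploit
        # Simulated measurement: noisy delay around the path's true mean.
        delay = rng.gauss(mean_delays_ms[path], 5.0)
        reward = -delay
        counts[path] += 1
        # Incremental update of the running mean reward for this path.
        estimates[path] += (reward - estimates[path]) / counts[path]
    return counts, estimates

if __name__ == "__main__":
    counts, estimates = epsilon_greedy_paths([50.0, 20.0, 80.0])
    print("times each path was chosen:", counts)
```

The learner needs no training set and no operator in the loop, only a measurable goal, which is what makes this family of methods attractive for unattended operation.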
So where would that leave the aforementioned LLMs?
Just about the only area where I can see they might apply is where there's a human in the loop - e.g. manual configuration - one could envisage simplifying the whole business of operational tools (CLI) via an LLM. But why use a "Large" language model? There are plenty of Domain Specific (small) models trained only on relevant data - these have shown great accuracy in areas like law (patents, contracts etc) and user support (chatbots for interacting with your bank, insurance, travel agent etc). But these don't operate at the scale of LLMs, nor are they typically few-shot learners, nor do they typically use the attention mechanism. They are just good old fashioned NLP. And like any decent language (model) they are interpretable too.
Footnote SD: we're not going to discuss Stable Diffusion technologies here - tools such as Midjourney and the like are quite different, though often use text prompts to seed/boot the image generation process, so are not unconnected with LLMs.