This is my explanation of what I think XAI and Interpretable AI were and are - yours may differ:-)
XAI was an entire DARPA-funded program (before the current gibberish hit the fan) to take things like convolutional neural nets and devise ways to trace exactly how they worked.
Explainable AI has been somewhat eclipsed by interpretable AI, for the touchy-feely reason that the explanations that came out (e.g. using integrated gradients) were not accessible to lay people, even though they had made big inroads into shedding light inside the old classic "black box" AI. So a lot of stuff we might use in (e.g.) medical imaging is actually amenable to giving not just an output (classification/prediction) but also which features in the input (e.g. X-ray, MRI scan etc.) were the ones that mattered and, indeed, which labelled inputs were the specific instances of priors that led to the weights that led to the output.
It is the latter (interpretable AI) that is, of course, predictably gameable. The former techniques aren't, since they actually tell you how the thing works, and they are attractive for other reasons: they allow for reasoned sparsification of the AI's neural net to increase efficiency without loss of precision, and for improved uncertainty quantification, amongst other things an engineer might value.
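To make the integrated gradients idea mentioned above a bit more concrete, here is a minimal sketch in Python (NumPy only). Everything in it is an assumption for illustration: the toy logistic "model", its weights `w` and `b`, the all-zeros baseline and the function names are hypothetical, not anything from the DARPA program or from a real imaging system. It approximates the path integral of the gradient from a baseline to the input with a midpoint sum, and checks the "completeness" property, i.e. that the per-feature attributions add up to the change in the model's output.

```python
import numpy as np


def model(x, w, b):
    """Toy differentiable classifier (assumed for illustration): sigmoid(w.x + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint-rule approximation of integrated gradients for the toy model."""
    # Interpolation coefficients at the midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight line from the baseline to the actual input.
    path = baseline + alphas[:, None] * (x - baseline)
    # Gradient of the output at each point on the path, then averaged.
    grads = np.stack([model_grad(point, w, b) for point in path])
    avg_grad = grads.mean(axis=0)
    # Attribution per input feature: (input - baseline) * average gradient.
    return (x - baseline) * avg_grad


# Hypothetical weights and input; the third feature has zero weight on purpose.
w = np.array([2.0, -1.0, 0.0, 0.5])
b = -0.5
x = np.array([1.0, 0.5, 3.0, -1.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w, b)

# Completeness: attributions should sum to model(x) - model(baseline).
output_change = model(x, w, b) - model(baseline, w, b)
completeness_gap = abs(attributions.sum() - output_change)

print("attributions:", attributions)
print("completeness gap:", completeness_gap)
```

On a real network you would get the gradients from an autodiff framework rather than by hand, but the bookkeeping is the same: a feature the model ignores (the third one here) gets zero attribution, and the rest share out the change in output between them.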
None of the post-DARPA XAI approaches (at least none that I know of) would scale to any kind of LLM (not even Mistral 7B, which is fairly modest in scale compared to GPT-4 and Gemini), so the chances of getting an actual explanation are close to zero. Given that they would struggle, for similar reasons, to deal with uncertainty quantification, the chances of them giving a reliable interpretation (i.e. narrative counterfactual reasoning) are not great. There are lots of superficial interpreters based around pre- and post-filters and random exploration of the state space via "prompt engineering". I suspect these are as useful as the old Oracle at Delphi ("if you cross that river, a great battle will be won"), but I would enjoy being proven wrong!
For a very good worked example of explained AI, the DeepMind Moorfields retina scan NN work is exemplary. There are lots of others out there, including use of the explanatory value to improve efficiency.