This year will see approximately 10,000 papers published in the top 3 conferences in AI alone.
What does that even mean? How can anyone have an overview of what is happening in AI?
How is their "community" calibrated on what is original, what constitutes rigour, and which papers are significant in terms of potential impact on the discipline?
But that's not what I came here to say; at least, that's just the starting point.
For a couple of years now, we've seen papers "tossed over the fence" to other conferences (I'm using conferences as an example venue, but I am sure journals, the technical press, and bloggers are seeing the same thing).
A paper on AI and Systems (or Networks, or Databases, or pick your own long-established domain) should bring interesting results in those domains - indeed, it is clear that AI brings challenges for all those domains (mainly of scale, but some with characteristics we haven't encountered in precisely this form before). This is not a problem - we (in Systems) welcome a challenge - we really relish it!
But how do we know that the AI part is any good? How do we know it hasn't already been superseded by more recent papers, or shown to be a poor approach, or even whether some paper in the AI community has taken the same AI tech and resolved the systems challenge? How does anyone in the AI community know either?
This is not sustainable for AI, and it is rapidly becoming unsustainable across all of Computer Science. The AI community, driven by a mix of genuine excitement, hype, some ridiculous claims, greed (for academic or commercial fame and wealth), and the simple urge to "join in" the big rush, is polluting the entire landscape of publications. More problematically, it is atomising the community, so that we are rapidly losing coherence, calibration, and confidence about what is important, what is a dead end, and what is just good training for another 30,000 PhD students in the dark arts.
I have no idea how to fix this. Back in the day, at the height of the Internet madness, the top ACM and related conferences had a few hundred submissions and accepted somewhere in the range of 30-100 papers a year. You could attend and meet many of the people doing the work, scan/read or attend most sessions, even get briefings from experts on whole session topics, or have discussions (dare I say even hackathons too).
In that world, we also started to insist quite strongly that papers should be accompanied by code, data, and, ideally, an artefact evaluation by an independent group (an extra Programme Committee) who could do a lot more than just kick the tyres on the system: try out some variations, perhaps with other data, perhaps more adversarial, perhaps with a more thorough sensitivity analysis, and so on.
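To make that concrete, here is a minimal sketch (in Python, with every name invented for illustration - `run_experiment` stands in for whatever entry point the authors ship) of one check an artefact-evaluation committee might script: re-run the claimed result on perturbed copies of the data and see how often the headline number survives.

```python
# Hypothetical artefact-evaluation check: a crude sensitivity analysis that
# re-runs a paper's pipeline on perturbed inputs and reports how often the
# result stays close to the claimed value. All names are placeholders.

import random
import statistics

def run_experiment(data):
    """Placeholder for the authors' pipeline; returns a single headline metric."""
    return statistics.mean(data)

def perturb(data, noise=0.05, rng=random):
    """Add small multiplicative noise to mimic 'slightly different data'."""
    return [x * (1 + rng.uniform(-noise, noise)) for x in data]

def sensitivity_check(data, claimed, tolerance=0.02, trials=20, seed=0):
    """Fraction of perturbed re-runs that land within `tolerance` of the claim."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        result = run_experiment(perturb(data, rng=rng))
        if abs(result - claimed) <= tolerance * abs(claimed):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    data = [random.gauss(1.0, 0.1) for _ in range(1000)]
    claimed = run_experiment(data)
    print(f"fraction of perturbed runs within tolerance: "
          f"{sensitivity_check(data, claimed):.2f}")
```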
Imagine if the top 3 AI conferences did require artefact evaluation for all submissions - that's probably in the region of 40,000 papers in 2024. But imagine how many fewer papers would be submitted because the authors would know they'd not really have a chance of passing that extra barrier to entry (or would, at best, be relegated to a lower tier of the conference).
And while using AI to do reviewing is a really bad idea (since that doesn't help train or calibrate the human community at all), AI-assisted artefact evaluation might be entirely reasonable.
So, like the old Netflix recommender challenge, an AI Artefact Evaluation challenge could help.
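To gesture at how such a challenge might be scored (purely a hypothetical sketch, nothing here is an existing benchmark): entrants' automated evaluators produce pass/fail verdicts on artefacts, and are ranked by agreement with held-out decisions from a human artefact-evaluation committee, in the same spirit as the Netflix prize's held-out ratings.

```python
# Toy scoring rule for a hypothetical "AI Artefact Evaluation challenge":
# rank each automated evaluator by how often its verdicts match the human
# committee's held-out decisions. The data below is invented.

def score_entry(predicted, human_verdicts):
    """Fraction of held-out artefacts where the automated verdict matches
    the human artefact-evaluation committee's decision."""
    assert len(predicted) == len(human_verdicts)
    matches = sum(p == h for p, h in zip(predicted, human_verdicts))
    return matches / len(human_verdicts)

if __name__ == "__main__":
    # 1 = artefact passed evaluation, 0 = failed (toy held-out labels)
    human = [1, 0, 1, 1, 0, 1, 0, 0]
    entry = [1, 0, 1, 0, 0, 1, 1, 0]   # one team's automated verdicts
    print(f"agreement with human committee: {score_entry(entry, human):.2f}")
```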
Maybe they're already doing it, but who has the time to find out, or to know how well it is working, across those 10^4 wafer-thin contributions to something that can no longer really claim to be Human Knowledge.