
Where next in peer review? Part 2: COPE commentary

In Part 1 of this Commentary we outlined some of the problems facing peer review, along with possible solutions and tools. In Part 2 we address perhaps the liveliest area of debate on this topic at the moment: the role of artificial intelligence (AI).

Artificial intelligence

Covid-19 saw a surge in rapid, community-led peer review because of the urgent need to understand more about the virus. Unfortunately, a number of the resulting articles later had to be retracted, and this damaged faith in community-driven rapid review. With the rise of AI, however, we seem so far to have avoided a repeat of those reservations.

COPE held its first Forum on AI in November 2019, when the topic was AI in decision making. At that stage it was noted that technology was starting to bring in ‘data-driven solutions’, and that AI could potentially provide guidance to humans or automate a range of steps in the editorial and review process. Even at that point it was clear that while the opportunities were encouraging, it was necessary to exercise caution and to use AI only alongside human decision-making. The ensuing discussion document published by COPE in 2021 stressed, among the potential benefits of AI-assisted automation in publishing, its ability ‘to improve the quality of the automation, lessen the burden on human participants, and increase the speed of the review process, ultimately sharing validated and peer reviewed research outcomes more quickly, and with reduced demands on editors, reviewers, and authors.’ However, it also identified several areas where a ‘cautious’ and ‘people-centred’ approach was advisable. These could be grouped under three headings: accountability (tools should be ‘non-discriminatory and fair’), responsibility (involving human agency and oversight), and transparency (related to technical robustness and data governance).

In March 2023 COPE hosted a second Forum on AI, this time focusing on fake papers. The launch of ChatGPT (then based on GPT-3.5) in November 2022, followed by GPT-4 in March 2023, led to a huge surge in publicity and mainstream usage, and in February 2023 the COPE Officers issued a position statement stating that AI should not be cited as an author in scholarly publications. It was very rapidly becoming clear how much generative AI (tools which can generate new content from a prompt) could offer – but also that these opportunities could be exploited by paper mills and other ‘bad actors’.


It is remarkable, almost exactly four years after that first COPE Forum, how little has changed from the ethical standpoint – but how much has moved on from the technological point of view. ChatGPT is an example of a tool built on a Large Language Model (LLM): a natural language processing system that uses examples of human writing to create content which mimics plausible writing patterns. There are also AI tools which can generate images, music, and computer code, all based on patterns seen in training data. For publishers and research institutions, however, these new capabilities have only complicated the picture, making paper mills, image manipulation, and fabrication ever more sophisticated and ever harder to detect. The accessibility of these tools gives AI the potential to enhance the capabilities of those working in publication ethics, at the same time as posing an existential threat.

It remains important to distinguish AI – which involves the software ‘learning’ from its processes to create capabilities that equal or exceed a human’s – from automation. Automation allows a machine to carry out predetermined tasks with specified outcomes, and is already used routinely in peer review for jobs such as identifying whether all ethical declarations are present, checking that citations are correctly presented, and screening for poor language use. These automated tools can have many benefits, not least in saving editors time and in improving scholarly language in the many cases where language could otherwise be a barrier to access. This study gives some more examples of the benefits of AI. However, we are now at the stage where artificial intelligence can carry out tasks which require creativity and judgement, such as recommending acceptance or rejection, creating reviewer reports, and identifying cases of image manipulation, duplication, and plagiarism. This is where the ethical issues really come to the fore.
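To make the distinction concrete, here is a minimal sketch of the kind of rule-based, predetermined screening that automation already handles, written in Python. The field names, citation style, and thresholds are illustrative assumptions, not any publisher’s actual system, and a real screening pipeline would be considerably more elaborate.

```python
# Minimal sketch of rule-based screening (illustrative assumptions only).
import re

def screen_submission(manuscript: dict) -> list[str]:
    """Run fixed, predetermined checks and return any issues found."""
    issues = []

    # Check that required ethical declarations are present (assumed field names).
    for section in ("ethics_statement", "conflict_of_interest", "funding"):
        if not manuscript.get(section):
            issues.append(f"Missing declaration: {section}")

    # Check that in-text citations follow an expected bracketed-number style.
    body = manuscript.get("body", "")
    if not re.search(r"\[\d+\]", body):
        issues.append("No bracketed numeric citations found")

    # Crude language screen: flag very long sentences for editing.
    sentences = re.split(r"[.!?]+\s+", body)
    too_long = [s for s in sentences if len(s.split()) > 60]
    if too_long:
        issues.append(f"{len(too_long)} sentence(s) exceed 60 words")

    return issues

print(screen_submission({
    "ethics_statement": "Approved by the institutional review board.",
    "conflict_of_interest": "",
    "funding": "None.",
    "body": "Prior work [1] reported similar effects [2].",
}))
# ['Missing declaration: conflict_of_interest']
```

Nothing here ‘learns’: every check is a fixed rule with a fixed outcome, which is exactly what separates automation from the judgement-based tasks described above.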

As might be expected, responses to these innovations have been mixed. The growth of Peer Review Studies as an interdisciplinary field of research has been particularly helpful in assessing their impact. On the positive side, one 2021 study found that an AI model was able to correctly predict human reviewer scores in a large proportion of cases. This should encourage us that AI judgements are not wildly out of line with those of humans. There is also potential for AI to reduce human forms of bias, which have been shown to manifest in favour of authors from prestigious institutions, and against people from countries where English is not the primary language. Machines are better than people at identifying dubious or duplicated plots and images, meaning that AI could have a big impact in identifying problems before publication rather than afterwards.


High on the negative side are concerns about unknown and unquantified biases in AI training data. The danger is that these could lead to discrimination against certain authors or groups of authors, or against research which is innovative and therefore unlike anything in the training data. LLMs can also return different answers to the same question, making AI-led review arguably just as susceptible to inconsistency as humans. There are serious reservations about confidentiality, especially where information fed into an AI tool might make its way back into training data and thus the public domain. One recent study showed that a neural network AI model could successfully identify the authors of a large proportion of submitted papers based simply on the first few hundred words of the submission. This may not in itself be an issue (many human reviewers can at least guess at authors’ identities, after all), but it becomes more sinister if an AI tool starts to discriminate against certain authors. While human reviewers are also not free from bias, the ‘black box’ nature of most AI models means that such outcomes could be hard to predict or definitively identify. Both the US National Institutes of Health and the Australian Research Council have banned grant reviewers from using LLMs like ChatGPT for this reason. It is difficult to see how even specialist tools can be held legally responsible for their output or accountable for confidentiality. Another concern is that such tools may not remain free, increasing inequality rather than levelling it out; they are already unavailable in certain countries.
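To illustrate why the authorship-identification finding is plausible, here is a minimal sketch of how stylistic ‘fingerprinting’ from a submission’s opening words can work in principle. It uses toy, invented texts and hypothetical author labels with a simple off-the-shelf classifier; it is not the neural network model used in the study cited above.

```python
# Toy sketch of author 'fingerprinting' from a paper's opening words
# (invented data and labels; not the cited study's model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Opening passages of earlier papers, labelled by (hypothetical) author.
openings = [
    "We investigate the thermal stability of perovskite films under load.",
    "Thermal degradation pathways in perovskite solar cells remain unclear.",
    "This paper proposes a graph-based model of citation networks.",
    "Citation networks can be represented as directed graphs whose nodes are papers.",
]
authors = ["author_a", "author_a", "author_b", "author_b"]

# Character n-grams capture habits of spelling and phrasing, not just topic words.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(openings, authors)

new_submission = "We examine the thermal behaviour of perovskite films under strain."
print(model.predict([new_submission]))  # most likely ['author_a']
```

Even this crude approach picks up habits of vocabulary and phrasing, which is why the ‘black box’ versions embedded in larger AI systems are so hard to audit for the discrimination risks described above.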

Despite these concerns, AI remains an attractive prospect for editors struggling to find reviewers and facing increasingly sophisticated breaches of publication ethics. Certainly, many Peer Review Studies scholars are positive about the potential for AI tools, as long as they are used in very well-defined ways and alongside human oversight and training. For the most part, the benefits lie in the early screening stages of submission, leaving human experts to assess papers which have already gone through a preliminary assessment of integrity issues. Frontiers presents its Artificial Intelligence Review Assistant (AIRA) in this way: as a tool to empower researchers rather than replace them. AI thus becomes a way to improve the rigour and equity of human review rather than a replacement for it, helping us to understand human decision making and bias rather than simply highlighting its deficiencies. BioMed Central is also trialling a tool called ‘StatReviewer’ at three of its journals, which assesses aspects of statistical reporting.
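As an illustration of what a statistical-reporting check might involve, the sketch below recomputes the two-tailed p-value implied by a reported t-statistic and flags reported values that do not match. This is an assumed, simplified example for illustration, not StatReviewer’s actual method.

```python
# Simplified sketch of a statistical-reporting check (not StatReviewer's method):
# recompute the p-value implied by a reported t-test and flag mismatches.
from scipy import stats

def check_t_test(t_value: float, df: int, reported_p: float,
                 tolerance: float = 0.01) -> str:
    """Compare a reported two-tailed p-value against the value implied by t and df."""
    recomputed_p = 2 * stats.t.sf(abs(t_value), df)
    if abs(recomputed_p - reported_p) > tolerance:
        return (f"Inconsistent: t({df}) = {t_value} implies p = {recomputed_p:.3f}, "
                f"but p = {reported_p} was reported")
    return "Consistent"

print(check_t_test(t_value=2.10, df=28, reported_p=0.001))  # flagged as inconsistent
print(check_t_test(t_value=2.10, df=28, reported_p=0.045))  # consistent
```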


One of the key problems with AI, though, is that it is developing at such speed. Many of us feel left behind, lacking the knowledge we need to form a view of it. When IOP polled the audience at a publishing event in June 2023 on their feelings about the implications of adopting AI in publishing, the average response was 5.25 out of 10 (1 = terrified, 10 = very excited). After the session, however, this had risen to 6.17, implying that some of the apprehension stems from a lack of knowledge, or from feelings of isolation in being an early adopter. Knowledge, as those working in scholarly publishing know so well, is power.

COPE made its own contribution to knowledge-building in a very popular session on AI and peer review at September’s Publication Integrity Week. This featured a discussion between Hum Co-founder and President Dustin Smith and Mohammad Hosseini, a lecturer in research ethics and integrity. While both speakers allowed for uncertainty about the future development of AI, they gave quite definite answers on a number of technical points, illustrating how much knowledge is needed to fully understand the capabilities and threats presented by AI.

On the potential for personal information to make its way back into training data, for example, Dustin Smith (a self-declared ‘AI optimist’) was quite clear. Where LLMs are trained on closed data sets (that is, with no new information added over time), there is little risk of material submitted to them reappearing in later responses. The same is true where users build their own models. The greatest threat to confidentiality is where people use AI tools poorly, without understanding what they are asking them to do. One example is giving a tool very complex or overly specific prompts: this increases the likelihood of ‘hallucinations’ (that is, non-authentic results such as fabricated citations) where the response has been over-fitted to the prompt’s requirements. Both speakers were clear in calling for greater training in peer review generally, but particularly in how to use and interpret AI tools. Another session at Publication Integrity Week showcased some of the tools being developed to detect a wide range of ethical issues and improve the integrity of the published record. It was noteworthy that all of the developers stressed the need to combine the power of their tools with human interpretation and decision-making.


In Part 1 of this Commentary we considered the reliance of peer review on reciprocity and voluntarism. It is worth examining briefly how the rise of AI might affect this, and whether it would be to the benefit or the detriment of the community. Would researchers feel more accountable to an AI reviewer or to a fellow human, for example? While the current system relies on ethical behaviour, a more machine-based future might increase accountability because of the increased chance that unethical actions will be detected. A study in 2019 showed that the use of digital tools for screening was associated with a lower chance that a paper would be retracted after publication, although it is difficult to untangle whether this is because of higher detection rates, or because screening acted as a deterrent to bad behaviour. We should ask ourselves whether this is a better future (higher ethical standards) or a worse one (motivations become instrumental rather than ethical). What would the longer-term future look like? Would standards start to lapse over time? Would ‘bad actors’ develop new ways to cheat AI reviewing tools? Would freeing up reviewers’ time promote more mentoring of younger researchers in ethical behaviours? Does any of this matter, and to whom?

While it seems that we are still reluctant to wholly abandon the trust, esteem, and reciprocity which form the basis of conventional peer review, we are increasingly acknowledging the utility of AI assistance. As ever, users must be transparent about what tools they used, in what ways, and when. The NISO peer review declaration guidelines encourage disclosure of the use of AI or machine learning tools in checking or screening the content of papers. Users should experiment and report on their findings, and they should think about how AI can be incorporated into better training for those involved in peer review, and better systems of awarding credit.

However, in this field, things move fast, and we should be prepared for our terms of reference to change quickly. At COPE’s Publication Integrity Week, Dustin Smith expressed an opinion that transparency over use of AI will soon become redundant because it will be so pervasive: no one reports whether they used a spell checker, after all. If authors can use AI in ways that do not infringe their ability to truthfully claim originality and accountability; if publishers and editors can do the same while still claiming that authors’ work was judged fairly and without bias, then perhaps there is a bright world ahead for a publishing partnership between humans and AI.

Alysa Levene, COPE Operations Manager

