In this session, COPE Council Member Mike Streeter hosts a discussion between Dustin Smith (Hum) and Mohammad Hosseini (postdoctoral researcher and collaborator at the Institute for Artificial Intelligence in Medicine) about the state of Artificial Intelligence (AI) in peer review.
This discussion is one of eleven sessions hosted by COPE during Publication Integrity Week 2023.
Using AI in peer review
The speakers begin by discussing which functions AI is best suited to as a peer review tool. It is unlikely that any current AI can reliably carry out a well-substantiated review; instead, it should be thought of as a tool for improving human-authored reports and for triage tasks. More can be expected of the more advanced Large Language Models, and, generally speaking, most bad practice arises when weaker models are used poorly by humans.
AI models
Critics have recently suggested that using AI for review purposes risks putting confidential information into the public domain. The discussion draws a distinction based on the type of model used: local or self-built models, which do not feed information back into training datasets, carry much lower risks in this respect. The most popular models, such as ChatGPT, should be treated with more caution and may leave users more vulnerable to having their data shared. Users need to be educated about how different models work and how to use their outputs responsibly. Attention is drawn to the need for organisations to review their policies on AI use frequently.
AI disclosure
The speakers give quite different answers on how AI use should be disclosed, ranging from full disclosure by reviewers and publishers to the view that AI use will soon be so pervasive that disclosure will scarcely be relevant. However, users need to remember that AI tries to meet the brief it is given, which can produce ‘hallucinations’ and unreliable responses if prompts are not carefully constructed. Organisations could collect their own logs and best practices to guide their policies on usage and disclosure, and offer more training for reviewers on all aspects of peer review.
Unsuitable tasks for the use of AI
The speakers also discuss tasks for which AI is not suited: it is not well suited to reviewing work in mathematics, for example. More experimentation will be the best way to reveal what works and what does not. Users need to be alert to the ways AI can alter their own perceptions: for example, using AI to generate an initial review can introduce biases into a human reviewer’s subsequent assessment of the research. As a tool to improve language, however, AI can reduce the risk of scholarly research being mis-evaluated and bring great benefits to the community.
Further reading
- A survey of large language models, Weixin Lang et al
- Was ChatGPT set up to fail?, Dustin Smith
Related COPE resources
- AI and authorship tools, COPE position statement, 2023
- Artificial intelligence in decision making, COPE discussion document, 2021