
Guest editorial: the challenge of AI chatbots for journal editors

One of us (RW) published an editorial in his journal on the use of ChatGPT in writing academic manuscripts. ChatGPT (a chatbot developed by OpenAI) was included in the author list in the interests of transparency. The journal’s publisher (Elsevier) does not allow AI to be named as an author, and any submissions in which it is included are screened out at the online submission stage. Editorials, however, are handled by the Editor-in-Chief, so this issue was missed. In due course a corrigendum was issued, but not before the erroneous authorship had been picked up by Nature News and the issue went viral. It seems that everyone from the mainstream media to sceptical websites is taking an interest in the potential for AI to help write academic manuscripts. The episode highlights challenges for universities and other educational establishments, for the publishing industry generally and, specifically, for the academic publishing industry.

The issue for academic publishing, and thereby for journal editors, is quite simple: what do we do about the potential use of AI in writing academic manuscripts? The options range from embracing it and ignoring many of the attendant problems (which is most unlikely) to outlawing it (which is surely impossible). After all, since the problems have only relatively recently been highlighted, how can we be sure that we have not already, undetected, published swathes of articles where AI was used to assist the author? One specific problem that arises is plagiarism and how to detect it, since the chatbots are trained on already published material.

Although the first plagiarism detection software applications were developed almost twenty years ago, it took another five to ten years for most publishing companies to adopt them widely for pre-checking submitted scientific papers. In the case of generative AI-based software, which produces free text on demand, we may not have that much time to introduce solutions that would prevent, or more realistically merely reduce, the number of computer-generated papers entering the reviewing process. Early experiments with existing solutions such as Turnitin show that they are extremely ineffective at detecting AI-generated text.

In recent weeks, a tool called GPTZero has attracted a great deal of attention in both social and mainstream media. GPTZero promises to detect AI-generated text by analysing two features referred to as perplexity and burstiness. Perplexity measures how predictable a text is to a language model, that is, how similar it is to the text the model has seen before; burstiness measures sentence-to-sentence variability. Both are generally lower in AI-generated text than in text written by a human. Even if the extremely low false-positive rate claimed by GPTZero’s author is accurate, its use might still be controversial. Proofreading software poses a related problem: modern proofreading tools (e.g. Grammarly, PerfectIt, ProWritingAid) steer prose towards standard phrasing and uniform sentence structure, which tends to lower both the perplexity and the burstiness of the text. This presents an additional challenge for some authors, whose writing may be detected and flagged as AI-generated. It applies especially to authors in non-native English-speaking countries, where language editing software is widely used and helps writers at least come close to the fluency of their colleagues in English-speaking countries.
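For readers unfamiliar with these metrics, perplexity has a standard definition in language modelling, given below; the burstiness expression is only one plausible proxy (GPTZero’s exact formula has not been published), based on how much perplexity varies from sentence to sentence. For a text of words $w_1, \dots, w_N$ scored by a language model $p$,

\[
\mathrm{PPL}(w_{1:N}) \;=\; \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(w_i \mid w_{1:i-1}\right)\right),
\]

so lower perplexity means the model finds the text more predictable. A simple burstiness proxy is the spread of per-sentence perplexity across the sentences $s_1, \dots, s_K$ of a document:

\[
\mathrm{burstiness} \;\approx\; \mathrm{sd}\!\left(\mathrm{PPL}(s_1), \dots, \mathrm{PPL}(s_K)\right),
\]

with low values of both measures taken as a signal of machine-generated text.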

The publishers of academic journals now routinely run manuscripts through similarity-detection software such as iThenticate and, in the early days of its use, percentages of acceptable similarity with other sources were set and adhered to. However, it soon transpired that some human judgement was still required. For example, a 5000-word manuscript that is only 10% similar to another source may have a single 500-word passage taken directly, and in its entirety, from elsewhere. Conversely, a manuscript that is 50% similar to other sources may have small and unavoidable similarities scattered throughout. The software will highlight the similarity but not the intention of the author. It strikes us that, if suitable software is developed to detect the use of chatbots, we are unlikely to be able to rely solely on the raw results. Human judgement will still be required.
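To make that point concrete, here is a toy sketch in Python; the helper function and the numbers are hypothetical and do not reflect iThenticate’s actual logic:

def similarity_report(total_words, match_lengths):
    """Summarise flagged passages; match_lengths holds the word count of each match."""
    matched = sum(match_lengths)
    return {
        "percent_similar": round(100 * matched / total_words, 1),
        "flagged_passages": len(match_lengths),
        "longest_match": max(match_lengths),
    }

# One contiguous 500-word passage lifted verbatim: only 10% similar,
# yet almost certainly problematic.
print(similarity_report(5000, [500]))
# -> {'percent_similar': 10.0, 'flagged_passages': 1, 'longest_match': 500}

# 250 scattered ten-word matches (common phrases, boilerplate methods wording):
# 50% similar, yet possibly entirely innocent.
print(similarity_report(5000, [10] * 250))
# -> {'percent_similar': 50.0, 'flagged_passages': 250, 'longest_match': 10}

The raw percentage cannot distinguish the two cases; the pattern of matches, which takes human judgement to interpret, can.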

Chatbots will not go away and, surely, further AI tools will be developed. Google is already engaged in this process. Academic publishers and editors, with the help of bodies such as COPE, need to start a big conversation about how we are going to react to the use of chatbots…SOON!

Roger Watson, Editor-in-Chief, Nurse Education in Practice

Gregor Štiglic, Associate Editor, AI in Medicine

Conflict of interest: the authors declare that they have no competing interests or affiliations that have not been declared either in the by-line or in the content of the editorial.
