Systematic manipulation of the publishing process via “paper mills”
Increasingly, publishers across the research publishing landscape are seeing large-scale manipulation of the publication process. The production of fraudulent papers at scale via alleged ‘paper mills’ is one such manipulation.
Paper mills are profit-oriented, unofficial and potentially illegal organisations that produce and sell fraudulent manuscripts that resemble genuine research. They may also handle the administration of submitting the article to journals for review, and sell authorship to researchers once the article is accepted for publication. Manuscripts produced by a paper mill are more readily detected at scale, as they may be similar in layout and experimental approach, and may contain similar images or figures.
What are the issues associated with paper mills?
Publishers and editors are being alerted to large numbers of potentially problematic published articles, all containing possibly problematic research images (eg, western blots and flow cytometry data), which may indicate the involvement of alleged paper mills.
Paper mills are difficult to spot until there is a body of published papers that can be compared, often across publishers.
Asking the authors for raw data is a logical first step, but collecting and checking the data is not straightforward, especially if the data files require specialist software. Moreover, the raw data may not seem obviously problematic when assessed in isolation on a case-by-case basis.
Detecting the use of ‘stock images’ is difficult because they show no obvious manipulation per se and supporting data can be provided; the issue only becomes apparent across multiple published papers.
Correspondence can be difficult because it is unclear whether the journal is communicating with the real authors or with paper mill agents.
Approaching institutions can be difficult when it is not clear whom to approach, or when there is no response.
Questions for discussion
While publishers and editors can handle individual papers on a case-by-case basis, how do we tackle the bigger issues at play?
What can publishers do to improve the screening processes to detect these problems earlier?
Are there key declarations that researchers could be asked to make?
Even for journals which do not have a mandatory data sharing policy, should authors be asked to store their original data/images in open or institutional repositories so that data are at least available on request?
Can COPE play an active role?
This topic was discussed at the start of the COPE Forum on Friday 4 September 2020, with guest speaker Elisabeth Bik (Microbiome and Science Integrity Consultant).
Elisabeth Bik is a Dutch microbiologist living in California who worked for 15 years at Stanford University and two years in industry before becoming a science integrity volunteer and consultant in 2019. Her work with her team in scanning the biomedical literature for image and other data concerns has been instrumental in detecting paper mill activity.
Paper mills are becoming a real problem. There are likely thousands of these papers, and our team has reported on only a couple of hundred so far.
They are generated by a commercial entity—a company that might be associated with a real laboratory.
Articles are sold to people who need a paper for their resume or promotion. These papers usually contain fabricated data, although they might include real photos of cells or tissues or other microscopy photos. The companies are difficult to find online and they are usually run under the guise of language editing services for non-native English speakers. Behind the scenes, they might offer more services, such as selling papers to authors who need them for their promotion or to finish medical school, for example.
This is a particular problem for doctors at Chinese hospitals because they have a requirement to publish a paper to finish medical school or to get a promotion, which will mean they will earn more money. Usually these doctors are underpaid, and to support their families they need the promotion and hence they need to publish one scientific paper, even though they might not be interested in doing research. Despite this requirement, they are not given time off from their busy clinic schedule to do research, and they usually do not work at a hospital with research facilities. Yet they are expected to write a research paper. So, when companies offer to give them authorship on a paper in return for money, they usually see this as a small investment to make a living.
We have worked on identifying these papers by screening many of them. They are hard to recognise because they look like real papers with no clear red flag. But if you look at the broader dataset, certain patterns start to emerge: the papers are often about non-coding RNAs (microRNAs or circular RNAs); they are often in the cancer field; they often have authors with private or non-institutional email addresses; and they usually have a particular set of photos and types of experiments.
Journals can ask the authors for the raw data but often the raw data are also manipulated. Or the raw data might be, for example, a flow cytometry file, which is hard to recognise and difficult to read if you are not a flow cytometry expert.
One solution is for journals to include a checkbox in the manuscript submission process for authors to confirm that they generated all of their data in their own laboratories. This may not prevent paper mills, but it might make it harder for authors to point to other companies that generated the data when such questions are raised later.
There are many red flags: papers from Chinese hospitals, because of their requirement to have a research paper published; papers about small non-coding RNAs and cancer; papers containing very particular types of experiments, such as western blots, flow cytometry and microscopy; papers seeming to follow a particular title structure; and non-institutional email addresses, such as an address that does not appear to be linked to any of the authors’ names or one that ends in ‘123’. Individually, these factors may not be problematic, but taken together they raise concerns and could be part of a pattern.
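The email-address heuristics described above can be sketched in code. The following is a minimal illustration in Python: the allow-list of institutional domain fragments, the numeric-suffix rule and the name check are all illustrative assumptions, not any journal's actual screening system.

```python
# Hypothetical sketch of the email red-flag heuristics discussed above.
# The domain fragments, regex and matching rules are illustrative assumptions.
import re

INSTITUTIONAL_HINTS = (".edu", ".ac.", ".gov")  # assumed allow-list fragments

def email_red_flags(email, author_surnames):
    """Return a list of heuristic concerns for one corresponding-author email."""
    flags = []
    local, _, domain = email.lower().partition("@")
    if not any(hint in domain for hint in INSTITUTIONAL_HINTS):
        flags.append("non-institutional domain")
    if re.search(r"\d{2,}$", local):  # eg, addresses ending in '123'
        flags.append("numeric suffix")
    if not any(name.lower() in local for name in author_surnames):
        flags.append("no author name in address")
    return flags
```

As the discussion stresses, no single flag is conclusive; a scoring approach like this would only ever prioritise manuscripts for human review.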
Comments from the Forum, Friday 4 September 2020
Note: Comments do not imply formal COPE advice, or consensus
- In our article submission system, we see strategic changes to the authorship list in papers that we suspect might come from paper mills. It seems that they know the journal is going to ask them to confirm authorship changes, and they make sure that enough of the original authors remain, at each stage, to make it seem legitimate. But sometimes the final paper has a completely new authorship and affiliation list. We have had examples where the corresponding author’s name and affiliation have changed but the email address is the same. If we contact the authors, we are not sure whether we are talking to the agents. The institutions are not responding to us.
- If there are extensive changes to the authorship list, we ask authors to provide institutional input. We ask for an email to be sent directly from the institute to the journal.
- Using ORCID iDs and insisting on them to confirm identity can help. But people can create a new ORCID iD if necessary, and it does not deter them from submitting these papers.
- We are now making a populated ORCID iD or an institutional email address mandatory at our journal.
- The disadvantage of mandatory institutional addresses is that it could prevent people between jobs, postdoctoral researchers and retired researchers from submitting papers to journals.
- How do paper mills differ from authorship by a medical communications company? Both are ghost authorship.
- Our system triggers an email to all co-authors of a paper stating that they need to confirm authorship and send a contributor statement.
- Is it possible to make it mandatory for Chinese authors to provide an institutional email address? Can we insist that researchers from institutions in China have an institutional email address when they submit papers to international journals? Collectively, can we work with Chinese institutions, explaining that if there is pressure on researchers to publish in international journals, then those institutions need the infrastructure to provide legitimate, verifiable email addresses so that journals can work with them?
- Could institutional review board approval be a point of reference to contact the institution and get reassurance that the studies were performed at the institutions?
- Paper mill papers are often characterised by superficial justifications for carrying out their analyses, which are usually around genes and gene function. They may combine a gene, sometimes multiple genes, with a human disease, which is often a form of cancer. The link between the gene and the disease is often superficial. The discussion sections are also superficial. One red flag is incorrect nucleotide sequences and often the reagents described in these papers are wrong.
- Many of these papers have no funding. Also, in the acknowledgement section, they note that somebody collated the data but some do not mention who performed the experiments.
- We often become aware of these papers when we are looking into potential problems with duplicate submission. Authors will request withdrawal or will not be contactable if we find they have published elsewhere.
- Paper mill activity has been largely in the life sciences portfolio, but some has been found in engineering journals. We have seen paper mills in other disciplines such as computer science, engineering, humanities and social sciences. In civil engineering, we discovered that some of the authors were medical doctors who did not appear to hold any degree in civil engineering, and their institutes had no departments of civil engineering.
- ScholarOne offers a tool called the “unusual activity detector”, an algorithm that picks up submissions from the same computer. This is not necessarily a sign of a paper mill submission, but if there are other red flags it can be useful. Some additional features have been added: the tool was originally designed to protect against reviewer fraud, and a report would only be issued after peer review, but a new setting (low, medium, high) can now raise an alert if there is a high rate of submission from the same device. Would it be useful to approach Clarivate for more details about these reports? Is it possible to see where the IP addresses come from and whether they can be traced back to a paper mill?
- We have been alerted by PubPeer reports to problems with figures, such as western blots. We then ask the authors for the underlying data for an internal review. The data we receive are almost always incomplete and do not completely match the types of experiments described in the methods. Another issue is suspiciously narrow SD values in graphs, given the experimental description; hence the problems can go beyond clever image manipulation. We are fortunate to have scientists on our publishing team who can understand the data quickly and present a summary and proposed action to the editor in chief. This has helped us considerably, and the editor in chief is not overwhelmed by the situation.
- How can comparisons be done systematically across different articles in different journals?
- If a submission looks suspicious, it may be worth looking into the metadata. Larger journals might notice 10 or 20 manuscripts that seem very similar in structure and title, submitted on the same day or within a few days. The document properties might show that the revision number of the manuscript is very high. Also, the initials of the last person who made changes to the document may not match any of the authors. These issues do not confirm that an article came from a paper mill, but they could raise suspicions that warrant a closer look at the authors, their institutes, and whether it is likely that the study came from that institute.
- If we discover that we have published a paper mill article, we follow the COPE flowchart on “What to do if you suspect fabricated data in a published manuscript”. This is challenging as we are often not able to get in touch with the institution. We struggle with whether to issue an editorial note of concern or retract the paper.
- Can we envisage approaches to retract published paper mill papers in batches?
- Without an admission from the authors, is it ever possible to prove a paper came from a paper mill?
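One comment above describes checks on document properties: a very high revision count, or last-editor initials that match no listed author. As a hedged illustration only, such a heuristic might look like the following Python sketch. The revision threshold and the initials-parsing rules are assumptions, and in practice the fields would be read from a .docx file with a library such as python-docx; here they are passed in directly so the heuristic is self-contained.

```python
# Illustrative sketch of the document-properties checks mentioned in the Forum.
# The threshold and name-parsing rules are assumptions, not an established tool.

def metadata_concerns(revision, last_modified_by, authors, revision_limit=50):
    """Return heuristic concerns from a manuscript's document properties."""
    concerns = []
    if revision >= revision_limit:  # threshold is an illustrative assumption
        concerns.append(f"unusually high revision count ({revision})")
    # Compare the last editor's initials against each listed author's initials.
    author_initials = {"".join(part[0] for part in a.split()).upper() for a in authors}
    editor_initials = "".join(part[0] for part in last_modified_by.split()).upper()
    if editor_initials and editor_initials not in author_initials:
        concerns.append(f"last editor '{last_modified_by}' matches no listed author")
    return concerns
```

As with the email heuristics, a non-empty result would only justify a closer manual look, never an accusation on its own.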
There are more comments on paper mills posted on our website before the Forum discussion.
Following this discussion, COPE will look at revising our guidelines on systematic manipulation of the publication process.
Please add your comments to the discussion below.