How to Assess Peer Review Quality


Background: Reliable and validated quality assessment of peer review reports is among the most difficult aspects of peer review scholarship. Our Summary of Scholarly Approaches to Peer Review Quality Assessment can be found at the bottom of this page. 

First, however, we will showcase EASE’s Four Considerations for Editors Regarding Peer Review Quality Assessment. Note that any quality assessment process should begin with a thorough deliberation of what quality is, followed by detailed and clear guidance on quality for authors and reviewers. For example, quality can relate to an individual peer review report (e.g., its tone, clarity, timeliness, thoroughness, constructive feedback, absence of bias, degree of manuscript improvement, etc.), to the overall quality of all reviewer reports or comments received for the same manuscript, or to the review process itself (e.g., timeliness, diversity of reviewer background and expertise, transparency, etc.). We encourage you to consider which aspects of quality are important to you and what approaches could be used to measure them. This guide focuses on assessing the quality of a single review report.

While not all manuscript submission systems allow easy implementation of the recommendations below, much can still be achieved using survey platforms and emails to authors and reviewers. We also strongly recommend transparency, i.e., informing reviewers that their review reports will be assessed and sharing that assessment with them. Please be mindful that, just as with reviewing manuscripts, reviewing or assessing review reports can elicit strong emotions. To avoid retaliation or other unprofessional behaviours, editorial policies should be established, made visible to the community, and updated regularly. Editors should also consider sharing anonymised assessment reports. For example, once a reviewer has conducted 5 or 10 reviews for a journal, the journal could share the average or median score of their assessments, accompanied by an appropriate measure of dispersion (standard deviation or interquartile range). Additional input, such as suggestions for improving the quality of the reviews, thank-you letters, or certificates, should ideally accompany those reports.
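A minimal sketch of how such an aggregate could be computed, assuming assessment scores are collected on a 1–5 scale (the function name and rounding choices are illustrative, not a prescribed method):

```python
from statistics import mean, median, stdev, quantiles

def summarise_review_scores(scores):
    """Summarise the assessment scores (1-5) a reviewer has received.

    Intended for sharing aggregate feedback once a reviewer has
    accumulated enough reviews (e.g., 5 or 10) for a summary to be
    meaningful. Returns the mean, median, standard deviation, and
    interquartile range.
    """
    q1, _, q3 = quantiles(scores, n=4)  # quartile cut points
    return {
        "n_reviews": len(scores),
        "mean": round(mean(scores), 2),
        "median": median(scores),
        "sd": round(stdev(scores), 2),
        "iqr": round(q3 - q1, 2),
    }

# Example: ratings a reviewer received for their last ten reports
print(summarise_review_scores([4, 5, 3, 4, 4, 5, 2, 4, 3, 4]))
```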

EASE’s Four Considerations for Editors Regarding Peer Review Quality Assessment

1. Consider asking authors to rate the review reports they receive.

Detailed assessment of review reports can be time-consuming, and no validated, easy-to-use methods are currently available. An easy approach might be to ask authors a simple question: “How would you rate the quality of this review report?” (Note: This question can be designed using a 5-point Likert-type scale for answers, e.g., from 1 (poor) to 5 (excellent), or using a five-star rating system: ☆☆☆☆☆.) Additionally, editors should consider including an open-ended question: “Please leave any feedback about the review received or suggestions for its improvement.”

While simple questions such as these can be easy to implement and require no specific training for authors, editors should also consider more specific questions, e.g., “Have the editor and the reviewers provided constructive feedback for the manuscript’s improvement?”, or questions tailored to the structured peer review format the journal employs or to the recommendations listed in its reviewer guidelines. The most challenging aspect of rating review reports can be deciding how much a certain omission influences the rating. For example, if one of the reviewers missed a crucial error in the methods or results of the manuscript, but all other aspects of the report were excellent, how much should this omission influence the overall score? Editors should, therefore, consider whether they want to provide any guidance on this to the authors. Additional considerations are covered in the Summary of Scholarly Approaches to Peer Review Quality Assessment section below. Examples of sharing review assessment scores or satisfaction surveys can be found on the following pages: Frontiers; Global State of Peer Review.

Note: Previous research has indicated that authors’ satisfaction with peer review is influenced by the (final) recommendation (e.g., accept, revise, reject) for the manuscript. To avoid such potential biases, editors should consider not asking for or disclosing reviewers’ recommendations to authors, and asking for a separate quality evaluation of the editor’s comments or decision. If asking authors other questions, e.g., regarding the overall handling of the manuscript, journal responsiveness, or timeliness, we recommend including all questions together, to avoid sending multiple emails to authors and to increase response rates.
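A minimal sketch of what such a combined author questionnaire could look like, assuming a 5-point scale (the field names and structure are illustrative; the question wording is taken from the suggestions above):

```python
# Illustrative survey definition combining the suggested author questions
# into a single questionnaire; "likert_1_5" means 1 (poor) to 5 (excellent).
AUTHOR_SURVEY = [
    {"id": "review_quality", "type": "likert_1_5",
     "text": "How would you rate the quality of this review report?"},
    {"id": "editor_quality", "type": "likert_1_5",
     "text": "How would you rate the quality of the editor's comments or decision?"},
    {"id": "constructive", "type": "yes_no",
     "text": ("Have the editor and the reviewers provided constructive "
              "feedback for the manuscript's improvement?")},
    {"id": "open_feedback", "type": "free_text",
     "text": ("Please leave any feedback about the review received "
              "or suggestions for its improvement.")},
]
```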

2. Consider asking reviewers to self-rate the review reports they submit.

Editors should consider asking reviewers to submit a self-evaluation of their review reports. Possible questions might include:

  • “How would you self-rate the quality of your submitted review report?” 
  • “How confident are you in your review of this paper?”
  • “How would you rate your expertise in evaluating this manuscript in its entirety (e.g., including its statistics, data, or code)?”

(Note: These questions can be designed using a 5-point Likert-type scale for answers, e.g., from 1 (poor) to 5 (excellent), or using a five-star rating system: ☆☆☆☆☆.) Such simple questions can be added alongside the structured peer review questions your journal might employ, or together with other questions you might ask of reviewers, such as:

  • “Should this manuscript be additionally reviewed by a statistician or a language editor?”
  • “Were there any aspects of the manuscript you were not able to assess, or that should be reviewed by additional experts?”

When a decision on a manuscript is made, journals should share the decision, along with all review reports received, with all the reviewers. In these decision letters, editors should consider asking reviewers to (re-)rate their own review reports, and perhaps even to rate the reports of the other reviewers. Potential questions might include:

  • “Having seen the review reports of other reviewers, how would you (re-)rate the quality of your review report?” 
  • “How would you rate the quality of review reports of the other reviewers?” 

Alternatively, more specific questions can be asked regarding different notions of quality, such as constructiveness, completeness, accuracy, etc. As with author ratings, the most challenging aspect can be deciding how much a certain omission should influence the rating (e.g., a missed crucial error in an otherwise excellent report), so editors should consider whether to provide any guidance on this to the reviewers. Additional considerations are covered in the Summary of Scholarly Approaches to Peer Review Quality Assessment section below.

Finally, if editors are interested in the consensus between reviewers, they should consider asking: 

  • “Which comments of the other reviewer(s) do you agree or disagree with?”  

3. Consider rating the review reports you receive. 

Before sending the review reports to authors, editors should check the reports for unprofessional comments and ensure they are free from bias. Additionally, editors should highlight reviewers’ suggestions that require special attention or that do not need to be addressed. During these checks, editors often form an impression of the quality of the reports, but do not formally rate it. Editors should, therefore, consider answering the same question we suggested above for authors, i.e., “How would you rate the quality of this review report?” (Note: This question can be designed using a 5-point Likert-type scale for answers, e.g., from 1 (poor) to 5 (excellent), or using a five-star rating system: ☆☆☆☆☆.)

Questions can also be tailored to the specific structured peer review questions a journal might employ, or centred around the journal’s reviewer guidelines. Editors should also consider sending their rating and their feedback (e.g., suggestions on how the review could have been improved) to the reviewer(s).

Note: While rating each report at the time it is received may be the most practical approach, especially when clear criteria for the ratings are set, editors should consider whether they want to collect two scores for each report: one as an independent evaluation, and one adjusted for the quality of the other review report(s) received for the same manuscript. Here, too, the most challenging aspect can be deciding how much a certain omission, such as a missed crucial error in an otherwise excellent report, should influence the overall score. More on this topic is discussed in the Summary of Scholarly Approaches to Peer Review Quality Assessment section below.

4. Consider collaborating with researchers on rating your journal’s past review reports. 

While the number of studies on peer review has been increasing, numerous questions about its effectiveness and impact remain. Researchers often struggle to get access to the review reports that could shed light on these questions. We highly recommend that publishers and journals open up their past peer review reports to researchers. This can be done securely through non-disclosure agreements or secure environments for peer review report access (see, for example, the Peer Review Workbench). EASE is also in the process of creating additional guides for secure sharing of peer review data between journals and researchers. Researchers can employ similar types of questions to those we recommended above, alongside the additional approaches listed in the section below. Any such project should include a clear agreement on if, when, where, and in what form the results will be disseminated.

Summary of Scholarly Approaches to Peer Review Quality Assessment 

Systematic reviews from 2002 and 2019 provided an overview of research related to the quality of peer review. An additional review of tools and instruments used to measure that quality was published in 2019. Unfortunately, most of the tools have been found to be of low validity and have not been properly tested. The tools often focused on report length in combination with the presence or coverage of topics mentioned by the reviewers (e.g., whether a reviewer comments on the study’s strengths, limitations, statistics, interpretation of results, etc.), but failed to capture how accurate those reviewer comments were, and whether, and how many, significant errors, ethical issues, language issues, or other issues the reports missed. Furthermore, many tools have not been properly tested for inter-rater reliability, i.e., to demonstrate how often different raters using the same tool would produce the same scores. Newer machine-based approaches have also been proposed, such as assessment of the developmental index or of the thoroughness and helpfulness of reports, and the use of large language models (LLMs). Additionally, indirect measures of review report quality have been proposed, based on how much manuscripts were improved or changed from their submitted to their published versions.
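As a purely illustrative sketch of the LLM-based direction, a journal could prompt a model with a rubric and a review report. Everything below (the rubric dimensions, the call_llm helper) is a hypothetical stand-in, not a reference to any specific published tool, and such scores would inherit the same untested-validity problems as the older instruments:

```python
# Hypothetical sketch only: `call_llm` stands in for whatever LLM API a
# journal might use; the rubric is an assumption, not a validated instrument.
RUBRIC = """Rate the following peer review report from 1 (poor) to 5
(excellent) on each dimension, one line per dimension:
- thoroughness: does it cover methods, results, and interpretation?
- helpfulness: does it give concrete, actionable suggestions?
- tone: is it professional and constructive?

Review report:
{report}
"""

def rate_report(report_text: str, call_llm) -> str:
    """Ask an LLM to score one review report against the rubric.

    The output still needs human checking; machine scores are no more
    validated than the manual tools discussed above.
    """
    return call_llm(RUBRIC.format(report=report_text))
```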

For those interested in assessing peer review quality, we also recommend considering the potential influence of omissions, language style, and time investment, and whether weighted scoring should be applied.

Omissions

Imagine a scenario in which four reviewers assessed the following manuscript title: “Comparing two approaches for asessing the quality of peer review: a randomised controlled trial”.

The assessments were as follows:

Reviewer 1: “Title is excellent”

Reviewer 2: “Title is not good”

Reviewer 3: “Title is not good, and needs modify so that the main finding of the study is shown, i.e. which tool is more better.”

Reviewer 4: “Title itself shows that the authors are idiots and know nothing of methods of research.”

How should one rate the quality of these review comments?

If you were asked to score the above comments using a 5-point grading system, e.g., from 1 (poor) to 5 (excellent), which scores would you give?

Reflect on what elements of the review reports contributed to your score. 

Now consider that, in this particular case, all reviewers failed to detect a spelling error in the title, and that only reviewer 4 recognised that the study was not a randomised trial, as the method of participant allocation was not proper randomisation. Would knowing this information impact the rating you would give? Consider how you would re-rate the above four comments with this new information in mind. Also consider whether the language quality or style influenced your score.

Language style

Language style can influence how a review report is perceived. The same issue can be expressed in multiple ways, for example:

1. Stating the issue as a criticism:

“There is a spelling error in the title.”

2. Stating the issue as a suggestion:

“The authors should correct ‘asessing’ to ‘assessing’.”

3. Stating the issue as a question:

“Have the authors checked the spelling of their title?”

4. Stating the issue as a combination of the above styles:

“There is a spelling error in the title; please change ‘asessing’ to ‘assessing’.”

For reviewers and editors, a concrete suggestion is often the most useful, but those rating review reports should consider whether the score given to a detected issue is influenced by the style in which it was presented. Furthermore, keep in mind that some comments may be misunderstood because of reviewers’ differing language proficiencies. Some journals might, therefore, require language improvements, or provide them, before reviewer comments are passed on to authors.

Time investment

As a review evaluates the quality of a manuscript, a better manuscript usually requires fewer comments and suggestions for improvement. Imagine a scenario where the same reviewer was asked to answer the same set of 8 questions for two training manuscripts.

The final assessments were as follows: 

Report 1 (for the first manuscript): “Manuscript can be accepted as is, only please correct the spelling error on line 2.” 

Report 2 (for the second manuscript): The reviewer wrote a 3-page document with 25 suggestions on how to improve the manuscript. 

Should these reports be rated the same if the same 8 questions were assessed during the review, or should one take into consideration the extent of the advice provided by the reviewer?

Weighted scoring

Journals differ in their resources as well as in the criteria they employ for manuscript assessment. Some may consider novelty and potential impact, while others focus primarily on methodological rigour. Some might require adherence to reporting guidelines, others data and code sharing. Some might ask reviewers to check any of these aspects, while others might have in-house staff to do so. Such differences are especially important when comparing review reports across different journals.

It is also crucial to consider whether all aspects of a review report should contribute equally to the final score. For instance, is detecting a spelling error as important as detecting a methodological error, a code error, or a recent study that was not mentioned in the literature review? Additionally, how many errors, if any, can a reviewer miss before the score is affected? And should listing every spelling error be scored the same as stating that there are multiple spelling errors the authors should address?
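As a minimal illustration of what weighted scoring could look like in practice, the aspects and weights below are assumptions chosen for demonstration, not recommended values:

```python
# Hypothetical example: the aspects, weights, and scores are illustrative.
# Each aspect of a review report is scored 1-5; the weights encode how much
# a journal decides each aspect should count towards the overall score.
WEIGHTS = {
    "methodological_errors_detected": 0.40,
    "constructiveness": 0.25,
    "completeness": 0.20,
    "language_and_spelling_issues": 0.10,
    "tone": 0.05,
}

def weighted_score(aspect_scores: dict) -> float:
    """Combine per-aspect scores (1-5) into a single weighted score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return round(sum(WEIGHTS[a] * s for a, s in aspect_scores.items()), 2)

# A report that is excellent on every aspect except that it missed a
# crucial methodological error (cf. the omissions scenario above):
print(weighted_score({
    "methodological_errors_detected": 1,
    "constructiveness": 5,
    "completeness": 5,
    "language_and_spelling_issues": 5,
    "tone": 5,
}))  # -> 3.4
```

Under this particular weighting, a single missed methodological error pulls an otherwise excellent report from 5.0 down to 3.4; choosing those weights is exactly the judgement the questions above are meant to surface.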


We hope these questions can help those developing new methods, or applying existing ones, to assess peer review quality.