
Complaints about the NIH grant review process

Earlier this week, I met with a collaborator to discuss what to do with our NIH grant proposal, whose “A1” (the revised version; you don’t get a third try) was “unscored”: it received a “preliminary score” in the lower half and so was not discussed by the review panel and couldn’t be funded.

NIH proposals are typically reviewed by three people, who give preliminary scores on five aspects (significance, approach, investigators, environment, innovation) plus an overall score; the top proposals, based on those preliminary scores, are then discussed and scored by the larger panel.

One of the reviewers gave our proposal an 8 for “approach” (on a scale of 1-9, with 1 being good and 9 being terrible) with the following review comments:

4. Approach:

Strengths

  • Well described details for mining of [data] and genotyping of [subjects].

Weaknesses

  • There is no power analysis for Aim 2. Without knowing which and how many [phenotypes] will be evaluated it is not possible to estimate the statistical power.

Valid comments, but is that really all the reviewer had to say? What about Aims 1 and 3, or the other aspects of Aim 2? That is totally fucking inadequate.

Looking at this review again, I was reminded of how much I despise many aspects of the NIH review process. So it’s led me, finally, to write down some of the things that annoy me.

The scoring system is too discrete

I’ve been involved in reviewing NIH grants for 15 years. A bunch of changes were made around 2009, a few of them for the better (like ordering the discussion of grants by preliminary score) but many for the worse.

The worst change was to the scoring system. In the old system, reviewers scored grants on a scale of 1-5, in tenths (1.0, 1.1, 1.2, …), with 1 being best and 5 being worst. Scores were averaged and multiplied by 100, so the best possible score was 100 and the worst was 500. (And note that the middle value was 300 not 250, a point of confusion for many.)

In the new system, reviewers score grants on a scale of 1-9, in single digits, with 1 being best and 9 being worst (and 5 being the middle value). Scores are averaged and multiplied by ten, so the best possible score is 10 and the worst is 90.
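For concreteness, here is a small sketch (in Python; the reviewer scores are made up, and I’m glossing over exactly how NIH rounds the averaged score) of the arithmetic under the two systems:

```python
# Illustrative only: hypothetical reviewer scores, to show the arithmetic.

# Old system: each reviewer scores 1.0-5.0 in tenths; overall = average * 100.
old_scores = [1.4, 1.7, 2.1]              # made-up scores from three reviewers
old_overall = sum(old_scores) / len(old_scores) * 100
print(round(old_overall))    # 173, on a scale from 100 (best) to 500 (worst)

# New system: each reviewer scores 1-9 in whole numbers; overall = average * 10.
new_scores = [1, 2, 2]                    # made-up scores from three reviewers
new_overall = sum(new_scores) / len(new_scores) * 10
print(round(new_overall))    # 17, on a scale from 10 (best) to 90 (worst)
```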

When the new scale was introduced, we were given the following handy chart:

[Image: the NIH 9-point scoring chart]

As I understand it, there were two main reasons for revising the scoring system:

  • Fix the “grade inflation” problem (too many good scores)

  • Reviewers can’t score grants with 0.1 precision.

But as a fix for the grade inflation problem, revising the scoring system could only be a temporary solution. And as I understand it, the problem is now worse than ever.

The big problem with the new scoring system is that reviewers have just 9 choices of scores, whereas before they had 41. Yes, a reviewer can’t really discriminate a 1.3 from a 1.4. But if you have 25 imprecise measuring instruments, would it be better to

  • average and then round, or

  • round and then average?

It’s obviously better to average and then round, but the new scoring system rounds and then averages. (Gary Churchill pointed this out to me.)
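To see why this matters, here’s a quick simulation sketch (in Python; this isn’t from the post, and the panel size and reviewer noise level are assumptions). Each of 25 reviewers perceives a grant’s true quality on the 1-9 scale with some noise; we compare keeping the fine-grained impressions and rounding only the average against forcing each reviewer to a whole number first and then averaging.

```python
# A sketch, not from the original post: compare "average then round" with
# "round then average" when combining noisy reviewer impressions of a grant.
# The noise level and panel size below are made-up assumptions.

import random
import statistics

random.seed(1)
n_reviewers = 25     # size of the panel (assumed)
noise_sd = 0.5       # each reviewer's imprecision on the 1-9 scale (assumed)
n_grants = 10_000

err_avg_then_round = []
err_round_then_avg = []

for _ in range(n_grants):
    truth = random.uniform(2, 8)  # the grant's "true" quality on the 1-9 scale
    perceptions = [min(9, max(1, random.gauss(truth, noise_sd)))
                   for _ in range(n_reviewers)]

    # Old-style: keep each reviewer's fine-grained score; round only the average.
    fine = round(statistics.mean(perceptions), 1)
    # New-style: each reviewer must pick a whole number 1-9, then average.
    coarse = statistics.mean(round(p) for p in perceptions)

    err_avg_then_round.append((fine - truth) ** 2)
    err_round_then_avg.append((coarse - truth) ** 2)

print("RMS error, average then round:", statistics.mean(err_avg_then_round) ** 0.5)
print("RMS error, round then average:", statistics.mean(err_round_then_avg) ** 0.5)
# Rounding each reviewer first adds quantization noise to every score,
# so the panel average ends up farther from the truth.
```

In this sketch the round-then-average panel score is consistently farther from the grant’s true quality; how much farther depends entirely on the assumed reviewer noise.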

The coarse scale also leads to frequent statements like, “I’d put this somewhere between a 1 and a 2.” And with the line (or “band”) between funded and not funded being well above 20, it seems like, in many cases, whether a grant is funded comes down to the proportion of reviewers who give it a 1 rather than a 2.

The bullet-point-based reviews lead to superficial and incomplete comments

It used to be that the written reviews of grant proposals were much like reviews of journal articles: for each aspect of a proposal (significance, approach, etc.), we’d write a few paragraphs, in some cases a full page. Such reviews were hard to write, were often long, and sometimes didn’t do a good job of making clear what were the really important issues and what were the less important ones.

With the big review change in 2009, reviewers were asked to write bullet points for “Strengths” and “Weaknesses.” (And in a Microsoft Word template!)

It’s a lot easier to write a few bullet points than to construct coherent prose, but the bullet points that reviewers produce are generally shallow and incomplete. In some cases, a reviewer’s thinking about a proposal may be left in a shallow and incomplete state.

The review at the top of this post is the best (or really worst) instance of this problem.

Don’t drop the proposal summary from the beginning of the discussion

In the old days, the discussion of a proposal would begin with the primary reviewer giving a brief summary: what are the investigators proposing to do? This is critical, as only 3 of the 25 or so people on the panel will have read the proposal.

I don’t know if they’re still doing this, but with the other changes to the review process, we were asked to skip the summary of the proposal and just discuss the significance of the work. But how on earth can you talk about the significance of the work without some mention of what the work actually is?

The electronic grant format is often not human-readable

When I first reviewed NIH grant proposals, we each got a copy-paper-sized box with all of the proposals (and when proposals were triaged, all 25 reviewers would simultaneously throw the proposal into a big pile). It was great to move to electronic versions of grants (initially scanned, then fully electronic), but the electronic versions of proposals are not constructed in a way that has a human reader in mind.

The front page of grants used to be quite clear and informative, but now all of the form-generated pages are a design mess. For example, the pages listing the key personnel and their institutions are really hard to parse.

Also, there are loads of useless pages describing what documents were included. Couldn’t the reviewers get a version without all of that crap?

Finally, the PDF bookmarks are often off by a page. If you want to go to the biosketches, you need to click the biosketch bookmark and then page down one.

These things aren’t that big of a deal, but they’re a constant annoyance, and they should be easy to fix.

Tony Scarpa, the former director of the NIH Center for Scientific Review (CSR) who was responsible for the big change in the NIH review system, once visited U Wisconsin, and I asked him whether these electronic proposals could be improved. He said, “Oh, that’s not us; that’s grants.gov.” So I asked who I should talk to about the issue, and he said, “Call your Congressman.”

There needs to be some review of reviewers

One final comment: there needs to be some formal way for reviewers to comment on other reviewers. I think most reviewers are very careful and responsible, but there’s often at least one total jerk on a review panel: didn’t read the proposal carefully, didn’t write a coherent review, didn’t pay any attention to what the other reviewers said, gave completely unfair scores.

There should be some system for other reviewers to say, “So-and-so on the panel was a complete jerk and shouldn’t be brought back.”