In 2007, a paper was published in the medical journal The Lancet that sought to study how the WHO expert panels reach their conclusions, which are profoundly important in shaping global policy on public health. The study, conducted jointly by the Norwegian Centre for Health Services and the Centre for Health Economics at McMaster University in Canada, and funded by the EU, concluded that systematic reviews were rarely used and that the favoured way of developing a report was to use an expert committee or individual experts. One interviewee among the 29 directors or equivalents commented thus: “There is a tendency to get people around a table and get consensus – everything they do has a scientific part and a political part. This usually means you go to the lowest common denominator or the views of a ‘strong’ person at the table.” This criticism was bad enough but worse was to come. Two papers were subsequently published in the Lancet, one by researchers looking at insecticide-treated anti-malarial bed-nets and another looking at child mortality. In the first paper, the authors outlined the success of the programme but, importantly, they also outlined some significant uncertainties in the data. The WHO received drafts of the data and, ahead of the Lancet publication, issued a press release claiming that the data “ends the debate about how to deliver long-lasting insecticidal nets”. The second paper, from researchers at Harvard and Queensland universities, reported disappointing progress in the rate of reduction of childhood mortality. UNICEF contacted the Lancet about the paper but, after considerable consultation with individual experts, the Lancet decided to publish and informed UNICEF of the intended date of publication. UNICEF then fast-tracked the publication of its annual State of the World’s Children Report and made claims contrary to the paper.
These two actions by the UN agencies caused the Lancet to pen an editorial which concluded thus: “But the danger is that by appearing to manipulate science, breach trust, resist competition and reject accountability, WHO and UNICEF are acting contrary to scientific norms that one would have expected UN technical agencies to uphold. Worse, they risk inadvertently corroding their own long-term credibility.” Scathing criticism from a top-class medical journal!
The UN moves slowly and thus in 2012, in response to such scathing criticism, it issued a specific handbook for guideline development and established a Guideline Review Committee to be involved in evaluating all subsequent guidelines. Central to this process was the internationally accepted approach to the development of guidelines called the GRADE (Grading of Recommendations Assessment, Development and Evaluation) process. GRADE is used to evaluate confidence in the effect of some action or intervention and classifies this confidence as high, moderate, low or very low. If more than one effect is possible, the overall grading is based on the weakest measure of confidence. In addition to the strength of evidence on outcomes from actions or interventions, GRADE also rates the overall recommendations as strong or conditional. An international panel set out to examine how guidelines and recommendations of the WHO adhered to the GRADE system from its introduction in 2007 up to the year 2012. A total of 160 recommendations were found and reviewers worked in pairs to evaluate adherence to GRADE guidelines. Of the recommendations deemed to be strong, 56% were found to have low or very low confidence in estimates. Only 17% had high confidence in estimates. Turning to the 167 recommendations that were considered weak (conditional), 85% were indeed based on low to very low confidence in the estimates of the effect of the action or intervention. For example, 100% of the strong recommendations in guidelines on nutrition and influenza were found to be based on low to very low confidence in effect estimates. Half of the recommendations in the areas of maternal and reproductive health, child health, HIV/AIDS and TB were deemed to be strong recommendations based on low to very low confidence in the outcome effects.
The same set of researchers went one step further in a follow-up paper. Sometimes, expert committees have to make judgments. The confidence in the effect estimate might not be as strong as they would like, but the expert committee feels that a strong recommendation is warranted for whatever reason. These are called discordant recommendations, and GRADE recognises five situations where a discordant recommendation is warranted. Given the very high number of strong recommendations with weak effect evidence observed in the previous study, the researchers set out to see how many of these met any one of the five situations in which GRADE allows a discordant recommendation. Only 16% of the discordant recommendations did so. In all, 84% of the discordant recommendations did not meet the GRADE guidelines. Some 46% of the discordant recommendations (strong recommendation but low supporting evidence) should have been classified as simply conditional recommendations. These two papers show that the WHO still has a long way to go to meet reasonable levels of scientific integrity. It may well be that expert panels make strong recommendations based on weak evidence of effect because otherwise their recommendations will be ignored. The problem is that in many countries, a strong recommendation from the WHO is the first step in the development of national policies, and such is the respect that many national public health agencies have for the WHO and its guidelines that they go unquestioned. Anyone who has had dealings with large UN agencies knows that they are frequently short of resources, and given that they answer to multiple national governments and to multiple non-governmental organisations, it is only fair to have some level of understanding of their constraints.
However, the failure to rigorously embed their guidelines in the highest quality of science, and the repeated issuing of strong recommendations based on weak to very weak evidence, mean that they cannot be excused. They may keep most non-governmental activists happy, but in the long term global trust is more important. It is hard won and easily lost.