Do Quantitative Forecasters Have Special Obligations to Policy Advisees?

A recent Dart-Throwing Chimp blog post by Jay Ulfelder asks: how circumscribed should quantitative forecasters be? The question was prompted by comments he made at a recent meeting on genocide, where he described his efforts to help build a quantitative system for early warning of genocide. As he notes, “The chief outputs of that system are probabilistic forecasts, some from statistical models and others from a ‘wisdom of (expert) crowds’ system called an opinion pool.” His post responds to a set of online replies from one of the other panelists, Patrick Ball, executive director of the Human Rights Data Analysis Group.

The gist of Ball’s replies (as summarized by Ulfelder) is that forecasters should be wary of using quantitative techniques in place of more conventional qualitative approaches, because policy makers (or other decision makers reliant on forecasting advice) are disproportionately swayed by quantitative information, perhaps especially when it’s visualized as a figure or graph. Such information can, in Ball’s view, crowd out more conventional forms of human assessment, which Ball sees as having much value. And since many users lack the technical skills to judge the integrity of the quantitative techniques on their own, quantitative forecasters have, in Ball’s view, a special obligation to present the limitations of their approaches to users up front.

Ulfelder is not convinced. He cites Kahneman (and, by the way, who doesn’t?), who notes that people have a strong bias for human judgment and advice over machines or technology. We take greater pride in human triumphs than in the triumphs of machines (that we built!), and we are also more willing to accept human error than machine or quantitative modelling error. This is why we resist greater reliance on quantitative modelling techniques as judgment aids even when the evidence clearly indicates that they improve judgment accuracy. We just don’t trust machines to inform us the way we trust humans, even when they do much, much better.

Ulfelder draws on Kahneman in raising another point: advisors often get ahead by inflating their confidence, often well past the point of proper calibration. Hemming and hawing to policy makers about the limitations of one’s quantitative approach is a surefire recipe for advice neglect, especially if it’s done up front as Ball suggests. By the time the advisor gets to the message, the audience may have tuned out, not only because the technical details aren’t what they wanted to hear about, but also because the messenger has chosen to lead with the problems. That negatively primes the receiver. Since qualitatively-oriented advisors don’t bend over backwards to qualify the limitations of their approaches — the predominant one being “expert intuition” — why should the quantitative advisor self-handicap?

Here’s my take: First, Ball has got a point. We should be concerned that our models and other quantitative approaches to advice giving (forecasts or otherwise) are sound. But who says developers of such models, like Ulfelder, aren’t? It seems odd to presume that quantitative types would be less concerned about rigour than their qualitative counterparts. I would have thought that model developers would be more sensitized to issues of cross-validation than qualitative types would be to validation or reliability tests, if for no other reason than that in the former case there are pretty clear methods for validating and testing reliability, whereas the methods get murkier as one moves toward “expert intuition.” That murk, I suspect, translates into “rigour neglect.”

Second, Ball has got another point. Many users won’t understand the mechanics behind the model. If they are technophiles, they may unduly trust (which is what I believe he was emphasizing). If they’re technophobes, they may unduly dismiss (which tracks with Ulfelder’s experiences). Either way, their reactions are driven more by their attitudes toward technology and quantification than by accuracy, diagnostic value, relevance, timeliness, and other criteria that matter for effective decision making. There is a real problem here because, in many cases, it’s very hard to explain how the model works, so purveyors may end up saying, “just trust me — it works, at least better than the alternatives.” At that point, advisees will likely fall back on their attitudes: technophiles will be more inclined to trust, technophobes to doubt.

I don’t think Ball’s solution of pre-emptive warning is the right one, but I do think he was onto something: it would be very helpful if quantitative forecasters could find a way to explain their basic approach in layman’s terms — what their model does, how it does it, and how we know whether it’s any good. That, I believe, would help foster trust. It’s not a special obligation in my view, but rather a benefit to all sides (except perhaps anyone who feels threatened by the prospect of models replacing humans as sources of advice).
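To make “how we know whether it’s any good” a little more concrete, here is a minimal sketch, in Python with invented numbers, of one standard way to score probabilistic forecasts against observed outcomes: the Brier score.

```python
# Minimal sketch: scoring probabilistic forecasts against observed outcomes.
# The forecast probabilities and outcomes below are invented for illustration.

def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    0.0 is perfect; an uninformative constant 0.5 forecast scores 0.25."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster who assigns high probabilities to events that occur (and low
# probabilities to ones that don't) beats a forecaster who hedges at 50%.
sharp = brier_score([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])    # 0.025
hedged = brier_score([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])   # 0.25
print(f"sharp: {sharp}, hedged: {hedged}")
```

The appeal of a score like this is precisely that it can be explained to a lay audience in one sentence: you are penalized by the squared distance between what you said and what happened.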

From the quantitative forecaster’s perspective, I’d say this is a strategic necessity because, as Sherman Kent noted long ago (e.g., in “Words of Estimative Probability”), not only are most assessors “poets” rather than “mathematicians”, but most policy makers are also poets, perhaps in even greater proportion than within the intelligence community. Kent was of course referring to the qualitative types who put narrative beauty ahead of predictive accuracy (poets) and their counterparts in the intelligence community who not only want predictive accuracy but also want to communicate judgments very clearly, preferably with numbers rather than words (mathematicians). To put it in the psychological terms Phil Tetlock articulated some decades ago, the quantitative forecaster is accountable to a skeptical audience and, accordingly, may engage in some pre-emptive self-criticism to show that audience that all sides have been considered. I believe this is, more generally, why strategic analysts are underconfident rather than overconfident in their forecasts, despite showing very good discrimination skill, as we show in a recent report and in a recent paper in the Proceedings of the National Academy of Sciences.
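Underconfidence of this kind is easy to check in principle: bucket forecasts by stated probability and compare each bucket to the observed frequency of the event. A toy sketch, with all numbers invented, of what an underconfident analyst’s record looks like:

```python
# Toy calibration check illustrating underconfidence. All numbers are
# invented; an underconfident forecaster's events occur MORE often than
# their stated probabilities imply.
from collections import defaultdict

# (stated probability, outcome) pairs from a hypothetical analyst
forecasts = [(0.6, 1), (0.6, 1), (0.6, 1), (0.6, 0),
             (0.7, 1), (0.7, 1), (0.7, 1), (0.7, 1)]

buckets = defaultdict(list)
for stated, outcome in forecasts:
    buckets[stated].append(outcome)

for stated in sorted(buckets):
    observed = sum(buckets[stated]) / len(buckets[stated])
    print(f"stated {stated:.0%} -> observed {observed:.0%}")
# stated 60% -> observed 75%
# stated 70% -> observed 100%
```

An analyst with this record discriminates well (the events they rate more likely do happen more often) but could improve their scores simply by stating bolder probabilities.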

The poets have the home court advantage because they are advising other poets, who are skeptical about the products mathematicians offer. Because of this ecology of beliefs, confidence peddling, which might work quite well for poets, will probably flop for mathematicians, maybe even as badly as up-front self-handicapping. Just as the mathematicians strive for clarity and crispness in forecast communication, they need to do likewise regarding communications about their methods. That’s seldom the case. Too often, the entry price for understanding is set far too high. That puts off even those who might have been inclined to listen to advice from new, more quantitatively-oriented advisors.

And, of course we should be asking about the qualitative human assessments — how good are they? how can we know? — just as Tetlock had done in his landmark study of geopolitical forecasting and as we’ve more recently done with strategic intelligence analysts in the aforementioned report and paper. When forecasters refuse to give forecasts that are verifiable either because their uncertainties are shrouded in the vagueness of verbal probability terms or because the targets of their forecasts are ill defined, it becomes difficult, if not impossible, to verify accuracy. Some poets might like it that way. Their bosses and their bosses’ political masters aren’t going to force them to do it differently since they are mainly poets as well. The mathematicians have to try to advance the accountability issue. It might help if some of them got into high-ranking policy positions. Then again, maybe they don’t make good leaders. The ecology of these individual differences — poets vs. mathematicians, foxes vs. hedgehogs, etc. — across functional roles in society is surely not accidental.
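The vagueness problem can be made concrete with a toy Python sketch. If two readers map “likely” to different numbers, the same set of verbal forecasts earns very different accuracy scores; the word-to-number mappings and outcomes below are invented for illustration, not taken from Kent’s actual survey data.

```python
# Toy sketch: the same verbal forecasts scored under two readers' numeric
# interpretations. Mappings and outcomes are invented for illustration.

reader_a = {"likely": 0.6, "unlikely": 0.4}
reader_b = {"likely": 0.9, "unlikely": 0.1}

verbal_forecasts = ["likely", "likely", "unlikely", "unlikely"]
outcomes = [1, 0, 0, 1]  # what actually happened (hypothetical)

def brier(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

score_a = brier([reader_a[w] for w in verbal_forecasts], outcomes)  # 0.26
score_b = brier([reader_b[w] for w in verbal_forecasts], outcomes)  # 0.41
# Same words, same events, different accuracy scores: the verbal forecast
# by itself cannot be verified.
```

Until the forecaster commits to a number, there is no single accuracy score to hold them to, which is exactly the accountability gap described above.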