Thursday, August 18, 2016

Statistics versus judgement.

This interesting website, pointed out to me by a friend, offers to send you a daily gem of information, usually an excerpt from a published work. Being a glutton for input streams, I signed up. I usually move on after glancing at a given day's topic, but I pass on this excerpt from Kahneman's "Thinking, Fast and Slow", after excerpting even further:
In his book Clinical vs. Statistical Prediction: A Theoretical Analysis and a Review of the Evidence, psychoanalyst Paul Meehl gave evidence that statistical models almost always yield better predictions and diagnoses than the judgment of trained professionals. In fact, experts frequently give different answers when presented with the same information within a matter of a few minutes... Meehl's book provoked shock and disbelief among clinical psychologists, and the controversy it started has engendered a stream of research that is still flowing today, more than fifty years after its publication. The number of studies reporting comparisons of clinical and statistical predictions has increased to roughly two hundred, but the score in the contest between algorithms and humans has not changed. About 60% of the studies have shown significantly better accuracy for the algorithms. The other comparisons scored a draw in accuracy, but a tie is tantamount to a win for the statistical rules, which are normally much less expensive to use than expert judgment. No exception has been convincingly documented.
The range of predicted outcomes has expanded to cover medical variables such as the longevity of cancer patients, the length of hospital stays, the diagnosis of cardiac disease, and the susceptibility of babies to sudden infant death syndrome; economic measures such as the prospects of success for new businesses, the evaluation of credit risks by banks, and the future career satisfaction of workers; questions of interest to government agencies, including assessments of the suitability of foster parents, the odds of recidivism among juvenile offenders, and the likelihood of other forms of violent behavior; and miscellaneous outcomes such as the evaluation of scientific presentations, the winners of football games, and the future prices of Bordeaux wine. Each of these domains entails a significant degree of uncertainty and unpredictability. We describe them as 'low-validity environments.' In every case, the accuracy of experts was matched or exceeded by a simple algorithm.
Another reason for the inferiority of expert judgment is that humans are incorrigibly inconsistent in making summary judgments of complex information. When asked to evaluate the same information twice, they frequently give different answers. The extent of the inconsistency is often a matter of real concern. Experienced radiologists who evaluate chest X-rays as 'normal' or 'abnormal' contradict themselves 20% of the time when they see the same picture on separate occasions. A study of 101 independent auditors who were asked to evaluate the reliability of internal corporate audits revealed a similar degree of inconsistency. A review of 41 separate studies of the reliability of judgments made by auditors, pathologists, psychologists, organizational managers, and other professionals suggests that this level of inconsistency is typical, even when a case is reevaluated within a few minutes. Unreliable judgments cannot be valid predictors of anything.
