Calibration of Expert Judgments in Counterterrorism Risk Assessment

Principal Investigator: Vicki Bier


The goal of this research was to develop and verify methods for eliciting and combining the judgments of multiple domain experts in order to assess counterterrorism values that are of interest for planning defensive strategies but are difficult to estimate because of insufficient historical data. In particular, the project focused on producing reliable estimates from the opinions of multiple experts by reducing bias and overconfidence. To achieve this, we generated a consensus probability distribution for any given quantity of interest by weighting the experts’ judgments based on their performance. First, we created a list of seed variables whose values are verifiable using existing terrorism databases, so that the true values of the seed variables could be used to assess the experts’ performance. The experts were then asked to provide their opinions on each quantity of interest in the form of a median and a 90% prediction interval (the 5th and 95th percentiles). Several methods of aggregating probabilistic judgments have been developed. We compared the results of these methods based on their reliability (i.e., whether the true value of the seed variable was contained within the resulting interval) and on the width of the corresponding interval. Two types of methods were evaluated. First, we considered performance-based aggregation of the raw intervals provided by the experts. In this approach, weighted average intervals were obtained by weighting each expert’s judgments according to that expert’s performance. Cooke’s classical method (1991) and Hora’s method (2004) belong to this category. In the second type of method, the original intervals provided by the experts were first broadened to ensure that 90% of the intervals contained the true values of the corresponding seed variables. The resulting broadened intervals were then aggregated using equal weights.
Methods in this category varied depending on whether intervals were broadened additively or multiplicatively, and on whether the 5th and 95th percentiles of the resulting intervals were averaged directly (a simple heuristic that is known to be theoretically incorrect) or were first fitted to distributions, with the resulting distributions then averaged. The results of this work demonstrated the effectiveness of methods for utilizing expert opinion to estimate counterterrorism values when information sources are limited. In particular, most of the methods tested were more accurate than an individual expert. The results also yielded insights into which methods performed best for our seed variables.
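
The second type of method (broadening followed by equal-weight aggregation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the project's actual procedure: the seed values and expert intervals are invented, the function names (hit_rate, broaden, smallest_factor) are hypothetical, multiplicative broadening is taken to mean stretching each interval about its median by a common factor k, and the factor is found by a coarse grid search. The final step averages the 5th and 95th percentiles directly, which is the simple heuristic noted above as theoretically incorrect.

```python
import numpy as np

def hit_rate(lo, hi, truth):
    """Share of seed variables whose true value falls inside [lo, hi]."""
    lo, hi, truth = (np.asarray(a, dtype=float) for a in (lo, hi, truth))
    return float(np.mean((lo <= truth) & (truth <= hi)))

def broaden(lo, med, hi, k):
    """Stretch each interval about its median by the factor k (assumed scheme)."""
    lo, med, hi = (np.asarray(a, dtype=float) for a in (lo, med, hi))
    return med - k * (med - lo), med + k * (hi - med)

def smallest_factor(lo, med, hi, truth, target=0.9):
    """Smallest k on a coarse grid (1.0, 1.25, ..., 11.0) reaching the target hit rate."""
    for k in 1.0 + 0.25 * np.arange(41):
        b_lo, b_hi = broaden(lo, med, hi, k)
        if hit_rate(b_lo, b_hi, truth) >= target:
            return float(k)
    raise ValueError("no factor on the grid reaches the target hit rate")

# Invented seed-variable values and expert judgments, for illustration only.
# Each expert supplies (5th percentiles, medians, 95th percentiles).
truth = [10, 20, 30, 40, 50]
experts = {
    "A": ([8, 18, 31, 35, 45], [9, 21, 33, 38, 48], [12, 24, 36, 42, 49]),
    "B": ([9, 15, 25, 39, 40], [11, 19, 28, 41, 46], [13, 22, 32, 44, 52]),
}

# Broaden each expert's intervals separately until at least 90% contain the truth.
factors = {}
broadened = []
for name, (lo, med, hi) in experts.items():
    k = smallest_factor(lo, med, hi, truth)
    factors[name] = k
    broadened.append(broaden(lo, med, hi, k))

# Equal-weight aggregation by averaging the percentiles directly.
agg_lo = np.mean([b[0] for b in broadened], axis=0)
agg_hi = np.mean([b[1] for b in broadened], axis=0)

print("broadening factors:", factors)
print("aggregate hit rate:", hit_rate(agg_lo, agg_hi, truth))
print("aggregate mean width:", float(np.mean(agg_hi - agg_lo)))
```

In this toy data, expert A is overconfident and needs broadening, while expert B already meets the 90% target and is left unchanged. An additive variant would instead shift both bounds outward by a constant, and the distribution-based variant would fit a distribution to each expert's three percentiles before averaging.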