Title
Relative importance of Recall, Precision, F-Measure, Informedness, and Markedness metrics to evaluate security tools in Business Critical, Heightened Critical, Best Effort, and Minimum Effort scenarios, according to the declared preferences and familiarity with measures of experts in the domain

Authors
Miquel Martínez, Universitat Politécnica de València, Campus de Vera S/N, 46022, Valencia, Spain, mimarra2@disca.upv.es
Juan Carlos Ruiz, Universitat Politécnica de València, Campus de Vera S/N, 46022, Valencia, Spain, jcruizg@disca.upv.es
Nuno Antunes, University of Coimbra, Polo II - Pinhal de Marrocos, 3030-290 Coimbra, Portugal, nmsa@dei.uc.pt
David de Andrés, Universitat Politécnica de València, Campus de Vera S/N, 46022, Valencia, Spain, ddandres@disca.upv.es
Marco Vieira, University of Coimbra, Polo II - Pinhal de Marrocos, 3030-290 Coimbra, Portugal, mvieira@dei.uc.pt

Date
Raw data obtained between 17 Oct 2016 and 20 Nov 2016 through a Google Forms questionnaire

Funding
This work has been partially supported by the project EUBra-BIGSEA (www.eubra-bigsea.eu), funded by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement no 690116, the "Programa de Ayudas de Investigación y Desarrollo" (PAID) of the Universitat Politécnica de València, and the project DINAMOS (dinamos.webs.upv.es), funded by the Ministerio de Economía, Industria y Competitividad de España, grant agreement no TIN2016-81075-R.

Description
The benchmarking of security tools aims to determine which tools are most suitable for detecting system vulnerabilities or intrusions. The analysis is usually oversimplified by employing just a single metric out of the large set of those available. Accordingly, the decision may be biased by not considering relevant information provided by the neglected metrics. This work proposes a novel approach that takes into account several metrics, different scenarios, and the advice of multiple experts.
The proposal relies on experts quantifying the relative importance of each pair of metrics with respect to the requirements of a given scenario. Their judgments are aggregated using group decision making techniques, and weighted according to the familiarity of the experts with the metrics and scenario, to compute a set of weights accounting for the relative importance of each metric. Then, weight-based multi-criteria decision making techniques can be used to rank the benchmarked tools. This dataset contains raw data obtained from 21 experts, who declared their familiarity with the considered metrics and their preference for each metric in the considered scenarios. Processed data include the consistency ratio (CR) of the resulting pairwise comparison matrices (inconsistent matrices are rejected, weight = 0.00), the relative contribution of each expert according to their declared familiarity with the metrics and the computed CRs, and the contribution (weight) of each metric towards each considered scenario.

Index Terms
Benchmark Analysis, Security tools, Multiple-Criteria Decision Making, Decision Support

License
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License: https://creativecommons.org/licenses/by/4.0/legalcode

Files information
The "raw" and "processed" folders contain files for the raw and processed data, respectively.
- raw/Experts_Preferences_v01.csv (version 1.0): Each expert declares her preference, for each pair of metrics, for each considered scenario. This leads to a total of 21 experts (identified by an integer number) declaring 10 preferences (pairwise comparison of 5 metrics: metric A better than metric B) and 10 strengths for such preferences, for each of the 4 scenarios (21 rows x 81 columns).
- raw/Experts_Familiarity_With_Metrics_v01.csv (version 1.0): Each expert declares her familiarity with the considered metrics.
This leads to a total of 21 experts (identified by an integer number) declaring 5 familiarities (21 rows x 6 columns).
- processed/Experts_Preferences_Matrix_Consistency_Ratio_v01.csv (version 1.0): The pairwise comparison matrices declared by each expert are analysed to obtain their Consistency Ratio (CR). Matrices are consistent if CR < 0.20, and are rejected otherwise. This leads to a total of 21 experts (identified by an integer number) and the CR for the 4 considered scenarios (21 rows x 5 columns).
- processed/Experts_Weight_According_To_Familiarity_v01.csv (version 1.0): The contribution of each expert with a consistent matrix for each scenario is computed from her declared familiarity with the considered metrics. This leads to a total of 21 experts (identified by an integer number) and their contribution to the 4 considered scenarios (21 rows x 5 columns).
- processed/Consensus_Priority_Vector_For_Each_Scenario_v01.csv (version 1.0): The contribution of each metric to each scenario is computed taking into account the contribution of each expert. This leads to a total of 4 scenarios and the weight of the 5 metrics (4 rows x 6 columns).

Methodology information
Experts were asked to complete a Google Forms questionnaire (https://goo.gl/forms/EEmkUmLIj20nMJS33) to compare all 5 metrics in pairs for the 4 considered scenarios (40 comparisons). Two questions were defined for each pairwise comparison: i) which is the preferred metric of the two presented (A/B), and ii) what is the intensity of this preference (following Saaty's fundamental scale of absolute numbers: 1-5). Likewise, they declared their familiarity with the considered metrics on a 1-5 Likert scale. This information is then used to compute each expert's individual judgement by i) computing the geometric mean of each row of her pairwise comparison matrix, ii) summing up all computed geometric means, and iii) dividing each geometric mean by the resulting sum. The result is a priority vector.
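As an illustration, the three steps above can be sketched in Python. This is only a sketch: the 3x3 matrix is hypothetical (the dataset uses one 5x5 matrix per expert and scenario), and the function name priority_vector is an assumption introduced for this example, not something shipped with the dataset.

```python
import math


def priority_vector(matrix):
    """Row geometric mean method applied to a square pairwise comparison matrix."""
    n = len(matrix)
    # i) geometric mean of each row
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    # ii) sum of all geometric means
    total = sum(geo_means)
    # iii) each geometric mean divided by the sum
    return [g / total for g in geo_means]


# Hypothetical reciprocal matrix for three metrics (illustration only).
example_matrix = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = priority_vector(example_matrix)
print([round(w, 4) for w in weights])
```

For this perfectly consistent example the row geometric means are 2, 1 and 0.5, so the priority vector is 4/7, 2/7 and 1/7.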
The Consistency Ratio (CR) is computed in three successive steps: i) the Principal Eigen Value (PEV) is estimated by multiplying the sum of each column of the pairwise comparison matrix by the corresponding weight in the priority vector and adding up the products, ii) a consistency index (CI) is derived from the PEV and the number of metrics under study, and iii) the CR is obtained by normalizing the CI to the random consistency index (RI), which is taken directly from a table defined in T. L. Saaty, "Decision-making with the ahp: Why is the principal eigenvector necessary," European Journal of Operational Research, vol. 145, no. 1, pp. 85-91, 2003. Inconsistent matrices are not taken into account (weight 0.00). The familiarity declared by each expert is used to compute, using the row geometric mean, the contribution (weight) that her preferences for metrics will have in each scenario. The weight of each metric for each scenario (consensus priority vector) is also obtained using the weighted geometric mean.

List of variables (Name : Description (Possible values))
- raw/Experts_Preferences_v01.csv:
Expert: Integer number identifying the expert (1-21)
1. Markedness vs. Recall - Business Critical: Which of these is the best metric for the business critical scenario? (Markedness/Recall)
1. Determine how much - Business Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
2. Bookmaker Informedness vs. F-Measure - Business Critical: Which of these is the best metric for the business critical scenario? (Bookmaker Informedness/F-Measure)
2. Determine how much - Business Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
3. Precision vs. Recall - Business Critical: Which of these is the best metric for the business critical scenario? (Precision/Recall)
3. Determine how much - Business Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
4. F-Measure vs. Markedness - Business Critical: Which of these is the best metric for the business critical scenario? (F-Measure/Markedness)
4. Determine how much - Business Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
5. Recall vs. Bookmaker Informedness - Business Critical: Which of these is the best metric for the business critical scenario? (Recall/Bookmaker Informedness)
5. Determine how much - Business Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
6. Markedness vs. Precision - Business Critical: Which of these is the best metric for the business critical scenario? (Markedness/Precision)
6. Determine how much - Business Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
7. Bookmaker Informedness vs. Precision - Business Critical: Which of these is the best metric for the business critical scenario? (Bookmaker Informedness/Precision)
7. Determine how much - Business Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
8. F-Measure vs. Recall - Business Critical: Which of these is the best metric for the business critical scenario? (F-Measure/Recall)
8. Determine how much - Business Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
9. Markedness vs. Bookmaker Informedness - Business Critical: Which of these is the best metric for the business critical scenario? (Markedness/Bookmaker Informedness)
9. Determine how much - Business Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
10. Precision vs. F-Measure - Business Critical: Which of these is the best metric for the business critical scenario? (Precision/F-Measure)
10. Determine how much - Business Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
11. Precision vs. Markedness - Hightened Critical: Which of these is the best metric for the heightened critical scenario? (Precision/Markedness)
11. Determine how much - Hightened Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
12. F-Measure vs. Recall - Hightened Critical: Which of these is the best metric for the heightened critical scenario? (F-Measure/Recall)
12. Determine how much - Hightened Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
13. F-Measure vs. Bookmaker Informedness - Hightened Critical: Which of these is the best metric for the heightened critical scenario? (F-Measure/Bookmaker Informedness)
13. Determine how much - Hightened Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
14. F-Measure vs. Markedness - Hightened Critical: Which of these is the best metric for the heightened critical scenario? (F-Measure/Markedness)
14. Determine how much - Hightened Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
15. Recall vs. Precision - Hightened Critical: Which of these is the best metric for the heightened critical scenario? (Recall/Precision)
15. Determine how much - Hightened Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
16. Bookmaker Informedness vs. Markedness - Hightened Critical: Which of these is the best metric for the heightened critical scenario? (Bookmaker Informedness/Markedness)
16. Determine how much - Hightened Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
17. F-Measure vs. Precision - Hightened Critical: Which of these is the best metric for the heightened critical scenario? (F-Measure/Precision)
17. Determine how much - Hightened Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
18. Recall vs. Bookmaker Informedness - Hightened Critical: Which of these is the best metric for the heightened critical scenario? (Recall/Bookmaker Informedness)
18. Determine how much - Hightened Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
19. Markedness vs. Recall - Hightened Critical: Which of these is the best metric for the heightened critical scenario? (Markedness/Recall)
19. Determine how much - Hightened Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
20. Precision vs. Bookmaker Informedness - Hightened Critical: Which of these is the best metric for the heightened critical scenario? (Precision/Bookmaker Informedness)
20. Determine how much - Hightened Critical: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
21. Bookmaker Informedness vs. F-Measure - Best Effort: Which of these is the best metric for the best effort scenario? (Bookmaker Informedness/F-Measure)
21. Determine how much - Best Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
22. Recall vs. F-Measure - Best Effort: Which of these is the best metric for the best effort scenario? (Recall/F-Measure)
22. Determine how much - Best Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
23. Bookmaker Informedness vs. Markedness - Best Effort: Which of these is the best metric for the best effort scenario? (Bookmaker Informedness/Markedness)
23. Determine how much - Best Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
24. Recall vs. Bookmaker Informedness - Best Effort: Which of these is the best metric for the best effort scenario? (Recall/Bookmaker Informedness)
24. Determine how much - Best Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
25. Markedness vs. F-Measure - Best Effort: Which of these is the best metric for the best effort scenario? (Markedness/F-Measure)
25. Determine how much - Best Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
26. Markedness vs. Precision - Best Effort: Which of these is the best metric for the best effort scenario? (Markedness/Precision)
26. Determine how much - Best Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
27. Precision vs. F-Measure - Best Effort: Which of these is the best metric for the best effort scenario? (Precision/F-Measure)
27. Determine how much - Best Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
28. Bookmaker Informedness vs. Precision - Best Effort: Which of these is the best metric for the best effort scenario? (Bookmaker Informedness/Precision)
28. Determine how much - Best Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
29. Markedness vs. Recall - Best Effort: Which of these is the best metric for the best effort scenario? (Markedness/Recall)
29. Determine how much - Best Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
30. Precision vs. Recall - Best Effort: Which of these is the best metric for the best effort scenario? (Precision/Recall)
30. Determine how much - Best Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
31. Precision vs. F-Measure - Minimum Effort: Which of these is the best metric for the minimum effort scenario? (Precision/F-Measure)
31. Determine how much - Minimum Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
32. Markedness vs. F-Measure - Minimum Effort: Which of these is the best metric for the minimum effort scenario? (Markedness/F-Measure)
32. Determine how much - Minimum Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
33. Precision vs. Markedness - Minimum Effort: Which of these is the best metric for the minimum effort scenario? (Precision/Markedness)
33. Determine how much - Minimum Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
34. Recall vs. Bookmaker Informedness - Minimum Effort: Which of these is the best metric for the minimum effort scenario? (Recall/Bookmaker Informedness)
34. Determine how much - Minimum Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
35. Recall vs. F-Measure - Minimum Effort: Which of these is the best metric for the minimum effort scenario? (Recall/F-Measure)
35. Determine how much - Minimum Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
36. Bookmaker Informedness vs. Precision - Minimum Effort: Which of these is the best metric for the minimum effort scenario? (Bookmaker Informedness/Precision)
36. Determine how much - Minimum Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
37. F-Measure vs. Bookmaker Informedness - Minimum Effort: Which of these is the best metric for the minimum effort scenario? (F-Measure/Bookmaker Informedness)
37. Determine how much - Minimum Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
38. Markedness vs. Recall - Minimum Effort: Which of these is the best metric for the minimum effort scenario? (Markedness/Recall)
38. Determine how much - Minimum Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
39. Precision vs. Recall - Minimum Effort: Which of these is the best metric for the minimum effort scenario? (Precision/Recall)
39. Determine how much - Minimum Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
40. Bookmaker Informedness vs. Markedness - Minimum Effort: Which of these is the best metric for the minimum effort scenario? (Bookmaker Informedness/Markedness)
40. Determine how much - Minimum Effort: Intensity of the preference (1 - equal, 2 - moderate, 3 - strong, 4 - very strong, 5 - extreme)
- raw/Experts_Familiarity_With_Metrics_v01.csv:
Expert: Integer number identifying the expert (1-21)
How familiar are you with these metrics [Recall]: Familiarity with Recall (1 - First time I hear it, 2 - A little bit, 3 - have used it before, 4 - Used it many times, 5 - very well)
How familiar are you with these metrics [Precision]: Familiarity with Precision (1 - First time I hear it, 2 - A little bit, 3 - have used it before, 4 - Used it many times, 5 - very well)
How familiar are you with these metrics [F-Measure]: Familiarity with F-Measure (1 - First time I hear it, 2 - A little bit, 3 - have used it before, 4 - Used it many times, 5 - very well)
How familiar are you with these metrics [Bookmaker Informedness]: Familiarity with Informedness (1 - First time I hear it, 2 - A little bit, 3 - have used it before, 4 - Used it many times, 5 - very well)
How familiar are you with these metrics [Markedness]: Familiarity with Markedness (1 - First time I hear it, 2 - A little bit, 3 - have used it before, 4 - Used it many times, 5 - very well)
- processed/Experts_Preferences_Matrix_Consistency_Ratio_v01.csv:
Expert: Integer number identifying the expert (1-21)
Business Critical: Consistency Ratio (CR) of the pairwise comparison matrix for the business critical scenario (0.0-1.0, matrix consistent if CR < 0.20).
Heightened Critical: Consistency Ratio (CR) of the pairwise comparison matrix for the heightened critical scenario (0.0-1.0, matrix consistent if CR < 0.20).
Best Effort: Consistency Ratio (CR) of the pairwise comparison matrix for the best effort scenario (0.0-1.0, matrix consistent if CR < 0.20).
Minimum Effort: Consistency Ratio (CR) of the pairwise comparison matrix for the minimum effort scenario (0.0-1.0, matrix consistent if CR < 0.20).
- processed/Experts_Weight_According_To_Familiarity_v01.csv:
Expert: Integer number identifying the expert (1-21)
Business Critical: Contribution of this expert towards the business critical scenario (0.0-1.0).
Heightened Critical: Contribution of this expert towards the heightened critical scenario (0.0-1.0).
Best Effort: Contribution of this expert towards the best effort scenario (0.0-1.0).
Minimum Effort: Contribution of this expert towards the minimum effort scenario (0.0-1.0).
- processed/Consensus_Priority_Vector_For_Each_Scenario_v01.csv:
Scenario: Identifies each possible scenario (Business Critical/Heightened Critical/Best Effort/Minimum Effort).
Recall: Contribution of this metric towards this scenario (0.0-1.0).
Precision: Contribution of this metric towards this scenario (0.0-1.0).
F-Measure: Contribution of this metric towards this scenario (0.0-1.0).
Informedness: Contribution of this metric towards this scenario (0.0-1.0).
Markedness: Contribution of this metric towards this scenario (0.0-1.0).
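
Example: consistency ratio and consensus priority vector
The consistency check and the aggregation described under "Methodology information" can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used to produce the processed files. The function names, the example matrices and the familiarity weights are hypothetical. The CI formula (PEV - n) / (n - 1) is the usual AHP definition and is assumed here. The RANDOM_INDEX values are assumed and should be checked against the table in the cited Saaty paper. The final renormalisation of the consensus vector is also an assumption.

```python
import math

# Random consistency index (RI) per matrix size. ASSUMED values: verify them
# against the table in the Saaty (2003) paper cited in the methodology.
RANDOM_INDEX = {3: 0.52, 4: 0.89, 5: 1.11}


def priority_vector(matrix):
    """Row geometric means of the matrix, normalised to sum to 1."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, random_index=RANDOM_INDEX):
    """CR = CI / RI, with the PEV estimated from column sums and weights."""
    n = len(matrix)
    weights = priority_vector(matrix)
    column_sums = [sum(row[j] for row in matrix) for j in range(n)]
    # i) column sums multiplied by the priority weights and added up
    pev = sum(c * w for c, w in zip(column_sums, weights))
    # ii) consistency index (assumed standard AHP formula)
    ci = (pev - n) / (n - 1)
    # iii) normalisation by the random consistency index
    return ci / random_index[n]


def consensus_vector(vectors, expert_weights):
    """Weighted geometric mean of the experts' priority vectors."""
    total_weight = sum(expert_weights)
    exponents = [w / total_weight for w in expert_weights]
    size = len(vectors[0])
    combined = [
        math.prod(v[i] ** e for v, e in zip(vectors, exponents))
        for i in range(size)
    ]
    total = sum(combined)  # renormalisation: an assumption of this sketch
    return [c / total for c in combined]


# Two hypothetical experts comparing three metrics (illustration only).
consistent = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
cyclic = [[1.0, 5.0, 0.2], [0.2, 1.0, 5.0], [5.0, 0.2, 1.0]]

crs = [consistency_ratio(consistent), consistency_ratio(cyclic)]
familiarity_weights = [0.5, 0.5]  # hypothetical expert contributions
# Experts whose matrix is inconsistent (CR >= 0.20) get weight 0.00.
expert_weights = [w if cr < 0.20 else 0.0
                  for w, cr in zip(familiarity_weights, crs)]
vectors = [priority_vector(consistent), priority_vector(cyclic)]
consensus = consensus_vector(vectors, expert_weights)
print(crs, expert_weights, consensus)
```

In this example the second matrix encodes a circular preference, so its CR is far above 0.20, its expert receives weight 0.00, and the consensus vector reduces to the first expert's priority vector.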