Relative importance of Recall, Precision, F-Measure, Informedness, and Markedness metrics to evaluate security tools in Business Critical, Heightened Critical, Best Effort, and Minimum Effort scenarios, according to the declared preferences and familiarity with measures of experts in the domain

Martínez Raga, Miquel; Ruiz García, Juan Carlos; Antunes, Nuno; Andrés Martínez, David de; Vieira, Marco

doi:10.4995/Dataset/10251/162121



Title: Relative importance of Recall, Precision, F-Measure, Informedness, and Markedness metrics to evaluate security tools in Business Critical, Heightened Critical, Best Effort, and Minimum Effort scenarios, according to the declared preferences and familiarity with measures of experts in the domain

Authors: Martínez Raga, Miquel; Ruiz García, Juan Carlos; Antunes, Nuno; Andrés Martínez, David de; Vieira, Marco

Abstract: Security tools are benchmarked to determine which are most suitable for detecting system vulnerabilities or intrusions. The analysis is usually oversimplified by employing a single metric out of the large set available, so the decision may be biased by ignoring relevant information that the neglected metrics provide. This work proposes a novel approach that takes into account several metrics, different scenarios, and the advice of multiple experts. The proposal relies on experts quantifying the relative importance of each pair of metrics with respect to the requirements of a given scenario. Their judgments are aggregated using group decision-making techniques and weighted according to each expert's familiarity with the metrics and scenario, yielding a set of weights that account for the relative importance of each metric. Weight-based multi-criteria decision-making techniques can then be used to rank the benchmarked tools. This dataset contains raw data obtained from 21 experts, who declared their familiarity with the considered metrics and their preference for each metric in the considered scenarios. Processed data include the consistency ratio (CR) of the resulting pairwise comparison matrices (inconsistent matrices are rejected, i.e. given weight = 0.00), the relative contribution of each expert according to their declared familiarity with the metrics and the computed CRs, and the contribution (weight) of each metric towards each considered scenario.

Description: Experts were asked to complete a Google Forms questionnaire (https://goo.gl/forms/EEmkUmLIj20nMJS33) to compare all 5 metrics in pairs for the 4 considered scenarios (10 pairs per scenario, 40 comparisons in total). Two questions were defined for each pairwise comparison: i) which of the two presented metrics is preferred (A/B), and ii) how intense this preference is (following Saaty's fundamental scale of absolute numbers: 1-5). Likewise, they declared their familiarity with the considered metrics on a 1-5 Likert scale. This information is then used to compute each expert's individual judgement by i) computing the geometric mean of each row of their pairwise comparison matrix, ii) summing all computed geometric means, and iii) dividing each geometric mean by the resulting sum. The result is a priority vector. The Consistency Ratio (CR) is computed in three successive steps: i) the Principal Eigen Vector (PEV) is calculated by multiplying the column sums of the pairwise comparison matrix by the corresponding weights in the priority vector, ii) a consistency index (CI) is derived from the PEV and the number of metrics under study, and iii) the CR is obtained by normalizing the CI by the random consistency index (RI), which is taken directly from a table defined in T. L. Saaty, "Decision-making with the AHP: Why is the principal eigenvector necessary," European Journal of Operational Research, vol. 145, no. 1, pp. 85-91, 2003. Inconsistent matrices are not taken into account (weight 0.00). The familiarity declared by each expert is used to compute, using the row geometric mean, the contribution (weight) that their preferences for metrics will have in each scenario. The weight of each metric for each scenario (consensus priority vector) is likewise obtained using the weighted geometric mean.
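The processing steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' actual processing code: the function names are hypothetical, the CI formula (value − n) / (n − 1) is the usual AHP definition and is assumed here because the record does not spell it out, and the RI value must be supplied by the caller from Saaty's table.

```python
import math


def priority_vector(matrix):
    """Row geometric means, normalized so that the weights sum to 1."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights, random_index):
    """CR = CI / RI, following the three steps in the description.

    Assumption: CI = (value - n) / (n - 1), the usual AHP definition.
    random_index is RI for matrices of size n, taken from Saaty's table.
    """
    n = len(matrix)
    # Step i: column sums multiplied by the matching priority weights.
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    principal_value = sum(c * w for c, w in zip(column_sums, weights))
    # Step ii: consistency index from that value and the number of metrics.
    ci = (principal_value - n) / (n - 1)
    # Step iii: normalize by the random consistency index.
    return ci / random_index


def weighted_geometric_mean(vectors, expert_weights):
    """Aggregate per-expert priority vectors into a consensus vector.

    Experts with weight 0.00 (rejected matrices) contribute a factor of 1,
    so they have no effect. At least one weight must be non-zero.
    """
    total = sum(expert_weights)
    size = len(vectors[0])
    combined = [
        math.prod(v[k] ** (w / total) for v, w in zip(vectors, expert_weights))
        for k in range(size)
    ]
    norm = sum(combined)
    return [c / norm for c in combined]


if __name__ == "__main__":
    # Hypothetical 3x3 comparison matrix, used only to exercise the functions.
    example = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
    weights = priority_vector(example)
    # 0.58 is a commonly cited RI for n = 3; check it against Saaty's table.
    print(weights, consistency_ratio(example, weights, 0.58))
```

The record does not state the CR threshold above which a matrix counts as inconsistent; 0.10 is the cutoff commonly used with this method, so treat any specific threshold as an assumption when reproducing the processed data.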

URI: http://hdl.handle.net/10251/162121

Date: 2021-02-23
