Further Details on Examining Adversarial Evaluation: Role of Difficulty

Mehrbakhsh, Behzad; Martínez-Plumed, Fernando; Hernández-Orallo, José

dc.contributor.author	Mehrbakhsh, Behzad	es_ES
dc.contributor.author	Martínez-Plumed, Fernando	es_ES
dc.contributor.author	Hernández-Orallo, José	es_ES
dc.date.accessioned	2023-07-28T09:55:05Z
dc.date.available	2023-07-28T09:55:05Z
dc.date.issued	2023-07-28T09:55:05Z
dc.identifier.uri	http://hdl.handle.net/10251/195689
dc.description.abstract	Adversarial benchmark construction, where harder instances challenge new generations of AI systems, is becoming the norm. While this approach may lead to better machine learning models (on average and for the new benchmark), it is unclear how these models behave on the original distribution. Two opposing effects are intertwined here. On the one hand, the adversarial benchmark has a higher proportion of difficult instances, with lower expected performance. On the other hand, models trained on the adversarial benchmark may improve on these difficult instances (but may also neglect some easy ones). To disentangle these two effects we can control for difficulty, showing that we can recover the performance on the original distribution, provided the harder instances were obtained from this distribution in the first place. We show this difficulty-aware rectification works in practice, through a series of experiments with several benchmark construction schemas and the use of a populational difficulty metric. As a take-away message, instead of distributional averages we recommend using difficulty-conditioned characteristic curves when evaluating models built with adversarial benchmarks.	es_ES
dc.description.sponsorship	We thank the anonymous reviewers for their comments. This work was funded by valgrAI, the Norwegian Research Council grant 329745 Machine Teaching for Explainable AI, the Future of Life Institute, FLI, under grant RFP2-152, the EU (FEDER) and Spanish grant RTI2018-094403-B-C32 funded by MCIN/AEI/10.13039/501100011033 and by CIPROM/2022/6 funded by Generalitat Valenciana, EU’s Horizon 2020 research and innovation programme under grant agreement No. 952215 (TAILOR), US DARPA HR00112120007 (RECoG-AI) and Spanish grant PID2021-122830OB-C42 (SFERA) funded by MCIN/AEI/10.13039/501100011033 and "ERDF A way of making Europe". In compliance with the recommendations of the Science paper about reporting of evaluation results in AI [3], we include all the results at the instance level.	es_ES
dc.language	English	es_ES
dc.rights	Attribution (by)	es_ES
dc.subject	Class difficulty	es_ES
dc.subject	Adversarial robustness	es_ES
dc.subject	Artificial Intelligence (AI)	es_ES
dc.subject	Adversarial Benchmark	es_ES
dc.subject	AI Evaluation	es_ES
dc.title	Further Details on Examining Adversarial Evaluation: Role of Difficulty	es_ES
dc.type	Other	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//CIPROM%2F2022%2F006/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI//PID2021-122830OB-C42//MÉTODOS FORMALES ESCALABLES PARA APLICACIONES REALES/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-094403-B-C32/ES/RAZONAMIENTO FORMAL PARA TECNOLOGIAS FACILITADORAS Y EMERGENTES/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Instituto Universitario Valenciano de Investigación en Inteligencia Artificial - Institut Universitari Valencià de Recerca en Intel·ligència Artificial	es_ES
dc.contributor.affiliation	Valencian Graduate School and Research Network of Artificial Intelligence (ValgrAI)	es_ES
dc.description.bibliographicCitation	Mehrbakhsh, B.; Martínez-Plumed, F.; Hernández-Orallo, J. (2023). Further Details on Examining Adversarial Evaluation: Role of Difficulty. http://hdl.handle.net/10251/195689	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES

Files in this item

Further Details...HAPP.pdf (PDF, 4.365Mb)

This item appears in the following collection(s)

Servicios y unidades UPV. Material de investigación [78]
