Title: Further Details on Examining Adversarial Evaluation: Role of Difficulty
Authors: Mehrbakhsh, Behzad; Martínez-Plumed, Fernando; Hernández-Orallo, José
Abstract: Adversarial benchmark construction, where harder instances challenge new generations of AI systems, is becoming the norm. While this approach may lead to better machine learning models (on average and for the new benchmark), it is unclear how these models behave on the original distribution. Two opposing effects are intertwined here. On the one hand, the adversarial benchmark has a higher proportion of difficult instances, with lower expected performance. On the other hand, models trained on the adversarial benchmark may improve on these difficult instances (but may also neglect some easy ones). To disentangle these two effects, we can control for difficulty, showing that we can recover the performance on the original distribution, provided the harder instances were obtained from that distribution in the first place. We show that this difficulty-aware rectification works in practice, through a series of experiments with several benchmark construction schemas and the use of a populational difficulty metric. As a take-away message, instead of distributional averages, we recommend using difficulty-conditioned characteristic curves when evaluating models built with adversarial benchmarks.
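To make the rectification idea described above concrete, the following is a minimal sketch (not the authors' code) of a difficulty-conditioned estimate: accuracy is measured per difficulty bin on the adversarial benchmark and then reweighted by the difficulty distribution of the original benchmark, which amounts to reading off the characteristic curve under the original difficulty mix. All function and variable names, the binning scheme, and the use of NumPy arrays are illustrative assumptions.

```python
# Sketch of difficulty-conditioned rectification (assumed implementation).
import numpy as np

def rectified_accuracy(difficulty_adv, correct_adv, difficulty_orig, n_bins=10):
    """Estimate original-distribution accuracy from adversarial-benchmark results.

    difficulty_adv  : per-instance difficulty scores of the adversarial test set
    correct_adv     : 0/1 correctness of the model on those instances
    difficulty_orig : difficulty scores sampled from the original distribution
    """
    # Shared bin edges so both sets are conditioned on the same difficulty scale.
    edges = np.linspace(min(difficulty_adv.min(), difficulty_orig.min()),
                        max(difficulty_adv.max(), difficulty_orig.max()), n_bins + 1)
    bins_adv = np.clip(np.digitize(difficulty_adv, edges) - 1, 0, n_bins - 1)
    bins_orig = np.clip(np.digitize(difficulty_orig, edges) - 1, 0, n_bins - 1)

    # Characteristic curve: accuracy as a function of difficulty bin.
    acc_per_bin = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = bins_adv == b
        if mask.any():
            acc_per_bin[b] = correct_adv[mask].mean()

    # Weight of each bin under the *original* difficulty distribution.
    weights = np.bincount(bins_orig, minlength=n_bins) / len(difficulty_orig)

    # Reweighted (rectified) accuracy, ignoring bins with no adversarial coverage.
    covered = ~np.isnan(acc_per_bin)
    return np.sum(acc_per_bin[covered] * weights[covered]) / weights[covered].sum()
```

The per-bin accuracies form the difficulty-conditioned characteristic curve recommended in the abstract; the reweighting step only recovers the original-distribution performance when the adversarial instances cover the same difficulty range as the original distribution, which mirrors the proviso that the harder instances were drawn from that distribution in the first place.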