Adversarial Validation in Image Classification Datasets by Means of Cumulative Spectral Gradient
Journal
Algorithms
ISSN
1999-4893
Publisher
MDPI
Date Issued
2024
Author(s)
Renza, Diego
Chavarro, Adrian
Type
Journal article
Abstract
The main objective of a machine learning (ML) system is to obtain a trained model from input data that can make predictions on new independent and identically distributed (i.i.d.) data with the lowest possible error. But how can we assess whether the training and test data follow a similar distribution? To answer this question, this paper proposes a method for determining the degree of distribution shift between two datasets. The method relies on a dataset complexity metric, the Cumulative Spectral Gradient, which can be applied to multi-class problems by comparing each pair of classes across the two sets. The methodology was applied to three well-known datasets, MNIST, CIFAR-10 and CIFAR-100, together with corrupted versions of them. It makes it possible to evaluate which types of modification have the greatest impact on model generalization without repeatedly training multiple models, and to determine which classes are most affected by corruption. © The authors © MDPI © Algorithms.
License
Open Access
How to cite
Renza, D., Moya-Albor, E., & Chavarro, A. (2024). Adversarial Validation in Image Classification Datasets by Means of Cumulative Spectral Gradient. Algorithms, 17(11), 531. https://doi.org/10.3390/a17110531
