Leveraging Influence Functions for Dataset Exploration and Cleaning - IRIT - Université Toulouse III Paul Sabatier
Conference paper. Year: 2022

Leveraging Influence Functions for Dataset Exploration and Cleaning

Abstract

In this paper, we tackle the problem of finding potentially problematic samples and complex regions of the input space in large pools of data without any supervision, so that these findings can be relayed to and validated by a domain expert. This information can be critical: even a low level of noise in the dataset may severely bias the model through spurious correlations between unrelated samples, and under-represented groups of data points exacerbate the issue. We therefore present two practical applications of influence functions in neural network models to industrial use cases: dataset exploration and the cleanup of mislabeled examples. This tool from robust statistics approximates how much an estimator would change if the training dataset were slightly modified. In particular, we apply the technique to an ACAS Xu neural network surrogate model use case [14] for complex region exploration, and to the canonical CIFAR-10 RGB image classification problem [20] for mislabeled sample detection, with promising results.
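To make the idea of scoring training samples by their influence concrete, here is a minimal sketch. It is not the authors' implementation and it does not use a neural network: it assumes a small L2-regularized logistic regression on synthetic 2-D data, where the Hessian can be formed and inverted exactly. Each training point is scored by its self-influence, g_i^T H^{-1} g_i, where g_i is the gradient of the loss at that point and H is the Hessian of the regularized training objective. All function names, the synthetic data, and the hyperparameters (`l2`, `lr`, `steps`) are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, l2=1e-2, lr=0.2, steps=3000):
    """Fit L2-regularized logistic regression by plain gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y) + l2 * theta
        theta -= lr * grad
    return theta


def self_influence(X, y, theta, l2=1e-2):
    """Return g_i^T H^{-1} g_i for every training point i."""
    p = sigmoid(X @ theta)
    # Per-sample gradients of the (unregularized) log-loss, shape (n, d).
    per_sample_grads = (p - y)[:, None] * X
    # Hessian of the regularized mean log-loss, shape (d, d).
    weights = p * (1.0 - p)
    hessian = (X * weights[:, None]).T @ X / len(y) + l2 * np.eye(X.shape[1])
    # Solve H v_i = g_i for all i at once instead of inverting H explicitly.
    solved = np.linalg.solve(hessian, per_sample_grads.T).T
    return np.einsum("ij,ij->i", per_sample_grads, solved)


# Synthetic, well-separated two-class data (an assumption for illustration).
rng = np.random.default_rng(0)
n_per_class = 100
features = np.vstack([
    rng.normal(loc=2.0, size=(n_per_class, 2)),
    rng.normal(loc=-2.0, size=(n_per_class, 2)),
])
X = np.hstack([features, np.ones((2 * n_per_class, 1))])  # append a bias column
y_clean = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])

# Corrupt a handful of labels to simulate annotation noise.
flipped = rng.choice(len(y_clean), size=5, replace=False)
y_noisy = y_clean.copy()
y_noisy[flipped] = 1.0 - y_noisy[flipped]

theta = fit_logistic(X, y_noisy)
scores = self_influence(X, y_noisy, theta)

# The highest-scoring samples are the candidates to hand to a domain expert.
suspects = np.argsort(scores)[::-1][:10]
print("flipped labels :", sorted(flipped.tolist()))
print("top-10 suspects:", sorted(suspects.tolist()))
```

For a deep network the Hessian cannot be formed explicitly, so practical implementations typically replace the exact solve with an approximate inverse-Hessian-vector product; the ranking step at the end stays the same.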
Main file
Leveraging_Influence_Functions_for_Dataset_Exploration_and_Cleaning_finalV2.pdf (1.33 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03617649, version 1 (23-03-2022)

Identifiers

  • HAL Id: hal-03617649, version 1

Cite

Agustin Martin Picard, David Vigouroux, Petr Zamolodtchikov, Quentin Vincenot, Jean-Michel Loubes, et al. Leveraging Influence Functions for Dataset Exploration and Cleaning. 11th European Congress Embedded Real Time Systems (ERTS 2022), Jun 2022, Toulouse, France. pp. 1-8. ⟨hal-03617649⟩
411 views
386 downloads
