Integration of feature vector selection and support vector machine for classification of imbalanced data

Abstract: Support Vector Machines (SVMs) have been widely developed for tackling classification problems. Imbalanced data arise in many practical classification problems, where the minority class is usually the one of interest. Undersampling is a popular solution for such problems, but it risks losing useful information in the original data. Tuning the hyperparameters of an SVM is also challenging. By analyzing the geometrical meaning of kernel methods, this paper proposes an approach that combines a modified Feature Vector Selection (FVS) method with maximal between-class separability and an easy-to-tune version of SVM, i.e. Feature Vector Regression (FVR), proposed in our previous work. The modified FVS method selects a small number of data points that can linearly represent the whole dataset in the Reproducing Kernel Hilbert Space (RKHS); the selected data points also give maximal separability of the imbalanced data in the RKHS. The FVR model is solved analytically, as in least-squares SVM. The decision threshold for classification is optimized to maximize a predefined accuracy metric. Twenty-six imbalanced datasets are considered, and comparisons are carried out with several SVM-based methods for imbalanced data. A statistical test shows the effectiveness of the proposed method.
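
The last two steps of the abstract (an analytic, least-squares-style solve followed by tuning of the decision threshold) can be illustrated with a minimal sketch. It is not the authors' FVS or FVR method: it assumes a plain RBF kernel least-squares classifier without a bias term, uses all training points instead of selected feature vectors, and takes the G-mean as the accuracy metric, which the abstract does not specify. All function names, the synthetic data, and the parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_kernel_least_squares(X, y, gamma=1.0, reg=1e-2):
    # Closed-form solve of (K + reg * I) alpha = y (assumed stand-in model).
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + reg * np.eye(len(X)), y)

def g_mean(y_true, y_pred):
    # Geometric mean of sensitivity and specificity; labels are in {-1, +1}.
    pos = y_true == 1
    tpr = np.mean(y_pred[pos] == 1)
    tnr = np.mean(y_pred[~pos] == -1)
    return np.sqrt(tpr * tnr)

def best_threshold(s, y_true):
    # Evaluate every distinct split of the sorted scores and keep the best.
    u = np.unique(s)
    cands = np.concatenate(([u[0] - 1.0], (u[:-1] + u[1:]) / 2.0, [u[-1] + 1.0]))
    vals = [g_mean(y_true, np.where(s > t, 1, -1)) for t in cands]
    i = int(np.argmax(vals))
    return cands[i], vals[i]

# Synthetic imbalanced data: 100 majority points, 10 minority points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (10, 2))])
y = np.concatenate([-np.ones(100), np.ones(10)])

alpha = fit_kernel_least_squares(X, y, gamma=0.5, reg=1.0)
s = rbf_kernel(X, X, gamma=0.5) @ alpha       # real-valued training scores
gm_default = g_mean(y, np.where(s > 0, 1, -1))  # naive threshold at zero
t_best, gm_best = best_threshold(s, y)
print(gm_default, t_best, gm_best)
```

Because the candidate thresholds cover every distinct split of the scores, the tuned G-mean on the training data can never be lower than the one obtained with the default threshold of zero. In practice the threshold would be chosen on held-out data rather than on the training scores.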
Document type:
Journal article

https://hal-mines-paristech.archives-ouvertes.fr/hal-01962339
Contributor: Magalie Prudon
Submitted on: Thursday, 20 December 2018 - 15:22:38
Last modified on: Tuesday, 13 August 2019 - 11:10:04

Identifiers

  • HAL Id : hal-01962339, version 1

Citation

Jie Liu, Enrico Zio. Integration of feature vector selection and support vector machine for classification of imbalanced data. Applied Soft Computing, Elsevier, 2019, 75, pp.702-711. ⟨hal-01962339⟩
