Advantages of Oversampling Techniques: A Case Study in Risk Factors for Fall Prediction
ISSN
1865-0929
Date Issued
2023-01-01
Author(s)
Sihag, Gulshan
Yadav, Pankaj
Vijay, Vivek
Delcroix, Veronique
Siebert, Xavier
Yadav, Sandeep Kumar
Puisieux, François
DOI
10.1007/978-3-031-37496-8_4
Abstract
The evaluation of risk factors for falls (RFF) is a key point in fall prevention for the elderly. Because the main actionable RFF cannot always be re-evaluated regularly by medical staff, predicting them automatically would make it possible to provide useful recommendations for reducing the risk of falls. This article explores the advantages of three oversampling methods for improving the prediction of 12 target RFF from a real, imbalanced data set. We first present the data set, together with the selection of 45 variables and 12 target variables and the other pre-processing steps. Second, we present the three oversampling methods (SMOTE, SVM-SMOTE, and ADASYN), the classifiers (Logistic Regression, Random Forest, Bayesian Network, Artificial Neural Network, and Naive Bayes), and the quality measures used in this study (balanced accuracy, area under the ROC curve, area under the Precision-Recall curve, F1 score, and F2 score). Each target is predicted in turn from all the other variables. Results are reported by classifier (averaged over targets) and by target (averaged over classifiers) for each oversampling method and quality measure. Finally, statistical tests confirm the benefit of using oversampling. All three methods show a clear advantage over the original imbalanced data set, and SVM-SMOTE yields the largest improvement.
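The abstract names SMOTE and several quality measures without defining them. The following is a minimal pure-Python sketch, not the authors' code and not the imbalanced-learn implementation, of basic SMOTE interpolation together with balanced accuracy and the F-beta score (F1 for beta = 1, F2 for beta = 2). It assumes binary labels with 1 meaning "risk factor present" (the minority class) and numeric feature vectors; the function names `smote`, `balanced_accuracy`, and `f_beta` are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE (sketch): create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (Euclidean), excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def _counts(y_true, y_pred):
    """Confusion-matrix counts for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (assumes both classes occur in y_true)."""
    tp, fp, fn, tn = _counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def f_beta(y_true, y_pred, beta=1.0):
    """F-beta score; beta > 1 weights recall more than precision."""
    tp, fp, fn, _ = _counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, on a target where only 1 patient in 10 has the risk factor, a classifier that always predicts "absent" reaches 90% plain accuracy, but `balanced_accuracy([1] + [0] * 9, [0] * 10)` is 0.5, which is why imbalance-aware measures are used. SVM-SMOTE and ADASYN keep the same interpolation step but change which minority samples serve as seeds: SVM-SMOTE concentrates on minority samples near an SVM decision boundary, and ADASYN generates more samples around minority points whose neighbourhoods are dominated by the majority class.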