Prediction of transport behavior of nanoparticles using machine learning algorithm: Physical significance of important features
ISSN
01697722
Date Issued
2023-09-01
Author(s)
Banerjee, Sayan
Bhavna, Km
Raychoudhury, Trishikhi
DOI
10.1016/j.jconhyd.2023.104237
Abstract
There is rising concern about the possible risk of human exposure to nanoparticles (NPs). Several studies have reported on the transport behavior of NPs in porous media under varying conditions. Thus, there is scope to use this information in a predictive model so that the transport behavior of unexplored NPs can be predicted. The main focus of this study, therefore, is to apply different machine learning (ML) based models to predict the transport efficiency of a wide range of NPs and to identify the important features. To achieve this objective, the dataset was first prepared by extracting data from published papers for selected NPs [e.g., silver (nAg), titanium dioxide (nTiO2), zinc oxide (nZnO), and graphene oxide (nGO)]. Then, random forest, XGBoost, and CatBoost algorithms combined with the synthetic minority oversampling technique (SMOTE) were applied, with retention fraction (RF) as the target feature and particle characteristics (i.e., surface charge, size, concentration), solution chemistry [pH, ionic strength (IS)], porous media properties (grain size, porosity), and flow rate as the training features. The results indicate that CatBoost combined with SMOTE performed best in predicting RF for the entire range of NPs (R² > 0.89 and MSE < 0.007) as well as for individual NPs. Feature importance analysis indicates that four features, namely zeta potential, IS, pH, and either particle diameter (for the entire range of NPs, nGO, and nZnO) or grain size (for nAg and nTiO2), together account for a significant share of the weightage (>75%). The results suggest that these features, rather than the type of individual NP, dominate the prediction of transport behavior. The relative importance of the features depends on the range of the parameter used. The identified important features are consistent with the underlying physical processes, which makes the prediction model more reliable.
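The abstract reports model quality as a coefficient of determination (R²) and a mean squared error (MSE) on the predicted retention fraction. As a minimal sketch of how these two metrics are conventionally computed, the Python below uses the standard definitions; the function names and the sample values are illustrative assumptions and are not taken from the study's dataset or code.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left unexplained by the model.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical retention fractions (0 to 1), for illustration only.
observed_rf = [0.10, 0.35, 0.60, 0.85]
predicted_rf = [0.15, 0.30, 0.65, 0.80]

mse = mean_squared_error(observed_rf, predicted_rf)
r2 = r2_score(observed_rf, predicted_rf)
print(f"MSE = {mse:.4f}, R2 = {r2:.3f}")
```

Because RF is a fraction bounded between 0 and 1, an MSE threshold such as 0.007 corresponds to a typical absolute error of under about 0.09 in retention fraction, which helps put the reported figures in scale.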