Technical Report CMS-TR-20110401

Scientific Computing, Modeling & Simulation

Title	Predicting odor concentration from mass spectra using random forests
Author/s	A.M. Bhagwat (KERMIT, Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Coupure links 653, 9000 Ghent, Belgium; and Centre for Modeling and Simulation, Savitribai Phule Pune University, Pune 411 007, India); J. Van Durme (PRG Odournet NV, Industrieweg 114H, 9032 Ghent, Belgium); VK Jayaraman (Centre for Development of Advanced Computing (C-DAC), Pune 411 007, India); Mihir Arjunwadkar (Centre for Modeling and Simulation, Savitribai Phule Pune University, Pune 411 007, India); B. De Baets (KERMIT, Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Coupure links 653, 9000 Ghent, Belgium)
Abstract	In response to the growing demand for an automated technology to quantify odor concentration, we recently established a direct link between mass spectrometry and olfactometry using random forest regression. Here, we further improve the prediction performance of this methodology by tuning the random forest parameters Ntree and Mtry, performing a feature selection to find the best feature subset, investigating the impact of normalizing the data prior to regression, and analyzing the effect of including the industrial sector from which the sample originated as a feature. Optimizing Ntree leads to a 71 % improvement of the variance on the prediction, and optimizing Mtry and selecting the feature subset leads to an 18 % improvement of the mean prediction. The optimal feature subset contains only 14 out of a total of 243 features. Normalizing the features and adding the industrial sector as an additional feature do not have a significant effect on the prediction performance. This shows the robustness of random forests to different data distributions and suggests that the relationship between mass spectrum and odor concentration might hold in general, across several different industrial sectors. In the context of feature selection, we introduce the dynamic feature importance as a new measure that expresses the (non-)replaceability of a feature, in addition to the traditionally used measure that quantifies the feature’s contribution to the prediction performance. This new measure is useful in situations where a heuristic feature selection is performed and the modeling method is stochastic. The best model accounts for 48 %, or nearly half, of the observed variance. This is substantial but not yet enough for our method to be a ready-to-use alternative to sensory olfactometry. We are currently expanding the dataset and investigating new modeling approaches in order to progress towards that goal.
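As a rough illustration of what "accounting for 48 % of the observed variance" can mean, the sketch below computes the fraction of variance explained as 1 - SS_res / SS_tot. This definition is an assumption: the abstract does not state which formula the report uses. The function name and the numbers are hypothetical and are not taken from the report's data.

```python
def explained_variance_fraction(observed, predicted):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values.

    Assumed definition (the report's abstract does not specify one):
    SS_res is the sum of squared prediction errors, and SS_tot is the
    sum of squared deviations of the observations from their mean.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two observations")

    mean_observed = sum(observed) / len(observed)
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))

    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up example values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
fraction = explained_variance_fraction(observed, predicted)
print(f"Fraction of variance explained: {fraction:.2f}")
```

Under this definition, a model that always predicts the mean of the observations scores 0, and a perfect model scores 1, so a value of 0.48 sits roughly halfway between the two.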
Keywords	odor concentration, random forest, prediction, model, modeling, feature selection, subset selection, variable selection, OlfaMS technology, olfactometry, soft sensor
Download	See Contact
Citing This Document	A.M. Bhagwat, J. Van Durme, VK Jayaraman, Mihir Arjunwadkar, and B. De Baets, Predicting odor concentration from mass spectra using random forests. Technical Report CMS-TR-20110401 of the Centre for Modeling and Simulation, Savitribai Phule Pune University, Pune 411007, India (2011); available at http://scms.unipune.ac.in/reports/.
Notes, Published Reference, Etc.
Contact	bhagwataditya AT gmail.com
Supplementary Material