Nondestructive Determination of Protein and Oil Contents in Cottonseed Using Least-Squares Support Vector Machines and Near Infrared Spectroscopy

Friday, January 6, 2012: 8:00 AM
Crystal Ballroom C (Orlando World Center Marriott)
Shuijin Zhu, Zhejiang University
Zhuang-Rong Haung, Zhejiang University
Cotton is an important fiber crop worldwide. It is also the second most important potential source of plant protein after soybean, and the fifth-ranked oil-bearing crop. The standard wet chemical methods for determining protein and oil contents are time-consuming and labor-intensive. In this work, a new method was developed for fast, nondestructive determination of protein and oil contents in intact cottonseed, using multivariate spectral analysis with a least-squares support vector machine (LS-SVM). Standard normal variate (SNV) and Savitzky-Golay (SG) derivative transformations were applied for spectral preprocessing. Monte Carlo uninformative variable elimination (MC-UVE) was used as the variable selection technique in multivariate calibration. In addition, the LS-SVM parameters were optimized with genetic algorithms (GA). The MC-UVE-LS-SVM models predicted much better than the optimal partial least squares (PLS) and LS-SVM models built on full-spectrum data, and better than the MC-UVE-PLS models. The correlation coefficient (R²), residual predictive deviation (RPD) and root mean square error of prediction (RMSEP) were 0.959, 4.871 and 0.977 for protein, and 0.950, 4.429 and 0.834 for oil, respectively. These results show that robust, nondestructive models can be built to quantify protein and oil contents using near infrared (NIR) spectroscopy. Furthermore, the nonlinear LS-SVM based on variables selected by MC-UVE appears to be a better-performing alternative for NIR data analysis than conventional calibration techniques.
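The processing chain named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the SG window and polynomial order, the RBF kernel form exp(-d²/(2σ²)), and the definition of RPD as the standard deviation of the reference values divided by RMSEP are all assumptions made here for illustration. MC-UVE variable selection and the GA search over the LS-SVM parameters (called gamma and sigma2 below) are not shown.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def sg_derivative(spectra, window_length=11, polyorder=2, deriv=1):
    """Savitzky-Golay derivative along the wavelength axis (settings are assumed)."""
    return savgol_filter(np.asarray(spectra, dtype=float), window_length=window_length,
                         polyorder=polyorder, deriv=deriv, axis=1)


def rbf_kernel(a, b, sigma2):
    """RBF kernel matrix; the 2*sigma2 scaling is one common convention (assumed)."""
    sq = (np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))


def lssvm_fit(x, y, gamma, sigma2):
    """Solve the LS-SVM regression system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(x, x, sigma2) + np.eye(n) / gamma
    solution = np.linalg.solve(system, np.concatenate(([0.0], y)))
    return solution[0], solution[1:]  # bias b, support values alpha


def lssvm_predict(x_train, bias, alpha, x_new, sigma2):
    """Predict with a fitted LS-SVM: y(x) = sum_i alpha_i * k(x, x_i) + b."""
    x_train = np.asarray(x_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    return rbf_kernel(x_new, x_train, sigma2) @ alpha + bias


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rpd(y_true, y_pred):
    """Residual predictive deviation, taken here as SD(reference) / RMSEP (assumed)."""
    return float(np.std(np.asarray(y_true, dtype=float), ddof=1) / rmsep(y_true, y_pred))
```

In use, the calibration spectra would pass through snv and sg_derivative, be reduced to the selected wavelengths, and then go to lssvm_fit; rmsep and rpd would be computed on a separate prediction set.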