Abstract: Crop yield prediction is a paramount issue in
agriculture. The main aim of this paper is to find an efficient
way to predict corn yield from meteorological records.
The prediction models used in this paper can be classified into
model-driven approaches and data-driven approaches, according to
the different modeling methodologies.
The model-driven approaches are based on mechanistic crop
modeling. They describe crop growth in interaction with its
environment as a dynamical system. However, calibrating such a
dynamical system is difficult, because it
turns out to be a multidimensional non-convex optimization problem.
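To make the calibration difficulty concrete, the following is a minimal sketch of least-squares calibration over many scenarios, each pairing a climatic series with a final yield. Everything in it is an assumption for illustration: `toy_yield` is a hypothetical two-parameter stand-in and not CORNFLO, the data are synthetic, and the multi-start stochastic hill climbing is a generic strategy for non-convex problems rather than the MSPE procedure itself.

```python
import math
import random

def toy_yield(theta, climate):
    """Hypothetical stand-in for a mechanistic crop model (NOT CORNFLO).

    theta = (t_opt, width): assumed optimal temperature and tolerance width.
    climate = list of daily (temperature, radiation) pairs.
    """
    t_opt, width = theta
    return sum(rad * math.exp(-((temp - t_opt) / width) ** 2)
               for temp, rad in climate)

def loss(theta, scenarios):
    """Sum of squared yield errors over all (climate, observed_yield) scenarios."""
    return sum((toy_yield(theta, c) - y) ** 2 for c, y in scenarios)

def local_search(theta, scenarios, rng, steps=150, scale=1.0):
    """Stochastic hill climbing from one start; keeps only improving moves."""
    best, best_loss = theta, loss(theta, scenarios)
    for _ in range(steps):
        cand = (best[0] + rng.gauss(0, scale),
                max(0.5, best[1] + rng.gauss(0, scale)))  # keep width positive
        cand_loss = loss(cand, scenarios)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss

def calibrate(scenarios, n_starts=10, seed=0):
    """Multi-start search: a single start may stall in a poor local minimum."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = (rng.uniform(5, 40), rng.uniform(1, 15))
        results.append(local_search(start, scenarios, rng))
    return min(results, key=lambda r: r[1])

def make_scenarios(true_theta, n=12, days=30, seed=1):
    """Synthetic scenarios: only climate and final yield are 'observed'."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n):
        base = rng.uniform(15, 30)
        climate = [(base + rng.gauss(0, 4), rng.uniform(5, 25))
                   for _ in range(days)]
        scenarios.append((climate, toy_yield(true_theta, climate)))
    return scenarios

TRUE_THETA = (24.0, 8.0)  # assumed "true" parameters for the synthetic data
scenarios = make_scenarios(TRUE_THETA)
theta_hat, best_loss = calibrate(scenarios)
print("estimated parameters:", theta_hat, "loss:", best_loss)
```

Because only final yields are observed, different parameter combinations can fit the data almost equally well, which is one reason restarts from several initial points are used in this sketch.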
An original contribution of this paper is to propose a statistical
methodology, Multi-Scenarios Parameters Estimation (MSPE), for the
parametrization of potentially complex mechanistic models from a
new type of dataset (climatic data and final yield in many situations).
It is tested with CORNFLO, a crop model for maize growth.
On the other hand, the data-driven approach to yield prediction
does not require modeling the complex biophysical processes, but it
places strict requirements on the dataset.
A second contribution of the paper is the comparison of these
model-driven methods with classical data-driven methods. For this
purpose, we consider two classes of regression methods: methods
derived from linear regression (Ridge and Lasso Regression, Principal
Components Regression and Partial Least Squares Regression) and
machine learning methods (Random Forest, k-Nearest Neighbor,
Artificial Neural Network and SVM regression).
The dataset consists of 720 records of corn yield at county scale
provided by the United States Department of Agriculture (USDA) and
the associated climatic data. A 5-fold cross-validation process and
two accuracy metrics, root mean square error of prediction (RMSEP) and
mean absolute error of prediction (MAEP), were used to evaluate the
crop yield prediction capacity.
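The evaluation protocol can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic (not the USDA records), a plain k-nearest-neighbor regressor stands in for the full set of compared methods, and errors are pooled over all held-out records before computing RMSEP and MAEP. The abstract reports MAEP as a percentage, but the normalization is not given here, so the sketch computes the metrics in the units of the target.

```python
import math
import random

def knn_predict(train_X, train_y, x, k=5):
    """Average the targets of the k training points closest to x."""
    dists = sorted((math.dist(row, x), t) for row, t in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(t for _, t in nearest) / len(nearest)

def k_fold_indices(n, n_folds=5, seed=0):
    """Shuffle record indices and split them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def cross_validate(X, y, n_folds=5, k=5, seed=0):
    """Return (RMSEP, MAEP) computed on pooled held-out predictions."""
    errors = []
    for test_idx in k_fold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in test_idx:
            errors.append(knn_predict(train_X, train_y, X[i], k) - y[i])
    rmsep = math.sqrt(sum(e * e for e in errors) / len(errors))
    maep = sum(abs(e) for e in errors) / len(errors)
    return rmsep, maep

# Synthetic stand-in data: (mean temperature, rainfall) -> yield.
rng = random.Random(42)
X = [(rng.uniform(15, 30), rng.uniform(20, 80)) for _ in range(100)]
y = [100 + 2.0 * t - 0.02 * (r - 50) ** 2 + rng.gauss(0, 3) for t, r in X]

rmsep, maep = cross_validate(X, y)
print("RMSEP:", rmsep, "MAEP:", maep)
```

RMSEP penalizes large individual errors more heavily than MAEP, so reporting both gives a rough indication of whether a method's error is dominated by a few badly predicted records.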
The results show that among the data-driven approaches, Random
Forest is the most robust and generally achieves the lowest prediction
error (MAEP 4.27%). It also outperforms our model-driven approach
(MAEP 6.11%). However, calibrating the mechanistic
model from easily accessible data opens several further perspectives.
The mechanistic model can potentially help to highlight the stresses
suffered by the crop or to identify the biological parameters of interest
for breeding purposes. For this reason, an interesting perspective is
to combine these two types of approaches.
Abstract: In comparison to the original SVM, which involves a
quadratic programming task, LS-SVM simplifies the required
computation, but unfortunately the sparseness of the standard SVM is
lost. Another problem is that LS-SVM is only optimal if the training
samples are corrupted by Gaussian noise. In Least Squares SVM
(LS-SVM), the nonlinear solution is obtained by first mapping the
input vector nonlinearly to a high-dimensional kernel space,
where the solution is calculated from a linear equation set. In
this paper a geometric view of the kernel space is introduced, which
enables us to develop a new formulation to achieve a sparse and
robust estimate.
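For orientation, the linear equation set of standard LS-SVM regression can be sketched as below. This shows the baseline formulation that the abstract contrasts against, not the new sparse and robust formulation, which is not specified here. The RBF kernel, the values of `GAMMA` and `SIGMA`, and the sine-curve data are assumptions chosen for illustration.

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two input vectors."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def solve_linear(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization on the diagonal
        A.append(row)
    z = solve_linear(A, [0.0] + list(y))
    return z[0], z[1:]  # bias b, dual coefficients alpha

def lssvm_predict(X_train, b, alpha, x, sigma=1.0):
    """f(x) = b + sum_i alpha_i * K(x_i, x)."""
    return b + sum(a * rbf_kernel(xi, x, sigma) for a, xi in zip(alpha, X_train))

GAMMA, SIGMA = 10.0, 1.0                 # assumed hyperparameters
X = [(i / 10.0,) for i in range(20)]     # synthetic 1-D inputs
y = [math.sin(x[0]) for x in X]
b, alpha = lssvm_fit(X, y, gamma=GAMMA, sigma=SIGMA)
print("prediction at 0.55:", lssvm_predict(X, b, alpha, (0.55,), sigma=SIGMA))
```

In this formulation each coefficient satisfies alpha_i = gamma * (y_i - f(x_i)), so every training sample with a nonzero residual contributes to the solution. That is the loss of sparseness mentioned above, and the squared-error term is what ties optimality to Gaussian noise.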
Abstract: To extract the important physiological factors related to
diabetes from an oral glucose tolerance test (OGTT) by mathematical
modeling, highly informative but convenient protocols are required.
Current models require a large number of samples and an extended
period of testing, which are not practical for daily use. The purpose
of this study is to make model assessments possible even from a
reduced number of samples taken over a relatively short period.
For this purpose, test values were extrapolated using a support
vector machine. A good correlation was found between reference and
extrapolated values in the 741 OGTTs evaluated. This result indicates
that a reduction in the number of clinical tests is possible through a
computational approach.