Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Crop yield prediction is a paramount issue in
agriculture. The main aim of this paper is to find an efficient
way to predict corn yield from meteorological records.
The prediction models used in this paper fall into two classes
according to their modeling methodology: model-driven approaches
and data-driven approaches. The model-driven approaches are based
on crop mechanistic modeling. They describe crop growth in
interaction with the environment as a dynamical system. Calibrating
such a dynamical system is difficult, however, because it turns out
to be a multidimensional non-convex optimization problem.
An original contribution of this paper is a statistical
methodology, Multi-Scenarios Parameters Estimation (MSPE), for the
parametrization of potentially complex mechanistic models from a
new type of dataset (climatic data and final yield in many situations).
It is tested with CORNFLO, a crop model for maize growth. The
data-driven approach to yield prediction, on the other hand, does not
require modeling the complex biophysical processes, but it places
strict requirements on the dataset.
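
To make the calibration problem concrete, the following minimal sketch fits the parameters of a toy crop model to final yields observed across many climatic scenarios. Everything in it is an assumption made for illustration: the model `simulate_yield`, its two parameters (`rue`, `t_opt`), the synthetic weather, and the plain random search used as a stand-in optimizer. It is not CORNFLO and not the MSPE procedure itself; it only shows the shape of a multi-scenario least-squares objective.

```python
import math
import random


def simulate_yield(params, weather):
    """Toy mechanistic model (hypothetical): sum of daily biomass gains.

    params  -- (rue, t_opt): illustrative radiation-use efficiency and
               optimal temperature.
    weather -- list of (radiation, temperature) pairs, one per day.
    """
    rue, t_opt = params
    total = 0.0
    for radiation, temperature in weather:
        # Bell-shaped temperature stress factor in (0, 1].
        stress = math.exp(-((temperature - t_opt) / 10.0) ** 2)
        total += rue * radiation * stress
    return total


def multi_scenario_cost(params, scenarios):
    """Sum of squared errors on final yield over all scenarios."""
    return sum((simulate_yield(params, w) - y) ** 2 for w, y in scenarios)


def random_search(cost, start, bounds, n_iter=500, seed=0):
    """Stand-in global optimizer: keep the best uniformly sampled candidate."""
    rng = random.Random(seed)
    best, best_cost = start, cost(start)
    for _ in range(n_iter):
        candidate = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


# Synthetic scenarios: each pairs a season of weather with a final yield
# generated from known "true" parameters (assumed values, for the demo only).
data_rng = random.Random(42)
true_params = (1.5, 25.0)
scenarios = []
for _ in range(20):
    weather = [(data_rng.uniform(10, 25), data_rng.uniform(12, 35))
               for _ in range(60)]
    scenarios.append((weather, simulate_yield(true_params, weather)))

start = (1.0, 20.0)
bounds = [(0.5, 3.0), (15.0, 35.0)]
start_cost = multi_scenario_cost(start, scenarios)
best_params, best_cost = random_search(
    lambda p: multi_scenario_cost(p, scenarios), start, bounds)
print("start cost:", start_cost)
print("best params:", best_params, "best cost:", best_cost)
```

Only the final yield of each scenario enters the objective, which mirrors the kind of dataset described above (climatic data plus final yield). In practice a more capable global optimizer would likely replace the random search.
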
A second contribution of the paper is the comparison of these
model-driven methods with classical data-driven methods. For this
purpose, we consider two classes of regression methods: methods
derived from linear regression (Ridge and Lasso Regression, Principal
Components Regression or Partial Least Squares Regression) and
machine learning methods (Random Forest, k-Nearest Neighbor,
Artificial Neural Network and SVM regression).
The dataset consists of 720 records of corn yield at county scale
provided by the United States Department of Agriculture (USDA) and
the associated climatic data. A 5-fold cross-validation process and
two accuracy metrics, root mean square error of prediction (RMSEP)
and mean absolute error of prediction (MAEP), were used to evaluate
the yield prediction capacity of each method.
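
The evaluation protocol can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic (720 records, to match the dataset size), the predictor is a one-variable least-squares line standing in for any of the regression methods, and the errors are expressed as a percentage of the mean observed yield. That normalization is an assumption, since the abstract reports percentages without defining them.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmsep(obs, pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def maep(obs, pred):
    """Mean absolute error of prediction."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def cross_validate(xs, ys, k=5, seed=0):
    """Pool held-out predictions over k folds and score them."""
    obs, pred = [], []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test_idx:
            obs.append(ys[i])
            pred.append(a + b * xs[i])
    mean_obs = sum(obs) / len(obs)
    # Assumed normalization: errors as a percentage of mean observed yield.
    return {
        "RMSEP_%": 100.0 * rmsep(obs, pred) / mean_obs,
        "MAEP_%": 100.0 * maep(obs, pred) / mean_obs,
    }


# Synthetic stand-in data: one climatic index per record and a noisy yield.
rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(720)]
ys = [8.0 + 0.5 * x + rng.gauss(0.0, 0.6) for x in xs]

scores = cross_validate(xs, ys, k=5)
print(scores)
```

Each record is predicted exactly once by a model that never saw it during fitting, so the pooled errors estimate out-of-sample performance rather than goodness of fit.
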
The results show that among the data-driven approaches, Random
Forest is the most robust and generally achieves the lowest prediction
error (MAEP 4.27%). It also outperforms our model-driven approach
(MAEP 6.11%). However, the method for calibrating the mechanistic
model from easily accessible datasets opens several further perspectives.
The mechanistic model can potentially help to highlight the stresses
suffered by the crop or to identify the biological parameters of interest
for breeding purposes. For this reason, an interesting perspective is
to combine these two types of approaches.



