Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Crop yield prediction is a central problem in agriculture. The aim of this paper is to find an efficient way to predict corn yield from meteorological records. The prediction models used in this paper fall into two groups according to their modeling methodology: model-driven approaches and data-driven approaches. Model-driven approaches are based on mechanistic crop modeling: they describe crop growth in interaction with its environment as a dynamical system. Calibrating such a dynamical system is difficult, however, because it amounts to a multidimensional non-convex optimization problem. An original contribution of this paper is a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for parametrizing potentially complex mechanistic models from a new type of dataset (climatic data and final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. Data-driven approaches to yield prediction, on the other hand, do not need to represent the complex biophysical processes, but they place strict requirements on the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods: methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA), together with the associated climatic data. A 5-fold cross-validation process and two accuracy metrics, root mean square error of prediction (RMSEP) and mean absolute error of prediction (MAEP), were used to evaluate the yield prediction capacity.
The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the lowest prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method for calibrating the mechanistic model from easily accessible datasets opens several additional perspectives. The mechanistic model can potentially help to highlight the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine the two types of approaches.
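
The evaluation protocol named in the abstract (5-fold cross-validation scored with RMSEP and MAEP) can be illustrated with a minimal sketch. This is not the authors' code: the one-variable least-squares model, the function names, and the synthetic data are all hypothetical stand-ins, and expressing MAEP as a percentage of the mean observed yield is an assumption, since the abstract does not state how its percentages are normalized.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (toy stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def rmsep(obs, pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def maep(obs, pred):
    """Mean absolute error of prediction."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def cross_validate(xs, ys, k=5, seed=0):
    """Predict every record from a model trained on the other k-1 folds."""
    obs, pred = [], []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test_idx:
            obs.append(ys[i])
            pred.append(slope * xs[i] + intercept)
    mean_obs = sum(obs) / len(obs)
    mae = maep(obs, pred)
    return {
        "RMSEP": rmsep(obs, pred),
        "MAEP": mae,
        # Assumption: percentage error taken relative to the mean observed yield.
        "MAEP_pct": 100.0 * mae / mean_obs,
    }


if __name__ == "__main__":
    # Synthetic data: yield as a noisy linear function of one climate index.
    rng = random.Random(1)
    climate = [rng.uniform(0.0, 10.0) for _ in range(100)]
    yields = [5.0 + 0.4 * c + rng.gauss(0.0, 0.3) for c in climate]
    print(cross_validate(climate, yields, k=5))
```

Pooling the held-out predictions from all folds before scoring is one common convention; averaging the per-fold scores is another, and the abstract does not say which was used.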

Analysis of Maize Yield under Climate Change, Adaptations in Varieties and Planting Date in Northeast China in Recent Thirty Years

Northeast China (NEC) is one of the most important agricultural areas and is known as the Golden-Maize-Belt. Based on observed crop data and a crop model, we designed four simulation experiments to separate the relative impacts and contributions of climate change, planting date shift, variety change, and combined change in varieties and planting date. Without changes in planting date or varieties, maize yields showed no significant trend at Hailun station in the north of NEC and decreased significantly, by 0.2 to 0.4 t/10a, at the two stations located in the middle and south of NEC. With planting date change, yields showed a significant increase of 0.09 to 0.47 t/10a. With variety change, maize yields increased significantly, by 1.8 to 1.9 t/10a, at Hailun and Huadian stations, but showed only a small, non-significant increase of 0.2 t/10a at Benxi in the south of NEC. With combined change in varieties and planting date, yields showed a significant increase of 0.53 to 2.0 t/10a. The contributions to yield were -25% to -55% for climate change, 15% to 35% for planting date change, 20% to 110% for variety change, and 30% to 135% for variety change combined with planting date shift. The combined change in varieties and planting date gave the highest yields and was responsible for significant increases in maize yields; variety change alone ranked second, and planting date change third. Adaptation in varieties and planting date greatly improved maize yields and also increased their interannual variability. The increase in contribution from planting date and variety change was lower in the 2000s than in the 1990s. Yields under variety change, and under combined planting date and variety change, both showed a decreasing trend at Huadian and Benxi from around 2002. This indicates that the increasing trend in maize yields has stagnated in the middle and south of NEC but continued in the north.
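
The trends above are reported per decade (t/10a). As a minimal sketch of how such a figure can be derived from an annual yield series, the snippet below fits an ordinary least-squares line of yield against year and multiplies the slope by 10. The abstract does not state which trend estimator or significance test was used, so the estimator, the function name, and the synthetic series are all assumptions for illustration only.

```python
def decadal_trend(years, yields):
    """Least-squares slope of yield against year, scaled to change per 10 years."""
    n = len(years)
    mean_year = sum(years) / n
    mean_yield = sum(yields) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (z - mean_yield) for y, z in zip(years, yields))
    return 10.0 * sxy / sxx


if __name__ == "__main__":
    # Hypothetical 30-year series rising by 0.05 yield units per year.
    years = list(range(1981, 2011))
    yields = [5.0 + 0.05 * (year - 1981) for year in years]
    print(decadal_trend(years, yields))
```

A significance test on the slope is deliberately left out here because the abstract gives no detail on the test behind its "significant" and "non-significant" labels.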