Analysis of cross-validation methods for robust retrieval of biophysical parameters

Ll. Pérez-Planells, J. Delegido, J.P. Rivera-Caicedo, J. Verrelst


Non-parametric regression methods are powerful statistical tools for retrieving biophysical parameters from remote sensing measurements. However, their performance depends on the samples presented during the training phase. To ensure robust retrievals, cross-validation sub-sampling methods are often used, which make it possible to evaluate the model on subsets of the field dataset. Here, two cross-validation techniques, hold-out and k-fold, were analyzed in the development of non-parametric regression models. The selected linear methods were least squares Linear Regression (LR) and Partial Least Squares Regression (PLSR), and the nonlinear methods were Kernel Ridge Regression (KRR) and Gaussian Process Regression (GPR). Cross-validation results showed that LR was the least stable method, while KRR and GPR led to more robust results. This work recommends using a nonlinear regression algorithm (e.g., KRR, GPR) in combination with k-fold cross-validation with k=10 to achieve robust retrievals.


Hold-Out; k-fold; Cross-validation; MLRA; Gaussian Process Regression; Kernel Ridge Regression
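
The two sub-sampling schemes compared in the abstract, hold-out and k-fold with k=10, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic one-dimensional data, a 70/30 hold-out split, and a plain least-squares line standing in for the regression model (KRR or GPR would take its place in practice). All function names (fit_line, hold_out_split, k_fold_splits, evaluate) are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rmse(model, xs, ys):
    """Root mean squared error of the fitted line on the given samples."""
    a, b = model
    return (sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def hold_out_split(n, train_fraction, rng):
    """Hold-out: one random partition into a training and a test subset."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]


def k_fold_splits(n, k, rng):
    """k-fold: every sample is used for testing exactly once across k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def evaluate(xs, ys, train, test):
    """Train on the training indices and return the RMSE on the test indices."""
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return rmse(model, [xs[i] for i in test], [ys[i] for i in test])


# Synthetic stand-in for a field dataset (assumption: linear signal plus noise).
rng = random.Random(0)
n = 100
xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
ys = [3.0 * x + 0.5 + rng.gauss(0.0, 0.2) for x in xs]

# Hold-out gives a single error estimate that depends on one random split.
train, test = hold_out_split(n, 0.7, rng)
hold_out_rmse = evaluate(xs, ys, train, test)

# k-fold with k=10 gives ten estimates, so their spread can be reported too.
fold_rmses = [evaluate(xs, ys, tr, te) for tr, te in k_fold_splits(n, 10, rng)]
k_fold_mean = statistics.fmean(fold_rmses)
k_fold_sd = statistics.stdev(fold_rmses)

print(f"hold-out RMSE: {hold_out_rmse:.3f}")
print(f"10-fold RMSE: {k_fold_mean:.3f} +/- {k_fold_sd:.3f}")
```

The spread of the per-fold errors is one simple way to quantify how sensitive a regression method is to the choice of training samples, which is the notion of robustness the abstract refers to.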

