A comparative study of regression methods to predict forest structure and canopy fuel variables from LiDAR full-waveform data


  • P. Crespo-Peremarch Universitat Politècnica de València https://orcid.org/0000-0003-2241-4493
  • L.A. Ruiz Universitat Politècnica de València
  • Ángel Balaguer-Beser Universitat Politècnica de València




Regression models, Random Forest, CART, M5, Wilcoxon, Friedman, forest structure, canopy fuel, LiDAR full-waveform


Regression methods are widely used in forestry to predict and map forest structure and canopy fuel variables. We present a study in which several regression models (linear, non-linear, regression trees and ensemble) were assessed. Independent variables were calculated from metrics extracted from full-waveform LiDAR data, while the reference data used to generate the dependent variables for the prediction models were obtained from fieldwork in 78 plots of 16 m radius. Transformations of the dependent and independent variables, combined with feature selection, were carried out to assess their influence on the prediction of the response variables. To evaluate significant differences and rank the regression models, we used the non-parametric Wilcoxon and Friedman tests, followed for the Friedman test by post-hoc pairwise multiple comparison tests such as Nemenyi. Regressions using a transformation of the dependent variable (such as square-root or logarithmic) or of the independent variables increased R² by up to 6% with respect to linear regression using untransformed response variables. The CART (Classification and Regression Tree) method provided poor results, but it may be of interest for categorisation purposes. The square-root transformation of the dependent variable gave the best overall results, except for stand volume. However, its improvement over the other regression methods was not always significant.
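As an illustration of the ranking step described in the abstract, the following is a minimal sketch of a Friedman test with a Nemenyi critical difference, written in plain Python. It is not the authors' implementation. The R² values in `scores`, the three method columns and the function names are hypothetical and chosen only for the example. The critical value 2.343 is the Nemenyi value for three methods at α = 0.05 tabulated in Demšar (2006), and the statistic omits the correction for ties.

```python
from math import sqrt

def average_ranks(scores):
    """Mean rank of each method over all response variables.

    scores[i][j] is the R2 of method j for response variable i
    (higher is better). Rank 1 is the best; tied methods share
    the mean of the ranks they span.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1..end+1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]

def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction)."""
    n, k = len(scores), len(scores[0])
    mean_ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )

def nemenyi_cd(n, k, q_alpha):
    """Critical difference in mean rank for the Nemenyi post-hoc test."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n))

# Hypothetical R2 values: rows are response variables,
# columns are three regression methods (illustrative only).
scores = [
    [0.80, 0.85, 0.60],
    [0.70, 0.75, 0.50],
    [0.65, 0.72, 0.55],
    [0.78, 0.74, 0.40],
]

ranks = average_ranks(scores)
chi2 = friedman_statistic(scores)
cd = nemenyi_cd(n=len(scores), k=len(scores[0]), q_alpha=2.343)
print("mean ranks:", ranks)
print("Friedman statistic:", chi2)
print("Nemenyi critical difference:", cd)
```

Two methods would be considered significantly different when their mean ranks differ by more than the critical difference. With only a handful of response variables the test has little power, which is consistent with improvements that are not always significant.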



Author Biographies

P. Crespo-Peremarch, Universitat Politècnica de València

Researcher at the Grupo de Cartografía GeoAmbiental y Teledetección (CGAT), Departamento de Ingeniería Cartográfica, Geodesia y Fotogrametría

Camí de Vera s/n 46022-Valencia, España

L.A. Ruiz, Universitat Politècnica de València

Grupo de Cartografía GeoAmbiental y Teledetección (CGAT), Departamento de Ingeniería Cartográfica, Geodesia y Fotogrametría

Camí de Vera s/n 46022-Valencia, España

Ángel Balaguer-Beser, Universitat Politècnica de València

Grupo de Cartografía GeoAmbiental y Teledetección (CGAT), Departamento de Matemática Aplicada

Camí de Vera s/n 46022-Valencia, España

