A comparative study of regression methods to predict forest structure and canopy fuel variables from LiDAR full-waveform data
DOI:
https://doi.org/10.4995/raet.2016.4066
Keywords:
Regression models, Random Forest, CART, M5, Wilcoxon, Friedman, forest structure, canopy fuel, LiDAR full-waveform
Abstract
Regression methods are widely employed in forestry to predict and map structure and canopy fuel variables. We present a study in which several regression models (linear, non-linear, regression trees and ensemble) were assessed. Independent variables were calculated using metrics extracted from full-waveform LiDAR data, while the reference data used to generate the dependent variables for the prediction models were obtained from fieldwork in 78 plots of 16 m radius. Transformations of dependent and independent variables, combined with feature selection, were carried out to assess their influence on the prediction of the response variables. To evaluate significant differences and rank the regression models, we used the non-parametric Wilcoxon and Friedman tests, together with post-hoc pairwise multiple comparison tests, such as Nemenyi, for the Friedman test. Regressions using a transformation of the dependent variable, such as square-root or logarithmic, or of the independent variables, increased R² by up to 6% with respect to linear regression using unprocessed response variables. The CART (Classification and Regression Tree) method provided poor results, but it may be of interest for categorisation purposes. Square-root transformation of the dependent variable was the method with the best overall results, except for stand volume. However, it did not always yield a significant improvement with respect to the other regression methods.
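The ranking procedure mentioned in the abstract (Friedman test followed by a Nemenyi post-hoc comparison) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each row holds the R² of every regression method for one response variable; the function names and the example values are hypothetical and chosen only for illustration; and the Friedman statistic is computed without a tie correction. The critical values are those tabulated by Demšar (2006) for a significance level of 0.05.

```python
import math

# Nemenyi critical values q_alpha for alpha = 0.05, indexed by the number
# of compared methods k (values as tabulated in Demsar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}

# Hypothetical R2 values (NOT results from the study): one row per response
# variable, one column per regression method.
EXAMPLE_R2 = [
    [0.80, 0.75, 0.60],
    [0.70, 0.72, 0.55],
    [0.65, 0.60, 0.50],
    [0.90, 0.85, 0.70],
]


def rank_row(row):
    """Rank one row of scores: 1 = highest score, ties share the average rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the one at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def mean_ranks(scores):
    """Average rank of each method over all rows."""
    rows = [rank_row(r) for r in scores]
    return [sum(col) / len(rows) for col in zip(*rows)]


def friedman_chi2(scores):
    """Friedman chi-square statistic (no tie correction)."""
    n, k = len(scores), len(scores[0])
    r = mean_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(x * x for x in r) - k * (k + 1) ** 2 / 4)


def nemenyi_cd(k, n, q_alpha):
    """Critical difference: two methods differ if their mean ranks differ by more."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


if __name__ == "__main__":
    k, n = len(EXAMPLE_R2[0]), len(EXAMPLE_R2)
    print("mean ranks:", mean_ranks(EXAMPLE_R2))
    print("Friedman chi2:", friedman_chi2(EXAMPLE_R2))
    print("Nemenyi CD:", nemenyi_cd(k, n, Q_ALPHA_005[k]))
```

In this sketch, two methods would be declared significantly different when the gap between their mean ranks exceeds the critical difference. The Friedman statistic itself would still need to be compared against a chi-square distribution with k - 1 degrees of freedom, which is omitted here to keep the example dependency-free.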
License
This journal is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.