Thematic vocabulary selection for didactic purposes: evaluation of a quantitative approach

Jasper Degraeuwe, Patrick Goethals

Abstract

This study evaluates a quantitative approach to the thematic selection of vocabulary for didactic purposes. We describe in detail how three quantitative measures (absolute frequency, keyness and dispersion) are configured and combined to automate the selection of specific vocabulary from a specialized corpus. We then test whether the automatic selection is confirmed by the judgements of teachers of Spanish as a foreign language (SFL). In more than 85% of cases, the output of the quantitative selection method is accepted by at least half of the teachers. An interrater reliability test supports this finding, indicating substantial agreement (Cohen’s kappa = 0.61) between the teachers' judgements and the automatic selection.
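As an illustration of the agreement statistic reported above, the following minimal sketch computes Cohen's kappa for two sets of binary judgements. The function name `cohen_kappa` and the toy label lists are hypothetical and are not the study's data. How the individual teachers' judgements were aggregated is not stated here, so the sketch simply assumes one combined teacher label per candidate word.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement, from each rater's marginal label counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # and this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 1 = word selected / accepted, 0 = not selected / rejected.
automatic = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]  # assumed output of the automatic selection
teachers = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]   # assumed combined teacher judgement

kappa = cohen_kappa(automatic, teachers)
print(round(kappa, 2))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is lower than the simple share of matching judgements in the toy data.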


Keywords

corpus linguistics; vocabulary learning; automatic vocabulary selection; thematic vocabulary selection; absolute frequency; keyness; dispersion; Spanish as a foreign language (SFL)

This journal is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Universitat Politècnica de València

e-ISSN: 1886-6298    https://doi.org/10.4995/rlyla