Giving Students the Tools: Looking at Teaching and Learning using Corpora




Big data, Computer-Assisted Language Learning (CALL), teaching and language corpora (TALC), language learning, L2 pedagogy


This article discusses a pilot project aimed at giving tertiary students a wider repertoire of resources for language learning, with a particular focus on Italian. The project responds to the exponential growth in online data, and in access to it, and to the potential value such data hold for students of additional languages at tertiary level. By examining whether current language students are aware of online resources such as linguistic corpora and other potential applications of big data, we aim to provide insight into the possible uses of corpus-assisted learning in the language classroom. We detail a project undertaken in 2017 with undergraduate students of Italian at a major metropolitan university. Students were directed to complete a translation task using corpus-based resources, and their experience was assessed through a post-assessment survey. We then present our initial findings on the possibilities of a corpus-based approach to language teaching and learning. While today’s students are already predisposed to rely on online resources in their language studies, our results suggest that they are not aware of emerging online resources such as corpora. Moreover, even when these resources are presented to students, the complexity of the software used to interrogate corpora often results in their underutilisation.










Research papers