Content Adaptation for Language Learning: A Hybrid AI Approach
Submitted: 05/07/2025
Accepted: 12/23/2025
Published: 12/26/2025
Copyright (c) 2025 Jatin Arora, Irina Elgort, Junhong Zhao

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Keywords:
Large Language Model, Artificial Intelligence, comprehensible input, simplification, second language learning
Abstract:
In learning a foreign language, access to comprehensible input is a critical success factor. However, at early stages, when learners are still below an intermediate-proficiency level, finding level-appropriate and engaging materials is highly problematic. Although the Internet abounds in text and multimedia materials in many languages, most of them are too difficult to be useful for lower-proficiency language learners. The present project aimed to establish whether the affordances of large language models (LLMs) can be harnessed to turn authentic audio, video, and text materials into comprehensible input for independent elementary-level language learners. The present article reports on the outcomes of a research and development project that adopts a hybrid approach to simplifying authentic materials, combining affordances of LLMs with careful prompt engineering and rule-based refinement. The article details the hybrid sequential pipeline system and the results of two rounds of evaluation: language teacher ratings and automated text analysis indices. Based on the outcome of these evaluations, it is concluded that the proposed approach can provide an efficient way of simplifying authentic content for and by lower-proficiency language learners. Directions for future research and development are also proposed.
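The hybrid idea described in the abstract, an LLM simplification step followed by rule-based refinement, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the word-list check, the 0.95 coverage target, and the retry limit are all assumptions made for illustration, and the `simplify` argument stands in for whatever prompted LLM call a real system would use.

```python
import re
from typing import Callable, Iterable


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_coverage(text: str, known_words: Iterable[str]) -> tuple[float, list[str]]:
    """Return the share of tokens found in known_words and the off-list words."""
    tokens = tokenize(text)
    if not tokens:
        return 1.0, []
    known = {w.lower() for w in known_words}
    off_list = [t for t in tokens if t not in known]
    coverage = 1 - len(off_list) / len(tokens)
    return coverage, sorted(set(off_list))


def simplify_with_refinement(
    text: str,
    simplify: Callable[[str, list[str]], str],  # hypothetical wrapper around an LLM prompt
    known_words: Iterable[str],
    target: float = 0.95,   # illustrative threshold, not a value from the article
    max_rounds: int = 3,    # illustrative retry limit
) -> str:
    """Simplify once, then re-prompt with off-list words until coverage reaches target."""
    known = list(known_words)
    current = simplify(text, [])
    for _ in range(max_rounds):
        coverage, off_list = lexical_coverage(current, known)
        if coverage >= target:
            break
        # Rule-based check failed: pass the offending words back to the simplifier.
        current = simplify(current, off_list)
    return current
```

In this sketch the rule-based layer only checks vocabulary against a list of words the learner is assumed to know. A fuller system could apply further rules, such as sentence-length limits, in the same loop.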


