EUROCALL: European Association for Computer Assisted Language Learning

Sustainability in CALL Learning Environments: A Systemic Functional Grammar Approach

Peter McDonald
J.F. Oberlin University, Tokyo (Japan)


This research aims to define a sustainable resource in Computer-Assisted Language Learning (CALL). In order for a CALL resource to be sustainable, it must work within existing educational curricula. This feature is a prerequisite of sustainability because, despite the potential for educational change that digitalization has offered since the 1990s, curricula in traditional educational institutions have not fundamentally changed, even as we move from a pre-digital society towards a digital society. Curricula have failed to incorporate CALL resources because no agreed-upon pedagogical language enables teachers to discuss CALL classroom practices. Systemic Functional Grammar (SFG) can help to provide this language and bridge the gap between the needs of the curriculum and the potential of CALL-based resources. This paper outlines how SFG principles can be used to create a pedagogical language for CALL and gives practical examples of how this language can be used to create sustainable resources in classroom contexts.

Keywords: CALL, Multimodality, Systemic Functional Grammar, Sustainability, Curriculum innovation.


1. Introduction

The rapid development of ubiquitous technologies provides new opportunities for learners to express themselves. In the pre-digital age, learners could only express themselves through written text (the written mode) or spoken text (the oral mode). By contrast, learners in a computer-assisted environment can quickly combine these two modes and even add visuals or sound. These new multi-modal texts are already widely used in courses that use Moodle/Blackboard, in computer-mediated communication (CMC) such as posts or blogs, and in digitally created compositions such as PowerPoint or video presentations.

Nevertheless, new pedagogical opportunities create new pedagogical challenges. Despite the successful additions introduced by computer-assisted language learning (CALL) to established curricula, overall, the curricula that educational establishments deliver to students have fundamentally remained the same. Curricula, to a large extent, are still based on old technologies (paper, pens, and textbooks) and traditional classroom methodologies. At present, classrooms still follow the 19th century industrial model, in which a large group of students sit at separate desks while a teacher delivers a pre-prescribed, traditional curriculum (Collins & Halverson, 2009). Owing to the scarcity of computer labs in such teaching contexts, CALL is often limited to a set number of classes each term.

Curricula are difficult to change because of “situational constraints” (Cuban, 2001) that educational institutions encounter. Educational institutions have developed complex systems (budget, size, number of stakeholders, inbuilt working practices, and so on), which makes it difficult for them to adapt to change effectively (Kennedy, 2013). Likewise, teachers are constrained. Considering that teachers must successfully meet the needs of all institutional stakeholders, introducing innovation in the classroom may be difficult for them. Teachers must necessarily depend on shared and established educational knowledge.

Such shared pedagogical knowledge, the foundation upon which curricula are developed, does not exist for CALL. Without this shared knowledge, CALL resources cannot be integrated into the existing curricula. The current study supports the view (Kress & Van Leeuwen, 2006; Royce, 2002) that systemic functional grammar (SFG) can help provide this missing pedagogical framework, as it provides teachers with a multi-modal meta-language that can work alongside existing pedagogical meta-languages. Equipped with a multi-modal pedagogical language, teachers can create “sustainable CALL resources.” A sustainable CALL resource, as defined in this paper, can: 1) work alongside existing classroom resources that have been established by teachers to meet the needs of the curriculum, and 2) help to prepare students for using the new and exciting affordances that digital technology offers.

2. Expanding existing pedagogical languages to create sustainable CALL resources

2.1. The role of pedagogical languages in creating sustainable resources in the curricula

In the pre-digital society dominated by the printed word, pedagogical languages, from the traditional grammar of Latin and Greek to modern approaches such as Consciousness Raising (C/R), have always played a role in creating classroom resources. For example, in teaching reading or listening, standard classroom activities (textual comparisons, assessments, examination of writers’ textual choices, and so on) are possible because the underlying textual relationships in written/spoken texts can be explicitly expressed. Thus, the various methods of linguistic description that already exist for written/spoken texts, whether traditional grammar or modern communicative approaches such as discourse analysis or pragmatics, assist the teaching of these texts in the classroom.

The pervasiveness of digital media in our society, however, is changing the nature of text, and thus the established methods of linguistic description that teachers use in the classroom are no longer sufficient for teaching modern texts. In the past, texts predominantly utilized the alphabet to send their messages. By contrast, the digital devices of today combine words, visuals, and audio to create multi-modal texts in which the traditional linear structure of reading left to right on a page is challenged by a visual hyperactive reading path that follows the rules of visual design as well as the rules of the written language (Kress, 2003, pp. 35–60).

Furthermore, digitalization is significantly changing the traditional written text itself. In a modern digital society in which the means of production have shifted and writers are now assuming many of the publication tasks that were once considered specialized, written texts that vary from traditional academic essays to modern tweets and blogs incorporate a wide array of visual features, such as bullet points, tables, and emoticons. Another change that the digital revolution has helped to introduce is the pedagogical acceptance of popular media texts that incorporate multi-modal relations, such as comic books, videos, or computer games. These texts, which may not have been considered useful in the classroom 30 years ago, are now being recommended as essential additions to the modern curricula (Hagood, 2008).

Considering these fundamental changes in the nature of texts, our existing methods of linguistic description must be updated so that teachers can talk about multi-modal texts explicitly in the classroom, in the same way as they can currently talk about traditional printed word or spoken texts in the classroom. Indeed, creating a multi-modal pedagogical language is important because research suggests that multi-modal texts are more complex than has been accounted for in existing classroom approaches.

2.2. The complexity of multi-modal texts

Although multi-modal texts may appear to have simple means of presenting information, the textual relationships underlying such texts may be complex. In order to effectively comprehend a multi-modal text, the reader, viewer, or listener must engage in “parallel processing” (Luke, 2003, p. 399). In this type of processing, the receiver must initially (and perhaps unconsciously) decode different semiotic systems, including the spatial system of design to decode the images and the linear system of the alphabetic text to decode the words, and then interpret how the systems combine to deliver a singular meaning. Moreover, Unsworth (2008, p. 378) reports the effect of “naturalization,” in which these complex underlying semiotic relationships can be hidden by multi-modal writers to create cohesive texts. In the context of Teaching English to Speakers of Other Languages (TESOL), parallel processing and naturalization are extremely complex because learners not only have to process different modes, but also have to translate these modes from their second language (L2) into their first language (L1).

TESOL research has shown that two multi-modal relations, defined in SFG research as concurrence and complementation, can produce complex effects on comprehension. Lui’s (2004) research on L2 multi-modal comprehension suggests that images only support comprehension when the graphic text clearly reiterates the same information as the written text. In SFG research, this relationship is defined as a graphic/alphabetic text relationship of “concurrence” (Unsworth, 2008). Positive support occurs in concurrent relationships when the students’ proficiency level is just below the level of the alphabetic text. In this case, students can use the images to infer the meaning of the words. However, concurrent text relationships may also result in redundancy when the students’ proficiency level is above the level of the alphabetic text. In this case, the students do not need the graphic text to infer the meaning of the words (Lui, 2004).

Other negative effects on comprehension can be observed in multi-modal relationships of “complementation” (Unsworth, 2008). In a relationship of complementation, the graphic text and the alphabetic text contain closely related information that augments, rather than reiterates, each other in some way. Relationships of complementation can result in incomprehension or miscomprehension when the students’ proficiency level is lower than the level of the alphabetic text. Incomprehension occurs when the lack of textual integration prevents students from using the graphic text to infer the meaning of the words, which renders them unable to understand the text. Miscomprehension occurs when students make the wrong assumption about the graphic/alphabetic text relationship. They assume that the graphic text reiterates the information in the written text; that is, that the graphic text can support the words. However, the lack of harmony between the alphabetic and graphic text clues creates difficulties in processing. The students then make wrong inferences about the text. Thus, the graphic text hinders the comprehension of the written text (Lui, 2004).
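The effects described in the two paragraphs above can be summarized as a small lookup rule. The following Python sketch is offered only as an illustration of that reasoning, not as part of the cited research: the function name, the `Relation` type, and the integer `proficiency_gap` scale (learner level minus text level, in whole levels) are assumptions introduced here. Combinations that the discussion above does not cover return `None` rather than a guess.

```python
from enum import Enum
from typing import Optional


class Relation(Enum):
    """Graphic/alphabetic text relations discussed above."""
    CONCURRENCE = "concurrence"          # image reiterates the words
    COMPLEMENTATION = "complementation"  # image augments the words


def predicted_effect(relation: Relation, proficiency_gap: int) -> Optional[str]:
    """Return the comprehension effect reported for a given case.

    proficiency_gap is a hypothetical scale: learner level minus the
    level of the alphabetic text (-1 means "just below the text").
    """
    if relation is Relation.CONCURRENCE:
        if proficiency_gap == -1:
            # Learners can use the image to infer the meaning of the words.
            return "support"
        if proficiency_gap > 0:
            # Learners do not need the image to understand the words.
            return "redundancy"
        return None  # case not described in the discussion above
    if relation is Relation.COMPLEMENTATION and proficiency_gap < 0:
        # The image cannot be used to infer the words, or misleads.
        return "incomprehension or miscomprehension"
    return None  # case not described in the discussion above


print(predicted_effect(Relation.CONCURRENCE, -1))
print(predicted_effect(Relation.COMPLEMENTATION, -2))
```

A teacher could use a rule of this kind as a checklist when pairing images with texts for a given class level; like the C/R approach itself, it is a rule of thumb rather than a fixed prediction.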

Therefore, a multi-modal pedagogical language can allow teachers and students to decode complex semiotic relationships in a meaningful way that can be applied to their existing teaching contexts, as will be demonstrated in Sections 3 and 4. A multi-modal pedagogical language can therefore help teachers create sustainable CALL resources. This process can be assisted by SFG, given that it is a communicative approach to language learning (Halliday & Matthiessen, 2004) and can therefore work alongside established classroom approaches to language and learning such as C/R. In C/R, language users work with the language in use, making a series of assumptions about the language (rules of thumb) that can be adjusted to suit the needs of the communicative situation (Rutherford, 1987).

3. The SFG theoretical model for creating sustainable CALL resources

3.1. SFG and reading image-based multi-modal texts

The Kress and Van Leeuwen (2006) model for analyzing visual texts serves as the theoretical basis of the ideas presented in this paper. This paper aims to demonstrate that the semiotic model of language can be used in the classroom in a practical and simple manner. Therefore, this study will not provide a full account of the model; rather, it will focus only on the use of the Kress and Van Leeuwen SFG model to create sustainable resources in classroom contexts, as outlined in Section 4.

3.2.1. The compositional function

Fries (1994, p. 230) points out that, in alphabetic texts, the placement of clauses determines the importance of the information placed within each clause. This concept is also true of images: the placement of elements in an image such as a picture or a web page determines the visual importance of the elements. This study will focus on two compositional elements, namely, framing and salience. Framing refers to the way elements (image elements include words, pictures, hyperlinks, and others) are connected or disconnected through frame lines. Salience refers to the prominence ascribed to one image element over another by varying an element’s size, color, and contrast, and by choosing to place elements at the top, bottom, center, or margins of a picture.

3.2.2. The representational function: narrative images versus concept images

An image can represent one of two things to the viewer: a narrative event or a concept. Artists create narrative events in images by joining the participants (people, animals, objects, and so on) together with an imaginary line called a ‘vector’ (Kress & Van Leeuwen, 2006, p. 59). Figure 1 shows an excerpt from Macbeth: The Graphic Novel (McDonald, Haward, Dobbin & Erskine, 2008, p. 8), where panels 1 and 5 are narrative images. The fire is the vector. The witches' attention is focused on the fire, and the fire is connected to the witches by framing, salience, and color. This communicates to the reader that the main action of the image is centered on the witches and the fire. In concept images, the participants are not represented in action; that is, no vector joins them. Instead, the participants are represented in a fixed state of being, as in a portrait painting. In Figure 1, panels 2, 3, and 4 are all concept pictures. In these panels, the witches’ faces are given salience through a close-up view staring in the direction of the viewer, as in a portrait.

3.2.3. The interpersonal function: offer images versus demand images

Images can interact with viewers in two ways: by offering information or by demanding attention. A simple way to understand the difference between offer and demand is to compare teaching a lesson in front of a group of students with having a face-to-face conversation with one student. In the classroom setting, the speaker (the teacher) is offering information to the class; the speaker is at a distance from the class; and the students can choose between listening to the speaker and thinking about other things that are unrelated to the lesson. By contrast, a face-to-face situation requires that the participants demand attention from one another; that is, they need to directly focus on what is being said.

Referring again to the series of Macbeth panels, we see that Panels 1 and 5 are offering information to the viewer. The viewer is asked to observe the scene from a distance and to choose their own reading path: the viewer can begin with the text boxes, the background image, or the main image of the witches cooking a spell in a cauldron. By contrast, Panels 2, 3, and 4 demand attention from the viewer, who is confronted with a talking-head image and asked to focus directly on the words.

Figure 1: Macbeth panels (Used with permission © Classical Comics Ltd.).

3.3. Classifying images into types

Once teachers and students have a working knowledge of these underlying principles, they can begin to classify images into types by asking a series of C/R questions, as provided in Table 1 below. Thus, applying the questions to the Macbeth texts, we see that Panels 1 and 5 are narrative/offer pictures, in which the illustrators/writers ask viewers to observe events. Meanwhile, Panels 2, 3, and 4 are concept/demand pictures. The illustrators present an idea, not an action, and demand an emotional response from the viewer. However, as mentioned above, multi-modal text relations can create very complex texts. Therefore, as suggested by the underlying principles of C/R, teachers would be dealing with rules of thumb, rather than prescribed rules, when applying these theories to the classroom.

The Compositional Function

  1. Which elements are most salient in the image? (How is this salience created? Is it created through placement, color, or contrast?)
  2. How are the elements of the image framed? (Are the participants/elements joined together?)

The Representational Function

  1. Is the image representing a narrative? (Does the image portray an event? Does it have a vector?)
  2. Is the image representing a concept? (Are the participants not joined in action together? Are they staring at the viewer or into the distance? Does the image portray an idea rather than an event?)

The Interactive Function

  1. Is the image interacting with the viewer by offering information to the viewer?
  2. Is the picture interacting with the viewer by demanding attention from the viewer?

Table 1. C/R questions for classifying image-based texts.
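The way the answers to the representational and interactive questions combine into an image type can be sketched in a few lines of Python. This is a hypothetical illustration only: the `ImageAnswers` record, its two yes/no fields, and the `classify` function are names introduced here, and reducing each function to a single yes/no answer is a simplifying assumption. In keeping with C/R, the result should be read as a rule of thumb. The panel answers follow the analysis of the Macbeth excerpt in Section 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageAnswers:
    """Yes/no answers to two of the C/R questions (simplified)."""
    has_vector: bool        # do the participants join in an action?
    addresses_viewer: bool  # does the image confront the viewer directly?


def classify(answers: ImageAnswers) -> str:
    """Combine the two answers into a type label such as 'narrative/offer'."""
    representation = "narrative" if answers.has_vector else "concept"
    interaction = "demand" if answers.addresses_viewer else "offer"
    return f"{representation}/{interaction}"


# Answers for the five Macbeth panels, as analyzed in Section 3:
# panels 1 and 5 show the witches around the fire (a vector, viewed from
# a distance); panels 2, 3, and 4 are close-ups facing the viewer.
macbeth_panels = {
    1: ImageAnswers(has_vector=True, addresses_viewer=False),
    2: ImageAnswers(has_vector=False, addresses_viewer=True),
    3: ImageAnswers(has_vector=False, addresses_viewer=True),
    4: ImageAnswers(has_vector=False, addresses_viewer=True),
    5: ImageAnswers(has_vector=True, addresses_viewer=False),
}

for panel, answers in sorted(macbeth_panels.items()):
    print(f"Panel {panel}: {classify(answers)}")
```

The compositional questions (salience and framing) are deliberately left out of the sketch, because they describe how an image is built rather than which type it belongs to.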

4. Creating sustainable CALL resources: applying the SFG grammar model to the classroom

4.1. Sustainable CALL resource 1: converting written compositions to multi-modal compositions

This multi-modal task is relevant to beginner writing classes in which students are taught how to write five-paragraph essays using the meta-language of topic sentences, supporting sentences, concluding sentences, and the function of each type of sentence. For example, topic sentences represent general ideas whereas supporting sentences provide details about the topic using examples or explanations. In the established curricula activities in my teaching institution, students use such classifications to create their own original paragraphs and deconstruct teacher-created paragraphs. Figure 1 (Appendix) shows a comparative paragraph, which is a teacher-created example of a written text that students are expected to follow. Figure 2 (Appendix 1) shows the deconstructed paragraph using the traditional classroom meta-language, which is a common classroom activity.

In the multi-modal activity, students convert the written paragraph to a multi-modal presentation using presentation software. Students re-read their paragraphs, select the key words that communicate the overall ideas, perform a Google image search to find a supporting image that helps to visually communicate the main ideas to the listener, and rewrite the sentences to adapt them to the spoken mode, if necessary. A teacher-created example of a presentation is provided in Figure 4 of the Appendix.

As shown in the examples, the SFG model’s classification of images into types, using the compositional, representational, and interactive functions, provides teachers with the pedagogical tools to outline clear organizational patterns for students when they construct their classroom presentations. In this example, students are encouraged to follow a general-specific textual pattern. Thus, the general ideas in the written paragraph, which are carried by the topic sentence and the concluding sentence (Appendix, Figures 1 and 2, sentences 1 and 5), are represented by concept/offer images (Appendix, Figures 3 and 4, slides 1 and 5), whereas the supporting sentences that contain examples (Appendix 1, Figures 1 and 2, sentences 2, 3, and 4) are all represented by concept/demand images (Appendix 1, Figures 3 and 4, slides 2, 3, and 4). Moreover, knowledge of visual/verbal text relations of concurrence and complementation enables teachers to provide clear advice to students. In this L2 setting, students are encouraged to use simple slides with strong relations of concurrence and simple relationships of complementation to facilitate comprehension (Figure 3, column 4).

Therefore, the SFG model creates sustainability because new multi-modal skills can be taught alongside the established curricula. For example, naturalization, identified in Section 2 as a multi-modal skill, is already being taught in writing curricula through textual patterns. Research shows that deconstructing textual patterns, such as cause-effect, compare-contrast, and problem-solution, has a positive effect on language learning (Hoey, 2001). Likewise, teaching the way texts are designed across different genres is now part of our established writing curricula (De Voss, 2010) and, interestingly, Computer-Mediated Communication (CMC) is now being defined as a separate genre that can be taught in the classroom (Marchand, 2013). The SFG model provides teachers and students with a multi-modal framework that can be used to unpack and teach the naturalized components that make up multi-modal texts in the same manner that they can unpack alphabetic texts.

From a visual perspective, unpacking the text in this manner makes it possible to introduce students to ‘image juxtaposition’ (McCloud, 1994). Image juxtaposition involves combining different types of images in a sequence to create meaning. The process is another example of naturalization that is effectively utilized in multi-modal texts, but is often overlooked by the untrained eye. In this task, students can see how images can be juxtaposed in a way that creates a clear textual pattern, as outlined above. Similar multi-modal conversion tasks can easily be created for the other textual patterns that are taught in the curricula. For example, non-computerized texts such as the Macbeth text (Figure 1) can be used to introduce students to image juxtaposition in traditional classroom settings. In this narrative text, the writers/illustrators use narrative/offer images to set the scene and portray the event (the spell being cast). These are juxtaposed with three concept/demand pictures, not only to attach the reader emotionally to the witches who are casting the spell, but also to give salience to the spoken text that cataphorically points the reader to a key event (their future meeting with Macbeth).

Finally, converting the written text to a multi-modal text reinforces established curricula skills, such as peer review, revision, and rewriting, in a creative and engaging manner. Students must reflect on their individual writing, discuss it with peers, and evaluate its meaning, clarity, and communicative competence. They must re-read, edit, rewrite, and summarize as they convert the written mode to the visual/verbal mode, which is then converted back to the written mode. Moreover, learner autonomy is encouraged through the navigation of presentation software and the use of search engines in English.

4.2. Sustainable CALL resource 2: making multi-modal comparisons

This multi-modal task is designed for reading and writing curricula in which students learn how to work with different genres from different discourse communities. In this task, students perform the standard curriculum task of making textual comparisons between two texts of different registers, which is a task that is usually done with traditional written texts. The SFG model, however, as outlined above, allows for the expansion of the standard curricula comparison activity to include multi-modal texts.

The task is composed of two parts. In part one, students compare and contrast the homepage of the BBC (traditionally considered a politically neutral creator of serious news) with that of the Daily Mail (traditionally considered a creator of popular, right-leaning news). In part two, students create a multi-modal report. Selfe (2004) has many examples of different types of multi-modal reports that could be used. To perform the task, students use the multi-modal C/R questions in Table 1 combined with the multi-modal C/R questions in Table 2.

Homepage Composition

  1. Which elements on the homepages are most salient?
  2. How do the sites use color, framing, and placement of images (top to bottom, left to right)?
  3. What kinds of participants occupy the most salient positions on the page?

Headlines and Images

  1. What types of images are used on the homepage?
  2. What kind of language is used in the headline text?
  3. Can you identify any relationships of concurrence or complementation between the headline texts and the image texts? If so, why do you think the writers created this relationship?

Body Texts, Headlines and Images

  1. Can you identify any relationships of concurrence and complementation between the main texts/images and the headlines? If so, why do you think the writers created these relationships?

Homepages and Political Stance

  1. Do you think the political leanings of the news sites are noticeable through the image texts? In the headlines? In the body texts? If so, can you identify the elements that communicate the political stance? If not, why not?
  2. Choose two news sites in your L1 that take different political stances. Compare the images/headlines and texts. How are they similar? How are they different?

Table 2. Multi-modal C/R questions for analyzing news sites.

From a multi-modal perspective, compositional elements allow the sites to create different registers. As Figure 2 shows, through variations in the compositional elements in column one, the BBC and the Daily Mail create different-looking sites to serve their different news functions. The BBC creates a relatively neutral-looking site by balancing the salience of its stories (i.e., a wide range of medium-sized images and headlines), clearly summarizing the type of stories through framing and placement, and using color (the formal blue) and images of politicians and professionals to reinforce the serious tone of the website. Moreover, “foregrounding,” a technique often used in advertising and newspapers in which negative images accompany negative words and positive images accompany positive words, was not particularly evident. By contrast, the Daily Mail creates a popular and entertaining-looking site by giving salience to a small number of stories that are compelling for its perceived readership. The stories are clearly organized by size, with the most compelling stories given the most salience. In contrast to the BBC, the most common images were of celebrities and members of the public. Moreover, in the Daily Mail, headline-to-image foregrounding using graphic/written text relations of concurrence was clearly evident.

Thus, by comparing the sites, students review how different texts create different registers, which is a task they study in the written mode. However, with their knowledge of the compositional function, they can study the registers from a wider multi-modal perspective. The final point regarding the compositional function is that compositional principles are easily transferable to the construction of many other texts. Whether students are creating an academic essay or designing a web site from a template, compositional decisions must be made: Which elements should be most salient? Where are the different elements placed in the texts to best represent or support the writers’ ideas or opinion? Are the elements appropriate for the audience or argument?

Compositional Elements

Salience
  BBC: large number of small pictures with headlines
  Daily Mail: small number of large pictures with large headlines, as well as medium and small pictures with medium and small headlines

Framing
  BBC: clearly framed and separated its stories by spaces and frame lines
  Daily Mail: used very few frame lines and separated stories by very small spaces

Placement
  BBC: placed in clearly organized separated lines from left to right, top to bottom, under clearly discernible categories, for example, News, Asian Pacific, Business
  Daily Mail: stories are organized through interest, from top to bottom, with large stories at the top and smaller stories below them; right-hand margins are used for the smallest stories and hyperlinks

Color
  BBC: dark formal blue
  Daily Mail: light blue

Foregrounding
  BBC: both images and words took a relatively neutral stance
  Daily Mail: positive images were often accompanied by positive words and negative images were often accompanied by negative words

Image Participants
  BBC: international politicians, professionals, some celebrities and members of the public
  Daily Mail: a number of celebrities and members of the public, some politicians and professionals

Figure 2. Compositional analyses of BBC and Daily Mail homepages.

Using the representational and interactive functions, the SFG model allows for a comparison of the communicative roles that images play when they are combined with words to create meaning in authentic text contexts (the news sites), and when they are combined with words in non-authentic contexts (the classroom-created report, which the students create in part two of the task). This communicative role is fundamentally different, as explained below.

The communicative role of the images in news sites (at least those that have been analyzed in the writer’s classroom) was to set the scene for the readers before they read the written text, to connect the readers to a key orienting point or concept in the written text, or to enhance or embellish one part of the written text. To accomplish these goals, a complex combination of different types of images was used with different types of graphic/written text relations.

The most common type of image used to perform these functions was the concept/offer image, similar to the ones shown in Appendix 1, Figure 4, slides 1 and 5. Concept/demand images, similar to the images found in Panels 2, 3, and 4 of the Macbeth text, were found in stories that were intended to elicit an emotional response, to shock, or to stimulate the reader. For instance, a close-up image of a participant laughing or crying would be used to convey joy or sadness. Narrative/event images, which might be expected to be common in news stories, were relatively rare. This observation is understandable: the majority of stories were not breaking news, and the texts provided details about events that had already occurred, so narrative/event images might not be relevant. Furthermore, the practical difficulty of obtaining narrative/event images of actual events between the time they happened and press time may also account for their limited use.

In the newspaper sites’ use of graphic/written text relationships of concurrence and complementation, the latter was more common than the former. Concurrence had two main functions: to draw the viewer’s attention to a main participant (for example, the government, the army, or an actor) and to create foregrounding, as pointed out previously. The limited use of concurrence, and the frequency of relationships of complementation, raises student awareness that although news sites are image-rich texts compared to traditional printed texts, the alphabetic text still carries the main illocutionary force of the text, because words are more efficient at conveying detailed meaning than image-based texts in this context and in many other authentic text contexts.

Unlike in classroom-based texts, overusing concurrent relationships in authentic texts would create redundancy. Although strong relationships of concurrence do occur in some authentic texts (e.g., in children's stories, where reiteration is comforting for young learners, or in the visual instructions that accompany the assembling of household objects) (Stenglin & Iedema, 2001), most L1 texts are proscriptive. That is, the texts drive the reader forward, and as explained above, the image-text is therefore used to augment or add information to the written text. Hence, concurrence occurred in news sites to create foregrounding.

Language students, reared in the use of concurrent images that provide linguistic support, may need teacher support when reading such online news sites and other image-rich L1 homepages, because the texts may create miscomprehension or incomprehension. Indeed, compared with traditional printed press reading paths, online reading paths are very challenging: students must navigate a wide array of image-texts, alphabetic texts, hyperlinks, and advertisements. Depending on the student’s proficiency level and experience, this process may be a very difficult task.

In contrast to the authentic text, the communicative role of images in the student report is to support linguistic understanding. Therefore, the types of images that students choose and the role that images play in relation to the spoken text will be fundamentally different from the authentic text. In the report, students can be encouraged to create multi-modal texts that have strong concurrent relationships, to enable ease of comprehension in an L2 audience setting, and that follow clear academic patterns (Sustainable Resource 1 discussed above provides an example of the type of model students could follow). This contrasts with the newspaper genre, in which image-text relations are far more complex. Finally, in creating the classroom report, not only are students working with new multi-modal skills, but they are also applying skills taught in the reading and writing curriculum, such as autonomous research, summarizing, note taking, paraphrasing, critical thinking, and citing sources.

4.3. Sustainable CALL resource 3: evaluating multi-modal materials

The multi-modal meta-language, outlined above and summarized in Tables 1 and 2, gives teachers the tools, not only to create sustainable resources from their existing classroom practices, but also to evaluate multi-modal classroom materials, such as publisher-created videos, software presentations, and online materials adopted for classroom use, for their sustainability in relation to the existing curricula. Table 3 provides examples of questions that might be included in such an evaluation process. The goal of the questions is to evaluate whether the digital materials support the goals of the existing curricula and whether they diverge from the curricula in ways that are inappropriate.

  • What curriculum goals are the multi-modal features intended to support?
  • Are the multi-modal relations of concurrence and complementation appropriate for the levels being taught, or will the visuals create redundancy, miscomprehension, and/or incomprehension?
  • How effectively does the video involve the students in the text through its use of demand/offer concept/narrative images?
  • Is the visual component composed in a way that is appropriate for the students’ level, age, and learner type?
  • Is the visual component composed in a way that is appropriate for the institution?

Table 3. Multi-modal questions for material evaluation.

For example, in the established curricula in my teaching context, the World Link Textbook series uses a video course book to expand on the textbook materials and recycle the linguistic components in natural settings and situations (Stempleski, 2013). In a short excerpt (see script in Figure 4 of the Appendix) from Video Course Workbook 2 (Unit 1, City Living, pp. 8-9), the lesson reviews the past tense of verbs using a discussion on keepsakes. In this example, the keepsake that triggers the recollection of Tara, a character in the video, is a pendant. As Table 4 shows, demand/concept images, such as close-ups of the pendant and Tara’s face, accompany the key communicative phrases of the script: “it’s a pendant from my grandmother,” and “she gave it to me when I was 18 years old”.


| Script: Verbal Text | Visual Text |
| 1) Sun-hee: How about this? | Concept/demand image showing a close-up of the pendant |
| 2) Tara: Now that is my favorite keepsake. It’s a pendant from my grandmother. She gave it to me when I was 18 years old. | Concept/demand image showing a close-up of the pendant; concept/offer image showing a close-up of Tara’s face |
| 3) Sun-hee: For your birthday? | Concept/demand image showing a close-up of Sun-hee’s face |
| 4) Tara: No. It was in my first year of college and things were rough. I had no friends. I hated my classes. I did not think I could make it. And one day my grandmother told me a story. | Concept/demand image showing a close-up of Tara’s face |

Table 4. Text excerpts from World Link Video Course Book.
Textbook Extract Used with permission © Cengage Learning.

In this scene, the demand/concept images, similar to the face-to-face close-ups in Panels 2, 3, and 4 of the Macbeth text (Figure 1), create interactions between Tara, who is re-telling the story, and the viewers by clearly focusing the viewers on the speakers and the keepsake at a key point in the text. The images allow viewers to identify with the speaker emotionally, thus reinforcing the communicative component of the lesson, which is the expression of the emotional value of keepsakes. Moreover, the demand/concept images allow students to pick out the variations in facial expression and intonation that the actress uses when expressing the key phrases, providing clear models for students to replicate when re-telling their own past stories. In a traditional textbook in which students listen to an audio recording with minimal visual support, such emotional content is very difficult to establish.

Furthermore, the video does not overuse concurrent verbal/visual text relations that could make the linguistic goals of past tense re-telling redundant. The past tense recollection story (Table 4) relies on spoken text re-telling; the concept/demand images do not reiterate the key events in the story. This verbal/visual text relationship creates a positive teaching opportunity because, given that past tense re-telling is the linguistic aim of the lesson, the use of concurrent relationships would create redundancy at this level.

Developed through an appropriate pedagogical language, this ability to evaluate the extent to which multi-modal resources are appropriate for teaching linguistic goals is a key feature in creating sustainable materials in the long term. For example, digital games are recommended for educational use because games have interactive features that create intrinsic motivational factors lacking in traditional classroom textbook materials. Such factors include encouraging participation through player investment in characters and game development, creating opportunities for player decision making, systems of reward and merit, competition, and interacting storytelling with play (Miller, 2004, pp. 198-199). Nonetheless, games that employ these motivational features are not currently available for language learning contexts. In addition, the extent to which current video games on the market are directly beneficial to the curricula is debatable (Gee, 2011).

The multi-modal pedagogical language outlined above can be used by designers and material developers to aid in the creation of a new generation of classroom and/or self-study materials that incorporate the motivational features of gaming, while ensuring that the materials are relevant for L2 contexts. For example, when digital computer games are created, a traditional alphabetic-based script accompanies the digital script. Knowledge of textual relations and their effects on comprehension at different levels of proficiency can ensure that the scripts for language learning digital games are appropriate for students’ levels. Similarly, understanding how images are composed to create different emotional reactions in viewers can help developers design visual/alphabetic interfaces that support linguistic goals while also providing stimulating multi-modal features.

5. Conclusion

Given the demands of working with institutionally created curricula, one of the most challenging questions confronting language teachers is an opportunity cost question: will sending your students to the computer room be an appropriate use of classroom time? The concept of sustainability that is outlined in this paper is designed to address this question. If teachers have a multi-modal pedagogical language available to them, class activities such as the creation of a multi-modal composition need not be regarded as separate or distinct from teaching the established curricula.

Nevertheless, the primary focus of this paper is short-term adaptability to overcome situational constraints; thus, teachers use their existing pedagogical knowledge, coupled with SFG multi-modal pedagogical knowledge, to create sustainable classroom resources for CALL. In the long term, however, teachers cannot overcome situational constraints individually. Moreover, the SFG model alone is not sufficient to address the challenges of preparing students for effective communication in digital environments. To achieve long-term sustainability, researchers, practitioners, curricula developers, classroom material designers and textbook publishers must develop a pedagogical language that embraces a new multi-disciplinary approach to language and learning for the digital age.


Appendix. Example of sustainable resource 1: multi-modal conversion activity.

1. The country is better than the city because there is a lot of pollution in the city. 2. In the city there are many types of pollution: noise, tobacco smoke, gas exhaust, and acid rain. 3. Pollution is bad for our health and puts humans at risk of diseases such as cancer. 4. The countryside is free from pollution and there is less risk of disease. 5. For this reason, I prefer to live in the countryside than in the city.

Figure 1. Teacher-created Written Paragraph.


| Sentence Number | Type of Sentence | |
| 1 | Topic Sentence | Introduce the general idea |
| 2 | Supporting Sentence | Support general idea with an example |
| 3 | Supporting Sentence | Support general idea with an explanation/example |
| 4 | Supporting Sentence | Support general idea with an explanation |
| 5 | Concluding Sentence | Repeat the main idea |

Figure 2. Teacher-created Written Paragraph Deconstructed.



Type of Image


Verbal/Visual Textual Relation



Introduce the main idea




Support general idea with an example




Support general idea with an example




Support general idea with an example




Repeat the main idea


Figure 3. Teacher-created Presentation Deconstructed.



Figure 4. Teacher-created presentation.



References

Collins, A. & Halverson, R. (2009). Rethinking education in the age of technology. New York: Teachers College Press.

Cuban, L. (2001). Oversold and underused: computers in the classroom. Cambridge, MA: Harvard University Press.

DeVoss, D.N., Eidman-Aadahl, E., & Hicks, T. (2010). Because Digital Writing Matters. CA: John Wiley & Sons.

Fries, P.H. (1994). On Theme, Rheme and Discourse Goals. In M. Coulthard (Ed.), Advances in Written Text Analysis. (pp. 229-249). New York: Routledge.

Gee, J.P. (2011). Reflections On Empirical Evidence On Games and Learning. In S. Tobias & J.D. Fletcher (Eds.), Computer Games and Instruction. (pp. 223-232). Charlotte, NC: Information Age Publishing.

Hagood, M. (2008). Intersections of Popular Culture, Identities, and New Literacies. In J. Coiro et al (eds). Handbook of Research into New Literacies. (pp. 377-407). New York: Erlbaum.

Halliday, M.A.K. & Matthiessen, C.M.I.M. (2004). An Introduction to Functional Grammar. London: Hodder and Arnold.

Hoey, M. (2001). Textual Interaction. Oxon: Routledge.

Kennedy, C. (2013). Models Of Change and Innovation. In K. Hyland & C. Wong (Eds.), Innovation and Change In English Language Education. (pp. 13-27). Oxon: Routledge.

Kress, G. (2003). Literacy in the New Media Age. Oxon: Routledge.

Kress, G. & Van Leeuwen, T. (2006). Reading Images. London: Routledge.

Liu, J. (2004). Effects of comic strips on L2 learners' reading comprehension. TESOL Quarterly, 38(2), 225-243.

Luke, C. (2001). Connectivity, Multimodality and Interdisciplinarity. Reading Research Quarterly, 38(3), 397-403.

Marchand, T. (2013). Speech in written form? A corpus analysis of computer-mediated communication. Linguistic Research, 30(2), 217-242.

McCloud, S. (1994). Understanding Comics: The Invisible Art. New York: Harper Collins.

McDonald, J., Haward, J., Dobbin, N., & Erskine, G. (2008). Macbeth: The Graphic Novel. Bristol: Classical Comics Ltd.

Miller, C.H. (2004). Digital Storytelling. Oxford: Elsevier.

Unsworth, L. (2008). Multiliteracies and Metalanguage: Describing Image Text Relations as a Resource for Negotiating Multimodal Texts. In J. Coiro et al. (eds). Handbook of Research into New Literacies. (pp. 377-407). New York: Erlbaum.

Royce, T. (2002). Multi-modality in the TESOL classroom. TESOL Quarterly, 36(2), 191-205.

Rutherford, W.E. (1987). Second Language Grammar and Teaching. New York: Pearson Education.

Stempleski, S. (2013). World Link: Developing English Fluency. Singapore: Cengage.

Stenglin, M. & Iedema, R. (2001). How to Analyse Visual Images: A Guide for TESOL Teachers. In A. Burns & C. Coffin (Eds.), Analyzing English in a Global Context. (pp. 194-208). London: Routledge.

Selfe, C.L. (2007). Multi-Modal Composition. NJ: Hampton Press.


This journal is licensed under a  Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

Universitat Politècnica de València

e-ISSN: 1695-2618