This article is based on an extract from a chapter in Dunlosky J and Rawson K (eds) The Cambridge Handbook of Cognition and Education. New York: Cambridge University Press.

Over the past 30 years, educational and cognitive psychology have amassed encouraging evidence that human understanding can be improved substantially when we add appropriate graphics to text. In short, people learn better from words and pictures than from words alone. This article explores the potential of this multimedia principle for improving how people understand communications about academic content, as measured by their ability to take what they have learned and apply it to new situations (i.e. to solve transfer problems).

Multimedia instruction

Multimedia instruction (or a multimedia instructional message) refers to a lesson containing both words and pictures, where the words can be in spoken form or printed form and the pictures can be in static form (such as illustrations, charts, graphs or photos) or dynamic form (such as animation or video). Multimedia instruction can be presented in books, in live slideshow presentations, in e-learning on computers, or even in video games or virtual reality.

In multimedia learning, pictures do not replace words, but rather work together with words to form an instructional message that results in deeper understanding. For example, consider a verbal description of how a bicycle pump works. After students listen to an explanation, they are not able to generate many useful answers to transfer questions such as the troubleshooting question, ‘Suppose you push down and pull up several times but no air comes out. What could have gone wrong?’ (Mayer and Anderson, 1991). However, if we add a simple animation depicting the movement of the handle, piston and valves in a pump, in sync with the narration, students are able to generate more than twice as many useful answers.

How the multimedia principle works

The cognitive theory of multimedia learning is based on three key ideas from cognitive science:

  • Dual-channel principle: The human information processing system contains separate channels for verbal and pictorial information (Baddeley, 1992).
  • Limited capacity principle: Only a few items can be processed in a channel at any one time (Baddeley, 1992).
  • Active processing principle: Meaningful learning requires appropriate cognitive processing during learning, including attending to relevant information, mentally organising it into a coherent structure, and integrating it with relevant prior knowledge (Mayer, 2009).

Overall, in multimedia instruction, meaningful learning occurs when the learner selects relevant words and images from the multimedia message for further processing in working memory, mentally organises the words into a coherent structure (or verbal model) and the images into a coherent structure (or pictorial model), and integrates the verbal and pictorial representations with each other and with relevant prior knowledge activated from long-term memory. The main challenge in teaching is to guide learners to engage in these processes, while not overloading their limited processing capacity in each channel of working memory. Designing effective multimedia instruction requires not only presenting the relevant material, but also guiding the learner’s cognitive processing of the material.

Implications of the multimedia principle for the classroom

In attempting to apply the multimedia principle in practice, it becomes clear that some ways of incorporating graphics are more effective than others. Table 1 lists 11 evidence-based principles for the design of multimedia instruction. The first column gives the name of the principle; the second column gives a brief description of the principle; the third column lists the median effect size based on published experiments comparing the transfer test performance of students who learned with the standard version of the lesson versus those who learned with an enhanced version; and the fourth column shows the number of experiments showing a positive effect out of the total number of experiments. We focus on principles that yield median effect sizes greater than d = 0.40, which is considered substantial enough to be practically important for education (Hattie, 2009).
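As an aside on how these effect sizes are derived: Cohen's d expresses the difference between two group means in units of their pooled standard deviation, so d = 0.40 means the enhanced group scored about 0.4 standard deviations higher than the standard group. The following sketch computes it for two small sets of hypothetical transfer-test scores (the scores are invented for illustration, not taken from any of the studies cited here):

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Cohen's d: standardised mean difference using the pooled sample SD."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd

# Hypothetical transfer-test scores (out of 20) for two groups of learners
enhanced = [13, 15, 12, 16, 11, 14, 13, 12]
standard = [12, 13, 11, 14, 12, 13, 11, 12]
print(round(cohens_d(enhanced, standard), 2))  # prints 0.72
```

Here a one-point mean advantage for the enhanced group corresponds to a moderately large effect (d ≈ 0.72), above the d = 0.40 threshold used in this article.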

Extraneous processing

The first five principles address the goal of reducing extraneous processing, which is cognitive processing during learning that does not support the instructional goal. Working memory capacity is limited, so if a learner allocates too much cognitive processing capacity to extraneous processing, there will not be enough cognitive capacity left to fully engage in essential processing (i.e. cognitive processing aimed at mentally representing the essential information in working memory) and generative processing (i.e. cognitive processing aimed at reorganising the material and integrating it with relevant knowledge activated from long-term memory).

The coherence principle is that people learn better when extraneous material is excluded rather than included (Mayer, 2009; Mayer and Fiorella, 2014). Extraneous material includes unneeded detail in graphics, background music, or interesting but irrelevant facts in the text. More learning occurs when the instructional message is kept as simple as possible.

The signaling principle is that people learn better when essential material is highlighted (van Gog, 2014). Highlighting of printed text can involve the use of colour, underlining, bold, italics, font size, font style or repetition. Highlighting of spoken text can involve speaking louder or with more emphasis. Highlighting of graphics includes the use of arrows, colour, flashing and spotlights.

The spatial contiguity principle is that people learn better when printed words are placed near to, rather than far from, corresponding graphics (Ayres and Sweller, 2014). Johnson and Mayer (2012) reported that students performed substantially better on transfer tests when they received integrated presentations (the words placed near the part of the graphic they describe) rather than separated presentations (the words presented as a caption at the bottom of the page or screen), even though the words and graphics were identical.

The temporal contiguity principle is that people learn better from a narrated lesson when the spoken words are presented simultaneously with the corresponding graphics (such as drawings, animation or video) rather than successively, with the spoken words presented before (or after) the graphics (Ginns, 2006).

The redundancy principle is that people learn better from narration and graphics than from narration, graphics and redundant text (Adesope and Nesbit, 2012).

Essential processing

The next three principles in Table 1 are aimed at managing essential processing (i.e. cognitive processing for mentally representing the essential material in working memory). When the material is complex for the learner, the amount of essential processing required to mentally represent the material may overload working memory capacity. In this case, the learner needs to be able to manage his or her processing capacity in a way that allows for representing the essential material.

The segmenting principle calls for breaking a multimedia lesson into manageable parts (Mayer and Pilegard, 2014). For example, rather than presenting a 2.5-minute narrated animation on lightning formation as a continuous presentation, break it into short segments and allow the learner to click to go to the next segment, enabling them to digest one step in the process of lightning formation before going on to the next one.

The pretraining principle calls for teaching students about the names and characteristics of key elements before presenting the multimedia lesson (Mayer and Pilegard, 2014). For example, before presenting a narrated animation depicting how a car’s braking system works, students can be presented with a diagram of the braking system showing the key parts, e.g. brake pedal, piston, wheel cylinders and brake shoes.

The modality principle is that people learn better from multimedia presentations when the words are spoken rather than printed (Low and Sweller, 2014), so the visual channel does not become overloaded by having to process both graphics and printed words.

Generative processing

The final three principles in Table 1 are intended to foster generative processing, that is, cognitive processing aimed at making sense of the presented material. Even if cognitive capacity is available, learners may not be motivated to use it to process the material deeply. Social cues can help motivate learners to engage in deeper processing because people tend to want to understand what a communication partner is telling them. Thus, principles based on social cues are intended to make learners feel as if they are in a conversation with the teacher, that is, they feel that the teacher is a social partner. This approach yields the newest of the multimedia design principles, including using conversational language (personalisation principle), using an appealing human voice (voice principle), and using human-like gestures (embodiment principle).

The personalisation principle is that people learn better from a multimedia lesson when the words are in conversational style rather than formal style (Ginns et al., 2013). For example, the words from a lesson on how the human respiratory system works could be presented in third person form (e.g. ‘the lungs’) or in first and second person form (e.g. ‘your lungs’).

The voice principle is that people learn better from multimedia lessons involving spoken words when the narrator has an appealing human voice rather than a machine voice or unappealing voice (Mayer, 2014). An important boundary condition is that the positive impact of human voice can be overturned by the use of negative social cues such as presenting an onscreen agent that does not engage in humanlike gesturing (Mayer and DaPra, 2012).

The embodiment principle is that people learn better from multimedia lessons in which an onscreen agent or instructor uses humanlike gestures (Mayer, 2014). For example, Mayer and DaPra (2012) presented students with a narrated slideshow lesson in which an onscreen animated pedagogical agent stood next to the slide and either displayed humanlike gestures or did not move during the lesson. Students learned better when the onscreen agent used humanlike gestures.

Boundary conditions

Each of the 11 evidence-based principles has important boundary conditions, largely consistent with the cognitive theory of multimedia learning. Some principles may apply more or less strongly, or have weaker or stronger effects, depending on, for example, working memory capacity, level of prior knowledge and complexity of the material being presented.

Multimedia learning principles in practice

What happens when we combine these principles within the context of an actual classroom? Issa et al. (2013) compared beginning medical students who learned from a standard slideshow lesson with those who learned from a lesson in which the slides were modified based on multimedia design principles such as those in Table 1. On a transfer test administered 4 weeks later, students in the modified group outperformed those in the standard group with an effect size of d = 1.17, even though the content was the same. This study, and similar ones (Harskamp et al., 2007; Issa et al., 2011), suggest that applying multimedia principles to the design of classroom instruction can greatly increase student learning.


Adesope O and Nesbit J (2012) Verbal redundancy in multimedia learning environments: A meta-analysis. Journal of Educational Psychology (104): 250–263.
Ayres P and Sweller J (2014) The split-attention principle in multimedia learning. In: Mayer R (ed.) The Cambridge Handbook of Multimedia Learning. 2nd ed. New York: Cambridge University Press, pp. 206–226.
Baddeley A (1992) Working memory. Science (255): 556–559.
Ginns P (2006) Integrating information: A meta-analysis of spatial contiguity and temporal contiguity effects. Learning and Instruction (16): 511–525.
Ginns P, Marin A and Marsh H (2013) Designing instructional text for conversational style: A meta-analysis. Educational Psychology Review (25): 445–472.
Harskamp E, Mayer R and Suhre C (2007) Does the modality principle for multimedia learning apply to science classrooms? Learning and Instruction (17): 465–477.
Hattie J (2009) Visible Learning. New York: Routledge.
Issa N, Schuller M, Santacaterina S, et al. (2011) Applying multimedia design principles enhances learning in medical education. Medical Education (45): 818–826.
Issa N, Mayer R, Schuller M, et al. (2013) Teaching for understanding in medical classrooms using multimedia design principles. Medical Education (47): 388–396.
Johnson C and Mayer R (2012) An eye movement analysis of the spatial contiguity effect in multimedia learning. Journal of Experimental Psychology: Applied (18): 178–191.
Low R and Sweller J (2014) The modality principle in multimedia learning. In: Mayer R (ed.) The Cambridge Handbook of Multimedia Learning. 2nd ed. New York: Cambridge University Press, pp. 227–246.
Mayer R (2009) Multimedia Learning. 2nd ed. New York: Cambridge University Press.
Mayer R and DaPra C (2012) An embodiment effect in computer-based learning with animated pedagogical agents. Journal of Experimental Psychology: Applied (18): 239–252.
Mayer R and Fiorella L (2014) Principles for reducing extraneous processing in multimedia learning: Coherence, signaling, redundancy, spatial contiguity, and temporal contiguity. In: Mayer R (ed.) The Cambridge Handbook of Multimedia Learning. 2nd ed. New York: Cambridge University Press, pp. 279–315.
Mayer R and Pilegard C (2014) Principles for managing essential processing in multimedia learning: Segmenting, pretraining, and modality principles. In: Mayer R (ed.) The Cambridge Handbook of Multimedia Learning. 2nd ed. New York: Cambridge University Press, pp. 316–344.
Mayer R (2014) Principles based on social cues in multimedia learning: Personalization, voice, image, and embodiment principles. In: Mayer R (ed.) The Cambridge Handbook of Multimedia Learning. 2nd ed. New York: Cambridge University Press, pp. 345–368.
Mayer R and Anderson R (1991) Animations need narrations: An experimental test of a dual-coding hypothesis. Journal of Educational Psychology (83): 484–490.
van Gog T (2014) The signaling (or cueing) principle in multimedia learning. In: Mayer R (ed.) The Cambridge Handbook of Multimedia Learning. 2nd ed. New York: Cambridge University Press, pp. 263–278.