The science of learning with videos
Videos revolutionized learning but not always for the better. What does the science say? Mayer’s now-classic principles show that true innovation comes from cognitive science, not just flashy media.
Videos were once the future of learning. They’re now its present and, although they have lost their novelty, there is no sign of them going away.
According to data from 2017, learning-related videos alone get 500 million views per day on YouTube. That’s half a billion. Per day. The popularity of this format is beyond question, and the reasons seem pretty evident: a video can do what text and images can’t. It moves, it talks, it shows.
That richness still tricks people into thinking any video lesson will naturally improve learning outcomes. Anyone who has sat through a weak explainer knows the opposite. The effectiveness of a learning video has little to do with the fact that it’s a video and has everything to do with how its design fits the limits of human cognition and how we learn.
The principles of multimedia learning
Richard Mayer’s cognitive theory of multimedia learning explains why some videos teach and others merely look modern. His 12 evidence-based principles apply beyond video, shaping all forms of multimedia.
Mayer first formulated these principles in the early 2000s, before learning videos became mainstream. YouTube didn’t even exist yet. He drew on decades of studies and led his own experiments to identify which choices in audiovisual design support learning. He laid them out in his now-classic work Multimedia Learning, and later in the Cambridge Handbook of Multimedia Learning.
What’s fascinating about them is how specific they are: they give practical guidelines, such as placing words next to images, and things to avoid, such as reading aloud text that is already on the screen. This clarity explains their popularity: Mayer’s principles are now an indispensable reference for every instructional designer.
The 12 principles rest on the concept of dual coding: learning improves when auditory information (the words we hear) and visual information (the pictures and text we see) work together. Each channel has limited capacity, so how we split information between them matters.
These ideas connect directly to cognitive load theory, which splits mental effort into three types. Mayer mapped his principles onto these types: five principles for reducing unnecessary load, three for managing complexity and four for strengthening productive effort. Let’s look at them!
Reduce extraneous load: remove distraction
Extraneous load is the bad kind of mental effort: it comes from mental work that doesn't support the learning goal. Poor multimedia design creates a lot of it. Five principles target this problem:
1) Coherence principle: remove extraneous details, visuals or sounds.
2) Signaling principle: add cues that highlight essential structure.
3) Redundancy principle: avoid combining narration, identical on-screen text and graphics.
4) Spatial contiguity principle: place text and visuals close together.
5) Temporal contiguity principle: present words and pictures at the same time.
Coherence principle
We often think of decorative visuals or background music as a nice-to-have. They aren’t! When learners must process irrelevant sounds, images that are mere decoration or mismatched text and visuals, they lose capacity needed for comprehension.
Coherence goes beyond sounds or visuals. Adding seductive details like fun facts to make a video more engaging can backfire because learners remember the trivia instead of the lesson. Mayer illustrated this with a study on teaching how viruses cause colds: including a note that “people who make love once or twice a week have better immunity” reduced learning rather than improving it.
Participants rated the fact that regular sex helps prevent colds as the most interesting part. But those who remembered it couldn’t link it to how viruses cause colds, and when the fact was removed, actual learning improved.
Unsurprisingly, Mayer explains, people find topics such as sex or death to be the most interesting, but adding fun facts about these juicy topics doesn’t carry over to the learning of other topics, even if they’re somewhat related. Too bad!
Signaling principle
Think of the signaling principle as the instructional designer’s equivalent of putting a neon arrow over “The Important Thing™” so learners don’t wander off. Good cues act like breadcrumbs for the brain, guiding attention to the structure that actually matters. Don’t just describe a part of a visual on screen and expect the learner to find their way to it: use an arrow or a circle to highlight what you’re talking about, and add section headings, overviews, summaries, etc. Without cues, your screen is a visual maze for non-experts!
Redundancy principle
The redundancy principle is another counterintuitive case. It seems logical to think that narrating and showing the same text would reinforce learning. After all, aren’t we all used to reading captions on Instagram reels and subtitles on Netflix series nowadays? Yet when a lesson includes graphics, research consistently shows the opposite.
When learners read on-screen text while hearing the same words, the visual channel becomes overloaded, since the written text is competing with the rest of the on-screen visuals. As a result, comprehension drops.
This isn't a contradiction of dual coding. Written text and images share the visual channel, while speech uses the auditory one. Learning improves when channels provide complementary information and declines when both present the same verbal content.
There are cases where redundancy can support learning: when the text appears next to the corresponding visual and when it highlights only key elements (rather than duplicating the narration). In other words, this principle doesn’t rule out key terms, cues, headings or summaries that support understanding.
Subtitles on video platforms such as TikTok, Instagram or Netflix aren't counterexamples. These services aim for attention or accessibility (in noisy or multilingual settings), not for learning.
Split-attention principles
The two principles on spatial and temporal “contiguity” address split attention (they’re often treated as a single principle): visuals and their explanations appearing far apart or out of sync. The effort learners spend hunting for the explanation that matches a diagram or graph is wasted and takes a toll on understanding. Use the audio channel (voiceover) to explain the visual while it’s on screen. In other words, different sources of information should be integrated in space and time.
Manage intrinsic load: support complexity
Some material is simply complex in itself. Intrinsic or essential load depends on the topic and how much the learner already knows about it (prior knowledge). Three principles help manage this complexity:
6) Segmenting principle: break content into learner-controlled units.
7) Pre-training principle: introduce key elements before the main lesson.
8) Modality principle: use narration rather than on-screen text with visuals.
Segmenting lets learners control the pacing. Segmented videos that follow the principles of microlearning let learners pause, rewind and consolidate, free from a forced processing speed. Most video platforms already offer playback controls, so the key lies in the length of each segment and its alignment with learning outcomes (proper chunking).
Pre-training prepares learners by introducing the key elements before the main lesson. When learners know the names and functions of parts in a system, they can devote more capacity to understanding how the system works.
The modality principle (what I call the anti-PowerPoint principle) sounds similar to the redundancy principle but isn’t the same. It says that when explaining a visual, audio narration is better than on-screen text. Say goodbye to those long blocks of text explaining a visual! Your audience will thank you.
Foster germane load: deepen understanding
Once extraneous load is minimized and essential load is manageable, designers can encourage deeper cognitive processing. Another four principles support this stage:
9) Multimedia principle: combine words and relevant pictures.
10) Personalization principle: use conversational style.
11) Voice principle: use a human voice instead of a synthetic one.
12) Image principle: adding the speaker’s image rarely helps.
The multimedia principle states that people learn better from words and relevant pictures than from words alone. This principle sums up the idea behind dual coding in a nutshell. If you combine words and visuals, learning increases.
The personalization and voice principles reflect social agency. Learners engage more deeply when instruction feels like a human speaking directly to them rather than a formal script or artificial voice. The language used should be conversational and the voice shouldn’t sound robotic! Even LLMs apply these principles now.
The image principle is a surprising outlier! Showing the instructor’s face doesn't reliably improve learning and often harms it. A talking head consumes visual capacity and attention without adding instructional value. Don’t take it personally, but the movement of your face won’t add to the understanding of any topic, unless we’re discussing anatomy, and even then…
The image principle challenges our inherited idea of an ideal lesson. It shows that in-person formats like TED Talks or conference presentations aren’t the best learning environment and that well-designed learning videos can outperform them.
Myths, misconceptions and reversals
Mayer’s research overturns multimedia myths by showing counterintuitive findings: “more” media doesn’t always mean better learning, subtitles can hinder unless needed, talking heads don’t necessarily improve outcomes, etc.
But do his principles always work? Mayer often talks about an important boundary condition: the expertise reversal effect. These principles apply mainly to beginners because once learners have strong mental models, extra guidance becomes redundant and uses up cognitive capacity.
Also consider that more recent revisions of the framework add further principles: embodiment, generative activity, immersion, collaboration, etc. So the list definitely isn’t a closed one; it keeps evolving as new research emerges.
The modern landscape: videos, AI tutors and beyond
Here are some examples of platforms and channels that excel at implementing Mayer’s principles:
Kurzgesagt - In a Nutshell: Clear scientific explanations paired with high-end motion design. Strong signaling, tight narration and solid storytelling, though the playful style can add some extraneous load.
Art of the Problem: Deep conceptual focus with minimal, purposeful visuals. Strong alignment with Mayer’s principles through clean coherence, synchrony and controlled pacing.
Khan Academy: Function over flair. The handwriting approach keeps load low, uses effective modality and signaling, and supports learners with short segments and practice.
As we’ve seen, videos remain important in digital learning, even if multimedia design now includes interactive animations, simulations, AR, VR and increasingly AI tutors. Mayer’s principles apply across these formats because they tie technological possibility to how learning actually works within the limits of human cognition.
Educational innovation succeeds when it follows cognitive science. The key question for any new tool is not whether it’s new but whether it aligns with how we learn.
Because tools evolve, but our working memory doesn’t.
Keep learning
Prompt suggestions (always ask follow-up questions):
I want to make an educational video. Can you help me apply Mayer’s 12 multimedia principles to my script and visuals? Start by asking about my topic and audience.
Explain the difference between extraneous, intrinsic and germane cognitive load using short examples from video learning. Then test me with 5 retrieval questions, one at a time.
Many creators think “more visuals = better learning.” Can you show me how this belief conflicts with Mayer’s research and help me redesign a short explainer script to follow the coherence and signaling principles instead?
Links
▶️ A lecture by Mayer at Harvard explaining the principles: This 2014 video contains the full, hour-and-a-half-long recording of a lecture by Mayer himself, in which he explains an early version of the principles in a very accessible way. I can’t think of a better way to spend 90 minutes than watching this!
📑 Introduction to Multimedia Learning: Here you can read the introduction to the 3rd and most recent edition of the Cambridge Handbook of Multimedia Learning for free. It’s a great first step toward reading Mayer’s work directly. Spoiler: he starts by showing an example of multimedia instruction from 1657!