The Testing Effect: Why Flashcards Work and Re-Reading Doesn't

Here’s an experiment you can run on yourself right now.

Take a list of 20 foreign words. Split them into two groups of 10. For the first group, read each word and its translation four times. For the second group, read each word once, then try to recall the translation from memory three times — checking only after you’ve attempted to remember.

Come back tomorrow and test yourself on all 20.
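If you would rather not hand-pick which words go in which group, a few lines of Python can make the split for you. This is only a minimal sketch: the function name `split_for_experiment`, the fixed seed, and the placeholder words are illustrative assumptions, not part of any published protocol.

```python
import random

def split_for_experiment(words, seed=0):
    """Randomly split a word list into a re-reading group and a retrieval group."""
    if len(words) % 2 != 0:
        raise ValueError("use an even number of words so the groups are equal")
    shuffled = list(words)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return {"reread": shuffled[:half], "retrieve": shuffled[half:]}

# Placeholder items; substitute 20 real word/translation pairs.
words = [f"word{i}" for i in range(1, 21)]
groups = split_for_experiment(words)
print("Read four times:", groups["reread"])
print("Read once, recall three times:", groups["retrieve"])
```

Random assignment keeps you from unconsciously putting the easier words in one group.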

The second group will almost certainly win, and by a clear margin.

This is the testing effect — one of the most robust, most replicated, and most ignored findings in the science of learning. And if you’re studying a language, it should fundamentally change how you spend your study time.

The Experiment That Changed Everything

In 2006, cognitive psychologists Henry Roediger III and Jeffrey Karpicke published a study that should have rewritten every textbook on how to study. The design was elegant in its simplicity.

Participants read short prose passages. Then they were split into two groups:

Group 1 (Study-Study): Studied the passage a second time.
Group 2 (Study-Test): Took a free recall test instead, writing down everything they could remember, with no feedback.

Both groups spent the same total amount of time. The only difference was how that time was used: re-reading versus attempting to recall.

Five minutes later, Group 1 performed slightly better. This is the result that feels right — of course re-reading helps, you’re seeing the material again.

But then Roediger and Karpicke tested both groups one week later.

Group 2, the ones who had practiced recalling, remembered significantly more than Group 1, and the advantage wasn’t marginal. Across the paper’s experiments, the retrieval groups recalled roughly a third to a half more material after one week than the re-reading groups.

Let that sink in. The group that spent less time looking at the material and more time struggling to remember it had dramatically superior long-term retention.

This wasn’t a one-off finding. A subsequent meta-analysis by Rowland (2014), covering 159 experimental comparisons, confirmed that retrieval practice produces a moderate to large positive effect on long-term retention across a wide range of materials, populations, and conditions. The testing effect is one of the most reliable phenomena in cognitive psychology.

Why Retrieval Beats Re-Exposure

The intuition behind re-reading is seductive: if I look at it more, I’ll remember it better. More exposure = more learning. It seems logical.

It’s wrong.

The problem is that re-reading creates what psychologists call fluency illusions. When you re-read a word and its translation, a feeling of familiarity washes over you. “Oh yes, kudasai means please. I know this.” The information feels accessible. Your brain interprets that fluency as knowledge.

But familiarity is not the same as retrievability. Recognizing something when you see it is fundamentally different from producing it when you need it. And in a real conversation, you need to produce — not recognize. Nobody holds up a flashcard during a conversation in Tokyo.

Retrieval practice works because it strengthens a completely different process: the search and reconstruction pathway. When you try to recall a word from memory — actively searching for it, struggling, sometimes failing — you’re exercising the exact neural pathway you’ll need in real use. Each successful retrieval strengthens that pathway. Each failed retrieval followed by feedback tells your brain “this is important, store it more durably.”

The neuroscience points the same way. Neuroimaging studies suggest that retrieval engages the hippocampus, the brain’s memory consolidation hub, more intensely than passive re-exposure, and that activity during successful retrieval predicts later retention. The behavioral evidence is firmer: Karpicke and Roediger (2008) found that repeated retrieval, not repeated study, was what drove long-term retention of foreign vocabulary. The brain appears to encode memories differently when you recall them versus when you merely re-encounter them.

Robert Bjork, one of the pioneers of desirable difficulty theory, explains it this way: storage strength (how well something is encoded) and retrieval strength (how easily you can access it right now) are separate dimensions. Re-reading increases retrieval strength temporarily — the information is right there in front of you, so it feels accessible. But it does almost nothing for storage strength. Retrieval practice, by contrast, directly increases storage strength — even when retrieval strength is low and the recall feels effortful.

This is the core paradox: the harder it is to remember something, the more you benefit from trying to remember it. Struggle isn’t a sign that learning has failed. It’s the mechanism through which learning happens.
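The paradox can be made concrete with a toy calculation. The sketch below is not Bjork’s model and not fitted to any data; it assumes a made-up rule in which the gain in storage strength shrinks as current retrieval strength rises, with invented potency constants for re-reading and recalling.

```python
def storage_gain(retrieval_strength: float, potency: float) -> float:
    """Toy rule: gain in storage strength shrinks as retrieval strength rises.

    retrieval_strength: how accessible the item is right now, from 0.0 to 1.0.
    potency: made-up weight for the kind of event (re-reading vs. recalling).
    """
    if not 0.0 <= retrieval_strength <= 1.0:
        raise ValueError("retrieval_strength must be between 0 and 1")
    return potency * (1.0 - retrieval_strength)

# Illustrative constants only; they are not estimates from any study.
REREAD_POTENCY = 0.5
RECALL_POTENCY = 1.0

easy_reread = storage_gain(retrieval_strength=0.9, potency=REREAD_POTENCY)
hard_recall = storage_gain(retrieval_strength=0.3, potency=RECALL_POTENCY)

print(f"Re-reading while the word still feels fresh: {easy_reread:.2f}")
print(f"Recalling after it has started to fade:      {hard_recall:.2f}")
```

Under these assumptions the effortful recall yields the larger gain, which is the qualitative pattern the theory describes.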

The Illusion of Knowing

There’s a darker side to re-reading that makes it actively harmful for language learners: it systematically deceives you about what you actually know.

Psychologists call this a metacognitive illusion: a disconnect between your subjective sense of knowledge and your actual ability to recall it. Re-reading inflates your confidence. You might feel like you know 80% of your vocabulary deck after reviewing it when you can actually recall closer to 40%.

This illusion has measurable consequences. In Roediger and Karpicke’s experiments, the study-study group predicted they would perform better on the delayed test. They were wrong. They were not just less knowledgeable — they were also less aware of their own ignorance.

For language learning, this creates a vicious cycle:

You re-read your vocabulary list
Everything feels familiar
You think you know it
You move on to new material
Two weeks later, you can’t remember half of it
You conclude that your memory is bad or that you’re not talented at languages

Your memory isn’t bad. Your study method was deceiving you about what you actually knew.

Retrieval practice breaks this cycle because it gives you honest feedback. When you try to recall a word and can’t — that moment of failure is information. It tells you exactly what you don’t know, so you can allocate your study time where it matters.
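One way to see that feedback in numbers is to compare what you expected to know with what you actually recalled. The following is a minimal sketch; the function name `review_self_test`, the word list, and the 80% prediction are hypothetical values chosen for illustration.

```python
def review_self_test(results, predicted_rate):
    """Compare a confidence estimate with actual recall and list the misses.

    results: dict mapping each word to True (recalled) or False (missed).
    predicted_rate: the share you expected to know, from 0.0 to 1.0.
    """
    if not results:
        raise ValueError("results must contain at least one word")
    missed = [word for word, recalled in results.items() if not recalled]
    actual_rate = 1 - len(missed) / len(results)
    return {
        "actual_rate": actual_rate,
        "overconfidence": predicted_rate - actual_rate,
        "study_next": missed,
    }

# Hypothetical results from covering the translations and testing yourself.
results = {"Aufgabe": True, "Erfahrung": False, "Umgebung": False,
           "Entscheidung": True, "Verhältnis": False}
report = review_self_test(results, predicted_rate=0.8)
print(report)
```

The `study_next` list is the practical payoff: it tells you where the next session should go.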

The Testing Effect Applied to Language Learning

The testing effect holds for complex material, and it has been studied specifically in the context of foreign vocabulary learning.

A 2011 study by Karpicke and Blunt, published in Science, compared retrieval practice to elaborative concept mapping for learning science texts. Retrieval practice produced about 50% better retention a week later, even though concept mapping is designed to encourage deeper cognitive processing. The authors concluded that retrieval practice is “a powerful way to promote meaningful learning of complex concepts.”

For vocabulary specifically, Pyc and Rawson (2010) had learners study Swahili-English word pairs with keyword mediators, either with practice tests or with restudy alone. A week later the tested group recalled more, in part because testing led them to settle on more effective mediators.

The practical implications are stark:

What doesn’t work well:

Re-reading vocabulary lists
Highlighting words in a textbook
Staring at flashcards and flipping them immediately
Listening to word lists on repeat
Copying words multiple times

What works:

Seeing the foreign word and attempting to recall the meaning before checking
Seeing the meaning and attempting to produce the foreign word
Writing sentences using the target word from memory
Recalling vocabulary in conversation (the ultimate retrieval practice)
Any activity where you struggle to remember before receiving feedback

How to Use Flashcards as Retrieval Practice (Not Passive Review)

Flashcards are the most natural vehicle for retrieval practice — but most people use them wrong. The difference between effective and ineffective flashcard use comes down to one thing: whether you actually try to recall before looking at the answer.

The wrong way

See the front of the card: 覚える
Think “hmm, I’m not sure”
Immediately flip to see the answer: “to remember”
Think “oh right, I knew that”
Rate it as “Good” and move on

This is re-reading with extra steps. You never actually retrieved anything. The fluency illusion kicked in the moment you saw the answer, and you told yourself you “knew” it.

The right way

See the front of the card: 覚える
Pause. Actively search your memory. What does this mean?
Generate an answer — even a guess. Say it out loud or say it in your mind.
Then flip the card.
Compare your answer to the correct one.
Rate honestly: if you couldn’t recall it, mark it as failed — even if you “almost” had it.

The critical step is the third one: generating an answer before checking. This is where the testing effect happens. Without it, you’re doing passive review.
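The same rule can be enforced in code: do not reveal the back of a card until an attempt has been made. The sketch below is a hypothetical helper, not how any particular flashcard app works. The `Card` class, the `review` function, and the exact-match check are assumptions made for illustration; in real use you would rate your own answer rather than rely on string matching.

```python
from dataclasses import dataclass

@dataclass
class Card:
    front: str
    back: str

def review(card, ask=input, show=print):
    """Show the front, require an attempt, and only then reveal the back.

    Returns True if the typed attempt matches the answer, else False.
    """
    show(f"Front: {card.front}")
    attempt = ""
    while not attempt.strip():
        # Refuse to reveal the answer until some guess has been made.
        attempt = ask("Your answer (a guess is fine): ")
    show(f"Back:  {card.back}")
    recalled = attempt.strip().lower() == card.back.strip().lower()
    show("Recalled" if recalled else "Mark as failed and review again")
    return recalled

# Scripted demo: the first blank attempt is rejected, the second is checked.
scripted = iter(["", "to remember"])
outcome = review(Card("覚える", "to remember"), ask=lambda prompt: next(scripted))
```

In a terminal, calling `review(card)` with the defaults uses `input` and `print`, so the answer stays hidden until you type something.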

Productive recall vs. receptive recall

There’s an important distinction in how you orient your cards:

Receptive recall: See the foreign word → recall the meaning. (“What does Aufgabe mean?”)
Productive recall: See the meaning → produce the foreign word. (“How do you say ‘task’ in German?”)

Productive recall is harder — and therefore more effective for building speaking ability. If you only do receptive recall, you’ll develop the ability to recognize words but not to produce them in conversation. This is exactly the comprehension-production gap that Swain identified.

The ideal practice: do both directions. Receptive recall builds your listening and reading vocabulary. Productive recall builds your speaking and writing vocabulary. Most SRS apps (including Anki) can generate cards in both directions from a single entry.
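If you build your own deck, generating both directions from one entry takes only a few lines. This is a minimal sketch with a hypothetical `make_cards` helper and a plain dictionary layout. It is not Anki’s API; in Anki the same result comes from choosing a note type that creates a reversed card.

```python
def make_cards(word, meaning):
    """Build a receptive and a productive card from one vocabulary entry."""
    return [
        {"direction": "receptive", "front": word, "back": meaning},
        {"direction": "productive", "front": meaning, "back": word},
    ]

cards = make_cards("Aufgabe", "task")
for card in cards:
    print(f'{card["direction"]:>10}: {card["front"]} -> {card["back"]}')
```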

Sentence cards amplify the effect

Rather than isolated word pairs (覚える → “to remember”), create sentence cards that embed the target word in context:

Front: 新しい単語を覚えるのは難しい。 Back: It’s hard to remember new words. (覚える = to remember/memorize)

This combines retrieval practice with contextual encoding, collocational learning, and grammar exposure — all in a single card. You’re retrieving the meaning of the target word while simultaneously processing the sentence structure, the particles, and the surrounding vocabulary.

Spacing + Testing: The Combination That Dominates

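The testing effect is powerful on its own. Combined with spaced repetition, it becomes one of the most effective learning methods cognitive science has documented: in their review of ten common study techniques, Dunlosky et al. (2013) rated only practice testing and distributed practice as high in utility.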

Here’s why the combination works:

Spaced repetition determines when you review — scheduling each item at the optimal moment before you would have forgotten it.

Retrieval practice determines how you review — forcing active recall instead of passive re-reading.

Together, they create a system where you practice recalling information at the exact moment when recall is most effortful — and therefore most beneficial. Each successful retrieval at the point of near-forgetting simultaneously strengthens the memory trace and delays the next forgetting curve.

This is what makes Anki (with FSRS) so effective when used correctly: it’s a machine that automates the scheduling of retrieval practice at optimal intervals. But the machine only works if you do the retrieval honestly — if you actually try to recall before flipping the card, and rate your performance truthfully.

If you flip cards passively and generously rate everything as “Good,” you’ve turned the most powerful learning tool available into a re-reading app with fancy scheduling. The algorithm can’t help you if you’re not doing the cognitive work.
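To see why honest ratings matter, it helps to look at how a scheduler uses them. The sketch below is a deliberately simple doubling rule, assumed here for illustration only. It is not FSRS and not Anki’s algorithm, and the function name `next_review` is hypothetical.

```python
from datetime import date, timedelta

def next_review(interval_days, recalled, today):
    """Return (new_interval_days, due_date) after one retrieval attempt.

    A successful recall doubles the interval; a failure resets it to one day.
    """
    if interval_days < 1:
        raise ValueError("interval_days must be at least 1")
    new_interval = interval_days * 2 if recalled else 1
    return new_interval, today + timedelta(days=new_interval)

# Walk one card through four honest reviews: three successes, then a lapse.
interval, day = 1, date(2024, 1, 1)
for recalled in [True, True, True, False]:
    interval, day = next_review(interval, recalled, day)
    print(f"recalled={recalled!s:<5} next interval={interval} days, due {day}")
```

Every generous "Good" on a card you did not really recall pushes its next review further out, which is exactly the opposite of what that card needs.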

The Desirable Difficulty Principle

The testing effect is a specific instance of a broader principle identified by Robert and Elizabeth Bjork: desirable difficulties.

The Bjorks’ insight (1992, updated in subsequent work) is that learning conditions that make performance worse in the short term often make retention better in the long term. Difficulties are “desirable” when they force deeper cognitive processing — encoding that creates more durable and more retrievable memories.

Examples of desirable difficulties in language learning:

Spacing (reviewing after a delay rather than immediately)
Interleaving (mixing vocabulary from different topics rather than studying one category at a time)
Retrieval practice (testing yourself instead of re-reading)
Generation (trying to produce the answer before seeing it)
Variation (encountering the same word in different contexts)
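Interleaving in particular is easy to apply to a deck you manage yourself. As a minimal sketch, assuming cards are grouped by topic in a dictionary, a round-robin mix looks like this. The `interleave` helper and the sample decks are hypothetical.

```python
from itertools import zip_longest

def interleave(decks):
    """Mix cards from several topic decks in round-robin order."""
    mixed = []
    for group in zip_longest(*decks.values()):
        # zip_longest pads shorter decks with None; skip the padding.
        mixed.extend(card for card in group if card is not None)
    return mixed

decks = {
    "food": ["Brot", "Käse", "Apfel"],
    "travel": ["Bahnhof", "Fahrkarte"],
    "work": ["Aufgabe"],
}
print(interleave(decks))
```

Shuffling the whole deck would also mix topics; round-robin simply guarantees that consecutive cards come from different categories while more than one deck still has cards left.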

Examples of undesirable difficulties:

Material that is too far beyond your level to follow (below the roughly 95% known-word coverage threshold)
Distracting study environments that prevent encoding
Insufficient sleep (which impairs memory consolidation)
Excessive volume of new cards that overwhelms the review system

The key distinction: a desirable difficulty challenges the retrieval process while keeping the encoding intact. If you can’t even understand what you’re studying, the difficulty isn’t desirable — it’s just noise.

The Bottom Line

The testing effect is not a study tip. It’s a fundamental property of how human memory works. Attempting to retrieve information from memory doesn’t just measure what you know — it changes what you know. Each act of effortful recall strengthens the memory in a way that no amount of re-reading can replicate.

For language learners, this means one thing: stop reviewing and start recalling.

Don’t re-read your vocabulary list — cover the translations and test yourself. Don’t passively flip through flashcards — generate the answer before checking. Don’t listen to word lists on repeat — pause after each word and try to produce the translation before hearing it.

The struggle is the point. The moment you can’t quite remember a word — that effortful search, that frustrating pause — is the exact moment your brain is building a stronger memory.

Re-reading feels productive. Retrieval feels hard. The hard thing is the thing that works.

This article is part of the series “The Science of Language Learning” — where we break down what research actually says about how adults acquire languages, and how to use that science to learn faster.

References:

Roediger, H.L. III & Karpicke, J.D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255.
Rowland, C.A. (2014). The effect of testing versus restudy on retention: A meta-analytic review of the testing effect. Psychological Bulletin, 140(6), 1432–1463.
Karpicke, J.D. & Blunt, J.R. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331(6018), 772–775.
Karpicke, J.D. & Roediger, H.L. III (2008). The critical importance of retrieval for learning. Science, 319(5865), 966–968.
Pyc, M.A. & Rawson, K.A. (2010). Why testing improves memory: Mediator effectiveness hypothesis. Science, 330(6002), 335.
Bjork, R.A. & Bjork, E.L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy et al. (Eds.), From Learning Processes to Cognitive Processes: Essays in Honor of William K. Estes (Vol. 2, pp. 35–67). Erlbaum.
Bjork, E.L. & Bjork, R.A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M.A. Gernsbacher et al. (Eds.), Psychology and the Real World (pp. 56–64). Worth Publishers.
Dunlosky, J., Rawson, K.A., Marsh, E.J., Nathan, M.J., & Willingham, D.T. (2013). Improving students’ learning with effective learning techniques. Psychological Science in the Public Interest, 14(1), 4–58.