Here’s an experiment you can run on yourself right now.

Take a list of 20 foreign words. Split them into two groups of 10. For the first group, read each word and its translation four times. For the second group, read each word once, then try to recall the translation from memory three times — checking only after you’ve attempted to remember.

Come back tomorrow and test yourself on all 20.

The second group will almost certainly win, and it won’t be close.

This is the testing effect — one of the most robust, most replicated, and most ignored findings in the science of learning. And if you’re studying a language, it should fundamentally change how you spend your study time.

The Experiment That Changed Everything

In 2006, cognitive psychologists Henry Roediger III and Jeffrey Karpicke published a study that should have rewritten every textbook on how to study. The design was elegant in its simplicity.

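Participants read short prose passages. Then they were split into two groups:

  - Group 1 (study-study) re-read the passage.
  - Group 2 (study-test) tried to recall the passage from memory instead of re-reading it.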

Both groups spent the same total amount of time. The only difference was how that time was used: re-reading versus attempting to recall.

Five minutes later, Group 1 performed slightly better. This is the result that feels right — of course re-reading helps, you’re seeing the material again.

But then Roediger and Karpicke tested both groups one week later.

Group 2 — the ones who had practiced recalling — remembered significantly more than Group 1. The advantage wasn’t marginal. The testing group retained approximately 50% more material after one week than the re-reading group.

Let that sink in. The group that spent less time looking at the material and more time struggling to remember it had dramatically superior long-term retention.

This wasn’t a one-off finding. A subsequent meta-analysis by Rowland (2014), covering 159 experimental comparisons, confirmed that retrieval practice produces a moderate to large positive effect on long-term retention across a wide range of materials, populations, and conditions. The testing effect is one of the most reliable phenomena in cognitive psychology.

Why Retrieval Beats Re-Exposure

The intuition behind re-reading is seductive: if I look at it more, I’ll remember it better. More exposure = more learning. It seems logical.

It’s wrong.

The problem is that re-reading creates what psychologists call fluency illusions. When you re-read a word and its translation, a feeling of familiarity washes over you. “Oh yes, kudasai means please. I know this.” The information feels accessible. Your brain interprets that fluency as knowledge.

But familiarity is not the same as retrievability. Recognizing something when you see it is fundamentally different from producing it when you need it. And in a real conversation, you need to produce — not recognize. Nobody holds up a flashcard during a conversation in Tokyo.

Retrieval practice works because it strengthens a completely different process: the search and reconstruction pathway. When you try to recall a word from memory — actively searching for it, struggling, sometimes failing — you’re exercising the exact neural pathway you’ll need in real use. Each successful retrieval strengthens that pathway. Each failed retrieval followed by feedback tells your brain “this is important, store it more durably.”

The neuroscience points the same way. Retrieval engages the hippocampus, a region central to memory consolidation, more strongly than passive re-exposure. fMRI studies of retrieval practice have reported that hippocampal activity during successful retrieval predicts long-term retention. The brain appears to encode memories differently when you recall them than when you merely re-encounter them.

Robert Bjork, one of the pioneers of desirable difficulty theory, explains it this way: storage strength (how well something is encoded) and retrieval strength (how easily you can access it right now) are separate dimensions. Re-reading increases retrieval strength temporarily: the information is right there in front of you, so it feels accessible. But it does almost nothing for storage strength. Retrieval practice, by contrast, directly increases storage strength, and the gain is largest when retrieval strength is low and the recall feels effortful.

This is the core paradox: the harder it is to remember something, the more you benefit from trying to remember it. Struggle isn’t a sign that learning has failed. It’s the mechanism through which learning happens.

The Illusion of Knowing

There’s a darker side to re-reading that makes it actively harmful for language learners: it systematically deceives you about what you actually know.

Psychologists call this a metacognitive illusion: a disconnect between your subjective sense of knowledge and your actual ability to recall it. Re-reading inflates your confidence. You might feel like you know 80% of your vocabulary deck after reviewing it when you can actually recall closer to 40%.

This illusion has measurable consequences. In Roediger and Karpicke’s experiments, the study-study group predicted they would perform better on the delayed test. They were wrong. They were not just less knowledgeable — they were also less aware of their own ignorance.

For language learning, this creates a vicious cycle:

  1. You re-read your vocabulary list
  2. Everything feels familiar
  3. You think you know it
  4. You move on to new material
  5. Two weeks later, you can’t remember half of it
  6. You conclude that your memory is bad or that you’re not talented at languages

Your memory isn’t bad. Your study method was deceiving you about what you actually knew.

Retrieval practice breaks this cycle because it gives you honest feedback. When you try to recall a word and can’t — that moment of failure is information. It tells you exactly what you don’t know, so you can allocate your study time where it matters.

The Testing Effect Applied to Language Learning

The testing effect has been studied with complex conceptual material and, specifically, with foreign vocabulary learning. The results are strong in both cases.

A 2011 study by Karpicke and Blunt, published in Science, compared retrieval practice to elaborative studying with concept mapping for learning new material. Retrieval practice produced about 50% better retention, even though concept mapping is designed to involve deeper cognitive processing. The authors concluded that retrieval practice is “a powerful way to promote meaningful learning of complex concepts.”

For vocabulary specifically, a study by Pyc and Rawson (2010) demonstrated that retrieval practice with feedback was the single most effective method for learning foreign language word pairs — outperforming increased study time, keyword mnemonics, and elaborative encoding.

The practical implications are stark:

What doesn’t work well:

  - Re-reading vocabulary lists
  - Highlighting words in a textbook
  - Staring at flashcards and flipping them immediately
  - Listening to word lists on repeat
  - Copying words multiple times

What works:

  - Seeing the foreign word and attempting to recall the meaning before checking
  - Seeing the meaning and attempting to produce the foreign word
  - Writing sentences using the target word from memory
  - Recalling vocabulary in conversation (the ultimate retrieval practice)
  - Any activity where you struggle to remember before receiving feedback

How to Use Flashcards as Retrieval Practice (Not Passive Review)

Flashcards are the most natural vehicle for retrieval practice — but most people use them wrong. The difference between effective and ineffective flashcard use comes down to one thing: whether you actually try to recall before looking at the answer.

The wrong way

  1. See the front of the card: 覚える
  2. Think “hmm, I’m not sure”
  3. Immediately flip to see the answer: “to remember”
  4. Think “oh right, I knew that”
  5. Rate it as “Good” and move on

This is re-reading with extra steps. You never actually retrieved anything. The fluency illusion kicked in the moment you saw the answer, and you told yourself you “knew” it.

The right way

  1. See the front of the card: 覚える
  2. Pause. Actively search your memory. What does this mean?
  3. Generate an answer — even a guess. Say it out loud or say it in your mind.
  4. Then flip the card.
  5. Compare your answer to the correct one.
  6. Rate honestly: if you couldn’t recall it, mark it as failed — even if you “almost” had it.

The critical step is step 3: generating an answer before checking. This is where the testing effect happens. Without it, you’re doing passive review.
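The steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular flashcard app works: the `Card` class and the `review` function are hypothetical names, and exact string matching is a deliberately strict stand-in for honest self-rating.

```python
from dataclasses import dataclass


@dataclass
class Card:
    front: str
    back: str


def review(card, attempt):
    """Grade one review. `attempt` is what the learner produced BEFORE
    seeing the back of the card; None or "" means nothing was generated."""
    if not attempt or not attempt.strip():
        # No retrieval attempt was made, so there is nothing to grade.
        # Flipping the card now would be passive review.
        return "no attempt"
    # Only once an answer exists is it compared with the back of the card.
    correct = attempt.strip().lower() == card.back.strip().lower()
    # Honest rating: anything short of the answer counts as failed.
    return "recalled" if correct else "failed"


card = Card(front="覚える", back="to remember")
print(review(card, "to remember"))  # recalled
print(review(card, "to forget"))    # failed
print(review(card, ""))             # no attempt
```

The point of the design is the order of operations: the comparison cannot happen until an answer has been generated, which is exactly what step 3 demands.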

Productive recall vs. receptive recall

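There’s an important distinction in how you orient your cards:

  - Receptive recall: the front shows the foreign word (覚える) and you recall the meaning (“to remember”).
  - Productive recall: the front shows the meaning (“to remember”) and you produce the foreign word (覚える).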

Productive recall is harder — and therefore more effective for building speaking ability. If you only do receptive recall, you’ll develop the ability to recognize words but not to produce them in conversation. This is exactly the comprehension-production gap that Swain identified.

The ideal practice: do both directions. Receptive recall builds your listening and reading vocabulary. Productive recall builds your speaking and writing vocabulary. Most SRS apps (including Anki) can generate cards in both directions from a single entry.
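As a rough sketch of what “both directions from a single entry” means, here is a small Python function that turns one vocabulary entry into two cards. The function name `make_cards` and the tuple layout are assumptions made for illustration; this is not Anki’s API or data format.

```python
def make_cards(word, meaning):
    """Return two (direction, front, back) cards from one vocabulary entry."""
    return [
        ("receptive", word, meaning),   # see the foreign word, recall the meaning
        ("productive", meaning, word),  # see the meaning, produce the foreign word
    ]


for direction, front, back in make_cards("覚える", "to remember"):
    print(f"{direction}: {front} -> {back}")
```

Keeping one entry as the single source for both cards means a correction to the word or its meaning only has to be made once.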

Sentence cards amplify the effect

Rather than isolated word pairs (覚える → “to remember”), create sentence cards that embed the target word in context:

Front: 新しい単語を覚えるのは難しい。
Back: It’s hard to remember new words. (覚える = to remember/memorize)

This combines retrieval practice with contextual encoding, collocational learning, and grammar exposure — all in a single card. You’re retrieving the meaning of the target word while simultaneously processing the sentence structure, the particles, and the surrounding vocabulary.

Spacing + Testing: The Combination That Dominates

The testing effect is powerful on its own. Combined with spaced repetition, it becomes one of the most effective learning methods cognitive science has documented.

Here’s why the combination works:

Spaced repetition determines when you review — scheduling each item at the optimal moment before you would have forgotten it.

Retrieval practice determines how you review — forcing active recall instead of passive re-reading.

Together, they create a system where you practice recalling information at the moment when recall is most effortful, and therefore most beneficial. Each successful retrieval at the point of near-forgetting strengthens the memory trace and slows the forgetting that follows, so the next review can be scheduled later.

This is what makes Anki (with FSRS) so effective when used correctly: it’s a machine that automates the scheduling of retrieval practice at optimal intervals. But the machine only works if you do the retrieval honestly — if you actually try to recall before flipping the card, and rate your performance truthfully.

If you flip cards passively and generously rate everything as “Good,” you’ve turned the most powerful learning tool available into a re-reading app with fancy scheduling. The algorithm can’t help you if you’re not doing the cognitive work.
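To see why honest ratings matter, consider a toy scheduler. The rule below is an assumption chosen for simplicity (double the interval after a successful recall, reset to one day after a failure). It is not FSRS, which is considerably more sophisticated, and the function names are hypothetical.

```python
from datetime import date, timedelta


def next_interval(interval_days, recalled):
    """Toy spacing rule (illustrative assumption, not FSRS): double the gap
    after a successful recall, reset to one day after a failure."""
    return interval_days * 2 if recalled else 1


def schedule(start, outcomes):
    """Return the review dates produced by a sequence of ratings."""
    interval, day, dates = 1, start, []
    for recalled in outcomes:
        interval = next_interval(interval, recalled)
        day = day + timedelta(days=interval)
        dates.append(day)
    return dates


# Three successful recalls, one failure, then one success.
for d in schedule(date(2024, 1, 1), [True, True, True, False, True]):
    print(d.isoformat())
# 2024-01-03, 2024-01-07, 2024-01-15, 2024-01-16, 2024-01-18
```

The only input the scheduler ever sees is the rating. If a forgotten card is rated as recalled, its interval keeps growing and the card comes back later than it should, which is how generous ratings undermine the spacing.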

The Desirable Difficulty Principle

The testing effect is a specific instance of a broader principle identified by Robert and Elizabeth Bjork: desirable difficulties.

The Bjorks’ insight (1992, updated in subsequent work) is that learning conditions that make performance worse in the short term often make retention better in the long term. Difficulties are “desirable” when they force deeper cognitive processing — encoding that creates more durable and more retrievable memories.

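Examples of desirable difficulties in language learning:

  - Generating an answer before flipping a flashcard instead of re-reading it
  - Reviewing at spaced intervals, when recall has become effortful
  - Practicing productive recall (meaning to foreign word), not only receptive recall

Examples of undesirable difficulties:

  - Studying material you can’t yet understand, so that there is nothing coherent to encode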

The key distinction: a desirable difficulty challenges the retrieval process while keeping the encoding intact. If you can’t even understand what you’re studying, the difficulty isn’t desirable — it’s just noise.

The Bottom Line

The testing effect is not a study tip. It’s a fundamental property of how human memory works. Attempting to retrieve information from memory doesn’t just measure what you know — it changes what you know. Each act of effortful recall strengthens the memory in a way that no amount of re-reading can replicate.

For language learners, this means one thing: stop reviewing and start recalling.

Don’t re-read your vocabulary list — cover the translations and test yourself. Don’t passively flip through flashcards — generate the answer before checking. Don’t listen to word lists on repeat — pause after each word and try to produce the translation before hearing it.

The struggle is the point. The moment you can’t quite remember a word — that effortful search, that frustrating pause — is the exact moment your brain is building a stronger memory.

Re-reading feels productive. Retrieval feels hard. The hard thing is the thing that works.


This article is part of the series “The Science of Language Learning” — where we break down what research actually says about how adults acquire languages, and how to use that science to learn faster.



References: