Cognitive Reflection Test (CRT-3) Assessment
Assess cognitive reflection online with a CRT-3 score that separates lure answers from other misses and gives clearer coaching for teaching or self-review.
CRT-3 Reflection Snapshot
- {{ question.id }}. {{ question.text }}
Outcome mix chart
This chart keeps the three scored outcomes together so it is obvious whether the run was mostly clean overrides, classic lure captures, or other misses.
What stands out
{{ interpretationLead }}
{{ interpretationFollowThrough }}
- {{ point }}
What to review next
- {{ step }}
Held-up vs review items
| Held up on this pass | Review next |
|---|---|
| {{ row.heldUpLabel }} {{ row.heldUpNote }} | {{ row.reviewLabel }} {{ row.reviewNote }} |
How to use this result
- {{ note }}
Score guide
| Score | Signal | Interpretation use |
|---|---|---|
| {{ row.score }} | {{ row.signal }} | {{ row.useCase }} |
| # | Item | Your answer | Outcome | Correct | Check habit |
|---|---|---|---|---|---|
| {{ row.id }} | {{ row.short }} | {{ row.answerDisplay }} | {{ row.statusLabel }} | {{ row.correctLabel }} | {{ row.checkHabit }} |
{{ row.explanation }}
- Tempting lure: {{ row.lureLabel }}. {{ row.whyLure }}
- Check habit: {{ row.checkHabit }}
- Reading note: {{ row.extraNote }}
Introduction
Cognitive reflection is the habit of stopping long enough to test the answer that first feels obvious. The original three-item Cognitive Reflection Test, usually called CRT-3, was built to expose that pause. Each question invites a quick but wrong reply, so the useful signal is not just whether someone lands on the right number, but whether they notice the trap and check it before committing.
This page keeps the classic Bat and Ball, Machines, and Lily Pads items, then reads the result in a more granular way. You still get the familiar raw score from 0 to 3, but each answer is also classified as Correct, Intuitive lure, or Other miss. That extra split matters because two people can both finish at 2/3 while failing for very different reasons.
After the third answer, the tool turns the run into a compact debrief. It shows a reflection snapshot, an outcome-mix chart, a held-up versus review table, item-level explanations, and structured exports in CSV and DOCX formats. The result also includes a reflection index, which penalizes classic lure answers more strongly than other wrong answers so that quick coaching summaries do not flatten all mistakes into one bucket.
The interpretation is deliberately cautious. These three questions are famous, and familiarity can raise scores even when the underlying reasoning habit has not changed much. For that reason, the result record keeps prior-exposure context and labels memorized or teaching-demo runs as practice-mode scores instead of clean fresh estimates.
Routine scoring stays in the browser. No server-side scoring helper is shipped for this page. Copied text and downloaded files can still reveal answers, settings, and interpretation notes, so treat exported results like any other small assessment record.
Technical Details
Frederick's original CRT used open-ended answers. This implementation can score the same three stems either as four-option multiple choice or as typed numeric entry, depending on how the session is configured. Typed entry is closer to the classic administration, while a four-option format is a studied variant that can make scoring and review easier without changing the item stems.
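As a rough illustration of that configuration choice, a session object along the lines of the sketch below could carry the response format and exposure context into the result record; the names (`SessionConfig`, `responseFormat`, `priorExposure`, `lurePenalty`) are assumptions made for this sketch, not the page's actual code.

```typescript
// Hypothetical session configuration; illustrative names only.
type ResponseFormat = "typed-entry" | "multiple-choice";

interface SessionConfig {
  responseFormat: ResponseFormat;                          // typed entry is closer to the classic administration
  priorExposure: "fresh" | "familiar" | "memorized-demo";  // drives the practice-mode label
  lurePenalty: number;                                     // configurable penalty used by the reflection index
}

const demoSession: SessionConfig = {
  responseFormat: "typed-entry",
  priorExposure: "fresh",
  lurePenalty: 15, // arbitrary value chosen for the sketch
};
```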
| Item | Scored correct | Classic lure | What an other miss often means here |
|---|---|---|---|
| Bat and Ball | 5 cents | 10 cents | The price relationship was set up badly, or the unit conversion drifted at the end |
| Machines | 5 minutes | 100 minutes | The production rate was not carried through cleanly after the first answer was rejected |
| Lily Pads | 47 days | 24 days | The doubling rule was noticed partly, but the final backward step did not land on the keyed answer |
| Raw score | Tool signal | Best use of that signal |
|---|---|---|
| 3/3 | Clean override | All three traps were handled, but novelty still matters before you read the result as a strong reflection estimate |
| 2/3 | Mostly reflective | Check whether the remaining miss is a lure capture or a later setup or arithmetic slip |
| 1/3 | Mixed or unstable | Use the item split before drawing conclusions, because one correct answer can mask very different failure paths |
| 0/3 | Strong first-impression pull | Start with the item explanations rather than the total, because the page is showing that the checking step never took hold on this pass |
The scorer normalizes a small range of answer formats before classifying them. On Bat and Ball, for example, 0.05 dollars is converted to 5 cents. On the other items, plain numbers and simple phrases such as five minutes or 47 days can still be recognized. That makes typed-entry review more readable without changing the keyed targets.
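A minimal sketch of that normalization step is shown below, assuming a hypothetical `normalizeAnswer` helper; the real scorer may accept more formats than this, and the patterns here are illustrative only.

```typescript
// Illustrative normalization: convert a typed answer into the item's
// canonical unit before classification. Names and coverage are assumptions.
const NUMBER_WORDS: Record<string, number> = {
  zero: 0, one: 1, two: 2, three: 3, four: 4,
  five: 5, six: 6, seven: 7, eight: 8, nine: 9, ten: 10,
};

function normalizeAnswer(raw: string, canonicalUnit: "cents" | "minutes" | "days"): number | null {
  const text = raw.trim().toLowerCase();

  // "0.05 dollars" -> 5 cents (only relevant for the Bat and Ball item)
  const dollarMatch = text.match(/([\d.]+)\s*dollars?/);
  if (dollarMatch && canonicalUnit === "cents") {
    return Math.round(parseFloat(dollarMatch[1]) * 100);
  }

  // Plain numbers with or without a unit word: "47", "47 days", "5 minutes"
  const numberMatch = text.match(/([\d.]+)/);
  if (numberMatch) return Math.round(parseFloat(numberMatch[1]));

  // Simple spelled-out values such as "five minutes"
  const word = text.split(/\s+/)[0];
  return word in NUMBER_WORDS ? NUMBER_WORDS[word] : null;
}

// normalizeAnswer("0.05 dollars", "cents")  -> 5
// normalizeAnswer("five minutes", "minutes") -> 5
// normalizeAnswer("47 days", "days")         -> 47
```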
The page follows the standard keyed answer set used in most CRT write-ups, including 47 days for Lily Pads. Some later discussion has argued that 1 day can be defensible if someone interprets the question as the second half of the lake rather than the first half. This tool does not use that alternate scoring. Non-keyed answers stay in the other-miss bucket, which keeps the result consistent for coaching and export.
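Building on the normalization sketch above, the three-way classification could reduce to a comparison against the keyed answer and the classic lure, with everything else falling into the other-miss bucket exactly as described; the `classify` function and `ItemKey` shape below are assumptions for illustration.

```typescript
// Illustrative three-way classification against the standard key.
// The keyed answers and lures come from the item table above; the shape
// of this code is an assumption, not the page's actual implementation.
type Outcome = "correct" | "intuitive-lure" | "other-miss";

interface ItemKey {
  correct: number; // e.g. 5 (cents), 5 (minutes), 47 (days)
  lure: number;    // e.g. 10 (cents), 100 (minutes), 24 (days)
}

function classify(normalized: number | null, key: ItemKey): Outcome {
  if (normalized === key.correct) return "correct";
  if (normalized === key.lure) return "intuitive-lure";
  return "other-miss"; // includes unrecognized entries and the alternate 1-day reading of Lily Pads
}

// classify(5,  { correct: 5, lure: 10 }) -> "correct"
// classify(10, { correct: 5, lure: 10 }) -> "intuitive-lure"
// classify(15, { correct: 5, lure: 10 }) -> "other-miss"
```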
Everyday Use & Decision Guide
CRT-3 is most useful as a quick reflection check, a classroom debrief, or a short coaching exercise. It helps when you want to know whether someone slows down, rebuilds the relation in the prompt, and verifies the final number before locking in the answer. Because the tool separates classic lure answers from other misses, it is better for teaching than a bare total score would be.
It is much less useful when you need a broad statement about intelligence, diagnosis, or stable decision quality. The test is tiny, mathematically flavored, and widely circulated. Someone can score well because they are reflective, because they are strong with arithmetic, because they have seen the questions before, or because the response format made the structure easier to recognize. That is why this page keeps the interpretation narrow.
| Good use | What this page adds | What it still cannot settle |
|---|---|---|
| Teaching the pause-and-check habit | Separates classic lure captures from other wrong answers and gives an item-specific reset move | Whether the person has strong reasoning across unrelated tasks |
| Comparing a fresh pass with a familiar one | Keeps prior exposure in the result record and downgrades memorized runs to practice mode | Whether later improvement came from better reasoning rather than memory |
| Keeping a lightweight debrief record | Exports the score, settings, and item outcomes in CSV and DOCX | A full psychometric profile or a clinical interpretation |
Step-by-Step Guide
- Answer all three items in one sitting and avoid checking the well-known solutions first. If the session uses typed entry, enter the value plainly and keep the units consistent with the question.
- Read the headline result, but do not stop at the raw total. On a three-item test, one answer changes the score a lot.
- Use the outcome-mix chart and the review-first card to see whether the run was mostly clean overrides, classic lure captures, or other misses.
- Open the item table or detailed cards to read the tempting answer, why it fails, and the specific check habit that would have prevented the mistake.
- Copy or download the result only if you need a durable record. The export preserves both the answers and the interpretation settings used for that run; a rough sketch of one possible record shape follows this list.
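For readers who want to script around the downloads, the sketch below shows one plausible shape for an exported run; the field names are assumptions, since the page only states that answers, item outcomes, and interpretation settings are preserved.

```typescript
// Hypothetical shape of a downloaded result record; field names are
// illustrative and may not match the tool's actual CSV/DOCX columns.
interface ExportedItemResult {
  id: number;               // 1..3
  answer: string;           // the respondent's raw answer as entered
  outcome: "correct" | "intuitive-lure" | "other-miss";
}

interface ExportedRun {
  rawScore: number;         // 0..3
  reflectionIndex: number;  // 0..100
  responseFormat: "typed-entry" | "multiple-choice";
  priorExposure: string;    // kept so practice-mode runs stay labeled
  items: ExportedItemResult[];
}
```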
Interpreting Results
Start with the total score, then move immediately to the pattern underneath it. On CRT-3, the distinction between a classic lure answer and some other wrong answer is often more useful than the headline score itself. A lure-heavy run suggests that the first answer is still winning too often. A run with few lures but some other misses suggests that the person is resisting the trap but still slipping on setup, rate, arithmetic, or units.
| Pattern | What it usually means | Best next move |
|---|---|---|
| Fresh 3/3, no lure answers | The pause-and-check habit held on all three classic items | Keep the result narrow and use less familiar items if you want a tougher follow-up check |
| 2/3 with one intuitive lure | One problem still triggered the classic fast answer before verification took over | Rework that item by stating exactly why the tempting answer violates the prompt |
| 2/3 with one other miss | The lure was probably resisted, but the later setup or arithmetic step still failed | Slow down the equation, rate conversion, or final unit check on that item |
| 0/3 or 1/3 with several lures | The first answer is driving the run more than deliberate checking | Practice the pause itself before worrying about speed or score improvement |
| Perfect score with memorized or demo exposure | The result is useful as rehearsal evidence, but not as a clean fresh estimate | Read it as practice mode and switch to a less familiar reflection task for benchmarking |
The reflection index is most helpful when you need one compact number for a summary sheet. It rewards correct answers, subtracts a configurable penalty for lure picks, clamps the result to 0 to 100, and treats other misses as a separate, less heavily penalized category. That makes it sharper than the raw total for quick coaching, but it should never replace the item table because different answer patterns can land on the same index.
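Purely as an illustration of that shape (the page does not publish its exact weights), a bounded index could be computed along the lines below; `reflectionIndex` and the default penalty value are assumptions made for this sketch.

```typescript
// Illustrative reflection index: not the page's actual formula, just the
// described shape (reward correct, penalize lures harder, clamp to 0-100).
function reflectionIndex(
  outcomes: Array<"correct" | "intuitive-lure" | "other-miss">,
  lurePenalty = 15, // the "configurable penalty"; this default is a guess
): number {
  const perCorrect = 100 / outcomes.length; // full credit split across items
  let index = 0;
  for (const o of outcomes) {
    if (o === "correct") index += perCorrect;
    else if (o === "intuitive-lure") index -= lurePenalty; // lures cost extra...
    // ...while other misses simply earn no credit in this sketch
  }
  return Math.min(100, Math.max(0, Math.round(index)));
}

// reflectionIndex(["correct", "correct", "intuitive-lure"]) -> about 52
// reflectionIndex(["correct", "correct", "other-miss"])     -> about 67
```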
Response format changes the reading slightly too. A four-option run reduces typing noise and makes review tidier. A typed-entry run is closer to the classic test, but a misplaced unit or small transcription slip can become an other miss even when the person partly understood the item. The item-by-item review is there to catch that difference.
Worked Examples
The same total can hide two different coaching problems
One respondent answers Bat and Ball with 10 cents and gets the other two items right. Another answers Bat and Ball with 15 cents and also finishes at 2/3. The total is identical, but the page reads them differently. The first pattern says the classic lure still won. The second says the person moved past the lure but set up the relationship badly. That is why the lure split matters.
Typed entry can preserve the right idea across units
Suppose someone types 0.05 dollars for Bat and Ball, five minutes for Machines, and 47 for Lily Pads. The scorer normalizes those entries to the canonical units and still marks the run as 3/3. That keeps numeric-entry sessions readable without forcing everyone to type the answer in exactly the same style.
A memorized rerun is useful, but not in the same way
Imagine a class demo where students already know the answers and then retake the three items. A later 3/3 still shows that they can reproduce the correct structure, and the exports may still be worth keeping for teaching records. The page nevertheless labels that context as practice mode, because a memorized clean score is not the same thing as a fresh reflective override.
FAQ:
Is CRT-3 an IQ test or a diagnosis?
No. It is a very short reflection screen built around three famous trick-style problems. It can be informative, but it is too small and too narrow to support a broad claim about intelligence or mental health.
Why does this page separate intuitive lure answers from other wrong answers?
Because they imply different breakdowns. A lure answer suggests the first response was accepted too quickly. An other miss suggests the person may have resisted the lure but still lost the thread later in the setup, arithmetic, or unit check.
Does the multiple-choice format invalidate the result?
Not by itself. The original CRT used open-ended responses, but later studies have validated four-option versions as workable variants. This page still keeps the response format in the result context because typed entry and guided choice are not identical experiences.
Why does prior exposure matter so much?
Because these three questions are now widely known. Prior exposure can lift scores or change the pattern of mistakes, so a familiar run is better treated as coaching evidence than as a clean first-pass benchmark.
Why does the Lily Pads item score 47 days here?
This page follows the standard CRT answer key, which scores 47 days as correct and 24 days as the classic lure. Some later commentary has argued that 1 day can be defensible under a different reading of the prompt, but that alternate key is not used here.
Are my answers sent to a server?
Routine scoring is local to the browser and there is no server-side scoring helper for this page. Copied results, downloaded files, and any saved session record can still expose the answers and settings.
Glossary:
- Cognitive reflection
- The tendency to stop, question the first tempting answer, and work through a more deliberate check.
- Intuitive lure
- The classic wrong answer that feels immediately plausible before the problem is checked carefully.
- Other miss
- A wrong answer that is not the classic lure and often points to setup, arithmetic, or unit-handling drift.
- Reflection index
- This page's bounded 0-to-100 summary that rewards correct answers and penalizes lure answers more strongly than other misses.
- Practice-mode score
- A result label used when prior exposure is high enough that the run should be treated as rehearsal or teaching evidence rather than as a fresh estimate.
References:
- Cognitive Reflection and Decision Making, Journal of Economic Perspectives, 2005.
- Effect of response format on cognitive reflection: Validating multiple-choice versions of the Cognitive Reflection Test, Behavior Research Methods, 2018.
- Investigating an alternate form of the Cognitive Reflection Test, Judgment and Decision Making, 2016.
- A limitation of the Cognitive Reflection Test: familiarity, PeerJ, 2016.
- An Alternative Correct Answer to the Cognitive Reflection Test, Frontiers in Psychology, 2021.