Cognitive Reflection Test (CRT-3) Assessment
Take the classic three-item Cognitive Reflection Test, separate correct answers from intuitive lures, and review a 0 to 100 reflection index.
Introduction:
Short reasoning questions can feel settled before the reasoning has really happened. A number in the wording suggests a quick operation, the answer looks tidy, and only a later check shows that the relationship was different. Cognitive reflection is that pause between the first plausible answer and the answer that still works after the problem is restated.
The original three-item Cognitive Reflection Test, usually shortened to CRT-3, became well known because each item creates a tempting wrong answer before the slower solution becomes clear. The bat-and-ball item hides a difference equation inside a price total. The machines-and-widgets item asks for a production rate rather than a bigger headline number. The lily-pads item depends on doubling, where half coverage arrives one day before full coverage, not halfway through the calendar.
A raw total from 0 to 3 is easy to read, but it can hide the difference between two kinds of errors. An intuitive lure means the classic tempting answer won. An other miss means the standard lure may have been avoided, but the setup, arithmetic, or unit check still failed. That distinction matters in teaching, coaching, and self-review because the fix is different.
Prior exposure is the largest practical caution. The CRT-3 items are famous, and many people have seen the answers in classes, books, websites, or social posts. A memorized perfect score does not show the same fresh override that a first encounter would show. Multiple-choice versions also change the experience because the possible answers are visible from the start and can cue recognition.
CRT-3 should be treated as a narrow reflection check, not as a broad measure of intelligence, carefulness, or decision quality. It is best used to discuss how fast impressions can mislead short quantitative reasoning, and how a final consistency check can catch them.
How to Use This Tool:
Complete the three-item assessment before using the result. The current version uses four answer choices for each item and advances as choices are recorded.
- Press Start Assessment and answer from memory, without looking up the well-known solutions.
- For each question, choose the option you can defend after checking the wording. The progress bar should move from 0 / 3 answered to 3 / 3 answered.
- After the third choice, read the Reflection snapshot first. It shows the raw Score, the Reflection index, and the counts of correct, lure, and other-miss outcomes.
- Check Overall result, Strongest item, Review first, and Profile spread before drawing a conclusion from the total.
- Use What stands out, What to review next, and Held-up vs review items to decide which item needs a lure explanation, a rate setup, or a doubling check.
- Open Item explanation and Answer review when you need the keyed answer, the classic lure, and the check habit for a class note or private debrief.
If the report does not appear, one of the three choices is still unselected. Return to the highlighted item in the question navigator and record a choice.
Interpreting Results:
The most useful reading starts with the outcome mix, not the score alone. A 2/3 score with one Intuitive lure means one first impression was accepted too quickly. A 2/3 score with one Other miss means the standard lure may have been resisted, but a later setup or arithmetic step still drifted.
| Visible pattern | Most useful reading | Review cue |
|---|---|---|
| 3/3, no lures | The classic traps were overridden on this pass. | Check whether the items were fresh or already memorized. |
| Two or more Intuitive lure outcomes | Fast first answers, more than random arithmetic slips, drove the run. | Use Item explanation to say why each tempting answer fails. |
| No lures, one or more Other miss outcomes | The obvious trap may have been avoided, but the final problem setup was still unstable. | Use the displayed Check habit for the missed item. |
| 0/3 or 1/3 | The total is too small to interpret without item-level review. | Start with Review first rather than comparing the score broadly. |
The Reflection index is a 0 to 100 summary that rewards correct answers and penalizes classic lure picks. It helps compare outcome mixes inside this report, but it is not a published CRT standard and should not replace Answer review.
The strongest false-confidence warning is familiarity. If the answers were already known, treat the result as practice evidence and use newer or less familiar reflection items for a cleaner follow-up.
Technical Details:
Frederick's CRT-3 is built around conflict between an intuitive response and a consistency check. The keyed answers are 5 cents for Bat and Ball, 5 minutes for Machines, and 47 days for Lily Pads. The standard lures are 10 cents, 100 minutes, and 24 days.
In four-option form, each recorded choice maps to one of those numeric answers. A choice equal to the keyed value is Correct. A choice equal to the classic lure is Intuitive lure. Any other offered choice is Other miss. The category is deterministic, so the same three choices always produce the same score, outcome mix, and review order.
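The three-way mapping described above can be sketched as a small lookup. This is a minimal Python sketch, not the tool's own browser code. The names `KEYS`, `classify`, and the item identifiers are hypothetical, and answers are assumed to be stored as plain numbers in each item's unit (cents, minutes, days).

```python
from typing import Literal

Outcome = Literal["Correct", "Intuitive lure", "Other miss"]

# Hypothetical item table: (keyed answer, classic lure) in cents, minutes, days.
KEYS: dict[str, tuple[int, int]] = {
    "bat_and_ball": (5, 10),
    "machines": (5, 100),
    "lily_pads": (47, 24),
}


def classify(item: str, choice: int) -> Outcome:
    """Map one recorded choice to its outcome category."""
    keyed, lure = KEYS[item]
    if choice == keyed:
        return "Correct"
    if choice == lure:
        return "Intuitive lure"
    # Any other offered option falls through to this category.
    return "Other miss"


# The mapping is deterministic: the same choices always give the same outcomes.
run = {"bat_and_ball": 15, "machines": 5, "lily_pads": 47}
print({item: classify(item, choice) for item, choice in run.items()})
# {'bat_and_ball': 'Other miss', 'machines': 'Correct', 'lily_pads': 'Correct'}
```

Because the check compares only against the keyed value and the classic lure, every other offered option lands in Other miss without needing its own rule.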
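Formula Core

$$C = \sum_{i=1}^{3} c_i$$

$$\text{Reflection index} = \operatorname{round}\!\left(100 \times \min\!\left(1,\ \max\!\left(0,\ \frac{3C - 2L}{9}\right)\right)\right)$$

$$\text{Lure resistance} = \frac{3 - L}{3} \times 100\%$$

In these equations, c_i is 1 when item i is correct and 0 otherwise, C is the number correct, and L is the number of intuitive lures. The reflection index is clamped to the range 0 to 100. For example, 2 correct answers and 1 lure give ((2 × 3) − (1 × 2)) / 9 = 4/9 ≈ 0.444, which rounds to an index of 44.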
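The scoring rule above, including the worked example of 2 correct answers and 1 lure, can be checked with a short sketch. This is a minimal Python sketch under stated assumptions: "rounds" is read as ordinary rounding to the nearest integer after clamping, and the function names are hypothetical. Python's `round` uses round-half-to-even, but 100(3C − 2L)/9 never ends in exactly .5 for integer C and L, so ties cannot occur here.

```python
def reflection_index(correct: int, lures: int) -> int:
    """0-100 index: reward correct items (x3), penalize classic lures (x2)."""
    if correct < 0 or lures < 0 or correct + lures > 3:
        raise ValueError("CRT-3 has only three items")
    raw = (3 * correct - 2 * lures) / 9
    clamped = min(1.0, max(0.0, raw))  # keep the index between 0 and 100
    return round(100 * clamped)


def lure_resistance(lures: int) -> float:
    """Percentage of the three items that did not land on the classic lure."""
    return 100 * (3 - lures) / 3


print(reflection_index(2, 1), reflection_index(2, 0), reflection_index(0, 3))
# 44 67 0
```

The clamp matters only at the low end. Three lures give a raw value of −6/9, which is reported as 0 rather than a negative index.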
| Item | Keyed answer | Classic lure | Why the lure fails |
|---|---|---|---|
| Bat and Ball | 5 cents | 10 cents | At 10 cents, the bat would cost $1.00, only 90 cents more than the ball. |
| Machines | 5 minutes | 100 minutes | Machines and widgets scale by the same factor, so each machine still makes one widget in 5 minutes and the job still takes 5 minutes. |
| Lily Pads | 47 days | 24 days | A doubling process is half full one day before it is full, not halfway through the calendar. |
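Each row's explanation can be restated as a consistency check that the keyed answer passes and the lure fails. This is a minimal Python sketch that assumes the standard item wording: a $1.10 total with the bat costing $1.00 more; 5 machines making 5 widgets in 5 minutes; and a patch that doubles daily and covers the lake on day 48. The function names are hypothetical.

```python
from fractions import Fraction


def bat_and_ball_ok(ball_cents: int) -> bool:
    """Check a ball price against the $1.10 total and the $1.00 difference."""
    bat_cents = 110 - ball_cents          # the pair costs 110 cents in total
    return bat_cents - ball_cents == 100  # the bat must cost 100 cents more


def machines_minutes(machines: int, widgets: int) -> Fraction:
    """Minutes needed, given that 5 machines make 5 widgets in 5 minutes."""
    rate_per_machine = Fraction(5, 5 * 5)  # widgets per machine per minute = 1/5
    return Fraction(widgets) / (machines * rate_per_machine)


def lily_pad_coverage(day: int, full_day: int = 48) -> Fraction:
    """Fraction of the lake covered on `day` if the patch doubles daily."""
    if day >= full_day:
        return Fraction(1)
    return Fraction(1, 2 ** (full_day - day))


print(bat_and_ball_ok(5), bat_and_ball_ok(10))       # True False
print(machines_minutes(100, 100))                     # 5
print(lily_pad_coverage(47), lily_pad_coverage(24))  # 1/2 1/16777216
```

Exact `Fraction` arithmetic keeps the rate and doubling checks free of floating-point noise, so each lure fails for the structural reason given in its table row.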
| Profile condition | Displayed label | Interpretation boundary |
|---|---|---|
| 3/3 | Clean override pattern | All three lures were caught, but prior exposure can still explain the result. |
| At least two lures | Lure-dominated pattern | The run is mostly about first-answer capture on these items. |
| No lures and at least one other miss | Reflective but inconsistent | The lure was avoided, but the final setup or check still broke down. |
| 2/3 after earlier rules | Mostly reflective | One item still needs review. Under this rule order, the remaining miss is that item's classic lure. |
| 1/3 after earlier rules | Mixed check pattern | One item held up and the remaining items need item-level diagnosis. |
| 0/3 after earlier rules | Under-checking pattern | No item landed on the keyed answer in this pass. |
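The profile rules are read top to bottom, with the first matching row winning. This is a minimal Python sketch of that ordering, assuming the three outcome counts are already known. The function name is hypothetical.

```python
def profile_label(correct: int, lures: int, other: int) -> str:
    """Apply the profile rules in table order; the first match wins."""
    if correct + lures + other != 3:
        raise ValueError("expected exactly three outcomes")
    if correct == 3:
        return "Clean override pattern"
    if lures >= 2:
        return "Lure-dominated pattern"
    if lures == 0 and other >= 1:
        return "Reflective but inconsistent"
    # Any run reaching this point has exactly one lure; the earlier rules
    # already caught every other combination.
    labels = {2: "Mostly reflective", 1: "Mixed check pattern", 0: "Under-checking pattern"}
    return labels[correct]


print(profile_label(2, 1, 0))  # Mostly reflective
print(profile_label(2, 0, 1))  # Reflective but inconsistent
```

The ordering explains why two runs with the same 2/3 score can carry different labels. The lure run stops at the "after earlier rules" rows, while the other-miss run is caught by the Reflective but inconsistent rule first.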
The Lily Pads item uses the standard CRT key of 47 days. Some later discussion has argued for an alternate reading of the wording, but the conventional CRT scoring model still treats 47 days as the keyed answer and 24 days as the classic lure.
Responsible Use Note:
CRT-3 is a brief psychology assessment screen, not a clinical diagnosis, IQ test, hiring measure, or full reasoning profile. Routine scoring happens in the browser, but copied result links, chart exports, answer-review exports, and shared files can reveal the selected answers, keyed answers, outcomes, and settings. Treat them as assessment notes.
Worked Examples:
One lure inside a mostly correct run
A run with 10 cents, 5 minutes, and 47 days finishes with Score 2/3. The Outcome mix shows one Intuitive lure, Review first points to Bat and Ball, and the reflection index rounds to 44 rather than 67 because the lure penalty applies.
Same raw score, different miss type
A run with 15 cents, 5 minutes, and 47 days also scores 2/3, but Bat and Ball becomes an Other miss rather than an Intuitive lure. With no lure penalty, the reflection index rounds to 67. The Overall result reads Reflective but inconsistent because the classic lure was avoided but the price equation still did not balance.
All three classic lures
Choosing 10 cents, 100 minutes, and 24 days gives Score 0/3, three lures, Lure resistance 0%, and a reflection index clamped to 0. What to review next should focus on why each first impression violates the problem wording.
Known answers in a class demonstration
A student who has seen the items before may choose all three keyed answers and receive the Clean override pattern label. The result can still support a discussion of the traps, but the Answer review should be described as practice evidence rather than a fresh measure of spontaneous cognitive reflection.
FAQ:
Is CRT-3 an IQ test?
No. It is a three-item cognitive reflection screen built around famous quantitative traps. It can describe this run, but it is too narrow for broad claims about intelligence.
Can I type my own answer?
No. This version uses four-option choices for each item. Pick the option that best matches the answer you would commit to after checking the wording.
Why separate intuitive lures from other wrong answers?
They point to different fixes. A lure means the classic first answer won. An other miss means the standard lure was not selected, but setup, arithmetic, or units still failed.
Does multiple choice change the result?
It can change the experience because the options make possible answers visible. Read the result as a four-option CRT-3 run, which is not identical to an open-ended administration.
Why does prior exposure matter so much?
The items are widely known. Seeing the answers earlier can raise the score by recall, even when the current run does not show a fresh pause-and-check process.
Are my answers sent to a server for scoring?
Routine scoring runs in the browser. A copied result link or downloaded export can still contain the recorded answers and outcomes, so share those files only with people who should see the assessment notes.
Glossary:
- Cognitive reflection
- The habit of checking a tempting first answer against the actual structure of the problem.
- Intuitive lure
- The classic wrong answer that feels plausible before the problem is worked carefully.
- Other miss
- A wrong answer that is not the classic lure, often tied to setup, arithmetic, or unit drift.
- Reflection index
- A 0 to 100 summary that rewards correct answers and penalizes classic lure picks.
- Lure resistance
- The share of the three items that did not land on the classic lure answer.
- Prior exposure
- Having seen the CRT-3 items or answers before, which can make a result less informative as a fresh reflection check.
References:
- Cognitive Reflection and Decision Making, Journal of Economic Perspectives, 2005.
- Effect of response format on cognitive reflection: Validating a two- and four-option multiple choice question version of the Cognitive Reflection Test, Behavior Research Methods, 2018.
- A limitation of the Cognitive Reflection Test: familiarity, PeerJ, 2016.
- An Alternative Correct Answer to the Cognitive Reflection Test, Frontiers in Psychology, 23 August 2021.