It has been a great experience creating the Reasoning and Intelligence Online Test (RIOT), and I'm looking forward to the test's upcoming launch. Because theory and real life do not always align, there have been some surprises along the way.
One unexpected discovery is how much work it is to set time limits for the RIOT's 15 subtests. Every few months, I find myself toiling away over the time limit data.
On the surface, it seems so easy to set a time limit: just figure out how long most people take to answer a question and set the time limit accordingly. The reality is much more complex.
First, there is the question of why to have a time limit at all. The answer is that a time limit is needed to keep the test practical to administer. I discovered this years ago when I was a professor at Utah Valley University. I was in charge of designing and implementing my department's standardized statistics final. It was administered in the university's testing center without a time limit for the first two or three semesters. Each semester, there was at least one student who took 3 or 4 hours to answer 97 multiple choice questions!
I don't know why these students took so long. Did they fall asleep? Were they quadruple-checking their answers? Did they have several panic attacks? Regardless of the reason, when the department shifted to in-class proctoring, it was not practical to set a 4-hour time limit. Instead, we were confined to the time that a class had its room reserved during finals week (1 hour, 50 minutes). Even with this "shortened" time limit, we never had any student complain that they could not finish the test in time. The lesson from this experience was that setting a time limit makes test administration practical.
Second, a time limit can be a useful feature of the testing situation. Some tests are supposed to measure how quickly examinees think or perform a task. Job performance tests for EMTs or air traffic controllers are good examples of situations where it is important to have a tight time limit.
For most tests, though, it is best to have a time limit that is tight enough to be practical, but not so low that the test starts measuring mental speed. The challenge is striking that balance. An appropriate amount of time depends on the nature of the task, the age of the examinees, the response format, the trait being measured, and more.
There is a large body of literature in the testing world about setting time limits. Unfortunately, much of it was not helpful in setting time limits for the RIOT. Most tests in these time limit studies are academic achievement tests or college admissions tests. That information is not helpful for setting a time limit for an intelligence test because the tasks are too different. (Research on the time limit for a reading comprehension test has little to say about setting a time limit for the Object Rotation subtest on the RIOT.)
Additionally, most of the research discusses setting a time limit for an entire test. But the RIOT's items are administered individually, and time limits are set for each item. Research about how examinees pace themselves so that they do not run out of time at the end of a test is irrelevant when the time is set for each item. On the RIOT, hurrying on earlier items does not mean a person gets more time on later items. So, pacing behaviors are irrelevant, and may even be counterproductive in some cases.
Because theory and prior research were not very informative, the solution was to collect data. In the item tryouts, I either (1) set no time limit and collected data about response times, or (2) set a time limit and measured the percentage of people who could answer the item. My goal was to give examinees enough time that 75% or more could answer every item. (This was eventually raised to 80% for the final version of the RIOT.) After three or four tryouts for every subtest, I found time limits that met this standard.
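To make the two tryout approaches concrete, here is a minimal Python sketch. It is an illustration only, not the actual RIOT analysis: the function names, the integer `coverage_pct` parameter, and the response times are all hypothetical. The first function takes untimed response times and returns the shortest limit that would let a target percentage of examinees answer; the second checks what percentage answered within a given limit.

```python
def time_limit_for_coverage(response_times, coverage_pct=80):
    """Shortest observed time by which at least coverage_pct% of examinees had answered.

    response_times: seconds taken by each examinee in an untimed tryout.
    coverage_pct: whole-number target percentage (hypothetical parameter).
    """
    if not response_times:
        raise ValueError("need at least one response time")
    if not 0 < coverage_pct <= 100:
        raise ValueError("coverage_pct must be in (0, 100]")
    ordered = sorted(response_times)
    # Ceiling of coverage_pct% of the sample, done in integer arithmetic.
    needed = (coverage_pct * len(ordered) + 99) // 100
    return ordered[needed - 1]


def percent_answered(response_times, limit):
    """Percentage of examinees who answered within the limit (None = never answered)."""
    answered = sum(1 for rt in response_times if rt is not None and rt <= limit)
    return 100 * answered / len(response_times)


# Hypothetical response times (seconds) for one item in an untimed tryout.
tryout_times = [4.0, 5.5, 6.0, 6.5, 7.0, 8.0, 9.5, 12.0, 15.0, 31.0]

limit = time_limit_for_coverage(tryout_times, coverage_pct=80)
print(limit, percent_answered(tryout_times, limit))
```

In this made-up sample, the one examinee who took 31 seconds has no influence on the limit, which is the practical point of targeting a percentage rather than the slowest respondent.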
It sounds like a trial-and-error process, and to some extent, it was. But it involved a lot of scrutiny of items. In the process, I learned quite a bit about examinees' response times. One consistent pattern is that examinees respond to easier items on a subtest more rapidly than they respond to harder items. We can see that, for example, in the graph below, which shows the cumulative percentage of people who have responded to a vocabulary item in a given time. In the graph, Item 5 is the easiest (and has been answered by more people at every time point), and Item 19 is a medium-difficulty item. Item 37 is the hardest. Generally, the more difficult an item is, the more slowly people respond to it.
But there are differences, too. How much item difficulty impacts response time can vary a lot. This is apparent in the graph below, which shows three items from the Matrix Reasoning subtest. The three items are (from easiest to hardest) Items 44, 50, and 10. They have approximately the same difficulty levels as the Vocabulary items in the graph above.
Not only do Matrix Reasoning items take longer to respond to, but there is much more separation between easy, medium, and hard items. For the three Vocabulary items, half of examinees answered the easy item (Voc5) in 5.75 seconds; it took just two seconds longer for half of examinees to answer the hard item (Voc37). For Matrix Reasoning, the time differences were much larger. Half of respondents could answer the easy item (MR44) in 10.75 seconds, but it took over double that time (23 seconds) for half to answer the hard item (MR10). This means that item difficulty has a much larger impact on how long it takes to respond to a Matrix Reasoning item than to a Vocabulary item.
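The "half of examinees answered in X seconds" figures are median response times, and the curves in the graphs are cumulative percentages. The sketch below shows how both could be computed in Python. The response times are invented for illustration and are not RIOT data, and `cumulative_percent` is a hypothetical helper name.

```python
from statistics import median


def cumulative_percent(response_times, t):
    """Percentage of examinees who had responded by time t (in seconds)."""
    answered = sum(1 for rt in response_times if rt <= t)
    return 100 * answered / len(response_times)


# Invented response times (seconds) for an easy and a hard item.
easy_item = [3.0, 4.5, 5.0, 6.0, 7.5, 9.0]
hard_item = [5.0, 6.5, 7.0, 8.5, 10.0, 14.0]

# The median is the time by which half of examinees had responded.
median_gap = median(hard_item) - median(easy_item)
print(median(easy_item), median(hard_item), median_gap)

# One point on each cumulative curve: who had responded by 7 seconds?
print(cumulative_percent(easy_item, 7.0), cumulative_percent(hard_item, 7.0))
```

Comparing the median gap across subtests is one simple way to express how strongly item difficulty affects response time.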
Another interesting finding is that the relationship between response time and item score is not consistent. This is apparent in the two graphs below, which show how quickly people respond to medium-difficulty items when they get the item right or wrong. On the Vocabulary subtest, people who respond more quickly are more likely to get the item correct:
But on the RIOT's Matrix Reasoning subtest, the reverse is true. People who respond quickly on that subtest are more likely to get the item wrong. For Matrix Reasoning, slow and steady really does win the race.
From a practical perspective, these analyses reveal that setting a time limit is the right approach. The effect of the time limit is apparent in the two graphs above: the time limit kicks in where examinees getting the item incorrect are bunched up in a vertical line at the top right. This bunching occurs because these people did not respond before reaching the time limit, and their response (or lack thereof) was scored as incorrect.
On the other hand, the leveling off in the red line shows that as more time passed, fewer and fewer people were responding correctly to the item. This trend indicates that adding more time would not greatly increase the number of people answering an item correctly. In other words, most of the people who ran out of time would have answered incorrectly anyway.
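The leveling-off argument can be checked numerically by comparing how many correct responses arrive early versus in the last stretch before the time limit. The following Python sketch assumes a hypothetical 30-second limit and invented responses, with `None` marking an examinee who timed out; it is not the analysis used for the RIOT.

```python
def percent_correct_between(responses, start, end):
    """Percentage of all examinees who answered correctly with start < time <= end.

    responses: list of (response_time_in_seconds_or_None, is_correct) pairs.
    """
    hits = sum(
        1
        for rt, correct in responses
        if rt is not None and correct and start < rt <= end
    )
    return 100 * hits / len(responses)


# Invented responses to one item under a hypothetical 30-second limit.
responses = [
    (6.0, True), (8.0, True), (9.5, True), (11.0, True), (12.0, False),
    (14.0, True), (18.0, False), (24.0, False), (28.0, True), (None, False),
]

early = percent_correct_between(responses, 0, 20)   # first 20 seconds
late = percent_correct_between(responses, 20, 30)   # final 10 seconds
print(early, late)
```

When the late figure is small relative to the early one, as in this made-up sample, the correct-response curve has flattened and extra time would likely add few correct answers.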
This is just the start of what I have learned from analyzing time limit data. Much to my surprise, it is a complex and interesting aspect of test data. Time limits are much more important than people realize, and I believe it is time to give them their due.