The first subtest the child encounters [on the WISC-III] is picture completion… For example, an item might depict a lamp, and the child would be required to glean that the light bulb is missing. As items progress in difficulty, so too do the missing details increase in irrelevant minutiae ... Why should the ability to notice these missing details be considered intelligent behavior? (Kwate, 2001, p. 229)
. . . the tasks featured in the IQ test are decidedly microscopic, are often unrelated to one another, and seemingly represent a “shotgun” approach to the assessment of human intellect. The tasks are remote, in many cases, from everyday life. (Gardner, 2011, p. 19)
Imagine that you are a parent whose child is being evaluated by a school psychologist to determine whether the child should be placed in special education classes. You sit in the corner of the room, behind your child, who sits at a table with the school psychologist. After a few minutes of chatting with your child, the school psychologist engages them in a series of simple tasks and questions:
“I want you to count backwards for me from 20 to 1.”
“What’s the thing for you to do when you have broken something which belongs to someone else?”
“I am going to name two things which are alike in some way, and I want you to tell me how they are alike. Wood and coal: in what way are they alike? An apple and a peach?”
“What is a soldier?”
“What does ‘scorch’ mean?”
After more than an hour of this, the testing ends. A few days later, you get the results from the psychologist. They believe that your child’s intelligence is substantially below average and that the child belongs in special education classes. You are not sure you agree with the school psychologist. Although they appeared competent, you are skeptical because the questions and tasks seemed more like games and trivia questions than any serious evaluation of your child. The testing seemed superficial and unrelated to much of what happens in school.
In reality, the example items were part of the 1916 version of the Stanford–Binet intelligence test (Terman, 1916) and were designed for typical 8-year-olds at the time. While these test items are no longer in use, many modern intelligence tests have similar questions and tasks. Skeptics like Kwate (2001) and Gardner (2011) do not believe that such trivial tasks can measure something as important and complex as intelligence. Indeed, the impression that some of the items resemble games is accurate: some early intelligence test creators were inspired by children’s games and activities when they created some subtests (Gibbons & Warne, 2019).
In a way, the skeptics are correct. When they administer a test, psychologists are not really interested in whether an examinee can define words, count backwards, explain how two objects are similar, perform on a digit span task, or solve a matrix problem. The reason these tasks appear on intelligence tests is that they are manifestations of intelligence – not intelligence itself. In other words,
Most psychologists are no more interested in digit span than a physician is intrinsically interested in oral temperature. What these scientific practitioners are interested in are the correlates and the causes of individual differences assessed by these measures, because this network enables them to generate many more valid inferences than if they were ignorant of their client’s status on these dimensions. (Lubinski & Benbow, 1995, p. 936)
To elaborate on Lubinski and Benbow’s analogy, performance on an intelligence subtest is a symptom of a person’s intelligence level. By systematically examining the collection of these symptoms, a psychologist can infer how intelligent an examinee is. Thus, it is not the tasks themselves – trivial as they appear – that matter. Instead, these tasks appear on intelligence tests because they provide clues about an examinee’s broader abstract reasoning ability and overall intelligence.
Evidence that Cognitive Tasks Measure Intelligence
There is strong evidence that Lubinski and Benbow (1995) are right: intelligence test tasks do provide insight into g. This evidence comes from multiple sources, but I will focus on two in this chapter. The first is the results of factor analysis; the second is how scores on intelligence subtests correlate with other variables.
Earlier chapters and the Introduction have discussed factor analysis extensively, and the results from factor analysis often indicate that the items on intelligence tests indeed measure a global mental ability. Chapter 1 explained that any assortment of cognitive tasks will form a general intelligence factor. As a result, any task that engages thought or cognitive effort will relate to g in some way. Yes, some of these tasks will appear trivial, but they all relate to g. No one has ever found a cognitive task that has a correlation of zero with g.
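To make the factor-analytic evidence concrete, here is a minimal sketch in Python. It is not from the book; the subtest names and correlation values are invented for illustration. It shows how a general factor emerges from any battery of positively correlated cognitive tasks:

```python
import numpy as np

# Illustrative (made-up) correlations among four cognitive subtests.
# The key property is the "positive manifold": every entry is positive.
subtests = ["vocabulary", "digit_span", "matrix_reasoning", "similarities"]
R = np.array([
    [1.00, 0.45, 0.50, 0.60],
    [0.45, 1.00, 0.40, 0.42],
    [0.50, 0.40, 1.00, 0.48],
    [0.60, 0.42, 0.48, 1.00],
])

# A simple stand-in for factor extraction: the first principal component
# of a positively correlated battery approximates the general factor, g.
eigenvalues, eigenvectors = np.linalg.eigh(R)   # eigenvalues in ascending order
first_pc = eigenvectors[:, -1]                  # eigenvector with the largest eigenvalue
loadings = first_pc * np.sqrt(eigenvalues[-1])
loadings *= np.sign(loadings.sum())             # fix the arbitrary sign of the eigenvector

for name, loading in zip(subtests, loadings):
    print(f"{name:>16}: g loading = {loading:.2f}")
# Every loading is positive: each task, however trivial it looks,
# relates to the general factor.
```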
But factor analysis alone is not enough to demonstrate that an item, subtest, or test measures intelligence. After all, factors are nothing more than groups of variables that correlate with one another. To establish that a test really measures intelligence, there must be evidence that test scores correlate with variables outside the test that are theorized to also be manifestations of intelligence (Kane, 2006, 2013). In technical language, these manifestations are called criteria (singular: criterion). From the earliest days of the field, test creators knew that a test score is useless if it does not correlate with or predict the criteria of real-life behavior. This requirement is an important part of validity, which is the degree to which a test score can be interpreted as measuring a psychological trait. The need for validity is why Sir Francis Galton examined whether there was a relationship between his measures of intelligence (e.g., head size, visual acuity, reaction time) and the criteria of education level and social class. Galton believed that smarter people would also be better educated and belong to a higher social class. When Galton did not find a correlation between his measures of intelligence and these criteria, he abandoned his measures of intelligence.
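As a toy illustration of Galton’s validation logic, the sketch below simulates one measure that taps the intended trait and another (standing in for Galton’s head-size index) that, by construction, does not. All numbers are simulated, not real data; the point is only that a near-zero validity coefficient is grounds for abandoning a measure:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

# Simulated data, purely for illustration (no real measurements).
ability = rng.normal(0, 1, n)                         # the latent trait we hope to capture
criterion = 0.6 * ability + rng.normal(0, 0.8, n)     # e.g., an index of school success

good_measure = 0.7 * ability + rng.normal(0, 0.7, n)  # actually taps the trait
head_size = rng.normal(0, 1, n)                       # unrelated to the trait by construction

# The validity coefficient is simply the correlation between a measure
# and the criterion it is supposed to predict.
for name, measure in [("good_measure", good_measure), ("head_size", head_size)]:
    r = np.corrcoef(measure, criterion)[0, 1]
    print(f"{name:>12}: validity r = {r:+.2f}")
# The first coefficient is substantial; the second is near zero --
# the pattern that led Galton to abandon his measures.
```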
The next generation of intelligence test creators followed the same strategy of verifying that their tests measured intelligence. Alfred Binet’s criterion for his test score was whether the examinee was struggling in school (Binet & Simon, 1905/1916). When there was a correlation between Binet’s test score and the criterion, he understood this (correctly) as evidence that his test measured intelligence. Some of the tasks on Binet’s test indeed appeared trivial. For example, he tested whether children recognized that a piece of chocolate was food and a similarly sized wooden block was not. Another task on Binet’s original test required a child to determine which of two boxes of identical size and shape was heavier. Binet also asked children to generate words that rhymed with a word he gave them. The superficial appearance of these tasks did not matter. What mattered was how well the score they produced correlated with Binet’s criterion. In fact, a few tasks from Binet’s original test correlate so well with relevant criteria that similar items are still on intelligence tests today (Gibbons & Warne, 2019).
Later test creators followed Galton’s and Binet’s lead in investigating whether intelligence test scores correlated with criteria. Because intelligence tests are most often used in schools, many of these criteria are educational in nature. IQ scores correlate positively with grade-point averages (Coyle, 2015), performance on standardized educational tests (Deary, Strand, Smith, & Fernandes, 2007), the number of years of education in adulthood (Damian, Su, Shanahan, Trautwein, & Roberts, 2015), adult socioeconomic status (Deary, Taylor, et al., 2005), and being labeled as gifted (Wechsler, 2014). For a bunch of items that seem trivial, this is impressive. To argue that items on intelligence tests are too superficial to measure intelligence, one must also argue that these educational criteria are unrelated to intelligence, even though they correlate with IQ scores – a hard argument to make. Even prominent modern critics of g concede that educational success requires the skills needed to perform well on an intelligence test.
Apart from their importance in showing that intelligence tests measure g, the correlations between IQ scores and educational outcomes matter in their own right because they can be used to make predictions. Even if one does not believe in the existence of g, it is still possible to make predictions about a child’s educational future from an intelligence test score, despite the fact that much of the material on many intelligence tests is not explicitly taught in school. These scores can still help teachers and other school personnel know which children will need extra help and which are prepared for advanced course work.
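As a sketch of how such a prediction works, the snippet below converts an IQ score into a predicted standing on an educational criterion using simple linear regression. The validity coefficient of .50 is assumed for illustration, not taken from any specific study:

```python
# Assumed (illustrative) validity coefficient between IQ and a
# standardized educational outcome.
R_XY = 0.50

def predict_outcome_z(iq, mean_iq=100.0, sd_iq=15.0, r_xy=R_XY):
    """Simple linear regression with standardized variables:
    predicted z(outcome) = r * z(IQ), i.e., regression toward the mean."""
    z_iq = (iq - mean_iq) / sd_iq
    return r_xy * z_iq

for iq in (85, 100, 130):
    print(f"IQ {iq}: predicted outcome = {predict_outcome_z(iq):+.2f} SD")
# A child scoring 130 is predicted to be about one SD above average on
# the criterion -- the kind of information that helps flag who may be
# ready for advanced course work or who may need extra help.
```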
Conclusion
The belief that intelligence test items are too trivial to measure a complex ability like intelligence implies that a person can ascertain what a test measures just by reading the test questions. In discussing this implication, one intelligence expert wrote:
Like reading tea leaves, critics list various superficialities of test content and format to assert, variously, that IQ tests measure only an aptness with paper-and-pencil tasks, a narrow academic ability, familiarity with the tester’s culture, facility with well-defined tasks with unambiguous answers, and so on. Not only are these inferences unwarranted, but their premises about content and format are often wrong. In actuality, most items on individually administered batteries require neither paper nor pencil, most are not timed, many do not use numbers or words or other academic-seeming content, and many require knowledge only of the most elementary concepts (up-down, large-small, etc.). (Gottfredson, 2009, p. 29)
Ascertaining what a test really measures requires more than just reading the items and making a subjective judgment. Indeed, this strategy for understanding test functioning is practically useless, and has been recognized as such for over 100 years. As Terman and his colleagues (1917, p. 135) stated, “The classification and criticism of tests by mere inspection may form an interesting pastime, but it can hardly be taken seriously as a contribution to science” (see also Clarizio, 1979; Reschly, 1980).
Instead of armchair judgments, critics must use data from factor analysis and correlations with criteria to understand what a test measures. The evidence is overwhelming that these tests measure intelligence – and measure it well. While test items may seem unimportant, they are the “yardstick” that scientists use to measure intelligence, and the “yardstick” does not need to resemble the ability it measures (Gottfredson, 2009). To say otherwise is like claiming that a thermometer does not measure temperature because it only appears to display the expansion of mercury in a glass tube.
Items that appear superficial can (and do) measure a complex cognitive ability. Indeed, because of Spearman’s (1927) indifference of the indicator (see Chapters 1 and 7), the fact that items may appear trivial is irrelevant. What matters is the cognitive processes that people engage in to answer test items, and every cognitive task draws on g to some extent.
From Chapter 8 of "In the Know: Debunking 35 Myths About Human Intelligence" by Dr. Russell Warne (2020)