How Do IQ Tests Work?
Dr. Russell T. Warne, Chief Scientist
Most people know that IQ tests measure intelligence, but the process of turning examinees’ answers into a meaningful score involves careful design, statistical analysis, and a lot of psychological research.
IQ tests are designed to measure general intelligence, which is the ability to reason, solve problems, think abstractly, and learn from experience. According to a consensus of over 50 leading intelligence researchers, intelligence is "a very general mental capability" that helps people comprehend our surroundings, "catch on" to new ideas, and "figure out" what to do in unfamiliar situations.
Unlike school tests that measure how well a person has learned knowledge that they have been explicitly taught, IQ tests are designed to measure a person’s capacity to think. That is why studying for an IQ test in the same way a person would for a history exam is usually ineffective.
What Tasks Appear on IQ Tests?
One surprising fact about IQ tests is that there is no single task or question type that appears on every test. Charles Spearman, an early pioneer in intelligence research, proposed a principle called "the indifference of the indicator." His idea was simple: as long as a task requires thinking, judgment, or reasoning, it can measure intelligence to some extent.
This flexibility is actually an advantage. Different situations call for different types of tasks; some work well face-to-face but would not function well in a group setting, while others might be better suited for testing children versus adults, or for people who speak different languages.
Common tasks include vocabulary questions, matrix reasoning puzzles that require the examinee to identify patterns in visual grids, arithmetic problems under time pressure, memory challenges, processing speed tasks, and spatial reasoning problems involving mental rotation of objects. The specific combination varies from test to test, but all tap into a person’s underlying cognitive ability.
How Professional IQ Tests Are Built
Creating a legitimate IQ test takes a lot of time and expertise in psychometrics, the science of psychological measurement. Test creators begin by conducting background research and choosing a theoretical framework. Most modern tests are based on the Cattell-Horn-Carroll (CHC) theory, which recognizes both general intelligence (g) and more specific cognitive abilities like verbal reasoning, fluid reasoning, and processing speed.
Next, creators select tasks that balance strengths and weaknesses across different item types. A good test samples broadly from various cognitive abilities rather than overemphasizing one type of thinking. After writing test questions, creators pilot them with sample groups. The data undergoes statistical analysis to identify which items work well and which need revision or removal. This cycle often repeats several times.
Finally, the test is administered to a norm sample: a large, representative group that future test-takers will be compared against. This norm sample defines what "average" performance looks like and is crucial for interpreting an IQ score.
The Norm Sample
An examinee’s raw score, that is, the number of questions answered correctly, does not mean much by itself. What matters is how a person’s performance compares to others in their age group, and that is why a norm sample matters.
A norm sample should mirror the population the test is designed for. For a test intended for American adults, the sample should reflect the U.S. adult population in age, education level, geographic location, and other demographic factors. If a test taker performs exactly at the average for norm-sample members in their age group, then the test taker obtains an IQ of 100. Better-than-average performance yields a score above 100, while below-average performance results in a score under 100.
This is why legitimate tests document their norm samples thoroughly. Tests that use self-selected groups, like people who seek out free online IQ tests, cannot provide accurate scores because these groups do not represent the general population.
From Raw Scores to IQ
The conversion of a raw score to an interpretable score uses a statistical procedure that produces what is called a deviation IQ. The calculation itself is a multistep mathematical process described in another article. Modern tests are designed so that the average score is 100 and the standard deviation is 15 points, with scores forming a normal distribution (sometimes informally called a “bell curve”).
In this system, about 68% of people score between 85 and 115, and about 95% score between 70 and 130. Scores become increasingly rare the further they fall above or below 100. This approach allows IQ scores to be compared across different ages; a 10-year-old with an IQ of 120 and a 40-year-old with an IQ of 120 are both performing at the same level relative to others in their age group, even though the adult has accumulated more knowledge and experience.
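As a rough illustration, the deviation-IQ idea can be sketched in a few lines of Python. The raw-score numbers below are made up for illustration; real tests use carefully constructed norm tables rather than a single formula:

```python
from statistics import NormalDist

def deviation_iq(raw_score, norm_mean, norm_sd):
    """Convert a raw score to a deviation IQ: express the score as a
    z-score relative to the norm group, then rescale to mean 100, SD 15."""
    z = (raw_score - norm_mean) / norm_sd
    return 100 + 15 * z

# Hypothetical norm group averaging 40 items correct with SD 8:
# answering 48 correctly is one SD above average, so IQ = 115.
print(deviation_iq(raw_score=48, norm_mean=40, norm_sd=8))  # 115.0

# Share of scores falling within one SD (85-115) under a normal curve
iq_dist = NormalDist(mu=100, sigma=15)
print(round(iq_dist.cdf(115) - iq_dist.cdf(85), 3))  # 0.683
```

The same arithmetic explains the 68% and 95% figures above: they are simply the area under the normal curve within one and two standard deviations of the mean.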
Why Reliability and Validity Are Important
Professional IQ tests must demonstrate two critical properties. Reliability refers to consistency. Because intelligence is considered stable throughout most of the lifespan, a test score should not fluctuate wildly between testing sessions. High-quality tests report reliability coefficients, typically above .90, indicating strong consistency.
Validity addresses whether test scores actually measure what they claim to measure and whether there is evidence that they can be used for their intended purpose. Validity is not a yes-or-no property; it is specific to particular uses and interpretations.
Professional test creators gather validity evidence from multiple sources: correlations with other established tests, predictions of real-world outcomes, and studies showing the test functions fairly across different groups.
Screening for Bias
Since the 1960s, psychologists have been concerned about potential bias in IQ tests. By the 1980s, screening for bias became standard practice before releasing any test to the public.
Bias occurs when one group has a systematic advantage or disadvantage for reasons unrelated to intelligence. It is important to note that average score differences between groups do not, by themselves, indicate bias. Investigators use sophisticated statistical methods that examine whether test items function differently for different groups after controlling for overall ability level.
Modern professionally developed tests undergo extensive bias reviews. Items that show bias are revised or removed before publication. This does not eliminate all group differences in scores, but it ensures the test is measuring intelligence fairly for all members of its intended population.
What Your Score Really Means
An IQ score is a snapshot of a person’s cognitive abilities compared to others in their age group. An IQ of 115 means that a person performed better than approximately 84% of norm-sample members in their age group. An IQ of 85 means that the test taker performed better than approximately 16%.
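Under the normal model with mean 100 and SD 15, these percentile figures follow directly from the standard normal cumulative distribution; a small sketch:

```python
from statistics import NormalDist

# IQ scores are modeled as normally distributed: mean 100, SD 15
iq_dist = NormalDist(mu=100, sigma=15)

def iq_percentile(iq):
    """Approximate percentile rank of an IQ score under the normal model."""
    return round(100 * iq_dist.cdf(iq), 1)

print(iq_percentile(115))  # 84.1 -> better than ~84% of the norm group
print(iq_percentile(85))   # 15.9 -> better than ~16%
print(iq_percentile(100))  # 50.0 -> exactly average
```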
A single IQ score, while useful, does not measure everything important. That is why many comprehensive tests report multiple scores. A global IQ measures general intelligence, while subscores reveal specific abilities like verbal reasoning, spatial ability, or working memory. These detailed scores can expose cognitive strengths and weaknesses that a single number obscures.
To learn more, watch this episode on what an IQ test measures on the RIOT IQ YouTube channel: