
Jun 19, 2026·Taking an IQ Test

Where Can You Find Reliable IQ Tests?

Tired of fake online quizzes? Learn how to identify reliable IQ tests backed by real psychometrics. Read our guide and try the RIOT IQ test today!

Dr. Russell T. Warne, Chief Scientist

People come to the question of IQ testing from many different starting points. Some are curious about their own cognitive abilities. Some are in a clinical or educational setting where a score actually matters for a decision. Others have taken an online quiz and walked away with a number they're not sure how to interpret. Wherever the starting point, the central question is the same: how does one find a test that actually measures intelligence with accuracy?

The honest answer is that reliable IQ tests are not hard to find — but they do require some research. The internet is cluttered with instruments that imitate the look and language of professional testing while delivering results that have no meaningful connection to intelligence. Learning to separate the credible from the fraudulent is a matter of knowing what to look for.

What "Reliable" Actually Means

Before looking for a reliable IQ test, it is worth being precise about what that word means in a psychometric context. Reliability refers to the consistency of test scores — a reliable test produces similar scores when the same person takes it under similar conditions on separate occasions. This is measured through a correlation coefficient. Most professionally developed IQ tests achieve test-retest reliability between .80 and .95 with several weeks between testings, which is considered excellent in psychological measurement. The Wechsler Adult Intelligence Scale, for example, has a test-retest reliability of approximately .94 for its full-scale IQ score.

Reliability, however, is not enough on its own. A test also needs validity — evidence that its scores actually measure intelligence and not something else, such as familiarity with test formats, cultural knowledge, or response style. Establishing validity requires multiple types of evidence: content validity, which involves expert review of whether the test samples the domain of intelligence appropriately; construct validity, which examines whether scores correlate with other measures of intelligence; and criterion validity, which investigates whether scores correlate with relevant outcomes.

Together, reliability and validity are the foundation of any credible IQ test. A test can produce consistent scores and still be measuring the wrong thing. And a test can be theoretically well-designed but produce wildly inconsistent scores in practice. Both properties are required.

The Landscape of Available Tests

IQ tests available today fall into a few broad categories. Understanding the differences between them helps set appropriate expectations for accuracy and appropriate use.

Individually administered tests are the most rigorous option. These are delivered face-to-face (or via video call) by a licensed psychologist or specially trained clinician. The examiner controls the testing conditions, gives standardized instructions, and can observe the examinee directly. Tests in this category include the Wechsler Adult Intelligence Scale (WAIS), the Stanford-Binet Intelligence Scales, and the Woodcock-Johnson Tests of Cognitive Abilities. All of these are full test batteries, meaning they assess multiple cognitive abilities rather than a single task. Their scores are drawn from large, representative norm samples and have been validated through decades of peer-reviewed research.

The obvious limitation of individually administered tests is access. They require a trained professional and typically cost several hundred dollars. In many regions, waiting lists are long. For someone who needs a full neuropsychological evaluation, this is the route. For general purposes, however, there are other credible options.

Group-administered tests are designed to be taken by many examinees simultaneously, often in educational or organizational settings. The Cognitive Abilities Test (CogAT) and many military aptitude tests fall into this category. These tests sacrifice some of the observational benefits of individual administration but maintain rigorous development standards when created by professionals.

Online tests represent a newer category, and one where the gap between credible and fraudulent is widest. What matters is whether the test behind the website was built to professional standards — not whether it happens to be delivered through a browser.

The Professional Standard

Not all IQ tests are created equal, and the gap between professional and non-professional tests is far larger than most people realize. Creating a psychological test is a highly technical undertaking governed by a set of published standards. The Standards for Educational and Psychological Testing, developed jointly by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME), addresses professional and technical issues of test development and use in education, psychology, and employment. Published collaboratively by these three organizations since 1966, the Standards represent the gold standard in guidance on testing in the United States and worldwide.

These are not optional guidelines that test creators consult if they feel like it. They represent the minimum expectations for any test used to make meaningful inferences about a person's cognitive ability. Here is what a professional IQ test requires:

Named authorship. Legitimate test creators are proud of their work. They attach their names and credentials to their tests. Anonymity is not a quirk — it is a means of avoiding accountability.

Technical documentation. A professional test comes with a technical manual describing the development process, the statistical properties of the scores, the intended population, and the appropriate uses and limitations of the results.

A representative norm sample. IQ is a relative measure. A score of 115 only means something because it is interpreted against a reference group, the norm sample. That sample must be adequate in both size and representativeness: large population omnibus measures such as IQ tests should average about 100 participants per age group. A norm sample drawn from self-selected internet users is not representative of the general population and will produce distorted scores.

Bias screening. Professional tests are reviewed by diverse expert panels and subjected to statistical analyses for differential item functioning before release.

Alignment with scientific theory. Professional tests are built on established, peer-reviewed theories of intelligence — most commonly the Cattell-Horn-Carroll (CHC) model, which organizes cognitive abilities in a hierarchical structure with general intelligence (g) at the top and narrower abilities below.
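To make the norm sample requirement above concrete, here is a simplified sketch of how a raw score becomes an IQ. It assumes the conventional scale with a mean of 100 and a standard deviation of 15, and it uses two tiny invented norm groups. Professional tests norm within age groups and use more refined scaling methods, so this only illustrates the principle.

```python
from statistics import mean, stdev

IQ_MEAN = 100  # conventional center of the IQ scale
IQ_SD = 15     # conventional standard deviation (an assumption of this sketch)


def raw_to_iq(raw_score, norm_raw_scores):
    """Place a raw score on the IQ scale relative to a norm sample."""
    norm_mean = mean(norm_raw_scores)
    norm_sd = stdev(norm_raw_scores)
    z = (raw_score - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return IQ_MEAN + IQ_SD * z


# Hypothetical raw scores (items answered correctly) from two norm groups.
representative_norms = [22, 25, 27, 28, 30, 30, 31, 33, 35, 39]
self_selected_norms = [30, 32, 33, 35, 35, 36, 37, 38, 40, 44]

raw = 36
print(round(raw_to_iq(raw, representative_norms)))
print(round(raw_to_iq(raw, self_selected_norms)))
```

The same raw score lands in very different places depending on who it is compared with, which is why a score is uninterpretable when the norm group is undocumented or unrepresentative.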

The Problem with Most Online Tests

A quick search for "IQ test" returns dozens of websites offering instant results, often free, often with scores presented in official-looking graphics. Most of these have nothing to do with intelligence measurement as psychologists understand it.

The majority of online IQ tests are created by individuals without training in psychometrics or psychology. Their questions often lack calibration, their scoring is arbitrary, and their claimed accuracy is unsupported by evidence. Many websites use inflated scores to encourage users to pay for certificates or detailed reports. Because these tests lack norm data and peer review, their results bear no relationship to scores on established measures of intelligence.

The inflation issue is worth understanding. An amateur test with no proper norm sample cannot produce a meaningful IQ score. It can produce a number, but that number is not calibrated against any reference population. Many of these tests systematically report high scores because users who receive flattering results are more likely to purchase a certificate or share their score online, driving traffic back to the site.

There are also outright scams. Reviews of certain online IQ test platforms describe unauthorized recurring charges, difficulty reaching customer service, and no actual test being delivered after payment. The use of psychometric-sounding language — "normed," "validated," "clinically designed" — does not indicate that the test was actually developed to any standard.

How to Evaluate Any Test You Encounter

When encountering an IQ test — whether online or otherwise — a few specific questions narrow the field quickly.

Who built it, and what are their qualifications? The author's name and professional background should be discoverable without difficulty. A credential in psychometrics, educational psychology, or a related field is meaningful. Tests created by non-professionals cannot be held to any standard, and the criteria laid out above under "The Professional Standard" apply here directly.

Has it appeared in peer-reviewed research? Searching Google Scholar for the name of a test reveals whether researchers have used it and subjected its scores to independent scrutiny. A test with no scholarly citations has not been independently evaluated. A test that appears in multiple published studies has at least been used by professionals who examined its properties.

What is the norm sample? The test's website or technical documentation should describe the group against which scores are compared — how many people are in it, how it was collected, and whether it is representative of the intended population of test takers. The WAIS-IV, for instance, was standardized on a sample of 2,200 people in the United States ranging in age from 16 to 90, with demographic characteristics modeled after proportions from U.S. Census Bureau data. That level of detail is what documentation should look like. If a test cannot provide comparable information, there is no basis for interpreting any score it produces.

Does the test measure multiple abilities or just one? Tests that consist of a single item type — for example, only matrix reasoning or only vocabulary — provide a narrower picture of intelligence than tests that sample multiple cognitive abilities. Charles Spearman demonstrated that virtually any task requiring reasoning or judgment will correlate with general intelligence, so a single-format test is not useless. But a battery of subtests captures more of the structure of intelligence and produces a more stable, interpretable score.

Are the standards the test was built to meet stated clearly? Legitimate test developers cite the Standards for Educational and Psychological Testing and describe their efforts to comply with relevant provisions. They also acknowledge the appropriate and inappropriate uses of their test scores. Blanket claims that the test is "valid" without specifying what it is valid for are scientifically meaningless.

Specific Tests Worth Knowing About

A brief overview of the professionally developed tests most likely to be encountered:

Wechsler Adult Intelligence Scale (WAIS) — Currently in its fifth edition, the WAIS is the most widely used individually administered intelligence test for adults. It is a comprehensive battery covering verbal comprehension, visual spatial ability, fluid reasoning, working memory, and processing speed. The WAIS is the standard against which most other adult intelligence tests are compared and has been subject to extensive validation research over many decades.

Stanford-Binet Intelligence Scales — Now in its sixth edition, the Stanford-Binet traces back to Alfred Binet's original 1905 test. The modern version is a comprehensive battery for individuals from age 2 through adulthood and is widely used in educational and clinical settings.

Woodcock-Johnson Tests of Cognitive Abilities — Currently in its fifth edition, the Woodcock-Johnson is built closely around the Cattell-Horn-Carroll theory and is particularly useful in educational contexts where specific cognitive strengths and weaknesses need to be identified.

Raven's Progressive Matrices — A single-format test consisting entirely of matrix reasoning items, the Raven's is one of the best measures of fluid intelligence and general ability available. It is widely used in research and group settings. Its single-format design means it captures only part of the picture of intelligence, but it does that part very well.

Cognitive Abilities Test (CogAT) — A widely used group-administered test for schoolchildren that assesses reasoning in verbal, quantitative, and nonverbal domains. Commonly used for gifted program screening.

The Role of Reliability Data in Choosing a Test

One of the quickest ways to separate a serious test from a superficial one is to look at the reliability data. Professional tests publish this prominently. When a test provides no reliability information, either its developers never calculated it or the numbers are too poor to report.

Reliability in IQ testing is assessed in several ways. Test-retest reliability measures whether a person scores consistently across two administrations. Internal consistency measures whether the different items within a test or subtest all hang together as measures of the same ability. Split-half reliability examines whether two halves of a test produce comparable scores. All three, when calculated on a properly developed test with an appropriate sample, should yield coefficients in the .80 to .95 range.

These numbers matter because they establish the degree of measurement error in any given score. A score from a test with a reliability of .90 has a smaller standard error of measurement than a score from a test with a reliability of .70. The standard error of measurement defines the range within which a person's true score likely falls — it is what allows a report to say something like "this person's score of 112 falls within a confidence interval of 107 to 117." Without reliability data, no such statement can be made.
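The link between reliability and the width of a confidence interval can be shown with the classical formula SEM = SD × sqrt(1 − reliability). The sketch below assumes a standard deviation of 15 and centers the interval on the observed score. Published manuals often center it on an estimated true score instead, so treat this as an illustration, not as any particular test's procedure. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

IQ_SD = 15  # conventional IQ standard deviation (an assumption of this sketch)


def standard_error_of_measurement(reliability, sd=IQ_SD):
    """Classical SEM: larger when reliability is lower."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


def confidence_interval(score, reliability, level=0.95, sd=IQ_SD):
    """Symmetric interval around an observed score."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * standard_error_of_measurement(reliability, sd)
    return score - margin, score + margin


for reliability in (0.90, 0.70):
    low, high = confidence_interval(112, reliability)
    print(f"reliability {reliability:.2f}: {low:.1f} to {high:.1f}")
```

The interval widens noticeably as reliability drops from .90 to .70. Its exact width also depends on the confidence level chosen, so the output will not necessarily match the illustrative 107 to 117 range quoted above.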

Free Versus Paid Tests

Free online IQ tests are everywhere, but free professional IQ tests are rare. Test development is genuinely expensive. Building a representative norm sample, conducting bias screenings, running pilot studies, and producing a technical manual requires years of work and significant funding.

A legitimate test that charges a fee is not automatically better than a free test, but the economics of professional development make it difficult to deliver a properly normed, validated instrument for free. Free tests created by researchers for specific research purposes can be credible — but they are typically designed for controlled conditions and may not be appropriate for casual self-assessment.

There is a middle path: sample or preview versions of professional tests. These allow examinees to become familiar with the format of a test without committing to the full assessment. They are useful for understanding what kinds of tasks appear on a battery and for reducing test anxiety on the actual administration.

A Note on What Tests Cannot Tell You

Finding a reliable IQ test is the first step. Interpreting the score correctly is the second. IQ scores are strong predictors of broad life outcomes at the population level. Higher scores are associated with better educational attainment, higher income, lower rates of certain health problems, and longer life. IQ is among the most robust and well-validated predictors of life outcomes in all of psychology. Meta-analyses show especially strong relationships with academic achievement, training, and job performance. These relationships are real and consistent, but they are probabilistic — they describe tendencies across large groups, not certainties for individuals.

IQ is not a measure of character, creativity, moral judgment, or potential to excel in any specific domain. A person with a high IQ who chooses not to pursue formal education is not destined for any particular outcome. A person with a more modest score who works with focus and in an environment well-suited to their strengths may exceed what population-level statistics would predict.

A test score also has a confidence interval — a range within which the true score is likely to fall. A responsible test report presents this clearly. The number is an estimate, bounded by measurement error, and it is most useful as one data point among others rather than a definitive characterization of a person's mind.

The First Professional Online IQ Test

For those who want a professionally developed IQ test accessible without a clinical referral, the Reasoning and Intelligence Online Test (RIOT) is the first online test built to meet the full standards expected of a professional psychological assessment.

I created the RIOT after more than 15 years of research in intelligence measurement and the publication of dozens of peer-reviewed articles and a book on the topic. The development process followed the same rigorous steps as any individually administered professional test: grounding in the Cattell-Horn-Carroll theory of intelligence, expert review of item content by panels from cognitive, educational, and developmental psychology, statistical item analysis, comprehensive bias screening, and norming on a representative U.S. sample.

The RIOT produces a full-scale IQ score along with index scores for Verbal Reasoning, Fluid Reasoning, Spatial Ability, Working Memory, Processing Speed, and Reaction Time — a profile that reflects the hierarchical structure of intelligence that CHC theory describes. Its technical manual documents all psychometric properties in detail. The test meets the standards for psychological testing established by AERA, APA, and NCME.

For anyone who is curious about where their cognitive abilities stand and wants a starting point without scheduling a clinical evaluation, the RIOT offers an option held to the same standards as the tests psychologists have used for decades. A free sample version is available to preview the test format before committing to a full administration.

Sources

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. AERA.
National Institute of Environmental Health Sciences. (n.d.). Principles for evaluating psychometric tests. NIEHS Report on Evaluating Features and Application of Neurodevelopmental Tests in Epidemiological Studies.
Wechsler, D. (2008). Wechsler Adult Intelligence Scale — fourth edition: Technical and interpretive manual. Pearson.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274.
Strenze, T. (2007). Intelligence and socioeconomic success: A meta-analytic review of longitudinal research. Intelligence, 35(5), 401–426.
Gottfredson, L. S., & Deary, I. J. (2004). Intelligence predicts health and longevity, but why? Current Directions in Psychological Science, 13(1), 1–4.
Richardson, K., & Norgate, S. H. (2015). Does IQ really predict job performance? Applied Developmental Science, 19(3), 153–169.
Warne, R. T. (2021). In the know: Debunking 35 myths about human intelligence. Cambridge University Press.
Nisbett, R. E., Aronson, J., Blair, C., Dickens, W., Flynn, J., Halpern, D. F., & Turkheimer, E. (2012). Intelligence: New findings and theoretical developments. American Psychologist, 67(2), 130–159.
Deary, I. J., Strand, S., Smith, P., & Fernandes, C. (2007). Intelligence and educational achievement. Intelligence, 35(1), 13–21.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge University Press.
McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37(1), 1–10.
Warne, R. T. (2025). Technical manual for the Reasoning and Intelligence Online Test, version 1.0. RIOT IQ.
Hülsheger, U. R., Maier, G. W., & Stumpp, T. (2007). Validity of general mental ability for the prediction of job performance and training success in Germany. International Journal of Selection and Assessment, 15(1), 3–18.