Can Amateur IQ Tests Give Accurate Scores?


Dr. Russell T. Warne, Chief Scientist


Creating an IQ test looks deceptively simple from the outside. Write some questions, put them on a website, and assign a score at the end. But the gap between a collection of questions and a professionally developed IQ test is enormous — comparable to the gap between a homemade stethoscope and an FDA-approved medical device.

As someone who spent over 15 years studying intelligence before creating a professional online IQ test, I know firsthand how much work goes into developing a test that produces meaningful, trustworthy scores. The process is difficult, expensive, and requires years of specialized training. That is precisely why amateur tests — no matter how well-intentioned — cannot produce accurate IQ scores. This article explains why, using a specific example that illustrates the most common problems.

What Makes a Test "Professional"?

Psychometrics is the science of psychological measurement, and it is the discipline that governs how intelligence tests are created, evaluated, and used. A professional IQ test is not defined by its appearance, its marketing, or even its questions. It is defined by the scientific process behind it. That process is outlined in the Standards for Educational and Psychological Testing, published jointly by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education [1].

The Standards describe the expectations that psychological tests should meet before they are used to make decisions about people. These expectations cover validity, reliability, fairness, test documentation, and appropriate use. Professional test creators spend years learning how to meet these expectations. Non-professionals are typically unaware they exist.

A Case Study: cognitivemetrics.com

To make these concerns concrete, consider cognitivemetrics.com, a website that markets itself as offering "professional-grade IQ tests" that are "professionally developed and normed on millions of testees." The site hosts multiple tests and claims its assessments "load on intelligence at a comparable level to professional tests such as the Wechsler Adult Intelligence Scales and Stanford-Binet Intelligence Scales." These are bold claims. Examining them against the standards of professional test development reveals significant problems.

Who Created These Tests?

The site does not identify any psychometrician, psychologist, or named professional as responsible for the tests. The CAIT (Comprehensive Adult Intelligence Test) was created by a Reddit user known as u/EqusB. The CORE (Comprehensive Online Reasoning Exam) is described on the site as "developed by the r/cognitiveTesting community." The AGCT is described as a "restored" version of a World War II-era military test. The site's own tagline is "Community Trusted — Developed by the r/cognitiveTesting community."

None of these origins involve a professional test creator. The Standards for Educational and Psychological Testing expect that tests are developed by qualified professionals whose expertise can be independently verified [1]. When a test is attributed to a Reddit username or an online community, there is no way to verify training in psychometrics, no professional license at risk if scores are misleading, and no ethical code being followed. Legitimate test creators are proud to attach their real names and credentials to their work. Anonymity prevents accountability.

The Problem With "Restoring" Old Tests

The AGCT is presented as a major selling point of the site. The marketing describes it as "professionally developed by the US Army" and normed on "more than 12 million soldiers." This framing implies that the AGCT on cognitivemetrics.com carries the scientific weight of the original Army test. It does not.

The original AGCT was indeed developed by the U.S. Army during World War II as a classification tool for military recruits. It was a legitimate instrument created under professional oversight for a specific population and purpose. But the version on cognitivemetrics.com is a reproduction of the test questions hosted on a website, with the original score distribution "re-normalized by correcting for skew" — adjustments made by unnamed community members, not by the U.S. Army or a licensed psychometrician.

The claim that the test was normed on "12 million soldiers" is particularly misleading. Those 12 million soldiers were tested by the U.S. Army in the 1940s under controlled military conditions. That norm data belongs to the U.S. Army. It was collected from a specific population (drafted soldiers, overwhelmingly male, predominantly aged 18–35) in a specific era. Claiming those norms as the basis for scoring a different version of the test, administered online to a self-selected civilian population 80 years later, is not scientifically justifiable.

The Flynn effect compounds this problem [2]. Average performance on IQ tests rose approximately 3 points per decade throughout the 20th century. Norms from the 1940s are dramatically outdated for scoring modern test-takers, regardless of statistical adjustments applied. The site claims "there are absolutely no Flynn effects for this test," but this claim is based on a single 1980 comparison with the ASVAB and a validation study of 58 community members with self-reported professional test scores — not on rigorous, peer-reviewed research.
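The size of the problem can be sketched with a few lines of arithmetic. This is an illustration, not a correction procedure. It assumes a constant gain of 3 points per decade (the 20th-century average cited above), while real gains vary by test, country, and era. The norming year of 1943 and the function names are hypothetical choices made for the example.

```python
# Rough sketch: how outdated norms inflate IQ scores for modern examinees.
# ASSUMPTION: a constant Flynn-effect gain of ~3 points per decade. Real gains
# vary by test, country, and era, so the output is illustrative only.

POINTS_PER_DECADE = 3.0  # assumed average gain, not a measured value for the AGCT


def expected_norm_inflation(norm_year: int, test_year: int,
                            points_per_decade: float = POINTS_PER_DECADE) -> float:
    """Approximate points by which old norms overstate a modern examinee's IQ."""
    decades_elapsed = (test_year - norm_year) / 10
    return decades_elapsed * points_per_decade


def adjusted_iq(observed_iq: float, norm_year: int, test_year: int) -> float:
    """Subtract the assumed Flynn gain from a score computed against old norms."""
    return observed_iq - expected_norm_inflation(norm_year, test_year)


if __name__ == "__main__":
    # Hypothetical example: norms gathered in 1943, test taken in 2024.
    inflation = expected_norm_inflation(1943, 2024)
    print(f"Expected inflation: {inflation:.1f} points")
    print(f"A reported 120 corresponds to about {adjusted_iq(120, 1943, 2024):.1f}")
```

Under these assumptions, norms gathered in 1943 overstate a 2024 examinee's score by roughly 24 points, so a reported 120 would correspond to something closer to 96. The exact figure matters less than the order of magnitude.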

Self-Selected Samples Are Not Norm Samples

The CAIT was normed on approximately 1,692 filtered first-time attempts from English-speaking countries. The AGCT's online validation data comes from 1,734 test-takers with a mean score of 121.7 — nearly 1.5 standard deviations above the population average. These are not representative norm samples. They are convenience samples of self-selected individuals who sought out IQ tests on the internet, many of whom are members of online communities dedicated to cognitive testing.

A sample with a mean IQ of 121.7 is not representative of the general population, where the mean is defined as 100. When a new test-taker's performance is compared to a norm group like this one, whose average is treated as 100, their score will be systematically deflated: they will appear less intelligent than they actually are. This is a fundamental flaw in the scoring system that affects every score the test produces.
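The deflation can be quantified with a simple z-score calculation. The sketch below assumes the test is linearly scaled to the IQ metric (mean 100, SD 15) and that the self-selected group has the same spread as the general population. Both are simplifications, and a narrower spread would distort scores further. The 121.7 figure is the AGCT validation mean cited above, used here only as an illustrative norm-group average; the function names are hypothetical.

```python
# Illustrative sketch: converting scores against a norm group that is brighter
# than the general population. ASSUMPTIONS: linear scaling to the IQ metric
# (mean 100, SD 15) and a norm group with the same SD as the population.

IQ_MEAN, IQ_SD = 100.0, 15.0


def z_score(x: float, mean: float, sd: float) -> float:
    """Standard score of x relative to a distribution with the given mean and SD."""
    return (x - mean) / sd


def iq_from_norm_sample(true_iq: float, sample_mean_iq: float,
                        sample_sd_iq: float = IQ_SD) -> float:
    """IQ a person would receive if the norm group's average were scored as 100.

    true_iq is the person's standing in the general population; sample_mean_iq
    is the norm group's average expressed on the population scale.
    """
    z_within_sample = z_score(true_iq, sample_mean_iq, sample_sd_iq)
    return IQ_MEAN + IQ_SD * z_within_sample


if __name__ == "__main__":
    sample_mean = 121.7  # illustrative norm-group mean (AGCT validation data)
    print(f"Group sits {z_score(sample_mean, IQ_MEAN, IQ_SD):.2f} SD above average")
    for true_iq in (85, 100, 115, 130):
        print(true_iq, "->", round(iq_from_norm_sample(true_iq, sample_mean), 1))
```

Under these assumptions, a person of exactly average ability (IQ 100) would be scored at about 78, and every other examinee would be shifted down by the same 21.7 points.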

Professional IQ tests avoid this problem by actively recruiting norm samples that match the target population's demographics. The Wechsler Adult Intelligence Scale, for example, was normed on over 2,000 adults stratified to match the U.S. Census on age, sex, race/ethnicity, education level, and geographic region [3].
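The basic idea of matching a norm sample to census targets can be illustrated with proportional quota allocation. This is a minimal sketch, not the procedure used for any particular test: the strata, their shares, and the total of 2,200 are hypothetical placeholders rather than real census or WAIS figures, and actual norming studies cross several demographic variables at once.

```python
# Minimal sketch of proportional (quota) allocation for a norm sample.
# The strata and shares used below are HYPOTHETICAL placeholders, not
# actual U.S. Census figures.

from typing import Dict


def allocate_quota(total_n: int, population_shares: Dict[str, float]) -> Dict[str, int]:
    """Split total_n across strata in proportion to their population shares.

    Uses the largest-remainder method so the quotas sum exactly to total_n.
    """
    if abs(sum(population_shares.values()) - 1.0) > 1e-6:
        raise ValueError("population shares must sum to 1")
    raw = {s: total_n * p for s, p in population_shares.items()}
    quotas = {s: int(v) for s, v in raw.items()}  # floor of each target
    leftover = total_n - sum(quotas.values())
    # Hand any remaining slots to the strata with the largest fractional parts.
    for s in sorted(raw, key=lambda k: raw[k] - quotas[k], reverse=True)[:leftover]:
        quotas[s] += 1
    return quotas


if __name__ == "__main__":
    # Hypothetical education-level shares for an adult population.
    shares = {
        "less than high school": 0.10,
        "high school": 0.28,
        "some college": 0.28,
        "bachelor's or higher": 0.34,
    }
    print(allocate_quota(2200, shares))
```

Recruitment then continues until each cell is filled, so the finished sample mirrors the population rather than whoever happened to volunteer. A convenience sample has no such quotas.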

"Comparable to the WAIS and Stanford-Binet"?

The site claims its tests "have been proven to load on intelligence at a comparable level to professional tests such as the Wechsler Adult Intelligence Scales and Stanford-Binet Intelligence Scales." The AGCT's reported g-loading of .925 is calculated from the convenience sample described above, with corrections for range restriction and Spearman's law of diminishing returns (SLODR). These are standard psychometric procedures, but they are only as good as the data they are applied to. When the sample consists of self-selected IQ-testing enthusiasts, the assumptions underlying range restriction corrections are difficult to verify.
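To see why those assumptions matter, consider one common correction for direct range restriction, Thorndike's Case II formula. The site does not publish its exact procedure, so this is a stand-in rather than a reconstruction of its method, and treating a g-loading like an ordinary correlation is itself a simplification. The key input is u, the assumed ratio of the population standard deviation to the sample standard deviation; the observed value of 0.60 is hypothetical.

```python
# Sketch of Thorndike's Case II correction for direct range restriction.
# ASSUMPTION: used here as a stand-in; it is not the site's documented method.

import math


def correct_range_restriction(r_observed: float, u: float) -> float:
    """Return the correlation corrected for direct range restriction.

    u is the ratio SD_population / SD_sample (u > 1 means the sample is
    more homogeneous than the population it is meant to represent).
    """
    if not -1.0 < r_observed < 1.0:
        raise ValueError("r_observed must lie strictly between -1 and 1")
    if u <= 0:
        raise ValueError("u must be positive")
    return r_observed * u / math.sqrt(1 + r_observed**2 * (u**2 - 1))


if __name__ == "__main__":
    r = 0.60  # hypothetical observed value in a restricted sample
    for u in (1.0, 1.2, 1.5, 2.0):
        print(f"u = {u:.1f}: corrected r = {correct_range_restriction(r, u):.3f}")
```

With these inputs, the same observed 0.60 becomes 0.669, 0.747, or 0.832 as u rises from 1.2 to 2.0. When the population's spread relative to a self-selected sample is unknown, the corrected figure depends heavily on the analyst's choice of u.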

More fundamentally, a high g-loading does not make a test "comparable" to the WAIS or Stanford-Binet. Those tests have book-length technical manuals, representative norm samples of thousands, decades of independent peer-reviewed research, established reliability and validity data, comprehensive bias screening, and named professional creators with verifiable credentials. A high g-loading calculated from a convenience sample does not substitute for any of these things.

No Technical Manual

Professional IQ tests are accompanied by book-length technical manuals that document every aspect of the test's development [4]. cognitivemetrics.com does not have a technical manual for any of its tests. The methodology page consists of a few sentences about monitoring for cheating and "regularly renorming." The AGCT has a wiki page with some validation statistics, but a wiki page written by unnamed contributors is not a technical manual — it has not been reviewed by outside experts and does not document the full development process.

The CAIT's Own Disclaimer

It is worth noting that the CAIT's own documentation includes a disclaimer: "The CAIT is not a substitute for a professional IQ test. Scores obtained using the CAIT, if taken correctly, are designed to give an accurate estimation of FSIQ. However, the CAIT is not a diagnostic tool and cannot be used in any capacity other than as an informative tool."

This disclaimer is appropriate — the CAIT creator is being transparent about the test's limitations. But it stands in tension with the way cognitivemetrics.com markets the CAIT alongside other tests under the banner of "professional-grade IQ tests."

Putting It All Together

cognitivemetrics.com serves as an instructive example of how amateur tests are presented as professional instruments. The site makes claims of professional-grade quality, comparability to the WAIS and Stanford-Binet, and norming on millions — but the tests are created by anonymous community members, lack technical manuals, rely on convenience samples or decades-old military data, and have not been used in peer-reviewed research.

The people behind the site may be genuinely passionate about cognitive testing. But examinees deserve to know the difference between a community-built quiz that may provide a rough cognitive exercise and a professionally developed instrument that produces scientifically meaningful IQ scores.

Take the First Ever Professional Online IQ Test

The Reasoning and Intelligence Online Test (RIOT) is the first online IQ test that actually meets professional standards for psychological assessment. It was created by Dr. Russell Warne, who has over 15 years of experience in intelligence research.

What makes the RIOT different from the countless online IQ tests found with a quick internet search? Most of those tests are created by amateurs without proper training in psychometrics. The RIOT clearly stands out as the first-ever professional online IQ test. The RIOT underwent the same rigorous development process as traditional in-person IQ tests used by psychologists, including expert review, the first-ever proper U.S.-based online norm sample, and compliance with educational and psychological testing standards from APA, AERA, and NCME. The RIOT reports not just an overall IQ score but six index scores — Verbal Reasoning, Fluid Reasoning, Spatial Ability, Working Memory, Processing Speed, and Reaction Time — providing a comprehensive picture of cognitive strengths and areas for growth.

References

[1] American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. AERA. https://www.testingstandards.net/

[2] Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932 to 1978. Psychological Bulletin, 95(1), 29–51. https://doi.org/10.1037/0033-2909.95.1.29

[3] Warne, R. T. (2020). In the know: Debunking 35 myths about human intelligence. Cambridge University Press. https://doi.org/10.1017/9781108593298

[4] Wechsler, D. (2008). Wechsler Adult Intelligence Scale—Fourth edition: Technical and interpretive manual. Pearson.

[5] Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge University Press.

[6] Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15(2), 201–293. https://doi.org/10.2307/1412107

[7] American Psychological Association. (2017). Ethical principles of psychologists and code of conduct. https://www.apa.org/ethics/code

[8] Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274. https://doi.org/10.1037/0033-2909.124.2.262

[9] Warne, R. T. (2025). Technical manual for the Reasoning and Intelligence Online Test, version 1.0. Riot IQ.

[10] International Test Commission. (2017). ITC guidelines for the large-scale assessment of linguistically and culturally diverse populations. https://www.intestcom.org/page/28