In the 1940s, Smith and Tuddenham independently reported that IQ test scores were rising over time.  The phenomenon drew little attention at the time and lay dormant until Lynn and then Flynn independently rediscovered and reported it in the early 1980s.  When The Bell Curve was published, Herrnstein and Murray named the secular rise the “Flynn Effect.”  Since then the Flynn Effect (FE) has been the subject of a great deal of study and speculation, but it has remained elusive and enigmatic.

Characteristics

The FE has been seen as a rise in IQ test raw scores at various rates in virtually every corner of the world.  In the U.S. the usually cited effect size is 3 points per decade.  In Estonia, the gains have been about 1.65 points per decade, but have accelerated in more recent years.  Japan and Korea have experienced rates around 7.7 points per decade, but the Korean gains were delayed by almost 30 years.

Numerous studies of the FE have found that score gains were larger in the lower half of the IQ distribution.  For example, this was reported for Denmark, Britain, Turkey and Spain.  These observations have led to explanations related to improvements in nutrition and education.  But some countries have shown greater FE gains in the upper half: Brazil and the U.S. (in the National Longitudinal Survey of Youth data).  Using data sets of about 1.7 million scores, Wai showed that the FE was present in the top 5% of the U.S. IQ distribution.

Some studies have shown no FE gains; Cotton found none in Australian children.  Scandinavian countries have shown rapid FE gains followed by an end to the gains, and by reversal (negative FE) in some cases.

Various studies have shown that the full amount of score gains has been observed in children from age 4 to 6.  Likewise studies of developmental quotients (DQ) have shown gains that are similar to IQ gains.  [DQs reflect rates of maturation--hold up head, sit up, stand, walk, jump, etc.  They are measured by the Griffiths Test and Bayley Mental Scales.]

A large number of studies have reported that FE gains were greater on abstract test items than on scholastic items.  This can be stated as a bias towards tests of fluid intelligence and away from crystallized intelligence.  The highly abstract Raven’s Progressive Matrices tests (Color, Standard, and Advanced) have shown strong FE gains.  The Wechsler has shown FE gains that are almost as large as those for the Raven tests, and the gains on its performance component are almost twice those on the verbal component.

The Raven tests have shown gains of 18-20 IQ points per generation in many industrialized countries. Dutch gains were 21 points over 30 years.  Urban Chinese gained 22 points between 1936 and 1986.
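To make these figures comparable with the per-decade rates quoted earlier, the totals can be converted to points per decade.  The following is a minimal sketch of that arithmetic only; the function name is illustrative, and the inputs are the Dutch and urban Chinese figures given above.

```python
def points_per_decade(total_gain: float, years: float) -> float:
    """Convert a total score gain over a span of years to points per decade."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_gain * 10 / years

# Dutch Raven gains: 21 points over 30 years.
dutch_rate = points_per_decade(21, 30)
# Urban Chinese gains: 22 points between 1936 and 1986 (50 years).
chinese_rate = points_per_decade(22, 1986 - 1936)

print(f"Dutch: {dutch_rate:.1f} points per decade")
print(f"Urban Chinese: {chinese_rate:.1f} points per decade")
```

Both rates come out above the 3 points per decade usually cited for the U.S.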

At the same time that IQ scores have been rising, academic performance has been declining in the US and Britain.  SAT scores have been declining, even after correction for the change in demographics of those taking the test.  A quarter of the decline remains after the correction.

Hypothetical causes

Among the causes that have been proposed to explain the FE are these:

• Education
• Increased exposure to testing
• Exposure to artificial light
• Nutrition
• Decreased family size
• Heterosis
• More complex visual environment
• Child rearing practices
• The use of Classical Test Theory versus Item Response Theory

 

Education

It is possible that improved education has accounted for some test score gains, although such gains would most likely have no g loading.  The finding that FE gains are seen in preschool children (at the same magnitude as seen in adults) suggests that education is not a primary cause of the FE.

Increased exposure to testing

Two mechanisms have been proposed.  1) Brand suggested that the use of timed tests has caused students to work faster by guessing more frequently on multiple-choice items.  While this may be a factor, FE gains are also seen on tests that are untimed and on tests that do not use multiple choice.  2) Jensen mentioned “increasing test wiseness from more frequent use of tests.”  His point was that frequent testing may raise test scores in the same way that retesting does.  This is the same process that is observed with learning, and it shows up in situations where test training has been used (as is common with the SAT).  Both Brand’s and Jensen’s mechanisms would presumably cause test scores to increase without showing gains on g.

Nutrition and medical care

DQs have gained 3.7 points per decade, while IQ gains of 3.9 points per decade have been seen in preschool children (age 4-6).  These gains, and the gains in the lower part of the IQ distribution, are consistent with the argument that improved nutrition has contributed to the FE.  Other observations point the same way: birth weight has increased; head size measured in 1 year olds increased by about 1.5 cm from 1930 to 1985 [head size to brain size correlation = 0.80]; and height has increased by about 1 SD over 50 years (similar to DQ gains).
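The comparison between DQ gains and height gains can be checked by expressing the per-decade rates in SD units over the same 50 years.  This is only a rough sketch: the SD of 15 points is an assumption (the text does not state the scale's SD), and the function name is illustrative.

```python
ASSUMED_SD = 15.0  # assumption: SD of the DQ and IQ scales, not stated in the text

def cumulative_gain_in_sd(points_per_decade: float, years: float,
                          sd: float = ASSUMED_SD) -> float:
    """Total gain over `years`, expressed in standard deviation units."""
    return points_per_decade * (years / 10.0) / sd

dq_gain_sd = cumulative_gain_in_sd(3.7, 50)  # DQ rate quoted above
iq_gain_sd = cumulative_gain_in_sd(3.9, 50)  # preschool IQ rate quoted above

print(f"DQ gain over 50 years: {dq_gain_sd:.2f} SD")
print(f"Preschool IQ gain over 50 years: {iq_gain_sd:.2f} SD")
```

Under that assumption both gains are somewhat more than 1 SD over 50 years, the same order of magnitude as the height gain.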

Arguments against nutrition as a cause include the following: studies of nutrition have shown that neither vitamins nor supplements have had any impact on intelligence; nutrition is unlikely to have declined over the past 20 years in those countries that have a negative FE, and height did not decline in those countries; and, in contrast to the intelligence gains seen in Norway, height gains from 1969 to 2002 were mostly in the upper half of the intelligence range.

Exposure to artificial light

Artificial light stimulates the pineal gland in animals.  The pineal gland appears to play a major role in sexual development, hibernation, metabolism, and seasonal breeding.  Poultry farmers use its growth-stimulating effect to increase their output.  There do not seem to be any data on whether this effect happens in humans, but the speculation is that it might.  There has been an obvious increase in the use of electric lighting by humans over much of the time that the FE has been observed.

Decreasing family size

Low IQ people statistically have more children than high IQ people.  Given the high heritability of intelligence, this is a source of dysgenic pressure.  If the average family size decreases, the reduced number of low IQ children should produce a net increase in the mean, which would show up as an FE gain.

In a very large study of Norwegian conscripts, the previously debated birth order effect (later-born children scoring slightly lower on average) was shown to be real, although not large.  If family size is declining in various groups, there must be a positive contribution to mean IQ, because fewer later-born, and on average slightly lower-scoring, children are being born.

Heterosis

Mingroni has argued that since the effects of the environment on intelligence are so small, the possibility of a genetic effect such as heterosis (hybrid vigor resulting from increased outbreeding) should be investigated.  Lynn argued that heterosis is unlikely for three reasons:
1 – There was little immigration in Europe before 1950 (the FE was present before that date).
2 – The FE for IQs and DQs are just as large in Europe as in other places.
3 – Studies of heterosis have shown little positive effect on IQ.

Perhaps the most important consideration in determining whether there is a heterosis effect was pointed out by Mingroni: if the FE is found within families, the cause is not genetic.  The FE has in fact been shown to exist within families (conscripts in Norway), which argues against heterosis as the cause.

Enriched visual environment

Greenfield and others suggested that the FE gains are caused by the ever increasing shift from verbal communication to visual and interactive media.  This is seen globally in the increased presence of movies, television, photography, video games, computers, puzzles, mazes, exploded views, etc.

The mechanism for this hypothesis is that the shift towards visual representations removes some of the novelty from tests, especially the culture reduced tests, which have shown about double the FE gains found in other tests.  This is particularly convincing for tests such as the Raven’s, which presents abstract figures in a matrix.  Several decades ago these figures may have been more baffling than they are today.

Child rearing practices

The FE has been seen throughout the world, in both developed and undeveloped countries where child rearing practices certainly vary greatly.  It is unlikely that this hypothesis is a significant factor, not only because of the cultural variation in child rearing practices, but also because the shared environment has essentially no impact on adult intelligence (per prior discussion).

Classical Test Theory (CTT) versus Item Response Theory (IRT)

Most studies in the literature are based on CTT and are presented without passing along the test item data.  This practice hides some of the information that could be extracted from a data set.  Test scores are given, but the latent constructs they are designed to measure cannot be examined.  IRT allows the researcher to examine the changes in underlying latent ability.  Thus, CTT can show differences in scores, even when there is no change in the latent variable.  An increase may be due to a general gain in real intelligence, or a decrease in the levels of difficulty of test items.
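The point that raw scores can rise with no change in the latent variable can be illustrated with a small calculation.  The sketch below assumes a Rasch (one-parameter logistic) model, which is only one of several IRT models, and every ability and difficulty value in it is hypothetical.  It holds the examinees' latent abilities fixed and makes each item slightly easier for the later cohort.

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch model: probability that ability `theta` answers an item of difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_raw_score(thetas, difficulties) -> float:
    """Mean expected number-correct score across a group of examinees."""
    total = 0.0
    for theta in thetas:
        total += sum(p_correct(theta, b) for b in difficulties)
    return total / len(thetas)

# Hypothetical latent abilities (in logits), identical for both cohorts.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Hypothetical item difficulties for the earlier cohort, and the same items
# after each has become 0.5 logits easier for the later cohort.
items_early = [-1.5, -0.5, 0.0, 0.5, 1.5]
items_late = [b - 0.5 for b in items_early]

early = expected_raw_score(abilities, items_early)
late = expected_raw_score(abilities, items_late)

print(f"Expected raw score, earlier cohort: {early:.2f}")
print(f"Expected raw score, later cohort:   {late:.2f}")
print(f"Raw-score gain with unchanged ability: {late - early:.2f}")
```

A CTT analysis that sees only the total scores would report a gain here, while an IRT analysis of the item data could attribute it to the change in item difficulty.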

Alex Beaujean’s results using CTT and IRT to measure FE gains:

Peabody Picture Vocabulary Test-Revised (PPVT-R)
CTT       0.44 points per year
IRT       0.06 points per year

Peabody Individual Achievement Test-Math (PIAT-M)
CTT       0.27 points per year
IRT       0.13 points per year

The results clearly show that the FE essentially vanishes for the PPVT-R when IRT is used.  The PIAT-M gains are cut roughly in half using IRT.  Ergo, the measured FE gains depend heavily on the methodology, leading to the concern that much of the literature has reported findings that might be quite different if IRT had been used.
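The size of the reduction can be made explicit by computing what fraction of each CTT gain remains under IRT.  This is a simple sketch using only the four rates quoted above; the variable and function names are illustrative.

```python
# Gains in points per year, as quoted above: (CTT rate, IRT rate).
rates = {
    "PPVT-R": (0.44, 0.06),
    "PIAT-M": (0.27, 0.13),
}

def share_remaining(ctt_rate: float, irt_rate: float) -> float:
    """Fraction of the CTT-estimated gain that survives under IRT."""
    return irt_rate / ctt_rate

shares = {test: share_remaining(ctt, irt) for test, (ctt, irt) in rates.items()}

for test, share in shares.items():
    print(f"{test}: {share:.0%} of the CTT gain remains under IRT")
```

About a seventh of the PPVT-R gain and just under half of the PIAT-M gain remain.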

 

Is the FE invariant?

Multigroup confirmatory factor analyses of several data sets showed that the tests were not factorially invariant across cohorts, meaning that FE gains were not gains on the latent variables that the tests were supposed to measure.  Besides providing insight as to the nature of the FE gains, the rejection of factorial invariance demonstrates that subtest score interpretations are necessarily different over time.

Real or hollow gains?

When Flynn began his study and reports on the secular gains, he gave numerous examples of how extreme the gains have been, questioning whether they could possibly be real.  For example, the large gains in The Netherlands would mean that, by 1982 standards, the Dutch mean IQ in 1952 would have been 79.  Flynn commented “Has the average person in The Netherlands ever been near mental retardation?”  “Does it make sense to assume that at one time almost 40% of Dutch men lacked the capacity to understand soccer, their most favored national sport?”  He noted that there are not more gifted Dutch school children now and that patented inventions have shown a sharp decline.  He presented a number of similar arguments, all of which questioned the possibility that such large changes could have happened and, therefore, suggested that the score gains must be meaningless.
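The figures in this example can be checked against a normal distribution.  The sketch below rests on two assumptions that are not stated in the text: an SD of 15 points, and a cutoff of IQ 75 as the level Flynn had in mind.  That cutoff was chosen here because it reproduces his "almost 40%".

```python
from statistics import NormalDist

LATER_MEAN = 100.0      # 1982 norms, by construction
DUTCH_GAIN = 21.0       # Dutch gain over 30 years, as quoted earlier
ASSUMED_SD = 15.0       # assumption: conventional IQ standard deviation
ASSUMED_CUTOFF = 75.0   # assumption: threshold implied by "almost 40%"

# Mean of the 1952 cohort when scored against 1982 norms.
earlier_mean = LATER_MEAN - DUTCH_GAIN

# Share of the 1952 cohort falling below the assumed cutoff.
share_below = NormalDist(mu=earlier_mean, sigma=ASSUMED_SD).cdf(ASSUMED_CUTOFF)

print(f"1952 mean on 1982 norms: {earlier_mean:.0f}")
print(f"Share below IQ {ASSUMED_CUTOFF:.0f}: {share_below:.1%}")
```

With these assumptions the 1952 mean is 79 and a little under 40% of the cohort falls below the cutoff, in line with Flynn's rhetorical question.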

If the secular gains are real, they must show a g loading (this is called a Jensen Effect).  Numerous studies of the g loading of FE gains have shown that the gains were not on g.  The usual test for a Jensen Effect is the use of the method of correlated vectors.  When applied to data showing a FE, it has not shown a Jensen Effect.
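In its simplest form, the method of correlated vectors correlates the vector of subtest g loadings with the vector of subtest score gains.  The sketch below shows only that core step, using a Pearson correlation; published analyses typically add refinements such as corrections for subtest reliability.  All numbers are hypothetical and are there only to show the calculation.

```python
import math

def pearson(x, y) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for five subtests (not taken from any study).
g_loadings = [0.80, 0.75, 0.65, 0.60, 0.50]   # each subtest's loading on g
gains_in_sd = [0.20, 0.25, 0.40, 0.35, 0.55]  # secular gain on each subtest

r = pearson(g_loadings, gains_in_sd)
print(f"Correlation between g loadings and gains: {r:.2f}")
```

A clearly positive correlation would count as a Jensen Effect.  In this made-up example the largest gains fall on the least g-loaded subtests, so the correlation is negative.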

Rushton used principal components analysis to show that data exhibiting an FE form a cluster, thereby indicating that it is a real effect.  But the cluster does not overlap with the clusters formed by racial differences, inbreeding depression scores (purely genetic), and g factor loadings (largely genetic).  The secular increase is, therefore, unrelated to g and other heritable measures.

As with virtually every aspect of the FE, different data sets produce different results.  Colom found that tests of crystallized intelligence did not show gains in g, but tests of fluid intelligence did.

Predictive bias

Jensen commented that the definitive test of whether FE gains are hollow or not is to apply the predictive bias test.  This means that two points in time would be compared on the basis of an external criterion (real world measurement, such as school grades).  If the gains are hollow, the later time point would show underprediction, relative to the earlier time. This assumes that the later group has not been renormed. [Earlier IQ points would exceed the performance of the later generation for the same IQ.] In actual practice tests are periodically renormed so that the mean remains at 100.  The result of this recentering is that the tests maintain their predictive validity, indicating that the FE gains are indeed hollow.  If the gains were real and the tests were renormed, people at a given IQ would be getting smarter and this would show up in the predictive validity.

Summary

• The FE exists between birth cohorts.
• It is found within sibships.
• It appears early in life (before school age).
• There are presumably multiple causes.
• The gains are all or mostly hollow (not Jensen Effects).
• There are serious methodological issues to be resolved, and these may be a major cause of the gains.
• The FE is not invariant over time.

The foregoing review is a greatly condensed version of my paper, Understanding the Flynn Effect.  It contains more detailed discussions of all of the points mentioned above, with identification of the researchers and their papers, as they apply to each topic.  A full reference list is included.  Anyone wishing to read the full paper may find it here:

https://sites.google.com/site/thirdstratum/papers-1

Bob Williams
