The TLAP-R (Jouve, 2013a) is an untimed, non-verbal reasoning test built from perceptual material. It consists of 8 matrices, each comprising 6 lines of 15 patterns, of which 4 have been deleted. The examinee is asked to find the 4 missing patterns. The first version of the test, limited to 5 matrices, was prepared by Xavier Jouve in mid-2000 for a research project.
The cognitive process chiefly involved depends mostly on fluid intelligence, and the test requires minimal prior knowledge. The material used to build the matrices is numerical but not heavily mathematical, so the TLAP-R is suitable for assessing anyone with only basic arithmetic knowledge.
To complete the matrices, that is, to find which patterns are missing, the examinee must reason inductively. The task requires inferring a specific governing rule from a seemingly chaotic arrangement. Spearman (1927) described this process and considered it the requisite for the eductive part of g. However, the TLAP-R showed a strong relationship with crystallized intelligence even without measuring it directly (Jouve, 2013b).
Item Response Theory (IRT) relates characteristics of items (item parameters) and characteristics of individuals (latent traits) to the probability of a positive response. A variety of IRT models can be applied to dichotomous as well as polytomous data. In each case, the probability of answering correctly or endorsing a particular response category can be represented graphically by an item response function (IRF). This function represents the non-linear regression of a response probability on a latent trait, such as inductive reasoning, spatial ability, verbal ability, or any other psychological factor (Hulin, Drasgow, & Parsons, 1983).
In the Two-Parameter Logistic Model (2PLM) as expressed by Birnbaum (1968), the first parameter is the discrimination parameter, denoted a, which is proportional to the slope of the item characteristic curve (ICC) at the point b on the ability scale; b is the second parameter, the difficulty parameter. This model generalizes the One-Parameter Logistic Model (1PLM), first introduced by Rasch (1960), which takes into account only item difficulty, as given by the b-parameter.
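As a minimal sketch of the two models just described, the response probability can be written in a few lines of Python. The function names `irf_2pl` and `irf_1pl`, the example parameter values, and the omission of any scaling constant are assumptions made for illustration; none of them comes from the TLAP-R analysis itself.

```python
import math

def irf_2pl(theta, a, b):
    """Probability of a correct response under the 2PLM.

    theta: examinee ability; a: discrimination; b: difficulty.
    No scaling constant is applied (an assumption of this sketch).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irf_1pl(theta, b):
    """1PLM: the 2PLM with the discrimination fixed at 1."""
    return irf_2pl(theta, 1.0, b)

# Hypothetical item with a = 1.5 and b = 0.5 (illustrative values only).
for theta in (-2.0, 0.5, 2.0):
    print(theta, round(irf_2pl(theta, a=1.5, b=0.5), 3))
```

When ability equals the difficulty parameter (theta = b), the probability of a correct response is .5, which is why b locates the item on the ability scale.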
There is also a model that includes a third parameter for guessing, the c-parameter. However, according to Hambleton, Swaminathan, and Rogers (1991), the Three-Parameter Logistic Model (3PLM) is properly applied only to multiple-choice items, in which the examinee chooses among given alternatives. For free-response items like those of the TLAP-R, in which the examinee must produce an answer of their own, the assumption of no guessing is plausible, and the model therefore does not require the c-parameter.
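The role of the c-parameter can be illustrated with a short sketch. The function name `irf_3pl` and the value c = .25 (as for a hypothetical four-option multiple-choice item) are assumptions for illustration only; setting c to 0 recovers the two-parameter form.

```python
import math

def irf_3pl(theta, a, b, c=0.0):
    """3PLM: c is the lower asymptote (pseudo-guessing parameter).

    With c = 0 the function reduces to the 2PLM.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical low-ability examinee (theta = -4) on an item with a = 1, b = 0.
p_free_response = irf_3pl(-4.0, a=1.0, b=0.0)             # c = 0: close to 0
p_multiple_choice = irf_3pl(-4.0, a=1.0, b=0.0, c=0.25)   # close to .25
print(round(p_free_response, 3), round(p_multiple_choice, 3))
```

With a non-zero c, even a very low-ability examinee keeps a success probability near c; without it, that probability tends to 0, which is the behavior assumed for free-response items.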
Participants. The analyses in this investigation are based on data collected from 1,003 respondents over a two-year period, between 2000 and 2002. The mean age of the respondents was 24.43 years (SD = 8.11). The sample comprised 45% females and 53.8% males; gender data were missing for 1.2%.
Method. The characteristic curves, both the Item Empirical Functions (IEF) and the Item Response Functions (IRF), were drawn for each of the 48 TLAP-R matrix lines and are shown in Figures 1 and 2, respectively. These functions could eventually be used to select, discard, or revise questions that do not meet psychometric standards. Items with steeper slopes are preferable because they discriminate better among examinees and separate them into different ability levels. A high a-parameter value produces such a steep slope.
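The link between the a-parameter and the steepness of the curve can be checked numerically. The sketch below assumes the logistic two-parameter form without a scaling constant, for which the slope at theta = b equals a/4; the helper names and the three a values are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """2PLM response probability (no scaling constant; an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def slope_at(theta, a, b, h=1e-5):
    """Central-difference approximation of the IRF slope at theta."""
    return (irf_2pl(theta + h, a, b) - irf_2pl(theta - h, a, b)) / (2.0 * h)

# Three hypothetical items with the same difficulty and increasing discrimination.
b = 0.0
for a in (0.5, 1.0, 2.0):
    print(f"a = {a}: slope at b = {slope_at(b, a, b):.4f}, a/4 = {a / 4:.4f}")
```

Doubling a doubles the slope at the difficulty point, so items with larger a separate examinees just below b from those just above b more sharply.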
The parameters were estimated with the Normal Ogive by Harmonic Analysis Robust Method (NOHARM; Fraser, 1986; Fraser & McDonald, 1988). This estimation procedure is a variant of the one described in McDonald (1982, 1985). In general terms, it is close to that of Christoffersson (1975), the main difference being that it uses ordinary least squares where Christoffersson uses generalized least squares (McDonald, 1997). NOHARM has been reported to yield stable parameter estimates (Ackerman, 1988; Miller, 1991).
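Because NOHARM fits a normal ogive model whereas the 2PLM is logistic, it may help to see how close the two forms are. The sketch below does not reproduce NOHARM's estimation procedure; it only compares the two curves for one hypothetical item (a = 1, b = 0), using the conventional scaling constant D = 1.702, which is an assumption of this example and is not stated in the text.

```python
import math
from statistics import NormalDist

D = 1.702  # conventional logistic-to-normal scaling constant (assumed here)

def normal_ogive(theta, a, b):
    """Normal ogive IRF: standard normal CDF evaluated at a * (theta - b)."""
    return NormalDist().cdf(a * (theta - b))

def logistic(theta, a, b, d=D):
    """Logistic IRF with scaling constant d."""
    return 1.0 / (1.0 + math.exp(-d * a * (theta - b)))

# Largest gap between the two curves over a fine grid from -4 to 4.
grid = [i / 100 for i in range(-400, 401)]
max_gap = max(abs(normal_ogive(t, 1.0, 0.0) - logistic(t, 1.0, 0.0)) for t in grid)
print(f"maximum absolute difference: {max_gap:.4f}")
```

The maximum difference stays below .01 across the grid, so parameters expressed in the normal ogive metric can be read against a logistic curve with little loss once the scaling constant is applied.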
The quadrature nodes and weights for the empirical values were calculated following the Levine and Drasgow (1988) method. On the one hand, the theta scale of the IEF was divided into twenty-five points corresponding to the equally spaced percentile ranks from 2 to 98: -2.05, -1.56, -1.29, -1.08, -.92, -.77, -.64, -.52, -.41, -.31, -.20, -.10, 0, .10, .20, .31, .41, .52, .64, .77, .92, 1.08, 1.29, 1.56 and 2.05. On the other hand, the IRF theta scale, which represents ability, was divided into sixteen equal intervals of width .5, ranging from -4 to 4. The midpoints of the intervals were -3.75, -3.25, -2.75, -2.25, -1.75, -1.25, -.75, -.25, .25, .75, 1.25, 1.75, 2.25, 2.75, 3.25 and 3.75.
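The two theta grids described above can be rebuilt with the Python standard library. This is a sketch under two assumptions: that the percentile ranks are spaced 4 apart (2, 6, ..., 98, which gives twenty-five points) and that they are mapped to theta through the standard normal inverse CDF. It does not implement the Levine and Drasgow weighting itself, and the variable names are hypothetical.

```python
from statistics import NormalDist

# IEF grid: percentile ranks 2, 6, ..., 98 mapped to standard normal quantiles.
percentile_ranks = list(range(2, 99, 4))
ief_theta = [round(NormalDist().inv_cdf(p / 100), 2) for p in percentile_ranks]

# IRF grid: sixteen intervals of width .5 from -4 to 4, represented by midpoints.
irf_theta = [-4 + 0.5 * k + 0.25 for k in range(16)]

print(len(ief_theta), ief_theta)
print(len(irf_theta), irf_theta)
```

Under these assumptions the computed quantiles agree with the twenty-five listed values to within about .01, and the sixteen midpoints match the listed ones exactly.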
Results & discussion. According to their ICCs, the TLAP-R lines, each representing an item, appeared suitable for discriminating among examinees across the ability range. Although the psychometric efficiency of the lines varied and was not perfectly homogeneous, the TLAP-R did not show any inappropriate item. Some lines, however, appeared to produce noise at the low end of the proficiency scale, especially in the empirical data.
This might be because, in some lines of the test matrices, examinees could be tempted to reproduce the patterns occurring most frequently in the entire problem, and thereby earn positive responses without the correct reasoning. In such a case, the test-taker's behavior might be interpreted as a form of guessing.
Additionally, as can be seen in the two figures, some items are very similar to one another. This finding suggests that they are redundant. However, this is seen as inevitable given the design of the TLAP-R matrices. In some of them, the examinee needs to look at the matrix globally, making an effort of abstraction, in order to complete the problem as a whole, i.e., to respond to more than a single line at once.
Ackerman, T. A. (1988). Comparison of multidimensional IRT estimation procedures using benchmark data. Paper presented at the ONR Contractors' meeting, Iowa City, IA.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick, Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40(1), 5-32.
Fraser, C. (1986). NOHARM: An IBM PC Program for Fitting Both Unidimensional and Multidimensional Normal Ogive Models of Latent Trait Theory. Armidale, Australia: The University of New England.
Fraser, C., & McDonald, R. P. (1988). NOHARM: Least squares item factor analysis. Multivariate Behavioral Research, 23, 267-269.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. Newbury Park, CA: Sage Publications, Inc.
Hulin, C. L., Drasgow, F., & Parsons, C. K. (1983). Item Response Theory: Applications to Psychological Measurement. Homewood, IL: Dow Jones Irwin Inc.
Jouve, X. (2013a). TLAP: Revised. Retrieved from http://www.cerebrals.org/tlap/
Jouve, X. (2013b). Correlations between the TLAP-R and other measures. Retrieved from http://www.cogn-iq.org/archives/770
Levine, M. V., & Drasgow, F. (1988). Optimal appropriateness measurement. Psychometrika, 53, 161-176.
McDonald, R. P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement, 6(4), 379-396.
McDonald, R. P. (1985). Unidimensional and multidimensional models for item response theory. In D. J. Weiss (Ed.), Proceedings of the 1982 Item Response and Computerized Adaptive Testing Conference (pp. 65-87). Minneapolis, MN: University of Minnesota.
McDonald, R. P. (1997). Normal Ogive Multidimensional Model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of Modern Item Response Theory (pp. 257–269). New York, NY: Springer-Verlag.
Miller, T. (1991). Empirical Estimation of Standard Errors of Compensatory MIRT Model Parameters Obtained from the NOHARM Estimation Program (Research report ONR91-2). The American College Testing Program.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Spearman, C. (1927). The abilities of man. New York, NY: Macmillan.
Xavier Jouve, Ph.D. is a former psychometrician, author of the Epreuve de Performance Cognitive (EPC), a test published by the Editions du Centre de Psychologie Appliquée (Paris), the French branch of Pearson Education, Inc. Among others, he is a member of the Psychometric Society. Expertise: test development, cognitive abilities, giftedness, verbal reasoning, numerical culture fair sequences, IRT, MDS