I Am a Word Test: An Open-Ended and Untimed Approach to Verbal Ability Assessment

Abstract

This study investigates the psychometric properties of the I Am a Word (IAW) test, an open-ended and untimed assessment of verbal ability. The test aims to reduce the probability of guessing, increase accessibility, and enhance inclusivity by allowing examinees to produce their own answers and work at their own pace. The primary objectives were to evaluate the test's reliability, concurrent validity, and standard score comparability with established verbal ability assessments. A sample of 1,083 examinees from the 2023 revision of the IAW test was analyzed, yielding excellent internal consistency reliability (Cronbach's alpha = .95). Concurrent validity was established through strong correlations with the Wechsler Adult Intelligence Scale - Third Edition, Verbal Comprehension Index and the Reynolds Intellectual Assessment Scales, Verbal Intelligence Index. Additionally, the IAW test yielded standard scores comparable to those of these external measures. The results indicate that the IAW test is a reliable and valid measure of verbal ability, with potential for a more inclusive and engaging approach to cognitive assessment in the field of psychometrics.

Keywords: IAW, WAIS, RIAS, psychometrics, verbal ability, open-ended, untimed, assessment, reliability, validity

Introduction

Psychometric testing has been a cornerstone of psychological assessment and educational research for over a century. Traditional psychometric tests often rely on closed-ended, multiple-choice questions to assess cognitive abilities (Cronbach & Meehl, 1955). While these tests have shown strong reliability and validity, they have limitations, such as the increased probability of guessing and limited inclusivity for diverse populations (Kane, 2006). Open-ended and untimed tests have been proposed as alternative approaches to address these limitations (Messick, 1994). The present study investigates the psychometric properties of the I Am a Word (IAW) test (Jouve, 2023), an innovative open-ended and untimed verbal ability assessment, and its potential for a more inclusive and engaging approach to cognitive assessment in the field of psychometrics.

The IAW test is grounded in the theories of open-ended testing and measurement, which posit that allowing examinees to produce their own answers can reduce guessing probability and increase the validity of the assessment (Jiao & Lissitz, 2017). Furthermore, untimed tests have been shown to reduce test anxiety, improve accessibility, and enhance inclusivity (Cizek & Burg, 2006). The IAW test aims to address these concerns by providing a unique approach to verbal ability assessment, with 100 open-ended verbal problems and no time constraints.

In the field of psychometrics, various theories and models have been used to analyze test data and estimate test properties, such as reliability and validity. Classical Test Theory (CTT) and Item Response Theory (IRT) are two prominent approaches employed in this context (Lord & Novick, 1968; van der Linden & Hambleton, 1997). The current study employs both CTT and IRT methods to evaluate the psychometric properties of the IAW test, aiming to provide a comprehensive analysis of its reliability and validity.

The concurrent validity of the IAW test is evaluated by examining the correlations between IAW scores and established verbal ability assessments, namely the Wechsler Adult Intelligence Scale - Third Edition, Verbal Comprehension Index (WAIS-III VCI; Wechsler, 1997) and the Reynolds Intellectual Assessment Scales, Verbal Intelligence Index (RIAS VIX; Reynolds & Kamphaus, 2003). These comparisons are crucial in establishing the test's criterion-related validity and its comparability to other well-established measures of verbal ability (Anastasi & Urbina, 1997).

This study aims to contribute to the literature on open-ended and untimed testing by providing evidence of the psychometric properties and potential benefits of the IAW test. The results of this study will have implications for the development and application of innovative cognitive assessments in educational and clinical settings, as well as for future research in the field of psychometrics.

Method

Research Design

This study employed a correlational research design to investigate the psychometric properties of the I Am a Word (IAW) test. The design focused on examining the relationships between IAW scores and scores from other established verbal ability assessments, as well as analyzing the internal consistency reliability and standard score comparisons. The use of a correlational design allowed for the assessment of concurrent validity, which is essential for evaluating the effectiveness of a new psychometric instrument (Cohen et al., 2003).

Participants

The primary sample consisted of 1,083 examinees who participated in the 2023 revision of the IAW test. The sample included individuals from diverse demographic backgrounds, with ages ranging from 15.67 to 75.75 years (M = 31.26, SD = 13.07); 32.77% identified as female and 67.23% as male. Most participants held a bachelor's degree or higher (57.89%), followed by those with some college or an associate degree (26.32%) and those with a high school diploma or less (15.79%). With respect to occupation, 35.04% of participants worked in management, business, science, and arts occupations; 24.87% in service occupations; 15.28% in sales and office occupations; 12.75% in natural resources, construction, and maintenance occupations; 6.49% in production, transportation, and material moving occupations; and 5.57% in healthcare support and healthcare practitioner and technical occupations.

Materials

The IAW test consists of 100 open-ended verbal problems designed to assess verbal ability in a non-traditional manner. The test items were developed based on existing literature on verbal ability assessment (e.g., Carroll, 1993; Sternberg, 1987) and refined through expert review and pilot testing. The test is untimed, allowing examinees to work at their own pace and encouraging reasoning over guessing. Each item admits multiple correct answers, which broadens the range of responses that can be credited as evidence of verbal ability.

To assess concurrent validity, two established verbal ability assessments were used: the Wechsler Adult Intelligence Scale - Third Edition, Verbal Comprehension Index (WAIS-III VCI; Wechsler, 1997) and the Reynolds Intellectual Assessment Scales, Verbal Intelligence Index (RIAS VIX; Reynolds & Kamphaus, 2003). These instruments were selected based on their widespread use and strong psychometric properties (Sattler, 2001; Urbina, 2004).

Procedures

Before data collection, participants provided informed consent and completed a demographic questionnaire (American Psychological Association, 2017). The study consisted of two parts: a correlational study with two samples of participants who completed the IAW, WAIS-III, or RIAS tests in a controlled laboratory setting, and an online study with a separate sample of participants who completed the IAW via a dedicated webpage.

For the correlational study, participants first completed the IAW test, which was administered in a computerized format (Jiao & Lissitz, 2017). Participants then took either the WAIS-III (Wechsler, 1997) or the RIAS (Reynolds & Kamphaus, 2003), both administered in pencil-and-paper format. This fixed order was maintained for all participants to minimize potential carryover effects between the different test formats (Balota et al., 2007). Breaks were provided between tasks to minimize fatigue (Cohen et al., 2003). To ensure data quality, research assistants were trained in data collection and administration procedures (Creswell & Creswell, 2017), and participants' responses were recorded and double-checked for accuracy (Cohen et al., 2003).

For the online study, participants accessed the IAW test through a dedicated webpage. Participants were provided with clear instructions on how to complete the test, and their responses were collected electronically (Jiao & Lissitz, 2017). In the context of the online study, participants were asked to provide biographical information and to report their scores, if available, on a pre-established list of group tests. The online administration of the IAW test allowed for the collection of a larger and more diverse sample, which could help to further validate the test's psychometric properties (Kane, 2006).

In both the correlational study and the online study, data were collected in accordance with ethical guidelines, and the principles of informed consent, confidentiality, and anonymity were maintained throughout the research process (American Psychological Association, 2017).

Data Analysis

The psychometric properties of the IAW test were analyzed using both Classical Test Theory (CTT) and Item Response Theory (IRT) approaches (Lord & Novick, 1968; van der Linden & Hambleton, 1997). Internal consistency reliability was assessed using Cronbach's alpha coefficient (Cronbach, 1951), with a value of .80 considered satisfactory and a value of .90 or higher recommended for cognitive assessments (Aiken, 2000; Nunnally & Bernstein, 1994).
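
For reference, for a test of k items, coefficient alpha is computed from the item variances and the variance of the total score (Cronbach, 1951):

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)$$

where $\sigma_i^2$ denotes the variance of item i and $\sigma_X^2$ the variance of total scores.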

IRT analyses included the Kernel Estimator to draw Option Characteristic Curves (OCCs) and the 2-Parameter Logistic Model (2PLM), fitted with the Bayes Modal Estimator, to draw Item Characteristic Curves (ICCs). These methods allowed for the examination of the functioning of individual test items and the detection of correct but unusual responses that might otherwise be overlooked (Embretson & Reise, 2000; van der Linden & Hambleton, 1997). By identifying both expected and unexpected valid answers, the analyses ensured that the test's reliability was not compromised and that examinees were not misevaluated.
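
Under the 2PLM, the probability of a correct response to item i for an examinee with latent ability $\theta$ is

$$P_i(\theta) = \frac{1}{1 + \exp\left[-a_i(\theta - b_i)\right]}$$

where $a_i$ is the item discrimination (slope) and $b_i$ the item difficulty (threshold); each ICC traces this function across the ability continuum (van der Linden & Hambleton, 1997).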

Concurrent validity was assessed by calculating Pearson correlation coefficients between IAW scores and scores from the WAIS-III VCI and the RIAS VIX (Cohen et al., 2003). Additional correlations were computed with the Scholastic Aptitude Test (SAT; 2005 to 2016 version; College Board, 2023; Lemann, 1999) and its subtests, as well as with the Armed Forces Qualification Test (AFQT; Welsh et al., 1990).
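
As an illustration, the following minimal sketch shows how such validity coefficients can be computed in Python with SciPy; the score vectors are hypothetical placeholders, not the study data:

import numpy as np
from scipy import stats

# Hypothetical paired standard scores (illustrative values only)
iaw_scores = np.array([112, 125, 98, 131, 104, 119, 127, 95])
vci_scores = np.array([115, 122, 101, 128, 99, 117, 130, 92])

# Pearson correlation coefficient and two-tailed p-value
r, p = stats.pearsonr(iaw_scores, vci_scores)
print(f"r = {r:.2f}, p = {p:.3f}")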

To compare the IAW Verbal Ability Index (VAI) with other standard scores, means and standard deviations were computed for two groups of examinees who also completed the WAIS-III or the RIAS. These comparisons allowed for the assessment of the IAW test's ability to produce scores that are consistent with existing measures of verbal ability.

Ethical Considerations

All study procedures were conducted in accordance with the ethical guidelines of the American Psychological Association (APA, 2017). Informed consent was obtained from all participants before their involvement in the study, and confidentiality was maintained throughout the data collection and analysis processes. Additionally, participants were debriefed at the conclusion of the study and provided with information on the study's findings and implications.

Results

Statistical Analyses

The data obtained from the IAW test and other measures were analyzed using various statistical techniques to assess the reliability, validity, and comparability of the IAW test scores. These analyses included internal consistency analysis (Cronbach's Alpha), Item Response Theory (IRT) methods, Pearson correlation coefficients, and comparison of means and standard deviations for the IAW Verbal Ability Index (VAI) and other standard scores.

Reliability Analysis

The internal consistency of the IAW test scores was assessed using Cronbach's Alpha coefficient. This analysis was conducted using the responses of 1,083 examinees who completed the 2023 revision of the IAW test. The resulting Cronbach's Alpha coefficient for the IAW test was .95, indicating excellent internal consistency and meeting psychometric recommendations (Aiken, 2000; Nunnally & Bernstein, 1994). The derived standard error of measurement was 3.32.
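
The reported standard error of measurement follows directly from the reliability estimate. Assuming the IAW standard score metric uses the conventional standard deviation of 15 (an assumption, as the metric is not stated in this section),

$$\mathrm{SEM} = \sigma_X\sqrt{1 - r_{xx}} = 15\sqrt{1 - .951} \approx 3.32$$

where $r_{xx}$ is the unrounded reliability coefficient.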

IRT Analysis

Item Response Theory (IRT) methods were employed to further examine the IAW test items, including the Kernel Estimator for Option Characteristic Curves (OCC) and the 2-Parameter Logistic Model (2PLM) using the Bayes Modal Estimator for Item Characteristic Curves (ICC; van der Linden & Hambleton, 1997). The OCCs were instrumental in identifying correct but unusual responses, while the ICCs confirmed the discrimination potential of each item. Figure 1 presents the ICCs for the IAW test items.

Figure 1. Item Characteristic Curves, 2PLM (Bayes Modal Estimator, N = 1,083)

Tests of model fit indicated that the model provided a good fit to the data: all item-level chi-square tests were non-significant (p > .05), with values ranging from .79 to 24.18, and the global chi-square was 547.28 with 897 degrees of freedom.

The estimated parameter values for each item were used to calculate the item-level goodness-of-fit statistics, including the item fit residual, outfit mean square error (MSE), and infit MSE (Hu & Bentler, 1999; Kline, 2011). All items showed acceptable goodness-of-fit statistics, with item fit residuals within the recommended range of ±2.50 and outfit and infit MSE values between .50 and 1.50. The parameter estimates themselves also showed good variability across items, with slopes (a) ranging from .40 to 1.94 and thresholds (b) ranging from -4.72 to 4.75.
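
To make these fit statistics concrete, the following sketch computes outfit and infit mean squares for a dichotomous response matrix under their standard definitions, given 2PL item parameters; the simulated responses and parameter ranges are illustrative, not the study's data:

import numpy as np

def two_pl_prob(theta, a, b):
    # 2PL model: P(correct) = 1 / (1 + exp(-a * (theta - b)))
    return 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))

def fit_mean_squares(X, theta, a, b):
    P = two_pl_prob(theta, a, b)
    W = P * (1.0 - P)                    # Bernoulli variance (information weight)
    Z2 = (X - P) ** 2 / W                # squared standardized residuals
    outfit = Z2.mean(axis=0)             # unweighted mean square per item
    infit = (W * Z2).sum(axis=0) / W.sum(axis=0)  # information-weighted mean square
    return outfit, infit

rng = np.random.default_rng(42)
theta = rng.standard_normal(500)         # illustrative person abilities
a = rng.uniform(0.40, 1.94, size=20)     # slopes within the reported range
b = rng.uniform(-2.0, 2.0, size=20)      # illustrative thresholds
X = (rng.random((500, 20)) < two_pl_prob(theta, a, b)).astype(float)

outfit, infit = fit_mean_squares(X, theta, a, b)
flagged = (outfit < 0.5) | (outfit > 1.5) | (infit < 0.5) | (infit > 1.5)
print(f"items flagged by the .50-1.50 criterion: {flagged.sum()}")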

The model fit indices provided additional evidence of the adequacy of the model fit. The comparative fit index (CFI) was .97, indicating a good fit of the model to the data. The root mean square error of approximation (RMSEA) was .06, also indicating a good fit of the model to the data.

Concurrent Validity Analysis

Pearson correlation coefficients were calculated to assess the concurrent validity of the IAW test. These analyses compared the IAW test scores with scores from the Wechsler Adult Intelligence Scale - Third Edition, Verbal Comprehension Index (WAIS-III VCI; Wechsler, 1997) and the Reynolds Intellectual Assessment Scales, Verbal Intelligence Index (RIAS VIX; Reynolds & Kamphaus, 2003). The correlation between the IAW test and the WAIS-III VCI was .82 (N = 100), indicating a strong positive relationship; Figure 2 illustrates this relationship. Similarly, the correlation between the IAW test and the RIAS VIX was .84 (N = 98), further supporting the concurrent validity of the IAW test.

Figure 2. Scatter Plot, IAW vs. WAIS-III VCI (N = 100), r = .82 (p < .05)

Additional correlations were computed between the IAW test scores and the Scholastic Aptitude Test (SAT; 2005 to 2016 version; College Board, 2023; Lemann, 1999) and its subtests, as well as the Armed Forces Qualification Test (AFQT; Welsh et al., 1990). The IAW test correlated at .73 with the SAT Composite (N = 38), .64 with the Reading subtest, .62 with the Writing subtest, and .57 with the Math subtest. Additionally, the IAW test correlated at .61 with the AFQT (N = 28).

Standard Score Comparisons

Two groups of examinees who completed either the WAIS-III or the RIAS, in addition to the IAW test, were used to compare the IAW VAI with other standard scores. The mean IAW VAI for the IAW-WAIS group (N = 100) was 126.00 (SD = 14.60), while the mean IAW VAI for the IAW-RIAS group (N = 31) was 111.90 (SD = 9.10). These means were closely aligned with the external measures' means: 126.50 (SD = 16.40) for the WAIS-III VCI and 112.00 (SD = 7.80) for the RIAS VIX. Independent samples t-tests revealed no significant differences at p < .05 between the means of the IAW test and the WAIS-III VCI (t(98) = .23, p = .81) or between the means of the IAW test and the RIAS VIX (t(29) = .06, p = .95). These results further support the concurrent validity of the IAW test scores.
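
A minimal sketch of the comparison reported above, using an independent-samples t-test in Python; the simulated scores match only the reported group means and standard deviations and are not the study data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
iaw_vai = rng.normal(loc=126.0, scale=14.6, size=100)   # simulated IAW VAI scores
wais_vci = rng.normal(loc=126.5, scale=16.4, size=100)  # simulated WAIS-III VCI scores

# Independent-samples t-test (equal variances assumed by default)
t, p = stats.ttest_ind(iaw_vai, wais_vci)
print(f"t = {t:.2f}, p = {p:.2f}")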


Discussion

The current study aimed to evaluate the psychometric properties of the IAW test as a measure of verbal intelligence. The results of the study provide strong evidence of the reliability and validity of the IAW test, with implications for theory, practice, and future research. In this section, we discuss the main findings, their implications, and limitations, as well as suggestions for future research.

The IAW test demonstrated excellent internal consistency, as indicated by a Cronbach's Alpha coefficient of .95. This finding accords with previous research on the reliability of intelligence tests, which has consistently found high levels of internal consistency for established measures of verbal intelligence (Nunnally, 1978; Salvia & Ysseldyke, 2004). The IRT analysis provided further evidence of the test's reliability, showing that the items discriminated well between individuals with different levels of verbal ability (Embretson & Reise, 2000).

Concurrent validity was supported by strong positive correlations between the IAW test and established measures of verbal intelligence, such as the WAIS-III VCI and the RIAS VIX. These correlations are consistent with previous research, which has found strong associations between different measures of verbal intelligence (Sattler, 2008; Wechsler, 1997). Moreover, the IAW test scores were comparable to the external measures' scores, with no significant differences in means. This finding suggests that the IAW test may be a valid alternative to more established measures of verbal intelligence (Carroll, 1993).

The implications of these results are twofold. First, they provide support for the theoretical underpinnings of the IAW test, which is based on the assumption that verbal intelligence can be reliably and validly measured using a combination of items assessing different aspects of verbal ability (Gardner, 1983; Sternberg, 2003). Second, the results suggest that the IAW test may be a useful tool for practitioners, as it appears to be a reliable and valid measure of verbal intelligence that is less time-consuming and costly to administer than traditional intelligence tests (Kaufman & Lichtenberger, 2006).

Despite the promising findings, several limitations should be considered. First, the sample size for the concurrent validity analyses was relatively small, which may have limited the generalizability of the findings (Cohen et al., 2003). Future research should aim to replicate these results with larger, more diverse samples. Second, the study focused primarily on the verbal ability aspect of intelligence, and it remains unknown how well the IAW test would perform in measuring other aspects of intelligence (Gardner, 1983). Lastly, the study did not include a test-retest reliability analysis, which would provide valuable information about the stability of the IAW test scores over time (Kane, 2006).

Future research should address these limitations by replicating the current study with larger samples and more diverse populations, examining the psychometric properties of the IAW test for other aspects of intelligence (Jiao & Lissitz, 2017; McGrew, 2009), and conducting a test-retest reliability analysis to further establish the stability of its scores over time (Cronbach & Meehl, 1955).

These findings have important implications for both theory and practice and suggest that the IAW test may be a valuable tool for researchers and practitioners in the field of intelligence assessment (Kaufman & Lichtenberger, 2006).

Moreover, the IAW test's online format may increase accessibility and reduce administration costs, further highlighting its potential usefulness in educational, clinical, and organizational settings (Bartram, 2005). As online testing continues to gain prominence in the field of psychological assessment, the IAW test may serve as a valuable addition to the available tools for measuring verbal intelligence (Goldberg et al., 2006).

Overall, the IAW test demonstrates strong psychometric properties as a measure of verbal intelligence, with excellent reliability, robust concurrent validity, and comparable performance to established measures. This study's findings have important implications for both theory and practice and indicate that the IAW test may serve as a valuable and accessible tool for researchers and practitioners in the field of intelligence assessment. Further research should continue to explore the test's psychometric properties in other domains and populations, ultimately contributing to a more comprehensive understanding of the IAW test's potential applications.

Conclusion

This study demonstrated that the IAW test is a reliable and valid measure of verbal intelligence. The test exhibited excellent internal consistency and strong concurrent validity with established measures such as the WAIS-III VCI and the RIAS VIX. These findings have important implications for both theory and practice, as they support the theoretical foundations of the IAW test and suggest that it may be a valuable tool for researchers and practitioners in the field of intelligence assessment.

Despite its promising results, the study had certain limitations, such as a relatively small sample size for concurrent validity analyses and a lack of test-retest reliability analysis. Future research should address these limitations by replicating the study with larger, more diverse samples and examining the psychometric properties of the IAW test in other domains and populations. Additionally, conducting a test-retest reliability analysis could further strengthen the evidence for the IAW test's psychometric properties.

The IAW test shows promise as a reliable and valid measure of verbal intelligence, with potential applications in research and practice. By building upon these findings and addressing the limitations of the current study, future research can continue to explore the potential of the IAW test and contribute to a deeper understanding of intelligence assessment.

References

Aiken, L. R. (2000). Psychological testing and assessment (10th ed.). Boston, MA: Allyn & Bacon.

Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.

American Psychological Association. (2017). Ethical principles of psychologists and code of conduct (2002, amended effective June 1, 2010, and January 1, 2017). Washington, DC: Author.

Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445-459. https://doi.org/10.3758/bf03193014

Bartram, D. (2005). The Great Eight competencies: A criterion-centric approach to validation. Journal of Applied Psychology, 90(6), 1185-1203. https://doi.org/10.1037/0021-9010.90.6.1185

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press. https://doi.org/10.1017/CBO9780511571312

Cizek, G. J., & Burg, S. S. (2006). Addressing test anxiety in a high-stakes environment: Strategies for classrooms and schools. Thousand Oaks, CA: Corwin Press.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

College Board. (2023). Scholastic Aptitude Test. Retrieved from https://satsuite.collegeboard.org/sat

Creswell, J. W., & Creswell, J. D. (2017). Research design: Qualitative, quantitative, and mixed methods approaches (5th ed.). Los Angeles, CA: SAGE Publications.
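
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334. https://doi.org/10.1007/BF02310555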

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281-302. https://doi.org/10.1037/h0040957

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates. https://doi.org/10.4324/9781410605269

Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.

Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., & Gough, H. G. (2006). The International Personality Item Pool and the future of public-domain personality measures. Journal of Research in Personality, 40(1), 84-96. https://doi.org/10.1016/j.jrp.2005.08.007
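
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1-55. https://doi.org/10.1080/10705519909540118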

Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger Publishers.

Jiao, H., & Lissitz, R. W. (Eds.) (2017). Test Fairness in the New Generation of Large-Scale Assessment. Charlotte, NC: International Age Publishing.

Jouve, X. (2023). I Am a Word. Retrieved from http://www.cogn-iq.org/i-am-a-word-iq-test.php.

Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: American Council on Education and Praeger Publishers.

Kaufman, A. S., & Lichtenberger, E. O. (2006). Assessing adolescent and adult intelligence (3rd ed.). Hoboken, NJ: John Wiley & Sons.
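
Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford Press.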

Lemann, N. (1999). The big test: The secret history of the American meritocracy. New York, NY: Farrar, Straus and Giroux.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37(1), 1-10. https://doi.org/10.1016/j.intell.2008.08.004

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23. https://doi.org/10.3102/0013189X023002013

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.

Reynolds, C. R., & Kamphaus, R. W. (2003). Reynolds Intellectual Assessment Scales. Lutz, FL: Psychological Assessment Resources.

Salvia, J., & Ysseldyke, J. (2004). Assessment in special and inclusive education. Boston, MA: Houghton Mifflin.

Sattler, J. M. (2001). Assessment of children: Cognitive applications (4th ed.). San Diego, CA: Jerome M. Sattler.

Sattler, J. M. (2008). Assessment of children: Cognitive applications (5th ed.). San Diego, CA: Jerome M. Sattler.

Sternberg, R. J. (2003). Wisdom, intelligence, and creativity synthesized. New York: Cambridge University Press. https://doi.org/10.1017/CBO9780511509612

Urbina, S. (2004). Essentials of psychological testing. Hoboken, NJ: John Wiley & Sons.

van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of modern item response theory. New York, NY: Springer.

Wechsler, D. (1997). Wechsler Adult Intelligence Scale - Third Edition. San Antonio, TX: Psychological Corporation.

Welsh, J. R., Jr., Kucinkas, S. K., & Curran, L. T. (1990). Armed Services Vocational Aptitude Battery (ASVAB): Integrative review of validity studies (AFHRL-TR-90-22, AD-A225 074). Brooks AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.


Author: Jouve, X.
Publication: 2023