DRAFT: PLEASE DO NOT QUOTE WITHOUT PERMISSION


 

A New Implicit Measure of Cognitive Aptitudes

Joseph Psotka, Peter J. Legree, & Daniel E. Martin

U.S. Army Research Institute

Alexandria, Virginia 22333-5600

 

Abstract

Current survey technology does not easily support the measurement of cognitive aptitude in social science surveys that use the Internet. This is because existing cognitive aptitude measures: (1) have not been validated for self-administered use; and (2) are easily recognized as tests and are, therefore, inappropriate for unsupervised administration, where respondents can consult external reference materials at leisure. Instead, surveys have used self-report and demographic information to estimate cognitive aptitude, but this approach is not very accurate because demographic variables are poor predictors of cognitive aptitude. This project expanded the utility of survey data by developing procedures that can begin to measure cognitive aptitude in Internet surveys using self-administered formats. Because the military (1) is mandated to maintain cognitive aptitude standards and (2) increasingly uses internet survey research to develop recruiting and training policy, it is important to develop techniques whose answers cannot easily be faked or derived from secondary sources.

The feasibility of measuring cognitive aptitude in Internet surveys was evaluated by developing an implicit knowledge measure that appeared to be a guessing game about others' general preferences. The measure was designed to assess a cognitive process of inference in a simple format: guessing Xs and Os. It used a binary-choice response format, which is associated with surveys of preferences. The measure was developed to be: (1) exceptionally short, (2) correlated with cognitive aptitude, (3) unobtrusive, i.e., not appearing to be a knowledge test, (4) without correct answers that could be computed or looked up in available reference materials, and (5) suitable for internet-based survey administration. The implicit reasoning scale (IRS) was administered to 288 Air Force recruits for whom cognitive aptitude scores were available. Performance on the measure correlated up to .38 with cognitive aptitude in the sample; applying the bivariate correction for range restriction yields an estimated population correlation of up to .58. These values indicate that the implicit reasoning scale procedure provided an adequate measure of cognitive aptitude that could be incorporated into internet-based surveys. Limitations concerning IRS difficulty level, ambiguity of instructions, and dynamic item selection and presentation are being addressed in follow-on designs.

 

1. Introduction

It is safe to assert that current survey technology does not accurately estimate cognitive aptitude in internet-based surveys. In the past, surveys that have estimated general cognitive aptitude have used poor surrogates, such as socio-economic status (SES) or education level. Analyses using these variables can address the role of general cognitive aptitude only in a very speculative manner, because variables such as SES and education level are multi-determined and have only a weak relationship with cognitive aptitude. Therefore, they cannot accurately assess the role of cognitive aptitude in modeling human behavior. More generally, cognitive aptitude has been neglected in the study of a number of important social and practical topics, e.g., criminality, civility, political involvement, career interests, and consumer preferences.

This limitation is apparent in analyses conducted using at least two surveys in which the U.S. Government has a vested interest. The first is the Monitoring the Future study (MTF). MTF is a large, federally funded annual survey of American high school seniors that has been ongoing for more than 20 years. It is primarily intended to monitor drug use, but it also contains career-oriented items, including interest in pursuing a military career. However, the MTF survey has never estimated individual differences in cognitive aptitude and, therefore, cannot be used to explore the relationships among cognitive aptitude, drug use, and career-oriented decisions.

The inability to estimate general cognitive aptitude also limits the utility of data collected under the Youth Attitude Tracking Study (YATS), a limitation noted by the Rand Corporation (Orvis & Gahart, 1989). The YATS is a telephone-based survey that has been ongoing for more than 20 years and samples 10,000 American youth annually. It contains a variety of questions to estimate interest in pursuing a military career, and its purpose is to provide market information to military recruiters and government policy makers.

Both YATS and MTF would be more useful to the military if they contained cognitive aptitude scales that could be used to allow market segmentation by cognitive aptitude and support policy designed to attract higher aptitude individuals to the military. It is relevant that: (1) federal law mandates a minimal level of cognitive aptitude for military recruits as measured by the Armed Forces Qualification Test (AFQT), and (2) all the military services attempt to exceed that requirement by primarily recruiting higher-scoring AFQT individuals.

Apart from the YATS and MTF surveys, which focus on young adults for recruiting purposes, there are other surveys addressing social phenomena that might benefit from the availability of a general cognitive aptitude measure. In response to these concerns, the Army Research Institute initiated a research program to develop methods to accurately measure cognitive aptitude in telephone and mail-based surveys.

 

2. The Implicit Reasoning Scale (IRS) Study

The introduction argued that measures of cognitive aptitude would be useful for military surveys by allowing participants to be segmented by AFQT score. However, a standard multiple-choice test cannot be included in an internet or mail-based survey because participants may recognize the instrument as a knowledge test and consult outside sources of knowledge. For this reason, a scale was developed that had no correct answers that could be computed or retrieved from reference materials. This implicit knowledge survey scale has the unusual characteristic of appearing to be a judgment of others' preferences.

3.1 Method

3.1.1 Overview

The implicit reasoning scale is scored by computing, for each item, the distance between a subject's response to that item and a reference point derived from the mean of everyone else's preferences; better performance is indicated by smaller distances between a subject's ratings and the reference points. We believed that a scale framed as a judgment of others' preferences might measure cognitive aptitude while appearing to consist of opinion questions rather than test items.
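The scoring rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the scoring program used in the study: it assumes responses are coded 1 for X and 0 for O, that the reference point for each item is the mean response of all other subjects (a leave-one-out mean), and that item distances are absolute differences summed into a total deviation score. The function name `deviation_scores` is hypothetical.

```python
def deviation_scores(responses: list[list[int]]) -> list[float]:
    """Total deviation score per subject (lower = closer to the group).

    Assumed coding: responses[i][j] is subject i's answer to item j,
    1 for X and 0 for O. The reference point for subject i on item j is
    the mean answer of all *other* subjects on that item.
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two subjects to form a reference point")
    n_items = len(responses[0])
    # Column sums let each leave-one-out mean be computed in constant time.
    col_sums = [sum(row[j] for row in responses) for j in range(n_items)]
    scores = []
    for row in responses:
        total = 0.0
        for j, answer in enumerate(row):
            reference = (col_sums[j] - answer) / (n - 1)  # mean of everyone else
            total += abs(answer - reference)              # absolute distance
        scores.append(total)
    return scores
```

For example, if three subjects answer X on two items and a fourth answers O on both, each of the three majority subjects gets a total of about 0.67 and the dissenter gets 2.0.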

To validate the assumption that subjects would not view the implicit reasoning scale as a test, we included a measure at the end of the survey book that required participants to indicate whether they considered the instruments they had just completed to be "tests" or "surveys."

3.1.2 Instrument

The experimental book contained a number of other instruments that are described in Legree, Martin, and Psotka (1998). The first instrument, Self-Descriptive Information, consisted of a sheet that required individuals to provide demographic information that is frequently requested in surveys, e.g., age, gender, and ethnicity.

The next six instruments consisted of five Likert scales and one multiple-choice scale. The Likert scales required participants to rate: (1) the frequency of various words in the English language, Word Frequency; (2) connotations of terms implying varying degrees of excellence, Excellence; (3) the size of various Army job families, e.g., infantry and medical, Military Positions; (4) automobile reliability, Auto Reliability; and (5) automobile fuel economy, MPG. The sixth, Knowledge of the Military, was a multiple-choice scale of military knowledge whose items were identified using general knowledge sources.

The seventh instrument was the implicit reasoning scale, presented on a single page headed Guessing Game. It consisted of 22 sequences of 8 Xs and Os, each of which subjects were asked to complete by choosing an X or an O. The scale is now available on-line.

 

The final instrument, Test or Survey, was a questionnaire that required participants to indicate whether each of the preceding scales, and this final scale itself, had appeared to be "tests" or "surveys."

AFQT scores were obtained for participants from Department of Defense records.

3.1.3 Participants

Participants were 288 recently enlisted U.S. Air Force recruits in basic training at Lackland Air Force Base, Texas.

3.1.4 Procedure

Data were collected over a two-month period between 7:00 and 9:00 a.m. Subjects were seated in a classroom and were told to follow the instructions contained in the experimental book. We used standard U.S. Air Force procedures governing subject participation in this research.

3.2 Results and Conclusions

The reliability of the implicit reasoning scale (IRS), and its correlation with AFQT, were substantially attenuated by restriction of range and by ceiling effects in both the IRS and AFQT. AFQT is restricted because only individuals with scores greater than .4 are recruited. The IRS was subject to a pronounced ceiling effect: some individuals at all levels of AFQT were able to obtain a perfect score. In spite of this ceiling effect and the difference in difficulty level between the two scales, the correlation with AFQT was significant (.20).

 

Figure 1. Correlation of the implicit reasoning scale with AFQT (.20).

Figure 1 shows both the correlation with AFQT (.36) and the ceiling effects in the first half of all participants. This first half included 79% of all female participants, so it is considered to be more representative than the entire group of 288. The ceiling effect is visible in that individuals at all levels of AFQT scored in the top quartile of deviation scores. Possible scores range from 7.87 to 15.05. However, any score lower than 11.8 reflects a negative correlation with the group means, that is, a contrarian prediction; such scores have therefore been eliminated from the scale. This ambiguity in individuals' responses creates a problem for the scale, because individuals can second-guess the task and predict the opposite of what they really think. However, doing so only produces scores lower than the best they are capable of.
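One way to operationalize the contrarian pattern described above is to correlate each subject's answers with the group's item means and flag subjects whose correlation is negative. The sketch below is an illustration under stated assumptions, not the study's procedure: it assumes 0/1 coding for O/X, uses a leave-one-out group mean per item, and uses the sign of a Pearson correlation in place of the 11.8 cutoff, whose derivation on the paper's own score scale is not given here. The names `pearson` and `flag_contrarians` are hypothetical.

```python
import math
from typing import Optional


def pearson(x: list[float], y: list[float]) -> Optional[float]:
    """Pearson correlation; None when either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def flag_contrarians(responses: list[list[int]]) -> list[bool]:
    """True for subjects whose answers correlate negatively with the group.

    Assumed coding: 1 for X, 0 for O. Subjects who give the same answer to
    every item have an undefined correlation and are not flagged.
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two subjects")
    n_items = len(responses[0])
    col_sums = [sum(row[j] for row in responses) for j in range(n_items)]
    flags = []
    for row in responses:
        # Leave-one-out group profile: each item's mean over the other subjects.
        profile = [(col_sums[j] - row[j]) / (n - 1) for j in range(n_items)]
        r = pearson([float(a) for a in row], profile)
        flags.append(r is not None and r < 0)
    return flags
```

With four subjects answering XXOO, XXOO, XOOO, and OOXX on four items, only the last subject is flagged.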

 

Because range restriction due to military entrance requirements (sample SD = 16.4 vs. population SD = 28.7) substantially attenuates all ASVAB correlations, the bivariate correction for range restriction was used to estimate the population correlation between AFQT and the IRS. The population correlation estimate for the IRS is .53.
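The correction can be written as a one-line formula. The sketch below assumes it takes the standard Thorndike Case II form for direct restriction on one variable (here AFQT), using the ratio of population to sample standard deviations; the function name `correct_range_restriction` is hypothetical, and the text does not state which exact variant was applied.

```python
import math


def correct_range_restriction(r: float, sd_sample: float, sd_population: float) -> float:
    """Thorndike Case II correction for direct range restriction.

    r is the correlation observed in the restricted sample; the SDs are those
    of the restricted variable (here AFQT) in the sample and in the population.
    """
    u = sd_population / sd_sample  # ratio of unrestricted to restricted SD
    return (r * u) / math.sqrt(1 - r**2 + (r * u) ** 2)


# AFQT standard deviations reported in the text.
print(round(correct_range_restriction(0.38, 16.4, 28.7), 2))  # prints 0.58
```

With these standard deviations, the sample correlation of .38 cited in the Abstract maps to roughly .58, the population estimate the Abstract reports.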

One feature of the scattergram is that some of those in the lowest range of AFQT score at the highest level of the IRS. This apparently wide range of scores may in part be an artifact of the ceiling effect in the IRS and of the difference in difficulty level between the IRS and AFQT. It may also reflect differences in what is being assessed. AFQT assesses enduring knowledge and skills acquired over many years, as well as generic abilities that may underlie the acquisition of that knowledge and those skills. The IRS appears to be largely independent of any acquired knowledge and skills, since its underlying inference process is speculated to be simple perceptual matching and conditional frequency encoding (Psotka, 1977).
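To make "conditional frequency encoding" concrete, the sketch below guesses the next symbol of an X/O sequence from how often each symbol has followed the most recent context earlier in the same sequence. It is a hypothetical illustration of that kind of process, not Psotka's (1977) syntely model; the function name `predict_next`, the maximum context length of 3, and the tie-handling rule are all assumptions.

```python
from collections import Counter


def predict_next(seq: str, max_context: int = 3) -> str:
    """Guess the next 'X' or 'O' of seq by conditional frequency (hypothetical).

    Tries the longest recent context first, counts which symbol followed each
    earlier occurrence of that context, and backs off to shorter contexts when
    there is no evidence or a tie; finally falls back to overall frequency.
    """
    for k in range(min(max_context, len(seq) - 1), 0, -1):
        context = seq[-k:]
        followers = Counter(
            seq[i + k]
            for i in range(len(seq) - k)
            if seq[i:i + k] == context
        )
        if followers:
            (best, n_best), *rest = followers.most_common()
            # Commit only when this context gives an unambiguous winner.
            if not rest or rest[0][1] < n_best:
                return best
    overall = Counter(seq)
    return "X" if overall["X"] >= overall["O"] else "O"
```

For example, `predict_next("XOXOXOXO")` and `predict_next("XXOOXXOO")` both return "X", continuing the alternating and paired patterns.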

It is not clear how to make the IRS more difficult, but one possible first step is to lengthen the sequences. This has been done in a new version available at:

http://psotka.org/survey/examples.html (These new scales are still under construction. They have not been standardized on a population, so their scoring reference points are not yet known.)

It is not clear how to reduce the amount of second-guessing or contrarian responding to the scale. Pat Kyllonen (personal communication) has suggested that the instructions emphasize agreement with others' preferences over good continuation of the sequences themselves, and this suggestion has been implemented in the new versions.

These correlations with AFQT are surprisingly high, given that the reliability of the data is quite low (.26). Figure 2 shows the split-half reliability correlation.

Figure 2. The split-half reliability estimate for the implicit reasoning scale (.24).

However, the small number of items in the scale (22) and the binary response format considerably attenuate this reliability estimate. By carefully splitting the items into two matched groups of 11, the reliability correlation can be raised to .36 (or .51 when corrected for item loss). Perhaps a better estimate comes from the correlation of the deviation scores with a theoretical prediction of the deviation scores from Psotka (1977), shown in Figure 3.
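The split-half procedure and a correction for the shortened test can be sketched as follows. This assumes the "correction for item loss" is the usual Spearman-Brown step-up for doubling test length; applied to .36 that formula gives about .53 rather than the .51 reported, so the authors may have used a different adjustment. The function names and the item-index arguments are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(scores: list[list[float]],
                           half_a: list[int], half_b: list[int]) -> float:
    """Correlate subjects' totals on two disjoint item subsets.

    scores[i][j] is subject i's score on item j; half_a and half_b are item
    indices (e.g. two matched sets of 11 items).
    """
    totals_a = [sum(row[j] for j in half_a) for row in scores]
    totals_b = [sum(row[j] for j in half_b) for row in scores]
    return pearson(totals_a, totals_b)


def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Projected reliability when test length is multiplied by length_factor."""
    k = length_factor
    return k * r_half / (1 + (k - 1) * r_half)
```

Choosing matched halves, rather than an arbitrary odd/even split, is what the text credits with raising the split-half correlation; the code simply takes whichever index lists are supplied.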

 

 

Figure 3. An estimate of the replicability of the data from a correlation (.88) of deviation scores with a theoretical prediction of the deviation scores from Psotka (1977).

A factor analysis of these data together with the AFQT subscales revealed that the IRS loaded equally on two factors: a quantitative factor (Arithmetic Reasoning, AR; Mathematics Knowledge, MK) and a speed factor (Coding Speed, CS; Numerical Operations, NO). This factor structure may reflect the presumed underlying inference processes of the implicit reasoning scale: counting and matching (cf. Psotka, 1977).

The proportion of participants who viewed the implicit reasoning scale instrument as a "test" was .51, and as a "survey," .49. These data validate the assumption that the implicit reasoning scale is an unobtrusive knowledge scale. These values indicate that the implicit reasoning scale could be incorporated into internet-based surveys to measure cognitive aptitude unobtrusively.

 

Figure 4. Syntely and AFQT would be expected to be positively related in this figure, with higher syntely corresponding to higher AFQT, yet there appears to be no relationship, except that dispersion decreases as AFQT and syntely increase.

A puzzling finding emerged for a minority subgroup of the population: with the theoretical measure (as well as with deviation scores), the data showed a triangular relationship between the syntely measure and AFQT. There was no correlation between syntely and AFQT, and dispersion appeared to be greater at lower AFQT levels. The finding is only suggestive because of the small number of minority participants (48).
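A triangular pattern of this kind can be checked by comparing the spread of IRS scores within AFQT bands. The sketch below is a hypothetical illustration: the band edges are whatever the analyst chooses, not values from the study, and the function name `dispersion_by_band` is an assumption.

```python
import statistics
from typing import Optional


def dispersion_by_band(afqt: list[float], irs: list[float],
                       band_edges: list[float]) -> list[Optional[float]]:
    """Standard deviation of IRS scores within each AFQT band.

    band_edges are ascending cut points; band k holds subjects with
    band_edges[k] <= AFQT < band_edges[k + 1]. Bands with fewer than two
    subjects get None, because a sample SD needs at least two values.
    """
    result: list[Optional[float]] = []
    for lo, hi in zip(band_edges, band_edges[1:]):
        values = [s for a, s in zip(afqt, irs) if lo <= a < hi]
        result.append(statistics.stdev(values) if len(values) >= 2 else None)
    return result
```

A triangular relationship would show up as SDs that shrink from the lowest to the highest AFQT band even when the overall correlation is near zero.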

 

References

  1. Legree, P. J., Martin, D. E., & Psotka, J. (1998). New technologies to measure cognitive aptitudes in surveys. Paper prepared for the 1998 Army Science Conference, Norfolk, VA.
  2. Orvis, B., & Gahart, M. (1989). Quality-based analysis capability for national youth surveys: Development, application, and implications for policy. Santa Monica, CA: Rand Corporation.
  3. Psotka, J. (1977). Syntely: Paradigm for an inductive psychology of memory, perception, and thinking. Memory and Cognition, 5, 553-560.