Friday, September 01, 2006
Types and Limitations of Psychometric Tests
Psychometric refers to any measure of a mental ability, but it also refers to the mathematical, and in particular statistical, measurements applied to psychological data. For this topic, intelligence tests are of most interest and relevance.
The assumption behind intelligence tests is that there is a general mental ability (g) which underlies performance on many different types of tests. However, it is also believed that alongside this general ability there are specific abilities (s) which can influence performance on certain types of tasks. 'Thus in any intelligent act, "g" is involved, plus the "s" factor or factors appropriate to that particular act' (Fontana 1995, p. 103). IQ scores are now expressed relative to an average of 100, and deviation from this norm reflects either lower or higher intelligence than is usual. For example, an individual who scores 70 on an IQ test will be regarded as borderline (at the point between 'normal' intelligence and that of experiencing learning difficulties/disabilities). A score of over 130 indicates intelligence well above the norm.
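To make the arithmetic behind a score of 70 or 130 concrete, here is a minimal Python sketch of a deviation IQ. It assumes a mean of 100 and a standard deviation of 15 (the Wechsler convention; some scales, such as the Stanford-Binet, have used 16) and assumes scores are normally distributed in the norm group. On those assumptions, 70 and 130 each lie two standard deviations from the mean. The function names and the raw-score and norm-group figures in the usage lines are hypothetical, chosen only for illustration.

```python
import math

# Assumed scaling: Wechsler-style mean 100, SD 15 (not every test uses 15).
MEAN = 100.0
SD = 15.0


def deviation_iq(raw, norm_mean, norm_sd, mean=MEAN, sd=SD):
    """Convert a raw score to a deviation IQ relative to an age-group norm.

    norm_mean and norm_sd are the (hypothetical) raw-score mean and SD
    of the individual's age group in the standardisation sample.
    """
    z = (raw - norm_mean) / norm_sd  # distance from the norm in SD units
    return mean + sd * z


def percentile(iq, mean=MEAN, sd=SD):
    """Percentage of a normal population expected to score below iq."""
    z = (iq - mean) / sd
    # Normal cumulative distribution function expressed via math.erf
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical: raw score 30 where the age group averages 40 (SD 5)
    print(deviation_iq(30, 40, 5))    # 70.0, two SDs below the mean
    print(round(percentile(70), 1))   # 2.3
    print(round(percentile(130), 1))  # 97.7
```

Under these assumptions, roughly 2.3% of the population would be expected to score below 70 and roughly 2.3% above 130, which is why both points are treated as marked departures from the norm.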
A key point in regard to intelligence tests is that they aim to measure underlying intellectual ability, not the products of specific learning programmes. Attainment tests, such as National Curriculum tests, measure the outcomes, or knowledge demonstrated, after specific programmes of instruction. Intelligence tests attempt to measure abilities which reflect experience but are not specifically taught as part of the school curriculum.
While there are a number of mental ability tests used in public schools, the two most frequently used contemporary individual intelligence test batteries are the Stanford-Binet Intelligence Scale (1986) and the Wechsler Intelligence Scale for Children (1992). School psychologists may use either of these tests.
The Stanford-Binet Intelligence Scale (1986) has undergone a number of transformations since the original English-language version developed by Terman (1916). The 1986 revision is the fourth edition, and it attempts to address some of the criticisms that have been leveled at the Stanford-Binet test and at intelligence tests in general.
In responding to these criticisms, the Stanford-Binet Intelligence Scale (1986) has generally avoided using the term intelligence quotient or IQ score, replacing the IQ score with the standard age score (SAS). This change in terminology followed the removal of the term IQ score from a number of the standardized group intelligence tests, which are now termed mental abilities or cognitive abilities tests. Now, the only major individual intelligence test to consistently use the term IQ score is the Wechsler Battery.
The editors of the Stanford-Binet have also responded to critics by expanding the range of material covered by the test. The Stanford-Binet had long been criticized as too heavily weighted toward vocabulary and reasoning skills. The new version attempts to correct for such biases by increasing the variety of subtests included in the battery. There are now fifteen subtests in the fourth-edition Stanford-Binet scale, grouped into four ability scales:
- The Verbal Reasoning ability scale contains four subtests. These tests are designed to measure the ability to define words; to comprehend the use of items, objects, or events; to determine what is missing in a picture; and to identify differences and similarities in a series of words.
- The Abstract/Visual Reasoning ability scale contains four subtests. These tests attempt to measure the ability to complete different visual patterns. The subject is asked to use blocks to complete a pattern or design; to copy figures; to complete a matrix; and to identify what a folded paper object would resemble once it is unfolded.
- The Quantitative Reasoning ability scale is composed of three subtests. These tests involve pictorial and verbal arithmetic problems; different types of numerical series with the last two digits in the series absent; and equations that the subject must unscramble and solve.
- The Short-Term Memory scale contains four subtests. These tests involve repeating word for word a sentence that is read aloud; a visual presentation of a stack of beads that must be correctly reproduced in a certain sequence; repeating and reversing a series of digits; and a series of pictures of various objects that must be correctly recalled in the order in which they were presented.
The Wechsler Battery of intelligence tests is divided into three separate versions. The Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R) (1989) is for ages three to seven. The Wechsler Intelligence Scale for Children-III (WISC-III) (1992) is for ages seven to sixteen. The Wechsler Adult Intelligence Scale-Revised (WAIS-R) (1981) is for ages sixteen and older. The WISC-III is the version most commonly used in public schools. With a few exceptions, the WPPSI-R and WAIS-R follow the same general format as the WISC-III.
The WISC-III is divided into two basic sections: verbal and performance. The verbal section examines reasoning and vocabulary skills. The performance section examines visual-spatial skills. The combined score from these two sections yields a full-scale IQ score. Thus the examiner can obtain three IQ scores from this test: verbal, performance and full-scale.
The verbal section contains six subtests: the Information test, which involves general knowledge questions about the culture and the environment; the Similarities test, in which the subject is asked to compare two items and determine the ways in which the items are similar; the Arithmetic test, which involves presenting the subject with verbal arithmetic problems in sentence form; the Vocabulary test, in which the subject is asked to define specific words; the Comprehension test, in which the subject is asked what would be appropriate in a given situation (for example, why is it wrong to set off a fire alarm when there is no fire?); and the Digit Span test, in which the subject is asked to repeat a series of digits and, if successful, is then asked to repeat a longer series.
The performance section of the WISC-III contains the following subtests: the Picture Completion test, in which the subject is shown a series of pictures, each with a missing part that the subject is asked to identify; the Picture Arrangement test, in which the subject is presented with a series of pictures in a mixed-up order and must arrange them in a logical order that tells a coherent story; the Block Design test, in which the subject is presented with blocks that have red and white faces (somewhat like a Rubik's cube) and must arrange them to reproduce a number of different designs displayed on cards; the Object Assembly test, which is like a child's puzzle in which the subject must fit the pieces together correctly; the Coding test, in which the subject is provided with a series of nonverbal symbols that must be copied correctly in a space below each symbol; and the Mazes test (a supplementary test that is not generally used in the standard WISC-III), in which the subject must correctly trace a path through a series of mazes.
Validity: Simply, do these tests measure what they claim to measure? One argument is that since intelligence is such a diverse set of abilities (refer to Gardner's ideas on intelligence), no single test can hope to cover all aspects of what we may regard intelligence to be. Another issue is that the tests outlined above assume that intelligence is a fixed and global phenomenon; that is, they work with the idea that intelligence affects all aspects of your functioning in a predictable and static way. Well, what if this is not the case? For example, we know that people can score low on maths tests but do complex calculations when shopping and in other everyday settings (Cumming and Maxwell, 1999). Thus there may be a mismatch between test scores and ability.
Factors that affect the reliability of tests: comprehension of the questions, presence of the tester, motivation and self-efficacy with regard to the test, and previous educational experience.
Issues regarding ethnic groups: Much controversy surrounds the issue of differences in IQ scores across different ethnic groups. There have been criticisms regarding whether IQ tests are culturally fair, and modern IQ tests have struggled to eliminate such biases. It is also argued that IQ tests are culturally bound, that is, they reflect a Western view of intelligence. However, even within Western societies there are differences in IQ scores between ethnic groups. Although individuals from all ethnic groups can be found at all levels of IQ, the mean IQ of white Americans is higher than that of black Americans. The APA (1996) reports that this result is not due to differences in socio-economic status or to obvious biases in test construction. Further, it states that there is no evidence to support a genetic interpretation of these findings, but the reason for such differences is not known. In any discussion of such differences it is crucial to recall what IQ tests actually measure. Neisser (1997, p. 1) states that IQ tests 'tap certain abilities that are relevant to success in school and do so with remarkable consistency. On the other hand, many significant cognitive traits - creativity, wisdom, practical sense, social sensitivity - are obviously beyond their reach'.