by Dr. Linda Silverman and Kathy Kearney
Used by Permission
The Stanford-Binet Intelligence Scale (Form L-M) is the only measurement tool we have that can adequately assess extraordinarily gifted children; yet, it is in danger of extinction. Newer tests, newer conceptions of intelligence, and newer normative samples make the "old Binet" appear hopelessly antiquated. Among the misunderstandings about testing is the widely held belief that newer is better, with no consideration of the fact that a test may be better for some populations but not for others. In this article, we will discuss the deficits of the old Binet as well as its strengths, problems with the use of newer instruments for the exceptionally gifted population, and recommendations for when it is advisable to administer the Binet L-M.
Why We Need the Old Binet
Before those of you who are knowledgeable about testing begin, "But the L-M is so...," we hasten to assure you that we are well aware of myriad flaws in the old Binet: it is sexist, morbid, outdated; men no longer make $20 per week; it uses terms that children no longer hear and describes experiences children no longer have; it has 20-year-old norms; it is highly verbal; it generates only one global IQ score; before 1972, the normative sample was entirely Caucasian; specific strengths and weaknesses cannot be compared easily; it is not user-friendly--in fact, it's a nightmare to learn to administer; scoring and interpretation require subjective judgment; it is so old that institutions of higher education are no longer teaching graduate students how to administer it and school districts are discarding it. So why bother? We bother because it is all that we have. We have worked with over 200 children who test above 160 IQ on the Stanford-Binet (L-M). Some of them were misunderstood by both their families and communities and assigned to totally inappropriate school placements until they were assessed appropriately and found to be exceptionally gifted and out of place. It has been key for both the family and the school to realize how gifted these children really are, and how different they are from their chronological age-peers and even from other, more moderately gifted children.
Within the top 1% of the IQ distribution, then, there is at least as much spread of talent as there is in the entire range from 1st to 99th percentile. Moreover, those we might call the "supergifted," (those with IQs 4 or more standard deviations above the mean) tend to be as unlike the "garden-variety gifted" (with IQs 2 or 3 standard deviations above the mean) as the "garden-variety gifted" are unlike children with scores clustered within 1 standard deviation of the mean of the population. (Robinson, 1981, p. 71)
Further, Robinson points out "that there are many more truly exceptional young children in the population than would be predicted on the basis of the normal curve alone" (p. 73) and that children in the very highest ranges of intelligence "may not fare as well in many respects as those with more moderate gifts" (p. 75). Without the tools to find such children, the children themselves remain doubly at risk. Nothing has come along to replace the Stanford-Binet (L-M) for this particular population. We eagerly greeted every recently released new test and revisions of the old ones, hoping they would correct all the woeful aspects of the L-M, only to find that not one of them was designed with the highly gifted in mind. Why? First, in order for the testing industry to survive, it must focus all of its energies on creating tests that are as culturally unbiased and as marketable as possible. That is a tall order. Second, the tests must be excellent diagnostic tools for learning disabilities and retardation. Third, they must be easy to administer, not too time consuming, and applicable to the majority of the school population so that they will be economically feasible to produce (Hagan, in Silverman, 1986a).
In constructing a cognitive abilities test you are always faced with constraints. You have to produce an instrument that will adequately appraise the full range of individual differences in a chronological age group from the very slowest level of development to the most rapid. At the same time, you have to produce an instrument that can be administered fairly easily and within a reasonable amount of time. The compromise is to produce an instrument that is most effective in the range of .4 s. d.'s: therefore you can't use tasks that are successfully completed by 99.99 percent of an age group or that are failed by 99.99 percent of an age group. In the construction of the Binet [Revision IV], I was working with some nonverbal items that could only be solved by children who were in classes for the gifted. You can't put items like that in an intelligence test because they aren't functional for a wide enough group. (p. 171)
This helps to explain why newer tests, like the Stanford-Binet Fourth Edition or the WISC-III, are inadequate for highly gifted children. When an item can be solved only by children enrolled in a gifted class, it is removed from the test. Differentiating exceptionally from moderately gifted children was never a goal of current test makers.
Deflation of Scores in the Gifted Range
We have been blithely going along using all of the newer instruments to make placement decisions about gifted students without paying any attention to the lack of representation of these students in the normative samples. Few studies of the gifted are reported in the technical manuals and no studies of the exceptionally gifted appear at all. The Stanford-Binet: Fourth Edition was originally going to provide scores only to 148, since there were not enough highly gifted children in the normative samples to warrant printing norms beyond that point (E. Hagan, J. Sattler, R. Thorndike, personal communication, 1985). The norms beyond 148 had to be extrapolated, which Thorndike was very reluctant to do. The test was designed for children within 3 standard deviations of the mean. The same can be said of the WISC-R, WISC-III, and the K-ABC. It is extremely difficult to attain a composite or Full Scale score above 150 on any of these tests. In order to fit IQ scores into the normal curve of distribution, the scores in the highest ranges have been systematically depressed for the last 2 decades (Silverman, 1989). A young child scoring 160 on the 1960 norms of the Stanford-Binet (L-M) would score approximately 129 on the WISC-III! This is a loss of 31 IQ points in 31 years, almost 2 standard deviations of intelligence. Scores for highly gifted children dropped 10 to 14 points from the 1960 norms of the L-M to the 1972 norms. Another 13.5 points on average were lost for moderately gifted children between the 1972 L-M norms and the 1986 norms on the Stanford-Binet: Fourth Edition. The Fourth Edition correlated closely with the WISC-R for children in the 116 range (surprisingly labeled "gifted" in the technical manual). The WISC-III manual reports that scores in the gifted range average 5 to 6 points lower than on the WISC-R. 
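The size of this loss can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes a standard deviation of 16 for the Stanford-Binet (L-M), which is not stated in this article, and takes 1991 as the WISC-III norm year from the reference list. The function name `score_deflation` is hypothetical.

```python
# Assumption (not stated in the article): the Stanford-Binet (L-M)
# deviation IQ has a mean of 100 and a standard deviation of 16.
SD_BINET_LM = 16.0


def score_deflation(old_score, new_score, old_year, new_year, sd=SD_BINET_LM):
    """Express a drop between two scores in points, SD units, and points per year."""
    points = old_score - new_score
    years = new_year - old_year
    return {
        "points": points,
        "sd_units": points / sd,
        "points_per_year": points / years,
    }


# The article's example: 160 on the 1960 L-M norms versus about 129 on the WISC-III.
example = score_deflation(160, 129, 1960, 1991)
print(example)  # {'points': 31, 'sd_units': 1.9375, 'points_per_year': 1.0}
```

Under that assumption the drop works out to roughly 1.94 standard deviations, which matches the "almost 2 standard deviations" described above, at an average of one point per year.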
The average Full Scale score on the WISC-III of 38 children who were independently identified as gifted on other measures was 129, low enough to just miss the cut-off score for most gifted programs! "Five of these 38 children obtained FS [Full Scale] IQ scores less than 120 on the WISC-III" (WISC-III Technical Manual, p. 210). Instead of taking these enormous losses seriously, the deflation is waved away in the technical manual in one sentence: "These differences are expected because the WISC-III norms are more contemporary than WISC-R norms" (WISC-III Technical Manual, p. 211). However, the discrepancies cannot be explained away simply in terms of the entire population getting brighter over time. The rise in intelligence in the general population is reflected in differences in scores in the average range of only 8 or 9 IQ points during the same time period. Differences in the gifted range are more than 3 times the differences in the average range (Silverman, 1989). For the highly gifted range, the situation is even worse.
Seven of the children in the Maine group who had been tested on the WISC, WISC-R, WPPSI, or K-ABC intelligence tests scored between 139 and 155, with only two scoring above 145. They were then given the Stanford-Binet Intelligence Scale [Form L-M]. On this test, these same children scored between 169+ and 194. One child's score showed a discrepancy of more than 50 points between the K-ABC and the Stanford-Binet (143 as opposed to 194); another had a similar discrepancy between the WISC (139) and the Stanford-Binet L-M (187+). In the Colorado group, similar discrepancies were found for the six children who had been tested on both the WISC-R and the Stanford-Binet L-M. Only one child in the 170+ range scored above 150 on the WISC-R, and another scored as low as 135.
Since the time that article was released, an additional child has been found who scored 182 on the Stanford-Binet (Form L-M) and 127 on the Stanford-Binet: Fourth Edition. Another scored 137 on the WISC-R, and a year later tested 229+ on the Stanford-Binet (Form L-M), at the age of nine missing only two items on the entire test! This "test artifact" amounts to blatant discrimination against the highly gifted, and has major implications for the location of gifted students, and for their placement in programs. The situation is shocking, but no one appears to be paying attention because the highly gifted are not of central interest to test constructors. In contrast, the gifted and highly gifted were definitely important to Lewis Terman, who constructed the original Stanford-Binet. Among other things, Terman planned to use the test to find potential "geniuses," so he had an investment in creating a difficult enough examination with a high enough ceiling to permit their discovery.
The Structure of the Old Binet
Terman's Stanford-Binet was constructed in a different manner from its 1986 successor. Tasks are organized by age level from ages 2 to 14, with four additional adult levels culminating in the Superior Adult III level. The items at each age level are organized to tap different mental processes and to assess the child's flexibility in going from one type of task to another. By comparison, the Wechsler tests, the Stanford-Binet: Fourth Edition, and the Kaufman Assessment Battery for Children (K-ABC) are all organized in subtests. The child stays with one type of item until he or she reaches a ceiling (cannot accurately complete a certain consecutive number of questions). The rapid movement from one kind of task to another in the old Binet appears to keep children interested in the assessment, and, therefore, likely to do their best. Vernon (1987) notes that
Certainly tests of the Wechsler type have many advantages, but I believe that a strong case can still be made for retaining the L-M, with its apparently haphazard arrangement of items, since it gives the tester greater flexibility. I suggest that children below about 6 years have great difficulty with WPPSI and WISC in maintaining the same set throughout all the items in a particular subtest. In contrast, the shortness of the Binet items and their great variations in content help the tester to catch and hold the child's attention. (p. 253)
The Stanford-Binet (Form L-M) provides mental ages, which are no longer used in modern testing. One reason they have been abandoned is that they appear derogatory and invalid when applied to the functioning of retarded children. However, when they are applied to gifted children, parents and teachers have a greater understanding of why these children are bored with the regular curriculum and why their friends are often several years older. In addition, the mental age permits the extrapolation of both deviation and ratio IQs for the highly gifted range, which cannot be done with the newer tests. Perhaps the most paradoxical difference between the Form L-M and its successors is the fact that even though it produces a global IQ score, a child can attain the very highest level of the test on one or two skills alone, such as vocabulary or verbal reasoning or spatial orientation. He or she is not overly penalized by lack of fine motor coordination. All of the newer instruments that purport to be sensitive to different types of intelligence still produce composite or Full Scale scores which penalize children for every one of those intelligences that they do not demonstrate. One would have to be exceptional in all areas to obtain a score above 150 on a WISC-R, WISC-III, Stanford-Binet Fourth Edition, or K-ABC, while such a score can be obtained on an old Binet with just one or two major strengths.
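To make the link between mental age and IQ concrete, here is a minimal sketch of the classic ratio IQ, which divides mental age by chronological age and multiplies by 100. It is not the extrapolation procedure used in any scoring manual, it ignores any age adjustments that real scoring applies, and the child in the example is hypothetical.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Classic ratio IQ: (mental age / chronological age) * 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical child: chronological age 6 years 0 months (72 months),
# mental age 10 years 6 months (126 months).
hypothetical_iq = ratio_iq(126, 72)
print(hypothetical_iq)  # 175.0
```

Read this way, a mental age of ten and a half in a six-year-old tells parents and teachers directly that the child is reasoning like children four and a half years older.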
The Demise of the Binet L-M
The demise of the Stanford-Binet (Form L-M) began when its successor, the Stanford-Binet: Fourth Edition, appeared in 1986. From then on, psychologists looked askance at the use of the old Binet, primarily because it had "outdated norms." Ironically, these same psychologists continued comfortably using the WISC-R, even though the norms for the WISC-R were from 1974, only two years later than the 1972 norms for the Binet (L-M). Extremely gifted children and their families face unique and difficult academic, cultural, and social adjustment issues; indeed, this is a population that is truly "at risk" in many ways. Lack of academic challenge is rampant for these children in contemporary American schools. Highly gifted children must deal early and continually with marked discrepancies in development unknown to their average peers, the long-term consequences of which we still do not understand very well. (See the last issue of Understanding Our Gifted.) As early as 1930, Terman noted that "The child of 180 IQ has one of the most difficult problems of social adjustment that any human being is ever called upon to meet" (Burks, Jensen, & Terman, 1930, p. 265). It is safe to say that if any other special population of gifted children (or any other group of children, for that matter) were at risk in similar ways, we would use whatever effective tools were available in order to identify them and provide appropriate services for them. For this particular population, an older tool (the Stanford-Binet Form L-M) may well be more effective than newer ones. In today's schools, assessment of the gifted is often done only as a means for entrance into a gifted program. For extraordinarily gifted children, it is important to take a much broader view of assessment, since the concomitants of extreme intellectual giftedness markedly affect individual development in all areas, as well as affecting the culture and socialization of the family.
Therefore, we recommend the following:
(1) Entrance requirements for gifted programs should be lowered to 120 to take into account the lower norms on newer instruments.
(2) Gifted children should be tested initially with one of the more recent tests (Stanford-Binet: Fourth Edition, WISC-III, or K-ABC) solely to meet whatever requirements exist at their schools for entrance into gifted education programs.
(3) Whenever a child obtains three or more subtest scores at or near the ceiling of any current instrument (such as a 17, 18, or 19 on three or more WISC-R or WISC-III subtests), he or she should be retested on the Stanford-Binet (Form L-M).
In this case, the L-M is being used as a supplemental test to obtain further information about the child, and to tie that information to the 75-year research history regarding the extraordinarily gifted, which used this test and its predecessors extensively for identification. Using standard formulas (Pinneau, 1961), scores should be extrapolated for any child who scores beyond the norms in the manual, in order to obtain a rough estimate of the child's ability. Since a number of highly gifted children have dramatic weaknesses that may artificially depress IQ scores, parents should request administration of the L-M as a supplemental test whenever they suspect that the newer assessments have underestimated their children's abilities. Paradoxically, one of the common criticisms of the L-M is that it is too "verbally loaded." Yet for children whose greatest strength is their abstract and verbal reasoning ability, the L-M may be the best measure to capture this strength in early childhood, without having to wait until the age of 11 or 12 to take the verbal section of the Scholastic Aptitude Test (SAT) as part of the national talent searches. Vernon (1987) states that
There are two special groups for whom the L-M is often preferable to the Wechsler scale: the potentially gifted who are being considered for special classes or enrichment programs, and severely retarded children or adults. Neither the four verbal subtests in WISC or WAIS nor the four NS [Stanford-Binet: Fourth Edition] verbal subtests give as much opportunity as the L-M for gifted children to display their fluency, imagination, unusual or advanced concepts, and complex linguistic usage. (p. 256)
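Recommendation (3) above amounts to a simple screening rule, which can be sketched as follows. The threshold of 17 comes from the example given in that recommendation; the function name, the constant names, and the sample scores are hypothetical, and the sketch is no substitute for a clinician's judgment.

```python
from typing import Mapping

# Taken from the example in recommendation (3): scaled scores of 17, 18, or 19
# are treated as "at or near the ceiling" on WISC-R or WISC-III subtests.
NEAR_CEILING_SCORE = 17
MIN_CEILING_SUBTESTS = 3


def should_retest_on_lm(subtest_scores: Mapping[str, int]) -> bool:
    """Return True when three or more subtest scores are at or near the ceiling."""
    near_ceiling = [
        name for name, score in subtest_scores.items()
        if score >= NEAR_CEILING_SCORE
    ]
    return len(near_ceiling) >= MIN_CEILING_SUBTESTS


# Hypothetical profile: three very high subtests and one weak one.
sample_scores = {
    "Vocabulary": 19,
    "Similarities": 18,
    "Block Design": 17,
    "Coding": 9,
}
print(should_retest_on_lm(sample_scores))  # True
```

Note that the rule looks at individual subtests rather than the Full Scale score, so a child with one marked weakness, like the low score in the hypothetical profile, is still flagged for supplemental testing.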
Use of the Stanford-Binet (Form L-M) as a supplemental tool to identify highly gifted children means that appropriate intervention can be implemented in the critical early childhood and elementary years, providing a chance to avert academic and adjustment difficulties. It is best to administer the old Binet to children under the age of 12. Even at age 9, highly gifted children may surpass the ceiling on the Binet L-M, and a sufficient ceiling is necessary to capture the full strength of the child's abilities. We need to share these recommendations with school psychologists so that the old Binet kits are not discarded. The release of the WISC-III last August places the Stanford-Binet (Form L-M) in even greater danger of disappearing. It must be preserved so that it can be used as a supplemental test for the highly gifted; otherwise, we have no other similar assessment tools with the range and sensitivity necessary to distinguish these children from their more moderately gifted peers, until they are of middle school age and able to take the SATs as an out-of-level test. Perhaps our best hope in saving the old Binet lies in the fact that it is also more accurate in identifying children in the moderately and severely retarded ranges. The newer tests have both lower ceilings and higher floors, making them appropriate for children closer to the mean. But when children veer 3 standard deviations from the mean in either direction, the newer tests are of limited value. Vernon (1987) recommends that "psychologists who wish to continue using the third edition (Form L-M) with 2 - 6-year-olds, or with likely gifted children, should do so" (p. 257). We concur, and urge readers to share this article with all those who might be in a position to save the Stanford-Binet (Form L-M) from extinction.
References
Burks, B. S., Jensen, D. W., & Terman, L. M. (1930). Genetic studies of genius, Vol. 3: The promise of youth. Stanford, CA: Stanford University Press.
Pinneau, S. R. (1961). Changes in intelligence quotient: Infancy to maturity. Boston: Houghton Mifflin.
Robinson, H. B. (1981). The uncommonly bright child. In M. Lewis & L. A. Rosenblum (Eds.), The uncommon child (pp. 57-81). New York: Plenum Press.
Silverman, L. K. (1986a). An interview with Elizabeth Hagan: Giftedness, intelligence, and the new Stanford-Binet. Roeper Review, 8, 168-171.
Silverman, L. K. (1989, October). Lost: One IQ point per year for the gifted. Paper presented at the National Association for Gifted Children 36th Annual Convention, Cincinnati, OH.
Silverman, L. K., & Kearney, K. (1989). Parents of the extraordinarily gifted. Advanced Development, 1 (1), 41-56.
Vernon, P. E. (1987). The demise of the Stanford-Binet scale. Canadian Psychology/Psychologie Canadienne, 28 (3), 251-258.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children III Manual. San Antonio, TX: Psychological Corporation.
(Copyright) ©1992 (Silverman/Kearney)
For further reading on this subject, see "The Case for the Stanford-Binet L-M as a Supplemental Test," Roeper Review, September 1992, Section: Test Review.