Noesis

 

"The Making of a Scientist"

by Dr. Anne Roe
Dodd, Mead, & Company
New York
1952, 1953

A Book Review
by Robert N. Seitz, Ph. D.
July 15, 2001

 

Introduction


    Dr. Roe's "The Making of a Scientist" is a widely cited study of 60 eminent U. S. scientists whose careers, as reviewed in this book, fell within the first half of the twentieth century. These would have been men who were born primarily in the interval between 1890 and 1920. What's most widely quoted about her study are the scores she obtained on "IQ tests" that she administered to them. She gave them three types of tests: verbal, mathematical, and spatial. She then defined three different kinds of "IQ" for the three tests: a "verbal IQ", a "mathematical IQ", and a "spatial IQ". I've put "IQ" in quotation marks because the measurement of adult IQ's above, perhaps, a ratio IQ of 160 has been a Holy Grail of psychometrics. I wondered what kind of IQ test Dr. Roe found that would take these men's measures at levels at or above a "verbal IQ" of 177, and up to a "mathematical IQ" of 194. To make matters more extraordinary, she gave her 39-question mathematical test "only to the biologists and the social scientists". She continues, "I tried it on a few of the physicists just to see. It bothered one of them, but the others sailed right through, making an occasional careless mistake. The test was obviously not difficult enough for them and a waste of their time."
    The median "verbal IQ" for them was 166, with a range from a "verbal IQ" of 121 to a "verbal IQ" of 177. She observes that the 177 high score on this test is probably less than the high scorers could have gotten on a test with a higher ceiling. The 177 "verbal IQ" represents a raw score of 75 out of 79 test items, so the highest scorers were bumping their heads on the ceiling of this test, which would probably have been in the low 180's.
    Their highest score on the spatial test was a "spatial IQ" of 164, with a median "spatial IQ" of 137. The score of 164 corresponds to a raw score of 22 out of the 24 questions on the test, so again, the test takers were bumping their heads on the ceiling of the test.
    Their "mathematical IQ's" ranged from 128 to 194, with a median "mathematical IQ" of 154. The top "mathematical IQ" of 194 equated to a raw score of 27 out of 39 questions on the test, so this test had plenty of headroom.
    So let's see now. It has been argued that although children's IQ's fail to fit a Gaussian distribution, this is because children have unequal mental growth rates, but adults fit a Gaussian distribution curve. Now an IQ of 194 (standard deviation of 16) occurs with an expected frequency of 1 in 500,000,000 people (the 99.9999998th-%tile), or if her tests had a standard deviation of 15, then with an expected frequency of 1 in 5,000,000,000 (the 99.99999998th-%tile). At least two of Dr. Roe's biologists and psychologists scored at the 194 level on this test, with a raw score of 27 out of 39. Given a standard deviation of 16, that would have made them arguably the two brightest mathematical minds in Western civilization in 1952--or whenever they took Dr. Roe's test. But wait! The biologists and the social scientists are the second string! The physicists "sailed right through, making an occasional careless mistake." So what does that make their mathematical IQ's? 220? 240? Reason reels! So you can see why I was intrigued with the details of these widely quoted IQ values.
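    The rarity figures quoted above can be checked with a short calculation. The sketch below is only an illustration: it assumes a strictly Gaussian distribution with a mean of 100, and the function names `upper_tail` and `one_in` are made up for this example.

```python
import math

def upper_tail(iq, mean=100.0, sd=16.0):
    """Fraction of a normal population scoring at or above `iq`.

    Assumes a strict Gaussian with the given mean and standard deviation.
    """
    z = (iq - mean) / sd
    # erfc stays accurate far out in the tail, where 1 - cdf would lose precision.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def one_in(iq, sd):
    """Express the rarity of `iq` as 'one person in N'."""
    return 1.0 / upper_tail(iq, sd=sd)

for sd in (16.0, 15.0):
    print(f"IQ 194, SD {sd:g}: about 1 in {one_in(194, sd):,.0f}")
```

    Under these assumptions the result comes out near 1 in 500 million for a standard deviation of 16 and near 1 in 5 billion for a standard deviation of 15, in line with the round numbers given above.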
    Dr. Roe might have argued that these scores were only one of three factor-test results whose scores would have to be combined to yield something like a general IQ score. But even so, the idea that 40% to 50% of the world's reservoir of mathematical aptitude was concentrated in the U. S. in 1950 or 1951 is a little hard to swallow. Then when you crank in the fact that these were just the runners-up---that the first team was well ahead of this group---you realize that either the test didn't measure what it was purported to measure---namely, a deviation IQ of 194---or the 194 was a ratio IQ, or both. (The 194 "mathematical IQ" score was achieved by solving only 69% of the problems on the test, so there was still plenty of ceiling left for the physicists who "sailed right through, making an occasional careless mistake". How could she have presented this story with a straight face?) What I don't understand is why neither she, nor apparently anyone else, ever challenged or investigated these claims. To me, this is what science is about, and how science advances. Nor are Dr. Roe's incongruous numbers unusual. Over the next few weeks, I'll be highlighting several other areas that also seem to me to raise significant questions, and to stake out needed studies (and perhaps, opportunities for discovery, or at least, for clarification) vis-a-vis what's going on in this field of intelligence testing.
    To give a brief example, the Flynn Effect is impossible if IQ is primarily determined by heredity.
    To compound the issue, the Flynn Effect has occurred only, or virtually only, in fluid g and not in vocabulary, arithmetic, or general information. Scores on the Raven Progressive Matrices (RPM) have risen in England by 47 points over a 90-year interval, or by 63.5 points (121/74) as measured by the IQ tests of 1900, had they existed, or by 39 points viewed from 1990. In other words, a sample of average (IQ = 100) test takers born in 1967 and taking the RPM in 1990 would have gotten a 163.5 IQ on it as measured by the norms of 1900. Conversely, someone born in 1877 taking the RPM today would have scored at the same level as one of today's borderline retarded with an IQ of 61. But this would only occur in tests assessing pattern recognition and the eduction of relationships. The Britisher born in 1877 would have had a vocabulary, an arithmetic capability, and a fund of general information approximately equal to those of today's Britisher born in 1967, who scores so dramatically higher on inferential and pattern discernment tests than would his equivalent born in 1877. But in a homogeneous cultural milieu such as Britain, there should be no major difference between fluid intelligence (g or Gf) and crystallized intelligence scores. Given a common culture, the person with the higher g will learn more and will use it more proficiently than the person with a lower g score. Crystallized intelligence will closely track fluid intelligence. Dr. Arthur Jensen, in "The g Factor", pg. 124, says of this,
    "Gf and Gc typically emerge as higher order (usually second-order) factors in any large collection of tests given to a highly heterogeneous subject sample in terms of educational or cultural background. In factor analyses based on groups that are quite homogeneous in these respects, such as schoolchildren of the same age and socio-cultural background, Gf and Gc often are not clearly differentiated and amalgamate into a single general factor. But in the general population, Gf and Gc are clearly discerned, and the distinctions that Cattell makes between them are valid. The major exception is Cattell's prediction that the heritability of Gf is greater than that of Gc. Although this may be true in linguistically or culturally [heterogeneous] samples for which some of the Gc-loaded tests may be inappropriate or culturally biased measures, the usual finding is that Gf and Gc have about the same heritability. In fact, the heritability of scores on scholastic achievement tests is about the same as that on the best tests of Gf. In terms of Cattell's investment theory, one could say that persons' standings on tests of Gc quite closely reflects that amount of Gf they had to invest in the kinds of content that typically compose highly Gc-loaded tests."
    In other words, Gf shouldn't be able to rise without a corresponding rise in Gc---and certainly not by 63 points. But it has. Something must be wrong somewhere in this chain of logic.
    In fairness to Dr. Roe, it should be mentioned that she says (pg. 159),
    "I was not particularly concerned at the outset over the fact that I had no norms for this test. That is, I had no idea what any other population would do on the test. I just assumed that eminent scientists were extremely bright people, and I did not particularly care just how bright they were. What I wanted to know was whether there was a pattern in the relative standings on these tests for any group, and if so, how these patterns compared. That is, I wanted to know if one group of scientists tended to be relatively high on one test and relatively low on another. The tests, though of different factors and different numbers of items, could be compared directly for any person or group, by converting the raw score (in this case the number of items answered correctly) into what is known as a standard score. The name refers to the standard deviation, a statistical measure which is useful in computing the score. This score gives you the position of the subject with respect to the average and distribution of all the scores in his group. If his standard score is 0, it means that he scores exactly at the average of his group; if his standard score is -.5, it means that he is at such a position below the average of the group that only one-third of the group got a lower score; if his standard score is +1.0, it means that only one-sixth of the group scored higher than he*." (The standard score is the score measured in standard deviations... e. g., 2 standard deviations.)



"* This assumes a standard distribution of scores on the test, which will be near enough the case on this type of test with a large group."
    What? These are precisely the very-high IQ's which deviate dramatically from a normal distribution. This assumption seems to me to beg the question.
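    For readers who want to see the standard-score arithmetic Dr. Roe describes, here is a minimal sketch. The raw scores are hypothetical (they are not her data), the helper names are invented for this illustration, and the population standard deviation is used on the assumption that the group is scored against itself. The fractions come from a normal curve, which is exactly the assumption her footnote makes.

```python
import statistics
from statistics import NormalDist

def standard_scores(raw_scores):
    """Convert raw scores to standard scores relative to their own group."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # assumption: the group is its own reference population
    return [(x - mean) / sd for x in raw_scores]

def fraction_below(z):
    """Share of a normally distributed group falling below standard score z."""
    return NormalDist().cdf(z)

# Hypothetical raw scores on a 79-item test, for illustration only.
hypothetical_raw = [41, 48, 55, 60, 64, 66, 70, 75]
for raw, z in zip(hypothetical_raw, standard_scores(hypothetical_raw)):
    print(f"raw {raw:2d} -> standard score {z:+.2f}")

print(f"below -0.5: {fraction_below(-0.5):.3f}")      # roughly one-third
print(f"above +1.0: {1 - fraction_below(1.0):.3f}")   # roughly one-sixth
```

    Under a normal curve, roughly 31% of a group falls below a standard score of -0.5 and roughly 16% falls above +1.0, which is where the "one-third" and "one-sixth" in the quotation come from.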



    She also writes (pg. 156),
    "It was assumed for some time that a person's IQ was a fixed part of him, like complexion or eye-color, and that, except in extraordinary instances, it did not change. This was what was called the constancy of the IQ. The term was first used by Terman and although he pointed out at that time that about half of the children he examined had shown changes in IQ (from 1 to over 20 points) this tended to be overlooked.
    "Our ideas on the nature of intelligence and on the constancy of the IQ have changed. There is, however, more agreement on the latter point than the former. We know now that the IQ is not constant in the sense we used to think of it, but that there are many things that may affect it, and that particularly in the very early years, we cannot effectively predict what any individual's IQ will be 10 years later. On the other hand, by the age of 7 or 8, we can get about as good an estimate as we are ever likely to, but we cannot be sure that environmental or emotional influences won't alter it to a greater or lesser extent. Shifts after that time, however, are under most circumstances sufficiently small that the measurement of intelligence is a very useful technique."
    Seeking a test that had sufficient ceiling to test her 60 eminent scientists, Dr. Roe says,
    "I could find none that seemed to me to be difficult enough for the group I proposed to test. In psychological jargon, they did not have enough ceiling."
    It's interesting to note that she must have ruled out the CMT-A and the CMT-T, as well as the Wechsler-Bellevue test. (The latter is only recommended for IQ's up to two standard deviations above the mean, although its official ceiling is 3 2/3rds standard deviations above the mean, and psychometrists sometimes extrapolate its results to scores even higher than that.) She continues,
    "I took my problem to the Educational Testing Service... After some consultation, they pulled out a lot of difficult items from their files and made up the verbal test. The spatial test is part of another test, and the mathematical test is an abbreviation of a special test they constructed for one of the military services during the war. All were given with arbitrarily set time limits."
    The Educational Testing Service tends to score its tests upon a percentile basis. Converting percentile scores to IQ scores by reading them off a Gaussian normal distribution makes the implicit assumption that IQ's are normally distributed. But they're not, nor are childhood and adult heights.
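    The percentile-to-IQ conversion at issue can be sketched as follows. This is only an illustration of reading a score off a Gaussian curve. The mean of 100 and standard deviation of 16 are assumptions, and the function name `percentile_to_iq` is made up here; nothing in the book says which constants, if any, were actually used.

```python
from statistics import NormalDist

def percentile_to_iq(percentile, mean=100.0, sd=16.0):
    """Read an IQ off a normal curve for a given percentile rank (0 < percentile < 100).

    The normality assumption is built in: the answer is only as good as
    the claim that IQ's really follow this curve at the percentile in question.
    """
    return NormalDist(mean, sd).inv_cdf(percentile / 100.0)

for pct in (50, 84.13, 97.725, 99.9):
    print(f"{pct}th percentile -> IQ {percentile_to_iq(pct):.1f}")
```

    The conversion is purely mechanical: any percentile maps to some IQ, whether or not people at that level actually occur at the frequency the curve implies.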
Childhood IQ's and Non-Gaussian Distributions
    Both childhood and adult heights are approximately normally distributed near the average adult U. S. male height of 5' 9", but extreme heights occur at a far higher rate than a normal curve would predict. A normal curve predicts that an adult male height of 5 feet or less should be expected to occur about once in every thousand men. It predicts that an adult male height below 4 feet should occur at a rate of only one per billion, and an adult male height below 3 feet would be absurdly impossible.
    A similar situation exists with respect to children's IQ's (and perhaps with adult IQ's as well). Only one IQ of 200 or above should occur on this planet. In practice, IQ's of 200 occur about once among every 500,000 children. For example, one child with an IQ of 200 was identified in the 1921-1922 Terman Study during the screening of 250,000+ California schoolchildren. More to the point, they would have expected, perhaps, one child with an IQ of 170. Instead, they turned up 77 of them! An IQ of 180 or above has an expected frequency of occurrence of 1 in 3,500,000, so they wouldn't have expected to find any IQ's of 180+ during the Terman screening. In fact, they unearthed 26 of them, or about 300 times the number they would have expected. Among her 12 children with IQ's above 180, Leta Hollingworth found one with an IQ of 199, (Child L), one with an IQ of 200 (Child K), and one with an IQ of 200+ (Child F). Four children with IQ's of 200+ were found in the 1940's among the Quiz Kids who lived in the greater Chicago area, Richard Williams, Joel Kupperman, Lonnie Lunde, and Ruth (Duskin) Feldman. Miraca Gross found four in her study of the severely gifted in Australia, including one, Adrian Seng, with an IQ of 220. Marilyn vos Savant's 10-year-old IQ of 228 and other IQ's that are significantly above 200 would be totally impossible if IQ's were distributed in strict accordance with a Gaussian normal curve.
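    The comparison between expected and observed counts in the Terman screening can be reproduced roughly as below. This sketch assumes a Gaussian with mean 100 and standard deviation 16, a screened population of 250,000, and that each count refers to children at or above the stated IQ; the observed counts are the ones quoted above, and the names are illustrative.

```python
import math

def upper_tail(iq, mean=100.0, sd=16.0):
    """Fraction of a normal population at or above `iq` (Gaussian assumed)."""
    z = (iq - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

SCREENED = 250_000                  # approximate size of the Terman screening
OBSERVED = {170: 77, 180: 26}       # counts quoted in the text above

def expected_count(threshold):
    """Number of children at or above `threshold` that a normal curve predicts."""
    return SCREENED * upper_tail(threshold)

for threshold, found in OBSERVED.items():
    exp = expected_count(threshold)
    print(f"IQ {threshold}+: expected {exp:.2f}, observed {found}, "
          f"ratio about {found / exp:.0f}x")
```

    With these assumptions the normal curve predicts between one and two children at 170 or above and well under one child at 180 or above, so the observed counts exceed expectation by a factor of tens to hundreds.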
Bottom Line:
    Extreme heights, and extreme IQ's, occur much more frequently than a normal bell curve would predict.

    Getting back to our book, Dr. Roe continues,
    "I made an attempt to get some graduate students to take the same test, just as a matter of general interest, but succeeded in getting only 10, and under circumstances which made it impossible to judge how they had been selected. I then dropped the idea of getting any comparison group... ... ... I had the great good fortune then to meet an old acquaintance, Dr. Irving Lorge, who came to my rescue and arranged to give the test to all students matriculating at Teachers College, Columbia for a Ph. D. that February. All of their Ph. D. students have to take a battery of tests. This test would be included in the battery. Since the other tests had been well standardized it would then be possible to draw up tables of equivalents by which scores on the VSM could be converted (within certain limits of assurance) to scores on these other tests. This, incidentally, upset my budget considerably."
    For me, three questions arise.
    (1) Wouldn't the other tests in Columbia's battery of Ph.D. exams be subject-matter preliminary exams? Would they have included an IQ test? If so, what IQ test would have had sufficient headroom to properly encompass those Ph. D. students?
    (2) How could she have drawn up tables of equivalents that would have allowed her to convert scores on her VSM to scores on other tests? What tests could have gone high enough?
    (3) Could these other tests have been school-administered IQ tests? If so, what about regression to the mean? And above all, we're given no real information about this validation of her VSM. How many Columbia Ph. D. students took the VSM?
The Verbal Test
    She continues, explaining that the verbal test consisted of 80 items, but one was dropped because it didn't discriminate at all well, leaving 50 questions in the first section, and 29 questions in the second section. In the first section, out of four words, you were to pick out the two that were most nearly opposite in meaning. Here is one of the questions: 1. Predictable  2. Precarious  3. Stable  4. Laborious
    "In the second section, the task was the same, but it was presented a little differently. This time, one of the opposites was given and the task was to pick one of five other words which was most nearly opposite to the first one. Here is an example of that group."
    ABSOLUTE:   1-forget  2-usurp  3-absolve  4-utilize  5-limit
    The lowest scores were made by the experimental physicists, with a range of 8 to 71, and an average score of 46.6, the lowest of any of the groups. The highest scores on this test were made by the theoretical physicists, with a range of 52 to 75, and an average of 64.
IQ's of Various Collegiate Groups
    "Now let us look at IQ's of college populations of today. Embree found that 1,200 high school graduates who went to college had been found during childhood to have a median IQ of 118; those who graduated with a B. A., an IQ of 123. Honor graduates had a median IQ of 133 and those elected to Phi Beta Kappa of 137. The range of IQ's for all of those who received degrees was from 95 to 180. For persons who went on to take a Ph. D., Wrenn found a median IQ of 141.
    "It is clear, then, that so far as verbal ability is concerned these eminent scientists are on the average higher than the general run of those who get Ph. D's, but, and this is very important, some of them are not as high as the average Ph. D. It is, then, not essential to have this ability at the highest level in order to become an eminent scientist. That it is doubtless a great help is another matter, but it should be remembered that it is less helpful in some fields than in others."
The Spatial Test
    The spatial test consisted of 24 items, with 20 minutes to solve them. Depicted below are three practice items for this test.


 

   Once again the theoretical physicists led the pack, although this time, the experimental physicists were right behind them. As mentioned above, scores ranged from "spatial IQ's" of 123 to 164, with a median score of 137. One interesting and unexpected discovery was that the "spatial IQ's" of the 60 eminent scientists declined significantly as a function of increasing age. The correlation coefficient was -0.40. Reviewing it today, we might say that these declines in "spatial IQ's" reflected declines in fluid g, or the Flynn Effect, or both.
The Mathematical Test
    Dr. Roe explains that the mathematical test "was taken from one which the Educational Testing Service had developed for a special project. The original was too long for my purposes, so we selected portions of it, omitting some of the easiest items, and then deleting other items of varying levels of difficulty. There were 39 items in the final form, and 30 minutes was allowed for work on it. The items were generally of the type known as mathematical reasoning, and an example is given below.
    "Select the correct answer:
   "If  x + 3 y = 7 x + 5 y, x/y = ?

   "(A)  -3    (B)  -1/3    (C)  -1/9    (D)  1    (E)  3.

   "If you did it properly, you underlined A."
 

    Whoops, Dr. Roe! The correct answer is B (-1/3), as we can see by solving the problem. Subtracting x + 3 y from both sides of the equation gives us,

    0 = 6 x + 2 y.

    Dividing both sides by 2 yields 3 x + y = 0, or 3 x = -y, or x = -y/3 or, dividing both sides by y, x/y = -1/3.
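    The algebra above can also be confirmed by brute force: substitute each answer choice for x/y and see which one satisfies the original equation. This is just a sanity check using exact fractions; the helper name `satisfies` is made up, and y is set to 1 for convenience (any nonzero y would do, since only the ratio matters).

```python
from fractions import Fraction

def satisfies(x, y):
    """True if x + 3y = 7x + 5y holds exactly."""
    return x + 3 * y == 7 * x + 5 * y

# The five answer choices offered for x/y in the sample item.
choices = {
    "A": Fraction(-3),
    "B": Fraction(-1, 3),
    "C": Fraction(-1, 9),
    "D": Fraction(1),
    "E": Fraction(3),
}

y = Fraction(1)  # any nonzero y works; the equation only fixes the ratio x/y
correct = [label for label, ratio in choices.items() if satisfies(ratio * y, y)]
print(correct)
```

    Only choice B passes the check, in agreement with the derivation.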

    Dr. Roe says,
    "Let us look at the equivalents on this test. It is not correlated with age (the correlation coefficient is .00). The lowest score on this test is about equivalent to an IQ of 123, the median score is an IQ of 154 and the highest to an IQ of 194. That is very high indeed. Mathematical ability is certainly important for work in physics, but it seems it can also be important in some other sciences, particularly biology and psychology. The two highest scores attained by biologists were made by geneticists."
    "If we examine the correlations between the tests, we see that it is true in this instance that ability to do the mathematical test is not related to ability to do either of the other tests. The correlations are .14 and .21 which are not significant with these groups. There is, however, some correlation between the spatial and verbal tests. The coefficient is +.33, and that is high enough to indicate that some relationship exists. It is not close, but it points up one of the difficulties with the spatial test. That is, it can be done in different ways. Those who do it extremely well do it for the most part without much conscious reasoning about it. They can tell the answer 'just by looking' at the figures and imagining them turned around in various ways. Some of the others, however, are able to do it fairly well by talking to themselves about it, and it is through such circumstances, I think, that the relation with the verbal test comes in."

     To me, Dr. Roe's "verbal IQ's", "spatial IQ's", and "mathematical IQ's" are being quoted and bandied about as though they were gospel, whereas in reality, there's fine print associated with them that, I think, has gotten lost over the years and in the translation. Also, as I read the tea leaves, something doesn't add up in the original IQ numbers.