by Nancy Diamond and Hugh Davis Graham

Planning for the National Research Council's (NRC) next study of research-doctorate programs in the United States, with publication expected in 2004, has highlighted disagreements over how quality should be measured. One side in the debate supports continued reliance on reputational surveys as the primary measure of quality. On the other side, advocates call for more objective measures of research performance, as demonstrated in publications, awards, prizes, and other indicators of scientific and scholarly achievement. Elite institutions favored by the traditional reputational method generally resist the use of more quantitative per capita measures that may favor newer, aspiring programs and universities. Like Republicans and Democrats arguing over which methods to use in conducting the Census, the champions of subjective and more objective methods know that the choice or mix of methods will significantly determine who benefits -- and who loses -- from the findings.

The commercial success of college and university rankings published annually by U.S. News & World Report and the 1995 publication of the NRC report, Research-Doctorate Programs in the United States (hereafter, Report), have intensified this debate. (1)  The Report contained a wealth of program data, including quantitative indicators of research output. At the same time, however, the NRC ranked faculty and programs exclusively by their reputational rating. This produced top-quartile lists and top-twenty bragging rights that necessarily disappointed many of the 274 institutions whose programs were included in the study. In the competitive academic marketplace, the stakes of this ratings game are high. Top-ranked research-doctorate programs, or those seen to be within striking distance of the top tier, may win increased funding, recruit nationally recognized faculty and talented students, and place their graduates in the academic job market. Conversely, low ranking can produce program decline and even termination. The prospect of another national NRC study, the first of the 21st century, has heightened interest in the planning process. Ambitious universities not previously accorded top-tier status are especially open to alternative methods that offer institutional challengers an opportunity, one less influenced by inherited hierarchies of status and prestige, to demonstrate their research achievement. In this article, reference to "rising" or "challenging" institutions denotes universities that were not ranked among the top 25 according to any of the four major national surveys since 1960.

In The Rise of American Research Universities (Johns Hopkins, 1997), we emphasized the importance and value of quantitative per capita measures of scholarly research over reputational surveys. (2)  Because that book charts the research development of more than 200 universities since World War II, we aggregated data at the institutional level at several points over time, rather than at the program level, where national studies sponsored by the American Council on Education (ACE) and the NRC have concentrated their analysis. (3)  In this article, we apply the per capita method to program-level data and compare the results with the NRC's reputational ratings of the research quality of program faculty. Our purpose is to test, at the program or department level, our book's dual finding: first, that quantitative per capita assessments confirmed the research excellence of most of the elite universities customarily found among the top 20 when judged according to reputation; and second, that per capita measures also demonstrated the superior performance of "rising" institutions, whose achievements often have been masked by the national surveys that ranked campuses according to reputation. On the basis of these comparisons, we offer specific recommendations for how -- and how not -- to rate research universities in the next NRC study.

The Strengths and Weaknesses of Reputational Ratings

Reputation surveys have dominated 20th-century assessments of American faculty and graduate education. Developed during the 1920s and 1930s through the pioneering work of Raymond Hughes, and advanced by Hayward Keniston in the late 1950s, reputational surveys won credibility for three reasons. (4)  First, these evaluations rested on the peer-review principle that scientific, scholarly, and artistic quality is best assessed by recognized experts in the field. Peer review thus represented a qualitative, holistic judgment that also could reflect quantitative measures of research performance. Since World War II especially, peer review has enjoyed wide respect among academics, as well as government, business, and foundation officials, as the most appropriate method for awarding appointments, promotions, tenure, research grants and contracts, and prizes.

Second, the crucial assumptions underpinning peer review -- that the rater is an expert who knows the body of work or persons being assessed -- were reasonably met during the early and middle decades of the 20th century when reputational ratings became the primary evaluation method of the major national studies. Doctoral education prior to World War II was dominated by the prestigious members of the Association of American Universities (AAU), a group of 14 founding campuses whose ranks increased to only 30 institutions in 1940. Even in 1960, the Council of Graduate Schools (CGS), representing institutions that granted 95 per cent of all Ph.D.s, had only 100 member universities. In this still relatively small world of graduate study, the teaching function of doctoral education largely coincided with its research function. Doctoral programs were housed in traditional academic departments, where the faculty generally knew the work of their disciplinary colleagues on other American campuses.

Third, in the absence of alternative, more objective methods of measurement, this legacy of rater familiarity with the research of faculty in their disciplines lent credibility to subjective ratings. Not until the late 1960s and early 1970s did the reporting of federal research funding and developments in electronic data processing, most notably in citation indexing, offer opportunities to measure individual and institutional research output directly, rather than indirectly through the filter of reputation. (5)  At the same time, however, the development of quantitative measures, together with American higher education's dramatic expansion in the 1960s, and the larger revolution in communications and research networks, rapidly undermined the institutional arrangements that had earned early respect for reputational ratings.

The resulting criticism of reputational assessments generally rests on two grounds. One is based on research in the psychology of human perception, while the other, accelerating in its impact, is based on the rapidly changing research environment of the post-Sputnik era. The first body of criticism, duly noted in the NRC Report, emerged from the development of survey research in the 1950s and 1960s. It demonstrates that reputation surveys are biased by a halo effect that lifts the reputations of departments and programs with academic stars, and of those located on prestigious campuses. (6)  Additionally, reputation ratings are biased in favor of large programs. Raters who recognize three published scholars in a department of forty faculty tend to rate it higher than a department of twenty where only two are recognized. (7) 

A second line of criticism, less recognized though more damaging to the validity of reputational ratings, is based on changes which have undermined the very premise that legitimated reputational surveys in the first place. Driven by the defense research imperatives of the Cold War, the unprecedented growth of the American economy, the demographics of the baby boom, and technical advances in communications, the revolution in knowledge creation has radically rearranged our research environment. We have witnessed this great transformation in our lifetimes, and our careers have been enriched by it. Yet we are so intimately caught up in its processes that we need to step back and consider the impact of these changes on the assessment of research achievement.

What are the chief attributes of this transformation? Perhaps most important, research became increasingly specialized, widening the spectrum of inquiry and deepening its penetration. Knowledge creation also grew increasingly interdisciplinary, with a resulting fragmentation of our disciplinary communities. By the 1980s and 1990s, as American universities conferred between 30,000 and 40,000 new Ph.D.s annually, the number of qualified researchers exploded, and quality research spread to second- and third-tier institutions. Research institutes proliferated, as did new scientific and scholarly associations and journals. The entire apparatus of research communications and infrastructure was internationalized. Interdisciplinary research was furthered as the internet and electronic mail made publication and research collaboration instantaneous.

The Peer Review Disconnection

These changes have produced important consequences for the evaluation of research-doctorate programs. Most significant has been a profound split between the university's discipline-based organization for graduate training on the one hand, and the interdisciplinary research networks on the other. As the American research university enters the 21st century, its department-based teaching is still grounded in a horizontal structure that is resistant to change. Departments hire faculty to cover the main subfields of the disciplinary terrain and attend to important organizational routines -- such as promotion and tenure decisions and graduate and undergraduate teaching obligations -- requirements that fix faculty firmly within these traditional arrangements. Our large, discipline-based professional associations continue to publish directories that list faculty rosters by department, and the reputational surveys reflect such arrangements. In 1993, for example, the NRC used a disciplinary focus, asking more than 16,000 respondents to rate the scholarly quality of the faculty in some fifty departments in their fields. (8) 

At the same time, faculty research networks that have become increasingly vertical no longer correspond to this horizontal department organization. In this constantly changing research environment, specialized, interdisciplinary networks typically connect researchers to only one or two members of their discipline who share their research interests. These networks then branch outward, and with increasing regularity, reach across the globe. Rather than reflecting department directories, faculty research networks more closely reflect our own e-mail address lists.

A direct consequence of this growing disconnection between faculty research networks and discipline-based doctoral programs is the loss of expertise from the peer review equation in reputational surveys. Faculty raters, who know a great deal about the quality of scholarship in their research areas, are asked instead to assess the work of entire faculties and graduate programs in scores of other departments. It is probable that the distortions of the halo effect, always problematical, have been magnified during recent decades as raters have faced departments filled with specialists whose work was unfamiliar to them. Under these circumstances, scholarship was far less important in determining prestige ratings than either the past reputations of departments or affiliated universities.

The most troublesome consequence of continued reliance on reputational surveys is the harm this subjective method inflicts, however inadvertently, on aspiring departments, programs, and institutions. The prestige of established elites appears to act as a filter, screening from view the research achievements of the challengers, depriving them of recognition for accomplishments they have earned. The result of this baneful process in fact may be two-directional, screening our most prestigious universities from the bracing effects of vigorous competition by challenging institutions.

Comparing Reputational Ratings and Quantitative Measures by Academic Discipline

The argument outlined above, that reputational ratings have grown obsolete and harmful, is plausible, but unproven. Indeed, the history of reputational surveys as the mainstay of national university comparisons since the 1920s shows remarkably little research validating their utility as an accurate measure of research quality. The major national studies instead presumed the primacy of reputational surveys as a measure of research quality. This presumption was defensible through the 1960s and early 1970s when alternative measures of assessment were underdeveloped, and there was a loose academic consensus -- one that still exists -- that rankings based on reputation ratings were more or less correct, especially at the top of the research hierarchy. However, to perpetuate this untested assumption in the face of the extraordinary changes that were undermining its premise represented a disappointing standard of scientific rigor.

In the absence of a systematic validation of the most promising subjective and quantitative measures of university research quality against a benchmark standard of excellence, what evidence is available to test the proposition that reputational ratings fail to recognize the research achievements of rising programs and institutions? First, studies that documented research achievement in individual disciplines, especially sociology and political science, have provided a more finely grained analysis of research performance. Such studies produced rankings based on the number of publications, citations, grants, patents, and other research indicators, and compared ratings based on these measures with reputational ratings. (9)  In several of these single-discipline studies, especially those that relied on per capita measures, researchers have found a discrepancy between reputational ratings and the levels of research achievement shown by rising departments and programs. (10) 

Second, in The Rise of American Research Universities, we demonstrated the same phenomenon at the institutional level. In the public sector we identified 21 rising universities, including the University of California (UC), Santa Barbara and the State University of New York (SUNY) at Stony Brook. In the private sector there were 11 such campuses, including Brandeis and Rochester, institutions whose achievements were underrecognized by the major reputational surveys. The institutional-level focus we employed, designed for a different purpose, does not yield the level of precision available through program-level analysis.

In this article, to extend our analysis, we compare the NRC's reputational ratings with per capita measures of citation and award density. The tables below reflect these comparisons in both individual disciplines and broad fields. The left-hand columns document the NRC reputational rankings of scholarly quality of program or department faculty, while the right-hand columns reflect rankings based on per capita citation density (or awards density for humanities fields). Citation density and award measures were provided in the NRC Report, but were not used by the NRC for ranking purposes. (11)  Finally, we compare David Webster's and Tad Skinner's institutional aggregation of the NRC reputational rankings (Change, 1996) with our own grand ranking that is based on quantitative per capita data.

The Strengths and Weaknesses of Citation Measures

Before discussing the tables, it is important to note the strengths and weaknesses of the per capita citation and award measures. These indicators refer to the number of citations or awards for a given department or program divided by the number of program faculty. Such indicators thereby avoid the problem, common in press-release competition among universities, of conflating quantity with quality by comparing total output data (for annual publications, citations, awards, research dollars, etc.) irrespective of program or institutional size. Per capita indicators offer instead a unit of research productivity that can be compared across programs at institutions of different sizes and types. (12)
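The calculation behind the per capita measure is straightforward. The following sketch (with entirely hypothetical program names and counts, for illustration only) shows how normalizing total citations by faculty size changes a comparison that raw totals would distort:

```python
# Hypothetical illustration of per capita citation density.
# Program names, citation totals, and faculty counts are invented.
programs = {
    # program: (total citations, number of program faculty)
    "Program A": (4200, 40),   # large program, higher total output
    "Program B": (2600, 20),   # smaller program, lower total output
}

def citation_density(total_citations, faculty_count):
    """Citations per program faculty member."""
    return total_citations / faculty_count

densities = {name: citation_density(cites, faculty)
             for name, (cites, faculty) in programs.items()}

# Ranked by total citations, the larger program would lead; ranked by
# per capita density, the smaller program comes out ahead.
ranked = sorted(densities, key=densities.get, reverse=True)
```

This is precisely the quantity-versus-quality distinction the article draws: the larger program "wins" on total output, while the smaller program demonstrates higher research productivity per faculty member.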

The value of citation analysis as an indicator of research quality has been widely acknowledged. (13)  Published scholarship varies widely in quality -- roughly half of all scholarly and scientific publications, bibliometricians report, are never cited at all. Ranking university doctoral programs by the frequency with which the published scholarship of their faculty is cited by others thus provides a valuable benchmark of research quality, arguably the best single measure available.

On the other hand, despite its superior value as an indicator of research importance, citation analysis has inherent limitations. It is but a single indicator, and no single indicator, however excellent, is sufficient for measuring the complexity and quality of institutional knowledge creation. There are other drawbacks. The NRC's funding and deadline pressures in the early 1990s, combined with the limited capacities of the Institute for Scientific Information (ISI), publisher of the Citation Index series, produced a database of objective indicators with a level of reliability substantially below that which can be achieved today. (14) 

In the last NRC study, errors were introduced through misreporting by campus-based Institutional Coordinators, who were assigned the task of providing the number of campus faculty (the denominator in per capita citation density measures). Still other errors involved output data (publications, citations) caused by mistakes in recording names and institutions, in matching zip codes, and in data entry. However, such flaws tend to be randomly distributed, and produce little significant distortion when aggregated at the level of academic field or institution. Moreover, the number of arts and humanities awards, collected by the NRC staff, avoided most electronic data processing errors. Although caution is required when comparing programs or departments on the basis of the NRC Report's citation density scores, careful comparisons demonstrate persistent discrepancies between subjective and objective measures of research achievement. In all of the comparisons that follow, the patterns of research performance that emerge are consistent with our research findings that reputational rankings tend to mask the demonstrable research achievements of challenging institutions.

Comparing Reputational and Quantitative Measures of Research Achievement by Discipline

Tables 1 through 5 compare reputational and citation density (or award density for the humanities) for individual disciplines. Table 1 shows rankings for the top 25 programs in astrophysics and astronomy, as representative of fields in math and the physical sciences. We selected astronomy as an illustrative discipline for several reasons. Because only 33 doctoral programs in astronomy and astrophysics were rated by the NRC (as compared, for example, with 179 programs in cell and developmental biology), we reasoned that astronomers are more likely to know one another's work. By implication, members of small research communities should be less vulnerable to the halo-effect distortions of institutional prestige. Thus, the appearance of significant differences between reputation and citation rankings in astronomy reinforces the argument that institutional prestige often distorts collegial perception of research performance.

Table 1 reflects three patterns. The first supports a finding demonstrated in The Rise of American Research Universities: the nation's elite universities that have won the top reputation rankings have earned their enviable status through superior research achievement. Familiar institutional elites -- Caltech, Princeton, UC Berkeley, Harvard, MIT -- dominate the top ten ranks in both reputation and per capita citation density. However, according to citation density scores (displayed in the right-hand column), certain challenging institutions, either absent or not highly ranked by reputation (displayed in the left-hand column), rise toward the top of the list. These campuses include Massachusetts-Amherst, UC Santa Cruz, SUNY-Stony Brook, and Colorado. This dual pattern is repeated throughout the tables that follow. On the one hand, established elite institutions, such as the Ivy League campuses and great state flagships, are often top-ranked on both reputational and objective measures. At the same time, challenging universities, often younger and smaller institutions such as SUNY-Stony Brook, Brandeis, or the newer UC campuses, break into the upper ranks when measured by their research achievements rather than by a perceived level of prestige. A third pattern found in Table 1 seems distinctive to the fields of astronomy and astrophysics. Certain universities (Arizona, Hawaii-Manoa) appear to benefit from the prominence of their astronomical observatories, scoring higher on reputation but lower when ranked by citation density.

Tables 2 through 5, comparing disciplines representing the biological sciences, engineering, social and behavioral sciences, and arts and humanities, show similar patterns of high rank in both reputation and per capita measures by traditionally prestigious institutions, high rank by rising institutions on quantitative measures, and certain patterns distinctive to specific disciplines. In cell and developmental biology (Table 2), for example, traditional elites -- MIT, Caltech, and Harvard -- rank high according to both measures. (15)  At the same time, several challenging institutions -- Case Western Reserve, Vanderbilt, Brandeis, and Cincinnati (all of which save Brandeis have a campus medical school) -- break into the top 25 when measured by citation density. Finally, cell biology programs based in medical schools are strongly represented in both rankings. The programs at the Stanford and Colorado medical schools, for example, are ranked higher on both reputational and quantitative measures than their counterparts in the arts and sciences.

Similarly, in the field of electrical engineering (Table 3), all three patterns hold. Proven elites -- Caltech, Princeton, Stanford, MIT -- rank high according to both measures. They are joined in the top 10 quantitative rankings (right-hand column) by rising challengers UC Santa Barbara and SUNY-Buffalo. Third, when ranked according to citation density, the rising research universities include non-flagship land-grant universities -- for example, North Carolina State (which also appears among the reputational top 25) and Colorado State -- a group not strongly represented among the top ranks in other fields.

In social and behavioral sciences disciplines such as history, where publication typically takes the form of books rather than journal articles, citation density is a less reliable indicator. However, economics (Table 4) provides a more typical example. The reputational ranking for economics holds few surprises. It is worth noting, however, that at Caltech, where faculty divisions are not organized according to the NRC's disciplinary taxonomy, a "virtual" economics program assembled by the Institutional Coordinator ranked 19th (of 107 programs) in the NRC's reputation survey. In the eyes of faculty raters, Caltech's exceptional "coattail effect" boosted the reputation of even a program that did not formally exist.

In economics, the top 10 citation density ranking was led by a number of the same elite institutions -- Chicago, Harvard, MIT, Stanford -- found in the reputational top 10, with the striking exception of Maryland-College Park, which jumped to first place in citation density from 20th rank in reputation. Maryland's high per capita ranking demonstrates in part the power of academic stars, in this case, College Park economist Mancur Olson, whose widely cited 1971 book, The Logic of Collective Action, created a new analytical paradigm. (16)  Aside from Maryland's striking rise (and Caltech's highly regarded "virtual" program), the economics comparison demonstrates a similar pattern of challenging institutions -- Boston University, Rochester, Vanderbilt -- rising in the per capita category.

The final discipline-based comparison ranks programs in philosophy (Table 5) as representative of the arts and humanities. Because no accurate method for measuring book publication was available in the early 1990s, the NRC staff independently compiled a data file of honors and awards received by humanities program faculty. Unfortunately, the Report provides awards data for only a small number of programs, especially when compared with the high numbers of article and citation data that were documented. As a consequence, a small difference in awards per program faculty produced a large difference in ordinal ranking, and also a large number of ranking ties.

In our own research for The Rise of American Research Universities, to account for the fact that book publication was not represented, we constructed a similar index for measuring the research productivity of arts and humanities faculty. In a pilot study that documented the relationship between per capita awards and book publication in three humanities disciplines, we found a positive correlation of 0.73. This correlation demonstrates that the documentation of awards can provide a practical substitute for book publication. The awards density measure thus tends to be a high quality, low quantity indicator, the opposite of such measures as total publications or total research grant dollars.
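The correlation underpinning this substitution is a standard Pearson coefficient computed over per-faculty rates. A minimal sketch, with invented per capita figures (not the data from our pilot study), shows the calculation:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per capita awards and per capita book publications for
# five humanities programs (invented numbers, for illustration only):
awards_per_faculty = [0.10, 0.25, 0.40, 0.15, 0.30]
books_per_faculty = [0.30, 0.50, 0.80, 0.45, 0.55]

r = pearson(awards_per_faculty, books_per_faculty)
```

A coefficient near 1.0 would indicate that programs whose faculty win awards at a high per capita rate also publish books at a high per capita rate; the 0.73 we observed is strong enough to justify awards density as a proxy where book counts are unavailable.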

The philosophy comparisons show similar patterns to those in other disciplines, with programs at prestigious universities -- Princeton, Harvard, UC Berkeley, Stanford, Michigan, Cornell, MIT, Chicago, and Brown -- ranked among the top 15 in both reputation and awards density. Yet, consistent with the other disciplinary comparisons (Tables 1-4), challenging institutions -- Illinois-Chicago (ranked 8th), Massachusetts-Amherst, Emory, Notre Dame, and Syracuse -- break into the top 25 quantitative ranking. A distinctive element is represented by Pittsburgh, which placed two top-ranked programs (Philosophy, and History and Philosophy of Science) in both the reputational and award density categories.

Comparing Reputational and Quantitative Measures for Academic Fields

Tables 6 through 10 compare top-20 reputational (provided by Webster and Skinner) and per capita institutional rankings for the five broad fields of study represented by single disciplines (Tables 1-5). When performance measures for doctoral programs are aggregated at the level of field rather than discipline, the top 10 ranks in citation density are typically dominated by established elites, with challengers breaking into the second ten ranks. Thus, in the physical sciences and mathematics (Table 6), the challenging universities in the citation density category include Arizona, UC Santa Barbara, Colorado, New York University, and Pittsburgh. In the biological sciences (Table 7), challengers -- UC Irvine, Iowa, and Colorado -- break into the quantitative top 20. In engineering (Table 8), UC Santa Barbara, tied for 16th in the reputational category, soars into third place in the citation density ranking. Syracuse, SUNY-Buffalo, and Rochester are ranked in the second ten. On the other hand, Purdue, Carnegie Mellon, Georgia Tech, and Penn State, ranked in the reputational top 20, do not appear in the per capita citation top 20.

In the social and behavioral sciences (Table 9), established leaders dominate the first 10 places in the citation density column, and challengers, led by 9th-ranked SUNY-Stony Brook, dominate the second 10 ranks. It is striking how many universities highly ranked by reputation -- UC Berkeley, Princeton, Minnesota, Cornell, North Carolina-Chapel Hill, and Illinois-Urbana -- are not included in the top 20 citation density ranking. In the arts and humanities (Table 10), where high rankings are dominated by private institutions, prestigious universities continue to lead the top ranks on both reputational and quantitative measures. Challengers -- UC Davis, Rice, and UC Irvine -- follow in the second 10 of the awards density category.

Institutional Grand Ranking

The final comparison (Table 11) shows the top 50 institutions ranked according to the mean score of reputation in the left column and by citation and awards density in the right column. (17)  Not surprisingly, at this level of competition, it is more difficult for challengers to break into the top ranks in either category. The established leaders who dominate the reputational rankings tend to be strong across the academic spectrum. This is especially true for the well-endowed private universities, which claim eight of the top 10 positions in both rankings. (The other two universities are the UC campuses at Berkeley and San Diego.) Challenging institutions, in contrast, typically have concentrated their resources on their strongest programs, building what Stanford provost Frederick E. Terman called "steeples of excellence." For such rising institutions (which included Stanford in the 1950s), this strategy seems well designed for breaking into the top ranks -- Terman emphasized that "the steeples be high for all to see." (18)  Thus, the 18 highest ranked universities in both the reputational and quantitative rankings are all institutions rated in the top 20 in the previous major reputational surveys. Judged according to the more objective citation or award density measure, the challengers appear beginning with UC Santa Barbara, ranked 19th among all universities, and 6th among public universities. Successful rising challengers -- Colorado, Washington University, Rochester, UC Irvine, and SUNY-Stony Brook -- rank 21st through 25th, respectively. A second block of rising universities is led by Rice and Brandeis, ranked 31st and 32nd, respectively, in the quantitative per capita density categories.
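The grand ranking's aggregation step, taking each institution's mean score across its rated fields, can be sketched as follows (institution names and field scores are hypothetical, for illustration only):

```python
# Hypothetical institution-level aggregation: each institution's grand
# score is the mean of its per capita density scores across broad fields.
# All names and numbers are invented for demonstration.
field_scores = {
    "University X": [3.1, 2.8, 3.4],  # density score in each rated field
    "University Y": [2.9, 3.0, 2.5],
}

def grand_score(scores):
    """Mean per capita density score across fields."""
    return sum(scores) / len(scores)

grand = {univ: grand_score(scores) for univ, scores in field_scores.items()}
grand_ranking = sorted(grand, key=grand.get, reverse=True)
```

Note the design consequence discussed above: averaging across all fields rewards breadth, so an institution that has concentrated its resources in a few "steeples of excellence" will rank lower here than its strongest individual programs would suggest.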

How Should the Next NRC Study Rate Research-Doctorate Programs?

As the NRC gears up for the next study, measuring the research quality of program faculty is only one item on the Council's planning agenda. Economist Charlotte Kuh, the project director, has been meeting with various administrative and faculty groups to hear their opinions, and to let them know that the next study must prove more useful than its predecessors to nonacademic constituencies, chiefly, government policymakers, private foundations, and perhaps most challenging to reach, business leaders. The new study thus will include more interpretation of data and trends that address the interests and needs of both academic and nonacademic constituencies.

Raising funds to support the next study will be difficult, partly because its predecessors are seen as chiefly of interest to academics concerned about the pecking order of institutional and program prestige. In addition, vexing problems of program taxonomy, alluded to in our earlier discussion of the growing mismatch between traditional departmental structures and the increasing interdisciplinary fluidity of the research enterprise, confront the study's planners. The graduate student constituency also needs more useful measures of program effectiveness. There is little evidence, for example, that beyond the problematical reputational rankings, prospective graduate students have found the findings from previous national surveys useful.

Nonetheless, the heart of the next NRC study should remain a comparative assessment of the quality of research in the nation's research-doctorate programs. Leading the world in knowledge production, American universities are crucial to economic growth and competitive success in the global market of the 21st century. We hear this rhetoric all around us and read it in boilerplate promotional literature from campus and corporation alike; yet, it is profoundly true. It is therefore important that the first national study of the 21st century succeed where the previous national studies fell short -- by producing a report that documents not only the sustained and merited reputation of traditional elites, but also the new research leadership of rising institutional challengers. Leaders in government, business, and industry, comfortable in their long association with elite programs and campuses, could learn from a new study that the pool of institutional talent is much deeper than it has appeared.

What research design should guide the next NRC assessment of research-doctorate programs? At a June 1999 NRC project planning conference in Washington, D.C., a group of faculty and administrators drawn from the nation's campuses reached a consensus that the new study's design should be guided by the results of a pilot project. This pilot study would examine intensively a sample of representative programs and institutions, measuring multiple indicators of performance (publications, citations, patents, research funding, awards and fellowships, etc.) to establish benchmarks of research quality and program effectiveness. A variety of indicators designed to measure the full program universe would then be tested against these benchmarks to compare their ability to predict them. The methods tested would include measures used in previous studies (reputational survey questions, article publication, citations, humanities awards, etc.) and potential new indicators (publications and citations in leading journals, patents, book publication, and measures of graduate training effectiveness such as job placement).
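The validation logic of such a pilot can be sketched simply: score each candidate indicator by how well it predicts the benchmark established for the intensively studied sample. The sketch below is illustrative only; the program names, benchmark scores, and indicator values are invented, not drawn from any NRC data.

```python
# Hypothetical sketch: testing candidate indicators against a pilot benchmark
# by measuring how strongly each correlates with the benchmark quality score.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Benchmark quality scores from the intensive pilot sample (invented values).
benchmark = {"A": 4.8, "B": 4.1, "C": 3.5, "D": 2.9}

# Candidate indicators measured for the same programs (invented values).
indicators = {
    "citations_per_faculty": {"A": 30.1, "B": 22.4, "C": 14.0, "D": 9.8},
    "reputational_93Q":      {"A": 4.9, "B": 4.4, "C": 3.0, "D": 3.2},
}

programs = sorted(benchmark)
for name, scores in indicators.items():
    r = pearson_r([scores[p] for p in programs], [benchmark[p] for p in programs])
    print(f"{name}: r = {r:.3f}")
```

An indicator that tracks the benchmark closely across the sample would be a stronger candidate for measuring the full program universe.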

Should the next NRC study include a reputation survey of faculty? In our Chronicle of Higher Education (June 1999) opinion essay, we answered no -- the reputational survey should be given an honorable burial in the century that gave it birth, that benefitted from its maturity, and that witnessed its subsequent decay under the relentless pressures of the knowledge revolution. (19)  But the decision should depend substantially on the results of the NRC's pilot study. It is striking that throughout the last century the reputation survey device, though employed as the lead rating measure in all the major national studies, has never been systematically validated as a measure of research quality.

The NRC's decision on whether to use the reputation survey, moreover, may be determined in part by other factors. These surveys are defended as a social science method providing uniquely holistic, peer-review assessments that reflect the strengths of large programs, and that also provide continuity over time for longitudinal studies. At the same time, there are political factors to be considered, legitimate concerns when dealing with a quasi-official body performing such a high-stakes service. Universities that traditionally have dominated reputation surveys constitute a powerful lobby for their continued use. The political need of organizations to avoid the controversy inherent in ranking their own members or constituent groups is also a relevant concern. Assessments of reputation enable organizations, such as the NRC and the ACE, to claim that the quality ratings were determined by expert members of the constituency, not by the sponsoring research organization.

Whether or not a reputation survey is included in the next study, the NRC must above all avoid repeating the major mistake of the 1995 project -- the listing of all programs as ranked by reputational survey score. This identified the Council, the research arm of the National Academy of Sciences, as an official arbiter of rank in the great academic ratings game. Even were the reputational evaluation not so vulnerable to challenge, the decision to rank programs exclusively by subjective data, rather than to list both subjective and objective program data alphabetically by program (a presentation used in the NRC 1982 study), stamped a particular pecking order with the NRC's powerful imprimatur.

However cloudy the future of subjective rankings may remain, the promise of effective objective measures of research quantity and quality appears bright. The ISI reports significant advances since the early 1990s in the comprehensiveness and reliability of its data files, and in its ability to match authors, publications, citations, and programs on a large scale. (20)  The NRC pilot project will provide an opportunity to develop and test new indicators of scholarly research performance, including measures of publication and citation in leading journals. (21)  An indicator documenting book publication would be appropriate for assessing humanities faculty, who have limited engagement in journal publications. (22)  The awards indicator for humanities programs, high in promise as a qualitative measure, but weakened by low award totals in the 1995 study, can be greatly strengthened by including competitive awards and prizes conferred by academic associations. Moreover, in assessing graduate program effectiveness, the pilot program may develop and test such revealing measures as time to degree, job placement, and postdoctoral fellowship awards. (23) 

A study of research-doctorate programs appropriate for the 21st century may be published on the web in a format convenient to consumers. Users might download the data and calculate their own rankings, possibly by using software accompanying the report that allows users to construct composite scoring schemes, similar to those used by the U.S. News rankings, that assign varying weights to selected measures of program performance.
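The consumer-weighted scoring idea can be illustrated with a short sketch. Everything here is hypothetical: the programs, measures, and weights are invented, and the standardization step (converting each measure to z-scores before weighting) is one reasonable design choice, not a method prescribed by the NRC or U.S. News.

```python
# Hypothetical sketch of a user-weighted composite ranking, in the spirit of
# the downloadable-report idea above. Programs, measures, and weights invented.

def composite_rank(data, weights):
    """Rank programs by a weighted sum of standardized (z-scored) measures."""
    measures = list(weights)
    z = {}
    for m in measures:
        vals = [row[m] for row in data.values()]
        mean = sum(vals) / len(vals)
        sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
        # Standardize so that differently scaled measures are comparable.
        z[m] = {p: (row[m] - mean) / sd for p, row in data.items()}
    scores = {p: sum(weights[m] * z[m][p] for m in measures) for p in data}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

data = {  # invented per-program measures
    "Program X": {"citations_per_faculty": 28.0, "awards": 12, "placement_rate": 0.81},
    "Program Y": {"citations_per_faculty": 15.0, "awards": 20, "placement_rate": 0.92},
    "Program Z": {"citations_per_faculty": 9.0,  "awards": 5,  "placement_rate": 0.70},
}

# A consumer who cares most about job placement might choose these weights:
for program, score in composite_rank(data, {"citations_per_faculty": 0.2,
                                            "awards": 0.3,
                                            "placement_rate": 0.5}):
    print(f"{program}: {score:+.2f}")
```

Different users supplying different weights would produce different orderings from the same published data, which is precisely the point of leaving the ranking step to the consumer.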

The planning for the next NRC national assessment is facing intense scrutiny because the stakes are unusually high. As evidence accumulates that the tradition of focusing primarily on prestige ratings has masked a successful surge by challenging programs and institutions, the risk remains that the next study could repeat the old pattern. Much of the blame rests with the academic audience itself, which has rushed to embrace or criticize the prestige ratings, even as the sponsoring organizations, the NRC and the ACE, have tried to emphasize the variety of program measures and to resist aggregating program data into grand institutional rankings.

Clark Kerr, taking a longer view, observed (Change, 1991) that the timing of reputational change in American higher education history has coincided with periods of great transformation. The first occurred, Kerr claimed, after the Civil War when the great private and state research universities were built, and the second occurred with the expansion of research activity inspired by federal funding in the wake of Sputnik. In Kerr's view, the period from 1990 to 2010, during which "[a]t least three fourths of the faculties will turn over, and there will be some net additions . . . as enrollments rise," may be another period of significant change in the leadership configuration of America's research universities. (24)  If Kerr is correct, the NRC has a unique opportunity to describe the new alignment and set the standard of evaluation.

Revised June, 2000


1. National Research Council, Research-Doctorate Programs in the United States: Continuity and Change (Washington, D.C.: National Academy Press, 1995).

2. Hugh Davis Graham and Nancy Diamond, The Rise of American Research Universities: Elites and Challengers in the Postwar Era (Baltimore: Johns Hopkins University Press, 1997).

3. The four major national post-World War II reputational studies are: Alan M. Cartter, An Assessment of Quality in Graduate Education (Washington, D.C.: American Council on Education, 1966); Kenneth D. Roose and Charles Andersen, A Rating of Graduate Programs (Washington, D.C.: American Council on Education, 1970); Lyle V. Jones et al., An Assessment of Research-Doctorate Programs in the United States, 5 vols. (Washington, D.C.: National Academy Press, 1982); and the National Research Council 1995 study cited above.

4. Raymond M. Hughes, A Study of the Graduate Schools of America (Oxford, Ohio: Miami University Press, 1925); Hughes, "Report of the Committee on Graduate Instruction," Educational Record 15: 192-234. Hayward Keniston, Graduate Study and Research in the Arts and Sciences at the University of Pennsylvania (Philadelphia, Pa.: University of Pennsylvania Press, 1959).

5. In the NRC-sponsored studies of 1982 and 1995, which expanded the use of quantitative measures, reputational ratings showed a strong positive correlation with the more objective research indicators. Such high correlations are an expected result when comparing large numbers of research doctorate programs.

6. James Fairweather, "Reputational Quality of Academic Programs: The Institutional Halo Effect," Review of Higher Education 28,4 (1988): 345-56; Robert K. Toutkoushian, Halil Dundar, and William E. Becker, "The National Research Council Graduate Program Ratings: What Are They Measuring?" Review of Higher Education 21,4 (1998): 315-42.

7. For a review of reputational surveys, see David S. Webster, "Reputational Rankings of Colleges, Universities, and Individual Disciplines and Fields of Study from their Beginnings to the Present," in Higher Education: A Handbook of Theory and Research, vol. 8, ed. John C. Smart (New York: Agathon Press, 1992), 234-304. See also David L. Tan, "The Assessment of Quality in Higher Education: A Critical Review of the Literature and Research," Research in Higher Education 24,3 (1986): 223-65, and Clifton F. Conrad and Robert T. Blackburn, "Program Quality in Higher Education: A Review and Critique of the Literature and Research," in Higher Education: Handbook of Theory and Research, vol. 1, ed. John C. Smart (New York: Agathon Press, 1986).

8. The NRC in 1993 sent questionnaires to 16,700 of the 65,470 faculty in the 274 institutions in the study; roughly half (7,900) returned usable questionnaires. The survey's most important rating indicator was "93Q," where respondents rated the scholarly quality of the program faculty on a scale of 0 to 5, with 0 denoting "Not sufficient for doctoral education" and 5 denoting "Distinguished."

9. These studies began to appear in the late 1960s. See, for example, Lionel S. Lewis, "On Subjective and Objective Rankings of Sociology Departments," American Sociologist 3 (1968): 129-31; W. Miles Cox and Viola Catt, "Productivity Ratings of Graduate Programs in Psychology Based on Publication in the Journals of the American Psychological Association," American Psychologist (October 1973): 793-809. For a more recent view, see James C. Garand and Kristy L. Graddy, "Ranking Political Science Departments: Do Publications Matter?" PS 32,1 (March 1999): 113-16.

10. See for example Richard C. Anderson, Francis Narin, and Paul McAlister, "Publication Rating versus Peer Rating of Universities," Journal of the American Society for Information Science (March 1978): 91-103.

11. The NRC Report, Appendix P, contains reputational and citation data for all programs, and per capita density measures for citations. The number of awards won by arts and humanities program faculty was also provided in Appendix J. We calculated per capita award density measures for these fields (Tables 5 and 10) by dividing the total number of awards for each university by the number of department or program faculty. The citation and award density scores were then converted into Z-scores to standardize the quantitative rankings. In Tables 6 through 11, the reputational scores are drawn from David S. Webster and Tad Skinner, "Rating PhD Programs: What the NRC Report Says ...and Doesn't Say," Change (May/June 1996): 24-44.
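The per capita calculation described in this note can be sketched in a few lines. The university names and counts below are invented for illustration; only the procedure (awards divided by faculty, densities converted to z-scores) follows the note.

```python
# Sketch of the note's per capita method with invented inputs: divide each
# university's award total by its program faculty count, then standardize
# the resulting densities as z-scores.

awards  = {"Univ A": 42, "Univ B": 18, "Univ C": 9}   # total faculty awards (invented)
faculty = {"Univ A": 60, "Univ B": 20, "Univ C": 30}  # program faculty counts (invented)

density = {u: awards[u] / faculty[u] for u in awards}

mean = sum(density.values()) / len(density)
sd = (sum((d - mean) ** 2 for d in density.values()) / len(density)) ** 0.5
z_scores = {u: (d - mean) / sd for u, d in density.items()}

for u in sorted(z_scores, key=z_scores.get, reverse=True):
    print(f"{u}: density {density[u]:.2f}, z = {z_scores[u]:+.2f}")
```

Note how the per capita step reorders the raw totals: Univ A has the most awards but only the second-highest density once faculty size is taken into account.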

12. For a discussion of per capita measures, see Graham and Diamond, Rise of American Research Universities, 55-63.

13. See for example Jonathan R. Cole and Stephen Cole, Social Stratification in Science (Chicago: University of Chicago Press, 1973).

14. Research-Doctorate Programs in the United States (1995), Appendix G, 143-46; Brendan A. Maher, "The NRC's Report on Research-Doctorate Programs: Its Uses and Misuses," Change (November/December 1996): 54-59.

15. Rockefeller University and UC San Francisco, ranked second and third, respectively, by reputation, were not included in the citation density ranking for this study. Both institutions had fewer than 11 programs rated in the 1995 NRC study, the minimum number selected for inclusion in our comparison.

16. See Mancur Olson, The Logic of Collective Action (Cambridge: Harvard University Press, 1971). The NRC Report recorded a citation density of 32.3 for the economics program faculty at Maryland, ranked 20th by reputation for scholarly quality, compared with an average citation density of 15.9 for the top 10 economics programs ranked by reputation. The Report provided a Gini coefficient of 74.7 for the Maryland department, indicating an unusually high concentration of citations on a small number of the program faculty. (The mean Gini coefficient for the top 10 economics programs ranked by reputation is 10.0.) Olson accounted for roughly one fifth of the citations attributed to Maryland's 47 economics faculty members. For a discussion of the Gini coefficient, see Ronald G. Ehrenberg and Peter J. Hurst, "The 1995 NRC Rankings of Doctoral Programs: A Hedonic Model," Change (May/June 1996): 46-50.
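The concentration effect this note describes can be sketched with a standard Gini computation. The citation counts below are invented, not Maryland's actual data; the point is only that one heavily cited member drives the coefficient upward.

```python
# Sketch of a Gini coefficient for citation concentration across a program's
# faculty. Citation counts are invented for illustration.

def gini(values):
    """Gini coefficient: 0 = perfectly equal, approaching 1 = concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Standard formula based on the rank-weighted sum of sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

equal = [10] * 10          # citations spread evenly across ten faculty
skewed = [1] * 9 + [200]   # one member holds nearly all the citations

print(f"equal:  {gini(equal):.3f}")   # prints 0.000
print(f"skewed: {gini(skewed):.3f}")  # prints 0.857
```

A high coefficient, as in the Maryland case, signals that a department's aggregate citation density rests on the output of very few individuals.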

17. The reputational rankings are from Webster and Skinner, who ranked 104 institutions with 15 or more programs included in the 1995 NRC study. In calculating the z-scores for citation and award density, we included 110 institutions. To Webster and Skinner's 104 campuses, we added six institutions with fewer than 15 NRC-rated programs: Alabama-Birmingham (13 programs), Brandeis (14), Dartmouth (11), Delaware (13), Georgetown (14), and Tufts (11).

18. Stanford provost Frederick E. Terman, quoted in Roger L. Geiger, Research and Relevant Knowledge: American Research Universities Since World War II (New York: Oxford University Press, 1993), 125.

19. Hugh Davis Graham and Nancy Diamond, "Academic Departments and the Ratings Game," Chronicle of Higher Education, 18 June 1999.

20. Henry Small, "Relational Bibliometrics," Proc. Fifth Biennial Conference, International Society for Scientometrics and Informetrics, M.E.D. Koenig and A. Bookstein, eds. (Medford, N.J.: Learned Information, 1995), 525-32.

21. In The Rise of American Research Universities, top-journal analysis of publications provided high quality indicators of research achievement in science, engineering, and the social and behavioral sciences. Analysis of citations in such leading journals offers even greater promise as an indicator of research quality. The leading journals in a program field may be identified using objective criteria, such as the ISI's list of "Journals Ranked by Times Cited." For a quasi-official sponsoring organization such as the NRC, however, the problem with top-journal analysis may be less technical than political. Identifying top journals in an NRC study may provoke resentment among academic organizations, subscribers, and researchers associated with excluded journals, who could object that NRC selection amounts to an endorsement, providing researchers, especially untenured scientists and scholars, with prestige guideposts on where and where not to seek publication.

22. The inability to link book authors to their academic programs and institutions on other than a hand-count basis has meant that books (other than anthologies, which are included in ISI data) have been excluded from all the major studies. The exclusion of book publication from studies of academic research achievement has been a glaring weakness of the large-scale studies, including, to our regret, The Rise of American Research Universities. There is reason to believe, however, that technical methods are now available to link book authors to their institutions. The pilot study should provide the NRC, and perhaps the ISI, with an opportunity to develop and test such a measure, especially in arts and humanities programs where book publication is the norm of scholarly output.

23. See Maresi Nerad and Joseph Cerny, "From Rumors to Facts: Career Outcomes of English Ph.D.s; Results from the Ph.D.'s-Ten Years Later Study," CGS Communicator 32,7 (Special Issue Fall 1999): 1-12.

24. Clark Kerr, "The New Race to be Harvard or Berkeley or Stanford," Change (May/June 1991): 1.

Table 1
Top 25 Research-Doctorate Programs in Astrophysics and Astronomy
Ranked by Mean Score of Reputation Rating and Citation Density

Rank Campus 93Q Score Rank Campus Z-Score
1 Caltech 4.91 1 Caltech  3.486
2 Princeton 4.79 2 UC-Berkeley  1.707
3 UC-Berkeley 4.65 3 UMass-Amherst  1.622
4 Harvard 4.49 4 UC-Santa Cruz  1.134
5 Chicago 4.36 5 Harvard  0.724
6 UC-Santa Cruz 4.31 6 Princeton  0.716
7 Arizona 4.10 7 MIT  0.410
8 MIT 4.00 8 SUNY-Stony Brook  0.338
9 Cornell 3.98 9 Colorado  0.292
10 Texas-Austin 3.65 9 Yale  0.292
11 Hawaii-Manoa 3.60 11 Minnesota  0.221
12 Colorado 3.54 12 Chicago  0.039
13 Illinois-Urbana 3.53 12 Cornell  0.039
14 Wisconsin-Madison 3.46 14 UCLA  0.007
15 Yale 3.31 15 Maryland-College Park -0.152
16 UCLA 3.27 16 Arizona -0.214
17 Virginia 3.23 17 Texas-Austin -0.224
18 Columbia 3.20 18 Stanford -0.299
19 Maryland-College Park 3.07 19 Columbia -0.307
20 UMass-Amherst 3.04 20 Wisconsin-Madison -0.469
21 Penn State 3.00 21 Illinois-Urbana -0.501
22 Stanford 2.96 22 Indiana -0.595
23 Ohio State 2.91 22 Ohio State -0.595
24 Michigan 2.65 24 Hawaii-Manoa -0.629
25 Minnesota 2.59 25 Michigan -0.696

Source: National Research Council, Report, 1995, Appendix table L-1.
Note: The 93Q score refers to the NRC reputation rating of scholarly quality of program faculty on a scale of 0 to 5,
with 0 denoting "not sufficient for doctoral education," and 5 denoting "distinguished."

Table 2
Top 25 Research-Doctorate Programs in Cell and Developmental Biology
Ranked by Mean Score of Reputation Rating and Citation Density

Rank Campus 93Q Score Rank Campus Z-Score
1 MIT 4.86 1 MIT 4.649
2 Rockefeller U 4.77 2 Stanford Medical 3.547
3 UC-San Francisco 4.76 3 UC-San Diego 3.219
4 Caltech 4.73 4 Colorado Medical 2.661
5 Harvard 4.70 5 Harvard 2.176
6 Stanford Medical 4.55 6 Caltech 2.029
7 UC-San Diego 4.50 7 Yale 1.969
8 U of Washington 4.49 8 Princeton 1.680
9 Washington U 4.48 9 U of Washington 1.515
10 Yale 4.37 10 Washington U 1.436
11 Princeton 4.36 11 Duke 1.420
11 Stanford (A&S) 4.36 12 Case-Western Reserve 1.088
13 UC-Berkeley 4.15 13 UCLA 1.045
14 Duke 4.11 14 UNC-Chapel Hill 1.032
15 Chicago 4.10 15 Columbia 0.880
16 Wisconsin-Madison 4.05 16 Penn 0.848
17 UCLA 3.99 17 Vanderbilt 0.759
18 Texas-SW Medical 3.98 18 Chicago 0.751
19 Columbia 3.94 19 Johns Hopkins 0.532
20 Johns Hopkins 3.91 20 New York U 0.517
21 New York U 3.88 21 UC-Berkeley 0.456
22 Colorado Medical 3.85 22 Brandeis 0.386
23 Pennsylvania 3.81 23 Minnesota Medical 0.380
24 Baylor Medical 3.80 24 Cincinnati 0.374
25 UNC-Chapel Hill 3.79 25 Illinois-Chicago 0.355

Source: National Research Council, Report, 1995, Appendix table P-7.

Table 3
Top 25 Research-Doctorate Programs in Electrical Engineering
Ranked by Mean Score of Reputation Rating and Citation Density

Reputation Citations/Faculty
Rank Campus 93Q Score Rank Campus Z-Score
1 Stanford 4.83 1 Caltech 3.861
2 MIT 4.79 2 Princeton 3.738
3 Illinois-Urbana 4.70 3 Stanford 3.178
4 UC-Berkeley 4.59 4 UC-Santa Barbara 2.685
5 Caltech 4.46 5 MIT 2.034
6 Michigan 4.38 6 Columbia 1.952
7 Cornell 4.35 7 SUNY-Buffalo 1.677
8 Purdue 4.02 8 UC-Berkeley 1.591
9 Princeton 4.01 9 Illinois-Urbana 1.362
10 Southern California 4.00 10 UC-San Diego 0.960
10 UCLA 4.00 11 Pennsylvania 0.939
12 Carnegie-Mellon 3.94 12 CUNY 0.853
13 Georgia Tech 3.93 13 Northwestern 0.843
14 Texas-Austin 3.88 14 Cornell 0.802
15 Columbia 3.79 15 Michigan 0.604
16 Wisconsin-Madison 3.75 16 North Carolina State 0.558
17 MD-College Park 3.75 17 Purdue 0.527
18 Minnesota 3.73 18 Brown 0.456
19 UC-Santa Barbara 3.71 19 Rochester 0.375
20 UC-San Diego 3.57 20 UCLA 0.359
21 North Carolina State 3.54 21 Maryland-College Park 0.288
22 Ohio State 3.53 21 Texas-Austin 0.288
23 Rensselaer 3.44 23 Colorado State 0.258
24 Polytechnic 3.42 24 Rice 0.232
24 U of Washington 3.42 25 Yale 0.141

Source: National Research Council, Report, 1995, Appendix table P-16.

Table 4
Top 26 Research-Doctorate Programs in Economics
Ranked by Mean Score of Reputation Rating and Citation Density

Reputation Citations/Faculty
Rank Campus 93Q Score Rank Campus Z-Score
1 Chicago 4.95 1 Maryland-College Park 4.199
1 Harvard 4.95 2 Chicago 2.823
3 MIT 4.93 3 Harvard 2.743
4 Stanford 4.92 4 MIT 2.631
5 Princeton 4.84 5 UC-San Diego 2.071
6 Yale 4.70 6 Boston U 1.799
7 UC-Berkeley 4.55 7 Stanford 1.607
8 Penn 4.43 8 Rochester 1.527
9 Northwestern 4.39 9 UC-Berkeley 1.495
10 Minnesota 4.22 10 U of Washington 1.383
11 UCLA 4.12 11 Vanderbilt 1.367
12 Columbia 4.07 12 Northwestern 1.303
13 Michigan 4.03 13 Yale 1.159
14 Rochester 4.01 14 Pennsylvania 1.111
15 Wisconsin-Madison 3.93 15 Princeton 1.095
16 UC-San Diego 3.80 16 Michigan 0.999
17 New York U. 3.62 17 Michigan State 0.983
18 Cornell 3.56 18 Rice 0.711
19 Caltech 3.54 19 UCLA 0.583
20 Maryland-College Park 3.80 20 Southern California 0.519
21 Boston U 3.39 21 Duke 0.279
22 Duke 3.36 21 Wisconsin-Madison 0.279
23 Brown 3.34 23 New York U 0.231
24 Virginia 3.20 24 Iowa 0.135
25 UNC-Chapel Hill 3.16 25 Cornell 0.087
      25 Kentucky 0.087

Source: National Research Council, Report, 1995, Appendix table P-15.

Table 5
Top 25 Research-Doctorate Programs in Philosophy

Ranked by Mean Score of Reputation Rating and Citation Density

Rank Campus 93Q Score Rank Campus Z-Score
1 Princeton 4.93 1 Harvard 3.471
2 Pittsburgh 4.73 2 Cornell 2.471
3 Harvard 4.69 3 Brown 1.647
4 UC-Berkeley 4.66 3 MIT 1.647
5 Pittsburgh* 4.47 5 Chicago 1.588
6 UCLA 4.42 6 Princeton 1.529
7 Stanford 4.20 6 UC-Berkeley 1.529
8 Michigan 4.15 8 Illinois-Chicago 1.353
9 Cornell 4.11 9 Northwestern 1.176
10 MIT 4.01 10 Michigan 1.118
11 Arizona 3.98 11 Pittsburgh 1.000
12 Chicago 3.88 12 UMass-Amherst 0.882
13 Rutgers 3.82 13 Pittsburgh* 0.706
13 Brown 3.82 14 Columbia 0.647
15 UC-San Diego 3.79 14 Indiana 0.647
16 Notre Dame 3.69 14 Emory 0.647
17 UNC-Chapel Hill 3.67 17 Penn 0.529
18 Illinois-Chicago 3.51 17 Duke 0.529
19 CUNY Graduate School 3.45 17 Notre Dame 0.529
20 UMass-Amherst 3.44 20 Syracuse 0.412
21 UC-Irvine 3.30 21 UC-San Diego 0.235
22 Wisconsin-Madison 3.28 22 Washington U 0.176
23 Syracuse 3.28 22 Penn State 0.176
24 Ohio State 3.21 24 UCLA 0.118
25 Northwestern 3.18 25 Iowa 0.000

Source: National Research Council, Report, 1995, Appendix table J-9.
*Program in History and Philosophy of Science

Table 6
Top 20 Institutions in Physical Sciences and Mathematics
(of Those That Had at Least Four of the Eight Such Programs Ranked)
Ranked by Mean Score of Reputation Rating and Citation Density

Rank Campus Mean Score Rank Campus Z-Score
1 UC-Berkeley 4.74 1 Harvard 17.307
2 MIT 4.69 2 Caltech 12.372
3 Caltech 4.61 3 MIT 10.595
4 Harvard 4.50 4 U Washington 10.263
5 Princeton 4.48 5 UC-Berkeley 9.456
6 Cornell 4.36 6 Princeton 9.356
7 Chicago 4.30 7 Stanford 9.050
8 Stanford 4.22 8 Columbia 8.257
9 UC-San Diego 4.07 9 Arizona 6.127
10 Texas-Austin 4.04 10 Johns Hopkins 6.038
12 UCLA 3.97 11 UCLA 5.872
12 Columbia 3.97 12 Northwestern 5.735
12 Yale 3.97 13 UC-San Diego 5.558
14 U Washington 3.91 14 UC Santa Barbara 4.791
15 Illinois-Urbana 3.89 15 Colorado 4.162
16 Wisconsin-Madison 3.81 16 New York U 3.761
17 Brown 3.73 17 Yale 3.687
18 Carnegie Mellon 3.66 18 Pittsburgh 3.552
19 Purdue 3.58 19 Penn 3.092
20 Rice 3.56 20 Cornell 3.081

Source: National Research Council, Report, 1995.
Reputation rankings are from David S. Webster and Tad Skinner, who aggregated reputation survey data by field.
See Webster and Skinner, "Rating Ph.D. Programs: What the NRC Report Says . . . And Doesn't Say,"
Change (May/June 1996). The reputation ranking for Physical Sciences and Mathematics is from their Table 5.

Table 7
Top 20 Institutions in Biological Sciences
(of Those That Had at Least Four of the Eight Such Programs Rated)
Ranked by Mean Score of Reputation Rating and Citation Density

Reputation Citations/Faculty
Rank Campus Mean Score Rank Campus Z-Score
1 UC-San Francisco 4.60 1 Stanford 21.727
2 MIT 4.54 2 Harvard 17.197
3 Harvard 4.43 3 UC-San Diego 16.985
4 UC-San Diego 4.42 4 Caltech 14.029
4 Stanford 4.42 5 Yale 14.015
6 Yale 4.40 6 MIT 13.244
7 UC-Berkeley 4.36 7 Columbia 9.547
8 Rockefeller U 4.31 8 U of Washington 9.205
9 Washington U 4.19 9 Johns Hopkins 8.772
10 U of Washington 4.18 10 UC-Berkeley 6.092
11 Columbia 4.15 11 Pennsylvania 6.018
12 Caltech 4.07 12 Duke 5.553
12 Duke 4.07 13 Michigan 5.363
14 Wisconsin-Madison 4.04 14 UCLA 4.866
15 Pennsylvania 4.03 15 Washington U 4.770
16 Chicago 3.99 16 UC-Irvine 4.404
16 Johns Hopkins 3.99 17 Iowa 3.424
18 Texas-SW Medical 3.94 18 Colorado 3.415
19 UCLA 3.93 19 Chicago 3.291
20 Baylor Medicine 3.87 20 UNC-Chapel Hill 3.043

Source: National Research Council, Report, 1995; Webster and Skinner, Table 3.

Table 8
Top 20 Institutions in Engineering
(of Those That Had at Least Four of Eight Programs Rated)
Ranked by Mean Score of Reputation Rating and Citation Density

Reputation Rating
Rank Campus Mean Score Rank Campus Z-Score
1 MIT 4.65 1 Stanford 16.553
2 UC-Berkeley 4.47 2 Caltech 13.280
3 Stanford 4.33 3 UC Santa Barbara 10.954
4 Caltech 4.31 4 UC-Berkeley 9.610
5 Cornell 4.16 5 Minnesota 9.126
6 Princeton 4.13 6 MIT 8.615
7 Illinois-Urbana 4.05 7 Princeton 7.749
8 Michigan 4.00 8 Northwestern 7.657
9 UC-San Diego 3.92 9 Cornell 7.421
10 Minnesota 3.85 10 Texas-Austin 5.536
11 Northwestern 3.84 11 UCLA 5.355
12 Purdue 3.83 12 Johns Hopkins 5.062
13 Texas-Austin 3.82 13 Illinois-Urbana 5.038
14 Carnegie-Mellon 3.80 14 Syracuse 2.986
15 Pennsylvania 3.71 15 Pennsylvania 2.925
16 UC-Santa Barbara 3.70 16 SUNY-Buffalo 2.733
16 Wisconsin-Madison 3.70 17 Michigan 2.710
18 Georgia Tech 3.60 18 Wisconsin-Madison 1.967
19 UCLA 3.50 19 UC-San Diego 1.649
20 Penn State 3.44 20 Rochester 1.634

Source: National Research Council, Report, 1995; Webster and Skinner, Table 4.

Table 9
Top 20 Institutions in Social and Behavioral Sciences
(of Those That Had at Least Three of the Seven Such Programs Rated)
Ranked by Mean Score of Reputation Rating and Citation Density

Rank Campus Mean Score Rank Campus Z-Score
1 Harvard 4.61 1 Stanford 12.135
2 Chicago 4.56 2 Harvard 11.776
3 UC-Berkeley 4.48 3 Chicago 10.412
4 Michigan 4.45 4 Duke 9.885
5 Stanford 4.43 5 Yale 9.621
6 Yale 4.33 6 UCLA 6.850
7 UCLA 4.22 7 UC-San Diego 6.798
7 Princeton 4.22 8 Michigan 5.924
9 Wisconsin-Madison 4.15 9 SUNY-Stony Brook 5.592
10 Columbia 3.97 10 U of Washington 5.047
11 Pennsylvania 3.94 11 Washington U 4.652
12 UC-San Diego 3.78 12 Rochester 4.509
12 Northwestern 3.78 13 Johns Hopkins 3.920
14 Minnesota 3.76 14 Pennsylvania 3.713
15 Cornell 3.67 15 Maryland-College Park 3.384
16 Duke 3.63 16 Boston U 3.366
17 U of Washington 3.57 17 Northwestern 3.284
18 UNC-Chapel Hill 3.55 18 UC-Santa Barbara 3.112
19 Texas-Austin 3.53 19 UC-Irvine 3.007
20 Illinois-Urbana 3.50 20 Ohio State 2.199

Source: National Research Council, Report, 1995; Webster and Skinner, Table 2.

Table 10
Top 20 Institutions in Arts and Humanities
(of Those That Had at Least Five of the Eleven Such Programs Rated)
Ranked by Mean Score of Reputation Rating and Awards Density

Rank Campus Mean Score Rank Campus Z-Score
1 UC-Berkeley 4.36 1 Harvard 19.536
2 Princeton 4.28 2 Princeton 12.713
3 Harvard 4.20 3 Chicago 11.799
4 Columbia 4.12 4 Stanford 11.672
5 Yale 3.95 5 Johns Hopkins 10.973
6 Cornell 3.93 6 Penn 10.050
7 Penn 3.88 7 UC-Berkeley 8.118
9 Chicago 3.85 8 Northwestern 6.947
9 Duke 3.85 9 Columbia 5.604
9 Stanford 3.85 10 Cornell 5.515
11 UCLA 3.67 11 Brown 5.021
12 Michigan 3.66 12 Duke 4.001
13 UC-Irvine 3.63 13 Rice 1.987
14 Johns Hopkins 3.55 14 UC-Davis 1.972
15 Virginia 3.54 15 Michigan 1.025
16 CUNY Grad School 3.45 16 UNC-Chapel Hill 0.598
17 Brown 3.42 17 UC-San Diego 0.539
18 Texas-Austin 3.40 18 Washington U 0.293
19 UC-San Diego 3.37 19 UC-Irvine 0.206
20 Northwestern 3.23 20 Virginia 0.146

Source: National Research Council, Report, 1995; Webster and Skinner, Table 2.

Table 11
Top 50 Institutions Ranked by Mean Score of Reputation Rating
and Citations or Awards Density of All Programs

Reputation Rating
Citations or Awards/Faculty
Rank Institution Mean Score Rank Institution Z-Score
1 MIT 4.60 1 Stanford 71.137
2 UC-Berkeley 4.49 2 Harvard 65.816
3 Harvard 4.40 3 Caltech 39.752
4 Caltech 4.29 4 MIT 37.386
4 Princeton 4.29 5 UC-Berkeley 35.330
6 Stanford 4.21 6 Johns Hopkins 34.765
7 Chicago 4.13 7 Princeton 32.192
8 Yale 4.08 8 UC-San Diego 31.529
9 Cornell 3.95 9 Chicago 28.563
10 UC-San Diego 3.93 10 Yale 27.931
11 Columbia 3.92 11 Pennsylvania 25.798
12 UCLA 3.85 12 U Washington 24.404
12 Michigan 3.85 13 Columbia 24.305
14 Pennsylvania 3.79 14 Northwestern 23.712
15 Wisconsin-Madison 3.70 15 UCLA 21.505
16 Texas-Austin 3.63 16 Duke 20.365
17 U of Washington 3.60 17 Cornell 17.528
18 Northwestern 3.58 18 Michigan 15.813
20 Carnegie Mellon 3.56 19 UC-Santa Barbara 12.865
20 Duke 3.56 20 Brown 8.173
20 Illinois-Urbana 3.56 21 Colorado 8.107
20 Johns Hopkins 3.56 22 Washington U 8.007
23 Minnesota 3.45 23 Rochester 6.451
24 UNC-Chapel Hill 3.44 24 UC-Irvine 5.994
25 Brown 3.40 25 SUNY-Stony Brook 5.960
26 New York U 3.37 26 Minnesota 4.553
27 UC-Irvine 3.35 27 Wisconsin-Madison 3.656
28 Virginia 3.34 28 New York U 3.506
29 Purdue 3.31 29 UNC-Chapel Hill 3.238
30 Arizona 3.25 30 Illinois-Urbana 2.788
31 Rochester 3.24 31 Rice 2.585
32 Emory 3.23 32 Brandeis 2.120
32 Rutgers 3.23 33 Utah 0.922
34 Washington U 3.22 34 Southern California 0.894
35 UC-Davis 3.18 35 Tufts 0.166
35 Penn State 3.18 36 Emory -0.038
37 Ohio State 3.16 37 Boston U -0.045
38 Indiana 3.15 38 Georgetown -0.047
39 SUNY-Stony Brook 3.13 39 Iowa -0.063
40 Rice 3.11 40 UC-Santa Cruz -0.431
41 UC-Santa Barbara 3.08 41 Virginia -0.803
42 Colorado 3.05 42 Delaware -1.290
42 CUNY Graduate School 3.05 43 Vanderbilt -1.521
44 Maryland-College Park 3.04 44 Arizona -1.865
44 Southern California 3.04 45 Carnegie Mellon -1.899
46 North Carolina State 3.03 46 Case-Western Reserve -2.488
47 Texas A&M 3.00 47 Texas-Austin -2.624
48 Vanderbilt 2.99 48 UC-Davis -2.999
49 UMass-Amherst 2.98 49 UMass-Amherst -3.270
50 Iowa 2.97 50 Maryland-College Park -4.143

Source: National Research Council, Report, 1995, Appendix tables P 1-41; Webster and Skinner, Table 1.