Introduction to the Printed Volume
The purpose of the Ethnologue is to provide a comprehensive listing of the known living languages of the world. The Ethnologue is intended more as a catalog than as an encyclopedia and so provides summary data rather than more extensive descriptions of identified languages. Information comes from numerous sources and is confirmed by consulting both reliable published sources and a network of field correspondents. Even a relatively limited range of information on all of the world’s languages is not available, so the scope of Ethnologue’s descriptions varies. The information is organized within specific categories as described below in “Layout of Language Entries” and no effort is made to gather data beyond those categories. Much of the focus of Ethnologue is on the less commonly known languages. Greater detail and depth of description of many of the languages, especially the larger, more commonly studied languages, can be found in other works such as the International Encyclopedia of Linguistics (Frawley 2003), The World’s Major Languages (Comrie 1987), and The Atlas of Languages (Comrie, Matthews, and Polinsky 1997).
Because languages are dynamic and variable and undergo constant change, the total number of living languages in the world cannot be known precisely. In this edition, we tally 6,909 languages which are known to have living speakers who learned them by transmission from parent to child as the primary language of day-to-day communication. These languages are commonly referred to as a person’s “first language” or “mother tongue”. As knowledge of the world’s languages increases, so does the number of identified languages. Simultaneously, however, the rate at which languages are going out of use is alarming and justifies the warnings that linguists began making early in the last decade (e.g., Krauss 1992). This edition reflects those opposing processes with the addition of 163 languages previously unidentified (80 through splitting and 83 as new varieties not previously associated with another language) and the subtraction of 166 languages (75 being merged with other languages and 91 now recognized as no longer having any remaining speakers). This results in a net decrease of three languages from the total reported in the previous edition of Ethnologue (2005).
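For readers who wish to check the tally, the arithmetic in the paragraph above can be reproduced with a short sketch. The variable names are illustrative only; the figures are those quoted in the text, and the previous edition's total is derived from them rather than quoted.

```python
# Figures quoted in the paragraph above; variable names are illustrative.
added_by_splitting = 80
added_as_new_varieties = 83
removed_by_merging = 75
removed_no_speakers = 91
current_total = 6909

added = added_by_splitting + added_as_new_varieties   # languages added
removed = removed_by_merging + removed_no_speakers    # languages subtracted
net_change = added - removed                          # negative means a net decrease

# Total implied for the previous edition (derived, not quoted in the text).
previous_total = current_total - net_change

print(added, removed, net_change, previous_total)     # 163 166 -3 6912
```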
Ethnologue tends to be conservative in identifying languages as having no remaining speakers, not classifying a language as no longer spoken or “extinct” until a reliable source has reported it to be so or until the probability of language loss is very high indeed. Thus it is likely that the loss of languages we are currently reporting lags behind the actual reality.
In addition to living languages as defined above, Ethnologue also contains data on languages which have gone out of use since the first edition of the publication more than 50 years ago. It also lists languages that are used only as second languages by a significant population. Ancient, classical, and long-extinct languages are not listed (even though the ISO 639-3 standard assigns codes to them), unless they are in current use (as in the scriptures or liturgy of a faith community). More details on special status categories (extinct, nearly extinct, second language only) are given below.
The demographic, geographic, vitality, development, and linguistic information included in this volume can be useful to linguists, translators, anthropologists, bilingual educators, language planners, government officials, aid workers, potential field investigators, missionaries, students, and others with language interests.
The Ethnologue was founded by Richard S. Pittman, who was motivated by the desire to share information on language development needs around the world with his colleagues in SIL International as well as with other language researchers.
The first edition in 1951 was 10 mimeographed pages and included information on 46 languages or groups of languages. Hand-drawn maps were first included in the fourth edition (1953). The publication transitioned from mimeographed pages to a book in the fifth edition (1958). Dr. Pittman continued to expand his research through the seventh edition (1969), which listed 4,493 languages.
In 1971 Barbara F. Grimes became editor. She had assisted with the Ethnologue since 1953 (fourth edition) and took on the role of research editor in 1967 for the seventh edition (1969). She continued as editor through the fourteenth edition (2000). In 1971 information was expanded from primarily minority languages to encompass all known languages of the world. Between 1967 and 1973 she completed an in-depth revision of the information on Africa, the Americas, the Pacific, and a few countries of Asia. During her years as editor, the number of identified languages grew from 4,493 to 6,809 and the information recorded on each expanded so that the published work more than tripled in size. In 2000, Raymond G. Gordon, Jr. became the third editor of the Ethnologue and produced the fifteenth edition (2005). Shortly after the publication of the fifteenth edition, M. Paul Lewis became the editor, responsible for general oversight and research policy. He installed Conrad Hurd as managing editor, responsible for operations and database management, and Raymond Gordon as senior research editor, leading a team of regional and language-family focused research editors.
The data given here are all taken from a computerized database on languages of the world established in 1971 by then consulting editor, Joseph Grimes, from the typesetting tapes for the seventh edition (1969). The work was done at the University of Oklahoma under a grant from the National Science Foundation. In 1974 the database was moved to a computer at Cornell University where Dr. Grimes was professor of linguistics, and it was then moved to a personal computer in 1979. Since 2000 it has been housed and maintained at the headquarters of SIL International in Dallas, Texas. A presentation of all of the data included in this volume from that database is also accessible on the World Wide Web at http://www.ethnologue.com. The fact that the entries are partially constructed by computer accounts for a certain stiffness or redundancy in the phrasing.
One feature of the database since its inception has been a system of three-letter language identifiers. The codes were first published with the following explanation in a monograph reporting the results of the grant to build the database:
Each language is given a three-letter code on the order of international airport codes. This aids in equating languages across national boundaries, where the same language may be called by different names, and in distinguishing different languages called by the same name. (Grimes 1974:i)
While the codes were used behind the scenes in the database that generated the eighth and ninth editions, it was not until the tenth edition (1984) that they appeared in the publication itself.
In 1998, the International Organization for Standardization (ISO) adopted ISO 639-2, a standard for three-letter language identifiers. The standard is based on a convergence of ISO 639-1 (an earlier standard for two-letter language identifiers adopted in 1988) and of ANSI Z39.53 (also known as the MARC language codes, a set of three-letter identifiers developed within the library community and adopted as an American National Standard in 1987). The ISO 639-2 standard was insufficient for many purposes since it has identifiers for fewer than 400 individual languages. Thus in 2002, ISO TC37/SC2 formally invited SIL International to prepare a new standard that would reconcile the complete set of codes used in the Ethnologue with the codes already in use in the earlier ISO standard. In addition, codes developed by Linguist List to handle ancient and constructed languages were to be incorporated. The result, which was officially approved by the subscribing national standards bodies in 2006 and published in 2007, is a standard named ISO 639-3 that provides standardized three-letter codes for identifying nearly 7,500 languages (ISO 2007). SIL International was named as the registration authority for the ISO 639-3 standard inventory of language identifiers and administers the annual cycle for changes and updates.
This edition of Ethnologue is the second to use the ISO 639-3 language identifiers. In the fifteenth edition they had the status of Draft International Standard. In this edition they are based on the standard as originally adopted plus the 2006 series of adopted change requests (released August 2007) and the 2007 series of adopted change requests (released January 2008). Information about the ISO 639-3 standard and procedures for requesting additions, deletions, and other modifications to the ISO 639-3 inventory of identified languages can be found at the ISO 639-3 website: http://www.sil.org/iso639-3.
Given the nature of language and the various perspectives brought to its study, it is not surprising that a number of issues prove controversial. Foremost among these is the definition of a language itself.
How one chooses to define a language depends on the purposes one has in identifying that language as distinct from another. Some base their definition on purely linguistic grounds. Others recognize that social, cultural, or political factors must also be taken into account. In addition, speakers themselves often have their own perspectives on what makes a particular language uniquely theirs. Those are frequently related to issues of heritage and identity much more than to the linguistic features of the language(s) in question.
Languages as particles, waves, and fields. Scholars are recognizing that languages are not always easily or best treated as discrete, identifiable, and countable units with clearly defined boundaries between them (Makoni and Pennycook 2006). Rather, a language is more often composed of continua of features that extend across time, geography, and social space. Growing attention is being given to the roles or functions that language varieties play within the linguistic ecology of a region or a speech community. The Ethnologue approach to listing and counting languages does not preclude a more dynamic understanding of the linguistic makeup of the countries and regions we report on. While discrete linguistic varieties can be distinguished, we also recognize that those varieties exist in a complex set of relationships to each other. Languages can be viewed, then, simultaneously as discrete units (particles) amenable to being listed and counted, as continua of features across time and space (waves) that are best studied in terms of variational tendencies as examples of “change in progress”, and as parts of a larger ecological matrix (field), where functional roles and usage of the linguistic codes for a wide range of purposes are more in focus. All three of these perspectives, language as particle, wave, and field (Lewis 1999; Pike 1959), are useful and important. Ethnologue focuses primarily on the unitary nature of languages without prejudice against the other perspectives.
Language and dialect. Every language is characterized by variation within the speech community that uses it. Those varieties, in turn, are more or less divergent from one another. These divergent varieties are often referred to as dialects. They may be distinct enough to be considered separate languages or sufficiently similar to be considered merely characteristic of a particular geographic region or social grouping within the speech community. Often speakers may be very aware of dialect variation and be able to label a particular dialect with a name. In other cases, the variation may be largely unnoticed or overlooked. Not all scholars share the same set of criteria for distinguishing a “language” from a “dialect”. Since the fifteenth edition, Ethnologue has followed the ISO 639-3 inventory of identified languages as the basis for our listing. That standard applies the following basic criteria for defining a language in relation to varieties which may be considered dialects:
- Two related varieties are normally considered varieties of the same language if speakers of each variety have inherent understanding of the other variety at a functional level (that is, can understand based on knowledge of their own variety without needing to learn the other variety).
- Where spoken intelligibility between varieties is marginal, the existence of a common literature or of a common ethnolinguistic identity with a central variety that both understand can be a strong indicator that they should nevertheless be considered varieties of the same language.
- Where there is enough intelligibility between varieties to enable communication, the existence of well-established distinct ethnolinguistic identities can be a strong indicator that they should nevertheless be considered to be different languages.
These criteria make it clear that the identification of “a language” is not solely within the realm of linguistics.
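The interplay of the three criteria can be made concrete with a rough sketch. It is a deliberate simplification and not the procedure used by the standard: the criteria are indicators that call for judgment, the intelligibility labels ("functional", "marginal", "none") are hypothetical, and the sketch assumes that distinct identities are weighed only where intelligibility is functional.

```python
from dataclasses import dataclass


@dataclass
class VarietyPair:
    # Hypothetical labels: "functional", "marginal", or "none".
    intelligibility: str
    # Common literature, or a common identity with a central variety both understand.
    common_literature_or_identity: bool
    # Well-established distinct ethnolinguistic identities.
    distinct_identities: bool


def classify(pair: VarietyPair) -> str:
    """Rough reading of the three criteria; real cases need judgment."""
    if pair.intelligibility == "functional":
        # First criterion, unless the third (distinct identities) applies.
        if pair.distinct_identities:
            return "different languages"
        return "same language"
    if pair.intelligibility == "marginal":
        # Second criterion: shared literature or identity can outweigh
        # marginal spoken intelligibility.
        if pair.common_literature_or_identity:
            return "same language"
        return "different languages"
    return "different languages"


print(classify(VarietyPair("functional", False, True)))   # different languages
print(classify(VarietyPair("marginal", True, False)))     # same language
```

The second call shows why the identification is not purely linguistic: the outcome turns on literature and identity rather than on intelligibility alone.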
Macrolanguages. With this edition of the Ethnologue we include entries for macrolanguages for the first time. The ISO 639-3 standard defines three-letter codes for both individual languages and macrolanguages. The latter are defined in the standard as “multiple, closely related individual languages that are deemed in some usage contexts to be a single language.” Using the three criteria listed above, some varieties may be considered separate languages and identified by distinctive ISO 639-3 codes, but for other purposes those individual languages might be grouped together and spoken of as a single language based on the shared heritage and identity of the speakers or other common features such as a common writing system and literature. Thus the ISO 639-3 standard defines three-letter codes for the macrolanguages it recognizes, and it enumerates the set of individual languages that are the members of each macrolanguage. Macrolanguages are distinguished from other groupings of languages (e.g., all of the languages spoken in South America or all of the languages that use the Latin script or all of the Bantu languages) in that the individual languages that comprise a macrolanguage must be closely related, and there must be some domain in which they are commonly viewed as comprising a single language.
This edition of Ethnologue has entries for 55 macrolanguages as identified in the ISO 639-3 standard. The addition of this conceptualization of language provides us with a better way to represent the fact that linguistic varieties function simultaneously as both individual units and within a larger functional matrix. Macrolanguage entries are brief, consisting largely of a listing of the individual languages that comprise them. See “Layout of macrolanguage entries” below.
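The relationship between a macrolanguage and its member languages can be pictured as a simple lookup table. The sketch below is illustrative only: the table holds a small subset of one macrolanguage's members rather than the full ISO 639-3 listing, and the function names are hypothetical.

```python
# Partial, illustrative membership table; not the full ISO 639-3 listing.
MACROLANGUAGE_MEMBERS = {
    "zho": ["cmn", "yue", "wuu"],  # Chinese: Mandarin, Yue, Wu (subset only)
}


def macrolanguage_of(code):
    """Return the macrolanguage code that lists `code` as a member, or None."""
    for macro, members in MACROLANGUAGE_MEMBERS.items():
        if code in members:
            return macro
    return None


def count_individual_languages(codes):
    """Count distinct codes, skipping macrolanguage codes.

    A macrolanguage overlaps its members, so counting it as well
    would count the same speech varieties twice.
    """
    return len({c for c in codes if c not in MACROLANGUAGE_MEMBERS})


print(macrolanguage_of("yue"))                              # zho
print(count_individual_languages(["zho", "cmn", "yue"]))    # 2
```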
Sign languages. There are hundreds of sign languages in the world, created and used by deaf people. This edition lists 126 such languages. As the primary language of day-to-day communication for their respective communities of users, these languages fall within the scope of the Ethnologue. The deaf sign languages listed in language entries are those used exclusively within deaf communities. The listings include only natural sign languages, not signed versions of spoken languages (manual codes), which typically have names like “Signed English” or “Signed French.” Manual codes are, however, sometimes mentioned in the entries about individual sign languages. Generally, we do not include manual systems invented primarily for use by hearing people that are not full languages (e.g., hand signals in sports), though some manual systems that have been assigned ISO 639-3 codes and are used as second languages only are included in our listings.
Language endangerment is a serious concern to which linguists and language planners have turned their attention in the last two decades. For a variety of reasons, speakers of some languages stop using their language and begin using another. Parents may begin to use only that second language with their children and gradually the intergenerational transmission of the heritage language is reduced and may even cease. As a consequence there may be no speakers who use the language as their first or primary language and eventually the language may no longer be used at all. A language may become dormant or extinct, existing perhaps only in recordings or written records and transcriptions.
The concern about language endangerment is centered, first and foremost, around the factors which motivate speakers to abandon their language and the consequences of language death for the community of (former) speakers of that language. Since language is closely linked to culture, loss of language almost always is accompanied by social and cultural disruptions as well. Secondarily, those concerned about language endangerment recognize the implications of the loss of linguistic diversity both for the linguistic and social environment generally and for the academic community which is devoted to the study of language more specifically.
There are two dimensions to the evaluation and characterization of endangerment—the number of persons who identify with a particular language and the number and nature of the domains in which the language is used. A language may be endangered because there are fewer and fewer people who claim that language as their own and therefore neither use it nor pass it on. It may also, or alternatively, be endangered because it is being used for fewer and fewer daily activities and so loses the characteristically close association of the language with particular social functions. Since form follows function, languages which are being used for fewer and fewer functions also tend to lose structural complexity, which in turn may affect the perceptions of users regarding the suitability of the language for use in a broader set of functions. This can lead to a downward spiral which eventually results in the loss of the language altogether. The Ethnologue reports on both of these dimensions of language use (users and functions) whenever the data are available.
Language endangerment is a matter of degree. At one end of the scale are languages that are vigorous, and perhaps are even expanding in numbers of speakers or functional areas of use. At the other end are languages that are on the verge of dormancy (loss of functions) or extinction (almost no individuals who identify the language as being part of their identity). In between are many degrees of greater or lesser endangerment. Since specific population thresholds may vary by region, we report a language as “Nearly extinct” when the speaker population is 50 or fewer, or when other criteria indicate that it is seriously endangered. In this edition, 457 languages are so designated. When a language is dormant, with no remaining societal functions assigned to it, but there is an extant community that associates its ethnic identity with that language, we characterize the situation with the phrase “No remaining speakers” or “No known speakers”. Where a language is dormant, there are often emerging speakers who are re-learning the language of their forebears. Where this represents an identifiable population, we characterize the language as having “Second language speakers only”. That label is also used for languages which have limited use, such as liturgical languages or lingua francas, but which are not necessarily endangered. If the language is neither used for any societal functions nor has an ethnically associated community, we characterize it as “Extinct”. In such cases, effort has been made to indicate the language that is now spoken by the ethnic group, usually with the phrase “Shifted to…” or “Shifting to…”.
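The status labels described above can be summarized as a small decision procedure. The sketch below is a simplification under stated assumptions: the function and parameter names are hypothetical, the threshold of 50 is applied mechanically even though the text notes that thresholds vary by region and that other criteria may apply, and the order of the checks is one plausible reading of the paragraph.

```python
from typing import Optional

NEARLY_EXTINCT_THRESHOLD = 50  # figure quoted in the text; applied mechanically here


def status_label(first_language_speakers: int,
                 second_language_speakers: int,
                 has_identity_community: bool) -> Optional[str]:
    """Return a special status label, or None when no special label applies."""
    if first_language_speakers == 0:
        if second_language_speakers > 0:
            # Emerging speakers re-learning the language, or a language
            # of limited use such as a liturgical language.
            return "Second language speakers only"
        if has_identity_community:
            # Dormant: an extant community still identifies with the language.
            return "No remaining speakers"
        return "Extinct"
    if first_language_speakers <= NEARLY_EXTINCT_THRESHOLD:
        return "Nearly extinct"
    return None


print(status_label(0, 0, True))     # No remaining speakers
print(status_label(30, 0, True))    # Nearly extinct
```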
The increase in the number of extinct languages and languages with no remaining speakers from the fifteenth edition does not indicate with any precision the number of languages that have been lost since the time of the previous edition, but rather reflects better information concerning some that may have been extinct at some time previous to that publication.
How to identify the level of endangerment of still-used but declining languages, however, is not necessarily clear. As a scholarly consensus forms that can be applied worldwide, a set of metadata for evaluating endangerment is becoming increasingly possible (Lewis 2008). Sociolinguists and linguistic anthropologists seek to identify trends in language use, through the description of some direct measures of language vitality such as changes in the number of speakers or in the use of the language in certain domains or functions. Less directly, an increase in bilingualism, both in the number of bilinguals and in their proficiency levels, is often associated with these trends, though a high level of bilingualism is not, in itself, a sufficient condition for language shift or death. When data are available, the following factors which may contribute to the assessment of language endangerment are reported in the language entries: low speaker population (Grimes 1986), the population number of those who connect their ethnic identity with the language (whether or not they speak the language), the stability of and trends in that population size, bilingualism data, language attitudes within the community itself and the attitudes towards the language by outsiders, the age(s) of the speakers, the domains of use of the language, residency and migration patterns of speakers, industrialization and modernization trends, official recognition of languages within the nation or region, and whether or not children are learning it at home or being taught the language in schools. Such factors interact within a society in dynamic ways that are not necessarily predictable. 
With this edition of the Ethnologue we begin reporting an estimate of the level of disruption of intergenerational language transmission for a small number of languages using an adaptation of Fishman’s (1991) Graded Intergenerational Disruption Scale (GIDS); see the definition below in “Layout of language entries” under viability remarks. We encourage users of the Ethnologue to provide us with additional data for inclusion in future editions.
Following this introduction, the content of the book consists of six major sections.
Statistical Summaries. This section offers a summary view of the world language situation. Specifically, it offers numerical tabulations of languages and number of speakers by world area, by language size, by language family, and by country.
Languages of the World. This section provides detailed information on the 7,413 languages listed in this edition. These break down as 6,909 living languages, 55 macrolanguages, 28 languages used only as a second language, and 421 recently extinct languages. The language entries are organized by five major geographic areas (Africa, Americas, Asia, Europe, and Pacific) and by country within each area. Each country is introduced by an overview paragraph (see “Layout of country headers” below). Language entries under each country are in alphabetical order by primary language name and provide a summary description of the language, structured by a set of categories (see “Layout of language entries” below). Note that a full bibliography of references cited is included at the end of this section.
Language Maps. Languages spoken in 157 countries are shown on 201 pages of country maps. Continent maps and a world map are given to help orient the reader to the location of specific countries and continents. The maps are ordered by continent, then country. However, there are occasional deviations from strict alphabetical order of countries in order to maintain the integrity of maps on facing pages. No political statement is intended by the placement of any boundary lines for any languages or countries on any map. See “Language maps” below for more details on the maps and how they were produced.
Language Name Index. This is an index of the 41,186 unique names that are associated with the 7,413 languages in this edition. It is an alphabetical list of all the language names, alternate names, dialect names, and alternate dialect names that appear in the language entries. Instructions for using the index are given on its first page.
Language Code Index. This is an alphabetical index of the 7,413 three-letter language identification codes from ISO 639-3 that are used throughout the volume. Instructions for using the index are given on its first page.
Country Index. This alphabetical index of country names lists the page on which the section for that country begins in the main part of the book, and the page on which its maps begin if there are any.
Languages are listed by country under each major geographic area. The country names used as section headings are not official names, but the commonly known names of the countries in English. The entry for a country begins with a header paragraph giving summary information about the country. This header is followed by an entry for each language of the country that is not a recent immigrant language. The country header has the following form:
Official country name. Country population. National or official languages. Country literacy rates. Non-indigenous languages. Sources of information. Blind population. Deaf population. Language counts. Map information.
Official country name. This is the name used by the country in its official documents. In most cases this differs from the popular English name as given in the section title and in the Country Index. There may be more than one official name listed in different languages.
Country population. These figures are taken from the most recent national census data where available or are the 2005 estimated population from the United Nations.
National or official languages. National languages are those languages spoken by a large portion of the population of a nation. Official languages are those that have been designated as such by an official body. We do not distinguish these two kinds of recognition.
Country literacy rates. These rates are estimates of the percentage of the population in the country that is literate in some language. Data are from various sources.
Non-indigenous languages. Non-indigenous languages are categorized as such if they are spoken by relatively recently arrived or transient populations which do not have a well-established, multi-generational community in the country. The label “Immigrant languages:” is used to introduce a list of known non-indigenous languages in the country. Population estimates if known are shown in parentheses immediately following the language name. These languages are not given their own language entries and are not included in the language counts for that country. Where more general information is known about the presence of immigrant speakers from particular countries or regions (without knowledge of the specific languages), such information is added in a statement that begins with “Also includes.” Given the transitory nature of these populations, this information may be inaccurate or incomplete.
Sources of information. The major sources for our data for each country are given. This list includes both published and unpublished sources (personal communications, etc.). Citations of published sources in the text of Ethnologue follow the conventional format of author surname followed by publication year. Personal communications, unpublished, and more general sources such as censuses, are identified by placing the year before the name of the source. The Bibliography provides full bibliographic references for the published sources cited in the country headers and language entries. Personal communications and other unpublished sources are not listed in the Bibliography.
Blind population. There are reported to be from 23,000,000 to 40,000,000 or more blind people in the world. Information from various sources on the number of blind people in each country is given in the country header. Information about the availability of Braille codes and Braille literature is given under specific languages. Readers are encouraged to submit additional information on the number of blind people in specific language groups, availability and standardization of Braille codes, and literature published or in progress. See “Updates and corrections” below for submission instructions.
Deaf population. There are millions of deaf and hearing-impaired people in the world. The country header gives information on the number of audiologically deaf people (which is generally larger than the number of deaf people who use a sign language) and an approximate count of the deaf institutions (schools, clubs, associations). The deaf sign languages listed in language entries are those used exclusively within deaf communities. They do not include those, like Signed English, that spell out spoken languages used in the country. See the fuller discussion above under “The problem of language identification.” Please send additional information on deafness and deaf sign languages to the Ethnologue editor. See “Updates and corrections” below for submission instructions.
Language counts. The number of individual languages indigenous to the country is given along with a breakdown of the number of living individual languages, the number that are extinct or have no remaining speakers, and the number that are used only as a second language. Macrolanguages are not included in these counts since they are not distinct from, but overlap, the individual languages that are already counted.
Map information. If the location of languages within the country in focus is shown on one or more maps in Part II, “Language Maps”, a reference is given to the exact page number on which the set of maps begins.
Many languages are spoken in more than one country, and so are listed under several countries. (In fact, this results in 9,222 entries for the 7,413 languages listed in this edition.) One of the countries is considered primary, usually the country of origin or country where most of the speakers are located. More information about a language is given in its entry in the primary country than in the others. An entry for a language in a non-primary country ends with the words “See more information under...” giving a cross-reference to the primary country. A complete entry for the primary country has the following form and content:
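The distinction between entries and languages, and the cross-reference from a non-primary entry to the primary country, can be sketched as follows. All data here are hypothetical: "qaa" and "qab" are placeholder codes (to our understanding, the range qaa to qtz is reserved for local use), the country names are invented, and the function names are illustrative.

```python
# Hypothetical (code, country, is_primary) rows; codes and countries are placeholders.
ENTRIES = [
    ("qaa", "Country A", True),
    ("qaa", "Country B", False),
    ("qab", "Country B", True),
]


def primary_country(code):
    """Return the primary country recorded for a language code, or None."""
    for entry_code, country, is_primary in ENTRIES:
        if entry_code == code and is_primary:
            return country
    return None


def cross_reference(code, country):
    """Closing words of a non-primary entry; empty for the primary entry."""
    primary = primary_country(code)
    if primary is None or primary == country:
        return ""
    return f"See more information under {primary}."


entry_count = len(ENTRIES)                                # one per language per country
language_count = len({code for code, _, _ in ENTRIES})    # each code counted once

print(entry_count, language_count)             # 3 2
print(cross_reference("qaa", "Country B"))     # See more information under Country A.
```

Counting distinct codes rather than entries is what keeps a language spoken in several countries from being counted more than once.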
Primary language name [ISO code] (Alternate names). Country speaker population. Population stability comment. Population in all countries. Monolingual population. Population remarks. Ethnic population. Location. Class: Linguistic affiliation. Macrolanguage membership. Dialects: Dialect names. Intelligibility and dialect relations. Lexical similarity. Lg Use: Language function. Bilingualism remarks. Domains of use. User age groups. Language attitudes. Viability remarks. Lg Dev: Literacy rates. Literacy remarks. Use in elementary or secondary schools. Publications and use in media. Writing: Scripts used. Other: General remarks. Linguistic typology. Religion. Status. Map: Map information.
Within the language entry, italicized labels are used to organize the entry into topical sections. A label appears in a given entry only if the entry contains one or more of the pieces of information associated with that topic. Seven such labels are used:
- Class for the language classification, including macrolanguage membership if applicable;
- Dialects for information about the names of dialects of the language, including information on lexical similarity and intelligibility with other varieties if available;
- Lg Use for information about the use and viability of the language and the use of other languages by the community;
- Lg Dev for information about literacy rates, written materials, and use in education;
- Writing for information about writing systems and scripts;
- Other for all additional information; and
- Map for cross-references to the maps in Part II.
Not every one of these categories of information is available for every language.
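As a rough illustration of how the labeled sections might be separated in a plain-text copy of an entry, the following Python sketch splits an entry string on the seven labels. The function name split_entry, the assumption that each label is followed by a colon (as in the template above), and the sample entry are all hypothetical. The printed volume marks labels with italics, and the code [qaa] is taken from the range that ISO 639-3 reserves for local use.

```python
import re

# The seven topical labels named in the text above.
LABELS = ["Class", "Dialects", "Lg Use", "Lg Dev", "Writing", "Other", "Map"]

# Assumption: in plain text each label is followed by a colon, as in the
# entry template. The printed volume uses italics instead.
_LABEL_RE = re.compile(
    r"\b(" + "|".join(re.escape(label) for label in LABELS) + r"):\s*"
)


def split_entry(entry: str) -> dict:
    """Split an entry into a header plus whichever labeled sections occur."""
    # With a capturing group, re.split returns
    # [header, label1, text1, label2, text2, ...].
    parts = _LABEL_RE.split(entry)
    sections = {"header": parts[0].strip()}
    for label, text in zip(parts[1::2], parts[2::2]):
        sections[label] = text.strip()
    return sections


# Made-up entry for illustration only; not taken from the Ethnologue.
sample = (
    "Examplish [qaa] (Exampli). 1,200 (2000 census). Northern hills. "
    "Class: Examplic, Northern. Dialects: East, West. Writing: Latin script."
)
sections = split_entry(sample)
```

Because a label appears only when its section has content, the resulting dictionary simply lacks keys for absent sections (here, Lg Use, Lg Dev, Other, and Map).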
Primary language name. Each entry begins with the name used to refer in English to that language in that country. In most cases the name is the one that the speakers prefer if such a preference is known. However, speakers within a language community may have different opinions about which name they prefer. Known preferred names are recorded using English spellings, though diacritical marks may be included. Among Khoisan languages and a few other languages in southern Africa special symbols are used in language names to represent the “click” sounds produced with ingressive mouth air.
ISO code. The code assigned to the language by the ISO 639-3 standard (ISO 2007) is given in lower-case letters within square brackets. When a given language is spoken in multiple countries, all of the entries for that language use the same three-letter code. The code distinguishes the language from other languages with the same or similar names and identifies those cases in which the name differs across country borders. These codes ensure that each language is counted only once in world or area statistics.
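The way a shared code keeps a language from being counted twice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name count_languages and the sample headers are hypothetical, and the codes come from the ISO 639-3 range reserved for local use rather than from real entries.

```python
import re

# ISO 639-3 codes are three lower-case letters shown in square brackets.
CODE_RE = re.compile(r"\[([a-z]{3})\]")


def count_languages(entry_headers):
    """Return (number of coded entries, number of distinct languages)."""
    codes = []
    for header in entry_headers:
        match = CODE_RE.search(header)
        if match:
            codes.append(match.group(1))
    return len(codes), len(set(codes))


# Hypothetical headers: one language listed under two countries with
# different names, plus one other language.
headers = [
    "Examplish [qaa]",
    "Exampli [qaa]",
    "Samplese [qab]",
]
entries, languages = count_languages(headers)
```

Here three entries collapse to two languages, because the two differently named entries carry the same code.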
Alternate names. Many languages are known by or have been referred to by more than one name. The 9,222 entries list 26,241 alternate names to assist the reader in identifying a language. These are enclosed in parentheses and listed in alphabetical order separated from each other by commas. Alternate names come from many diverse sources: speakers may have more than one name for their language, or neighboring groups may use different names. Other names may have been assigned by outsiders and used in ethnographic or linguistic publications before the name used by the speakers themselves was known. Another source of alternate names is variant spellings of what is essentially the same name. In many cases, spellings used in languages of wider communication or regional languages are also included in this list. Some of the names listed may actually be ethnic or place names that have been used in the literature as names for the language.
Some names in use by others are offensive to the speakers of the language. Those are identified by enclosing the name in double quotation marks and appending the label pej. (pejorative) following the name. We list these names as a means of helping users find languages they may have only heard referred to by such names. Ethnologue does not imply any endorsement of the pejorative names.
Country speaker population. The first population figure given is the estimated number of first-language (L1) speakers in the country in focus. Where it is available we provide the source and date of the information in parentheses. Differences among sources and differences in dates when the estimates were made may cause the totals of the populations for all of the languages in any given country to differ from the total population of the country.
We do not extrapolate population estimates to bring them up-to-date, since populations do not increase at the same rate in all language groups within a country and since some starting estimates themselves turn out later to have been incorrect. However, some population data submitted to the Ethnologue may be the result of extrapolation.
The Ethnologue provides the number of first-language speakers wherever possible. It is often difficult to get an accurate figure for the speakers of a language. All figures are only estimates—even census figures. Some sources do not include all dialects in their figures or may count as a single language two languages identified separately in the ISO 639-3 inventory. Some sources count members of ethnic groups, who, in some cases, may not be speakers of the language. Some sources do not make clear whether they refer to the total number of speakers in all countries, or only to those in one of the countries. Some do not distinguish first-language (L1) speakers from second-language (L2) speakers.
As described above in “Endangered Languages”, languages that are no longer in use, but still have ethnic group members who identify with the language, are listed as having “No known speakers” in place of a population figure. Languages that have neither societal use nor remaining ethnic group members are described as “Extinct”.
Dates and sources for population data are given where available. Where the word “census” appears as the source, it is generally the most recent available national census of the country and is not cited separately in the Bibliography.
Population stability comment. For some languages, we are able to indicate whether the speaker population is increasing or decreasing. This information also contributes to an overall evaluation of ethnolinguistic vitality. There may be a few cases where the actual speaker population count is not known or is unreported, but the stability of the population is evident and has been commented on.
Population in all countries. When a language has first-language speakers in more than one country, the entry for the primary country lists the total speaker population for all countries. Since information may come from multiple sources, the sum of the individual country populations may not equal the figure given for all countries. In some cases, the population of one or more countries may not be available.
Monolingual population. Where the data are available, the number of those who are monolingual is reported. This number can be compared with the total speaker population as one way to estimate the vitality of the language.
Population remarks. Additional information concerning populations may include population breakdowns (by dialect, gender, ethnic groups, or specific villages or communities), the population of the deaf community, or other comments on demographics.
Ethnic population. Where it is known, the population of those who identify themselves as part of the ethnic group is given. A language with no first-language speakers will be reported as extinct when the ethnic population figure is zero, absent, or unknown. When the speaker population is zero but there is an ethnic population figure, the language will be reported as having “No known speakers”.
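The rule in this paragraph can be written out as a small decision function. The sketch below is one reading of the text, not the editors' actual procedure. The function name speaker_status and the label "Living" for languages that do have first-language speakers are placeholders introduced here, and languages used only as second languages (described under Status) are outside its scope.

```python
from typing import Optional


def speaker_status(l1_speakers: int, ethnic_population: Optional[int]) -> str:
    """Classify a language from its L1 speaker count and ethnic population.

    ethnic_population is None when the figure is absent or unknown.
    """
    if l1_speakers > 0:
        return "Living"  # placeholder label, not an Ethnologue term
    if ethnic_population:  # known and greater than zero
        return "No known speakers"
    return "Extinct"  # ethnic population zero, absent, or unknown
```

For example, speaker_status(0, 300) gives "No known speakers", while speaker_status(0, None) and speaker_status(0, 0) both give "Extinct".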
Location. A description of the location where the language is spoken is included in each entry where a specific area can be defined. Those languages that are scattered through a country or wide region may not have this information in the entry or may be reported as “Widespread” and may not appear on the country maps. A list of all countries where the language is spoken is provided in the primary entry for a language spoken in multiple countries. Generally, regional locations are listed in descending order from largest geopolitical unit to smallest.
Linguistic affiliation. All languages are slowly changing, and linguistically related varieties may be diverging or merging. Most languages are related to other languages: to some more closely and to others more distantly. Linguists have used terms such as phylum, stock, family, branch, group, language, and dialect to refer to these relationships in increasing order of linguistic similarity. The classification information for each language follows this general order from largest grouping to smallest. More inclusive group names are given first, followed by the names for less inclusive subgroups, separated by commas.
Language classification information comes from a variety of sources. Generally, the organization of linguistic relationships outlined in the International Encyclopedia of Linguistics (Frawley 2003) is followed for most language families. For Austronesian languages, the Comparative Austronesian Dictionary (Tryon 1995) is followed most frequently. Departures from these primary sources are included based on more recent comparative studies as they are reported to us. The traditional (“Guthrie”) numbering system used to identify different subgroups of Bantu languages in Africa has been followed with some amendments based on more recent scholarship. The full set of classification trees is available as a dynamic presentation on the Ethnologue website at http://www.ethnologue.com/family_index.asp. A listing of the highest-level language families (including number of languages, average populations, and countries where spoken) is given below in the “Statistical Summaries” section.
Macrolanguage membership. If an individual language is a member of a macrolanguage (as discussed above in “The problem of language identification”), that fact is reported following the language classification information. The listing gives the name of the macrolanguage, the name of the primary country under which its entry is found (if different from the current country), and the ISO code for the macrolanguage. (See also “Layout of macrolanguage entries” below.)
Dialect names. Speech varieties which are functionally intelligible to each other’s speakers because of linguistic similarity are considered dialects of the same language and listed under that language. In this edition, a total of 11,779 dialect names are listed. In addition, 5,836 alternate names for individual dialects are listed in parentheses following the primary name for the dialect. When one of these names is known to be offensive to its speakers, it is placed in double quotes (and tagged as pejorative with the abbreviation “pej.” as is also done for alternate language names).
This listing of dialect names does not represent the results of rigorous dialectological investigations. As with the alternate names, we list the names of dialects which may have been mentioned in published or other sources. Some of these names are village or regional names and may not actually represent significant linguistic variants. In a few cases, the ISO 639-3 standard has assigned individual language identification codes to varieties which we, on the advice of our contributors and consultants, have included in our list of dialects. In such cases, we depart from the ISO 639-3 standard and do not list these varieties separately as individual languages.
Intelligibility and dialect relations. The ability of the users of one variety to understand another variety, based only on the similarity of those two varieties, is called inherent intelligibility. A measure of inherent intelligibility with other varieties is given as a percentage. Values of less than 85% are likely to signal difficulty in comprehension of the indicated language. Intelligibility may not be reciprocal or mutual, so the wording of the intelligibility description may indicate the direction of the intelligibility (e.g., 85% intelligibility of another variety, or 85% intelligibility by speakers of another variety). If the direction of intelligibility is not indicated (e.g., 85% intelligibility with another variety) or is identified as being mutual, it should be understood as being reciprocal, with speakers of each of the varieties mentioned understanding each other equally well.
The ability of speakers to understand another variety because of previous exposure to it or learning is called acquired intelligibility and may be commented on in a few language entries.
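Because inherent intelligibility can differ by direction, a record of it needs to keep the two directions apart. The following Python sketch shows one way to do that. The variety names A and B, the percentages, and the function names are invented for illustration; only the 85% threshold comes from the text.

```python
# (speakers of, variety being understood) -> percent inherent intelligibility.
# Values are invented for illustration.
intelligibility = {
    ("A", "B"): 90,  # speakers of A understand B at 90%
    ("B", "A"): 70,  # speakers of B understand A at 70%
}

# From the text: values below 85% likely signal difficulty in comprehension.
DIFFICULTY_THRESHOLD = 85


def likely_difficulty(speakers_of: str, variety: str) -> bool:
    """True if these speakers are likely to have difficulty with the variety."""
    return intelligibility[(speakers_of, variety)] < DIFFICULTY_THRESHOLD


def is_reciprocal(a: str, b: str) -> bool:
    """True if each group understands the other equally well."""
    return intelligibility[(a, b)] == intelligibility[(b, a)]
```

In this made-up case speakers of A would probably manage with B, but speakers of B would likely struggle with A, so the relation is not reciprocal.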
Lexical similarity. The percentage of lexical similarity between two linguistic varieties is determined by comparing a set of standardized wordlists and counting those forms that show similarity in both form and meaning. Percentages higher than 85% usually indicate a speech variant that is likely a dialect of the language with which it is being compared. Unlike intelligibility, lexical similarity is bidirectional or reciprocal.
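The calculation described here reduces to a simple proportion once each wordlist item has been judged. The Python sketch below assumes that the judgment of similarity in form and meaning has already been made for each item and is supplied as a boolean. How that judgment is reached is not specified in the text and is not modeled here. The function names and the sample data are hypothetical.

```python
def lexical_similarity(judgments):
    """Percentage of wordlist items judged similar in both form and meaning.

    judgments holds one boolean per wordlist item compared across the
    two varieties.
    """
    if not judgments:
        raise ValueError("at least one compared item is required")
    return 100.0 * sum(judgments) / len(judgments)


def likely_dialect(percent: float) -> bool:
    """From the text: percentages higher than 85% usually indicate a dialect."""
    return percent > 85


# Invented data: 18 of 20 compared items judged similar.
judgments = [True] * 18 + [False] * 2
percent = lexical_similarity(judgments)
```

Since each judgment concerns a pair of forms rather than a listener, the same list of judgments, and so the same percentage, applies whichever variety is taken as the starting point.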
Language function. If a language has been recognized as a national or official language, it is generally identified in Ethnologue as “Official” without differentiating the precise nature of that recognition. A national language is one spoken by a large portion of the population of a nation and recognized as a marker of that national identity. An official language is one that has been designated as such by an official body for the activities of that body. If a language is only given such recognition regionally or within specific geopolitical units of a country, that fact will be noted in the language entry but not necessarily in the country header where official languages are also listed.
Bilingualism remarks. Because second languages are usually learned later than first languages, bilingualism is usually not uniform across a community. When speakers can use a second language, different speakers usually have varying degrees of bilingual proficiency in it, ranging from the ability to use only greetings, to engage in trade, or to freely express anything in the second language. Language groups are sometimes reported to be bilingual if a few of the speakers can use a second language to some degree, or if there are no monolinguals; whereas other sources would not classify groups as bilingual unless a large majority of their members could use the second language very well. Leaders, the educated, men, traders, those who travel, those in population centers, and people in certain age groups may be more bilingual than others. Where information is available, these factors about bilingualism are described.
Domains of use. When more than one language is used in a community, speakers often establish patterns of language use for specific configurations of speakers, topics, and locations. These domains of language use can be described by answering the question, “Who is speaking to whom, about what, and where?” The Ethnologue does not have sufficient data about each language to permit a full description of the domains of use in this technical sense, but uses the term to refer most often to a general set of locations (e.g., home, school, community) and thus only indirectly to the topics and speakers most generally associated with those settings. Knowledge of these patterns of language use can help in evaluating ethnolinguistic vitality and in developing strategies for language revitalization or language development.
User age groups. As language use shifts from a traditional language to one of wider communication, differences in use appear between age groups. As language change takes place, older adults tend to be the final speakers of the traditional language. The use of a language by children is thus a significant indicator of the patterns of intergenerational language transmission which is key to language maintenance.
Language attitudes. What people think and how they feel about their own language is important to those promoting literacy or other development activities as more positive attitudes generally correspond to stronger ethnolinguistic vitality. Attitudes are difficult to assess directly and equally difficult to describe adequately. We report only summary attitude evaluations as Positive, Neutral, or Negative.
Viability remarks. A number of viability indicators are given. Where the language is being passed on to children as their first language, the term “Vigorous” is used. Other indicators are the number of people who use the language as their second language, and the degree of language shift of speakers of this language to a second language (in some cases indicated by the percentage of speakers within the ethnic community). General estimates of viability may be given. In a few cases, an estimate of a language’s position on an adaptation of Fishman’s (Fishman 1991) Graded Intergenerational Disruption Scale (GIDS) is reported; see, for instance, the entries in Kenya. The GIDS places languages on a scale from strongest to weakest using the numbers 1 through 8 as follows: (1) the language is used in education, work, mass media, government nationwide; (2) the language is used for local and regional mass media and governmental services; (3) the language is used for local and regional work by both insiders and outsiders; (4) literacy in the language is transmitted locally through compulsory public education; (5) the language is used orally by all generations and is effectively used in written form throughout the community; (6) the language is used only orally and is being learned by children as their first language; (7) the child-bearing generation knows the language well enough to use it with their elders but is choosing not to transmit it to their children; and (8) the only remaining speakers of the language are members of the grandparent generation.
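The eight GIDS levels lend themselves to a simple lookup table. The Python sketch below paraphrases the level descriptions from the paragraph above. The names GIDS_LEVELS, describe_gids, and is_weaker are introduced here for illustration and are not part of the Ethnologue or of Fishman's work.

```python
# Level descriptions paraphrased from the paragraph above.
GIDS_LEVELS = {
    1: "used in education, work, mass media, and government nationwide",
    2: "used for local and regional mass media and governmental services",
    3: "used for local and regional work by both insiders and outsiders",
    4: "literacy transmitted locally through compulsory public education",
    5: "used orally by all generations and effectively used in written form",
    6: "used only orally and learned by children as their first language",
    7: "child-bearing generation knows it but does not transmit it to children",
    8: "only remaining speakers are in the grandparent generation",
}


def describe_gids(level: int) -> str:
    """Return a short description for a GIDS level from 1 through 8."""
    if level not in GIDS_LEVELS:
        raise ValueError(f"GIDS level must be 1 through 8, got {level}")
    return f"GIDS {level}: {GIDS_LEVELS[level]}"


def is_weaker(level_a: int, level_b: int) -> bool:
    """True if level_a is a weaker position than level_b (1 is strongest)."""
    return level_a > level_b
```

Note that higher numbers mean a weaker position, so a comparison of two levels has to be read in that inverted sense.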
Literacy rates. Where available, percentages of the speaker population who are literate are given for the first (L1) and second (L2) languages. Where identification of the second language is not given, it is assumed to be the national language of the country in focus or other major language in the vicinity.
Literacy remarks. Information concerning motivation for literacy and the existence of government (and other) literacy programs is given where available. Additional information concerning literacy that does not appear in related categories may also be given.
Use in elementary or secondary schools. The language may be used either as a language of instruction or taught as a subject within one or more schools in the language area.
Publications and use in media. The existence of materials that have been produced in the language, such as dictionaries, grammars, and broadcast media, is indicated when known. We report the existence of such materials but do not list titles individually. Where extensive literature and media exist, we identify the language as “Fully developed”. For many languages this information is very incomplete at this time. More information is welcomed, though it is unlikely that the Ethnologue will ever be able to document existing literature in a comprehensive way.
The most widely published book in the world is the Bible with at least portions having been translated and published in 2,546 or 37% of the living languages listed in the Ethnologue. This figure is based on the thorough archival efforts of the United Bible Societies and the American Bible Society. Information about Bible publication for each language is given with the dates of the earliest and most recent published Bible, New Testament (NT), Old Testament (OT), or complete books (portions).
Scripts used. Full statistics on the number of languages that have a written form are not available. However, for each language, the script used for written materials is given if known. Where multiple scripts are in use they are reported in alphabetical order. Where possible we also report any specific style of a script that is used, the dates when a script began to be used or ceased to be used, and other comments regarding writing and orthography. Languages which are known to be unwritten are so identified. Since many languages use the Latin script, that fact is not always reported if its use is obvious.
General remarks. These are general statements about the language or its context that do not fall into other specific categories.
Linguistic typology. For some languages, brief statements are given on constituent order (Subject, Object, Verb = SOV) and other basic features that are of particular interest to linguists. In some cases these listings are extensive in that they cover a wide range of linguistic features. They are no more than broad characterizations, however, and not linguistic descriptions of the language.
Religion. The religious affiliations of the speakers of the language are given where known. These are generally listed in descending order of number of adherents.
Status. While there are many languages that are used as second languages by large populations of speakers, the phrase “Second language only” is used to indicate only those languages which are used as second languages but have no mother-tongue speakers. These may include languages of special use, such as languages of initiation, languages of herb doctors, cants, jargons, or American Plains Indian Sign Language. Increasingly, this categorization may also be applied to languages which previously were considered not to have any remaining speakers but where revitalization efforts are resulting in a community of emerging language users who have learned the language as their second language. Such dormant or reawakening languages are listed in the body of the Ethnologue but are not included among the world and major area statistical totals of living languages. The inventory of these languages is also incomplete and we welcome information that will expand our coverage.
The phrase “Nearly extinct” is used to indicate those languages of fewer than 50 speakers and other languages for which the number of speakers is a very small fraction of the ethnic group and where revitalization efforts, where they exist, have not yet demonstrated any inhibiting effect on the process of language loss.
Map information. If the location of the language within the country in focus is shown on one or more maps in Part II, “Language Maps”, the exact page numbers for those maps are given. If the language is identified on a map by name, but that name differs from the primary name in the entry, the name on the map is given in parentheses. If the language is represented on a map by an index number, rather than by its name, the index number is given following the page number (with a colon as separator).
Layout of macrolanguage entries
Though macrolanguages typically involve multiple countries, the entry for a macrolanguage is listed only once in the country that is considered primary, usually the country of origin or country where most of the speakers are located. Entries for macrolanguages follow the same general format as individual language entries but with much less detail. Further detail is found by consulting the entries for the listed member languages. The layout of a macrolanguage entry is as follows:
Macrolanguage name [ISO code]. A macrolanguage; see Introduction, The problem of language identification, Macrolanguages. Includes: Individual language name [ISO code] (Primary country), Individual language name [ISO code] (Primary country), etc. Population total all countries.
If the primary country for a member language is the same as the country currently in focus, the country name is omitted. Each cross-referenced individual language entry includes a reference back to the macrolanguage entry as part of the language classification information.
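The member list in a macrolanguage entry, including the rule that the country is omitted when it matches the country in focus, can be read with a short regular expression. The following Python sketch is an assumption-laden illustration: the function name parse_members, the language names, and the country names are hypothetical, the codes come from the ISO 639-3 range reserved for local use, and the pattern assumes that language names contain no commas or square brackets.

```python
import re

# name, [code], and an optional (Primary country) for each member.
MEMBER_RE = re.compile(r"\s*([^\[\],]+?)\s*\[([a-z]{3})\](?:\s*\(([^)]*)\))?")


def parse_members(includes: str, current_country: str):
    """Return (name, code, primary country) for each listed member language.

    When the country is omitted, the member's primary country is taken
    to be the country currently in focus, as the text describes.
    """
    members = []
    for name, code, country in MEMBER_RE.findall(includes):
        members.append((name, code, country or current_country))
    return members


# Made-up "Includes:" list for a macrolanguage listed under "Homeland".
includes = "Examplish [qaa], Samplese [qab] (Otherland)"
members = parse_members(includes, "Homeland")
```

The first member has no country in parentheses, so it inherits the country in focus; the second keeps the country given in its own parentheses.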
Maps showing the locations of language homelands are available for most countries of the world. Most of the maps make use of polygons to show the approximate boundaries of the language groups. No claim is made for precision in the placement of these boundaries, which in many instances overlap with those of other languages. Reference numbers are used on some maps where space does not allow the placement of language names. For some maps where the language boundaries are not known, the names or numbers alone appear.
The earliest maps in Ethnologue were developed as part of the Language Mapping Project carried out jointly with Global Mapping International (GMI). All of the maps in this edition have been redrawn with a new and clearer design. We have taken advantage of the capabilities afforded to us by a new generation of software (ArcGIS® provided by ESRI—Environmental Systems Research Institute) to improve the way that we show the family association of each language and the overlap of languages. A greater level of geographic detail has been included in the maps to aid in the location of languages within countries. The maps are drawn using the Digital Chart of the World as the underlying geographic database which has a finer level of resolution than the base map used in previous editions. Consequently, all the language polygons have been repositioned to fit the greater detail of geographic features in this new database. The complete geographic database used to produce these maps (including the language polygons) is available in a product jointly published with GMI named the World Language Mapping System; see http://www.gmi.org/wlms/.
African equatorial countries use the Sinusoidal projection. Other equatorial countries use the Mercator (cylindrical) projection. Maps of countries in higher latitudes use the Lambert Conformal Conic projection.
No political statement is intended by the identification of any territory separately in a map or language entry nor by the placement of any boundary lines for any languages or countries on any map.
The compilation of a body of information such as that presented here requires a cooperative effort on the part of hundreds of contributors. Updates in this edition are largely the contribution of researchers, language fieldworkers, and native speakers of these languages who gave their time and expertise to improve the accuracy and quality of the Ethnologue. Lamentably, space does not permit a listing of every correspondent who has communicated with us since the fifteenth edition was released in 2005. Moreover, the list of contributors over the nearly six decades of Ethnologue publication, whose contributions can still be seen, defies documentation.
The Editor Emeritus, Barbara F. Grimes, continues to provide invaluable assistance. Raymond Gordon not only served as Senior Research Editor for this edition but also provided much needed advice and consultation on policies and procedures. Conrad Hurd, the Managing Editor, has supervised the data entry process, tracked changes, dealt with thousands of electronic and hardcopy communications, and has been a thoughtful and patient coworker. Maggie Frank and Denise Hovland spent thousands of hours entering data. In addition to our Research Editors—Stan Anonby, Ted Bergman, Mark Karan, Stuart Showalter, and Jürg Stalder—research and editorial assistance has been provided by Karl Anderbeck, Charles Fennig, and Barbara Waugh. Lorna Priest supervises and maintains the scripts and writing systems portion of the database. Gary Simons, as Executive Editor, has provided vision, guidance, and most importantly, coordination for the task of bringing this edition to press.
The Ethnologue database is technically supported by Roger Hanggi, Lars Huttar, Ray Uehara, and Paul Walker. Joan Spanne, ISO 639-3 Registrar, has worked diligently to assist us in keeping our database aligned with the ISO 639-3 inventory of identified languages.
The text of this volume was copyedited by Bonnie Brown with the assistance of a large team of proofreaders who were organized by Cyndi Conner. These proofreaders were Judy Benjamin, Raymond Bergthold, David Blood, Doris Blood, Eugene Burnham, Lydia Carlson, Becky Clarke, Vurnell Cobbey, Janet Ervin, Dennis Felkner, Charles Fennig, Ron Gebauer, Lois Gourley, Mack Graham, Monika Hoehlig, Ken Hubel, George Huttar, Kathy Huttar, Mary Huttar, Paul Kroening, Lana Martens, Grace Merrifield, Carolyn Ogden, Ron Radke, Paul Schmidt, Edna Terry, Paul Vollrath, Barbara Waugh, and Mae Zook.
Production of the publication has been coordinated by Dennis Felkner, with production management by Bob Kaiser, graphic design by the team of Barb Alber, Patrick Gourley, and Lori MacLean, and composition and typesetting by Jelle Huisman. Keami Hung has assisted with the database-to-typesetting interface.
The maps have been produced under the direction of SIL’s Lead Cartographer, Irene Tucker, by Matt Benjamin and Michael McMillan with quality assurance and technical assistance from Stephen Tucker.
The data reported reflect the cooperation and communications from hundreds of researchers in the field in SIL and other organizations, but especially the following: Alja Katriina Ahlberg, Micah Amukobole, Ato Balguda, Denise Bailey, Marvin Beachy, Albert Bickford, Douglas Boone, John Brownie, Mike Bryant, Ken Chan, Diana Cohen, Tefera Endalew, Craig Farrow, Ketevan Gadilia, Michael Greed, Teija Greed, Jeff Green, Rikka Halme, David Holbrook, Lydia Hoeft, Roland Horsch, Hope Hurlbut, Ken Hugoniot, Geoffrey Hunt, Greg Huteson, Eric Johnson, Linda Jordan, Andreas Joswig, Izabela Karpienia, Amy Kim, Christine Klaver, Peter Knapp, Erwin R. Kome, Miklós Kontra, Susanne Krueger, Iver Larsen, Randy Lebold, Ritva Lehonkoski, Nina Leong, Carsten Almann Levisen, Carol Magnusson, Natalia Manzienko, Laura McKaig, Scott Merrifield, Hussein Mohammed, Dave Moody, Elena Mosolova, Mundara Muturi, Bea Myers, Andreas Neudorf, Joan Nichols, Steven Nicolle, John Ommani, Eric Pawley, Jamin Pelkey, Maria Polinsky, Suwilai Premsrirat, Kenneth Prettol, Dwayne Rainwater, Calvin Rensch, David Riggs, Jacques Rongier, Mirja Saksa, Larry Salay, Julian Shelton, Ralph Siebert, Bev Stacy, Jürg Stalder, Rosemary Ulrich, Mirjami Uusitalo, Alan Vogel, Vitali Voinov, Dennis Walters, Barb Waugh, Menasseh Wekundah, John Wilner, Katharina Wolf, and Cathryn Yang.
In addition, the following scholars have made significant contributions to the data and its accuracy through personal communications to the editors: Roger Blench, Leoni Bouwer, David Bradley, Matthias Brenzinger, Bernard Comrie, Richard Cook, Jerrold Edmondson, Bev Erasmus, Wesley Leonard, Margaret Muthwii, Jayne Mutiga, Hezy Mutzafi, Nick Nicholas, Martin Njoroge, Derek Nurse, Malcolm Ross, Bonny Sands, Andrew Shimunek, Myriam Vermeerberge, and Valentin Vydrine.
Though there are many names listed above, we have almost certainly omitted others who have made significant contributions. To them we offer both apologies and many thanks.
New and updated editions of the Ethnologue are published on a regular basis. Although this edition contains nearly 60,000 updates and corrections to the previous one, it makes no claim to completeness.
Language additions or deletions. Requests for identification of a previously unidentified language or other modifications to the inventory of identified languages can be made directly to the ISO 639-3 Registrar or by going to the ISO 639-3 website. Change request forms can be downloaded from the website.
Corrections. If you believe any of the information in the Ethnologue is in error, send your proposed change to the editor using one of the addresses given below. Be sure to report the source of your information.
The Ethnologue staff will seek to verify the proposed change before accepting it. This process may take months as it generally involves making enquiries of individuals who are resident in the country where the language is spoken. These persons may in turn make enquiries of others in order to perform the verification. The submitter can expect to receive an acknowledgment from the Ethnologue editor.
Submit corrections and additions by e-mail to the Ethnologue editor.
Or by post to:
7500 West Camp Wisdom Road
Dallas, Texas 75236, U.S.A.
There is still much to be learned concerning the languages of the world and the search for better knowledge goes on.
M. Paul Lewis, Editor
Part of the Ethnologue, 16th Edition, M. Paul Lewis, Editor.
Copyright © 2009, SIL International. All rights reserved.
Copies of the Ethnologue may be obtained from the online Publications Catalog or SIL International Publications*
7500 West Camp Wisdom Road
Dallas, Texas 75236-5629 USA
Tel: (972) 708-7404
Fax: (972) 708-7433
*formerly the International Academic Bookstore