PROF. WOLFGANG TEUBERT
My version of corpus linguistics

Wolfgang Teubert

(Draft, to appear in International Journal of Corpus Linguistics 1/2005)

I decided early on not to publish my own articles in the journal of which I am the editor, except that in the very first issue I attempted to outline my understanding of corpus linguistics, as it was then. It is surely more the excellence of our contributors and of our reviewers than my abstinence that has made this journal flourish. My thanks go to our authors and to the members of our Editorial Board. It is to their credit that the International Journal of Corpus Linguistics is now in its tenth year, strong as ever.

Today, the corpus is considered the default resource for almost anyone working in linguistics. No introspection can claim credence without verification through real language data. Corpus research has become a key element of almost all language study. This is an indication that the paradigm of linguistics is finally becoming more pluralistic once again.

There is nothing new about working with real language data. When, two hundred years ago, the philologists embraced the philosophy of the Enlightenment and set off to find the laws that make language work, they quickly found out that first of all they needed reliable language data. And while some of them never gave up the dream of uncovering the secret eternal laws that make language work, most philologists were happy to make generalisations about the multitude of data they had to deal with. Yet the more language data they had, the more complicated these generalisations became. While the members of all language communities may be born with the same language faculty, languages are constantly in a state of flux, and the changes they undergo are not predictable. As a result, languages are as much the result of dubious analogy, random idiosyncrasy and downright anomaly as they are a rule-based manifestation of a (hopefully systematic) universal language faculty.
Of course, this mortifying unruliness goes mostly unnoticed if I use the corpus only to extract the examples which fit with my hypothesis. Corpus linguistics goes further. It also wants to describe what cannot be explained.

To mark the transition from a half-yearly to a quarterly publication mode, it may be appropriate to try once more to pin down some of the ideas which I think today constitute corpus linguistics as a theoretical approach to the study of language. The field has changed over the last ten years. While it used to be seen as a set of methods primarily for lexicography and language teaching (i.e. for applied linguistics), it now also offers a perspective on language that sets it apart from received views or the views of cognitive linguistics, both relying heavily on categories gained from introspection rather than from the data itself. Elena Tognini Bonelli discusses this development through the concept of a corpus-driven analysis, replacing the former theoretically less specific corpus-based methods.

The views presented here encompass what I have learnt from discussions with many other linguists, corpus and otherwise. I am particularly indebted to John Sinclair and Michael Stubbs, who belong, for me, to the scholars most influential in defining the principles and goals of our still young discipline. Stubbs’ comments on an earlier version of my theses have been most helpful. He has indeed supplied the wording of thesis 10. John Sinclair is as creative and open-minded as ever. While the continuity in his work is stunning to observe, he constantly keeps widening the horizons of his linguistic enquiries. I am also very grateful to Geoff Barnbrook and Anna Cermakova for a host of useful comments, clarifications and corrections.

This is then a new attempt to outline my understanding of corpus linguistics as I see it today. This vision will not be shared by all corpus linguists.
Only if the discourse of corpus linguistics remains controversial and pluralist will there be progress. New ideas gain their contours in contrast to what they reject. If my version of corpus linguistics serves this purpose, it will have succeeded. Dissenting voices are welcome. With the new format of the International Journal of Corpus Linguistics and its quarterly publication rhythm it will be possible, from time to time, to open up a discussion forum.

Corpus linguistics in 25 theses:
A few notes on these theses may be appropriate.

For me, corpus linguistics and cognitive linguistics are two complementary but ultimately irreconcilable paradigms. Cognitive linguistics is interested in the minds of speakers and hearers. How do we turn our thoughts into speech? How do we transform what we hear or read into thoughts? If we are all born with the same language faculty, how can we draw the line between what is common to all languages, and what is specific to individual languages? To what extent is our linguistic performance governed by universal laws underlying our language faculty, and to what extent is it governed by the cultural conventions of the language community in which we have grown up? All this does not concern corpus linguistics.

Corpus linguistics looks at language not from a psychological, but from a social perspective. Verbal interaction is what allows human social groups to be infinitely more complex than even groups of the apes, our closest relatives. Whatever apes do, they are not aware that they are conveying content and they cannot collectively reflect on the content they are conveying. Humans are unique in that they can negotiate content collectively. We can promise. We can instruct. We can report. We can plan. We can exchange arguments. And we can make ourselves interactively aware of what one of us is doing, how he or she is feeling, and what he or she is believing. Without this verbal interaction, we would be largely incapacitated. Without an audience, imagined or real, it would not make sense to formulate our thoughts. We would not come up with new thoughts. We would find ourselves in a blind alley. This is why we call solitary confinement torture. Language is, first of all, a collective activity. As Wittgenstein has shown, there is no private language.

In an informal conversation, we are concerned, first of all, with people, and only to a minor extent with the texts they contribute.
We try to infer their state of mind, their attitudes, their feelings and their intentions, from what they say. Whenever we are in doubt we are able to probe into their minds, by reacting in a special way to what they say. Speaker and hearer are both on the scene. For written language, this is not normally the case. The author has to anticipate the reader’s reactions. The reader cannot question him or her, and cannot even be sure who the author is. All the readers have is the text. They are concerned with the meaning of the text and not with the intentions of its authors. This is the situation in which the corpus linguist finds himself or herself.

Speech and writing are not ontologically as distinct as it has been customary to think. We can look at certain manifestations of orality, such as the verbatim repetitions of texts by specially trained people, as a way of writing. The same goes for rituals involving language, even the lyrics of songs. Then there are ideograms of all kinds, and again it is a question of perspective what we consider as ideograms. There is writing replacing speech and imbued with many of its features, such as SMS messaging and Internet chatroom conversation. Perhaps we should look at the division between writing and speech as two distinct theoretical categories which in practice are only the extreme points of a cline.

To restrict corpus linguistics to the analysis of written language is a rather contentious claim. I expect few of my colleagues to agree with me on this point. But I want us to be aware of the inherent differences between spoken and written language. If we want to investigate speech, we have to make sense of the nonverbal elements from which it cannot be detached. To describe social interaction in a wider frame we have to apply methods provided by other disciplines, such as sociology and cultural anthropology, but what we find there is hardly satisfactory.
How, indeed, can we find out about the meaning of nonverbal interaction? Do we not have to listen to the interpretations people give, rather than assign our own categories to it?

The linguist is a specialist in investigating texts, not in analysing the real world. His or her knowledge of the world is in no way privileged. When linguists come across a sentence such as “The sweetness of this lemon is sublime”, their task is not to assert or reject it, but to look to see if other testimony in the discourse does or does not provide supporting evidence. But the exclusion of ‘real world’ data is not just a methodological question. It is also the consequence of the failure of the application of any sort of realism to the meaning problem.

To exclude the real world from semantic analysis seems to contravene our intuitive sense of language. We all grow up believing that language should be a mirror of reality. Some proposition either is the case or it is not, and this can be decided on the basis of its factuality. By nature, we are all realists. We intuitively believe that the categories in which we think are ‘natural’, and that, for instance, the colours we assign to objects are ‘really’ there. However, there was no word for the colour orange in German before the middle of the 19th century, there are languages in which there is no equivalent of green, and it seems that in Homeric times people were much less interested in colours than in the differences of surfaces. There are no colours in the real world; colours are a mental phenomenon. But they are, it seems, a contingent phenomenon. There might well be discourse communities which do not negotiate the colours of things. We learn what is red or yellow or blue when we learn our mother tongue. There is no innate universal ontology that makes us all see the world in the same way, and there is even less one that would make us see the world as it is.
Even if we had direct access to the ontology, be it of the world in our minds or of the metaphysical world, it would not help us to find the meaning of what is said, for the only way to perceive reality is by experience. The discourse does not contain the assembled experience of the members of the discourse community. It only contains the testimony of these experiences. How the real world is related to first-person experience, and how this experience is related to testimony is not within the remit of linguistics.

If someone claims he or she sees a lion, and someone else insists it is a tiger and not a lion, because the animal has stripes, all the linguist can do is to invoke other texts. Though stripes seem to be commonly mentioned in connection with tigers, somebody can still insist that the animal he or she has seen was a tiger, albeit one without stripes. How could the linguist prove that such tigers do not exist? The linguist’s task is to find out if this is the only discourse occurrence of a stripeless tiger, or if there are other cases. Indeed, as I am writing this, Google lists 270 hits for the query ‘tiger “without stripes”’. Again, let us be clear: Google is a virtual corpus of the [English, in this case] discourse and contains only testimony; it does not help us with reality. Neither does the Encyclopaedia Britannica. Again it presents nothing but testimony. As every judge knows, testimony can be accurate or not, and the principal way to determine what we consider its accuracy is to bring in more testimony.

Thus the meaning of the lexical item tiger, which is the meaning of the discourse object tiger, is removed by so many steps from what a tiger is in reality that reality does not matter here. Are there really tigers out there? Do angels really exist? For the corpus linguist, this is not the question. What is important to know is that tigers normally come with stripes and angels normally come with wings.
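The check described above, asking whether a stripeless tiger is attested elsewhere in the discourse, can be illustrated with a small sketch. It assumes a toy corpus of invented sentences standing in for real testimony; the function name `count_matching` and all example sentences are hypothetical and only serve the illustration.

```python
import re

# Toy corpus: invented sentences standing in for discourse testimony.
corpus = [
    "The tiger is known for its orange coat and black stripes.",
    "We saw a tiger without stripes at the reserve.",
    "A tiger hides its stripes in the tall grass.",
    "The lion has no stripes at all.",
    "Some say a white tiger without stripes was born last year.",
    "The tiger slept all afternoon.",
]

def count_matching(sentences, node, pattern):
    """Count sentences that mention `node` and also match `pattern`."""
    node_re = re.compile(rf"\b{re.escape(node)}s?\b", re.IGNORECASE)
    pat_re = re.compile(pattern, re.IGNORECASE)
    return sum(1 for s in sentences if node_re.search(s) and pat_re.search(s))

# How often is the tiger mentioned at all, with stripes, and without them?
tiger_total = count_matching(corpus, "tiger", r".")
tiger_with_stripes = count_matching(corpus, "tiger", r"\bstripes\b")
tiger_without_stripes = count_matching(corpus, "tiger", r"\bwithout stripes\b")

print(tiger_total, tiger_with_stripes, tiger_without_stripes)
```

The counts say nothing about tigers in reality; they only show how often the discourse, here a handful of made-up sentences, brings tigers and stripes together.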
If corpus linguists cannot investigate meaning by looking into the heads of speakers and hearers, and if they do not know anything about reality, what can they do? They can search for paraphrases. What is a paraphrase? A paraphrase is anything that tells us something about the text segment in question.

We can look at dictionary definitions as paraphrases, paraphrases made up by lexicographers. These paraphrases are, in principle, not privileged. Nobody has to agree with a lexicographer’s definition. One of the reasons why they are often taken as gospel is that, firstly, we tend to believe the experts, and secondly, the definitions of a given word are astonishingly similar if we consult a number of dictionaries. There is a simple explanation for it. Lexicographers will consult other dictionaries when they work on a new one. Another reason is that in some societies, the English being one of them, people have been educated to place more trust in the paraphrases proffered by lexicographers. This has to do with the wealth of the English vocabulary. Everyone constantly stumbles over unknown words. A dictionary seems like a straightforward way to find their meaning. But though the lexicographers’ paraphrases are usually more professional than those of other people, and should be based on a broad sample of relevant citations, lexicographers can be as biased as everyone else. What they say is part of the discourse; it is not outside it. It can be accepted, rejected or negotiated.

Once we develop a feeling for paraphrases, we become aware that they abound in the discourse. Whenever someone enters a new idea into the discourse, they not only need a name for it, they also have to describe it. It was in the mid-nineties that the word globalisation ceased to be a term familiar only to economists and, within a year’s time, became a household word. Journalists had to explain it to their readers. We still find their resonance on the Internet.
One of the paraphrases reads: “Globalisation is the latest hope embraced by capitalist commentators for the salvation of their system.” Another one tells us “Globalisation is a great force for good.” Google lists 63600 hits for ‘“globalisation is”’.

We find paraphrases for whatever people take to be a unit of meaning. There are 407 hits for ‘“friendly fire means”’, the first one of them reading: “Friendly fire means getting shot by your own people”. People feel the need to explain and define discourse objects when they are under discussion: “The right to choose means that women are entitled to abortion.”

The paraphrases we find in the discourse outside of dictionary definitions do not normally distinguish between lexical and encyclopaedic meaning. It is a useless distinction, brought in by linguists, which is not supported by discourse evidence. When we talk about an expression, we talk about what it stands for, namely the discourse object, and this object is represented, identified, explained, and defined solely by the potpourri of paraphrases that others have used before us.

Conscientious lexicographers have always taken paraphrases they found in texts seriously. Their own paraphrases attempted to smooth over differences and to deliver a more or less final judgment, just like a judge in a trial after hearing all the testimony. But saying something with which the lexicographic establishment disagrees is not a crime. Nobody should be prosecuted for uttering their discontent with mainstream opinion. This is what makes paraphrases so essential: they tell us what has been said and can be said about a discourse object. For a corpus-driven theory of meaning, they are crucial. They may contradict each other, they may describe something in such irreconcilable features that it is hard to see it as the same thing, but taken together in all their chaotic diversity they are the very material meaning consists of. Society thrives on the diversity of the discourse.
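Queries such as “globalisation is” or “friendly fire means” amount to a simple pattern search. The following is a minimal sketch of that idea, not a description of any existing tool: the function name `find_paraphrases` is hypothetical, the corpus is a toy list built from the sentences quoted above plus one invented filler sentence, and a real study would run over a large corpus with more careful sentence handling.

```python
import re

# Toy corpus: the first four sentences are the paraphrases quoted in the text,
# the last one is an invented sentence that should not match any query.
corpus = [
    "Globalisation is the latest hope embraced by capitalist commentators "
    "for the salvation of their system.",
    "Globalisation is a great force for good.",
    "Friendly fire means getting shot by your own people.",
    "The right to choose means that women are entitled to abortion.",
    "Prices rose again last month.",
]

def find_paraphrases(sentences, expression, verbs=("is", "means")):
    """Return what follows '<expression> is/means' in each matching sentence."""
    verb_alt = "|".join(re.escape(v) for v in verbs)
    pattern = re.compile(
        rf"\b{re.escape(expression)}\s+(?:{verb_alt})\s+(.+?)[.!?]?$",
        re.IGNORECASE,
    )
    found = []
    for sentence in sentences:
        match = pattern.search(sentence.strip())
        if match:
            found.append(match.group(1))
    return found

for expression in ("globalisation", "friendly fire", "the right to choose"):
    print(expression, "->", find_paraphrases(corpus, expression))
```

The sketch collects the paraphrases without ranking or reconciling them, which matches the point made above: contradictory paraphrases are all part of the material.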
Any enforced standardisation of word meaning would be its death. Attempts have been made. In the socialist bloc, the language used on public occasions or in the media was strictly controlled. This was the main hindrance to developing these societies further. Over the years, a separate language used among friends, colleagues and family developed, and the more it moved away from officialese, the less the establishment was connected to the people. All forms of progress depend on the possibility of finding an expression for new ideas and of negotiating these ideas in the discourse. Once a society stigmatises divergent content, it strangles the plurality of the discourse. In a homogenised discourse, you can only safely repeat what others have said before. Any attempt at innovation would be dangerous. In such a society the only way in which discourse can be used for one’s social advancement is by making increasingly hyperbolic statements. A typical example is, it seems to me, the public discourse in the United States, a discourse which is very much isolated from what is discussed in the rest of the world. It is a discourse in which a broad consensus between media and administration has helped to filter out certain undesirable ideas. People who want to be heard in such a homogeneous discourse cannot question established wisdom. They can only expand on what has been said before. For me, this would provide an explanation for the growing unilateralism of America. A creative discourse, on the other hand, is a pluralist discourse, a discourse in which each member of the discourse society is encouraged to participate in the negotiation of meaning. This discourse presupposes a democratic discourse community. Culture has been described, by Edward Tylor, as “that complex whole which includes knowledge, belief, art, morals, law, custom and any other capabilities and habits acquired by man as a member of society”. All the facets named here presuppose language. 
Knowledge has to be transmitted, beliefs must be expressed, art must be interpreted, morals have to be negotiated, laws have to be inscribed, and customs handed down. Language is always at the centre of culture. The study of language thus could be seen as the core module of the humanities, i.e. of the interpretative disciplines.

But the study of language encompasses more than the description of the language system (if there is such a thing) of rules, usage and meaning. If we study the discourse as the container of a culture of a community, then we must have the means to specify what each text or text segment contributes to it. We must be able to make specific claims. We must see our task in the interpretation of individual texts and their (overt or covert) relationships to other texts. This means we must deal with them as unique occurrences. We must read them as repetitions of or reactions to what has been said before and what is being said elsewhere. We must acknowledge that the discourse has, by necessity, a diachronic dimension. We have to detect the traces that earlier texts leave in subsequent texts. We have to account for the iterability of writing. More important, we have to be aware that our interpretations of texts are not outside or on top of the discourse but part of it. This is why they are always provisional, for we never have the whole discourse at hand. No selection can avoid being contingent. New texts are always on the horizon.

Corpus linguistics is and will remain an imperfect methodology to make sense of the discourse. For me, it is not so much a theory of language as a conceptual frame for studying the transmission of content in a discourse community, as evidenced in the intertextuality of the discourse. Corpus linguistics localises the study of language, once again, firmly and deliberately, in the Geisteswissenschaften, the humanities.