The University of Birmingham
Prof. Wolfgang Teubert

My version of corpus linguistics

Wolfgang Teubert

(Draft, to appear in International Journal of Corpus Linguistics 1/2005)

I decided early on not to publish my own articles in the journal of which I am the editor, except that in the very first issue I attempted to outline my understanding of corpus linguistics, as it was then. It is surely more the excellence of our contributors and of our reviewers than my abstinence that has made this journal flourish. My thanks go to our authors and to the members of our Editorial Board. It is to their credit that the International Journal of Corpus Linguistics is now in its tenth year, strong as ever.

Today, the corpus is considered the default resource for almost anyone working in linguistics. No introspection can claim credence without verification through real language data. Corpus research has become a key element of almost all language study. This is an indication that the paradigm of linguistics is finally becoming more pluralistic again.

There is nothing new about working with real language data. When, two hundred years ago, the philologists embraced the philosophy of the enlightenment and set off to find the laws that make language work, they quickly found out that first of all they needed reliable language data. And while some of them never gave up the dream of uncovering the secret eternal laws that make language work, most philologists were happy to make generalisations about the multitude of data they had to deal with. Yet the more language data they had the more complicated these generalisations became. While the members of all language communities may be born with the same language faculty, languages are constantly in a state of flux, and the changes they undergo are not predictable. As a result, languages are as much the result of dubious analogy, random idiosyncrasy and downright anomaly as they are a rule-based manifestation of a (hopefully systematic) universal language faculty. Of course, this mortifying unruliness goes mostly unnoticed if I use the corpus only to extract the examples which fit with my hypothesis. Corpus linguistics goes further. It also wants to describe what cannot be explained.

To mark the transition from a half-yearly to a quarterly publication mode, it may be appropriate to try once more to pin down some of the ideas which I think today constitute corpus linguistics as a theoretical approach to the study of language. The field has changed over the last ten years. While it used to be seen as a set of methods primarily for lexicography and language teaching (i.e. for applied linguistics) it now also offers a perspective on language that sets it apart from received views or the views of cognitive linguistics, both relying heavily on categories gained from introspection rather than from the data itself. Elena Tognini Bonelli discusses this development through the concept of a corpus-driven analysis, replacing the former theoretically less specific corpus-based methods. The views presented here encompass what I have learnt from discussions with many other linguists, corpus and otherwise. I am particularly indebted to John Sinclair and Michael Stubbs, who belong, for me, to the scholars most influential in defining the principles and goals of our still young discipline. Stubbs’ comments on an earlier version of my theses have been most helpful. He has indeed supplied the wording of thesis 10. John Sinclair is as creative and open-minded as ever. While the continuity in his work is stunning to observe, he constantly keeps widening the horizons of his linguistic enquiries. I am also very grateful to Geoff Barnbrook and Anna Cermakova for a host of useful comments, clarifications and corrections.

This is then a new attempt to outline my understanding of corpus linguistics as I see it today. This vision will not be shared by all corpus linguists. Only if the discourse of corpus linguistics remains controversial and pluralist will there be progress. New ideas gain their contours in contrast to what they reject. If my version of corpus linguistics serves this purpose, it will have succeeded. Dissenting voices are welcome. With the new format of the International Journal of Corpus Linguistics and its quarterly publication rhythm it will be possible, from time to time, to open up a discussion forum.

 

Corpus linguistics in 25 theses:

  1. The focus of corpus linguistics is on meaning. Meaning is what is being verbally communicated between the members of a discourse community. Corpus linguistics looks at language from a social perspective. It is not concerned with the psychological aspects of language. It claims no privileged knowledge of the workings of the mind or of an innate language faculty.
  2. In corpus linguistics, language study is always the study of written (or transcribed or quoted or otherwise recorded) texts or text pieces, i.e. language which can be reproduced, heard, read and interpreted repeatedly. What is not written or transcribed or quoted or recorded is lost, both for the discourse community and for linguistic investigation. The question of what is spoken and therefore transient, and what is written and therefore permanent, is rather a matter of perspective than of linguistic ‘reality’.
  3. Every text segment, word, multi-word unit, phrase etc., can be viewed under the aspect of form and the aspect of meaning. The form is what represents the meaning, and there is no meaning without the form by which it is represented. Text segments are symbolic; they always mean something to someone. Normally the members of the language community deal with text segments without being aware of what text segments mean, just as people are often jealous without being conscious of their jealousy. Unless there is some (potential) communication disorder, there is no need to discuss meaning within the discourse community.
  4. Meaning is in the discourse. Once we ask what a text segment means, we will find the answer only in the discourse, in past text segments which help to interpret this segment, or in new contributions which respond to our question. Meaning does not concern the world outside the discourse. There is no direct link between the discourse and the ‘real world’. It is up to each individual to connect the text segment to their first-person experiences, i.e. to some discourse-external ideation or to the ‘real world’. How such a connection works is outside the realm of the corpus linguist.
  5. For corpus linguistics, the meaning of a text or of a text segment is independent of the intentions of its speaker (its author). The dislocation of the speaker/author from his or her text distinguishes written (recorded) language from spoken language. In spoken language, the speaker is usually present, and if there is a communication disorder, we ask: “What do you mean?” and not: “What does this mean?”.
  6. Corpus linguistics is empirical. Its object is real language data. The discourse is the totality of all the texts that have been produced within a discourse community. Only those of which we have records (in the form of written texts or transcripts) can be the object of linguistic investigation. However, just like the discourse community, the discourse is not an ontological reality; it is a construct, the object of research constructed by the linguist. The linguist’s task is to define and delimit his or her object of research, to specify which language data he or she wants to analyse. Delimiters include linguistic, spatial, temporal, social, topical and medial parameters.
  7. Corpus linguistics makes general and specific claims about the discourse, based on the analysis of a suitably selected cross-section of it, i.e. the corpus. General claims have to do with rules or with probabilistic expectations. They fall within the field of grammar or variation or language change, and also into the field of lexical meaning insofar as a text segment occurring in a text can be viewed as an instantiation of a lexical item. Specific claims are interpretations of texts or text segments viewed as unique occurrences.
  8. Each discourse has, by necessity, a diachronic dimension. What is said today is a reaction to what has been said before, an argument in a simultaneous debate and an anticipation of what we expect to be said tomorrow. If we look at language from this perspective, we want to make a specific claim. We want to know what makes a given text segment a unique occurrence rather than a token of a lexical item type. This will be determined by the unique position it maintains in the discourse as a whole, embedded in a context that is unique, and referring to a unique set of other texts. Unless we find the intertextual clues that link this text segment to previous texts, and to relevant contemporaneous texts, we do not know what makes it unique. However, we can only find these intertextual links if our corpus has a diachronic structure.
  9. While corpus linguistics may make use of the categories of traditional linguistics, it does not take them for granted. It is the discourse itself, and not a language-external taxonomy of linguistic entities, which will have to provide the categories and classifications that are needed to answer a given research question. This is the corpus-driven approach.
  10. Corpus linguistics is not in itself a method: many different methods are used in processing and analysing corpus data. It is rather an insistence on working only with real language data taken from the discourse in a principled way and compiled into a corpus. However, one should be wary of using such data merely to find out more about what we know already, since what (we think) we know is often derived from pre-corpus study. Corpus data provide insights of a type which has not previously been available. Concepts and categories derived from introspective language study or from models taken from other fields (e.g. computation) may not be appropriate for describing real language data.
  11. Corpus linguistics does not have its starting point in language universals as ontological features (as opposed to universals as theoretical concepts). Little is reliably known about the language faculty all human beings share. The study of this language faculty is outside the remit of corpus linguistics. Rather, corpus linguistics looks at phenomena which cannot be explained by recourse to general rules and assumptions. It is primarily concerned with the contingencies of language use. Normally, we become aware of language only if there is a communication disorder. These disorders have their origins in the variation we find within and between discourses. They can be analysed in terms of the differences we observe between one language use and another.
  12. The word is not privileged in terms of meaning. The corpus linguist posits endocentric entities, formally held together by some local grammar, and calls these entities (complex) lexical items or, alternatively, units of meaning. Lexical items can be single words, compounds, multi-word units, phrases, and even idioms. Just like single words, (complex) lexical items tend to recur in a discourse. This is why statistical procedures can be used for detecting them in a reasonably large corpus, as significant co-occurrences of the same entities.
  13. Frequency is an important parameter for detecting recurrent patterns defined by the co-occurrence of words. Frequency is thus an essential feature for making general claims about the discourse. However, statistical ‘significance’ is never enough. Lexical items also have to be semantically relevant.
  14. As with idioms, we can describe a complex lexical item holistically as a semantic unit whose meaning cannot be inferred from decomposing it into the smaller lexical items it consists of. It is, however, a matter of degree to what extent the meaning of a complex lexical item is independent of the meaning of the parts it is composed of. Sometimes it can be useful to describe a complex lexical item by assuming that its node is imbued with certain semantic (usually connotative) features inherent in the other elements that the complex lexical item consists of. This approach turns our attention to the phenomenon called semantic prosody (for connotative features) or semantic preference (for denotational features).
  15. Lexical items can be single words or complex units of meaning. They are, in principle, monosemous. This is what distinguishes the concept of a lexical item from the concept of a word. Most words, particularly the more frequent ones, are polysemous. Complex lexical items can be seen as a node word together with all those words in its context with which it forms a semantic unit. As long as this unit is still ambiguous it is not yet complete, and more elements have to be added. It is complete as soon as it has, as a lexical item type, only one meaning. Once we replace the concept of the polysemous single word by the concept of the monosemous lexical item, the problem of ambiguity that has aggravated many linguists for a long time suddenly disappears.
  16. If the same (monosemous) lexical item recurs in a discourse, then each occurrence is one instantiation of the same lexical item type. Each instance can thus be seen as a token of the type constituted by this lexical item.
  17. Many lexical item types often allow for a certain degree of variation within their instantiations. This variation often has no effect on meaning. If variation affects meaning, we should talk rather about a set of related lexical item types each of which has its own instantiations.
  18. To posit a lexical item is to make a general claim. However, corpus linguistics is also concerned with specific claims. Any text segment can be viewed in two ways: as an instantiation of one or more potential lexical item types, or as a unique occurrence whose meaning has to be interpreted through the intertextual clues which connect it to other texts of the discourse. Frequency is irrelevant when our goal is to interpret text segments as unique occurrences. This is the point at which the diachronic perspective takes over from the synchronic view.
  19. Meaning is paraphrase. Whenever lexical item tokens are the cause of a communication disorder, their meaning will be negotiated, described or explained, replaced by synonyms, and sometimes even ‘defined’ as in dictionaries or in encyclopaedias. What is paraphrased in the discourse is what the members of the discourse community intuitively regard as units of meaning. However, the same lexical item type can be paraphrased in an infinite variety of ways. Therefore, whenever a lexical item token is being paraphrased, we can view it from two perspectives: as an instantiation of the lexical item type, and as a unique occurrence. From a synchronic perspective, the meaning of a lexical item type is a generalisation on all the paraphrases we find for the instantiations of this lexical item. But paraphrases are also relevant for specific claims. From a diachronic perspective, it is the history of paraphrases of a recurrent text segment, as evidenced in its intertextual links, that tells us what it means as a unique occurrence.
  20. There is no true and no fixed meaning. Everyone can paraphrase a unit of meaning however they like, therefore the meaning of any lexical item type is always provisional. The next paraphrase may already lead to a revision. The members of the discourse community will continue to negotiate, among themselves, what a unit of meaning means. They may agree or not: the issue is not truth, but acceptance. An explanation, a paraphrase that is widely accepted and re-used, is more relevant than a paraphrase that is never repeated, just as texts which are constantly referred to are more relevant than texts that leave no traces in subsequent texts.
  21. The discourse is a self-referential system. Natural language is the only codification system in which the functions of its elements are determined not by ascription from outside but by discourse-internal negotiation. This sets natural languages apart from formal calculi, like the code of mathematics.
  22. When we speak, we do not refer to a discourse-external reality but to what has been said before. When we negotiate the meaning of a text segment, we do this within the discourse, not outside or on top of it. This auto-referentiality of the discourse makes it, by necessity, circular. It holds for the discourse as it is known to hold for any dictionary that we can never escape circularity in making sense of language. Each lexical item refers to other lexical items. Whenever a new lexical item (and each lexical item was once a new item) is introduced into the discourse it has to be explained in terms of lexical items which are already available.
  23. The discourse contains only testimony, provided by the members of the discourse community. It does not contain first-person experiences. The discourse-external reality can enter the discourse only as testimony. The link between the discourse and the discourse-external reality is, as said before, not part of the language system; it has to be established by each member of the discourse community individually.
  24. Corpus linguistics does not distinguish between lexical meaning and encyclopaedic meaning. The meaning of the unit lemon is everything that has been said about lemons. Lexical items and what they stand for are discourse objects (and not objects of the ‘real world’), constructed through the contributions of the members of the discourse community. As discourse objects, unicorns are as real as lemons. It is up to each member to decide for themselves whether unicorns or lemons are part of their first-person experiences.
  25. Linguistics is not a science like the natural sciences whose remit is the search for ‘truth’. It belongs to the humanities, and as such it is a part of the endeavour to make sense of the human condition. Interpretation, and not verification, is the proper response to the quest for meaning. There is no true meaning. The corpus linguist is not privileged as an ‘expert’ to pass judgment on what is permissible and what not. He or she is part of a discourse community, not outside of it. Corpus linguists have to submit their findings to their discourse community and argue for their acceptance. The discourse community is, in principle, a democratic community. Every member has the right to contribute to the discourse, and to discuss, modify or reject what other members say. The discourse organises itself. All regimentation from the outside strangles the creativity of the discourse community.
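Theses 12 and 13 say that recurrent lexical items can be detected statistically as significant co-occurrences, but that statistical 'significance' is never enough. The theses do not name a particular statistic, so the following is only a minimal sketch under stated assumptions: it uses pointwise mutual information (PMI) over adjacent word pairs, one association measure among several, and the function name `pmi_bigrams`, the `min_count` threshold and the toy text are all invented for illustration.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI is an assumption of this sketch; the theses speak only of
    'statistical procedures' in general.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Thesis 13: frequency matters, so ignore pairs seen too rarely.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus, invented for illustration only (pre-tokenised by whitespace).
text = (
    "friendly fire killed two soldiers . "
    "the report blamed friendly fire . "
    "the fire spread . "
    "the report was friendly ."
)
ranked = pmi_bigrams(text.split())
for pair, score in ranked:
    print(pair, round(score, 2))
```

On this toy input the pair "the report" scores higher than "friendly fire", although only the latter is a plausible unit of meaning. That is the point of thesis 13: a co-occurrence score proposes candidates, and their semantic relevance still has to be judged.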

 

A few notes on these theses may be appropriate.

For me, corpus linguistics and cognitive linguistics are two complementary, but ultimately irreconcilable paradigms. Cognitive linguistics is interested in the minds of speakers and hearers. How do we turn our thoughts into speech? How do we transform what we hear or read into thoughts? If we are all born with the same language faculty, how can we draw the line between what is common to all languages, and what is specific to individual languages? To what extent is our linguistic performance governed by universal laws underlying our language faculty, and to what extent is it governed by the cultural conventions of the language community in which we have grown up?

All this does not concern corpus linguistics. Corpus linguistics looks at language not from a psychological, but from a social perspective. Verbal interaction is what allows human social groups to be infinitely more complex than even groups of the apes, our closest relatives. Whatever apes do, they are not aware that they are conveying content and they cannot collectively reflect on the content they are conveying. Humans are unique in that they can negotiate content collectively. We can promise. We can instruct. We can report. We can plan. We can exchange arguments. And we can make ourselves interactively aware of what one of us is doing, how he or she is feeling, and what he or she is believing. Without this verbal interaction, we would be largely incapacitated. Without an audience, imagined or real, it would not make sense to formulate our thoughts. We would not come up with new thoughts. We would find ourselves in a dead alley. This is why we call solitary confinement torture. Language is, first of all, a collective activity. As Wittgenstein has shown, there is no private language.

Speech is the primordial form of language. Why should we restrict corpus linguistics to the investigation of written (or transcribed or otherwise recorded) language? In an informal conversation, the verbal interaction normally involves other elements, such as deixis, gestures, and facial expressions. At stake is not only the communication of content, but also contributing to a group feeling, creating an atmosphere of trust, attempting to step up on the ladder of social hierarchy. The conversation can be embedded in some other interaction like walking in the park, watching TV on the sofa, or standing alongside an open grave. What is being said cannot be easily dissociated from the situation in which it takes place. It only makes sense within the context in which it has been said. However, to make claims, specific or general, about this non-verbal context is outside of the remit of a corpus linguist.

In an informal conversation, we are concerned, first of all, with people, and only to a minor extent with the texts they contribute. We try to infer their state of mind, their attitudes, their feelings and their intentions, from what they say. Whenever we are in doubt we are able to probe into their minds, by reacting in a special way to what they say. Speaker and hearer are both on the scene. For written language, this is not normally the case. The author has to anticipate the reader’s reactions. The reader cannot question the author, and cannot even be sure who the author is. All the readers have is the text. They are concerned with the meaning of the text and not with the intentions of its authors. This is the situation in which the corpus linguist finds himself or herself.

Speech and writing are not ontologically as distinct as it has been customary to think. We can look at certain manifestations of orality, such as the verbatim repetitions of texts by specially trained people, as a way of writing. The same goes for rituals involving language, even the lyrics of songs. Then there are ideograms of all kinds, and again it is a question of perspective what we consider as ideograms. There is writing replacing speech and imbued with many of its features, such as SMS messaging and Internet chatroom conversation. Perhaps we should look at the division between writing and speech as two distinct theoretical categories which in practice are only the extreme points of a cline.

To restrict corpus linguistics to the analysis of written language is a rather contentious claim. I expect few of my colleagues to agree with me on this point. But I want us to be aware of the inherent differences between spoken and written language. If we want to investigate speech, we have to make sense of the nonverbal elements from which it cannot be detached. To describe social interaction in a wider frame we have to apply methods provided by other disciplines, such as sociology and cultural anthropology, but what we find there is hardly satisfactory. How, indeed, can we find out about the meaning of nonverbal interaction? Do we not have to listen to the interpretations people give, rather than assign our own categories to it?

The linguist is a specialist in investigating texts, not in analysing the real world. His or her knowledge of the world is in no way privileged. When linguists come across a sentence such as “The sweetness of this lemon is sublime.”, their task is not to assert or reject it, but to look to see if other testimony in the discourse does or does not provide supporting evidence. But the exclusion of ‘real world’ data is not just a methodological question. It is also the consequence of the failure of the application of any sort of realism to the meaning problem. To exclude the real world from semantic analysis seems to contravene our intuitive sense of language. We all grow up believing that language should be a mirror of reality. Some proposition either is the case or it is not, and this can be decided on the basis of its factuality. By nature, we are all realists. We intuitively believe that the categories in which we think are ‘natural’, and that, for instance, the colours we assign to objects are ‘really’ there. However, there was no word for the colour orange in German before the middle of the 19th century, there are languages in which there is no equivalent of green, and it seems that in Homeric times people were much less interested in colours than in the differences of surfaces. There are no colours in the real world; colours are a mental phenomenon. But they are, it seems, a contingent phenomenon. There might well be discourse communities which do not negotiate the colours of things. We learn what is red or yellow or blue when we learn our mother tongue. There is no innate universal ontology that makes us all see the world in the same way, and there is even less one that would make us see the world as it is. Even if we had direct access to the ontology, be it of the world in our minds or of the metaphysical world, it would not help us to find the meaning of what is said, for the only way to perceive reality is by experience.
The discourse does not contain the assembled experience of the members of the discourse community. It only contains the testimony of these experiences. How the real world is related to first-person experience, and how this experience is related to testimony is not within the remit of linguistics. If someone claims he sees a lion, and someone else insists it is a tiger and not a lion, because the animal has stripes, all the linguist can do is to invoke other texts. Though stripes seem to be commonly mentioned in connection with tigers, somebody can still insist that the animal he or she has seen was a tiger, albeit one without stripes. How could the linguist prove that such tigers do not exist? The linguist’s task is to find out if this is the only discourse occurrence of a stripeless tiger, or if there are other cases. Indeed, as I am writing this, Google lists 270 hits for the query ‘tiger “without stripes”’. Again, let us be clear: Google is a virtual corpus of the [English, in this case] discourse and contains only testimony; it does not help us with reality. Neither does the Encyclopaedia Britannica. Again it presents nothing but testimony. As every judge knows, testimony can be accurate or not, and the principal way to determine what we consider its accuracy is to bring in more testimony. Thus the meaning of the lexical item tiger, which is the meaning of the discourse object tiger, is removed by so many steps from what a tiger is in reality that reality does not matter here. Are there really tigers out there? Do angels really exist? For the corpus linguist, this is not the question. What is important to know is that tigers normally come with stripes and angels normally come with wings.

If corpus linguists cannot investigate meaning by looking into the heads of speakers and hearers, and if they do not know anything about reality, what can they do? They can search for paraphrases. What is a paraphrase? A paraphrase is anything that tells us something about the text segment in question. We can look at dictionary definitions as paraphrases, paraphrases made up by lexicographers. These paraphrases are, in principle, not privileged. Nobody has to agree with a lexicographer’s definition. One of the reasons why they are often taken as gospel is that, firstly, we tend to believe the experts, and secondly, the definitions of a given word are astonishingly similar if we look up a number of dictionaries. There is a simple explanation for it. Lexicographers will consult other dictionaries when they work on a new one. Another reason is that in some societies, the English being one of them, people have been educated to trust more the paraphrases proffered by lexicographers. This has to do with the wealth of the English vocabulary. Everyone constantly stumbles over unknown words. A dictionary seems like a straightforward way to find their meaning. But though the lexicographers’ paraphrases are usually more professional than those of other people, and should be based on a broad sample of relevant citations, lexicographers can be as biased as everyone else. What they say is part of the discourse; it is not outside it. It can be accepted, rejected or negotiated.

Once we develop a feeling for paraphrases, we become aware that they abound in the discourse. Whenever someone enters a new idea into the discourse, they not only need a name for it, they also have to describe it. It was in the mid-nineties that the word globalisation ceased to be a term familiar only to economists and, within a year’s time, became a household word. Journalists had to explain it to their readers. We still find their resonance on the Internet. One of the paraphrases reads: “Globalisation is the latest hope embraced by capitalist commentators for the salvation of their system.” Another one tells us “Globalisation is a great force for good.” Google lists 63600 hits for ‘“globalisation is”’. We find paraphrases for whatever people take to be a unit of meaning. There are 407 hits for ‘“friendly fire means”’, the first one of them reading: “Friendly fire means getting shot by your own people”. People feel the need to explain and define discourse objects when they are under discussion: “The right to choose means that women are entitled to abortion.” The paraphrases we find in the discourse outside of dictionary definitions do not normally distinguish between lexical and encyclopaedic meaning. It is a useless distinction, brought in by linguists, which is not supported by discourse evidence. When we talk about an expression, we talk about what it stands for, namely the discourse object, and this object is represented, identified, explained, and defined solely by the potpourri of paraphrases that others have used before us.
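Queries such as ‘“globalisation is”’ or ‘“friendly fire means”’ can also be run against a local collection of texts. The following is a minimal sketch, not a description of any particular tool: the function name `find_paraphrases`, the choice of cue words and the crude sentence boundary (the next full stop, question mark or exclamation mark) are all assumptions made for illustration. The three sample sentences are the paraphrases quoted above.

```python
import re


def find_paraphrases(texts, item, cues=("is", "means")):
    """Collect stretches of text in which `item` is directly followed by a cue word.

    The cue words and the sentence-boundary heuristic are assumptions of
    this sketch; real paraphrases take many more forms.
    """
    cue_group = "|".join(re.escape(cue) for cue in cues)
    pattern = re.compile(
        r"\b" + re.escape(item) + r"\s+(?:" + cue_group + r")\b[^.!?]*[.!?]?",
        re.IGNORECASE,
    )
    hits = []
    for text in texts:
        hits.extend(match.group(0).strip() for match in pattern.finditer(text))
    return hits


# Sample texts: the paraphrases quoted in the paragraph above.
texts = [
    "Globalisation is the latest hope embraced by capitalist commentators "
    "for the salvation of their system.",
    "Globalisation is a great force for good.",
    "Friendly fire means getting shot by your own people.",
]

for hit in find_paraphrases(texts, "globalisation"):
    print(hit)
```

A search like this only gathers candidate paraphrases. Whether a hit actually tells us something about the discourse object still has to be decided by reading it.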

Conscientious lexicographers have always taken paraphrases they found in texts seriously. Their own paraphrases attempted to smooth over differences and to deliver a more or less final judgment, just like a judge in a trial after hearing all the testimony. But saying something with which the lexicographic establishment disagrees is not a crime. Nobody should be prosecuted for uttering their discontent with mainstream opinion. This is what makes paraphrases so essential: they tell us what has been said and can be said about a discourse object. For a corpus-driven theory of meaning, they are crucial. They may contradict each other, they may describe something in such irreconcilable features that it is hard to see it as the same thing, but taken together in all their chaotic diversity they are the very material meaning consists of.

Society thrives on the diversity of the discourse. Any enforced standardisation of word meaning would be its death. Attempts have been made. In the socialist bloc, the language used on public occasions or in the media was strictly controlled. This was the main hindrance to developing these societies further. Over the years, a separate language used among friends, colleagues and family developed, and the more it moved away from officialese, the less the establishment was connected to the people. All forms of progress depend on the possibility of finding an expression for new ideas and of negotiating these ideas in the discourse. Once a society stigmatises divergent content, it strangles the plurality of the discourse. In a homogenised discourse, you can only safely repeat what others have said before. Any attempt at innovation would be dangerous. In such a society the only way in which discourse can be used for one’s social advancement is by making increasingly hyperbolic statements. A typical example is, it seems to me, the public discourse in the United States, a discourse which is very much isolated from what is discussed in the rest of the world. It is a discourse in which a broad consensus between media and administration has helped to filter out certain undesirable ideas. People who want to be heard in such a homogeneous discourse cannot question established wisdom. They can only expand on what has been said before. For me, this would provide an explanation for the growing unilateralism of America. A creative discourse, on the other hand, is a pluralist discourse, a discourse in which each member of the discourse society is encouraged to participate in the negotiation of meaning. This discourse presupposes a democratic discourse community.

Culture has been described, by Edward Tylor, as “that complex whole which includes knowledge, belief, art, morals, law, custom and any other capabilities and habits acquired by man as a member of society”. All the facets named here presuppose language. Knowledge has to be transmitted, beliefs must be expressed, art must be interpreted, morals have to be negotiated, laws have to be inscribed, and customs handed down. Language is always at the centre of culture. The study of language thus could be seen as the core module of the humanities, i.e. of the interpretative disciplines. But the study of language encompasses more than the description of the language system (if there is such a thing) of rules, usage and meaning. If we study the discourse as the container of a culture of a community, then we must have the means to specify what each text or text segment contributes to it. We must be able to make specific claims. We must see our task in the interpretation of individual texts and their (overt or covert) relationships to other texts. This means we must deal with them as unique occurrences. We must read them as repetitions of or reactions to what has been said before and what is being said elsewhere. We must acknowledge that the discourse has, by necessity, a diachronic dimension. We have to detect the traces that earlier texts leave in subsequent texts. We have to account for the iterability of writing. More important, we have to be aware that our interpretations of texts are not outside or on top of the discourse but part of it. This is why they are always provisional, for we never have the whole discourse at hand. No selection can avoid being contingent. New texts are always on the horizon.

Corpus linguistics is and will remain an imperfect methodology to make sense of the discourse. For me, it is not so much a theory of language as a conceptual frame for studying the transmission of content in a discourse community, as evidenced in the intertextuality of the discourse. Corpus linguistics localises the study of language, once again, firmly and deliberately, in the Geisteswissenschaften, the humanities.

