A Corpus-based Lexical and Grammatical Analysis of 
Written Business English

by
Yasumasa Someya
The Graduate Department of Language and Information Sciences,
The University of Tokyo

Presented 
at
The 12th World Congress of Applied Linguistics
(AILA '99 Tokyo, Program No. 7-019-1)
on
August 2, 1999 
 

ABSTRACT

The primary purpose of the study is to identify and describe some of the major lexico-grammatical features of English for Business Purposes (EBP) in its written domain. The author also aims at establishing a computer-based study methodology appropriate for the analysis of such large-scale corpus data as those used in this study, including the development of a series of computer programs with which to analyze the corpus data from various viewpoints and for specific purposes.

The study revealed, among other things, that EBP is characterized by a high degree of lexical closure. The overall lexical growth curve of the one-million-word Business Letter Corpus (BLC) compiled for this study reaches its plateau in the range of about 15,000-20,000 words. When lemmatized, 90.8% of all the word tokens were covered by only 1,500 word types, in marked contrast to the three reference corpora used in the study. It was further found that the first 350 verb-types in the BLC Wordlist cover about 90.8% of all verb occurrences in the BLC, and only 100 adverbs -- including all the major "metadiscourse" items -- cover 89.4% of all that are needed to construct cohesive and well-expressed messages. The same also holds true, though to differing degrees and with different realization patterns, in other POS categories. The tendency toward high lexical closure is more evident in the Learner BLC, which consists of English business messages written by Japanese business people: there, only 109 verb-types account for 90% of all the verb occurrences, and 43 adverbs account for 90.2% of all the instances of adverbs appearing in the corpus.
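As an illustration of how coverage and lexical-growth figures of the kind quoted above can be computed, the following is a minimal sketch in Python. It is not the author's AWK code: the function names and the toy sentence are hypothetical, and the sketch assumes the input has already been tokenized (and lemmatized, if lemma-based coverage is wanted).

```python
from collections import Counter


def coverage_at(tokens, top_n):
    """Share of all tokens covered by the top_n most frequent types."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / total


def types_needed(tokens, target):
    """Smallest number of top-ranked types whose cumulative share reaches target."""
    counts = Counter(tokens)
    total = sum(counts.values())
    running = 0
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        running += freq
        if running / total >= target:
            return rank
    return len(counts)


def growth_curve(tokens, step):
    """(tokens read, distinct types seen) sampled after every `step` tokens."""
    seen = set()
    points = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            points.append((i, len(seen)))
    return points


# Hypothetical toy input, not data from the BLC.
toy = "we thank you for your letter and we look forward to your reply".split()
print(coverage_at(toy, 2))
print(types_needed(toy, 0.5))
print(growth_curve(toy, 5))
```

On a real corpus, a growth curve that flattens as more tokens are read corresponds to the plateau described above, and `types_needed(tokens, 0.9)` gives the number of types required for 90% coverage.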

The study also substantiated our second claim that EBP is characterized by a low level of lexical difficulty. It was found that approximately 77.5% of the first 3,000 word-types in the BLC, which cover over 95% of all the BLC word tokens, are within the 4000 Basic Words as defined by the JACET (Japan Association of College English Teachers), plus a fairly standard set of proper nouns, abbreviations and acronyms. It also produced ample evidence in strong support of our third hypothesis that written business English is characterized by its incorporation of spoken features into written texts. 

Another interesting finding is that many of the high-frequency lexical items, ranked by their statistically defined relative importance, coincide with the items in which Japanese users of EBP are prone to make errors. The mean error ratio of the 50 most important "key" verbs, for instance, was found to be as high as 15.69% (SD = 12.54). The error analysis conducted on the Learner BLC further revealed that many of these errors show clear systemicity -- in other words, errors occur where they tend to occur, and for good reason. With regard to verbs, for instance, errors are more likely to occur when a particular verb in English has an apparent semantic counterpart in Japanese but appears in a different argument structure and under different semantic constraints. Representative cases in point are the verbs discuss and require, whose error ratios are as high as 67.69% and 20.51%, respectively -- ratios that suggest something is going very wrong with the way these lexical items are taught in the classroom. This finding naturally leads to the proposal that the concepts of "argument structure" and "semantic constraints" be included in the teaching syllabus and that students be given appropriate instruction in them.

It is also noteworthy that many of the "core" lexical items are closely associated with particular syntactic patterns. For instance, the adjective important appears 367 times in the BLC, of which 74 cases (20.16%) occurred in the "It is ADJ (for NP) to VB", "It is ADJ to NP" or "It is ADJ that-clause (or ZERO-that)" formats. This and other similar instances that abound in the BLC indicate the importance of teaching lexical items not in isolation but in reference to the syntactic environments in which they typically appear. 

The discussions are by no means exhaustive, but they are enough to substantiate the claim that Business English is a "sublanguage" with its own unique lexico-grammatical patterns, and that identifying and describing these patterns systematically will help learners of EBP to learn what needs to be learned more effectively than before. It is the author's hope that the data presented in this paper and the various findings drawn from them will provide teachers of EBP with a solid, data-driven foundation for their classroom instruction and for writing course materials of their own. The author also believes that the various computer programs he has written for the current study will prove useful to those interested in the computational analysis of large-scale corpus data. The programs have been written with JGAWK (Ver. 2.11.1+ 3.0) and will be made available in a ready-to-run format to interested researchers at the completion of the current study, so that they can be tested, modified, or otherwise used as the user sees fit.
 

Note:  A complete version of the paper can be downloaded from the author's Website. The AWK programs used in the study can also be downloaded from the same site.
 

