1. What is a Word?

A 'word' is a single unit of language, but it is hard to define in a way that is valid for all languages. For each language, three types of word can be distinguished:
Phonological word: A word distinguished as a unit of phonology. E.g. I've in I've done it.
Grammatical word: A word distinguished as a unit of grammatical meaning. E.g. I and have in I've done it.
Dictionary word: A word as listed in a dictionary, which provides various information about it. The choice of citation form is a matter of convenience. E.g. in French, a verb is referred to by the infinitive: manger 'to eat'. But in Latin it is referred to by the first-person singular of the present indicative: edo 'I eat'.

Things are more complicated in agglutinating languages like Japanese, which build up words out of units expressing grammatical relations. For example, the Japanese word kaka-se-rare-ta 'I was made to write' is made up of the grammatical elements shown in the following table:

                   kaka      se           rare      ta
Grammatical word   'write'   causative    passive   past tense
Dictionary form    kaku      seru         rareru    ta

Example: 'I was made to write a report.' Subjects are frequently omitted in Japanese.

2. Chinese Writing

Writing systems whose basic functional units are interpreted as words are known as logographic or ideographic systems. Logographic writing is used for languages such as Chinese and Japanese, and was also used in early Egyptian hieroglyphs and Sumerian cuneiform.

2.1 Graphic development of Chinese characters

Chinese characters grew out of pictures of natural objects, but their origin is still a matter of contention. The oracle-bone inscriptions (jiaguwen) discovered in Anyang, Henan province, are thought to be the oldest specimens of Chinese writing. The characters of the bronze script are more pictographic than the oracle-bone characters. As writing came to be done more frequently, other writing implements came into common use, and several different script forms developed. The most commonly distinguished forms are summarized in the following table:

Period                   Script form
13th–11th centuries BC   oracle-bone script
13th–4th centuries BC    bronze script
8th–3rd centuries BC     seal script
2nd century AD           clerical script
since 4th century AD     cursive script, running script, standard script

2.2 Categorization of Chinese characters

During the Han dynasty, about 120 AD, the first major lexicon, the Shuo wen jie zi, containing about 9,500 characters, distinguished six principles of character formation and use called liu shu 'the six writings'. The first category is pictograms, such as er 'ear' and mu 'eye', derived directly from drawings of objects. The second is indicators, which indicate abstract notions, such as shang 'above' and xia 'below'. Pictograms or indicators may be joined to form meaning compounds, the third category. For example, ming 'bright' consists of ri 'sun' and yue 'moon'. The fourth category is phonetic loans, in which a character is transferred to a semantically unrelated word with the same sound. For example, zu 'foot' was transferred to write the homophonous word zu 'sufficient'. In the fifth category, semantic-phonetic compounds, a character typically consists of two elements: a semantic classifier, which shows the general area of meaning, and a phonetic, which hints at the pronunciation. The character tang 'sugar' is composed of the phonetic tang and the semantic classifier mi 'cereal'. The final category, mutually interpretive characters, refers to characters that combine sound and meaning in unexpected ways. For example, the character yue 'music' is also read le 'pleasure'.

Category                     Example                            Reading
Pictogram                    ear, eye                           er, mu
Indicator                    above, below                       shang, xia
Meaning compound             sun + moon → bright                ri + yue → ming
Phonetic loan                foot → sufficient                  zu → zu
Semantic-phonetic compound   cereal + tang (phonetic) → sugar   mi + tang → tang
Mutually interpretive        music → pleasure                   yue → le

2.3 Double articulation of Chinese characters

Double articulation (or duality of structure) is the property of human language of having two levels of discrete units: meaningful units and meaningless phonological segments. For example, the sentence You go to sleep is composed of the words you, go, to and sleep, and, at another level, of the phonological units [j], [u:], [g], etc. A property resembling double articulation can be found in many Chinese characters. The character shu 'number' is composed of three elements, each of which is itself a character: mi 'rice', nu 'female' and zhang 'measure'. But the meanings and sounds associated with these component characters are unrelated to the meaning and sound of shu. Likewise, there is no systematic relation between strokes and characters: a single horizontal stroke by itself means 'numeral one', but the same stroke in the character da 'big' no longer has that meaning.

mi 'rice' + nu 'female' + zhang 'measure' → shu 'number'

yi 'numeral one' + ren 'man' → da 'big'

2.4 Character Distribution

The number of Chinese characters can only be estimated. Ordinary dictionaries list as many as 10,000 characters, while some authoritative dictionaries list more than 50,000. The 'List of modern Chinese characters for everyday use' (Xiandai hanyu changyong zibiao), published in 1988 by the Committee for the Writing of the National Language, includes a primary list of 2,500 characters and a secondary list of another 1,000 characters. In fact, statistics show that the 2,400 most common characters cover 99 percent of all characters used in typical Chinese texts. One such observation is shown in the first graph.¹ The relationship between frequency and rank follows Zipf's law, which states that the product of the frequency (f) and the rank (r) of a word is approximately constant (f × r ≈ C). In the second graph, the frequencies of Chinese characters are estimated from the data in the first graph, assuming that the total number of words counted is 100,000.
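The rank-frequency relation can be made concrete with a short Python sketch. It assumes an idealized Zipf distribution with exponent 1 (f × r = C exactly), the total of 100,000 tokens mentioned above, and a hypothetical inventory of 6,000 distinct characters; the helper names zipf_frequencies and coverage are illustrative, not part of any library. The constant C is fixed by requiring the expected frequencies of all ranks to add up to the total.

```python
# Sketch of an idealized Zipf distribution (exponent 1): f * r = C.
# Assumption: C is fixed so that the expected frequencies of all ranks
# sum to the total number of tokens counted.


def zipf_frequencies(total_tokens: int, vocab_size: int) -> list[float]:
    """Return the expected frequency of each rank r = 1..vocab_size."""
    # Harmonic number H(n) = 1 + 1/2 + ... + 1/n, so that sum(C / r) = C * H(n).
    harmonic = sum(1.0 / r for r in range(1, vocab_size + 1))
    c = total_tokens / harmonic  # the constant in f * r = C
    return [c / r for r in range(1, vocab_size + 1)]


def coverage(freqs: list[float], top_k: int) -> float:
    """Share of all tokens accounted for by the top_k most frequent items."""
    return sum(freqs[:top_k]) / sum(freqs)


if __name__ == "__main__":
    # Hypothetical inventory of 6,000 distinct characters (illustration only).
    freqs = zipf_frequencies(total_tokens=100_000, vocab_size=6_000)
    for rank in (1, 10, 100, 1_000):
        f = freqs[rank - 1]
        print(f"rank {rank:>5}: f = {f:8.1f}, f * r = {f * rank:.1f}")
    print(f"top 2,400 ranks cover {coverage(freqs, 2_400):.1%} of tokens")
```

Every rank printed gives the same product f × r, which is the constant C. The coverage figure depends strongly on the assumed vocab_size, so the sketch illustrates how C is derived rather than reproducing the corpus statistics quoted above.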

2.5 Linguistic structure

In the strict sense explained below, the Chinese writing system cannot be described as logographic.
The smallest meaningful element of speech or writing is called a morpheme. E.g. disgraceful is composed of three morphemes: dis-, grace, and -ful. In Chinese, each character is associated with a meaningful morpheme and corresponds to one syllable. But in modern Chinese, most words are disyllabic, consisting of two characters. For example, consider the list of 'words of location' in the table below. Only the items in the disyllabic column are words in the sense that they can be used alone and moved around in the sentence. The monosyllabic ones must be suffixed to nouns.

The following is an example sentence from a recent newspaper. Most of its words contain two or more characters.


3. Sumerian Writing

Sumer was situated in what is now southern Iraq, between Baghdad and the Persian Gulf. Sumerian is one of the oldest written languages; its signs were written on clay tablets. Pictograms from an Uruk tablet, representing the oldest stage of Sumerian writing (approximately 3200 BC), are sketched below.


3.1 Graphic development

The original direction of writing was from top to bottom, but for unknown reasons it changed to left-to-right around 3000 BC. This changed the orientation of the signs, rotating all of them 90° counterclockwise. The more the scribes wrote, the more they developed routines for producing the pictograms. Curved lines were replaced by series of wedge-shaped strokes, which is why the script is called cuneiform, from Latin cuneus 'wedge'. The graphic development of cuneiform signs is outlined in the following table:

Sign readings
an, il
kur, mat
nig, sa
gu, gud

3.2 Linguistic structure

The total number of signs was only around 1,000. Several methods were used to enlarge the expressive power of writing. First, some signs were formed by combining existing signs; for example, the sign for 'food' was formed by adding the sign for 'bread' to the sign for 'head'. Second, because Sumerian has a fair number of homonyms, a sign for one word could also be used to represent another word with an identical sound, a practice known as rebus writing. If a homonymic sign was likely to lead to a misunderstanding, a determinative sign was placed before it to hint at the meaning. For example, the logogram gis 'wood' is used as a determinative for the names of wooden objects.
Another method is metonymy, where several words that have related meanings but different sounds are written with the same sign. For instance, the sign for 'mouth' is also used for 'word' and 'speak'. As the system grew more complex, a more efficient means of disambiguation was devised: supplementary phonetic signs were placed before (and after) the ambiguous sign so that they spelled out the intended word. For example, the sign uga 'raven' can also be read Eresh (the name of a city) or naga 'soap'. To indicate the reading uga, scribes placed the signs for the sounds 'u' and 'ga' before and after it:




