September 23, 2013

Understanding Chinese Characters


Chinese characters are a very complex system for recording the Chinese language in writing. What seems to be a mix of illegible symbols is in fact part of a logical but complex writing system that developed gradually, taking shape around 2300 - 3000 years ago, with the oldest confirmed characters dating back to around 1200 - 1050 BC. In this article I will try to briefly explain what you need to know in order to understand Chinese characters and what you should know before you start studying them.

Some basic facts:

  • The earliest confirmed evidence of the Chinese script yet discovered is the body of inscriptions on oracle bones (cattle bones and turtle shells used in divination and fortune telling ceremonies) from the late Shang dynasty (1200-1050 BC) - Wikipedia
  • According to some studies (including my own), you need to know only about 2500 characters to read the newspaper.
  • About 80% of all characters are made up of two elements: one telling you how to read the character and the other telling you what it means. This is good news, because based on these two elements you should be able to work out the pronunciation and meaning of 4 out of 5 Chinese characters. If you learn how to read this type of character and understand the system behind it, your learning progress will be much faster.
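
A figure like "2500 characters are enough to read the newspaper" is a statement about coverage: what share of the running text is made up of the N most frequent characters. The following is only a minimal sketch of how such a number can be measured. The function name `coverage` and the tiny sample string are made up for illustration and are not the data behind the studies mentioned above.

```python
from collections import Counter


def coverage(text, top_n):
    """Share of the text covered by its top_n most frequent characters.

    Whitespace is ignored. Punctuation is NOT filtered out here, so a
    real corpus would need extra cleaning first (assumption of this sketch).
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / total


# Hypothetical toy "corpus": 人 x4, 日 x3, 月 x2, 門 x1 (10 characters).
sample = "人人人人日日日月月門"

for n in (1, 2, 4):
    print(n, coverage(sample, n))
```

On a real newspaper corpus you would call `coverage(corpus, 2500)` and check how close the result is to 1.0.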
Benchmarks in character evolution

Oracle bone script

The earliest preserved characters that can be reliably authenticated and dated are those found on oracle bones. These were bones (usually scapulae) of large animals (usually oxen) or turtle shells that were used in ancient Chinese fortune telling and divination ceremonies. A small hollow was drilled into the bone (probably after the animal had been sacrificed) and a glowing piece of coal was placed in it. The person responsible for the ritual blew on the piece of coal, which cracked the bone, and based on the direction of the crack the answer to the fortune teller's question was 'yes' or 'no'. The question and the result of the fortune telling, along with other details, were then inscribed in characters onto the bone itself. The character system that was used is called 甲骨文 - the Oracle bone script.

The discovery of these bones is relatively recent: the inscriptions were first recognized in 1899, and scientific excavations began in 1928. Fragments of these bones were sold in a Chinese medicine shop until someone noticed that they had inscriptions on them. According to Wikipedia, they have been traced to a village near Anyang in Henan province.

Seal script

Before the unification of China in 221 BCE, there were no universal rules for writing characters. There were several ways of writing the same character, with varying shapes, stroke orders and stroke types. Several local writing systems had developed as well. After China was unified by the Qin dynasty in 221 BCE and the Warring States period ended, the First Emperor, Qin Shi Huang, decided to abolish all existing forms of writing and ruled that the only form of writing to be used was the one used in the state of Qin (developed gradually during the preceding Warring States period) - a script today called the Seal script.

Regular script

The Seal script preserved its official status for a relatively short period of time. Other scripts started to emerge, with some of them rising to prominence.

Regular script (the way characters are written today - both traditional and simplified) has been attributed to Zhong Yao, of the Eastern Han to Cao Wei period (ca. 151–230 CE), who has been called the “father of regular script”. However, some scholars postulate that one person alone could not have developed a new script which was universally adopted, but could only have been a contributor to its gradual formation. It was not until the Southern and Northern Dynasties (420 to 589 CE) that regular script rose to dominant status. During that period, regular script continued evolving stylistically, reaching full maturity in the early Tang Dynasty (618–907 CE). - Wikipedia


These three scripts are the benchmarks in the evolution of Chinese characters for three reasons: for the most part, the later ones derive directly from the earlier ones; each of them dominated for an extended period of time; and respected dictionaries often refer to at least the Seal script versions for a better understanding of character etymology.


The above picture shows three versions of the character 人 ren2 'person' in all three scripts. In this particular case, the character is simple and has not undergone a lot of change. The changes are more formal than structural.

The next picture shows the character 化 hua4 'change' in all three versions. As it is a simple character, you still can't see any substantial structural or formal differences between the Oracle bone and the Seal scripts. Formal changes were made in the transition from the Seal to the Regular script.

As you can see, both sides of the Regular script character have changed in form. The left 人 has been contracted to 亻, which is a rule in the Regular script. Many standalone characters are contracted in some way when they become the left-hand part of another character, and this is one example.

The original character was a picture of a person 人 and another person turned upside down, hence the meaning 'change'. Since the character did not significantly change in form in its transition into the Seal script, its etymology can be easily understood there. In the Regular script however, this is not the case. The Seal script is therefore a very important step in understanding character etymology, since in many cases it preserves the shapes of the Oracle bone script better than the Regular script.

Another example is the character 伐 fa2 'attack, to send an expedition' (formed by 人 ren2 'person' and 戈 ge1 'weapon') in all three scripts. Notice how 人 preserved its shape in the Oracle bone and Seal scripts but again has been arbitrarily changed to 亻 in the Regular script.

The transition features in the characters mentioned above are simple examples, but they are very frequent. There are more complicated ones, however. The following example is one of them and also shows how much the Seal script helps us understand character etymology:

In the figure above we can see the character 乏 fa2 'to lack', which is the mirror image of the character 正 zheng4 'correct, precise'. The inversion was made on purpose to point to the meaning 'to lack' (inverted preciseness, not precise, lacking) and can be clearly seen in the Seal script. This inversion is, however, completely invisible in the Regular script. There the character consists of a 丿 pie3 'left falling stroke contracted' at the top over 之 zhi1 'to go' at the bottom, both of which were chosen arbitrarily by scholars when developing the Regular script to simplify the Seal script version of 乏 and fit Regular script formatting.

This type of simplification and formatting is a very frequent feature of the whole transition process. The Seal script is a simplification of the Oracle bone script and the Regular script is a simplification of the Seal script (and modern Simplified characters further simplify the traditional characters of the Regular script). Since scholars only had a few hundred elements to choose from for the transition and had to choose the elements that most resembled the shape of the Seal script, in the case of 乏 they ended up choosing 丿 and 之.

The Seal script is also very helpful in understanding phono-semantic compound characters as the following example shows:

The top row shows the character 父 fu4 'father' as written in the Seal and Regular scripts. The second row shows the character 布 bu4 'cloth'. 布 is a phono-semantic compound (one element tells you how to read it and another what it means) composed of 巾 jin1 'towel' (meaning) and 父 fu4 (reading). In the Seal script you can clearly see that 父 is part of the 布 character and acts as its phonetic element; in the Regular script it has been simplified down to two strokes and is not recognizable anymore.


When the first characters started to appear, they were simple pictures of objects, some of which are still in use today, for example 人 (person), 龜 (turtle), 日 (sun), 月 (moon) and 門 (door). A long time ago these characters really were pictures of what they represented; today they are preserved in the Regular script with only their form changed to fit Regular script formatting. Whoever invented these characters must have realized very soon that this way of recording a language was very impractical because:
  • there was no relation to the sound in the character and unless told, no one was exactly sure how to read it
  • it might have been easy to create small pictures of concrete objects, but abstract terms, verbs, adverbs, prepositions etc. must have been very difficult if not impossible to create
  • apart from the fact that there is no relation to the sound or the way a picture should be read, there is also no clear relation to the meaning. A picture of a standing man can represent 'a person, a man, to stand, to be patient...' and probably lots of other things. 
  • characters did not have a standard form, stroke order or stroke number. Quite possibly every time someone tried to write something and did not have an existing text at hand to compare it to, the shape, stroke number and order of some characters must have changed by accident, which caused a lot of confusion. Some characters had almost 20 versions
  • those who were inventing characters started to realize that it would be impossible to create as many characters as there are words, objects, actions, situations etc. and a new system had to be created
To partly overcome the problem of defining abstractness, the scribes started to combine meanings of existing characters into new ones: 

好 hao4 'to be fond of' created as a combination of 女 nv3 'woman' and 子 zi3 'child'
林 lin2 'forest' combining two 木 mu4 'tree'

or started to employ character loans:

我 wo3 'me, I' originally a character meaning 'axe, weapon' created as a combination of 扌shou3 'hand' and 戈 ge1 'axe', used for the 1st personal pronoun 'I, me' because the Ancient Chinese words for 'axe' and 'I, me' had the same or similar pronunciation. 
六 liu4 'six' originally probably a picture of a small house chosen to represent the number six because of its sound
不 bu4 'no, not' originally a picture of a flying bird meaning 'to fly, to soar', borrowed for the negation because of its sound.

To overcome multiple meaning ambiguity, they started adding indicators to existing characters, pointing to their meanings: 

木 mu4 'wood' 本 ben3 'roots'
刀 dao1 'knife' 刃 ren4 'edge of a blade' 
日 ri4 'sun' 旦 dan4 'dawn' 

This, however, still did not solve one very big problem: sound. There was nothing in the old characters that would tell you how to read them. Probably after sound loans had been introduced, instead of purely combining the meanings of two characters, scholars started to combine them so that one character pointed to the meaning and another pointed to the sound of the character as a whole. For instance, 財 cai2 'wealth' was created as a combination of 貝 bei4 'money' pointing to the meaning and 才 cai2 'talent' pointing to the sound. 清 qing1 'clear' was created as a combination of 氵 shui3 'water' pointing to the meaning and 青 qing1 'green, blue' pointing to the sound.
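
The structure of the two examples above can be written down as a small record: a compound character, its reading, the element that points to the meaning and the element that points to the sound. This is only an illustrative sketch. The `Compound` class and its field names are my own invention for this post, not the format of any dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    char: str              # the compound character
    reading: str           # its reading in numbered pinyin
    semantic: str          # element pointing to the meaning
    phonetic: str          # element pointing to the sound
    phonetic_reading: str  # reading of the phonetic element on its own


# The two compounds discussed in the text.
COMPOUNDS = [
    Compound("財", "cai2", "貝", "才", "cai2"),
    Compound("清", "qing1", "氵", "青", "qing1"),
]


def describe(c):
    """One-line breakdown of a compound into its two elements."""
    return (f"{c.char} {c.reading} = {c.semantic} (meaning) + "
            f"{c.phonetic} {c.phonetic_reading} (sound)")


def guess_reading(c):
    """Guess the compound's reading from its phonetic element alone."""
    return c.phonetic_reading


for c in COMPOUNDS:
    print(describe(c), "| guess correct:", guess_reading(c) == c.reading)
```

For these two characters the guess is exactly right. That is not always the case, since many phonetic elements only approximate the reading of the compound.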

This method proved to be historically the most effective and prevalent one: today, more than 80% of characters in use are of this type. It is safe to say that Chinese characters today can be divided into these compound characters and the rest.

Phono-semantic compounds explained

The above table shows the 才 character entry from the Etymological Phonetic Dictionary of Modern Chinese Characters I'm working on. 才 cai2 is the leading phonetic character for this group. 才 is a very good character for explaining phono-semantic compounds (PSCs) because it is found in both regular and irregular compounds.

The most prevalent form of PSC today is one where the semantic element is on the left side and the phonetic on the right, as is the case with the first two characters, 財 and 材. I call these regular PSCs simply because they are the most frequent ones. In this case, 才 is a perfect phonetic, as it matches both the syllable (initial and final) and the tone.

In 在 zai4, however, the 才 phonetic element is on the left side and has been corrupted (clearly visible in the Seal script version of the character). I call this an irregular PSC. 才 is also not a perfect phonetic element in this case (cai2 vs. zai4) but still works very well compared to some other PSCs.

The 存 cun2 character is not a PSC, but a meaning-meaning compound. 才 is clearly a part of it as a co-semantic element on the left (see explanation). 

才 is the phonetic again in the following character zai2. It has been corrupted into a 十 at the top left. This character is not used as a standalone character today, but has been chosen as a new phonetic element in the following three characters and as a co-semantic in the last one.
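
The difference between a perfect phonetic (才 in 財) and an imperfect one (才 in 在) can be made explicit by comparing the two readings piece by piece: initial, final and tone. Below is a rough sketch under my own assumptions. The helper names are hypothetical, and the split into initial and final is a simple prefix match on numbered pinyin, which is good enough for the examples in this article but is not a full pinyin parser.

```python
# Pinyin initials; two-letter initials come first so "zh" is tried before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_pinyin(reading):
    """Split numbered pinyin such as 'cai2' into (initial, final, tone)."""
    syllable, tone = reading[:-1], int(reading[-1])
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    return "", syllable, tone  # syllable without an initial, e.g. 'an1'


def phonetic_match(compound_reading, phonetic_reading):
    """Report which parts of the compound's reading the phonetic matches."""
    ci, cf, ct = split_pinyin(compound_reading)
    pi, pf, pt = split_pinyin(phonetic_reading)
    return {"initial": ci == pi, "final": cf == pf, "tone": ct == pt}


print("財 vs 才:", phonetic_match("cai2", "cai2"))  # perfect phonetic
print("在 vs 才:", phonetic_match("zai4", "cai2"))  # only the final matches
```

For 在 only the final "ai" is shared with 才, which is why I call the phonetic imperfect there, while for 財 all three parts agree.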


  • For understanding character etymology, understanding the earlier versions of modern characters, especially the Seal script is very helpful. Many phonetic or semantic elements have been simplified or corrupted and are not recognizable anymore in the modern versions.
  • You do not need to know 50 000 characters to read the newspaper or books. According to studies, 2500 characters are enough to read the newspaper. According to Wikipedia, the Dictionary of the Emperor KangXi (one of the most reputable later dictionaries in Chinese character history) contains 47 000 characters, but 40% of these are graphic variants. I would guess that most of the remaining characters are place names, people's names or names of local dishes, animals, plants or rarely used objects.
  • Most Chinese characters (about 80%) are phono-semantic compounds, where one element in the character points to the sound and another element points to the meaning. Learning the system behind this type of character will improve your learning significantly.
