Quantitative analysis of Chinese characters system
*Corresponding author: Jiajia Hu: School of Chinese Language and Literature, Beijing Normal
University, Beijing 100875, China. E-mail: Hjj81@126.com
Ning Wang: Research Center for Folklore, Classics and Chinese Characters, Beijing Normal
University, Beijing 100875, China
1 Introduction
Recent studies in linguistics indicate a growing interest in quantitative descriptions
of graphical symbols and scripts. The present contribution aims to give
an overview of measurable properties of signs and sign systems, as well as functional
dependences among symbol properties (Altmann & Fan, 2008). However,
quantitative studies of Chinese scripts have been scarce (Bohn, 2002; Feng,
2006; Liu, 2012; Qi, 1996; Zhou, 2008).
There are two systems of writing (Saussure, 1916/2001). (1) In an ideographic
system, each word is represented by a single sign that is unrelated to the sounds
Unauthenticated
Download Date | 4/28/16 8:42 PM
66 Jiajia Hu and Ning Wang
of the word itself. Each written sign stands for a whole word and consequently, for
the idea expressed by the word. The classical example of an ideographic system
of writing is Chinese. (2) The system commonly known as the phonetic system
tries to respond to the succession of sounds that constitute a word. Phonetic sys-
tems are sometimes syllabic and sometimes alphabetic, i.e., based on the irreduc-
ible elements used in speaking. As an ideographic system of writing, the most
noticeable feature of Chinese is that each character is constructed according to a
word’s meaning, which the character records. There is always, as a result, some
information pertaining to meaning that can be obtained from a character, which
is called the character’s original meaning. The traditional six principles of inquiry
into characters’ original meanings with respect to their structures are known as
LiuShu. Based on LiuShu, Wang (2002) puts forward a modern theory that uses
“character components and component functions” to analyse Chinese character
structures. In this theory, the elements of the Chinese writing system are not char-
acters or strokes but character components that carry specific meanings when
they are used to compose characters. For example, the characters 桃 (táo, means
peach), 李 (lǐ, means plum), 柳 (liǔ, means willow), 桂 (guì, means laurel), 杏
(xìng, means apricot), which have a common component 木 (mù, means wood),
are all used to record the name of a type of tree. Thus, the consistency between the
forms and meanings of Chinese characters is evident not only in each single character
but also in the consistent meaning of the same component across different characters,
which makes Chinese characters an ordered system.
Chinese character components (hereafter simply components) are the elements
used to form Chinese characters. For example, 木 and 兆 are the components
of 桃: 木 indicates the meaning of 桃 and 兆 (zhào) indicates its sound.
A character composed of only one component, i.e., the character
itself, is called a non-composite character. A character composed of at least
two components is called a composite character. There are two types of Chinese
character components: (1) primitive components, which cannot be decomposed into
other components, such as 田, 力, . . . , and (2) compound components, which are
themselves composed of other components, such as 相, which is composed of 木 and 目,
and 吾, which is composed of 五 and 口, . . . . Therefore, a Chinese character is composed
of components hierarchically, with primitive components at the bottom. Table 1
shows the component hierarchy of a Chinese character.
The first layer component is also called a direct component because it has
a direct relationship with the character’s original meaning. A direct component
can play one of four roles in composing a character: (1) a pictographic symbol,
(2) a semantic symbol, (3) a phonetic symbol or (4) a deictic symbol. These roles are
called direct component functions. The modes in which direct components
with their different functions combine in the formation of Chinese characters can be
Quantitative analysis of Chinese characters system 67
divided into eleven categories, the structure modes of Chinese characters, which
are listed in Table 2.
In Chinese, each composite character is composed of at least two direct
components, whereas each non-composite character can be viewed as a composition
of one direct component, which is just the character itself. Some of these
components are primitive, and the rest are compound; in other words, as
independent characters, the compound components are themselves composed of at
least two direct components. Take 堤 as an example. According to Shuowen, 堤 (dī,
means embankment) is composed of a semantic component 土 and a phonetic component 是.
土 (tǔ, means soil) is a primitive component because, as an independent character,
it is non-composite. 是 (shì, means true), as an independent character, is
composed of two semantic components 日 and 正, so it is a compound component.
日 (rì, means sun) is also a primitive component, while 正 (zhèng, means
straight) is composed of a pictographic component 止 (zhǐ, means foot) and a
deictic component 一 (yī, means one), both of which are primitive components.
Table fragment: Character 堤
Table fragment: 0, single-element character / non-composite character, 止
No.  Character  Direct component 1  Function      Direct component 2  Function
1    堤         土                  semantic      是                  phonetic
2    土
3    是         日                  semantic      正                  semantic
4    日
5    正         止                  pictographic  一                  deictic
6    止
7    一
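The rows for 堤 above can be read as a small lookup table from which a formation tree is expanded. The following is a minimal sketch of that reading, not the authors' software: the dictionary layout and the names `STRUCTURE_TABLE`, `formation_tree` and `render` are assumptions made for illustration.

```python
# Hypothetical in-memory form of the structure table: each character maps to
# its direct components and their functions; primitive components map to [].
STRUCTURE_TABLE = {
    "堤": [("土", "semantic"), ("是", "phonetic")],
    "土": [],
    "是": [("日", "semantic"), ("正", "semantic")],
    "日": [],
    "正": [("止", "pictographic"), ("一", "deictic")],
    "止": [],
    "一": [],
}


def formation_tree(char, table=STRUCTURE_TABLE):
    """Recursively expand a character into a nested (char, children) tuple."""
    children = [formation_tree(part, table) for part, _function in table.get(char, [])]
    return (char, children)


def render(tree, depth=0):
    """Return the tree as indented lines, one component per line."""
    char, children = tree
    lines = ["  " * depth + char]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(formation_tree("堤"))))
```

Read from the root downwards, the printed tree analyses 堤; read from the leaves upwards, it composes 堤 from its primitive components.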
IF                                                        THEN
Direct component number   Direct component functions      Structure mode
=1                                                        0
=2                        pictographic + deictic          Deictic-pictographic
=2                        semantic + deictic              Deictic-semantic
=2                        deictic + phonetic              Deictic-phonetic
=2                        pictographic + phonetic         Pictographic-phonetic
=2                        semantic + phonetic             Semantic-phonetic
=2                        pictographic + pictographic     Pictographic composite
=2                        pictographic + semantic         Semantic-pictographic
=2                        semantic + semantic             Semantic composite
>2                        others – phonetic               Comprehensive composite without phonetic component
>2                        phonetic + others               Comprehensive composite with phonetic component
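The IF/THEN rules above amount to a lookup keyed on the number of direct components and their functions. A minimal sketch follows; the function name `structure_mode`, the list-of-strings input and the error handling for combinations that the table does not list are assumptions, not part of the original software.

```python
# Two-component rules from the IF/THEN table, keyed by the alphabetically
# sorted pair of direct component functions.
TWO_COMPONENT_MODES = {
    ("deictic", "pictographic"): "Deictic-pictographic",
    ("deictic", "semantic"): "Deictic-semantic",
    ("deictic", "phonetic"): "Deictic-phonetic",
    ("phonetic", "pictographic"): "Pictographic-phonetic",
    ("phonetic", "semantic"): "Semantic-phonetic",
    ("pictographic", "pictographic"): "Pictographic composite",
    ("pictographic", "semantic"): "Semantic-pictographic",
    ("semantic", "semantic"): "Semantic composite",
}


def structure_mode(functions):
    """Map the functions of a character's direct components to a structure mode."""
    count = len(functions)
    if count == 0:
        raise ValueError("a character has at least one direct component")
    if count == 1:
        return "0"  # non-composite character
    if count == 2:
        key = tuple(sorted(functions))
        if key not in TWO_COMPONENT_MODES:
            # Combination not listed in the table; treated as an error here.
            raise ValueError(f"no rule for combination {key}")
        return TWO_COMPONENT_MODES[key]
    if "phonetic" in functions:
        return "Comprehensive composite with phonetic component"
    return "Comprehensive composite without phonetic component"


print(structure_mode(["semantic", "phonetic"]))
```

Sorting the pair makes the lookup independent of the order in which the two direct components are listed.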
formation tree of a character, with all of its components on each layer (see Figure
1). From the top to the bottom, it is a process for analysing a character’s structure.
From the bottom to the top, it is exactly the process for composing a character
with primitive components.
Based on the complete description of seal script structures (as in Table 3), a
software tool (Hu, 2010) was developed to display the formation tree of each
character in Shuowen. When a Chinese character (e.g., 使) is searched for, the software
first finds its entry in the table together with its direct components (here 吏 and 人)
and then finds the entries of these components in the same table together with their
direct components (史 and 一). The process continues until all components are primitive,
that is, until they have no direct components in the table. The result includes a
formation tree with the searched character as its root, as in Figure 1, together with its
direct component number, its primitive component number, its character layer and its
structure mode.
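The attributes returned by such a lookup can be sketched as three short recursive functions. This is an illustration under stated assumptions rather than the software of Hu (2010): the table is a hypothetical dictionary holding only the 堤 example from this paper, the primitive component number is assumed to count the leaves of the formation tree (a primitive component that occurs twice is counted twice), and a non-composite character is assumed to have layer 1.

```python
# Hypothetical table: character -> list of direct components ([] if primitive).
TABLE = {
    "堤": ["土", "是"],
    "土": [],
    "是": ["日", "正"],
    "日": [],
    "正": ["止", "一"],
    "止": [],
    "一": [],
}


def direct_component_number(char, table=TABLE):
    """A non-composite character counts as one direct component (itself)."""
    return max(1, len(table.get(char, [])))


def primitive_component_number(char, table=TABLE):
    """Number of leaves in the formation tree (assumption: repeats are counted)."""
    parts = table.get(char, [])
    if not parts:
        return 1
    return sum(primitive_component_number(part, table) for part in parts)


def character_layer(char, table=TABLE):
    """Depth of the formation tree, with a non-composite character on layer 1."""
    parts = table.get(char, [])
    if not parts:
        return 1
    return 1 + max(character_layer(part, table) for part in parts)


print(direct_component_number("堤"), primitive_component_number("堤"), character_layer("堤"))
```

For 堤 this gives two direct components, four primitive components (土, 日, 止, 一) and four layers.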
The software (Hu, 2010) uses the five attributes above and their combinations
to analyse and compare the systematic nature of synchronic characters at
different stages of development. Based on the complete description of seal scripts
in Shuowen, this paper uses the software to present statistical data that explain and
demonstrate the ordered system of seal script, an ancient script style adopted in the Qin
Dynasty to standardise writing. Seal script also serves as a bridge between
the older ancient script styles, such as oracle bone inscription and bronze
inscription, and modern script styles, such as clerical script and standard script.
This subsection uses the five attributes above to analyse the seal scripts in
Shuowen separately. The horizontal axis represents one attribute; the vertical axis
represents the number of characters that have each value of this attribute.
(1) The direct component number of characters varies from one to six. Characters
with two direct components account for the vast majority, nearly 95%.
Characters with only one direct component, i.e., non-composite characters, form
the second largest group but account for only 2.7%. Characters with three or more
direct components together account for less than 2.4% (see Table 5).
Direct component number   Number of characters   Percentage (%)
1                         257                    2.73
2                         8949                   94.89
3                         196                    2.08
4                         23                     0.24
5                         4                      0.04
6                         2                      0.02
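The percentages for the direct component number can be re-derived from the counts. The short sketch below does this using the counts that match the reported percentages (196 characters with three direct components and 23 with four); the variable and function names are assumptions for illustration.

```python
# Characters per direct component number, using the counts that are consistent
# with the reported percentages.
COUNTS = {1: 257, 2: 8949, 3: 196, 4: 23, 5: 4, 6: 2}


def percentages(counts):
    """Share of each value, in percent, rounded to two decimals."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 2) for value, n in counts.items()}


total = sum(COUNTS.values())
shares = percentages(COUNTS)
three_or_more = 100 * sum(n for value, n in COUNTS.items() if value >= 3) / total
print(total, shares, round(three_or_more, 2))
```

The total is 9431 characters, and the share of characters with three or more direct components stays below 2.4%, in line with the text.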
Primitive component number   Number of characters   Percentage (%)
1                            257                    2.73
2                            1136                   12.05
3                            1898                   20.13
4                            1877                   19.9
5                            1389                   14.73
6                            1007                   10.68
7                            708                    7.51
8                            468                    4.96
9                            295                    3.13
10                           165                    1.75
11                           77                     0.82
12                           80                     0.85
13                           41                     0.43
14                           16                     0.17
15                           8                      0.08
16                           2                      0.02
17                           2                      0.02
18                           1                      0.01
19                           1                      0.01
20                           0                      0
21                           0                      0
22                           1                      0.01
23                           1                      0.01
24                           0                      0
25                           0                      0
26                           1                      0.01
Character layer   Number of characters   Percentage (%)
1                 257                    2.73
2                 1203                   12.76
3                 2783                   29.51
4                 2504                   26.55
5                 1885                   19.99
6                 632                    6.7
7                 127                    1.35
8                 37                     0.39
9                 3                      0.03
(4) Semantic-phonetic characters account for more than 84%, followed
by semantic composite characters and non-composite characters. Together, these
three formation modes account for more than 95% of all characters (see Table 8).
Structure mode                                        Number of characters   Percentage (%)
0 (non-composite)                                     257                    2.73
Deictic-pictographic                                  6                      0.06
Deictic-semantic                                      68                     0.72
Deictic-phonetic                                      1                      0.01
Pictographic-phonetic                                 9                      0.1
Semantic-phonetic                                     7926                   84.04
Comprehensive composite with phonetic component       138                    1.46
Pictographic composite                                7                      0.07
Semantic-pictographic                                 78                     0.83
Semantic composite                                    854                    9.06
Comprehensive composite without phonetic component    87                     0.92
(5) Eighty percent of characters are never used as direct components of other
characters. Only 20% of characters serve as direct components of other characters,
and their constructing abilities vary substantially. Several highly productive
components compose tens or hundreds of characters, whereas many components
compose only a few other characters (see Figure 2). The relationship between the
direct constructing ability of characters (denoted by F) and their ranks (denoted by R)
is approximated by a Zipfian distribution, $F \cdot R^{0.82} = 1324$. This
can be easily observed by plotting the data on log-log coordinates, where the
plot is linear with a negative slope (see Figure 3).
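The Zipfian relation F · R^0.82 = 1324 is equivalent to log F = log 1324 − 0.82 · log R, which is why the log-log plot is a straight line with slope −0.82. The sketch below illustrates this with an ordinary least-squares fit in log-log space. The ranks and abilities are synthetic values generated from the stated formula, not the Shuowen data, and the function names are assumptions.

```python
import math


def predicted_ability(rank, exponent=0.82, constant=1324.0):
    """Direct constructing ability predicted by F * R**exponent = constant."""
    return constant / rank ** exponent


def fit_power_law(ranks, abilities):
    """Least-squares fit of log F on log R.

    Returns (exponent, constant) such that F is approximately
    constant / R**exponent.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in abilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic check: data generated from the formula should return its parameters.
ranks = list(range(1, 201))
abilities = [predicted_ability(r) for r in ranks]
exponent, constant = fit_power_law(ranks, abilities)
print(round(exponent, 2), round(constant))
```

With real rank and ability data, the same fit would give the empirical exponent and constant; characters with an ability of zero would have to be excluded before taking logarithms.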
Fig. 3: Direct constructing ability of characters and their ranks (log-log coordinate)
This subsection uses pairwise combinations of the five attributes above to analyse
the seal scripts in Shuowen. The vertical axis represents one attribute, and the
colour represents the other attribute. Two types of analysis are performed for each
pairwise combination: the distribution of the number of characters over the paired
attributes and the distribution of the direct constructing ability of these characters.
Both are represented by the horizontal axis.
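The two pairwise analyses described above can be sketched as a single cross-tabulation that accumulates, for each pair of attribute values, both the number of characters and their summed direct constructing ability. The records below are placeholders invented for illustration, not Shuowen data, and the field names are assumptions.

```python
from collections import Counter

# Placeholder records, NOT taken from Shuowen: each dict holds hypothetical
# attribute values for one character.
RECORDS = [
    {"char": "A", "direct": 2, "layer": 2, "ability": 5},
    {"char": "B", "direct": 2, "layer": 2, "ability": 0},
    {"char": "C", "direct": 2, "layer": 3, "ability": 1},
    {"char": "D", "direct": 3, "layer": 2, "ability": 0},
]


def cross_tabulate(records, row_attr, col_attr):
    """Return (character counts, summed ability) per (row value, column value)."""
    counts, ability = Counter(), Counter()
    for record in records:
        cell = (record[row_attr], record[col_attr])
        counts[cell] += 1
        ability[cell] += record["ability"]
    return counts, ability


counts, ability = cross_tabulate(RECORDS, "direct", "layer")
for cell in sorted(counts):
    print(cell, counts[cell], ability[cell])
```

Each cell of the first result corresponds to an entry in a character distribution table, and each cell of the second to an entry in a direct constructing ability table.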
According to Tables 9–14, no matter how many primitive components they have,
most seal scripts have two direct components, and their layers
mostly range from two to six. As the primitive component number and character
layer increase, the capability (direct character-formation number) of these characters
declines rapidly. However, among characters with the same number of primitive
components, those with fewer layers tend to have greater capability. Characters
that have more than seven primitive components can hardly be used to
compose other characters.
Table 9: Character distribution of direct component number and primitive component number
1 257
2 1136 1843 1826 1356 976 688 452 287 162 73 77 41
3 55 41 28 26 18 13 7 1 3 3
4 10 5 3 1 2 2
5 1 1 1 1
6 1 1
14 15 16 17 18 19 20 21 22 23 24 25 26
1
2 15 8 2 2 1 1 1 1 1
3 1
4
5
6
Table 10: Direct constructing ability distribution of direct component number and primitive
component number
1 7174
2 4748 2219 1267 733 584 185 100 30 9 28 4 0
3 270 147 100 63 276 7 18 0 26 1
4 81 17 6 5 0 4
5 4 0 1 3
6 1 3
14 15 16 17 18 19 20 21 22 23 24 25 26
1
2 1 0 0 0 0 0 2 0 0
3 1
4
5
6
Table 11: Character distribution of direct component number and character layer
1 257
2 1137 2706 2456 1860 626 124 37 3
3 55 65 45 22 6 3
4 10 9 2 2
5 3 1
6 1 1
Table 12: Direct constructing ability distribution of direct component number and character
layer
1 7174
2 4748 2685 1758 586 104 26 3 0
3 270 250 309 46 23 11
4 81 28 0 4
5 5 3
6 1 3
Table 13: Character distribution of character layer and primitive component number
1 257
2 1136 55 10 2
3 1843 761 140 32 6 1
4 1106 863 373 112 37 7 3 2 1
5 306 504 432 262 138 83 36 29 12
6 96 146 135 106 58 34 30 14
7 12 31 32 12 3 15 11
8 2 12 9 2 5 2
9
14 15 16 17 18 19 20 21 22 23 24 25 26
1
2
3
4
5
6 5 4 2 1 1
7 7 2 1 1
8 1
9 1
Table 14: Direct constructing ability distribution of character layer and primitive
component number
1 7174
2 4748 270 81 1
3 2219 528 160 54 6 1
4 886 511 370 285 9 4 2 3 0
5 179 207 125 59 34 8 25 1 0
6 26 47 22 3 1 27 1 0
7 3 17 10 2 2 0 0
8 0 0 0 0 3 0
9
14 15 16 17 18 19 20 21 22 23 24 25 26
1
2
3
4
5
6 0 0 0 0 0
7 1 0 0 2
8 0
9 0
In Tables 15–20, for the sake of convenience, the eleven formation modes are
combined into three categories: non-composite characters, composite characters
without phonetic components and composite characters with phonetic components.
According to these tables, except for non-composite characters, seal scripts with a
phonetic component outnumber seal scripts without a phonetic component at every
layer, and the gap widens as the character layer increases. Meanwhile, except for
non-composite characters, the direct constructing ability of seal scripts without a
phonetic component is stronger than that of seal scripts with a phonetic component.
However, as the primitive component number and character layer increase, the
capability of characters without a phonetic component declines, whereas the
capability of characters with a phonetic component shows a parabolic trend that
peaks when the primitive component number and the character layer are both four.
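The grouping of the eleven formation modes into three categories can be sketched as a simple mapping. Which modes count as having a phonetic component is an assumption here, read off the mode names; the counts are those listed per structure mode in Table 8, and the names `broad_category` and `MODE_COUNTS` are hypothetical.

```python
# Characters per structure mode, as listed in Table 8.
MODE_COUNTS = {
    "0": 257,
    "Deictic-pictographic": 6,
    "Deictic-semantic": 68,
    "Deictic-phonetic": 1,
    "Pictographic-phonetic": 9,
    "Semantic-phonetic": 7926,
    "Comprehensive composite with phonetic component": 138,
    "Pictographic composite": 7,
    "Semantic-pictographic": 78,
    "Semantic composite": 854,
    "Comprehensive composite without phonetic component": 87,
}

# Assumption: a mode has a phonetic component if its name says so.
WITH_PHONETIC = {
    "Deictic-phonetic",
    "Pictographic-phonetic",
    "Semantic-phonetic",
    "Comprehensive composite with phonetic component",
}


def broad_category(mode):
    """Collapse one of the eleven structure modes into three categories."""
    if mode == "0":
        return "Non-composite"
    if mode in WITH_PHONETIC:
        return "Composite with phonetic component"
    return "Composite without phonetic component"


totals = {}
for mode, count in MODE_COUNTS.items():
    category = broad_category(mode)
    totals[category] = totals.get(category, 0) + count
print(totals)
```

These totals are derived from the per-mode counts in Table 8 alone.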
Table 15: Character distribution of structure modes and direct component number
1 2 3 4 5 6
Non-composite 257
Composite with phonetic component 1047 119 14 3 2
Composite without phonetic component 7935 77 9 1
Table 16: Direct constructing ability distribution of structure modes and direct component
number
1 2 3 4 5 6
Non-composite 7174
Composite with phonetic component 6661 505 82 4 4
Composite without phonetic component 3249 404 31 4
Table 17: Character distribution of structure modes and primitive component number
1 2 3 4 5 6 7 8 9 10 11 12 13
Non-composite 257
Composite with phonetic component 350 315 197 112 85 36 25 11 8 6 4 1
Composite without phonetic component 786 1583 1680 1277 922 672 443 284 157 71 76 40
14 15 16 17 18 19 20 21 22 23 24 25 26
Non-composite
Composite with phonetic component 1 1
Composite without phonetic component 15 8 2 2 1 1 1 1
Table 18: Direct constructing ability distribution of structure modes and primitive component
number
1 2 3 4 5 6 7 8 9 10 11 12 13
Non-composite 257
Composite with phonetic component 350 315 197 112 85 36 25 11 8 6 4 1
Composite without phonetic component 786 1583 1680 1277 922 672 443 284 157 71 76 40
14 15 16 17 18 19 20 21 22 23 24 25 26
Non-composite
Composite with phonetic component 1 1
Composite without phonetic component 15 8 2 2 1 1 1 1
Table 19: Character distribution of structure modes and character layer
1 2 3 4 5 6 7 8 9
Non-composite 257
Composite with phonetic component 399 426 185 112 24 5 1
Composite without phonetic component 804 2357 2319 1773 608 122 36 3
Table 20: Direct constructing ability distribution of structure modes and character layer
1 2 3 4 5 6 7 8 9
Non-composite 7174
Composite with phonetic component 4193 2022 698 303 36 4 0
Composite without phonetic component 907 946 1372 336 91 33 3 0
The results indicate that the seal script in Shuowen uses approximately
257 primitive components, which account for no more than 3% of all characters,
to construct a strict hierarchical system layer by layer. Most characters are composed
of two direct components, a semantic component and a phonetic component,
which is the most economical formation mode for indicating both a character’s
meaning and its sound. More than 80% of characters are composed of two
to six primitive components, forming a tree with two to five layers. It can be concluded
that there are mainly five patterns of seal script formation trees, as Figure
4 shows: 1) two leaves, two layers; 2) three leaves, three layers; 3) four leaves,
three or four layers (mainly four); 4) five leaves, four or five layers (mainly four);
and 5) six leaves, four or five layers (mainly five). Furthermore, the phonetic components
of seal scripts are mostly non-composite characters or characters without
a phonetic component. However, the increase in character layers is determined
by a character’s phonetic component. In other words, the phonetic component
of a character is usually more complicated than its semantic component,
which is consistent with both human intuition and research.
4 Further research
The Chinese writing system as we know it today has existed for nearly four
thousand years. In the long course of history, Chinese script style has undergone
significant transformation, and there have been five major stages in its development:
oracle bone inscription, bronze inscription, seal script, clerical script and
standard script. These stages can be further divided into two categories: ancient
Chinese script (including the first three stages) and modern Chinese script (including
the latter two stages). Anything more than a cursory glance at Chinese
characters will reveal a high degree of structure to them. However, what exactly
is the structure? Where did the structure come from? How do we describe this
structure? There are already some studies that try to define and analyse the structure
of Chinese characters, but most of them focus on modern Chinese characters,
which people consider more useful than ancient Chinese characters.
These studies usually take strokes as the structural units of Chinese characters.
However, this stroke-based approach applies only to modern Chinese script.
What we want to emphasise here is that
the development of Chinese characters is a continuous process. The seal script is
well known as the bridge between ancient and modern Chinese script. Therefore,
the methodology we put forward in this paper can be extended easily to both
ancient and modern Chinese characters. We have already begun this research. By
analysing and comparing synchronic scripts at different stages, we hope to find
the common nature and developmental tendencies of Chinese writing. This is a
huge project, but we believe it may lead to substantial new developments in the
study of ideography and provide more useful guidance for applications as well.
Funding: This paper was supported by the Fundamental Research Funds for the
Central Universities.
References
Altmann, G., & Fan, F X. (Eds.) (2008): Analyses of Script: Properties of Characters and Writing
Systems (Quantitative Linguistics, 63). Berlin/New York: Mouton de Gruyter.
Bohn, H. (2002): Untersuchungen zur chinesischen Sprache und Schrift [Studies on the Chinese
language and writing]. In Köhler, R. (Ed.), Korpuslinguistische Untersuchungen zur
quantitativen und systemtheoretischen Linguistik [Corpus studies in quantitative and
systems theoretical linguistics] (127–177). Retrieved from
http://ubt.opus.hbz-nrw.de/volltexte/2004/279/pdf/05_bohn.pdf
Feng, Z W. (2006): Description of Chinese Character Structure by Context Free Grammar, in:
Linguistic Sciences (in Chinese), 22, 14–23.
Hu, J J. (2010): The Construction of the Inner Systems Model of “ShuoWen” (Unpublished
doctoral dissertation). Beijing Normal University, Beijing, China.
Liu, Z J. (2012): The Quantitative Survey of Pictophonetic Character in Pre-Qin Days, in:
Linguistic Sciences (in Chinese), 56, 88–100.
Qi, Y T. (1996): Computer Data of Xiaozhuan Shaped System, in: Research in Ancient Chinese
Language (in Chinese), 30, 25–33.
Saussure, F. (2001): Cours de linguistique générale. (Harris, R. Trans.). Beijing: Foreign
Language Teaching and Research Press. (Original work published 1916)
Wang, N. (2013): Course of the construction theory of Chinese character form (in Chinese).
Taipei: San Min Book Co, Ltd.
Zhou, X W. (2008): Attributes of Chinese Characters Structure: Quantitative Research on
Diachronic Evolution. Beijing: China Radio and Television Publishing House.