
DOI 10.1515/glot-2014-0005    Glottotheory 2014; 5(1): 65–83

Jiajia Hu* and Ning Wang

Quantitative analysis of Chinese characters
system based on a tree model
Abstract: As an ideographic system, Chinese writing is not based on the irre-
ducible elements used in speaking (e.g., syllabic or alphabetic elements) as in
phonetic systems. Consequently, there is a misperception that in Chinese writing,
each character is treated as a separate symbol, which makes it seem very diffi-
cult to learn Chinese writing. In the modern study of Chinese ideography, the
elements of Chinese writing are called components, which can play a quadruple
role in composing characters. Based on this understanding of components, this
paper uses the mathematical tree model to analyse the structure of Chinese char-
acters and create a computational description of all of the character structures
explained in Shuowen Jiezi, a character dictionary of Chinese seal script. Using
this methodology, this paper also performs a statistical analysis of synchronic
Chinese characters and provides quantitative data to illustrate the systematic
nature of Chinese writing. There is much room for further research using this
methodology.

Keywords: Chinese character structure system, tree model, character component,
structure mode, character layer, direct constructing ability

*Corresponding author: Jiajia Hu: School of Chinese Language and Literature, Beijing Normal
University, Beijing 100875, China. E-mail:
Ning Wang: Research Center for Folklore, Classics and Chinese Characters, Beijing Normal
University, Beijing 100875, China

1 Introduction
Recent studies in linguistics indicate a growing interest in quantitative
descriptions of graphical symbols and scripts. The present contribution aims at
giving an overview of measurable properties of signs and sign systems, as well as
functional dependences among symbol properties (Altmann & Fan, 2008). However,
quantitative studies of Chinese scripts remain scarce (Bohn, 2002; Feng,
2006; Liu, 2012; Qi, 1996; Zhou, 2008).
There are two systems of writing (Saussure, 1916/2001). (1) In an ideographic
system, each word is represented by a single sign that is unrelated to the sounds

Download Date | 4/28/16 8:42 PM
of the word itself. Each written sign stands for a whole word and, consequently,
for the idea expressed by the word. The classical example of an ideographic system
of writing is Chinese. (2) The system commonly known as the phonetic system
tries to reproduce the succession of sounds that constitute a word. Phonetic
systems are sometimes syllabic and sometimes alphabetic, i.e., based on the
irreducible elements used in speaking. The most noticeable feature of Chinese
as an ideographic system of writing is that each character is constructed
according to the meaning of the word that it records. As a result, some
information pertaining to meaning can always be obtained from a character; this
is called the character’s original meaning. The traditional six principles of
inquiry into characters’ original meanings with respect to their structures are
known as LiuShu. Based on LiuShu, Wang (2002) puts forward a modern theory that
uses “character components and component functions” to analyse Chinese character
structures. In this theory, the elements of the Chinese writing system are not
characters or strokes but character components, which carry specific meanings
when they are used to compose characters. For example, the characters 桃 (táo,
means peach), 李 (lǐ, means plum), 柳 (liǔ, means willow), 桂 (guì, means laurel)
and 杏 (xìng, means apricot), which share the component 木 (mù, means wood),
are all used to record the name of a type of tree. Thus, the consistency of form
and meaning in Chinese characters is evident not only in each single character
but also in the consistent meaning of the same component across different
characters, which makes Chinese characters an ordered system.
The Chinese character components (components for short) are the elements that
are used to form Chinese characters. For example, 木 and 兆 are the components
of 桃: 木 indicates the meaning of 桃, and 兆 (zhào) indicates its sound. A
character composed of only one component, i.e., the character itself, is called
a non-composite character. A character composed of two or more components is
called a composite character. There are two types of Chinese character
components: (1) primitive components, which cannot be decomposed into other
components, such as 田, 力, . . . , and (2) compound components, which are
themselves composed of other components, such as 相, which is composed of 木
and 目, and 吾, which is composed of 五 and 口, . . . . Therefore, a Chinese
character is composed of components hierarchically, with primitive components at
the bottom. Table 1 shows the component hierarchy of a Chinese character.
The first layer component is also called a direct component because it has
a direct relationship with the character’s original meaning. A direct component
can play one of four roles in composing a character: (1) a pictographic symbol,
(2) a semantic symbol, (3) a phonetic symbol or (4) a deictic symbol. These are
called direct component functions. The ways in which direct components with
different functions combine in the formation of Chinese characters can be
divided into eleven categories, the structure modes of Chinese characters, which
are listed in Table 2.
Table 1: Component hierarchy of Chinese character “堤”

Character                                    堤
1st layer component (direct component)       土  是
2nd layer component                          日  正
3rd layer component (primitive component)    一  止

Table 2: Eleven structure modes of Chinese characters

Mode                                                  Definition                                                            Example
0                                                     single-element character / non-composite character                    止
Deictic-pictographic                                  composed of a pictographic component and a deictic component          正(止+一)
Deictic-semantic                                      composed of a semantic component and a deictic component              本(木+一)
Deictic-phonetic                                      composed of a phonetic component and a deictic component              百(白+一)
Pictographic-phonetic                                 composed of a pictographic component and a phonetic component         琴(玨+今)
Semantic-phonetic                                     composed of a semantic component and a phonetic component             堤(土+是)
Comprehensive composite with phonetic component       composed of a phonetic component and other components                 寶(缶+宀+王+貝)
Pictographic composite                                composed of two pictographic components                               步(止+𣥂)
Semantic-pictographic                                 composed of a pictographic component and a semantic component         電(电+雨)
Semantic composite                                    composed of two semantic components                                   是(日+正)
Comprehensive composite without phonetic component    composed of different components other than phonetic components       羅(网+糸+隹)

In Chinese, each composite character is composed of two or more direct
components, whereas each non-composite character can be viewed as a composition
of one direct component, which is just the character itself. Some of these
components are primitive, and the rest are compound; in other words, as
independent characters, the latter are themselves composed of two or more direct
components. Take 堤 as an example. According to Shuowen, 堤 (dī, means
embankment) is composed of a semantic component 土 and a phonetic component 是.
土 (tǔ, means soil) is a primitive component because, as an independent
character, it is non-composite. 是 (shì, means true), as an independent
character, is composed of two semantic components 日 and 正, so it is a compound
component. 日 (rì, means sun) is also a primitive component, while 正 (zhèng,
means straight) is composed of a pictographic component 止 (zhǐ, means foot) and
a deictic component 一 (yī, means one), both of which are primitive components.
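
The hierarchical decomposition of 堤 described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DIRECT` mapping and the function names are hypothetical, the data are limited to the 堤 example, and any character missing from the mapping is assumed to be primitive.

```python
# Direct components of each composite character in the 堤 example.
# Assumption: a character that is absent from this mapping is primitive.
DIRECT = {
    "堤": ["土", "是"],
    "是": ["日", "正"],
    "正": ["止", "一"],
}


def is_primitive(char):
    """A character with no recorded direct components is treated as primitive."""
    return char not in DIRECT


def primitive_components(char):
    """Return the primitive components of char, in left-to-right order."""
    if is_primitive(char):
        return [char]
    result = []
    for part in DIRECT[char]:
        result.extend(primitive_components(part))
    return result


print(primitive_components("堤"))
```

Calling `primitive_components("堤")` walks down through 是 and 正 until only primitive components remain, which mirrors the layer-by-layer reading of Table 1.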

2 Computer descriptions of Chinese character forms based on a tree model
The structure of a Chinese character, as discussed above, is hierarchical. The
relationship between a character and its direct components is the same as that
between a parent node and its children nodes in the mathematical tree model.
Therefore, the structure of a Chinese character can be viewed as a tree, with the
character itself as the root and all of the character’s primitive components as
leaves. This concept is called a formation tree in this paper. Figure 1 depicts the
formation tree of a Chinese character.

Fig. 1: Formation tree of Chinese character “堤”


Several important attributes of a formation tree should be considered: (1) the
number of children of the tree root, i.e., the number of the character’s
direct components; (2) the number of leaves of the tree, i.e., the number of
the character’s primitive components; and (3) the height of the tree, also
called the character layer in this paper, which indicates the complexity of a
character. The higher the formation tree, the more complex the character. The
formation tree of 堤 shows that 堤 has two direct components, which are in turn
built from four primitive components through a three-layer compound. Another
very important attribute, determined by the combination of direct component
functions, is the structure mode shown in Table 2. In this example, 堤 is a
semantic-phonetic character, as determined by the functions of its direct
components 土 and 是.
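
A minimal sketch of how the three numeric attributes could be computed from a formation tree is given below. The `DIRECT` mapping and the function names are hypothetical. The layer convention is an assumption taken from the statistics later in the paper, where non-composite characters are counted as having one layer: the root is counted as layer 1, so 堤 comes out at 4 (the root plus the three component layers of Table 1).

```python
# Direct components for the 堤 example; absent characters are assumed primitive.
DIRECT = {
    "堤": ["土", "是"],
    "是": ["日", "正"],
    "正": ["止", "一"],
}


def direct_component_number(char):
    """Children of the root; a non-composite character counts as its own component."""
    return len(DIRECT.get(char, [char]))


def primitive_component_number(char):
    """Number of leaves of the formation tree."""
    parts = DIRECT.get(char)
    if parts is None:
        return 1
    return sum(primitive_component_number(p) for p in parts)


def character_layer(char):
    """Height of the tree, counting the root as layer 1 (assumed convention)."""
    parts = DIRECT.get(char)
    if parts is None:
        return 1
    return 1 + max(character_layer(p) for p in parts)


print(direct_component_number("堤"),
      primitive_component_number("堤"),
      character_layer("堤"))
```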
For synchronic Chinese characters, it is easy to trace any character’s formation
tree once every character’s direct components have been analysed, as Xu Shen did
in his monograph Shuowen Jiezi (Explaining Simple and Analyzing Compound
Characters, Shuowen for short), a character dictionary of Chinese seal script
written approximately two thousand years ago. In this ingenious book, every
non-composite character was given an independent entry to explain the
consistency of its form and meaning. Every composite character also had an
independent entry that analysed its direct components and component functions
and explained how it was composed of these direct components according to its
meaning. In addition, every direct component that appeared in a character entry
also had its own entry as an independent character, which means that Shuowen is
an internally self-sufficient book. Characters that share components (or
component functions) can thus be cross-referenced for structure and meaning.
Inspired by Shuowen, this paper advances a computational methodology for
describing the system of Chinese characters on the basis of the tree model
discussed above. This work uses a relational database to store the direct
component information of all characters. Every character is a row in the
relation table, and the columns of the table hold the character’s direct
components and the functions of these components. This methodology is analogous
to a computational representation of the children nodes in a tree. Moreover, it
describes the different relationships between a parent and its different
children. There are four types of such relationships, which are in fact the
components’ four functions. Table 3 gives the computational description of
Figure 1, which is called the character formation table in this paper.
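
As an illustration only, a character formation table of this kind could be held in SQLite as follows. The table name `formation`, the column names and the helper `direct_components` are hypothetical; the paper does not specify its database schema. The rows are the 堤 example, and primitive characters are stored with empty component columns.

```python
import sqlite3

# Rows of the 堤 example: (id, char, comp1, comp1 function, comp2, comp2 function).
ROWS = [
    (1, "堤", "土", "semantic", "是", "phonetic"),
    (2, "土", None, None, None, None),
    (3, "是", "日", "semantic", "正", "semantic"),
    (4, "日", None, None, None, None),
    (5, "正", "止", "pictographic", "一", "deictic"),
    (6, "止", None, None, None, None),
    (7, "一", None, None, None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE formation (
           id INTEGER PRIMARY KEY,
           char TEXT UNIQUE NOT NULL,
           comp1 TEXT, comp1_function TEXT,
           comp2 TEXT, comp2_function TEXT
       )"""
)
conn.executemany("INSERT INTO formation VALUES (?, ?, ?, ?, ?, ?)", ROWS)


def direct_components(char):
    """Return [(component, function), ...] for char; empty if primitive or unknown."""
    row = conn.execute(
        "SELECT comp1, comp1_function, comp2, comp2_function "
        "FROM formation WHERE char = ?",
        (char,),
    ).fetchone()
    if row is None:
        return []
    pairs = [(row[0], row[1]), (row[2], row[3])]
    return [(comp, func) for comp, func in pairs if comp is not None]


print(direct_components("堤"))
```

A fixed pair of component columns is enough for this example; characters with up to six direct components would need more columns or a separate child table, which is a design choice left open here.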
Table 3: Character formation table (Computational description of Figure 1)

Id    Char    Direct component 1    Direct component 1 function    Direct component 2    Direct component 2 function
1     堤      土                    semantic                       是                    phonetic
2     土
3     是      日                    semantic                       正                    semantic
4     日
5     正      止                    pictographic                   一                    deictic
6     止
7     一

Four important attributes of a Chinese character can be obtained from its
formation tree: (1) the direct component number, or the number of children of
the tree root; (2) the primitive component number, or the number of leaf nodes
of the tree; (3) the character layer, or the height of the tree; and (4) the
structure mode determined by its direct components’ functions (see Table 2),
which can be derived computationally from the character formation table, as
Table 3 shows. The rules for determining the structure modes are listed in
Table 4. The IF columns list the conditions, and the THEN column lists the
corresponding conclusions. For example, if a character has more than two direct
components and none of its components plays a phonetic role, then its structure
mode is Comprehensive composite without phonetic component. Given the character
formation table of every Chinese character’s direct components and component
functions, it is easy to use breadth-first search (BFS) to rebuild the formation
tree of a character, with all of its components on each layer (see Figure 1).
Read from the top to the bottom, the tree is an analysis of the character’s
structure. Read from the bottom to the top, it is exactly the process of
composing the character from primitive components.

Table 4: Rules to determine structure modes

IF: Direct component number    IF: Direct component function    THEN: Structure mode
=1                                                              0
=2                             pictographic + deictic           Deictic-pictographic
=2                             semantic + deictic               Deictic-semantic
=2                             deictic + phonetic               Deictic-phonetic
=2                             pictographic + phonetic          Pictographic-phonetic
=2                             semantic + phonetic              Semantic-phonetic
=2                             pictographic + pictographic      Pictographic composite
=2                             pictographic + semantic          Semantic-pictographic
=2                             semantic + semantic              Semantic composite
>2                             others, none phonetic            Comprehensive composite without phonetic component
>2                             phonetic + others                Comprehensive composite with phonetic component
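
The rules of Table 4 can be sketched as a small lookup. This is a hedged illustration, not the authors' implementation: the names `TWO_COMPONENT_MODES` and `structure_mode` are hypothetical, a non-composite character is assumed to be passed an empty or one-element function list, and two-component combinations that Table 4 does not list are labelled "unclassified" as a placeholder.

```python
# Two-component rules of Table 4, keyed by the unordered set of functions.
# frozenset(["semantic", "semantic"]) collapses to frozenset(["semantic"]).
TWO_COMPONENT_MODES = {
    frozenset(["pictographic", "deictic"]): "Deictic-pictographic",
    frozenset(["semantic", "deictic"]): "Deictic-semantic",
    frozenset(["deictic", "phonetic"]): "Deictic-phonetic",
    frozenset(["pictographic", "phonetic"]): "Pictographic-phonetic",
    frozenset(["semantic", "phonetic"]): "Semantic-phonetic",
    frozenset(["pictographic"]): "Pictographic composite",
    frozenset(["pictographic", "semantic"]): "Semantic-pictographic",
    frozenset(["semantic"]): "Semantic composite",
}


def structure_mode(functions):
    """Map the list of direct component functions to a structure mode."""
    n = len(functions)
    if n <= 1:
        return "0"  # non-composite character
    if n == 2:
        # Combinations not listed in Table 4 get a placeholder label (assumption).
        return TWO_COMPONENT_MODES.get(frozenset(functions), "unclassified")
    if "phonetic" in functions:
        return "Comprehensive composite with phonetic component"
    return "Comprehensive composite without phonetic component"


print(structure_mode(["semantic", "phonetic"]))
```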
Based on the complete description of seal script structures (as in Table 3), a
software tool (Hu, 2010) was developed to display the formation tree of each
character in Shuowen. When searching for a Chinese character (e.g., 使), this
software first finds its row in the table and its direct components (here 吏
and 人) and then finds these components’ rows in the same table and their direct
components (史 and 一). The process continues until all components are
primitive, in other words, until they have no direct components in the table.
The result includes a formation tree with the searched character as its root,
as in Figure 1, together with its direct component number, its primitive
component number, the character layer and its structure mode.
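
The search process just described can be sketched as a breadth-first traversal. The code below is an assumption-laden illustration rather than the software of Hu (2010): the `DIRECT` mapping and `formation_layers` are hypothetical names, and the data are the 堤 example instead of 使, whose full decomposition is not given in the text.

```python
from collections import deque

# Direct components for the 堤 example; absent characters are assumed primitive.
DIRECT = {
    "堤": ["土", "是"],
    "是": ["日", "正"],
    "正": ["止", "一"],
}


def formation_layers(root):
    """Breadth-first traversal returning the components found on each layer."""
    layers = []
    queue = deque([(root, 0)])
    while queue:
        char, depth = queue.popleft()
        if depth == len(layers):
            layers.append([])  # first node seen at this depth opens a new layer
        layers[depth].append(char)
        # Primitive components have no entry, so nothing further is enqueued.
        for part in DIRECT.get(char, []):
            queue.append((part, depth + 1))
    return layers


for depth, chars in enumerate(formation_layers("堤")):
    print(depth, chars)
```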

3 Computer analysis of Chinese character form

With the computational description of synchronic Chinese characters, it is
possible to carry out statistical analysis that gives a panoramic view of the
whole system of characters. In addition to the four important attributes of a
Chinese character discussed above (the direct component number, the primitive
component number, the character layer and the structure mode), there is a fifth
important attribute, which concerns how often a character is used as a direct
component to compose other characters and is called the direct constructing
ability. It reflects a character’s capability: the more characters a character
can be used to compose, the greater its capability, and because it helps
identify more characters, the more important it becomes. For example, 木 (mù,
means wood) is used as a direct semantic component to compose 449 characters in
Shuowen, and the meanings of all 449 characters are associated with trees or
wood. Some of them are the names of different types of trees, like 桃 (táo,
means peach) and 李 (lǐ, means plum); some are the names of different parts of
trees, like 本 (běn, means root) and 枝 (zhī, means branch); some describe
different states of trees, like 枯 (kū, means withered) and 荣 (róng, means
luxuriant); some are things made of wood, like 柱 (zhù, means stake) and 桥
(qiáo, means bridge); some are actions carried out with wood, like 栽 (zāi,
means plant) and 构 (gòu, means construct); and so on.
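
A minimal sketch of how the direct constructing ability could be counted from a formation table is shown below. The `DIRECT` mapping and the function name are hypothetical, and the toy data contain only decompositions stated in this paper (堤, 是, 正, 桃, 本), so the counts are illustrative and far smaller than the 449 reported for 木.

```python
from collections import Counter

# Toy formation table built only from decompositions given in the text.
DIRECT = {
    "堤": ["土", "是"],
    "是": ["日", "正"],
    "正": ["止", "一"],
    "桃": ["木", "兆"],
    "本": ["木", "一"],
}


def direct_constructing_ability(table):
    """Count, for each component, how many characters use it as a direct component."""
    counts = Counter()
    for parts in table.values():
        # set() so that a component repeated inside one character counts once.
        for part in set(parts):
            counts[part] += 1
    return counts


ability = direct_constructing_ability(DIRECT)
print(ability["木"], ability["一"], ability["堤"])
```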

The software (Hu, 2010) uses the five attributes above and their combinations
to analyse and compare the systematic nature of synchronic characters at
different development stages. Based on the complete description of seal script
in Shuowen, this paper uses the software to present statistical data that
explain and demonstrate the ordered system of seal script, an ancient script
style adopted in the Qin Dynasty to standardise writing. Seal script also serves
as a bridge connecting ancient script styles, such as oracle bone inscription
and bronze inscription, with modern script styles, such as clerical script and
standard script.

3.1 Computational analysis of single attributes

This subsection uses the five attributes above to analyse the seal scripts in
Shuowen separately. The horizontal axis represents one attribute; the vertical
axis represents the number of characters with a given value of this attribute.
(1) The direct component number of characters varies from one to six.
Characters with two direct components make up the great majority, nearly 95%,
and characters with only one direct component, i.e., non-composite characters,
account for the second highest percentage, only 2.7%. Characters with three or
more direct components account for no more than 2.4% (see Table 5).

Table 5: Character distribution of direct component number

Direct component number    Character distribution    Percentage (%)

1     257     2.73
2    8949    94.89
3     196     2.08
4      23     0.24
5       4     0.04
6       2     0.02

(2) The primitive component number of characters varies from one to
twenty-six. Characters with three to four primitive components account for more
than 40%, and characters with no more than 10 primitive components account for
more than 95%. These findings indicate that the compositions of most characters
are not complicated (see Table 6).
(3) Character layers vary from one to nine. Characters with only one layer are
non-composite characters. Characters with three to four layers account for
more than 56%. Nearly 90% of characters have two to five layers (see Table 7).


Table 6: Character distribution of primitive component number

Primitive component number Character distribution Percentage (%)

 1 257 2.73
 2 1136 12.05
 3  1898 20.13
 4 1877 19.9
 5 1389 14.73
 6 1007 10.68
 7 708 7.51
 8 468 4.96
 9 295 3.13
10 165 1.75
11 77 0.82
12 80 0.85
13 41 0.43
14 16 0.17
15 8 0.08
16 2 0.02
17 2 0.02
18 1 0.01
19 1 0.01
20 0 0
21 0 0
22 1 0.01
23 1 0.01
24 0 0
25 0 0
26 1 0.01

Table 7: Character distribution of character layer

Character layer Character distribution Percentage (%)

1 257 2.73
2 1203 12.76
3 2783 29.51
4 2504 26.55
5 1885 19.99
6 632 6.7
7 127 1.35
8 37 0.39
9 3 0.03
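
The percentages in these distribution tables follow directly from the counts. As a small check, the sketch below recomputes the Table 7 percentages; `LAYER_COUNTS` copies the counts from Table 7, and the function name is hypothetical.

```python
# Character counts per character layer, copied from Table 7.
LAYER_COUNTS = {1: 257, 2: 1203, 3: 2783, 4: 2504, 5: 1885,
                6: 632, 7: 127, 8: 37, 9: 3}


def percentages(counts):
    """Convert raw counts to percentages of the total, rounded to two decimals."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 2) for key, value in counts.items()}


total = sum(LAYER_COUNTS.values())
shares = percentages(LAYER_COUNTS)
print(total, shares)
```

The total of 9431 characters is the same as the sum of the counts in Table 6, which gives a simple consistency check between the tables.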


(4) Semantic-phonetic characters account for more than 84%, followed by
semantic composite characters and non-composite characters. These three
structure modes account for more than 95% in total (see Table 8).

Table 8: Character distribution of structure mode

Structure mode    Character distribution    Percentage (%)


0 257 2.73
Deictic-pictographic 6 0.06
Deictic-semantic 68 0.72
Deictic-phonetic 1 0.01
Pictographic-phonetic 9 0.1
Semantic-phonetic 7926 84.04
Comprehensive composite with phonetic component 138 1.46
Pictographic composite 7 0.07
Semantic-pictographic 78 0.83
Semantic composite 854 9.06
Comprehensive composite without phonetic component 87 0.92

(5) Eighty percent of characters are never used as direct components of other
characters. Only 20% of characters are used as direct components of other
characters, and their direct constructing abilities vary substantially. A few
highly productive components compose tens or hundreds of characters, whereas
many components compose only a few other characters (see Figure 2). The direct
constructing ability of characters (represented by F) and their ranks
(represented by R) approximately follow a Zipfian distribution
($F \cdot R^{0.82} = 1324$). This can be easily observed by plotting the data in
log-log coordinates, in which the plot is linear with a negative slope (see
Figure 3).
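
A relation of the form F · R^a = C becomes a straight line, log F = log C - a · log R, in log-log coordinates, so a and C can be estimated by ordinary least squares on the logarithms. The sketch below illustrates this on synthetic data generated from the reported values (a = 0.82, C = 1324); the real rank data are not reproduced in the paper, and the function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(ranks, values):
    """Least-squares fit of log(F) = log(C) - a * log(R); returns (a, C)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # negative for a Zipfian curve
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic data that follow F * R**0.82 = 1324 exactly (illustration only).
ranks = list(range(1, 201))
values = [1324 / r ** 0.82 for r in ranks]
exponent, constant = fit_power_law(ranks, values)
print(round(exponent, 2), round(constant))
```

On real data the points scatter around the line, so the fitted exponent and constant are estimates rather than exact values.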


Fig. 2: Character distribution of direct constructing ability

Fig. 3: Direct constructing ability of characters and their ranks (log-log coordinate)


3.2 Computational analysis of paired attributes

This subsection uses paired combinations of the five attributes above to
analyse the seal scripts in Shuowen. The vertical axis represents one attribute,
and the colour represents the other attribute. Each pairwise combination yields
two types of analysis: the distribution of the number of characters with the
paired attributes and the distribution of the direct constructing ability of
these characters. Both are represented by the horizontal axis.
According to Tables 9–14, no matter how many primitive components they have,
most seal scripts have two direct components, and their layers range mainly
from two to six. As the primitive component number and character layer
increase, the capability (direct constructing ability) of these characters
declines rapidly. However, for the same number of primitive components, the
fewer layers a character has, the more capability it tends to have. Characters
that have more than seven primitive components can hardly be used to compose
other characters.
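
A paired-attribute distribution of this kind is a cross-tabulation: each character contributes one count to the cell given by its two attribute values. The sketch below shows the idea on the seven characters of the 堤 example; the record layout and the function name are hypothetical, and the layer values assume that the root counts as layer 1.

```python
from collections import Counter

# (char, direct component number, primitive component number, character layer)
# for the 堤 example; layer values assume the root counts as layer 1.
RECORDS = [
    ("堤", 2, 4, 4),
    ("是", 2, 3, 3),
    ("正", 2, 2, 2),
    ("土", 1, 1, 1),
    ("日", 1, 1, 1),
    ("止", 1, 1, 1),
    ("一", 1, 1, 1),
]


def cross_tabulate(records, row_index, col_index):
    """Count characters for every (row attribute, column attribute) pair."""
    table = Counter()
    for record in records:
        table[(record[row_index], record[col_index])] += 1
    return table


# Direct component number (rows) against primitive component number (columns).
direct_by_primitive = cross_tabulate(RECORDS, 1, 2)
print(sorted(direct_by_primitive.items()))
```

Summing direct constructing ability instead of adding 1 per character would give the companion table for each pair of attributes.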

Table 9: Character distribution of direct component number and primitive component number

Direct component number    Primitive component number

1 2 3 4 5 6 7 8 9 10 11 12 13

1 257
2 1136 1843 1826 1356 976 688 452 287 162 73 77 41
3 55 41 28 26 18 13 7 1 3 3
4 10 5 3 1 2 2
5 1 1 1 1
6 1 1

14 15 16 17 18 19 20 21 22 23 24 25 26

2 15 8 2 2 1 1 1 1 1
3  1

Table 10: Direct constructing ability distribution of direct component number and primitive
component number

Direct component number    Primitive component number

1 2 3 4 5 6 7 8 9 10 11 12 13

1 7174
2 4748 2219 1267 733 584 185 100 30 9 28 4 0
3 270 147 100 63 276 7 18 0 26 1
4 81 17 6 5 0 4
5 4 0 1 3
6 1 3

14 15 16 17 18 19 20 21 22 23 24 25 26

2 1 0 0 0 0 0 2 0 0
3 1

Table 11: Character distribution of direct component number and character layer

Direct component number    Character layer

1 2 3 4 5 6 7 8 9

1 257
2 1137 2706 2456 1860 626 124 37 3
3 55 65 45 22 6 3
4 10 9 2 2
5 3 1
6 1 1


Table 12: Direct constructing ability distribution of direct component number and character
layer

Direct component number    Character layer

1 2 3 4 5 6 7 8 9

1 7174
2 4748 2685 1758 586 104 26 3 0
3 270 250 309 46 23 11
4 81 28 0 4
5 5 3
6 1 3

Table 13: Character distribution of character layer and primitive component number

Character layer    Primitive component number

1 2 3 4 5 6 7 8 9 10 11 12 13

1 257
2 1136 55 10 2
3 1843 761 140 32 6 1
4 1106 863 373 112 37 7 3 2 1
5 306 504 432 262 138 83 36 29 12
6 96 146 135 106 58 34 30 14
7 12 31 32 12 3 15 11
8 2 12 9 2 5  2

14 15 16 17 18 19 20 21 22 23 24 25 26

6 5 4 2 1 1
7 7 2 1 1
8 1
9 1


Table 14: Direct constructing ability distribution of character layer and primitive component
number

Character layer    Primitive component number

1 2 3 4 5 6 7 8 9 10 11 12 13

1 7174
2 4748 270 81 1
3 2219 528 160 54 6 1
4 886 511 370 285 9 4 2 3 0
5 179 207 125 59 34 8 25 1  0
6 26 47 22 3 1 27 1  0
7 3 17 10 2 2 0  0
8 0 0 0 0 3  0

14 15 16 17 18 19 20 21 22 23 24 25 26

6 0 0 0 0 0
7 1 0 0 2
8  0
9 0

According to Tables 15–20, in which, for the sake of convenience, the eleven
structure modes are combined into three categories (non-composite characters,
composite characters without phonetic components and composite characters with
phonetic components), we can see that, except for non-composite characters,
seal scripts with a phonetic component outnumber seal scripts without one at
every layer, and the gap widens as the character layer increases. Meanwhile,
except for non-composite characters, the direct constructing ability of seal
scripts without a phonetic component is stronger than that of seal scripts with
a phonetic component. However, as the primitive component number and character
layer increase, the capability of characters without a phonetic component
declines, whereas the capability of characters with a phonetic component shows
a parabolic trend that peaks when the primitive component number and the
character layer are both four.


Table 15: Character distribution of structure modes and direct component number

Structure mode    Direct component number

1 2 3 4 5 6

Non-composite 257
Composite with phonetic component 1047 119 14 3 2
Composite without phonetic component 7935  77  9 1

Table 16: Direct constructing ability distribution of structure modes and direct component
number

Structure mode    Direct component number

1 2 3 4 5 6

Non-composite 7174
Composite with phonetic component 6661 505 82 4 4
Composite without phonetic component 3249 404 31 4

Table 17: Character distribution of structure modes and primitive component number

Structure mode    Primitive component number

1 2 3 4 5 6 7 8 9 10 11 12 13

Non-composite 257
Composite with phonetic 350  315  197  112  85  36  25  11   8  6  4  1
Composite without phonetic 786 1583 1680 1277 922 672 443 284 157 71 76 40

14 15 16 17 18 19 20 21 22 23 24 25 26

Composite with phonetic  1 1
Composite without phonetic 15 8 2 2 1 1 1 1


Table 18: Direct constructing ability distribution of structure modes and primitive component
number

Structure mode    Primitive component number

1 2 3 4 5 6 7 8 9 10 11 12 13

Non-composite 257
Composite with phonetic 350  315  197  112  85  36  25  11   8  6  4  1
Composite without phonetic 786 1583 1680 1277 922 672 443 284 157 71 76 40

14 15 16 17 18 19 20 21 22 23 24 25 26

Composite with phonetic  1 1
Composite without phonetic 15 8 2 2 1 1 1 1

Table 19: Character distribution of structure modes and character layer

Structure mode Character layer

1 2 3 4 5 6 7 8 9

Non-Composite 257
Composite with phonetic component 399  426  185  112  24   5  1
Composite without phonetic component 804 2357 2319 1773 608 122 36 3

Table 20: Direct constructing ability distribution of structure modes and character layer

Structure mode Character layer

1 2 3 4 5 6 7 8 9

Non-composite 7174
Composite with phonetic component 4193 2022  698 303 36  4 0
Composite without phonetic component  907  946 1372 336 91 33 3 0


The results indicate that the seal script in Shuowen uses approximately
257 primitive components, which account for no more than 3% of all characters,
to construct a strict hierarchical system layer by layer. Most characters are
composed of two direct components, a semantic component and a phonetic
component, which is the most economical formation mode for indicating both a
character’s meaning and its sound. About 77% of characters are composed of two
to six primitive components (see Table 6), and most of them form a tree with
two to five layers. It can be concluded that there are mainly five patterns of
seal script formation trees, as Figure 4 shows: 1) two leaves, two layers;
2) three leaves, three layers; 3) four leaves, three or four layers (mainly
four); 4) five leaves, four or five layers (mainly four); and 5) six leaves,
four or five layers (mainly five). Furthermore, the phonetic components of seal
scripts are mostly non-composite characters or characters without a phonetic
component. However, the increase in character layers is determined by a
character’s phonetic component. In other words, the phonetic component of a
character is usually more complicated than its semantic component, which is
consistent with both human intuition and research.

Fig. 4: Main patterns of seal scripts’ formation tree

4 Further research
The Chinese writing system as we know it today has existed for nearly four
thousand years. In the long course of history, Chinese script style has
undergone significant transformation, and there have been five major stages in
its development: oracle bone inscription, bronze inscription, seal script,
clerical script and standard script. These stages can be further divided into
two categories: ancient Chinese script (including the first three stages) and
modern Chinese script (including the latter two stages). Anything more than a
cursory glance at Chinese characters will reveal a high degree of structure to
them. However, what exactly is the structure? Where did the structure come
from? How do we describe this structure? There are already some studies that
try to define and analyse the structure of Chinese characters, but most of them
focus on modern Chinese characters, which is generally considered more useful
than focusing on ancient Chinese characters. These studies usually take strokes
as the structural units of Chinese characters. However, this stroke-based line
of thought only applies to modern Chinese script. What we want to emphasise
here is that the development of Chinese characters is a continuous process. The
seal script is well known as the bridge between ancient and modern Chinese
script. Therefore, the methodology we put forward in this paper can be extended
easily to both ancient and modern Chinese characters. We have already begun
this research. By analysing and comparing synchronic scripts at different
stages, we hope to find the common nature and developmental tendencies of
Chinese writing. This is a huge project, but we believe it may lead to
substantial new developments in the study of ideography and provide more useful
guidance for applications as well.

Funding: This paper was supported by the Fundamental Research Funds for the
Central Universities.

References
Altmann, G., & Fan, F. X. (Eds.) (2008): Analyses of Script: Properties of Characters and Writing
Systems (Quantitative Linguistics, 63). Berlin/New York: Mouton de Gruyter.
Bohn, H. (2002): Untersuchungen zur chinesischen Sprache und Schrift [Studies on the Chinese
language and writing]. In Köhler, R. (Ed.), Korpuslinguistische Untersuchungen zur
quantitativen und systemtheoretischen Linguistik [Corpus studies in quantitative and
systems theoretical linguistics] (127–177). Retrieved from
Feng, Z. W. (2006): Description of Chinese Character Structure by Context Free Grammar, in:
Linguistic Sciences (in Chinese), 22, 14–23.
Hu, J. J. (2010): The Construction of the Inner Systems Model of “ShuoWen” (Unpublished
doctoral dissertation). Beijing Normal University, Beijing, China.
Liu, Z. J. (2012): The Quantitative Survey of Pictophonetic Character in Pre-Qin Days, in:
Linguistic Sciences (in Chinese), 56, 88–100.
Qi, Y. T. (1996): Computer Data of Xiaozhuan Shaped System, in: Research in Ancient Chinese
Language (in Chinese), 30, 25–33.
Saussure, F. (2001): Cours de linguistique générale. (Harris, R. Trans.). Beijing: Foreign
Language Teaching and Researching Press. (Original work published 1916)
Wang, N. (2013): Course of the construction theory of Chinese character form (in Chinese).
Taipei: San Min Book Co, Ltd.
Zhou, X. W. (2008): Attributes of Chinese Characters Structure: Quantitative Research on
Diachronic Evolution. Beijing: China Radio and Television Publishing House.
