On the Pictorial Structure of Chinese Characters
U. S. DEPARTMENT OF COMMERCE
NATIONAL BUREAU OF STANDARDS
THE NATIONAL BUREAU OF STANDARDS
The National Bureau of Standards is a principal focal point in the Federal Government for assuring
maximum application of the physical and engineering sciences to the advancement of technology in
industry and commerce. Its responsibilities include development and maintenance of the national standards
of measurement, and the provision of means for making measurements consistent with those
standards; determination of physical constants and properties of materials; development of methods
for testing materials, mechanisms, and structures, and making such tests as may be necessary, particularly
for government agencies; cooperation in the establishment of standard practices for incorporation
in codes and specifications; advisory service to government agencies on scientific and technical
problems; invention and development of devices to serve special needs of the Government; assistance
to industry, business, and consumers in the development and acceptance of commercial standards and
simplified trade practice recommendations; administration of programs in cooperation with United
States business groups and standards organizations for the development of international standards of
practice; and maintenance of a clearinghouse for the collection and dissemination of scientific, technical,
and engineering information. The scope of the Bureau's activities is suggested in the following
listing of its four Institutes and their organizational units.
Institute for Basic Standards. Electricity. Metrology. Heat. Radiation Physics. Mechanics. Applied
Mathematics. Atomic Physics. Physical Chemistry. Laboratory Astrophysics.* Radio Standards
Laboratory: Radio Standards Physics; Radio Standards Engineering.** Office of Standard Reference
Data.
Institute for Materials Research. Analytical Chemistry. Polymers. Metallurgy. Inorganic Materials.
Reactor Radiations. Cryogenics.** Office of Standard Reference Materials.
* NBS Group, Joint Institute for Laboratory Astrophysics at the University of Colorado.
** Located at Boulder, Colorado.
NATIONAL BUREAU OF STANDARDS
Technical Note 254
by
B. Kirk Rankin III
Walter A. Sillars
Robert W. Hsu
strokes
sound features. —
1/ Those interested in detailed accounts of modern linguistics might
consult [1] and [2].
It is our contention that sets of data other than natural languages
show many of the same features and regularities as those that are
One other set of data which shares with these two types of diagrams
both their "two dimensional" nature and their tractability to linguistic
1/
our research results; in the long run, it may turn out to be inadequate
or in some sense undesirable. The model imposes the following stratification:
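[Figure: strata of the model. CHARACTER is linked by C to RADICAL, RADICAL by R to RADICAL VARIANT, and RADICAL VARIANT by C to STROKE, which carries a further R link.]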
Ours is not, by any means, the only way of describing the struc-
ters called radicals, some of which have variant forms. (See the list
If the character naturally splits into a left side and right side,
the radical.
However, there are many characters for which this procedure does
not give the correct radical. These characters may be divided into
two classes:
and
@ •
2. If the character has a natural left-right split and
which can occur on the right, then the right side is the
radical; otherwise,
otherwise,
namely:
a
When we consider in addition to these the fact that the list of difficult
characters 1/ in Mathews includes over 10% of the total number of
work the traditional list of 214 radicals. This list has been arrived
1/ That is, those characters whose radicals are not obvious
These rules are grouped into two classes, syntactic and lexical. The
syntactic rules have the form A = X or A = a (X, Y) where A, X and Y
are nodes of the grammar and a is one of the three symbols h, v, or s.
and a radical is any symbol which occurs on the right side of a lexical
rule. There are no symbols which are both nodes and radicals. The
syntactic rules correspond to those rules in a phrase-structure
grammar which generate from the top node down to the immediate subterminal
level; the lexical rules correspond to those which generate
the terminal symbols.
Syntactic Rules
CHAR = COMP; IT 1/
N = COMP; NT
S = COMP; ST
W = COMP; WT
E = COMP; ET
O = OT
1/ The semicolon separates optional subrules; this line is really an
abbreviation for the two rules: CHAR = COMP and CHAR = IT.
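The abbreviation described in this footnote can be applied mechanically. The following is a minimal sketch, not taken from the report, that expands each abbreviated line into its separate rules. The function name expand_rule and the use of plain strings for nodes are assumptions made for illustration.

```python
def expand_rule(line):
    """Expand an abbreviated rule such as 'CHAR = COMP; IT' into a list
    of (left side, right side) pairs, one pair per optional subrule."""
    left, right = line.split("=", 1)
    alternatives = [alt.strip() for alt in right.split(";")]
    return [(left.strip(), alt) for alt in alternatives if alt]


# The syntactic rules as they are listed in the text.
ABBREVIATED = [
    "CHAR = COMP; IT",
    "N = COMP; NT",
    "S = COMP; ST",
    "W = COMP; WT",
    "E = COMP; ET",
    "O = OT",
]

# Every abbreviated line contributes one rule per alternative.
RULES = [pair for line in ABBREVIATED for pair in expand_rule(line)]

for left, right in RULES:
    print(f"{left} = {right}")
```

Under this reading the six listed lines stand for eleven separate rules.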
Lexical Rules
WT ET NT ST OT IT
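[Table: sample lexical rules, listing radicals in rows with an x under each of the classes WT, ET, NT, ST, OT, and IT in which the radical occurs.]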
1/
radicals.
CHAR (= Character) is the initial or top node of the grammar.
CHAR can be rewritten as COMP, which eventually gives a complex
symbol which heads a column in the lexical rules, the column consist-
ing of terminal symbols which can occur in isolation. COMP can be
rewritten as a vertical array having top N (North) and bottom S (South),
[characters] Although not all of these are in fact
Syntactic Steps
CHAR
COMP
h (W, E)
h (WT, E)
h (WT, COMP)
Lexical Steps
s ( P ,
v (NT, ST)))
s ( Q , v ( + , ST)))
b (
EJ , v ( + , a )))
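The syntactic steps of a derivation like the one above can be replayed as a sequence of leftmost rewrites. The following is a minimal sketch under stated assumptions: the CHAR, N, S, W, E and O rules are those listed in the text, while the three alternatives given here for COMP, v (N, S), h (W, E) and s (O, COMP), are assumed for illustration and are not quoted from the rule list. The helper names rewrite_leftmost, show and derive are likewise hypothetical. The lexical steps, which substitute actual radicals for WT, OT, NT and ST, are not modeled.

```python
# Syntactic rules. The COMP alternatives are ASSUMED for illustration;
# the remaining rules follow the list given in the text.
RULES = {
    "CHAR": ["COMP", "IT"],
    "COMP": [("v", "N", "S"), ("h", "W", "E"), ("s", "O", "COMP")],
    "N": ["COMP", "NT"],
    "S": ["COMP", "ST"],
    "W": ["COMP", "WT"],
    "E": ["COMP", "ET"],
    "O": ["OT"],
}


def rewrite_leftmost(form, node, replacement):
    """Return (new_form, done), replacing the leftmost occurrence of node."""
    if form == node:
        return replacement, True
    if isinstance(form, tuple):
        op, left, right = form
        new_left, done = rewrite_leftmost(left, node, replacement)
        if done:
            return (op, new_left, right), True
        new_right, done = rewrite_leftmost(right, node, replacement)
        return (op, left, new_right), done
    return form, False


def show(form):
    """Render a nested form in the notation of the text, e.g. h (W, E)."""
    if isinstance(form, tuple):
        op, left, right = form
        return f"{op} ({show(left)}, {show(right)})"
    return form


def derive(steps, start="CHAR"):
    """Apply (node, replacement) steps in order, checking each against RULES."""
    form, trace = start, [start]
    for node, replacement in steps:
        if replacement not in RULES[node]:
            raise ValueError(f"{node} = {replacement} is not a rule")
        form, done = rewrite_leftmost(form, node, replacement)
        if not done:
            raise ValueError(f"{node} does not occur in {show(form)}")
        trace.append(form)
    return trace


STEPS = [
    ("CHAR", "COMP"),
    ("COMP", ("h", "W", "E")),
    ("W", "WT"),
    ("E", "COMP"),
    ("COMP", ("s", "O", "COMP")),
    ("O", "OT"),
    ("COMP", ("v", "N", "S")),
    ("N", "NT"),
    ("S", "ST"),
]

for form in derive(STEPS):
    print(show(form))
```

Because N, S, W and E can each be rewritten as COMP again, the list of steps can be extended indefinitely, so under these rules a derivation may nest composite forms to any depth.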
First is the question of recursion. Recall that in DZ4 there is a
potentially infinite loop through the node COMP. This has the effect of
asserting that there is no upper bound on the size of the characters, so
that [character] is just as well-formed as [characters], etc. We
are not satisfied with this feature of the grammar, but neither are
which give rise to terminal arrays like tZj^jrJii- \i~ ^fj which does
[Figure: two phrase-structure trees, numbered 1 and 2 and each rooted at TOP, for the sentence "They are flying planes".]
Consider the radical J~~ , for example. Within the limits of the
seems that the class of radicals which can occur with, say, J , has
more in common with those that can occur with radicals which beyond
v
doubt belong to the north terminal class, such as ' , than it has with,
is explained by the fact that the grammar does not have a superimposition
operation. However, this is not so much a justification of our
"superimposed" characters with a view to defining in detail new co-occurrence
classes.
These questions are of a more general nature than those just considered
in that they are not tied to any particular grammar or even to
That is, what is the class of characters which we want our grammar
to generate? We clearly would not be satisfied with a grammar which
generated all the characters in Mathews [6] and no more, for it would
have no predictive power and thus would not be able to account for
those characters which Mathews omits, those coined since its publi-
writer of Chinese as to what does and what does not constitute a well-
our informant for his evaluation. The informant accepts it (in this
classes in such a way that all members of a given class play a similar
strokes into two or more subgroups if and only if for each of the
smaller groups resulting from the proposed cut there are many combinations
of strokes which can replace it without requiring any change
( A'./f and c£ ) rather than three since our intuition rebels at the
many combinations which can occur over tj , there are few if any
other than 12 which can occur under *s* .
We accept the view that some radicals have variant forms. However,
we depart from tradition in that we list many more instances of
1/
Numbers in quotes refer to radicals of the traditional list. See
Appendix A.
A relatively trivial phenomenon is size variation. When a radical
size, but also its shape is distorted. For example, note the shape
English is "good" and the "bett-" and "be-" elements in "better" and
haps sufficient constraint might be that two candidates for being radical
variants of the same radical must be pictorially similar. Pictorial
1/ The linguist will, of course, see in this the analog of phonetic
problem. Both y\ and 'f are grouped into the same radical,
that one stroke of each radical occupies the same space. A clear case
share a stroke. At this point we mention that the rules to cover cases
such as this, i.e., the rules corresponding to the morphophonemic
rules in a grammar for a spoken language, are tacitly assumed to be
and ^ may combine to produce _^> , but they may not combine to
produce J
and J-"~"
, for these latter two are not well-formed.
There seem to be four common adjoining operations. First, there
adjoin two strokes, but which rather places them in a "near" relation;
and — are adjoined by the near-operation in .
The task here is to devise rules powerful enough to state the in-
two counts. First, it cannot account for many radicals which seem
to have nearly unique structures and, second, it seems to be able to
1/
12 ,
r] ,
and V3 .
is a stroke which has the two variants I and y . It might be
characterized as a "vertical hooked stroke", the direction of the
versus —
-
* .
H. Concluding Remark
It is expected that future work will provide much more in the way
of results in the last four areas. As future work is pursued, much
time will be devoted to work with native informants. The final step
ACKNOWLEDGEMENTS
For reading earlier versions of this paper and for making valuable
suggestions, we wish to thank Margaret R. Fox, Ethel C. Marden, and
Division for the statistical tests relating to our statement concerning
REFERENCES
1. Hockett, Charles F. A Course in Modern Linguistics,
APPENDIX A
[Table: the 214 radicals of the traditional list, numbered 1 to 214.]
APPENDIX B
The numbers which appear in the left-most column of the lexical rules
refer to the number of strokes per radical in the list of radicals below
the number.
Syntactic Rules
CHAR = COMP; IT
N = COMP; NT
S = COMP; ST
W = COMP; WT
E = COMP; ET
O = OT
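The lexical rules can be read as a table from radicals to the terminal classes in which they occur, grouped by stroke count. The following is a minimal sketch of one way such a table might be held in a program. The entries radical_a, radical_b and radical_c, their stroke counts and their class marks are hypothetical placeholders, not rows copied from the appendix, and the function names are assumptions.

```python
# Terminal classes heading the columns of the lexical rules.
CLASSES = ("WT", "ET", "NT", "ST", "OT", "IT")

# HYPOTHETICAL entries: names, stroke counts and class marks are
# placeholders for illustration, not data from the appendix.
LEXICAL = {
    "radical_a": {"strokes": 3, "classes": {"WT"}},
    "radical_b": {"strokes": 3, "classes": {"NT", "ST", "IT"}},
    "radical_c": {"strokes": 4, "classes": {"OT"}},
}


def lexical_rules(table):
    """Collect, for each terminal class, the radicals marked under it.
    Each result corresponds to a rule CLASS = radical; radical; ..."""
    rules = {}
    for cls in CLASSES:
        members = sorted(
            name for name, row in table.items() if cls in row["classes"]
        )
        if members:
            rules[cls] = members
    return rules


def by_stroke_count(table):
    """Group radical names by stroke count, as the left-most column does."""
    groups = {}
    for name, row in table.items():
        groups.setdefault(row["strokes"], []).append(name)
    return groups


for cls, members in lexical_rules(LEXICAL).items():
    print(f"{cls} = {'; '.join(members)}")
```

Keeping the class marks as a set per radical mirrors the x marks in a row, while lexical_rules recovers the column view, in which each terminal class heads a list of the radicals it can be rewritten as.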
LEXICAL RULES
[Table: radicals grouped by stroke count, from 1 stroke through 9-plus, each row marked with an x under the terminal classes WT, ET, NT, ST, OT, and IT in which the radical can occur.]