
Int. J. Data Mining and Bioinformatics, Vol. 24, No. 4, 2020

Study on the rule and logical relationship of TCM prescription for lumbago

Yong Xiao
College of Information Engineering,
Hubei University of Chinese Medicine,
Wuhan 430065, China
Email: xy1015@hbtcm.edu.cn

Jiaozhi Wang
College of Information Engineering,
Wenzhou Medical University,
Wenzhou 325035, Zhejiang, China
Email: wjz@126.com

Shaowu Shen
Institute of Standardisation and Information Technology,
College of Information Engineering,
Hubei University of Chinese Medicine,
Wuhan 430065, China
Email: ssw6211@163.com

Xiaoqiong Wang
Information Centre,
Mianyang Hospital of Traditional Chinese Medicine,
Mianyang 621000, China
Email: wxq@126.com

Yunfang Liu
Department of Information Management & Engineering,
Law & Business College of Hubei University of Economics,
Wuhan 430205, Hubei, China
Email: liuxiangnv@163.com

Yan Wang*
Information Engineering College,
Hubei University of Chinese Medicine,
Wuhan 430065, China
Email: shuxuewy2016@hbtcm.edu.cn
*Corresponding author

Copyright © 2020 Inderscience Enterprises Ltd.



Abstract: Based on prescription information for low back pain obtained from a hospital electronic medical record system, this paper first analyses the efficacy of the drugs and their property rules, such as the four qi and five flavours. The Apriori algorithm is then used to find association rules among the prescription drugs. Finally, a high-order logic analysis method based on probability logic is used to mine the high-order logic relationships of the prescription drugs.

Keywords: lumbago; electronic medical record; prescription regularity; logic relationships.

Reference to this paper should be made as follows: Xiao, Y., Wang, J., Shen, S., Wang, X., Liu, Y. and Wang, Y. (2020) 'Study on the rule and logical relationship of TCM prescription for lumbago', Int. J. Data Mining and Bioinformatics, Vol. 24, No. 4, pp.368-379.

Biographical notes: Yong Xiao is an Associate Professor and Master Advisor in the College of Information Engineering, Hubei University of Chinese Medicine. His research interests are TCM informatisation and TCM information standards.

Jiaozhi Wang is a Teacher in the College of Information Engineering, Wenzhou Medical University. His main research interest is the digitisation of TCM clinical data.

Shaowu Shen is a Professor and Doctoral Advisor in the College of Information Engineering, Hubei University of Chinese Medicine. Since 2005, he has been engaged in the research of traditional Chinese medicine standardisation and information technology.

Xiaoqiong Wang is a Senior Laboratory Scientist in the Information Centre, Mianyang Hospital of Traditional Chinese Medicine. Her research direction is information management.

Yunfang Liu is an Associate Professor in the Department of Information Management and Engineering and her research interest is data mining.

Yan Wang is an Associate Professor and Master Advisor in the College of Information Engineering, Hubei University of Chinese Medicine. Her current research interests are mathematics, statistics, machine learning and data mining.

1 Introduction

With the rise of data mining technology, data mining methods have been increasingly applied in the field of traditional Chinese medicine (TCM), which is of great significance for promoting the informatisation and modernisation of TCM (Liu et al., 2020). The TCM electronic medical record conforms to the characteristics of TCM clinical records and meets the medical, legal and management requirements of electronic medical records (Song et al., 2020). How to use patients' electronic medical record information effectively to assist doctors in diagnosis is a question worthy of further research. To improve the utilisation of clinical data resources, summarise common clinical rules and remedies, and provide data support for the modernisation of TCM, data analysis and mining techniques are applied to TCM electronic medical record data (Gao et al., 2019).
Records of lumbago first appeared in the Huangdi Neijing. Lumbago refers to a kind of disease with lumbar pain, bilateral or along the lumbar spine, as the main symptom, often accompanied by pain in the back, legs, knees, etc. Its aetiology and pathogenesis are related to internal injury, deficiency of kidney, cold and dampness, blood stasis, exogenous pathogenic factors and seasonal changes. Using data mining technology to explore the prescription drug rules for lumbago can provide ideas for the clinical treatment of this disease.

2 Research methods

2.1 Frequency analysis


Frequency refers to the number of times an object representing a given feature occurs among the values of a variable. The frequency table compiled from these counts is one of the basic tools of statistical description. It makes the distribution and type of the data easy to grasp, and it is commonly used in the mining of medical case data. Depending on the research purpose or the way the data were acquired, the objects counted may be prescriptions, individual traditional Chinese medicines or prescription drugs. Frequency is not only the basis of statistical inference, but is also often used to construct other statistical indicators in data mining analyses such as formula compatibility rules, correlation and clustering.
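
As a minimal sketch of how such a frequency table can be compiled, the Python fragment below counts herb occurrences over a list of prescriptions. The three toy prescriptions and the function name `frequency_table` are illustrative assumptions, not data or code from this study.

```python
from collections import Counter

def frequency_table(prescriptions):
    """Return (herb, count, share of prescriptions) rows, most frequent first."""
    n = len(prescriptions)
    # Count each herb at most once per prescription.
    counts = Counter(herb for p in prescriptions for herb in set(p))
    return [(herb, c, c / n) for herb, c in counts.most_common()]

# Toy data (hypothetical): one list of herbs per prescription.
toy = [
    ["liquorice root", "eucommia bark"],
    ["liquorice root", "peony root"],
    ["liquorice root", "eucommia bark", "coix seed"],
]
for herb, count, share in frequency_table(toy):
    print(f"{herb:15s} {count:3d} {share:7.2%}")
```

The share column corresponds to the "percentage" reported for each drug, that is, the count divided by the number of prescriptions.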

2.2 Association rules


Among the knowledge patterns of data mining, association rules are one of the most commonly used. They are mainly used to discover meaningful relationships hidden in massive data sets. The key concepts in association rule analysis are support, confidence and lift (also called the promotion degree). The attributes of an association rule are expressed by these parameters.
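1) Support: the probability that event A and event B occur simultaneously. For example, if a clinical data set contains 1000 prescriptions and 100 of them contain both licorice and Pinellia Chinensis, then the support degree of this association rule is 10%.

$$S\{A \Rightarrow B\} = \frac{N_{A \wedge B}}{N}$$

where $N_{A \wedge B}$ is the number of prescriptions containing both A and B, and $N$ is the total number of prescriptions.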
2) Confidence: the probability that event B also occurs given that event A occurs. For example, if 70% of the customers in a store who bought diapers also bought beer, the confidence level is 70%.

$$C\{A \Rightarrow B\} = \frac{S\{A \Rightarrow B\}}{S\{A\}}$$

3) Lift: describes how much the occurrence of event A influences the occurrence of event B. It is the ratio of the confidence of the rule to the probability that event B occurs in the absence of any constraint, that is, it indicates whether the probability of B is raised by the rule.
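
The three measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `rule_metrics` and the toy transactions are hypothetical, and lift is taken as confidence divided by the support of B, which is the usual definition.

```python
def rule_metrics(prescriptions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur at least once.
    """
    a, b = set(antecedent), set(consequent)
    n = len(prescriptions)
    n_a = sum(1 for p in prescriptions if a <= set(p))
    n_b = sum(1 for p in prescriptions if b <= set(p))
    n_ab = sum(1 for p in prescriptions if (a | b) <= set(p))
    support = n_ab / n              # share of prescriptions containing A and B
    confidence = n_ab / n_a         # support of the rule divided by support of A
    lift = confidence / (n_b / n)   # confidence divided by support of B
    return support, confidence, lift

# Toy transactions (hypothetical).
toy = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}, {"C"}]
s, c, l = rule_metrics(toy, ["A"], ["B"])
print(f"support={s:.3f} confidence={c:.3f} lift={l:.3f}")
```

A lift above 1 indicates that B is more likely when A is present than it is overall.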
Association rule analysis is conceptually simple but computationally heavy, especially for massive data. The well-known Apriori algorithm is a classic association rule algorithm that implements the analysis efficiently. Its basic idea is to use quality indexes such as support, confidence and lift to select the combinations that meet the requirements and so reduce the complexity of the final result. Association rules are created by searching the data for frequent if-then patterns and using support and confidence to identify the most important relationships.
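
The level-wise search behind Apriori can be sketched as follows. This is a simplified illustration rather than the implementation used in this study: the function name `apriori`, the `max_len` parameter and the toy transactions are assumptions, and only the frequent-itemset stage is shown (rules would then be formed from the itemsets and filtered by confidence).

```python
from itertools import combinations

def apriori(transactions, min_support, max_len=3):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every single item.
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level and k <= max_len:
        kept = {}
        for cand in level:
            s = support(cand)
            if s >= min_support:
                kept[cand] = s
        frequent.update(kept)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        joined = {a | b for a in kept for b in kept if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        level = {c for c in joined
                 if all(frozenset(sub) in kept for sub in combinations(c, k))}
        k += 1
    return frequent

# Toy transactions (hypothetical).
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
for itemset, s in sorted(apriori(toy, 0.6).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most large candidates are never counted.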

2.3 Logic relationships analysis


Bowers et al. (2004) proposed the Logic Analysis of Phylogenetic Profiles (LAPP), which differs from the usual machine modelling methods. LAPP starts from the expression data of the elements and discovers logical relationships between them through a series of logical analyses. These logical connections make it possible to establish the mechanism of a logical model from data analysis. The core of LAPP is how to find the logical relationship between nodes from sample data, for which Bowers used the U-value algorithm.
Bowers calculated the uncertainty coefficient U to determine whether a logical relation exists between node A and node B, where

$$U(B \mid A) = \frac{H(B) + H(A) - H(A, B)}{H(B)}$$

and H refers to the entropy of the nodes. The value of U ranges between 0 and 1. The variable $U(B \mid A)$ denotes the influence on the certainty of B when A is known. The value of U thus indicates how strongly an uncertain logical relationship exists between nodes A and B. The first-order logic between elements A and B has two types, synchronisation ($A \leftrightarrow B$) and asynchronisation ($A \leftrightarrow \neg B$). Denoting the functional form by $f_1$, the uncertainty coefficient is written as:

$$U(B \mid f_1(A)) = \frac{H(B) + H(f_1(A)) - H(f_1(A), B)}{H(B)}$$
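
A minimal sketch of the uncertainty coefficient for binary presence/absence columns is given below. The helper names `entropy` and `uncertainty`, the toy columns, and the convention of returning 0 when B is constant are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(*columns):
    """Shannon entropy (bits) of the joint distribution of the given columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((k / n) * math.log2(k / n) for k in Counter(rows).values())

def uncertainty(b, a):
    """U(B|A) = [H(B) + H(A) - H(A, B)] / H(B)."""
    h_b = entropy(b)
    if h_b == 0:
        return 0.0  # assumption: a constant B has no uncertainty to explain
    return (h_b + entropy(a) - entropy(a, b)) / h_b

# Toy presence/absence columns (hypothetical).
a = [0, 0, 1, 1]
print(uncertainty(a, a))             # B identical to A
print(uncertainty([1, 1, 0, 0], a))  # B is the negation of A
print(uncertainty([0, 1, 0, 1], a))  # B independent of A
```

Because entropy depends only on how the samples are grouped, a column and its negation yield the same U, so both first-order types score equally here.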

The logic relationship between only two nodes is called first-order logic, while the logic between three and four nodes is called second-order logic and third-order logic respectively; these are collectively known as higher-order logic. Second-order logic relations are handled similarly: by calculating $U(c \mid f(a, b))$ we obtain the possibility that some second-order logic relation exists between a, b and c, where the function f is one of the second-order logic types (Hammervoll, 2009). In general, second-order logic is only sought when there is no first-order logic, so we also need to compute $U(c \mid a)$ and $U(c \mid b)$. A second-order logic relationship is considered to exist only if both $U(c \mid a) \ll U(c \mid f(a, b))$ and $U(c \mid b) \ll U(c \mid f(a, b))$ hold. Accordingly, for three random variables A, B and C, the uncertainty coefficient of a second-order logic function $f_2$ is:

$$U(C \mid f_2(A, B)) = \frac{H(C) + H(f_2(A, B)) - H(C, f_2(A, B))}{H(C)}$$


There are 8 cases of $f_2$, as shown in Table 1.
Table 1 Types of higher order logic relationships

TYPE  LOGIC FUNCTION          DESCRIPTION
1     C = A ∧ B               C appears only when both A and B appear
2     C = ¬(A ∧ B)            C appears when A does not appear or B does not appear
3     C = A ∨ B               When A appears or B appears, C appears
4     C = ¬(A ∨ B)            C appears only when A does not appear and B does not appear
5     C = A ∧ ¬B, C = ¬A ∧ B  C appears when A appears and B does not appear, or C appears when A does not appear and B appears
6     C = A ∨ ¬B, C = ¬A ∨ B  C appears when A appears or B does not appear, or C appears when A does not appear or B appears
7     C = ¬(A ↔ B)            C occurs only when either A or B occurs (but not both)
8     C = (A ↔ B)             C occurs when both A and B are present or when neither A nor B is present
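
The eight logic types can be written as functions of binary values and scored with the uncertainty coefficient. The sketch below is an illustration under assumptions: the names `LOGIC_TYPES` and `second_order_scores` and the toy columns are hypothetical, and types 5 and 6 are each represented by two variants.

```python
import math
from collections import Counter

def entropy(*columns):
    """Shannon entropy (bits) of the joint distribution of the given columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((k / n) * math.log2(k / n) for k in Counter(rows).values())

def uncertainty(c, x):
    """U(c|x) = [H(c) + H(x) - H(c, x)] / H(c); 0 if c is constant (assumption)."""
    h_c = entropy(c)
    return 0.0 if h_c == 0 else (h_c + entropy(x) - entropy(c, x)) / h_c

# The logic types as functions of binary a, b (values 0 or 1).
LOGIC_TYPES = {
    1: [lambda a, b: a & b],
    2: [lambda a, b: 1 - (a & b)],
    3: [lambda a, b: a | b],
    4: [lambda a, b: 1 - (a | b)],
    5: [lambda a, b: a & (1 - b), lambda a, b: (1 - a) & b],
    6: [lambda a, b: a | (1 - b), lambda a, b: (1 - a) | b],
    7: [lambda a, b: a ^ b],
    8: [lambda a, b: 1 - (a ^ b)],
}

def second_order_scores(a, b, c):
    """Return {type: largest U(c|f(a,b)) over the variants of that type}."""
    return {
        t: max(uncertainty(c, [f(x, y) for x, y in zip(a, b)]) for f in variants)
        for t, variants in LOGIC_TYPES.items()
    }

# Toy columns (hypothetical): c is present exactly when one of a, b is present.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
c = [x ^ y for x, y in zip(a, b)]
scores = second_order_scores(a, b, c)
print({t: round(u, 3) for t, u in scores.items()})
```

A logic function and its negation always receive the same U, so the scores come in tied pairs; a separate criterion is needed to tell the two members of a pair apart.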

3 Case study

3.1 Data source


Using the first symptom "low back pain" as the search term, the complete diagnosis and treatment records of low back pain patients in the rehabilitation department of Mianyang Hospital of Traditional Chinese Medicine from January 1, 2015 to August 27, 2019 were retrieved. A total of 360 patient admission diagnosis records, 2756 diagnosis and treatment process records and 360 discharge records were obtained. The diagnosis and treatment process records contained 350 prescription records; 13 records with incomplete prescription data were excluded, and 337 prescriptions were finally included.

3.2 Data normalisation


The original data of electronic medical records usually contain a lot of useless or low-value-density noise, which interferes with the process and results of data analysis and reduces both its efficiency and its accuracy. The data should therefore be cleaned and standardised. Data cleaning is the process of purification, noise reduction and redundancy removal, with the purpose of eliminating noise such as outliers, repeated data and wrong data. Data normalisation covers not only the standardisation of data content and form, but also the standardisation of the data processing procedure and the data analysis standards. The 337 included prescriptions were standardised with reference to the Pharmacopoeia of the People's Republic of China (2015 edition) and Traditional Chinese Medicine (TCM), and TCM drugs recorded under different names were unified (Gao and Miao, 2020).
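
The unification of drug names can be sketched as a lookup against an alias table. The table below is a hypothetical stand-in with invented entries; a real one would be compiled from the reference works named above. The function name `normalise` is likewise an assumption.

```python
# Hypothetical alias table mapping recorded names to one standard name.
ALIASES = {
    "licorice": "liquorice root",
    "licorice root": "liquorice root",
    "poria cocos": "poria cocos",
}

def normalise(prescription, aliases=ALIASES):
    """Lower-case, trim and map each herb name to its standard name; drop repeats."""
    seen, result = set(), []
    for name in prescription:
        key = " ".join(name.lower().split())  # collapse case and stray spaces
        std = aliases.get(key, key)           # unknown names pass through unchanged
        if std not in seen:
            seen.add(std)
            result.append(std)
    return result

print(normalise([" Licorice ", "licorice root", "Poria  Cocos"]))
```

Names missing from the table are kept as they are, so that unmapped entries can be reviewed by hand rather than silently discarded.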

3.3 Frequency analysis
Among the 337 prescriptions, there are 23 traditional Chinese medicines. The drugs are sorted by number of occurrences in descending order, and the top 10 are shown in Figure 1.
In terms of single-herb frequency, liquorice root (199) and Ligusticum Chuanxiong (167) were each used more than 150 times, followed by twotoothed achyranthes root (116), eucommia bark (112), poria cocos (110), orange fruit (108), peony root (105) and officinal magnolia bark (100). Liquorice root and Ligusticum Chuanxiong are thus used much more frequently than the other drugs. From the perspective of efficacy, liquorice root is sweet and smooth and can nourish the spleen and qi, while Ligusticum Chuanxiong is a common medicine for promoting blood circulation and relieving pain. It can be seen that invigorating qi and activating blood circulation is the key treatment for low back pain, reflecting that qi stagnation and blood stasis are its basic pathogenesis and run through the whole course of the disease.

Figure 1 Frequency distribution
[Horizontal bar chart of the ten most frequent herbs: liquorice root, Ligusticum Chuanxiong, twotoothed achyranthes root, eucommia bark, poria cocos, orange fruit, peony root, officinal magnolia bark, Chinese angelica and coix seed; horizontal axis: frequency, 0 to 250]

Table 2 Statistical indicators

No.  TCM name                     Frequency  Percentage  Dmean  Dmin  Dmax  SD
1    liquorice root               199        59.05%      7.52         25    2.87
2    Ligusticum Chuanxiong        167        49.55%      14.04  10    30    3.26
3    twotoothed achyranthes root  116        34.42%      15.03  10    30    2.24
4    eucommia bark                112        33.23%      16.75  10    30    3.85
5    Poria Cocos                  110        32.64%      20.3   10    100   12.62
6    orange fruit                 108        32.05%      15.08  10    30    2.48
7    peony root                   105        31.16%      14.98  9     20    1.89
8    officinal magnolia bark      100        29.67%      14.94  10    20    0.82
9    Chinese angelica             99         29.38%      14.64  10    30    3.76
10   coix seed                    93         27.60%      29.14  20    30    2.8

3.4 Drug property frequency analysis

The understanding of herbal medicine in Chinese medicine is mainly based on two aspects: "qi" and "taste". Qi refers to the properties of cold, hot, warm and cool, which are called the "four qi". Taste refers to the five tastes of sour, bitter, sweet, spicy and salty, which are called the "five flavours". In later generations, the "flat" nature was added to cold, hot, warm and cool, and the "tasteless" flavour was added to the five flavours.
Among the TCM prescriptions for lumbago selected from the electronic medical record system, the four qi are mainly warm and flat, the five flavours are mainly sweet, spicy and bitter, and the main meridians are the liver, spleen, lung, kidney, stomach and heart. The results are shown in Figures 2 to 4 below:

Figure 2 Frequency distribution of the four qi
[Bar chart; categories are warm, flat, cold, cool and hot, including "micro" and "big" grades; bar values in the order extracted: 1366, 1153, 629, 593, 501, 188, 49, 11, 8]

Figure 3 Frequency distribution of the five flavours
[Horizontal bar chart; categories: sweet, spicy, bitter, sour, tasteless, slightly bitter, salty, astringent, slightly sweet, slightly spicy; horizontal axis: frequency, 0 to 2500]

Figure 4 Frequency distribution of channel tropism
[Horizontal bar chart; categories: liver, spleen, lung, kidney, stomach, heart, bladder, gallbladder, large intestine, pericardium, small intestine, sanjiao; horizontal axis: frequency, 0 to 2500]

Combined with the analysis of drug efficacy, the drugs mostly act by strengthening muscles and bones, tonifying the liver and kidney, and removing rheumatism. In terms of the four qi and five flavours, sweet drugs nourish and relieve, warm drugs warm the kidney and support Yang, and smooth drugs protect the spleen and kidney in deficiency. Drugs of various natures and tastes can be combined in multiple ways to treat both the symptoms and the root causes.

3.5 Drug efficacy analysis


Among the 337 prescriptions, the efficacies with a frequency ≥ 200 were strengthening muscles and bones, tonifying liver and kidney, dispelling wind and dampness, dispelling wind and analgesia, clearing heat and detoxifying, and moistening the bowel and relieving constipation. These were followed by pacifying the fetus, tonifying spleen and invigorating qi, relieving acute pain, resolving phlegm and relieving cough, harmonising various drugs, clearing heat and cooling blood, benefiting water and reducing swelling, etc. The results are shown in Table 3.
Table 3 Efficacy distribution frequency of traditional Chinese medicine (top 20)

No.  Efficacy                                     Frequency
1    Strong bones and muscles                     272
2    reinforce liver and kidney                   260
3    dispel wind-damp                             258
4    wind-expelling and pain-alleviating          242
5    clearing away heat and toxic materials       240
6    relaxing bowel                               210
7    miscarriage prevention                       199
8    invigorating spleen and replenishing qi      199
9    relieving spasm and pain                     199
10   Resolve phlegm to relieve cough              199
11   moderating the property of herbs             199
12   removing pathogenic heat from blood          196
13   inducing diuresis for removing edema         184
14   inducing diuresis for treating stranguria    179
15   dispelling wind and eliminating dampness     178
16   enhance blood circulation                    167
17   Regulate the menstrual function, analgesic   166
18   Remove congestion and regulate menstruation  156
19   eliminating stasis to stop pain              145
20   Promote urination and leach out damp         135

3.6 Analysis of drug association rules


Owing to the strong correlation between drugs, the minimum support was set at 10%, the minimum confidence at 80%, and the maximum number of antecedent items at 5, and association rules were then mined for the drugs. Under these conditions, a total of 975 rules were generated, from which 15 rules with a high degree of correlation were selected. The results are shown in Tables 4 and 5.

Table 4 Association rules for combination of two drugs

Top 10 with high support

IDX  Drug group                                      Examples  Support (%)  Confidence (%)  Lift
1    Chinese taxillus twig → eucommia bark           91        27.003       84.615          2.546
2    largeleaf gentian root → Chinese taxillus twig  76        22.552       88.158          3.265
3    largeleaf gentian root → eucommia bark          76        22.552       86.842          2.613
4    largeleaf gentian root → Ligusticum Chuanxiong  76        22.552       84.211          1.699
5    cortex phellodendri → atractylodes rhizome      43        12.76        86.047          5.178
6    atractylodes rhizome → coix seed                43        12.76        83.721          3.034
7    the root of fangji → notopterygium root         39        11.573       94.872          4.634
8    the root of fangji → largeleaf gentian root     39        11.573       97.436          4.321
9    the root of fangji → Chinese taxillus twig      39        11.573       94.872          3.513
10   the root of fangji → peony root                 39        11.573       100             3.21

Table 5 Association rules for three drug combinations

Top 10 with high support

IDX  Drug group                                                                    Examples  Support (%)  Confidence (%)  Lift
1    Chinese taxillus twig + eucommia bark → largeleaf gentian root                77        22.849       84.416          3.743
2    Chinese taxillus twig + eucommia bark → Ligusticum Chuanxiong                 77        22.849       80.519          1.625
3    eucommia bark + twotoothed achyranthes root → Chinese taxillus twig           70        20.772       82.857          3.068
4    Chinese taxillus twig + liquorice root → eucommia bark                        68        20.178       85.294          2.566
5    largeleaf gentian root + Chinese taxillus twig → eucommia bark                67        19.881       97.015          2.919
6    largeleaf gentian root + Chinese taxillus twig → twotoothed achyranthes root  67        19.881       82.09           2.385
7    largeleaf gentian root + Chinese taxillus twig → Ligusticum Chuanxiong        67        19.881       82.09           1.657
8    Chinese taxillus twig + Ligusticum Chuanxiong → largeleaf gentian root        67        19.881       82.09           3.64
9    Chinese taxillus twig + Ligusticum Chuanxiong → eucommia bark                 67        19.881       92.537          2.784
10   twotoothed achyranthes root + Ligusticum Chuanxiong → peony root              67        19.881       83.582          2.683
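
As a worked check, the lift values reported for three two-drug rules in Table 4 can be recomputed from the reported confidence and the frequency of the consequent herb in Table 2. The sketch assumes that lift is confidence divided by the relative frequency of the consequent among the 337 prescriptions, which is the usual definition; the function name `lift` is illustrative.

```python
N = 337  # prescriptions included in the study (Section 3.1)

# Frequency of each consequent herb, taken from Table 2.
consequent_freq = {
    "eucommia bark": 112,
    "Ligusticum Chuanxiong": 167,
    "peony root": 105,
}

# (consequent, confidence in %, lift as reported in Table 4)
rules = [
    ("eucommia bark", 84.615, 2.546),          # Chinese taxillus twig -> eucommia bark
    ("Ligusticum Chuanxiong", 84.211, 1.699),  # largeleaf gentian root -> Ligusticum Chuanxiong
    ("peony root", 100.0, 3.21),               # the root of fangji -> peony root
]

def lift(confidence_pct, freq, n=N):
    """Confidence (as a fraction) divided by the consequent's relative frequency."""
    return (confidence_pct / 100) / (freq / n)

for herb, conf, reported in rules:
    print(f"{herb:22s} computed={lift(conf, consequent_freq[herb]):.3f} reported={reported}")
```

The recomputed values agree with the reported ones to within rounding, which also indicates that the lift column is a plain ratio rather than a percentage.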

3.7 Logic relationships analysis


Informative triplets are selected based on a lower and an upper threshold. A candidate triplet c = f(a, b) is called good if U(c|a) ≤ 0.4, U(c|b) ≤ 0.4 and U(c|f(a, b)) ≥ 0.6. In this paper, every drug is assigned a serial number, and the thresholds U(c|a) = 0.4, U(c|b) = 0.4 and U(c|f(a, b)) = 0.6 are used; the results were obtained by means of codeblocks (Sindhu and Kannan, 2014).
First, the data are pre-processed and the prescription data are expressed as a matrix. If an element is greater than zero it is set to one; otherwise it is set to zero, so that every element of the matrix takes only the value 0 or 1. After pre-processing, the probability logic method is applied: the first-order U-value between each pair of drugs and the U-value of the logical relationship between two drugs and a third drug are calculated. A threshold is then chosen so that, for a drug triplet, the first-order relationships are below the threshold and the second-order relationship is above it. Finally, the numbers of drug triplets of each logic type under different thresholds are counted.
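
The pre-processing and the threshold test can be sketched as two small functions. This is a minimal illustration: the names `binarise` and `is_informative`, the toy dose matrix and the example U-values are hypothetical, and the default thresholds follow the values 0.4 and 0.6 stated above.

```python
def binarise(matrix):
    """Map every positive entry (e.g. a dose) to 1 and everything else to 0."""
    return [[1 if x > 0 else 0 for x in row] for row in matrix]

def is_informative(u_ca, u_cb, u_cfab, low=0.4, high=0.6):
    """Keep a triplet if both first-order U-values are at most `low`
    and the second-order U-value is at least `high`."""
    return u_ca <= low and u_cb <= low and u_cfab >= high

# Toy dose matrix (hypothetical): rows are prescriptions, columns are drugs.
doses = [[10, 0, 15], [0, 30, 0], [9, 0, 0]]
print(binarise(doses))

# Hypothetical U-values for two candidate triplets.
print(is_informative(0.10, 0.35, 0.72))  # weak first-order, strong second-order
print(is_informative(0.55, 0.10, 0.80))  # c is already explained by a alone
```

The second candidate is rejected even though its second-order U-value is high, because one drug on its own already predicts c.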
Only when the first-order relationships among the three drugs are below the threshold can we consider that no first-order relationship exists between them and that a higher-order relationship may exist. If different logical relationships for a drug triplet have the same U-value, the Hamming distance between drug c and f(a, b) is calculated and the logic type with the smaller Hamming distance is retained. If several logical relationships of a drug triplet all meet the threshold, we choose the one with the largest U-value, because we want to identify the logical combination of the drug triplet that contains more information.
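
The selection rule just described can be sketched as follows: the largest U-value wins, and ties are broken by the smaller Hamming distance between c and f(a, b). The function names, the tolerance `tol` used to decide that two U-values are equal, and the toy candidates are assumptions made for illustration.

```python
def hamming(x, y):
    """Number of positions at which two equal-length binary columns differ."""
    return sum(1 for p, q in zip(x, y) if p != q)

def select_logic(c, candidates, tol=1e-9):
    """Pick a logic type from candidates given as (type, u_value, f_values).

    The largest U-value wins; among candidates tied within `tol`,
    the one whose f(a, b) column is closest to c is retained.
    """
    top = max(u for _, u, _ in candidates)
    tied = [cand for cand in candidates if top - cand[1] <= tol]
    return min(tied, key=lambda cand: hamming(c, cand[2]))[0]

# Toy example (hypothetical): c is absent only when both a and b are present.
c = [1, 1, 1, 0]
candidates = [
    (1, 1.0, [0, 0, 0, 1]),  # a AND b: same U-value, but the opposite pattern
    (2, 1.0, [1, 1, 1, 0]),  # NOT (a AND b): matches c exactly
    (3, 0.3, [0, 1, 1, 1]),  # a OR b: lower U-value
]
print(select_logic(c, candidates))
```

The tie-break matters because a logic function and its negation carry the same information about c and therefore the same U-value; only the Hamming distance shows which of the two actually matches the observed pattern.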
According to the results of the association rules in this study, under the condition of support ≥ 10%, the root of fangji occurred most often, appearing in 7 of the 15 association rules, followed by largeleaf gentian root, Chinese taxillus twig, eucommia bark and Ligusticum Chuanxiong. The commonly used drugs with strong correlation in clinical practice are mainly those that dispel wind and dampness, tonify deficiency, and promote blood circulation and remove blood stasis.
Every drug is assigned a serial number; for example, Largeleaf Gentian Root has the number V0161, corresponding to the No. 161 Chinese herb. All the higher-order relationships associated with V0161 were extracted, as shown in Figure 6. There are 33 logical relations in total, of which 21 are of the third type and 12 are of the seventh type; some of them are shown in Table 6.
Table 6 The higher-order relationships of Largeleaf Gentian Root

TYPE  Number  Number  Number  First-order  First-order  First-order
#3#   V0161   V0122   V0283   0.102769     0.392394     0.539915
#3#   V0161   V0283   V0313   0.392394     0.148303     0.552862
#7#   V0161   V0268   V0283   0.057524     0.392394     0.527684
#3#   V0161   V0122   V0283   0.102769     0.392394     0.539915
#3#   V0161   V0283   V0313   0.392394     0.148303     0.552862
#7#   V0161   V0268   V0283   0.057524     0.392394     0.527684
#3#   V0161   V0122   V0283   0.102769     0.392394     0.539915
#3#   V0161   V0283   V0313   0.392394     0.148303     0.552862
#7#   V0161   V0268   V0283   0.057524     0.392394     0.527684
#3#   V0161   V0122   V0283   0.102769     0.392394     0.539915
#3#   V0161   V0283   V0313   0.392394     0.148303     0.552862
#7#   V0161   V0268   V0283   0.057524     0.392394     0.527684

4 Conclusion

In this study, data mining technology, including the Apriori algorithm, was used to explore and analyse prescription rules. The medication rules of the rehabilitation department of Mianyang Hospital of Traditional Chinese Medicine for the treatment of lumbago were studied to provide ideas for the clinical treatment of this disease.

Acknowledgement

The research is supported by the Science and Technology research project of Hubei Provincial Education Department (No. B2020095).

References
Bowers, P., Cokus, S., Eisenberg, D. and Yeates, T. (2004) 'Use of logic relationships to decipher protein network organization', Science, Vol. 306, No. 5705, pp.2246-2249.
Gao, T. and Miao, M. (2020) 'Analysis of pinellia ternata drug use rule based on association rule', Chinese Journal of Traditional Chinese Medicine, Vol. 35, No. 3, pp.627-631.
Gao, Y., Wang, F. and Guo, J. (2019) 'Application of data mining technology in the field of traditional Chinese medicine', Hunan Journal of Traditional Chinese Medicine, Vol. 35, No. 7, pp.182-185.
Hammervoll, T. (2009) 'Value-creation logic in supply chain relationships', Journal of Business-to-Business Marketing, Vol. 16, No. 3, pp.220-241.
Liu, J., Huang, Y., Lu, D., Ku, W. and Zhang, S. (2020) 'Research trend and application analysis of data mining in the field of traditional Chinese medicine', Chinese Journal of Traditional Chinese Medicine, Vol. 45, No. 2, pp.953-955.
Sindhu, M.S. and Kannan, B. (2014) 'Detecting signals of drug-drug interactions using association rule mining methodology', Journal of Neurochemistry, Vol. 30, No. 4, pp.783-790.
Song, W., Chen, Z. and Li, Y.L. et al. (2020) 'Data processing and text mining technologies on electronic medical records', A Review Journal of Healthcare Engineering, Vol. 25, No. 2, pp.11-19.
