
Information Processing and Management 39 (2003) 133–147

www.elsevier.com/locate/infoproman

Decision support for the academic library acquisition
budget allocation via circulation database mining
S.-C. Kao a, H.-C. Chang b, C.-H. Lin c
a Department of Information Management, Kun Shan University of Technology, 949 Da Wan Road, Yung Kung, Tainan 710, Taiwan, ROC
b Department of Business Administration, National Cheng Kung University, 1 University Road, Tainan 710, Taiwan, ROC
c Department of Industrial Management Science, National Cheng Kung University, 1 University Road, Tainan 710, Taiwan, ROC
Received 21 September 2001; accepted 13 December 2001

Abstract
Many approaches to decision support for academic library acquisition budget allocation have been proposed to reflect diverse management requirements. Unlike these methods, which focus mainly on either statistical analysis or goal programming, this paper introduces a model (ABAMDM, acquisition budget allocation model via data mining) that explicitly uses descriptive knowledge discovered in historical circulation data to support the allocation of the library acquisition budget. The major concern of this study is that the budget allocation should reflect the requirement that the more a department makes use of its acquired materials in the present academic year, the larger the budget it receives for the coming year. The primary output of the ABAMDM, used to derive the weights of acquisition budget allocation, has two parts: descriptive knowledge in the form of utilization concentration, and suitability in the form of utilization connection for the departments concerned. An application to the library of Kun Shan University of Technology is described to demonstrate the ABAMDM in practice.
© 2002 Elsevier Science Ltd. All rights reserved.
Keywords: Decision support; Acquisition budget allocation; Data mining

E-mail address: kaosc@mail.ksut.edu.tw (S.-C. Kao).
0306-4573/03/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved.
PII: S0306-4573(02)00019-5

1. Introduction
Budget allocation for academic libraries is an important but complex decision. Greaves (1974) identified eight variables that play the most important role in the acquisition allocation operation: (1) the size of the faculty, (2) the number of students or of student credit hours, (3) the cost of library materials, (4) the adequacy of the library collection in an academic discipline, (5) the size and type of courses, (6) the amount of research conducted, (7) the past record in the use of allocated funds, and (8) circulation statistics. Many subsequent studies in the literature have made increasing use of these factors in recent years (Budd & Adams, 1989; Crotts, 1999; Decroos et al., 1997; Evans, 1996; Hamaker, 1995; Lafouge & Laine-Cruzel, 1997; Sorgenfrei, 1999; Wise & Perushek, 1996, 2000), and they have made a meaningful contribution to library management. Although the acquisition budget allocation decision also involves many other issues, such as priorities, strategic plans, and programs, it is believed that suitable material utilization should also be considered. The survey presented by Tuten and Lones (1995) and the research conducted by Budd and Adams (1989) emphasized that circulation data is one of the most frequently cited factors when deciding on library budget allocation. Therefore, information discovered in the circulation database can be valuable for reflecting how a library's materials are actually used.
The techniques used to support the acquisition budget allocation decision mostly comprise statistics-based models and goal-programming-based paradigms. Based on quantified data and/or information provided by management, the former focuses on statistical analysis to derive a hierarchical decision tree containing the factors concerned along with their corresponding share ratios (Anderson, Sweeney, & Williams, 1994). The latter deals mainly with the development of mathematical models that offer optimal solutions to problems with contradictory or incommensurable goals, given the rank order of the goals and the constraints on the factors in advance. Wise and Perushek (1996, 2000) have comprehensively demonstrated its use for the acquisition allocation problem. Although both techniques have made a meaningful contribution to decision support for library acquisition budget allocation, they share a drawback: historical circulation data is rarely taken into account in depth when budgeting. In other words, the utilization of the materials acquired by a department should be reflected in its final allocated acquisition budget, and this is the motivation for this study.
Analyzing circulation data in depth to reveal material utilization by department is, for all intents and purposes, a complex task. It is not simply a matter of computing the ratio of a department's records to the total number of records in the circulation database for a period of time. Data collection, the definition of the degree to which a material belongs to a department, and the complexity of the entropy computation are all factors that can make the task enormously difficult. Data collection requires gathering the circulation data from daily operations and storing it in a database, removing unnecessary attributes (or fields) and missing data, if any, and restructuring the resulting database if necessary. The degree to which a material belongs to a department must be defined in order to compute the total entropy of performance for a department. For example, a material classified under the subject of accounting may be defined as related to the department of Accounting with the semantic strength ‘‘absolutely matching’’, to Information Management with ‘‘matching’’, and to Mechanical Engineering with ‘‘not matching’’. This implies that when an accounting material is used, the department of Accounting performs better than both Information Management and Mechanical Engineering, because it uses its acquired materials more appropriately. Although this task is highly subjective and time consuming, it is necessary for this study.


Data mining is a process of discovering implicit knowledge in large databases. It can uncover hidden relationships, patterns, and trends in historical databases. Supported by modern information technology, data mining has placed increasing emphasis on the value of past data in applications such as personal bankruptcy prediction (Donato et al., 1999), hotel data marts (Sung & Sang, 1998), customer service support (Hui & Jha, 2000), and knowledge generation in finance (Dhar, 1998). It can perform association, classification, regression, clustering, or summarization to reveal patterns in large databases that are interesting, meaningful, interpretable, and useful for decision support (Han & Fu, 1999; Hirota & Pedrycz, 1999; Fayyad, Piatetsky-Shapiro, & Smyth, 1996; Fayyad & Stolorz, 1997). More importantly, the discovered knowledge, in descriptive or predictive form, can be used to support domain-related decisions. For example, the descriptive knowledge ‘‘according to the circulation data collected for the last academic year, the department of Information Management made much more use of materials in its subject than other departments did in theirs’’ supports budget allocation decisions, as does the predictive knowledge ‘‘IF the department is Information Management THEN the utilization of materials in its subject is 92.54%.’’
To achieve the objective of this research, a model, ABAMDM (acquisition budget allocation model via data mining), is constructed on the basis of circulation database mining to support decisions on academic library acquisition budget allocation. This research also provides a library budget allocation model with a mechanism that a designer may use in developing a decision support system. The rest of this paper is organized as follows. Section 2 describes the ABAMDM, including the definition of membership, descriptive knowledge discovery, the entropy computation, and the budget allocation weights for departments; an illustrative example in the same section demonstrates the ABAMDM. Section 3 presents a practical application for the Library of KSUT (LKSUT), including an allocation table of 17 departments with their corresponding weights. Conclusions and future research issues are addressed in the final section.

2. The ABAMDM
2.1. The architecture of ABAMDM
The architecture of ABAMDM is illustrated in Fig. 1. It has two stages. The first stage preprocesses the circulation data, and the second derives circulation performance and descriptive knowledge, which are used to decide the weights of acquisition budget allocation for departments.
2.1.1. Preprocess of circulation data
In general, the original circulation database contains several attributes. The only ones required in this study are the identifiers of departmental members (member_ID) and of materials (material_ID). In Fig. 1, the data table Circulation contains these two attributes. However, this study needs department identifiers (dept_ID) and category identifiers (category_ID) to reflect performance by department. Therefore, two tables, DeptMember and Material, have to be generated, where DeptMember contains the attributes dept_ID and member_ID, and Material contains material_ID and category_ID. The circulation data table Circulation_I, containing the attributes dept_ID and category_ID, can then be generated. Notice that this generation process can be omitted if Circulation_I can be obtained directly from the library information system in use (the dotted rectangular part in Fig. 1).

Fig. 1. The architecture of ABAMDM.
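Generating Circulation_I from Circulation, DeptMember, and Material amounts to two lookups per loan record. The following is a minimal Python sketch. It assumes each member belongs to exactly one department and each material to exactly one category. The list-of-tuples representation, the sample IDs, and the function name build_circulation_i are hypothetical and do not come from the library system described here.

```python
def build_circulation_i(circulation, dept_member, material):
    """Derive Circulation_I (dept_ID, category_ID) rows from raw loan records.

    circulation: iterable of (member_ID, material_ID) loan records
    dept_member: iterable of (dept_ID, member_ID) rows (table DeptMember)
    material:    iterable of (material_ID, category_ID) rows (table Material)
    """
    # Assumption: one department per member, one category per material.
    dept_of = {member_id: dept_id for dept_id, member_id in dept_member}
    category_of = dict(material)
    rows = []
    for member_id, material_id in circulation:
        # Loans whose member or material cannot be resolved are dropped,
        # in the spirit of the missing-data cleaning described in the Introduction.
        if member_id in dept_of and material_id in category_of:
            rows.append((dept_of[member_id], category_of[material_id]))
    return rows


if __name__ == "__main__":
    loans = [("m01", "b17"), ("m02", "b17"), ("m99", "b17")]   # hypothetical IDs
    members = [("IM", "m01"), ("ACC", "m02")]
    materials = [("b17", "480")]
    print(build_circulation_i(loans, members, materials))
```

Each loan stays a separate row, so repeated loans of the same category are counted again. Those repeated rows become the record counts that the entropy and connection measures use later.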
Ultimately, the objective of preprocessing the circulation data is to derive the final circulation table (Circulation_II), which includes dept_ID, category_ID, and the corresponding semantic strength. The semantic strength represents the degree of relation between a department and a material category. It is defined by management and can be divided into several levels. A semantic strength takes a linguistic value and therefore cannot be used directly in calculations. However, it can be assigned a numeric value when measurement is needed. For example, ‘‘absolutely matching’’ is a defined semantic strength with a numeric value of 0.6, indicating that a department and a category are perfectly related, while ‘‘absolutely not matching’’ indicates no relation at all. Nevertheless, from a systems point of view, it is very tedious to define the semantic strength for all categories every academic year. For example, if a circulation database involves 35 departments and 500 categories, the number of definition units will be 35 × 500, that is, 17,500. Therefore, it is necessary to create a data table (Membership) that contains all departments, categories, and corresponding semantic strengths, and then to perform a Structured Query Language (SQL) operation (SQLFC) to generate Circulation_II (Connolly, Begg, & Strachan, 1996). The SQLFC is given as follows.

SQLFC
-- Column types below are illustrative.
CREATE TABLE Circulation_II (dept_ID VARCHAR(10), category_ID VARCHAR(10), Strength INTEGER);
INSERT INTO Circulation_II (dept_ID, category_ID, Strength)
SELECT Circulation_I.dept_ID, Circulation_I.category_ID, Membership.Strength
FROM Membership, Circulation_I
WHERE Circulation_I.dept_ID = Membership.dept_ID
  AND Circulation_I.category_ID = Membership.category_ID;
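SQLFC can be tried without a production database by using Python's built-in sqlite3 module with an in-memory database. This is a sketch only. The column types, the sample rows, and the function name run_sqlfc are assumptions for illustration. The integer strengths follow the 0 to 6 coding of Fig. 2.

```python
import sqlite3

SQLFC = """
INSERT INTO Circulation_II (dept_ID, category_ID, Strength)
SELECT Circulation_I.dept_ID, Circulation_I.category_ID, Membership.Strength
FROM Membership, Circulation_I
WHERE Circulation_I.dept_ID = Membership.dept_ID
  AND Circulation_I.category_ID = Membership.category_ID
"""


def run_sqlfc(circulation_i, membership):
    """Load Circulation_I and Membership into SQLite, run SQLFC, return Circulation_II."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Circulation_I (dept_ID TEXT, category_ID TEXT)")
        con.execute("CREATE TABLE Membership (dept_ID TEXT, category_ID TEXT, Strength INTEGER)")
        con.execute("CREATE TABLE Circulation_II (dept_ID TEXT, category_ID TEXT, Strength INTEGER)")
        con.executemany("INSERT INTO Circulation_I VALUES (?, ?)", circulation_i)
        con.executemany("INSERT INTO Membership VALUES (?, ?, ?)", membership)
        con.execute(SQLFC)
        # Inner join: loan records with no matching Membership row are dropped.
        return con.execute(
            "SELECT dept_ID, category_ID, Strength FROM Circulation_II "
            "ORDER BY dept_ID, category_ID"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    loans = [("IM", "480"), ("ACC", "480"), ("IM", "480"), ("ME", "480")]
    membership = [("IM", "480", 6), ("ACC", "480", 4)]
    print(run_sqlfc(loans, membership))
```

In this sketch the ("ME", "480") loan disappears because the join is an inner join. Adding a Membership row with strength 0 ("absolutely not matching") would keep such loans in Circulation_II.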
To reduce differences in evaluation among librarians, departmental faculty, and specialists, the Membership table can be created via group assessment; however, this study does not pursue this point. As an example, part of the Membership table for the department of Information Management is illustrated in Fig. 2. On the vertical axis, 6 represents ‘‘absolutely matching’’, 5 ‘‘extremely matching’’, 4 ‘‘matching’’, 3 ‘‘ordinarily matching’’, 2 ‘‘likely matching’’, 1 ‘‘slightly matching’’, and 0 ‘‘absolutely not matching’’. On the horizontal axis, the category codes are based on the New Classification Scheme for Chinese Libraries. For example, code ‘‘480’’ is ‘‘trade’’ and is defined as related to Information Management with the semantic strength ‘‘absolutely matching’’.
Fig. 2. Part of Membership for the department of Information Management.

2.1.2. Generation of decisional knowledge
At the stage of generating decisional knowledge, the aim is to obtain descriptive knowledge and utilization gain. The descriptive knowledge is stored in the data table DeptConcent and the utilization gain in DeptConnect. DeptConcent contains the attribute dept_ID and the corresponding degree of concentration, which represents the category distribution of the materials that have been used. DeptConnect contains the attributes dept_ID and connection, which represents the utilization suitability for the department. For example, the descriptive knowledge ‘‘the department of Information Management made much use of materials, and most of them are in its subject’’ adequately explains the utilization of the department of Information Management. Therefore, the values of concentration and connection are derived from the number of records, the distribution of the material categories used, and the links between categories and subjects.
How to obtain the degree of concentration for a department is a critical issue to be solved in this study. The ID3 algorithm introduced by Quinlan (1986) has been widely used to measure the information entropy of a data set with multiple classes (Quinlan, 1987; Sestito & Dillon, 1994). Based on information theory, it adopts a top-down induction method that returns the degree of ability (or purity) with which one variable can separate the data with respect to another. The higher this degree, the less evenly the data is distributed. For a department (denoted by D), the expected information (I), the expected entropy (E(D)), and the concentration (Concentration(D)) are expressed in formulas (1)–(3), respectively.

$$I(n_{C_1}, n_{C_2}, \ldots, n_{C_n}) = -\frac{n_{C_1}}{M}\log_2\frac{n_{C_1}}{M} - \cdots - \frac{n_{C_n}}{M}\log_2\frac{n_{C_n}}{M} \qquad (1)$$

where $n_{C_i}$ is the number of records that belong to class $C_i$, $i = 1, 2, \ldots, n$, and $M$ is the total number of records.

$$E(D) = \sum_{i=1}^{t}\left[\frac{n_{V_i}}{M}\, I(a_{V_iC_1}, a_{V_iC_2}, \ldots, a_{V_iC_m})\right] \qquad (2)$$

where $t$ is the number of different values that the department D can take on; $n_{V_i}$ is the total number of records for which D takes the value $V_i$, $i = 1, 2, \ldots, t$; $a_{V_iC_j}$ is the total number of records for which D takes the value $V_i$ and that belong to class $C_j$, $i = 1, 2, \ldots, t$, $j = 1, 2, \ldots, m$; and $M$ is the total number of records. Inside $I(\cdot)$, the counts $a_{V_iC_j}$ are taken as proportions of $n_{V_i}$, as in the entropy column of Table 1.

$$\mathrm{Concentration}(D) = I(n_{C_1}, n_{C_2}, \ldots, n_{C_n}) - E(D) \qquad (3)$$
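A minimal Python sketch of formulas (1)–(3) follows. It treats the data for D as a mapping from each value $V_i$ to its per-class counts. The function names are hypothetical. The convention $0 \cdot \log_2 0 = 0$ is an assumption that the text does not state, although Table 1 leaves zero-count cells blank, which is consistent with it.

```python
from math import log2


def expected_information(counts):
    """Formula (1): I(n_C1, ..., n_Cn) = -sum (n_Ci/M) log2(n_Ci/M).

    Empty classes are skipped, i.e. 0 * log2(0) is taken as 0.
    """
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def entropy_term(row, m_total):
    """One term of formula (2): (n_Vi / M) * I(a_ViC1, ..., a_ViCm)."""
    return sum(row) / m_total * expected_information(row)


def expected_entropy(table):
    """Formula (2): the sum of entropy_term over all values V_i.

    `table` maps each value V_i to its per-class counts [a_ViC1, ..., a_ViCm].
    """
    m_total = sum(sum(row) for row in table.values())
    return sum(entropy_term(row, m_total) for row in table.values())


def concentration(table):
    """Formula (3): I over the class totals minus E(D)."""
    class_totals = [sum(col) for col in zip(*table.values())]
    return expected_information(class_totals) - expected_entropy(table)
```

If every value's records fall into a single class, E(D) is 0 and the concentration equals the full expected information. If every value has the same class mix, the concentration is 0. Section 2.2 reports E(Dept01), E(Dept02), and so on separately and subtracts each one from I (for example, 1.8805 − 0.3905 = 1.4900). That corresponds to calling entropy_term for a single department, rather than expected_entropy over all of them.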

To obtain the utilization connection for a department, the semantic strength has to be defined in advance. Assume that there are $n$ levels defined for the semantic strength, denoted by SS($L_1, L_2, \ldots, L_n$), with corresponding importance SI($x_1, x_2, \ldots, x_n$). Here $L_i$ is the $i$th level and $x_i$ is the importance of $L_i$, a numeric value ranging from 0.00 to 1.00, with the $x_i$ summing to 1.00, $i = 1, 2, \ldots, n$. The averaged connection for a department D, Connection(D), is defined by formula (4), where $N_D$ is the total number of members of D.

$$\mathrm{Connection}(D) = \frac{\sum_{i=1}^{n} n_{L_i}\, x_{L_i}}{N_D} \qquad (4)$$

where $n_{L_i}$ is the number of records whose category has semantic strength $L_i$ for D, and $x_{L_i}$ is the importance of $L_i$.
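Formula (4) reduces to a weighted count of records divided by the department's membership. A minimal sketch follows. The dictionary representation and the function name are illustrative assumptions. With the Table 1 counts for Dept01 (18, 1, 0, 23, 30), IS(0.4, 0.3, 0.2, 0.1, 0.0), and $N_D$ = 35, it gives 9.8 / 35 = 0.28, which matches the value reported in Section 2.2.

```python
def connection(level_counts, importance, n_members):
    """Formula (4): Connection(D) = sum_i n_Li * x_Li / N_D.

    level_counts: records of department D per semantic-strength level L_i
    importance:   importance x_i of each level (must sum to 1.00)
    n_members:    N_D, the number of members of department D
    """
    if abs(sum(importance.values()) - 1.0) > 1e-9:
        raise ValueError("the importances x_i must sum to 1.00")
    # Levels absent from level_counts contribute zero records.
    weighted = sum(level_counts.get(level, 0) * x for level, x in importance.items())
    return weighted / n_members


if __name__ == "__main__":
    IS = {"A": 0.4, "H": 0.3, "M": 0.2, "L": 0.1, "N": 0.0}
    print(round(connection({"A": 18, "H": 1, "M": 0, "L": 23, "N": 30}, IS, 35), 4))
```

Because the numerator is divided by members rather than by records, a department is credited for relevant use relative to its size, not merely for a high share of well-matched loans.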
Although Concentration(D) from formula (3) and Connection(D) from formula (4) can serve as the decision base for acquisition budget allocation, there is no established rule for deriving budget allocation weights from them. Some decision makers may put more emphasis on connection than on concentration, while others may do the reverse. Budget operations inevitably involve subjective opinions, yet they are never conducted without ‘‘numbers’’; in other words, budget allocation has to rely on things that are countable, measurable, and quantifiable. Therefore, ABAMDM assumes that the importance of concentration is $\alpha$ and that of connection is $1 - \alpha$, both definable by management. Consequently, the final weight, Weight(D), for department D is determined by formula (5), where $i = 1, 2, \ldots, m$ and $m$ is the number of departments.

$$\mathrm{Weight}(D) = \frac{\alpha\,\mathrm{Concentration}(D) + (1-\alpha)\,\mathrm{Connection}(D)}{\sum_{i=1}^{m}\left[\alpha\,\mathrm{Concentration}(D_i) + (1-\alpha)\,\mathrm{Connection}(D_i)\right]} \qquad (5)$$
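Formula (5) normalizes a convex combination of the two measures so that the weights over all departments sum to one. A minimal sketch follows. The function name is hypothetical, and the range check on $\alpha$ is an added safeguard that the text does not state.

```python
def allocation_weights(concentration, connection, alpha):
    """Formula (5): normalize alpha*Concentration + (1 - alpha)*Connection.

    concentration, connection: dicts keyed by department identifier
    alpha: management-defined importance of concentration, in [0, 1]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    scores = {d: alpha * concentration[d] + (1 - alpha) * connection[d]
              for d in concentration}
    total = sum(scores.values())
    return {d: score / total for d, score in scores.items()}


if __name__ == "__main__":
    w = allocation_weights({"a": 1.0, "b": 3.0}, {"a": 1.0, "b": 3.0}, 0.5)
    print(w)  # weights proportional to the combined scores
```

Multiplying each weight by the amount to be shared gives the departmental allocation, as is done for the 1,467,350 NTD in Section 3.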

2.2. An illustrative example
Assume that the preprocessed circulation data table (Table 1) used to demonstrate the ABAMDM contains the following information: (1) four departments: Dept01, Dept02, Dept03, and Dept04; (2) Dept01 has 35 members, Dept02 31, Dept03 42, and Dept04 38; (3) 302 records in total, of which Dept01 has 72, Dept02 52, Dept03 79, and Dept04 99; (4) five levels of semantic strength, SS(A, H, M, L, N), where A: absolutely matching, H: highly matching, M: matching, L: likely matching, N: absolutely not matching, with corresponding importance IS(0.4, 0.3, 0.2, 0.1, 0.0); and (5) α = 0.3.

Table 1
A collected circulation database for departments
Dept     Category    Count   Entropy   Importance
Dept01   A           18      0.5000    0.4000
         H            1      0.0857    0.3000
         M            0      –         0.2000
         L           23      0.5259    0.1000
         N           30      0.5263    0.0000
         E(Dept01)           0.3905
Dept02   A           13      0.5000    0.4000
         H            0      –         0.3000
         M            2      0.1808    0.2000
         L            6      0.3595    0.1000
         N           31      0.4449    0.0000
         E(Dept02)           0.3541
Dept03   A           42      0.4846    0.4000
         H            2      0.1343    0.3000
         M            8      0.3346    0.2000
         L            2      0.1343    0.1000
         N           25      0.5253    0.0000
         E(Dept03)           0.4219
Dept04   A            9      0.3145    0.4000
         H           36      0.5307    0.3000
         M           20      0.4661    0.2000
         L           18      0.4472    0.1000
         N           16      0.4249    0.0000
         E(Dept04)           0.7158
Table 1 contains the following columns: Dept, Category, Count, Entropy, and Importance. Dept is the department identifier, and Category is the level of semantic strength. Count is the number of records observed at a level of matching. For each level, Entropy gives the term −(count/n) log2(count/n), where n is the department's number of records (left blank when the count is zero). In the E(Dept) row, Entropy gives the department's expected entropy computed via formula (2). Importance is the numeric value, ranging from 0.00 to 1.00, that represents how strongly a level of matching relates to a department. To achieve the objective, the following are obtained for the departments: the concentration of material categories via the ID3 algorithm, the connection via formula (4), and the weight via formula (5). The expected information via formula (1) for Table 1 is I(82, 39, 30, 49, 102), that is, 1.8805. The entropy for Dept01 is 0.3905, Dept02 0.3541, Dept03 0.4219, and Dept04 0.7158. Subsequently, by formula (3), the concentration for Dept01 is 1.4900, Dept02 1.5264, Dept03 1.4585, and Dept04 1.1647. Fig. 3 shows the concentration against the total records for the four departments. In particular, the category concentration observed for Dept04 is lower than that of any other department, even though Dept04 makes the largest use of materials. This implies that Dept04 makes use of materials in various subjects.

The connection via formula (4) for Dept01, Dept02, Dept03, and Dept04 is 0.2800, 0.2000, 0.4571, and 0.5316, respectively. Consequently, by formula (5), the weight for Dept01 is 0.2380, Dept02 0.2233, Dept03 0.2752, and Dept04 0.2635. In addition to these numeric data, which can be used to support acquisition budget allocation, some descriptive knowledge can be derived by comparison, as follows. Notice that the value given for the use of materials is the averaged record per member (ARPM).
• For Dept01: adequate use of materials (2.0571), non-diverse categories observed (1.4900), low utilization connection (0.2800).
• For Dept02: little use of materials (1.6774), non-diverse categories observed (1.5264), very low utilization connection (0.2000).
• For Dept03: ordinary use of materials (1.8810), diverse categories observed (1.4585), high utilization connection (0.4571).
• For Dept04: high use of materials (2.6053), very diverse categories observed (1.1647), very high utilization connection (0.5316).

Fig. 3. The concentration against size for four departments.
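The calculations in this section can be retraced with a short, self-contained Python sketch over the Table 1 counts. It computes formula (1) on the class totals, each department's own term of formula (2) (the per-department reading used in Table 1), Concentration by formula (3), Connection by formula (4), ARPM, and Weight by formula (5) with α = 0.3. The names are hypothetical.

The sketch reproduces the tabulated E terms for Dept01, Dept03, and Dept04, all four connection values, and all four ARPM values. It does not reproduce the other figures. Formula (1) applied literally to I(82, 39, 30, 49, 102) gives about 2.18 rather than 1.8805, and the Dept02 term comes out near 0.256 rather than 0.3541. As a result, the concentrations and weights it prints differ from the reported ones.

```python
from math import log2

# Table 1: record counts per level (A, H, M, L, N) for each department.
COUNTS = {
    "Dept01": [18, 1, 0, 23, 30],
    "Dept02": [13, 0, 2, 6, 31],
    "Dept03": [42, 2, 8, 2, 25],
    "Dept04": [9, 36, 20, 18, 16],
}
MEMBERS = {"Dept01": 35, "Dept02": 31, "Dept03": 42, "Dept04": 38}
IMPORTANCE = [0.4, 0.3, 0.2, 0.1, 0.0]  # IS(A, H, M, L, N)
ALPHA = 0.3


def info(counts):
    """Formula (1), skipping empty classes (0 * log2 0 taken as 0)."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def analyze(counts, members, importance, alpha):
    m_total = sum(sum(row) for row in counts.values())            # M = 302
    class_totals = [sum(col) for col in zip(*counts.values())]    # 82, 39, 30, 49, 102
    i_all = info(class_totals)
    out = {}
    for dept, row in counts.items():
        e_term = sum(row) / m_total * info(row)                   # this department's term of (2)
        conn = sum(n * x for n, x in zip(row, importance)) / members[dept]  # formula (4)
        out[dept] = {
            "E": e_term,
            "Concentration": i_all - e_term,                      # formula (3), per department
            "Connection": conn,
            "ARPM": sum(row) / members[dept],
        }
    scores = {d: alpha * v["Concentration"] + (1 - alpha) * v["Connection"]
              for d, v in out.items()}
    total = sum(scores.values())
    for d in out:
        out[d]["Weight"] = scores[d] / total                      # formula (5)
    return i_all, out


if __name__ == "__main__":
    i_all, results = analyze(COUNTS, MEMBERS, IMPORTANCE, ALPHA)
    print(f"I = {i_all:.4f}")
    for dept, values in results.items():
        print(dept, {k: round(x, 4) for k, x in values.items()})
```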

3. An application case
3.1. Application characteristics
Before ABAMDM was put into service, the LKSUT used the ARPM as the basis for deriving the weights of part of the acquisition budget allocation for departments. The ABAMDM was demonstrated to librarians at LKSUT to assess whether it could be adopted. A concise questionnaire was designed to elicit their evaluation of the model. The questions posed to the librarians covered the process of circulation data analysis, usability for acquisition budget allocation, validity of the outputs, and applicability. The feedback is summarized in Table 2. The librarians found the ABAMDM adequate for acquisition budget allocation. However, they strongly recommended computerizing ABAMDM, and this is left as a future research issue.
The LKSUT then employed the ABAMDM to support the acquisition budget allocation for the 2001 academic year in the context of material utilization. The total budget was 14,673,500 new Taiwan dollars (NTD). It is LKSUT's policy that 10% of the total budget (1,467,350 NTD) is shared by the 17 departments on the basis of circulation in the last academic year. It took LKSUT about three months to create the Membership data table, which contains the semantic strengths relating departments to categories. The semantic strength was defined at five levels: absolutely matching, highly matching, matching, slightly matching, and absolutely not matching. Librarians, departmental faculty, and specialists in the library domain all participated. They met frequently in groups on a department-by-department basis to exchange opinions and reduce differences. They evaluated the semantic strengths individually and reached a final conclusion in groups whenever a conflict occurred.

Table 2
Reviewers' results
Review scenarios                              Reviewer 1   Reviewer 2                  Reviewer 3   Reviewer 4
Process of circulation data analysis          Adequate     Need to be simpler          Good         Better if simpler
Usability for acquisition budget allocation   Adequate     Acceptable                  Helpful      Decision supportable
Validity of the outputs                       Acceptable   Agreeable                   Good         Acceptable, but need clearer description
Applicability                                 High         High only if computerized   Adequate     Better if computerized
LKSUT has been using a Windows-based information system (named T2), developed by Transtech Information Co. Ltd., to support circulation operations for four years. The circulation data (Circulation_I) for a period of time can easily be collected via T2. However, T2 has no function for creating Circulation_II, which includes the three attributes dept_ID, category_ID, and Strength. Therefore, SQLFC has to be performed outside T2 to generate Circulation_II. Table 3 lists the characteristics of this application case, including the time period for collecting circulation data, the definition of semantic strength, α, the number of members of the 17 departments, the total number of records observed, and the ARPM.
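The ARPM baseline that LKSUT used before ABAMDM can be recomputed from Table 3. The ARPM weights in Table 4 are consistent with each department's share being its N_R/N_D divided by the sum over all 17 departments, applied to the 1,467,350 NTD shared by the departments. The following Python sketch works under that reading. The dictionary layout and function name are assumptions, and the figures are transcribed from Table 3. Within rounding, it yields the ARPM columns of Table 4, for example 3.8777% and 56,899 NTD for Mechanical Engineering.

```python
# (Total members N_D, Total records N_R) per department, transcribed from Table 3.
TABLE3 = {
    "Mechanical Engineering": (1041, 10169),
    "Electronic Engineering": (806, 9552),
    "Environmental Engineering": (565, 10723),
    "Electrical Engineering": (907, 13498),
    "Fiber Engineering": (228, 6384),
    "Information Management": (667, 8633),
    "Accounting": (471, 8390),
    "Industrial Management": (402, 6244),
    "Real Estate Management": (452, 7940),
    "Early Childhood Care and Education": (220, 1878),
    "International Trade": (476, 10740),
    "Finance and Banking": (477, 4945),
    "Public Communication": (166, 701),
    "Applied English": (629, 10174),
    "Visual Communication Design": (579, 11041),
    "Motion Picture Design": (233, 2172),
    "Space Design": (301, 4312),
}
SHARED_BUDGET = 1_467_350  # 10% of 14,673,500 NTD


def arpm_allocation(table, budget):
    """Split `budget` in proportion to each department's ARPM = N_R / N_D.

    Returns {department: (weight_percent, budget_ntd)}.
    """
    arpm = {dept: n_r / n_d for dept, (n_d, n_r) in table.items()}
    total = sum(arpm.values())
    return {dept: (100 * a / total, round(budget * a / total)) for dept, a in arpm.items()}


if __name__ == "__main__":
    for dept, (pct, ntd) in arpm_allocation(TABLE3, SHARED_BUDGET).items():
        print(f"{dept:36s} {pct:8.4f}% {ntd:>9,d} NTD")
```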
3.2. The results and findings
Using ID3, formula (4), and formula (5), the results shown in Table 4 were obtained: the concentration, the connection, the weights for ABAMDM and ARPM, and the allocated acquisition budgets. For comparison, the results produced by ARPM are also included in Table 4. Fig. 4 illustrates the concentration against the number of records for the departments. The relationship between concentration and connection is shown in Fig. 5. The number of members, the number of records, and the allocated budget are shown in Fig. 6. Note that, for display purposes, the number of records was multiplied by 1/10 and the budget by 1/100 in Fig. 6.

Table 3
The characteristics for the ABAMDM applied in LKSUT
Time period for collecting circulation data: 4/1/2000 to 3/31/2001
Semantic strength (α = 0.2): SS(A, H, M, L, N), where A: absolutely matching, H: highly matching, M: matching, L: likely matching, N: absolutely not matching; IS(A, H, M, L, N) = IS(0.4, 0.3, 0.2, 0.1, 0.0)

Departments                          Total members (N_D)   Total records (N_R)   N_R/N_D
Mechanical Engineering               1041                  10,169                 9.7685
Electronic Engineering                806                   9552                 11.8511
Environmental Engineering             565                  10,723                18.9788
Electrical Engineering                907                  13,498                14.8820
Fiber Engineering                     228                   6384                 28.0000
Information Management                667                   8633                 12.9430
Accounting                            471                   8390                 17.8132
Industrial Management                 402                   6244                 15.5323
Real Estate Management                452                   7940                 17.5664
Early Childhood Care and Education    220                   1878                  8.5364
International Trade                   476                  10,740                22.5630
Finance and Banking                   477                   4945                 10.3669
Public Communication                  166                    701                  4.2229
Applied English                       629                  10,174                16.1749
Visual Communication Design           579                  11,041                19.0691
Motion Picture Design                 233                   2172                  9.3219
Space Design                          301                   4312                 14.3256

Table 4
The results via ABAMDM and ARPM
Code  Departments                          Concentration  Connection  Weight (%)          Budget (NTD)
                                                                      ABAMDM    ARPM      ABAMDM    ARPM
01    Mechanical Engineering               2.0733         3.7414       6.3069    3.8777    92,545    56,899
02    Electronic Engineering               1.9758         2.6979       4.7258    4.7044    69,344    69,030
03    Environmental Engineering            1.9923         6.3602      10.1544    7.5338   149,001   110,547
04    Electrical Engineering               1.8863         2.8191       4.8721    5.9075    71,491    86,684
05    Fiber Engineering                    2.0271         3.2114       5.5051   11.1148    80,779   163,093
06    Information Management               2.0691         4.7061       7.7337    5.1378   113,481    75,390
07    Accounting                           2.0020         2.5503       4.5170    7.0711    66,280   103,758
08    Industrial Management                2.0299         2.3211       4.1880    6.1657    61,453    90,472
09    Real Estate Management               1.9865         3.3142       5.6422    6.9731    82,791   102,320
10    Early Childhood Care and Education   2.1071         0.6514       1.7443    3.3886    25,596    49,722
11    International Trade                  1.9506         4.9603       8.0662    8.9566   118,359   131,424
12    Finance and Banking                  2.0591         3.3212       5.6795    4.1152    83,338    60,385
13    Public Communication                 2.1175         1.5175       3.0305    1.6763    44,469    24,597
14    Applied English                      1.9664         4.6698       7.6419    6.4208   112,134    94,215
15    Visual Communication Design          1.9654         5.7908       9.3014    7.5696   136,484   111,073
16    Motion Picture Design                2.0964         3.0785       5.3340    3.7000    78,269    54,298
17    Space Design                         2.0581         3.2385       5.5567    5.6867    81,537    83,443

Fig. 4. The concentration against the number of records for 17 departments.

Fig. 5. The concentrations and connections for 17 departments.

Fig. 6. Number of members, number of records, and final budget for 17 departments.
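Formula (5) with α = 0.2 can be checked against the case study. The sketch below applies it to the Concentration and Connection columns of Table 4 and splits the 1,467,350 NTD accordingly. Within rounding of the tabulated inputs, it reproduces the ABAMDM weight and budget columns, for example 6.3069% and about 92,545 NTD for Mechanical Engineering. The data layout and names are assumptions for illustration.

```python
# (Concentration, Connection) per department code, transcribed from Table 4.
TABLE4 = {
    "01": (2.0733, 3.7414), "02": (1.9758, 2.6979), "03": (1.9923, 6.3602),
    "04": (1.8863, 2.8191), "05": (2.0271, 3.2114), "06": (2.0691, 4.7061),
    "07": (2.0020, 2.5503), "08": (2.0299, 2.3211), "09": (1.9865, 3.3142),
    "10": (2.1071, 0.6514), "11": (1.9506, 4.9603), "12": (2.0591, 3.3212),
    "13": (2.1175, 1.5175), "14": (1.9664, 4.6698), "15": (1.9654, 5.7908),
    "16": (2.0964, 3.0785), "17": (2.0581, 3.2385),
}
ALPHA = 0.2                # importance of concentration (Table 3)
SHARED_BUDGET = 1_467_350  # NTD shared by the 17 departments


def abamdm_allocation(table, alpha, budget):
    """Apply formula (5), then split `budget` by the resulting weights.

    Returns {code: (weight_percent, budget_ntd)}.
    """
    scores = {code: alpha * conc + (1 - alpha) * conn
              for code, (conc, conn) in table.items()}
    total = sum(scores.values())
    return {code: (100 * s / total, round(budget * s / total))
            for code, s in scores.items()}


if __name__ == "__main__":
    for code, (pct, ntd) in abamdm_allocation(TABLE4, ALPHA, SHARED_BUDGET).items():
        print(f"{code}  {pct:8.4f}%  {ntd:>9,d} NTD")
```

With α = 0.2, connection carries four times the weight of concentration. Because the connection values vary far more across departments than the concentrations do, connection largely determines the ranking. This is consistent with finding 4 below on Environmental Engineering.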
From the tables and figures given above, the following findings were obtained.
1. Although the department of Fiber Engineering showed the highest ARPM (28.0000 in Table 3), it obtained neither the highest connection nor the highest budget (Table 4).
2. In Fig. 4, the department of Public Communication showed the highest concentration (2.1175 in Table 4), which implies that the categories it used were not evenly distributed. However, Fig. 5 indicates that it did not obtain a high connection value, which suggests that most of the materials it used were not in its subject. This is supported by the result in Fig. 6 that it obtained a low budget. A similar result was found for the department of Early Childhood Care and Education.
3. In Fig. 4, the department of Electrical Engineering obtained the lowest concentration (1.8863 in Table 4) but the largest number of records (13,498 in Table 3). Accordingly, it made use of materials in a variety of categories, which implies that some of them were in its subject and some were not. Therefore, its total connection was not high (Fig. 5), and neither was its allocated acquisition budget (Fig. 6).
4. In Fig. 5, the department of Environmental Engineering obtained the highest connection (6.3602 in Table 4). However, its concentration was low in comparison with the others (Fig. 4), and its number of members was not large. Nevertheless, it obtained the highest allocated budget (149,001 in Table 4). It seems that the value of α played a considerable role in this case.
5. In Fig. 6, the department of Mechanical Engineering showed the largest number of members (1041 in Table 3) but did not obtain the largest budget. This implies that department size is not the only factor that can be relied on to determine the final acquisition budget.
6. The acquisition budget allocated via ARPM depended entirely on the ARPM. Fig. 7 shows that the result produced by ABAMDM was quite different from that produced by ARPM.
A notable implication of these findings is that a high acquisition budget depends not only on the number of records and the number of members, but also on the suitability of use. It is difficult to evaluate ABAMDM and ARPM by the final allocated acquisition budgets alone. Nevertheless, it is believed that exposing material utilization with respect to relevance, and bringing it into the acquisition budget allocation process, is a useful way to make visible the value of the materials for which we budget.

Fig. 7. The allocated acquisition budget via ABAMDM and ARPM for 17 departments.

4. Concluding remarks
This paper has outlined the importance of circulation data processing. It has also introduced a budget allocation model based on the data mining technique, illustrated the use of ABAMDM, and demonstrated an application. The proposed ABAMDM uses three components:
- SQL, to preprocess the circulation data where necessary;
- information theory, to measure the concentration of categories observed in the circulation data table;
- utilization connection, to derive the weights that serve as the decisional basis for acquisition budget allocation.
It offers a new way of processing the circulation data already at hand to elicit information that interprets those data appropriately. The knowledge discovered by ABAMDM can be used to support decisions on library acquisition budget allocation. However, ABAMDM only supplies the budget allocation process with helpful information mined from the existing circulation data table; subjective inputs still strongly influence the final results. For example, the definition of semantic strength for categories and departments and the value of the parameter a must be determined carefully. Furthermore, the reviewers' comments listed in Table 2 suggest that the model would be more applicable if computerized; this is left as an extension of this study.
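The first two steps of the pipeline summarized above, SQL preprocessing followed by an information-theoretic concentration measure, can be sketched as follows. This is a minimal illustration under stated assumptions, not the ABAMDM implementation. It assumes a hypothetical table `circulation(dept, category)` with one row per loan. It also assumes that concentration is measured by the Shannon entropy, in bits, of each department's category distribution, so lower entropy indicates more concentrated use. The utilization-connection step that derives the final weights is not reproduced, and the helper names are hypothetical.

```python
import math
import sqlite3
from collections import defaultdict


def make_circulation_db(loans):
    """Build an in-memory circulation table from (dept, category) pairs.

    The schema circulation(dept, category) is a hypothetical stand-in for
    a real library circulation database.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE circulation (dept TEXT, category TEXT)")
    conn.executemany("INSERT INTO circulation VALUES (?, ?)", loans)
    return conn


def category_entropy(conn):
    """Return {dept: Shannon entropy in bits of its borrowed categories}.

    Lower values mean a department's loans are concentrated in fewer
    categories; 0.0 means every loan falls in a single category.
    """
    # SQL preprocessing: aggregate raw loan rows into per-category counts.
    rows = conn.execute(
        "SELECT dept, category, COUNT(*) FROM circulation "
        "GROUP BY dept, category"
    ).fetchall()
    counts = defaultdict(list)
    for dept, _category, n in rows:
        counts[dept].append(n)

    entropies = {}
    for dept, ns in counts.items():
        total = sum(ns)
        # H = sum over categories of p * log2(1/p), with p = n / total.
        entropies[dept] = sum((n / total) * math.log2(total / n) for n in ns)
    return entropies


if __name__ == "__main__":
    # Hypothetical loans: "IM" borrows from one category, "BA" from two equally.
    demo = [("IM", "000")] * 4 + [("BA", "300"), ("BA", "600")] * 2
    print(category_entropy(make_circulation_db(demo)))
```

Entropy is used here only as a familiar information-theoretic measure. The paper's own concentration formula may weight or normalize categories differently.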
The ability to access materials via the Internet is rapidly shifting library strategy from print toward electronic forms. On-line (electronic) materials are currently very expensive. Moreover, the large expenditures of the past decade have shown that a rather swift shift, or reallocation, of the collection budget from print to electronic publications makes the budget allocation decision more complex and difficult (Miller, 1999). This shift raises several questions:
- How should comprehensive preservation of electronic materials be budgeted while avoiding copyright and migration problems?
- How can the most advantageous on-line database licenses be negotiated for users, and how should these titles be cataloged?
- On what basis should we decide which electronic journals or e-books are good for our library?
We believe that the data collected through daily circulation work will be strongly affected by the way users make use of on-line materials. This will in turn make the budget allocation operation even more difficult. Many issues and arguments have already been brought onto the discussion and research platform. Still, we believe that “how to use the right money to buy the right things” remains a core question in budgeting. It will therefore be valuable to discover previously unknown information in historical data to support budget allocation decisions.

References
Anderson, D. R., Sweeney, D. J., & Williams, T. A. (1994). An introduction to management science: quantitative
approaches to decision making (pp. 593–622). New York: West Publishing Company.
Budd, J. M., & Adams, K. (1989). Allocation formulas in practice. Library Acquisitions: Practice & Theory, 13, 381–390.
Connolly, T. M., Begg, C. E., & Strachan, A. D. (1996). Database Systems: A Practical Approach to Design,
Implementation, and Management. New York: Addison-Wesley.
Crotts, J. (1999). Subject usage and funding of library monographs. College & Research Libraries, 60, 261–273.
Decroos, F., Dierckens, K., Poller, V., Rousseau, R., Tassignon, H., & Verweyen, K. (1997). Spectral method for
detecting periodicity in library circulation data: a case study. Information Processing & Management, 33(3), 393–403.
Dhar, V. (1998). Data mining in finance: using counterfactuals to generate knowledge from organizational information
systems. Information Systems, 23(7), 423–437.
Donato, J. M., Schryver, J. C., Hinkel, G. C., Schmoyer, R. L., Jr., Leuze, M. R., & Grandy, N. W. (1999). Mining
multi-dimensional data for decision support. Future Generation Computer Systems, 15(3), 433–441.
Evans, M. (1996). Library acquisitions formulae: the Monash experience. Australian Academic & Research Libraries,
27, 47–57.
Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI
Magazine, 17, 37–54.
Fayyad, U., & Stolorz, P. (1997). Data mining and KDD: promise and challenge. Future Generation Computer Systems,
13(2–3), 99–115.
Greaves, F. L., Jr. (1974). The allocation formula as a form of book fund management in selected state-supported
academic libraries. Unpublished doctoral dissertation, Florida State University.
Hamaker, C. (1995). Time series circulation data for collection development or: you can’t intuit that. Library
Acquisitions: Practice & Theory, 19(2), 191–195.
Han, J., & Fu, Y. (1999). Mining multiple-level association rules in large databases. IEEE Transactions on Knowledge
and Data Engineering, 11(5), 798–805.
Hirota, K., & Pedrycz, W. (1999). Fuzzy computing for data mining. Proceedings of the IEEE, 87(9), 1575–1600.
Hui, S. C., & Jha, G. (2000). Data mining for customer service support. Information and Management, 38(1), 1–13.
Lafouge, T., & Laine-Cruzel, S. (1997). A new explanation of the geometric law in the case of library circulation data.
Information Processing & Management, 33(4), 523–527.
Miller, R. G. (1999). Electronic journals and the scholarly communication process: present and future. In C.-C. Chen
(Ed.), IT and Global Digital Library Development (pp. 293–300). Massachusetts: MicroUse Information.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, J. R. (1987). Simplifying decision trees. International Journal of Man-Machine Studies, 27, 221–234.
Sestito, S., & Dillon, T. (1994). Automated knowledge acquisition. Englewood Cliffs, NJ: Prentice Hall.
Sorgenfrei, R. (1999). Slicing the pie: implementing and living with a journal allocation formula. Library Collections,
Acquisitions & Technical Services, 23(11), 39–45.
Sung, H. H., & Sang, C. P. (1998). Application of data mining tools to hotel data mart on the Intranet for database
marketing. Expert Systems with Applications, 15(1), 1–31.
Tuten, J. H., & Lones, B. (1995). Allocation Formulas in Academic Libraries. Chicago, IL: Association of College and
Research Libraries.
Wise, K., & Perushek, D. E. (1996). Linear goal programming for academic library acquisition allocation. Library
Acquisitions: Practice & Theory, 20(3), 311–327.
Wise, K., & Perushek, D. E. (2000). Goal programming as a solution technique for the acquisition allocation problem.
Library & Information Science Research, 22(2), 165–183.