
SMARtS: Software Metric Analyzer for Relational Database Systems

Bushra Jamil
Department of Computer Science
University of Sargodha
Sargodha, Pakistan
bushra_jameel@yahoo.com

Asma Batool
Department of Computer Science
University of Sargodha
Sargodha, Pakistan
asma.batool@yahoo.com

978-1-4244-8003-6/10/$26.00 ©2010 IEEE

Abstract—Software complexity has a strong impact on development effort and maintainability. Measurement of complexity can help managers in project planning and cost estimation. A number of metrics exist to measure the complexity of procedural and object-oriented applications. In this paper we present a model to compute the complexity of small-scale relational database applications. Our model is based on the different relational database objects. The complexity of each database object is computed by assigning suitable weights to its complexity determining factors. To evaluate our model, we have applied correlation analysis to the computed complexity and the actual effort. The results indicate a strong correlation between the effort and the complexity computed by our model.

Keywords—Complexity Determining Factors; Categorization Ranges; Complexity Weights; Database Application Complexity Estimation

I. INTRODUCTION

Measurement of certain attributes during the software development process is one of the most important activities in software project management. Measurements enable the project manager to determine the impact of improvement efforts on the software development process [2]. Software metrics are measurement-based techniques that help the project manager estimate necessary information such as budget, schedule, effort and other resources at an early stage [1]. Using such measurements, a large number of software metrics have been proposed to estimate the above stated parameters.

The most commonly used metrics are LOC, function points by Albrecht, cyclomatic complexity by McCabe, COCOMO by Boehm and information flow by Henry and Kafura. All of the above metrics are well suited to applications developed in procedural and object-oriented languages [2]. However, there are no such studies that compute a complexity metric for relational database systems based on detailed features of database objects; for a table, for example, constraints such as assertions and triggers increase its implementation effort. In this paper we propose an architecture for a Software Metric Analyzer for Relational Database Systems (SMARtS) to compute the complexity of relational database applications. The complexity is computed on the basis of weights assigned to the complexity determining factors of each database object. The proposed SMARtS system is developed using VC++ 6 and MS Access.

We have collected eight projects, of which six are small-scale student projects and the remaining two are business applications. To evaluate the authenticity of the proposed weights, we have calculated the complexity using SMARtS and analyzed the computed complexity against the real effort using correlation analysis. The obtained results suggest that there is a high correlation between computed complexity and actual effort.

This paper is organized as follows: Section 2 describes the literature review; Section 3 gives a detailed introduction to the major database objects; Section 4 presents the architecture of SMARtS; Section 5 presents the categorization of objects; Section 6 describes the results; and Section 7 concludes the paper and proposes future directions.

II. LITERATURE REVIEW

Software metrics have been promoted since the early 1970s. Initially SLOC was proposed to compute work effort, but it was imprecise in most cases. Function points by Albrecht were well suited to scientific and engineering applications [8]. Another attempt in this regard was made by McCabe, who used flow graphs to compute the complexity of code [11]. Flow graphs mainly focus on control statements to compute complexity.

Most of the proposed metrics are mainly suitable for procedural languages while ignoring relational database applications. The Bang model by DeMarco was the first model based on the data model, but its validation has not been established by any means [6]. In [12] Mario Piattini proposed a metric based upon tables, the attributes in a table and its degree. As this model ignored all other objects, its results were not accurate.

Another attempt in this regard was a metric to compute the complexity and effort of a small-scale relational database application, presented in [2]. This metric was based upon the complexity level of each of the database objects.

III. OBJECTS OF THE RELATIONAL DATABASE SYSTEMS

The complexity of conventional database systems developed in any relational DBMS depends upon five major database objects: tables, relationships, queries, forms and reports. These objects can be classified as simple, average or complex [1]. Characteristics of these objects vary on the
basis of the complexity determining factors (CDFs); for example, a simple table becomes complex if indexes, assertions, or triggers are applied to it. We assign weights to these CDFs, and the total weight of the CDFs is used to place any database object into one of the three classifications, i.e. simple, average or complex.

These weights have been assigned on the basis of a detailed study of a number of relational database applications, interviews conducted with experienced database developers, and correlation results of the CDFs of each object with the actual effort for developing the application. We have chosen static weights of 0.25, 0.50, and 0.75 for ease of calculation.

A brief description of each database object and its CDFs, along with the weight assigning strategy, is given below.

A. Tables and their CDFs

A table object is used to store information in the form of rows and columns. The main CDFs affecting the classification of a table include the number of fields in the table, the type of the fields, the number of fields in the primary key, the number of foreign keys, constraints, indexes, assertions and triggers.

The greater the number of fields in a table, the greater its complexity. Fields having complex data types increase the complexity of the table. A foreign key is "a correspondence between a set of columns in one table and the set of primary key columns in some other table" [3]. Cascade relationships between tables can be established using foreign keys. Foreign key constraints increase the table complexity and the time to create the table. The complexity of a table increases with the number of FKs in the table.

An assertion is used to enforce a semantic integrity constraint; it specifies a condition that must remain true in every database state [4]. Writing code for assertions is difficult and hence increases the complexity. A trigger is an action that is automatically executed as a result of a modification to the database. A condition must be specified under which the trigger is to be executed, along with the actions to be taken when the trigger executes [4]. Triggers are also useful for the enforcement of referential integrity, which must preserve the relationship between the referencing table and the referenced table whenever an update, add or delete operation is performed on rows of these tables. Due to this, triggers are also considered one of the major CDFs of a table. As assertions and triggers are difficult to write, they are given higher weights.

The weights assigned to each CDF of a table to calculate its classification are given in Table 1:

TABLE I. WEIGHTS OF CDFS OF TABLE

CDF of Table              | Weight Assigned
Simple data type field    | 0.25
Complex data type         | 0.50
Each field in primary key | 0.25
Each field in foreign key | 0.50
Assertion                 | 0.75
Trigger                   | 0.75

B. Relationships

In any relational database application, the use of relationships cannot be avoided. Relationships are used to extract information from multiple tables by joining them; primary and foreign keys are used to join tables. The cardinality and degree of a relationship affect its complexity.

There are three permissible cardinality ratios of a relationship: one-to-one, one-to-many and many-to-many. No RDBMS supports a many-to-many relationship directly; it must be broken down into two one-to-many relationships [4]. Referential Integrity (RI) can be enforced on the relationship of a table. The purpose of referential integrity is to keep the references synchronized so that the database always remains in a consistent state [4]. Cascade delete means that deleting a record in the primary table also deletes the matching records in the child table. Cascade update ensures that an update to the primary key results in updating all the matching records in the child table. The degree of a relationship is the number of entity types participating in the relationship. The implementation of additional items in relationships will definitely increase the complexity of the object as well as the effort.

The weights assigned to relationships on the basis of the above specified CDFs are shown in Table 2:

TABLE II. WEIGHTS OF CDFS OF RELATIONSHIP

                          | Unary | Binary
One-to-one                | 0.25  | 0.25
One-to-many               | 0.50  | 0.50
One-to-many (RI enforced) | 0.75  | 0.75
Many-to-many              | 0.75  | 0.75

A ternary or n-ary relationship is given weight 0.75, as these are more difficult to create.

C. Queries

Queries are a necessary part of any information system. A query is a request for obtaining data results, performing an action on data, or both. A query can be used for answering a simple question, performing calculations, combining data from multiple tables, or updating data in a table. Queries are categorized into simple select queries, nested select queries, action queries, and crosstab queries.

A simple select query is used to retrieve records from one table or query; a nested select query contains a number of select queries; an action query updates a table; and a crosstab query performs summations and calculations and presents data in a spreadsheet format.
One may take a long time to think through writing a nested select query, and consequently it needs more effort. Crosstab queries are more difficult than simple select queries. We assigned a weight to each query type depending upon its complexity. The query weights are shown in Table 3:

TABLE III. WEIGHTS OF CDFS OF QUERIES

Query Type          | Weight Assigned
Simple select query | 0.25
Nested select query | 0.75
Action query        | 0.50
Crosstab query      | 0.50

D. Forms

Forms provide the easiest and most accurate way of entering data into the database [2]. Forms can be simple data entry forms, custom dialogue boxes, subforms, switchboards and customized forms. Some forms contain special effects.

A simple data entry form is used for adding, viewing, editing, and deleting existing data in one table. Custom dialogue boxes are used for prompting the user for an action. Subforms are used for viewing, editing, and deleting existing data in more than one table. Switchboards are used as a menu for simplifying the process of starting the different reports and forms in a database and easing navigation around the database. Customized forms are forms that are modified to change their appearance and functionality.

Using correlation of the different CDFs of forms with actual effort, we have assigned a weight to each form type. The weights assigned to each form type are given in Table 4:

TABLE IV. WEIGHTS OF CDFS OF FORMS

Type                      | Weight Assigned
Simple data entry         | 0.25
Custom dialogue boxes     | 0.50
Subforms                  | 0.75
Switchboards              | 0.50
Customized forms          | 0.75
Form with special effects | 0.75

E. Reports

The last database object in our proposed system is reports. Reports are ready-to-print documents for viewing desired database information away from a computer. Reports can be simple reports, grouped reports, summary reports, reports using expressions, subreports, and custom reports.

A simple report contains information that is based on one table or query. A grouped report presents information by dividing records into groups; it contains various sections and also represents summations, totals, averages, percentages, or running sums of the grouped records to make the data more understandable. A report using expressions can be used for performing different mathematical calculations, for extraction of text, or for validation of data. A subreport is used to view information from more than one table or query on the same report. Custom reports are reports that are designed for a specific purpose and saved so that one can generate a report in that custom format at any time when desired [4].

We interviewed database application developers to get information about the complexity level of different reports. Based on this, the weights assigned to each report type are shown in Table 5:

TABLE V. WEIGHTS OF CDFS OF REPORTS

Type                     | Weight Assigned
Simple report            | 0.25
Grouped report           | 0.50
Report using expressions | 0.75
Subreport                | 0.75
Custom report            | 0.75

IV. PROPOSED ARCHITECTURE FOR SMARTS

The proposed Software Metric Analyzer for Relational Database Systems (SMARtS) is comprised of three layers: the Presentation Layer, the Processing Layer and the Repository Layer. The architecture of SMARtS is shown in Fig. 1.

[Figure 1. Architecture of SMARtS: a block diagram showing the Presentation Layer (Graphical User Interface); the Processing Layer (Complexity Determining Factors Entry Module; Table, Relationship, Query, Form and Report CDF Modules; Application Complexity Estimator; Object Classifier); and the Repository Layer (Metric, CDF and Weight Repository).]

The Presentation Layer is the topmost layer, the Graphical User Interface (GUI). It is used to take the various CDFs as input and to present the output in the form of complexity.
The second layer is the Processing Layer. It consists of three modules: Complexity Determining Factors (CDF) Entry, Application Complexity Estimator (ACE), and Object Category Classifier (OCC). The Estimator uses the CDF values for each object, as well as inputs entered by the user through the Presentation Layer, and passes these to the OCC. The OCC operates on the CDF values to classify each object, using the weighted sum and the ranges stored in its repository.

The third layer is the Data Repository Layer, used for data storage. We store three types of data in the data repository: the Weight repository contains the stored weights of the CDFs, the ranges of the different database objects and the complexity determining factors; the Metric repository is used to store and retrieve the calculations required for the complexity estimation of each object and the overall application; and the CDF repository stores the user-entered CDF values. The Estimator and Classifier use the Weight repository, and the Application Complexity Estimator uses the Metric repository to estimate complexity.

The user can enter the number of each CDF through the GUI using interactive dialogue boxes. A sample dialogue box for the table object is shown in Fig. 2.

[Figure 2. CDF Entry for Tables: a screenshot of the CDF entry dialogue box.]

These inputs are passed to the Estimator. Using the weighted sums of the corresponding factors and the ranges, the Estimator determines the classification of each object, which is shown to the user as in Fig. 3.

V. CLASSIFICATION OF OBJECTS

A database object can fall into the category of simple, average or complex, which in turn determines the complexity of each object.

A. Classification of Tables

The complexity of a table depends upon the type and number of the CDFs given in Table 1. Each CDF is given a static weight (Wi) according to its complexity. The classification of a table object is computed by using the total weight of the table (Ws) from the following formula:

Ws = Σ Nf_i × W_i    (1)

where Nf_i is the number of each CDF entered by the user and W_i is the corresponding weight of the CDF.

The table will be characterized as simple, average or complex based on the computed ranges shown in Table 6.

TABLE VI. CATEGORY RANGES FOR TABLE OBJECT

Category    | Simple    | Average   | Complex
Range of Ws | 1.25-2.75 | 2.76-5.75 | 5.76++

We propose the statistics in Table 7 for assigning the ranges to the three different categories.

TABLE VII. FACTORS FOR CLASSIFICATION

Type of factor                 | Static Weight | Simple | Average | Complex
No. of simple data type fields | .25           | 4      | 6       | 8+
No. of complex data types      | .50           | 0      | 1       | 2+
No. of fields in P.K.          | .25           | 1      | 2       | 3+
No. of F.K.                    | .50           | 0      | 1       | 2+
Assertions                     | .50           | 0      | 1       | 2+
Triggers                       | .50           | 0      | 1       | 2+

The lower and upper bounds for the simple category, for example, are computed as follows:

LB: 4 simple data type fields × .25 + 1 P.K. field × .25 = 1.25    (2)

UB: 8 simple data type fields × .25 + 1 complex data type × .50 + 1 P.K. field × .25 = 2.75    (3)
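As an illustration, Eq. (1), the ranges of Table 6 and the bounds of Eqs. (2) and (3) can be sketched in a few lines of Python (the weights follow Table 1; the function and dictionary names are ours, not part of the SMARtS tool, which is implemented in VC++):

```python
# Sketch of table classification: Ws per Eq. (1), category ranges per Table 6.
# CDF weights follow Table 1; all names are illustrative.
TABLE_CDF_WEIGHTS = {
    "simple_field": 0.25,   # simple data type field
    "complex_field": 0.50,  # complex data type field
    "pk_field": 0.25,       # each field in the primary key
    "fk_field": 0.50,       # each field in a foreign key
    "assertion": 0.75,
    "trigger": 0.75,
}

def table_ws(cdf_counts):
    """Ws = sum of Nf_i * W_i over the CDFs entered by the user (Eq. 1)."""
    return sum(n * TABLE_CDF_WEIGHTS[cdf] for cdf, n in cdf_counts.items())

def classify_table(ws):
    """Category ranges for the table object (Table 6).

    Values below 1.25 fall outside the stated ranges; they are
    treated as simple here.
    """
    if ws <= 2.75:
        return "simple"
    if ws <= 5.75:
        return "average"
    return "complex"

# The simple-category bounds of Eqs. (2) and (3):
lb = table_ws({"simple_field": 4, "pk_field": 1})                      # 1.25
ub = table_ws({"simple_field": 8, "complex_field": 1, "pk_field": 1})  # 2.75
```

For instance, a table with ten simple fields, one foreign-key field and one trigger gives Ws = 10 × 0.25 + 0.50 + 0.75 = 3.75 and is classified as average.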

[Figure 3. Classification of Objects: a screenshot of the classification output shown to the user.]

B. Classification of Relationships

Relationship classification is based on the cardinality and degree of the relationship. The user enters the cardinality and degree for each relationship. If the degree is other than ternary or n-ary, the weight (Wo) is obtained from the matrix shown in Table 2; otherwise Wo is 0.75. The relationship is classified using the ranges given in Table 8.
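As a small check, this rule can be sketched in Python (function and dictionary names are ours, not part of SMARtS; weights from Table 2, ranges from Table 8):

```python
# Sketch of relationship classification: weight Wo from Table 2,
# ternary/n-ary fixed at 0.75, category ranges from Table 8.
RELATIONSHIP_WEIGHTS = {          # Table 2 (unary and binary share the same values)
    "one-to-one": 0.25,
    "one-to-many": 0.50,
    "one-to-many-ri": 0.75,       # one-to-many with referential integrity enforced
    "many-to-many": 0.75,
}

def relationship_wo(cardinality, degree=2):
    """Wo is 0.75 for ternary or higher degree, else looked up in Table 2."""
    if degree >= 3:
        return 0.75
    return RELATIONSHIP_WEIGHTS[cardinality]

def classify_relationship(wo):
    """Category ranges for relationships (Table 8)."""
    if wo < 0.50:
        return "simple"
    if wo < 0.75:
        return "average"
    return "complex"
```

So a one-to-many relationship with RI enforced, or any ternary relationship, is classified as complex.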


TABLE VIII. CATEGORY RANGES FOR RELATIONSHIP

Category    | Simple      | Average     | Complex
Range of Wo | 0.25 - 0.49 | 0.50 - 0.74 | 0.75 ++

C. Classification of Queries, Forms and Reports

The user enters the number of each CDF (Nf) for the corresponding object (query, form, report). The types of CDF of queries, forms and reports and their static weights are given in Tables 3, 4 and 5 respectively. Nf is multiplied by the corresponding static weight (Wi) to calculate the computed weight for each factor (Cw). The weighted sum is obtained by summing all Cw using the formula given below:

Ws = Σ Cw_i    (4)

where i = 1 to 4 for queries, i = 1 to 6 for forms, and i = 1 to 5 for reports.

The classification of reports, forms and queries is determined by using the ranges given in Table 9:

TABLE IX. CATEGORY RANGES FOR REPORTS

Category    | Simple      | Average     | Complex
Range of Ws | 0.25 - 0.49 | 0.50 - 0.74 | 0.75 ++

D. Complexity Estimation

Once all the database objects are classified, the complexity estimation is done using the Database Points (DBP) metric. DBP is the number of database points computed through the DBP metric [6] in Table 10.

TABLE X. DBP METRIC

Category      | Simple   | Average   | Complex
Table         | __ × 7 + | __ × 10 + | __ × 15 =
Relationships | __ × 2 + | __ × 3 +  | __ × 5 =
Queries       | __ × 5 + | __ × 7 +  | __ × 10 =
Forms         | __ × 4 + | __ × 5 +  | __ × 7 =
Reports       | __ × 4 + | __ × 5 +  | __ × 15 =
DBP (Sum)     |          |           |

The number of each object in each category is obtained through the classifier, and the values are set in the metric accordingly. These numbers are multiplied by their corresponding weights to get the weight for each object (Cw). The DBP sum is obtained as

DBP(sum) = Σ Cw    (5)

These Database Points can be used to compute complexity: the greater the DBP sum, the greater the complexity.

VI. EVALUATION

We have taken eight projects developed in MS Access and the real effort required to develop them. We have calculated the complexity using SMARtS. Effort increases with the increase in complexity. The complexity and actual effort for our eight collected projects are shown in Table 11.

TABLE XI. COMPLEXITY VS. EFFORT

Project Name           | Complexity Using Database Points | Effort in person days
PUCIT System, Sargodha | 275                              | 25
Daewoo Bus System      | 159                              | 6
Voting System          | 48                               | 1
Hospital System        | 261                              | 10
Students Info System   | 130                              | 5
Library System         | 188                              | 10
Suffah School System   | 179                              | 6
Poultry Form           | 315                              | 14

We have applied correlation analysis between the computed complexity and the actual effort, which yields a correlation of 0.8. This suggests that a high correlation exists between computed complexity and real effort. We have chosen the weights on the basis of our significance tests, which also verify that our proposed model is well suited to small database applications.

The correlation chart between the computed complexity and the real effort of all projects is shown in Fig. 4.

[Figure 4. Effort vs. Database Points: a chart plotting Database Points and Effort for the eight database projects.]

Fig. 4 suggests that a project having greater Database Points needs more effort to develop.

VII. CONCLUSION AND FUTURE WORK

In this paper we have proposed a model to compute the complexity of small relational database applications developed in MS Access. The model was built using the different database objects. These objects were categorized on the basis of the weights assigned to their complexity determining factors. We have implemented our model in a tool called SMARtS.
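Two of the computations above can be checked with a short script: the DBP sum of Table 10 for a hypothetical project, and the Pearson correlation behind the reported 0.8 (Table 11 figures hard-coded; all names and the example object counts are ours, not part of SMARtS):

```python
# DBP metric of Table 10: points per object in each category.
DBP_WEIGHTS = {
    "table":        (7, 10, 15),
    "relationship": (2, 3, 5),
    "query":        (5, 7, 10),
    "form":         (4, 5, 7),
    "report":       (4, 5, 15),
}

def dbp(counts):
    """counts maps object kind -> (n_simple, n_average, n_complex); Eq. (5)."""
    return sum(n * w
               for kind, ns in counts.items()
               for n, w in zip(ns, DBP_WEIGHTS[kind]))

# Hypothetical project: 3 simple, 2 average, 1 complex table, and so on.
example = {"table": (3, 2, 1), "relationship": (2, 1, 0), "query": (4, 2, 0),
           "form": (3, 1, 0), "report": (2, 0, 1)}
# dbp(example) = 56 + 7 + 34 + 17 + 23 = 137 Database Points

def pearson_r(xs, ys):
    """Plain-Python Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

db_points = [275, 159, 48, 261, 130, 188, 179, 315]  # complexity (Table 11)
effort    = [25, 6, 1, 10, 5, 10, 6, 14]             # person-days (Table 11)
r = pearson_r(db_points, effort)                     # approx. 0.79, i.e. roughly 0.8
```

Recomputing Pearson's r over the Table 11 columns gives approximately 0.79, consistent with the 0.8 reported above.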
We used our model to compute the complexity of the eight database projects. To evaluate the model, we analyzed the calculated complexity and the actual effort. We found that our model successfully captured the complexity of the different database projects. Correlation analysis has shown that the complexities computed by SMARtS and the actual efforts are strongly correlated.

In future we want to enhance the functionality of SMARtS to incorporate ERDs so that complexity can be computed automatically from the design documents.

REFERENCES

[1] Sana Abiad, Ramzi A. Haraty, and Nashat Mansour, "Software Metrics for Small Database Applications", ACM SAC'00, Como, Italy, March 19-21, 2000, pp. 866-870.
[2] Justus S. and Lyakutti K., "Assessing the Object-level Behavioral Complexity in Object Relational Databases", IEEE International Conference on Software-Science, Technology and Engineering, 30-31 Oct. 2007, pp. 48-56.
[3] Microsoft Office Online, viewed on 19 November 2008, www.office.microsoft.com/en-us/access/HA102330311033.aspx?pid=CH100645701033
[4] Elmasri and Navathe, Fundamentals of Database Systems, 5th edition, Addison Wesley, 2006.
[5] Kautz K., "Making sense of measurement for small organizations", IEEE Software, March/April 1999, pp. 14-20.
[6] Stephen G. MacDonell, Martin J. Shepperd, and Philip J. Sallis, "Metrics for Database Systems: An Empirical Study", IEEE International Symposium on Software Metrics, 1997, pp. 99-107.
[7] Mario Piattini, Coral Calero and Marcela Genero, "Table Oriented Metrics for Relational Databases", Software Quality Journal, 9, pp. 79-97, 2001.
[8] Albrecht A.E. and Gaffney J.E., "Software function, lines of code, and development effort prediction: a software science validation", IEEE Transactions on Software Engineering, 9(6), pp. 639-647, 1983.
[9] McCabe T.J., "A Complexity Measure", IEEE Transactions on Software Engineering, 2(4), pp. 308-320, 1976.
[10] Matson J., Barrett B., and Mellichamp J., "Software Development Cost Estimation Using Function Points", IEEE Transactions on Software Engineering, 20(4), pp. 275-287, April 1994.
[11] McCabe T.J., "A Complexity Measure", IEEE Transactions on Software Engineering, 2(4), pp. 308-320, 1976.
[12] Mario Piattini, Coral Calero and Marcela Genero, "Table Oriented Metrics for Relational Databases", Software Quality Journal, 9, pp. 79-97, 2001.
