
Software Metrics as Benchmarks for Source Code Quality of Software Systems

Julien Rentrop August 31, 2006

One Year Master Course Software Engineering Thesis Supervisor: Dr. Jurgen Vinju Internship Supervisor: Drs. Patrick Duin Company or Institute: Software Improvement Group Availability: Public

Universiteit van Amsterdam, Hogeschool van Amsterdam, Vrije Universiteit

Contents

SUMMARY

PREFACE

  • 1 INTRODUCTION
  • 1.1 Background
  • 1.2 Definitions
  • 1.3 Research questions and outline
  • 1.4 Disclaimer

  • 2 BACKGROUND AND CONTEXT

  • 3 SELECTING SOFTWARE METRICS
  • 3.1 Selection criteria
  • 3.2 Evaluation of software metrics
  • 3.3 Conclusion

  • 4 BENCHMARKS
  • 4.1 Representing software metrics in benchmarks
  • 4.2 Validation approach
  • 4.3 Conclusion

  • 5 BENCHMARK DATA COLLECTOR
  • 5.1 Functionality
  • 5.2 Design
  • 5.3 Summary

  • 6 CASE STUDY
  • 6.1 Participants
  • 6.2 Judgments by assessors
  • 6.3 Benchmarks

  • 7 CONCLUSION

APPENDIX A: MEASUREMENT GUIDELINES

BIBLIOGRAPHY

Summary

Preface

This project has been conducted at the Software Improvement Group, to whom I am grateful for providing me an interesting environment to work in. I would like to thank my coaches drs. Patrick Duin and dr. Jurgen Vinju for their suggestions and motivation during this internship. Furthermore I would like to thank dr. Harro Stokman and dr. Tobias Kuipers for their support and valuable ideas. For reviewing earlier versions of this thesis I would like to thank Tim Prijn. Finally I would like to thank my parents and my brother Michel for their support.

1 Introduction

  • 1.1 Background

The Software Improvement Group (SIG) is a company that is specialized in the analysis of large software systems. Based on these analyses a range of services and products are offered to their clients. An example is the automatic generation of documentation for legacy systems. The generated documentation helps software developers of the client in understanding the source code of the legacy systems. Another example is performing software risk assessments. A software risk assessment assesses the technical quality of a software system. It is based on measurements of the source code and on interviews of the stakeholders involved in the project. The results of these measurements are interpreted by the experts of the SIG, which leads to the identification of problems and possible solutions to overcome the identified problems. These are reported to their clients.

There are multiple questions that a client can have when they ask the SIG to perform an assessment. The question might be as broad as wanting to know whether the software system is maintainable in the future or as specific as wanting to know whether it is easy to implement a change in the length of bank numbers in their system. A question often asked by clients is: is the quality of my software system comparable to other software systems? Knowing the answer to this question helps the client to decide whether they should improve their current software development practices or sustain their current practices.

  • 1.2 Definitions

The term software metrics is used as a collective term to describe a wide range of activities concerning measurements in software engineering [Fenton 99]. In this thesis the use of it is restricted to the following classic definition:

Definition: A software metric measures properties of source code.

With this definition we exclude metrics that can be gathered in a software project that are not based on source code. The goal of this study is to use metrics to compare the quality of a software system against the quality of a set of other software systems. We use benchmarking as a method for comparison. A benchmark is defined as follows:

Definition: A benchmark is a comparison of an organization’s or product’s performance against its peers.

In this study the entities that are benchmarked are software systems, or to be more precise, the source code of software systems. The term performance is a generic term that is used as an indicator of goodness [Sim 03]. The advantage of benchmarking is that it helps organizations to overcome blindness towards other approaches that are more fit than the ones currently employed.

  • 1.3 Research questions and outline

The main research question is defined as follows:

Is benchmarking based on software metrics a good method to determine the maintainability of the source code of software systems?

Software risk assessments, such as performed by the SIG, evaluate software systems that are in or will enter the maintenance phase. The major goal of these assessments is to determine whether the system is understandable and therefore able to keep its business value in a changing environment. This study’s proposed benchmarks for source code quality can be used as a tool that helps assessors in determining the maintainability of a system.

This main question is divided in several sub questions:

Which software metrics can indicate the quality of a software system’s source code? To answer this question we will investigate software metrics that are introduced in literature. Selection criteria are defined to evaluate the metrics. The results are described in chapter 3.

How to represent software metrics in a benchmark? There are different ways to represent benchmarks. In chapter 4 we define a set of representations and discuss the advantages and disadvantages of each.

How to design a tool that gathers measurement data to create the benchmarks? The SIG has developed a toolset to automatically measure the source code for different programming languages. A contribution of this study is the development of the Benchmark Data Collector tool that extends SIG’s toolset to enable benchmarking.

How to validate the results of the proposed benchmarks? In chapter 4 we investigate methods for the validation in literature and propose a new method that can be used to validate the proposed benchmarks in this study. In chapter 6 we apply and validate the proposed benchmarks to 11 industrially used software systems written in the Java programming language.

In chapter 7 the answer to the main research question is given.

  • 1.4 Disclaimer

This document contains software measurement values of real life software systems. As part of a confidentiality agreement the names of these software systems and the companies who created them are altered to make them anonymous.

2 Background and Context

Measurement is needed because it helps in understanding a particular entity or attribute. Even when it’s not clear how to measure a certain attribute, the act of proposing and discussing a measure helps in creating a better understanding [Fenton 97]. The promise of software metrics is to help managers and engineers in decision making based on facts.

The topic of software metrics can be divided in two parts:

Product metrics: Measures that quantify attributes of a software system (the product). Examples of attributes that can be measured are the size, complexity and the amount of reuse.

Process metrics: Measures of the process of creating a software product. Examples of attributes that can be measured are the amount of time spent, defects found and the stability of the requirements.

Current literature on benchmarking is mainly focused on process metrics. One of the main contributors to benchmarking software processes is Capers Jones. His book includes benchmark studies on the use of best practices, productivity rates and defect rates within different organization types [Jones 00].

This study is focused on benchmarking the maintainability of the source code of software systems. The metrics used are derived from the source code. Many software metrics have been proposed that can measure properties of the source code. A comprehensive overview of these metrics is provided by the Software Engineering Institute [SEI].

Most of the literature about software metrics on the source code concerns measurements within one system. In contrast, in this study metrics are used to compare at the system level. No literature was found that used the benchmark terminology to compare software systems on source code metrics. However, we did find some studies that are closely related:

NASA has built a repository of various metrics for both procedural and object oriented programming languages [NASA-1]. Another repository of metrics created by NASA also includes metrics about errors and requirements [NASA-2]. Researchers of NASA disseminated many of their results based on their studies on software metrics. However, they did not provide insight into how the metrics in the repository for different software systems can be compared against each other.

In another study, open source projects are compared against each other and one originally closed source system is compared against its open source successor [Samoladas 04]. The Maintainability Index [Coleman 94] metric was measured over time (for each successive version) to detect whether the system’s source code quality is improving or deteriorating.

Literature that proposes a new method or tool often also contains a section in which the proposed tool is applied to a set of systems. An example is the proposal of a tool for detecting duplicated code independent of the programming language [Ducasse 99]. In this work the tool is applied to a set of software systems and the results are informally compared against each other.

We expect there are several reasons why no benchmark using source code metrics has been proposed in literature yet. First, it can be hard to measure the quality of source code. Second, it can be hard to obtain the source code of software systems to analyze.

3 Selecting software metrics

In this chapter the answer to the question “Which software metrics can indicate the quality of a software system’s source code?” will be given.

To answer this question we will investigate software metrics that are introduced in literature. In the first section we define selection criteria to enable a selection. In the second section we describe the individual software metrics and determine whether they match the selection criteria. This chapter concludes with answering the question mentioned above.

 

3.1 Selection criteria

The following selection criteria have been defined in cooperation with the SIG:

The software metric must be easily explainable to clients. The benchmarks that are based on the selected software metrics will be presented to clients of the SIG. The clients do not have in-depth knowledge of software metrics. It’s therefore necessary that the rationale behind the metrics can be easily explained.

The software metric must be applicable to many programming languages. Our intent is to enable the creation of benchmarks for different programming languages. We first want to explore general metrics that are applicable to all programming languages instead of metrics that are limited to only a small set of programming languages. We do however explicitly not demand that a metric in programming language X should be comparable to that same metric in programming language Y.

The software metric must be automatically calculable from the source code. It must not be necessary to compile and execute the code, because this might not be practically possible: when not all external dependencies are available, it’s not possible to create an environment to run and measure the code in.

The software metric must have a strong basis in literature. A strong basis in literature will ensure that the applicability of the software metric is known.

3.2 Evaluation of software metrics

3.2.1 Lines of Code

Since the start of software engineering, engineers have been counting the lines of code they wrote. Counting lines is used for estimating the amount of maintenance required and it can be used to normalize other software metrics [Rosenberg 97].

In the early days when assembly programming languages were used the notion of a line of code was simple. However when third generation programming languages were introduced the notion of a line of code became harder to define. Third generation programming languages have programming constructs for structured control flow, for example the begin (‘{’) and end (‘}’) tokens of blocks in the C programming language. To standardize the counting the Software Engineering Institute has published a set of recommendations [Park 92]. Broadly two different counting methods exist: Physical Lines of Code and Logical Lines of Code.

3.2.1.1 Evaluation

TO DO

3.2.2 McCabe’s Cyclomatic Complexity

In 1976 McCabe defined the cyclomatic complexity number metric. The metric measures the number of independent paths through a software module [McCabe 76].
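As an illustration of the definition, the cyclomatic number of a control flow graph is commonly computed as V(G) = E - N + 2P, where E is the number of edges, N the number of nodes and P the number of connected components. The sketch below is a minimal example under that formula; the function name and the example graph are hypothetical and not taken from the SIG toolset.

```python
def cyclomatic_complexity(num_edges, num_nodes, num_components=1):
    """Cyclomatic number V(G) = E - N + 2P of a control flow graph."""
    return num_edges - num_nodes + 2 * num_components


# Hypothetical control flow graph of a method containing one if/else.
if_else_edges = [
    ("entry", "condition"),
    ("condition", "then"),
    ("condition", "else"),
    ("then", "exit"),
    ("else", "exit"),
]
if_else_nodes = {node for edge in if_else_edges for node in edge}

# Five edges and five nodes: one decision point gives V(G) = 2.
if_else_complexity = cyclomatic_complexity(len(if_else_edges), len(if_else_nodes))
```

A method without any branching has a single path and therefore a complexity of 1; every additional decision point adds one independent path.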

[McCabe 76] proposes an upper limit of 10 for cyclomatic complexity because higher values would indicate less manageable and testable modules. This upper limit is not based on empirical research, however multiple real world projects confirm that modules with higher cyclomatic complexity values often have more errors and are less understandable [McCabe 89].

Although cyclomatic complexity is widely used, critique on it exists. [Shepperd 88] claimed that it’s based on poor theoretical foundations and an inadequate model of software development. He also claimed that cyclomatic complexity is a proxy for, and often outperformed by, lines of code.

A reason for the wide use of the cyclomatic complexity metric is that it can be easily computed by static analysis and it’s already widely used in the industry for quality control purposes [Fenton 99].

3.2.2.1 Evaluation

The cyclomatic complexity metric is one of the oldest metrics and is still in use in research and practice today. There are however, as described earlier, mixed opinions about the metric. Explaining this metric to clients entails explaining the concept of control flow. With a few examples it’s possible to explain this metric to clients. The software metric can be calculated for both procedural and object oriented programming languages. Tools are available to determine the cyclomatic complexity. The cyclomatic complexity has been selected to be a part of the benchmarks.

3.2.3 Object Oriented metrics

The OO approach models the world in terms of objects. This extends procedural languages that are based on data fields and procedures. Traditional metrics such as cyclomatic complexity cannot measure OO concepts such as classes, inheritance and message passing [Chidamber 92].

New metrics have been developed to measure OO systems. One commonly used set of OO metrics is Chidamber and Kemerer’s suite of class level metrics:

Weighted Methods Per Class (WMC)
WMC is the sum of the static complexity of the methods. When all static complexities are considered to be unity then WMC can be defined as the number of methods. The larger the number of methods the greater the impact on subclasses. Classes with many methods can be more application specific and therefore harder to reuse.

Depth of Inheritance Tree (DIT)
When a class is deeply nested it inherits more from its ancestors. This can increase the complexity of the class.

Number of Children (NOC)
Classes that have many children are hard to change because of the tight couplings with their children.

Coupling Between Objects (CBO)
A high number of couplings with other classes is disadvantageous because when the interface of a class it is coupled to changes, the class needs to be modified as well.

Response For a Class (RFC)
RFC is a measure of the interaction of a class with other classes.

Lack of Cohesion in Methods (LCOM)
This metric calculates the usage of a class’s attributes in its methods. A class lacks cohesiveness when methods do not make use of its attributes.

An empirical investigation in an academic environment reported five out of six of these metrics to be useful predictors of class fault-proneness [Basili 96].

An extensive report of the Chidamber and Kemerer metrics suite and other OO metrics can be found in [Archer 95].

3.2.3.1 Evaluation

The object oriented metrics proposed by Chidamber and Kemerer are often referred to in literature. The metrics can be useful quality indicators, but there are some limitations. For example Coupling Between Objects counts couplings to classes in the same package, different packages and packages of external libraries all the same. Couplings with a lot of external packages are much worse than couplings to internal packages.

The metrics are designed for the OO paradigm. The goal of this project is to use metrics to create benchmarks for a wide number of different programming languages. The OO metrics are therefore not selected.

3.2.4 Duplicated code

When code is duplicated it can become harder to make changes because one change must also be made in all copies. This takes more time and is also error prone as it’s easy to forget to make the change in multiple places.

One way to measure duplicated code is by performing line based text matching [Baker 95]. Another way to measure duplicated code is by matching layout, expression and control flow metrics [Mayrand 96].
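To make the idea of line based matching concrete, the sketch below reports the percentage of non-blank lines that belong to a block of consecutive lines occurring more than once. It is a deliberately simplified illustration and not the algorithm of [Baker 95]; the function name, the whitespace normalization and the minimum block length of 3 lines are assumptions.

```python
from collections import defaultdict


def duplicated_line_percentage(lines, min_block=3):
    """Percentage of non-blank lines that are part of a repeated block.

    A block is `min_block` consecutive non-blank lines, compared after
    stripping leading and trailing whitespace (an assumed normalization).
    """
    code = [line.strip() for line in lines if line.strip()]
    if not code:
        return 0.0

    # Map every window of min_block lines to the positions where it starts.
    windows = defaultdict(list)
    for start in range(len(code) - min_block + 1):
        windows[tuple(code[start:start + min_block])].append(start)

    # Mark all lines covered by a window that occurs more than once.
    duplicated = set()
    for starts in windows.values():
        if len(starts) > 1:
            for start in starts:
                duplicated.update(range(start, start + min_block))

    return 100.0 * len(duplicated) / len(code)


# Hypothetical source fragment in which three lines are repeated.
example = [
    "a = 1", "b = 2", "c = a + b", "print(c)",
    "a = 1", "b = 2", "c = a + b", "x = 0",
]
example_percentage = duplicated_line_percentage(example)
```

Because only exact textual repeats are counted, renamed variables or reordered statements go unnoticed, which is the limitation that metric based approaches such as [Mayrand 96] try to address.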

A software system can have unique source code in multiple places that provides the same functionality. This is called conceptually duplicated code. The programmers of systems with conceptually duplicated code have programmed the same functionality multiple times and did not reuse or copy existing code. Conceptually duplicated code cannot be detected automatically and is therefore detected by inspections performed by humans.

3.2.4.1 Evaluation

Duplicated code is a topic that has received a lot of attention in scientific literature. The concept of duplicated code is simple: a lot of duplicated code unnecessarily increases maintenance costs because the system is larger. This concept is easily explainable to clients. Duplicated code can be determined for all programming languages. There are even tools available (such as [Kettelerij 05]) that can detect duplicated code independent of the programming language.

3.2.5 Dead code

Dead code is code that is never executed. Having dead code increases the amount of code that needs to be maintained. The programmers that maintain the source code might not know whether the code is dead or still used. Dead code makes it harder to maintain the system’s source code.

A reason for dead code to remain in the source code is that programmers are afraid that the system might break without it. A comprehensive unit test suite can give more assurance that the system doesn’t break without the dead code. Another reason for keeping dead code is that programmers want to be able to restore old code. Keeping old code should not be necessary because version control systems can easily restore old code.

3.2.5.1 Evaluation

The percentage of dead code in a software system is a useful indicator of the quality of the source code. A problem however is that developing a tool that can automatically determine which parts of the source code are dead is far from easy. One of the problems is that dynamic constructs in a programming language are hard to resolve.

Another problem is that all entry points must be available. In practice the SIG does not always have access to the source code of external systems. These external systems can be entry points. If not all entry points are available, source code can be detected as dead when it isn’t.

3.2.6 Database metrics

A lot of research has been focused on the measurement of source code of programs. However many information systems make extensive use of databases and therefore measuring the quality of the database structure is important as well. To measure the maintainability of a database three simple metrics have been proposed [Calero 01]:

Number of tables

Number of columns

Number of foreign keys

3.2.6.1 Evaluation

Software systems often make use of databases. A benchmark that can indicate the quality of the database structure would therefore be desirable. Research on the definition of software metrics for databases is however still in its infancy. The metrics proposed by [Calero 01] can be useful to compare and select from different database models designed for one problem domain but are not useful as indicators for quality to compare database schemas designed for different problem domains. The metrics can be used as an absolute size indicator such as total lines of code for source code.

3.3 Conclusion

In this chapter we defined selection criteria and applied them to a number of different metrics. Based on the selection we found the lines of code, cyclomatic complexity and code duplication metrics to be useful quality indicators of a software system’s source code.

4 Benchmarks

This chapter consists of two parts. The first section will answer the question “How to represent software metrics in a benchmark?”. To validate the proposed representations, the second section will answer the question “How to validate the results of the proposed benchmarks?”.

4.1 Representing software metrics in benchmarks

In this paragraph we propose three different ways to represent a benchmark using the metrics selected in the previous chapter.

4.1.1 Ordered tables

Some of the metrics discussed in the previous chapter measure properties at the module level. The module level means classes or methods in Object Oriented languages and units and procedures in procedural programming languages. A concrete example is McCabe’s cyclomatic complexity that is calculated per method in Java.

In the benchmarks proposed here, systems are compared and not modules within one system. To use these module level metrics we need a way to lift the values of the module level to the system level. This lifting is called aggregating. There are a number of different aggregation functions such as average, median, sum, max and min. The sum is not useful because the size of an application would affect the measurement results. The max and min functions would make the result based on only one module. In this research we use the average aggregation function to lift metric values to the system level.

As defined in the introduction a benchmark is a comparison of an organization’s or product’s performance against its peers. In the benchmark presented here we use individual metrics as performance (quality) indicators and the comparison is presented by sorting the metric values. A system with a high position in the table is the best in the benchmark and vice versa. A dummy example is presented below with four different systems named A, B, C and D:

System name    Metric value
D              2
A              8
C              9
B              14

In this (dummy) benchmark system D scores best, A and C are in the middle and B is worst.
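The aggregation and ordering steps can be sketched as follows. This is a minimal illustration: the function name is hypothetical and the per-method values are made up so that their averages reproduce the dummy table (2, 8, 9 and 14).

```python
from statistics import mean


def ordered_benchmark(module_values_per_system, descending=False):
    """Lift module level values to the system level and sort the systems.

    module_values_per_system maps a system name to its module level
    metric values; the average is used as aggregation function.
    """
    averages = {
        name: mean(values)
        for name, values in module_values_per_system.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=descending)


# Made-up method level measurements for the four dummy systems.
measurements = {
    "A": [8, 8],
    "B": [14],
    "C": [4, 14],
    "D": [1, 3],
}

ranking = ordered_benchmark(measurements)
for name, value in ranking:
    print(name, value)
```

With ascending order the system with the lowest average is listed first, which matches the order D, A, C, B of the dummy table.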

Now we use the software metrics selected in the previous chapter, define the level of measurement, choose an aggregation function and determine how the metric results should be ordered:

Benchmark  Metric                                 Level   Aggregation function                     Order
B1         Lines of code                          Method  Average                                  Ascending
B2         Lines of code                          Class   Average                                  Ascending
B3         Cyclomatic complexity                  Method  Average                                  Ascending
B4         Cyclomatic complexity                  Method  Percentage of methods below threshold *  Ascending
B5         Cyclomatic complexity, Lines of code   Method  Percentage of code below threshold *     Ascending
B6         Maintainability Index **               System  None                                     Descending
B7         Code duplication                       System  Percentage of duplicated code            Ascending

* In [McCabe 76] the threshold value 10 is proposed to indicate modules that are of low quality.
** Maintainability Index is a compound metric that is based on the averages of lines of code, cyclomatic complexity, Halstead volume and comment percentage [Coleman 94].
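The two threshold based aggregation functions (B4 and B5) can be sketched as below. The function names and the example data are hypothetical. The sketch assumes that "below threshold" means strictly smaller than 10; whether a method with a complexity of exactly 10 should be counted is not specified here.

```python
def percentage_methods_below(methods, threshold=10):
    """B4: share of methods whose cyclomatic complexity is below the threshold.

    methods is a list of (cyclomatic_complexity, lines_of_code) pairs.
    """
    if not methods:
        return 0.0
    below = sum(1 for complexity, _ in methods if complexity < threshold)
    return 100.0 * below / len(methods)


def percentage_code_below(methods, threshold=10):
    """B5: share of the lines of code located in methods below the threshold."""
    total_lines = sum(lines for _, lines in methods)
    if total_lines == 0:
        return 0.0
    lines_below = sum(lines for complexity, lines in methods if complexity < threshold)
    return 100.0 * lines_below / total_lines


# Made-up system: two small simple methods and two large complex ones.
example_methods = [(1, 5), (3, 10), (12, 60), (25, 25)]
b4_value = percentage_methods_below(example_methods)
b5_value = percentage_code_below(example_methods)
```

In this made-up system half of the methods are below the threshold, yet they hold only 15% of the code, which shows how weighting by lines of code can change the picture.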

We expect that some of these benchmarks will result in highly similar results:

Lines of code at the method level (benchmark 1) and lines of code at the class level (benchmark 2) are expected to be identical because large methods would make a class large as well.

Cyclomatic complexity (benchmark 4) and cyclomatic complexity with lines of code (benchmark 5) are expected to be similar.

These expectations will be investigated in chapter 6.

4.1.2 Combined benchmark

In the previous paragraph 7 different benchmarks are proposed. Multiple benchmarks are useful because they can give an indication of different aspects of quality. A system might for example have highly complex methods but a low percentage of duplicated code. However it would be interesting if the results of the different benchmarks can be combined in one benchmark to get an overall view.

The unit of measurement of each benchmark is different. Simply calculating the mean of all measurements is therefore not valid. To overcome this problem we can calculate the difference between the mean and each system and divide this by the standard deviation. Now we have one unit of measurement for all benchmarks that tells how many standard deviations a measurement is away from the mean. An average can be calculated for any combination of benchmarks. Which benchmarks should be combined will be investigated in chapter 6.

4.1.3 Histograms

In this paragraph we propose to use histograms as a representation that can compare the distribution of lines of code per class. To create the histogram, the lines of code per class measurements of a system are partitioned in bin ranges. We use 11 bins each with a width of 50. We use relative frequencies instead of actual frequencies because the sizes of the systems measured in this study vary. In a relative histogram the Y-axis runs from 0% to 100%.
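A relative histogram with these parameters can be sketched as below. The function name and example values are hypothetical. The sketch assumes that the last bin is open-ended and collects every class of 500 lines or more, which the text does not state explicitly.

```python
def relative_histogram(lines_per_class, bin_count=11, bin_width=50):
    """Relative frequencies (in percent) of lines of code per class.

    Bin i covers [i * bin_width, (i + 1) * bin_width); the last bin is
    assumed to be open-ended so that very large classes are not dropped.
    """
    counts = [0] * bin_count
    for lines in lines_per_class:
        index = min(lines // bin_width, bin_count - 1)
        counts[index] += 1

    total = len(lines_per_class)
    if total == 0:
        return [0.0] * bin_count
    return [100.0 * count / total for count in counts]


# Made-up class sizes of one system.
example_histogram = relative_histogram([10, 49, 50, 120, 700])
```

Because every bin holds a percentage of the system's classes, histograms of a small and a large system can be compared directly.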

Our approach is to compare one system’s distribution against the benchmark. The benchmark is the average distribution of all other systems.

A significant difference between the distribution of a software system and the benchmark does not directly tell that the system is worse or better than average. So it doesn’t exactly fit the definition of a benchmark. However a significant difference is interesting and does deserve the attention of software assessors to determine the causes of the difference. In this way using measurement results of other systems that are analyzed helps in analyzing the current system.

To quantify the difference between the benchmark and one system we adopted the Histogram Difference Measure (HDM) from [Cui 06]. HDM is calculated as follows:

Given are two histograms with the same number of bins and the same bin widths: one system's histogram and the benchmark histogram. The bin difference is the absolute difference between two bins (the system's and the benchmark's bin). The histogram difference then corresponds to the summation of the bin differences. HDM is the normalized histogram difference. HDM has an interval of 0 to 1, where 0 indicates that the histograms are totally different and 1 indicates a perfect match. A perfect match in this case means that the system has the same distribution as the benchmark.
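The calculation can be illustrated with a short sketch. The exact normalization of [Cui 06] is not reproduced in this text, so the normalization below is an assumption: both histograms hold relative frequencies that sum to 1, the summed bin difference is therefore at most 2, and the result is scaled and inverted so that 1 means a perfect match. The function name is hypothetical.

```python
from typing import List


def histogram_difference_measure(system: List[float], benchmark: List[float]) -> float:
    """Return a value in [0, 1]; 1 means both relative histograms are identical."""
    if len(system) != len(benchmark):
        raise ValueError("both histograms must have the same number of bins")
    # Histogram difference: the sum of the absolute bin differences.
    histogram_difference = sum(abs(s - b) for s, b in zip(system, benchmark))
    # Assumed normalization: two relative histograms differ by at most 2 in total.
    return 1.0 - histogram_difference / 2.0
```

Under this assumption two histograms without any overlap yield 0, and a system that has half of its classes in other bins than the benchmark yields 0.5.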

4.2 Validation approach

Software metrics are used for different goals. Some of these goals are finding modules that are likely to have errors [Menzies 02], finding modules that are hard to test and finding modules that are hard to understand and therefore hard to change [McCabe 76].

4.2.1 Literature approaches

In this paragraph we discuss approaches used in literature to validate the use of software metrics.

4.2.1.1 Interviewing developers

This approach consists of running measurements on one or more systems. From these measurements some values are selected, for example the highest and lowest values. Interviews with developers of the software system are conducted to find out if the opinion of the developers confirms the metric values, such as in [McCabe 76] and [Chidamber 96].

This approach can be applied to compare metrics at the module level. The developers of a software system can state whether they find one module easier or harder to understand and modify than other modules. However, using interviews with developers to validate comparisons at the system level is harder. Asking developers whether they find the systems they maintain easy or hard to modify will not be useful:

  • A software system can be huge. A developer might only be an expert on a part of the system. The developer cannot state the quality of the whole system.

  • Research showed that a crucial factor in software maintenance is a stable maintenance team [SWEBOK 04]. Developers that have worked a long time on maintaining a system gain a lot of knowledge about it and will therefore find it easy to modify.

  • A software developer has only worked on a limited number of different systems. A developer's opinion about the quality of the system is affected by the systems the developer has worked on. It's possible that all of them are quite good or bad, but in the developer's eyes some are worse and some are better.

We therefore conclude that interviewing developers is not suited for validating the proposed benchmarks.

4.2.1.2 Correlating with discovered bugs

Another approach is to compare the measurements with the number of bugs reported, such as in [Basili 96] and [Menzies 02].

This approach is applicable for one system or for all systems in one organization, but is far more difficult to apply across different organizations. There are multiple reasons for this difficulty:

  • Different organizations record (or do not record at all) bug data in different ways, which can make comparisons across organizations impossible.

  • Bugs can be detected at various stages, such as during development, after development and when the system is in production. The duration of these stages can be different for each software system and therefore a comparison would be invalid.

Furthermore, the goal of our benchmarks is to determine whether source code is maintainable, not whether it is correct or free of bugs. We therefore conclude that correlating with discovered bugs is not suited for validating the proposed benchmarks.

4.2.2 Our validation approach

We need to validate whether the proposed benchmarks give an accurate ordering of the source code quality of the software systems. As the validation approaches in literature are based on comparisons within one system, we defined a new validation approach that is based on a comparison with judgments given by the assessors.

The judgments of the assessors can be found in assessment documents. Per system assessed one document is written. An assessment document reports on the quality of different subjects of a software system, for example the architecture/design, source code, (unit) testing, documentation and tool usage. In our validation approach we only look at the judgments given about the quality of source code. The other subjects are outside the scope of this study. The judgments in the reports are written in natural language. As human language is ambiguous we rewrite the judgments in natural language to a fixed set of judgments: high quality, normal quality and low quality.

Judgments of assessors of the SIG are based on the interpretation of measurement results. The metrics used by assessors are often the same as the metrics used in the benchmarks: Lines of Code, McCabe and Code duplication. The assessors do have more tools to determine the quality of the source code, like focusing metrics on specific parts of a system and manually inspecting the source code.

To give a judgment the assessors implicitly compare the measurements of the current system with measurements of other systems they have assessed. It's possible that the benchmarks are more accurate than the judgments of assessors, as one assessor didn't assess all systems and might not remember values from previous assessments.

4.3 Conclusion

Three different ways to benchmark software systems have been proposed. The first one consists of ordered tables wherein the position of the software system in the table represents its quality against other software systems, the second one consists of using multiple benchmarks to create one combined benchmark and the third one uses histograms which can be visually inspected to find differences in the distribution of source code in software systems.

By studying literature we found two approaches to validate software metrics. We found that these approaches are not applicable to validate the benchmarks proposed in this study. A new validation method has been proposed that is based on judgments of assessors and that will be used to validate the proposed benchmarks.

5 Benchmark Data Collector

This chapter describes the goals and design of the benchmark data collector. First, the data that is collected is discussed. Second, we describe the design considerations that are made.

The Software Improvement Group has built the System Analysis Toolkit (SAT) to analyze software systems. Among others it contains the implementation of software metrics such as lines of code, cyclomatic complexity and code duplication. The SAT is written in Java.

 

5.1 Functionality

The goal of this study is to benchmark software systems by use of metrics on source code. To achieve this goal two components must be available:

  • A filled database with metrics of software systems

  • A view of this database that shows the benchmark results

This chapter deals with the first point by developing a tool that collects measurement data and stores it in one database. The second point is achieved by standard tools such as the MySQL console application for running SQL queries and Microsoft Excel for presenting the measurements in tables and charts. The development or selection of a full fledged presentation application is outside the scope of this project.

Currently the Benchmark Data Collector can collect data for software systems programmed in Java and in COBOL. In the future this tool can be extended so it can be used for other programming languages as well.

 

5.2 Design

5.2.1 Entity Relationship (ER) model

To describe the data that is stored the following E/R data model is created:

Figure: E/R Diagram

In this diagram four entities are defined. SOFTWARE_SYSTEM contains attributes to identify and categorize a software system. The attribute ownerName defines the company or organization that owns the system. The attribute maintainerName can be the same as the ownerName, but is different in case of outsourcing. Storing these attributes is relevant because it would be interesting to benchmark all systems developed by or for one owner or maintainer against each other. The system is categorized by its industryType (for example Finance, Public and Industry) and its technicalType (for example Standalone application, Web application and Mainframe).

A software system can consist of multiple programming languages. In this data model measurement results are separated per programming language. For example, one software system is programmed in C and ASM and has a SLOC of 10.000 for C and a SLOC of 5.000 for ASM.

The MEASUREMENTS entity has two attributes: metricType and originType. The metricType attribute is used for storing the name of the metric, for example Lines of Code. The originType attribute stores the level of measurement, for example system level and method level.
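The structure described above can be sketched with a few data classes. This is only an illustration and not the schema of the Benchmark Data Collector: the diagram itself is not reproduced here, so the class LanguagePart, the field value and the helper total_sloc are hypothetical names. The attribute names of the software system and the measurement follow the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measurement:
    metric_type: str   # name of the metric, for example "Lines of Code"
    origin_type: str   # level of measurement, for example "system" or "method"
    value: float       # hypothetical field holding the measured value


@dataclass
class LanguagePart:
    """Hypothetical grouping of the measurements of one programming language."""
    language: str
    measurements: List[Measurement] = field(default_factory=list)


@dataclass
class SoftwareSystem:
    system_name: str
    owner_name: str
    maintainer_name: str
    industry_type: str
    technical_type: str
    parts: List[LanguagePart] = field(default_factory=list)


def total_sloc(system: SoftwareSystem) -> float:
    """Sum the system level Lines of Code over all programming languages."""
    return sum(m.value
               for part in system.parts
               for m in part.measurements
               if m.metric_type == "Lines of Code" and m.origin_type == "system")


# The example from the text: 10.000 SLOC of C and 5.000 SLOC of ASM.
example = SoftwareSystem(
    "System A", "Company A", "Company A", "Finance", "Standalone application",
    parts=[
        LanguagePart("C", [Measurement("Lines of Code", "system", 10000)]),
        LanguagePart("ASM", [Measurement("Lines of Code", "system", 5000)]),
    ],
)
```

Keeping the measurements per programming language makes it possible to benchmark only the Java part or only the COBOL part of a mixed system.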

5.2.2 Class diagram

The Benchmark Data Collector is written in Java. The following diagram gives an overview of the classes and their relations:

Figure: Class diagram of the Benchmark Data Collector

5.2.3 Design considerations

During the development a number of considerations were made that affect the design. The most important are described below:

5.2.3.1 Re-using components of the SAT

A goal when developing the Benchmark Data Collector was to minimize the amount of new source code that needs to be written. More source code would cost more time to develop and would require more time to maintain. To reduce the amount of new source code that needs to be written we chose to make use of components that are already available or used in SIG's System Analysis Toolkit:

  • Software metrics: The software metrics available in the SAT.

  • Persistence: The Hibernate object/relational persistence service [Hibernate]. To simplify the interaction with Hibernate we use utility classes of the SAT.

  • File filters: To select which files are analyzed and which aren't we use file filters of Apache's Jakarta Commons project [Jakarta].

  • Input format: The input of the Benchmark Data Collector is a Spring configuration file [Spring]. See paragraph 5.2.3.2.

5.2.3.2 Input: Flexibility by configuration or hard-coded

The input of the benchmark data collector is a Spring XML configuration file. Spring is a Java framework that minimizes hard coded dependencies to increase modularization and testability [Spring]. Spring uses the Inversion of Control (also called Dependency Injection) pattern.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN"
    "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean class="java.lang.String" id="rootDir">
    <constructor-arg><value>/home/julien/sources/A</value></constructor-arg>
  </bean>

  <bean class="org.apache.commons.io.filefilter.SuffixFileFilter" id="fileFilter">
    <constructor-arg><value>.java</value></constructor-arg>
  </bean>

  <bean class="software_improvers.model.benchmark.SoftwareSystem" id="softwareSystem">
    <property name="systemName"><value>System A</value></property>
    <property name="ownerName"><value>Company A</value></property>
    <property name="industryType"><value>Finance</value></property>
  </bean>

  <bean class="nl.sig.network.benchmark.JavaBenchmarkNetwork" id="network">
    <property name="softwareSystem"><ref bean="softwareSystem"/></property>
    <property name="rootDir"><ref bean="rootDir"/></property>
    <property name="fileFilter"><ref bean="fileFilter"/></property>
  </bean>
</beans>

An XML configuration file

The XML configuration file of the benchmark data collector is used to define which code should be analyzed and to enter information about the software system. A configuration file is made for each system that is analyzed.

A question raised during the development was: should it be possible to configure metrics in the XML configuration file? We first thought that it would make the tool more flexible because it would be possible to add more metrics and set different counting options. However, we found that introducing these configuration options would introduce problems:

  • When there are configuration options it will be possible that metric values are calculated differently per system. For example, for one system the minimum clone size that is detected is configured at 10 lines and for other systems it's 6 lines. These different configurations would make comparisons invalid.

  • Configuring takes time from the user. Using the Benchmark Data Collector should take as little time as possible.

The Benchmark Data Collector must promote standardization instead of configuration. We have therefore chosen to define which metrics and options are used in Java code using constants, so it's impossible for the user to use different metrics or options.

5.3 Summary

In this chapter we described the design of the Benchmark Data Collector. The Benchmark Data Collector collects measurement results from the source code of software systems written in Java or COBOL.

6 Case study

In this case study we apply and validate the benchmarks defined in chapter 4. The Benchmark Data Collector tool described in chapter 5 is used to gather the measurement values.

6.1 Participants

The benchmark consists of 11 software systems written in Java. The functionality provided, the industry type and the sizes of these systems are diverse.

 

Software System  Size in SLOC  Description
A                71.313        An administrative application for a governmental organization.
B                79.610        Application B enables micro payments on internet sites.
C                193.551       Supports the organization in managing debits. The application contains batch jobs to automatically invoke actions, like sending letters.
D                95.987        Administrative application for banking products used by regional offices.
E                28.334        Administrative application for leases of cars.
F                1.393.551     Administrative application for an organization in the transport industry.
G                102.802       A back-office application for processing postal items.
H                45.661        An interface application that provides clients information about the status of their ordered products.
I                20.167        Administrative application that manages personal information for a governmental organization.
J                72.743        This application is developed and used by the SIG for the analysis of software systems.
K                201.052       An application built for the insurance industry. The application contains an advice module and supports the registration of information about persons.

6.2 Judgments by assessors

As described in paragraph 4.2.2 we use the judgments of assessors to compare with the results of the benchmarks.

As the judgments are given in natural language they are not quantifiable. The judgments in the reports can take up multiple sentences and the terms used vary by report. Examples of judgments are “this system is above average”, “better than we have ever seen” and “the system is overly complex”. We have simplified these sentences to either “good”, denoted as “+”, or “bad”, denoted as “-”. The reports we studied didn't judge any system as being average.

The table below describes the judgments per software system:

Software system  Judgment
A                +
B                +
C                -
D                -
E                +
F                -
G                ?
H                ?
I                +
J                +
K                -

Two systems (G and H) have not been assessed (yet). The judgment of these systems is undefined. It's not possible to validate the places of these systems in the benchmark. We did however choose to keep them in, as removing them would affect the comparison of the other systems.

6.3 Benchmarks

In appendix A we describe which parts of the software system are measured. First we present the results and afterwards we compare these results with the judgments given by assessors.

The benchmarks measured here are described in paragraph 4.1.

6.3.1 Ordered table

B1  Value    B3  Value     B5  Value     B7  Value
E   4.77     J   54.25     J   99.16%    J   3%
A   5.28     K   62.40     K   95.98%    E   3%
J   5.67     E   67.62     E   93.95%    I   5%
K   5.76     B   81.32     F   86.60%    B   5%
H   6.15     I   82.31     H   86.29%    G   9%
I   6.24     A   93.34     A   84.69%    K   10%
B   7.94     G   96.89     B   83.73%    F   16%
G   8.65     D   107.01    G   83.69%    H   18%
F   9.00     C   182.08    I   78.55%    A   18%
D   9.56     F   190.22    C   76.33%    D   20%
C   13.55    H   206.61    D   52.49%    C   29%

B2  Value    B4  Value     B6  Value
E   4.77     J   99.89%    E   161.88
A   5.28     K   99.51%    K   159.64
J   5.67     E   99.33%    A   158.14
K   5.76     H   99.03%    I   153.25
H   6.15     A   98.69%    B   149.19
I   6.24     F   97.98%    G   147.34
B   7.94     G   97.96%    D   146.86
G   8.65     B   97.72%    F   145.36
F   9.00     I   97.37%    H   143.85
D   9.56     C   93.99%    J   139.18
C   13.55    D   91.56%    C   125.71

6.3.1.1 Validation

A limitation of the validation described here is that the number of systems analyzed is rather small (11 software systems). It's possible that with more systems the results would be different.

The following table presents the judgments given by the assessors and the position in each benchmark. To compare the position of a software system to a judgment we have simplified the positions in the benchmark to:

  • + = High (Top four systems)
  • 0 = Average (The three systems in the middle)
  • - = Low (Bottom four systems)

The systems denoted as “0” in the benchmarks are not comparable to the judgments because no system is judged as average. We have chosen to exclude these software systems because if only one or two systems are added the quality can jump from average to low or to high quality.

To make the comparison easier the judgments of the assessors are repeated in the second column. When a system's position in the benchmark confirms the judgment it's colored green and it's colored red when it contradicts the judgment.

Software System  Judgment  B1  B2  B3  B4  B5  B6  B7
A                +         +   0   +   0   0   +   -
B                +         0   +   -   -   0   +   0
C                -         -   -   -   -   -   -   -
D                -         -   -   -   -   -   0   -
E                +         +   +   +   +   +   +   +
F                -         -   -   -   0   +   -   0
G                ?         -   0   0   0   -   0   0
H                ?         0   -   0   +   0   +   -
I                +         0   0   0   -   -   +   +
J                +         +   +   +   +   +   -   +
K                -         +   +   +   +   +   +   0

When we ignore the systems that are average (denoted as 0) there are the following differences:

  • Benchmark 1: System K
  • Benchmark 2: System K
  • Benchmark 3: System B, system K
  • Benchmark 4: System B, system I, system K
  • Benchmark 5: System F, system I, system K
  • Benchmark 6: System J, system K
  • Benchmark 7: System A
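The comparison between positions and judgments can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the procedure used to produce the tables: the function names are hypothetical, and the order of benchmark 1 and the judgments are copied from the tables in paragraphs 6.3.1 and 6.2.

```python
from typing import Dict, List

# Judgments from paragraph 6.2; "?" means the system has not been assessed.
JUDGMENTS = {"A": "+", "B": "+", "C": "-", "D": "-", "E": "+", "F": "-",
             "G": "?", "H": "?", "I": "+", "J": "+", "K": "-"}

# Systems ordered from best to worst according to benchmark 1.
B1_ORDER = ["E", "A", "J", "K", "H", "I", "B", "G", "F", "D", "C"]


def rate_positions(ordered: List[str], group_size: int = 4) -> Dict[str, str]:
    """Top four systems get "+", bottom four get "-", the middle ones get "0"."""
    ratings = {}
    for position, system in enumerate(ordered):
        if position < group_size:
            ratings[system] = "+"
        elif position >= len(ordered) - group_size:
            ratings[system] = "-"
        else:
            ratings[system] = "0"
    return ratings


def contradictions(ordered: List[str], judgments: Dict[str, str]) -> List[str]:
    """Systems whose rating contradicts the judgment; "0" and "?" are ignored."""
    ratings = rate_positions(ordered)
    return sorted(system for system, rating in ratings.items()
                  if rating != "0"
                  and judgments[system] in ("+", "-")
                  and rating != judgments[system])
```

For benchmark 1 the call contradictions(B1_ORDER, JUDGMENTS) yields only system K, which agrees with the first entry of the list above.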

Six out of seven benchmarks wrongly indicate system K as high quality. According to the assessors this system was over engineered. An over engineered software system can have many small and simple methods and classes and therefore score well in the benchmarks 1 to 6. The absolute size and design complexity is considered to be too large for this system. The developers of this system used a tool named Checkstyle that automatically reports on overly large and complex classes and methods [Checkstyle].

Another interesting result is that system J scores well on all benchmarks except on the Maintainability Index (B6). We found that this system has a poor result because it had the lowest comment percentage (4.5%). We believe that the comment percentage metric is questionable because code that is easily readable should not need lots of comments to clarify it [McConnell 04].

Based on these results benchmark 1, 2 (average lines of code of methods / classes) and 7 (code duplication percentage) confirm the judgments of assessors.

A problem with comparing the judgments to the benchmarks is that one benchmark only measures one aspect of quality while the judgments given by the assessors measure all aspects. Therefore the differences and similarities found can be caused by other aspects that were of a certain quality which happens to be the same as the benchmark. As the scale of comparison is only “+” and “-” there is always a 50% chance that the results are similar.

The validation would have been more precise if there were a direct mapping between the aspect measured in the benchmark and the judgment of the assessors. Another improvement would be more precise scales, so that the chance of coincidental similarities would be lower.

Next to these problems of the validation method it would be desirable to have a larger set of software systems. A problem is that the benchmarks proposed here are intended to be used by the SIG to help assessors in deciding whether the system is good or not. Gathering more data from software systems would mean this work should not be used, because it would influence the assessors. By doing this we end up with a classic chicken and egg problem, because for this validation approach we need the judgments of assessors and the assessors need benchmarks for making judgments.

6.3.2 Combined benchmark

In the previous paragraph we found benchmark 1, 2 and 7 to be closest to the judgments of assessors. Now we combine two of these, benchmark 1 (lines of code per method) and benchmark 7 (code duplication), to create one combined benchmark. Both benchmarks are equally weighted in the equation. A value above zero indicates above average quality and vice versa.


Except for system K all values are similar to the judgments of the assessors.
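The combination can be retraced with a short sketch. It is an approximation under stated assumptions and not the calculation used for this study: the values are the rounded values of the columns labelled B1 and B7 in the ordered table of paragraph 6.3.1, the population standard deviation is assumed, and the sign is chosen so that a lower metric value gives a higher score. The function names are hypothetical.

```python
from statistics import mean, pstdev

# Rounded values copied from the ordered table (lower is better for both metrics).
B1 = {"E": 4.77, "A": 5.28, "J": 5.67, "K": 5.76, "H": 6.15, "I": 6.24,
      "B": 7.94, "G": 8.65, "F": 9.00, "D": 9.56, "C": 13.55}
B7 = {"J": 3, "E": 3, "I": 5, "B": 5, "G": 9, "K": 10,
      "F": 16, "H": 18, "A": 18, "D": 20, "C": 29}

# Judgments from paragraph 6.2; G and H have not been assessed and are left out.
JUDGMENTS = {"A": "+", "B": "+", "C": "-", "D": "-", "E": "+", "F": "-",
             "I": "+", "J": "+", "K": "-"}


def standardize(values):
    """Express each value as the number of standard deviations below the mean."""
    mu = mean(values.values())
    sigma = pstdev(values.values())  # assumption: population standard deviation
    return {system: (mu - value) / sigma for system, value in values.items()}


def combine(*benchmarks):
    """Equally weighted average of the standardized benchmark scores."""
    scores = [standardize(benchmark) for benchmark in benchmarks]
    return {system: mean(score[system] for score in scores) for system in scores[0]}


combined = combine(B1, B7)

# Judged systems for which the sign of the combined score contradicts the judgment.
mismatches = sorted(system for system, judgment in JUDGMENTS.items()
                    if (combined[system] > 0) != (judgment == "+"))
```

With these rounded inputs mismatches contains only system K, in line with the observation above. The individual scores depend on the rounding and on the choice of standard deviation, so they are not listed here.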

6.3.3 Histograms

The following table presents the differences between the distribution of lines of code for each system and the benchmark:

 
 

System  HDM
A       0.95773
B       0.94979
C       0.87753
D       0.81657
E       0.92293
F       0.81493
G       0.93672
H       0.76287
I       0.88198
J       0.84791
K       0.90014

Difference between the distribution of lines of code over classes.

We took two entries from this table: System A because its highest value indicates a high similarity with the benchmark and system H because its lowest value indicates that it's most different from the benchmark.


The lines of code distribution of system A is equivalent to the benchmark.


There is a significant difference between system H's and the benchmark's distribution. System H has fewer classes in the smallest bin range and therefore more classes in the higher bin ranges.

The histograms clearly show that system A is equivalent to the benchmark and that system H is different. System H has fewer small classes and has a higher percentage of its classes in the “more” bin. System H's large classes are worth investigating as these classes might take up too much functionality.

7 Conclusion

In this study we explored the use of metrics as benchmarks for the source code quality of software systems. Based on the selection criteria, described in chapter 3, the lines of code, cyclomatic complexity and code duplication metrics were selected. These metrics were used to create representations of the benchmarks.

Is benchmarking based on software metrics a good method to determine the source code quality of software systems?

 

Appendix A: Measurement Guidelines

To be able to make a fair comparison it's necessary to set clear guidelines about what should and should not be measured.

Parts to measure

Source code that needs to be measured is all program source code that is maintained by the developers of the project. The following guidelines are defined:

  • Program source code is included.

  • Source code of external libraries is excluded. External means libraries that are from a third party, for example COTS or open source.

  • Source code of libraries made by the company itself should be included.

  • Generated code is excluded. However, if the generated code is maintained by hand it should be included.

  • Unit test code should be excluded. The purpose of unit test code is different from that of program code. Unit test code often has fewer control flow statements and can seriously affect the measurement results. Qualities of unit test code should be measured separately from program code. Measuring unit tests is beyond the scope of this project.

Bibliog rap hy

[Jones 0 0] C. Jo nes. S oftwa re As s essm ent s, B ench ma rk s and B est P rac ti c es , Addi son-W esl ey, 20 00

[McCon n el l 04] S. M cCo nn el l. C od e C ompl et e Second Ed it ion, M ic ro so ft P res s, 2 00 4

[Men s 0 2] M. M en s, S. D emey er. F utu re T rend s in So ftwa re E vol ut ion M et ri c s, ACM, 200 2

[Bak er 95] B .S. B ak er. O n F indi n g Dupli ca tio n a nd N ea r- Dupl ica ti on in La rge Softwa re Sy st ems , P ro c eedi ngs o f th e S econd W orki ng Co nferen c e on R ev ers e Engin eeri ng (WCR E ’9 5), 19 95

[Mayrand 96] J. Ma yra nd, C. Leb lan c, E.M. M erl o. E xp eri ment on th e Au toma ti c Det ecti on o f F un ct ion Cl on es in So ft ware S y st em Us in g M et ri c s, In t ernat iona l Conferen c e on So ft ware Mai nt ena nc e (IC SM '9 6) , 1 99 6

[Veerma n 03] N. V eerma n. R ev ita liz in g Modi fia bility o f L ega c y A ss et s, P ro c eeding s of th e S ev ent h Eu rop ean Co nferen c e on So ftw are Main t enan c e And R eengi n eer ing (CSMR’0 3), 20 03

[Bra nd 97] M.G.J . van d en B rand , P. Kl in t, C. V erho ef. Re- engin eerin g n eed s Gen er i c Pro gram mi n g L an gua ge T echn olo gy, ACM , 19 97

[McCab e 76] T .J. M cCab e, A Co mpl exity M easu re, P ro ceed ing s o f t he 2nd int erna tio nal c on ferenc e on Softwa re eng in eerin g, 1 97 6

[McCab e 89] T .J. M cCab e, C.W. But l er . Desi gn C ompl exity M easu rem en t a nd T es ti ng, Comm un ica tio n s o f t he A CM, 19 89

[Sh epperd 8 8] M. Sh epp erd. A c rit iqu e o f c yc lo mati c c omp lex it y a s a so ftwa re met ri c, So ftwa re Eng in eering Jo urnal, 1 98 8

[Chidamb er 91] S.R. Chida mb er , C.F . Kem erer. Towa rd s a m etr i cs suit e fo r obj ect ori ent ed desi gn, In t ernat iona l w o rk sh op o n Pr in c ipl es o f s oftwa re ev ol uti on, 1 99 1

[Chidamb er 94] S.R. Chida mb er , C.F . Kem erer. A M et r ic s S uit e fo r Obj ect Or i ent ed Des i gn, I EEE , 19 94

[Bas ili 96] V.R. Bas il i, W.L. M elo . A Val idati on o f Ob j ec t-O r ient ed Desi gn M etr i cs a s Qua lit y I ndi cato rs, I EEE Tra nsa c tio ns o n So ftwa re Eng in eer ing, 1 99 6

[NASA-1] NASA Software Assurance Technology Center, Software Metrics Research and Development. http://satc.gsfc.nasa.gov/metrics/. Last visited August 2006

[NASA-2] NASA Independent Verification and Validation Facility, Metrics Data Program, http://mdp.ivv.nasa.gov/. Last visited August 2006

[Rosenberg 97] J. Rosenberg. Some Misconceptions About Lines of Code, metrics, p. 137, Fourth International Software Metrics Symposium (METRICS'97), 1997

[Park 92] R.E. Park. Software Size Measurement: A Framework for Counting Source Statements, Software Engineering Institute (CMU/SEI-92-TR-020), 1992

[Coleman 94] D. Coleman, D. Ash, B. Lowther, P. Oman. Using Metrics to Evaluate Software System Maintainability, Computer, vol. 27, no. 8, pp. 44-49, Aug. 1994

[Archer 95] C. Archer. Measuring Object-Oriented Software Product, Software Engineering Institute (SEI-CM-28), 1995

[Tian 95] J. Tian, M.V. Zelkowitz. Complexity Measure Evaluation and Selection, IEEE Transactions on Software Engineering, vol. 21, no. 8, pp. 641-650, Aug. 1995

[SEI] Carnegie Mellon Software Engineering Institute: A taxonomy of quality measures. http://www.sei.cmu.edu/str/taxonomies/

[Calero 01] C. Calero, M. Piattini, M. Genero. A Case Study with Relational Database Metrics, ACS/IEEE International Conference on Computer Systems and Applications (AICCSA'01), 2001

[Demeyer 99] S. Demeyer, S. Ducasse. Metrics, Do They Really Help?, LMO, 1999

[Demeyer 01] S. Demeyer, T. Mens, M. Wermelinger. Towards a Software Evolution Benchmark, International workshop on principles of software evolution, 2001

[French 99] V.A. French. Establishing Software Metric Thresholds, International Workshop on Software Measurement (IWSM’99), 1999

[Ducasse 99] S. Ducasse, M. Rieger, S. Demeyer. A Language Independent Approach for Detecting Duplicated Code, icsm, p. 109, 15th IEEE International Conference on Software Maintenance (ICSM'99), 1999

[Gray 96] A.R. Gray, S.G. MacDonell. A comparison of techniques for developing predictive models of software metrics, Elsevier, 1996

[Schneidewind 92] N.F. Schneidewind. Methodology For Validating Software Metrics, IEEE Transactions on Software Engineering, 1992

[Samoladas 04] I. Samoladas, I. Stamelos, L. Angelis, Oikonomou. Open source software development should strive for even greater code maintainability, Communications of the ACM 47, 10 (Oct. 2004), 83-87

[Kettelerij 05] R. Kettelerij, B.G. Prijn. Detection Of Duplicated Code In Large Software Systems, Graduation report University of Arnhem and Nijmegen, 2005

This work describes the development of a language independent duplicated code detection tool. A major design goal of this tool is performance. This tool is used in this study to calculate code duplication.

[Meijles 05] J. Meijles. Analysis of designers’ work, Master thesis University of Amsterdam, 2005

[Fenton 99] N.E. Fenton, M. Neil. Software Metrics: successes, failures and new directions, Elsevier, 1999

[Fenton 97] N.E. Fenton, S.L. Pfleeger. Software Metrics Second Edition, PWS Publishing Company, 1997

[Menzies 02] T. Menzies, J.S. Fenton, Justin S. Di Stefano, M. Chapman, K. McGill. Metrics That Matter, Proceedings of the 27th Annual NASA Goddard/IEEE Software Engineering Workshop, 2002

[Sim 03] Susan Elliott Sim, Steve Easterbrook, Richard C. Holt. Using Benchmarking to Advance Research: A Challenge to Software Engineering, icse, p. 74, 25th International Conference on Software Engineering (ICSE'03), 2003

[Cui 06] Qingguang Cui, Matthew O. Ward, Elke A. Rundensteiner, Jing Yang. Measuring Data Abstraction Quality in Multiresolution Visualization, to appear in IEEE Symposium on Information Visualization 2006 (InfoVis ’06), 2006. http://davis.wpi.edu/~xmdv/docs/infovis06_measure.pdf

[SIG] Software Improvement Group, http://www.sig.nl

[Checkstyle] Checkstyle, http://checkstyle.sourceforge.net

[SWEBOK 04] Guide to the Software Engineering Body of Knowledge, http://www.swebok.org

[Spring] Spring Framework, http://www.springframework.org

[Hibernate] Hibernate, http://www.hibernate.org

[Jakarta] The Apache Software Foundation, Jakarta Commons, http://jakarta.apache.org/commons