
Identifier Naming Conventions and Software Coding Standards: A Case Study in One School of Software


Yanqing Wang, Shengbin Wang, Xiaojie Li, Hang Li, Jin Du
School of Management, Harbin Institute of Technology
Harbin, China
e-mail: yanqing@hit.edu.cn

Abstract - Within software coding standards, identifier naming plays an important role. In our research, identifier naming conventions were divided into four categories: Hungarian, Pascal, Camel and Underscore. Techniques from compiler theory, such as regular expressions and lexical analysis, were used to extract identifiers and to match them against the defined naming conventions in our evaluation system. The consistency of identifier naming was calculated with standard deviation. After testing students' projects from three years and three pedagogical websites containing many Java files, the evaluation system proved applicable, and some conclusions were drawn from the test results.

Keywords - software quality assurance (SQA); coding standards; identifier naming convention; regular expression; software metric

I. INTRODUCTION

It is a highly competitive era of globalization and industrialization, in which the development of the software industry determines, to some extent, a nation's competitiveness in science and technology. The development of the software industry rests on software quality assurance (SQA). Coding standards not only help programmers communicate and collaborate efficiently in large projects to produce quality software products [1], but are also of great significance when software enterprises undertake international joint development [2]. Many internationally renowned software companies have established strict criteria to improve the standard of their source code [3][4]. Research and training on coding standards are therefore relevant to the competitiveness of the software industry. However, there is no efficient approach to evaluating how well programs follow coding standards, while manual evaluation is expensive, error-prone and subjective.

In the last decade, software organizations such as IBM, Microsoft and Bell Laboratories, as well as some institutes, began to explore universal and applicable evaluation criteria for coding standards rather than relying on traditional testing techniques [5]. At the 2nd Asia-Pacific Conference on Quality Software, held in Hong Kong in 2001, coding standards were formally discussed, together with an effective evaluation system for coding standards as a means of improving software quality. In 2006, an evaluation approach based on a code review process (CRP) was proposed by Dr. Li of Unitec New Zealand, in which a coding specification served as a complement to the manual assessment of coding standards [6]. Fang proposed using coding standards to improve program quality [7]. In particular, at the 8th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM'08), source-level software metrics, testing and verification were the main topics.

Research on coding standards has attracted our attention since 2003. Many students, and even professionals, are not used to complying with coding standards when they write programs, yet there are few approaches for measuring how far students or engineers comply with coding standards in their programming practice. Along this line, our team undertook a series of studies:

• Since the hundreds of rules or guidelines used in some software companies are difficult to follow for students who are studying introductory programming languages, a set of simplified coding standards was introduced into our teaching process [8][9].
• Based on the simplified coding standards, a hierarchical evaluation index system was proposed, in which three layers (target layer, criterion layer and index layer) and nineteen indices were defined. The weights of all indices were obtained through a questionnaire and the Analytic Hierarchy Process (AHP) [10].
• To let students evaluate their work online, an evaluation platform was built, to which students could upload their programs and receive benchmarking results along with detailed feedback on where they fell short of the coding standards [11].
• By examining the relationship between complying with coding standards and retaining programming style, a quality outlook at the source-code level was proposed [12].

Coding standards were divided into four categories: Layout, Naming, Comment and Coding. Naming conventions play a particularly important role, since many programmers ignore them. Confusing variable names such as i, jj, kkk, or a1, a2, a3 are seen quite frequently, not only in universities but also in software companies. Such bad naming habits make programs harder to understand, maintain and reuse. The naming of all identifiers, including variables, constants, functions, structure tags, structure variables, classes and objects, is important. In the global software industry, popular identifier naming conventions include Hungarian, Camel, Pascal and others.

Based on our previous research on coding standards, this paper focuses on recognizing identifier naming conventions and gathering statistics on them. First, the input source files were scanned and the identifiers in them were extracted. Second, each identifier was recognized and classified into the corresponding category. Third, the frequency and number of occurrences of each category

This research is a China Postdoctoral Science Foundation funded project

978-1-4244-5392-4/10/$26.00 ©2010 IEEE


Authorized licensed use limited to: Instituto Tecnologico de Costa Rica (ITCR). Downloaded on September 02,2020 at 23:02:24 UTC from IEEE Xplore. Restrictions apply.
were summarized and analyzed, from which some valuable results were obtained. This paper is organized as follows: popular identifier naming conventions are introduced in Section II; the methodology is described in Section III; in Section IV some cases are analyzed and the corresponding results are presented; Section V concludes the paper.

II. POPULAR IDENTIFIER NAMING CONVENTIONS

Although there is no unified international coding standard today, more and more people acknowledge the effect of coding standards in software development practice. Almost every coding standard is made up of four parts: layout, naming, comments and programming; that is, identifier naming accounts for a quarter of a coding standard. When measuring how well the identifiers in source code follow a naming specification, four naming conventions that are common internationally can be taken into consideration:

A. Hungarian naming convention
One or more lowercase letters are used as the prefix of an identifier to indicate its scope and type. The prefix is followed by one or more words, each with an uppercase first letter, which should indicate the purpose of the variable [13]. An example is:
int iStudentNumber;
Some of the prefixes of Hungarian notation are listed in Table I.

TABLE I. PARTIAL PREFIXES IN HUNGARIAN NAMING CONVENTION
Prefix   Type        Prefix   Type
a        Array       i        Integer
b        Boolean     l        LongInt
by       Byte        lp       LongPointer
…        …

B. Camel naming convention
It is also spelled camel case. Uppercase letters are used as word separators and lowercase letters for the rest: each logical breakpoint in an identifier is marked with an uppercase letter. The initial letter of the first word is lowercase, like a camel bowing its head. An example is:
printEmployeePaychecks();

C. Pascal naming convention
It is similar to the Camel naming convention except that the initial letter of the first word is uppercase. For example:
public void DisplayInfo();
String UserName();

D. Underscore naming convention
The Underscore naming convention is also very similar to the Camel naming convention, except that each logical breakpoint in the identifier is marked with an underscore. For example:
print_employee_paychecks();

III. METHODOLOGY

A. Regular expression
Regular expressions derive from automata theory and formal language theory in computer science. In the 1950s, the mathematician Stephen Kleene formalized regular expressions, and Ken Thompson later brought this notation into UNIX tools such as grep. Since then, regular expressions have been widely used in all kinds of UNIX operating systems and similar tools. In computing, regular expressions provide a concise and flexible means of identifying strings of text of interest, such as particular characters, words, or patterns of characters. A regular expression (often shortened to regex or regexp) is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies the parts that match the provided specification [14].

In this paper, regular expressions were used to extract identifiers from source programs and to match them against the four accepted naming conventions. Scanning a source program word by word is not a good way to extract identifiers, because the scan is frequently disturbed by tokens that are not identifiers. Moreover, every variable has a definition, so writing regular expressions that match definition statements saves system resources and time. However, the number of occurrences and the line numbers of an identifier cannot be obtained from its definition statement alone, so two arrays are used. One array holds the identifiers obtained from definition statements; the other stores the identifiers collected while scanning the whole file and ignoring punctuation, constants, reserved words, etc. The latter array enables the system to obtain the number of occurrences and the line numbers of a given identifier.

In Java, almost all definition statements have the following format:
(modifier+) type + variable (= new type());
Therefore, to locate the definition statements, only the modifier or type needs to be matched. After extraction, every identifier should be classified into a specific naming convention. As mentioned above, four categories were defined; miscellaneous identifiers are ignored in the calculation. The regular expressions of the different naming conventions are listed in Table II.

TABLE II. REGULAR EXPRESSIONS OF DIFFERENT NAMING CONVENTIONS
Naming convention   Regular expression
Hungarian           [ab(by)(cb)(cr)(cx)(cy)(dw)(fn)hil(lp)(m_)n(np)ps(sz)w]+([A-Z][a-z]+)+
Camel               [a-z]+([A-Z][a-z]+)+
Pascal              ([A-Z][a-z]+)+
Underscore          ([a-z]+_)+[a-z]+

B. Statistics approach
Generally, more than one identifier naming convention may be applied in a single program written by one programmer. Although it has not been proven which convention is best, there is no doubt that the more consistent the identifier naming is, the better the software complies with coding standards. However, not every programmer is aware of this, and there is no effective statistical approach to measuring programmers' identifier naming. Therefore, a well-known statistical measure, standard deviation, was adopted as our solution.

In probability theory and statistics, standard deviation is a measure of the variability or dispersion of a statistical population, a data set, or a probability distribution. A low standard deviation indicates that the data points tend to be very

close to the mean, whereas a high standard deviation indicates that the data are spread out over a large range of values. Because the values used here count how often each naming convention is used, for a given number of identifiers a larger standard deviation means that usage is concentrated in fewer conventions, that is, that the naming is more consistent.

As the number of occurrences differs from identifier to identifier, the standard deviation was calculated with two algorithms in our application system: one based on a rough consistency index and the other on a precise consistency index. When calculating $\sigma_R$, each identifier was counted as one unit no matter how many times it appeared, whereas for $\sigma_P$ the number of occurrences of each identifier was taken into account. For example, suppose one program contains seven variables: usloop, m_Name, StudentNumber, CallerId, userName, moneyAmout and no_of_student. Their occurrences and naming conventions are shown in Table III.

TABLE III. NAMING CONVENTIONS AND OCCURRENCES
Variable name    Occurrences    Naming convention
usloop           15             Hungarian
m_Name           8              Hungarian
StudentNumber    2              Pascal
CallerId         1              Pascal
userName         6              Camel
moneyAmout       7              Camel
no_of_student    15             Underscore

Let $X$ be a random variable with mean value $\mu$ that takes its values from a finite data set $X_1, X_2, X_3, X_4$, representing how many times each of the four identifier naming conventions (Hungarian, Pascal, Camel and Underscore) is used, so that $\sigma = \sqrt{\tfrac{1}{4}\sum_{i=1}^{4}(X_i - \mu)^2}$. Counting each identifier once gives $X_1 = 2$, $X_2 = 2$, $X_3 = 2$, $X_4 = 1$ and the mean value $\mu = 1.75$, so the rough standard deviation is:

$$\sigma_R = \sqrt{\frac{1}{4}\left[(2-1.75)^2 + (2-1.75)^2 + (2-1.75)^2 + (1-1.75)^2\right]} \approx 0.43$$

Similarly, counting occurrences gives $X_1 = 23$, $X_2 = 3$, $X_3 = 13$, $X_4 = 15$ and the mean value $\mu = 13.5$, so the precise standard deviation is:

$$\sigma_P = \sqrt{\frac{1}{4}\left[(23-13.5)^2 + (3-13.5)^2 + (13-13.5)^2 + (15-13.5)^2\right]} \approx 7.12$$

IV. CASE ANALYSIS

Based on the algorithms described above, an evaluation system was developed to test the consistency of identifier naming in software programs. It can run in single-file mode or batch mode. In single-file mode, only one Java file is tested, while in batch mode a folder name can be given as the input parameter, and all the Java files in that folder and all its subfolders are evaluated. To assess the system's performance, some cases were tested and analyzed.

A. Test results of students' projects over three years
Students of the School of Software at Harbin Institute of Technology (SoS@HIT) are required to complete a course project on Software Engineering in their third year. Students form teams of three to five, and each team selects a project topic. Some teams complete their project as a J2EE or Java application. To find valuable information on identifier naming conventions in Java, the projects written in Java in 2006, 2007 and 2008 were collected. By executing the application described above, the results of the projects were calculated and are listed in Table IV.

TABLE IV. PROJECTS BY THE YEAR 3 STUDENTS IN 2006 TO 2008
              2006               2007               2008
Naming        Rough    Precise   Rough    Precise   Rough    Precise
Hungarian     64       526       66       275       115      488
Pascal        7        64        31       164       23       78
Camel         1560     15312     1510     8980      2090     11475
Underscore    103      572       156      583       145      305
Consistency   752.0    7465.9    714.8    4323.3    999.2    5594.9

The data in Table IV show that the Camel naming convention was used most frequently, that the Underscore and Hungarian naming conventions took the next positions, and that the rough and precise consistency indices behaved similarly.

B. Three pedagogical websites
To assure teaching quality and to enhance students' learning outcomes, three pedagogical websites were developed and put into service at SoS@HIT. All three websites were tested with our application to find meaningful results. These websites run on the HIT intranet and have not yet been opened to Internet users.
1) www.wondercall.cn:8080/codingstandards: This website tests the coding standards of C/C++ files. It was brought into service in 2007 and contains 73 Java files.
2) www.wondercall.cn:8080/coolpsp: This website is used as a computer-aided instruction (CAI) system for the Personal Software Process (PSP) course. It began its work in 2007 and contains 7 Java files.
3) www.wondercall.cn:8080/peercodereview: It is used to manage the peer code review process. It was put into service in 2008 and contains 89 Java files.

The data from the three pedagogical websites, listed in Table V, show results similar to those in Part A.

TABLE V. PROJECTS BY THE YEAR 3 STUDENTS IN 2006 THROUGH 2008
              codingstandards     coolpsp             peercodereview
Naming        Rough    Precise    Rough    Precise    Rough    Precise
Hungarian     27       145        1        12         0        0
Pascal        0        0          0        0          1        3
Camel         263      1554       29       131        369      1759
Underscore    3        14         0        0          79       410
Consistency   127.08   753.34     14.34    63.75      175.12   833.23

C. Guidelines on identifier naming
Based on the data in the tables above, and although the evidence is not yet strong, some preliminary guidelines can be proposed:
• The Hungarian naming convention is the strictest and achieved great success along with Microsoft, but it is becoming increasingly unpopular because of its complexity. Therefore, a software organization should not adopt this notation as its standard until its engineers are ready to comply with it.
• Given the worldwide popularity of the Java language, the Camel naming convention is becoming increasingly welcome. Thus,

the Camel and Pascal naming conventions should be emphasized more and promoted widely.
• The prospects of the Underscore naming convention appear ordinary. As a result, this naming convention is not recommended.
• Although each programmer may have his or her own identifier naming convention, once a team is formed, all programmers in it should follow the same naming convention to keep the source programs highly readable and maintainable.

V. CONCLUSIONS

Under the research framework of coding standards, identifier naming convention analysis was proposed as a new approach to evaluating coding standards. This evaluation technique and the corresponding system help meet the software industry's requirements for coding standards. In this system, identifier naming conventions were summarized as four conventions: Camel, Pascal, Hungarian and Underscore. Compiler techniques such as regular expressions were then applied to extract and recognize identifiers. The number of occurrences and the line positions of identifiers were reported, and standard deviation was calculated as the criterion for evaluating the naming consistency of a programmer or a software organization. Three websites were taken as test cases, and the status of identifier naming in our organization was analyzed.

The research in this paper is far from complete, and much work remains for the near future: (1) choosing a reasonable standard deviation for the four identifier naming conventions needs further statistical study; (2) once the evaluation of naming conventions in Java has proven successful, evaluation of C/C++ should follow; (3) more collaboration with universities and software enterprises will be sought to obtain more data and information for refining the approaches to evaluating identifier naming conventions; (4) since many open source projects are widely used in today's software world, an empirical study of identifier naming based on open source projects will be very challenging and interesting.

REFERENCES
[1] V. Basili and B. Boehm, "Software Defect Reduction Top 10 List," Computer, IEEE Computer Society, vol. 34, no. 1, January 2001, pp. 135-137.
[2] W. Burger, "Offshoring and Outsourcing to INDIA," Proc. Second IEEE International Conference on Global Software Engineering (ICGSE 07), IEEE Press, 27-30 Aug. 2007, pp. 173-176.
[3] S. McConnell, Code Complete, 2nd ed., Redmond, WA: Microsoft Press, 2004. ISBN 0-7356-1967-0.
[4] H. Sutter and A. Alexandrescu, C++ Coding Standards: 101 Rules, Guidelines, and Best Practices, Hong Kong: Pearson Education Asia Ltd, 2006.
[5] D. Janzen, H. Saiedian, and L. Simex, "Test-driven development concepts, taxonomy, and future direction," Computer, vol. 38, no. 9, 2005, pp. 43-50.
[6] X. Li, "Effectively Teaching Coding Standards in Programming," Proc. Conference on Information Technology Education (SIGITE 05), Newark, New Jersey, USA, 2005, pp. 239-244.
[7] X. Fang, "Using a coding standard to improve program quality," Proc. 2nd Asia-Pacific Conference on Quality Software, IEEE Computer Society, Los Alamitos, CA, USA, 2001, pp. 73-78.
[8] Y. Wang, J. Wang, X. Sui, and P. Ma, "Quantitative Research on How Much Students Comply with Coding Standard in Their Programming Practices," Proc. 3rd China Europe International Symposium on Software Industry Oriented Education (CEIS-SIOE 07), Dublin, Ireland, February 6-7, 2007, Blackhall Publishing, pp. 116-119.
[9] Y. Wang, H. Su, J. Wang, and B. Wang, "How Many Students are ready to Write Quality Programs Complying with Coding Standards: A Case Study," Journal of Acta Scientiarum Naturalium Universitatis Sunyatseni, vol. 46, no. SUPPL, December 2007, pp. 93-96.
[10] Y. Wang, X. Xu, L. Lei, and J. Wang, "An AHP-based Evaluation Index System of Coding Standards," Proc. 2008 International Conference on Information Technology in Education (CITE'08), Wuhan, China, December 12-14, 2008, IEEE Computer Society, pp. 620-623.
[11] Y. Wang, L. Lei, C. Zhao, and Z. Huang, "Teaching Model of Coding Standards Based on Evaluation Index System and Evaluating Platform," Proc. 2008 International Conference on Computer Science and Software Engineering (CSSE 08), Wuhan, China, December 12-14, 2008, IEEE Computer Society, pp. 635-638.
[12] Y. Wang et al., "Complying with Coding Standards or Retaining Programming Style: A Quality Outlook at Source Code Level," Journal of Software Engineering and Applications, vol. 1, no. 1, 2008, pp. 88-91.
[13] C. Simonyi, "Hungarian Notation," MSDN Library, Microsoft, 1999. http://msdn2.microsoft.com/en-us/library/aa260976(VS.60).aspx
[14] A. V. Aho, "Algorithms for Finding Patterns in Strings," Handbook of Theoretical Computer Science, vol. A: Algorithms and Complexity, The MIT Press, 1991, pp. 255-300.
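Section III.A describes extraction with two arrays: one filled from definition statements and one filled while scanning every word of the file, so that occurrence counts and line numbers can be reported. The Java sketch that follows is one possible reading of that description, not the authors' code. The class name IdentifierExtractor, its method extract and the simplified definition pattern are hypothetical; the sketch assumes one declaration per line, strips string literals and // comments, and does not handle block comments or char literals.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical two-list identifier extractor (a sketch, not the authors' system). */
final class IdentifierExtractor {

    // Simplified definition statement: optional modifiers, a type (optionally generic
    // or array), the variable name, then '=', ';' or ','. Only the first variable of
    // "int a = 1, b = 2;" is captured.
    private static final Pattern DEFINITION = Pattern.compile(
        "(?:\\b(?:public|protected|private|static|final|transient|volatile)\\s+)*"
        + "\\b([A-Za-z_$][\\w$]*(?:<[^;=()]*>)?(?:\\[\\])*)\\s+([A-Za-z_$][\\w$]*)\\s*(?=[=;,])");

    // A word token that is not the tail of a longer word or of a literal such as 10L.
    private static final Pattern WORD = Pattern.compile("(?<![\\w$])[A-Za-z_$][\\w$]*");

    // A double-quoted string literal on one line (escaped characters allowed).
    private static final Pattern STRING_LITERAL = Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");

    // Classic Java keywords plus the literals true, false and null: never identifiers.
    private static final Set<String> RESERVED = Set.of(
        "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
        "class", "const", "continue", "default", "do", "double", "else", "enum",
        "extends", "final", "finally", "float", "for", "goto", "if", "implements",
        "import", "instanceof", "int", "interface", "long", "native", "new", "package",
        "private", "protected", "public", "return", "short", "static", "strictfp",
        "super", "switch", "synchronized", "this", "throw", "throws", "transient",
        "try", "void", "volatile", "while", "true", "false", "null");

    // Reserved words that are nevertheless valid declaration types.
    private static final Set<String> PRIMITIVES = Set.of(
        "boolean", "byte", "char", "short", "int", "long", "float", "double");

    private IdentifierExtractor() {}

    /** Maps each declared identifier to the line number of every occurrence (list size = count). */
    static Map<String, List<Integer>> extract(String source) {
        String[] lines = source.split("\\R", -1);
        Set<String> declared = new LinkedHashSet<>();             // list 1: from definitions
        Map<String, List<Integer>> seen = new LinkedHashMap<>();  // list 2: every non-reserved word
        for (int i = 0; i < lines.length; i++) {
            String line = stripLiteralsAndComments(lines[i]);
            Matcher def = DEFINITION.matcher(line);
            while (def.find()) {
                String type = def.group(1);
                // Skip statements such as "return total;" that only look like definitions.
                if (!RESERVED.contains(type) || PRIMITIVES.contains(type)) {
                    declared.add(def.group(2));
                }
            }
            Matcher word = WORD.matcher(line);
            while (word.find()) {
                if (!RESERVED.contains(word.group())) {
                    seen.computeIfAbsent(word.group(), k -> new ArrayList<>()).add(i + 1);
                }
            }
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String id : declared) {
            result.put(id, seen.getOrDefault(id, List.of()));
        }
        return result;
    }

    private static String stripLiteralsAndComments(String line) {
        String noStrings = STRING_LITERAL.matcher(line).replaceAll("\"\"");
        int comment = noStrings.indexOf("//");
        return comment >= 0 ? noStrings.substring(0, comment) : noStrings;
    }
}
```

For a small class that declares iStudentNumber on line 2 and uses it again on lines 4 and 6, extract returns [2, 4, 6] for that name; a mention inside a string literal is not counted. Matching definitions first keeps the result limited to program-defined identifiers, while the second pass supplies the counts.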
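The classification step of Section III.A can be sketched with java.util.regex and the four patterns of Table II. One assumption is made explicit here: read literally by a regex engine, the bracketed Hungarian pattern in Table II is a character class (it accepts any run of the single characters a, b, (, y, ), c and so on), so this sketch writes the listed prefixes as an alternation instead. The names NamingClassifier, Convention and classify are hypothetical. Hungarian is tried first because an identifier such as iStudentNumber also satisfies the Camel pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical classifier built from the Table II patterns (Hungarian prefixes as an alternation). */
final class NamingClassifier {

    enum Convention { HUNGARIAN, CAMEL, PASCAL, UNDERSCORE }

    // Insertion order is the matching order; LinkedHashMap preserves it.
    private static final Map<Convention, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put(Convention.HUNGARIAN, Pattern.compile(
            "(?:a|b|by|cb|cr|cx|cy|dw|fn|h|i|l|lp|m_|n|np|p|s|sz|w)+(?:[A-Z][a-z]+)+"));
        PATTERNS.put(Convention.CAMEL, Pattern.compile("[a-z]+(?:[A-Z][a-z]+)+"));
        PATTERNS.put(Convention.PASCAL, Pattern.compile("(?:[A-Z][a-z]+)+"));
        PATTERNS.put(Convention.UNDERSCORE, Pattern.compile("(?:[a-z]+_)+[a-z]+"));
    }

    private NamingClassifier() {}

    /** Returns the first convention whose pattern matches the whole identifier, if any. */
    static Optional<Convention> classify(String identifier) {
        for (Map.Entry<Convention, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(identifier).matches()) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty(); // miscellaneous: ignored in the statistics
    }
}
```

With this sketch, classify("DisplayInfo") yields PASCAL, classify("print_employee_paychecks") yields UNDERSCORE, and a name such as i is left unclassified. Note that usloop from Table III also matches none of these patterns as written, because it has neither a capitalized word nor an underscore.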
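The rough and precise consistency indices of Section III.B can be reproduced with a few lines of Java. The helper names are hypothetical. tally builds X1 to X4 from per-identifier data (the rough tally counts each identifier once, the precise tally sums its occurrences), and two standard-deviation variants are provided: the population form (divide by n) used in the worked example for Table III, and the sample form (divide by n - 1). Checked by hand, the Consistency rows of Tables IV and V agree with the sample form to the printed precision (for example 752.0 for the 2006 rough column and 127.08 for codingstandards), so the sketch assumes that form was used for the tables.

```java
/**
 * Hypothetical helpers for the consistency indices. Convention indices follow the
 * worked example: 0 = Hungarian, 1 = Pascal, 2 = Camel, 3 = Underscore.
 */
final class NamingConsistency {

    private NamingConsistency() {}

    /** Builds X1..X4: rough counts each identifier once, precise sums its occurrences. */
    static double[] tally(int[] conventionOf, int[] occurrences, boolean precise) {
        double[] x = new double[4];
        for (int i = 0; i < conventionOf.length; i++) {
            x[conventionOf[i]] += precise ? occurrences[i] : 1;
        }
        return x;
    }

    /** Population standard deviation (divides by n), as in the Table III example. */
    static double populationStdDev(double... x) {
        return Math.sqrt(sumOfSquaredDeviations(x) / x.length);
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double... x) {
        return Math.sqrt(sumOfSquaredDeviations(x) / (x.length - 1));
    }

    private static double sumOfSquaredDeviations(double[] x) {
        double mean = 0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double sum = 0;
        for (double v : x) {
            sum += (v - mean) * (v - mean);
        }
        return sum;
    }
}
```

With the Table III data (conventions {0, 0, 1, 1, 2, 2, 3} and occurrences {15, 8, 2, 1, 6, 7, 15}), the rough tally is {2, 2, 2, 1} with a population standard deviation of about 0.43, and the precise tally is {23, 3, 13, 15} with about 7.12.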
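Section IV states that the evaluation system runs either in single-file mode or in batch mode over a folder and all its subfolders. A minimal sketch of that input step, assuming Java 9 or later and using java.nio.file.Files.walk, is shown here; JavaFileCollector and collect are hypothetical names, and the resulting list would then be fed to the extraction and classification steps.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical input handling for single-file and batch modes. */
final class JavaFileCollector {

    private JavaFileCollector() {}

    /**
     * Single-file mode: a regular file is returned as-is.
     * Batch mode: every .java file in the folder and all its subfolders, in sorted order.
     */
    static List<Path> collect(Path input) throws IOException {
        if (Files.isRegularFile(input)) {
            return List.of(input);
        }
        // Files.walk holds open directory handles, so the stream must be closed.
        try (Stream<Path> paths = Files.walk(input)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".java"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

Sorting the paths keeps batch reports stable from run to run, which makes results for the same project folder easier to compare across years.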

