You are on page 1of 5

Towards A Better Code Completion System by API

Grouping, Filtering, and Popularity-Based Ranking

Daqing Hou David M. Pletcher


Electrical and Computer Engineering Mathematics and Computer Science
Clarkson University, Potsdam, NY USA 13699 Clarkson University, Potsdam, NY USA 13699
dhou@clarkson.edu pletchdm@clarkson.edu

ABSTRACT the same time, this kind of reuse has also created a massive amount
Nowadays, programmers spend much of their workday dealing with of information and artefacts that many programmers have to deal
code libraries and frameworks that are bloated with APIs. One with. For example, the Standard Edition of the Java Development
common way of interacting with APIs is through Code Completion Kit version 1.6 (i.e., Java SE 6) has 3,777 classes and interfaces.
inside the code editor. By default, Code Completion presents in Over the past several years, the standard API library of Java SDK
a popup pane, in alphabetical order or by relevance, all accessible has grown nearly linearly, on average at a rate of 356 classes and
members available in the apparent type and supertypes of a receiver interfaces per year [10]. As another example, in a keynote speech 1 ,
expression. This default behavior for Code Completion should and Charles Petzold estimated that “[in the] .NET Framework 2.0[,]
can be further improved because (1) not all public methods are APIs [t]abulating only MSCORLIB.DLL and those assemblies that begin
and presenting non-API public members to a programmer is mis- with word System, [there are] over 5,000 public classes that in-
leading, (2) certain APIs are meant to be accessible only in some clude over 45,000 public methods and 15,000 public properties, not
limited contexts but not others, (3) the alphabetical order separates counting those methods and properties that are inherited and not
otherwise logically related APIs, making it hard to see their con- overridden.” In addition to the standard, core SDK APIs, program-
nection and to work with, and (4) commonly used APIs are often mers also need to learn to use third-party, domain-specific APIs.
presented long after much less used APIs due to suboptimal API Finally, on top of the core APIs, there are the vast amount of de-
sorting strategies. BCC (Better Code Completion) addresses these rived, nonetheless equally critical, information and artefacts such
problems by enhancing Code Completion so that programmers can as documentation, tutorials, wizards and recipes, and sample code
control how specific API elements should be sorted, filtered, and and applications for programmer consumption.
grouped. We report our preliminary validation results from testing Therefore, modern programmers must master the necessary skills
BCC with Java projects that make use of the AWT/Swing APIs. to effectively work with this vast amount of information artefacts
For one large project, the BCC approach reduces by over ninety and to overcome potential learning barriers [6]. They must learn to
percent the total number of APIs that a programmer would have to avoid both information overloading and information scarcity. This
scroll through using Eclipse’s Code Completion before settling on is far from a trivial task, and many programmers rely on tools and
the desired ones. features to help manage these APIs and artifacts, such as the IDEs
(Integrated Development Environments). Drawing from features
such as automated refactorings and code-assist operations, modern
Categories and Subject Descriptors IDEs have certainly changed the way software is developed. IDEs
D.2.3 [Software Engineering]: Coding Techniques and Tools— are particularly critical in helping programmers deal with the pro-
Object-oriented programming, Program editors; D.2.6 [Software liferation of APIs in libraries and frameworks.
Engineering]: Programming Environments—Integrated environ-
ments

General Terms
Algorithms, Documentation, Experimentation, Measurement

1. INTRODUCTION
Large application frameworks and libraries have clearly contributed
substantially to programmer productivity and software quality. At Figure 1: Code Completion can be invoked automatically or by
pressing CTRL-SPACE.

One of such features that has become standard in IDEs is Code


Permission to make digital or hard copies of all or part of this work for Completion. Code Completion regularly saves IDE users time when
personal or classroom use is granted without fee provided that copies are they work with difficult-to-remember or unfamiliar parts of APIs.
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to Obviously, Code Completion as currently implemented in IDEs
republish, to post on servers or to redistribute to lists, requires prior specific 1
permission and/or a fee. http://www.charlespetzold.com/etc/
RSSE ’10, May 4, 2010, Cape Town, South Africa DoesVisualStudioRotTheMind.html (All URLs ver-
Copyright 2010 ACM 978-1-60558-974-9/10/05 ...$10.00. ified on Jan. 31, 2010.)

26
(a) Sorting alphabetically by name all completion(b) Moving two relevant proposals to the top and
proposals available for the receiver. sorting the remaining proposals alphabetically.

Figure 2: Eclipse Code Completion (ECC, as of version 3.4) sorts completion proposals either alphabetically or by relevance.

such as Eclipse can still be improved. In this paper, we report on how preserve this advantage.
our experience with “Better Code Completion” (BCC), a research The second method is sorting by relevance. The completion pro-
prototype that demonstrates the feasibility of several such improve- posal computer assigns each proposal a relevance score during the
ments. More specifically, we show that ranking APIs based on computation process. However, relevance only comes into play
usage frequencies (popularity ranking) can reduce the number of when the expected return type of the completion invocation target
APIs that a programmer has to browse in order to select the one is non-trivial. For instance, if the completion process is invoked on
that she or he needs. BCC allows programmers to specify rules to the right side of an assignment statement, the type expected on the
control the presentation of APIs during Code Completion. With left side of the assignment statement is used to boost the relevance
rules, BCC can be customized to provide a more intuitive, logical scores of proposals that return or resolve to that type. Proposals
ordering and grouping of the list of completion proposals, to fil- that are not relevant as described above fall into alphabetical order.
ter out proposals that are not intended for programmers to see, and An example of sorting by relevance is shown in Figure 2b.
to show a proposal only in certain specified contexts. (Currently, With Code Completion, programmers are freed from having to
rules are specified with a simple grammar and stored in a text file. remember all the specific details about each API. Instead, they
The details will be described elsewhere.) BCC is developed as an rely on Code Completion as a just-in-time reminder to help re-
Eclipse plugin [8]. call and access these details only as they are needed. In this way,
they can focus on higher level, more important design informa-
tion such as relevant classes and type hierarchies. For example,
2. BACKGROUND AND MOTIVATION for Java 2D graphics, a programmer may need to remember only
The Code Completion displays a list of class members that can generally that java.awt.Graphics2D contains various APIs
be accessed from or invoked upon a specified “receiver” object in- for painting, such as drawOval() and drawString(), but
stance. An example in Java from the Eclipse IDE is shown in Fig- she or he does not have to remember the exact names, parame-
ure 1. In Figure 1, a local variable of type java.awt.Graphics2D ters, or the detailed semantics for each of these APIs. These details
named g has had Code Completion invoked upon it by typing “g.”. can be discovered using Code Completion on the fly. In particu-
Typically, hitting “.” after either an identifier, expression, or the lar, Code Completion allows a programmer to conveniently browse
“this” or “super” keywords invokes the code-completion process. and choose the relevant APIs, and offers easy access to the asso-
During this process, Code Completion in Eclipse computes a list ciated documentation, all in the context where the programmer is
of “completion proposals”, which are either method invocations or actively coding. Thus, Code Completion helps programmers avoid
field accesses that could be used to complete the current Java ex- switching work contexts and the ensued interruptions to their train-
pression. If the user continues to type after the initial “.”, the list of-thoughts. In this way, Code Completion supports programmers
of proposals is automatically filtered such that only those whose in best utilizing their brain power so that they can focus on more
names begin with the typed-in letters are displayed. As shown in important information, handle larger problems, and work more ef-
Figure 1, the completion proposals are listed one per line, in order fectively. Over time, Code Completion also helps a programmer
of member name, showing the member name and argument names incrementally get familiar with the APIs that she or he is using.
and types (if applicable), return type (or declared type for a field), Two typical scenarios can be identified when a programmer uses
and the first type in the type hierarchy on the path from the receiver Code Completion to complete an API usage expression. In the first
type to java.lang.Object where the member was declared. scenario, the programmer may already know the exact spelling, or
As of version 3.4, ECC (Eclipse’s Code Completion) has two a good portion of the prefix of the API name to be used. In this
sorting methods for completion proposals. The first, as shown in case, she or he can just type the name or prefix as usual without
Figure 2a, is to sort the completion proposals by member name, in major interruption from Code Completion. In the second scenario,
alphabetical order. This is usually ineffective for objects with large the programmer may know only a receiver expression and a rough
APIs, as this sorting ignores the level in the hierarchy at which idea of what is needed to achieve with the receiver. In this case,
the member is declared or overridden at. However, an important she or he may rely on Code Completion as a quick, within-context
advantage of the alphabetical order is its familiarity and good pre- alternative to searching and browsing documentation, as shown by
dictability (in the sense that every time Code Completion is invoked the JavaDoc in the right portion of Figure 2a. We believe that the
for the same type, the list of completion proposals remains in the second scenario is the main target for optimizing Code Completion.
same order). Any improvement to Code Completion must some-

27
(a) BCC sorts proposals by position of declaration (b) BCC sorts proposals by popularity.
type in type hierarchy and then alphabetically by
name within type.

Figure 3: BCC sorts proposals by the type hierarchy or by popularity ranking.

Since Code Completion is an interface through which program- We hypothesize that the grouping mechanism make it more ef-
mers view and access APIs on a daily basis, it is important to de- fective for programmers to learn and use APIs than without as it
sign it to be highly usable. The usability issue is especially crucial can be used to present APIs logically. Moreover, we believe that by
when the number of APIs to be viewed by a programmer becomes adding logical structures into the API space, grouping helps main-
too large (say, more than twenty). In such circumstances, it can be tain the predictability of the popup pane. This is important because
beneficial if a programmer can see the needed APIs as early as pos- before any new way of sorting completion proposals other than the
sible. While reaching this goal, it is necessary to keep the program- alphabetical order can be used, it must ensure that the popup pane
mer remaining oriented in the API space, which can be achieved does not become too chaotic to the programmers.
by adding some kind of logical structures into APIs. It can also be
helpful if Code Completion can filter out APIs that are irrelevant to 3.2 Sorting APIs.
the programmer’s current task. In these ways, the number of APIs Second, BCC provides two more options for sorting completion
that a programmer has to spend time on is reduced. proposals than ECC.
• Type-hierachy-based sorting.
3. BCC: BETTER CODE COMPLETION BCC sorts the list of completion proposals in the order from
At the core of any type-based Code Completion system is an en- the declared type to the root in the type hierarchy (java.lang.-
gine capable of computing the static type of a receiver expression Object), then alphabetically within each type. An example
that a programmer is currently working on. All APIs from this type of this behavior is shown in Figure 3a, where methods from
and its supertypes that are available for use in the current code con- JButton are shown at the top of the list before AbstractButton,
text, which are also known as completion proposals, are presented unlike the default Code Completion ECC shown in Figure 2,
to the programmer in a popup pane in a certain order. which mixes up methods without considering their relative
The presentation of the popup pane can be customized by filter- positions in the type hierarchy.
ing and sorting the list of completion proposals in different ways. • Popularity-based sorting.
Other than the two sorting methods introduced in Section 2, the The frequencies, or popularity by which APIs have been in-
default ECC offers very limited options for user customization. voked statically in source code can be used to sort APIs.
The goal for BCC is to provide programmers with more control The more frequent an API is used, the earlier it should ap-
over how code completion proposals can be grouped, filtered, and pear in the popup pane. API use frequencies can be dynami-
sorted. The main design ideas of BCC are described as follows. 2 cally computed or statically configured. Static configuration
can be based on information gathered from “representative”
3.1 Grouping APIs. projects that use the same APIs. We are currently experi-
First, BCC allows programmers to specify which methods should menting to determine which approach, or combination of ap-
belong to the same group. Methods of the same group will always proaches may yield the best possible solution.
be displayed together in the Code Completion popup pane. For ex- An example of this behavior is shown in Figure 3b, where
ample, the add()/getComponent()/remove() methods in java.awt.Co- two methods are shown at the top of the list before other
ntainer can be grouped together logically as they control how child methods for JButton and AbstractButton. This is because
components are added or removed from a container widget. A these two methods were used and their popularity has been
group can span multiple classes in the same type hierarchy. A spe- taken into account when BCC sorts the APIs.
cial kind of grouping is by the type where APIs are defined (group-
ing by types). 3.3 Filtering APIs.
2
The BCC implementation for the most part is a straightforward Finally, BCC also allows users to defined context-sensitive filters
customization of the modular design of ECC. Thus in this paper we to filter out completion proposals that is certainly irrelevant in the
will focus mainly on the design ideas. current coding context. Three types of filters can be applied.

28
(a) BCC filters out public method Upda-(b) BCC filters out public method Con- (c) BCC keeps add() available for JPanel.
teUI(). tainer.add() for JLabel.

Figure 4: BCC API filters.

• Private Filter. strategies may yield the highest improvement. The second question
The class javax.swing.JComponent contains the public method is the most important as it will help determine how BCC can be de-
updateUI(). Although public, this method is not intended to ployed in practice. We use the rank of an API in the popup pane as
be directly accessed by client code. It is made public be- the metric of improvement, with a value of 1 for the top API, 2 for
cause the Swing artifact(s) invoking this method are located the second API, and so on. If the adoption of a new strategy leads
outside the package javax.swing. to statistically smaller overall API ranks for all projects than those
BCC can be configured to filter out such non-API public of ECC, then it is considered a better strategy.
methods (Figure 4a). For our validation, we decided to test BCC against the Java AWT/
• Private Filter With Receiver Exceptions. Swing APIs. Table 1 depicts the nine projects used during our val-
Single inheritance in Java has been shown to generally lead idation. These include GUI frameworks that extend Swing and
to cleaner design, but it can have negative consequences too. AWT, as well as applications that use Swing and AWT. For each
One example is that class javax.swing.JComponent inherits project, we gathered the number of Swing/AWT API calls as well
from java.awt.Container. The designers probably made this as the number of distinct APIs called by the project. In particu-
decision to avoid the code duplication that would result from lar, jide-common, OpenSwing, SwingX, and the Zeus Java Swing
having each “true container” class redefine the exact same Components library are UI frameworks that invoke a total of 27166
Container methods. However, this means that non-container times of Swing and AWT methods. jEdit, LAPIS, SweetHome3D,
classes like JLabel also inherit from Container, even though and Sun’s Swing Tutorial Examples are applications that make use
JLabel is probably never intended to be a container. This of Swing and AWT by invoking a total of 14280 times of Swing
filter makes methods inaccessible (Figure 4b) except upon and AWT methods.
certain subclasses (e.g., JPanel in Figure 4c). We configured BCC with different combinations of strategies.
• Subclass Only. For each configuration of BCC, we ran it over each validation project
The public paint() method in class java.awt.Component should and gathered the total ranks. This is done by a program that visits
generally be called by an instance’s own repaint() method, or each method call and programatically invoking BCC to perform
from within a subclass of Component that wishes to extend code completion, which, as a result, computes an ordered list of
paint(). With BCC, paint() can be made accessible only from completion proposals. The position of the method to be called in
within subclasses of Component. that list is added to the total rank for the project. Table 2 shows a
part of the results of our validation so far. The first column lists the
4. PRELIMINARY VALIDATION AND DIS- projects. The second and third columns contain the ranking results
for ECC’s alphabetical order and ECC’s by-relevance order. The
CUSSION next four columns are for four of BCC’s configurations.
Table 2 shows that all of BCC strategies significantly improve
Table 1: Java Projects Used in BCC Validation. over ECC’s alphabetical order and by-relevance order. In particu-
lar, the “ranking only” column used only popularity-based ranking
Projects #API Calls #Unique Calls strategy (the ranking is computed dynamically; every time an API
LAPIS 1944 562 is used, its use count, or ranking increments by 1). Table 2 shows
OpenSwing 6865 721 that in total, the adoption of this strategy alone reduces 80.19% over
SweetHome3D 4368 674 ECC’s alphabetical order and 74.10% over the by-relevance order.
Swing Tutorial Examples 2629 547 Table 2 also shows that the combination of type-hierarchy-based
SwingX 8233 1250 order, filtering, and dynamic ranking yields the highest percent-
jEdit 5339 734 age of overall rank reduction (the right most column, 88.22% and
jide-common 11032 1007 84.60%). (The “type” strategy represents “type-hierarchy-based
zeus 1036 261 sorting” described previously. When combined with the type-hiearchy-
netbeans 151567 2928 based order, the popularity-based order has a higher priority.) En-
couraged by this result, we applied this combination to a larger
project, netbeans, and the percentage of reduction goes even higher
We have tried to empirically answer two validation questions
(the bottom section of Table 2, 91.96% and 90.23%).
about BCC. Our first validation question is whether each of BCC’s
Note that Table 2 shows only part of our overall results. For ex-
customization strategies outlined previously in Section 3 may ac-
ample, we tested the individual effects of the type-hierarchy-based
tually lead to any improvement over the default ECC. Our second
sorting and the user-defined grouping on ranking. These results
validation question is which combination of BCC’s customization

29
Table 2: Partial results of running ECC and BCC over validation projects. The “ranking only” column used only popularity-based ranking
strategy, and the “type” strategy represents “type-hierarchy-based sorting”. netbeans is added only recently.

Projects alphabetical relevance ranking only ranking+filtering ranking+type ranking+type+filtering


jEdit 619294 469516 104375 84436 72549 61102
Jide-common 824731 568139 163536 135446 133308 114654
LAPIS 182356 132981 49037 38494 27831 23895
OpenSwing 884303 711841 158490 129528 115456 98952
SweetHome3D 370844 286319 75136 58720 47438 39157
Swing Tutorial Examples 395301 340554 71544 57918 35188 31136
SwingX 698256 522218 164982 135875 123036 105812
zeus 174136 141989 34844 27535 16556 14137
Total 4149221 3173557 821944 667952 571362 488845
% reduction from alpha 23.51% 80.19% 83.90% 86.23% 88.22%
% reduction from relevance 74.10% 78.95% 82.00% 84.60%
netbeans 23924204 19706197 1924340
% reduction from alpha 17.63% 91.96%
% reduction from relevance 90.23%

are not shown. However, it is clear from Table 2 that the type- and production of APIs [5, 4].
hierarchy-based sorting adds additional value to the popularity-based BCC’s API filtering mechanism filters out public methods that
sorting strategy. are not APIs as well as APIs that are meant to be used in only lim-
We are experimenting with the static configuration of the popularity- ited contexts, such as in a subclass. Interestingly, the issue of non-
based ranking, hoping that a universally effective popularity-based API public methods seems to be important enough that a proposal
order can be found. But it seems that a combination of static and to extend Java language with the so-called “superpackage” modu-
dynamic popularity-based approaches and the other grouping and larity mechanism is being worked on as part of the Java community
filtering strategies would lead to the ultimate best configuration. process 3 .

5. RELATED WORK 6. REFERENCES


[1] M. Bruch, M. Monperrus, and M. Mezini. Learning from
The two most closely related work are [9] and [1]. Both incor-
Examples to Improve Code Completion Systems. In
porate additional information and knowledge (program history and
ESEC/SIGSOFT FSE, pages 213–222, 2009.
API go-togetherness, respectively) to help API consumers more ef-
fectively select APIs. Both work try to reduce the number of APIs [2] S. Han, D. Wallace, and R. Miller. Code Completion From
that a programmer has to go through before settling on the one she Abbreviated Input. In ASE’09, 2009.
or he needs. However, none of them have adequately addressed [3] R. Hill and J. Rideout. Automatic Method Completion. In
the issue of how to present the recommended APIs in the origi- ASE’04, pages 228–235, 2004.
nal Code Completion popup pane, or the related potential usability [4] R. Holmes and R. J. Walker. Informing Eclipse API
problems. New contributions of BCC include type-hierarchy-based Production and Consumption. In ETX’07, pages 70–74,
sorting, grouping, and filtering. 2007.
API patterns can be useful information aids to programmers in [5] R. Holmes and R. J. Walker. A Newbie’s Guide to Eclipse
the form of an additional view showing the recommended APIs [1, APIs. In MSR ’08, pages 149–152, 2008.
11]. Such tools can be helpful to a programmer engaging in the [6] A. J. Ko, B. A. Myers, and H. H. Aung. Six Learning
broader design exploration task, which requires deep thought and Barriers in End-User Programming Systems. In VL/HCC,
reflection, and tends to take longer time than using Code Comple- pages 199–206, 2004.
tion to insert a method call. For design exploration, design quality, [7] G. Little and R. C. Miller. Keyword Programming in Java.
rather than coding speed and work efficiency, is of paramount im- Autom. Softw. Eng., 16(1):37–71, 2009.
portance. Although some developers may use Code Completion for [8] D. M. Pletcher and D. Hou. BCC: Enhancing Code
design exploration, it is not necessarily the best tool for that task. Completion for Better API Usability. In ICSM’09, pages
Other related tools that are aimed at improving coding efficiency 393–394, 2009. Tool Demonstration.
in the code editor include keyword programming, abbreviation based [9] R. Robbes and M. Lanza. How Program History Can
completion, and automated method completion. Keyword program- Improve Code Completion. In ASE’08, pages 317–326, 2008.
ming takes words that may appear in an API as input to create an [10] Y. Ye, Y. Yamamoto, K. Nakakoji, Y. Nishinaka, and
expression [7]. In this way, it frees a programer from remember- M. Asada. Searching the Library and Asking the Peers:
ing the specific API names and may reduce a programmer’s mem- Learning to Use Java APIs on Demand. In PPPJ, pages
ory load. Abbreviation based completion speeds up coding effi- 41–50, 2007.
ciency by using abbreviated input to query syntactic valid code
[11] H. Zhong, T. Xie, L. Zhang, J. Pei, and H. Mei. MAPO:
snippets [2]. Automated Method Completion exploits code similar-
Mining and Recommending API Usage Patterns. In ECOOP,
ity to recommend code templates similar to what the programmer
pages 318–343, 2009.
is working on in the code editor [3].
BCC uses API popularity to sort the APIs for Code Completion. 3
JSR 294: Improved Modularity Support In the Java Programming
API popularity has been proposed for informing the consumption Language. http://jcp.org/en/jsr/detail?id=294

30

You might also like