by about 10 people.
cesses.
[Figure: RELEASE TIMELINE. Recoverable labels: releases I2, I3, I5 and D5, D6, D7.]
The data for this study comes from the complete change
[Figure: time-series panels by YEAR (84 to 96): MRs, IMRs, NUMBER OF ACTIVE PROGRAMMERS PER DAY, and NUMBER OF ACTIVE FEATURES PER DAY.]
[Figure: histogram of NUMBER OF FILES against NUMBER OF DEVELOPERS ON THE SAME FILE IN THE INTERVAL OF I6 (0 to 10, >10). Bar labels in extraction order: 25.7%, 25%, 14.1%, 9.7%, 7%, 5.8%, 3.6%, 2.5%, 1.9%, 2.1%, 1.5%, 1.1%.]
modifying each file in the development interval of release I6. Bar N shows the percentage of files which were worked on by N developers during the development interval of release I6.
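The per-file developer counts behind such a histogram (bar N as the percentage of files worked on by N developers during a release interval) can be sketched as follows. This is a minimal illustration, not the authors' tooling: the `(file, developer)` record shape, the function name, and the sample data are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def developers_per_file_histogram(deltas, all_files, cap=10):
    """Percentage of files touched by N distinct developers.

    deltas    -- iterable of (file_name, developer) pairs, already
                 restricted to the release interval (hypothetical shape)
    all_files -- every file in the subsystem, so untouched files count as 0
    cap       -- counts above this are pooled into a '>cap' bucket
    """
    devs_by_file = defaultdict(set)
    for file_name, developer in deltas:
        devs_by_file[file_name].add(developer)

    counts = Counter()
    for file_name in all_files:
        n = len(devs_by_file.get(file_name, ()))
        counts[n if n <= cap else f">{cap}"] += 1

    total = len(all_files)
    return {bucket: 100.0 * c / total for bucket, c in counts.items()}


# Hypothetical data: four files, two developers.
sample_deltas = [("a.c", "A"), ("a.c", "B"), ("b.c", "A"), ("a.c", "A")]
sample_files = ["a.c", "b.c", "c.c", "d.c"]
histogram = developers_per_file_histogram(sample_deltas, sample_files)
```

Counting distinct developers with a set means repeated deltas by the same developer on one file do not inflate the bar for that file.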
4 DATA AND ANALYSIS
4.1 Levels of Parallel Development
The 5ESS system is maintained as a series of releases,
[Figure 5 panels: developers P2 to P9 (top) and ACTIVE PROGRAMMERS (bottom), plotted by day, apparently from 8/5/89 to 8/20/89.]
Figure 5: A closer look at developer activity. This is a closer look at the developer activity during the busiest period (8/89). Each line in the top panel shows MRs being worked on by 9 developers during this period. The X's indicate when they made deltas into the file. The solid line in the bottom panel shows the number of developers who have open MRs on each of those days. It is a magnification of the developer panel from the previous figure. The dashed line shows the number of developers who actually made deltas on each day.

[Figure: file versions with y-axis DELTA - DEVELOPER (1A, 2A, 3A, 4B, 5C).]
line represents a version of the file as it was changed by a delta (denoted 1-5). The y-axis also encodes the developer who made the delta (denoted A-C). The delta sequence is read from top to bottom. The lines connecting the horizontal lines show where lines have been changed from one version to the next. An upright triangle shows where new code was inserted while an inverted triangle shows where code was deleted. The trapezoids show modifications of blocks of code. Note that this figure only shows a fragment of each program version, approximately from lines 7850-7950.
developer B put in changes on top of A's changes. Finally some of B's changes were modified by developer C on the same day.

Across the subsystem, 3% of the deltas made within 24 hours by different developers physically overlap each other's changes. Note that physical overlap is just one way by which one developer's changes can interfere with others. We believe that many more conflicts arise as a result of parallel changes to the same data flow or program slice.

4.4 Multilevel Analysis of Parallel Development

In this section we make liberal use of histograms to provide a clear picture of the data that would not be evident if we were to report merely the minimum, mean, and maximum of each distribution. It is important to notice that the tails of several distributions are long and fall off more slowly than the Poisson or binomial distributions (classical engineering distributions). This is extremely important to consider in designing tools: if a tool is designed around the mean value, it will not be particularly useful for the critical cases that need the support the most, namely, those cases represented by the tail of the distribution.

To understand the amount of parallelism going on at the different levels, we examine the number of features, IMRs and MRs being developed per day. We then look at four measures associated with the amount of work within each feature, IMR and MR: their intervals, the number of files affected, the number of MRs involved, and the number of developers involved. Table 1 summarizes these data.

Figure 7 shows the frequency distributions of features, IMRs, and MRs being worked on per day. The feature and IMR distributions have means of 25 and 22, and maximum values of 86 and 62, respectively. On the other hand, there is a mean of 70 MRs open per day, and a maximum of more than 200. Note that in all cases the tail is very long with respect to the mean.

Figure 8 shows the frequency distributions of development intervals at the three levels. The intervals are measured by taking the dates of the first and last delta associated with that feature, IMR, or MR, and computing the difference. Thus the interval reflects the activity only with respect to coding (see footnote 3). One observation here is that the shapes of all three distributions appear to be similar, even though their scales are orders of magnitude apart. Note also that 46% of the IMRs and 50% of the MRs are opened and solved on the same day. More importantly, the tails here are even longer with respect

3 For example, the feature interval measured excludes other feature activities like estimation, planning, requirements, design, and feature test.
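The interval measure described above (the difference between the dates of the first and last delta for a feature, IMR, or MR) can be sketched in a few lines. This is only an illustration under assumptions: the `(unit_id, date)` record shape, the function names, and the sample data are hypothetical, and treating an interval of 0 days as "opened and solved on the same day" is our reading of the "< 1" entries in the data summary.

```python
from datetime import date


def development_intervals(deltas):
    """Map each unit (feature, IMR, or MR) to its coding interval in days.

    deltas -- iterable of (unit_id, delta_date) pairs (hypothetical shape)
    """
    first, last = {}, {}
    for unit_id, day in deltas:
        if unit_id not in first or day < first[unit_id]:
            first[unit_id] = day
        if unit_id not in last or day > last[unit_id]:
            last[unit_id] = day
    # Interval = date of last delta minus date of first delta.
    return {u: (last[u] - first[u]).days for u in first}


def same_day_fraction(intervals):
    """Fraction of units whose first and last delta fall on the same day."""
    return sum(1 for d in intervals.values() if d == 0) / len(intervals)


# Hypothetical MRs: mr1 spans three calendar days, mr2 is same-day.
sample = [
    ("mr1", date(1989, 8, 5)),
    ("mr1", date(1989, 8, 7)),
    ("mr2", date(1989, 8, 6)),
    ("mr2", date(1989, 8, 6)),
]
intervals = development_intervals(sample)
fraction = same_day_fraction(intervals)
```

Because only delta dates enter the computation, the result reflects coding activity alone, as the text notes.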
                           Features              IMRs                  MRs
                           Min   Mean    Max     Min   Mean   Max      Min   Mean   Max
Being worked on per day    0     25.3    86      0     21.8   62       1     69.3   223
Interval (days)            1     318.5   3344    < 1   14.6   2233     < 1   10.1   2191
Files affected             1     31.0    906     1     4.3    388      1     1.1    15
MR count                   1     34.6    2188    1     2.6    86       n/a   n/a    n/a
Developer count            1     4.0     98      1     1.1    9        n/a   n/a    n/a
Table 1: Data summary. This table summarizes the data to be used in analyzing the degree of parallelism.
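To make the long-tail point concrete, the following sketch asks how likely the observed maximum of 223 MRs per day would be if the daily count followed a Poisson distribution with the observed mean of 69.3 (both values from Table 1). Treating days as independent Poisson draws is purely an assumption for illustration, not a model used in the study; the function names are ours.

```python
import math


def poisson_log_pmf(k, lam):
    """Natural log of P(X = k) for X ~ Poisson(lam)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def poisson_tail(k_min, lam, terms=2000):
    """Approximate P(X >= k_min) by summing `terms` pmf values in log space."""
    logs = [poisson_log_pmf(k, lam) for k in range(k_min, k_min + terms)]
    peak = max(logs)
    return math.exp(peak) * sum(math.exp(v - peak) for v in logs)


mean_mrs, max_mrs = 69.3, 223  # MRs being worked on per day, Table 1

# Distance of the maximum from the mean in Poisson standard deviations
# (for a Poisson variable the standard deviation is sqrt(mean)).
sigmas = (max_mrs - mean_mrs) / math.sqrt(mean_mrs)

# Probability of seeing a day at least as busy as the observed maximum.
tail = poisson_tail(max_mrs, mean_mrs)
```

Under this assumption the maximum sits roughly 18 standard deviations above the mean and the tail probability is vanishingly small, which is one way to see that the observed distribution falls off far more slowly than a Poisson one.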
[Figure 7 panels: histograms of NUMBER OF DAYS against NUMBER OF FEATURES PER DAY, NUMBER OF IMRs PER DAY, and MRs per day (axis label not recovered).]
Figure 7: Feature, IMR, MR distribution per day. These histograms show the distribution of the number of features, IMRs, and MRs being worked on per day.

[Figure 8 panels: histograms of NUMBER OF FEATURES against FEATURE INTERVAL IN DAYS, NUMBER OF IMRs against IMR INTERVAL IN DAYS, and NUMBER OF MRs against MR interval (axis label not recovered).]
Figure 8: Interval distributions. These histograms show the development interval distributions for features, IMRs and MRs in number of days.
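A per-day distribution of the kind shown in Figure 7 can be derived from the first and last delta dates of each unit. The sketch below assumes that a unit counts as "being worked on" on every day from its first delta through its last delta inclusive; the source does not spell out this rule, and the function names and sample data are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def active_per_day(intervals):
    """Count how many units are open on each day.

    intervals -- iterable of (first_day, last_day) pairs, inclusive
    """
    per_day = Counter()
    for first_day, last_day in intervals:
        for offset in range((last_day - first_day).days + 1):
            per_day[first_day + timedelta(days=offset)] += 1
    return per_day


def distribution(per_day, start, end):
    """Histogram {units open: number of days} over [start, end], zeros included."""
    hist = Counter()
    day = start
    while day <= end:
        hist[per_day.get(day, 0)] += 1
        day += timedelta(days=1)
    return hist


# Hypothetical units: one spanning 8/5 to 8/7, one opened and closed on 8/6.
units = [
    (date(1989, 8, 5), date(1989, 8, 7)),
    (date(1989, 8, 6), date(1989, 8, 6)),
]
per_day = active_per_day(units)
hist = distribution(per_day, date(1989, 8, 4), date(1989, 8, 8))
```

The minimum, mean, and maximum of the per-day counts then give one row of a summary like Table 1, while the histogram exposes the tail that those three numbers hide.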
[Figures: histograms. Recoverable axis labels: NUMBER OF FEATURES, NUMBER OF IMRs, FILES TOUCHED BY MR.]
is only 10%.
[Figure: histogram with x-axis DEVELOPERS PER IMR.]
versions for a file. A tighter upper bound would be to

Figure 12: Number of MRs per file. This histogram shows the distribution of the number of MRs affecting each file over the lifetime of the file.

[Figure 13: histogram with x-axis MAXIMUM NUMBER OF MRs PER DAY PER FILE (1 to 6, >6).]
Figure 13: Maximum number of MRs per file per day. This histogram shows the distribution of the maximal number of MRs that affect each file in a day.

of summarization. Thus we argue that we have the necessary internal validity.

It is in the context of external validity that we must be satisfied with arguments weaker than we would like. We argue from extra data (namely, visualizations of the entire 5ESS system similar to Figure 1) that this subsystem is sufficiently representative of the other subsystems to act as their surrogate. The primary problem then is the representativeness of 5ESS as an embedded, real-time, and highly reliable system. In its favor are the facts that it is built using a common language (C) and development platform (UNIX). Also in its favor are the facts that it is an extremely large and complicated system development and that problems encountered here are at least as severe as those found in smaller and less complicated developments. Thus we argue that our data has a good level of external validity and is generalizable to other developments of similar domains.

ferent developers to the same files within a day of each other and some of these may interfere with each other.

Over the interval of a particular release (I6), the number of files changed by multiple developers is 50% which, while not concurrent with respect to the MR level, is concurrent with respect to the release. These may also have interfering changes

ticular, the tails of the distributions are the significant factors to consider in technical support, not the mean values. In both the cases of workspaces and merging, we claim that those critical factors have not been understood or appreciated.

The data in subsection 4.4 suggests that, if each MR had its own workspace, we would need on the order of 70 to 200 workspaces per day for this particular subsystem. (And this is just one of 50 5ESS subsystems!) Moreover, since 50% of MRs are solved in less than a day, the cost and complexity of constructing and destroying workspaces becomes very important. One might reduce the number of workspaces per day by assuming one workspace per IMR or per feature. Doing so introduces further coordination problems since there may be more than one developer working on the IMR or feature.

Given the multi-level nature of feature development, one might imagine the need for a hierarchical set of workspaces [9] such that there is a workspace for each feature, a subset of workspaces for each IMR for that feature, and then individual workspaces for each MR. In either case, further studies are needed to determine the costs and utility of workspaces in supporting the phenomena we have found in this study.

The utility of the current state of merge support
depends on the level of interference versus non-interference. The data in subsection 4.5 indicates that about 45% of the files can have 2 to 16 parallel versions with potentially interfering changes. It is not clear how well current merge technologies will be able to support this degree of parallel versions: how do you merge 16 parallel versions? The data we have uncovered certainly leads us to be sympathetic with Adele's claim that frequent updates are necessary for coordinated changes and that waiting until commit time will lead to parallel versions that cannot be merged without some very costly overhead and coordinated effort. In fact, the supported strategy is what is left unsupported in these developments.

Further studies are needed to assess the validity and utility of merge technologies. We note in the next section one such study that will help to assess this area.

The synchronize and build strategy poses a problem in this context where features are the primary unit of work. Features represent a set of logically coherent changes to the system. As noted in Figure 9, features have a very long-tailed distribution, with the maximum number of files per feature shown in Table 1 as 906. Synchronizing at the MR level is not a problem since most MRs affect only one file. However, each MR represents only a partial solution to a problem as represented by an IMR and is not a useful candidate for coherence and consistency. IMRs, on the other hand, each represent a specific problem in implementing a feature and have a mean value for the files affected of 4.3 per IMR. However, as we have noted elsewhere, the mean is not a satisfactory indication of the problem since the tail here is again a very substantial one, having a maximum value of 388 files for at least one IMR.

Here again, further studies are needed to assess the appropriate level of parallelism that is useful in implementing such a synchronize and build strategy.

6.3 Future Directions

We have looked at only the prima facie conflicts, namely, those where there are changes on changes or changes within a day of each other. A more interesting class of conflicts is those which we might term semantic conflicts. These cases arise where changes are made to the same slices of the program and hence may interfere with each other semantically. This phenomenon requires us to look very closely at the files themselves via some program analysis tools.

Once we have this level of understanding and knowledge of interference, it will be interesting to see if there are any correlations between the associated quality data and programs where there are high degrees of parallel changes and/or interference.

ACKNOWLEDGEMENTS

The Code Decay Project [2] is a multi-disciplinary and multi-institution project for which a common infrastructure has been created in support of multiple strands of software engineering research. We wish to thank members of the project who have provided us with background information, insightful discussions, technical suggestions, and general support which led to this work.

REFERENCES

[1] Frederick P. Brooks, Jr. No silver bullet: Essence and accidents of software engineering. IEEE Computer, pages 10-19, April 1987.
[2] Code decay home page. http://www.bell-labs.com/org/11359/projects/decay.
[3] Michael A. Cusumano and Richard W. Selby. Microsoft Secrets. The Free Press, 1995.
[4] Jacky Estublier and Rubby Casallas. The Adele configuration manager. In Walter F. Tichy, editor, Configuration Management. Trends in Software. John Wiley & Sons, 1994.
[5] Barney G. Glaser and Anselm L. Strauss. The Discovery of Grounded Theory: Strategies for Qualitative Research. Aldine Publishing Company, 1967.
[6] Rebecca E. Grinter. Doing software development: Occasions for automation and formalisation. In Proceedings of the European Conference on Computer Supported Cooperative Work, Lancaster, U.K., September 1997.
[7] Susan Horwitz, Jan Prins, and Thomas Reps. Integrating noninterfering versions of programs. ACM Trans. on Programming Languages and Systems, 11(3):345-387, July 1989.
[8] Charles M. Judd, Eliot R. Smith, and Louise H. Kidder. Research Methods in Social Relations. Harcourt Brace Jovanovich College Publishers, 1991.
[9] Gail E. Kaiser and Dewayne E. Perry. Workspaces and experimental databases: Automated support for software maintenance and evolution. In Proceedings of the 1987 International Conference on Software Maintenance, pages 108-114, Austin, Texas, September 1987.
[10] David B. Leblang. Personal communication.
[11] David B. Leblang. The CM challenge: Configuration management that works. In Walter F. Tichy, editor, Configuration Management. Trends in Software. John Wiley & Sons, 1994.
[12] Alex Mahler. Variants: Keeping things together and telling them apart. In Walter F. Tichy, editor, Configuration Management. Trends in Software. John Wiley & Sons, 1994.
[13] K.E. Martersteck and A.E. Spencer. Introduction to the 5ESS(TM) Switching System. AT&T Technical Journal, 64(6 part 2):1305-1314, July-August 1985.
[14] Dewayne E. Perry. System compositions and shared dependencies. In 6th Workshop on Software Configuration Management, Berlin, Germany, March 1996.
[15] Marc J. Rochkind. The Source Code Control System. IEEE Trans. on Software Engineering, SE-1(4):364-370, December 1975.
[16] Walter Tichy. Design, implementation and evaluation of a revision control system. In Proceedings of the 6th International Conference on Software Engineering, pages 58-67, Tokyo, Japan, September 1982.
[17] P. A. Tuscany. Software development environment for large switching projects. In Proceedings of Software Engineering for Telecommunications Switching Systems Conference, 1987.
[18] Robert K. Yin. Case Study Research: Design and Methods. Sage Publications, 2nd edition, 1994.