
Gender in End-User Software Engineering

Margaret Burnett*, Susan Wiedenbeck†, Valentina Grigoreanu*, Neeraja Subrahmaniyan*, Laura Beckwith*, Cory Kissinger*

*Oregon State University, Corvallis, OR, USA
{burnett,grigorev,subrahmn,beckwith,ckissin}@eecs.orst.edu

†Drexel University, Philadelphia, PA
Susan.Wiedenbeck@cis.drexel.edu

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WEUSE IV’08, May 12, 2008, Leipzig, Germany.
Copyright 2008 ACM 978-1-60558-034-0/08/05...$5.00.

ABSTRACT
In this paper, we describe research that reports gender differences in usage of software engineering tools by end-user programmers. We connect these findings with possible explanations based on theories from other disciplines, and then add to that our recent results that these differences go deeper than software engineering tool usage to software engineering strategies. We enumerate the strategies that work better for males and the ones that work better for females, and discuss implications and possible directions for follow-up.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging; H.1.2 [Information Systems]: User/Machine Systems - Human factors; H.4.1 [Information Systems Applications]: Office Automation - Spreadsheets

General Terms
Human Factors, Reliability

Keywords
Gender, debugging, end-user programming, end-user software engineering, tinkering, self-efficacy, strategy, Surprise-Explain-Reward.

1. INTRODUCTION
A goal of our research is to support people who engage in end-user software engineering tasks, such as testing and debugging spreadsheet formulas. For example, we have developed features that can be seamlessly blended into spreadsheet software to encourage and assist end users in systematically testing and debugging spreadsheets [7]. In the course of this research we began to notice that females’ behavior with these testing and debugging tools seemed to be different than males’ behavior. This led us to conduct a series of studies investigating whether there were indeed gender differences pertinent to the design of these end-user software engineering tools.

Our results showed consistently that male and female end-user programmers did indeed use debugging tools differently. In this paper, we first offer possible explanations for these phenomena from theories in the areas of information processing and problem solving, demonstrating how these differences should indeed affect males’ and females’ behaviors during end-user debugging. Second, we describe our most recent empirical results on gender in end-user software engineering, focusing on two lines of research: (1) what strategies males and females employ in debugging and (2) how two variants of just-in-time explanations of debugging strategy impact males’ and females’ debugging.

2. BACKGROUND
Areas such as psychology and marketing have identified ways in which males and females differ, and these differences have been linked to behaviors in software-based problem-solving tasks such as end-user debugging [2].

In psychology, Bandura defines the construct of self-efficacy as an individual’s judgment of his or her ability to carry out a specific action and thus to attain a desired performance outcome [1]. Individuals who have low self-efficacy tend to exhibit lower use of cognitive strategies, less persistence, and lower effort overall than individuals who have high self-efficacy. Indeed, studies of learning computer applications showed that females had lower self-efficacy than males [8, 17], but they did not go on to tie the effects of self-efficacy to performance outcomes.

An empirical study that our group carried out on the effects of self-efficacy in end-user debugging did go on to show downstream effects of self-efficacy [3]. The environment used was WYSIWYT (“What You See Is What You Test”), which provides visual debugging tools [7]. Participants in the study, males and females, were each given a research spreadsheet with enhanced features to aid debugging. Their self-efficacy was measured before doing the debugging tasks. As in the studies above, females had lower self-efficacy than males. But more importantly, their self-efficacy was tied to their performance. Females’ level of self-efficacy predicted their final percent testedness of the spreadsheet: low self-efficacy was associated with low feature usage, whereas males’ performance was not

predicted by self-efficacy, suggesting that self-efficacy is much more important in females’ problem solving than males’. Follow-on studies sometimes found females’ self-efficacy to be lower than males’ and sometimes did not, but they consistently found this phenomenon of a significant tie between females’ self-efficacy and their success, with no such tie for males [4].

Research in motivation has shown that females perceive higher risks than males do in many situations [10], including intellectual risks [9]. Blackwell’s Attention Investment Model [6] describes risks and benefits in problem solving using programming by end users who are not necessarily highly skilled. In the model, an individual considers the perceived benefit of programming and its expected payoff versus the perceived risk and cost if the programming fails. If females are risk-averse they may perceive the risks and cost to greatly outweigh the benefits of programming. Blackwell’s model applies not only to programming, but also to other software-based problem-solving tasks, especially those requiring use of (perceived) sophisticated or time-intensive devices in the software environment.

In our study above, we did find evidence of risk-averse behaviors by females. As we have mentioned, several features were provided to help participants debug spreadsheets. Females showed significantly lower acceptance of these debugging features. They were willing to edit formulas, a debugging feature they were familiar with, but were less willing than males to initially try out the new features. The females were also less willing to adopt new features, that is, to intellectually engage with the features in repeated usage. In a post-questionnaire, females said that they did not use a new feature because they thought it would take too long to learn, suggesting that risk was an issue with them and is a barrier to adopting new features that could improve their debugging. Here again self-efficacy arises; although the females said they thought it would take them too long to learn the features, by the end of the task, our measures of feature comprehension showed no differences in how well males versus females actually understood the features.

A partial replication of the study above, with real-world end-user developers using Excel on a real-world spreadsheet, showed the findings to be robust. It replicated the finding that females’ self-efficacy predicts effectiveness, while males’ self-efficacy does not. Furthermore, females in the Excel study again focused on the most familiar features, especially value edits in this study, suggesting that they may have avoided more complicated features (or at least those features that were perceived to be complicated), something that did not occur among the males.

We then carried out a follow-on empirical study to understand male and female end-user programmers’ exploration and self-learning of spreadsheet testing and debugging features. In this study we used two environments, one similar to the prior study (called the low-cost environment) and a second environment that was also similar but designed to provide greater support (called the high-support environment).

Participants engaged in tinkering (or playful experimentation) in exploring and using new features. Educational literature reveals that hands-on, self-guided tinkering can benefit learners [14]. However, the educational literature also reports that males have a greater propensity to tinker than do females [11].

In the study, males tinkered a great deal, but across both environments their tinkering did not predict effective testing nor number of bugs fixed. Males tinkered highly, even excessively, in the low-cost interface, which makes tinkering easy to do. They tended to tinker without pausing to reflect, and this may have reduced both the educational benefit of tinkering and correspondingly their effectiveness in debugging [14]. In contrast to males, females tinkered equally in both interfaces, and their tinkering was predictive of testing effectiveness and increased bugs fixed. Females paused during tinkering more than males did. Tinkering in a moderate and “pause-ful” manner appears to have helped the females learn about the features, leading to effective outcomes. Nevertheless, self-efficacy remained an important predictor of effectiveness in females’ debugging, interacting with the benefits gained in tinkering. For example, in some cases in the high-support environment, females’ tinkering actually interfered with their self-efficacy.

3. CURRENT AND FUTURE RESEARCH DIRECTIONS

3.1 Strategies
Because of the tool usage and behavior differences we had found in end-user debugging, we wondered whether the differences went deeper than mere behaviors, down to their strategies. Thus, we decided to investigate what debugging strategies end-user programmers were actually trying to follow, and whether there were gender differences in those strategies.

Research in the area of marketing also provides reasons to think that there may be gender differences in end-user debugging strategies. The Selectivity Hypothesis [13] proposes that males and females behave differently in decision making and problem solving. Females tend to process information in a comprehensive way, examining all the available cues and making elaborative inferences in order to make a decision. Furthermore, they practice comprehensive processing whether the problem is simple or complex. Males, on the other hand, avoid comprehensive information processing, using heuristics to make decisions and only falling back on comprehensive processing if a complex task requires it. If the Selectivity Hypothesis holds across multiple areas involving problem solving and decision making, female end users may be most effective using debugging strategies that make good use of comprehensive processing.

There are many potential strategies for debugging spreadsheets. For example, one might begin by testing, and then use a data flow strategy to narrow down the possible cause of incorrect output. Alternatively, one might inspect the code looking for logic flaws. To discover the strategies males and females attempted to use in debugging, we asked them via an open-ended question what strategies they used. We then combined these verbal descriptions of strategies with behavioral evidence of these strategies, to determine whether males and females actually used different strategies in debugging and whether those strategy choices led to success.

Their responses revealed eight strategies: testing, dataflow, code inspection, specification checking, color following, to-do listing, fixing formulas, and spatial. Three strategies stood out: testing, dataflow, and code inspection.

The behavioral evidence showed that testing values to verify cells’ correctness was a successful strategy for males. In particular, the males who had high success in debugging used testing more than the less successful males; likewise, successful

male debuggers used testing more than successful female debuggers. Males also used dataflow strategies more than did females, and furthermore males using dataflow were more successful at the end of the experiment than were females who used dataflow.

By contrast, code inspection was more associated with females; successful females had more instances of code inspection than males and more total formulas displayed in all instances. Referring back to the Selectivity Hypothesis, it may be that females using code inspection were practicing comprehensive processing, investigating many formulas in detail. (Males, in contrast, may have been successful because strictly following dataflow arrows through the spreadsheet minimizes the amount of information processing required.) Other strategies that were successful for females but not for males were specification following, in which users verified formulas via comparison to a specification document, as well as to-do listing (marking cells to track code inspection progress).

Overall, of the eight strategies the end-user participants described, there were gender differences for seven of them! A main result of this study was that debugging strategies that worked well for males did not work well for females. A disadvantage for female debuggers was that, for the most part, their preferred strategies of code inspection, specification following, and to-do listing are not well supported by features in end-user environments. Future work may assess whether new design features can provide better support for these strategies.

3.2 Strategy Explanations
We have also been exploring how to support the end-user programmers who struggle to find a suitable strategy, because the above study also revealed that not everyone used reasonable strategies to track down the bugs. In particular, we would like to entice them into strategies that have a chance of working well for them.

The WYSIWYT environment in which we prototype our approaches provides a motivational strategy, called Surprise-Explain-Reward, which attempts to entice end-user programmers to use end-user software engineering features for systematic testing and debugging [18]. The first step in the strategy is to (gently) surprise the user with something unexpected in the environment, such as a red colored cell border. Second, if the user is curious about it, he or she can seek an explanation via tool tips that pop up when the user hovers over the red cell border. Finally, if the user reads and carries out what is advised in the tool tip, rewards may occur; for example, the feature might help them find the bug. While the surprises and rewards are fairly well developed, there is still much to do in determining what explanations users want and how they can most effectively be conveyed.

We have conducted an initial qualitative empirical study to determine what types of explanations users need in debugging [12]. These explanations fell into five groups. One of the most prominent needs elicited from the study participants was explanations of how to proceed in debugging, e.g., what would be a suitable strategy or how to accomplish a particular goal. This type of information gap was by far greater than the need for explanations of new features.

Building on these new findings on information gaps, we carried out a qualitative empirical study focusing on explaining ways to approach debugging strategies, rather than explaining features. Six such explanations were developed: how to find errors, how to fix errors, how I can test my spreadsheet, why should I change values, what is a good overall strategy, and am I doing it right. One important question is whether participants would take the time to learn via the explanations, since the Attention Investment Model makes clear that individuals assess the potential benefits and costs in order to make a decision.

Tool tips in WYSIWYT addressed part of these explanations but were limited to a short text, and did not seem to be a viable way to explain strategies. We therefore chose two new vehicles for short strategy explanations: video explanation snippets (1-2.5 minutes) and hyperlinked textual explanations. The videos showed two people (one male and one female) problem solving and discussing one of the six strategies, while demonstrating how to carry them out on a spreadsheet. The female took on the role of the confused or questioning student, who at the end of the video was always successful. This was done because one way of increasing self-efficacy is to view another person like oneself struggle in a task and ultimately succeed [1]. The wording of the text version was identical to the video except that there was no sample spreadsheet to which the reader could refer. The participants were 7 males and 3 females. When they needed an explanation during the course of debugging, they had the option of using either the text version or the video.

The results showed that the explanations improved the participants’ choices, allowing them to close information gaps that hindered their debugging. In addition, females reported an increase in their confidence due to the explanations, but no male suggested this. In this small group of participants the choices of media varied, suggesting both presentation choices should be provided. The findings also provided recommendations for changes to future versions of the explanations: some participants lacked the motivation to read or view the explanations and some misinterpreted the explanations.

4. IMPLICATIONS AND CONCLUSIONS
The implications of our findings so far are quite clear: the features in supposedly gender-neutral end-user software environments are not gender-neutral after all. Males’ and females’ use of and benefits from these features vary greatly.

Currently, we are designing a larger and more definitive empirical study of strategy explanations and gender. We are also investigating whether small changes to the attributes of features may remove some of the barriers to males’ or females’ success with and willingness to use those features.

We point out that there are no “typical” males or “typical” females. For example, many males use comprehensive information processing, and many females use heuristic processing. Therefore, we are not advocating separate versions of software for each gender. Rather, we advocate adjusting features and offering flexible options so that any end-user programmer whose cognitive or problem-solving style does not fall into the patterns preferred by the developer of that software can still be effective in their software development efforts.

5. ACKNOWLEDGMENTS
This work was supported in part by Microsoft Research and by NSF CNS-0420533, ITR-0325273 and CCR-0324844.

6. REFERENCES
[1] Bandura, A. Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review 84, 2 (1977), 191-215.

[2] Beckwith, L. and Burnett, M. Gender: An important factor in end-user programming environments? In Proc. IEEE Symposium on Visual Languages and Human-Centric Computing (2004), 107-114.

[3] Beckwith, L., Burnett, M., Wiedenbeck, S., Cook, C., Sorte, S., and Hastings, M. Effectiveness of end-user debugging software features: Are there gender issues? In Proc. CHI 2005, ACM Press (2005), 869-878.

[4] Beckwith, L., Inman, D., Rector, K., and Burnett, M. On to the real world: Gender and self-efficacy in Excel. In Proc. VLHCC, IEEE (2007).

[5] Beckwith, L., Kissinger, C., Burnett, M., Wiedenbeck, S., Lawrance, J., Blackwell, A., and Cook, C. Tinkering and gender in end-user programmers’ debugging. In Proc. CHI 2006, ACM Press (2006), 231-240.

[6] Blackwell, A. First steps in programming: a rationale for attention investment models. In Proc. IEEE Human-Centric Computing Languages and Environments (2002), 2-10.

[7] Burnett, M., Cook, C. and Rothermel, G. End-user software engineering. Communications of the ACM 47, 9 (2004), 53-58.

[8] Busch, T. Gender differences in self-efficacy and attitudes toward computers. Journal of Educational Computing Research 12, 2 (1995), 147-158.

[9] Byrnes, J. P., Miller, D. C. and Schafer, W. D. Gender differences in risk taking: A meta-analysis. Psychological Bulletin 125, (1999), 367-383.

[10] Finucane, M., Slovic, P., Merz, C-K., Flynn, J. and Satterfield, T. Gender, race and perceived risk: the white male effect. Health, Risk and Society 2, 2 (2000), 159-172.

[11] Jones, M. G., Brader-Araje, L., Carboni, L. W., Carter, G., Rua, M. J., Banilower, E. and Hatch, H. Tool time: Gender and students’ use of tools, control, and authority. Journal of Research in Science Teaching 37, 8 (2000), 760-783.

[12] Kissinger, C., Burnett, M., Stumpf, S., Subrahmaniyan, N., Beckwith, L., Yang, S., and Rosson, M. B. Supporting end-user debugging: What do users want to know? In Proc. Advanced Visual Interfaces, ACM Press (2006), 135-142.

[13] Meyers-Levy, J. Gender differences in information processing: A selectivity interpretation. In P. Caffarata & A. Tybout (Eds.), Cognitive and Affective Responses to Advertising. Lexington, MA, Lexington Books, 1987.

[14] Rowe, M. B. Teaching Science as Continuous Inquiry: A Basic (2nd ed.). McGraw-Hill, New York, NY, 1978.

[15] Subrahmaniyan, N., Beckwith, L., Grigoreanu, V., Burnett, M., Wiedenbeck, S., Narayanan, V., Bucht, K., Drummond, R., and Fern, X. Testing vs. Code Inspection vs. ... What Else? Male and Female End Users’ Debugging Strategies. In Proc. CHI 2008, ACM Press (to appear).

[16] Subrahmaniyan, N., Kissinger, C., Rector, K., Inman, D., Kaplan, J., Beckwith, L., and Burnett, M. Explaining debugging strategies to end-user programmers. In Proc. IEEE Symposium on Visual Languages and Human-Centric Computing (2007), 127-134.

[17] Torkzadeh, G. and Koufteros, X. Factorial validity of a computer self-efficacy scale and the impact of computer training. Educational and Psychological Measurement 54, 3 (1994), 813-821.

[18] Wilson, A., Burnett, M., Beckwith, L., Granatir, O., Casburn, L., Cook, C., Durham, M. and Rothermel, G. Harnessing curiosity to increase correctness in end-user programming. In Proc. CHI 2003, ACM Press (2003), 305-312.

