
Title: Why Discard When You Can Keep Them? A Case Study on the E-Mail Retention Behaviour in Firms
Author: Daniel Burda & Frank Teuteberg
EAP Date (approved for print):
DOI: 10.3778/JLIS.2013.22.Burda.1



Note to users: Articles in the Epubs ahead of print (EAP) section are peer
reviewed accepted articles to be published in this journal. Please be aware
that although EAPs do not have all bibliographic details available yet, they
can be cited using the year of online publication and the Digital Object
Identifier (DOI) as follows: Author(s), Article Title, Journal (Year), DOI, EAP
(page #).
The EAP page number will be retained in the bottom margin of the printed
version of this article when it is collated in a print issue. Collated print
versions of the article will contain an additional volumetric page number.
Both page citations will be relevant, but any EAP reference must continue to
be preceded by the letters EAP.

ISSN-0729-1485
Copyright © 2013 University of Tasmania
All rights reserved. Subject to the law of copyright no part of this publication
may be reproduced, stored in a retrieval system or transmitted in any form or
by any means electronic, mechanical, photocopying, recording or otherwise,
without the permission of the owner of the copyright. All enquiries seeking
permission to reproduce any part of this publication should be addressed in
the first instance to:
The Editor, Journal of Law, Information and Science, Private Bag 89, Hobart,
Tasmania 7001, Australia.
editor@jlisjournal.org
http://www.jlisjournal.org/

EAP 1
Why Discard When You Can Keep Them? A Case
Study on the E-Mail Retention Behaviour in Firms
DANIEL BURDA* AND FRANK TEUTEBERG**

Abstract
Firms are increasingly required to consciously retain and dispose of specific
information as part of an effort to ensure compliance with legal and regulatory
mandates. While e-mails represent a major part of all corporate records, they can be
used as electronic evidence in legal investigations and compliance audits. However,
the decision towards e-mail retention or disposal is often incumbent upon employees
in the course of performing their jobs. This paper presents the results of a case study
seeking to uncover how and why employees retain e-mails. We employ qualitative and
quantitative data collection methods, thereby analysing mailboxes of 20 employees
and more than 700,000 e-mails. Our findings point towards different types of
employee behaviour and, among some employees, a tendency to hoard vast amounts of
e-mail in pursuit of a 'keep everything forever' mentality. Based on the consolidated findings,
we elaborate a set of propositions, highlight the organisational implications and
suggest opportunities for future research.
Introduction
In the light of the recent rise and increasing diffusion of social media in the
corporate context, e-mail might appear an outdated and old-fashioned means of
business communication. However, there is no doubt that e-mail is still the
pervasive and ubiquitous application that interconnects the majority of
business information and corporate communication.¹ E-mail has also gained a
reputation as a 'smoking gun'.² Anecdotal evidence from the practitioners'
community and market studies suggests that e-mail is one of the most
important means in the preparation of legal evidence in litigation and

*
University of Osnabrueck, Institute of Information Management and Information
Systems, Katharinenstr. 1, 49069 Osnabrueck, Germany,
<dburda@uni-osnabrueck.de>.
**
University of Osnabrueck, Institute of Information Management and Information
Systems, Katharinenstr. 1, 49069 Osnabrueck, Germany,
<frank.teuteberg@uni-osnabrueck.de>.
1
Mimecast, The Shape of Email - Research Report (2012); Judith Ramsay and Karen
Renaud, Using Insights from E-mail Users to Inform Organisational E-mail
Management Policy (2012) 31(6) Behaviour & Information Technology 587.
2
Linda Volonino, Electronic Evidence and Computer Forensics (2003) 12
Communications of the Association for Information Systems 3.
Journal of Law, Information and Science Vol 22(2) 2012-2013
EAP 2
regulatory investigations.³ Consistently, it can be observed from recent
litigation that firms have to reckon with being accused based on evidence
found in e-mail. Examples are the US government's lawsuit against Standard
& Poor's⁴ and the dispute between Oracle and Google.⁵ Similarly, as other
prominent examples such as ING⁶ and Morgan Stanley⁷ show, firms risk
being fined when they fail to retrieve retained e-mails upon the request of
regulators or the courts. In both cases the firms had to pay penalties of up to
USD15 million, because they were not able to retrieve retained e-mails.
Today, risks related to compliant e-mail retention are a growing challenge for
many firms, since these risks may have decisive, not exclusively monetary
consequences; they may also result in a loss of credibility and reputation.⁸

According to the private sector research firm Gartner, software vendors have
responded to these demands by offering extended archiving products
referred to as Enterprise Information Archiving (EIA) solutions. Those
solutions support archiving of electronically stored information (ESI), such as
e-mail, and provide e-discovery functionality as well as policy-based mailbox
management.⁹ Although firms are investing in EIA solutions to manage their
ageing data assets, the role of people, decision rights and policies is
emphasised in an attempt to ensure the effective retention of a firm's
information assets in line with business, legal and regulatory objectives.¹⁰ On
the other hand, it is recognised that people tend to overvalue
information, which seems to foster a tendency to amass rather than discard

3
Association of Records Managers & Administrators (ARMA), Study: E-Discovery
Not Limited to E-Mail (2012) 46(1) Information Management 12.
4
Amanda Bronstad, Feds Preparing to Sue Standard & Poor's Over Pre-crash Ratings (4
February 2013) The National Law Journal <http://at.law.com/DxHifq>.
5
Joe Mullin, Oracle Tells Jury You Can't Just Step on Somebody's Intellectual Property
(17 April 2012) Ars Technica <http://arstechnica.com/tech-
policy/2012/04/oracle-tells-jury-you-cant-just-step-on-somebodys-intellectual-
property/>.
6
EEI, ING Firms Settle Email Retention Case (11 February 2013) Compliance
Reporter 47.
7
Reuters, Morgan Stanley Offers $15M Fine for E-Mail Violations (2006)
Computerworld
<http://www.computerworld.com/s/article/108687/Morgan_Stanley_offers_15
M_fine_for_e_mail_violations>.
8
Nancy Flynn, The E-policy Handbook: Rules and Best Practices to Safely Manage your
Company's E-mail, Blogs, Social Networking, and Other Electronic Communication Tools
(AMACOM, 2009).
9
Sheila Childs, Kenneth Chin and Debra Logan, Magic Quadrant for Enterprise
Information Archiving (2011).
10
Vijay Khatri and Carol V Brown, Designing Data Governance (2010) 53(1)
Communications of the ACM 148; Gerhard F Knolmayer et al, E-mail Governance:
Are Companies in Financial Industries More Mature? (Paper presented at the 45th
Hawaii International Conference on System Sciences, 2012).
A Case Study on the E-Mail Retention Behaviour in Firms

EAP 3
digital information.¹¹ Bearing in mind that a decision towards e-mail retention
is often left to the end-user due to the lack of respective policies¹² and thus
might be up to chance, this points towards considerable organisational
implications between the poles of human behaviour and corporate
objectives.¹³ Guided by the following research question, it is thus the intent of
this study to provide an understanding of employees' e-mail retention
behaviour in a firm: How and why do employees retain corporate e-mails?
In this case study, we address our research question by analysing a set of 20
mailboxes of employees in a major software firm. The paper is structured as
follows: In the following section we present related research on the relevant
topics and highlight the research gap. Next, we describe our research
methodology followed by the presentation of our findings. Then, we discuss
our findings and suggest a set of propositions. Finally, we conclude this study
by elaborating the study's implications and opportunities for future research.
1 Related Work
A review of the extant literature revealed only one publication explicitly
focused on e-mail retention and governance issues. Knolmayer et al's¹⁴ study
proposes indicators to develop a maturity model for e-mail governance and
examines the e-mail governance maturity of various firms. Based on their
empirical examination, they conclude that firms are struggling to implement
robust policies for handling, archiving and deleting e-mails. Moreover,
Volonino¹⁵ presents an overview of computer forensics to encourage research
into e-mail archives and e-records management. According to Volonino,
e-mail is currently considered one of the primary sources of e-evidence in
many legal actions while IT departments are rarely prepared for the issues
that e-discovery imposes on active/archival data operations. Similarly, Ward
et al describe the organisational challenges in responding to e-discovery
requests in a timely and cost-effective manner: While storage costs of ESI
may be inexpensive, managing ESI is not particularly so when the company
has not implemented rigorous policies on e-mail usage and ESI document

11
Joseph G Davis and Shayan Ganeshan, Aversion to Loss and Information
Overload: An Experimental Investigation (Paper presented at the International
Conference on Information Systems (ICIS) 2009).
12
Burke T Ward et al, Recognizing the Impact of E-Discovery Amendments on
Electronic Records Management (2009) 26(4) Information Systems Management 350.
13
Elizabeth Lomas, Information Governance: Information Security and Access
Within a UK Context (2010) 20(2) Records Management Journal 182; Whitepaper: The
Disconnect Between Legal and IT Teams (2009) Waterford Technologies
<http://www.waterfordtechnologies.com/wp-content/uploads/2012/11/4.102.1-
WHITE-PAPER-Disconnect_Legal_and_IT-DontKnows.pdf>.
14
Knolmayer et al, above n 10.
15
Volonino, above n 2.
retention as part of a litigation readiness program.¹⁶ A challenging aspect of
e-discovery is data collection that nowadays runs into terabytes of data,
including e-mails, which makes systematic and cost-effective retention
management a necessity.¹⁷ E-mail metadata, such as sender and subject line
information, have to be kept online and be easily accessible and searchable for
all types of regulatory, audit and legal inquiries.¹⁸ According to a Gartner
study, the increase in legal discovery associated with e-mail has driven the
demand for new e-mail archiving applications, which is recognised as one of
the fastest growing segments in the software market. E-mails are considered
to consume large amounts of storage and IT budgets, and requirements for
archiving, e-discovery and compliance add further cost to the
management of e-mail.¹⁹

Another stream of research relevant to this study focuses on the examination
of behaviour in relation to individual e-mails. This research, mostly stemming
from the domain of Human-Computer Interaction (HCI), follows a typical
design cycle to derive user requirements from which new system features
that improve e-mail solutions²⁰ can be developed. For instance, Whittaker
and Sidner empirically investigated the use of e-mail applications.²¹ Based on
a mailbox analysis of employees, they found that employees maintain an
average of 47 folders and keep 2 482 e-mails, 34 per cent of which are older
than three months. They identified three e-mail filing strategies, namely, no
filing, spring-cleaning and frequent filing. From these strategies they derived
functional requirements to redesign e-mail applications. Ten years later,
Fisher et al conducted a similar study to compare their findings with those from 1996.²²
They found that inboxes have roughly the same number of items, but
employees' e-mail archives have grown tenfold with a mean of 28 660 e-mail
items. According to Fisher et al, 43 per cent of all items were older than three

16
Ward et al, above n 12, 351.
17
John C Ruhnka and John W Bagby, Using ESI Discovery Teams to Manage
Electronic Data Discovery (2010) 53(7) Communications of the ACM 142; Linda
Volonino, Janice Sipior and Burke T Ward, Managing the Lifecycle of
Electronically Stored Information (2007) 24(3) Information Systems Management 231.
18
Linda Volonino, Guy H Gessner and George F Kermis, Holistic Compliance with
Sarbanes-Oxley (2004) 14(1) Communications of the Association for Information
Systems 219.
19
Childs, Chin and Logan, above n 9; Ted Schadler, Should Your Email Live in the
Cloud? A Comparative Cost Analysis (2009).
20
For a comprehensive review, see, eg, Steve Whittaker, Personal Information
Management: From Information Consumption to Curation (2011) 45 Annual
Review of Information Science and Technology 3.
21
Steve Whittaker and Candace Sidner, E-mail Overload: Exploring Personal
Information Management of E-mail (Paper presented at the Conference on Human
Factors in Computing Systems, Vancouver, 1996).
22
Danyel Fisher et al, Revisiting Whittaker & Sidner's E-mail Overload Ten Years
Later (Paper presented at the 20th Anniversary Conference on Computer
Supported Cooperative Work, Banff Alberta, 2006).
months while the number of folders increased to 133 in comparison to 1996.
Other studies examine individual differences in dealing with e-mail
messages,²³ the problem of e-mail overload²⁴ and the role of e-mail in task
management.²⁵

While acknowledging the amount of existing research, our review shows that
the examination of e-mail governance, and more specifically the challenges of
e-mail retention, is still scarce. It is thus the intention of this paper to address
this research gap while focusing on the individual behaviour of employees in
retention and deletion of e-mails and its implications for the organisation.
2 Research Methodology
For this research endeavour, we decided to conduct a case study for the
following reasons. Case studies are deemed an appropriate method for
investigating 'why' and 'how' research questions as well as sticky,
practice-based problems where the experiences of the actors are important and the
context of action is critical.²⁶ Moreover, there is little research and sound
theoretical knowledge available on the topic of employees' e-mail retention
behaviour. As such, the nature of this case study is more exploratory, seeking
to establish a foundation for future research by documenting the experiences
and knowledge gained from practice. Guided by our research question, we
apply an approach referred to as 'soft-positivism'.²⁷ This approach enables us
to draw from a positivist view, which assumes that e-mail retention
behaviour is a relatively stable and objectively existing phenomenon. On the other
hand, and in line with an interpretive perspective, we also allow other
constructs to emerge from the collected data. Our overall approach is
described in the following subsections and represents the case study protocol.

23
Deborah Barreau, The Persistence of Behavior and Form in the Organization of
Personal Information (2007) 59(2) Journal of the American Society for Information
Science and Technology 307; Jacek Gwizdka, Email Task Management Styles: The
Cleaners and the Keepers (Paper presented at the CHI04 Conference on Human
Factors in Computing Systems, Vienna, 2004).
24
Laura A Dabbish and Robert E Kraut, Email Overload at Work: An Analysis of
Factors Associated with Email Strain (Paper presented at the 20th Anniversary
Conference on Computer Supported Cooperative Work, Banff Alberta, 2006).
25
Nicolas Ducheneaut and Victoria Bellotti, E-mail as Habitat: An Exploration of
Embedded Personal Information Management (2001) 8(5) Interactions 30.
26
Izak Benbasat, David K Goldstein and Melissa Mead, The Case Research Strategy
in Studies of Information Systems (1987) 11(3) MIS Quarterly 369, 369.
27
Anna Madill, Abbie Jordan and Caroline Shirley, Objectivity and Reliability in
Qualitative Analysis: Realist, Contextualist and Radical Constructionist
Epistemologies (2000) 91(1) British Journal of Psychology 1.
2.1 Unit of Analysis and Case Selection
The unit of analysis of the present study is an employee's e-mail retention
behaviour in a firm. We selected a single-case design with multiple embedded
units of analysis representing the participating employees while the firm
represents the single case sample being constant for all embedded units. This
design increases the evidential significance of our findings and the study's
external validity since it supports the replication of results.²⁸ To address our
research question, we followed the typical case sampling strategy.²⁹ We
observed a multi-national software development firm where e-mail provides
the typical means of internal and external communication, scheduling and
calendaring and is used by every employee on a daily basis in the course of
business. The globally operating firm is based in continental Europe and
employs more than 55 000 people in more than 120 countries. In addition to
various geo-specific legislation, the firm is obliged to comply with regulations
prescribed by the Sarbanes-Oxley Act of 2002 (SOX).³⁰ The firm uses Microsoft
Outlook Exchange as its corporate-wide e-mail system. We were granted
access to the research site and thus could directly ask employees from
different departments for their participation in this study while applying the
snowballing strategy to win additional participants. In total we were able to
acquire 20 participants from three different departments with an average
affiliation with the firm of 7.3 years. Table 1 provides an overview of
participant profiles and department affiliation.

            Department                         Age [years]                 Manager     Gender
            Research  Consulting  Project      26-35  36-45  46-55  56-65  yes   no    f    m
            (R)       (C)         Mgmt (P)
Frequency   11        8           1            11     7      1      1      3     17    4    16
Percentage  55        40          5            55     35     5      5      15    85    20   80

Table 1: Participant Demographics Overview
2.2 Data Collection and Data Analysis
Yin proposes three principles of data collection to increase the robustness of
the results, namely: (1) use of multiple sources of evidence, (2) creation of a
case study database and (3) maintenance of a chain of evidence.³¹ Following the

28
Kathleen M Eisenhardt, Building Theories from Case Study Research (1989) 14(4)
Academy of Management Review 532; Robert K Yin, Case Study Research: Design and
Methods (Sage, 2009).
29
Guy Paré, Investigating Information Systems with Positivist Case Study Research
(2004) 13(1) Communications of the Association for Information Systems 233; Yin, above
n 28.
30
Sarbanes-Oxley Act of 2002, Pub L No 107-204, 116 Stat 745 (2002)
<http://www.sec.gov/about/laws/soa2002.pdf> (SOX).
31
Yin, above n 28.
first principle and in line with Kaplan and Duchon,³² we used a combination
of qualitative and quantitative methods to collect data. The data collection
took place between June and August 2012 and included unstructured
interviews, participant observation, a tool-supported mailbox analysis and a
survey questionnaire to overcome reported issues in prior research regarding
information about deleted e-mails that are difficult to capture.³³ As the firm
under study employs Microsoft Outlook Exchange, we decided to develop a
macro in Visual Basic for Applications (VBA) to ease the data collection
without being required to install other software components on the
participants' computers. The developed macro automatically captures an
employee's mailbox data by reading all online (ie on the server) and offline (ie
in a local archive) stored e-mail items and folders including their metadata
such as size, last modification or received date/time. Depending on the
number of e-mails a specific user retained, the runtime of the macro varied
between 10 and 150 minutes. We scheduled personal meetings or telephone
calls with each of the participants where we introduced them to the scope of
the study. We explained the approach and emphasised the respect of data
anonymity and privacy, which is reportedly a major concern in such studies.³⁴
Subsequently, we deployed the macro in their local Microsoft Outlook client
and started the automatic data collection. During or after the analysis we
interviewed the participants with regard to their personal reasons for e-mail
retention and their usage of the archiving functions in Outlook. During the
interviews participants also presented their Outlook client, enabling us to
observe the way they retain and file e-mails. We documented the results of
the interviews and observations in writing and took field notes after the
meetings.
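To illustrate the kind of aggregation such a mailbox analysis performs, the following is a minimal sketch in Python rather than the VBA macro used in the study, whose code is not reproduced here. The record type `MailItem`, the function `summarise_mailbox`, the field names and the sample items are all hypothetical, and 'older than three months' is approximated as older than 90 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MailItem:
    """Metadata of one e-mail item (hypothetical field names)."""
    folder: str
    size_bytes: int
    received: datetime
    archived: bool  # True if the item sits in a local (offline) archive


def summarise_mailbox(items, as_of):
    """Aggregate item metadata into per-mailbox figures."""
    if not items:
        raise ValueError("mailbox contains no items")
    # ASSUMPTION: "older than three months" is approximated as older than 90 days.
    cutoff = as_of - timedelta(days=90)
    total = len(items)
    older = sum(1 for item in items if item.received < cutoff)
    archived = sum(1 for item in items if item.archived)
    oldest = min(item.received for item in items)
    return {
        "total_items": total,
        "total_size_mb": sum(item.size_bytes for item in items) / (1024 * 1024),
        "mailbox_age_years": (as_of - oldest).days / 365.25,
        "pct_older_3_months": 100 * older / total,
        "pct_archived": 100 * archived / total,
    }


# Made-up sample items, for illustration only (not study data).
sample = [
    MailItem("Archive/2008", 1_048_576, datetime(2008, 3, 26), True),
    MailItem("Archive/2011", 524_288, datetime(2011, 1, 10), True),
    MailItem("Inbox", 524_288, datetime(2012, 7, 15), False),
    MailItem("Inbox", 2_097_152, datetime(2012, 6, 1), False),
]
summary = summarise_mailbox(sample, as_of=datetime(2012, 8, 1))
print(summary)
```

The returned keys loosely mirror the per-user columns reported later in Table 2 (number of items, size, mailbox age, share of old items, share of archived items).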
In line with the second data collection principle, we created a case study
database where we stored all data for subsequent analysis. In order to
maintain a chain of evidence (third principle), we aggregated the quantitative
mailbox data consisting of 718 783 single e-mail items stepwise. We thereby
stored a snapshot of each step to allow tracing back and forth between the
raw data and aggregations and eventually our interpretations. In an effort to
increase the objectivity³⁵ of our findings and ease the comparison between
participants by means of quantitative data, we decided to survey the
participants with an online questionnaire. To develop the questionnaire, we
started augmenting the interview transcripts and field notes with reflective
remarks³⁶ and research literature. We commenced open coding, whereby

32
Bonnie Kaplan and Dennis Duchon, Combining Qualitative and Quantitative
Methods in Information Systems Research: A Case Study (1988) 3(3) MIS Quarterly
571.
33
Ashish Gupta et al, E-mail Management: A Techno-Managerial Research
Perspective (2006) 17(1) Communications of the Association for Information Systems
941.
34
Ibid.
35
Kaplan and Duchon, above n 32.
36
Paré, above n 29, 249.
statements in the transcripts and field notes pertaining to reasons for the
retention of e-mails were used to form themes and categories.³⁷ Those themes
defined the focus of the online questionnaire and guided its development.
Before we started the survey, the questionnaire was reviewed by two research
colleagues. Following their comments, we revised the wording of some
questions to improve clarity and adjusted the sequence of questions.
Based on the set of data collected during the interviews and the survey
responses, we conducted the analysis of the overall data in response to our
research question.
3 Findings
3.1 How Do Employees Retain E-mails?
The studied firm has a set of global policies in place, such as an information
security policy, travel policy and purchasing policy, that are binding on all
employees. However, during the time of data collection, there was no formal
policy defined that governs the retention and disposal of e-mails. Every
employee has a server size quota that restricts the total size of e-mails to be
stored on the server to 215 MB while all received and sent e-mail is kept by
default until the user explicitly deletes it. Once the mailbox size reaches 90 per
cent of the server quota, users are automatically notified via e-mail to delete
unrequired items. Once the user's mailbox size equals the given quota of 215
MB, he/she will not be able to send and receive any e-mails. At that point, the
user has to make a decision about whether and what e-mails to discard or to
retain. To archive specific e-mails in Outlook the user can create a local
archive, ie, a personal storage table (PST file) that is stored on the user's local
hard drive. All items to be retained can manually or automatically (auto
archiving) be moved from the server to the local archive (offline archive);
the latter option is only used by 30 per cent of the employees in our sample.
As a consequence of this configuration, users have the freedom to decide
what e-mail to retain or discard since there is no policy guiding the decision.
From a technical perspective, there is practically no storage limit on the local
hard drive and backup server that restricts the number of retained e-mails.
Table 2 provides an aggregated excerpt of the mailbox analysis results
for each user, together with the total, mean, median, minimum, maximum and
standard deviation (SD) across users; the first letter of the user ID
(column A) indicates the organisational unit of the user.
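The quota rule described above can be sketched as a small decision function. This is an illustrative model only, not the firm's actual Exchange configuration: the 215 MB quota and the 90 per cent notification threshold are taken from the text, while the function name `quota_status` and the status labels are hypothetical.

```python
QUOTA_MB = 215.0       # server-side quota reported for the studied firm
WARN_FRACTION = 0.90   # notification threshold reported in the text


def quota_status(mailbox_mb, quota_mb=QUOTA_MB, warn_fraction=WARN_FRACTION):
    """Classify a server mailbox size against the quota rule.

    Returns one of three hypothetical labels:
    "blocked" - quota reached, sending and receiving are disabled
    "warned"  - at or above the notification threshold
    "ok"      - below the notification threshold
    """
    if mailbox_mb < 0:
        raise ValueError("mailbox size must be non-negative")
    if mailbox_mb >= quota_mb:
        return "blocked"
    if mailbox_mb >= warn_fraction * quota_mb:
        return "warned"
    return "ok"


for size in (100.0, 194.0, 215.0):
    print(size, quota_status(size))
```

Because only the server-side mailbox counts against the quota, moving items into a local PST archive returns a user to the "ok" state without any e-mail being discarded, which is consistent with the retention freedom described above.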

37
Cathy Urquhart, An Encounter with Grounded Theory: Tackling the Practical and
Philosophical Issues in Eileen M Trauth (ed), Qualitative Research in IS: Issues and
Trends (Idea Group, 2001) 104.
Table 2: Excerpt of Aggregated Mailbox Analysis Results (n=20)

Column groups: User Attributes; Mailbox Characteristics; Archive Characteristics
A  User ID
B  Affiliation with the Firm [years]
C  Date of Oldest E-mail in Mailbox
D  Age of Mailbox [years]
E  Total Number of E-mails
F  Total Size of E-mails [MB]
G  Total Number of Archived E-mails
H  Percentage of E-mails Older than 3 Months
I  Percentage of Archived E-mails
J  Archived E-mails per Month of Affiliation
K  Regular Backup

A       B      C          D      E           F           G           H        I        J      K
C01     5.8    27.09.06   5.8    72 450      5452.70     70 893      97.70%   97.90%   1027   yes
R02     4.3    26.03.08   4.3    50 188      5544.40     49 282      94.30%   98.20%   948    yes
R03     15     25.01.99   13.4   48 581      7837.70     47 817      95.90%   98.40%   266    yes
C04     4.3    28.04.08   4.2    36 737      5275.40     34 840      90.30%   94.80%   683    yes
R05     5.7    31.10.06   5.7    29 652      3906.70     24 364      90.00%   82.20%   358    yes
C06     10     03.04.07   5.3    25 427      3813.00     24 702      90.70%   97.10%   206    yes
R07     5.8    29.09.06   5.8    17 533      1589.90     14 434      88.40%   82.30%   206    yes
R08     8      04.04.11   1.2    12 220      1077.80     9745        60.00%   79.70%   102    yes
R09     10.7   04.03.08   4.3    10 130      1174.30     7190        78.50%   71.00%   56     yes
P10     4.3    14.03.08   4.3    8608        1090.50     6821        83.30%   79.20%   134    yes
C11     7.5    18.07.05   7      4871        194.4       1356        68.50%   27.80%   15     no
R12     4.6    02.01.08   4.5    90 110      8898.90     86 241      93.10%   95.70%   1568   yes
C13     4.1    30.05.08   4.1    16 237      2488.30     15 710      95.90%   96.80%   321    yes
R14     4.4    27.07.08   4      35 629      3563.30     34 960      97.50%   98.10%   660    yes
R15     7.8    27.09.04   7.8    89 967      10 186.20   89 369      92.60%   99.30%   961    yes
R16     2.4    01.02.10   2.4    5332        929.3       4294        89.80%   80.50%   148    yes
C17     12     28.08.00   11.9   45 776      4814.50     44 543      96.80%   97.30%   309    no
R18     17.1   11.11.99   12.7   90 272      10 289.00   89 465      95.80%   99.10%   436    yes
C19     6      11.09.09   2.9    13 755      2179.10     12 647      71.70%   91.90%   176    yes
C20     6.5    29.12.05   6.5    15 308      1249.50     7892        59.60%   51.60%   101    yes
Total                            718 783     81 554.90   676 565     92.00%   94.10%   8681
Mean    7.3               5.9    35 939.10   4077.70     33 828.30   86.50%   86.00%   434
Median  5.9               4.9    27 539.50   3688.10     24 533      90.50%   95.30%   287
Min     2.4    25.01.99   1.2    4871        194.4       1356        59.60%   27.80%   15
Max     17.1   04.04.11   13.4   90 272      10 289.00   89 465      97.70%   99.30%   1568
SD      3.9               3.3    29 264.20   3173.30     29 733.10   12.3     18.4     413

3.1.1 Mailbox Characteristics
As can be seen from Table 2, the 20 participants of our study keep 718 783
e-mails in total (column E) that are up to 13.4 years old (column D) and
consume 81 554.9 MB (approximately 80 gigabytes) of storage in total (column F). At the lower
and upper ends, the size of the mailbox per user ranges from 194.4 MB (C11)
to 10 289 MB (R18) while the average mailbox size equals 4077.7 MB.
Considering the total number of e-mails held by a specific user, we find a
range between 4871 e-mails at the lower end and 90 272 e-mails at the upper
end, which equals a difference by a factor of 18.5. Moreover, it can be seen
that in 11 out of 20 cases the age of the mailbox (column D), determined by
the date of the oldest e-mail captured in the mailbox (column C), only differs
slightly (≤ 0.1 years) from or even equals the term of affiliation with the firm
(column B) as, eg, in the case of C01, R02 or C20. Significant
differences between the mailbox age and the term of affiliation can be
accounted for by spring cleaning actions (eg, C06 and R08) or loss of e-mails due to
a job change associated with a location and hardware change (R09 and R18).
Comparing the age of mailboxes, we find an increase in contrast to Fisher et
al.³⁸ It can be observed from column D that 75 per cent of all participants
retain e-mails dating back at least 4.2 years, 50 per cent for at least 5.3 years
and 25 per cent for at least seven years. In line with this finding, it can be seen
from column H that in total 92 per cent of all retained e-mails are older than
three months, ie, the e-mail was sent or received at least three months
ago. Those items account for 93.2 per cent of the total storage need for e-mails.
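Several of the figures in this subsection can be recomputed directly from columns D and E of Table 2. The following is a small sketch using Python's standard library; the variable and function names are illustrative, and the values are copied from the table in row order (C01 to C20).

```python
from statistics import mean, median

# Column E of Table 2: total number of e-mails per participant (C01 ... C20).
total_emails = [72450, 50188, 48581, 36737, 29652, 25427, 17533, 12220, 10130,
                8608, 4871, 90110, 16237, 35629, 89967, 5332, 45776, 90272,
                13755, 15308]

# Column D of Table 2: age of mailbox in years, same row order.
mailbox_age = [5.8, 4.3, 13.4, 4.2, 5.7, 5.3, 5.8, 1.2, 4.3, 4.3,
               7.0, 4.5, 4.1, 4.0, 7.8, 2.4, 11.9, 12.7, 2.9, 6.5]


def share_at_least(values, threshold):
    """Percentage of values that are greater than or equal to the threshold."""
    return 100 * sum(1 for v in values if v >= threshold) / len(values)


grand_total = sum(total_emails)
range_factor = max(total_emails) / min(total_emails)

print("total e-mails:", grand_total)
print("mean per user:", round(mean(total_emails), 2))
print("median per user:", median(total_emails))
print("max/min factor:", round(range_factor, 1))
for years in (4.2, 5.3, 7.0):
    print(f"mailboxes at least {years} years old:",
          share_at_least(mailbox_age, years), "per cent")
```

Run against the table values, this reproduces the total of 718 783 e-mails, the factor of roughly 18.5 between the smallest and largest mailbox, and the 75, 50 and 25 per cent shares quoted for mailbox age.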
3.1.2 Archive Characteristics
We define an archived e-mail as being stored offline on the local hard drive in
a PST file that can only be accessed with the local Outlook client on the
respective user's computer. In contrast, online stored e-mails can be accessed
by the employee via mobile devices or a webmail portal. Those archived
items are subject to the user's individual decisions regarding backup (column
K), security measures, deletion or permanent retention. Considering the
percentage values in column I, we find that 94.1 per cent, ie, 676 565 of all
718 783 analysed e-mails, are archived by the user and account for 79 008.1
MB (96.6 per cent) of the total storage demand. However, considering the
different terms of affiliation with the firm, we standardised the total number
of archived e-mails by the months of affiliation of each employee (data
collected via survey) as can be seen from column J. The relative
archived item count of a specific user varies by a factor of 52 between a
minimum of 15 and a maximum of 1568, with an average of 434 archived
e-mails per month of affiliation. Moreover, we calculated the average archive growth based
on 10 samples providing us with data of at least five years and observe that
the archive size has grown 6.5-fold in five years, from 0.57 GB to 3.69 GB at a
compound annual growth rate (CAGR) of 59.45 per cent, ie, the size is more
than doubling every two years, which is compatible with estimates from other
studies.³⁹
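The archive share and the growth figures can be checked with a short calculation. The sketch below uses the standard CAGR formula, (end / start)^(1 / periods) - 1. The number of compounding periods is an assumption of this sketch, not something stated in the text: with four annual intervals (five yearly observations) the rounded endpoints of 0.57 GB and 3.69 GB give roughly 59.5 per cent, close to the reported 59.45 per cent. The function name `cagr` is illustrative.

```python
import math


def cagr(start, end, periods):
    """Compound annual growth rate over the given number of yearly periods."""
    if start <= 0 or end <= 0 or periods <= 0:
        raise ValueError("start, end and periods must be positive")
    return (end / start) ** (1 / periods) - 1


# Share of archived e-mails (columns G and E of Table 2).
archived_share = 676_565 / 718_783

# Archive growth from 0.57 GB to 3.69 GB.
growth_factor = 3.69 / 0.57

# ASSUMPTION: four annual compounding intervals between five yearly observations.
rate = cagr(0.57, 3.69, periods=4)

# Years needed to double the archive size at that rate.
doubling_years = math.log(2) / math.log(1 + rate)

print(f"archived share: {archived_share:.1%}")
print(f"growth factor:  {growth_factor:.2f}")
print(f"CAGR:           {rate:.1%}")
print(f"doubling time:  {doubling_years:.2f} years")
```

Under this assumption the doubling time comes out at about one and a half years, which agrees with the statement that the archive size more than doubles every two years.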


38
Fisher et al, above n 22.
39
IDC, The 2011 IDC Digital Universe Study (2011).
3.1.3 Deletion Behaviour
As in similar studies,⁴⁰ our automated mailbox analysis cannot capture
already deleted e-mails. This might bias our results since the total number of
e-mails is not only dependent on the number of received or sent items but
also on the deletion behaviour of the employee. In an effort to limit this effect,
we asked the participants of the study to describe their deletion behaviour
using a seven-point Likert-type rating scale ranging from 1 ('I do not delete
e-mails') to 7 ('I delete e-mails'). Figure 1 depicts an overview of the participants'
responses in a scatter plot where the x-axis represents the given answers on
the rating scale, and the y-axis represents the relative number of archived
e-mails (column J). As illustrated in Figure 1, one can distinguish at least
two opposite groups of employees, depicted as A and B in Figure 2. Those
groups significantly differ in the number of retained e-mails (t(11) = 6.35,
p < 0.001, r = 0.79). Group A retains 974.48 e-mails on average (SE = 134.16)
and claims rather not to delete e-mails. Group B rather tends to delete e-mails
and retains 153.26 e-mails on average (SE = 36.12). This general tendency is
also supported by an inverse correlation between the number of retained
e-mails and the deletion behaviour (Spearman's ρ = -0.615, p < 0.01).


Figure 1: Claimed Deletion Behaviour in Relation to Number of Retained Items

40 Fisher et al, above n 22.
Journal of Law, Information and Science Vol 22(2) 2012-2013
3.2 Why Do Employees Retain E-mails?
Based on the analysis of the interview data and literature,⁴¹ we deduced
five reasons for e-mail retention. We asked the participants of this study
to rate their agreement with those reasons on a Likert-type scale ranging
from 1 ('strongly disagree') to 7 ('strongly agree').
ID   Description                                          Mean  SD    Min  Max
RE1  I retain e-mails just because I might need them in   6.25  0.72  5    7
     the future.
RE2  I retain e-mails to reliably hold on to important    5.65  1.71  2    7
     information for long periods of time, e.g., project
     documentations, contracts or descriptions of how
     to perform a complicated task.
RE3  I retain e-mails to be able to demonstrate           5.45  1.63  2    7
     justification of my decisions.
RE4  I retain e-mails because it simply takes too long    4.55  5.00  1    7
     to decide which e-mail to retain and which to
     delete to make it worth the effort.
RE5  For me retaining e-mails is a way of not getting     3.90  3.25  1    7
     lagged behind other people.
Table 3: Aggregated Ratings of E-mail Retention Reasons (n = 20)
The reasons, together with descriptive statistics of the employees'
ratings, are shown in Table 3. As can be seen, respondents mainly retain
e-mails because they assume that the e-mails might be useful in the future.
Other major reasons are to preserve important information, to justify their
decisions and/or because making decisions about disposal or retention is
considered a time-consuming task. Reason RE1 shows a high degree of
agreement across all participants, with an average rating of 6.25, a
standard deviation (SD) of 0.72 and a minimum rating of 5, while RE2 and
RE3 show average ratings of 5.65 and 5.45 and standard deviations of 1.71
and 1.63 respectively. RE4 receives an average rating of 4.55 with a
standard deviation of 5.00. RE5 shows a lower level of agreement across
participants, with a mean rating of 3.90 and a standard deviation of 3.25.
Besides the given reasons for retention, we observed that employees
perceived storage to be a generally inexpensive and unlimited resource that
one can easily make use of. For example, participant R15 stated: 'Why not
[keep] all my e-mails? Storage prices have fallen and are not a big deal
nowadays.'
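
The columns of Table 3 are standard descriptive statistics over the Likert
ratings collected per reason. As a minimal sketch of how one such row could
be computed, the snippet below summarises a single list of ratings. The
ratings are hypothetical (the individual responses are not given in the
text), the function name is illustrative, and the use of the sample
standard deviation (n - 1 denominator) is an assumption, since the text
does not say which variant was used.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def summarise_ratings(
    ratings: Sequence[int], low: int = 1, high: int = 7
) -> Dict[str, float]:
    """Return mean, sample SD, min and max for ratings on a low..high scale."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a standard deviation")
    if any(r < low or r > high for r in ratings):
        raise ValueError(f"ratings must lie between {low} and {high}")
    return {
        "mean": round(mean(ratings), 2),
        # ASSUMPTION: sample standard deviation (n - 1 denominator).
        "sd": round(stdev(ratings), 2),
        "min": min(ratings),
        "max": max(ratings),
    }


# HYPOTHETICAL ratings for one retention reason, for illustration only.
example_ratings = [7, 6, 6, 5, 7, 6, 7, 6, 6, 7]

summary = summarise_ratings(example_ratings)
print(summary)
```

The range check reflects the seven-point scale used in the questionnaire;
applying the same function to each reason's ratings would yield one table
row per reason.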

41 Angela Edmunds and Anne Morris, 'The Problem of Information Overload in
Business Organisations: A Review of the Literature' (2000) 20(1)
International Journal of Information Management 17; Ronald L Thompson,
Christopher A Higgins and Jane M Howell, 'Personal Computing: Toward a
Conceptual Model of Utilization' (1991) 15(1) MIS Quarterly 125; Ron Weber,
'The Grim Reaper: The Curse of E-mail' (2004) 28(3) MIS Quarterly iii.
4 Interpretation of Findings and Propositions
Aggregating our key findings from the field data in response to our
research questions, we suggest a set of propositions, presented in Table 4.
Firstly, the quantitative mailbox data show that the relative amount of
archived e-mails varies broadly between users, ie, by a factor of 52 in a
range between 15 and 1,568 retained e-mails. Looking at Figure 1, it might
not be surprising that users who claim to delete e-mails retain
significantly fewer e-mails than users who claim to abstain from deleting.
Rather, the question is how these contrasting behaviours towards retention
and deletion can be explained. Characterising the two opposites revealed
(groups A and B in Figure 1), there seems to be a type of employee that
tends to hoard e-mails, retaining them in a 'keep everything forever'
manner (average archived e-mails: 974). In contrast, group B tends to
discard e-mails, retaining only a subset in a more selective way (average
archived e-mails: 153). This observation also finds support in existing
research.⁴² However, this difference arises despite many similarities
between the users, such as a general lack of an e-mail retention policy and
the use of the same e-mail client and server quota, which indicates that a
decision towards retention or deletion is contingent on behavioural
factors. Bearing in mind the issues of compliance, the notion of e-mail as
a 'smoking gun' and potential fines for firms, this is an important finding
and thus leads to proposition P1.
Secondly, it should be noted that 11 employees show no difference between
the mailbox age and the term of employment, which implies that they retain
e-mails dating back to their first working days within the firm. While some
of the differences can be explained by reasons such as loss of e-mail or
spring-cleaning (group A), we may acknowledge a fundamental tendency. It
seems that once an employee has made the decision to retain an e-mail, this
decision is not revised by re-evaluating the e-mail at a later point in
time. This indication is also supported by the high CAGR of 59.45 per cent.
In line with extant research confirming that people prefer to keep their
options open and are rather averse to changing their retention decisions,
we formulate P2.⁴³ Thirdly, we observe vast amounts of old e-mails being
retained. On average, 86 per cent of a user's mailbox size is attributed to
archived e-mails, which are additionally backed up regularly by 90 per cent
of the participants. Roughly projecting this sample's average mailbox size
(~ 4 GB) to the total population of employees in the firm (which we
estimate to be 55 000), we estimate a total volume of 213.9 terabytes (TB)
accounting for the storage of e-mails. This amount of information has to be
stored, managed and possibly reviewed in the course of e-discovery
requests, amidst an archive growth of roughly 60 per cent annually, leading
to an ever-accumulating storage need. It is acknowledged that such growth
drives both the complexity of information management and the overall cost
of e-mail. Falling storage prices do not compensate for these strong growth
rates.⁴⁴ According to a recent study, 40 per cent of total e-mail costs can
be attributed to storage and archiving.⁴⁵ In addition, we find several
reports citing the high costs associated with e-discovery. Depending on
corporate practices and how well firms are prepared, the analysis of
e-mails in audits or legal investigations may run into millions.⁴⁶ However,
our field data point towards a lack of awareness with regard to the impact
of one's individual e-mail retention habits on information management
issues, legal risks and costs. During the interviews, employees seemed to
be free of concerns regarding the impact of their individual retention
behaviour on a corporate level. This finding is also reflected in the
identified retention reasons provided in Table 3 and points towards the
need to educate and inform employees about the issues associated with
e-mail retention. The high mean ratings (and low SD) of the first three
retention reasons in Table 3 point towards a motivation to retain e-mails
which is rather intrinsic in nature. As a consequence, we formulate
proposition three (P3).

42 Gwizdka, above n 23.
43 Davis and Ganeshan, above n 11; Whittaker, above n 20.
44 Ward et al, above n 12.
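
The rough storage projection in the third finding above, and the effect of
the reported archive growth on it, can be reproduced with a few lines of
arithmetic. This is a sketch under stated assumptions, not the authors'
calculation: the average mailbox size is taken as exactly 4 GB (the text
gives only '~ 4 GB'), 1 TB is taken as 1,024 GB, the growth rate is applied
to the whole volume although the text reports it for archives, and all
names are illustrative.

```python
GB_PER_TB = 1024  # ASSUMPTION: binary units (1 TB = 1,024 GB)


def projected_storage_tb(
    avg_mailbox_gb: float, employees: int, gb_per_tb: int = GB_PER_TB
) -> float:
    """Scale an average mailbox size up to a firm-wide volume in TB."""
    return avg_mailbox_gb * employees / gb_per_tb


def volume_after(volume_tb: float, annual_growth: float, years: int) -> float:
    """Volume after compounding `annual_growth` for `years` years."""
    return volume_tb * (1.0 + annual_growth) ** years


AVG_MAILBOX_GB = 4.0   # ASSUMPTION: the text states only "~ 4 GB"
EMPLOYEES = 55_000     # estimate given in the text
ANNUAL_GROWTH = 0.60   # "roughly 60 per cent annually" in the text

total_tb = projected_storage_tb(AVG_MAILBOX_GB, EMPLOYEES)
print(f"projected volume today: {total_tb:.1f} TB")

for years in (1, 2, 5):
    future_tb = volume_after(total_tb, ANNUAL_GROWTH, years)
    print(f"after {years} year(s): {future_tb:.1f} TB")
```

With exactly 4 GB per mailbox the projection is a little under 215 TB; the
reported 213.9 TB is consistent with an unrounded average slightly below
4 GB. At 60 per cent annual growth the volume multiplies by about 2.56
every two years.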
Fourthly, we draw a parallel from the area of information security, where
employees are considered a valuable resource in achieving information
security in alignment with business objectives. Transferring this concept
to our context, such an alignment requires an understanding of the e-mail's
content, its relative value, associated legal and regulatory relevance as
well as potential reuse opportunities across other business processes. As
such, the appropriateness of an exclusively technological solution can be
questioned.⁴⁷ Despite the benefits of technological measures, information
security research emphasises the importance of formal and informal control
mechanisms, such as policies, organisational culture, training or
awareness.⁴⁸ Based on the premise that individual user behaviour and
socio-organisational measures are also important in the context of e-mail
retention, and acknowledging the lack of an e-mail retention policy in the
firm under study, we formulate P4.

45 Schadler, above n 19.
46 Daniel E Braswell and W Ken Harmon, 'Assessing and Preventing Risks from
E-mail System Use' (2003) 5 Information Systems Control Journal 33.
47 Burcu Bulgurcu, Hasan Cavusoglu and Izak Benbasat, 'Information Security
Policy Compliance: An Empirical Study of Rationality-Based Beliefs and
Information Security Awareness' (2010) 34(3) MIS Quarterly 523.
48 Tejaswini Herath and H Raghav Rao, 'Protection Motivation and
Deterrence: A Framework for Security Policy Compliance in Organisations'
(2009) 18(2) European Journal of Information Systems 106.
ID   Proposition                          Exemplary Quote from Study Participants   Literature

P1   A decision towards retention is      C11: 'I only retain e-mails that are      Gwizdka⁴⁹
     contingent on behavioural factors    important for my current task. Once I
     in the absence of a binding          have completed it, I delete the related
     guideline for the retention of       e-mails. Otherwise it is getting too
     e-mails.                             much.'
                                          C19: 'I do not delete e-mails at all. I
                                          just keep everything. I think every
                                          e-mail has its purpose.'

P2   Once the decision towards            C01: 'When I decide to retain an          Whittaker⁵⁰
     retention is made by the employee,   e-mail, I usually do not revise this
     a revised decision towards deletion  decision later on. It takes time and is
     is rather unlikely.                  not worth the effort.'

P3   Employees are not aware of the       R15: 'Why not keeping all my e-mails?     Sanchez⁵¹
     implications of their individual     Storage prices have fallen and are not
     retention behaviour and the          a big deal nowadays.'
     associated impact on cost and risk
     at firm level.

P4   Firms can improve the employees'     R05: 'I think that trainings that         Ward et al and
     retention behaviour by means of      provide concrete and tangible             Knolmayer et al⁵²
     socio-organisational measures        guidelines for email retention, eg,
     such as policies or training.        based on the description of real-life
                                          scenarios, would be helpful.'

Table 4: Propositions Explaining Employee E-Mail Retention Behaviour

49 Gwizdka, above n 23.
50 Whittaker, above n 20.
51 Anthony Sanchez, 'Top 5 Strategic Email Compliance Mistakes' (2005)
Sarbanes-Oxley Compliance Journal
<http://www.s-ox.com/dsp_getFeaturesDetails.cfm?CID=843>.
52 Ward et al, above n 12; Knolmayer et al, above n 10.

5 Conclusions
5.1 Implications for Practice and Scientific Community
The present study was designed to explore how and why employees retain
e-mails and to elaborate on the implications for firms. We employed a
combination of qualitative and quantitative data collection methods to
gather data from 20 employees in a major software firm. Connecting the
findings from the field in response to our research questions, we suggest a
set of propositions that highlight our key findings. The findings offer
practical implications for firms that use e-mail and, thus, are faced with
a constantly growing amount of e-mails. Our findings provide empirical
support that a decision towards retention or disposal is contingent on
behavioural factors in the absence of any corporate guidance. Moreover, our
results indicate a lack of awareness of the effects of each individual's
e-mail retention behaviour with regard to legal risks or costs on a
corporate level. However, the way firms retain, manage and retrieve
information will impact their risk exposure and legal costs.⁵³ Thus,
acknowledging the findings of this study may help firms to improve their
e-mail retention procedures by initiating appropriate measures, which
should not be of a technological nature only. Socio-organisational measures
should also be considered in order to raise awareness of the impact of
individual retention behaviour and to promote conscious decision making
regarding e-mail retention, by which a 'keep everything forever' culture
can be avoided.
On the other hand, this study has some general implications for the
research community by contributing to the body of knowledge on e-mail
retention within a firm. Our findings indicate the existence of different
types of behavioural patterns in e-mail retention and deletion among
employees. This study thus provides motivation for further research geared
towards the examination of the cognitive and environmental factors to
provide a better understanding of the determinants of this behaviour.
Further, our study shows that there is little extant research that has
investigated the issues organisations face with e-mail and information
retention from a compliance or legal point of view. Moreover, organisations
are obliged to retain specific information in an effort to ensure
compliance, a task that seems to become more complex in the tension between
rising legal requirements and exponential data growth. As such, this study
points to a number of research opportunities on e-mail retention from an
information governance perspective.
5.2 Limitations and Future Research
As with every study, this study has some limitations that should be noted
when interpreting the findings. Firstly, this exploratory case study was
conducted in two countries in Europe but only one firm, which raises
questions about the external validity, ie, generalisability, of our
findings.⁵⁴ Although we collected and analysed data from several employees,
statistical generalisation is impossible to achieve with 20 units. The
relatively small sample size was mainly due to difficulties in recruiting
research participants owing to their concerns about data privacy and data
loss. This issue is also reported in extant research.⁵⁵ Many employees
refused to participate as they perceived their inbox as a very personal and
confidential repository of business and personal information. Although we
tried to convince employees to participate by showing them examples of
result reports of mailboxes to prove that only anonymous data was
extracted, they were nonetheless uneasy. Others refused to participate
because they were anxious that they would lose access to their e-mails due
to the macro installation. Nevertheless, it should be noted that case
research should be judged on its theoretical generalisability and as such
differs from sampling research that aims at a statistical generalisability
of its findings.⁵⁶ On the other hand, acknowledging the problems in
acquiring participants, our findings may be subject to non-response bias.⁵⁷
Although the comparison of participants suggests that the findings and
patterns hold true for the other employees of the firm, including the
non-respondents in this study might have provided additional insights and a
more complete understanding of employee retention behaviour. This is
because these employees represent a significant constituent of the overall
population of interest. Moreover, there may be cultural or structural
influences that vary across different firms, industries and countries that
need to be taken into account when interpreting our results. For example,
different litigation systems and litigation cultures may impact the way
firms and employees manage e-mail retention. Examining a rather small or
mid-sized firm that does not operate globally or that is not required to
comply with SOX requirements may provide additional insights into
employees' retention behaviour and the external factors that influence this
behaviour. Also, investigating further firms in different industries with,
eg, different e-mail clients, mailbox quotas or policies provides an
interesting opportunity for future research to uncover similarities and
differences in employees' behaviour.

53 Volonino, Sipior and Ward, above n 17.
54 Yin, above n 28.
55 Gupta et al, above n 33.
Secondly, we were only able to collect the mailbox data at one specific
date. Although we collected a large set of 718,783 e-mail records ranging
back to 1999, we still lack a more dynamic view for understanding both
employees' retention behaviour over time and when retention decisions are
made.
Thirdly, we lack a deeper understanding of differences and commonalities
between different types of employees, including the decision-making
rationales in e-mail retention that, for example, could support the
development of a taxonomy of different user behaviours. Toward that end,
additional qualitative in-depth case studies could be conducted to identify
antecedents and cognitive factors influencing an employee's e-mail
retention behaviour. In a subsequent step, scholars could take a more
positivist research approach to operationalise relevant constructs
influencing an employee's retention behaviour and apply more quantitative
research designs. In this effort, hypotheses should be developed and
tested, eg, through experiments or large-scale surveys, to assist the
development of a theory for explaining⁵⁸ the e-mail retention behaviour of
employees. Therefore, future research should be conducted with larger and
diversified samples from various organisations from different industry
sectors and geographical regions to allow statistical generalisation and to
increase the external validity of the findings. While the findings of the
present study should be viewed in the light of the described limitations,
they have nevertheless yielded preliminary insights about how and why
employees retain e-mails. We thus believe this study provides some useful,
albeit tentative, findings that should be of interest to both scholars and
practitioners.

56 Bas Hillebrand, Robert AW Kok and Wim G Biemans, 'Theory-testing Using
Case Studies: A Comment on Johnston, Leach, and Liu' (2001) 30(8)
Industrial Marketing Management 651.
57 J Scott Armstrong and Terry Overton, 'Estimating Nonresponse Bias in
Mail Surveys' (1977) 14 Journal of Marketing Research 396.
58 Shirley Gregor, 'The Nature of Theory in Information Systems' (2006)
30(3) MIS Quarterly 611.
