and credit/debit card data. A data breach can have
I. INTRODUCTION
As technology progresses and information storage becomes digital, organisations are swiftly changing from a sourcing model to a value-based one while keeping client wants in mind. Businesses are known to gather and analyse personally identifiable information (PII) from customers, such as purchasing patterns, browsing habits, credit card details, and social security numbers (SSNs), and then use this data to present customers with personalised promotions [1]. Data breaches compromise consumers' privacy, which strongly affects their confidentiality decisions. Consumers who had their personal information stolen expressed dissatisfaction with the compromised firm, citing fears that their privacy had been abused [2]. According to research, these affected clients lost faith in the compromised firm and switched to competing organisations (Choi et al., 2016). According to research conducted by the Philippine Institute, data breaches cost impacted organisations around $3 million and resulted in a 5% drop in stock prices, as well as users ending partnerships with the companies involved and shifting to those with superior security measures. In brief, the exploitation of users' PII and the subsequent data breach had major implications for the companies involved [3].
According to scientific studies, when a data breach was imminent, financial markets responded negatively, lowering the valuation of the breached firm and, as a result, the wealth of its shareholders [4]. Businesses face considerable challenges as a result of data breaches or the prospect of a data breach. Illegal access to confidential data or the unintentional disclosure of confidential information could have catastrophic consequences. Fines may be enforced as a consequence of compliance issues, and firms may also face customer legal action, increased security costs, and a loss of consumer faith [5]. According to a recent poll, the cost of a data breach is continuing to rise. When it was discovered that sensitive data had been lost, organisations were hit with hefty fines. According to Hovav and D'Arcy et al. [6], businesses responded significantly to breach notices depending on whether or not the business was an e-tailer. After announcing a data breach, businesses lose 2.1 percent of their market share [7]. In their investigation, Ga Shankar and Mohammed (2020) discovered two potentially dangerous data breaches at ChoicePoint and TJX. They argue that companies should consider their ethical duties when developing organisational privacy policies [8].
III. METHODOLOGY
For this analysis, the researchers took Twitter data from individuals relating to four organisations' data breaches: (1) the Chegg data breach, (2) the Marriott data breach, (3) the Under Armour data breach, and (4) the T-Mobile data breach.
On this dataset, the researchers used Word2Vec, tf-idf weighting, and KMeans clustering. Unsupervised sentiment analysis was performed using word embeddings learned from the current dataset with gensim's Word2Vec implementation.
A. Chegg Data Breach
Chegg informed the SEC of a security issue
affecting more than 40 million users. The attacker was able to obtain users' email IDs, names, passwords, and shipping addresses.
B. Marriott Data Breach
Hackers had access to many of Marriott's hotel reservation systems for four years, exposing the travel plans, user names, contact numbers, email addresses, passport numbers, dates of birth, and gender of 500 million people. Some victims' payment card information and expiration dates were also stolen.
C. T-Mobile Data Breach
According to T-Mobile, more than 2 million people's information may have been accessed. T-Mobile notified affected customers via text message that hackers had acquired access to their user names, ZIP codes, contact numbers, email IDs, account types, and account data.
IV. RESULTS AND ANALYSIS
Unsupervised sentiment analysis was performed using word embeddings trained on the provided dataset with gensim's Word2Vec implementation, and the results are shown below. The main steps were as follows. First, negative and positive clusters were detected in word vector space using sklearn's implementation of the KMeans clustering method; these clusters were then used to convert each sentence into a vector of sentiment scores, one for every word or phrase. A second vector for each sentence was created by replacing every term in the sentence with its associated tf-idf score. For each sentence, the prediction was computed as the dot product of these two vectors: if the dot product was positive, the overall sentiment was taken to be positive; if it was negative, the overall sentiment was taken to be negative.
A. Chegg Data
B. Marriott Data
Table 3. Confusion Matrix for Marriott Tweets
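The sentence-scoring step described in this section, together with the kind of tally that a confusion matrix summarises, can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation. The word-level sentiment scores would in practice come from the KMeans clusters over the Word2Vec embeddings; here they are hard-coded, hypothetical values. The example tweets, the manual labels, the smoothed tf-idf formula, and the rule that a zero score counts as negative are likewise assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical word-level sentiment scores. In the pipeline described above
# these would be derived from KMeans clusters over Word2Vec embeddings;
# here they are hard-coded for illustration only.
WORD_SENTIMENT = {
    "breach": -1.0, "stolen": -1.0, "angry": -1.0,
    "thanks": 1.0, "quick": 1.0, "fixed": 1.0,
}


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenised document.

    Uses tf * (ln((1 + N) / (1 + df)) + 1), an assumed smoothed variant.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        })
    return weights


def sentence_score(doc, weights):
    """Dot product of the sentiment-score vector and the tf-idf vector.

    Both vectors have one entry per token; unknown words score 0.
    """
    return sum(WORD_SENTIMENT.get(word, 0.0) * weights[word] for word in doc)


def predict(score):
    """Map a score to a label; a zero score is treated as negative here."""
    return "positive" if score > 0 else "negative"


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


# Hypothetical tweets and manual labels, for illustration only.
tweets = [
    "angry about the breach my data was stolen".split(),
    "thanks for the quick response issue fixed".split(),
    "another breach and still no reply".split(),
]
labels = ["negative", "positive", "negative"]

weights = tfidf_weights(tweets)
scores = [sentence_score(doc, w) for doc, w in zip(tweets, weights)]
predictions = [predict(s) for s in scores]
matrix = confusion_matrix(labels, predictions)
```

Each key of `matrix` is an (actual, predicted) pair, so the diagonal entries count agreements between the manual labels and the sign of the dot product, and the off-diagonal entries count disagreements.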
V. CONCLUSION
The researchers investigated individual tweets following data breaches at several firms and observed the impact of the breaches on each organisation as a whole. Data in people's Twitter posts was used to detect and analyse their sentiment. The tweets and replies on a few key issues were compiled into a dataset that includes text, user, and sentiment information, among other things. The dataset was then used to detect sentiment in tweets and replies, as well as to calculate model scores based on a variety of user- and tweet-based criteria. Word2Vec, tf-idf weighting, and KMeans clustering were applied to the dataset. Word embeddings generated for the current dataset were utilised to perform unsupervised sentiment analysis
REFERENCES
[1] Ayyagari, R., 2012. An exploratory analysis of data breaches from 2005-2011: Trends and insights. Journal of Information Privacy and Security, 8(2), pp.33-56.
[2] Bansal, G. and Zahedi, F.M., 2015. Trust violation and repair: The information privacy perspective. Decision Support Systems, 71, pp.62-77.
[3] Garrison, C.P. and Ncube, M., 2011. A longitudinal analysis of data breaches. Information Management & Computer Security.
[4] Goode, S., Hoehle, H., Venkatesh, V. and Brown, S.A., 2017. User compensation as a data breach recovery action: An investigation of the Sony PlayStation network breach. MIS Quarterly, 41(3),