
Ahmad Nabil, Ahmad Nazli

Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, a.nabilnazli@gmail.com

Aimi Syahrul, Che Aziz


Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, aimi.syahrul@gmail.com

Muhammad Nabil Isyraff, Mohd Isham


Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, nabilisyraff98@gmail.com

Muhammad Nur Aiman, Rosmin


Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, aimanr04@gmail.com

Muhammad Syahiran, Affendy


Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, syahiranaffendy99@gmail.com

A data breach is a security incident in which unauthorized individuals copy, transfer, view, steal, or exploit sensitive, protected, or
confidential data. Governments, healthcare, financial services, insurance, social media, and a variety of other businesses have all
experienced it at some point. Malaysia has had a stunning number of data breaches and data leaks in the last decade alone, with no
indications of the trend slowing down. Data breaches are a form of cybercrime that must be prevented at all costs. In this study, data breach
issues, such as the leak of blood donor information from the Australian Red Cross Blood Service in 2016, the intrusion into Malaysia
Airlines data, and the case of 46 million phone numbers being leaked and sold, are explored and discussed together with solutions to these
issues, such as organizational data policies, encryption, and the Data Leak Prevention and Detection (DLPD) approach. The importance of
security and of preventing data breaches is also emphasized in the hope that individuals and organizations will take the problem
seriously.

CCS CONCEPTS • Information systems • Social and professional topics • Applied computing

Additional Keywords and Phrases: Data Breach, Cybercrime, Data Privacy

1 INTRODUCTION

The Internet has become inseparable from human life as technology has advanced in our modern era. Despite the
advantages that the Internet brings to our daily lives, there are also concerns about irresponsible individuals misusing
the Internet and causing problems for others. Cybercrime is a result of increased globalisation, low-cost mobile phones,
and widely accessible Internet service. The growing prevalence of cybercrime has become a topic of concern because of its
many and varied forms. Cybercrime refers to a wide range of illegal actions carried out using computers, with cyberspace
as the communication medium [1]. The Internet and technology have provided great tools, but, like any tool, they may be
harmful if not used with caution. Phishing, malware, fraud, denial of service, and data breaches are all instances of
cybercrime [2].

A data breach is a security incident in which unauthorized individuals copy, transfer, view, steal, or exploit sensitive,
protected, or confidential data. Governments, healthcare, financial services, insurance, social media, and a variety of other
businesses have all experienced it at some point [3]. Malaysia has had a stunning number of data breaches and data leaks
in the last decade alone, with no indications of the trend slowing down. This has prompted concerns about whether
Malaysians’ personal data is effectively protected under the Personal Data Protection Act 2010 (PDPA). The PDPA aims
to protect data subjects’ personal data by regulating the processing, collection, and storage of such data by individuals and
organisations, and by setting out the rules that govern how individuals and organisations handle personal data
[4].
WikiLeaks’ release over the Internet of hundreds of thousands of secret and confidential documents involving various
governments and multinational corporations implicated many countries, including Malaysia, and demonstrated how
vulnerable national security is from the perspective of information system and network sustainability. Critical Information
Infrastructure is defined as those assets, real or virtual, systems, and functions that are so important to a country’s
economic strength, national image, national defence and security, government capability to function, and public health
and safety that their incapacity or destruction would be catastrophic [5].
This demonstrates the need to secure critical data in order to avoid unexpected events in any industry.
In this study, we examine the issues surrounding data breaches, as well as present and future solutions that might be used
to address them.

2 DISCUSSION OF ISSUES

Data breaches can cause victims anxiety because their data has been compromised. Victims are exposed to risks such as
fraudulent transactions, identity theft, and the hassle of deleting old accounts and creating new ones. For example, in the
Marriott breach, customer details, credit card information, and passport numbers were compromised, which badly affected
Marriott customers. Moreover, data stolen in breaches is used to steal personal information and defraud businesses. Criminals
use the data to make a profit, publicise it, or sell it on the darknet [6].
An issue has also been raised about data encryption: encryption protects only the content, not the client’s privacy, so the
data can still be stolen in other ways. Moreover, although the suggested use of Tor and VPNs to anonymise the system can
hide the traffic data flow, it remains vulnerable to website fingerprinting attacks [6].
Enterprises are also frequent targets of data breaches. Such breaches have become a regular occurrence and have affected
hundreds of millions of people. Table 1 shows some of the largest data breach incidents of recent years. Some of the data
were leaked by internal staff, either accidentally or intentionally [7].

Table 1: Massive Enterprise Data Leak Incidents in Recent Years

Organisation | Records | Breach Date | Type | Source | Industry | Estimated Cost
Anthem Insurance | 78 million | January 2015 | Identity theft | Malicious outsider | Healthcare | $100 million
Yahoo | 500 million | December 2014 | Account access | State sponsored | Business | $350 million
Home Depot | 109 million | September 2014 | Financial access | Malicious outsider | Business | $28 million
JPMorgan Chase | 83 million | August 2014 | Identity theft | Malicious outsider | Financial | $13 billion
Benesse | 49 million | July 2014 | Identity theft | Malicious insider | Education | $138 million
Korea Credit Bureau | 104 million | January 2014 | Identity theft | Malicious insider | Financial | $100 million
Target | 110 million | November 2013 | Financial access | Malicious outsider | Business | $252 million
Adobe Systems | 152 million | September 2013 | Financial access | Malicious outsider | Business | $714 million

Data source: the World’s Biggest Data Breaches dataset.

There was also a case in 2016 involving the details of blood donors to the Australian Red Cross Blood Service, in which
sensitive data were placed in an unsecured part of the organisation’s website; 3.5 million people were affected. Internal data
leaks such as this are a challenge because the leaker has official access to the data, knows how to bypass detection, and
understands how the organisation works. External data leaks, on the other hand, are carried out by hackers. Examples are
the Yahoo and Target data breaches listed in Table 1, which are said to be among the biggest data breaches in history. Several
issues stand out in Target’s failure to prevent its breach. The first was access control for third parties: it was not secure,
and hackers could break in through it. The second was the payment system, which was not segregated from the rest of the
network even though it contained sensitive data. The third was that the Point of Sale (POS) system allowed unauthorised
configuration changes, and unknown software was installed on it. The fourth concerned firewall and intrusion warnings:
Target ignored the warnings raised by its security tools [7]. Figure 1 shows how Target’s data was breached.

Figure 1: Breakdown and analysis of the Target data breach.


(http://dx.doi.org/10.1002/widm.1211)

More recently, in November 2021, there was a data breach at the hotel booking website RedDoorz, affecting customers in
Singapore and elsewhere in Southeast Asia. The personal data of 5.9 million customers were leaked. The Singaporean
government described this as the largest data breach in Singapore, and the responsible party was fined $74,000 by the
Personal Data Protection Commission (PDPC). The key issue was that the RedDoorz Android application package (APK) on
Google Play had an Amazon Web Services (AWS) access key embedded in it, and this key was identified as the means by
which the data were accessed. AWS states that access keys should not be embedded in source code [8].
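Because the RedDoorz breach was traced to an access key shipped inside a distributed application, one simple safeguard is to scan source code and build artefacts for credential-shaped strings before release. The sketch below is a minimal illustration, assuming the publicly documented AWS access key ID format (the prefix AKIA for long-term keys or ASIA for temporary keys, followed by 16 uppercase letters or digits); the function name find_embedded_key_ids is hypothetical, and dedicated secret-scanning tools cover many more credential types.

```python
import re

# Assumed format: "AKIA" (long-term) or "ASIA" (temporary) followed by 16 uppercase letters/digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_embedded_key_ids(source_text: str) -> list[str]:
    """Return every substring that looks like an AWS access key ID (hypothetical helper)."""
    return ACCESS_KEY_ID.findall(source_text)


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    snippet = 'client = make_client(key_id="AKIAIOSFODNN7EXAMPLE")'
    hits = find_embedded_key_ids(snippet)
    if hits:
        print("Refusing to release: embedded access key IDs found:", hits)
```

A scan like this only catches mistakes before they ship; the underlying fix is the one AWS itself gives, namely not embedding access keys in application code at all.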

It was reported that, out of 85 percent of security incidents, 78 percent were confirmed data breaches [9]. Web
application attacks are the most common pattern among data breaches. Although Denial of Service (DoS) attacks account
for the most incidents, they are by far the least common cause of data breaches [9]. Figure 2 shows the patterns in breaches.

Figure 2: Pattern in breaches (n=3,950)


(https://www.cisecurity.org/wp-content/uploads/2020/07/The-2020-Verizon-Data-Breach-Investigations-Report-DBIR.pdf)

Furthermore, the main purpose of a data breach is usually to steal data. The most frequently compromised data attributes
are personal information (confidentiality), credentials (confidentiality), and altered behaviour (integrity) [9]. Figure 3
shows the most compromised data attributes in data breach incidents.

Figure 3: Top compromised attribute varieties in breaches (n=3,667)


(https://www.cisecurity.org/wp-content/uploads/2020/07/The-2020-Verizon-Data-Breach-Investigations-Report-DBIR.pdf)

2.1 Definition, Effect, and Attribute

A data breach is when information is stolen or removed from a system without the owner's knowledge or authority.
A data breach might happen to a small business or a major corporation. Credit card numbers, client data, trade secrets, and
national security information are examples of sensitive, proprietary, or confidential information that could be stolen.

The consequences of a data breach can include damage to the target company's reputation as a result of a perceived
betrayal of trust. If linked documents are part of the information stolen, victims and their customers may face financial
consequences.

Hacking or malware attacks are responsible for the majority of data breaches. Commonly observed breach methods
include the following. The first is the insider leak, in which data is stolen by a trusted individual or a person of authority
with access privileges. The second is payment card fraud, in which physical skimming devices are used to steal payment card
data. The third is loss or theft of portable drives, laptops, workplace computers, files, and other tangible items. The fourth is
unintended disclosure, in which sensitive data is exposed as a result of human error or ignorance. Finally, in a small
percentage of cases the breach method is classed as unknown because the actual breach mechanism is unclear or hidden [10].

2.2 Phases of Data Breach

Data breaches do not happen overnight; they typically go through at least three stages. The first is the research phase:
after deciding on a target, the attacker looks for weaknesses to exploit in the target's employees, systems, or network. This
requires extensive investigation by the attacker, which may include scouring employees' social media profiles to learn about
the company's infrastructure.

The next phase is the attack phase. After scouting a target's vulnerabilities, the attacker makes initial contact through a
network-based or a social attack. In a network-based attack, the attacker exploits weaknesses in the target's infrastructure;
SQL injection, vulnerability exploitation, and session hijacking are a few examples of such attacks. In a social attack, the
attacker infiltrates the target network through social engineering techniques. This might be a maliciously crafted email sent
to an employee and tailored to capture that person's attention. The email could be phishing for information, tricking the
recipient into providing personal information to the sender, or it could contain a malware attachment that executes when
opened.
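SQL injection, mentioned above as a network-based attack, arises when user input is concatenated into a query string. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table, the sample rows, and the function names are hypothetical. It contrasts a vulnerable lookup with a parameterised one, in which the driver binds the input as data so that it can never change the structure of the query.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create a throwaway in-memory table with two hypothetical customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def lookup_vulnerable(conn: sqlite3.Connection, name: str):
    # UNSAFE: attacker-controlled text is spliced directly into the SQL statement.
    query = f"SELECT name, email FROM customers WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def lookup_safe(conn: sqlite3.Connection, name: str):
    # SAFE: the "?" placeholder makes sqlite3 bind name as a value, never as SQL syntax.
    return conn.execute("SELECT name, email FROM customers WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = setup()
    payload = "x' OR '1'='1"  # turns the WHERE clause into a condition that is always true
    print(lookup_vulnerable(conn, payload))  # returns both customers' rows
    print(lookup_safe(conn, payload))        # []
```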

The final phase is exfiltration: once inside, the attacker is free to extract data from the company's network. This
information could be exploited for blackmail or ‘cyberpropaganda’, and the information gathered by an attacker can also be
used to launch more severe attacks against the target's infrastructure. Figure 4 summarises the phases of a data breach.

Figure 4: Stages of data breaching [11]

(https://www.imperva.com/learn/data-security/data-breach/)

2.3 Reported Data Breach Cases in Malaysia

The first case involves Malaysia Airlines. According to an article in Security Magazine, Malaysia Airlines revealed
that a third-party IT service provider had been involved in a "data security incident." According to the company, the
intrusion had no impact on the carrier's key IT infrastructure and services [12].

According to Channel Asia, the airline stated that the event occurred sometime between March 2010 and June 2019.
The incident had no impact on itineraries, reservations, ticketing, ID cards, or payment card information, according to a
statement sent to Enrich frequent flyer members. However, the compromised data includes Enrich members' names, dates
of birth, gender, and contact information, as well as frequent flyer numbers, status, and tier level information.

Next is the case of 46 million phone numbers that were leaked and sold. As reported by BBC News, in a huge data
breach the personal information of over 46 million Malaysian mobile subscribers was exposed on the dark web. Mobile
phone numbers, unique phone serial numbers, and residential addresses are among the information that was exposed
[13]. Personal information was also obtained from a number of Malaysian government and commercial websites. The
Malaysian Communications and Multimedia Commission (MCMC) was investigating the matter at the time of the report.
Lowyat.net, a Malaysian technology news website, was the first to notice the data breach; according to the website,
someone tried to sell massive databases of personal information for an undisclosed sum of Bitcoin on its forums.

The breach is thought to have affected the entire country, which has a population of 32 million people, as well as
foreigners using temporary pre-paid mobile phone numbers. Service providers are required by Malaysian law to keep
customers' personal data secure, so legal consequences are very likely.

3 DISCUSSION OF SOLUTIONS

Data breaches have had a big impact on all technology users and developers. Much data has been affected by leaks from
inside or outside sources, and this cannot be allowed because our data is our privacy. Before the situation gets worse,
several measures can be applied to minimise the leakage of private data to irresponsible parties. The first is security
protection, which can be applied in many ways. For example, you can secure your data with passwords that are not related
to your personal information. This matters because passwords derived from personal details are easier to guess, so avoiding
them makes it harder for irresponsible parties to break into your data. Strong security protection also helps you avoid or
minimise data leakage to third parties, keeping your data private. On the other side, companies that store users' data also
need to improve and regularly update their data security, so that attackers face greater difficulty and give up attacking that
data. Companies also need to plan and manage their data storage so that users' private data is safe once stored, which helps
to improve users' confidence in and trust of the company. Companies should also give attention to their data policies: when
a policy is created, all of its terms and rules on data storage must be clear enough for users to read and understand. Companies
that manage data also need to remain aware of intrusion attempts by third parties, so that by staying alert they are ready to
prevent data from leaking to the wrong party. Once potential threats have been identified, companies should assess the
magnitude of the risks they pose by assessing the likelihood that a threat will materialize, evaluating the potential damage
that could result, and assessing the sufficiency of the policies, procedures, and safeguards in place to protect against
foreseeable threats [14].
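The risk assessment step above can be sketched with a simple risk register. Scoring each threat as likelihood multiplied by potential damage on 1 to 5 scales is a common qualitative convention assumed here, not a method taken from [14]; the threshold of 12, the class and function names, and the example threats are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int              # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int                  # assumed scale: 1 (negligible) to 5 (severe)
    safeguards_sufficient: bool  # outcome of reviewing existing policies, procedures, and safeguards

    @property
    def score(self) -> int:
        # Qualitative risk score: likelihood multiplied by potential damage.
        return self.likelihood * self.impact


def needs_action(threats, threshold=12):
    """Threats at or above the threshold whose safeguards are insufficient, worst first."""
    flagged = [t for t in threats if t.score >= threshold and not t.safeguards_sufficient]
    return sorted(flagged, key=lambda t: t.score, reverse=True)


# Hypothetical register for illustration only.
threats = [
    Threat("Phishing of staff credentials", 4, 4, False),      # score 16
    Threat("Lost unencrypted laptop", 3, 4, True),             # score 12, safeguards judged adequate
    Threat("DoS on public website", 4, 2, False),              # score 8, below threshold
    Threat("Access key embedded in mobile app", 3, 5, False),  # score 15
]

ranked = needs_action(threats)
for t in ranked:
    print(f"{t.score:>2}  {t.name}")
```

Keeping the safeguard review as a separate field mirrors the three-part assessment in the text: a high-scoring threat that is already well covered does not need new action, while a lower-scoring one can be revisited if its likelihood estimate changes.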

Second, encryption techniques must be applied to protect data when it is stored, transferred, and used. The same
technique cannot be applied to every type of data, because each type has its own sensitivity and its own potential for harm
if it leaks to third parties. Access control must also be fully utilised so that all data remains safe while it is transferred or
used. When data protection is weak, many problems arise, with data being stolen in transit from the database to users. At a
minimum, encryption should protect data throughout processing [15]. Several techniques can be used to make the most of
encryption and prevent data from being stolen by irresponsible parties.

Homomorphic encryption is a type of public-key encryption that enables anyone to compute functions of data while it is
encrypted, without learning anything about the data. In addition to the usual KeyGen, Enc, and Dec algorithms, a fourth
algorithm, Eval, is defined. Eval takes the public key pk, a function f, and the encrypted input $c_{\mathrm{in}}$ and produces an
encrypted output $c_{\mathrm{out}}$ such that $\mathrm{Dec}(sk, c_{\mathrm{out}}) = f(\mathrm{Dec}(sk, c_{\mathrm{in}}))$. Because Eval requires no
secret key, anyone can run it. This allows data analysts to make data available for further analysis, particularly in
regulated industries.
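The Eval idea above can be made concrete with the Paillier cryptosystem, swapped in here as a stand-in: it is a public-key scheme that is only additively homomorphic, so f is limited to sums (and multiplication by known constants), unlike fully homomorphic schemes that support arbitrary f. This is a toy sketch under stated assumptions: the primes are tiny and insecure, the generator is fixed to g = n + 1, and names such as evaluate_sum are hypothetical.

```python
import math
import secrets


def keygen(p: int = 1009, q: int = 1013):
    """Toy Paillier key generation; real deployments use primes of 1024 bits or more."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid shortcut because the generator is fixed to g = n + 1
    return n, (n, lam, mu)  # (public key pk, secret key sk)


def encrypt(pk: int, m: int) -> int:
    n, n2 = pk, pk * pk
    while True:  # fresh random r coprime to n makes encryption probabilistic
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(sk, c: int) -> int:
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n  # L(u) = (u - 1) / n, then multiply by mu modulo n


def evaluate_sum(pk: int, c1: int, c2: int) -> int:
    """Eval for f(x, y) = x + y: multiplying ciphertexts adds plaintexts; needs only pk."""
    return c1 * c2 % (pk * pk)


pk, sk = keygen()
c_out = evaluate_sum(pk, encrypt(pk, 20), encrypt(pk, 22))
print(decrypt(sk, c_out))  # 42
```

The party running evaluate_sum never sees 20, 22, or 42; only the holder of sk can read the result.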

The verifiable computation technique is defined in terms of two parties: a computationally weak verifier and a
computationally powerful but untrustworthy prover to whom the verifier assigns a task. It operates as follows: given an
input value x and a function f to evaluate on it, the prover returns the requested output y together with a proof $\pi$ that
$y = f(x)$, which the verifier checks to confirm that the computation is correct.
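A full verifiable computation scheme produces a cryptographic proof for arbitrary f, which is beyond a short example. As a simpler stand-in that shows the same asymmetry between a weak verifier and a powerful prover, the sketch below uses Freivalds' classic randomised check for an outsourced matrix product: the prover does the O(n^3) multiplication, and the verifier checks the returned result in O(n^2) time per round, so here the returned result itself plays the role of the proof. The function names, the matrices, and the 20-round default are assumptions for illustration.

```python
import random


def matmul(a, b):
    """Prover's job: the full O(n^3) matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    """Matrix-vector product, O(n^2)."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def verify_product(a, b, c, rounds=20, rng=random):
    """Verifier's job: Freivalds' check that c == a @ b.

    A correct c is always accepted; a wrong c is accepted with probability at most 2 ** -rounds.
    """
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(len(b[0]))]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False
    return True


A = [[2, 1], [0, 3]]
B = [[1, 4], [5, 2]]
honest = matmul(A, B)            # [[7, 10], [15, 6]]
cheating = [row[:] for row in honest]
cheating[1][1] += 1              # a dishonest prover tampers with one entry
print(verify_product(A, B, honest))    # True
print(verify_product(A, B, cheating))  # False, except with probability at most 2 ** -20
```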

Secure multi-party computation enables two or more parties to jointly compute a function of all their inputs in a way that
lets everyone learn the correct result of the function while preventing anybody from learning anything else. Two categories
of adversaries are usually considered: semi-honest adversaries, who follow the protocol but try to learn additional
information, and malicious adversaries, who may deviate from the protocol arbitrarily. Protocols are therefore designed to
remain secure against every way an adversary of the assumed type might behave.
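A minimal way to see the multi-party idea is a secure sum using additive secret sharing, one of the basic building blocks of such protocols. The sketch below simulates all parties in one process and assumes semi-honest parties with private channels between them; the modulus, function names, and salary figures are hypothetical. Each party learns the total but, from the shares it sees, nothing about another party's individual input beyond what the total itself implies.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime used as the modulus for all shares


def share(secret: int, n_parties: int) -> list[int]:
    """Split secret into n_parties random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def secure_sum(private_inputs: list[int]) -> int:
    n = len(private_inputs)
    # Party i splits its input and sends share j to party j (row i = party i's shares).
    dealt = [share(x, n) for x in private_inputs]
    # Party j adds up the shares it received; any single share is uniformly random.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Publishing only the partial sums reveals the total.
    return sum(partial_sums) % PRIME


salaries = [4200, 5100, 3900]  # hypothetical private inputs of three parties
print(secure_sum(salaries))    # 13200
```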

Functional encryption is a form of public-key encryption that provides, in addition to a normal secret key that decrypts the
data, functional secret keys. Instead of decrypting the data, these keys give access only to the result of evaluating a
particular function on it. KeyGen accepts a function f and returns a key $sk_f$ such that $\mathrm{Dec}(sk_f, c) = f(x)$ whenever
$c = \mathrm{Enc}(x)$. Security requires that a person holding the key for f learns nothing about the data x beyond f(x). This
makes functional encryption a strong strategy for data analytics, since key holders learn the results of the analyses they are
authorised to run and nothing more about the underlying data. It also generalises existing encryption techniques such as
attribute-based encryption and identity-based encryption.

The third solution is Data Leak Prevention and Detection (DLPD). DLPD techniques use two types of technical means:
content-based analysis and context-based analysis. Content-based approaches (i.e., sensitive data scanning) examine data
content to prevent unwanted information from being exposed in its various states (i.e., at rest, in use, and in transit) [16,17].
Although content scanning can effectively protect against data loss due to human error, internal or external attackers are
likely to bypass it by obfuscating the data. Context-based approaches, on the other hand, focus on analysing the
meta-information associated with the monitored data, or the context surrounding the data, rather than attempting to detect
the presence of sensitive content. Some DLPD solutions combine content and context analysis [18].

Content-based DLPD uses data fingerprinting, lexical content analysis (e.g., rule-based matching and regular expressions),
and statistical analysis of monitored data to search for known sensitive information on laptops, servers, cloud storage, and
in outbound network traffic. In data fingerprinting, signatures (or keywords) of known sensitive content are extracted and
compared with the content being monitored to detect data leaks; the signatures can be digests or hash values of a set of data.
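Exact data fingerprinting can be sketched with cryptographic digests: the DLPD agent stores only hashes of known sensitive records and compares them with hashes of monitored content. The sketch assumes that content is compared line by line after case and whitespace normalisation, uses SHA-256 from the Python standard library, and uses a fictional customer record; the function names are hypothetical. Because any real edit changes the digest, exact fingerprints are easy to evade, which motivates the robust fingerprinting and statistical methods described in the following paragraphs.

```python
import hashlib


def fingerprint(record: str) -> str:
    # Normalise case and whitespace so trivial reformatting does not change the digest.
    normalised = " ".join(record.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def build_fingerprint_db(sensitive_records) -> set[str]:
    """Store only digests, so the monitoring agent never holds the raw sensitive data."""
    return {fingerprint(r) for r in sensitive_records}


def leaked_lines(outbound_text: str, db: set[str]) -> list[str]:
    """Return lines of an outbound message whose fingerprint matches a sensitive record."""
    return [line for line in outbound_text.splitlines()
            if line.strip() and fingerprint(line) in db]


db = build_fingerprint_db(["Ali bin Abu, 012-345 6789, Shah Alam"])  # fictional record
email = "Hi team,\n  ali bin abu,  012-345 6789, shah alam \nThanks"
print(leaked_lines(email, db))  # the middle line matches despite the case and spacing changes
```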

To improve robustness against rephrasing of confidential content, one fingerprinting method extracts fingerprints from the
core confidential content while ignoring non-relevant (non-confidential) parts of a document. Lexical analysis, in turn, is a
technique for locating sensitive data that follow simple patterns.
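Lexical analysis of structured data can be sketched with a regular expression for payment card numbers. Adding the Luhn checksum, which payment card numbers satisfy, is an assumption of this sketch that cuts down on false alarms from arbitrary digit runs. The pattern, function names, and sample text are illustrative (4111 1111 1111 1111 is a widely used test card number), and production rules, for example in Snort, are written in that tool's own rule language rather than Python.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that match the card pattern and pass the Luhn check."""
    hits = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


print(find_card_numbers("Order ref 1234 5678 9012 3456, paid with 4111 1111 1111 1111."))
# ['4111111111111111']  (the order reference fails the Luhn check)
```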

Regular expressions, for example, can be used to detect structured data in documents, such as social security numbers,
credit card numbers, medical terms, and geographic information. Users can configure customised signatures and regular
expression rules in an open-source network intrusion detection system (IDS) such as Snort, which compares them against
sniffed packets to detect data leak attempts.

Statistical analysis focuses on the frequency of shingles (n-grams), which are typically fixed-size sequences of contiguous
bytes within a document. Item weighting schemes and similarity measures form another line of research in statistical
analysis, in which item weighting assigns different importance scores to items (i.e., n-grams) rather than treating them
equally. Collection intersection is a widely used statistical method for detecting the presence of sensitive data: the similarity
score between monitored content sequences and sensitive data sequences that are not allowed to leave the enterprise network
is computed by comparing their two sets of shingles.

Context-based approaches, by contrast, often profile users' normal behaviour to identify intruders or insiders. To mitigate
insider threats in database systems, one line of research proposed modelling normal users' data access patterns and raising
an alarm when a user deviates from the normal profile, rather than detecting the presence of sensitive data. Based on mining
database traces stored in log files, researchers proposed detecting anomalous access patterns in relational databases at a finer
granularity; their method can detect role intruders in database systems, that is, people who act differently from the people
who legitimately hold that role. Other researchers demonstrated the feasibility of detecting the weak signals characteristic of
insider threats in organisations' information systems using a set of algorithms and methods for detecting malicious insider
activities. By monitoring user activities and detecting anomalous behaviour, researchers have been able to identify and
respond to insider threats in data leak detection. A hybrid framework that combines signature- and anomaly-based approaches
has also been demonstrated: to detect unknown and insider attacks, the anomaly-based component learns a model of normal
user behaviour, while the signature-based component automatically creates anomaly signatures (e.g., patterns of malicious
activities) from alerts to prevent similar activities from being executed in the future. By capturing the semantics of user
intent and ensuring that a system's behaviour matches that intent, a system can be protected from malware activities such as
manipulating a host machine to send sensitive data to outside parties. Other systems monitor insider behaviour and activity to
detect malicious insiders who operate within their privileges but engage in activities outside the scope of their legitimate
responsibilities. Many of these context-based approaches use data mining and machine learning techniques, which have the
advantage of identifying outliers without requiring anomalous activities to be described precisely.

Watermarking is a technique for detecting and preventing data leaks by marking data of interest as it enters and exits a
network. The presence of a watermark in an outbound document indicates a potential data leak. Watermarks can also be used
for forensic (i.e., post-mortem) analysis, such as identifying the leaker after an incident.
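The forensic use of watermarks can be sketched by giving each recipient a copy of a document carrying a keyed tag, then checking outbound data for tags. To keep the example short, the tag is a visible line appended to the text; real watermarking hides the mark (for example in formatting or in the data values themselves) so that it survives copying and editing. The key, tag format, document ID, and recipient names are all hypothetical.

```python
import hashlib
import hmac
import re
from typing import Optional

SECRET_KEY = b"hypothetical-key-held-by-security-team"  # assumption: a managed secret
TAG_PATTERN = re.compile(r"\[ref:([0-9a-f]{16})\]")


def _tag(doc_id: str, recipient: str) -> str:
    # Keyed hash: without the key, nobody can forge a tag pointing at a different recipient.
    msg = f"{doc_id}:{recipient}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:16]


def mark(document: str, doc_id: str, recipient: str) -> str:
    """Give each recipient a copy carrying its own (here, visible) watermark."""
    return f"{document}\n[ref:{_tag(doc_id, recipient)}]"


def trace_leak(outbound: str, doc_id: str, recipients) -> Optional[str]:
    """Return the recipient whose watermarked copy appears in outbound data, else None."""
    match = TAG_PATTERN.search(outbound)
    if match is None:
        return None  # no watermark: not one of the copies we issued
    for r in recipients:
        if hmac.compare_digest(_tag(doc_id, r), match.group(1)):
            return r  # watermark found: potential leak, and the source copy is identified
    return None


staff = ["aina", "farid", "mei ling"]
copies = {s: mark("Q3 customer list (confidential)", "DOC-7", s) for s in staff}
print(trace_leak(copies["farid"], "DOC-7", staff))  # farid
```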
Trap-based defences are also effective against insider threats, as they entice and trick malicious users into revealing their
intentions.
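Returning to content-based DLPD, the collection-intersection method described above can be sketched by comparing sets of shingles. The sketch assumes character shingles of length 8 (rather than bytes), normalises case and whitespace, and scores a message by containment, the share of the sensitive text's shingles that also appear in the monitored text; normalising by the sensitive set rather than by the union (Jaccard similarity) is an assumption chosen so that a sensitive passage embedded in a longer message still scores highly. The sample text, function names, and 0.5 threshold are hypothetical.

```python
def shingles(text: str, n: int = 8) -> set[str]:
    """Fixed-size character n-grams of the text after case and whitespace normalisation."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def containment(monitored: str, sensitive: str, n: int = 8) -> float:
    """Fraction of the sensitive text's shingles that also occur in the monitored text."""
    a, b = shingles(monitored, n), shingles(sensitive, n)
    if not b:
        return 0.0
    return len(a & b) / len(b)


SENSITIVE = "Project Kenari acquisition price is RM 48 million, closing in March."  # hypothetical
THRESHOLD = 0.5  # assumed alert threshold

for msg in ["FYI: project kenari acquisition price is RM 48 million, closing in March. Keep quiet",
            "Lunch at 1pm tomorrow?"]:
    score = containment(msg, SENSITIVE)
    print(f"{score:.2f}", "ALERT" if score >= THRESHOLD else "ok")
```

Because the score is based on many overlapping fragments, light rewording of part of the passage lowers the score gradually instead of defeating detection outright, as it would with an exact digest.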

4 CONCLUSION

In conclusion, many data breaches have occurred over the years in our technology era, affecting numerous corporations and
governments. Data breaches such as the leak of blood donor information from the Australian Red Cross Blood Service in
2016, the intrusion into Malaysia Airlines data, and the case of 46 million phone numbers being leaked and sold mostly
resulted from the wrongdoing of irresponsible parties combined with systems that were not sufficiently secure. These
incidents not only harmed the organisations’ operations but also put the owners of the leaked information in greater,
unforeseen danger. Preventive measures that reduce the leakage of private data to irresponsible parties must be put in place
to avoid unanticipated data breaches. This paper explores organisational data policies, encryption, and the Data Leak
Prevention and Detection (DLPD) approach as examples of strategies to tackle these challenges.

After conducting this study, it is apparent that having secure systems that protect sensitive data from being hacked by
unauthorised individuals is critical to protecting the many groups associated with that data. It is hoped that individuals
and organisations will take the matter very seriously and take significant action on the security and privacy of their
data in the future to avoid unwanted events.

REFERENCES
[1] Arora, B. 2016. Exploring and analyzing Internet crimes and their behaviours. Perspect. Sci. 8. 540-542. doi: 10.1016/J.PISC.2016.06.014

[2] Monteith, S., Bauer, M., Alda, M., Geddes, J., Whybrow, P. C. and Glenn, T. 2021. Increasing Cybercrime Since the Pandemic: Concerns for
Psychiatry. Curr. Psychiatry Rep. 23. 4. 1–9. doi: 10.1007/S11920-021-01228-W.

[3] Khan, F., Kim, J. H., Mathiassen, L. and Moore, R. 2021. Data Breach Management: An Integrated Risk Model. Inf. Manag. 58.
1. 103392. doi: 10.1016/J.IM.2020.103392.

[4] Noor Sureani, N. B., Awis Qurni, A. S. B., Azman, A. H. B., M. Othman, M. B. B. and Zahari, H. S. B. 2021. The Adequacy of Data Protection
Laws in Protecting Personal Data in Malaysia. Malaysian Journal of Social Sciences and Humanities (MJSSH). 6. 10. 488–495.
https://doi.org/10.47405/mjssh.v6i10.1087

[5] Abdul Ghani Azmi, I. M., Zulhuda, S. and Wigati Jarot, S. P. 2012. Data breach on the critical information infrastructures: Lessons from the
Wikileaks. Proc. 2012 Int. Conf. Cyber Secur. Cyber Warf. Digit. Forensic, CyberSec. 306–311. doi: 10.1109/CYBERSEC.2012.6246173.

[6] Sharma, N., Oriaku, E. A. and Oriaku, N. 2020. Cost and Effects of Data Breaches, Precautions, and Disclosure Laws. Int. J. Emerg. Trends
Soc. Sci. 8. 1. 33–41. doi: 10.20448/2001.81.33.41.

[7] Cheng, L., Liu, F. and Yao, D. D. 2017. Enterprise data breach: causes, challenges, prevention, and future directions. Wiley Interdiscip. Rev.
Data Min. Knowl. Discov. 7. 5. doi: 10.1002/WIDM.1211.

[8] Chee, K. 2021. Data of 5.9m customers of RedDoorz hotel booking site leaked in Spore’s largest data breach. The Straits Times.

[9] Langlois, P. 2020. 2020 Data Breach Investigations Report.

[10] TrendMicro. n.d. Data breach. https://www.trendmicro.com/vinfo/us/security/definition/data-breach

[11] Imperva. n.d. Data Breach. https://www.imperva.com/learn/data-security/data-breach/

[12] Security Magazine. 2021. Malaysian Airlines is breached. https://www.securitymagazine.com/articles/94738-malaysian-airlines-is-breached

[13] BBC. 2017. Malaysian data breach sees 46 million phone numbers leaked. https://www.bbc.com/news/technology-41816953

[14] Bennet, S. 2008. Data Security Breaches: Problems And Solutions. The Practical Lawyer.

[15] Varshney, S., Munjal, D., Bhattacharya, O., Saboo, S. and Aggarwal, N. 2020. Big Data Privacy Breach Prevention Strategies. 2020 IEEE
International Symposium on Sustainable Energy, Signal Processing and Cyber Security (ISSSC). https://doi.org/10.1109/isssc50941.2020.9358878

[16] Cose. 2018. 5 Ways to Avoid a Data Breach. https://www.cose.org/en/Mind-Your-Business/Operations/5-Ways-to-Avoid-a-Data-Breach

[17] Groot, J. D. 2020. What is Data Loss Prevention (DLP)? A definition of data loss prevention. Digital Guardian.
https://digitalguardian.com/blog/what-data-loss-prevention-dlp-definition-data-loss-prevention

[18] Cheng, L., Liu, F. and Yao, D. 2017. Enterprise data breach: causes, challenges, prevention, and future directions. Wiley Interdiscip. Rev.
Data Min. Knowl. Discov. 7. 5. John Wiley & Sons Ltd. doi: 10.1002/WIDM.1211.
