A Thesis Paper
Presented to the FEU Diliman
In Partial Fulfillment of the Requirements
In Thesis
By:
Co, Chantel Dane A.
Fenis, Mikee T.
Gali, Julianne Kay E.
Nebria, Samuel John P.
Valeroso, Franco Enrique G.
Valiente, Merl Polin C.
Presented to:
Ms. Jackie Lou O. Raborar, PhD
APPROVAL SHEET
The Research Paper entitled, “Data Mining and Netizens’ Awareness: Understanding its
Implications on Social Media and Data Privacy”, prepared and submitted by Chantel Dane A.
Co, Mikee T. Fenis, Julianne Kay E. Gali, Samuel John P. Nebria, Franco Enrique G. Valeroso,
and Merl Polin C. Valiente has been approved and accepted as partial fulfillment of the
requirements for the degree of Bachelor of Science in Business Administration Major in
Financial Management and Business Analytics.
Ms. Jackie Lou O. Raborar, MBA, Ph.D Mrs. Ma. Jaysan Dasel Cadete-Cruz, CPA, MBA
Approved by the Committee on Oral Examination with a grade of PASSED on November 27,
2021.
Mr. Jojit C. Alcalde Dr. Myrna Lim Mr. Remy Manuel C. Pendon, CPA
Panel Chair
Accepted as partial fulfillment of the requirements for the degree of Bachelor of Science in
Business Administration Major in Financial Management and Business Analytics.
Far Eastern University – Diliman
Sampaguita Avenue, Mapayapa Village III,
Pasong Tamo, Quezon City
ACKNOWLEDGEMENT
First and foremost, praise and thank God, the Almighty, for his guidance and blessings
The researchers would like to express deep and sincere gratitude to the research coordinator of
this course, Ms. Jackie Lou O. Raborar, to the researchers’ adviser, Mrs. Ma. Jaysan Dasel
Cadete-Cruz, and to the researchers’ statistician, Ms. Karizza M. Abolencia for imparting their
knowledge and providing us guidance throughout this research. Their sincerity and motivation
The researchers would also like to thank the respondents in the pilot testing survey for sharing
their knowledge with us with all interest despite the difference between surveying in person and
online. The researchers would not be able to continue this research without their responses that
would support this study about the “Data Mining and Netizens’ Awareness: Understanding its
The researchers' thanks and appreciation also go to the colleagues and people who have willingly
ABSTRACT
Data mining is a convenient method of organizing large volumes of data, which in turn
identifies patterns even beyond what is considered quantifiable. With the advancement of
technology, data collection methods once considered unfeasible are now
possible and in fact prevalent throughout the industries that shape the economy and the means
understand the psychology of users towards the capabilities of data mining within the context of
social media. In parallel, social media users’ transparency with their information on such
platforms, as well as whether they prioritize convenience or privacy, is crucial to reaching a valid
conclusion on users’ opinion of data mining within social media: whether it is a
necessity for developing more convenient functions or a disruption to privacy that can jeopardize
their online data. Additionally, it is vital to seek expert opinions in order to gain a
comprehensive understanding of the modern context surrounding data mining, such as identifying
what specific data is susceptible to being collected, how safe social media users’ data is, and the
possible solutions to the issue at hand. Understanding the psychology of
social media users towards unseen processes behind every interaction on such platforms is
necessary to provide a concrete answer as to where to find the balance between the convenience
TABLE OF CONTENTS
TITLE PAGE
APPROVAL SHEET
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
Chapter 1. THE PROBLEM RATIONALE
1.1 INTRODUCTION
1.2 BACKGROUND OF THE STUDY
1.3 STATEMENT OF THE PROBLEM
1.4 SIGNIFICANCE OF THE STUDY
1.5 RESEARCH IMPEDIMENTS
1.6 DEFINITION OF TERMS
Chapter 2. RELATED LITERATURE
2.1 RELATED LITERATURE
2.2 THEORETICAL FRAMEWORK
2.3 RESEARCH PARADIGM
2.4 RESEARCH OBJECTIVES
2.5 RESEARCH QUESTIONS
Chapter 3. RESEARCH METHODS
3.1 RESEARCH DESIGN
3.1.1 LOCALE OF THE STUDY
3.1.2 POPULATION OF THE STUDY
3.1.3 SAMPLE SIZE
3.1.4 SAMPLING TECHNIQUES
3.1.5 DATA COLLECTION INSTRUMENTS
3.1.6 VALIDATION OF THE DATA COLLECTION INSTRUMENTS
3.1.7 METHOD OF DATA COLLECTION
3.1.8 METHOD OF DATA ANALYSIS AND PRESENTATION
CHAPTER 1
INTRODUCTION
correlations within large data sets, producing productive inferences. The information produced
from the said process allows users to make informed decisions such as when mitigating risks and
costs, improving company and business relationships, and increasing revenue among other
common uses observable across many industries and disciplines today such as but not limited to
Despite the benefits of data mining observed across professional fields and disciplines,
it is important to note that, while data mining is a convenient tool that can provide
accurate analysis from broader samples, it raises concerns surrounding privacy,
particularly when data supposedly collected at random can identify specific individuals
once aggregated, and when data is amassed indiscriminately and
includes details that can be directly associated with such individuals, such as names and basic
companies that responded to the survey apply data mining” in order to identify trends
more easily, develop smarter marketing strategies, and predict customers’ behavior. While there
is a growing demand for data mining across industries, does it mean that the level of awareness
of data mining is also increasing? Do people have enough knowledge about the effects of data
mining on social media and on their privacy? Are they sensitive to data privacy, or does
convenience matter more to them? The privacy of individuals is very important and, as such, the
researchers want to know the perspective of social media users on data mining. To the
best of the researchers’ knowledge, no study has yet been published on data mining within the context of
social media and data privacy. As such, the researchers decided to conduct this study
to provide a better understanding of how data mining affects social media, its users, and their
privacy.
Data mining is a discipline at the intersection of computer science and
analytics. The SAS Institute, a significant producer of statistics and
business intelligence software, developed the Sample, Explore, Modify, Model, and Assess
(SEMMA) model, while a collaborative effort of five companies from various
industries, working under the European Union's European Strategic Programme on
Research in Information Technology (ESPRIT), developed the Cross-Industry Standard Process for Data
Mining (CRISP-DM); these are among other data mining models of similar function (Saltz,
2020).
Despite the differences among the aforementioned data-mining models, the
process of data mining itself remains the same. Data is first assessed according to quality and
industry standards; trivial information, as well as inconsistent or incomplete data, is filtered out.
Data is then inserted into its respective databases, which may or may not overlap; it
is in this process that data is grouped with other complementary data. This is then
database, it is in this process that unnecessary attributes are eliminated if they are not to be included in the next
process. Data is then standardized and prepared: each database should contain uniform attributes of
specific data, such as a database for biographic data containing names, ages, addresses,
and the like, organized according to specific needs. Data is then mined, wherein patterns are
detected, such as uptrends for certain internet searches in a specific period. The patterns
found in the mined data are then evaluated, with users or data scientists giving their insights;
it is here that data is converted into valuable pieces of information with the use of graphical
towards interconnectedness across various purposes and users as mentioned prior. This is evident
in the demand for birth certificates and NBI clearances in the Philippines as proof of existence
and/or legitimacy of a person’s details, which are required most especially
in job applications, among other scenarios (Perdigon, 2017). On a broader
scale, changes in population, as well as their possible causes, can be analyzed through the
large amounts of data composed of countless relevant documents such as the certificates
mentioned prior. Evidently, data mining is utilized across various scales of data, small
and large.
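As a rough illustration of the filtering, standardizing, mining, and evaluation steps described earlier in this section, the process can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the records, the search term, and the function names (clean, standardize, monthly_counts, is_uptrend) are hypothetical and are not taken from SEMMA, CRISP-DM, or any cited source.

```python
from collections import Counter

# Hypothetical raw records: (search term, month). None marks incomplete data.
raw_records = [
    ("Data Privacy ", "2021-01"),
    ("data privacy", "2021-02"),
    ("data privacy", "2021-02"),
    (None, "2021-02"),
    ("data privacy", None),
    ("DATA PRIVACY", "2021-03"),
    ("data privacy", "2021-03"),
    ("data privacy", "2021-03"),
]

def clean(records):
    """Filter out incomplete records (a missing term or a missing month)."""
    return [(term, month) for term, month in records if term and month]

def standardize(records):
    """Give every record a uniform format: trimmed, lowercase terms."""
    return [(term.strip().lower(), month) for term, month in records]

def monthly_counts(records, term):
    """Mine the data: count searches for one term per month, in month order."""
    counts = Counter(month for t, month in records if t == term)
    return [counts[month] for month in sorted(counts)]

def is_uptrend(series):
    """Evaluate the pattern: True if every month exceeds the previous one."""
    return len(series) > 1 and all(b > a for a, b in zip(series, series[1:]))

prepared = standardize(clean(raw_records))
series = monthly_counts(prepared, "data privacy")
print(series, is_uptrend(series))
```

In practice each step would be far more involved, but the order of operations mirrors the description above: incomplete data is discarded first, the remaining data is made uniform, and only then are patterns detected and evaluated.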
One cannot deny that the growth of technology has significantly changed how people
share data and information. These changes can be observed through the use of the Internet in
various platforms available online such as search engines, social media networks, creative
content outlets, communications services, payment systems, and many more. However, the
number of users on social media networks is significantly higher than on other platforms.
Various statistics show that more than half of the world’s population now uses social media,
opening the door for these networks to pull data from their users that can be used in
various research and studies. However, there is a lack of ethical guidelines for research arising
from social media mining, hence the need to raise awareness of ethical research practice (Taylor,
2017).
As of late, there are 4.48 billion users of social media around the world, equating to more
than half of the world population regardless of age or internet access, as mentioned in a statistical
report related to search engine optimization (Dean, 2021).
reviewed to find out data mining techniques that are present in social media. In their study, they
found out that data mining techniques on social media have various strengths and weaknesses
depending on the type of informative data required and call for more profound research that
takes into account accurate implementation of data mining techniques in the academic and
industrial sectors. An article from Statista Research Department published in August 2021 shows
that 81.3 million Filipinos are active social media users. Despite these numbers, data mining
remains an unfamiliar term in the Philippines. In light of this unfamiliarity, the researchers
would like to fill the gap in knowledge about data mining, specifically its influence on
society. The researchers chose this topic to shed light on how social media networks use users’ data
and information in various ways. This in turn would raise awareness among Filipino social media
In recent years, data mining and information extraction have emerged as critical areas for
amount of data must be collected. Simple numerical values and text documents, as well as
more complex information such as location data, multimedia data, and hypertext pages, can be
employed. Data privacy is a widespread concern for individuals as well as corporate and public
entities. In most cases, data mining operations give rise to major data security and
protection difficulties. A notable example would be a retail company recording a customer's
grocery list. This information can provide a clear indicator of customer interest in a variety of
products (Kade, 2019). The most common view expressed by businesses and governments using
social media data mining to understand their consumers is that it leads to less privacy and more
surveillance. Individuals' private and sensitive information is collected in order to
create customer profiles and understand user activity patterns. Illegal access to information and
the confidentiality of information are becoming major concerns. Large volumes of sensitive and
private information about persons or businesses are obtained and retained when data is taken
from users. Given the sensitive nature of some of this data and the possibility of unauthorized
Data mining in itself has long been a topic of controversy, with several previous accounts
of its being instrumental in the infringement of data
privacy in ways that bypass ethical standards (IvyPanda, 2019). However, how it is utilized shapes the perception
of the public towards data mining: whether it is seen as used exclusively for
manipulation of public opinion, mass surveillance, and other ideas that may verge on
conspiracy theories, or as a tool that helps users collect reliable information,
allowing them to make judgments, decisions, and adjustments at a pace unthinkable by
yesterday’s standards (Keary, 2019). This example of radical progress in information
technology and other parallel fields must be made known to the public, to ascertain critically the
compromises and conveniences that data mining has to offer as well as the public’s
The researchers aim to assess data mining and netizens’ awareness, understanding
its implications on social media and data privacy. The researchers will make use of qualitative
and quantitative approaches to gather both primary and secondary data and information.
Specifically, this study seeks to determine whether Filipino netizens value their data privacy
more than convenience, whether they are ready to sacrifice privacy for convenience, or whether there
is a middle ground. The research aims to assess public opinion on data mining and what
can or should be done with the results, to determine the general inclination of netizens towards
data mining either as a tool for convenience or eavesdropping. Data mining is fraught with
technology as a collection of codes and programs. It's utilized in a variety of settings, including
but not limited to banking, research, and surveillance to discern the transparency of netizens
specifically with their online data. The researchers want to know what specific bits of
information are gathered from social media users by different sectors. The researchers seek to
have a better understanding of the challenges surrounding netizens' data privacy and security.
The researchers prefer to address severe issues like sites or applications that unlawfully sell data to
ascertain how aware netizens are of the presence of data mining on social media sites. As a
result, some users are hesitant to provide personal information even on suitable sites. The
researchers seek to discover what data is collected from internet users according to users of data-mining
programs. Finally, the researchers want assurance that the personal information provided by
netizens is truly secure and confidential. The researchers would also like to know where data flows because
not everyone who uses the internet is familiar with data mining to seek solutions from
Data mining has received great interest in the information industry in recent years as a
result of the widespread availability of huge amounts of data as well as the methods required to
transform data into valuable information and knowledge. Specifically, the researchers aim to
Netizens
To promote awareness of the use and collection of data, it is important to make known
the processes and approaches applied in data mining, as well as the data itself, the scales of data
used, and what data is collected by whom, among other findings. Netizens must understand how their
social media information is used and, as such, how it is controlled, processed, and filtered.
Society
Social media plays a significant role in society, where people widely use social
media platforms to express and critique opinions. The information provided in
this study shall build and raise data mining awareness, as well as promote every citizen's
conscious use of information, which includes information security understanding and the
Businesses
This study will give organizations a broader view, because data mining is widely utilized in a
variety of applications such as survey research, marketing, product analysis, demand and supply
analysis, investment trends in bonds and stocks, telecommunications, e-commerce, and so on. In
today's highly competitive corporate world, data mining is critical. As a result, data mining can
Future Researchers
review the various data mining trends from its inception to the future. It shall be helpful to
researchers to focus on the various issues of data mining given the constant need for
development in the field of data mining, specifically the design of the techniques used to gain
significant results in today’s competitive global marketplace. Data mining techniques possess
great potential for developing new sets of tools that can be used to enhance or circumvent the
privacy of the average person, increasing customer satisfaction, as well as providing safe and
This study covers the Netizens’ Awareness of Data Mining. The collection of data will be
conducted on Filipino netizens or social media users, ages 16 and above, regardless of gender.
Respondents will be given survey questionnaires to determine the overall public opinion towards
data mining from the perspective of the netizens. Researchers will also interview IT specialists or
IT professors of FEU Diliman to give the researchers a more precise interpretation of the
implications of data mining on social media and its ramifications on modern society. The results
of this study are intended to assess how data mining affects social media, its users, and society in
today’s generation.
participants since it would be risky for the group to conduct physical interviews. Most likely, the
practical way to achieve this is through e-mail, Messenger, virtual meetings, and other
online platforms. However, the group might also struggle to brainstorm or share ideas
through video conferencing. Face-to-face interaction among members is more
effective and engaging than remote meetings due to challenges like adaptability, struggle,
DEFINITION OF TERMS
For clarification, the following terms used in this study are defined:
robots to do tasks
Business Intelligence - the procedural and technical infrastructure that collects, stores, and
Causal relationship – occurs when a data set has a direct influence on another variable;
Computer nodes - any physical device within a network of other tools that's able to send,
CRISP-DM - an open standard process model that describes common approaches used by data
Data fusion - the process of integrating multiple data sources to produce more consistent,
accurate, and useful information than that provided by any individual data source
Data mining - a process of detecting anomalies, patterns, and correlations within large data sets
Data warehouse - a system used for reporting and data analysis and is considered a core
Deep learning - a type of machine learning and artificial intelligence that imitates the way
Machine learning - a type of artificial intelligence that allows software applications to become
more accurate at predicting outcomes without being explicitly programmed to do so; uses
Netizens - a buzzword pertaining to active participants in the online community of the Internet;
used interchangeably with the term “social media users” in this paper
a set of data through a process that mimics the way the human brain operates
Regression analysis - a set of statistical processes for estimating the relationships between a
Search engines - a software system that is designed to carry out web searches
SEMMA - Sample, Explore, Modify, Model, and Assess; a list of sequential steps developed by
CHAPTER 2
RELATED LITERATURE
Overview
Data mining is defined as the process of detecting patterns and vital information inferred
from particularly large data sets collected primarily for business intelligence activities (IBM
Cloud Education, 2021). These data sets are stored in data management systems known
as data warehouses, which process large amounts of data from various sources and, over
time, produce records vital to the analyses of data scientists and other users of the data.
Given that mined data are derived from multiple sources to be processed together, data
warehouses have come to be considered by organizations as their “single source of truth” (Oracle,
2020). Data mining is distinct from data analysis, however: data
analysis is focused on testing the efficacy of models as well as hypotheses, whereas data mining
emphasizes the use of machine learning and statistical assumptions to detect patterns
in large data sets that would otherwise go undetected by humans, primarily due to the quantity of
data and the possibility of human error, which data mining reduces to a significant extent
(Pedamkar, 2021). Furthermore, data mining relies on structured data and aims to make data
more functional, identifying data trends and patterns through mathematical and scientific models
(statistical models), whereas data analysis is a more holistic approach performed on multiple
kinds of data to make business decisions, for example, relying on the visualization of results and
creating insights through business intelligence and analytics models (Sarangam, 2021).
Timeline
Data mining is said to trace its conceptual origins to 1763 and Bayes’
theorem, which relates current probabilities to prior ones, among other
objectives. This theorem would become instrumental to data mining itself, given that
data mining deals with potential current or future outcomes derived from previous
data (Li, 2016). This was followed by the development of regression analysis through
Legendre and Gauss’ attempts to determine the orbits of celestial bodies around the Sun in 1805; regression
observational studies and in the field of finance particularly involving the equation of Capital
Asset Pricing Model (CAPM) (Corporate Finance Institute, 2021). In 1936, Alan Turing
proposed the idea of a machine capable of performing rapid computations on large amounts of
data which would serve as the fundamental idea behind the modern computer (Liesbeth, 2021).
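To make concrete the relation between current and prior probabilities in Bayes' theorem mentioned above, the following is a minimal sketch in Python. The function name and the numbers are hypothetical and chosen only for illustration; they do not come from Li (2016) or any other cited source.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).

    prior               -- P(H), the probability of the hypothesis beforehand
    likelihood          -- P(E | H), the chance of the evidence if H is true
    false_positive_rate -- P(E | not H), the chance of the evidence if H is false
    """
    # Total probability of observing the evidence at all.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical figures: 1% of accounts are fraudulent, a rule flags 90% of
# fraudulent accounts, and it also flags 5% of legitimate accounts.
updated = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(updated, 4))
```

Under these assumed figures, the probability that a flagged account is fraudulent rises from the 1% prior to roughly 15%, showing how earlier data updates the estimate of a present outcome.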
A few years later, in 1943, the conceptual model of a neural network was
proposed by Warren McCulloch and Walter Pitts, who suggested that neurons are primarily
capable of three functions: to receive, process, and then generate output (McCulloch, 2016).
Lawrence Fogel then founded the first company in 1965 (Decision Science, a subsidiary of
Titan Systems, Inc. since 1982) to apply computation to problems such as the development of
missile evasion for military hardware, medical diagnostics, risk management, and other fields
In 1975, John Henry Holland wrote his book Adaptation in Natural and
Artificial Systems, which proposes the theoretical foundations of how data mining is understood
today (Grimes, 2015). Nearly two decades later, in 1992, Bernhard E. Boser, Isabelle M. Guyon, and
Vladimir N. Vapnik improved the support vector machine, a supervised learning
model, to create nonlinear classifiers (functions used to separate instances that are otherwise
inseparable through linear classification), enhancing its data analysis and pattern recognition
properties for classification and regression analysis (Paul, 2017). Nine years later, data science
was formally introduced by William S. Cleveland as a stand-alone discipline
(Udemy, 2016). In 2015, the White House appointed Dhanurjay Patil as its first Chief Data Scientist
Examples of Purposes
Data mining is present across countless professional fields due to its many potential uses.
In healthcare, it is used to measure the effectiveness of certain treatments;
according to an article from the Morsani College of Medicine, it is with the aid of data mining
that practitioners can find the best course of action for treating patients with their respective
conditions at a pace more rapid and efficient than ever, because data mining
collects data containing necessary remarks and quantifiable results, making subsequent diagnosis
and treatment processes more efficient (USF Health, 2021). The
same article also states that data mining is applied to detect fraud and abuse, in the sense that it is
used to assess potential abnormalities in prescriptions and insurance claims. It cites the Texas
Medicaid Fraud and Abuse Detection System as an ideal example: in 1998, it was
able to recover $2.2 million in stolen funds and identify a total of 1,400 individuals for
investigation.
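One simple way to picture how abnormalities of this kind might be flagged is a z-score check, which marks values that sit unusually far from the average. The sketch below is an assumption made purely for illustration: the claim amounts, the threshold of 2.0, and the function name flag_outliers are hypothetical, and this is not the method used by the Texas system or described by USF Health.

```python
from statistics import mean, pstdev

def flag_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    average = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - average) / spread > threshold]

# Hypothetical claim amounts; one is far larger than the rest.
claims = [120, 130, 125, 118, 122, 127, 900]
print(flag_outliers(claims))
```

A flagged value is only a prompt for human review; real fraud detection systems combine many more signals than a single statistical cutoff.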
Likewise in business, data mining serves several purposes that encompass most processes, as
is evident with service providers collating billing details, interactions, and customer visits to the
providers’ websites, among others; such compiled data is analyzed to give service
providers an idea of what incentives or promotions to offer (Matillion, 2020). In line with
this, service providers in Malaysia such as Celcom and Maxim have been known to assess their
performance through their customers’ feedback on Twitter, illustrated in the form
of graphs from the said application (Burhanuddin et al., 2021). Likewise in retail, valuable data
is collected and analyzed to give retailers an idea of what products to offer depending on
variables like seasonality or trends surrounding products; as such, data such as sales,
transactions, and demographics are valuable for retailers to measure risk and make
recognized as the predictive nature of data mining (Karim & Sirwan, 2019). The electronic
commerce industry has also been known to employ data mining to assess historical
financial, inventory, and transaction data in order to optimize enterprise resources and detect fraud and
abnormal events, as observed with the likes of Amazon and Shopee (Chen, 2017).
Data mining is also present in the field of finance as it can be used to detect trends,
spikes, and downtrends in sales through compiled historical data, as well as to determine the
flow of capital thereof, and to measure the ratio between profit and sales (Datalya, 2018).
Likewise, programs have been developed for traders who engage in trading equities and
currencies to trade automatically with less human interaction; these programs predict future
outcomes such as uptrends and dips from previous data, using intelligent analysis
of statistics inferred from the constant fluctuations in the prices of such commodities (Popescu,
2021).
Lately, data scientists have conducted innumerable studies on deep learning algorithms,
which have the potential to break the current boundaries
of data mining, specifically through brain simulations that improve the capabilities and current
use of learning algorithms, and to drive further advances in machine learning and artificial
intelligence. All of this is said to be possible
due to modern data mining’s use of larger neural networks, which significantly
overshadow prior learning algorithms and can therefore potentially expand its uses to more
Foreign Controversies
Despite the convenience that data mining provides, privacy and data mining
have become more intertwined than ever as a result of prior events, particularly Edward
Snowden, a former National Security Agency contractor, exposing the said agency’s
misuse of data mining to eavesdrop on citizens within and beyond United States
territories through the Planning Tool for Resource Integration, Synchronization, and
Management (PRISM) program, which was proposed to prevent further terrorist attacks akin
to the September 11 attacks, which cost thousands of lives and had a negative impact on the U.S. economy
(Houston, 2017). To further expand on other tools used by the NSA, said agency employs
collected data of which has gathered data primarily focusing on telephony and internet data from
countries such as Germany, France, and Spain among other countries (ET Bureau, 2019).
Another program employed by the aforementioned agency was Tempora, which was spearheaded
used by the same agency, a Linux software also leaked by Edward Snowden that uses the Apache
web server and stores data in MySQL databases which enables users of said program to access
any individual’s emails, and has been reported to be also used by the Bundesnachrichtendienst,
Australian Signals Directorate, and Japan’s Defense Intelligence Headquarters among other
their use of data mining is to oversee a balance between the acquisition of information and the
protection of private interests due to its duty (that of said agency) to protect the rights of the U.S.
citizens, emphasizing that data mining must be used responsibly, hence, evidence of conflicting
interests between agencies even of the same country observed through the use of data mining
(Central Intelligence Agency, 2016). It must be noted, however, that the CIA, as well as other
members of the United States Intelligence Community, outsources its data-mining activity to
Palantir Technologies, which has had its share of controversies recently, particularly for
colluding with Cambridge Analytica, a British consulting firm that has been accused of
influencing U.S. elections as well as the Brexit vote (Bensaid, 2020). This raises concerns
about the CIA's handling of data and reinforces its reputation for acquiring information
through questionable means, as evident in prior projects. Furthermore, another online article
emphasized the CIA's incompetence in handling data a year after the aforementioned data mining
report: an undisclosed CIA employee stole a total of 34 terabytes of data, which was uploaded
to WikiLeaks as a result of a lack of security for which the Center for Cyber Intelligence was
held accountable.
In 2012, Republic Act 10173, also known as the Data Privacy Act (DPA) of 2012, was
passed. This law states that although it is the State's duty to ensure the free flow of
information in order to promote innovation and growth in the country, the State also recognizes
its citizens' fundamental right to privacy of communication and honors its inherent obligation to
government or private sectors are secured and well protected (Ecci International, n.d.). The law
created the National Privacy Commission (NPC), which is in charge of monitoring the compliance
of the Philippines with the international standards set for data protection. The NPC is also
tasked with administering and implementing the provisions of the DPA. The NPC was formally
organized in 2016 and is headed by privacy commissioner Raymund Liboro, assisted by two
deputy commissioners, namely Dondi Mapa and Ivy Patdu. The NPC's mandate is also to
ensure its own compliance with said law, to receive and resolve
complaints, to issue cease-and-desist orders, to ensure that local authorities comply with
the aforementioned act, to promote a culture of protecting the data privacy of all individuals
and to guide parties that seek its assistance in observing that culture, as well as to enforce
data privacy laws across borders (National Privacy Commission, n.d.).
In connection to our study, we emphasize three general principles with respect to the
collection and processing of personal data according to the NPC. The entities covered by the Act
and the Implementing Rules must adhere to the following principles: transparency, legitimate
purpose, and proportionality. The principle of transparency relates to social media and data
mining as it requires the purpose in processing a person's data to be determined and disclosed
before its collection. Transparency allows the data subject to be aware of the nature, purpose, and
extent of the processing of his or her personal data, including the risks involved, the identity of
the personal information controller, their rights as a data subject, and how these can be
exercised. Information and communication related to the processing of personal data must
be easily accessible and understood by the data subject. The principle of legitimate purpose
requires that the gathering and processing of information be compatible with the specified
purpose and not be contrary to law, morals, or public policy. The principle of
proportionality requires that the processing of data be adequate, relevant, suitable,
necessary, and not excessive in relation to the required purpose. Personal data shall be
processed only if the
purpose of the processing could not be fulfilled by other means (National Privacy Commission,
2016).
It is also worth noting that the Philippine National Police Anti-Cybercrime Group
(PNP-ACG) makes use of J48, an open-source Java implementation of the C4.5 decision tree
algorithm, to mine texts. The algorithm has analyzed and identified online scams in a dataset
consisting of 82 documents with a total of 14,098 mainly Filipino words, or attributes, in
order to respond to complaints regarding online scams, which reportedly increased from a
double-digit figure in 2013 to a triple-digit figure in 2017, a span of four years, according
McRobbie and Thornton undertook their stock-taking precisely as media systems were
further destabilized, focusing on the diversity of traditional media space. With the dawn of the
twenty-first century, digital platforms have come to underpin and even define social life in rich
cultures, with individuals' identities and relationships being fostered at least in part through
computing infrastructures (Lupton, 2018). Social media is one of the most visible expressions of
changed how information is produced and exchanged. They promote vernacular discourse and
disseminate massive amounts of "user-generated content" (Yar, 2014). Digital platforms are also
displacing traditional media as a source of information. Finally, their nature as loosely coupled
networks of users not only promotes virality – the quick and unpredictable spread of content –
but also creates a broad virtual sociality (Baym, 2015). Various features, such as 'likes',
'retweets', hashtags (#), mentions (@), and so on, index and anchor communications, enhancing
awareness of others and linking spatially dispersed users into communities of shared interest and
According to a survey reported by Auxier et al. (2019), 81% of adults believe the risks of
companies collecting data about them outweigh the benefits, while 66% believe the same about
government data gathering. Similarly, 72% of adults think they get very little or no value
from companies collecting data on them, and 76% feel the same about government data
collection. Companies collect data with the goal of profiling customers and perhaps targeting
the sale of goods and services to them based on their attributes and habits. As per the
survey, 77% had heard or read at least a little about how businesses and other organizations
use personal data to target advertisements or special offers, or to estimate how risky
customers might be. Approximately 64% of all adults say they have received advertisements or
solicitations based on their personal information, and 61% of those who have seen advertising
based on their data feel the ads at least somewhat accurately reflect their interests and
characteristics, which equates to 39% of the adult population (Auxier et al., 2019).
Nowadays, netizens actively voice their opinions online, in both forums and social
networks. Opinions initially intended for their groups of friends propagate to the
attention of many. This pool of opinions, in the form of forum posts and messages written on
micro-blogs, Twitter, and Facebook, constitutes online opinion that represents a community of
online users. The messages might seem trivial when each of them is viewed singly; that
said, however, their aggregate serves, after analysis, as a potentially useful source of
feedback on current affairs. A local government, for instance, may be interested to know how
citizens respond to a newly announced policy, based on their voices collected from the
Internet. However, such online messages are unstructured, their contexts vary greatly, and that
applications, public opinion in online education refers to the tendentious individual attitude and
subjective will expressed by netizens around the occurrence, development, and change of a
certain educational phenomenon (including ideas, events, figures, policies, problems, etc.) in the
virtual space of the Internet. With the further development of mobile Internet and social network
technologies, the expression forms of education public opinion are increasingly diversified,
and public opinion data in particular has shown multi-dimensional characteristics. In recent
years, education policy has received more and more attention, and the network has become the
most concentrated display platform of national discourse. Furthermore, with the deepening of the
comprehensive reform of education, the education authorities also hope to receive feedback from
the people on the adjustment of education policies and reform measures, to provide the basis for
the decision-making of education reform. However, current works on education public opinion
analysis focus on qualitative analysis of educational events, such as the use of questionnaires and
other forms, resulting in limited data sources, and the analysis angle and contents are also limited
to the analysis of statistical results (Liu et al., 2021). With the development of big data and data
mining technologies, the patterns of education public opinion dissemination can be clarified
in greater depth, and the deep-seated views on educational events can be mined. Thus, valuable
information
can be extracted from massive, noisy, and fuzzy data (Liu et al., 2021). Therefore, taking the
cross-dimension data of public opinion in online education as the starting point, this paper makes
The number of individuals using social media is growing, with an estimated 4.48 billion
people using it as of July 2021. This translates to more than 57% of the global population. The
number of persons accessing social media has increased by 13.1% in the last year. Between
July 2020 and July 2021, 520 million new users joined social media, an average of about
1.4 million new users every day. More than nine out of ten internet users use social media at
least once a month, and six social media networks have more than one billion monthly active
users. The Philippines had 89 million social media users in January 2021. The number of
Filipinos using social media increased by 16 million (+22%) between 2020 and 2021. Social
media users in the Philippines accounted for 80.7% of the total population in January 2021
(Kemp, 2021).
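The figures cited from Kemp (2021) can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable and function names are our own, and the derived values (users per day, growth rates, and implied populations) are back-calculated from the rounded figures quoted above rather than taken from the report itself.

```python
# Cross-check of the social media figures quoted from Kemp (2021).
# All inputs are the rounded values cited in the text, so every derived
# value is an approximation, not a figure reported by the source.

GLOBAL_USERS_JUL_2021 = 4.48e9   # global social media users, July 2021
GLOBAL_SHARE = 0.57              # "more than 57%" of the global population
NEW_USERS_12_MONTHS = 520e6      # new users, July 2020 to July 2021
PH_USERS_JAN_2021 = 89e6         # Philippine social media users, January 2021
PH_NEW_USERS = 16e6              # increase between 2020 and 2021
PH_SHARE = 0.807                 # share of the Philippine population


def per_day(total, days=365):
    """Average daily increase over a period of `days` days."""
    return total / days


def growth_rate(current, increase):
    """Growth relative to the previous period's total (current - increase)."""
    previous = current - increase
    return increase / previous


new_users_per_day = per_day(NEW_USERS_12_MONTHS)
global_growth = growth_rate(GLOBAL_USERS_JUL_2021, NEW_USERS_12_MONTHS)
ph_growth = growth_rate(PH_USERS_JAN_2021, PH_NEW_USERS)

# Populations implied by dividing user counts by the quoted shares.
implied_ph_population = PH_USERS_JAN_2021 / PH_SHARE
implied_global_population = GLOBAL_USERS_JUL_2021 / GLOBAL_SHARE

print(f"New users per day:         {new_users_per_day / 1e6:.2f} million")
print(f"Global year-on-year growth: {global_growth:.1%}")
print(f"Philippine growth:          {ph_growth:.1%}")
print(f"Implied PH population:      {implied_ph_population / 1e6:.1f} million")
print(f"Implied global population:  {implied_global_population / 1e9:.2f} billion")
```

Under these inputs, the global and Philippine growth rates round to the 13.1% and 22% quoted in the text, which suggests the cited figures are internally consistent.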
For the third year in a row, the Philippines holds on to its title as the social media
capital of the world, based on the Global Digital Report of 2019 by We Are Social.
According to the report, there are 76 million active Filipino social media users, accounting
for 71% of the entire population. These users spend an average of four hours a day on
different social
media platforms which is quite questionable considering that the country’s internet speed
Many factors affect social media growth in the Philippines. One of these is the
socio-economic aspect of Filipinos. The wide range of cheap smartphones in the country
creates more potential users of social media from a wider socio-economic demographic. This
may also explain why 67% of social media users access it via mobile devices. The next factor
is the age of social media users. According to the Digital Report conducted by We Are Social
in 2019, the largest group of social media users within the Philippines is in the 18-24 age
range, which makes up 33% of active users with around 21 million users. Given this report, it
is interesting to understand how intertwined social media platforms are with users' social,
personal, and academic lives, as the age range spans university students up to those in their
early careers. Nowadays, it is a common practice among classes to utilize social media
platforms as a virtual meeting place outside the classroom, since social media platforms are
created to be easily accessible and to offer more room for interaction than the typical
bulletin board inside the classroom. Aside from that, the cultural aspect of Filipinos is also
one of the factors that affect social media growth in our country. Filipinos are very social
people and are known for their hospitality. With an estimated 10.2 million Filipinos abroad,
the culture of using social media platforms to connect with their families helps bridge the
distance in a way that other applications cannot provide. Therefore, social media is still,
after all, a means of connecting (Estares, 2019).
Social media in the Philippines will continue to grow and become part of every Filipino's
day-to-day life. Whether this will be a positive or negative experience will depend on how
knowledgeable users are about data mining, and on whether they use social media as something
that can impact their lives or merely as a tool that will compromise their private
information.
THEORETICAL FRAMEWORK
The theory of the Three-tier Big Data Mining Architecture established by Wang (2019)
supports this research. Figure 1 depicts a generally recognized data mining analytical
architecture
with three layers: data access and computing; data privacy and domain knowledge; and big data
mining methods. The theory was based on data mining techniques and philosophy.
The figure shows that the inner core data mining platform is primarily responsible for
data access and computation, according to Wang (2019). With the growing volume of data, it is
more important than ever to consider distributed storage of large-scale data while computing.
That is, data analysis and task processing are broken down into numerous subtasks and run in
parallel on a large number of computer nodes. The structure's middle layer is critical for
connecting the inner and outer layers. The inner layer's data mining technology provides a
foundation for data-related activities in the middle layer, such as information exchange,
privacy protection, and knowledge acquisition from regions and applications, among other
things. Information exchange is not only the assurance of each phase in the process but also the
objective of processing and analyzing big data in the smart grid. Data fusion technology is used
at the outer layer of the architecture to preprocess heterogeneous, unpredictable, incomplete, and
multi-source data. Complex and dynamic data will be mined after preprocessing, and ubiquitous
smart grid global knowledge will be acquired by local learning and model fusion. Finally, the
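The idea of breaking a task into subtasks that run in parallel and are then merged can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the architecture described by Wang (2019): the function names and sample posts are hypothetical, and a local thread pool merely stands in for the "large number of computer nodes" mentioned above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Split a list of records into at most n_chunks roughly equal parts."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_words(chunk):
    """Subtask: count word frequencies within one chunk of posts."""
    counts = Counter()
    for post in chunk:
        counts.update(post.lower().split())
    return counts


def parallel_word_count(records, n_workers=4):
    """Run one subtask per chunk concurrently, then merge the partial results."""
    chunks = split_into_chunks(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical sample posts, invented for illustration only.
posts = [
    "data privacy matters",
    "social media and data mining",
    "privacy on social media",
    "data mining and privacy",
]
print(parallel_word_count(posts, n_workers=2))
```

The merged result is the same as counting over the whole list at once; only the work is divided. In a real distributed setting each chunk would live on a different machine, which is why distributed storage matters as much as the computation itself.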
The theory of the three-tier big data mining architecture states that ‘When data mining is
used properly, it may provide you with a significant competitive edge by allowing you to
understand more about your consumers, develop successful marketing strategies, boost revenue,
and save expenses.’ According to Wang and Zhang (2019), data mining can provide answers to
business issues that were previously too time-consuming to answer manually. Users can uncover
patterns, trends, and correlations that they might otherwise overlook by employing a variety of
statistical approaches to examine data in various ways. They may use the information to forecast
what will happen in the future and take action to affect company results; relating to the study of
determining Data Mining and Netizens’ Awareness: Understanding its Implications on Social
RESEARCH PARADIGM
Figure 2. The Conceptual Framework for Data Mining and Netizens’ Awareness:
Understanding the Implications of Data Mining on Social Media and Its Ramifications on
Modern Society
The figure above presents the conceptual framework of the study “Data Mining and Netizens’
Awareness: Understanding its Implications on Social Media and Data Privacy”. The figure
includes the uses of data mining across various industries as the independent variable and the
extraction of users’ data as the mediator, while the dependent variable contains the
The independent variable comprises seven parameters, followed by the mediating variable
with two parameters. The seven parameters of the independent variable are the root cause of
the existence of the mediating variable: the collection of users’ data is required for
industries to utilize the benefits of data mining. The mediating variable then affects the
dependent variable, which has two parameters: the significant and the insignificant
implications of data mining for netizens. The first parameter contains the significant, or
positive, implications of data mining for users, while the other consists of the insignificant
implications, or the issues users have with the application of data mining. The extraction of
users’ data causes distress to netizens, as their identity and privacy are involved. Data
mining is widely used today by many companies to study patterns in huge sets of data so that
they can learn more about their customers. Industries that utilize data mining are capable of
having a competitive
The purpose of the framework is to identify how data mining works, what types of user
data are extracted, and how and where the data are stored, in order to promote awareness and
provide proper understanding to netizens. This can also help users protect their privacy,
minimize the negative impact of the issues associated with data mining, and appreciate the
RESEARCH OBJECTIVES
The study on Data Mining and Netizens’ Awareness: Its Implications on Social Media
1. To determine the general opinion of netizens towards data mining either as a tool for
2. To ascertain how aware netizens are of the presence of data mining on social media sites.
4. To seek solutions from professionals that can help netizens protect their private
information.
RESEARCH QUESTIONS
1. Are netizens more inclined to associate data mining with making online content more
2. Are netizens aware of the capabilities of data mining occurring in social media?
3. How transparent are netizens with their data in their social media accounts?
4. Do netizens prioritize convenience over privacy in terms of social media access, or vice
versa?
CHAPTER 3
RESEARCH METHODS
RESEARCH DESIGN
According to De Vaus (2006), “The research design is the overall strategy chosen to be
used in integrating the different parts of the study in a coherent, logical way. It constitutes the
blueprint for the collection, measurement, and analysis of data.” The research design used was
In this study, the researchers employed both qualitative and quantitative designs to
enhance the research and support the reliability and validity of the study.
The qualitative method was applied in this study to emphasize the quality of information
and processes that are not mathematically examined or measured in terms of quantity, amount,
intensity, or frequency.
The quantitative method was used in this study to emphasize objective measurements
including the statistical, mathematical, and numerical analysis of data collected through
The study utilized the said approach as it is meant to determine the implications of data
mining as well as its ramifications on modern society by conducting surveys and interviews with
participants, collecting their descriptive data, and analyzing it using statistical methods to
The study will be conducted in the National Capital Region (NCR) since the chosen
The National Capital Region (NCR) is also known as Metropolitan Manila. It is the
country’s political, economic, and educational center. On November 7, 1975, Metro Manila was
formally established through Presidential Decree No. 824, under the management of the
The population is the entirety of all the subjects and objects that conform to a set of
specifications, comprising a group of people that is of interest to the researcher (Polit and
Hungler, 1999). The study population consisted of netizens (social media users) and IT
specialists in the Philippines. These subjects were chosen as the population of the study
because they can provide different perspectives on data mining and a precise interpretation of
the implications of data mining on social media and its ramifications on modern society, which
can help the researchers fulfill the purpose of this study.
SAMPLE SIZE
Mouton (1996:132) defines the sample as the subjects selected from a group with the
intent to discover something about the total population from which they are taken. The sample
of the population of this study included a minimum of 250 Filipino netizens or social media
users and 3 Filipino IT specialists. Convenient subjects were selected to participate in the
study until a sample size of 250 was attained. A convenience sample is composed of subjects
included in the study because they happen to be in the right place at the right time (Polit &
Hungler 1993:176). The 250 Filipino netizens, aged 16 and above and currently residing in NCR,
were chosen via email and Messenger, while the 3 Filipino IT professors were selected from FEU
Diliman. The sample of 250 netizens and 3 IT professionals comprised the total subjects who met
SAMPLING TECHNIQUES
sampling involves using respondents who are “convenient” to the researcher (Galloway, 2005).
Since physical distancing is highly encouraged during this pandemic, the researchers chose
this sampling technique because of the limitations on conducting the study physically or
face-to-face.
checklist of researcher-made questionnaires to gather the necessary data for the Data Mining and
Netizens’ Awareness: Understanding its Implications on Social Media and Data Privacy. Based
on the researchers' readings, previous studies, technical literature, and published and
unpublished theses relevant to the analysis, the draft of the questions was drawn up. The
criteria for the design of good data collection tools were considered in the preparation of
the instrument. For instance, to match the participants' level of familiarity with the topic,
the statements explaining the circumstances or difficulties involved were toned down.
Open-ended options were provided to obtain valid responses from the netizens and IT
professors. The preference for standardized questions rests on several assumptions, such as
a) it is a less costly means of data collection, b) it avoids personal bias, c) it places
less demand on an immediate response, and d) it gives participants a greater sense of privacy.
Ultimately, it promoted transparent solutions to the critical issues at hand. Furthermore, a
few consultants and advisers validated the instrument before it was used for the analysis.
Survey Questionnaire
Part 1:
Facebook
Twitter
Instagram
Youtube
Tiktok
Shopee
Lazada
PART 2: This section determines the respondents’ awareness of some of the modern capabilities
of data mining. (On a scale of 1-4, 1 strongly disagree, 4 strongly agree)
PART 3: This section determines the online transparency of social media users, and therefore
their susceptibility to data mining. (On a scale of 1-4, 1 strongly disagree, 4 strongly agree)
PART 4: This section determines whether social media users prioritize privacy over
convenience, or vice versa. (On a scale of 1-4, 1 strongly disagree, 4 strongly agree)
was created that included critical factors such as our respondents' awareness of the conventional
uses of data mining across various industries, as well as its modern capabilities and share of
controversial uses; our respondents' online habits and their preference for convenience over
In order to increase the validity of the potential output, the services of an outsourced
statistician were enlisted to validate the survey; after validating the survey, the statistician
emphasized the importance of including an introductory page that requests our respondents'
consent in accordance with the Data Privacy Act of 2012, which is coincidentally related to the
Furthermore, prior to formally releasing our survey, a pilot test of 25 respondents from
the selected population was undertaken to assess the possible consistency of our data and to
reduce the possibility of disseminating an incoherent survey at the last minute. The
statistician then used tau-equivalent reliability, commonly known as Cronbach's alpha, to
analyze the findings, which yielded a coefficient of .930 derived from the survey responses.
According to the statistician, a value of around .9 indicates that the survey was
well-designed and, as a result, is
Prior to the official distribution of our survey, we received these responses from our pilot
testing:
Statement   1 (Strongly disagree)   2    3    4 (Strongly agree)
1           0                       1    9    15
2           0                       5    5    15
3           0                       6    5    14
4           3                       3    10   8
5           3                       3    6    13
6           3                       5    3    13
7           7                       4    9    3
8           3                       3    9    9
9           8                       6    5    5
10          3                       6    5    10
11          1                       6    9    8
12          5                       8    8    2
13          0                       0    8    17
14          0                       0    9    16
15          0                       5    14   4
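The pilot responses above can be summarized per statement, and the reliability coefficient mentioned earlier can be illustrated, with a short sketch. Several assumptions apply. The four count columns are read as scale points 1 to 4 (strongly disagree to strongly agree), following the survey instructions. Some rows total fewer than 25, so each row is divided by its own number of answers. Cronbach's alpha, computed as k/(k-1) × (1 − sum of item variances / variance of total scores), requires each respondent's individual answers, which the table does not report; the respondent-level scores below are therefore hypothetical and will not reproduce the reported .930.

```python
from statistics import pvariance

# Response counts per statement from the pilot test table, ordered by scale
# point (1 = strongly disagree, 2, 3, 4 = strongly agree). The column-to-scale
# mapping is an assumption based on the survey instructions.
PILOT_COUNTS = {
    1: (0, 1, 9, 15),   2: (0, 5, 5, 15),   3: (0, 6, 5, 14),
    4: (3, 3, 10, 8),   5: (3, 3, 6, 13),   6: (3, 5, 3, 13),
    7: (7, 4, 9, 3),    8: (3, 3, 9, 9),    9: (8, 6, 5, 5),
    10: (3, 6, 5, 10),  11: (1, 6, 9, 8),   12: (5, 8, 8, 2),
    13: (0, 0, 8, 17),  14: (0, 0, 9, 16),  15: (0, 5, 14, 4),
}


def summarize(counts):
    """Return (number of answers, weighted mean, share answering 3 or 4)."""
    n = sum(counts)
    mean = sum(point * c for point, c in enumerate(counts, start=1)) / n
    agree_share = (counts[2] + counts[3]) / n
    return n, mean, agree_share


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Population variances are used for both the items and the totals; the
    ratio is the same as with sample variances as long as both match.
    """
    k = len(scores[0])
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# HYPOTHETICAL respondent-level scores (5 respondents x 4 items), invented
# for illustration; they are not the study's pilot data.
HYPOTHETICAL_SCORES = [
    [4, 4, 3, 4],
    [3, 3, 3, 3],
    [2, 2, 1, 2],
    [4, 3, 4, 4],
    [1, 2, 1, 1],
]

for statement, counts in PILOT_COUNTS.items():
    n, mean, agree = summarize(counts)
    print(f"Statement {statement:>2}: n={n}, mean={mean:.2f}, agree={agree:.0%}")

alpha = cronbach_alpha(HYPOTHETICAL_SCORES)
print(f"Cronbach's alpha (hypothetical data): {alpha:.3f}")
```

The per-statement means and agreement shares give a compact check on the narrative interpretation of each statement, while the alpha function shows why consistent answer patterns across items push the coefficient toward 1.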
Our initial findings suggest that a majority of our respondents are aware of algorithms
being utilized on social media sites (statement 1). The same number are completely aware of
personalized ads when using online shopping platforms, with five respondents having a grasp of
the concept and another five being unaware (statement 2). The majority of our respondents are
also aware of the online presence of law enforcement and other public agencies for the purpose
of investigating and preventing cybercrime, with an outcome similar to that of the previous
statement for respondents who merely have a grasp of the idea and those who are partially
unaware of their presence (statement 3). Meanwhile, we received a majority of positive
responses pertaining to their knowledge of the organizational and anti-fraud uses of databases
across industries such as in banking and
to the general concept of data mining and its ability to rapidly organize data (statement 5).
complete and authentic basic information such as their real names, and contact information on
social media sites (statement 6). Half of our respondents admit to having publicly viewable
social media accounts, the other half presumably being hidden from public view or utilizing
anonymous accounts (statement 7). Our findings also suggest that a majority of respondents
read the terms and conditions prior to using an application, contrary to the common belief
(statement 8). On the other hand, more than half of the respondents deny sending sensitive
information such as passwords and proof of transactions via social media sites (statement 9),
while more than half of the responses suggest our respondents habitually comment
Most of our respondents agree that the online algorithms used by social media sites are
designed to augment convenience, such as when following content aligned with their interests
and when online marketplaces suggest items based on our respondents’ previous searches
(statement 11), although it is worth noting that more than half of our respondents are not
willing to deliberately provide their information for any undisclosed purpose by social media
sites (statement 12). Parallel to our findings on statement 7, all of our respondents attest
to modifying their accounts by choosing which information can be viewed publicly and which
exclusively; most are apparently inclined to make their accounts more private, with more than
a quarter making only minimal changes toward making their accounts more private (statement
13). Most of our respondents also expect the data that they choose to delete to be completely
erased, without any traces, with more than a quarter suggesting options with which they can
still have a back-up, presumably for when data is deleted accidentally or on impulse
(statement 14). Lastly, a majority of our respondents expect their details on their social
media sites to be secure and free from being
Interview Questions
● Given that FEU Diliman has an extensive and efficient program for student and faculty
identification, would you say that it is because of this system that it is easier for students
and professors alike to integrate themselves into Canvas accounts, as well as to ask for a
generated password for internet access within the premises? What kind of data exactly is
collected in the school database?
● What kind of programs are used to create the school database? And is there a name for
the database?
● Were there any issues during the development of that system?
● Would you say that there is at least a process of data mining present in the collection of
student data?
● We are aware that data mining is one of the topics taught under our integrated IT subjects
such as business analytics. What do you think are the benefits of being taught about this
process?
● As educators specializing in Information Technology, what other functions of this school
make use of data mining? Do other schools do the same? Are there other functions in this
school that data mining could make more efficient and convenient?
● With the rapid development that can further augment the capabilities of data mining,
what are the possible current issues that data mining can solve that it has yet to address?
(foreign, local, within the school)
● What would you say are the advantages of data mining over more primitive data
collection methods?
● Is there any law or policy that we should all be aware of pertaining to data mining? Were
there any recent local controversies regarding the topic? (cite some cases from other
countries mentioned in the RRL if necessary).
● How would you say that government and law enforcement agencies prevent potential
crimes? Through social media sites? Would you say our local agencies also make use of
data mining similar to foreign governments?
● Can you provide a rough estimate or percentage as to how many Filipinos use social
media? Would you say that they are aware that what they do on those social media sites is
susceptible to being documented or collected through third parties and other means? (cite
Cambridge Analytica incident if necessary)
● As specialists in information technology, is there any assurance that our online data as
netizens are secure? How susceptible are we to data mining without our consent? Is there
any way to mitigate or prevent the intrusion of our online privacy?
● Where do we draw the line between the convenience brought by the output through data
mining and the privacy and security of our online data?
On the other hand, a series of questions was designed for the interview to be conducted
in the latter half of this study; these questions were likewise constructed with the objectives of
the study in mind. Given that the selected respondents for this data collection instrument are
educators at the same university as the researchers, the researchers limited the context of some
of the questions to the specialized field of said educators. However, because the respondents are
also specialists in the field of information technology, a fraction of the questions was designed with
a holistic concept of data mining, considering that the topic is also included in the
researchers’ curriculum as part of the IT-fusion program, as per the researchers’ first-hand
experience in data mining-related subjects such as the Fundamentals of Business Analytics
This data collection instrument was validated by the researchers’ thesis adviser and
coordinator; as such, this was taken as a tacit implication deeming pilot testing for this instrument as
The researchers’ methods of collecting data for this study are surveys and interviews.
Survey
A survey is a broad term that refers to any data-gathering strategy in which each
participant answers the same set of questions in a predetermined order, and the answers
are simple and easily understood by the researcher. Given the nature of the study, questionnaires
were used to collect data that was later used in the study's analysis. This will aid in acquiring a
The first step before distributing the survey form is to obtain the necessary consent and
permission from the participants, in consideration of ethical issues. After consent and
permission were obtained, the questionnaire was distributed to the participants, and they were given
enough time to answer the survey questions. Thereafter, the accomplished surveys will be collected and
the responses will be tallied, tabulated, analyzed, processed, and interpreted using the statistical
Interview
An interview is a method of collecting data through oral communication between the
researcher and the respondent, using both structured and unstructured questions. Interviews were
used because they are flexible and adaptable, and they allow the researcher to gain
greater insight into the topic. The interview was used to gather the views of faculty members in FEU Diliman about
data mining. The interview session will be conducted online via Google Meet or Zoom so that
the respondent would feel more at ease and comfortable sharing the information needed for the study. Then,
The data and interview responses provided by the respondent would be kept confidential
to encourage a degree of freedom and adaptation in gathering information from them. Then, the
purpose of the interview was clearly explained. A collection of open-ended interview questions
was created, which served as a guiding pattern but was not strictly adhered to. In this
unstructured interview approach, open-ended questions were utilized to obtain a more thorough
It is emphasized at this juncture of the study that both qualitative and quantitative
methods shall be applied to process the data to be gathered in the succeeding procedures, from
which valuable information shall be produced. As such, measurable questions shall be
used to assess the mean through five-point scales, with which data shall be interpreted with
shall also be calculated as deemed necessary by the researchers to provide more profound
quantitative interpretations derived from the same sets of data stated prior. It is also just as
necessary to mention at this juncture that thematic analysis shall be utilized in the succeeding
thematic analysis would enable the researchers to interpret qualitative output while considering
crucial factors that may affect the answers that the respondents might provide (Kiger &
Varpio, 2020).
However, the researchers must anticipate the potential obstacles that the use of thematic analysis
may bring. In particular, the researchers must expect to provide extensive interpretations
proportionate to the quantity of data that shall be amassed while considering necessary details
such as outliers and contradictory results that require further analysis (Rosala, 2019). Obstacles aside,
all methods stated prior are necessary for the researchers to produce valuable information about
the psychology of netizens towards data mining, and thus, taken by the researchers to mean tacit
As such, the formulas with which the researchers shall measure data are as follows:
1. Percentage expresses a quantity as a part of one hundred: the part is divided by the whole
and the quotient is multiplied by 100, with the output expressed with the percent sign (%) (Ministry
respondents.
2. Weighted mean is a variant of the mean in which each value is multiplied by its weight, the
products are summed, and the total is divided by the sum of the weights (Mucha & Pamula, 2020).
Through the weighted mean, the researchers shall determine the average answer for each
respective query.
3. Grand mean is derived either from a certain set of subsamples or from all observations; it is
computed by dividing the sum of all data points by the joint sample size (Stephanie, 2021).
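The three formulas above can be illustrated with a minimal Python sketch. This is not the researchers' actual computation: the function names and the sample tallies are hypothetical, and the example assumes a four-point agreement scale like the one in the survey questionnaire.

```python
from typing import Sequence


def percentage(part: float, whole: float) -> float:
    """Express part as a percentage of whole (part / whole x 100)."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    # Multiply before dividing to limit floating-point rounding noise.
    return part * 100 / whole


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of (value x weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def grand_mean(groups: Sequence[Sequence[float]]) -> float:
    """Sum of all observations divided by the joint sample size."""
    count = sum(len(g) for g in groups)
    if count == 0:
        raise ValueError("at least one observation is required")
    return sum(sum(g) for g in groups) / count


if __name__ == "__main__":
    # Hypothetical tally for one survey item (1 = strongly disagree,
    # 4 = strongly agree); frequencies are invented for illustration.
    scale_points = [1, 2, 3, 4]
    frequencies = [5, 10, 20, 15]  # 50 hypothetical respondents

    # Share of respondents who chose "strongly agree".
    print(percentage(15, 50))
    # Average answer for the item, weighting each scale point by its count.
    print(weighted_mean(scale_points, frequencies))
    # Mean over all raw answers from two hypothetical subsamples.
    print(grand_mean([[1, 2, 3], [4, 4]]))
```

In this sketch the frequencies act as the weights, which is the usual way a weighted mean is applied to Likert-type tallies; other weighting schemes would need a different `weights` argument.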
ETHICAL CONSIDERATIONS
In keeping with the topic of this study, the researchers are expected to practice utmost
respect and confidentiality in handling the respondents’ information and their respective answers
and, as such, shall ask for their consent whenever it is deemed necessary over the course of this
paper and beyond its publication. The researchers are also expected to state their objectives upon
distribution of the data gathering instruments, with emphasis that the respondents disseminate
said instruments by any means so long as it is within legal bounds and provided that
they can still be identified in order to remain liable for the confidentiality of the respondents’
Upon dissemination of the data gathering instruments, the researchers shall adhere to the
provisions of Republic Act No. 10173 or the Data Privacy Act of 2012, its Implementing Rules
and Regulations, relevant policies and issuances of the National Privacy Commission, and all
other requirements and standards for continuous improvement and effectiveness of personal data
REFERENCES
Auxier, B. et al. (2019, November 15). Americans and Privacy: Concerned, Confused and
Feeling Lack of Control Over Their Personal Information. Pew Research Center: Internet,
Science & Tech.
https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/
Baym, N. K. (2015). Personal Connections in the Digital Age. Google Books.
https://books.google.com.ph/books?hl=en&lr=&id=4_1RCgAAQBAJ&oi=fnd&pg=PT6&ots=PUuT3qV-In&sig=bGpwya7ItUubdojhqoIMxSuA3ww&redir_esc=y#v=onepage&q&f=false
Bhasin, H. (2019). Data Mining In Retail: Applications and Six Phases in the Life Cycle.
https://www.marketing91.com/data-mining-in-retail/
Burhanuddin, M. (2018). Analysis of Mobile Service Providers Performance Using Naive Bayes
Data Mining Technique. 10.11591/ijece.v8i6.pp5153-5161
Calderon, T., Cheh, J., & Kim, I. (2003). How Large Corporations Use Data Mining to Create
Value. Management Accounting Quarterly.
Central Intelligence Agency. (2016). 2016 Data Mining Report, January 01, 2016, real-life
through December 31, 2016.
https://www.cia.gov/static/18ac2c0c8b3caa27a014b4239d42efad/2016-cia-data-mining-report.pdf
Dean, B. (2021). Social Network Usage & Growth Statistics: How Many People Use Social
Media in 2021?. https://backlinko.com/social-media-users
De Vaus, D. A. Research Design in Social Research. London: SAGE, 2001; Trochim, William
M.K. Research Methods Knowledge Base. 2006.
https://libguides.usc.edu/writingguide/researchdesigns
Disini & Disini Law Office (2017). Data Privacy Principles and Rights
https://elegal.ph/data-privacy-principles-and-rights/.
Ecci International. (n.d.). A Summary of RA No. 10173 or the Data Privacy Act of 2012.
https://eccinternational.com/ra-10173-data-privacy-summary/
Edgar, Thomas W. (2017). Research Methods for Cyber Security: Exploratory Study,
95–130. doi:10.1016/B978-0-12-805349-2.00004-2
ET Bureau. (2019). A look at mass electronic intelligence setup in the US and UK.
https://economictimes.indiatimes.com/tech/internet/a-look-at-mass-electronic-intelligence-setup-in-the-us-and-uk/articleshow/67358291.cms?from=mdr
Homeland Security Today. (2020). Newly Unclassified Report Finds CIA Security Failures Led
to Massive 2017 Breach.
https://www.hstoday.us/subject-matter-areas/cybersecurity/newly-unclassified-report-finds-cia-security-failures-led-to-massive-2017-breach/
Houston, T. (2017). Mass Surveillance and Terrorism: Does PRISM Keep Americans Safer?.
https://trace.tennessee.edu/cgi/viewcontent.cgi?referer=https://www.google.com/&httpsredir=1&article=3085&context=utk_chanhonoproj
Injadat, Mohammad Noor; Salo, Fadi; Nassif, Ali Bou (2016). Data Mining Techniques in Social
Media: A Survey. Neurocomputing.
doi:10.1016/j.neucom.2016.06.045
Kade, R. (2019, February 5). Your Guide To Current Trends And Challenges In Data Mining.
SmartData Collective.
https://www.smartdatacollective.com/your-guide-to-current-trends-and-challenges-in-data-mining/
Karim, S., Sirwan, R. (2019). A Data Mining Approach for Prediction Modeling Using
Association rule. 10.30630/joiv.3.3.264
Keary, T. (2019). The balancing act of data mining ethics: the challenges of ethical data mining.
https://www.information-age.com/data-mining-123481736/
Kemp, S. (2021, February 11). DataReportal – Global Digital Insights. DataReportal – Global
Digital Insights. https://datareportal.com/reports/digital-2021-philippines
Kiger, M. & Varpio, L. (2020). Thematic analysis of qualitative data: AMEE Guide No. 131.
https://doi.org/10.1080/0142159X.2020.1755030
Leprince-Ringuet, D. (2018). The UK’s mass surveillance regime has broken the law again.
https://www.wired.co.uk/article/uk-mass-surveillance-echr-ruling
Liu S, Wang S, Liu X, Lin C-T, Lv Z (2021) Fuzzy detection aided real-time and robust visual
tracking under complex environments. IEEE Trans Fuzzy Syst 29(1):90–102
Liu S, Liu X, Wang S, Muhammad K (2021) Fuzzy-aided solution for an out-of-view challenge
in visual tracking under IoT assisted complex environment. Neural Comput Applic
33(4):1055–1065
National Privacy Commission (2016). Implementing rules and regulations of the data privacy
act. https://www.privacy.gov.ph/implementing-rules-regulations-data-privacy-act-2012/#17
Palad, E. (2018). Document classification of Filipino online scam incident text using data mining
techniques. https://animorepository.dlsu.edu.ph/etd_masteral/5522/
Patil, D. (2015). White House Author, DJ Patil, Deputy Chief Technology Officer for Data Policy
and Chief Data Scientist in the Office of Science and Technology Policy.
https://obamawhitehouse.archives.gov/blog/author/dj-patil
Rosala, M. (2019). How to Analyze Qualitative Data from UX Research: Thematic Analysis.
https://www.nngroup.com/articles/thematic-analysis/
Saltz, J. (2020). CRISP-DM is Still the Most Popular Framework for Executing Data Science
Projects. https://www.datascience-pm.com/crisp-dm-still-most-popular/
Sarangam, A. (2021). Data Mining vs Data Analysis – An Easy Guide In Just 3 points.
https://www.jigsawacademy.com/blogs/data-science/data-mining-vs-data-analysis/
Statista. (2021, August 12). The number of social network users in the Philippines is from
2017–2026.
https://www.statista.com/statistics/489180/number-of-social-network-users-in-philippines/
Software Testing Help. (2021). Data Mining Examples: Most Common Applications Of Data
Mining 2021. https://www.softwaretestinghelp.com/data-mining-examples/
Taylor, Joanna; Pagliari, Claudia (2017). Mining social media data: How are research sponsors
and researchers addressing the ethical challenges?. Research Ethics.
doi:10.1177/1747016117738559
Udemy. (2016). How William Cleveland Turned Data Visualization Into a Science.
https://priceonomics.com/how-william-cleveland-turned-data-visualization/
Wang, Y. & Zhang, Y. (2019, June 13). Role of Big Data in the Smart Grid.
https://www.researchgate.net/publication/306330602_Energy_Big_Data_A_Survey#pf4
Yoshida, R. (2017). Tokyo is evasive on the report of a secret deal with the NSA over a mass
surveillance program.
https://www.japantimes.co.jp/news/2017/04/25/national/politics-diplomacy/tokyo-evasive-report-secret-deal-nsa-mass-surveillance-program/
APPENDICES
FEU DILIMAN
COLLEGE OF ACCOUNTS AND BUSINESS
LETTER OF REQUEST FOR ADVISORSHIP
Date: September 06, 2021
Faculty Member
Department of Accounts and Business
Accepted by:
FEU DILIMAN
CONSULTATION TIMESHEET
Section : DF41
MRS. MA. JAYSAN DASEL CADETE-CRUZ, CPA, MBA MS. JACKIE LOU O. RABORAR, PHD
FEU DILIMAN
COLLEGE OF ACCOUNTS AND BUSINESS
OFFICIAL APPOINTMENT LETTER OF FACULTY ADVISORSHIP FOR THESIS WRITING 1
We are pleased to inform you of your official appointment as Research Adviser entitled:
“Data Mining and Netizens’ Awareness: Understanding its Implications on Social Media and Data
Privacy” of the following students:
Co, Chantel Dane A. Nebria, Samuel John P.
Fenis, Mikee T. Valeroso, Franco Enrique G.
Gali, Julianne Kay E. Valiente, Merl Polin C.
As a group adviser, you are expected to:
1. Provide the group his/her consultation schedule;
2. Attend the consultation sessions as specified on the consultation schedule;
3. Require the group to submit a copy of their RESEARCH progress.
4. Advise and help the group all about the scheme, format, and contents of their research proposal.
5. Feedback the Business Administration Department on the group attendance, by submitting every
week the consultation form duly signed by the group adviser;
6. Conduct the mock defense to familiarize the group in preparation for the final presentation.
7. Help the group do the revision and correction and;
8. Encourage or assist the group to complete and submit the final paper and revision of their
business research proposal on time.
This will be a very challenging but exciting task. Your support is highly appreciated.
I agree and accept the official appointment
LETTER OF COMPLETION
This is to respectfully and officially endorse to your good office the Research paper entitled “Data Mining
and Netizens’ Awareness: Understanding its Implications on Social Media and Data Privacy” done by the
group whose names appear below:
The final paper has been carefully scrutinized and evaluated by the undersigned. This endorsement
signifies that the group has satisfactorily and completely complied with the prescribed format and
requirements of the paper.
I hereby recommend for the Oral Defense on the date and place that may be set by the Officer-in-Charge
of the Department of Accounts and Business. Thank you.
Respectfully yours,
Survey Questionnaire
Part 1:
Facebook
Twitter
Instagram
Youtube
Tiktok
Shopee
Lazada
PART 2: This section determines the respondents’ awareness of some of the modern capabilities
of data mining. (On a scale of 1-4, 1 strongly disagree, 4 strongly agree)
PART 3: This section determines the online transparency of social media users, therefore their
susceptibility to data mining. (On a scale of 1-4, 1 strongly disagree, 4 strongly agree)
PART 4: This section determines the priority of social media users as to whether they prioritize
privacy over convenience, vice versa. (On a scale of 1-4, 1 strongly disagree, 4 strongly agree)
Interview Questions
● Given that FEU Diliman has an extensive and efficient program for student and faculty
identification, would you say that it is because of this system that it is easier for students
and professors alike to integrate themselves into canvas accounts, as well as to ask for a
generated password for internet access within the premises? What kind of data exactly is
collected in the school database?
● What kind of programs are used to create the school database? And is there a name for
the database?
● Were there any issues during the development of that system?
● Would you say that there is at least a process of data mining present in the collection of
student data?
● We are aware that data mining is one of the topics taught under our integrated IT subjects
such as business analytics. What do you think are the benefits of being taught about this
process?
● As educators specializing in Information Technology, what other functions of this school
make use of data mining? Do other schools do the same? Are there other functions in this
school that data mining could make more efficient and convenient?
● With the rapid development that can further augment the capabilities of data mining,
what are the possible current issues that data mining can solve that it has yet to address?
(foreign, local, within the school)
● What would you say are the advantages of data mining over more primitive data
collection methods?
● Is there any law or policy that we should all be aware of pertaining to data mining? Were
there any recent local controversies regarding the topic? (cite some cases from other
countries mentioned in the RRL if necessary).
● How would you say that government and law enforcement agencies prevent potential
crimes? Through social media sites? Would you say our local agencies also make use of
data mining similar to foreign governments?
● Can you provide a rough estimate or percentage as to how many Filipinos use social
media? Would you say that they are aware that what they do on those social media sites is
susceptible to being documented or collected through third parties and other means? (cite
Cambridge Analytica incident if necessary)
● As specialists in information technology, is there any assurance that our online data as
netizens are secure? How susceptible are we to data mining without our consent? Is there
any way to mitigate or prevent the intrusion of our online privacy?
● Where do we draw the line between the convenience brought by the output through data
mining and the privacy and security of our online data?
Validated by:
Mrs. Ma. Jaysan Dasel Cadete-Cruz, CPA, MBA Ms. Jackie Lou O. Raborar, PhD
CURRICULUM VITAE