
Far Eastern University – Diliman

Sampaguita Avenue, Mapayapa Village III,


Pasong Tamo, Quezon City

Data Mining and Netizens’ Awareness: Understanding its Implications on Social Media and Data Privacy

A Thesis Paper
Presented to the FEU Diliman
In Partial Fulfillment of the Requirements
In Thesis

By:
Co, Chantel Dane A.
Fenis, Mikee T.
Gali, Julianne Kay E.
Nebria, Samuel John P.
Valeroso, Franco Enrique G.
Valiente, Merl Polin C.

Presented to:
Ms. Jackie Lou O. Raborar, PhD

1st Term, S.Y. 2021-2022



APPROVAL SHEET

The Research Paper entitled, “Data Mining and Netizens’ Awareness: Understanding its
Implications on Social Media and Data Privacy”, prepared and submitted by Chantel Dane A.
Co, Mikee T. Fenis, Julianne Kay E. Gali, Samuel John P. Nebria, Franco Enrique G. Valeroso,
and Merl Polin C. Valiente has been approved and accepted as partial fulfillment of the
requirements for the degree of Bachelor of Science in Business Administration Major in
Financial Management and Business Analytics.

Ms. Jackie Lou O. Raborar, MBA, Ph.D.
Research Coordinator

Mrs. Ma. Jaysan Dasel Cadete-Cruz, CPA, MBA
Research Adviser

Approved by the Committee on Oral Examination with a grade of PASSED on November 27,
2021.

Mr. Jojit C. Alcalde
Panel Member

Dr. Myrna Lim
Panel Member

Mr. Remy Manuel C. Pendon, CPA
Panel Member

Mr. Allan Joseph R. Bacus, MBA

Panel Chair

Accepted as partial fulfillment of the requirements for the degree of Bachelor of Science in
Business Administration Major in Financial Management and Business Analytics.

Mr. Allan Joseph R. Bacus, MBA

Program Director, Department of Accounts and Business


ACKNOWLEDGEMENT

First and foremost, the researchers praise and thank God, the Almighty, for His guidance and blessings throughout this research work, which made it possible to complete the study successfully.

The researchers would like to express their deep and sincere gratitude to the research coordinator of this course, Ms. Jackie Lou O. Raborar; to the researchers’ adviser, Mrs. Ma. Jaysan Dasel Cadete-Cruz; and to the researchers’ statistician, Ms. Karizza M. Abolencia, for imparting their knowledge and providing guidance throughout this research. Their sincerity and motivation deeply inspired the researchers to continue and complete this research.

The researchers would also like to thank the respondents of the pilot testing survey for sharing their knowledge with genuine interest despite the differences between in-person and online surveying. The researchers would not have been able to continue this research without their responses, which support this study, “Data Mining and Netizens’ Awareness: Understanding its Implications on Social Media and Data Privacy”.

The researchers' thanks and appreciation also go to the colleagues and other people who willingly lent their abilities to help.


ABSTRACT

Data mining is a convenient method of organizing large volumes of data and identifying patterns in them, now even beyond what was once considered quantifiable. With the advancement of technology, data collection methods previously considered unfeasible are now possible and in fact prevalent throughout the industries that shape the economy and the spaces where society has lately converged, particularly social media. As such, it is necessary to understand the psychology of users towards the capabilities of data mining within the context of social media. In parallel, social media users’ transparency with their information on such platforms, as well as whether they prioritize convenience or privacy, is crucial to reaching a valid conclusion on users’ opinion of data mining within social media: whether it is a necessity for developing more convenient functions or a disruption to privacy that can jeopardize their online data. Additionally, it is vital to seek expert opinions to gain a comprehensive understanding of the modern context surrounding data mining, such as which specific data are susceptible to being collected, how safe social media users’ data are, and what solutions could address the issue at hand. Understanding the psychology of social media users towards the unseen processes behind every interaction on such platforms is necessary to provide a concrete answer as to where the balance lies between the convenience brought by innovation and the risk of compromising our online data.


TABLE OF CONTENTS
TITLE PAGE
APPROVAL SHEET
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
Chapter 1. THE PROBLEM RATIONALE
1.1 INTRODUCTION
1.2 BACKGROUND OF THE STUDY
1.3 STATEMENT OF THE PROBLEM
1.4 SIGNIFICANCE OF THE STUDY
1.5 RESEARCH IMPEDIMENTS
1.6 DEFINITION OF TERMS
Chapter 2. RELATED LITERATURE
2.1 RELATED LITERATURE
2.2 THEORETICAL FRAMEWORK
2.3 RESEARCH PARADIGM
2.4 RESEARCH OBJECTIVES
2.5 RESEARCH QUESTIONS
Chapter 3. RESEARCH METHODS
3.1 RESEARCH DESIGN
3.1.1 LOCALE OF THE STUDY
3.1.2 POPULATION OF THE STUDY
3.1.3 SAMPLE SIZE
3.1.4 SAMPLING TECHNIQUES
3.1.5 DATA COLLECTION INSTRUMENTS
3.1.6 VALIDATION OF THE DATA COLLECTION INSTRUMENTS
3.1.7 METHOD OF DATA COLLECTION
3.1.8 METHOD OF DATA ANALYSIS AND PRESENTATION
3.2 ETHICAL CONSIDERATIONS
REFERENCES
APPENDICES
CURRICULUM VITAE


CHAPTER 1

THE PROBLEM RATIONALE

INTRODUCTION

Data mining is defined as the process of detecting anomalies, patterns, and correlations within large data sets in order to make predictive inferences. The information produced by this process allows users to make informed decisions, such as mitigating risks and costs, improving company and business relationships, and increasing revenue, among other common uses observable across many industries and disciplines today, including but not limited to insurance, banking, and telecommunications (SAS, n.d.).

Despite the benefits of data mining observed across professional fields and disciplines, it is important to note that, while data mining is a convenient tool that can provide accurate analysis from broader samples, it also raises privacy concerns. Data that are supposedly collected at random can identify specific individuals once aggregated, particularly when data are amassed indiscriminately and include details that can be directly associated with those individuals, such as names and basic biographical information.
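The aggregation concern described above can be illustrated with a minimal sketch. The records, field names, and the choice of ZIP code, birth year, and gender as linking attributes are all hypothetical and are not drawn from this study; the sketch only shows how two data sets that look harmless on their own may single out a person once combined.

```python
# Hypothetical data: an activity log without names, and a public profile list.
ANONYMIZED_LOG = [
    {"zip": "1116", "birth_year": 1999, "gender": "F", "searches": "loan rates"},
    {"zip": "1116", "birth_year": 1985, "gender": "M", "searches": "flu symptoms"},
    {"zip": "1105", "birth_year": 1999, "gender": "F", "searches": "job openings"},
]
PUBLIC_PROFILES = [
    {"name": "User A", "zip": "1116", "birth_year": 1999, "gender": "F"},
    {"name": "User B", "zip": "1116", "birth_year": 1985, "gender": "M"},
    {"name": "User C", "zip": "1105", "birth_year": 1999, "gender": "F"},
    {"name": "User D", "zip": "1105", "birth_year": 1999, "gender": "F"},
]
# Assumed linking attributes (not identifying on their own).
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def key(record):
    """Build the combination of attributes used to link the two data sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_records(log, profiles):
    """Return {name: log_row} for each log row that matches exactly one profile."""
    names_by_key = {}
    for profile in profiles:
        names_by_key.setdefault(key(profile), []).append(profile["name"])
    linked = {}
    for row in log:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # a unique match singles out one person
            linked[names[0]] = row
    return linked


reidentified = link_records(ANONYMIZED_LOG, PUBLIC_PROFILES)
print(sorted(reidentified))
```

In this toy data, the third log row matches two profiles and therefore stays ambiguous, which hints at why collecting fewer or coarser attributes can reduce the risk.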

According to a 2003 survey by Management Accounting Quarterly, “65% of the companies that responded to the survey apply data mining” in order to identify trends more easily, develop smarter marketing strategies, and predict customers’ behavior. While demand for data mining is growing among industries, does this mean that the level of awareness of data mining is also increasing? Do people have enough knowledge about the effects of data mining on social media and on their privacy? Are they sensitive to data privacy, or does their convenience matter more? The privacy of individuals is very important and, as such, the researchers want to know the perspective of social media users on data mining. To the best of the researchers’ knowledge, no publication has yet addressed data mining within the context of social media and data privacy. As such, the researchers decided to conduct this study to provide a better understanding of how data mining affects social media, its users, and their privacy.

BACKGROUND OF THE STUDY

Data mining is a discipline at the intersection of computer science and analytics. Organizations have developed their own data mining process models: the SAS Institute, a significant producer of statistics and business intelligence software, developed Sample, Explore, Modify, Model, and Assess (SEMMA), while a collaborative effort of five companies from various industries, under the European Union’s European Strategic Programme on Research in Information Technology (ESPRIT), developed the Cross-Industry Standard Process for Data Mining (CRISP-DM), among other models of similar function (Saltz, 2020).

Despite the differences among the aforementioned data mining models, the process of data mining itself remains the same. Data is first assessed according to quality and industry standards, and trivial information, as well as inconsistent or incomplete data, is filtered out. The data is then inserted into its respective databases, which may or may not overlap; it is in this step that data is grouped with other complementary data. Next, the data is compressed and clustered according to the requirements of a specific database; it is in this step that attributes not needed in the next step are eliminated. The data is then standardized and prepared: each database should contain uniform attributes for a specific kind of data, such as a database of biographic data containing names, ages, addresses, and the like, organized according to specific needs. The data is then mined, wherein patterns are detected, such as uptrends in certain internet searches over a specific period. The patterns found in the mined data are then evaluated, with users or data scientists giving their insights; it is here that data is converted into valuable information with the use of graphical representations, reports, and other business intelligence tools (Christiansen, 2021).
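The sequence of steps described above can be sketched in a few lines of code. This is only an illustrative outline under stated assumptions: the search-log rows, field names, and function names are hypothetical, the integration step is omitted because the sketch uses a single source, and real data mining tools perform each step at far larger scale.

```python
from collections import Counter

# Hypothetical search-log rows; one row is incomplete on purpose.
RAW_EVENTS = [
    {"user": "u1", "query": " Vaccine ", "month": "2021-07", "device": "phone"},
    {"user": "u2", "query": "vaccine", "month": "2021-08", "device": "laptop"},
    {"user": "u3", "query": "VACCINE", "month": "2021-08", "device": "phone"},
    {"user": "u4", "query": None, "month": "2021-08", "device": "phone"},
    {"user": "u5", "query": "concert", "month": "2021-07", "device": "phone"},
    {"user": "u6", "query": "concert", "month": "2021-08", "device": "tablet"},
]


def clean(rows):
    """Cleaning: filter out rows with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def select(rows, fields=("query", "month")):
    """Selection: keep only the attributes needed by the next step."""
    return [{f: r[f] for f in fields} for r in rows]


def transform(rows):
    """Transformation: standardize text so identical queries compare equal."""
    return [{**r, "query": r["query"].strip().lower()} for r in rows]


def mine(rows, earlier, later):
    """Mining: find queries whose count rose from `earlier` to `later`."""
    counts = Counter((r["query"], r["month"]) for r in rows)
    queries = {r["query"] for r in rows}
    return {
        q: (counts[(q, earlier)], counts[(q, later)])
        for q in queries
        if counts[(q, later)] > counts[(q, earlier)]
    }


def evaluate(patterns):
    """Evaluation and presentation: turn patterns into readable statements."""
    return [f"'{q}' rose from {a} to {b} searches"
            for q, (a, b) in sorted(patterns.items())]


prepared = transform(select(clean(RAW_EVENTS)))
uptrends = mine(prepared, "2021-07", "2021-08")
report = evaluate(uptrends)
print(report)
```

Keeping each step as its own function mirrors the way the process is described: the output of one stage is the input of the next, so a stage can be replaced without touching the others.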

Data mining is more than a process; it is evidence of technological advancement towards interconnectedness across various purposes and users. This is evident in the demand for birth certificates and NBI clearances in the Philippines as proof of the existence or legitimacy of a person’s details, which are required most especially in job applications, among other scenarios (Perdigon, 2017). On a broader scale, changes in population, as well as their possible causes, can be analyzed through the large amounts of data contained in countless relevant documents such as these certificates. Evidently, data mining is used across scales of data both small and large.

One cannot deny that the growth of technology has significantly changed how people share data and information. These changes can be observed in the use of the Internet across the various platforms available online, such as search engines, social media networks, creative content outlets, communications services, payment systems, and many more. However, the number of users of social media networks is significantly higher than that of other platforms.


As of 2021, there are 4.48 billion social media users around the world, equating to more than half of the world population regardless of age or internet access, according to a statistical report related to search engine optimization (Dean, 2021). This opens the door for these networks to pull data from their users that can be used in various research and studies. However, there is a lack of ethical guidelines for research arising from social media mining, hence the need to raise awareness of ethical research practice (Taylor, 2017).

In a 2016 study, Injadat et al. reviewed a total of 66 articles to identify the data mining techniques present in social media. They found that data mining techniques on social media have various strengths and weaknesses depending on the type of informative data required, and they called for more in-depth research that takes into account the accurate implementation of data mining techniques in the academic and industrial sectors. An article from the Statista Research Department published in August 2021 shows that 81.3 million Filipinos are active social media users. Despite these numbers, data mining remains an unfamiliar term in the Philippines. In light of this unfamiliarity, the researchers would like to fill the gap in knowledge about data mining, specifically its influence on our society. The researchers chose this topic to shed light on how social media networks use our data and information in various ways. This, in turn, would raise awareness among Filipino social media users to be vigilant in sharing their private information and data online.

In recent years, data mining and information extraction have emerged as critical areas for businesses and researchers in a variety of industries. To develop information, a huge amount of data must be collected. Simple numerical values and text documents, as well as more complex information such as location data, multimedia data, and hypertext pages, can be used. Data privacy is a widespread concern for individuals as well as corporate and public entities, and in most cases the operations of data mining raise major data security and protection difficulties. A notable example would be a retail company recording a customer's grocery list; this information can provide a clear indicator of customer interest in a variety of products (Kade, 2019). The most common view of businesses and governments using social media data mining to understand their consumers is that it leads to less privacy and more surveillance. Individuals' private and sensitive information is collected to create customer profiles and understand user activity patterns. Illegal access to information and the confidentiality of information are becoming major concerns, as large volumes of sensitive and private information about persons or businesses are obtained and retained when data is taken from users. Given the sensitive nature of some of this data and the possibility of unauthorized access to it, the practice becomes contentious.


Data mining in itself has long been a topic of controversy, often regarded as bypassing ethical standards due to several previous accounts of it being instrumental in the infringement of data privacy (IvyPanda, 2019). However, how it is used shapes the public's perception of data mining: whether it is seen as used exclusively for the manipulation of public opinion, mass surveillance, and other ideas that lean towards what can be considered conspiracy theories, or as a tool that helps users collect reliable information, allowing them to make judgments, decisions, and adjustments at a pace unthinkable by yesterday’s standards (Keary, 2019). This example of radical progress in information technology and parallel fields must be made known to the public, so as to critically ascertain the compromises and conveniences that data mining has to offer, as well as the public’s knowledge and awareness of such technological and statistical advancement.

STATEMENT OF THE PROBLEM

The researchers aim to assess data mining and netizens’ awareness, and to understand its implications for social media and data privacy. The researchers will use both qualitative and quantitative approaches to gather primary and secondary data and information. Specifically, this study seeks to determine whether Filipino netizens value their data privacy more than convenience, whether they are ready to sacrifice convenience for privacy, or whether there is a middle ground. The research aims to assess public opinion on data mining, and what can or should be done with the results, to determine the general inclination of netizens towards data mining as either a tool for convenience or a means of eavesdropping.

Data mining is fraught with difficulties. It extends beyond the academic understanding of information technology as a collection of codes and programs, and it is used in a variety of settings, including but not limited to banking, research, and surveillance. The researchers want to discern how transparent netizens are with their online data, and to know what specific pieces of information different sectors gather from social media users. The researchers seek a better understanding of the challenges surrounding netizens' data privacy and security. They intend to address serious issues, such as sites or applications that unlawfully sell data, and to ascertain how aware netizens are of the presence of data mining on social media sites; because of such issues, some users are hesitant to provide personal information even on legitimate sites. The researchers also seek to discover what data is collected from internet users according to users of data-mining programs. Finally, the researchers want assurance that the personal information provided by netizens is truly secure and confidential. They also want to know where data flows, since not everyone who uses the internet is familiar with data mining, and to seek solutions from professionals that can help netizens protect their private information.

SIGNIFICANCE OF THE STUDY

Data mining has received great interest in the information industry in recent years as a result of the widespread availability of huge amounts of data and of the methods required to transform that data into valuable information and knowledge. Specifically, the researchers aim to provide valuable insights that can be of use to the following groups:


Netizens

To promote awareness of the use and collection of data, it is important to make known the processes and approaches applied in data mining, as well as the data itself: the scales of data used and what data is collected by whom, among other findings. Netizens must understand how their social media information is used and, as such, how it is controlled, processed, and filtered.

Society

Social media serves a significant role in society, as people widely use social media platforms to express their opinions and to criticize. The information provided in this study shall build and raise awareness of data mining, as well as promote every citizen's conscious use of information, which includes an understanding of information security and the development of acceptable attitudes and techniques to prevent information vulnerability.

Businesses

This study will give organizations a broader view, because data mining is widely used in a variety of applications such as survey research, marketing, product analysis, demand and supply analysis, investment trends in bonds and stocks, telecommunications, e-commerce, and so on. In today's highly competitive corporate world, data mining is critical. As a result, data mining can be considered a natural evolution of information technology.


Future Researchers

This study is beneficial to future researchers, as it is projected to be one of the bases for reviewing the various data mining trends from its inception to the future. It shall help researchers focus on the various issues of data mining, given the constant need for development in the field, specifically in the design of the techniques used to gain significant results in today’s competitive global marketplace. Data mining techniques possess great potential for developing new sets of tools that can be used to enhance or circumvent the privacy of the average person, increase customer satisfaction, and provide safe and useful products at reasonable and economical prices.

RESEARCH IMPEDIMENTS AND LIMITATIONS OF THE STUDY

This study covers netizens’ awareness of data mining. Data will be collected from Filipino netizens, or social media users, aged 16 and above, regardless of gender. Respondents will be given survey questionnaires to determine the overall public opinion towards data mining from the perspective of netizens. The researchers will also interview IT specialists or IT professors of FEU Diliman to obtain a more precise interpretation of the implications of data mining for social media and its ramifications for modern society. The results of this study are intended to assess how data mining affects social media, its users, and society in today’s generation.

Due to the pandemic, the researchers might encounter hindrances in obtaining participants, since it would be risky for the group to conduct physical interviews. Most likely, the practical way to reach participants is through e-mail, Messenger, virtual meetings, and other online platforms. However, the group might also struggle to brainstorm or share ideas through video conferencing. Face-to-face interaction among members is more effective and engaging than remote meetings, which pose challenges such as difficulty adapting, technical issues, and time management.

DEFINITION OF TERMS

For clarity, the terms used in this study are defined as follows:

Algorithm – a finite sequence of well-defined instructions

Analytics – the systematic analysis of data and statistics

Artificial Intelligence – intelligence demonstrated by machines; the ability of computers or robots to perform tasks commonly associated with intelligent beings

Business Intelligence – the procedural and technical infrastructure that collects, stores, and analyzes the data produced by a company's activities

Causal relationship – a relationship in which one variable has a direct influence on another, triggering the occurrence of another event

Computer nodes – any physical device within a network of other tools that is able to send, receive, or forward information


CRISP-DM – Cross-Industry Standard Process for Data Mining; an open standard process model that describes common approaches used by data mining experts. It is the most widely used analytics model.

Data fusion – the process of integrating multiple data sources to produce more consistent, accurate, and useful information than that provided by any individual data source

Data mining – a process of detecting anomalies, patterns, and correlations within large data sets in order to make predictive inferences, aiding in making informed decisions

Data warehouse – a system used for reporting and data analysis that is considered a core component of business intelligence

Deep learning – a type of machine learning and artificial intelligence that imitates the way humans gain certain types of knowledge

Machine learning – a type of artificial intelligence that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so; uses historical data as input to predict new output values

Netizens – a buzzword for active participants in the online community of the Internet; used interchangeably with the term “social media users” in this paper

Neural network – a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates


Regression analysis – a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables

Search engines – software systems designed to carry out web searches

SEMMA – Sample, Explore, Modify, Model, and Assess; a list of sequential steps developed by the SAS Institute to guide the implementation of data mining applications


CHAPTER 2

RELATED LITERATURE

Overview

Data mining is defined as the process of detecting patterns and vital information in particularly large data sets collected primarily for business intelligence activities (IBM Cloud Education, 2021). These data sets are stored in data management systems known as data warehouses, which process large amounts of data from various sources and, over time, produce records vital to the analyses of data scientists and other users of the data. Because mined data are derived from multiple sources and processed together, data warehouses have come to be considered by organizations as their “single source of truth” (Oracle, 2020). Data mining is nonetheless distinct from data analysis. Data analysis focuses on testing the efficacy of models and hypotheses, whereas data mining emphasizes the use of machine learning and statistical assumptions to detect patterns in large data sets that would otherwise go undetected by human analysts, primarily because of the quantity of data and the possibility of human error, which data mining reduces to a significant extent (Pedamkar, 2021). Furthermore, data mining relies on structured data and aims to make data more functional, identifying trends and patterns through mathematical and scientific (statistical) models, whereas data analysis is a more holistic approach performed on multiple kinds of data to support, for example, business decisions; it relies on the visualization of results and creates insights through business intelligence and analytics models (Sarangam, 2021).


Timeline

Data mining is said to trace its conceptual roots to 1763 and Bayes’ theorem, which relates current probabilities to prior ones, among other objectives; the theorem would become instrumental to data mining, given that data mining deals with potential current or future outcomes derived from previous data (Li, 2016). This was followed in 1805 by the development of regression analysis, which arose from Legendre and Gauss’ attempts to determine the orbits of celestial bodies around the Sun. Regression analysis would later be used by researchers to understand causal relationships through observational studies, and in the field of finance, particularly in the equation of the Capital Asset Pricing Model (CAPM) (Corporate Finance Institute, 2021). In 1936, Alan Turing proposed the idea of a machine capable of performing rapid computations on large amounts of data, which would serve as the fundamental idea behind the modern computer (Liesbeth, 2021).
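For reference, the formulas behind the theorem and models named above can be written out in their standard textbook forms. The symbols are assumptions introduced here for illustration and do not appear in the cited sources: A and B are events, x and y are the independent and dependent variables, R_i is the return of asset i, R_m is the market return, and R_f is the risk-free rate.

```latex
% Bayes' theorem: the updated (posterior) probability of A given evidence B
\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]

% Simple linear regression, typically fitted by least squares
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]

% CAPM: expected return of asset i; beta is the slope obtained by
% regressing the asset's returns on the market's returns
\[ E[R_i] = R_f + \beta_i \left( E[R_m] - R_f \right), \qquad
   \beta_i = \frac{\mathrm{Cov}(R_i, R_m)}{\mathrm{Var}(R_m)} \]
```

The CAPM beta is itself a regression coefficient, which is the sense in which regression analysis is involved in the CAPM equation.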

A few years later, in 1943, the conceptual model of a neural network was developed by Warren McCulloch and Walter Pitts, who proposed that neurons are primarily capable of three functions: to receive input, process it, and generate output (McCulloch, 2016). In 1965, Lawrence Fogel created the first company (Decision Science, a subsidiary of Titan Systems, Inc. since 1982) to apply computation to address gaps in areas such as missile evasion for military hardware, medical diagnostics, risk management, and other fields now reliant on computational intelligence (ETHW, 2016).
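The three neuron functions attributed to McCulloch and Pitts above (receive, process, generate) can be sketched as a simple threshold unit. This is a simplified variant for illustration: it uses signed weights, whereas the original 1943 formulation treats inhibitory inputs differently, and the function names and thresholds below are assumptions chosen for this sketch.

```python
def threshold_neuron(inputs, weights, threshold):
    """Receive binary inputs, process them as a weighted sum, and
    generate an output of 1 (fire) or 0 (stay silent)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def logical_and(a, b):
    # Fires only when both inputs are active.
    return threshold_neuron((a, b), (1, 1), threshold=2)


def logical_or(a, b):
    # Fires when at least one input is active.
    return threshold_neuron((a, b), (1, 1), threshold=1)


def logical_not(a):
    # A negative weight makes an active input suppress the output.
    return threshold_neuron((a,), (-1,), threshold=0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b))
```

Even this small unit can reproduce basic logic gates, which is why networks of such units were seen as a plausible model of computation.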

In 1975, John Henry Holland wrote his book Adaptation in Natural and Artificial Systems, which proposes the theoretical foundations of how data mining is understood today (Grimes, 2015). Nearly two decades later, in 1992, Bernhard E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik improved the support vector machine, a supervised learning model, to create nonlinear classifiers (functions used to separate instances that cannot be separated through linear classification), enhancing its data analysis and pattern recognition properties for classification and regression analysis (Paul, 2017). Nine years later, data science was formally introduced by William S. Cleveland as a stand-alone discipline (Udemy, 2016). In 2015, the White House appointed Dhanurjay Patil as its first Chief Data Scientist (CDS) (Patil, 2015).
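The idea of a nonlinear classifier mentioned above can be illustrated with a small sketch. It is not an implementation of a support vector machine: the weights are picked by hand rather than learned, and the feature map is written out explicitly, whereas kernel-based methods achieve a similar effect implicitly. The data points and all names are hypothetical.

```python
import itertools

# XOR-style data: no straight line in (x1, x2) separates the two labels.
POINTS = [(-1, -1), (1, 1), (-1, 1), (1, -1)]
LABELS = [-1, -1, 1, 1]


def linear_classifier(features, weights, bias):
    """Classify by the sign of a weighted sum (a linear decision rule)."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return 1 if score > 0 else -1


def feature_map(x1, x2):
    """Lift a 2-D point into 3-D by adding the product x1 * x2."""
    return (x1, x2, x1 * x2)


def separable_in_input_space(grid=(-2, -1, 0, 1, 2)):
    """Try a small grid of lines in the original space; none fits XOR."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all(linear_classifier(p, (w1, w2), b) == y
               for p, y in zip(POINTS, LABELS)):
            return True
    return False


# Hand-picked weights (an assumption): only the product feature matters.
WEIGHTS, BIAS = (0.0, 0.0, -1.0), 0.0
predictions = [linear_classifier(feature_map(*p), WEIGHTS, BIAS) for p in POINTS]
print(separable_in_input_space(), predictions == LABELS)
```

The same linear rule that fails in the original two dimensions succeeds once the data is viewed in the lifted space, which is the intuition behind separating otherwise inseparable instances.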

Examples of Purposes

Data mining is present across countless professional fields due to its many potential uses. In healthcare, it is used to measure the effectiveness of certain treatments. According to an article from the Morsani College of Medicine, data mining helps practitioners find the best course of action for treating patients with their respective conditions at a pace more rapid and efficient than ever, because data mining collects data containing the necessary remarks and quantifiable results, making subsequent diagnosis and treatment more efficient (USF Health, 2021). The same article states that data mining is also applied to detect fraud and abuse, in the sense that it is used to assess potential abnormalities in prescriptions and insurance claims. It cites the Texas Medicaid Fraud and Abuse Detection System as an ideal example: in 1998, the system was able to recover $2.2 million in stolen funds and to identify a total of 1,400 individuals for investigation.
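As a rough illustration of how abnormalities such as those described above might be flagged, the sketch below marks values that lie far from the median of a list. It is not the method used by the system cited in the article; the claim amounts are hypothetical, and the modified z-score with a cutoff of 3.5 is one common rule of thumb assumed here for illustration.

```python
from statistics import median


def flag_outliers(amounts, cutoff=3.5):
    """Return the positions of values whose modified z-score, based on
    the median and the median absolute deviation (MAD), exceeds `cutoff`."""
    center = median(amounts)
    mad = median([abs(a - center) for a in amounts])
    if mad == 0:  # all values (nearly) identical: nothing stands out
        return []
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - center) / mad > cutoff
    ]


# Hypothetical claim amounts; one is far larger than the rest.
CLAIMS = [120, 135, 110, 128, 140, 125, 2400, 132]
suspicious = flag_outliers(CLAIMS)
print(suspicious)
```

A flagged value is only a candidate for review; deciding whether it reflects fraud, abuse, or a legitimate exception still requires human investigation.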


In business, data mining likewise serves purposes that encompass most processes. Service providers, for instance, collate billing details, interactions, and customer visits to their websites, among other data, and analyze the compiled data to get an idea of what incentives or promotions to offer (Matillion, 2020). In line with this, service providers in Malaysia such as Celcom and Maxis have been known to assess their performance through their customers’ feedback on Twitter, illustrated in the form of graphs from the said application (Burhanuddin et al., 2021). In retail, valuable data is collected and analyzed to give retailers an idea of what products to offer depending on variables such as seasonality or the trends surrounding products; as such, data on sales, transactions, and demographics are valuable for retailers in measuring risk and making decisions of varying degrees of importance and urgency (Bhasin, 2019).

Online advertisements have accordingly become personalized to suit the preferences of existing
and potential consumers through prediction modeling, which reflects the predictive nature of
data mining (Karim & Sirwan, 2019). The electronic commerce industry has also been known to use
data mining to assess historical financial, inventory, and transaction data in order to optimize
enterprise resources and detect fraud and abnormal events, as observed with the likes of Amazon
and Shopee (Chen, 2017).

Data mining is also present in the field of finance, where compiled historical data can be used
to detect trends, spikes, and downtrends in sales, to determine the flow of capital, and to
measure the ratio between profit and sales (Datalya, 2018). Likewise, programs have been
developed that let traders of equities and currencies trade automatically with less human
interaction. These programs predict future outcomes such as uptrends and dips from previous
data, using intelligent analysis of statistics inferred from the constant fluctuations in the
prices of such assets (Popescu, 2021).

Lately, data scientists have conducted numerous studies on deep learning algorithms, which have
the potential to push data mining past its current boundaries. These studies draw on brain
simulations to improve learning algorithms and to advance machine learning and artificial
intelligence. Such gains are attributed to the use of larger neural networks, which
significantly outperform prior learning algorithms and could therefore extend data mining to
more industries (Brownlee, 2020).

Foreign Controversies

Despite the convenience that data mining provides, privacy and data mining have become more
intertwined than ever, particularly after Edward Snowden, a former National Security Agency
(NSA) contractor, exposed the agency’s misuse of data mining to eavesdrop on citizens within and
beyond United States territories through the Planning Tool for Resource Integration,
Synchronization, and Management (PRISM) program. The program was proposed to prevent further
terrorist attacks akin to the September 11 attacks, which cost thousands of lives and had a
negative impact on the U.S. economy (Houston, 2017). Among the other tools used by the NSA is
BOUNDLESS INFORMANT, a tool that summarizes the metadata of collected records, primarily
telephony and internet data from countries such as Germany, France, and Spain (ET Bureau, 2019).
Another program was Tempora, which was spearheaded by the NSA and maintained by the British
Government Communications Headquarters to intercept communications carried over fiber-optic
cables (Leprice-Ringuet, 2018). Yet another program used by the same agency is XKEYSCORE,
Linux-based software also revealed by Edward Snowden that uses the Apache web server and stores
data in MySQL databases. It enables its users to access any individual’s emails and has
reportedly also been used by the Bundesnachrichtendienst, the Australian Signals Directorate,
and Japan’s Defense Intelligence Headquarters, among other intelligence agencies (Yoshida, 2017).

However, according to an unclassified document from the Central Intelligence Agency (CIA), the
agency's use of data mining is meant to balance the acquisition of information against the
protection of private interests, given its duty to protect the rights of U.S. citizens, and it
emphasizes that data mining must be used responsibly (Central Intelligence Agency, 2016). This
points to conflicting interests in the use of data mining even between agencies of the same
country. It must be noted, however, that the CIA, like other members of the United States
Intelligence Community, outsources its data-mining activity to Palantir Technologies, which has
had its share of controversies recently, particularly for allegedly colluding with Cambridge
Analytica, a British consulting firm that has been accused of influencing U.S. elections as well
as the Brexit vote (Bensaid, 2020). This raises concerns about the CIA’s handling of data and
adds to its reputation for acquiring information through questionable means, as evident in prior
projects. Furthermore, another online article highlighted the CIA’s poor handling of collected
data a year after the data mining report mentioned above: a total of 34 terabytes of data was
stolen by an undisclosed CIA employee and uploaded to WikiLeaks as a result of lax security, for
which the Center for Cyber Intelligence was held accountable (Homeland Security Today, 2020).

Constitution and Domestic Authority

In 2012, Republic Act 10173, also known as the Data Privacy Act (DPA) of 2012, was passed. The
law states that although it is the State’s duty to ensure the free flow of information in order
to promote innovation and growth in the country, the State also recognizes its citizens’
fundamental right to privacy of communication and its inherent obligation to ensure that
personal information in information and communication systems, whether in the government or the
private sector, is secured and protected (Ecci International, n.d.). The law created the
National Privacy Commission (NPC), which is in charge of monitoring the Philippines' compliance
with international standards for data protection. The NPC is also tasked with administering and
implementing the provisions of the DPA. It was formally organized in 2016 and is headed by
Privacy Commissioner Raymund Liboro, assisted by two deputy commissioners, Dondi Mapa and Ivy
Patdu. Under its mandate, the NPC ensures that it complies with the law itself, receives and
resolves complaints, issues cease and desist orders, ensures that local authorities comply with
the Act, promotes a culture of protecting the data privacy of all individuals, guides parties
that seek its assistance in observing that culture, and enforces data privacy laws across
borders (National Privacy Commission, n.d.).

In connection to our study, we emphasize three general principles with respect to the collection
and processing of personal data according to the NPC. The entities covered by the Act and the
Implementing Rules must adhere to the following principles: transparency, legitimate purpose,
and proportionality. The principle of transparency relates to social media and data mining as it
requires the purpose of processing a person's data to be determined and disclosed before its
collection. Transparency allows data subjects to be aware of the nature, purpose, and extent of
the processing of their personal data, including the risks involved, the identity of the
personal information controller, their rights as data subjects, and how these rights can be
exercised. Information and communication related to the processing of personal data must be
easily accessible and understood by the data subject. The principle of legitimate purpose
requires that the gathering and processing of information be compatible with the specified
purpose and not be contrary to law, morals, or public policy. The principle of proportionality
requires that the processing of data be adequate, relevant, suitable, necessary, and not
excessive in relation to the required purpose. Personal data shall be processed only if the
purpose of the processing could not be fulfilled by other means (National Privacy Commission,
2016).

It is also worth noting that the Philippine National Police Anti-Cybercrime Group (PNP-ACG)
makes use of J48, an open-source Java implementation of the C4.5 decision tree algorithm, to
mine text and identify online scams. According to a De La Salle University publication titled
Document classification of Filipino online scam incident text using data mining techniques, the
algorithm was applied to a dataset of 82 documents containing 14,098 mainly Filipino words, or
attributes, in order to respond to complaints regarding online scams, which reportedly rose from
a double-digit figure in 2013 to a triple-digit figure in 2017 (Palad, 2018).
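
As a rough illustration of how a decision-tree learner such as J48 treats words as attributes,
the sketch below scores each word in a toy corpus by gain ratio, the split criterion associated
with C4.5, and picks the best word for the first test in the tree. The messages, labels, and
function names are hypothetical and are not taken from Palad (2018); J48 itself grows and prunes
a full tree rather than stopping at a single split.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(docs, labels, word):
    """Gain ratio of splitting the documents on presence/absence of `word`."""
    present = [lab for doc, lab in zip(docs, labels) if word in doc]
    absent = [lab for doc, lab in zip(docs, labels) if word not in doc]
    if not present or not absent:
        return 0.0  # the word does not split the data at all
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (present, absent))
    info_gain = entropy(labels) - remainder
    split_info = entropy(["present"] * len(present) + ["absent"] * len(absent))
    return info_gain / split_info


def best_word(docs, labels):
    """Word attribute with the highest gain ratio (candidate root test)."""
    vocabulary = sorted(set().union(*docs))
    return max(vocabulary, key=lambda w: gain_ratio(docs, labels, w))


# Hypothetical toy corpus; each document becomes a set of word attributes.
texts = [
    ("send load to claim your prize", "scam"),
    ("you won a prize send your pin", "scam"),
    ("claim the prize now", "scam"),
    ("meeting moved to friday", "legit"),
    ("send the report on friday", "legit"),
    ("see you at the meeting", "legit"),
]
docs = [set(text.split()) for text, _ in texts]
labels = [label for _, label in texts]

root = best_word(docs, labels)
print("Best first split:", root)
```

In this toy corpus one word separates the two classes perfectly, so a single test suffices; on
real complaint text a learner would keep splitting each branch on further words.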

Netizens’ Views on Data Mining

McRobbie and Thornton undertook their stock-taking precisely as media systems were further
destabilized, focusing on the diversity of traditional media space. With the dawn of the
twenty-first century, digital platforms have come to underpin and even define social life in
rich cultures, with individuals' identities and relationships being fostered at least in part
through computing infrastructures (Lupton, 2018). Social media is one of the most visible
expressions of 'digital societies.' Whether as social networking (Facebook), microblogging
(Twitter), photo-sharing (Instagram), or video-sharing (YouTube) sites, social media has
fundamentally changed how information is produced and exchanged. As "many-to-many" communication
networks, these sites promote vernacular discourse and creativity, allowing regular users to
produce and disseminate massive amounts of "user-generated content" (Yar, 2014). Digital
platforms are also displacing traditional media as a source of information. Finally, their
nature as loosely coupled networks of users not only promotes virality, the quick and
unpredictable spread of content, but also creates a broad virtual sociality (Baym, 2015).
Various features, such as 'likes,' 'retweets,' hashtags (#), and mentions (@), index and anchor
communications, enhancing awareness of others and linking spatially dispersed users into
communities of shared interest and identifiability (Murthy, 2013).

According to the survey reported by Auxier et al. (2019), 81% of respondents believe the risks
of companies collecting data about them outweigh the benefits, while 66% believe the same about
government data gathering. Similarly, 72% of adults think they get very little or no value from
companies collecting data on them, and 76% feel the same about government data collection.
Companies collect data with the goal of profiling customers and targeting the sale of goods and
services to them based on their attributes and habits. As per the survey, 77% had heard or read
at least a little about how businesses and other organizations use personal data to target
advertisements or special offers, or to estimate how risky customers might be. Approximately 64%
of all adults say they have received advertisements or solicitations based on their personal
information, and 61% of those who have seen such advertising feel the ads reflect their
interests and characteristics at least somewhat accurately. This equates to 39% of the adult
population, since 0.61 × 0.64 ≈ 0.39 (Auxier et al., 2019).

Nowadays, netizens actively voice their opinions online, on both forums and social networks.
Opinions that were initially intended for their groups of friends spread to the attention of
many. This pool of opinions, in the form of forum posts and messages written on micro-blogs,
Twitter, and Facebook, constitutes online opinion that represents a community of online users.
Each message might seem trivial when viewed singly; taken together and analyzed, however, they
serve as a potentially useful source of feedback on current affairs. A local government, for
instance, may be interested to know from voices collected on the Internet how citizens respond
after a new policy is announced. However, such online messages are unstructured and their
contexts vary greatly, which poses a tremendous difficulty in correctly interpreting them.
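
To make the idea of aggregating many individually trivial messages concrete, the following
minimal sketch tallies the tone of a few posts about a policy using a tiny word list. The word
lists, the sample posts, and the function names are assumptions made for illustration only; real
opinion mining has to cope with the unstructured, context-dependent language noted above, which
a fixed word list does not.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons; a real system would need far richer resources.
POSITIVE = {"good", "helpful", "support", "great"}
NEGATIVE = {"bad", "unfair", "oppose", "slow"}


def classify(message):
    """Label one message by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", message.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(messages):
    """Aggregate individual labels into an overall picture of opinion."""
    return Counter(classify(m) for m in messages)


posts = [
    "The new policy is great and helpful!",
    "This rule is unfair and the rollout is slow",
    "I support it, good job",
    "When does it start?",
]
summary = summarize(posts)
print(dict(summary))
```

No single post says much, but the tally over all posts gives the kind of aggregate feedback a
local government might look for after announcing a policy.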

With the rapid growth of information technologies used in different domains and applications,
public opinion in online education refers to the tendentious individual attitudes and subjective
will expressed by netizens around the occurrence, development, and change of a certain
educational phenomenon (including ideas, events, figures, policies, and problems) in the virtual
space of the Internet. With the further development of mobile Internet and social network
technologies, the forms in which education public opinion is expressed are increasingly
diversified, and public opinion data in particular has shown multi-dimensional characteristics.
In recent years, public policy formulation, implementation, and performance evaluation,
including for education policy, have received more and more attention, and the network has
become the most concentrated display platform of national discourse. Furthermore, with the
deepening of the comprehensive reform of education, the education authorities also hope to
receive feedback from the people on the adjustment of education policies and reform measures, to
provide the basis for decision-making in education reform. However, current work on education
public opinion analysis focuses on qualitative analysis of educational events, such as the use
of questionnaires and other forms, resulting in limited data sources, and the angle and content
of analysis are also limited to the analysis of statistical results (Liu et al., 2021). With the
development of big data and data mining technologies, the patterns by which education public
opinion spreads can be clarified in more depth, and the deep-seated views on educational events
can be mined. Thus, valuable information can be extracted from massive, noisy, and fuzzy data
(Liu et al., 2021). Therefore, taking the cross-dimension data of public opinion in online
education as the starting point, this paper makes an in-depth study on such data.

Social Media Users

The number of individuals using social media is growing, with 4.48 billion people using it as of
July 2021. This translates to more than 57% of the global population. The number of people
accessing social media increased by 13.1% over the preceding year: between July 2020 and July
2021, 520 million new users joined social media, an average of roughly 1.4 million new users
every day. More than nine out of ten internet users use social media at least once a month, and
six social media networks have more than one billion monthly active users. The Philippines had
89 million social media users in January 2021. The number of Filipinos using social media
increased by 16 million (+22%) between 2020 and 2021. Social media users in the Philippines
accounted for 80.7% of the total population in January 2021 (Kemp, 2021).
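
The figures reported above can be cross-checked with simple arithmetic. The sketch below
recomputes the daily growth, the annual growth percentage, and the Philippine growth percentage
from the rounded numbers in the text, and also derives the Philippine population implied by the
80.7% share. The variable names are illustrative, the 365-day year is an assumption, and small
differences from the reported values come from rounding in the source figures.

```python
# Rounded figures quoted in the text (Kemp, 2021).
global_users_jul_2021 = 4.48e9    # social media users worldwide, July 2021
new_users_in_year = 520e6         # users added between July 2020 and July 2021
ph_users_jan_2021 = 89e6          # Philippine social media users, January 2021
ph_new_users = 16e6               # Philippine users added since 2020
ph_share_of_population = 0.807    # users as a share of the Philippine population

# Average new users per day, assuming a 365-day year.
new_users_per_day = new_users_in_year / 365

# Growth rate = new users / users at the start of the period.
global_growth = new_users_in_year / (global_users_jul_2021 - new_users_in_year)
ph_growth = ph_new_users / (ph_users_jan_2021 - ph_new_users)

# Population implied by "89 million users = 80.7% of the population".
implied_ph_population = ph_users_jan_2021 / ph_share_of_population

print(f"New users per day: {new_users_per_day / 1e6:.2f} million")
print(f"Global annual growth: {global_growth:.1%}")
print(f"Philippine annual growth: {ph_growth:.1%}")
print(f"Implied Philippine population: {implied_ph_population / 1e6:.1f} million")
```

Under these assumptions the recomputed values agree with the reported 1.4 million users per day,
13.1% global growth, and 22% Philippine growth, which suggests the quoted figures are internally
consistent.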

For the third year in a row, the Philippines held on to its title as the social media capital of
the world, based on the Global Digital Report of 2019 by We Are Social. According to the report,
there were 76 million active Filipino social media users, covering 71% of the entire population.
These users spent an average of four hours a day on different social media platforms, which is
surprising considering that the country’s internet speed averages only 15 Mbps (Estares, 2019).

Many factors affect social media growth in the Philippines. One factor is the socio-economic
situation of Filipinos. The wide range of cheap smartphones available in the country creates
more potential social media users from a wider socio-economic demographic. This may also explain
why 67% of social media users access these platforms via mobile devices. The next factor is the
age of social media users. According to the Digital Report conducted by We Are Social in 2019,
the largest group of social media users in the Philippines is in the 18-24 age range, which
makes up 33% of active users, or around 21 million users. Given this, it is interesting to
understand how intertwined social media platforms are with users’ social, personal, and academic
lives, as the age range spans university students up to those in their early careers. Nowadays,
it is common practice among classes to use social media platforms as a virtual meeting place
outside the classroom, since these platforms are designed to be easily accessible and to offer
more room for interaction than the typical bulletin board inside the classroom. Aside from that,
Filipino culture is also one of the factors that affects social media growth in our country.
Filipinos are very social people and are known for their hospitality. With an estimated 10.2
million Filipinos abroad, the culture of using social media platforms to connect with their
families helps bridge the distance in a way that other applications cannot. Social media is
therefore still, after all, a means of connecting (Estares, 2019).

Social media in the Philippines will continue to grow and become part of every Filipino's
day-to-day life. Whether this will be a positive or negative experience will depend on how
knowledgeable users are about data mining, and on whether they use social media as something
that can positively impact their lives or as a tool that will compromise their private
information.

THEORETICAL FRAMEWORK

Figure 1. Three-tier Big Data Mining Architecture

The theory of the Three-tier Big Data Mining Architecture established by Wang (2019) supports
this research. Figure 1 depicts a generally recognized data mining analytical architecture with
three layers: data access and computing; data privacy and domain knowledge; and big data mining
methods. The theory was based on data mining techniques and philosophy.

The figure shows that the inner core data mining platform is primarily responsible for

data access and computation, according to Wang (2019). With the growing volume of data, it's

more important than ever to consider distributed storage of large-scale data while computing.

That is, data analysis and task processing are broken down into numerous subtasks and run in

parallel on a large number of computer nodes. The structure's middle layer is critical for

connecting the inner and outer layers. The inner layer's data mining technology provides a

foundation for data-related activities in the intermediate layer, such as information exchange,

privacy protection, and knowledge acquisition from regions and applications, among other

things. Information exchange is not only the assurance of each phase in the process but also the

objective of processing and analyzing large data in the smart grid. Data fusion technology is used

at the outer layer of the architecture to preprocess heterogeneous, unpredictable, incomplete, and

multi-source data. Complex and dynamic data will be mined after preprocessing, and ubiquitous

smart grid global knowledge will be acquired by local learning and model fusion. Finally, the

model and its parameters must be tweaked in response to the feedback.
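
The inner layer's idea of breaking a processing task into subtasks that run in parallel and then
combining the partial results can be sketched as follows. This is a minimal single-machine
illustration, not the architecture described by Wang (2019): a thread pool stands in for the
many computer nodes, a word count stands in for the analysis task, and the function names and
sample records are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Divide the records into at most n_chunks contiguous subtasks."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_words(chunk):
    """Subtask: count word occurrences within one chunk of records."""
    counts = Counter()
    for record in chunk:
        counts.update(record.lower().split())
    return counts


def parallel_word_count(records, n_workers=3):
    """Run the subtasks concurrently, then merge their partial results."""
    chunks = split_into_chunks(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


records = ["data mining", "big data", "data privacy", "mining data"]
totals = parallel_word_count(records)
print(totals.most_common(2))
```

Because each chunk is counted independently and the partial counts are simply added together,
the merged result matches what a single pass over all records would produce, which is the
property that lets this kind of work be spread across nodes.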

The theory of the three-tier big data mining architecture states that ‘When data mining is used
properly, it may provide you with a significant competitive edge by allowing you to understand
more about your consumers, develop successful marketing strategies, boost revenue, and save
expenses.’ According to Wang and Zhang (2019), data mining can provide answers to business
issues that were previously too time-consuming to answer manually. Users can uncover patterns,
trends, and correlations that they might otherwise overlook by employing a variety of
statistical approaches to examine data in various ways. They may use the information to forecast
what will happen in the future and take action to affect company results. This relates to the
present study, Data Mining and Netizens’ Awareness: Understanding its Implications on Social
Media and Data Privacy.

RESEARCH PARADIGM

Figure 2. The Conceptual Framework for Data Mining and Netizens’ Awareness:
Understanding the Implications of Data Mining on Social Media and Its Ramifications on
Modern Society

The figure above exhibits the concept for the study of “Data Mining and Netizens’

Awareness: Understanding its Implications on Social Media and Data Privacy”. The figure

included the uses of data mining across various industries as the independent variable, the

extraction of user’s data serves as the mediator, and the dependent variable contains the

implications of data mining to netizens.

The independent variable comprises seven parameters, followed by the mediating variable with two
parameters. The seven parameters of the independent variable are the root cause of the mediating
variable: industries must collect users’ data in order to realize the benefits of data mining.
The mediating variable then affects the dependent variable. The dependent variable has two
parameters, namely the significant and insignificant implications of data mining to netizens.
The first parameter contains the significant, or positive, implications of data mining to users,
while the second consists of the insignificant implications, or the issues users have with the
application of data mining. The extraction of users’ data causes distress to netizens as their
identity and privacy are involved. Data mining is widely used today by many companies to study
patterns in huge sets of data so they can learn more about their customers. Industries that use
data mining can gain a competitive advantage, a greater understanding of their customers’
characteristics, useful predictions of future trends, and effective operations management.

The purpose of the framework is to identify how data mining works, what type of user’s

data is extracted, and determine how and where the data are stored to promote awareness and

provide proper understanding to the netizens. This can also help users to protect their privacy,

minimize the negative impact of the issues associated with data mining, and appreciate the

advantages of data mining.

RESEARCH OBJECTIVES

The study on Data Mining and Netizens’ Awareness: Its Implications on Social Media

and Data Privacy seeks to attain the following objectives:

1. To determine the general opinion of netizens towards data mining either as a tool for

convenience or disruption to privacy.

2. To ascertain how aware netizens are of the presence of data mining on social media sites.

3. To discern the transparency of netizens specifically with their online data.

4. To seek solutions from professionals that can help netizens protect their private

information.

RESEARCH QUESTIONS

Specifically, it sought to answer the following questions:

1. Are netizens more inclined to associate data mining with making online content more

convenient or consider it as a tool that can compromise private information?

2. Are netizens aware of the capabilities of data mining occurring in social media?

3. How transparent are netizens with their data in their social media accounts?

4. Do netizens prioritize convenience over privacy in terms of social media access, or vice
versa?

CHAPTER 3

RESEARCH METHODS

RESEARCH DESIGN

According to De Vaus (2006) “The research design is the overall strategy chosen to be

used in integrating the different parts of the study in a coherent, logical way. It constitutes the

blueprint for the collection, measurement, and analysis of data.” The research design used was

chosen based on the objectives of this study.

In this study, the researchers employed both qualitative and quantitative designs to strengthen
the research and support the reliability and validity of the study.

The qualitative method was applied in this study to emphasize the quality of information

and processes that are not mathematically examined or measured in terms of quantity, amount,

intensity, or frequency.

The quantitative method was used in this study to emphasize objective measurements

including the statistical, mathematical, and numerical analysis of data collected through

questionnaires, interviews, and surveys.

The study utilized the said approach as it is meant to determine the implications of data

mining as well as its ramifications on modern society by conducting surveys and interviews with

participants, collecting their descriptive data, and analyzing it using statistical methods to

provide reliable results.
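
As an illustration of the kind of numerical analysis that a four-point agreement scale allows,
the sketch below tabulates the responses to one questionnaire statement and computes their mean.
The sample responses are invented, the function name is hypothetical, and the use of a simple
mean is an assumption, since this section does not specify which statistics the study applies.

```python
from collections import Counter
from statistics import mean

# Four-point agreement scale used in the questionnaire.
SCALE = {1: "Strongly disagree", 2: "Disagree", 3: "Agree", 4: "Strongly agree"}


def summarize_item(responses):
    """Return the answer distribution and mean score for one statement."""
    invalid = [r for r in responses if r not in SCALE]
    if invalid:
        raise ValueError(f"responses outside the 1-4 scale: {invalid}")
    counts = Counter(responses)
    distribution = {SCALE[code]: counts.get(code, 0) for code in SCALE}
    return distribution, mean(responses)


# Hypothetical answers from eight respondents to a single statement.
sample = [4, 3, 3, 2, 4, 1, 3, 4]
distribution, average = summarize_item(sample)
print(distribution)
print("Mean score:", average)
```

Reporting the distribution alongside the mean keeps the summary honest, because a four-point
scale has no neutral midpoint and the same mean can hide very different spreads of answers.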

LOCALE OF THE STUDY

The study was conducted in the National Capital Region (NCR), since the chosen participants are
netizens and IT professors who currently reside in NCR.

Figure 3. Location Map

The National Capital Region (NCR) is also known as Metropolitan Manila. It is the

country’s political, economic, and educational center. On November 7, 1975, Metro Manila was

formally established through Presidential Decree No. 824, under the management of the

Metropolitan Manila Commission.

POPULATION OF THE STUDY

The population is the entirety of all the subjects and objects that conform to a set of
specifications, comprising a group of people that is of interest to the researcher (Polit and
Hungler, 1999). The study population consisted of netizens (social media users) and IT
specialists in the Philippines. These subjects were chosen as the population of the study
because they can provide different perspectives on data mining and a precise interpretation of
the implications of data mining on social media and its ramifications on modern society, which
can help the researchers fulfill the purpose of this study.

SAMPLE SIZE

Mouton (1996:132) defines the sample as the subjects selected from a group with the intent to
discover something about the total population from which they are taken. The sample for this
study included a minimum of 250 Filipino netizens, or social media users, and 3 Filipino IT
specialists. Convenient subjects were selected to participate in the study until a sample size
of 250 was attained. A convenience sample is composed of subjects who are involved in the study
because they happen to be in the right place at the right time (Polit & Hungler 1993:176). The
250 Filipino netizens, aged 16 and above and currently residing in NCR, were chosen via email
and Messenger, while the 3 Filipino IT professors were selected from FEU Diliman. The 250
netizens and 3 IT professionals were the total subjects who met the sampling criteria and were
willing to cooperate in the study.

SAMPLING TECHNIQUES

This study used convenience sampling, a non-probability sampling technique. Convenience sampling
is a method of collecting samples by taking those that are conveniently located around a
location or internet service (Edgar et al., 2017). It involves using respondents who are
“convenient” to the researcher (Galloway, 2005). Since physical distancing is highly encouraged
during this pandemic, the researchers chose this sampling technique because of the limitations
on conducting the study physically or face-to-face.
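
The selection procedure described for the netizen sample, accepting eligible and willing
respondents as they become available until the target of 250 is reached, can be sketched as
follows. The record fields (`age`, `region`, `consented`), the helper names, and the sample data
are hypothetical; only the age threshold of 16, NCR residence, and the quota of 250 come from
the text.

```python
from itertools import islice

MIN_AGE = 16           # respondents must be 16 or older
TARGET_REGION = "NCR"  # respondents must reside in the National Capital Region
QUOTA = 250            # stop once this many netizens have been accepted


def is_eligible(respondent):
    """Check the sampling criteria for one respondent record."""
    return (
        respondent["age"] >= MIN_AGE
        and respondent["region"] == TARGET_REGION
        and respondent["consented"]
    )


def convenience_sample(respondents, quota=QUOTA):
    """Accept eligible respondents in order of availability until the quota."""
    eligible = (r for r in respondents if is_eligible(r))
    return list(islice(eligible, quota))


# Hypothetical respondents in the order they answered the invitation.
incoming = [
    {"name": "R1", "age": 19, "region": "NCR", "consented": True},
    {"name": "R2", "age": 15, "region": "NCR", "consented": True},
    {"name": "R3", "age": 22, "region": "Region IV-A", "consented": True},
    {"name": "R4", "age": 30, "region": "NCR", "consented": False},
    {"name": "R5", "age": 41, "region": "NCR", "consented": True},
]
sample = convenience_sample(incoming, quota=2)
print([r["name"] for r in sample])
```

Because respondents are taken in order of availability rather than at random, the resulting
sample is non-probabilistic, which is why findings from it should be generalized with care.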

DATA COLLECTION INSTRUMENTS

The instruments used were a checklist of researcher-made interview questions and a
researcher-made questionnaire, which gathered the necessary data for the study on Data Mining
and Netizens’ Awareness: Understanding its Implications on Social Media and Data Privacy. The
draft of the questions was drawn up based on the researchers' readings, previous studies,
technical literature, and published and unpublished theses relevant to the analysis. The
criteria for the design of good data collection tools were considered in preparing the
instrument. For instance, to match the participants' level of knowledge, the statements
explaining the circumstances or difficulties involved were toned down. Open-ended options were
given to accommodate free-form perspectives on topics or issues. In this manner, the instrument
was able to obtain valid responses from the netizens and IT professors. The preference for
standardized questions rests on several research assumptions, such as a) a less costly means of
data collection, b) the avoidance of personal bias, c) less demand for an immediate response,
and d) a greater sense of privacy for the participants. Ultimately, it promoted transparent
solutions to critical issues at hand. Furthermore, a few consultants and advisers validated the
instrument before it was administered.

VALIDATION OF THE DATA COLLECTION INSTRUMENTS

Survey Questionnaire

Part 1:

Name: _______________________________ Age: ________________

Email: _______________________________ Gender: ______________

Social Media Account (check all the accounts you have):

☐ Facebook
☐ Twitter
☐ Instagram
☐ YouTube
☐ TikTok
☐ Shopee
☐ Lazada

PART 2: This section determines the respondents’ awareness of some of the modern capabilities
of data mining. (On a scale of 1-4, 1 = strongly disagree, 4 = strongly agree)

Awareness towards the presence of data mining
Response options for each statement: Strongly disagree, Disagree, Agree, Strongly agree

1. I am aware that social media sites detect patterns in the content I subscribe to (e.g. the
pages I follow, the videos I view, the people I subscribe to).

2. I am aware that online shopping platforms also detect patterns in the items I buy or am
interested in, showing similar items such as through ads on the internet.

3. I am aware that Facebook’s Marketplace and the like use algorithms that keep valuable
information such as transactions and basic information of people to detect and prevent
fraudulent transactions.

4. I am aware of the online presence of the government and/or any form of law enforcement as
well as their methods to detect and prevent cybercrime.

5. I am aware that there are programs that can help researchers and data scientists sort data
and quickly show results as compared to manually organizing and sorting information.

PART 3: This section determines the online transparency of social media users, therefore their
susceptibility to data mining. (On a scale of 1-4, 1 strongly disagree, 4 strongly agree)

Transparency of users’ data
(Response options for each statement: Strongly disagree, Disagree, Agree, Strongly agree)

6. I provide my complete and real basic information on the social media sites that I use, such
   as my real name, age, and contact information.

7. My content can be viewed publicly even by other social media users I am not acquainted with.

8. I read the terms and conditions before using an app or other programs.

9. There were times that I sent passwords, proof of transactions, and other private information
   via social media sites.

10. I tend to be vocal as to what I feel when commenting on online content.

PART 4: This section determines whether social media users prioritize privacy over convenience,
or vice versa. (On a scale of 1-4, 1 strongly disagree, 4 strongly agree)

Inclination to user privacy or algorithmic convenience
(Response options for each statement: Strongly disagree, Disagree, Agree, Strongly agree)

11. I think that the examples of what social media sites do in the previous questions promote
    convenience for users like me (personalized ads such as on online shopping platforms).

12. I generally allow social media sites to use my information for whatever purpose it may
    serve them.

13. I carefully choose what information can be viewed publicly and what can be viewed
    exclusively by those I allow to see it, such as my online friends.

14. I prefer that whenever I choose to delete any of my online content, I am given assurance
    that it is completely removed.

15. I expect my personal details to be secured and restricted from being published without my
    consent.

In order to gain a comprehensive understanding of our respondents' perspectives, a survey

was created that included critical factors such as our respondents' awareness of the conventional

uses of data mining across various industries, as well as its modern capabilities and share of

controversial uses; our respondents' online habits and their preference for convenience over

privacy were also considered.

To increase the validity of the potential output, the services of an outsourced statistician
were enlisted to validate the survey. After validating it, the statistician emphasized the
importance of including an introductory page that requests our respondents' consent in
accordance with the Data Privacy Act of 2012. This was highly advisable, particularly because
the Act is directly related to the subject of the paper at hand.

Furthermore, prior to formally releasing our survey, a pilot test with 25 respondents from the
selected population was undertaken to assess the internal consistency of our data and to reduce
the possibility of disseminating an incoherent survey at the last minute. The statistician then
used tau-equivalent reliability, commonly known as Cronbach's alpha, to analyze the pilot
responses, which yielded a coefficient of .930. According to the statistician, a value of .9 or
higher indicates that the survey was well designed and, as a result, is suitable for official
distribution.
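
As an illustration of how a reliability coefficient of this kind is typically obtained, the following Python sketch computes Cronbach's alpha from a respondents-by-items matrix of Likert scores. It assumes the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of respondents' total scores), where k is the number of items. The function name `cronbach_alpha` and the small response matrix are hypothetical; they are not the pilot data, since the per-respondent answers are not reproduced in this paper, and the statistician's actual software is not stated.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix with rows = respondents, columns = items."""
    k = len(responses[0])  # number of items in the questionnaire
    if k < 2:
        raise ValueError("at least two items are required")
    items = list(zip(*responses))  # transpose to get one tuple per item
    item_variance_sum = sum(variance(column) for column in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to three items on the 1-4 scale.
sample = [
    [4, 4, 3],
    [3, 3, 3],
    [2, 2, 1],
    [4, 3, 4],
    [1, 2, 1],
]

alpha = cronbach_alpha(sample)
print(round(alpha, 3))
```

The sample variance (dividing by n - 1) is used for both the item variances and the total-score variance, which is the usual convention; the same ratio results if the population variance is used consistently for both.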

Prior to the official distribution of our survey, we received the following responses from our
pilot test:

Statement   Completely Disagree   Disagree   Agree   Completely Agree
1           0                     1          9       15
2           0                     5          5       15
3           0                     6          5       14
4           3                     3          10      8
5           3                     3          6       13
6           3                     5          3       13
7           7                     4          9       3
8           3                     3          9       9
9           8                     6          5       5
10          3                     6          5       10
11          1                     6          9       8
12          5                     8          8       2
13          0                     0          8       17
14          0                     0          9       16
15          0                     5          14      4
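
The pilot tallies can be summarized per statement with a short script. The Python sketch below is only an illustration: the counts are copied from the pilot table, while the dictionary name `PILOT` and the helper `summarize` are hypothetical. It reports how many answers each statement received and the share of respondents who chose "Agree" or "Completely Agree". Several rows sum to fewer than 25; treating these as skipped items is an assumption, so each share is computed over the answers actually recorded.

```python
# Pilot tallies: statement -> (completely disagree, disagree, agree, completely agree)
PILOT = {
    1: (0, 1, 9, 15),
    2: (0, 5, 5, 15),
    3: (0, 6, 5, 14),
    4: (3, 3, 10, 8),
    5: (3, 3, 6, 13),
    6: (3, 5, 3, 13),
    7: (7, 4, 9, 3),
    8: (3, 3, 9, 9),
    9: (8, 6, 5, 5),
    10: (3, 6, 5, 10),
    11: (1, 6, 9, 8),
    12: (5, 8, 8, 2),
    13: (0, 0, 8, 17),
    14: (0, 0, 9, 16),
    15: (0, 5, 14, 4),
}


def summarize(counts):
    """Return (number of recorded answers, percentage agreeing) for one statement."""
    total = sum(counts)
    agreeing = counts[2] + counts[3]  # "Agree" plus "Completely Agree"
    return total, round(100 * agreeing / total, 1)


for statement, counts in PILOT.items():
    total, pct_agree = summarize(counts)
    print(f"Statement {statement:>2}: n = {total}, agreeing = {pct_agree}%")
```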

Our initial findings show that a majority of our respondents are aware of algorithms being
utilized on social media sites (statement 1). The same number of respondents are completely
aware of personalized ads on online shopping platforms, while five have some grasp of the
concept and another five are unaware of it (statement 2). The majority of our respondents are
also aware of the online presence of law enforcement and other public agencies that investigate
and prevent cybercrime; as with the previous statement, the remaining respondents are split
between those who merely have a grasp of the idea and those who are partially unaware of this
presence (statement 3). Meanwhile, a majority responded positively regarding their knowledge of
the organizational and anti-fraud uses of databases across industries such as banking and
healthcare (statement 4), and a comparable number signified awareness of the general concept of
data mining and its ability to rapidly organize data (statement 5).

Pertaining to our respondents’ online habits, a majority attested to providing complete and
authentic basic information, such as their real names and contact information, on social media
sites (statement 6). About half of our respondents admit to having publicly viewable social
media accounts, the other half presumably being hidden from public view or utilizing anonymous
accounts (statement 7). Our findings also suggest that a majority of respondents read the terms
and conditions prior to using an application, contrary to the common belief that says otherwise
(statement 8). On the other hand, more than half of the respondents deny sending sensitive
information such as passwords and proof of transactions via social media sites (statement 9),
while more than half of the responses suggest that our respondents habitually comment on highly
public posts (statement 10).

Most of our respondents agree that the online algorithms used by social media sites are designed
to augment convenience, such as when following content aligned with their interests and when
online marketplaces suggest items based on their previous searches (statement 11), although it
is worth noting that more than half of our respondents are not willing to deliberately provide
their information for any undisclosed purpose of social media sites (statement 12). Parallel to
our findings on statement 7, all of our respondents attest to choosing which information can be
viewed publicly and which can be viewed exclusively; most appear inclined to make their accounts
more private, while more than a quarter make only minimal changes toward that end (statement
13). Most of our respondents also expect the data that they choose to delete to be completely
erased, void of any traces, with more than a quarter preferring options that still let them keep
a back-up, presumably for data deleted accidentally or on impulse (statement 14). Lastly, a
majority of our respondents expect their details on social media sites to be secure and free
from being published without their consent (statement 15).


Interview Questions

● Given that FEU Diliman has an extensive and efficient program for student and faculty
identification, would you say that this system makes it easier for students and professors
alike to integrate themselves into Canvas accounts, as well as to ask for a generated
password for internet access within the premises? What kind of data exactly is collected in
the school database?
● What kind of programs are used to create the school database? And is there a name for
the database?
● Were there any issues during the development of that system?
● Would you say that there is at least a process of data mining present in the collection of
student data?
● We are aware that data mining is one of the topics taught under our integrated IT subjects
such as business analytics. What do you think are the benefits of being taught about this
process?
● As educators specializing in Information Technology, what other functions of this school
make use of data mining? Do other schools do the same? Are there other functions in this
school that data mining could make more efficient and convenient?
● With the rapid development that can further augment the capabilities of data mining,
what are the possible current issues that data mining can solve that it has yet to address?
(foreign, local, within the school)
● What would you say are the advantages of data mining over more primitive data
collection methods?
● Is there any law or policy that we should all be aware of pertaining to data mining? Were
there any recent local controversies regarding the topic? (cite some cases from other
countries mentioned in the RRL if necessary).
● How would you say that government and law enforcement agencies prevent potential
crimes? Through social media sites? Would you say our local agencies also make use of
data mining similar to foreign governments?


● Can you provide a rough estimate or percentage as to how many Filipinos use social
media? Would you say that they are aware that what they do on those social media sites is
susceptible to being documented or collected through third parties and other means? (cite
Cambridge Analytica incident if necessary)
● As specialists in information technology, is there any assurance that our online data as
netizens are secure? How susceptible are we to data mining without our consent? Is there
any way to mitigate or prevent the intrusion of our online privacy?
● Where do we draw the line between the convenience brought by the output through data
mining and the privacy and security of our online data?

On the other hand, a series of questions was designed for the interview to be conducted in the
latter half of this study; these questions were likewise constructed with the objectives of the
study in mind. Given that the selected respondents for this data collection instrument are
educators at the same university as the researchers, we limited the context of some of the
questions to the specialized field of said educators. Because these educators are also
specialists in information technology, a fraction of the questions were designed around a
holistic concept of data mining. The topic is also included in the researchers’ curriculum as
part of the IT-fusion program, as the researchers have experienced first-hand in data
mining-related subjects such as Fundamentals of Business Analytics, which some of the
researchers are currently taking.

This data collection instrument was validated by the researchers’ thesis adviser and
coordinator. The researchers took this validation to imply that pilot testing for this
instrument was unnecessary, which the coordinator also stated explicitly.


METHOD OF DATA COLLECTION

The researchers collected data for this study through surveys and interviews.

Survey

A survey is a broad term that refers to any data gathering strategy in which each participant is
asked to answer the same set of questions in a predetermined order, and in which the answers are
simple and easily understood by the researcher. Given the nature of the study, questionnaires
were used to collect the data that was later used in the study's analysis. This aided in
acquiring a thorough understanding of netizens' attitudes about data mining.

The first step before giving out the survey form was to obtain the necessary consent and
permission from the participants in consideration of ethical issues. After consent and
permission were obtained, the questionnaire was distributed to the participants, and they were
given enough time to answer the survey questions. Thereafter, the accomplished surveys were
collected, and the responses were tallied, tabulated, processed, analyzed, and interpreted using
the statistical measures. Responses were treated with the utmost confidentiality.

Interview

An interview is a method of collecting data through oral communication between the researcher
and the respondent, using both structured and unstructured questions. Interviews were used
because they are flexible and adaptable and allow the researcher to gain greater insight into
the topic. They were used to gather the views of the faculty of FEU Diliman about data mining.
The interview sessions were conducted online through Google Meet or Zoom so that the interviewee
would feel more at ease and comfortable sharing the information needed for the study. The
interviewee was then told that the conversation would be recorded.

The data and interview responses provided by the respondent were kept confidential to encourage
a degree of freedom and flexibility in the information shared. The purpose of the interview was
also clearly explained. A set of open-ended interview questions was created, which served as a
guide but was not strictly adhered to. In this unstructured interview approach, open-ended
questions were utilized to obtain a more thorough response from the interviewee.

METHOD OF DATA ANALYSIS AND PRESENTATION

At this juncture of the study, it is emphasized that both qualitative and quantitative methods
shall be applied to process the data gathered in the succeeding procedures and to turn it into
valuable information. Measurable questions answered on four-point scales shall be used to
compute means, which shall be interpreted through their respective calculations and
observational interpretations; percentages shall also be calculated as deemed necessary by the
researchers to provide deeper quantitative interpretations of the same sets of data. Thematic
analysis shall likewise be utilized in the succeeding procedures to give the qualitative
perspective an equal degree of integrity. Thematic analysis enables the researchers to interpret
qualitative output while considering crucial factors that may affect the answers the respondents
provide (Kiger & Varpio, 2020).

However, the researchers must expect the potential obstacles that the use of thematic analysis
may bring. In particular, they must be prepared to provide extensive interpretations
proportionate to the quantity of data amassed, while considering necessary details such as
outliers and contradicting results that call for further analysis (Rosala, 2019). Obstacles
aside, all of the methods stated above are necessary for the researchers to produce valuable
information about the psychology of netizens towards data mining, and the researchers therefore
accept the challenges at hand.

The formulas with which the researchers shall measure the data are as follows:

1. Percentage expresses a specified quantity as a portion of one hundred; it is obtained by
dividing the part by the whole and multiplying the quotient by 100, and the result is written
with the percent sign (%) (Ministry of Manpower, 2021). This shall be valuable in analyzing
the demographics of the respondents.

2. Weighted mean is a variant of the mean in which each value is multiplied by its weight, the
products are summed, and the sum is divided by the total of the weights (Mucha & Pamula, 2020).
Through the weighted mean, the researchers shall determine the average answer to each
respective query.

3. Grand mean is derived either from a certain set of subsamples or from all observations; it is
computed by dividing the sum of all data points by the joint sample size (Stephanie, 2021).
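
The three measures listed above can be sketched in Python as follows. This is a minimal illustration rather than the researchers' actual procedure: the function names are hypothetical, the scores assume the 1-4 scale printed on the questionnaire, and the two example tallies are the pilot counts for statements 1 and 2. For a Likert item, the weighted mean treats the scale points as the values and the number of respondents choosing each point as the weights.

```python
SCALE = (1, 2, 3, 4)  # 1 = strongly disagree ... 4 = strongly agree (assumed coding)


def percentage(part, whole):
    """Share of `part` in `whole`, rebased to 100."""
    return 100 * part / whole


def weighted_mean(counts, scale=SCALE):
    """Mean score of one statement, given the tally of answers per scale point."""
    weighted_sum = sum(score * count for score, count in zip(scale, counts))
    return weighted_sum / sum(counts)


def grand_mean(tallies, scale=SCALE):
    """Mean over all answers pooled across several statements."""
    total_score = sum(
        score * count for counts in tallies for score, count in zip(scale, counts)
    )
    total_answers = sum(sum(counts) for counts in tallies)
    return total_score / total_answers


# Pilot tallies for statements 1 and 2, ordered from lowest to highest scale point.
statement_1 = (0, 1, 9, 15)
statement_2 = (0, 5, 5, 15)

print(percentage(15, 25))                       # share choosing the top option
print(weighted_mean(statement_1))               # average answer to statement 1
print(grand_mean([statement_1, statement_2]))   # pooled average of both statements
```

Pooling all answers before dividing, as `grand_mean` does, gives the same result as averaging the statement means only when every statement has the same number of answers; the pooled form is used here because it matches the definition of dividing the sum of all data points by the joint sample size.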

ETHICAL CONSIDERATIONS

In parallel to the topic of our study, the researchers are expected to practice the utmost
respect and confidentiality in handling the respondents’ information and their respective
answers and, as such, shall ask for their consent whenever it is deemed necessary over the
course of this paper and beyond its publication. The researchers are also expected to state
their objectives upon distribution of the data gathering instruments. The instruments may be
disseminated by any means within legal bounds, provided that the researchers can still be
identified and thus remain liable for the confidentiality of the respondents’ information during
and after the course of this paper.

Upon dissemination of the data gathering instruments, the researchers shall adhere to the
provisions of Republic Act No. 10173 or the Data Privacy Act of 2012, its Implementing Rules and
Regulations, the relevant policies and issuances of the National Privacy Commission, and all
other requirements and standards for the continuous improvement and effectiveness of the
personal data security management system.


REFERENCES

Auxier, B. et al. (2019, November 15). Americans and Privacy: Concerned, Confused and
Feeling Lack of Control Over Their Personal Information. Pew Research Center: Internet,
Science & Tech.
https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/

Baym, NK. (2015). Personal Connections in the Digital Age. Google Books.
https://books.google.com.ph/books?hl=en&lr=&id=4_1RCgAAQBAJ&oi=fnd&pg=PT6&ots=PUuT3qV-In&sig=bGpwya7ItUubdojhqoIMxSuA3ww&redir_esc=y#v=onepage&q&f=false

Bensaid, A. (2020). Israel-linked CIA-funded Palantir goes public, making espionage mainstream.
https://www.trtworld.com/magazine/israel-linked-cia-funded-palantir-goes-public-making-espionage-mainstream-40230

Brownlee, J. (2020). What is Deep Learning?
https://machinelearningmastery.com/what-is-deep-learning/

Bhasin, H. (2019). Data Mining In Retail: Applications and Six Phases in the Life Cycle.
https://www.marketing91.com/data-mining-in-retail/

Burhanuddin, M. (2018). Analysis of Mobile Service Providers Performance Using Naive Bayes
Data Mining Technique. doi:10.11591/ijece.v8i6.pp5153-5161

Calderon, T., Cheh, J., & Kim, I. (2003). How Large Corporations Use Data Mining to Create
Value. Management Accounting Quarterly.

Central Intelligence Agency. (2016). 2016 Data Mining Report, January 01, 2016, real-life
through December 31, 2016.
https://www.cia.gov/static/18ac2c0c8b3caa27a014b4239d42efad/2016-cia-data-mining-report.pdf

Chen, Y. (2017). The Application of Data Mining in Electronic Commerce.
https://www.atlantis-press.com/article/25879415.pdf

Christiansen, L. (2021). 7 Key Steps in the Data Mining Process.
https://zipreporting.com/en/data-mining/data-mining-process.html

Corporate Finance Institute. (2021). Regression Analysis.
https://corporatefinanceinstitute.com/resources/knowledge/finance/regression-analysis/

Datalya. (2018). How Data Mining Can Help Financial Analysts.
https://datalya.com/blog/business-statistics/how-data-mining-can-help-financial-analysts

Dean, B. (2021). Social Network Usage & Growth Statistics: How Many People Use Social
Media in 2021? https://backlinko.com/social-media-users

De Vaus, D. A. Research Design in Social Research. London: SAGE, 2001; Trochim, William
M.K. Research Methods Knowledge Base. 2006.
https://libguides.usc.edu/writingguide/researchdesigns

Disini & Disini Law Office. (2017). Data Privacy Principles and Rights.
https://elegal.ph/data-privacy-principles-and-rights/

DTI. (2019). The National Capital Region.
https://www.dti.gov.ph/regions/ncr/profile/

Ecci International. (n.d.). A Summary of RA No. 10173 or the Data Privacy Act of 2012.
https://eccinternational.com/ra-10173-data-privacy-summary/

Edgar, Thomas W. (2017). Exploratory Study. In Research Methods for Cyber Security,
95–130. doi:10.1016/B978-0-12-805349-2.00004-2


Estares, Ian. (2019). Social Media Growth in the Philippines.
https://www.d8aspring.com/eye-on-asia/4-more-reasons-why-social-media-in-the-philippines-is-huge

ET Bureau. (2019). A look at mass electronic intelligence setup in the US and UK.
https://economictimes.indiatimes.com/tech/internet/a-look-at-mass-electronic-intelligence-setup-in-the-us-and-uk/articleshow/67358291.cms?from=mdr

ETHW. (2016). Lawrence J. Fogel. https://ethw.org/Lawrence_J._Fogel

Galloway, Alison (2005). Non-Probability Sampling. In Encyclopedia of Social Measurement,
859–864. doi:10.1016/b0-12-369398-5/00382-0

Homeland Security Today. (2020). Newly Unclassified Report Finds CIA Security Failures Led
to Massive 2017 Breach.
https://www.hstoday.us/subject-matter-areas/cybersecurity/newly-unclassified-report-finds-cia-security-failures-led-to-massive-2017-breach/

Houston, T. (2017). Mass Surveillance and Terrorism: Does PRISM Keep Americans Safer?
https://trace.tennessee.edu/cgi/viewcontent.cgi?referer=https://www.google.com/&httpsredir=1&article=3085&context=utk_chanhonoproj

IBM Cloud Education. (2021). Data Mining. https://www.ibm.com/cloud/learn/data-mining

Injadat, Mohammad Noor; Salo, Fadi; Nassif, Ali Bou (2016). Data Mining Techniques in Social
Media: A Survey. Neurocomputing. doi:10.1016/j.neucom.2016.06.045

IvyPanda. (2019, December 29). Ethical Implications of Data Mining By Government Institutions.
https://ivypanda.com/essays/ethical-implications-of-data-mining-by-government-institutions/

Kade, R. (2019, February 5). Your Guide To Current Trends And Challenges In Data Mining.
SmartData Collective.
https://www.smartdatacollective.com/your-guide-to-current-trends-and-challenges-in-data-mining/

Karim, S., & Sirwan, R. (2019). A Data Mining Approach for Prediction Modeling Using
Association rule. doi:10.30630/joiv.3.3.264

Keary, T. (2019). The balancing act of data mining ethics: the challenges of ethical data mining.
https://www.information-age.com/data-mining-123481736/

Kemp, S. (2021, February 11). DataReportal – Global Digital Insights.
https://datareportal.com/reports/digital-2021-philippines

Kiger, M. & Varpio, L. (2020). Thematic analysis of qualitative data: AMEE Guide No. 131.
https://doi.org/10.1080/0142159X.2020.1755030

Leprince-Ringuet, D. (2018). The UK’s mass surveillance regime has broken the law again.
https://www.wired.co.uk/article/uk-mass-surveillance-echr-ruling

Li, R. (2016). History of Data Mining.
https://www.kdnuggets.com/2016/06/rayli-history-data-mining.html

Liesbeth, D.M. (2021). Turing Machines. https://plato.stanford.edu/entries/turing-machine/

Liu S, Wang S, Liu X, Lin C-T, Lv Z (2021) Fuzzy detection aided real-time and robust visual
tracking under complex environments. IEEE Trans Fuzzy Syst 29(1):90–102

Liu S, Liu X, Wang S, Muhammad K (2021) Fuzzy-aided solution for an out-of-view challenge
in visual tracking under IoT assisted complex environment. Neural Comput Applic
33(4):1055–1065

Lupton, D. (2018). Digital Sociology. London: Routledge.
http://scholar.google.com/scholar_lookup?hl=en&publication_year=2018&author=D+Lupton&title=Digital+Sociology


Matillion. (2020). 5 real-life applications of Data Mining and Business Intelligence.
https://www.matillion.com/resources/blog/5-real-life-applications-of-data-mining-and-business-intelligence

McCulloch, W. (2016). Embodiments of Mind.
https://sites.google.com/a/eq.books-now.com/en89/9780262630306-14imtracGEital83

Ministry of Manpower. (2021). Percentages, Concepts, and Definitions.
https://stats.mom.gov.sg/SL/Pages/Percentages-Concepts-and-Definitions.aspx

Mucha, M. & Pamula, H. (2020). Weighted Average Calculator.
https://www.omnicalculator.com/math/weighted-average

Murthy, D (2013). Twitter. London: Polity Press.
http://scholar.google.com/scholar_lookup?hl=en&publication_year=2013&author=D+Murthy&title=Twitter

National Privacy Commission. (n.d.). About National Privacy Commission.
https://www.privacy.gov.ph/about-national-privacy-commission/

National Privacy Commission (2016). Implementing rules and regulations of the data privacy
act. https://www.privacy.gov.ph/implementing-rules-regulations-data-privacy-act-2012/#17

Oracle. (2020). What Is a Data Warehouse?
https://www.oracle.com/ph/database/what-is-a-data-warehouse/

Palad, E. (2018). Document classification of Filipino online scam incident text using data mining
techniques. https://animorepository.dlsu.edu.ph/etd_masteral/5522/

Patil, D. (2015). White House Author, DJ Patil, Deputy Chief Technology Officer for Data Policy
and Chief Data Scientist in the Office of Science and Technology Policy.
https://obamawhitehouse.archives.gov/blog/author/dj-patil


Paul, M. (2017). Nonlinear Classification.
https://cmci.colorado.edu/classes/INFO-4604/fa17/files/slides-9_nonlinear.pdf

Pedamkar, P. (2021). Data Mining vs Data Analysis.
https://www.educba.com/data-mining-vs-data-analysis/

Perdigon, J. (2017). Pre-Employment Requirements.
https://www.cbecompanies.com/uploads/userfiles/files/documents/Employees/Philippines/Pre-Employment%20Requirements%20Final%20-%20instructions.pdf

Popescu, O. (2021). Crypto trading bots: The ultimate beginner's guide.
https://www.trality.com/blog/crypto-trading-bots/

Rosala, M. (2019). How to Analyze Qualitative Data from UX Research: Thematic Analysis.
https://www.nngroup.com/articles/thematic-analysis/

Saltz, J. (2020). CRISP-DM is Still the Most Popular Framework for Executing Data Science
Projects. https://www.datascience-pm.com/crisp-dm-still-most-popular/

Sarangam, A. (2021). Data Mining vs Data Analysis – An Easy Guide In Just 3 points.
https://www.jigsawacademy.com/blogs/data-science/data-mining-vs-data-analysis/

SAS. (n.d.). Data Mining What it is & why it matters.
https://www.sas.com/en_ph/insights/analytics/data-mining.html

Statista. (2021, August 12). The number of social network users in the Philippines is from
2017–2026.
https://www.statista.com/statistics/489180/number-of-social-network-users-in-philippines/

Statswork. (2018). Population.
https://www.statswork.com/help-guide/research-methodolgy/population/

Stephanie, G. (2021). Grand Mean: Definition. https://www.statisticshowto.com/grand-mean/


Software Testing Help. (2021). Data Mining Examples: Most Common Applications Of Data
Mining 2021. https://www.softwaretestinghelp.com/data-mining-examples/

Taylor, Joanna; Pagliari, Claudia (2017). Mining social media data: How are research sponsors
and researchers addressing the ethical challenges? Research Ethics.
doi:10.1177/1747016117738559

Udemy. (2016). How William Cleveland Turned Data Visualization Into a Science.
https://priceonomics.com/how-william-cleveland-turned-data-visualization/

USF Health. (2021). Data Mining in Healthcare.
https://www.usfhealthonline.com/resources/key-concepts/data-mining-in-healthcare/

Wang, Y. & Zhang, Y. (2019, June 13). Role of Big Data in the Smart Grid.
https://www.researchgate.net/publication/306330602_Energy_Big_Data_A_Survey#pf4

Yar, M (2014). The Cultural Imaginary of the Internet. London: Springer.
http://scholar.google.com/scholar_lookup?hl=en&publication_year=2014&author=M+Yar&title=The+Cultural+Imaginary+of+the+Internet

Yoshida, R. (2017). Tokyo is evasive on the report of a secret deal with the NSA over a mass
surveillance program.
https://www.japantimes.co.jp/news/2017/04/25/national/politics-diplomacy/tokyo-evasive-report-secret-deal-nsa-mass-surveillance-program/


APPENDICES

FEU DILIMAN
COLLEGE OF ACCOUNTS AND BUSINESS
LETTER OF REQUEST FOR ADVISORSHIP
Date: September 06, 2021
Faculty Member
Department of Accounts and Business

Dear Mrs. Ma. Jaysan Dasel Cadete-Cruz, CPA, MBA,


May we request your kind approval to be our Group Adviser for our business research entitled “Data
Mining and Netizens’ Awareness: Understanding its Implications on Social Media and Data Privacy” in
partial fulfillment of the requirements for our degree.

Thank you. Respectfully yours,


Co, Chantel Dane A.
Fenis, Mikee T.
Gali, Julianne Kay E.
Nebria, Samuel John P.
Valeroso, Franco Enrique G.
Valiente, Merl Polin C.

Accepted by:

MRS. MA. JAYSAN DASEL CADETE-CRUZ, CPA, MBA


Signature of Adviser over Printed Name Group Adviser

Date: September 06, 2021


FEU DILIMAN

COLLEGE OF ACCOUNTS AND BUSINESS

CONSULTATION TIMESHEET

Group Name : Group 4

Section : DF41

Adviser’s Name : Mrs. Ma. Jaysan Dasel Cadete-Cruz, CPA, MBA

DATE         CONSULTATION ACTIVITIES   TIME STARTED   TIME ENDED   GROUP LEADER'S SIGNATURE
09/03/2021   TITLE & TOPIC PROPOSAL    9:54 AM        10:16 AM
10/05/2021   CHAPTER 1                 1:40 PM        2:00 PM
10/18/2021   CHAPTER 2                 10:25 AM       11:29 AM
11/02/2021   CHAPTER 3                 10:00 AM       10:27 AM


Adviser’s Name : Mrs. Ma. Jaysan Dasel Cadete-Cruz, CPA, MBA

Prepared by: Noted by:

MRS. MA. JAYSAN DASEL CADETE-CRUZ, CPA, MBA MS. JACKIE LOU O. RABORAR, PHD


FEU DILIMAN
COLLEGE OF ACCOUNTS AND BUSINESS
OFFICIAL APPOINTMENT LETTER OF FACULTY ADVISORSHIP FOR THESIS WRITING 1

DATE : September 06, 2021


TO : MRS. MA. JAYSAN DASEL CADETE-CRUZ, CPA, MBA
FROM : MS. JACKIE LOU O. RABORAR, PhD
Research Coordinator
RE : Official Appointment Letter of Faculty Advisorship

We are pleased to inform you of your official appointment as Research Adviser for the research entitled
“Data Mining and Netizens’ Awareness: Understanding its Implications on Social Media and Data
Privacy” of the following students:
Co, Chantel Dane A.
Fenis, Mikee T.
Gali, Julianne Kay E.
Nebria, Samuel John P.
Valeroso, Franco Enrique G.
Valiente, Merl Polin C.
As a group adviser, you are expected to:
1. Provide the group with your consultation schedule;
2. Attend the consultation sessions as specified on the consultation schedule;
3. Require the group to submit a copy of their research progress;
4. Advise and help the group on the scheme, format, and contents of their research proposal;
5. Report the group's attendance to the Business Administration Department by submitting the
consultation form, duly signed by the group adviser, every week;
6. Conduct the mock defense to familiarize the group with the final presentation;
7. Help the group with revisions and corrections; and
8. Encourage or assist the group to complete and submit the final paper and revision of their
business research proposal on time.
This will be a very challenging but exciting task. Your support is highly appreciated.
I agree to and accept the official appointment.

MRS. MA. JAYSAN DASEL CADETE-CRUZ, CPA, MBA


Signature over Printed Name
Faculty Adviser
Signed by:
MS. JACKIE LOU O. RABORAR, PhD
Research Coordinator


FAR EASTERN UNIVERSITY – DILIMAN


DEPARTMENT OF ACCOUNTS AND BUSINESS

LETTER OF COMPLETION

FOR BUSINESS PLAN AND BUSINESS RESEARCH

Date: November 25, 2021

MS. JACKIE LOU O. RABORAR, PhD


Subject Coordinator/Chair, Business Research

Dear MS. JACKIE LOU O. RABORAR,

This is to respectfully and officially endorse to your good office the Research paper entitled “Data Mining
and Netizens’ Awareness: Understanding its Implications on Social Media and Data Privacy” done by the
group whose names appear below:

Co, Chantel Dane A.
Fenis, Mikee T.
Gali, Julianne Kay E.
Nebria, Samuel John P.
Valeroso, Franco Enrique G.
Valiente, Merl Polin C.

The final paper has been carefully scrutinized and evaluated by the undersigned. This endorsement
signifies that the group has satisfactorily and completely complied with the prescribed format and
requirements of the paper.

I hereby recommend the group for Oral Defense on the date and at the place that may be set by the
Officer-in-Charge of the Department of Accounts and Business. Thank you.

Respectfully yours,

MRS. MA. JAYSAN DASEL CADETE-CRUZ, CPA, MBA


Adviser


Survey Questionnaire

PART 1:

Name: _______________________________ Age: ________________

Email: _______________________________ Gender: ______________

Social Media Account (check all the accounts you have):

☐ Facebook
☐ Twitter
☐ Instagram
☐ YouTube
☐ TikTok
☐ Shopee
☐ Lazada

PART 2: This section determines the respondents’ awareness of some of the modern capabilities
of data mining. (On a scale of 1 to 4, where 1 = strongly disagree and 4 = strongly agree.)

Awareness towards the presence of data mining        Strongly disagree    Disagree    Agree    Strongly agree

I am aware that social media sites detect patterns in the content I subscribe to (e.g., the pages I
follow, the videos I view, the people I subscribe to).

I am aware that online shopping platforms also detect patterns in the items I buy or am interested in,
showing similar items, such as through ads on the internet.

I am aware that Facebook’s Marketplace and the like use algorithms that keep valuable information,
such as transactions and basic information of people, to detect and prevent fraudulent transactions.

I am aware of the online presence of the government and/or any form of law enforcement, as well as
their methods to detect and prevent cybercrime.

I am aware that there are programs that can help researchers and data scientists sort data and quickly
show results, as compared to manually organizing and sorting information.

PART 3: This section determines the online transparency of social media users, and therefore their
susceptibility to data mining. (On a scale of 1 to 4, where 1 = strongly disagree and 4 = strongly agree.)

Transparency of users’ data        Strongly disagree    Disagree    Agree    Strongly agree

I provide my complete and real basic information on the social media sites that I use, such as my real
name, age, and contact information.

My content can be viewed publicly, even by other social media users I am not acquainted with.

I read the terms and conditions before using an app or other programs.

There were times that I sent passwords, proof of transactions, and other private information via social
media sites.

I tend to be vocal as to what I feel when commenting on online content.

PART 4: This section determines whether social media users prioritize privacy over convenience
or vice versa. (On a scale of 1 to 4, where 1 = strongly disagree and 4 = strongly agree.)

Inclination to user privacy or algorithmic convenience        Strongly disagree    Disagree    Agree    Strongly agree

I think that the examples of what social media sites do in the previous questions promote convenience
for users like me (personalized ads such as on online shopping platforms).

I generally allow social media sites to use my information for whatever purpose it may serve them.

I carefully choose what information can be viewed publicly and what can be exclusively viewed by
those I allow to see it, such as my online friends.

I prefer that whenever I choose to delete any of my online content, I am given assurance that it is
completely removed.

I expect my personal details to be secured and restricted from being published without my consent.


PILOT TEST RESULT


Interview Questions

● Given that FEU Diliman has an extensive and efficient program for student and faculty
identification, would you say that it is because of this system that it is easier for students
and professors alike to integrate themselves into Canvas accounts, as well as to ask for a
generated password for internet access within the premises? What kind of data exactly is
collected in the school database?
● What kind of programs are used to create the school database? And is there a name for
the database?
● Were there any issues during the development of that system?
● Would you say that there is at least a process of data mining present in the collection of
student data?
● We are aware that data mining is one of the topics taught under our integrated IT subjects,
such as business analytics. What do you think are the benefits of being taught about this
process?
● As educators specializing in Information Technology, what other functions of this school
make use of data mining? Do other schools do the same? Are there other functions in this
school that data mining could make more efficient and convenient?
● With rapid developments that can further augment the capabilities of data mining,
what current issues could data mining solve that it has yet to address?
(foreign, local, within the school)
● What would you say are the advantages of data mining over more primitive data
collection methods?
● Is there any law or policy that we should all be aware of pertaining to data mining? Were
there any recent local controversies regarding the topic? (cite some cases from other
countries mentioned in the RRL if necessary).
● How would you say that government and law enforcement agencies prevent potential
crimes? Through social media sites? Would you say our local agencies also make use of
data mining similar to foreign governments?

● Can you provide a rough estimate or percentage as to how many Filipinos use social
media? Would you say that they are aware that what they do on those social media sites is
susceptible to being documented or collected by third parties and through other means? (cite
the Cambridge Analytica incident if necessary)
● As specialists in information technology, is there any assurance that our online data as
netizens are secure? How susceptible are we to data mining without our consent? Is there
any way to mitigate or prevent the intrusion of our online privacy?
● Where do we draw the line between the convenience brought by the outputs of data
mining and the privacy and security of our online data?

Validated by:

Mrs. Ma. Jaysan Dasel Cadete-Cruz, CPA, MBA
Faculty Adviser

Ms. Jackie Lou O. Raborar, PhD
Research Coordinator


CURRICULUM VITAE
