School of Management, Library and Information Science, Tian Jin Normal University, TianJin, 10065, China
Information Management, Chaoyang University of Technology, Taichung, Taiwan
School of Management, Tian Jin Normal University, TianJin, 10065, China

Abstract:

The rapid development of the Internet has brought great convenience to users, but it has also exposed users' private information to considerable risk. This study selects the basic information of 300 Facebook users for cluster analysis. The user data sets are divided into five groups by experiment, and the clustering results are classified and then discussed. By analyzing the users' basic information, the study identifies the risk factors in the exposure of that information. Finally, this study provides users with substantive recommendations for the healthy development of social media.

Keywords:
Social Media; Information Exposure; Privacy Protection; K-means

1. Introduction

With the growth of the Internet, the mobile Internet has deeply affected people's social behavior. The dimensions along which users are exposed to the network environment, and the likelihood of that exposure, have greatly increased. The threat of user information exposure has become an important problem that touches the vital interests of every Internet user. Under the influence of powerful intelligent terminals, various service applications, the growing role of the network and other factors, more and more user information inevitably faces active, unintentional or malicious disclosure. According to the "China National Rights Protection Survey Report 2016", 54% of Internet users think that personal information disclosure is serious, of which 21% of Internet users think it is very serious, and 84% of Internet users have personally felt the adverse effects caused by information exposure. According to the research, 47% of Internet users have encountered the "social media posing as friends and family" scam, and a considerable number of Internet users have received fraudulent information and lost money [1].

As a result of the rapid development of computer, network and other information technologies, the data held by social media operators has been growing exponentially. Once published, these microdata become a form of public offering, but data publishers cannot control how users use the data, nor do they know whether the information exposure will bring risk to Internet users [2]. Web service providers want to use as much private information as possible to gain more benefits, and some stakeholders want to earn more profit, so the content of a user's information is held by multiple parties. Therefore, users need to take control of their privacy into their own hands to ensure that it is safe. It can be seen that in-depth study of privacy issues is of great practical significance.

The existence of social media provides a good research environment for studying the user privacy problem: it not only closely resembles reality in user scale, social relations, complex structure and behavior patterns, but is also easier to obtain, analyze and verify. As people pay more and more attention to privacy protection, how users' privacy and security can be effectively guaranteed has become a hot topic in recent research. Hundreds of social media platforms have more than 10 million users, for example Facebook, Twitter, LinkedIn, WeChat and Weibo. With the development of "Internet +", small social media products whose origins are not clear continue to emerge, making users' privacy more difficult to protect. The study of the exposure of social media users' basic information is therefore particularly important.

2. Related Work

At home and abroad, people attach great importance to social media research, and the aspects studied are very
broad. From the perspective of user behavior, Wang, Na and Xu, Dachen [3] used Questionnaire Star to carry out a sampling survey on the protection of personal information in mobile social networks, which returned 535 valid questionnaires. Analysis of the survey results reveals five aspects of user behavior that influence personal information security: (1) setting a password based on personal information associated with the user and using the same password on multiple social media; (2) installing social media software from unknown sources while not regularly updating the system security software; (3) disabling the terminal system's privacy controls; (4) worrying about unknown information disclosure but often activating the GPS function; (5) being more inclined to upload private data to network disks. These five aspects of user behavior increase the possibility of privacy exposure [3].

Wang, Shuyi and Zhu, Na [4], in their research on privacy protection strategies for mobile social media users, identified new challenges for the study of Internet users' privacy. They hold that, besides identity, health, family and social relations, privacy is also affected by the location of mobile social media users, information published by service providers, status posts that expose users' privacy, and hackers. Based on these issues, the authors proposed that the government, industry associations, service providers and users themselves collaborate to protect user privacy [4].

From the perspective of social media user privacy, Zhu, Hou [5], in "social media users' psychological reaction to privacy protection research", analyzed the literature on users' privacy psychology in important domestic and foreign journals and created an integrated model of privacy concerns. The variables were divided into antecedent variables (privacy experience, privacy awareness, personality differences, demographic characteristics, culture), outcome variables (regulation, behavioral response) and moderating variables (privacy calculation, trust) [5].

Longfei Guo [6] took emergence and media reports as the objects of study, constructed a dynamic image factor model of social network privacy concerns, observed changes in user privacy concerns, compared individuals' concerns with those of groups, and explored the different results caused by different functions of social media. As a result, regular patterns of social network privacy attention were found. In addition, the evolutionary game model of privacy protecting

Liwen Wang [11] used Python to extract hot topics and participants' information from Sina microblogging, and then used a hierarchical clustering algorithm to cluster the hot topics. At the same time, a collaborative filtering algorithm was used to analyze the extracted data. The results of the two algorithms were analyzed and compared. Based on the results, topics of the same class or the topics of most interest to a user can be recommended, which makes the Sina microblogging topic function more frequently used [11].

From the perspective of social media operators, Cheng-Hung Tsai, Han-Wen Liu et al. [12] noted that, in the era of information explosion, users interact through various social networking sites. Users connect to various forms of social networking sites anytime and anywhere through the Internet, so social media operators can clearly know users' information and social relations. By understanding users' personal preferences and interests, social media operators recommend personalized ads, products, articles and other diversified personal social services. This not only meets the needs of users, but also increases products' click-through rate and exposure. Based on the tone people use in social networks, their article obtains data from three sources: the user's fan page, messages on the user's graffiti wall, and friends' home pages. The data are used to analyze personal preferences and, considering the amount of personal social data, the paper finds personal preference categories from different groups, so as to better recommend personal advertising and products. Actual verification showed that the results were significantly improved [12].

From the perspective of user privacy exposure, Mohamed Bourimi, Ricardo Tesoriero et al. [13] addressed the privacy and security issues of multi-modal user interfaces for social media applications. The proposed method describes how privacy and security issues are modeled from the perspective of the user interface, and how this model is developed with a four-level conceptual framework for multi-modal, multi-platform user interfaces. The approach also explains how these models can be adapted to the development of social media applications. Finally, the authors applied the model to a social media application to show its feasibility [13].

Elena Zheleva and Lise Getoor noted that, to address the privacy problem, many social media sites allow users to hide their personal profiles from the public. How can an online social network with a mix of public and private user profiles be used to predict users' private attributes? Using relational classification, they infer users' sensitive information from public information. Considering group memberships in addition to friendship links, the study found that, on several well-known social media sites, link-based and group-based classification applied to networks with mixed public and private profiles can accurately recover users' private profiles [14].

Eden Litt [15] suggested that every day hundreds of
millions of people log on to social networking sites and generate TB-level data. How do social networking sites use technology to protect the privacy of users? The results show that the technology on a social networking site mediates communication between the user and the operator. The development of privacy limits and of the technology is controlled by people. Ultimately, a privacy inequality can be identified: some people are more likely to take advantage of the technology to protect their privacy, while the personal information and reputations of others may face more risk [15]. In addition, Sweeney proposed that applying K-anonymity processing before the release of microdata can effectively reduce the probability of personal information disclosure [16].

Priyadarshini Lamabam and Kunal Chakma [17] studied code mixing and code switching, in which two or more languages are used at the same time in an exchange. According to previous research, language identification of such code-mixed text is difficult because of the presence of informal text (such as creative spelling and phonetic input). The authors use natural language processing methods to automatically identify the languages in code-mixed social media text. In their study of cross-language social media, they selected Twitter and Facebook to show code mixing between English and Manipuri, and compared a trigram model with a conditional random field (CRF) model to find which can identify the language more accurately [17].

Dasondi et al. [18] used text analysis in data mining for emotional analysis of social media texts. The collected data sets are divided into words and sentences, and the weights of the two classes are then calculated to form the final text model. Java technology is used to evaluate the performance of the proposed system in terms of accuracy, error rate, storage space, training time and test time. The authors aim to advance an automatic identification model based on this model in subsequent research [18].

3. Materials and methods

Facebook's monthly active users have exceeded 1.7 billion, and its daily active users have passed 1.1 billion. Facebook has a huge number of active users and a large volume of social media text, and provides open API access to data.

Although some scholars use data mining for social media text analysis, existing domestic research seldom takes Facebook as a research example or addresses user information exposure. Some foreign researchers now use data mining, machine learning and other methods to build models for data collection, with research covering the inequality in data between users and social media operators, cross-language text analysis, analysis of text from a variety of social media, operators' use of user information for product marketing, and other aspects. According to existing research, users' privacy is severely exposed to public view. There is not yet a mature research framework for social media user privacy, and empirical research on user information exposure is rare. Therefore, this study focuses on social media user information exposure, an issue of great significance for privacy protection.

This paper selects the basic information that users provide at registration as the research data and defines an exposure level for each item of basic information. The basic information visible when browsing a Facebook user's profile is shown in Table 1. In the table, "1" means that exposing the information is dangerous, "0" means that it is not dangerous, and "0+" means that it is slightly risky.

According to the basic information exposed, this paper selects the 10 most basic variables, which are the items of Facebook users' basic information most likely to affect privacy exposure: work experience, education, living place, mailbox, phone, birthday, gender, family members, emotional (relationship) status and user avatar (profile picture). For work experience and education, 0 is set as the starting point. Based on the amount of information exposed, this paper tries to establish a model that reflects the factors and the types of attack in the process of information exposure.

Through empirical research, this study aims to provide users with optimization strategies that reduce the possibility of attack, and a basis for long-term healthy development.

4. Experiments and results

The basic information of the 300 users was entered into the R software and grouped with K-means. The K-means clustering algorithm divided the 300 users' data into five groups, of sizes 14, 91, 43, 143 and 9. The following results can be obtained by analyzing the grouping. In the first group, exposure is concentrated in work, education background, place of residence, birthday and family members. In the second group, exposure is concentrated in work, education background and family members. In the third group, work and family members are not exposed; there is much exposure of education background and little exposure of telephone numbers. The characteristics of the fourth group are mainly manifested in the
low exposure of work and education background and high disclosure of family members. In the fifth group, exposure exists mainly in work and education background, while family members are rarely exposed. These five groups are assigned user information exposure hazard levels, ordered from dangerous to safe: the fourth group is the fifth level, the first group the fourth level, the second group the third level, the fifth group the second level, and the third group the first level. The first level is the safest one.
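
The grouping step can be illustrated with a minimal sketch. The study ran K-means in R; the version below is plain Python using only the standard library, and everything in it is an assumption made for illustration: the six toy users, the field order in `FIELDS`, the choice of `k=2`, and the deterministic farthest-first seeding are not taken from the paper. Each user is a vector of 0/1 flags (1 = the field is exposed), so each cluster centroid can be read as the share of that cluster's users who expose each field.

```python
# Sketch only: the study used R; data and initialization here are hypothetical.

FIELDS = ["work", "education", "living_place", "mailbox", "phone",
          "birthday", "gender", "family", "relationship", "avatar"]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic seeding (an assumption): start from the first point,
    then repeatedly add the point farthest from the chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical users: 1 = field is exposed, 0 = field is hidden.
users = [
    (1, 1, 1, 0, 0, 1, 1, 1, 0, 1),
    (1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
    (1, 1, 0, 0, 0, 1, 1, 1, 0, 1),
    (0, 0, 0, 0, 0, 0, 1, 0, 0, 1),
    (0, 1, 0, 0, 0, 0, 1, 0, 0, 1),
    (0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
]

labels, centroids = kmeans(users, k=2)
for c, center in enumerate(centroids):
    size = labels.count(c)
    # Each centroid coordinate is the share of the cluster exposing that field.
    exposed = [f for f, rate in zip(FIELDS, center) if rate > 0.5]
    print(f"cluster {c}: size={size}, mostly exposed: {exposed}")
```

Reading the centroids this way mirrors the per-group descriptions above (which fields a group exposes more or less). How the groups are then mapped to hazard levels is a separate judgment that this sketch does not attempt.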

TABLE 1: Facebook users record information of field name and sub-field name

Field                  Facebook user basic information    Dangerous level
                       Work experience                    0+
                       Professional skills                0
                       Education                          0+
                       Living place                       0+
Contact information    E-mail                             1
                       Phone                              1
Basic Information      Birthday                           1
                       Gender                             1
                       Blood type                         0
                       Sexual orientation                 0
                       Other social networking sites      0
                       Family member                      0+
                       Relationship status                1
Details                About user information             0
                       Favorite aphorism                  0
                       Memorabilia                        0
                       Profile picture                    1
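
As a small illustration of how Table 1 could be used, the sketch below encodes the danger levels as a lookup and sorts a profile's visible fields by level. It is a hypothetical helper, not part of the study: the function name `exposure_summary`, the example profile, and the spelling of the keys (for instance "Phone") are assumptions.

```python
# Danger levels transcribed from Table 1: "1" = dangerous to expose,
# "0+" = slightly risky, "0" = not dangerous. Key spellings are assumptions.
DANGER_LEVEL = {
    "Work experience": "0+",
    "Professional skills": "0",
    "Education": "0+",
    "Living place": "0+",
    "E-mail": "1",
    "Phone": "1",
    "Birthday": "1",
    "Gender": "1",
    "Blood type": "0",
    "Sexual orientation": "0",
    "Other social networking sites": "0",
    "Family member": "0+",
    "Relationship status": "1",
    "About user information": "0",
    "Favorite aphorism": "0",
    "Memorabilia": "0",
    "Profile picture": "1",
}

def exposure_summary(visible_fields):
    """Group the fields a user has made visible by their Table 1 level."""
    summary = {"1": [], "0+": [], "0": []}
    for field in visible_fields:
        summary[DANGER_LEVEL[field]].append(field)
    return summary

# Hypothetical profile that shows three fields publicly.
profile = ["Birthday", "Education", "Blood type"]
summary = exposure_summary(profile)
print(summary)
```

A lookup of this kind keeps the danger levels in one place, so the same table can drive both the 0/1 encoding used for clustering and any later per-user report.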

FIGURE 1. R K-means grouping results

FIGURE 2. K-means clustering results and grouping diagrams
Grouping the exposed attributes of user information shows which variables users' exposed information is mainly concentrated on, and the harmful effects of those concentrated variables on the exposure of user
information. Corresponding countermeasures are then put forward on how future users can protect their information privacy when using social media.

Privacy protection is an important issue in global Internet governance; users will pay more attention to information privacy, which in turn affects their adoption of information systems. This is also one of the reasons why the satisfaction of our social media users has declined. The average age of users is showing an upward trend. Therefore, social media operators should, while improving the user experience, further eliminate users' privacy concerns, for example by adding more privacy exposure options and informing users about their privacy information. At the same time, action should start at the national level. The first step is to recognize the importance of personal information protection: if users' privacy and security are not sufficiently protected, their use of information systems will be affected, so that they use network platforms passively or even resist using them. The second is to strengthen the protection of adolescents' private information and education about it, so that the importance of privacy protection is instilled from childhood. The third is for personal information protection legislation to make clear the scope of personal information for the various channels and platforms that collect, transmit and use it. A strong and operable legal definition would help eliminate users' high concerns about information privacy risks.

5. Conclusion

According to the data of the 300 users, more users belong to the third group and the fifth group. According to the danger levels we defined, more users therefore fall into the first and second levels respectively. Users belonging to the third, fourth and fifth groups should pay attention to the exposure of basic information such as work, education background, birthday and family members.

This paper selects the basic information of only 300 users for empirical research, so the data set imposes some limitations. At the same time, the use of supervised learning methods for dealing with the data introduces a certain human influence, and the choice of the basic information data set may also be affected by some factors beyond our control.

In subsequent work, researchers need to establish an exposure model for users' basic information and expand the training data set at the same time. Future research will not be limited to the exposure of users' basic information, but will also cover users' comments, users' location information and so on. The study looks forward to providing a more secure service.

Acknowledgments

This paper is supported by the National Youth Social Science Fund project "Study of social media user privacy protection based on information prices dynamically revelation" (project approval number: 15CTQ017).

References

[1] China Internet Association, Report on the Protection of Chinese Internet Users' Rights and Interests, 2016 [EB/OL]. 6.06.26/2016.10.08
[2] XU Yong, WANG Hao, LI Dong-qin. Study on Anonymous Privacy Protection Technology in Data Publishing Domain, 2011, 30(8): 128-133.
[3] Wang, Na, Xu, Dachen. Investigation and Analysis of the Status Quo of Personal Information Protection in Mobile Social Networks, 2015, 34(1): 185-194.
[4] Wang, Shuyi, Zhu, Na. Research on the Privacy Protection of Mobile Social Media Users. Information Theory and Practice, 2013, 36(7): 36-40.
[5] Zhu, Hou. Study on Social media users privacy concerns psychological mechanism. Library and Information Knowledge, 2016, (2): 75-82.
[6] Guo, Longfei. Study on dynamic image factors and behavior law of Social network users' privacy attention. Beijing University of Posts and Telecommunications: School of Economics and Management, 2013.
[7] Miao, Zhenxing. Research on character analysis technology based on social media. Harbin Institute of Technology: College of Computer Science and Technology, 2013.
[8] Chen, Xi. Research on Social Network Customer Segmentation Based on R Language Data Mining. School of Economics and Management, Beijing University of Posts and Telecommunications, 2011.
[9] Lian, Jie. Research on Social Network Data Mining Based on User Characteristics. Beijing Jiaotong University, 2013.
[10] Chen, Yuying. Analysis of social network user characteristics based on machine learning. Beijing Jiaotong University: School of Electronic Information Engineering, 2015.
[11] Wang, Liwen. Research on data mining based on social network. Xi'an University of Electronic Science and Technology, 2014.
[12] Tsai, Cheng-Hung, Liu, Han-Wen et al. Social persona preference analysis on social networks. International Conference on Connected Vehicles and Expo (ICCVE), Taiwan, 2015, pp. 32-39.
[13] Mohamed Bourimi, Ricardo Tesoriero, et al. Privacy and Security in Multi-modal User Interface Modeling for Social Media, IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), 2011, pp. 1364-1367.
[14] Elena Zheleva, Lise Getoor. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. Proceedings of the 18th international conference on World wide web, New York, 2012, pp. 531-540.
[15] Eden Litt, Understanding social network site users' privacy tool use. Computers in Human Behavior, 2013, pp. 1649-1656.
[16] Samarati P., Sweeney L. Generalizing Data to Provide Anonymity When Disclosing Information, Proc. of the Seventeenth ACM Sigact-Sigmod-Sigart Symposium on Principles of Database Systems. New York: ACM Press, 1998: 188-189.
[17] Priyadarshini Lamabam, Kunal Chakma, 2016, A Language Identification System for Code-Mixed English-Manipuri Social Media Text, 2016 IEEE International Conference on Engineering and Technology (ICETECH).
[18] Vidhyabhushan Dasondi, Milap Pathak, Narendra Pal Singh, 2016, An implementation of graph based text classification technique for social media, 2016, Symposium on Colossal Data Analysis and Networking.