Journal of Computer Science Research - Vol.2, Iss.1 January 2020
Journal of Computer Science Research
Editor-in-Chief
Dr. Lixin Tao
Volume 2 ︱ Issue 1 ︱ January 2020 ︱ Page 1 - 20
Contents
ARTICLE
Location Determination Of Optimal Emergency System For Hurricane Disaster Based On Mathematical Modeling
Shizhong Ma, Shilong Zhu, Yujie Jiang
Research and Application on Spark Clustering Algorithm in Campus Big Data Analysis
Qing Hou, Guangjian Wang, Xiaozheng Wang, Jiaxi Xu, Yang Xin
REVIEW
Review of Artificial Intelligence with Retailing Sector
Venus Kaur, Vasvi Khullar, Neha Verma
Based On K-means Disease Diagnosis Research
Jiaqi Wu, Qingda Zhang, Linlin Zhao
Copyright
Journal of Computer Science Research is licensed under a Creative Commons-Non-Commercial 4.0 International Copyright (CC BY-NC 4.0). Readers shall have the right to copy and distribute articles in this journal in any form in any medium, and may also modify, convert or create on the basis of articles. In sharing and using articles in this journal, the user must indicate the author and source, and mark the changes made in articles. Copyright © BILINGUAL PUBLISHING CO. All Rights Reserved.
REVIEW
Review of Artificial Intelligence with Retailing Sector
Venus Kaur, Vasvi Khullar, Neha Verma*
School of Information Technology, Vivekananda Institute of Professional Studies, Delhi, India
Article history
Received: 21 December 2019
Accepted: 20 January 2020
Published Online: 31 March 2020
Keywords:
Artificial Intelligence (AI)
Big data
Retail
Internet of Things (IoT)
This research provides an original perspective on how artificial intelligence (AI) is making its way into the retail sector. Retail has entered a new era in which e-commerce and technology bellwethers like Alibaba, Amazon, Apple, Baidu, Facebook, Google, Microsoft, and Tencent have raised consumers' expectations. AI is enabling automated decision-making with accuracy and speed, based on data analytics coupled with self-learning abilities. The retail sector has witnessed a dramatic evolution with the rapid digitalization of communication (i.e. the Internet) and with smartphones and smart devices. Customers are no longer the same: they have become more empowered by smart devices, which have entirely changed their expectations, habits, and style of shopping and of investigating shops. This article outlines the significant innovations that have helped retail to evolve, such as Artificial Intelligence (AI), Big data, the Internet of Things (IoT), chatbots and robots. It also discusses the views of various authors on how AI is becoming more profitable and a close asset to customers and retailers.
1. Introduction
In the fast-growing retail environment, retailers need to re-examine what they are doing, how they are doing it and how they are developing products. Shopping complexes are embracing technology to become smarter and to provide customer satisfaction, better service, and a better response to customer demand and supply, so as to stay competitive in the era of Artificial Intelligence and Big data [2]. Online or offline, AI can accommodate vast descriptive data from enormous sources: images, videos, and customer behavior and responses. As a result, a huge amount of data (i.e. Big data) is released from different sources with distinctive patterns, from which we can record the facial expressions of buyer and seller and draw a semantic process that can revolutionize a business [3].
For instance, a purchase from a retailer includes various types of data in the form of transactional data (i.e. cost price of the product, quantity), customer data (i.e. age, gender, nationality etc.) and environmental data [5]. To forecast the demand for a product more actively and precisely, a systematic identification of the consumer's response to sales behavior is needed. Consequently, data management is one of the biggest issues, and also an emerging field, in the retail industry. To resolve this issue companies are applying “advanced and reliable data mining algorithms” [5,15] to store and evaluate results and to get better performance from data analysis.
2. Objective
(1) The main objective of the report is to define, describe, and forecast the global artificial intelligence in
*Corresponding Author:
Neha Verma,
School of Information Technology, Vivekananda Institute of Professional Studies, Delhi, India;
Email: neha.verma@vips.edu
retail market on the basis of types (online and offline), technologies, solutions, services, deployment modes, applications, and regions.
(2) The report provides detailed information regarding the major factors influencing the growth of the AI in retail market (drivers, restraints, opportunities, and industry-specific challenges).
(3) The report aims to strategically analyse micro markets with respect to individual growth trends, future prospects, and contributions to the total market [4].
3. Need For Change In Retail
Everyone accepts that change is inevitable. The days when retailers could sell whatever they wanted to sell are gone. Retailers now have to compete in every area of excellence beyond price. Convenience and customer experience are becoming the two most crucial factors, offline and online, in the battleground for brands. According to a survey of customers and retailers by The Store WPP in partnership with IBM, 48% of respondents said retailers should provide personalized promotions when they ask for them online, and 45% wanted the same product availability online as in-store. AI offers retailers the opportunity to both radically enhance and personalise the customer experience and to realise significant gains in productivity throughout the business, from the warehouse, to delivery, to head office and online and physical outlets. In a highly competitive market, retailers cannot afford to be left behind.
However, launching and progressing AI within a business does not come without its challenges: getting buy-in from senior management (who often do not have a tech or data background), choosing the right partners and finding staff with the appropriate skill sets can all present significant hurdles. Another striking finding is that 94% of retailers are familiar with AI and want to invest in it, yet 91% of retailers also believe that it could be disruptive for their organization [3]. On the other hand, customers want everything their own way: the way they shop, what they want to explore, customization and personalization, the time they want to shop and the way they want to get their products. They want everything under their control, which leads to the need for change in retailing, and this is where Artificial Intelligence comes to the fore [4].
The best applications of AI work alongside other technologies and alongside people within retail businesses. That means seeing the deployment of AI not just as the development of isolated technology, but as part of a wider process of change. That in turn implies that the UK needs to develop a talent pool for AI with a wide spectrum of skills. AI helps brands and retailers to understand the consumer in a different manner with the help of smart devices, which collect data such as browsing cookies, facial expressions, euphemisms, preferences and lifestyle. These data can be synthesized to provide better recommendations and to make fast and better decisions [13].
4. Literature Review
To review the topic of artificial intelligence in retailing, a set of articles was searched in Scopus. The search terms were “Retail” and “Artificial intelligence”; within the topic, the further search terms were “Big Data” and “IoT”. “Computer science”, “Business management and security” and “Engineering” were the subject areas for the research. This resulted in 54 documents, of which 20 had not been cited. Of the remaining 34 documents, 20 were not found in the Scopus database and the remaining 14 were chosen for comprehensive review.
A variety of methods have been used to study this area. Research on the impact of artificial intelligence in retailing essentially falls into three types of method: quantitative, qualitative and mixed. These methods provide a way to describe the strategies used to assess the efficiency, reliability and operability of new techniques and devices in retail. Accordingly, we identified a few relevant articles on the role of artificial intelligence in retailing that used techniques such as the interpretivist approach, case studies, the mixed-method approach, the elastic net approach and sentiment analysis [15].
4.1 Review On Sentiment Analysis
In the analysis by Y. Li and H. Fleyeh [17], data are gathered from Twitter, a microblogging platform that has been widely used for sentiment analysis (“positive”, “negative” or “neutral”). The crawled Twitter data are tweets that include “IKEA”, posted in the investigated cities during the opening period of the respective IKEA store. The languages investigated in this work are English and Swedish. Sentiment analysis of English tweets is state of the art, while there are not many studies of Swedish sentiment analysis. For further investigation, the authors introduced the elastic net method to investigate public opinion. To explore general opinions about a recently opened IKEA store, local residents' tweets containing “IKEA” are crawled (the “IKEA dataset”). The sentiment of these tweets is then computed for a deeper understanding of the residents' opinions. The sentiment analysis framework for English and Swedish tweets is shown in Figure 1. For English tweets, lexicon-based approaches are used, in view of their capability and validity in many current applications. The authors describe one limitation of sentiment analysis here: the absence of a sentiment lexicon for Swedish confines the research to some extent, so the sentiment of Swedish tweets can only be predicted by machine learning based methods. The training data are sentiment-labelled Swedish tweets, where each word in a tweet is a feature fed into the classifiers, and the associated sentiment (“positive” or “negative”) is expected to be fitted accurately by alternative machine learning classification methods. The performance of these classifiers is evaluated and the best one is used to predict the sentiment of Swedish IKEA tweets [17].
Figure 1. The proposed model to predict IKEA sentiment [17]
In addition, the authors of “Shopping with a robotic companion”, F. Bertacchini, E. Bilotta and P. Pantano [2], used sentiment analysis in proposing the robotic model NAO in the field of artificial intelligence in retail. The NAO robot can collect real-time dialogues and use sentiment analysis to build knowledge of the customer's mood. This AI module studies the customer's feelings and mood and relates emotions to the public setting during the shopping interaction with the robotic assistant. Two approaches were followed, the a priori method and the emergent method. For the first, a list of tags, verbs, adjectives, adverbs and sentences for six specific emotions, manually segmented from samples of assistant-customer interaction in the real world, was applied, and the AI was trained on this sample. The emergent part was then carried out without the previous information, working on the same sample with just three basic classes: positive, neutral and negative. With this AI module, the authors were able to relate behavioural patterns and emotions to speech [2].
4.2 Review On Quantitative Research
R. Seranmadevi and A. Senthil Kumar [13], in the article “Experiencing the AI emergence in Indian retail – Early adopters’ approach”, adopted a quantitative approach to assess the results of AI software in the retail sector. A disproportionate multistage judgement sampling technique was used to gather data from 610 customers located in the capital regions of four Indian states, namely Tamilnadu, Kerala, Karnataka and Telangana. The data were gathered during the first quarter of 2018. The primary data were collected from customers in these four cities through a structured questionnaire administered online. The study is based on the early adopters' approach because AI technologies are largely available only in the capital districts of these states rather than in the interior regions. A descriptive research design is used to portray customers' intentions towards the rise of AI in the Indian retail sector.
The study is limited to the southern part of India and covers only the capital cities of four states. The sample is limited to 610 respondents. Appropriate statistical methods and tools were used in the analysis to arrive at the findings. The applications of AI technologies in online and offline retail are grouped separately, and their impact on building quality, CRM and big data was established. Further, their effect on retailers' intentions and customers' delight is studied through Structural Equation Modelling using AMOS software V.20, and it was tested with appropriate hypotheses.
4.3 Review On Interpretivist Approach
In the review paper by G. Santoro et al. on the subject of “Big data for business management in the retail industry”, the authors used an interpretivist approach. They justify this approach as follows: “This is regarded a fitting strategy for this sort of point since it covers another space and in light of the fact that there is restricted information available about how firms use and manage Big data in retails. In this regard, the case study is a suitable technique when researchers need to address inquiries concerning the “how” and “why” of a specific point.” [18]
Multiple case study methods are especially pertinent to the interpretivist approach. In detail, to accomplish the objectives of the paper, a qualitative methodology was adopted through a multiple case study approach involving firms operating in the retail business. The authors started from five organizations operating in the retail business. To avoid bias, the following
steps and procedures were used. First, a convenience selection was made among the best performing organizations in this sector, so as to provide insights derived from high-performing standards. They were then contacted to ask about their availability to take part in interviews and about their approach to Big data deployment. Five were available for interviews and were the keenest on this research and topic. Additionally, these were firms actually deploying Big data in their processes and activities. The following case selection criteria were chosen: the case is represented by a high performing retail firm, and the case presents actual use of big data platforms or initiatives to some extent. After applying these criteria, one firm was excluded as it had not deployed big data in the way the others had. The four selected firms had deployed big data in their activities at least one year before the interviews. Data analysis started with a consideration of the mission, vision, values and strategies of the firm, along with its general history. This information was triangulated with information obtained from the interviews, and the results were analysed. To fill in missing details, when necessary, follow-up communication was conducted with the firm by email or phone. In agreement with the organizations, it was decided to present the research as anonymous case studies to avoid any possible misinterpretation due to the public nature of its content. Firms wanted to stay anonymous to avoid revealing key decisions and viewpoints, particularly in light of the high concentration of this industry and the limited number of well-performing players in it. Finally, anonymous cases may allow more candid information to be obtained from respondents. The authors observed that all four retail firms use big data analytics deeply and consistently to improve processes. All of the interviewees stated that big data plays a key role in the retail business, particularly in the marketing and logistics functions, and that this importance will only increase.
5. Artificial Intelligence With Retailing
“Retailing as a function is central to all economies and a part of retailing value chain” [12]. Thus, AI will have a greater impact on retailing than the other digital technologies [3]. AI is transforming the way retailers interact with consumers and the way consumers choose a product. AI systems ranging from Siri on the Apple iPhone, Watson of IBM, Cortana of Microsoft and Alexa of Amazon to DeepMind of Alphabet vary in context and are widely adopted by retailers and consumers [3,6]. AI captures rich customer responses as customers shop online or in-store, since they can make choices among different products to find what suits them best, learn the physical location of products in stores and compare different brands.
These responses are then organized into systematic datasets based on visual, textual and facial expressions, on which predictive analysis is performed to forecast consumer demand and behavior. As described in [4], Beacons, Closed-circuit television (CCTV), Radio Frequency Identification (RFID) and Near Field Communication (NFC) are used to trace customers' in-store behavior, in the same way that clickstream behavior is used for online tracing. Of these, RFID is widely used in retail and logistics and is the base technology for IoT [10].
IoT is expected to be the trendsetting upcoming technology in almost every area of industry [16]. However, the acceptance and deployment of IoT is at an initial stage, which indicates a lack of adaptive strategies in industry [9]. Seranmadevi and Kumar [13] also support the view that AI works in the background to accumulate and store data and to process it into information about customers, which helps to build a big data structure and to strengthen Big data analytics, whose success depends on the right people training and analytical training. Big data is essential in the paradigm shift to the Industry 4.0 revolution for online companies' decision support. It can turn data into information and use it more effectively to offer personalized customer services both online and in-store [8].
6. Application Of Artificial Intelligence With Retail
Retailers, especially offline ones, are facing continuous change in customer behavior. Thus, they need to keep up to date by providing customers with low-cost alternatives to e-commerce at low management cost [2]. Researchers are therefore designing robots that can fulfil the desired functions of maintaining shops at low cost, performing both back-end and front-end operations, updating shelves in real time, etc. [2,5].
As reported by the International Federation of Robotics [7], the growth of robotics in the market for social, professional and domestic use has increased drastically. Meanwhile, the fiction of Artificial Intelligence and augmented reality is becoming reality [3]. It is widely accepted that Artificial Intelligence is a process of understanding human behavior and embedding that neurology into machines so that they act smarter and help humans through their own understanding [2,3,6]. Machine learning and reductionism are transforming the way machines interact with machines and with humans [2].
As described in Smart Shopping by IBM and by G. Molnar, bots are the software form of robots that carry out certain
programmed tasks and fall into two types: chatbots and conversational bots [2,11]. Chatbots are most commonly used by websites, where they assist with commonly asked questions, which saves time, enhances user experience and collects data. Apple's Siri, Amazon's Alexa and Echo, Microsoft's Cortana and IBM's Watson come under conversational bots, which are smart enough to understand the intent of questions and to learn the nature of your questions [1,5]. IoT is one of the significant technologies in the field of AI that is transforming retail entirely. IoT is based on a network of electronics, sensors and software that collects data and lets machines converse with machines with cognition and reasoning; it can also be defined as what emerges when the Internet combines with devices [8].
IoT provides retailers with a set of distinct data from which they can optimize their processes to achieve greater customer satisfaction. “Bosch Home Connect ovens, Samsung smart washers and dryers, Nest thermostats, Ring video doorbells, security of SimpliSafe, GPS and accelerator sensors on smartphones, and RFID products” are examples of IoT smart devices [6]. Moreover, many companies are trusting AI technology and turning science fiction into reality, such as augmented reality and virtual reality, which bring real-time experiences to users virtually (i.e. simulators, VR training sessions, flight simulators), driverless cars built using AI, and the use of drones to supply goods [6,14].
For instance, Amazon has shown interest in drone delivery; Tesco, Walmart, Target and others have started exploring deep learning on data, which can create lucrative offers by evaluating real-time customer behavior [15]; and McDonald's has installed automated kiosks to remove the need for cashiers and customer representatives.
S.No | Company Name | About the company | Technologies
1 | AMAZON | The online retail giant offers both consumer and business-oriented AI products and services, and many of its professional AI services are built on consumer products. Amazon Echo brings artificial intelligence into the home through the intelligent voice server, Alexa. | Big Data, IoT, AI, Machine Learning, AWS, Cloud
2 | IBM | The multinational technology company IBM has been active in AI since the 1950s. The company was involved in the birth of artificial intelligence and is still firmly committed today. With Watson, IBM has created a machine learning platform that can integrate AI into business processes, such as building a chatbot for customer support. Customers include Big Four auditor KPMG and Bradesco, one of Brazil's largest banks. | Cloud, big data and analytics, blockchain, IoT, machine learning, Linux and AI
3 | FREENOME | Freenome uses artificial intelligence to conduct innovative cancer screenings and diagnostic tests. Using non-invasive blood tests, the company's AI technology recognizes disease-associated patterns, providing earlier cancer detection and better treatment options. | Google Analytics, G Suite (formerly Google Apps for Work), AI and nginx
4 | AEYE | AEye builds the vision algorithms, software and hardware that ultimately become the eyes of autonomous vehicles. Its LiDAR technology focuses on the most important information in a vehicle's sightline such as people, other cars and animals, while putting less emphasis on things like the sky, buildings and surrounding vegetation. | Robotics, AI, Machine Learning
5 | Lobster | Lobster is an AI-powered platform that helps brands, advertisers and media outlets find and license user-generated social media content by scanning major social networks and several cloud storage providers for images and video, using AI-tagging and machine learning algorithms to identify the most relevant content. It then provides those images to clients for a fee. | Google Analytics, Vimeo, Google Tag Manager, Artificial Intelligence and algorithmic engineering
6 | SIEMENS | Siemens focuses on areas like energy, electrification, digitization, and automation, as well as resource-saving and energy efficient technologies, and is a leading provider of devices and systems for medical diagnosis, power generation, and transmission. | Robotics, AI
7 | AIBrain | AIBrain is an artificial intelligence company that builds AI solutions for smartphones and robotics applications. It has three products: AICoRE, the AI agent; iRSP, an intelligent robot software platform; and Futurable, a future simulation AI game where every character is a fully autonomous AI. The focus of their work is to develop artificial intelligence infused with the human skill set of problem solving, learning and memory. | Robotics, AI, Machine Learning
8 | CaseText | Casetext is an AI-powered legal search engine specializing in legal documents, with a database of more than 10 million statutes, cases, and regulations. A recent study comparing legal research platforms found that attorneys using Casetext's CARA A.I. finished their research 24.5% faster, required 4.4 times fewer searches to accomplish the same research task, and rated the cases they found to be 20.8% more relevant than those found on a legacy research tool. | Natural Language Processing, AI, Machine Learning
9 | Qualcomm | Like HiSilicon with its Kirin 980, Qualcomm is another chip manufacturer that is committed to artificial intelligence. AI plays a crucial role in the Snapdragon 855 mobile platform. The chip uses a signal processor for AI speech, audio and image functions. Qualcomm Snapdragons power some of the most popular smartphones on the market. If you're interested in AI in the smartphone, you should keep an eye on Qualcomm. | Artificial Intelligence
10 | NVIDIA CORPORATION | Nvidia Corporation builds graphics processing units and hardware to power various types of AI-enabled devices. The company's technology is used for everything from robots and self-driving vehicles to intelligent video analytics and smart factories. | Personal computer graphics, graphics processing unit (GPU) and artificial intelligence (AI)
7. Conclusion
This paper concludes that the online and offline worlds are converging, due to the change in the shopping, thinking, demanding and receiving behavior of consumers. Willingly or unwillingly, retailers have to adopt new standards of innovative technology to meet consumer demands, or face a decline in sales and revenues. Companies have started embracing Artificial Intelligence in their business processes, which is not only boosting their businesses but also giving them better suggestions based on the distinctive data captured through new smart devices, RFID, robots, chatbots, conversational bots, Big data, consumers' facial expressions and choices, IoT and many other smart and innovative technologies that support machine learning, deep learning, augmented intelligence and virtual reality. Yet this phase of AI is at a preliminary stage, which leaves a gap in experience and strategy for setting the direction in which AI should be implemented according to business needs.
The failure to initiate direct, personal customer relationships is not an option in this new landscape. Overall, the biggest advantage retailers can gain from AI is accurate, efficient analysis and proper utilization of all of the customer data at their disposal.
AI has altered, and will continue to alter, the retail industry. The next few years will see continued enhancements to both customer experience and operations. However, retailers aiming to fully take advantage of these technologies in order to keep pace with giants like Amazon and Walmart still have their work cut out for them.
References
[1] R. Arthur. Macy’s Teams With IBM Watson For AI-Powered Mobile Shopping Assistant. Forbes.com, 2016. [Online]. Available: https://www.forbes.com/sites/rachelarthur/2016/07/20/macys-teams-with-ibm-watson-for-ai-powered-mobile-shopping-assistant/#6db414277f41 [Accessed: 03-Apr-2019].
[2] F. Bertacchini, E. Bilotta, P. Pantano. Shopping with a robotic companion. Computers in Human Behavior, 2017, 77: 382-395. Available: 10.1016/j.chb.2017.02.064.
[3] J. Bowman. How Artificial Intelligence is transforming the retail conversation. The Store WPP in partnership with IBM, 2017.
[4] C. Chan, H. Lau, Y. Fan. IoT Data Acquisition in Fashion Retail Application: Fuzzy Logic Approach. In International Conference on Artificial Intelligence and Big Data, 2018.
[5] D. Grewal, S. Motyka, M. Levy. The Evolution and Future of Retailing and Retailing Education. Journal of Marketing Education, 2018, 40(1): 85-93. Available: 10.1177/0273475318755838.
[6] D. Grewal, A. Roggeveen, J. Nordfält. The Future of Retailing. Journal of Retailing, 2017, 93(1): 1-6. Available: 10.1016/j.jretai.2016.12.008.
[7] IFR. Service Robots. International federation of robotics, 2016. [Online]. Available: https://www.ifr.org/service-robots/statistics/. [Accessed: 07-Apr-2019].
[8] J. Gubbi, R. Buyya, S. Marusic, M. Palaniswami. Internet of Things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 2013, 29(7): 1645-1660. Available: 10.1016/j.future.2013.01.010.
[9] C. Hsu, C. Yeh. Understanding the factors affecting the adoption of the Internet of Things. Technology
REVIEW
Based On K-means Disease Diagnosis Research
Jiaqi Wu* Qingda Zhang Linlin Zhao
North China University of Science and Technology, Tangshan, 063210, China
Article history
Received: 14 January 2020
Accepted: 17 January 2020
Published Online: 31 March 2020
Keywords:
Disease Diagnosis
K-means
ICD
To diagnose a disease, modern medicine usually searches a disease database to find the type of disease that matches the case. Diagnosis is the first step in treatment, and the classification of diseases is the basis of diagnosis. Disease classification plays an extremely important role in the scientific management of medical records and the development of modern medicine, and is a bridge connecting modern medical science. The classification of diseases is therefore very necessary. Based on this, this article establishes a K-means model for disease diagnosis and combines it with the internationally unified disease type code (ICD) statistics table to classify the sample data set into infectious and parasitic diseases, tumors, diabetes and circulatory diseases. After training is completed, the diagnostic classification of the disease is realized.
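As a rough illustration of the kind of K-means clustering described above, the sketch below groups a handful of patient feature vectors into four clusters, matching the four disease categories named in the abstract. It is a minimal sketch under stated assumptions, not the authors' model: the function name `kmeans`, the two numeric features per patient, the sample values and the choice of one starting centroid per group are all hypothetical and chosen only so that the example is small and deterministic.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Cluster points (tuples of floats) around the given starting centroids."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster whose centroid is
        # nearest in Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Empty cluster: keep the previous centroid (an assumption).
                new_centroids.append(centroids[j])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: two made-up numeric indicators per patient,
# forming four well-separated groups.
patients = [
    (1.0, 1.0), (1.2, 0.8),
    (5.0, 5.0), (5.2, 4.9),
    (9.0, 1.0), (9.1, 1.2),
    (1.0, 9.0), (0.8, 9.2),
]
# Assumed initialization: one observation taken from each group.
init = [patients[0], patients[2], patients[4], patients[6]]

labels, centroids = kmeans(patients, init)
print(labels)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Passing the starting centroids explicitly keeps the run reproducible. With random initialization, K-means can settle in a local optimum, so in practice several restarts are commonly compared.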
1. Introduction
In traditional medical diagnosis and treatment, doctors first learn the basic situation of the patient in order to diagnose the disease, including basic physical data such as the patient's age, past medical history, and symptoms of onset. They then analyze the patient's condition based on previous experience, determine the patient's disease type, and treat the patient. In this paper, a K-means model is established to address the shortcomings of existing diagnosis, to improve the accuracy of doctors' diagnoses, and to analyze the patient's condition with the support of scientific algorithms.
2. Research Status
Some scholars have recognized the importance of the diagnosis and classification of diseases and have started a series of studies. Xianjing Hu [1] and others used the application and method of electronic medical records in the clinical disease classification system to construct a clinical diagnosis comparison table, and designed various coding systems based on existing clinical research. Yunsi Cen [2] built a digital medical management database by comparing the medical knowledge object classification systems at home and abroad, rebuilding the medical knowledge object center, digitizing the disease type and determining its value range. Jiayi Li [3] and others analyzed the main nursing problems of 41 inpatients with chronic obstructive pulmonary disease, found regularities in the patients' onset symptoms, and built an Omaha classification system to promote the development of clinical nursing. Xiangju Ouyang [4] and others, drawing on the rapid development of the Internet, proposed the construction of a national-level, Internet-assisted disease classification system that uses computers to retrieve disease codes. Juanjuan Cheng [5] collected and analyzed the data of 586 patients undergoing full digital mammography in outpatient clinics. Professional physicians classified breast
*Corresponding Author:
Jiaqi Wu,
North China University of Science and Technology, Tangshan, 063210, China;
Email: 1643360071@qq.com
lesions and proposed the establishment of a BI-RADS FFDM classification system. Accuracy and sensitivity.
3. Building the Model
K-means clustering is a method often used to automatically divide a data set into K groups. It is an unsupervised learning algorithm that originated from vector quantization, a method from signal processing. Its goal is to make the sum of the squared distances from each point to its corresponding cluster centroid as small as possible. Given a set of observations (x_1, x_2, ..., x_n), each of which is a d-dimensional real vector, K-means clustering aims to divide the n observations into k (k ≤ n) sets S = {S_1, S_2, ..., S_k} so as to minimize the within-cluster sum of squares:
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$
where μ_i is the average of the points in S_i. With this choice of μ_i, the K-means algorithm is guaranteed to converge, although only to a local optimum. The K-means algorithm executes the following steps:
Step 1: Randomly select k initial category centers a_1, a_2, …, a_k from the given data set;
Step 2: For each sample x_i, compute its distance to every category center and mark it as belonging to the category j whose center is nearest;
Since the K-means clustering algorithm is not suitable for processing discrete data, Euclidean distance, Manhattan distance, or Minkowski distance can be chosen as the similarity measure according to actual needs when calculating the distance between samples. Two samples are represented as x_i = (x_{i1}, x_{i2}, ..., x_{id}) and x_j = (x_{j1}, x_{j2}, ..., x_{jd}), the specific values of their d descriptive attributes. The smaller the distance between two samples, the more similar they are and the smaller the difference between them.
The Euclidean distance formula is as follows:
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{d} (x_{ik} - x_{jk})^2}$$
Manhattan distance is as follows:
$$d(x_i, x_j) = \sum_{k=1}^{d} |x_{ik} - x_{jk}|$$
Minkowski distance is as follows:
$$d(x_i, x_j) = \sqrt[p]{\sum_{k=1}^{d} |x_{ik} - x_{jk}|^p} \qquad (p = 1, 2, \cdots, \infty)$$
Step 3: Update each category center to the mean of the samples currently assigned to that category;
Step 4: Repeat steps 2 and 3 until a certain abort condition is reached.
In the K-means clustering algorithm, the Euclidean distance between points is the index used to measure their similarity: the closer two points are, the more similar they are, and each point is assigned only to the class of the center closest to it. The traditional K-means clustering algorithm first assigns the points to classes based on the centroids and then repeats this process to achieve classification [5]. Before running the K-means clustering algorithm, it is important to determine the number k of category centers. Common methods for determining k are the Silhouette Coefficient and the Calinski-Harabasz Index. The Calinski-Harabasz Index is relatively simple to calculate and gives a more practical value of k. Therefore, the Calinski-Harabasz Index was selected as the evaluation standard for this experiment.
The Calinski-Harabasz score s is calculated as:
$$s(k) = \frac{\operatorname{tr}(B_k)}{\operatorname{tr}(W_k)} \cdot \frac{m - k}{k - 1}$$
where m is the number of training samples and k is the number of categories; B_k is the covariance matrix between the categories; W_k is the covariance matrix of the data within the categories; and tr is the trace of a matrix. That is, the smaller the covariance of the data within the categories and the larger the covariance between the categories, the higher the Calinski-Harabasz score.
In this paper, the sample data set is obtained by consulting and collating relevant hospital data, and the data is preprocessed. The discrete values of the sample data set are removed, the missing values are filled, and the data set used for model testing is finally collated. The results are used as clustering attributes, and the K-means clustering method is used to cluster the data. The number of clusters is varied from 2 to 24 in a loop to obtain the corresponding Calinski-Harabasz score for each value, as shown in the figure
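The steps and formulas above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names (`minkowski`, `kmeans`, `calinski_harabasz`), the fixed random seed, the toy data, and the stopping rule (stop when no assignment changes) are all assumptions made for the example. The Calinski-Harabasz score is computed from the two traces only, using the fact that tr(B_k) equals the size-weighted squared distance of each center from the overall mean and tr(W_k) equals the within-cluster sum of squares.

```python
import random


def minkowski(x, y, p=2):
    """Minkowski distance; p=1 gives Manhattan and p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(data, k, p=2, max_iter=100, seed=0):
    """Steps 1 to 4: random centers, assign, update, repeat until stable."""
    rng = random.Random(seed)            # assumption: fixed seed for repeatability
    centers = rng.sample(data, k)        # Step 1
    labels = None
    for _ in range(max_iter):
        # Step 2: label every sample with the index of its nearest center.
        new_labels = [min(range(k), key=lambda j: minkowski(x, centers[j], p))
                      for x in data]
        if new_labels == labels:         # Step 4: abort when nothing changes
            break
        labels = new_labels
        # Step 3: move each center to the mean of its samples.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:                  # an emptied cluster keeps its old center
                centers[j] = mean(members)
    return centers, labels


def calinski_harabasz(data, centers, labels):
    """s(k) = tr(B_k) / tr(W_k) * (m - k) / (k - 1), via squared distances."""
    m, k = len(data), len(centers)
    overall = mean(data)
    tr_b = sum(labels.count(j) * minkowski(centers[j], overall) ** 2
               for j in range(k))
    tr_w = sum(minkowski(x, centers[lab]) ** 2 for x, lab in zip(data, labels))
    return (tr_b / tr_w) * (m - k) / (k - 1)


# Hypothetical toy data: two well-separated groups of three points.
toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]]
centers, labels = kmeans(toy, k=2)
score = calinski_harabasz(toy, centers, labels)
```

To select k in the way the text describes, one could call `kmeans` for every k from 2 to 24 and keep the value of k with the highest score.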
Category   ICD code range       Disease type                         Number   Percentage
0          [1,140)              Infectious and parasitic diseases    2699     2.7%
1          [140,240)            Tumor                                3353     3.4%
2          250                  Diabetes                             8568     8.6%
3          [390,460) or 785     Diseases of the circulatory system   29753    30.0%
4          [460,520) or 786     Respiratory                          14109    14.1%
5          [520,580) or 787     Digestive                            9297     9.3%
6          [580,630) or 788     Urogenital                           5026     5.1%
7          [710,740)            Musculoskeletal                      4826     4.9%
8          [800,1000)           Injury and poisoning                 6815     6.8%
9          V or E or else       Other                                15045    15.1%
It can be seen from the table that infectious and parasitic diseases account for 2.7%; tumor diseases for 3.4%; diabetes for 8.6%; circulatory diseases for 30.0%; respiratory diseases for 14.1%; digestive diseases for 9.3%; urogenital diseases for 5.1%; musculoskeletal and connective tissue diseases for 4.9%; injuries and poisoning for 6.8%; and other types of diseases for 15.1%.
The established K-means model is then tested and summarized. The operations above used the training set of the processed data set. The held-out test set data is now input into the model, and the results are shown in the figure below.
... classification problems. With the diseases divided into ten categories, each given with its range of diagnostic values and its share of the records, the model can better meet the needs of clinical diagnosis.
4. Model Evaluation
In this paper, the K-means algorithm is used to cluster the diagnosis types of disease. When constructing the disease grouping model, K-means clustering is performed on the data set and the diseases are grouped by combining the various ICD coding ranges. The advantage is that K-means is a classic algorithm for solving clustering problems: it is simple to operate and processes data quickly, and it remains scalable and efficient on large data sets. The algorithm works best when the resulting clusters are dense and the differences between clusters are obvious. However, K-means is very sensitive to noise and outlier data. Even a small amount of such data has a great impact on the averages calculated by the model.
5. Conclusion
The study of disease diagnosis classification not only meets the needs of hospital management, but can also be used for hospital medical, scientific research and teaching purposes. The correctness of disease diagnosis classification directly affects the evaluation of medical quality and the allocation of medical resources. The K-means model established in this paper has a certain guiding role in clinical medical applications, and its algorithm has achieved
great research achievements in various fields. Therefore, this model can be used not only for disease diagnosis but also for other unlabeled clustering problems.
References
[1] Xianjing Hu, Li Zhou, Mingyue Hu. Applied research on clinical disease classification system of electronic medical record[J]. World's latest medical information abstract, 2018, 18 (82): 205 + 207.
[2] Juanjuan Cheng. Application of BI-RADS FFDM classification system in the diagnosis of breast benign and malignant diseases[D]. Huazhong University of science and technology, 2014.
[3] Jiayi Li, Mei Wang, Honglu Duan, Xueqin Liu. Application of Omaha problem classification system in the assessment of inpatients with COPD[J]. Journal of nursing, 2013, 20 (08): 12-15.
[4] Yunsi Cen. Research on medical knowledge modeling for clinical diagnosis and treatment[D]. Tsinghua University, 2012.
[5] Juxiang Ouyang, Feixia Chen, Wenfu Liang, Fang Shen, Dongsheng Liu, Mingjian Hu. The establishment and application of Internet-assisted international disease classification system[J]. Chinese Journal of hospital management, 2006 (07): 485 + 494.
[6] Tiancai Deng, Ning Liao, Fan Zhang, Lifang Man, Fangfang Zhu, Junxuan Wang, Lang Li. Quality and efficiency analysis of domestic and international disease classification and coding[J]. Chinese medical record, 2017, 18 (01): 28-32.
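The grouping of ICD codes into the ten categories listed in the table of Section 3 of the disease-diagnosis article can be expressed as a small lookup function. The sketch below is an illustration under assumptions: the function name `icd_category` is hypothetical, codes are assumed to arrive as strings, diabetes is matched on the integer part 250, and the first range is treated as half-open, [1,140), so that it does not overlap the tumor range.

```python
def icd_category(code):
    """Map an ICD code string to the category index (0 to 9) of the table."""
    code = code.strip().upper()
    # Codes starting with V or E, and anything unparsable, fall into "other".
    if not code or code[0] in "VE":
        return 9
    try:
        value = int(float(code))         # "250.83" -> 250
    except (ValueError, OverflowError):
        return 9
    if 1 <= value < 140:                 # assumption: half-open upper bound
        return 0                         # infectious and parasitic diseases
    if 140 <= value < 240:
        return 1                         # tumor
    if value == 250:
        return 2                         # diabetes
    if 390 <= value < 460 or value == 785:
        return 3                         # circulatory system
    if 460 <= value < 520 or value == 786:
        return 4                         # respiratory
    if 520 <= value < 580 or value == 787:
        return 5                         # digestive
    if 580 <= value < 630 or value == 788:
        return 6                         # urogenital
    if 710 <= value < 740:
        return 7                         # musculoskeletal
    if 800 <= value < 1000:
        return 8                         # injury and poisoning
    return 9                             # everything else


# Hypothetical example codes, one for several of the categories.
examples = {c: icd_category(c)
            for c in ["8", "157", "250.83", "428", "786", "V27", "999", "244"]}
```

Counting the returned indices over a data set would give per-category record counts in the layout of the table.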
ARTICLE
Location Determination Of Optimal Emergency System For Hurricane Disaster Based On Mathematical Modeling
Shizhong Ma1* Shilong Zhu2 Yujie Jiang3
1. College of Electrical Engineering, North China University of Science and Technology, Tangshan, 063210, China
2. College of Science, North China University of Science and Technology, Tangshan, 063210, China
3. School of Economics, North China University of Science and Technology, Tangshan, 063210, China
Article history
Received: 14 January 2020
Accepted: 17 January 2020
Published Online: 31 March 2020
Keywords:
Emergency response system
Reserve sites model
Requirement analysis
This article first introduces the current research status of space optical communication, and gives a brief overview of the development and application prospects of space optical communication, explaining its important research significance. Then, the working principle of ATP in space optical communication system is studied, the mathematical model of ATP control system is established according to the actual needs, and the ATP control system design of space optical communication is designed. By selecting appropriate motors and gyroscopes as the actuators and detection elements of the system, substituting the actual parameters for simulation analysis, and correcting and verifying the results, some useful results are obtained. The simulation results show the rationality and effectiveness of the ATP design scheme.
*Corresponding Author:
Shizhong Ma,
College of Electrical Engineering, North China University of Science and Technology, No. 21 Bohai street, Tangshan, 063210, China;
Email: 1643360071@qq.com
1. Methodology
Reserve Sites Selection Model: In this section, fairness, the center allocation model, the median location model, and coverage are used to improve the model.
[Figure: Demand, Efficiency, Fairness (Emergency Medical Package)]
... packages dispatched by each reserve site does not exceed its reserve, the total number of emergency medical packages reserved by a reserve site equals its demand, and the time of dispatching emergency medical packages to each demand site does not exceed the prescribed rescue time, the optimization model of material distribution is established.
[Figure: High Efficiency, Weak Economy; Rescue Time, Total Cost]
... area;
(3) There are p reserve points by default;
(4) The weighted distance between the disaster area and the reserve point is the largest.
Specific models are as follows:
$$\min L$$
$$\text{s.t.}\quad
\begin{cases}
\sum_{j \in J} x_{ij} = 1 \\
x_{ij} - z_j \le 0 \\
\sum_{j \in J} z_j = p \\
L - \sum_{j \in J} r_i d_{ij} x_{ij} \ge 0 \\
x_{ij}, z_j \in \{0, 1\} \\
\forall i \in I,\ \forall j \in J
\end{cases}$$
Among them, x_ij refers to whether reserve point sup- ...
$$\sum_{j \in J} z_j = p, \qquad \forall i \in I,\ j \in J$$
Each index is interpreted in the same way as the central
location model based on disaster weight.
Maximum Coverage Model: The purpose of building the reserve points is to carry out rescue operations rationally and in a timely manner. Therefore, the reserve points should cover the disaster areas to the maximum extent, so that rescue work can start in time and materials can be dispatched easily, and the construction of reserve points should follow the principle of maximum coverage. The maximum coverage model does not require all affected areas to be covered. It maximizes the demand served in the affected areas under the premise of a limited number of reserve points, so as to maximize the number of people covered in affected areas. The constraints are:
(1) There are p reserve points by default;
(2) A disaster area i counts as covered only if at least one of the reserve points selected from the candidate reserve points can cover it.
Specific models are as follows:
$$\max z = \sum_{i \in I} r_i y_i$$
$$\text{s.t.}\quad
\begin{cases}
\sum_{j \in J} x_j = p \\
y_i \le \sum_{j \in N_i} x_j, & \forall i \in I \\
x_j \in \{0, 1\}, & \forall j \in J \\
y_i \in \{0, 1\}, & \forall i \in I
\end{cases}$$
Among them, the variable x_j = 1 when candidate reserve point j is selected, and x_j = 0 otherwise; the variable y_i = 1 when disaster area i is covered, and y_i = 0 otherwise.
The difference between this model and the previous two models is that each disaster area can be supported by multiple reserve points, which contradicts the requirement in those two models that each disaster area has only one reserve point. Therefore, priority should be taken into account when establishing the model, and the maximum coverage model is used as an improved model after solving the above two models.
Freight Container Location Model: According to the analysis of the above-mentioned central location model, considering the three factors of efficiency, fairness and demand, the model is improved by taking the minimum of the maximum distance from the reserve point to each disaster-stricken area and the minimum average distance from the reserve point to each disaster-stricken area as the objective function and incorporating the idea of maximum coverage. Firstly, the constraints are as follows:
(1) There are p reserve points by default;
(2) Each affected area has one and only one reserve point to support it;
(3) Maximum weighted distance between disaster-stricken areas and reserve points;
(4) All reserve points are selected within the candidate reserve points.
Combined with the central location model based on the degree of demand for emergency medical packages and the median location model based on the degree of demand for emergency medical packages, the following models are established:
$$\min W_1 = L$$
$$\min W_2 = \sum_{i \in I} \sum_{j \in J} r_i d_{ij} x_{ij}$$
$$\text{s.t.}\quad
\begin{cases}
\sum_{j \in J} x_{ij} = 1 \\
x_{ij} - z_j \le 0 \\
\sum_{j \in J} z_j = p \\
L - \sum_{j \in J} r_i d_{ij} x_{ij} \ge 0 \\
x_{ij}, z_j \in \{0, 1\} \\
\forall i \in I,\ \forall j \in J
\end{cases}$$
Considering the maximum demand of the disaster area, the results are re-modeled and the reserve point is found. For existing reserve points, using the idea of maximum coverage, the reserve points are scheduled to support the surrounding disaster areas, and the objective function is to support as many of the supportable areas as possible. The constraint is that the distance between the reserve point and the supported area is within a certain range, namely d_ij ≤ R, ∀i ∈ I, j ∈ J. In that case, the j-th reserve point supports the i-th disaster area. Here d_ij indicates the distance between disaster area i and reserve point j, and R represents the constrained radius.
2. Location of Reserve Points
In 2017, the worst hurricane to ever hit the United States territory of Puerto Rico left the island with severe damage
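For small instances, the location models of Section 1 can be checked by brute force. The sketch below is an illustration only, not the authors' solution method: the function names, the toy weights `r`, the distance matrix `d`, and the coverage radius are all hypothetical. It enumerates every choice of p reserve points, assigns each disaster area to its nearest selected point, and reports the center objective L (largest weighted distance) together with the median objective W2 (total weighted distance). A second function evaluates the maximum coverage objective, taking N_i to be the candidates within the radius R of area i, which is an assumption because the text does not define N_i.

```python
from itertools import combinations


def center_median(r, d, p):
    """Enumerate all p-subsets of candidates and return (L, W2, sites) with
    the smallest L, ties broken by W2. r[i] is the demand weight of area i
    and d[i][j] is the distance from area i to candidate j."""
    best = None
    for sites in combinations(range(len(d[0])), p):
        # Each area is served by exactly one site: its nearest selected one.
        weighted = [r[i] * min(d[i][j] for j in sites) for i in range(len(r))]
        candidate = (max(weighted), sum(weighted), sites)
        if best is None or candidate < best:
            best = candidate
    return best


def max_coverage(r, d, p, radius):
    """Maximize the total weight of areas with a selected site within radius."""
    best = None
    for sites in combinations(range(len(d[0])), p):
        covered = sum(r[i] for i in range(len(r))
                      if any(d[i][j] <= radius for j in sites))
        if best is None or covered > best[0]:
            best = (covered, sites)
    return best


# Hypothetical data: 4 disaster areas (rows) and 3 candidate points (columns).
r = [2, 1, 3, 1]
d = [[1, 4, 6],
     [2, 3, 5],
     [6, 2, 1],
     [5, 1, 3]]
L, W2, sites = center_median(r, d, p=2)
covered, cover_sites = max_coverage(r, d, p=1, radius=2)
```

Enumeration grows combinatorially with the number of candidates, so a realistic instance would be handed to an integer programming solver instead; the sketch is only meant to make the objectives and constraints concrete.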
Figure 3. The diagram of demand sites
In the figure above, the red triangle represents the location of the optimal reserve point. The blue square represents the location of the candidate reserve point. The black circle represents the location of five demand points. The dotted line represents the coverage area of the reserve point. As
Reference
[1] Gang Chen. Research on urgent relief quick response model and algorithm for post-disaster[D]. Southwest Jiaotong University, 2011.
[2] Jun Zhan. Build a model system of supply demand and distribution in the emergency system[D]. Nanjing University of Finance and Economics, 2011.
[3] Wagner H M. An integer linear-programming model for machine scheduling[J]. Naval Research Logistics Quarterly, 2010, 6(2): 131-140.
[4] Daskin M S, Coullard C R, Shen Z J M. An Inventory-Location Model: Formulation, Solution Algorithm and Computational Results[J]. Annals of Operations Research, 2002, 110(1-4): 83-106.
ARTICLE
Research and Application on Spark Clustering Algorithm in Campus
Big Data Analysis
Qing Hou* Guangjian Wang Xiaozheng Wang Jiaxi Xu Yang Xin
Nanjing Xiao Zhuang University, Jiangsu, Nanjing, 210017, China
Article history
Received: 14 January 2020
Accepted: 17 January 2020
Published Online: 31 March 2020
Keywords:
Spark
Clustering algorithm
Big data
Data analysis
Mllib
Big data analysis has penetrated into all fields of society and has brought about profound changes. However, there is relatively little research on using the big data of colleges and universities to support student management. Taking student card information as the research sample and scholarship evaluation as an example, this paper analyzes the data using Spark big data mining technology and the K-Means clustering algorithm. The analysis covers students' daily behavior from multiple dimensions, and it can prevent unreasonable scholarship evaluation caused by unfair factors such as plagiarism and the votes of teachers and students. At the same time, students' absenteeism, physical health and psychological status can be predicted in advance, which makes student management work more active, accurate and effective.
Chinese Library Classification: TP311 Document code: A
*Corresponding Author:
Qing Hou,
NanJing XiaoZhuang University, Jiangsu, Nanjing, 210017, China;
Email: 815422078@qq.com
Fund Project:
Nanjing Key Laboratory of Intelligent Information Processing Open Fund Project (No.19AIP05)
1. Introduction
By 2013, big data had penetrated into all fields of society and brought about profound changes [1]. Big data is a great change in thinking, a source of power for human beings to acquire new cognition and create new values, and a method for changing the market and innovating educational management. It is not only a technology, but also a value and methodology. By mining, analyzing and synthesizing the data, more valuable products and services can be obtained. In China, there are many colleges and universities with more than 10,000 students and teachers. For university management, a large number of information data will be generated through the Internet login and meal card consumption, such as student's information, course selection, school report card, book borrowing history, online time distribution, internal forum communication, MicroBlog and WeChat, etc. The existing huge information system in these universities has accumulated a lot of basic original data through years of operation. Carrying out in-depth analysis and application of these original data, strengthening the scientific management of the school based on overall analysis, and offering data support for the development decision of the school, has become an important issue and pioneering opportunity for Chinese universities. At present, data analysis and application in colleges and universities are mainly used to assist teaching management in many aspects such as
scientific research calculation, enrollment promotion, subject management, overall salary planning, and student information tracking. Data mining in student management is rare. In the management of college students, managers can know students' static data (such as grades, curriculum, and basic personal information), but it is difficult for them to grasp the dynamic data on students' behavior. If big data analysis on campus is applied to student behavior, such as consumption on three meals a day, class attendance, library access and reading, and daily consumption (bathing, school supermarket, printing, school hospital), then some students' learning conditions, physical health conditions and mental health conditions can be mined and analyzed. This offers forward-looking reference data for the management of college students and improves the accuracy and efficiency of management.
Big data analysis needs to rely on a well-performing data analysis platform. Running big data on a traditional high-performance single machine is no longer practical. Several big data analysis platforms, such as Hadoop, PureData and Exadata, have been launched in recent years, with the Hadoop platform being the most prominent and popular among users. However, with the deepening of application, Hadoop has exposed its limitations. This is mainly reflected in the following aspects: first, the operations are too limited, since only Map and Reduce operations are supported. Second, iterative computation is inefficient, especially in machine learning and graph computation. These issues are better addressed by the Spark framework, which joined the Apache Software Foundation in 2013. Spark is a parallel computing architecture that commonly runs on top of HDFS. The main idea is to reduce disk and network I/O overhead through a new job and data fault-tolerance approach whose core technology is the resilient distributed dataset (RDD). Unlike MapReduce, Spark is not limited to writing map and reduce operations. It offers users a more powerful in-memory computing model, enabling them to read data into the cluster's memory through programming, iterate over data sets in memory many times quickly, and support complex data mining algorithms and graph computing algorithms. At present, Spark has built its own complete big data processing ecosystem, including stream processing, graph technology, machine learning, and NoSQL query, and is a top-level Apache project. Although Spark requires a lot of memory and has been available for a relatively short time, with the gradual maturity of big data related technologies and industries, Spark technology has developed rapidly with unparalleled advantages after Hadoop and will become the next generation of cloud computing and big data core technology to replace Hadoop [2]. Therefore, it is a good choice to analyze campus big data with Spark related technologies.
This paper, mainly based on the Spark distributed platform, analyzes the daily behavior of university students from multiple angles by collecting their smart card data and applying a clustering algorithm. Providing more accurate and effective data for the evaluation of scholarships may benefit those who actively work hard and restrain the unfairness of plagiarism and fraudulent votes.
2. Valid Data Sources
The data source used in this research is the all-in-one card data of the authors' university in 2014. It mainly includes: canteen card swiping information (including breakfast, lunch and dinner), supermarket card swiping information, learning consumption card swiping information (printing and copying, book purchase, etc.), bathhouse card swiping information, and daily life consumption card swiping information (such as haircuts), totaling more than 20,000 records. It is essential to filter these data so that they relate to the target issue. Scholarships are generally assessed in two ways. One is an ordinary scholarship for students with good results, and the other is an inspirational scholarship for students with good results and poor families. For ordinary scholarships, it is unfair to simply look at the results or at the votes of teachers and students. For example, students may obtain better results by cheating. They may also get hold of the relevant examination questions, or they may not work hard at ordinary times and cram at the last minute. In addition to the conditions for ordinary scholarships, inspirational scholarships also require family difficulties. Student managers can only learn about family difficulties through the students' own descriptions and the certificates from relevant units that the students provide, and they cannot tell that such certificates are often easy to fake or to obtain through personal connections. Therefore, if students' daily behaviors are analyzed through all-in-one card data and the range of students who actively work hard in learning is identified, then the student management personnel can choose from this range according to learning achievements, which achieves a relatively fair result.
Generally speaking, scholarship recipients should start and finish classes on time, study hard and have good results. In terms of daily behavior, such students eat breakfast and lunch on time, spend a large proportion of their money on learning, and often go to and from the library. The card information shows the time of swiping the card when students go for breakfast, lunch and dinner. In this work, the average number of breakfasts and lunches in a month except weekends
is counted to indicate whether students attend classes on time, in which breakfast time is between 6:00 and 7:40, and lunch time is 11:40 to 12:40. Therefore, the behavior analysis of motivational scholarship candidates is set from four dimensions: monthly average consumption, on-time breakfast, on-time lunch and study consumption ratio. After screening, searching, removing and deleting invalid data, a total of 18,389 valid records are obtained, including: student number, total consumption, canteen consumption, study consumption, breakfast (6:00-7:40), lunch (11:40-12:40). The authors group students into different categories through multidimensional clustering: students who eat breakfast and lunch on time (they are not late or absent from school), students who often go to the library, and students who spend a lot of money on study while their overall consumption level is not too high. Such students can offer a reference basis for schools to evaluate inspirational scholarships. The evaluation standard of ordinary scholarships does not need to consider canteen consumption.
3. K-means Clustering Algorithm
3.1 Algorithmic Thinking
The basic idea of the K-means algorithm is to choose K cluster centers randomly in the initial state and divide the sample points to be classified into clusters according to the nearest neighbor principle. Then, the centroid of each cluster is recalculated as the average of its points to determine the new cluster center. This is iterated until the moving distance of the cluster centers is less than a given value. The K-means clustering algorithm is divided into the following three steps [3]:
(1) The first step is to find a clustering center for the points to be clustered.
(2) The second step is to calculate the distance from each point to the cluster centers and assign each point to the cluster nearest to it.
(3) The third step is to calculate the coordinate average of all points in each cluster and use the average as the new cluster center. Step 2 and step 3 should be repeated until the cluster centers no longer move over a large range or the number of iterations reaches the required limit.
The Spark MLlib K-means clustering model uses the K-means algorithm to calculate the clustering centers. MLlib implements the K-means clustering algorithm as follows: firstly, cluster centers are generated, either by randomly selecting sample points as initial center points or by using the K-means++ method (in its parallel variant, k-means||) to select better initial center points. Then the center points are calculated iteratively. The distributed implementation of the iterative calculation works as follows: firstly, the center of each sample is calculated; secondly, the sample values are summed and the number of samples is counted through an aggregate function; finally, the newest center points are obtained, and it is judged whether the centers have changed.
3.2 Algorithm Implementation
The K-means algorithm is one of the main algorithms in Spark MLlib. The Spark-based distributed K-means algorithm offers a good practical tool for big data clustering [4]. The source code implementation of the K-means algorithm is not repeated here. The following is the key code for clustering analysis of campus data based on the Spark MLlib K-means algorithm:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object Test1 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("Consume")
    val sc = new SparkContext(conf)
    Logger.getRootLogger.setLevel(Level.WARN)
    // val data = sc.textFile("/home/spark/file1/FLWING1_medical.csv")
    val data = sc.textFile("/input/NJXZC_FLWing.csv")
    // Column 0 is the student number; columns 1 to 5 are the consumption features.
    val data1 = data.map(d => d.split(","))
    val stu_no = data1.map(arr => arr(0))
    val consume_data = data1.map(arr => arr.slice(1, 6))
    val parseData = consume_data.map(consume =>
      Vectors.dense(consume.map(str => str.toDouble))).cache()
    val initMode = "k-means||" // MLlib's parallel variant of k-means++
    val numClusters = 10
    val model = new KMeans()
      .setInitializationMode(initMode)
      .setK(numClusters)
      .setMaxIterations(numClusters) // the iteration limit reuses numClusters, as in the original
      .run(parseData)
    // Print the cluster centers.
    for (c <- model.clusterCenters)
      println(c.toString)
    val a = parseData.collect
    val b = model.predict(parseData).collect
    // Count the number of students in each cluster.
    val c = model.predict(parseData).map((_, 2)).countByKey()
    c.foreach(println)
    // Within-set sum of squared errors.
    val wssse = model.computeCost(parseData)
    println("sse =" + wssse)
  }
}
4. Experiments and Data Analysis
4.1 Experimental Environment
The experimental environment is Spark distributed cluster environment. One Dell server T410, hard disk 1.4T
... analysis, all students with cluster number 1 spend a lot of money on study and eat breakfast and lunch on time, which makes them the best range to target for scholarship evaluation. At the same time, students whose canteen spending makes up a higher proportion of their consumption and whose study consumption is higher are also the best range to target for motivational scholarship candidates. Using the campus card data, we can also explore whether students have friends or are lonely, and pay attention to students' psychological health. Students' physical health status is mined from the all-in-one card swiping data of the campus hospital. The breakfast and lunch card-swiping times are used to predict the number of students who are frequently late and absent from school. As more and more data sources (such as educational administration data, library data, WeChat and MicroBlog forums, etc.) are opened to the research group, our research content and accuracy will be improved.
Figure 2. Partial consumption relationship diagram
5. Conclusion
In this paper, the all-in-one card information of school students is taken as the research sample. Big data mining technology based on the Spark platform is applied, combined with the K-Means clustering algorithm. This paper takes scholarship evaluation as an example to analyze big data. The analysis covers students' daily behavior from multiple dimensions, and it can prevent unreasonable scholarship evaluation caused by unfair factors such as plagiarism and fraudulent votes and can improve student management work. Further study can be extended to analyze useful data and be better prepared for things like truancy and the physical health and psychological status of students. All the efforts will provide more systematic and valuable management data for student management work, improve management efficiency and help avoid preventable problems in advance.
References
[1] Yihua Huang. Understanding Big Data[M]. China Machine Press, 2014.
[2] Meiling Huang. Spark MLlib Machine Learning: Algorithm, Source Code and Actual Combat Details[M]. Publishing House of Electronics Industry, 2016. (in Chinese)
[3] Aiwu Zhou, Dandan Cui, Yong Pan. An Optimization Initial Clustering Center of K-means Clustering Algorithm[J]. Microcomputer and Its Applications, 2011, 30(13): 1-3.
[4] Weizhong Zhao, Huifang Ma, Yanxiang Fu, et al. Research on Parallel K-means Algorithm Design Based on Hadoop Platform[J]. Computer Science, 2011(10): 166-168.
[5] Dean J, Ghemawat S. MapReduce: Simplified Data Processing on Large Clusters[J]. Communications of the ACM, 2008, 51(1): 107-113.
[6] Jianpei Zhang, Yue Yang, Jing Yang, et al. Algorithm for Initialization of K-Means Clustering Center Based on Optimized-Division[J]. Journal of System Simulation, 2009, 21(9): 2586-2589.
[7] The Apache Software Foundation. Apache Mahout: Scalable Machine Learning and Data Mining [EB/OL], 2014.
[8] F Wang, Z Liu. Optimization method of distributed K-means algorithm based on Spark. Computer Engineering and Design, 2019; 40(6): 1595-1600. DOI: 10.16208/j.issn1000-7024.2019.06.017
[9] Y Qu, W Deng, F Hu, et al. Algorithm for ordering points to identify clustering structure based on spark. Computer Science, 2018; 45(1): 97-102+107. DOI: 10.11896/j.issn.1002-137X.2018.01.015
[10] M Xu, C Yu, H Shen. Research on K-means algorithm of spark parallelization. Microelectronics & Computer, 2018, 35(5): 95-99.
[11] Liu P, Teng J, Zhang G, et al. Parallel K-means algorithm for massive texts on spark. The 2nd CCF Big Data Conference, 2014. (in Chinese). Available from: http://mahout.apache.org/
Ⅰ. Format
Ⅱ. Cover Letter
Ⅲ. Abstract
A general introduction to the research topic of the paper should be provided, along with a brief summary of its main
results and implications. Kindly ensure the abstract is self-contained and remains readable to a wider audience. The
abstract should also be kept to a maximum of 200 words.
Authors should also include 5-8 keywords after the abstract, separated by a semi-colon, avoiding the words already used
in the title of the article.
Abstract and keywords should be reflected as font size 14.
Ⅳ. Title
The title should not exceed 50 words. Authors are encouraged to keep their titles succinct and relevant.
Titles should be reflected as font size 26, and in bold type.
Ⅳ. Section Headings
Ⅴ. Introduction
The introduction should highlight the significance of the research conducted, in particular in relation to the current state of
research in the field. A clear research objective should be conveyed within a single sentence.
Ⅵ. Methodology/Methods
In this section, the methods used to obtain the results in the paper should be clearly elucidated, so that readers are
able to replicate the study in the future. Authors should ensure that any references made to other research or experiments
are clearly cited.
Ⅶ. Results
In this section, the results of the experiments conducted should be detailed. The results should not be discussed at length in
this section. Alternatively, Results and Discussion can be combined into a single section.
Ⅷ. Discussion
In this section, the results of the experiments conducted can be discussed in detail. Authors should discuss the direct and
indirect implications of their findings, and whether the results obtained reflect the current state of research in the field.
Applications of the research and suggestions for future research can also be discussed in
this section.
Ⅸ. Conclusion
This section offers closure for the paper. An effective conclusion sums up the principal findings of the paper
and their implications for further research.
Ⅹ. References
References should be included as a separate page from the main manuscript. Where a part of the manuscript
references a particular source, a superscript (i.e., [x]) should be included next to the referenced text.
[x] refers to the allocated number of the source under the Reference List (e.g., [1], [2], [3]).
In the References section, the corresponding source should be referenced as:
[x] Author(s). Article Title [Publication Type]. Journal Name, Vol. No., Issue No.: Page numbers. (DOI number)
J = Journal/Magazine
M = Monograph/Book
C = (Article) Collection
D = Dissertation/Thesis
P = Patent
S = Standards
N = Newspapers
R = Reports
Kindly note that the order of appearance of the referenced source should follow its order of appearance in the main
manuscript.
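As an illustration of the entry template above, the following minimal Python sketch assembles one reference-list entry and checks the publication type code against the listed letters. The function name `format_reference`, its parameters, and the example values are hypothetical, and the exact punctuation is an assumption read off the template line rather than a rule stated by the journal.

```python
# Publication type codes as listed in the guidelines above.
PUBLICATION_TYPES = {
    "J": "Journal/Magazine",
    "M": "Monograph/Book",
    "C": "(Article) Collection",
    "D": "Dissertation/Thesis",
    "P": "Patent",
    "S": "Standards",
    "N": "Newspapers",
    "R": "Reports",
}


def format_reference(number, authors, title, pub_type, journal,
                     volume, issue, pages, doi=None):
    """Render one entry following the template above.

    Hypothetical helper: the punctuation is inferred from the template line.
    """
    if pub_type not in PUBLICATION_TYPES:
        raise ValueError(f"unknown publication type code: {pub_type!r}")
    entry = (f"[{number}] {', '.join(authors)}. {title} [{pub_type}]. "
             f"{journal}, {volume}, {issue}: {pages}.")
    if doi:
        # The DOI is optional and goes in parentheses at the end.
        entry += f" ({doi})"
    return entry


# Hypothetical entry, for illustration only (not a real publication).
example = format_reference(
    1, ["A. Author", "B. Author"], "An Example Title", "J",
    "Example Journal", 2, 1, "1-7", doi="10.0000/example",
)
print(example)
```

Rejecting unknown type codes early keeps a mistyped letter from reaching the final reference list.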
Ⅺ. Graphs, Figures, Tables, and Equations
Each graph, figure, and table should be labelled directly below it and aligned to the center. Each data presentation type
should be labelled as Graph, Figure, or Table, and each type should be numbered in its own running order.
Equations should be aligned to the left and numbered in running order, with the number in parentheses (aligned
right).
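For authors who prepare their manuscript in LaTeX (an assumption, since the guidelines do not name a typesetting tool), the equation layout described above can be sketched with the standard `fleqn` class option, which left-aligns displayed equations while the running numbers stay in parentheses at the right margin. The two equations are placeholders.

```latex
\documentclass[fleqn]{article} % fleqn: left-align displayed equations
\begin{document}

\begin{equation}
  E = mc^{2}
\end{equation}

\begin{equation}
  a^{2} + b^{2} = c^{2}
\end{equation}

\end{document}
```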
Ⅻ. Others
Conflicts of interest, acknowledgements, and publication ethics should also be declared in the final version of the
manuscript. Corresponding instructions are provided under Cover Letter.
Journal of Computer Science Research
Aims and Scope
Journal of Computer Science Research is an international open-access, peer-reviewed journal specializing in computer
science. As technology progresses, computer science and technology have become an integral part of everyday
life for most people. As such, the journal aims to present innovative insights in the field of computer science.
The scope of the Journal of Computer Science Research includes, but is not limited to:
● Algorithms
● Computer graphics
● Programming and programming language
● Complex systems
● Computer science theories
E-mail: contact@bilpublishing.com
Website: www.bilpublishing.com
About the Publisher
Bilingual Publishing Co. (BPC) is an international publisher of online, open access and scholarly peer-reviewed
journals covering a wide range of academic disciplines including science, technology, medicine, engineering, education
and social science. Reflecting the latest research from a broad sweep of subjects, our content is accessible
worldwide, both in print and online.
BPC aims to provide analytics as well as a platform for information exchange and discussion that help organizations
and professionals advance society for the betterment of mankind. BPC hopes to be indexed by well-known
databases in order to expand its reach to the science community, and eventually grow to be a reputable publisher
recognized by scholars and researchers around the world.
Database Inclusion