1. INTRODUCTION
The field of health informatics (HI) aims to provide large-scale linkage among disparate
ideas. Healthcare datasets are typically incomplete and noisy; as a result, reading data
across linked datasets traditionally fails within the discipline of software engineering.
Machine learning (ML) is a rapidly maturing branch of computer science because it can
exploit data stored at large scale. Many ML tools can be used to analyze data and yield
knowledge that can improve the quality of work for both staff and doctors; for developers,
however, there is currently no established methodology that can be used. Within software
engineering, there has been a lack of approaches for evaluating which tasks are better
performed by automation and which require human involvement or human-in-the-loop
approaches [1]. Real-world big data presents many analysis challenges [2], including OLAP
over mass data, mass data protection, mass data surveying, and mass data dissemination.
Recently, a set of frameworks has been used to develop data analysis tools such as Win-
CASE [3] and SAM [4]. The market offers many data analysis tools that can discover
interesting patterns and hidden relationships to support decision makers [5]. BKMR, an R
package, provides a statistical approach for estimating the multivariable exposure-response
function of health effects [6].
Augmentor provides a Python image library for augmentation [7], while CareVis was
designed for, and used in, the visualization of medical treatment plans and patient data [8].
Other applications require a visual interface, such as COQUITO [9]. For health-care data
analytics, the widely known 3P tools [10] were used. Simple applications such as WEKA
provide a GUI for many machine learning algorithms [11], while Apache Spark provides a
cluster computing framework [12]; both are powerful systems that can be used in various
applications for solving problems with big data and machine learning [13]. Table 1
summarizes the main tools used for big data analytics with respect to the task. Software
engineering for machine learning applications (SEMLA) discusses the challenges, new
insights, and practical ideas regarding the engineering of ML and artificial intelligence
(AI) systems [14]. NSGA-II proposed algorithms for real-world applications involving more
than one objective function, enhancing performance in terms of both diversity and
convergence [15]. ML algorithms in clinical
genomics generally come in three main forms: supervised, unsupervised and semi-supervised
[16]. Interflow system requirement analysis (ISRA) has been used to determine the system
requirements [17].
Machine learning is a field of computer science that frequently uses statistical techniques
to enable computers to "learn" from saved datasets. Unsupervised learning, or data mining,
focuses more on exploratory data analysis and is known as learning supported by data
analytics. Patient laboratory-test queue management and wait-time prediction are a
challenging and complicated job: each patient may require different phase operations
(tasks), such as a check-up, various tests (e.g., a sugar-level test or blood test), X-rays,
or surgery, so each patient can involve from 0 to N different medical tests according to
their condition.
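This variable number of tasks per patient can be modelled directly in code. The sketch below is a minimal illustration in Python; the class and method names (`Task`, `Patient`, `total_wait_estimate`) and the duration figures are hypothetical, not part of the described system:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    duration_min: int  # estimated service time for this task

@dataclass
class Patient:
    patient_id: str
    tasks: list = field(default_factory=list)  # 0..N tasks per patient

    def total_wait_estimate(self):
        # Naive estimate: sum of the patient's own task durations.
        # A real wait-time predictor would also model queue lengths
        # and resource contention, as discussed above.
        return sum(t.duration_min for t in self.tasks)

p = Patient("P001", [Task("check-up", 15), Task("blood test", 10)])
print(p.total_wait_estimate())  # 25
```

A patient with no tasks simply yields an estimate of zero, which matches the "0 to N tasks" formulation.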
In this article, based on a grounded theory methodology [21], the researchers propose a
novel methodology, SEMLHI, for developing a framework by defining the research problem
and the methodology for developers. The SEMLHI framework includes a theoretical
framework to support research and design activities that incorporate existing knowledge.
The SEMLHI framework is composed of four components that help developers observe the
health application flow from the main module to submodules to run and validate specific
tasks; this enables multiple developers to work on different modules of the application
simultaneously. The SEMLHI framework supports a methodological approach to
conducting research on health informatics. It also provides a structure that presents a
common set of ML terminology to use, compare, measure, and design software systems in
the area of health. This creates a space in which SE and ML experts can work on a specific
methodological approach that enables health informatics software development teams to
integrate the ML model lifecycle. Our methodology is applicable both to current systems
and to the development of new systems that add an ML module to current systems. It covers
regular updates that add data to the system as well as irregular updates that add new
features, such as new versions of ICD diagnosis codes, minor model improvements and bug
fixes, new functionality required by the client, and new hardware or architectural
constraints.
1.1 Objectives
Healthcare datasets are typically incomplete and noisy; as a result, reading data across
linked datasets traditionally fails within the discipline of software engineering. Machine
learning (ML) is a rapidly maturing branch of computer science because it can exploit data
stored at large scale. Many ML tools can be used to analyse data and yield knowledge that
can improve the quality of work for both staff and doctors; for developers, however, there
is currently no established methodology that can be used. Within software engineering,
there has been a lack of approaches for evaluating which tasks are better performed by
automation and which require human involvement or human-in-the-loop approaches. Real-world
big data presents many analysis challenges, including OLAP over mass data, mass data
protection, mass data surveying, and mass data dissemination.
CHAPTER-2
2. LITERATURE REVIEW
Benefits highlighted in the reviewed literature include instant access and improved
productivity.
CHAPTER-3
3. SYSTEM ANALYSIS
Analysis is a logical process. The objective of this phase is to determine exactly what must
be done to solve the problem. Tools such as class diagrams, sequence diagrams, data flow
diagrams, and a data dictionary are used in developing a logical model of the system.
A software life cycle model (also termed a process model) is a pictorial and diagrammatic
representation of the software life cycle. A life cycle model represents all the activities
required to take a software product through its life cycle stages, and it captures the
order in which these activities are to be undertaken.
In short, a life cycle model maps the various activities performed on a software product
from its inception to retirement.
Different software development life cycle models specify and design the process followed
during the software development phase. These models are also called "software development
process models." Each process model follows a series of phases unique to its type to
ensure success in the steps of software development.
Waterfall Model
RAD Model
Spiral Model
Incremental Model
Iterative Model
Among all these models, the spiral model is one of the best.
Spiral Model
This SDLC model helps the team adopt elements of one or more process models, such as the
waterfall or incremental model. The spiral technique is a combination of rapid prototyping
and concurrency in design and development activities. Each cycle in the spiral begins with
the identification of objectives for that cycle.
All projects are feasible when provided with unlimited resources and infinite time;
unfortunately, the development of a computer-based system or product is more likely to be
plagued by a scarcity of resources and difficult delivery dates. It is both necessary and
prudent to evaluate the feasibility of a project at the earliest possible time. Months or
years of effort, thousands or millions of dollars, and untold professional embarrassment
can be averted if an ill-conceived system is recognized early in the definition phase.
Feasibility and risk analysis are related in many ways: if project risk is great, the
feasibility of producing quality software is reduced. During product engineering, however,
we concentrate our attention on four primary areas of interest.
This application is going to be used in an Internet environment, the World Wide Web (WWW),
so it is necessary to use a technology capable of providing networking facilities to the
application. The application is also able to work in a distributed environment. The GUI is
developed using HTML to capture information from the customer; HTML is used to display
content in the browser, which communicates over the TCP/IP protocol stack, and HTML is
interpreted by the browser rather than compiled.
The economic issues that usually arise during the economic feasibility stage are whether
the system will be used if it is developed and implemented, and whether the financial
benefits equal or exceed the costs. The cost of developing the project includes the cost of
conducting a full system investigation and the cost of hardware and software for the class
of system being considered; the benefits come in the form of reduced costs or fewer costly
errors.
In our application, the front end is developed as a GUI, so it is very easy for the customer
to enter the necessary information. However, the customer must have some knowledge of
using web applications before using our application.
Functional requirements describe what the system should do; here they are described in
terms of the design of input and output below.
The input design is the link between the information system and the user. It comprises
developing the specifications and procedures for data preparation and the steps necessary
to put transaction data into a usable form for processing; this can be achieved either by
instructing the computer to read data from a written or printed document or by having
people key the data directly into the system.
Non-functional requirements are constraints that must be adhered to during development.
They limit the resources that can be used and set bounds on aspects of the software's
quality.
Five members can complete the project in 2–4 months if they work full-time on it.
3.4 Modules
3.4.1 Health Informatics Data
To predict any disease, we need to build machine learning models using datasets, and these
datasets often contain missing, null, and non-numeric data; this kind of data can degrade
ML prediction accuracy. To overcome this problem, the author applies preprocessing to the
health-care data to remove all missing and null values and then converts the non-numeric
data to numeric data using the Python sklearn preprocessing classes. The dataset may also
contain unnecessary columns or attributes; to remove these attributes, the author applies a
dimensionality reduction algorithm called PCA (principal component analysis), which removes
unnecessary attributes from the dataset and keeps only the important attributes necessary
to make correct predictions.
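As a rough illustration of the imputation and encoding steps described above, the following pure-Python sketch replaces missing numeric values with the column mean and label-encodes non-numeric columns. In the actual module, the sklearn preprocessing classes (and `PCA` for dimensionality reduction) would do this at scale; the `preprocess` helper and its toy rows here are illustrative assumptions only:

```python
def preprocess(rows):
    """Impute missing numeric values with the column mean and
    label-encode non-numeric columns (a hand-rolled stand-in for
    sklearn's preprocessing utilities)."""
    cols = list(zip(*rows))
    out_cols = []
    for col in cols:
        numeric = [v for v in col if isinstance(v, (int, float))]
        non_null = sum(1 for v in col if v is not None)
        if numeric and len(numeric) == non_null:
            # numeric column: fill None with the mean of observed values
            mean = sum(numeric) / len(numeric)
            out_cols.append([v if v is not None else mean for v in col])
        else:
            # non-numeric column: map each distinct label to an integer code
            labels = {}
            out_cols.append([labels.setdefault("missing" if v is None else v,
                                               len(labels)) for v in col])
    return [list(r) for r in zip(*out_cols)]

rows = [[1.0, "M"], [None, "F"], [3.0, "M"]]
print(preprocess(rows))  # [[1.0, 0], [2.0, 1], [3.0, 0]]
```

After this step every cell is numeric, which is the precondition the ML algorithms in the next module require.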
3.4.2 ML Algorithms:
In this module, the author uses various machine learning algorithms, such as Linear SVC,
Multinomial Naïve Bayes, Random Forest, Logistic Regression, and KNN. Each algorithm
trains itself on the available datasets to generate a trained model, and this trained model
is then applied to new test data to perform prediction. Using the above algorithms, we can
make the machine learn and perform predictions without any human support.
Once the above models are built, we can apply new test data to them to predict whether a
patient's lab reports are positive or negative.
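The train-then-predict workflow shared by the listed algorithms can be sketched as follows. To keep the example self-contained, a toy nearest-centroid classifier stands in for the sklearn models; it exposes the same `fit`/`predict` interface, and the two-feature "lab result" data is assumed for illustration:

```python
import math

class NearestCentroid:
    """Tiny stand-in classifier: predicts the class whose per-class
    feature centroid is closest to the query point."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            pts = [x for x, l in zip(X, y) if l == label]
            self.centroids[label] = [sum(c) / len(pts) for c in zip(*pts)]
        return self

    def predict(self, X):
        return [min(self.centroids,
                    key=lambda l: math.dist(x, self.centroids[l]))
                for x in X]

def accuracy(model, X_train, y_train, X_test, y_test):
    # fit on the training split, then score predictions on held-out data
    preds = model.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)

# toy data: "positive" reports cluster near (1,1), "negative" near (0,0)
X = [[1.1, 0.9], [0.9, 1.2], [0.1, 0.0], [0.0, 0.2]]
y = ["positive", "positive", "negative", "negative"]
print(accuracy(NearestCentroid(), X, y,
               [[1.0, 1.0], [0.1, 0.1]], ["positive", "negative"]))  # 1.0
```

Any of the real algorithms named above could be dropped into `accuracy` unchanged, since sklearn estimators follow the same `fit`/`predict` convention.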
3.4.4 Software:
This module is used by developers to check the reliability of the above modules by applying
software quality checks, unit testing, and software verification.
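A minimal example of the kind of unit testing this module performs might look like the following, using Python's standard `unittest` module. The `clean_value` helper is a hypothetical stand-in for one preprocessing step, not a function from the system itself:

```python
import unittest

def clean_value(v, mean):
    """Replace a missing lab value with the column mean
    (hypothetical helper standing in for one preprocessing step)."""
    return mean if v is None else v

class TestCleaning(unittest.TestCase):
    def test_missing_replaced(self):
        self.assertEqual(clean_value(None, 4.2), 4.2)

    def test_present_kept(self):
        self.assertEqual(clean_value(7.0, 4.2), 7.0)

if __name__ == "__main__":
    # run the suite explicitly so the script can continue afterwards
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCleaning)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    print("ok" if result.wasSuccessful() else "failed")
```

Each module's behaviour gets its own test case, so a regression in preprocessing or prediction is caught before the modules are combined.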
In this tutorial, we highlight an explanation based on representation bias. The naive Bayes
classifier is a linear classifier, like linear discriminant analysis, logistic regression,
and the linear SVM (support vector machine); the difference lies in the method of
estimating the parameters of the classifier (the learning bias).
While the naive Bayes classifier is widely used in the research world, it is not widespread
among practitioners who want to obtain usable results. On the one hand, researchers find it
very easy to program and implement, its parameters are easy to estimate, learning is very
fast even on very large databases, and its accuracy is reasonably good in comparison with
other approaches. On the other hand, final users do not obtain a model that is easy to
interpret and deploy, and they do not understand the interest of such a technique.
Thus, we introduce a new presentation of the results of the learning process. The
classifier becomes easier to understand, and its deployment is also made easier. In the
first part of this tutorial, we present some theoretical aspects of the naive Bayes
classifier. Then, we implement the approach on a dataset with Tanagra. We compare the
results obtained (the parameters of the model) with those obtained with other linear
approaches such as logistic regression, linear discriminant analysis, and the linear SVM.
We note that the results are highly consistent, which largely explains the good performance
of the method in comparison with the others.
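The linearity claim above can be made concrete: in a multinomial naive Bayes classifier, a document's score for a class is the class's log prior plus a sum of per-word log likelihoods weighted by word counts, i.e. a linear function of the feature counts. The sketch below uses toy symptom data invented for illustration:

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes with Laplace smoothing.
    The resulting score is linear in the word counts, which is
    exactly the representation-bias point made above."""
    vocab = {w for d in docs for w in d}
    priors, loglik = {}, {}
    for c in set(labels):
        cdocs = [d for d, l in zip(docs, labels) if l == c]
        priors[c] = math.log(len(cdocs) / len(docs))
        counts = Counter(w for d in cdocs for w in d)
        total = sum(counts.values())
        loglik[c] = {w: math.log((counts[w] + alpha) /
                                 (total + alpha * len(vocab)))
                     for w in vocab}
    return priors, loglik

def predict(doc, priors, loglik):
    # linear score: log prior + sum of per-word log likelihoods
    def score(c):
        return priors[c] + sum(loglik[c].get(w, 0.0) for w in doc)
    return max(priors, key=score)

docs = [["fever", "cough"], ["fever", "rash"], ["healthy", "checkup"]]
labels = ["flu", "flu", "well"]
priors, loglik = train_multinomial_nb(docs, labels)
print(predict(["fever", "cough"], priors, loglik))  # flu
```

Reading off `priors[c]` and `loglik[c]` gives exactly the intercept and weights of a linear decision function, which is the easier-to-interpret presentation the tutorial argues for.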
Example: suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification, we can use the
KNN algorithm, since it works on a similarity measure: our KNN model finds the features of
the new image that are most similar to those of the cat and dog images and, based on the
most similar features, assigns it to either the cat or the dog category.
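The cat/dog example can be sketched in a few lines. The feature names below (ear pointiness, snout length) are invented purely for illustration; any numeric feature vectors extracted from the images would do:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """k-nearest-neighbours by Euclidean distance over feature vectors."""
    # sort labelled examples by distance to the query, keep the k closest
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # majority vote among the k nearest labels
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# toy features: (ear_pointiness, snout_length) -- assumed, illustrative
train = [((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"),
         ((0.2, 0.9), "dog"), ((0.3, 0.8), "dog")]
print(knn_predict(train, (0.85, 0.25)))  # cat
```

Because KNN stores the training data and defers all work to prediction time, it needs no training phase at all, which is why it suits this kind of similarity-based identification.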
The first algorithm for random decision forests was created in 1995 by Tin Kam Ho[1] using
the random subspace method, which, in Ho's formulation, is a way to implement the
"stochastic discrimination" approach to classification proposed by Eugene Kleinberg.
An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who
registered "Random Forests" as a trademark in 2006 (as of 2019, owned by Minitab, Inc.).
The extension combines Breiman's "bagging" idea with the random selection of features,
introduced first by Ho [1] and later independently by Amit and Geman [13], in order to
construct a collection of decision trees with controlled variance.
Random forests are frequently used as "black-box" models in businesses, as they generate
reasonable predictions across a wide range of data while requiring little configuration.
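The two ingredients named above, bagging and random feature selection, can be demonstrated with a deliberately tiny forest of one-level trees (decision stumps). This is a sketch of the idea on invented toy data, not a production random forest:

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """One-level tree restricted to a random feature subset
    (the 'random selection of features' ingredient)."""
    best = None
    for f in features:
        for t in {x[f] for x in X}:
            left = [l for x, l in zip(X, y) if x[f] <= t]
            right = [l for x, l in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            lp, rp = majority(left), majority(right)
            err = sum((lp if x[f] <= t else rp) != l for x, l in zip(X, y))
            if best is None or err < best[0]:
                best = (err, f, t, lp, rp)
    if best is None:                    # degenerate one-class sample
        return ("leaf", majority(y))
    return ("split",) + best[1:]

def stump_predict(stump, x):
    if stump[0] == "leaf":
        return stump[1]
    _, f, t, lp, rp = stump
    return lp if x[f] <= t else rp

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    k = max(1, int(m ** 0.5))           # features considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap (bagging)
        feats = rng.sample(range(m), k)
        forest.append(fit_stump([X[i] for i in idx],
                                [y[i] for i in idx], feats))
    return forest

def forest_predict(forest, x):
    # aggregate by majority vote, which is what controls the variance
    return majority([stump_predict(s, x) for s in forest])

X = [(0.0, 0.1), (0.1, 0.2), (0.2, 0.0), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y = ["neg", "neg", "neg", "pos", "pos", "pos"]
forest = fit_forest(X, y)
print(forest_predict(forest, (1.0, 0.95)))  # pos
```

Each tree sees a different bootstrap sample and a different feature subset, so individual trees are weak and decorrelated while the vote across them is stable, which is the controlled-variance property described above.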
UML Introduction
The Unified Modelling Language allows the software engineer to express an analysis model
using a modelling notation governed by a set of syntactic, semantic, and pragmatic rules. A
UML system is represented using five different views that describe the system from
distinctly different perspectives.
1) UML analysis modelling, which focuses on the user model and structural model views
of the system.
2) UML design modelling, which focuses on the behavioural modelling, implementation
modelling, and environmental model views.
GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modeling language so that they can
develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks,
patterns, and components.
7. Integrate best practices.
System design aspects
Once the analysis stage is completed, the next stage is to determine in broad outline form
how the problem might be solved. During system design, we are beginning to move from the
logical to physical level. System design involves architectural and detailed design of the
system. Architectural design involves identifying software components, decomposing them
into processing modules and conceptual data structures, and specifying the interconnections
among components.
Two kinds of approaches are available:
Top-down approach
Bottom-up approach
Design of the code
Since information systems projects are designed with space, time, and cost savings in mind,
coding methods are used in which conditions, words, and ideas are abbreviated to control
errors and speed the entire process. The purpose of a code is to facilitate the
identification and retrieval of information. A code is an ordered collection of symbols
designed to provide unique identification of an entity or an attribute.
Design of input
Design of input involves the following decisions.
Input data
Input medium
The way data should be arranged or coded
Validation needed to detect errors and the steps to follow when an error occurs.
The input controls provide ways to ensure that only authorized users access the system,
guarantee valid transactions, validate the data for accuracy, and determine whether any
necessary data has been omitted. The primary input medium chosen is the display; screens
have been developed for the input of data using HTML.
Design of output
Design of output involves the following decisions.
Information to present
Output medium
Output layout
Output of this system is given in easily understandable, user-friendly manner, Layout of the
output is decided through the discussions with the different users.
Design of control
The system should offer the means of detecting and handling errors. The input controls
described under the design of input provide ways to do this.
As the strategic value of software increases for many companies, the industry looks for
techniques to automate the production of software and to improve quality and reduce cost and
time-to-market. These techniques include component technology, visual programming,
patterns and frameworks.
Businesses also seek techniques to manage the complexity of systems as they increase in
scope and scale. In particular, they recognize the need to solve recurring architectural
problems, such as physical distribution, concurrency, replication, security, load balancing and
fault tolerance.
The Unified Modelling Language (UML) was designed to respond to these needs. Simply put,
systems design refers to the process of defining the architecture, components, modules,
interfaces, and data for a system to satisfy specified requirements, and this can be
expressed easily through UML diagrams. In this project, four basic UML diagrams are
explained:
Class Diagram
Use Case Diagram
Sequence Diagram
Activity Diagram
3.7.1 Class Diagram
In software engineering, a class diagram in the Unified Modelling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, and the relationships between the classes. This is one of the most
important of the diagrams in development. The diagram breaks the class into three layers.
The relationships are drawn between the classes. Developers use the Class Diagram to
develop the classes. Analyses use it to show the details of the system.
In software engineering, a use case diagram in the Unified Modelling Language (UML) is a
type of behavioural diagram defined by and created from a Use-case analysis. Its purpose is
to present a graphical overview of the functionality provided by a system in terms of actors,
their goals (represented as use cases), and any dependencies between those use cases.
The main purpose of a use case diagram is to show what system functions are performed for
which actor. Roles of the actors in the system can be depicted. Use cases are used during
requirements elicitation and analysis to represent the functionality of the system. Use cases
focus on the behaviour of the system from the external point of view.
(Sequence diagram: the user starts the session, interacts with the GrapeCS-ML database, and
ends the session.)
The process flows in the system are captured in the activity diagram. Similar to a state
diagram, an activity diagram also consists of activities, actions, transitions, initial and final
states, and guard conditions.
A collaboration diagram groups together the interactions between different objects. The
interactions are listed as numbered interactions that help to trace the sequence of the
interactions. The collaboration diagram helps to identify all the possible interactions that each
object has with other objects.
Programming : Python
RAM : 4 GB