
DATA POISON DETECTION SCHEMES FOR DISTRIBUTED MACHINE LEARNING

GUIDED BY:
MRS. S. MOUNASRI

PRESENTED BY:
R. Gowthami - 18D21A05M5
V. Ujwala - 18D21A05N4
K. Pallavi - 17D21A0547
ABSTRACT:-

• Software defect detection identifies the defective modules in a software system, which helps improve
the quality of that system. Machine learning techniques are useful for this task: modules are classified
as defective or non-defective. Decision Tree and Logistic Regression algorithms are implemented to
classify the defect data set. The data sets are taken from the PROMISE data repository, and the accuracy
of each algorithm is computed. The algorithms are implemented using the Weka tool as well as Python
IDLE, and a comparative analysis of the results is presented.
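
As a minimal sketch of the classification step described above, the following trains a logistic regression classifier by batch gradient descent on a single feature. The feature values, labels, learning rate and epoch count are illustrative assumptions, not data from the PROMISE repository, and the code is written from scratch rather than taken from Weka or any library. Only Logistic Regression is shown; the Decision Tree is not.

```python
import math

def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit weight w and bias b for one feature by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: one scaled complexity value per module, 1 = defective.
complexity = [0.1, 0.2, 0.3, 0.4, 1.5, 1.8, 2.2, 3.1]
defective = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(complexity, defective)
predictions = [int(sigmoid(w * x + b) >= 0.5) for x in complexity]
accuracy = sum(p == y for p, y in zip(predictions, defective)) / len(defective)
print(f"w={w:.2f} b={b:.2f} accuracy={accuracy:.2f}")
```

The accuracy here is measured on the training data only; a real comparison would hold out a separate test set.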
EXISTING MODEL OF SYSTEM

• Using the WEKA tool with the following classifiers:
  • Naïve Bayes
  • SVC
DISADVANTAGES:-

• For this problem statement, the results obtained with the existing algorithms are not accurate.
PROPOSED MODEL OF SYSTEM:-

•Random forest
•Neural network
ADVANTAGES:-

• Less time is consumed.
• More accurate results are expected.
SYSTEM REQUIREMENTS

SOFTWARE REQUIREMENTS:
1. Operating System: Windows 7, 8, 10
2. Server-side Script: Python 3.6+
3. IDE: PyCharm

HARDWARE REQUIREMENTS:
1. RAM: 4 GB
2. Processor: Intel Core i3
SYSTEM DESIGN:

• Design is the first step in the development phase. It applies techniques and principles to define a
device, a process or a system in sufficient detail.
• Software architectural design:
• Our system follows two phases of distributed machine learning (DML):
1. Basic DML:
In basic DML, the centre server dispatches learning tasks to the distributed machines and aggregates
their learning results.
2. Semi DML:
In semi DML, the centre server additionally runs a poison detection scheme on top of its basic-DML duties.
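
The dispatch-and-aggregate flow of basic DML can be sketched as below. This is an illustration under stated assumptions, not the system's actual implementation: the function names `split`, `worker_learn` and `aggregate` are hypothetical, threads stand in for the distributed machines, and the "learning task" is simply a per-class feature mean.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def split(dataset, n_workers):
    """Centre server: cut the dataset into n_workers interleaved sub-datasets."""
    return [dataset[i::n_workers] for i in range(n_workers)]

def worker_learn(sub_dataset):
    """Distributed machine: learn (mean, count) of the feature for each label."""
    by_label = {}
    for feature, label in sub_dataset:
        by_label.setdefault(label, []).append(feature)
    return {label: (mean(vals), len(vals)) for label, vals in by_label.items()}

def aggregate(results):
    """Centre server: merge worker results with a count-weighted average."""
    totals = {}
    for result in results:
        for label, (avg, count) in result.items():
            weighted_sum, n = totals.get(label, (0.0, 0))
            totals[label] = (weighted_sum + avg * count, n + count)
    return {label: s / n for label, (s, n) in totals.items()}

# Hypothetical (feature, label) pairs held by the centre server.
dataset = [(1.0, 0), (2.0, 0), (3.0, 0), (9.0, 1), (10.0, 1), (11.0, 1)]

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(worker_learn, split(dataset, 3)))

model = aggregate(results)
print(model)
```

The count-weighted merge means the aggregated result equals what a single machine would compute on the whole dataset, whatever the split.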
SYSTEM ARCHITECTURE:-
SYSTEM REQUIREMENTS:-

• Hardware:-
• RAM: 4 GB
• Software:-
• Operating system: Windows 7, 8, 10 (32-bit or 64-bit)
• Anaconda Navigator, Jupyter Notebook, Python language.
MODULES:-

• Data collection
• Data preprocessing
• Feature extraction
• Evaluation metrics
DATA COLLECTION:-

• The data used in this work is the JM1 software data set. This step is concerned with selecting the
subset of all available data that you will be working with. ML problems start with data, preferably lots
of data (examples or observations) for which you already know the target answer. Data for which you
already know the target answer is called labelled data.
DATA PREPROCESSING

• Organize your selected data by formatting, cleaning and sampling from it.

• Three common data pre-processing steps are:

• Formatting

• Cleaning

• Sampling
• Formatting: The data you have selected may not be in a format that is suitable for you to work with. The data may be in a
relational database and you would like it in a flat file, or the data may be in a proprietary file format and you would like it
in a relational database or a text file.

• Cleaning: Cleaning data is the removal or fixing of missing data.

• Sampling: The data set may be unbalanced. To balance it, we consider the whole data set and apply sampling.
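
The three steps above can be sketched with the Python standard library. The CSV text, the column names (`loc`, `complexity`, `defects`) and the choice of random undersampling are assumptions made for illustration; they are not taken from the JM1 file or from the project code.

```python
import csv
import io
import random

# Hypothetical raw data with two missing values.
RAW = """loc,complexity,defects
10,2,false
250,31,true
,4,false
42,5,false
300,,true
120,18,true
15,1,false
33,3,false
"""

def format_rows(text):
    """Formatting: parse CSV text into dicts with float features and a bool label."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "loc": float(rec["loc"]) if rec["loc"] else None,
            "complexity": float(rec["complexity"]) if rec["complexity"] else None,
            "defects": rec["defects"] == "true",
        })
    return rows

def clean(rows):
    """Cleaning: drop every row that has a missing value."""
    return [r for r in rows if None not in r.values()]

def balance(rows, seed=0):
    """Sampling: randomly undersample the larger class to the smaller class size."""
    rng = random.Random(seed)
    positive = [r for r in rows if r["defects"]]
    negative = [r for r in rows if not r["defects"]]
    small, large = sorted([positive, negative], key=len)
    return small + rng.sample(large, len(small))

rows = format_rows(RAW)
cleaned = clean(rows)
balanced = balance(cleaned)
print(len(rows), len(cleaned), len(balanced))
```

Dropping incomplete rows is the simplest form of cleaning; filling in missing values is the alternative when data is scarce.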
FEATURE EXTRACTION

• In this section we first count the values of the explained variable (the target variable), which gives
the prediction of whether a module is defective or not. Next, we separate the numeric features from the
categorical features. Then we plot the categorical features and observe their influence on the target
variable “prediction”.
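
A minimal sketch of these steps follows, using only the standard library. The records, the column names and the `language` categorical feature are hypothetical, and a per-category rate table stands in for the plots.

```python
from collections import Counter

TARGET = "defects"

# Hypothetical labelled records; real data would be loaded from a file.
records = [
    {"loc": 10, "complexity": 2, "language": "C", "defects": False},
    {"loc": 250, "complexity": 31, "language": "C", "defects": True},
    {"loc": 42, "complexity": 5, "language": "C++", "defects": False},
    {"loc": 120, "complexity": 18, "language": "C", "defects": True},
    {"loc": 15, "complexity": 1, "language": "C++", "defects": False},
    {"loc": 33, "complexity": 3, "language": "C++", "defects": True},
]

def is_numeric(value):
    """True for int/float values; bool is excluded although it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

# Step 1: count the values of the target variable.
target_counts = Counter(r[TARGET] for r in records)

# Step 2: separate numeric features from categorical features.
first = records[0]
numeric = [k for k in first if k != TARGET and is_numeric(first[k])]
categorical = [k for k in first if k != TARGET and not is_numeric(first[k])]

def rate_by_category(rows, feature):
    """Step 3: share of positive targets within each category of a feature."""
    totals, positives = Counter(), Counter()
    for r in rows:
        totals[r[feature]] += 1
        positives[r[feature]] += r[TARGET]
    return {cat: positives[cat] / totals[cat] for cat in totals}

rates = rate_by_category(records, "language")
print(target_counts, numeric, categorical, rates)
```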
EVALUATION METRICS

• Precision

• Recall

• F1 score

• Prediction accuracy
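
For reference, the four metrics listed above can be computed from the confusion counts as in the sketch below. The label vectors are made-up examples, and the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F1 and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Made-up ground truth and predictions for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Here there are 3 true positives, 2 false positives, 1 false negative and 2 true negatives, so precision is 3/5, recall is 3/4 and accuracy is 5/8.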


USECASE DIAGRAM:-
CLASS DIAGRAM:-
SEQUENCE DIAGRAM:-
ACTIVITY DIAGRAM:-
ALGORITHM EXPLANATION:

• We use the support vector machine (SVM) algorithm to learn a data set generated with the machine
learning library scikit-learn. The model trained by the SVM is compared with mathematical results
obtained on another platform, Wolfram Mathematica.
• We present an improved data poison detection scheme and the optimal resource
allocation in the semi-DML scenario. Simulation results show that in the basic-
DML scenario, the proposed scheme can increase the model accuracy by up to
20% for the support vector machine.
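
To illustrate the general idea of detecting a poisoned sub-dataset, here is a simplified sketch. It is not the scheme proposed in this work. It assumes the centre server holds a small trusted data set, trains one model per sub-dataset, and flags any sub-dataset whose model scores below a threshold on the trusted data. A nearest-centroid classifier is swapped in for the SVM to keep the example dependency-free, and the synthetic data, the `THRESHOLD` value and all function names are assumptions.

```python
import random
from statistics import mean

def make_data(n, rng):
    """Synthetic 1-D data: class 0 centred at 0.0, class 1 centred at 5.0."""
    return [(rng.gauss(5.0 * (i % 2), 1.0), i % 2) for i in range(n)]

def flip_labels(data):
    """Simulate a label-flipping poison attack on a sub-dataset."""
    return [(x, 1 - y) for x, y in data]

def train_centroids(data):
    """Nearest-centroid model: the mean feature value of each class."""
    return {label: mean(x for x, y in data if y == label) for label in (0, 1)}

def accuracy(centroids, data):
    """Fraction of samples whose nearest centroid matches their label."""
    hits = sum(
        min(centroids, key=lambda c: abs(x - centroids[c])) == y
        for x, y in data
    )
    return hits / len(data)

rng = random.Random(42)
trusted = make_data(40, rng)                       # held by the centre server
sub_datasets = [make_data(40, rng) for _ in range(4)]
sub_datasets[2] = flip_labels(sub_datasets[2])     # worker 2 receives poisoned data

THRESHOLD = 0.8  # assumed cut-off; a real system would have to tune this
scores = [accuracy(train_centroids(d), trusted) for d in sub_datasets]
flagged = [i for i, score in enumerate(scores) if score < THRESHOLD]
print(scores, flagged)
```

Full label flipping is an extreme case that is easy to catch; partial or targeted poisoning would need a more sensitive check.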
CONCLUSION

• This work discussed data poison detection schemes in both the basic-DML and semi-DML
scenarios. Simulation results show that in the basic-DML scenario, the
proposed scheme can increase the model accuracy by up to 20% for the support
vector machine. In the semi-DML scenario, the improved
data poison detection scheme with optimal resource allocation can decrease
wasted resources by 20%-100% compared to the other two schemes without
optimal resource allocation.
