
JSPM’S

Bhivarabai Sawant Institute of Technology & Research


Pune-412207

Department of Computer Engineering

Academic Year 2019-20

Mini Project Report

On
“Classification on Final
settlements in labor
negotiations in Canadian
industry”

Submitted by:

Sonali Patil(BEA_05)
Reshma Jadhav(BEA_06)
Austin Wilson(BEA_22)

Under the guidance of


Prof. Nilufar Zaman

Subject : Laboratory Practice II


DEPARTMENT OF COMPUTER ENGINEERING

BHIVARABAI SAWANT INSTITUTE OF TECHNOLOGY & RESEARCH

WAGHOLI, PUNE – 412 207

CERTIFICATE

This is to certify that Sonali Patil (BEA_05), Reshma Jadhav (BEA_06) and Austin Wilson (BEA_22)
have submitted their project report on "Classification on Final Settlements in Labor Negotiations in Canadian Industry" under my guidance and supervision.
The work has been done to my satisfaction during the academic year 2019-2020 under Savitribai Phule Pune University
guidelines.

Date:

Place: BSIOTR, PUNE.

Prof. Nilufar Zaman                                Dr. Gayatri Bhandari

Project Guide                                      H.O.D.
ACKNOWLEDGEMENT

It is a great pleasure and an immense satisfaction to express my deepest sense of gratitude and
thanks to everyone who has directly or indirectly helped me in completing my project work
successfully.

I express my gratitude towards our guide Prof. Nilufar Zaman and Dr. G.M. Bhandari, Head of the
Department of Computer Engineering, Bhivarabai Sawant Institute of Technology and Research,
Wagholi, Pune, who guided and encouraged me in completing the project work in the scheduled time. I
would also like to thank our Principal for allowing us to pursue my project in this institute.

Sonali Patil(BEA_05)
Reshma Jadhav(BEA_06)
Austin Wilson(BEA_22)
INDEX

Sr. No.  Chapter

         CERTIFICATE
         ACKNOWLEDGEMENT
         ABSTRACT
         INDEX
1.       INTRODUCTION
2.       OBJECTIVES AND SCOPE
3.       PROPOSED SYSTEM METHODOLOGY
4.       RESULTS AND DISCUSSIONS
5.       ADVANTAGES AND DISADVANTAGES
6.       CONCLUSION
7.       REFERENCES
ABSTRACT

In this era of data science, where R and Python rule the roost, let us take a look at another
data science tool called Weka. Weka has been around for quite a while and was developed internally at
the University of Waikato for research purposes. What makes Weka worth trying is its easy learning curve.
Weka is free software under the GNU General Public License. Weka supports several standard data
mining tasks, more specifically data preprocessing, clustering, classification, regression, visualization,
and feature selection. A number of organizations monitor and analyse current Canadian economic
conditions, including the Bank of Canada, the economic research units of major Canadian banks, and think
tanks such as the Conference Board of Canada. Large unions (e.g., CUPE or Unifor) and trade and industry
associations may also have economists on staff; however, their analysis may only be available to their own
membership. Weka also provides access to deep learning. It is not capable of multi-relational data
mining, but there is separate software for converting a collection of linked database tables into a single table
that is suitable for processing with Weka. Another important area currently not covered by the
algorithms included in the Weka distribution is sequence modeling.

CHAPTER 1

INTRODUCTION
Weka is a collection of machine learning algorithms for data mining tasks. Weka supports
several standard functions, such as:
- data preprocessing
- clustering
- classification
- regression
- visualization

In law, a settlement is a resolution between disputing parties about a legal case, reached either before or after
court action begins. The term "settlement" also has other meanings in the context of law. Structured settlements provide
for future periodic payments, instead of a one-time cash payment. A settlement, as well as dealing with the dispute
between the parties, is a contract between those parties, and is one possible (and common) result when parties sue (or
contemplate doing so) each other in civil proceedings. The plaintiffs and defendants identified in the lawsuit can end the
dispute between themselves without a trial.
The contract is based upon the bargain that a party forgoes its ability to sue (if it has not sued already), or to continue
with the claim (if the plaintiff has sued), in return for the certainty written into the settlement. The courts will enforce the
settlement. If it is breached, the party in default could be sued for breach of that contract. In some jurisdictions, the party
in default could also face the original action being restored.
The settlement of the lawsuit defines legal requirements of the parties and is often put in force by an order of the court
after a joint stipulation by the parties. In other situations (as where the claims have been satisfied by the payment of a
certain sum of money), the plaintiff and defendant can simply file a notice that the case has been dismissed.

The majority of cases are decided by a settlement. Both sides (regardless of relative monetary resources) often have a
strong incentive to settle to avoid the costs (such as legal fees, finding expert witnesses, etc.), the time and the stress
associated with a trial, particularly where a trial by jury is available. Generally, one side or the other will make
a settlement offer early in litigation. The parties may hold (and indeed, the court may require) a settlement conference, at
which they attempt to reach such a settlement.
CHAPTER 2
OBJECTIVES AND SCOPE

The main objectives of WEKA are to:

1. Make machine learning (ML) techniques generally available.
2. Apply them to practical problems such as labor negotiation.
3. Analyze the dataset well and display the results graphically.
4. Generate a clearer view of labor negotiation.

AREA OR SCOPE OF INVESTIGATION:

This project requires investigation in the following areas:

1. Canadian industries.
2. The best fit to predict an accurate analysis.
CHAPTER 3
PROPOSED SYSTEM
METHODOLOGY

WEKA INTERFACE:
Data Mining Classification

1. Decision Tree(D-Tree)

Decision Tree is a classification method which yields output as a flowchart-like tree structure. The result from
a D-Tree is highly interpretable, but the outcome must be represented as categorical data. In this work, the D-Tree
algorithm called "J48" is applied to classify the data.

2. Naive Bayes

Naïve Bayes is a simple probabilistic classifier based on Bayes' theorem, with a naive assumption of
independence between every pair of features.

3. LMT(Logistic Model Trees)

Logistic model trees are based on the earlier idea of a model tree: a decision tree that has linear
regression models at its leaves to provide a piecewise linear regression model (where ordinary decision
trees with constants at their leaves would produce a piecewise constant model). In the logistic variant,
this algorithm is used to produce an LR model at every node in the tree; the node is then split using
the C4.5 criterion.
Classification Steps:

1. Dataset in ARFF format
2. Preprocess the dataset
3. Select the dataset
4. Choose a classifier
5. Train the model
6. Evaluate the model on the test dataset
7. Find the performance criteria
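The steps above can be sketched outside Weka as well. Here is a minimal Python illustration (not Weka itself) using a made-up toy sample of good/bad settlement labels and a trivial majority-class baseline, so the train/evaluate shape of the pipeline is visible:

```python
from collections import Counter

def train_majority(labels):
    # "Train" a baseline model: remember the most common class label.
    return Counter(labels).most_common(1)[0][0]

def evaluate(model_class, test_labels):
    # Accuracy of always predicting the majority class.
    correct = sum(1 for y in test_labels if y == model_class)
    return correct / len(test_labels)

# Toy good/bad labels (illustrative only, not the real dataset)
train = ['good', 'good', 'bad', 'good']
test = ['good', 'bad', 'good']

model = train_majority(train)        # -> 'good'
print(model, evaluate(model, test))  # good 0.666...
```

Any real classifier (J48, Naive Bayes, Logistic) slots into the same train/evaluate loop.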


Cross-validation:
Cross-validation is a technique used to assess how the results of a statistical analysis
generalize to an independent data set. Cross-validation is also known as rotation estimation.
Cross-validation is an extension of the data split: the purpose of k-fold cross-validation is
to test how well the model, trained on given data, performs on unseen data. For this purpose we
use k-fold cross-validation to make sure that each and every data point comes to the test set at least once.
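The fold bookkeeping behind k-fold cross-validation can be sketched in plain Python (an illustration of the splitting idea, not Weka's internal code):

```python
def k_fold_indices(n, k):
    # Yield (train_idx, test_idx) pairs so each point is tested exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

# 57 instances (the size of the labor dataset) split into 10 folds
folds = list(k_fold_indices(57, 10))
print([len(test) for _, test in folds])  # [6, 6, 6, 6, 6, 6, 6, 5, 5, 5]
```

In each round the model is trained on the 9 training folds and evaluated on the held-out fold; the reported cross-validation accuracy is the aggregate over all 10 rounds.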

Discretize:
Data discretization converts a large number of data values into a smaller number of bins, so that data evaluation
and data management become much easier. One reason to discretize continuous features is to improve the
signal-to-noise ratio: fitting a model to bins reduces the impact that small fluctuations in the data have on the
model, and often small fluctuations are just noise. Each bin "smooths" out the fluctuations in its section of
the data.
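Equal-width binning, the simplest form of discretization, can be sketched as follows (an illustrative Python equivalent of what the Weka discretize filter does, applied to some wage-increase values from the dataset):

```python
def discretize(values, bins=3):
    # Equal-width binning: map each numeric value to a bin index 0..bins-1.
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]

# First-year wage increases binned into low / medium / high
wages = [2.0, 2.5, 3.5, 4.5, 5.7, 7.0]
print(discretize(wages))  # [0, 0, 0, 1, 2, 2]
```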

Normalize:
In statistics and applications of statistics, normalization can have a range of meanings. In the simplest
cases, normalization of ratings means adjusting values measured on different scales to a notionally
common scale, often prior to averaging.
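The common [0, 1] rescaling (min-max normalization, which is what Weka's normalize filter applies by default to numeric attributes) can be sketched as:

```python
def normalize(values):
    # Min-max normalization: rescale values linearly onto [0, 1].
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Weekly working hours from the dataset rescaled to a common scale
hours = [27, 35, 38, 40]
print(normalize(hours))  # [0.0, 0.615..., 0.846..., 1.0]
```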

J48:
C4.5 (J48) is an algorithm used to generate a decision tree, developed by Ross Quinlan.
C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used
for classification, and for this reason C4.5 is often referred to as a statistical classifier.
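C4.5 chooses splits by how much they reduce the entropy of the class distribution. The core calculation can be sketched in Python (a simplified illustration with a hypothetical toy split, not J48's full gain-ratio machinery):

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a class distribution, in bits.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    # Entropy reduction from splitting on one categorical attribute.
    n = len(labels)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in split.values())
    return entropy(labels) - remainder

# Toy rows: does the pension attribute separate good from bad settlements?
rows = [{'pension': 'none'}, {'pension': 'none'},
        {'pension': 'empl_contr'}, {'pension': 'empl_contr'}]
labels = ['bad', 'bad', 'good', 'good']
print(information_gain(rows, labels, 'pension'))  # 1.0 (a perfect split)
```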

Naïve Bayes:
Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem. It is not a
single algorithm but a family of algorithms that all share a common principle: every pair of
features being classified is independent of each other.
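The principle can be sketched for categorical attributes in a few lines of Python (a simplified illustration with add-one smoothing and a hypothetical toy sample, not Weka's NaiveBayes implementation):

```python
from collections import Counter, defaultdict

def nb_train(rows, labels):
    # Count class priors and per-class attribute-value frequencies.
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (class, attr) -> value counts
    for row, y in zip(rows, labels):
        for attr, val in row.items():
            counts[(y, attr)][val] += 1
    return priors, counts

def nb_predict(row, priors, counts):
    # Pick the class maximizing P(class) * prod P(value|class), add-one smoothed.
    best, best_score = None, -1.0
    total = sum(priors.values())
    for y, ny in priors.items():
        score = ny / total
        for attr, val in row.items():
            score *= (counts[(y, attr)][val] + 1) / (ny + 2)
        if score > best_score:
            best, best_score = y, score
    return best

rows = [{'pension': 'none'}, {'pension': 'none'}, {'pension': 'empl_contr'}]
labels = ['bad', 'bad', 'good']
priors, counts = nb_train(rows, labels)
print(nb_predict({'pension': 'empl_contr'}, priors, counts))  # 'good'
```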

LMT:
Logistic model trees are based on the earlier idea of a model tree: a decision tree that has linear regression
models at its leaves to provide a piecewise linear regression model (where ordinary decision trees with
constants at their leaves would produce a piecewise constant model). In the logistic variant, this
algorithm is used to produce an LR model at every node in the tree.
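The logistic regression building block at each node fits a sigmoid to the data. A minimal one-feature sketch in Python (plain gradient descent on a hypothetical toy sample; LMT itself uses the more elaborate LogitBoost procedure):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=3000):
    # One-feature logistic regression fitted by stochastic gradient descent.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Toy data: higher first-year wage increase -> 'good' (1) settlement
xs = [2.0, 2.5, 4.5, 5.7]
ys = [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
print(sigmoid(w * 5.0 + b) > 0.5)  # True: a 5.0 increase is classed 'good'
```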
Dataset Used:
@relation 'labor-neg-data'
@attribute 'duration' real
@attribute 'wage-increase-first-year' real
@attribute 'wage-increase-second-year' real
@attribute 'wage-increase-third-year' real
@attribute 'cost-of-living-adjustment' {'none','tcf','tc'}
@attribute 'working-hours' real
@attribute 'pension' {'none','ret_allw','empl_contr'}
@attribute 'standby-pay' real
@attribute 'shift-differential' real
@attribute 'education-allowance' {'yes','no'}
@attribute 'statutory-holidays' real
@attribute 'vacation' {'below_average','average','generous'}
@attribute 'longterm-disability-assistance' {'yes','no'}
@attribute 'contribution-to-dental-plan' {'none','half','full'}
@attribute 'bereavement-assistance' {'yes','no'}
@attribute 'contribution-to-health-plan' {'none','half','full'}
@attribute 'class' {'bad','good'}
@data
1,5,?,?,?,40,?,?,2,?,11,'average',?,?,'yes',?,'good'
2,4.5,5.8,?,?,35,'ret_allw',?,?,'yes',11,'below_average',?,'full',?,'full','good'
?,?,?,?,?,38,'empl_contr',?,5,?,11,'generous','yes','half','yes','half','good'
3,3.7,4,5,'tc',?,?,?,?,'yes',?,?,?,?,'yes',?,'good'
3,4.5,4.5,5,?,40,?,?,?,?,12,'average',?,'half','yes','half','good'
2,2,2.5,?,?,35,?,?,6,'yes',12,'average',?,?,?,?,'good'
3,4,5,5,'tc',?,'empl_contr',?,?,?,12,'generous','yes','none','yes','half','good'
3,6.9,4.8,2.3,?,40,?,?,3,?,12,'below_average',?,?,?,?,'good'
2,3,7,?,?,38,?,12,25,'yes',11,'below_average','yes','half','yes',?,'good'
1,5.7,?,?,'none',40,'empl_contr',?,4,?,11,'generous','yes','full',?,?,'good'
3,3.5,4,4.6,'none',36,?,?,3,?,13,'generous',?,?,'yes','full','good'
2,6.4,6.4,?,?,38,?,?,4,?,15,?,?,'full',?,?,'good'
2,3.5,4,?,'none',40,?,?,2,'no',10,'below_average','no','half',?,'half','bad'
3,3.5,4,5.1,'tcf',37,?,?,4,?,13,'generous',?,'full','yes','full','good'
1,3,?,?,'none',36,?,?,10,'no',11,'generous',?,?,?,?,'good'
2,4.5,4,?,'none',37,'empl_contr',?,?,?,11,'average',?,'full','yes',?,'good'
1,2.8,?,?,?,35,?,?,2,?,12,'below_average',?,?,?,?,'good'
1,2.1,?,?,'tc',40,'ret_allw',2,3,'no',9,'below_average','yes','half',?,'none','bad'
1,2,?,?,'none',38,'none',?,?,'yes',11,'average','no','none','no','none','bad'
2,4,5,?,'tcf',35,?,13,5,?,15,'generous',?,?,?,?,'good'
2,4.3,4.4,?,?,38,?,?,4,?,12,'generous',?,'full',?,'full','good'
2,2.5,3,?,?,40,'none',?,?,?,11,'below_average',?,?,?,?,'bad'
3,3.5,4,4.6,'tcf',27,?,?,?,?,?,?,?,?,?,?,'good'
2,4.5,4,?,?,40,?,?,4,?,10,'generous',?,'half',?,'full','good'
1,6,?,?,?,38,?,8,3,?,9,'generous',?,?,?,?,'good'
3,2,2,2,'none',40,'none',?,?,?,10,'below_average',?,'half','yes','full','bad'
2,4.5,4.5,?,'tcf',?,?,?,?,'yes',10,'below_average','yes','none',?,'half','good'
2,3,3,?,'none',33,?,?,?,'yes',12,'generous',?,?,'yes','full','good'
2,5,4,?,'none',37,?,?,5,'no',11,'below_average','yes','full','yes','full','good'
3,2,2.5,?,?,35,'none',?,?,?,10,'average',?,?,'yes','full','bad'
3,4.5,4.5,5,'none',40,?,?,?,'no',11,'average',?,'half',?,?,'good'
3,3,2,2.5,'tc',40,'none',?,5,'no',10,'below_average','yes','half','yes','full','bad'
2,2.5,2.5,?,?,38,'empl_contr',?,?,?,10,'average',?,?,?,?,'bad'
2,4,5,?,'none',40,'none',?,3,'no',10,'below_average','no','none',?,'none','bad'
3,2,2.5,2.1,'tc',40,'none',2,1,'no',10,'below_average','no','half','yes','full','bad'
2,2,2,?,'none',40,'none',?,?,'no',11,'average','yes','none','yes','full','bad'
1,2,?,?,'tc',40,'ret_allw',4,0,'no',11,'generous','no','none','no','none','bad'
1,2.8,?,?,'none',38,'empl_contr',2,3,'no',9,'below_average','yes','half',?,'none','bad'
3,2,2.5,2,?,37,'empl_contr',?,?,?,10,'average',?,?,'yes','none','bad'
2,4.5,4,?,'none',40,?,?,4,?,12,'average','yes','full','yes','half','good'
1,4,?,?,'none',?,'none',?,?,'yes',11,'average','no','none','no','none','bad'
2,2,3,?,'none',38,'empl_contr',?,?,'yes',12,'generous','yes','none','yes','full','bad'
2,2.5,2.5,?,'tc',39,'empl_contr',?,?,?,12,'average',?,?,'yes',?,'bad'
2,2.5,3,?,'tcf',40,'none',?,?,?,11,'below_average',?,?,'yes',?,'bad'
2,4,4,?,'none',40,'none',?,3,?,10,'below_average','no','none',?,'none','bad'
2,4.5,4,?,?,40,?,?,2,'no',10,'below_average','no','half',?,'half','bad'
2,4.5,4,?,'none',40,?,?,5,?,11,'average',?,'full','yes','full','good'
2,4.6,4.6,?,'tcf',38,?,?,?,?,?,?,'yes','half',?,'half','good'
2,5,4.5,?,'none',38,?,14,5,?,11,'below_average','yes',?,?,'full','good'
2,5.7,4.5,?,'none',40,'ret_allw',?,?,?,11,'average','yes','full','yes','full','good'
2,7,5.3,?,?,?,?,?,?,?,11,?,'yes','full',?,?,'good'
3,2,3,?,'tcf',?,'empl_contr',?,?,'yes',?,?,'yes','half','yes',?,'good'
3,3.5,4,4.5,'tcf',35,?,?,?,?,13,'generous',?,?,'yes','full','good'
3,4,3.5,?,'none',40,'empl_contr',?,6,?,11,'average','yes','full',?,'full','good'
3,5,4.4,?,'none',38,'empl_contr',10,6,?,11,'generous','yes',?,?,'full','good'
3,5,5,5,?,40,?,?,?,?,12,'average',?,'half','yes','half','good'
3,6,6,4,?,35,?,?,14,?,9,'generous','yes','full','yes','full','good'
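The ARFF format above is plain text, so it is easy to read programmatically. A minimal Python reader (an illustrative sketch sufficient for this dataset's single-word attribute names; Weka and libraries like liac-arff handle the full format) that maps the '?' marker to a missing value:

```python
def parse_arff(text):
    # Minimal ARFF reader: attribute names plus data rows, '?' -> None.
    attributes, data = [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):
            continue  # skip blanks and ARFF comments
        if line.lower().startswith('@attribute'):
            attributes.append(line.split()[1].strip("'"))
        elif line.lower() == '@data':
            in_data = True
        elif in_data:
            values = [v.strip().strip("'") for v in line.split(',')]
            data.append([None if v == '?' else v for v in values])
    return attributes, data

sample = """@relation 'labor-neg-data'
@attribute 'duration' real
@attribute 'class' {'bad','good'}
@data
1,'good'
?,'bad'"""
attrs, rows = parse_arff(sample)
print(attrs, rows)  # ['duration', 'class'] [['1', 'good'], [None, 'bad']]
```

The many '?' entries in the listing above are exactly why the replace-missing-values filter is applied during preprocessing.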
CHAPTER 4
RESULTS AND DISCUSSIONS
Labor-negotiation Dataset using WEKA
Screenshots Captured:

Pre-processing done using the normalize filter:

Pre-processing done using the discretize filter:

Pre-processing done using the replace-missing-values filter:
Classifier Decision Stump used for classification with 84% accuracy:

Classifier Naïve Bayes used for classification with 98% accuracy:

Classifier Logistic used for classification with 100% accuracy:

Cross-validation performed on Naïve Bayes:

Cross-validation performed on Logistic:

Cross-validation performed on Decision Stump:


Overall Analysis of Classification Done:

Sr. No.  Classifier used                         Instances correctly  Instances incorrectly  Overall
                                                 classified           classified             Accuracy
1.       Decision Stump                          48                   9                      84%
2.       Naïve Bayes                             56                   1                      98%
3.       Logistic                                57                   0                      100%
4.       Cross-validation on Decision Stump      46                   11                     80.70%
5.       Cross-validation on Naïve Bayes         51                   6                      89%
6.       Cross-validation on Function Logistic   53                   4                      92%

So we have concluded that the Function Logistic algorithm works best for our
labor negotiation dataset analysis, giving an accuracy of 100%, and is hereby considered
suitable enough for analyzing our given dataset.
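The accuracy column follows directly from the instance counts, since the dataset has 57 instances in total. A quick arithmetic check in Python:

```python
def accuracy(correct, incorrect):
    # Overall accuracy as a percentage of all classified instances.
    return round(100 * correct / (correct + incorrect), 2)

# Counts from the analysis table above
print(accuracy(48, 9))   # 84.21  (Decision Stump)
print(accuracy(56, 1))   # 98.25  (Naive Bayes)
print(accuracy(57, 0))   # 100.0  (Logistic)
print(accuracy(46, 11))  # 80.7   (cross-validation on Decision Stump)
```

The table's rounded percentages (84%, 98%, 89%, 92%) match these figures to the nearest whole number.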
CHAPTER 5
ADVANTAGES AND DISADVANTAGES

ADVANTAGES:
1. Freely available under the GNU General Public License.
2. Portability, since it is fully implemented in the Java programming language.
3. Runs on almost any modern computing platform.
4. Ease of use due to its graphical user interface.

DISADVANTAGES:
1. It can only handle small datasets.
2. Using it via the command line is painful without the readline capability of the shell.
CHAPTER 6
CONCLUSION

Finally, after all the analysis, we obtained the results for the corresponding dataset. We found that Function
Logistic is the best classification algorithm analyzed; it is followed by Naive Bayes and Decision Stump,
with accuracies close to that of Function Logistic. At some points, both Naïve Bayes and
Decision Stump show the same level of accuracy.
We have concluded that the Function Logistic algorithm works best for our labor negotiation analysis,
giving an accuracy of 100%, and is hereby considered suitable enough for analyzing our given dataset.
REFERENCES

1. https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/labor.arff

2. https://www.cs.waikato.ac.nz/ml/weka/Witten_et_al_2016_appendix.pdf

3. https://courses.soe.ucsc.edu/courses/tim245/Spring12/01/pages/attached-files/attachments/11549
