
Orange

Data Mining Tool


Presentation
Group Members:

•Name Registration Number

Introduction
● Orange is component-based visual programming software for data mining, machine learning and data analysis.
● It supports communication between data scientists and domain experts.
● Data mining through visual programming and Python scripting.

Why Orange?
● Open source
● Component based
● No programming required
● Data visualization
● Platform-independent software
● Allows clustering and classification
You can get the Orange software from this link:
https://orange.biolab.si/getting-started/

Getting Started With ORANGE!!

Dataset: Heart Disease
● Has 303 instances
● 13 attributes
● Categorical class with 2 values (0, 1)
● In .csv format
● Source: preloaded datasets of Orange

Attributes
● Narrowing diameter
● Cholesterol
● Chest pain
● Rest ECG
● Fasting blood sugar
● Max HR
● Age, gender and more
Dataset: How do the following factors cause heart disease?
● Age: the risk of heart disease increases with age, especially above 65.
● Fatty deposits called plaques also collect along the artery walls, slowing blood flow from the heart and causing coronary heart disease.
● Gender: heart disease is the leading cause of death for both men and women.

● Angina: chest pain or discomfort caused when the heart muscle doesn't get enough oxygen-rich blood.

● Cholesterol: when there is too much cholesterol in the blood, it builds up in the walls of the arteries, causing a process called atherosclerosis (a form of heart disease).

● Diameter narrowing:
● Coronary heart disease is caused by the narrowing or blockage of the coronary arteries.
● This is the target attribute, with values 0 and 1.

Loading data file into data table:

EDA: Exploratory data analysis
● Distributions

● Distributions

Selected Algorithms

● Neural Network
● KNN
● Random Forest
● Naïve Bayes
● Logistic Regression
● Decision Tree

Experimental Setup
This is how we drag and drop the widgets and implement our algorithms.

KNN (k-nearest neighbors)
KNN is a non-parametric method used for classification and regression.
It requires three things:
● The set of stored records.
● A distance metric to compute the distance between records.
● The value of k, the number of nearest neighbors to retrieve for an unknown record.

Math equation (Euclidean distance): $d(p,q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
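To make the three ingredients concrete, here is a minimal pure-Python sketch (not Orange's implementation): the stored records are a list, the distance metric is the Euclidean distance above, and the k nearest records vote on the class of the unknown record. The function names and the toy records are hypothetical and are not rows from the heart disease dataset.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance d(p, q) = sqrt(sum_i (p_i - q_i)^2)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(records, labels, unknown, k=3):
    """Classify `unknown` by a majority vote among its k nearest stored records."""
    # Rank the stored records by their distance to the unknown record.
    order = sorted(range(len(records)), key=lambda i: euclidean(records[i], unknown))
    # Let the k closest records vote with their class labels.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy records (age, max heart rate) with made-up labels;
# these are NOT rows from Orange's heart disease dataset.
records = [(40, 170), (45, 165), (63, 120), (67, 115), (70, 110)]
labels = [0, 0, 1, 1, 1]
print(knn_predict(records, labels, (65, 118), k=3))  # 1
```

Because Euclidean distance is dominated by the attribute with the largest range, attributes such as age and max HR are usually normalized before running KNN.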

Decision tree
● Used to visually and explicitly represent decisions and decision making.
● One of the predictive modelling approaches used in statistics, data mining and machine learning.

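$\mathrm{Entropy}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$, where $p_i$ is the proportion of records in $D$ that belong to class $i$ and $m$ is the number of classes.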
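A common way for a decision tree to use this entropy is information gain: the entropy of a node minus the size-weighted entropy of the child nodes produced by a split, with the tree preferring the split that gains the most. The sketch below assumes categorical attributes; the function names are hypothetical, and it does not claim to be the exact criterion used by Orange's Tree widget.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(D) = -sum_i p_i * log2(p_i), where p_i is the share of class i in D."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy of the parent node minus the weighted entropy of the child nodes
    obtained by splitting on one categorical attribute."""
    n = len(labels)
    children = {}
    for y, v in zip(labels, feature_values):
        children.setdefault(v, []).append(y)
    weighted = sum(len(ys) / n * entropy(ys) for ys in children.values())
    return entropy(labels) - weighted


labels = [0, 0, 1, 1]
print(entropy(labels))                                 # 1.0
print(information_gain(labels, ["a", "a", "b", "b"]))  # 1.0 (perfect split)
print(information_gain(labels, ["a", "b", "a", "b"]))  # 0.0 (useless split)
```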

Naïve Bayes
● Also known as the Naive Bayes classifier.
● Assumes that attributes are statistically independent of one another given the class.
● In real data there will usually be some correlation between features for a given class.
● Unlike other classifiers, Naive Bayes explicitly models the features as conditionally independent given the class.

$P(H \mid X) = \dfrac{P(X \mid H)\,P(H)}{P(X)}$
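As a small sketch of how the formula and the independence assumption combine, the code below estimates $P(H)$ and each $P(x_j \mid H)$ by counting, multiplies them to get $P(X \mid H)\,P(H)$, and divides by $P(X)$ (the sum over all classes). The Laplace smoothing (`alpha=1`) is an added assumption to avoid zero probabilities; the function names and the toy rows are hypothetical, and only categorical attributes are handled.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> Counter of values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(y, j)][v] += 1
    # Number of distinct values per attribute, used for Laplace smoothing.
    n_values = [len({row[j] for row in rows}) for j in range(len(rows[0]))]
    return class_counts, value_counts, n_values


def posterior(model, x, alpha=1.0):
    """P(H | X) = P(X | H) P(H) / P(X), with P(X | H) = prod_j P(x_j | H) (naive assumption)."""
    class_counts, value_counts, n_values = model
    n = sum(class_counts.values())
    joint = {}
    for h, n_h in class_counts.items():
        p = n_h / n  # prior P(H)
        for j, v in enumerate(x):
            # Laplace-smoothed estimate of P(x_j | H).
            p *= (value_counts[(h, j)][v] + alpha) / (n_h + alpha * n_values[j])
        joint[h] = p  # P(X | H) * P(H)
    evidence = sum(joint.values())  # P(X), summed over all classes
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical toy rows: (chest pain type, exercise-induced angina), made-up labels.
rows = [("typical", "no"), ("typical", "no"), ("asympt", "yes"),
        ("asympt", "yes"), ("asympt", "no")]
labels = [0, 0, 1, 1, 0]
model = train_naive_bayes(rows, labels)
print(posterior(model, ("asympt", "yes")))  # class 1 gets about 0.82
```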

Random Forest
● It is flexible and simple to use.
● By combining many decision trees, the Random Forest algorithm reduces the overfitting that a single decision tree is prone to.
● It can be used to identify the most important features in the training dataset.
● It can be used for both classification and regression tasks.
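The two ideas that distinguish a random forest from a single tree are bootstrap sampling of the training rows and a random subset of features for each tree, with the trees combined by majority vote. The sketch below shows only those ideas and, to stay short, grows depth-1 trees ("stumps") with one random feature per tree; full implementations typically grow deeper trees and draw a fresh feature subset at every split. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Depth-1 tree: pick the (feature, threshold) split with the fewest training errors."""
    fallback = majority(labels)
    best = None
    for j in feature_ids:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            # Each side predicts its majority class (or the overall majority if empty).
            left_pred = majority(left) if left else fallback
            right_pred = majority(right) if right else fallback
            errors = sum(y != left_pred for y in left) + sum(y != right_pred for y in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_pred, right_pred)
    _, j, t, left_pred, right_pred = best
    return lambda x: left_pred if x[j] <= t else right_pred


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train each tree on a bootstrap sample using a random subset of the features."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        feats = rng.sample(range(d), n_features)    # random feature subset
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return trees


def predict_forest(trees, x):
    """Aggregate the individual trees by majority vote."""
    return majority([tree(x) for tree in trees])


# Hypothetical toy data: two numeric features, class 1 for the larger values.
rows = [(i, 2 * i) for i in range(1, 21)]
labels = [0] * 10 + [1] * 10
forest = fit_forest(rows, labels)
print(predict_forest(forest, (1, 2)), predict_forest(forest, (20, 40)))  # 0 1
```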

Logistic Regression
● Used to assign observations to a discrete set of classes.
● Logistic regression can be binary (binomial), multinomial or ordinal:
  ● Binary (Pass/Fail)
  ● Multinomial (Cats, Dogs, Sheep)
  ● Ordinal (Low, Medium, High)

● The probability scores underlying the model's classifications can be inspected.
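For the binary case, the probability score is the logistic (sigmoid) function applied to a weighted sum of the attributes, and a class is assigned by thresholding it (0.5 here). The minimal sketch below fits the weights by plain batch gradient descent on the log-loss; that training procedure, the learning rate, the epoch count and the toy data are all assumptions for illustration, and Orange may use a different solver.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability score P(y = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Binary logistic regression fitted by batch gradient descent on the log-loss."""
    d, n = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y  # gradient of the log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical one-feature toy data (e.g. a standardized risk score), made-up labels.
xs = [[-2.0], [-1.0], [1.0], [2.0]]
ys = [0, 0, 1, 1]
w, b = fit_logistic(xs, ys)
for x in xs:
    p = predict_proba(w, b, x)
    print(x, round(p, 3), "-> class", int(p >= 0.5))
```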

Neural Network
● Neural networks are learning algorithms.
● They interpret sensory data through a kind of machine perception, labeling or clustering raw input.
● They consist of different layers for analyzing and learning from data.
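Math equation (weighted sum computed by a single neuron, where $b$ is the bias and $w_i$ is the weight on input $x_i$):
$f(X) = b + \sum_i w_i x_i$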
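The sketch below shows how the neuron formula is stacked into layers: each neuron computes $b + \sum_i w_i x_i$ over all inputs, applies an activation function, and passes the result to the next layer. The ReLU hidden layer, the sigmoid output and the hand-picked weights are assumptions for illustration only; a trained network learns its weights from data.

```python
import math


def neuron(x, w, b):
    """Weighted sum f(X) = b + sum_i w_i * x_i."""
    return b + sum(wi * xi for wi, xi in zip(w, x))


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer(x, weights, biases, activation):
    """Fully connected layer: every neuron sees all inputs, then applies the activation."""
    return [activation(neuron(x, w, b)) for w, b in zip(weights, biases)]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden layer (ReLU) -> output layer (sigmoid probability)."""
    h = layer(x, hidden_w, hidden_b, relu)
    return layer(h, out_w, out_b, sigmoid)


# Hypothetical hand-picked weights; a real network learns these from data.
hidden_w = [[0.5, -0.25], [1.0, 1.0]]
hidden_b = [0.1, -1.0]
out_w = [[1.0, 0.5]]
out_b = [-1.0]
print(forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b))
```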

Concluding Results

Table comparing the results of the models

Model                 Recall   Precision   F-measure
Neural Network        0.813    0.814       0.814
Logistic Regression   0.848    0.848       0.848
Random Forest         0.807    0.807       0.807
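The F-measure combines precision and recall into one score; with $\beta = 1$ it is their harmonic mean, $F_1 = 2PR/(P + R)$. The hypothetical helper below computes it. Recomputing the table's F-measures from its rounded precision and recall may differ in the last digit, since the reported scores may be averaged over the two classes rather than derived from a single precision/recall pair.

```python
def f_measure(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 gives F1."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Recall and precision values taken from the table above.
for name, recall, precision in [("Neural Network", 0.813, 0.814),
                                ("Logistic Regression", 0.848, 0.848),
                                ("Random Forest", 0.807, 0.807)]:
    print(name, round(f_measure(precision, recall), 3))
```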


Thanks!
Any questions?

