Why Orange?
Open Source
Component based
No programming
Data visualization
Platform independent software
Data mining through visual programming and Python scripting
Introduction
Orange is component-based visual programming software for data mining, machine learning and data analysis.
Supports communication between data scientists and domain experts.
Allows clustering and classification.
You can download Orange from this link:
https://orange.biolab.si/getting-started/
Getting Started With ORANGE!!
Dataset: Heart Disease
● Has 303 instances
● 13 attributes
● Categorical class with 2 values (0,1)
● In .csv format
● Source: pre loaded datasets of Orange.
ATTRIBUTES
● Narrowing diameter
● Cholesterol
● Chest pain
● Rest ECG
● Fasting blood sugar
● Max HR
● Age, gender and more
Dataset: How do the following factors relate to heart disease?
● Age: the risk of heart disease increases after age 65. Fatty deposits called plaques collect along the artery walls and slow the blood flow from the heart, causing coronary heart disease.
● Gender: heart disease is a leading cause of death for both men and women.
● Angina: chest pain or discomfort caused when your heart muscle doesn't get enough oxygen-rich blood.
● Diameter narrowing: heart disease is caused by the narrowing or blockage of the coronary arteries. This is the target attribute (0,1).
Loading data file into data table:
● Distributions
Selected Algorithms:
● Neural Network
● KNN
● Random Forest
● Naïve Bayes
● Logistic Regression
● Decision Tree
Experimental Setup
This is how we drag and drop the widgets and implement our algorithms.
KNN (k-nearest neighbors)
KNN is a non-parametric method used for classification and regression.
Requires three things
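To make the idea concrete, here is a minimal sketch of KNN classification in plain Python. It is not how Orange implements its kNN widget. The stored training rows, the Euclidean distance and the choice of k = 3 are assumptions of this sketch, and the toy data (age, max heart rate) is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every stored training row.
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    ]
    # Keep the k closest rows and count their class labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (age, max heart rate) -> class 0 or 1.
rows = [(35, 180), (40, 175), (45, 170), (62, 120), (66, 115), (70, 110)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(rows, labels, (64, 118)))
```

In practice the features would usually be scaled first, because a feature with a large numeric range dominates the distance.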
Decision tree
Used to visually and explicitly represent decisions and decision making.
A predictive modelling approach used in statistics, data mining and machine learning.
$$\mathrm{Entropy}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$
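The entropy of a set of class labels can be computed directly from the class proportions. The sketch below assumes that p_i is the fraction of rows in D belonging to class i; the function name `entropy` is only illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    # p_i = share of the rows that belong to class i.
    proportions = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in proportions)

# A 50/50 split is maximally impure for two classes; a pure node has entropy 0.
print(entropy([0, 1, 0, 1]))
print(entropy([1, 1, 1, 1]))
```

A decision tree learner would typically prefer the split that lowers this value the most in the resulting subsets.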
Naïve Bayes
Also known as the Naive Bayes classifier.
Assumes that attributes are statistically independent of one another given the class.
In real data there is usually some correlation between features within a class.
Unlike other classifiers, it explicitly models the features as conditionally independent given the class.
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$
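A small sketch of how Bayes' rule and the independence assumption combine: P(X|H) is taken as the product of the per-feature probabilities, and P(X) is the sum of P(X|H)P(H) over the classes. The function name and all probabilities below are hypothetical, chosen only for illustration.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(H | X) for every class H.

    priors:      {cls: P(H)}
    likelihoods: {cls: {feature: {value: P(value | H)}}}
    observed:    {feature: value}
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        # Naive assumption: multiply the per-feature likelihoods.
        for feature, value in observed.items():
            score *= likelihoods[cls][feature][value]
        scores[cls] = score  # P(X | H) * P(H)
    evidence = sum(scores.values())  # P(X)
    return {cls: score / evidence for cls, score in scores.items()}

# Hypothetical probabilities, not taken from the heart disease data.
priors = {0: 0.5, 1: 0.5}
likelihoods = {
    0: {"chest_pain": {"yes": 0.2, "no": 0.8}, "high_chol": {"yes": 0.3, "no": 0.7}},
    1: {"chest_pain": {"yes": 0.8, "no": 0.2}, "high_chol": {"yes": 0.6, "no": 0.4}},
}
print(naive_bayes_posterior(priors, likelihoods, {"chest_pain": "yes", "high_chol": "yes"}))
```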
Random Forest
It is a flexible and simple algorithm.
The Random Forest algorithm reduces the overfitting problem.
It can also be used to identify the most important features in the training dataset.
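The sketch below illustrates the two ingredients behind a random forest: each tree is trained on a bootstrap sample with a randomly chosen feature, and the forest predicts by majority vote. As a simplification, every "tree" here is a one-level stump on a single feature, so this is a stand-in for the idea rather than a full random forest or Orange's implementation. All names and the toy data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature choice makes the trees differ from each other.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data with two numeric features.
rows = [[1, 1], [2, 2], [3, 1.5], [8, 9], [9, 8], [10, 10]]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [1, 1]), predict_forest(forest, [10, 10]))
```

Averaging many trees trained on different samples is what dampens the overfitting of any single tree.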
Logistic Regression
Used to assign observations to a discrete set of classes.
Logistic regression can be binomial, multinomial or ordinal:
Binomial (Pass/Fail)
Multinomial (Cats, Dogs, Sheep)
Ordinal (Low, Medium, High)
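For the binomial case, a minimal sketch of how a fitted model assigns an observation to one of two classes: a weighted sum of the features is passed through the sigmoid function to obtain a probability, which is then cut at a threshold. The weights, bias and the 0.5 threshold below are hypothetical values, not fitted to the heart disease data.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of class 1 under a binomial logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict(weights, bias, features, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0

# Hypothetical weights for two features.
weights, bias = [1.0, -1.0], 0.0
print(predict(weights, bias, [2.0, 0.5]), predict(weights, bias, [0.5, 2.0]))
```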
Neural Network
Neural networks are learning algorithms.
They interpret sensory data through a kind of machine perception, labeling or clustering raw input.
They consist of different layers for analyzing and learning from data.
Math equation:
$$f(X) = b + \sum_i w_i x_i$$
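The weighted sum above is what a single neuron computes before any activation function is applied. The sketch below evaluates it for one neuron and then for a small layer of neurons that share the same inputs. The function names and all numbers are hypothetical.

```python
def neuron_output(weights, inputs, bias):
    """f(X) = b + sum_i w_i * x_i for a single neuron."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

def layer_output(weight_rows, biases, inputs):
    """Apply several neurons (one weight row and one bias each) to the same inputs."""
    return [neuron_output(w, inputs, b) for w, b in zip(weight_rows, biases)]

# Hypothetical values: three inputs feeding a layer of two neurons.
inputs = [2.0, 1.0, 0.5]
print(neuron_output([0.5, -1.0, 2.0], inputs, 0.1))
print(layer_output([[0.5, -1.0, 2.0], [1.0, 1.0, 1.0]], [0.1, 0.0], inputs))
```

A full network would pass each layer's output through a nonlinear activation before feeding it to the next layer.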
Concluding Results
Table to compare data
Recall Precision F-Measures
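For reference, the three measures in the table header can be computed from actual and predicted class labels as sketched below. This uses the standard single-class definitions with class 1 treated as positive. How Orange averages these scores across classes may differ, so treat the function and the example labels as illustrative assumptions.

```python
def precision_recall_f1(actual, predicted, positive=1):
    """Precision, recall and F-measure for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical labels, not results from the experiments.
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 1]
print(precision_recall_f1(actual, predicted))
```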
Thanks!
Any questions?