South Africa Heart Disease problem
Pattern Recognition Project


Omar M. Osama

Abstract

This report presents a study of various classification methods applied to the South Africa heart disease problem. It is found that the neural network method performs best, with an error in the range between 15.5% and 16.5%.

1 Introduction

This report presents three different classifiers that estimate the response of the South Africa heart disease problem. The problem has nine features: systolic blood pressure, cumulative tobacco, low density lipoprotein cholesterol, adiposity, family history of heart disease (Present, Absent), type-A behavior, obesity, current alcohol consumption, and age at onset. The response is coronary heart disease. The classifiers used are Linear Discriminant Analysis, Logistic Regression, and a Neural Network.

Before talking about LDA we have to talk about the log-likelihood ratio.

Let $\mathcal{R}(C_k)$ denote the risk (expected loss) of choosing $C_k$ as a prediction. The decision rule compares the two risks:

$$\mathcal{R}(C_2) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \mathcal{R}(C_1)$$

So if $\mathcal{R}(C_2) > \mathcal{R}(C_1)$ we assign to $C_1$, and if $\mathcal{R}(C_2) < \mathcal{R}(C_1)$ we assign to $C_2$: we always pick the less risky class, which makes a lot of sense.

Write $L_{ij}$ for the loss of assigning to $C_j$ when the true class is $C_i$: $L_{12}$ is the loss when you assign to $C_2$ but it is $C_1$, $L_{21}$ is the loss when you assign to $C_1$ but it is $C_2$, and $L_{11}$, $L_{22}$ correspond to right decisions. Then

$$\mathcal{R}(C_1) = L_{11}\,P(C_1|X=x) + L_{21}\,P(C_2|X=x)$$

$$\mathcal{R}(C_2) = L_{12}\,P(C_1|X=x) + L_{22}\,P(C_2|X=x)$$

so the rule becomes

$$L_{12}\,P(C_1|X=x) + L_{22}\,P(C_2|X=x) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; L_{11}\,P(C_1|X=x) + L_{21}\,P(C_2|X=x) \qquad [L_{ii}=0 \text{ usually}]$$

After calculating (with $L_{ii}=0$):

$$\frac{P(C_1|X=x)}{P(C_2|X=x)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \frac{L_{21}}{L_{12}}$$

Using Bayes' rule, $P(C_k|X=x) = P(X|C_k)\,\pi_k/P(X)$:

$$\frac{P(X|C_1)\,\pi_1/P(X)}{P(X|C_2)\,\pi_2/P(X)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \frac{L_{21}}{L_{12}}$$

$$\frac{P(X|C_1)}{P(X|C_2)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \frac{\pi_2\,L_{21}}{\pi_1\,L_{12}}$$

Writing $f_k(X) = P(X|C_k)$ for the class-conditional densities:

$$\frac{f_1(X)}{f_2(X)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \frac{\pi_2\,L_{21}}{\pi_1\,L_{12}}$$

and taking the logarithm:

$$h(X) = \ln\frac{f_1(X)}{f_2(X)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; th, \qquad th = \ln\frac{\pi_2\,L_{21}}{\pi_1\,L_{12}} \qquad (1)$$

So if $h(X) > th$ we assign to $C_1$, otherwise we assign to $C_2$: we assign according to the greater prior-weighted likelihood, which makes a lot of sense.
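As a minimal sketch of the rule in equation (1) (in Python, with made-up density values and losses, not taken from the report):

```python
import math

def decide(f1_x, f2_x, pi1, pi2, L12, L21):
    """Assign x to class 1 or 2 by the log-likelihood ratio rule of eq. (1).

    f1_x, f2_x : class-conditional densities f1(x), f2(x) evaluated at x
    pi1, pi2   : prior probabilities of the two classes
    L12, L21   : losses for the two kinds of misclassification
    """
    h = math.log(f1_x / f2_x)                  # log-likelihood ratio h(x)
    th = math.log((pi2 * L21) / (pi1 * L12))   # threshold from priors and losses
    return 1 if h > th else 2

# With equal priors and equal losses the threshold is 0, so the rule
# reduces to picking the class with the larger density at x.
print(decide(f1_x=0.30, f2_x=0.10, pi1=0.5, pi2=0.5, L12=1, L21=1))  # -> 1
```

Raising $L_{21}$ (the cost of missing class 2) raises the threshold, making the rule more reluctant to choose $C_1$, exactly as the formula for $th$ says.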

Now let's begin with LDA.


2 Linear Discriminant Analysis

2.2 How does it work?

First estimate the mean $\mu_k$ and the covariance $\Sigma$ for both classes, assuming that both classes share the same covariance matrix (unlike QDA). Then we find the $h(X)$ of equation (1) from

$$h(X) = X^T\Sigma^{-1}(\mu_1-\mu_2) + \frac{1}{2}\left(\mu_2^T\Sigma^{-1}\mu_2 - \mu_1^T\Sigma^{-1}\mu_1\right) \qquad (2)$$

Using equation (1), we can predict which class each observation belongs to.
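The discriminant in equation (2) can be sketched with NumPy as follows; the means, covariance, and test points here are illustrative, not estimates from the heart data:

```python
import numpy as np

def lda_h(X, mu1, mu2, sigma):
    """h(X) of equation (2) for each row of X, given class means and shared covariance."""
    inv = np.linalg.inv(sigma)
    w = inv @ (mu1 - mu2)                           # projection direction
    b = 0.5 * (mu2 @ inv @ mu2 - mu1 @ inv @ mu1)   # constant term
    return X @ w + b

# Toy 2-D example: points near mu1 should get h > 0 (class 1 by eq. (1) with th = 0).
mu1, mu2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
sigma = np.eye(2)
X = np.array([[0.9, 1.1],     # near mu1 -> h positive
              [-1.0, -0.8]])  # near mu2 -> h negative
h = lda_h(X, mu1, mu2, sigma)
```

Note that $h(X)$ is linear in $X$, which is why the decision boundary of LDA is a hyperplane.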

3 Logistic Regression

Despite the word "regression" in its name, logistic regression is a classification method. It predicts the probability of occurrence of an event by fitting the data to a logit function (logistic curve):

$$\log\frac{p(G=1|X=x)}{p(G=2|X=x)} = \beta^T x$$

and from $\beta^T x$ we can find the posterior probability $p(G=1|X=x)$ by using the sigmoid function:

$$p(G=1|X=x) = \frac{1}{1+\exp(-\beta^T x)}$$
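A one-line sketch of that sigmoid mapping, just to show how any real-valued score $\beta^T x$ becomes a valid probability:

```python
import math

def sigmoid(z):
    """Posterior p(G=1|X=x) from the linear score z = beta^T x."""
    return 1.0 / (1.0 + math.exp(-z))

# The sigmoid squashes any real score into (0, 1), so its output is always
# acceptable as a probability, unlike the output of a linear fit.
print(sigmoid(0.0))              # -> 0.5
print(round(sigmoid(4.0), 3))    # -> 0.982
```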

Why logistic regression rather than linear regression? Because in linear regression, when $x$ moves far enough along the x-axis, the fitted values leave the interval $[0,1]$. Such values are theoretically unacceptable as probabilities.

The model looks very simple and easy at first glance, but the problem is estimating $\beta$. To calculate it we maximize the log-likelihood:

$$L(\beta) = \sum_{i=1}^{N}\left\{y_i\log(p_i) + (1-y_i)\log(1-p_i)\right\}$$

$$L(\beta) = \sum_{i=1}^{N}\left\{y_i\,\beta^T x_i - \log\!\left(1+\exp(\beta^T x_i)\right)\right\}$$

To maximize it we set the first derivative to zero; this is the score function:

$$\frac{\partial L(\beta)}{\partial\beta} = \sum_{i=1}^{N}x_i(y_i-p_i) = 0$$

These equations are non-linear in $\beta$, so we solve them with the Newton-Raphson algorithm, which also needs the second derivative, or Hessian matrix:

$$\frac{\partial^2 L(\beta)}{\partial\beta\,\partial\beta^T} = -\sum_{i=1}^{N}x_i x_i^T\,p_i(1-p_i)$$

to find $\beta$. So we set $\beta^{old}$ to some value (zero is acceptable, unlike neural networks), update $\beta^{new}$, then use $\beta^{new}$ as $\beta^{old}$, and so on.

In matrix notation the Newton-Raphson update is

$$\beta^{new} = \beta^{old} + (X^TWX)^{-1}X^T(y-p)$$

where

$$p = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix}, \qquad p_i = p(G=1|X=x_i) \text{ (the posterior probability)}$$

and

$$W = \begin{pmatrix} p_1(1-p_1) & 0 & \cdots & 0 \\ 0 & p_2(1-p_2) & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & p_N(1-p_N) \end{pmatrix}$$

An axiomatic question: when will we stop? We stop when the score (the very first derivative) is approximately $0$. Knowing that $p(G=1|X=x) = \frac{1}{1+\exp(-\beta^T x)}$, we can find an initial posterior for all observations: with $\beta^{old}=0$,

$$p = \begin{pmatrix} 0.5 \\ 0.5 \\ \vdots \\ 0.5 \end{pmatrix}$$

With this $p$ vector we can calculate the score function and the Hessian matrix, take the Newton step (divide the first by the second), and find $\beta^{new}$. Then we do it again, and so on. The algorithm tunes $\beta$ to drive the score towards zero; when the score is close to zero, $\beta^{new} \approx \beta^{old}$. When that happens, $\beta$ satisfies the model, which makes a lot of sense.
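The whole Newton-Raphson loop described above can be sketched in a few lines of NumPy. This is a generic illustration on made-up toy data, not the report's code or the heart-disease design matrix:

```python
import numpy as np

def fit_logistic(X, y, tol=1e-8, max_iter=25):
    """Newton-Raphson for logistic regression:
    beta_new = beta_old + (X' W X)^-1 X' (y - p)."""
    beta = np.zeros(X.shape[1])               # beta_old = 0 is an acceptable start
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # posteriors from current beta
        score = X.T @ (y - p)                 # first derivative (score function)
        if np.max(np.abs(score)) < tol:       # stop when the score is ~ 0
            break
        W = np.diag(p * (1 - p))              # Hessian weights
        beta = beta + np.linalg.solve(X.T @ W @ X, score)  # Newton step
    return beta

# Tiny non-separable toy set: first column is the intercept.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, -0.5],
              [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta = fit_logistic(X, y)
```

At the first iteration $\beta=0$ makes every $p_i = 0.5$, exactly as noted above, and each Newton step drives the score closer to zero.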

We train on 300 observations, and we found that nine features are too few. We decided to increase the model complexity by taking a non-linear combination of some features, because a linear combination would make the data matrix singular. We chose those features according to their correlation with the response: when the correlation value is large, the response will almost "feel" the change in the transformed feature, which makes a lot of sense. In our case, squaring sbp and typea, and multiplying alcohol by age, increased the performance considerably. This has no proof or mathematical justification; it was judged empirically.
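That feature construction can be sketched as follows; the column layout and numbers are illustrative, not the report's actual design matrix:

```python
import numpy as np

def add_engineered_features(X, sbp_col, typea_col, alcohol_col, age_col):
    """Append sbp^2, typea^2 and alcohol*age as extra columns.

    Non-linear transforms add genuinely new columns; a linear combination of
    existing columns would make the design matrix singular.
    """
    sbp2 = X[:, sbp_col] ** 2
    typea2 = X[:, typea_col] ** 2
    alc_age = X[:, alcohol_col] * X[:, age_col]
    return np.column_stack([X, sbp2, typea2, alc_age])

# Toy matrix: columns are (sbp, typea, alcohol, age).
X = np.array([[120.0, 50.0, 10.0, 40.0],
              [160.0, 60.0, 0.0, 55.0]])
X_aug = add_engineered_features(X, 0, 1, 2, 3)
print(X_aug.shape)   # -> (2, 7)
```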

4 Neural Network

4.1 What is a Neural Network?

$$Y_k = \sigma^{(2)}\!\left(w_{0k}^{(2)} + \sum_{m=1}^{M} w_{mk}^{(2)}\,\sigma^{(1)}\!\left(w_{m0}^{(1)} + W_m^{(1)}X\right)\right)$$

$$Y_k = \sigma^{(2)}\!\left(w_{0k}^{(2)} + \sum_{m=1}^{M} w_{mk}^{(2)}\,Z_m\right)$$

These equations may look complicated, but they carry the meaning of a NN. Let's simplify them:

$X$: the data matrix.
$W^{(1)}$ and $W^{(2)}$: the weight matrices.
$M$: the number of neurons.
$\sigma^{(1)}$ and $\sigma^{(2)}$: non-linear activation functions.
$Y_k$: the output.
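A minimal forward pass matching those two equations, for a single one-hidden-layer network; the sizes and random weights here are made up for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2):
    """One-hidden-layer forward pass:
    Y = sigma2(b2 + W2 @ sigma1(b1 + W1 @ x))."""
    Z = sigmoid(b1 + W1 @ x)   # hidden activations Z_m, with sigma1 = sigmoid
    return b2 + W2 @ Z         # sigma2 = identity (the regression case)

# M = 3 hidden neurons, 2 inputs, 1 output, arbitrary small weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
out = forward(np.array([0.5, -1.0]), W1, b1, W2, b2)
print(out.shape)   # -> (1,)
```

For classification, the identity on the last line would be replaced by a soft-max over the $K$ outputs.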


The main idea is to project the data onto the weight directions, which lets us look at the data from different sides, because the data could be more understandable from another side.

$\sigma^{(1)}$ is the sigmoid function, which adds flexibility to the model. If we used a linear function instead of the sigmoid, large values on the x-axis would be theoretically unacceptable.

$\sigma^{(2)}$ is the identity function or the soft-max function, for regression or classification respectively.

We train on some data with some weights $[w_1\,w_2\ldots w_I]$ and test on other data using the $w_i$ that minimizes the error, hoping that the data we trained on is close to the population. The question now is how to find that $w$.

$$E(w) = \sum_{i=1}^{N}\left(y(x_i,w) - y_i\right)^2$$

$$E(w) = \sum_{i=1}^{N}\left(\sigma^{(2)}\!\left(w_{0k}^{(2)} + \sum_{m=1}^{M} w_{mk}^{(2)}\,\sigma^{(1)}\!\left(w_{m0}^{(1)} + W_m^{(1)}x_i\right)\right) - y_i\right)^2$$

There is no closed-form solution for the network: even the simplest network, with 1 input, 1 neuron and 1 output, has 4 weights. Second, the error function is non-convex, which means local minima exist, while we seek the global one. So we have to solve it numerically.
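One standard numerical approach is gradient descent on $E(w)$. As a toy illustration (the report itself uses MATLAB's built-in training routines, so this is only a sketch), here is plain gradient descent for the degenerate one-weight model $y(x,w) = w\,x$:

```python
# Gradient descent on E(w) = sum_i (y(x_i, w) - y_i)^2 for the simplest
# possible model y(x, w) = w * x (one weight, no bias, no hidden layer).
def grad_descent(xs, ys, lr=0.01, steps=200):
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))  # dE/dw
        w -= lr * grad                                           # step downhill
    return w

# Data generated from y = 2x, so w should approach 2.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w = grad_descent(xs, ys)
print(round(w, 3))   # -> 2.0
```

For a real network the same idea applies, but the gradient with respect to every weight is computed by back-propagation, and the non-convexity means the run can end in a local minimum depending on the starting weights, which is why zero initialization is not acceptable for neural networks.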

The tansig function is used as $\sigma^{(2)}$ instead of the soft-max, which worked in a very strange manner. Training stops when the performance gradient falls below $10^{-4}$.


4.3.1 Before Regularization

Figure 1: Best error before regularization.

The best error is in the range between 16.5% and 17% after many tries.

4.3.2 After Regularization

Figure 2: Best error after regularization.

The best error is in the range between 15.5% and 16.5% after many tries.
