V S Prasannakumar Mamidala

CBA Batch 3 Section B

Student ID: 71420100

from the standard linear regression model?

Linear regression using the Ordinary Least Squares (OLS) method assumes that the response (more precisely, the error term) is normally distributed, so the response can take any real value: positive or negative, integer or fractional. This is not the case with count data, which take only non-negative integer values.

Also, as the professor mentioned in class, linear regression using OLS assumes that the data are a random sample of the population, that the errors are statistically independent of one another, that the errors have equal variance (homoscedasticity), and that the errors are normally distributed (needed for hypothesis testing). These assumptions may not hold for count data.

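Count data tend to follow a Poisson distribution rather than the normal distribution assumed by OLS linear regression. In a Poisson distribution the variance equals the mean, so observations with larger expected counts are also more variable, which violates homoscedasticity.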
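The mean-variance link of the Poisson distribution can be checked numerically. The Python sketch below (standard library only) sums the Poisson probability mass function to get its mean and variance; the rates 2 and 20 are arbitrary illustrative values, not taken from the assignment data. Both the mean and the variance come out equal to the rate, so a count with a larger expected value is also more spread out.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for Y ~ Poisson(lam); only defined for integers k >= 0.

    Computed in log space to avoid overflow for large k.
    """
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_moments(lam: float, k_max: int = 200) -> tuple[float, float]:
    """Mean and variance of Poisson(lam), summing the pmf over k = 0..k_max."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


# Illustrative rates only (not from the assignment data).
for lam in (2.0, 20.0):
    mean, var = poisson_moments(lam)
    print(f"lambda={lam}: mean={mean:.4f}, variance={var:.4f}")
```

In each printed line the mean and the variance both match lambda, whereas OLS would assume one common variance for every observation.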

2(a) Analyze the data to study the effects of Drug and physician Age on the number of prescriptions. Also, study the effect of the interaction between Drug and Age.

(i) Poisson model with both Drug and Age

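$$\log(\text{Pres}) = \beta_0 + D_1 \cdot (\text{Drug B}) + D_2 \cdot (\text{Age O}) + D_3 \cdot (\text{Age Y})$$

where Pres is the number of prescriptions and (Drug B), (Age O) and (Age Y) are 0/1 indicator variables, so Drug A and the remaining Age level are the reference categories.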
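Because this model is linear on the log scale, each coefficient acts multiplicatively on the expected count: switching from Drug A to Drug B multiplies the expected number of prescriptions by exp(D1). The Python sketch below evaluates the equation for a given Drug/Age combination. The coefficient values are hypothetical placeholders (the fitted estimates are not reproduced here), and any Age label other than "O" or "Y" is treated as the reference level.

```python
import math


def expected_prescriptions(coef: dict[str, float], drug: str, age: str) -> float:
    """Expected count from log(Pres) = beta0 + D1*DrugB + D2*AgeO + D3*AgeY.

    Drug A and any Age level other than "O"/"Y" are reference categories,
    so their indicators are 0.
    """
    eta = (
        coef["beta0"]
        + coef["D1"] * (drug == "B")
        + coef["D2"] * (age == "O")
        + coef["D3"] * (age == "Y")
    )
    return math.exp(eta)


# Hypothetical coefficients for illustration only (not the fitted values).
coef = {"beta0": 3.5, "D1": -0.4, "D2": 0.2, "D3": -0.1}
mu_a = expected_prescriptions(coef, "A", "O")
mu_b = expected_prescriptions(coef, "B", "O")
print(mu_b / mu_a, math.exp(coef["D1"]))  # the two ratios agree up to rounding
```

The same function gives a regression-based prediction for any row whose Drug and Age are known.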

(ii) Poisson models: the null model, and Drug and Age individually (noting the residual deviance of each)
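For a Poisson model the residual deviance is $D = 2\sum_i [y_i \log(y_i/\hat\mu_i) - (y_i - \hat\mu_i)]$, which compares the fitted counts with a saturated model that fits every observation exactly. The null model contains only an intercept, so its fitted value is the sample mean for every row. The Python sketch below computes both; the counts are a made-up toy example, not the assignment data. The drop in deviance from a smaller model to a larger nested model is the likelihood-ratio statistic used to judge whether the added predictor helps.

```python
import math


def poisson_deviance(y: list[float], mu: list[float]) -> float:
    """Poisson residual deviance 2 * sum[y*log(y/mu) - (y - mu)].

    By convention y*log(y/mu) is taken as 0 when y == 0.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def null_model_fit(y: list[float]) -> list[float]:
    """Intercept-only Poisson model: every fitted value is the sample mean."""
    ybar = sum(y) / len(y)
    return [ybar] * len(y)


# Toy counts for illustration only (not the assignment data).
y = [3, 7, 1, 9]
print("null deviance:", poisson_deviance(y, null_model_fit(y)))
print("saturated deviance:", poisson_deviance(y, y))  # 0.0: perfect fit
```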

(iii) Negative Binomial Regression Model

In this case, the goodness-of-fit test gives p = 0.33, so there is no significant evidence of lack of fit at the 5% level. Note that AIC = 53.
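For reference, AIC = 2k − 2 log L, where k is the number of estimated parameters (for a negative binomial model this includes the dispersion parameter) and L is the maximized likelihood. The deviance goodness-of-fit p-value is the upper-tail chi-square probability of the residual deviance on the residual degrees of freedom. The standard-library Python sketch below computes both; the chi-square tail uses a series expansion of the regularized incomplete gamma function instead of a statistics library, and the residual deviance and degrees of freedom in the usage lines are hypothetical inputs, not the assignment's values.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, AIC = 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df/2, z = x/2, then returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum z**n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


# Hypothetical inputs: residual deviance 10.0 on 9 residual df.
p_value = chi2_sf(10.0, 9)
print(f"goodness-of-fit p = {p_value:.3f}")  # above 0.05 means no significant lack of fit
```

A p-value above 0.05, such as the 0.33 reported above, means the deviance is not significantly larger than expected under the model.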

Comparison:

Residual deviance (NB) < Residual deviance (Poisson)

AIC (NB) < AIC (Poisson)

From this comparison, we can say that the negative binomial regression fits the data better than the Poisson regression.
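A common reason a negative binomial model outperforms a Poisson model is overdispersion: the Poisson model forces the variance to equal the mean, whereas the negative binomial (NB2) variance μ + μ²/θ can exceed it. One quick diagnostic is the Pearson chi-square statistic divided by the residual degrees of freedom, which should be close to 1 under a well-specified Poisson model. The Python sketch below computes it on a made-up toy example; the counts, the intercept-only fit and θ = 2 are illustrative assumptions, not values from the assignment.

```python
def pearson_dispersion(y: list[float], mu: list[float], n_params: int) -> float:
    """Pearson chi-square / residual df; values well above 1 suggest overdispersion."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def nb_variance(mu: float, theta: float) -> float:
    """NB2 variance mu + mu**2 / theta, always larger than the Poisson variance mu."""
    return mu + mu ** 2 / theta


# Toy counts with an intercept-only fit (mean = 5); illustrative only.
y = [3, 7, 1, 9]
mu = [5.0] * len(y)
print("dispersion:", pearson_dispersion(y, mu, n_params=1))
print("Poisson var:", 5.0, "NB2 var (theta=2):", nb_variance(5.0, 2.0))
```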

(b) Suppose that there are two missing values in the above data: one value is missing in the second column (5th row) and another value is missing in the third column (8th row). Apply a suitable regression technique to impute the missing values. (15 marks)

regression value, which we can use to compute the missing value of prescriptions in row 5, given that the values of Drug and Age are known. The imputed value of prescriptions in row 5 is 39.

Similarly, using the same equation, we can impute the missing value of Drug in row 8 as A. The same value is also obtained using the K-nearest neighbor (KNN) technique.
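The KNN idea can be sketched in a few lines of Python: among the complete rows in the same Age group, take the k rows whose prescription counts are closest to the incomplete row and use the majority Drug label. The distance measure (absolute difference in prescriptions within the same Age group), k = 3 and the toy rows are assumptions for illustration only; they are not the assignment data or necessarily the exact KNN settings used for the answer above.

```python
from collections import Counter


def knn_impute_drug(rows: list[dict], age: str, pres: int, k: int = 3) -> str:
    """Impute a missing Drug label by majority vote among the k complete rows
    with the same Age whose prescription counts are closest to `pres`."""
    candidates = [r for r in rows if r["age"] == age and r["drug"] is not None]
    candidates.sort(key=lambda r: abs(r["pres"] - pres))
    votes = Counter(r["drug"] for r in candidates[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy rows (not the assignment data).
rows = [
    {"drug": "A", "age": "Y", "pres": 40},
    {"drug": "A", "age": "Y", "pres": 42},
    {"drug": "B", "age": "Y", "pres": 20},
    {"drug": "B", "age": "Y", "pres": 18},
    {"drug": "A", "age": "O", "pres": 30},
]
print(knn_impute_drug(rows, age="Y", pres=38))  # nearest 3 are A, A, B, so prints A
```

Restricting neighbors to the same Age group keeps the comparison like-for-like; a fuller version might encode Age numerically and include it in the distance instead.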

