
MIT 402 CAT 2.

1. State what each line of the following R code does and discuss the expected output.
remove(k,no,r,a)

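This line of code removes the objects k, no, r and a from the workspace. remove() is a base R function
and is equivalent to rm(). It clears any earlier values of these variables so that the script starts clean.
If one of the objects does not exist, R issues a warning such as "object 'k' not found" and carries on.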

k=12086

no=328

r=0.062

tmid=57.7

a=(k-no)/no

These lines of code define the variables used later in the code. k, no and r are the parameters of a
logistic curve: k is the upper limit (the maximum number of confirmed cases), no is the value of the
curve at time 0 and r is the growth rate. tmid is the midpoint of the curve, the time at which the curve
reaches k/2; it is defined here but not used in the rest of the code shown. a is a coefficient computed
from k and no, a = (k - no)/no.

x=c(1:140)

y=k/(1+(a*exp(-r*x)))

These lines of code create two vectors of data that will be used for plotting. x is a vector of integers from
1 to 140, representing the number of days since the 50th case was confirmed. y is a vector of numbers
calculated using the logistic equation, where k is the maximum number of confirmed cases, a and r are
the parameters defined earlier, and x is the number of days since the 50th case was confirmed.
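
The role of each parameter can be checked numerically. The sketch below reuses the values from the code above; the helper name logistic and the variable t_half are introduced here only for illustration. It relies on the fact that a logistic curve of this form reaches k/2 at t = log(a)/r.

```r
# Parameter values taken from the code above
k <- 12086
no <- 328
r <- 0.062
tmid <- 57.7
a <- (k - no) / no

# Hypothetical helper: the same logistic equation used to build y
logistic <- function(t) k / (1 + a * exp(-r * t))

# Time at which the curve reaches half of its upper limit
t_half <- log(a) / r

logistic(0)       # value of the curve at time 0, equal to no
logistic(t_half)  # value at the midpoint, equal to k / 2
t_half            # close to the tmid value defined in the code
```

This suggests that tmid was obtained as log(a)/r rounded to one decimal place, although the code shown never uses it.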

plot(x,y,col="blue",xlim=c(1,140),ylim=c(0,k+.2*k),
     xlab="Days since case 50",
     ylab="Total confirmed cases",pch=".",cex=0.7,
     main="COVID-19 Algeria",col.main="purple")

This line of code creates the initial plot of the logistic curve, using the data from x and y. The col
parameter sets the color of the points to blue, while xlim and ylim set the limits of the x and y axes. xlab
and ylab set the labels for the x and y axes, respectively. pch sets the symbol used for the points, while
cex controls the size of the points. main sets the main title of the plot, while col.main sets the color of
the title.

lines(x,y, col="blue",lwd=1.5)

This line of code adds a line to the plot, connecting the points from the logistic curve. col sets the color
of the line to blue, while lwd sets the line width to 1.5.
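
lines(Days,alg,lty=2,lwd=3,col="red2")

This line of code is meant to add a second line to the plot: a dashed (lty=2), thick (lwd=3) red line
through the points (Days, alg), presumably the observed case counts for comparison with the logistic
curve. However, Days and alg are not defined in the code provided, so as written the call stops with an
"object 'Days' not found" error and nothing is added to the plot.
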
b) Study the following output and answer the questions:
i. Write R code that yields the output. (2 marks)

# Create a data frame with five variables

df <- data.frame(temp= rnorm(20,5:25),

medage = rnorm(20,20:40),

lnpci = rnorm(20,7:10),

lntdths50 = rnorm(20,3:9),

lntcss50 = rnorm(20,6:11))

# Create a matrix of scatter plots

pairs(df)

ii. State the purpose of the output. (2 marks)

The R code first creates a data frame called "df" with five variables: "temp", "medage", "lnpci",
"lntdths50", and "lntcss50". The "rnorm" function is used to generate random values for each variable.
The first argument for "rnorm" specifies the number of values to generate, which is set to 20 in this
case. The second argument specifies the means of the random values, given as a sequence of integers:
5 to 25 for "temp", 20 to 40 for "medage", 7 to 10 for "lnpci", 3 to 9 for "lntdths50", and 6 to 11 for
"lntcss50". Each value is drawn with its own mean; a sequence shorter than 20 is recycled and a longer
one is cut off after the first 20 values. The third argument specifies the standard deviation, which is
not explicitly stated and therefore takes the default value of 1. Once the data frame "df" is created, the
code uses the "pairs" function to create a matrix of scatter plots. This function takes a data frame as its
argument and draws a scatter plot for every pair of variables in the data frame. In this case there are
five variables, so the resulting matrix of plots is 5 x 5. Each scatter plot shows the relationship between
two variables by plotting their values on the x and y axes. By default the diagonal of the matrix shows
the variable names; histograms appear there only if a diag.panel function is supplied. The purpose of
the output is therefore to show, in a single figure, the pairwise relationships between the variables in
the data frame.
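
If histograms on the diagonal are wanted, pairs() accepts a diag.panel argument. The sketch below is one possible way to do it, following the pattern of the panel.hist example in the pairs() help page; the seed and the name cors are assumptions added here, and the data are the same simulated values as above, not the data behind the original output. The correlation matrix is printed as a numerical companion to the plots.

```r
set.seed(1)  # assumed seed, used only to make the simulated data reproducible

# Same simulated data frame as in the answer above
df <- data.frame(temp = rnorm(20, 5:25),
                 medage = rnorm(20, 20:40),
                 lnpci = rnorm(20, 7:10),
                 lntdths50 = rnorm(20, 3:9),
                 lntcss50 = rnorm(20, 6:11))

# Diagonal panel that draws a histogram of one variable
panel.hist <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))                # restore the plotting region afterwards
  par(usr = c(usr[1:2], 0, 1.5))         # keep the x range, rescale the y range
  h <- hist(x, plot = FALSE)
  nB <- length(h$breaks)
  y <- h$counts / max(h$counts)          # bar heights scaled to at most 1
  rect(h$breaks[-nB], 0, h$breaks[-1], y, col = "grey")
}

# Scatter plot matrix with histograms on the diagonal
pairs(df, diag.panel = panel.hist)

# Pairwise Pearson correlations
cors <- cor(df)
round(cors, 2)
```

Pairs with a correlation close to 1 or -1 are the ones whose scatter plots lie close to a straight line.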

iii. Which of the variables may have some significant linear relationship? (2 marks)

iv. In multiple linear regression, if lnpci is the response variable, which 2 independent variables will
yield a strong model?

temp and medage

Question five: 20 marks


a) The Blood alcohol level BAL was measured for 10 students each of whom had taken a
given number of bottles of the same brand of beer.

Make a scatter plot and a least squares regression line. Test the hypothesis that another beer raises
your BAL by 0.02 percent against the alternative that it is less. (4 marks)
mydata<-data.frame(Beers=c(5,2,9,8,3,7,3,5,3,5),BAL=c(0.10,0.03,0.19,0.12,0.04,0.095,0.07,0.06,0.02,0.06))

View(mydata)

library(ggplot2)

ggplot(mydata,aes(x=Beers,y=BAL))+geom_point()+
geom_smooth(method="lm",se=FALSE,col="red")

t.test(mydata$BAL,alternative="less",mu=0.02/100,paired=FALSE,var.equal=TRUE)


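To test the hypothesis that another beer raises your BAL by 0.02% against the alternative that it is
less, we used a one-sided t-test with a significance level of 0.05. The p-value is 0.9996, which is
greater than 0.05; therefore we fail to reject the null hypothesis and conclude that there is not enough
evidence to support the alternative hypothesis. This means we cannot conclude that another beer raises
the BAL by less than 0.02%. Note, however, that t.test() as called here compares the mean BAL with
0.0002 and does not use Beers at all. The effect of one more beer is the slope of the regression line, so
a direct test of the claim is H0: slope = 0.02 against H1: slope < 0.02 (taking BAL as recorded in
percent), using t = (b1 - 0.02)/SE(b1) with n - 2 = 8 degrees of freedom.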
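
A test on the regression slope can be sketched as follows. It assumes that BAL is recorded in percent, so that "0.02 percent per beer" corresponds to a slope of 0.02 in the units of the data; the names est, b1, se_b1, t_stat and p_less are introduced here only for illustration.

```r
# Blood alcohol data as given in the question
mydata <- data.frame(
  Beers = c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5),
  BAL = c(0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06, 0.02, 0.06)
)

# Least squares regression of BAL on the number of beers
model <- lm(BAL ~ Beers, data = mydata)
est <- coef(summary(model))

b1 <- est["Beers", "Estimate"]       # estimated increase in BAL per beer
se_b1 <- est["Beers", "Std. Error"]  # its standard error

# H0: slope = 0.02 against H1: slope < 0.02 (assumes BAL is in percent)
t_stat <- (b1 - 0.02) / se_b1
p_less <- pt(t_stat, df = df.residual(model))  # lower-tail p-value

c(slope = b1, t = t_stat, p.value = p_less)
```

H0 is rejected at the 0.05 level only if p_less is below 0.05.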
b) For the same Blood alcohol data, do a hypothesis test that the intercept is 0 with a
two-sided alternative. (5 marks)
model<-lm(BAL~Beers,data=mydata)

summary(model)
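
The test itself can be read from the "(Intercept)" row of the coefficient table that summary(model) prints: its t value tests H0: intercept = 0, and Pr(>|t|) is the two-sided p-value. A minimal sketch of extracting that row, with the data repeated so the block runs on its own (the names est and intercept_row are illustrative):

```r
# Blood alcohol data as given in the question
mydata <- data.frame(
  Beers = c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5),
  BAL = c(0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06, 0.02, 0.06)
)

model <- lm(BAL ~ Beers, data = mydata)
est <- coef(summary(model))

# Estimate, standard error, t value and two-sided p-value for H0: intercept = 0
intercept_row <- est["(Intercept)", ]
intercept_row

# 95% confidence interval for the intercept; H0 is rejected at the 0.05 level
# only if this interval excludes 0
confint(model)["(Intercept)", ]
```

If the p-value in this row is above 0.05, the hypothesis that the intercept is 0 is not rejected.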

c) From some analysis in R, the following output was derived. Use it to answer the questions that
follow.

d) Write R code that could yield the output. (5 marks)

e) Discuss the information in each of the four plots and give a comparative analysis. (6 marks)
