
Uploaded by Francis Tan

Data Analysis Note



1. Circle T (True) or F (False) for each of the following statements. (15 pts: 3 pts for a correct answer, -1 pt for an incorrect answer and 0 for no answer. The minimum total is 0 pts.)

(a) The interquartile range can be used to construct a robust estimate for the location of a distribution.

[T / F]

(b) The fact that the mean is less than the median often indicates a left-skewed distribution.

[T / F]

(c) When comparing the distributions of two data sets using a qqplot, e.g. qqplot(data1, data2), a straight-line pattern in the qqplot implies that both data sets have nearly identical distributions.

[T / F]

(d) If (2.23, 4.57) is a 90% confidence interval for the mean GPA of NTU year-4 students in 2016, then a randomly selected student from this cohort has a GPA between 2.23 and 4.57 with probability 90%.

[T / F]

(e) A student applies the t-test for testing H0: µ = 1 vs HA: µ > 1. If the t-test statistic is obtained as t = 1.28 based on a random sample of size n, then the p-value of the test can be calculated by $1 - P(T < 1.28 \mid T \sim t_{n-1})$.

[T / F]

Solution: F, T, F, F, T
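The following R sketch is a quick numerical check of answers (b), (c) and (e). The simulated samples and the sample size n = 20 used in part (e) are hypothetical choices made only for illustration.

```r
set.seed(1)

# (b) A left-skewed sample (negated exponential draws):
# the long left tail pulls the mean below the median.
left_skewed <- -rexp(1000)
c(mean = mean(left_skewed), median = median(left_skewed))

# (c) y is a linear transformation of x, so qqplot(x, y) is a perfectly
# straight line, yet x and y have different locations and spreads.
x <- rnorm(200)
y <- 5 + 2 * x
qq <- qqplot(x, y, plot.it = FALSE)   # sorted x paired with sorted y
cor(qq$x, qq$y)                       # 1 up to rounding error
c(mean_x = mean(x), mean_y = mean(y))
c(sd_x = sd(x), sd_y = sd(y))

# (e) Upper-tail p-value for t = 1.28 with a hypothetical n = 20 (df = 19)
n <- 20
p_value <- 1 - pt(1.28, df = n - 1)
p_value_upper <- pt(1.28, df = n - 1, lower.tail = FALSE)  # same quantity
p_value
```

For (c), a straight line only says the two distributions have the same shape up to location and scale; they are nearly identical only if the points also follow the line y = x.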

2. Write R code to complete the following tasks (35 pts, 5 pts each. For a question requiring interpretation, 3 pts for code and 2 pts for interpretation).

(a) Suppose there is a matrix B in the R console. Write R code to subtract from each row of B that row's mean value, and name the resulting matrix C (so that its row sums are equal to zero).

Solution:

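> row_means = rowMeans(B)

# or

> row_means = apply(B, 1, mean)

# or (works only when B has exactly two rows)

> row_means = c(mean(B[1, ]), mean(B[2, ]))

> C = B - row_means   # recycled down the columns, so row i has row_means[i] subtracted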
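A small check that the recycling in C = B - row_means does what is intended. The 3 x 4 matrix below is a hypothetical stand-in for B, and sweep() is shown as an explicit alternative that states the row-wise subtraction directly.

```r
# Hypothetical 3 x 4 matrix standing in for B
B <- matrix(1:12, nrow = 3)

row_means <- rowMeans(B)

# A vector of length nrow(B) is recycled down each column,
# so every entry in row i has row_means[i] subtracted.
C <- B - row_means
rowSums(C)   # all zero

# sweep() with MARGIN = 1 subtracts (default FUN = "-") the statistic of each row
C_sweep <- sweep(B, 1, row_means)
all.equal(C, C_sweep)
```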


(b) Write R code to i) generate 1000 random values from the standard normal distribution and put them in variable x; ii) calculate the mean, variance, standard deviation and IQR of x; iii) visualize the distribution of x using a histogram and then plot the density of the standard normal distribution between -3 and 3 over it, as shown in the figure below.

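[Figure: "Histogram of x", with Density (0.0 to 0.4) on the vertical axis and x from -3 to 3 on the horizontal axis, overlaid with the standard normal density curve.]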

Solution:

> x = rnorm(1000)

> mean(x); var(x); sd(x); IQR(x)

> hist(x, freq = FALSE)

# or

> hist(x, breaks = 14, freq = FALSE)

> curve(dnorm(x), from = -3, to = 3, add = TRUE)   # overlay the N(0, 1) density
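For reference, the sample summaries in part ii) can be set beside their theoretical N(0, 1) values: mean 0, variance 1, standard deviation 1 and IQR = qnorm(0.75) - qnorm(0.25), about 1.349. The seed below is an arbitrary choice so that the run is reproducible.

```r
set.seed(2018)   # arbitrary seed for reproducibility
x <- rnorm(1000)

sample_stats <- c(mean = mean(x), var = var(x), sd = sd(x), IQR = IQR(x))
theory <- c(mean = 0, var = 1, sd = 1, IQR = qnorm(0.75) - qnorm(0.25))
round(rbind(sample = sample_stats, theory = theory), 3)

# Histogram on the density scale with the N(0, 1) density overlaid on [-3, 3]
hist(x, freq = FALSE, main = "Histogram of x")
curve(dnorm(x), from = -3, to = 3, add = TRUE)
```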

(c) Two sets of daily maximum temperature data, Temp1 and Temp2, were collected from two cities over the past 3 months. Suppose that variables Temp1 and Temp2 exist in the R console. Explain how to graphically check which of the two samples is more likely to be normally distributed, and write the corresponding R code.

Solution:

> qqnorm(Temp1)

> qqline(Temp1)

> qqnorm(Temp2)

> qqline(Temp2)

Compare the two normal Q-Q plots: the sample whose points lie closer to its reference line (drawn by qqline) is more likely to be normally distributed.
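The sketch below runs the same check on simulated data. Temp1 and Temp2 here are hypothetical (a roughly symmetric sample and a right-skewed one, 90 days each). As an optional numerical companion to reading the plots, it also reports the correlation between the theoretical and sample quantiles; a value closer to 1 means the points lie closer to a straight line.

```r
set.seed(3511)
Temp1 <- rnorm(90, mean = 31, sd = 1.5)   # hypothetical, roughly normal
Temp2 <- 28 + rexp(90, rate = 0.5)         # hypothetical, right-skewed

# Side-by-side normal Q-Q plots with reference lines
par(mfrow = c(1, 2))
qqnorm(Temp1, main = "Temp1"); qqline(Temp1)
qqnorm(Temp2, main = "Temp2"); qqline(Temp2)
par(mfrow = c(1, 1))

# Correlation between theoretical and sample quantiles
qq_cor <- function(v) {
  q <- qqnorm(v, plot.it = FALSE)
  cor(q$x, q$y)
}
c(Temp1 = qq_cor(Temp1), Temp2 = qq_cor(Temp2))
```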

(d) A previous health study found that 5% of the population suffered from a blood disease. Recently, 23 people randomly selected from an area near a mobile phone transmitter had a medical exam, and 3 of them were found to have the disease. The locals believe that the transmitter increases the likelihood of having the disease. A researcher wants to perform a formal hypothesis test of whether the mobile phone transmitter increases the incidence of the disease. State H0 and HA for this test, write R code to test the hypotheses, and explain how you would make a decision and draw a conclusion.

Solution:

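> prop.test(3, 23, p = 0.05, alternative = "greater")

H0: p = 0.05 vs H1: p > 0.05, where p is the proportion of people living near the transmitter who have the disease.

Because the expected number of cases under H0 is only 23 × 0.05 = 1.15, the normal approximation behind prop.test is poor here (R warns that the chi-squared approximation may be incorrect). The exact binomial test is preferable:

> binom.test(3, 23, p = 0.05, alternative = "greater")

Decision: if the p-value reported by the test is less than α, reject H0 at level α; otherwise, do not reject H0.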

Conclusion: If H0 is rejected, we conclude that the mobile phone transmitter significantly increases the incidence of the disease. Otherwise, we conclude that there is no significant evidence that the mobile phone transmitter increases the incidence of the disease.
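Under H0 the number of cases X among the 23 people follows Bin(23, 0.05), so the exact one-sided p-value is P(X >= 3). The sketch computes it with binom.test and, as a cross-check, directly from pbinom.

```r
n <- 23; cases <- 3; p0 <- 0.05

n * p0   # expected number of cases under H0: only 1.15

# Exact one-sided binomial test of H0: p = 0.05 vs H1: p > 0.05
p_exact <- binom.test(cases, n, p = p0, alternative = "greater")$p.value

# Same quantity by hand: P(X >= 3) = 1 - P(X <= 2) for X ~ Bin(23, 0.05)
p_manual <- 1 - pbinom(cases - 1, n, p0)

c(exact = p_exact, manual = p_manual)
```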


(e) Variable "Score" in the R console contains the math exam scores of a random sample of 120 primary-five students. Write R code to construct a 98% confidence interval for the mean math score of the underlying population. State the assumption(s) required in your construction of the confidence interval.

Solution: Use a t-confidence interval:

> t.test(Score, conf.level = 0.98)

To use the t-interval, we need the distributional assumption that the scores are a random sample from a normal distribution.

Or we can use a z-interval:

> L = mean(Score) - qnorm(0.99) * sd(Score) / sqrt(120)

> R = mean(Score) + qnorm(0.99) * sd(Score) / sqrt(120)

> cat("98% CI = (", L, ",", R, ")\n")

The z-interval relies only on the assumptions needed for the CLT: the scores are independent observations from a common distribution with finite first and second moments (mean and variance), and the sample size n = 120 is large enough for the sample mean to be approximately normal.

Note that the normality assumption is not necessary in this case.
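To see how the two intervals compare, the sketch below applies both to a hypothetical Score vector of 120 simulated marks. Both intervals are centred at the sample mean; the t-interval is slightly wider because qt(0.99, 119) exceeds qnorm(0.99).

```r
set.seed(120)
Score <- round(rnorm(120, mean = 68, sd = 12))   # hypothetical exam scores

# t-interval at 98% confidence
ci_t <- as.numeric(t.test(Score, conf.level = 0.98)$conf.int)

# z-interval at 98% confidence (CLT-based, sd estimated from the sample)
half_width <- qnorm(0.99) * sd(Score) / sqrt(length(Score))
ci_z <- mean(Score) + c(-1, 1) * half_width

rbind(t = ci_t, z = ci_z)
c(t_quantile = qt(0.99, df = 119), z_quantile = qnorm(0.99))
```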

(f) Suppose you have a set of data collected from a study that aims to explore whether hair color and gender are dependent. The raw data are given in the following table. State the null and alternative hypotheses for this problem and write R code to perform an appropriate hypothesis test based on the raw data.

Subject       1     2     3      4     5     6     7      8     9     10
Hair color    BRN   BRN   BLOND  BRN   RED   RED   BLOND  BRN   RED   BLOND
Gender        F     M     M      M     M     F     M      M     F     F

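Solution:

> gender = c("F", "M", "M", "M", "M", "F", "M", "M", "F", "F")

> hair = c("BRN", "BRN", "BLOND", "BRN", "RED", "RED", "BLOND", "BRN", "RED", "BLOND")

> D = table(gender, hair)

> chisq.test(D)

H0: Hair color and gender of a subject are independent

against

H1: Hair color and gender are dependent.

With only 10 subjects, every expected cell count is below 5, so chisq.test warns that the approximation may be incorrect. Fisher's exact test of the same hypotheses avoids this problem:

> fisher.test(D)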
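The sketch below rebuilds the contingency table from the raw data and prints the expected counts under independence, which shows why the chi-squared approximation is shaky with only 10 subjects. Fisher's exact test is included as the small-sample alternative.

```r
gender <- c("F", "M", "M", "M", "M", "F", "M", "M", "F", "F")
hair <- c("BRN", "BRN", "BLOND", "BRN", "RED", "RED", "BLOND", "BRN", "RED", "BLOND")

D <- table(gender, hair)   # 2 x 3 table: rows F/M, columns BLOND/BRN/RED
D

# Expected counts under H0 (independence); all are below 5
chi <- suppressWarnings(chisq.test(D))
chi$expected

# Exact test of the same hypotheses, valid for small counts
fisher.test(D)$p.value
```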


(g) Breast cancer patients were matched in pairs on age and cancer stage. By random assignment, one patient in each matched pair received treatment-M followed by chemotherapy, while the other patient in the pair received chemotherapy only. After 5 years of follow-up, the data collected from this study are summarized in the table below.

                                     Chemo only
                                     Survived 5 years    Died within 5 years
Chemo+M   Survived 5 years           510                 17
          Died within 5 years        5                   90

To answer the question "Does survival to 5 years differ by treatment group?", one applied McNemar's test in R and obtained the following output:

> mcnemar.test(cancer, correct = FALSE)

McNemar's Chi-squared test

data: cancer

McNemar's chi-squared = 6.5455, df = 1, p-value = 0.01052

ii) What conclusion can you draw from the given output at level α = 0.05?

Solution: H0: The probability of surviving 5 years is the same under both treatments (marginal homogeneity; equivalently, the two types of discordant pairs are equally likely).

HA: The 5-year survival probabilities of the two treatments differ.

Decision: Reject H0, since the p-value = 0.0105 < α = 0.05. Equivalently, by the rejection region method: $\chi^2 = 6.5455 > \chi^2_{1, 0.05} = 3.84$.

Conclusion: The data provide evidence at level 0.05 that 5-year survival differs by treatment group: 527 of the 622 patients receiving chemo combined with treatment-M survived 5 years, compared with 515 of the 622 patients receiving chemo alone.
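McNemar's statistic uses only the discordant pairs (17 and 5). The sketch rebuilds the table as a matrix named cancer, with the layout assumed from the table above (rows Chemo+M, columns Chemo only), and reproduces the reported statistic and p-value by hand before calling the built-in test.

```r
cancer <- matrix(c(510, 5, 17, 90), nrow = 2,
                 dimnames = list(ChemoM = c("Survived", "Died"),
                                 ChemoOnly = c("Survived", "Died")))

n12 <- cancer["Survived", "Died"]   # survived with chemo+M, died with chemo only: 17
n21 <- cancer["Died", "Survived"]   # died with chemo+M, survived with chemo only: 5

stat <- (n12 - n21)^2 / (n12 + n21)  # uncorrected McNemar statistic: 144 / 22
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)
c(statistic = stat, p_value = p_value)

# Built-in test for comparison (no continuity correction)
mcnemar.test(cancer, correct = FALSE)
```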
