
STA255 Statistical Theory

Assignment #1 (/36 marks)
Due through a Portal Test at 10 pm, Wednesday, Feb 8, 2017


Question 1 (11 marks)

The file body.csv* (posted on Portal) contains several measurements on 507 physically active adults (247 men
and 260 women), most considered to be within a healthy weight range. The measurements include:
Age (years)
Weight (kg)
Height (cm)
Gender (coded as 1 = male, 0 = female)
Save these data to your computer and upload them into R. Use R to create appropriate numerical and graphical
summaries to explore distributions and associations between variables.

a. (4 marks) Compare the distributions of heights for males and females in the sample. Be sure to make
explicit references to the appropriate R output/plots you created (i.e., state exactly what
output/plot(s) you used as well as your interpretation of them).

b. (5 marks) Body Mass Index (BMI) can be computed as mass (or, weight in the data) in kilograms
divided by height in metres squared ($\text{BMI} = \text{weight} / \text{height}^2$). Use the data to compute and store BMIs
for everyone in the sample in a new R variable called BMI and produce appropriate numerical and
graphical summaries of BMI.
i. Describe the centre, spread and shape of the BMI distribution in the sample. Be sure to make
explicit references to the appropriate R output/plots you created (i.e., state exactly what
output/plots you used as well as your interpretation of them).
ii. BMI is often used to classify a person as underweight (BMI < 18.5 kg/m²), normal weight (18.5
kg/m² ≤ BMI < 25 kg/m²), overweight (25 kg/m² ≤ BMI < 30 kg/m²), or obese (BMI ≥ 30 kg/m²).
Use R to determine how many people in the sample would be classified as being in the normal
weight range. Report this number. Knowing that most of the individuals in the sample were
considered to be within a healthy (i.e., normal) weight range, what does this suggest about the
BMI classifications?

c. (2 marks) Copy and paste all your R code and output from your R console window that you used to
answer parts a-c of this question. Include only the working code/output (i.e., remove any lines of code
that didn't work as well as their error messages) and only include output that directly relates to your
answers to this question.
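
The BMI steps in part b can be sketched as follows. This is only an illustration under stated assumptions: the four rows are made-up toy values standing in for body.csv, and the column names (Age, Weight, Height, Gender) are guesses at how the file is laid out, so they may need adjusting after reading the real data.

```r
# Toy stand-in for body.csv (hypothetical values and column names).
body <- data.frame(
  Age    = c(21, 23, 28, 35),
  Weight = c(65.6, 71.8, 80.7, 58.0),     # kg
  Height = c(174.0, 175.3, 193.5, 160.0), # cm
  Gender = c(1, 1, 1, 0)                  # 1 = male, 0 = female
)

# Height is recorded in cm, so convert to metres before squaring.
body$BMI <- body$Weight / (body$Height / 100)^2

# Numerical summaries of centre and spread.
summary(body$BMI)
sd(body$BMI)

# Count people in the normal range: 18.5 <= BMI < 25.
normal <- body$BMI >= 18.5 & body$BMI < 25
sum(normal)
```

With the real file, the data frame would come from something like read.csv("body.csv"), and hist() or boxplot() on the BMI column would supply the graphical summaries.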

*Data adapted from Journal of Statistics Education Data Set and Story http://ww2.amstat.org/publications/jse/datasets/body.txt.
STA255 Assignment 1, Spring 2017
Question 2 (14 marks)

The maximum patent life for a new drug is seventeen years, but the testing and approval processes with
the US Food and Drug Administration (FDA) can take years. The actual patent life for the drug (i.e., the length
of time that the pharmaceutical company has to recover costs and to make a profit) can be thought of as
maximum patent life less approval process time. Suppose the following table summarizes the distribution of
actual patent lives for all new drugs.

x (years)   3     4     5     6     7     8     9     10    11    12    13
p(x)        0.03  0.05  0.07  0.10  0.14  0.20  0.18  0.12  0.07  0.03  0.01
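
For a discrete distribution given as a table, the mean is the probability-weighted sum of the values, and the variance is the probability-weighted sum of squared deviations from that mean. A minimal sketch of entering the table in R and applying those two definitions follows; the variable names (x, p, mu, sigma2) are illustrative choices, not part of the assignment.

```r
# Values and probabilities copied from the table above.
x <- 3:13
p <- c(0.03, 0.05, 0.07, 0.10, 0.14, 0.20, 0.18, 0.12, 0.07, 0.03, 0.01)

# A valid pmf must sum to 1 (all.equal tolerates floating-point error).
stopifnot(isTRUE(all.equal(sum(p), 1)))

# Mean: sum of x * p(x).
mu <- sum(x * p)

# Variance: sum of (x - mu)^2 * p(x).
sigma2 <- sum((x - mu)^2 * p)

mu
sigma2
```

A probability for any event is then the sum of p over the x values in that event, for example sum(p[x > 8]) for P(X > 8).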

a. (6 marks) Compute $\mu$, $\sigma^2$, and P(X 10). Explain your steps (e.g., since we are limited to text in the
Portal test, you can describe your calculations in words, or express the formulas using plain text,
brackets and arithmetic operations).

b. (6 marks) Use R to generate observed values of the random variable that follows the probability
distribution given above and estimate E(X), V(X) and P(X 10) based on 10, 100, and 10,000
repetitions of the experiment (i.e., fill in the following table).
Estimate of   10 repetitions   100 repetitions   10,000 repetitions
E(X)
V(X)
P(X 10)
Report your results (complete the table above and copy and paste it directly into the Portal test) and
comment on how each set of estimates compares to the values you computed in Question 2 part a.

c. (2 marks) Copy and paste all your R code and output from your R console window that you used to
answer part b of this question. Include only the working code/output (i.e., remove any lines of code
that didn't work as well as their error messages) and only include output that directly relates to your
answers to this question.
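
One possible way to generate the repetitions in part b is base R's sample() with its prob argument, which draws values with the given probabilities. The sketch below is an assumption about approach, not the required method: the helper name estimate and the seed are arbitrary, and only the mean and variance are shown.

```r
set.seed(255)  # arbitrary seed, used only to make the sketch reproducible

x <- 3:13
p <- c(0.03, 0.05, 0.07, 0.10, 0.14, 0.20, 0.18, 0.12, 0.07, 0.03, 0.01)

# Draw `reps` observations from the pmf and summarise them.
estimate <- function(reps) {
  draws <- sample(x, size = reps, replace = TRUE, prob = p)
  c(mean = mean(draws), var = var(draws))
}

# One column per number of repetitions: 10, 100, 10,000.
estimates <- sapply(c(10, 100, 10000), estimate)
estimates
```

A probability estimate can be added inside estimate() as the mean of a logical comparison on draws, in the same way the sample code in Question 3 uses mean(binomreps > pn).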

Question 3 (11 marks)

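Suppose we are sampling n individuals randomly without replacement from a population of N = 100 individuals
with proportion of successes p. X follows a hypergeometric distribution with pmf:

$$p(x) = \begin{cases} \dfrac{\binom{Np}{x}\binom{N-Np}{n-x}}{\binom{N}{n}}, & \max(0,\, n-N+Np) \le x \le \min(n,\, Np) \\ 0, & \text{otherwise} \end{cases}$$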
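
As a sanity check on the hypergeometric pmf, the binomial-coefficient form can be compared with R's built-in dhyper(). The sketch assumes p = 0.5 and n = 10 purely for illustration; m (the number of successes in the population) is a helper name introduced here.

```r
N <- 100           # population size
p <- 0.5           # proportion of successes (illustrative choice)
n <- 10            # sample size (illustrative choice)
m <- round(p * N)  # number of successes in the population

# Support of X: max(0, n - (N - m)) up to min(n, m).
x <- max(0, n - (N - m)):min(n, m)

# pmf written out with binomial coefficients.
by_formula <- choose(m, x) * choose(N - m, n - x) / choose(N, n)

# Same pmf from the built-in: dhyper(x, successes, failures, sample size).
built_in <- dhyper(x, m, N - m, n)

all.equal(by_formula, built_in)
sum(by_formula)
```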

How well do Binomial probabilities approximate Hypergeometric probabilities? Conduct a simulation study
in R to compare probability estimates based on the Binomial Distribution to those based on the Hypergeometric
Distribution (the true distribution of the number of successes in the sample in this situation). In this simulation
study, you will vary the sample size (n), relative to the population size (N) and estimate P(X > np).

a. (6 marks) Use R to simulate repetitions of Hypergeometric and Binomial experiments and use the
observed values of the random variables to estimate P(X > np). Use 5000 repetitions. Fill in each of
the tables below and copy and paste your tables into the Portal assignment.
Table 3.1: Low proportion of successes in the population (p = 0.2)
Estimate of P(X > np)   n/N = 0.01   0.05   0.1   0.25   0.5   0.75   0.9
Hypergeometric
Binomial

Table 3.2: Moderate proportion of successes in the population (p = 0.5)
Estimate of P(X > np)   n/N = 0.01   0.05   0.1   0.25   0.5   0.75   0.9
Hypergeometric
Binomial

Table 3.3: High proportion of successes in the population (p = 0.8)
Estimate of P(X > np)   n/N = 0.01   0.05   0.1   0.25   0.5   0.75   0.9
Hypergeometric
Binomial

Here is some sample R code that randomly generates 5000 observations of a Binomial random variable and
5000 observations of a Hypergeometric random variable and uses them to obtain estimates of the probability
P(X > np). Modify this code, as necessary, to fill in the above tables.
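N <- 100                  # population size
n <- round(0.01 * N, 0)   # sample size, as a fraction of N
p <- 0.5                  # proportion of successes in the population
pn <- p * n               # threshold used in P(X > np)
# 5000 Binomial observations (sampling with replacement)
binomreps <- rbinom(5000, n, p)
binomp <- mean(binomreps > pn)
# 5000 Hypergeometric observations (sampling without replacement)
hyperreps <- rhyper(5000, round(p * N, 0), N - round(p * N, 0), n)
hyperp <- mean(hyperreps > pn)
hyperp
binomp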
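
One way to adapt the sample code so that a whole table row pair is produced in a single run is to wrap it in a function and apply it over the sampling fractions. This is a sketch only: the helper compare_one, the object name table_3_2, and the seed are all hypothetical choices, and p = 0.5 corresponds to Table 3.2.

```r
set.seed(255)  # arbitrary seed, used only to make the sketch reproducible

N <- 100
reps <- 5000
fractions <- c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9)  # values of n/N

# Estimate P(X > np) under both models for one sampling fraction.
compare_one <- function(frac, p) {
  n <- round(frac * N)   # sample size
  m <- round(p * N)      # successes in the population
  threshold <- p * n
  c(hyper = mean(rhyper(reps, m, N - m, n) > threshold),
    binom = mean(rbinom(reps, n, p) > threshold))
}

# Rows: hyper, binom. Columns: one per sampling fraction.
table_3_2 <- sapply(fractions, compare_one, p = 0.5)
colnames(table_3_2) <- fractions
round(table_3_2, 3)
```

Calling the same function with p = 0.2 or p = 0.8 would give the rows for the other two tables.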

b. (3 marks) Review your simulation results in Question 3 part a. Based on these, what can you conclude
about the appropriateness of using the Binomial Distribution (i.e., treating the experiment as sampling
with replacement) when the true distribution of the number of successes is Hypergeometric (i.e., when
you are actually sampling without replacement)? Refer to any patterns that you observed in your
simulation to justify your answer.

c. (2 marks) Copy and paste all your R code and output from your R console window that you used to
answer part a of this question. Include only the working code/output (i.e., remove any lines of code
that didn't work as well as their error messages) and only include output that directly relates to your
answers to this question.
