
Week 5:
Bivariate distribution: the distribution of a pair of random variables (X,Y). P(X≤170 ∩ Y≤50) is an example of a probability that can be derived from a bivariate distribution f(x,y).
• When X and Y are independent RVs, their bivariate distribution has the simple form f(x,y)=f(x)f(y).

Conditional probabilities P(Y=y | X=4) for the Tutorial Q2 data (Q7):
>P04=sum(Q7$probabilities[Q7$y==0 & Q7$x==4])/sum(Q7$probabilities[Q7$x==4])
>P14=sum(Q7$probabilities[Q7$y==1 & Q7$x==4])/sum(Q7$probabilities[Q7$x==4])
>P24=sum(Q7$probabilities[Q7$y==2 & Q7$x==4])/sum(Q7$probabilities[Q7$x==4])
>P34=sum(Q7$probabilities[Q7$y==3 & Q7$x==4])/sum(Q7$probabilities[Q7$x==4])

Descriptive statistics in R:
>library("datasets") # Load the package for illustrative datasets
>chick<-ChickWeight #For detail, ?ChickWeight
>mean(chick$weight)
>var(chick$weight)
>sd(chick$weight)
>range(chick$weight) # returns the minimum and maximum
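A minimal R sketch of the independence condition f(x,y)=f(x)f(y): it builds a hypothetical joint probability table for X (rows) and Y (columns), recovers the marginals with rowSums() and colSums(), and tests whether every cell equals the product of its marginals. The probabilities and the helper is_independent() are made up for illustration only.

```r
# Hypothetical joint pmf of X (rows) and Y (columns); values are illustrative only
px <- c(0.2, 0.5, 0.3)              # assumed marginal of X, x = 0, 1, 2
py <- c(0.6, 0.4)                   # assumed marginal of Y, y = 0, 1
joint <- outer(px, py)              # product form, so X and Y are independent by construction
dimnames(joint) <- list(x = 0:2, y = 0:1)

fx <- rowSums(joint)                # marginal f(x): sum over y
fy <- colSums(joint)                # marginal f(y): sum over x

# Hypothetical helper: TRUE iff every cell equals f(x) * f(y) (up to rounding)
is_independent <- function(f) {
  isTRUE(all.equal(f, outer(rowSums(f), colSums(f)), check.attributes = FALSE))
}
is_independent(joint)               # TRUE

# A table that is not of product form: knowing X tells you Y exactly
dep <- matrix(c(0.3, 0.0,
                0.0, 0.7), nrow = 2, byrow = TRUE)
is_independent(dep)                 # FALSE
```

The comparison uses all.equal() rather than == because sums of probabilities are subject to floating-point rounding.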
Loading the csv file into R
1. car=read.csv(file.choose(),header=T) to load the file from your computer
2. car$probabilities=car$frequencies/100 to convert frequencies to probabilities (divide by the total frequency, 100 in this example)
3. To calculate P(X=1): sum(car$probabilities[car$x==1])
4. To calculate P(Y>=2): sum(car$probabilities[car$y>=2])

Conditional mean of Y given X=4 (Tutorial Q2), using P04 to P34:
>EY_X4=sum(c(0,1,2,3)*c(P04,P14,P24,P34))

Covariance and correlation from data:
1. xad_sale=read.csv(file.choose(),header=T)
2. Use cov() to obtain the covariance and cor() to obtain the correlation coefficient, e.g. cov(xad_sale$sale,xad_sale$xad) and cor(xad_sale$sale,xad_sale$xad)
3. Use plot(xad_sale$sale,xad_sale$xad) to generate a scatterplot of these 2 variables

Q3. Suppose that X and Y are negatively correlated. Is Var(X+Y) larger or smaller than Var(X-Y)?
Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y) and Var(X-Y)=Var(X)+Var(Y)-2Cov(X,Y). Since Cov(X,Y)<0, Var(X+Y) is smaller than Var(X-Y).

>cor(cbind(chick$weight,chick$Time)) # 2x2 correlation matrix of weight and Time
The 1st quartile (Q1) is the 0.25 quantile (or 25th percentile)
The 3rd quartile (Q3) is the 0.75 quantile (or 75th percentile)
The interquartile range (IQR) is Q3 - Q1
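file.choose() opens an interactive dialog, so this R sketch swaps the car CSV for a small hypothetical data frame with the assumed columns x, y and frequencies (one row per (x,y) pair) and runs steps 2 to 4 on it. Dividing by sum(car$frequencies) is equivalent to dividing by 100 whenever the frequencies total 100, as they do in this made-up table.

```r
# Hypothetical frequency table standing in for the car CSV file
car <- data.frame(
  x           = c(0, 0, 1, 1, 2, 2),
  y           = c(1, 2, 1, 3, 2, 3),
  frequencies = c(10, 15, 20, 25, 20, 10)   # totals 100
)

# Step 2: convert frequencies to probabilities (divide by the total count)
car$probabilities <- car$frequencies / sum(car$frequencies)

# Step 3: P(X = 1) adds up the probabilities of every row with x == 1
p_x1 <- sum(car$probabilities[car$x == 1])

# Step 4: P(Y >= 2) adds up the probabilities of every row with y >= 2
p_y2 <- sum(car$probabilities[car$y >= 2])

p_x1   # 0.45
p_y2   # 0.7
```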
Covariance of X and Y:
Cov(X,Y) = E[(X-μx)(Y-μy)]
(X-μx): deviation of X from its mean
(Y-μy): deviation of Y from its mean

Conditional distribution (of Y given X=x): f(y|x) = f(x,y)/f(x), provided f(x) > 0
Conditional expectation: E(Y|X=x) = Σ y·f(y|x), summing over all values y

Q4. Suppose that X and Y are random variables such that Var(X)=9, Var(Y)=4, and ρ(X,Y)=-1/6. Determine:
(1) Var(X+Y)
(2) Var(X-3Y+4)
Cov(X,Y) = ρ(X,Y)σxσy = -1/6 × 3 × 2 = -1
(1) Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)=9+4-2=11
(2) Var(X-3Y+4)=Var(X)+9Var(Y)-2×3×Cov(X,Y)=9+36+6=51 (the constant 4 does not affect the variance)

>summary(chick$weight) # conveniently gives a summary table of the minimum, 1st and 3rd quartiles, median, mean and maximum
>quantile(chick$weight) # returns the 0%, 25%, 50%, 75% and 100% quantiles (min, Q1, median, Q3, max)
>quantile(chick$weight,probs=0.5) # Q2 = median
>quantile(chick$weight,probs=0.25) # Q1
>quantile(chick$weight,probs=0.75)-quantile(chick$weight,probs=0.25) # the IQR

Positive correlation: if an upward swing of X tends to be accompanied by an upward swing of Y → Cov(X,Y) > 0 → positively correlated
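The conditional formulas can be applied to every value of x at once instead of writing one P04-style line per cell. This R sketch uses a hypothetical joint pmf in the same long layout as the tutorial data (columns x, y, probabilities): ave() attaches the marginal f(x) to each row, dividing gives f(y|x), and tapply() sums y·f(y|x) within each x to get E(Y|X=x). The table values are invented for illustration.

```r
# Hypothetical joint pmf in long format: one row per (x, y) pair
jp <- data.frame(
  x             = c(1, 1, 1, 2, 2, 2),
  y             = c(0, 1, 2, 0, 1, 2),
  probabilities = c(0.10, 0.20, 0.10, 0.30, 0.20, 0.10)
)

# Marginal f(x) = sum over y of f(x,y), repeated on every row with that x
jp$fx <- ave(jp$probabilities, jp$x, FUN = sum)

# Conditional pmf f(y|x) = f(x,y) / f(x); within each x these sum to 1
jp$cond <- jp$probabilities / jp$fx

# Conditional mean E(Y|X=x) = sum over y of y * f(y|x), one value per x
EY_given_X <- tapply(jp$y * jp$cond, jp$x, sum)
EY_given_X   # 1 for x = 1 and 2/3 for x = 2 with these probabilities
```

For a single x this reduces to the EY_X4 calculation: the conditional probabilities are weighted by the y values and summed.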
Negative correlation: if an upward swing of X tends to be accompanied by a downward swing of Y → Cov(X,Y) < 0 → negatively correlated
No correlation: if a swing of X is not accompanied by a systematic swing of Y → Cov(X,Y) = 0 → X and Y are uncorrelated (independent RVs are always uncorrelated, but Cov(X,Y) = 0 alone does not imply independence)
Limitation of covariance: its value depends on the units of measurement of X and Y

Week 5 Tutorial
Q1. Determine each of the following probabilities:
Pr(X=2)
Pr(X≤2 and Y≤2)
Pr(X=Y)
>Q1=read.csv(file.choose(),header=T)
>sum(Q1$probabilities[Q1$x==2])
>sum(Q1$probabilities[Q1$y<=2 & Q1$x<=2])
>sum(Q1$probabilities[Q1$x==Q1$y])

Week 6:
Statistical inference: the process of using partial data (a sample) to infer the properties of the population
1. Modelling assumptions
2. Sampling
3. Making inferences
Sampling is the process of obtaining samples from the population
iid: independently and identically distributed
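Zero covariance does not guarantee independence. A standard counterexample, sketched in R: X takes the values -1, 0 and 1 with probability 1/3 each, and Y = X². Cov(X,Y) computed from the definition is 0, yet Y is completely determined by X. The distribution is chosen purely for illustration.

```r
# X is uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X
x <- c(-1, 0, 1)
p <- rep(1/3, 3)
y <- x^2

mu_x <- sum(x * p)                          # E(X) = 0
mu_y <- sum(y * p)                          # E(Y) = 2/3
cov_xy <- sum((x - mu_x) * (y - mu_y) * p)  # Cov(X,Y) = E[(X-mu_x)(Y-mu_y)]
cov_xy                                      # 0, so X and Y are uncorrelated

# Dependence: given X = 0, Y = 0 for sure, but unconditionally P(Y = 0) is only 1/3
p_y0 <- sum(p[y == 0])
p_x0 <- sum(p[x == 0])

# Joint P(X=0, Y=0) = P(X=0) = 1/3, but P(X=0) * P(Y=0) = 1/9, so f(x,y) != f(x)f(y)
c(joint = 1/3, product = p_x0 * p_y0)
```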
Correlation coefficient is defined as ρ(X,Y) = Cov(X,Y)/(σxσy)
where σx and σy are the standard deviations of X and Y, respectively
ρ(X,Y) takes values between -1 and +1 and is independent of the units of measurement of X and Y
Covariance of X and X: Cov(X,X) = E[(X-μx)²] = Var(X)

Q2. What is the joint probability Pr(X=4,Y=0)?
What is the marginal probability function f(X=4)?
What is the conditional probability function of Y given X=4?
What is the conditional mean of Y given X=4?
>Q7=read.csv(file.choose(),header=T)
>View(Q7)
>sum(Q7$frequencies) # total number of observations
>Q7$probabilities <-Q7$frequencies / 200 # divide by the total (200) to convert to probabilities
>sum(Q7$probabilities[Q7$x==4 & Q7$y==0]) # Pr(X=4,Y=0) = 0.01
>sum(Q7$probabilities[Q7$x==4]) # f(X=4) = 0.21

A statistic is a real-valued function of a random sample: T = g(X1, X2, ..., Xn)
For a size-n random sample X1, ..., Xn:
Measures of central location: sample mean, sample median
Measures of dispersion: sample variance (and standard deviation), sample range (max - min) and interquartile range
- The sample variance divides by n-1 rather than n: s² = Σ(Xi - X̄)²/(n-1)
Measure of association: sample correlation coefficient
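A small R sketch of the covariance and correlation coefficient definitions, using a hypothetical joint pmf in the long (x, y, probabilities) layout from the tutorials. The helpers pmf_cov() and pmf_cor() are illustrative names, not functions from any package. Passing the same variable twice gives Cov(X,X) = Var(X), and the last lines show that rescaling X (a change of units) multiplies the covariance by the same factor while leaving ρ unchanged.

```r
# Hypothetical joint pmf in long format
jp <- data.frame(
  x             = c(0, 0, 1, 1),
  y             = c(0, 1, 0, 1),
  probabilities = c(0.4, 0.1, 0.1, 0.4)
)

# Covariance from the definition Cov(A,B) = E[(A - mu_a)(B - mu_b)]
pmf_cov <- function(a, b, p) {
  mu_a <- sum(a * p)
  mu_b <- sum(b * p)
  sum((a - mu_a) * (b - mu_b) * p)
}

# Correlation coefficient rho = Cov(A,B) / (sigma_a * sigma_b)
pmf_cor <- function(a, b, p) {
  pmf_cov(a, b, p) / sqrt(pmf_cov(a, a, p) * pmf_cov(b, b, p))
}

cov_xy <- pmf_cov(jp$x, jp$y, jp$probabilities)   # 0.15 for this table
var_x  <- pmf_cov(jp$x, jp$x, jp$probabilities)   # Cov(X,X) = Var(X) = 0.25
rho    <- pmf_cor(jp$x, jp$y, jp$probabilities)   # 0.15 / (0.5 * 0.5) = 0.6

# Measuring X in units 100 times smaller scales the covariance by 100, not rho
cov_scaled <- pmf_cov(100 * jp$x, jp$y, jp$probabilities)
rho_scaled <- pmf_cor(100 * jp$x, jp$y, jp$probabilities)

c(cov = cov_xy, var_x = var_x, rho = rho,
  cov_scaled = cov_scaled, rho_scaled = rho_scaled)
```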

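To see the n-1 denominator of the sample variance in practice, this R sketch draws a hypothetical iid sample with rnorm() and checks that var() and sd() agree with s² = Σ(Xi - X̄)²/(n-1), alongside the other sample measures listed above. The seed, sample size and normal distribution are arbitrary choices for illustration.

```r
set.seed(1)                          # arbitrary seed for reproducibility
n <- 30
x <- rnorm(n, mean = 50, sd = 10)    # hypothetical iid sample of size n

x_bar  <- mean(x)                          # sample mean
s2     <- sum((x - x_bar)^2) / (n - 1)     # sample variance with n - 1 denominator
s2_pop <- sum((x - x_bar)^2) / n           # dividing by n instead gives a smaller value

all.equal(s2, var(x))                # TRUE: var() uses n - 1
all.equal(sqrt(s2), sd(x))           # TRUE: sd() is the square root of var()

median(x)                            # sample median
diff(range(x))                       # sample range (max - min)
IQR(x)                               # same as quantile(x, 0.75) - quantile(x, 0.25) with default settings
```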