A High Quality Study Material for Higher Level Exams for U.G. & P.G. Students

Chapter 1: Probability
1.1 Outcomes and Events
1.2 Probability Functions/Measures
1.2.1 Properties of P[·]
1.3 Conditional Probability and Independence

Chapter 2: Random Variables
2.1 Random Variables and Cumulative Distribution Functions
2.2 Density Functions
2.3 Expectations and Moments
2.4 Expectation of a Function of a Random Variable
2.5 Two Important Inequality Results
2.6 Moments and Moment Generating Functions
2.7 Other Distribution Summaries

Chapter 3: Special Univariate Distributions
3.1 Discrete Distributions
3.1.1 Degenerate Distribution
3.1.2 Two Point Distribution
3.1.3 Uniform Distribution on n Points
3.1.4 Binomial Distribution
3.1.5 Negative Binomial Distribution (Pascal or Waiting-time Distribution)
3.1.6 Hyper-Geometric Distribution
3.1.7 Poisson Distribution
3.1.8 Multinomial Distribution (Generalized Binomial Distribution)
3.2 Continuous Distributions
3.2.1 Uniform Distribution
3.2.2 Gamma Distribution
3.2.3 Beta Distribution
3.2.4 Normal Distribution (Gaussian Law)
3.2.5 Cauchy Distribution

Chapter 4: Joint and Conditional Distributions
4.1 Joint Distributions
4.2 Special Multivariate Distributions
4.2.1 Multinomial Distribution
4.2.2 Bi-Variate Normal Distribution
4.3 Conditional Distributions and Densities
4.4 Conditional Expectation
4.5 Conditional Expectations of Functions of Random Variables
4.6 Independence of Random Variables
4.7 Covariance and Correlation
4.8 Transformation of Random Variables: Y = g(X)
4.9 Moment-Generating-Function Technique

Chapter 5: Inference
5.1 Sample Statistics
5.2 Sampling Distributions
5.3 Point Estimation
5.4 Interval Estimation

Chapter 6: Maximum Likelihood Estimation
6.1 Likelihood Function and ML Estimator
6.4 Properties of MLEs

Chapter 7: Hypothesis Testing
7.1 Introduction
7.2 Simple Hypothesis Tests
7.3 Simple Null, Composite Alternative
7.4 Composite Hypothesis Tests

Chapter 8: Elements of Bayesian Inference
8.1 Introduction
8.2 Parameter Estimation
8.3 Conjugate Prior Distributions

Chapter 9: Markov Chains
9.1 Discrete Time Markov Chain

Chapter 10: Miscellaneous
10.1 Reliability Analysis
10.2 Quadratic Forms

Assignment Sheet-1
Assignment Sheet-2
Assignment Sheet-3
Assignment Sheet-4

CHAPTER 1
PROBABILITY

1.1. Outcomes and Events
We consider experiments which comprise: a collection of distinguishable outcomes, which are termed elementary events and are typically denoted by ω, and a collection of sets of possible outcomes to which we might wish to assign probabilities, A, the events.

In order to obtain a sensible theory of probability, we require that our collection of events A is an algebra over Ω, i.e. it must possess the following properties:
(i) Ω ∈ A
(ii) If A₁ ∈ A, then A₁ᶜ ∈ A
(iii) If A₁ and A₂ ∈ A, then A₁ ∪ A₂ ∈ A.

In the case of finite Ω, we might note that the collection of all subsets of Ω necessarily satisfies the above properties, and by using this default choice of algebra we can assign probabilities to any possible combination of elementary events.

Proposition 1.1: If A is an algebra, then ∅ ∈ A.

Proposition 1.2: If A₁ and A₂ ∈ A, then A₁ ∩ A₂ ∈ A for any algebra A.

Proposition 1.3: If A is an algebra and A₁, A₂, ..., Aₙ ∈ A, then ⋃ᵢ₌₁ⁿ Aᵢ ∈ A.

1.2. 
Probability Funetions/Measures Let © denote the sample space and a denote a collection of events assumed to be a o- algebra, : : i Definition 1.1: (Probability Function): A probability function Pf) is a set function with domain 4 (a o- algebra of events) and range [0,1], ie P: a> (0, 1], which satisfies the following axioms (i) P[A}20 forevery dea Gi) PIQ]=1 1 Git) 1 4,4... 18 sequence of mutually exclusive events (ie. 4,14, = Al for any i+ j) in a andif (4A, then o[G4]-Seta a 1.2.1. Properties of P[-] A remarkably rich theory emerges fiom these three axioms (together, of course, with those of set theory). Indeed, all formal probability follows as a logical consequence of these axioms. Some of the most important simple results are summarised here. Throughout this section, assume that ois ou collection of possible outcomes 4 is a c- algebra over a and Pf] is an ‘asgaciated probability distribution i (3 Fi Sa es Na Ne DOL Pa PSI, CO IRD A GTI STP Eamallallgncademcom: Wel: eigen om 13. Many of these results simply demonstrate that things which we would intuitively want to be truc of probabilities do, indeed, arise as logical consequences of this simple axiomatic framework. Pidl= Proposition 1.5: If 4.,4s,..4, are pairwise disjoint elements of 4, corresponding to mutually exclusive outcomes in our experiment, then Proposition 1. PA UGiU-.U4]= SPA] Proposition 1.6: If 4.4 then fA) Proposition 1.7; For any two events 4,B0, [3] and is feft undefined when P[3]=0 Exercisel 3.1: Consider the experiment of tossing two coins, 2={(HH)(H.T)(T.H)(T.T)}, and assume that each point is equally likely. Find (i) ‘The probability of two heads given a head on the first coin. (ii) The probability of two heads given at least one head. Theorem 1.1: (Law of Total Probability): For a given probability space (Q.AP[]). if 8,5, iS @ collection of mutually disjoint events in A satisfying a-(a,, Rut Your Own Notes —_———_- ( ADE, et Fo) Sra Hass Ka, Nee LET. Now Deb O06, Pa oi 2es78 Cal: 9 918844 OPPI6ITR, AREAETOD B Ips. ie. By..B, partition a and P[B)|>9, j=l,..,n, then forevery 4e a, Pl] vang,} Conditional probability has @ number of useful properties. The following elementary result is surprisingly important and has some farteaching consequences. Theorem 1.2 (Bayes? Formul: ‘or a given probability space (2,4,P(}]), if 4,Be A are such that P[4]>0,P[2]>0, then: (3) 4)P[4} ¥[3] Theorem 1.3 (Partition Formula): If &,...B, ¢A. partition then for any AeA lala) P(Ay=YP(418,)P(B) Theorem 1.4 (Multiplication Rule): For a~given probability space {O,AP[]), let Asondy be events belonging toa for Which P[4,...4y.,]>0 «then P[A Ay dy] = PL Az ABE A, Definition 1.4 (Independent Events): For @ given probability space (Q.AP[}), let « and 2 be two events ina. Events 4 and a are defined to be independent iff one of the following conditions is satisfied GO) Plang]=P[4]P[a] (ii) P[aj a) =P[4] if P[B}>0 (ii) Pa} 4]=P[a] if r[4]>0 Bxercise 1.3.2: Consider the experiment of rolling two dice. Let A= {total is odd}, B= {6 on the fist die}, C'= {total is seven}. (i) Are 4 and 8 independent? (ii) Are 4 and C independent? (i Definition 1.5 (Independence of Several Events): For a given probability space (QA,P[]), let Ayndy be events ina. Events 4,...4y are defined to de independent iff @ Plana, ]=Pall4 ies Gi) P44 na ]= PLA PLA, PLA hie nse kiek ‘Are # and C independent? 
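Because the 36 outcomes in Exercise 1.3.2 are equally likely, the product condition of Definition 1.4 can be checked for each pair of events by direct enumeration. The short Python sketch below does exactly that; the code, names and use of exact fractions are illustrative additions and not part of the original text.

```python
# Exhaustive check of the product condition of Definition 1.4 for the events
# of Exercise 1.3.2, using the 36 equally likely outcomes of two dice.
# Exact fractions are used so the equalities are tested without rounding.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))        # (first die, second die)

def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: (w[0] + w[1]) % 2 == 1                   # total is odd
B = lambda w: w[0] == 6                                # 6 on the first die
C = lambda w: w[0] + w[1] == 7                         # total is seven

for name, E, F in [("A and B", A, B), ("A and C", A, C), ("B and C", B, C)]:
    joint = prob(lambda w: E(w) and F(w))
    print(name, "-> P[E n F] =", joint, ", P[E]P[F] =", prob(E) * prob(F),
          ", independent:", joint == prob(E) * prob(F))
```

Enumerating the sample space in this way is only practical because Ω is small; the point is that independence is verified (or refuted) purely from the probability function, with no appeal to intuition.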
ned] = Tart 2A, Feat wt) Ha Saal res Kas, Near LT, New Dei rm fete com Che: or: digegademy coe (oni, Ph (i) 16757, Cle FRG 8 HRLGLEG SET BB Ty CHAPTER 2 RANDOM VARIABLES 2.1 Random Variables and Cumulative Distribution Functions We considered random events in the previous chapter: experimental outcomes which either do or do not occur. In general we cannot predict whether or not a random event will or will not occur before we observe the outcome of the associated experiment — although if we know enough about the experiment we may be able to make good probabilistic predictions. The natura! generalization of a random event is a random variable: an object which can take values in the set of real numbers (rather: than simply happening or not happening) for which the precise value which it takes is not known before the experiment is observed The following definition may seem a litle surprising if you've seen probability only outside of measure-theoretic settings in the past. In particular, random variables are deterministic functions: neither random nor variable in themselves. This definition is rather convenient; all randomness stems from the underlying probability space and it is clear that random variables and random events are closely related. This definition also. makes it straightforward to define multiple random variables related to a single experiment and to investigate and model the relationships between them. Definition 2.1 (Random Variable): Given a prohsbility space (0, 4,P[]), a random variable, X, is a function with domain a and co-domain r (the real line) (ie, X:0> R) Example 2.1: Roll 2 dice Q={(i,/)si,j=lwus6}- Several random variables can be defined, for example x(i,j)=i+j, also ¥((i,/))=|'-j]. Both, x and rare random variables. x can take values 2,3,..I2 and y can take values 0h... Definition 2.2 (Distribution Function): The distribution function of a random variable X, denoted by Fy() is defined to be the function FR [0,1] which assigns Fe(x)=P[¥s x]=P[(o:X(0)--0%,,.» then the discrete density function of x is defined by weo-ffs9) antp Wot Exercise 2.2.1: Consider the experiment of tossing two dice. Let x = {total of uptumed faces} and y = {absolute difference of uptumed faces} (i) Give the probability function fy and sketch it (i) Give fy Definition 2.5; Any function FR -+[0, I] is defined to be a discrete density function if for some countable set x,.x3.-%45 @ F(z)20 F=12, @f(4)=0 for xe 555/21 Gi) D(x) =1 where summation is over ¥4.x2.--1 54. Definition 2.6 (Continuous Random Variable): A random variable x is called continous if there exists @ function Fy (-) Such that Fe (3) = [fe (u)eu forevery xR Definition 2.7 (Probability Density Function of a Continuous Random Variable): If x is a continuous random variable, the function fy in Fe (x)= feu) is called the probability density function of x Definition 2.8: A function f: — (0, o-) is a probability density function iff @ f(z ve [EP e)ar=1. Expectations and Moments Definition 2.9 (Expectation, Mean): Let x be a random variable, The mean of x , denoted by iy or E[], is defined by G@)-B[X]= x fy (xj) if is discrete with mass points 5.23.08). (ii) ELX]=[% xf (x)de if x is continuous with density fy (x) Intuitively, E[2] isthe centre of gravity of the unit mass that is specified by the density function. RUC rma ftipsende om: Webley dyad oe HAI, (Firat iow Wa Sr aus Khas, Near LL, New Deli 10016, Phi (OL1}26557527, Ct: 99O91H434& 999161736, BSUESATES B 24 Exercise 2.3.1: Consider the experiment of rolling two dice. 
Let x” denote the total of two dice and ¥ their absolute difference. Compute E[X] and E(Y] Exercise 2.3.2: Let 1 be a continuous random variable with density * if0sxo [e(xyek]s Corollary 2.1: If x is a random variable with finite variance, 0% , then ate P(]X-nel2rox]=P[(X te) 220k] <5 Note thatthe last stitement can also be written as Pll ne rox ]2t—y ‘Thus the probability that x falls within ro units of ny is greater than or Puy Pox 0 if its PMF is given Xj XpionXy —be independent Poisson. -—R.V.’s Xp ~ PO) RaQ at Then Sy =X [tout Ny 188 PQA tty) RIS Converse is also tru. Let X&Y be independent R.V.’s with P(2,) and P(A) respectively then conditional distribution of xX given x+y is binomial. (Converse is true) Uses of the Poisson Distribution For large », and small p, X ~ Bin(n,p) is approximately distributed as Poi (np). This is sometimes termed the “law of small numbers ‘A Poisson Process with rate per unit time is such that (_X, the number of occurrences of an event in any given time interval of length « is Poi (x). Gi) The number of events in non-overlapping time intervals are independent random variables (see later). "BAIL (Fit For a Srak Hs Khas Near LT New Db 11016 Ph (11) S650, Cal ODIRDENG K DDPLGTIG, SHOEI om nfstlnacaaeny on Web: wr ne 3.1.8 Multi-Nomial Distribution (Generalized Binomial Distribution) Let sy.on%44 be non-negative imtegers such that x, +...) $m then probability that exactly trials terminate in 4,(/=1,...,f—1) & hence that xp em [ny tt np) trials is AEN iP er et z M (= (ne + pre to pal +a) A(eta) Viale eR #(x,)=np,Yar(x))=09,(I-P)) Cov Xj.) =m? /Pe Summary (s. Distribution |PMF E(X)\. |Var(X) M(t) Poison al ray a ha le) * Lk=0,1,2, Binomit |?%=4)="Caota™* leseey ale ie doo aoe ¥~-B(r7) k= 0,1,2,000 nel wet 3 Unto jest ate 4 | wo Point 7 80-F|-nye-ap [Oe a |w J a 2 X~ NB( rp P P get |e | Hyper. peometeiec Tee ce Fle) Sia Sara Hn Kas Near mal nt 3.2. Continuous Distributions 3.2.4. Uniform Distribution is said to have uniform distribution on [4,b] ifits PDF is given by asxsh ref 0; otherwise 1 ‘0 M()= 1 {ete ):r#0 Oma} Results: Let x be an RLV. with a continuous DF. F, then (x) has the uniform distribution on [0,1} 3.2.2. Gamma Distribution An RV. X is said to have gamma distribution with parameters q and B if its PDP is = o0 otherwise (b) When a=" (n>0 integer) & B=2 Then S(x)= {os erty "HATE, (Ft Foo a Sara Hae Kas, Near TT, New Dai Erma ezdanendeny is said chi-square 72 (n) distribution B(X)=n, Ver(X)=2n 1 1 MO Taya Results: Let X,(/=1,....2) be independent R.V. such that X; ~G(a,,B), then s, ¥4~6(Da,p) RV. Corollary: (i) Take a, =1 vi ‘Then S, ~ G(n.B) ie. sum of exponential is Gamma Gi) Let X~G(a),B) & ¥~G(az,B) be independent RL. then x4¥ and % are independent or x + and = + all independent conversely also true. (Gi) Memory less property of exponential P(X>résly,,)=P(X>7) where x ~exp(2) Gx) If X.Y are independent exponential R.V.’s with parameter, then x xe¥ 3.2.3. Beta Distribution An RN. x 48 said to have beta distribution with parameters a&f (a>0,B>0) ifits PDF is Zz has @ (0,1) distribution a ool H pees0 0; otherwise We write x ~ (2,8) (x)=. v(x of Note: Ifa = =1, we have U(0,t) Results: (i) xX ~B(a,B) then 1-X ~ 8(B,a) (i) Let X~G(a,B) & Y~G(az,p) be independent then als SRE & TSN ASTD a ~B(aj,02) RV. XY 3.2.4. Normal Distribution (Gaussian Law) (@) AnR.V. X is said to have a standard normal distribution if its PDF is, F()= we write x ~N(0,1) (b) An RV. 
is said to have normal distribution with parameters 4 & o(> 0) if We write x~(j.0") wojeoa| we ) Central moments iee((x-»))=0 it odd =[(2n—1)(2n-3)....3.1]}o™ if n is even Results: (i) Let XX, be independent R:V.'s such that X= M(ugo8). bebo then “En eo(Snds] Corollary: @) X,-N(u07) (b) IE X,~N(0,1)(¢=1,...0m) ate independent, then S, N(0.1) (i) Let x &Y be independent R.V.’s then x +Y is normal iff x &¥ is normal (iil) Let x &¥ be independent R-V."s with (0,1) then x+y and x-y are independent. i Haz Khas, Near ELT, New Deli1016, Ph (112650507, Ca BIBS VOPT6I7, SRIBATSD Emil auivncndem com Webster daca Fw (iv) Let X, & Xp are independent N(H;,07) & 2.0?) then 4-3 ee Put Your Own Notes and X, +X; are independent Sees (&) @) X~N(o) 3 4? =N2(1) (0) X=N (4,02) => ak ~ Waa?) aX +b~(au+0,a%0) i) x=M(uo?)-22=#- w(o1) (il) X&Y deiid. N(007) RV.'s then J -Cauchy (10) x Cauchy (1,0 ia (1,0) at 2 WA has PDF dezree HH wiz) 3.28. Cauchy Distribution ‘An RLV. is said to have Cauchy distibution with parameters 4 and @ ifits PDF is vt S()=t 4 j-e 0,-1< p< (a) p is the correlation coefficient (b) for normal densities: this corresponds to xX, and Xz being independent the bi-variate normal density is the product of wo uni-variate Normal random variables i (© x+y and.x-y are independent if 6? = o} (candy are independent iff =0 Extension to Multivariate Normal Distribution: Infact, it's straightforward to extend the normal distribution to vectors or arbitrary length, the multivariate normal distribution has density Fon N(xas2) “(ze Jer Fexn{- Arp Ex sh | Where | "BATE Ft For a Sara HK BB 43. 44, Note that x is a vector; it has mean 1 which is itself a vector and = is the variance-covariance matrix, It x is k-dimensional then x isa kxk matrix, Conditional Distributions and Densities Given several random variables how much information does knowing one provide about the others? The notion of conditional probability provides an explicit answer to this question. Definition 4.7 (Conditional Discrete Density Function): For discrete random variables with x and r with probability mass points ,%35..5% and Ya .003r Six (ol [Y=y)1X =] Pre] is called the conditional discrete density function of y_given x = Definition 4.8 (Conditional Discrete Distribution): For jointly discrete random variables x and Y, Sax (Vx) = PLY S91 X X fae(osl2) is called the conditional discrete distribution of v given x =x. Definition 4.9 (Conditional Probability Density Function): For continuous random variables x and r with. joint probability density function fy (%9)> Su y(%y) Sa (3) Where fy (x) is the marginal density of x. fyx (vx) = + if fy (x)> 0 Conditional Distribution For jointly continuous random variables x and Y, Sins (v13)= Conditional Expectation We can also ask what the expected behaviour of one random variable is, given knowledge of the value of a second random variable and this gives rise to the idea of conditional expectation, Definition 4.16 (Conditional Expectation): The conditional expectation in discrete and continuous cases corresponds to an expectation with respect to the appropriate conditional probability distribution: Discrete ("fiw (2\x)dz vx such that fy(x)>0 Continuous BLY |X = x]= f° vf (VI ‘Note that before x is known to take the value x, E[Y | X’] is itself a random. variable being a function of the random variable x . 
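As a concrete numerical sketch of the discrete case of this definition, the snippet below computes E[Y | X = x] for the two-dice variables of Example 2.1 (X the total, Y the absolute difference). Printing the value for every x makes it visible that E[Y | X] is a function of X and hence itself a random variable. The code is an illustrative addition, not part of the text.

```python
# Numerical sketch of the discrete conditional expectation just defined, for
# the two-dice variables of Example 2.1: X = total, Y = absolute difference.
from collections import defaultdict
from fractions import Fraction
from itertools import product

joint = defaultdict(Fraction)                       # joint mass function f_{X,Y}
for i, j in product(range(1, 7), repeat=2):
    joint[(i + j, abs(i - j))] += Fraction(1, 36)

def cond_expectation_Y_given(x):
    f_X = sum(p for (xx, _), p in joint.items() if xx == x)              # marginal P[X = x]
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / f_X   # sum_y y f_{Y|X}(y|x)

for x in range(2, 13):
    print("E[Y | X =", x, "] =", cond_expectation_Y_given(x))
```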
We might be interested inthe distribution of the random variable E[Y |X] Theorem 4.1 (Tower Property of Conditional Expectation): For any two random variables X, and X, Pa (01) 685757, Cal 9991S A HO9TGIT, SRISTOD 45. AG, 47, Foee Coren notte BLE LaT]= EG) Exercise 4.4.1: Suppose that © ~ Ufo, 1] and (x \) -Bin(2, ©) Find E[4|@] and hence or otherwise show that E(x Conditional Expectations of Functions of Random Variables By extending the theorem on marginal expectations we can relate the conditional and marginal expectations of functions of random variables (in particular, their variances). Theorem 4.2: (Marginal Expectation of @ Transformed Random Variables): For any random variables 4, and x, and for any function a). POUeoEaleouenl Theorem 4.3 (Marginal Variance): Foray random variables X, and x; Var( 4) = [Var ¥) | Xa) Vas(BL% |X). Independence of Random Variables Whilst the previous sections have been conceméd with the information that one random variable carries about another, it would seem that there must be pairs af random variables which each provide nd: information whatsoever about the other. Iti, for example, difficult to imagine that the value obtain, ‘when a dic is rolled in Coventry will tell us much about the outcome of a coin toss taking place at the same time in Lancaster. There are two equivalent statements of a property termed stochastic independence which captute precisely this idea The following two definitions are equivalent for both discrete and continuous random vatiables Definition 4.11: (Stochastic Independence): Definition 1 Random variables X;,Ky.X, are stochastically independent iff Feat ent) FTF (5) Definition 2: Random variables ,,¥,--.%, are stochastically independent ift Th, (s) If X, and x, are independent then theis conditional densities are equal to their marginal densities Py caly Moore Covariance and Correlation Having established that sometimes one random variable does convey information about another and in other cases knowing the value of a random variable tells us nothing useful about smother random variable it is useful to have mechanisms for characterising the teltionship between pais (or larger ‘groups) of random variables. Definition 4.12 (Covariance and Correlation): Covariance: For random variables. X and ¥ defined on the same probability space ‘aH, ie ofa Sral Haar Ks, Nar LL, New Da 1006, Ps OH) ASA, Cas DRTC a Ell jfalgrendeancom: Web: Wow Annee gay A & PPT, Sta se 48. cov] =B[(4-ne Hy] =B[AY]-ty Correlation: For random variables Y and Y defined on the same probabiity space Cov[XY]___ Cov[x,¥] oxy] provided that oy >0 and oy >6, Theorem 4.4 (Cauchy-Schwarz Inequality): Let X and Y have finite second moments. Then ’] with equality ifand only if P[Y =eX]=1 for some constant c. =a(X) Theorem 4.5 (Distribution of a Function of «Random Variable): Let X be a random variable and Y= ¢(X) where g is injective (i.e. it maps at (eler)) =elerp s efx? ]e[) ‘Transformation of Random Variables: most one x to any value y). Then \de"" (2) f0)= fe" ONPG ven that (g'(y)) exists and (g“'(y)) >0 Vy or (e"(y)) <0 vy. Ife is not bijective (one-to-one) there may be values of y for which there exists no x such that y= g(x). Such points clearly have density zero, When the conditions of this theorem are not satisfied it is necessary to be a little more careful. 
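A quick Monte Carlo check of the change-of-variables formula above can be reassuring. The sketch below takes X ~ U(0, 1) and the injective map g(x) = -ln(x)/λ, for which the formula gives f_Y(y) = f_X(g⁻¹(y)) |d g⁻¹/dy| = λe^{-λy}, i.e. Y is exponential with rate λ. The particular choice of g and of λ = 2 are illustrative assumptions, not examples taken from the text.

```python
# Monte Carlo check of Theorem 4.5 for one injective transformation:
# X ~ U(0, 1), Y = g(X) = -ln(X)/lam, so g^{-1}(y) = exp(-lam*y) and the
# theorem gives f_Y(y) = lam * exp(-lam*y). The rate lam = 2.0 is arbitrary.
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
x = rng.random(200_000)                 # X ~ U(0, 1)
y = -np.log(x) / lam                    # Y = g(X)

# Compare the empirical distribution function of Y with 1 - exp(-lam * t),
# the CDF implied by the transformed density.
for t in [0.25, 0.5, 1.0, 2.0]:
    print(t, (y <= t).mean(), 1 - np.exp(-lam * t))
```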
The most general approach for finding the density of a transformed random variable is to explicitly construct the distribution function of the transformed random variable and then to use the standard approach to tur the distribution function into a density (this approach is discussed in Larry Wasserstein’s “All of Statistics”) Exercise 4.8.1: Let X be distributed exponentially with parameter a, that _foe* x20 Sox) Lo reo Find the density function of (y= a(x) with e(x) (i) ¥=x", poo 0 for x<0 Gil) ¥=g(x) with g(X)=}x forosxsi 1 forxat "BAI (St Fo nS Khas New LT, New DaRFIIOO6, Ph (12650507, Ca PODTERS4 A 98916173, SRISETOD Ty Joint and Conditional Distributions [Rnigo 900": 7008 cortned atte Theorem 4.6 (Probability Integral Transformation): If X is a random variable with continuous Fy (x), then U=Fy(X) is uniformly distributed over the interval (0,1) Conversely if U is uniform over (0,1), then X = Fz'(U/) has distribution function Fy 4.9, Moment-Generating-Funetion Technique The following technique is but one example of a situation in which the ‘noment generating function proves invaluable, Funetion of a Variable. For Y= ¢(’) compute [“] If the result is the MGF of a known distribution then it-will follow that Y has that distribution. ‘Sums of Independent random variables. For ¥ =), X;y where the X, ate independent random variables for which the MGF exists V-h<1-0 me(=B[2=T] my, (0) for ~herch mm (0)=[e"] ‘Thus [],mx, (0) may be used to identify the distribution of Y as above. ‘AI, Prt Foy Jin Sar Hass Khas, Near Nw Doi 10016, Pas (1) BASS, Cale SODTARGG K HPOTTIN. BSEOUTSS @ nai fdddlatcademy cons Wen wedi CHAPTER 5 INFERENCE 5A. Sample Statistics ‘Suppose we select a sample of size n from a population of size N. For each in {t,..n}, let X, be a random variable denoting the outcome of the i ‘observation of a variable of interest. For example, X, might be the height of the i person sampled. Under the assumptions of simple random sampling, the X; are independent and identically distributed (iid) ‘Therefore, if the distribution of a single unit sampled from the population can be characterized by a distribution with density function. /, the marginal density function of each X, is also and their joist density fanction g is a simple product of their marginal densities: (At) =F) FCF Om) In order to make inferences about a population parameter, we use sample data to form an estimate of the population parameter. We calculate our estimate using an estimator or sample statistic, which is a function of the X, We have already seen examples of sample statisties, for exampte the sample Sx where n is the size of the sample is an estimator of the population mean, e.g. for discrete X we Sx(xs)] where NV is the number of distinct values which it is possible for an X; to take Sampling Distributions Since an estimator 6 is a function of random variables, it follows that 6 is itself a random variable and possesses its own distribution. The probability distribution of an estimator itself i called a sampling distribution Proposition 5.1 (Distribution of The Sample Mean): Let ¥ denote the sample mean of a random sample of size m from ¢ normal distribution with ‘mean Hand variance o?. Then 7A, et For dia Sra Haez Kh, es LT New Dal Penal naa 16 Ph: (1) 265727, Cae WIDTEMG 9891617, TOD Theorem S.A (Central Limit Theorem): Let f be a density funetion with mean } and finite variance o®. 
Let ¥ be the sample mean of a random sample of size m froth f and let onl = ‘Then the distribution of Z, approaches the standard normal distribution as n+. This is often written as: Z,—“-»N(0,1) with —4» denoting convergence in distribution. Thus, if the sample size is “large enough”, the sample mean can be assumed to follow a normal distribution regardless of the population distribution. In practice, this assumption is often taken to be valid for a sample size n>30 ‘The Chi-Squared Distribution The chi-squared distribution is a special case of the gamma distribution, The sample variance of a standard normal distribution is x? with 1-1 degrees of freedom. Definition $.1: If X is a random variable with density Oe s(a)=musie) qe ee o otherwise then X is defined to have 72 distribution with & degrees of freedom (1) where & is a positive integer. Thus the 22 density isa gamma density with r Result: Ifthe RVs X;,=1,... are independently normally distributed with meats 5, and variances o? then 3 = y RV has a 12 distribution. Theorem 5.2: 1f X,,...X, is @ random sample from a normal distribution with mean wand variance o? then @) ¥ ana D7 (x,-H) are independent, Due iy SEMA? basa 42, distribution, Website: wor disaendemcm Ci) 3ST, Ca POLO A HALEN EHTS ea | TOs oe Cavted ie Corollary Ss f = 9(x, 2) is the sample variance of a random sample of size n from normal distribution with mean 4x and variance o°, then (oo) ada The ¢ Distribution The « distribution is closely related to the normal distribution and is needed for making inferences about the mean of a normal distribution when the variance is also unknown. Definition 5.2: If Z~N(0,1),U~} and Z and U are independent of one other, then . ic k where 4 denotes a r distribution with & degrees of freedom, ‘The density of the 1, distribution is: ker romain qfé 2 For -01 although an extension known as the Cauchy principle value can be defined more generally), and it can be shown that k Var[x]=—£-, for > [l=75 2 Theorem 5.3: 1f X~ i, then ), as ko, That is, as k>a the density approaches the density of a standard normal. ‘The F Distribution The F Distribution is useful for making inferences about the ratio of two unknown variances. 284 et Fon SiS Tn Rh Nag UT Ne DAE TOG FH) TY, Ge RDO OTA ARID a | 53. Definition 5.3: Suppose U and ¥ are independently distributed vith ee, u~3, and y~ 2 . Then the random variable ue is distributed according to an F distribution with m and n degrees of freedom, ‘The density of X is given by x Returning to the case of two independent random samples. froin normal Populations of common variance, but differing means: XorXy, ~N(tio?) oma, ~ M(H2,07) we have Point Estimation The sample mean and variance are examples of point estimators, because the estimates they produce are single point values, rather than a range of values. For a given parameter there are an infinite number of possible estimators, hence the question arises: what makes a “good” estimator? Definition 5.4 (Unbiasedness): Let X be @ random variable with pdf J(x;0), where @eQER? is some unknown parameter, p21, Let XjyonXq be a random sample from the distribution of and let 6 denote a statistic. 6 is an unbiased estimator of 8 if £[@]-0 voco Where the expectation is with respect to f(x; 8. If 6 is not unbiased, we say that 6 is a biased estimator of 0, with Bias(6) =x 6]-0 If Bias(8)+0 when n> then we say that 6 is asymptotically unbiased. 
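A small simulation can make Definition 5.4 concrete before the analytical treatment in the example that follows: the divide-by-n variance estimator is biased, while the divide-by-(n-1) version is not. The population N(0, 4), the sample size n = 10 and the number of replications below are arbitrary illustrative choices.

```python
# Simulation sketch of Definition 5.4: the divide-by-n variance estimator is
# biased, the divide-by-(n-1) version is unbiased. All settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, reps = 4.0, 10, 200_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

biased   = samples.var(axis=1, ddof=0)     # (1/n)     * sum (x_i - xbar)^2
unbiased = samples.var(axis=1, ddof=1)     # (1/(n-1)) * sum (x_i - xbar)^2

print("mean of biased estimator  :", biased.mean(),   "  theory:", (n - 1) / n * sigma2)
print("mean of unbiased estimator:", unbiased.mean(), "  theory:", sigma2)
```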
Example 5.1: Consider the following estimator ofthe population variance aE (a 2HARI Fen Fos) Sara Haus Kis, Near LUT, Now De rma neiddrnndem cam Web worm dianeadenn com 16, Ph) 26757, Cal 91RD OTT ASRTTID Ea So this estimator is biased with bias equial to—2~ As this decays to zero as. n> it is an asymptotically unbiased estimator. However, we can see that the sample variance Consistency: In order to define a consistent éstimator, we first define ‘convergence in probability. Definition 5.5 (Convergence in Probal Let {X,} be a sequence of random variables and let be @ random variable. We say that X, converges in probability to X if Ve>0 lim P[|X,-¥|2¢]=0, or equivalently tim P(x, —X] x. Definition 5.6 (Consistent Estimator): Let X),..., be a sample from the distribution of X where X is a random variable with distribution function F(x; 0). Let 6 denote a statistic. 6 is a consistent estimator of 0 if, whatever the value of 6, b6—+0 A particular case of consistency is often used to justify using sample averages: Theorem 54 (Weak Law of Large Numbers): Let ¥=15"",x, with Xiong td Then Xu, ite. X is a consistent estimator of 1. "BAL, (Fist Foo) a Sarak Haus King Naw LL, New Dal mai arta coms Webster dieteaden co 6, Ps (I) 27, Cale SOIR A OTT, TD eam = 5.4. Ay Theorene 5.5: Suppose X,—P-»a and the real function g is continuous at aa. Then g(X,)—2->2(2). Consistency according to the definition above may be hard to prove, but it turns out that a sufficient (though not necessary) condition for consistency is that 8ias(8) +0 and Var(8)—+0 as n>. Definition 5.7 (Consistency in Mean-Squared Error): If 6 is an estimator of , then the mean squared error of 6 is defined as wse()=#[(0-0)'] and 6 is said to be consistent in M! if Mse(6) +0. as the. size of the sample on which 6 is based increases to infinity. Result: Mse(6) = var(6) +[sias(@)] Interval Estimation This section describes confidence intervals which ave intervals constructed such that they contain @ with some level of confidence. Definition 5.8 (Confidence Interval): Let Xj,...X, be @ random sample feom a distribution with pdf f(x; 6) where @ is afi unknown parameter in the parameter space ©. 1f {and U arestatisties such that P[Lsesu then the interval (1,U) i8 a 100(1~a)% confidence interval for @ \-a is known as .the confidence coefficient and a is the level of significance. There are several ways in which confidence intervals can be constructed. The basic procedure we shall use t0 construct a 100(I-«)% confidence interval for a parameter is as follows (i) Select a sample statistic to estimate the parameter. (ii) Identify the sampling distribution forthe statistic. (iii) Determine the bounds within which the sample statistic will reside with probability 1a, iv) Invert these bounds to obtain an expression in terms of 6 Confidence Intervals based on the CLT: Suppose that we are interested in the population mean @ and we wish to use the sample mean X as an estimator for @. Then if the population density has variance o the CLT states that 4 5N(0,1) Where nm is the sample size. 7A, rn iow Sis Sara Hoa Khas, Near LT, New De 0016, Pas HART ‘Cosi etsadiracaden com Webuts wormdpaetdemcom Hence P(-196", ie.: py =P(N =n) =A" exp(-Al/nt (within the general probability model p, 20 Yn=0,,..; 5%) =, Git) "0 <0," and 8)“ =" (assuming the linear model E]Y |x] =By + 8.x). 
Definition 7.1 (Hypothesis Test): A hypothesis test is a procedure for deciding whether to accept a particular hypothesis as a reasonable simplifying assumption, or to reject it as unreasonable in the light of the data.

Definition 7.2 (Null Hypothesis): The null hypothesis H0 is the default assumption we are considering making.

Definition 7.3 (Alternative Hypothesis): The alternative hypothesis H1 is the alternative explanation(s) we are considering for the data.

Ordinarily, the null hypothesis is what we would assume to be true in the absence of data which suggests that it is not. Ordinarily, H0 will explain the data in at least as simple a way as H1.

Definition 7.4 (Type I Error): A type I error is made if H0 is rejected when H0 is true. In some situations this type of error is known as a false positive.

Definition 7.5 (Type II Error): A type II error is made if H0 is accepted when H0 is false. This may also be termed a false negative.

Example 7.1: In the first example above (pandas) the null hypothesis is H0: p1 = p2. The alternative hypothesis in the first example would usually be H1: p1 ≠ p2, though it could also be (for example) (ii) H1: p1 > p2, or (iii) H1: p1 − p2 = δ for some specified δ ≠ 0. Each of these alternative hypotheses makes a slightly different statement about the collection of situations which we believe are possible. A statistician needs to decide which type of hypothesis test is appropriate in any real situation.

7.2. Simple Hypothesis Tests
The simplest type of hypothesis testing occurs when the probability distribution giving rise to the data is specified completely under the null and alternative hypotheses.

Definition 7.6 (Simple Hypothesis): A simple hypothesis is of the form Hi: θ = θi, where θ is the parameter vector which parameterizes the probabilistic model for the data. A simple hypothesis specifies the precise value of the parameter vector (i.e. the probability distribution of the data is specified completely).

Definition 7.7 (Composite Hypothesis): A composite hypothesis is of the form Hi: θ ∈ Θi, i.e. the parameter θ lies in a specified subset Θi of the parameter space. This type of hypothesis specifies an entire collection of values for the parameter vector, and so specifies a class of probabilistic models from which the data may have arisen.

Definition 7.8 (Simple Hypothesis Test): A simple hypothesis test tests a simple null hypothesis H0: θ = θ0 against a simple alternative H1: θ = θ1, where θ parameterizes the distribution of our experimental random variables X = X1, X2, ..., Xn.

Although simple hypothesis tests seem appealing, there are many situations in which a statistician cannot reduce the problem at hand to a clear dichotomy between two fully-specified models for the data-generating process.

There may be many seemingly sensible approaches to testing a given hypothesis. A reasonable criterion for choosing between them is to attempt to minimize the chance of making a mistake: incorrectly rejecting a true null hypothesis, or incorrectly accepting a false null hypothesis.

Definition 7.9 (Size): A test of size α is one which rejects the null hypothesis H0: θ = θ0 in favour of the alternative H1: θ = θ1 if and only if X ∈ Cα, where P(X ∈ Cα | θ = θ0) = α for some subset Cα of the sample space S of X. The size, α, of a test is the probability of rejecting H0 when H0 is in fact true; i.e. size is the probability of a type I error if H0 is true. We want α to be small (α = 0.05, say).
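The simulation sketch below ties Definitions 7.4, 7.5 and 7.9 together for one concrete simple test. The model (n observations from N(θ, 1)), the rejection cutoff 1.645/√n and the alternative θ = 1 are illustrative assumptions rather than values taken from the text.

```python
# Simulation sketch of Definitions 7.4, 7.5 and 7.9: the test rejects
# H0: theta = 0 in favour of H1: theta = 1 when the mean of n N(theta, 1)
# observations exceeds 1.645 / sqrt(n). All numerical choices are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 100_000
cutoff = 1.645 / np.sqrt(n)

xbar_h0 = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)   # data generated under H0
xbar_h1 = rng.normal(1.0, 1.0, size=(reps, n)).mean(axis=1)   # data generated under H1

print("estimated size / type I error rate:", (xbar_h0 > cutoff).mean())   # close to 0.05
print("estimated type II error rate      :", (xbar_h1 <= cutoff).mean())  # false negatives
```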
"HA Fit Plea Sra Haus Ks, Near LE, New at 10016, Pas) RHA, Cle POTN A SDPLGTIN SSROMTED a | mai ineszdarnrdeny com Website: wr praca ce Definition 7.10 (Critical Region): The set C,, in Definition 7.9 is called the critical region or rejection region of the test. Defi jon 7.11 (Power & Power Function): The power function of atest |= with critical region C,, is the function 8(0) =P(XeC, |0), and the power of a simple test is = (0), ie. the probability that we reject Hg in favour of #4, when / is true. Thus a simple test of power f has probability 1p of a type IL error occurring when H, is true. Clearly for a fixed size a of test, the larger the power fj of atest the better. However, there is an inevitable trade-off between small size and high power (as in a jury trial: the more careful one is not to convict an innocent defendant, the more likely one is to freea guilty one by mistake) A hypothesis test typically uses a test statistic 7(X), whose distribution is known under Hy, and such that extreme values of 7(X) are more compatible with H, that Hy Many useful hypothesis tests have the following form: Definition 7.12 (Simple Likelihood Ratio Test): A simple likelihood ratio test (SLRT) of Hy=0=8, against 17,:0=0, rejects My iff (00%) <4 } Qs) <* Whete £(8; x) is the likelihood of 0 given the data x, and the number 4, is chosen so that the size of the testis a. xecaly Exercise 7.21: Suppose that X,,%2..4,~N (81). Show that the likelihood ratio for testing Hy :0=0 against H, :0=1 can be written [-( ze | V2} Hence show that the corresponding SLRT of size a rejects Hy when the test statistic 7(X)=* satisfies T > @™'(1-a)/V/2 A number of points should be borne in mind: + For a simple hypothesis test, both 1% and £, are ‘point hypotheses’, each specifying @ paricular value for the parameter @ rather than a region of the parameter space. d(x) + Inpractice, no hypothesis will be precisely true, so the whole foundation of classical hypothesis testing seems suspect! Actually, there is a tendency to overuse hypothesis testing: itis appropriate only when one really does wish to compare competing, well-defined hypotheses. In many problems the use of point estimation with an appropriate confidence interval is much easier to justify. ‘+ Regarding likelihood as a measure of compatiblity between data and model, an SLRT compares the compatibility of @ and 0, with the observed data x, and accepts 4, iff the ratio is sufficiently large ‘© One reason for the importance of likelihood ratio tests is the following theorem, which shows that out of all tests of a given size, an SLRT (if ‘one exists) is “best” in a certain sense. Ne Da Fa: HN AERA, Cok po OLA ATG REET B A, at Theorem 7.1 (The Neyman-Pearson Lemma): Given random variables Xj XgrXg» With joint density f(x]@), the simple likelihood ratio test of a fixed size a for testing Hy :0=0, against H; :0=0, is at least as powerful as any other test of the same size. Proof Fix the size of the test to be «t. Let 4 bea positive constant and Gy a subset of the sample space satistying (a) P(K eG |0=0,) £(055 2) _Fl1%0) ¢ 4 Ts) F(«10) Suppose that there exists another test of size a, defined by the critical region Cl, ic. Reject Hy iff xeC,, where P(x eC, |0=0)) =a. (b) XeQaTe Let B= CNG, By = CNG, B= CENG, Note that B, UB; =Cy, B, UB; =C;, and By,B, & By are disjoint, Let the power of the likelihood ratio test be fy = P(X €Cy|8=6,), and the power of the other test be 1, = P(X C,\8=6,). ‘We want to show that [y—I, 20. 
But To“ ty= Jo, F (1 )ee-f,, FeO )ar = Faun, SLO Ya yg, £2101) = Ju, S(O) ae-f, F (10) AlsoB, 1 (©) Find the UMP test of #790? =1 against #, a? <1 (€.) Show that no UMP test of Hy :0?=1 against 2:0? +1 exists ‘There are several points to notice: ‘+ Ifa UMP test exists, then itis clearly the appropriate test to use. It’s the “best” test for comparing & with any altemative consistent with H, Often UMP tests don’t exist! The requirements above are actually rather strong: in particular, requiring that a testis the most powerful test for a complicated class of alternative hypotheses limits the circuimstances in Which itis possible to specify such a tes. ‘+ Sufficient statistics are summaries of data which have the property that the distribution of the data, given the statistic, is conditionally independent of the data itself (ve. PLX =s17(x)=10]=PLX =x17(x)=1]). A UMP test involves the data only via a likelihood ratio, so is a function of the sufficient statisties, + As a result of the previous point, when a UMP test does exist, the critical region C, often has a simple form, and is usually easily found once the distribution of the sufficient statistics have been determined (hence the importance of the x2, + and F distributions) The above three examples illustrate how important is the form of alternative hypothesis being considered, ‘The first two ate one-sided alternatives whereas H,.<7 «1 is a two-sided altemative hypothesis, since o? could lie on either side of 1 a Composite Hypothesis Tests ‘The most general situation we'll consider is that in which the parameter space Q is divided into two subsets: 2=0, UM), where 0, =4, and the hypotheses are Hy:0€M, Hy 060). For example, one may want (0 test the null hypothesis that the data come from an exponential distribution against the altemative that the data come from a more general gamma distribution. In this case, the class of exponential distributions corresponds to the class of gamma distributions in which the shape parameter is required to be 1. Note that 0 is a one- dimensional subspace of ©, whilst Q, is the remainder of the two- dimensional parameter space of the gamma distribution: here, as in many other cases, dim(q) < dim(2,) = dim(O) One possible approach to this situation is to segard the maximum possible likelihood over 8€, a a measure of compatibility between the data and the hypothesis #,(i=0,1), Itis convenient to define the following: «8 is the MLE of © over the whole parameter space ©, % the MLE of 0 over i.e, under the mull hypothesis “, and + isthe MLE of © vcr %, ic. under the altemative hypothesis Note that 6 must therefore be the sameas éither ) or 6, , because 2=9,09,, One might consider using the likelihood ratio criterion L(8,; x}/Z(d9; x) by direct analogy with the SLRT. However, it is-generally easier to use the related ratio £(6;3)/2(6p; x). Note that his grees withthe likelihood ratio criterion above when 6, has the higher likelihood and is otherwise equal to 1. This avoids the calculation of the MLE under the constraint that 8 which will generally be somewhat more difficult than the calculation of the unconstrained ML Definition 7.14 (Likelihood Ratio Test (LRT): A likelihood ratio test rejects Hy :0.€% in favour of the alternative 1, :0 <0, =O\0, iff (0:2) (és) Where @ is the MLE of @ over the whole parameter space ©, @y is the MLE of 8 over Q, and the value 2 is fixed so that sup P(A(X)2 410) =a ey A(x) ‘Where «, the size of the test, is some chosen value. 
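For the exponential-versus-gamma example mentioned above, the likelihood ratio of this definition can be evaluated numerically once the two maximisations have been carried out. The sketch below uses SciPy's fit routine for the unrestricted gamma MLE and the sample mean for the exponential MLE under H0; the simulated data and all numerical settings are illustrative assumptions rather than material from the text.

```python
# Sketch of the likelihood ratio test of Definition 7.14 for H0 "the data are
# exponential" (a gamma with shape 1) against H1 "the data are gamma with
# arbitrary shape". The data are simulated purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.gamma(shape=2.0, scale=1.5, size=200)          # illustrative data

# MLE over the full gamma parameter space (location fixed at 0).
shape_hat, _, scale_hat = stats.gamma.fit(x, floc=0)
loglik_full = stats.gamma.logpdf(x, shape_hat, loc=0, scale=scale_hat).sum()

# MLE under H0: shape = 1, i.e. an exponential with scale equal to the sample mean.
loglik_null = stats.expon.logpdf(x, loc=0, scale=x.mean()).sum()

r = loglik_full - loglik_null                          # likelihood ratio, on the log scale
print("log likelihood ratio statistic r(x):", r)
```

Large values of this (log) ratio are evidence against H0; as in the definition, the cutoff would be fixed so that the test has the chosen size.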
It is possible to define an equivalent criterion in terms of the log likelihood and the log likelihood ratio test statistic: r(x) =0(6:-x)~¢( 6: 5) Where ¢(0; x)=¢(0;x), and 2’ is chosen to give chosen size = suPpca, P(r(X) 2210). As is offen the case, taking logarithms can simplify computations. Equation 7.2 is often easier to work with than Equation 7.1—see the exercises and problems. "AA iat Flor Sa Sara Haus Rs, Near LT New Dei ema ntalnende ca Ph 1) BEST, CAL PHOTRO & HTETTIG TNS a Elements of Bayesian Inference Asever, the size a is typically chosen by convention to be 0.05 or 0.01 High values of the test statistic A(x), or equivalently of r(x), are taken as evidence against the null hypothesis #y. If 4(x)=1 or equivalently r(x)=0 then the null hypothesis explains the observed data at least as well as the alternative hypothesis, A note on terminology: The test given in Definition 7.14 is sometimes referred to as a generalized likelihood ratio test, and Equation 7.4 a generalized likelihood ratio test statistic. When this terminology is used the simple likelihood ratio test defined here in definition 7.12 may be referred to as a likelihood rata test. ‘These abstract tests may seem far removed from real statistical problems However, as the following exercises demonstrate, standard widely-used procedures can be readily obtained as particular cases of the type of procedure introduced above. An understanding of the general ease can be invaluable when trying t0 interpret or apply specific examples. Exercise 74.1 (Paired test: Suppose that Xda. My 0?) and Let ¥=Dxin, P= D(x, RX) nt) What is the distribution of (shia)? Is the test based on rejecting Hy :u=0 for large 7 a likelihood ratio test? | Assuming that the observed differences in diastolic blood pressure (after before) are iid and Normally distributed with mean 8, use the captopril data (Table 7.1) to test the null hypothesis Hy :8y =0 against the altemative hypothesis #,:8p # 0. This procedure is known as the paired ¢ test. Patient ‘Systolic Diastolic Number Before [After [Change | Before | After | Chang 210 201 39 130 125 5 io (165__[4 2 fiat 187 166 | 21 wa tana | 160, 7 [3 104 | 106 2 167 147, -20 112 101 -ll 176 mas [3101 85 16 185 168 |-7___ at [98 23 206 180 26 124 105 -19 173 147 [26 [1s _[103 | -12 146 136 10 102 98 4 | 174 51 {23 (98 [90 3 201 168 [33 [119__[ 98 2 198 179. -19 106, 110 4 148 19 [19107 [1034 154 wi_| 23 100 2 -18 Supine systolic and diastolic blood pressures of 15 patients with moderate hypertension (high blood pressure), immediately before and 2 hours after taking 25 mg of the drug captopril Emel ntlainncdeecom: Wet: wy ditacateer see "BATE, (Fit For Sn Sra Haz Khas Near LET, New Dani, 6: (Oy 26STED, Cal 9991894 VOIGITDG, RSET BoB 1 Definition 7.15 (Chi-squared Goodness of Fit Statistic): epee, Where o, is the observed count in the ith category and ¢, is the corresponding expected count under the null hypothesis, is called the x? goodness-of-fit statistic. Under Ho, X? has approximately a x? distribution with number of degrees ‘of freedom being (number of categories)- I - (number of parameters estimated wndet Ho). This approximation works well provided all the expected counts are reasonably large (in practice, it’s often employed when the expected number of counts are all atleast 5 and works tolerably well even here). 
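As a minimal illustration of the statistic just defined, the sketch below applies it to hypothetical counts from 600 rolls of a die under H0: all six faces are equally likely. The counts are made up for illustration; since no parameters are estimated under H0, the reference distribution has 6 − 1 = 5 degrees of freedom.

```python
# Minimal sketch of the chi-squared goodness-of-fit statistic for made-up
# counts of 600 die rolls under H0: all six faces equally likely.
from scipy import stats

observed = [95, 108, 92, 110, 99, 96]        # hypothetical counts, total 600
expected = [600 / 6] * 6                     # equal expected counts under H0

chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print("X^2 =", chi2_stat, "  p-value (chi-square, 5 df):", p_value)
```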
‘This 7 test was suggested by Karl Pearson before the theory of hypothesis testing was fully developed ‘Test of Independence in a Contingency Table: The same test statistic can also be used to test for independence of variables in a contingency table. In this ease, the row and column totals are fixed in addition to the grand total, hence we lose 1+(r-1)+(e~1) degrees of freedom, leaving 1r-1)(e-1) degrees of freedom. Example 7.2: Use the following 3x3 contingency table to test at the 0.01 evel of significance whether a person’s. ability in mathematics is independent of her/his interest in statistics. Ability in Maths — re-r~etl Low | Average | High | Total [interest in Stats | Low 6 a2 15 120 Average | 58 | Ol 31 | 150 i | High 4 [| 47 | 29 | 90 [ Totals 13s 150 75 | 360 Hy: Ability in maths and interest in statistics are independent. Versus Hf; : Ability in maths and interest in statistics are not independent. Decision rule: reject iy if X* >.g “18277 where se oS letzte ete iia expected, Under Hy, ie. independence, the expected frequencies are given by the product of corresponding marginal estimated probabilities times the total number of individuals. For example for the first row these are 120135 120) 150 (Sea ea)°= (55350) = (63-45) (42-50)? (29-18.75)? es mas Since 32,14>13.277 the null hypothesis must be rejected and 120-45~50=25, Thus xe 2.14 ‘Put Your Own Notes DAT, et Flor ia Sara ner bs, Near UL Ips: Sn Noe Cd CHAPTER 8 ELEMENTS OF BAYESIAN INFERENCE Thus far we have considered only the so-called classical or frequentist approach to statistical inference. In the classical paradigm, unknown parameters are treated as fixed but unknown constants which are to be estimated, Probabilistic statements are made only about “true” random variables and observed data is assumed to correspond to a sampled set of realizations of random variables. Bayesian inference provides an alternative approach 8.1. Introduction {In the Bayesian approach to inference, parameters are treated as random variables and hence have a probability distribution. Prior information about 0 is combined with information from sample data to estimate the distribution of ®. This distribution contains all the available information about 0 so should be used for making estimates or inferences We have prior information about 0 given by the prior distribution, p(0), and information from sample data given by the likelihood £(0:x) = (130). By Bayes Theorem the conditional distribution of © given X £(69)p(0) _ L(x) (9) x) Hx) is, (8x © £(0:x) (0) Where A(x) is the marginal distribution of x, We call 4(8|x) the posterior distribution. Actually, a Bayesian would most probably have written: (x18) (0) A(x) There is no need to distinguish between parameters and random variables in notation and it’s perfectly reasonable to condition upon the parameters within the Bayesian framework 4(01x) Note: A Bayesian statistician does not necessarily believe that all parameter values are classical random variables. Viewing probabilistic statements as quantifications of uncertainty without explicit reference to relative frequencies of occurrence allows their use much more widely. In the subjective Bayesian framework, probabilities are quantfications of personal belief in a statement. Prior Distribution: The prior distribution p(0) quantities information about 0 prior to the gathering of the current data, Sometimes p(0) can be constructed on the basis of past data. More commonly, (8) must be based upon an expert's experience and personal judgments. 
The following three examples give some simplified examples of the procedures by which such distributions can be obtained, "EA, (Ft For) Ss Sra Has Ks ‘Da 01, Ps (Fy S707, Ca OOTRSA & PATI TO, HRTIG ‘Anis0 5001 200g esed ratte @qitee! 82. Example 8.1: Suppose that the proportions @ of defective items in a large manufactured fot is unknown, ‘The prior distribution assigned to 0 might be U(0,1), ie. 1 for<@ nw-[ar Pe ®, B>0 ° otherwise Example 8.3: A tnedical researcher was questioned abit, the proportion of asthma sufferers who would be helped by a new drug. She thought that P[o>0.3]=P[0<03] ive. that the median 8, =0.3 Similarly, she thought that 35 =02 and 0575 = 0.45 From tables giving quantiles of beta distributions, the researcher's opinion could be represented by Beta (a =2,8=4) for which 250.194, 95 =0.314 and O75 = 0.454 Note that fusing information from different sources and extracting the knowledge of domain experts in order to produce prior distributions which genuinely encode the state of knowledge prior to the gathering of a first data set is a difficult problem. A large amount of research has been done of dhe area which is termed prior elicitation. If one conducts a series of experiments concerned with the same parameter then the situation is improved a little as we shall see the posterior distribution p(@|x,) can be used as the prior distribution when the next data set x) , say, is obtained and this procedure can be applied iteratively as we leam progressively more about the parameter. Parameter Estimation Suppose we wish to estimate a parameter @. We define a loss function 1,(00) ‘Which measures the loss which would be suffered as a result of using 6 as «an estimator when the true value is 0. This is a formal way of encoding how bad various types of possible ertor in the estimation are to us. For instance, it can encode the difference between type I and type II errors if we are interested in estimating a binary indicator of disease status. ek ‘ima felony ses Web: wien sme 2A, et For Sia Sra Haw Kas, New HLF Row Deb 1016, Ph (10) 255757, Cl OTE PTI =" I ‘The Bayes estimator minimizes the expected loss: 2(1,(0,6)\x]= f 2,(0,8)a(aix)40 for the observed value of x. ‘The form of the Bayes estimator depends on the loss function that is used and the prior that is assigned to For example, if the loss is the absolute error 1,(0.8)=10-4) then the Bayes estimator 0) is the median ofthe posterior distribution, For other loss functions the minimum might have to be numerically estimated. Example 8.2.1: A commonly used loss function is.the squared error loss function 1, (0.8) Show thatthe corresponding Bayes estimator 6, is equal to the mean of the posterior distribution 85 ()=e(012] What is the minimum expected loss? Example 8.4 (Continuing example 8.1): Suppose that a random sample of 1 items is taken from the lot of manufactured items. Let : _ [i if item is defective * 10 otherwise Then Xjsn%, i8 a sequence of Bemoulli trials with parameter 0. The pdf ofeach ¥, is pea f@ee™ forx=0,1 0 otherwise Then the joint pdf of Xj,..X, 18 Sa(x10)=024 (1-0)" 2" Since the prior pdf p(9) is uniform it follows that Su(*19)0(0 This is proportional to a beta distribution with parameters a= y+1 and Ben-y+l, where y= Sx) Es (1-0) 5 » ssl ‘Therefore the posterior has pdf T(n42) TOs )F Oven) (01x) or(1-0)"7, 0<0si "A i i Hara Hs Ke Ne EVEN DBE IONG Fs HSS, Ca HRI HOOT TP Example 8S: (Example 8.2 continued) Suppose that the lifetime X;,..,X, of a random sample of » lamps are recorded. 
The pdf of each x, is (pets x30, res0)-{ 0 otherwise The joint pdf of x)..%, 1B is £(x1B)=Bre-Pr, where with a gamma specified for p(B) we have F (x18) p(B) 2 Brrsel-—920000)8 Where a factor that is constant w. r, ) has been omitted. Thé RHS is proportional to a Gamma (n+4,y-+20000), hence (v+20000)"** 01x) Gray Brvse-(y+20000)8 8.3. Conjugate Prior Distributions ‘A conjugate prior distribution when combined with the likelihood function produces a posterior distribution inthe same family athe prior. If we find a conjugate prior distributiowhicit adequately fits our prior belief regarding 0, we should use it, because it-will simplify computations considerably. However, one should not empidy a conjugate prior distribution for computational convenience if it doesnot represent those prior beliefs reasonably closely, Example 8.6: (Sampling from a Bernoulli Distribution). Suppose Xj,...%, are a random sample from Ber(@),0<6<1. Let p(0) be Beta (a,8) Then (0|2) is Beta (a+, 8+n- Ds) The proof of this claim is analogous to Example 8.4 (note U0, ia(LD)- Bxercise 8.3.1 (Sampling from a Poisson distribution). Suppose X;y...Xy are a random sample from Poi (8) Let p(0) be Gamma(a,) Show that the posterior density, 9(81), is Gamma (orSsner] a Example 8.7: (Sampling ftom a Normal Distribution with known o? ). Suppose X;,..,, are a random sample from (0,02), with o known, Let p(0)be N(6,e2) digest fe) 2A, Pet Fon Sa Saal Hawa Kas, Near LET, Nem DaIE-IGOT, Pa (11) USSSA DOORN & OIG, SOBNTOD rma affdenendem cams Webi: won beanie sm 2> ips academy | Proof (01>) 9(0)4(04x) ive. g(01x) is the pdf of (tees { tne Precision: Note that the posterior variance ( ent)" (3 4)" i.e. the reciprocal of the sum of the reciprocals of the prior variance and the variance of the sample mean,, respectively. Because of this reciprocal relationship Precisioin Is sometimes quoted instead of the variance Posterior Mean: The posterior mean ees 2 ein T 2° ain i.e. a weighted average of the prior mean 4 and the sample mean , with weights proportional o the prior precision and the precision of the sample This type of celationship holds for several sampling distributions when a conjugate prior is used. Example 8.8: (Sampling from an Exponential Distribution). Suppose XivorX, are a random sample from Exp(0). Let p(0) be Gamma (0,8) Then 4(0|) is onnm (oonte$) See example 8.5 for a proof of this claim. {HA Pt Fon) Saar vas Kh Ne ELT, New Da 1006, Pus 11) 2657527, Cal OOHRDAN K HOTT, SREB as Ea boromeecocar CHAPTER 9 MARKOV CHAINS Stochastic Process: The collection of R.V. {X;:t€7} defined on some probability space (9,F,P) is called stochastic process. Here, the set T is called index set & the values that R.V. X, takes is called state space and is denoted by S We will study the ease when 7 is disérete (ie. a countable set) and stite space is also discrete. Example: {X,:teR, where X, is 1 if head & 0 if tail & we tass the coin daily. 9A. Discrete Time Markov Chain A sequence (X,),..j of RLV. with discrete state space S ig called DTMC if (Xa =A painkeces Sai) =P(Kse= dagen) Yioont eS ic. the future value depends only on,thépresnt value not on the past history Examples: Suppose that a coin is tossed repeatedly and let 2X, :=mumber of heads in nose, Definitions ()Teamsition Probabil Jie. 
probabitty of Ay = P(X, (i) Tran Given the one step transition probabilities, we can write it in the matrix form as: Fo For Foo Fo fu Ma P=(%), 6 (iss"|ing Pa Pa Note that each Fj 20 Vi, and each row sum has to bel The matrix whose each row sum is 1 is called stochastic matrix, And if both row & column sum is 1, then itis called a doubly stochastic matrix, ii) Initial Distribution: The probability distribution = P(X =i) is called initial distribution. It basically gives you the probability that the chain starts at i state initially. i.e, Probability of system starting a state ¢ is x; Erma nlalannademy co Webi: ateadencom ‘AT, Pet Fo) Bia Sra Haus Khas, Near LI, New DFLV006, P11) 285707, Ca 9OTNNO A SHLTIN RSRRLATID Ips. (iv) State Transition Diagram: Given the matix P, we can draw a sate TT transition diagram & vice-versa. The diagram consists of eitles " Pué Vour Own Notes ‘showing the state and ares from one circle to other showing whether it is possible to go from that state to the other. Example: 07 03 0 01 03 06 10 0 (®) M-Step Transition Probabilities: We can define m step transition probabilities that gives you the probability of going from i® state to * state in m seps. i. 80) =P( Xun =Jfegs) ides So, we can define a m-step transition miatrix. denoted by P”) as pln) = pm (vi) Accessibility: State / is said to be accessible from the state # if Hf) >0 forsome n i.e. if we can go from i to j in n steps for some (vii) Communicating States: Two states i and j are said to be communicating if i—> j and ji i.e ifitis possible to go from i to jin. steps and then back to Fin msteps for some m, n needn't be same. (vii) Class of a State i: Cli) =(JeS:10 j} es C(i)= {Set of all states that communicate with i } ieC(i) [+ ¢ Communicate with é in zero steps} Closed Communicating Class: A class is said to be closed if the exit from that class is not possible i.e. once you entered that class, you get trapped there. A closed communicating class consists of recurrent states only. (a) Irreducible and Reducible Chain:A Markov Chain is said to be irreducible if every state communicate with every other state. ie. C(i)=5 otherwise chain is said to be reducible. 2 et FS Hau ag Nay LL Ney De LG, Ph QUAI, Ca PHM A ATCT SISTID Sere 12 2 0 0 22 0 0 wa va wa 4 (o 0 o 1 42 12 Ke oe” 1 14 Oo 14 (1) ={1.2} [2-1 to 2 and 2 to 1 is possible] c2)=c(1) C(3)=(3} [+ 3-91,3-92,3-04 is possible, bt not back] cla)= Since there are 3 classes, chai fs reducible (ix) Period ofa state #: Let 15 Ai) a(i)= wane lef” >} ie Steps to calculate the period of any stat i (2) first collect the number of steps in Which is possible. ie. (94 2 steps 1915 stops et (b) Then take g.ed, of these no’s and this g.ed gives you the period of state i Example: In the above example 2(t)= ged {1.2.3.4} = () 2(3)=e04{1,2,3, a(4)= 402 Result: If 4 j, then (i) = 1, se Fn) His Sara Hn Khas, Near LT, New DATONG Ph (OD BSSVSTT Cle SWAIN DSRTGATIG SRRATID tate i is said to be aperiodic if Ifall states are aperiodic, then we say chain is aperiodic. (ai) First Visit Probability: (f= P(X_= kX, ek WV px)=1-P(X x) =1- F(x) ice. the probability that device is still working after time. x (ii) Hazard Rate Function/Failure Rate Function salt FO) = 5 “+ our possible values will =H (o8s(0) where f(r) is PDF and F(t) is CDF of system. (ili) Expected Lifetime of the System #(x)= f (eda where f(x) is PDF of system ‘Type of Question ‘Type-I: When reliability is given to be a constant Consider the system with components 4,2,C,D whose reliability is + for each component. 
CHAPTER 10
MISCELLANEOUS

10.1. Reliability Analysis

(i) Survival Function: S(x) = P(X > x) = 1 − P(X ≤ x) = 1 − F(x), i.e. the probability that the device is still working after time x.

(ii) Hazard Rate Function / Failure Rate Function:

h(t) = f(t) / (1 − F(t)) = f(t) / S(t),

where f(t) is the PDF and F(t) is the CDF of the system lifetime.

(iii) Expected Lifetime of the System: E(X) = ∫ x f(x) dx, where f(x) is the PDF of the system lifetime.

Types of Question

Type-I: When the reliability is given to be a constant.

Consider a system with components A, B, C, D, each having the same given reliability, in which A, the parallel pair (B, C), and D are connected in series. Find the reliability of the system.

Solution: P(system works) = P(A ∩ (B ∪ C) ∩ D)
(i.e. A and D work and at least one of B and C works, then the system is working).
Since each component works independently,
= P(A) P(B ∪ C) P(D)
= P(A) [P(B) + P(C) − P(B ∩ C)] P(D)
= P(A) [P(B) + P(C) − P(B) P(C)] P(D).

Type-II: When the lifetime is some random variable (mostly an exponential random variable) and you have to find the survival function or the expected life.

Note: X is said to be an exponential R.V. with parameter λ if
f(x) = λ e^(−λx) for x > 0, and 0 otherwise.
Then P(X ≤ x) = 1 − e^(−λx), so P(X > x) = e^(−λx), and the mean is
E(X) = ∫_0^∞ x f(x) dx = ∫_0^∞ x λ e^(−λx) dx = 1/λ.

Problem (June 2013, Section B): Consider a parallel system with two components, each having an exponential lifetime distribution with mean one, the lifetimes of the two components being independent. Find the expected life of the system.

Solution: To find the expected lifetime of the system lifetime X, we need its PDF.

(a) Survival function:
S(t) = P(X > t) = P(X_1 > t) P(X_2 ≤ t) + P(X_1 ≤ t) P(X_2 > t) + P(X_1 > t) P(X_2 > t)   {∵ independence; at least one of them is working}
or, more simply,
S(t) = P(X > t) = 1 − P(X ≤ t) = 1 − P(X_1 ≤ t) P(X_2 ≤ t), i.e. 1 − P(system not working) = 1 − P(both not working).
So S(t) = 1 − (1 − e^(−t))(1 − e^(−t)) = 1 − (1 − 2e^(−t) + e^(−2t)) = 2e^(−t) − e^(−2t).

(b) Hence F(t) = 1 − 2e^(−t) + e^(−2t) and f(t) = F′(t) = 2e^(−t) − 2e^(−2t), t > 0.

(c) Expected life:
E(X) = ∫_0^∞ t f(t) dt = ∫_0^∞ S(t) dt = 2 − 1/2 = 3/2.

Type-III: Comparing Two Systems

June 2012 (Part B): The hazard rates of two lifetime variables T_1 and T_2, with respective CDFs F_1(t), F_2(t) and PDFs f_1(t), f_2(t), are h_1(t) = 4t and h_2(t) = 3t, t > 0. Then
(a) h_1(t) ≥ h_2(t) for all t > 0
(b) F_1(t) ≥ F_2(t) for all t > 0
(c) E(T_1) < E(T_2)
(d) …

… > 0, … > 0 and … > 0. Then … is chosen at the n-th step. Which of the following is necessarily true for the Markov chain X_n?
(a) lim_{n→∞} P(X_n = a) = …
(b) lim_{n→∞} P(X_n = b) = …
(c) the average proportion of time 'c' is observed converges to …
(d) the chain is irreducible

… Then which of the following is necessarily true for the Markov chain X_n?
(a) …
(b) The Markov chain is aperiodic.
(c) State 8 is recurrent.
(d) State 9 is recurrent.

Consider a Markov chain on the state space {1, 2, 3, 4, 5} with transition probability matrix
[ 1/2  1/4  0    1/4  0   ]
[ 1/3  0    0    2/3  0   ]
[ 0    0    1    0    0   ]
[ 2/5  1/5  0    2/5  0   ]
[ 1/5  1/5  1/5  1/5  1/5 ]
Then which of the following statements are true?
(a) States 1, 2, 4 are recurrent and 3, 5 are transient.
(b) States 1, 2, 3, 4 are recurrent and state 5 is transient.
(c) The chain has a unique stationary distribution.
(d) The chain has more than one stationary distribution.

10. Let {p_n : n ≥ 0} be a sequence of numbers such that p_n > 0 for all n ≥ 0 and Σ_{n≥0} p_n = 1. Consider a Markov chain with state space S = {0, 1, 2, ...} and transition probability matrix
[ p_0  p_1  p_2  ... ]
[ 1    0    0    ... ]
[ 0    1    0    ... ]
[ ...                ]
Then
(a) The chain is not irreducible.
(b) The chain is irreducible and transient.
(c) The chain is irreducible and null recurrent.
(d) The chain is irreducible and positive recurrent.

12. Consider a Markov chain {X_n : n ∈ N} such that p_ii > 0 for some i. Then which of the following statements are true?
(a) The chain is always aperiodic.
(b) …
(c) The chain may be aperiodic or may be periodic.
(d) The chain has period 2.

13. The one-step transition probability matrix of a homogeneous Markov chain {X_n, n ≥ 0} is
[ 1/3  1/6  1/6  1/3  ]
[ 1/4  1/4  1/4  1/4  ]
[ 1/5  1/4  1/4  3/10 ]
[ 1/6  1/6  1/3  1/3  ]
Then which of the following is correct?
(a) P(X_5 = 1 | X_3 = 2) = P(X_3 = 1 | X_1 = 2)
(b) This Markov chain is irreducible.
(c) The states 1 and 2 are the only recurrent states.
(d) It is a recurrent Markov chain.

14. The one-step transition probability matrix of a homogeneous Markov chain {X_n, n ≥ 0} is
[ 1/3  1/6  1/6  1/3  ]
[ 1/4  1/4  1/4  1/4  ]
[ 1/5  1/4  1/4  3/10 ]
[ 1/6  1/6  1/3  1/3  ]
Then which of the following is correct?
(a) P(X_5 = 1 | X_3 = 2) = P(X_3 = 1 | X_1 = 2)
(b) This Markov chain is irreducible.
(c) The states 1 and 2 are the only recurrent states.
(d) It is a recurrent Markov chain.

16.
17.
18, Consider an_aperiodie Markov chain with state space $ and with stationary transition probability matrix P=((p,),#¢S . Let the step transition probability matrix be denoted by P*= (pf), 4/5. Then which of the following statements is true? 2) lim pf =0 Only if ¢ is transient (69 tim pi (©) lim pf = lim p; if ¢ and jare in the same and only if ‘is recurrent communicating class. (4) lim pp = lim pt if ¢ and jare in the same ‘communicating class. Let [X,} be a stationary Markov chain such that PUK =U, =D ep, 21 PUX, PU =X, and p(x, =1)= 7, = Then @) gaa (b) 5, © (d) x, ‘An aperiodic Markov chain with stationary transition probabilities on the state space (1.2,3,4.5} must have (a, At least one null recurrent state (b.) At least one positive recurrent state (©) At least one positive recurrent and at least one null recurrent state (4, At least one transient state Lat P be the stationary transition probability matrix of the Markov Chain {X,.20}, which is irreducible and every state has period 2. Further suppose that the Markov chain {¥,,20) on the same state space has transition probability matrix P* . Both the chains are assumed to have the same initial distribution. Then (a) FLX, =% 8) =H ]=1 (b, Allstates ofthe chain {Y,,"20} are aperiodic (€) The chain {Y,,n2 O}is ireducibte (d.) Ifa state is recurrent for the chain {X,,n2 0}, then it is also recurrent for the chain {1.20}, 20. un. (14, dy ys Hy) be the stationary distribution for a Markov chain on the state space {1,2,3,4} with transition probability matrix P . Suppose that the states 1 and 2 are transient and the states 3 and 4 form a communicating class. Which of the following is/are true? (a) uP =P" (b.) 44, (©) atm and 1, =0 (4). One of 14 ant is 2200. Consider a Markov chain with state space §={1,2,3,4,5} and stationary transition probability matix P given by 01 0 02 07 0 o 1000 a7 0 01 02 0}. Let pi be the 02 0 07 01 0 0 050 0 os (i, /)th element of P*. Then (ay Dilin (b.) (0.25,0.25,0.25,0.25,0) is a stationary distribution for the Markov chain. (6) Spy 0, Sp, a wp, <2. Consider a Markov chain on the state space 0,1, 2,..} with transition PO Ps 100 probability matrix] 1 0 ~~]. Then eo! (a) The chain is not irreducible (b.) The chain is irreducible and transient () The chain is imeducible and null recurrent (@.) The chain is irreducible and positive THAT, Foe a Sr Haas Ks a ie, us (11) 307597, Cal NB 99 TCT ENRRUT lai filgencade coms Webster ia recurrent 2 Consider a Markov chain {2,: 20} on the state space (0, 1} with transition probability matrix P . Which of the following statements, are necessarily true? (a) When Lo {tints =i (0 te RL =A] converges for i=, 1, but the limits depend onthe inital dstbuton v () When r(? Qetine x, <1] exits and is positive for all choices for the initial distribution v ei? "Pamala does not exist for any’ choice of the initial distribution (c) When 7) lim P[X, =] (4) When e("? 1 always exists, but may be 0 for some choice of the initial distribution v. 23. 24, Let X,,be the result of then-th roll of a fair die, n21. Let S,=)°X;and ¥, be the last digit of S,, for > 1 and Yo = 0. ‘Then, which of the following statements are correct? (a) (¥:020} is an imeducibje Markov chain (b) {¥,: 2 0) is an aperiodic Markov chain (©) PQ, =0)-94 asm 1 @.) PU, =0) ++ asn—>00 (@) PO, =O) 75 as Let {X, :226} be a Markov chain on a finite state space $ with stationary transition probability matrix. Suppose that the chain is not irreducible, Then the Markov chain (a) Admits infinitely many stationary distributions (b.) 
Admits a unique stationary distribution (€) May not admit any stationary distribution (4) Cannot admit exactly two stationary distributions i Ha Kas, Near TAT, New Dei 11016, Ph (11) 285757, Cas SIRI K 976TH, BRSBATOD rma nldgrnrdemy om Web ww donc com 1. 3. ASSIGNMENT SHEET-2 Suppose that the random variable X has a -4,—_‘let ¥; and % be two random variables with uniform distribution, x ~Uf0, 1], and that, Joint density once =x has been observed, the conditional distribution of = is [vx=x]~o[s] . Then which of the following is/are true? a) eee oan) @) «(r)={2 ) I=(5 (a) P(y=2 aa (b) P(%,=0)=q? © re-(4) @) PU =0)=4 © £Ui-n)=0 @ voy-(Z) (@)% and ¥% are Binomial random 144) variables ‘Suppose that the random variable X and Y 5. Suppose that X is a random variable having have a continuous joint distribution with pdf finite mean pt and variance o? . Then: fay), means py &." py respectively. variances of & o} respectively and @ correlation p Suppose also that ) £(r1s)=Bo-+ Bie 2 (re)= Bo +r (©) Pllx-u[2e)s1-5 for e>0 Which of the following is/are true? é ()f »Flsv)av =(Bo +B.x) fe (2) CD P(K-uze)si-p for 470 ~ 6 Let {X,} be any sequence of random ) Hy =Bo +n variables. Then for the sequence {X,} to (©) oxay +r “Buty +Pi(o% +H) satisfy the weak law of targe numbers, the condition, for Y=" "SX, that (@_=22E and By = Hy —Putex : ‘ : AE) so wre Suppose that a random sample is to be taken ler] from a normal distribution for which the (a) necessary and sufficient conditon value of the mean @ is unknown and the (b) necessary condition standard deviation is 2. How large a random e oF ‘sample must be taken in order that (© sufficient condition (@) neither necessary nor suffi P(VF-0]<0.)2095 vo contion Note that: Zpq:s =1.96 i.e., 1 © Eales variate with parameters 1 eee) SO {@) the random variable 4 has no memory (a) 40 (b) P(X >res|X>s)=P(X>r) ) 28 (©) X follows exponential distribution () 50 ‘with parameter 6 @ 3 (@ _allthe above are true [I Fn ia er RNa Reg DUP I ETN, a TREN HTT TTT ras inlolaisacaden com Wo 8. If X and ¥ are two standard normal variates 12. Suppose X is @ random variable with and p is the correlation coefficient between £(X)=Var(X). Then the distribution of X X and 7. then the corctaton coefficient Oy eeeeen oan ake (b.) Is necessarily Exponential @ e fe.) Is necessarily Normal 4.) Cannot be identified from the given data. ® » ( identified from the gi : 13. Batteries for torch lights are packed in boxes. © efi-r") of 10 and a fot contains 10 boxes. A quality inspector randomly chooses a box and then (@) none of the above is true. checks two. batteries selected randomly 9 Let X and ¥ be two variables which can be Without replacement from that box. The lot correlated by an equation aX +bY-+c=0. It pelle ede ty ont Gi eo ca @ and b are of the different signs then the mciceiee Binns Onto be) oe ocire Lupe oee) correlation coefficient between X and ¥ is: that 9 of.the 10 boxes in the lot contain no defective batteries and only one box contains (= 2 defective ones. What is the probability that the lot will NOT be passed by the Inspector? wy @ 197 ot (9 4950 (a) Between -J and 1 »y 2 10. A system of 5 identical units consists of two ©) 395 parts A and B which are connected in series A Part A kas 2 units connected in parallel ‘and eS part B has 3 units connected in parallel, All the 5 units function independently, with at probability of faire 1. Then the reliability, “ 2 14, Which of the following is/are cumulative of the system is distribution function(s) (c.d.£) of random variable(s)? 
(ay 3 el 2 0, <0 ) 103-| u x30 (h) a x<0 1 wo nco-{, x50 Os 0. «<0 ay 2! AO oo (a) vd 0 <0 11. Suppose ,,X,,... is an iid, sequence of @ Rn ={12, osx random variables with common variance x20 1g Ise o> 0. Let La Xya and Z,=— dy, 15. Let X be a random variable taking values in a ‘ “ set E. Let Then the asymptotic distribution (as n>) P(x>a+b|X >a)=P(X >) forall a,b,€ £. of Vn(¥,-2,) is ¢ 1x>a)=P(X>2) ‘Then which of the following is a possible (@) N(0.1) distribution of X? ©) N(0, ) (a, Poisson (b.) Geometric (©) N(0, 20°) (©) Long-normal (4) Degenerate at 0 (@.) Exponential (Mn R CeCe a Mma i Stl taiinsdoeyene Msn aberedon Reo REEL Ser ON cay 16. Suppose X and Y are independent NOI) 20, Suppose x°~M(0, 1) and Y~ 22. Which of random variables. the following is always correct? x x Let U =< and V =<. Then WL) X24 ~ Z2, tu =X ii @) Pay 2, (4) U and V are independent yen s, (00 and Veh mae sttion E ©) Pu (©) E(x? +Y)=14n (@) Var(X4Y)=142n 2A. Let X,, Xp... be iicd, sandom variables with 17. Suppose X.Xj,... 18 a sequence of iid. random veriables where mean 0 and variance 2 and let y= 2-44. PU, = PEK, #0), F=1,2, for 121. Then the limiting distribution of aS () (44% 4.04%) is normal with ze, and a= Piz~ p> pt ttl Let z 02 (and @=P\Z~p/>0.1) Ta ‘mean O and variance | Then for all p 1 i (D) {Ke 4%, +..+%,4) is normal with @yasi Te 5 ) 1 | (b) as.05 mean 0 and variance 1 (©) @>01 | (e) Ej ey 47 44%) is normal with — | (a) 2-0 ve mean Oana variance 4 18. Random variables x,,.,,.X, are such that ‘correlation (.X,, X,) = correlation (¥, X,) = @ ple 4K +¥, ttK.,)is normal with — | correlation (X;, X,)= 2 mean 0 and variance 1 (a) p cannot be negative 2 (KK) (Xp KJoon(Xo X) are independent (b. p can take any value between I and +1 and identically distributed random vectors where Xs normally distributed with mean 0 ©) o p2-as and variance 1 and. (@) p iseither +1 or—1 Pf Pk are independently distributed for 2}=1.. Further, x, and ¥, 19. Suppose that we have a data set consisting of ZI 25 observations, where each value is either 5 or 10. smand (a.) The mean of the data cannot be larger 2, KY Ayhy eat (A XM. Then than the median. (a.) Z,is ly distributed for each nm (b.) The mean of the data cannot be smaller Se aenmene me than the median. (b,) 2, is symmetrically distributed about 0 (€) The mean and the median for the data will be the same only if the variance of © (%)-« the data is zero a (4) The mean and the median for the data @ ae A will be different only if the range is 5. Nn [HAY (Et Fao) Sora Hows Khas Net LT, New Ds 01, Po) 2557, Cals OLR & OTCIT, ERREATS Website wn dipeeaen.com 24, 25. 26, Let F(x), Ga) and H() be the joint ¢. d. £ of (X,Y), marginal c. d. f. of X and marginal ¢. d. f of Y respectively. Define L[Xsa yp forse Lxea Hl Yoo Where a and b ate fixed real numbers. Then. {a3 l€Cov U,7) = then F(2,») =G0) HO) for all x andy (b) If F(,9) = GHG) for all x and y then Cov (U.P) =0 (c.) IFU and V are independent then X and Y are independent. (A)IEX and Y are independent then U and V are independent. ‘Suppose X and ¥ are independent random variables where ¥ is symmetric about 4. Let U=X+¥ and V=¥-Y , Then (a.) U and 7 ate always independent (QU and ¥ have the same distribution (©) U is always symmetric about 0 (d.) / is always symmetric about 0 Let X,X,,. be iid. (1,1) random vatiabies, Let S,= A? 4X3 +.442 for m1 ren tint) ws (b)6 (1 (4.0 Which of the following conditions imply independence of the random variables X and Y (a) p(X >a|¥>a)=P(X >a) Forall aeR (6) p(X > al ¥ a) forall abe R (c.) 
X and Y are uncorrelated, (a) E[(x~a)(¥~6)]= E(x ~a) B( alla, beR ‘Suppose that we have a data set consisting of 25 observations, where each value is either 0 orl (a.) The mean of the data cannot be larger than the variance. (b.) The mean of the data cannot be smaller than the variance (€) The mean being same as the variance implies that the mean is zero (2) The variance will be 0 if and only if the ‘mean is either 1 or 0. ¥-0) for 28, 29. 3. Cee O Reno) Consider the following five observations on (KY); (0404 (L.2),(2,3).3,2).4,)). Then (a) The least-square linear regression of Y on Xis ¥=2 5 (b.) The least-square linear regression of X on Y is ¥=2 (c) The correlation coefficient between X and Y is 0. (d.) The correlation coefficient between X and Y is +1 Let X.Xjo. be independent random variables with x, being uniformly distributed between R be defined by f(x)=i, “if x20 and f(x)=-1, if x<0. Let UW be defined by U=Ixi-f(H), FIs, W=Z. f(x). Then (a) U and ¥” are independent each having a 'N(0, 1) distribution. (b.) U and W are independent each having 1N(0, 1) distribution, (©) ¥ and W are independent each having M(0, 1) distribution, (4) U,V and W are independent random variables. Let X, and X, be two independent random 38. variables with X,~ binomial (m2) and B 39. binomial (a), men. Whi of te following are slays et? (@.) 24, £34, ~ binomial (2m--3n, 4) (by) X,-X, ¢m~ binomial (mand) (6) Conditional distribution of x, given (X+%)is hyper-geometrc (@) Distribution of x,-x, is symmetric bout 0. Let 4 be an event, which has non zero probability such ‘that P(AMB,)=P(B)P(ALB,) and P(AnB,) = P(A)P(B, | 4)- Then are: (4) exhaustive, may not be mutually exclusive and none of which has zero probability (b. exhaustive, may not be _mutvally exclusive and atlcast one of which has a non-zero probability. (©) exhaustive, mutually exclusive and atleast _one of which has a non-zero probability. (4) exhaustive, mutually exclusive and none of which has zero probability. Let 1 balls be put at random into » cells, then the probability that each cell will be ‘occupied is: (a) Un" (b) In (©) nti" @ va Let X,.Xzu0%, bem independent observations from distribution with density fanetion (+) and distribution fuention Then the — density funetion in(X1,Xay-rXn) A @) AfPOIT 70) ©) aftr OY 0) ) [I-FO)T' £0) @) [FT FO) It p(x)=2", . then P[]X~2|<2] (a) cannot be computed from Chebychev’s inequality. (b) can only be computed from Chebychev's inequality. (c) computed from Chebychev's inequality gives a result identical to that obiained from direct compulation, (@) computed from Chebychev's inequality gives as precise a result as obtained Khas, Near ELT, New Das 11006 TAI, (Pet Few ina OT BGS, Ca 9991 K DOTGITH, RRR rma infetanscrden co: Webs wer dptacadn com from direct compulation. | | 41. 42. 43. If X and Y are two independent non- negative integer-valued random variables such that, P(X=k)>0 and k= 0,12, PUY=k)>0 for and the conditional distribution of X given X+Y is binomial, then: (@.) both 5 and Y are binomial (b.)_X is binomial and ¥ is Poisson. (c:) _X is Poisson and Y is binomial. @,) both. and ¥ are Poisson If X follows a binomial distribution with Parameters n and p, then 2=2=") jg 74 ‘asymptotically normal (@)_n is large and neither p nor g is close to one. (b) a is large and neither p nor q is close to zero. (©) 7 is small and neither p nor q is close to zero, (4) 7 is small and neither p nor g is elose to one, Let X be a Poisson random variable with Parameter 2. 
Then as A> (a) (X~A)/2 is asymptotically normal. (b) (X-Vk)/Vi is asymptotically normal (©). (X-2)/E is asymptotically normal (d.) none of the above. Let there be 1V units in a population and let M units be added to it before a sample of size n is selected. Then the probability of the selection of any unit at a specified draw in simple random sampling is © Gna eon 44. 45. 46. 41. Let ¥,,( Quam) be @ random sample of size m drawn from a normal population having mean p and variance o?, then for L(x - F’ , the distribution of the statistic (a) student's freedom. with (n-1) degrees of (b)_ normal (©) chi-square. (di) standard normal, Let 1 be any statistics, (1) be the bypothetical or expected value of t, V(¢) be the variance of ¢ and SE(¢) be the standard error of ¢, then Z is normal with mean 0 and variance 1, where Z. is: (@) [r-e(9]/F() ©) [-2(0/ EO (©) |-E(0)/SEO (@) [+-£(9)/ 58 (0) X~U(0,1) and ¥~ B(10,X), then ¥(Y) is f@) 5 (b) 6 () 10 (@) 12 Which one of the following statements is false? Characteristic functions are: (a) independent of origin and scale. (b.) independent of origin and not of scale. (c)_ independent of scale and not of origin. (d.) independent neither of origin nor of ey Da 1006 Ph (11) 6857597, Col STS & 9W91GTTN, RSEEATOD 49. 50. Which one of the following statements is true? (a) Correlation coefficient between A and Y is independent of units of measurements of the variables X and y (b) Let p(X.) be the correlation coefficient between X and Y. Then the correlation between U and Vis the same as that between X and Y, where Kea). (¥-b) = ana y= for a,b,c and d 10 be four arbitrarily ‘chosen constants (€.) Correlation coefficient is independent of origin and scale. us (4) 16 GD and oP) oy and oy, are standard deviates of where and Y respectively, then Lea : which lie between =I and 1, is also's correlation coefficient Which one of the following statements. is true? fa} If X and Y_ are two independent random variables and Z=X+Y is‘chi- square distributed with n degrees of freedom. For X to be chi-square distributed with n, degrees of freedom, ¥ should also be chi-square distributed with nn, degrees of freedom for n>m (b.) For large values of n, 272 ~V2n=1 is nearly normally distributed with mean 0 and variance 1 (©) The mean and variance of a chi-square distribution with » degrees of freedom are n and 2n respectively. (d.) For a chi-square distribution with » degrees of freedom, the moment generating and the characteristic functions are (1-21? and (1-20)"? respectively. Let y <<. be the order statistics from. uniform distribution over (0,1) , then ‘he cortelation coefficient between % and ¥, @ 1 (b) 0 (€) directly proportional to sample size (@) inversely proportional to sample size 52. 53. Xf X-B(n,p), then con (a) "8 (o) 2a ©) (a) 2 Let_X be a positive integer vatued random variable. Then its expected value is given by @® Sepro4 a ) YP[xee a © Lesa a (@) Slr =4] a Let X; and x, be two independeat standard normal variables, Let v= @2—4" v(Y) is @t (b) 2 12 (@) None of these Hundred (100) tickets are marked 12,100 and are arranged at random. Four tickets are Picked from these tickets and are given to four persons 4,B,C and D . What is the probability that A gets the ticket with the largest value (among 4,B,C)D and gets the ticket with the smallest value (among ABCD)? Then 1 wy l (a) n "HAF Foo) a Sera Maus Khas Near ew De 06 Pas) AST, Cak STR & HOTTCT RTT Iaftaurcrdercant Wea wn dunsdeu et as eB | 55. 56. 37. 
Ty Let X and ¥ be independent and identically distributed random variables such that 58. P(X. W=|X-Y|. Then which statement is not correct? (a) Xand W are independent (b) ¥ and W are independent (c.) Zand W are independent. (@) Zand W ave independent Let X)~(0,l)and let a} Psa s2 2° | x otherwise correct statement Then identity the (4) corr(X4:X2)=1 (0) Xz does not have 1V(01) distribution (©) (XjX2) has a bivariate normal distribution. (@) (X,X2) does not have a bivariate norinal distribution Let X-N3(u8) where y=(UJ,l) and rir Yi ae ie 2 and -Xy +X a) -2 (0 ()2 @t ‘The value of csuch that X', LX are independent is Let (02,F, P)be a probability space and let A be an event with P(A)>0. In which of the following cases does Q define a probability measure on(,F)? (a) OD) = PAU DVD EF (b) QD) = PAN D)YDeF P(A|D),if D € FwithP(D)>0 ©) Q0)= feed (@) Qib)= PD ADE F ‘The ‘joint probability density function of Nis FY) (1-¥),00 A sufficient statistic for 0 is (a.) Sample mean (b,) Sample median «) Slog) @) Y# 5, X,,Xz are two independent observ from B(I,0). The statistic X, —X is (a) Sufficient and complete (b3) Neither sufficient nor complete (©) Complete but not sufficient (@) Sufficient but not complete Which one of the following is a true statement if g is any continuous function? (a) If Tis consistent for then g(7) is consistent for 0 (b.) If T is unbiased for 6 then (7) is unbiased for 0 (©) If Tis sufficient for 6 then (7) is sufficient for 0 (@) All the above three options are false Three random observations X,X3,Xy are made on a Poisson distribution with parameter A Let g(2)=e. Then the Cramer Rao lower bound for the variance of unbiased estimators of g(2) is given by es w 3 ey (e) Bem (a) Let Xi,Xq..%, be @ random sample fiom a distribution with density [ocr* Hemel x>0;050 0; otherwise ‘The moment estimator of 0 is @) DA (b.) Sample median ©) n/D% (a) (xe AT, ln) fi Sara Howe (CLUE, New Da T10016 Fa. Wy ml ase com: Wes wm chem 7, Cale PAIR aN OPTS BSR 10. a. Ips: Let Xj XgnuX, be a random sample of observations from a population with mean 0 and finite variance. For estimating @, the (a) Biased and consistent (b), Unbiased but not consistent (c.) Unbiased and consistent (4) Biased and not consistent Which one of the following statements is true? (a) In sufficiency we reduce original random variables to a few statistics for the purpose of drawing inference about parameters. (b) Sufficiency amounts to replacing Xi XoenX, by a few statistics TsToroF, in the process of discarding the information which is irrelevant to 0 (€)FiTavoT; Will be called sufficient to 0 if the conditional distribution Sof TTarole SVEN XY Xan Xo. is independent of 0 (a) A statistics is said to be sufficient if and only if the point probability density function of random variables can be expressed as two factors, where one factor depends on parameter(s) and observations through the statistics white the other is independent of parameter(s). Which one of the following statements is true? (2) The arithmetic mean of a random sample from sorte population with a finite mean is unbiased for the population mean. (b.) If the population variance o? is finite then the sample variance 2 Bd-4F based on a sample %;,¥j..%, is an unbiased estimate of 0 (€) If the r population moment exists, then r® sample moment is unbiased for the corresponding population moment (4) In general, the fiction of any unbiased estimates is unbiased 12. B. 14, 15. . 
Let and x be two independent observations of a Bernoulli random variable that takes values 1 or 0 with probabilities © and (1-6) respectively. If oe[.3] | which of the following is not maximum likelihood estimate of @ is? Ss ian es, ~“ Pe a 3425, 4, aera Xf distribution possesies the monotone likelihood ratio property, then which of the following is incorrect? (a) for every randomized test, there exists an equally good non randomized test (b) there exists a uniformly most powerful test for one-sided alternatives for testing composite hypotheses (©) the liketitood ratio test statistic is distributed normally (a) expectation of the likelihood ratio test statistic is equal to zero. @ ) c) @ Let ¥),Xgs0X, be a random sample from U(0,8). Which of the following is not an unbiased estimator of 0? @ Lala ) (C2 )mm(a, Xy) © (224) min(x. .X,) ()_ (Ma (24.024,)—Min(%-%)) Which one of the following statements is false? (a)For m pairs of observations the maximum possible total score in Kendall's tau is. m(n—1)/2 (b.) Kendall’s tau can not be obtained from Karl Pearson's correlation coefficient. (c)) Kendall's tau can not be obtained in the case of ties (4) Kendall’s tau can not be adapted to the theory of sampling as Spearman’s rank correlation coefficient can be. AAT, tia i PG) BGS, smi TT RTD EB | 17. 18. Which one of the following statements is ‘rue? =e (a.) In sufficiency we reduce original random variables to a few statistics for the Purpose of drawing inference about parameters (b) Sufficiency amounts 0 replacing Xi XoonX, by a few statistics FsbyF, it the process of discarding the infonnation which is irrelevant to (©) TisTasoT will be called sufficient to 0 if the conditional distribution of TTavnky given XyAyynXy iS independent of 0 (4) A statistics is said to be sufficient if and only if the point probability density function of random variables can be fAPressed as two factors, where one factor ‘depends on parameter(s) and observations through the statistics while the other is independent of parameter(s). Which one of the following statements is true? (a.) The arithmetic mean of a random sample from some population with a finite mean 's tmbiased for the population mean. (b,) If the population variance a? is, finite then “the sample variance based on a sample XXX, 8 am unbiased estimate of o? (©) If the + population. moment 7 then +™ sample moment is unbiased for the conesponding population moment (@.) In general, the function of. ‘any unbiased estimates is unbiased Let and x, be wo independent observations of a Bernoulli random variable that takes values 1 or 0 with probabilities 0 and (1-6) respectively. 1¢ oe[ 2 43]. which 33 of the following is not maximum likelihood estimate of 8 is) 3425 42s (d.) 1" 2 @ BA 8A, Ft le aa Haat Khas, Ne 19. 20. 2. 22. 23, TE New Dab 110016 Ph: (I) 2650S, Ca 991854 KDOOTRTTOG RTD mal ntiiracadeny com; Westen nsendenn com XjonXy is a random sample from the U(a-0,+0),aR, then the characteristic function of X,-% is @) (4) ©) PO) | @) (7) @ af!) If Xiu, is a random sample of size *n’ from Cauchy (10) , then a consistent estimator for 0 is (a) Sample Mean (b.) Saruple variance (ce) Sample Median (a) Maxima of (2...) If a distribution possesses the monotone \ikelihood ratio property, then which of the following is incorrect? : (a.) 
for every randomized test, there exists an equally good non randomized test (b) there exists a uniformly most powerful test for one-sided alternatives for-testing composite hypotheses (©) the likelihood ratio test statistic is distributed normally (4, expectation of the likelihood ratio test Statistic is equal to 720, If, and W, are the most powerful critical regions of size a and a) then a 0 a sufficient statistics for 0 is @) DX (b) Sample median (ce) Yios(t-%,) @ De? Let (Xj,X2o00X~) be a random sample of observations with mean qt and finite variance. Then for estimating 1, the statistic Six, [n(n is (a2) unbiased and consistent (b.) biased and consistent (€) unbiased but not consistent (@.) biased and not consistent T, 29, 30. 31. ;,%;,%3 are independent observations from a normal population with mean @ and variance unity. Two statistics Tj and % are defined as FRAN ang 7, Xt 4 a 3 The efficiency of 7 relative to 7, is given by @) ) © (a) Let § be a test for testing 1-0. a against K-06 0,. Then the size of the test of Assertion (A): We speak of consistency of a sequence of estimates rather than one point estimate. Reason (R): Consistency is essentially a large sample properly. Choose the correct answer. (@) Both A and R are true and R is the correct explanation of A. (b.) Both A and R are true but R is not the correct explanation for A. (€.) Ais true but Ris false (4) Ais false but R is tre. Match List 1 and List 2 and select the comect answer using the codes given below the list. List t List 2 1 Randomized A Probability of ejecting Test the null hypothesis when the altemative hypothesis is true. 2 Non B o Let ¥ bea p dimensional random vector which follows (9, /,) distribution and let 4 bea real symmeirie matrix. Which ofthe following is tue? salen (a) \" AX has a chi-square distribution if A = Abutthe converse is not true (IF X”AX has a chi-square distibution then the degrees of freedom is equal to P (©) If ¥7 Ax has a chi-square distibution then the characteristics roots of 4 are either 0 or | (IE X Ax has a chi-square distibution then 4 is necessaily positive definite For a random variable ¥’ with probabil density function preabilty F(x)= (eax Jel") 250,020, the hazard function can be (#) Constant for some a (b.) An increasing function for some (¢.) A decreasing function for some q (d.) A bathtub-shaped function for some @ The hazard rates of two lifetime variables 7 and 7, with respective ed.fis F(0) and A(Dandpags f(0) and f(t), are I(O=30 and (0) =40, > 0 respectively Thea ely @) ROSA foratte>0 (b) R(1 (©) E(t) < (0) (€) AW 0 5. The failure rate of a parallel system of ewo components, where the component lifetimes are independent and have the exponential distribution with mean 2, is (a) A constant (b.) A monotone and bounded function (c) A monotone and unbounded function (@A non-tonotone function Let X,,X, and X, be independent with X,~N(h I, A,~M(-L 1) and X,~N(0, 1) Let 242 424%, 2 2424.4, Then which of the following statements ate always true? (a) 4, has a centeal chi-square distribution (b.) qz has a central chi-square distribution (©) ato has a central chi-square distribution (@.) 4, and g, are independent. Consider a series system with two independent components. Let the component lifespan have an exponential distribution with density a A000 f Ootherwise If n observation Xy,X,--X,, on lifespan of ‘this component are available and rs 1°; then the maximum likelihood nT estimator of the reliability of the system is given by wo (eee oy -{-e*) au (ee (aye ips. 
Suppose X is a random variable with following pat c £4 201 pe 28% slay = PO" #20 Pe 0,otherwise, 7) and0< p $1. Then the hazard function of X isa (a.) Constant function for p=Oand p =1 Ly (b.) Constant function for all 0< p <1 (c) decreasing function for0 < p <1 (d.) non-monotone function for all < p <1 Circuit 3 ‘Suppose that each of the three components fail with probability p and independently of OD ett ae each other, Let q; ~Prob (Circuit does not c fail); 71.2.3. For 0 4 A B (b) 1 > a. e (©) a> 4 C (dh) a2 > 93 a Circuit 2 i "A, Ft For a Sra How Kb, ear LT, Now Dai10016, Ph) ASTER, Cale DORIAN @ RIFE RAT mai naairnndem cmt Western
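The series/parallel reliability comparisons asked about above are straightforward to check numerically. The circuit diagrams themselves are not reproduced in the text, so the sketch below (assuming NumPy) uses generic configurations: it computes the exact working probability of a two-component series arrangement and a two-component parallel arrangement when each component fails independently with probability p, and it also confirms by simulation the expected lifetime 3/2 obtained in the June 2013 worked example for a parallel system of two independent Exp(1) components.

```python
import numpy as np

p = 0.2                                    # hypothetical failure probability of one component
q_series = (1 - p) ** 2                    # series of two components: both must work
q_parallel = 1 - p ** 2                    # parallel pair: works unless both fail
print("P(series works)   =", q_series)
print("P(parallel works) =", q_parallel)   # parallel >= series for every p in [0, 1]

# Monte Carlo check of the earlier worked example: the expected life of a parallel
# system of two independent Exp(1) components is E[max(X1, X2)] = 3/2.
rng = np.random.default_rng(1)
x1 = rng.exponential(scale=1.0, size=1_000_000)
x2 = rng.exponential(scale=1.0, size=1_000_000)
print("simulated expected life =", np.maximum(x1, x2).mean())   # approximately 1.5
```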
