
BINOMIAL AND RELATED DISTRIBUTIONS

SUBMITTED BY

OSAMA BIN AJAZ

(std_18154@iobm.edu.pk)

CONTENTS

Abstract
Bernoulli distribution
Binomial distribution
Multinomial distribution
Beta binomial distribution
Correlated binomial distribution
Multiplicative binomial distribution
Neyman C(α) test
Testing goodness of fit of the binomial distribution
The C(α) test for correlated binomial alternatives
The C(α) test for beta binomial alternatives
The C(α) test for Altham's multiplicative alternatives
References

ABSTRACT

R. E. Tarone of the National Cancer Institute, Bethesda, Maryland, derives tests for the goodness of fit of the binomial distribution using the C(α) procedure of Neyman (1959); these tests are asymptotically optimal against the generalized binomial alternatives proposed by Altham (1978) and Kupper & Haseman (1978). Before turning to the article I first explain the binomial and related distributions. I have reproduced key parts of the article; a reader interested in the details of the article is advised to consult the references at the end of the report.

Bernoulli trial

A Bernoulli trial (named after James Bernoulli, one of the founding fathers of probability theory) is an experiment with two, and only two, possible outcomes [2]: for example, female or male, life or death, head or tail, success or failure. A sequence of Bernoulli trials occurs when a Bernoulli experiment is performed several independent times so that the probability of success, say p, remains the same from trial to trial.

Bernoulli distribution

A random variable X is defined to have a Bernoulli distribution if the discrete density function of X is given by

    f(x) = p^x (1 - p)^(1 - x)   for x = 0, 1
    f(x) = 0                     otherwise,

where the parameter p satisfies 0 <= p <= 1. Writing q = 1 - p, if X has a Bernoulli distribution, then

    E[X] = p,
    Var[X] = pq,
    M_X(t) = pe^t + q.

Proof

    E[X] = sum over x = 0, 1 of x p^x (1 - p)^(1 - x) = 0·q + 1·p = p

    M_X(t) = E[e^(tX)] = sum over x = 0, 1 of e^(tx) p^x (1 - p)^(1 - x) = q + pe^t
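The Bernoulli moments above can be checked by direct summation over the two-point support. This is a small sketch; p = 0.3 and t = 0.5 are arbitrary illustrative values, not taken from the report.

```python
import math

# Illustrative values (not from the report): p = 0.3, t = 0.5.
p = 0.3
q = 1 - p
t = 0.5

support = (0, 1)
f = {x: p**x * q**(1 - x) for x in support}   # f(0) = q, f(1) = p

mean = sum(x * f[x] for x in support)                  # E[X]
var = sum(x**2 * f[x] for x in support) - mean**2      # Var[X] = E[X^2] - E[X]^2
mgf = sum(math.exp(t * x) * f[x] for x in support)     # M_X(t) = E[e^(tX)]

assert abs(mean - p) < 1e-12                           # E[X] = p
assert abs(var - p * q) < 1e-12                        # Var[X] = pq
assert abs(mgf - (p * math.exp(t) + q)) < 1e-12        # M_X(t) = pe^t + q
```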

Example 1: Out of millions of instant lottery tickets, suppose that 20% are winners. If five such tickets are purchased, (0, 0, 0, 1, 0) is a possible observed sequence in which the fourth ticket is a winner and the other four are losers. Assuming independence among winning and losing tickets, the probability of this outcome is (0.8)(0.8)(0.8)(0.2)(0.8) = (0.2)(0.8)^4 [5].

In a sequence of Bernoulli trials, we are often interested in the total number of successes and not in the order of their occurrence. If we let the random variable X equal the number of observed successes in n Bernoulli trials, the possible values of X are 0, 1, 2, ..., n. If x successes occur, where x = 0, 1, 2, ..., n, then n - x failures occur. The number of ways of selecting the x positions for the x successes in the n trials is

    C(n, x) = n! / (x! (n - x)!).

Since the trials are independent, and since the probabilities of success and failure on each trial are, respectively, p and q = 1 - p, the probability of each of these ways is p^x (1 - p)^(n - x). Thus f(x), the p.m.f. of X, is the sum of the probabilities of these C(n, x) mutually exclusive events, that is,

    f(x) = C(n, x) p^x (1 - p)^(n - x)   for x = 0, 1, 2, ..., n,

and the random variable X is said to have a binomial distribution.

A binomial experiment satisfies the following properties:

1. A Bernoulli experiment is performed n times.
2. The trials are independent.
3. The probability of success on each trial is a constant p; the probability of failure is q = 1 - p.
4. The random variable X equals the number of successes in the n trials.

A binomial distribution is denoted by the symbol b(n, p), and we say that the distribution of X is b(n, p). The constants n and p are called the parameters of the binomial distribution. Thus if we say that the distribution of X is b(10, 1/5), we mean that X is the number of successes in n = 10 trials from a binomial distribution with p = 1/5.

The binomial distribution derives its name from the fact that the (n + 1) terms in the binomial expansion of (q + p)^n correspond to the various values of b(x; n, p) for x = 0, 1, 2, ..., n. That is,

    (q + p)^n = C(n, 0) q^n + C(n, 1) p q^(n-1) + C(n, 2) p² q^(n-2) + ... + C(n, n) p^n,

and since q + p = 1, it follows that

    sum over x = 0, 1, ..., n of b(x; n, p) = 1.

Example 2: If we want to find the probability of obtaining exactly three 2s when an ordinary die is tossed 4 times, then the probability is

    b(3; 4, 1/6) = C(4, 3) (1/6)^3 (5/6)^1 = 5/324 ≈ 0.0154.
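The die computation can be verified directly; `binom_pmf` is a small illustrative helper (not from the report), with the binomial coefficient supplied by `math.comb`.

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x) -- illustrative helper."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

prob = binom_pmf(3, 4, 1/6)          # exactly three 2s in four tosses
assert abs(prob - 5/324) < 1e-12     # C(4,3)(1/6)^3(5/6) = 5/324
# sanity check: the pmf sums to 1 over x = 0, ..., 4
assert abs(sum(binom_pmf(x, 4, 1/6) for x in range(5)) - 1) < 1e-12
```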

The mean, variance and moment-generating function of b(x; n, p) are

    μ = np,
    σ² = npq,
    M_X(t) = (q + pe^t)^n,

respectively.

Proof

    M_X(t) = E[e^(tX)] = sum over x = 0, ..., n of C(n, x) (pe^t)^x q^(n - x) = (pe^t + q)^n.

The first derivative is

    M'_X(t) = n pe^t (pe^t + q)^(n-1),

so E[X] = M'_X(0) = np. The second derivative is

    M''_X(t) = n pe^t (pe^t + q)^(n-1) + n(n - 1)(pe^t)² (pe^t + q)^(n-2),

and hence

    Var[X] = E[X²] - {E[X]}² = M''_X(0) - (np)² = n(n - 1)p² + np - (np)² = np(1 - p).

Example 3: If the moment-generating function of X is

    M_X(t) = (2/3 + (1/3)e^t)^5,

then X has a binomial distribution with n = 5 and p = 1/3; that is, the pmf of X is

    f(x) = C(5, x) (1/3)^x (2/3)^(5 - x)   for x = 0, 1, ..., 5.

Note: the binomial distribution reduces to the Bernoulli distribution when n = 1. Sometimes the Bernoulli distribution is called the point binomial.

Example 4: Let the random variable Y be equal to the number of successes throughout n independent repetitions of a random experiment with probability p of success. That is, Y is b(n, p). The ratio Y/n is called the relative frequency of success. Now recall Chebyshev's inequality, i.e.

    P(|X - μ| >= ε) <= σ²/ε²   for all ε > 0.

Applied to Y/n, this gives

    P(|Y/n - p| >= ε) <= Var(Y/n)/ε² = p(1 - p)/(n ε²).

Now, for every fixed ε > 0, the right-hand member of the preceding inequality is close to zero for sufficiently large n. Since this is true for every fixed ε > 0, we see, in a certain sense, that the relative frequency of success is, for large values of n, close to the probability p of success [3].
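A simulation sketch of Example 4 (illustrative values p = 0.3 and ε = 0.05, not from the report): the empirical exceedance probability of the relative frequency stays within Chebyshev's bound p(1 - p)/(n ε²), which shrinks as n grows.

```python
import random

random.seed(1)
p, eps = 0.3, 0.05          # illustrative values
for n in (100, 5000):
    trials = 500
    exceed = 0
    for _ in range(trials):
        y = sum(random.random() < p for _ in range(n))   # Y ~ b(n, p)
        if abs(y / n - p) >= eps:
            exceed += 1
    bound = min(1.0, p * (1 - p) / (n * eps**2))         # Chebyshev bound
    # empirical P(|Y/n - p| >= eps) respects the bound (up to simulation noise)
    assert exceed / trials <= bound + 0.05
```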

Example 5: Let the independent random variables X1, X2, X3 have the same cdf F(x). Let Y be the middle value of X1, X2, X3. To determine the cdf of Y, say F_Y(y) = P(Y <= y), we note that Y <= y if and only if at least two of the random variables X1, X2, X3 are less than or equal to y. Let us say that the ith trial is a success if X_i <= y, i = 1, 2, 3; here each trial has the probability of success F(y). In this terminology, F_Y(y) = P(Y <= y) is then the probability of at least two successes in three independent trials. Thus

    F_Y(y) = C(3, 2) [F(y)]² [1 - F(y)] + [F(y)]³.

If F(x) is a continuous cdf so that the pdf of X is F'(x) = f(x), then the pdf of Y is

    f_Y(y) = F'_Y(y) = 6 [F(y)] [1 - F(y)] f(y). [4]
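A quick simulation check of Example 5, assuming X_i ~ Uniform(0, 1) for illustration (an assumption, not in the original): then F(y) = y and the cdf of the middle value reduces to 3y² - 2y³.

```python
import random

random.seed(7)
y0 = 0.4                       # arbitrary evaluation point
trials = 100_000
hits = 0
for _ in range(trials):
    sample = sorted(random.random() for _ in range(3))
    if sample[1] <= y0:        # the middle value of the three
        hits += 1

# C(3,2) y^2 (1 - y) + y^3 = 3y^2 - 2y^3 when F(y) = y
theory = 3 * y0**2 - 2 * y0**3
assert abs(hits / trials - theory) < 0.01
```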

MULTINOMIAL DISTRIBUTION

Recall that in order for an experiment to be binomial, two outcomes are required for each trial. But if each trial in an experiment has more than two outcomes, a distribution called the multinomial distribution must be used. For example, a survey might require the responses "approve", "disapprove", or "no opinion". In another situation, a person may have a choice of one of five activities for Friday night, such as a movie, dinner, a baseball game, a play, or a party. Since these situations have more than two possible outcomes for each trial, the binomial distribution cannot be used to compute probabilities.

If X consists of events E1, E2, E3, ..., Ek, which have corresponding probabilities p1, p2, p3, ..., pk of occurring, and X1 is the number of times E1 will occur, X2 is the number of times E2 will occur, X3 is the number of times E3 will occur, etc., then the probability that X will occur is

    P(X) = n! / (X1! X2! X3! ... Xk!) · p1^X1 p2^X2 ... pk^Xk.

For illustration, let a box contain four white balls, three red balls, and three blue balls. A ball is selected at random, its color is written down, and it is replaced each time. Suppose we want to find the probability that if five balls are selected, two are white, two are red, and one is blue.

Beta binomial distribution

The distribution with discrete density function

    f(x) = f(x; n, α, β) = C(n, x) · Γ(α + β) Γ(x + α) Γ(n - x + β) / (Γ(α) Γ(β) Γ(n + α + β))   for x = 0, 1, ..., n

is called the beta binomial distribution. The beta binomial distribution has

    mean = nα / (α + β)

and

    variance = nαβ(n + α + β) / ((α + β)² (α + β + 1)).

If α = β = 1, then the beta binomial distribution reduces to a discrete uniform distribution over the integers 0, 1, ..., n. [2]
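The beta binomial density and its moments can be checked numerically with `math.gamma`; n = 10, α = 2, β = 3 are arbitrary illustrative values.

```python
from math import comb, gamma

def beta_binom_pmf(x, n, a, b):
    """C(n, x) G(a+b) G(x+a) G(n-x+b) / (G(a) G(b) G(n+a+b))."""
    return (comb(n, x) * gamma(a + b) * gamma(x + a) * gamma(n - x + b)
            / (gamma(a) * gamma(b) * gamma(n + a + b)))

n, a, b = 10, 2.0, 3.0     # illustrative parameter values
pmf = [beta_binom_pmf(x, n, a, b) for x in range(n + 1)]
mean = sum(x * v for x, v in enumerate(pmf))
var = sum((x - mean)**2 * v for x, v in enumerate(pmf))

assert abs(sum(pmf) - 1) < 1e-12                                       # valid pmf
assert abs(mean - n * a / (a + b)) < 1e-9                              # = 4.0
assert abs(var - n*a*b*(n + a + b) / ((a + b)**2 * (a + b + 1))) < 1e-9  # = 6.0
# alpha = beta = 1 gives the discrete uniform on {0, 1, ..., n}
assert abs(beta_binom_pmf(3, n, 1, 1) - 1 / (n + 1)) < 1e-12
```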

Correlated binomial distribution

Kupper and Haseman (1978) considered a generalization of the binomial distribution in which the binary responses are correlated; in teratology experiments, for example, the responses of the fetuses in a litter are not mutually independent. This idea is due to Bahadur (1961). Retaining only the first-order correlation between the responses and denoting by θ the covariance between the binary responses of any two fetuses, the random variable X is such that

    P(X = x) = C(n, x) p^x (1 - p)^(n - x) [1 + (θ/(2p²q²)) {(x - np)² + x(2p - 1) - np²}],

where p is the probability that the fetus is abnormal and q = 1 - p. Note that for the above equation to be a valid probability distribution, a data-dependent bound for the parameters has to be imposed; see Kupper and Haseman (1978). It can be shown that the expectation and variance of the correlated binomial distribution are np and np(1 - p) + n(n - 1)θ, respectively. Thus, the correlated binomial distribution is a generalization of the binomial distribution; the CB distribution becomes the binomial distribution when θ = 0. Altham (1978) derived a further two-parameter generalized binomial distribution, namely, the multiplicative generalized binomial (MB) distribution.
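The stated CB moments can be verified numerically, assuming the first-order Bahadur form of the pmf (a reconstruction; n = 6, p = 0.4, θ = 0.01 are illustrative values small enough to keep all probabilities nonnegative).

```python
from math import comb

def cb_pmf(x, n, p, theta):
    """First-order Bahadur (correlated binomial) pmf -- reconstructed form."""
    q = 1 - p
    base = comb(n, x) * p**x * q**(n - x)
    corr = 1 + theta / (2 * p**2 * q**2) * ((x - n*p)**2 + x*(2*p - 1) - n*p**2)
    return base * corr

n, p, theta = 6, 0.4, 0.01      # illustrative values
pmf = [cb_pmf(x, n, p, theta) for x in range(n + 1)]
assert all(v >= 0 for v in pmf)            # valid for this small theta
assert abs(sum(pmf) - 1) < 1e-12           # sums to one

mean = sum(x * v for x, v in enumerate(pmf))
var = sum((x - mean)**2 * v for x, v in enumerate(pmf))
assert abs(mean - n * p) < 1e-12                            # E[X] = np
assert abs(var - (n*p*(1 - p) + n*(n - 1)*theta)) < 1e-12   # np(1-p) + n(n-1)theta
```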

The probability mass function of the Altham multiplicative binomial distribution is

    P(X = x) = C(n, x) p^x (1 - p)^(n - x) a^(x(n - x)) / f(n, p, a)

for x = 0, 1, 2, ..., n, with a >= 0 and 0 <= p <= 1, where f(n, p, a) is the normalizing constant.
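A sketch of the multiplicative binomial, computing the normalizing constant f(n, p, a) by direct summation; n = 8, p = 0.3 are illustrative values. Setting a = 1 should recover the ordinary binomial.

```python
from math import comb

def mb_pmf(n, p, a):
    """Altham MB: weights C(n,x) p^x q^(n-x) a^(x(n-x)), divided by their sum,
    which plays the role of the normalizing constant f(n, p, a)."""
    q = 1 - p
    w = [comb(n, x) * p**x * q**(n - x) * a**(x * (n - x)) for x in range(n + 1)]
    f = sum(w)                      # f(n, p, a)
    return [v / f for v in w]

n, p = 8, 0.3                       # illustrative values
mb1 = mb_pmf(n, p, 1.0)             # a = 1: ordinary binomial
for x in range(n + 1):
    assert abs(mb1[x] - comb(n, x) * p**x * (1 - p)**(n - x)) < 1e-12

mb2 = mb_pmf(n, p, 1.1)             # a != 1: a genuine two-parameter family
assert abs(sum(mb2) - 1) < 1e-12
```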

Neyman C(α) test

Hypothesis testing problems in applied research often involve several nuisance parameters. In these composite testing problems, uniformly most powerful tests do not exist, which motivates the search for an optimal test procedure that yields the highest power among the class of tests attaining the same size. Neyman's local asymptotic optimality result for the C(α) test employs regularity conditions inherited from the conditions used by Cramér (1946) for showing consistency of the MLE, together with some further restrictions on the testing function that allow the unknown nuisance parameters to be replaced by √n-consistent estimators. It is the confluence of these Cramér conditions and the maintained significance level α that gives the name to the C(α) test.

TESTING THE GOODNESS OF FIT OF THE BINOMIAL DISTRIBUTION

R. E. Tarone of the National Cancer Institute, Bethesda, Maryland, derives tests for the goodness of fit of the binomial distribution using the C(α) procedure of Neyman (1959), which are asymptotically optimal against generalized binomial alternatives proposed by Altham (1978) and Kupper & Haseman (1978) [5].

Consider an experiment in which the responses take the form of proportions, and let the ith response be given by p_i = x_i/n_i for i = 1, ..., M. Under the correlated binomial model the log likelihood function is

    L = sum over i = 1, ..., M of [ log C(n_i, x_i) + x_i log p + (n_i - x_i) log q
        + log{1 + (θ/(2p²q²)) ((x_i - n_i p)² + x_i(2p - 1) - n_i p²)} ].

A test of the goodness of fit of the binomial distribution is obtained by testing the null hypothesis Ho: θ = 0 in the presence of the nuisance parameter p. Moran (1970) demonstrated that for such problems the C(α) tests proposed by Neyman (1959) are asymptotically equivalent to tests using maximum likelihood estimates. The test is constructed from partial derivatives of L evaluated at θ = 0, denoted S1(p), S2(p) and S3(p) below.

Under the null hypothesis, the x_i are independent binomial random variables, and hence it follows from (2) that E{S2(p)} = 0. Neyman (1959) has shown that when E{S2(p)} = 0 the null hypothesis Ho: θ = 0 can be tested using the statistic S1(p̂), where p̂ is a root-n consistent estimator of p (Moran, 1970). Substituting the consistent estimator

    p̂ = (sum over i of x_i) / (sum over i of n_i),

we find that the C(α) test statistic is given by

    S = S1(p̂) = (1/(2p̂²q̂²)) sum over i of {(x_i - n_i p̂)² + x_i(2p̂ - 1) - n_i p̂²}.

Since E{S2(p)} = 0, the variance of S(p̂) is given by E{S3(p)}, where the expectation is taken under Ho: θ = 0. From (3) it follows that

    E{S3(p)} = (sum over i of n_i(n_i - 1)) / (2p²q²).

Substituting p̂ for p, the standardized statistic

    X²c = S² / E{S3(p̂)}

has an asymptotic chi-squared distribution with one degree of freedom under Ho. The statistic X²c is the C(α) test statistic for homogeneity of proportions which is asymptotically optimal against correlated binomial alternatives.

The binomial variance test for homogeneity is based on the statistic

    X²v = sum over i of (x_i - n_i p̂)² / (n_i p̂ q̂),

which has an asymptotic chi-squared distribution with M - 1 degrees of freedom when θ = 0. It is clear from the above expressions that for the case in which n_i = n for all i, the C(α) test statistic S is equivalent to the variance test statistic X²v.

The beta-binomial distribution is a mixture of binomial distributions which has often been utilized as an alternative to the binomial distribution. Under the beta-binomial model the log likelihood function takes a similar form, and a test of the goodness of fit of the binomial distribution is again obtained by testing the null hypothesis Ho: θ = 0. The derivation of the C(α) test statistic using the beta-binomial model is similar to the derivation for the correlated binomial model, and the optimal statistic again is found to be the statistic S derived in the last section. Note, however, that in the beta-binomial model the parameter θ cannot take negative values. The alternative hypothesis is necessarily one sided, and hence the C(α) test is the one-sided test based on the statistic

    Z = S / sqrt(E{S3(p̂)}).

Under the null hypothesis Ho: θ = 0, the statistic Z will have an asymptotic standard normal distribution.


The multiplicative generalization of the binomial distribution provides an alternative for which the correlated binomial C(α) test is not asymptotically optimal. The log likelihood function for the multiplicative generalization of the binomial model follows from the MB probability mass function given earlier. The C(α) test for Ho: a = 1 is based on the statistic

    R = sum over i of x_i(n_i - x_i).

Note that, unlike the correlated binomial C(α) statistic, R is not equivalent to the variance test statistic in the case n_i = n for all i. The corresponding standardized statistic X²m will have an asymptotic chi-squared distribution with one degree of freedom. The test based on X²m is asymptotically optimal against alternatives given by the multiplicative generalization of the binomial model.

In order to compare the different tests of the goodness of fit of the binomial distribution we consider the treatment group data of Kupper & Haseman (1978, p. 75). The observed proportions were 0/5, 2/5, 1/7, 0/8, 2/8, 3/8, 0/9, 4/9, 1/10 and 6/10. The variance test gives X²v = 19.03 and P = 0.025; the correlated binomial C(α) test gives X²c = 6.63 and P = 0.01. Thus for this example, the correlated binomial C(α) test is more sensitive to the departure of the observed proportions from a binomial distribution than the other tests considered.
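Both reported values can be reproduced from the listed proportions. This is a sketch using the statistic forms reconstructed from the derivation (X²v with an n_i p̂ q̂ denominator; X²c built from the score S and its null variance), with p̂ = Σx_i/Σn_i = 19/79; both reproduce the published 19.03 and 6.63.

```python
# Kupper & Haseman (1978, p. 75) treatment group data as (x_i, n_i) pairs.
data = [(0, 5), (2, 5), (1, 7), (0, 8), (2, 8), (3, 8), (0, 9), (4, 9),
        (1, 10), (6, 10)]

p = sum(x for x, n in data) / sum(n for x, n in data)   # pooled p-hat = 19/79
q = 1 - p

# Variance test: X2v = sum (x_i - n_i p)^2 / (n_i p q), chi-squared, M - 1 df.
x2v = sum((x - n * p)**2 / (n * p * q) for x, n in data)

# Correlated binomial C(alpha): X2c = S^2 / Var(S), with
# S = sum {(x_i - n_i p)^2 + x_i(2p - 1) - n_i p^2} / (2 p^2 q^2)
# Var(S) = sum n_i(n_i - 1) / (2 p^2 q^2).
s = sum((x - n*p)**2 + x*(2*p - 1) - n*p**2 for x, n in data) / (2 * p**2 * q**2)
var_s = sum(n * (n - 1) for _, n in data) / (2 * p**2 * q**2)
x2c = s**2 / var_s

assert abs(x2v - 19.03) < 0.01   # matches the reported X2v = 19.03
assert abs(x2c - 6.63) < 0.01    # matches the reported X2c = 6.63
```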


To examine the behaviour of the tests under the null hypothesis, a Monte Carlo experiment was performed. Ten binomial proportions were randomly generated using the unequal sample sizes from the above example. For each pseudorandom sample of 10 proportions the C(α) statistics X²c and X²m and the variance test statistic X²v were calculated and compared to the 10%, 5% and 1% points of their asymptotic null distributions. The empirical significance levels based on 1500 replications are shown in Table 1 for underlying binomial probabilities of 0.10, 0.25 and 0.50. For the cases considered, the empirical significance levels for the correlated binomial C(α) statistic are significantly lower than the nominal level for the 5% and 10% critical values. The empirical significance levels for the 1% critical value show no consistent pattern.
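The Monte Carlo experiment can be sketched in a few lines, assuming the reconstructed X²c form; the chi-squared(1) 5% critical value is 3.841, and p = 0.25 corresponds to one column of Table 1. No claim is made about the exact empirical level, only that it stays near or below the nominal 5%.

```python
import random

random.seed(42)
sizes = [5, 5, 7, 8, 8, 8, 9, 9, 10, 10]   # sample sizes from the example
crit_5pct = 3.841                           # chi-squared(1) 5% critical value

def x2c(xs, ns):
    """Correlated binomial C(alpha) statistic (reconstructed form)."""
    p = sum(xs) / sum(ns)
    q = 1 - p
    s = sum((x - n*p)**2 + x*(2*p - 1) - n*p**2 for x, n in zip(xs, ns))
    var = 2 * p**2 * q**2 * sum(n * (n - 1) for n in ns)
    return s**2 / var

reps, p_true, exceed = 1500, 0.25, 0
for _ in range(reps):
    xs = [sum(random.random() < p_true for _ in range(n)) for n in sizes]
    if not 0 < sum(xs) < sum(sizes):        # guard against p-hat of 0 or 1
        continue
    if x2c(xs, sizes) > crit_5pct:
        exceed += 1

level = exceed / reps
assert 0.0 <= level <= 0.10   # empirical level near or below the nominal 5%
```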

Table 1: Empirical significance levels of the C(α) tests asymptotically
optimal against correlated binomial and multiplicative alternatives, and of
the variance test, based on 1500 replications for underlying binomial
probabilities of 0.10, 0.25 and 0.50

                         Binomial probabilities
              P = 0.10              P = 0.25              P = 0.50
Nominal
level     0.01   0.05   0.10    0.01   0.05   0.10    0.01   0.05   0.10

X²c      0.007  0.019  0.048   0.013  0.035  0.073   0.009  0.034  0.077
X²m      0.010  0.043  0.100   0.012  0.037  0.085   0.009  0.031  0.075
X²v      0.003  0.042  0.082   0.012  0.042  0.097   0.007  0.049  0.108

Table 2: Asymptotic relative efficiencies of the variance test and the
generalized binomial C(α) tests for correlated binomial and multiplicative
alternatives

Test         Correlated    Multiplicative
statistic    binomial      generalized binomial

X²v          0.95          0.71
X²c          1.00          0.82
X²m          0.79          1.00

Table 2 compares the variance test and the generalized binomial C(α) tests for correlated binomial and multiplicative alternatives; it shows that the correlated binomial C(α) test is more efficient than the variance test for multiplicative alternatives as well as for correlated binomial alternatives.

REFERENCES

1. Alexander M. Mood, Franklin A. Graybill and Duane C. Boes, Introduction to the Theory of Statistics, third edition, McGraw-Hill Series in Probability and Statistics.
2. second edition, page 89, Duxbury Advanced Series.
3. Hogg, McKean and Craig, Introduction to Mathematical Statistics (2013), seventh edition, Pearson Education, Inc.
4. Paul, S. R., A three parameter generalization of binomial distribution, Windsor Mathematics Report, February 1984.
5. Robert V. Hogg, Elliot A. Tanis and Jagan Mohan Rao, Probability and Statistical Inference, seventh edition, Pearson Education.
6. Tarone, R. E. (1979), Testing the goodness of fit of binomial distribution, Biometrika 66, 585-590.

