
Nuclear counting statistics and probability
Abdallah Ezat, Physics department, Sohag University
April 11, 2016
Supervisor: Dr. Mohamed Mekhemar, Sohag University

Abstract
Statistics, the study of the collection, analysis, interpretation, presentation, and
organization of data, is, together with probability, the way we deal with large populations
and approximations in experimental nuclear physics. In this report, I discuss some of the
basic concepts of statistics that are used in nuclear experiments. At the end of the report,
I discuss probability theory and two probability distributions, namely the normal (Gauss)
distribution and the Poisson distribution.

Contents
1. Statistics
1.1 Descriptive statistics
1.2 Inferential statistics: population and sample
1.3 Variable, observation, and data set

2. The frequency distribution

3. The Mean and the Median
3.1 The arithmetic Mean
3.2 The properties of arithmetic mean
3.3 The Median

4. The standard deviation
4.1 Dispersion or variation
4.2 The range
4.3 The standard deviation

5. Probability
5.1 Experiment, outcomes, and sample space
5.2 Events, simple events, and compound events
5.3 Probability
5.4 Poisson's distribution
5.5 Normal distribution

References

1. Statistics
1.1 Descriptive statistics
The use of graphs, charts, and tables and the calculation of various statistical measures to organize
and summarize information is called descriptive statistics. Descriptive statistics help to
reduce our information to a manageable size and put it into focus.

1.2 Inferential statistics: population and sample


The complete collection of items or data under consideration in a statistical study
is referred to as the population. The portion of the population selected for analysis is called the
sample. Inferential statistics consists of techniques for reaching conclusions about a population
based upon information contained in a sample.

1.3 Variable, observation, and data set


A characteristic of interest concerning the individual elements of a population or a sample is called
a variable. A variable is often represented by a letter such as x, y, or z. The value of a variable for
one particular element from the sample or population is called an Observation. A data set consists of
the observations of a variable for the elements of a sample.

2. The frequency distribution


If we have a large mass of raw data (collected data that have not been organized numerically,
such as the masses of 100 male students obtained from an alphabetical listing of university records),
it is often useful to distribute the data into classes or categories and to determine the number of
individuals belonging to each class, which is called the class frequency. The following table is an example:
Mass (kg)    Number of students
60-62        5
63-65        18
66-68        42
69-71        27
72-74        8
Total        100

The first class or category, for example, consists of masses from 60 to 62 kilograms. Since 5
students have masses belonging to this class, the corresponding class frequency is 5.
A symbol defining a class, such as 60-62 in the above table, is called a class interval. The end
numbers, 60 and 62, are the class limits (the lower class limit is 60 and the upper class limit is 62).
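
As a rough illustration of how class frequencies are tallied, the following Python sketch sorts a few raw masses into the class intervals of the table above. The ten masses are made-up values chosen only for this illustration (they are not the data behind the table), and the function name is an assumption of this sketch.

```python
from collections import Counter

# Class intervals (lower limit, upper limit) taken from the table above.
CLASS_INTERVALS = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]


def class_frequencies(masses, intervals=CLASS_INTERVALS):
    """Count how many masses (rounded to the nearest kg) fall in each class."""
    counts = Counter()
    for mass in masses:
        rounded = round(mass)
        for lower, upper in intervals:
            if lower <= rounded <= upper:
                counts[(lower, upper)] += 1
                break
    return [counts[interval] for interval in intervals]


# Hypothetical raw data, for illustration only.
sample_masses = [61, 64, 67, 66, 70, 68, 73, 67, 65, 69]
frequencies = class_frequencies(sample_masses)

for (lower, upper), frequency in zip(CLASS_INTERVALS, frequencies):
    print(f"{lower}-{upper} kg: {frequency}")
print("Total:", sum(frequencies))
```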

3. The Mean and the Median


As most nuclear experiments are affected by many factors, which may lead to inaccurate
results, we should measure the same quantity more than once and then take the average of the
readings as the best estimate. The average is a value which is typical or
representative of a set of data. Since such typical values tend to lie centrally within a set of data
arranged according to magnitude, averages are also called measures of central tendency. There
are several types of averages, the most common being the Mean (also known as the arithmetic Mean)
and the Median.

3.1 The arithmetic Mean


The arithmetic mean of a set of N numbers $X_1, X_2, \ldots, X_N$ is denoted by $\bar{X}$ and defined as:

\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{1}{N} \sum_{i=1}^{N} X_i \]

If the numbers $X_1, X_2, \ldots, X_k$ occur $f_1, f_2, \ldots, f_k$ times respectively (i.e. occur with
frequencies $f_1, f_2, \ldots, f_k$), the arithmetic mean is:

\[ \bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_k X_k}{f_1 + f_2 + \cdots + f_k} \]
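
The two definitions above can be sketched in Python as follows. The function names are illustrative. Applying the frequency-weighted form to the mass table of Section 2 assumes that every student in a class is represented by the class midpoint (61, 64, ... kg), which is an assumption of this sketch rather than something stated in the text.

```python
def arithmetic_mean(values):
    """Plain arithmetic mean: the sum of the values divided by their number."""
    return sum(values) / len(values)


def weighted_mean(values, frequencies):
    """Mean of values X_1..X_k that occur with frequencies f_1..f_k."""
    total_frequency = sum(frequencies)
    weighted_sum = sum(f * x for x, f in zip(values, frequencies))
    return weighted_sum / total_frequency


# Five repeated readings of the same quantity (arbitrary example numbers).
readings = [8, 3, 5, 12, 10]
print(arithmetic_mean(readings))

# Mass table of Section 2, using class midpoints as representative values
# (an assumption made for this sketch).
class_midpoints = [61, 64, 67, 70, 73]
class_frequencies = [5, 18, 42, 27, 8]
print(weighted_mean(class_midpoints, class_frequencies))
```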

3.2 The properties of arithmetic mean


a) The algebraic sum of the deviations of a set of numbers from their arithmetic mean is zero.
Example: the deviations of the numbers 8, 3, 5, 12, 10 from their arithmetic mean 7.6 are 8-7.6,
3-7.6, 5-7.6, 12-7.6, 10-7.6, with algebraic sum 0.4-4.6-2.6+4.4+2.4=0.
b) The sum of the squares of the deviations of a set of numbers $X_j$ from any number a is a
minimum if and only if $a = \bar{X}$.
c) If $f_1$ numbers have mean $m_1$, $f_2$ numbers have mean $m_2$, ..., $f_k$ numbers have mean $m_k$, then the
mean of all the numbers is

\[ \bar{X} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_k m_k}{f_1 + f_2 + \cdots + f_k} \]
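
The three properties can be checked numerically on the example numbers 8, 3, 5, 12, 10. This is only a spot check under floating-point arithmetic, not a proof. The helper names and the split into two groups are choices made for this sketch.

```python
def mean(values):
    return sum(values) / len(values)


def sum_of_squared_deviations(values, a):
    """Sum of (x - a)^2 over all values."""
    return sum((x - a) ** 2 for x in values)


numbers = [8, 3, 5, 12, 10]
x_bar = mean(numbers)

# Property (a): the deviations from the mean add up to zero
# (up to floating-point rounding).
deviation_sum = sum(x - x_bar for x in numbers)
print("sum of deviations is ~0:", abs(deviation_sum) < 1e-12)

# Property (b): the sum of squared deviations is smallest at a = mean.
at_mean = sum_of_squared_deviations(numbers, x_bar)
trial_points = [x_bar - 1, x_bar - 0.1, x_bar + 0.1, x_bar + 1]
is_minimum = all(sum_of_squared_deviations(numbers, a) > at_mean
                 for a in trial_points)
print("smallest at the mean:", is_minimum)

# Property (c): combine group means weighted by group sizes.
group_1, group_2 = [8, 3], [5, 12, 10]
combined = (len(group_1) * mean(group_1) + len(group_2) * mean(group_2)) / (
    len(group_1) + len(group_2))
print("combined mean equals overall mean:", abs(combined - x_bar) < 1e-12)
```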

3.3 The Median


The median of a set of numbers arranged in order of magnitude (i.e. in an array) is the middle value
or the arithmetic mean of the two middle values.

Example 1. The set of numbers 3, 4, 4, 5, 6, 8, 8, 8, 10 has median 6.
Example 2. The set of numbers 5, 5, 7, 9, 11, 12, 15, 18 has median

\[ \frac{1}{2}(9 + 11) = 10 \]

For grouped data (like the frequency table above) the median is given by

\[ \text{Median} = L_1 + \left( \frac{\frac{N}{2} - (\sum f)_1}{f_{\text{median}}} \right) c \]

where
$L_1$ = lower class boundary of the median class (i.e. the class containing the median)
N = number of items in the data (i.e. total frequency)
$(\sum f)_1$ = sum of frequencies of all classes lower than the median class
$f_{\text{median}}$ = frequency of the median class
c = size of the median class
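
The two examples and the grouped-data formula can be checked with a short Python sketch. For the grouped case it uses the mass table of Section 2 and assumes that masses are recorded to the nearest kilogram, so that the lower class boundary lies 0.5 kg below the lower class limit (for example 65.5 for the class 66-68). That boundary convention and the function name are assumptions of this sketch.

```python
import statistics

# Ungrouped data: the standard library already implements the definition.
print(statistics.median([3, 4, 4, 5, 6, 8, 8, 8, 10]))   # odd count: middle value
print(statistics.median([5, 5, 7, 9, 11, 12, 15, 18]))   # even count: mean of 9 and 11


def grouped_median(lower_limits, frequencies, class_size):
    """Median = L1 + ((N/2 - (sum f)_1) / f_median) * c for grouped data."""
    n = sum(frequencies)
    cumulative = 0  # (sum f)_1: total frequency of the classes below
    for lower_limit, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= n / 2:
            lower_boundary = lower_limit - 0.5  # assumed boundary convention
            return lower_boundary + (n / 2 - cumulative) / f * class_size
        cumulative += f
    raise ValueError("frequency table is empty")


mass_median = grouped_median([60, 63, 66, 69, 72], [5, 18, 42, 27, 8], 3)
print(round(mass_median, 2))
```

With these assumptions the median class is 66-68, because the cumulative frequency first reaches N/2 = 50 there.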

4. The standard deviation


4.1 Dispersion or variation
The degree to which numerical data tend to spread about an average value is called the variation or
dispersion of the data. Various measures of dispersion or variation are available, the most common
being the range, the mean deviation, the semi-interquartile range, and the standard deviation. We will
discuss only the range and the standard deviation.

4.2 The range


The range of a set of numbers is the difference between the largest and the smallest number in the
set.
Example: the range of the set 2, 3, 3, 5, 5, 5, 8, 10, 12 is 12-2 = 10. Sometimes the range is given by
quoting the smallest and largest numbers. In the last example, for instance, the range could be
indicated as 2 to 12 or 2-12.

4.3 The standard deviation


The standard deviation of a set of N numbers $X_1, X_2, \ldots, X_N$ is denoted by s (for a sample from a
population) or by $\sigma$ (for the standard deviation of the population itself) and is defined by

\[ s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}} = \sqrt{\frac{\sum x^2}{N}} \]

where x represents the deviations of each of the numbers $X_i$ from the mean $\bar{X}$.
Thus s is the root mean square¹ of the deviations from the mean, or the root mean square deviation.
¹ As its name suggests, the root mean square is the square root of the mean of the squares of the data.
This type of average is frequently used in physical applications.

The variance of a set of data is defined as the square of the standard deviation and is thus given by
$s^2$ in the above equation.
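
A minimal Python sketch of the definition, using the divisor N as in the equation above. The example numbers are arbitrary, and the comparison with `statistics.pstdev` (the population form in the standard library, which also divides by N) is added only as a cross-check.

```python
import math
import statistics


def standard_deviation(values):
    """Root mean square deviation from the mean, with divisor N."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / n)


numbers = [8, 3, 5, 12, 10]
s = standard_deviation(numbers)
variance = s ** 2

print(f"s   = {s:.4f}")
print(f"s^2 = {variance:.2f}")
print("matches statistics.pstdev:", math.isclose(s, statistics.pstdev(numbers)))
```

Note that `statistics.stdev` divides by N-1 instead and would therefore give a slightly larger value for the same data.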
4.3.1 The properties of the standard deviation
1. For normal distributions² it turns out that:
a) 68.3% of the cases are included between $\bar{X} - s$ and $\bar{X} + s$
(i.e. one standard deviation on either side of the mean) (see the figure below)
b) 95.45% of the cases are included between $\bar{X} - 2s$ and $\bar{X} + 2s$
(i.e. two standard deviations on either side of the mean)
c) 99.73% of the cases are included between $\bar{X} - 3s$ and $\bar{X} + 3s$
(i.e. three standard deviations on either side of the mean)

2. Suppose that two sets consisting of $N_1$ and $N_2$ numbers have variances given by $s_1^2$ and $s_2^2$
respectively and the same mean $\bar{X}$. Then the combined variance of both sets is given by

\[ s^2 = \frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2} \]
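
Both properties can be spot-checked in Python. For a normal distribution the fraction of cases within k standard deviations of the mean equals erf(k/√2), which gives roughly 68.27%, 95.45% and 99.73% for k = 1, 2, 3. The two small data sets used for the combined variance are invented for this sketch and were chosen to share the same mean.

```python
import math
import statistics


def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * normal_coverage(k):.2f}%")


def combined_variance(n1, var1, n2, var2):
    """Combined variance of two sets that have the same mean."""
    return (n1 * var1 + n2 * var2) / (n1 + n2)


# Two illustrative sets, both with mean 2.
set_1 = [1, 3]
set_2 = [0, 2, 4]
from_formula = combined_variance(
    len(set_1), statistics.pvariance(set_1),
    len(set_2), statistics.pvariance(set_2),
)
direct = statistics.pvariance(set_1 + set_2)
print("formula agrees with direct calculation:", math.isclose(from_formula, direct))
```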

² A distribution that describes most statistical processes having a continuously varying magnitude.

5. Probability

5.1 Experiment, outcomes, and sample space


An experiment is any operation or procedure whose outcomes cannot be predicted with certainty.
The set of all possible outcomes for an experiment is called the sample space for the experiment.
Example: When a quality control technician selects an item for inspection from a production line, it
may be classified as defective or non-defective. The sample space may be represented by S = {D, N}.
When the blood type of a patient is determined, the sample space may be represented as
S = {A, AB, B, O}.

5.2 Events, simple events, and compound events


An event is a subset of the sample space consisting of at least one outcome from the sample
space. If the event consists of exactly one outcome, it is called a simple event. If an event consists of
more than one outcome, it is called a compound event.
Example: A quality control technician selects two computer motherboards and classifies each as
defective or non-defective. The sample space may be represented as S = {NN, ND, DN, DD}, where
D represents a defective unit and N represents a non-defective unit. Let A represent the event that
neither unit is defective and let B represent the event that at least one of the units is defective.
A = {NN} is a simple event and B = {ND, DN, DD} is a compound event.
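
The motherboard example can be written out with Python sets, which mirror the set notation used here. This is only an illustrative sketch; the variable and function names are not part of the original example.

```python
from itertools import product

# Sample space for two inspected units: N = non-defective, D = defective.
sample_space = {first + second for first, second in product("ND", repeat=2)}

# Event A: neither unit is defective. Event B: at least one unit is defective.
event_a = {outcome for outcome in sample_space if "D" not in outcome}
event_b = sample_space - event_a


def kind_of_event(event):
    """Classify an event by the number of outcomes it contains."""
    if not event:
        raise ValueError("an event must contain at least one outcome")
    return "simple" if len(event) == 1 else "compound"


print("S =", sorted(sample_space))
print("A =", sorted(event_a), "is a", kind_of_event(event_a), "event")
print("B =", sorted(event_b), "is a", kind_of_event(event_b), "event")
```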

5.3 Probability
Probability is a measure of the likelihood of the occurrence of some event. There are several
different definitions of probability. Three definitions are discussed in the next section. The
particular definition that is utilized depends upon the nature of the event under consideration.
However, all the definitions satisfy the following two specific properties and obey the rules of
probability.
The probability of any event E is represented by the symbol P(E), which is read as "P of E" or as
"the probability of event E". P(E) is a real number between zero and one, as indicated in the
following inequality:

\[ 0 \le P(E) \le 1 \]

The sum of the probabilities for all the simple events of an experiment must equal one. That is, if
$E_1, E_2, \ldots, E_n$ are the simple events for an experiment, then the following equality must be true:

\[ P(E_1) + P(E_2) + \cdots + P(E_n) = 1 \]

This equation is also sometimes expressed as

\[ P(S) = 1 \]

The last equation states that the probability that some outcome in the sample space will occur is
one.

5.4 Poisson Distribution


A Poisson distribution is the probability distribution that results from a Poisson experiment.
Attributes of a Poisson Experiment
A Poisson experiment is a statistical experiment that has the following properties:
The experiment results in outcomes that can be classified as successes or failures.
The average number of successes ($\mu$) that occurs in a specified region is known.
The probability that a success will occur is proportional to the size of the region.
The probability that a success will occur in an extremely small region is virtually zero.
Note that the specified region could take many forms. For instance, it could be a length, an area, a
volume, a period of time, etc.
Notation
The following notation is helpful when we talk about the Poisson distribution.
e: A constant equal to approximately 2.71828. (Actually, e is the base of the natural
logarithm system.)
$\mu$: The mean number of successes that occur in a specified region.
x: The actual number of successes that occur in a specified region.
P(x; $\mu$): The Poisson probability that exactly x successes occur in a Poisson experiment,
when the mean number of successes is $\mu$.
Poisson Distribution
A Poisson random variable is the number of successes that result from a Poisson experiment.
The probability distribution of a Poisson random variable is called a Poisson distribution.
Given the mean number of successes ($\mu$) that occur in a specified region, we can compute the
Poisson probability based on the following formula.
Poisson Formula. Suppose we conduct a Poisson experiment, in which the average number of
successes within a given region is $\mu$. Then, the Poisson probability is:

\[ P(x; \mu) = \frac{e^{-\mu} \, \mu^{x}}{x!} \]

where x is the actual number of successes that result from the experiment, and e is approximately
equal to 2.71828.
The Poisson distribution has the following properties:
The mean of the distribution is equal to $\mu$.
The variance is also equal to $\mu$.
Example 1
The average number of homes sold by the Acme Realty company is 2 homes per day. What is the
probability that exactly 3 homes will be sold tomorrow?
Solution: This is a Poisson experiment in which we know the following:
$\mu$ = 2, since 2 homes are sold per day, on average.
x = 3, since we want to find the likelihood that 3 homes will be sold tomorrow.
e = 2.71828, since e is a constant equal to approximately 2.71828.
We plug these values into the Poisson formula as follows:

\[ P(x; \mu) = \frac{e^{-\mu} \, \mu^{x}}{x!} \]
\[ P(3; 2) = \frac{(2.71828^{-2}) (2^{3})}{3!} \]
\[ P(3; 2) = \frac{(0.13534)(8)}{6} \]
\[ P(3; 2) = 0.180 \]

Thus, the probability of selling 3 homes tomorrow is 0.180.
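
The same calculation can be reproduced with a few lines of Python. The function name `poisson_pmf` is chosen for this sketch. The last lines check numerically, by summing over the first 50 values of x (an arbitrary cutoff that is more than sufficient for a mean of 2), that the probabilities add up to one and that the mean and the variance both equal the mean number of successes.

```python
import math


def poisson_pmf(x, mu):
    """Poisson probability of exactly x successes when the mean is mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


# Example 1: on average 2 homes are sold per day; probability of exactly 3.
p_three = poisson_pmf(3, 2)
print(f"P(3; 2) = {p_three:.3f}")

# Numerical check of the stated properties for mu = 2.
mu = 2
xs = range(50)
total = sum(poisson_pmf(x, mu) for x in xs)
mean = sum(x * poisson_pmf(x, mu) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)
print(f"total = {total:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
```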

5.5 Normal Distribution


The normal distribution refers to a family of continuous probability distributions described by the
normal equation.
The Normal Equation
The normal distribution is defined by the following equation:
Normal equation. The value of the random variable Y is:

\[ Y = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / (2\sigma^2)} \]

where x is a value of the normal random variable X, $\mu$ is the mean, $\sigma$ is the standard deviation,
$\pi$ is approximately 3.14159, and e is approximately 2.71828.
The random variable X in the normal equation is called the normal random variable. The normal
equation is the probability density function for the normal distribution.
The Normal Curve
The graph of the normal distribution depends on two factors - the mean and the standard deviation.
The mean of the distribution determines the location of the center of the graph, and the standard
deviation determines the height and width of the graph. When the standard deviation is large, the
curve is short and wide; when the standard deviation is small, the curve is tall and narrow. All
normal distributions look like a symmetric, bell-shaped curve, as shown below.

The curve on the left is shorter and wider than the curve on the right, because the curve on the left
has a bigger standard deviation.
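
The dependence of the curve's shape on the standard deviation can be seen by evaluating the normal equation directly. The sketch below is illustrative only; the function name and the three standard deviations are arbitrary choices. It prints the height of the curve at its center, which is 1/(σ√(2π)) and therefore falls as the standard deviation grows.

```python
import math


def normal_pdf(x, mu, sigma):
    """Value of the normal probability density function at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


mu = 0.0
for sigma in (0.5, 1.0, 2.0):
    peak = normal_pdf(mu, mu, sigma)
    print(f"sigma = {sigma}: height at the mean = {peak:.4f}")

# The curve is symmetric about the mean.
print(math.isclose(normal_pdf(mu - 1.5, mu, 1.0), normal_pdf(mu + 1.5, mu, 1.0)))
```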

Probability and the Normal Curve


The normal distribution is a continuous probability distribution. This has several implications for
probability.
The total area under the normal curve is equal to 1.
The probability that a normal random variable X equals any particular value is 0.
The probability that X is greater than a equals the area under the normal curve bounded by
a and plus infinity (as indicated by the non-shaded area in the figure below).
The probability that X is less than a equals the area under the normal curve bounded by a
and minus infinity (as indicated by the shaded area in the figure below).

Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the
following "rule".
About 68% of the area under the curve falls within 1 standard deviation of the mean.
About 95% of the area under the curve falls within 2 standard deviations of the mean.
About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
Collectively, these points are known as the empirical rule or the 68-95-99.7 rule. Clearly, given a
normal distribution, most outcomes will be within 3 standard deviations of the mean.
To find the probability associated with a normal random variable, use a graphing calculator, an
online normal distribution calculator, or a normal distribution table. In the examples below, we
illustrate the use of Stat Trek's online Normal Distribution Calculator.
Example 1
An average light bulb manufactured by the Acme Corporation lasts 300 days with a standard
deviation of 50 days. Assuming that bulb life is normally distributed, what is the probability that an
Acme light bulb will last at most 365 days?
Solution: Given a mean life of 300 days and a standard deviation of 50 days, we want to find the
cumulative probability that bulb life is less than or equal to 365 days. Thus, we know the following:
The value of the normal random variable is 365 days.
The mean is equal to 300 days.
The standard deviation is equal to 50 days.
We enter these values into the Normal Distribution Calculator and compute the cumulative
probability. The answer is: P( X < 365) = 0.90. Hence, there is a 90% chance that a light bulb will
burn out within 365 days.
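
As an alternative to an online calculator, the cumulative probability can be computed from the error function in Python's standard library. This is a sketch of one possible approach, and the function name `normal_cdf` is chosen here for illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    """Probability that a normal random variable is less than or equal to x."""
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Bulb life: mean 300 days, standard deviation 50 days.
p_within_year = normal_cdf(365, 300, 50)
print(f"P(X <= 365) = {p_within_year:.2f}")
```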


Example 2
Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a standard
deviation of 10, what is the probability that a person who takes the test will score between 90 and
110?
Solution: Here, we want to know the probability that the test score falls between 90 and 110. The
"trick" to solving this problem is to realize the following:
P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 )
We use the Normal Distribution Calculator to compute both probabilities on the right side of the
above equation.
To compute P( X < 110 ), we enter the following inputs into the calculator: The value of the
normal random variable is 110, the mean is 100, and the standard deviation is 10. We find
that P( X < 110 ) is 0.84.
To compute P( X < 90 ), we enter the following inputs into the calculator: The value of the
normal random variable is 90, the mean is 100, and the standard deviation is 10. We find that
P( X < 90 ) is 0.16.
We use these findings to compute our final answer as follows:
P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 )
P( 90 < X < 110 ) = 0.84 - 0.16
P( 90 < X < 110 ) = 0.68
Thus, about 68% of the test scores will fall between 90 and 110.
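
The same subtraction of cumulative probabilities can be sketched with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The variable names are illustrative.

```python
from statistics import NormalDist

# IQ scores: mean 100, standard deviation 10.
iq_scores = NormalDist(mu=100, sigma=10)

p_below_110 = iq_scores.cdf(110)
p_below_90 = iq_scores.cdf(90)
p_between = p_below_110 - p_below_90

print(f"P(X < 110)      = {p_below_110:.2f}")
print(f"P(X < 90)       = {p_below_90:.2f}")
print(f"P(90 < X < 110) = {p_between:.2f}")
```

Because 90 and 110 lie exactly one standard deviation on either side of the mean, the result agrees with the first line of the 68-95-99.7 rule.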


References
[1] Spiegel, Murray R. Schaum's Outline of Theory and Problems of Statistics.
New York: Schaum Pub., 1961. Print.
[2] Dodge, Yadolah. The Oxford Dictionary of Statistical Terms. Oxford: Oxford
UP, 2003. Web.
[3] "Normal Distribution" <http://stattrek.com/probabilitydistributions/normal.aspx>.
[4] "Poisson Distribution" <http://stattrek.com/probabilitydistributions/poisson.aspx>.
