
6. Basic terms of mathematical statistics.

A statistical population (Генеральной совокупностью) is a set of similar
objects or items that is of interest for some study.
Typically, the population is very large, making a complete enumeration of
all the values in the population either impractical or impossible.
A data sample (выборочная совокупность) is a set of data collected or
selected from a statistical population; it is usually a subset of
manageable size.
There can be the following types of sampling:
Repeated sampling (Повторной): each selected object is returned to the
statistical population before the next object is selected.
Non-repeated sampling (Бесповторной): a selected object is not returned
to the statistical population.
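The two types of sampling can be illustrated with a short Python sketch. The population of ten labelled objects, the sample size of 5 and the fixed seed are assumptions made only for this illustration; they do not come from the text.

```python
import random

# Hypothetical population of ten labelled objects (not from the source).
population = list(range(1, 11))
rng = random.Random(0)  # fixed seed so that the run is reproducible

# Repeated sampling: every draw is made from the full population,
# so the same object may be selected more than once.
repeated = rng.choices(population, k=5)

# Non-repeated sampling: a selected object is not returned,
# so all selected objects are distinct.
non_repeated = rng.sample(population, k=5)

print("repeated sampling:    ", repeated)
print("non-repeated sampling:", non_repeated)
```

In a non-repeated sample the sample size cannot exceed the population size, while a repeated sample can be of any size.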
The population size (Объёмом совокупности) is the total number of
objects of the population.
An unbiased (representative) sample (Репрезентативной
(представительной)) is defined as a sample where each individual member of
the population has the same chance of being selected as part of the sample.
Consider a sample in which the value $x_1$ occurs $n_1$ times, $x_2$ occurs $n_2$ times,
…, $x_k$ occurs $n_k$ times. The observed values $x_1, x_2, \dots, x_k$ are called variants. A variation
series (Вариационный ряд) is the sequence of variants written in ascending order.
Frequencies (частоты) are the numbers of observations $n_1, n_2, \dots, n_k$. Relative frequencies are
the ratios

$$w_i = \frac{n_i}{n},$$

where $n = n_1 + n_2 + \dots + n_k$ is the sample volume. The statistical sample distribution is
the correspondence between the observed variants and their frequencies or relative
frequencies. The empirical distribution function (Эмпирической функцией
распределения) is the function

$$F^*(x) = \frac{n_x}{n},$$

where $n_x$ is the number of observed values (counted with their frequencies)
that are less than $x$, and $n$ is the sample volume.
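The definitions above can be traced on a small data set. The sketch below is only an illustration: the sample values are hypothetical, and the names `variants`, `frequencies`, `relative` and `empirical_cdf` are chosen here, not taken from the text.

```python
from collections import Counter

# Hypothetical sample, chosen only for illustration.
sample = [3, 1, 2, 3, 3, 2, 5, 1, 3, 2]
n = len(sample)  # sample volume

counts = Counter(sample)
variants = sorted(counts)                     # variation series x_1 < ... < x_k
frequencies = [counts[x] for x in variants]   # n_i
relative = [n_i / n for n_i in frequencies]   # w_i = n_i / n


def empirical_cdf(x):
    """F*(x) = n_x / n, where n_x counts observations strictly less than x."""
    return sum(1 for v in sample if v < x) / n


for x_i, n_i, w_i in zip(variants, frequencies, relative):
    print(x_i, n_i, w_i)
print("F*(3) =", empirical_cdf(3))
```

The relative frequencies always sum to 1, and $F^*(x)$ grows from 0 (for $x$ not exceeding the smallest variant) to 1 (for $x$ above the largest variant).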
7. Point and interval estimations of distribution parameters

The sample mean (Выборочной средней) $\bar{x}_{\text{в}}$ is the arithmetic average of the
observed values of the variable:

$$\bar{x}_{\text{в}} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \frac{n_1 x_1 + n_2 x_2 + \dots + n_k x_k}{n}.$$

The sample variance (Выборочной дисперсией) is

$$D_{\text{в}} = \frac{1}{n}\sum_{i=1}^{k} n_i \left(x_i - \bar{x}_{\text{в}}\right)^2.$$

The sample standard deviation is

$$\sigma(X) = \sqrt{D_{\text{в}}(X)}.$$

Example 7.1. A sample is given by the distribution

x_i    5    7    9    10
n_i    2    3    1    4

Find: a) the sample mean; b) the sample variance.


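Solution. The sample volume is $n = 2 + 3 + 1 + 4 = 10$.

a) $$\bar{x}_{\text{в}} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \frac{2\cdot 5 + 3\cdot 7 + 1\cdot 9 + 4\cdot 10}{10} = 8.$$

b) $$D_{\text{в}} = \frac{1}{n}\sum_{i=1}^{k} n_i \left(x_i - \bar{x}_{\text{в}}\right)^2 = \frac{2(5-8)^2 + 3(7-8)^2 + 1(9-8)^2 + 4(10-8)^2}{10} = 3{,}8.$$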
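The arithmetic of Example 7.1 can be checked with a few lines of Python. This is a minimal sketch that applies the frequency-weighted formulas for the sample mean and sample variance to the data of the example; the variable names are chosen here for illustration.

```python
from math import sqrt

# Data of Example 7.1: variants x_i and their frequencies n_i.
x = [5, 7, 9, 10]
freq = [2, 3, 1, 4]

n = sum(freq)  # sample volume

# Sample mean: (1/n) * sum of n_i * x_i
mean = sum(n_i * x_i for n_i, x_i in zip(freq, x)) / n

# Sample variance: (1/n) * sum of n_i * (x_i - mean)^2
variance = sum(n_i * (x_i - mean) ** 2 for n_i, x_i in zip(freq, x)) / n

# Sample standard deviation: square root of the sample variance
std = sqrt(variance)

print(n, mean, variance, std)
```

The sketch divides by $n$, as in the formula for the sample variance used here, and not by $n - 1$.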

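A point estimation is determined by one number; for example, the point
estimation of the mathematical expectation of a value X of the statistical population is
the sample mean.
An interval estimation is determined by two numbers, the ends of the interval.
The confidence probability (reliability) $\gamma$ is the probability with which the deviation
of the estimate $\theta^*$ from the estimated parameter $\theta$ is less than a small number $\delta$:
$P(|\theta - \theta^*| < \delta) = \gamma$. The number $\delta$ characterizes the accuracy of the estimation.
The most frequently used reliability values are 0,95; 0,99; 0,995.
The interval $(\theta^* - \delta,\ \theta^* + \delta)$ is called the confidence interval for estimation of the parameter
$\theta$.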
The confidence interval for estimation of the mathematical expectation
(Доверительный интервал для оценки математического ожидания) of a
normal distribution when the standard deviation $\sigma$ is known is written as
$(\bar{x}_{\text{в}} - \delta;\ \bar{x}_{\text{в}} + \delta)$, where the accuracy of the estimation is calculated by the formula

$$\delta = \frac{t\sigma}{\sqrt{n}},$$

and the value $t$ is found from the relation $\Phi(t) = \dfrac{\gamma}{2}$ using the table of the Laplace function.

Example 7.2. The random value X has a normal distribution with standard
deviation $\sigma = 4$. Find the confidence interval for the estimation of the unknown
mathematical expectation $a$ by the sample mean $\bar{x}_{\text{в}} = 3{,}6$, if the sample volume is $n = 64$
and the reliability of the estimation is $\gamma = 0{,}95$.

From the relation $\Phi(t) = \dfrac{0{,}95}{2} = 0{,}475$ you can find $t$ using the Laplace tables: $t = 1{,}96$.

So the accuracy of the estimation equals

$$\delta = \frac{1{,}96 \cdot 4}{\sqrt{64}} = 0{,}98,$$

and the confidence interval is (3,6 - 0,98; 3,6 + 0,98).

Thus, with reliability $\gamma = 0{,}95$ we get the inequality
2,62 < a < 4,58.
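Example 7.2 can also be reproduced without a printed table. The sketch below assumes that the Laplace function is $\Phi(t) = \frac{1}{\sqrt{2\pi}}\int_0^t e^{-z^2/2}\,dz$, so that the condition $\Phi(t) = \gamma/2$ is the same as requiring the standard normal distribution function to equal $(1 + \gamma)/2$ at $t$. It uses `statistics.NormalDist` from the Python standard library; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Data of Example 7.2.
sigma = 4.0      # known standard deviation
n = 64           # sample volume
x_mean = 3.6     # sample mean
gamma = 0.95     # reliability (confidence probability)

# Phi(t) = gamma / 2 for the Laplace function is equivalent to
# standard normal CDF(t) = (1 + gamma) / 2.
t = NormalDist().inv_cdf((1 + gamma) / 2)

# Accuracy of the estimation: delta = t * sigma / sqrt(n)
delta = t * sigma / sqrt(n)

lower, upper = x_mean - delta, x_mean + delta
print(round(t, 2), round(delta, 2), round(lower, 2), round(upper, 2))
```

The computed $t$ is slightly more precise than the tabulated 1,96, so the ends of the interval agree with the example only after rounding to two decimal places.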
8. Linear regression equation
Suppose we study the dependence of Y on X, where Y is a
random value. For example, we have the values $y_1, y_2, \dots, y_n$ of the random value Y at
the points $x_1, x_2, \dots, x_n$. The set of points $(x_i, y_i)$, $i = 1, 2, \dots, n$, plotted in the
coordinate system is called a scatter diagram (диаграммой рассеяния).
The placement of the points suggests a possible form of the functional relationship
$y = \varphi(x)$, and this equation is called the regression equation.
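[Fig. 8.1. Scatter diagram in the XY plane: the points $(x_1, y_1)$, $(x_2, y_2)$, … and the regression curve $y = \varphi(x; a, b, \dots)$.]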
The regression can be linear, $y = ax + b$, quadratic, $y = ax^2 + bx + c$, and so on.
The values of the parameters $a$, $b$ remain unknown; experimental data give only
point estimations $\tilde{a}$, $\tilde{b}$ of the parameters $a$, $b$.

In the case of linear regression $y = ax + b$, the simultaneous equations for the point
estimations $\tilde{a}$, $\tilde{b}$ of the parameters $a$, $b$ can be written as

$$\begin{cases}
\tilde{a}\displaystyle\sum_{i=1}^{n} x_i + n\,\tilde{b} = \sum_{i=1}^{n} y_i,\\[2mm]
\tilde{a}\displaystyle\sum_{i=1}^{n} x_i^2 + \tilde{b}\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i.
\end{cases}$$

The target values $\tilde{a}$, $\tilde{b}$ are found as the solution of these simultaneous
equations.
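For reference, the simultaneous equations for linear regression follow from the least squares method in two steps. The derivation below is a standard sketch; the notation $S(a, b)$ for the sum of squared deviations and $\tilde{a}$, $\tilde{b}$ for the estimates of $a$, $b$ is introduced here for illustration.

```latex
% Sum of squared deviations of the observed y_i from the line y = ax + b
S(a, b) = \sum_{i=1}^{n} \left( y_i - a x_i - b \right)^2 .

% Necessary conditions for a minimum at (a, b) = (\tilde{a}, \tilde{b})
\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} x_i \left( y_i - \tilde{a} x_i - \tilde{b} \right) = 0,
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - \tilde{a} x_i - \tilde{b} \right) = 0 .

% Dividing by -2 and collecting the terms with \tilde{a} and \tilde{b}
\tilde{a} \sum_{i=1}^{n} x_i + n \tilde{b} = \sum_{i=1}^{n} y_i,
\qquad
\tilde{a} \sum_{i=1}^{n} x_i^2 + \tilde{b} \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i .
```

The system has a unique solution whenever the points $x_i$ are not all equal, because its determinant is proportional to $n\sum x_i^2 - \left(\sum x_i\right)^2$.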
Example 8.1. Find the sample equation of the linear regression from
the following data:

x_i    1     2     3     4     5     6
y_i    5,2   6,3   7,1   8,5   9,2   10,0

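Solution. According to the least squares method, the estimations $\tilde{a}$ and $\tilde{b}$ of the regression
parameters $a$ and $b$ are found from the set of simultaneous equations

$$\begin{cases}
\tilde{a}\displaystyle\sum_{i=1}^{n} x_i + n\,\tilde{b} = \sum_{i=1}^{n} y_i,\\[2mm]
\tilde{a}\displaystyle\sum_{i=1}^{n} x_i^2 + \tilde{b}\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i.
\end{cases}$$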

Fill in the table with the values needed to calculate the coefficients of the linear
regression equation.

i      x_i    y_i     x_i^2    x_i y_i
1      1      5,2     1        5,2
2      2      6,3     4        12,6
3      3      7,1     9        21,3
4      4      8,5     16       34,0
5      5      9,2     25       46,0
6      6      10,0    36       60,0
Sum    21     46,3    91       179,1

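Now write down the simultaneous equations (here $n = 6$):

$$\begin{cases}
21\,\tilde{a} + 6\,\tilde{b} = 46{,}3,\\
91\,\tilde{a} + 21\,\tilde{b} = 179{,}1.
\end{cases}$$

So we get $\tilde{a} \approx 0{,}97$; $\tilde{b} \approx 4{,}3$.

Thus, the linear regression equation can be written as $y = 0{,}97x + 4{,}3$.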
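The result of Example 8.1 can be verified numerically. The sketch below computes the four sums from the table and solves the two simultaneous equations in closed form; the variable names (`sx`, `sy`, `sxx`, `sxy`, `a`, `b`) are chosen here for illustration.

```python
# Data of Example 8.1.
x = [1, 2, 3, 4, 5, 6]
y = [5.2, 6.3, 7.1, 8.5, 9.2, 10.0]
n = len(x)

# Sums from the last row of the table.
sx = sum(x)                                   # sum of x_i
sy = sum(y)                                   # sum of y_i
sxx = sum(v * v for v in x)                   # sum of x_i^2
sxy = sum(u * v for u, v in zip(x, y))        # sum of x_i * y_i

# Solution of the system
#   a * sx  + n * b  = sy
#   a * sxx + b * sx = sxy
a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b = (sy - a * sx) / n

print(f"y = {a:.2f} x + {b:.2f}")
```

With more decimal places the estimates are approximately 0,974 and 4,307, which round to the values 0,97 and 4,3 used in the example.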
