# 2012

PWSZ Kalisz

[STATISTICS]
Statistics is a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, analyzing, and interpreting the data and drawing conclusions from them.

## What is statistics?

(Descriptive statistics and mathematical statistics)

Statistics is the branch of mathematics concerned with the collection, classification, analysis, and interpretation of numerical facts, for drawing inferences on the basis of their quantifiable likelihood (probability). Statistics can interpret aggregates of data too large to be intelligible by ordinary observation because such data (unlike individual quantities) tend to behave in a regular, predictable manner. It is subdivided into descriptive statistics and mathematical statistics.

a. Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are used first to get a feel for the data, second as inputs to the statistical tests themselves, and third to indicate the error associated with results and graphical output. Many of the descriptions or "parameters", such as the mean, will already be familiar to you, and you probably use them far more than you are aware of. The error is usually given as an estimate of the standard deviation or perhaps the standard error.

b. Mathematical statistics is the study of statistics from a mathematical standpoint, using probability theory as well as other branches of mathematics such as linear algebra and analysis. The term "mathematical statistics" is closely related to the term "statistical theory" but also embraces modeling for actuarial science and non-statistical probability theory.

## Measures of location

a) the arithmetic mean
b) geometric mean
c) harmonic mean
d) the root mean square
e) median and other quantiles (quartiles, quintiles, deciles, percentiles)
f) mode

a. Arithmetic mean
A mathematical representation of the typical value of a series of numbers, computed as the sum of all the numbers in the series divided by the count of all numbers in the series. The arithmetic mean is commonly referred to as the "average" or simply the "mean".
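The definition above translates directly into code. The following is a minimal sketch; the function name `arithmetic_mean` and the data in `series` are illustrative assumptions, not part of the original notes.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum of all numbers in the series divided by the count of numbers."""
    values = list(values)
    if not values:
        raise ValueError("the arithmetic mean of an empty series is undefined")
    return sum(values) / len(values)


# Illustrative data (an assumption, chosen only for the example).
series = [4, 6, 9, 3, 7]
result = arithmetic_mean(series)  # (4 + 6 + 9 + 3 + 7) / 5 = 29 / 5

# The standard library function should agree with the hand-written one.
library_result = mean(series)
```

Here `result` comes out at about 5.8, which matches `statistics.mean` from the Python standard library.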

b. Geometric mean
A geometric mean is often used when comparing different items, to find a single "figure of merit" for these items when each item has multiple properties that have different numeric ranges. The use of a geometric mean "normalizes" the ranges being averaged, so that no range dominates the weighting, and a given percentage change in any of the properties has the same effect on the geometric mean.
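The claim that a given percentage change in any property has the same effect can be checked numerically. This is a sketch under stated assumptions: the helper `geometric_mean` and the three property values in `item` are hypothetical and chosen only to have very different scales.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty series of positive numbers")
    # Averaging logarithms is equivalent to taking the n-th root of the
    # product and avoids overflow when many numbers are multiplied.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical item with three properties on very different scales.
item = [2.0, 300.0, 45000.0]
base = geometric_mean(item)

# Raise one property at a time by 10 % and record the relative change.
ratios = []
for i in range(len(item)):
    changed = list(item)
    changed[i] *= 1.10
    ratios.append(geometric_mean(changed) / base)

# Every ratio is 1.10 ** (1 / 3), whichever property was changed.
expected_ratio = 1.10 ** (1 / 3)
```

With an arithmetic mean the largest property (45000) would dominate, whereas here each of the three ratios is the same.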

c. Harmonic mean
In mathematics, the harmonic mean (sometimes called the subcontrary mean) is one of several kinds of average. Typically, it is appropriate for situations when the average of rates is desired. It is the special case $M_{-1}$ of the power mean, that is, the power mean with exponent $p = -1$. As it tends strongly toward the least elements of the list, it may (compared to the arithmetic mean) mitigate the influence of large outliers and increase the influence of small values.
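A short sketch may help with both points: averaging rates and the pull toward small values. The function name `harmonic_mean` and all numbers below are illustrative assumptions.

```python
def harmonic_mean(values):
    """Count of values divided by the sum of their reciprocals."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs a non-empty series of positive numbers")
    return len(values) / sum(1.0 / v for v in values)


# Averaging rates: the same distance driven at 60 km/h and back at 40 km/h.
speeds = [60.0, 40.0]
average_speed = harmonic_mean(speeds)  # about 48 km/h, not 50

# Pull toward small values: one large outlier barely moves the result.
with_outlier = [1.0, 1.0, 100.0]
arithmetic = sum(with_outlier) / len(with_outlier)  # 34.0
harmonic = harmonic_mean(with_outlier)              # about 1.49
```

The average speed is about 48 km/h because more time is spent on the slower leg, which is why the harmonic mean rather than the arithmetic mean fits this case.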

## d. The root mean square

In mathematics, the root mean square (abbreviated RMS or rms), also known as the quadratic mean, is a statistical measure of the magnitude of a varying quantity. It is especially useful when variates are positive and negative, e.g., sinusoids. RMS is used in various fields, including electrical engineering. It can be calculated for a series of discrete values or for a continuously varying function. The name comes from the fact that it is the square root of the mean of the squares of the values. It is a special case of the generalized mean with the exponent p = 2.
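To illustrate why the RMS is useful for quantities that are both positive and negative, here is a minimal sketch for a sampled sinusoid. The function name `root_mean_square`, the amplitude, and the number of samples are assumptions made for the example.

```python
import math


def root_mean_square(values):
    """Square root of the mean of the squares of the values."""
    values = list(values)
    if not values:
        raise ValueError("RMS of an empty series is undefined")
    return math.sqrt(sum(v * v for v in values) / len(values))


# One full period of a sine wave, sampled at n evenly spaced points.
n = 1000
amplitude = 3.0
samples = [amplitude * math.sin(2 * math.pi * k / n) for k in range(n)]

# Positive and negative halves cancel, so the plain mean is close to 0.
mean_value = sum(samples) / n

# The RMS does not cancel; for a sinusoid it is amplitude / sqrt(2).
rms_value = root_mean_square(samples)
```

The arithmetic mean says nothing about the size of the oscillation, while the RMS recovers a value proportional to the amplitude.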

## e. Quantiles (quartiles, quintiles, deciles, percentiles)

Quantiles such as quartiles, quintiles, deciles and percentiles perform similar functions to the median in a data set.

Quartile
There are 3 quartiles in a data set. Between them, they divide the data into 4 equal parts or quarters. The first quartile is called the lower quartile and is often denoted Q1. The second quartile is just the median, as it is the middle value of the data set. The third quartile is called the upper quartile and is often denoted Q3. Note that Q1 effectively splits the data set into the lower 25% of values and the upper 75% of values, whereas Q3 splits the data into the lower 75% of values and the upper 25% of values. The distance between Q1 and Q3, namely Q3 - Q1, is called the inter-quartile range, and gives an indication of the spread of the middle 50% of the data set.

Quintile
There are 4 quintiles in a data set. Between them, they divide the data into 5 equal parts or fifths. Quintiles are not that commonly used. Note that the first quintile effectively splits the data set into the lower 20% of values and the upper 80% of values; the second quintile splits the data set into the lower 40% of values and the upper 60% of values, and so on.

Decile
There are 9 deciles in a data set. Between them, they divide the data into 10 equal parts or tenths. The fifth decile is the median, as it is the middle value in the data set. Also, the third decile splits the data set into the lower 30% of values and the upper 70% of values, and so on.

Percentile
There are 99 percentiles in a data set. Between them, they divide the data into 100 equal parts or hundredths. The fiftieth percentile is the median, as it is the middle value in the data set. As before, the sixty-third percentile divides the data set into the lower 63% of values and the upper 37% of values, and so on. Note also that, for example, the 75th percentile is the same value as Q3.
These types of values are used to rank investment performance, such as the performance of mutual funds. In calculating these values, it is important first to order the data set, as is done for the median. Once this is done, find the position of the value that you are calculating, and then the value itself, so the procedure is exactly the same as for the median.
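The order, position, value procedure can be sketched as follows. Several conventions exist for the position; this sketch assumes position p(n + 1) with linear interpolation, which is only one of them. The helper name `quantile` and the eleven data values are hypothetical.

```python
from statistics import quantiles


def quantile(data, p):
    """Value at fraction p (0 < p < 1) of the ordered data.

    Assumes the position convention p * (n + 1), counted from 1,
    with linear interpolation between neighbouring values.
    """
    ordered = sorted(data)          # step 1: order the data set
    n = len(ordered)
    if n == 0:
        raise ValueError("quantile of an empty data set is undefined")
    position = p * (n + 1)          # step 2: find the position
    lower = int(position)
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    # step 3: read off the value, interpolating if the position is not whole
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Hypothetical data set with 11 values, so the quartile positions are whole.
data = [3, 7, 8, 5, 12, 14, 21, 13, 18, 9, 11]

q1 = quantile(data, 0.25)        # position 3 in the ordered data
median = quantile(data, 0.50)    # position 6; also the 5th decile
q3 = quantile(data, 0.75)        # position 9; also the 75th percentile
interquartile_range = q3 - q1

# statistics.quantiles uses the same convention by default (Python 3.8+).
library_quartiles = quantiles(data, n=4)
```

For this data the ordered values are 3, 5, 7, 8, 9, 11, 12, 13, 14, 18, 21, so Q1 is 7, the median is 11, Q3 is 14, and the inter-quartile range is 7. Other software may use a different position rule and give slightly different values.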

f. Mode
The mode is the value that appears most often in a set of data. The mode of a discrete probability distribution is the value x at which its probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled. The mode of a continuous probability distribution is the value x at which its probability density function has its maximum value, so, informally speaking, the mode is at the peak.
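Both views of the mode, the most frequent value in data and the maximum of a probability mass function, can be sketched briefly. The helper `modes`, the sample values, and the two-dice example are assumptions made for illustration.

```python
from collections import Counter


def modes(values):
    """All values that occur most often (there may be more than one)."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty data set is undefined")
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Mode of a data set (hypothetical values).
sample = [1, 3, 3, 6, 7, 7, 7, 9]
sample_modes = modes(sample)            # 7 occurs three times

# A data set can have more than one mode.
bimodal_modes = modes([1, 1, 2, 2, 3])  # 1 and 2 both occur twice

# Mode of a discrete distribution: the sum of two fair dice.
outcomes = Counter(a + b for a in range(1, 7) for b in range(1, 7))
pmf = {total: count / 36 for total, count in outcomes.items()}
pmf_mode = max(pmf, key=pmf.get)        # total with the largest probability
```

For two fair dice the probability mass function peaks at a total of 7, which is reached by 6 of the 36 equally likely outcomes.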

## Measures of statistical dispersion

a. standard deviation
b. variance
c. average deviation
d. range
e. coefficient of variation
f. quartile coefficient of dispersion

a. Standard deviation
In statistics, the standard deviation (represented by the symbol sigma, $\sigma$) shows how much variation or "dispersion" exists from the average (mean, or expected value). A low standard deviation indicates that the data points tend to be very close to the mean; a high standard deviation indicates that the data points are spread out over a large range of values.

In addition to expressing the variability of a population, standard deviation is commonly used to measure confidence in statistical conclusions. For example, the margin of error in polling data is determined by calculating the expected standard deviation in the results if the same poll were to be conducted multiple times. The reported margin of error is typically about twice the standard deviation, that is, the radius of a 95 percent confidence interval. An example:

Consider a population consisting of eight values $x_1, x_2, \dots, x_8$ whose mean (average) is 5:

$$\mu = \frac{x_1 + x_2 + \dots + x_8}{8} = 5$$

To calculate the population standard deviation, first compute the difference of each data point from the mean, and square the result of each:

$$(x_i - \mu)^2, \quad i = 1, \dots, 8$$

Next, compute the average of these values, and take the square root:

$$\sigma = \sqrt{\frac{1}{8}\sum_{i=1}^{8}(x_i - \mu)^2}$$
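The steps above can be followed in code. The eight numbers below are an assumption chosen for illustration so that their mean is 5; they are not taken from these notes.

```python
import math
from statistics import pstdev, stdev

# Assumed population of eight values with mean 5 (illustrative only).
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)

# Step 0: the mean.
mean = sum(values) / n                        # 40 / 8 = 5.0

# Step 1: squared difference of each data point from the mean.
squared_diffs = [(x - mean) ** 2 for x in values]

# Step 2: average the squared differences (this is the variance) ...
variance = sum(squared_diffs) / n             # 32 / 8 = 4.0

# ... and take the square root.
population_sd = math.sqrt(variance)           # 2.0

# Dividing by n - 1 instead of n gives the sample standard deviation.
sample_sd = math.sqrt(sum(squared_diffs) / (n - 1))

# Cross-check against the standard library.
library_population_sd = pstdev(values)
library_sample_sd = stdev(values)
```

For these assumed values the squared differences sum to 32, so the population standard deviation is 2, while the sample version (dividing by 7) is a little larger, about 2.14.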

Here the population standard deviation is equal to the square root of the variance. The formula is valid only if the eight values we began with form the complete population. If they instead were a random sample, drawn from some larger, "parent" population, then we should have divided by 7 (which is n - 1) instead of 8 (which is n) in the denominator of the last formula, and then the quantity thus obtained would have been called the sample standard deviation.

b. Variance
The variance is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is large, and vice versa. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). In particular, the variance is one of the moments of a distribution. In that context, it forms part of a systematic approach to distinguishing between probability distributions. While other such approaches have been developed, those based on moments are advantageous in terms of mathematical and computational simplicity.

c. Average deviation
The absolute deviation is the absolute difference between each value of the statistical variable and the arithmetic mean:

$$D_i = |x_i - \bar{x}|$$

The average deviation is the arithmetic mean of the absolute deviations, and it is represented by

$$D_{\bar{x}} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$
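As a minimal sketch of the average deviation, the code below takes the absolute deviation of each value from the arithmetic mean and then averages them. The function name `average_deviation` is hypothetical; the data set {4, 6, 9, 3, 7} is the one these notes use for the range.

```python
def average_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    values = list(values)
    if not values:
        raise ValueError("average deviation of an empty data set is undefined")
    mean = sum(values) / len(values)
    absolute_deviations = [abs(x - mean) for x in values]
    return sum(absolute_deviations) / len(values)


data = [4, 6, 9, 3, 7]           # arithmetic mean is 5.8
result = average_deviation(data)
# Absolute deviations: 1.8, 0.2, 3.2, 2.8, 1.2 (sum 9.2), so about 1.84.
```

Unlike the standard deviation, this measure does not square the deviations, so large deviations are not weighted more heavily than small ones.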

d. Range
The range is the difference between the highest and the lowest values. In {4, 6, 9, 3, 7} the lowest value is 3 and the highest is 9, so the range is 9 - 3 = 6.

e. Coefficient of variation
The coefficient of variation is a normalized measure of dispersion of a probability distribution. It is also known as unitized risk or the variation coefficient. Its absolute value is sometimes known as the relative standard deviation, which is expressed as a percentage.

The coefficient of variation (CV) is defined as the ratio of the standard deviation $\sigma$ to the mean $\mu$:

$$\mathrm{CV} = \frac{\sigma}{\mu}$$

which is the inverse of the signal-to-noise ratio. It shows the extent of variability in relation to the mean of the population.

f. Quartile coefficient of dispersion
The quartile coefficient of dispersion is a descriptive statistic which measures dispersion and which is used to make comparisons within and between data sets. The statistic is easily computed using the first (Q1) and third (Q3) quartiles for