Introduction :
The collection of numerical information dates back to the earliest times in human
history. For example, the first shepherds used stones or twigs to count and control
the number of animals in their flocks entering and leaving the sheepfold.
But it was with the sedentarization of man and the development of agricultural activity,
which gave rise to great civilizations, that the need for information on the population and its
wealth became a necessity for the survival of these states. Indeed, these great organized
states (Sumerians, Chinese, Egyptians, Persians, Greeks, Romans, etc.) used headcounts for
fiscal purposes (taxes on harvests, trade in goods, etc.), to distribute agricultural land, or to
mobilize the armed forces.
With the advent of Islam, the techniques used by the public services of the caliphate to
count and evaluate goods and wealth for the payment of the ZAKAT tax developed
considerably. Under the caliphate of OMAR EL KHATTAB, a census of livestock, fruit trees
and other assets was carried out.
Thus, an unprecedented shift took place in the aims of data collection: no longer to
increase the opulence, wealth and domination of those in power, but to improve the lot of
the underprivileged and strengthen social cohesion.
The beneficial effects of this form of income redistribution (or social solidarity tax)
led to peace and social equilibrium never seen before.
Later on, Muslim scholars encouraged the use of statistics in scientific research, helped
by the enormous progress made in mathematics. IBN KHALDOUN (1332-1406) used, and
recommended the use of, statistics in research into history, sociology, etc. He sometimes
made estimates for certain balance sheets.
The Middle Ages, spanning roughly from the 5th to the 15th century, were characterized
by a lack of systematic statistical development compared to later periods. The Middle Ages
were marked by different intellectual and social priorities, and statistical methods as we
understand them today were not a prominent part of the scholarly landscape during this
time. However, there were some rudimentary data collection and analysis practices that can
be considered precursors to modern statistics:
Mr : SAHNOUN.A.Y Page 1
Descriptive statistics 2023/2024
The development of statistics in the 17th century marked an important step in the
evolution of this field. While statistics as we know it today didn't fully take shape until later
centuries, there were several key developments and figures in the 17th century that laid the
groundwork for the discipline. Here's an overview of some of the significant developments
in statistics during this period:
John Graunt (1620-1674): John Graunt, an Englishman, is often considered one of the
pioneers of modern statistics. In 1662, he published a book titled "Natural and Political
Observations Made upon the Bills of Mortality." In this work, Graunt analyzed data on births
and deaths in London, creating the first known life tables and mortality statistics. He is
credited with introducing the concept of the life expectancy and is often referred to as the
father of demography.
William Petty (1623-1687): William Petty, an English economist and philosopher, made
significant contributions to the development of statistics. He applied statistical methods to
economic and social data. Petty is known for his work on political arithmetic, where he used
quantitative data to analyze various aspects of society, including population, wealth, and
resources.
Early Census Efforts: In the 17th century, there were initial attempts to conduct
censuses and collect data on population and economic activities in various countries. These
early census efforts laid the groundwork for more systematic data collection and analysis in
the centuries that followed.
Scientific Revolution: The 17th century was a time of great scientific advancement, and
many of the scientific thinkers of this era, such as Galileo Galilei and Johannes Kepler, made
contributions that would later have implications for statistics. The scientific method, which
emphasizes systematic data collection and analysis, became more widely adopted during
this period.
It's important to note that while these developments were crucial for the emergence of
statistics as a field, the mathematical and theoretical foundations of modern statistics were
still evolving. The 17th century laid the groundwork for subsequent advancements in
statistics, which continued to develop in the centuries that followed, particularly during the
18th and 19th centuries. The field of statistics as we understand it today, with concepts like
probability theory, hypothesis testing, and sampling theory, was more fully developed in the
18th and 19th centuries by figures like Carl Friedrich Gauss, Pierre-Simon Laplace, and Sir
Francis Galton.
Statistics
Definition:
«Statistics is the set of methods and techniques for processing numerical data
associated with a situation or phenomenon, with the aim of reporting reality, presenting
and analyzing data, and drawing conclusions and making decisions».
Remark
Descriptive statistics
Definition :
A statistical series is the sequence of values taken by a variable X over units of observation.
A single statistical variable, X, is considered here. The aim is to explain the elementary tools,
adapted to the nature of X, that enable us to present this variable in a synthetic way, to
make an appropriate graphical representation and to summarize its main characteristics.
Comments:
Do not confuse "enumeration" with "census":
- Enumeration: counting individuals in a population
- Census: quantifying data according to several parameters
Example:
Study of Algerian demographics over a specific period of time.
Example:
In a company, several surveys can be carried out, such as:
labor survey, employment and salary survey, expenditure and consumption survey,
industrial survey, municipal survey, building and public works survey.
Statistical test
Descriptive statistics aims to study the characteristics of a set of observations, such as
the measurements obtained in an experiment. The experiment is the preliminary stage
in any statistical study.
Definition :
A statistical test (or trial) is an experiment that we deliberately carry out.
Population: The population is the set on which our statistical study is based. This set is
denoted Ω.
Example:
For individuals: gender, SPC (socio-professional category), age, salary, etc.
For companies: number of employees, business sector, etc.
For geographical locations: altitude, vegetation type, etc.
For dates: share price, temperature, daily sales, etc.
Modality: the modalities are the different situations in which an individual can be
found. Each of the traits studied can present two or more modalities.
Each individual in the population presents one and only one of the modalities of the trait
under consideration.
Example:
The number of modalities for a character varies according to the degree of detail. For
example, the marital status character can have, depending on the case:
Two modalities: married, unmarried
Three modalities: single, married, widowed or divorced
Four modalities: single, married, widowed, divorced
Five modalities: single, married, widowed, divorced, undeclared
In this example, there is no natural order between the eight categories, or modalities,
which are simply labels; the qualitative variable "SPC" is defined on a nominal scale.
Example: for the character "baccalaureate grade" (mention du baccalauréat), the modalities
are ordered in ascending order as follows: Fair, Fairly good, Good, Very good, Excellent.
Definition: A quantitative statistical variable is said to be discrete if its set of modalities
is finite or countable. Thus, the set of modalities can be given in the form of a list of numbers.
In the rest of this chapter, we consider the following situation:
$X : \Omega \to \{x_1, x_2, \dots, x_k\}$
with $\mathrm{Card}(\Omega) := N$, the number of individuals in our study.
Example: number of children per household; number of hours worked per day by a
company's employees, weight in kg of a person
Example: Survey of a sample of 60 families in the city ..... on the number of children per
household. The raw results of the number of children are:
2 1 4 2 2 0 1 2 3 0 4 5
2 5 4 2 6 2 6 4 2 1 3 2
1 3 3 3 1 1 1 3 2 3 3 2
4 2 5 2 3 3 1 5 1 5 2 6
2 5 2 3 1 2 2 0 1 4 3 1
It should be noted that the raw data are not legible, hence the need to group them
together in a table for easier analysis.
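The grouping step can be sketched in a few lines of Python. This is only an illustration: the variable names (`rows`, `children`, `counts`) are assumptions of this sketch, and the data are the 60 digits of the example, one digit per family.

```python
from collections import Counter

# The five rows of raw results from the example (one digit per family).
rows = ["214220123045", "254262642132", "133311132332",
        "425233151526", "252312201431"]
children = [int(digit) for row in rows for digit in row]

counts = Counter(children)   # partial headcount n_i of each value x_i
N = len(children)            # total headcount

# Print the grouped table: value x_i followed by its headcount n_i.
for x in sorted(counts):
    print(x, counts[x])
print("Total", N)
```

The printed table can be compared with the grouped table given for this survey.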
Reminder:
Let Ω be a set. We call cardinal of Ω, denoted Card(Ω), the number of elements of Ω.
Card(Ω) := number of elements of Ω = N.
Example:
Number of children observed in a sample of households in the region Z
Number of children per household (xi)    Number of households (ni)
0                                        3
1                                        12
2                                        18
3                                        12
4                                        6
5                                        6
6                                        3
Total                                    60
Partial headcount:
Definition:
For each value xi, we define
$n_i = \mathrm{Card}\{\omega \in \Omega : X(\omega) = x_i\}$
ni is the number of individuals with the same value xi; it is called the partial headcount of xi.
Example:
In the example above, the number of families with three children is 12:
Number of children per household (xi)    …    3     …
Number of households (ni)                …    12    …
Cumulative headcount:
Definition:
For each value xi, we define
$N_i = n_1 + n_2 + \dots + n_i = \sum_{k=1}^{i} n_k$
The cumulative headcount Ni of a value is the sum of the headcount of this value and the
headcounts of all the preceding values.
Example:
In the previous example, 45 families have at most three children (Ni = 45 for xi = 3).
Number of children per household (xi)    0    1    2    3    4    5    6
Cumulative number of households (Ni)     3    15   33   45   51   57   60
Partial frequency:
Relative frequency, or fi, is the proportion of individuals with the same modality in the
population. fi is obtained by dividing each headcount ni by the total headcount N:
$f_i = \dfrac{n_i}{N}$
Note:
fi can be replaced by fi × 100, which then represents a percentage.
Example:
Applying the notion of partial frequency to the previous example gives us:
xi       ni    fi (%)
0        3     5
1        12    20
2        18    30
3        12    20
4        6     10
5        6     10
6        3     5
Total    60    100
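Proposition:
$\sum_{i=1}^{k} f_i = 1$
Indeed,
$\sum_{i=1}^{k} f_i = \sum_{i=1}^{k} \dfrac{n_i}{N} = \dfrac{1}{N}\sum_{i=1}^{k} n_i = \dfrac{N}{N} = 1$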
Cumulative frequency:
Definition:
For each value xi, we define
$F_i = f_1 + f_2 + \dots + f_i$
The quantity Fi is called the cumulative frequency of xi.
Calculating the cumulative headcounts Ni and the cumulative frequencies Fi helps us to
answer questions about the distribution.
The calculation is made by successively summing the headcounts and the relative frequencies
down a table column. In effect: $N_i = N_{i-1} + n_i$ and $F_i = F_{i-1} + f_i$.
Example:
Using the same example, answer the following questions:
- How many families have fewer than four children?
- How many families have at least four children?
- What is the proportion of families with at most four children?
- What proportion of families have more than four children?
xi: number of children per household; ni: number of households; fi: household frequency;
Ni: cumulative headcount; Fi: cumulative frequency.
xi       ni    fi      Ni increasing    Ni decreasing    Fi increasing    Fi decreasing
0        3     0.05    3                60               0.05             1
1        12    0.20    15               57               0.25             0.95
2        18    0.30    33               45               0.55             0.75
3        12    0.20    45               27               0.75             0.45
4        6     0.10    51               15               0.85             0.25
5        6     0.10    57               9                0.95             0.15
6        3     0.05    60               3                1                0.05
Total    60    1
According to the table:
- 45 households have fewer than 4 children.
- 15 households have at least 4 children.
- 85% of households have no more than 4 children.
- 15% of households have more than 4 children.
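As a cross-check, the cumulative columns and the four answers can be recomputed with a short Python sketch. The list names (`N_inc`, `N_dec`, `F_inc`) are illustrative assumptions, not notation from the course.

```python
from itertools import accumulate

x = [0, 1, 2, 3, 4, 5, 6]        # number of children per household
n = [3, 12, 18, 12, 6, 6, 3]     # number of households
N = sum(n)

f = [n_i / N for n_i in n]                       # relative frequencies
N_inc = list(accumulate(n))                      # increasing cumulative headcounts
# Decreasing cumulative headcount: households with at least x_i children.
N_dec = [N - cum + n_i for cum, n_i in zip(N_inc, n)]
F_inc = [cum / N for cum in N_inc]               # increasing cumulative frequencies

fewer_than_4 = N_inc[x.index(3)]     # households with at most 3 children
at_least_4 = N_dec[x.index(4)]       # households with 4 children or more
at_most_4 = F_inc[x.index(4)]        # proportion with at most 4 children
more_than_4 = 1 - at_most_4          # proportion with more than 4 children

print(fewer_than_4, at_least_4, at_most_4, more_than_4)
```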
Class amplitude:
Definition:
The number $e = x_{\max} - x_{\min}$ is called the range of the series,
where xmax and xmin are respectively the largest and smallest values of X in the statistical series.
Example:
We weigh the 50 students in a section and obtain the following results:
43,43,43,47,48,48,48,48,49,49,49,50,50,51,51,52,53,53,53,54,54,56,56,56,57,59,59,59,62,
62,63,63,65,65,67,67,68,70,70,70,72,72,73,77,77,81,83,86,91,91
Create a summary table.
Solution:
Number of classes k, using STURGES' rule:
$k = 1 + 3.3 \log_{10} n$
With n = 50, the number of classes equals 6.6.
YULE's rule:
$k = 2.5 \sqrt[4]{n}$
With n = 50, the number of classes equals approximately 6.65.
Rounding: if the fractional part is at most 0.5, round down;
if the fractional part is greater than 0.5, round up.
We accept k = 7.
Calculation of the class interval:
$a = \dfrac{x_{\max} - x_{\min}}{k}$
a: class interval
xmax: maximum value of the statistical series.
xmin: minimum value of the statistical series.
k: number of classes.
So a = (91 - 43)/7 ≈ 6.86.
6.86 is between 6.51 and 6.99,
so we take a = 7.
Weight      Headcount    Amplitude « ai »
[43, 50[    11           7
[50, 57[    13           7
[57, 64[    8            7
[64, 71[    8            7
[71, 78[    5            7
[78, 85[    2            7
[85, 92[    3            7
TOTAL       50
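The construction of this table can be retraced with the Python sketch below. It assumes the two rules in the forms k = 1 + 3.3 log10(n) and k = 2.5 n^(1/4), and uses Python's built-in `round` for the class interval; these choices and all variable names are assumptions of the sketch.

```python
import math

weights = [43, 43, 43, 47, 48, 48, 48, 48, 49, 49, 49, 50, 50, 51, 51, 52,
           53, 53, 53, 54, 54, 56, 56, 56, 57, 59, 59, 59, 62, 62, 63, 63,
           65, 65, 67, 67, 68, 70, 70, 70, 72, 72, 73, 77, 77, 81, 83, 86,
           91, 91]
n = len(weights)

k_sturges = 1 + 3.3 * math.log10(n)   # Sturges' rule
k_yule = 2.5 * n ** 0.25              # Yule's rule
k = 7                                 # number of classes retained in the solution

# Class interval: range divided by the number of classes, rounded to an integer.
a = round((max(weights) - min(weights)) / k)

# Count the values falling in each class [43 + i*a, 43 + (i+1)*a[.
lower = min(weights)
counts = [0] * k
for w in weights:
    counts[(w - lower) // a] += 1

print(round(k_sturges, 2), round(k_yule, 2), a, counts, sum(counts))
```

Summing the class headcounts is a quick way to confirm that every observation has been assigned to exactly one class.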
Class amplitude (raw table)
This is the difference between the upper and lower bounds of a class.
The amplitude "a" of a class i is given by the following formula:
$a_i = e_i^{\sup} - e_i^{\inf}$
Corrected headcount
$\text{Corrected headcount per amplitude unit} = \dfrac{n_i}{a_i} \times a_{\min}$
Corrected frequency
$\text{Corrected frequency per amplitude unit} = \dfrac{f_i}{a_i} \times a_{\min}$
Example:
The human resources manager of a company has drawn up a statistical distribution of the
length of service of the company's managers, expressed in years:
Classes      [6.5; 9.5[    [9.5; 11[    [11; 12.5[    [12.5; 14[    [14; 17[
Amplitude    3             1.5          1.5           1.5           3
Headcount    11            12           19            9             9
Height h     5.5           12           19            9             4.5
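The corrected headcounts (the heights h of the histogram bars) can be recomputed with the sketch below. It assumes the correction ni × amin / ai, with amin the smallest class amplitude; the variable names are illustrative.

```python
amplitudes = [3, 1.5, 1.5, 1.5, 3]   # a_i for each class
headcounts = [11, 12, 19, 9, 9]      # n_i for each class

a_min = min(amplitudes)              # smallest amplitude, used as the unit

# Corrected headcount: headcount per unit of amplitude, scaled by a_min.
heights = [n_i * a_min / a_i for n_i, a_i in zip(headcounts, amplitudes)]
print(heights)
```

Classes whose amplitude equals amin keep their headcount unchanged, while wider classes are scaled down.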
Definition:
The center "c" of a class i is given by the following formula:
$c_i = \dfrac{e_i^{\sup} + e_i^{\inf}}{2}$
Example:
Height        Headcount    Amplitude « ai »    Center « ci »
[130,150[     2            20                  140
[150,160[     30           10                  155
[160,165[     60           5                   162.5
[165,170[     62           5                   167.5
[170,175[     44           5                   172.5
[175,180[     28           5                   177.5
[180,190[     16           10                  185
[190,220[     8            30                  205
[Figure: a class drawn on an axis, showing its lower boundary, upper boundary, amplitude (ai) and center (ci).]
A) Line diagrams: this chart is applicable when the statistical units are few in number,
individually known and not repeated.
Example 1: Consider the series of numbers:
{8, 2, 3, 7, 4}
{8, 2, 3, 7, 4, 7, 2}
Example: Consider 18 people, each identified by a number between 1 and 20 and given a score
from 0 to 5.
Notes = {{0, 12}, {0, 14}, {1, 7}, {1, 9}, {1, 13}, {1, 18}, {2, 4}, {2, 8}, {2, 11}, {2, 15}, {2,
16}, {3, 17}, {3, 10}, {4, 5}, {4, 6}, {4, 20}, {5, 3}, {5, 19}}
In each data pair, the first number corresponds to the score (from 0 to 5), i.e. the
"stem", and the second identifies the person by a number between 1 and 20,
i.e. the "leaf".
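The stem-and-leaf grouping can be sketched in Python as follows. The dictionary `leaves` and the text layout printed here are assumptions of this sketch, not the chart used in the course.

```python
from collections import defaultdict

# (score, person) pairs from the example: the score is the stem,
# the person's number is the leaf.
notes = [(0, 12), (0, 14), (1, 7), (1, 9), (1, 13), (1, 18), (2, 4), (2, 8),
         (2, 11), (2, 15), (2, 16), (3, 17), (3, 10), (4, 5), (4, 6), (4, 20),
         (5, 3), (5, 19)]

leaves = defaultdict(list)
for score, person in notes:
    leaves[score].append(person)

# One line per stem, with its leaves sorted in ascending order.
for score in sorted(leaves):
    print(score, "|", " ".join(str(p) for p in sorted(leaves[score])))
```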
Example: When studying the teenage population of a working-class neighborhood, their heights
can be distributed as follows:
Height (cm)    Headcount    Frequency (%)
[140,145[      1            2
[145,150[      1            2
[150,155[      9            18
[155,160[      17           34
[160,165[      16           32
[165,170[      3            6
[170,175[      3            6
The histogram of the numbers in this series is shown in the following graph:
Definition:
The mode (or modal value), denoted Mo, is the value that the statistical variable takes most often
(the value with the highest headcount).
The mode can be determined for both qualitative and quantitative characters.
Comment:
A series can have a single mode (unimodal), several modes (multimodal), or no mode at all.
Example:
Consider the series S = {4, 0, 1, 1, 2, 2, 2, 3, 3, 4, 2, 3, 4, 5, 2, 1, 3, 3, 4, 5}.
"2" and "3" are the most frequently recurring values: 5 times each.
This series has two modes: 2 and 3.
Consider the series R = {8, 6, 5, 7, 3, 1}. Each value occurs only once, so we can say either
that all values are modal or that there is no mode.
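A minimal Python sketch of this definition is given below. The function name `modes` is an assumption of the sketch; it returns every value that reaches the highest headcount, so a series in which all values occur once returns all of them.

```python
from collections import Counter

def modes(series):
    """Return, in ascending order, the values with the highest headcount."""
    counts = Counter(series)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

S = [4, 0, 1, 1, 2, 2, 2, 3, 3, 4, 2, 3, 4, 5, 2, 1, 3, 3, 4, 5]
R = [8, 6, 5, 7, 3, 1]

print(modes(S))   # the two modes of S
print(modes(R))   # every value of R, since each occurs once
```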
Examples:
Consider the statistical distribution of a population of students according to their height (in cm):
Height (cm)    < 160    [160, 170[    [170, 180[    [180, 190[    > 190    Total
Headcount      6        7             8             2             1        24
Freq (%)       25       29.2          33.3          8.3           4.2      100
The highest headcount (or the highest frequency) indicates that the modal class is
[170, 180[.
Mode calculation
Consider the distribution of a population of students by weight (in kg)
The modal class, i.e. the class with the highest corrected frequency, is [70; 75[.
Rule:
The mode for a continuous quantitative character can be calculated by the following
formula: $Mo = LI_{CMo} + A_{CMo} \times \dfrac{\Delta_1}{\Delta_1 + \Delta_2}$
Where
LI CMo is the lower bound of the modal class
A CMo is the amplitude of the modal class
Δ1 = fCMo - fCMo-1: difference between the frequency of the modal class and the frequency of the
preceding class
Δ2 = fCMo - fCMo+1: difference between the frequency of the modal class and the frequency of the
next class
Using the headcounts:
$Mo = LI_{CMo} + A_{CMo} \times \dfrac{n_{CMo} - n_{CMo-1}}{(n_{CMo} - n_{CMo-1}) + (n_{CMo} - n_{CMo+1})}$
Mo = 72.99
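The interpolation rule can be illustrated with the Python sketch below. The headcounts used (18, 30, 22) are hypothetical values chosen only for the illustration; they are not the data of the example above, so the result differs from 72.99. The function name `grouped_mode` is likewise an assumption, and the sketch assumes classes of equal amplitude.

```python
def grouped_mode(lower_bound, amplitude, n_prev, n_modal, n_next):
    """Mode of a grouped series: Mo = LI + A * d1 / (d1 + d2)."""
    d1 = n_modal - n_prev   # difference with the preceding class
    d2 = n_modal - n_next   # difference with the next class
    return lower_bound + amplitude * d1 / (d1 + d2)

# Hypothetical headcounts around a modal class [70; 75[ of amplitude 5.
mo = grouped_mode(70, 5, n_prev=18, n_modal=30, n_next=22)
print(mo)
```

The mode always falls inside the modal class, closer to the neighbouring class with the larger headcount.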
Rule:
To find the median, you must:
Arrange the series in increasing order of values;
Locate the value that divides the total headcount into two equal parts, using the rank
(n+1)/2;
Check that there are as many values below the median as there are above it. The total headcount
is divided into two equal parts.
$Me = L_{Me} + a_{Me}\left[\dfrac{\frac{N}{2} - N_{Me-1}}{n_{Me}}\right]$
Me = 11.66
The first quartile Q1 is the smallest value in the series such that at least 25% of the values are less
than or equal to Q1.
The second quartile Q2 (the median) is the smallest value in the series such that at least 50% of the
values are less than or equal to Q2.
The third quartile Q3 is the smallest value in the series such that at least 75% of the values are less
than or equal to Q3.
We can also define the quartiles Q1, Q2, Q3 as values that can be used to divide an ordered
population into four groups, each containing the same number of elements.
Example: we carry out a statistical study on the 50 marks awarded by a board of examiners.
Here are the results obtained by classifying these marks in ascending order (discrete variable).
Marks    Headcount    Cumulative headcount    Frequency (%)    Cumulative frequency (%)
0 1 1 2 2
1 2 3 4 6
2 2 5 4 10
3 3 8 6 16
4 2 10 4 20
5 3 13 6 26 Q1
6 2 15 4 30
7 3 18 6 36
8 4 22 8 44
9 3 25 6 50 Q2
10 2 27 4 54
11 3 30 6 60
12 4 34 8 68
13 4 38 8 76 Q3
14 3 41 6 82
15 1 42 2 84
16 2 44 4 88
17 1 45 2 90
18 2 47 4 94
19 2 49 4 98
20 1 50 2 100
n/4 = 12.5 is not an integer, so the first quartile is the term of rank 13, i.e. Q1 = 5.
3n/4 = 37.5 is not an integer, so the third quartile is the term of rank 38, i.e. Q3 = 13.
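The rank rule used here can be sketched in Python. The helper `value_at_rank` is an assumption of this sketch; it reads the increasing cumulative headcounts to find the mark that occupies a given rank.

```python
import math
from itertools import accumulate

marks = list(range(21))   # marks from 0 to 20
headcounts = [1, 2, 2, 3, 2, 3, 2, 3, 4, 3, 2, 3, 4, 4, 3, 1, 2, 1, 2, 2, 1]
cumulative = list(accumulate(headcounts))
n = cumulative[-1]

def value_at_rank(rank):
    """Return the mark held by the observation of the given rank (1-based)."""
    for mark, cum in zip(marks, cumulative):
        if cum >= rank:
            return mark

# n/4 and 3n/4 are not integers here, so the rank is rounded up.
q1 = value_at_rank(math.ceil(n / 4))
q3 = value_at_rank(math.ceil(3 * n / 4))
print(q1, q3)
```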
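Continuous case
Formula:
The first quartile
$Q_1 = L_{Q_1} + a_{Q_1} \times \dfrac{\frac{N}{4} - N_{Q_1 - 1}}{n_{Q_1}}$
The third quartile
$Q_3 = L_{Q_3} + a_{Q_3} \times \dfrac{\frac{3N}{4} - N_{Q_3 - 1}}{n_{Q_3}}$
where L is the lower bound of the class containing the quartile, a its amplitude, n its
headcount, and $N_{Q-1}$ the cumulative headcount of the preceding class.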
Example:
Marks        Headcount    Cumulative headcount    Frequency (%)    Cumulative frequency (%)
[0 ; 5[      10           10                      20               20
[5 ; 8[      8            18                      16               36
[8 ; 12[     12           30                      24               60
[12 ; 15[    11           41                      22               82
[15 ; 20[    9            50                      18               100
Total        50
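Solution:
Method 1:
$Q_1 = 5 + 3 \times \dfrac{\frac{50}{4} - 10}{8} \approx 5.94$
$Q_2 = 8 + 4 \times \dfrac{\frac{50}{2} - 18}{12} \approx 10.33$
$Q_3 = 12 + 3 \times \dfrac{\frac{3 \times 50}{4} - 30}{11} \approx 14.05$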
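The interpolation used in Method 1 can be written as one small Python function. The name `grouped_quantile` and its arguments are assumptions of this sketch; it applies linear interpolation inside the class where the cumulative headcount first reaches p × N.

```python
def grouped_quantile(classes, headcounts, p):
    """Quantile of order p for data grouped into classes (linear interpolation)."""
    N = sum(headcounts)
    target = p * N
    before = 0   # cumulative headcount of the preceding classes
    for (low, high), n_i in zip(classes, headcounts):
        if before + n_i >= target:
            return low + (high - low) * (target - before) / n_i
        before += n_i

classes = [(0, 5), (5, 8), (8, 12), (12, 15), (15, 20)]
headcounts = [10, 8, 12, 11, 9]

q1 = grouped_quantile(classes, headcounts, 0.25)
q2 = grouped_quantile(classes, headcounts, 0.50)
q3 = grouped_quantile(classes, headcounts, 0.75)
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

The same function gives deciles with p = 0.1, 0.2, and so on.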
Method 2:
Definition :
The difference between Q3 and Q1 is called the interquartile range.
The interquartile range is used to assess the dispersion of a series, either absolutely, or by
comparison with another series (provided the values of the other series are expressed in the
same unit). The Q1 and Q3 values delimit a range within which approximately 50% of the values
in the series are concentrated.
Box plot:
The median as a positional parameter and the interquartile range as a dispersion parameter provide a
good description of a statistical series. We use these two data to construct a box plot of the series.
Example:
Let a series of values be summarized as:
- minimum Min = 8
- maximum Max = 33
Example:
1-3-3-3-5-5-6-7-7-8-8-8-9-9-10-10-10-10-11-11-12-12-13-13-13-13-14-15-16-19
Here N = 30 and n' = 9N/10 = 27, so the ninth decile D9 is the 27th value of the series
arranged in ascending order: D9 = 14.
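This rank calculation can be checked with a few lines of Python. Rounding the rank up with `math.ceil` when 9N/10 is not an integer is an assumption of this sketch, by analogy with the rule used for quartiles.

```python
import math

values = [1, 3, 3, 3, 5, 5, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10, 10, 10, 11, 11,
          12, 12, 13, 13, 13, 13, 14, 15, 16, 19]
N = len(values)

rank = math.ceil(9 * N / 10)      # rank of the ninth decile
d9 = sorted(values)[rank - 1]     # ranks are 1-based, list indices 0-based
print(N, rank, d9)
```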
Definition:
The (arithmetic) mean is the sum of the observed values divided by their number.
Let {x1, x2, ...., xn} be a series of numbers. The formula for the arithmetic mean of this series is
given by: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Example: Consider the series of numbers {8, 5, 9, 13, 25}. The arithmetic mean of this series
is calculated as follows:
$\bar{x} = \dfrac{8 + 5 + 9 + 13 + 25}{5} = \dfrac{60}{5} = 12$
Definition:
Let {x1, x2, ...., xk} be a series of values and {n1, n2, ...., nk} the corresponding headcounts.
The formula for the weighted arithmetic mean of this series is given by:
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{k} n_i x_i$
Example: The study of 20 families led to the following distribution of the number of children
in each family:
Number of children (xi)    0     1     2     3    4     5
Number of families (ni)    5     3     6     1    3     2
fi (%)                     25    15    30    5    15    10
The average number of children per family is:
$\bar{x} = \dfrac{0 \times 5 + 1 \times 3 + 2 \times 6 + 3 \times 1 + 4 \times 3 + 5 \times 2}{20} = \dfrac{40}{20} = 2$
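The weighted mean of this example can be reproduced with the Python sketch below; the variable names are illustrative.

```python
x = [0, 1, 2, 3, 4, 5]      # number of children (x_i)
n = [5, 3, 6, 1, 3, 2]      # number of families (n_i)

total = sum(n)                                            # total headcount
weighted_sum = sum(n_i * x_i for n_i, x_i in zip(n, x))   # sum of n_i * x_i
mean = weighted_sum / total
print(total, weighted_sum, mean)
```

For a variable grouped into classes, the same calculation applies with the class centers ci in place of the values xi.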
Definition:
Let [ai, bi[ be the classes of a continuous variable and {n1, n2, ...., nk} the corresponding
headcounts.
ci is the center of these classes.
The formula for the weighted arithmetic mean of this series is given by:
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{k} n_i c_i$
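I.7.1 Moments:
Moments are algebraic quantities used to describe the characteristics of statistical distributions:
shape, symmetry, kurtosis, central tendency, dispersion.
The moment of order r:
$m_r = \dfrac{1}{n}\left(x_1^r + x_2^r + \dots + x_n^r\right) = \dfrac{1}{n}\sum_{i=1}^{n} x_i^r$
The centered moment of order r:
$\mu_r = \dfrac{1}{n}\left[(x_1 - \bar{x})^r + \dots + (x_n - \bar{x})^r\right] = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r$
Or, with the headcounts ni: $\mu_r = \dfrac{1}{n}\sum_{i=1}^{k} n_i (x_i - \bar{x})^r$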
I.7.2 The variance σ² or V(x)
Definition:
The variance is an indicator of the dispersion of a series in relation to its mean.
1) The variance of a series is given by the following formula:
$V(x) = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
2) The variance of a discrete quantitative variable is expressed by:
$V(x) = \dfrac{1}{n}\sum_{i=1}^{k} n_i (x_i - \bar{x})^2$ if the size considered is that of a population ..... (1)
3) The variance of a continuous quantitative variable is expressed by:
$V(x) = \dfrac{1}{n}\sum_{i=1}^{k} n_i (c_i - \bar{x})^2$ ..... (2)
Where ci is the center of the class
Calculation of the "developed" formula
Formula (1) can also be calculated using the previous method. However, to simplify the
calculations, it is preferable to use the "developed" formula. We show that formula (1) can be
written as:
$V(x) = \dfrac{1}{n}\sum_{i=1}^{k} n_i x_i^2 - (\bar{x})^2$
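Variance properties:
V(x+a) = V(x), so σ(x+a) = σ(x)
V(ax) = a²V(x), so σ(ax) = |a|σ(x)
Indeed:
$V(x + a) = \dfrac{1}{n}\sum_{i=1}^{n} \left[(x_i + a) - \overline{(x + a)}\right]^2 = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = V(x)$
$V(ax) = \overline{(ax)^2} - \left(\overline{ax}\right)^2 = a^2\,\overline{x^2} - (a\bar{x})^2 = a^2 V(x)$, using the property of the mean $\overline{ax} = a\bar{x}$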
Note:
The "expanded" formula for a discrete quantitative variable:
$V(x) = \dfrac{1}{n}\sum_{i=1}^{k} n_i x_i^2 - (\bar{x})^2$
The "expanded" formula for a continuous quantitative variable:
$V(x) = \dfrac{1}{n}\sum_{i=1}^{k} n_i c_i^2 - (\bar{x})^2$
Where ci is the center of the class
( ) ( ∑ − ( ̅) )
( ) ( )
Example: The study of 20 families led to the following distribution of the number of children
in each family:
Number of children (xi)    0    1    2    3    4    5
Number of families (ni)    5    3    6    1    3    2
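For this distribution, the variance can be computed both from the definition and from the "expanded" formula, as in the Python sketch below. The variable names are illustrative, and n is treated as the size of a population (division by n).

```python
import math

x = [0, 1, 2, 3, 4, 5]      # number of children (x_i)
n = [5, 3, 6, 1, 3, 2]      # number of families (n_i)
total = sum(n)

mean = sum(n_i * x_i for n_i, x_i in zip(n, x)) / total

# Definition: weighted mean of the squared deviations from the mean.
var_definition = sum(n_i * (x_i - mean) ** 2 for n_i, x_i in zip(n, x)) / total

# Expanded formula: mean of the squares minus the square of the mean.
var_expanded = sum(n_i * x_i ** 2 for n_i, x_i in zip(n, x)) / total - mean ** 2

std = math.sqrt(var_expanded)   # standard deviation
print(mean, var_definition, var_expanded, std)
```

Both formulas give the same value, which illustrates why the expanded form is a convenient shortcut for hand calculations.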
Definition:
The standard deviation of a variable is the square root of the variance.
$\sigma_x = \sqrt{V(x)}$
Example:
Using the result of the last example, calculate the standard deviation:
$\sigma_x = \sqrt{V(x)} = \sqrt{2.7} \approx 1.64$
Remark:
The σx parameter measures the average distance between $\bar{x}$ and the values of X (see next
Figure). It is used to measure the dispersion of a statistical series around its mean.
– The smaller it is, the more the values are concentrated around the mean (the series is said to be
homogeneous).
– The larger it is, the more the values are scattered around the mean (the series is said to be
heterogeneous).
σ: standard deviation.
Fisher's asymmetry coefficient is a dimensionless number, i.e. independent of the units of
measurement of xi.
B - Symmetric distribution
( − )−( − )
( − )
σ: standard deviation.
Example:
A survey of 1500 households in a certain rural geographical area looked at the variable X
corresponding to household size, i.e. the number of people in the household. The data collected can
be presented in the form of the following bar chart.
Solution :
4 4
The 3rd-order moment is : 2
4
The average ̅ 2
The variance ( ) − (2 ) 22
Example
Let's assume that, following a statistical study of passenger weight x and baggage weight y, an
airline has obtained the following results:
A bivariate statistical series is one in which two measurable characters are recorded for the
same population. It can be presented in the form of a table, in rows or columns.
Example: We want to study the relationship between the height of men and their weight.
Rows: the p modalities of X. Columns: the q modalities of Y. Last row and last column: marginal
headcounts.
X \ Y    y1     y2     …….    yj     …….    yq     ni.
x1       n11    n12    …….    n1j    …….    n1q    n1.
x2       n21    n22    …….    n2j    …….    n2q    n2.
…
xi       ni1    ni2    …….    nij    …….    niq    ni.
…
xp       np1    np2    …….    npj    …….    npq    np.
n.j      n.1    n.2    …….    n.j    …….    n.q    n..
ni. : sum of the headcounts in the ith row, where the subscript j, ranging from 1 to q, is
replaced by " . "
n.j : sum of the headcounts of the modality yj (jth column), where the subscript i, ranging from
1 to p, is replaced by " . "
Remark:
1. The 1st column lists the p modalities x1, x2, ..., xi, ...., xp of character X.
The 1st row lists the q modalities y1, y2, ..., yj, ...., yq of character Y.
2. The headcount nij lies at the intersection of row i and column j.
It is the number of individuals in the population presenting both modality xi and modality yj.
3. For the marginal headcounts ni. and n.j, the index that varies is replaced by " . ".
4. The marginal headcount of X is noted "ni." and that of Y "n.j".
5. The total headcount of the table is "n..". This is the total number of individuals in the
population studied.
Properties of contingency tables:
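Since the modalities xi and yj are mutually exclusive and exhaustive, we can write several
series of equalities:
$n_{i\cdot} = \sum_{j=1}^{q} n_{ij}$
represents the number of individuals presenting the modality xi of X whatever the modality of Y.
$n_{\cdot j} = \sum_{i=1}^{p} n_{ij}$
represents the number of individuals presenting the modality yj of Y whatever the modality of X.
The total headcount n.. appears at the intersection of the last row and the last column:
$n_{\cdot\cdot} = \sum_{i=1}^{p} n_{i\cdot} = \sum_{j=1}^{q} n_{\cdot j} = \sum_{i=1}^{p}\sum_{j=1}^{q} n_{ij} = \sum_{j=1}^{q}\sum_{i=1}^{p} n_{ij}$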
Partial frequencies:
The partial frequency is the ratio of the partial headcount to the total headcount:
$f_{ij} = \dfrac{n_{ij}}{n_{\cdot\cdot}}$
This is the proportion of individuals satisfying both modality xi and modality yj.
Note:
The sum of partial frequencies is 1:
$\sum_{i=1}^{p}\sum_{j=1}^{q} f_{ij} = 1$
Marginal distribution of X:
It is made up of the modalities of character X and the corresponding headcounts, whatever the
modalities of character Y.
xi       ni.    fi.
x1       n1.    f1.
…
xp       np.    fp.
Total    n..    1
Marginal frequencies can be calculated as the ratio of the marginal headcount to the total
headcount: $f_{i\cdot} = \dfrac{n_{i\cdot}}{n_{\cdot\cdot}}$
Marginal distribution of Y:
It is composed of the modalities of character Y and the corresponding headcounts,
whatever the modalities of character X. The marginal frequency of the yj modality is equal to:
$f_{\cdot j} = \dfrac{n_{\cdot j}}{n_{\cdot\cdot}}$
yj       n.j    f.j
y1       n.1    f.1
…
yq       n.q    f.q
Total    n..    1
X \ Y    1     2     3     4
1        12    4     5     11
2        18    16    11    3
3        10    4     20    6
With the marginal headcounts:
X \ Y    1     2     3     4     ni.
1        12    4     5     11    32
2        18    16    11    3     48
3        10    4     20    6     40
n.j      40    24    36    20    120
The frequencies:
X \ Y    1         2         3         4         fi.
1        0.1000    0.0333    0.0417    0.0917    0.2667
2        0.1500    0.1333    0.0917    0.0250    0.4000
3        0.0833    0.0333    0.1667    0.0500    0.3333
f.j      0.3333    0.2000    0.3000    0.1667    1
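The marginal headcounts and the frequency table can be recomputed from the contingency table with the Python sketch below. The nested list `table` (rows for X, columns for Y) and the other names are assumptions of this sketch.

```python
# Contingency table of the example: rows are X = 1, 2, 3; columns are Y = 1, 2, 3, 4.
table = [[12, 4, 5, 11],
         [18, 16, 11, 3],
         [10, 4, 20, 6]]

row_totals = [sum(row) for row in table]          # marginal headcounts n_i.
col_totals = [sum(col) for col in zip(*table)]    # marginal headcounts n_.j
n = sum(row_totals)                               # total headcount n_..

freq = [[n_ij / n for n_ij in row] for row in table]   # partial frequencies f_ij
f_rows = [t / n for t in row_totals]                   # marginal frequencies f_i.
f_cols = [t / n for t in col_totals]                   # marginal frequencies f_.j

for row, f_i in zip(freq, f_rows):
    print([round(f, 4) for f in row], round(f_i, 4))
print([round(f, 4) for f in f_cols])
```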
Marginal distribution of X:
xi       ni.    fi.
1        32     0.2667
2        48     0.40
3        40     0.3333
Total    120    1
Marginal distribution of Y:
yj       n.j    f.j
1        40     0.3333
2        24     0.20
3        36     0.30
4        20     0.1667
Total    120    1
The conditional distribution of X consists of the modalities of X and the headcounts of each of
these modalities in the subpopulation presenting the modality yj of Y.
xi       nij    fi/j
x1       n1j    f1/j
…
xp       npj    fp/j
Total    n.j    1
The conditional distribution of Y consists of the modalities of Y and the headcounts of each of
these modalities in the subpopulation presenting the modality xi of X.
yj       nij    fj/i
y1       ni1    f1/i
…
yq       niq    fq/i
Total    ni.    1
For example, for X = 1 in the table above:
yj       n1j    fj/1
1        12     0.375
2        4      0.125
3        5      0.156
4        11     0.344
Total    32     1
X \ Y     employed    inactive    retired
male      5           3           1
female    4           3           4
[Figures: grouped bar chart, 100% stacked bar chart and horizontal bar charts of the headcounts by sex (male, female) and status (employed, inactive, retired).]
II.2.2 Quantitative variables:
II.2.2.1 Discrete case:
A discrete variable takes a finite number of values, which can be enumerated.
X \ Y    1    2    3    4
1        1    1    1    4
2        4    3    1    3
3        2    4    2    2
X \ Y    1     2     3     4
1        1     1     1     14
2        4     13    10    3
3        12    4     2     2
[Figures: bar charts of a distribution grouped into the classes [20,40[, [40,60[, [60,80[ and [120,140[, [140,160[, [160,180[.]
These two distributions can be studied as in the case of univariate statistics.
In particular, they can be characterized by their mean and variance (with n = n..).
The mean of character X:
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{p} n_{i\cdot}\, x_i$
The mean of character Y:
$\bar{y} = \dfrac{1}{n}\sum_{j=1}^{q} n_{\cdot j}\, y_j$
The variance of character X:
$V(X) = \dfrac{1}{n}\sum_{i=1}^{p} n_{i\cdot}\,(x_i - \bar{x})^2 = \dfrac{1}{n}\sum_{i=1}^{p} n_{i\cdot}\, x_i^2 - \bar{x}^2$
The variance of character Y:
$V(Y) = \dfrac{1}{n}\sum_{j=1}^{q} n_{\cdot j}\,(y_j - \bar{y})^2 = \dfrac{1}{n}\sum_{j=1}^{q} n_{\cdot j}\, y_j^2 - \bar{y}^2$
Remark: If one of the characters X and Y is continuous quantitative, the values xi in the
formulas for the mean and the variance are replaced by the centers ci of the classes of that character.
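Example:
X \ Y    1     2     3     4
1        12    4     5     11
2        18    16    11    3
3        10    4     20    6
The mean of X:
$\bar{x} = \dfrac{1}{120}\sum_{i=1}^{3} n_{i\cdot}\, x_i$
xi     ni.    ni. × xi
1      32     32
2      48     96
3      40     120
Sum    120    248
$\bar{x} = \dfrac{248}{120} \approx 2.07$
The variance of X:
$V(X) = \dfrac{1}{120}\sum_{i=1}^{3} n_{i\cdot}\, x_i^2 - \bar{x}^2$
xi     ni.    ni. × xi²
1      32     32
2      48     192
3      40     360
Sum    120    584
$V(X) = \dfrac{584}{120} - \left(\dfrac{248}{120}\right)^2 \approx 0.596$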
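The two sums of this example (248 and 584) and the resulting mean and variance of X can be checked with the Python sketch below; the names are illustrative.

```python
x_values = [1, 2, 3]          # modalities of X
row_totals = [32, 48, 40]     # marginal headcounts n_i.
n = sum(row_totals)

sum_nx = sum(n_i * x for n_i, x in zip(row_totals, x_values))         # sum of n_i. * x_i
sum_nx2 = sum(n_i * x ** 2 for n_i, x in zip(row_totals, x_values))   # sum of n_i. * x_i^2

mean_x = sum_nx / n
var_x = sum_nx2 / n - mean_x ** 2   # expanded formula of the variance
print(sum_nx, sum_nx2, round(mean_x, 4), round(var_x, 4))
```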
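Conditional means:
$\bar{x}_j = \dfrac{1}{n_{\cdot j}}\sum_{i=1}^{p} n_{ij}\, x_i$
$\bar{y}_i = \dfrac{1}{n_{i\cdot}}\sum_{j=1}^{q} n_{ij}\, y_j$
Conditional variances:
$V_j(X) = \dfrac{1}{n_{\cdot j}}\sum_{i=1}^{p} n_{ij}\,(x_i - \bar{x}_j)^2 = \dfrac{1}{n_{\cdot j}}\sum_{i=1}^{p} n_{ij}\, x_i^2 - \bar{x}_j^2$
$V_i(Y) = \dfrac{1}{n_{i\cdot}}\sum_{j=1}^{q} n_{ij}\,(y_j - \bar{y}_i)^2 = \dfrac{1}{n_{i\cdot}}\sum_{j=1}^{q} n_{ij}\, y_j^2 - \bar{y}_i^2$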
Example:
X \ Y    1     2     3     4
1        12    4     5     11
2        18    16    11    3
3        10    4     20    6
For X = 2 (distribution of Y in the subpopulation X = 2):
yj     n2j    n2j × yj
1      18     18
2      16     32
3      11     33
4      3      12
Sum    48     95
$\bar{y}_2 = \dfrac{95}{48} \approx 1.98$
yj     n2j    n2j × yj²
1      18     18
2      16     64
3      11     99
4      3      48
Sum    48     229
$V_2(Y) = \dfrac{229}{48} - \left(\dfrac{95}{48}\right)^2 \approx 0.854$
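The conditional mean and variance of Y for X = 2 can be checked with the Python sketch below. The names `row_x2`, `mean_y_given_x2` and `var_y_given_x2` are assumptions of this sketch.

```python
y_values = [1, 2, 3, 4]       # modalities of Y
row_x2 = [18, 16, 11, 3]      # headcounts n_2j of the row X = 2
n_2 = sum(row_x2)             # marginal headcount n_2.

sum_ny = sum(n_j * y for n_j, y in zip(row_x2, y_values))         # sum of n_2j * y_j
sum_ny2 = sum(n_j * y ** 2 for n_j, y in zip(row_x2, y_values))   # sum of n_2j * y_j^2

mean_y_given_x2 = sum_ny / n_2
var_y_given_x2 = sum_ny2 / n_2 - mean_y_given_x2 ** 2
print(sum_ny, sum_ny2, round(mean_y_given_x2, 3), round(var_y_given_x2, 3))
```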
Mr : SAHNOUN.A.Y Page 59
Descriptive statistics 2023/2024
II.3.3 La covariance
Il s'agit de définir un indice de liaison entre les deux variables considérées. Cet indice est le
coefficient de corrélation linéaire ; il nécessite la définition préalable de la covariance.
Définition :
La covariance généralise à deux variables la notion de variance. Sa formule de définition est la suivante :
Bon ajustement 𝑛
𝑐𝑜𝑣(𝑥 𝑦) 𝑠𝑋𝑌 ̅- 𝒚𝒋 − 𝒚
∑,𝒙𝒊 − 𝒙 ̅
𝑛
𝑖
Mauvais ajustement : 𝑛
[ ∑ 𝒙𝒊 𝒚𝒋 ] − ,𝒙 ̅-
̅ 𝒚
𝑛
𝑖
Démonstration
La covariance est donc la moyenne des produits des écarts aux moyennes (dans chaque produit,
chacun des deux écarts est relatif à l'une des deux variables considérées). On peut, la encore,
retenir son expression sous la forme suivante : c'est la moyenne des produits moins le produit
des moyennes. Comme la variance, la covariance n'a pas de signification concrète. Dans le cas de
la variance, on doit passer à l'écart-type pour avoir un indicateur interprétable ; dans celui de la
covariance, il faudra passer au coefficient de corrélation linéaire.
Properties of the covariance:
The covariance is a symmetric index. Clearly, SXY = SYX (the two variables therefore play the same role in the definition of the covariance).
The covariance can take any real value (negative, zero or positive; "small" or "large" in absolute value).
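The two properties can be illustrated on a contingency table. The sketch below reuses the 3 x 4 table of the earlier example (X = 1, 2, 3 and Y = 1, 2, 3, 4) and applies the "mean of the products minus the product of the means" form with weights $n_{ij}$; the function name `covariance` is an illustrative choice.

```python
def covariance(table, x_values, y_values):
    """Covariance of X and Y from a contingency table of counts n_ij."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]          # n_i.
    col_totals = [sum(col) for col in zip(*table)]    # n_.j
    mean_x = sum(t * x for t, x in zip(row_totals, x_values)) / n
    mean_y = sum(t * y for t, y in zip(col_totals, y_values)) / n
    # Mean of the products x_i * y_j, weighted by the counts n_ij.
    mean_xy = sum(
        n_ij * x * y
        for row, x in zip(table, x_values)
        for n_ij, y in zip(row, y_values)
    ) / n
    return mean_xy - mean_x * mean_y


table = [
    [12, 4, 5, 11],
    [18, 16, 11, 3],
    [10, 4, 20, 6],
]
x_values = [1, 2, 3]
y_values = [1, 2, 3, 4]

cov_xy = covariance(table, x_values, y_values)
# Swapping the roles of X and Y means transposing the table.
transposed = [list(col) for col in zip(*table)]
cov_yx = covariance(transposed, y_values, x_values)

print(round(cov_xy, 4), round(cov_yx, 4))
```

Both calls return the same value (about 0.038 here), which illustrates the symmetry; the value itself is small and positive, but its size cannot be interpreted without the standard deviations.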
Definition:
Correlation coefficients give a summary measure of the strength of the relationship between two variables, and of its direction when the relationship is monotonic.
Pearson's correlation coefficient is used to analyse linear relationships:
$$r = \operatorname{corr}(x, y) = \frac{\operatorname{cov}(X, Y)}{\sigma(X)\,\sigma(Y)}$$
1. corr(X, Y) ∈ [−1, 1];
2. corr(X, Y) = corr(Y, X);
3. corr(X, X) = 1;
4. corr(X, Y) > 0: positive association, X and Y tend to vary in the same direction;
5. corr(X, Y) < 0: negative association, X and Y tend to vary in opposite directions.
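These properties can be checked on a small data set. The following Python sketch uses hypothetical paired observations (the lists `xs` and `ys` are invented for the illustration) and computes Pearson's coefficient as the covariance divided by the product of the standard deviations.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    # Mean of the products minus the product of the means.
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)


def correlation(xs, ys):
    # cov(X, Y) / (sigma(X) * sigma(Y)), with V(X) = cov(X, X).
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical paired observations (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)                  # symmetry
r_self = correlation(xs, xs)                     # a variable with itself
r_opposite = correlation(xs, [-y for y in ys])   # reversed direction

print(round(r, 4), round(r_swapped, 4), round(r_self, 4), round(r_opposite, 4))
```

On these data the coefficient is positive and below 1, swapping the variables leaves it unchanged, and reversing the sign of one variable reverses the sign of the coefficient.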
Example:
Consider 10 players, and let:
– Y be the variable giving the number of games a player plays;
– X be the variable giving the gain or loss (+1 if the player wins 10 DA, −1 if the player loses 10 DA, and 0 otherwise).
We have the following contingency table:
Compute cov(X, Y).
Solution:
With n = 10, we have
$$\bar{x} = \frac{1}{n}\sum_{i} n_{i\cdot}\,x_i$$
and
$$\bar{y} = \frac{1}{n}\sum_{j} n_{\cdot j}\,y_j$$
$$\operatorname{cov}(X, Y) = \left[\frac{1}{n}\sum_{i}\sum_{j} n_{ij}\,x_i\,y_j\right] - \bar{x}\,\bar{y}$$
We then compute the standard deviations:
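$$V(X) = \frac{1}{n}\sum_{i} n_{i\cdot}\,x_i^2 - \bar{x}^2$$
so
$$\sigma(X) = \sqrt{V(X)}$$
$$V(Y) = \frac{1}{n}\sum_{j} n_{\cdot j}\,y_j^2 - \bar{y}^2$$
so
$$\sigma(Y) = \sqrt{V(Y)}$$
$$\operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma(X)\,\sigma(Y)}$$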
II.4 Fitting:
Linear fitting consists in drawing a straight line that passes as close as possible to the observations of a scatter plot.
Definition:
Let X and Y be two numerical statistical variables observed on n individuals. In an orthogonal coordinate system $(O; \vec{i}, \vec{j})$, the set of n points with coordinates (xi, yi) forms the scatter plot associated with this statistical series.
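The mean point G of the scatter plot has coordinates
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}, \qquad \bar{y} = \frac{y_1 + y_2 + \dots + y_n}{n}$$
The line through the mean points G1 and G2 of the two halves of the scatter plot has equation
$$y - y_1 = a\,(x - x_1)$$
with
$$a = \frac{y_2 - y_1}{x_2 - x_1}$$
$x_1$: abscissa of G1
$y_1$: ordinate of G1
$x_2$: abscissa of G2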
$y_2$: ordinate of G2
Example:
The following table shows the change in the number of members of a tennis club from 2001 to 2006.
Year                 2001   2002   2003   2004   2005   2006
Rank xi              1      2      3      4      5      6
Number of members    70     90     115    140    170    220
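Solution:
Coordinates of G1 (first three points):
$$x_1 = \frac{1 + 2 + 3}{3} = 2, \qquad y_1 = \frac{70 + 90 + 115}{3} \approx 91.7$$
Coordinates of G2 (last three points):
$$x_2 = \frac{4 + 5 + 6}{3} = 5, \qquad y_2 = \frac{140 + 170 + 220}{3} \approx 176.7$$
Coordinates of the mean point G:
$$\bar{x} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5, \qquad \bar{y} = \frac{70 + 90 + 115 + 140 + 170 + 220}{6} \approx 134.2$$
so G(3.5 ; 134.2)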
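Determining the equation:
G1(2 ; 91.7)
G2(5 ; 176.7)
$$y - 91.7 = \left(\frac{176.7 - 91.7}{5 - 2}\right)(x - 2)$$
$$y \approx 28.33\,x + 35.0$$
Check: at x = 3.5 the line gives 28.33 × 3.5 + 35.0 ≈ 134.2, so it passes through the mean point G.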
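The line through G1 and G2 can be recomputed from the club data. The sketch below splits the six points into two halves, as in the solution; the helper `mean` and the variable names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


ranks = [1, 2, 3, 4, 5, 6]
members = [70, 90, 115, 140, 170, 220]

half = len(ranks) // 2
# Mean points of the first and second halves of the scatter plot.
g1 = (mean(ranks[:half]), mean(members[:half]))
g2 = (mean(ranks[half:]), mean(members[half:]))

# Slope and intercept of the line through G1 and G2.
slope = (g2[1] - g1[1]) / (g2[0] - g1[0])
intercept = g1[1] - slope * g1[0]

# The line should also pass through the overall mean point G.
g = (mean(ranks), mean(members))
value_at_g = slope * g[0] + intercept

print(g1, g2, round(slope, 2), round(intercept, 2), round(value_at_g, 1))
```

With unrounded coordinates the slope is 85/3 ≈ 28.33 and the intercept is 35, and the value of the line at x = 3.5 coincides with the ordinate of G.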
II.4.3 Least squares method:
The normal (or Laplace-Gauss) law is also called the law of errors or of deviations, because that is how it was introduced. The principle of the ordinary least squares (OLS) method is to consider the statistical series of pairs (xi, yi).
Definition:
The regression line of Y on x, denoted DY/x and determined by the least squares method, is the line with equation y = ax + b for which the sum of the squared residuals is minimal.
The residuals, or errors, are defined as the differences between the observed values and the values estimated by a regression model.
If the model held exactly, the n observations (xi, yi) would satisfy yi = a xi + b.
In practice, this rarely happens. Most often there are deviations, denoted ei, which are introduced into the model equation:
$$y_i = a\,x_i + b + e_i$$
Linear fitting by the least squares method consists in determining the line (also called the regression line) for which the sum of the squares of the n values $y_i - \hat{y}_i$ is minimal (which explains the name of the method).
$$q = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \bigl(y_i - (a\,x_i + b)\bigr)^2$$
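Recall that the minimum of a function is found by setting its derivative equal to 0. To find a and b, we first compute the partial derivative of q with respect to a:
$$\frac{\partial q}{\partial a} = -2\sum x_i\,\bigl(y_i - (a\,x_i + b)\bigr) = 0$$
$$\sum x_i\,y_i = a\sum x_i^2 + b\sum x_i \qquad (1)$$
Then the partial derivative of q with respect to b:
$$\frac{\partial q}{\partial b} = -2\sum (y_i - a\,x_i - b) = 0$$
$$\sum y_i = a\sum x_i + \sum b$$
$$\sum y_i = a\sum x_i + n\,b$$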
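Dividing by n:
$$\bar{y} = a\,\bar{x} + b$$
$$b = \bar{y} - a\,\bar{x} \qquad (2)$$
Substituting (2) into (1):
$$\sum x_i\,y_i = a\sum x_i^2 + (\bar{y} - a\,\bar{x})\sum x_i$$
$$\sum x_i\,y_i = a\sum x_i^2 + \bar{y}\sum x_i - a\,\bar{x}\sum x_i$$
$$a = \frac{\sum x_i\,y_i - \bar{y}\sum x_i}{\sum x_i^2 - \bar{x}\sum x_i}$$
Dividing the numerator and the denominator by n:
$$a = \frac{\frac{1}{n}\sum x_i\,y_i - \bar{x}\,\bar{y}}{\frac{1}{n}\sum x_i^2 - \bar{x}^2} = \frac{\operatorname{cov}(x, y)}{V(x)}$$
The regression line is therefore
$$\hat{y} = \hat{a}\,x + \hat{b}, \qquad \begin{cases} \hat{a} = \dfrac{\operatorname{cov}(x, y)}{V(x)} \\[2mm] \hat{b} = \bar{y} - \hat{a}\,\bar{x} \end{cases}$$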
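The two conditions obtained by setting the partial derivatives to zero say that the residuals sum to zero and that they are uncorrelated with the x values. The Python sketch below checks this numerically; the data are hypothetical and the function name `least_squares_line` is an illustrative choice.

```python
def least_squares_line(xs, ys):
    """Return (a, b) with a = cov(x, y) / V(x) and b = mean(y) - a * mean(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def sum_of_squares(a, b, xs, ys):
    """q(a, b): sum of the squared residuals y_i - (a * x_i + b)."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]

sum_e = sum(residuals)                                # condition from dq/db = 0
sum_xe = sum(x * e for x, e in zip(xs, residuals))    # condition from dq/da = 0

q_min = sum_of_squares(a, b, xs, ys)
q_other = sum_of_squares(a + 0.1, b - 0.1, xs, ys)    # any other line does worse

print(round(a, 3), round(b, 3), round(sum_e, 9), round(sum_xe, 9), q_min < q_other)
```

Moving the slope or the intercept away from the computed values can only increase q, which is what "least squares" means.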
Example:
Sales over the first 6 months were as follows:
Month (x)    January   February   March   April   May   June
Sales (y)    345       410        485     535     610   675
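Coding the months as x = 1, 2, ..., 6:
$$\bar{x} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5$$
$$\bar{y} = \frac{345 + 410 + 485 + 535 + 610 + 675}{6} = \frac{3060}{6} = 510$$
$$a = \frac{\frac{11860}{6} - (3.5)(510)}{\frac{91}{6} - (3.5)^2} \approx \frac{191.667}{2.917} \approx 65.71$$
$$b = 510 - 65.71 \times 3.5 \approx 280.02$$
$$y = 65.71\,x + 280.02$$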
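The coefficients of the sales example can be recomputed in Python. This sketch assumes the months are coded 1 to 6, as in the calculation of the mean; the variable names are illustrative.

```python
months = [1, 2, 3, 4, 5, 6]                 # January .. June coded as ranks
sales = [345, 410, 485, 535, 610, 675]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n

# Covariance and variance in the "mean of products minus product of means" form.
cov_xy = sum(x * y for x, y in zip(months, sales)) / n - mean_x * mean_y
var_x = sum(x * x for x in months) / n - mean_x ** 2

a = cov_xy / var_x          # slope of the regression line
b = mean_y - a * mean_x     # intercept

print(mean_x, mean_y, round(a, 2), round(b, 2))
```

With the unrounded slope the intercept comes out at 280.00; rounding the slope to 65.71 before computing the intercept gives about 280.02.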
Exponential trend:
An exponential trend curve is a curved line that is particularly useful when the data values rise or fall at an increasingly rapid rate.
$$y = b\,a^{x} \quad\Longrightarrow\quad \log(y) = \log(b) + x\,\log(a)$$
An exponential curve is turned into a straight line by taking decimal logarithms. Setting
$$Y = \log(y)$$
$$A = \log(a)$$
$$B = \log(b)$$
gives the linear relation Y = Ax + B, where "a" is a constant.
Example:
Period (x)      January   February   March   April   May    June
Quantity (y)    430       455        520     730     1140   1850
xi          Yi = log(yi)          xi Yi      xi²
1           log(430) = 2.633      2.633      1
2           log(455) = 2.658      5.316      4
3           2.716                 8.148      9
4           2.863                 11.453     16
5           3.057                 15.285     25
6           3.267                 19.603     36
Sum = 21    17.195                62.438     91
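$$\bar{x} = \frac{21}{6} = 3.5$$
$$\bar{Y} = \frac{17.195}{6} \approx 2.866$$
$$A = \frac{\frac{62.438}{6} - [3.5 \times 2.866]}{\frac{91}{6} - (3.5)^2} \approx 0.129$$
$$B = 2.866 - 0.129 \times 3.5 \approx 2.414$$
Returning to the original scale, $a = 10^{A} \approx 1.35$ and $b = 10^{B} \approx 260$, so $y \approx 260 \times (1.35)^{x}$.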
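The exponential fit can be reproduced by running a least squares fit on the decimal logarithms. The sketch below assumes the model form y = b * a^x and codes the months as 1 to 6; both are assumptions made for the illustration.

```python
import math

periods = [1, 2, 3, 4, 5, 6]                    # January .. June coded as ranks
quantities = [430, 455, 520, 730, 1140, 1850]

# Linearize: Y = log10(y), so that Y = A * x + B with A = log10(a), B = log10(b).
logs = [math.log10(q) for q in quantities]

n = len(periods)
mean_x = sum(periods) / n
mean_Y = sum(logs) / n

A = (sum(x * Y for x, Y in zip(periods, logs)) / n - mean_x * mean_Y) / (
    sum(x * x for x in periods) / n - mean_x ** 2
)
B = mean_Y - A * mean_x

# Back to the original scale (assumed model y = b * a**x).
a = 10 ** A
b = 10 ** B

print(round(A, 3), round(B, 3), round(a, 3), round(b, 1))
```

Because the logarithms are not rounded to three decimals here, the last digits can differ slightly from a hand calculation based on the table.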