Research Evaluation
Introduction
Introduction to statistics
The modern field of statistics emerged in the late 19th and early 20th centuries in three stages. The first wave, at the turn of the century, was led by the work of Francis Galton and Karl Pearson, who transformed statistics into a rigorous mathematical discipline used for analysis not just in science, but in industry and politics as well. Galton's contributions included introducing the concepts of standard deviation, correlation, and regression analysis, and the application of these methods to the study of the variety of human characteristics, such as height, weight, and eyelash length.
(Helen Mary Walker (1975). Studies in the history of statistical method. Arno Press. ISBN 9780405066283.)
Today, statistical methods are applied in all fields that involve decision making, both for making accurate inferences from a collated body of data and for making decisions in the face of uncertainty based on statistical methodology. The use of modern computers has expedited large-scale statistical computations and has also made possible new methods that are impractical to perform manually. Statistics continues to be an area of active research, for example on the problem of how to analyze big data.
Scientific Content :
1 - Definition of Statistics.
2 – Distinguishing between a qualitative variable and a quantitative variable.
1 - Definition of Statistics:
The science of collecting, organizing, analyzing, and presenting data in order to make better decisions.
- Types of Statistics
A- Descriptive statistics
- Methods of organizing, summarizing, and presenting data in an informative way.
B- Inferential statistics
- Used to make a decision, an estimate, a prediction, or a generalization about a population based on a sample.
Population
All items that share a common characteristic or attribute.
Sample
A portion, or part, of the population of interest.
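The difference between the two types of statistics can be sketched in a few lines of Python. This is a minimal illustration, not data from this document: the population is 1,000 hypothetical ages generated at random. Summarizing the sample is descriptive statistics; using the sample mean to estimate the population mean is inferential statistics.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people, generated at random for illustration only
rng = random.Random(42)  # fixed seed so the example is repeatable
population = [rng.randint(18, 65) for _ in range(1000)]

# A sample is a portion of the population: here, 50 items drawn at random without replacement
sample = rng.sample(population, 50)

# Descriptive statistics: summarize the data we actually collected
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
print(f"Sample of {len(sample)}: mean {sample_mean:.1f}, standard deviation {sample_sd:.1f}")

# Inferential statistics: use the sample mean to estimate the population mean.
# (In practice the population mean is unknown; it is computable here only because
# the population was simulated.)
population_mean = statistics.mean(population)
print(f"Estimate of population mean: {sample_mean:.1f} (true value here: {population_mean:.1f})")
```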
Types of Variables :
Qualitative variable :
The characteristic being studied is nonnumeric. Gender, religious affiliation, type of car, and eye color are examples of qualitative variables.
Quantitative variable :
The information is reported numerically, for example the balance in your checking account, the minutes remaining in class, or the number of children in a family.
Discrete variable : can assume only certain values (number of students, number of children, number of rooms).
Continuous variable : can assume any value within a certain range (salary, age, weight, height).
Levels of Measurement :
There are four levels of measurement for data:
1 – Nominal Level:
Data is classified into categories and can't be arranged in any particular order, such as eye color, gender, or religious affiliation.
2 – Ordinal Level:
Involves data arranged in some order, but the differences between data values can't be determined or are meaningless. For example, in a taste test of 4 soft drinks, Pepsi was ranked number 1, Coca-Cola number 2, Seven-Up number 3, and Orange Crush number 4.
3 – Interval Level:
Similar to the ordinal level, with the additional property that meaningful amounts of difference between data values can be determined. For example, temperature on the Fahrenheit scale is interval-level data.
4 – Ratio Level:
The interval level with a meaningful zero starting point. Both differences and ratios are meaningful at this level of measurement, for example the monthly income of doctors or the distance traveled by a car per month.
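A minimal Python sketch of why ratios are meaningful at the ratio level but not at the interval level. The temperatures and distances are illustrative values chosen for this sketch, and the kilometre-to-mile factor is approximate. Converting Fahrenheit to Celsius moves the zero point, so the ratio 80/40 changes while a ratio of two differences does not; converting kilometres to miles only rescales, so the ratio of distances is preserved.

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert a Fahrenheit temperature to Celsius (this moves the zero point)."""
    return (f - 32) * 5 / 9

KM_TO_MILES = 0.621371  # approximate conversion factor (a pure rescaling)

# Interval level: illustrative temperatures in degrees Fahrenheit
t1, t2, t3 = 40.0, 80.0, 120.0
c1, c2, c3 = (fahrenheit_to_celsius(t) for t in (t1, t2, t3))

ratio_f = t2 / t1                      # 2.0
ratio_c = c2 / c1                      # about 6.0: the ratio depends on the scale used
diff_ratio_f = (t3 - t2) / (t2 - t1)   # 1.0
diff_ratio_c = (c3 - c2) / (c2 - c1)   # about 1.0: comparing differences is meaningful

# Ratio level: illustrative distances traveled, in kilometres
d1, d2 = 100.0, 200.0
ratio_km = d2 / d1                                     # 2.0
ratio_miles = (d2 * KM_TO_MILES) / (d1 * KM_TO_MILES)  # about 2.0: the ratio is meaningful

print(f"Temperature ratio: {ratio_f:.1f} in F, {ratio_c:.1f} in C")
print(f"Ratio of temperature differences: {diff_ratio_f:.1f} in F, {diff_ratio_c:.1f} in C")
print(f"Distance ratio: {ratio_km:.1f} in km, {ratio_miles:.1f} in miles")
```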
Graphical Presentation
Data are the raw material of any statistical analysis. Data may be collected from each and every item in the population; in this case the statistical procedure is called a “CENSUS”. Alternatively, data may be collected from only a portion of the population; in this case the statistical procedure is called a “SAMPLE”.
Data sources:
Data can be obtained as either primary or secondary data.
Primary data :
Primary data is data that we collect ourselves by observation, experiment, or questionnaires.
Secondary data :
Secondary data is data that has been collected by someone else. We can obtain it from sources such as CAPMAS (Central Agency for Public Mobilization and Statistics), market research companies, or analysis reports.
Statistical analysis:
When we use statistics to make a decision, we carry out a statistical analysis.
Construct a frequency distribution:
- Determine the smallest and largest values in the raw data and find the range.
- Determine the suggested number of classes: find the smallest power of two, k, such that $2^k$ is equal to or greater than the number of observations, then add one to that power. For example, if the number of observations is 20, the smallest such power is 5 ($2^5 = 32$), so the number of classes will be 5 + 1 = 6. As another example, if the number of observations is 33, the smallest such power is 6 ($2^6 = 64$), so the number of classes will be 6 + 1 = 7.
- Frequency distribution
Class          Frequency
50 up to 58    3
58 up to 66    5
66 up to 74    5
74 up to 82    3
82 up to 90    1
90 up to 98    3
Total          20
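The class-count rule and the table above can be checked with a short Python sketch. It assumes the usual reading of "50 up to 58" as 50 ≤ x < 58 (so every class has width 8), and it adds class midpoints and relative frequencies, which are not part of the original table. For the 20 observations in the table, the rule gives 6 classes, which matches the six rows. The function name `suggested_classes` is illustrative.

```python
def suggested_classes(n: int) -> int:
    """Smallest power k with 2**k >= n, plus one (the rule described above)."""
    if n < 1:
        raise ValueError("need at least one observation")
    k = 0
    while 2 ** k < n:
        k += 1
    return k + 1

# Frequency distribution from the table above; "50 up to 58" is read as 50 <= x < 58
lower_limits = [50, 58, 66, 74, 82, 90]
class_width = 8
frequencies = [3, 5, 5, 3, 1, 3]

n = sum(frequencies)                                          # total number of observations
midpoints = [lower + class_width / 2 for lower in lower_limits]  # class midpoints
relative = [f / n for f in frequencies]                        # share of observations per class

print(f"{n} observations -> {suggested_classes(n)} classes")
for lower, f, mid, rel in zip(lower_limits, frequencies, midpoints, relative):
    print(f"{lower} up to {lower + class_width}: frequency {f}, midpoint {mid:g}, relative {rel:.2f}")
```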
Stem and Leaf Displays:
Stem-and-leaf display: a statistical technique for displaying a set of data. Each numerical value is divided into two parts: the leading digits become the stem and the trailing digits become the leaf.
Note: an advantage of the stem-and-leaf display over a frequency distribution is that we do not lose the identity of each observation.
A student in year three achieved the following scores on his twelve statistics quizzes this semester:
86 79 92 84 69 88
91 83 96 78 82 85
Construct a stem-and-leaf display.
Stem Leaf
6 9
7 8 9
8 2 3 4 5 6 8
9 1 2 6
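A minimal Python sketch of the same display, assuming two-digit whole-number scores so that the stem is the tens digit and the leaf is the units digit. The helper name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits), in ascending order."""
    display = defaultdict(list)
    for value in sorted(values):      # sorting first keeps every row of leaves in order
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)

scores = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]
display = stem_and_leaf(scores)

print("Stem | Leaf")
for stem in sorted(display):
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in display[stem])}")
```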
To convert the stem-and-leaf display into a frequency distribution, we count the number of leaves in each row:
Classes Frequency
60 – 70 1
70 – 80 2
80 – 90 6
90 – 100 3
Total 12
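Counting the leaves in each row can also be sketched in Python. The display is copied from the example above, and each stem s is assumed to stand for the class from s×10 up to s×10 + 10, as in the table.

```python
# Stem-and-leaf display from the example above: stem -> leaves
display = {6: [9], 7: [8, 9], 8: [2, 3, 4, 5, 6, 8], 9: [1, 2, 6]}

# Each stem s stands for the class s*10 up to s*10 + 10; its frequency is the number of leaves
frequency_table = [(stem * 10, stem * 10 + 10, len(leaves))
                   for stem, leaves in sorted(display.items())]

for lower, upper, frequency in frequency_table:
    print(f"{lower} - {upper}: {frequency}")
print("Total:", sum(frequency for _, _, frequency in frequency_table))
```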
1 – Pie Chart :
A pie chart displays the distribution of categorical data by showing the fraction of cases falling within each category. Because all of the categories together account for 100% of the cases, the wedges (pie slices) add up to 100% of the circle.
The following table shows the average weekly expenditure of a group of students. The angle of each slice is (amount / total) × 360 degrees:
Item              Ice cream   Chocolate   Chips   Soda drinks   Tea and coffee   Total
Amount            30          20          15      25            30               120
Angle (degrees)   90          60          45      75            90               360
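The slice angles can be computed with a small Python sketch. Item names and amounts are copied from the expenditure table; the formula (amount / total) × 360 is the standard way of sizing pie slices.

```python
items = ["Ice cream", "Chocolate", "Chips", "Soda drinks", "Tea and coffee"]
amounts = [30, 20, 15, 25, 30]

total = sum(amounts)
# Each slice's angle is its share of the total, scaled to the 360 degrees of a full circle
angles = [amount * 360 / total for amount in amounts]

for item, amount, angle in zip(items, amounts, angles):
    print(f"{item:<15} {amount:>4} {angle:>6.1f} degrees")
print(f"{'Total':<15} {total:>4} {sum(angles):>6.1f} degrees")
```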
Bar Chart
A bar chart can be used to show any of the levels of measurement (nominal, ordinal, interval, or ratio). A bar chart displays a batch of numbers with side-by-side bars, usually spaced evenly along the horizontal base. A bar chart can be a simple bar chart or a multiple bar chart.
(Abdel Aal, Medhat and others, 2020)
Simple bar chart:
A simple bar chart is used to compare the values of one quantity.
Multiple bar chart:
A multiple bar chart is used to compare the values of more than one quantity.
Line chart :
A line chart or line plot or line graph or curve chart is a type of chart which displays
information as a series of data points called 'markers' connected by straight line segments.
(Spear, Mary Eleanor (1952). Charting Statistics. New York: McGraw-Hill. p. 41)
It is a basic type of chart common in many fields. It is similar to a scatter plot except that the
measurement points are ordered (typically by their x-axis value) and joined with straight line
segments.
(Burton G. Andreas (1965). Experimental psychology. p.186)
A line chart is often used to visualize a trend in data over intervals of time – a time series –
thus the line is often drawn chronologically. In these cases they are known as run charts.
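For experimental data, a "best-fit" made of line segments connecting adjacent data values is simple to construct, but it is usually not an ideal representation of the trend in the underlying data, for two reasons:
1. It is highly improbable that the discontinuities in the slope of the best-fit would correspond exactly with the positions of the measurement values.
2. It is highly unlikely that the experimental error in the data is negligible, yet the curve falls exactly through each of the data points.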
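The step that distinguishes a line chart from a scatter plot (ordering the points by their x-value and joining neighbours with straight segments) can be sketched in Python. The points are hypothetical, the function names are illustrative, and the sketch assumes all x-values are distinct. The slope list also illustrates point 1 above: the slope of such a chart can change only at the data points.

```python
def line_chart_segments(points):
    """Order (x, y) points by x and join each point to the next with a straight segment."""
    ordered = sorted(points)  # tuples sort by their first element (x) first
    return list(zip(ordered, ordered[1:]))

def segment_slopes(points):
    """Slope of each segment; assumes all x-values are distinct."""
    return [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in line_chart_segments(points)]

# Hypothetical time series (period, value), deliberately listed out of order
points = [(3, 12), (1, 10), (2, 15), (4, 11)]
segments = line_chart_segments(points)
slopes = segment_slopes(points)

for (start, end), slope in zip(segments, slopes):
    print(f"{start} -> {end}: slope {slope:+.1f}")
```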
Graphic Presentation of Grouped Data:
The three commonly used graphic forms are the histogram, the frequency polygon, and the cumulative frequency distribution.
1 – Histogram :
A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.
( Pearson, K. (1895). "Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous
Material" (PDF). Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering
Sciences. 186: 343–414. )
If the bins are of equal size, a rectangle is erected over the bin with height proportional to
the frequency—the number of cases in each bin. A histogram may also be normalized to
display "relative" frequencies. It then shows the proportion of cases that fall into each of
several categories, with the sum of the heights equaling 1.
However, bins need not be of equal width; in that case, the erected rectangle is defined to have
its area proportional to the frequency of cases in the bin. The vertical axis is then not the
frequency but frequency density—the number of cases per unit of the variable on the
horizontal axis. Examples of variable bin width are displayed on Census bureau data below.
As the adjacent bins leave no gaps, the rectangles of a histogram touch each other to indicate
that the original variable is continuous.
Histograms give a rough sense of the density of the underlying distribution of the data and are often used for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the lengths of the intervals on the x-axis are all 1, then such a histogram is identical to a relative frequency plot.
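Histograms are nevertheless preferred in applications when their statistical properties need to be modeled: the correlated variation of a kernel density estimate is very difficult to describe mathematically, while it is simple for a histogram, where each bin varies independently.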
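The rules above for equal and unequal bin widths can be sketched in Python. The bin edges and counts are hypothetical, and the helper name `histogram_heights` is illustrative. The 20 – 40 bin holds as many cases as the 10 – 20 bin, but because it is twice as wide its bar is only half as tall; dividing by n × width instead of width makes the total area of the bars equal to 1.

```python
import math

def histogram_heights(edges, counts):
    """Bar heights for bins [edges[i], edges[i+1]), which may have unequal widths.

    frequency density   = count / width        -> each bar's area equals its count
    probability density = count / (n * width)  -> the bars' total area equals 1
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    n = sum(counts)
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    freq_density = [c / w for c, w in zip(counts, widths)]
    prob_density = [c / (n * w) for c, w in zip(counts, widths)]
    return widths, freq_density, prob_density

# Hypothetical bins 0 - 10, 10 - 20 and 20 - 40 (the last bin is twice as wide)
widths, freq_density, prob_density = histogram_heights([0, 10, 20, 40], [10, 20, 20])
total_area = math.fsum(h * w for h, w in zip(prob_density, widths))

print("frequency density:", freq_density)
print("probability density:", prob_density)
print("total area:", total_area)
```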
Histograms are sometimes confused with bar charts. A histogram is used for continuous data,
where the bins represent ranges of data, while a bar chart is a plot of categorical variables.
Some authors recommend that bar charts have gaps between the rectangles to clarify the
distinction
(Robbins, Naomi. "A Histogram is NOT a Bar Chart". Forbes.com. Forbes. Retrieved 31 July 2018.)
A frequency polygon consists of line segments connecting the points formed by the class midpoints and the class frequencies.
A cumulative frequency distribution is used to determine how many or what proportion of the
data values are below or above a certain value.
1 – Frequency Histograms :
To construct a histogram, the class frequencies are shown on the vertical axis (the y-axis) and the class limits (or class midpoints) on the horizontal axis (the x-axis):
Classes     20 – 40   40 – 60   60 – 80   80 – 100
Frequency   50        80        70        30
The classes are represented on the x-axis (horizontal axis) and the frequencies on the y-axis (vertical axis), as shown in the previous figure.
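Using the classes in this table, the points of a frequency polygon (class midpoint, class frequency) can be computed with a short Python sketch. Closing the polygon at zero frequency one class width beyond each end is a common convention assumed here; it is not stated in the text.

```python
# Histogram table above: classes 20 - 40, 40 - 60, 60 - 80, 80 - 100
lower_limits = [20, 40, 60, 80]
class_width = 20
frequencies = [50, 80, 70, 30]

# One polygon point per class: (class midpoint, class frequency)
midpoints = [lower + class_width / 2 for lower in lower_limits]
points = list(zip(midpoints, frequencies))

# Assumed convention: close the polygon on the horizontal axis at the midpoints of the
# empty classes just below the first class and just above the last one
points = [(midpoints[0] - class_width, 0)] + points + [(midpoints[-1] + class_width, 0)]
print(points)
```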
Cumulative frequency polygon :
- The data values (class limits) are shown on the horizontal axis, and the cumulative frequencies are shown on the vertical axis.
- The ascending cumulative frequency (ACF) of each class is plotted as a point against the class's upper limit (UL).
- The plotted points are connected by straight lines.
- The same rules can be used to plot the descending cumulative frequency (DCF), plotting each point against the class's lower limit (LL).
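Ex:
C   0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60
F   20       40        50        80        40        20

C         F     (UL)           (ACF)
0 – 10    20    Less than 10   20
10 – 20   40    Less than 20   60
20 – 30   50    Less than 30   110
30 – 40   80    Less than 40   190
40 – 50   40    Less than 50   230
50 – 60   20    Less than 60   250
Total     250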
[Figure: ACF polygon; horizontal axis: upper class limits (UL), vertical axis: ACF]
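The ascending totals for this example can be sketched in Python as a running sum of the frequencies; the class data are copied from the example above, and the last value equals the total of 250.

```python
from itertools import accumulate

# Example above: classes 0 - 10, ..., 50 - 60 and their frequencies
upper_limits = [10, 20, 30, 40, 50, 60]
frequencies = [20, 40, 50, 80, 40, 20]

# Ascending cumulative frequency: running total of the frequencies, i.e. the number of
# observations "less than" each upper class limit
acf = list(accumulate(frequencies))

for ul, cf in zip(upper_limits, acf):
    print(f"Less than {ul}: {cf}")
```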
EX:
Plot the descending cumulative frequency (DCF) polygon.
C   5 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60
F   20       40        50        80        40        20

C         F     (LL)           (DCF)
5 – 10    20    More than 5    250
10 – 20   40    More than 10   230
20 – 30   50    More than 20   190
30 – 40   80    More than 30   140
40 – 50   40    More than 40   60
50 – 60   20    More than 50   20
Total     250
[Figure: DCF polygon; horizontal axis: lower class limits (LL), vertical axis: DCF]
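The descending totals for this example can be sketched in Python by accumulating the frequencies from the last class backwards; the class data are copied from the example above, and the first value equals the total of 250.

```python
from itertools import accumulate

# Example above: classes 5 - 10, 10 - 20, ..., 50 - 60 and their frequencies
lower_limits = [5, 10, 20, 30, 40, 50]
frequencies = [20, 40, 50, 80, 40, 20]

# Descending cumulative frequency: running total taken from the last class backwards,
# i.e. the number of observations "more than" each lower class limit
dcf = list(accumulate(reversed(frequencies)))[::-1]

for ll, cf in zip(lower_limits, dcf):
    print(f"More than {ll}: {cf}")
```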
EX:
The following table shows motor repair costs in pounds for an insurance company's minor
claims department:
C F (UL) (ACF)
200 – 300 6 Less than 300 6
300 – 400 10 Less than 400 6+10=16
400 – 500 15 Less than 500 16+15=31
500 – 600 20 Less than 600 31+20=51
600 – 700 45 Less than 700 51+45=96
700 – 800 55 Less than 800 96+55=151
800 – 900 140 Less than 900 151+140=291
900 – 1000 9 Less than 1000 291+9=300
Total 300
[Figure: ACF polygon of the motor repair costs; horizontal axis: upper class limits (UL), vertical axis: ACF]
To estimate the number of cars whose repair costs fall below a certain amount, we can read the cumulative frequency polygon; for example, the number of cars that will cost less than 650 pounds can be read as shown in the following figure.
[Figure: ACF polygon of the motor repair costs read at 650 pounds; horizontal axis: upper class limits (UL), vertical axis: ACF]
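Reading the polygon at 650 pounds amounts to linear interpolation between the plotted points (600, 51) and (700, 96): 51 + (50 / 100) × 45 = 73.5, so roughly 74 cars. A minimal Python sketch, assuming the polygon starts at 0 cars at the lowest class limit of 200 pounds; the function name `cars_costing_less_than` is hypothetical.

```python
from itertools import accumulate

# Motor repair costs (pounds) from the table above
class_limits = [200, 300, 400, 500, 600, 700, 800, 900, 1000]
frequencies = [6, 10, 15, 20, 45, 55, 140, 9]

# Points of the ACF polygon: (class limit, ACF), starting from 0 cars at the lowest
# class limit (an assumption about where the polygon begins)
points = list(zip(class_limits, [0] + list(accumulate(frequencies))))

def cars_costing_less_than(cost):
    """Estimate the ACF at `cost` by linear interpolation along the polygon."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= cost <= x1:
            return y0 + (cost - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("cost is outside the range of the table")

print(cars_costing_less_than(650))  # prints 73.5
```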
2) The descending cumulative frequency:
C F (LL) (DCF)
200 – 300 6 More than 200 300
300 – 400 10 More than 300 300 – 6 = 294
400 – 500 15 More than 400 294 – 10 = 284
500 – 600 20 More than 500 284 – 15 = 269
600 – 700 45 More than 600 269 – 20 = 249
700 – 800 55 More than 700 249 – 45 = 204
800 – 900 140 More than 800 204 – 55 = 149
900 – 1000 9 More than 900 149 – 140 = 9
Total 300
The DCF polygon will look like the following figure:
[Figure: DCF polygon of the motor repair costs; horizontal axis: lower class limits (LL) of the classes 200 – 300 to 900 – 1000, vertical axis: DCF]
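The DCF column can also be obtained from the ascending totals: for each lower limit, the cars costing more than it are all the cars minus those costing less than it. A minimal Python sketch using the motor repair table:

```python
from itertools import accumulate

# Motor repair costs (pounds) from the table above
lower_limits = [200, 300, 400, 500, 600, 700, 800, 900]
frequencies = [6, 10, 15, 20, 45, 55, 140, 9]

total = sum(frequencies)
# Cars costing less than each lower limit: 0 below the first class, then the running totals
less_than = [0] + list(accumulate(frequencies))[:-1]
# Cars costing more than a lower limit = all cars minus those costing less than it
dcf = [total - lt for lt in less_than]

for ll, cf in zip(lower_limits, dcf):
    print(f"More than {ll}: {cf}")
```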
References :
1. Dr. Medhat Abdel Aal, Introduction Mathematics of Finance, Ain Shams University, 2019.
2. Dr. Mamdouh Abdel Alim, Introduction Mathematics of Finance, Ain Shams University, 2020.
3. Helen Mary Walker (1975). Studies in the history of statistical method. Arno Press. ISBN 9780405066283.