INTRODUCTION
1.1. History, definition and classification of statistics
The word statistics comes from the Latin word ‘status’ or the Italian ‘statista’, meaning political state.
Professor Gottfried Achenwall used it for the first time in the middle of the 18th century. In the early
period, these words referred to the political state of a region. The word ‘statista’ was used for
keeping records of censuses or data related to the wealth of a state. Gradually, its meaning and usage
broadened, and its nature changed as well.
Since different people have different understanding of statistics, we can say there are as many
definitions as the number of people who have tried to define the term statistics. Some of these
definitions are given below.
Like almost all other fields of study, statistics has two aspects.
In general, its meaning can be categorized into two entirely different senses: the plural
sense and the singular sense.
Plural sense (statistical data): statistics is defined as aggregates of numerically expressed facts
or figures collected in a systematic manner for a pre-determined purpose.
Singular sense (statistical methods): statistics is defined as the science of collecting, organizing,
presenting, analyzing and interpreting numerical data to make good decisions on the basis of such
analysis.
Depending on how data are used, statistics has two main areas.
Example: Expenditures for the cable industry were $5.66 billion in 1996.
Data Collection: This is a stage where we gather information for our purpose.
Data may be collected by the investigator directly using methods like interview,
questionnaire, and observation or may be available from published or unpublished
sources.
Data gathering is the basis (foundation) of any statistical work.
Data Presentation: At this stage, large data will be presented in tables and diagrams in a very
summarized and condensed manner to facilitate statistical analysis.
Data Analysis: This is the stage where we critically study the data to draw conclusions about the
population parameter. It is mainly used to dig out information useful for decision making.
Data Interpretation: This is the stage where we draw valid conclusions from the results obtained
through data analysis. It requires great care since it is the basis for decision making.
Parameter: It is a descriptive measure (value) computed from the population. It is the population
measurement used to describe the population. Example: population mean, population standard
deviation, etc.
Statistic: It is a measure (value) computed from the sample and used to describe the sample.
Example: sample average, sample standard deviation, etc.
Variable: is a characteristic under study that assumes different values for different elements.
Data: are the result of taking measurements or making observations on variables.
Discrete Variable: If the possible values of numerical data are isolated points, i.e.,
there are gaps between the possible values, the data are discrete (example: counts; rating on a
scale of 1 to 10).
Continuous Variable: If the possible values of numerical data consist of all numbers
within an interval, i.e., there are no gaps between the possible values, the data are continuous
(example: diameter of a pipe, temperature).
III. Interval Scale: ranks data, and precise differences between units of measure do exist;
however, there is no true zero point (meaningful zero).
Interval data can be added or subtracted, but they cannot be meaningfully multiplied or divided.
Example: Temperature, IQ
IV. Ratio Scale: possesses all the characteristics of interval measurement, and there exists a
true zero. In addition, true ratios exist when the same variable is measured on two
different members of the population.
Example: Time, Height, Salary…
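To see why ratios are meaningful on a ratio scale but not on an interval scale, the following sketch compares the same pair of measurements in two different units. The temperatures and heights are illustrative assumptions, not data from these notes.

```python
# Illustrative values only (assumed for this sketch, not taken from the notes).

def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def cm_to_inches(cm):
    """Convert a length from centimetres to inches."""
    return cm / 2.54

# Interval scale: temperature in Celsius or Fahrenheit has no true zero,
# so the ratio of two readings changes when the unit changes.
t1_c, t2_c = 10.0, 20.0
ratio_celsius = t2_c / t1_c
ratio_fahrenheit = celsius_to_fahrenheit(t2_c) / celsius_to_fahrenheit(t1_c)

# Ratio scale: height has a true zero, so the ratio is the same in any unit.
h1_cm, h2_cm = 90.0, 180.0
ratio_cm = h2_cm / h1_cm
ratio_inches = cm_to_inches(h2_cm) / cm_to_inches(h1_cm)

print("Temperature ratio in Celsius:   ", ratio_celsius)
print("Temperature ratio in Fahrenheit:", round(ratio_fahrenheit, 2))
print("Height ratio in centimetres:    ", ratio_cm)
print("Height ratio in inches:         ", round(ratio_inches, 2))
```

The temperature ratio depends on the unit chosen, so a statement such as "twice as hot" has no fixed meaning, while "twice as tall" holds in every unit.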
CHAPTER TWO
Methods of Data Collection and Presentation
2.1. Methods of Data Collection
The method of data collection depends on the source of the data. According to their source,
data are classified as primary or secondary.
i. Primary Data
Data measured or collected by the investigator or the user directly from the source.
They are collected for the first time through a census or sample survey, and it may be
necessary to conduct a first-hand investigation.
Frequency distribution: is the organization of raw data in table form using classes and
frequencies.
Example: a social worker collected the following data on socio-economic status for 16 persons.
(H=high, M=medium, L=low)
H L L H
L M L M
H M M L
L H L M
Since the data are categorical, discrete classes can be used. To construct a frequency
distribution for the given data, we should follow the next steps.
Step 1: Make a table as shown.
Class (A) Tally (B) Frequency(C) Percent (D)
Percentages are not normally part of a frequency distribution, but they can be added since they
are used in certain types of graphs such as pie graphs. Also, the decimal equivalent of a percent
is called a relative frequency.
Step 5: Find the totals for columns C (frequency) and D (percent).
Now we can construct a categorical frequency distribution by considering all the steps.
Class (A) Tally (B) Frequency(C) Percent (D)
L //// // 7 43.75
M //// 5 31.25
H //// 4 25
Total 16 100
For the sample, more people have low socio-economic status than any other status.
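The tallying above can also be sketched in a few lines of Python. This is a minimal illustration using the 16 observations from the example; the variable names are my own.

```python
from collections import Counter

# Socio-economic status of the 16 persons in the example
# (H = high, M = medium, L = low), read row by row.
data = "H L L H L M L M H M M L L H L M".split()

counts = Counter(data)   # frequency of each class
n = len(data)            # total number of observations

# One row per class: (class, frequency, percent), where percent = f / n * 100.
rows = [(cls, counts[cls], counts[cls] / n * 100) for cls in ("L", "M", "H")]

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>10}")
for cls, f, pct in rows:
    print(f"{cls:<6}{f:>10}{pct:>10.2f}")
print(f"{'Total':<6}{n:>10}{sum(p for _, _, p in rows):>10.2f}")
```

Dividing each percent by 100 gives the relative frequency mentioned above.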
ii. Grouped Frequency Distributions
When the range of the data is large, the data must be grouped into classes that are more
than one unit in width, in what is called a grouped frequency distribution.
Definitions:
Class limits: separate one class in a grouped frequency distribution from another. The
limits can actually appear in the data, and there is a gap between the upper limit of one
class and the lower limit of the next.
Units of measurement (U): the distance between two possible consecutive measures. It is
usually taken as 1, 0.1, 0.01, 0.001,…
Class boundaries: separate one class in a grouped frequency distribution from another.
The boundaries have one more decimal place than the raw data and therefore do not
appear in the data. There is no gap between the upper boundary of one class and the lower
boundary of the next class. The lower class boundary is found by subtracting U/2 from the
corresponding lower class limit and the upper class boundary is found by adding U/2 to
the corresponding upper class limit.
Class width: the difference between the upper and lower class boundaries of any class. It
is also the difference between the lower limits of any two consecutive classes.
Class mark (Mid points): it is the average of the lower and upper class limits or the average
of upper and lower class boundary.
Cumulative frequency: is the number of observations less than/more than or equal to a
specific value.
More than cumulative frequency: it is the total frequency of all values greater than or
equal to the lower class boundary of a given class.
Less than cumulative frequency: it is the total frequency of all values less than or equal to
the upper class boundary of a given class.
Cumulative Frequency Distribution (CFD): it is the tabular arrangement of class interval
together with their corresponding cumulative frequencies. It can be more than or less than
type, depending on the type of cumulative frequency used.
Relative frequency (rf): it is the frequency divided by the total frequency.
Relative cumulative frequency (rcf): it is the cumulative frequency divided by the total
frequency.
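The definitions above can be made concrete with a short sketch. The class limits and frequencies below are hypothetical values chosen only for illustration, and U = 1 is assumed (data recorded as whole numbers).

```python
from itertools import accumulate

U = 1  # unit of measurement (assumed: whole-number data)

# Hypothetical class limits and frequencies, for illustration only.
limits = [(10, 19), (20, 29), (30, 39)]
freqs = [4, 6, 10]
n = sum(freqs)  # total frequency

# Class boundaries: lower limit - U/2 and upper limit + U/2.
boundaries = [(lo - U / 2, hi + U / 2) for lo, hi in limits]

# Class width: upper boundary minus lower boundary.
widths = [ub - lb for lb, ub in boundaries]

# Class mark: average of the lower and upper class limits.
marks = [(lo + hi) / 2 for lo, hi in limits]

# Less-than cumulative frequency: running total from the first class.
less_than_cf = list(accumulate(freqs))

# More-than cumulative frequency: running total from the last class.
more_than_cf = list(accumulate(freqs[::-1]))[::-1]

# Relative frequency and relative (less-than) cumulative frequency.
rel_freq = [f / n for f in freqs]
rel_cum_freq = [cf / n for cf in less_than_cf]

for i, (lo, hi) in enumerate(limits):
    print(f"{lo}-{hi}: boundaries={boundaries[i]}, width={widths[i]}, "
          f"mark={marks[i]}, f={freqs[i]}, <cf={less_than_cf[i]}, "
          f">cf={more_than_cf[i]}, rf={rel_freq[i]:.2f}, rcf={rel_cum_freq[i]:.2f}")
```

Note that the class width (10) also equals the difference between consecutive lower limits (20 - 10), as the definition states.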
Basic rules to construct a frequency distribution
PREPARED BY: ABDULMENAN M. (MSc) 6
BASIC STATISTICS LECTURE NOTE 2021
Step 2: Select the number of classes desired, usually between 5 and 20, or use Sturges' rule
$k = 1 + 3.322 \log_{10} n$, where k is the number of classes desired and n is the total number of observations.
Step 3: Find the class width (W) by dividing the range (R) by the number of classes (k) and rounding up: $W = R/k$.
Step 4: Select a starting point (usually the lowest value or any convenient number less than the
lowest value); the starting point is called the lower limit of the first class. Continue to add the
class width to this lower limit to get the rest of the lower limits.
Step 5: To find the upper limit of the first class, subtract U from the lower limit of the second
class. Then continue to add the class width to this upper limit to find the rest of the upper limits.
Step 6: Find the class boundaries, frequencies and the cumulative frequencies.
Example*: The following data are the ages of 20 women who attended health education in a
certain hospital. Construct a frequency distribution using Sturges' rule.
30, 25, 23, 41, 39, 27, 41, 24, 32, 29, 35, 31, 36, 33, 36, 42, 35, 37, 41, and 29.
Solution
1) R = highest value - lowest value = 42 - 23 = 19.
2) Given the number of observations n = 20, the number of classes is $k = 1 + 3.322 \log_{10} 20 \approx 5$.
3) Class width: $W = R/k = 19/5 = 3.8 \approx 4$ (rounding up).
4) Let the starting point be the minimum observation, 23. Then 23, 27, 31, 35, 39 are the lower class limits.
5) The first upper class limit = 27 - U = 27 - 1 = 26, so 26, 30, 34, 38 and 42 are the upper class limits.
6) For class 1: lower class boundary = 23 - U/2 = 22.5, upper class boundary = 26 + U/2 = 26.5, …
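The solution steps can be reproduced with the sketch below, which uses the 20 ages from the example. Rounding k to the nearest whole number is an assumption made to match the worked value of 5, and the frequencies are tallied by the code rather than quoted from the notes.

```python
import math

# Ages of the 20 women in the example.
ages = [30, 25, 23, 41, 39, 27, 41, 24, 32, 29,
        35, 31, 36, 33, 36, 42, 35, 37, 41, 29]
U = 1  # unit of measurement: ages are whole years

# 1) Range.
data_range = max(ages) - min(ages)

# 2) Number of classes by Sturges' rule (rounded to the nearest integer,
#    an assumption that matches the worked solution).
k = round(1 + 3.322 * math.log10(len(ages)))

# 3) Class width, rounded up.
width = math.ceil(data_range / k)

# 4) Lower class limits, starting at the minimum observation.
lower_limits = [min(ages) + i * width for i in range(k)]

# 5) Upper class limits: next lower limit minus U.
upper_limits = [lo + width - U for lo in lower_limits]

# 6) Class boundaries and frequencies.
boundaries = [(lo - U / 2, hi + U / 2)
              for lo, hi in zip(lower_limits, upper_limits)]
frequencies = [sum(lo <= a <= hi for a in ages)
               for lo, hi in zip(lower_limits, upper_limits)]

print(f"R = {data_range}, k = {k}, W = {width}")
for (lo, hi), (lb, ub), f in zip(zip(lower_limits, upper_limits),
                                 boundaries, frequencies):
    print(f"{lo}-{hi}  boundaries {lb}-{ub}  frequency {f}")
```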
Importance:
They have greater attraction.
They facilitate comparison.
They are easily understandable.
The three most commonly used diagrammatic presentations for discrete as well as qualitative
data are:
1. Bar Charts
A set of bars (thick lines or narrow rectangles) representing some magnitude over time
or space.
They are useful for comparing aggregates over time or space.
Bars can be drawn either vertically or horizontally.
There are different types of bar charts. The most common are:
i. Simple bar chart
It is used to represent only one variable
They are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height /length of the bar.
Example: The table shows the average money spent by first-year college students. Draw a
horizontal and vertical bar graph for the data.
Item Cost($)
Electronics 728
Dorm decor 344
Clothing 141
Shoes 72
[Figure: bar chart of Cost($) by item (Electronics, Dorm decor, Clothing, Shoes)]
Crop      Year
          2000   2001   2002
barley    28     30     34
wheat     18     19     15
maize     20     22     25
total     66     71     74
[Figures: two bar charts with horizontal axis "year of production" (2000, 2001, 2002)]
2. Pie chart
A pie chart is a circle that is divided into sections or wedges according to the percentage
of frequencies in each category of the distribution. The angle of each sector is obtained
using:
$\text{Degrees} = \frac{f}{n} \times 360^\circ$
where f is the frequency of the category and n is the total frequency.
Solutions:
Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, draw each section and label it with its name and
corresponding percentage.
[Figure: pie chart "Age Distribution" with sectors old, children, adult and youth; percentage labels 15%, 25%, 40% and 20%]
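Steps 1 and 2 of the pie chart procedure are simple arithmetic. As a hedged illustration, the sketch below applies them to the socio-economic status frequencies from the earlier categorical example (L = 7, M = 5, H = 4); this data set is reused here only for convenience and is not the one shown in the figure.

```python
# Frequencies from the socio-economic status example earlier in these notes.
freqs = {"L": 7, "M": 5, "H": 4}
n = sum(freqs.values())  # total frequency

# Step 1: percentage of each class = f / n * 100.
percents = {cls: f / n * 100 for cls, f in freqs.items()}

# Step 2: sector angle of each class = f / n * 360 degrees.
degrees = {cls: f / n * 360 for cls, f in freqs.items()}

for cls in freqs:
    print(f"{cls}: {percents[cls]:.2f}%  ->  {degrees[cls]:.1f} degrees")

# The sector angles of a complete pie chart add up to 360 degrees.
print("Total degrees:", sum(degrees.values()))
```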
The histogram is a graph that displays the data by using contiguous vertical bars (unless the
frequency of a class is 0) of various heights to represent the frequencies of the classes. Class
boundaries are placed along the horizontal axis.
Example 1: Construct a histogram that represents the record high temperatures in degrees
Fahrenheit (°F) for each of the 50 states.
Frequency polygon
The frequency polygon is a graph that displays the data by using lines that connect points plotted
for the frequencies at the midpoints of the classes. The frequencies are represented by the
heights of the points.
Example 2: Construct a frequency polygon for the frequency distribution described in Example 1.
The ogive is a graph that represents the cumulative frequencies for the classes in a
frequency distribution.
It is a graph showing the cumulative frequency (less than or more than type) plotted against
the upper or lower class boundaries, respectively.
The class boundaries are plotted along the horizontal axis and the corresponding
cumulative frequencies are plotted along the vertical axis.
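The points needed to draw an ogive can be computed before plotting. The sketch below does this for the ages example, using the class limits found in its solution and tallying the frequencies in code. Adding a zero-frequency point at the first lower boundary (less than type) and at the last upper boundary (more than type) is a common convention assumed here, not something stated in these notes.

```python
from itertools import accumulate

# Ages and class limits from the grouped frequency distribution example.
ages = [30, 25, 23, 41, 39, 27, 41, 24, 32, 29,
        35, 31, 36, 33, 36, 42, 35, 37, 41, 29]
limits = [(23, 26), (27, 30), (31, 34), (35, 38), (39, 42)]
U = 1  # unit of measurement

# Frequencies and class boundaries for each class.
freqs = [sum(lo <= a <= hi for a in ages) for lo, hi in limits]
lower_bounds = [lo - U / 2 for lo, _ in limits]
upper_bounds = [hi + U / 2 for _, hi in limits]

# Cumulative frequencies of both types.
less_than_cf = list(accumulate(freqs))
more_than_cf = list(accumulate(freqs[::-1]))[::-1]

# Less-than ogive: cumulative frequency against upper class boundaries
# (with an assumed starting point of 0 at the first lower boundary).
less_than_points = [(lower_bounds[0], 0)] + list(zip(upper_bounds, less_than_cf))

# More-than ogive: cumulative frequency against lower class boundaries
# (with an assumed end point of 0 at the last upper boundary).
more_than_points = list(zip(lower_bounds, more_than_cf)) + [(upper_bounds[-1], 0)]

print("Less-than ogive points:", less_than_points)
print("More-than ogive points:", more_than_points)
```

Each pair is (class boundary, cumulative frequency), that is, a horizontal and a vertical coordinate in the order described above.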