
CHAPTER ONE

INTRODUCTION
1.1. History, definition and classification of statistics
The word statistics comes from the Latin word ‘status’ or the Italian ‘statista’, meaning political state.
Professor Gottfried Achenwall used it for the first time in the middle of the 18th century. In this early
period, these words referred to the political state of a region, and ‘statista’ was used for
keeping census records or data related to the wealth of a state. Gradually, the meaning and usage
of the word broadened, and its nature changed accordingly.

Since different people understand statistics differently, there are almost as many definitions
as there are people who have tried to define the term. Some of these definitions are given
below.

 Statistics is a branch of mathematics that consists of a set of analytical techniques that
can be applied to data to help in making judgments and decisions in problems involving
uncertainty.
 Statistics is the art of learning from data.
 Statistics is a scientific discipline that consists of procedures for collecting, describing,
analyzing and interpreting numerical data.

Like almost all other fields of study, statistics has two aspects.

 Theoretical or mathematical statistics, which deals with the development of statistical
formulas, rules and laws.
 Applied statistics (descriptive and inferential), which involves the application of those
formulas, rules and laws in solving real-world problems.

In general, its meaning can be grouped into two entirely different categories: the plural
sense and the singular sense.

Plural sense (statistical data): statistics is defined as aggregates of numerically expressed facts
or figures collected in a systematic manner for a pre-determined purpose.

Singular sense (statistical methods): statistics is defined as the science of collecting, organizing,
presenting, analyzing and interpreting numerical data to make good decisions on the basis of such
analysis.

Depending on how data are used, statistics has two main areas.

Descriptive statistics: consists of the collection, organization, summarization, and presentation
of data without making generalizations beyond that data.
BASIC STATISTICS LECTURE NOTE 2021

Example: Expenditures for the cable industry were $5.66 billion in 1996.

Inferential statistics: consists of generalizing from samples to populations, performing
estimations and hypothesis tests, determining relationships among variables, and making
predictions. It is mainly used to find out something about the population based on a sample taken
from the population.

Example: Drinking decaffeinated coffee can raise cholesterol levels by 7%.

1.2. Stages in Statistical Investigation


In statistics (singular sense) we have the following stages of statistical investigation:

Data Collection: This is a stage where we gather information for our purpose.

 Data may be collected by the investigator directly using methods like interview,
questionnaire, and observation or may be available from published or unpublished
sources.
 Data gathering is the basis (foundation) of any statistical work.

Data Organization: It is a stage where we edit our data.

 After editing, we may classify (arrange) the data according to their common characteristics to
make the information easier to present.

Data Presentation: At this stage, large data will be presented in tables and diagrams in a very
summarized and condensed manner to facilitate statistical analysis.

Data Analysis: This is the stage where we critically study the data to draw conclusions about the
population parameter. It is mainly used to dig out information useful for decision making.

Data Interpretation: This is the stage where we draw valid conclusions from the results obtained
through data analysis. It requires great care since it is the basis for decision making.

1.3. Definition of some Basic terms


Population: it is the totality of people, things or objects about which information is being
collected.
Sample: it is a limited number of items that represent the characteristics of a larger number of
items called the population.
Sampling: is the selection of a small number of elements from a large, defined target group of
elements.
Sampling frame: A list of people, items or units from which the sample is taken.
Census survey: It is the process of examining the entire population.

PREPARED BY: ABDULMENAN M. (MSc) 1



Parameter: It is a descriptive measure (value) computed from the population. It is the population
measurement used to describe the population. Example: population mean, population standard
deviation, etc.

Statistic: It is a measure (value) computed from the sample and used to describe the sample.
Example: sample average, sample standard deviation, etc.
Variable: is a characteristic under study that assumes different values for different elements.
Data: are the result of taking measurements or making observations on variables.
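
The difference between a parameter and a statistic can be illustrated with a small sketch. The population below is hypothetical (made-up ages, not data from these notes), and the sample size of 4 and the random seed are arbitrary choices for illustration.

```python
import random
import statistics

# Hypothetical population: ages of all 10 members of a small club (made-up values).
population = [23, 25, 27, 29, 31, 33, 35, 37, 39, 41]

# Parameters: computed from every element of the population.
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)

# Sample: a subset of the population, drawn at random without replacement.
rng = random.Random(0)              # fixed seed so the sketch is repeatable
sample = rng.sample(population, 4)

# Statistics: computed from the sample only.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)    # Python's stdev divides by n - 1

print("parameter (population mean):", population_mean)
print("statistic (sample mean):    ", sample_mean)
```

The population mean is fixed, while the sample mean changes whenever a different sample is drawn.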

1.4. Application, uses and limitations of statistics


Application of statistics
 Research work.
 Daily life: almost everyone regularly obtains and uses numerical facts,
for example about prices.
 Certain processes, e.g. the development of drugs.
 Industry, especially in the area of quality control.
 In general, it applies to the life sciences, engineering, economics, the social sciences, etc.
Uses of statistics
 It condenses and summarizes complex data.
 It facilitates comparison of data.
 It helps in predicting future trends.
 It helps in making policies and plans to meet national needs and aspirations.
 Statistical methods are very helpful in formulating and testing hypotheses and in
developing new theories.
Limitations of statistics
 Statistics doesn’t deal with single (individual) values.
 It doesn’t deal with qualitative characteristics directly.
 Statistical conclusions are true only under certain conditions or only on average.
 Statistics is prone to misuse, so statistical interpretation requires a high degree of
skill and understanding of the subject.
Types of Variables or Data
 Qualitative Variable (Categorical data): are nonnumeric variables and can't be
measured or quantified. Example: gender, religious affiliation…
 Quantitative Variables: are numerical variables and can be measured.
Example: balance in checking account, number of children in family.
 Note that quantitative variables are either discrete or continuous.


Discrete Variable: If the possible values of numerical data are isolated points, i.e.,
there are gaps between the possible values, the data are discrete (example: counts; rating on a
scale of 1 to 10).

Continuous Variable: If the possible values of numerical data consist of all numbers
within an interval, i.e., there are no gaps between the possible values, the data are continuous
(example: diameter of a pipe, temperature).

1.5. Scales of Measurement


I. Nominal Scale: Consists of ‘naming’ observations or classifying them into various
mutually exclusive categories in which no order or ranking can be imposed on the data.
 One is different from the other.
Example: Sex: Male, Female Blood type: A, B, AB and O
II. Ordinal Scale: classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist. The variables deal with their relative
difference rather than with quantitative differences.
 One is different from and greater/better/softer/weaker than the other.
Example: Patients may be characterized as unimproved, improved and much improved.
Letter grading system (A, B, C, D, F)
Socio-economic status (low, medium, high)

III. Interval Scale: ranks data, and precise differences between units of measure do exist;
however, there is no true zero point (meaningful zero).
 It is possible to add or subtract interval data, but they may not be multiplied or divided.
Example: Temperature, IQ
IV. Ratio Scale: possesses all the characteristics of interval measurement, and there exists a
true zero. In addition, true ratios exist when the same variable is measured on two
different members of the population.
Example: Time, Height, Salary…
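
A short numeric sketch may help separate the interval and ratio scales. The two temperatures are made-up values, and the conversion to kelvin (a temperature scale with a true zero) is brought in here only for illustration; it is not part of the notes.

```python
# Interval scale: Celsius temperatures (the zero point is arbitrary).
c1, c2 = 10.0, 20.0
difference_c = c2 - c1     # differences are meaningful: 10 degrees
naive_ratio = c2 / c1      # 2.0, but "twice as hot" is not a valid reading

# The same two temperatures in kelvin, a scale with a true zero.
k1, k2 = c1 + 273.15, c2 + 273.15
difference_k = k2 - k1     # still 10 (up to rounding): differences are preserved
true_ratio = k2 / k1       # about 1.035, not 2

print(difference_c, round(difference_k, 6), naive_ratio, round(true_ratio, 3))
```

The difference is the same on both scales, but the ratio is only meaningful on the scale that has a true zero.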

CHAPTER TWO
Methods of Data Collection and Presentation
2.1. Methods of Data Collection

The method of data collection depends on the source of the data. According to source,
data are classified as primary or secondary.

i. Primary Data
 Data measured or collected by the investigator or the user directly from the source.
 They are collected for the first time through a census or sample survey, and it may be
necessary to conduct a first-hand investigation.


 Following are the methods for collecting primary data.


 Direct personal interview or observation.
 Forms which are completed through an interview with the respondent.
 It has the advantage of obtaining in-depth responses to questions from the person
being interviewed.
 One disadvantage is that interviewers must be trained in asking questions and
recording responses, which makes the personal interview survey more costly.
 Selection bias is another disadvantage of this method.
 Indirect personal interview or observation
 When the information cannot be obtained directly from the informants, an indirect
personal inquiry is conducted to get the desired information.
 Mailed Questionnaires.
 A questionnaire is sent by mail to the informants.
 It is more applicable when the area under investigation is wide and the informants
are educated.
 Disadvantages of this method include a low number of responses and inappropriate
answers to questions.
ii. Secondary Data
 When an investigator uses the data, which has already been collected by others, such
data are called secondary data. Secondary data can be obtained from journals, reports,
government publications, publications of research organizations, unpublished sources
etc.
 Before using secondary data, the investigator should examine whether the data are suitable,
adequate and reliable for the purpose of investigation.

2.2. Methods of data presentation

The presentation of data is broadly classified into the following two categories:


1. Frequency distribution (tabular presentation)
2. Diagrammatic and Graphic presentation.
The process of arranging data into classes or categories according to their similarities is
technically called classification.
Definitions:
Raw data: is recorded information in its original form.
Frequency: is the number of occurrences of a repeating value per unit time/area/class.

Frequency distribution: is the organization of raw data in table form using classes and
frequencies.


 The main objective in developing a frequency distribution is to provide insights about
the data that cannot be quickly obtained if we look only at the original data.

 There are three basic types of frequency distributions.

i. Categorical frequency distribution


 It is used for data that can be placed in specific categories, such as nominal or ordinal data.

Example: a social worker collected the following data on socio-economic status for 16 persons.
(H=high, M=medium, L=low)
H L L H
L M L M
H M M L
L H L M
 Since the data are categorical, discrete classes can be used. To construct a frequency
distribution for the given data, we should follow the next steps.
Step 1: Make a table as shown.
Class (A) Tally (B) Frequency(C) Percent (D)

Step 2: Tally the data and place the results in column B.


Step 3: Count the tallies and place the results in column C.
Step 4: Find the percentage of values in each class by using the formula $\% = \frac{f}{n} \times 100$,
where f is the frequency of the class and n is the total number of values.

Percentages are not normally part of a frequency distribution, but they can be added since they
are used in certain types of graphs such as pie graphs. Also, the decimal equivalent of a percent
is called a relative frequency.
Step 5: Find the totals for columns C (frequency) and D (percent).
 Now we can construct a categorical frequency distribution by considering all the steps.
Class (A)   Tally (B)   Frequency (C)   Percent (D)
L           ///// //    7               43.75
M           /////       5               31.25
H           ////        4               25
Total                   16              100

 For the sample, more people have low socio-economic status than any other status.
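
The five steps above can be sketched in a few lines of Python. This is only an illustration of the tally-and-percent procedure on the 16 observations of the example; the variable names are arbitrary.

```python
from collections import Counter

# Socio-economic status of the 16 persons in the example
# (H = high, M = medium, L = low), read row by row.
data = "H L L H L M L M H M M L L H L M".split()

n = len(data)
frequency = Counter(data)            # Steps 2 and 3: tally and count

# Step 4: percent = f / n * 100 for each class.
table = {cls: (f, f / n * 100) for cls, f in frequency.items()}

# Step 5: totals of the frequency and percent columns.
total_f = sum(f for f, _ in table.values())
total_pct = sum(pct for _, pct in table.values())

for cls in ("L", "M", "H"):
    f, pct = table[cls]
    print(f"{cls:<6}{f:>3}{pct:>8.2f}")
print(f"{'Total':<6}{total_f:>3}{total_pct:>8.2f}")
```

Dividing each percent by 100 gives the relative frequency of the class.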
ii. Grouped Frequency Distributions
 When the range of the data is large, the data must be grouped into classes that are more
than one unit in width, in what is called a grouped frequency distribution.

Definitions:
Class limits: Separate one class in a grouped frequency distribution from another. The
limits can actually appear in the data, and there is a gap between the upper limit of one
class and the lower limit of the next.
Units of measurement (U): the distance between two possible consecutive measures. It is
usually taken as 1, 0.1, 0.01, 0.001,…
Class boundaries: Separate one class in a grouped frequency distribution from another.
The boundaries have one more decimal place than the raw data and therefore do not
appear in the data. There is no gap between the upper boundary of one class and the lower
boundary of the next class. The lower class boundary is found by subtracting U/2 from the
corresponding lower class limit, and the upper class boundary is found by adding U/2 to
the corresponding upper class limit.
Class width: the difference between the upper and lower class boundaries of any class. It
is also the difference between the lower limits of any two consecutive classes.
Class mark (Mid points): it is the average of the lower and upper class limits or the average
of upper and lower class boundary.
Cumulative frequency: is the number of observations less than/more than or equal to a
specific value.
More than cumulative frequency: it is the total frequency of all values greater than or
equal to the lower class boundary of a given class.
Less than cumulative frequency: it is the total frequency of all values less than or equal to
the upper class boundary of a given class.
Cumulative Frequency Distribution (CFD): it is the tabular arrangement of class interval
together with their corresponding cumulative frequencies. It can be more than or less than
type, depending on the type of cumulative frequency used.
Relative frequency (rf): it is the frequency divided by the total frequency.
Relative cumulative frequency (rcf): it is the cumulative frequency divided by the total
frequency.
Basic rules to construct a frequency distribution

 There should be between 5 and 20 classes.


 The classes must be mutually exclusive. This means one data value cannot be placed into
two different classes.
 The classes must be continuous. Even if there are no values in a class, the class must be
included in the frequency distribution. There should be no gaps in a frequency
distribution.
 The classes must be exhaustive. There should be enough classes to accommodate all the
data.
 To avoid a distorted view of the data, the classes must be equal in width. One exception
occurs when a distribution has a class that is open-ended. That is, the class has no specific
beginning value or no specific ending value.
Steps for constructing Grouped frequency Distribution
Step 1: Find the range (highest value - lowest value).

Step 2: Select the number of classes desired, usually between 5 and 20, or use Sturges' rule,
$k = 1 + 3.322\log_{10} n$, where k is the number of classes desired and n is the total number of
observations.
Step 3: Find the class width (W) by dividing the range (R) by the number of classes (k) and
rounding up: $W = R/k$.

Step 4: Select a starting point (usually the lowest value or any convenient number less than the
lowest value); the starting point is called the lower limit of the first class. Continue to add the
class width to this lower limit to get the rest of the lower limits.
Step 5: To find the upper limit of the first class, subtract U from the lower limit of the second
class. Then continue to add the class width to this upper limit to find the rest of the upper limits.
Step 6: Find the class boundaries, frequencies and the cumulative frequencies.
Example*: The following data are the ages of 20 women who attended health education in a
certain hospital. Construct a frequency distribution using Sturges' rule.
30, 25, 23, 41, 39, 27, 41, 24, 32, 29, 35, 31, 36, 33, 36, 42, 35, 37, 41, and 29.
Solution
1) R = highest value - lowest value = 42 - 23 = 19.
2) The number of observations is n = 20, so the number of classes is
$k = 1 + 3.322\log_{10} 20 \approx 5.32 \approx 5$.
3) Class width: $W = R/k = 19/5 = 3.8 \approx 4$ (rounding up).
4) Let the starting point be the minimum observation, 23. The lower class limits are 23, 27, 31, 35 and 39.
5) The first upper class limit is 27 - U = 27 - 1 = 26. The upper class limits are 26, 30, 34, 38 and 42.
6) For class 1, the lower class boundary is 23 - U/2 = 22.5 and the upper class boundary is
26 + U/2 = 26.5, and so on for the other classes.

Class limit   Class boundary   Class mark   Frequency   LCF (less than type)   MCF (more than type)   Relative frequency (RF)
23-26         22.5-26.5        24.5         3           3                      20                     0.15
27-30         26.5-30.5        28.5         4           7                      17                     0.2
31-34         30.5-34.5        32.5         3           10                     13                     0.15
35-38         34.5-38.5        36.5         5           15                     10                     0.25
39-42         38.5-42.5        40.5         5           20                     5                      0.25
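
The six construction steps can be checked with a short script applied to the same 20 ages. This is a sketch under two assumptions that the notes do not state explicitly: the Sturges value is rounded to the nearest whole number with `round`, and the dictionary keys are illustrative names only.

```python
import math

ages = [30, 25, 23, 41, 39, 27, 41, 24, 32, 29,
        35, 31, 36, 33, 36, 42, 35, 37, 41, 29]
U = 1                                     # unit of measurement (whole years)

n = len(ages)
R = max(ages) - min(ages)                 # Step 1: range
k = round(1 + 3.322 * math.log10(n))      # Step 2: Sturges' rule (assumed rounding)
w = math.ceil(R / k)                      # Step 3: class width, rounded up

rows = []
less_than = 0
for i in range(k):
    lower = min(ages) + i * w             # Step 4: lower class limit
    upper = lower + w - U                 # Step 5: upper class limit
    f = sum(lower <= x <= upper for x in ages)
    less_than += f
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - U / 2, upper + U / 2),   # Step 6
        "mark": (lower + upper) / 2,
        "f": f,
        "lcf": less_than,                 # less than cumulative frequency
        "mcf": n - (less_than - f),       # more than cumulative frequency
        "rf": f / n,                      # relative frequency
    })

for r in rows:
    print(r["limits"], r["boundaries"], r["mark"], r["f"], r["lcf"], r["mcf"], r["rf"])
```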

iii. Ungrouped frequency Distribution


When the range of the data values is relatively small, or the data are on a discrete variable, a frequency
distribution can be constructed using single data values for each class. This type of distribution
is called an ungrouped frequency distribution.
Example: The data shown here represent the number of miles per gallon (mpg) that 30 selected
four-wheel-drive sports utility vehicles obtained in city driving. Construct a frequency
distribution.
12 17 12 14 16 18 16 18 12 16 17 15 15 16 12
15 16 16 12 14 15 12 15 15 19 13 16 18 16 14
Solution
Step 1) Determine the classes. Since the range of the data set is small (19 - 12 =7), classes
consisting of a single data value can be used. They are 12, 13, 14, 15, 16, 17, 18, and 19.
Note: If the data are continuous, class boundaries can be used. Subtract 0.5 from each class value
to get the lower class boundary, and add 0.5 to each class value to get the upper class boundary.
Step 2) Find the numerical frequencies, and find the cumulative frequencies.

Class limits   Class boundaries   Frequency   Cumulative frequency
12             11.5–12.5          6           6
13             12.5–13.5          1           7
14             13.5–14.5          3           10
15             14.5–15.5          6           16
16             15.5–16.5          8           24
17             16.5–17.5          2           26
18             17.5–18.5          3           29
19             18.5–19.5          1           30
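
The same table can be reproduced programmatically. The sketch below uses one class per data value, as in the solution; the variable names are arbitrary.

```python
from collections import Counter
from itertools import accumulate

mpg = [12, 17, 12, 14, 16, 18, 16, 18, 12, 16, 17, 15, 15, 16, 12,
       15, 16, 16, 12, 14, 15, 12, 15, 15, 19, 13, 16, 18, 16, 14]

counts = Counter(mpg)
# One class per value from the minimum to the maximum, so no class is skipped.
classes = list(range(min(mpg), max(mpg) + 1))
frequencies = [counts[c] for c in classes]      # a Counter gives 0 for absent values
cumulative = list(accumulate(frequencies))
# Class boundaries: subtract and add 0.5 around each class value.
boundaries = [(c - 0.5, c + 0.5) for c in classes]

for c, (lo, hi), f, cf in zip(classes, boundaries, frequencies, cumulative):
    print(f"{c:>3}  {lo}-{hi}  {f:>2}  {cf:>2}")
```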

Diagrammatic and Graphic presentation of data


These are techniques for presenting data in visual displays using geometric figures and pictures.


Importance:
 They have greater attraction.
 They facilitate comparison.
 They are easily understandable.

The most commonly used diagrammatic presentations for discrete as well as qualitative
data are:
1. Bar Charts
 A set of bars (thick lines or narrow rectangles) representing some magnitude over time or
space.
 They are useful for comparing aggregates over time or space.
 Bars can be drawn either vertically or horizontally.
 There are different types of bar charts. The most common are:
i. Simple bar chart
 It is used to represent only one variable
 They are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height /length of the bar.

Example: The table shows the average money spent by first-year college students. Draw a
horizontal and vertical bar graph for the data.

Item Cost($)
Electronics 728
Dorm decor 344
Clothing 141
Shoes 72

[Figure: simple bar chart "Average Amount Spent"; horizontal axis: item (Electronics, Dorm decor, Clothing, Shoes); vertical axis: cost ($).]

ii. Multiple Bar charts



 These are used to display data on more than one variable.


 They are used for comparing different variables at the same time.
Example: The following data represent the production of different cereals for three consecutive
years (2000-2002) in a certain rural village.

Crop      Year
          2000   2001   2002
barley    28     30     34
wheat     18     19     15
maize     20     22     25
total     66     71     74

[Figure: multiple bar chart "Production of cereal (2000-2002)"; horizontal axis: year of production; vertical axis: production in tons; bars: barley, wheat, maize.]

iii. Component Bar chart


 When there is a desire to show how a total (or aggregate) is divided into its component
parts, we use a component bar chart.
 The bars represent the total value of a variable, with each total broken into its component
parts; different colors or designs are used for identification.
Example: Draw a component bar chart for the above cereal production data.


[Figure: component bar chart "Production of cereal (2000-2002)"; horizontal axis: year of production; vertical axis: production in tons; components: barley, wheat, maize.]
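
To see how a component bar is built, the sketch below stacks the cereal figures year by year: each crop's segment starts where the previous one ends, and the top of the last segment is the yearly total. The stacking order (barley, wheat, maize) and the variable names are assumptions made for illustration.

```python
years = [2000, 2001, 2002]
production = {                  # production in tons, from the cereal table
    "barley": [28, 30, 34],
    "wheat":  [18, 19, 15],
    "maize":  [20, 22, 25],
}

segments = {}                   # crop -> list of (bottom, top) per year
base = [0] * len(years)         # running height of each bar
for crop, values in production.items():
    top = [b + v for b, v in zip(base, values)]
    segments[crop] = list(zip(base, top))
    base = top

totals = base                   # full height of each component bar

for year, total in zip(years, totals):
    print(year, total)
```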

2. Pie chart
 A pie chart is a circle that is divided into sections or wedges according to the percentage
of frequencies in each category of the distribution. The angle of each sector is obtained
using:
$\text{Degrees} = \frac{f}{n} \times 360^\circ$
where f = frequency for each class and n = sum of the frequencies.
 The degrees should sum to 360°.
Example: Draw a suitable diagram to represent the following age distribution in a town.

Children   Youth   Adult   Old
2500       2000    4000    1500

Solutions:
Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.

Step 3: Using a protractor and compass, graph each section and label it with its name and
corresponding percentage.

Class Frequency Percent Degree


Children 2500 25 90
Youth 2000 20 72
Adult 4000 40 144
Old 1500 15 54
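
The percent and degree columns follow directly from the formula for the sector angle. The sketch below recomputes them; it multiplies before dividing (f * 360 / n), which is algebraically the same as (f / n) * 360 and keeps the arithmetic exact for these values.

```python
frequencies = {"Children": 2500, "Youth": 2000, "Adult": 4000, "Old": 1500}

n = sum(frequencies.values())                       # sum of the frequencies

# Step 1: percentage of each class.
percent = {cls: f * 100 / n for cls, f in frequencies.items()}
# Step 2: sector angle in degrees for each class.
degrees = {cls: f * 360 / n for cls, f in frequencies.items()}

for cls in frequencies:
    print(f"{cls:<9}{frequencies[cls]:>5}{percent[cls]:>6.0f}{degrees[cls]:>6.0f}")
print("Sum of degrees:", sum(degrees.values()))
```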

[Figure: pie chart "Age Distribution"; sectors: children 25%, youth 20%, adult 40%, old 15%.]

Graphical Presentation of data


 The histogram, frequency polygon and cumulative frequency graph (ogive) are the most
commonly applied graphical representations for continuous data.
Histogram

 The histogram is a graph that displays the data by using contiguous vertical bars (unless the
frequency of a class is 0) of various heights to represent the frequencies of the classes. Class
boundaries are placed along the horizontal axes.
Example 1: Construct a histogram to represent the record high temperatures in degrees
Fahrenheit (°F) for each of the 50 states.

Class boundaries Frequency Midpoint LCF


99.5–104.5 2 102 2
104.5–109.5 8 107 10
109.5–114.5 18 112 28
114.5–119.5 13 117 41
119.5–124.5 7 122 48
124.5–129.5 1 127 49
129.5–134.5 1 132 50
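
As a rough check on Example 1, the sketch below recomputes the Midpoint and LCF columns from the class boundaries and frequencies, and prints a text-only histogram in which each `#` stands for one state. It is an illustration only and uses no plotting library.

```python
from itertools import accumulate

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Each class runs from one boundary to the next, so the bars are contiguous.
classes = list(zip(boundaries, boundaries[1:]))
midpoints = [(lo + hi) / 2 for lo, hi in classes]
lcf = list(accumulate(frequencies))         # less than cumulative frequency

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo:5.1f}-{hi:5.1f} | {'#' * f} {f}")
```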


Frequency polygon
The frequency polygon is a graph that displays the data by using lines that connect points plotted
for the frequencies at the midpoints of the classes. The frequencies are represented by the
heights of the points.
Example 2: Construct a frequency polygon for the frequency distribution described in Example 1.
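
The points to plot for Example 2 can be listed as (midpoint, frequency) pairs. The sketch also adds a zero-frequency point one class width before the first midpoint and one after the last, a common convention for closing the polygon on the horizontal axis; that convention is an assumption here, since the notes do not state it.

```python
midpoints = [102, 107, 112, 117, 122, 127, 132]
frequencies = [2, 8, 18, 13, 7, 1, 1]

width = midpoints[1] - midpoints[0]         # distance between class midpoints

points = [
    (midpoints[0] - width, 0),              # assumed closing point on the left
    *zip(midpoints, frequencies),           # one point per class
    (midpoints[-1] + width, 0),             # assumed closing point on the right
]

for x, y in points:
    print(x, y)
```

Joining consecutive points with straight line segments gives the frequency polygon.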

Cumulative frequency (Ogive)

 The ogive is a graph that represents the cumulative frequencies for the classes in a
frequency distribution.
 It shows the cumulative frequency (less than or more than type) plotted against the
upper or lower class boundaries, respectively.
 The class boundaries are plotted along the horizontal axis and the corresponding
cumulative frequencies are plotted along the vertical axis.

Example 3: Construct an ogive for the frequency distribution described in Example 1.
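
The points for both types of ogive can be derived from the class boundaries and frequencies of Example 1. In this sketch the less than type starts at zero at the lowest boundary and the more than type ends at zero at the highest boundary; the variable names are arbitrary.

```python
from itertools import accumulate

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]
n = sum(frequencies)

# Less than type: number of observations below each boundary.
less_than = [0, *accumulate(frequencies)]
less_than_points = list(zip(boundaries, less_than))

# More than type: number of observations at or above each boundary.
more_than = [n - c for c in less_than]
more_than_points = list(zip(boundaries, more_than))

for (x, lt), (_, mt) in zip(less_than_points, more_than_points):
    print(f"{x:6.1f}  less than: {lt:>2}  more than: {mt:>2}")
```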
