CHAPTER 1
INTRODUCTION TO STATISTICS
Statistics refers to the practice or science of collecting and analyzing numerical data in large
quantities. In general, one can say that
In other words, statistics is the methodology which scientists and mathematicians have
developed for interpreting and drawing conclusions from collected data. Everything that deals
even remotely with the collection, processing, interpretation and presentation of data belongs
to the domain of statistics, and so does the detailed planning that precedes all these
activities.
Collecting
Organizing
Analyzing
Interpreting
Presenting
These statistical processes form a part of the decision making process in many
organizations. Managers of today need to have strong mathematical abilities to interpret
statistical analyses before they can make informed decisions.
Chapter 1 Introduction to Statistics 2
Quick Check 1
1.2 TERMINOLOGIES
1.2.1 Variable
A variable is any characteristic, number, or quantity that can be measured or
counted.
1.2.2 Data
1.2.3 Population
1.2.4 Sample
Example: If it is not possible to obtain information about all school-age children with
asthma in Malaysia, a researcher may select just 50 school-age children with asthma treated
in government hospitals in Malaysia and obtain sample data from these 50 children.
1.2.5 Census
1.2.7 Parameter
1.2.8 Statistics
The selection of sample requires that each member of the population has an equal
and independent chance of being selected.
(http://explorable.com)
The selection of sample does not require that each member of the population has an
equal and independent chance of being selected.
A list of all the elements in the population from which the sample is drawn.
Example: If it is not possible to obtain information about all school-age children with
asthma in Malaysia, a researcher may select just 50 school-age children with asthma treated
in government hospitals in Malaysia and obtain sample data from these 50 children.
The sampling frame would be: A list of all school-age children with asthma treated in
government hospitals in Malaysia.
Broadly speaking, applied statistics can be divided into two areas: descriptive statistics and
inferential statistics or inductive statistics.
sample. This is because the sample values are close representations of the actual
values of the population of interest. However, there is a certain amount of
uncertainty about the estimations. Therefore, probability is often used when stating
the conclusions.
Thus, inferential statistical techniques are used to make inferences about the
population based on measurements obtained from the sample. The procedure is to
select a sample from a population, measure the variables of interest, analyze the
data, interpret the output and draw conclusions based on the data analysis.
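One common way to attach probability to a conclusion drawn from a sample is an interval estimate. The sketch below is only an illustration, not a method taken from this chapter: the function name, the invented height data and the use of a normal approximation are all assumptions.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval for the population mean (an assumed method)."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # z is the standard normal quantile that leaves (1 - confidence) / 2 in each tail.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


# Invented sample: heights (cm) of 50 students, generated only for illustration.
rng = random.Random(7)
heights = [rng.gauss(160, 8) for _ in range(50)]

low, high = mean_confidence_interval(heights)
print(f"sample mean = {statistics.mean(heights):.1f}")
print(f"95% interval for the population mean: ({low:.1f}, {high:.1f})")
```

The interval is wider for small or highly variable samples, which is how the uncertainty about the estimate shows up in the stated conclusion.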
Quick Check 2
Discrete Variable
It is a variable whose values can be counted.
Example: number of children in your family, number of students in a class or
number of television sets in your houses.
Continuous Variable
It can be measured and the responses take on values that lie within a
continuum or interval.
Example: height of students, weights of babies, age and father’s income.
They are categorical in nature and their values cannot be counted or measured.
Example: gender, father’s occupation, program of study, courses registered or place
of birth.
Data collected for a numeric variable. It is information that can be measured and
written down with numbers.
Data collected for a categorical variable. It is expressed not in terms of numbers, but
rather by means of a natural language description such as attributes, characteristics,
properties of a thing or phenomenon. In statistics, it is often used interchangeably
with "categorical" data.
TYPES OF VARIABLE/DATA
    Quantitative (Numerical)
        Discrete
        Continuous
    Qualitative (Categorical)
Here's a quick look at the difference between qualitative and quantitative data.
[Figure: three paired examples (Example 1 to Example 3, including a latte) comparing qualitative and quantitative data]
Data may be described in accordance with the level of measurement attained. The four
levels of measurement are – from weakest to strongest level – nominal, ordinal, interval
and ratio scales.
Ratio
Interval
Ordinal
Nominal
1.6.1 Nominal
1.6.2 Ordinal
Data for which numerical order is meaningful and the categories can be
ranked.
Example:
Education Qualification (PhD, Master, Degree, Diploma, SPM)
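With ordinal data the categories can be ranked, but the distance between ranks has no numerical meaning. A minimal sketch of this, assuming the qualifications are ordered from SPM (lowest) to PhD (highest); the survey responses are invented:

```python
# Assumed ordering of the categories, from lowest to highest.
LEVELS = ["SPM", "Diploma", "Degree", "Master", "PhD"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# Invented survey responses.
responses = ["Degree", "SPM", "PhD", "Diploma", "Degree"]

# Sorting and picking the middle category are meaningful for ordinal data.
ordered = sorted(responses, key=RANK.get)
median_level = ordered[len(ordered) // 2]

print(ordered)       # ['SPM', 'Diploma', 'Degree', 'Degree', 'PhD']
print(median_level)  # Degree
```

Averaging the rank numbers would not be meaningful here, because the gap between SPM and Diploma is not necessarily the same as the gap between Master and PhD.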
1.6.3 Interval
A random sampling process in which each member of the population has an equal
chance of being selected as an element in the sample.
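A short sketch of such a draw, using Python's standard `random` module in place of a table of random numbers. The function name and the frame of 2000 numbered subjects are assumptions made for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sampling frame: subjects numbered 1 to 2000.
frame = list(range(1, 2001))
sample = simple_random_sample(frame, 50, seed=1)

print(len(sample), len(set(sample)))  # 50 50
```

Fixing `seed` only makes the draw reproducible; leaving it as `None` gives a different sample on each run.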
Procedure
Advantage
Disadvantage
A random sampling process in which every kth (e.g. every 4th) element or member of
the population is selected for the sample after a random start is determined.
Procedure
i. Determine the value k = (population size) / (sample size).
ii. Use a table of random numbers to select one number between 1 and k. Say
this number is m. This is the first element of the sample.
iii. The rest of the members of the sample will be elements m + k, m + 2k, m +
3k, ... until the desired sample size is obtained.
Example:
Suppose population size (N) = 2000, sample size (n) = 50. Hence k = 2000/50
= 40. Use a table of random numbers to select a number between 1 and 40.
Suppose the number selected is 15. This is the starting point for selecting
every 40th subject. With the list of the 2000 subjects in the sampling frame, we
select subject number 15, 15+40=55, 55+40=95, ... until the sample size is
reached.
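The procedure can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the population size is an exact multiple of the sample size so that k is a whole number.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the positions m, m + k, m + 2k, ... of the selected subjects."""
    k = population_size // sample_size  # sampling interval
    if start is None:
        # Random start between 1 and k, standing in for a table of random numbers.
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]


# N = 2000, n = 50, so k = 40; the random start is taken to be 15.
selected = systematic_sample(2000, 50, start=15)

print(selected[:4])  # [15, 55, 95, 135]
print(selected[-1])  # 1975
```

Each selected position is 40 more than the previous one, and the 50th subject is number 15 + 49 x 40 = 1975, which still lies within the frame of 2000.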
Advantage
Disadvantage
The process of selection can interact with a hidden periodic trait within the
population. If the sampling technique coincides with the periodicity of the trait, the
sampling technique will no longer be random and representativeness of the sample
is compromised.
Advantage
Disadvantage
The population is divided into subgroups, called strata, according to some variable
or variables of importance to the study. Variables often used include: age, gender,
ethnic origin, SES (socioeconomic status), diagnosis, geographic region, institution,
or type of care. A common approach to stratification is the proportional method,
whereby each subgroup's share of the sample equals its share of the
population.
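Proportional allocation amounts to multiplying the total sample size by each stratum's share of the population. A minimal sketch, with a hypothetical function name and invented stratum sizes:

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate the sample to strata in proportion to their population sizes."""
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }


# Invented strata (geographic regions) in a population of 1000.
strata = {"North": 500, "Central": 300, "South": 200}
allocation = proportional_allocation(strata, 100)

print(allocation)  # {'North': 50, 'Central': 30, 'South': 20}
```

When the shares do not divide evenly, rounding can make the allocated sizes add up to slightly more or less than the intended sample size, so the totals should be checked and adjusted by hand.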
In stratified random sampling, all the strata of the population are sampled, while in
cluster sampling the researcher randomly selects only a number of clusters from the
collection of clusters making up the entire population. Therefore, only some clusters
are sampled; all the other clusters are left unrepresented.
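The contrast can be seen in a small sketch of cluster sampling, in which whole clusters are drawn at random and every member of a chosen cluster enters the sample. The function name and the ten invented classes of five students each are assumptions for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; all members of a chosen cluster are kept."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Invented population: 10 classes (clusters) of 5 students each.
clusters = {
    f"class_{i}": [f"class_{i}_student_{j}" for j in range(1, 6)]
    for i in range(1, 11)
}

picked = cluster_sample(clusters, 3, seed=0)
members = [student for group in picked.values() for student in group]

print(len(picked), len(members))  # 3 15
```

The seven classes that were not drawn contribute no students at all, whereas a stratified design would take some students from every class.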
Quick Check 3
For the Example above, calculate the number of students that must be selected
from each department to form a sample of size 40.
Advantage
Disadvantage
Advantages
i. It may not be totally representative of the population since only the selected
traits of the population were taken into account in forming the subgroups.
ii. Other traits in the sample may be overrepresented. In a study that considers
gender, socioeconomic status and religion as the basis of the subgroups, the
final sample may have skewed representation of age, race, educational
attainment, marital status and a lot more.
After samples are selected from the population, data are now ready to be gathered from
the selected samples using data collection techniques. There are four major techniques:
iii. The interviewer can note specific reactions and the environment surrounding
the respondents.
iv. Responses can be obtained spontaneously from respondents.
v. A well-trained interviewer can detect incorrect information given by respondents.
Disadvantages
i. Very expensive
ii. Any movement, facial expression or statement by the interviewer can affect
the response obtained
iii. Errors in recording
Advantages
Disadvantages
Advantages
i. Save cost
ii. The investigator does not have to monitor the interviewers
iii. No gestures from the interviewer to affect the response obtained
Disadvantages
Example: If we are interested in estimating the amount of time a customer spends at
checkout counters in a supermarket, we can assign a worker to record the time
from the moment the customer joins the queue until the customer finishes paying at the counter.
Advantages
i. Information is obtained from objective sources that are not affected by the
respondents
ii. Observation avoids interviewer-interviewee bias
Disadvantages
FURTHER READINGS
Agriculture
What varieties of plant should we grow? What are the best combinations of fertilizers,
pesticides and densities of planting? How does changing these factors affect the course of
the growth process?
Anthropology
How old is an archaeological site? What difference in physical size was there between
ancient Celts and modern day English people? Does the percentage of body fat differ
between urban and rural dwellers in India?
Education
Does a course on classroom behavior for teachers purchased by the Department of
Education have a measurable effect on the teacher’s classroom performance? Do boys
perform better than girls in Mathematics examinations? Is there evidence of sex bias in
admissions to the University of California at Berkeley? What proportions of graduates of
various programs are subsequently employed in their field of study?
Environmental studies
What impact will a proposed industrial plant have on the surrounding ecology? Is there an
increase in birth defects near nuclear power plants? Do strong electric or magnetic fields
induce higher cancer rates among people living close to them?
Fisheries
How many fish of a given species are in the fishing grounds? What level of quotas imposed
on fishermen will maintain the fish stocks? Does antifouling paint contaminate the fish
supply? What is the distribution of deep fish in the North Atlantic Basin?
Forestry
How much wood is there in a forest due to be felled? When should we fell trees in order to
maximize economic return?
Genetics
Does the data support genetic theories about how various characteristics are inherited?
In statistics, the word population is used to designate the complete set of items that are of
interest in the research. Meanwhile, the term sample is used to designate a subset of items
that are chosen from the population. Data on the variables of interest are obtained from the
sample. The data are then summarized, analyzed and presented in useful forms so that
effective information and conclusions can be derived.
What we are typically after in a study is the parameter. A parameter is a numerical value that
states something about the entire population being studied. For example, we may want to
know the mean wingspan of the American bald eagle. This is a parameter, because it
describes the whole population. Parameters are difficult, if not impossible, to obtain exactly. On
the other hand, each parameter has a corresponding statistic that can be measured exactly.
A statistic is a numerical value that states something about a sample. To extend the example
above, we could catch 100 bald eagles and then measure the wingspan of each of these.
The mean wingspan of the 100 eagles that we caught is a statistic. The value of a parameter
is a fixed number. In contrast, since a statistic depends upon a sample, the value of a
statistic can vary from sample to sample.
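A small simulation makes the distinction concrete. The population of wingspans below is invented purely for illustration (the mean of 200 cm and spread of 10 cm are assumptions, not real eagle measurements). The population mean is computed once and never changes, while the mean of each sample of 100 differs from draw to draw.

```python
import random
import statistics

rng = random.Random(42)

# Invented population of 10,000 wingspans in cm; not real measurements.
population = [rng.gauss(200, 10) for _ in range(10_000)]

# Parameter: the mean of the whole population, a single fixed number.
parameter = statistics.mean(population)

# Statistic: the mean of a sample of 100, recomputed for five different samples.
sample_means = [
    statistics.mean(rng.sample(population, 100)) for _ in range(5)
]

print(f"parameter (population mean): {parameter:.2f}")
for i, value in enumerate(sample_means, start=1):
    print(f"statistic from sample {i}: {value:.2f}")
```

In practice the whole population cannot be measured, so only one of the sample means would be available and it would serve as the estimate of the parameter.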
Surveys are used as a tool to collect information from some or all units of a population and
compile the information into a useful form. There are two different types of surveys that can
be used to collect information in different circumstances to satisfy differing needs. These
are sample surveys and censuses.
Sample Surveys
In a sample survey, only part of the total population is approached for information on the
topic under study. These data are then 'expanded' or 'weighted' to make inferences about
the whole population. We define the sample as the set of observations taken from the
population for the purpose of obtaining information about the population.
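The 'expanding' step can be sketched as follows, under the assumption of a simple random sample in which every sampled unit stands for the same number of population units. The function name and the figures are invented for illustration.

```python
def expand_total(sample_values, population_size):
    """Estimate a population total by weighting each sampled unit equally."""
    weight = population_size / len(sample_values)  # units represented per sampled unit
    return weight * sum(sample_values)


# Invented data: number of children reported by 5 sampled households
# drawn from a population of 1000 households.
children = [2, 0, 1, 3, 4]
estimated_total = expand_total(children, 1000)

print(estimated_total)  # 2000.0
```

Here each sampled household represents 200 households, so the 10 children observed expand to an estimated 2000 in the population. Designs with unequal selection chances would need a separate weight for each unit.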
Advantages
Disadvantages
Estimates are subject to sampling error which arises as the estimates are
calculated from a part (sample) of the population.
May have difficulty communicating the precision (accuracy) of the estimates to
users.
Censuses
Pilot Study
A pilot study is a research study conducted before the intended study. The aim is to identify
possible problems and difficulties that the researcher may encounter when the actual study is
being carried out. Pilot studies are usually executed as planned for the intended study, but
on a smaller scale. Although a pilot study cannot eliminate all systematic errors or
unexpected problems, it reduces the likelihood of making a Type I or Type II error. Both
types of errors make the main study a waste of effort, time, and money.
Designing a Questionnaire
EXERCISE CHAPTER 1
1. A randomly selected sample of 15 mothers with newborn babies living in a town is asked
the following questions in a face-to-face interview:
3. What is your education level? School dropout / High school / College graduate
i. Discuss TWO advantages and TWO disadvantages of using the method described
above to collect the data.
ii. What is the population for the study?
iii. What are the variables in the study? Determine the level of measurement of each
variable.
3. A researcher wishes to study the career aspirations of students from the Faculty of
Applied Science, which consists of 50 classes. The researcher intends to choose only 10
classes and all the students from these 10 classes will be chosen for the study.