Introduction to Statistics

CHAPTER 1: INTRODUCTION TO STATISTICS

We consume statistical reports and statistical figures every day, and they affect our daily lives.
Take, for instance, petroleum prices, which as of 2014 have fallen to between USD58 and
USD65 per barrel due to oversupply coupled with poor demand. The price has dropped by
40-50% compared to the same period a year before. This drop in price has caused
a chain effect in other commodities as well. Above all, it has reduced the income of oil-exporting
countries, and the consequences will show in their gross domestic product (GDP).
A reduction in GDP has notable chain effects on the economy and the socio-economic variables of
a country. Thus, understanding the hints and signs in statistical data and their
consequences for the economy enables countries to plan their budgets efficiently. Doing so
helps a nation avoid falling into a recession, which could cause hardships such as
people losing their jobs.

1.1 What is Statistics?

Every day, millions of data points are generated: data on demand, productivity, economic growth,
commodity prices, exchange rates, interest rates, company profits, competitors’ strategies,
imports, exports, etc. Managers of organizations collect, organize and analyze these data and
transform them into relevant information that can be used to compare the current performance
of their organization with prior years, and to guide and plan for future needs.

(Diagram: a cycle of five stages around the word Statistics: collecting, organizing, analyzing,
interpreting and presenting.)

Figure 1.1 Statistics involves scientific procedures and methods


Statistics represents scientific procedures and methods for collecting, organizing, analyzing,
interpreting and presenting data as useful information, to draw valid conclusions and make
effective decisions (Figure 1.1). These statistical processes form part of the decision-making
process in many organizations. Managers of today need to have strong mathematical abilities
to interpret statistical analyses before they can make informed decisions.

1.2 Statistics to Ponder

In our daily lives, huge amounts of data and information are released on various events.
These events may be of interest to us or may be worth considering, as they
affect our decisions. At the very least, we need to keep ourselves abreast of the latest
information, which is useful to decision-makers of organizations as well as to individuals.

1.3 Statistical Problem-solving

Managers, decision-makers and researchers use statistical problem-solving procedures to help
them make wise and effective decisions. The basic steps in statistical problem-solving are
outlined below.

Step One: Identifying Problems or Opportunities

A manager must clearly understand and correctly define the problem at hand. He must be
careful not to confuse the actual problem that the management is trying to solve with its
symptoms. However, symptoms can sometimes be used as clues to find the actual problem.

For example, the monthly sales of Proton cars have been declining significantly for the past 24
months even though the overall auto industry has shown steady growth. The management is
trying to identify the actual causes or factors that have contributed to the decline in
local car sales so that corrective action can be taken immediately.

Failing to find the actual causes might lead to a slowdown in the local auto industry,
and hence to reduced sales and lower profits. The objective is to determine the factors that
contributed to the decline in demand for Proton cars. The actual problem is unknown, while the
symptoms are a decline in sales, a high rate of booking cancellations and slow growth of new
bookings.

Step Two: Gathering Available Facts

Data and information that are related to the actual problem must be gathered. Internal data
can be obtained from the departments within an organization. For example, accounting and
financial data can be obtained from the financial and accounting departments, production
figures from the production department, and sales figures from the marketing and sales
department. The customer service department and human resource department also provide
useful data for analysis.

External data can be obtained from other organizations such as the Ministry of Domestic Trade
and Consumer Affairs, Central Bank, the Ministry of International Trade and other business
organizations. Other sources include the Journal of Auto Industry, the Journal of Malaysian
Business, newspapers and magazines.

Step Three: Gathering New Data

If the available data are inadequate to get a clear picture of the problem, the management
may decide to collect new data. Sometimes, data on important variables are not available from
secondary sources or the data obtained from these sources are already out-dated or not
suitable for use. As such, the management must obtain data from primary sources.

Appropriate data collection methods must be applied so that the data are gathered
accurately. For example, the management may want to collect data on customers’
expectations of certain characteristics of passenger cars such as the safety standard, design,
performance, price, after-sales service, resale value and rate of financing.

At the same time, the management may also require information on the marketing
strategies of competitors, such as advertising and promotional strategies, package offers,
trade-in incentives or switching incentives. Several data collection methods can be applied:
direct observation, personal interviews, telephone interviews (especially for long-distance
respondents), direct questionnaires, mailed questionnaires and focus group studies.

Before primary data are obtained, the manager must determine the representative sample to
be used for the research. In choosing the sample, the researcher must apply appropriate
sampling techniques so that the sample selected represents the target population. Selecting
the wrong sample will produce data that do not accurately represent the population, resulting
in inaccurate information for decision-making. Any analysis of biased data is not valid.
The sampling technique used depends on the nature of the target population, the budget
available and the objectives of the study. Among the sampling techniques available are
simple random sampling, systematic sampling, stratified sampling, cluster sampling, quota
sampling, judgemental sampling and snowball sampling.

Step Four: Classifying and Organizing Data

After the required data have been collected, the next task is to make the data more meaningful,
readable and understandable in the context of the problem being investigated. Raw data are
meaningless. They must be transformed into meaningful forms.

Step Five: Presenting and Analyzing Data

Data must be presented in clear and meaningful ways so that they are useful to decision-makers
and to the people reading the report. Some of the common methods of presenting data
are frequency tables, bar charts, graphs, histograms, frequency polygons, ogives and
stem-and-leaf plots.

Frequency tables are used to summarize data based on variables of interest. For example,
Proton customers can be grouped according to demographic variables such as income level,
education level, ethnic group and type of job, so that useful information on demand can be
obtained and analyzed. Presenting data through charts, graphs, scatter plots and other
visual methods helps in identifying the relationships between variables of interest.
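
As a rough illustration of a frequency table, the sketch below counts how often each income
level occurs and converts the counts to percentages. The responses and the function name
`frequency_table` are invented for this example and are not taken from any real survey.

```python
from collections import Counter

# Hypothetical survey responses: income level of ten customers (invented data).
income_levels = ["low", "middle", "middle", "high", "low",
                 "middle", "high", "middle", "low", "middle"]

def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(category, count, 100 * count / total)
            for category, count in counts.most_common()]

table = frequency_table(income_levels)
for category, count, percent in table:
    print(f"{category:<8}{count:>3}{percent:>7.1f}%")
```

The same function works for any categorical variable, such as education level or type of job.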

For example, a manager of a local car company may want to determine the relationship
between the demand for local cars and demographic variables such as gender, education
level, income level and social class. At the same time, he may be interested in establishing the
relationship of these variables with the choice of models, price, quality of service and product
performance. If more thorough information is needed, the data must be analyzed
further. Among the methods of data analysis are cross tabulation, the chi-square test,
regression analysis and time series analysis.

Step Six: Making a Decision

After going through data presentation, data analysis and interpretation of the results, the
management should have a clear idea of the problem at hand. Certain variables may influence
some other variables. The management can list the possible alternative actions to take
under various economic conditions, and under other influential conditions such as changes in
interest rates, changes in consumers’ lifestyles and developments in technology.

With appropriate statistical analysis techniques and models, the management is better placed to
make the right decision. Among the models that can be applied are decision-making under certainty,
decision-making under uncertainty and decision-making under risk. This is followed by the
implementation of the plan. Appropriate corrective action should be carried out in cases where
deviation from the plan occurs.

1.4 Who Uses Statistics?

Statistical techniques and methods are very important tools for all managers and decision-makers
in government departments as well as private firms. Executives who are involved
in marketing, accounting and financial planning, advertising, hospital administration, research
and development, and other areas of work must have a sound knowledge of statistics in order
to utilize the available data to improve the efficiency and effectiveness of the organization.
Sometimes they may need to collect primary data, and then summarize, organize, present and
analyze them in order to reach appropriate conclusions and make sound recommendations and
decisions for their organizations.

1.5 Types of Statistics

Statistical techniques can be divided into two categories: descriptive statistics and inferential
or inductive statistics.

Descriptive Statistics

In descriptive statistics, data are compiled, organized, summarized and presented in
visual forms that are easy to understand and use. Various tables,
graphs, charts and diagrams are used to exhibit the information obtained from the data. Thus,
raw data are transformed into meaningful forms so that the user and manager can draw
conclusions just by taking a quick look at the visual presentations.

Inferential Statistics

In inferential statistics, we make generalizations about a population by analyzing the sample.
If the sample is a good representation of a population, accurate conclusions about the
population can be inferred from the analysis of this sample. This is because the sample values
are close representations of the actual values of the population of interest. However, there is
a certain amount of uncertainty about the estimations. Therefore, probability is often used
when stating the conclusions.

Thus, inferential statistical techniques are used to make inferences about the population based
on measurements obtained from the sample. The procedure is to select a sample from the
population, measure the variables of interest, analyze the data, interpret the output and draw
conclusions based on the data analysis.
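
The procedure above can be sketched in a few lines. The example below is only an illustration
under stated assumptions: the sample values are invented, and the interval uses the normal
approximation (multiplier 1.96), which is one common way of attaching a probability statement
to an estimate. The function name `mean_with_interval` is hypothetical.

```python
import math
import statistics

# Hypothetical sample: monthly expenditure (RM) of 12 respondents (invented data).
sample = [820, 910, 760, 1005, 880, 930, 795, 870, 940, 815, 900, 860]

def mean_with_interval(values, z=1.96):
    """Return the sample mean and an approximate 95% interval for the population mean.

    Uses the normal approximation mean +/- z * s / sqrt(n); for small samples a
    t-based multiplier would give a slightly wider interval.
    """
    n = len(values)
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    return mean, (mean - z * std_error, mean + z * std_error)

mean, (low, high) = mean_with_interval(sample)
print(f"sample mean = {mean:.1f}, approximate 95% interval = ({low:.1f}, {high:.1f})")
```

The interval expresses the uncertainty mentioned above: the population mean is estimated, not
known, so the conclusion is stated as a range with an attached level of confidence.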

1.6 Some Common Statistical Terms

Population and Sample

In statistics, the word population is used to designate the complete set of items that are of
interest in the research.

(Diagram: the population is the set of all items. A sample is a set of items selected from the
population. Hence, the sample is a subset of the population.)

Figure 1.2 Relationship between sample and population

The term sample is used to designate a subset of items that are chosen from the population.
Data on the variables of interest are obtained from the sample. The data are then summarized,
analyzed and presented in useful forms so that effective information and conclusions can be
derived.

Statistic and Parameter

A summary measure such as the mean, median, mode or standard deviation, computed from
sample data, is called a statistic. A summary measure for the entire population, by contrast, is
called a parameter. Statisticians often estimate population parameters from the corresponding
sample statistics. For example, in a country with 10 million students, if we compute the mean
English oral score of all 10 million students and find that it is 60, this mean is a
population parameter. If 10,000 students are randomly selected from the 10 million students in
the country and the average score of their English oral test is calculated, then this average is
a statistic.
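
The distinction can be illustrated with a small simulation. This is a sketch only: the
population is generated artificially, it is scaled down to 100,000 scores so that it runs
quickly, and the spread of 12 marks is an assumption made for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 oral-test scores (scaled down from the
# 10 million students in the text), clipped to the range 0 to 100.
population = [min(100, max(0, random.gauss(60, 12))) for _ in range(100_000)]

# Parameter: a summary measure computed from the entire population.
population_mean = statistics.mean(population)

# Statistic: the same measure computed from a random sample of 1,000 students.
sample = random.sample(population, 1_000)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two values are usually close but not identical, which is why a statistic is described as an
estimate of the parameter.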

Census

If the population we wish to study is small, it is possible for us to measure a variable for every
unit in the population. If the study is carried out in this way on the whole population, the end
result is a census of the population. For example, if we wish to study the monthly income of
fishermen in a small fishing village, it is possible to obtain data on all the fishermen in that
village. This is a census of the population. Many countries carry out a census study of their
population every 10 years in order to update the information on the residents. Our country
undertakes a census every 10 years and the last census study was done in 2001.

Sample Survey

A sample survey, on the other hand, involves a subgroup (or sample) of a population being
chosen and questioned on a set of topics. The researcher simply asks the respondents to
answer some questions. The results of this sample survey are usually used to make inferences
about the larger population. A sample survey is necessary if the population is large. Sample
surveys reduce cost and time and the results may be as accurate as the census study if the
sample is selected using a proper sampling technique.

Pilot Study

A pilot study is a study done before the actual fieldwork is carried out. The aim is to identify
possible problems and difficulties that the researcher may encounter when the actual study is
being carried out. This study is also used to test out questionnaires and to improve the
questionnaires in terms of flow, question design, language and clarity.

1.7 Data

Data are measures on variables of interest obtained from a sample. For example, researchers
may collect data on the amount of money spent by secondary school students on textbooks,
the brand of detergents most preferred by housewives in Kelantan, the monthly income of
rubber smallholders in Malaysia, the time taken by Malaysian-made cars to accelerate from 0
to 100 km/h, the average length of stay of foreign tourists in Malaysia and their favourite places
of visits.

There are two types of data: qualitative and quantitative.

Qualitative Data

Sometimes managers may use non-numerical data in their reports. Examples include reports on
the perception of the public towards the 1Malaysia concept, reports of consumer surveys
regarding the implementation of the Goods and Services Tax (GST) beginning April 2015, and
the preference of car users for new car models in the market, namely the Perodua Axia and
Proton Iriz. Such surveys collect qualitative data. Qualitative data provide information
regarding the opinions, perceptions, preferences and behaviours of the respondents towards the
subject under study.

For example, the customers’ perception of the new Proton model can be one of the following:

a. Excellent
b. Good
c. Average
d. As expected
e. Below average

Qualitative data are less precise than quantitative data, and fewer statistical methods are
available for analyzing them. These data are measured based on specific categories. For example,
gender is measured as ‘male’ or ‘female’, marital status is measured as ‘single’, ‘married’ or
‘divorced’, and opinion is measured as ‘strongly disagree’, ‘disagree’, ‘neutral’, ‘agree’ or
‘strongly agree’.

In practice, qualitative research is employed to achieve a variety of objectives. Among the
objectives of qualitative research are:

a. To obtain certain information on market segmentation.
b. To explore the concepts and positioning of a product or service.
c. To identify attitudes and opinions shared by the target customers.
d. To clarify certain issues that might arise when defining the problem.
e. To provide proper direction for the development of questionnaires.

Quantitative Data

When numerical data are required, the research is called quantitative research. Data on total
demand, total supply, interest rates, amount of expenditure, annual sales, total exports, total
imports and volume of transactions are all numerical. Quantitative research provides more
precise market information because the data are numerical.

For example, the weight of an iPhone 6 is:

a. Below 300 grams
b. Between 300 and 400 grams
c. Between 400 and 500 grams
d. Above 500 grams

Quantitative data are more precise than qualitative data, and more statistical
methods can be applied in the data analysis. Some common applications of quantitative
research include:

a. Estimating the future demand for a certain product in the local market,
b. Projecting the growth of a certain business based on population growth, and
c. Modelling the growth of gross national product (GNP) for the country.

Quantitative variables can be further classified as either continuous or discrete. A continuous
variable is one that can take an unlimited number of values within a range. Examples of
continuous variables are the distance travelled, the amount of petrol consumed, the weights
of individuals, monthly expenditure, the amount of a loan and the volume of exports. Discrete
variables take whole-number values obtained through counting. Examples include the number of
students in a class, the number of foreign employees in a firm, the number of Malaysians who
perished in the MH17 plane crash, the number of MAS employees terminated, and the number of
cars available in stock. Personal interviews, telephone interviews and mailed questionnaires
are the most common techniques used to obtain quantitative data.

1.8 Data Sources: Primary and Secondary

Primary data are data collected by the user himself. The user here can be
a student completing his assignment, a lecturer preparing for a class lecture, or a postgraduate
student completing his Master’s or PhD thesis. Some experts refer to primary data as a
“first-hand” data source.

Secondary data are data that were previously collected and summarized by
certain parties or departments for their own use; the current user obtains these data from their
reports. These data can be in the form of a newspaper report, the annual report of a certain
department, or a report published by a certain organization.

Primary Data Source

A primary data source is a source that presents first-hand information. Such sources can be
original records, original documents, personal interviews and eyewitness records
that are original in nature.

Among the examples of primary data collection activities are:

a. Conducting surveys to obtain data on students in this class by asking their age,
gender, marital status, previous education such as SPM results and matriculation
results, parents’ occupations, parents’ income, monthly expenditure, source of
financing, and their ambitions for the future.
b. Recording observations of customers’ behaviour when choosing cosmetic
products to purchase in the supermarket, or observing the behaviour of teenagers
when they are riding motorcycles on the highways.
c. Conducting laboratory experiments on rats to measure how much weight they gain
after eating a certain food, or how long they take to heal from an injury after being
given certain drugs.

Secondary Data Source

Secondary data are data obtained from secondary sources such as the police station
(data: types of crime that occurred and amount of losses incurred by the victims), the Immigration
Department (data: illegal immigrants detained and their countries of origin), the annual report of
a company (data: total sales, expenditure, profits and dividends), the Department of Students
Affairs (data: total number of students enrolled in every program, the number of graduates
every semester, and their class of graduation), etc.

Among the examples of secondary data collection activities are:
a. The review and analysis of existing data on the target markets. These data are
available in business magazines, research studies, government publications and
the annual reports of companies.
b. The evaluation of the weekly demand trend of competitors’ products for the last 5 years
in comparison with the company’s product. These data are available in their
respective annual reports.
c. The assessment of the impact on the economy of increases in fuel prices and
food prices, and of political unrest. These data are available from academic
researchers in universities.
d. The analysis of the voting trend of Malay voters towards the Malay political parties
in the 2013 General Election. These data are available from the Election
Commission (SPR).

Secondary research does not require a data collection method because the data already exist.
Thus, no sampling technique or method of data collection is needed to obtain secondary data.
Instead, time and effort are spent on locating and gathering the information from reliable
sources. Some resources for secondary research information include:
a. Government departments, ministries and government agencies: some can be accessed
through the Internet.
b. Libraries, books, research reports, business publications, magazines and daily
newspapers.
c. Trade associations: most associations have reports on the industries they serve,
the standards they operate under, the profile of investors, and the corporate
leaders in the field.
d. Local agencies such as district offices, post offices, land offices, district police
stations, religious offices, immigration departments, etc.
e. Consumer associations, non-governmental organizations, financial institutions, real
estate agencies, insurance companies and business organizations.
f. Federal government agencies, which can provide extensive demographic data on
population, industries, commodity prices, investments, financial markets and other
economic figures.
g. Regional planning organizations and local governments, which have historical as well as
current data on community growth trends. Many offices also have forecasted
demographic statistics for the area.
h. Media representatives: advertising and sales personnel for television stations, radio
and print media outlets keep information on their viewers, listeners and readers to
help influence potential advertisers.

1.9 Types of Variables

A variable measures a characteristic of the population that the researcher wants to study.
For example, variables of interest may be the monthly income of respondents, respondents’
age, gender, level of education, number of children and type of house owned by respondents.

Qualitative and Quantitative Variables

Variables can be divided into qualitative and quantitative variables. A variable that cannot be
measured or assigned a numerical value but can only be divided into different categories is
called a qualitative variable (or categorical variable). An example of a qualitative variable is the
colour of a car. The colour can be red, blue, green, white, black or another colour, and it cannot
be measured.

A quantitative variable, on the other hand, is a variable that can be measured numerically or
counted. The volume of water consumed per day by a person is a quantitative variable. The English
test score of a student is another example of a quantitative variable. If the values of a
quantitative variable are countable, the variable is a discrete quantitative variable. A discrete
variable can assume only certain values, with no other values in between. The number of cars that
pass through the Subang toll in one hour is a discrete variable. A continuous variable can
assume any numerical value within a specified interval or intervals. Thus, a
person’s weight is a continuous variable, as it can take on values such as 56.7 kg, 77.3 kg and
44.6 kg. Figure 1.3 illustrates the types of variables.

(Diagram: a variable is a characteristic of the population of interest, for example monthly
income, respondents’ age, gender, level of education and type of house owned. Variables branch
into two types.
Quantitative or numerical: measured with a numerical scale; yields a numerical response.
Example: ‘How tall are you?’ The answer is numerical.
Qualitative or attributive: measured with a non-numerical scale; yields a categorical response.
Example: ‘Are you a Malaysian?’ The answer is only ‘Yes’ or ‘No’.
Quantitative variables branch further into two types.
Discrete: numerical response which arises from a counting process. Example: ‘How many
children do you have?’
Continuous: numerical response which arises from a measuring process. Example: ‘How tall are
you?’ ‘What is your weight?’)

Figure 1.3 Two types of variables
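
The classification in Figure 1.3 can be mimicked with a rough rule of thumb. The function
`classify_variable` below is hypothetical, and it assumes that text answers are qualitative,
whole-number counts are discrete and values recorded with decimals are continuous. Real data
need more care than this, as noted after the code.

```python
def classify_variable(values):
    """Rough classification of a variable from its recorded values.

    Assumption for this sketch: text (or yes/no) answers are qualitative,
    whole-number counts are discrete, and decimal values are continuous.
    """
    if all(isinstance(v, (str, bool)) for v in values):
        return "qualitative"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative (discrete)"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative (continuous)"
    return "mixed: check how the variable was recorded"

# Hypothetical survey answers (invented data).
survey = {
    "Are you a Malaysian?": ["Yes", "No", "Yes"],
    "How many children do you have?": [2, 0, 3],
    "What is your weight (kg)?": [56.7, 77.3, 44.6],
}
for question, answers in survey.items():
    print(f"{question} -> {classify_variable(answers)}")
```

A rule like this fails when categories are stored as numeric codes (for example 1 for red and
2 for white), so the meaning of the variable, not its storage type, decides the classification.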


1.10 Scale of Data Measurement

Basically, data can be divided into numerical and categorical data. Numerical data contains
numbers that we can manipulate using ordinary arithmetical operations. For example, if we
count the number of cars that pass through a toll-booth for three consecutive days, then the
data is numerical.

Categorical data can be sorted into categories. For example, data on the marital status of
respondents can be classified into single, married, widow or widower, or divorced. When data
can be divided into different categories, then the data is categorical. Usually, data is classified
as nominal, ordinal, interval or ratio.

Nominal Data

Nominal data are categorical data. This type of data cannot be manipulated
arithmetically. A number in the data cannot be added to or subtracted from another number,
as these arithmetic operations do not give any meaning. For example, if we code 1 for a red car,
2 for a white car and 3 for a blue car, and we add 1 to 2 and obtain 3, the result is meaningless:
a red car plus a white car is not equal to a blue car.
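
A short sketch of the same point, using the colour codes from the text and invented
observations: counting categories is meaningful for nominal data, while arithmetic on the
codes runs without error but produces a number with no interpretation.

```python
from collections import Counter
import statistics

# Colour codes from the text: 1 = red, 2 = white, 3 = blue.
COLOUR_LABELS = {1: "red", 2: "white", 3: "blue"}

# Hypothetical observations of eight cars (invented data).
observed_codes = [1, 2, 2, 3, 1, 2, 3, 2]

# Meaningful for nominal data: counting how often each category occurs.
frequencies = Counter(COLOUR_LABELS[code] for code in observed_codes)
most_common_colour = frequencies.most_common(1)[0][0]

# Computable but meaningless: the "average colour code".
mean_code = statistics.mean(observed_codes)

print(dict(frequencies))
print("mode:", most_common_colour)
print("mean of codes (no interpretation):", mean_code)
```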

The nominal scale is the lowest in the level of data measurement scales. Data are classified
into categories and the frequency of each category is counted. Other examples of nominal
scale measurements are eye colour, gender, religion, country of origin, hobby and taste.

Ordinal Data

Data of ordinal scale can be arranged in ranking order and inequality signs can be used when
comparing the values of the variable. However, the differences between data values cannot
be determined or are meaningless.

For example, four basketball teams A, B, C and D that took part in a competition can be ranked
first, second, third and last. The ordering of the numbers has a certain
meaning. For example, if 1 is used to represent SPM qualification, 2 to represent diploma, 3
to represent degree, 4 for Masters and 5 for PhD, then we know that qualification 5 is higher
than qualification 4, and qualification 3 is higher than qualification 2 and so on.

The ordinal scale is a level higher than the nominal scale. For example, a supervisor can rank
three subordinates with the numbers 1, 2 and 3, where the subordinate with rank 1 is considered to
be the most productive. However, the differences 2 - 1 and 3 - 2 are meaningless: they do not
tell us how much more productive one subordinate is than another.
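
The sketch below uses the qualification codes from the text with invented respondents. It shows
two operations that rely only on order, namely comparison and the median. `median_low` is used
because it always returns a value that actually occurs in the data.

```python
import statistics

# Qualification codes from the text; the order is meaningful, the spacing is not.
QUALIFICATIONS = {1: "SPM", 2: "Diploma", 3: "Degree", 4: "Masters", 5: "PhD"}

# Hypothetical respondents (invented data).
respondent_codes = [1, 3, 2, 3, 5, 2, 3, 4, 1]

# Comparisons with inequality signs are valid for ordinal data.
highest = QUALIFICATIONS[max(respondent_codes)]

# The median depends only on order, so it is a suitable summary.
median_code = statistics.median_low(respondent_codes)
median_qualification = QUALIFICATIONS[median_code]

print("highest qualification:", highest)
print("median qualification:", median_qualification)
```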

Interval Data

If the differences between data values are meaningful but the values cannot be meaningfully
multiplied or divided, then the variable is of interval scale. For example, temperature
in degrees Celsius is of interval scale. We know that a temperature of 30°C is warmer than
20°C, but 30°C is not 1.5 times as warm as 20°C (even though 30/20 = 1.5), because 0°C does
not represent the absence of temperature.
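
One way to see why the ratio is not meaningful is to express the same two temperatures in other
units. In the sketch below the difference between the two temperatures stays consistent, while
the ratio changes with every change of unit. The conversion formulas are the standard ones; the
function names are chosen for this example.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

def celsius_to_kelvin(celsius):
    return celsius + 273.15

warm, cool = 30.0, 20.0  # the two temperatures from the text, in degrees Celsius

# Differences are meaningful: a 10-degree Celsius gap is the same gap in kelvin.
difference_c = warm - cool
difference_k = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)

# Ratios are not: the "1.5 times" changes as soon as the unit (and its zero) changes.
ratio_c = warm / cool
ratio_f = celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool)
ratio_k = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

print(f"difference: {difference_c:.1f} C, {difference_k:.1f} K")
print(f"ratio in Celsius:    {ratio_c:.3f}")
print(f"ratio in Fahrenheit: {ratio_f:.3f}")
print(f"ratio in kelvin:     {ratio_k:.3f}")
```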

Ratio Data

Ratio measurement is the interval measurement with an inherent zero setting. Differences
between two values and the ratio of two values are meaningful for this level of measurement.
The zero has meaning and represents the absence of the phenomenon being measured. This
is the highest level of data measurement scales. Some examples of ratio measurements are
height of a respondent, weight of a durian, time taken to complete a given task, monthly
income of a surgeon, monthly expenditure of an average family, test score, monthly amount
spent on hand-phone usage and others.

Table 1.1 The comparison among the types of measurement level

Scale     Basic characteristics       Examples                  Descriptive statistics  Inferential statistics

Nominal   Numbers assigned to         ID number, gender,        Frequency, percentages, Chi-square, binomial test
          classify objects.           programme, types of       mode
                                      house, ethnic group

Ordinal   Numbers assigned to         Social class,             Median, percentile,     Spearman rank correlation,
          indicate the relative       qualification, job        ranking                 Friedman ANOVA
          positions of the ordered    position, rank of
          objects.                    opinion, perception

Interval  Numbers assigned to         Age, income, attitudes,   Range, mean, variance,  Pearson correlation,
          indicate the magnitude of   opinions, the strength of standard deviation,     t-tests, ANOVA,
          differences between         agreement or disagreement skewness                factor analysis, regression
          objects. Normally in the
          multiple-choice response.

Ratio     Zero point is fixed;        Length, weight, income,   Range, mean, variance,  Coefficient of variation;
          numbers assigned to         cost, sales quantity,     standard deviation,     almost all statistical
          indicate the actual value   amount of expenditure     skewness                analysis methods can be used
          or amount of the variable.

The comparison of the four types of measurement scales is explained in Table 1.2.

Table 1.2 The relationship between nominal, ordinal, interval, and ratio data

     Nominal                     Ordinal               Interval                  Ratio
No.  Name of restaurant          Preference ranking    Preference rating of food Money spent in the last two
                                 among the restaurants quality on a scale of 1   months at the respective
                                                       to 10                     restaurants

1.   Delima Cafe                 2                     5.5                       RM180.00
2.   Sigai Cafe                  1                     7.1                       RM300.00
3.   Restaurant Selera Kampung   4                     3.4                       RM10.00
4.   Restaurant Cenderawasih     5                     1.0                       RM0.00
5.   Restaurant MakDara          3                     4.0                       RM100.00
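
The rows of Table 1.2 can be used to sketch which operations suit each scale. The tuple layout
and variable names below are choices made for this example; the figures are those in the table.

```python
import statistics

# Rows of Table 1.2: (name, preference rank, food-quality rating, money spent in RM).
restaurants = [
    ("Delima Cafe", 2, 5.5, 180.00),
    ("Sigai Cafe", 1, 7.1, 300.00),
    ("Restaurant Selera Kampung", 4, 3.4, 10.00),
    ("Restaurant Cenderawasih", 5, 1.0, 0.00),
    ("Restaurant MakDara", 3, 4.0, 100.00),
]

# Nominal (name): only labelling and counting make sense.
number_of_restaurants = len({name for name, _, _, _ in restaurants})

# Ordinal (rank): ordering makes sense.
by_preference = [name for name, _, _, _ in sorted(restaurants, key=lambda row: row[1])]

# Interval (rating): differences and means make sense.
mean_rating = statistics.mean(rating for _, _, rating, _ in restaurants)

# Ratio (money spent): zero means "nothing spent", so ratios make sense too.
spend = {name: amount for name, _, _, amount in restaurants}
spend_ratio = spend["Sigai Cafe"] / spend["Restaurant MakDara"]

print(number_of_restaurants, by_preference[0], round(mean_rating, 2), spend_ratio)
```

Each step uses only the operations allowed at that level: counting for names, sorting for
ranks, averaging for ratings, and division for amounts of money.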

1.11 Sampling and Data Collection Methods

What is Sampling?

Sampling is the scientific procedure of selecting a sample from a population. Since the data
obtained from the sample are used to generalise or to draw conclusions about the population,
the sample must be selected in such a way that it accurately represents its population. To
ensure the accuracy of the sampling process, appropriate sampling techniques
must be used.

Sampling techniques are scientific methods of selecting representative samples from
populations. The samples selected must be random and representative of the respective
population. The sampling technique used in each study depends on the characteristics of the
population of interest and of the study. These include the homogeneity (or heterogeneity) of the
population, the availability of a sampling frame (a list of individuals or items from which the
sample can be obtained), the research budget and the importance of the research. Some
common sampling frames include the list of school teachers in the country, the list of Malaysians
who own luxury cars, the list of credit card holders, the list of eligible voters, the list of
homes in a certain housing area, etc.

1.12 Types of Sampling Techniques

Sampling techniques can be classified broadly into two categories: non-probability
sampling techniques and probability sampling techniques. Non-probability sampling
techniques include convenience sampling, judgemental sampling, snowball sampling and
quota sampling. Probability sampling techniques include simple random sampling, cluster
sampling, systematic sampling and stratified random sampling. Figure 1.4 shows the
classification of the two categories of sampling techniques. In general, researchers prefer to
use probability sampling techniques to ensure that their findings are valid and to allow
them to make inferences about the population.

Non-probability sampling techniques: convenience sampling, snowball sampling, quota sampling,
judgemental sampling

Probability sampling techniques: simple random sampling, cluster sampling, systematic sampling,
stratified random sampling

Figure 1.4 Two types of sampling techniques

Non-probability Sampling Techniques

Non-probability sampling techniques are used when generalization to the population is not
important and, at the same time, the sampling frame from which the sample is to
be selected is not available or is difficult to obtain. One of the most common non-probability
sampling techniques is convenience sampling.

Convenience sampling

Convenience sampling, as its name implies, is the procedure where the researcher selects
respondents at his own convenience. Often, the respondents are selected because they happen to
be in the right place at the right time, where the researcher is conducting the survey. For example,
if the researcher is conducting a survey at the entrance of a shopping complex at 10 am,
the customers who arrive there at 10 am will be the respondents for his research.
The researcher can interview these respondents or he can distribute a
questionnaire for them to answer. Interviews and questionnaires are methods of data
collection.

Judgemental sampling

Judgemental sampling is the procedure of selecting respondents for research based solely on
the judgement of the researcher. The researcher selects a respondent who (in his
judgement) possesses certain characteristics that represent the population of interest.
For example, a researcher is doing a study on illegal traders who sell pirated CDs/DVDs on the
streets. He will go to the streets and look for respondents based on certain characteristics,
namely ‘illegal traders’ and ‘pirated CDs/DVDs’. How these characteristics are recognised
depends on the experience of the researcher: for example, traders who do not have a proper
shop, sell items at low prices and look suspicious all the time.

Snowball sampling

Snowball sampling is the procedure of selecting the subsequent respondent based on the
information provided by the earlier respondents, and the process continues until enough
respondents are obtained.

The researcher only needs to identify the first respondent who possesses the characteristics
required by the study. For example, in a study on victims of “get-rich-quick schemes”, there
is no sampling frame available. The researcher only needs to identify the first victim (perhaps
through an intensive interview or any other means). After the interview session
(data collection), the researcher asks the first respondent for other investors he knows
who have suffered the same fate. The process continues until enough respondents are
surveyed.
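
The referral chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `referrals` dictionary and the respondent names are hypothetical stand-ins for the contacts each respondent would name in an interview, and `snowball_sample` is a helper name invented for this sketch.

```python
from collections import deque

def snowball_sample(first_respondent, referrals, target_size):
    """Follow referrals from the first respondent until enough are found."""
    sample = [first_respondent]
    seen = {first_respondent}
    queue = deque([first_respondent])
    while queue and len(sample) < target_size:
        current = queue.popleft()
        # Each respondent names other people who share the characteristic.
        for contact in referrals.get(current, []):
            if contact not in seen:
                seen.add(contact)
                sample.append(contact)
                queue.append(contact)
                if len(sample) == target_size:
                    break
    return sample

# Hypothetical referral data: who each victim says they know.
referrals = {
    "victim A": ["victim B", "victim C"],
    "victim B": ["victim D"],
    "victim C": ["victim D", "victim E"],
    "victim D": [],
}
print(snowball_sample("victim A", referrals, target_size=4))
# ['victim A', 'victim B', 'victim C', 'victim D']
```

Note that nobody outside the first respondent's network can ever enter the sample, which is why the results cannot be generalised to the population.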

Quota sampling

Quota sampling is the procedure of selecting respondents who possess certain characteristics
determined by the study. The characteristics of interest could be the respondents' hair
styles, their dressing styles, their personal character, their hobbies, etc. The sampling process
is quite similar to convenience sampling, but the researcher's freedom is more limited: he may
choose any respondents he wants, provided they meet the stated specifications.

Probability Sampling Techniques

Probability sampling techniques are used when a researcher plans to make inferences about
the population of interest, and the sampling frame from which the sample is to be selected is
available. The sample is randomly selected from the sampling frame in such a way that
every unit in the population has a known, non-zero chance (in the simplest designs, an equal
chance) of being selected as a respondent in the study.
The type of probability sampling technique to be used depends very much on the characteristics
of the population in the study. The population could be either homogeneous
or heterogeneous.

Simple random sampling

A simple random sample is used when the population is homogeneous and the sampling
frame is available. The sample is selected from the population in such a way that each item
has the same chance of being selected as a respondent. This method is similar to a lottery
system in which the winners are selected from a pool of contestants based on the list number
without looking at their names.

To obtain a sample using this method, the researcher first needs a sampling frame or a list of
the population. Then random numbers are generated to determine which elements are to be
selected as the sample. The random numbers may be generated using a computer routine or
by using a table of random numbers. For example, a sample of size 10 is to be selected from
a list containing the names of 800 people in a certain housing area. This can be done by generating
10 distinct random numbers from 1 to 800 using the random number generator in a computer. If the
computer produces the following numbers: 5, 123, 289, 292, 349, 376, 589, 667, 698 and 754,
then the names on the list corresponding to the random numbers generated would be selected
as respondents. In this case, all 800 people have an equal chance of being selected in the study.
The selection of the 10 names is unbiased because the computer, not the researcher, generates the
numbers that determine which names on the list are chosen.
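
The selection of 10 names from a list of 800 can be sketched with Python's standard `random` module. The sampling frame below is a hypothetical list of placeholder names, and `simple_random_sample` is a helper name invented for this sketch; the numbers drawn will differ from those quoted in the text.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements, each with the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: 800 residents of a housing area.
frame = [f"Resident {i}" for i in range(1, 801)]

respondents = simple_random_sample(frame, 10, seed=2024)
print(respondents)
```

`random.sample` draws without replacement, so no resident can be selected twice.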

Systematic sampling

Systematic sampling is also used when the population is homogeneous and the sampling
frame is available. In systematic sampling, the researcher divides the population size (N) by
the intended sample size (n) to obtain the sampling interval k (k = N/n). One number is then
randomly selected from the first k elements in the list. Suppose the number 4 is selected; then
the 4th element in the sampling frame is the first respondent. The following respondents would be
the (4 + k)th, (4 + 2k)th, (4 + 3k)th, and so on until a sample of size n is obtained. In short,
a sample is obtained by randomly selecting an element from the first k elements in the
sampling frame and then every kth element thereafter. This is called a 1-in-k systematic sample
with a random start. The random start r is obtained using simple random sampling.

For example, suppose the population size is 100 and a sample of 10 is desired. In this case, the
sampling interval is k = N/n = 100/10 = 10. Firstly, a random number between 1 and 10 is
selected. If, for example, this number is 4, the sample consists of elements 4, 14, 24, 34, 44,
54, 64, 74, 84 and 94. Now, we have a sample of 10 elements obtained from the numbers 1 to 100.
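
The 1-in-k procedure can be written as a short Python function. This is a minimal sketch that assumes N is an exact multiple of n; the function name `systematic_sample` and its parameters are chosen here for illustration only.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the positions (1-based) of a 1-in-k systematic sample."""
    k = N // n  # sampling interval k = N/n
    if start is None:
        # Random start r chosen from the first k elements.
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(n)]

# Reproduce the example in the text: N = 100, n = 10, random start 4.
print(systematic_sample(100, 10, start=4))
# [4, 14, 24, 34, 44, 54, 64, 74, 84, 94]
```

Leaving out `start` lets the function pick the random start itself, which is how the procedure would be used in practice.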

Stratified sampling

Stratified sampling is used when the population is heterogeneous and the sampling
frame is available. The researcher divides this heterogeneous population into several
homogeneous strata. Then the researcher selects the sample randomly from each of
these strata. Stratified samples may be proportionate or disproportionate. Elements within
each stratum should be homogeneous, whereas the strata themselves should differ from one
another.

Stratified sampling is a two-step process. First, the population is partitioned into strata.
Stratification of the population is needed in order to obtain several homogeneous subgroups.
If we are dealing with a population that is very diverse, and if the population can be subdivided
into subgroups that are homogeneous, we will achieve better results through such subdivision
or stratification. Next, elements are selected from each stratum by a random procedure,
usually simple random sampling.

Researchers prefer stratified sampling for several reasons. Firstly, the researcher can
analyse each stratum separately and compare the results among the strata. Secondly,
statistical analysis based on stratification is more accurate, since the data within each stratum
come from a homogeneous population, whereas without stratification the data come from a
heterogeneous population.
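
The two steps of stratified sampling, partitioning and then random selection within each stratum, can be sketched as follows. The strata names and sizes are hypothetical, the allocation shown is the proportionate kind, and `stratified_sample` is a helper name invented for this sketch.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Proportionate stratified sample: simple random sample in each stratum."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Each stratum contributes in proportion to its share of the population.
        size = len(members) * sample_size // population_size
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical heterogeneous population already partitioned into strata.
strata = {
    "urban": [f"urban-{i}" for i in range(600)],
    "suburban": [f"suburban-{i}" for i in range(300)],
    "rural": [f"rural-{i}" for i in range(100)],
}
sample = stratified_sample(strata, sample_size=100, seed=42)
print({name: len(chosen) for name, chosen in sample.items()})
# {'urban': 60, 'suburban': 30, 'rural': 10}
```

The integer division gives exact counts here because every share divides evenly; with uneven shares the stratum sizes would need rounding so that they still add up to the intended sample size.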

Cluster sampling

Cluster sampling is used when the population of interest is scattered widely across a certain
geographical area and a sampling frame of clusters is available. First of all, the researcher
divides the target population into clusters based on geographical areas. Then, a random sample of
clusters is selected using a probability sampling technique such as simple random
sampling. For each of the selected clusters, the researcher could use all of its members as
respondents for the study. Cluster sampling is more economical and efficient since the
researcher focuses only on the selected clusters. If the sampling frame for individual
elements is not available or is incomplete, cluster sampling provides a good alternative method.

For example, to take a cluster sample from a large town, we begin by listing all the streets with
residential units. These streets may be considered the clusters. Next, we take a random
sample of these streets and we pick the respondents from the streets selected. This is
relatively easier to do than simple random sampling, since the researcher needs to focus only
on residents of a few streets rather than all the residents of the town.
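
The street example can be sketched in Python as follows. The town, its street names and the number of houses per street are hypothetical, and the sketch assumes every household on a selected street becomes a respondent; `cluster_sample` is a helper name invented for this sketch.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    respondents = [member for name in chosen for member in clusters[name]]
    return chosen, respondents

# Hypothetical town: 30 streets with 20 households each.
streets = {
    f"Street {i}": [f"Street {i} / House {h}" for h in range(1, 21)]
    for i in range(1, 31)
}
chosen, respondents = cluster_sample(streets, n_clusters=3, seed=7)
print(chosen)
print(len(respondents))  # 60
```

Only the list of streets is needed as a sampling frame; a list of every resident in the town is not required.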

Multi-stage sampling

Multi-stage sampling is designed to reduce time and cost when working with samples from
very large populations. Let us assume we need a random sample of 2000 residents from the
Malaysian population. Since Malaysia consists of 14 states, with many districts within each
state, and many villages within each district, we could apply the multi-stage sampling
technique. First, we select four states at random. Then, we choose five districts randomly from
each of the four states. Finally, we select 100 people at random from each of the 20 districts
chosen to make up our sample of 2000 (4 states x 5 districts in each state x 100 people from
each district). Therefore, we only collect the data from 20 relatively
small areas instead of having to visit 2000 people throughout the country.
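
The three stages (states, then districts, then people) can be sketched as nested random selections. The toy country below is entirely hypothetical: the number of districts per state and residents per district are made up so that the code runs, and `multistage_sample` is a helper name invented for this sketch.

```python
import random

def multistage_sample(country, n_states, n_districts, n_people, seed=None):
    """Select states, then districts within them, then people within those."""
    rng = random.Random(seed)
    sample = []
    for state in rng.sample(sorted(country), n_states):            # stage 1
        districts = country[state]
        for district in rng.sample(sorted(districts), n_districts):  # stage 2
            sample.extend(rng.sample(districts[district], n_people))  # stage 3
    return sample

# Hypothetical data: 14 states, 8 districts each, 500 residents per district.
country = {
    f"state{s}": {
        f"s{s}-district{d}": [f"s{s}-d{d}-resident{r}" for r in range(500)]
        for d in range(8)
    }
    for s in range(14)
}
sample = multistage_sample(country, n_states=4, n_districts=5, n_people=100, seed=1)
print(len(sample))  # 2000
```

The sample size is the product of the three stage sizes, 4 x 5 x 100 = 2000, and all fieldwork is confined to 20 districts.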

1.13 Strengths and Weaknesses of Basic Sampling Techniques

Table 1.3 provides a summary of the strengths and weaknesses of the basic sampling
techniques.

Table 1.3 Summary of the strengths and weaknesses of the basic sampling techniques.

Non-probability sampling

Technique | Strength | Weakness
Convenience sampling | Less expensive, less time consuming, convenient | Selection bias, no assurance of representativeness, not recommended for descriptive or causal research
Judgemental sampling | Less expensive, less time consuming, convenient | Does not allow generalization, subjective
Quota sampling | Sample can be controlled for certain characteristics | Selection bias, no assurance of representativeness
Snowball sampling | Can estimate rare characteristics | Time consuming

Probability sampling

Technique | Strength | Weakness
Simple random sampling | Easily applied. Results can be projected on population | Difficult to obtain sampling frame, expensive, sometimes no assurance of representativeness
Systematic sampling | Easier to implement than simple random sampling | Can decrease representativeness if certain patterns exist in sampling frame
Stratified sampling | Includes all important subpopulations, precision is improved | Difficult to select relevant stratification variables, expensive
Cluster sampling | Easy to implement, cost effective and work is reduced | Imprecise, difficult to compute and to interpret results

1.14 Data Collection Methods

The next step after the sample is identified and selected by using the appropriate sampling
technique is to determine the best way to reach the respondents in order to obtain the required
data. There are several methods of collecting data and each has its own advantages and
disadvantages. A researcher must choose the methods that provide the most information at
minimum cost. The common methods of data collection are as follows.

a) Face-to-face interview (personal interview)

b) Telephone interview

c) Direct questionnaire (questionnaires are distributed and collected personally)

d) Mail or postal questionnaire (questionnaires are sent and received back through the post)

e) Direct observation (respondents are observed and data recorded); and

f) Other methods (e-mail, video recording).

Normally, each method requires a set of prepared questions so that the data can be obtained
systematically and accurately. Individuals who respond to questionnaires or interviews are
called respondents and their responses are the required data. The person conducting the
interview is called an interviewer. A research assistant normally helps to carry out the
interview.

➢ Face-to-face interview

In a face-to-face interview, an interviewer asks the questions, normally from a questionnaire,
and records the responses. It is also known as a personal interview. This form of data
collection normally yields a high response rate. There are some advantages and
disadvantages of using a face-to-face interview.

An advantage is that the method allows an interviewer to clarify terms that the respondents
do not understand, which results in higher response rates. At the same time, an interviewer
can note the reactions of the respondents and their surrounding environment. People will
usually respond spontaneously when approached personally. As such, a well-trained
interviewer can detect if a respondent is giving false information.

On the other hand, face-to-face interviews are expensive. Interviewers must be carefully
selected and trained, and sufficient incentives must be provided to ensure that the interviewers
hired are competent and dedicated. Facial expressions and statements by interviewers can
affect responses. In addition, errors in recording responses can lead to erroneous data.

The interviewers must be supervised closely to ensure that respondents are interviewed and
that interviewers’ behaviour is appropriate. A researcher must also ensure that the
interviewers do not fill in the questionnaire themselves without conducting proper interviews.

➢ Telephone interview

In a telephone interview, an interviewer asks questions from a prepared questionnaire. These
interviews are normally short in duration. This method has some limitations because
respondents are restricted to individuals who can be reached by telephone. Another
disadvantage is that the response rates are normally lower than those of face-to-face interviews.
Furthermore, only a few questions can be asked in a telephone interview as it may not
be convenient for respondents to answer too many questions.

One advantage of telephone interviews is that they are less expensive than personal interviews. A
researcher can also monitor the interviews to ensure that specified interview procedures are
followed during the process.

➢ Direct Questionnaire

In this method, the researcher will greet respondents and explain briefly his intention before
giving the questionnaires to the respondents. The researcher will wait for the respondents to
complete the questionnaire.

➢ Mailed Questionnaire

A questionnaire is sent to each respondent with a stamped addressed envelope attached. The
respondents are requested to answer the questions in the questionnaire and return it to the
researcher within a certain period of time. Many researchers use this method of data collection
as it is the cheapest and easiest compared to other methods. The advantages and
disadvantages of postal questionnaire are listed as follows.

✓ Advantages

a) The method is cheaper than personal interviews

b) The research coverage is wider

c) No interviewer influence

d) The respondent has more time to think of proper responses

✓ Disadvantages

a) Normally, the rate of response is quite low

b) It may be biased because only particular types of people will reply

c) Nobody is on hand to explain the questions, resulting in questions being answered incorrectly or
not answered at all.

d) Only very simple questions can be asked

e) Questions may not be answered in a particular sequence and the respondent can see the
whole form before filling it in.

f) Some people may send fictitious answers

g) The questionnaire may be filled in as a team effort, so the opinions of several people are
embodied in one form

h) There may be a considerable delay before enough replies are received, and therefore analysis
may be delayed.

➢ Direct Observation

This is the most commonly used method of collecting statistical data. Direct observation is
used in work, studies and organizations. It is also used by social scientists to learn about the
customs and habits of people or communities.

This method enables the researcher to record what actually happens. It is not influenced by
what people say or think. Because the information comes from an objective source, it is not
affected by the respondents themselves.

A disadvantage is that the observer needs to be highly skilled and unbiased. Observations do
not tell us about the respondents’ intentions. Present observations do not tell us about past or
future happenings.

➢ Other Methods

Nowadays, many new techniques of gathering information are available. For example,
information can be collected through e-mail, internet surveys and the short messaging
service (SMS).

EXAMPLE 1

a) Define each of the following terms:


i) Statistics
ii) Population
iii) Sample
iv) Pilot study

b) State the measurement scale used for each of the following variables.
i) Length of a frog’s jump: Ratio / Interval
ii) Number of typing errors in a report: Ratio
iii) Students’ grade in an examination: Ordinal
iv) Marital status of employees in XYZ Enterprise: Nominal

c) A researcher wishes to conduct a study on the type of daily newspaper readership based
on race in Bintang Town, which consists of 20,000 people. The residents’ breakdown is
60% Malay, 20% Chinese, 15% Indian and 5% others. A random sample of 400 people will
be selected for this study.

i) State the population and the sampling frame of the study.
Population: All residents in Bintang Town
Sampling Frame: List of residents’ names.

ii) Identify the variable of interest for this study and state its type.
Variable of interest: Type of daily newspaper.
Type of variable: Qualitative variable.

iii) What is the sampling method used?
Stratified sampling technique.

Race | Residents’ Breakdown | Sample
Malay | 60% | 60/100 x 400 = 240
Chinese | 20% | 80
Indian | 15% | 60
Others | 5% | 20

iv) If the systematic sampling technique is used, describe the steps to select the sample for
the Malay race.
N (Malay) = 12000 and n (Malay) = 240
Interval = 12000/240 = 50
The first respondent is selected randomly from the numbers 1 to 50.
Each subsequent respondent is every 50th person after the first one selected.
Example: 5, 55, 105, … until 240 respondents are selected
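
The arithmetic in parts iii) and iv) can be checked with a short Python sketch. The figures are taken from the example; the variable names and the random start of 5 are used here only for illustration.

```python
population = 20_000
sample_size = 400
shares = {"Malay": 60, "Chinese": 20, "Indian": 15, "Others": 5}  # percent

# Part iii): proportionate allocation of the 400 respondents by race.
allocation = {race: sample_size * pct // 100 for race, pct in shares.items()}
print(allocation)
# {'Malay': 240, 'Chinese': 80, 'Indian': 60, 'Others': 20}

# Part iv): systematic sampling within the Malay stratum.
n_malay_population = population * shares["Malay"] // 100    # 12000
interval = n_malay_population // allocation["Malay"]        # 50
start = 5  # assumed random start between 1 and 50, as in the example
selected = list(range(start, n_malay_population + 1, interval))
print(selected[:3], len(selected))
# [5, 55, 105] 240
```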

EXAMPLE 2

a) Define each of the following terms:


i) Inferential Statistics
ii) Continuous Variable
iii) Qualitative Data
iv) Sampling Frame
v) Census
vi) Data Collection Method

b) A study was conducted on the amount of money spent on textbooks by students in UiTM
Perak. Five faculties were randomly selected from a total of eight faculties and every student from
the selected faculties was studied.

i) State the population and the sampling frame of the study.
Population: All students of UiTM Perak
Sampling Frame: List of students’ names or student IDs

ii) Identify the variable of interest for this study and state its type.
Variable of interest: The amount of money spent on textbooks
Type of variable: Discrete quantitative

iii) What is the sampling method used? Give ONE advantage and ONE disadvantage of the
method.
Cluster sampling
Advantage: Any relevant.
Disadvantage: Any relevant.

iv) Name the most appropriate data collection method for the above study and give ONE (1)
advantage of this method.
Method: Direct Questionnaire
Advantage: Any relevant.
