
Matters of Discussion

Introduction to Data & Analytics

Getting to Know your data and dataset.

Analytic case

Compiled for Nasscom Associate Analytics


Data Analytics in Business Intelligence
(Layers listed from top to bottom; the potential to support business decisions increases toward the top. The role shown beside a layer in the original figure is given in brackets.)
 Decision Making [End User]
 Data Presentation: Visualization Techniques [Business Analyst]
 Data Analytics: Information Discovery [Data Analyst]
 Data Exploration: Statistical Summary, Querying, and Reporting
 Data Preprocessing/Integration, Data Warehouses [DBA]
 Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems [DBA]
Data Analytics: On What Kinds of Data?
 Database-oriented data sets and applications
 Relational database, data warehouse, transactional database
 Advanced data sets and advanced applications
 Data streams and sensor data
 Time-series data, temporal data, sequence data (incl. bio-sequences)
 Structure data, graphs, social networks and multi-linked data
 Object-relational databases
 Heterogeneous databases and legacy databases
 Spatial data and spatiotemporal data
 Multimedia database
 Text databases
 The World-Wide Web

Types of Data Sets
 Record
 Relational records
 Data matrix, e.g., numerical matrix, crosstabs
 Document data: text documents represented as term-frequency vectors
 Transaction data
 Graph and network
 World Wide Web
 Social or information networks
 Molecular Structures
 Ordered
 Video data: sequence of images
 Temporal data: time-series
 Sequential Data: transaction sequences
 Genetic sequence data
 Spatial, image and multimedia:
 Spatial data: maps
 Image data
 Video data
Data Objects
 Data sets are made up of data objects.
 A data object represents an entity.
 Examples:
 sales database: customers, store items, sales
 medical database: patients, treatments
 university database: students, professors, courses
 Also called samples, examples, instances, data points, objects, tuples.
 Data objects are described by attributes.
 Database rows -> objects; columns -> attributes.
Attributes
 Attribute (or dimension, feature, variable): a data field representing a characteristic or feature of a data object.
 E.g., customer_ID, name, address
 Types:
 Nominal
 Binary
 Ordinal
 Numeric: quantitative
   Interval-scaled
   Ratio-scaled
Attribute Types
 Nominal: categories, states, or “names of things”
 Hair_color = {auburn, black, blond, brown, grey, red, white}
 marital status, occupation, zip codes
 A variable with values which have no numerical value
 Binary
 Nominal attribute with only 2 states (0 and 1)
 Symmetric binary: both outcomes equally important
 e.g., gender
 Asymmetric binary: outcomes not equally important.
 e.g., medical test (positive vs. negative)
 Convention: assign 1 to most important outcome (e.g., HIV
positive)
 Ordinal
 Values have a meaningful order (ranking), but the magnitude of the difference between successive values is not known.
 Size = {small, medium, large}, grades, army rankings
 A variable whose values can be ranked but carry no numerical magnitude
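The attribute types above map fairly naturally onto R data types. The following is a minimal sketch, assuming made-up values; the variable names (hair_color, test_positive, size) are illustrative and not part of the original slides.

```r
# Nominal: unordered categories, stored as a plain factor.
hair_color <- factor(c("black", "brown", "red", "brown"))

# Binary: two states coded 0/1 (1 = the outcome of interest).
test_positive <- c(0L, 1L, 0L, 0L)

# Ordinal: ordered categories; comparisons are meaningful,
# but differences between levels are not.
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)

print(is.ordered(hair_color))  # nominal factors carry no order
print(size[2] > size[3])       # "large" > "medium" is a valid comparison
print(table(hair_color))       # counting is the main operation for nominal data
```

Declaring the level order explicitly is what lets R compare ordinal values; without ordered = TRUE the comparison would not be meaningful.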
Numeric Attribute Types
 Quantity (integer or real-valued)
 Interval
 Measured on a scale of equal-sized units
 Values have order
 E.g., temperature in °C or °F, calendar dates
 No true zero-point
 Ratio
 Inherent zero-point
 We can speak of one value as being a multiple (or ratio) of another (10 K is twice as high as 5 K).
 E.g., temperature in Kelvin, length, counts, monetary quantities
Discrete vs. Continuous Attributes
 Discrete Attribute
 Has only a finite or countably infinite set of values
 E.g., zip codes, profession, or the set of words in a collection of documents
 Sometimes represented as integer variables
 Note: Binary attributes are a special case of discrete attributes
 Continuous Attribute
 Has real numbers as attribute values
 E.g., temperature, height, or weight
 Practically, real values can only be measured and represented using a finite number of digits
 Continuous attributes are typically represented as floating-point variables
• Document database
 A document can be represented by thousands of
attributes, each recording the frequency of a particular
word (such as keywords) or phrase in the document.
 Term frequency (TF) means how often a term occurs in
a document.

• Term frequency dataset
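A term-frequency dataset can be sketched in a few lines of base R. The three documents and their words below are hypothetical, chosen only to show the shape of the resulting matrix (one row per document, one column per term).

```r
# Hypothetical mini-corpus; the documents are made up for illustration.
docs <- c(
  doc1 = "team coach play ball score game ball",
  doc2 = "coach ball score game win",
  doc3 = "team play lost season"
)

# Split each document into lowercase words.
tokens <- strsplit(tolower(docs), " ", fixed = TRUE)

# The vocabulary is the sorted set of distinct words across all documents.
vocabulary <- sort(unique(unlist(tokens)))

# Count how often each vocabulary word occurs in each document.
counts <- vapply(
  tokens,
  function(words) tabulate(match(words, vocabulary), nbins = length(vocabulary)),
  integer(length(vocabulary))
)

# Rows = documents, columns = terms.
tf <- t(counts)
colnames(tf) <- vocabulary
print(tf)
```

Each row of tf is the term-frequency vector of one document, which is the representation used by similarity measures such as the cosine measure.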

Time series Data
 A time series is a series of data points
indexed (or listed or graphed) in time order.
 Most commonly, a time series is a sequence
taken at successive equally spaced points in
time.
 Thus it is a sequence of discrete-time data.
 Examples of time series are heights of ocean tides, the value of the USD, market share values, and many more.
Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]
Example of Time Series Data
Field               Example topics
Economics           Gross Domestic Product (GDP), Consumer Price Index (CPI), S&P 500 Index, and unemployment rates
Social sciences     Birth rates, population, migration data, political indicators
Epidemiology        Disease rates, mortality rates, mosquito populations
Medicine            Blood pressure tracking, weight tracking, cholesterol measurements, heart rate monitoring
Physical sciences   Global temperatures, monthly sunspot observations, pollution levels
Time Series Components
 Trend: the general tendency of the data to increase or decrease over a long period of time.
 It is the long-term movement in a time series, excluding calendar-related and irregular effects.
 Examples: population growth, price inflation, and general economic changes.
Time Series Components: Seasonality
How do we identify seasonality or a seasonal pattern?
 Look for calendar-related effects that repeat with a fixed period.
 Example: the large seasonal increase in December retail sales due to Christmas shopping.
 In some series the magnitude of the seasonal component increases over time, as the trend does.
[Figure: a periodic time series]
Time Series Components--Cycle
 A cyclic pattern exists when data exhibit rises and
falls that are not of fixed period.
 The duration of these fluctuations is usually of at
least 2 years.
 If the fluctuations are not of a fixed period, they are cyclic; if the period is fixed, they are seasonal.
[Figure: within the time interval shown, the fluctuations do not have a fixed period]
Time Series Components-summary
These components are defined as follows:
 Level: The average value in the series.
 Trend: The increasing or decreasing value in
the series.
 Seasonality: The repeating short-term cycle
in the series.
 Cyclic: data exhibit rises and falls that are not
of fixed period
 Noise: The random variation in the series.
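These components can be estimated with R's built-in decompose() function. The sketch below assumes the AirPassengers dataset that ships with base R and a multiplicative model; both choices are illustrative. Classical decomposition does not separate a cyclic component, so any cycle ends up inside the estimated trend.

```r
# AirPassengers: monthly airline passenger totals (base R 'datasets' package).
series <- AirPassengers

# A multiplicative model suits series whose seasonal swings grow with the trend.
parts <- decompose(series, type = "multiplicative")

# Level: the average value in the series.
print(mean(series))

# Trend: moving-average estimate (NA at both ends of the series).
print(head(na.omit(parts$trend)))

# Seasonality: one index per month, repeated every year.
print(round(parts$figure, 2))

# Noise: what remains after removing trend and seasonality.
print(head(na.omit(parts$random)))
```

Calling plot(parts) draws the observed series and the three estimated components in stacked panels.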

Specific Data Analytic case

Cosine Similarity in Data Analytic Apps
• A document can be represented by thousands of attributes, each
recording the frequency of a particular word (such as keywords) or
phrase in the document.

• Other vector objects: gene features in micro-arrays, …


• Applications: information retrieval, biologic taxonomy, gene
feature mapping, ...
• Cosine measure: If d1 and d2 are two vectors (e.g., term-frequency vectors), then
$\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|}$,
where $\cdot$ indicates the vector dot product and $\|d\|$ is the length of vector d.
Example: Cosine Similarity
• $\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|}$,
where $\cdot$ indicates the vector dot product and $\|d\|$ is the length of vector d.

• Ex: Find the similarity between documents 1 and 2.

d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)

d1 · d2 = 5*3+0*0+3*2+0*0+2*1+0*1+0*0+2*1+0*0+0*1 = 25
||d1|| = (5*5+0*0+3*3+0*0+2*2+0*0+0*0+2*2+0*0+0*0)^0.5 = (42)^0.5 = 6.481
||d2|| = (3*3+0*0+2*2+0*0+1*1+1*1+0*0+1*1+0*0+1*1)^0.5 = (17)^0.5 = 4.123
cos(d1, d2) = 25 / (6.481 × 4.123) = 0.94
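The worked example can be checked with a short R function. This is a minimal sketch; the function name cosine_similarity is an illustrative choice, not a built-in.

```r
# Cosine similarity: dot product divided by the product of the vector lengths.
cosine_similarity <- function(a, b) {
  stopifnot(length(a) == length(b))
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Term-frequency vectors of documents 1 and 2 from the example.
d1 <- c(5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 <- c(3, 0, 2, 0, 1, 1, 0, 1, 0, 1)

dot_product <- sum(d1 * d2)          # 25
similarity  <- cosine_similarity(d1, d2)

print(dot_product)
print(round(similarity, 2))          # 0.94
```

A value close to 1 means the two documents use words in similar proportions; a value of 0 means they share no terms.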
TASK FOR YOU—A2
1. Investigate the attribute types (also called dimensions, features, or variables) using a suitable scenario, and prepare a critical report.
 Nominal
 Binary
 Ordinal
 Numeric: quantitative
V.V.I
1. Investigate the time series components in the context of business analytics and application modeling.
2. Types of Data Sets and the Data Object concept
3. Document database: term-frequency dataset
4. Cosine Similarity in Data Analytic Apps
Cheers For the Great Patience!
Query Please?

Matters of Discussion
Brief Evolution of DSA
DW-OLAP
KDD
Data mining & analytics
Core Tasks, Apps & Algorithms
Brief Evolution of DSA
Year/Duration   Features Included in DSA
1960            Data science as a substitute for CS
1974            DS as data processing methods
1977            Exploratory data analysis
1989-1996       Data classification, mining, and knowledge discovery
1997-2001       Statistical computing, KDD
2005            Analytics and fact-based decisions
2010-11         Statistics & machine learning
2012 to date    IoTA, cognitive learning, big data analytics
DSA For Analytics and Applications

Information Framework for IoT-DSA.


TECHNOLOGIES & SUPPORTED TOOLS / PROCESSES / FRAMEWORKS / TASKS / APPLICATIONS
DW-OLAP
 OLAP plays the role for a data warehouse (DW) that SQL plays for a DBMS.
 OLAP is the dynamic synthesis and analysis of large volumes of multi-dimensional data.
 OLAP uses a multi-dimensional view of aggregate data to support analysis and forecasting.
 OLAP answers the question: what is happening?
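The idea of a multi-dimensional view of aggregate data can be sketched in base R with xtabs() and apply(). The sales table below is hypothetical; the column names (region, quarter, product, amount) are assumptions made for the example, and this is not how a real OLAP server is implemented.

```r
# Hypothetical sales facts; all values are made up for illustration.
sales <- data.frame(
  region  = c("East", "East", "West", "West", "East", "West"),
  quarter = c("Q1",   "Q2",   "Q1",   "Q2",   "Q1",   "Q2"),
  product = c("A",    "A",    "A",    "B",    "B",    "B"),
  amount  = c(100,    150,    80,     120,    60,     90)
)

# Build a 3-dimensional cube: region x quarter x product.
cube <- xtabs(amount ~ region + quarter + product, data = sales)

# Roll-up: aggregate away the product dimension.
by_region_quarter <- apply(cube, c(1, 2), sum)

# Roll-up further: totals per region only.
by_region <- apply(cube, 1, sum)

print(by_region_quarter)
print(by_region)
```

Each apply() call summarizes the same facts at a coarser level, which is the kind of "what is happening?" question OLAP is designed to answer.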
Multi-dimensional data

[Figure: one-dimensional and two-dimensional views of data]
OLAP Architecture

OLAP Applications

1. Finance: Budgeting, activity-based costing, financial performance analysis, and financial modeling.
2. Sales: Sales analysis and sales forecasting.
3. Marketing: Market research analysis, sales forecasting, promotions analysis, customer analysis, and market/customer segmentation.
4. Manufacturing: Production planning and defect analysis.
OLAP Limitations
 Limitation: OLAP cannot answer predictive or explanatory questions:
 What will happen in the future?
 Why does it happen?
 This limitation is addressed by KDD.
KDD process
 KDD stands for knowledge discovery in databases.
 KDD finds useful information, knowledge, and patterns in data.
 Data mining is the step of the KDD process that uses algorithms to extract information and patterns.
KDD process (cont.)
 ANN/machine learning: transforms a database into a knowledge-based system; it is one family of data mining techniques.
 Data mining is a part of KDD.
 KDD process: selection (obtain data from the source), preprocessing (data cleaning), transformation (into the desired data format), data mining (obtain the desired result), interpretation (present the result to the user meaningfully).
 Data mining: the computational process of discovering patterns in large data sets.
 It integrates artificial intelligence, machine learning, statistics, and database systems.
 Knowledge Discovery in Databases (KDD) process:
1. Data selection
2. Pre-processing (attribute extraction and normalization)
3. Transformation: transform data into the desired format.
4. Data mining: discover patterns.
5. Interpretation/Evaluation
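The five KDD steps can be walked through on a small built-in dataset. The sketch below assumes the iris data from base R and uses k-means clustering as the mining step; both are illustrative choices, not prescribed by the slides.

```r
# 1. Selection: pick the attributes of interest from the source data.
selected <- iris[, c("Sepal.Length", "Petal.Length")]

# 2. Pre-processing: drop rows with missing values (data cleaning).
keep    <- complete.cases(selected)
cleaned <- selected[keep, ]

# 3. Transformation: put attributes on a common scale (z-scores).
transformed <- scale(cleaned)

# 4. Data mining: discover groups with k-means (3 clusters assumed).
set.seed(42)
fit <- kmeans(transformed, centers = 3, nstart = 10)

# 5. Interpretation/Evaluation: compare clusters with the known species.
summary_table <- table(cluster = fit$cluster, species = iris$Species[keep])
print(summary_table)
```

The cross-table in step 5 is one simple way to present the mined pattern to a user; in a real project this step would also include judging whether the pattern is useful.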
Data mining Core Tasks, Apps &
Algorithms
1. Classification task: identifying the category to which an object belongs.
Applications: e-mail spam detection, image recognition.
Algorithms: SVM, nearest neighbors, random forest.
2. Regression task: predicting a continuous-valued attribute associated with an object.
Applications: drug response, stock prices.
Algorithms: SVR, ridge regression.
3. Clustering task: automatic grouping of similar objects into sets.
Applications: customer segmentation, grouping experiment outcomes.
Algorithms: k-Means, spectral clustering.
4. Dimensionality reduction task: reducing the number of attributes by choosing or constructing a good set of attributes.
Applications: visualization, increased efficiency.
Algorithms: PCA, feature selection.
5. Anomaly detection (outlier/change/deviation detection): the identification of unusual data records that might be interesting, or of data errors that require further investigation.
6. Association rule learning (dependency modeling): searches for relationships between variables.
For example, a supermarket might gather data on
customer purchasing habits. Using association rule
learning, the supermarket can determine which
products are frequently bought together and use
this information for marketing purposes. This is
sometimes referred to as market basket analysis.
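The core counting step of market basket analysis can be sketched in base R. The four baskets below are hypothetical, and the code only computes pair support and one rule confidence; it is not a full association rule miner such as Apriori.

```r
# Hypothetical shopping baskets; items are made up for illustration.
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter")
)

items <- sort(unique(unlist(baskets)))

# Incidence matrix: one row per basket, one column per item (1 = present).
incidence <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(incidence) <- items

# Co-occurrence counts: how many baskets contain both items of a pair.
co_occurrence <- crossprod(incidence)

# Support: fraction of baskets containing the pair.
support <- co_occurrence / length(baskets)

# Confidence of the rule bread -> milk:
# baskets with bread and milk divided by baskets with bread.
confidence_bread_milk <- co_occurrence["bread", "milk"] / co_occurrence["bread", "bread"]

print(support)
print(confidence_bread_milk)
```

Pairs with high support and high confidence are the candidates a supermarket would consider for joint promotions.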

pcai.com
• Knowledge Based Systems, AI Languages,
Neural Networks, Machine Learning, Genetic
Algorithms, Evolutionary Software, Expert
Systems, Fuzzy Logic, Data Mining, Intelligent
Agents, Business Rules, Case-Based
Reasoning, Common Sense, Data
Visualization, Inferencing, Forecasting, Pattern
Matching, Speech, Rule-Based Systems, Text
Mining, Vision, Robotics.

Analytics is a never-ending process.
Analytics is the major part of data science. It is a never-ending process because both technological requirements and business requirements keep changing.
The beauty of analytics is that two data scientists working on the same problem may come up with two different new solutions.
[Figure: Gartner's Hype Cycle for Advanced Analytics and Data Science, 2015]
Time to explore [Activity-01]
Investigate the data mining and analytics core tasks, applications, and algorithms, and prepare an investigation report.
RBT – Revised Bloom’s Taxonomy
KL1 – Remember,
KL2-Understand,
KL3-Apply,
KL4-Analyse,
KL5-Evaluate,
KL6-Create
CO – Course Outcome

Cheers For the Great Patience!
Query Please?

Notes Unit 8: Mean, Median, Standard Deviation
I. Mean and Median
The MEAN is the numerical average
of the data set.
The mean is found by adding all the
values in the set, then dividing the
sum by the number of values.
The MEDIAN is the number that is in the middle of a set of data.
1. Arrange the numbers in the set in order from least to greatest.
2. Then find the number that is in the middle.
Ex 1: These are Abby's science test scores. Find the mean and median.

86 97 84 73 63 88 97 100 95

Let's find Abby's MEAN science test score.
97 + 84 + 88 + 100 + 95 + 63 + 73 + 86 + 97 = 783
783 ÷ 9 = 87
The mean is 87.

To find the median, order the scores from least to greatest:
63 73 84 86 88 95 97 97 100
The median is 88. Half the numbers are less than the median, and half the numbers are greater than the median.

Median sounds like MEDIUM: think middle when you hear median.
How do we find
the MEDIAN
when two numbers are in the middle?

1. Add the two numbers.

2. Then divide by 2.
Ex 2: Find the median.

63 73 84 88 95 97 97 100

The two middle numbers are 88 and 95.
88 + 95 = 183
183 ÷ 2 = 91.5
The median is 91.5.
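Both examples can be checked in R with the built-in mean() and median() functions. This is a minimal sketch; the variable names are illustrative.

```r
# Ex 1: Abby's nine science test scores.
scores <- c(86, 97, 84, 73, 63, 88, 97, 100, 95)
print(sum(scores))     # 783
print(mean(scores))    # 87
print(median(scores))  # 88

# Ex 2: eight values, so the median is the average of the two middle ones.
even_scores <- c(63, 73, 84, 88, 95, 97, 97, 100)
print(median(even_scores))  # 91.5
```

median() sorts the values internally, so the input does not need to be ordered first.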
II. Standard Deviation
A. Definition and Notation
Standard Deviation shows the
variation in data. If the data is close
together, the standard deviation will
be small. If the data is spread out, the
standard deviation will be large.
Standard Deviation is often denoted by the lowercase Greek letter sigma, σ.
B. Bell Curve: The bell curve, which represents a normal distribution of data, shows what standard deviation represents.
One standard deviation away from the mean (μ) in either direction on the horizontal axis accounts for around 68 percent of the data. Two standard deviations away from the mean account for roughly 95 percent of the data, and three standard deviations account for about 99.7 percent of the data.
C. Steps to Finding
Standard Deviation
1) Find the mean of the data.
2) Subtract the mean from each value.
3) Square each deviation of the mean.
4) Find the sum of the squares.
5) Divide the total by the number of
items.
6)Take the square root.
D. Standard Deviation Formula
The standard deviation formula can be represented using Sigma Notation:
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$
The expression $\frac{\sum (x - \mu)^2}{n}$ under the radical is called the 'variance'.
The standard deviation is the square root of the variance.
Ex 1: Find the standard deviation
The math test scores of five students
are: 92,88,80,68 and 52.
1) Find the mean: (92+88+80+68+52)/5 = 76.
2) Find the deviation from the mean:
92-76=16
88-76=12
80-76=4
68-76= -8
52-76= -24
3) Square the deviation from the mean:
(16)^2 = 256
(12)^2 = 144
(4)^2 = 16
(-8)^2 = 64
(-24)^2 = 576
4) Find the sum of the squares of the deviation from the mean:
256+144+16+64+576 = 1056
5) Divide by the number of data items to get the variance:
1056/5 = 211.2
6) Find the square root of the variance: √211.2 ≈ 14.53
Thus the standard deviation of the test scores is 14.53.
Ex 2: Standard Deviation

A different math class took the same test with these five test scores: 92, 92, 92, 52, 52.
Find the standard deviation for this class.
Remember:
1) Find the mean of the data.
2) Subtract the mean from each value.
3) Square each deviation of the mean.
4) Find the sum of the squares.
5) Divide the total by the number of
items.
6)Take the square root.
The math test scores of five students
are: 92,92,92,52 and 52.
1) Find the mean: (92+92+92+52+52)/5 = 76
2) Find the deviation from the mean:
92-76=16
92-76=16
92-76=16
52-76= -24
52-76= -24
3) Square the deviation from the mean:
(16)^2 = 256
(16)^2 = 256
(16)^2 = 256
(-24)^2 = 576
(-24)^2 = 576
4) Find the sum of the squares:
256+256+256+576+576 = 1920
5) Divide the sum of the squares by the number of items to get the variance:
1920/5 = 384
6) Find the square root of the variance: √384 ≈ 19.6
Thus the standard deviation of the second set of test scores is 19.6.
III. Analyzing the Data:
Consider both sets of scores. Both classes have the same mean, 76. However, the classes do not have the same scores, so we use the standard deviation to show the variation in the scores. With a standard deviation of 14.53 for the first class and 19.6 for the second class, what does this tell us?
Class A: 92,88,80,68,52
Class B: 92,92,92,52,52
With a standard deviation of 14.53 for the first class and 19.6 for the second class, the scores from the second class are more spread out than the scores in the first class.
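The six steps can be written as a one-line R function. The helper name population_sd is an illustrative choice; it divides by n as in the formula above, whereas R's built-in sd() divides by n - 1 and therefore gives a slightly larger value.

```r
# Population standard deviation: square root of the mean squared deviation.
population_sd <- function(x) sqrt(mean((x - mean(x))^2))

class_a <- c(92, 88, 80, 68, 52)
class_b <- c(92, 92, 92, 52, 52)

# Both classes share the same mean.
print(mean(class_a))  # 76
print(mean(class_b))  # 76

# Dividing by n reproduces the hand calculation.
print(round(population_sd(class_a), 2))  # 14.53
print(round(population_sd(class_b), 2))  # 19.6

# Built-in sd() divides by n - 1 (sample standard deviation).
print(sd(class_a))
```

Either convention leads to the same conclusion here: class B is more spread out than class A.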
Summary:
The mean is the average, and the
median is the number in the middle
when you order all the numbers from
least to greatest.
As we have seen, standard deviation
measures the dispersion of data.
The greater the value of the
standard deviation, the further the
data tend to be dispersed from the
mean.
Matters of Discussion
Refresh basic statistics: mean, median, standard deviation, variance, correlation, covariance
R implementations
Statistical analysis in R
 Statistical analysis in R is performed using many built-in functions.
 Most of these functions are part of the R base package.
 These functions take an R vector as input, along with other arguments, and return the result.
Mean
The mean is calculated by taking the sum of the values and dividing by the number of values in a data series.
The function mean() is used to calculate this in R.
Syntax
The basic syntax for calculating the mean in R is:
mean(x, trim = 0, na.rm = FALSE, ...)
Following is the description of the parameters used:
x is the input vector.
trim is the fraction (0 to 0.5) of observations to drop from each end of the sorted vector before the mean is computed.
na.rm is a logical value indicating whether missing values (NA) should be removed from the input vector before the computation.
Example
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
result.mean <- mean(x)
print(result.mean)
Output:
[1] 8.22
Applying Trim Option
 When the trim parameter is supplied, the values in the vector are sorted and the required number of observations is dropped from each end before the mean is calculated.
 When trim = 0.3 and the vector has 10 values, floor(10 × 0.3) = 3 values are dropped from each end.
 In this case the sorted vector is (−21, −5, 2, 3, 4.2, 7, 8, 12, 18, 54), and the values removed are (−21, −5, 2) from the left and (12, 18, 54) from the right. The mean of the remaining values (3, 4.2, 7, 8) is 5.55.
Example
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
result.mean <- mean(x, trim = 0.3)
print(result.mean)
Output:
[1] 5.55
Applying NA Option
 If there are missing values, the mean function returns NA.
 To drop the missing values from the calculation, use na.rm = TRUE, which means remove the NA values.
Example
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5,NA)

# Find mean.
result.mean <- mean(x)
print(result.mean)

# Find mean dropping NA values.
result.mean <- mean(x, na.rm = TRUE)
print(result.mean)
Output:
[1] NA
[1] 8.22
Median
The middle value in a sorted data series is called the median. The median() function is used in R to calculate this value.
Syntax
The basic syntax for calculating the median in R is:
median(x, na.rm = FALSE)
Following is the description of the parameters used:
x is the input vector.
na.rm is a logical value indicating whether missing values should be removed from the input vector.
Example
# Create the vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find the median.
median.result <- median(x)
print(median.result)
Output:
[1] 5.6
Mode
 The mode is the value that has the highest number of occurrences in a set of data.
 Unlike the mean and median, the mode can be found for both numeric and character data.
 R does not have a standard built-in function to calculate the mode.
 So we create a user function to calculate the mode of a data set in R.
 This function takes a vector as input and gives the mode value as output.
Example
# Create the function.
getmode <- function(v) {
  uniqv <- unique(v)
  # Count occurrences of each unique value; which.max() returns the first
  # maximum, so ties resolve to the value that appears first in v.
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Create the vector with numbers.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)

# Calculate the mode using the user function.
result <- getmode(v)
print(result)

# Create the vector with characters.
charv <- c("o","it","the","it","it")

# Calculate the mode using the user function.
result <- getmode(charv)
print(result)
Output:
[1] 2
[1] "it"
standard deviation
 ‘Standard deviation is the measure of the
dispersion of the values’.
 The higher the standard deviation, the wider
the spread of values.
 The lower the standard deviation, the
narrower the spread of values.
 In simple terms, the standard deviation is the square root of the 'variance'.
 Variance is defined as the average of the squared differences between the observed values and their expected value (the mean).
Standard deviation in R
# Create a vector 'x' with some values in it.
x <- c(34,56,87,65,34,56,89)
# Calculate the standard deviation of the values in 'x'.
# Note: sd() divides by n - 1 (sample standard deviation).
sd(x)
--------------------------------------------
We create a vector 'x' holding some values, then use sd() to find the standard deviation of those values.
Computing variance of a vector
# Enter the data.
y <- c(445, 530, 540, 510, 570, 530, 545, 545, 505,
       535, 450, 500, 520, 460, 430, 520, 520, 430,
       535, 535, 475, 545, 420, 495, 485, 570, 480,
       495, 470, 490)
# Calculate the variance and the standard deviation.
var(y)
sd(y)
Covariance and Correlation in R Programming
 Covariance and Correlation are terms used in
statistics to measure relationships between
two random variables.
 Both of these terms measure linear
dependency between a pair of random
variables or bivariate data.
 Y is the response (dependent) variable; X is the predictor (independent) variable.
Y=aX+
Covariance
 In R programming, covariance can be measured using the cov() function.
 Covariance is a statistical term used to measure the direction of the linear relationship between two data vectors.
 Mathematically, the sample covariance (which is what cov() computes) is
$\mathrm{cov}(A, B) = \frac{\sum_{i=1}^{n} (A_i - \bar{A})(B_i - \bar{B})}{n - 1}$
where,
A represents the A data vector
B represents the B data vector
$\bar{A}$ is the mean of the A data vector
$\bar{B}$ is the mean of the B data vector
n is the number of observations
Cont..
# Data vectors
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)
# Print covariance using different methods
print(cov(x, y))
print(cov(x, y, method = "pearson"))
Output:
[1] 30.66667
[1] 30.66667
Correlation
 Correlation is a statistical measure that builds on covariance to indicate how strongly two vectors are related.
 Syntax: cor(x, y, method)
 x and y represent the data vectors
 method defines the type of correlation coefficient to compute. Default is "pearson".
 Covariance indicates the direction of the linear relationship between variables, while correlation measures both the strength and direction of the linear relationship between two variables.
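The link between the two measures can be made concrete: Pearson correlation is the covariance divided by the product of the two standard deviations. The sketch below checks this in R on a pair of small illustrative vectors; the name manual_r is an assumption for the example.

```r
# Data vectors
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Pearson correlation rebuilt from covariance and standard deviations.
# cov() and sd() both use the n - 1 denominator, so the factors cancel.
manual_r <- cov(x, y) / (sd(x) * sd(y))

print(manual_r)
print(cor(x, y))

# The two values agree up to floating-point tolerance.
print(all.equal(manual_r, cor(x, y)))
```

Because of the division by the standard deviations, correlation is unit-free and always lies between -1 and +1, whereas covariance depends on the units of x and y.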
Correlation(cont..)
 Correlation means association - more
precisely it is a measure of the extent to
which two variables are related.
 There are three possible results of a
correlational study:
 a positive correlation,
 a negative correlation,
 no correlation.

Correlation(cont..)
 A positive correlation is a relationship between two variables in which both variables move in the same direction.
 That is, one variable increases as the other increases, or one decreases as the other decreases.
 An example of positive correlation would be height and weight.
 Taller people tend to be heavier.
Correlation(cont..)
 A negative correlation is a relationship
between two variables in which an increase
in one variable is associated with a decrease
in the other.
 An example of negative correlation would be
height above sea level and temperature.
 As you climb the mountain (increase in
height) it gets colder (decrease in
temperature).
Correlation(cont..)
 A zero correlation exists when there is no
relationship between two variables.
 For example there is no relationship between the
amount of tea drunk and level of intelligence.

Guidelines to interpreting Pearson's correlation coefficient
 Pearson's correlation coefficient is a measure of the strength of a linear association between two variables.
 The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of 0 indicates that there is no linear association between the two variables.
 For example, suppose r = .67 for height and basketball performance. That is, as height increases, so does basketball performance.

Strength of Association   Coefficient r (positive)   Coefficient r (negative)
Small                     .1 to .3                   -0.1 to -0.3
Medium                    .3 to .5                   -0.3 to -0.5
Large                     .5 to 1.0                  -0.5 to -1.0
Cont..
# Data vectors
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Print correlation using different methods
print(cor(x, y))
print(cor(x, y, method = "pearson"))
Output:
[1] 0.9724702
[1] 0.9724702
Example
# R program to illustrate
# Pearson correlation testing
# using cor()

# Take two numeric vectors of the same length.
x = c(1, 2, 3, 4, 5, 6, 7)
y = c(1, 3, 6, 2, 7, 4, 5)

# Calculate the correlation coefficient using cor().
result = cor(x, y, method = "pearson")

# Print the result. cat() is used because print() does not
# concatenate a label and a value.
cat("Pearson correlation coefficient is:", result, "\n")
Output:
Pearson correlation coefficient is: 0.5357143
Correlation measures the linear relationship between objects.
ACTIVITY-4(LAB-01)
Investigate the R implementations of mean, median, standard deviation, variance, correlation, and covariance.
Please practice the above statistical computations in RStudio, prepare a report containing screenshots of all your practice sessions along with relevant analysis, and finally upload it to the respective Google Classroom assignment section.
Cheers For the Great Patience!
Query Please?

Data Preprocessing

 Major Tasks in Data Preprocessing: An Overview
 Data Cleaning
 Data Integration
 Data Reduction
 Data Transformation and Data Discretization
 Summary
Major Tasks in Data Preprocessing
 Data cleaning
 Fill in missing values, smooth noisy data, identify or remove
outliers, and resolve inconsistencies
 Data integration
 Integration of multiple databases, data cubes, or files
 Data reduction
 Dimensionality reduction
 Numerosity reduction
 Data transformation and data discretization
 Normalization
 Concept hierarchy generation

Data Cleaning
 Data in the Real World Is Dirty: Lots of potentially incorrect data, e.g., faulty instruments, human or computer error, transmission error
 incomplete: lacking attribute values, lacking certain attributes of
interest, or containing only aggregate data
 e.g., Occupation=“ ” (missing data)
 noisy: containing noise, errors, or outliers
 e.g., Salary=“−10” (an error)
 inconsistent: containing discrepancies in codes or names, e.g.,
 Age=“42”, Birthday=“03/07/2010”
 Was rating “1, 2, 3”, now rating “A, B, C”
 discrepancy between duplicate records
 Intentional (e.g., disguised missing data)
 Jan. 1 as everyone’s birthday?
Incomplete (Missing) Data

 Data is not always available
 E.g., many tuples have no recorded value for several attributes, such as customer income in sales data
 Missing data may be due to
 data that was inconsistent with other recorded data and thus deleted
 data not entered due to misunderstanding
 certain data not being considered important at the time of entry
 history or changes of the data not being registered
 Missing data may need to be inferred
How to Handle Missing Data?
 Ignore the tuple: usually done when class label is missing
(when doing classification)—not effective when the % of
missing values per attribute varies considerably
 Fill in the missing value manually: tedious + infeasible?
 Fill it in automatically with
 a global constant : e.g., “unknown”, a new class?!
 the attribute mean
 the attribute mean for all samples belonging to the
same class: smarter
 the most probable value: inference-based such as
Bayesian formula or decision tree
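Two of the automatic fill-in strategies (the attribute mean, and the attribute mean within the same class) can be sketched in base R. The data frame below is hypothetical; the column names class and income are assumptions made for the example.

```r
# Hypothetical sales records with missing income values.
sales <- data.frame(
  class  = c("A", "A", "A", "B", "B", "B"),
  income = c(40, NA, 60, 100, 120, NA)
)

missing <- is.na(sales$income)

# Strategy 1: fill with the overall attribute mean.
global_mean   <- mean(sales$income, na.rm = TRUE)
filled_global <- ifelse(missing, global_mean, sales$income)

# Strategy 2: fill with the mean of samples in the same class (smarter).
class_mean <- ave(sales$income, sales$class,
                  FUN = function(v) mean(v, na.rm = TRUE))
filled_by_class <- ifelse(missing, class_mean, sales$income)

print(filled_global)
print(filled_by_class)
```

With these values the class-wise fill gives 50 for class A and 110 for class B, which respects the difference between the classes, while the global fill gives 80 for both.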
Noisy Data
 Noise: random error or variance in a measured variable
 Incorrect attribute values may be due to
 faulty data collection instruments
 data entry problems
 data transmission problems
 technology limitation
 inconsistency in naming convention
 Other data problems which require data cleaning
 duplicate records
 incomplete data
 inconsistent data
How to Handle Noisy Data?
 Binning
 first sort data and partition into (equal-frequency) bins
 then one can smooth by bin means, smooth by bin median, etc.
 Regression
 smooth by fitting the data into regression functions
 Clustering
 detect and remove outliers
 Combined computer and human inspection
 detect suspicious values and check by human (e.g., deal with possible outliers)
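The binning approach can be illustrated with a short sketch. The price values and the function name are illustrative assumptions, and the sketch assumes the number of values divides evenly into the number of bins.

```python
def smooth_by_bin_means(values, n_bins):
    """Equal-frequency binning followed by smoothing by bin means."""
    ordered = sorted(values)
    size = len(ordered) // n_bins  # assumes len(values) is divisible by n_bins
    smoothed = []
    for start in range(0, len(ordered), size):
        bin_values = ordered[start:start + size]
        bin_mean = sum(bin_values) / len(bin_values)
        # Every value in the bin is replaced by the bin mean.
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

prices = [15, 4, 21, 8, 28, 21, 34, 24, 25]  # illustrative values
print(smooth_by_bin_means(prices, 3))
```

Smoothing by bin median would follow the same pattern, with the median of each bin in place of the mean.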

Data Cleaning as a Process
 Data discrepancy detection
 Use metadata (e.g., domain, range, dependency, distribution)
 Check field overloading
 Check uniqueness rule, consecutive rule and null rule
 Use commercial tools
 Data scrubbing: use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections
 Data auditing: analyze data to discover rules and relationships and to detect violators (e.g., correlation and clustering to find outliers)
 Data migration and integration
 Data migration tools: allow transformations to be specified
 ETL (Extraction/Transformation/Loading) tools: allow users to specify transformations through a graphical user interface
 Integration of the two processes
 Iterative and interactive (e.g., Potter’s Wheel)
 Data Preprocessing: An Overview

 Major Tasks in Data Preprocessing

 Data Cleaning

 Data Integration

 Data Reduction

 Data Transformation and Data Discretization

 Summary

Data Integration
 Data integration:
 Combines data from multiple sources into a coherent store
 Schema integration: e.g., A.cust-id ≡ B.cust-#
 Integrate metadata from different sources
 Entity identification problem:
 Identify real world entities from multiple data sources, e.g., Bill
Clinton = William Clinton
 Detecting and resolving data value conflicts
 For the same real world entity, attribute values from different
sources are different
 Possible reasons: different representations, different scales, e.g.,
metric vs. British units
Handling Redundancy in Data Integration
 Redundant data often occur when multiple databases are integrated
 Object identification: The same attribute or object
may have different names in different databases
 Derivable data: One attribute may be a “derived”
attribute in another table, e.g., annual revenue
 Redundant attributes may be able to be detected by
correlation analysis and covariance analysis
 Careful integration of the data from multiple sources may
help reduce/avoid redundancies and inconsistencies and
improve mining speed and quality
Correlation Analysis (Nominal Data)
 Χ2 (chi-square) test:
\[ \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \]
 The larger the Χ2 value, the more likely the variables are
related
 The cells that contribute the most to the Χ2 value are
those whose actual count is very different from the
expected count
 Correlation does not imply causality
 # of hospitals and # of car-theft in a city are correlated
 Both are causally linked to the third variable: population

Chi-Square Calculation: An Example

                          | Play chess | Not play chess | Sum (row)
Like science fiction      | 250 (90)   | 200 (360)      | 450
Not like science fiction  | 50 (210)   | 1000 (840)     | 1050
Sum (col.)                | 300        | 1200           | 1500

 Χ2 (chi-square) calculation (numbers in parentheses are expected counts calculated based on the data distribution in the two categories):
\[ \chi^2 = \frac{(250-90)^2}{90} + \frac{(50-210)^2}{210} + \frac{(200-360)^2}{360} + \frac{(1000-840)^2}{840} = 507.93 \]
 It shows that like_science_fiction and play_chess are correlated in the group
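The calculation above can be reproduced with a small sketch. The function name is an assumption; the expected count of each cell is taken as (row total × column total) / grand total, which yields the counts shown in parentheses on the slide.

```python
def chi_square(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence of the two attributes.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (count - expected) ** 2 / expected
    return chi2

# Rows: like / not like science fiction; columns: play / not play chess.
table = [[250, 200], [50, 1000]]
chi2 = chi_square(table)
print(chi2)
```

Turning the statistic into a decision still requires comparing it against a chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom, which this sketch does not do.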
Correlation Analysis (Numeric Data)
 Correlation coefficient (also called Pearson’s product moment coefficient)
\[ r_{A,B} = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{(n-1)\,\sigma_A \sigma_B} = \frac{\sum_{i=1}^{n} (a_i b_i) - n\bar{A}\bar{B}}{(n-1)\,\sigma_A \sigma_B} \]
where n is the number of tuples, \bar{A} and \bar{B} are the respective means of A and B, σA and σB are the respective standard deviations of A and B, and Σ(aibi) is the sum of the AB cross-product.
 If rA,B > 0, A and B are positively correlated (A’s values increase as B’s do). The higher the value, the stronger the correlation.
 rA,B = 0: no linear correlation (this alone does not prove independence); rA,B < 0: negatively correlated
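A minimal sketch of the coefficient follows, using the (n − 1) form with sample standard deviations. The function name is an assumption; the two value lists are the stock prices from the covariance example later in this deck.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation using sample standard deviations (n - 1 divisor)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sd_a = sqrt(sum((x - mean_a) ** 2 for x in a) / (n - 1))
    sd_b = sqrt(sum((y - mean_b) ** 2 for y in b) / (n - 1))
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cross / ((n - 1) * sd_a * sd_b)

stock_a = [2, 3, 5, 4, 6]
stock_b = [5, 8, 10, 11, 14]
r = pearson_r(stock_a, stock_b)
print(r)
```

A value near +1 for two attributes is the kind of signal that would flag one of them as potentially redundant during integration.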

Visually Evaluating Correlation
[Figure: scatter plots showing the similarity from –1 to 1.]
Correlation (viewed as linear relationship)
 Correlation measures the linear relationship between objects
 To compute correlation, we standardize data objects, A and B, and then take their dot product, divided by n − 1 when std is the sample standard deviation
\[ a'_k = (a_k - \mathrm{mean}(A)) / \mathrm{std}(A) \]
\[ b'_k = (b_k - \mathrm{mean}(B)) / \mathrm{std}(B) \]
\[ \mathrm{correlation}(A, B) = \frac{A' \cdot B'}{n - 1} \]
Covariance (Numeric Data)
 Covariance is similar to correlation
\[ Cov(A,B) = E\big((A-\bar{A})(B-\bar{B})\big) = \frac{\sum_{i=1}^{n}(a_i-\bar{A})(b_i-\bar{B})}{n} \]
Correlation coefficient:
\[ r_{A,B} = \frac{Cov(A,B)}{\sigma_A \sigma_B} \]
where n is the number of tuples, \bar{A} and \bar{B} are the respective means or expected values of A and B, and σA and σB are the respective standard deviations of A and B (computed with the same divisor n).
 Positive covariance: If CovA,B > 0, then A and B tend to be larger than their expected values together.
 Negative covariance: If CovA,B < 0, then if A is larger than its expected value, B is likely to be smaller than its expected value.
 Independence: if A and B are independent, then CovA,B = 0, but the converse is not true:
 Some pairs of random variables may have a covariance of 0 but are not independent. Only under some additional assumptions (e.g., the data follow multivariate normal distributions) does a covariance of 0 imply independence.
Co-Variance: An Example

 It can be simplified in computation as
\[ Cov(A,B) = E(A \cdot B) - \bar{A}\bar{B} \]

 Suppose two stocks A and B have the following values in one week:
(2, 5), (3, 8), (5, 10), (4, 11), (6, 14).

 Question: If the stocks are affected by the same industry trends, will
their prices rise or fall together?

 E(A) = (2 + 3 + 5 + 4 + 6)/ 5 = 20/5 = 4

 E(B) = (5 + 8 + 10 + 11 + 14) /5 = 48/5 = 9.6

 Cov(A,B) = (2×5+3×8+5×10+4×11+6×14)/5 − 4 × 9.6 = 4

 Thus, A and B rise together since Cov(A, B) > 0.
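The same computation can be checked with a few lines of code. The function names are assumptions; both the shortcut form E(A·B) − E(A)·E(B) and the definition form are shown so that they can be compared.

```python
def covariance_shortcut(a, b):
    """Cov(A, B) = E(A*B) - E(A)*E(B), using a divisor of n."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum(x * y for x, y in zip(a, b)) / n - mean_a * mean_b

def covariance_definition(a, b):
    """Cov(A, B) = mean of (a_i - mean(A)) * (b_i - mean(B))."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

stock_a = [2, 3, 5, 4, 6]
stock_b = [5, 8, 10, 11, 14]
cov = covariance_shortcut(stock_a, stock_b)
print(cov)
```

Both forms give a covariance of 4 for the two stocks, up to floating-point rounding.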


Chapter 3: Data Preprocessing

 Data Preprocessing: An Overview

 Data Quality

 Major Tasks in Data Preprocessing

 Data Cleaning

 Data Integration

 Data Reduction

 Data Transformation

 Summary
Data Reduction Strategies
 Data reduction: Obtain a reduced representation of the data set that
is much smaller in volume but yet produces the same (or almost the
same) analytical results
 Why data reduction? — A database/data warehouse may store
terabytes of data. Complex data analysis may take a very long time to
run on the complete data set.
 Data reduction strategies
 Dimensionality reduction, e.g., remove unimportant attributes

 Principal Components Analysis (PCA)

 Feature subset selection, feature creation

 Numerosity reduction (some simply call it: Data Reduction)

 Regression and Log-Linear Models

 Histograms, clustering, sampling

 Data cube aggregation

Principal Component Analysis (Steps)
 Given N data vectors from n-dimensions, find k ≤ n orthogonal vectors
(principal components) that can be best used to represent data
 Normalize input data: Each attribute falls within the same range
 Compute k orthonormal (unit) vectors, i.e., principal components
 Each input data (vector) is a linear combination of the k principal
component vectors
 The principal components are sorted in order of decreasing
“significance” or strength
 Since the components are sorted, the size of the data can be
reduced by eliminating the weak components, i.e., those with low
variance (i.e., using the strongest principal components, it is
possible to reconstruct a good approximation of the original data)
 Works for numeric data only
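A minimal two-dimensional sketch of these steps is given below. It is an illustration under simplifying assumptions: the data are only centred (not rescaled), the covariance matrix is 2×2 so its eigenvalues have a closed form, and the function names and sample points are hypothetical. Real workloads would normally use a linear-algebra library instead.

```python
from math import sqrt

def pca_2d(points):
    """Return [(variance, unit_vector), ...] for 2-D points, strongest first."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    if abs(sxy) < 1e-12:
        # Uncorrelated attributes: the coordinate axes are the components.
        axes = [(sxx, (1.0, 0.0)), (syy, (0.0, 1.0))]
        return sorted(axes, key=lambda pair: pair[0], reverse=True)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    root = sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    components = []
    for eigenvalue in (half_trace + root, half_trace - root):
        vx, vy = sxy, eigenvalue - sxx  # solves (C - eigenvalue * I) v = 0
        norm = sqrt(vx * vx + vy * vy)
        components.append((eigenvalue, (vx / norm, vy / norm)))
    return components

def project(points, direction):
    """Reduce centred 2-D points to 1-D scores along one component."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    return [(p[0] - mean_x) * direction[0] + (p[1] - mean_y) * direction[1]
            for p in points]

points = [(1, 2), (2, 4), (3, 6), (4, 8)]  # hypothetical points on y = 2x
components = pca_2d(points)
scores = project(points, components[0][1])
print(components)
print(scores)
```

Because these points lie on a single line, the second component carries essentially no variance, so keeping only the first score per point loses nothing.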
Attribute Subset Selection
 Another way to reduce dimensionality of data
 Redundant attributes
 Duplicate much or all of the information contained in
one or more other attributes
 E.g., purchase price of a product and the amount of
sales tax paid
 Irrelevant attributes
 Contain no information that is useful for the data
mining task at hand
 E.g., students' ID is often irrelevant to the task of
predicting students' GPA

Data Reduction
 Reduce data volume by choosing alternative, smaller forms of data representation
 Parametric methods (e.g., regression)
 Assume the data fits some model, estimate model parameters, store only the parameters, and discard the data (except possible outliers)
 Ex.: Log-linear models: obtain the value at a point in m-D space as the product of appropriate marginal subspaces
 Non-parametric methods
 Do not assume models
 Major families: histograms, clustering, sampling, …
Parametric Data Reduction: Regression and Log-Linear Models
 Linear regression
 Data modeled to fit a straight line
 Often uses the least-squares method to fit the line
 Multiple regression
 Allows a response variable Y to be modeled as a linear function of a multidimensional feature vector
Types of Sampling

 Simple random sampling


 There is an equal probability of selecting any particular
item
 Sampling without replacement
 Once an object is selected, it is removed from the
population
 Sampling with replacement
 A selected object is not removed from the population

 Stratified sampling:
 Partition the data set, and draw samples from each
partition (proportionally, i.e., approximately the same
percentage of the data)
 Used in conjunction with skewed data
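The sampling schemes listed above can be sketched with the standard `random` module. The population, the stratum labels, and the function names are hypothetical; a seeded generator is used only to make the run repeatable.

```python
import random

def srs_without_replacement(items, k, rng):
    """Simple random sample; a selected object cannot be drawn again."""
    return rng.sample(items, k)

def srs_with_replacement(items, k, rng):
    """Simple random sample; a selected object stays in the population."""
    return rng.choices(items, k=k)

def stratified_sample(items, stratum_of, fraction, rng):
    """Draw about the same fraction from every stratum (partition)."""
    strata = {}
    for item in items:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(42)
# Skewed hypothetical population: 90 "young" and 10 "senior" customers.
population = [("young", i) for i in range(90)] + [("senior", i) for i in range(10)]
without = srs_without_replacement(population, 10, rng)
with_repl = srs_with_replacement(population, 10, rng)
strat = stratified_sample(population, lambda item: item[0], 0.1, rng)
print(strat)
```

With skewed data a simple random sample of 10 may contain no "senior" customers at all, whereas the stratified sample always keeps the small group represented.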

Sampling: With or without Replacement
[Figure: raw data sampled with and without replacement]
Sampling: Cluster or Stratified Sampling
[Figure: raw data and the resulting cluster/stratified sample]
Data Transformation
 A function that maps the entire set of values of a given attribute to a
new set of replacement values s.t. each old value can be identified
with one of the new values
 Methods
 Smoothing: Remove noise from data
 Attribute/feature construction
 New attributes constructed from the given ones
 Aggregation: Summarization, data cube construction
 Normalization: Scaled to fall within a smaller, specified range
 min-max normalization
 z-score normalization
 Discretization: Concept hierarchy climbing

Data Normalization
 Min-max normalization: to [new_minA, new_maxA]
\[ v' = \frac{v - min_A}{max_A - min_A}\,(new\_max_A - new\_min_A) + new\_min_A \]
 Ex. Let income range $12,000 to $98,000 be normalized to [0.0, 1.0]. Then $73,600 is mapped to
\[ \frac{73{,}600 - 12{,}000}{98{,}000 - 12{,}000}\,(1.0 - 0) + 0 = 0.716 \]
 Z-score normalization (μ: mean, σ: standard deviation):
\[ v' = \frac{v - \mu_A}{\sigma_A} \]
 Ex. Let μ = 54,000, σ = 16,000. Then $73,600 is mapped to
\[ \frac{73{,}600 - 54{,}000}{16{,}000} = 1.225 \]
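Both normalizations can be written as one-line functions. The function and parameter names are assumptions; the numbers are the income figures from the examples on this slide.

```python
def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Min-max normalization of v from [min_a, max_a] to [new_min, new_max]."""
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

def z_score(v, mean_a, std_a):
    """Z-score normalization: distance from the mean in standard deviations."""
    return (v - mean_a) / std_a

income_min_max = min_max(73600, 12000, 98000)
income_z = z_score(73600, 54000, 16000)
print(round(income_min_max, 3), round(income_z, 3))
```

Min-max needs the attribute's minimum and maximum up front, while the z-score only needs its mean and standard deviation, which makes the z-score easier to apply when the true range is unknown.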

ACTIVITY-5
Investigate the Handling of Redundancy in Data
Integration.
Explore the usage of correlation analysis and
covariance analysis towards eliminating the
redundant attributes along with relevant
computations and scenario analysis.

Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]


Matters of Discussion[ML]
Linear regression
- Simple & Multiple linear regression

- Estimating the regression equation

- Predictor variable selection in linear regression

Machine Learning Case


ML in Simple Linear Regression
 Simple linear regression is used when you have one independent variable [X] and one dependent variable [Y].
 While training the model:
 x: input training data (univariate: one input variable)
 y: labels for the data (supervised learning)
 E.g., for a given height (X), predict the weight (Y).
Refer: Linear Regression in Weka
ML in Multiple linear regression
 Multiple linear regression is used when you have several independent variables or predictor variables [X1, X2, ..., Xn] and one dependent variable or response variable [Y].
 Multiple linear regression explains the relationship between one continuous dependent variable (y) and two or more independent variables (x1, x2, x3, etc.).

Refer: Logistic Regression in Weka
Cont..
 Multiple input factors and one predicted O/P.
 A dimension reduction technique may be used to filter out a good set of independent variables [input factors] that influence the dependent variable [Y].
 The predicted variable Y is the O/P.
 E.g., eye disease type prediction [predicted O/P] based on input disease symptoms [input factors].
Regression problem in machine learning
 A regression can have real-valued or discrete input variables.
 Regression models are used to predict a continuous value.
 Predicting the price of a house given features of the house, such as its size, is one of the common examples of regression.

Continuous variables are simply running numbers.
Categorical variables are categories.
Regression examples

Hypothesis function for Linear Regression:
y = θ1 + θ2·x
When training the model:
 It fits the best line to predict the value of y for a given value of x. The model gets the best regression fit line by finding the best θ1 and θ2 values.
 θ1: intercept
 θ2: coefficient of x
 Once we find the best θ1 and θ2 values, we get the best fit line. So when we finally use our model for prediction, it will predict the value of y for the input value of x.
Linear regression Applications
 Given an input x we would like to compute an output y
 For example:
- Predict height from age
- Predict Google’s price from Yahoo’s price
- Predict distance from wall from sensors
[Figure: data points plotted with X on the horizontal axis and Y on the vertical axis]
Linear regression in ML
• Given an input x we would like to compute an output y
• In linear regression we assume that y and x are related with the following equation:
\[ y = wx + \varepsilon \]
where y is what we are trying to predict, x is the observed value, w is a parameter, and ε represents measurement or other noise
Linear regression in ML
\[ y = wx + \varepsilon \]
• Our goal is to estimate w from training data of <xi,yi> pairs
• Optimization goal: minimize squared error (least squares):
\[ \arg\min_w \sum_i (y_i - w x_i)^2 \]
• Why least squares?
- It minimizes the squared distance between the measurements and the predicted line.
Solving linear regression
• To optimize:
• We just take the derivative w.r.t. w, where w x_i is the prediction and the sum runs over the training data of <xi,yi> pairs:
\[ \frac{\partial}{\partial w} \sum_i (y_i - w x_i)^2 = 2 \sum_i -x_i (y_i - w x_i) \]
Solving linear regression
• To optimize (closed form):
• We just take the derivative w.r.t. w and set it to 0:
\[ \frac{\partial}{\partial w} \sum_i (y_i - w x_i)^2 = 2 \sum_i -x_i (y_i - w x_i) \Rightarrow \]
\[ 2 \sum_i x_i (y_i - w x_i) = 0 \Rightarrow 2 \sum_i x_i y_i - 2 \sum_i w x_i x_i = 0 \]
\[ \sum_i x_i y_i = \sum_i w x_i^2 \Rightarrow \]
\[ w = \frac{\sum_i x_i y_i}{\sum_i x_i^2} \]
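The closed-form result can be tried out directly. This is a sketch for the no-intercept model y = wx + noise only; the training pairs and function names are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Least-squares estimate of w in y = w*x + noise (no intercept term)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def squared_error(w, xs, ys):
    """Sum of squared differences between measurements and predictions."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]           # hypothetical inputs
ys = [2.1, 3.9, 6.2, 7.8]   # hypothetical noisy outputs, roughly y = 2x
w = fit_through_origin(xs, ys)
print(w, squared_error(w, xs, ys))
```

Nudging w in either direction increases the squared error, which is what setting the derivative to zero guarantees for this convex objective.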
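Implementation logic
import numpy as np

# Helper assumed by this slide (its body is not shown there):
# least-squares intercept b_0 and slope b_1
def estimate_coef(x, y):
    b_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b_0 = y.mean() - b_1 * x.mean()
    return b_0, b_1

# observations
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# estimating coefficients
b = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))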
Analysis
y = wx + b, where b is the intercept
Estimated coefficients (least squares on the observations above):
b = 1.2363636364
w = 1.1696969697
The linear regression model:
y = 1.1696969697 x + 1.2363636364
Cost function (J) of Linear Regression Model
The cost function (J) of linear regression is the Root Mean Squared Error (RMSE) between the predicted y value (pred) and the true y value (y):
\[ J = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (pred_i - y_i)^2} \]
Reducing the cost function (minimizing the RMSE value):
The best values of w and the intercept should be estimated so that they minimize the error between the predicted y value (pred) and the true y value (y).
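A short sketch of the RMSE cost is given below. The function names and data points are hypothetical; the points lie exactly on y = 2x + 1, so the cost of that line is zero and any other line scores worse.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    n = len(y_true)
    return sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def line_rmse(w, b, xs, ys):
    """Cost J of the candidate line y = w*x + b on the given data."""
    return rmse(ys, [w * x + b for x in xs])

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]  # hypothetical points on y = 2x + 1
perfect = line_rmse(2.0, 1.0, xs, ys)
worse = line_rmse(1.5, 1.0, xs, ys)
print(perfect, worse)
```

Training amounts to searching for the (w, b) pair with the lowest cost; minimizing RMSE and minimizing the sum of squared errors pick the same line.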
Steps to Establish a Regression
1) Carry out the experiment of gathering a sample of observed values of height and corresponding weight.
2) Create a relationship model using the lm() function in R.
3) Find the coefficients from the model created and create the mathematical equation using these coefficients.
4) To predict the weight of new persons, use the predict() function in R.
1. Input Data
# Values of height
151, 174, 138, 186, 128, 136, 179, 163, 152, 131
# Values of weight.
63, 81, 56, 91, 47, 57, 76, 72, 62, 48
2. lm() Function: creates the relationship model between the predictor and the response variable.
lm(formula, data)
formula is a symbolic description of the relation between x and y.
data is the data frame containing the variables to which the formula will be applied.
Cont..
Create Relationship Model & get the Coefficients
# Apply the lm() function.
relation <- lm(y~x)
print(relation)
N.B.: In an R formula, "y ~ x" means that y is modeled as a function of x.
3. O/P: Coefficients
Find the coefficients from the model created and create the mathematical equation using these coefficients.
(Intercept)           x
   -38.4551      0.6746
Y = 0.6746 x + (-38.4551)
4. Use the predict() function in R.
predict(object, newdata)
object is the model which has already been created using the lm() function.
newdata is the data frame containing the new values of the predictor variable.
Predict the weight of new persons
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
# O/P: 76.22869
# Create the predictor and response variable.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y~x)
# Give the chart file a name.
png(file = "linearregression.png")
# Plot the chart (weight on the x-axis, height on the y-axis).
plot(y, x, col = "blue", main = "Height & Weight Regression",
     cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")
# Add the regression line of height on weight to the chart.
abline(lm(x~y))
# Save the file.
dev.off()

https://www.tutorialspoint.com/r/r_linear_regression.htm
ACTIVITY-7 (LAB-03)
Consider any dataset, implement and investigate the linear regression algorithm, and analyze the results in detail.
ACTIVITY-08
Formulate Hypothesis function for Linear
Regression and Investigate the computational
analysis of linear regression model to estimate
the coefficients for any real world application.

Matters of Discussion
1) Simple Linear Regression computation

2) ANOVA in R

3) Autocorrelation

Machine Learning Case


1. Simple Linear Regression
 Simple linear regression is a statistical method used to understand the relationship between two variables, x and y.
 One variable, x, is known as the predictor variable.
 The other variable, y, is known as the response variable.
 For example, suppose we have a dataset with the weight and height of several people.
 Let weight be the predictor variable (I/P) and let height be the response variable (O/P).
Cont..
 If we graph these two variables using a scatterplot, with weight on the x-axis and height on the y-axis, we can inspect the relationship visually.
[Figure: scatterplot of weight (x-axis) against height (y-axis)]
Cont..
 Suppose we are interested in understanding the relationship between weight and height.
 From the scatterplot we can clearly see that as weight increases, height tends to increase as well,
 but to actually quantify this relationship between weight and height, we need to use linear regression.
Cont..
 Using linear regression, we can find the line that best “fits” our data.
 This line is known as the least squares regression line and it can be used to help us understand the relationship between weight and height.
 The formula for the line of best fit is written as:

ŷ = b0 + b1x

 where ŷ is the predicted value of the response variable,
 b0 is the y-intercept, b1 is the regression coefficient, and x is the value of the predictor variable.
Quantify the relationship through Linear
Regression
 Simple linear regression is a statistical
method you can use to quantify the
relationship between a predictor variable
and a response variable.
 Example:

Cont..
 Use the following steps to fit a linear regression model to this dataset, using weight as the predictor variable (I/P) and height as the response variable (O/P).
 Step 1: Calculate X*Y, X², and Y² for each observation
Cont..
 Step 2: Calculate ΣX, ΣY, ΣX*Y, ΣX², and ΣY²
Cont.. ŷ = b0 + b1x
 Step 3: Calculate b0
The formula to calculate b0 is:
[(ΣY)(ΣX²) – (ΣX)(ΣXY)] / [n(ΣX²) – (ΣX)²]
In this example,
b0 = [(477)(222755) – (1237)(85125)] / [7(222755) – (1237)²] = 32.783
N.B.: n is the sample size = 7
Cont.. ŷ = b0 + b1x
 Step 4: Calculate b1
The formula to calculate b1 is:
[n(ΣXY) – (ΣX)(ΣY)] / [n(ΣX²) – (ΣX)²]
In this example,
b1 = [7(85125) – (1237)(477)] / [7(222755) – (1237)²] = 0.2001
Cont.. ŷ = b0 + b1x
 Step 5: Place b0 and b1 in the estimated linear regression equation.
The estimated linear regression equation is:
ŷ = b0 + b1*x
In our example, it is
ŷ = 32.783 + (0.2001)*x
b0 = 32.783. When weight is zero pounds, the predicted height is 32.783 inches. Sometimes the value of b0 is useful to know, but in this example it does not make sense to interpret b0, since a person cannot weigh zero pounds.
b1 = 0.2001. A one-pound increase in weight is associated with a 0.2001-inch increase in height.
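Steps 3 to 5 can be checked with a few lines of code that use only the sums quoted on the slides (n = 7, ΣX = 1237, ΣY = 477, ΣXY = 85125, ΣX² = 222755). The variable and function names are assumptions made for this sketch.

```python
# Sums taken from the worked example (weight = X, height = Y).
n = 7
sum_x = 1237
sum_y = 477
sum_xy = 85125
sum_x2 = 222755

# Both coefficients share the same denominator.
denominator = n * sum_x2 - sum_x ** 2
b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b1 = (n * sum_xy - sum_x * sum_y) / denominator

def predict_height(weight):
    """Predicted height (inches) for a given weight (pounds)."""
    return b0 + b1 * weight

print(round(b0, 3), round(b1, 4))
```

A call such as predict_height(170) then evaluates the fitted line at a new weight; predictions are only trustworthy within the range of weights present in the data.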
2. ANOVA in R
 ANOVA, also known as Analysis of Variance,
 is used to investigate relations between categorical variables and a continuous variable in R programming.
 It is a type of hypothesis test that compares population means by analyzing variance.
 R – ANOVA Test
ANOVA test involves setting up:
• Null Hypothesis: All population means are equal.
• Alternate Hypothesis: At least one population mean is different from the others.
Cont..
 ANOVA tests are of two types:
• One-way ANOVA: It takes one categorical grouping variable into consideration.
• Two-way ANOVA: It takes two categorical grouping variables into consideration.
 The Dataset [Motor Trend Car Road Tests]
 The mtcars (Motor Trend Car Road Tests) dataset is used, which consists of 32 car models and 11 attributes.
 The dataset ships with base R (in the datasets package), so no extra package is needed to access it.
 The code on these slides also installs and loads the dplyr package, although aov() itself is part of base R.
Performing One Way ANOVA test in R
 A one-way ANOVA test is performed on the mtcars dataset (included with base R) between the disp attribute, a continuous attribute, and the gear attribute, a categorical attribute.
[, 1] mpg   Miles/(US) gallon
[, 2] cyl   Number of cylinders
[, 3] disp  Displacement (cu.in.)
[, 4] hp    Gross horsepower
[, 5] drat  Rear axle ratio
[, 6] wt    Weight (1000 lbs)
[, 7] qsec  1/4 mile time
[, 8] vs    Engine (0 = V-shaped, 1 = straight)
[, 9] am    Transmission (0 = automatic, 1 = manual)
[,10] gear  Number of forward gears
[,11] carb  Number of carburetors
# Installing the package (optional: mtcars and aov() are part of base R)
install.packages("dplyr")

# Loading the package
library(dplyr)

# Variance in mean within group and between group
boxplot(mtcars$disp~factor(mtcars$gear),
        xlab = "gear", ylab = "disp")

# Step 1: Setup Null Hypothesis and Alternate Hypothesis
# H0: mu1 = mu2 = mu3 (there is no difference
# between average displacement for different gear)
# H1: Not all means are equal

# Step 2: Calculate test statistics using aov function
mtcars_aov <- aov(mtcars$disp~factor(mtcars$gear))
summary(mtcars_aov)

# Step 3: Choose the significance level
# alpha = 0.05

# Step 4: Compare the p-value reported by summary() with alpha
# If p < alpha, reject the Null Hypothesis
Result Analysis
The output compares the mean values of displacement across the gear groups.
The categorical variable is gear, on which the factor function is used, and the continuous variable is disp.
The degrees of freedom (DF) are the number of independent pieces of information.
Cont..
 The summary shows that the gear attribute is highly significant for displacement (the three stars denote this).
 Also, the P value is less than 0.05, which indicates that gear is significant for displacement, i.e., the two are related, and we reject the Null Hypothesis.
 A significant result is obtained.
 Displacement is strongly related to gears in cars, i.e., displacement depends on gears with p < 0.05.
Key Insights
 The F-value is simply a ratio of two variances: the variance between the group means and the variance within the groups.
 The F value in one-way ANOVA is a tool to help you answer the question “Is the variance between the means of two populations significantly different?”
 The F value in the ANOVA test also determines the P value;
 The P value is the probability of getting a result at least as extreme as the one that was actually observed, assuming the null hypothesis is true.
 For given degrees of freedom, the higher the F-value, the lower the corresponding p-value.
 If the p-value is below a certain threshold (e.g. α = 0.05), we can reject the null hypothesis of the ANOVA and conclude that there is a statistically significant difference between group means.
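The F-value described above can be computed by hand for a small example. This is a sketch with hypothetical groups and an assumed function name; it stops at the F statistic, because turning F into a p-value needs the F distribution, which R's summary(aov(...)) supplies.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)        # df between = k - 1
    ms_within = ss_within / (n_total - k)    # df within = N - k
    return ms_between / ms_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical measurements
f_value = one_way_anova_f(groups)
print(f_value)
```

The third group sits far from the other two relative to the spread inside each group, which is what produces a large F.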

Two Way ANOVA test in R
 A two-way ANOVA test is performed on the mtcars dataset (included with base R) between the disp attribute, a continuous attribute, and two categorical attributes: gear and am.
am: Transmission (0 = automatic, 1 = manual)
disp: Displacement; gear: Number of forward gears
# Installing the package (optional: mtcars and aov() are part of base R)
install.packages("dplyr")

# Loading the package
library(dplyr)

# Variance in mean within group and between group
boxplot(mtcars$disp~mtcars$gear, subset = (mtcars$am == 0),
        xlab = "gear", ylab = "disp", main = "Automatic")
boxplot(mtcars$disp~mtcars$gear, subset = (mtcars$am == 1),
        xlab = "gear", ylab = "disp", main = "Manual")

# Step 1: Setup Null Hypothesis and Alternate Hypothesis
# H0: the group means are equal (there is no difference between
# average displacement for different gear or transmission)
# H1: Not all means are equal

# Step 2: Calculate test statistics using aov function
mtcars_aov2 <- aov(mtcars$disp~factor(mtcars$gear) *
                   factor(mtcars$am))
summary(mtcars_aov2)

# Step 3: Choose the significance level
# alpha = 0.05

# Step 4: Compare each p-value reported by summary() with alpha
# If p < alpha, reject the Null Hypothesis for that factor
O/P

O/P analysis
1) The summary shows that the gear attribute is highly significant for displacement (the three stars denote this),
2) and the am attribute is not significant for displacement.
3) The P-value of gear is less than 0.05, so gear is significant for displacement, i.e., the two are related.
4) The P-value of am is greater than 0.05, so am is not significant for displacement, i.e., no relationship is detected.
Final result on mtcars
1) Displacement is strongly related to gears in cars, i.e., displacement depends on gears with p < 0.05.
2) Displacement is strongly related to gears but not to transmission mode (am) in cars, with p > 0.05 for am.
3. Autocorrelation
 We have already discussed time-series data to identify trend, seasonal, and cyclic patterns.
 Autocorrelation, also known as serial correlation, refers to the degree of correlation of the same variable between two successive time intervals.
 It is mainly used to measure the relationship between the current values and the previous values.
 The value of autocorrelation ranges from -1 to 1.
 A value between -1 and 0 represents negative autocorrelation. A value between 0 and 1 represents positive autocorrelation.
 Autocorrelation gives information about the trend of a set of historical data, so it can be useful in technical analysis for the equity market.
Cont..
 In R, we can calculate the autocorrelation of a vector with the acf() function. acf() belongs to the stats package, which R loads by default, so no extra package is needed for it.
Syntax:
acf(vector, lag.max, plot)
Parameters:
• vector is the input vector
• lag.max is the maximum number of lags (it can be abbreviated to lag)
• plot controls whether the autocorrelation is plotted (it can be abbreviated to pl)
 A “lag” is a fixed amount of passing time: one set of observations in a time series is plotted (lagged) against a second, later set of data. The kth lag is the time period that happened “k” time points before time i.
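To make the idea of a lag concrete, here is a minimal sketch of the usual sample autocorrelation estimator: the lag-k sum of products of deviations from the mean, divided by the lag-0 sum. The function name and the short series are assumptions for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (0 <= lag < n)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denominator = sum(d * d for d in dev)
    # Pair every observation with the one `lag` steps later.
    numerator = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numerator / denominator

series = [1, 2, 3, 4, 5]  # hypothetical, steadily increasing series
for k in range(3):
    print(k, autocorrelation(series, k))
```

At lag 0 every value is compared with itself, so the autocorrelation is always 1; the values at larger lags shrink as fewer pairs remain.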
auto correlation in a vector with different lags
# acf() belongs to the stats package, which R loads by default,
# so loading tseries is not required for it

# create vector1 with 8 time periods
vector1=c(34,56,23,45,21,64,78,90)

# calculate auto correlation with the default number of lags
print(acf(vector1,plot=FALSE))

# calculate auto correlation up to lag 0
print(acf(vector1,lag.max=0,plot=FALSE))

# calculate auto correlation up to lag 2
print(acf(vector1,lag.max=2,plot=FALSE))

# calculate auto correlation up to lag 6
print(acf(vector1,lag.max=6,plot=FALSE))
ACTIVITY-09(Lab-04)
Formulate a null Hypothesis by considering any
scenario and Investigate the computational
analysis of one way and two way ANOVA to
estimate the P-value to take a decision.

Matters of Discussion
Classifications - Classification methods

Decision Tree,
Naïve Bayes,
K-Nearest Neighbors
Classification And Regression Trees –Logistic Regression
Models. [To be discussed Later]

Supervised vs. Unsupervised Learning
 Supervised learning (classification)
 Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
 New data is classified based on the training set
 Unsupervised learning (clustering)
 The class labels of the training data are unknown
 Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Prediction Problems: Classification
 Classification
 predicts categorical class labels
 classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
 Typical applications
 Credit/loan approval
 Medical diagnosis: if a tumor is cancerous or benign
 Fraud detection: if a transaction is fraudulent
 Web page categorization: which category it is
Classification: A Two-Step Process
 Model construction: describing a set of predetermined classes
 Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
 The set of tuples used for model construction is the training set
 The model is represented as classification rules, decision trees, or mathematical formulae
Cont..
• Model usage: classifying future or unknown objects
   • Estimate the accuracy of the model
      • The accuracy rate is the percentage of test set samples that are correctly classified by the model
      • The test set is independent of the training set
   • If the accuracy is acceptable, use the model to classify new data
Process (1): Model Construction
Training Data -> Classification Algorithms -> Classifier (Model)

Training data:
NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Classifier (model):
IF rank = 'professor' OR years > 6
THEN tenured = 'yes'
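As a minimal sketch, the rule learned above can be written as an R function and checked against the six training tuples. The names faculty and predict_tenure are illustrative and do not come from the slides; accuracy measured on the training set itself is only a sanity check, not an independent test.

```r
# Training tuples from the model-construction slide
faculty <- data.frame(
  name    = c("Mike", "Mary", "Bill", "Jim", "Dave", "Anne"),
  rank    = c("Assistant Prof", "Assistant Prof", "Professor",
              "Associate Prof", "Assistant Prof", "Associate Prof"),
  years   = c(3, 7, 2, 7, 6, 3),
  tenured = c("no", "yes", "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Rule from the slide: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
predict_tenure <- function(rank, years) {
  ifelse(tolower(rank) == "professor" | years > 6, "yes", "no")
}

faculty$predicted <- predict_tenure(faculty$rank, faculty$years)

# Share of training tuples for which the rule reproduces the known label
training_accuracy <- mean(faculty$predicted == faculty$tenured)
print(faculty)
print(training_accuracy)
```

In the two-step process, the same function would next be applied to an independent test set before being used on new data.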
Process (2): Using the Model in Prediction

Performance measures
The performance of the developed model can be evaluated using a confusion matrix.

                            Predicted class
                            Positive              Negative
Actual class   Positive     True Positive (TP)    False Negative (FN)
               Negative     False Positive (FP)   True Negative (TN)
Performance measures
• Performance metrics

Decision Tree
A decision tree is a flow-chart-like tree structure that consists of nodes and branches (root node, internal nodes and leaf nodes).
• Root node: the top node of the decision tree, with no incoming branch and one or more outgoing branches.
• Internal node(s): each has one incoming branch and one or more outgoing branches.
• Leaf node: has one incoming branch and no outgoing branch; it represents the class label.
• The root node and each internal node denote a test on an attribute (feature); each branch represents an outcome of that test.
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
   • The tree is constructed in a top-down, recursive, divide-and-conquer manner
   • At the start, all the training examples are at the root
   • Attributes are categorical (continuous-valued attributes are discretized in advance)
   • Examples are partitioned recursively based on selected attributes
   • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
   • There are no remaining attributes for further partitioning
Decision Tree Induction: An Example
• Training data set: Buys_computer
• The data set follows an example of ID3
• Resulting tree: [figure]

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no
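ID3 chooses the test attribute with the highest information gain. The sketch below computes that gain for each attribute of the Buys_computer table in base R; the helper names entropy and info_gain are illustrative, and the age band is written as "31...40" in plain ASCII.

```r
# Buys_computer training data (14 tuples) from the slide
buys <- data.frame(
  age = c("<=30", "<=30", "31...40", ">40", ">40", ">40", "31...40",
          "<=30", "<=30", ">40", "<=30", "31...40", "31...40", ">40"),
  income = c("high", "high", "high", "medium", "low", "low", "low",
             "medium", "low", "medium", "medium", "medium", "high", "medium"),
  student = c("no", "no", "no", "no", "yes", "yes", "yes",
              "no", "yes", "yes", "yes", "no", "yes", "no"),
  credit_rating = c("fair", "excellent", "fair", "fair", "fair", "excellent",
                    "excellent", "fair", "fair", "fair", "excellent",
                    "excellent", "fair", "excellent"),
  buys_computer = c("no", "no", "yes", "yes", "yes", "no", "yes",
                    "no", "yes", "yes", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Entropy (in bits) of a vector of class labels
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Information gain = entropy before the split - weighted entropy after it
info_gain <- function(df, attr, target) {
  parts <- split(df[[target]], df[[attr]])
  after <- sum(sapply(parts, function(s) length(s) / nrow(df) * entropy(s)))
  entropy(df[[target]]) - after
}

attrs <- c("age", "income", "student", "credit_rating")
gains <- sapply(attrs, function(a) info_gain(buys, a, "buys_computer"))
print(round(gains, 3))
print(names(which.max(gains)))  # attribute chosen for the root node
```

The attribute with the largest gain becomes the root test; the same computation is then repeated on each partition.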

Decision Tree Example using ID3
• Extracting classification rules from the decision tree
   • If Age (31…40) Then Buys-Computer (Yes)
   • If Age (<=30) And Student (No) Then Buys-Computer (No)
   • If Age (<=30) And Student (Yes) Then Buys-Computer (Yes)
   • If Age (>40) And Cr-Rating (Excellent) Then Buys-Computer (No)
   • If Age (>40) And Cr-Rating (Fair) Then Buys-Computer (Yes)

IS8101 Compiled by Kindie B. (Ph.D)
Naïve Bayes Classifier
• A simplifying assumption: attributes are conditionally independent given the class (i.e., no dependence relation between attributes):
   P(X|Ci) = P(x1|Ci) x P(x2|Ci) x ... x P(xn|Ci)
• This greatly reduces the computation cost: only the class distribution needs to be counted
• If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by the total number of tuples in Ci [Ci: the instances of class i]
Naïve Bayes Classifier: Training Dataset

Class:
C1: buys_computer = 'yes'
C2: buys_computer = 'no'
Data to be classified:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Task:
Classify X using the Bayesian classifier.
Naïve Bayes Classifier: An Example
Training data: the 14-record buys_computer table (age, income, student, credit_rating, buys_computer) used in the decision tree example.

P(Ci):
P(buys_computer = "yes") = 9/14 = 0.643
P(buys_computer = "no") = 5/14 = 0.357

Compute P(X|Ci) for each class:
P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X|Ci):
P(X|buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = "no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci) * P(Ci):
P(X|buys_computer = "yes") * P(buys_computer = "yes") = 0.028
P(X|buys_computer = "no") * P(buys_computer = "no") = 0.007

Therefore, X belongs to class ("buys_computer = yes").
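The hand calculation above can be reproduced with a few lines of base R. This is a sketch that simply counts relative frequencies, with no smoothing for zero counts; the object names bc, x and score are illustrative, and the age band is written as "31...40" in plain ASCII.

```r
# Buys_computer training data (14 tuples)
bc <- data.frame(
  age = c("<=30", "<=30", "31...40", ">40", ">40", ">40", "31...40",
          "<=30", "<=30", ">40", "<=30", "31...40", "31...40", ">40"),
  income = c("high", "high", "high", "medium", "low", "low", "low",
             "medium", "low", "medium", "medium", "medium", "high", "medium"),
  student = c("no", "no", "no", "no", "yes", "yes", "yes",
              "no", "yes", "yes", "yes", "no", "yes", "no"),
  credit_rating = c("fair", "excellent", "fair", "fair", "fair", "excellent",
                    "excellent", "fair", "fair", "fair", "excellent",
                    "excellent", "fair", "excellent"),
  buys_computer = c("no", "no", "yes", "yes", "yes", "no", "yes",
                    "no", "yes", "yes", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Tuple to classify
x <- list(age = "<=30", income = "medium", student = "yes",
          credit_rating = "fair")

# For each class Ci: P(Ci) * product over attributes of P(xk | Ci)
score <- sapply(c("yes", "no"), function(cls) {
  rows  <- bc[bc$buys_computer == cls, ]
  prior <- nrow(rows) / nrow(bc)
  likelihood <- prod(sapply(names(x), function(a) mean(rows[[a]] == x[[a]])))
  prior * likelihood
})

print(round(score, 3))
predicted_class <- names(which.max(score))
print(predicted_class)
```

The class with the larger score is the prediction; the scores are not normalized, so they are proportional to the posterior probabilities rather than equal to them.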
The k-Nearest Neighbor Algorithm
All instances correspond to points in the n-D space
The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2)
The target function can be discrete-valued or real-valued
For a discrete-valued target, k-NN returns the most common value among the k training examples nearest to xq
Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
[Figure: query point xq surrounded by positive (+) and negative (_) training examples]
k-Nearest Neighbor (k-NN) Classification
• In k-nearest-neighbor (k-NN) classification, the training dataset is used to classify each member of a "target" dataset.
• No model is built during a learning phase; the training set itself serves as the model.
• It is therefore called a lazy-learning method.
• Basic idea: the properties of any particular input X are likely to be similar to those of points in the neighborhood of X.


k-Nearest Neighbor (k-NN) Classification

The KNN algorithm works on a similarity measure.

For example, given a new image, a KNN model compares its features with those of the stored cat and dog images and, based on the most similar ones, places it in either the cat or the dog category.
Nearest-Neighbor Classifiers
 Requires three things
 The set of stored records
 Distance Metric to compute distance between
records
 The value of k, the number of nearest neighbors to
retrieve
 To classify an unknown record:
1) Compute distance to other training records
2) Identify k nearest neighbors
3) Use class labels of nearest neighbors to determine
the class label of unknown record

Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to each training data point.
Step-3: Take the K nearest neighbors according to the calculated Euclidean distances.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category with the largest number of neighbors.
Step-6: Our model is ready.
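The steps above can be sketched in base R as follows. The toy points, the labels "A" and "B", and the function name knn_predict are made up for illustration; the distance is Euclidean and the vote is a plain majority, with ties going to the first class in table order.

```r
# Toy training set: two numeric features, two classes (hypothetical data)
train_x <- matrix(c(1, 1,
                    1, 2,
                    2, 1,
                    5, 5,
                    6, 5,
                    5, 6), ncol = 2, byrow = TRUE)
train_y <- c("A", "A", "A", "B", "B", "B")

knn_predict <- function(train_x, train_y, query, k = 3) {
  # Step 2: Euclidean distance from the query to every training point
  d <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  # Step 3: indices of the k nearest neighbors
  nearest <- order(d)[seq_len(k)]
  # Step 4: count the neighbors in each category
  votes <- table(train_y[nearest])
  # Step 5: assign the category with the most neighbors
  names(votes)[which.max(votes)]
}

print(knn_predict(train_x, train_y, query = c(2, 2), k = 3))
print(knn_predict(train_x, train_y, query = c(5, 4), k = 3))
```

Because no model is fitted, all the work happens at prediction time, which is what makes k-NN a lazy learner.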
Examples of Nearest Neighbor

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
ACTIVITY-10 (LAB-5)
Investigate any classification problem for a dataset, implement the three algorithms (Decision Tree, Naïve Bayes and K-Nearest Neighbors), and analyze the classification accuracy and other performance factors of each algorithm.
Decision Tree
• The R package "party" is used to create decision trees. It provides the function ctree(), which is used to create and analyze a decision tree.
ctree(formula, data)
• formula is a formula describing the predictor and response variables.
• data is the name of the data set used.
Naive Bayes Classifier
pkgs = c("klaR", "caret", "ElemStatLearn")
# Install these packages
# install.packages(pkgs)
library(caret)
# Split the data into training and testing sets
# Define a matrix with features, X_train
# and a vector with class labels, y_train
# Train the model
model <- train(X_train, y_train, method = "nb")
# Compute pred using the model to get the predictive accuracy
# (X_test and y_test are the held-out features and labels)
pred <- predict(model, X_test)
mean(pred == y_test)
K-Nearest Neighbors
The K-nearest neighbors (KNN) algorithm is a supervised ML algorithm that can be used for both classification and regression problems.
## use the knn() function from the "class" package
install.packages("e1071")
install.packages("caTools")
install.packages("class")
# Confusion Matrix
Cheers For the Great Patience!
Query Please?

Matters of Discussion
Performance Evaluation:
Evaluating classification performance

[Review]
Classification Performance - Evaluating
Predictive Performance

Performance measures
The performance of the developed model can be evaluated using a confusion matrix.

=== Confusion Matrix ===
   a   b   <-- classified as
 150  28 |   a = tested_negative
  32  51 |   b = tested_positive

                               Predicted class
                               Negative               Positive
Actual class   Negative [a]    True Negative (TN)     False Positive (FP)
               Positive [b]    False Negative (FN)    True Positive (TP)
Data set for building Confusion Matrix Example

TP FP
FN TN

Performance measures
• Performance metrics

Classifications - Classification methods

Decision Tree,
Naïve Bayes,
K-Nearest Neighbors

These methods have already been discussed.
We now investigate how to estimate the performance of those algorithms using the performance measures.
Performance measure for Naïve Bayes classification[class wise]

Comparison Result

Four common Test options
Both training and testing need data.
The following four options are commonly used.
1. Use training set:
• You test your knowledge on the same data you learned from.
• Not widely accepted, because a model can simply memorize the training instances (which are also the test instances).
• Rarely used in research.
2. Supplied test set:
• It is an external file that you can use as the test set.
• It can be used when you want or need to test the algorithm's knowledge against a specific test set.
3. K-fold cross validation
• The training set is randomly divided into K disjoint sets of equal size, where each part has roughly the same class distribution.
• For example, you split the data into 10 folds and repeat the following process 10 times (once per fold): use 9 folds for training and leave 1 fold out for testing.
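A minimal base R sketch of the fold bookkeeping, using the built-in iris data as a stand-in dataset. The fold assignment here is a simple random one and is not stratified by class, which is a simplification of the description above; the classifier itself is left as a comment.

```r
set.seed(1)                      # reproducible fold assignment
n <- nrow(iris)                  # 150 records
k <- 10                          # number of folds

# Assign every record to exactly one of the k folds at random
fold_id <- sample(rep(seq_len(k), length.out = n))

fold_sizes <- integer(k)
for (i in seq_len(k)) {
  test_idx  <- which(fold_id == i)   # the fold left out for testing
  train_idx <- which(fold_id != i)   # the other k - 1 folds for training
  # A real run would fit the classifier on iris[train_idx, ] here
  # and measure its accuracy on iris[test_idx, ].
  fold_sizes[i] <- length(test_idx)
}
print(fold_sizes)
```

The k accuracy values obtained this way are usually averaged into a single estimate.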

4. Percentage split:
• Splits the data, setting aside x% of it for learning and the rest for testing.
• It is useful when your algorithm is slow.
• A commonly used split trains the algorithm on 67% of the data and tests the classifier on the remaining 33%.
Model performance for classification models
A classification model is a machine learning model that predicts a categorical Y variable:
1. Will the employee leave the organization or stay?
2. Does the patient have cancer or not?
3. Does this customer fall into high risk, medium risk or low risk?
4. Will the customer repay a loan or default on it?
A classification model in which the Y variable can take only 2 values is called a binary classifier.
CASE: Confusion matrix for customer class prediction
=== Confusion Matrix ===
   a   b   <-- classified as
 150  28 |   a = tested_negative
  32  51 |   b = tested_positive

TN = 150 ; FP = 28
FN = 32 ; TP = 51
Model performance measures
(TN = 150 ; FP = 28 ; FN = 32 ; TP = 51)
1. Accuracy = [TP+TN] / [TP+FP+TN+FN]
Accuracy is the number of correct predictions made by the model divided by the total number of records. The best accuracy is 100%, indicating that all the predictions are correct.
2. Sensitivity or recall = TP / [TP+FN]
Sensitivity (recall or true positive rate) is calculated as the number of correct positive predictions divided by the total number of positives.
3. Specificity = TN / [TN+FP]
Specificity (true negative rate) is calculated as the number of correct negative predictions divided by the total number of negatives.
4. Precision = TP / [TP+FP]
Precision (positive predictive value) is calculated as the number of correct positive predictions divided by the total number of positive predictions.
(TN = 150 ; FP = 28 ; FN = 32 ; TP = 51)
5. KS statistic: the KS statistic is a measure of the degree of separation between the positive and negative distributions. A KS value of 100% indicates that the scores partition the records exactly, such that one group contains all positives and the other contains all negatives. In practical situations, a KS value higher than 50% is desirable.
6. ROC chart & area under the curve (AUC)
The ROC chart plots 1-specificity on the X axis and sensitivity on the Y axis. The area under the ROC curve is a measure of model performance. The AUC of a random classifier is 50% and that of a perfect classifier is 100%. For practical situations, an AUC of over 70% is desirable.
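As an illustration, both quantities can be computed from a list of predicted scores in base R. The six scores and labels below are invented for the example (1 = positive, 0 = negative), and the sketch assumes there are no tied scores; the KS value is taken as the largest gap between the cumulative true positive rate and false positive rate.

```r
# Hypothetical classifier scores and true labels (1 = positive, 0 = negative)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
labels <- c(1,   1,   0,   1,   0,   0)

# Sort records from highest to lowest score and sweep the threshold down
ord <- order(scores, decreasing = TRUE)
y   <- labels[ord]

tpr <- c(0, cumsum(y == 1) / sum(y == 1))   # sensitivity (Y axis)
fpr <- c(0, cumsum(y == 0) / sum(y == 0))   # 1 - specificity (X axis)

# Area under the ROC curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# KS statistic: maximum separation between the two cumulative rates
ks <- max(tpr - fpr)

print(round(c(AUC = auc, KS = ks), 3))
```

Plotting tpr against fpr with plot(fpr, tpr, type = "l") gives the ROC chart itself.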
7. Precision vs. recall: recall (sensitivity) tells us how the model performs with respect to false negatives (for example, customers who will default but are predicted not to), while precision tells us how the model performs with respect to false positives.
8. F-measure [measure of a test's accuracy]
= F1 score = 2*(Recall * Precision) / (Recall + Precision)
(F1 score and F score are alternate terms.)
(TN = 150 ; FP = 28 ; FN = 32 ; TP = 51)
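The measures defined on these slides can be evaluated for the case confusion matrix (TN = 150, FP = 28, FN = 32, TP = 51) with a short base R calculation. The variable names are illustrative.

```r
# Counts from the case confusion matrix
TP <- 51; FP <- 28; FN <- 32; TN <- 150

accuracy    <- (TP + TN) / (TP + FP + TN + FN)
recall      <- TP / (TP + FN)            # sensitivity, true positive rate
specificity <- TN / (TN + FP)            # true negative rate
precision   <- TP / (TP + FP)            # positive predictive value
f1          <- 2 * (recall * precision) / (recall + precision)

measures <- c(accuracy = accuracy, recall = recall,
              specificity = specificity, precision = precision, f1 = f1)
print(round(measures, 3))
```

For this matrix the accuracy is about 77%, while recall (about 61%) is noticeably lower than specificity (about 84%), so the model is better at recognizing negatives than positives.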
Performance measures
• Performance metrics

Performance measures Summary

Rules extracted from classification algorithms

The final classification rules are the actual classifier model used for prediction.
Challenge of Evaluation Metrics
1) Evaluation measures play a crucial role in both
assessing the classification performance and
guiding the classifier modeling.
2) In fact, the use of common metrics in imbalanced
domains can lead to sub-optimal classification
models and might produce misleading conclusions
since these measures are insensitive to skewed
domains.
 skewness is a measure of the asymmetry of the
probability distribution.
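A small made-up example of why accuracy alone can mislead on a skewed class distribution: with 990 negatives and 10 positives, a classifier that always answers "negative" looks excellent by accuracy and useless by recall. The data are invented purely for illustration.

```r
# Hypothetical imbalanced data: 10 positives, 990 negatives
actual <- c(rep("positive", 10), rep("negative", 990))

# A trivial classifier that always predicts the majority class
predicted <- rep("negative", length(actual))

accuracy <- mean(predicted == actual)
recall   <- sum(predicted == "positive" & actual == "positive") /
            sum(actual == "positive")

print(c(accuracy = accuracy, recall = recall))
```

This is why measures such as recall, precision, F-measure and AUC are examined alongside accuracy in imbalanced domains.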

ACTIVITY-11
Explore a classification problem case by
considering any real-world domain application,
formulate a confusion matrix through scenario
assumption for the classifier model and
investigate the various parameters to measure
the performance of the classifier model.

Extra Query
1. Investigate the four test options to perform
training and testing for Data analytic algorithms
based on the dataset. What do the four test
options mean and when do you use them?
2. Investigate the major challenges in context to
the performance measures of the classifier
models connecting to the real-world application
scenario.

Cheers For the Great Patience!
Query Please?

Matters of Discussion
More on Classification!!!!
classification and regression trees
Logistic Regression

Classification and Regression Tree (CART)
• CART is a term used to describe decision tree algorithms that are used for classification and regression learning tasks.
• Understanding decision trees is therefore essential to understanding classification and regression trees.
Decision Tree Induction: An Example
• Training data set: Buys_computer
• The data set follows an example of ID3
• Resulting tree: [figure]

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no
Decision Tree Example using ID3
• Extracting classification rules from the decision tree
   • If Age (31…40) Then Buys-Computer (Yes)
   • If Age (<=30) And Student (No) Then Buys-Computer (No)
   • If Age (<=30) And Student (Yes) Then Buys-Computer (Yes)
   • If Age (>40) And Cr-Rating (Excellent) Then Buys-Computer (No)
   • If Age (>40) And Cr-Rating (Fair) Then Buys-Computer (Yes)
REVIEW
 Machine learning algorithms can be classified
into two types- supervised and unsupervised.
 A decision tree is a supervised machine learning
algorithm.
 It has a tree-like structure with its root node at
the top.
The CART or Classification & Regression Trees
methodology refers to these two types of decision
trees.
1. Classification Trees
2. Regression Trees

1. Classification Trees
• A classification tree is an algorithm where the target variable is fixed or categorical.
• The algorithm is then used to identify the "class".
• Binary classification: an example of a classification-type problem would be determining
   • who will or will not subscribe to a digital platform;
   • who will or will not graduate from high school.
Cont..
Classification Trees: where the target variable
is categorical and the tree is used to identify
the "class" within which a target variable
would likely fall into.

2. Regression Trees
• A regression tree is an algorithm where the target variable is Y and the algorithm is used to predict its value based on the input parameter X. [Y = mX + epsilon (error)]
• As an example of a regression-type problem, you may want to predict the selling price of a residential house [Y]
• from independent variables [X] such as square footage, as well as categorical factors like the style of home, the area in which the property is located and so on.
• Regression trees: the target variable [Y] is continuous and the tree is used to predict its value.
• The CART algorithm is structured as a sequence of questions, the answers to which determine what the next question, if any, should be.
• The result of these questions is a tree-like structure whose ends are terminal nodes, at which point there are no more questions.
[Figure: example decision tree; sibsp = number of siblings or spouses]
Key aim of CART and results
• Create a set of if-else conditions that allow for the accurate prediction [predict the exact value] or classification of a case [class label].
• The results from classification and regression trees can be summarized in simple if-then conditions.
When to use Classification and Regression Trees
• Classification trees are used when the dataset needs to be split into classes that belong to the response variable. In many cases, the classes are Yes or No.
• Regression trees are used when the response variable is continuous.
• For instance, if the response variable is something like the price of a property or the temperature of the day, a regression tree is used.
HOW CART works [Example]

If the dependent variable [Y] is categorical, CART produces a classification tree. If the variable is continuous, it produces a regression tree.

Y = mX + C
THE KEY IDEA
• Take all of your data.
• Consider all possible values of all variables.
• Select the variable/value (X = t1) that produces the greatest "separation" in the target.
• (X = t1) is called a "split".
• If (X < t1), send the data point to the "left"; otherwise, send it to the "right".
• Now repeat the same process on these two "nodes".
• You get a "tree".
Note: CART only uses binary splits.
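A sketch of one such split search on a single numeric variable, in base R. "Separation" is measured here by weighted Gini impurity, which is an assumption of this example (the slide does not name the measure); the heights, labels and function names are invented for illustration.

```r
# Gini impurity of a vector of class labels (0 = perfectly pure node)
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Try every midpoint between consecutive distinct values of x as threshold t1
best_split <- function(x, y) {
  v    <- sort(unique(x))
  cand <- (head(v, -1) + tail(v, -1)) / 2
  score <- sapply(cand, function(t) {
    left  <- y[x <  t]   # X < t1 goes to the left node
    right <- y[x >= t]   # everything else goes to the right node
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  list(threshold = cand[which.min(score)], impurity = min(score))
}

# Hypothetical data: height in cm and a two-class target
height <- c(150, 160, 165, 170, 175, 182, 185, 190)
sex    <- c(rep("Female", 4), rep("Male", 4))

split1 <- best_split(height, sex)
print(split1)
```

A full CART implementation repeats this search over every variable, keeps the best split, and then recurses on the left and right nodes.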
CART model transformation example

CART model
If Height > 180 cm Then Male
If Height <= 180 cm AND Weight > 80 kg Then Male
If Height <= 180 cm AND Weight <= 80 kg Then Female
Make Predictions With CART Models

When the response variable has only 2 possible values, it is desirable to have a model that predicts the value either as 0 or 1 or as a probability score that ranges between 0 and 1.

Linear regression does not have this capability: if you use linear regression to model a binary response variable, the resulting model may not restrict the predicted Y values to the range 0 to 1.
NEXT: REGRESSION TREE-CART implementation in R
install.packages("rpart")       # run once
install.packages("rpart.plot")  # run once
install.packages("ggplot2")     # run once
library(rpart)
library(rpart.plot)
library(ggplot2)
data()        # list the available datasets
data(msleep)  # mammal sleep data shipped with ggplot2
str(msleep)   # view the structure of the dataset
# keep columns 3, 4, 6, 10, 11: vore, order, sleep_total, brainwt, bodywt
df <- msleep[ , c(3,4,6,10,11)]
str(df)       # view the structure of the new data frame
head(df)      # view the first rows of the table
# regression tree: sleep_total ~ all other columns of df
m1 <- rpart(sleep_total ~ ., data = df, method = "anova")

print(m1)

rpart.plot(m1, type = 3, digits = 3, fallen.leaves = TRUE)

p1 <- predict(m1, df)

print(p1)
Logistic Regression
• Strictly speaking, logistic regression is not a classification algorithm on its own.
• It becomes a classification algorithm only in combination with a decision rule that turns the predicted probabilities of the outcome into class labels.
Application analysis for Logistic Regression
• As an example, consider the task of predicting someone's gender (Male/Female) based on their weight and height.
• For this, we will train a machine learning model from a data set of 10,000 samples of people's weight and height.
Logistic Regression for R Implements
 The Logistic Regression is a regression model
in which the response variable (dependent
variable) has categorical values such as
True/False or 0/1.
 It actually measures the probability of a
binary response as the value of response
variable based on the mathematical equation
relating it with the predictor variables.

Cont..
The general mathematical equation for logistic regression is the sigmoid function:
y = 1/(1+e^-(a+b1x1+b2x2+b3x3+...))
• y is the response variable.
• x is the predictor variable.
• a and b are the coefficients, which are numeric constants.
• e is Euler's number, the base of the natural logarithm.
The function used to create the regression model is glm().
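The equation can be evaluated directly in R. In this sketch the coefficient values a, b1, b2 and the inputs x1, x2 are hypothetical numbers chosen only to show the arithmetic; in practice glm() estimates the coefficients from data.

```r
# Sigmoid (logistic) function: maps any real number into (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical coefficients and predictor values (not estimated from data)
a  <- -1.5
b1 <-  0.8
b2 <- -0.3
x1 <-  2
x2 <-  1

z <- a + b1 * x1 + b2 * x2   # linear part: a + b1x1 + b2x2
y <- sigmoid(z)              # predicted probability of the "1" outcome

print(z)
print(round(y, 4))
```

Whatever the value of the linear part, the result always lies strictly between 0 and 1, which is what makes it usable as a probability.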
Cont..
glm(formula, data, family)
• formula is the symbolic description of the relationship between the variables.
• data is the data set giving the values of these variables.
• family is an R object that specifies the details of the model. Its value is binomial for logistic regression.
Logistic Regression
• Logistic regression is a binary classification algorithm.
• You can implement it using the glm() function by setting the family argument to "binomial".
# Step 1: Build the logit model on the training dataset
logitMod <- glm(Y ~ X1 + X2, family = "binomial", data = trainingData)
# Step 2: Predict Y on the test dataset
predictedY <- predict(logitMod, testData, type = "response")
Cont..
# Select some columns from mtcars.
input <- mtcars[,c("am","cyl","hp","wt")]
print(head(input))

                  am cyl  hp    wt
Mazda RX4          1   6 110 2.620
Mazda RX4 Wag      1   6 110 2.875
Datsun 710         1   4  93 2.320
Hornet 4 Drive     0   6 110 3.215
Hornet Sportabout  0   8 175 3.440
Valiant            0   6 105 3.460
Cont..
input <- mtcars[,c("am","cyl","hp","wt")]
am.data = glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)
print(summary(am.data))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.70288    8.11637   2.428   0.0152 *
cyl          0.48760    1.07162   0.455   0.6491
hp           0.03259    0.01886   1.728   0.0840 .
wt          -9.14947    4.15332  -2.203   0.0276 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Result analysis
• In the summary, the p-value in the last column is more than 0.05 for the variables "cyl" and "hp", so we consider them to be insignificant in contributing to the value of the variable "am".
• Only weight (wt) impacts the "am" value in this regression model.
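To use the fitted model as a classifier, the predicted probabilities need a decision rule. The sketch below refits the same mtcars model and applies a 0.5 cut-off, which is an assumption of this example (the slides do not fix a threshold); it evaluates on the training data only, so the accuracy it prints is optimistic.

```r
# Same model as on the previous slide (mtcars ships with base R)
input   <- mtcars[, c("am", "cyl", "hp", "wt")]
am.data <- glm(am ~ cyl + hp + wt, data = input, family = binomial)

# Predicted probability that am = 1 for each car
prob <- predict(am.data, input, type = "response")

# Decision rule (assumed): classify as 1 when the probability exceeds 0.5
pred <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix and accuracy on the training data
conf_mat <- table(actual = input$am, predicted = pred)
print(conf_mat)
print(mean(pred == input$am))
```

Changing the cut-off trades false positives against false negatives, which is what the ROC chart summarizes.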

Logistic Regression as classification algorithm

Analysis
• This method seeks to simplify the model during training by minimizing the coefficients learned by the model.
• The ridge parameter defines how much pressure to put on the algorithm to reduce the size of the coefficients.
• With the default configuration, logistic regression achieves an accuracy of 88%.
ACTIVITY-12 (LAB-06)
Investigate, step by step, the implementation of the logistic regression algorithm for any application, and analyze the results in detail.
Cheers For the Great Patience!
Query Please?

[4.1-PART-01; 4.1-PART-02]
1. Cosine similarity: measuring the distance between two records
2. Jaccard distance
3. Distance measures for attributes
4. Measuring the distance between two clusters: hierarchical clustering
4.1-PART-01
Cluster analysis

What is Cluster Analysis?
• Cluster: a collection of data objects that are
   • similar (or related) to one another within the same group
   • dissimilar (or unrelated) to the objects in other groups
• Cluster analysis (or clustering, data segmentation, …)
   • Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters
Cont..
• Unsupervised learning: no predefined classes (i.e., learning by observations, as opposed to supervised learning by examples)
• Typical applications
   • As a stand-alone tool to get insight into data distribution
   • As a preprocessing step for other algorithms
Clustering for Data Understanding and
Applications
 Biology: taxonomy of living things: kingdom,
class, order, family, genus and species
 Information retrieval: document clustering
 Land use: Identification of areas of similar land
use in an earth observation database
 Marketing: Help marketers discover distinct
groups in their customer bases, and then use
this knowledge to develop targeted marketing
programs

Cont..
• City planning: identifying groups of houses according to their house type, value, and geographical location
• Earthquake studies: observed earthquake epicenters should be clustered along continent faults
• Climate: understanding earth climate, finding patterns of atmospheric and ocean
• Economic science: market research
Clustering as a Preprocessing Tool (Utility)
• Summarization:
   • Preprocessing for regression, PCA, classification, and association analysis
• Compression:
   • Image processing: vector quantization
• Finding K-nearest neighbors
   • Localizing search to one or a small number of clusters
• Outlier detection
   • Outliers are often viewed as those "far away" from any cluster
Quality: What Is Good Clustering?
 A good clustering method will produce high quality
clusters
 high intra-class similarity: cohesive within clusters
 low inter-class similarity: distinctive between clusters
 The quality of a clustering method depends on
 the similarity measure used by the method
 its implementation, and
 Its ability to discover some or all of the hidden
patterns

Cosine similarity:
• Imagine that you need to determine how similar two documents or corpora of text are.
• Which distance metric will you use?
• A common answer is cosine similarity.
• To calculate it, we measure the cosine of the angle between the two vectors: cosine similarity is their normalized dot product.
Cosine Similarity in Data mining Apps
• A document can be represented by thousands of attributes, each recording the frequency of a particular word (such as keywords) or phrase in the document.
• Other vector objects: gene features in micro-arrays, …
• Applications: information retrieval, biologic taxonomy, gene feature mapping, ...
• Cosine measure: if d1 and d2 are two vectors (e.g., term-frequency vectors), then
cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||),
where · indicates the vector dot product and ||d|| is the length of vector d
Example: Cosine Similarity
• cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||),
where · indicates the vector dot product and ||d|| is the length of vector d

• Ex: Find the similarity between documents 1 and 2.

d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)

d1 · d2 = 5*3+0*0+3*2+0*0+2*1+0*1+0*0+2*1+0*0+0*1 = 25
||d1|| = (5*5+0*0+3*3+0*0+2*2+0*0+0*0+2*2+0*0+0*0)^0.5 = (42)^0.5 = 6.481
||d2|| = (3*3+0*0+2*2+0*0+1*1+1*1+0*0+1*1+0*0+1*1)^0.5 = (17)^0.5 = 4.12
cos(d1, d2) = 25 / (6.481 * 4.12) = 0.94
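The same calculation in base R, as a quick check of the worked example. The function name cosine_similarity is illustrative.

```r
# Term-frequency vectors of the two documents from the example
d1 <- c(5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 <- c(3, 0, 2, 0, 1, 1, 0, 1, 0, 1)

# Dot product divided by the product of the vector lengths
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

print(sum(d1 * d2))                       # dot product
print(round(sqrt(sum(d1^2)), 3))          # ||d1||
print(round(sqrt(sum(d2^2)), 3))          # ||d2||
print(round(cosine_similarity(d1, d2), 2))
```

A value close to 1 means the two documents use the keywords in nearly the same proportions, regardless of document length.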
Jaccard similarity/Jaccard distance
• Similarity is computed through distance.
• A set is an unordered collection of objects.
• So, for example, {1, 2, 3, 4} is equal to {2, 4, 3, 1}.
• We can calculate its cardinality (represented as |set|), which is simply the number of elements contained in the set.
Cont..
 Let’s say we have two sets of objects, A and
B. We wonder how many elements they have
in common. This is called Intersection. It is
represented mathematically as A ∩ B.
 Maybe, we want to get all items regardless of
the set they belong to. This is called Union. It
is represented mathematically as A ∪ B.

Cont..
• How does this relate to Jaccard similarity?
• Jaccard similarity is defined as the cardinality of the intersection of
two sets divided by the cardinality of their union:
Jaccard similarity = |A ∩ B| / |A ∪ B|
• Jaccard distance is its complement: 1 − Jaccard similarity.
• It can only be applied to finite sample sets.
• Imagine we have the set A = {“flower”, “dog”, “cat”, 1, 3} and
B = {“flower”, “cat”, “boat”}. Then |A ∩ B| = 2 and |A ∪ B| = 6.
As a result, the Jaccard similarity is 2/6 = 0.333.
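The same calculation can be sketched with Python's built-in set type. The function name `jaccard_similarity` is illustrative, and the distance is taken as 1 minus the similarity, which is the usual definition.

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two finite collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        raise ValueError("Jaccard similarity is undefined for two empty sets")
    return len(a & b) / len(union)

# Sets from the example slide
A = {"flower", "dog", "cat", 1, 3}
B = {"flower", "cat", "boat"}

similarity = jaccard_similarity(A, B)
distance = 1 - similarity  # Jaccard distance

print(round(similarity, 3), round(distance, 3))
```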

DISTANCE MEASURES FOR ATTRIBUTES

[A-13]: PRACTICE FOR YOU
1. A document can be represented by thousands of attributes, each recording
the frequency of a particular word (such as a keyword) or phrase in the
document.
Formulate a scenario with at least 6 documents and 12 keywords; keyword
frequencies may be assigned randomly. Compute the similarity among those
documents.
2. Formulate a real-world scenario that uses Jaccard similarity and
compute the similarity between two object sets in that scenario.
3. Consider a relation having 5 attributes and 6 records and
compute the distance between those records or instances.
4.1-PART-02

Hierarchical Clustering
• Create a hierarchical decomposition of the set of data (or objects)
using some criterion.
• Use a distance matrix as the clustering criterion.
• This method does not require the number of clusters k as an input,
but it needs a termination condition.

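Cont..
[Figure: agglomerative clustering (AGNES) runs from Step 0 to Step 4, merging
a and b into ab, d and e into de, c and de into cde, and finally ab and cde
into abcde. Divisive clustering (DIANA) runs the same steps in reverse,
starting from abcde at its Step 0 and ending with single objects at its Step 4.]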

Hierarchical Clustering (Cont’d)
Two main types of hierarchical clustering
• Agglomerative:
   • Start with the points as individual clusters
   • At each step, merge the closest pair of clusters until only one
   cluster (or k clusters) is left
• Divisive:
   • Start with one, all-inclusive cluster
   • At each step, split a cluster until each cluster contains a single
   point (or there are k clusters)
• Traditional hierarchical algorithms use a similarity or distance matrix

Hierarchical Clustering (agglomerative)
• First, assign each object (data point) to a separate cluster.
• Then compute the distance (similarity) between each pair of clusters
and join the two most similar clusters; repeat until the termination
condition is met.
• Dendrogram: a tree-like diagram that records the sequence of merges or
splits.

Example problem

Consider the one-dimensional data set {7, 10, 20, 28, 35}. Perform
hierarchical clustering and plot the dendrogram to visualize it.

Observation:
First, let’s visualize the data.
Observing the plot above, we can intuitively
conclude that:
• The first two points (7 and 10) are close to
each other and should be in the same cluster
• Also, the last two points (28 and 35) are close
to each other and should be in the same
cluster
• The cluster of the center point (20) is not
obvious.

Solution:
• Let’s solve the problem by hand using two types of agglomerative
hierarchical clustering: single linkage and complete linkage.

1. Single Linkage: In single-link hierarchical clustering, at each step we
merge the two clusters whose two closest members have the smallest distance.

Dendrogram
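Solution analysis

With single linkage, 7 and 10 merge first (distance 3), then 28 and 35
(distance 7). The point 20 then joins (28,35), because its nearest member
there (28) is 8 away, while its nearest member of (7,10) is 10 away.
Stopping at two clusters gives:

Cluster 1 : (7,10)

Cluster 2 : (20,28,35)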

2. Complete Linkage:

In complete-link hierarchical clustering, at each step we merge the two
clusters whose merger has the smallest maximum pairwise distance, that is,
the smallest distance between their two farthest members.

Dendrogram
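Analysis
With complete linkage, 7 and 10 merge first (distance 3), then 28 and 35
(distance 7). The point 20 is then at maximum distance 13 from (7,10) and
15 from (28,35), so it joins (7,10). Stopping at two clusters gives:

Cluster 1 : (7,10,20)

Cluster 2 : (28,35)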
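Both hand computations can be reproduced with a small brute-force routine. The sketch below is a plain-Python illustration for one-dimensional data, not the R `hclust()` routine used later; the function name `agglomerative_1d` and its parameters are illustrative.

```python
def agglomerative_1d(points, linkage="single", n_clusters=2):
    """Merge 1-D points bottom-up until n_clusters remain.

    Returns the final clusters and the list of merge heights.
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    clusters = [[p] for p in sorted(points)]
    heights = []

    def cluster_distance(c1, c2):
        pairwise = [abs(a - b) for a in c1 for b in c2]
        # single: nearest pair of members; complete: farthest pair of members
        return min(pairwise) if linkage == "single" else max(pairwise)

    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        clusters.sort()
        heights.append(d)
    return clusters, heights

data = [7, 10, 20, 28, 35]
single_clusters, single_heights = agglomerative_1d(data, "single")
complete_clusters, complete_heights = agglomerative_1d(data, "complete")
print(single_clusters, single_heights)
print(complete_clusters, complete_heights)
```

The merge heights are the values a dendrogram would show on its vertical axis; the two linkages differ only in the height at which the point 20 is absorbed and in which cluster absorbs it.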

Conclusion:
• Hierarchical clustering is mostly used when the application requires a
hierarchy, e.g., creation of a taxonomy.
• However, it is expensive in terms of computational and storage
requirements.

Performing Hierarchical Clustering on a Dataset [R-Implements]
• Complete-linkage (farthest neighbor)
   • distance is measured between the farthest pair of observations in
   two clusters.
• Single-linkage (nearest neighbor)
   • distance is measured between the nearest pair of observations in
   two clusters.

Cont..
• Average-linkage (average distance)
   • The distances between every pair of observations, one from each
   cluster, are added up and divided by the number of pairs to get an
   average inter-cluster distance.
Ex: C1 = (7, 10)
C2 = (20)
Distance = [(20-7) + (20-10)]/2 = (13+10)/2 = 11.5
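The average-linkage distance in the example can be verified with a short sketch; the helper name `average_linkage` is illustrative.

```python
def average_linkage(c1, c2):
    """Average of all pairwise distances between two 1-D clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

distance = average_linkage([7, 10], [20])
print(distance)
```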

REVIEW

Dendrogram
R-implements using Average-linkage Hierarchical
clustering
• hclust() is part of the stats package, which ships with R, so it
needs no extra installation.

# Installing the package (optional: hclust() itself does not need dplyr)
install.packages("dplyr")

# Loading package
library(dplyr)

# First rows of the built-in mtcars dataset
head(mtcars)

# Finding distance matrix
distance_mat <- dist(mtcars, method = 'euclidean')
distance_mat

# Fitting hierarchical clustering model to the dataset
# (hclust() is deterministic, so the seed does not change the result)
set.seed(240) # Setting seed
Hierar_cl <- hclust(distance_mat, method = "average")
Hierar_cl

# Plotting dendrogram
plot(Hierar_cl)

# Choosing no. of clusters
# Marking a cut height on the dendrogram
abline(h = 110, col = "green")

# Cutting tree by no. of clusters
fit <- cutree(Hierar_cl, k = 3)
fit

# Number of observations in each cluster
table(fit)

# Outlining the k = 3 clusters on the dendrogram
rect.hclust(Hierar_cl, k = 3, border = "green")
Model Hierar_cl:
In the model, the cluster method is average, the distance is
Euclidean and the number of objects is 32.

Plot dendrogram:
The dendrogram is plotted with the observations along the x-axis (labelled
with the name of the distance matrix, distance_mat) and the merge height on
the y-axis.
Cut tree:
The tree is cut at k = 3; table(fit) reports the number of observations
assigned to each of the three clusters.
Plotting dendrogram after cutting:
The plot shows the dendrogram after the cut. The green rectangles outline
the three clusters.
A-14: LAB-07
Consider a dataset without any class label and
implement the hierarchical clustering algorithm.
Try to analyze and understand the results.

TASK FOR YOU [A15]
Consider the one-dimensional data set {8, 11, 21, 29, 36}.
Perform hierarchical clustering to decide the
clusters using both single linkage and complete
linkage analysis and plot the respective
dendrograms to visualize them.
Give some key inferences that distinguish the
two analyses.

Cheers For the Great Patience!
Query Please?

Matters of Discussion

Non-hierarchical Clustering
K-means Algorithm
K-Medoids

K-means Algorithm
• K-means is an iterative algorithm that tries to partition the dataset
into K pre-defined, distinct, non-overlapping subgroups (clusters), where
each data point belongs to only one group.
• The algorithm is used when you have unlabeled data.
• The goal is to find K groups based on some kind of similarity in the
data.

Cont.. Key point
• It assigns data points to clusters such that the sum of the squared
distances between the data points and their cluster’s centroid is at
the minimum.
• A cluster’s centroid is the arithmetic mean of all the data points that
belong to that cluster.

K-Means Clustering Method
 Given k, the k-means algorithm is implemented in four
steps:
1. Partition objects into k nonempty subsets
2. Compute seed points as the centroids of the
clusters of the current partitioning (the centroid is
the center, i.e., mean point, of the cluster)
3. Assign each object to the cluster with the nearest
seed point
4. Go back to Step 2, stop when the assignment does
not change
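The four steps can be sketched directly in Python. This is a minimal illustration under stated assumptions: the initial partition is a simple round-robin split, the distance is squared Euclidean, and the six 2-D points are made up for the demonstration. The names `kmeans`, `mean_point` and `squared_distance` are illustrative.

```python
def mean_point(points):
    """Centroid (coordinate-wise mean) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    # Step 1: partition objects into k nonempty subsets (round-robin split)
    assignment = [i % k for i in range(len(points))]
    centroids = []
    for _ in range(max_iter):
        # Step 2: compute the centroid of each current cluster
        centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if not members:
                raise ValueError("a cluster became empty; try another initial partition")
            centroids.append(mean_point(members))
        # Step 3: assign each object to the cluster with the nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop when the assignment does not change, else repeat
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, centroids

# Hypothetical 2-D points forming two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
print(centroids)
```

The round-robin start deliberately mixes the two groups, so the first reassignment step does the real work and the second pass only confirms that nothing changes.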

An Example of K-Means Clustering (K = 2)
[Figure: the initial data set is arbitrarily partitioned into k groups, the
cluster centroids are updated, and objects are reassigned; the update and
reassignment steps loop if needed.]
• Partition objects into k nonempty subsets
• Repeat
   • Compute centroid (i.e., mean point) for each partition
   • Assign each object to the cluster of its nearest centroid
• Until no change
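Key Analysis
Finally, this algorithm aims at minimizing an objective function, the sum of
squared distances between each data point and its cluster center:

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2$$

Where,
$x_i^{(j)}$ = data point i assigned to cluster j
$c_j$ = cluster center
n = number of data points
k = number of clusters
$\| x_i^{(j)} - c_j \|^2$ = squared distance between a data point $x_i^{(j)}$
and cluster center $c_j$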
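As a small illustration of this objective, the sketch below sums the squared distances of one-dimensional points to the mean of their own cluster. The function name `objective` and the two example partitions are hypothetical.

```python
def objective(clusters):
    """Sum of squared distances of 1-D points to their own cluster mean."""
    total = 0.0
    for members in clusters:
        center = sum(members) / len(members)  # centroid c_j
        total += sum((x - center) ** 2 for x in members)
    return total

# Two candidate partitions of the same made-up points
good_partition = [[1, 2, 3], [10, 12]]
poor_partition = [[1, 2], [3, 10, 12]]

j_good = objective(good_partition)
j_poor = objective(poor_partition)
print(j_good, j_poor)
```

K-means prefers the partition with the smaller value, which here is the one that keeps the points near 1 to 3 together.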
K-means: Method-01
Using K-means clustering, cluster the following data into two clusters
and show each step.
{2, 4, 10, 12, 3, 20, 30, 11, 25}

Solution:
Given: {2, 4, 10, 12, 3, 20, 30, 11, 25}
Step 1: Assign alternate values to the two clusters K1 and K2.

Step 2: k1 = {2, 10, 3, 30, 25} Mean value (centroid C1) = 14
k2 = {4, 12, 20, 11} Mean value (centroid C2) = 11.75

Step 3: Reassign each value to the cluster with the nearest centroid,
k1 = {20, 30, 25} Mean value = 25
k2 = {2, 4, 10, 12, 3, 11} Mean value = 7

Step 4: Reassign the values again,
k1 = {20, 30, 25} Mean value = 25
k2 = {2, 4, 10, 12, 3, 11} Mean value = 7

Steps 3 and 4 give the same clusters (no change), so the algorithm stops.

Method-1 Analysis
• Computation to move from Step-2 to Step-3.
• In Step-2, the cluster centroid values are as follows:
   • C1 = 14 for cluster K1
   • C2 = 11.75 for cluster K2
• Now take each data point from clusters K1 and K2, compute its distance
from C1 and C2, and assign the data point to the cluster (K1 or K2) with
the minimum distance.
• E.g., for data point ‘2’: min(|2 − 14|, |2 − 11.75|) = min(12, 9.75) = 9.75,
so data point ‘2’ is assigned to cluster K2 with centroid C2 [K2/C2].
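The move from Step-2 to Step-3 can be checked with a short sketch: split the data alternately, compute the two centroids, then reassign every point to the nearer centroid. Variable names are illustrative.

```python
points = [2, 4, 10, 12, 3, 20, 30, 11, 25]

# Step 1-2: alternate values go to K1 and K2, then compute the centroids
k1 = points[0::2]          # [2, 10, 3, 30, 25]
k2 = points[1::2]          # [4, 12, 20, 11]
c1 = sum(k1) / len(k1)     # centroid C1
c2 = sum(k2) / len(k2)     # centroid C2

# Step 3: each point moves to the cluster whose centroid is nearer
new_k1 = [x for x in points if abs(x - c1) < abs(x - c2)]
new_k2 = [x for x in points if abs(x - c1) >= abs(x - c2)]

print(c1, c2)
print(new_k1, new_k2)
```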

K-means: Method-02
{2, 4, 10, 12, 3, 20, 30, 11, 25}

Step 1: Randomly assign the means: m1 = 3, m2 = 4

Step 2: Numbers closer to mean m1 = 3 are grouped into cluster k1, and

numbers closer to mean m2 = 4 are grouped into cluster k2

Step 3: k1 = {2, 3}, k2 = {4, 10, 12, 20, 30, 11, 25}, m1= 2.5, m2 = 16

Step 4: k1 = {2, 3, 4}, k2 = {10, 12, 20, 30, 11, 25}, m1= 3 m2 = 18

Step 5: k1 = {2, 3, 4, 10}, k2= {12, 20, 30, 11, 25}, m1= 4.75, m2 = 19.6

Step 6: k1 = {2, 3, 4, 10, 11, 12}, k2 = {20, 30, 25}, m1 = 7, m2 = 25

Step 7: k1 = {2, 3, 4, 10, 11, 12}, k2 = {20, 30, 25}, m1 = 7, m2 = 25

Step 8: Stop. The clusters in step 6 and 7 are same.

Final answer: k1 = {2, 3, 4, 10, 11, 12} and k2 = {20, 30, 25}
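The iterations of Method-02 can be traced with a sketch that starts from the two initial means instead of an initial partition. The function name `kmeans_1d` and the `history` list are illustrative.

```python
def kmeans_1d(points, means, max_iter=100):
    """1-D k-means from given initial means; returns clusters, means, history."""
    means = list(means)
    clusters = None
    history = []
    for _ in range(max_iter):
        # Assign every point to the cluster with the nearest current mean
        new_clusters = [[] for _ in means]
        for x in points:
            nearest = min(range(len(means)), key=lambda i: abs(x - means[i]))
            new_clusters[nearest].append(x)
        if new_clusters == clusters:
            break  # the clusters did not change, so stop
        if any(not c for c in new_clusters):
            raise ValueError("a cluster became empty; choose other initial means")
        clusters = new_clusters
        means = [sum(c) / len(c) for c in clusters]
        history.append((clusters, means))
    return clusters, means, history

points = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, means, history = kmeans_1d(points, means=[3, 4])

for step, (cl, m) in enumerate(history, start=1):
    print(step, cl, m)
```

Each entry in `history` corresponds to one of Steps 3 to 6 above; the loop ends when a further pass reproduces the same clusters.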

Applications of K-Means Clustering:
k-means can be applied to data that is numeric, continuous, and has a
relatively small number of dimensions. Example applications include
document clustering, identifying crime-prone areas, customer segmentation,
insurance fraud detection, public transport data analysis, and
clustering of IT alerts.

Comments on the K-Means Method
Strength:
• Efficient: O(tkn), where n is # objects, k is # clusters, and t
is # iterations. Normally, k, t << n.
Weakness:
• Need to specify k, the number of clusters, in advance
• Sensitive to noisy data and outliers

Evaluation of Cluster Quality using Purity
• Quality is measured by the ability of the clustering to discover some or
all of the hidden patterns or latent classes in gold standard data
• Purity assesses a clustering with respect to ground truth, so it
requires labeled data
• Assume documents with C gold standard classes, while our clustering
algorithm produces K clusters, ω1, ω2, …, ωK, where cluster ωi has ni members
• Simple measure: purity, the ratio between the number of members of the
dominant class πi in cluster ωi and the size ni of cluster ωi

Purity in Clusters: example
[Figure: three clusters of colored points, labeled Cluster I, Cluster II and
Cluster III.]

• Assume that we cluster three categories of data items (those colored
red, blue and green) into three clusters as shown in the figure.
Calculate purity to measure the quality of each cluster.
Cluster I: Purity = 1/6 (max(5, 1, 0)) = 5/6 = 83%
Cluster II: Purity = 1/6 (max(1, 4, 1)) = 4/6 = 67%
Cluster III: Purity = 1/5 (max(2, 0, 3)) = 3/5 = 60%
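The per-cluster purities can be recomputed from the class counts in the example. The sketch below also reports an overall purity, taken as the sum of the dominant-class counts divided by the total number of items; that overall figure is a common summary and is an addition here, not a value from the slide.

```python
# Class counts per cluster, in the order used in the example
cluster_counts = {
    "Cluster I": (5, 1, 0),
    "Cluster II": (1, 4, 1),
    "Cluster III": (2, 0, 3),
}

def purity(counts):
    """Share of the cluster taken by its dominant class."""
    return max(counts) / sum(counts)

per_cluster = {name: purity(c) for name, c in cluster_counts.items()}

# Overall purity: dominant-class members over all clustered items
total_items = sum(sum(c) for c in cluster_counts.values())
overall = sum(max(c) for c in cluster_counts.values()) / total_items

for name, value in per_cluster.items():
    print(name, round(value, 2))
print("overall", round(overall, 2))
```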
R-Implements
• Base R (the stats package) has a function to run the k-means
algorithm. The basic call is:
kmeans(df, k)
arguments:
-df: dataset used to run the algorithm (passed as the x argument)
-k: number of clusters (passed as the centers argument)
• K-means clustering can handle larger datasets
than hierarchical clustering approaches.
• Unlike hierarchical clustering, K-means
clustering requires that the number of clusters
to extract be specified in advance.

Package   Objective         Function            Argument
base      Train k-means     kmeans()            df, k
          Access cluster    kmeans()$cluster
          Cluster centers   kmeans()$centers
          Cluster sizes     kmeans()$size
Comments on the K-Means Method
Limitation
• Applicable only when the mean is defined; what about categorical data?
• Need to specify k, the number of clusters, in advance
• Unable to handle noisy data and outliers

• The k-means algorithm is sensitive to outliers, since an object with an
extremely large value may substantially distort the distribution of the data.

IS8101 Compiled by Kindie B. (Ph.D) 18


The K-Medoids Clustering Method
• Find representative objects, called medoids, in clusters
• PAM (Partitioning Around Medoids)
   • starts from an initial set of medoids and iteratively replaces
   one of the medoids by one of the non-medoids if it improves
   the total distance of the resulting clustering.
   • PAM works effectively for small data sets, but does not scale
   well for large data sets
• CLARA (CLustering LARge Applications)
• CLARANS (CLustering LArge Applications based upon
RANdomized Search)
• K-Medoids: Instead of taking the mean value
of the object in a cluster as a reference point,
medoids can be used, which is the most
centrally located object in a cluster.

Cluster Computation using K-Medoids through a dataset scenario

Graph is drawn using the above data points


Step 1: Select k = 2 and randomly choose two medoids; let C1 = (4, 5) and
C2 = (8, 5) be the two medoids.
Step 2: Calculate the cost. The dissimilarity of each non-medoid point with
each medoid is calculated and tabulated.
Each point is assigned to the cluster of the medoid to which its
dissimilarity is smaller.
Points 1, 2, 5 go to cluster C1 and points 0, 3, 6, 7, 8 go to cluster C2.
Cost = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20
Step 3: Randomly select one non-medoid point and recalculate the cost.
Let the randomly selected point be (8, 4). The dissimilarity of each
non-medoid point with the medoids C1 (4, 5) and C2 (8, 4) is calculated
and tabulated.
Each point is assigned to the cluster whose medoid is less dissimilar.
So, the points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
New cost = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
Swap cost = New cost − Previous cost = 22 − 20 = 2 > 0
As the swap cost is not less than zero, we undo the swap. Hence (4, 5) and
(8, 5) are the final medoids, with the cluster assignments from Step 2.
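The swap test in Step 3 can be sketched as follows. The points below are hypothetical and are not the data set used in the slides, and Manhattan distance is assumed as the dissimilarity measure; the function names are illustrative.

```python
def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def total_cost(points, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

def try_swap(points, medoids, old, candidate):
    """Replace medoid `old` by `candidate` only if the total cost decreases."""
    new_medoids = [candidate if m == old else m for m in medoids]
    swap_cost = total_cost(points, new_medoids) - total_cost(points, medoids)
    if swap_cost < 0:
        return new_medoids, swap_cost   # keep the swap
    return medoids, swap_cost           # undo the swap

# Hypothetical 2-D points (assumption, for illustration only)
points = [(1, 1), (2, 1), (1, 2), (6, 6), (7, 6), (6, 7), (9, 9)]
medoids = [(1, 1), (9, 9)]

initial_cost = total_cost(points, medoids)
accepted, accepted_cost = try_swap(points, medoids, old=(9, 9), candidate=(6, 6))
rejected, rejected_cost = try_swap(points, accepted, old=(6, 6), candidate=(9, 9))

print(initial_cost, accepted, accepted_cost)
print(rejected, rejected_cost)
```

A negative swap cost means the candidate is a better representative and is kept; a zero or positive swap cost means the swap is undone, as in the worked example.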

Research Methodology

[A-16]: LAB-08
Consider a dataset without any class label and
implement the K-means algorithm.
Try to analyze and understand the final results. Also,
interpret the final result to estimate the accuracy.

TASK FOR YOU [A17]
1. Using K-means clustering, cluster the
following data into two clusters and show each
step.
{3, 5, 10, 13, 4, 21, 31, 12, 26}.
Give your step-by-step computational analysis.
2. Formulate any four cluster scenarios and
calculate purity to measure the quality of each
cluster.
3. Investigate the computational processes of the
K-medoids algorithm with a suitable scenario.
Cheers For the Great Patience!
Query Please?

Matters of Discussion
Associative Prediction:

Frequent pattern Mining[rules],

Utility item set mining,

Association Rules – Association Algorithms

Association Rule Mining
• Association mining searches for frequent
items in the data set.
• Frequent pattern mining usually finds the interesting
associations and correlations between item
sets in transactional and relational databases.
• In short, frequent pattern mining shows which items
appear together in a transaction or relation.

Association Rule Mining Task

Cont..
• Association Rule (AR) Mining: finding frequent patterns,
associations, correlations, or causal structures among sets of
items or objects.
• Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set.
• Association Rule (AR) discovery is often referred to as Market
Basket Analysis (MBA), and is also referred to as Affinity
Grouping.

Motivation
Finding inherent regularities in data
• What products were often purchased together?
• What are the subsequent purchases after buying a PC?
• What kinds of DNA are sensitive to this new drug?
• Can we automatically classify web documents?

Association Rule- Basic Concepts
• Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other items
in the transaction.

Association Rule- Basic . . . (cont’d)
• Support (or utility) for an association rule X → Y is the
percentage of transactions in the database that contain both X and Y.
• Confidence (or certainty) for an association rule X → Y is the
ratio of the number of transactions that contain both X and Y to
the number of transactions that contain X.
• Association rule form:
Antecedent → Consequent [support, confidence]

Association Rule- Basic . . . (cont’d)

Support count(X → Y) = number of transactions that contain both X and Y

$$\text{Support}(X \rightarrow Y) = \frac{\text{number of tuples containing both } X \text{ and } Y}{\text{total number of tuples}}$$

$$\text{Confidence}(X \rightarrow Y) = \frac{\text{number of tuples containing both } X \text{ and } Y}{\text{number of tuples containing } X}$$
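The two ratios can be illustrated with a small sketch. The five market-basket transactions below are hypothetical and serve only to exercise the formulas; the function names are illustrative.

```python
# Hypothetical transactions (assumption, for illustration only)
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(x, y, transactions):
    """Fraction of all transactions that contain both X and Y."""
    return support_count(set(x) | set(y), transactions) / len(transactions)

def confidence(x, y, transactions):
    """Transactions with both X and Y divided by transactions with X."""
    return (support_count(set(x) | set(y), transactions)
            / support_count(x, transactions))

X, Y = {"milk", "diapers"}, {"beer"}
rule_support = support(X, Y, transactions)
rule_confidence = confidence(X, Y, transactions)
print(rule_support, round(rule_confidence, 3))
```

For this rule, two of the five transactions contain all three items and three contain the antecedent, so the rule would be written {milk, diapers} → {beer} [0.4, 0.667].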

Apriori Algorithm
• Apriori is a seminal algorithm proposed by R. Agrawal and R.
Srikant in 1994 for mining frequent item sets for Boolean
association rules.
• It uses prior knowledge of frequent item set properties.
• Apriori employs an iterative approach known as a level-wise
search, where frequent k-item-sets are used to explore (k+1)-item-sets.
• Apriori pruning principle: if any item set is infrequent, its
supersets should not be generated or tested.

Computational Steps of Apriori Algorithm
• Method:
1. Initially, scan the DB once to get the frequent 1-itemsets
2. Generate length (k+1) candidate item sets from
length k frequent item sets
3. Test the candidates against the DB
4. Terminate when no frequent or candidate set can
be generated

Apriori Algorithm
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
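The pseudocode can be turned into a compact runnable sketch. This is one straightforward reading of the level-wise search, written for clarity rather than speed; the function name `apriori`, the use of an absolute minimum support count, and the five sample transactions are assumptions made for the illustration.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the database per level
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # L1: frequent 1-itemsets
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support_count}

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        # Join step: unions of two frequent k-itemsets that have size k + 1
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k + 1:
                # Prune step: every k-subset of a candidate must be frequent
                if all(frozenset(s) in current for s in combinations(union, k)):
                    candidates.add(union)
        # Test the candidates against the database
        current = {c: n for c, n in count(candidates).items()
                   if n >= min_support_count}
        k += 1
    return frequent

# Hypothetical transactions (assumption, for illustration only)
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C", "D"},
]
frequent = apriori(transactions, min_support_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Here item D is dropped at level 1, all three pairs survive level 2, and the triple {A, B, C} is generated as a candidate but fails the support test, so the search stops.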

Apriori Algorithm Example

Frequent Item set Generation

Apriori Algorithm Example
• Exercise 1: Find the frequent itemsets where the minimum support
count = 2.

TID    Items purchased
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Practice FOR YOU
Extra Investigation Task For You
Investigate the computational steps of any
associative prediction algorithm (for example, the
Apriori algorithm) through an appropriate problem
scenario in the context of an application.
LAB
Investigate any real world problem to
Implement the Apriori Algorithm for
Association Rule Learning and prepare your
Lab Report……

LAB CONT..
• 1. Start the Tool
• 2. Load the Datasets
• 3. Discover Association Rules
The “Apriori” algorithm will already be selected.
It is the best-known association rule learning
method, introduced by Agrawal and Srikant in 1994.

LAB CONT..
• 4. Analyze Results
The real work for association rule learning is in
the interpretation of results.
You have to be very careful about interpreting
association rules. They are associations (think
correlations), not necessarily causal relationships.
Include a snapshot of the results in the report along
with the analysis of steps 1, 2 and 3.

Best rules found:
1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723
<conf:(0.92)> lift:(1.27) lev:(0.03) [155] conv:(3.35)
2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696
<conf:(0.92)> lift:(1.27) lev:(0.03) [149] conv:(3.28)
3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705
<conf:(0.92)> lift:(1.27) lev:(0.03) [150] conv:(3.27)
4. biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746
<conf:(0.92)> lift:(1.27) lev:(0.03) [159] conv:(3.26)
5. party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779
<conf:(0.91)> lift:(1.27) lev:(0.04) [164] conv:(3.15)
6. biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725
<conf:(0.91)> lift:(1.26) lev:(0.03) [151] conv:(3.06)
7. baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701
<conf:(0.91)> lift:(1.26) lev:(0.03) [145] conv:(3.01)
8. biscuits=t fruit=t total=high 954 ==> bread and cake=t 866 <conf:(0.91)>
lift:(1.26) lev:(0.04) [179] conv:(3)
9. frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757
<conf:(0.91)> lift:(1.26) lev:(0.03) [156] conv:(3)
10. frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877 <conf:(0.91)>
lift:(1.26) lev:(0.04) [179] conv:(2.92)

Lift, conviction,leverage
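The figures printed next to each rule can be related to simple counts. The sketch below uses the common textbook definitions of confidence, lift, leverage and conviction. The antecedent count (788) and the joint count (723) come from rule 1 above; the total number of transactions (4627) and the number of transactions containing bread and cake (3330) are assumed values for the Weka supermarket data and do not appear in the output. The reported conviction may differ slightly from the textbook value if the tool smooths its counts.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Textbook interestingness measures for a rule A ==> C."""
    p_a = n_antecedent / n_total
    p_c = n_consequent / n_total
    p_both = n_both / n_total
    confidence = n_both / n_antecedent
    lift = confidence / p_c             # > 1 means A and C co-occur more than by chance
    leverage = p_both - p_a * p_c       # difference from the independence baseline
    if confidence < 1:
        conviction = (1 - p_c) / (1 - confidence)
    else:
        conviction = float("inf")       # the rule is never violated
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }

# Rule 1: counts 788 and 723 are from the output; 4627 and 3330 are assumptions
metrics = rule_metrics(n_total=4627, n_antecedent=788,
                       n_consequent=3330, n_both=723)
for name, value in metrics.items():
    print(name, round(value, 2))
```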
Cheers For the Great Patience!
Query Please?
R-Implements:
https://www.kirenz.com/post/2020-05-14-r-association-rule-mining/#:~:text=Association%20rule%20mining%20is%20one,the%20arules%20and%20arulesViz%20packages
Module:5 Managing Health and Safety

What is a Safety and Health Management System?

A safety and health management system means the part of the Organisation's management system
which covers:

• the health and safety work organisation and policy in a company
• the planning process for accident and ill health prevention
• the line management responsibilities and
• the practices, procedures and resources for developing and implementing, reviewing and
maintaining the occupational safety and health policy.

The system should cover the entire gamut of an employer's occupational health and safety
organisation. The key elements of a successful safety and health management system are:
1. Policy and commitment
The workplace should prepare an occupational safety and health policy programme as part of the
preparation of the Safety Statement required by Section 20 of the Safety, Health and Welfare at
Work Act 2005. Effective safety and health policies should set a clear direction for the organisation
to follow. They will contribute to all aspects of business performance as part of a demonstrable
commitment to continuous improvement. Responsibilities to people and the working environment will
be met in a way that fulfils the spirit and letter of the law. Cost-effective approaches to preserving
and developing human and physical resources will reduce financial losses and liabilities. In a wider
context, stakeholders' expectations, whether they are shareholders, employees or their
representatives, customers or society at large, can be met.
2. Planning
The workplace should formulate a plan to fulfil its safety and health policy as set out in the Safety
Statement. An effective management structure and arrangements should be put in place for
delivering the policy. Safety and health objectives and targets should be set for all managers and
employees.
3. Implementation and operation
For effective implementation, organisations should develop the capabilities and support mechanisms
necessary to achieve the safety and health policy, objectives and targets. All staff should be
motivated and empowered to work safely and to protect their long-term health, not simply to avoid
accidents. These arrangements should be:

• underpinned by effective staff involvement and participation through appropriate
consultation, the use of the safety committee where it exists and the safety representation
system and,
• sustained by effective communication and the promotion of competence, which allows all
employees and their representatives to make a responsible and informed contribution to the
safety and health effort.

There should be a planned and systematic approach to implementing the safety and health policy
through an effective safety and health management system. The aim is to minimise risks. Risk
Assessment methods should be used to determine priorities and set objectives for eliminating
hazards and reducing risks. Wherever possible, risks should be eliminated through the selection and
design of facilities, equipment and processes. If risks cannot be eliminated, they should be
minimised by the use of physical controls and safe systems of work or, as a last resort, through the
provision of PPE. Performance standards should be established and used for measuring
achievement. Specific actions to promote a positive safety and health culture should be identified.
There should be a shared common understanding of the organisation’s vision, values and beliefs on
health and safety. The visible and active leadership of senior managers fosters a positive safety and
health culture.
4. Measuring performance
The organisation should measure, monitor and evaluate safety and health performance.
Performance can be measured against agreed standards to reveal when and where improvement is
needed. Active self-monitoring reveals how effectively the safety and health management system is
functioning. Self-monitoring looks at both hardware (premises, plant and substances) and software
(people, procedures and systems, including individual behaviour and performance). If controls fail,
reactive monitoring should find out why they failed, by investigating the accidents, ill health or
incidents, which could have caused harm or loss. The objectives of active and reactive monitoring
are:

• to determine the immediate causes of substandard performance
• to identify any underlying causes and implications for the design and operation of the safety
and health management system.

5. Auditing and reviewing performance
The organisation should review and improve its safety and health management system continuously,
so that its overall safety and health performance improves constantly. The organisation can learn
from relevant experience and apply the lessons. There should be a systematic review of
performance based on data from monitoring and from independent audits of the whole safety and
health management system. These form the basis of complying with the organisation’s
responsibilities under the 2005 Act and other statutory provisions. There should be a strong
commitment to continuous improvement involving the development of policies, systems and
techniques of risk control. Performance should be assessed by:

• internal reference to key performance indicators
• external comparison with the performance of business competitors and best practice in the
organisation’s employment sector.

Many companies now report on how well they have performed on worker safety and health in their
annual reports and how they have fulfilled their responsibilities with regard to preparing and
implementing their Safety Statements. In addition, employers have greater responsibilities under
Section 80 of the 2005 Act on ‘Liability of Directors and Officers of Undertakings’ that requires them
to be in a position to prove they have pro-actively managed the safety and health of their workers.
Data from this ‘Auditing and reviewing performance’ process should be used for these purposes.

What issues should a review of the safety and health management system cover?
An organisation should carry out an initial review of the safety and health management system, and
follow this up with periodic reviews. The initial review should compare existing safety and health
practice with:

• the requirements of safety and health legislation
• the provisions set out in the organisation’s Safety Statement
• safety and health guidance in the organisation
• existing authoritative and published safety and health guidance
• best practice in the organisation’s employment sector

The following checklist may be used for the review

 Is the Safety Statement clear and concise so that it can be read and understood by those
who may be at risk?
 Is the Safety Statement available at the workplace to which it relates and are workers given
relevant extracts where they are at specific risk?
 Is the overall safety and health policy of the organisation and the internal structure for
implementing it adequate, e.g. are responsibilities of named persons clearly outlined?
 Does the Safety Statement contain a systematic identification of hazards and an assessment
of risks for the workplace(s) it covers?
 Are Risk Assessments being carried out on a regular basis as risks change and are the
necessary improvements made to keep the safety and health management system up to
date?
 Are the necessary safety control measures required for a safe workplace identified and
implemented, e.g. the provision of safe access and egress, good housekeeping, clear
passageways and internal traffic control?
 Are written safe procedures for those operations that require them available, e.g. for routine
processing and ancillary activities, handling and using chemicals, preventive maintenance,
plant and equipment breakdown maintenance, accident and ill-health investigations,
emergency planning, assessment of personal protective equipment (PPE) requirements?
 Are procedures available for monitoring the implementation of safety systems and control
measures, e.g. are safety audits being carried out?
 Is safety and health training being carried out and does the training give adequate
information to workers on risks they might be exposed to?
 Is the impact of this training and the level of understanding of the information assessed by
anyone?
 Do safety consultation, employee participation and representation procedures exist and are
these procedures effective, e.g. is there good co-operation between employer, managers
and employees on safety and health issues at the workplace? Is there a safety committee in
existence and if so does it comply with the 2005 Act requirements? Are safety committee
meetings constructive with meeting reports and follow-up action lists? Is the safety
representative or representatives involved at every stage of the safety consultation process?

A Safety Statement should have a safety and health policy incorporated into it. What is this
policy?
A safety and health policy is a written document which recognises that safety and health is an
integral part of the organisation’s business performance. It is a statement by the organisation of its
intentions and approach in relation to its overall safety and health performance and provides a
framework for action, and for the setting of its safety and health objectives and targets. The safety
and health policy must:

 be appropriate to the hazards and risks of the organisation’s work activities and include a
commitment to protect, so far as is reasonably practicable, its employees and others, such
as contractors and members of the public, from safety and health risks associated with its
activities.
 include a commitment to comply with relevant safety and health legislation, Codes of
Practice and guidelines, as a minimum.
 provide a framework for measuring performance and ensuring continuous improvement by
setting, auditing and reviewing safety and health objectives and targets.
 be documented, understood, implemented and maintained at all levels of the organisation.
 clearly place the management of safety and health as a prime responsibility of line
management from the most senior executive level to first-line supervisory level.
 cover employee safety and health consultation, safety committee meetings where they exist,
worker participation and safety representation, and include a commitment to provide
appropriate resources to implement the policy.
 provide for employee co-operation and compliance with safety rules and procedures.

Organisations achieving high standards of safety and health develop policies that recognise the:

 contribution that safety and health can make to business performance by preserving and
developing human and physical resources, by reducing costs and liabilities, and by
expressing corporate responsibility.
 need for leaders to develop appropriate organisational structures and a culture that supports
risk control and secures the full participation of all members of the organisation.
 requirement to resource and plan policy implementation adequately.
 necessity of approaching injury, ill health and loss prevention by systematically identifying
hazards, assessing and controlling risks.
 need for the organisation to develop an understanding of risks and risk control and to be
responsive to internal and external change.
 requirement to scrutinise and review performance to learn from experience.
 connection between quality, the environment, safety and health, and good management
practice.

What critical safety and health issues should be addressed, and allocated adequate
resources, in the safety and health policy?
Critical safety and health issues, which should be addressed and allocated resources, in the safety
and health policy, include the:

 design, provision and maintenance of a safe place of work for all employees
 design, provision and maintenance of safe means of access to and egress from each part of
the workplace
 design, provision and maintenance of any article, plant, equipment or machinery for use at
work in a safe manner; provision of systems of work that are planned, organised, performed,
maintained or revised, so as to be safe, particularly for safety critical process operations or
services
 performance of ongoing hazard identification and Risk Assessments, and compliance with
the general principles of prevention as set out in the legislation
 provision and maintenance of welfare facilities and PPE
 preparation of emergency plans and the provision of first-aid training
 reporting of accidents and dangerous occurrences to the Authority and their investigation
 provision and dissemination of safety and health information, instruction, training and
supervision as required
 operation of safety and health consultation, employee participation and safety representation
programmes
 review and keeping up-to-date the safety and health policy in order to prevent adverse
effects on the safety and health of employees from changing processes, procedures and
conditions in the workplace
 appointment of people responsible for keeping safety and health control systems in place
and making them aware of their responsibilities
 establishment of monitoring arrangements, including safety and health inspections and
audits, which should be used by the employer to ensure ongoing compliance with legal
duties, responsibilities and controls
 development of in-house safety and health competence
 employment of external safety and health experts as required
 use of standards, Codes of Practice, guidelines or industry practices
 co-operation required from employees and disciplinary procedures for non-compliance.

However, this list is not exhaustive and the critical safety and health issues that could be covered by
the policy will depend on the risks in the organisation. If the above issues are adequately covered
elsewhere in the Safety Statement or in the safety and health management system, they might need
only to be referred to in the safety and health policy. Backup documentation may also be referred to
in the policy.

What are the responsibilities of management regarding the implementation of safety and
health in the organisation?
Responsibility for safety and health management ultimately rests with the employer. This
responsibility is normally delegated to executive directors, senior managers, line managers,
supervisors and employees. Each person’s authority and duties should be clearly defined,
documented and communicated to them. The organisational and reporting structure for
implementing these duties should be illustrated in an in-house organisational chart. In addition each
director on the organisation’s board needs to accept their responsibilities in providing safety and
health commitment and leadership by:

 ensuring that each member’s actions and decisions at board level always reinforce the
message in the organisation’s Safety Statement
 preventing a mismatch between individual board members’ attitudes, behaviour or decisions
and the organisation’s Safety Statement so as not to undermine workers’ belief in maintaining
good safety and health standards.

Accidents, ill health and incidents are seldom random events. They generally arise from failures of
control and involve multiple contributory elements. The immediate cause may be a human or
technical failure, but such events usually arise from organisational failings, which are the
responsibility of management. Successful safety and health management systems aim to utilise the
strengths of managers and other employees. The organisation needs to understand how human
factors affect safety and health performance. Senior executive directors, members of other senior
management controlling bodies and executive senior managers are primarily responsible for safety and
health management in the organisation. These people need to ensure that all their decisions reflect
their safety and health intentions, as articulated in the Safety Statement, which should cover:

 the appointment of someone at senior management level with executive responsibility,
accountability and authority for the development, implementation, periodic review and
evaluation of their safety and health management system
 the safety and health ramifications of investment in new plant, premises, processes or
products. For example, such changes could introduce:
     new materials - are they toxic or flammable, do they pose new risks to employees,
    neighbours or the public and how will any new risks be controlled?
     new work practices - what are the new risks and are managers and supervisors competent
    to induct workers in the new practices?
     new people - do they need safety and health training and are they sufficiently competent to
    do the job safely?
 only engaging contractors to do new or ongoing projects that reinforce rather than damage
the organisation’s safety and health policies
 recognising their continuing responsibility for safety and health even when work is contracted
out
 providing their customers with the necessary safety and health precautions when supplying
them with articles, substances or services
 being aware that although safety and health responsibilities can and should be delegated,
legal responsibility for safety and health still rests with the employer.

Senior managers’ responsibilities include:

 preparing safety and health policies and consulting employees, including the safety
committee where it exists, and the safety representative, as appropriate
 devising safety and health strategies for key high risks
 setting safety and health objectives and targets for employees
 devising plans to implement the safety and health policy
 ensuring that appropriate organisational structures are in place
 identifying and allocating resources for safety and health
 ensuring that the safety and health policy is effectively implemented and checking whether
objectives and targets have been met
 reviewing the effectiveness of the safety and health management system
 implementing any necessary improvements derived from carrying out Risk Assessments
 giving all personnel the authority necessary to carry out individual safety and health
responsibilities
 devising appropriate arrangements whereby employees are held accountable for discharging
their responsibilities
 establishing clear and unambiguous reporting relationships
 devising job descriptions that include safety and health responsibilities
 incorporating safety and health performance in the appraisal system where personal
appraisal systems exist
 developing safety and health cultures in project teams and team working situations.

How can an organisation control safety and health aspects of contractors’ work?
Although organisations routinely contract out either all or parts of their work activities, they may still
retain some of the legal responsibility for health and safety, particularly if they directly control how
this work is done. For this reason, the organisation should establish and maintain procedures for
controlling the safety and health aspects of contractor work. These should include:

 pre-planning for medium or long-term contracts. This will involve carrying out a full safety and
health pre-qualification procedure; for short-term contracts, safety and health aspects should
be suitably checked by questionnaire or review
 ensuring the contractor has prepared Risk Assessments and an up-to-date Safety
Statement, which are specific for the project to be undertaken
 defining responsibility for and setting up communication links between appropriate levels of
the organisation and the contractor before work starts and throughout the contract
 defining who is responsible for developing and providing site safety rules and method statements
 providing safety and health training and induction of contractor personnel, where necessary,
before work begins
 monitoring safety and health aspects of contractor activities on site
 establishing procedures for communication of accidents and incidents involving the
contractor’s personnel.

Additionally, it is also necessary for organisations to check the ability of contractors where they work
close to, or in collaboration with, direct employees or with other contractors’ employees. Such
arrangements should cover the:

 recruitment and placement procedures that ensure employees (including managers) have
the necessary physical and mental abilities to do their jobs or can acquire them through
training and experience. This may require individual fitness assessments by medical
examination and tests of physical fitness or aptitudes and abilities where work-associated
risks require it
 systems to identify safety and health training needs arising from recruitment, changes in
staff, plant, substances, technology, processes or working practices
 training documentation as appropriate to suit the size and activity of the organisation
 refresher training to maintain or enhance competence, to include where necessary
contractors’ employees, self-employed people or temporary workers who are working in the
organisation
 communication systems and resources made available to ensure work is co-ordinated safely
and the risk of accidents is minimised
 arrangements to ensure competent cover for staff absences, especially for staff with critical
safety and health responsibilities
 general health promotion and surveillance schemes that contribute to the maintenance of
general health and fitness; this may include assessments of fitness for work, rehabilitation,
job adaptation following injury or ill-health, or a policy on testing employees for drugs or
alcohol abuse

Effective safety and health management includes effective emergency planning. What should
this cover?
The organisation should establish and maintain procedures to respond to accidents and emergency
situations, and to prevent and minimise the safety and health impacts associated with them. This is
required by Section 11 of the Safety, Health and Welfare at Work Act 2005. Emergency planning
should cover:

 the development of emergency plans
 the testing and rehearsing of these plans and related equipment, including fire fighting
equipment and fire alarms
 training personnel on what to do in the event of an emergency, particularly those people who
have to carry out duties (e.g. fire-fighting teams, first-aiders)
 advising people working or living near the installation about what they should do in the event
of an emergency
 familiarising the emergency services with the facilities at the organisation so that they know
what to expect in the event of an emergency.

The emergency plan itself should include:

 details on the installation, availability and testing of suitable warning and alarm systems
 details of emergency scenarios that might occur, including the means for dealing with these
scenarios
 the emergency procedures in the organisation, including the responsibilities of key
personnel, procedures for fire-fighting and evacuation of all personnel on site and first-aid
requirements
 details of emergency services (e.g. fire brigade, ambulance services, spill clean-up services)
and the contact arrangements for these services
 internal and external communications plans
 training plans and testing for effectiveness
 details on the availability of emergency rescue equipment and its maintenance log.

The organisation should periodically test, review and revise its emergency preparedness and
response procedures where necessary, in particular after the occurrence of accidents or emergency
situations. The emergency plan should dovetail with the Safety Statement as required by Section 20
of the 2005 Act. Major accident hazard sites covered by the EU COMAH Regulations need to have
emergency plans in place to cover major accidents involving chemicals. Details of what is required
are covered at Control of Major Accident Hazards on this website.

What key questions should an employer ask her/himself to determine the adequacy of safety
and health management in the organisation?
The following are some key questions for employers to assist in determining the adequacy of their
safety and health management in the organisation:

 Does your executive board of directors or senior management team ensure all their
decisions reflect the safety and health intentions in your Safety Statement?
 Does your executive board of directors or senior management team recognise the need to
involve all staff in issues that affect their safety and health?
 Do your directors and senior managers provide daily safety and health leadership in the
organisation?
 Do you have an agreed safety and health policy? Is it written into your Safety Statement?
 Have you allocated responsibilities for safety and health to specific people - are they clear on
what they have to do and are they held accountable?
 Is safety and health always considered before any new work is started or work equipment is
bought?
 Did you consult and involve your staff and your safety representatives effectively?
 Have you identified the hazards and assessed the risks to your own staff, to others and to
the public in the workplaces you control?
 Do you set standards for the premises, plant, substances, procedures and people you
control or the products you produce? Are these standards in place and the risks effectively
controlled?
 Do you have an emergency plan to deal with serious or imminent danger, e.g. fires, process
deviations, gas leaks, the effects of poor weather, floods, etc.?
 Does your staff have sufficient information about the risks they are exposed to and the
preventive measures they must take?
 Do you have the right levels of safety and health expertise? Are your employees properly
trained and do they attend the training provided by you?
 Do you need specialist safety and health advice from outside and if so have you arranged to
obtain it?
 Do all your staff accept their responsibilities under safety and health law?

How can the safety and health management system be monitored?
It should be a line-management responsibility to monitor safety and health performance against
predetermined plans and standards. Monitoring reinforces management’s commitment to safety and
health objectives in general and helps to develop a positive safety and health culture by rewarding
positive work done to control risk. Two types of monitoring are required:

1. Active Systems, that monitor the design, development, installation and operation of
management arrangements, safety systems and workplace precautions.
2. Reactive Systems, that monitor accidents, ill health, incidents and other evidence of
deficient safety and health performance.

1. Active monitoring
Every organisation should collect information to investigate the causes of substandard performance
or conditions adequately. Documented procedures for carrying out these activities on a regular basis
for key operations should be established and maintained. The monitoring system should include:

 identification of the appropriate data to be collected and accuracy of the results required
 monitoring of the achievement of specific plans, set performance criteria and objectives
 installation of the requisite monitoring equipment and assessment of its accuracy and
reliability
 calibration and regular maintenance of this equipment together with documented records of
both the procedures involved and the results obtained
 analysis and records of the monitoring data collected and documented actions to be taken
when results breach performance criteria
 evaluation of all the data as part of the safety and health management review
 documented procedures for reviewing the monitoring and safety and health implications of
forthcoming changes to work systems.

Techniques that should be used for active measurement of the safety and health management
system include:

 systematic inspections of workplace processes or services to monitor specific objectives, e.g.
weekly, monthly or quarterly reports
 systematic review of the organisation’s Risk Assessments to determine whether they are
functioning as intended or need to be updated, and whether the necessary improvements are being
implemented
 plant or machinery inspections, e.g. statutory plant inspections and certification
 environmental sampling for dusts, chemical fumes, noise or biological agents
 analysis of safety and health management system records.

2. Reactive monitoring
A system of internal reporting of all accidents (which includes ill health cases) and incidents of non-
compliance with the safety and health management system should be set up so that the experience
gained may be used to improve the management system. The organisation should encourage an
open and positive approach to reporting and follow-up and should also put in place a system of
ensuring that reporting requirements are met.

The organisation should establish procedures for investigating accidents and incidents to identify
their causes, including possible deficiencies in the safety and health management system. Those
responsible for investigating accidents and incidents should be identified and the investigation
should include plans for corrective action, which incorporate measures for:

 restoring compliance as quickly as possible
 preventing recurrence
 evaluating and mitigating any adverse safety and health effects
 reviewing the Risk Assessments to which the accident relates
 assessing the effects of the proposed remedial measures.

Should the management of safety and health be audited in addition to monitoring performance?
Monitoring provides the information to let the organisation review activities and decide how to
improve performance. Auditing and performance review are the final steps in the safety and health
management control cycle. They constitute the ‘feedback loop’ that enables an organisation to
reinforce, maintain and develop its ability to reduce risks to the fullest extent and to ensure the
continued effectiveness of its safety and health management system. Audits, by the organisation’s
own staff or by external bodies, complement monitoring activities by looking to see if the safety and
health management systems are actually achieving the right results. Combine the results from
measuring performance with information from audits to improve the organisation’s overall approach
to safety and health management.
The organisation should establish and maintain a programme and procedures for periodic safety and
health management system audits to be carried out. This enables a critical appraisal of all the
elements of the safety and health management system to be made. Auditing is the structured
process of collecting independent information on the efficiency, effectiveness and reliability of the
total safety and health management system and drawing up plans for corrective action. These audits
should be carried out in addition to routine monitoring, inspection and surveillance of the safety and
health management system. The purpose of these audits is to ensure the continued suitability,
adequacy and effectiveness of the safety and health management system. The audit process should
ensure that the necessary information is collected to allow management to carry out this evaluation
adequately.
The organisation should establish and maintain audit records consistent with the safety and health
management system records. Their retention times should be established and must comply with
legal requirements.

What should be contained in the system audit protocols and procedures?
The protocols and procedures for the audit of the safety and health management system should
include the following:

 the allocation of resources to the process
 personnel requirements, including that of the audit team, i.e. competence required for
auditors (auditors should have the appropriate training and skills so that they can assess
physical, human and other factors and the use of procedures as well as documents or
records - wherever possible, auditors should be independent of the activity being audited and
include support from a wider range of specialists if necessary)
 the methodologies for conducting and documenting the audits, which may include checklists,
questionnaires, interviews, measurement and direct observation
 the procedures for reporting audit findings to those responsible to facilitate timely corrective
action and improvement
 a system for auditing and tracking the implementation of audit recommendations to include
addressing the possible need for changes to safety and health policy, objectives and other
elements of the safety and health management system.

What key questions should an employer ask her/himself when measuring, reviewing and auditing
their safety and health performance?
The key questions that an employer should ask when measuring, reviewing and auditing their safety
and health performance are:

 Do you know how well you perform in safety and health?
 Are your executive board, your directors and senior management team kept informed of your
safety and health performance and do you report on this performance in your annual report?
 How do you know if you are meeting your own objectives and standards for safety and
health? Are your controls for risks good enough?
 How do you know you are complying with the safety and health laws that affect your
business?
 Do your accident or incident investigations get to all the underlying causes - or do they stop
when you find the first person that has made a mistake?
 Do you have accurate records of injuries, ill health, bullying complaints and accidental loss?
 Do you report on safety and health failures to your board and your directors?
 How do you learn from your mistakes and your successes?
 Do you carry out safety and health audits at least annually? If you do, what action do you
take on audit findings?
 Do the audits involve staff at all levels? Do you involve your safety representative and safety
committee, where it exists, in the audits?
 When did you last review your Safety Statement and your safety and health performance?
 Does your executive board of Directors or senior management team review your safety and
health performance and ensure safety and health risk management systems are in place and
remain effective?
 Have your executive board and your directors or senior management team appointed
someone at director level to ensure safety and health risk management issues are properly
addressed and is this person competent to do so?

How does the employer train staff to ensure they have the skills, knowledge and attitudes to make
them competent in the safety and health aspects of their work?
Under Section 10 of the Safety, Health and Welfare at Work Act 2005, employers must provide their
employees with the instruction and training necessary to ensure their safety and health. There are
specific training obligations for employees involved in the safety consultation and safety
representation processes. Safety and health training must form part of the training of all people who
work at the workplace. Training helps people acquire the skills, knowledge and attitudes to make
them competent in the safety and health aspects of their work. It includes formal off-the-job training,
instruction to individuals and groups, and on-the-job coaching and counselling. However, training is
not a substitute for proper risk control, for example to compensate for poorly designed plant or
inadequate workstations. The key to effective training is to understand job requirements and
individual abilities.
In order to train staff to ensure they obtain the necessary skills, knowledge and attitudes to make
them competent in the safety and health aspects of their work, it is important to identify appropriate
training objectives and methods by first identifying the training needs. Training needs may be
organisational, job-related or individual:
1. Organisational needs: Everyone in the organisation should know about the organisation’s Safety
Statement, the philosophy underlying it, and the structure and systems for delivering the policy.
Employees should also know which parts of the systems are relevant to them, and should understand
the major risks in the organisation’s activities and how they are controlled.
2. Job-related needs: These fall into two main types - management needs and non-management
needs.
Management needs include:

 leadership skills
 communication skills

 techniques of safety and health management
 training, instruction, coaching and problem-solving skills relevant to safety and health
 understanding of the risks in a manager's area of responsibility
 knowledge of relevant legislation and appropriate methods of control, including risk
assessment
 knowledge of the organisation’s planning, measuring, reviewing and auditing arrangements
 awareness of the financial and economic benefits of good safety and health performance.

Non-management needs include:

 an overview of safety and health principles
 detailed knowledge of the safety and health arrangements relevant to an individual’s job
 communication and problem-solving skills to encourage effective participation in safety and
health activities.

3. Individual needs: Individual needs are generally identified through performance appraisal. They
may also arise because an individual has not absorbed formal job training or information provided as
part of their induction. Training needs vary over time, and assessments should cover:

 induction of new starters, including part-time and temporary workers
 maintaining or updating the performance of established employees, especially if they may be
involved in critical emergency procedures
 job changes, promotion or when someone has to deputise
 introduction of new equipment or technology
 follow-up action after an incident investigation.

How does an organisation ensure it has access to sufficient safety and health knowledge, skills
and/or experience to identify and manage safety and health risks effectively?
Organisations should ensure they have access to sufficient safety and health knowledge, skills or
experience to identify and manage safety and health risks effectively, and to set appropriate
objectives by:

 training managers to a sufficient level of competence to be able to manage their activities
safely and keep up to date with developments in safety and health
 employing appropriate safety and health professionals as part of the management team to
advise the organisation on relevant safety and health matters
 acquiring the necessary skills and advice from external providers as required.

Whichever method or combination of these methods an organisation chooses, it does not relieve
the employer and the management of the organisation from their legal responsibilities to ensure a
safe workplace.

****************************************************************************************************************

What is the role of the safety and health adviser?
Safety and health advisers should have the status and competence to advise management and
employees with authority and independence. By virtue of the definition of ‘competent person’ under
the 2005 Act, they must possess sufficient training, experience and knowledge appropriate to the
work to be done. They should be capable of advising on:

 formulating and developing safety and health policies, not just for existing activities but also
with respect to new acquisitions or processes
 promoting a positive safety and health culture in the organisation and securing the effective
implementation of safety and health policy
 planning for safety and health, including the setting of realistic short and long term
objectives, deciding priorities and establishing adequate systems and performance
standards
 day-to-day implementation and monitoring of policy and plans, including accident and
incident investigation, reporting and analysis
 reviewing performance and auditing the whole safety and health management system.

To do this properly, safety and health advisers should:

 be properly trained by reputable organisations or be individuals who are suitably qualified;
membership of recognised professional safety and health bodies such as IOSH or BOHS, and a
qualification to at least Diploma level in a recognised third-level safety and health course,
offer routes for demonstrating competence
 maintain adequate information systems on topics including safety and health law, safety and
health management and technical advances
 demonstrate the ability to interpret the law in the context of the organisation
 be involved in establishing organisational arrangements, systems and risk-control
standards relating to hardware and human performance, by advising line management on
matters such as legal and technical standards
 establish and maintain procedures for reporting, investigating, recording and analysing
accidents and incidents
 establish and maintain procedures, including monitoring and other means such as review
and auditing, to ensure that senior managers get a true picture of how well safety and health
is being managed (where a benchmarking role may be especially valuable)
 present their advice independently and effectively.

What information should be covered in accident and incident reports?
Key information to be covered in accident, ill-health and incident reports includes:

1. The event:
 Details of any injured person, including age, sex, experience, training, etc.
 A description of the circumstances, including the place, time of day and conditions.

 Details of the event, including:

 any actions which led directly to the event
 the direct causes of any injuries, ill-health or other loss
 the immediate causes of the event
 the underlying causes, e.g. failures in workplace precautions, risk control systems or
management arrangements
 Details of the outcomes, including in particular:

a. the nature of the outcome, for example injuries or ill-health to employees or members of the
public; damage to property; process disruptions; emissions to the environment; creation of
hazards
b. the severity of the harm caused, including injuries, ill-health and losses
c. the immediate management response to the situation and its adequacy, i.e.
Was it dealt with promptly?
Were continuing risks dealt with promptly and adequately?
Was the first-aid response adequate?
Were emergency procedures followed?
d. whether the event was preventable and, if so, how.

2. The potential consequences:
 What was the worst that could have happened?
 What prevented the worst from happening?
 How often could such an event occur (the ‘Recurrence Potential’)?
 What was the worst injury or damage that could have resulted (the ‘Severity Potential’)?
 How many people could the event have affected (the ‘Population Potential’)?

3. Recommendations:
 Prioritised actions with responsibilities and targets for completion
 Whether the risk assessments need to be reviewed and the safety statement updated.

4. Learning from and communicating results from investigations:


The organisation, having learnt from its investigations, should:

 identify root causes in the safety and health and general management of the organisation
 communicate findings and recommendations to all relevant parties
 include relevant findings and recommendations from investigations in the continuing safety
and health review process.

5. Cautions in using accident and ill health data:


Accident and ill health data are important, as they are a direct indicator of safety and health
performance. However, some cautions relating to their use are:

 most organisations have too few injury accidents or cases of work-related ill health to
distinguish real trends from random effects.
 if more work is done by the same number of people in the same time, increased workload
alone may account for an increase in accident rates.
 the length of absence from work attributed to injury or work-related ill health may be
influenced by factors other than the severity of injury or occupational ill health. Such factors
can include poor morale, monotonous work, stressful working conditions, poor management /
employee relations and local advice or traditions.
 accidents are often under-reported, and occasionally over-reported. Levels of reporting can
change. They can improve as a result of increased workforce awareness and better reporting
and recording systems.
 a time delay can occur between safety and health management system failures and harmful
effects. Moreover, many occupational diseases have long latent periods. Management
should not wait for harm to occur before judging whether safety and health management
systems are working.
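
The first two cautions can be illustrated with a short calculation. The following is a minimal Python sketch and is not part of the original guidance: the function names, the base of 100,000 hours and the sample figures are all assumptions chosen for illustration. It expresses accidents as a rate per hours worked, so that a heavier workload is not mistaken for worse safety, and it uses a simple Poisson model to ask how often a rise in a small count would appear through random variation alone.

```python
import math


def accident_rate(accidents: int, hours_worked: float, per_hours: float = 100_000) -> float:
    """Accidents per `per_hours` hours worked (the base is an illustrative assumption)."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return accidents / hours_worked * per_hours


def chance_of_at_least(observed: int, expected: float) -> float:
    """P(X >= observed) when X follows a Poisson distribution with mean `expected`."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(observed))
    return 1.0 - below


# Hypothetical figures: the same workforce does more work in the second year.
last_year = accident_rate(accidents=4, hours_worked=200_000)
this_year = accident_rate(accidents=6, hours_worked=300_000)

# Probability of seeing 6 or more accidents if the long-run average count is still 4
# (this check looks only at the raw count and ignores the change in workload).
p_random = chance_of_at_least(observed=6, expected=4.0)

print(f"Rate last year: {last_year:.2f} per 100,000 hours")
print(f"Rate this year: {this_year:.2f} per 100,000 hours")
print(f"Chance of 6+ accidents by random variation alone: {p_random:.2f}")
```

With these made-up numbers the raw count rises from 4 to 6, yet the rate per hour worked is unchanged, and a count of 6 or more would arise in roughly one year in five even if the underlying risk had not changed.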

**********************************************************************

HEALTH & SAFETY
 Along with physical health, we should also take care of our mental health.
 When you are stressed, depressed or anxious, you may not be able to do your duties
properly.
 Most of the time, however, we ignore these issues.
 If you want to be truly healthy, you must take care of your mental health as well.
 Whenever you feel stress, anxiety or depression, seek the help of professionals.
Some of the health issues faced by software engineers are:
1. Neck and back pain due to sitting at a desk in front of the computer for long hours.
2. Eye pain or bloodshot eyes due to staring continuously at the computer screen.
3. Spine problems due to sitting at a desk for long durations.
4. Insomnia due to working on the computer for long hours.
5. Headache due to long hours of work and prolonged exposure to the screen.
6. Weight gain due to inactivity from sitting at a desk.
7. Depression and stress due to an increased amount of work and the resulting need to work
overtime.
8. Poor blood circulation due to sitting idle at a desk.
9. Risk of bacterial infection due to dirty tech equipment.
10. Laziness due to inactivity.
11. Reduced or gradually deteriorating eyesight due to continuous exposure to the computer
screen.
STAY HEALTHY & SAFE!
Facilitators Guide – SSC/ Q2101 – Associate Analytics

Module 2: Unit- 1.2


Maintain a Healthy, Safe and Secure Working
Environment
Topic: Maintain a Healthy, Safe and Secure Working Environment

Activities: By the end of this session, you will learn about:
1. Workplace safety
2. Reporting accidents and emergencies
3. Protecting health and safety as you work

Material and Handouts

Facilitator Material: Facilitator Guide, Handouts
Participant Material and Handouts: Participants’ Guide

Classroom Session Map

Topic description and location

 Welcome participants to the course
 Introduce facilitators
 Recap of core skills through questions and polling questions
 Review learning objectives
Location: Classroom

 Discuss the significance of workplace safety
 Create awareness of basic safety guidelines
 Summarize the appropriate discussion points from the breakout sessions
Location: Classroom

 Discuss accidents and emergencies and how to identify one
 How to address risks and threats and handle accidents
 Create awareness of how to handle general emergencies
Location: Classroom

 Develop understanding of the potential health and safety hazards found at the workplace
 Create awareness of the common safety signs used at the workplace
Location: Classroom

Facilitator Preparation
Responsibilities

 Review examples provided: reflect on your own experiences and determine when to
share them.

 Review all material – Facilitator Guide, Presentation, Guides and Handouts (if any)

 Make sure you have copies of all the handouts.

 Make sure the learning resources are loaded on your computer.

 Conduct a run-through of the content. Conduct a dress rehearsal of the session as you
move through the content. Make sure you are comfortable with the tools and
interactions recommended in the facilitator guide.

 Note that all examples are in italics to emphasize key learning points; however, you
may use your own professional experience to enhance the learning.

 Make sure you create folders for all breakout activities.


Principles of Facilitating
Personal Experiences
As a facilitator, you lead participants through prepared scenarios and discussions. During
this process, relate your own professional experience to add realism. Often, personal
experiences of how you helped a colleague through the career ownership process and
guided them to achieving work satisfaction are more memorable than step-by-step
instructions on following the career ownership process. Sharing experiences helps
participants understand how professionals work and think, and gives them the opportunity
to apply those lessons to their own work processes. Also, participants are more likely to
remember answers if they have to think and explore on their own. Your goal is to foster
independent thinking and action rather than having participants depend on your experience.

Experiential Learning
This workshop includes exercises designed to help participants discover the principles of
guiding the participants through the career ownership process and career satisfaction.
Encourage a free-wheeling discussion and call out important trends and insights. Make
liberal use of the whiteboard to capture and display critical participant insights.

Socratic Questions
Your goal throughout the session is to guide participants towards thinking through the
scenarios and discussion questions independently, rather than providing the answer. For
example:

Rather than saying: The Reality Check worksheet provides valuable information about how time
is currently spent and what it would look like in the best case scenario.

Ask: What information can you gather from the Reality Check worksheet and how can the
information be used to move towards career satisfaction?


Session: 1 - Welcome and Introduction

Topic: Welcome and Introduction

Health, Safety and Security


Welcome the participants to the course and move to the introductions.

Introductions
“I am <Facilitator’s Name> and I am your facilitator today.”
Briefly review the roles of the Lead Facilitator and Support Facilitator, if any.
Give a brief of your own experience and background.

Why are you here today? [Course Objectives]


“Why are you here today?”
After reviewing and arranging responses, summarize the responses and map the
responses to the suggested course benefits below.

“Regardless of why you’re here today, we’re all going to walk away with some key benefits – let’s
discuss those briefly.”

Suggested Responses/Benefits to Debrief:


The benefits of this course include:
 Impact of workplace disasters and need for workplace safety
 Clear understanding of the basic guidelines to be followed in the event of a risk or a
hazardous event
 Awareness of the common security threats and risks and actions to be taken to address
them.

Review the course objectives listed above.


“To fulfill these objectives today, we’ll be conducting a number of hands-on activities.
Hopefully we can open up some good conversations and some of you can share your
experiences so that we can make this session as interactive as possible. Your participation will
be crucial to your learning experience and that of your peers here in the session today.”

Knowledge Check Question 1


“Please answer the following question.” Discuss and debrief the correct answer
Question: What are some of the hazardous events that may happen at your workplace?
A. Fire break-out (fire accident)
B. Terrorist/ Bomb threat
C. Tripping accidents
D. All of the above
Answer: All of the above. Each of situations A, B and C can lead to a hazardous event, and each
needs a safety plan in place to mitigate and manage the risk if it occurs.


Session: 2 - Workplace Safety


Key Points

Let’s Get Started


Importance of prevention of disasters/ risk events

Provide a brief overview of the session. Discuss why it is better to prevent disasters
than to be sorry after the event.
Open up the discussion for the session and ask participants to share their thoughts on
“workplace safety”.

The first part of this session discusses the following:


 “Prepare and prevent, don't repair and repent.”
 It’s better to prevent disasters from happening than be sorry and suffer after the accident.
 It is important to follow safety rules in any office and as future employees, you should know
about these safety rules.

Why Workplace Safety

Ask the question to the participants and gather responses.


Discuss the responses with the group to understand the significance of workplace
safety.

1. Refer to the Workplace Safety Rules table later in the material and identify
the rules that employees/workers must follow.
2. Refer to the Vocabulary Words table later in the material if you do not
understand the meaning of a word/term.


Suggested Responses:
 Safety rules in the workplace protect workers from injury or death.
 These rules teach workers how to work safely. Use the rules in the tables later in the
material to guide the discussion.

Basic Workplace Safety Guidelines

Prompt participants to come up with basic safety rules that they follow at their
workplace.

 Fire Safety
Employees should be aware of all emergency exits, including fire escape routes, of the
office building and also the locations of fire extinguishers and alarms.

 Falls and Slips


To avoid falls and slips, all things must be arranged properly. Any spilt liquid, food or
other items such as paints must be immediately cleaned to avoid any accidents. Make sure
there is proper lighting and all damaged equipment, stairways and light fixtures are
repaired immediately.

 First Aid
Employees should know about the location of first-aid kits in the office. First-aid kits
should be kept in places that can be reached quickly. These kits should contain all the
important items for first aid, for example, all the things required to deal with common
problems such as cuts, burns, headaches, muscle cramps, etc.

 Security
Employees should make sure that they keep their personal things in a safe place.

 Electrical Safety
Employees must be provided basic knowledge of using electrical equipment and common
problems. Employees must also be provided instructions about electrical safety such as
keeping water and food items away from electrical equipment. Electrical staff and
engineers should carry out routine inspections of all wiring to make sure there are no
damaged or broken wires.


Check Your Understanding

1. True or False? The employer and employees are responsible for workplace
safety.
a. True
b. False

Suggested Responses:
True. It is the joint responsibility of both employer and employees to ensure that the workplace
is safe and secure.

2. True or False? Any injury at work should be reported to the supervisor
immediately.
a. True
b. False

Suggested Responses:
True. Always keep management informed of any injury and of any health, safety and
security events or risks noticed in the organization.

3. True or False? No matter how big or small the injury, the injured person
should receive medical attention.
a. True
b. False

Suggested Responses:
True. No matter how minor the injury appears, it is critical that medical help is sought.
Sometimes the visible injury may be minimal while an internal injury, which could be critical,
cannot be assessed without medical help.


4. True or False? While working with machines and equipment, employees
must follow the safety guidelines set by the company.
a. True
b. False

Suggested Responses:
True. All guidelines set by the company take into account the potential risks to employees
and the measures to counter those risks. While there is a temptation to identify shortcuts or
alternative ways of working with machines and equipment, the pre-defined protocol should not
be changed without proper authorization by the requisite experts.

5. True or False? At any office, the first-aid kit should always be available for
use in an emergency.
a. True
b. False

Suggested Responses:
True. When a medical emergency strikes, medical aid may take time to arrive. First-aid
kits can provide relief in the interim and prevent increased risks.

6. True or False? It is optional to participate in the random fire drills conducted
by offices from time to time.
a. True
b. False

Suggested Responses:
False. Fire drills are critical activities in which everyone should participate unless prior
permission has been obtained. When an emergency strikes, knowing the process to follow is
critical to keeping employees safe. It is always better to be prepared for such situations.

7. True or False? The "Wet Floor" sign is not needed and causes problems for people. Wet
floor can be identified easily, without the signs.
a. True
b. False


Suggested Responses:
False. Wet floors are accidents waiting to happen unless clearly marked. It is not
easy to tell from a distance whether a surface is wet. Slips can be fatal and should be prevented
with proper signage whenever the floor is wet.

8. True or False? It is okay to place heavy and light items on the same shelf.
a. True
b. False

Suggested Responses:
False. Heavy and light items should be clearly separated. If heavy items are placed among
light items by mistake, they can cause breakage and other related accidents. Further, a weight
limit should be defined for each shelf so that the load does not exceed acceptable limits.

9. True or False? There is no need to train employees on how to use the fire
extinguisher. They can operate extinguishers following the instruction written
on the extinguisher case, when needed.
a. True
b. False

Suggested Responses:
False. In a fire, panic sets in and there is not enough time to react.
Reading instructions at that moment is difficult and can make matters worse. Knowing in
advance how to operate an extinguisher enables employees to react quickly and prevent further
damage from the fire.

10. True or False? The cleaning supplies, especially chemical products, can be left in the
bathrooms or in any of the cupboards in the office.
a. True
b. False


Suggested Responses:
False. Cleaning supplies and other chemical products should be kept safe and
secure, accessible only to authorized staff. If consumed by mistake, they can cause
harm.

Create Your Own Checklist

Activity Description:
Based on what you have learnt, create safety checklists for yourself.
These checklists will be discussed in the next session.

Summary

 It is important to follow safety rules to prevent accidents and protect workers.
 Employees must follow safety guidelines for the following:
   Fire safety
   Falls and slips
   Electrical safety
   Use of first aid


Case studies of hazardous events

Case 1: On Friday, June 13, 1997 a fire broke out at Uphaar Cinema, Green Park, Delhi, while
the film Border was being shown. The fire was caused by a blast in a transformer in an
underground parking lot of the five-storey building, which housed the cinema hall and
several offices. 59 people died and 103 were seriously hurt when people rushed to move out of
the exit doors. Many people were trapped on the balcony and died because the exit doors were
locked.

Case 2: 43 people died when fire broke out on the fifth and sixth floors of the Stephen Court
building in Kolkata.

Case 3: 9 people were killed and 68 hurt when a fire accident took place in a commercial
complex in Bangalore.

Case 4: In Kolkata, more than 90 people were killed when a fire broke out at the Advanced
Medicare and Research Institute (AMRI) Hospitals at Dhakuria.


Module 3 - Unit: 1.2


Session: 3 - Report Accidents and Emergencies
Key Points

Accidents and Emergencies

Ask participants to define accidents and emergencies.


Gather responses.
Start the session by connecting the course content to the candidate responses.

Discuss the definition of ‘accidents and emergencies’ and the events that fall in the category of
accidents.

An accident is an unplanned, uncontrolled, or unforeseen event resulting in injury or harm to
people and damage to goods. For example, a person falling down and getting injured or a
glassware item breaking upon being knocked over. An emergency is a serious or crisis situation
that needs immediate attention and action. For example, a customer having a heart attack or a
sudden outbreak of fire in your organization needs immediate attention.

Each organization or chain of organizations has procedures and practices to handle and report
accidents and take care of emergencies. Although you will find most of these procedures and
practices common across the industry, some procedures might be modified to fit a particular type of
business within the industry. For example, the procedure to handle accidents caused by slipping or
falling will be similar across the industry. You need to be aware of the general procedures and
practices as well as the ones specific to your organization.

The following are some of the guidelines for identifying and reporting an accident or emergency:

Notice and correctly identify accidents and emergencies: You need to be aware of what
constitutes an emergency and what constitutes an accident in an organization. The organization’s
policies and guidelines will be the best guide in this matter. You should be able to accurately
identify such incidents in your organization. You should also be aware of the procedures to tackle
each form of accident and emergency.


Get help promptly and in the most suitable way: Follow the procedure for handling a particular
type of accident and emergency. Act promptly as per the guidelines. Ensure that you provide the
required help and support as laid down in the policies. Do not act outside the guidelines and
policies laid down for your role, even if your actions are motivated by the best intentions.
Remember that only properly trained and certified professionals may be authorized to take
decisions beyond the organization’s policies and guidelines, if the situation requires.

Follow company policies and procedures for preventing further injury while waiting for help
to arrive: If someone is injured, do not act as per your impulse or gut feeling. Go as per the
procedures laid down by your organization’s policy for tackling injuries. You need to stay calm and
follow the prescribed procedures. If you panic or act outside the prescribed guidelines, you may
end up further aggravating the emergency situation or putting the injured person into further
danger. You may even end up injuring yourself.

Act within the limits of your responsibility and authority when accidents and emergencies
arise: Provide help and support within your authorized limit. Provide medical help to the injured
only if you are certified to provide the necessary aid. Otherwise, wait for the professionals to arrive
and give necessary help. In case of emergencies also, act within your authorized limits and let the
professionals do the task allocated to them. Do not attempt to handle any emergency situation for
which you do not have formal training or authority. You may end up harming yourself and the
people around you.

Promptly follow instructions given by senior staff and the emergency services: Provide
necessary services as described by the organization’s policy for your role. Also, follow the
instructions of senior staff who are trained to handle particular situations. Work under their
supervision when handling accidents and emergencies.

Types of Accidents

The following are some of the commonly occurring accidents in organizations:

Trip and fall: Customers or employees can trip on carelessly left loose material and fall down,
for example on loose wires, goods left in aisles, or an elevated threshold. This type of accident
may result in anything from simple bruises to serious fractures.

Slip and fall: People may lose their foothold on floors and stairs, resulting in injuries. Slips
are mainly due to wet floors. Other causes include spilled liquids or other slip-causing material
thrown on floors, such as fruit peels. Tripping and slipping are generally caused by negligence,
which can be either on the part of the organization’s employees or on the part of customers.
They can also be due to a broken or uneven walking surface, such as a broken or loose floor tile.
You should prevent any such negligence. In addition, people should be properly cautioned against
tripping and slipping. For example, a “wet floor” sign will warn people to walk carefully on
freshly mopped floors. Similarly, “watch your step” signs can prevent accidents on a staircase
with a sharp bend or warn against a loose floor tile.

Injuries caused by escalators or elevators (or lifts): Although such injuries are uncommon,
they mainly happen to children, women, and the elderly. Injuries can be caused by falling on
escalators. People may be injured in elevators by falling down due to a sudden, jerking
movement of the elevator or by tripping on the elevator’s threshold. They may also get stuck in
elevators, resulting in panic and trauma. Escalators and elevators should be checked regularly for
proper and safe functioning by the right person or department. If you notice any sign of
malfunctioning of escalators or elevators, immediately inform the right people. If the
organization’s procedures for checking and maintaining these are not being followed properly,
escalate to the appropriate authorities in the organization.

Accidents due to falling of goods: Goods can fall on people from shelves or wall hangings and
injure them. This typically happens if goods have been piled improperly or kept in an
inappropriate manner. Always check that goods are placed properly and securely.

Accidents due to moving objects: Moving objects, such as trolleys, can also injure people in the
organization. In addition, improperly kept props and lighting fixtures can result in accidents. For
example, nails coming out dangerously from props can cause cuts. Loosely plugged in lighting
fixtures can result in electric shocks.


Activity Description:
1. Refer to the Workplace Safety Rules table in the Student Workbook
and identify the rules that employees/workers must follow.
2. Refer to the Vocabulary Words table if you do not understand the
meaning of a word/term.

Workplace Safety Rules

#    Workplace Safety Rules    Followed by workers    Followed by employers

1    Keep the floor dry all the time.
2    Regularly check safety equipment such as fire extinguishers to make sure they are in working condition.
3    Mark fire exit doors clearly.
4    Know where fire extinguishers and fire alarms are kept.
5    Conduct mock drills regularly.
6    Find out the fire escape routes in a building.
7    Keep first-aid kits where they can be easily found.
8    Make sure that first-aid kits are stocked with all necessary things.
9    Check and service all electrical equipment regularly.
10   Repair faulty machinery immediately.
11   Make sure there is proper lighting in all areas.
12   Make sure that the office layout and furniture are designed and arranged so that they do not cause injury to workers.

Vocabulary Words

Mock Drill/Fire Drill: Practice how to respond/react in case of an emergency, such as a fire
Fire Extinguisher: A small container usually filled with special chemicals for putting out a fire
Exit: The way to go out of a building or room
First Aid Kit: A container, which has medicines and ointments
Fire Escape Route: The way out in case of a fire
Emergency: A sudden, urgent and unexpected event
Spilt Liquid: Soft drink/water/coffee/tea etc. that has fallen on the floor
Routine inspections: Regular checking
Damaged equipment: Torn wires or broken plugs
Stairways: Staircase/stairs to go to the next floor
Light fixtures: Bulbs, tube lights etc.
Injury: Getting hurt/bleeding
Kitchen equipment: Vessels used in the kitchen, such as wok, knives, cutting board etc.
Cleaning Supplies: Liquid soap, dish washing liquid etc.

Handling Accidents

Try to avoid accidents in your organization by finding out all potential hazards and eliminating
them. If a colleague or customer in the organization is not following safety practices and
precautions, inform your supervisor or any other authorized personnel. Always remember that one
person’s careless action can harm the safety of many others in the organization. In case of an injury
to a colleague or a customer due to an accident in your organization, you should do the following:

Attend to the injured person immediately. Depending on the level and seriousness of the injury,
see that the injured person receives first aid or medical help at the earliest. You can give medical
treatment or first aid to the injured person only if you are qualified to give such treatments. Let
trained authorized people give first aid or medical treatment.

Inform your supervisor about the accident giving details about the probable cause of accident and
a description of the injury.

Assist your supervisor in investigating and finding out the actual cause of the accident. After
identifying the cause of the accident, help your supervisor to take appropriate actions to prevent
occurrences of similar accidents in future.


Activity Description:
1. Present a scenario where there is a physical injury to a colleague at
workplace.
2. Ask participants how they would react/attend to the emergency.

Types of Emergencies

 Discuss the various types of emergencies that one may come across at the
workplace.
 Share some examples.

Each organization also has policies and procedures to tackle emergency situations. The purpose of
these policies and procedures is to ensure safety and well-being of customers and staff during
emergencies. Categories of emergencies may include the following:

Medical emergencies, such as heart attack or an expectant mother in labor: A medical emergency is a
medical condition that poses an immediate risk to a person’s life or a long-term threat to the
person’s health if no action is taken promptly.

Substance emergencies, such as fire, chemical spills, and explosions: A substance emergency is an
unfavourable situation caused by a toxic, hazardous, or inflammable substance that is capable of
causing large-scale damage to property and people.

Structural emergencies, such as loss of power or collapsing of walls: A structural emergency is an
unfavourable situation caused by the development of faults in the building in which the
organization is located. Such an emergency can also be caused by the failure of an essential
function or service in the building, such as electricity or water supply failure. Such emergencies
can result in a long-term or permanent disruption of the organization’s functions.

Security emergencies, such as armed robberies, intruders, and mob attacks or civil disorder:
Security emergency is an unfavourable situation caused by a breach in security posing a significant
danger to life and property.

Natural disaster emergencies, such as floods and earthquakes: A natural disaster emergency is a
situation caused by a natural calamity that leads to injuries or deaths, as well as large-scale
destruction of property and essential service infrastructure.

Handling General Emergencies

It is important to have policies and procedures to tackle the given categories of emergencies. You
should be aware of at least the basic procedures to handle emergencies. The basic procedures that
you should be aware of depend on the business of your organization. Typically, you should seek
answers to the following questions to understand which basic emergency procedures you should be
aware of:
 What is the evacuation plan and procedure to follow in case of an
emergency?
 Whom should you notify within the organization?
 Which external agencies, such as police or ambulance, should you notify in which
emergency?
 Which services and equipment should you shut down during which emergency?

Here are some general emergency handling procedures that you can follow:

 Keep a list of numbers to call during an emergency, such as those of police, fire brigade, security,
ambulance etc. Ensure that these numbers are fed into the organization’s telephone program and
hard copies of the numbers are placed at strategic locations in the organization.


 Regularly check that all emergency handling equipment, such as the fire extinguisher and fire
alarm system, is in working condition.

 Ensure that emergency exits are not obstructed and keys to such exits are easily accessible.
Never place any objects near the emergency doors or windows.

Check Your Understanding


1. True or False? An accident is a serious or crisis situation that needs immediate attention and
action.
a. True
b. False

Suggested Responses:
True. You need to attend to the injured person immediately, inform your supervisor, and assist the
supervisor in investigating the cause.

2. Which of the following are appropriate actions for handling accidents and emergencies?
Select the two correct actions.

a) You should give medical treatment or first aid to the injured
even if you are not properly trained in such procedures because
such treatments should be given promptly.
b) Take decisions beyond the organization’s policies and
guidelines, if the situation requires.
c) Get help promptly and in the most suitable way.
d) Follow instructions given by senior staff and the emergency
services.

Suggested Responses:

c and d – You should provide first aid only if you are qualified to give such treatment. While
attending to accidents or emergencies, it is critical that all policies and guidelines are
adhered to.

3. Match each type of emergency with its corresponding example.

Type of Emergency      Example
A. Medical             i. Earthquake
B. Substance           ii. Power failure
C. Structural          iii. Armed robbery
D. Security            iv. An expectant mother in labor
E. Natural Disaster    v. Chemical spills

Suggested Responses:

Type of Emergency      Example
A. Medical             iv. An expectant mother in labor
B. Substance           v. Chemical spills
C. Structural          ii. Power failure
D. Security            iii. Armed robbery
E. Natural Disaster    i. Earthquake


Summary

 Identify and report accidents and emergencies:
 Notice and correctly identify accidents and emergencies.
 Get help promptly and in the most suitable way.
 Follow company policy and procedures for preventing further
injury while waiting for help to arrive.
 Act within the limits of your responsibility and authority when
accidents and emergencies arise.
 Promptly follow the instructions given by senior staff and the
emergency services personnel.
 Handling accidents:
 Attend to the injured person immediately.
 Inform your supervisor about the accident giving details.
 Assist your supervisor in investigating and finding out the actual
cause of the accident.
 General emergency handling procedures:
 Keep a list of numbers to call during emergencies.
 Regularly check that all emergency handling equipment is in
working condition.
 Ensure that emergency exits are not obstructed.


Session: 4 – Protect Health & Safety as You Work


Let’s Get Started


Provide a brief overview of the session. Discuss the key points/guidelines to protect
health and safety as you work.

o Each year, an estimated 2 million people die because of occupational accidents and work-
related diseases.
o Across the globe, there are almost 270 million occupational accidents and 160 million
work-related diseases each year.
Hazards

What are hazards?

In relation to workplace safety and health, a hazard can be defined as any source of potential harm
or danger to someone, or any adverse health effect produced under certain conditions.

A hazard can harm an individual or an organization. For example, hazards to an organization
include loss of property or equipment, while hazards to an individual involve harm to health or
body.

A variety of sources can be potential sources of hazard at the workplace. These hazards include
practices or substances that may cause harm. Here are a few examples of potential hazards:

o Material: Knives or sharp-edged nails can cause cuts.
o Substance: Chemicals such as benzene can cause fume suffocation. Inflammable substances
like petrol can cause fire.
o Electrical energy: Naked wires or electrodes can result in electric shocks.

o Condition: Wet floor can cause slippage. Working conditions in mines can cause health
hazards.
o Gravitational energy: Objects falling on you can cause injury.
o Rotating or moving objects: Clothes entangled in rotating objects can cause serious harm.

Potential Sources of Hazards in an Organization

Ask participants to come up with examples of sources/items/places that can
possibly be the root cause of the hazard.

Here are some potential sources of hazards in an organization:

Using computers: Hazards include poor sitting postures or excessive duration of sitting in one
position. These hazards may result in pain and strain. Making the same movement repetitively can
also cause muscle fatigue. In addition, glare from the computer screen can be harmful to the eyes.
Stretching at regular intervals or doing some simple yoga at your seat can mitigate such
hazards.

Handling office equipment: Improper handling of office equipment can result in injuries. For
example, sharp-edged equipment can cause cuts if not handled properly. Staff members should be
trained to handle equipment properly. The administration should make the relevant manuals on
handling equipment available.

Handling objects: Lifting or moving heavy items without proper procedure or techniques can be
a source of potential hazard. Always follow approved procedure and proper posture for lifting or
moving objects.

Stress at work: In today’s organizations, you may encounter various stress-causing hazards. Long
working hours can be stressful, and so can aggressive conflicts or arguments with colleagues.
Always look for ways to resolve conflicts with colleagues. Take up some relaxing hobbies to
relieve the stress of long working hours.

Working environment: Potential hazards may include poor ventilation, chairs and tables of
inappropriate height, stiff furniture, poor lighting, staff unaware of emergency procedures, or
poor housekeeping. Hazards may also include physical or emotional intimidation, such as
bullying or ganging up against someone. Staff should be made aware of the organization’s policies
for dealing with all the given hazards related to the working environment.

General Evacuation Procedures

Each organization has its own evacuation procedures as listed in its policies. An alert
employee who is well-informed about evacuation procedures can not only save himself or herself
but also help others in case of emergencies. Therefore, you should be aware of these procedures
and follow them properly during an emergency evacuation. Read your organization’s policies to
know about the procedures endorsed by it. In addition, here are a few general evacuation steps
that will always be useful in such situations:

o Leave the premises immediately and start moving towards the nearest emergency exit.
o Guide your customers to the emergency exits.
o If possible, assist any person with disability to move towards the emergency exit.
However, do not try to carry anyone unless you are trained to do so.
o Keep yourself light when evacuating the premises. You may carry your hand-held
belongings, such as bags or briefcase as you move towards the emergency exit. However,
do not come back into the building to pick up your belongings unless the area is declared
safe.
o Do not use the escalators or elevators (lifts) to avoid overcrowding and getting trapped, in
case there is a power failure. Use the stairs instead.
o Go to the emergency assembly area. Check if any of your colleagues are missing and
immediately inform the personnel in charge of emergency evacuation or your supervisor.
o Do not go back to the building you have evacuated till you are informed by authorized
personnel that it is safe to go inside.

After discussing the course content, prompt candidates to share the key points
of their understanding of the evacuation procedures at their current
organization.

Safety Signs

Some of the common safety signs are given below. Note down the labels for
each sign.

Discuss and check the participants’ understanding of the various safety signs given in the picture
above.

Review: Safety Guidelines Checklist

1. Store all cleaning chemicals in tightly closed containers in separate cupboards.
2. Keep the kitchen clean and dry all the time.
3. Throw away rubbish daily.
4. Make sure all areas have proper lighting.
5. In case of any injury or fracture, do not move the person until he or she has received medical
attention.
6. Do not wear loose clothing or jewelry when working with machines. It may catch on moving
equipment and cause a serious injury.
7. Never distract the attention of people who are working near fire or with some machinery,
tools or equipment.
8. Where required, wear protective items, such as goggles, safety glasses, masks, gloves, hair
nets, etc.
9. Shut down all machines before leaving for the day.
10. Do not play with electrical controls or switches.
11. Do not operate machines or equipment until you have been properly trained and allowed to
do so by your supervisor.
12. Do not adjust, clean or oil moving machinery.
13. Stack all shelves in an orderly way.
14. Stack all boxes and crates properly.
15. Never leave dishrags, aprons and other clothing near any hot surface.
16. Repair torn wires or broken plugs before using any electrical equipment.
17. Do not use equipment if it smokes, sparks or looks unsafe.
18. Cover all food with a lid, plastic wrap or aluminium foil.
19. Do not smoke in “No Smoking” areas.
20. Report any unsafe conditions or acts to your supervisor. These could include:
 Slippery floors
 Missing entrance and exit signs
 Poorly lighted stairs
 Loose handrails or guard rails
 Loose, open or broken windows
 Dangerously piled supplies or equipment
 Unlocked doors and gates
 Electrical equipment left operating
 Open doors on electrical panels
 Leaks of steam, water, oil or other liquids
 Blocked aisles
 Blocked fire extinguishers
 Blocked fire doors
 Smoke in non-smoking areas
 Roof leaks
 Safety devices not operating properly
Find the Problem


In this activity, you will be shown some pictures. Observe the displayed pictures
carefully and identify the problems in each of the pictures that could cause
accidents.

There are many sources of hazard in the picture. Among others, discuss: a
painter on a ladder without any support, people walking on the stairs with
wet paint on the floor, a person picking up a heavy box without any support,
and a ladder too close to and in the way of the stairs.


There are many sources of hazard in the picture. Among others, discuss:
women climbing on a chair and handling electric equipment; a heater on the
floor that could cause a fire; computer wires hanging out that could easily
get entangled in legs and make people fall; material on the floor on the mat
on which a person could trip; and a coffee kettle on a tall chest of drawers
that could create an electric hazard.


There are many sources of hazard in the picture. Among others, discuss:

 Hard hat area but staff not wearing hard hats
 Water spilled on the floor
 Smoking in a no-smoking zone
 Picture frame on Health & Safety instructions


Healthy Living

What constitutes healthy living?

Eating a balanced diet: A balanced diet provides you with the right amounts of
carbohydrates, fats, proteins, vitamins, and minerals. A balanced diet helps to keep you physically
fit and provides stamina to work.

Having proper sleep: Good sleep reduces stress, reduces the risk of developing diseases, and keeps
you alert. You need to get 6 or 7 hours of sleep each night. Lack of sleep increases the chances of
high blood pressure, high cholesterol, and stroke.

Exercising regularly: Exercise is a physical activity that keeps your body fit. Exercising helps
prevent development of disease conditions and makes you energetic.

Avoiding bad habits, such as smoking and drinking: It's not too late to identify and change
bad habits such as smoking, drinking, over-eating, and more. Understanding the harmful routines
is the first step to reversing them. The next step is finding ways to correct them and embracing
new ones, which helps you adopt healthier behaviours and start living a happier, healthier life.


Ergonomics: Ergonomics is the science concerned with designing and arranging things so that
people can use them easily and safely. Applying ergonomics can reduce the potential for
accidents, potential for injury and ill health, and improve performance and productivity.

Activity Description:
1. Make groups of 4-5.
2. Ask participants to discuss within their group and present their
thoughts on “healthy living”.

Summary

 Hazards can be defined as any source of potential harm or danger to
someone or any adverse health effect produced under certain conditions.
 Some potential sources of hazards in an organization are as follows:
 Using computers
 Handling office equipment
 Handling objects
 Stress at work
 Working environment
 Every employee should be aware of evacuation procedures and follow them
properly during an emergency evacuation.
 Follow all safety rules and warnings to keep your workplace free from
accidents.
 Recognize all safety signs in offices.
 Report any incident of non-compliance with safety rules and anything that is
a safety hazard.

Matters of Discussion

Data Collection Methods


[ MTRL-6.1.2]
Knowledge sharing practice
[ MTRL-6.1.1]

Compiled By: Dr. Nilamadhab Mishra [(PhD- CSIE) Taiwan]


Knowledge sharing practice
 Knowledge sharing is a learning activity such
as observation, listening and asking questions,
sharing ideas, suggesting potential solutions
and adopting patterns of behavior.

Cont..
 Knowledge is a familiarity, awareness or
understanding of someone or something,
such as facts, information, descriptions, or
skills, which is acquired through experience
or education by perceiving, discovering, or
learning.
 Knowledge can refer to a theoretical or
practical understanding of a subject.

Personal Knowledge
 Personal knowledge means knowledge of a
circumstance or fact gained through firsthand
observation or experience.
 "Personal knowledge means something the
witness actually saw or heard, as distinguished
from what he learned from some other person
or source."

Procedural Knowledge
Procedural knowledge is the type of knowledge
someone has and demonstrates through the
procedure of doing something.

Procedural knowledge is the knowledge
exercised in the performance of some task.

Propositional Knowledge
"To say something about a thing" is a proposition.

"To say of a thing that another thing is true of it or is false" is a proposition.

KNOWLEDGE SHARING GOAL
 The ultimate goal of KS is to distribute the right
content to the right people at the right time.

 Knowledge sharing depends on the habit and
willingness of the knowledge worker to seek out
and/or be receptive to these knowledge
sources.

In practice…
 Learning from successes and mistakes: using existing knowledge to improve
today’s performance
 Learning how to be more successful: creating new knowledge to improve
tomorrow’s performance
 Improving collaboration: joining things up
 Having the right knowledge in the right place
at the right time to make better decisions

Benefits of KS
 Expertise can be shared
 Turnover and job changes don’t cripple the
system
 Reduces cycle time
 Reduces costs
 More efficient use and reuse of knowledge
assets
 Enhances functional effectiveness
 Increases value of existing products and
services

 Knowledge sharing is defined as exchange,
transfer and dissemination of knowledge
between and among individuals, teams,
departments and organizations.
 Sharing knowledge involves formulating a
problem and suggesting potential solutions,
supplying justifications or stimulating events
to reflect on something.
 Knowledge sharing is a learning activity such
as observation, listening and asking
questions, sharing ideas, suggesting potential
solutions and adopting patterns of behavior.
Tacit Vs. Explicit knowledge
 Tacit knowledge is knowledge embedded in the
human mind. It can be expressed through the
application of abilities and is transferred
through learning by doing and by watching.
 Explicit knowledge is knowledge that is
straightforwardly expressed and shared
between people. It has been clearly
documented in a tangible form such as a
Standard Operating Procedure or a
marketing report.

Explicit knowledge                   Tacit (implicit) knowledge
Objective, rational, technical       Subjective, cognitive, experiential learning
Structured                           Personal
Fixed content                        Context sensitive/specific
Context independent                  Dynamically created
Externalized                         Internalized
Easily documented                    Difficult to capture and codify
Easy to codify                       Difficult to share
Easy to share                        Has high value
Easily transferred/taught/learned    Hard to document
Exists in high volumes               Hard to transfer/teach/learn
                                     Involves a lot of human interpretation

knowledge conversion Framework

SPIRAL FRAMEWORK
tacit to tacit knowledge transfer
 As tacit knowledge is internal, and embedded in
people, human interactions are essential for its
transfer.
 So in the socialization process tacit knowledge in the
form of experience or skills can be transferred
between individuals.
 Online social networks seem to be a more efficient
way to transfer tacit knowledge than are individual
face-to-face interactions.
 Tacit to Tacit: When skills and knowledge are shared
directly from one person to another - think about
how a new sales hire might learn through shadowing
your company's top seller.
tacit to explicit knowledge transfer
 The process of converting Tacit-to-Explicit is
called 'Externalization', that means making
internal & implicit knowledge, external &
explicit.
 Tacit Knowledge can only be made explicit
when it is possible to codify and express such
knowledge formally, in forms associated with
Explicit Knowledge.

Explicit to Explicit knowledge transfer
 Explicit to Explicit:
 When existing explicit knowledge is collected
and synthesized into new knowledge.
 For example, when the finance team gathers
information from each department to
present the company’s annual budget.

Explicit to Tacit knowledge transfer
 Explicit to Tacit:
 When new knowledge is disseminated
throughout your organization, employees
can begin to internalize it and use it to
enhance and expand their own personal
knowledge.
 For example, onboarding documents can be
used to impart critical ideas and concepts
that new hires can draw on to create new
innovations.
QUESTIONS
1) Investigate any knowledge conversion
Framework to explore the different models of
knowledge conversion forms.
2) Investigate the numerous data collection
techniques with an aim to accumulate data
from primary data sources.
3) Investigate the numerous data collection
techniques with an aim to accumulate data
from secondary data sources.
4) Investigate the hazards and solutions about the
knowledge sharing practices at organizational
level.
Cheers For the Great Patience!
Query Please?

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/277997198

Knowledge sharing practice in organization

Article · January 2010



Knowledge Sharing Practice in Organization
Muhamad Saufi Che Rusuli (1) & Rosmaini Tasmin (2)
(1) Library of Universiti Tun Hussein Onn Malaysia,
86400 Parit Raja, Batu Pahat, Johor. MALAYSIA.
msaufi@uthm.edu.my
(2) Faculty of Technology Management, Business and Entrepreneurship,
Universiti Tun Hussein Onn Malaysia,
86400 Parit Raja, Batu Pahat, Johor. MALAYSIA.
rosmaini@uthm.edu.my

Abstract
Effective knowledge sharing activities in an organization have the potential to improve customer services, bring
new products to market and reduce the cost of business operations. Information technologies are now often used in
knowledge management to inform customers and employees of the latest innovations or developments, as well as to
share knowledge among employees. In knowledge management, effective knowledge sharing is considered to be one of
the most vital components of KM success. Knowledge sharing practice helps organizations to improve performance and
achieve their mission. However, researchers and authors hold differing views on embedding knowledge sharing
practice in the workplace. Therefore, this paper gives a general discussion of knowledge sharing practices in
organizations to investigate whether knowledge sharing is practiced and embedded sufficiently.

1.0 Introduction

Knowledge sharing is central to the success of all knowledge management strategies. Effective knowledge
sharing practices enable reuse and regeneration of knowledge at the individual and organizational level.
In recent years there has been considerable emphasis on the need to create a culture in organizations
that is conducive to knowledge sharing and to implement strategies that are more knowledge friendly.
Nowadays, organizations worldwide have been seriously undertaking initiatives to ensure knowledge
management is successful by embedding knowledge sharing practices in their daily work processes. In
the Malaysian context, several organizations have taken initiatives to embed knowledge sharing in their
operational activities. They believe that through a knowledge management platform they could share
experience and knowledge from individual to individual without boundaries.

2.0 Definition of Knowledge Sharing

Park and Im (2003) defined knowledge sharing as “the process of transferring knowledge from a person
to another in organization. It is a process to accumulate shared knowledge among members”. Bock and
Kim (2002) stated that it can also be defined as a kind of social interaction among people. Knowledge,
unlike information, is locked in the human mind and is part of human identity. Frappaolo (2006) claimed
that knowledge sharing is about “how people share and use what they know”. In addition, Tasmin and
Woods (2007) described knowledge sharing as a social system that supports collaboration and
integration, which is normally facilitated by technology.
International Conference on Ethics and Professionalism 2010 (ICEP 2010)

Dalkir (2005) also supported the defined notion that knowledge sharing is to be associated with
“appropriate mix” of technological channels for optimizing knowledge exchanges. Creating and
exchanging knowledge are intangible activities that can neither be supervised nor imposed. They
happen only when people cooperate voluntarily. This exchange of knowledge can lead to the creation
of new knowledge, which can be an important source of competitive advantage.
Bock and Kim (2002) noted that Davenport (1997) argues that sharing knowledge is
often unnatural. He said that people will not share their knowledge because they think their knowledge
is valuable and important. However, Samieh and Wahba (2007) argued that knowledge sharing practices are
motivated and executed mainly at the individual level. Even in the absence of strong organizational
norms of knowledge sharing, employees may tend to share knowledge according to their personal
benefits and costs. In the end, knowledge sharing practices can help organizations become more
profitable and competitive.

3.0 Knowledge Sharing Practice at Work

Nowadays, many CEOs and managers in organizations understand the importance of knowledge
sharing among their employees and are eager to introduce the knowledge management paradigm in
their workplaces. Chaudhry (2005) reported that several studies were conducted over five
years to review knowledge management strategies and knowledge sharing practices in local
organizations. Singapore, for example, provides an interesting case study in this regard. Singapore
is conservative in adhering to Asian cultural traditions and at the same time open to innovation
and creativity. It is a diverse and multiethnic society that is eager to adhere to meritocracy and system
efficiency in its pursuit of innovation and creativity, which are crucial to the success of knowledge
management activities.
In Malaysia, knowledge sharing practices are not widely implemented. Only a few government
bodies and private-sector organizations, especially those linked to multinational companies, embed
knowledge sharing. Moreover, the private companies that embed knowledge sharing rely on their
innovation and creativity to become more profitable and knowledgeable.
Chong (2003) found that knowledge sharing was taking place on an informal basis through face-to-
face communication and collaborative workgroups. His study reveals that knowledge is supported
in this environment by a culture that encourages sharing of knowledge, learning from failures, and
developing people’s skills. Rastogi (2000) emphasized that organisational culture requires a favorable
social environment, including trust, shared values, and goodwill, to facilitate knowledge sharing. This
signifies the importance of trust in knowledge culture and knowledge sharing.
Lim, Tang and Yang (2004) agreed that knowledge sharing attitudes were more evident in
face-to-face contexts than through electronic media. Employees were found
to be more willing to share knowledge with increased rewards.
“Embedding knowledge into everyday work process is time consuming and expensive”
Snowden (2002) stated it’s impossible to measure whether someone is sharing their knowledge or
not in organizations, but it is possible to measure if they comply with a process. Therefore, employees
are not susceptible to directive control in respect of intangible assets such as knowledge.
Norris et al. (2003) held that knowledge becomes tangible as digitized content, as context
that can be digitally shared, and through direct and indirect interactions. Knowledge can be created by
asking a question and watching the responses it provokes through conversations and interactions
among network participants.

4.0 Tacit versus Explicit

Today, managers are very concerned with implementing knowledge management practices in their
organization. They face a number of challenges in implementing and developing knowledge practice
methods. Both the growing literature on knowledge management and the advice offered by various
knowledge management consultants, however, seem to advocate forms of knowledge management
practice that often appear incomplete, inconsistent, and even contradictory. Knowledge is classified
into two types, namely “tacit knowledge” and “explicit knowledge”.
Basic conditions and elements must first exist in an organization for the evolution of
tacit knowledge. Tacit knowledge is knowledge embedded in the human mind; it can be expressed through
the application of abilities and is transferred through learning by doing and by watching (Lee and
Choi, 2003). The spiral model of knowledge from Nonaka shows that new knowledge always begins with the
individual (e.g. a good researcher has an insight that leads to a new patent, or a shop-floor worker
draws on years of experience to come up with a new process innovation).
In this case, an individual’s personal knowledge is transformed into organizational knowledge,
which expands through the organization and is valuable to the company as a whole. Making personal
knowledge available to others should be the central activity of the knowledge and innovation creating
in company or organization. It takes place continuously and at all levels of the organization. Through
these interactions an organization creates a knowledge process, called knowledge conversion (Nonaka
et al., 2000; Alwis et al., 2004). These four modes of knowledge conversion form a spiral, the SECI
process.

Figure 1: SECI Process by Nonaka Takeuchi, 2000

Explicit knowledge is the type of knowledge that can be easily documented and shaped. It can be
created, written down, transferred and followed among the organizational units verbally or through
computer programs, patents, diagrams and information technologies (Calo, 2008; Keskin, 2005; Choi
& Lee, 2003). Explicit knowledge is easier to capture and distribute because of its ability to be passed
on in the form of tangible material. However, while it is easier to transfer, there are still obstacles with
the transference of explicit knowledge. One major issue is that though explicit knowledge is available,
it must be left up to the interpretation of the person who is using the material (Parise et al., 2006).

5.0 Why Don’t People Share

As mentioned earlier, knowledge sharing requires face-to-face communication and collaboration
within workgroups. One of the challenges of knowledge management is that of getting people to share
their knowledge. In some organizations, sharing comes naturally (Skyrme, 2008). The following are
common reasons why people do not share knowledge:


5.1 “Not invented here” syndrome

People take pride in not having to seek advice from others and in wanting to discover new ways for
themselves.

5.2 Not realizing how useful particular knowledge is to others

An individual may have knowledge that was used in one situation but be unaware that other people at
other times and places might face similar situations.

5.3 Lack of trust

If people share some of their experience, will others use it out of context, misapply it and then
blame them, or pass it off as their own without giving any acknowledgement or recognition to them
as the source?

5.4 Lack of time

Skyrme (2008) reveals that lack of time is the major reason given by employees in many organizations.
There is pressure on productivity and on deadlines, and it is a general rule that the more
knowledgeable people are, the more others are waiting to collar them for the next task.

5.5 Secret Information and knowledge

Not all information and knowledge can be shared within a community or society. An organization
may hold top-secret information that cannot be shared. Such classified “Top Secret” information
and knowledge kept in organizations has high value. Only trusted individuals know the secret
information and knowledge, in order to protect the organization or the country.

6.0 A Practice Approach to Knowledge Sharing in Organization

Kim, Lee and Olson (2006) stated that embedded knowledge sharing practice can be regarded as a
public good, because people who do not pay or contribute to the organization or community can also
share in the knowledge. Multiple people can also access and share knowledge at the same time. Viehland
(2005) stated that the alternative to the process approach for managing knowledge sharing is the
practice approach. This approach is more effective in gathering tacit knowledge through informal
networks with moderate use of information technology. Table 1 shows the two approaches to knowledge
sharing.

Table 1: Process and Practice Approaches to Knowledge Sharing

Type of knowledge supported
  Process Approach: Explicit knowledge: codified in rules, tools, and processes.
  Practice Approach: Mostly tacit knowledge: unarticulated.

Means of transmission
  Process Approach: Formal controls, procedures, and standard operating procedures with heavy
  emphasis on information technologies to support knowledge creation, codification, and transfer
  of knowledge.
  Practice Approach: Informal social groups that engage in storytelling and improvisation.

Benefits
  Process Approach: Provides structure to harness generated ideas and knowledge. Achieves scale in
  knowledge reuse.
  Practice Approach: Provides an environment to generate and transfer high-value tacit knowledge.
  Provides spark for fresh ideas and responsiveness to changing environment.

Disadvantages
  Process Approach: Fails to tap into tacit knowledge. May limit innovation and forces participants
  into fixed patterns of thinking.
  Practice Approach: Can result in inefficiency. Abundance of ideas with no structure to implement
  them.

Role of information technology
  Process Approach: Heavy investment in IT to connect people with reusable codified knowledge.
  Practice Approach: Moderate investment in IT to facilitate conversations and transfer of tacit
  knowledge.

(Source: Dennis Viehland, 2005)

7.0 Advantages of Knowledge Sharing Practice in Organization

There are some advantages of embedding knowledge sharing practices in organizations.

7.1 Sharing is Caring

Kim, Lee and Olson (2006) stated that embedded knowledge sharing practice can be regarded as a
public good, because people who do not pay or contribute to the organization or community can also
share in the knowledge. Multiple people can also access and share knowledge at the same time.

7.2 Innovative and Creative

Knowledge sharing practices can make people in an organization more innovative and creative.
Meetings, discussions and forums are the best platforms for sharing knowledge and ideas among
groups. People in these groups can easily exchange and share knowledge to get their tasks done.
It is generally understood that knowledge sharing is an antecedent to many other knowledge
management activities. Tasmin and Woods (2008) showed empirically that knowledge sharing through
knowledge management effort positively and strongly influences innovation activities among
manufacturing firms in Malaysia. According to Tasmin and Woods (2008), the predictive constructs
explained 99% of the variance in knowledge management enabling practices and 52% of the variance
in innovation activities. Most importantly, the strength of the influence of KM on innovation was
0.74. These findings show the significance and importance of knowledge sharing for innovative
activities.

7.3 Knowledge is Power

When knowledge sharing among employees in an organization becomes stronger, knowledge itself
becomes more powerful in the organization. Individuals who share their tacit knowledge through
conversation become more innovative and creative in their work. Norris et al. (2003) agreed that
much of this tacit knowledge exists and is communicated through conversations in communities of
practice or networks of practice. Such “know how”, “know who” and “know where” knowledge promises
to be more important. As an industry captain of Hewlett-Packard aptly said:

“If HP knew what HP knows, we would be three times more profitable.”


~ Lew Platt, former CEO of HP

7.4 Attitude

One advantage of embedding knowledge sharing practice in an organization concerns attitude. Kuo and
Young (2008) stated that attitude has been shown to be a critical factor in knowledge sharing
practices, because one’s knowledge about how to solve organizational problems could influence one’s
trade value. Chowdhury (2004) reported, in a case study at Petronas, the importance of an attitude
of sharing expertise with peers and other people in the workplace. People may also consider sharing
their knowledge in an organization if they believe this will be personally important and valuable
for them.

7.5 Changing Culture

Culture change is never easy and takes time, but cultures can be changed. Takeuchi and Nonaka
(2004) stated, in their milestone KM book, that “both IBM and Canon have successfully undergone a
transformation and have proven themselves capable of changing as fast as the environment around
them…” (p.25). In those firms’ environments, effective knowledge sharing involves cultural change
among people, process transformation, and technological management systems. According to Skyrme
(2008), some of the best knowledge sharing cultures are those in which everybody is involved and
believes their knowledge is respected, valued and used to inform decisions. Knowledge sharing
practice can thus make individuals more valuable.

8.0 Conclusion

In conclusion, implementing knowledge sharing practice in an organization is very important and
beneficial. It helps organizations in many ways, such as keeping information up to date and
supporting innovation and creation. Understanding its concepts and advantages can therefore
facilitate knowledge sharing and help managers and information and knowledge professionals to
support knowledge sharing practices. Given this importance, organizations are expected to take
advantage of the transformation of their employees’ information handling skills into knowledge
management capabilities.

ACKNOWLEDGEMENT

The author would like to thank the lecturer, Dr. Rosmaini Tasmin, who was particularly insightful in
guiding the paper throughout the writing process. The author is also grateful to the Faculty of
Technology Management, Business and Entrepreneurship and the Library of Universiti Tun Hussein Onn
Malaysia, Johor for their support of the paper.

REFERENCES

Alwis, R. S., Hartmann, E. and Gemünden, H. G. (2004). The role of tacit knowledge in innovation management.
20th Annual IMP Conference in Copenhagen. Sept. 2nd – 4th. 2004. p.1-23.
Bock, G. W. and Kim, Y. G. (2002). “Breaking the Myths of Rewards: An Exploratory Study of Attitudes About
Knowledge Sharing”. Information Resources Management Journal. 15(2), p.14-21.
Calo, T. (2008). Talent management in the era of the aging workforce: The critical role of knowledge transfer.
Public Person. Manage. 37, p.403-416.


Chaudhry, A. S. (2005). Knowledge Sharing Practices in Asian Institutions: A Multi-Cultural Perspective From
Singapore. World Library and Information Congress: 17th IFLA General Conference and Council. Aug. 14th –
18th.
Chowdury, N. (2004). People’s perception on various KM issues: Case Study with A Malaysian Oil Company.
Gombak, Malaysia: IIUM.
Davenport, T. H. (1997). Some Principles of Knowledge Management. (PhD Thesis).
Frappaolo, C. (2006). Knowledge Management. Capstone Publishing Ltd. (A Wiley Company): West Sussex,
England.
Keskin, H. (2005). The relationships between explicit and tacit oriented KM strategy and Firm Performance.
Journal of American Academy of Business, Cambridge Hollywood. 7(1), p.169-176.
Kim, J., Lee, S. M. and Olson, D. L. (2006). Knowledge Sharing: Effects of Cooperative Type and Reciprocity Level.
International Journal of Knowledge Management. 2(4), p.1-16.
Kimiz, D. (2005). Knowledge Management in Theory and Practice. Elsevier Butterworth-Heinemann: Oxford:
UK.
Kuo, F. Y. and Young, M. L. (2008). Predicting Knowledge Sharing Practices Through Intention: A Test of Competing
Models. Computer in Human Behavior. (24), p.2697-2722.
Lee, H. & Choi, B. (2003). Knowledge Management Enablers, Processes, and Organizational Performance: An
Integrative View and Empirical Examination. Journal of Management Information Systems. 20(1), p. 179-
228.
Nonaka, I., Toyama, R. and Konno, N. (2000). SECI, Ba and Leadership: a unified model of dynamic knowledge creation.
Long Range Planning. 33(4), p.4-34.
Norris, D. M. et al. (2003). A Revolution In Knowledge Sharing. EDUCAUSE Reviews. Sept./Oct., p.15-22.
Parise, S., Cross, R. and Davenport, T. (2006). Strategies for preventing knowledge-loss crisis. MIT Sloan Manage.
Rev. 47, p.31-38.
Park, H. S. and Im, B. C. (2003). “A study on the Knowledge Sharing Behavior of Local Public Servants in Korea”.
[Internet] http://www.kapa21.or.kr/down/2003
Rastogi, P.N. (2000). Knowledge Management and Intellectual Capital – The New Virtuous Reality of
Competitiveness. Human Systems Management. 19(1), p.39-49.
Skyrme, D. J. (2008). The 3Cs of Knowledge Sharing: Culture, Co-opetition and Commitment. (64). p.1-6.
Snowden, D. (2002). Knowledge Management Review. 5(5), p.13-17.
Takeuchi, H. & Nonaka, I. (2004). Hitotsubashi on Knowledge Management. Singapore: John Wiley & Sons.
Tasmin, R. and Woods, P. (2007). Relationship between corporate knowledge management and the firm’s
innovation capability. International Journal of Services Technology and Management, 8(1), p. 62-79.
Tasmin, R. and Woods, P. (2008). Knowledge Management Practices and Innovation Activities Among Large
Manufacturers in Peninsular Malaysia. PhD. Thesis. Multimedia University, Cyberjaya, Malaysia.
Viehland, D. (2005). ISExpertNet: Facilitating Knowledge Sharing in the Information Systems Academic
Community. The Journal of Issues in Informing Science and Information Technology. (2), p.441-450.



See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/325846997

METHODS OF DATA COLLECTION

Chapter · July 2016


1 author:

Syed Muhammad Sajjad Kabir


Curtin University

All content following this page was uploaded by Syed Muhammad Sajjad Kabir on 25 June 2018.




CHAPTER – 9

METHODS OF DATA COLLECTION


Topics Covered
9.1 Concept of Data Collection
9.2 Types of Data
9.3 Issues to be Considered for Data Collection
9.4 Methods of Primary Data Collection
9.4.1 Questionnaire Method
9.4.2 Interviews Method
9.4.3 Focus Group Discussion (FGD)
9.4.4 Participatory Rural Appraisal/ Assessment (PRA)
9.4.5 Rapid Rural Appraisal/ Assessment (RRA)
9.4.6 Observation Method
9.4.7 Survey Method
9.4.8 Case Study Method
9.4.9 Diaries Method
9.4.10 Principal Component Analysis (PCA)
9.4.11 Activity Sampling Technique
9.4.12 Memo Motion Study
9.4.13 Process Analysis
9.4.14 Link Analysis
9.4.15 Time and Motion Study
9.4.16 Experimental Method
9.4.17 Statistical Method
9.5 Methods of Secondary Data Collection
9.6 Methods of Legal Research
Chapter - 9 Methods of Data Collection Page 202

9.1 CONCEPT OF DATA COLLECTION


Data collection is the process of gathering and measuring information on variables of interest, in an
established systematic fashion that enables one to answer stated research questions, test
hypotheses, and evaluate outcomes. The data collection component of research is common to all
fields of study including physical and social sciences, humanities, business, etc. While methods vary
by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal
for all data collection is to capture quality evidence that then translates to rich data analysis and
allows the building of a convincing and credible answer to questions that have been posed. Regardless
of the field of study or preference for defining data (quantitative, qualitative), accurate data
collection is essential to maintaining the integrity of research. Both the selection of appropriate
data collection instruments (existing, modified, or newly developed) and clearly delineated
instructions for their correct use reduce the likelihood of errors occurring.
Data collection is one of the most important stages in conducting research. You can have the best
research design in the world, but if you cannot collect the required data you will not be able to
complete your project. Data collection is a very demanding job which needs thorough planning, hard
work, patience, perseverance and more to be able to complete the task successfully. Data collection
starts with determining what kind of data is required, followed by the selection of a sample from a
certain population. After that, you need to use a certain instrument to collect the data from the
selected sample.

9.2 TYPES OF DATA


Data are organized into two broad categories: qualitative and quantitative.
Qualitative Data: Qualitative data are mostly non-numerical and usually descriptive or nominal in
nature. This means the data collected are in the form of words and sentences. Often (not always),
such data captures feelings, emotions, or subjective perceptions of something. Qualitative
approaches aim to address the ‘how’ and ‘why’ of a program and tend to use unstructured methods of
data collection to fully explore the topic. Qualitative questions are open-ended. Qualitative methods
include focus groups, group discussions and interviews. Qualitative approaches are good for further
exploring the effects and unintended consequences of a program. They are, however, expensive and
time consuming to implement. Additionally the findings cannot be generalized to participants outside
of the program and are only indicative of the group involved.
Qualitative data collection methods play an important role in impact evaluation by providing
information useful to understand the processes behind observed results and assess changes in
people’s perceptions of their well-being. Furthermore, qualitative methods can be used to improve
the quality of survey-based quantitative evaluations by helping to generate evaluation hypotheses,
strengthening the design of survey questionnaires, and expanding or clarifying quantitative evaluation
findings. These methods are characterized by the following attributes -
 they tend to be open-ended and have less structured protocols (i.e., researchers may change the
data collection strategy by adding, refining, or dropping techniques or informants);
 they rely more heavily on interactive interviews; respondents may be interviewed several times
to follow up on a particular issue, clarify concepts or check the reliability of data;
 they use triangulation to increase the credibility of their findings (i.e., researchers rely on
multiple data collection methods to check the authenticity of their results);

Basic Guidelines for Research SMS Kabir



 generally their findings are not generalizable to any specific population, rather each case study
produces a single piece of evidence that can be used to seek general patterns among different
studies of the same issue.
Regardless of the kinds of data involved, data collection in a qualitative study takes a great deal of
time. The researcher needs to record any potentially useful data thoroughly, accurately, and
systematically, using field notes, sketches, audiotapes, photographs and other suitable means. The
data collection methods must observe the ethical principles of research. The qualitative methods
most commonly used in evaluation can be classified in three broad categories -
 In-depth interview
 Observation methods
 Document review.
Quantitative Data: Quantitative data are numerical in nature and can be mathematically computed.
Quantitative data are measured on different scales, which can be classified as nominal scale, ordinal
scale, interval scale and ratio scale. Often (not always), such data include measurements of
something. Quantitative approaches address the ‘what’ of the program. They use a systematic
standardized approach and employ methods such as surveys to ask questions. Quantitative
approaches have the advantage that they are cheaper to implement, are standardized so that
comparisons can be easily made, and usually allow the size of the effect to be measured. Quantitative
approaches, however, are limited in their capacity for the investigation and explanation of
similarities and unexpected differences. It is important to note that for peer-based programs,
quantitative data collection approaches often prove difficult for agencies to implement, because a
lack of the resources needed to ensure rigorous implementation of surveys, low participation, and
high loss to follow-up rates are commonly experienced.
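The four measurement scales named above determine which summary statistics are meaningful for a
variable. The following is a minimal Python sketch of that idea, not part of the original text: the
mapping MEANINGFUL_STATS, the function summarize and all sample values are hypothetical and chosen
only for illustration. It follows the common convention that the mode applies to every scale, the
median requires at least an ordinal scale, and the mean requires at least an interval scale.

```python
from statistics import mean, median, mode

# Hypothetical mapping from measurement scale to the summary statistics
# that are conventionally treated as meaningful at that level.
MEANINGFUL_STATS = {
    "nominal": ("mode",),
    "ordinal": ("mode", "median"),
    "interval": ("mode", "median", "mean"),
    "ratio": ("mode", "median", "mean"),
}

STAT_FUNCS = {"mode": mode, "median": median, "mean": mean}


def summarize(values, scale):
    """Return only the statistics that are appropriate for the given scale."""
    if scale not in MEANINGFUL_STATS:
        raise ValueError(f"unknown scale: {scale!r}")
    return {name: STAT_FUNCS[name](values) for name in MEANINGFUL_STATS[scale]}


# Made-up example data, one variable per scale.
blood_groups = ["A", "O", "O", "B", "O"]   # nominal: categories without order
satisfaction = [1, 2, 2, 4, 5]             # ordinal: ranked codes from 1 to 5
temperatures_c = [20.0, 22.0, 22.0, 24.0]  # interval: equal steps, no true zero
incomes = [100, 200, 200, 500]             # ratio: equal steps and a true zero

print(summarize(blood_groups, "nominal"))
print(summarize(satisfaction, "ordinal"))
print(summarize(temperatures_c, "interval"))
print(summarize(incomes, "ratio"))
```

The ratio scale additionally supports statements such as "twice as much", which the sketch does not
attempt to model.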
Quantitative data collection methods rely on random sampling and structured data collection
instruments that fit diverse experiences into predetermined response categories. They produce
results that are easy to summarize, compare, and generalize. If the intent is to generalize from the
research participants to a larger population, the researcher will employ probability sampling to
select participants. Typical quantitative data gathering strategies include -
 Experiments/clinical trials.
 Observing and recording well-defined events (e.g., counting the number of patients waiting in
emergency at specified times of the day).
 Obtaining relevant data from management information systems.
 Administering surveys with closed-ended questions (e.g., face-to-face and telephone interviews,
questionnaires etc.).
 In quantitative research (survey research), interviews are more structured than in qualitative
research. In a structured interview, the researcher asks a standard set of questions and
nothing more. Face-to-face interviews have the distinct advantage of enabling the researcher to
establish rapport with potential participants and therefore gain their cooperation.
 Paper-and-pencil questionnaires can be sent to a large number of people and save the researcher
time and money. People tend to be more truthful when responding to questionnaires, particularly
on controversial issues, because their responses are anonymous.
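Two ideas from the passage above, probability sampling and predetermined response categories, can
be sketched in a few lines of Python. This is an illustrative sketch only: the function names
simple_random_sample and tally_responses, the category labels and the population of numbered
respondents are all assumptions, not part of the original text. Simple random sampling is used
here as one example of probability sampling, in which every member of the population has the same
chance of selection.

```python
import random

# Hypothetical closed-ended response categories for one survey question.
RESPONSE_CATEGORIES = ("agree", "neutral", "disagree")


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that each has an equal chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)


def tally_responses(responses):
    """Count answers per predetermined category; reject anything else."""
    counts = {category: 0 for category in RESPONSE_CATEGORIES}
    for response in responses:
        if response not in counts:
            raise ValueError(f"not a predetermined category: {response!r}")
        counts[response] += 1
    return counts


# Made-up sampling frame: respondents numbered 1 to 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)

# Made-up answers from four sampled respondents.
print(tally_responses(["agree", "agree", "disagree", "neutral"]))
```

Rejecting answers outside the fixed categories mirrors how a structured instrument forces diverse
experiences into predetermined response options.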
Mixed Methods: A mixed methods approach is a research design that combines both qualitative and
quantitative research data, techniques and methods within a single research framework. Mixed methods
approaches may mean a number of things, e.g. using a number of different types of methods in a study
or at different points within a study, or using a mixture of qualitative and quantitative methods.
Mixed methods encompass multifaceted approaches that combine to capitalize on strengths and reduce
weaknesses that stem from using a single research design. Using this approach to gather and
evaluate data may help to increase the validity and reliability of the research. Some of the
common areas in which mixed-method approaches may be used include –
 Initiating, designing, developing and expanding interventions;
 Evaluation;
 Improving research design; and
 Corroborating findings, data triangulation or convergence.
Some of the challenges of using a mixed methods approach include –
 Delineating complementary qualitative and quantitative research questions;
 Time-intensive data collection and analysis; and
 Decisions regarding which research methods to combine.
Mixed methods are useful in highlighting complex research problems such as disparities in health
and can also be transformative in addressing issues for vulnerable or marginalized populations or
research which involves community participation. Using a mixed-methods approach is one way to
develop creative options to traditional or single design approaches to research and evaluation.
There are many ways of classifying data. A common classification is based upon who collected the
data.
PRIMARY DATA
Data that has been collected from first-hand experience is known as primary data. Primary data has
not been published yet and is more reliable, authentic and objective. Primary data has not been
changed or altered by human beings; therefore its validity is greater than that of secondary data.
Importance of Primary Data: In statistical surveys it is necessary to get information from primary
sources and work on primary data. For example, the statistical records of the female population in a
country cannot be based on newspapers, magazines and other printed sources: first, such sources are
old, and second, they contain limited information and can be misleading and biased. Research can be
conducted without secondary data, but research based only on secondary data is less reliable and
may be biased because secondary data has already been manipulated by human beings.
Sources of Primary Data: Sources for primary data are limited and at times it becomes difficult to
obtain data from primary source because of either scarcity of population or lack of cooperation.
Following are some of the sources of primary data.
Experiments: Experiments require an artificial or natural setting in which to perform logical study
to collect data. Experiments are more suitable for medicine, psychological studies, nutrition and for
other scientific studies. In experiments the experimenter has to keep control over the influence of
any extraneous variable on the results.
Survey: The survey is the most commonly used method in the social sciences, management, marketing
and, to some extent, psychology. Surveys can be conducted in different ways.
Questionnaire: It is the most commonly used method in surveys. A questionnaire is a list of
questions, either open-ended or closed-ended, to which the respondents give answers. Questionnaires
can be administered via telephone, by mail, live in a public area or in an institute, through
electronic mail, by fax and by other methods.
Interview: An interview is a face-to-face conversation with the respondent. The main problem in an
interview arises when the respondent deliberately hides information; otherwise it is an in-depth
source of information. The interviewer can not only record the statements the interviewee makes
but can also observe body language, expressions and other reactions to the questions. This
enables the interviewer to draw conclusions more easily.
Observations: Observation can be done while letting the observed person know that s/he is being
observed, or without letting the person know. Observations can be made in natural settings as well
as in artificially created environments.
Advantages of Using Primary Data
 The investigator collects data specific to the problem under study.
 There is no doubt about the quality of the data collected (for the investigator).
 If required, it may be possible to obtain additional data during the study period.
Disadvantages of Using Primary Data
1. The investigator has to contend with all the hassles of data collection-
 deciding why, what, how, when to collect;
 getting the data collected (personally or through others);
 getting funding and dealing with funding agencies;
 ethical considerations (consent, permissions, etc.).
2. Ensuring the data collected is of a high standard-
 all desired data is obtained accurately, and in the format it is required in;
 there is no fake/ cooked up data;
 unnecessary/ useless data has not been included.
3. Cost of obtaining the data is often the major expense in studies.
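Some of the quality concerns listed under item 2 above (required data present and in the required
format, no unnecessary data) can be checked mechanically once responses are recorded. The Python
sketch below is a hedged illustration: the field names respondent_id, age and answer, the age
range, and the function clean_records are hypothetical assumptions and do not come from the
original text. Checks of this kind catch structural problems such as missing values, wrong formats
and duplicate entries; they cannot by themselves establish that a response is genuine.

```python
# Hypothetical fields that the study actually needs from every respondent.
REQUIRED_FIELDS = ("respondent_id", "age", "answer")


def clean_records(records):
    """Split raw records into (accepted, rejected) lists.

    Accepted records keep only the required fields; each rejected record is
    paired with a short reason so that it can be followed up in the field.
    """
    accepted, rejected, seen_ids = [], [], set()
    for record in records:
        missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
        if missing:
            rejected.append((record, "missing: " + ", ".join(missing)))
            continue
        age = record["age"]
        if not isinstance(age, int) or not 0 < age < 120:
            rejected.append((record, "age not in required format"))
            continue
        if record["respondent_id"] in seen_ids:
            rejected.append((record, "duplicate respondent_id"))
            continue
        seen_ids.add(record["respondent_id"])
        # Drop any field the study did not ask for.
        accepted.append({f: record[f] for f in REQUIRED_FIELDS})
    return accepted, rejected


# Made-up raw data with one problem of each kind.
raw = [
    {"respondent_id": 1, "age": 34, "answer": "yes", "note": "not needed"},
    {"respondent_id": 2, "age": "thirty", "answer": "no"},
    {"respondent_id": 1, "age": 34, "answer": "yes"},
    {"respondent_id": 3, "age": 51, "answer": ""},
]
accepted, rejected = clean_records(raw)
print(accepted)
for record, reason in rejected:
    print(reason, record)
```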
SECONDARY DATA
Data collected from a source that has already been published in any form is called secondary
data. The review of literature in any research is based on secondary data. It is collected by someone
else for some other purpose (but is utilized by the investigator for another purpose). For
example, census data may be used to analyze the impact of education on career choice and earning.
Common sources of secondary data for social science include censuses, organizational records and
data collected through qualitative methodologies or qualitative research. Secondary data is
essential, since it is impossible to conduct a new survey that can adequately capture past change
and/or developments.
Sources of Secondary Data: The following are some ways of collecting secondary data –
 Books
 Records
 Biographies
 Newspapers
 Published censuses or other statistical data
 Data archives
 Internet articles
 Research articles by other researchers (journals)
 Databases, etc.
Importance of Secondary Data: Secondary data can be less valid, but it is still important.
Sometimes it is difficult to obtain primary data; in these cases getting information from secondary
sources is easier and possible. Sometimes primary data does not exist; in such a situation one has to
confine the research to secondary data. Sometimes primary data is present but the respondents are
not willing to reveal it; in such cases too, secondary data can suffice. For example, if the research
is on the psychology of transsexuals, first, it is difficult to find transsexual respondents and,
second, they may not be willing to give the information you want for your research, so you can
collect data from books or other published sources. A clear benefit of using secondary data is that
much of the background work needed has already been carried out. For example, literature reviews and
case studies might have been carried out, published texts and statistics could have been already
used elsewhere, and media promotion and personal contacts may also have been utilized. This wealth of
background work means that secondary data generally have a pre-established degree of validity and
reliability which need not be re-examined by the researcher who is re-using such data. Furthermore,
secondary data can also be helpful in the research design of subsequent primary research and can
provide a baseline with which the collected primary data results can be compared. Therefore, it is
always wise to begin any research activity with a review of the secondary data.
Advantages of Using Secondary Data
 No hassles of data collection.
 It is less expensive.
 The investigator is not personally responsible for the quality of data (‘I didn’t do it’).
Disadvantages of Using Secondary Data
 The third party that collected the data may not be reliable, so the reliability and accuracy
of the data may suffer.
 Data collected in one location may not be suitable for another due to varying environmental
factors.
 With the passage of time the data becomes old and obsolete.
 Secondary data can distort the results of the research. Special care is required to amend or
modify secondary data before use.
 Secondary data can also raise issues of authenticity and copyright.
Keeping in view the advantages and disadvantages of each source, the data requirements of the
research study and the time factor, both sources of data, i.e. primary and secondary data, may be
selected. These are then used in combination to give proper coverage to the topic.

9.3 ISSUES TO BE CONSIDERED FOR DATA COLLECTION/ NORMS IN RESEARCH


There are several reasons why it is important to adhere to ethical norms in research. First, norms
promote the aims of research, such as knowledge, truth, and avoidance of error. For example,
prohibitions against fabricating, falsifying, or misrepresenting research data promote the truth and
avoid error. Second, since research often involves a great deal of cooperation and coordination
among many different people in different disciplines and institutions, ethical standards promote the
values that are essential to collaborative work, such as trust, accountability, mutual respect, and
fairness. For example, many ethical norms in research, such as guidelines for authorship, copyright
and patenting policies, data sharing policies, and confidentiality rules in peer review, are designed to
protect intellectual property interests while encouraging collaboration. Most researchers want to
receive credit for their contributions and do not want to have their ideas stolen or disclosed
prematurely. Third, many of the ethical norms help to ensure that researchers can be held
accountable to the public. Fourth, ethical norms in research also help to build public support for
research. People are more likely to fund a research project if they can trust the quality and integrity of
research. Finally, many of the norms of research promote a variety of other important moral and
social values, such as social responsibility, human rights, animal welfare, compliance with the law, and
health and safety. Ethical lapses in research can significantly harm human and animal subjects,
students, and the public. For example, a researcher who fabricates data in a clinical trial may harm

Basic Guidelines for Research SMS Kabir


Chapter - 9 Methods of Data Collection Page 207

or even kill patients, and a researcher who fails to abide by regulations and guidelines relating to
radiation or biological safety may jeopardize his/her own health and safety or the health and safety of staff
and students.
Given the importance of ethics for the conduct of research, it should come as no surprise that many
different professional associations, government agencies, and universities have adopted specific
codes, rules, and policies relating to research ethics. The following is a rough and general summary
of some ethical principles that various codes address -
Honesty: Strive for honesty in all scientific communications. Honestly report data, results, methods
and procedures, and publication status. Do not fabricate, falsify, or misrepresent data. Do not
deceive colleagues, granting agencies, or the public.
Objectivity: Strive to avoid bias in experimental design, data analysis, data interpretation, peer
review, personnel decisions, grant writing, expert testimony, and other aspects of research where
objectivity is expected or required. Avoid or minimize bias or self-deception. Disclose personal or
financial interests that may affect research.
Integrity: Keep your promises and agreements; act with sincerity; strive for consistency of thought
and action.
Carefulness: Avoid careless errors and negligence; carefully and critically examine your own work
and the work of your peers. Keep good records of research activities, such as data collection,
research design, and correspondence with agencies or journals.
Openness: Share data, results, ideas, tools, resources. Be open to criticism and new ideas.
Respect for Intellectual Property: Honor patents, copyrights, and other forms of intellectual
property. Do not use unpublished data, methods, or results without permission. Give credit where
credit is due. Give proper acknowledgement or credit for all contributions to research. Never
plagiarize.
Confidentiality: Protect confidential communications, such as papers or grants submitted for
publication, personnel records, trade or military secrets, and patient records.
Responsible Publication: Publish in order to advance research and scholarship, not to advance just
your own career. Avoid wasteful and duplicative publication.
Responsible Mentoring: Help to educate, mentor, and advise students. Promote their welfare and
allow them to make their own decisions.
Respect for Colleagues: Respect your colleagues and treat them fairly.
Social Responsibility: Strive to promote social good and prevent or mitigate social harms through
research, public education, and advocacy.
Non-Discrimination: Avoid discrimination against colleagues or students on the basis of sex, race,
ethnicity, or other factors that are not related to their scientific competence and integrity.
Competence: Maintain and improve your own professional competence and expertise through lifelong
education and learning; take steps to promote competence in science as a whole.
Legality: Know and obey relevant laws and institutional and governmental policies.
Animal Care: Show proper respect and care for animals when using them in research. Do not conduct
unnecessary or poorly designed animal experiments.


Human Subjects Protection: When conducting research on human subjects, minimize harms and risks
and maximize benefits; respect human dignity, privacy, and autonomy; take special precautions with
vulnerable populations; and strive to distribute the benefits and burdens of research fairly.
Training in research ethics should be able to help researchers grapple with ethical dilemmas by
introducing researchers to important concepts, tools, principles, and methods that can be useful in
resolving these dilemmas. In fact, these issues have become an important part of training in research.

9.4 METHODS OF PRIMARY DATA COLLECTION


In primary data collection, you collect the data yourself using qualitative and quantitative methods.
The key point here is that the data you collect is unique to you and your research and, until you
publish, no one else has access to it. There are many methods of collecting primary data.
The main methods include –
 Questionnaires
 Interviews
 Focus Group Interviews
 Observation
 Survey
 Case-studies
 Diaries
 Activity Sampling Technique
 Memo Motion Study
 Process Analysis
 Link Analysis
 Time and Motion Study
 Experimental Method
 Statistical Method etc.

9.4.1 QUESTIONNAIRE METHOD


A questionnaire is a research instrument consisting of a series of questions and other prompts for
the purpose of gathering information from respondents. Although they are often designed for
statistical analysis of the responses, this is not always the case. The invention of the questionnaire is often
attributed to Sir Francis Galton (1822 - 1911). Questionnaires have advantages over some other types of surveys
in that they are cheap, do not require as much effort from the questioner as verbal or telephone
surveys, and often have standardized answers that make it simple to compile data. As a type of
survey, questionnaires also have many of the same problems relating to question construction and
wording that exist in other types of opinion polls.
Types: A distinction can be made between questionnaires with questions that measure separate
variables, and questionnaires with questions that are aggregated into either a scale or index.
Questionnaires within the former category are commonly part of surveys, whereas questionnaires in
the latter category are commonly part of tests. Questionnaires with questions that measure
separate variables could, for instance, include questions on –
 preferences (e.g. political party)
 behaviors (e.g. food consumption)
 facts (e.g. gender).


Questionnaires with questions that are aggregated into either a scale or index include, for instance,
questions that measure -
 latent traits (e.g. personality traits such as extroversion)
 attitudes (e.g. towards immigration)
 an index (e.g. Socioeconomic Status).
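As a minimal sketch of how questions are aggregated into a scale, the following Python snippet averages one respondent's answers to several items that are assumed to measure the same latent trait. The item names (ext_1 to ext_3), the 1-5 response range and the function name scale_score are hypothetical and chosen only for illustration; real scales may use other scoring rules.

```python
from statistics import mean


def scale_score(responses, items, low=1, high=5):
    """Average the answers to the items that make up one scale.

    responses: dict mapping item name -> numeric answer
    items: names of the items belonging to the scale (hypothetical names)
    low, high: assumed bounds of the response format
    """
    values = []
    for item in items:
        answer = responses[item]
        # Reject answers that fall outside the assumed response range.
        if not low <= answer <= high:
            raise ValueError(f"{item}: answer {answer} outside {low}-{high}")
        values.append(answer)
    return mean(values)


# One respondent's answers to three hypothetical extroversion items.
# The "age" entry is a separate variable and is not part of the scale.
respondent = {"ext_1": 4, "ext_2": 5, "ext_3": 3, "age": 31}
print(scale_score(respondent, ["ext_1", "ext_2", "ext_3"]))
```

The separate variable (age) is left out of the score, which mirrors the distinction made above between questions that measure separate variables and questions that are aggregated.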
Question Types: Usually, a questionnaire consists of a number of questions that the respondent has
to answer in a set format. A distinction is made between open-ended and closed-ended questions. An
open-ended question asks the respondent to formulate his/her own answer, whereas a closed-ended
question has the respondent pick an answer from a given number of options. The response options
for a closed-ended question should be exhaustive and mutually exclusive. Four types of response
scales for closed-ended questions are distinguished –
 Dichotomous, where the respondent has two options.
 Nominal-polytomous, where the respondent has more than two unordered options.
 Ordinal-polytomous, where the respondent has more than two ordered options.
 Continuous (Bounded), where the respondent is presented with a continuous scale.
A respondent’s answer to an open-ended question is coded into a response scale afterwards. An
example of an open-ended question is a question where the testee has to complete a sentence
(sentence completion item).
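The four response scales for closed-ended questions can be illustrated with a short Python sketch that checks whether an answer is admissible for its scale. The function names, the scale labels used as strings and the example options are assumptions made for this illustration, not part of any standard survey package.

```python
def is_valid_answer(scale, answer, options=None, bounds=None):
    """Check one closed-ended answer against its response scale."""
    if scale == "dichotomous":
        # Exactly two options, e.g. yes / no.
        return options is not None and len(options) == 2 and answer in options
    if scale in ("nominal-polytomous", "ordinal-polytomous"):
        # More than two options; for ordinal scales the list order matters.
        return options is not None and len(options) > 2 and answer in options
    if scale == "continuous":
        # Bounded continuous scale, e.g. a slider from 0 to 100.
        if bounds is None:
            raise ValueError("a continuous scale needs (low, high) bounds")
        low, high = bounds
        return low <= answer <= high
    raise ValueError(f"unknown scale: {scale}")


def ordinal_rank(answer, ordered_options):
    """Return the 1-based rank of an answer on an ordered list of options."""
    return ordered_options.index(answer) + 1


agreement = ["disagree", "neutral", "agree"]  # hypothetical ordered options
print(is_valid_answer("dichotomous", "yes", options=["yes", "no"]))
print(is_valid_answer("ordinal-polytomous", "agree", options=agreement))
print(ordinal_rank("agree", agreement))
print(is_valid_answer("continuous", 120, bounds=(0, 100)))
```

Only the ordinal scale carries a meaningful rank, which is why ordinal_rank is kept separate from the validity check.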
Question Sequence: In general, questions should flow logically from one to the next. To achieve the
best response rates, questions should flow from the least sensitive to the most sensitive, from the
factual and behavioral to the attitudinal, and from the more general to the more specific. The
questions in a questionnaire are typically asked in a set order when the questionnaire is constructed.
The order is as follows -
 Screens
 Warm-ups
 Transitions
 Skips
 Difficult
 Classification
Screens are used as a screening method to find out early whether or not someone should complete
the questionnaire. Warm-ups are simple to answer, help capture interest in the survey, and may not
even pertain to research objectives. Transition questions are used to make different areas flow well
together. Skips include questions similar to ‘If yes, then answer question 3. If no, then continue to
question 5’. Difficult questions are towards the end because the respondent is in ‘response mode’.
Also, when completing an online questionnaire, the progress bar lets the respondent know that they
are almost done, so they are more willing to answer more difficult questions. Classification or
demographic questions should be at the end because they can feel like personal questions,
which may make respondents uncomfortable and unwilling to finish the survey.
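The skip question quoted above can be sketched as a small routing table. This Python example assumes, purely for illustration, that the skip question is question 2 of a six-question form and that a 'no' answer bypasses questions 3 and 4; the function names and the rule format are hypothetical.

```python
def next_question(current, answer, skip_rules, last_question):
    """Return the number of the next question to ask, or None when done."""
    # Follow a skip rule if one matches; otherwise move to the next number.
    target = skip_rules.get((current, answer))
    if target is None:
        target = current + 1
    return target if target <= last_question else None


def questions_asked(answers, skip_rules, last_question, first=1):
    """List the questions a respondent sees, given his/her answers."""
    asked = []
    question = first
    while question is not None:
        asked.append(question)
        question = next_question(
            question, answers.get(question), skip_rules, last_question
        )
    return asked


# Assumed rule: at question 2, 'yes' leads to question 3, 'no' jumps to 5.
skip_rules = {(2, "yes"): 3, (2, "no"): 5}

print(questions_asked({2: "yes"}, skip_rules, last_question=6))
print(questions_asked({2: "no"}, skip_rules, last_question=6))
```

Keeping the rules in a table rather than in nested conditions makes it easier to review the flow of a long questionnaire before it is administered.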
Basic Rules for Questionnaire Item Construction: The basic rules are -
 Use statements which are interpreted in the same way by members of different subpopulations
of the population of interest.
 Use statements where persons that have different opinions or traits will give different answers.
 Think of having an ‘open’ answer category after a list of possible answers.
 Use only one aspect of the construct you are interested in per item.


 Use positive statements and avoid negatives or double negatives.
 Do not make assumptions about the respondent.
 Use clear and comprehensible wording, easily understandable for all educational levels.
 Use correct spelling, grammar and punctuation.
 Avoid items that contain more than one question per item (e.g. Do you like strawberries and
potatoes?).
 Questions should not be biased or lead the participant towards an answer.
Questionnaire Administration Modes: Main modes of questionnaire administration are -
 Face-to-face questionnaire administration, where an interviewer presents the items orally.
 Paper-and-pencil questionnaire administration, where the items are presented on paper.
 Computerized questionnaire administration, where the items are presented on the computer.
 Adaptive computerized questionnaire administration, where a selection of items is presented on
the computer, and based on the answers on those items, the computer selects following items
optimized for the testee’s estimated ability or trait.
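The adaptive mode can be illustrated with a toy Python sketch. It assumes each item has a known difficulty on the same scale as the respondent's estimated ability, selects the unasked item closest to the current estimate, and nudges the estimate up or down by a fixed step after each answer. The item names, difficulties and the step rule are invented for illustration; operational adaptive tests typically rely on statistical models such as item response theory rather than this simple rule.

```python
def pick_next_item(item_difficulty, asked, ability):
    """Pick the unasked item whose difficulty is closest to the estimate."""
    remaining = {
        name: difficulty
        for name, difficulty in item_difficulty.items()
        if name not in asked
    }
    if not remaining:
        return None  # every item has already been presented
    return min(remaining, key=lambda name: abs(remaining[name] - ability))


def update_ability(ability, correct, step=0.5):
    """Toy update rule: raise the estimate after a correct answer, else lower it."""
    return ability + step if correct else ability - step


# Hypothetical item bank: item name -> difficulty.
items = {"easy": -1.0, "medium": 0.0, "hard": 1.0}

ability = 0.2
first = pick_next_item(items, set(), ability)
ability = update_ability(ability, correct=True)
second = pick_next_item(items, {first}, ability)
print(first, second)
```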
Concerns with Questionnaires: It is important to consider the order in which questions are
presented. Sensitive questions, such as questions about income, drug use, or sexual activity, should
be put at the end of the survey. This allows the researcher to establish trust before asking
questions that might embarrass respondents. Researchers also recommend putting routine questions,
such as age, gender, and marital status, at the end of the questionnaire. Double-barreled questions,
which ask two questions in one, should never be used in a survey. An example of a double-barreled
question is a request to rate how strongly you agree or disagree with the following statement - ‘I feel
good about my work on the job, and I get along well with others at work’. This question is
problematic because survey respondents are asked to give one response for two questions.
Researchers should avoid using emotionally loaded or biased words and phrases.
Advantages of Questionnaires: The advantages of questionnaires are -
 Large amounts of information can be collected from a large number of people in a short period
of time and in a relatively cost effective way.
 Can be carried out by the researcher or by any number of people with limited effect on its
validity and reliability.
 The results of the questionnaires can usually be quickly and easily quantified by either a
researcher or through the use of a software package.
 Can be analyzed more scientifically and objectively than other forms of research.
 When data has been quantified, it can be used to compare and contrast other research and may
be used to measure change.
 Positivists believe that quantitative data can be used to create new theories and / or test
existing hypotheses.


Disadvantages of Questionnaires: The disadvantages of questionnaires are -


 Questionnaires are argued to be inadequate for understanding some forms of information - i.e.
changes of emotions, behavior, feelings etc.
 Phenomenologists state that quantitative research is simply an artificial creation by the
researcher, as it is asking only a limited amount of information without explanation.
 There is no way to tell how truthful a respondent is being.
 There is no way of telling how much thought a respondent has put in.
 The respondent may be forgetful or not thinking within the full context of the situation.
 People may read differently into each question and therefore reply based on their own
interpretation of the question - i.e. what is ‘good’ to someone may be ‘poor’ to someone else,
therefore there is a level of subjectivity that is not acknowledged.
Questionnaires are not among the most prominent methods in qualitative research, because they
commonly require subjects to respond to a stimulus, and thus they are not acting naturally. However,
they have their uses, especially as a means of collecting information from a wider sample than can be
reached by personal interview. Though the information is necessarily more limited, it can still be
very useful. For example, where certain clearly defined facts or opinions have been identified by
more qualitative methods, a questionnaire can explore how generally these apply, if that is a matter
of interest.

9.4.2 INTERVIEWS METHOD


Interviewing involves asking questions and getting answers from participants in a study. Interviewing
has a variety of forms including: individual, face-to-face interviews and face-to-face group
interviewing. The asking and answering of questions can be mediated by the telephone or other
electronic devices (e.g. computers). Interviews can be –
A. Structured,
B. Semi-structured or
C. Unstructured.
Face-to-face interviews are advantageous since detailed questions can be asked; further probing can be done to provide rich
data; the literacy of participants is not an issue; non-verbal data can be collected through observation; complex
and unknown issues can be explored; response rates are usually higher than for self-administered questionnaires.
Disadvantages of face-to-face interviews include: they can be expensive and time consuming; training of interviewers is
necessary to reduce interviewer bias and to ensure interviews are administered in a standardized way; they are prone to
interviewer bias and interpreter bias (if interpreters are used); sensitive issues may be challenging to explore.
Telephone interviews can yield data as accurate as face-to-face interviews. Telephone interviews are advantageous as they:
are cheaper and faster to conduct than face-to-face interviews; use fewer resources than face-to-face interviews; allow the
interviewer to clarify questions; do not require literacy skills. Disadvantages of telephone interviews include: having to make
repeated calls as calls may not be answered the first time; potential bias if call-backs are not made, so that the sample is
biased towards those who are at home; only suitable for short surveys; only accessible to the population with a telephone;
not appropriate for exploring sensitive issues.

Structured Interviews
Characteristics of the Structured Interview
 The interviewer asks each respondent the same series of questions.
 The questions are created prior to the interview, and often have a limited set of response
categories.
 There is generally little room for variation in responses and there are few open-ended questions
included in the interview guide.


 Questioning is standardized and the ordering and phrasing of the questions are kept consistent
from interview to interview.
 The interviewer plays a neutral role and acts casual and friendly, but does not insert his or her
opinion in the interview.
 Self-administered questionnaires are a type of structured interview.
When to Use a Structured Interview: Development of a structured interview guide or questionnaire
requires a clear topical focus and well-developed understanding of the topic at hand. A well-
developed understanding of a topic allows researchers to create a highly structured interview guide
or questionnaire that provides respondents with relevant, meaningful and appropriate response
categories to choose from for each question. Structured interviews are, therefore, best used when
the literature in a topical area is highly developed or following the use of observational and other
less structured interviewing approaches that provide the researcher with adequate understanding
of a topic to construct meaningful and relevant closed-ended questions.
Recording Interviews: There is a range of ways to collect and record structured interview data.
Data collection methods include, but are not limited to - paper-based and self-report (mail,
face-to-face); telephone interviews where the interviewer fills in participants’ responses; web-based and
self-report.
Benefits: Structured interviews can be conducted efficiently by interviewers trained only to follow
the instructions on the interview guide or questionnaire. Structured interviews do not require the
development of rapport between interviewer and interviewee, and they can produce consistent data
that can be compared across a number of respondents.
Semi-structured Interviews
Characteristics of Semi-structured Interviews
 The interviewer and respondents engage in a formal interview.
 The interviewer develops and uses an ‘interview guide’. This is a list of questions and topics that
need to be covered during the conversation, usually in a particular order.
 The interviewer follows the guide, but is able to follow topical trajectories in the conversation
that may stray from the guide when s/he feels this is appropriate.
When to Use Semi-structured Interviews: Semi-structured interviewing, according to Bernard
(1988), is best used when you won’t get more than one chance to interview someone and when you will
be sending several interviewers out into the field to collect data. The semi-structured interview
guide provides a clear set of instructions for interviewers and can provide reliable, comparable
qualitative data. Semi-structured interviews are often preceded by observation, informal and
unstructured interviewing in order to allow the researchers to develop a keen understanding of the
topic of interest necessary for developing relevant and meaningful semi-structured questions. The
inclusion of open-ended questions and training of interviewers to follow relevant topics that may
stray from the interview guide does, however, still provide the opportunity for identifying new ways
of seeing and understanding the topic at hand.
Recording Semi-Structured Interviews: Typically, the interviewer has a paper-based interview guide
that s/he follows. Since semi-structured interviews often contain open-ended questions and
discussions may diverge from the interview guide, it is generally best to tape-record interviews and
later transcribe these tapes for analysis. While it is possible to try to jot notes to capture
respondents’ answers, it is difficult to focus on conducting an interview and jotting notes. This
approach will result in poor notes and also detract from the development of rapport between
interviewer and interviewee. Development of rapport and dialogue is essential in unstructured


interviews. If tape-recording an interview is out of the question, consider having a note-taker
present during the interview.
Benefits: Many researchers like to use semi-structured interviews because questions can be
prepared ahead of time. This allows the interviewer to be prepared and appear competent during the
interview. Semi-structured interviews also allow informants the freedom to express their views in
their own terms. Semi-structured interviews can provide reliable, comparable qualitative data.
Unstructured Interviews
Characteristics of Unstructured Interviews
 The interviewer and respondents engage in a formal interview in that they have a scheduled time
to sit and speak with each other and both parties recognize this to be an interview.
 The interviewer has a clear plan in mind regarding the focus and goal of the interview. This
guides the discussion.
 There is not a structured interview guide. Instead, the interviewer builds rapport with
respondents, getting respondents to open-up and express themselves in their own way.
 Questions tend to be open-ended and exert little control over informants’ responses.
 Ethnographic, in-depth interviews are unstructured. Fontana and Frey (1994) identify three
types of in-depth, ethnographic unstructured interviews – oral history, creative interviews and
postmodern interviews.
When to Use Unstructured Interviews: Unstructured interviewing is recommended when the
researcher has developed enough of an understanding of a setting and his/her topic of interest to
have a clear agenda for the discussion with the informant, but still remains open to having his/her
understanding of the area of inquiry open to revision by respondents. Because these interviews are
not highly structured and because the researcher’s understanding is still evolving, it is helpful to
anticipate the need to speak with informants on multiple occasions.
Recording Unstructured Interviews: Since unstructured interviews often contain open-ended
questions and discussions may develop in unanticipated directions, it is generally best to tape-record
interviews and later transcribe these tapes for analysis. This allows the interviewer to focus on
interacting with the participant and follow the discussion. While it is possible to try to jot notes to
capture respondents’ answers, it is difficult to focus on conducting an interview and jotting notes.
This approach will result in poor notes and also detract from the development of rapport between
interviewer and interviewee. Development of rapport and dialogue is essential in unstructured
interviews. If tape-recording an interview is out of the question, consider having a note-taker
present during the interview.
Benefits: Unstructured interviews are an extremely useful method for developing an understanding
of an as-of-yet not fully understood or appreciated culture, experience, or setting. Unstructured
interviews allow researchers to focus the respondents’ talk on a particular topic of interest, and
may allow researchers the opportunity to test out their preliminary understanding, while still
allowing for ample opportunity for new ways of seeing and understanding to develop. Unstructured
interviews can be an important preliminary step toward the development of more structured
interview guides or surveys.


Informal Interviewing
Characteristics of Informal interviewing
 The interviewer talks with people in the field informally, without use of a structured interview
guide of any kind.
 The researcher tries to remember his/her conversations with informants, and uses jottings or
brief notes taken in the field to help in the recall and writing of notes from experiences in the
field.
 Informal interviewing goes hand-in-hand with participant observation.
 Informal interviews are casual conversations that the researcher, while in the field as an
observer, might have with the people being observed.
When to Use Informal Interviews: Informal interviewing is typically done as part of the process of
observing a social setting of interest. These may be best used in the early stages of the
development of an area of inquiry, where there is little literature describing the setting,
experience, culture or issue of interest. The researcher engages in fieldwork - observation and
informal interviewing - to develop an understanding of the setting and to build rapport. Informal
interviewing may also be used to uncover new topics of interest that may have been overlooked by
previous research.
Recording Informal Interviews: Since informal interviews occur 'on the fly,' it is difficult to tape-
record this type of interview. Additionally, it is likely that informal interviews will occur during the
process of observing a setting. The researcher should participate in the conversation. As soon as
possible, s/he should make jottings or notes of the conversation. These jottings should be developed
into a more complete account of the informal interview. This type of account would tend to be
included in the researcher's field notes. Developing field notes soon after an informal interview is
recommended. Even with good field jottings the details of an informal interview are quickly lost
from memory.
Benefits: Interviews can be done informally, and ‘on the fly’ and, therefore, do not require
scheduling time with respondents. In fact, respondents may just see this as ‘conversation’. Informal
interviews may, therefore, foster 'low pressure' interactions and allow respondents to speak more
freely and openly. Informal interviewing can be helpful in building rapport with respondents and in
gaining their trust as well as their understanding of a topic, situation, setting, etc. Informal
interviews, like unstructured interviews, are an essential part of gaining an understanding of a
setting and its members' ways of seeing. It can provide the foundation for developing and
conducting more structured interviews.
Interviewing, when considered as a method for conducting qualitative research, is a technique used
to understand the experiences of others. Characteristics of qualitative research interviews –
 Interviews are completed by the interviewer based on what the interviewee says.
 Interviews are a far more personal form of research than questionnaires.
 In the personal interview, the interviewer works directly with the interviewee.
 Unlike with mail surveys, the interviewer has the opportunity to probe or ask follow up questions.
 Interviews are generally easier for the interviewee, especially if what is sought are opinions
and/or impressions.


Types of Interviews
Informal, Conversational interview: No predetermined questions are asked, in order to remain as
open and adaptable as possible to the interviewee’s nature and priorities; during the interview the
interviewer ‘goes with the flow’.
General interview guide approach: Intended to ensure that the same general areas of information
are collected from each interviewee; this provides more focus than the conversational approach, but
still allows a degree of freedom and adaptability in getting the information from the interviewee.
Standardized, open-ended interview: The same open-ended questions are asked to all interviewees;
this approach facilitates faster interviews that can be more easily analyzed and compared.
Closed, fixed-response interview: All interviewees are asked the same questions and asked to choose
answers from among the same set of alternatives. This format is useful for those not practiced in
interviewing. This type of interview is also referred to as structured.
Interviewer’s judgments: According to Hackman and Oldman, several factors can bias an
interviewer’s judgment about a job applicant. However, these factors can be reduced or minimized by
training interviewers to recognize them. Some examples are -
Prior Information: Interviewers generally have some prior information about job candidates, such as
recruiter evaluations, application blanks, online screening results, or the results of psychological
tests. This can cause the interviewer to have a favorable or unfavorable attitude toward an
applicant before meeting them.
The Contrast Effect: How the interviewers evaluate a particular applicant may depend on their
standards of comparison, that is, the characteristics of the applicants they interviewed previously.
Interviewers’ Prejudices: Bias arises when the interviewers’ judgment is driven by their personal likes and
dislikes. These may include, but are not limited to, prejudice about racial and ethnic background, or a reaction
to applicants who display certain qualities or traits that leads interviewers to refuse to consider their
abilities or characteristics.
Preparation and Process of Conducting Interviews
Interviews are among the most challenging and rewarding forms of measurement. They require a
personal sensitivity and adaptability as well as the ability to stay within the bounds of the designed
protocol. The following describes the preparation needed for an interview study and then the
process of conducting the interview itself.
 Preparation
Role of the Interviewer: The interviewer is really the ‘jack-of-all-trades’ in survey research. The
interviewer’s role is complex and multifaceted. It includes the following tasks –
Locate and enlist cooperation of respondents: The interviewer has to find the respondent. In door-
to-door surveys, this means being able to locate specific addresses. Often, the interviewer has to
work at the least desirable times (like immediately after dinner or on weekends) because that’s
when respondents are most readily available.
Motivate respondents to do a good job: If the interviewer does not take the work seriously, why
would the respondent? The interviewer has to be motivated and has to be able to communicate that
motivation to the respondent. Often, this means that the interviewer has to be convinced of the
importance of the research.
Clarify any confusion/concerns: Interviewers have to be able to think on their feet. Respondents
may raise objections or concerns that were not anticipated. The interviewer has to be able to
respond candidly and informatively.


Observe quality of responses: Whether the interview is personal or over the phone, the interviewer
is in the best position to judge the quality of the information that is being received. Even a verbatim
transcript will not adequately convey how seriously the respondent took the task, or any gestures or
body language that were evident.
Conduct a good interview: Last, and certainly not least, the interviewer has to conduct a good
interview! Every interview has a life of its own. Some respondents are motivated and attentive,
others are distracted or disinterested. The interviewer also has good or bad days. Assuring a
consistently high-quality interview is a challenge that requires constant effort.
Training the Interviewers: Here are some of the major topics that should be included in interviewer
training –
Describe the entire study: Interviewers need to know more than simply how to conduct the
interview itself. They should learn about the background for the study, previous work that has been
done, and why the study is important.
State who is the sponsor of the research: Interviewers need to know who they are working for. They and
their respondents have a right to know not just what agency or company is conducting the research,
but also, who is paying for the research.
Teach enough about survey research: While you seldom have the time to teach a full course on
survey research methods, the interviewers need to know enough that they respect the survey
method and are motivated. Sometimes it may not be apparent why a question or set of questions was
asked in a particular way. The interviewers will need to understand the rationale for how the
instrument was constructed.
Explain the sampling logic and process: Naive interviewers may not understand why sampling is so
important. They may wonder why you go through all the difficulties of selecting the sample so
carefully. You will have to explain that sampling is the basis for the conclusions that will be reached
and for the degree to which your study will be useful.
Explain interviewer bias: Interviewers need to know the many ways that they can inadvertently bias
the results. And, they need to understand why it is important that they not bias the study. This is
especially a problem when you are investigating political or moral issues on which people have
strongly held convictions. While the interviewer may think they are doing good for society by
slanting results in favor of what they believe, they need to recognize that doing so could jeopardize
the entire study in the eyes of others.
‘Walk through’ the Interview: When you first introduce the interview, it’s a good idea to walk
through the entire protocol so the interviewers can get an idea of the various parts or phases and
how they interrelate.
Explain respondent selection procedures, including –
Reading maps: It’s astonishing how many adults don’t know how to follow directions on a map. In
personal interviews, the interviewer may need to locate respondents who are spread over a wide
geographic area. And, they often have to navigate by night (respondents tend to be most available in
evening hours) in neighborhoods they’re not familiar with. Teaching basic map reading skills and
confirming that the interviewers can follow maps is essential.
Identifying households: In many studies it is impossible in advance to say whether every sample
household meets the sampling requirements for the study. In your study, you may want to interview
only people who live in single family homes. It may be impossible to distinguish townhouses and
apartment buildings in your sampling frame. The interviewer must know how to identify the
appropriate target household.
Identify respondents: Just as with households, many studies require respondents who meet specific
criteria. For instance, your study may require that you speak with a male head-of-household between
the ages of 30 and 40 who has children under 18 living in the same household. It may be impossible
to obtain statistics in advance to target such respondents. The interviewer may have to ask a series
of filtering questions before determining whether the respondent meets the sampling needs.
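The filtering questions in the example above can be sketched as a simple eligibility check. This is only an illustration, not part of any survey instrument: the field names and the `meets_sampling_needs` helper are hypothetical, and treating the age range of 30 to 40 as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScreeningAnswers:
    """Answers to the filtering questions (hypothetical field names)."""
    sex: str
    is_head_of_household: bool
    age: int
    children_under_18_at_home: int


def meets_sampling_needs(answers: ScreeningAnswers) -> bool:
    """Return True if the respondent matches the example criteria:
    male head-of-household, aged 30 to 40 (assumed inclusive),
    with at least one child under 18 living in the household."""
    return (
        answers.sex == "male"
        and answers.is_head_of_household
        and 30 <= answers.age <= 40
        and answers.children_under_18_at_home > 0
    )


# Two illustrative respondents: one inside the age range, one outside it.
eligible = meets_sampling_needs(ScreeningAnswers("male", True, 35, 2))
ineligible = meets_sampling_needs(ScreeningAnswers("male", True, 45, 2))
```

In practice the interviewer asks these questions one at a time and can stop as soon as one answer rules the respondent out.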
Rehearse interview: You should probably have several rehearsal sessions with the interviewer team.
You might even videotape rehearsal interviews to discuss how the trainees responded in difficult
situations. The interviewers should be very familiar with the entire interview before ever facing a
respondent.
Explain supervision: In most interview studies, the interviewers will work under the direction of a
supervisor. In some contexts, the supervisor may be a faculty advisor; in others, they may be the
‘boss’. In order to assure the quality of the responses, the supervisor may have to observe a
subsample of interviews, listen in on phone interviews, or conduct follow-up assessments of
interviews with the respondents. This can be very threatening to the interviewers. You need to
develop an atmosphere where everyone on the research team - interviewers and supervisors - feels
like they're working together towards a common end.
Explain scheduling: The interviewers have to understand the demands being made on their schedules
and why these are important to the study. In some studies it will be imperative to conduct the
entire set of interviews within a certain time period. In most studies, it's important to have the
interviewers available when it's convenient for the respondents, not necessarily the interviewer.
Interviewer’s Kit: It’s important that interviewers have all of the materials they need to do a
professional job. Usually, you will want to assemble an interviewer kit that can be easily carried and
includes all of the important materials such as –
 a ‘professional-looking’ notebook (this might even have the logo of the company or organization
conducting the interviews);
 maps;
 sufficient copies of the survey instrument;
 official identification (preferably a picture ID);
 a cover letter from the Principal Investigator or Sponsor; and
 a phone number the respondent can call to verify the interviewer’s authenticity.
 Process
So all the preparation is complete, the training done, the interviewers ready to proceed, their ‘kits’
in hand. It’s finally time to do an actual interview. Each interview is unique, like a small work of art
(and sometimes the art may not be very good). Each interview has its own ebb and flow - its own
pace. To the outsider, an interview looks like a fairly standard, simple, prosaic effort. But to the
interviewer, it can be filled with special nuances and interpretations that aren’t often immediately
apparent. Every interview includes some common components. There’s the opening, where the
interviewer gains entry and establishes the rapport and tone for what follows. There’s the middle
game, the heart of the process, that consists of the protocol of questions and the improvisations of
the probe. And finally, there's the endgame, the wrap-up, where the interviewer and respondent
establish a sense of closure. Whether it’s a two-minute phone interview or a personal interview that
spans hours, the interview is a bit of theater, a mini-drama that involves real lives in real time.
Opening Remarks: In many ways, the interviewer has the same initial problem that a salesperson has.
You have to get the respondent's attention initially for a long enough period that you can sell them
on the idea of participating in the study. Many of the remarks here assume an interview that is
being conducted at a respondent's residence. But the analogies to other interview contexts should
be straightforward.
Gaining entry: The first thing the interviewer must do is gain entry. Several factors can enhance the
prospects. Probably the most important factor is your initial appearance. The interviewer needs to
dress professionally and in a manner that will be comfortable to the respondent. In some contexts a
business suit and briefcase may be appropriate. In others, it may intimidate. The way the
interviewer appears initially to the respondent has to communicate some simple messages - that
you're trustworthy, honest, and non-threatening. Cultivating a manner of professional confidence,
the sense that the respondent has nothing to worry about because you know what you’re doing - is a
difficult skill to teach and an indispensable skill for achieving initial entry.
Doorstep technique: You’re standing on the doorstep and someone has opened the door, even if only
halfway. You need to smile. You need to be brief. State why you are there and suggest what you
would like the respondent to do. Don’t ask; suggest what you want. Instead of saying ‘May I come in
to do an interview?’, you might try a more imperative approach like ‘I’d like to take a few minutes of
your time to interview you for a very important study’.
Introduction: If you’ve gotten this far without having the door slammed in your face, chances are
you will be able to get an interview. Without waiting for the respondent to ask questions, you should
move to introducing yourself. You should have this part of the process memorized so you can deliver
the essential information in 20-30 seconds at most. State your name and the name of the
organization you represent. Show your identification badge and the letter that introduces you. You
want to have as legitimate an appearance as possible. If you have a three-ring binder or clipboard
with the logo of your organization, you should have it out and visible. You should assume that the
respondent will be interested in participating in your important study - assume that you will be doing
an interview here.
Explaining the study: At this point, you’ve been invited to come in. Or, the respondent has continued
to listen long enough that you need to move onto explaining the study. There are three rules to this
critical explanation - (1) Keep it short; (2) Keep it short; and (3) Keep it short! The respondent
doesn't need or want to know all of the neat nuances of this study, how it came about, how you
convinced your thesis committee to buy into it, and so on. You should have a one or two sentence
description of the study memorized. No big words. No jargon. No detail. There will be more than
enough time for that later (and you should bring some written materials you can leave at the end for
that purpose). This is the ‘25 words or less’ description. What you should spend some time on is
assuring the respondent that you are interviewing them confidentially, and that their participation is
voluntary.
Asking the Questions: You’ve gotten in. The respondent has asked you to sit down and make yourself
comfortable. It may be that the respondent was in the middle of doing something when you arrived
and you may need to allow them a few minutes to finish the phone call or send the kids off to do
homework. Now, you’re ready to begin the interview itself.
Use questionnaire carefully, but informally: The questionnaire is your friend. It was developed with a
lot of care and thoughtfulness. While you have to be ready to adapt to the needs of the setting,
your first instinct should always be to trust the instrument that was designed. But you also need to
establish a rapport with the respondent. If you have your face in the instrument and you read the
questions, you'll appear unprofessional and uninterested. Even though you may be nervous, you need
to recognize that your respondent is most likely even more nervous. If you memorize the first few
questions, you can refer to the instrument only occasionally, using eye contact and a confident
manner to set the tone for the interview and help the respondent get comfortable.
Ask questions exactly as written: Sometimes an interviewer will think that they could improve on the
tone of a question by altering a few words to make it simpler or more ‘friendly’ – don’t. You should
ask the questions as they are on the instrument. If you had a problem with a question, the time to
raise it was during the training and rehearsals, not during the actual interview. It is important that
the interview be as standardized as possible across respondents (this is true except in certain types
of exploratory or interpretivist research where the explicit goal is to avoid any standardizing). You
may think the change you made was inconsequential when, in fact, it may change the entire meaning
of the question or response.
Follow the order given: Once you know an interview well, you may see a respondent bring up a topic
that you know will come up later in the interview. You may be tempted to jump to that section of the
interview while you're on the topic – don’t. You are more likely to lose your place. You may omit
questions that build a foundation for later questions.
Ask every question: Sometimes you’ll be tempted to omit a question because you think you have already
heard what the respondent will say. Don't assume that. If you don’t ask the question, you may
never discover the detail.
Obtaining Adequate Responses - The Probe: OK, you’ve asked a question. The respondent gives a
brief, cursory answer. How do you elicit a more thoughtful, thorough response? You probe.
Silent probe: The most effective way to encourage someone to elaborate is to do nothing at all -
just pause and wait. This is referred to as the ‘silent’ probe. It works (at least in certain cultures)
because the respondent is uncomfortable with pauses or silence. It suggests to the respondent that
you are waiting, listening for what they will say next.
Overt encouragement: At times, you can encourage the respondent directly. Try to do so in a way
that does not imply approval or disapproval of what they said (that could bias their subsequent
responses). Overt encouragement could be as simple as saying ‘Uh-huh’ or ‘OK’ after the respondent
completes a thought.
Elaboration: You can encourage more information by asking for elaboration. For instance, it is
appropriate to ask questions like ‘Would you like to elaborate on that?’ or ‘Is there anything else you
would like to add?’
Ask for clarification: Sometimes, you can elicit greater detail by asking the respondent to clarify
something that was said earlier. You might say, ‘A minute ago you were talking about the experience
you had in high school. Could you tell me more about that?’
Repetition: This is the old psychotherapist trick. You say something without really saying anything
new. For instance, the respondent just described a traumatic experience they had in childhood. You
might say ‘What I’m hearing you say is that you found that experience very traumatic’. Then, you
should pause. The respondent is likely to say something like ‘Well, yes, and it affected the rest of
my family as well. In fact, my younger sister...’
Recording the Response: Although we have the capability to record a respondent in audio and/or
video, most interview methodologists don’t think it’s a good idea. Respondents are often
uncomfortable when they know their remarks will be recorded word-for-word. They may strain to
only say things in a socially acceptable way. Although you would get a more detailed and accurate
record, it is likely to be distorted by the very process of obtaining it. This may be more of a
problem in some situations than in others. It is increasingly common to be told that your
conversation may be recorded during a phone interview. And most focus group methodologies use
unobtrusive recording equipment to capture what’s being said. But, in general, personal interviews
are still best when recorded by the interviewer using pen and paper.
Record responses immediately: The interviewer should record responses as they are being stated.
This conveys the idea that you are interested enough in what the respondent is saying to write it
down. You don’t have to write down every single word – you’re not taking stenography. But you may
want to record certain key phrases or quotes verbatim. You need to develop a system for
distinguishing what the respondent says verbatim from what you are characterizing.
Include all probes: You need to indicate every single probe that you use. Develop a shorthand for
different standard probes. Use a clear form for writing them in (e.g., place probes in the left
margin).
Use abbreviations where possible: Abbreviations will help you to capture more of the discussion.
Develop a standardized system (e.g., R=respondent; DK=don’t know). If you create an abbreviation on
the fly, have a way of indicating its origin. For instance, if you decide to abbreviate Spouse with an
‘S’, you might make a notation in the right margin saying ‘S=Spouse’.
Concluding the Interview: When you've gone through the entire interview, you need to bring the
interview to closure. Some important things to remember -
Thank the respondent - Don’t forget to do this. Even if the respondent was troublesome or
uninformative, it is important for you to be polite and thank them for their time.
Tell them when you expect to send results - You owe it to your respondent to show them what you
learned. Now, they may not want your entire 300-page dissertation. It’s common practice to prepare
a short, readable, jargon-free summary of interviews that you can send to the respondents.
Don’t be brusque or hasty - Allow for a few minutes of winding down conversation. The respondent
may want to know a little bit about you or how much you like doing this kind of work. They may be
interested in how the results will be used. Use these kinds of interests as a way to wrap up the
conversation. As you’re putting away your materials and packing up to go, engage the respondent. You
don’t want the respondent to feel as though you completed the interview and then rushed out on
them - they may wonder what they said that was wrong. On the other hand, you have to be careful
here. Some respondents may want to keep on talking long after the interview is over. You have to
find a way to politely cut off the conversation and make your exit.
Immediately after leaving, write down any notes about how the interview went - Sometimes you will
have observations about the interview that you didn’t want to write down while you were with the
respondent. You may have noticed them get upset at a question, or you may have detected hostility
in a response. Immediately after the interview you should go over your notes and make any other
comments and observations - but be sure to distinguish these from the notes made during the
interview (you might use a different color pen, for instance).
Strengths and Weaknesses
Possibly the greatest advantage of interviewing is the depth of detail from the interviewee.
Interviewing participants can paint a picture of what happened in a specific event, tell us their
perspective on that event, and give other social cues. Social cues such as the interviewee’s voice,
intonation, and body language can give the interviewer a lot of extra information that can be
added to the interviewee’s verbal answer to a question. This level of detailed description,
whether verbal or nonverbal, can reveal an otherwise hidden interrelatedness between emotions,
people, and objects, unlike many quantitative methods of research. In addition, interviewing has a unique
advantage in its specific form. Researchers can tailor the questions they ask to the respondent in
order to get rich, full stories and the information they need for their project. They can make it
clear to the respondent when they need more examples or explanations. Not only can researchers
learn about specific events, they can also gain insight into people’s interior experiences,
specifically how people perceive events, how they interpret their perceptions, and how events affect
their thoughts and feelings. In this way, researchers can understand the process of an event instead of
just what happened and how people reacted to it.
Interviewing is not a perfect method for all types of research. It does have its disadvantages. First,
there can be complications with the planning of the interview. Not only is recruiting people for
interviews hard, due to the typically personal nature of the interview, planning where to meet them
and when can be difficult. Participants can cancel or change the meeting place at the last minute.
During the actual interview, a possible weakness is missing some information. This can arise from the
immense multitasking that the interviewer must do. Not only do they have to make the respondent
feel very comfortable, they have to keep as much eye contact as possible, write down as much as
they can, and think of follow up questions. After the interview, the process of coding begins and
with this comes its own set of disadvantages. Second, coding can be extremely time consuming. This
process typically requires multiple people, which can also become expensive. Third, the nature of
qualitative research does not lend itself very well to quantitative analysis. Some researchers
report more missing data in interview research than in survey research, which can make it difficult
to compare populations.

9.4.3 FOCUS GROUP DISCUSSION (FGD)


A focus group discussion (FGD) is an in-depth field method that brings together a small
homogeneous group (usually six to twelve persons) to discuss topics on a study agenda. The purpose
of this discussion is to use the social dynamics of the group, with the help of a moderator/
facilitator, to stimulate participants to reveal underlying opinions, attitudes, and reasons for their
behavior. In short, a well facilitated group can be helpful in finding out the ‘how’ and ‘why’ of human
behavior.
Focus group discussions are a data collection method. Data is collected through a semi-structured
group interview process. Focus groups are generally used to collect data on a specific topic. Focus
group methods emerged in the 1940s with the work of Merton and Fiske, who used focus groups to
study consumer satisfaction. The discussion is conducted in a relaxed atmosphere to enable
participants to express themselves without any personal inhibitions. Participants usually share a
common characteristic such as age, sex, or socio-economic status that defines them as a member of
a target subgroup. This encourages a group to speak more freely about the subject without fear of
being judged by others thought to be superior. The discussion is led by a trained
moderator/facilitator (preferably experienced), assisted by an observer who takes notes and
arranges any tape recording. The moderator uses a prepared guide to ask very general questions of
the group. Usually more than one group session is needed to assure good coverage of responses to a
set of topics. Each session usually lasts between one and two hours, ideally 60 to 90 minutes.
The aim of the focus group is to make use of participants’ feelings, perceptions and opinions. This
method requires the researcher to use a range of skills - group skills; facilitating; moderating;
listening/observing; analysis. Focus groups or group discussions are useful to further explore a
topic, providing a broader understanding of why the target group may behave or think in a particular
way, and assist in determining the reason for attitudes and beliefs. They are conducted with a small
sample of the target group and are used to stimulate discussion and gain greater insights.
The design of focus group research will vary based on the research question being studied. Some
general principles to consider are highlighted below -
 Standardization of questions - focus groups can vary in the extent to which they follow a
structured protocol or permit discussion to emerge.
 Number of focus groups conducted - the sampling will depend on the ‘segmentation’ or different
stratifications (e.g. age, sex, socioeconomic status, health status) that the researcher identifies
as important to the research topic.
 Number of participants per group - the rule of thumb has been 6-10 homogeneous strangers, but
as Morgan (1996) points out there may be reasons to have smaller or slightly larger groups.
 Level of moderator involvement - can vary from high to low degree of control exercised during
focus groups (e.g. extent to which structured questions are asked and group dynamics are
actively managed).
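As a rough planning aid, the segmentation and group-size principles above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the strata shown are made up, the helper names `plan_groups` and `size_within_rule_of_thumb` are hypothetical, and planning one group per combination of strata is a simplification that the researcher should adjust to the study.

```python
from itertools import product

# Illustrative strata only; the researcher decides which ones matter.
segmentation = {
    "age": ["younger", "older"],
    "sex": ["women", "men"],
}


def plan_groups(strata):
    """List one homogeneous group per combination of strata
    (an assumed simplification, not a fixed rule)."""
    names = list(strata)
    return [dict(zip(names, combo)) for combo in product(*strata.values())]


def size_within_rule_of_thumb(n_participants, low=6, high=10):
    """Check a group size against the 6-10 rule of thumb quoted above."""
    return low <= n_participants <= high


groups = plan_groups(segmentation)
for group in groups:
    print(group)
```

Each added stratum multiplies the number of groups, which is one reason to limit segmentation to the characteristics that matter most for the research topic.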
Focus group interviews typically have the following characteristics -
 Identify the target market (people who possess certain characteristics).
 Provide a short introduction and background on the issue to be discussed.
 Have focus group members write their responses to the issue(s).
 Facilitate group discussion.
 Recommended size of the sample group is 6 - 10 people, as smaller groups may limit the amount
of information collected, and larger groups may make it difficult for all participants to
participate and interact and for the interviewer to make sense of the information
given.
 Several focus groups should be used in order to get a more objective and macro view of the
investigation, i.e. focusing on one group may give you idiosyncratic results. The use of several
groups will add to the breadth and depth of information. A minimum of three focus groups is
recommended for best practice approaches.
 Members of the focus group should have something in common which is important to the
investigation.
 Groups can either be newly assembled or drawn from existing groups - it is always useful to be mindful of the
group dynamics of both situations.
 Provide a summary of the focus group issues at the end of the meeting.
The purpose of an FGD is to obtain in-depth information on concepts, perceptions, and ideas of the
group. An FGD aims to be more than a question-answer interaction. In combination with other
methods, focus groups might be used to -
 explore new research areas;
 explore a topic that is difficult to observe (not easy to gain access);
 explore a topic that does not lend itself to observational techniques (e.g. attitudes and decision-
making);
 explore sensitive topics;
 collect a concentrated set of observations in a short time span;
 ascertain perspectives and experiences from people on a topic, particularly when these are
people who might otherwise be marginalized;
 gather preliminary data;
 aid in the development of surveys and interview guides;
 clarify research findings from another method;
 explore the range of opinions/views on a topic of interest;
 collect a wide variety of local terms and expressions used to describe a disease (e.g., diarrhea)
or an act (e.g., defecation);
 explore meanings of survey findings that cannot be explained statistically.
Steps in Focus Group Discussions (FGD)
The steps in using FGDs to study a problem are summarized below. The extent to which these steps
must be followed varies, however, depending on the training and experience of those involved in the
data collection.
STEP 1: Plan the entire FGD
 What activities need to be planned?
 Is there a need for a resource person?
 Role of resource person in training field staff.
STEP 2: Decide what types of groups are needed
 Method of sampling (selection criteria)
 Composition of groups
 Number of groups
 Group size
 Contacting and informing participants.
STEP 3: Select moderator and field team
 Field staff requirements
 Moderator
 Observer/recorder
 Other staff.
STEP 4: Develop moderator’s guide and format for recording responses
 Structure and sequence of topics
 Wording of guide
 Number of topics
 Example of an FGD guide.
STEP 5: Train field team and conduct pilot test
 Training hints
 Training package
 Theory sessions
 Practice sessions
 On-going revision of FGD guide.
STEP 6: Prepare for the individual FGDs
 Site selection and location for FGD
 Date and time
 Plan for supporting materials or FGD checklist.
STEP 7: Conduct the FGD
 Conducting the Discussion
 Introduction
 Warm-up
 Discussion
 Wrap-up summary
 Debriefing
 Collecting and managing information in FGD.
STEP 8: Analyze and interpret FGD results
 How much analysis is required
 Debriefing;
 Notes;
 Transcripts; and log book
 Writing the report
 Interpretation of findings
 Example of format of an FGD report.
Identify suitable discussion participants and invite a small group to a meeting at an agreed place and
time. The ideal number of participants is six to eight, but be flexible about numbers - do not turn
away participants after they have arrived at the meeting and do not pressure people to come to the
meeting. Be psychologically prepared for the session; you will need to remain alert to be able to
observe, listen, and keep the discussion on track for a period of one to two hours. Make sure you
arrive at the agreed place before the participants, and be ready to greet them. Maintain a neutral
attitude and appearance, and do not start talking about the topic of interest before the official
opening of the group discussion. Begin by introducing yourself and your team (even if the
participants have already met them individually), and ask participants to introduce themselves.
Explain clearly that the purpose of the discussion is to find out what people think about the
practices or activities depicted by the pictures. Tell them that you are not looking for any right or
wrong answer but that you want to learn what each participant's views are. It must be made clear to
all participants that their views will be valued. Bring the discussion to a close when you feel the topic
has been exhausted, and do not let the group discussion degenerate into smaller discussions. Be
sincere in expressing your thanks to the participants for their contributions. Refreshments may be
served at the end of the meeting as a way of thanking the participants and maintaining good rapport
with them.
Conducting FGD
The following guidelines may be used for conducting an FGD.
Preparation
Selection of topic: It is appropriate to define and clarify the concepts to be discussed. The basic
idea is to lay out a set of issues for the group to discuss. It is important to bear in mind that the
moderator will mostly be improvising comments and questions within the framework set by the
guidelines. By keeping the questions open-ended, the moderator can stimulate useful trains of
thought in the participants that were not anticipated.
Selecting the study participants: Given a clear idea of the issues to be discussed, the next critical
step in designing a focus group study is to decide on the characteristics of the individuals who are
to be targeted for sessions. It is often important to ensure that the groups all share some common
characteristics in relation to the issue under investigation. If you need to obtain information on a
topic from several different categories of informants who are likely to discuss the issue from
different perspectives, you should organize a focus group for each major category: for example, a
group for men and a group for women, or a group for older women and a group for younger women. The
selection of the participants can be on the basis of purposive or convenience sampling. The
participants should receive the invitations at least one or two days before the exercise. The
invitations should explain the general purpose of the FGD.
Physical arrangements: Communication and interaction during the FGD should be encouraged in every
way possible. Arrange the chairs in a circle. Make sure the area will be quiet, adequately lighted,
etc., and that there will be no disturbances. Try to hold the FGD in a neutral setting that
encourages participants to freely express their views. A health center, for example, is not a good
place to discuss traditional medical beliefs or preferences for other types of treatment. A neutral
setting can also mean a place where participants feel comfortable coming regardless of their
party factions.
Conducting the Session
 One of the members of the research team should act as a ‘facilitator’ or ‘moderator’ for the
focus group. One should serve as ‘recorder’.
 Functions of the Facilitator: The facilitator should not act as an expert on the topic. His/her
role is to stimulate and support discussion. S/he should perform the following functions -
Introduce the session - S/he should introduce himself/herself as facilitator and introduce the
recorder. Introduce the participants by name or ask them to introduce themselves (or develop
some new interesting way of introduction). Put the participants at ease and explain the purpose
of the FGD, the kind of information needed, and how the information will be used (e.g., for
planning of a health program, an education program, etc.).
Encourage discussion - The facilitator should be enthusiastic, lively, and humorous and show
his/her interest in the group’s ideas. Formulate questions and encourage as many participants as
possible to express their views. Remember there are no ‘right’ or ‘wrong’ answers. The facilitator
should react neutrally to both verbal and nonverbal responses.
Encourage involvement - Avoid a question and answer session. Some useful techniques include
asking for clarification (can you tell me more?); reorienting the discussion when it goes off the
track (Saying - wait, how does this relate to the issue? Using one participant’s remarks to direct
a question to another); bringing in reluctant participants (Using the person’s name, requesting
his/her opinion, making more frequent eye contact to encourage participation); dealing with
dominant participants (Avoiding eye contact or turning slightly away to discourage the person
from speaking, or thanking the person and changing the subject).
Avoid being placed in the role of expert - When the facilitator is asked for his/her opinion by a
respondent, remember that s/he is not there to educate or inform. Direct the question back to
the group by saying ‘What do you think?’ ‘What would you do?’ Set aside time, if necessary, after
the session to give participants the information they have asked for. Do not try to comment on
everything that is being said. Do not feel you have to say something during every pause in the
discussion. Wait a little and see what happens.
Control the timing of the meeting but unobtrusively - Listen carefully and move the discussion
from topic to topic. Subtly control the time allocated to various topics so as to maintain
interest. If the participants spontaneously jump from one topic to another, let the discussion
continue for a while, because useful additional information may surface; then summarize the
points brought up and reorient the discussion.
 Take time at the end of the meeting to summarize, check for agreement and thank the
participants: Summarize the main issues brought up, check whether all agree and ask for
additional comments. Thank the participants and let them know that their ideas have been a
valuable contribution and will be used for planning the proposed research, intervention, or
whatever the purpose of the FGD was. Listen to any additional comments made after the meeting.
Sometimes valuable information surfaces that would otherwise remain hidden.
Advantages and Disadvantages of FGD
Focus groups and group discussions are advantageous as they -
 Are useful when exploring cultural values and health beliefs;
 Can be used to examine how and why people think in a particular way and how it influences their
beliefs and values;
 Can be used to explore complex issues;
 Can be used to develop hypotheses for further research;
 Do not require participants to be literate.
Disadvantages of focus groups include -
 Lack of privacy/anonymity;
 Having to carefully balance the group to ensure it is culturally and gender appropriate (i.e.
gender may be an issue);
 Potential for the risk of ‘group think’ (not allowing for other attitudes, beliefs etc.);
 Potential for group to be dominated by one or two people;
 Group leader needs to be skilled at conducting focus groups, dealing with conflict, drawing out
passive participants and creating a relaxed, welcoming environment;
 Are time consuming to conduct and can be difficult and time consuming to analyze.

9.4.4 PARTICIPATORY RURAL APPRAISAL/ ASSESSMENT (PRA)


Participatory rural appraisal/ assessment (PRA) is a set of participatory and largely visual techniques
for assessing group and community resources, identifying and prioritizing problems and appraising
strategies for solving them. PRA was first developed during the 1980s in India and Kenya, mainly
supported by NGOs operating at grass-roots level. Since then PRA has evolved rapidly in its
methodology, in the creation of new tools and especially in the different ways it is applied. It is a
research/planning methodology in which a local community (with or without the assistance of
outsiders) studies an issue that concerns the population, prioritizes problems, evaluates options for
solving the problem(s) and comes up with a Community Action Plan to address the concerns that have
been raised. PRA is particularly concerned that the multiple perspectives that exist in any
community are represented in the analysis and that the community itself takes the lead in evaluating
its situation and finding solutions. Outsiders may participate as facilitators or in providing technical
information but they should not ‘take charge’ of the process.
In PRA, a number of different tools are used to gather and analyze information. These tools
encourage participation, make it easier for people to express their views and help to organize
information in a way that makes it more useful and more accessible to the group that is trying to
analyze a given situation. PRA, also called ‘Participatory Learning for Action (PLA)’, is a
methodological approach that is used to enable farmers to analyze their own situation and to develop
a common perspective on natural resource management and agriculture at village level.
Key Tenets / Principles of PRA
 Participation: Local people’s input into PRA activities is essential to its value as a research and
planning method and as a means for diffusing the participatory approach to development.
 Teamwork: To the extent that the validity of PRA data relies on informal interaction and
brainstorming among those involved, it is best done by a team that includes local people with
perspective and knowledge of the area’s conditions, traditions, and social structure and either
nationals or expatriates with a complementary mix of disciplinary backgrounds and experience. A
well-balanced team will represent the diversity of socioeconomic, cultural, gender, and
generational perspectives.
 Flexibility: PRA does not provide blueprints for its practitioners. The combination of techniques
that is appropriate in a particular development context will be determined by such variables as
the size and skill mix of the PRA team, the time and resources available, and the topic and
location of the work.
 Optimal Ignorance: To be efficient in terms of both time and money, PRA work intends to gather
just enough information to make the necessary recommendations and decisions.
 Triangulation: PRA works with qualitative data. To ensure that information is valid and reliable,
PRA teams follow the rule of thumb that at least three sources must be consulted, or at least three
techniques used, to investigate the same topic.
Organizing PRA
A typical PRA activity involves a team of people working for two to three weeks on workshop
discussions, analyses, and fieldwork. Several organizational aspects should be considered –
 Logistical arrangements should consider nearby accommodations, arrangements for lunch for
fieldwork days, sufficient vehicles, portable computers, funds to purchase refreshments for
community meetings during the PRA, and supplies such as flip chart paper and markers.
 Training of team members may be required, particularly if the PRA has the second objective of
training in addition to data collection.
 PRA results are influenced by the length of time allowed to conduct the exercise, scheduling and
assignment of report writing, and critical analysis of all data, conclusions, and recommendations.
 A PRA covering relatively few topics in a small area (perhaps two to four communities) should
take between ten days and four weeks, but a PRA with a wider scope over a larger area can take
several months. Allow five days for an introductory workshop if training is involved.
Reports are best written immediately after the fieldwork period, based on notes from PRA team
members. A preliminary report should be available within a week or so of the fieldwork, and the final
report should be made available to all participants and the local institutions that were involved.
PRA Tools
PRA is an exercise in communication and transfer of knowledge. Regardless of whether it is carried
out as part of project identification or appraisal or as part of country economic and sector work,
the learning-by-doing and teamwork spirit of PRA requires transparent procedures. For that reason,
a series of open meetings (an initial open meeting, final meeting, and follow-up meeting) generally
frame the sequence of PRA activities. Common tools in PRA are –
Mapping: Making a community map is probably the best approach for you to get started, and for a
community to get started. Take a group on a walk through the community, and let them draw a map
of the area. Let the map include communal facilities, personal and family buildings, assets and
liabilities. Do not draw the map for them. One method is for individuals or small groups to each make
a separate map, then, as a group exercise later, all the small groups of individuals prepare a large
map (e.g. using newsprint or flip chart paper) combining and synthesizing what is included on all the
maps. Valuable information over and above that shown on scientifically produced maps can be
obtained from maps drawn by local people. These maps show the perspective of the drawer and
reveal much about local knowledge of resources, land use and settlement patterns, or household
characteristics. You can encourage community members to draw their map on the ground, using
sticks to draw lines. Drawing the map on the ground, like drawing a large map on the wall, gives you
and the participants a chance to easily make the drawing process a group process.
Models: If the community members add sticks and stones to a map scratched onto the ground, they
are making a simple model - a three dimensional map. Do not draw the map or construct the model
for the participants; encourage them to all contribute. As you watch them, note if some facilities
are made before others, if some are larger in proportion than others. This will give you some insight
into what issues may be more important than others to the participants. Make notes; these will
contribute to your sociological understanding of the community. Make a copy on paper of the map or
model as a permanent record. Maps and models can later lead to transect walks, in which greater
detail is recorded.
Creating a Community Inventory: The inventory, and especially the process of making it, is the most
important and central element of participatory appraisal. The process of making the community
inventory is sometimes called semi structured interviewing. If it were perfectly unstructured, then
it would be a loose conversation that goes nowhere. A ‘Brainstorm’ session, in contrast, is highly
structured (The brainstorm has its uses, especially in the project design phase of community
empowering). Making the inventory is somewhere in between these two. You also allow the discussion
to be a little bit free, especially in allowing participants to analyze their contributions to making the
inventory. You do not work with a set of specific questions, but you might best prepare a check list
of topics to cover and work from that so that you cover all topics. When you prepare your check
list, remember that you should include both assets and liabilities in the community. Include available
facilities, including how well they are working, or not working. Include potentials and opportunities as
well as threats and hindrances, both possible and current. Remember that this is an assessment. Aim
for an inventory that assesses the strengths and weaknesses of the community. Your job is not to
create the inventory, but to guide the community members to construct it as a group.
Focus Group Discussions: There may be a range of experiences and opinions among members of the
community or there may be sensitivity in divulging information to outsiders or to others within the
community. This is where a focus group discussion can be useful. It is best here if you do not work
alone, but as a facilitation team of two or three facilitators, one leading the discussion and another
making a record. The discussion topics chosen should be fewer than for the general community
inventory. First conduct separate sessions for the different interest groups, record their
contributions carefully, and then bring them together to share as groups their special concerns. It
is important to be careful here. While you recognize the different interest groups in the community,
you do not want to increase the differences between the groups - to widen the schism. You are not
trying to make all the different groups the same as each other, but to increase the tolerance,
understanding and co-operation between them. Special focus groups give you the opportunity to
work separately with different groups that may find it difficult at first to work together; but you
must work towards bringing them together.
Preference Ranking: When you are working with a community with different interest groups, you may
wish to list preference rankings of the different groups, and then review them with all the
groups together. Preference ranking is a good ice-breaker at the beginning of a group interview, and
helps focus the discussion.
Wealth Ranking: This is a particularly useful method of (1) discovering how the community members
define poverty, (2) finding out who the really poor people are, and (3) stratifying samples by wealth.
This is best done once you have built up some rapport with the community members. A good method here
is to make a card for each household in the community with its name on it. Select some members
of the community. Ask them to put these cards into groups according to various measures of wealth
and to give their rationale (reasons) for the groupings. How they categorize members of the
community, and the reasons they give for making those categories and for putting different
households into each category, are very revealing about the socio-economic makeup of the
community.
Seasonal and Historical Diagramming: Seasonal and historical variations and trends can be easy to
miss during a short visit to the field. Various diagramming techniques can help you
explore changes in rainfall, labor demand, farming (fishing, hunting, herding) activities, wood supply
for fuel, disease incidence, migration for employment, food stocks and many other elements that
change over time. The diagrams you produce can be used as a basis for discussing the reasons
behind changes and their implications for the people involved.
Institutional Mapping: Information about the social organization of the community and the nature of
social groups is difficult to get in a short visit. Complex relationships between rich and poor
segments of the community, family ties and feuds, and political groups cannot be untangled in a few
weeks. Using participatory appraisal methods can be useful here. One way to understand the less
sensitive aspects of social interaction in a community is to ask key informants to construct a ‘Venn
diagram’. This technique is simply a collection of circles, each of which represents a different group
or organization active in the community. The size of each circle reflects the relative importance of
the group represented: the smaller the circle, the less influential the group. The amount of overlap
between two circles represents the amount of collaboration or joint decision making between two
groups.
Participatory Mapping: Create a wall or ground map with group participation. Members should do the
marking, drawing and coloring with a minimum of interference and instruction by outsiders. Using
pencils, pens or local materials (e.g. small rocks, different colored sands or powders, plant material)
members should draw maps that depict/illustrate certain things. Each group member is then asked
‘to hold the stick’ to explain the map or to criticize it or revise it. Create resource maps showing the
location of houses, resources, infrastructure and terrain features, which are useful for analyzing certain
community-level problems. Create social maps, showing who is related to whom and where they live.
Seasonal Calendars: These charts show monthly changes in climate (rainfall or temperature) or
agricultural activities (agricultural hours worked, different activities undertaken, crop cycles). The
calendars are useful in identifying planting and harvesting times, labor constraints and marketing
opportunities.
Matrices: These are grid formats used to illustrate links between different activities or factors.
They are useful in information gathering and analysis.
Important Techniques of Participatory Rural Appraisal (PRA)
Village Transect: A transect is constructed with the help of local inhabitants by walking through the village. The major
objective of a transect is to identify the types of land use and the opportunities for and constraints on agricultural or rural
development. The application of a transect is to identify and explain the cause and effect relationships between
topography, soils, natural vegetation, cultivation and other production activities and human settlement patterns.
Procedure - Draw an outline map of the village. Ask villagers to select one or more routes which cover the main variations in
topography. Ask two or more people to accompany you to the edge of the village. Stop when you arrive at the edge of a new
topography zone; record the characteristics and distance covered by the last zone. When the transect is completed, prepare
a chart summarizing the major features encountered. When more than one transect has been completed, prepare a
combined chart, compare results and generate questions and hypotheses for later enquiries.
Social and Physical Maps: The social and resource map is used to show the relative location of different households,
resource points, roads, canals, crop fields, residential areas, markets, educational institutions, co-operative societies, etc.
The villagers are asked to draw a social map of the village usually on the ground using a pointed stick. A social map drawn by
villagers should encourage maximum participation and interaction of the villagers.
Procedure - Select a suitable space. Mark paths and other landmarks from the residential part of the village on the ground.
Sub-divide the village into para or other units to enable the available informants to provide accurate information. Ask the
informant to identify the position of each household, and write the name on a strip of paper, which can then be placed on
the map. Use appropriate symbols and materials to build on any further information, which may be required about assets,
group membership, etc. Start recording on a separate sheet of paper as soon as the locations of the households have been
identified.
Seasonality Exercise: This exercise identifies the times of year at which people suffer from particular hardships such as
unemployment, diseases, rainfall, drought and other allied aspects of rural life, so that appropriate safety nets or other
remedial action can be put in place.
Procedure - List all the months of the year, using either the Bangla or the English calendar. Lay out the matrix on the ground
with the months along one axis and the items of a particular phenomenon along the other axis. To capture the degree of
differentiation perceived by the villagers, use sticks, seeds and other locally available materials. Count the number of seeds
or sticks by row and column. Consider this number as the score of the respective item. Assign ranks according to score.
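The row and column scoring in this procedure can be sketched in Python. This is only an illustration: the item names, the English month labels and every seed count in `seed_counts` are hypothetical, and the tie rule (tied scores share a rank) is an assumption, because the procedure does not say how ties are handled.

```python
# Illustrative sketch of scoring a seasonality matrix.
# All counts below are hypothetical; they are not field data.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# One row per item; each entry is the number of seeds placed for that month.
seed_counts = {
    "unemployment": [3, 2, 1, 0, 0, 1, 4, 5, 3, 1, 0, 1],
    "disease":      [0, 0, 1, 2, 3, 5, 4, 2, 1, 0, 0, 0],
    "drought":      [1, 2, 4, 5, 3, 0, 0, 0, 0, 0, 1, 1],
}


def row_scores(matrix):
    """Total seeds per item (row totals)."""
    return {item: sum(counts) for item, counts in matrix.items()}


def column_scores(matrix, labels):
    """Total seeds per month (column totals)."""
    columns = zip(*matrix.values())
    return {label: sum(column) for label, column in zip(labels, columns)}


def assign_ranks(scores):
    """Rank 1 is the highest score. Assumption: tied scores share a rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(score) + 1 for name, score in scores.items()}


item_scores = row_scores(seed_counts)
month_scores = column_scores(seed_counts, MONTHS)
item_ranks = assign_ranks(item_scores)
month_ranks = assign_ranks(month_scores)

print(item_scores)
print(item_ranks)
print(month_ranks)
```

Row totals show which hardship the villagers weight most heavily over the year, while column totals show which months carry the heaviest combined burden.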
Chapati or Venn Diagrams: These are used to identify the institutions in a community, to show how the various external
institutions are involved in the delivery of services, and to show how they relate to each other.
Procedure - Cut a large circle of paper to represent the major institutions with which you are concerned (Village or Para).
Cut or draw oval shapes to represent outside institutions with linkages in the village and place these overlapping with the
outer edges of the circle (size can be used to indicate relative importance). Cut or draw further circles of appropriate sizes
to represent institutions wholly contained within the village. Relate these to each other through overlaps where these exist,
through incorporation where one institution lies entirely within another and through separate location where there is no
overlap. Check that the basic diagram is correct before reproducing a clean version on another sheet of paper.
Wealth Ranking: Means of dividing households into different economic categories. This can be used to identify target group
members before an activity is launched or to determine the extent to which targeting has proved successful after the
event.
Procedure - List each household name on a card together with other information. Identify the criteria which the participants
use in distinguishing between the better and less well off households. Keeping the criteria in mind, ask the participants to
place the cards in a small number of piles. Record the category of each household at the bottom of its card.
Finally, count the number of households in each pile and record the totals.
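The final counting step can be sketched in Python as below. The household names, the three pile labels and the pile assignments are all hypothetical; in practice the participants decide both the number of piles and their meaning.

```python
from collections import Counter

# Hypothetical result of the card sort: household name -> pile chosen by participants.
card_piles = {
    "Household A": "better off",
    "Household B": "middle",
    "Household C": "less well off",
    "Household D": "less well off",
    "Household E": "middle",
    "Household F": "less well off",
}


def pile_counts(piles):
    """Number of households recorded in each pile."""
    return Counter(piles.values())


def households_in(piles, category):
    """Alphabetical list of households placed in one pile."""
    return sorted(name for name, pile in piles.items() if pile == category)


counts = pile_counts(card_piles)
target_group = households_in(card_piles, "less well off")

print(dict(counts))
print(target_group)
```

Listing the members of one pile is one possible way to identify target group members before an activity is launched.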
Preference Ranking: Ranking means placing something in sequential order. Preference ranking is a tool that helps us
prioritize problems.
Procedure - Organize one focus group representing relevant stakeholders. Make a list of all the problems to be prioritized.
Identify the criteria on which the problems are to be prioritized. Criteria can be identified by comparing the problems
pairwise. Define all of the criteria positively (for example ‘tastes good’ rather than ‘does not taste bad’, or ‘easy to cook’ rather than ‘not hard to cook’),
then select a suitable symbol for each one. Decide whether you will ask the informant to rank items on a simple yes/no
basis, or whether you want to assign scores (say from one to three). Lay out the matrix on the ground with the problems
along one axis and the criteria along the other. Ask the informant to rank or score each item against each criterion, using
seeds or available material. This can be done on a scale of 1-3 or by allocating a fixed number of seeds for each criterion.
When the exercise is completed verify the results with the participants. Put the most favored items at the top; the least
favored at the bottom, the most powerful criteria on the left, and the weakest on the right.
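The layout step at the end of this procedure can be sketched in Python. The problems, criteria and 1-3 scores below are hypothetical. The text does not define how the ‘most powerful’ criterion is determined, so this sketch assumes, purely for illustration, that a criterion's column total stands in for its strength; a PRA team may well prefer to ask the participants directly.

```python
# Hypothetical preference matrix: problem -> {criterion: score on a 1-3 scale}.
scores = {
    "poor road access": {
        "affects many households": 2, "can be solved locally": 1, "is urgent": 1},
    "unsafe drinking water": {
        "affects many households": 3, "can be solved locally": 2, "is urgent": 3},
    "crop pests": {
        "affects many households": 2, "can be solved locally": 3, "is urgent": 1},
}


def arrange_matrix(matrix):
    """Order items top to bottom and criteria left to right by total score.

    Assumption: column totals are used as a stand-in for criterion strength.
    """
    item_totals = {item: sum(row.values()) for item, row in matrix.items()}
    criterion_totals = {}
    for row in matrix.values():
        for criterion, value in row.items():
            criterion_totals[criterion] = criterion_totals.get(criterion, 0) + value

    items = sorted(item_totals, key=item_totals.get, reverse=True)
    criteria = sorted(criterion_totals, key=criterion_totals.get, reverse=True)
    grid = [[matrix[item][criterion] for criterion in criteria] for item in items]
    return items, criteria, grid


items, criteria, grid = arrange_matrix(scores)

print(criteria)
for item, row in zip(items, grid):
    print(item, row)
```

Whatever ordering rule is used, the arranged matrix should still be verified with the participants, as the procedure requires.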

Sequence of Techniques
PRA techniques can be combined in a number of different ways, depending on the topic under
investigation. Some general rules of thumb, however, are useful. Mapping and modeling are good
techniques to start with because they involve several people, stimulate much discussion and
enthusiasm, provide the PRA team with an overview of the area, and deal with noncontroversial
information. Maps and models may lead to transect walks, perhaps accompanied by some of the
people who have constructed the map. Wealth ranking is best done later in a PRA, once a degree of
rapport has been established, given the relative sensitivity of this information. Preference ranking
is a good icebreaker at the beginning of a group interview and helps focus the discussion. Later,
individual interviews can follow up on the different preferences among the group members and the
reasons for these differences.
Seven major techniques used in PRA
1. Secondary data reviews - books, files, reports, news, articles, maps, etc.
2. Observation - direct and participant observation, wandering, DIY (do-it-yourself) activities.
3. Semi-structured interviews - this is an informal, guided interview session, where only some of
the questions are pre-determined and new questions arise during the interview, in response to
answers from those interviewed.
4. Analytical game - this is a quick game to find out a group’s list of priorities, performances,
ranking, scoring, or stratification.
5. Stories and portraits - colorful description of the local situation, local history, trend analysis,
etc.
6. Diagrams - maps, aerial photos, transects, seasonal calendars, Venn diagram, flow diagram,
historical profiles, ethno-history, timelines, etc.
7. Workshop - local people and outsiders are brought together to discuss the information and ideas
intensively.
Modified PRA Tools: Resource Map; Social Map; Wealth Ranking Objectives; Local Perceptions of Malnutrition Mapping
Objectives; Venn Diagram on Institutions; Resource Cards; Seasonal Calendar; Income and Expenditure Matrix; Daily
Activity Clocks; Focus Group Discussion; Semi Structured Interview; Community Workshop; Daily Evaluation and Planning
Meeting.
Resource Map: It is a tool that helps us to learn about a community and its resource base. The primary concern is not to
develop an accurate map but to get useful information about local perceptions of resources. The participants should develop
the content of the map according to what is important to them. The objective is to learn the villagers’ perceptions of what
natural resources are found in the community and how they are used.
Social Map: It is a map that is drawn by the residents and which shows the social structures and institutions found in an
area. It also helps us to learn about social and economic differences between the households. The objectives are – to learn
about the social structures and the differences among the households by ethnicity, religion and wealth; to learn about who
is living where; to learn about the social institutions and the different views local people might have regarding those
institutions.
Wealth Ranking Objectives: To investigate perceptions of wealth differences and inequalities in a community; to identify
and understand local indicators and criteria of wealth and well-being; to map the relative position of households in a
community. Ranking and mapping methods are used. Carry out the exercise with a few key informants who know the
community well.
Local Perceptions of Malnutrition Mapping Objectives: To identify various forms of malnutrition prevalent in the community;
to understand the local perceptions of malnutrition; to map nutritionally vulnerable households. Ranking, mapping and matrix
methods are used. Carry out the interview with one or more key informants (Community Health Worker; Traditional Birth
Attendant; Home Agent; Traditional Healer; Teacher etc.).
Venn Diagram on Institutions: It shows institutions, organizations, groups and important individuals found in the village
(Kushet), as well as the villagers’ view of their importance in the community. Additionally, the diagram explains who
participates in these groups in terms of gender and wealth. The Institutional Relationship Diagram also indicates how close
the contact and cooperation between those organizations and groups is. The objectives are – to identify external and
internal organizations/groups/important persons active in the community; to identify who participates in local
organizations/ institutions by gender and wealth; to find out how the different organizations and groups relate to each
other in terms of contact, co-operation, flow of information and provision of services.
Resource Cards: Resource picture cards are useful for facilitating a discussion about who uses and controls resources in a
fun and non-threatening way. They show very clearly the resource base of both men and women. This can lead to discussions
about differences between men’s and women’s priorities and their need for resources. The objective is to learn about
differences between men and women in use and control over resources.
Seasonal Calendar: A seasonal calendar is a participatory tool to explore seasonal changes (e.g. gender-specific workload,
diseases, income, expenditure etc.). The objective is to learn about changes in livelihoods over the year and to show the
seasonality of agricultural and non-agricultural workload, food availability, human diseases, gender-specific income and
expenditure, water, forage, credit and holidays.
Income and Expenditure Matrix: It is a tool that helps us to identify and quantify the relative importance of different
sources of income and expenditures. The tool also helps us to understand how secure or how vulnerable the incomes of certain
groups of people are. In the Expenditures matrix, we can see if all, most or only some of people's total income is spent to
meet basic needs - food, water, clothing, shelter, health care, education. We can also ask whether people have any money
left over to save or to invest in tools, fertilizer, or other important items that could help them in their work. The objective
is to learn about sources of income (cash and kind) and how income is proportionally spent by gender and wealth.
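The proportional reading of an expenditure matrix can be sketched in Python as below. The two groups, the expenditure items and the amounts are hypothetical; the amounts are assumed to be counters (for example seeds) placed by participants, so only the proportions matter. The set of basic needs follows the list given in the text.

```python
# Basic needs as listed in the text.
BASIC_NEEDS = {"food", "water", "clothing", "shelter", "health care", "education"}

# Hypothetical expenditure matrix: group -> {item: counters placed on that item}.
expenditure = {
    "poorer households": {
        "food": 30, "water": 5, "health care": 5, "education": 5, "tools": 5},
    "better-off households": {
        "food": 20, "shelter": 5, "education": 5, "tools": 10, "savings": 10},
}


def proportions(row):
    """Share of total expenditure going to each item."""
    total = sum(row.values())
    if total <= 0:
        raise ValueError("expenditure row must have a positive total")
    return {item: amount / total for item, amount in row.items()}


def basic_needs_share(row):
    """Share of total expenditure spent on basic needs."""
    shares = proportions(row)
    return sum(share for item, share in shares.items() if item in BASIC_NEEDS)


basic_share = {group: basic_needs_share(row) for group, row in expenditure.items()}

for group, share in basic_share.items():
    print(f"{group}: {share:.0%} on basic needs, {1 - share:.0%} left over")
```

A group that spends nearly all of its income on basic needs has little left to save or invest, which is one way of reading vulnerability from the matrix.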
Daily Activity Clocks: Daily activity clocks illustrate all of the different kinds of activities carried out in one day. They are
particularly useful for looking at relative work-loads between different groups in the community. Comparisons between
clocks show who works the longest hours, who concentrates on a few activities and who does a number of tasks in a day, and
who has the most leisure time and sleep. The objective is to learn what different people do during one day and how heavy
their workloads are.
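The comparison between clocks can be sketched in Python. The two groups, the activities and the hours are hypothetical, and treating only sleep and leisure as rest (with everything else counted as work) is an assumption made for this illustration.

```python
# Assumption: only these activities count as rest; all others count as work.
REST_ACTIVITIES = {"sleep", "leisure"}

# Hypothetical daily activity clocks: group -> {activity: hours in one day}.
clocks = {
    "women": {"sleep": 7, "leisure": 1, "farm work": 6, "cooking": 4,
              "fetching water": 2, "child care": 4},
    "men": {"sleep": 8, "leisure": 5, "farm work": 9, "tending livestock": 2},
}


def summarize(clock):
    """Work hours, rest hours and number of distinct work tasks in one clock."""
    if sum(clock.values()) != 24:
        raise ValueError("a daily activity clock must add up to 24 hours")
    rest_hours = sum(hours for activity, hours in clock.items()
                     if activity in REST_ACTIVITIES)
    tasks = [activity for activity in clock if activity not in REST_ACTIVITIES]
    return {"work_hours": 24 - rest_hours,
            "rest_hours": rest_hours,
            "task_count": len(tasks)}


summaries = {group: summarize(clock) for group, clock in clocks.items()}
longest_hours = max(summaries, key=lambda group: summaries[group]["work_hours"])
most_rest = max(summaries, key=lambda group: summaries[group]["rest_hours"])

print(summaries)
print("longest working hours:", longest_hours)
print("most leisure and sleep:", most_rest)
```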
Focus Group Discussion: Semi-structured group interview, ranking and matrix methods are used. The objectives are –
understand local perceptions of nutrition and household food security; identify and understand constraints in the household
and community to achieving nutrition and household food security; identify and understand mechanisms in the household and
the community to cope with nutrition and household food insecurity; identify what community, household and individual
resources are required to obtain nutrition and household food security.
Semi Structured Interview: Semi-structured group interview, ranking and observation methods are used. The objectives
are – understand why members of a household (that was mapped as being affected by malnutrition) have nutrition-related
health problems and why other households are not affected; identify constraints and opportunities in the household and
community for household members to achieve nutrition security.
Community Workshop: ‘Group Discussion’ and ‘Presentation’ are used as methods. The objectives are – to present the main
findings and conclusions of the appraisal to the community at large; to provide an opportunity to the community for
discussion of the main findings of the appraisal; to reach a consensus on the way forward and the roles and responsibilities
of the community, the community support staff and the project. Organize a meeting with the community at large, ensuring
that men and women are equally represented, as well as people from different socio-economic groups and ages.
Daily Evaluation and Planning Meeting: Every afternoon the PRA team comes together to reflect on the process of the day, to
present the results gathered, to evaluate the results and to plan for the next day. The objectives are – to present the
results of the day; to summarize and structure the results according to the key questions and according to related
‘Strengths and Weaknesses’ inside the community and according to ‘Opportunities and Threats’ identified outside the
community; to compare the results of the different groups and to identify differences and correspondences; to enable the
PRA team to elaborate new relevant key questions and a program for the next day.

Basic Guidelines for Research SMS Kabir


Chapter - 9 Methods of Data Collection Page 233

Use of PRA
PRA supports the direct participation of communities, with rural people themselves becoming the
main investigators and analysts. Rural people set the priorities; determine needs; select and train
community workers; collect, document, and analyze data; and plan and implement solutions based on
their findings. Actions stemming from this research tend to serve the local community. Outsiders
are there to facilitate the process but do not direct it. PRA uses group animation and exercises to
facilitate information sharing, analysis, and action among stakeholders. PRA is an exercise in
communication and transfer of knowledge. Regardless of whether it is carried out as part of project
identification or appraisal or as part of country economic and sector work, the learning-by-doing and
teamwork spirit of PRA requires transparent procedures. For that reason, a series of open meetings
(an initial open meeting, final meeting, and follow-up meeting) generally frame the sequence of PRA
activities. A typical PRA activity involves a team of people working for two to three weeks on
workshop discussions, analyses, and fieldwork.
Scope of PRA
PRA is used –
 To ascertain needs;
 To establish priorities for development activities;
 Within the scope of feasibility studies;
 During the implementation phase of projects;
 Within the scope of monitoring and evaluation of projects;
 For studies of specific topics;
 For focusing formal surveys on essential aspects, and identifying conflicting group interests.

Areas of Application
 Natural resource management
 Agriculture
 Poverty alleviation/women in development programs
 Health and nutrition
 Preliminary and primary education
 Village and district-level planning
 Institutional and policy analysis.

Advantages of PRA
 Identification of genuine priorities for target group. PRA allows local people to present their
own priorities for development and get them incorporated into development plans.
 Devolution of management responsibilities. An important goal of PRA is to encourage self-reliant
development, with as much as possible of the responsibility for the management and implementation of
development activities devolved to local people themselves. This can greatly improve the
efficiency of development work and eliminate many of the problems regarding proprietorship of
development activities at the community level.
 Motivation and mobilization of local development workers. Participation in PRA by local
development workers, whether from NGOs, government or other agencies can greatly increase
the motivation and level of mobilization in support of the project or program of which it is part.
Where changes in development approaches are being introduced, such as a shift to a more
integrated development planning mechanism, a PRA-type activity which illustrates how these new
mechanisms will work on the ground can help to ensure better understanding and commitment by
local workers. This is one reason why involvement of people from different administrative and
organizational levels can be vital so that commitment is built up right through the chain.
 Forming better linkages between communities and development institutions. PRA can assist in
forming better links between communities and the agencies and institutions concerned with rural
development. A PRA which encourages a better understanding of the environmental issues at
stake in local communities and develops activities which enable them to benefit from better
management could also lead to better monitoring of mangrove exploitation by the communities
themselves. PRAs involve intensive interaction between communities and outsiders which can
have lasting effects in breaking down the barriers of reticence and suspicion which often
characterize these relationships.
 Use of local resources. Where local people have had more say in the design of projects they are
also more likely to design activities which make full use of existing resources.
 Mobilization of community resources. Greater commitment from the community can also mean
greater mobilization of community resources for development and less reliance on outside inputs.
This can take the form of labor inputs, savings or time devoted to management functions.
 More sustainable development activities. This combination of effects will generally lead to more
sustainable development activities which are less reliant on support from outside agencies and are
technically, environmentally and socially appropriate to local conditions.
These benefits from participation can only be realized where the full implications of participation
for the development agencies which are encouraging it have been taken into account and
accommodated and the institutions involved are willing to support the sort of long-term changes in
social, political and institutional frameworks which proper participation, and PRA, can set in motion.
Where this is not the case, many of the following disadvantages can come into play.
Weaknesses of PRA
 The term PRA itself can cause difficulties. PRA need not be rural, and sometimes is not even
participatory, and is frequently used as a trendy label for standard RRA techniques.
 Raising expectations which cannot be realized. One of the most immediate and frequently
encountered risks in PRA is that it raises a complex set of expectations in communities which
frequently cannot be realized given the institutional or political context of the area. This can be
due to the political situation, the local power and social structure or simply bureaucratic
inertia in institutions which are supposed to be supporting development. In some cases the
intended aim of the PRA may be to deliberately raise expectations ‘at the grassroots’ so as to
put pressure on the institutional and political structures above to change. However, not all
development agencies are in a position to support such activities and there is a risk that agencies
which are not properly equipped to respond to PRA-type planning may use the approach
inappropriately.
 Hijacking. If PRA becomes part of the global development agenda, there is a risk of hijacking.
When this occurs, the PRA agenda is externally driven, and used to create legitimacy for
projects, agencies and NGOs.
 Disappointment. Local expectations can easily be raised. If nothing tangible emerges, local
communities may come to see the process as a transient external development phenomenon. Lack
of feedback to the community adds to the sense of disappointment.
 Failure to take account of stratification in communities. The fact that PRA is often carried out
with the community as a whole can mean that stratification within the community, whether by
wealth, social status, gender or ethnic group, can often be obscured and ignored.

 Threats. The empowerment implications of PRA, and the power of its social analysis, can create
threats to local vested interests, although less so than with PAR (Participatory Action
Research).

9.4.5 RAPID RURAL APPRAISAL/ ASSESSMENT (RRA)


Rapid Rural Appraisal (RRA) emerged in the late 1970s in response to some of the problems with
large-scale, structured questionnaire surveys. It provided an alternative technique for outsiders –
often scientists carrying out research into agriculture – to quickly learn from local people about
their realities and challenges. RRA practitioners worked in multi-disciplinary teams and pioneered
the use of a suite of visual methods and semi-structured interviews to learn from respondents. While it
was largely about data collection, usually analyzed by outsiders, RRA contained the seeds from which
other primary methods grew in the 1980s. Reflections on RRA led to the development of
Participatory Rural Appraisal (PRA), which focused more strongly on facilitation, empowerment,
behavior change, local knowledge and sustainable action. It was developed in response to the
disadvantages of more traditional research methods, including - the time taken to produce results,
the high cost of formal surveys and the low levels of data reliability due to non-sampling errors.
RRA is a bridge between formal surveys and unstructured research methods such as depth
interviews, focus groups and observation studies. In developing countries, it is sometimes difficult
to apply the standard marketing research techniques employed elsewhere. There is often a paucity
of baseline data, poor facilities for marketing research (e.g. no sampling frames, relatively low
literacy among many populations of interest and few trained enumerators) as well as the lack of
appreciation of the need for marketing research. The nature of RRA is such that it holds the
promise of overcoming these and other limitations of marketing research.
Unfortunately, there is no generally accepted definition of RRA. RRA is more commonly described as
a systematic but semi-structured activity carried out in the field by a multidisciplinary team and designed
to obtain new information and to formulate new hypotheses about rural life. A central characteristic
of RRA is that its research teams are multidisciplinary. Beyond that, the distinction between RRA
and other research methodologies depends upon its multidisciplinary approach and the particular
combination of tools that it employs. A core concept of RRA is that research should be carried out
not by individuals, but by a team comprised of members drawn from a variety of appropriate
disciplines. Such teams are intended to be comprised of some members with relevant technical
backgrounds and others with social science skills, including marketing research skills. In this way, it
is thought that the varying perspectives of RRA research team members will provide a more
balanced picture. The techniques of RRA include – interview and question design techniques for
individual, household and key informant interviews; methods of cross-checking information from
different sources; sampling techniques that can be adapted to a particular objective; methods of
obtaining quantitative data in a short time frame; group interview techniques, including focus-group
interviewing; methods of direct observation at site level, and use of secondary data sources. RRA is
an approach for conducting action-oriented research in developing countries.

Many ‘definitions’ of RRA have been offered by different people who have worked on it, but there
are always others who object to those definitions because they are not what they think RRA is or
should be. The fact that it is difficult to give a precise definition to RRA is a reflection of the fact
that it is very flexible - it is a tool which can be used in a lot of different situations to achieve very
different objectives. Not surprisingly everybody seems to think RRA is what they have used it for.
So it is probably best to avoid ‘definitions’ and just describe the features which most RRAs seem to
have in common. RRA essentially consists of the following –
 an activity carried out by a group of people from different professional fields or disciplines
which usually aims to learn about a particular topic, area, situation, group of people or whatever
else is of concern to those organizing the RRA
 it usually involves collecting information by talking directly to people ‘on the ground’
 it uses a set of guidelines on how to approach the collection of information, learning from that
information and the involvement of local people in its interpretation and presentation
 it uses a set of tools - these consist of exercises and techniques for collecting information,
means of organizing that information so that it is easily understood by a wide range of people,
techniques for stimulating interaction with community members and methods for quickly
analyzing and reporting findings and suggesting appropriate action.
These features are just about the ‘bottom line’ with RRA but everything else is fairly flexible within
the guidelines described below.
RRA Guidelines
 Structured but flexible: RRA is a structured activity requiring careful planning, clear
objectives, the right balance of people involved and a good choice of tools and techniques for
use in the field. At the same time, it is flexible enough to respond to local conditions and
unexpected circumstances. Progress is reviewed constantly so that new information can be
understood and the focus of the RRA redirected.
 Integrated and interdisciplinary: RRA helps ‘outsiders’ to learn about rural conditions by looking
at them from many points of view. This means having people participating with a variety of
different technical and scientific skills and a balance of different institutional outlooks. This
requires an integrated development approach which cuts across institutional and disciplinary
boundaries.
 Awareness of bias: Researchers and development workers who are trying to understand rural
conditions can be biased by their urban attitudes, their own professional and personal priorities,
the type of transport they use, the language they speak. The people researchers talk to can be
biased as well by their limited experience, their customs and beliefs and their own interests and
those of their families. RRA seeks to avoid biases by being aware of them and by being
systematic in taking into account different points of view and different sets of interests.
 Accelerating the planning process: RRA tries to shorten the time it takes to get from knowing
nothing about an area or a situation to deciding what development interventions might be best
for that area by using key informants, careful observation and by exploiting the knowledge and
experience of local people. The information produced is analyzed ‘on the spot’ and presented in a
form which is more easily used by planners and which can be discussed and understood by local
people themselves.
 Interaction with and learning from local people: Whatever the purpose of the RRA it must
involve the people who are the intended ‘beneficiaries’ of any eventual development activities.
RRA should give them the opportunity to describe their lives and conditions. The people carrying
out an RRA must be prepared to listen to local people and learn from them. Participation by local
people can take many forms but any RRA will involve intense interaction between researchers,
planners, traditional and formal authorities and local people.
 Combination of different tools: The RRA approach uses a combination of communication and
learning tools. These tools help outsiders to observe conditions in a concise but systematic way.
They also allow local people to present their knowledge, concerns and priorities to outsiders. The
combination of different tools and techniques builds up a more complete picture where
different viewpoints can be compared and contrasted. The systematic cross-checking of
information collected in different ways by different people from different sources can
increase accuracy and comprehensiveness.
 Iterative: During an RRA, what has been learnt is constantly reviewed and analyzed in the field.
This is usually done in workshops carried out at regular intervals. This means the focus of the
RRA, the tools used and the people talked to can be adjusted constantly.
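The cross-checking described under ‘Combination of different tools’ can be sketched in a few lines of Python. This is only an illustration, not part of any RRA toolkit: the function name cross_check, the topics, the source names and the answers are all hypothetical. The idea is simply to list every topic on which the sources disagree, so that the team can return to it during the next review workshop.

```python
def cross_check(reports):
    """Return the topics on which the sources disagree.

    reports maps a topic to a dict of {source name: reported answer}.
    """
    conflicts = {}
    for topic, answers_by_source in reports.items():
        distinct_answers = set(answers_by_source.values())
        if len(distinct_answers) > 1:
            conflicts[topic] = answers_by_source
    return conflicts


# Hypothetical field notes (illustrative only): topic -> {source: answer}
reports = {
    "main water source": {
        "key informant": "borehole",
        "group interview": "borehole",
        "direct observation": "river",
    },
    "planting month": {
        "key informant": "June",
        "group interview": "June",
    },
}

conflicts = cross_check(reports)
for topic, answers in conflicts.items():
    print(f"Follow up on '{topic}': {answers}")
```

A disagreement flagged in this way is not necessarily an error; it may reflect the different viewpoints and interests that RRA sets out to capture.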
Obviously, these guidelines leave plenty of room for the people using RRA to decide exactly what
they want to do with it. For example, if the most important thing for the people organizing the RRA
is to collect information quickly, they might want to structure the activity more carefully so that
things move faster. If one of the principal concerns is to get local people involved as much as
possible, the structure of the RRA would probably have to be looser and more time allowed for
getting to know the people and putting them at ease.
RRA Teams
The composition of the team which carries out an RRA is extremely important in determining the
outcome of an RRA. Obviously, the composition of an RRA team depends very much on the objectives
of the RRA and the particular concerns which it is addressing.
 Gender Considerations: Gender bias is particularly important for RRA teams. For male
researchers, women in many rural communities are difficult to contact and talk to and may
remain almost invisible to anyone visiting the community for a short time. However, all aspects of
rural conditions studied by an RRA team will have gender dimensions which need to be taken into
consideration.
 Multidisciplinary: The composition of teams carrying out RRAs should be dictated by a careful
consideration of the objectives of the appraisal, the issues which are thought to be of
importance in the area and the need to have a balanced set of disciplinary, institutional and
gender viewpoints represented on the team. As a minimum requirement, there should be a
balance between specialists in the biological and physical sciences and specialists in the social
sciences. However, the need for different formal backgrounds should not be overemphasized.
The important point is to have people who can contribute different ways of looking at rural
conditions - so, when organizing an RRA, it might be possible for people to ‘cover’ different
disciplines at the same time if they have the relevant experience.
 Levels of Expertise: One of the risks of RRA is that it tends to rely on the knowledge,
experience and ‘sensitivity’ of team members to come to conclusions about rural conditions.
These conclusions cannot then be tested or checked against ‘hard data’. This means that a great
deal depends on the skills of team members. As a result, it has always been regarded as
important to have experienced and skilled people on RRA teams. Obviously this is preferable, but
RRA does not depend only on the skills and experience of its team members to overcome the
risks of coming to faulty conclusions due to lack of hard data. It is the combination of different
viewpoints and the systematic use of cross-checking during an RRA that counts perhaps more
than individual skills. The presence on the team of ‘authoritative’ experts, with a wide range and
depth of experience in their fields, can be an advantage as they bring new knowledge and
experience to bear on local problems. However, such ‘experts’ also have to be willing to listen and
learn from the activity. Frequently, those who are most qualified are also most likely to impose
their own biases and interpretations on the work of the team. Experts who are not willing to
learn something new during an appraisal can create more problems than they solve. In such
circumstances it can be better to have a less experienced specialist who is willing to learn
something new than a highly experienced expert who is sure that s/he knows everything already.
 RRA Experience: At least one member of the team should have experience in carrying out RRAs.
This person can act as trainer in RRA techniques and as facilitator, guiding the rest of the team
through the process of carrying out the RRA and making sure that the activity keeps on track.
 Mix of Institutions: The involvement of people from the institutions and agencies which will
implement RRA recommendations is important. It can ensure that the subsequent involvement of
different agencies is based on the same understanding of the local situation and a similar
interpretation of local needs and priorities. Where many agencies are involved a few key
personnel have to be selected either because of their skills or because they are likely to play a
leading role in the future. Team members from different agencies can also contribute a range of
perspectives to the RRA and improve the depth of understanding achieved. RRAs can provide an
opportunity for people from different levels of the hierarchy of development agencies and
institutions to work together. Involvement of such a range of people in an RRA can lead to a
better understanding both of the conditions of ‘target’ communities and of the different
priorities and problems of workers at different administrative and organizational levels, e.g.
regional planners and village extension workers.
 Language Ability: As many of the team as possible should be able to communicate directly with
local people in their normal language. Use of translators and interpreters is clumsy and risky.
Advantages of RRA
 The approach is responsive and flexible to new learning and conditions on the ground.
 Achieves a complex understanding of processes and dynamics and connections between different
disciplines, activities and sets of conditions.
 The analysis and interpretation of findings is carried out during the appraisal providing
opportunities for cross-checking.
Weaknesses of RRA
 The findings will not be statistically ‘sound’, even if RRA teams can use ‘quick and dirty’ sampling
methods to make sure that they cover a reasonable number of people or households in a
particular area.
 Risk that the information gathered by an RRA is not very ‘representative’ but is a collection of
‘particular cases’ which do not tell researchers very much about general conditions.
 RRA is very dependent on the skills of the people carrying it out and having the right
combination of experience and viewpoints on the team.

Some Principles that are shared by PRA and RRA


 Offsetting biases through different perspectives, methods and tools, sources of information,
people from different backgrounds and places, background of team members - spatial, person,
gender, age groups, interest groups, key informants, wealth groups, seasonal, professionals,
disciplines.
 Rapid and progressive learning - flexible, interactive.
 Be gender sensitive at all times.
 Reversal of roles - learning from, with and by local people, eliciting and using their symbols,
criteria, categories and indicators; and finding, understanding and appreciating local people’s
knowledge.
 Focused learning - not finding out more than is needed and not measuring when comparing is
enough. We are often trained to make absolute measurements and to give exact numbers, but
often relative proportions, trends, scores or ranking are all that is needed for decision making
and planning of activities.
 Seeking diversity and differences - people often have different perceptions of the same
situation.
 Attitude - in order to make the PRA or RRA workshops a success, it is most important to build a
positive relationship with local women and men. Outsiders must have an attitude of respect,
humility and patience, and a willingness to learn from the local people.
Potential Differences between RRA and PRA
RRA | PRA
Responding to needs of development workers and agencies | Responding to needs of communities and target groups
More emphasis on efficient use of time & achievement of objectives | More emphasis on flexibility to adapt to time frame of community
Communication and learning tools used to help outsiders analyze conditions and understand local people | Communication and learning tools used to help local people analyze their own conditions and communicate with outsiders
Focus of RRA decided by outsiders | Focus of PRA decided by communities
End product mainly used by development agencies and outsiders | End product mainly used by community
Enables development agencies and institutions to be more ‘participatory’ | Enables (empowers) communities to make demands on development agencies and institutions
Can be used purely for ‘research’ purposes without necessarily linking to subsequent action or intervention | Closely linked to action or intervention and requiring immediate availability of support for decisions and conclusions reached by communities as a result of the PRA

9.4.6 OBSERVATIONAL METHOD


Observation is a fundamental way of finding out about the world around us. As human beings, we are
very well equipped to pick up detailed information about our environment through our senses.
However, as a method of data collection for research purposes, observation is more than just
looking or listening. Research, simply defined, is ‘systematic enquiry made public’ (Stenhouse, 1975).
Firstly, in order to become systematic, observation must in some way be selective. We are
constantly bombarded by huge amounts of sensory information. Human beings are good at selectively
attending to what is perceived as most useful to us. Observation harnesses this ability; systematic
observation entails careful planning of what we want to observe. Secondly, in order to make
observation ‘public’, what we see or hear has to be recorded in some way to allow the information to
be analysed and interpreted. Observation is a systematic data collection approach. Researchers use
all of their senses to examine people in natural settings or naturally occurring situations.
Observation of a field setting involves -
 prolonged engagement in a setting or social situation;
 clearly expressed, self-conscious notations of how observing is done;
 methodical and tactical improvisation in order to develop a full understanding of the setting of
interest;
 imparting attention in ways that are in some sense ‘standardized’;
 recording one’s observations.
Use of Observational Method
There are a variety of reasons for collecting observational data. Some of these reasons include -
 When the nature of the research question to be answered is focused on answering a how- or
what-type question.
 When the topic is relatively unexplored and little is known to explain the behavior of people in a
particular setting.
 When understanding the meaning of a setting in a detailed way is valuable.
 When it is important to study a phenomenon in its natural setting.
 When self-report data (asking people what they do) is likely to be different from actual
behavior (what people actually do). One example of this is seen in the difference between
self-reported versus observed preventive service delivery in health care settings.
 When implementing an intervention in a natural setting, observation may be used in conjunction
with other quantitative data collection techniques. Observational data can help researchers
evaluate the fidelity of an intervention across settings and identify when 'stasis' has been
achieved.
Classification of Observational Method
Observational methods can be classified as follows –
Casual and Scientific Observation: An observation may be casual or scientific. Casual observation
means noticing the right thing at the right place and time by chance or luck, whereas scientific
observation involves the use of measurement tools. An important point to keep in mind is that
not all observations are scientific in nature.
Natural Observation: Natural observation involves observing behavior in its normal setting; in
this type of observation, no effort is made to change the behavior of the
observed. Natural observation can help to improve both the collection of information and the
conditions under which observations are made.

Subjective and Objective Observation: Every observation has two main components, the
subject and the object. The subject is the observer, whereas the object is the
activity or operation being observed. Subjective observation is the
observation of one’s own immediate experience, whereas observation in which the observer is
an entity apart from the thing being observed is referred to as objective observation.
Objective observation is also called retrospection.
Direct and Indirect Observation: In direct observation, the observer is physically present in
the situation and monitors what takes place. Indirect observation relies on
mechanical recording or recording by other means, such as photographic or electronic.
Direct observation is relatively more straightforward than indirect observation.
Participant and Non-Participant Observation: In participant observation, the observer takes part
in the operations of the group under study. The degree of participation is largely determined by
the nature of the study and by the type of situation and its demands. In non-participant
observation, the observer does not take part in the activities of the group, and
there is no relationship between the researcher and the group.
Undisguised participant observation is often used to understand the culture and behavior of groups
of individuals. Disguised participant observation is often used when researchers believe individuals
would change their behavior if they knew it was being recorded. Participant observation allows
researchers to observe behaviors and situations that are not usually open to scientific observation.
Participant observers may sometimes lose their objectivity or may unduly influence the individuals
whose behavior they are recording.
Structured and Unstructured Observation: Structured observation works according to a plan and
involves specific information of the units that are to be observed and also about the information
that is to be recorded. The operations that are to be observed and the various features that are to
be noted or recorded are decided well in advance. Such observations involve the use of special
instruments for data collection that are also structured in nature. Unstructured observation, by
contrast, is diametrically opposed to structured observation. In
such observation, the observer has the freedom to note down what s/he feels is correct and relevant to
the point of study; this approach is very suitable for exploratory
research.
Structured observations are set up to record behaviors that may be difficult to observe using
naturalistic observation. Clinical and developmental psychologists often use structured observations.
Problems in interpreting structured observations can occur when the same observation procedures
are not followed across observations or observers, or when important variables are not controlled.
Structured observation is more likely to be carried out by those operating from a ‘positivist’
perspective, or who at least believe it is possible to clearly define and quantify behaviors.
Unstructured observation is more likely to be carried out by those operating from an ‘interpretive’
or ‘critical’ perspective where the focus is on understanding the meanings participants, in the
contexts observed, attribute to events and actions. Positivist and critical researchers are likely to
be operating from a ‘realist’ perspective, namely that there is a ‘real world’ with ‘real impact’ on
people’s lives and this can best be studied by looking at social settings directly.
Controlled and Un-controlled Observation: Controlled observations are made under the influence of
external forces, and such observations rarely lead to improvement in

Basic Guidelines for Research SMS Kabir


Chapter - 9 Methods of Data Collection Page 242

the precision of the research results. However, these observations can be very effective when they
are coordinated with mechanical synchronizing devices, film recording, etc. Un-controlled
observations are made in the natural environment and, in contrast to controlled observations,
involve no influence or guidance from any external force.
Covert and Overt Observation: In covert observation the researcher pretends to be an
ordinary member of the group and observes in secret. This method can raise ethical problems of
deception and consent. In overt observation the researcher tells the group that s/he is conducting
research (i.e. they know they are being observed).
Type of Observational Method: Advantages and Disadvantages
Naturalistic Observation
Advantages:
 Particularly good for observing specific subjects.
 Provides ecologically valid recordings of natural behavior.
 Spontaneous behaviors are more likely to happen.
Disadvantages:
 Ethics: where research is undisclosed, consent will not be obtained; where consent is not
obtained, details may be used which infringe confidentiality.
Structured Observation
Advantages:
 Allows control of extraneous variables.
 Reliability of results can be tested by repeating the study.
 Provides a safe environment to study contentious concepts such as infant attachment.
Disadvantages:
 The implementation of controls may have an effect on behavior.
 Lack of ecological validity.
 Observer effect.
 Observer bias.
Unstructured Observation
Advantages:
 Gives a broad overview of a situation.
 Useful where situation/subject matter to be studied is unclear.
Disadvantages:
 Only really appropriate as a ‘first step’ to give an overview of a situation / concept / idea.
Participant Observation
Advantages:
 Gives an ‘insiders’ view.
 Behaviors are less prone to misinterpretation because researcher was a participant.
 Opportunity for researcher to become an ‘accepted’ part of the environment.
Disadvantages:
 Observer effect.
 Possible lack of objectivity on the part of the observer.
Non-Participant Observation
Advantages:
 Avoidance of observer effect.
Disadvantages:
 Observer is detached from situation so relies on their perception, which may be inaccurate.

Recording Behavior in Observational Method


The goals of observational research determine whether researchers seek a comprehensive
record of behavior or a description of only selected behaviors. How the results of a
study are ultimately summarized, analyzed, and reported depends on how behavioral observations
are initially recorded.
Fieldnotes: Participant observers may use multiple methods to gather data. One primary approach
involves writing fieldnotes. There are several guides for learning how to prepare fieldnotes -
 Researchers may be interested in creating or using a template to guide their
observations.
 Templates or observational coding sheets can be useful when data is collected by inexperienced
observers.
 Templates or observational coding sheets should only be developed after observation in the field
that is not inhibited by such a template.
 Theories and concepts can be driven by templates and result in focused data collection.
 Templates can deflect attention from unnamed categories, unimagined and unanticipated
activities that can be very important to understanding a phenomenon and a setting.

Qualitative Records of Behavior: Observation can provide rich qualitative data, sometimes described
as ‘thick description’ (Geertz, 1973), for example, where the relevant phenomena have been
carefully observed and detailed field notes have been recorded. Typically, the researcher would not
approach the observation with pre-determined categories or questions in mind. Because of this
openness, observation in qualitative research is often referred to as unstructured.
Quantitative Measures of Behavior: Researchers often obtain quantitative measures such as
frequency or duration of occurrence when they seek to describe specific behaviors or events.
Quantitative measures of behavior use one of the four levels of measurement scales: nominal,
ordinal, interval, and ratio. The term ‘systematic’ observation is usually associated with observation
undertaken from the perspective of quantitative research where the purpose is to provide reliable,
quantifiable data. This usually involves the use of some kind of formal, structured observation
instrument or schedule. The observation method being used will clearly identify the variables to be
observed, perhaps by means of some kind of behavioral checklist; who or what will be observed; how
the observation is to be conducted; and when and where the observations will take place.
Analysis of Observational Data
Data Reduction: Observational data are summarized through the process of data reduction.
Researchers quantify the data in narrative records by coding behaviors according to specified
criteria, for example, by categorizing behaviors. Data are summarized using descriptive measures
such as frequency counts, means, and standard deviations.
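As a minimal sketch of data reduction, the following Python code summarizes a short list of coded behaviors with frequency counts, a mean, and a standard deviation. The behavior codes and durations are hypothetical values chosen only for illustration; they do not come from any study.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical coded observations: (behavior code, duration in seconds).
observations = [
    ("talking", 12.0),
    ("writing", 30.0),
    ("talking", 8.0),
    ("off_task", 20.0),
    ("writing", 40.0),
    ("talking", 10.0),
]


def reduce_observations(records):
    """Summarize coded records as frequency counts and duration statistics."""
    counts = Counter(code for code, _ in records)
    durations = [seconds for _, seconds in records]
    return {
        "counts": dict(counts),
        "mean_duration": mean(durations),
        "sd_duration": stdev(durations),  # sample standard deviation (n - 1)
    }


summary = reduce_observations(observations)
print(summary["counts"])
print(summary["mean_duration"], round(summary["sd_duration"], 2))
```

The narrative record is reduced to one count per category plus two descriptive measures, which is the form in which such data are usually reported.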
Observer Reliability: Inter-observer reliability refers to the extent to which independent
observers agree in their observations. Inter-observer reliability is increased by providing clear
definitions about behaviors and events to be recorded, by training observers, and by providing
feedback about discrepancies. High inter-observer reliability increases researchers' confidence
that observations about behavior are accurate (valid). Inter-observer reliability is assessed by
calculating percentage of agreement or correlations, depending on how the behaviors were measured
and recorded.
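The two measures named above can be sketched as follows: percentage of agreement for categorical codes, and a Pearson correlation for numeric counts or ratings. The observer data are hypothetical, and the helper names `percent_agreement` and `pearson_r` are introduced here only for illustration.

```python
from math import sqrt


def percent_agreement(codes_a, codes_b):
    """Percentage of intervals on which two observers recorded the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both observers need the same, non-zero number of records")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def pearson_r(x, y):
    """Pearson correlation between two observers' numeric counts or ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical categorical codes for ten observation intervals.
observer_1 = ["on", "on", "off", "on", "off", "on", "on", "off", "on", "on"]
observer_2 = ["on", "on", "off", "off", "off", "on", "on", "off", "on", "off"]

# Hypothetical frequency counts recorded by each observer over five sessions.
counts_1 = [4, 7, 2, 9, 5]
counts_2 = [5, 6, 2, 8, 5]

agreement = percent_agreement(observer_1, observer_2)
r = pearson_r(counts_1, counts_2)
print(agreement)  # 80.0
print(round(r, 3))
```

Percentage of agreement suits nominal codes, while a correlation is more natural when behaviors were measured on an interval or ratio scale.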
Influence of the Observer: If individuals change their behavior when they know they are being
observed (reactivity), their behavior may no longer be representative of their normal behavior.
Research participants may respond to demand characteristics in the research situation to guide
their behavior. Methods to control reactivity include unobtrusive (non-reactive) measurement,
adaptation (habituation, desensitization), and indirect observations of behavior. Researchers must
consider ethical issues when attempting to control reactivity.
Observer Bias: Observer bias occurs when observers’ biases determine which behaviors they choose
to observe and when observers’ expectations about behavior lead to systematic errors in identifying
and recording behavior. Expectancy effects can occur when observers are aware of hypotheses for
the outcome of the study or the outcome of previous studies. The first step in controlling observer
bias is to recognize that it may be present. Observer bias may be reduced by keeping observers
unaware (blind) of the goals and hypotheses of the study.

Advantages and Disadvantages of Observational Method


What and how you observe depends very much on your subject of study. Researchers who prefer
more security from the beginning might consider systematic observation. This involves using an
observation schedule whereby teacher and/or pupil behavior is coded according to certain
predetermined categories at regular intervals. The strengths of systematic observation are –
 It is relatively free of observer bias. It can establish frequencies, and is strong on objective
measures which involve low inference on the part of the observer.
 Reliability can be strong. Where teams of researchers have used this approach, 80% reliability
has been established among them.
 Generalisability. Once you have devised your instrument, large samples can be covered.
 It is precise. There is no ‘hanging around’ or ‘muddling through’.
 It provides a structure for the research.
The weaknesses are –
 There is a measure of unreliability. Qualitative material might be misrepresented through the
use of measurement techniques.
 Much of the interaction is missed.
 It usually ignores the temporal and spatial context in which the data is collected.
 It is not good for generating fresh insights.
 The pre-specification of categories predetermines what is to be discovered and allows only
partial description.
 It ignores process, flux, development, and change.
There has been lively debate about the pros and cons of systematic and unsystematic observation.
In general, systematic observation is a useful technique and can be particularly strong where used in
conjunction with more purely qualitative techniques.

9.4.7 SURVEY METHOD


Survey research is often used to assess thoughts, opinions, and feelings. Survey research can be
specific and limited, or it can have more global, widespread goals. Today, survey research is used by
a variety of different groups. Psychologists and sociologists often use survey research to analyze
behavior, while it is also used to meet the more pragmatic needs of groups such as the media,
political candidates, public health officials, professional organizations, and advertising and marketing
directors. A survey consists of a predetermined set of questions that is given to a sample. With a
representative sample, that is, one that is representative of the larger population of interest, one
can describe the attitudes of the population from which the sample was drawn. Further, one can
compare the attitudes of different populations as well as look for changes in attitudes over time. A
good sample selection is key as it allows one to generalize the findings from the sample to the
population, which is the whole purpose of survey research.
Surveys provide a means of measuring a population’s characteristics, self-reported and observed
behavior, awareness of programs, attitudes or opinions, and needs. Repeating surveys at regular
intervals can assist in the measurement of changes over time. These types of information are
invaluable in planning and evaluating government policies and programs. Unlike a census, where all
members of a population are studied, sample surveys gather information from only a portion of a
population of interest. The size of the sample depends on the purpose of the study. In a statistically
valid survey, the sample is objectively chosen so that each member of the population will have a
known non-zero chance of selection. Only then can the results be reliably projected from the sample
to the population. The sample should not be selected haphazardly or only from those who volunteer
to participate.
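A minimal sketch of objective selection is simple random sampling without replacement, in which every unit on the list has the same known chance of selection, n/N. The frame of unit identifiers below is hypothetical, and simple random sampling is only one of several designs that satisfy the known non-zero chance requirement.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit has selection probability n/N."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    sample = rng.sample(frame, n)
    inclusion_probability = n / len(frame)
    return sample, inclusion_probability


# Hypothetical sampling frame of 1,000 unit identifiers.
frame = [f"unit-{i:04d}" for i in range(1000)]
sample, p = simple_random_sample(frame, n=50, seed=42)
print(len(sample), p)  # 50 0.05
```

Because the selection probability is known, each sampled unit can later be weighted by 1/p when results are projected to the population.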
Surveys are a good way of gathering a large amount of data, providing a broad perspective. Surveys
can be administered electronically, by telephone, by mail or face to face. Mail and electronically
administered surveys have a wide reach, are relatively cheap to administer, yield standardized
information and can preserve privacy. They do, however, have a low response rate, are unable
to investigate issues to any great depth, require that the target group is literate and do not allow
for any observation. As surveys are self-reported by participants, there is a possibility that
responses may be biased, particularly if the issues involved are sensitive or require some measure of
disclosure or trust by the participant. It is therefore vital that surveys are designed and
tested for validity and reliability with the target groups who will be completing them.
Careful attention must be given to the design of the survey. If possible, use an already
designed and validated survey instrument, which helps ensure that the data being collected are accurate. If
you design your own survey it is necessary to pilot test the survey on a sample of your target group
to ensure that the survey instrument is measuring what it intends to measure and is appropriate for
the target group. Questions within the survey can be asked in several ways and include: closed
questions, open-ended and scaled questions, and multiple choice questions. Closed questions are
usually in the format of yes/no or true/false options. Open-ended questions on the other hand leave
the answer entirely up to the respondent and therefore provide a greater range of responses.
Additionally, the use of scales is useful when assessing participants’ attitudes. A multiple choice
question may ask respondents to indicate their favorite topic covered in the program, or most
preferred activity. Other considerations when developing a survey instrument include - question
sequence, layout and appearance, length, language, and an introduction and cover letter. Sensitive
questions should be placed near the end of a survey rather than at the beginning.
Use of Survey
When determining the need for a survey, departments/agencies should first check that the
required information is not already available. The option of collecting the required information using
existing administrative records should also be explored. Using existing data or records provides
considerable advantages in terms of cost, time and the absence of respondent burden. The major
disadvantage is the lack of control over the data collected. If existing data are not available or
suitable, a number of factors must then be considered when determining which type of survey, if
any, is appropriate. For example -
Practicality
 Can the information be collected cost effectively and accurately via a survey?
 How complex and how sensitive is the topic?
 Do respondents have access to the required information?
 Will they be willing to supply the information?
 Will their responses to the questions be valid?
Resources
 Are the necessary financial, staff, computer or other resources available?
Timing
 When is the information required?
 Is enough time available to ensure that data of sufficient quality can be collected and analysed?
 When is the best time to conduct the survey? (For example, you may need to allow for
seasonality or the impact of school holiday periods.)
Survey requirements
 Do you want to use this information to target program improvements? If so, you may need to
identify the key sub-groups you wish to report on (for example, geographic areas, age groups,
sex, industry and size of business) and obtain sufficient responses for each group to ensure
results are accurate enough for your needs.
Accuracy
 What level of error can be tolerated? This depends on how and for what purposes you intend to
use the survey results.
Frequency
 Is the survey to be repeated? How often?
Legislative powers
 Does the department/agency have authority to collect the information through either a
compulsory or voluntary survey?
Ethical consideration
Ethical considerations must be observed during the survey exercise. This includes ensuring that data,
where appropriate, are treated confidentially and that, where information is sought on the
understanding that the respondent cannot be identified, such anonymity is preserved. Other ethical
considerations include -
 Do you need identifiable information (for example, names, addresses, telephone numbers)
relating to respondents for follow-up research or matching with other data? If so, you need to
clearly explain why you need such details and obtain the respondents’ consent.
 Will respondents be adversely affected or harmed as a direct result of participating in the
survey?
 Are procedures in place for respondents to check the identity and bona fides of the
researchers?
 Is the survey being conducted on a voluntary basis? If so, respondents must not be misled to
believe it is compulsory when being asked for their co-operation.
 Is it necessary to interview children under 14 years? If so, the consent of their parents /
guardians / responsible adults must be obtained.
These factors must all be taken into consideration when developing an appropriate sample design
(that is, sample size, selection method, etc.) and survey method.
Survey Process
The following is an outline of the general process to be followed once the need for a survey has been
determined. Some steps will not be necessary in all cases and some processes can be carried out at
the same time (for example, data collection and preparation for data entry and processing). A
sample survey is cheaper and timelier than a census but still requires significant resources, effort
and time. The survey process is complex and the stages are not necessarily sequential. Pilot testing
of, at least, key elements such as the questionnaire and survey operations is an essential part of the
development stage. It may be necessary to go through more than one cycle of development, testing,
evaluation and modification before a satisfactory solution is reached. The entire process should be
planned ahead, including all critical dates. The time required from initial planning to the completion
of a report or publication may vary from several weeks to several months according to the size and
type of survey. Key steps in the survey process include –
Planning and Designing
1. Define the purpose, objectives and the output required. Experience has shown that well-defined
output requirements at the outset minimize the risk of the survey producing invalid results.
2. Design collection methodology and sample selection method.
3. Develop survey procedures. Design and print test questionnaires and any other documentation
(for example, instructions for interviewers and introductory letters).
Testing and Modifying
4. Pilot test all aspects of the survey if possible. As a minimum, a small-scale pre-test of
questionnaires can reveal problems with question wording, layout, understanding or respondent
reaction.
5. Analyze test results (completed questionnaires, response/consent rate etc). Obtain feedback
from respondents and/or interviewers.
6. Modify procedures, questionnaires and documentation according to test evaluation.
7. Repeat steps 1–6 if necessary.
Conducting the Survey
8. Finalize procedures, questionnaires and documentation.
9. Select sample.
10. Train interviewers (if interviewer-based).
11. Conduct the survey (that is, mail out questionnaires or commence interviewing) including follow-
up of refusals and non-contacts, supervision and checks of interviewers’ work.
Processing and Analyzing
12. Prepare data entry, estimation and tabulation systems.
13. Code, enter and edit data.
14. Process data - calculate population estimates and standard errors, prepare tables.
15. Prepare report of survey results.
16. Prepare technical report. Evaluate and document all aspects of the survey for use when
designing future surveys.
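Step 14 can be illustrated with a minimal sketch that assumes a simple random sample drawn without replacement, which is an assumption made here rather than a design the text prescribes. Under that design the population total is estimated as N times the sample mean, and the standard error includes a finite population correction. The household data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def estimate_total(sample_values, population_size):
    """Estimate a population mean and total, with standard errors, from an SRS."""
    n = len(sample_values)
    sample_mean = mean(sample_values)
    sample_var = variance(sample_values)  # sample variance (n - 1 denominator)
    fpc = 1 - n / population_size         # finite population correction
    se_mean = sqrt(fpc * sample_var / n)
    return {
        "mean": sample_mean,
        "total": population_size * sample_mean,
        "se_mean": se_mean,
        "se_total": population_size * se_mean,
    }


# Hypothetical data: visits reported by 8 sampled households out of 200.
visits = [2, 0, 3, 1, 4, 2, 1, 3]
est = estimate_total(visits, population_size=200)
print(est["mean"], est["total"])
print(round(est["se_mean"], 3), round(est["se_total"], 1))
```

More complex designs (stratified, clustered, or unequal-probability samples) need design-specific weights and variance formulas, which this sketch does not cover.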
Data Collection Method in Survey
Commonly used methods for collecting quantitative data include telephone and face-to-face
interviews, self-completion questionnaires (such as mail, email, web-based or SMS) or combinations
of these. Each has advantages and disadvantages in terms of the cost, time, response/consent rate
and the type of information that can be collected.
Self-completion Surveys via mail, email, the internet or SMS are generally the least expensive,
particularly for a widespread sample. They allow respondents time to consider their answers, refer
to records or consult with others (which can be helpful or unhelpful, depending on the survey’s
objectives). They also eliminate interviewer errors and reduce the incidence of selected people (or
units) being unable to be contacted. A major disadvantage of self-completion surveys is the
potentially high non-response. In such cases, substantial bias can result if people who do not
complete the survey have different characteristics from those who do. However, response can be
improved using techniques such as well-written introductory letters, incentives for timely return of
questionnaires and follow-up for those initially not responding. In self-completion surveys there is no
opportunity to clarify answers or supplement the survey with observational data. In mail surveys the
questionnaire usually has to be simple and reasonably short, particularly when surveying the general
community. Internet and email-based surveys are commonly used for surveying clients or staff
within organizations and allow more complex questionnaires to be used than mail surveys do.
Interviewer-based Surveys such as face-to-face or telephone surveys generally allow more data to
be gathered than self-completion surveys and can include the use of more complex questionnaires.
Interviewers can reduce non-response by answering respondents’ queries or concerns. They can
often pick up and resolve respondent errors. Face-to-face surveys are usually more expensive than
other methodologies. Poor interviewers can introduce additional errors and, in some cases, the face-
to-face approach is unsuitable for sensitive topics. Telephone surveys are generally cheaper and
quicker than face-to-face surveys, and are well suited to situations where timely results are needed.
However, non-response may be higher than for face-to-face surveys as it is harder for interviewers
to prove their identity, assure confidentiality and establish rapport. Telephone surveys are not
suited for situations where the respondents need to refer to records extensively. Also, the
questionnaires must be simpler and shorter than for face-to-face surveys and prompt cards cannot
be used.
Computer Assisted Telephone Interviewing (CATI) is a particular type of telephone survey technique
that helps to resolve some of the limitations of general telephone-based surveying. With CATI,
interviewers use a computer terminal. The questions appear on the computer screen and the
interviewers enter responses directly into the computer. The interviewer’s screen is programmed to
show questions in the planned order. Interviewers cannot inadvertently omit questions or ask them
out of sequence. Online messages warn interviewers if they enter invalid values or unusual values.
Most CATI systems also allow many aspects of survey operations to be automated, e.g. rescheduling
of call-backs, engaged numbers and ‘no answers’, and allow automatic dialing and remote supervision
of interviewer/respondent interaction. A survey frame or list which contains telephone numbers is
required to conduct a telephone survey. For general population surveys, such lists are not readily
available or they have limitations that can lead to biased results. If the Electronic White Pages list
is used to select a sample of households then the sample will not include households with silent
numbers. In addition, it may exclude households with recent new connections or recent changes to
existing numbers. Electoral rolls exclude respondents under 18 years of age, migrants not
yet naturalised and others ineligible to vote. Random Digit Dialing may address some of the
under-coverage associated with an Electronic White Pages or electoral roll list, but it is inefficient for
sampling at a low geographic level and does not allow for communicating (via pre-approach letter, for
example) with households prior to the commencement of telephone interviewing.
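The CATI behaviors described above (questions shown in a planned order, no question skipped, warnings for invalid or unusual values) can be sketched in a few lines. This is an illustrative toy, not the design of any real CATI product; the `Question` class, the ranges, and the questions themselves are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Question:
    key: str
    text: str
    valid_range: tuple  # hard limits: values outside are rejected
    usual_range: tuple  # soft limits: values outside trigger a warning


def check_response(question, value):
    """Classify an entered value as 'ok', 'unusual' or 'invalid'."""
    low, high = question.valid_range
    if not low <= value <= high:
        return "invalid"
    usual_low, usual_high = question.usual_range
    if not usual_low <= value <= usual_high:
        return "unusual"
    return "ok"


def run_interview(questions, keyed_answers):
    """Process every question in the planned order and collect warnings.

    A missing or invalid answer raises an error, so no question can be
    omitted and no out-of-range value can be stored.
    """
    recorded, warnings = {}, []
    for q in questions:
        if q.key not in keyed_answers:
            raise KeyError(f"question '{q.key}' was not answered")
        value = keyed_answers[q.key]
        status = check_response(q, value)
        if status == "invalid":
            raise ValueError(f"{q.key}={value} is outside {q.valid_range}")
        if status == "unusual":
            warnings.append(f"{q.key}={value} is unusual; please confirm")
        recorded[q.key] = value
    return recorded, warnings


# Hypothetical two-question script.
script = [
    Question("age", "How old are you?", (0, 120), (18, 90)),
    Question("household_size", "How many people live here?", (1, 30), (1, 10)),
]
recorded, warnings = run_interview(script, {"household_size": 4, "age": 95})
print(list(recorded))  # ['age', 'household_size']
print(warnings)
```

The answers are stored in script order regardless of the order in which they were supplied, and the unusual age produces a confirmation warning rather than a rejection.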
Combinations of Collection Methods, such as interviewers dropping off a questionnaire to be mailed
back or returning to pick it up, a mail survey with telephone follow-up, or an initial telephone call to
obtain cooperation or the name of a suitable respondent followed by a mail survey, are sometimes used
to obtain higher response/consent rates to a survey.
If in-depth or purely qualitative information is required, alternative research methods should be
considered. Focus groups, observation and in-depth interviewing are all useful when developing a
survey or initially exploring areas of interest. They can also be a valuable supplement to survey data.
However, results from such studies should not be considered representative of the entire
population of interest.
Sources of Error
Whether a survey is being conducted by departmental/agency staff or by consultants, it is
important to be aware of potential sources of error and strategies to minimize them. Errors arising
in the collection of survey data can be divided into two types - sampling error and non-sampling
error.
Sampling error occurs when data are collected from a sample rather than the entire population. The
sampling error associated with survey results for a particular sub-group of interest depends mainly
on the number of achieved responses for that sub-group rather than on the percentage of units
sampled. Estimates of sampling error, such as standard errors, can be calculated mathematically.
They are affected by factors such as -
 sample size - increasing the sample size will decrease the sampling error.
 population variability - a larger sampling error will be present if the items of interest vary
greatly within the population.
 sample design - standard errors cannot be calculated if the probability of selection is not known
(for example, quota sampling).
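The first two factors can be made concrete with the standard error of a sample proportion under simple random sampling. The sketch below uses hypothetical sample sizes and proportions, and the optional finite population correction shows why the number of achieved responses matters far more than the percentage of units sampled.

```python
from math import sqrt


def se_proportion(p, n, population_size=None):
    """Standard error of a sample proportion under simple random sampling.

    If population_size is given, a finite population correction is applied.
    """
    se = sqrt(p * (1 - p) / n)
    if population_size is not None:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se


# Sample size: quadrupling n halves the standard error.
for n in (100, 400, 1600):
    print(n, round(se_proportion(0.5, n), 4))

# Population variability: p = 0.5 is the most variable case.
print(round(se_proportion(0.5, 400), 4), round(se_proportion(0.1, 400), 4))

# Sampling fraction: 400 responses from 10,000 units versus from 1,000,000.
print(round(se_proportion(0.5, 400, 10_000), 4),
      round(se_proportion(0.5, 400, 1_000_000), 4))
```

With 400 achieved responses the standard error is close to 0.025 whether the sample is 4% or 0.04% of the population, which is why sub-group precision is driven mainly by the sub-group's own response count.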
All other errors associated with collecting survey data are called non-sampling errors. Although
they cannot be measured in the same way as sampling errors, they are just as important. The
following table lists common sources of non-sampling error and some strategies to minimize them.
Table 9.1
Common Sources of Non-sampling Error and Strategies to Minimize Them
Source of error: Planning and interpretation
Examples: Inadequate definitions of concepts, terms or populations.
Strategies to minimize error: Ensure all concepts, terms and populations are defined precisely
through consultation between data users and survey designers.
Source of error: Sample selection
Examples: Inadequate list from which sample is selected; biased sample selection.
Strategies to minimize error: Check list for accuracy, duplicates and missing units; use
appropriate selection procedures.
Source of error: Survey methods
Examples: Inappropriate method (e.g., mail survey for a very complicated topic).
Strategies to minimize error: Choose an appropriate method and test thoroughly.
Source of error: Questionnaire
Examples: Loaded, misleading or ambiguous questions, poor layout or sequencing.
Strategies to minimize error: Use plain English, clear questions and logical layout; test
thoroughly.
Source of error: Interviewers
Examples: Leading respondents, making assumptions, misunderstanding or misreporting answers.
Strategies to minimize error: Provide clear interviewer instructions and appropriate training,
including exercises and field supervision.
Source of error: Respondents
Examples: Refusals, memory problems, rounding answers, protecting personal interests or
integrity.
Strategies to minimize error: Promote survey through public media; ensure confidentiality; if
interviewer-based, use well-trained, impartial interviewers and probing techniques; if mail-based,
use a well-written introductory letter.
Source of error: Processing
Examples: Errors in data entry, coding or editing.
Strategies to minimize error: Adequately train and supervise processing staff; check a sample
of each person’s work.
Source of error: Estimation
Examples: Incorrect weighting, errors in calculation of estimates.
Strategies to minimize error: Ensure that skilled statisticians undertake estimation.

Non-response occurs in virtually all surveys through factors such as refusals, non-contact and
language difficulties. It is of particular importance if the characteristics of non-respondents differ
from respondents. For example, if high-income earners are more likely to refuse to participate in an
income survey, the results will obviously be biased towards lower incomes. For this reason, all
surveys should aim for the maximum possible response/consent rate, within cost and time
constraints, by using techniques such as call-backs to non-contacts and follow-up of refusals. The
level of non-response should always be measured.
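The income example can be sketched with a tiny hypothetical data set in which the two highest earners are among those who do not respond. The code measures the response rate and compares the respondents' mean income with the mean of everyone selected; all figures are invented for illustration.

```python
from statistics import mean

# Hypothetical selected sample: (annual income, responded?).
selected = [
    (20_000, True), (25_000, True), (30_000, True), (35_000, True),
    (40_000, True), (60_000, False), (80_000, True), (120_000, False),
]


def response_rate(records):
    """Share of selected units that responded."""
    return sum(responded for _, responded in records) / len(records)


def nonresponse_bias(records):
    """Respondents' mean income minus the mean income of all selected units."""
    overall_mean = mean(income for income, _ in records)
    respondent_mean = mean(income for income, responded in records if responded)
    return respondent_mean - overall_mean


rate = response_rate(selected)
bias = nonresponse_bias(selected)
print(rate)  # 0.75
print(round(bias, 2))
```

The bias is negative because high earners are over-represented among the non-respondents. In a real survey the incomes of non-respondents are unknown, so this comparison is possible only in a constructed example or when auxiliary data are available.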
Bias can also arise from inadequate sampling frames, the lists from which respondents are selected.
Household and business telephone listings and electoral rolls are often used as sampling frames, but
they all have limitations. Telephone listings exclude respondents who do not have telephones and can
exclude those with ‘silent’ or unlisted numbers. Electoral rolls exclude respondents under 18
years of age, migrants not yet naturalized and others ineligible to vote.
Issues for Selecting Survey Methods
Selecting the type of survey you are going to use is one of the most critical decisions in many social
research contexts. There are very few simple rules for making this decision; the advantages and
disadvantages of the different survey types have to be balanced against each other. The following
questions can guide the decision.
 Population Issues
The first set of considerations has to do with the population and its accessibility.
Can the population be enumerated? For some populations, you have a complete listing of the units
that will be sampled. For others, such a list is difficult or impossible to compile. For instance, there
are complete listings of registered voters or persons with active driver’s licenses. But no one keeps a
complete list of homeless people. If you are doing a study that requires input from homeless
persons, you are very likely going to need to go and find the respondents personally. In such
contexts, you can pretty much rule out the idea of mail surveys or telephone interviews.
Is the population literate? Questionnaires require that your respondents can read. While this might
seem initially like a reasonable assumption for many adult populations, we know from recent research
that the incidence of adult illiteracy is alarmingly high. And, even if your respondents can read to
some degree, your questionnaire may contain difficult or technical vocabulary. Clearly, there are
some populations that you would expect to be illiterate. Young children would not be good targets
for questionnaires.
Are there language issues? We live in a multilingual world. Virtually every society has members who
speak other than the predominant language. Some countries (like Canada) are officially multilingual.
And, our increasingly global economy requires us to do research that spans countries and language
groups. Can you produce multiple versions of your questionnaire? For mail instruments, can you know
in advance the language your respondent speaks, or do you send multiple translations of your
instrument? Can you be confident that important connotations in your instrument are not culturally
specific? Could some of the important nuances get lost in the process of translating your questions?
Will the population cooperate? People who do research on immigration issues have a difficult
methodological problem. They often need to speak with undocumented immigrants or people who may
be able to identify others who are. Why would we expect those respondents to cooperate? Although
the researcher may mean no harm, the respondents are at considerable risk legally if information
they divulge should get into the hands of the authorities. The same can be said for any target group
that is engaging in illegal or unpopular activities.
What are the geographic restrictions? Is your population of interest dispersed over too broad a
geographic range for you to study feasibly with a personal interview? It may be possible for you to
send a mail instrument to a nationwide sample. You may be able to conduct phone interviews with
them. But it will almost certainly be less feasible to do research that requires interviewers to visit
directly with respondents if they are widely dispersed.
 Sampling Issues
The sample is the actual group you will have to contact in some way. There are several important
sampling issues you need to consider when doing survey research.
What data is available? What information do you have about your sample? Do you know their current
addresses? What are their current phone numbers? Are your contact lists up to date?

Basic Guidelines for Research SMS Kabir


Chapter - 9 Methods of Data Collection Page 251

Can respondents be found? Can your respondents be located? Some people are very busy. Some
travel a lot. Some work the night shift. Even if you have an accurate phone or address, you may not
be able to locate or make contact with your sample.
Who is the respondent? Who is the respondent in your study? Let’s say you draw a sample of
households in a small city. A household is not a respondent. Do you want to interview a specific
individual? Do you want to talk only to the ‘head of household’ (and how is that person defined)? Are
you willing to talk to any member of the household? Do you state that you will speak to the first
adult member of the household who opens the door? What if that person is unwilling to be
interviewed but someone else in the house is willing? How do you deal with multi-family households?
Similar problems arise when you sample groups, agencies, or companies. Can you survey any member
of the organization? Or, do you only want to speak to the Director of Human Resources? What if
the person you would like to interview is unwilling or unable to participate? Do you use another
member of the organization?
Can all members of the population be sampled? If you have an incomplete list of the population (i.e., a
sampling frame), you may not be able to sample every member of the population. Lists of various
groups are extremely hard to keep up to date. People move or change their names. Even though they
are on your sampling frame listing, you may not be able to get to them. And, it’s possible they are not
even on the list.
Are response rates likely to be a problem? Even if you are able to solve all of the other population
and sampling problems, you still have to deal with the issue of response rates. Some members of
your sample will simply refuse to respond. Others have the best of intentions, but can’t seem to find
the time to send in your questionnaire by the due date. Still others misplace the instrument or
forget about the appointment for an interview. Low response rates are among the most difficult of
problems in survey research. They can ruin an otherwise well-designed survey effort.
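The effect of refusals and lost questionnaires can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the simplest definition of a response rate (completed responses divided by the eligible sample) and treats undeliverable questionnaires as ineligible. The function name and all figures are hypothetical.

```python
def response_rate(completed: int, sampled: int, ineligible: int = 0) -> float:
    """Return completed responses as a fraction of the eligible sample.

    Assumption: eligible sample = sampled units minus units found to be
    ineligible (for example, questionnaires returned as undeliverable).
    """
    eligible = sampled - ineligible
    if eligible <= 0:
        raise ValueError("eligible sample must be positive")
    if not 0 <= completed <= eligible:
        raise ValueError("completed must be between 0 and the eligible sample")
    return completed / eligible


# Hypothetical mail survey: 500 questionnaires sent, 20 undeliverable,
# 168 completed and returned by the due date.
rate = response_rate(completed=168, sampled=500, ineligible=20)
print(f"Response rate: {rate:.1%}")
```

Other definitions of the eligible sample are possible, so a reported rate should state which one was used.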
 Question Issues
Sometimes the nature of what you want to ask respondents will determine the type of survey you
select.
What types of questions can be asked? Are you going to be asking personal questions? Are you going
to need to get lots of detail in the responses? Can you anticipate the most frequent or important
types of responses and develop reasonable closed-ended questions?
How complex will the questions be? Sometimes you are dealing with a complex subject or topic. The
questions you want to ask are going to have multiple parts. You may need to branch to sub-questions.
Will screening questions be needed? A screening question may be needed to determine whether the
respondent is qualified to answer your question of interest. For instance, you wouldn’t want to ask
someone their opinions about a specific computer program without first ‘screening’ them to find out
whether they have any experience using the program. Sometimes you have to screen on several
variables (e.g., age, gender, experience). The more complicated the screening, the less likely it is
that you can rely on paper-and-pencil instruments without confusing the respondent.
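Screening and branching are easier to manage when an interviewer or a program controls the flow of questions. The following is a minimal sketch of that idea, not a survey tool: the question wording, the answer keys and the helper `scripted` (which stands in for a live respondent) are all hypothetical.

```python
from typing import Callable, Dict, List


def run_interview(ask: Callable[[str], str]) -> Dict[str, str]:
    """Ask a screening question, then branch to follow-ups only if qualified."""
    answers: Dict[str, str] = {}
    answers["used_program"] = ask("Have you ever used the program? (yes/no)")
    if answers["used_program"].strip().lower() != "yes":
        # Screened out: the opinion questions are skipped entirely.
        return answers
    answers["months_of_use"] = ask("For how many months have you used it?")
    answers["opinion"] = ask("What is your overall opinion of the program?")
    return answers


def scripted(replies: List[str]) -> Callable[[str], str]:
    """Build a stand-in for a live respondent that replays fixed replies."""
    remaining = iter(replies)
    return lambda _question: next(remaining)


screened_out = run_interview(scripted(["no"]))
qualified = run_interview(scripted(["Yes", "6", "Easy to learn"]))
print(screened_out)
print(qualified)
```

On paper, the same logic has to be expressed as "if no, skip to question N" instructions, which is where respondents tend to get lost as the number of screening variables grows.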
Can question sequence be controlled? Is your survey one where you can construct in advance a
reasonable sequence of questions? Or, are you doing an initial exploratory study where you may need
to ask lots of follow-up questions that you can’t easily anticipate?
Will lengthy questions be asked? If your subject matter is complicated, you may need to give the
respondent some detailed background for a question. Can you reasonably expect your respondent to
sit still long enough in a phone interview for you to ask your question?

Will long response scales be used? If you are asking people about the different computer equipment
they use, you may have to have a lengthy response list (CD-ROM drive, floppy drive, mouse, touch
pad, modem, network connection, external speakers, etc.). Clearly, it may be difficult to ask about
each of these in a short phone interview.
 Content Issues
The content of your study can also pose challenges for the different survey types you might utilize.
Can the respondents be expected to know about the issue? If the respondent does not keep up with
the news (e.g., by reading the newspaper, watching television news, or talking with others), they may
not even know about the news issue you want to ask them about. Or, if you want to do a study of
family finances and you are talking to the spouse who doesn’t pay the bills on a regular basis, they
may not have the information to answer your questions.
Will the respondent need to consult records? Even if the respondent understands what you’re asking
about, you may need to allow them to consult their records in order to get an accurate answer. For
instance, if you ask them how much money they spent on food in the past month, they may need to
look up their personal check and credit card records. In this case, you don’t want to be involved in an
interview where they would have to go look things up while they keep you waiting (they wouldn't be
comfortable with that).
 Bias Issues
People come to the research endeavor with their own sets of biases and prejudices. Sometimes,
these biases will be less of a problem with certain types of survey approaches.
Can social desirability be avoided? Respondents generally want to ‘look good’ in the eyes of others.
None of us likes to look like we don’t know an answer. We don’t want to say anything that would be
embarrassing. If you ask people about information that may put them in this kind of position, they
may not tell you the truth, or they may ‘spin’ the response so that it makes them look better. This
may be more of a problem in an interview situation where they are face-to-face or on the phone with
a live interviewer.
Can interviewer distortion and subversion be controlled? Interviewers may distort an interview as
well. They may not ask questions that make them uncomfortable. They may not listen carefully to
respondents on topics for which they have strong opinions. They may make the judgment that they
already know what the respondent would say to a question based on their prior responses, even
though that may not be true.
Can false respondents be avoided? With mail surveys it may be difficult to know who actually
responded. Did the head of household complete the survey or someone else? Did the CEO actually
give the responses or instead pass the task off to a subordinate? Is the person you're speaking with
on the phone actually who they say they are? At least with personal interviews, you have a
reasonable chance of knowing who you are speaking with. In mail surveys or phone interviews, this
may not be the case.
 Administrative Issues
Last, but certainly not least, you have to consider the feasibility of the survey method for your
study.
Costs: Cost is often the major determining factor in selecting survey type. You might prefer to do
personal interviews, but can’t justify the high cost of training and paying for the interviewers. You
may prefer to send out an extensive mailing but can't afford the postage to do so.
Facilities: Do you have the facilities (or access to them) to process and manage your study? In phone
interviews, do you have well-equipped phone surveying facilities? For focus groups, do you have a
comfortable and accessible room to host the group? Do you have the equipment needed to record
and transcribe responses?
Time: Some types of surveys take longer than others. Do you need responses immediately (as in an
overnight public opinion poll)? Have you budgeted enough time for your study to send out mail
surveys and follow-up reminders, and to get the responses back by mail? Have you allowed for
enough time to get enough personal interviews to justify that approach?
Personnel: Different types of surveys make different demands of personnel. Interviews require
interviewers who are motivated and well-trained. Group administered surveys require people who are
trained in group facilitation. Some studies may be in a technical area that requires some degree of
expertise in the interviewer.
Clearly, there are lots of issues to consider when you are selecting which type of survey you wish to
use in your study. And there is no clear and easy way to make this decision in many contexts.

9.4.8 CASE STUDY METHOD


Case studies are in-depth investigations of a single person, group, event or community. Typically data
are gathered from a variety of sources and by using several different methods (e.g. observations &
interviews). The case study research method originated in clinical medicine (the case history, i.e. the
patient’s personal history). The case study method often involves simply observing what happens to,
or reconstructing ‘the case history’ of a single participant or group of individuals (such as a school
class or a specific social group), i.e. the idiographic approach. Case studies allow a researcher to
investigate a topic in far more detail than might be possible if they were trying to deal with a large
number of research participants (nomothetic approach) with the aim of ‘averaging’.
The case study is not itself a research method, but researchers select methods of data collection
and analysis that will generate material suitable for case studies such as qualitative techniques
(unstructured interviews, participant observation, diaries), personal notes (e.g. letters, photographs,
notes) or official documents (e.g. case notes, clinical notes, appraisal reports). The data collected can
be analyzed using different theories (e.g. grounded theory, interpretative phenomenological analysis,
text interpretation such as thematic coding). All the approaches mentioned here use preconceived
categories in the analysis and they are idiographic in their approach, i.e. they focus on the individual
case without reference to a comparison group.
Case studies are widely used in psychology and amongst the best known were the ones carried out by
Sigmund Freud. He conducted very detailed investigations into the private lives of his patients in an
attempt to both understand and help them overcome their illnesses. Freud’s most famous case
studies include ‘Little Hans’ (1909a) and ‘The Rat Man’ (1909b). Even today case histories are one of
the main methods of investigation in abnormal psychology and psychiatry. For students of these
disciplines they can give a vivid insight into what those who suffer from mental illness often have to
endure. Case studies are often conducted in clinical medicine and involve collecting and reporting
descriptive information about a particular person or specific environment, such as a school. In
psychology, case studies are often confined to the study of a particular individual. The information
is mainly biographical and relates to events in the individual’s past (i.e. retrospective), as well as to
significant events which are currently occurring in his or her everyday life. In order to produce a
fairly detailed and comprehensive profile of the person, the psychologist may use various types of
accessible data, such as medical records, employer’s reports, school reports or psychological test
results. The interview is also an extremely effective procedure for obtaining information about an
individual, and it may be used to collect comments from the person’s friends, parents, employer,
work mates and others who have a good knowledge of the person, as well as to obtain facts from the
person him or herself.
In a case study, nearly every aspect of the subject’s life and history is analyzed to seek patterns
and causes for behavior. The hope is that learning gained from studying one case can be generalized
to many others. Unfortunately, case studies tend to be highly subjective and it is difficult to
generalize results to a larger population.
Characteristics of Case Study Method
 Case study research is not sampling research. Selecting cases must be done so as to maximize
what can be learned in the period of time available for the study.
 The unit of analysis is a critical factor in the case study. It is typically a system of action rather
than an individual or group of individuals. Case studies tend to be selective, focusing on one or
two issues that are fundamental to understanding the system being examined.
 Case studies are multi-perspectival analyses. This means that the researcher considers not just
the voice and perspective of the actors, but also of the relevant groups of actors and the
interaction between them. This is a salient characteristic of case studies: they give a voice to
the powerless and voiceless.
 Case study is known as a triangulated research strategy. Snow and Anderson (1991) asserted
that triangulation can occur with data, investigators, theories, and even methodologies. Stake
(1995) stated that the protocols that are used to ensure accuracy and alternative explanations
are called triangulation. The need for triangulation arises from the ethical need to confirm the
validity of the processes. In case studies, this could be done by using multiple sources of data
(Yin, 1984). The problem in case studies is to establish meaning rather than location. Denzin
(1984) identified four types of triangulation: Data source triangulation, when the researcher
looks for the data to remain the same in different contexts; Investigator triangulation, when
several investigators examine the same phenomenon; Theory triangulation, when investigators
with different viewpoints interpret the same results; and Methodological triangulation, when one
approach is followed by another, to increase confidence in the interpretation.
Characteristics of the case study method in legal research can be described briefly as follows -
 A researcher can study one or more social units, such as a person, a family or a society, to
accomplish the aim of the study. The selected unit can be studied comprehensively and
intensively, with weight and consideration given to all of its aspects.
 A researcher does not only find out how many crimes a person has committed but also studies
in depth the causes that force or abet that person to commit such crimes. In this example, one
of the main objectives of the researcher could be to offer suggestions for reforming criminals.
 Under this method, a researcher can endeavor to understand how the causal factors are
interlinked.
 Under this method, all the related aspects of the unit under study can be studied directly or
indirectly.
 The case study method helps to find useful data and also makes it possible to generalize the
knowledge gained.
 The main characteristics of the case study method include continuity, completeness, validity,
and data, as it deals with the life of a social unit or units, or of society as a whole.
Application of Case Study Model
Yin (1994) presented at least four applications for a case study model.
To…
 explain complex causal links in real-life interventions;
 describe the real-life context in which the intervention has occurred;
 describe the intervention itself; and
 explore those situations in which the intervention being evaluated has no clear set of outcomes.
Sources of Information in Case Study
There are a number of different sources and methods that researchers can use to gather
information about an individual or group. The six major sources that have been identified by
researchers (Yin, 1994; Stake, 1995) are –
Direct Observation: This strategy involves observing the subject, often in a natural setting. While
an individual observer is sometimes used, it is more common to utilize a group of observers.
Interviews: One of the most important methods for gathering information in case studies. An
interview can involve structured survey-type questions, or more open-ended questions.
Documents: Letters, newspaper articles, administrative records, etc.
Archival Records: Census records, survey records, name lists, etc.
Physical Artifacts: Tools, objects, instruments and other artifacts often observed during a direct
observation of the subject.
Participant Observation: Involves the researcher actually serving as a participant in events and
observing the actions and outcomes.
Category of Case Study
There are several categories of case study.
Prospective: A type of case study in which an individual or group of people is observed in order to
determine outcomes. For example, a group of individuals might be watched over an extended period
of time to observe the progression of a particular disease.
Retrospective: A type of case study that involves looking at historical information. For example,
researchers might start with an outcome, such as a disease, and then look backwards at information
about the individual’s life to determine risk factors that may have contributed to the onset of the
illness.
Explanatory: Explanatory case studies examine the data closely both at a surface and deep level in
order to explain the phenomena in the data. On the basis of the data, the researcher may then form
a theory and set out to test this theory (McDonough and McDonough, 1997). Furthermore, explanatory
cases are also deployed for causal studies where pattern-matching can be used to investigate
certain phenomena in very complex and multivariate cases. Yin and Moore (1987) note that these
complex and multivariate cases can be explained by three rival theories - a knowledge-driven theory,
a problem-solving theory, and a social-interaction theory. The knowledge-driven theory stipulates
that eventual commercial products are the results of ideas and discoveries from basic research.
A similar notion applies to the problem-solving theory. However, in this theory, products are
derived from external sources rather than from research. The social-interaction theory, on the
other hand, suggests that overlapping professional networks cause researchers and users to
communicate frequently with each other.

Exploratory: A case study that is sometimes used as a prelude to further, more in-depth research.
This allows researchers to gather more information before developing their research questions and
hypotheses. A pilot study is considered an example of an exploratory case study (Yin, 1984;
McDonough and McDonough, 1997) and is crucial in determining the protocol that will be used.
Descriptive: Descriptive case studies set out to describe the natural phenomena which occur within the
data in question. The goal set by the researcher is to describe the data as they occur. McDonough
and McDonough (1997) suggest that descriptive case studies may be in a narrative form. An example
of a descriptive case study is the journalistic description of the Watergate scandal by two
reporters (Yin, 1984). The challenge of a descriptive case study is that the researcher must begin
with a descriptive theory to support the description of the phenomenon or story. If this fails there
is the possibility that the description lacks rigor and that problems may occur during the project.
Intrinsic: A type of case study in which the researcher has a personal interest in the case.
Collective: Involves studying a group of cases.
Instrumental: Occurs when the individual or group allows researchers to understand more than what
is initially obvious to observers.
According to McDonough and McDonough (1997) other categories include interpretive and evaluative
case studies. Through interpretive case studies, the researcher aims to interpret the data by
developing conceptual categories, supporting or challenging the assumptions made regarding them. In
evaluative case studies, the researcher goes further by adding their judgment to the phenomena
found in the data.
Intrinsic - when the researcher has an interest in the case; Instrumental - when the case is used to
understand more than what is obvious to the observer; Collective - when a group of cases is studied.
Exploratory cases are sometimes considered as a prelude to social research. Explanatory case
studies may be used for doing causal investigations. Descriptive cases require a descriptive theory
to be developed before starting the project. In all of the above types of case studies, there can be
single-case or multiple-case applications.

Procedure of Case Study Method


In short, for the case study, researchers recommend the following procedure –
 Design the case study protocol: determine the required skills; develop and review the protocol.
 Conduct the case study: prepare for data collection; distribute the questionnaire; conduct the
interview.
 Analyze the case study evidence: analytic strategy.
 Develop conclusions, recommendations, and implications based on the evidence.


Each section begins with the procedures recommended in the literature, followed by the application
of the recommended procedure in the study.
Advantages and Disadvantages of Case Studies
A good case study should always make clear which information is factual description and which is
inference or the opinion of the researcher. The strengths of case studies are that they provide detailed
(rich qualitative) information, provide insight for further research, and permit investigation of
otherwise impractical (or unethical) situations.
Merits of the case study method can be described briefly as follows -
 The case study helps to study and understand human nature and conduct very intensively. As a
result, a researcher can formulate a valid hypothesis.
 A researcher can obtain actual and exemplary records of experience that may be useful as
guidelines for the lives of others, as this method carries out an intensive study of all aspects
of a unit or a problem selected for research.
 The case study method is very useful in sampling, as it efficiently and systematically classifies
the units selected for research based on the data and information collected.
 Under the case study, a researcher can use one or more research methods as circumstances
allow, such as interviews, questionnaires, reports, sampling and similar methods.
 As this method emphasizes historical analysis, it is taken as a means of knowing and
understanding the past life of a social unit. That is why it can suggest possible measures for
improving present life from the lessons of past life. As the sayings go, old is gold and the
morning shows the day.
 Under this method, a researcher can discover new and helpful things, as it makes a thorough
study of sociological materials that can represent a real image of experience.
 Under this method, a researcher may increase his/her analytical ability and skill in the study
of practical experiences.
 This method makes it possible to study how to bring positive changes to society. As it studies
the overall life of a social unit, the researcher can understand the changes that have occurred
in society and can also suggest corrections in human behavior for its welfare.
 As this method studies all aspects of a social unit in terms of past, present and future time, it
gives mature knowledge that could also be useful in the researcher's personal and public life.
 This method is also regarded as indispensable and significant for taking decisions on many
management issues. Case data are also very useful for the diagnosis and therapy of practical
case issues. A case can be taken as an example to be followed in future.
Case studies can help us generate new ideas (that might be tested by other methods). They are an
important way of illustrating theories and can help show how different aspects of a person's life are
related to each other. The method is therefore important for a holistic point of view. Despite the
merits referred to above, the demerits of the case study method can be described briefly as
follows –
 The case study method is a very vague process. There is no mechanism to control the
researcher. Generalization to a larger similar population is almost impossible.
 Under this method, letters and other documents can be used. Such write-ups are generally
prepared to impress and give undue weight to personal matters, and they always depend on
personal feelings and thoughts. As a result, the researcher's study may be rendered worthless
and meaningless by distortion.
 Under this method, there is no limit to the study. The researcher always finds it difficult to
decide when to stop collecting data, since everything may seem pertinent.
 The method is always based on several assumptions, which may sometimes not be realistic. In
such circumstances, the data should be tested.
 Under this method, the result is drawn on the basis of all past experiences. Collecting a great
deal of data and information may make it confusing to pick out what is pertinent and specific.
 The method is based on comparison with past life. However, human values, attitudes, behavior,
reactions and circumstances vary widely, so it is difficult to compare one case with another.
 The method always collects past information and data about the society. However, there is no
system of checking, and the study is difficult to replicate.
 The method is time consuming, expensive and complex.

9.4.9 DIARIES METHOD


A diary is a type of self-administered questionnaire often used to record frequent or
contemporaneous events or experiences. In diary surveys, respondents are given the self-
administered form and asked to fill in the required information when events occur (event-based
diaries) or at specified times or time intervals (time-based diaries). Data from diary studies can be
used to make cross-sectional comparisons across people, track an individual over time, or study
processes within individuals or families. The main advantages of diary methods are that they allow
events to be recorded in their natural setting and, in theory, minimize the delay between the event
and the time it is recorded. Diaries are used in a variety of domains. These include studies of
expenditure, nutrition, time use, travel, media exposure, health, and mental health. Diary studies in
user research are a longitudinal technique used in anthropology, psychology, and ‘User Experience’
research, primarily to capture data from participants as they live through certain experiences.
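The distinction between time-based and event-based diaries, and the delay between an event and its recording, can be sketched in a few lines. This is only an illustration under assumed names: `DiaryEntry`, `time_based_prompts`, `recording_delay` and all dates and times are hypothetical and do not come from any particular diary study.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class DiaryEntry:
    recorded_at: datetime  # when the respondent filled in the form
    description: str       # what happened, in the respondent's words


def time_based_prompts(start: datetime, every: timedelta, count: int) -> List[datetime]:
    """Times at which a time-based diary asks for an entry."""
    return [start + i * every for i in range(count)]


def recording_delay(event_time: datetime, entry: DiaryEntry) -> timedelta:
    """Delay between an event and the moment it was written down."""
    return entry.recorded_at - event_time


# Time-based diary: one entry requested every 4 hours, starting at 08:00.
prompts = time_based_prompts(datetime(2024, 1, 1, 8, 0), timedelta(hours=4), 3)

# Event-based diary: the entry is made when the event occurs.
meal_time = datetime(2024, 1, 1, 12, 30)
entry = DiaryEntry(recorded_at=datetime(2024, 1, 1, 12, 40),
                   description="Lunch at the canteen")
delay = recording_delay(meal_time, entry)
print(prompts)
print(delay)
```

A short recording delay is what the diary method aims for; a retrospective interview would typically record the same event days or weeks later.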
There are two types of diary studies –
 Elicitation studies, where participants capture media that are then used as prompts for
discussion in interviews. The method is a way to trigger the participant’s memory.
 Feedback studies, where participants answer predefined questions about events. This is a way of
getting immediate answers from the participants.

Using Diaries in Research


Biographers, historians and literary scholars have long considered diary documents to be of major
importance for telling history. More recently, sociologists have taken seriously the idea of using
personal documents to construct pictures of social reality from the actors’ perspective. In contrast
to these ‘journal’ types of account, diaries are used as research instruments to collect detailed
information about behavior, events and other aspects of individuals’ daily lives.
Self-completion diaries have a number of advantages over other data collection methods. First,
diaries can provide a reliable alternative to the traditional interview method for events that are
difficult to recall accurately or that are easily forgotten. Second, like other self-completion
methods, diaries can help to overcome the problems associated with collecting sensitive information
by personal interview. Finally, they can be used to supplement interview data to provide a rich
source of information on respondents’ behavior and experiences on a daily basis. Two other major
areas where diaries are often used are consumer expenditure and transport planning research. For
example, the UK Family Expenditure Survey (OPCS) uses diaries to collect data for the National
Accounts and to provide weights for the Retail Price Index. In the National Travel Survey (OPCS)
respondents record information about all journeys made over a specified time period in a diary.
Other topics covered using diary methods are social networks, health, illness and associated
behavior, diet and nutrition, social work and other areas of social policy, clinical psychology and
family therapy, crime behavior, alcohol consumption and drug usage, and sexual behavior. Diaries are
also increasingly being used in market research. Diary surveys often use a personal interview to
collect additional background information about the household and sometimes about behavior or
events of interest that the diary will not capture. A placing interview is important for explaining the
diary keeping procedures to the respondent and a concluding interview may be used to check on the
completeness of the recorded entries. Often retrospective estimates of the behavior occurring
over the diary period are collected at the final interview.
Diary Design and Format
Diaries may be open format, allowing respondents to record activities and events in their own words,
or they can be highly structured where all activities are pre-categorized. An obvious advantage of
the free format is that it allows for greater opportunity to recode and analyze the data. However,
the labor intensive work required to prepare and make sense of the data may render it unrealistic
for projects lacking time and resources, or where the sample is large. Although the design of a diary
will depend on the detailed requirement of the topic under study, there are certain design aspects
which are common to most. Below is a set of guidelines recommended for anyone thinking about
designing a diary.
 An A4 booklet of about 5 to 20 pages is desirable, depending on the nature of the diary.
 The inside cover page should contain a clear set of instructions on how to complete the diary.
This should stress the importance of recording events as soon as possible after they occur and
how the respondent should try not to let the diary keeping influence their behavior.
 Depending on how long a period the diary will cover, each page should denote either a week, a day of
the week or a 24 hour period or less. Pages should be clearly ruled up as a calendar with
prominent headings and enough space to enter all the desired information (such as what the
respondent was doing, at what time, where, who with and how they felt at the time, and so on).
 Checklists of the items, events or behavior to help jog the diary keeper’s memory should be
printed somewhere fairly prominent. Very long lists should be avoided since they may be off-
putting and confusing to respondents. For a structured time budget diary, an exhaustive list of

Basic Guidelines for Research SMS Kabir


Chapter - 9 Methods of Data Collection Page 260

all possible relevant activities should be listed together with the appropriate codes. Where more
than one type of activity is to be entered, that is, primary and secondary (or background)
activities, guidance should be given on how to deal with competing or multiple activities.
 There should be an explanation of what is meant by the unit of observation, such as a ‘session’,
an ‘event’ or a ‘fixed time block’. Where respondents are given more freedom in naming their
activities and the activities are to be coded later, it is important to give strict guidelines on
what type of behavior to include, what definitely to exclude and the level of detail required.
Time budget diaries without fixed time blocks should include columns for start and finish times
for activities.
 Appropriate terminology or lists of activities should be designed to meet the needs of the
sample under study, and if necessary, different versions of the diary should be used for
different groups.
 Following the diary pages it is useful to include a simple set of questions for the respondent to
complete, asking, among other things, whether the diary keeping period was atypical in any way
compared to usual daily life. It is also good practice to include a page at the end asking for the
respondents' own comments and clarifications of any peculiarities relating to their entries. Even
if these remarks will not be systematically analyzed, they may prove helpful at the editing or
coding stage.
Data Quality and Response Rates: In addition to the types of errors encountered in all survey
methods, diaries are especially prone to errors arising from respondent conditioning, incomplete
recording of information and under-reporting, inadequate recall, insufficient cooperation and sample
selection bias.
Diary keeping period: The period over which a diary is to be kept needs to be long enough to capture
the behavior or events of interest without jeopardizing successful completion by imposing an overly
burdensome task. For collecting time-use data, anything from one to three day diaries may be used.
Household expenditure surveys usually place diaries on specific days to ensure an even coverage
across the week and distribute their field work over the year to ensure seasonal variation in
earnings and spending is captured.
Reporting errors: In household expenditure surveys it is routinely found that the first day and first
week of diary keeping shows higher reporting of expenditure than the following days. This is also
observed for other types of behavior and the effects are generally termed ‘first day effects’. They
may be due to respondents changing their behavior as a result of keeping the diary (conditioning), or
becoming less conscientious than when they started the diary. Recall errors may also extend to
‘tomorrow’ diaries. Respondents often write down their entries at the end of a day and only a small
minority are diligent diary keepers who carry their diary with them at all times. Expenditure surveys
find that an intermediate visit from an interviewer during the diary keeping period helps preserve
‘good’ diary keeping to the end of the period.
Literacy: All methods that involve self-completion of information demand that the respondent has a
reasonable standard of literacy. Thus the diary sample and the data may be biased towards the
population of competent diary keepers.
Participation: The best response rates for diary surveys are achieved when diary keepers are
recruited on a face-to-face basis, rather than by post. Personal collection of diaries also allows any
problems in the completed diary to be sorted out on the spot. Success may also depend on the
quality of interviewing staff who should be highly motivated, competent and well-briefed. Appealing
to respondent’s altruistic nature, reassuring them of confidentiality and offering incentives are
thought to influence co-operation in diary surveys.
Coding, Editing and Processing: The amount of work required to process a diary depends largely on
how structured it is. For many large scale diary surveys, part of the editing and coding process is
done by the interviewer while still in the field. Following this is an intensive editing procedure which
includes checking entries against information collected in the personal interview. For unstructured
diaries, involving coding of verbatim entries, the processing can be very labor intensive, in much the
same way as it is for processing qualitative interview transcripts. Using highly trained coders and a
rigorous unambiguous coding scheme is very important particularly where there is no clear
demarcation of events or behavior in the diary entries. Clearly, a well designed diary with a coherent
pre-coding system should cut down on the degree of editing and coding.
Relative Cost of Diary Surveys: The diary method is generally more expensive than the personal
interview, and personal placement and pick-up visits are more costly than postal administration. If
the diary is unstructured, intensive editing and coding will push up the costs. However, these costs
must be balanced against the superiority of the diary method in obtaining more accurate data,
particularly where the recall method gives poor results.
Computer Software for Processing and Analysis: Although computer assisted methods may help to
reduce the amount of manual preparatory work, there are few packages and most of them are
custom built to suit the specifics of a particular project. Time-budget researchers are probably the
most advanced group of users of machine readable diary data and the structure of these data allows
them to use traditional statistical packages for analysis. More recently, methods of analysis based
on algorithms for searching for patterns of behavior in diary data are being used (Coxon 1991).
Software development is certainly an area which merits future attention. For textual diaries,
qualitative software packages such as the ‘Ethnograph’ can be used to code them in the same way as
interview transcripts (Fielding & Lee 1991).
Archiving Diary Data: In spite of the abundance of data derived from diary surveys across a wide
range of disciplines, little is available to other researchers for secondary analysis (further analysis
of data already collected). This is perhaps not surprising given that the budget for many diary
surveys does not extend to systematic processing of the data. Many diary surveys are small scale
investigative studies that have been carried out with very specific aims in mind. For these less
structured diaries, for which a common coding scheme is neither feasible, nor possibly desirable, an
answer to public access is to deposit the original survey documents in an archive. This kind of data
bank gives the researcher access to original diary documents allowing them to make use of the data
in ways to suit their own research strategy. However, the ethics of making personal documents
public (even if in the limited academic sense) have to be considered.

Advantages and Criticism of Diary Studies


Advantages of diary studies are numerous. They allow –
 collecting longitudinal and temporal information;
 reporting events and experiences in context;
 determining the antecedents, correlates and consequences of daily experiences.
Criticisms of diary studies include the following. They may generate inaccurate recall, especially in
the elicitation type of diary study, which relies on memory triggers (for example, taking a photo and
then writing about it later). There is low control, low participation and a risk of disturbing the
action. In feedback studies there is also low control, and it can be troubling and disturbing to
write everything down.

9.4.10 PRINCIPAL COMPONENT ANALYSIS (PCA)


Principal component analysis (PCA) is a procedure for identifying a smaller number of uncorrelated
variables, called ‘principal components’, from a large set of data. PCA was invented in 1901 by Karl
Pearson, as an analogue of the principal axis theorem in mechanics; it was later independently
developed (and named) by Harold Hotelling in the 1930s. The goal of principal components analysis is
to explain the maximum amount of variance with the fewest number of principal components.
Principal components analysis is commonly used in the social sciences, market research, and other
industries that use large data sets. Principal component analysis is appropriate when you have
obtained measures on a number of observed variables and wish to develop a smaller number of
artificial variables (called principal components) that will account for most of the variance in the
observed variables. The principal components may then be used as predictor or criterion variables in
subsequent analyses. It is a variable reduction procedure. It is useful when you have obtained data
on a number of variables (possibly a large number of variables), and believe that there is some
redundancy in those variables. In this case, redundancy means that some of the variables are
correlated with one another, possibly because they are measuring the same construct. Because of
this redundancy, you believe that it should be possible to reduce the observed variables into a
smaller number of principal components (artificial variables) that will account for most of the
variance in the observed variables.
Because it is a variable reduction procedure, principal component analysis is similar in many respects
to exploratory factor analysis. In fact, the steps followed when conducting a principal component
analysis are virtually identical to those followed when conducting an exploratory factor analysis.
However, there are significant conceptual differences between the two procedures, and it is
important that you do not mistakenly claim that you are performing factor analysis when you are
actually performing principal component analysis. Principal components analysis is commonly used as
one step in a series of analyses. You can use principal components analysis to reduce the number of
variables and avoid multicollinearity, or when you have too many predictors relative to the number of
observations. A consumer products company wants to analyze customer responses to several
characteristics of a new shampoo: color, smell, texture, cleanliness, shine, volume, amount needed to
lather, and price. They perform a principal components analysis to determine whether they can form
a smaller number of uncorrelated variables that are easier to interpret and analyze. The results
identify the following patterns –
 Color, smell, and texture form a ‘Shampoo quality’ component.
 Cleanliness, shine, and volume form an ‘Effect on hair’ component.
 Amount needed to lather and price form a ‘Value’ component.

Objectives of principal component analysis are –


 To discover or to reduce the dimensionality of the data set.
 To identify new meaningful underlying variables.
Traditionally, principal component analysis is performed on the symmetric Covariance matrix or on
the symmetric Correlation matrix. These matrices can be calculated from the data matrix. The
covariance matrix contains scaled sums of squares and cross products. A correlation matrix is like a
covariance matrix but first the variables, i.e. the columns, have been standardized. We will have to
standardize the data first if the variances of variables differ much, or if the units of measurement
of the variables differ. To perform the analysis, we select the ‘Table of Real’ data matrix in the list
of objects and choose the ‘To PCA’ command.
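The standardize-then-decompose route described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch: the function name `pca_correlation` and the small ratings table are assumptions made for the example, not data from any survey mentioned in this chapter.

```python
import numpy as np

def pca_correlation(data):
    """PCA on the correlation matrix (rows = observations, columns = variables)."""
    data = np.asarray(data, dtype=float)
    # Standardize each column to zero mean and unit sample standard deviation.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # The covariance matrix of standardized data is the correlation matrix.
    corr = np.cov(z, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]          # largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    explained = eigenvalues / eigenvalues.sum()    # share of variance per component
    scores = z @ eigenvectors                      # uncorrelated component scores
    return eigenvalues, eigenvectors, explained, scores

# Hypothetical ratings of three product characteristics by six respondents.
ratings = np.array([
    [7, 6, 3],
    [5, 5, 4],
    [8, 7, 2],
    [3, 4, 6],
    [6, 6, 3],
    [2, 3, 7],
], dtype=float)

eigenvalues, eigenvectors, explained, scores = pca_correlation(ratings)
print(np.round(explained, 3))
```

Keeping only the first few columns of `scores` gives the reduced set of uncorrelated variables; how many to keep is a judgment based on the `explained` shares.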

9.4.11 ACTIVITY SAMPLING


Activity sampling is a technique whereby a number of successive observations are made over a
period of time of one or a group of workers, machines or processes. Each observation records what
is happening at that instant, with a rating if necessary. And the percentage of observations
recorded for a particular activity or delay is a measure of the percentage of time during which that
activity occurs. The activity sampling technique was devised for the purpose of getting information
on the time spent by groups of workers or machines on various activities or delays. For this purpose
the sample can be very useful, and in many cases it has been found most valuable as a method of
reconnaissance prior to the use of more detailed work study techniques. Among the many
applications of activity sampling is the investigative work necessary for:
1. Improving the arrangement of duties and general organisation of work.
2. Indicating the directions in which improvements in methods and equipment should be sought, and
assessing the value of the proposed changes.
3. Assessing the value of introducing group incentive schemes.
4. Assessing labour requirements to machine utilisation.
5. Examining the causes of unsatisfactory performance/efficiency figures or machine utilisation
figures.
The activity sampling technique is conducted over a representative period of work by taking samples
of activity of the operators and machines to be included and then analysed using statistical
tolerance procedures. Certain types of work may be difficult to study using standard work
measurement techniques, for example warehouse work. A full production study would be time
consuming and expensive. This technique, built on statistical work by Tippett, allows ‘snap’
observations to be built into a picture of the whole. It is an ideal system for assessing machine
efficiency in a large department, and can easily demonstrate the average stoppage rate. The
technique is very similar to statistical quality control, where large numbers of products are
inspected to estimate the defect rate at a stated confidence level.
Obviously, the accuracy of activity sampling will depend on the number of observations. Few and
infrequent observations will provide a low level of accuracy, whilst many and frequent observations
will give highly accurate but more expensive information. It is, therefore, particularly important
that the observer knows the optimum number of observations necessary for a particular study.

This number can be calculated quite simply once an approximate picture of the situation is
established, using the following formula.
$$N = \frac{4P(100 - P)}{L^2}$$
Where, N = Number of observations; P = Approximate occurrence of factor as a percentage of N; L
= Acceptable accuracy in occurrence of factor being studied.
This formula will give the accuracy of the study within 95% confidence limits.
For example, a worker is studied using activity sampling, and 32 observations are noted. Of these
75% showed that the worker was performing useful work. If we assume that we would like to check
that the worker is performing at this level continuously, plus or minus 10%, i.e. between 65% and
85%, how many observations would we need to provide 95% confidence in the result?
Solution: Here, P = 75%; L = 10%
Hence,
$$N = \frac{4 \times 75 \times (100 - 75)}{10 \times 10} = \frac{300 \times 25}{100} = 75$$
However, after performing 75 checks, the value of P was found to be only 70% so the extra data
could be used to assess the new requirement for the number of checks.
$$N = \frac{4 \times 70 \times (100 - 70)}{10 \times 10} = \frac{280 \times 30}{100} = 84$$
Hence 9 more checks would be required, bringing the total to 84.
Once these checks had been completed, a final calculation should be done to ensure that the
number required had not changed.
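The calculation in this example can be reproduced with a short Python sketch. The function name `observations_needed` is an assumption made for illustration; the factor 4 in the formula corresponds to roughly two standard errors squared, which is where the 95% confidence limits come from.

```python
import math

def observations_needed(p_percent, accuracy_percent):
    """Number of observations N = 4P(100 - P) / L^2, rounded up.

    p_percent        -- approximate occurrence of the activity, in percent (P)
    accuracy_percent -- acceptable accuracy, in percentage points (L)
    """
    if not 0 < p_percent < 100:
        raise ValueError("P must lie strictly between 0 and 100")
    if accuracy_percent <= 0:
        raise ValueError("L must be positive")
    return math.ceil(4 * p_percent * (100 - p_percent) / accuracy_percent ** 2)

first_estimate = observations_needed(75, 10)    # initial picture: P = 75%
revised = observations_needed(70, 10)           # after the checks: P = 70%
extra_checks = revised - first_estimate

print(first_estimate, revised, extra_checks)    # 75 84 9
```

Because the required number depends on P, it is worth re-running the calculation whenever a fresh batch of observations changes the estimate.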
Activity sampling is normally used for collecting information on the percentages of time spent on activities, without
the need to devote the time that would otherwise be required for any continuous observation. One
of the great advantages of this technique is that it enables lengthy activities or groups of activities
to be studied economically and in a way that produces statistically accurate data. Activity sampling
can be carried out at random intervals or fixed intervals. Random activity sampling is where the
intervals between observations are selected at random e.g. from a table of random numbers. Fixed
interval activity sampling is where the same interval exists between observations. A decision will
need to be made on which of these two approaches is to be chosen. A fixed interval is usually chosen
where activities are performed by a person or group of people who have a degree of control over
what they do and when they do it. Random intervals will normally be used where there is a series of
automated tasks or activities as part of a process that have to be performed in a
pre-established regular pattern. If fixed interval sampling were to be used in this situation there is a
danger that the sampling point would continue to occur at the same point in the activity cycle.
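The danger of fixed intervals on a regular cycle can be illustrated with a small Python sketch. Everything here is hypothetical: a machine that runs for the first 6 minutes of every 10 minute cycle, an 8 hour shift measured in minutes, and the helper names `fixed_interval_times`, `random_times` and `is_running`.

```python
import random

def fixed_interval_times(start, end, interval):
    """Observation times (in minutes) spaced by the same interval."""
    return list(range(start, end, interval))

def random_times(start, end, count, seed=None):
    """Observation times drawn at random without repetition, in time order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(start, end), count))

def is_running(minute, cycle=10, run_minutes=6):
    """Hypothetical machine: runs for the first 6 minutes of every 10 minute cycle."""
    return minute % cycle < run_minutes

def share_running(times):
    """Fraction of observations at which the machine was found running."""
    return sum(is_running(t) for t in times) / len(times)

fixed = fixed_interval_times(0, 480, 10)      # one observation every 10 minutes
rand = random_times(0, 480, 48, seed=1)       # same number of observations, random

fixed_share = share_running(fixed)
random_share = share_running(rand)
print(fixed_share, random_share)
```

With the interval equal to the cycle length, every fixed observation lands on the same point in the cycle, so the machine appears to run all of the time. The random observations fall at different points in the cycle and give an estimate that scatters around the true 60%.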

9.4.12 MEMO MOTION STUDY


Memo motion or spaced-shot photography is a tool of time and motion study that analyzes long
operations by using a camera. It was developed in 1946 by Marvin E. Mundel at Purdue University, who
first used it to save film material while planning studies on kitchen work.
Mundel published the method in 1947 with several studies in his textbook ‘Systematic Motion and
Time Study’. A study showed the following advantages of Memo-Motion in regard to other forms of
time and motion study –
 Single operator repetition work.
 Area studies, the study of a group of men or machines.
 Team studies.
 Utilisation studies.
 Work measurement.
As a versatile tool of work study it was used in the US to some extent, but rarely in Europe and
other industrial countries mainly because of difficulties procuring the required cameras. Today
Memo-Motion could have a comeback because more and more workplaces have conditions which it can
explore.

9.4.13 PROCESS ANALYSIS


Process analysis is a step-by-step breakdown of the phases of a process, used to convey the inputs, outputs, and
operations that take place during each phase. A process analysis can be used to improve
understanding of how the process operates, and to determine potential targets for process
improvement through removing waste and increasing efficiency. Inputs may be materials, labor,
energy, and capital equipment. Outputs may be a physical product (possibly used as an input to
another process) or a service. Processes can have a significant impact on the performance of a
business, and process improvement can improve a firm’s competitiveness. The first step to improving
a process is to analyze it in order to understand the activities, their relationships, and the values of
relevant metrics. Process analysis generally involves the following tasks-
 Define the process boundaries that mark the entry points of the process inputs and the exit
points of the process outputs.
 Construct a process flow diagram that illustrates the various process activities and their
interrelationships.
 Determine the capacity of each step in the process. Calculate other measures of interest.
 Identify the bottleneck, that is, the step having the lowest capacity.
 Evaluate further limitations in order to quantify the impact of the bottleneck.
 Use the analysis to make operating decisions and to improve the process.
Process Analysis Tools
When you want to understand a work process or some part of a process, these tools can help -
 Flowchart: A picture of the separate steps of a process in sequential order, including materials
or services entering or leaving the process (inputs and outputs), decisions that must be made,
people who become involved, time involved at each step and/or process measurements.
 Failure Mode and Effects Analysis (FMEA): A step-by-step approach for identifying all possible
failures in a design, a manufacturing or assembly process, or a product or service; studying the
consequences, or effects, of those failures; and eliminating or reducing failures, starting with
the highest-priority ones.
 Mistake-proofing: The use of any automatic device or method that either makes it impossible
for an error to occur or makes the error immediately obvious once it has occurred.
 Spaghetti Diagram: A spaghetti diagram is a visual representation using a continuous flow line
tracing the path of an item or activity through a process. The continuous flow line enables
process teams to identify redundancies in the work flow and opportunities to expedite process
flow.
Process Flow Diagram
The process boundaries are defined by the entry and exit points of inputs and outputs of the
process. Once the boundaries are defined, the process flow diagram (or process flowchart) is a
valuable tool for understanding the process using graphic elements to represent tasks, flows, and
storage. The following is a flow diagram for a simple process having three sequential activities-

The symbols in a process flow diagram are defined as follows-


 Rectangles - represent tasks.
 Arrows - represent flows. Flows include the flow of material and the flow of information. The
flow of information may include production orders and instructions. The information flow may
take the form of a slip of paper that follows the material, or it may be routed separately,
possibly ahead of the material in order to ready the equipment. Material flow usually is
represented by a solid line and information flow by a dashed line.
 Inverted triangles - represent storage (inventory). Storage bins commonly are used to represent
raw material inventory, work in process inventory, and finished goods inventory.
 Circles - represent storage of information (not shown in the above diagram).

In a process flow diagram, tasks drawn one after the other in series are performed sequentially.
Tasks drawn in parallel are performed simultaneously. In the above diagram, raw material is held in a
storage bin at the beginning of the process. After the last task, the output also is stored in a
storage bin. When constructing a flow diagram, care should be taken to avoid pitfalls that might
cause the flow diagram not to represent reality. For example, if the diagram is constructed using
information obtained from employees, the employees may be reluctant to disclose rework loops and
other potentially embarrassing aspects of the process. Similarly, if there are illogical aspects of the
process flow, employees may tend to portray it as it should be and not as it is. Even if they portray
the process as they perceive it, their perception may differ from the actual process. For example,
they may leave out important activities that they deem to be insignificant.
Process Performance Measures
Operations managers are interested in process aspects such as cost, quality, flexibility, and speed.
Some of the process performance measures that communicate these aspects include-
 Process capacity - the capacity of the process is its maximum output rate, measured in units
produced per unit of time. The capacity of a series of tasks is determined by the lowest
capacity task in the string. The capacity of parallel strings of tasks is the sum of the capacities
of the two strings, except for cases in which the two strings have different outputs that are
combined. In such cases, the capacity of the two parallel strings of tasks is that of the lowest
capacity parallel string.
 Capacity utilization - the percentage of the process capacity that actually is being used.
 Throughput rate (also known as flow rate ) - the average rate at which units flow past a specific
point in the process. The maximum throughput rate is the process capacity.
 Flow time (also known as throughput time or lead time) - the average time that a unit requires to
flow through the process from the entry point to the exit point. The flow time is the length of
the longest path through the process. Flow time includes both processing time and any time the
unit spends between steps.
 Cycle time - the time between successive units as they are output from the process. Cycle time
for the process is equal to the inverse of the throughput rate. Cycle time can be thought of as
the time required for a task to repeat itself. Each series task in a process must have a cycle
time less than or equal to the cycle time for the process. Put another way, the cycle time of the
process is equal to the longest task cycle time. The process is said to be in balance if the cycle
times are equal for each activity in the process. Such balance rarely is achieved.
 Process time - the average time that a unit is worked on. Process time is flow time less idle time.
 Idle time - time when no activity is being performed, for example, when an activity is waiting for
work to arrive from the previous activity. The term can be used to describe both machine idle
time and worker idle time.
 Work in process - the amount of inventory in the process.
 Set-up time - the time required to prepare the equipment to perform an activity on a batch of
units. Set-up time usually does not depend strongly on the batch size and therefore can be
reduced on a per unit basis by increasing the batch size.
 Direct labor content - the amount of labor (in units of time) actually contained in the product.
Excludes idle time when workers are not working directly on the product. Also excludes time
spent maintaining machines, transporting materials, etc.
 Direct labor utilization - the fraction of labor capacity that actually is utilized as direct labor.
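Several of the measures above follow directly from the task capacities, and a short Python sketch can make the relationships concrete. The function names and the example figures (three tasks in series at 12, 8 and 10 units per hour) are assumptions chosen for illustration.

```python
def series_capacity(capacities):
    """Capacity of tasks in series: the lowest-capacity task sets the limit."""
    return min(capacities)

def parallel_capacity(capacities, same_output=True):
    """Capacity of parallel strings of tasks.

    If the strings make the same output, their capacities add up. If they make
    different outputs that are later combined, the lowest-capacity string limits.
    """
    return sum(capacities) if same_output else min(capacities)

def cycle_time(throughput_rate):
    """Time between successive units: the inverse of the throughput rate."""
    return 1 / throughput_rate

def capacity_utilization(throughput_rate, capacity):
    """Fraction of the process capacity that is actually being used."""
    return throughput_rate / capacity

def process_time(flow_time, idle_time):
    """Process time is flow time less idle time."""
    return flow_time - idle_time

capacity = series_capacity([12, 8, 10])             # units per hour
cycle_minutes = cycle_time(capacity) * 60           # minutes per unit at capacity
utilization = capacity_utilization(6, capacity)     # actual throughput of 6 per hour

print(capacity, cycle_minutes, utilization)         # 8 7.5 0.75
```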
Process Bottleneck
The process capacity is determined by the slowest series task in the process; that is, having the
slowest throughput rate or longest cycle time. This slowest task is known as the bottleneck.
Identification of the bottleneck is a critical aspect of process analysis since it not only determines
the process capacity, but also provides the opportunity to increase that capacity. Saving time in the
bottleneck activity saves time for the entire process. Saving time in a non-bottleneck activity does
not help the process since the throughput rate is limited by the bottleneck. It is only when the
bottleneck is eliminated that another activity will become the new bottleneck and present a new
opportunity to improve the process. If the next slowest task is much faster than the bottleneck,
then the bottleneck is having a major impact on the process capacity. If the next slowest task is
only slightly faster than the bottleneck, then increasing the throughput of the bottleneck will have
a limited impact on the process capacity.
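The reasoning about the bottleneck and the next slowest task can be sketched as follows. The task names, capacities and the helper names `find_bottleneck` and `capacity_headroom` are hypothetical and serve only to illustrate the argument.

```python
def find_bottleneck(task_capacities):
    """Return (task name, capacity) for the lowest-capacity task in a series."""
    name = min(task_capacities, key=task_capacities.get)
    return name, task_capacities[name]

def capacity_headroom(task_capacities):
    """Largest capacity gain obtainable by improving only the bottleneck.

    The gain is capped by the next slowest task, which becomes the new
    bottleneck once the current one has been improved past it.
    """
    ordered = sorted(task_capacities.values())
    return ordered[1] - ordered[0]

# Hypothetical series process, capacities in units per hour.
tasks = {"cut": 12, "weld": 8, "paint": 10}

bottleneck = find_bottleneck(tasks)
headroom = capacity_headroom(tasks)
print(bottleneck, headroom)            # ('weld', 8) 2

# Raising welding capacity to 15 shifts the bottleneck to painting.
improved = dict(tasks, weld=15)
new_bottleneck = find_bottleneck(improved)
print(new_bottleneck)                  # ('paint', 10)
```

In this example the process gains only 2 units per hour however much welding is improved, because painting then limits the process.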
Starvation and Blocking
Starvation occurs when a downstream activity is idle with no inputs to process because of upstream
delays. Blocking occurs when an activity becomes idle because the next downstream activity is not
ready to take its output. Both starvation and blocking can be reduced by adding buffers that hold inventory
between activities.
Process Improvement
Improvements in cost, quality, flexibility, and speed are commonly sought. The following lists some
of the ways that processes can be improved.
 Reduce work-in-process inventory - reduces lead time.
 Add additional resources to increase capacity of the bottleneck. For example, an additional
machine can be added in parallel to increase the capacity.
 Improve the efficiency of the bottleneck activity - increases process capacity.
 Move work away from bottleneck resources where possible - increases process capacity.
 Increase availability of bottleneck resources, for example, by adding an additional shift -
increases process capacity.
 Minimize non-value adding activities - decreases cost, reduces lead time. Non-value adding
activities include transport, rework, waiting, testing and inspecting, and support activities.
 Redesign the product for better manufacturability - can improve several or all process
performance measures.
 Flexibility can be improved by outsourcing certain activities. Flexibility also can be enhanced by
postponement, which shifts customizing activities to the end of the process.
In some cases, dramatic improvements can be made at minimal cost when the bottleneck activity is
severely limiting the process capacity. On the other hand, in well-optimized processes, significant
investment may be required to achieve a marginal operational improvement. Because of the large
investment, the operational gain may not generate a sufficient rate of return. A cost-benefit
analysis should be performed to determine if a process change is worth the investment. Ultimately,
net present value will determine whether a process ‘improvement’ really is an improvement.
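As a rough illustration of the net present value test, the sketch below discounts a stream of yearly cash flows. The investment, savings and 10% discount rate are invented figures, and `net_present_value` is a name chosen for this example.

```python
def net_present_value(rate, cash_flows):
    """Discount cash flows to today; cash_flows[0] occurs now, cash_flows[t] after t years."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Hypothetical process change: 100,000 invested now, 30,000 saved per year.
investment = -100_000
yearly_saving = 30_000

npv_4_years = net_present_value(0.10, [investment] + [yearly_saving] * 4)
npv_5_years = net_present_value(0.10, [investment] + [yearly_saving] * 5)

print(round(npv_4_years, 2), round(npv_5_years, 2))
```

Under these assumptions the change does not pay for itself over four years (negative net present value) but does over five, which shows how the verdict on an ‘improvement’ depends on the horizon and the discount rate.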

9.4.14 LINK ANALYSIS


Link analysis is a data analysis technique from network theory that is used to evaluate the
relationships or connections between network nodes. These relationships can be between various
types of objects (nodes), including people, organizations and even transactions. Link analysis is
essentially a kind of knowledge discovery that can be used to visualize data to allow for better
analysis, especially in the context of links, whether Web links or relationship links between people or
between different entities. Link analysis has been used for investigation of criminal activity (fraud
detection, counterterrorism, and intelligence), computer security analysis, search engine
optimization, market research and medical research.
Link analysis is literally about analyzing the links between objects, whether they are physical, digital
or relational. This requires diligent data gathering. For example, in the case of a website where all
of the links and backlinks that are present must be analyzed, a tool has to sift through all of the
HTML codes and various scripts in the page and then follow all the links it finds in order to
determine what sort of links are present and whether they are active or dead. This information can
be very important for search engine optimization, as it allows the analyst to determine whether the
search engine is actually able to find and index the website. In networking, link analysis may involve
determining the integrity of the connection between each network node by analyzing the data that
passes through the physical or virtual links. With the data, analysts can find bottlenecks and
possible fault areas and are able to patch them up more quickly or even help with network
optimization.
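The first step of the website case, collecting the links in a page, can be sketched with Python's standard library. The class name `LinkCollector`, the helper `extract_links` and the example URLs are assumptions for illustration. Checking whether each link is active or dead would require requesting it over the network, which this sketch leaves out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the address of the page.
                self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    """Return all link targets found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

page = '<p><a href="/about">About</a> <a href="https://example.org/x">X</a></p>'
links = extract_links(page, "https://example.com/")
print(links)   # ['https://example.com/about', 'https://example.org/x']
```

Repeating this for every page reached, and recording which page links to which, yields the node and link data on which the rest of the analysis operates.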
Link analysis has three primary purposes –
 Find matches for known patterns of interests between linked objects.
 Find anomalies by detecting violated known patterns.
 Find new patterns of interest (for example, in social networking and marketing and business
intelligence).

9.4.15 TIME AND MOTION STUDY


Time and motion study, or motion and time study, is a basic set of tools used by industrial engineers
to increase operational efficiency through work simplification and the setting of standards, usually
in combination with a wage-incentive system designed to increase worker motivation. Originally
developed to drive productivity improvement in manufacturing plants, motion and time study is also
now used in service industries. Motion and time study is associated with the so-called scientific
management movement of the late nineteenth and early twentieth centuries in the United States,
primarily with the work of industrial engineers Frederick Winslow Taylor (1856–1915), Frank B.
Gilbreth (1868–1924), and Lillian Gilbreth (1878–1972). Some time studies had been conducted
before Taylor, particularly by French engineer Jean Rodolphe Perronet (1708–1794) and English
economist Charles Babbage (1791–1871), both analyzing pin manufacturing. However, modern motion
and time study was developed as part of the scientific management movement championed by Taylor
and eventually became known as Taylorism.
The foundation of Taylorism is a system of task management in which responsibilities are clearly
divided between managers and workers. Managers and engineers engage in planning and task
optimization, primarily through motion and time study, while workers are responsible for carrying
out discrete tasks as directed. The Gilbreths sought to find the best method to perform an
operation and reduce fatigue by studying body motions, attempting to eliminate unnecessary ones
and simplify necessary ones to discover the optimal sequence of motions. The Gilbreths developed
the technique of micromotion study, in which motions are filmed and then watched in slow motion.
Taylor incorporated early research from the Gilbreths in his ‘The Principles of Scientific
Management’ (1911), and subsequent industrial engineers further developed the Taylorist system.
Taylorism played a key role in the continuous productivity improvement generated by the Fordist
model of work organization. The Fordist model, which is based on the supply-driven, mass production
of standardized goods using semiskilled workers, achieved efficiency improvements via scale
economies and detailed division of labor, both accomplished through the Taylorist separation of
conception from execution, in which managers plan tasks that workers execute. Taylor argued that
such a division of labor between management and workers was a form of ‘harmonious cooperation’
that ultimately removed antagonisms from the workplace and benefited both managers and workers.
However, this process of separating conception from execution is often understood as a form of de-
skilling, and Taylorism has been rejected by unions, who have denounced it as a form of speedup that
harms workers and hence quality and productivity.
Debates about the effect of motion and time study on workers continue today in discussions of
post-Fordism, particularly lean production, which employs motion and time study to set standards
and achieve continuous improvement in work processes, but in a context of demand-driven
production without large buffers of in-process inventory. Some workers and commentators argue
that motion and time study under lean production is simply a form of work intensification that is
detrimental to workers, while others argue that under lean production workers are able to
contribute to problem solving and standard setting and thus prefer motion and time study under lean
production to that under Fordism.
Underlying each system is a theory of worker motivation - that workers need to be coerced (in the
Fordist model) or that workers want to do their best and are interested in more intellectual activity
(in the post-Fordist model). In reality, there is more likely a distribution of different motivations
across workers, and worker well-being is likely to depend more on the interaction between individual
orientations toward work and how a given set of methods such as motion and time study are applied
in a particular work context. Because it is the method that determines the time needed for any
activity, the whole emphasis has changed over the years. The 21st century equivalent of the time
and motion study is more literally a method and time study. This is a more far-reaching philosophy
and approach to managing a business. When everyone is focused on better and leaner processes, the
methods improve, time is reduced and more value is added. This, with continuous improvement,
means activities become more streamlined and Lean. Lean means that anything wasteful (movement,
time, materials, space) is eliminated. When improvements and Lean initiatives are identified and
implemented, workers can often benefit from less stressful working conditions, less fatigue and
potentially better rewards, maybe in the form of different hours, increased pay and job
satisfaction. It can be a win-win situation.
Time and Motion Study Basics
In summary, it goes like this –
• Look closely at what you’re doing.
• Spot opportunities to be more efficient.
• Make a change to the way you work to achieve it.
• See if it produces the expected results.
• Rinse and repeat.
• Small changes, big benefits - Small savings quickly mount up. At the same time, we spend a lot of
time in our lives doing things that are not very useful.
• Pay attention - Pay attention to what you do and how you do it.
• Start by thinking, in broad terms, about how you spend your time over the course of a typical
working week.
• RescueTime, which tracks the applications and websites you use, may give you more objective
data about how you spend your time. Simply writing things down may be enough.
• Spot opportunities for improvement - You already have data about the amount of time spent
from your observations.
• Make a positive change.
• Evaluate results.

Productivity is often linked with ‘time and motion’. The evidence of time and motion studies was used
to put pressure on workers to perform faster. Not surprisingly these studies had a bad press as far
as workers were concerned. Productivity is about the effective and efficient use of all resources.
To manage the resources of a business it is essential that you –
• understand exactly what needs to be done to meet customer demand;
• establish a plan that clearly identifies the work to be carried out;
• define and implement the methodologies that need to be used to complete all activities and tasks
efficiently;
• establish how long it will actually take to complete each activity and task;
• determine what resources you need to meet the plan;
• provide the necessary resources and initiate the plan;
• constantly monitor what is actually happening against the plan; and
• identify variances and take the relevant actions to correct them or modify the plan.

9.4.16 EXPERIMENTAL METHOD


The prime method of inquiry in science is the experiment. The key features are control over
variables, careful measurement, and establishing cause and effect relationships. An experiment is an
investigation in which a hypothesis is scientifically tested. In an experiment, an independent variable
(the cause) is manipulated and the dependent variable (the effect) is measured; any extraneous
variables are controlled. An advantage is that experiments should be objective. The views and
opinions of the researcher should not affect the results of a study. This is good as it makes the
data more valid and less biased.
There are three types of experiments you need to know –
1. Laboratory / Controlled Experiments: This type of experiment is conducted in a well-controlled
environment – not necessarily a laboratory – and therefore accurate measurements are possible.
The researcher decides where the experiment will take place, at what time, with which
participants, in what circumstances and using a standardized procedure. Participants are
randomly allocated to each independent variable group.
Strength: It is easier to replicate (i.e. copy) a laboratory experiment. This is because a
standardized procedure is used. They allow for precise control of extraneous and independent
variables. This allows a cause and effect relationship to be established.
Limitation: The artificiality of the setting may produce unnatural behavior that does not reflect
real life, i.e. low ecological validity. This means it would not be possible to generalize the findings
to a real life setting. Demand characteristics or experimenter effects may bias the results and
become confounding variables.
2. Field Experiments: Field experiments are done in the everyday (i.e. real life) environment of the
participants. The experimenter still manipulates the independent variable, but in a real-life
setting (so cannot really control extraneous variables).
Strength: Behavior in a field experiment is more likely to reflect real life because of its natural
setting, i.e. higher ecological validity than a lab experiment. There is less likelihood of demand
characteristics affecting the results, as participants may not know they are being studied. This
occurs when the study is covert.
Limitation: There is less control over extraneous variables that might bias the results. This
makes it difficult for another researcher to replicate the study in exactly the same way.
3. Natural Experiments: Natural experiments are conducted in the everyday (i.e. real life)
environment of the participants, but here the experimenter has no control over the IV as it
occurs naturally in real life.
Strength: Behavior in a natural experiment is more likely to reflect real life because of its
natural setting, i.e. very high ecological validity. There is less likelihood of demand
characteristics affecting the results, as participants may not know they are being studied. Natural
experiments can be used in situations in which it would be ethically unacceptable to manipulate
the independent variable, e.g. researching stress.
Limitation: They may be more expensive and time consuming than lab experiments. There is no
control over extraneous variables that might bias the results. This makes it difficult for
another researcher to replicate the study in exactly the same way.

Experiment Terminology
• Ecological validity: The degree to which an investigation represents real-life experiences.
• Experimenter effects: These are the ways that the experimenter can accidentally influence the
participant through their appearance or behavior.
• Demand characteristics: The clues in an experiment that lead the participants to think they
know what the researcher is looking for (e.g. experimenter’s body language).
• Independent variable (IV): Variable the experimenter manipulates (i.e. changes) – assumed to
have a direct effect on the dependent variable.
• Dependent variable (DV): Variable the experimenter measures.
• Extraneous variables (EV): Variables that are not the independent variable but could affect
the results (DV) of the experiment. EVs should be controlled where possible.
• Confounding variables: Variable(s) that have affected the results (DV), apart from the IV. A
confounding variable could be an extraneous variable that has not been controlled.
Research Biases
We have a hypothesis, which is the first step in doing an experiment. Before we can continue, we
need to be aware of some aspects of research that can contaminate our results. In other words,
what could get in the way of our results in this study being accurate? These aspects are called
research biases, and there are three main biases we need to be concerned with.
• Selection Bias – occurs when differences between groups are present at the beginning of the
experiment.
• Placebo Effect – involves the influencing of performance due to the subject’s belief about the
results. In other words, if I believe the new medication will help me feel better, I may feel
better even if the new medication is only a sugar pill. This demonstrates the power of the mind
to change a person’s perceptions of reality.
• Experimenter Bias – in the same way that a person’s beliefs can influence his/her perception, so
can the beliefs of the experimenter. If I’m doing an experiment, and really believe my treatment
works, or I really want the treatment to work because it will mean big bucks for me, I might
behave in a manner that will influence the subject.
Controlling for Biases
After carefully reviewing our study and determining what might affect our results that is not part
of the experiment, we need to control for these biases. To control for selection bias, most
experiments use what’s called ‘Random Assignment’, which means assigning the subjects to each
group based on chance rather than human decision. To control for the placebo effect, subjects are
often not informed of the purpose of the experiment. This is called a ‘Blind’ study, because the
subjects are blind to the expected results. To control for experimenter biases, we can utilize a
‘Double-Blind’ study, which means that both the experimenter and the subjects are blind to the
purpose and anticipated results of the study. Once we have our hypothesis and know what our subject
pool is, the next thing we have to do is standardize the experiment. Standardization refers to a
specific set of instructions. The reason we want the experiment to be standardized is twofold.
First, we want to make sure all subjects are given the same instructions, presented with the
experiment in the same manner, and that all of the data is collected in exactly the same way for
all subjects. Second, single experiments cannot typically stand on their own. To really show that
our results are valid, experiments need to be replicated by other experimenters with different
subjects. To do this, the experimenters need to know exactly what we did so they can replicate it.
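
Random assignment, as described above, can be illustrated with a short Python sketch. The subject labels, the group names and the helper `randomly_assign` are hypothetical and are used only for illustration; the fixed seed is an assumption made so that the allocation can be reproduced by another experimenter.

```python
import random


def randomly_assign(subjects, group_names, seed=None):
    """Assign subjects to groups by chance, keeping group sizes as equal as possible."""
    rng = random.Random(seed)      # a seeded generator makes the allocation repeatable
    shuffled = list(subjects)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)          # chance, not the experimenter, decides the order
    groups = {name: [] for name in group_names}
    for index, subject in enumerate(shuffled):
        # Deal the shuffled subjects out to the groups in turn.
        groups[group_names[index % len(group_names)]].append(subject)
    return groups


# Hypothetical subject pool of 20 participants.
subjects = [f"S{n:02d}" for n in range(1, 21)]
groups = randomly_assign(subjects, ["treatment", "control"], seed=42)
for name, members in groups.items():
    print(name, members)
```

Because the order is decided by the shuffle, any pre-existing difference between subjects is equally likely to end up in either group, which is what controls for selection bias.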

9.4.17 STATISTICAL METHODS


Statistical methods are the methods of collecting, summarizing, analyzing, and interpreting
variable(s) in numerical data. Statistical methods can be contrasted with deterministic methods,
which are appropriate where observations are exactly reproducible or are assumed to be so. Data
collection involves deciding what to observe in order to obtain information relevant to the questions
whose answers are required, and then making the observations. Sampling involves choice of a
sufficient number of observations representing an appropriate population. Experiments with variable
outcomes should be conducted according to principles of experimental design. Data summarization is
the calculation of appropriate statistics and the display of such information in the form of tables,
graphs, or charts. Data may also be adjusted to make different samples more comparable, using
ratios, compensating factors, etc.
Statistical analysis relates observed statistical data to theoretical models, such as probability
distributions or models used in regression analysis. By estimating parameters in the proposed model
and testing hypotheses about rival models, one can assess the value of the information collected and
the extent to which the information can be applied to similar situations. Statistical prediction is the
application of the model thought to be most appropriate, using the estimated values of the
parameters. More recently, less formal methods of looking at data have been proposed, including
exploratory data analysis.
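
As a small illustration of data summarization, the sketch below computes a few common statistics with Python's standard `statistics` module. The list `scores` is made-up example data, and the choice of statistics shown is an assumption; which ones are appropriate depends on the data and the research question.

```python
import statistics

# Made-up example data: eight observations of some numerical variable.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),      # arithmetic average
    "median": statistics.median(scores),  # middle value of the sorted data
    "mode": statistics.mode(scores),      # most frequent value
    "pstdev": statistics.pstdev(scores),  # population standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For this data set the mean is 5, the median 4.5, the mode 4 and the population standard deviation 2.0. Such a summary would normally be presented in a table or chart alongside the raw counts.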

9.5 METHODS OF SECONDARY DATA COLLECTION


Secondary data are data that have already been collected from primary sources and that can be used
in the current research study. Collecting secondary data often takes considerably less time than
collecting primary data, where you would have to gather all the information from scratch. It is
thus possible to gather more data this way.
Secondary data can be obtained from two different research strands –
• Quantitative: Census, housing, social security as well as electoral statistics and other related
databases.
• Qualitative: Semi-structured and structured interviews, focus group transcripts, field notes,
observation records and other personal, research-related documents.
Secondary data is often readily available. With the expansion of electronic media and the internet,
access to secondary data has become much easier.
Published Printed Sources: There is a variety of published printed sources. Their credibility
depends on many factors, for example the writer, the publishing company, and the time and date of
publication. New sources are preferred and old sources should be avoided, as new technology and
research bring new facts to light.
Books: Books are available today on any topic that you want to research. The use of books starts
even before you have selected the topic. After the topic is selected, books provide insight into
how much work has already been done on the same topic, and you can prepare your literature review.
Books are a secondary source, but the most authentic of the secondary sources.
Journals/periodicals: Journals and periodicals are becoming more important as far as data collection
is concerned. The reason is that journals provide up-to-date information, which at times books
cannot, and secondly, journals can give information on the very specific topic you are
researching rather than talking about more general topics.
Magazines/Newspapers: Magazines are also effective but not very reliable. Newspapers, on the other
hand, are more reliable, and in some cases the information can only be obtained from newspapers, as
in the case of some political studies.

Published Electronic Sources: As the internet becomes more advanced, faster and more accessible to
the masses, much information that is not available in printed form is available on the
internet. In the past the credibility of the internet was questionable, but today it is not. The
reason is that in the past journals and books were seldom published on the internet, but today
almost every journal and book is available online. Some are free, and for others you have to pay.
e-journals: e-journals are more commonly available than printed journals. The latest journals are
difficult to retrieve without a subscription, but if your university has an e-library you can view
and print any journal it holds, and order those that are not available.
General Websites: Websites generally do not contain very reliable information, so their content
should be checked for reliability before quoting from them.
Weblogs: Weblogs are also becoming common. They are essentially diaries written by different people,
and they are as reliable to use as personal written diaries.
Unpublished Personal Records: Some unpublished data may also be useful in some cases.
Diaries: Diaries are personal records and are rarely available, but if you are conducting
descriptive research they might be very useful. Anne Frank’s diary is the most famous example of
this. That diary is a first-hand record of life in hiding under Nazi occupation.
Letters: Letters, like diaries, are also a rich source but should be checked for their reliability
before using them.
Government Records: Government records are very important for marketing, management,
humanities and social science research.
Census Data/population statistics: Health records; educational institutes’ records, etc.
Public Sector Records: NGOs’ survey data; other private companies’ records.

9.6 METHODS OF LEGAL RESEARCH


In pursuing research to disclose facts or to prove a hypothesis true or false, various kinds of
methods can be applied. The following research methods can be applied, collectively or
individually, as the main methods.
Observation: Information can be obtained by observing, visiting and viewing the place, society,
events or things pertinent to the study or research. Observation can be taken as a primary and
reliable source of information. If a researcher is careful, s/he can pick up points that may play a
significant role in his/her research or study. Observation is a method that is common in
legal and social science research. Observation should be guided by a specific research purpose, and
the information received from the observation should be recorded and checked for
reliability.
Questionnaire: In the questionnaire method, a researcher develops a form containing questions
pertinent to his/her study. Generally, the researcher prepares yes/no questions or short-answer
questions. The researcher distributes the forms to the people whom s/he
deems appropriate. The people to whom the questionnaires have been distributed answer
from what they know by filling out the form and returning it to the researcher.
Sampling: When the subject of research is vague or comprehensive, and when each indicator cannot be
covered because of financial constraints, time, complexity, etc., the researcher can randomly
collect data from a sample. This is called the sampling method. For instance, in
demographic research, a part of the population representing various groups can be taken into
consideration. That is why it is said that sampling is a method that saves time and money.
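
The demographic example above, in which part of the population represents various groups, can be sketched as a stratified random sample. This is one possible reading of the text, not the only way to sample; the population records, the `region` field and the helper `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(population, key, fraction, seed=None):
    """Randomly draw the same fraction from every group (stratum) of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[key(person)].append(person)  # group the records by stratum
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))  # at least one per group
        sample.extend(rng.sample(members, size))       # random draw without replacement
    return sample


# Hypothetical population: 60 urban and 40 rural residents.
population = (
    [{"id": i, "region": "urban"} for i in range(60)]
    + [{"id": i, "region": "rural"} for i in range(60, 100)]
)

sample = stratified_sample(population, key=lambda p: p["region"], fraction=0.1, seed=1)
urban_count = sum(1 for p in sample if p["region"] == "urban")
rural_count = sum(1 for p in sample if p["region"] == "rural")
print(len(sample), urban_count, rural_count)
```

Here only 10 of the 100 records are examined, yet both groups appear in the sample in the same proportion as in the population, which is how sampling saves time and money without ignoring any group.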

Interviews: A researcher can obtain the information s/he seeks by asking the people concerned
through an interview. It is a direct method of obtaining information. Interviews are generally held
by asking questions in face-to-face contact with the person or persons, and sometimes through
telephone conversation. This method is common in legal and social science research. In this method,
the researcher has to use less skill and knowledge to obtain the information s/he seeks.
Interviewing is known as an art of obtaining pertinent information. An interview can be taken as a
systematic method by which a person enters more or less imaginatively into the life of a stranger.
Case Study: The case study is taken as one of the important and reliable methods of legal research.
It can be defined as a method of research where the facts and grounds of each legal issue are
dealt with by taking an individual case. The case study is a method of exploring and analyzing the
life of a social unit such as a person, a family, an institution, a cultural group or even an entire
community. It is a way of organizing social data so as to preserve the unitary character of the
social object being studied. Keeping in view the matters referred to above, we can state that the
case study is a method of legal research used to explore and analyze the facts and data of a social
unit and to organize social data for prescription of useful character and society.

References
Kabir, S.M.S. (2016). Basic Guidelines for Research: An Introductory Approach for All
Disciplines. Book Zone Publication, ISBN: 978-984-33-9565-8, Chittagong-4203,
Bangladesh.
Kabir, S.M.S. (2017). Essentials of Counseling. Abosar Prokashana Sangstha, ISBN: 978-984-
8798-22-5, Banglabazar, Dhaka-1100.
Kabir, S.M.S., Mostafa, M.R., Chowdhury, A.H., & Salim, M.A.A. (2016). Bangladesher
Samajtattwa (Sociology of Bangladesh). Protik Publisher, ISBN: 978-984-8794-69-2,
Dhaka-1100.
Kabir, S.M.S. (2018). Psychological health challenges of the hill-tracts region for climate
change in Bangladesh. Asian Journal of Psychiatry, Elsevier, 34, 74–77.
Kabir, S.M.S., Aziz, M.A., & Jahan, A.K.M.S. (2018). Women Empowerment and Governance
in Bangladesh. ANTYAJAA: Indian journal of Women and Social Change, SAGE
Publications India Pvt. Ltd, 3(1), 1-12.
Alam, S.S. & Kabir, S.M.S. (2015). Classroom Management in Secondary Level: Bangladesh
Context. International Journal of Scientific and Research Publications, 5(8), 1-4, ISSN
2250-3153, www.ijsrp.org.
Alam, S.S., Kabir, S.M.S., & Aktar, R. (2015). General Observation, Cognition, Emotion,
Social, Communication, Sensory Deficiency of Autistic Children. Indian Journal of
Health and Wellbeing, 6(7), 663-666, ISSN-p-2229-5356,e-2321-3698.

Kabir, S.M.S. (2013). Positive Attitude Can Change Life. Journal of Chittagong University
Teachers’ Association, 7, 55-63.
Kabir, S.M.S. & Mahtab, N. (2013). Gender, Poverty and Governance Nexus: Challenges and
Strategies in Bangladesh. Empowerment a Journal of Women for Women, Vol. 20, 1-12.
Kabir, S.M.S. & Jahan, A.K.M.S. (2013). Household Decision Making Process of Rural Women
in Bangladesh. IOSR Journal of Humanities and Social Science (IOSR-JHSS), ISSN:
2279-0845, Vol. 10, Issue 6 (May - Jun. 2013), 69-78. ISSN (Online): 2279-0837.
Jahan, A.K.M.S., Mannan, S.M., & Kabir, S.M.S. (2013). Designing a Plan for Resource
Sharing among the Selected Special Libraries in Bangladesh, International Journal of
Library Science and Research (IJLSR), ISSN 2250-2351, Vol. 3, Issue 3, Aug 2013, 1-20,
ISSN: 2321-0079.
Kabir, S.M.S. & Jahan, I. (2009). Anxiety Level between Mothers of Premature Born Babies
and Those of Normal Born Babies. The Chittagong University Journal of Biological
Science, 4(1&2), 131-140.
Kabir, S.M.S., Amanullah, A.S.M., & Karim, S.F. (2008). Self-esteem and Life Satisfaction of
Public and Private Bank Managers. The Dhaka University Journal of Psychology, 32, 9-
20.
Kabir, S.M.S., Amanullah, A.S.M., Karim, S.F., & Shafiqul, I. (2008). Mental Health and Self-
esteem: Public Vs. Private University Students in Bangladesh. Journal of Business and
Technology, 3, 96-108.
Kabir, S.M.S., Shahid, S.F.B., & Karim, S.F. (2007). Personality between Housewives and
Working Women in Bangladesh. The Dhaka University Journal of Psychology, 31, 73-
84.
Kabir, S.M.S. & Karim, S.F. (2005). Influence of Type of Bank and Sex on Self-esteem, Life
Satisfaction and Job Satisfaction. The Dhaka University Journal of Psychology, 29, 41-
52.
Kabir, S.M.S. & Rashid, U.K. (2017). Interpersonal Values, Inferiority Complex, and
Psychological Well-Being of Teenage Students. Jagannath University Journal of Life and
Earth Sciences, 3(1&2), 127-135.
--------------------------


AN ASSESSMENT OF THE RELIABILITY OF
SECONDARY DATA IN MANAGEMENT SCIENCE
RESEARCH

OLABODE, Segun Oluwaseun1*; OLATEJU Olawale Ibrahim2; BAKARE Akeem Abayomi3

1. Department of Management Technology, Faculty of Management Sciences, Lagos State
University, Lagos, Nigeria.
2. Department of Management Technology, Faculty of Management Sciences, Lagos State
University, Lagos, Nigeria.
3. Department of Management Technology, Faculty of Management Sciences, Lagos State
University, Lagos, Nigeria.

* Corresponding Author: segun.olabode@lasu.edu.ng

ABSTRACT
Secondary data (SD) offer a major advantage: they draw on existing data sources that hold large
amounts of information, are relatively cheap and are easily available for research purposes.
Some researchers even argue that SD make available millions of person-years of experience in a
database, which would be impossible to collect in prospective studies. However,
unreliable data could impair the quality of research results and conclusions. The study's
critical examination of the literature has identified tools that can aid the assessment of SD
reliability. The study believes that the adjusted inter-rater/observer approach it proposes
will add value to the method of assessing the reliability of SD, because of its use of
statistical tools to directly estimate the available data. The study also believes that this will
serve as a base for other researchers to improve on the study of assessing the reliability of
secondary data.
KEYWORDS: Inter-raters/observer, Reliability, Secondary data, Validity

1.0 Introduction

The quality of data (primary or secondary) utilised in any research determines the
outcome of the research and its importance for further research work and relevance to business
or statistical institutes. Thus, the quality of the enormous data collected daily by relevant
organisations and/or individuals (e.g. government agencies, universities, private organisations,
non-profits, think tanks, public opinion polls, and students) in recent years should be of
importance to any system/institution, especially the academic environment. Most times, vast
amounts of primary data are collected and archived by relevant institutions or researchers at
points in time all over the world. This has made more prevalent the possibility of
utilizing existing data for research at a later point in time, i.e. the use of secondary data
(Andrews, Higgins, Andrews, & Lalor, 2012; Johnston, 2014; Smith, 2008).

Depending on the researcher's perception, the term “secondary data” (SD) refers to data or
information that was either gathered by someone else (researchers, recognized organisations
acceptable to a system, etc.) for records or for a purpose other than the one currently under
consideration, or often a combination of the two (Cnossen, 1997; McCaston, 2005), and is
thus sometimes referred to as “second-hand” data. For the first researcher they are primary data,
but for the second researcher they are secondary data (Peter & Piet, 2012). To Weijun (2008),
SD include both raw data and published summaries.

SD sometimes save the researcher the time that would have been spent in the field
collecting data and accessing the area under study. They can provide a relatively large database
of good quality that may not be feasible for any individual researcher to collect. However, some
researchers in business and management studies are wary of SD, especially of indicants used as
proxies for constructs, perhaps due to concerns over the possibility of the data being outdated,
inaccurate or having validity issues (Houston, 2004; Houston & Johnson, 2000; Schutt, 2006).

Obtaining SD today could be a relatively routine and easy process depending on the
environment (i.e. how often such environment updates its records and what kind of records are
available). SD may be quite expensive; however, the upfront costs, such as registration fees, have
dropped with the emergence of the World Wide Web and the increase in the number of
digitally published research sites (Routledge, 2004).

Depending on the environment in which the SD is collected and the purpose of
collection, SD can be beneficial, especially to the management sciences. For example, SD collected
from academic publications may already embody a high degree of the background work needed for
the present research in the literature reviewed. Its use in such publications could already have
promoted the data in the media and the management academic environment. Hence, its pre-established
degree of validity and reliability may not need to be re-examined by the researcher or
environment re-using such data. It could also, in some cases, be a baseline for comparison with
collected primary data results to determine the originality of the present data (Management
Study Guide, 2016).

However, SD has its own shortcomings, as identified earlier. The SD may be outdated,
inaccurate or have validity issues. It may not be relevant to the population under examination,
or detailed enough. For example, administrative data, transactional data or data from the
Internet, which were not originally collected for research, may not be available in the usual
‘research formats’ or may be difficult to get access to. This exposes researchers to possible
errors that can affect the quality (reliability and validity) of the data and invariably affect the
viability of the research.

This study therefore holds that a critical examination of the concept of, and the assessment
tools for, the reliability of secondary data is essential to aid management research.

1.1 Statement of the Problem

As identified earlier, SD may be advantageous, especially in terms of cost, because of the large database it can provide for management research innovation, productivity, and drawing conclusions in academic research. However, when SD is used to help draw important conclusions in academic research, failing to check the reliability of the data could lead to inaccurate analyses and inappropriate research findings and conclusions. This may be due to some of the following. First, with today's accessibility to data via the internet, anyone can publish anything from anywhere (Stewart, 2014). Secondly, some organisations fraudulently manipulate information to give investors and clients an impression that may not reflect their true state, while others do not post or give out the detailed information/data needed for comprehensive business and management research (especially details of their working capital and other financial variables) (Bankole, 2003; James & Oyeniyi, 2017; Shabnam, Zakiah, & Mohd, 2016; Vlad, Tulvinschi & Chirita, 2011). Thirdly, as identified by Babbie (2010), Cowton (1998) and Flintermann (2014), using SD carries the possibility that the data are inappropriate for the research, that the researcher has little or no control over how the data were generated and collated, that the data were modified by a researcher, and that documentation is poor, any of which could make the data neither valid nor reliable. Hence, SD used at face value, without checking for potential errors and bias before use (Flintermann, 2014) or determining its reliability, cannot be trusted for business and management research. This corroborates the view identified by Priezkalns (2016) that some researchers believe only primary sources of data can be trusted. This situation is one of several reasons why some researchers in the academic field of business and management avoid SD sources in their research.

Several researchers have tried to examine how to improve researchers' confidence in the use of SD by developing tools/methods for assessing SD, but progress has been held back by limited literature. Flintermann (2014) is of the opinion that the available literature has not been able to identify a suitable tool/method for assessing the reliability and validity of SD. Despite the literature identified by the few researchers in this area, there is a dearth of knowledge of how secondary information is correlated with primary data in business and management research, and of the solution to it. This corroborates the belief of Andrews et al. (2012), Johnston (2014) and Smith (2008) that there remains a dearth of literature that specifically addresses the process and challenges of conducting reliable secondary data analysis research.

Thus, taking a critical look at the above, this study provides an exploration of the tools available in management sciences for determining the reliability of SD.

1.2 Objectives of the Study

i) To examine the concept of secondary data, and the validity and 'reliability' of secondary data.

ii) To assess the available tools for determining the reliability of secondary data in management/business research.

iii) To identify which criteria can be used to assess the reliability of secondary data in management/business research.

2.0 Literature Review

The study examined various works from the literature to build the conceptual framework needed to address the first objective of the study.

2.1 Conceptual Framework

According to Johnston (2014), the concept of 'secondary data analysis' was first identified by Glaser in a discussion of re-analyzing data, i.e. data which were originally collected for other purposes. Weijun (2008) is of the opinion that SD include both raw data and published summaries. To Weijun, most organisations collect and store a variety of data to support their operations. These data are available only in the format chosen by the organisation that produced them, and access to them will most likely require negotiation. Researchers like Bankole (2003) and Oyeniyi, Obamiro, Abiodun, Moses, & Osibanjo (2016) believe that SD is existing information whose main source is primary sources. To Boslaugh (2007), the difference between SD and primary data depends on the relationship between the individual/research team who collected a dataset and the researcher who is analyzing it. Boslaugh's (2007) concept is an important one because the same data set could be primary data in one analysis and secondary data in another, depending on the time interval, purpose and environment. For example, three researchers A, B and C examined the relationship between two research variables. A used system D as a case study, while B used system E as a study area. While A and B collected data in the field, researcher C analysed the data collected by A and B to compare the relationship in the two environments within the same time frame. Since the data collected by A and B in the field (primary data) were for a different purpose, the same data given to C will be seen as secondary data.

Hakim (as cited in Johnston, 2014) held that secondary data analysis is any further analysis of an existing dataset which presents interpretations, conclusions or knowledge additional to, or different from, those presented in the first report on the inquiry as a whole and its main results. Irrespective of how researchers or professionals frame the definition or concept of SD, what differentiates it from primary data is the time interval between the original purpose of the data collection and the later purpose. This is in line with the view of Watson (2013), who sees SD as analytical works that comment on and interpret other works from primary sources and are thus "second hand" published accounts, because they are created after primary sources and they often use or talk about primary sources.

Data Reliability (DR) is a concept that every researcher, especially in business, management, the social sciences and the basic sciences, is aware of (Shuttleworth, 2009). To Shuttleworth (2009), it is a way of maximizing the inherent repeatability or consistency in collated data. To maintain reliability internally, a researcher will use as many repeat sample groups as possible, to reduce the chance of an abnormal sample group skewing the results. For example, if three replicate samples are used for each analysis and one generates completely different results from the others, then there may be something wrong with the data collated.

To Golafshani (2003), it is the extent to which sampled research results are consistent over time and the accuracy of representation of the total population under study. If the results of a study can be reproduced under a similar methodology, then the research instrument is considered to be reliable. This concept is not different from the works of most researchers, such as Bankole (2003), Oyeniyi et al. (2016), Phelan & Wren (2006) and Roberta & Alison (2015). They described reliability as the consistency between independent measurements of the same phenomenon, or the consistency of a measuring instrument in producing the same result repeatedly when applied to the same object. This concept shows that reliability is a measure of the level of consistency of the research instrument and not of the data, although the initial data generated at different periods with the instrument are used to assess the reliability of the instrument. This is in line with the concepts of researchers like Babbie (2010), Flintermann (2014), Pierce (2009) and Tasic & Feruh (2012). To these researchers, reliability is the degree to which a research instrument or process consistently yields the same results under the same conditions, regardless of how many times the process is repeated, or the degree to which a researcher can rely on the source of the data and therefore on the data itself. Thus, to Flintermann, researchers can improve the reliability of their research instrument through repeatability and by increasing its internal consistency. Citing the work of Golafshani (2003), he further notes that reliability can be estimated using the following tests, especially in quantitative research:

• Inter-rater/observer reliability: the degree to which different raters/observers give the same answers or estimates.

• Test-retest reliability: the consistency of a measure over time.

• Parallel-forms reliability: the reliability of two tests constructed the same way, from the same content.

• Internal consistency reliability: the consistency of results across items, often measured as Cronbach's alpha.
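As an illustration of the internal consistency test named in the list above, Cronbach's alpha can be computed from the item variances and the variance of respondents' total scores. The following is a minimal sketch only; the function name and the response data are hypothetical and do not come from any of the studies cited.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                                     # number of items
    items = list(zip(*scores))                             # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)   # sum of sample variances of the items
    total_var = variance([sum(row) for row in scores])     # sample variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: 5 respondents answering 3 Likert-type items.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)  # about 0.96 for this made-up data
```

Values closer to 1 indicate that the items vary together, which is the sense of "consistency of results across items" used above.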

In science, the definition is the same, but it needs to be much narrower and unequivocal. Shuttleworth (2009) held that in the sciences reliability is also extremely important externally. This is because, in science, another researcher should be able to perform exactly the same experiment, with similar equipment, under similar conditions, and achieve exactly the same results; otherwise the design is unreliable. An example is the cold fusion case of 1989, in which Fleischmann and Pons announced to the world that they had managed to generate heat at normal temperatures, instead of using the huge and expensive tori used in most research into nuclear fusion. These findings shook the world, but other researchers who attempted to replicate the experiment had no success. The conclusion is that the researchers either lied or genuinely made a mistake. Which of the two is the case is unclear, but their results were clearly unreliable.

Just as Shuttleworth (2009) identified the similarities in concept, this study summarizes the concept of data reliability, as stated by Adefioye (2016), as the consistency, stability and repeatability of results; i.e. a researcher's results are considered reliable if consistent results are obtained in identical situations but different circumstances. It can also be the overall consistency, accuracy and completeness of a measure of the repeatability of findings from processed data, given the uses they are intended for. In this context, reliability means that data are reasonably complete and accurate, meet the intended purposes, and are not subject to inappropriate alteration.

• Completeness refers to the extent that relevant records are present and the fields in each record are populated appropriately.

• Accuracy refers to the extent that recorded data reflect the actual underlying information.

• Consistency, a subcategory of accuracy, refers to the need to obtain and use data that are clear and well defined enough to yield similar results in similar analyses (Adefioye, 2016).

It should be noted that, while researchers like Adefioye (2016) used the construct 'accuracy' in terms of the actual underlying information within the system under study, others like Oyeniyi et al. (2016) used it in terms of the expected underlying information within the system under study. Thus, Oyeniyi et al. (2016) believed that accuracy cannot be used to conceptualize reliability. For example, a critical examination of the faulty wrist watch example given by Oyeniyi et al. (2016) shows that the time it reads is consistent with the information that it will always be ten minutes late, but not consistent with the expectation that a normal wrist watch will read the actual time. Hence, depending on the perception of the researcher, the concept of reliability might show slight variation in the use of constructs. But one unifying construct in the concept of reliability is 'consistency'.

Assessing secondary data reliability can entail reviewing existing information about the data, which may include interviewing officials of the audited organisation; performing simple analysis on a sample of the data, including advanced electronic analysis; tracing to and from source documents; and reviewing selected system controls (Shuttleworth, 2009). This corroborates Corillo (2014), who argues that an assessment of the reliability of data will involve an assessment of the method(s) used to collect the data. Corillo (2014) also argues that it will depend on the source of the data being assessed. For example, for a documentary source, it is unlikely that there will be a formal methodology describing how the data were collected. But in a report, attention is given to how the data were analysed and how the results are reported.

Flintermann (2014) argued that researchers improve the quality of their research if they can assess the reliability of the research instrument or process used to generate and collect data. But he stressed that this depends not only on the source of the data but also on whether the research is quantitative or qualitative. To Flintermann (2014), while researchers trust the reliability of research instruments or processes used in quantitative research, there are few or no accepted criteria for assessing the reliability of qualitative research, be it based on primary or secondary data.

According to Flintermann (2014):

"Without the certainty of numbers and p-values, qualitative research expresses a loss of confidence within and outside the field. Instead of explaining how reliability can be attained and estimated; leading qualitative researchers either suggested the adoption of new criteria or argued that reliability is an issue solely belonging to the quantitative research. As much as researchers and methodologists agree upon the definition and measurement of reliability in quantitative research the less agreement exists in qualitative research. From a quantitative point of view, reliability and its measurement is clearly defined. In qualitative research the answer to what reliability is and how to measure it is not as clear, as many discussions exist".

Thus researchers like Golafshani (as cited in Flintermann, 2014) are of the opinion that, though the concept of reliability is used for both qualitative and quantitative research, the most important test of a qualitative study is its quality, if researchers take the idea of testing as a way of retrieving information. But to other researchers like Stenbacka (as cited in Flintermann, 2014), the concept of reliability is not applicable or pertinent, and can even give the wrong impression, in qualitative research, as it is difficult to differentiate between the researcher and the method used. Stenbacka further stressed that the level of consistency required in quantitative research does not have any value in qualitative research, and concluded that, rather than discussing the reliability of qualitative research, it is better for researchers to make the whole process (preparation, data gathering, analysis) visible. Hence, Morse, Barrett, Mayan, Olson & Spiers (2002) identified new terms that can be introduced as parallel concepts of reliability in qualitative research. These terms are consistency, confirmability and dependability. To them, consistency can be achieved when the research process can be verified from the raw data collection, through data reduction, to the findings, while confirmability refers to the degree to which it is clear how researchers actually arrived at their research findings and interpretations, or the degree to which others can confirm the results (Flintermann, 2014; Koch, 2006).

Irrespective of the new terms used to define reliability, and whatever the type/source of research, the above examination of the concept of SD reliability shows the following:

• Reliability of research instruments/processes is of vital concern, as it is seen as a sign of generating quality data, research findings and conclusions.

• Researchers determine the reliability of a research instrument/process using initially generated data, in the belief that a reliable instrument will generate reliable data.

• While it is easier to assess the reliability of SD in quantitative research, the reverse is the case for qualitative research.

Also, Flintermann (2014) proposed that the reliability of a research instrument/process, especially in qualitative research, can be increased if the researcher provides insight into how findings and interpretations were reached, makes the research repeatable (if necessary) and describes changes in procedures.

It should be noted that, since researchers determine the reliability of the research instrument/process and not of the data (Wayne, 2014), the situation differs between the two types of data. It is easier to assess the reliability of basic primary data collection instruments such as questionnaires, interviews, observation and reading (Annum, 2017) because an initial run of data is available. But assessing the existing documents (in the form of government publications, earlier research, personal records and clients' records; Vivek, 2011) or non-documents (in the form of tape and video recordings, pictures, drawings, films and television programmes, DVD/CD; Weijun, 2008) from which secondary data are gathered may not be easily done with direct statistical tools. These may instead be assessed using a step-by-step method of analysis.

2.2 Theoretical Framework

The study examined various works from the literature to build the theoretical framework needed to address the second and third objectives of the study. In this regard, the common errors influencing reliability, two theories (the Delphi and triangulation theories) used to improve an existing model, and some models developed to assess the reliability of secondary data are examined. The study then concludes with a list of criteria for evaluating the reliability of SD in research, as identified in the literature, and the study's proposed model.

Also, as identified earlier, Flintermann (2014) is of the opinion that the available literature has not been able to identify a suitable tool/method for assessing the reliability and validity of SD. This also limits the availability of theory for the study. This study therefore used conceptual derivation and the importation of theories from other fields to explain the existing models assessed in the study.

3.0 Methods of Estimating Research Instrument Reliability

As identified earlier in the study, the inter-rater/observer, test-retest, parallel-forms and internal consistency methods are the basic tools for estimating the reliability of research instruments. Leading researchers like Adefioye (2016), Bankole (2003) and Oyeniyi et al. (2016) believed these tools are basically used to estimate the reliability of primary data research instruments. But a careful examination of the inter-rater/observer method in this study shows that it can be used to estimate the reliability of data directly, and not just of the research instrument. This makes it a possible method of assessing the reliability of SD, since it does not necessarily require an initial run of data.

According to most researchers, the inter-rater/observer method involves the use of human experts as part of the measurement procedure, to assess the consistency, and invariably the reliability, of data. For example, a researcher who required and collected the working capital of an organisation(s) as data for a study can assess the reliability of the data by estimating the consistency in the responses of two expert observers regarding the possible degree of error and bias.

William (2006) identified two methods of estimating inter-rater reliability. First, if the measurement consists of categories, the raters check off which category each observation falls in, and the researcher calculates the percentage of agreement between the raters. To William (2006), some researchers may see this as a crude measure, but it does give an idea of how much agreement exists, and it works no matter how many categories are used for each observation. Second, if the measurement is a continuous one, the researcher calculates the correlation between the ratings of the two observers. The correlation estimate will determine the reliability or consistency in the responses of the raters, and invariably that of the data.
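The two estimates described above can be sketched as follows. This is an illustrative example only: the function names and the ratings are hypothetical, and the Pearson coefficient is assumed as the correlation measure, since the text does not name a specific one.

```python
from math import sqrt

def percent_agreement(rater_a, rater_b):
    """Percentage of observations that both raters placed in the same category."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of continuous ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical categorical ratings of the risk of error in five data items.
expert_1_cat = ["low", "low", "high", "medium", "low"]
expert_2_cat = ["low", "medium", "high", "medium", "low"]
agreement = percent_agreement(expert_1_cat, expert_2_cat)  # 4 of 5 match: 80.0

# Hypothetical continuous ratings (0 to 10) of the same five items.
expert_1_num = [2.0, 3.5, 8.0, 5.0, 1.5]
expert_2_num = [2.5, 3.0, 7.5, 5.5, 2.0]
r = pearson_r(expert_1_num, expert_2_num)  # close to 1 when the experts agree
```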

In order to improve this method, two adjustments are proposed to modify the inter-rater/observer method for assessing the reliability of secondary data. The modifications could be based on the Delphi theory and the triangulation theory. This is because these theories are similar to the existing inter-rater/observer method but allow the use of more than two experts. This will involve using the coefficient of multiple correlation as a statistical tool to evaluate the degree of consistency of the experts.


3.1 The Two Adjustments Proposed

As identified earlier, instead of restricting the method to two expert observers, more than two observers could be used. This will involve calculating the coefficient of multiple correlation between the ratings of the observers. The higher the result, the more reliable the data. It can also be done using the triangulation theory. The theory involves the use of multiple independent sources of data to establish the truth and accuracy of a claim. Hence, it can be used to assess the validity of data instruments and data.
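The study does not spell out how the coefficient of multiple correlation is to be computed across raters. One possible reading, offered here only as an assumption, is to treat one expert's ratings as the dependent variable and the other experts' ratings as predictors. For three experts the coefficient then has a closed form in terms of the pairwise Pearson correlations. The function names and ratings below are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def multiple_correlation(target, pred_1, pred_2):
    """Coefficient of multiple correlation R of `target` on two predictors.

    Assumption: one expert is the target, the other two are predictors.
    Undefined when the two predictors are perfectly correlated (r_12 = +/-1).
    """
    r_t1 = pearson_r(target, pred_1)
    r_t2 = pearson_r(target, pred_2)
    r_12 = pearson_r(pred_1, pred_2)
    r_squared = (r_t1 ** 2 + r_t2 ** 2 - 2 * r_t1 * r_t2 * r_12) / (1 - r_12 ** 2)
    return sqrt(min(max(r_squared, 0.0), 1.0))  # clamp away floating-point overshoot

# Hypothetical ratings of five data items by three experts.
expert_a = [2, 4, 6, 8, 5]
expert_b = [3, 4, 7, 8, 5]
expert_c = [2, 5, 5, 9, 4]
big_r = multiple_correlation(expert_a, expert_b, expert_c)
```

Under this reading, R is never smaller than the absolute correlation between the target expert and any single other expert, so adding experts cannot lower the estimate; a researcher may wish to bear this in mind when interpreting "the higher the result, the more reliable the data".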

This can invariably establish the reliability of data, as researchers like Oyeniyi et al. (2016) and Wayne (2014) have identified that, though reliable data are not necessarily valid data, valid data are reliable data. Hence, assessing the validity of SD means assessing its reliability.

3.2 The Triangulation Theory

To Sagor (2000), the triangulation theory is similar to how legal practitioners or researchers (defense lawyers and prosecutors) convince a jury of the essential truth and accuracy (validity and reliability) of their cases. This is done through the twin processes of corroboration and impeachment. To convince a jury to believe their witnesses, lawyers bring in another independent witness. As an additional witness corroborates the first witness, the confidence the jurors have in the initial testimony increases. The more independent testimony from witnesses that supports the initial witness before a jury, the more the jurors will trust the truthfulness and accuracy of the claims. Conversely, if lawyers want the jury to doubt the truth and accuracy (validity and reliability) of the other side, they try to impeach (challenge the credibility of) the testimony of as many of the other side's witnesses as possible. Thus, if as many experts as possible can pass a consistent judgment on a set of data rated/observed, then its validity and reliability can be established.

3.3 The Delphi Theory


The Delphi theory is a structured communication technique initially developed as a systematic, interactive forecasting method. Using it, experts assess the data in two or more rounds. After each round, a facilitator provides an anonymized summary of the experts' ratings from the previous round, as well as the reasons for their judgments. Experts are thus encouraged to review their ratings/observations in view of the responses of the other experts. The process is deemed optimal once an acceptable level of consistency in their responses is reached. The coefficient of multiple correlation is then used to assess the reliability of the final ratings/observations of the experts. The higher the result, the more reliable the data.
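The round structure described above can be sketched as follows. This is an illustration under stated assumptions: the study does not define the "acceptable level of consistency", so the sketch assumes a hypothetical threshold on the spread (population standard deviation) of the experts' ratings for each item. The function names, the threshold and the ratings are all invented for the example.

```python
from statistics import mean, pstdev

def round_summary(ratings):
    """Anonymized feedback for one round: (mean, spread) of the ratings per item.

    `ratings` maps an expert identifier to that expert's list of ratings,
    one rating per data item; identifiers are not included in the summary.
    """
    per_item = zip(*ratings.values())
    return [(mean(item), pstdev(item)) for item in per_item]

def is_consistent(ratings, max_spread):
    """True when the spread of ratings is within the threshold for every item."""
    return all(spread <= max_spread for _, spread in round_summary(ratings))

def first_consistent_round(rounds, max_spread=0.5):
    """Number of the first round whose ratings are acceptably consistent, else None."""
    for number, ratings in enumerate(rounds, start=1):
        if is_consistent(ratings, max_spread):
            return number
    return None

# Hypothetical ratings of three data items by three experts over three rounds.
rounds = [
    {"E1": [2, 7, 5], "E2": [5, 4, 6], "E3": [3, 8, 2]},
    {"E1": [3, 7, 5], "E2": [4, 5, 5], "E3": [3, 7, 3]},
    {"E1": [3, 7, 5], "E2": [3, 7, 5], "E3": [3, 7, 4]},
]
stop_round = first_consistent_round(rounds)  # ratings converge in round 3
```

The ratings from the stopping round would then be the "final rating/observation" to which the reliability estimate is applied.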

The above tool/method of assessing SD reliability is slightly different from the tools/methods developed by most researchers, because it makes direct use of statistical tools/methods in assessing the reliability of SD. Most researchers instead use indirect, qualitative, step-by-step methods as tools for assessing the reliability of SD. Some of these tools are identified in the studies examined in the subsequent paragraphs of this study.

FAO, in its evaluation of the quality of both the source of SD and the data itself, categorized the problems that may reduce quality, as shown in table 1 below. The organisation is of the opinion that if SD is analysed for each category, then the quality of the data can be improved. FAO also presented a flow chart, shown in figure 1, depicting the decision path that should be followed when using secondary data. The flowchart has two phases. The first phase relates to the relevance of the SD to the research objectives. The second phase is concerned with questions about the accuracy of SD.

Table 1: Categories of Problems that may Reduce Quality

Definitions: Researchers have to be careful of variable definitions when making use of secondary data. For example, consider researchers with an interest in rural communities and their average family size. If published statistics are consulted, then a check must be done on how terms such as "family size" have been defined. They may refer only to the nuclear family or include the extended family. It should be noted that definitions may change over time, and where this is not recognized erroneous conclusions may be drawn. Geographical areas may have their boundaries redefined, units of measurement and grades may change, and imported goods can be reclassified from time to time for purposes of levying customs and excise duties.

Measurement error: When a researcher conducts fieldwork she/he estimates inaccuracies in measurement through the standard deviation and standard error, which are sometimes not published. Researchers may need to speak to the individuals involved in the collection of the data to obtain some guidance on the level of accuracy of the data. The problem is sometimes not so much "error" but differences in the levels of accuracy required by decision makers. When the research has to do with large investments in, say, food manufacturing, management will want to set very tight margins of error in making market demand estimates. In other cases, having a high level of accuracy is not so critical. For instance, if a food manufacturer is merely assessing the prospects for one more flavor for a snack food already produced by the company, then there is no need for highly accurate estimates in order to make the investment decision.

Source bias: Researchers have to be aware of vested interests when they consult secondary sources. Those responsible for their compilation may have reasons for wishing to present a more optimistic or pessimistic set of results for their organisations. For example, officials responsible for estimating food shortages may exaggerate figures before sending aid requests to potential donors. Similarly, and with equal frequency, commercial organisations have been known to inflate estimates of their market shares.

Reliability: The reliability of published statistics may vary over time. It is not uncommon, for example, for the systems of collecting data to have changed over time but without any indication of this to the reader of published statistics. Geographical or administrative boundaries may be changed by government, or the basis for stratifying a sample may have altered. Other aspects of research methodology that affect the reliability of secondary data are the sample size, response rate, questionnaire design and modes of analysis.

Time scale: Most censuses take place at 10-year intervals, so data from this and other published sources may be out of date at the time the researcher wants to make use of the statistics. The time period during which secondary data was first compiled may have a substantial effect upon the nature of the data. For instance, the significant increase in the price obtained for Uganda coffee in the mid-90s could be interpreted as evidence of the effectiveness of the rehabilitation programme that set out to restore coffee estates which had fallen into a state of disrepair. However, more knowledgeable coffee market experts would interpret the rise in Uganda coffee prices in the context of the large-scale destruction of the Brazilian coffee crop due to heavy frost in 1994, Brazil being the largest coffee producer in the world.

Sources of data: Whenever possible, researchers ought to use multiple sources of secondary data. In this way, these different sources can be cross-checked as confirmation of one another. Where differences occur, an explanation for these must be found or the data should be set aside.

Source: Adapted from FAO Corporate Document Repository: Agriculture and Consumer Protection (n.d.)
Figure 1: Evaluating secondary data

1. Does the data help address the specified research question? No: stop. Yes: continue.
2. Does the data apply to the population of interest? No: stop. Yes: continue.
3. Does the data apply to the time of interest? No: stop. Yes: continue.
4. Are the definitions, data collection methods and systems of measurement known and acceptable? No: can the data be revised? If not, stop. Yes: continue.
5. Consult the original data if possible.
6. Does the value of the information exceed the cost of its acquisition? No: stop. Yes: continue.
7. Is the risk of bias high? Yes: can the data be verified? Yes: use data.

Source: Adapted from FAO Corporate Document Repository: Agriculture and Consumer Protection (n.d.)
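The decision path of figure 1 can also be written as a short routine. The sketch below is illustrative and rests on assumptions, because the figure leaves some branches unlabelled: it assumes that revisable data continue down the path, that a low risk of bias leads directly to use, and that data with a high risk of bias that cannot be verified are rejected. The function name and the question keys are hypothetical.

```python
def evaluate_secondary_data(answers):
    """Walk the figure 1 decision path; `answers` maps question keys to booleans."""
    # Phase 1: relevance of the SD to the research objectives.
    for key in ("addresses_question", "fits_population", "fits_time_period"):
        if not answers[key]:
            return "stop"

    # Phase 2: accuracy of the SD.
    if not answers["definitions_acceptable"]:
        # Assumption: data that can be revised continue down the path.
        if not answers.get("can_be_revised", False):
            return "stop"
    if not answers["value_exceeds_cost"]:
        return "stop"
    if answers["bias_risk_high"]:
        # Assumption: high-risk data that cannot be verified are rejected.
        if not answers.get("can_be_verified", False):
            return "stop"
    return "use data"

# Hypothetical assessment of one secondary data set.
assessment = {
    "addresses_question": True,
    "fits_population": True,
    "fits_time_period": True,
    "definitions_acceptable": True,
    "value_exceeds_cost": True,
    "bias_risk_high": True,
    "can_be_verified": True,
}
decision = evaluate_secondary_data(assessment)  # "use data"
```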

A critical examination of the above flow chart shows some simple deficiencies. One, it does not state what comes next if the data can be revised. Two, the researcher has to determine his or her own tool(s) for analyzing the risk of bias. Three, it does not examine the possibility of error in the measurement of variables, etc. However, as identified in the last category in the FAO study, researchers are advised to use multiple sources of secondary data. Similarly, using one or more other tools of analysis together with the flow chart could improve the assessment of the reliability of SD. For example, table 2 shows Flintermann's (2014) summary of errors and issues that may have an impact on the use of SD, divided into different categories as identified by Tasic & Feruh (2012). Combining table 2 as an assessment tool with the flow chart can help researchers overcome some of the deficiencies in the flow chart and improve the quality of research results that use SD.

Table 2: Errors and Issues in Secondary Data

Errors and issues in secondary data: Caused by

Errors that can invalidate data
  Manipulation: The organization gathering data may manipulate or reorganize data to meet a
    purpose unknown to others. The collecting agency may want to show that the organization's
    goal is met.
  Inappropriateness, confusion or carelessness:
    a) The organization might collect, organize and distribute data without properly specifying
       the particulars of the collection process or assembly procedures.
    b) The organization may not care about data quality or validity.
    c) The organization's staff may not know how to collect data.
  Concept error: Concept errors arise because of the difference between the concept to be
    measured and the specific item that is used to measure it. Data containing errors can still
    be used, but only if something is known about the nature of the error.

Errors requiring data reformulation
  Changing circumstances: Changes affecting a data series that are not readily apparent in that
    data series, e.g. a change in geographical boundaries or a change in the underlying unit of
    measurement.
  Inappropriate transformations: Original data in secondary data sources are often presented in
    categories or tables that make the data more presentable, or the original categories do not
    reflect an analyst's needs for the task at hand.
  Inappropriate temporal extrapolations: Secondary data are often not available for intervening
    periods between published reports. Data for these periods need to be interpolated from the
    nearest two reporting years. Without knowing the true change between these two points, any
    answer can be obtained for the point in time in question.
  Inappropriate temporal recognition: Arises from a misunderstanding of the time dimension of
    secondary data. There is always a time lag between the gathering of primary data and the
    time when they are made available.

Errors reducing reliability
  Correct(ed) data: Data can be inconsistent from one report to another in the same published
    series because of errors that have been discovered, corrected and then reflected in a
    subsequent version of the data set. Alternatively, the publisher of secondary data may
    adjust forecasts for a decennial year against actual census numbers.
  Changes in collection procedures: Occur due to different methods or circumstances surrounding
    the collection, e.g. time of collection or way of summarizing data. Generated data can be
    quite different from previous data in the same data set.
  Clerical errors: Occur because of the transposition of numbers in a series with the same
    number of digits or the misplacing of a decimal. Outliers can be easily detected by creating
    diagrams or tables.

Source: Adapted from Flintermann (2014)
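
The point about temporal extrapolation in Table 2 can be made concrete with a minimal Python sketch. It assumes straight-line change between the two nearest reporting years, which is only one of many possible assumptions; the function name and the figures are hypothetical and do not come from the cited sources.

```python
def interpolate_linear(year, year_a, value_a, year_b, value_b):
    """Estimate the value for `year`, assuming straight-line change
    between two reporting years (an assumption, not a known fact)."""
    if not year_a < year_b:
        raise ValueError("year_a must be earlier than year_b")
    if not year_a <= year <= year_b:
        raise ValueError("year must lie between the two reporting years")
    fraction = (year - year_a) / (year_b - year_a)
    return value_a + fraction * (value_b - value_a)


# Hypothetical published figures for the reporting years 2010 and 2015.
estimate = interpolate_linear(2013, 2010, 1000.0, 2015, 1500.0)
print(estimate)
```

The estimate depends entirely on the assumed path between the two reporting years. A different assumption, such as constant percentage growth, would give a different figure from the same two data points.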

Flintermann (2014) is of the opinion that the available literature has not identified a
suitable tool or method for assessing the reliability and validity of SD. Based on a
theoretical framework, Flintermann (2014) developed a set of criteria in five categories, shown
in Table 3, for assessing the reliability of SD. The table also shows indicators of reliability
or validity and the level of reliability or validity if these indicators are found.

Table 3: Criteria for assessing reliability/validity in a market research report

Category
  Indicator for reliability and validity | Level of reliability | Level of validity

Clear specification of data collection and data analysis
  Detailed description of which type of research was used | High | High
  Definition of research variable | High | High
  Sources used stated | High | High
  Information on how missing data were dealt with | High | High
  Date of collection available | High | High
  Information on how the quality of the data used is controlled | High | High
  Transforming data (from raw data to result) | High | High
  Information about method used, e.g. statistical tests | High | High
  Coding of data: whether and how | High | High
  Clear organisation of data | High | High
  Contact data presented | High | High

Clear specification about potential changes in procedure
  Information about changes in methods used from one study to another | High | High
  Information about changes in sources used from one study to another | High | High
  Information about changes in definitions used from one study to another | High | High

Updates
  Due to error correction (information given) | High | High
  Due to new version of report | High | High

Result of comparing data collected outside the research concepts with the actual research concepts data
  Datasets similar to each other | High | High

Missing research variables report
  Data for missing variables could be found using sources other than the research | Low | Low
  Data for missing variables could partly be found using sources other than the research | Medium | Medium
  Data for missing variables could not be found using sources other than the research | High | High

Source: Adapted from Flintermann (2014)
3.4 Clear specification of data collection and data analysis

This category refers to how the processes of data collection and analysis are described.
Government-recognized agencies, private organisations, or researchers involved in the
collection, organisation and distribution of data may poorly specify the particulars of the
collection process, the data procedures (methodology), insight into the whole process
(gathering, collection, analysis) and the sources used. Specifying these indicators makes it
easier to replicate the study using the same procedures. In qualitative research, consistency
and confirmability can be achieved by stating the whole process of data preparation, gathering
and analysis. If a researcher presents the whole research process, reliability is considered to
be high (Flintermann, 2014).

3.5 Clear specification about potential changes in procedures

This category deals with possible changes in procedures. Reliability is at an acceptably high
level if information on changes in methods, sources or definitions can be found and judgements
can be made on how these changes affect the data.

Updates

This category comprises updates made because of error correction, new methods of data
presentation, or a new version of a research report. Data can even be inconsistent from one
report or presentation to another in the same published series because errors are discovered
and corrected in subsequent versions. For example, different researchers using data collected
from the same organisation may show obvious and unacceptable variation in input data.
Corrections may also become necessary because inappropriate methods or sources were used, or
because data processing methods were applied incorrectly.

3.6 Comparing data collected outside of research scope to research scope

Theoretically, sources outside the research scope should provide data that are similar or equal
to the research scope data, since a reliable instrument is expected to yield the same results
regardless of the number of repetitions. Hence, a search for data about the automotive industry
using the same definitions, years and units as in the research scope should provide similar
findings and trends. If this is the case, the reliability of the data is strengthened.
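
One simple way to carry out such a comparison is sketched below in Python. It assumes both sources report yearly values in the same units and flags years where the outside source deviates from the research scope value by more than a relative tolerance. The function name, the 5% tolerance and all figures are hypothetical choices for illustration, not part of Flintermann's criteria.

```python
def compare_series(scope, outside, tolerance=0.05):
    """Compare two {year: value} series on their common years and return
    (common_years, flagged_years), where flagged years differ by more than
    `tolerance` relative to the research scope value."""
    common_years = sorted(set(scope) & set(outside))
    flagged = []
    for year in common_years:
        base = scope[year]
        if base == 0:
            differs = outside[year] != 0
        else:
            differs = abs(outside[year] - base) / abs(base) > tolerance
        if differs:
            flagged.append(year)
    return common_years, flagged


# Hypothetical yearly figures from the research scope and an outside source.
scope_data = {2012: 100.0, 2013: 110.0, 2014: 120.0}
outside_data = {2012: 102.0, 2013: 125.0, 2014: 119.0, 2015: 130.0}
print(compare_series(scope_data, outside_data))
```

Years that appear in only one source are ignored here. Whether a flagged year points to an error or to a difference in definitions still has to be judged by the researcher.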

Missing Research Reports

Missing data may affect the reliability of researchers' results if data collection, processing
or storage is not properly coordinated, or if some data are simply never obtained. If the final
data for a piece of research rest on missing data, chances are high that decisions or
conclusions are based on biased data. Missing data can therefore have a significant effect on
the conclusions drawn. If information about missing data, processes, sources and methods used
is clearly documented, so that a replication can identify whether data are missing, reliability
is increased.

As identified earlier, other researchers, such as Koziol & Arthur (2012), Stewart (2014) and
Weijun (2008), also use indirect, qualitative, step-by-step methods as tools for evaluating the
quality of SD. Most of the steps identified in their work have been included in the literature
reviewed earlier in the study.

4.0 Concluding Remarks

SD provides a major advantage: it uses existing data sources with large amounts of
information, at relatively low cost, that are easily available for research purposes. Henrik &
Jorn (1996) even argue that the experience of millions of persons recorded in databases becomes
available through SD, which would be impossible to collect in prospective studies. But
unreliable data could impair the quality of research results and conclusions. The study's
critical examination of the literature has identified tools that can aid the assessment of SD
reliability.

Of the tools identified, the study believes that the adjusted inter-rater/observer method
proposed by the study will add value to the assessment of the reliability of SD, because it
uses statistical tools to estimate the available data directly.
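
As a point of reference for what a statistical inter-rater estimate looks like, the Python sketch below computes Cohen's kappa for two raters. This is the standard, unadjusted statistic and not the adjusted method the study proposes; the function name and the ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    # Share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: both raters pick the same category independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical reliability ratings of six data sources by two raters.
rater_1 = ["high", "high", "low", "low", "high", "low"]
rater_2 = ["high", "low", "low", "low", "high", "high"]
print(round(cohen_kappa(rater_1, rater_2), 3))
```

A value of 1 means complete agreement and a value near 0 means agreement no better than chance.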

The study also believes that this will serve as a base for other researchers to improve on the
assessment of the reliability of secondary data.

References
Adefioye, T. (2016). Reliability and validity. Retrieved March 21, 2017, from
https://www.lbs.edu.ng/sites/faculty_research/crle/Downloads/Notes%20on%20Reliability%20and%20Validity%20by%20Temilade%20Adefioye.pdf

Andrews, L., Higgins, A., Andrews, M. W., & Lalor, J. G. (2012). Classical grounded theory to analyse
secondary data: Reality and reflection. The Grounded Theory Review, 11(1), 12-26.

Annun, G. (2017). Research instrument. Retrieved May 2, 2017, from
http://campus.educadium.com/newmediart/file.php/137/Thesis_Repository/recds/assets/TWs/UgradResearch/ResMethgen/files/notes/resInstrsem1.pdf

Bankole, A. R. (2003). Research methods: An introductory approach. Lagos: Adeshina Press Production
and Publication.

Barbie, E. (2010). The practice of social research (12th ed.). Belmont, USA: Wadsworth Cengage
Learning.

Bill, G. (2015). Is America abandoning research? Retrieved February 7, 2017, from
https://discoveryourtruenorth.org/1608-2/

Boslaugh, S. (2007). Secondary data sources for public health: A practical guide. London, U.K.:
Cambridge University Press.

Carillo, P. R. (2014). Secondary data analysis. Retrieved May 10, 2017, from
https://www.slideshare.net/honeyturqueza/secondary-data-analysis-report

Cnossen, C. (1997). Secondary research: Learning paper 7. Retrieved May 15, 2017, from
https://www.coursehero.com/file/p39mt1eo/20-No-2-Cnossen-C-1997-Secondary-Research-Learning-Paper-7-The-Robert-Gordon/

Darren, L., & Gareth, H. J. (2013). Introduction to research methods and data analysis in Psychology
(3rd ed.). London, U.K.: Pearson Education Limited.

Flintermann, B. (2014). The quality of market research reports: The case of Marketline Advantage and
the automobile industry. Retrieved May 29, 2017, from
http://essay.utwente.nl/66122/1/Flintermann_MA_MB.pdf

FOA Corporate Document Repository. (n.d.). Marketing research and information systems. Retrieved
June 1, 2017

Golafshani, N. (2003). Understanding reliability and validity in qualitative research. The Qualitative
Report, 8(4), 597-606.

Harkness, S. (2004). Social and political indicators of human well-being: Working paper. Retrieved
March 2, 2017, from
https://www.wider.unu.edu/publication/social-and-political-indicators-human-well-being

Hassard, J. (2014). Why are scientists abandoning their research? Retrieved May 2, 2017, from
http://www.artofteachingscience.org/why-are-scientists-abandoning-their-research/

Henrik, T., & Jorn, O. (1996). A framework for evaluation of secondary data sources for
epidemiological research. International Journal of Epidemiology, 25(2), 50-62.

Herrera, Y. M., & Kapur, D. (2007). Improving data quality: Actors, incentives and capabilities.
Political Analysis, 15(1), 365-386.

Houston, B. (2004). Assessing the validity of secondary data proxies for new constructs. Journal of
Business Research, 57(2), 154-161.

Johnson, M. P. (2014). Secondary data analysis: A method of which time has come. Qualitative and
Quantitative Methods in Libraries (QQML), 3(1), 619-626.

Johnson, S. A., & Houston, M. B. (2000). Buyer-supplier contracts versus joint ventures: Determinants
and consequences. Journal of Marketing Research, 37(1), 1-15.

Kehinde, J., & Oyeniyi, O. (2017). Comment on monthly seminar series. Lagos State University, Faculty
of Management Sciences, Department of Business Administration.

Koch, T. (2006). Establishing rigour in qualitative research: The decision trail. Journal of Advanced
Nursing, 53(1), 91-103.

Koziol, N., & Arthur, A. (2012). An introduction to secondary data analysis. Retrieved May 3, 2017,
from http://r2ed.unl.edu/presentations/2011/RMS/120911_Koziol/120911_Koziol.pdf

Management Study Guide. (2016). Secondary data. Retrieved May 23, 2017, from
http://www.managementstudyguide.com/secondary_data.htm

McCaston, M. K. (2005). Tips for collecting, reviewing and analysing secondary data. Retrieved May
13, 2017, from
https://www.ands.org.au/__data/assets/pdf_file/0003/713235/Tips_for_Collecting_Reviewing_and_Analyz.pdf

Morse, J. M., Barrett, M., Mayan, M., Olson, K., & Spiers, J. (2002). Verification strategies for
establishing reliability and validity in quantitative research. International Journal of Qualitative
Methods, 1(2), 13-22.

Norwegian Centre of Research Data. (n.d.). Data quality and comparability. Retrieved May 29, 2017,
from http://www.nsd.uib.no/macrodataguide/quality.html

Oyeniyi, O. T., Abiodun, A. J., Moses, C. L., & Osibanjo, A. O. (2016). Research methodology: With
simplified step-by-step application of SPSS package. Lagos: Pumark Nigeria Limited.

Peter, W. V., & Piet, J. D. (2012). 49 factors that influence the quality of secondary data sources: Paper
quality and risk management. Rotterdam, Statistics: Statistics Netherlands.

Phelan, C., & Wren, J. (2006). Exploring reliability in academic assessment. Cedar Falls, Iowa:
University of Northern Iowa.

Pierce, R. (2009). Evaluating information: Validity, reliability, accuracy and triangulation. Retrieved
March 21, 2017, from
https://www.sagepub.com/sites/default/files/upm-binaries/17810_5052_Pierce_Ch07.pdf

Priezkalns, E. (2016). Revenue assurance: Expert opinions for communications providers. Chicago:
CRC Press.

Roberta, H., & Alison, T. (2015). Validity and reliability in quantitative studies. ResearchGate, 18(3),
50-62.

Sagor, R. (2000). Guiding school improvement with action research. Virginia, USA: Association for
Supervision and Curriculum Development.

Schutt, R. (2006). Investigating the social world. New Jersey: SAGE Publications.

Shabnam, F. A., Zakiah, M. M., & Mohd, M. R. (2016). Detecting financial statement frauds in
Malaysia: Comparing the abilities of Beneish and Dechow models. Asian Journal of Accounting
and Governance, 7(1), 57-65.

Shuttleworth, M. (2009). Definition of reliability. Retrieved May 13, 2017

Smith, E. (2008). Using secondary data in educational and social research. New York: McGraw-Hill
Education.

Stenbacka, C. (2001). Qualitative research requires quality concepts of its own. Management Decision,
39(7), 551-555.

Stewart, C. (2014). How to evaluate external secondary data. Retrieved May 31, 2017, from
https://blog.marketresearch.com/how-to-evaluate-external-secondary-data

Tasic, S., & Feruh, M. B. (2012). Errors and issues in secondary data used in marketing research. The
Scientific Journal for Theory and Practice of Socioeconomic Development, 1(2), 326-335.

Vanhatalo, E., & Kulahc, M. (2015). Impact of autocorrelation on principal components and their use in
statistical process control (SPC). London: John Wiley & Sons.

Vivek, A. (2011). Instrument to measure primary and secondary data. Retrieved May 29, 2017, from
https://www.slideshare.net/VivekArora2/instrument-to-measure-primary-secondary-data

Vlad, M., Tulvinschi, M., & Chirita, I. (2011). The consequences of fraudulent financial reporting. The
Annals of the "Stefan cel Mare" University of Suceava. Fascicle of the Faculty of Economics and
Public Administration, 11(1), 264-269.

Watson, K. A. (2013). Evaluating primary and secondary sources. Retrieved May 30, 2017, from
https://www.nlj.gov.jm/rai/CSEC/Evaluating%20Primary%20and%20Secondary%20Sources.pdf

Wayne, L. D. (2014). Terrorism, homeland security, and risk assessment via research proposal (3rd
ed.). Bloomington: Xlibris, LLC.

Weijun, T. (2008). Research methods for business students. Retrieved April 24, 2017, from
https://eclass.teicrete.gr/modules/document/file.php/DLH105/Research%20Methods%20for%20Business%20Students%2C%205th%20Edition.pdf

William, M. (2006). Research method knowledge base: Types of reliability. Retrieved April 15, 2017,
from https://www.socialresearchmethods.net/kb/reltypes.php

Last updated July 2016

Evaluating sources
During your research process, you will collect a lot of information from books, articles, and websites. Sometimes it may be difficult to determine
whether a source is appropriate for your research needs. This handout is designed to help you evaluate the sources you find in your research.

Most information sources can be critically evaluated according to these basic questions:

 Audience. For whom is this source intended?


 Accuracy. Is the information in this source correct?
 Bias. Does the information in the source support a particular agenda?
 Credibility. Is the author an expert in this field?
 Currency. Is the information up to date?

This table outlines specific questions you will want to ask when evaluating books, journal articles, and websites.

Audience
 Is the text geared toward general readers, students, or specialists in the field?
 Does the text use technical or scholarly language?
 Is the level of the source appropriate for your needs? Is the language difficult to understand? (If so, you might want to start with some
background information or sources written for a general audience.)

Accuracy
 Are there theories that have since been disproven in this source? (This is especially important to determine for scientific texts.)
 Is there documentation or evidence presented for the information provided? Look for in-text references and citations or a bibliography at
the end of the article, chapter, book, or webpage.

Bias
 Is the information in the article primarily fact, opinion, or propaganda?
 Is the information supported by research?
 Does the author provide sufficient evidence for their claims?
 Does the author use highly charged or emotional language?
 Who is the author? Do they have strong ties with any organizations or corporations? Is the author a political figure? If the source was
published by an organization, look carefully for political affiliations, leanings, or any specific agenda it might have.
 Does the author present multiple sides of the issue or acknowledge other viewpoints?

Credibility
 Is the author an expert in this field? What else have they written? (Hint: search for the author in library databases or Google Scholar.)
 Where is the author employed? Is the author associated with a group or organization that may stand to benefit from the research? For
instance, a scientific study about pain relievers may be less credible if the author works for Bayer, a major manufacturer of aspirin.
 Is the publisher well known?

Currency
 Do your sources need to be current? Currency is more important in some fields than in others. For example, an article on current medical
research or case law is more time-sensitive than an essay on Aristotle.
 Does the publication date affect the article’s accuracy or introduce bias? This is especially important when researching issues in the
sciences. Although an article may be published in a very well-respected scholarly journal, if it was published in 1960, it may no longer be
considered very accurate.

Additional tips for evaluating specific source types


Books
 Look at where the book was published. If it was published at a University Press, this could be an indicator of scholarly content. You can
always search the web for more information about the publisher.

Webpages
 Look for documentation of the information provided. Wikipedia articles often contain footnotes at the bottom of the article page, which can
often lead to valuable print and electronic sources that may be more reliable than the entry itself.
 Check the domain type, as it might influence the nature of the information you are viewing.
o Commercial sites usually end in .com. They might be trying to sell you something or promote their own product, so beware of self-
promotional language and potentially incomplete or biased information or statistics. What kind of support or information is used to
support their claims? Consider cross-checking information with other independent organizations and reviews.
o Academic sites end in .edu, but examine the URL and the page's content. Is it a library web page, or a student's personal project?
o Canadian government-related sites end in gc.ca; Nova Scotia government sites end in gov.ns.ca, and U.S. government-related sites
end in .gov. Keep in mind that reports, data and statistics, and official documents may be more reliable than general interest pages.
Consider the relationship between the level of government and the topic. Documents that are more specific to the region and
governance of a particular issue tend to be more valuable than broad international policies that could affect a local context.
o Non-profit groups such as public interest organizations, religious groups, and think tanks use the .org domain. These sites may be
biased toward the organization's point of view. Some organizations make clear their underlying philosophy, either in the very title of
the organization or through an "About Us" section or a "Mission Statement." Others may require more research to discover their
agenda.
o Is there a tilde (~) in part of the URL? This implies that a web page is a personal page, even if it's linked to a larger institution. It may
not be held to the same standards as the institution's pages, or reflect the institution's views.
 Is contact information for the author or publisher provided?
 Look for an “About the Author” or “About this site” link. What are the author’s credentials? Check a library database or Google Scholar to see
if the author has published books or articles in scholarly journals.
 Be particularly wary of bias when viewing webpages. Anyone can create a webpage about any topic. Be sure to verify the validity of the
information you find.
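
The domain and tilde checks above can be sketched as a small Python helper. This is a rough first-pass aid under stated assumptions, not a substitute for reading the page: the suffix table simply restates the handout's rules of thumb, the function name and example URLs are hypothetical, and URLs must include a scheme such as https:// for the host to be recognised.

```python
from urllib.parse import urlparse

# Suffix hints restated from the handout; real sites can be exceptions.
DOMAIN_HINTS = [
    (".gov.ns.ca", "Nova Scotia government"),
    (".gc.ca", "Canadian government"),
    (".gov", "U.S. government"),
    (".edu", "academic"),
    (".org", "non-profit organization"),
    (".com", "commercial"),
]


def describe_url(url):
    """Return (site_kind, looks_personal) for a URL.

    site_kind is a hint based only on the host suffix; looks_personal is
    True when the path contains a tilde (~), which often marks a personal page.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    kind = "unknown"
    for suffix, label in DOMAIN_HINTS:
        if host.endswith(suffix) or host == suffix.lstrip("."):
            kind = label
            break
    looks_personal = "~" in parsed.path
    return kind, looks_personal


print(describe_url("https://www.example.edu/~student/project.html"))
print(describe_url("https://example.com/about"))
```

The result is only a prompt for the questions above: an academic host with a tilde in the path, for example, still needs its author and content examined.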

This handout was adapted from the following sources:


Indiana University Libraries. (n.d.). Evaluating sources rhetorically. Retrieved from https://libraries.indiana.edu/sites/default/files/Evaluating_Sources_Rhetorically.pdf
UNC Chapel Hill Libraries. (n.d.). Evaluating your sources. Retrieved from http://www2.lib.unc.edu/instruct/evaluate/index.html?section=home
University Library, University of Illinois. (n.d.). How do I…? Evaluate sources. Retrieved from http://www.library.illinois.edu/ugl/howdoi/How_Do_I_PDF_Files/Evaluate_Sources.pdf
University of Calgary Writing Centre. (2014). Evaluating sources. Retrieved from https://www.ucalgary.ca/ssc/files/ssc/wss_evaluating_source_2014.pdf
Facilitator Guide – SSC/ Q2101 – Associate Analytics

Knowledge, Skills & Competence

Key Points

Let’s Get Started


Importance of knowledge management

Provide a brief overview of the session. Discuss the importance of knowledge, skills
and development from the standpoint of an individual’s career.
Open up the discussion for the session and ask participants to share their thoughts on
“Why do you think skill development is important?”

Understanding knowledge, skills and competence

Knowledge – Mastery of facts, range of information in subject matter area.
Skills – Proficiency, expertise, or competence in given area; e.g., science, art, crafts
Competence – Demonstrated performance to use knowledge and skills when needed

 Some important definitions:

o Interpersonal Skill: Is aware of, responds to, and considers the needs, feelings,
and capabilities of others. Deals with conflicts, confrontations, and disagreements in
a positive manner that minimizes personal impact, including controlling one’s
feelings and reactions. Deals effectively with others in both favorable and
unfavorable situations regardless of status or position. Accepts interpersonal and
cultural diversity.

o Team Skill: Establishes effective working relationships among team members.
Participates in solving problems and making decisions.

o Communications: Presents and expresses ideas and information clearly and
concisely in a manner appropriate to the audience, whether oral or written.
Actively listens to what others are saying to achieve understanding. Shares
information with others and facilitates the open exchange of ideas and
information. Is open, honest, and straightforward with others.

o Planning and Organizing: Establishes courses of action for self to accomplish
specific goals [e.g., establishes action plans]. Identifies the need for, arranges for, and
obtains the resources needed to accomplish own goals and objectives. Develops and
uses tracking systems for monitoring own work progress. Effectively uses
resources such as time and information.

o Organizational Knowledge and Competence: Acquires accurate information
concerning the agency components, the mission[s] of each relevant organizational
unit, and the principal programs in the agency. Interprets and utilizes information
about the formal and informal organization, including the organizational
structure, functioning, and relationships among units. Correctly identifies and
draws upon source[s] of information for support.

o Problem Solving and Analytical Ability: Identifies existing and potential
problems/issues. Obtains relevant information about the problem/issue, including
recognizing whether or not more information is needed. Objectively evaluates
relevant information about the problem/issue. Identifies the specific cause of the
problem/issue. Develops recommendations, develops and evaluates alternative
courses of action, selects courses of action, and follows up.

o Judgment: Makes well-reasoned and timely decisions based on careful, objective
review and informed analysis of available considerations and factors. Supports
decisions or recommendations with accurate information or reasoning.

o Direction and Motivation: Sets a good example of how to do the job;
demonstrates personal integrity, responsibility, and accountability. Provides
advice and assistance to help others accomplish their work. Directs/motivates
self.

o Decisiveness: Identifies when immediate action is needed, is willing to make
decisions, render judgments, and take action. Accepts responsibility for the
decision, including sustaining effort in spite of obstacles.

o Self-Development: Accurately evaluates own performance and identifies skills
and abilities as targets of training and development activities related to current
and future job requirements. Analyzes present career status. Sets goals [short
and/or long term]. Identifies available resources and methods for self-
improvement. Sets realistic time frames for goals and follows up.

o Flexibility: Modifies own behavior and work activities in response to new
information, changing conditions, or unexpected obstacles. Views
issues/problems from different perspectives. Considers a wide range of
alternatives, including innovative or creative approaches. Strives to take actions
that are acceptable to others having differing views.

o Leadership: Ability to make right decisions based on perceptive and analytical
processes. Practices good judgment in gray areas. Acts decisively.


Fig : Systematic Learning for the Individual

Importance of Knowledge, Skills & Competence (KSC)

The primary purpose of KSC is to measure those qualities that will set one candidate apart
from the others. KSC identify the better candidates from a group of persons basically
qualified for a position. How well an applicant can show that he or she matches the
position’s defined KSC determines whether that person will be seriously considered for
the job.

Importance of developing skills:


More and more, job roles are requiring formal training qualifications either because of legislative
requirements or to meet the requirements of specific employers.

Developing your skills through further training provides significant benefits, including:
Increased career development opportunities:
Developing a career in a chosen field is something many of us aspire to. Experience alone, in
many cases, does not suffice when employers are seeking to promote their staff. By undertaking
further training, you enhance the opportunity to develop your career.
Personal growth:
Training provides more than skills in a particular area. By undertaking further training you
also build your networking, time management, communication and negotiation skills.
Increased knowledge and understanding of the industry:
Training about the industry and its development keeps you abreast of current industry trends
and gives you a better perspective from which to approach industry problems.

Activity Description:
Make groups of 3-5 people and ask them to discuss and come up with ideas on how they
would like to plan out their careers after they join an organization. The candidates will be
required to create a career map showing where they stand in the organization and their
individual career paths at 5 year intervals.

Check your Understanding

1. True or False? Personal skill development is as important for an
individual’s career as performing well in the organization.
a. True
b. False

Suggested Responses:
True, skills development is one of the most important things any fresh joinee in an organization
needs to think about. Skills development helps out the individual in the long run.

2. True or False? After formal education is completed, one can sit back and
does not need to engage in any additional self-training.
a. True
b. False

Suggested Responses:
False, one should never stop moving ahead in life; and one can move ahead in life only by
continuous self improvement.

Summary

 Knowledge – Mastery of facts, range of information in subject matter area.
 Skills – Proficiency, expertise, or competence in given area; e.g., science, art,
crafts
 Competence – Demonstrated performance to use knowledge and skills when
needed
 More and more, job roles are requiring formal training qualifications either
because of legislative requirements or to meet the requirements of specific
employers.
 Training provides more than skills in a particular area. By undertaking
further training you also build your networking, time management,
communication and negotiation skills.
 Training about the industry and its development keeps you abreast of
current industry trends and gives you a better perspective from which to
approach industry problems.


Training and Development


Key Points

Identifying Training Needs

Different methods are used by the organization to review skills and knowledge
including:
• training need analysis
• skills need analysis
• performance appraisals

Training Need Analysis

o Training needs analysis (TNA) is the first stage in the training process and involves a
procedure to determine whether training will indeed address the problem that
has been identified. Training can be described as “the acquisition of skills,
concepts or attitudes that result in improved performance within the job
environment”. Training analysis looks at each aspect of an operational domain
so that the initial skills, concepts and attitudes of the human elements of a
system can be effectively identified and appropriate training can be specified.

o Analysing what the training needs are is a vital prerequisite for an effective
training programme or event. Simply throwing training at individuals may miss
priority needs, or even cover areas that are not essential. TNA enables
organisations to channel resources into the areas where they will contribute the
most to employee development, enhancing morale and organizational
performance. TNA is a natural function of appraisal systems and is a key
requirement for the award of Investors in People.

o Training needs analysis involves:

• monitoring current performance using techniques such as observation,
interviews and questionnaires
• anticipating future shortfalls or problems
• identifying the type and level of training required and analysing how this can
best be provided.


Work / Task Analysis

Conducting a Work / Task Analysis

 Interview subject matter experts (SMEs) and high performing employees.
Interview the supervisors and managers in charge. Review job descriptions and
occupational information. Develop an understanding of what employees need
to know in order to perform their jobs. Important questions to ask when
conducting a Task Analysis:

o What tasks are performed?
o How frequently are they performed?
o How important is each task?
o What knowledge is needed to perform the task?
o How difficult is each task?
o What kinds of training are available?

 Observe the employee performing the job. Document the tasks being
performed. When documenting the tasks, make sure each task starts with
an action verb. How does this task analysis compare to existing job
descriptions? Did the task analysis miss any important parts of the job
description? Were there tasks performed that were omitted from the job
description?

 Organize the identified tasks. Develop a sequence of tasks. Or list the tasks by
importance.

 Are there differences between high and low performing employees on specific
work tasks? Are there differences between Experts and Novices? Would
providing training on those tasks improve employee job performance?

 Most employees are required to make decisions based on information. How is
information gathered by the employee? What does the employee do with the
information? Can this process be trained? Or, can training improve this
process?


Performance Analysis

o Performance Analysis is used to identify which employees need the training.
Review performance appraisals. Interview managers and supervisors. Look for
performance measures such as benchmarks and goals.
o Sources of performance data:
 Performance Appraisals
 Quotas met or unmet
 Performance Measures
 Turnover
 Shrinkage
 Leakage
 Spoilage
 Losses
 Accidents
 Safety Incidents
 Grievances
 Absenteeism
 Units per Day
 Units per Week
 Returns
 Customer Complaints

Check your Understanding!

Are there differences between high and low performing employees on specific
competencies? Would providing training on those competencies improve
employee job performance?

Facilitator Notes: Yes, there can be significant differences between high and low
performing employees on competencies, and that is why it is even more
important to understand training needs more carefully!

Evaluation/Review of Trainings

Evaluation of the impact of learning interventions may be carried out at a number of levels and
involve a variety of factors:

Reaction: What did the participants think about the learning interventions? What did the
providers think about the training interventions? What were their thoughts about the venue
facilities?

Learning: What were the main areas which were remembered by the whole group of
participants? What were the main areas which were forgotten by the whole group of
participants?

Transfer: Which elements of the learning have been applied in the workplace? Which elements
of the learning have not been applied in the workplace? Why do the participants apply some of
the elements of the learning programme and not others?

Results: What were the results of the changed work behavior? What effect did this have on
productivity?

Return on Investment: What was the return on investment (ROI) of the training? How does the
cost of training compare to the financial return on increased (decreased) productivity?
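
The Return on Investment question can be made concrete with a small calculation. The sketch below assumes the commonly used definition of ROI as net benefit divided by cost, expressed as a percentage; the function name and the figures are illustrative and do not come from this guide.

```python
def training_roi(financial_benefit: float, training_cost: float) -> float:
    """Return training ROI as a percentage.

    Assumes the common definition: (benefit - cost) / cost * 100.
    """
    if training_cost <= 0:
        raise ValueError("training_cost must be positive")
    # Multiply before dividing to keep the arithmetic exact for round figures.
    return (financial_benefit - training_cost) * 100 / training_cost


# Hypothetical figures: a course costing 20,000 that is credited with
# 26,000 of additional productivity.
roi = training_roi(financial_benefit=26_000, training_cost=20_000)
print(f"ROI: {roi:.1f}%")
```

A negative result would indicate that the financial return did not cover the cost of the training.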


Fig : Sample training feedback form

Donald L. Kirkpatrick, Professor Emeritus, University of Wisconsin (where he earned his
BBA, MBA and PhD), first published his ideas in 1959, in a series of articles in the Journal of
the American Society of Training Directors. The articles were subsequently included in
Kirkpatrick's book Evaluating Training Programs (originally published in 1994; now in its 3rd
edition, Berrett-Koehler Publishers).
Kirkpatrick's four levels of evaluation model
The four levels of Kirkpatrick's evaluation model essentially measure:
• reaction of student - what they thought and felt about the training
• learning - the resulting increase in knowledge or capability
• behaviour - extent of behaviour and capability improvement and
implementation/application
• results - the effects on the business or environment resulting from the trainee's performance

Feedback

Feedback is an essential means of understanding and identifying the right training and knowledge
needed for the required job function.
What is a 360-degree feedback survey?

One of the best feedback tools for professionals is 360-degree feedback, also known as
multi-source feedback, which comes from members of an employee’s immediate work group. Most
often, 360-degree feedback includes direct feedback from an employee’s subordinates, supervisors
and colleagues, as well as a self-evaluation. In some cases it may also include feedback from
external sources, such as customers, suppliers and other interested stakeholders, which reveals
how others perceive the employee.
Employees use it to plan and map specific paths in their development. In a few organizations,
results are also used in administrative decisions related to pay and promotions. It is usually best
used for development rather than evaluation.
How to go about a 360-degree feedback survey?
To start the process, 10-12 raters are selected, and ratings must be obtained from at least 6-8 of
them (other than self). Raters should offer confidential and anonymous feedback about the
individual. The accumulated report helps individuals to reflect and start working on their
developmental aspects. Honest and realistic feedback will be much more valuable to the
participants in their self-development. Each individual who receives feedback will then be
encouraged to work on the development areas. It’s advisable to re-run the survey after 9 months
to gauge progress and the extent of improvement.
The feedback includes parameters such as job performance, behaviour at the workplace, managerial
effectiveness and skills like delegation, communication and team play. It also includes
higher-level aspects such as ethics, fairness and etiquette, for example professional courtesies.
This is only an indicative list; the parameters can be customized for each organization.
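
The survey process described above can be sketched as a small aggregation routine. This is a minimal illustration, not a prescribed tool: the response layout, the function name `summarize_360`, the 1-5 scores and the sample data are all assumptions made for the example. Only the minimum of six raters other than self is taken from the text.

```python
from statistics import mean

MIN_OTHER_RATERS = 6  # lower bound taken from the "at least 6-8" guidance above


def summarize_360(responses):
    """Average ratings per parameter across raters other than self.

    `responses` is a list of dicts shaped like
    {"rater_type": "peer", "scores": {"communication": 4}} (hypothetical layout).
    Rater identities are never stored, which keeps the feedback anonymous.
    """
    others = [r for r in responses if r["rater_type"] != "self"]
    if len(others) < MIN_OTHER_RATERS:
        raise ValueError(f"need at least {MIN_OTHER_RATERS} ratings other than self")
    self_scores = next(
        (r["scores"] for r in responses if r["rater_type"] == "self"), {}
    )
    report = {}
    for parameter in sorted({p for r in others for p in r["scores"]}):
        values = [r["scores"][parameter] for r in others if parameter in r["scores"]]
        report[parameter] = {
            "others_avg": round(mean(values), 2),
            "self": self_scores.get(parameter),
        }
    return report


# Hypothetical responses: one self-rating plus six other raters.
responses = [{"rater_type": "self", "scores": {"communication": 5, "delegation": 4}}]
for rater_type, comm, deleg in [
    ("supervisor", 4, 3), ("peer", 3, 3), ("peer", 4, 4),
    ("subordinate", 5, 2), ("subordinate", 3, 3), ("customer", 5, 3),
]:
    responses.append(
        {"rater_type": rater_type, "scores": {"communication": comm, "delegation": deleg}}
    )

report = summarize_360(responses)
print(report)
```

Placing the self-rating next to the average from other raters highlights the parameters where self-perception and others' perception differ, which is where development effort is usually focused.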

Activity Description:
Create a feedback form for a soft skills training. Identify what fields you will include
and include a grading mechanism for the trainer on each parameter.


Summary

 Different methods are used by organizations to review skills and
knowledge, including:
• training needs analysis
• skills needs analysis
• performance appraisals
 Training needs analysis is the first stage in the training process and involves a
procedure to determine whether training will indeed address the problem
that has been identified.
 Performance Analysis is used to identify which employees need the
training.
 Evaluation of the impact of learning interventions may be carried out at a
number of levels and involve a variety of factors.


Learning and Development policies and record keeping



Sample L&D Policy for Genpact

Genpact is completely committed towards continuous talent development, and our Learning
and development framework is a key differentiator for us when it comes to employee
retention. We have made significant investments in developing in-house capabilities in many
training areas, both technical and non-technical, and have also partnered with several leading
training providers, in order to ensure best-in-class training for our employees.

Our Learning & Development function delivers more than 6 million hours of training
annually. The testimony to our commitment lies in a series of industry recognition that we
have won over the years, such as recognition from American Society for Training &
Development (ASTD) and multiple Brandon Hall Excellence in learning awards.

Training needs identification for each individual is done at the time of joining the
organization / new process/ new role and during subsequent performance appraisals.
Trainings provided cover all aspects of professional and personal development – business /
process understanding, technical capabilities, domain knowledge, communication and
interpersonal skills, and leadership potential development.

All new hires are required to attend a mandatory New Hire Orientation program, which
familiarizes them with Genpact as an organization and its defining values, as well as various
HR policies and processes.

This is followed by process and vertical specific new hire inductions to familiarize the new
hire with the process and industry as well as provide overview of the work done in that
space. In order to sensitize our employees with cultural nuances we also provide a
certification program on cultural sensitivity, with modules for more than 130 countries in the
world.


For technical trainings, the focus is towards developing self-reliance and having internal
experts to conduct trainings using case studies and practical examples as data points. This
allows for imparting of technical trainings within our business context. In order to
supplement the in-house trainer pool we also work extensively with reputed training
providers to meet the training needs in a timely and effective manner.

Genpact also has a comprehensive leadership skills development curriculum, which
focuses on each stage of an individual’s professional growth, from the time the
person starts leading a team for the first time, to gradually assuming greater responsibilities,
both in terms of span as well as scope of work.

These programs are a mix of online modules, such as a suite of 42 e-learnings from Harvard
Business School (Harvard Manage Mentor®), and instructor-led classroom sessions.

We have a dedicated residential learning facility at Hyderabad where programs of longer
duration are conducted centrally. For senior leadership programs, we have tied up with
reputed providers of Executive Education such as Duke University, IIM Ahmedabad,
Imparta and CapStone.

Check your Understanding!

What do you learn from the L&D policy for Genpact? Why do you think
Genpact invests so heavily in resource upskilling and training?

Facilitator Notes: When a resource is hired, or is working in a specified process, there is a
continuous need to upskill the resource, which increases the productivity of the resource.
Hence, organizations like to measure ROI on the resource trainings.

Record Keeping

• As a professional, you have a responsibility to keep your skills and knowledge up to date.
Learning and development helps you turn that accountability into a positive opportunity to
identify and achieve your own career objectives.

• At least once a year, we recommend you review your learning over the previous 12 months,
and set your development objectives for the coming year. Reflecting on the past and
planning for the future in this way makes your development more methodical and easier to
measure. This is a particularly useful exercise prior to your annual appraisal!

• Some people find it helpful to write things down in detail, while others record 'insights
and learning points' in their diaries as they go along. This helps them to assess their
learning continuously. These records and logs are useful tools for planning and reflection: it
would be difficult to review your learning and learning needs yearly without regularly
recording your experiences in some way.

• Training is an investment that you make in yourself. It’s a way of planning your
development that links learning directly to practice. Training helps you keep your skills up to
date and prepares you for greater responsibilities. It can boost your confidence, strengthen
your professional credibility and help you become more creative in tackling new challenges.
Training makes your working life more interesting and can significantly increase your job
satisfaction. It can accelerate your career development and is an important part of upgrading
to chartered membership.

• It is strongly recommended that you maintain a personal portfolio. This will assist you in a
number of key aspects related to your career:
 You will be able to provide documented evidence of your commitment to your
chosen profession; and of your continued competence

 It will act as an excellent reference, both in the updating of your Curriculum Vitae
and in recalling details of topics you have studied
 It will be a most useful aid in your career development, providing a means by which
you can plan, record and review your relevant activities

Fig : Sample training feedback form

Figures : A sample development record (top) and a sample development plan template (bottom)

Continuous Professional Development (CPD) - Refers to the process of tracking and documenting
the skills, knowledge and experience that you gain both formally and informally as you work,
beyond any initial training. It's a record of what you experience, learn and then apply. The term is
generally used to mean a physical folder or portfolio documenting your development as a
professional.
CPD can help you to reflect, review and document your learning and to develop and update your
professional knowledge and skills. It is also very useful because it:
 provides an overview of your professional development to date
 reminds you of your achievements and how far you've progressed
 directs your career and helps you keep your eye on your goals
 uncovers gaps in your skills and capabilities
 opens up further development needs
 provides examples and scenarios for a CV or interview
 demonstrates your professional standing to clients and employers
 helps you with your career development or a possible career change

How can you assess this? Answer these questions:

Where am I now? - Review and reflect on any learning experiences over the previous year or over
the past three months. Write your thoughts down about what you learned, what insights it gave you
and what you might have done differently. Include both formal training events and informal learning.
Where do I want to be? - Write down your overall job skill requirements, both immediate and over
the next year.
What do I have to do to get there? - Make a note of what you need to do to achieve them. This
could include further training, job or role progression or changes in direction.
For shorter term objectives, include the first step - what you can do today or tomorrow. For
example, having a chat with your manager about a new responsibility or finding out about new
technology from a colleague who has experience of it.


Activity Description:
Create a sample development plan for your career after your hypothetical first year of
employment in a large organization. Make sure the development plan is based on the
development record which you’ve been maintaining throughout your tenure in the
organization. Create a template for the development record as well.

Summary

 Every organization has detailed goals on learning and development needs,
similar to the learning and development goals of Genpact, which we saw in the
sample policy.
 As a professional, you have a responsibility to keep your skills and knowledge
up to date.
 At least once a year, we recommend you review your learning over the
previous 12 months, and set your development objectives for the coming
year.
 It is strongly recommended that you maintain a personal portfolio. This will
assist you in a number of key aspects related to your career.
 Continuous Professional Development refers to the process of tracking and
documenting the skills, knowledge and experience that you gain both
formally and informally as you work, beyond any initial training.

Developing a Competency Framework
Linking Company Objectives and Personal Performance


Objectives should align across the organization.

You're probably familiar with the phrase "what gets measured gets done."
Defining and measuring effectiveness – especially the performance of
workers – is a critical part of your job as a manager.

The question is: how do you define the skills, behaviors, and attitudes that
workers need to perform their roles effectively? How do you know they're
qualified for the job? In other words, how do you know what to measure?

Some people think formal education is a reliable measure. Others believe
more in on-the-job training, and years of experience. Others might argue that
personal characteristics hold the key to effective work behavior.

All of these are important, but none seems sufficient to describe an ideal set
of behaviors and traits needed for any particular role. Nor do they guarantee
that individuals will perform to the standards and levels required by the
organization.

A more complete way of approaching this is to link individual performance
to the goals of the business. To do this, many companies use "competencies."
These are the integrated knowledge, skills, judgment, and attributes that
people need to perform a job effectively. Having a defined set of
competencies for each role in your business shows workers the kind of
behaviors the organization values, and which it requires to help achieve its
objectives. Not only can your team members work more effectively and
achieve their potential, but there are many business benefits to be had from
linking personal performance with corporate goals and values.

Defining which competencies are necessary for success in your organization
can help you do the following:

 Ensure that your people demonstrate sufficient expertise.
 Recruit and select new staff more effectively.
 Evaluate performance more effectively.
 Identify skill and competency gaps more efficiently.
 Provide more customized training and professional development.
 Plan sufficiently for succession.
 Make change management processes work more efficiently.
How can you define the set of practices needed for effective performance?
You can do this by adding a competency framework to your talent
management program. By collecting and combining competency information,
you can create a standardized approach to performance that's clear and
accessible to everyone in the company. The framework outlines specifically
what people need to do to be effective in their roles, and it clearly establishes
how their roles relate to organizational goals and success.

This article outlines the steps you need to take to develop a competency
framework in your organization.
Design Principles of a Competency Framework
A competency framework defines the knowledge, skills, and attributes
needed for people within an organization. Each individual role will have its
own set of competencies needed to perform the job effectively. To develop
this framework, you need to have an in-depth understanding of the roles
within your business. To do this, you can take a few different approaches:

 Use a pre-set list of common, standard competencies, and then customize
it to the specific needs of your organization.
 Use outside consultants to develop the framework for you.
 Create a general organizational framework, and use it as the basis for other
frameworks as needed.
Developing a competency framework can take considerable effort. To make
sure the framework is actually used as needed, it's important to make it
relevant to the people who'll be using it, so that they can take ownership of
it.

The following three principles are critical when designing a competency
framework:

1. Involve the people doing the work – These frameworks should not be
developed solely by HR people, who don't always know what each job
actually involves. Nor should they be left to managers, who don't always
understand exactly what each member of their staff does every day. To
understand a role fully, you have to go to the source – the person doing the
job – as well as getting a variety of other inputs into what makes someone
successful in that job.
2. Communicate – People tend to get nervous about performance issues. Let
them know why you're developing the framework, how it will be created,
and how you'll use it. The more you communicate in advance, the easier
your implementation will be.
3. Use relevant competencies – Ensure that the competencies you include
apply to all roles covered by the framework. If you include irrelevant
competencies, people will probably have a hard time relating to the
framework in general. For example, if you created a framework to cover
the whole organization, then financial management would not be included
unless every worker had to demonstrate that skill. However, a framework
covering management roles would almost certainly involve the financial
management competency.

Developing the Framework


There are four main steps in the competency framework development
process. Each step has key actions that will encourage people to accept and
use the final product.

Step One: Prepare


 Define the purpose – Before you start analyzing jobs, and figuring out
what each role needs for success, make sure you look at the purpose for
creating the framework. How you plan to use it will impact whom you
involve in preparing it, and how you determine its scope. For example, a
framework for filling a job vacancy will be very specific, whereas a
framework for evaluating compensation will need to cover a wide range of
roles.
 Create a competency framework team – Include people from all areas of
your business that will use the framework. Where possible, aim to
represent the diversity of your organization. It's also important to think
about long-term needs, so that you can keep the framework updated and
relevant.

Step Two: Collect Information


This is the main part of the framework. Generally, the better the data you
collect, the more accurate your framework will be. For this reason, it's a good
idea to consider which techniques you'll use to collect information about the
roles, and the work involved in each one. You may want to use the following:
 Observe – Watch people while they're performing their roles. This is
especially useful for jobs that involve hands-on labor that you can
physically observe.
 Interview people – Talk to every person individually, choose a sample of
people to interview, or conduct a group interview. You may also want to
interview the supervisor of the job you're assessing. This helps you learn
what a wide variety of people believe is needed for the role's success.
 Create a questionnaire – A survey is an efficient way to gather data.
Spend time making sure you ask the right questions, and consider the
issues of reliability and validity. If you prefer, there are standardized job
analysis questionnaires you can buy, rather than attempting to create your
own.
 Analyze the work – Which behaviors are used to perform the jobs
covered by the framework? You may want to consider the following:
 Business plans, strategies, and objectives.
 Organizational principles.
 Job descriptions.
 Regulatory or other compliance issues.
 Predictions for the future of the organization or industry.
 Customer and supplier requirements.
Job analysis that includes a variety of techniques and considerations will
give you the most comprehensive and accurate results. If you create a
framework for the entire organization, make sure you use a sample of roles
from across the company. This will help you capture the widest range of
competencies that are still relevant to the whole business.
 As you gather information about each role, record what you learn in
separate behavioral statements. For example, if you learn that Paul from
accounting is involved in bookkeeping, you might break that down into
these behavioral statements: handles petty cash, maintains floats, pays
vendors according to policy, and analyzes cash books each month. You
might find that other roles also have similar tasks – and therefore
bookkeeping will be a competency within that framework.
 When you move on to Step Three, you'll be organizing the information
into larger competencies, so it helps if you can analyze and group your raw
data effectively.

Step Three: Build the Framework


This stage involves grouping all of the behaviors and skill sets into
competencies. Follow these steps to help you with this task:

 Group the statements – Ask your team members to read through the
behavior statements, and group them into piles. The goal is to have three
or four piles at first – for instance, manual skills, decision-making and
judgment skills, and interpersonal skills.
 Create subgroups – Break down each of the larger piles into
subcategories of related behaviors. Typically, there will be three or four
subgroupings for each larger category. This provides the basic structure of
the competency framework.
 Refine the subgroups – For each of the larger categories, define the
subgroups even further. Ask yourself why and how the behaviors relate, or
don't relate, to one another, and revise your groupings as necessary.
 Identify and name the competencies – Ask your team to identify a
specific competency to represent each of the smaller subgroups of
behaviors. Then they can also name the larger category.
 Here's an example of groupings and subgroupings for general management
competencies:
 Supervising and leading teams.
     Provide ongoing direction and support to staff.
     Take initiative to provide direction.
     Communicate direction to staff.
     Monitor performance of staff.
     Motivate staff.
     Develop succession plan.
     Ensure that company standards are met.
 Recruiting and staffing.
     Prepare job descriptions and role specifications.
     Participate in selection interviews.
     Identify individuals' training needs.
     Implement disciplinary and grievance procedures.
     Ensure that legal obligations are met.
     Develop staff contracts.
     Develop salary scales and compensation packages.
     Develop personnel management procedures.
     Make sure staff resources meet organizational needs.
 Training and development.
     Deliver training to junior staff.
     Deliver training to senior staff.
     Identify training needs.
     Support personal development.
     Develop training materials and methodology.
 Managing projects/programs.
     Prepare detailed operational plans.
     Manage financial and human resources.
     Monitor overall performance against objectives.
     Write reports, project proposals, and amendments.
     Understand external funding environment.
     Develop project/program strategy.
You may need to add levels for each competency. This is particularly
useful when using the framework for compensation or performance
reviews. To do so, take each competency, and divide the related behaviors
into measurement scales according to complexity, responsibility, scope, or
other relevant criteria. These levels may already exist if you have job
grading in place.
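
One way to picture a competency with levels is as a small data structure. The sketch below is only an illustration: the `Competency` class, the three-level split and the assignment of behaviors to levels are assumptions made for the example. The behavior names are borrowed from the "Training and development" grouping listed earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Competency:
    name: str
    # Behaviors grouped by level; level 1 is assumed to be the least complex.
    levels: dict[int, list[str]] = field(default_factory=dict)

    def behaviors_up_to(self, level: int) -> list[str]:
        """All behaviors expected at `level`, including those from lower levels."""
        return [
            behavior
            for lvl in sorted(self.levels)
            if lvl <= level
            for behavior in self.levels[lvl]
        ]


# Hypothetical split of one competency into three levels of responsibility.
training = Competency(
    name="Training and development",
    levels={
        1: ["Identify training needs", "Support personal development"],
        2: ["Deliver training to junior staff",
            "Develop training materials and methodology"],
        3: ["Deliver training to senior staff"],
    },
)

print(training.behaviors_up_to(2))
```

Treating levels as cumulative is a design choice for this sketch; an organization with existing job grades might instead list a separate, complete set of behaviors for each grade.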

 Validate and revise the competencies as necessary – For each item, ask
these questions:
 Is this behavior demonstrated by people who perform the work most
effectively? In other words, are people who don't demonstrate this
behavior ineffective in the role?
 Is this behavior relevant and necessary for effective work
performance?
These questions are often asked in the form of a survey. It's important to
look for consensus among the people doing the job, as well as areas where
there's little agreement. Also, look for possible issues with language, or the
way the competencies are described, and refine those as well.

Step Four: Implement


As you roll out the finalized competency framework, remember the principle
of communication that we mentioned earlier. To help get buy-in from
members of staff at all levels of the organization, it's important to explain to
them why the framework was developed, and how you'd like it to be used.
Discuss how it will be updated, and which procedures you've put in place to
accommodate changes.

Here are some tips for implementing the framework:


 Link to business objectives – Make connections between individual
competencies and organizational goals and values as much as possible.
 Reward the competencies – Check that your policies and practices
support and reward the competencies identified.
 Provide coaching and training – Make sure there's adequate coaching
and training available. People need to know that their efforts will be
supported.
 Keep it simple – Make the framework as simple as possible. You want the
document to be used, not filed away and forgotten.
 Communicate – Most importantly, treat the implementation as you would
any other change initiative. The more open and honest you are throughout
the process, the better the end result – and the better the chances of the
project achieving your objectives.
Key Points
Creating a competency framework is an effective method to assess, maintain,
and monitor the knowledge, skills, and attributes of people in your
organization. The framework allows you to measure current competency
levels to make sure your staff members have the expertise needed to add
value to the business. It also helps managers make informed decisions about
talent recruitment, retention, and succession strategies. And, by identifying
the specific behaviors and skills needed for each role, it enables you to
budget and plan for the training and development your company really needs.

The process of creating a competency framework is long and complex. To
ensure a successful outcome, involve the people actually carrying out the
roles to evaluate real jobs, and describe real behaviors. The increased level of
understanding and linkage between individual roles and organizational
performance makes the effort well worth it.
Develop your Knowledge, Skills and Competence

1. Developing knowledge, skills and competence is about “Self-development” and
also about “Career Growth and Development”.

SELF DEVELOPMENT
Self-development efforts are unique to an individual, and the reasons for
undertaking them are specific to the individual.

The advantage of self-development efforts is that you decide for yourself how
and where best to expand your capabilities and strengths: which learning and
development activities you want to undertake, and which are the different areas
where you can best apply your knowledge and skills.

1. Identify your current level of knowledge, skills and competence

2. Obtain advice and guidance from appropriate people to develop your
knowledge, skills and competence

3. Agree with appropriate people on a plan of learning and development activities
to address your learning needs

“Training is important to continuous improvement because of the change that is
taking place around us. You need to be aware of that change, and you need to be
continually growing to adapt to it through self-development.”

- Paul Fortier (Plant training leader for Libbey, Inc., in Toledo, OH)

Need For Self Development

By concentrating on continuous self-development you:

 Become better at things you do or want to do

 Add to your knowledge

 Increase your range of skills

 Take responsibility for your development

How to identify knowledge and skills

1. First, make two or more Personal Opinion tables like this:

2. Second, give yourself a rating of 1–4 based on the parameters

Rating scale:
1 = Never
2 = Occasionally
3 = Frequently
4 = All the time

3. Third, give the Personal Opinion table to your friends and ask them to fill it in
for you.
After that, make a Self-Development Plan table.
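
The three steps above can be sketched in code as a comparison of your own ratings with those given by friends. This is only an illustration: the parameter names, the threshold of 3 ("Frequently") and the function name are assumptions, since the original Personal Opinion table is not reproduced here.

```python
from statistics import mean

RATING_SCALE = {1: "Never", 2: "Occasionally", 3: "Frequently", 4: "All the time"}


def development_priorities(self_ratings, friend_ratings, threshold=3):
    """Return (parameter, own rating, friends' average) for weak areas.

    A parameter is treated as a development area when the average rating
    given by friends is below `threshold`; lowest averages come first.
    """
    plan = []
    for parameter, own in self_ratings.items():
        others = [r[parameter] for r in friend_ratings if parameter in r]
        observed = mean(others) if others else own  # fall back to self-rating
        if observed < threshold:
            plan.append((parameter, own, observed))
    return sorted(plan, key=lambda row: row[2])


# Hypothetical parameters and ratings on the 1-4 scale.
self_ratings = {"Listens actively": 4, "Meets deadlines": 3, "Shares knowledge": 2}
friend_ratings = [
    {"Listens actively": 2, "Meets deadlines": 4, "Shares knowledge": 2},
    {"Listens actively": 3, "Meets deadlines": 3, "Shares knowledge": 1},
]

for parameter, own, observed in development_priorities(self_ratings, friend_ratings):
    print(f"{parameter}: self={RATING_SCALE[own]}, friends' average={observed}")
```

The rows returned here would form the starting entries of a Self-Development Plan table.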

CAREER GROWTH AND DEVELOPMENT

 Identify the knowledge and skills you need for your development.

 Undertake learning and development activities in line with your plan.

 Obtain feedback from appropriate people on your knowledge and skills and
how effectively you apply them

 Review your knowledge, skills and competence regularly and take appropriate
action.

Ensuring your Own Career Development

Employee development is a necessary process:

 To bridge the skill gap of employees

 To bridge the gap between the organization’s current capability
and the capability required to achieve the business results

 To build loyalty in employees, as loyalty increases productivity
 To create a learning culture in the organization
 To motivate employees to learn new skills and acquire new learnings

Conduct Gap Analysis


 Studying the job description - A job description is a list of knowledge,
skills, abilities and other characteristics that an individual must have to
successfully perform the job.
 Tip: Look at some job descriptions on the internet for various job roles.
 Studying and understanding the organization’s vision and goals and
how you can contribute to its growth.
 Make use of the development opportunities provided to you, and draw on
other resources available on the web for your individual
development needs.

Training Methods Available

 Instructor-led: This type of training is facilitated by an instructor, either
online or in a classroom setting. Instructor-led training allows learners and
instructors or facilitators to interact and discuss the training material, either
individually or in a group setting.

 Learning from colleagues: We can learn a great deal from our colleagues that
we can put to use in our professional life and beyond. Learning from those
we work with is one of the great benefits of being in the workplace. Sometimes
we learn from what they teach us outright, and sometimes simply by
observing them.

 Self-training: Learning a concept or acquiring knowledge through self-study is
called self-training. It can be done by reading user manuals and textbooks,
or by learning through simulations and tutorials.

 E-learning: This is the use of technology to enable people to learn anytime and
anywhere. E-learning can include training, the delivery of just-in-time
information, and guidance from experts, for example through virtual classrooms,
application sharing, self-paced courses, and audio and video conferencing.

Review Performance

 Daily Feedback: While working on your daily tasks, make it a habit to
obtain feedback from your peers and colleagues.
 Development Interview: You also receive feedback from your supervisor.
This could be on a specific task or a project. Feedback from your
supervisor helps you to achieve your goals and climb the
ladder of success.
 Assessment of Work Performance: This kind of feedback is shared
mostly on an annual basis by the HR manager or supervisor. It provides
you with holistic feedback and helps you to plan your further
development.

Accurately identify the knowledge and skills needed for the job role
Here are some of the skills that are commonly developed on the job:

 Industry or product knowledge
 Professionalism
 Leadership
 Customer service
 Time management
 Strategic thinking

How to learn job skills at work
Use these methods to start learning new skills on the job:
 Look for opportunities
 Assess your skills
 Practice
 Learn from others
 Ask for feedback
 Track your progress

How to identify employee training and development needs

 To identify training and development needs, you must first set clear
expectations for each role within your business. This creates a benchmark
to monitor performance against.
 Review job descriptions when new positions are created.
 Monitor employee performance.
 Analysis: Training and development needs will fall into one of three
categories:
o Improving staff knowledge about your industry
o Job-related needs
o Personal development
 Use focus groups to understand training and development needs.
o Identify training and development needs within your business.
 Set up a system of mentoring and coaching
o Closely aligning staff with a mentor will help develop skills while
identifying additional training and development needs.

Six steps to developing a coherent, practical and effective Learning and
Development plan:

Step 1: Create clear career pathways

Step 2: Define Roles and Responsibilities

Step 3: Define the Knowledge & Behaviors required

Step 4: Choose the right Training

Step 5: Build on Skills & Experience

Step 6: Monitor the Learning & Development Journey
