MODULE I
Introduction to Research Methodology (RM)
Meaning of Research
1. Research is a systematic, organized effort to investigate a specific problem, collect data,
analyze information, and reach valid conclusions.
2. The word research literally means “to search again” — implying a careful, persistent
investigation.
3. It aims to discover new facts, verify existing knowledge, and develop new theories or
applications.
Significance of Research
Expands knowledge
1. Research systematically adds to the body of human knowledge.
2. It uncovers new facts, relationships, and insights that were previously unknown.
3. By investigating unanswered questions, it helps correct misconceptions and refines
existing ideas.
4. Example: Discovering a new species in biology expands scientific understanding of
biodiversity.
Solves practical problems
1. Research offers evidence-based solutions to real-world issues.
2. It helps design interventions, tools, or technologies to overcome challenges in health,
education, engineering, agriculture, and many other fields.
3. By applying rigorous methods, it ensures that solutions are reliable and replicable.
4. Example: Medical research leading to vaccines for infectious diseases.
Aids in theory development
1. Research supports the creation and refinement of theories that explain how or why
things happen.
2. It tests hypotheses, validates existing theories, or builds new frameworks to explain
observed phenomena.
3. Theories developed through research guide further studies and provide foundations
for practice.
4. Example: Social research that builds theories on human motivation or learning.
Improves practices and processes
1. Research identifies more efficient, effective, or ethical ways of doing things.
2. It evaluates current practices, identifies weaknesses, and recommends improvements.
3. This is especially important in sectors like education, healthcare, business, or
manufacturing.
4. Example: Educational research leading to better teaching strategies to improve
student outcomes.
Supports policy-making and planning
1. Research generates data and evidence that inform public policies, laws, and
regulations.
2. Policymakers rely on research findings to make informed decisions and plan
interventions that benefit society.
3. Sound evidence helps avoid guesswork and improves transparency and
accountability.
4. Example: Economic research guiding minimum wage policies or climate research
informing environmental laws.
Importance of Scientific Research in Decision Making
Objectivity:
1. Scientific research uses systematic and unbiased methods to gather evidence.
2. This helps decision-makers base their choices on facts, rather than opinions, emotions, or
political pressures.
3. Objectivity ensures fairness and credibility in decisions, improving public trust.
4. Example: Using randomized controlled trials to evaluate a new drug avoids subjective
bias.
Reliability:
1. Scientific methods produce results that are repeatable and verifiable, increasing
confidence in the evidence used for decisions.
2. Reliable research findings help reduce errors or failures in policies and strategies.
3. Decisions based on reliable data are more likely to succeed and gain stakeholder support.
4. Example: Consistent economic indicators guiding central bank monetary policy.
Risk Reduction:
1. Scientific research helps identify potential risks before actions are taken, enabling better
planning and preparedness.
2. It provides evidence to assess possible negative consequences and ways to mitigate
them.
3. This prevents costly mistakes and improves safety and sustainability.
4. Example: Environmental impact studies reducing risks of large construction projects.
Predictive Power:
1. Scientific research can develop models and forecasts to predict future outcomes and
trends.
2. Predictive capability supports proactive rather than reactive decision-making.
3. It enables organizations and governments to plan long-term strategies with greater
confidence.
4. Example: Climate models predicting rainfall patterns to support agricultural planning.
Resource Optimization:
1. Research provides data to allocate resources efficiently, avoiding waste.
2. By identifying priority areas and the most effective interventions, decision-makers
maximize the impact of limited budgets and time.
3. This supports cost-effective and targeted strategies.
4. Example: Health research showing which vaccines deliver the best public health results
for the investment.
Accountability:
1. Decisions backed by scientific research can be defended and justified transparently,
increasing accountability.
2. Stakeholders (public, investors, policy makers) can trace decisions to evidence rather
than arbitrary choices.
3. This builds trust and supports ethical governance.
4. Example: Public health authorities citing peer-reviewed data to justify pandemic
restrictions.
Types of Research
1. Basic (Pure) Research:
I. It is also called fundamental research.
II. Conducted to generate new knowledge and advance theoretical understanding
without immediate practical application.
III. Driven by curiosity and intellectual interest.
IV. Example: Studying the behavior of subatomic particles in physics.
2. Applied Research:
I. Focuses on practical problem-solving.
II. Aims to find solutions that can be directly applied to real-world challenges.
III. Often uses results from basic research to address specific needs.
IV. Example: Developing a new drug to treat a disease.
3. Descriptive Research:
I. Seeks to describe characteristics of a phenomenon or population systematically and
accurately.
II. Answers “what” questions, rather than “why” or “how.”
III. Often involves surveys, observations, or case studies.
IV. Example: Surveying the literacy rate in a rural community.
4. Analytical Research:
I. Goes beyond describing — it analyzes information to understand relationships
and causes.
II. Uses data that is already available to draw conclusions or test hypotheses.
III. Often includes critical evaluation and comparison.
IV. Example: Analyzing crime records to determine factors influencing crime rates.
5. Exploratory Research:
I. Conducted to explore a relatively unknown area or gain new insights where
little previous research exists.
II. Helps identify variables, questions, or hypotheses for future studies.
III. Flexible and open-ended in design.
IV. Example: Interviewing startup founders to explore challenges in a new industry.
6. Experimental Research:
I. Involves manipulating variables to test cause-and-effect relationships under
controlled conditions.
II. Often includes experimental and control groups.
III. Often considered the most rigorous approach for establishing cause and effect.
IV. Example: Testing the effectiveness of a new fertilizer on crop yields under controlled
field conditions.
7. Quantitative Research:
I. Involves numerical data collection and statistical analysis.
II. Seeks to measure variables and test hypotheses objectively.
III. Results are often presented in graphs, tables, or charts.
IV. Example: Using a questionnaire to measure levels of stress among students and
analyzing it statistically.
8. Qualitative Research:
I. Focuses on understanding meanings, experiences, and perspectives in a
deeper, non-numerical way.
II. Uses interviews, observations, and textual analysis.
III. Generates rich, detailed data about human behavior or culture.
IV. Example: Conducting in-depth interviews to explore how cancer patients cope
emotionally.
Research Process
Formulating the research problem
1. The first and most crucial step.
2. Involves identifying and clearly defining the issue, question, or gap the research aims
to address.
3. A well-formulated problem provides direction and focus for the entire study.
4. Should be specific, feasible, and researchable.
5. Example: Investigating the reasons for high dropout rates among rural high school
students.
Reviewing literature
1. A systematic examination of existing studies, theories, and data related to the research
topic.
2. Helps understand what is already known, where gaps exist, and how the current study can
contribute.
3. Prevents duplication of effort and builds a theoretical framework for the research.
4. Example: Reading studies on student motivation, family background, and school
resources in relation to dropout.
Developing objectives and hypotheses
1. Objectives state what the research aims to achieve, in clear and measurable terms.
a. Example: “To identify socio-economic factors influencing school dropout rates.”
2. Hypotheses are testable statements predicting relationships between variables.
a. Example: “Students from low-income families are more likely to drop out.”
3. Together, they guide the direction and scope of the research.
Designing the research (methodology)
1. Involves deciding how the research will be carried out, including:
a. Research type (qualitative, quantitative, mixed)
b. Sampling techniques
c. Tools for data collection (questionnaires, interviews, experiments, etc.)
d. Ethical considerations
e. Data analysis plan
2. A well-planned design ensures validity, reliability, and feasibility.
Collecting data
1. Executing the planned data-gathering process.
2. May involve surveys, interviews, experiments, observations, or secondary data sources.
3. Data collection must follow ethical guidelines, maintain accuracy, and protect
participants’ privacy.
4. Example: Administering a questionnaire to rural students on reasons for school leaving.
Analyzing data
1. Transforming raw data into meaningful results using statistical or qualitative analysis
methods.
2. Includes coding, organizing, testing hypotheses, identifying patterns, and interpreting
results.
3. Analysis links the data back to the research questions and objectives.
4. Findings are usually presented with tables, charts, narratives, or models.
5. Example: Calculating the percentage of dropouts linked to family income levels.
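As a minimal sketch of the percentage calculation in the example above, the Python snippet below tabulates dropout rates by family income level. The records and the function name `dropout_rate_by_group` are hypothetical and only illustrate the arithmetic.

```python
from collections import defaultdict

# Hypothetical records: (family income level, dropped out?)
records = [
    ("low", True), ("low", True), ("low", False), ("low", False),
    ("middle", True), ("middle", False), ("middle", False), ("middle", False),
    ("high", False), ("high", False),
]

def dropout_rate_by_group(rows):
    """Return the percentage of dropouts within each income level."""
    totals = defaultdict(int)
    dropouts = defaultdict(int)
    for level, dropped in rows:
        totals[level] += 1
        if dropped:
            dropouts[level] += 1
    return {level: 100.0 * dropouts[level] / totals[level] for level in totals}

rates = dropout_rate_by_group(records)
for level, pct in rates.items():
    print(f"{level}: {pct:.1f}% dropped out")
```

A table like this describes the sample only; whether the differences between groups are more than chance still has to be tested.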
Interpreting results
Reporting and presenting findings
Identification of Research Problem
Involves recognizing an area where knowledge is lacking
Should be specific, researchable, and feasible
Criteria:
o Relevance
o Clarity
o Novelty
o Practical feasibility
Formulation of Hypothesis
A tentative statement predicting the relationship between variables
Must be testable and measurable
Example: “Higher advertising expenditure increases sales.”
Types of hypotheses:
Null hypothesis (H₀): No relationship exists
Definition:
The null hypothesis is a statement that there is no effect, no difference, or no relationship
between variables. It represents the default assumption.
Purpose:
It is the hypothesis we attempt to disprove or reject through statistical testing.
Examples:
A new drug has no effect on patients:
H₀: The drug has no effect (mean difference = 0)
A coin is fair:
H₀: The probability of heads = 0.5
Alternative hypothesis (H₁): Relationship exists
Definition: The alternative hypothesis is the statement the researcher seeks to support. It
suggests that there is an effect, a difference, or a relationship.
Purpose: If the null hypothesis is rejected, the evidence is taken to support the alternative
hypothesis.
Examples:
The drug does have an effect:
H₁: The drug has an effect (mean difference ≠ 0)
The coin is biased:
H₁: Probability of heads ≠ 0.5
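The coin example can be sketched as an exact two-sided binomial test. The observed result (60 heads in 100 flips), the 0.05 significance level, and the function name `binomial_two_sided_p` are assumptions made for illustration. This is one common way to run the test, not the only one.

```python
from math import comb

def binomial_two_sided_p(heads, flips, p0=0.5):
    """Exact two-sided p-value for H0: P(heads) = p0."""
    def pmf(k):
        return comb(flips, k) * p0**k * (1 - p0)**(flips - k)

    observed = pmf(heads)
    # Sum the probabilities of all outcomes at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(flips + 1) if pmf(k) <= observed * (1 + 1e-9))

alpha = 0.05  # significance level (assumed for this illustration)
p_value = binomial_two_sided_p(heads=60, flips=100)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

Failing to reject the null hypothesis does not show that the coin is fair. It only means the data are not strong enough evidence of bias at the chosen significance level.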
Research Designs
A blueprint for conducting the study
Specifies methods of data collection, measurement, and analysis
Ensures validity and reliability of results
Types of research designs:
1. Exploratory: Flexible, preliminary studies
✅ Purpose: To explore a problem or phenomenon when there is little prior knowledge
available.
✅ Features: Flexible, open-ended, aims to generate insights, hypotheses, or new research
questions rather than testing a theory.
✅ Methods: Literature reviews, focus groups, in-depth interviews, pilot studies.
✅ Example: Studying why a new consumer trend is emerging without predefined
hypotheses.
2. Descriptive: Structured, describes characteristics
✅ Purpose: To describe characteristics of a population, situation, or phenomenon.
✅ Features: Structured, uses clear definitions and measurement tools; answers “what”
rather than “why”.
✅ Methods: Surveys, observational studies, case studies.
✅ Example: Determining the demographic profile of smartphone users in a city.
3. Experimental: Tests cause-effect relationships
✅ Purpose: To test cause-and-effect relationships under controlled conditions.
✅ Features: Manipulates independent variables and observes effects on dependent variables,
usually with random assignment to groups.
✅ Methods: Laboratory experiments, field experiments, randomized controlled trials (RCTs).
✅ Example: Testing the effectiveness of a new teaching method on student performance.
4. Diagnostic: Identifies reasons for a problem
✅ Purpose: To determine the causes of a problem or to identify solutions.
✅ Features: Seeks to diagnose an issue, its sources, and possible remedies.
✅ Methods: Root cause analysis, case analysis, stakeholder interviews.
✅ Example: Investigating why employee turnover is high in an organization.
5. Cross-sectional: Observes at a single point in time
✅ Purpose: To study a population or phenomenon at one point in time.
✅ Features: Snapshot approach, often descriptive or correlational; cannot establish causality over
time.
✅ Methods: One-time surveys, observational studies.
✅ Example: A one-time survey measuring health behaviors among teenagers in 2025.
6. Longitudinal: Observes over a period of time
✅ Purpose: To study changes and developments over time.
✅ Features: Data is collected from the same subjects repeatedly over a period, enabling analysis
of trends, patterns, and causal relationships.
✅ Methods: Panel studies, cohort studies, repeated surveys.
✅ Example: Tracking the academic progress of students from grade 1 to grade 12.
MODULE II
1. Measurement and Data Collection
Measurement is the process of assigning numbers, symbols, or labels to characteristics
or variables of phenomena according to specific rules. It is essential for converting
abstract concepts (like intelligence, satisfaction, or performance) into observable and
quantifiable data for analysis.
Objectives of Measurement
To quantify variables for comparison and statistical analysis
To ensure consistency and accuracy in research
To test hypotheses and validate theoretical models
To enable replication of research studies
Key Concepts in Measurement
Concept | Description
Variable | A characteristic or attribute that can vary among subjects (e.g., age, income, satisfaction)
Construct | An abstract concept developed for scientific study (e.g., motivation, intelligence)
Operational Definition | How a variable or construct is measured or manipulated in a specific study
Levels (Scales) of Measurement
Understanding the level of measurement helps determine the appropriate statistical tools to
use.
1. Nominal Scale
The nominal scale classifies data into distinct categories that are mutually exclusive and
without any inherent order.
Characteristics:
Only labels or names are assigned
No ranking or ordering
Used for categorical variables
Examples:
Gender: Male, Female, Other
Blood Type: A, B, AB, O
Religion: Hindu, Muslim, Christian, Sikh
Statistical Use:
Frequency counts
Mode
Chi-square test
2. Ordinal Scale
The ordinal scale involves rank-ordering items, but does not assume equal spacing between
the ranks.
Characteristics:
Categories are ordered
Intervals are not equal or known
Indicates relative position
Examples:
Customer Satisfaction: Very Satisfied > Satisfied > Neutral > Dissatisfied
Education Level: High School < Bachelor's < Master's < PhD
Military Rank: Private < Sergeant < Captain
Statistical Use:
Median
Percentiles
Non-parametric tests (e.g., Mann-Whitney U test)
3. Interval Scale
The interval scale has ordered categories with equal intervals between values, but no true
zero point.
Characteristics:
You can measure differences, but not true ratios
Zero point is arbitrary
Examples:
Temperature in Celsius or Fahrenheit (0°C ≠ no temperature)
IQ scores
Dates on a calendar
Statistical Use:
Mean and standard deviation
Correlation and regression
t-tests and ANOVA
4. Ratio Scale
The ratio scale includes all the properties of interval data plus a true zero, which allows for
meaningful ratios.
Characteristics:
Has order, equal intervals, and a true zero
Enables comparison using multiplication/division
Examples:
Weight, Height, Age
Income, Distance, Speed
Number of children, Exam scores (out of 100)
Statistical Use:
All descriptive and inferential statistics
Can compute ratios (e.g., “twice as fast”)
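The link between scale and statistic can be illustrated with a short Python sketch using the standard `statistics` module. All data values and the numeric codes for the satisfaction ratings are hypothetical.

```python
from statistics import mean, median, mode

blood_types = ["A", "O", "B", "O", "AB", "O"]   # nominal: labels only
satisfaction = [1, 2, 2, 3, 4]                  # ordinal: rank codes, spacing unknown
temps_c = [10.0, 20.0, 30.0]                    # interval: equal steps, arbitrary zero
weights_kg = [40.0, 80.0]                       # ratio: true zero

typical_blood_type = mode(blood_types)          # mode is valid for nominal data
middle_rating = median(satisfaction)            # median is valid for ordinal data
avg_temp = mean(temps_c)                        # mean is valid for interval data
weight_ratio = weights_kg[1] / weights_kg[0]    # ratios are valid for ratio data

# On an interval scale a ratio depends on the arbitrary zero point:
celsius_ratio = temps_c[1] / temps_c[0]                        # 2.0, not meaningful
kelvin_ratio = (temps_c[1] + 273.15) / (temps_c[0] + 273.15)   # about 1.035

print(typical_blood_type, middle_rating, avg_temp, weight_ratio)
print(round(celsius_ratio, 3), round(kelvin_ratio, 3))
```

The last two lines show why 20°C is not "twice as hot" as 10°C: the same two temperatures give a different ratio once the zero point changes.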
Data Collection:
Data collection is a vital part of the research process. It involves gathering accurate
and relevant information to answer research questions, test hypotheses, and analyze
relationships between variables.
Several techniques are used, depending on the type of research (qualitative or
quantitative), the nature of the data, and the research objectives. Common methods include:
o Surveys
o Interviews
o Observations
o Experiments
1. Surveys
Definition:
Surveys are a widely used method of data collection, especially in quantitative research. They
involve asking a set of standardized questions to a group of respondents to gather data about their
opinions, behaviors, characteristics, or experiences.
Tools Used:
Printed or digital questionnaires
Online survey platforms (e.g., Google Forms, SurveyMonkey)
Used For:
Large sample sizes
Gathering opinions, attitudes, behaviors, demographics
Types of Surveys
Type | Description | Use Case
Cross-Sectional | Data collected at a single point in time. | Snapshot of current opinions or behavior.
Longitudinal | Data collected from the same respondents over time. | Studying trends or changes.
Online Surveys | Web-based (e.g., Google Forms, SurveyMonkey). | Easy distribution, low cost.
Telephone Surveys | Conducted via phone calls. | Useful for quick, direct responses.
Face-to-Face Surveys | In-person interviews using a survey format. | Higher response rates, more detailed answers.
Mail Surveys | Paper surveys sent and returned by mail. | Useful for specific populations (e.g., elderly).
Advantages:
Cost-effective for large populations
Standardized data
Easy to administer and analyze statistically
Can reach large and diverse populations
Limitations:
Responses may lack depth
Risk of non-response or biased answers
Misinterpretation of questions
Social desirability bias
2. Interviews
Definition:
An interview is a method of collecting data through direct, personal conversation between the
researcher and the respondent in order to gather information, insights, or opinions on a particular
topic.
Key Features of Interviews
1. Direct Interaction – Face-to-face, phone, or video communication.
2. Flexible Structure – Can range from structured (fixed questions) to unstructured (open-
ended discussion).
3. Depth of Data – Allows for in-depth understanding of experiences, motivations, and
perspectives.
Types of Interviews
Type | Description | Use Case
Structured | Pre-determined questions, fixed order. | Surveys, large sample studies.
Semi-Structured | Guided questions with flexibility to explore topics. | Most common in qualitative research.
Unstructured | Open conversation with no fixed questions. | Exploratory studies, ethnographic work.
Group Interviews / Focus Groups | Interviewing multiple participants simultaneously. | Gathering varied perspectives in social contexts.
Telephone/Online Interviews | Remote, often recorded for transcription. | When in-person is not feasible.
Used For:
In-depth insights into beliefs, feelings, and motivations
Qualitative research and case studies
Advantages:
Rich, detailed information
Opportunity to clarify and probe further
Flexibility in exploring new topics
Limitations:
Time-consuming and costly
Requires skilled interviewer
May introduce interviewer bias
Difficult to analyze large volumes of qualitative data
3. Observations
Observation as a data collection method involves systematically watching, listening to, and
recording behaviors, events, or conditions in their natural setting. It is commonly used in
qualitative research, though it can also be structured for quantitative purposes.
Key Features of Observations
1. Direct Data Collection – No reliance on participants’ self-reports.
2. Contextual Insight – Captures behaviors as they occur in real environments.
3. Flexible or Structured – Can be open-ended or follow a checklist.
Types of Observations
Type | Description | Use Case
Structured | Uses a checklist or coding system. | Behavior studies, classroom monitoring.
Unstructured | Open-ended, descriptive notes. | Ethnographic research, exploratory studies.
Participant Observation | Researcher becomes part of the group. | Social sciences, anthropology.
Non-Participant Observation | Observer remains separate. | Objective recording of public behavior.
Naturalistic | Conducted in real-life settings. | Studying subjects in their environment.
Controlled | Conducted in a controlled environment (e.g., lab). | Experiments, usability testing.
Used For:
Studying behavior, social interactions, physical settings
Ethnographic and psychological studies
Advantages:
Real-time data in natural context
Non-verbal behavior can be recorded
Captures actual behavior
Useful when participants may not be fully aware of their actions
Limitations:
Observer bias
Limited access to internal thoughts or motivations
Ethical concerns in covert observation
Time-consuming and labor-intensive
Behavior may change if people know they are being watched (Hawthorne effect)
4. Experiments
Definition:
Experiments are a structured method of data collection used primarily in quantitative research
to test hypotheses by manipulating one or more variables and observing the effect on other
variables. They are especially common in scientific, psychological, and social research.
Key Features of Experiments
1. Controlled Environment – Researchers manipulate variables in a systematic way.
2. Causal Inference – Allows determination of cause-and-effect relationships.
3. Randomization – Participants are often randomly assigned to groups to reduce bias.
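Random assignment, as described in point 3, can be sketched in a few lines of Python. The participant IDs, the equal group sizes, and the function name `assign_groups` are assumptions made for illustration.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into treatment and control groups."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy so the original list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical IDs P01..P20
groups = assign_groups(participants, seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Because chance alone decides who lands in which group, known and unknown participant characteristics tend to balance out across groups.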
Types of Experiments
Type | Description | Use Case
Laboratory Experiment | Conducted in a controlled, artificial setting. | Psychology, biology, product testing.
Field Experiment | Conducted in a real-world setting. | Marketing, education, workplace studies.
Natural Experiment | Researcher does not manipulate variables; they occur naturally. | Policy impact, environmental studies.
Quasi-Experiment | Lacks random assignment to groups. | When randomization is not feasible or ethical.
True Experiment | Involves manipulation, control group, and random assignment. | Gold standard for testing causality.
Used For:
Testing causal relationships between variables
Hypothesis-driven research
Advantages:
High internal validity (control over variables)
Can determine cause-and-effect relationships
Replicable and often statistically rigorous
Limitations:
Artificial settings may reduce external validity
Ethical constraints
Often expensive and complex
Can be time-consuming
Participant behavior may be influenced by awareness of the experiment (demand
characteristics)
Summary Table
Technique | Data Type | Strengths | Limitations
Survey | Quantitative | Cost-effective, large samples | Superficial answers, bias possible
Interview | Qualitative | Deep insights, flexible | Time-consuming, risk of bias
Observation | Both | Real-time, behavioral data | Limited to observable phenomena
Experiment | Quantitative | Causality, control | May lack real-world relevance
2. Primary Data
Primary data is data collected directly from first-hand sources for a specific research purpose.
It is original, raw, and has not been previously published or interpreted by others.
Characteristics
Original – Collected by the researcher directly from the source.
Specific – Tailored to a particular research question or objective.
Current – Reflects recent or real-time information.
Controlled – Researcher has control over how data is collected (tools, timing, sample,
etc.).
Common Methods of Collecting Primary Data
Method | Description
Surveys/Questionnaires | Written or digital forms with structured questions.
Interviews | Verbal interaction (structured, semi-structured, or unstructured).
Observations | Watching and recording behavior or events in real-time.
Experiments | Controlled setups to test hypotheses by manipulating variables.
Focus Groups | Guided group discussions to gather opinions or insights.
Field Notes / Diaries | Self-recorded experiences or observations by participants or researchers.
Advantages
Relevant to the specific study
Greater accuracy and reliability
Researcher controls data quality
Timely and up-to-date information
Disadvantages
Time-consuming to collect
Can be expensive (especially for large samples)
May require trained personnel and special tools
Limited in scope (compared to large datasets available as secondary data)
Examples of Primary Data
A student conducting a survey of 100 classmates about study habits
A company observing customer behavior in a store
An experiment testing the effectiveness of a new drug
A researcher interviewing local farmers about climate change impacts
3. Secondary Data
Secondary data is data that has already been collected, processed, and
published by someone else, often for a purpose different from your own research, and
is then reused for a new study.
Characteristics
Pre-existing – Collected by another person, organization, or researcher.
Easily Accessible – Often found in reports, databases, journals, or websites.
Cost-effective – Saves time and resources compared to collecting new (primary) data.
Broad Scope – Often covers large samples or long time spans.
Common Sources of Secondary Data
Source Type | Examples
Government Reports | Census data, labor statistics, health records
Academic Research | Journal articles, theses, dissertations
Institutional Databases | World Bank, WHO, IMF, OECD
Commercial Sources | Market research firms, company reports
Online Content | News archives, social media, blogs (used with caution)
Libraries and Archives | Historical documents, newspapers, records
Advantages
Saves time and cost
Data may come from large, reliable sources
Allows for longitudinal or trend analysis
Can complement or validate primary data
Disadvantages
May not be specific to your research question
Could be outdated or incomplete
Risk of bias or data quality issues
Limited control over how data was collected
Examples of Secondary Data
Using UNICEF reports for child health statistics
Analyzing past academic studies on climate change
Referencing market trends from a commercial research firm
Examining crime statistics from government databases
Primary vs. Secondary Data
Aspect | Primary Data | Secondary Data
Source | Collected directly | Already collected
Cost & Time | High | Low
Specificity | Custom to your study | May be general
Control | Full control | No control over collection
Accuracy | Can be high if collected well | Varies depending on source
4. Design of Questionnaire
A questionnaire is a structured set of questions used to collect data from respondents. It’s
commonly used in surveys for both qualitative and quantitative research.
Elements of good questionnaire design:
o Clear objectives
✅ The questionnaire should directly serve the research objectives.
✅ Questions must be relevant to what you want to measure, avoiding unnecessary
items.
✅ Helps keep the questionnaire focused and concise.
Example: If studying customer satisfaction, do not include irrelevant questions
about unrelated products.
o Simple and unambiguous language
✅ Use clear, everyday language that is easily understood by the target audience.
✅ Avoid jargon, technical terms (e.g., SQL, NoSQL, ETL, APIs, or cloud tools such as
AWS), and complex sentences.
✅ Helps reduce misinterpretation of questions.
Example: Instead of “What is your perception of the efficacy of our service
delivery?” ask “How satisfied are you with our service?”
o Logical sequence
✅ Arrange questions in a logical, natural flow.
✅ Start with easy and non-sensitive questions to build rapport (mutual sense of trust).
✅ Group similar topics together to maintain respondent focus.
✅ Place sensitive or personal questions later, after trust has been established.
Example sequence: demographics → general behavior → specific opinions → sensitive
topics.
o Pre-testing (pilot testing)
✅ Test the questionnaire with a small sample of respondents before the actual survey.
✅ Helps identify confusing questions, unclear wording, or missing response options.
✅ Allows improvements in design and flow.
✅ Minimizes errors during full data collection.
o Balanced and unbiased questions
✅ Questions should be neutral, avoiding leading language that could push respondents to
a particular answer.
✅ Provide balanced options for responses.
✅ Avoid emotionally loaded words or one-sided framing.
Example: Instead of “Don’t you think our service is excellent?” ask “How would you
rate the quality of our service?”
5. Sampling Fundamentals and Sample Designs
Sampling is the process of selecting a subset (sample) from a larger group (population) to
represent the whole and to estimate its characteristics.
Instead of surveying every individual in a population (which is often costly or
impossible), sampling allows you to draw conclusions from a manageable number of
observations.
Sampling Fundamentals:
1. Population
The entire set of items or individuals of interest in a study.
Can be finite (e.g., students in a school) or infinite (e.g., all possible repeated rolls of a
die).
Example: All voters in a country.
2. Sample
A portion of the population selected for analysis.
Should ideally be representative of the whole population.
Example: 1,000 voters selected randomly to predict election outcomes.
3. Sampling Frame
A list or source containing all the elements of the population from which the sample is
drawn.
Example: A city’s registered voter list.
Must be as complete and current as possible to avoid sampling bias.
4. Sampling Unit
A single element or group of elements considered for selection.
Could be an individual, household, company, etc.
Example: One household in a housing survey.
5. Sample Size
The number of sampling units selected from the population.
Larger samples generally give more precise estimates but cost more.
Sample size is influenced by:
o Desired confidence level
o Margin of error
o Population variability
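How these three factors combine can be sketched with a commonly used formula for estimating a proportion in a large population, n = z² · p(1 − p) / e². The formula is not taken from these notes. The function name, the z-score table, and the default values are assumptions made for illustration.

```python
from math import ceil

# Approximate z-scores for common confidence levels
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def sample_size_for_proportion(confidence=0.95, margin_of_error=0.05, p=0.5):
    """Minimum sample size to estimate a proportion in a large population.

    p is the expected proportion; p = 0.5 is the most variable (worst) case.
    """
    z = Z_SCORES[confidence]
    return ceil(z**2 * p * (1 - p) / margin_of_error**2)

print(sample_size_for_proportion())                      # 95% confidence, ±5%
print(sample_size_for_proportion(margin_of_error=0.03))  # tighter margin of error
print(sample_size_for_proportion(p=0.1))                 # less variable population
```

A higher confidence level, a smaller margin of error, or greater variability each raises the required sample size.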
6. Sampling Error
The difference between the sample result and the true population value.
Occurs naturally because only part of the population is observed.
Reduced by increasing sample size and using probability sampling.
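One common way to quantify sampling error for a sample mean is the standard error, σ / √n. The sketch below assumes simple random sampling from a large population; the standard deviation of 10 is hypothetical.

```python
from math import sqrt

def standard_error(std_dev, n):
    """Standard error of a sample mean under simple random sampling."""
    return std_dev / sqrt(n)

# Quadrupling the sample size halves the standard error.
errors = {n: standard_error(10.0, n) for n in (25, 100, 400)}
for n, se in errors.items():
    print(f"n = {n:3d}: standard error = {se}")
```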
7. Non-Sampling Error
Errors not related to the act of sampling itself.
Examples:
o Data entry mistakes
o Misinterpretation of survey questions
o Non-response bias
8. Representativeness
A sample is representative if it reflects the key characteristics of the population.
Representativeness is critical for making valid inferences.
Sample Designs
Sample design refers to the plan or strategy used to select a sample from a population.
It determines how, from where, and how many units will be selected to ensure the
sample is representative and suitable for the research objectives.
Objectives of a Good Sample Design
A well-designed sample should:
Be representative of the population
Minimize bias
Be efficient in terms of time and cost
Allow valid statistical inferences
Be simple and practical to implement
Types of Sample Designs
Sample designs are broadly categorized into two main types:
o Probability sampling (e.g., simple random, stratified, cluster, systematic)
o Non-probability sampling (e.g., convenience, purposive, quota, snowball)
1. Probability Sampling Designs
Every unit in the population has a known and non-zero chance of being selected.
Sampling Method | Description | Use Case
Simple Random Sampling | Every unit has an equal chance of selection. | Small, well-defined populations
Systematic Sampling | Select every kth unit after a random start. | Manufacturing, quality control
Stratified Sampling | Divide the population into subgroups (strata), then sample from each. | Surveys needing demographic representation
Cluster Sampling | Divide population into clusters, then randomly choose clusters. | Wide geographic studies
Multistage Sampling | Combines two or more sampling methods (e.g., clusters → individuals). | Large-scale national or educational studies
1. Simple Random Sampling
Definition:
Every member of the population has an equal and independent chance of being selected.
Working Principle:
Use a random number generator, lottery method, or software to select individuals.
Requires a complete list (sampling frame) of the population.
Example:
From a list of 1,000 students, randomly select 100 to take a survey.
Advantages:
Minimizes selection bias
Easy to analyze statistically
Disadvantages:
Not feasible for large populations without a complete list
May not represent subgroups proportionally
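The student example can be sketched in Python with the standard `random` module. The list of student IDs is a hypothetical sampling frame, and the fixed seed is used only so the example is repeatable.

```python
import random

# Hypothetical sampling frame: student IDs 1..1000
students = list(range(1, 1001))

rng = random.Random(42)                # fixed seed only for repeatability
sample = rng.sample(students, k=100)   # draw 100 students without replacement

print(len(sample), len(set(sample)))   # 100 100 (no student is chosen twice)
```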
2. Systematic Sampling
Definition:
Select every kᵗʰ element from a list after randomly selecting a starting point.
Working Principle:
Determine the sampling interval k = N/n (where N is the population size and n is the
sample size)
Randomly choose a starting point between 1 and k
Select every kᵗʰ individual from the list
Example:
From a list of 1,000 households, if a sample of 100 is needed, choose every 10ᵗʰ household after
a random start between 1 and 10.
Advantages:
Simple and quick
Ensures evenly spread sample
Disadvantages:
Can lead to bias if there's a hidden pattern in the population list
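The working principle above can be sketched as follows. The helper name `systematic_sample` and the household list are assumptions made for illustration.

```python
import random


def systematic_sample(frame, n, rng=None):
    """Select every k-th unit after a random start, with k = N // n."""
    rng = rng or random.Random()
    k = len(frame) // n            # sampling interval
    start = rng.randint(1, k)      # random start between 1 and k (inclusive)
    # Convert the 1-based start to a 0-based index, then step through the list by k
    return frame[start - 1::k][:n]


households = list(range(1, 1001))  # hypothetical list of 1,000 households
sample = systematic_sample(households, n=100, rng=random.Random(7))
print(len(sample), sample[1] - sample[0])  # 100 10
```

If the list has a periodic pattern whose period matches k (for example, every 10th household is a corner house), this procedure picks up that pattern, which is the bias noted above.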
3. Stratified Sampling
Definition:
Divide the population into homogeneous subgroups (strata) and randomly sample from each
subgroup.
Working Principle:
Identify key subgroups (e.g., age, gender, income)
Perform simple random sampling within each subgroup
Example:
In a school with 60% females and 40% males, to sample 100 students, randomly select 60
females and 40 males.
Advantages:
Ensures representation of all key subgroups
Greater precision than simple random sampling
Disadvantages:
Requires knowledge of population structure
More complex to organize
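A minimal sketch of proportional stratified sampling, using the school example above. The function name `stratified_sample` and the ID lists are hypothetical; with other stratum sizes, rounding can make the total differ slightly from n.

```python
import random


def stratified_sample(strata, n, rng=None):
    """Proportional allocation: each stratum contributes in proportion to its size."""
    rng = rng or random.Random()
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)     # share of the sample for this stratum
        sample[name] = rng.sample(units, n_h)   # simple random sample within the stratum
    return sample


# Hypothetical school of 1,000 students: 60% female, 40% male
strata = {
    "female": [f"F{i}" for i in range(600)],
    "male": [f"M{i}" for i in range(400)],
}
sample = stratified_sample(strata, n=100, rng=random.Random(0))
print({name: len(units) for name, units in sample.items()})  # {'female': 60, 'male': 40}
```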
4. Cluster Sampling
Definition:
Divide the population into clusters, then randomly select entire clusters for the sample.
Working Principle:
Clusters are often naturally occurring groups (e.g., schools, neighborhoods)
Randomly select some clusters
Include all members of the selected clusters, or sample within them
Example:
Randomly choose 5 schools out of 50, and survey all students in those 5 schools.
Advantages:
Cost-effective and practical for large, spread-out populations
Easier to implement when full population list is unavailable
Disadvantages:
Higher sampling error when members within a cluster are similar to each other and clusters differ from one another
Less precision than stratified or simple random sampling
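The school example can be sketched as one-stage cluster sampling. The school names and class sizes below are made up for illustration.

```python
import random

rng = random.Random(1)  # fixed seed only for repeatability

# Hypothetical population: 50 schools, each with 20 to 40 students
schools = {
    f"school_{i}": [f"s{i}_{j}" for j in range(rng.randint(20, 40))]
    for i in range(1, 51)
}

chosen = rng.sample(sorted(schools), k=5)  # randomly choose 5 clusters
# Survey every student in the chosen schools
sample = [student for school in chosen for student in schools[school]]

print(chosen)
print(len(sample))
```

Only the list of schools is needed up front; student lists are required just for the 5 selected schools.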
5. Multistage Sampling
Definition:
A complex form of cluster sampling, where sampling is done in multiple stages, often
combining several techniques.
Working Principle:
Stage 1: Randomly select clusters
Stage 2: Randomly sample individuals within those clusters
Example:
Stage 1: Randomly select districts →
Stage 2: Within each district, randomly select schools →
Stage 3: Within each school, randomly select students
Advantages:
Efficient for large, diverse populations
Flexible and cost-effective
Disadvantages:
Increased complexity
Higher potential for sampling error
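The three-stage example above (districts, then schools, then students) can be sketched as nested random draws. The frame sizes and the number of units taken at each stage are assumptions chosen for illustration.

```python
import random

rng = random.Random(3)  # fixed seed only for repeatability

# Hypothetical nested frame: 10 districts x 8 schools x 30 students
frame = {
    f"district_{d}": {
        f"school_{d}_{s}": [f"student_{d}_{s}_{i}" for i in range(30)]
        for s in range(8)
    }
    for d in range(10)
}

sample = []
for district in rng.sample(sorted(frame), k=3):                   # stage 1: districts
    for school in rng.sample(sorted(frame[district]), k=2):       # stage 2: schools
        sample.extend(rng.sample(frame[district][school], k=5))   # stage 3: students

print(len(sample))  # 30 = 3 districts x 2 schools x 5 students
```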
2. Non-Probability Sampling Designs
The chance of selecting any given unit is unknown, and some units may have no chance of being selected.
Sampling Method | Description | Use Case
Convenience Sampling | Selects the most accessible subjects (e.g., people nearby). | Pilot studies, quick surveys
Judgmental Sampling | Selection based on researcher’s judgment or expertise. | Expert panels, market research
Quota Sampling | The population is divided into specific subgroups, and a predetermined number (quota) of subjects is selected from each subgroup. | Opinion polls, market research, and social sciences when time or resources are limited
Snowball Sampling | Participants recruit other participants (useful for hidden populations). | Social networks, drug use research
1. Convenience Sampling
Definition:
A sampling method where subjects are selected based on their easy availability and willingness
to participate.
Working Principle:
The researcher selects whoever is easiest to reach.
No effort is made to ensure the sample is representative.
Example:
A college student surveys classmates in the cafeteria because they are easily accessible.
Advantages:
Fast and inexpensive
Easy to implement
Disadvantages:
High risk of bias
Results are not generalizable to the broader population
2. Judgmental Sampling (Purposive Sampling)
Definition:
The researcher intentionally selects individuals who are most appropriate or relevant to the
study.
Working Principle:
Selection is based on the researcher’s expert judgment about who will provide the best
information.
Example:
A medical researcher selects only doctors with 10+ years of experience to study opinions on a
new treatment.
Advantages:
Focuses on knowledge-rich individuals
Useful in qualitative or exploratory research
Disadvantages:
Subjective and prone to bias
Cannot generalize findings to a broader population
3. Quota Sampling
Definition:
A method where the population is divided into subgroups, and a specific number (quota) is
filled for each subgroup without using random selection.
Working Principle:
Identify key characteristics (e.g., age, gender)
Set quotas for each subgroup
Select individuals non-randomly until quotas are met
Example:
Survey 100 people: 50 men and 50 women. The researcher interviews people until those
numbers are reached, based on convenience.
Advantages:
Ensures representation of specific subgroups
Faster and cheaper than stratified random sampling
Disadvantages:
Non-random selection can lead to sampling bias
Less reliable than probability sampling
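A minimal sketch of how quotas are filled without random selection. The function name `quota_sample` and the stream of passers-by are hypothetical; people are accepted simply in the order they are encountered.

```python
def quota_sample(stream, quotas, key):
    """Accept people in the order encountered until every quota is full."""
    remaining = dict(quotas)
    sample = []
    for person in stream:
        group = key(person)
        if remaining.get(group, 0) > 0:   # still room in this subgroup
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):   # all quotas met
            break
    return sample


# Hypothetical passers-by: every third person is a woman
passers_by = [{"id": i, "gender": "M" if i % 3 else "F"} for i in range(1, 301)]
sample = quota_sample(passers_by, {"M": 50, "F": 50}, key=lambda p: p["gender"])
print(len(sample))  # 100
```

The quotas are met exactly, but the people chosen are just the first ones encountered, which is where the sampling bias noted above comes from.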
4. Snowball Sampling
Definition:
A method where existing study subjects recruit future subjects from among their social
networks.
Working Principle:
Start with a few known individuals (seeds)
Ask them to refer others who meet the criteria
The sample “snowballs” as more people are referred
Example:
To study drug use in a hidden population, a researcher starts with a known user and asks them to
refer other users.
Advantages:
Useful for hard-to-reach or hidden populations
Cost-effective for rare subjects
Disadvantages:
Sample may be homogeneous (limited diversity)
Introduces referral bias
No control over the sampling frame
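The referral process can be sketched as a breadth-first walk over a referral network. The network, the seed, and the function name `snowball_sample` are made up for illustration.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Recruit breadth-first: each participant refers contacts until max_size is reached."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:       # do not recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return sample


# Hypothetical referral network; G and H are not connected to the seed
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"], "G": ["H"]}
print(snowball_sample(referrals, seeds=["A"], max_size=5))  # ['A', 'B', 'C', 'D', 'E']
```

People outside the seed's network (G and H here) can never enter the sample, which illustrates the referral bias listed above.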
Key Differences
Feature Probability Sampling Non-Probability Sampling
Selection Method Randomized Non-random
Representative? Yes (more likely) Not guaranteed
Bias Risk Low High
Cost and Time Often higher Usually lower
Statistical Validity High Limited
Example Scenario: Student Satisfaction Survey
Step Example
Population All students in a university
Sampling Frame List of enrolled students
Sample Size 300 students
Sampling Design Stratified sampling by department
Sampling Unit Each student
6. Measurement and Scaling Techniques
Measurement
Measurement is the process of assigning numbers or symbols to objects, events, or
characteristics according to specific rules, to represent quantities or qualities of attributes.
For example: measuring customer satisfaction, intelligence, or brand preference.
Scaling
Scaling is the process of creating a continuum on which measured objects or respondents are
placed. It involves developing rules to quantify abstract concepts (like attitudes or
perceptions).
Objectives of Measurement & Scaling
Quantify subjective data (e.g., opinions, preferences)
Enable comparison and analysis
Support decision-making
Ensure consistency and reliability in data collection
Levels of Measurement (Scales of Measurement)
There are four primary types of measurement scales:
Level | Description | Example
Nominal | Categorizes data without any order | Gender (male/female), colors
Ordinal | Ranks data but doesn’t show exact difference | Satisfaction (satisfied > neutral > dissatisfied)
Interval | Numeric scales with equal intervals, no true zero | Temperature in Celsius, IQ scores
Ratio | Like interval, but has a meaningful zero | Age, weight, income, height
Types of Scaling Techniques
Scaling techniques are methods used to construct scales for measuring variables. They fall into
two broad categories: Comparative and Non-Comparative.
A. Comparative Scaling Techniques
Involve comparing one item directly with another. Respondents are asked to evaluate one object
relative to another, rather than in isolation. These methods are often used in marketing research,
psychology, and surveys where preference or priority needs to be identified.
Type | Description | Example
Paired Comparison | Respondents choose between two items at a time. | Coke vs. Pepsi
Rank Order | Rank items in order of preference. | Ranking 5 brands of smartphones
Constant Sum | Allocate a constant total (e.g., 100 points) among items. | Allocate 100 points among brands by preference
Q-Sort Scaling | Sort items into predefined categories. | Sorting personality traits into “most/least like me”
1. Paired Comparison Scaling
Definition: Respondents are presented with two items at a time and asked to select one, based
on a certain criterion (e.g., preference, importance, quality).
Example:
You are shown these pairs:
Tea vs Coffee → You choose Coffee
Coffee vs Juice → You choose Coffee
Tea vs Juice → You choose Juice
From this, a preference order can be inferred.
🔹 Characteristics:
Easy for respondents (only two items at a time)
Requires multiple comparisons (n(n-1)/2 for n items)
Good for small sets (3–10 items)
🔹 Advantages:
Simple and intuitive
Reduces decision complexity
🔹 Disadvantages:
Not practical with large item sets (time-consuming)
Can lead to inconsistent choices
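The Tea/Coffee/Juice example can be worked through in a few lines: list all n(n-1)/2 pairs, count how often each item wins, and sort by wins. This is one simple way to infer an order, shown only as a sketch.

```python
from collections import Counter
from itertools import combinations

items = ["Tea", "Coffee", "Juice"]
pairs = list(combinations(items, 2))  # all n(n-1)/2 pairs

# Winner chosen for each pair, taken from the example above
choices = {
    ("Tea", "Coffee"): "Coffee",
    ("Coffee", "Juice"): "Coffee",
    ("Tea", "Juice"): "Juice",
}

wins = Counter({item: 0 for item in items})
wins.update(choices.values())         # count how many pairs each item won
ranking = sorted(items, key=lambda item: -wins[item])

print(len(pairs), ranking)  # 3 ['Coffee', 'Juice', 'Tea']
```

With 10 items the same code would generate 45 pairs, which shows why the method becomes impractical for large item sets.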
2. Rank Order Scaling
🔹 Definition: Respondents are shown all items simultaneously and asked to rank them from
most to least preferred (or vice versa).
🔹 Example:
Rank the following from most preferred (1) to least preferred (4):
1. Tea
2. Coffee
3. Juice
4. Soda
You might respond:
1 = Coffee
2 = Juice
3 = Tea
4 = Soda
🔹 Characteristics:
Provides ordinal data (not interval)
Allows for comparative insights
🔹 Advantages:
Efficient even for moderate number of items
Reveals full preference order
🔹 Disadvantages:
No indication of how much more one item is preferred than another
Can be cognitively demanding if the list is long
3. Constant Sum Scaling
🔹 Definition: Respondents allocate a fixed number of points (e.g., 100) across a set of items
to indicate the relative importance or preference.
🔹 Example:
Distribute 100 points among the following based on how important they are in choosing a drink:
Taste
Price
Availability
Brand
You might respond:
Taste: 40
Price: 30
Availability: 20
Brand: 10
🔹 Characteristics:
Provides ratio-level data
Reveals relative weight of each attribute
🔹 Advantages:
More precise than simple ranking
Useful in resource allocation and priority setting
🔹 Disadvantages:
Requires more effort
Some respondents may struggle with allocating exactly 100 points
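A short sketch of how a constant sum response might be checked and converted to relative weights. The function name `constant_sum_weights` is hypothetical; the allocation is the one from the example above.

```python
def constant_sum_weights(allocation, total=100):
    """Validate that the points add up to the total, then return relative weights."""
    allocated = sum(allocation.values())
    if allocated != total:
        raise ValueError(f"points add up to {allocated}, expected {total}")
    return {item: points / total for item, points in allocation.items()}


response = {"Taste": 40, "Price": 30, "Availability": 20, "Brand": 10}
print(constant_sum_weights(response))
# {'Taste': 0.4, 'Price': 0.3, 'Availability': 0.2, 'Brand': 0.1}
```

Because the points share a common total, statements such as "Taste is weighted four times as heavily as Brand" are meaningful.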
4. Q-Sort Scaling
🔹 Definition: Respondents sort a set of statements or items into predefined categories (often a
quasi-normal distribution), based on how much they agree, prefer, or relate.
🔹 Example:
Given 10 product features, you might be asked to sort them into 5 categories:
Most preferred (2 items)
Preferred (2 items)
Neutral (2 items)
Less preferred (2 items)
Least preferred (2 items)
🔹 Characteristics:
Forces a distribution of preferences
Often used in psychology and user experience studies
🔹 Advantages:
Reduces central tendency bias
Good for comparative studies involving subjective judgments
🔹 Disadvantages:
May feel artificial or restrictive
Not suitable for all types of stimuli or populations
Summary Table:
Technique | Type of Data | Comparison Style | Best Use Case
Paired Comparison | Ordinal | One pair at a time | Simple preference tests
Rank Order | Ordinal | Rank all items | When full preference order is needed
Constant Sum | Ratio | Allocate fixed points | Prioritizing or weighting factors
Q-Sort Scaling | Ordinal (quasi-normal) | Categorized sorting | Subjective traits or attitudes
B. Non-Comparative Scaling Techniques
Non-comparative scaling techniques involve evaluating a single object (product, service, idea,
etc.) on its own, rather than in comparison to others. This approach is also called monadic
scaling.
Common Types of Non-Comparative Scaling Techniques
1. Continuous Rating Scale (Graphic Rating Scale)
Description:
Respondents mark their opinion on a continuous line between two extreme points.
Example:
Indicate your satisfaction with our service:
Very Unsatisfied ──────────────|────────────── Very Satisfied
Advantages:
Fine-grained feedback
Visual and intuitive
Disadvantages:
Manual scoring can be difficult without software
Subjective interpretation of where to mark
2. Itemized Rating Scales
These are discrete, pre-defined categories used to rate an object.
a) Likert Scale
Measures level of agreement/disagreement
Usually has 5 or 7 points
Example:
"The product is easy to use."
Strongly Disagree
Disagree
Neutral
Agree
Strongly Agree
Data Level: Ordinal (often treated as interval in practice)
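As an illustration, Likert responses are usually coded as numbers before analysis. The coding 1 to 5 and the responses below are assumptions for this sketch; since the data are ordinal, the median and mode are the safer summaries.

```python
from statistics import median, mode

# Conventional 5-point coding (an assumption, not prescribed by the scale itself)
LIKERT = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

# Hypothetical responses to "The product is easy to use."
responses = ["Agree", "Strongly Agree", "Neutral", "Agree", "Disagree", "Agree", "Strongly Agree"]
scores = [LIKERT[r] for r in responses]

print(scores)                        # [4, 5, 3, 4, 2, 4, 5]
print(median(scores), mode(scores))  # 4 4
```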
b) Semantic Differential Scale
Uses bipolar adjectives (e.g., good–bad, fast–slow)
Respondents mark on a 5 or 7-point scale
Example:
Rate our delivery speed:
Fast ◀─────●─────▶ Slow
Data Level: Interval
c) Numerical Rating Scale
Respondents rate items with numbers (e.g., 1 to 10)
Example:
Rate your satisfaction on a scale from 1 to 10.
Data Level: Interval
3. Stapel Scale
Description:
A single adjective is placed in the center of a scale ranging from +5 to –5, without a
neutral zero.
Example:
Rate the staff's friendliness
+5 (Extremely Friendly) to –5 (Extremely Unfriendly)
+5
+4
+3
+2
+1
Friendliness
-1
-2
-3
-4
-5
Data Level: Interval
Advantages:
Compact and easy to administer
Can be used where bipolar adjectives don't make sense
Disadvantages:
Less intuitive for some respondents
Summary Table
Scale Type Format Data Level Typical Use
Graphic (Continuous) Scale Mark on a line Interval Satisfaction, pain levels
Likert Scale Agreement levels Ordinal Attitudes, opinions
Semantic Differential Bipolar adjectives Interval Brand image, perception
Numerical Rating 1–10 or 1–5 scale Interval Satisfaction, quality
Stapel Scale +5 to –5 scale Interval Attitude measurement
Applications of Non-Comparative Scaling
Marketing research (customer satisfaction, product evaluation)
Psychology (attitudes, behavior measurement)
Service quality studies
Brand perception analysis
UX/UI testing and feedback
Advantages
Simple to administer and analyze
Doesn’t overwhelm respondents with comparisons
Generates absolute values, useful for benchmarking
Limitations
Does not show relative preferences among items
Potential bias due to respondents' interpretation of scales
Assumes equal intervals in some types, which may not be accurate
7. Data Processing
Data processing is the process of collecting and manipulating data to produce
meaningful information. It converts raw, unorganized data into a usable format through a
sequence of operations.
Stages of Data Processing
Data processing typically follows a systematic, multi-stage flow. Each stage plays a crucial role
in ensuring data is accurate, useful, and ready for analysis or decision-making.
1. Data Collection
Purpose: Gather raw data from various sources.
Sources can include:
Surveys or questionnaires
IoT devices and sensors
Databases or data lakes
Web scraping
Government datasets (e.g., census data)
Business transactions
Importance: The quality of your output depends on the quality of your input ("Garbage in,
garbage out").
2. Data Preparation (Data Cleaning)
Purpose: Make raw data usable by detecting and correcting errors or inconsistencies.
Common tasks:
Handling missing or null values
Removing duplicates
Identifying and handling outliers
Correcting typos and mislabels
Normalizing or standardizing data formats
Converting data types (e.g., string to date)
Goal: Ensure the dataset is accurate, consistent, and structured.
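Several of the cleaning tasks above can be sketched with the Python standard library alone. The records and field names are invented for illustration; in practice a library such as Pandas is often used for the same steps.

```python
from datetime import datetime

# Hypothetical raw survey records, all stored as strings
raw = [
    {"id": "1", "name": " Alice ", "joined": "2023-01-15", "age": "34"},
    {"id": "2", "name": "bob", "joined": "2023-02-01", "age": ""},
    {"id": "1", "name": " Alice ", "joined": "2023-01-15", "age": "34"},  # duplicate
]


def clean(rows):
    seen, cleaned = set(), []
    for row in rows:
        if row["id"] in seen:  # remove duplicates by id
            continue
        seen.add(row["id"])
        cleaned.append({
            "id": int(row["id"]),                                           # string to integer
            "name": row["name"].strip().title(),                            # standardize format
            "joined": datetime.strptime(row["joined"], "%Y-%m-%d").date(),  # string to date
            "age": int(row["age"]) if row["age"] else None,                 # mark missing values
        })
    return cleaned


cleaned = clean(raw)
print(len(cleaned), cleaned[1]["name"], cleaned[1]["age"])  # 2 Bob None
```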
3. Data Input
Purpose: Feed cleaned data into a system for processing.
Methods:
Manual input (e.g., forms)
Automated loading (e.g., using scripts, APIs (Application Programming Interfaces), or
ETL (Extract → Transform → Load) tools)
File imports (e.g., CSV (Comma-Separated Values), Excel, JSON (JavaScript Object
Notation), SQL (Structured Query Language))
Goal: Ensure data is correctly stored in databases or data pipelines for further use.
4. Data Processing
Purpose: Perform operations that convert raw input into useful outputs.
Techniques vary by use case and may include:
Sorting and filtering
Aggregation (e.g., sum, average)
Transformation (e.g., normalization, encoding)
Joining datasets
Applying algorithms or statistical models
This is the "engine room" where value is created.
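A small sketch of filtering, aggregating, joining, and sorting in plain Python. The order and customer records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical cleaned inputs
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 35.5},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 15.0},
]
customers = {1: "Asha", 2: "Ben", 3: "Chen"}

# Filter: keep orders of at least 30
large = [o for o in orders if o["amount"] >= 30]

# Aggregate: total amount per customer
totals = defaultdict(float)
for o in large:
    totals[o["customer_id"]] += o["amount"]

# Join with customer names, then sort by total (largest first)
report = sorted(
    ((customers[cid], total) for cid, total in totals.items()),
    key=lambda row: row[1],
    reverse=True,
)
print(report)  # [('Asha', 200.0), ('Ben', 35.5)]
```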
5. Data Storage
Purpose: Save processed data for retrieval, analysis, and future use.
Storage systems include:
Relational databases (e.g., MySQL, PostgreSQL)
Data warehouses (e.g., Amazon Redshift, Snowflake)
Cloud storage (e.g., AWS S3, Google Cloud Storage)
NoSQL databases (e.g., MongoDB, Cassandra)
Importance: Well-organized storage ensures fast access and scalability.
6. Data Output
Purpose: Present the processed data in a user-friendly, actionable format.
Output formats can include:
Reports (PDF, Excel, dashboards)
Visualizations (charts, maps, graphs)
APIs or data feeds
Alerts or notifications
Goal: Translate data into information that stakeholders can understand and act on.
7. Data Interpretation/Analysis
Purpose: Draw insights, detect patterns, and support decision-making.
Approaches:
Descriptive statistics (mean, median, mode)
Predictive analytics (regression, classification)
Inferential statistics (regression, hypothesis testing)
Machine learning models
Trend and pattern detection
Outcome: Data-driven decisions, policy-making, or business strategies.
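The descriptive statistics listed above are available in Python's standard `statistics` module. The test scores below are made-up data.

```python
from statistics import mean, median, mode

# Hypothetical test scores for 8 students
scores = [72, 85, 85, 90, 68, 77, 85, 94]

print(mean(scores))    # 82
print(median(scores))  # 85.0
print(mode(scores))    # 85
```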
Summary Table
Stage Key Focus Main Tools
Data Collection Acquiring raw data Surveys, APIs, Web scraping
Data Preparation Cleaning and formatting Python (Pandas), Excel, OpenRefine
Data Input Feeding data into systems Scripts, ETL tools, SQL
Data Processing Manipulating and transforming data Python, R, SQL, Spark
Data Storage Saving for future use Databases, Data lakes, Cloud storage
Data Output Reporting and visualizing Power BI, Tableau, Dashboards
Data Interpretation Analyzing and making decisions Analytics, ML, Statistical methods