Study Pack QTB PDF

QUANTITATIVE TECHNIQUES IN BUSINESS QTB
SESSION 1:
INTRODUCTION TO QUANTITATIVE TECHNIQUES IN BUSINESS

A. Lesson Objective
This lesson will enable students to:
1. Explain what is meant by Quantitative Techniques in Business.
2. Explain why Quantitative Techniques in Business is studied.
3. Identify a research problem and write an effective problem statement.
4. Understand core concepts including constants, variables, research questions, hypotheses and data.
B. Lesson Outline
1) What is QTB?
2) Why to study QTB?
3) Some core concepts in Quantitative Techniques in Business
a. Research Problems & Problem statement
b. Constant and Variables
i. Types of variables
1. With respect to relationship
a. Independent variable
b. Dependent variable
c. Mediating variable
d. Moderating variable
2. With respect to data
a. Categorical variable (Nominal, Ordinal)
b. Numerical variable (Discrete, Continuous)
c. Research Questions
i. Types of Research questions
1. Descriptive research Questions
2. Differential research Questions
3. Associational research Questions
4. Complex Research Questions
d. Research Hypothesis
i. Types of Hypothesis
1. Null Hypothesis
2. Alternative Hypothesis
QUANTITATIVE TECHNIQUES IN BUSINESS
Business is a regular human activity of producing and acquiring goods and services to satisfy human needs with
the objective of earning profit and wealth. Business includes a number of activities such as the exchange of goods and
services, transactions with buyers and sellers, production and distribution of goods, and the skills needed to handle
uncertainties. Business is all about making decisions related to different managerial fields. Business
decisions may relate to marketing strategy, managing human resources, investing finance, efficient
production and timely procurement, with the objective of increasing profit, decreasing cost and maximizing
market share. In daily practical life there are always uncertainties; those uncertainties cannot be
completely eliminated, but they can be reduced, or their effect can be minimized, by making effective decisions on
the basis of relevant data.
Examples
 Marketing department needs to have updated information about the target markets, competitors,
consumer buying behaviors and market situation in order to launch a new product.
 Human Resource department needs to have data of current employees and growth rates of the
company in order to predict and plan the future needs of human resources.
 Finance department needs to have statistical data regarding cost of production and sales to have
financial forecasts, breakeven analysis and investment decisions.
Why study Quantitative Techniques in Business?

QTB is different from other related courses offered to students, as it encompasses the whole sphere of issues
related to managerial decisions, irrespective of the area in which managers are operating. Other statistical courses
are taught theoretically, while QTB emphasizes deriving information for solving practical problems.
Furthermore, this course is essential because it enables us to:
 Gather, sort, analyze and interpret the data

 Have latest updated, accurate, yet relevant information about different environmental factors
 Understand and compare different types of situations we confront in our business activities
 Predict and forecast about the future needs of the business
 Develop effective policies and business related strategies
 Make effective decisions that help to achieve business goals efficiently
 Carry out research, since all research, whether academic or applied, is based on quantitative techniques
 Write a thesis, which is essential for attaining a degree and is based on quantitative techniques
SOME BASIC CONCEPTS/TERMS OF QUANTITATIVE RESEARCH
Before proceeding to the quantitative techniques used in business, it is essential to understand some basic
concepts related to these techniques. These concepts are as follows:
Research Problems:
“Any problem that needs to be solved with the help of data collected through research is called Research
Problem”
According to Kerlinger, in order to solve a problem, one must know what the problem is. Understanding and
defining the problem faced by managers is critical to solving it, because it is said that a problem well defined and
understood is half solved.
Defining a problem is the first and most important step in the problem solving process. It serves as the
foundation of a research study; if it is well formulated, you can expect a good study to follow. The way you
formulate a research problem determines almost every step that follows in the research study.
A Research Question is a statement that identifies the phenomenon to be studied.
Problem statement
A problem statement is a clear and concise description of a business issue that seeks the description,
association or difference of two or more variables.
A good problem statement clearly defines:
1. What the problem actually is
2. Who the stakeholders of the problem are
3. What the scope and limitations of the problem are (rationally justified)
Examples
 Measure the annual turnover of employees in Higher educational sector of Pakistan
 Determine how advertisement contributes to the sales of a new product in the market
 Determine which of the two options, stock market or real estate, is better for investment
Research Questions
A research problem needs to be translated into one or more research questions, defined as follows:
“A research question is an interrogative statement that seeks the tentative relationship among variables
and clarifies what the researcher wants to answer.”
Example
 What is the impact of advertisement on sales of a new product in the market?
 What is the annual turnover of employees in higher educational institutions of Pakistan?
 Does investing in the stock market yield more return on investment as compared to investing in real
estate?
Types of Research Questions
On the basis of the nature of the problem, research questions are divided into three types:
1. Descriptive research question: A question that is answered by summarising data about a
single variable
Example: What is the annual turnover of employees in higher educational institutions of Pakistan?
2. Associational research question: A question that is answered by determining the strength and
direction of the relationship between two or more variables
Example: What is the impact of advertisement on sales of a new product in the market?
3. Difference research question: A question that is answered by comparing and contrasting two
groups on the basis of the same variable
Example: Does investing in the stock market yield more return on investment as compared to investing
in real estate?
Schematic diagram showing how the purpose and type of research question correspond to the general type
of statistic used in a study.
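The correspondence between question type and statistic can be illustrated with a minimal Python sketch. All figures below are invented for illustration, and the choice of the mean, the Pearson correlation and a difference of means is an assumption about which statistic a study might use, not a prescription from this study pack.

```python
from statistics import mean

# Hypothetical data, invented purely for illustration.
turnover_rates = [12.0, 15.0, 9.0, 14.0, 10.0]   # annual turnover (%) at five institutions
advertising = [10.0, 20.0, 30.0, 40.0, 50.0]     # advertising spend per period
sales = [12.0, 24.0, 33.0, 46.0, 55.0]           # sales in the same periods
stock_returns = [8.0, 12.0, 10.0, 14.0]          # return on investment (%), stock market
estate_returns = [6.0, 7.0, 9.0, 8.0]            # return on investment (%), real estate


def pearson_r(x, y):
    """Pearson correlation: strength and direction of a linear relationship."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


# 1. Descriptive question: summarise a single variable.
average_turnover = mean(turnover_rates)

# 2. Associational question: relate two variables.
r_adv_sales = pearson_r(advertising, sales)

# 3. Difference question: compare two groups on the same variable.
mean_difference = mean(stock_returns) - mean(estate_returns)

print(f"Average turnover: {average_turnover:.1f}%")
print(f"Correlation (advertising, sales): {r_adv_sales:.3f}")
print(f"Mean return difference (stock - real estate): {mean_difference:.1f} points")
```

Each question type maps to one line of analysis here; in a real study the same mapping would guide which procedure is chosen in the statistical package.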
Research Hypothesis
“Research hypotheses are predictive statements about the relationship between two variables.”
Types of Hypothesis
There are two types of hypothesis:
1) Null Hypothesis: A statement that denies the existence of the predicted relationship or difference between
two variables.
Example: H0: There is no relationship between advertising and sales
2) Alternative Hypothesis: A statement that asserts the existence of the predicted relationship or difference
between two variables.
Example: H1: There is a relationship between advertising and sales
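One way to see how data can decide between H0 and H1 is a permutation test, sketched below in Python. This is only an illustration under stated assumptions: the advertising and sales figures are invented, the 0.05 cut-off is a common convention rather than something fixed by this study pack, and the permutation test is a swapped-in technique chosen because it needs no statistical tables.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


def permutation_p_value(x, y, n_shuffles=5000, seed=0):
    """Share of shuffled datasets whose |r| is at least as large as the observed |r|.

    Shuffling y breaks any real link with x, so the shuffled datasets show
    what correlations look like when H0 (no relationship) is true.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_shuffles + 1)


# Hypothetical monthly figures, invented for illustration.
advertising = [10, 20, 30, 40, 50, 60, 70, 80]
sales = [25, 41, 58, 79, 102, 118, 141, 160]

p_value = permutation_p_value(advertising, sales)
decision = "reject H0" if p_value < 0.05 else "fail to reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

A small p-value means data like these would rarely arise if advertising and sales were unrelated, which is evidence against H0 and in favour of H1.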
Differences between Research Questions and Hypothesis
Research question | Hypothesis
Interrogative statement | Simple statement
Non-Predictive | Predictive
Non-Directional | Directional
Constant and Variable
Constant:
A constant is a fixed value within a study or research. It can be a single value
of an entity or a fixed category of a variable that does not change within the same research.
In research, something that occurs and remains unchanged in the situation is called a “CONSTANT”.
Example
 If all participants of a study are female then Gender will be constant

 If all participants of a study have the same age (i.e. 25 years) then the Age will be constant.
Variables
A variable is defined as a characteristic of the participants or situation for a given study that has different
values. A variable must vary or have different values in the study.
Vary + able = Change + able
In simple words, “a variable is any factor, trait, or condition that can exist in differing amounts or types”.
“A variable is any entity that can take on different values.”
Scientists use experiments to search for cause and effect relationships in nature. In other words, they design
an experiment so that changes to one item cause something else to vary in a predictable way.
Numerical variables are measured as a matter of degree, using real numbers. Anything that can vary in measure is a variable.
Example
 Sales
 Annual turnover
 Investment
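The distinction between a constant and a variable can be checked mechanically: a characteristic is a constant in a study when every participant shares the same value. The Python sketch below assumes a small invented dataset in which all participants are female and aged 25; the field names are hypothetical.

```python
# Hypothetical participants, invented for illustration.
participants = [
    {"gender": "Female", "age": 25, "monthly_sales": 120},
    {"gender": "Female", "age": 25, "monthly_sales": 95},
    {"gender": "Female", "age": 25, "monthly_sales": 143},
]


def classify_columns(rows):
    """Label each characteristic 'constant' (one distinct value) or 'variable'."""
    labels = {}
    for name in rows[0]:
        distinct_values = {row[name] for row in rows}
        labels[name] = "constant" if len(distinct_values) == 1 else "variable"
    return labels


labels = classify_columns(participants)
print(labels)
```

Note that the label depends on the study: gender is a constant in this sample only because every participant happens to be female.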
Types of Variables
In quantitative research, variables are defined operationally and are commonly divided into different types on
following basis
a) On the basis of relationship

b) On the basis of data
a. On the basis of relationship

Variables are divided in four types on the basis of relationship.
I. Independent Variable: A variable that is not influenced in a specific situation but causes change in
other variables such as “advertising” that causes change in sales of a product. Independent variable is
also called explanatory or manipulated variable.
II. Dependent Variable: A variable that is influenced by any other variable (independent variable) in a
specific situation. As in above example sales is influenced by advertising and hence it is called
dependent variable. Dependent variable is also called outcome or response variable.
III. Mediating Variable: A variable that forms a link between the independent and dependent variables, working as a
bridge between them. For example, in the example of advertising and sales, advertising does not directly affect
sales; rather, advertising creates awareness and image that in turn cause an increase in sales. Here awareness and
image are the two mediating variables.
IV. Moderating Variable: A variable that changes (strengthens or weakens) the intensity of the relationship between the
independent and dependent variables. For example, a competitor’s product, price, placement, or packaging moderates the
relationship between advertising and sales.
b. On the basis of Data
On the basis of data, variables are divided into two major categories: (i) categorical and (ii)
numerical. Both types are further divided into two categories, as discussed below.
Variable
    Categorical: Nominal, Ordinal
    Numerical: Discrete, Continuous
1. Categorical variable: A variable whose values are not numerical in nature. For example, Gender (Male,
Female), Religion (Islam, Christianity, Judaism, etc.), Motivation level (High, Medium, Low)
Types of Categorical variable:
a) Nominal variable: A categorical variable whose values are not ordered, for example
Gender: Male, Female
b) Ordinal variable: A categorical variable whose values are ordered, for example
Education: Matric, Inter, Graduation
2. Numerical variable: A variable whose values are numerical in nature, for example
Number of employees (23, 45, 69, 100), Collar size (14, 14.5, 15, 15.5…), Height (5.7, 5.8, 5.3)
Types of Numerical variable:
a) Discrete variable: A numerical variable that takes only separate values at specific intervals, for example
Number of employees (23, 45, 69, 100), Collar size (14.5, 15, 15.5…)
b) Continuous variable: A numerical variable that can take any value within a range, for example
Speed: 40.1, 45.7, 67.5… km/h
Height: 5.7, 5.8, 5.3 feet
Variable | Type | Features
Categorical | Nominal | Category, words, no order
Categorical | Ordinal | Category, words, order
Numerical | Discrete | Category, numbers, specific interval
Numerical | Continuous | Numbers, touch every point
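The four data types can be expressed as a rough rule of thumb in Python. This is a sketch under assumptions, not a definitive classifier: whether categories are ordered cannot be detected from the words themselves, so the analyst supplies `ordered`, and the optional `step` parameter (a hypothetical name) tells the function the fixed interval of a discrete variable such as collar size.

```python
from enum import Enum


class Level(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


def classify(values, ordered=False, step=None):
    """Classify a list of observed values into one of the four data types.

    ordered: set True when word categories have a meaningful rank.
    step:    fixed interval between allowed numeric values, if there is one.
    """
    if all(isinstance(v, str) for v in values):
        return Level.ORDINAL if ordered else Level.NOMINAL
    if step is None and all(isinstance(v, int) for v in values):
        step = 1  # whole-number counts move in steps of one
    if step is not None:
        on_grid = all(abs(v / step - round(v / step)) < 1e-9 for v in values)
        if on_grid:
            return Level.DISCRETE
    return Level.CONTINUOUS


print(classify(["Male", "Female"]))
print(classify(["Matric", "Inter", "Graduation"], ordered=True))
print(classify([23, 45, 69, 100]))
print(classify([14, 14.5, 15, 15.5], step=0.5))
print(classify([40.1, 45.7, 67.5]))
```

The examples reuse the values given in this section: gender, education, number of employees, collar size and speed.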
References
Morgan, L. Leech, W. Gloeckner & Barrett (2007) SPSS for Introductory Statistics: Use and Interpretation (3rd
ed.) Mahwah, NJ: Lawrence Erlbaum Associates.
Jarrett, D. (2007) Using SPSS (6th ed.) Middlesex University.
Pallant, J. SPSS Survival Manual A Step by Step Guide to Data Analysis using SPSS for Windows (3rd ed.)
McGraw Hill Open University Press
ACTIVITY
Write down three examples of each type of variables
Types of variables on the basis of Relationship

Independent variable Dependent variable Mediating variable Moderating variable
1. 1. 1. 1.
2. 2. 2. 2.
3. 3. 3. 3.
Types of variables on the basis of data type

Categorical variable Numerical variable
Nominal variable Ordinal variable Discrete variable Continuous variable
1. 1. 1. 1.
2. 2. 2. 2.
3. 3. 3. 3.
1. Categorize the following variables according to their types:
Gender, marital status, nationality, qualification, motivation level, ethnicity, income, collar size, colors of cars
2. For the given research problem faced by a manager, answer the following queries
SITUATION: The HR manager of ABC Company is facing a high rate of employee turnover, due to which
organizational performance is being affected.
Evaluation criteria (Total Marks: 10)
1 Develop problem statement 10%

2 Identify variables and their types 20%
3 Develop research question 20%
4 Develop hypothesis 10%
5 Decide about design of survey 20%
6 Decide which type of data will be collected. 20%
Instruction:
It is an individual assignment that is to be submitted in the form of hard copy to your course instructor.
Quantitative Techniques In Business

Assignment No. 1
Dear Students:
You are supposed to create a situation and identify the problem. Then develop a problem statement. After
developing the problem statement, identify some variables from that situation and
mention their types. Then, from that assumed situation, develop a research question. After
developing the research question, develop a theoretical model and prepare hypotheses.
A. You are advised to follow these steps:

a. develop problem statement
b. identify variables and their types
c. develop research question
d. develop a Theoretical model
e. develop hypothesis
b) Evaluation criteria (Total Marks: 10)
1 Develop problem statement 20%

2 Identify variables and their types 20%
3 Develop research question 20%
4 Develop a theoretical model 20%
5 Develop hypothesis 20%
c) Instruction:
SOURCES OF DATA
Session objective:
By the end of this session you will be able to understand:
 What data is and what its different types are
 How to access primary and secondary data
 How to design a questionnaire
 How to conduct a survey
Data
A set of raw facts and figures related to a specific problem is called Data.
OR
Data are the observations about the social world.
OR
Data is information that has been translated into a form that is more convenient to move or process
Example: Age: 16, 18, 20, 21, 23,
Nationality: Pakistani, Indian, American
Data
    Nature: Quantitative, Qualitative
    Time Frame: Cross-sectional, Time series, Longitudinal
    Source: Primary, Secondary
Types of data
Data is divided on three bases:
1. Nature of data
By nature, data can be of two types:
i. Quantitative data: Data that consists of numbers. For example, data about age consists of values
like 16, 18, 20, 21, 23 (years).
ii. Qualitative data: Data that consists of words rather than numbers. For example, data about
nationality consists of values like Pakistani, Indian, American, etc.
2. Time frame: By time frame, data can be of three types:
i. Cross-sectional data: Data that is collected from different units at a single point in time
ii. Time series data: Data that is collected from the same units at different times, with the same time
interval between observations
iii. Longitudinal data: A dataset is longitudinal if it tracks the same type of information on the
same subjects at multiple points in time. For example, part of a longitudinal dataset could
contain specific students and their standardized test scores in six successive years.
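The three time-frame types differ mainly in how many units and how many time points are observed. The Python sketch below shows one possible layout for each; all names and numbers are invented, and the rule in `describe` (many units over many periods means longitudinal) is a simplifying assumption about where to draw the line.

```python
# Cross-sectional: different units, observed once (test scores in a single year).
cross_sectional = {"Student A": 71, "Student B": 64, "Student C": 88}

# Time series: one unit, observed repeatedly at equal intervals (annual sales of one firm).
time_series = {2019: 410, 2020: 395, 2021: 430, 2022: 455}

# Longitudinal: the same subjects, each observed at several points in time.
longitudinal = {
    "Student A": {2021: 65, 2022: 68, 2023: 71},
    "Student B": {2021: 60, 2022: 61, 2023: 64},
}


def describe(n_units, n_periods):
    """Name the time-frame type from the number of units and time points."""
    if n_units > 1 and n_periods > 1:
        return "longitudinal"
    if n_periods > 1:
        return "time series"
    return "cross-sectional"


def equally_spaced(periods):
    """Check the 'same time interval' requirement of time series data."""
    ordered = sorted(periods)
    gaps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    return len(gaps) <= 1


print(describe(len(cross_sectional), 1))
print(describe(1, len(time_series)), equally_spaced(time_series))
print(describe(len(longitudinal), 3))
```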
3. Sources of Data:
The information gathered from data is used for further analysis so that conclusions may be drawn from
it. Will this data truly represent our population of interest? Is our sample properly selected? Before
answering these questions it is important to understand where to find accurate data, that is, what our
source of data will be.
3.1 Primary and Secondary Data
The sources of data are categorized into primary and secondary sources.
a. Primary Data: Primary data is data that comes from an original (primary) source and is collected
with a specific research question in mind (Kvanli, Guynes, & Pavur, 1996).
For example: you want to collect data on motivation.
For this purpose you may first design a questionnaire in which your research questions are
addressed and then collect the data from your representative sample, which in this case would be
employed women.
Advantage & Disadvantage:
Even though this is a time-consuming and expensive method of data collection, in the long run it
provides essential information and allows quality research to be built.
b. Secondary Data: Secondary data is previously recorded data that was collected for another
purpose. Such data is easier and much cheaper to obtain than primary data, but
it may not accurately address your research question.
For example: An annual report
Advantages:
a) It can cover many years of operation
b) It may exist for many geographical regions
Disadvantages:
a) The data may be out of date

b) There is no way to verify that the data was properly obtained
How to collect primary data
The survey method is used to collect primary data. The details of a survey are discussed below:
Survey:
A survey is a quantitative research strategy that involves the structured collection of data from a pre-
determined sample. It involves the following methods.
1. Questionnaire
Survey Design
1. Objectives of Survey
The first step of survey design is to clearly define why we are going to conduct the survey.
Example: The basic aim of this survey is to collect updated, accurate and relevant data in order to answer a
research problem.
2. Survey Design:
After setting the objectives of the survey, we develop the plan (design) of the survey by deciding:
 Whom to survey (Sample Selecting)
 Where to survey (Site Selecting)
 How to survey (Method)
 What to survey (Questions for required information)
 How to develop Questionnaire?
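The "whom to survey" decision is often handled by drawing a simple random sample from a list of the population. The Python sketch below assumes a hypothetical list of 200 employee IDs and a sample size of 20; simple random sampling is only one possible selection method, and the fixed seed is there so the draw can be repeated.

```python
import random


def draw_sample(population, sample_size, seed=42):
    """Draw a simple random sample without replacement.

    Every member of the population has the same chance of selection.
    A fixed seed makes the draw reproducible for checking purposes.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical sampling frame: IDs of all 200 employees of a company.
employees = [f"EMP-{number:03d}" for number in range(1, 201)]

sample = draw_sample(employees, 20)
print(sample)
```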
2.1 Designing a questionnaire:

Questionnaires as statistical tools began nearly 100 years ago when an early research pioneer, Francis Galton,
realized that these instruments were an excellent device for studying behavior that could not be observed
or experimented on directly. They provide a very versatile and relatively inexpensive method of collecting
primary data.
2.2 Five Steps in Questionnaire Design
1. Decide what information is required.
2. Draft some questions to elicit the information.
3. Put them into a meaningful order and format.
4. Pre-test the questionnaire.
5. Go back to Step 1, and continue until the questionnaire is perfect.
2.3 Decide what information is required

In general, it is best to restrict the questionnaire to a single issue. Otherwise, the purpose of the instrument is unclear
and the length of the instrument can be excessive. If you are the first person to investigate the issue, then you
will have to collect the data from scratch. A key decision for each question is whether to make it open
ended or close ended. An example of a close ended question would be:
                              Strongly agree   Agree   Undecided   Disagree   Strongly disagree
I am satisfied with my job       ______        _____     _____       ____       ___________
The main advantage of open ended questions is that they yield in-depth information, while close ended questions yield limited
information. Here are some examples of close ended questions:
 Likert scale question
 Yes/no or true/false question
 Multiple choice question
 Rank in order question
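Answers to a Likert scale question are usually turned into numbers before analysis. The Python sketch below uses the five answer options of the job satisfaction item shown in this section; the numeric codes 1 to 5 and the treatment of blank answers as missing are assumptions made for illustration, and the responses are invented.

```python
# Assumed coding scheme: higher number = stronger agreement.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "undecided": 3,
    "agree": 4,
    "strongly agree": 5,
}


def code_response(answer):
    """Return the numeric code of an answer, or None if it is blank or unrecognised."""
    return LIKERT_CODES.get(answer.strip().lower())


# Hypothetical answers to "I am satisfied with my job".
responses = ["Agree", "Strongly agree", "undecided", "Disagree", "agree", ""]

codes = [code_response(answer) for answer in responses]
valid_codes = [code for code in codes if code is not None]
average_score = sum(valid_codes) / len(valid_codes)

print(codes)
print(f"Average satisfaction score: {average_score:.1f} (from {len(valid_codes)} valid answers)")
```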
2.4 Draft some questions to elicit the information: here are some key things to keep in mind at this stage
 Reading level
 Leading question
 Questions that ask too much
 Measuring in middle position
2.5 Put them into a meaningful order and format:
 Include introductory and closing statement
 Deciding on the questionnaire length
 Pay close attention to the question sequence
2.6 Pre-test the questionnaire:
In this stage you need to encourage respondents not only to answer the questions but also to comment
on their wording and clarity. Once you identify a problem, revise the instrument accordingly.
2.7 Go back to Step 1, and continue until the questionnaire is perfect
2.8 Fowler (1998) lists five characteristics of good questions:
1. Questions need to be consistently understood.
2. Questions need to be consistently administered or communicated to the respondents.

3. What constitutes an adequate answer should be consistently communicated.
4. Unless measuring knowledge is the goal of the question, all respondents should have access to the
information needed to answer the question accurately.
5. Respondents must be willing to provide the answers called for in the question
3. Pilot Test
A pilot study is a small scale run of the survey. It is designed to test the questionnaire and other
aspects of the survey design. It is the process of assessing the accuracy of the wording, the sequence
and the understandability of the questions by surveying one or two respondents as a trial in
order to refine the questionnaire.
4. Fieldwork/conduct a survey
It is the process of actually collecting data from the target sample. It can be done in the following ways:
 Self administered survey
 Postal survey
 Online survey
5. Data Preparation
After getting your survey completed and knowing the interface of SPSS, the next step is to prepare
the data for analysis. This process involves four steps.
1. Coding the questionnaire.
2. Defining the variables in SPSS variable view.
3. Entering the data in SPSS data view.
4. Checking the data for errors.
6. Data Analysis and Interpretation
Data Analysis:
It is a process of summarizing, organizing and transforming data with the goal of highlighting useful
information and suggesting conclusions in order to support good decision making.
Data can be analyzed in two ways:
 Descriptive
 Inferential
Interpretation:
Interpretation is a process of making sense of results by explaining and assigning meaning to them.
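The difference between descriptive and inferential analysis can be sketched in a few lines of Python. The scores below are invented, and the 95% confidence interval uses the normal approximation, which is an assumption made to keep the example short; with a sample this small a t-based interval would normally be preferred.

```python
from statistics import NormalDist, mean, median, stdev

# Hypothetical job satisfaction codes (1 = strongly disagree ... 5 = strongly agree).
scores = [3, 4, 4, 5, 2, 4, 3, 5, 4, 3, 4, 5, 3, 4, 4, 3]

# Descriptive analysis: summarise the sample itself.
sample_mean = mean(scores)
sample_median = median(scores)
sample_sd = stdev(scores)

# Inferential analysis: use the sample to say something about the population.
z = NormalDist().inv_cdf(0.975)                 # about 1.96 for a 95% interval
standard_error = sample_sd / len(scores) ** 0.5
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"Mean {sample_mean:.2f}, median {sample_median}, SD {sample_sd:.2f}")
print(f"Approximate 95% CI for the population mean: {ci_low:.2f} to {ci_high:.2f}")
```

Interpretation then assigns meaning to these numbers, for example whether an average near 3.75 indicates that employees are, on the whole, satisfied.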
7. Discussion and Conclusion
 Discussion is an extended communication (often interactive) dealing with some particular topic
 Conclusion summarizes the major inferences that can be drawn from the information presented in the
report. It answers the questions raised by the original research problem or stated purpose of the
report (Blake & Bly, 1993) and states the conclusions reached. Finally, the conclusion of your report
should also attempt to show ‘what it all means’: the significance of the findings reported and their
impact (Weaver & Weaver, 1977).
Some characteristics of a good conclusion
The conclusion(s) presented in a report must:
 Be related to the selected topic or research problem
 Result from and be justified by the material which appears in the report
 Not introduce any new material
 Report on all the conclusions that the evidence dictates, as it is NOT the job of a conclusion to
“gloss over conclusions that are puzzling, unpleasant, incomplete or don’t seem to fit into your
scheme” (Weaver & Weaver, 1977: 98). Doing this would indicate writer bias and mean your conclusion may mislead the reader.
8. Report Writing:
A report is a reflection paper of your research on a topic. It is a critical analysis of the information that you
found, and your own conclusions on the results of the research. Your report should be written for definite
readers. You should state your conclusions on the basis of the known information. Pick out the most important
information, which will help you to draw the conclusions.
The essentials of good/effective report writing are as follows-
 Know your objective, i.e., be focused.

 Analyze the niche audience, i.e., make an analysis of the target audience, the purpose for which
audience requires the report, kind of data audience is looking for in the report, the implications of
report reading, etc.
 Decide the length of report.
 Disclose correct and true information in a report.
 Discuss all sides of the problem reasonably and impartially. Include all relevant facts in a report.
 Concentrate on the report structure and matter. Pre-decide the report writing style. Use vivid structure
of sentences.
 The report should be neatly presented and should be carefully documented.
 Highlight and recap the main message in a report.
 Encourage feedback on the report from the critics. The feedback, if negative, might be useful if
properly supported with reasons by the critics. The report can be modified based on such feedback.
 Use graphs, pie-charts, etc to show the numerical data records over years.
 Decide on the margins on a report. Ideally, the top and the side margins should be the same (minimum
1 inch broad), but the lower/bottom margins can be one and a half times as broad as others.
 Attempt to generate reader’s interest by making appropriate paragraphs, giving bold headings for each
paragraph, using bullets wherever required, etc.
An effective report can be written going through the following steps-
 Determine the objective of the report, i.e., identify the problem.

 Collect the required material (facts) for the report.
 Study and examine the facts gathered.
 Plan the facts for the report.
 Prepare an outline for the report, i.e., draft the report.
 Edit the drafted report.
 Distribute the draft report to the advisory team and ask for feedback and recommendations.
How to collect secondary data
Sources that are used to collect secondary data can be:
•Published reports
•Government statistics
•Scientific and technical Abstracts
•Company's financial statements
•Banks reports
Large volumes of data on a very wide assortment of variables are available from:
 U.S. Bureau of the Census, Statistical Abstracts of the United States

 Canadian Business and Current Affairs
 United Nations Statistical Publications and Documents
 Census of the Population
 Wall street Journal
 Fortune Magazine
Some websites are available to collect the secondary data
1 • www.wdi.com
2 • www.pwt.com
3 • www.ifs.com
4 • www.fbs.com
5 • www.sbp.com
6 • www.tndeconomysurvey/data.com
7 • www.webdevelopers.com
8 • WWW.INDEXMUNDI.COM
9 • WWW.LSE.COM.PK
Step 1: Open WWW.GOOGLE.COM, type “WDI” and click search. Click on the first result.
Step 2: Click “data bank” on the WDI web page.
Step 3: The window that appears allows you to select your desired country for the extraction of data.
Step 4: The next window enables you to select your required variables from the list.
Step 5: The next window enables you to select years.
Step 6: The next window allows you to view and download the data according to the selections made in
the previous steps. Click on the button “view data” on the right side of the page.
Step 7: A new window will appear on your screen. This window enables you to view your selected data. By
clicking on the Excel icon, your data will be downloaded in Excel format.
Primary and Secondary Sources in Different Disciplines
Discipline | Primary | Secondary
Marketing | Focus groups, marketing experiments | Sales records that have been tabulated by an information service that buys its data from retail outlets
Management | Personnel records | Data provided by state or federal agencies
Finance | Lahore stock exchange or financial data files | Company reports, magazine or newspaper articles
Accounting | Auditing of primary records | Records kept and administered by an external accounting firm

Assignment No. 2
Dear Students:
In the last assignment you successfully developed your model and identified the problem. In this
assignment you are required to develop an instrument (questionnaire) in order to collect data on your
identified problem. Please note that if you are to collect primary data, a questionnaire will be used,
and if secondary data, you have to wisely select an appropriate data source that addresses your identified
problem.
You are advised to follow these steps while using Primary Data Source
a) Decide what information is required.

b) Draft some questions to elicit the information
c) Put them into a meaningful order and format
d) Pre-test the questionnaire
e) Go back to Step 1, and continue until the questionnaire is perfect.
Evaluation criteria (Total Marks: 10)
1 Decide what information is required 20%

2 Draft some questions to elicit the information 20%
3 Put them into a meaningful order and format 20%
4 Pre-test the questionnaire 20%
5 Developed questionnaire 20%
You are advised to follow these steps while using Secondary Data Source
1 Appropriate formula 20%

2 Relevant data 20%
3 Selection of relevant site 20%
4 Quality data 20%
Instruction:
References:
Kvanli, A. H., Guynes, C. S., & Pavur, R. J. (1996). Introduction to Business Statistics. USA: West Publishing Company.
Preparing data using SPSS
A. Lesson Objective
After attending this session, the students will be able to:
1. Describe the history of SPSS
2. Understand what SPSS is
3. Run the SPSS software
4. Understand how to code the questionnaire
5. Define the variables using variable view in SPSS
6. Enter the data using data view in SPSS
B. Lesson Outline
1) Introduction to SPSS
2) How to run SPSS
3) SPSS Interface
4) Data Preparation (Processing)
5) Data Analysis
Introduction to SPSS
Various software packages are available in the market through which quantitative data can be analyzed, such as
STATA, E-Views, Minitab and SPSS. The most commonly used package is SPSS.
SPSS stands for “Statistical Package for the Social Sciences”. It is software that is basically used for the analysis of
quantitative data.
Background of SPSS:
Released in 1968, SPSS (Statistical Package for the Social Sciences) was developed by Norman H. Nie and C.
Hadlai Hull. SPSS is among the most widely used programs for statistical analysis in social science. It is used by
market researchers, health researchers, survey companies, government, education researchers, marketing
organizations and others. The original SPSS manual (Nie, Bent & Hull, 1970) has been described as one of
"sociology's most influential books".
This software includes:
 Data preparation: coding, variable definition, data entry, and reliability and validity
 Descriptive statistics: frequencies, central tendency, dispersion and graphical representation
 Inferential statistics: ANOVA, correlation (nonparametric tests)
 Prediction for numerical outcomes: linear regression
SPSS Versions
The following versions of SPSS can be found
SPSS 15.0.1 - November 2006, SPSS 16.0.2 - April 2008, SPSS Statistics 17.0.1 - December 2008, PASW
Statistics 17.0.3 - September 2009, PASW Statistics 18.0 - August 2009, PASW Statistics 18.0.1 - December
2009, PASW Statistics 18.0.2 - April 2010, PASW Statistics 18.0.3 - September 2010, IBM SPSS Statistics 19.0 -
August 2010, IBM SPSS Statistics 20.0 - August 2011, IBM SPSS Statistics 21.0 - August 2012
1. How to open SPSS
1. Start Menu
2. Programs
3. SPSS Inc.
4. SPSS 16.0
5. Welcome Window
2. SPSS Interface
SPSS has a user-friendly interface similar to MS Excel, including two sheets in a row and column format. It
comprises:
1. Title bar (at the top showing title of file)
2. Menu bar (below the title showing menu list)
3. Tool bar (showing different tools)
4. List of attributes of variables (Header row)
5. Serial Number (left most column)
6. Working area (cells comprising row and columns)
7. Scroll bars (right most and lowest end)
8. Views tabs (variable view, data view and output view)
8.1 Variable View
Variable view is used to define the variables on the basis of different attributes.
 Rows indicate variables.
 Columns indicate attributes of variables.
You can add or delete variables and modify attributes of
variables, including the following attributes:
8.1.1 Name of the variable (Short without space)
8.1.2 Type (Numeric, String etc)
8.1.3 Width (8, 10, etc)
8.1.4 Decimals (2, 3, 5 etc for continuous variables)
8.1.5 Label (Full name of the variable)
8.1.6 Values (answer categories with codes)
8.1.7 Missing (blank, multiple, wrong answers)
8.1.8 Columns (6, 8, 10 etc)
8.1.9 Align (Left, right, centre)
8.1.10 Measure (Nominal, ordinal, scale)
8.2 Data View
Data view is used to enter the data of each case (row wise) against each variable (column wise) according to
the coding scheme, in the form of a data matrix
Rows are cases. Each row represents a case or an observation. For example, each individual respondent to a
questionnaire is a case.
Columns are variables. Each column represents a variable or characteristic that is being measured. For
example, each item on a questionnaire is a variable.
Cells contain values. Each cell contains a single value of a variable for a case. The cell is where the case and
the variable intersect. Cells contain only data values.
Output view. An interface that enables the user to view the results of every test applied on the data file.
3. Data Preparation (Processing)

After getting your survey completed (Sample attached as annexure 1) and knowing the interface of the SPSS the
next step in quantitative research process is to prepare the data for analysis. This process involves four steps.
3.1 Coding the questionnaire.
3.2 Defining the variables in SPSS variable view.
3.3 Entering the data in SPSS data view.
3.4 Checking the data for errors.
3.1 Coding the questionnaire

After assigning ID numbers to the completed questionnaires, the researcher should begin the coding
process. Coding is the process of assigning numbers to the values or levels of each variable. Before
starting the coding process, you should keep in mind some coding rules to avoid any coding mistakes.
These rules are as under
Rules of Coding
 All data should be numeric (e.g. Male = 1 and Female = 2).
 Each variable must occupy the same column for every case (one column for one variable).
 All values (codes) for a variable must be mutually exclusive. Questions should be phrased so that
a person would logically choose only one of the provided options, and all possible options should be
provided. A final category of “other” may be provided where all possible options cannot
be listed, but such a category is not very useful for statistical purposes.
 Each variable should be coded to give maximum information. Do not collapse categories or values
when you set up the codes for them; rather, code and enter the data in as detailed a form as
available. Thus, enter actual test scores, GPAs etc. as specifically as possible; otherwise use categories
to capture the data.
 For each participant, there must be a code or value against each variable. These codes should be
numbers, except for variables for which the data are missing. It is recommended to use blanks for
missing data, as SPSS is designed to handle blanks as missing values. Alternatively, you can code
extraordinarily high values for blank, multiple or wrong answers (e.g. 98 or 99). In this case you
must tell SPSS (while defining the variables) that these codes are for missing values; otherwise SPSS
will treat them as actual data.
 Apply every coding rule consistently for all participants. For example, if you have decided to code
male=1 and female=0, then this coding scheme must be used for all the cases. You cannot use
different coding schemes for different cases against the same variable.
 For a variable that is ordered, use high numbers (codes) for positive values (strongly agree=5) and
small numbers for negative values (strongly disagree=1).
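The coding rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the codebook, the variable names gender and recommen, and the missing-value code 99 are hypothetical choices made for the example, not part of any SPSS file.

```python
# Hypothetical codebook: variable name -> {answer text: numeric code}.
# High codes are used for the positive end of the ordered variable.
CODEBOOK = {
    "gender": {"Male": 1, "Female": 2},
    "recommen": {
        "Strongly disagree": 1,
        "Disagree": 2,
        "Undecided": 3,
        "Agree": 4,
        "Strongly agree": 5,
    },
}
MISSING_CODE = 99  # assumed code for blank, multiple or wrong answers


def code_response(variable, answer):
    """Return the numeric code for one answer, or MISSING_CODE if it is not a listed option."""
    return CODEBOOK[variable].get(answer, MISSING_CODE)


def code_case(raw_case):
    """Code one questionnaire so that every variable gets a value (one column per variable)."""
    return {variable: code_response(variable, raw_case.get(variable)) for variable in CODEBOOK}


case_1 = code_case({"gender": "Female", "recommen": "Agree"})
case_2 = code_case({"gender": "Male"})  # 'recommen' was left blank
print(case_1)
print(case_2)
```

Because the same codebook is applied to every case, the coding scheme stays consistent across participants, and a blank answer still receives a code.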
3.2 Defining the variables in SPSS variable view
The next step is to define the variables in SPSS. For this purpose, create and save a blank SPSS data
file into which you will enter the data. Click on the variable view tab. You will find the following window.
In this window, the numbers in the leftmost column show the serial numbers 1, 2, 3, 4 ... (row wise) and
the variable attributes run across the top (column wise). Remember that each question is named as a
variable; define each variable on the basis of the following attributes by clicking in the blank boxes under them.
3.2.1 Name of the variable
Always keep variable names as short as possible and without spaces. For example, type
“Recommen” in the cell next to number 1 under Name and press Enter. The cursor will move forward
to the next cell, which is Type. Note that each variable name must be unique (duplication is not allowed)
and the first character must be a letter or one of the characters @, #, or $.
3.2.2 Type
Enter the type of variable that can be
 Numeric. A variable whose values are numbers. Values are displayed in standard numeric format.
The Data Editor accepts numeric values in standard format or in scientific notation. It can be further
specified by selecting other variable types including comma, Dot, scientific notation, date, dollar or a
custom currency
 String. A variable whose values are not numeric and therefore are not used in calculations. The
values can contain any characters up to the defined length. Uppercase and lowercase letters are
considered distinct. This type is also known as an alphanumeric variable.
Preferably, however, the numeric type should be used, by giving dummy codes (male=1 and female=0) to
what would otherwise be string variables.
3.2.3 Width
Width indicates the number of digits you can place in one value (Code). It is recommended to have
width=8 for a better output.
3.2.4 Decimals
“Decimals” indicates the number of decimal places you need in a code or value. Preferably it should
be no more than 2. It is mainly used for continuous variables.
3.2.5 Label
In the Label column you need to write the full name or phrase of the variable so that you can remember
which question was given this variable name (Recommen is labeled as “I recommend course”). It can be
up to 40 characters with spaces, but it is recommended to keep it to 20 characters or fewer so that the
printouts of results remain readable.
3.2.6 Values (answer categories with codes)
In the Values column, numeric codes are assigned to the answer categories (e.g. 5=strongly agree). Click
on “none”, then click on the three dots button. In the Value Labels window, insert a value (5, 4, 3, 2, 1
etc.) and its label (Strongly agree, Agree, Undecided, Disagree, Strongly disagree), click on Add each time,
and finally click OK.
3.2.7 Missing
This column is used to assign the codes for missing values. Missing values are defined to accommodate
errors made by respondents when filling in questionnaires. Respondents can make three different types of
mistake:
 Blank answer: the respondent(s) did not attempt a question or a series of questions.
 Multiple answer: the respondent(s) marked two options rather than only one.
 Wrong answer: the respondent(s) gave an answer of their own rather than marking one of the given options.
You can assign missing value codes (large and unusual values, e.g. 98, 99) by clicking on “none” in the
Missing column, clicking on the three dots button and entering up to three missing values in the discrete
missing values option. You can also assign only one global missing value for all types of error. Remember:
if you do not define the missing values, SPSS will use them in the analysis as if they were normal values.
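The following minimal Python sketch shows why undeclared missing-value codes distort results. The answer codes and the choice of 98 and 99 as missing codes are made-up assumptions for illustration; SPSS itself does this exclusion for you once the codes are declared in the Missing column.

```python
from statistics import mean

MISSING_CODES = {98, 99}  # assumed user-defined missing-value codes

# Made-up answer codes for one five-point variable; 99 and 98 mark unusable answers.
answers = [5, 4, 99, 3, 98, 4]

# Treating the missing codes as real data inflates the average.
naive_mean = mean(answers)

# Excluding the declared missing codes gives the average of the valid answers only.
valid_answers = [a for a in answers if a not in MISSING_CODES]
valid_mean = mean(valid_answers)

print(naive_mean, valid_mean)
```

The naive average falls far outside the 1 to 5 answer range, which is exactly the kind of error that undefined missing values cause.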
3.2.8 Columns
This option is used to define the width of the columns in data view to accommodate number of digits in a
value against a variable. Preferably it should be 8 to accommodate the 8 digit numbers defined in width
option
3.2.9 Align
Align option is used to define the alignment (left, right, center) of the data in data view. Preferably the
numbers are aligned “right” in SPSS. So select “right” in the dropdown box of Align
3.2.10 Measure
The Measure option is used to define the level of measurement of the variable. SPSS provides only three
choices for level of measurement: nominal, ordinal or scale.
 Nominal: a variable can be treated as nominal if its categories are just different names and are not
ordered (low to high). Label such variables as nominal in the SPSS variable view. (Remember that nominal
variables with only two categories are called dichotomous but are marked Nominal in SPSS.)
 Ordinal: a variable can be treated as ordinal if its categories or values vary from low to
high (i.e., are ordered) and there are only three or four such values (e.g. good, better, best, or strongly
disagree, disagree, agree, strongly agree). We recommend that you label such a variable ordinal. Also, if
there are five or more ordered levels or values of a variable and you suspect that the frequency
distribution of the variable is substantially non-normal, label the variable ordinal.
 Scale: a variable can be treated as scale when its values represent ordered categories with a
meaningful metric, so that distance comparisons between values are appropriate. Examples of scale
variables include age in years and income in thousands of dollars. Furthermore, if a variable has five
or more ordered categories or values and you have no reason to suspect that the distribution is
non-normal, label the variable scale in the Measure column of the SPSS variable view. If the variable is
essentially continuous (i.e. measured to one or more decimal places, or the average of several items), it is
likely to be at least approximately normally distributed, so call it scale. (Remember that SPSS marks both
interval and ratio measures as Scale.)
Table 1 Measurement levels
Traditional Term | Traditional Definition | SPSS Term | Our Term | Our Definition
Nominal | Two or more unordered categories | Nominal | Nominal | Three or more unordered categories
NA | NA | NA | Dichotomous | Two categories, either ordered or unordered
Ordinal | Ordered levels, in which the difference in magnitude between levels is not equal | Ordinal | Ordinal | Three or more ordered levels, but the frequency distribution of the scores is not normally distributed
Interval & Ratio | Interval: ordered levels, in which the difference between levels is equal but there is no true zero. Ratio: ordered levels; the difference between levels is equal, and there is a true zero | Scale | Approximately Normal (or Normal) | Many (at least 5) ordered levels or scores, with the frequency distribution of the scores being approximately normal
Table 2 Characteristics and Examples of Our Four Levels of Measurement
 | Nominal | Dichotomous | Ordinal | Normal (Scale)
Characteristics | 3+ levels; not ordered; true categories; names, labels | 2 levels; ordered or not | 3+ levels; ordered levels; unequal intervals between levels; not normally distributed | 5+ levels; ordered levels; approximately normally distributed; equal intervals between levels
Examples | Ethnicity; Religion; Curriculum type; Hair color | Gender; Math grades (high vs. low) | Competence scale; Mother’s education | SAT math; Math achievement; Height
3.3 Data Entry in SPSS Data View:

After defining all the variables one by one in the variable view of SPSS, the next step is to enter the data in
the data view. Click on the data view tab in SPSS and you will see the following form of window.
Here the numbers in the leftmost column show the cases (i.e. 1, 2, 3 ...) row wise, and the topmost row
shows the variables that were defined in variable view (recommend, work hard, college etc.) column wise.
Click on the cell below “recommend” in the row for case 1, enter the answer code from the filled
questionnaire (e.g. 3) and press the right arrow. Enter 5 under work hard, press the right arrow, and
continue entering the data codes up to the last variable. The data of the first case against each variable
are now entered. Repeat the same practice until the data for each case against each variable are entered.
Put the missing value code (e.g. 99) wherever you find a blank, multiple or wrong answer by a respondent.
The data file will look like the following.
4 Data Analysis
Data analysis is a process of organizing, summarizing, presenting, interpreting, and drawing conclusions
based on data with the goal of highlighting useful information, and supporting decision making. In
quantitative research data analysis is performed objectively using statistical techniques.
Statistics is a branch of applied mathematics concerned with the collection and interpretation of
quantitative data to draw conclusions and test (accept or reject) hypotheses. There are two levels/types of
data analysis:
1. Descriptive analysis
2. Inferential analysis
Descriptive statistics will be learnt in next class
References
WWW.SPSS.COM.HK/CORPINFO/HISTORY.HTM
ACTIVITY
Class Activity Session 2
 Exercise: Please code the following sample questionnaire, define the variables and enter the data in SPSS.

Assignment No. 3
Dear Students:
Your next task is to collect data using your questionnaire and then enter the data in SPSS.
You are advised to follow the instructions given below:
 Collect data from your research respondents (minimum sample size is 50 )

 Questionnaires should be filled independently
 Enter data into SPSS
a) Evaluation criteria (Total Marks: 10)
1 Data collected from your actual respondents 30%

3 Data entry in SPSS 70%
b) Instruction:
Checking Data
Reliability &
Validity
Session Objectives
After attending this session the students will be able to check the quality (goodness) of data using:
1) Reliability and validity
2) Factor analysis
3) Principal components analysis
Session outline
1. Quality of data
1.1 Reliability (Cronbach's Alpha)

Cronbach's alpha is the most common measure of internal consistency ("reliability"). It is most commonly used
when you have Likert questions in a survey/questionnaire that form a scale and you wish to determine if the scale
is reliable.
Test Procedure in SPSS

1. Click Analyze > Scale > Reliability Analysis... on the top menu.
2. You will be presented with the Reliability Analysis dialogue box.
3. Transfer the variables "Qu1" to "Qu9" into the "Items:" box. You can do this by dragging and dropping the
variables into the box or by using the arrow button.
4. Click the OK button to generate the output.
SPSS Output for Cronbach's Alpha

SPSS produces many different tables. The first important table is the Reliability Statistics table, which
provides the actual value for Cronbach's alpha.
Interpretation
In our example, Cronbach's alpha is 0.805, which indicates a high level of internal consistency
for our scale with this specific sample. The minimum acceptable value of Cronbach’s alpha is usually taken
to be 0.700, and our value of 0.805 exceeds it.
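To show what the Reliability Statistics table is computing, here is a minimal Python sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The three items and five respondents are made-up data, and the function name cronbach_alpha is only an illustrative choice; the 0.805 result in the text comes from SPSS, not from this sketch.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items: one list of scores per question, with respondents in the same order in every list.
    """
    k = len(items)
    # Total scale score of each respondent (sum across the items).
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Made-up Likert answers (1 to 5) of five respondents to three questions.
qu1 = [4, 5, 3, 4, 2]
qu2 = [4, 4, 3, 5, 2]
qu3 = [5, 5, 2, 4, 1]

alpha = cronbach_alpha([qu1, qu2, qu3])
is_acceptable = alpha >= 0.700  # threshold quoted in the interpretation above
print(round(alpha, 3), is_acceptable)
```

Items that move together across respondents push alpha towards 1, while unrelated items pull it towards 0.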
2. Factor Analysis and Principal Component
Two methods of analysis are the subject of this chapter, principal component analysis and factor analysis. In
very general terms, both can be seen as approaches to summarizing and uncovering any patterns in a set of
multivariate data, essentially by reducing the complexity of the data.
2.1 When factor analysis applies

 If the validation fails, we are warned that the solution found in the analysis of the full data set is not
generalizable and should not be reported as valid findings.
 We do have some options when validation fails:
 If the problem is limited to one or two variables, we can remove those variables and redo the analysis.
 Randomly selected samples are not always representative. We might try some different random
number seeds and see if our negative finding was a fluke. If we choose this option, we should do a
large number of validations to establish a clear pattern, at least 5 to 10. Getting one or two validations
to negate the failed validation and support our findings is not sufficient.
2.2 Principal Component Analysis
Principal component analysis is a multivariate technique for transforming a set of related (correlated)
variables into a set of unrelated (uncorrelated) variables that account for decreasing proportions of the
variation of the original observations. The rationale behind the method is an attempt to reduce the complexity
of the data by decreasing the number of variables
2.3 Why principal components analysis applies:
 To test the generalizability of findings from a principal component analysis, we could conduct a second research
study to see if our findings are verified.
 To transform a set of related (correlated) variables into a set of unrelated (uncorrelated) variables.
2.4 Test Procedure in SPSS

1. Click Analyze > Data Reduction > Factor... on the top menu (Analyze > Dimension Reduction > Factor... in
later versions of SPSS).
2.5 The rotation method refers to the mathematical method by which SPSS rotates the axes in geometric space.
This makes it easier to determine which variables load on which components.
SPSS Output for factor analysis:

1. SPSS produces many different tables. The first important table is the KMO and Bartlett's Test table, which
provides the actual KMO (Kaiser-Meyer-Olkin Measure of Sampling Adequacy) value and the significance value
of Bartlett's Test of Sphericity, as shown below:
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy | .685
Bartlett's Test of Sphericity, Approx. Chi-Square | 89.786
Bartlett's Test of Sphericity, df | 10
Bartlett's Test of Sphericity, Sig. | .000
If the value of KMO is .6 or above, the data are valid for further analysis. Use the table given below to
check the validity of the data:
1. .50 Poor
2. .60 Good
3. .70 Very good
4. .80 Excellent
5. .90 Perfect
2.6 Assumption of principal components analysis
To check the assumption of principal components analysis, look at the significance value of Bartlett's Test
of Sphericity. If the significance value is less than 0.05, we can apply principal components analysis.
Kaiser criterion
One of the commonly used techniques is known as Kaiser’s criterion, or the eigenvalue rule. Using this rule,
only factors with an eigenvalue of 1.0 or more are retained for further investigation (this will become clearer
when you see the example presented in this chapter). The eigenvalue of a factor represents the amount of the
total variance explained by that factor. Kaiser's criterion has been criticized, however, as resulting in the
retention of too many factors in some situations.
Henry Kaiser suggested a rule for selecting a number of factors m less than the number needed for perfect
reconstruction: set m equal to the number of eigenvalues greater than 1. This rule is often used in common
factor analysis as well as in PCA. Several lines of thought lead to Kaiser's rule, but the simplest is that since an
eigenvalue is the amount of variance explained by one more factor, it doesn't make sense to add a factor that
explains less variance than is contained in one variable. Since a component analysis is supposed to summarize
a set of data, to use a component that explains less than a variance of 1 is something like writing a summary of
a book in which one section of the summary is longer than the book section it summarizes--which makes no
sense. However, Kaiser's major justification for the rule was that it matched pretty well the ultimate rule of
doing several factor analyses with different numbers of factors, and seeing which analysis made sense. That
ultimate rule is much easier today than it was a generation ago, so Kaiser's rule seems obsolete.
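As a small illustration of Kaiser's criterion, the Python sketch below keeps only the eigenvalues of 1.0 or more and reports the share of total variance they explain. The five eigenvalues are hypothetical numbers chosen so that they sum to the number of variables; they are not taken from any SPSS output, and kaiser_retained is just an illustrative name.

```python
# Hypothetical eigenvalues for five standardized variables. They sum to 5,
# the number of variables, because each standardized variable has variance 1.
eigenvalues = [2.1, 1.3, 0.8, 0.5, 0.3]


def kaiser_retained(values, cutoff=1.0):
    """Return the eigenvalues that meet Kaiser's criterion (cutoff or more)."""
    return [value for value in values if value >= cutoff]


retained = kaiser_retained(eigenvalues)

# Share of the total variance explained by the retained components.
share_explained = sum(retained) / sum(eigenvalues)

print(len(retained), round(share_explained, 2))
```

A component with an eigenvalue below 1 explains less variance than a single original variable, which is the reasoning behind the cutoff.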
Rotated Component Matrix (a)
Item | Component 1 | Component 2
i am courteously greeted at front desk reservation | .865 | .044
i am treated in friendliness manner by reception staff | .833 | .155
the staff of the hotel give me on time response | .515 | .674
the house keeping service of the hotel during my stay was good | -.184 | .911
the value i receive from my hotel matches my expectation | .385 | -.061
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Construct component analysis

The rotated component matrix shows which items form each construct variable for further analysis. For each
component retained under the eigenvalue rule, read the item loadings in the rotated component matrix. If the
loading of an item is greater than 0.4, keep that item; if the loading of an item is less than 0.4, exclude
that item from the construct variable.
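The 0.4 loading rule can be applied to the rotated component matrix shown above with a short Python sketch. The loadings are copied from that matrix; the shortened item names are my own labels, and comparing the absolute value of a loading with the cutoff is an assumption (a common convention) rather than something the text states.

```python
LOADING_CUTOFF = 0.4

# Loadings from the Rotated Component Matrix above: item -> (component 1, component 2).
# The short item names are illustrative labels only.
rotated_loadings = {
    "greeted at front desk": (0.865, 0.044),
    "friendly reception staff": (0.833, 0.155),
    "on-time response": (0.515, 0.674),
    "housekeeping service": (-0.184, 0.911),
    "value matches expectation": (0.385, -0.061),
}


def items_for_component(loadings, component):
    """Return the items whose loading on the given component exceeds the cutoff."""
    # Assumption: the size of the loading matters, so the absolute value is compared.
    return [item for item, values in loadings.items() if abs(values[component]) > LOADING_CUTOFF]


component_1 = items_for_component(rotated_loadings, 0)
component_2 = items_for_component(rotated_loadings, 1)
excluded = [item for item in rotated_loadings if item not in component_1 + component_2]

print(component_1)
print(component_2)
print(excluded)
```

Note that the on-time response item exceeds the cutoff on both components, so the analyst still has to decide which construct it belongs to.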
ACTIVITY
Class Activity Session 2
 Use the SPSS promotion data file for PCA analysis
 Lab practice on principal component analysis and developing construct variables.
 Interpretation of the output files (data reliability and validity)

Assignment No. 4
Dear Students:
Your next task is to check reliability and validity of your data collected through survey.
You are advised to follow these steps:
 To check the reliability of the data

 To check the validity of the data
 Factors analysis
 Construct new variables
c) Evaluation criteria (Total Marks: 10)
1 To check the reliability of the data 20%

2 To check the validity of the data 20%
3 Factors analysis 20%
4 Construct new variables 40%
d) Instruction:
It is an individual assignment that is to be submitted in the form of hard copy to your course
instructor.
Data File Management

Descriptive Statistics
Session Objectives
After attending this session the students will be able to
 Transform data and create new variables
Session outline
2. Data File Management
2.1 Count the Data
2.2 Recode Variables
2.2.1 Revise
2.2.2 Reverse
2.3 Compute a new variable
Data File Management

Data file management involves different methods for transforming data into the form needed to answer the
research questions. Data file management can be quite time consuming, especially if you have a lot of
questions/items that you combine to compute the summated or composite variables that you want to use in later
analysis.
You will learn three useful data transformation techniques: Count, Recode, and Compute a new variable that is the
average of several initial variables. From these operations we will produce new variables.
Problem 5.1: Count Math Courses Taken
Sometimes you want to know how many items the participants have taken, bought, done, agreed with
and so forth. For this purpose you can use the count option in transform menu.
Example: How many math courses (algebra1, algebra2, geometry, trigonometry and calculus) did each of the 75
participants take in high school? Label your new variable.
There are five different math courses, scored taken=1 and not taken=0. We want to count how many
courses were taken by each respondent. For this:
1. Go to the Transform menu.
2. Select the Count Values within Cases option to get the Count window.
3. Type mathcrt in the Target Variable box. This is the SPSS name for your new variable.
4. Type math courses taken in the Target Label box.
5. Select all the math courses and move them over to the Numeric Variables box. Your Count
window should look like window 1.
6. Click on Define Values to get window 2.
7. Type 1 (the code for math course taken) in the Value box, click on Add and then Continue.
8. Click on OK.
9. Check in variable view that a new variable mathcrt is there, and in data view that it has been
added along with its data values.
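The Count operation described above amounts to counting, for each case, how many of the listed variables hold the value 1. A minimal Python sketch follows; the three participants and their answers are made up, and only the variable names (the five courses and mathcrt) come from the text.

```python
MATH_COURSES = ["algebra1", "algebra2", "geometry", "trigonometry", "calculus"]
TAKEN = 1  # code for "course taken"; 0 means "not taken"


def count_taken(case):
    """Count how many of the five math courses are coded as taken for one participant."""
    return sum(1 for course in MATH_COURSES if case.get(course) == TAKEN)


# Made-up participants, one dictionary per case (row).
participants = [
    {"algebra1": 1, "algebra2": 1, "geometry": 1, "trigonometry": 0, "calculus": 0},
    {"algebra1": 1, "algebra2": 0, "geometry": 0, "trigonometry": 0, "calculus": 0},
    {"algebra1": 1, "algebra2": 1, "geometry": 1, "trigonometry": 1, "calculus": 1},
]

# Add the new variable mathcrt to every case, as SPSS does in data view.
for case in participants:
    case["mathcrt"] = count_taken(case)

print([case["mathcrt"] for case in participants])
```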
Problem 5.2: Recode and Re - label

Recode is the command used for adding new and improved variables to the file by
1. Revising variables that have a large number of answer categories with low frequencies in each category,
so that the group sizes will be large enough to perform statistical analysis
2. Reversing the categories of a negatively worded question to make it positive, in order to compute a new
variable
First we will learn to use the recode option to revise father’s and mother’s education in the HSB data file so that
those with no postsecondary education have a value of 1, those with some postsecondary education have a value of 2,
and those with a bachelor’s degree or more have a value of 3. Label the new variables and values.
 Click on Transform => Recode => Into Different Variables and
you should get Fig: 5.4.
 Now click on mother’s education and then the arrow button.
 Click on father’s education and the arrow to move them to
the Numeric Variable => Output Variable box.
 Now highlight “faed” in the Numeric Variable box so that it
turns blue.
 Click on the Output Variable Name box and type faedr.
 Click on the Label box and type father’s education revised.
 Click on Change. Did you get faed => faedr in the Numeric
Variable => Output Variable box as in Fig
 Now repeat these procedures with maed in the Numeric
Variable => Output Variable box.
 Highlight maed.
 Click on Output Variable Name, type maedr.
 Click Label, type mother’s education revised.
 Click Change.
 Then click on Old and New Values to get Fig
 Click on Range and type 2 in the first box and 3 in the second box.
 Click on Value (part of New Value on the right) and type 1.
 Then click on Add.
 Repeat these steps to change old values 4 through 7 to a
new value of 2.
 Then Range: 8 through 10 to Value: 3.
 If the window lists all three recodes, click on Continue.
 Finally, click on OK.
Check in variable view and data view that two new variables with the names “faedr” and “maedr” have been added.
Define the attributes of the new variables in variable view as per the variable definition procedure.
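The revise step is a range-to-value mapping, which the following Python sketch reproduces with the ranges given above (2 to 3 becomes 1, 4 to 7 becomes 2, 8 to 10 becomes 3). The sample faed codes are made up, and returning None for a code outside these ranges is an assumption made for the sketch.

```python
def revise_education(code):
    """Collapse the original education codes into three groups."""
    if 2 <= code <= 3:
        return 1  # no postsecondary education
    if 4 <= code <= 7:
        return 2  # some postsecondary education
    if 8 <= code <= 10:
        return 3  # bachelor's degree or more
    return None  # assumption: codes outside the recode ranges are left as missing


# Made-up original codes for father's education (faed).
faed = [2, 5, 10, 3, 8, 7]
faedr = [revise_education(code) for code in faed]

print(faedr)
```

The original variable faed is left untouched and the result is stored in a new variable, which mirrors the Into Different Variables option.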
Now we will learn to use the recode option to reverse the pleasure items (item06 and item10) in the HSB data file,
because these items are negatively worded. Label the new variables and values. Follow these steps:
 Click on Transform => Recode => Into Different Variables.
 Click on Reset to clear the window of old information as a
precaution.
 Select item06 and item10 and click on the arrow button.
 Highlight item06 so that it turns blue.
 Click on Output Variable Name and type item06r.
 Click on Label and type item06 reversed.
 Finally click on Change.
 Now highlight item10 so that it turns blue.
 Click on Output Variable Name and type item10r.
 Click on Label and type item10 reversed.
 Click on Change.
 Click on Old and New Values to get Fig
 Now click on the Value box (under Old Value) and type 4.
 Click on the Value box for the New Value and type 1.
 Click Add to tell the computer to change values of 4 to 1.
 Repeat the last three steps to recode the values 3 to 2, 2 to 3,
and 1 to 4.
 Click on Continue and then OK.
Check in variable view and data view that two new variables with the names “item06r” and “item10r” have been
added.
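Reversing a four-point item (4 to 1, 3 to 2, 2 to 3, 1 to 4) is the same as subtracting each value from the sum of the scale minimum and maximum. A minimal Python sketch, with made-up item06 answers:

```python
SCALE_MIN, SCALE_MAX = 1, 4  # the pleasure items are answered on a 1 to 4 scale


def reverse_code(value):
    """Reverse one answer so that 4 becomes 1, 3 becomes 2, 2 becomes 3 and 1 becomes 4."""
    return SCALE_MAX + SCALE_MIN - value


# Made-up answers to the negatively worded item06.
item06 = [4, 3, 2, 1, 4]
item06r = [reverse_code(value) for value in item06]

print(item06r)
```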
Compute Variables
The Compute option in the Transform menu is used to compute one variable from a number of variables
derived from the questionnaire (as we usually ask a number of questions to measure one variable).
Compute Pleasure Scale Score

Here we will learn how to compute the average pleasure scale from item02, item06r, item10r and item14.
Name the new computed variable pleasure and label its highest and lowest values.
 Click on Transform => Compute.
 In the Target Variable box of Fig., type pleasure.
 Click on Type & Label and give it the name pleasure scale.
 Click on Continue to return to Fig.
 In the Numeric Expression box type
(item02+item06r+item10r+item14)/4 and be sure that what you typed is
exactly like this.
 Finally, click on OK.
 Now provide Value Labels for the pleasure scale using
commands similar to those you used for father’s education
revised.
 Type 1, then very low and click Add.
 Type 4, then very high, and click Add. See Fig. if you need help.
Check your data file to see if pleasure scale has been added as a new variable in both variable and data views.
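The numeric expression used in the Compute window is simply the average of the four items, using the reversed versions of item06 and item10. A minimal Python sketch with one made-up case:

```python
# The pleasure scale uses the reversed versions of the negatively worded items.
PLEASURE_ITEMS = ["item02", "item06r", "item10r", "item14"]


def pleasure_score(case):
    """Average of the four pleasure items: (item02 + item06r + item10r + item14) / 4."""
    return sum(case[item] for item in PLEASURE_ITEMS) / len(PLEASURE_ITEMS)


# Made-up answers of one participant (1 = very low, 4 = very high).
case = {"item02": 4, "item06r": 3, "item10r": 2, "item14": 3}
pleasure = pleasure_score(case)

print(pleasure)
```

Averaging rather than summing keeps the new variable on the same 1 to 4 scale as the original items, which is why the value labels very low and very high are attached to 1 and 4.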
DATA ANALYSIS
A. Lesson Objectives
After studying this session you would be able to:
1. Produce simple graphical and numerical summaries of data.
2. Measure the location (Average) of the data
3. Measure the dispersion(Spread) of the data
4. Check the data normality
5. Use Data transformation techniques
5.1 Count, Reverse, Revise
5.2 Compute a new variable
B. Lesson Outline
1. Descriptive statistics
1.1 Summarizing Numerical Data
1.1.1 Five Figure Summaries
1.1.2 Frequency Distribution
1.1.2.1 Tables
1.1.2.2 Graphs
2. Measures of Central Tendency
2.1 Mean
2.2 Median
2.3 Mode
3. Measures of Variability
3.1 Standard Deviation
3.2 Range
3.3 Inter quartile range
3.4 Variance
4. Normality of data
4.1 Skewness
4.2 Kurtosis
Descriptive statistics are the statistics that are used to understand and describe the data. They are used to
answer the descriptive type of research questions. It involves
 Summarizing the data
 Measure of central tendency
 Measure of dispersion
 Checking data normality
 Data file management
 Recode and transform variables
1- Summarizing the data
A data matrix contains too much information to be taken in at a glance, which makes it difficult to
understand and get a feel for the data. A set of data can be understood only if it is summarized in some
appropriate way. The techniques for summarizing data vary with the type of data, that is, whether the data are
categorical or numerical. We will see how both types of data are summarized, one by one.
1.1- Summarizing categorical data
A categorical variable is usually summarized in frequencies and their percentages. This process is
called frequency distribution. It can be presented in two ways, in the form of
 Tables of frequency and percentages or
 Graphs.
Let’s see frequency distribution in detail.
1.1.1- Frequency Distribution.
A frequency distribution is a tally (IIII) or count of the number of times each score (category) on a
single variable is marked by respondents. Frequencies can be further summarized by expressing them
as percentages of the total using the following formula:
Percentage = (frequency / total) × 100
Example
The frequency distribution of final grades in a class of 50 students might be 7 A’s, 20 B’s, 18 C’s and 5
D’s. Note that in this frequency distribution most students have B’s or C’s (grades in the middle) and
similar small numbers have A’s and D’s (high and low grades).
When there are a small number of scores for the low and high values and most scores are for the
middle values, the distribution is said to be approximately normal.
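The grade example can be worked through in Python as a check on the percentage formula. The list of 50 grades is rebuilt from the counts given in the example (7 A’s, 20 B’s, 18 C’s and 5 D’s); the variable names are illustrative only.

```python
from collections import Counter

# The 50 final grades from the example: 7 A's, 20 B's, 18 C's and 5 D's.
grades = ["A"] * 7 + ["B"] * 20 + ["C"] * 18 + ["D"] * 5

# Frequency distribution: how many times each grade occurs.
frequencies = Counter(grades)
total = sum(frequencies.values())

# Percentage = (frequency / total) x 100, written as frequency * 100 / total.
percentages = {grade: count * 100 / total for grade, count in frequencies.items()}

for grade in sorted(frequencies):
    print(grade, frequencies[grade], percentages[grade])
```

B and C together account for 76% of the class, which is the "most scores in the middle" pattern described above.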
To get a frequency distribution table:

Analyze > Descriptive Statistics > Frequencies > move religion to the Variable(s) box >
OK (make sure that the Display frequency tables box is checked)
Fig.1. Frequency table for religion in hsb data
Status | Category | Frequency | Percent | Valid Percent | Cumulative Percent
Valid | protestant | 30 | 40.0 | 44.8 | 44.8
Valid | catholic | 23 | 30.7 | 34.3 | 79.1
Valid | no religion | 14 | 18.7 | 20.9 | 100.0
Valid | Total | 67 | 89.3 | 100.0 |
Missing | other religion | 4 | 5.3 | |
Missing | blank | 4 | 5.3 | |
Missing | Total | 8 | 10.7 | |
Total | | 75 | 100.0 | |
Interpretation:
In this example, there is a Frequency column that shows the numbers of students who marked each
type of religion (e.g., 30 said protestant and 4 left it blank). Notice that there are a total of (67) for the
three responses considered Valid and a total (8) for the two types of responses considered to be
Missing as well as an overall total (75). The Percent column indicates that 40.0% are protestant, 30.7%
are catholic, 18.7% are not religious, 5.3% had one of several other religions, and 5.3% left the question
blank. The Valid Percentage column excludes the eight missing cases and is often the column that you
would use. Given this data set, it would be accurate to say that of those not coded as missing, 44.8%
were protestant and 34.3% catholic and 20.9% were not religious.
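The difference between the Percent and Valid Percent columns can be reproduced by hand. The sketch below is an illustration in Python under the assumption that the counts are typed in directly from Fig.1 (the variable names are hypothetical): Percent divides by all 75 cases, while Valid Percent divides by the 67 valid cases only.

```python
# Counts copied from the frequency table for religion (Fig.1).
valid = {"protestant": 30, "catholic": 23, "no religion": 14}
missing = {"other religion": 4, "blank": 4}

n_valid = sum(valid.values())               # 67 valid cases
n_total = n_valid + sum(missing.values())   # 75 cases overall

rows = []
cumulative = 0.0
for category, freq in valid.items():
    percent = freq * 100 / n_total          # share of all cases
    valid_percent = freq * 100 / n_valid    # share of non-missing cases
    cumulative += valid_percent             # running total of valid percent
    rows.append((category, freq, round(percent, 1),
                 round(valid_percent, 1), round(cumulative, 1)))

for row in rows:
    print(row)
```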
Frequency distribution graphs
With Nominal data, you should not use a graphic that connects adjacent categories because with
nominal data, there is no necessary ordering of the categories or levels. Thus, it is better to make a bar
graph or chart of the frequency distribution of variables like religion, ethnic group, or other nominal
variables; the points that happen to be adjacent in your frequency distribution are not necessarily
adjacent.
Bar Charts
Bar graphs are usually used to display categorical (qualitative) data. The bars are usually separated,
and the height of each bar shows the frequency of that category.
To get a bar chart select
Graphs → Legacy Dialogs → Interactive → Bar chart → move the variable to the box → OK
Fig.2. Bar chart for the nominal variable religion
1.2- Summarizing Numerical Data

Simple numerical summaries of a numerical variable can be obtained through
1.2-1. Five Figure Summary
The data can be summarized by quoting five figures if the data is first sorted into (ascending)
numerical order. These five figures are
1. Minimum value (Min) —the smallest value, with rank 1
2. Maximum value (Max) — the largest value, with rank n, and
3. Median (M/Q2) —The middle value, with rank (n+1)/2
The median divides the data into two halves, each with the same number of observations. Each
of these halves may, in turn, be divided into two by quartiles, so that the data is split into 4
quarters. These are known as:
4. Lower quartile (Q1) —The middle value of first half.
5. Upper quartile (Q3) —The middle value of second half
Rank the values from 1 (the smallest value) to n (the largest value; n denotes the total number of
observations).
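[Figure: the sorted data from Minimum to Maximum, split at the Median into a lower half and an upper half, with the Lower quartile and the Upper quartile at the middle of each half]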
Example 1: Department A absenteeism data. Consider the absenteeism data for a department in an
organization.
Department A: 20 employees
0 0 2 2 0 0 1 1 3 1 2 3 3 5 95 5 5 8 10 15
Step 1-Ascending order
0 0 0 0 1 1 1 2 2 2 3 3 3 5 5 5 8 10 15 95
Step 2 Ranking
Rank:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Value:  0  0  0  0  1  1  1  2  2  2  3  3  3  5  5  5  8 10 15 95
Step 3 Deriving summary elements
 The minimum value at rank 1 is “0”,
 The maximum value at rank 20 (n) is “95”
 The median is at rank (n+1)/2 = (20+1)/2 = 21/2 = 10.5. Since the 10th value is 2 and the 11th
value is 3,
Median = (2+3)/2 = 2.5
 The ‘lower half’ consists of the values ranked from 1 to 10. The middle rank is therefore
(1+10)/2 = 5 ½. The 5th value is 1 and the 6th value is also 1, so
Lower quartile = (1+1)/2 = 1
 Similarly, the ‘upper half’ consists of the values ranked from 11 to 20. The middle of
these ranks is (11+20)/2 = 31/2 = 15½. The 15th and 16th values both are 5, so
Upper quartile = (5+5)/2 = 5
Table 3: The five-figure summary
Summary          Value
Minimum          0
Lower quartile   1
Median           2.5
Upper quartile   5
Maximum          95
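The three steps above (sort, rank, take the middle of each half) can be sketched in code. This is a minimal Python illustration, not the procedure SPSS uses internally; the function names are hypothetical, and the handling of an odd number of observations (leaving the median out of both halves) is an assumption, since the example only covers an even n.

```python
def median_of_sorted(sorted_values):
    """Median of an already sorted list: the value at rank (n+1)/2."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    # Even n: rank (n+1)/2 falls between two values, so average them.
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_figure_summary(values):
    """Return min, Q1, median, Q3, max using the 'median of each half' rule."""
    data = sorted(values)            # Step 1: ascending order
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Assumption: for odd n the median itself is left out of both halves.
    upper_half = data[half:] if n % 2 == 0 else data[half + 1:]
    return {
        "min": data[0],
        "q1": median_of_sorted(lower_half),
        "median": median_of_sorted(data),
        "q3": median_of_sorted(upper_half),
        "max": data[-1],
    }


# Department A absenteeism data (20 employees), as listed in the example.
department_a = [0, 0, 2, 2, 0, 0, 1, 1, 3, 1, 2, 3, 3, 5, 95, 5, 5, 8, 10, 15]
summary = five_figure_summary(department_a)
print(summary)
```

Other software may use slightly different quartile rules, so quartiles reported elsewhere can differ a little from the values in Table 3.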
1.2-2. Boxplot
A boxplot is a quick method of summarizing and graphically representing ordinal and scale data for examining
one or more sets of data. It is also called box and whisker plot. It is useful to
 Summarize the data by getting five figure summary
 Check the data for errors
 Examine and compare frequency distributions
 Check assumption for inferential statistics (Check normality of data)
Boxplot for one set of data
Graphs → Boxplot → in the Boxplot window select Simple and Summaries of separate variables → click
Define → select the variable and move it into the Boxes Represent box → click OK
Case Processing Summary
                                       Cases
                          Valid            Missing          Total
                          N    Percent     N    Percent     N    Percent
math achievement test     75   100.0%      0    .0%         75   100.0%
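[Figure: boxplot of math achievement, labelled from top to bottom: upper whisker = Max, upper edge of the box = upper quartile (Q3), line inside the box = median (Q2), lower edge of the box = lower quartile (Q1), lower whisker = Min]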
Interpretation
The case processing summary table shows a valid N of 75, with no missing values, out of a total sample of 75 for the variable
math achievement. The plot shows a box plot for math achievement. The box represents the middle 50% of the cases; the line
inside the box is the median (M=13), the lower end of the box shows the lower quartile (Q1=7.67), and the upper end of the box
shows the upper quartile (Q3=17.00). The whiskers indicate the expected range (25.33) of scores from the minimum (Min=-1.67)
to the maximum (Max=23.67). Scores outside of this range are considered unusually high or low; such scores are called outliers.
There are no outliers in this case.
Boxplot for two sets of data
To draw a boxplot for two or more data sets click on
Graphs → Legacy Dialogs → Interactive → Boxplot → move gender to the x-axis and move SAT math to the y-axis → OK
Box and whisker plot for ordinal or normal data
Interpretation
Fig. 5 shows two box plots, one for males and one for females. The box represents the middle 50% of the cases
(i.e. those between the 25th and 75th percentiles). The whiskers indicate the expected range of scores. Scores
outside of this range are considered unusually high or low. Such scores, called outliers, are shown above and/or
below the whiskers with circles or asterisks (for very extreme scores) and the SPSS data view line number for that
participant. Note there are no outliers for the 34 males, but there is a low (#6) and a high (#54) female outlier.
(Note this number will not be the participant’s ID unless you specify that SPSS should report this by ID number or
the ID numbers correspond exactly to the line number).
Histograms
Histogram is a form of a bar graph used with numerical (scale) variable preferably of continuous nature. The
intervals are shown on the X-axis and the number of scores in each interval is represented by the height of a
rectangle located above the interval. Unlike the bar graph, in a histogram there is no space between the bars.
The data is continuous so the lower limit of any one interval is also the upper limit of the previous interval. It is
useful to
 Summarize the data
 Examine and compare frequency distributions
 Check normality of data
To draw a histogram select:
Graphs → Legacy Dialogs → Interactive → Histogram → move the variable to the box → OK
Fig.3. Histogram of SAT- math score
Interpretation
In fig. 3 the frequencies (number of students), shown by the bars are for a range of points (in this case SPSS selected a
range of 50: 250-299, 300-349, 350-399, etc). Notice that the largest number of students (about 20) had scores in the
middle two bars of the range (450-499 and 500-549). Similar small numbers of students have very low and very high
scores. The bars in the histogram form a distribution (pattern or curve) that is similar to the normal, bell shaped curve.
Thus, the frequency distribution of the SAT math scores is said to be approximately normal.
The figure shows the distribution for the competence scale. Notice that the bars form a pattern very different from the
normal curve line. This distribution can be said to be not normally distributed. As we see later in the chapter, the
distribution is negatively skewed. That is, extreme scores or the tail of the curve are on the low end or left side. Note
how much this differs from the SAT math score frequency distribution. As you will see in the Levels of Measurement
section, we call the competence scale variable ordinal.
1.3- Scatter plot

A scatter plot is a graph of two variables that shows how each individual’s score on one variable is associated with his or her score
on the other variable. Each dot or circle on the plot represents a particular individual’s score on the two variables, with
one variable represented on the X axis and the other on the Y axis. The measurement for both variables is
continuous (measurement data). It is useful to
 Gain insight into the relationship between two scale variables.
 To check the assumptions of linearity for correlation and regression statistics
 To locate the outliers that are far from the regression line.
To draw a scatter plot click on
Graphs → Legacy Dialogs → Interactive → Scatterplot → move “Scholastic aptitude” to the x-axis and move
competence scale to the y-axis → OK
Interpretation
The output shows a scatter plot for two scale variables, i.e. scholastic aptitude test and competence scale.
The overall pattern of the dots runs diagonally upward along a straight regression line, showing a positive association
between the two variables. However, r² = 0.04 means that the line explains only about 4% of the variation (equivalent to
r = 0.2), so even though few values are dispersed far from the regression line, the relationship between scholastic
aptitude test and competence scale should be read as weak rather than strong.
2. Measures of Central Tendency (Average/ Location)

Central tendency of a data set refers to a measure of the "middle", "central" or "average" value of the data set, that is,
the single value that can best represent the whole data set. It is also called a measure of location. It
includes
 Mean is the arithmetic average of numerical data. It is an appropriate measure of central tendency when
there is little fluctuation in the data and the values are consistent, with no outliers. It is the most common
measure of central tendency. It is calculated by dividing the sum of the values (∑X) by the number of
values (n). Its formula is
X̄ = ∑X / n
 Median is the middle value of the numerical data. It is an appropriate measure of central tendency for ordinal
data, or when the raw data is less consistent, with more fluctuations and outliers. It is the midpoint of a distribution:
the same number of scores lie above the median as below it. It can be calculated by
 Arranging the data in ascending order
 Ranking them
 And locating the value at the middle rank using the formula
Median = the value at rank (n+1)/2
 Mode is the most common category in the data. It is a measure of central tendency for any kind of data, but it
is most appropriate for categorical data, preferably of a qualitative nature. It generally provides the least precise
information about central tendency for ordinal or scale data. Remember that sometimes
data can have more than one mode. The mode is defined as
Mode = the most frequent value
The mean, median and mode have the same value if the data is normally distributed (symmetrical) but will have
different values if the data is skewed.
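To see why the median is preferred when outliers are present, the three measures can be computed for the Department A absenteeism data used earlier. This is a small Python sketch using the standard `statistics` module; reusing that data set here, and dropping the value 95 for comparison, are choices made purely for illustration.

```python
import statistics

# Department A absenteeism data from the five-figure summary example.
department_a = [0, 0, 2, 2, 0, 0, 1, 1, 3, 1, 2, 3, 3, 5, 95, 5, 5, 8, 10, 15]

mean_all = statistics.mean(department_a)        # sum of values / n
median_all = statistics.median(department_a)    # value at rank (n+1)/2
modes_all = statistics.multimode(department_a)  # list, since data can have several modes

# Same data with the single extreme value (95) left out.
without_outlier = [x for x in department_a if x != 95]
mean_trimmed = statistics.mean(without_outlier)
median_trimmed = statistics.median(without_outlier)

print(f"with 95:    mean = {mean_all:.2f}, median = {median_all}, mode(s) = {modes_all}")
print(f"without 95: mean = {mean_trimmed:.2f}, median = {median_trimmed}")
```

Removing one extreme value moves the mean a great deal but barely moves the median, which is the behaviour described above for skewed data.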
The suitability of measures of central tendency is given in table below
To find the mean, median, and mode click on
Analyze → Descriptive Statistics → Frequencies → move “Scholastic aptitude” to the Variable(s) box →
click on the Statistics tab → check Mean, Median, Mode → click OK
Fig.6. Mean, Median, Mode of SAT math score
N        Valid      75
         Missing    0
Mean                490.53
Median              490.00
Mode                500
3. Measures of Variability
A measure of variability is a quantitative measure of the degree of variation or dispersion of the values in a
data set, such as the scores on one variable. It provides information about the degree to which individual scores are
clustered about or deviate from the average value in a distribution. A measure of statistical dispersion is zero if all the
data are identical, and increases as the data becomes more diverse. It cannot be less than zero. Standard deviation is
the most common measure of variability.
 Standard deviation
Standard deviation is the most commonly used measure of variability. It is (roughly) the average distance of
the values from the mean of the data and thus shows how much variation there is in the data from the
"average" (mean). The formula for the standard deviation of a sample is
s = √( ∑(X − X̄)² / (n − 1) )
It can be calculated using the following steps.
Example: Suppose we wish to find the standard deviation of the data set consisting of the values 3, 7, 7, and
19.
Step 1: find the arithmetic mean (average) of 3, 7, 7, and 19:
X̄ = (3 + 7 + 7 + 19) / 4 = 36 / 4 = 9
Step 2: find the deviation of each number from the mean by subtracting the mean from each value (X − X̄):
3 − 9 = −6, 7 − 9 = −2, 7 − 9 = −2, 19 − 9 = 10
Step 3: square each of the deviations to obtain (X − X̄)², which amplifies large deviations and makes
negative values positive:
36, 4, 4, 100
Step 4: find the average of those squared deviations by adding them up and dividing by n − 1 to get the
variance (s²):
s² = (36 + 4 + 4 + 100) / (4 − 1) = 144 / 3 = 48
Step 5: take the non-negative square root of the quotient (converting squared units back to regular
units):
s = √48 = 6.93
So, the standard deviation of the set is 6.93.
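The five steps can be followed line by line in code. The sketch below is a Python illustration of the worked example (the variable names are hypothetical); the last line cross-checks the hand calculation against `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics

values = [3, 7, 7, 19]
n = len(values)

mean = sum(values) / n                           # Step 1: arithmetic mean
deviations = [x - mean for x in values]          # Step 2: X minus the mean
squared_deviations = [d ** 2 for d in deviations]  # Step 3: square each deviation
variance = sum(squared_deviations) / (n - 1)     # Step 4: divide the sum by n - 1
std_dev = math.sqrt(variance)                    # Step 5: non-negative square root

print(f"mean = {mean}, variance = {variance}, standard deviation = {std_dev:.2f}")

# Cross-check with the standard library (sample standard deviation).
library_std_dev = statistics.stdev(values)
```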
Interpretation of standard deviation
The standard deviation is calculated to measure the dispersion of the data around its mean (X̄ = 9).
The standard deviation (s = 6.93) shows that the typical distance of the values from the mean is about 6.93,
which suggests that most of the values fall in the range 9 ± 6.93 (X̄ ± s), that is, from 2.07 to 15.93.
Zero Standard deviation means that the data values are clustered at one point i.e. mean. A low standard
deviation indicates that the data points tend to be very close to the mean, whereas high standard
deviation indicates that the data are spread out over a large range of values.
For data with a symmetric and approximately normal distribution it can be shown that
 About two-thirds of the data will lie within one standard deviation on either side of the mean, that is
between (X̄ ± s)
 About 95% of the data will lie within two standard deviations on either side of the mean, that is
between (X̄ ± 2s)
 Nearly all the data will lie within three standard deviations on either side of the mean, that is
between (X̄ ± 3s)
These facts will help you interpret the standard deviation for an approximately normal variable.
Remember that when the distribution is skewed the standard deviation may be a less helpful measure of spread,
as its value can be strongly affected by outliers.
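The three proportions quoted above can be verified for an exactly normal distribution. This is a short Python sketch using `statistics.NormalDist` from the standard library (available from Python 3.8); the helper name `share_within` is an assumption made for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The results (roughly 68%, 95% and 99.7%) correspond to "about two-thirds", "about 95%" and "nearly all" in the list above.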
Other measures of Variability

Besides standard deviation, there are some other measures of variability, as follows
 Range - The range is the difference between the highest and lowest score in a distribution. It is the simplest
measure of variability to compute and understand, but it is not often used as the sole measure of
variability due to its instability. Because it is based solely on the most extreme scores in the distribution and
does not fully reflect the pattern of variation within a distribution, the range is a very limited measure of
variability.
Range = Max - Min
 Interquartile Range (IQR) - The interquartile range is the range of the middle 50% of a distribution. Because
any outliers in our distribution must be on the ends of the distribution, the range as dispersion can be strongly
influenced by outliers. One solution to this problem is to eliminate the ends of the distribution and measure the
range of scores in the middle. Thus, with the interquartile range we will eliminate the bottom 25% and top 25%
of the distribution, and then measure the distance between the extremes of the middle 50% of the distribution
that remains.
IQR = Q3 - Q1
 Variance - The variance is a measure based on the deviations of individual scores from the mean. As noted in
the definition of the mean, however, simply summing the deviations will result in a value of 0. To get around
this problem the variance is based on squared deviations of scores about the mean.
When the deviations are squared, the rank order and relative distance of scores in the distribution are preserved
while negative values are eliminated. Then, to control for the number of subjects in the distribution, the sum of
the squared deviations, ∑(X − X̄)², is divided by N (population) or by N − 1 (sample). The result is the average of
the squared deviations, and it is called the variance.
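The three measures can be compared on one data set. The following Python sketch reuses the Department A absenteeism data from the five-figure summary example (a choice made here for illustration) and takes the quartiles as the medians of the lower and upper halves, as in that example; the helper name is hypothetical.

```python
import statistics


def median_of_sorted(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


department_a = sorted([0, 0, 2, 2, 0, 0, 1, 1, 3, 1, 2, 3, 3, 5, 95, 5, 5, 8, 10, 15])
n = len(department_a)
half = n // 2                      # n = 20 is even, so the halves are ranks 1-10 and 11-20

data_range = department_a[-1] - department_a[0]   # Range = Max - Min
q1 = median_of_sorted(department_a[:half])
q3 = median_of_sorted(department_a[half:])
iqr = q3 - q1                                     # IQR = Q3 - Q1

mean = sum(department_a) / n
sum_of_squares = sum((x - mean) ** 2 for x in department_a)
sample_variance = sum_of_squares / (n - 1)        # divide by N - 1 for a sample
population_variance = sum_of_squares / n          # divide by N for a population

print(f"range = {data_range}, IQR = {iqr}")
print(f"sample variance = {sample_variance:.2f}, population variance = {population_variance:.2f}")
```

The single value of 95 produces a range of 95, while the IQR, which ignores the bottom and top 25% of the data, is only 4.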
To get the measures of variability
Analyze → Descriptive Statistics → Descriptives → move SATmath → Options →
Std. Deviation, Variance, Range, IQR → Continue → OK
Descriptive Statistics for the Scholastic Aptitude test - math (SATM)
                                   N     Range   Std. Deviation   Variance
scholastic aptitude test - math    75    480     94.553           8.940E3
Valid N (listwise)                 75
Table 4 Selection of Appropriate Descriptive Statistics and Plots
4. Checking assumption for parametric tests

Every inferential statistical test has assumptions. These statistical assumptions explain when it is and isn’t
reasonable to perform a specific statistical test. If these assumptions are not met, the value that SPSS
calculates, which tells the researcher whether or not the results are statistically significant, will not be
completely accurate and may even lead the researcher to draw the wrong conclusions about the results.
Assumptions must be checked for parametric tests as well as non-parametric tests. These include
 Assumption of large sample size (non-parametric tests, e.g. chi-square)
 Normality of the data (parametric tests, e.g. correlation and regression)
 Linearity of the data (parametric tests, e.g. correlation and regression)
Here we will discuss the normal curve. The others will be discussed while studying the corresponding tests.
The Normal Curve
The frequency distributions of many of the variables used in the behavioral sciences are distributed approximately
as a normal curve when N is large. Examples of such variables that approximately fit a normal curve are height,
weight, intelligence, and many personality variables. Notice for each of these examples, most people would fall
toward the middle of the curve, with fewer people at the extremes. If the average height of men in the United States
was 5’10” then this height would be in the middle of the curve. The heights of men who are taller than 5’10” would
be to the right of the middle on the curve, and those of men who are shorter than 5’10” would be to the left of the
middle on the curve, with only a few men 7’ or 5’ tall.
4.2 Properties of Normal Curve

1. The mean, median and mode are equal.
2. It has one “hump” and this hump is in the middle of the distribution.
3. The curve is symmetric. If you fold the normal curve in half, the right side would fit perfectly with
the left side; that is, it is not skewed.
4. The range is infinite.
5. The curve is neither too peaked nor too flat and its tails are neither too short nor too long.
4.3 How to check normality

Normality of data can be checked by using
1. Histograms
a. Draw a histogram for the data
b. Double click on the histogram in the output window to open the chart editor window
c. Click on the normal curve button in the tool bar and check the shape of the curve
d. If it fulfils the properties listed above and the shape of the curve is just like the
shape given above, then the data is normal; otherwise it is non-normal
2. Boxplots
Box plots can be useful for identifying variables with extreme scores, which can make the
distribution skewed (non-normal). Also, if there are few outliers, if the whiskers are approximately
the same length, and if the line in the box is approximately in the middle of the box, then we can
assume that the variable is approximately normally distributed. Thus, math achievement is near
normal, motivation is approximately normal, but competence is quite skewed in the HSB data file.
4.4 Non normally shaped Distributions
If the data is not normally distributed then it can have
1. Skewness
If one tail of the frequency distribution is longer than the other, and if the mean and median are
different, the curve is skewed. A perfectly normal curve has a skewness of zero (0.0). If the longer tail is
on the left, the distribution is called negatively skewed, and if the longer tail is on the right, it is called
positively skewed. If the value of skewness lies between -1 and +1, the data is considered
approximately normal.
2. Kurtosis
If a frequency distribution is more peaked than the normal curve in the figure above then it is said to
have positive kurtosis and is called leptokurtic. Conversely, if a frequency distribution is relatively flat
with light tails, it has negative kurtosis and is called platykurtic.
Both skewness and kurtosis can be measured using the Frequencies command in the Analyze menu. Skewness
should always be measured, but kurtosis has less effect on the results of the test.
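Skewness and kurtosis can also be computed directly from the data. The sketch below is a Python illustration using the simple moment-based formulas; this is an assumption made for clarity, since SPSS applies a small-sample adjustment and its reported values will therefore differ somewhat. The function names and the reuse of the Department A absenteeism data are likewise illustrative.

```python
def central_moment(values, order):
    """Average of (X - mean) raised to the given power."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n


def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so that a normal curve scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
department_a = [0, 0, 2, 2, 0, 0, 1, 1, 3, 1, 2, 3, 3, 5, 95, 5, 5, 8, 10, 15]

skew_a = skewness(department_a)
# Rule of thumb from the text: skewness between -1 and +1 is approximately normal.
approximately_normal = -1 <= skew_a <= 1

print(f"symmetric data: skewness = {skewness(symmetric):.2f}, "
      f"excess kurtosis = {excess_kurtosis(symmetric):.2f}")
print(f"Department A:   skewness = {skew_a:.2f}, approximately normal: {approximately_normal}")
```

The absenteeism data, with its long right tail caused by the value 95, falls well outside the -1 to +1 band and is positively skewed.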
Mid-Term Paper
Mid-Term Project Discussion & Lab Practice Session
INFERENTIAL STATISTICS
 Lesson Objectives
After studying this session students will be able to
 Understand and infer results from data in order to answer associational and differential research
questions using different parametric and non-parametric tests.
 Understand, implement and interpret chi-square, phi and Cramer’s V
 Understand, implement and interpret correlation statistics
 Understand, implement and interpret regression statistics
 Understand, implement and interpret t-test statistics
Lesson Outline
1. Non parametric test.

 Chi square /Fisher exact
 Phi and cramer’s v
 Kendall tau-b
 Eta
 Spearman correlation
2. Parametric test
 Correlation
 Pearson correlation
 Regression
 Simple regression
 Multiple regression
 T-Test
 One-sample T-test
 Independent sample T-test
 Paired sample T-test
Inferential statistics are used to make inferences (conclusions) about a population from a sample. They rely on
statistical tests of the relationships or differences between two or more variables and assume that sampling is
random, so that the results can be generalized or used to make predictions about the future.
Why we use inferential statistics

Inferential statistics are used
 To test some hypothesis either to check relationship between variables (two/more) or to compare two groups to
measure the differences among them.
 To generalize the results about a population from a sample
 To make predictions about the future.
 To make conclusions
You don't need to understand the underlying calculus, but you do need to know which inferential statistic is appropriate
to use and how to interpret it.
Some basic concepts about inferential statistics
Statistical significance (The p value)
A statistical significance test is a test of a null hypothesis Ho, which is a hypothesis that we attempt to reject or nullify, i.e.
Ho = There is no relationship/difference between variable 1 and variable 2
When we apply any inferential statistic, it gives us a significance value (called the p value). If the p value is less than 5% (0.05) then
the test result is said to be significant at the 5% level. The term significant means that the test signifies or points to the
conclusion that there is evidence against the truth of the null hypothesis. The comparison of p with 5% is a standard
method often used by researchers, but it is better to report and interpret the actual value of p.
Interpretation
If the p value is greater than 0.05, Ho is not rejected (loosely, Ho is "accepted" and H1 is rejected). This means that there is
not enough evidence of a relationship/difference between the variables/groups.
If the p value is less than or equal to 0.05, Ho is rejected and H1 is accepted. This means that there is evidence of a
relationship/difference between the variables/groups. A higher p value means weaker evidence against Ho and a smaller
p value means stronger evidence against Ho; the p value does not measure the strength of the relationship itself.
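The decision rule can be written as a small function. This is a minimal Python sketch; the function name `interpret_p_value`, the `alpha` parameter and the wording of the returned messages are assumptions made for illustration, with the 0.05 cut-off taken from the rule above.

```python
def interpret_p_value(p, alpha=0.05):
    """Apply the decision rule: reject Ho when p is less than or equal to alpha."""
    if not 0 <= p <= 1:
        raise ValueError("a p value must lie between 0 and 1")
    if p <= alpha:
        return "reject Ho: there is evidence of a relationship/difference"
    return "do not reject Ho: not enough evidence of a relationship/difference"


for p in (0.001, 0.05, 0.20):
    print(f"p = {p}: {interpret_p_value(p)}")
```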
Confidence Interval
A confidence interval is a range of values constructed for a variable of interest so that this range has a specified
probability of including the true value of the variable. The specified probability is called the confidence level,
and the end points of the confidence interval are called the confidence limits.
It is one of the alternatives to null hypothesis significance testing (NHST). These intervals provide more
information than NHST and may provide more practical information. For example, suppose one knew that an
increase in reading scores of five points, obtained on a particular instrument, would lead to a functional
increase in reading performance.
Two different methods of instruction were compared. The result showed that students who used the new method
scored statistically significantly higher than those who used the other method. According to NHST, we would reject the
null hypothesis of no difference between methods and conclude that the new method is better. If we apply confidence
intervals to this same study, we can determine an interval that contains the population mean difference 95% of the time.
If the lower bound of that interval is greater than five points, we can conclude that using this method of instruction
would lead to a practical or functional increase in reading levels. If, however, the confidence interval ranged from, say, 1
to 11, the result would be statistically significant, but the mean difference in the population could be as little as 1 point
or as big as 11 points. Given these results, we could not be confident that there would be a practical increase in reading
using the new method.
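The reasoning in the reading-score example can be sketched as a simple check on the two confidence limits. The Python function below is an illustration only (its name and parameters are assumptions): an interval for a mean difference that excludes zero is statistically significant, and it shows a practical gain only if its lower limit is above the five-point threshold from the example.

```python
def read_confidence_interval(lower, upper, practical_threshold=5.0):
    """Return (statistically_significant, practically_important) for a mean difference."""
    # Significant when the whole interval lies on one side of zero.
    statistically_significant = lower > 0 or upper < 0
    # Practically important when even the lower limit exceeds the threshold.
    practically_important = lower > practical_threshold
    return statistically_significant, practically_important


# The interval from the example: 1 to 11 points.
print(read_confidence_interval(1, 11))
# A hypothetical interval lying entirely above five points.
print(read_confidence_interval(6, 12))
```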
The effect size (weak, moderate or strong)

Effect size is the strength of the relationship between the independent variable and the dependent variable, and/or the
magnitude of the difference between levels of the independent variable with respect to the dependent variable.
A statistically significant outcome does not give information about the strength or size of the outcome. Therefore, it is
important to know, the size of the effect. Statisticians have proposed many effect size measures that fall mainly into two
types of families, the r family and the d family.
Interpreting Effect Sizes
Effect sizes in the r family range from -1.0 to +1.0, so their absolute value lies between 0 and 1 (d family effect sizes can
exceed 1). According to Cohen (1988) we can interpret the absolute value of the effect size (r/d) as follows
Test Value       Effect Size             Relationship
0                No effect               No relationship
>0 – 0.33        Small effect            Weak relationship
>0.33 – 0.70     Medium/typical effect   Moderate relationship
>0.70 – <1       Large effect            Strong relationship
1                Maximum effect          Perfect relationship
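The table can be turned into a lookup so that any reported value is labelled consistently. This is a small Python sketch with a hypothetical function name; the cut-offs are copied from the table above, and the sign is ignored because it shows direction, not strength.

```python
def describe_effect_size(value):
    """Return (effect size label, relationship label) using the cut-offs in the table above."""
    size = abs(value)  # the sign gives the direction, not the strength
    if size > 1:
        raise ValueError("this table only covers effect sizes between -1 and +1")
    if size == 0:
        return ("No effect", "No relationship")
    if size <= 0.33:
        return ("Small effect", "Weak relationship")
    if size <= 0.70:
        return ("Medium/typical effect", "Moderate relationship")
    if size < 1:
        return ("Large effect", "Strong relationship")
    return ("Maximum effect", "Perfect relationship")


for example in (0.2, -0.412, 0.85):
    print(example, describe_effect_size(example))
```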
Steps in interpreting inferential statistics

 Relate why a test is applied
 Discuss for which variable the test is applied
 Discuss the assumptions of test applied
 Elaborate whether the null hypothesis is rejected or accepted w.r.t. the p value
As discussed above, if the significance (p) value is less than or equal to 0.05 then Ho is rejected and H1 is accepted; conversely, if the
significance value is greater than 0.05 then Ho is not rejected
 State the direction of the effect
For an associational research question, indicate whether the association or relationship is positive or negative
For a differential research question, state which group performed better
 Conclude the results
Types of tests used in Inferential Statistics

Inferential statistics include a wide variety of tests to infer the results. This variety of tests can be classified in two
broader categories that are
 Non parametric tests
 Parametric tests
Following is the detailed discussion related to both types of tests.
Non parametric test

Non parametric tests are statistical tests that are used
 When the level of measurement is nominal or ordinal, e.g. chi-square test or Kendall’s tau-b.
 When the assumption of a normal distribution in the population is not met, e.g. Spearman correlation.
Non parametric tests involve
 Chi-Square test
 Kendall’s tau-b
 Eta
 Spearman correlation (will be discussed in correlation section)
Let’s see these tests in detail.
Chi-Squared Test
Chi-Squared test is the most commonly used non-parametric test to check the association between two
nominal variables in order to accept or reject the null hypothesis.
Hypothesis for Chi-Square Test
 Ho = There is no association between gender and geometry in h.s.
 H1 = There is an association between gender and geometry in h.s.
It is used to
 Check the association between two nominal variables
 Compare two or more groups if they are categorical in nature
Assumptions and Conditions for the Chi-Squared test
 The data of the variables must be independent. Each subject is assessed only once.
 Both the variables are nominal.
 All the expected counts are greater than 1 for chi-square.
 At least 80% of the expected frequencies should be greater than or equal to 5.
Checking the assumptions for the Chi-Squared test
The assumptions for the Chi-squared test are checked through cross tabulation of the categorical variables. The
crosstab can be produced as follows
 Click the Analyze menu
 Select the Descriptive Statistics option
 Select the Crosstabs option in the sub menu
 Put geometry in h.s. in the Rows section and gender in the Columns section
 Check Chi-square, Phi and Cramer’s V from the Statistics tab
 Check Observed, Expected and Total from the Cells tab
 Click Continue then OK to get the following crosstabs in the output window
geometry in h.s. * gender Crosstabulation
                                                       gender
                                                  male      female    Total
geometry in h.s.   not taken   Count              10        29        39
                               Expected Count     17.7      21.3      39.0
                               % of Total         13.3%     38.7%     52.0%
                   Taken       Count              24        12        36
                               % of Total         32.0%     16.0%     48.0%
Total                          Count              34        41        75
                               % of Total         45.3%     54.7%     100.0%
Check whether all the expected counts are greater than one (excluding the total column and the total row).
Check whether at least 80% of the expected counts are greater than or equal to 5. You can calculate the percentage using the
following formula
Percentage = (Number of cells with expected counts of at least 5 / Total number of cells) × 100
 If the assumptions are fulfilled then use the significance value of the Pearson chi-square from the Chi-Square Tests table
 If the assumptions for chi-square are not fulfilled then use the significance value of Fisher’s exact test
 To check the strength of the relationship (effect size) use the value of Phi for a 2x2 crosstab and the value of Cramer’s
V for larger crosstabs (e.g. 3x3). Remember that Phi and Cramer’s V have the same absolute value whenever one of the
two variables has only two categories (2x2, 2x3 and 3x2 crosstabs)
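The expected counts and the two assumption checks can be reproduced from the observed counts alone. The following Python sketch is an illustration (variable names are hypothetical); it uses the standard rule that the expected count of a cell is its row total times its column total divided by the grand total, applied to the gender by geometry crosstab.

```python
# Observed counts: rows = geometry not taken / taken, columns = male / female.
observed = [[10, 29],
            [24, 12]]

row_totals = [sum(row) for row in observed]           # 39, 36
col_totals = [sum(col) for col in zip(*observed)]     # 34, 41
n = sum(row_totals)                                   # 75

# Expected count = (row total x column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]
cells = [e for row in expected for e in row]

all_above_one = all(e > 1 for e in cells)
percent_at_least_five = 100 * sum(e >= 5 for e in cells) / len(cells)
assumptions_met = all_above_one and percent_at_least_five >= 80

for row in expected:
    print([round(e, 2) for e in row])
print(f"minimum expected count = {min(cells):.2f}")
print(f"{percent_at_least_five:.0f}% of cells have an expected count of at least 5")
print("use Pearson chi-square" if assumptions_met else "use Fisher's exact test")
```

The smallest expected count, 16.32, is the figure SPSS reports in the footnote to the Chi-Square Tests table.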
Case Processing Summary
                                        Cases
                              Valid            Missing          Total
                              N    Percent     N    Percent     N    Percent
geometry in h.s. * gender     75   100.0%      0    .0%         75   100.0%
Chi-Square Tests
                                 Value       df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                  (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square               12.714(a)   1    .000
Continuity Correction(b)         11.112      1    .001
Likelihood Ratio                 13.086      1    .000
Fisher's Exact Test                                             .000         .000
Linear-by-Linear Association     12.544      1    .000
N of Valid Cases(b)              75
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 16.32.
b. Computed only for a 2x2 table
Symmetric Measures
                                    Value    Approx. Sig.
Nominal by Nominal   Phi            -.412    .000
                     Cramer's V      .412    .000
N of Valid Cases                    75
Interpretation:
To check the association between gender and geometry in h.s., a chi-square test is conducted. The case
processing summary table indicates that there is no participant with a missing value. The assumptions are
checked through crosstabs. The Crosstabulation table includes the Counts and Expected Counts, and their
percentages of the total. The result shows that 24 males took geometry, which
is 71% of the 34 male students. On the other hand, 12 of 41 females took geometry; that is only 29% of the
females. It looks like a higher percentage of males than females took geometry. The Chi-Square Tests
table tells us whether we can be confident that this apparent difference is not due to chance.
Note carefully that we use the Pearson Chi-Square or (for small samples) the Fisher’s exact
test to interpret the results of the test.
Note, in the Cross Tabulation table, that the Expected Count of the number of male students who didn’t take
geometry is 17.7 and the observed or actual Count is 10. Thus, there are 7.7 fewer males who didn’t take
geometry than would be expected by chance, given the Totals shown in the Table. There are also the same
discrepancies between observed and expected counts in the other three cells of the table. A question
answered by the chi-square test is whether these discrepancies between observed and expected counts are
bigger than one might expect by chance.
The Chi-Square Tests table is used to determine if there is a statistically significant relationship between two
dichotomous or nominal variables. It tells you whether the relationship is statistically significant but does not
indicate the strength of the relationship, like phi or a correlation does. In output, we use the Pearson Chi-
Square or (for small samples) the Fisher’s exact test to interpret the results of the test. They are statistically
significant (p < .001), which indicates that we can be quite certain that males and females are different on
whether they take geometry.
Phi is -.412, and like the chi-square, it is statistically significant. Phi is also a measure of effect size for an
associational statistic and, in this case, effect size is moderate according to Cohen (1988)
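The figures in the interpretation above can be rechecked by hand. The following is a minimal Python sketch, not part of the SPSS procedure: it rebuilds the 2x2 table from the counts quoted in the text (24 and 10 males, 12 and 29 females) and recomputes the expected counts, the Pearson chi-square, its p value and the size of phi. The function name chi_square is illustrative only.

```python
import math

# Observed counts quoted in the interpretation
# (rows: male, female; columns: took geometry, did not take geometry).
observed = [[24, 10],
            [12, 29]]

def chi_square(table):
    """Return (Pearson chi-square, expected counts) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return chi2, expected

chi2, expected = chi_square(observed)
n = sum(sum(row) for row in observed)

# With df = 1, the chi-square tail probability equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Size of phi; its sign only reflects how the categories are coded.
phi = math.sqrt(chi2 / n)
min_expected = min(min(row) for row in expected)

print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
print(f"|phi| = {phi:.3f}, minimum expected count = {min_expected:.2f}")
```

The recomputed values agree with the Chi-Square Tests and Symmetric Measures tables, which is a quick way to confirm that a table has been read correctly.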
KENDALL’S TAU-B
If the variables are ordered (i.e. ordinal), you have several other choices. We will use Kendall's tau-b in this problem.
Example:
What is the relationship or association between father's education and mother's education?
• Analyze → Descriptive Statistics → Crosstabs.
• Click on Reset to clear the previous entries.
• Put mother's education revised in the Rows box and father's education revised in the Columns box.
• Click on Cells and ask that the Observed and Expected cell counts and Total percentages be printed in the table. Click on Continue and then Statistics.
• Request the following Statistics: Kendall's tau-b coefficient under Ordinal, and Phi and Cramer's V under Nominal (for comparison purposes). Do not check Chi-Square.
• Click on Continue.
• Click on OK.
                             Cases
                             Valid         Missing      Total
mother education revised *
father education revised     73   97.3%    2   2.7%     75   100.0%
Symmetric Measures
                                       Value   Asymp. Std.   Approx.   Approx.
                                               Error(a)      T(b)      Sig.
Ordinal by Ordinal   Kendall's tau-b   .494    .108          3.846     .000
N of Valid Cases                       73
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Interpretation:
Kendall's tau-b was used to investigate the relationship between father's education and mother's education. The analysis indicated a significant positive association between father's education and mother's education, tau = .494, p < .001. This means that more highly educated fathers were married to more highly educated mothers, and less educated fathers were married to less educated mothers. This tau is considered to be a large effect size (Cohen, 1988).
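To see what tau-b measures, it can help to count the pairs by hand. The sketch below is a plain Python illustration under stated assumptions: the education codes are hypothetical (the raw HSB data are not reproduced in this pack) and the function name kendall_tau_b is illustrative. It counts concordant and discordant pairs and corrects for ties, which is the usual definition of tau-b.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b from pair counts, with the standard tie correction."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied on both variables: ignored
        elif dx == 0:
            tied_x_only += 1      # tied on x only
        elif dy == 0:
            tied_y_only += 1      # tied on y only
        elif dx * dy > 0:
            concordant += 1       # both variables move the same way
        else:
            discordant += 1       # the variables move in opposite ways
    untied_pairs = concordant + discordant
    denominator = math.sqrt((untied_pairs + tied_x_only)
                            * (untied_pairs + tied_y_only))
    return (concordant - discordant) / denominator

# Hypothetical ordinal education codes for eight students (not the HSB data).
father_education = [1, 1, 2, 2, 3, 3, 1, 2]
mother_education = [1, 2, 2, 3, 3, 2, 1, 1]

print(round(kendall_tau_b(father_education, mother_education), 3))
```

A positive value means that higher codes on one variable tend to go with higher codes on the other, as in the interpretation above.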
ETA
If one variable is nominal and the other is scale, eta is the appropriate statistic for checking the relationship between the two variables. SPSS calculates eta twice, once with each variable treated as the dependent variable. First decide which variable is the dependent variable, and then read the eta value for that variable.
Example: What is the association between gender and number of math courses taken? How strong is it?
• Analyze → Descriptive Statistics → Crosstabs.
• Click on Reset to clear the previous entries.
• Put math courses taken in the Rows box and gender in the Columns box.
• Click on Statistics and select Eta.
• Click on Continue.
• Click on OK to get the following results.
                              Cases
                              Valid          Missing     Total
math courses taken * gender   75   100.0%    0   .0%     75   100.0%
                                                         Value
Nominal by Interval   Eta   math courses taken Dependent   .328
                            gender Dependent               .419
Interpretation
Eta was used to investigate the strength of the association between gender and number of math courses
taken (eta=.33). This is a weak to medium effect size (Cohen, 1988). Males were more likely to take several or
all the math courses than females.
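Eta can be understood as the share of the variation in the scale variable that lies between the groups. The sketch below is an illustration in Python, assuming made-up course counts for two small groups (not the HSB data) and an illustrative function name eta; it takes the square root of the between-groups sum of squares divided by the total sum of squares.

```python
import math

def eta(groups):
    """Eta for a scale variable split by a nominal grouping variable.

    groups maps each category label to the list of scores in that category.
    """
    all_scores = [score for scores in groups.values() for score in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)
    ss_between = sum(len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
                     for scores in groups.values())
    return math.sqrt(ss_between / ss_total)

# Hypothetical numbers of math courses taken (illustrative only).
courses_by_gender = {
    "male": [3, 4, 5, 2, 4],
    "female": [1, 2, 3, 2, 2],
}

print(round(eta(courses_by_gender), 3))
```

Eta is 0 when the group means are equal and 1 when all the variation is between the groups, so it has no sign and cannot show direction; the group means must be inspected for that.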
Correlation
Correlation is a statistical process that determines the mutual (reciprocal) relationship between two (or more) variables that are thought to be related in such a way that systematic changes in the value of one variable are accompanied by systematic changes in the other, and vice versa.
It is used to determine:
• The existence of a mutual relationship, which is indicated by the significance (p) value.
• The direction of the relationship, which is indicated by the sign (+ or -) of the test value.
• The strength of the relationship, which is indicated by the magnitude of the test value.
Correlation Coefficient (r)
The correlation coefficient measures the strength of the linear relationship between two numerical
variables. The value of the correlation coefficient can vary from -1.0 (a perfect negative correlation or association)
through 0.0 (no correlation) to +1.0 (a perfect positive correlation). Note that +1 and -1 are equally high or
strong, but they lead to different interpretations. A high positive correlation between anxiety and grades
would mean that students with higher anxiety tended to have higher grades, those with lower anxiety had
lower grades, and those in between had grades that were neither especially high nor especially low. A high
negative correlation would mean that students with high anxiety tended to have low grades; also high grades
would be associated with low anxiety. With a zero correlation there are no consistent associations. A student
with high anxiety might have low, high or medium grades.
There are two types of correlation
1. Pearson Correlation
2. Spearman Correlation
1. Pearson Correlation
The Pearson correlation is used when both variables are normally distributed scale variables. An assumption of the Pearson correlation is that the variables are related in a linear (straight-line) way, so we first examine the scatter plot to see whether that assumption is reasonable. The Pearson correlation is then computed; the Spearman correlation is used instead when one or both variables are ordinal.
Assumptions and conditions for Pearson Correlation
• The two variables have a linear relationship.
• Scores on one variable are normally distributed for each value of the other variable, and vice versa.
• Outliers (i.e. extreme scores) can have a big effect on the correlation.
Checking the assumptions for Pearson Correlation
The assumptions of the correlation test are checked through the normal curve (normality assumption) and the scatter plot (linearity assumption).
Normality assumption
• Click on the Analyze menu.
• Select the Descriptive Statistics option.
• Select the Frequencies option in the submenu.
• Put math achievement and SAT math in the Variables box.
• Check Skewness under the Statistics tab and Histogram under the Charts tab.
• Click Continue and then OK.
• The skewness values show that the variables are approximately normally distributed. Check normality further by adding the normal curve to the histograms in the Chart Editor.
Linearity assumption
• Click on the Graphs menu.
• Select Legacy Dialogs, Interactive and then Scatterplot.
• Put math achievement on the y-axis and SAT math on the x-axis.
• Click OK to get the scatter plot in the output window.
• Double-click on the scatter plot to open the Chart Editor.
• Click on the "Add Fit Line at Total" button in the toolbar to get the linear fit line, with R square (linear) = 0.62, and close the window.
• Repeat the previous step for the quadratic line to get R square = 0.621.
• Click Apply and close the window.
• Calculate the difference between the two R square values (0.621 - 0.62 = 0.001).
• If the difference is less than the 0.05 cut-off, the relationship can be treated as linear. Here 0.001 < 0.05, so the Pearson correlation can be applied.
How to apply Pearson Correlation
• Select Analyze → Correlate → Bivariate.
• Put math achievement and SAT math in the Variables box.
• Ensure that Pearson, Two-tailed, and Flag significant correlations are checked.
• Click OK to get the following results in the output window.
Correlations
                                              math achievement   scholastic aptitude
                                              test               test - math
math achievement test   Pearson Correlation   1                  .788**
                        Sig. (2-tailed)                          .000
                        N                     75                 75
scholastic aptitude     Pearson Correlation   .788**             1
test - math             Sig. (2-tailed)       .000
                        N                     75                 75
**. Correlation is significant at the 0.01 level (2-tailed).
Interpretation
To investigate whether there was a statistically significant association between the scholastic aptitude test and math achievement, a correlation was computed. Both variables were approximately normally distributed and the relationship between them was linear, so the assumptions for Pearson's correlation were fulfilled. Thus, Pearson's r was calculated: r = .79, p < .001, indicating a highly significant relationship between the variables. The positive sign of r shows that the relationship is positive, which means that students who have high scores on the math achievement test also tend to have high scores on the scholastic aptitude test, and vice versa. Using Cohen's (1988) guidelines, the effect size is large, indicating a strong relationship between math achievement and the scholastic aptitude test.
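For readers who want to see how r is obtained, here is a minimal Python sketch of the Pearson formula. The scores are hypothetical (the HSB data file is not reproduced here) and the function name pearson_r is illustrative. The last lines square the reported r of .788, which gives the same R square (about .62) that the linear fit line showed in the scatter plot.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical scores for six students (illustrative only).
sat_math = [400, 450, 500, 550, 600, 650]
math_achievement = [8, 9, 14, 13, 18, 20]

r = pearson_r(sat_math, math_achievement)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")

# Squaring the r reported in the Correlations table gives the shared variance.
reported_r = 0.788
shared_variance = reported_r ** 2
print(f"reported r squared = {shared_variance:.2f}")
```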
2. Spearman Correlation:
If the assumptions for the Pearson correlation are not fulfilled, consider the Spearman correlation, which assumes only that the relationship between the two variables is monotonic (consistently increasing or decreasing), not necessarily linear.
Example: What is the association between mother's education and math achievement?
• Analyze → Correlate → Bivariate.
• Move math achievement and mother's education to the Variables box.
• Next, ensure that the Spearman and Pearson boxes are checked.
• Make sure that Two-tailed (under Test of Significance) and Flag significant correlations are checked.
• Now click on Options, check Means and standard deviations, and click on Exclude cases listwise.
• Click on Continue and then on OK.
Correlationsa
                                                                mother's    math achievement
                                                                education   test
Spearman's rho   mother's education   Correlation Coefficient   1.000       .315**
                                      Sig. (2-tailed)           .           .006
                 math achievement     Correlation Coefficient   .315**      1.000
                 test                 Sig. (2-tailed)           .006        .
Interpretation
To investigate whether there was a statistically significant association between mother's education and math achievement, a correlation was computed. Mother's education was skewed (skewness = 1.13), which violated the assumption of normality. Thus, the Spearman rho statistic was calculated, rs(73) = .32, p = .006. The direction of the correlation was positive, which means that students who have highly educated mothers tend to have higher math achievement test scores, and vice versa. Using Cohen's (1988) guidelines, the effect size is medium for studies in this area. The squared correlation (.315 squared, about .10) indicates that approximately 10% of the variance in math achievement test scores can be predicted from mother's education.
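Spearman's rho is simply the Pearson correlation computed on ranks, which is why it only needs a monotonic relationship. The sketch below shows this in Python under stated assumptions: the data are hypothetical, tied scores receive the average of their ranks, and the function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    sum_xx = sum((a - mean_x) ** 2 for a in rank_x)
    sum_yy = sum((b - mean_y) ** 2 for b in rank_y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical education codes and test scores (illustrative only).
mother_education = [2, 3, 3, 4, 6, 8, 2, 5]
math_achievement = [9, 7, 12, 14, 13, 20, 5, 16]
print(round(spearman_rho(mother_education, math_achievement), 3))
```

A curved but steadily rising relationship, such as y = x squared for positive x, gives rho = 1 even though it is not a straight line.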
REGRESSION
Regression analysis is used to measure the relationship between two or more variables. One variable is called the dependent (response or outcome) variable and the others are called independent (explanatory or predictor) variables.
It is used to check how much change occurs in the dependent variable for a one-unit change in the independent variable(s).
Regression Equation
It is the equation representing the relation between selected values of one variable (x: the independent variable) and observed values of the other (y: the dependent variable); it permits the prediction of the most probable values of y. The standard forms of this equation, for two variables and for more than two variables respectively, are as follows:
Y = a + bx
Y = a + bx1 + cx2 + dx3 + ex4
Y = dependent variable
a = constant (intercept)
b, c, d, e = slope coefficients
x, x1, x2, x3, x4 = independent variables
Types of Regression
There are two types of regression analysis:
• Simple regression
• Multiple regression
Simple Regression
Simple regression is used to check the contribution of the independent variable to the dependent variable when there is only one independent variable.
Assumptions and conditions of simple regression
• The dependent variable should be scale.
• The relationship between the variables should be linear.
• The observations should be independent.
• The data should be normally distributed.
Example: Can we predict math achievement from grades in high school?
Commands
1. Analyze → Regression → Linear.
2. Highlight math achievement. Click the arrow to move it into the Dependent box.
3. Highlight grades in high school and click on the arrow to move it into the Independent(s) box.
4. Click on OK.
Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       grades in h.s.(a)   .                   Enter
a. All requested variables entered.
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .504(a)   .254       .244                5.80018
a. Predictors: (Constant), grades in h.s.
ANOVA(b)
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   836.606          1    836.606       24.868   .000(a)
    Residual     2455.875         73   33.642
    Total        3292.481         74
a. Predictors: (Constant), grades in h.s.
b. Dependent Variable: math achievement test
Coefficients(a)
                     Unstandardized Coefficients   Standardized Coefficients
Model                B        Std. Error           Beta                        t       Sig.
1   (Constant)       .397     2.530                                            .157    .876
    grades in h.s.   2.142    .430                 .504                        4.987   .000
a. Dependent Variable: math achievement test
Regression equation: Math Achievement Test = 0.40 + 2.14 (grades in h.s.)
Interpretation
Simple regression was conducted to investigate how well grades in high school predict math achievement scores. The results were statistically significant, F(1, 73) = 24.87, p < .001. The identified equation for this relationship was math achievement = .40 + 2.14 * (grades in high school). The adjusted R2 value was .244, which indicates that 24% of the variance in math achievement was explained by grades in high school.
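The Model Summary figures follow directly from the ANOVA table, and the regression equation can be used to predict a score. The Python sketch below is an illustration only: it takes the sums of squares and coefficients from the output above, and the function name predict_math_achievement and the grade value 5 are illustrative choices, not part of the SPSS output.

```python
# Values copied from the ANOVA table above.
ss_regression, df_regression = 836.606, 1
ss_residual, df_residual = 2455.875, 73

ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1          # 75 students

# R square is the share of total variation explained by the regression.
r_square = ss_regression / ss_total
# Adjusted R square penalises the number of predictors.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
# F is the ratio of the two mean squares.
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)

def predict_math_achievement(grades):
    """Prediction from the Coefficients table: constant + slope * grades."""
    return 0.397 + 2.142 * grades

print(f"R square = {r_square:.3f}, adjusted = {adjusted_r_square:.3f}")
print(f"F = {f_ratio:.3f}")
print(f"predicted score for grades = 5: {predict_math_achievement(5):.2f}")
```

Each extra point on the grades scale raises the predicted math achievement score by the slope, 2.142 points.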
Multiple Regression
Multiple regression is used to check the contribution of the independent variables to the dependent variable when there is more than one independent variable.
Assumptions and conditions of multiple regression
• The dependent variable should be scale.
• The relationship of the independent variables with the dependent variable is linear.
• The independent variables should not be highly correlated with each other (no multicollinearity).
• The variance of the error term should be constant (homoscedasticity, i.e. no heteroscedasticity).
• Current values of the error term should not be correlated with previous values (no autocorrelation).
Example: How well can you predict math achievement from a combination of four variables: grades in high school, father's education, mother's education and gender?
Commands
1. Analyze → Regression → Linear.
2. Highlight math achievement. Click the arrow to move it into the Dependent box.
3. Highlight grades in high school, father's education, mother's education and gender, and click on the arrow to move them into the Independent(s) box.
4. Under Method, be sure that Enter is selected.
5. Click on Continue and then OK to get the following results in the output window.
                        Mean      Std. Deviation   N
math achievement test   12.6621   6.49659          73
grades in h.s.          5.70      1.552            73
father's education      4.73      2.830            73
mother's education      4.14      2.263            73
Gender                  .55       .501             73
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .616(a)   .379       .343                5.26585
ANOVA(b)
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   1153.222         4    288.305       10.397   .000(a)
    Residual     1885.583         68   27.729
    Total        3038.804         72
Coefficients(a)
                         Unstandardized Coefficients   Standardized Coefficients
Model                    B         Std. Error          Beta                        t        Sig.
1   (Constant)           1.047     2.526                                           .415     .680
    grades in h.s.       1.946     .427                .465                        4.560    .000
    father's education   .191      .313                .083                        .610     .544
    mother's education   .406      .375                .141                        1.084    .282
    Gender               -3.759    1.321               -.290                       -2.846   .006
a. Dependent Variable: math achievement test
Regression Equation:
Math Achievement Test = 1.05 + 1.95 (grades in h.s.) + 0.19 (father's education) + 0.41 (mother's education) - 3.76 (gender)
Interpretation
Simultaneous multiple regression was conducted to investigate the best predictors of math achievement test scores. The means and standard deviations can be found in the first table. The combination of variables significantly predicted math achievement, F(4, 68) = 10.40, p < .001. Grades in high school and gender were statistically significant predictors (p < .05), while father's and mother's education were not (p > .05). The beta coefficients are presented in the last table. Note that high grades and male gender significantly predict math achievement when all four variables are included. The adjusted R2 value was .343, which indicates that 34% of the variance in math achievement was explained by the model.
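The multiple regression equation can be applied in the same way as the simple one. The Python sketch below is illustrative: the coefficients come from the Coefficients table, the student profile is invented, and gender is assumed to be coded 0 for males and 1 for females (the coding used in the t-test section of this pack). The helper names are not SPSS terms.

```python
# Unstandardized coefficients (B) copied from the Coefficients table.
CONSTANT = 1.047
B_GRADES = 1.946
B_FATHER_EDUCATION = 0.191
B_MOTHER_EDUCATION = 0.406
B_GENDER = -3.759   # assumption: gender coded 0 = male, 1 = female

def predict_math_achievement(grades, father_education, mother_education, gender):
    """Predicted score from the four-predictor regression equation."""
    return (CONSTANT
            + B_GRADES * grades
            + B_FATHER_EDUCATION * father_education
            + B_MOTHER_EDUCATION * mother_education
            + B_GENDER * gender)

def adjusted_r_square(r_square, n, predictors):
    """Adjusted R square for n cases and a given number of predictors."""
    return 1 - (1 - r_square) * (n - 1) / (n - predictors - 1)

# R square rebuilt from the ANOVA table: regression SS / total SS.
r_square = 1153.222 / 3038.804
adjusted = adjusted_r_square(r_square, n=73, predictors=4)

# Invented profile: same grades and parental education, different gender.
male_score = predict_math_achievement(6, 5, 4, gender=0)
female_score = predict_math_achievement(6, 5, 4, gender=1)

print(f"R square = {r_square:.3f}, adjusted R square = {adjusted:.3f}")
print(f"predicted: male {male_score:.2f}, female {female_score:.2f}")
```

With the other predictors held constant, the two predictions differ by exactly the gender coefficient, which is how a regression coefficient is read.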
T-TEST Statistics
The t test is used to compare two groups in order to answer differential research questions. Its value determines whether there is a difference by comparing means.
Hypothesis for T-test
H0: There is no difference between variable 1 and variable 2
H1: There is a difference between variable 1 and variable 2
Types of T-test
There are three types of t-test:
1. One sample t-test
2. Independent sample t-test
3. Paired sample t-test
1. ONE SAMPLE T-TEST
The one sample t-test is used to determine whether there is a difference between the population mean (Test Value) and the sample mean (X).
Assumptions and conditions of one sample t-test
1. The dependent variable should be normally distributed within the population.
2. The data are independent (the scores of one participant do not depend on the scores of the others; participants are independent of one another).
Example: Is the mean SAT-Math score in the modified HSB data set significantly different from the presumed population mean of 500?
Commands
1. Analyze → Compare Means → One-Sample T Test.
2. Move scholastic aptitude test - math to the Test Variable(s) box.
3. Type 500 in the Test Value box.
4. Click on OK.
One-Sample Statistics
                                  N    Mean     Std. Deviation   Std. Error Mean
scholastic aptitude test - math   75   490.53   94.553           10.918
One-Sample Test
                                  Test Value = 500
                                                                                   95% Confidence Interval
                                                                                   of the Difference
                                  t       df   Sig. (2-tailed)   Mean Difference   Lower     Upper
scholastic aptitude test - math   -.867   74   .389              -9.467            -31.22    12.29
Interpretation:
A one-sample t-test was conducted to investigate the difference between the population and the sample. The One-Sample Statistics table provides basic descriptive statistics for the variable under consideration. The mean SAT-Math score for the students in the sample is compared to the hypothesized population mean, displayed as the Test Value in the One-Sample Test table. The bottom line of this table shows the t value, df, and the two-tailed Sig. (p) value. Note that p = .389, which is greater than .05, so we can say that the sample mean (490.53) is not significantly different from the population mean of 500. The table also provides the difference (-9.47) between the sample and population means and the 95% Confidence Interval. The difference between the sample and the population mean is likely to be between -31.22 and +12.29 points. Notice that this range includes the value of zero, so it is possible that there is no difference. Thus, the difference is not statistically significant.
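The t value and confidence interval in the One-Sample Test table can be rebuilt from the three descriptive statistics alone. The Python sketch below is illustrative: it uses the N, mean and standard deviation from the output above, and the critical t value of 1.9925 for df = 74 is an assumption read from a t table rather than something SPSS prints.

```python
import math

# Values copied from the One-Sample Statistics table.
n = 75
sample_mean = 490.53
sample_sd = 94.553
test_value = 500

standard_error = sample_sd / math.sqrt(n)
mean_difference = sample_mean - test_value
t_value = mean_difference / standard_error

# Assumption: two-tailed 5% critical t for df = 74, taken from a t table.
T_CRITICAL_DF74 = 1.9925
lower = mean_difference - T_CRITICAL_DF74 * standard_error
upper = mean_difference + T_CRITICAL_DF74 * standard_error

# If zero lies inside the interval, the difference is not significant at 5%.
zero_inside_interval = lower <= 0 <= upper

print(f"SE = {standard_error:.3f}, t = {t_value:.3f}")
print(f"95% CI of the difference: {lower:.2f} to {upper:.2f}")
print("zero inside interval:", zero_inside_interval)
```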
2. INDEPENDENT SAMPLE T-TEST
The independent sample t-test is used to compare two independent groups (e.g. males and females) on the same dependent variable.
Assumptions and conditions of independent sample t-test
1. The variances of the dependent variable in the two categories of the independent variable should be equal.
2. The dependent variable should be scale.
3. The data on the dependent variable should be independent.
Example: Do male and female students differ significantly in regard to their average math achievement scores?
Commands
1. Analyze → Compare Means → Independent-Samples T Test.
2. Move math achievement scores to the Test Variable(s) box.
3. Move gender to the Grouping Variable box.
4. Click on Define Groups.
5. Type 0 for males in the Group 1 box and 1 for females in the Group 2 box.
6. Click on Continue.
7. Click on OK.
Interpretation
The first table, Group Statistics, shows descriptive statistics for the two groups (males and females) separately. Note that the means within each of the three pairs look somewhat different. This might be due to chance, so we will check the t test in the next table.
The second table, Independent Samples Test, provides two statistical tests. The left two columns of numbers are Levene's test for the assumption that the variances of the two groups are equal. This is not the t test; it only assesses an assumption! If this F test is not significant (as in the case of math achievement and grades in high school), the assumption is not violated, and one uses the Equal variances assumed line for the t test and related statistics. However, if Levene's F is statistically significant (Sig. < .05), as is true for visualization, then the variances are significantly different and the assumption of equal variances is violated. In that case, the Equal variances not assumed line is used, and SPSS adjusts t, df, and Sig. The appropriate lines are circled.
Thus, for visualization, the appropriate t = 2.39, degrees of freedom (df) = 57.15, p = .020. This t is statistically significant so, based on examining the means, we can say that boys have higher visualization scores than girls. We used visualization to provide an example where the assumption of equal variances was violated (Levene's test was significant). Note that for grades in high school, the t is not statistically significant (p = .369), so we conclude that there is no evidence of a systematic difference between boys and girls on grades. On the other hand, the difference in math achievement is statistically significant because p < .05; males have the higher mean.
The 95% Confidence Interval of the Difference is shown in the two right-hand columns of the output. The confidence interval tells us that if we repeated the study 100 times, about 95 of the resulting intervals would contain the true (population) difference; for math achievement the interval is between 1.05 points and 6.97 points. Note that if the Upper and Lower bounds have the same sign (either + and + or - and -), we know that the difference is statistically significant, because this means that the null finding of zero difference lies outside the confidence interval. On the other hand, if zero lies between the upper and lower limits, there could be no difference, as is the case for grades in h.s. The lower limit of the confidence interval on math achievement tells us that the difference between males and females could be as small as 1.05 points out of 25, which is the maximum possible score.
Effect size measures for t tests are not provided in the printout but can be estimated relatively easily. For math achievement, the difference between the means (4.01) would be divided by about 6.4, an estimate of the pooled (weighted average) standard deviation. Thus, d would be approximately .6, which is, according to Cohen (1988), a medium to large sized "effect." Because you need means and standard deviations to compute the effect size, you should include a table with means and standard deviations in your results section for a full interpretation of t tests.
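The effect size and the confidence interval rule described above take only a few lines to compute. This Python sketch is illustrative: the mean difference (4.01), the pooled standard deviation (about 6.4) and the interval limits (1.05 and 6.97) are the values quoted in the text, while the group standard deviations passed to pooled_sd are hypothetical because the Group Statistics table is not reproduced in this pack.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled (weighted average) standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))

def cohens_d(mean_difference, standard_deviation):
    """Cohen's d: the mean difference expressed in standard deviation units."""
    return mean_difference / standard_deviation

def interval_excludes_zero(lower, upper):
    """True when both limits have the same sign, i.e. zero lies outside."""
    return lower > 0 or upper < 0

# Values quoted in the interpretation for math achievement.
d_math_achievement = cohens_d(4.01, 6.4)
significant = interval_excludes_zero(1.05, 6.97)

# Hypothetical group standard deviations (34 males, 41 females).
example_pooled_sd = pooled_sd(6.0, 34, 6.7, 41)

print(f"d = {d_math_achievement:.2f}, CI excludes zero: {significant}")
print(f"example pooled SD = {example_pooled_sd:.2f}")
```

The pooled standard deviation always lies between the two group standard deviations and is pulled towards the larger group.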
3. PAIRED SAMPLE T-TEST
The paired sample t-test is used to compare two paired groups (e.g. mothers and fathers) on the same dependent variable.
Assumptions and conditions of paired sample t-test
1. The independent variable is dichotomous and its levels (or groups) are paired, or matched, in some way (husband-wife, pre-post, etc.).
2. The dependent variable is normally distributed in the two conditions.
Example: Do students' fathers or mothers have more education?
Commands
1. Analyze → Compare Means → Paired-Samples T Test.
2. Click on both of the variables, father's education and mother's education, and move them simultaneously to the Paired Variables box.
3. Click on OK.
Paired Samples Statistics
                              Mean   N    Std. Deviation   Std. Error Mean
Pair 1   father's education   4.73   73   2.830            .331
         mother's education   4.14   73   2.263            .265
Paired Samples Correlations
                                                   N    Correlation   Sig.
Pair 1   father's education & mother's education   73   .681          .000
Paired Samples Test
                                                   Paired Differences
                                                                                      95% Confidence Interval
                                                                                      of the Difference
                                                   Mean   Std. Deviation   Std. Error Mean   Lower   Upper   t       df   Sig. (2-tailed)
Pair 1   father's education - mother's education   .589   2.101            .246              .099    1.079   2.396   72   .019
Interpretation
The first table shows the descriptive statistics used to compare mother's and father's education levels. The second table, Paired Samples Correlations, provides the correlation between the two paired scores. The correlation (r = .68) between mother's and father's education indicates that highly educated men tend to marry highly educated women, and vice versa. It doesn't tell you whether men or women have more education. That is what t in the third table tells you.
The last table shows the Paired Samples t Test. The Sig. for the comparison of the average education level of the students' mothers and fathers was p = .019. Thus, the difference in educational level is statistically significant, and we can tell from the means in the first table that fathers have more education; however, the effect size is small (d = .28), which is computed by dividing the mean of the paired differences (.59) by the standard deviation (2.1) of the paired differences. Also, we can tell from the confidence interval that the difference in the means could be as small as .10 of a point or as large as 1.08 points on the 2 to 10 scale.
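A paired t-test works on the difference within each pair, not on the two columns separately. The Python sketch below illustrates this under stated assumptions: the four pairs of education codes are invented, the function name paired_t_test is illustrative, and the last lines simply redo the effect size division with the values from the Paired Samples Test table.

```python
import math
import statistics

def paired_t_test(first, second):
    """Return (mean difference, SD of differences, t, d) for paired scores."""
    differences = [a - b for a, b in zip(first, second)]
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)   # sample SD (n - 1)
    standard_error = sd_difference / math.sqrt(len(differences))
    t_value = mean_difference / standard_error
    effect_size_d = mean_difference / sd_difference
    return mean_difference, sd_difference, t_value, effect_size_d

# Hypothetical education codes for four couples (illustrative only).
father_education = [5, 7, 6, 8]
mother_education = [4, 5, 6, 6]

mean_diff, sd_diff, t_value, d = paired_t_test(father_education,
                                               mother_education)
print(f"mean difference = {mean_diff:.2f}, t = {t_value:.3f}, d = {d:.2f}")

# Effect size from the Paired Samples Test table: mean / SD of differences.
d_from_output = 0.589 / 2.101
print(f"d from SPSS output = {d_from_output:.2f}")
```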
Final-Term Project Discussion
The students will be given a two-hour lab session to revise what they have learnt in the post-mid session.
The objectives of this session are to provide students with an opportunity to:
• Revise the whole course that they have learnt throughout the post-mid session
• Have hands-on practice in dealing with quantitative data using SPSS
• Share the problems that they confront during revision and get solutions
• Clarify any ambiguity regarding the understanding or application of any QTB concept
Final Project Discussion
The students will be given a one-hour session to discuss the final draft of their final projects.
The objectives of this session are to provide students with an opportunity to:
• Share the problems that they confront during revision and get solutions
• Clarify any ambiguity regarding the understanding or application of any QTB concept
• Get productive feedback on what they have done regarding their projects
The drafts will be checked against the following criteria:
a. Whether the survey is appropriately designed to collect the primary data
b. Whether the following components are appropriately discussed in the report
o An introduction explaining the background and objectives of your work.
o The justification of the topic selection.
o A description of the data: definitions of the variables, conclusions about data quality, and so on.
o A justification of the methods you have chosen to analyze the data.
o Analysis and results, descriptive as well as inferential.
o Conclusion: a discussion and interpretation of your results and a summary of what you have achieved.
o Length: 1500 to 2500 words