
INDEX

S.NO TOPIC


1 INTRODUCTION TO SPSS
2 WORKING WITH SPSS
3 IMPORTANT DROP-DOWN MENUS IN DATA VIEW
4 OUTPUT VIEWER
5 INTRODUCTION TO STATISTICS
6 DESCRIPTIVE STATISTICS
7 INFERENTIAL STATISTICS
8 CORRELATION
9 REGRESSION ANALYSIS
10 T-TEST
11 CHI-SQUARE TEST
12 ANOVA
13 GRAPHS IN SPSS

1. INTRODUCTION TO SPSS

SPSS (Statistical Package for the Social Sciences), also known as IBM SPSS Statistics, is a software package
used for the analysis of statistical data. SPSS is a package of programs for manipulating, analyzing, and
presenting data; the package is widely used in social and behavioral science, healthcare, marketing, and
education research. There are several forms of SPSS. The core program is called SPSS Base, and several
add-on modules extend the range of data entry, statistical, or reporting capabilities.

SPSS is a widely used program for statistical analysis in the social sciences. It is also used by market
researchers, survey companies, governments, data miners, and others.
In addition to statistical analysis, data management (case selection, file reshaping, creating derived data)
and data documentation (a metadata dictionary stored in the data file) are features of the SPSS software.

Statistics included in the base software are:

1. Descriptive statistics: cross-tabulation, frequencies, descriptives, ratio statistics, etc.

2. Bivariate statistics: t-test, ANOVA, correlation (bivariate, partial, distances), and non-parametric tests.

3. Prediction for numerical outcomes: linear regression.

4. Prediction for identifying groups: factor analysis.

SPSS data sets have a two-dimensional table structure, where rows are called cases and columns represent
variables.

VERSIONS AND OWNERSHIP HISTORY


The company was started in 1968 when Norman Nie, Dale Bent, and Hadlai "Tex" Hull developed and
started selling the SPSS software. The company was incorporated in 1975, and Nie served as CEO from
1975 until 1992. Jack Noonan served as CEO from 1992 until the 2009 acquisition of SPSS Inc. by IBM.
The most recent release is version 29.

DATA INFORMATION KNOWLEDGE AND WISDOM


Data, information, knowledge, and wisdom are closely related concepts, but each has its own role in
relation to the others, and each term has its own meaning.

Data is in an unorganized and unformulated form; it is first collected and then analyzed. Data only becomes
information suitable for making decisions once it has been analyzed in some fashion. Knowledge is
derived from extensive amounts of information on a subject.

Wisdom denotes the state of a person in possession of knowledge who also knows under
which circumstances it is good to use it.
Data is the least abstract concept, information is the next least abstract, and knowledge is the most
abstract. Data becomes information through interpretation. Data is the plural form, whereas datum is the
singular form.

DATABASE
A database is an organized collection of data. It is a collection of schemas, tables, queries, reports,
views, and other objects. Formally, a database refers to a set of related data and the way it is organized.
Databases are used to hold and support the internal operations of an organization. They are also used to
hold administrative information and more specialized data, such as engineering data or economic
models.

DATABASE MANAGEMENT SYSTEM

A database management system (DBMS) is a computer software application that interacts with the user,
other applications, and the database itself to capture and analyze data. Examples of DBMSs are
MySQL, Oracle, Microsoft SQL Server, IBM DB2, etc.
DATA ANALYSIS

Analysis of data is a process of inspecting, cleansing, transforming, and modeling data to discover useful
information, suggest conclusions, and support decision-making.

Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery
for predictive rather than purely descriptive purposes.
VARIABLE

A variable is a value that can change depending on the conditions or on information passed to the
program. Data are collected on variables.

Example: The heights of the students who attended a particular class are data, where height is the variable.
TYPES OF VARIABLES

Based on their nature, variables can be classified into two types.


1. Metric variable

2. Non-Metric variable
Metric variables are those that are measured on a numeric scale, whereas non-metric variables are those
that record the presence or absence of a characteristic or property.

Example: metric variables include sales, costs, number of students, etc.

Non-metric variables are also called dummy or categorical variables.


TYPES OF METRIC VARIABLES

Metric variables can be further classified into two types.

1. Discrete variable
2. Continuous variable

Discrete variables are countable forms of data; for example, the number of students in a class.

Continuous variables are measurable forms of data; for example, the height or weight of students.

MEASUREMENT AND SCALES OF MEASUREMENT


Measures: these are the scales of measurement, which can be nominal, ordinal, interval, or ratio. In
SPSS you will find the nominal, ordinal, and scale measures.

Nominal scale: Numbers are labels for groups or classes. Simple codes are assigned to objects as labels. We
use the nominal scale for qualitative data, e.g. professional classification, geographic classification.
E.g. blonde: 1, brown: 2, red: 3, black: 4. A person with red hair does not possess more ‘hairiness’ than a
person with blonde hair.

Ordinal scale: Data elements may be ordered according to their relative size or quality, the numbers
assigned to objects or events represent the rank order (1st, 2nd, 3rd etc.) E.g. top lists of companies.

Interval scale: There is a meaning of distances between any two observations. The "zero point" is
arbitrary. Negative values can be used. Ratios between numbers on the scale are not meaningful, so
operations such as multiplication and division cannot be carried out directly. E.g. temperature with the
Celsius scale.

Ratio scale (Scale): This is the strongest scale of measurement. Distances between observations and also
the ratios of distances have a meaning. It contains a meaningful zero. E.g. mass, length.

2.WORKING WITH SPSS

Once the SPSS software is opened, a dialogue box appears; click Cancel and proceed. The SPSS window
will then open in the data view format by default. To get started, always check that the status bar on the
extreme right shows that the processor is ready before proceeding with data entry.
DATA EDITOR
The data editor allows you to create your own data set and perform statistical operations interactively
using pull-down menus. The data editor window has two sheets.
1. Data view
2. Variable view

1. Data view
By default, the data view opens whenever you open the data editor. It contains your actual data set.
Here the variables are represented in columns, and each row is called a case.
2. Variable View
The variable view allows you to name your variables and to identify missing values. Spaces should not be
used when writing a variable name; we can use an underscore '_' instead.
Once the variables are entered, a set of column attributes is associated with each variable.

1. Name
Name is the name of the variable. This will appear in the column holders in the data view. In SPSS
variable names don't have spaces. Space is indicated using an underscore ( _ ).

2. Type
Type is the type of data in the variable. A string type refers to data stored as text; a numeric variable
stores data as numbers. Other useful options are Date and Dollar. To perform different statistical
functions, the numeric data type is preferred, as SPSS cannot perform statistical functions on the string data type.

3. Width
Width tells the computer how much space each case needs to take up. It is measured in characters.

4. Label
Label is useful for explaining what the variable is measuring. The label gives complete information about
the variable when there is a restriction on the number of characters in the variable name (normally in older versions).

5. Values
This allows you to display certain labels depending on the data in each case. Values are generally used
for non-metric variables, where the data are entered only as numbers; punctuation and alphabetic characters
are not taken into consideration.
To give a value to a non-metric variable, left-click in the Values cell; a dialogue box appears.
Assign the prepared values to the variable.
For example, gender is a non-metric variable whose values could be
1. Male
2. Female

6. Alignment
Alignment is the manner in which the entries are placed one below the other. By default, the
numbers are right-aligned and the text is always left-aligned.

3. IMPORTANT DROP-DOWN MENUS IN DATA VIEW

In the menu bar, there are some important pull-down menus. The most important are the following.

File: Helps in creating new data files, opening existing ones, and saving and printing data files.
Edit: Allows editing functions like copy, cut, and paste.
View: Shows the menu editor for changes in fonts, grid lines, value labels, etc.
Data: Details about entering data and defining variable properties; helps in identifying duplicate cases, sorting cases, transposing, restructuring, and aggregating data.
Transform: Transforms data to make it compatible for analysis, using recode, replace missing values, etc.
Analyse: All inferential statistics are available here; it contains the statistical tools and techniques.
Graphs: Builds different types of charts, graphs, etc., using the chart builder.
Utilities: Commands which are used for more complex statistical computations.
Add-ons: Lists extra features available at an advanced level.
Window: Allows you to arrange, select, and control the attributes of windows.
Help: Supports gaining insight into the procedures of SPSS; contains tutorials, a statistics coach, etc.

4. OUTPUT VIEWER

The output window accumulates the results of the work done in SPSS; that is, it shows the
results of previously conducted analyses.
In the output window, the right side contains the output from the SPSS procedures that were run, and the
left side contains the outline of that output. The SPSS output is composed of a series of output objects,
which can be titles, tables (frequencies, descriptives, cross tables), charts, etc. Each of
these objects is listed in the outline view. The outline view makes navigating the output easier.
There are 4 types of files in SPSS

1. Data file
2. Output file
3. Syntax file
4. Script file.
The data file and the output file are the two important files which are frequently used in SPSS.
The data file is the file in which the data is stored. This file is saved with the extension .sav.

Once the data file is saved, the output file is generated with the message 'data file saved'. This file also
contains the various outputs generated using statistical operations. The output file is saved with the
extension .spv.
5. INTRODUCTION TO STATISTICS
The meaning of the word statistics varies from person to person. In day-to-day human life, knowledge
of this subject is used in different ways. We use statistics for personal as well as
professional purposes. In personal life, we use statistics for the general calculation of the household
budget. Generally, there are two types of information, i.e., quantitative and qualitative. Thus,
this subject is used by people to take appropriate decisions about problems/budgets on the basis
of both types of information.

MEANING OF STATISTICS

The word statistics has been derived from 'status', which is a Latin word, or 'statista', which is an Italian
word. Prof. G. F. Achenwall used it for the first time in the 18th century. For a common man, 'statistics'
means numerical information expressed in quantitative terms, which may relate to objects, subjects,
activities, information, phenomena, or regions of space. The word statistics can be defined in two broadly
different ways, because it is used to convey different meanings in the singular and plural senses.

DEFINITION OF STATISTICS

The definition of statistics has been given by different statisticians in different ways. Some important
definitions of statistics are given below:
A. L. Bowley defined that "Statistics may be called the science of counting". He also said that
"Statistics may rightly be called the science of averages".

B. According to Boddington, "Statistics is the science of estimates and probabilities".

C. According to Seligman, "Statistics is the science which deals with the methods of collecting,
classifying, tabulating, comparing and interpreting numerical data collected to throw some light
on any sphere of enquiry".
D. Croxton and Cowden defined "statistics as the collection, tabulation, presentation, analysis
and interpretation of numerical data".

TYPES OF STATISTICS

There are two types of statistics based on subject matter/ function.


1. Descriptive Statistics

Descriptive statistics is the branch which deals with the description of obtained data. It consists of summary
statistics that summarize the features/characteristics of a collection of information. Moreover, it includes
classification, tabulation, and the measurement of central tendency as well as variability. Researchers use
these measures to understand the tendency of the data/scores, which further enhances the ease of
description of the phenomena.

2. Inferential statistics
Statistical inference (SI) is the process of data analysis used to deduce properties of a probability distribution.
Inferential statistical analysis infers properties of a population or census through testing hypotheses
and deriving estimates, based on the primary assumption that the observed data set is sampled
from a larger population. It also deals with drawing conclusions about the population/census.
Moreover, it provides techniques to compute the probabilities of the future behavior of the subjects/areas.

6.DESCRIPTIVE STATISTICS
MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a single value that attempts to describe a set of data by identifying the
central position within that set of data.

The mean, median, and mode are all valid measures of central tendency, but under different conditions
some measures become more appropriate to use than others.
Mean

The most commonly used measure of central tendency is the mean (or the average). Here the main
interest is in learning how to calculate the mean when the data set is ungrouped (raw data). The
mean for ungrouped data is obtained by dividing the sum of all values by the number of values in the
data set.

Median
The median is the value of the middle term in a data set that has been ranked in increasing or decreasing
order.

Mode

The mode is the value that occurs most often in a data set.
MEASURES OF VARIATION

1.Range

In simple words, the range for a data set depends on two values (the smallest and the largest)
among all values in that data set.
Range = Largest value – Smallest value

2.Mean Deviation

Another measure of variation is the mean deviation; it is the mean of the absolute distances between each
value and the mean.
3.Variance and Standard Deviation

The most widely used measure of variation is the standard deviation, denoted by σ for the population and s for
the sample. The numerical value of this measure tells us how closely the values of the data set
cluster around the mean.
DESCRIPTIVE STATISTICS

1. Given below are the combined parental incomes of 30 students. Using SPSS:
i. Calculate mean, median, and mode of the parental income

ii. Calculate the measures of dispersion.


30000 59000 1200000 59000 70000 925000
72000 58000 27000 379000 145000 45000
61000 26000 77000 52000 100000 60000
44000 225000 79000 55000 35000 48000
312000 42000 40000 91000 63000 53000
Steps:

1) Open SPSS

2) By default, SPSS screen appears to be in the data view.


3) Go to the variable view and enter the variables.

1- Serial_No

2- Parental_Income
Both the above variables are metric variables.

4) After the variables are entered, go to the data view.

5) The variables given in the variable view become the columns in the data view.
6) Enter the data in the data view.

7) To find out the measures of central tendency and the measures of dispersion, we do a
descriptive analysis.

8) Go to the Analyse menu, choose Descriptive Statistics, and choose Frequencies.


9) A dialog box appears.

10) Select parental income and bring it to the variable side.


11) Then click on statistics.

12) A frequency statistics dialogue box appears.


13) Select Mean, Median, Mode, Std. deviation, Variance, Range, Minimum and Maximum. Then click
Continue.
14) Click OK in the frequency dialog box.

15) The output is displayed in the output file.
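
The same analysis can also be run from a syntax file (one of the four SPSS file types described earlier). A minimal syntax sketch, assuming the income variable is named Parental_Income:

* Descriptive statistics for parental income (variable name assumed).
FREQUENCIES VARIABLES=Parental_Income
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN MODE STDDEV VARIANCE RANGE MINIMUM MAXIMUM.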

Output:
Frequencies
Statistics
Income
N Valid 30
Missing 0
Mean 151066.67
Median 59500.00
Mode 59000
Std. Deviation 263103.092
Variance 69223236781.609

Range 1174000
Minimum 26000
Maximum 1200000
SKEWNESS

Skewness means lack of symmetry. In mathematics, a figure is called symmetric if there exists a point in
it through which, if a perpendicular is drawn to the X-axis, it divides the figure into two congruent parts,
i.e. identical in all respects, or one part can be superimposed on the other, i.e. they are mirror images of each other.
In statistics, a distribution is called symmetric if the mean, median and mode coincide. Otherwise, the
distribution is asymmetric. If the right tail is longer, we get a positively skewed distribution, for
which mean > median > mode; if the left tail is longer, we get a negatively skewed distribution, for
which mean < median < mode.

[Figures: symmetrical curve, negatively skewed curve, and positively skewed curve]
KURTOSIS

The complete idea about the shape of a distribution can be studied with the help of
kurtosis. Prof. Karl Pearson called it the "convexity of a curve". Kurtosis gives a measure of the flatness
of a distribution. The degree of kurtosis of a distribution is measured relative to that of a normal curve.
Curves with greater peakedness than the normal curve are called "leptokurtic". Curves which are
flatter than the normal curve are called "platykurtic". The normal curve itself is called "mesokurtic".

7.INFERENTIAL STATISTICS

In inferential statistics, data from the sample are used to draw conclusions or inferences about the
larger population from which the sample is drawn. The goal of inferential statistics is to draw
conclusions from a sample and generalize them to the population. It determines the probability of the
characteristics of the sample using probability theory. The most common methodologies used are
hypothesis tests, analysis of variance, etc.
Inferential statistics is divided into 2 types.

1. Parametric Tests

Parametric statistics are "statistics used for the inference from a sample to a population that
assume the variances of each group are similar and that the sample is large enough to represent
the population".
2. Non-Parametric Tests

Non-parametric statistics can be described as tests that do not involve hypotheses about
population parameters. Salkind (2014, page 46) described non-parametric statistics as
"distribution-free statistics that do not require the same assumptions as do parametric
statistics".
Most parametric tests are based on normal distribution and have four basic assumptions
that must be met for the test to be accurate. The assumptions of parametric tests are

1. Normally distributed data

The rationale behind hypothesis testing relies on data being normally distributed, and so if
this assumption is not met, the logic behind hypothesis testing is flawed.
2. Homogeneity of variance

This assumption means that the variances should be the same throughout the data.

3. Interval data
The data should be measured at least on an interval scale.

4. Independence

This assumption, like normality, differs depending on the test.


Difference between Parametric and Non-parametric Statistics

Parametric statistics / Non-parametric statistics:

1. The assumed distribution is normal. / The assumed distribution may not be normal; it can be any distribution.
2. The variance is homogeneous. / The variance may be heterogeneous, or no assumption is made with regard to the variance.
3. The scales of measurement used are interval or ratio. / The scales of measurement used are nominal or ordinal.
4. The observations need to be independent of each other. / There is no assumption with regard to the independence of the observations.
5. The mean is the measure of central tendency used. / The median is the measure of central tendency used.
6. It is more complex to compute when compared to non-parametric techniques. / It is simple to calculate.
7. It can be affected by outliers. / It is comparatively less affected by outliers.
The following tests are used in parametric and non-parametric statistics:

Correlation:
Parametric: Pearson product-moment coefficient of correlation (r)
Non-parametric: Spearman rank correlation coefficient (rho), Kendall's tau

Two groups, independent measures:
Parametric: Independent t-test
Non-parametric: Mann-Whitney U test

More than two groups, independent measures:
Parametric: One-way ANOVA
Non-parametric: Kruskal-Wallis one-way ANOVA

Two groups, repeated measures:
Parametric: Paired t-test
Non-parametric: Wilcoxon matched-pairs signed-rank test

More than two groups, repeated measures:
Parametric: One-way repeated measures ANOVA
Non-parametric: Friedman's two-way analysis of variance

8.CORRELATION

Correlation is a bivariate analysis that measures the strength of the association between two variables
and the direction of the relationship. In terms of the strength of the relationship, the value of the
correlation coefficient varies between +1 and -1. A value of ± 1 indicates a perfect degree of association
between the two variables. As the correlation coefficient value goes towards 0, the relationship
between the two variables will be weaker. The direction of the relationship is indicated by the sign of
the coefficient; a + sign indicates a positive relationship and a – sign indicates a negative relationship.

Usually, in statistics, we measure three types of correlations:

Pearson correlation
Kendall rank correlation
Spearman correlation

PEARSON CORRELATION COEFFICIENT


Karl Pearson's method of correlation coefficient is based on the covariance of the concerned variables. In
modern literature, it is also called the product moment coefficient of correlation because it is based on
the product of the first moments about the mean in the two series.
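
In symbols (using the standard definition), for n paired observations the coefficient equals the covariance of the two series divided by the product of their standard deviations:

$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} $$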

Properties of correlation coefficient


1. Correlation coefficient lies between -1 and +1
2. Correlation coefficient is independent of the change in origin and scale of reference
3. Correlation coefficient is a pure number independent of the unit of measurement
Assumptions of Karl Pearson's coefficient
1) Linear relationship
2) Causal relationship
3) Each of the variables is affected by a large number of independent contributory causes of
such a nature as to produce a normal distribution
4) Errors of measurement.

Example based on Karl Pearson’s coefficient of correlation

Calculate Karl Pearson's coefficient of correlation for the advertisement cost and sales of a company.
Is there a correlation between the advertisement cost and the sales of the product?
Step1. Open IBM SPSS Statistics
Step2. Check for the statement that the processor is ready
Step3. Give the variables in the variable view
Variable 1 Serial No
Variable 2 Cost
Variable 3 Sales
Step4. Click on Data view. The variables in the variable view are displayed as columns in data view.
Step5. Enter the data in data view
S.No Cost Sales
1 39 47
2 65 53
3 62 58
4 90 86
5 82 62
6 75 68
7 25 60
8 98 91
9 36 51
10 78 84

Step6. After entering the data, click on Analyse----Correlate ------Bivariate. A dialog box appears
Step7. In the dialog box, drag Cost and Sales to the right side. Under Correlation Coefficients, select
Pearson. In Options, select means and standard deviations, click Continue, and click OK.
Step8. An Output screen appears with the correlation results.
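
The same correlation can be obtained from a syntax window. A minimal syntax sketch, assuming the variables are named Cost and Sales:

* Pearson correlation with means and standard deviations.
CORRELATIONS
  /VARIABLES=Cost Sales
  /PRINT=TWOTAIL NOSIG
  /STATISTICS DESCRIPTIVES.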

SPSS Output:
Correlations
Cost Sales
Cost Pearson Correlation 1 .780**
Sig. (2-tailed) .008
N 10 10
Sales Pearson Correlation .780** 1
Sig. (2-tailed) .008
N 10 10
**. Correlation is significant at the 0.01 level (2-tailed).

Conclusion:- A Pearson product-moment correlation was run to determine the relationship


between cost and sales. There was a strong, positive correlation between cost and sales, which was
statistically significant (r = .780, n = 10, p = .008).
SPEARMAN CORRELATION COEFFICIENT
Spearman's correlation coefficient is a non-parametric statistical tool which is used when the data
violate parametric assumptions, for example when the data are not normally distributed.
Example:
Calculate Spearman's correlation coefficient for the distribution of marks in economics and
mathematics of 10 students in an examination.
Step1. Open SPSS
Step2. When SPSS opens up, check for the statement that the processor is ready
Step3. In the variable view, enter the variables
Variable 1 Student_Id
Variable 2 Economics
Variable 3 Mathematics
Step4. Go to data view and enter the data
Student-id Economics Mathematics
1 25 70
2 28 80
3 32 85
4 36 75
5 40 65
6 38 59
7 39 48
8 42 50
9 41 54
10 45 66

Step5. Choose Analyse -----correlate-------Bivariate


Step6. In the correlation dialog box, drag Economics and Mathematics to the variables box, tick Spearman
under Correlation Coefficients, and then click OK.
Step7. The output screen appears with the correlation results.
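
Equivalently, a minimal syntax sketch, assuming the variables are named Economics and Mathematics:

* Spearman rank-order correlation.
NONPAR CORR
  /VARIABLES=Economics Mathematics
  /PRINT=SPEARMAN TWOTAIL NOSIG.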
Correlations
Mathematics Economics
Spearman's rho Mathematics Correlation Coefficient 1.000 -.636*
Sig. (2-tailed) . .048
N 10 10
Economics Correlation Coefficient -.636* 1.000
Sig. (2-tailed) .048 .
N 10 10
*. Correlation is significant at the 0.05 level (2-tailed).
Conclusion:- A Spearman's rank-order correlation was run to determine the relationship between
10 students' mathematics and economics exam marks. There was a strong, negative correlation between
mathematics and economics marks, which was statistically significant (rs(10) = -0.636, p = .048).
KENDALL RANK CORRELATION
Kendall's rank correlation is another non-parametric correlation coefficient, which is used when you have a
small data set with a large number of tied ranks.

Example: Calculate the correlation coefficient relating the plant capacity and plant utilization of a company.


Step1. Open SPSS
Step2. When SPSS opens up, check for the statement that the processor is ready
Step3. In the variable view, enter the variables
Variable 1 S. No
Variable 2 Plant_Capacity
Variable 3 Plant_Utilization
Step4. Go to data view and enter the data

S.No   Plant Capacity   Plant Utilization
1 2.6 2
2 2.8 2
3 3 2.6
4 3 2.5
5 3 2.4
6 3.2 2.8
7 3.8 3
8 4.9 3.9
9 5.4 4.8
10 6 5

Step5. Choose Analyse -----correlate-------Bivariate


Step6. In the correlation dialog box, drag Plant_Capacity and Plant_Utilization to the variables box, tick
Kendall's tau-b under Correlation Coefficients, and then click OK.
Step7. The output screen appears with the correlation results
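
Equivalently, a minimal syntax sketch, assuming the variables are named Plant_Capacity and Plant_Utilization:

* Kendall's tau-b correlation.
NONPAR CORR
  /VARIABLES=Plant_Capacity Plant_Utilization
  /PRINT=KENDALL TWOTAIL NOSIG.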

Correlations
Plant Capacity Plant Utilization
Kendall's tau_b Plant Capacity Correlation Coefficient 1.000 .954**
Sig. (2-tailed) . .000
N 10 10
Plant Utilization Correlation Coefficient .954** 1.000
Sig. (2-tailed) .000 .
N 10 10
**. Correlation is significant at the 0.01 level (2-tailed).
Conclusion:-
The results indicate that Kendall's correlation coefficient, τb, is 0.954, and that this is statistically
significant (p < 0.001).

9. REGRESSION ANALYSIS

Regression is the measure of the average relationship between two or more variables in terms of the original
units of the data. Finding the relationship between two or more causally related variables is called
regression analysis, and it is one of the most frequently used techniques in economics and business
research.

UTILITY OF REGRESSION ANALYSIS


1). Regression analysis helps in establishing a functional relationship between two or more variables.
2). Regression analysis is a highly valuable tool for studying cause-and-effect relationships.
3). This tool can be used for the prediction or estimation of future production, prices, etc.
4). Regression analysis is widely used in the estimation of demand and supply curves, production and
cost functions, etc.
REGRESSION COEFFICIENT
In the regression line of Y on X,
Y = a + bX
the coefficient b is the slope of the line of regression of Y on X and is called the coefficient of
regression of Y on X. It represents the increment in the value of the dependent variable Y for a unit
change in the value of the independent variable X.
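
For ungrouped data, the standard least-squares estimates of the slope b and intercept a can be written as

$$ b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x} $$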

Example:
A personnel manager of a company wants to find a measure which he can use to fix the monthly
income of persons applying for a job in the production department. As an experimental project, he
collected data on 7 persons from that department, covering years of service and monthly
income.
Find the regression equation of years of service (Y) on income (X).

Step1. Open SPSS


Step2. When SPSS opens up, check for the statement that the processor is ready
Step3. In the variable view, enter the variables
Variable 1 S. No
Variable 2 Years of service
Variable 3 Income
Step4. Go to data view and enter the data

S.No   Years of service   Income(000)
1 11 10
2 7 8
3 9 6
4 5 5
5 8 9
6 6 7
7 10 11

Step5. Choose Analyse----Regression -----Linear


Step6. When the linear regression dialog box appears, move
Years of service to the Dependent box and
Income to the Independent(s) box.
Step7. The output is generated in the output screen
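
The same model can be fitted from a syntax window. A minimal syntax sketch, assuming the variables are named Years_of_Service and Income:

* Simple linear regression of years of service on income.
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT Years_of_Service
  /METHOD=ENTER Income.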

Output

Model Summary
Model 1: R = .750a, R Square = .563, Adjusted R Square = .475, Std. Error of the Estimate = 1.565
a. Predictors: (Constant), Income(000)

ANOVAa
Regression: Sum of Squares = 15.750, df = 1, Mean Square = 15.750, F = 6.429, Sig. = .052b
Residual: Sum of Squares = 12.250, df = 5, Mean Square = 2.450
Total: Sum of Squares = 28.000, df = 6
a. Dependent Variable: Years of service
b. Predictors: (Constant), Income(000)

Coefficientsa
(Constant): B = 2.000, Std. Error = 2.439, t = .820, Sig. = .450
Income(000): B = .750, Std. Error = .296, Beta = .750, t = 2.535, Sig. = .052
a. Dependent Variable: Years of service

Regression equation: Y = 2 + 0.75X, i.e. years of service = 2 + 0.75 × income (in thousands).

10.T-TEST
The t test can be used to test whether the means of two different groups are different or not. If we take
many samples from a population, calculate the mean of each sample, and then plot a frequency
distribution of those means, the resulting sampling distribution would be Student's t distribution.

DEGREES OF FREEDOM
It may be noted that for the Student's t test the number of degrees of freedom is n-1.
ASSUMPTIONS OF T TEST
1) The parent population from which the sample is drawn is normal
2) The sample observations are random
3) The population standard deviation is not known
TYPES OF T TESTS IN SPSS
There are three types of t tests in SPSS
1) Independent sample t test
2) Dependent sample t test or Paired t test
3) One sample t test

INDEPENDENT SAMPLE T TEST


The independent sample t test is used in situations in which there are two experimental conditions and
different participants have been used in each condition.

Example: Tests were made at short intervals on spark plugs from two manufacturers. The following
tabulation gives the number of hours of service from plugs from the two sources.
Source A Source B
200 190
210 200
190 190
200 180
190 190
200 210
180 200
200 192
200
210

Do these results indicate a statistically significant difference between the spark plugs as far as the mean
length of service is concerned?

Step1. Go to the start menu and click on IBM SPSS.

Then the SPSS window will be opened; it contains the data view and the variable view.

Step2. Click on the variable view and enter the variable names given in the problem, such as Source and Hours.
Variable 1 S. No
Variable 2 Source
Variable 3 Hours
Step3. In the variable view, click on Values in the Source row. Enter value 1 with label "Source A" and
value 2 with label "Source B", then click OK.
Step4. Go to data view and enter the data as follows
S.No Source Hours
1 1 200
2 1 210
3 1 190
4 1 200
5 1 190
6 1 200
7 1 180
8 1 200
9 1 200
10 1 210
11 2 190
12 2 200
13 2 190
14 2 180
15 2 190
16 2 210
17 2 200
18 2 192

Step5. Go to Analyse----compare means----select Independent sample t-test

Step6. Move the variable Hours to the Test Variable box and the Source variable to the Grouping Variable
box, then define the groups as 1 and 2, click Continue, and click OK in the main dialog box.
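
The same test can be run from a syntax window. A minimal syntax sketch, assuming the variables are named Source and Hours:

* Independent samples t-test comparing hours of service by source.
T-TEST GROUPS=Source(1 2)
  /VARIABLES=Hours.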
Output:
Group Statistics
Source N Mean Std. Deviation Std. Error Mean
Hours Source A 10 198.00 9.189 2.906
Source B 8 194.00 9.071 3.207

Independent Samples Test

Levene's Test for Equality of Variances (Hours): F = .006, Sig. = .940

t-test for Equality of Means (Hours):

Equal variances assumed:
t = .923, df = 16, Sig. (2-tailed) = .370, Mean Difference = 4.000,
Std. Error Difference = 4.334, 95% Confidence Interval of the Difference: -5.189 to 13.189

Equal variances not assumed:
t = .924, df = 15.229, Sig. (2-tailed) = .370, Mean Difference = 4.000,
Std. Error Difference = 4.328, 95% Confidence Interval of the Difference: -5.213 to 13.213
Conclusion: From the above table we can see that the significance value of Levene's test (0.940) is more
than 0.05, so we conclude that the variances of the two groups are equal. The significance value of the
t-test (0.370) is also more than 0.05, so the null hypothesis is accepted: there is no statistically
significant difference between the mean lengths of service of the two sources.

DEPENDENT SAMPLE T TEST OR PAIRED T TEST


This test is used when there are two experimental conditions and the same participants take part in
both conditions. It is also called the paired sample t test.
Example: The following data show the weekly sales of a manufacturer before and after a
reorganization of the sales units, for 10 weeks from September to December in two successive
years.
Week No   Sales Before Reorganisation   Sales After Reorganisation
1 15 20
2 17 19
3 12 18
4 18 22
5 16 20
6 13 19
7 15 21
8 17 23
9 19 24
10 18 24

Apply the t test to determine whether the reorganization had any effect on sales.

Step1. Go to the start menu and click on IBM SPSS.

Then the SPSS window will be opened; it contains the data view and the variable view.

Step2. Click on the variable view and enter the variable names given in the problem, such as Week No,
Sales-Before, and Sales-After.
Variable 1 S. No
Variable 2 Week No
Variable 3 Sales-Before
Variable 4 Sales-After
Step3. Go to data view and enter the data as follows

S.NO week No Sales-Before Sales- After


1 1 15 20
2 2 17 19
3 3 12 18
4 4 18 22
5 5 16 20
6 6 13 19
7 7 15 21
8 8 17 23
9 9 19 24
10 10 18 24

Step4. After entering the data in the data view click on Analyze----Compare means-----paired sample t
test

Step5. Move sales-Before and sales-After to paired variables box and click on OK.
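
The same test can be run from a syntax window. A minimal syntax sketch, assuming the variables are named Sales_Before and Sales_After:

* Paired samples t-test on sales before and after reorganization.
T-TEST PAIRS=Sales_Before WITH Sales_After (PAIRED).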
The SPSS output will be as follows

Paired Samples Statistics


Mean N Std. Deviation Std. Error Mean
Pair 1 Sales-Before 16.00 10 2.261 .715
Sales- After 21.00 10 2.160 .683

Paired Samples Correlations


N Correlation Sig.
Pair 1 Sales-Before & Sales- After 10 .819 .004

Paired Samples Test

Pair 1: Sales-Before minus Sales-After
Paired Differences: Mean = -5.000, Std. Deviation = 1.333, Std. Error Mean = .422
95% Confidence Interval of the Difference: Lower = -5.954, Upper = -4.046
t = -11.859, df = 9, Sig. (2-tailed) = .000
Conclusion: There was a significant average difference in sales before and after reorganization
(t(9) = -11.859, p < 0.05); hence the null hypothesis is rejected.
ONE SAMPLE T TEST
Example: A certain pesticide is packed into bags by a machine. A random sample of 10 bags is drawn and
their contents are found to weigh (in kg) as follows.
50, 49, 52, 44, 45, 48, 45, 46, 49, 45
Test whether the average packing weight can be taken to be 50 kg.
Step1. Go to the start menu and click on IBM SPSS.
Then the SPSS window will be opened; it contains the data view and the variable view.

Step2. Click on the variable view and enter the variable names given in the problem, such as Weights.
Variable 1 S. No
Variable 2 Weights
Step3. Go to data view and enter the data as follows
S.No weights
1 50
2 49
3 52
4 44
5 45
6 48
7 45
8 46
9 49
10 45

Step4. After entering the data in the data view click on Analyze----Compare means-----One sample t test

Step5. Move the weights variable to the Test Variable box, set the Test Value to 50, and click on OK.
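
The same test can be run from a syntax window. A minimal syntax sketch, assuming the variable is named Weights:

* One-sample t-test against a hypothesized mean of 50 kg.
T-TEST
  /TESTVAL=50
  /VARIABLES=Weights.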

Output:

One-Sample Statistics
N Mean Std. Deviation Std. Error Mean
weight 10 47.30 2.669 .844
One-Sample Test (Test Value = 50)
weight: t = -3.199, df = 9, Sig. (2-tailed) = .011, Mean Difference = -2.700,
95% Confidence Interval of the Difference: -4.61 to -.79

Conclusion: There is a significant difference in the mean weight of the bags (p = .011); the average weight
of the bags is about 2.7 kg less than 50 kg.

11.CHI-SQUARE TEST

The Chi-Square Test of Independence determines whether there is an association between categorical
variables (i.e., whether the variables are independent or related). It is a nonparametric test. This test is
also known as the Chi-Square Test of Association.

The Chi-Square Test of Independence is commonly used to test the following:


i. Statistical independence or association between two categorical variables

ii. Goodness of Fit

CROSS TABS
It is used to aggregate and jointly display the distribution of two or more variables by tabulating them. It is
widely used for finding out interrelationships and interactions between variables.

CHI-SQUARE TEST-ASSOCIATION OF ATTRIBUTES

The following is the data regarding the level of awareness of online banking (recharge, shopping, ticket
booking, fund transfer, and bill payment) among males and females in a particular area.
We want to find out whether there is any association between gender and level of awareness.

Step1. Go to the start menu and click on IBM SPSS.

Then the SPSS window will be opened; it contains the data view and the variable view.
Step2. Click on the variable view and enter the variable names given in the problem, such as Gender and
Level of Awareness.
Variable 1 S. No
Variable 2 gender
Variable 3 level of awareness
Step3. In the values column, enter the gender codes (1 = male, 2 = female), and in the same way the five
level-of-awareness values: 1 = recharge, 2 = shopping, 3 = ticket booking, 4 = fund transfer, and 5 = bill payment.
Step4. Go to data view and enter the data as follows

S.No   Gender   Level of Awareness
1 2 5
2 1 3
3 2 4
4 2 1
5 2 2
6 1 4
7 1 1
8 2 5
9 1 3
10 1 2

Step5. Go to Analyse----Descriptive statistics----Crosstabs, click on Statistics, select Chi-square, click Continue, and click OK.
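
The same test can be run from a syntax window. A minimal syntax sketch, assuming the variables are named Gender and Level_of_Awareness:

* Chi-square test of independence between gender and level of awareness.
CROSSTABS
  /TABLES=Gender BY Level_of_Awareness
  /STATISTICS=CHISQ.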

Output:
Chi-Square Tests

Pearson Chi-Square: Value = 4.000a, df = 4, Asymptotic Significance (2-sided) = .406
Likelihood Ratio: Value = 5.545, df = 4, Asymptotic Significance (2-sided) = .236
Linear-by-Linear Association: Value = .720, df = 1, Asymptotic Significance (2-sided) = .396
N of Valid Cases: 10
a. 10 cells (100.0%) have an expected count less than 5. The minimum expected count is 1.00.
Conclusion: Since the p-value is greater than our chosen significance level (α = 0.05), we do not
reject the null hypothesis. Rather, we conclude that there is not enough evidence to suggest an
association between gender and level of awareness.

CHI-SQUARE TEST-GOODNESS OF FIT

In one area a survey was conducted among people regarding whether or not they have a passport. The
results are as follows.
Respondents: 1 2 3 4 5 6 7 8

Response:    1 1 1 1 1 1 2 1
Apply the chi-square test of goodness of fit.

Step1. Go to the start menu and click on IBM SPSS.

Then the SPSS window will be opened; it contains the data view and the variable view.
Step2. Click on the variable view and enter the variable names given in the problem, such as Respondents
and Having Passport.
Variable 1 Respondents
Variable 2 Having Passport
Step3. In the values column, enter 1 = having a passport (Yes) and 2 = not having a passport (No).
Step4. Go to data view and enter the data as follows

Having
Respondents Passport
1 1
2 1
3 1
4 1
5 1
6 1
7 2
8 1
Step5. Go to Analyse --Non-parametric test-Legacy Dialog-Chi-square test-select variable and click on OK.
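
Equivalently, a minimal syntax sketch, assuming the variable is named Having_Passport:

* Chi-square goodness-of-fit test with equal expected frequencies.
NPAR TESTS
  /CHISQUARE=Having_Passport
  /EXPECTED=EQUAL.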

Output:

Having Passport
Observed N Expected N Residual
yes 7 4.0 3.0
No 1 4.0 -3.0
Total 8

Test Statistics

Having Passport
Chi-Square 4.500a

df 1
Asymp. Sig. .034

a. 2 cells (100.0%) have expected frequencies less than 5. The minimum expected
cell frequency is 4.0.

Conclusion: The above table, Test Statistics, provides the actual result of the chi-square goodness-of-fit
test. We can see from this table that our test statistic is statistically significant: χ2(1) = 4.500, p = .034.
Therefore, we can reject the null hypothesis.

12.ANOVA
The following are the salaries of employees working in different departments, namely finance,
human resources, and marketing. The details are as follows.

Employee_Id Department Salaries


1 1 4500
2 2 4700
3 3 4000
4 1 4200
5 2 4100
6 3 4000
7 1 3800
8 2 3700
9 3 3500
10 1 4200
Calculate a one-way ANOVA.
Step1. Go to the start menu and click on IBM SPSS.
Then the SPSS window will be opened; it contains the data view and the variable view.
Step2. Click on the variable view and enter the variable names given in the problem, such as Employee_Id,
Department, and Salaries.
Variable 1 Employee_Id
Variable 2 Department
Variable 3 Salaries
Step3. In the values column for the department variable, enter 1 = finance, 2 = human resources, and
3 = marketing.
Step4. Go to data view and enter the data as above.

Step5. Click on Analyse---Compare means---select One way ANOVA


After selecting the variables, click on OK.
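
The same analysis can be run from a syntax window. A minimal syntax sketch, assuming the variables are named Salaries and Department:

* One-way ANOVA of salaries across departments.
ONEWAY Salaries BY Department
  /STATISTICS DESCRIPTIVES.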

Output:

ANOVA (Salaries)

Between Groups: Sum of Squares = 240166.667, df = 2, Mean Square = 120083.333, F = .913, Sig. = .444
Within Groups: Sum of Squares = 920833.333, df = 7, Mean Square = 131547.619
Total: Sum of Squares = 1161000.000, df = 9

Conclusion: We conclude that the mean salaries among the departments are not statistically significantly
different (F(2, 7) = .913, p = .444), so we accept the null hypothesis that the means are equal among the groups.

13.GRAPHS IN SPSS
From the data below, prepare a pie chart, box plot, and histogram.

Student_Id   Money Spent for Party   Study Time   Time for Party   Student_Id   Money Spent for Party   Study Time   Time for Party
(Students 1-20 are shown in the left block and students 21-40 in the right block; each row lists two students.)
1 50 7 2 21 80 13 4
2 35 8 1 22 50 6 4
3 120 12 4 23 110 5 4
4 80 3 4 24 60 8 3
5 100 11 1 25 70 10 2
6 120 14 4 26 60 10 2
7 90 11 2 27 50 11 2
8 80 10 3 28 75 4 3
9 70 9 3 29 80 7 4
10 80 8 3 30 30 12 0
11 60 12 2 31 70 7 3
12 50 14 1 32 70 14 1
13 100 13 4 33 70 3 3
14 90 15 0 34 60 11 2
15 60 7 3 35 70 9 1
16 40 6 0 36 60 11 1
17 60 5 2 37 60 11 1
18 90 8 2 38 90 8 2
19 130 12 5 39 70 9 3
20 70 11 1 40 200 10 5
PIE CHART
Quick Steps

Click Graphs -> Legacy Dialogs -> Pie

Select “Summaries for groups of cases”


Click Define

Click “Reset” (recommended)

Move the variable for which you are creating a pie chart into the “Define slices by” box
Select your desired option under “Slices Represent”

Select “Titles” to add a title (recommended)

Click “OK”
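
The equivalent chart can be produced from a syntax window. A minimal syntax sketch, assuming the variable is named Party:

* Pie chart of times partying per week.
GRAPH
  /PIE=COUNT BY Party
  /TITLE='Times partying per week'.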
Pie chart of the variable times partying per week
BOXPLOT

A boxplot (also known as a box and whisker plot) is a way of graphically illustrating the
distribution of numeric data using the “five number summary” of the data set – namely the
minimum, first quartile, median, third quartile, and maximum values. It also identifies any outliers
that may exist in the data set.
Steps: Click Graphs -> Legacy Dialogs -> Boxplot
Select Simple and Summaries of separate variables
Click Define
Click Reset (recommended)
Select the variable for which you wish to create a boxplot, and move it into the Boxes Represent
box
Click OK
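
Equivalently, a minimal syntax sketch, assuming the variable is named Study_Time:

* Boxplot of study time per week.
EXAMINE VARIABLES=Study_Time
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.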
Box plot of the variable study time per week
HISTOGRAM

Quick Steps
Click Graphs -> Legacy Dialogs -> Histogram
Drag variable you want to plot as a histogram from the left into the Variable text box
Select “Display normal curve” (recommended)
Click OK
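
Equivalently, a minimal syntax sketch, assuming the variable is named Money_Spent:

* Histogram of money spent for party, with a normal curve overlaid.
GRAPH
  /HISTOGRAM(NORMAL)=Money_Spent.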
