
Module No. 1

Q. Define SPSS. How do we import data from MS Excel into SPSS? Differentiate between
data view and variable view. Explain the basic elements of SPSS. Describe the functions,
advantages and disadvantages of SPSS.

Sol.

SPSS (Statistical Package for the Social Sciences), also known as IBM SPSS Statistics, is
a software package used for the analysis of statistical data.

Although the name of SPSS reflects its original use in the field of social sciences, its use has
since expanded into other data markets. SPSS is commonly used in healthcare, marketing and
education research.

The types of data analysed using SPSS are widely varied. Common sources include survey
results, organization customer databases, Google Analytics, scientific research results and
server log files. SPSS supports both analysis and modification of many kinds of data and
almost all formats of structured data. The software supports spreadsheets, plain text files,
relational (SQL) databases, and files from other statistical packages such as Stata and SAS.

SPSS provides data analysis for descriptive and bivariate statistics, numerical outcome
predictions and predictions for identifying groups. The software also provides data
transformation, graphing and direct marketing features.

The software package was created in 1968 by SPSS Inc. and was acquired by IBM in 2009.
While the software was renamed to IBM SPSS Statistics, it is still commonly referred to as
just SPSS.

Importing data from MS Excel to SPSS

1. Choose File > Import Data > Excel from the SPSS menu.

2. Select the Excel file and click Open.

3. Tick 'Read variable names from first row of data' if the first row of the spreadsheet
contains column headings.

4. Click OK to import the data.
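
The same import can also be run from a syntax window instead of the menus. A minimal
sketch, assuming a hypothetical workbook at C:\data\survey.xlsx whose first row holds the
column headings:

* Read an Excel workbook into the active dataset (path and sheet name are placeholders).
GET DATA
  /TYPE=XLSX
  /FILE='C:\data\survey.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.
* Force a data pass so the cases are read immediately.
EXECUTE.

After the commands run, the imported columns appear as variables in Variable View and the
rows as cases in Data View.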

Data view and Variable view

Variable View

1. Name: This is a column field that accepts a unique name for each variable, which is
used to sort and refer to the data. For example, demographic parameters such as name,
gender, age and educational qualification can serve as variables for sorting the data.
The only restriction is that special characters are not allowed in this field.
2. Label: As the name suggests, it gives the variable a descriptive label, and it also allows
special characters.

3. Type: Specifies the kind of data stored in the variable; this is very useful when different
kinds of data (numeric, string, date, etc.) are inserted.

4. Width: Sets the maximum number of characters allowed per value.

5. Decimals: Decides how many digits are displayed after the decimal point, for example
when entering percentage values.

6. Values: Assigns labels to the coded values entered by the user (e.g. 0 = 'No', 1 = 'Yes').

7. Missing: Defines values that should be treated as missing and skipped during analysis.

8. Align: As the name suggests, sets the alignment of the displayed values, for example
left align.

9. Measure: Sets the level of measurement of the data entered: scale, ordinal or nominal.

Variables are defined in the sheet named “Variable View.” It allows us to customize the data
type as required for analyzing it.

One needs to populate column headings like Name, Label, Type, Width, Decimals, Values,
Missing, Columns, Align, and Measure to analyze the data.

These headings are the different attributes that help to characterize the data accordingly.

Data View
The Data View is structured as rows and columns, where each row represents a case and each
column represents a variable. By importing a file or adding data manually, we can work with
SPSS.


Elements of SPSS

1. Name is the variable's machine-readable name. This is the name used to refer to the
variable in SPSS's underlying code and, if no "Label" is defined, the name that will
appear at the top of the column in the "Data View."
2. Type indicates the type of data that can be stored in the variable's column. The most
frequently used types are "String" (for text) and "Numeric." SPSS uses the type to
know what rules can be applied to a specific variable. It won't do arithmetic on a
string variable, for example.
3. Width indicates the allowed number of characters per instance.
4. Decimals sets the number of decimal places allowed in variable instances.
5. Label sets the name that will be displayed at the top of the column in the Data Editor.
6. Values sets names given to coded values (e.g. if the data represents survey responses
where a "0" represents "no" and "1" represents a "yes" this field can be used to tell
SPSS to display the text values instead of the numerical raw data).
7. Missing sets the values that will be encoded as "Missing."
8. Columns sets the displayed column width.
9. Align sets the displayed alignment (right, left, or centre).
10. Measure sets the statistical level of measurement. SPSS distinguishes between
"Scale" (variables that represent a continuous scale like population or temperature),
"Ordinal" (variables that can be rank ordered but do not represent a continuous scale), and
"Nominal" (variables that cannot be ranked, such as those that represent labels or
classifications).
11. Role is used by some SPSS dialogues to distinguish between the variable's intended
usage in some predictive applications (e.g. regression, clustering, and classification).
For most dialogues the role won't be significant.
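
Most of these attributes can also be set with syntax rather than by typing into Variable View.
A minimal sketch, assuming a hypothetical numeric variable named gender coded 0/1 with 9
used as a missing-value code:

* Attach a descriptive label to the variable (the Label attribute).
VARIABLE LABELS gender 'Gender of respondent'.
* Attach labels to the coded values (the Values attribute).
VALUE LABELS gender 0 'Male' 1 'Female'.
* Declare 9 as a user-missing code (the Missing attribute).
MISSING VALUES gender (9).
* Set the level of measurement (the Measure attribute).
VARIABLE LEVEL gender (NOMINAL).
* Set the display format to width 8 with no decimals (the Width and Decimals attributes).
FORMATS gender (F8.0).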

Functions of SPSS

The core functionalities offered in SPSS are:

• Statistical program for quantitative data analysis – It includes frequencies, cross-tabulation,
and bivariate statistics.

• Modeller program that allows for predictive modelling. It enables researchers to build and
validate predictive models using advanced statistical procedures.

• Text analysis helps you derive insights from qualitative inputs through open-ended
questionnaires.

• Visualization Designer allows researchers to use their data for a variety of visual
representations.
Advantages:
The advantages of using SPSS as a software package compared to others are:
1. SPSS is a comprehensive statistical software package.
2. Many complex statistical tests are available as built-in features.
3. Interpretation of results is relatively easy.
4. It easily and quickly displays data tables.
5. It can be expanded.

Disadvantages
1. SPSS can be expensive for students to purchase.
2. It usually involves added training to completely exploit all the available features.
3. The graphing features are not as simple as those of Microsoft Excel.
Module No. 2

Q. What are descriptive statistics? Define mean, median, mode, maximum and
minimum value.

Sol.

Descriptive statistics, in short, help describe and understand the features of a specific data
set by giving short summaries about the sample and measures of the data. The most
recognized types of descriptive statistics are measures of center: the mean, median,
and mode, which are used at almost all levels of math and statistics. The mean, or the
average, is calculated by adding all the figures within the data set and then dividing by the
number of figures within the set.

For example, the sum of the following data set is 20: (2, 3, 4, 5, 6). The mean is 4 (20/5).
The mode of a data set is the value appearing most often, and the median is the figure
situated in the middle of the data set. It is the figure separating the higher figures from the
lower figures within a data set. However, there are less common types of descriptive
statistics that are still very important.

1. Mean

The most common expression for the mean of a statistical distribution with a discrete random
variable is the mathematical average of all the terms. To calculate it, add up the values of all
the terms and then divide by the number of terms. The mean of a statistical distribution with a
continuous random variable, also called the expected value, is obtained by integrating the
product of the variable with its probability as defined by the distribution. The expected value
is denoted by the lowercase Greek letter mu (µ).
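
Restating the two definitions above in symbols (a standard notation sketch, not taken from the
original text), for a set of N observed values x1, ..., xN and for a continuous random variable X
with probability density f(x):

\[ \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \mu = E[X] = \int_{-\infty}^{\infty} x \, f(x)\, dx \]

For the five-value example above, \(\bar{x} = (2 + 3 + 4 + 5 + 6)/5 = 20/5 = 4\).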

2. Median

The median of a distribution with a discrete random variable depends on whether the number
of terms in the distribution is even or odd. If the number of terms is odd, then the median is
the value of the term in the middle. This is the value such that the number of terms having
values greater than or equal to it is the same as the number of terms having values less than or
equal to it. If the number of terms is even, then the median is the average of the two terms in
the middle, such that the number of terms having values greater than or equal to it is the same
as the number of terms having values less than or equal to it.
3. Mode

The mode of a distribution with a discrete random variable is the value of the term that occurs
the most often. It is not uncommon for a distribution with a discrete random variable to have
more than one mode, especially if there are not many terms. This happens when two or more
terms occur with equal frequency, and more often than any of the others.

A distribution with two modes is called bimodal. A distribution with three modes is called
trimodal. The mode of a distribution with a continuous random variable is the value at which
the probability density function reaches its maximum. As with discrete distributions, there
may be more than one mode.

4. Minimum value

We start by looking more closely at the statistic known as the minimum. This number is the
data value that is less than or equal to all other values in our set of data. If we were to order
all of our data in ascending order, then the minimum would be the first number in our list.
Although the minimum value could be repeated in our data set, by definition it is a unique
number. There cannot be two minima because one of these values must be less than the other.

5. Maximum value

Now we turn to the maximum. This number is the data value that is greater than or equal to
all other values in our set of data. If we were to order all of our data in ascending order, then
the maximum would be the last number listed. The maximum is a unique number for a given
set of data. This number can be repeated, but there is only one maximum for a data set. There
cannot be two maxima because one of these values would be greater than the other.

1. Click Analyse > Descriptive Statistics > Frequencies.

2. Move the variables of interest into the Variable(s) box, then click the Statistics button and
tick Mean, Median, Mode, Minimum and Maximum.

3. Click Continue, then OK.
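
The same summary can be requested with syntax. A minimal sketch, assuming a hypothetical
numeric variable named maths; substitute the variable names from your own file:

* Descriptive statistics through the Frequencies procedure.
FREQUENCIES VARIABLES=maths
  /STATISTICS=MEAN MEDIAN MODE MINIMUM MAXIMUM.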


Module No.3

Q. Define the meaning and purpose of cross-tabulation. Explain with example.

Sol.

Cross tabulation (crosstab) is a useful analysis tool commonly used to compare the results
for one or more variables with the results of another variable. It is used with data on a
nominal scale, where variables are named or labelled with no specific order.

Crosstabs are basically data tables that present the results from a full group of survey
respondents as well as subgroups. They allow you to examine relationships within the
data that might not be obvious when simply looking at total survey responses.

Example:

Employee satisfaction: The following is a cross tabulation example from SurveyMonkey that
was created with data from an employee satisfaction survey. The survey used multiple-choice
questions to ask employees:

• How happy they are at work

• How long they intend to stay at their jobs

• How long they’ve been in the organization

• How many hours they work per day (on average)

The questions that define the groups are in the columns, and the questions used to compare
the groups are in the rows. This is the typical format of a crosstab report.

From this crosstab table, you can see that there is a relationship between employees who have
been at the company longer and their level of satisfaction. Once you’ve identified this
relationship, you can explore it further to find out the root cause of this relationship. From the
data you have, you can’t definitively say that one variable is impacting the other. In other
words, the data identifies a correlation between longer-term employment and employee
satisfaction, but it does not imply causation.

1. Click Analyse > Descriptive Statistics > Crosstabs.

2. Select the row and column variables.

3. Click OK.
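
The same table can be produced with syntax. A minimal sketch, assuming two hypothetical
categorical variables, tenure (how long employees have been in the organization) and
satisfaction (how happy they are at work):

* Cross-tabulate tenure (rows) against satisfaction (columns).
* Show counts together with row and column percentages.
CROSSTABS
  /TABLES=tenure BY satisfaction
  /CELLS=COUNT ROW COLUMN.
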
Module No.4

Q. What are charts and its types? What is box plot? Explain with example.

Sol.

The task of defining a chart includes selecting the chart type, naming the chart, defining the
appearance of the chart, specifying the data range for the chart, and defining the variables
that are included on the chart. As a first step, decide what type of chart you want to produce.
Its types are:

1. Line chart
A line chart uses a line to connect together a series of data points.

2. Pie chart
A pie chart is a circular chart divided into sectors, which illustrate relative magnitudes or
frequencies. In a pie chart, the area of each sector is proportional to the quantity it represents.
Together, the sectors create a full circle.

3. Bar chart
A bar chart has rectangular bars with lengths proportional to the values that they represent.
Bar charts are used for comparing two or more values. The bars can be horizontally or
vertically oriented.

4. Scatter chart
A scatter chart uses Cartesian (x, y) coordinates to display data as a collection of points. The
position of each point on the horizontal axis (x) is determined by the value of one variable, and
its position on the vertical axis (y) by the value of the other variable.

5. External chart
You can use the external chart format to display a chart or image that has been created outside
of the application. The chart or image is referenced by an external URL.

Box plot
A box plot displays the five-number summary of a set of data. The five-number summary is the
minimum, first quartile, median, third quartile, and maximum.
In a box plot, we draw a box from the first quartile to the third quartile. A vertical line goes
through the box at the median. The whiskers go from each quartile to the minimum or
maximum. For example, for the data set (1, 3, 5, 7, 9, 11, 13), the five-number summary is
minimum = 1, first quartile = 3, median = 7, third quartile = 11 and maximum = 13.
1. Go to Graphs > Chart Builder.

2. Choose a boxplot from the gallery and drag the X and Y variables onto the canvas.

3. Click OK.
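
A box plot can also be requested through the Explore procedure rather than the Chart Builder.
A minimal sketch, assuming a hypothetical scale variable score plotted separately for each level
of a grouping variable group:

* Draw box plots of score, one box per category of group.
EXAMINE VARIABLES=score BY group
  /PLOT=BOXPLOT
  /STATISTICS=NONE.
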
Module No. 5

1. IMPORTED DATA

2. FREQUENCY

Statistics
                       S.NO    MATHS
N        Valid           25       25
         Missing          0        0
Mean                  13.00     3.08
Median                13.00     3.00
Mode                     1a        2
Minimum                   1        1
Maximum                  25        5
a. Multiple modes exist. The smallest value is shown.

3. CROSS-TAB
4. GRAPHS
Module No.6

Q. Define the meaning of correlation and explain its types. Explain with example.

Sol.

Correlation in Statistics
This section shows how to calculate and interpret correlation coefficients for ordinal and
interval level scales. Methods of correlation summarize the relationship between two
variables in a single number called the correlation coefficient. The correlation coefficient is
usually represented using the symbol r, and it ranges from -1 to +1.

A correlation coefficient quite close to 0, whether positive or negative, implies little or no
relationship between the two variables. A correlation coefficient close to +1 means a
positive relationship between the two variables, with increases in one of the variables being
associated with increases in the other variable.

A correlation coefficient close to -1 indicates a negative relationship between two variables,
with an increase in one of the variables being associated with a decrease in the other variable.
A correlation coefficient can be produced for ordinal, interval or ratio level variables, but has
little meaning for variables which are measured on a scale which is no more than nominal.

For ordinal scales, the correlation coefficient can be calculated by using Spearman’s rho. For
interval or ratio level scales, the most commonly used correlation coefficient is Pearson’s r,
ordinarily referred to as simply the correlation coefficient.
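
For reference, Pearson's r for n paired observations (x_i, y_i) with means \(\bar{x}\) and
\(\bar{y}\) is defined as:

\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \]

The numerator captures how the two variables vary together, and the denominator scales the
result so that r always lies between -1 and +1.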

Types of correlation

1. The Kendall rank coefficient

It is often used as a test statistic in a statistical hypothesis test to establish whether two
variables may be regarded as statistically dependent. This test is non-parametric, as it does
not rely on any assumptions on the distributions of X or Y or the distribution of (X,Y).

2. Karl Pearson

The Karl Pearson coefficient is defined as a linear correlation coefficient that falls in the
numeric range of -1 to +1. This is a quantitative method that gives a numeric value for the
strength of the linear relationship between the X and Y variables.

3. Spearman’s Correlation

It is a statistical measure of the strength and direction of the monotonic relationship between
two continuous or ordinal variables. The values of the variables are ranked, i.e. put in the
order of their preference. It is denoted by the symbol “rho” (ρ) and can take values between
-1 and +1.

1. Go to Analyse > Correlate > Bivariate.

2. Select the variables and the correlation coefficient (Pearson, Kendall's tau-b or Spearman),
then click OK.

3. Output:
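
The same coefficients can also be obtained with syntax. A minimal sketch, assuming two
hypothetical numeric variables height and weight; the first command gives Pearson's r, the
second the rank-based Spearman and Kendall coefficients:

* Pearson correlation with two-tailed significance tests.
CORRELATIONS
  /VARIABLES=height weight
  /PRINT=TWOTAIL NOSIG.
* Rank-based (non-parametric) correlation coefficients.
NONPAR CORR
  /VARIABLES=height weight
  /PRINT=SPEARMAN KENDALL TWOTAIL.
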
Module No. 7

Q. Explain regression and its steps to compute regression in SPSS. Also explain its
types.

Sol.

Regression is a statistical method used in finance, investing, and other disciplines that
attempts to determine the strength and character of the relationship between one dependent
variable (usually denoted by Y) and a series of other variables (known as independent
variables).

Also called simple regression or ordinary least squares (OLS), linear regression is the most
common form of this technique. Linear regression establishes the linear relationship between
two variables based on a line of best fit. Linear regression is thus graphically depicted using a
straight line with the slope defining how the change in one variable impacts a change in the
other. The y-intercept of a linear regression relationship represents the value of one variable
when the value of the other is zero. Non-linear regression models also exist, but are far more
complex.
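
In symbols, the fitted straight line described above is usually written as (standard notation, not
taken from any specific SPSS output):

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

where \(\beta_0\) is the y-intercept, \(\beta_1\) is the slope describing how a change in x is
associated with a change in y, and \(\varepsilon\) is the error term.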

Regression analysis is a powerful tool for uncovering the associations between variables
observed in data, but cannot easily indicate causation. It is used in several contexts in
business, finance, and economics. For instance, it is used to help investment managers value
assets and understand the relationships between factors such as commodity prices and the
stocks of businesses dealing in those commodities.

Types of Regression

1. Simple linear regression

Simple linear regression reveals the relationship between a dependent variable (the output) and
an independent variable (the input). Primarily, this regression type describes the strength of the
relationship between the given variables.

2. Multiple linear regression

Multiple linear regression establishes the relationship between independent variables (two or
more) and the corresponding dependent variable. Here, the independent variables can be
either continuous or categorical. This regression type helps foresee trends, determine future
values, and predict the impacts of changes.
Steps to compute regression in SPSS

1. Go to Analyse > Regression > Linear.

2. Select the dependent and independent variables, then click OK.

3. Output
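
As a syntax alternative to the menu steps, a minimal sketch, assuming a hypothetical dependent
variable sales and two hypothetical independent variables advertising and price:

* Ordinary least squares regression of sales on advertising and price.
REGRESSION
  /DEPENDENT sales
  /METHOD=ENTER advertising price.
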
Module No.8

Q. Define chi-square and explain its steps in SPSS.

Sol.

A chi-square (χ²) statistic is a test that measures how a model compares to actual observed
data. The data used in calculating a chi-square statistic must be random, raw, mutually
exclusive, drawn from independent variables, and drawn from a large enough sample. For
example, the results of tossing a fair coin meet these criteria.

Chi-square tests are often used to test hypotheses. The chi-square statistic compares the size
of any discrepancies between the expected results and the actual results, given the size of the
sample and the number of variables in the relationship.

For these tests, degrees of freedom are used to determine if a certain null hypothesis can be
rejected based on the total number of variables and samples within the experiment. As with
any statistic, the larger the sample size, the more reliable the results.
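
The statistic itself compares each observed count O_i with the count E_i expected under the
null hypothesis:

\[ \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} \]

Large values of \(\chi^2\) indicate large discrepancies between the observed and expected
counts relative to the sample size.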

Steps in SPSS

1. Click Analyse > Descriptive Statistics > Crosstabs.

2. Select the row and column variables.

3. Tick the Chi-square option under the Statistics button.

4. Click Continue, then OK.
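
The same test can be run with syntax. A minimal sketch, assuming two hypothetical categorical
variables gender and preference:

* Cross-tabulation with a chi-square test of independence.
* Expected counts are requested alongside the observed counts.
CROSSTABS
  /TABLES=gender BY preference
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.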
