
Analysis NOTES for SPSS

No   Type of Analysis                             Diploma      Degree       Masters   PhD

1.   Descriptive Statistics (Nominal /
     Categorical Data)
2.   Descriptive Statistics (Survey Items)        x            x
3.   Data Screening and Cleaning                  x            x
4.   Normality Analysis                           x            x
5.   Reliability Analysis                         x
6.   Validity (Optional)                          x
7.   Multicollinearity Analysis                   x            x
8.   Cross-Tabulation Analysis                    (Optional)   (Optional)
9.   Correlation Analysis                         x
10.  Number of Continuous Variables (IV) /        1-2          3-4          5-7       8-15
     Hypotheses
11.  Multiple Regression Analysis (Optional)      x            x
12.  Mediation / Moderation Analysis (Optional)   x            x

1. DESCRIPTIVE STATISTICS (for nominal data)

a. Basic Reporting

Table 1 : Demographic Profile of the Respondents (n=150)


Profile Frequency Percentage (%)
Gender
Male 82 54.7
Female 68 45.3

Age
Between 20-30 32 21.3
Between 31-40 45 30.0
Between 41-50 37 24.7
50 and above 36 24.0

Qualification
Diploma 11 7.3
Degree 17 11.3
Masters 39 26.0
Doctorate 50 33.3
Others 33 22.0

Sample Explanation
Table 1 shows the descriptive profile of the respondents. A total of 150 respondents
participated in this research. Males form the majority of respondents, with 82 (54.7%), while
females number 68 (45.3%). The age group between 31-40 is the largest, with 45 respondents
(30.0%), followed by the 41-50 group with 37 respondents (24.7%), the 50-and-above group
with 36 respondents (24.0%), and the 20-30 group with 32 respondents (21.3%), which is the
smallest group. In terms of qualification, respondents with a Doctorate form the largest group
in this research, with 50 respondents (33.3%), followed by Masters with 39 respondents
(26.0%), other qualifications with 33 respondents (22.0%), first degree with 17 respondents
(11.3%), and Diploma with 11 respondents (7.3%). Diploma-educated respondents are the
smallest group in this research. The 'others' category comprises respondents with a
professional qualification background (5 respondents), vocational education (12 respondents)
and certificate-level education (16 respondents) respectively.

b. Advanced Reporting (Nominal Data)


Table 1 : Demographic Profile of the Respondents (n=150)

Profile          Frequency   Percentage (%)   Min   Max   Mean (M)   Standard Deviation (SD)
Gender                                         1     2      1.45              .45
  Male               82          54.7
  Female             68          45.3

Age                                            1     4      2.51             1.10
  Between 20-30      32          21.3
  Between 31-40      45          30.0
  Between 41-50      37          24.7
  50 and above       36          24.0

Qualification                                  1     5      3.51             1.17
  Diploma            11           7.3
  Degree             17          11.3
  Masters            39          26.0
  Doctorate          50          33.3
  Others             33          22.0

Students’ Notes
Students must explain the basic profile of the respondents and interpret the Mean and
Standard Deviation of the nominal data.

Min and Max represent the range of category codes assigned in the questionnaire.
The mean (average) of a data set is found by adding all numbers in the data set and then
dividing by the number of values in the set.
Standard deviation is a mathematical tool to help you assess how far the values are
spread above and below the mean. A high standard deviation (more than 1) shows that the
data is widely spread (less reliable), while a low standard deviation (close to zero, not more
than 1) shows that the data are clustered closely around the mean (more reliable).
[Figure: number line from -1 to 1 illustrating spread around the mean - Gender clusters near
the mean (SD below 1), while Age and Qualification are more widely spread (SD above 1).]

Survey items use a five-point Likert scale:

Strongly Disagree (1)   Disagree (2)   Neutral (3)   Agree (4)   Strongly Agree (5)

Standard Deviation (SD) in Table 1 explains the average amount of variability in your data set:
it tells you, on average, how far each score lies from the mean. For gender, the data is skewed
to one side but still reliable, as the SD is close to zero and not more than 1. The data for age
and qualification is less reliable because the SD is more than 1. Since it is less reliable, we
will use non-parametric tests to assess Age and Qualification only.
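As a sketch of how the Mean and SD in Table 1 are produced, the calculation below codes Gender as 1 = Male and 2 = Female (a hypothetical coding - use whatever value labels your questionnaire assigns) and computes the descriptive statistics directly:

```python
import statistics

# Hypothetical coding: 1 = Male, 2 = Female (match your questionnaire's value labels)
gender = [1] * 82 + [2] * 68  # 82 males, 68 females, as in Table 1 (n = 150)

print("Min :", min(gender))                        # smallest category code
print("Max :", max(gender))                        # largest category code
print("Mean:", round(statistics.mean(gender), 2))  # sum of codes divided by n
print("SD  :", round(statistics.stdev(gender), 2)) # sample standard deviation
```

The mean reproduces the 1.45 reported in Table 1; the sample SD computes to about .50 with this coding.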

c. Advanced Reporting (Continuous Data)

Table 2 : Descriptive Statistics of Variables


Variables N Minimum Maximum Mean Std. Deviation
Technology Acceptance 150 2.33 5.00 4.32 .63
Perceived Confidence 150 2.00 5.00 3.71 .65
Cost Benefits 150 1.00 5.00 2.33 .88
Facilitation 150 2.00 5.00 4.10 .65

Sample Explanation
Table 2 shows the minimum and maximum scores for each continuous variable. The mean
technology acceptance score is the highest, M = 4.32, SD = .63, followed by
facilitation, M = 4.10, SD = .65, perceived confidence, M = 3.71, SD = .65, and the lowest is
Cost Benefits, M = 2.33, SD = .88. All standard deviations are small relative to the mean
scores, indicating that the variables measured have narrow distributions.
The mean for cost benefits indicates that most respondents tend to disagree that cost
benefits drive Technology Acceptance, while the mean for facilitation indicates that
facilitation is strongly associated with technology acceptance.

2. ASSESSING NORMALITY AND VALIDITY

A. Normality
Assessing the normality of residuals is important in multiple regression analysis for several
reasons:

1. Assumption of ordinary least squares (OLS): OLS regression assumes that the residuals
(i.e., the differences between the observed values and the predicted values) are normally
distributed. Violation of this assumption can lead to unreliable standard errors and
inaccurate hypothesis testing, particularly in small samples.

2. Validity of hypothesis tests and confidence intervals: Hypothesis tests and confidence
intervals are based on assumptions of normality. If the residuals are not normally distributed,
the p-values and confidence intervals may be inaccurate, leading to incorrect conclusions.

3. Influential outliers: Non-normality in the residuals may indicate the presence of influential
outliers. These outliers can have a disproportionate impact on estimated regression
coefficients, potentially distorting the results. Identifying and addressing influential outliers is
essential for obtaining reliable estimates.

4. Model fit and predictive accuracy: The assumption of normality is also related to model fit
and predictive accuracy. If the residuals are not normally distributed, it may suggest that the
underlying model does not capture the true data-generating process adequately. Additionally,
non-normal residuals can affect the accuracy of predicted values and may lead to biased
predictions.

To assess normality, you can use diagnostic plots such as a histogram or a Q-Q plot to
visually inspect the distribution of the residuals. Alternatively, statistical tests, such as the
Shapiro-Wilk test or the Kolmogorov-Smirnov test, can be used to formally test for
normality. If the residuals deviate substantially from normality, you may need to consider
transformations of the variables or consider alternative regression methods such as robust
regression that can handle non-normal residuals.

Step-by-Step Normality Analysis for continuous variables


Quick Steps
1. Click Analyze -> Descriptive Statistics -> Explore…
2. Move the variable of interest from the left box into the Dependent List box on the right.
3. Click the Plots button, and tick the Normality plots with tests option. Check also data
outliers.
4. Click Continue, and then click OK.
5. Your result will pop up – check out the Tests of Normality section.
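The "Tests of Normality" output can be reproduced outside SPSS; the sketch below uses the Shapiro-Wilk test from scipy (assumed to be available) on a small hypothetical sample - substitute your own variable:

```python
from scipy import stats

# Hypothetical sample of scale scores (replace with your own variable's values)
scores = [2.8, 3.1, 3.4, 3.6, 3.7, 3.9, 4.0, 4.0, 4.1, 4.2,
          4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 3.3]

w, p = stats.shapiro(scores)  # Shapiro-Wilk test, as in SPSS "Tests of Normality"
print(f"W = {w:.3f}, p = {p:.3f}")
if p > .05:
    print("p > .05: no significant departure from normality")
else:
    print("p <= .05: data deviate significantly from normality")
```

The decision rule mirrors the sample explanations below: a non-significant result (p > .05) is read as normally distributed data.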

Data cleaning and outliers


When you run the normality test, SPSS will also flag outliers, if any. The simplest yet most
effective step is to delete the offending cases if outliers exist. If a variable has a few
outliers, delete them in a single pass rather than one at a time.
In the SPSS output, outliers appear as labelled points beyond the whiskers of the boxplot.
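The flagging rule behind SPSS boxplot outliers is the 1.5 x IQR rule, which can be sketched directly; the data below are hypothetical, with one planted outlier:

```python
import statistics

data = [3.1, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 9.9]  # 9.9 is a planted outlier

q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (default exclusive method)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # boxplot "whisker" limits

outliers = [x for x in data if x < low or x > high]
cleaned  = [x for x in data if low <= x <= high]
print("Outliers :", outliers)      # the planted 9.9 is flagged
print("Cleaned n:", len(cleaned))  # remaining cases after removal
```

Note that SPSS computes quartiles slightly differently from Python's default, so borderline cases can differ; the principle is the same.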

Sample Explanation if using Shapiro-Wilk and Kolmogorov-Smirnov

a. Basic Reporting
Based on the analysis, the result is not significant, p > .05 (n = 150). Therefore, the
researcher concludes that the data is normally distributed.

b. Advanced Reporting
The normality tests using Shapiro-Wilk and Kolmogorov-Smirnov fulfil the basic
assumption for performing parametric tests after the outliers were detected and removed. Based
on the analysis, the result is not significant, p > .05 (n = 150). Therefore, the researcher
concludes that the data is normally distributed.
Step-by-Step Validity Analysis
To perform a validity analysis in SPSS 16, you can follow these step-by-step instructions:

1. Start SPSS and open your dataset: Launch SPSS 16 and open the dataset that contains the
variables you want to analyze for validity.

2. Determine the variables to analyze: Identify the variables or items you want to assess for
validity. These could be Likert-scale items, continuous variables, or any other measurement
scales.

3. Assess internal consistency using Cronbach's alpha: If you are analyzing a scale or a set of
related items, you can assess internal consistency using Cronbach's alpha. Go to "Analyze" >
"Scale" > "Reliability Analysis". In the "Reliability Analysis" window, select the items you
want to analyze and move them into the "Items" box. Choose the appropriate settings, such as
the desired method for handling missing data, and click "OK". SPSS will generate the
reliability statistics, including Cronbach's alpha, in the output window.

4. Conduct exploratory factor analysis (EFA): If you want to analyze the underlying factor
structure of your items, you can perform exploratory factor analysis. Go to "Analyze" >
"Data Reduction" > "Factor". In the "Factor" window, select the items you want to analyze
and move them into the "Variables" box. Choose the appropriate settings, such as the
extraction method (e.g., principal components analysis) and rotation method (e.g., varimax),
and click "OK". SPSS will generate the factor analysis output, including factor loadings and
eigenvalues.

For detailed step-by-step instructions and result interpretation for the KMO and Bartlett's Test,
refer to Factor Analysis in SPSS - Reporting and Interpreting Results (onlinespss.com)

5. Assess convergent (p < 0.05) and discriminant validity (loadings between 0.3-0.7, or 0.3
and above): To assess the convergent validity of your scale, examine the factor loadings of
each item on its corresponding factor in the factor analysis output. Generally, factor loadings
above 0.30 are considered acceptable. To assess discriminant validity, compare the item
loadings across factors and ensure they are substantially higher on their own factor than on
other factors.

Use this link to assess validity on Instrument (Convergent and divergent validity) using
Bivariate: How to Test Validity questionnaire Using SPSS - SPSS Tests; SPSS 04: Convergent and
Discriminant Validity of Scale - YouTube; How to analyze discriminant and convergent validity of
Likert scales - YouTube

6. Calculate scale scores: If you have determined that your items are valid and reliable, you
can calculate scale scores. To create composite scores for each scale, you can use SPSS
syntax or formulas. For example, you can create a new variable representing the sum or
average of individual items for each scale. To do this, go to "Transform" > "Compute
Variable". In the "Compute Variable" window, provide a name for the new variable, enter the
formula (e.g., SUM (item1 to item5) or MEAN(item1 to item5)), and click "OK".

7. Perform further analyses: Once you have established the validity of your variables and
calculated scale scores, you can use them in other statistical analyses, such as regression,
ANOVA, or correlation, depending on your research question and objectives.
It is important to interpret the validity analysis results in the context of your research and
to consider any specific requirements or guidelines for your study. If you need further guidance
or assistance, consult the SPSS 16 Help menus or refer to statistical textbooks for more detailed
instructions and explanations.
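Step 6's Compute Variable operation (SUM or MEAN of item1 to item5) can be mirrored outside SPSS; the item names and values below are hypothetical:

```python
# Hypothetical responses for one respondent on a 5-item scale (item1..item5)
items = {"item1": 4, "item2": 5, "item3": 3, "item4": 4, "item5": 4}

scale_sum  = sum(items.values())     # SPSS: SUM(item1 to item5)
scale_mean = scale_sum / len(items)  # SPSS: MEAN(item1 to item5)
print("Sum :", scale_sum)    # prints 20
print("Mean:", scale_mean)   # prints 4.0
```

Using the MEAN form keeps the composite on the original 1-5 response scale, which makes it easier to interpret against the Likert anchors.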

3. RELIABILITY ANALYSIS
a. Basic Reporting
Table 4 : Cronbach's Alpha
Subscale No of Item Cronbach's Alpha
Technology Acceptance 3 .829
Technology Reliability 3 .762
Perceived Confidence 5 .752
Cost Benefits 3 .840
Facilitation 5 .876

Sample Explanation
According to Hair (2016), a generally accepted rule is that a Cronbach's Alpha of 0.6-0.7
indicates an acceptable level of reliability, and 0.8 or greater a very good level. Based on the
results presented in Table 4, technology acceptance, technology reliability, perceived
confidence, cost benefits and facilitation have good internal consistency, with Cronbach's
alpha coefficients of .829, .762, .752, .840 and .876 respectively. The results indicate
that all items used in the survey questionnaire are reliable.
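For intuition, Cronbach's alpha can be computed by hand from item scores using the standard formula alpha = (k/(k-1)) * (1 - sum of item variances / variance of totals). The three-item, five-respondent data below are hypothetical, not the study's data:

```python
import statistics

# Hypothetical responses: 5 respondents x 3 items of one subscale
items = [
    [4, 5, 4, 3, 4],  # item 1 scores across respondents
    [4, 4, 5, 3, 4],  # item 2 scores
    [5, 4, 4, 3, 5],  # item 3 scores
]

k = len(items)                                          # number of items
totals = [sum(resp) for resp in zip(*items)]            # each respondent's total score
item_vars = sum(statistics.variance(i) for i in items)  # sum of item variances
total_var = statistics.variance(totals)                 # variance of the total scores

alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")  # prints 0.703
```

This is the same statistic SPSS reports under Analyze > Scale > Reliability Analysis; a value of .703 would sit in Hair's "acceptable" band.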

4. CROSS-TABULATION ANALYSIS
(Non-parametric test – if the distribution of data is not normal)
a.) Basic Reporting
Table 4(a) : GENDER * QUALIFICATION Cross-tabulation

                                        Diploma   Degree   Masters   Doctorate   Others    Total
GENDER  Male    Count                       6        11       16         30         19       82
                % within GENDER           7.3%     13.4%    19.5%      36.6%      23.2%   100.0%
                % within QUALIFICATION   54.5%     64.7%    41.0%      60.0%      57.6%    54.7%
                % of Total                4.0%      7.3%    10.7%      20.0%      12.7%    54.7%

        Female  Count                       5         6       23         20         14       68
                % within GENDER           7.4%      8.8%    33.8%      29.4%      20.6%   100.0%
                % within QUALIFICATION   45.5%     35.3%    59.0%      40.0%      42.4%    45.3%
                % of Total                3.3%      4.0%    15.3%      13.3%       9.3%    45.3%

        Total   Count                      11        17       39         50         33      150
                % within GENDER           7.3%     11.3%    26.0%      33.3%      22.0%   100.0%
                % within QUALIFICATION  100.0%    100.0%   100.0%     100.0%     100.0%   100.0%
                % of Total                7.3%     11.3%    26.0%      33.3%      22.0%   100.0%

Sample Explanation
You also need to explain Table 4(a) accordingly. You can support your explanation using
graph which can also be set in SPSS.

b. Advanced Reporting

Table 4(b) : Chi-Square Tests

Pearson Chi-Square (Gender)   Asymp. Sig.   df
Technology Acceptance             .403        8
Qualification                     .366        4
a. 1 cell (10.0%) has an expected count less than 5. The minimum expected count is 4.99.

Table 4(b) reports the Pearson Chi-Square tests against Gender. Neither association is
significant: Gender and Technology Acceptance (p = .403, df = 8, p > .05), and Gender and
Qualification (p = .366, df = 4, p > .05). This means that differences between male and
female respondents are not associated with differences in qualification.
Students' Notes
χ² - this symbol is used for the Chi-Square statistic.
5. CORRELATION ANALYSIS

Table 4.1 : Linear Relationship

Correlations
                               Facilitation   Perceived Confidence   Cost Benefits
Technology     Pearson's r        .469**            .425**              -.274**
Acceptance     p-value            .000              .000                 .001
**. Correlation is significant at the 0.01 level (2-tailed).

Sample Explanation
a. Basic Reporting - Bivariate Correlation (Pearson Correlation)

Results of the Pearson correlation indicates that there is a moderate significant relationship
between technology acceptance and facilitation r=.469, p<.05 (n=150). It is also found that
the relationship between technology acceptance and perceived confidence is moderately
significant r=.425, p<.05 (n=150). However, it is found that there is a weak negative
relationship between technology acceptance and cost benefits r = -.247, p<.05 (n=150) which
indicate that an increase in the cost benefits will decrease technology acceptance.

b. Advanced Reporting

Correlation analysis is used to describe the strength and direction of the linear relationship
between two variables. Cohen (1988) suggests the following guidelines to determine the
strength of the relationship between variables as follows:

r = .10 to .29 (or -.10 to -.29)   small/weak
r = .30 to .49 (or -.30 to -.49)   medium/moderate
r = .50 to 1.0 (or -.50 to -1.0)   large/strong

Results of the Pearson correlation indicate a moderate, significant relationship
between technology acceptance and facilitation, r = .469, p < .05 (n = 150). The
relationship between technology acceptance and perceived confidence is also moderately
significant, r = .425, p < .05 (n = 150). However, there is a weak negative
relationship between technology acceptance and cost benefits, r = -.274, p < .05 (n = 150),
which indicates that an increase in the cost-benefit score is associated with a decrease in
technology acceptance.

6. MULTIPLE REGRESSION ANALYSIS

Students’ Notes
Below are the table generated from SPSS. You need to transform the table into more precise
data to explain the statistical results.
Correlations

                         TECHACCEPT   PERCON   COSTBEN    FACL
Pearson     TECHACCEPT      1.000      .425     -.274     .469
Correlation PERCON           .425     1.000     -.406     .290
            COSTBEN         -.274     -.406     1.000    -.373
            FACL             .469      .290     -.373    1.000
Sig.        TECHACCEPT         .       .000      .000     .000
(1-tailed)  PERCON           .000        .       .000     .000
            COSTBEN          .000      .000        .      .000
            FACL             .000      .000      .000       .
N           (all cells)       150       150       150      150

Coefficients(a)

               Unstandardized        Standardized                95% Confidence        Correlations             Collinearity
               Coefficients          Coefficients                Interval for B                                 Statistics
Model          B       Std. Error    Beta          t      Sig.   Lower    Upper   Zero-order  Partial   Part   Tolerance   VIF
1 (Constant)  1.697       .453                    3.747   .000    .802    2.592
  PERCON       .305       .074        .313        4.116   .000    .159     .451      .425       .322    .283     .813     1.230
  COSTBEN     -.005       .057       -.007        -.086   .931   -.117     .107     -.274      -.007   -.006     .764     1.309
  FACL         .366       .073        .376        5.012   .000    .222     .510      .469       .383    .344     .838     1.194

a. Dependent Variable: TECHACCEPT

B (the unstandardized coefficient) describes the change in the dependent variable (DV) for a
one-unit change in an independent variable (IV), while Beta (the standardized coefficient)
allows the strength of each IV-DV relationship to be compared. Perceived Confidence and
Facilitation are significant predictors (p < .05), whereas Cost Benefits is not significantly
associated with Technology Acceptance (p = .931).

Model Summary(b)

Model    R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .558a      .312           .297                   .53222

Change Statistics: R Square Change = .312, F Change = 22.030, df1 = 3, df2 = 146,
Sig. F Change = .000

a. Predictors: (Constant), FACL, PERCON, COSTBEN
b. Dependent Variable: TECHACCEPT

Multiple correlation coefficient R = .558 indicates a high degree of correlation.
Adjusted R2 indicates that 29.7% of the variance in the dependent variable can
be predicted from the independent variables.
ANOVA(b)

Model          Sum of Squares    df    Mean Square     F       Sig.
1 Regression       18.720          3      6.240      22.030    .000a
  Residual         41.355        146       .283
  Total            60.075        149

a. Predictors: (Constant), FACL, PERCON, COSTBEN
b. Dependent Variable: TECHACCEPT

The p-value < 0.05 indicates the equation is a good fit (F(3, 146) = 22.03, p = .000).
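The F statistic, R-squared and adjusted R-squared reported in the Model Summary and ANOVA tables all follow from the sums of squares, so the arithmetic can be checked by hand:

```python
# Values taken from the ANOVA table: sums of squares and degrees of freedom
ss_reg, df_reg = 18.720, 3     # regression
ss_res, df_res = 41.355, 146   # residual
ss_total = ss_reg + ss_res     # total = 60.075

r_squared = ss_reg / ss_total                                 # variance explained
f_stat = (ss_reg / df_reg) / (ss_res / df_res)                # ratio of mean squares
adj_r2 = 1 - (1 - r_squared) * (df_reg + df_res) / df_res     # adjusts for predictors

print(f"R^2 = {r_squared:.3f}")                 # prints 0.312
print(f"F({df_reg}, {df_res}) = {f_stat:.2f}")  # prints F(3, 146) = 22.03
print(f"Adjusted R^2 = {adj_r2:.3f}")           # prints 0.297
```

The check recovers exactly the R Square (.312), Adjusted R Square (.297) and F (22.03) values in the SPSS output above.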

Student Notes
The tables above are copied from the SPSS output. We must choose only the numbers we
need to report in the thesis/assignment. Based on the output above, follow the steps listed
below.
Step 1 : Report on Multicollinearity
Step 2 : Evaluate the model, including checking normality using the P-P plot
In multiple regression analysis, the P-P plot can also tell you whether your residuals are
normally distributed. A normal distribution of residuals indicates your model fits well.

Why is Model Fitting Important? As previously mentioned, a well-fit model does not match
every data point given but follows the overall curves. This shows that the model is neither
underfit nor overfit. If a model is not properly fit, it will produce incorrect insights and should
not be used for making decisions.

Step 3 : Summary of the Result


Sample Reporting / Explanation (only advanced explanation needed)
a.) Report on Multicollinearity

Table 4.1 Multicollinearity

Model                   Pearson Correlation   Tolerance    VIF
Perceived Confidence           .425             .813      1.230
Cost Benefits                 -.274             .764      1.309
Facilitation                   .469             .838      1.194
Dependent Variable: Technology Acceptance

A test for multicollinearity is conducted to check the relationships among the independent
variables. According to Pallant (2016), multicollinearity exists when the independent variables
are highly correlated (r = .9 and above), which undermines the statistical significance of an
independent variable and makes the data less reliable. Based on the statistical results,
Perceived Confidence, Cost Benefits and Facilitation correlate with Technology Acceptance
(.425, -.274 and .469 respectively), and the correlation coefficients show that all
values are less than .7. Tabachnick and Fidell (2001, p. 84) advise researchers to
think carefully before including two variables with a bivariate correlation of .7 or more in the
same analysis. In this case, therefore, all variables are retained. VIF (Variance Inflation
Factor) values are all below 10, which indicates no multicollinearity problem
exists. The tolerance values (.813, .764 and .838) are all larger than 1 - R2 (1 - 0.297 = 0.703),
which confirms there is no multicollinearity problem in the model.
Notes to student
1. Tolerance values should be larger than 1 - R2.
2. The R2 value is taken from the Adjusted R Square column in the Model Summary table.
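Tolerance and VIF are linked by a simple identity: Tolerance = 1 - R-squared of each IV regressed on the other IVs, and VIF = 1 / Tolerance. The tolerance values from Table 4.1 can therefore be checked directly:

```python
# Tolerance values taken from the multicollinearity table (Table 4.1)
tolerance = {"Perceived Confidence": .813, "Cost Benefits": .764, "Facilitation": .838}

for name, tol in tolerance.items():
    vif = 1 / tol  # VIF is the reciprocal of tolerance
    flag = "OK" if vif < 10 else "possible multicollinearity"
    print(f"{name}: VIF = {vif:.3f} ({flag})")
```

The reciprocals reproduce the VIF column to within rounding of the underlying tolerance values, and all sit far below the common cutoff of 10.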

b.) Model Evaluation (Step by Step)

Notes to student

A linear regression analysis is used to predict changes in the dependent variable based on the
value of independent variables. Below is the standard APA format of reporting for linear
regression.
A linear regression was calculated to predict [dependent variable] based on [independent
variable] . A significant regression equation was found (F(_,__)= __.___, p < .___).

(F(Regression df value, Residual df value )= F Value, p < ANOVA Significance Value)

Sample Advanced Explanation


A multiple linear regression was calculated to predict Technology Acceptance based on
Perceived Confidence, Cost Benefits and Facilitation. The multiple correlation coefficient,
R = .558, indicates a high degree of correlation. Adjusted R2 indicates that 29.7% of the
variance in the dependent variable can be predicted from the independent variables. A
significant regression equation was found (F(3, 146) = 22.03, p = .000). The p-value below
0.05 indicates the equation is a good fit.
c.) Summary of the Result
Table 4.2 summarises the hypothesis tests of the relationships between the independent
variables and the dependent variable as follows:

Table 4.2 : Result of the Hypothesis Testing

No   Hypothesis                                                         Result
H₁   There is a relationship between Perceived Confidence and           Supported
     Technology Acceptance
H₂   There is a relationship between Cost Benefits and Technology       Not Supported
     Acceptance
H₃   There is a relationship between Facilitation and Technology        Supported
     Acceptance
