
DATA ANALYSES & REASONING

MBE12303
TOPIC
1. INTRODUCTION TO STATISTICS AND RESEARCH
2. FLASHBACK: PREVIOUS TOPICS
3. DESCRIPTIVE STATISTICS
4. DESCRIPTIVE ANALYSIS USING SPSS
5. REPORTING OUTPUT IN APA STYLE
Common steps of research
1) Develop research objectives/questions
2) Conduct a thorough literature review
3) Refine research objectives/questions
- formulate hypotheses
4) Design research methodology/study
5) Create research proposal – DEFEND PROPOSAL (PS1)
6) Apply for ethics approval
7) Collect and analyze data
8) Draw conclusions and relate findings – MOCK VIVA
9) Reporting – DISSERTATION / THESIS (PS2)
BASIC STATISTICS
• Variables
• Type of data – scale of measurement/level of data
• Samples and Sampling Techniques
• TYPES OF STATISTICS
– Descriptive stat
– Inferential stat
• Introduction to SPSS
SPSS Beginner
• Create Variables (transform instrument into
SPSS database) – codebook
• Common items for each variable:
– variable name and description (NAME & LABEL)
– value labels, i.e. what each code means (VALUES)
– variable format: numeric, date, string (TYPE)
– variable scale of measurement (MEASURE)
Descriptive vs Inferential
• Descriptive statistics
– To describe the basic features of the data in a
study.
– To present quantitative descriptions in a
manageable form.
– Can be performed on a sample or on a population.
– To examine and explore the data.
Descriptive vs Inferential
• Inferential statistics
– To study samples and then make generalizations
about the population from which they were
selected.
– An inference is made about the population based
on the sample data.
Descriptive vs Inferential
Overview

Descriptive Statistics
• Describes the important characteristics of a
set of data.
• Organize, present, and summarize data:
1. Graphically
2. Numerically

Important Characteristics of
Quantitative Data
“Shape, Center, and Spread”
• Center: A representative or average value that
indicates where the middle of the data set is located.
• Variation: A measure of the amount that the values
vary among themselves.
• Distribution: The nature or shape of the distribution of
data (such as bell-shaped, uniform, or skewed).
“Shape” of Distributions
Symmetric
• Data is symmetric if the left half of its histogram
is roughly a mirror image of its right half.

Skewed
• Data is skewed if it is not symmetric and if it
extends more to one side than the other.
Uniform
• Data is uniform if it is equally distributed (on a
histogram, all the bars are the same height or
approximately the same height).
The Shape of Distributions
(Figure: example histograms labelled Symmetric, Uniform, Skewed left and Skewed right)

Descriptive Statistics
1. Frequency Distributions and Their Graphs
2. Measures of Central Tendency
3. Measures of Variation
1. Frequency Distribution
Frequency Distribution
• A table that organizes data values into classes or
intervals along with the number of values that fall in
each class (frequency, f).
1. Ungrouped frequency distribution: for data sets
with few different values. Each value is in its own
class.
2. Grouped frequency distribution: for data sets with
many different values, which are grouped together
into classes.
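To make the ungrouped/grouped distinction concrete, here is a minimal Python sketch using only the standard library. The function names, the class width of 20 and the sample values are hypothetical, chosen only for illustration; in SPSS the equivalent table comes from Analyze > Descriptive Statistics > Frequencies.

```python
from collections import Counter


def ungrouped_frequency(values):
    """Ungrouped distribution: each distinct value is its own class."""
    return sorted(Counter(values).items())


def grouped_frequency(values, lower, width, n_classes):
    """Grouped distribution: count values in classes of equal width.

    Assumes integer data; each class [lo, hi] includes both limits.
    """
    table = []
    for i in range(n_classes):
        lo = lower + i * width
        hi = lo + width - 1
        f = sum(1 for v in values if lo <= v <= hi)
        table.append((f"{lo}-{hi}", f))
    return table


# Hypothetical Likert responses (few distinct values -> ungrouped table)
likert = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(ungrouped_frequency(likert))
# [(1, 1), (2, 2), (3, 3), (4, 2), (5, 1)]

# Hypothetical test scores (many distinct values -> grouped table)
scores = [12, 35, 41, 47, 58, 63, 66, 72, 88, 95]
print(grouped_frequency(scores, lower=1, width=20, n_classes=5))
# [('1-20', 1), ('21-40', 1), ('41-60', 3), ('61-80', 3), ('81-100', 2)]
```

The frequencies in the grouped table always add up to the number of data values, which is a quick check that no value fell outside the classes.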
1. Frequency Distribution
Graphs of Frequency Distributions:
• Frequency Histograms – the most commonly used
• Stem and Leaf Plots
• Dot Plots
• Time Series
• Charts
• etc.
Frequency Table - SPSS
2. Measures of Central Tendency
Measure of central tendency
• A value that represents a typical, or central,
entry of a data set.
• Most common measures of central tendency:
– Mean
– Median
– Mode
Comparing the Mean, Median & Mode
• All three measures describe an “average”. Choose the one
that best represents a “typical” value in the set.
• Mean:
– The most familiar average.
– A reliable measure because it takes into account every
entry of a data set.
– May be greatly affected by outliers or skew.
• Median:
– A common average.
– Not as affected by skew or outliers.
• Mode: May be used if one value repeats overwhelmingly often.
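A quick way to see why the median is more robust than the mean is to add a single outlier to a small data set. The sketch below uses Python's standard `statistics` module on hypothetical values; with the outlier, the mean jumps from 35.00 to about 65.71 while the median and mode both stay at 35.

```python
import statistics

# Hypothetical data values
incomes = [30, 32, 35, 35, 38, 40]
with_outlier = incomes + [250]   # the same data plus one extreme value

for label, data in (("original", incomes), ("with outlier", with_outlier)):
    print(f"{label:>12}: mean = {statistics.mean(data):.2f}, "
          f"median = {statistics.median(data)}, "
          f"mode = {statistics.mode(data)}")
```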


Choosing the “Best Average”
• The shape of your data and the existence of
any outliers may help you choose the best
average:
3. Measures of Variation
• Another important characteristic of
quantitative data is how much the data varies,
or is spread out.
• The 3 most common methods of measuring
spread are:
1. Range
2. Inter-quartile range
3. Standard deviation and variance
The inter-quartile range (IQR)
The IQR in particular is used to describe the dispersion
of the data.

The inter-quartile range (IQR) is defined as the range between
the first and the third quartile (IQR = Q3 - Q1). Note that the
IQR covers the middle 50% of the data in the distribution.
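The quartiles, and therefore the IQR, depend on which quartile convention is used. The sketch below relies on Python's `statistics.quantiles`, whose `"exclusive"` method places the p-th quantile at position (n + 1)p; we believe this matches the weighted-average definition SPSS Explore uses by default, but treat that as an assumption and compare with your own output. The data set is hypothetical. It also shows that in a small sample the share of values between Q1 and Q3 is only roughly one half.

```python
import statistics


def quartiles_and_iqr(data, method="exclusive"):
    """Return (Q1, Q3, IQR) using statistics.quantiles.

    method="exclusive" places the p-th quantile at position (n + 1)p;
    method="inclusive" places it at (n - 1)p + 1. Results can differ.
    """
    q1, _median, q3 = statistics.quantiles(data, n=4, method=method)
    return q1, q3, q3 - q1


data = list(range(1, 12))        # hypothetical data: 1, 2, ..., 11
q1, q3, iqr = quartiles_and_iqr(data)
print(q1, q3, iqr)               # 3.0 9.0 6.0
print(quartiles_and_iqr(data, method="inclusive"))  # (3.5, 8.5, 5.0)

# Share of values lying between Q1 and Q3 (not exactly half in a small sample)
inside = sum(1 for x in data if q1 <= x <= q3)
print(f"{inside} of {len(data)} values lie in [Q1, Q3]")
```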
Median, Quartiles, Deciles &
Percentiles
• The Median is a value that subdivides the ordered data into
two halves.
• The Quartiles subdivide the data into quarters, the deciles
provide a subdivision into tenths, and the percentiles a
subdivision into hundredths.
• There are three quartiles: the lower quartile (Q1), the
median (Q2), and the upper quartile (Q3).
• The percentiles are simply called the 1st percentile, the 2nd
percentile and so on.
• The median is the 5th decile and the 50th percentile.
• A study of the values of the deciles or quartiles gives us an idea
of the spread of the data, but an ‘idea’ is all we get and there
is no need for great precision.
IN SUMMARY:
• MEDIAN
– (Data is divided into 2 parts)
• QUARTILE
– (Data is divided into 4 parts)
• DECILES
– (Data is divided into 10 parts)
• PERCENTILES
– (Data is divided into 100 parts)
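The summary above can be checked directly: cutting the same data into 2, 4, 10 or 100 parts gives the median, quartiles, deciles and percentiles, and the median coincides with Q2, the 5th decile and the 50th percentile. This is a minimal sketch with Python's `statistics.quantiles` on a hypothetical data set (the integers 1 to 99), chosen so every cut point is a whole number.

```python
import statistics

data = list(range(1, 100))   # hypothetical data: the integers 1..99

median = statistics.median(data)                # 1 cut point  -> 2 parts
quartiles = statistics.quantiles(data, n=4)     # 3 cut points -> 4 parts
deciles = statistics.quantiles(data, n=10)      # 9 cut points -> 10 parts
percentiles = statistics.quantiles(data, n=100) # 99 cut points -> 100 parts

print(quartiles)          # [25.0, 50.0, 75.0]
print(deciles[4])         # 5th decile: 50.0
print(percentiles[49])    # 50th percentile: 50.0
print(median)             # 50
```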
Interpreting Standard Deviation
• Standard deviation is a measure of the typical
amount an entry deviates from the mean.
• The more the entries are spread out, the
greater the standard deviation.
Standard Deviation: Key Points
• s ≥ 0 (When would s = 0?)
• The standard deviation is a measure of variation of all
values from the mean. The larger s is, the more the
data varies.
• The units of the standard deviation s are the same as
the units of the original data values. (The variance
has squared units, units².)
• The value of the standard deviation s can increase
dramatically with the inclusion of one or more
outliers (data values far away from all others).
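Two of these key points are easy to verify numerically with Python's `statistics.stdev` (the sample standard deviation, divisor n - 1). The data sets below are hypothetical: s is 0 when every value is identical, and adding one outlier raises s from about 2.14 to about 11.84.

```python
import statistics

identical = [7, 7, 7, 7]                 # hypothetical: every value the same
spread = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data
with_outlier = spread + [40]             # same data plus one outlier

print(statistics.stdev(identical))       # s = 0: no value deviates from the mean
print(round(statistics.stdev(spread), 2))
print(round(statistics.stdev(with_outlier), 2))
```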
Standard Deviation and “Spread”
How does “s” show how much the data varies?
Three methods:
1. Range Rule of Thumb
2. Chebyshev’s Theorem
3. The Empirical Rule
The Empirical Rule
For data sets having a bell-shaped (approximately normal) distribution:
• About 68% of all values fall within 1 standard
deviation of the mean
• About 95% of all values fall within 2 standard
deviations of the mean
• About 99.7% of all values fall within 3 standard
deviations of the mean
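The sketch below compares the Empirical Rule percentages with Chebyshev's Theorem, which guarantees at least 1 - 1/k² of the values within k standard deviations for any distribution (k > 1). It uses Python's `statistics.NormalDist` for the exact normal shares (roughly 0.683, 0.954 and 0.997) and a hypothetical helper, `share_within`, to count the actual share in a data set; the small data set is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def share_within(data, k):
    """Fraction of values within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)


# Exact shares under a normal curve vs Chebyshev's minimum for any distribution
z = NormalDist()   # standard normal: mean 0, SD 1
for k in (1, 2, 3):
    normal_share = z.cdf(k) - z.cdf(-k)
    chebyshev_min = 1 - 1 / k**2    # only informative for k > 1
    print(f"within {k} SD: normal {normal_share:.3f}, Chebyshev >= {chebyshev_min:.3f}")

print(share_within([1, 2, 3, 4, 5], 1))   # hypothetical data: 3 of 5 values -> 0.6
```

The Empirical Rule figures are much higher than the Chebyshev minimums because they assume a bell shape; Chebyshev's bound holds regardless of shape but is correspondingly conservative.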
The Empirical Rule
When to Use a Particular Descriptive Statistic
Summary
Video - Explanation
• Introduction to Descriptive Statistics (16 min)
https://www.youtube.com/watch?v=QoQbR4lVLrs
• Descriptive Statistics Part 1 (26 min)
https://www.youtube.com/watch?v=8Iklj-lf1fY
• Descriptive Statistics Part 2 (16 min)
https://www.youtube.com/watch?v=ZkEjYloGRIE
Descriptive Stat - SPSS
• Screening & cleaning data
• data analysis
– Frequency
– Descriptive stat
– Explore
– Crosstab
Data Screening & Cleaning
• Once you have entered your data, you need to
check for errors.
• For example, if you have a variable with a Likert
scale ranging from 1 – 5, all of your values should
be in this range. Are they?
• To run a frequency distribution,
click Analyze, Descriptive Statistics,
then Frequencies. Then click on the variable
name that you are checking and move it to
the Variable box.
Example:
• This variable asks about the respondent’s general
level of happiness; the data should include only the codes 0,
1, 2, 3, 8, and 9.
Solution
• You can sort your cases by either ascending or descending value. Click
Data, Sort Cases. Then click the name of the variable that you know has an
error (“happy”) and put it in the Sort By box. Since the suspicious values are at the high end
of the range, I have decided to sort by “descending”. Your screen should look
like this:
• Click OK. Make sure that you are still in the Data View tab (you don’t want to be
looking at the output). Your cases with errors are near the top of the list (the ‘10’
and the ‘4’).
• If this was your own database, you would look up the case and correct the error. If
you do not have the information necessary to identify the case with the error,
delete the value and SPSS will treat it as a missing value.
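The same screening logic can be sketched outside SPSS. The minimal Python example below flags every entry that is not one of the valid codes for “happy” and then blanks those entries so they are treated as missing. The helper names and the response values are hypothetical; `None` stands in for a missing value.

```python
VALID_HAPPY_CODES = {0, 1, 2, 3, 8, 9}   # allowed codes for "happy", per the example


def find_invalid(values, valid_codes):
    """Return (case_number, value) pairs for entries outside the valid codes.

    Case numbers are 1-based, like the row numbers in SPSS Data View.
    None stands for an already-missing value and is not flagged.
    """
    return [(case, v) for case, v in enumerate(values, start=1)
            if v is not None and v not in valid_codes]


def blank_invalid(values, valid_codes):
    """Replace invalid entries with None, i.e. treat them as missing."""
    return [v if v in valid_codes else None for v in values]


happy = [1, 2, 10, 3, 1, 4, 2, 9]        # hypothetical responses with two errors
print(find_invalid(happy, VALID_HAPPY_CODES))   # [(3, 10), (6, 4)]
print(blank_invalid(happy, VALID_HAPPY_CODES))  # [1, 2, None, 3, 1, None, 2, 9]
```

Listing the case numbers first, before blanking anything, mirrors the advice above: correct the value from the source if you can identify the case, and only treat it as missing if you cannot.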
SPSS-Frequency
SPSS-Descriptive Stat

Example: Reading Score


• 1-40 : low level
• 41-70 : medium level
• 71-100: high level
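In SPSS, bands like these are usually created with Transform > Recode into Different Variables. The logic can be sketched in Python as follows, assuming integer scores from 1 to 100 and treating every score from 71 to 100 as high; the function name and the scores are hypothetical.

```python
from collections import Counter


def reading_level(score):
    """Classify a reading score; assumes integer scores from 1 to 100."""
    if not 1 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score <= 40:
        return "low"
    if score <= 70:
        return "medium"
    return "high"   # assumption: every score from 71 to 100 is "high"


scores = [35, 52, 71, 88, 40, 66, 95]   # hypothetical scores
print(Counter(reading_level(s) for s in scores))
```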
SPSS-Crosstab
SPSS-Explore

• Valid, missing, total?


Descriptive Stat - SPSS
a. Valid – This refers to the non-missing cases. In this column,
the N is given, which is the number of non-missing cases; and
the Percent is given, which is the percent of non-missing
cases.

b. Missing – This refers to the missing cases. In this column,
the N is given, which is the number of missing cases; and the
Percent is given, which is the percent of the missing cases.

c. Total – This refers to the total number of cases, both non-missing
and missing. In this column, the N is given, which is
the total number of cases in the data set; and the Percent is
given, which is the total percent of cases in the data set.
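The Valid, Missing and Total rows of the Case Processing Summary are simple counts and percentages. A minimal Python sketch, with `None` marking a missing value and hypothetical reading scores:

```python
def case_summary(values):
    """Valid / Missing / Total counts and percentages; None marks a missing value."""
    total = len(values)
    missing = sum(1 for v in values if v is None)
    valid = total - missing

    def percent(n):
        return round(100 * n / total, 1)

    return {
        "Valid": (valid, percent(valid)),
        "Missing": (missing, percent(missing)),
        "Total": (total, percent(total)),
    }


reading = [55, None, 62, 47, None, 70, 81, 66]   # hypothetical, two missing
print(case_summary(reading))
# {'Valid': (6, 75.0), 'Missing': (2, 25.0), 'Total': (8, 100.0)}
```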
SPSS-Explore
Descriptive Stat - SPSS
a. Statistic – These are the descriptive statistics.
b. Std. Error – These are the standard errors for the descriptive statistics. The
standard error gives some idea about the variability possible in the statistic.
c. Mean – This is the arithmetic mean across the observations. It is the most
widely used measure of central tendency. It is commonly called the average.
The mean is sensitive to extremely large or small values.
d. 95% Confidence Interval for Mean Lower Bound – This is the lower (95%)
confidence limit for the mean. If we repeatedly drew samples of 200
students’ reading test scores and calculated a 95% confidence interval for
each sample, we would expect about 95% of those intervals to contain the
true population mean. This gives you some idea about the precision of the
estimate of the true population mean.
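The standard error of the mean is s / √n, and the confidence limits are the mean ± a critical value × the standard error. The sketch below computes both in Python. By default it uses a normal (z) critical value as a large-sample approximation; SPSS uses the t distribution with n - 1 degrees of freedom, whose critical value is slightly larger (about 1.97 for n = 200), so you can pass a t value from a table instead. The function name and data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(data, confidence=0.95, critical=None):
    """Return (lower, upper, standard_error) for the mean.

    If no critical value is supplied, a normal (z) value is used as a
    large-sample approximation to the t value SPSS would use.
    """
    n = len(data)
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean
    if critical is None:
        critical = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return m - critical * se, m + critical * se, se


scores = [4, 6, 8, 10, 12]   # hypothetical data, mean 8
# For n = 5, the two-sided 95% t critical value (df = 4) is about 2.776
print(mean_confidence_interval(scores, critical=2.776))
```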
Descriptive Stat - SPSS
e. 95% Confidence Interval for Mean Upper Bound – This is the upper
(95%) confidence limit for the mean.

f. 5% Trimmed Mean – This is the mean that would be obtained if the
lower and upper 5% of values of the variable were deleted. If the value
of the 5% trimmed mean is very different from the mean, this
indicates that there are some outliers. However, you cannot assume
that all outliers have been removed from the trimmed mean.

g. Median – This is the median. The median splits the distribution such
that half of all values are above this value, and half are below.
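A simplified 5% trimmed mean can be sketched in Python by sorting the data, dropping floor(0.05 × n) values from each end and averaging the rest. This is an approximation under stated assumptions: as we understand it, SPSS also gives partial weight to a fractional case at each end, so its figure may differ slightly. The function name and data are hypothetical; with one outlier the mean is 34.5 while the trimmed mean and the median are both 10.5.

```python
import math
import statistics


def trimmed_mean(data, proportion=0.05):
    """Mean after dropping floor(proportion * n) values from each end.

    Simplified sketch: no partial weighting of fractional cases.
    """
    values = sorted(data)
    k = math.floor(proportion * len(values))
    kept = values[k:len(values) - k]
    return statistics.mean(kept)


data = list(range(1, 20)) + [500]   # hypothetical: 1..19 plus one outlier (n = 20)
print(statistics.mean(data))        # pulled up by the outlier
print(trimmed_mean(data))           # one value dropped from each end
print(statistics.median(data))
```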
Descriptive Stat - SPSS
h. Variance – The variance is a measure of variability. It is the sum of the
squared distances of the data values from the mean divided by the variance
divisor. The Corrected SS is the sum of squared distances of the data values from
the mean. Therefore, the variance is the corrected SS divided by N - 1. We
don’t generally use variance as an index of spread because it is in squared
units. Instead, we use standard deviation.
i. Std. Deviation – Standard deviation is the square root of the variance. It
measures the spread of a set of observations. The larger the standard
deviation is, the more spread out the observations are.
j. Minimum – This is the minimum, or smallest, value of the variable.
k. Maximum – This is the maximum, or largest, value of the variable.
l. Range – The range is a measure of the spread of a variable. It is equal to the
difference between the largest and the smallest observations. It is easy to
compute and easy to understand. However, because it depends only on the two most
extreme values, it ignores how the other observations vary and is highly sensitive to outliers.
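The chain described in items h to l (corrected SS, then variance with divisor N - 1, then standard deviation, plus minimum, maximum and range) can be sketched in a few lines of Python. The two hypothetical data sets have the same range but different variances, which shows why the range alone says little about how the values in between vary.

```python
import math


def spread_summary(data):
    """Corrected SS, variance (divisor N - 1), SD, minimum, maximum and range."""
    n = len(data)
    mean = sum(data) / n
    corrected_ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
    variance = corrected_ss / (n - 1)
    return {
        "corrected_ss": corrected_ss,
        "variance": variance,
        "std_deviation": math.sqrt(variance),
        "minimum": min(data),
        "maximum": max(data),
        "range": max(data) - min(data),
    }


# Two hypothetical data sets with the same range but different spread
a = [1, 5, 5, 5, 9]
b = [1, 1, 5, 9, 9]
print(spread_summary(a))   # range 8, variance 8.0
print(spread_summary(b))   # range 8, variance 16.0
```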
Descriptive Stat - SPSS
m. Interquartile Range – The interquartile range is the difference
between the upper and the lower quartiles. It measures the spread of
a data set. It is robust to extreme observations.

n. Skewness – Skewness measures the degree and direction of
asymmetry. A symmetric distribution such as a normal distribution has
a skewness of 0, and a distribution that is skewed to the left (typically
with the mean less than the median) has a negative skewness.

o. Kurtosis – Kurtosis is a measure of the heaviness of the tails of a distribution.
SPSS reports excess kurtosis, so a normal distribution has kurtosis 0. Extremely
non-normal distributions may have high positive or negative kurtosis values, while
nearly normal distributions will have kurtosis values close to 0. Kurtosis
is positive if the tails are “heavier” than for a normal distribution and
negative if the tails are “lighter” than for a normal distribution.
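For readers who want to see where these numbers come from, the sketch below computes sample skewness and excess kurtosis from the central moments with the usual small-sample adjustments. To our understanding these are the same adjusted formulas used by SPSS and by Excel's SKEW and KURT functions, but verify against your own output. The function name and data are hypothetical; a symmetric set gives skewness 0, and a long right tail gives positive skewness.

```python
import math


def skewness_kurtosis(data):
    """Sample skewness and excess kurtosis with small-sample adjustments."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # central moments
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    g1 = m3 / m2 ** 1.5        # unadjusted skewness
    g2 = m4 / m2 ** 2 - 3      # unadjusted excess kurtosis (normal -> 0)
    skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return skewness, kurtosis


print(skewness_kurtosis([1, 2, 3, 4, 5]))    # symmetric: skewness 0, kurtosis about -1.2
print(skewness_kurtosis([1, 2, 3, 4, 10]))   # long right tail: positive skewness
```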
Descriptive Stat - SPSS
• Open the SPSS data set
(file name: Dataset.sav)

• Follow my steps.
Video – SPSS Research By Design

• An Introduction to the SPSS Workspace – SPSS for Beginners (2-4)
https://www.youtube.com/watch?v=8_4Z3iKzE8M&list=PLVI_iGT5ZuRmXlbuwMKi04R6Oe1G3De8G
• How to Create Variables in SPSS – SPSS for Beginners (2-5)
https://www.youtube.com/watch?v=27pOf3_Kq3s&list=PLVI_iGT5ZuRmXlbuwMKi04R6Oe1G3De8G&index=2
• Importing Excel Data into SPSS – SPSS for Beginners (2-6)
https://www.youtube.com/watch?v=itpAr1fpzcw&list=PLVI_iGT5ZuRmXlbuwMKi04R6Oe1G3De8G&index=3
• How to Use SPSS - An Introduction to SPSS for Beginners
https://www.youtube.com/watch?v=_zFBUfZEBWQ&list=PLVI_iGT5ZuRmXlbuwMKi04R6Oe1G3De8G&index=4
• Descriptive Statistics and Frequencies in SPSS – SPSS for Beginners
https://www.youtube.com/watch?v=bapuGcjwiLQ&list=PLVI_iGT5ZuRmXlbuwMKi04R6Oe1G3De8G&index=5
• Descriptive Statistics and z Scores in SPSS – SPSS for Beginners
https://www.youtube.com/watch?v=99fGYHGyO5U&list=PLVI_iGT5ZuRmXlbuwMKi04R6Oe1G3De8G&index=6
• How to Create Frequencies using SPSS – Statistics for Beginners
https://www.youtube.com/watch?v=5ehvfgauCWA&list=PLVI_iGT5ZuRmXlbuwMKi04R6Oe1G3De8G&index=12
• Computing Central Tendency in SPSS
https://www.youtube.com/watch?v=OgB8DSJuDSs&list=PLVI_iGT5ZuRmXlbuwMKi04R6Oe1G3De8G&index=13
• Computing Variability with SPSS – Standard Deviation, Variance, & Range
https://www.youtube.com/watch?v=fWzMnHVtjPc&list=PLVI_iGT5ZuRmXlbuwMKi04R6Oe1G3De8G&index=14
Reporting Descriptive Statistics (APA Styles)
• Report statistical results in American
Psychological Association (APA) style.
• APA style sets up some basic rules to make it
easier on the reader, ensuring consistency and
a common understanding.
1. Writing Arabic Numerals
2. Numbers Spelled Out as Words
3. Roman Numerals
4. Decimals
5. Rounding
6. Comma in Numbers
7. Descriptive Statistics
There are a lot of symbols that you might use to represent basic descriptive
statistics when writing up results. The most commonly used for descriptive
statistics are shown below.
Note that all of these should appear in italic font type.
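The decimal and rounding rules can be applied consistently with a small formatting helper. The sketch below is hypothetical (not part of any APA tool): it rounds a mean and standard deviation to two decimals, and drops the leading zero for statistics that cannot exceed 1, such as p or r, which APA style recommends. Plain text cannot show italics, so M and SD still need to be italicized in the final manuscript.

```python
def apa_mean_sd(mean, sd, decimals=2):
    """Format 'M = 3.46, SD = 1.20'; italicize M and SD in the manuscript."""
    return f"M = {mean:.{decimals}f}, SD = {sd:.{decimals}f}"


def apa_no_leading_zero(value, decimals=2):
    """Drop the leading zero for statistics that cannot exceed 1 (e.g. p, r)."""
    text = f"{value:.{decimals}f}"
    return text.replace("0.", ".", 1) if abs(value) < 1 else text


print(apa_mean_sd(3.456, 1.2))      # M = 3.46, SD = 1.20
print(apa_no_leading_zero(0.05))    # .05
print(apa_no_leading_zero(-0.456))  # -.46
```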
Common mistake
Example of reporting:
THANK YOU
Universiti Tun Hussein Onn Malaysia (UTHM)
86400 Parit Raja, Batu Pahat
Johor, Malaysia

Tel: +607-453 7000


Fax: +607-453 6337

http://www.facebook.com/uthmjohor

@uthmjohor

http://pinterest.com/uthmjohor
