DATA ANALYSIS (Quan, Qual, Mixed)
Statistics
Contents
• Introduction to Statistics
• Methods of Representing Data
• Measures of Central Tendency
• Measures of Variability or Spread
• Measures of Association/Correlation
• The Testing of Hypothesis
– Normal curve
– Z-test
– The t-test
– ANOVA
– Chi-square test
• Regression
– Linear
– Multiple
What to measure?
• To test hypotheses we need to measure variables. Variables are things that can
change or vary.
– Example: IQ, Behaviour, Location, Mood, Achievement, time, etc.
• Most hypotheses can be expressed in terms of two variables: a proposed cause and
a proposed outcome.
• Independent variable: A variable thought to be the cause of some effect. This
term is usually used in experimental research to denote a variable that the
experimenter has manipulated.
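• Dependent variable: A variable thought to be affected by changes in an
independent variable.
– Also called the outcome variable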
Introduction to Statistics
• Statistics involves:
– Observation,
– Collection of data,
– Organisation of data,
– Presentation of data,
– Analysis of data,
– Decision making.
Why study statistics?
• Statistical tests simply provide a tool for analysing the results
of any research.
• They are vital to the research process.
Types of Statistics
• There are two major components of the discipline of
Statistics:
– Descriptive statistics
– Inferential statistics
Descriptive statistics:
Methods of organising, presenting, and summarising data in a
convenient and informative way.
Example: mean, median, standard deviation, correlations,
percentages, etc.
Inferential statistics
• methods used to draw conclusions or inferences about
characteristics of populations based on sample data.
– Example: t-tests, ANOVA, Factor analysis, Regression analysis,
chi-square, etc.
Levels/scales of measurement
Scales: Categorical, Continuous
Statistics: Mode, Median, Mean, Max/min, Range, Standard deviation,
Variance, Skewness, Kurtosis, Crosstabs
Suitable graphs:
– Categorical: bar chart, pie chart
– Continuous: histogram, box plot, scatter plot
Methods of representing data
• Sequencing
• Tables
• Frequency distribution
• Graphs
• Measures of variation/spread/dispersion
– Example (sequencing):
If data consists of names, arrange in alphabetical order.
If they consist of objects, events, animals, etc., arrange according
to kinds, species, groups, etc.
Raw score
• 10, 15, 18, 12, 14, 15, 20, 15, 16, 11, 12,
14, 19, 20, 17, 18, 15, 13, 11, 12, 19, 13,
10, 14, 17, 19, 16, 15, 15, 15.
Frequency distribution
Score Frequency
10 2
11 2
12 3
13 2
14 3
15 7
16 2
17 2
18 2
19 3
20 2
Total 30
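The tally above can be reproduced in a few lines. This is a minimal sketch in Python using the raw scores listed earlier; the variable names are illustrative only.

```python
from collections import Counter

# Raw scores as listed above (n = 30).
raw_scores = [
    10, 15, 18, 12, 14, 15, 20, 15, 16, 11, 12,
    14, 19, 20, 17, 18, 15, 13, 11, 12, 19, 13,
    10, 14, 17, 19, 16, 15, 15, 15,
]

# Count how often each score occurs, then sort by score.
frequency = Counter(raw_scores)
frequency_table = sorted(frequency.items())

print("Score Frequency")
for score, count in frequency_table:
    print(f"{score:>5} {count:>9}")
print(f"Total {sum(frequency.values()):>9}")
```

The printed counts should match the frequency column above, and the total should equal the number of raw scores.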
Measures of central tendency (MCT)
• The central tendency of a distribution is an estimate of the centre
of a distribution of values.
• Measures of central tendency aim to quantify the “typical” or
“average” score in a data set.
• Three major types of estimates are distinguished:
– The Mode – in a distribution of data is simply the score that
occurs most frequently.
– The Median – of a distribution is the value that cuts the
distribution exactly in half (by definition the 50th percentile) –
median position = (n+1)/2.
– The Arithmetic Mean (M) – is the average, technically the sum
of all data scores divided by the number of scores (n) .
Sample Mean
• N = Population size
• n = sample size
• Mean: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
Scores: 2, 4, 5, 3, 2, 2, 4, 5, 1, 1, 1, 2, 3, 2, 3 (n =15 )
Sorted: 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5
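As a check on the three measures of central tendency, the sketch below computes the mode, median and mean for the scores above. It uses Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Scores from the example above.
scores = [2, 4, 5, 3, 2, 2, 4, 5, 1, 1, 1, 2, 3, 2, 3]

n = len(scores)
ordered = sorted(scores)

# Median position = (n + 1) / 2, counted from 1 in the sorted list.
median_position = (n + 1) / 2

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle value of the sorted scores
mean = sum(scores) / n              # sum of scores divided by n

print(f"n = {n}, median position = {median_position}")
print(f"mode = {mode}, median = {median}, mean = {mean:.2f}")
```

With n = 15 the median sits at position 8 of the sorted list, which is the value the `statistics.median` call returns.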
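Squared deviations of the scores 2, 4, 6, 8 from candidate centre values 3 to 7.
The sum is smallest (20) at the mean, 5.
Score (x−3)² (x−4)² (x−5)² (x−6)² (x−7)²
2 1 4 9 16 25
4 1 0 1 4 9
6 9 4 1 0 1
8 25 16 9 4 1
sum 36 24 20 24 36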
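The table can be read as squared deviations of the scores 2, 4, 6 and 8 from the candidate centre values 3 to 7. That reading is inferred from the numbers themselves. Under it, this short Python sketch recomputes the column sums and shows that the smallest sum occurs at the mean.

```python
# Scores and candidate centre values inferred from the table (an assumption).
scores = [2, 4, 6, 8]
candidates = [3, 4, 5, 6, 7]


def sum_of_squares(values, centre):
    """Sum of squared deviations of the values from a given centre."""
    return sum((x - centre) ** 2 for x in values)


# One sum per candidate centre, matching the "sum" row of the table.
ss_by_centre = {c: sum_of_squares(scores, c) for c in candidates}

# The centre with the smallest sum of squares.
best_centre = min(ss_by_centre, key=ss_by_centre.get)
mean = sum(scores) / len(scores)

print(ss_by_centre)       # {3: 36, 4: 24, 5: 20, 6: 24, 7: 36}
print(best_centre, mean)  # 5 5.0
```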
Measures of statistical variability
Maximum = 5
Minimum = 1
Range = 4
Although the range …
• gives an idea of how spread out the data are
– a larger range means the data are more spread out
• and the ranges of several samples can be compared to see which is most spread out
• BUT the range can be misled by extreme values (both examples have range = 10)
[Figure: two charts with horizontal axes running from 1 to 11, so both data sets have range = 10]
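The point about extreme values can be illustrated with two made-up data sets. The values below are hypothetical and are not the data behind the charts. They were chosen so that both sets have a range of 10 while one is far more tightly clustered.

```python
import statistics

# Hypothetical data sets (not taken from the slides).
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
clustered_with_extremes = [1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 11]


def value_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)


range_a = value_range(evenly_spread)
range_b = value_range(clustered_with_extremes)

# The sample standard deviation uses every score, not just the two extremes.
sd_a = statistics.stdev(evenly_spread)
sd_b = statistics.stdev(clustered_with_extremes)

print(f"ranges: {range_a} and {range_b}")
print(f"standard deviations: {sd_a:.2f} and {sd_b:.2f}")
```

The ranges are identical, but the standard deviation of the clustered set is clearly smaller. This is why the range alone can mislead.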
Statistical variability: Sum of Squares
Raw score:
1, 2, 2, 3, 4.
Mean = (1+2+2+3+4)/5 = 2.4
Statistical variability: Variance
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
$S = \sqrt{S^2}$
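To connect the sum of squares to the variance, here is a minimal Python sketch. It uses the raw scores 1, 2, 2, 3, 4 from the sum-of-squares example and assumes the sample formula with n − 1 in the denominator. The last two lines cross-check the hand calculation against the standard `statistics` module.

```python
import math
import statistics

# Raw scores from the sum-of-squares example.
raw_scores = [1, 2, 2, 3, 4]

n = len(raw_scores)
mean = sum(raw_scores) / n

# Sum of squares: squared deviations from the mean, added up.
sum_of_squares = sum((x - mean) ** 2 for x in raw_scores)

# Sample variance divides by n - 1; the standard deviation is its square root.
variance = sum_of_squares / (n - 1)
std_dev = math.sqrt(variance)

print(f"mean = {mean:.1f}, SS = {sum_of_squares:.1f}")
print(f"variance = {variance:.2f}, SD = {std_dev:.2f}")

# Cross-check against the library's sample variance and standard deviation.
print(statistics.variance(raw_scores), statistics.stdev(raw_scores))
```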
[Figure: platykurtic and leptokurtic distributions]
What are we ideally looking for?
• If the mean represents the data well then most of the scores
will cluster close to the mean and the resulting standard
deviation is small relative to the mean.
Is my sample representative of the population?
SE = SD/√n
Standard error
• What is the difference between the std. error and the
std. deviation?
– Std. error is the standard deviation of the sampling
distribution of the sample mean.
– SD tells the researcher how spread out the
responses are -- are they concentrated around the
mean, or scattered far & wide? Did all of your
respondents rate your product in the middle of your
scale, or did some love it and some hate it?
95% Confidence Intervals
• We use the mean and standard deviation to create
confidence intervals for the population mean.
• Can use the Z-distribution if the population standard
deviation is known (or the sample is large), or the
t-distribution if it is estimated from a small sample of
approximately normally distributed data
• We will use t distribution (with n-1=14 df)
• CI for μ looks like:
$\bar{X} \pm t_{df,\,\alpha/2} \frac{S}{\sqrt{n}}$
$90.67 \pm 2.145 \frac{39.10}{\sqrt{15}}$ ==> $90.67 \pm 2.145\,(10.10)$
$90.67 \pm 21.66$ ==> (69.01, 112.33)
95% CI for μ, for PPVT (n=15)
• Based on our sample results, we are 95% confident
that the true average PPVT score for the population
from which the sample was drawn, is between 69.01
and 112.33.
• Recall that the expression $S/\sqrt{n}$ is referred to as the
standard error of the mean
• Tells us about precision, how close our estimates may
be to the true population value
$S_{\bar{X}} = 10.10$
• It’s somewhat large here because the sample is very
small (n=15)
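The PPVT interval can be recomputed from the figures quoted above (mean 90.67, S = 39.10, n = 15). In this sketch the critical value 2.145 is copied from the slides rather than looked up by the code. The variable names are illustrative.

```python
import math

# Summary figures quoted in the PPVT example.
sample_mean = 90.67
sample_sd = 39.10
n = 15

# Critical t value for 95% confidence with n - 1 = 14 df, as given in the slides.
t_critical = 2.145

# Standard error of the mean: S / sqrt(n).
standard_error = sample_sd / math.sqrt(n)

# Margin of error and the two interval limits.
margin = t_critical * standard_error
ci_lower = sample_mean - margin
ci_upper = sample_mean + margin

print(f"SE = {standard_error:.2f}")
print(f"95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

The limits may differ from the worked example in the last decimal place. The slides round the standard error to 10.10 before multiplying, whereas the code keeps full precision.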
Inferential statistics
Population and samples revisited
• Statistical inference is the process by which we
acquire information about populations from
samples.
[Figure: a random sample drawn from a population; the sample yields a statistic, the population has a parameter]
• …
Symbolic notation for some sample and
population measures
Statistical measure   Sample statistic   Population parameter   Data type
Size                  n                  N                      Qualitative/Quantitative
Mean                  $\bar{x}$          $\mu$                  Quantitative
Variance              $s^2$              $\sigma^2$             Quantitative
Standard deviation    $s$                $\sigma$               Quantitative
Proportion            $p$                $\pi$                  Qualitative
Inferential Statistics
With inferential statistics we are trying to reach conclusions
that extend beyond the immediate data alone; whereas with
descriptive statistics we simply describe what’s going on in
our data.
Two methods of inferential statistics are distinguished:
1. To infer from the sample data to population - the estimation
of parameter(s)
2. To judge whether an observed difference/relationship is a
dependable difference/relationship (systematic) or one that
might have happened by chance – the testing of statistical
hypotheses
Statistical Hypothesis Testing
In the second step the test statistic is calculated from the data.
There are different test statistics and which you choose depends on
various factors:
Type of investigation – differences or relationships
Sample type – independent versus dependent
Number of samples – one, two or more samples
Level of measurement – ratio, interval, ordinal and nominal
Distribution of Data – parametric versus non-parametric tests
Amount of Data (Sample size)
Website:
http://www.gardenersown.co.uk/About/Mark/Choosestats.html
Common Test Statistics: Differences
Goal: Predict value from one other measured variable
– Interval + Normal distribution: Simple linear regression or nonlinear regression
– Ordinal + Non-normal: Nonparametric regression
– Nominal (two possible outcomes): Simple logistic regression
Example 1: A doctor diagnoses a patient with
cancer
• Case I:
– Null hypothesis: the patient does not have cancer (true in
this case).
– Research hypothesis: the patient has cancer.
• Researcher conclusion: the patient has cancer (wrong decision).
By rejecting a true null hypothesis, the researcher has committed a Type I error.
• Case II:
– Null hypothesis: the patient does not have cancer (false in
this case).
– Research hypothesis: the patient has cancer (true in this case).
• Researcher conclusion: the patient does not have cancer (wrong
decision). By failing to reject a false null hypothesis, the
researcher has committed a Type II error.
Example 2: court case
• Case 1:
– Null hypothesis: defendant is not guilty.
– Alternative hypothesis: defendant is guilty.
– Researcher conclusion: the defendant is guilty (wrong
decision). The researcher has committed a
Type I error: convicting an innocent person.
• Case 2:
– Researcher conclusion: the defendant is not guilty (wrong
decision, because the null hypothesis is false). Accepting a null
hypothesis that is not true is a Type II error: setting
a guilty person free.
Hypothesis Truth Table
                      NULL HYPOTHESIS
DECISION    TRUE                FALSE
ACCEPT      CORRECT DECISION    TYPE II ERROR
REJECT      TYPE I ERROR        CORRECT DECISION
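The four cells of the truth table can be written as a small lookup. The function below is a hypothetical helper for illustration, not part of any statistics library.

```python
def classify_decision(null_is_true: bool, reject_null: bool) -> str:
    """Return the truth-table cell for a decision about the null hypothesis."""
    if null_is_true and reject_null:
        # Rejecting a true null hypothesis (e.g. convicting an innocent person).
        return "Type I error"
    if not null_is_true and not reject_null:
        # Accepting a false null hypothesis (e.g. setting a guilty person free).
        return "Type II error"
    return "Correct decision"


# Walk through all four combinations of truth and decision.
for null_is_true in (True, False):
    for reject_null in (False, True):
        decision = "reject" if reject_null else "accept"
        outcome = classify_decision(null_is_true, reject_null)
        print(f"null true={null_is_true}, {decision}: {outcome}")
```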
Statistical significance
• Calculated value
• Critical value: found in tables or stored in
computer’s memory
• In general, if the calculated value of the
statistic (t, F, etc.) is relatively large, the
probability or p is small, (e.g., .05, .01, .001).
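The relationship between a large calculated value and a small p can be seen with a z statistic. The sketch below assumes a two-sided test against the standard normal distribution. The slides mention t and F, which use other distributions, so this is one illustrative case only. `NormalDist` is from Python's standard `statistics` module.

```python
from statistics import NormalDist


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic under the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative z values, from small to large.
z_values = [0.5, 1.0, 1.96, 2.58, 3.29]
p_values = [two_sided_p_from_z(z) for z in z_values]

for z, p in zip(z_values, p_values):
    print(f"z = {z:.2f}  ->  p = {p:.4f}")
```

As z grows, p shrinks. A z of about 1.96 corresponds to p of about .05, the conventional cut-off.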
Cont…
0.50 P = 0.5
.06 P = 0.06
.000 P < .001
Mixed methods research
• Mixed methods research takes advantage of using multiple ways to
explore a research problem.
• Basic Characteristics
• Design can be based on either or both perspectives.
• Research problems can become research questions and/or
hypotheses based on prior literature, knowledge, experience, or
the research process.
• Sample sizes vary based on methods used.
• Data collection can involve any technique available to researchers.
• Interpretation is continual and can influence stages in the research
process.
Why Use Mixed Methods?
Summary of findings
• The summary part usually includes a brief
restatement of the problem/s, the main features of
the methods and the most important findings.
• Upon completing the draft of this section, the
writer should check it carefully to determine
whether it gives a concise but reasonably
complete description of the study & its findings.
• S/He should also check to ascertain that no
information has been introduced here that
had not been included in the appropriate preceding
sections.
• It is a good idea to have a colleague read the
conclusions section to see if the author is
communicating as well as intended.
• With respect to each finding, you are asking yourself:
knowing what I now know, what conclusion can I draw?
• Research findings are typically defined as the
researchers’ interpretations of the data they collected
or generated in the course of their studies.
Conclusions