
STATISTICAL TREATMENT OF DATA

BASICS OF STATISTICS
DEFINITION: SCIENCE OF COLLECTION, PRESENTATION, ANALYSIS, AND REASONABLE
INTERPRETATION OF DATA.

Statistics provides a rigorous scientific method for gaining insight into data. For example, suppose we measure the weight of 100 patients in a study. With so many measurements, simply looking at the data fails to provide an informative account.

Statistics can give an instant overall picture of the data through graphical presentation or numerical summarization, irrespective of the number of data points.

Besides data summarization, another important task of statistics is to make inferences and predict relations between variables.
DISCRETE VS. CONTINUOUS DATA

Discrete data (countable) is information that can only take certain values. These values don't have to be whole numbers, but they are fixed values, such as shoe size, number of teeth, number of kids, etc.

Discrete data consists of discrete variables that are finite, numeric, and countable, often expressed as non-negative integers (5, 10, 15, and so on).

Continuous data (measurable) is data that can take any value. Height, weight, temperature, and length are all examples of continuous data.

Continuous data can change over time and take different values at different time points, such as a person's weight.
STATISTICAL TREATMENT

Common statistical treatments of data include the mean, standard deviation, percentages, correlation, the t-test, and ANOVA.


MEAN

The mean is essentially a model of your data set: the single value that best represents all of the data. That is, it is the value that produces the lowest amount of (squared) error from all other values in the data set. An important property of the mean is that it includes every value in your data set as part of the calculation.

The mean represents the average value of the dataset. It is calculated as the sum of all the values in the dataset divided by the number of values; in this form it is known as the arithmetic mean.
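
To make the calculation concrete, here is a minimal Python sketch (not from the original slides); the six weights are invented purely for illustration:

    # Minimal sketch: arithmetic mean as sum of values over count of values.
    # The weights below are hypothetical, for illustration only.
    from statistics import mean

    weights_kg = [61.2, 58.4, 70.1, 65.8, 72.3, 59.9]

    arithmetic_mean = sum(weights_kg) / len(weights_kg)
    print(f"mean weight = {arithmetic_mean:.2f} kg")        # 64.62
    print(f"statistics.mean = {mean(weights_kg):.2f} kg")   # same value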
STANDARD DEVIATION

A standard deviation (σ) is a measure of how dispersed the data are in relation to the mean. A low standard deviation means the data are clustered around the mean, while a high standard deviation indicates the data are more spread out. A standard deviation close to zero indicates that data points lie close to the mean, whereas a large standard deviation indicates that data points are spread far above and below the mean.
EXAMPLE

You administer a memory recall test to 20 students. The data follow a normal distribution with a mean score of 50 and a standard deviation of 10.
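
A hedged Python sketch of this example follows; the 20 scores are hypothetical values chosen only to land near the stated mean of 50 and standard deviation of 10, and SciPy (assumed installed) supplies the normal-model calculation:

    # Sample statistics for 20 hypothetical memory-test scores.
    from statistics import mean, stdev
    from scipy.stats import norm

    scores = [33, 45, 52, 66, 49, 58, 42, 61, 37, 55,
              50, 68, 40, 59, 47, 53, 34, 64, 44, 52]
    print(f"sample mean = {mean(scores):.1f}, sample SD = {stdev(scores):.1f}")

    # Under the stated model N(50, 10), about 68% of scores fall within
    # one standard deviation of the mean (between 40 and 60).
    within_one_sd = norm.cdf(60, loc=50, scale=10) - norm.cdf(40, loc=50, scale=10)
    print(f"P(40 < score < 60) = {within_one_sd:.4f}")      # about 0.6827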
CORRELATION

The sign of a correlation tells you the direction in which the variables move, and the coefficient itself always lies between −1 and +1.

A positive correlation means that both variables move in the same direction: as one goes up, the other goes up, or vice versa.

A negative correlation means that the two variables move in opposite directions: as one goes up, the other goes down.
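
As a rough illustration (assuming SciPy is available), a Pearson correlation can be computed as below; the revision-hours and exam-score pairs are fabricated:

    # Pearson correlation on fabricated data: hours of revision vs. exam score.
    from scipy.stats import pearsonr

    revision_hours = [2, 4, 5, 7, 8, 10, 11, 13]
    exam_scores    = [52, 55, 60, 64, 66, 73, 75, 80]

    r, p_value = pearsonr(revision_hours, exam_scores)
    print(f"r = {r:.2f}, p = {p_value:.4f}")   # r near +1: same direction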
ASSUMPTIONS (PEARSON CORRELATION)

Assumption #1: Your two variables should be measured at the interval or ratio level (i.e., they are continuous). Examples of variables that meet this criterion include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth.

Assumption #2: There is a linear relationship between your two variables. Whilst there are a number of ways to check whether a linear relationship exists between your two variables, we suggest creating a scatterplot using SPSS Statistics, where you can plot the one variable against the other and then visually inspect the scatterplot to check for linearity.
You interpret a scatterplot by looking for trends in the data as you go from left to right: if the data show an uphill pattern as you move from left to right, this indicates a positive relationship between X and Y. As the X-values increase (move right), the Y-values tend to increase (move up). A sketch of this visual check appears below.
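
The sketch uses matplotlib in place of SPSS Statistics, reusing the fabricated data from the correlation example above:

    # Scatterplot check for linearity (Assumption #2).
    import matplotlib.pyplot as plt

    revision_hours = [2, 4, 5, 7, 8, 10, 11, 13]
    exam_scores    = [52, 55, 60, 64, 66, 73, 75, 80]

    plt.scatter(revision_hours, exam_scores)
    plt.xlabel("Revision time (hours)")
    plt.ylabel("Exam performance (0-100)")
    plt.title("Uphill left-to-right pattern: positive relationship")
    plt.show()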
Assumption #3: There should be no significant outliers. Outliers are single data points within your data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual and may even put her in the top 1% of IQ scores globally). Scatterplots of your data can highlight the potential impact of such outliers.

Assumption #4: Your variables should be approximately normally distributed. In order to assess the statistical significance of the Pearson correlation, you need to have bivariate normality, but this assumption is difficult to assess, so a simpler method is more commonly used: determining the normality of each variable separately. To test for normality you can use the Shapiro-Wilk test, which is easily run in SPSS Statistics. In addition to showing you how to do this, our enhanced Pearson's correlation guide also explains what you can do if your data fails this assumption.
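
A hedged sketch of the per-variable normality check, using SciPy's Shapiro-Wilk test in place of SPSS Statistics; the scores are invented:

    # Shapiro-Wilk normality test on one variable (Assumption #4).
    from scipy.stats import shapiro

    exam_scores = [52, 55, 60, 64, 66, 73, 75, 80, 58, 69, 62, 71]

    w_stat, p_value = shapiro(exam_scores)
    # A p-value above .05 is commonly read as "no evidence against normality".
    print(f"W = {w_stat:.3f}, p = {p_value:.3f}")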
CORRELATION STRENGTH

Strong correlation: the coefficient value lies between ±0.50 and ±1.

Moderate degree: if the value lies between ±0.30 and ±0.49, it is said to be a medium correlation.

Low degree: when the value lies below ±0.29, it is said to be a small correlation.
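
These rules of thumb are easy to encode; the helper below is a hypothetical illustration, not part of the original material:

    # Classify |r| using the thresholds above (hypothetical helper).
    def correlation_strength(r: float) -> str:
        magnitude = abs(r)
        if magnitude >= 0.50:
            return "strong"
        if magnitude >= 0.30:
            return "moderate"
        return "low"

    print(correlation_strength(0.49))    # moderate
    print(correlation_strength(-0.72))   # strong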
T-TEST ASSUMPTIONS
Assumption #1: Your dependent variable should be measured on a continuous scale (i.e., it
is measured at the interval or ratio level). Examples of variables that meet this criterion
include revision time (measured in hours), intelligence (measured using IQ score), exam
performance (measured from 0 to 100), weight (measured in kg), and so forth. You can learn
more about continuous variables in our article: Types of Variable.
Assumption #2: Your independent variable should consist of two categorical, independent
groups. Example independent variables that meet this criterion include gender (2 groups:
male or female), employment status (2 groups: employed or unemployed), smoker (2
groups: yes or no), and so forth.
Assumption #3: You should have independence of observations, which means that there is
no relationship between the observations in each group or between the groups themselves.
For example, there must be different participants in each group with no participant being in
more than one group. This is more of a study design issue than something you can test for,
but it is an important assumption of the independent t-test. If your study fails this
assumption, you will need to use another statistical test instead of the independent t-test
(e.g., a paired-samples t-test). If you are unsure whether your study meets this assumption,
you can use our Statistical Test Selector, which is part of our enhanced content.
T-TEST ASSUMPTIONS (CONTINUED)
Assumption #4: There should be no significant outliers. Outliers are simply single data points within your data that
do not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a
small variation between students, one student had a score of 156, which is very unusual, and may even put her in the
top 1% of IQ scores globally). The problem with outliers is that they can have a negative effect on the independent t-
test, reducing the validity of your results. Fortunately, when using SPSS Statistics to run an independent t-test on
your data, you can easily detect possible outliers. In our enhanced independent t-test guide, we: (a) show you how to
detect outliers using SPSS Statistics; and (b) discuss some of the options you have in order to deal with outliers.
Assumption #5: Your dependent variable should be approximately normally distributed for each group of the
independent variable. We talk about the independent t-test only requiring approximately normal data because it is
quite "robust" to violations of normality, meaning that this assumption can be a little violated and still provide valid
results. You can test for normality using the Shapiro-Wilk test of normality, which is easily tested for using SPSS
Statistics.
Assumption #6: There needs to be homogeneity of variances. You can test this assumption in SPSS Statistics using
Levene’s test for homogeneity of variances. In our enhanced independent t-test guide, we (a) show you how to
perform Levene’s test for homogeneity of variances in SPSS Statistics, (b) explain some of the things you will need
to consider when interpreting your data, and (c) present possible ways to continue with your analysis if your data
fails to meet this assumption.
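
Outside SPSS Statistics, the same checks and test can be sketched with SciPy (assumed installed); both groups' scores below are fabricated:

    # Levene's test for homogeneity of variances, then an independent t-test.
    from scipy.stats import levene, ttest_ind

    group_a = [72, 68, 75, 70, 66, 74, 71, 69]   # e.g., group 1 scores
    group_b = [64, 61, 67, 63, 60, 66, 62, 65]   # e.g., group 2 scores

    lev_stat, lev_p = levene(group_a, group_b)   # p > .05: variances look equal

    # If homogeneity fails, Welch's t-test (equal_var=False) is a common fallback.
    t_stat, p_value = ttest_ind(group_a, group_b, equal_var=(lev_p > 0.05))
    print(f"Levene p = {lev_p:.3f}; t = {t_stat:.2f}, p = {p_value:.4f}")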
REPORTING OF RESULTS

Percentages are most clearly displayed in parentheses, with no decimal places: Nearly half (49%) of the sample was married.

Mean and standard deviation are most clearly presented in parentheses: The sample as a whole was relatively young (M = 19.22, SD = 3.45).

Correlations are reported with the degrees of freedom (which is N − 2) in parentheses and the significance level: The two variables were strongly correlated, r(55) = .49, p < .01.
T-tests are reported like chi-squares, but only the degrees of freedom are in parentheses. Following that, report the t statistic (rounded to two decimal places) and the significance level: There was a significant effect for gender, t(54) = 5.43, p < .001, with men receiving higher scores than women.

ANOVAs (both one-way and two-way) are reported like the t-test, but there are two degrees-of-freedom numbers to report. First report the between-groups degrees of freedom, then the within-groups degrees of freedom (separated by a comma). After that, report the F statistic (rounded to two decimal places) and the significance level: There was a significant main effect for treatment, F(1, 145) = 5.43, p < .01, and a significant interaction, F(2, 145) = 3.13, p < .05.
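
For reference, a one-way ANOVA producing the F statistic and its two degrees-of-freedom numbers can be sketched with SciPy; the three treatment groups are fabricated:

    # One-way ANOVA across three fabricated treatment groups.
    from scipy.stats import f_oneway

    treatment_1 = [23, 25, 21, 27, 24]
    treatment_2 = [30, 28, 32, 29, 31]
    treatment_3 = [26, 24, 27, 25, 28]

    f_stat, p_value = f_oneway(treatment_1, treatment_2, treatment_3)

    # Reported as F(between df, within df) = F(k-1, N-k) = F(2, 12) here.
    print(f"F(2, 12) = {f_stat:.2f}, p = {p_value:.4f}")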
