[Figure: Population, described by parameters (μ, σ); Sample, described by statistics (mean, sd)]
• Population – refers to the complete set of all
the observations, elements, or objects under
consideration.
• Sample – refers to a representative portion, or subset, of the population.
• Parameter – refers to a numerical description of a population.
• Statistic – refers to a numerical description of a sample.
• In statistics, data are facts or figures recorded as the values of a variable.
• Variable – refers to any characteristic that varies, that is, one that can take different values.
• Independent variable – a variable that stands on its own and is not affected by the other variables under study.
• Dependent variable – a variable whose values depend on another variable (the independent variable).
• Extraneous variable – a variable that
influences other variables but is not under
consideration.
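To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The list `population` is made-up illustrative data, and the "sample" is simply its first five values. The point is only that μ and σ are computed from every element, while the mean and sd of a sample are computed from a subset (with n − 1 in the sd denominator, as Python's `statistics.stdev` does).

```python
import statistics

# Hypothetical data: the complete set of observations under consideration.
population = [12, 15, 11, 18, 14, 16, 13, 17, 15, 19]

# Parameters describe the whole population.
mu = statistics.mean(population)        # population mean (mu)
sigma = statistics.pstdev(population)   # population sd (sigma), divides by N

# Statistics describe a sample (here, an arbitrary subset of 5 values).
sample = population[:5]
x_bar = statistics.mean(sample)         # sample mean
s = statistics.stdev(sample)            # sample sd, divides by n - 1

print(f"Parameters: mu = {mu}, sigma = {sigma:.3f}")
print(f"Statistics: mean = {x_bar}, sd = {s:.3f}")
```

In practice the population values are usually unknown, which is exactly why sample statistics are used to estimate the parameters.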
Level of Measurement
• Ratio level (numerical)
1. The numbers in the data are used to classify a person/object into distinct, non-overlapping, and exhaustive categories.
2. The data are arranged into categories according to magnitude.
3. The data have a fixed unit of measure representing a set size throughout the scale.
4. The data have absolute zero.
Ex: temperature in Kelvin, daily allowance
• Interval level (numerical)
1. The numbers in the data are used to classify a person/object into distinct, non-overlapping, and exhaustive categories.
2. The data are arranged into categories according to magnitude.
3. The data have a fixed unit of measure representing a set size throughout the scale.
Ex: temperature in Celsius, IQ scores
• Ordinal level (categorical, rankable)
1. The numbers in the data are used to classify a person/object into distinct, non-overlapping, and exhaustive categories.
2. The data are arranged into categories according to magnitude.
Ex: shirt size, academic rank
• Nominal level (categorical)
1. The numbers in the data are used to classify a person/object into distinct, non-overlapping, and exhaustive categories.
Ex: sex, nationality
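A short worked example shows why the absolute zero separates the ratio level from the interval level. A change from 10 °C to 20 °C is a 10-degree difference on both the Celsius and Kelvin scales, but 20 °C is not "twice as hot" as 10 °C; only a ratio computed in Kelvin, which has an absolute zero, is meaningful. The two temperatures are illustrative values.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to Kelvin (a scale with absolute zero)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# Interval property: differences are meaningful and agree in both units.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratio property: only Kelvin has an absolute zero, so only its ratio is meaningful.
ratio_c = warm_c / cool_c                                         # 2.0, but NOT "twice as hot"
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # close to 1.035

print("Difference (C, K):", diff_c, round(diff_k, 2))
print("Ratio (C, K):", ratio_c, round(ratio_k, 3))
```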
Exercises
• 1. Postal zip code
• 2. Performance rating as O, VS, S, MS, NI
• 3. Student number
• 4. Body temperature in Celsius
• 5. Ranking in class (1st honor, etc.)
• 6. Annual salary
• 7. TIN
Data Collection
• 1. Interview or direct method
• 2. Questionnaire or indirect method
• 3. Registration method
• 4. Observation
– Participant observation
– Non-participant observation
• 5. Experiment
Sampling – the process of selecting sample units from the population.
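The text does not name a particular sampling technique. As one common example, simple random sampling without replacement can be sketched with Python's standard `random` module. The sampling frame of 500 student ID numbers and the sample size of 30 are assumptions for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers of 500 students.
population_ids = list(range(1, 501))

rng = random.Random(42)  # fixed seed so the same draw can be reproduced
# Simple random sampling without replacement: every student is equally likely
# to be chosen, and no student can be chosen twice.
sample_ids = rng.sample(population_ids, k=30)

print(sorted(sample_ids))
```

Fixing the seed is a design choice for reproducibility; for a real study the seed (or the random source) should be documented alongside the sample.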
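• Sample Skewness
For data x_1, x_2, …, x_n with mean x̄, the sample skewness is
$$g_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3 / n}{s^3}$$
where s is the standard deviation computed with n, rather than n − 1, in the denominator.
• This formula for skewness is referred to as the Fisher-Pearson coefficient of skewness.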
• Skewness for a normal distribution is zero, and any
symmetric data should have a skewness near zero.
Negative values for the skewness indicate data that
are skewed left and positive values for the skewness
indicate data that are skewed right. By skewed left,
we mean that the left tail is long relative to the right
tail. Similarly, skewed right means that the right tail
is long relative to the left tail. If the data are multimodal, this may affect the sign of the skewness.
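A minimal pure-Python sketch of the Fisher-Pearson coefficient g₁, with the standard deviation computed using n in the denominator. The three small data sets are made up to show the sign convention: zero for symmetric data, positive for a long right tail, negative for a long left tail. (SciPy's `scipy.stats.skew`, with its default `bias=True`, should compute the same quantity if SciPy is available.)

```python
def fisher_pearson_skewness(data):
    """g1 = [sum((x - mean)^3) / n] / s^3, with s computed using n (not n - 1)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (s^2 with n)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5                        # m2 ** 1.5 equals s^3

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 3, 3, 10]   # one large value stretches the right tail
left_tail = [-x for x in right_tail]  # mirror image: long left tail

print(fisher_pearson_skewness(symmetric))       # 0.0
print(fisher_pearson_skewness(right_tail) > 0)  # True: skewed right
print(fisher_pearson_skewness(left_tail) < 0)   # True: skewed left
```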
Kurtosis
• Kurtosis is a measure of the thickness or heaviness of the tails of a random variable's distribution. In other words, it measures the "tailedness" of the distribution, which makes it a common measure of shape. Outliers in the data have a strong effect on this measure. Kurtosis has no unit.
• A distribution with kurtosis equal to 3 is called mesokurtic. A random variable that follows a normal distribution has kurtosis 3.
• If the kurtosis is less than 3, the distribution is called platykurtic. Such a distribution has shorter and thinner tails than the normal distribution, and its peak is lower and broader.
• If the kurtosis is greater than 3, the distribution is called leptokurtic. Such a distribution has longer and fatter tails than the normal distribution, and its peak is higher and sharper.
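The three categories can be illustrated by computing the moment kurtosis m₄ / m₂² (n-denominator moments, so a normal distribution gives 3) and comparing it with 3. This is a sketch on two made-up data sets. Note that many packages report *excess* kurtosis, kurtosis − 3, instead; for example, `scipy.stats.kurtosis` does so by default (`fisher=True`), so a normal distribution gives 0 there. With real data an exact value of 3 is rare, so "mesokurtic" in practice means "close to 3".

```python
def moment_kurtosis(data):
    """Kurtosis m4 / m2**2 using n-denominator moments; a normal distribution gives 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

def kurtosis_type(k):
    """Label a kurtosis value relative to the normal benchmark of 3."""
    if k < 3:
        return "platykurtic"
    if k > 3:
        return "leptokurtic"
    return "mesokurtic"

flat = list(range(1, 11))                  # evenly spread values, short tails
heavy = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # values piled at the center plus two extremes

for name, data in [("flat", flat), ("heavy", heavy)]:
    k = moment_kurtosis(data)
    print(name, round(k, 3), kurtosis_type(k), "excess:", round(k - 3, 3))
```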
Hypothesis Testing
• Hypothesis testing refers to the formal procedures used by
statisticians to accept or reject statistical hypotheses.
• A hypothesis is an educated guess about something in the
world around you. It should be testable, either by experiment
or observation.
• Null hypothesis. The null hypothesis, denoted by H0, is usually
the hypothesis that sample observations result purely from
chance.
• Null hypothesis – a statement of no significance.
• Alternative hypothesis. The alternative hypothesis, denoted
by H1 or Ha, is the hypothesis that sample observations are
influenced by some non-random cause.
• Alternative hypothesis – a statement of significance.
• In everyday language, "significance" refers to something that is extremely useful and important. In statistics, however, "significance" means "not by chance" or "probably true".
• The level of significance of a statistical hypothesis test is the fixed probability of wrongly rejecting the null hypothesis when it is in fact true. The significance level is therefore the probability of a Type I error, and it is set in advance by the researcher with the consequences of that error in mind.
• A confidence level refers to the percentage of all possible samples whose confidence intervals can be expected to include the true population parameter.
• Type I errors happen when we reject a true
null hypothesis.
• Type II errors happen when we fail to reject a
false null hypothesis.
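The definitions of significance level, confidence level, and Type I error can be checked with a small simulation. The sketch below assumes a normal population with known σ and uses a two-sided z-test; the population values (μ₀ = 100, σ = 15, n = 25) are hypothetical. Because H0 is true in every simulated sample, every rejection is a Type I error, so the rejection rate should come out close to α. The 95% interval x̄ ± z·σ/√n contains μ₀ exactly when the test does not reject, so the share of intervals containing μ₀ is the complement, close to the 95% confidence level.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed for reproducibility
alpha = 0.05
z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96

# Hypothetical population in which H0: mu = 100 is TRUE.
mu0, sigma, n = 100.0, 15.0, 25
se = sigma / n ** 0.5  # standard error of the sample mean (sigma assumed known)

trials = 20_000
rejections = 0
for _ in range(trials):
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    z = (statistics.fmean(sample) - mu0) / se
    if abs(z) > z_crit:  # rejecting a true H0 is a Type I error
        rejections += 1

type1_rate = rejections / trials
# |z| <= z_crit exactly when mu0 lies inside x_bar +/- z_crit * se.
coverage = 1 - type1_rate

print(f"Observed Type I error rate: {type1_rate:.3f} (alpha = {alpha})")
print(f"Share of 95% intervals containing mu0: {coverage:.3f}")
```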
• STEPS IN STATISTICAL HYPOTHESIS TESTING
• Step 1: State the null hypothesis, H0, and the alternative hypothesis, Ha. The alternative hypothesis represents what the researcher is trying to prove. The null hypothesis represents the negation of what the researcher is trying to prove.
• Step 2: State the size(s) of the sample(s). This represents the amount of evidence that is being used to make a decision. State the significance level, α, for the test. The significance level is the probability of making a Type I error. A Type I error is a decision in favor of the alternative hypothesis when, in fact, the null hypothesis is true. A Type II error is a decision to fail to reject the null hypothesis when, in fact, the null hypothesis is false.
• Step 3: State the test statistic that will be used to conduct the hypothesis test (the appropriate test statistic for each kind of hypothesis test must be identified).
• Step 4: Find the critical value for the test. This value represents the cutoff point for the test statistic. If the null hypothesis were true, there would be a probability of only α of obtaining a value of the test statistic at least this extreme. If the value of the test statistic computed from the sample data is beyond the critical value, the decision is to reject the null hypothesis in favor of the alternative hypothesis.
• Step 5: Calculate the value of the test statistic, using the sample data. (If you are using Excel
or SPSS, or some similar computer package, you will calculate the value of the test statistic,
along with a p-value.)
• Step 6: Based on a comparison of the calculated value of the test statistic with the critical value, decide whether to reject the null hypothesis in favor of the alternative. (If you have a calculated p-value, decide instead by comparing the p-value with α: if the p-value is less than α, reject H0; otherwise, fail to reject H0.)
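The six steps can be traced in a short worked example. The sketch below is a hypothetical right-tailed one-sample z-test: the data, μ₀ = 50, and a known population standard deviation σ = 4 are all assumptions made for illustration (when σ is unknown, a t-test would be the usual choice instead). It also shows that the critical-value decision and the p-value decision of Step 6 agree.

```python
import statistics

std_normal = statistics.NormalDist()

# Step 1: H0: mu = 50 versus Ha: mu > 50 (right-tailed). Hypothetical claim.
mu0 = 50.0

# Step 2: sample size and significance level.
data = [52, 55, 49, 58, 53, 51, 56, 54, 50, 57]  # hypothetical sample
n = len(data)
alpha = 0.05
sigma = 4.0  # ASSUMED known population sd, which is what justifies a z-test

# Step 3: test statistic z = (x_bar - mu0) / (sigma / sqrt(n)).
# Step 4: critical value for a right-tailed test at level alpha.
z_crit = std_normal.inv_cdf(1 - alpha)

# Step 5: compute the test statistic and its p-value from the sample.
x_bar = statistics.fmean(data)
z = (x_bar - mu0) / (sigma / n ** 0.5)
p_value = 1 - std_normal.cdf(z)  # P(Z >= z) under H0

# Step 6: decide (both rules give the same answer).
reject_by_critical_value = z > z_crit
reject_by_p_value = p_value < alpha

print(f"n = {n}, mean = {x_bar:.2f}, z = {z:.3f}, critical value = {z_crit:.3f}")
print(f"p-value = {p_value:.4f}")
print("Reject H0" if reject_by_critical_value else "Fail to reject H0")
```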
Statistical Tests
A. Comparing: the averages of two INDEPENDENT groups
• Dependent (outcome) variable: Scale
• Independent (explanatory) variable: Nominal (binary)
• Parametric test (data normally distributed): Independent t-test
• Non-parametric test (ordinal/skewed data): Mann-Whitney test / Wilcoxon rank-sum test
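For this comparison, both test statistics can be sketched in plain Python. The group data are hypothetical. The t statistic uses the pooled-variance (Student's) form of the independent t-test, and the Mann-Whitney U statistic is computed by directly counting pairs (ties count one half). Only the statistics are computed here; p-values would come from the t distribution with n₁ + n₂ − 2 degrees of freedom or from Mann-Whitney tables or software (for example, `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu`, if SciPy is available).

```python
import statistics

def pooled_t(group1, group2):
    """Independent-samples (Student's) t statistic with pooled variance."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # n - 1 denominators
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)              # pooled variance
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return (statistics.fmean(group1) - statistics.fmean(group2)) / se

def mann_whitney_u(group1, group2):
    """U for group1: number of pairs in which the group1 value is larger (ties count 0.5)."""
    u = 0.0
    for a in group1:
        for b in group2:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    return u

group_a = [23, 25, 28, 30, 32]  # hypothetical scores, group A
group_b = [18, 20, 22, 24, 27]  # hypothetical scores, group B

t = pooled_t(group_a, group_b)
u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)  # u_a + u_b always equals n1 * n2

print(f"t = {t:.3f}, U_A = {u_a}, U_B = {u_b}")
```

The t-test uses the actual values and assumes normality; U uses only the ordering of the values, which is why it suits ordinal or skewed data.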