
Name: JONRY G. HELAMON
Subject: EDUC 102 ADVANCE STATISTICS
Instructor: LEE G. BARAQUIA

A. REVIEW ON THE BASIC STATISTICS

Measure Of Central Tendency

It is a summary statistic that represents the center point or typical value of a dataset.

These measures indicate where most values in a distribution fall and are also referred to as the

central location of a distribution. You can think of it as the tendency of data to cluster around a

middle value. The three most common measures of central tendency are the mean, median,

and mode.

Measure Of Variability

This is a summary statistic that represents the amount of dispersion in a dataset. How

spread out are the values? While a measure of central tendency describes the typical value,

measures of variability define how far away the data points tend to fall from the center. A low

dispersion indicates that the data points tend to be clustered tightly around the center. High

dispersion signifies that they tend to fall further away.
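To make these two ideas concrete, here is a minimal sketch using Python's built-in statistics module; the data values are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 6, 7, 8, 9]  # hypothetical sample

# Measures of central tendency: where the values cluster.
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Measures of variability: how far values fall from the center.
variance = statistics.variance(data)  # sample variance
std_dev = statistics.stdev(data)      # sample standard deviation
value_range = max(data) - min(data)   # simplest measure of spread

print(mean, median, mode)
print(variance, std_dev, value_range)
```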

Normality tests are used to determine whether a data set is well modeled by a normal distribution. Many statistical functions require that a distribution be normal or nearly normal. There are both graphical and statistical methods for evaluating normality. Two numerical measures of shape, skewness and excess kurtosis, can be used to test for normality.

If skewness is not close to zero, then your data set is not normally distributed.

SKEWNESS

It is a measure of the asymmetry of the probability distribution of a random variable about its

mean. In other words, skewness tells you the amount and direction of skew (departure from

horizontal symmetry). The skewness value can be positive, negative, or even undefined. If skewness is 0, the data are perfectly symmetrical, although perfect symmetry is quite unlikely for real-world data. As a general rule of thumb:

 If skewness is less than -1 or greater than 1, the distribution is highly skewed.

 If skewness is between -1 and -0.5 or between 0.5 and 1, the distribution is moderately skewed.

 If skewness is between -0.5 and 0.5, the distribution is approximately symmetric.

KURTOSIS

Kurtosis tells you the height and sharpness of the central peak, relative to that of a standard bell

curve.
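As a concrete illustration, the following is a minimal sketch assuming SciPy and NumPy are available; the sample is randomly generated, so the numbers are only illustrative. It computes skewness, excess kurtosis, and a Shapiro-Wilk test, one of the common statistical normality checks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=200)  # hypothetical, roughly normal sample

skewness = stats.skew(data)             # approximately 0 for a symmetric distribution
excess_kurtosis = stats.kurtosis(data)  # Fisher definition: approximately 0 for a normal curve

# A formal statistical check of normality: the Shapiro-Wilk test.
w_stat, p_value = stats.shapiro(data)

print(f"skewness={skewness:.3f}, excess kurtosis={excess_kurtosis:.3f}")
print(f"Shapiro-Wilk p={p_value:.3f} (p > 0.05 suggests normality is plausible)")
```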

Data sampling

It is a statistical analysis technique used to select, manipulate and analyze a

representative subset of data points to identify patterns and trends in the larger data set being

examined. It enables data scientists, predictive modelers and other data analysts to work with a

small, manageable amount of data about a statistical population to build and run analytical

models more quickly, while still producing accurate findings.

Slovin's formula
- It is used to calculate the sample size (n) given the population size (N) and a margin of error (e).

- It is a formula used in random sampling to estimate the required sample size.
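A minimal sketch of Slovin's formula, n = N / (1 + N·e²), is shown below; the population size and margin of error in the example are made up.

```python
import math

def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Return the sample size n = N / (1 + N * e^2) for population N and margin of error e."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)  # round up so the sample is at least as large as required

# Example: a population of 1,000 with a 5% margin of error needs about 286 respondents.
print(slovin_sample_size(1000, 0.05))
```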

B. REVIEW ON PROBABILITY

Probability

Probability is the branch of mathematics that deals with the study of chance. Probability deals with

the study of experiments and their outcomes.

Probability Key Terms

 Experiment

An experiment in probability is a test to see what will happen when you

do something. A simple example is flipping a coin. When you flip a coin,

you are performing an experiment to see what side of the coin you'll end

up with.

 Outcome

An outcome in probability refers to a single (one) result of an experiment.

In the example of an experiment above, one outcome would be heads

and the other would be tails.

 Event

An event in probability is a set of one or more outcomes of an experiment. Suppose you flip a coin multiple times; an example of an event would be getting a certain number of heads.


 Sample Space

A sample space in probability is the set of all the different possible outcomes of a given experiment. If you flipped a coin once, the sample space S would be given by:

S = {Heads, Tails}

If you flipped the coin multiple times, all the different combinations of heads and tails would make up the sample space. A sample space is also defined as a Universal Set for the outcomes of a given experiment.

Notation of Probability

The probability that a certain event will happen when an experiment is performed can in

layman's terms be described as the chance that something will happen.

The probability of an event E is denoted by P(E).

Suppose that our experiment involves rolling a die. There are 6 possible outcomes in the sample space, as shown below:

S = {1, 2, 3, 4, 5, 6}

The size of the sample space is often denoted by N while the number of outcomes in an event is

denoted by n.
From the above, we can denote the probability of an event as:

P(E) = n / N

For the sample space given above, if the event is 2, there is only one 2 in the sample space,

thus n = 1 and N = 6.

Thus the probability of getting a 2 when you roll a die is given by P(2) = n / N = 1/6.

Understanding the Magnitude of the Probability of an Event

The largest probability an event can have is one and the smallest is zero. There are no negative

probabilities and no probabilities greater than one. Probabilities are real numbers ranging from zero to one. The closer the probability is to 1, the more likely the event is to occur, while the closer the probability is to zero, the less likely the event is to occur.

When an event has probability of one, we say that the event must happen and when the

probability is zero we say that the event is impossible.

The probabilities of all the outcomes in a sample space add up to one.

Events with the same probability have the same likelihood of occurring. For example, when you flip a fair coin, you are just as likely to get a head as a tail. This is because these two outcomes have the same probability, i.e. P(heads) = P(tails) = 1/2.
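The following minimal sketch ties these pieces together for the die-rolling example, computing P(E) = n/N for a few hypothetical events.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # all possible outcomes of rolling a die, size N = 6

def probability(event: set) -> Fraction:
    """P(E) = n / N: favorable outcomes over the size of the sample space."""
    favorable = event & sample_space  # keep only outcomes that can actually occur
    return Fraction(len(favorable), len(sample_space))

print(probability({2}))           # 1/6 : rolling a 2
print(probability({2, 4, 6}))     # 1/2 : rolling an even number
print(probability(sample_space))  # 1   : certain event
print(probability(set()))         # 0   : impossible event
```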

C. PARAMETRIC AND NONPARAMETRIC TESTS

Parametric Test

The parametric test is a hypothesis test which provides generalisations for making statements about the mean of the parent population. A t-test, based on Student’s t-statistic, is often used in this regard.

The t-statistic rests on the underlying assumption that the variable is normally distributed and that the mean is known or assumed to be known. The population variance is estimated from the sample. It is assumed that the variables of interest in the population are measured on an interval scale.

Nonparametric Test

The nonparametric test is defined as a hypothesis test which is not based on underlying assumptions, i.e. it does not require the population’s distribution to be characterized by specific parameters.

The test is mainly based on differences in medians. Hence, it is alternately known as the distribution-free test. The test assumes that the variables are measured on a nominal or ordinal level. It is used when the independent variables are non-metric.

BASIS FOR COMPARISON          PARAMETRIC TEST                        NONPARAMETRIC TEST

Meaning                       A statistical test in which specific   A statistical test used in the case of
                              assumptions are made about the         non-metric independent variables
                              population parameter

Basis of test statistic       Distribution                           Arbitrary

Measurement level             Interval or ratio                      Nominal or ordinal

Measure of central tendency   Mean                                   Median

Information about population  Completely known                       Unavailable

Applicability                 Variables                              Variables and attributes

Correlation test              Pearson                                Spearman

D. TESTING HYPOTHESIS

A hypothesis is an educated guess about something in the world around you. It should be

testable, either by experiment or observation. For example:

 A new medicine you think might work.

 A way of teaching you think might be better.

 A possible location of new species.

 A fairer way to administer standardized tests.

What Is a T-Test?

A t-test is a type of inferential statistic used to determine if there is a significant difference

between the means of two groups, which may be related in certain features. It is mostly used

when the data sets, like the data set recorded as the outcome from flipping a coin 100 times,

would follow a normal distribution and may have unknown variances. A t-test is used as a

hypothesis testing tool, which allows testing of an assumption applicable to a population.

A t-test looks at the t-statistic, the t-distribution values, and the degrees of freedom to determine

the probability of difference between two sets of data. To conduct a test with three or more

variables, one must use an analysis of variance.


T-Test

Essentially, a t-test allows us to compare the average values of the two data sets and determine

if they came from the same population. For example, if we were to take a sample of students from class A and another sample of students from class B, we would not expect them to have exactly the same mean and standard deviation. Similarly, samples taken from a placebo-fed control group and those taken from a drug-treated group should have slightly different means and standard deviations.

Mathematically, the t-test takes a sample from each of the two sets and establishes the problem

statement by assuming a null hypothesis that the two means are equal. Based on the applicable

formulas, certain values are calculated and compared against the standard values, and the

assumed null hypothesis is accepted or rejected accordingly.
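As an illustrative sketch (not the only way to run the test), the independent two-sample t-test for the class A / class B example above could be computed with SciPy; the scores below are made up.

```python
from scipy import stats

class_a = [72, 75, 78, 80, 69, 74, 77, 81, 70, 76]
class_b = [68, 71, 65, 73, 70, 66, 69, 72, 64, 67]

# Null hypothesis: the two group means are equal.
t_stat, p_value = stats.ttest_ind(class_a, class_b, equal_var=False)  # Welch's t-test

alpha = 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: the means differ significantly.")
else:
    print("Fail to reject the null hypothesis.")
```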

If the null hypothesis is rejected, it indicates that the observed difference between the data sets is unlikely to be due to chance. The t-test is just one of many tests used for this purpose. Statisticians must

additionally use tests other than the t-test to examine more variables and tests with larger

sample sizes. For a large sample size, statisticians use a z-test. Other testing options include

the chi-square test and the f-test.

Analysis of variance (ANOVA) is a collection of statistical models and their associated

estimation procedures (such as the "variation" among and between groups) used to analyze the

differences among group means in a sample. ANOVA was developed

by statistician and evolutionary biologist Ronald Fisher. The ANOVA is based on the law of total

variance, where the observed variance in a particular variable is partitioned into components

attributable to different sources of variation. In its simplest form, ANOVA provides a statistical
test of whether two or more population means are equal, and therefore generalizes the t-

test beyond two means.
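A minimal one-way ANOVA sketch using SciPy is shown below; the three groups and their scores are hypothetical.

```python
from scipy import stats

group_1 = [85, 86, 88, 75, 78, 94, 98, 79, 71, 80]
group_2 = [91, 92, 93, 85, 87, 84, 82, 88, 95, 96]
group_3 = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

# Null hypothesis: all group means are equal (the generalization of the t-test beyond two means).
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```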

There are three types of t-tests, and they are categorized as dependent and independent t-

tests.

Post-Hoc Tests

Post-hoc (Latin, meaning “after this”) tests are analyses carried out after the experimental data have been collected, typically to determine which specific group means differ once an overall test such as ANOVA is significant. They are often based on a familywise error rate: the probability of at least one Type I error in a set (family) of comparisons. The most common post-hoc tests are listed below; a sketch of Tukey’s test follows the list.

 Bonferroni Procedure

 Duncan’s new multiple range test (MRT)

 Dunn’s Multiple Comparison Test

 Fisher’s Least Significant Difference (LSD)

 Holm-Bonferroni Procedure

 Newman-Keuls

 Rodger’s Method

 Scheffé’s Method

 Tukey’s Test (see also: Studentized Range Distribution)

 Dunnett’s correction

 Benjamini-Hochberg (BH) procedure
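As referenced above, here is a minimal sketch of Tukey’s HSD test using statsmodels; the scores and group labels are made up.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([85, 86, 88, 75, 78, 91, 92, 93, 85, 87, 79, 78, 88, 94, 92])
groups = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

# Compares every pair of group means while controlling the familywise error rate.
result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
print(result)
```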

Repeated Measures

Paired Sample T-Test

The paired sample t-test, sometimes called the dependent sample t-test, is a statistical

procedure used to determine whether the mean difference between two sets of observations is
zero. In a paired sample t-test, each subject or entity is measured twice, resulting in pairs of

observations. Common applications of the paired sample t-test include case-control studies or

repeated-measures designs. Suppose you are interested in evaluating the effectiveness of a

company training program. One approach you might consider would be to measure the

performance of a sample of employees before and after completing the program, and analyze

the differences using a paired sample t-test.
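A minimal sketch of the before-and-after training example using SciPy’s paired t-test follows; the employee scores are made up.

```python
from scipy import stats

before = [68, 72, 75, 70, 66, 74, 71, 69, 73, 70]
after = [74, 76, 78, 73, 70, 79, 75, 72, 77, 74]  # same employees, measured twice

# Null hypothesis: the mean difference between the paired observations is zero.
t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```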

Relationship between two variables.

What is correlation?

A correlation coefficient measures the extent to which two variables tend to change together.

The coefficient describes both the strength and the direction of the relationship. Minitab offers

two different correlation analyses:

Pearson product moment correlation

The Pearson correlation evaluates the linear relationship between two continuous

variables. A relationship is linear when a change in one variable is associated with a

proportional change in the other variable.

For example, you might use a Pearson correlation to evaluate whether increases in

temperature at your production facility are associated with decreasing thickness of your

chocolate coating.

Spearman rank-order correlation

The Spearman correlation evaluates the monotonic relationship between two continuous

or ordinal variables. In a monotonic relationship, the variables tend to change together,


but not necessarily at a constant rate. The Spearman correlation coefficient is based on

the ranked values for each variable rather than the raw data.

Spearman correlation is often used to evaluate relationships involving ordinal variables.

For example, you might use a Spearman correlation to evaluate whether the order in

which employees complete a test exercise is related to the number of months they have

been employed.

It is always a good idea to examine the relationship between variables with a

scatterplot. Correlation coefficients only measure linear (Pearson) or monotonic

(Spearman) relationships. Other relationships are possible.
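A minimal sketch computing both coefficients with SciPy, using made-up temperature and coating-thickness values loosely based on the example above:

```python
from scipy import stats

temperature = [20, 22, 24, 26, 28, 30, 32, 34]          # e.g., production-facility temperature
coating = [1.9, 1.8, 1.7, 1.7, 1.5, 1.4, 1.3, 1.1]      # e.g., chocolate-coating thickness

pearson_r, pearson_p = stats.pearsonr(temperature, coating)      # linear relationship
spearman_r, spearman_p = stats.spearmanr(temperature, coating)   # monotonic (rank-based) relationship

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.4f})")
print(f"Spearman rho = {spearman_r:.3f} (p = {spearman_p:.4f})")
```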

Analysis of covariance (ANCOVA)

Analysis of covariance (ANCOVA) allows you to compare one variable in two or more groups while taking into account (or correcting for) the variability of other variables, called covariates.

Analysis of covariance combines one-way or two-way analysis of variance with linear

regression (General Linear Model, GLM).
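One common way to run an ANCOVA is as a general linear model; the sketch below uses statsmodels with a made-up data frame in which "pretest" is the covariate.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "score": [70, 75, 80, 72, 78, 85, 88, 90, 84, 86],    # dependent variable
    "group": ["control"] * 5 + ["treatment"] * 5,          # factor of interest
    "pretest": [65, 70, 74, 68, 72, 66, 71, 75, 69, 73],   # covariate to adjust for
})

# Compare group means on 'score' while adjusting for the 'pretest' covariate.
model = smf.ols("score ~ C(group) + pretest", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```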

E. LINEAR REGRESSION AND MULTIPLE ANALYSIS

Multivariate analysis of variance (MANOVA)

Multivariate analysis of variance (MANOVA) is simply an ANOVA with several

dependent variables. That is to say, ANOVA tests for the difference in means between two or

more groups, while MANOVA tests for the difference in two or more vectors of means.
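A minimal MANOVA sketch using statsmodels follows; the two dependent variables (math and reading scores) and the grouping variable are made up.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.DataFrame({
    "math": [70, 75, 80, 72, 78, 85, 88, 90, 84, 86],
    "reading": [65, 68, 74, 70, 72, 80, 82, 85, 79, 83],
    "method": ["A"] * 5 + ["B"] * 5,
})

# Tests whether the vector of means (math, reading) differs across the groups.
manova = MANOVA.from_formula("math + reading ~ method", data=df)
print(manova.mv_test())
```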

Internal and external validity are concepts that reflect whether or not the results of a study are

trustworthy and meaningful. While internal validity relates to how well a study is conducted (its

structure), external validity relates to how applicable the findings are to the real world.
F. INTERNAL VALIDITY

Internal validity is the extent to which a study establishes a trustworthy cause-and-

effect relationship between a treatment and an outcome. It also reflects that a given study

makes it possible to eliminate alternative explanations for a finding. For example, if you

implement a smoking cessation program with a group of individuals, how sure can you be that

any improvement seen in the treatment group is due to the treatment that you administered?

Internal validity depends largely on the procedures of a study and how rigorously it is performed.

Internal validity is not a "yes or no" type of concept. Instead, we consider how confident we can

be with the findings of a study, based on whether it avoids traps that may make the findings

questionable.

The less chance there is for "confounding" in a study, the higher the internal validity and the

more confident we can be in the findings. Confounding refers to a situation in which other

factors come into play that confuse the outcome of a study. For instance, confounding might make us unsure as to whether we can trust that we have identified the above "cause-and-effect" scenario.

In short, you can only be confident that your study is internally valid if you can rule out

alternative explanations for your findings. As a brief summary, you can only assume cause-and-

effect when you meet the following three criteria in your study:

1. The cause preceded the effect in terms of time.

2. The cause and effect vary together.

3. There are no other likely explanations for this relationship that you have observed.
Reliability analysis examines whether a scale consistently reflects the construct it is measuring. There are certain times and situations where it can be useful.
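One widely used reliability statistic is Cronbach's alpha; the text above does not name a specific coefficient, so the sketch below is only an illustration, computing alpha from first principles on a made-up item-response matrix.

```python
import numpy as np

# Rows = respondents, columns = scale items (e.g., Likert responses).
items = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
])

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)       # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.3f}")  # values near 1 indicate a consistent scale
```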

Content and Construct Validity

Content and construct validity are two of the types of validity that support the GRE® Program.

 Construct validity means the test measures the skills/abilities that should be measured.

 Content validity means the test measures appropriate content.

ETS gathers information from graduate and professional school programs, including business

and law schools, about the skills that they consider essential for success in their programs.

Types of Validity

VREP is designed to measure face validity, construct validity, and content validity. To establish

criterion validity would require further research.

Face validity is concerned with how a measure or procedure appears. Does it seem like

a reasonable way to gain the information the researchers are attempting to obtain? Does it

seem well designed? Does it seem as though it will work reliably? Face validity is independent

of established theories for support (Fink, 1995).

Construct validity seeks agreement between a theoretical concept and a specific

measuring device or procedure. This requires operational definitions of all constructs being

measured.

Content Validity is based on the extent to which a measurement reflects the specific

intended domain of content (Carmines & Zeller, 1991, p.20). Experts in the field can determine

if an instrument satisfies this requirement. Content validity requires the researcher to define the
domains they are attempting to study. Construct and content validity should be demonstrated

from a variety of perspectives.

Criterion-related validity, also referred to as instrumental validity, is used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure which has been demonstrated to be valid. If, after an extensive search of the literature, such an instrument is not found, then an instrument that meets the other measures of validity is used to provide criterion-related validity for future instruments.

Operationalization is the process of defining a concept or construct that could have a

variety of meanings to make the term measurable and distinguishable from similar concepts.

Operationalizing enables the concept or construct to be expressed in terms of empirical

observations. Operationalizing includes describing what is, and what is not, part of that concept

or construct.
