
Unit-2

Part:1
Different Roles of Variables
Independent variables (Treatment variables), X: Variables you manipulate in order to affect the outcome of an experiment.
Dependent variables (Response or outcome variables), Y: Variables that represent the outcome of the experiment.
Intervening variables: Often an apparent relationship between two variables is caused by a third variable.
Mediating variables: A variable that links the independent and the dependent variables, and whose existence explains the relationship between the other two variables.
Moderating variables: A second independent variable that is expected to have a significant contributory or contingent effect on the originally stated dependent-independent relationship.
Control variables: Variables that are held constant throughout the experiment.
Marker variables: Variables that are used to indicate some other feature. Often, the variable of interest is not directly observable; instead, a marker believed to indicate the existence or level of the variable is used.
Correlation Coefficient : r
• Correlation coefficients are used to measure how strong a
relationship is between two variables. 
• Correlation coefficient formulas return a value between -1 and +1; the sign gives the direction of the relationship and the magnitude gives its strength.
• An important rule to remember is that Correlation doesn’t imply
causation
Karl Pearson's Correlation Coefficient
• It measures the linear relationship between two sets of data.
• It answers the question: can the data be represented by a straight line? Two symbols are used for the Pearson correlation: the Greek letter rho (ρ) for a population and the letter "r" for a sample.
• The Pearson correlation can evaluate ONLY a linear relationship between two continuous variables.
• Assumptions:
1. X and Y should be normally distributed.
2. X and Y should be metric (continuous), not categorical.
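A minimal sketch of computing Pearson's r, assuming NumPy is available; the data values here are hypothetical:

```python
import numpy as np

# Hypothetical paired observations, e.g. hours studied vs. exam score
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is Pearson's r for the pair (x, y)
r = np.corrcoef(x, y)[0, 1]
print(round(r, 4))
```

Because y increases almost perfectly linearly with x, r comes out close to +1.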
Spearman Correlation Coefficient
• In statistics, Spearman's rank correlation coefficient, or Spearman's ρ, named after Charles Spearman, is a nonparametric measure of rank correlation.
• The Spearman correlation can evaluate a monotonic relationship between two variables, continuous or ordinal, and it is based on the ranked values for each variable rather than the raw data.
• A monotonic relationship is a relationship that does one of the following:
• (1) as the value of one variable increases, so does the value of the other
variable, OR,
• (2) as the value of one variable increases, the other variable value decreases.
• BUT the rate of increase/decrease need not be constant, whereas in a linear relationship the rate of increase/decrease is constant.
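The ranking idea can be sketched directly: compute ranks with NumPy and apply Pearson's formula to them (a simplified version that assumes no tied values):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks
    (simplified version assuming no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3          # monotonic but not linear

pearson = np.corrcoef(x, y)[0, 1]
rho = spearman_rho(x, y)
```

For y = x**3 the relationship is perfectly monotonic but not linear, so Spearman's rho is 1 while Pearson's r falls below 1.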
Comparison of Pearson and Spearman coefficients

1. The fundamental difference between the two correlation coefficients is that the
Pearson coefficient works with a linear relationship between the two variables
whereas the Spearman Coefficient works with monotonic relationships as well.
2. One more difference is that Pearson works with raw data values of the variables
whereas Spearman works with rank-ordered variables.
• If a scatterplot is visually indicating a “might be monotonic, might be linear”
relationship, our best bet would be to apply Spearman and not Pearson.
• No harm would be done by switching to Spearman even if the data turned out to
be perfectly linear.
• But, if it’s not exactly linear and we use Pearson's coefficient then we’ll miss out on
the information that Spearman could capture.
Standard Deviation and Standard Error
• The standard deviation (often SD) is a measure of variability.
• When we calculate the standard deviation of a sample, we are using it as an
estimate of the variability of the population from which the sample was
drawn. 
• When we calculate the sample mean, we are usually interested not in the mean of this particular sample, but in the mean for individuals of this type; in statistical terms, of the population from which the sample comes.
• We can estimate how much sample means will vary using the standard deviation of this sampling distribution, which we call the standard error (SE) of the estimate of the mean. The standard error is a measure of the precision of the sample mean.
• As the sample size grows larger, the SEM decreases relative to the SD; hence, as the sample size increases, the sample mean estimates the true mean of the population with greater precision.
• In contrast, increasing the sample size does not necessarily make the SD larger or smaller; it just becomes a more accurate estimate of the population SD.
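This behaviour can be illustrated with a quick simulation (a sketch assuming NumPy; the population parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical population with mean 100 and SD 15
population = rng.normal(loc=100, scale=15, size=100_000)

for n in (10, 100, 1000):
    sample = rng.choice(population, size=n, replace=False)
    sd = sample.std(ddof=1)       # sample SD (estimates population SD)
    se = sd / np.sqrt(n)          # standard error of the mean
    print(n, round(sd, 2), round(se, 2))
```

The printed SD stays near the population value while the SE shrinks as 1/sqrt(n).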
Covariance
• Covariance is a measure of the relationship between two random variables.
• The metric evaluates to what extent the variables change together; in other words, it is essentially a measure of the joint variability of two variables.
• Unlike the correlation coefficient, covariance is measured in units (the product of the units of the two variables).
• The covariance can take any positive or negative value. The values are interpreted as follows:
Positive covariance: Indicates that two variables tend to move in the same direction.
Negative covariance: Reveals that two variables tend to move in inverse directions.
• For example, the covariance between two random variables X and Y can be calculated
using the following formula (for population and sample respectively):
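The usual definitions, in standard notation, are:

```latex
\operatorname{Cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)
\qquad \text{(population)}

\operatorname{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\qquad \text{(sample)}
```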
Covariance Vs Correlation
Covariance: Measures the total variation of two random variables from their expected values. Using covariance, we can only gauge the direction of the relationship (whether the variables tend to move in tandem or show an inverse relationship). However, it does not indicate the strength of the relationship, nor the dependency between the variables.

Correlation: Measures the strength of the relationship between variables. Correlation is the scaled measure of covariance. It is dimensionless; in other words, the correlation coefficient is always a pure value and not measured in any units.

The relationship between the two concepts: the correlation coefficient equals the covariance divided by the product of the standard deviations of the two variables.
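Concretely, correlation is covariance scaled by the two standard deviations; a small NumPy check with hypothetical values:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

cov_xy = np.cov(x, y, ddof=1)[0, 1]            # sample covariance, in units of x*y
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))   # scale away the units

# The scaled value matches NumPy's own correlation coefficient
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```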
Degrees of Freedom
• Degrees of Freedom refers to the maximum number of logically
independent values, which are values that have the freedom to vary,
in the data sample.
• The formula for Degrees of Freedom equals the size of the data sample minus one: Df = N - 1.
Example:
• Consider a data sample consisting of, for the sake of simplicity, five
positive integers. 
• Four of the numbers in the sample are {3, 8, 5, 4}, and the average of the entire data sample is revealed to be 6.
• This must mean that the fifth number has to be 10. It can be nothing
else. It does not have the freedom to vary.
• So the Degrees of Freedom for this data sample is 4 (total sample size minus 1).
