
What is Regression?

1. Many business decisions involve the relationship between two or more variables.
   E.g., what determines sales levels?
2. Regression analysis is used to develop an equation showing how the variables are related.
3. The variable being predicted is called the dependent variable and is denoted by Y. The variables used to predict the value of the dependent variable are called the independent variables and are denoted by X.
4. The average relationship between a dependent variable and an independent variable is called a regression.
5. The dependent variable is assumed to be a random variable, whereas the independent variables are assumed to have fixed values.

Dependent Variable

A variable intended to be estimated or predicted is termed a dependent variable. The dependent variable is also called the regressand, predictand, response or explained variable. It is denoted by Y.

Independent Variable

A variable on the basis of which the dependent variable is estimated is called an independent variable. The independent variable is also called the regressor, predictor or explanatory variable. It is denoted by X.

Regression Line/Equation

Ŷ = a + bX

1. a is the intercept of the regression line.
2. b is the slope parameter.
3. n is the total number of paired observations of X and Y.

The regression equation is used to find a linear relationship between the two variables.


If X is the independent variable and Y is the dependent variable, then the relationship described by the straight line Ŷ = a + bX is called a regression line.

In agricultural research we are often interested in describing the change in one variable (Y, the dependent variable) in terms of a unit change in a second variable (X, the independent variable). Regression is commonly used to establish such a relationship. A simple linear regression takes the form Ŷ = a + bX.
In the regression equation above we noted that a regression has two parameters, a and b, where a is the intercept of the regression line and b is the slope parameter. Formulas for finding the values of a and b are given below:

b = (n ΣXY − ΣX ΣY) / (n ΣX² − (ΣX)²)

and

a = Ȳ − b X̄
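As a small illustration of these formulas, the Python sketch below computes b and a for a made-up set of (X, Y) pairs; the data values are invented purely for demonstration.

```python
# Simple linear regression by the least-squares formulas above.
# The (x, y) data below are invented purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# b = (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# a = Ybar - b*Xbar
a = sum_y / n - b * (sum_x / n)

print(f"Regression line: Y-hat = {a:.3f} + {b:.3f} X")
```

The sign of the computed b indicates whether the fitted relation is positive or negative, as described in the section on types of regression relationships below.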

Regression Line/Equation
If X is the independent variable and Y is the dependent variable, then the relationship described by the straight line Ŷ = a + bX is called a regression line, which is used to find a linear relationship between the two variables. For example, the relation between the Celsius and Fahrenheit temperature scales, given by F = 32 + 1.8C, is a linear relation.

Types of Regression relationships

The relation between X and Y depends upon the value of b, so the following types of relationship can exist in a regression.

Positive relation

If the value of b is positive then there is a positive relation between X and Y, which means that if X increases Y increases, and if X decreases Y also decreases.

Negative relation

If the value of b is negative then there is a negative relation between X and Y, which means that if X increases Y decreases, and if X decreases Y increases.

No relation

If the value of b is zero then there will be no effect of X on Y.

Correlation
 Correlation is a LINEAR association between two random variables.

 Correlation analysis shows us how to determine both the nature and the strength of the relationship between two variables.

 When the changes in one variable appear to be linked with the changes in the other variable, the two variables are said to correlate.

 When variables are dependent on time, correlation can be applied.

 The correlation coefficient lies between −1 and +1.

 A zero correlation indicates that there is no relationship between the variables.

 A correlation of −1 indicates a perfect negative correlation.

 A correlation of +1 indicates a perfect positive correlation.

The coefficient of correlation

As with measuring the degree of variability, we need a measure of the degree of relationship between two variables that is free from the particular units employed in a given case. Such a measure is termed a coefficient of correlation.

In other words, the coefficient of correlation is the formula used to find a value which tells us about the strength of the relationship between the variables X and Y, i.e. how strongly positive or negative the relation is.
r = (n ΣXY − ΣX ΣY) / √{[n ΣX² − (ΣX)²][n ΣY² − (ΣY)²]}
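A minimal Python sketch of this formula, again on invented data, is shown below; the resulting r always lies between −1 and +1.

```python
import math

# Pearson correlation coefficient, following the formula above.
# The (x, y) data are invented purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(f"r = {r:.3f}")  # lies between -1 and +1
```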

Statistical Inference
The main objective of sampling is to draw conclusions about the unknown population from the
information provided by a sample. This is called statistical inference.

The population characteristics are called parameters and the sample characteristics are called statistics.

There are two approaches to statistical inference, namely:

i) Estimation of parameter

ii) Hypothesis Testing or testing of hypothesis

1. Estimation of Parameter

Statistical inference about the unknown value of a population parameter is called estimation of the parameter. Suppose we are interested in knowing the average life of the tires produced by a certain firm. This means we want an estimate of something which is not known to us, so it is a problem of estimation.

Important Terms

1) Estimate

The value the estimator takes when it is calculated from an actual sample of data. For example, a value of X̄ computed from a sample is an estimate of the population mean μ.

2) Estimator

An estimator is a rule or formula that tells how to calculate an estimate based on the measurements contained in a sample. For example, X̄ = ΣX / n is the estimator of the population mean.

3) Estimation

The computation of a statistic from sample data for the purpose of obtaining an educated guess (an estimate) of the unknown value of a population parameter.

Types of estimation

1. Point estimation

An estimate of a population parameter given by a single number is called a point estimate of the parameter.

Example: A firm wishes to estimate the average amount of time its salesmen spend on each sales call.
2. Interval Estimation

An estimate of a population parameter given by two numbers, between which the parameter may be considered to lie, is called an interval estimate. An interval estimate consists of a lower and an upper limit, and we attach a level of confidence (say 95%) that the interval contains the true value of the parameter.
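As a hedged illustration, the sketch below builds a 95% interval estimate for a population mean using the usual form X̄ ± z(α/2)·σ/√n, assuming σ is known; all numbers are invented.

```python
import math

# 95% confidence interval for a population mean when sigma is known.
# The sample figures below are invented for illustration.
x_bar = 50.0          # sample mean
sigma = 8.0           # known population standard deviation
n = 64                # sample size
z_half_alpha = 1.96   # critical value for 95% confidence

margin = z_half_alpha * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```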

Testing of hypothesis

Hypothesis testing is a process which is used to check the validity of a statement about
a population parameter.

What is a hypothesis?

A hypothesis is a statement about a population parameter developed for the purpose of testing.

Types of hypothesis

1. Statistical hypothesis

A statistical hypothesis is a statement about the numerical value of a population parameter.

2. Null hypothesis

A null hypothesis is any hypothesis which is tested for possible rejection or acceptance
under the assumption that it is true.

3. Alternative hypothesis

A statement specifying that the population parameter takes some value other than the one specified under the null hypothesis.

Procedure for hypothesis testing

To perform any kind of hypothesis test, the following six steps form the basis of every study. These steps are known as the procedure of hypothesis testing.

1. Formulating the null and alternative hypothesis

The first step in hypothesis testing is to identify the problem and decide which statement is the null hypothesis and which is the alternative. In notation, the null and alternative hypotheses can be represented as:

    Null hypothesis      Alternative hypothesis
    H₀: θ = θ₀           H₁: θ ≠ θ₀
    H₀: θ ≤ θ₀           H₁: θ > θ₀
    H₀: θ ≥ θ₀           H₁: θ < θ₀

where θ is the population parameter and θ₀ is the hypothesized value of the parameter.

2. Level of significance

The level of significance is the probability of rejecting H₀ when H₀ is true. It is denoted by α and determines the size of the critical region. Commonly used values are 1%, 5% and 10%.

3. Test statistics

A statistic used as a basis for deciding whether the null hypothesis should be rejected is called the test statistic.

4. Critical region

The critical region, or rejection region, is determined by H₁. The size of the critical region is equal to α.

    Alternative hypothesis    Critical region                                  Conclusion
    H₁: θ ≠ θ₀                Calculated value < lower tabulated value or      Reject H₀
                              calculated value > upper tabulated value
    H₁: θ > θ₀                Calculated value > tabulated value               Reject H₀
    H₁: θ < θ₀                Calculated value < lower tabulated value         Reject H₀

5. Computation

The relevant test-statistic is calculated from the sample data. The calculated value is to be
compared with the tabulated value.

6. Conclusion
If the calculated value of the test statistic lies in the rejection region, the null hypothesis H₀ is rejected and H₁ is accepted.

If the calculated value of the test statistic does not fall in the rejection region, then we say H₀ is accepted, or not rejected.

Z-test
Hypothesis Testing of a Population Mean (when σ is known)

Suppose a population has mean µ, which is unknown, and standard deviation σ, which is known. A large sample of size n > 30 is selected from the population and the sample mean X̄ is calculated. The testing procedure used for this kind of information is called the Z-test for testing a specified value of µ, i.e. μ₀. The test procedure for the Z-test is given below.

Procedure:

1. We frame the null and alternative hypothesis. Three different forms of null and alternative hypothesis
are possible which are:

a) 𝐻0 : 𝜇 = 𝜇0 𝑎𝑛𝑑 𝐻1 : 𝜇 ≠ 𝜇0

b) 𝐻0 : 𝜇 ≤ 𝜇0 𝑎𝑛𝑑 𝐻1 : 𝜇 > 𝜇0

c) 𝐻0 : 𝜇 ≥ 𝜇0 𝑎𝑛𝑑 𝐻1 : 𝜇 < 𝜇0
2. The level of significance α is decided; it can be 1%, 5% or 10%.

3. Test statistic: Z = (X̄ − μ₀) / (σ/√n)

4. Critical Region

It depends upon the alternative hypothesis:

If H₁: μ ≠ μ₀, then reject H₀ when |Zcal| ≥ Zα/2, i.e. Zcal > Zα/2 or Zcal < −Zα/2

If H₁: μ > μ₀, then reject H₀ when Zcal > Zα

If H₁: μ < μ₀, then reject H₀ when Zcal < −Zα

The table below gives the tabulated (critical) values of Z at different values of α:

    α             Two-sided (±Zα/2)      One-sided right (Zα)    One-sided left (−Zα)
    0.10 (10%)    −1.645 and +1.645      1.282                   −1.282
    0.05 (5%)     −1.96 and +1.96        1.645                   −1.645
    0.02 (2%)     −2.326 and +2.326      2.054                   −2.054
    0.01 (1%)     −2.575 and +2.575      2.326                   −2.326

5. Calculation

Substitute all the sample information into the test statistic and compute its value.

6. Conclusion

If the calculated value of Z falls in the critical region, then the null hypothesis is rejected under the provided information and we conclude that the specified value μ₀ is not a correct estimate of the population mean.
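A short Python sketch of the whole Z-test procedure, using invented sample figures and the 5% two-sided critical value from the table above:

```python
import math

# One-sample Z-test for a specified mean mu0 when sigma is known.
# All numbers are invented for illustration.
x_bar = 52.0    # sample mean
mu0 = 50.0      # hypothesized population mean
sigma = 6.0     # known population standard deviation
n = 36          # sample size (large, n > 30)
z_crit = 1.96   # two-sided critical value at alpha = 0.05

z_cal = (x_bar - mu0) / (sigma / math.sqrt(n))
if abs(z_cal) >= z_crit:
    print(f"Z = {z_cal:.3f}: reject H0 (mu = {mu0})")
else:
    print(f"Z = {z_cal:.3f}: do not reject H0 (mu = {mu0})")
```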

Two Sample Z and T test

Hypothesis Testing of Two Populations' Means (when σ₁² and σ₂² are known)

Suppose there are two populations with means μ₁ and μ₂, which are unknown, and variances σ₁² and σ₂², which are known. Two large samples of sizes n₁ and n₂ (both > 30) are selected from the populations and the sample means X̄₁ and X̄₂ are calculated. The testing procedure used with this kind of information to test whether the two population means are identical is called the two-sample Z-test. The hypotheses for the two-sample Z-test are given below, followed by a short numerical sketch.

Procedure:

1. We frame the null and alternative hypothesis. Three different forms of null and alternative hypothesis
are possible which are:

a) 𝐻0 : 𝜇1 − 𝜇2 = 0 𝑎𝑛𝑑 𝐻1 : 𝜇1 − 𝜇2 ≠ 0

b) 𝐻0 : 𝜇1 − 𝜇2 ≤ 0 𝑎𝑛𝑑 𝐻1 : 𝜇1 − 𝜇2 > 0

c) 𝐻0 : 𝜇1 − 𝜇2 ≥ 0 𝑎𝑛𝑑 𝐻1 : 𝜇1 − 𝜇2 < 0
OR

a) 𝐻0 : 𝜇1 = 𝜇2 𝑎𝑛𝑑 𝐻1 : 𝜇1 ≠ 𝜇2

b) 𝐻0 : 𝜇1 ≤ 𝜇2 𝑎𝑛𝑑 𝐻1 : 𝜇1 > 𝜇2

c) 𝐻0 : 𝜇1 ≥ 𝜇2 𝑎𝑛𝑑 𝐻1 : 𝜇1 < 𝜇2
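The section above lists only the hypothesis forms; the sketch below additionally assumes the standard two-sample statistic Z = (X̄₁ − X̄₂) / √(σ₁²/n₁ + σ₂²/n₂), and all numbers are invented for illustration.

```python
import math

# Two-sample Z-test for mu1 = mu2 with known variances, using
# Z = (X1bar - X2bar) / sqrt(var1/n1 + var2/n2).
# All numbers are invented for illustration.
x1_bar, x2_bar = 75.0, 72.0
var1, var2 = 16.0, 25.0        # known population variances
n1, n2 = 40, 50                # both samples are large (> 30)
z_crit = 1.96                  # two-sided critical value, alpha = 0.05

z_cal = (x1_bar - x2_bar) / math.sqrt(var1 / n1 + var2 / n2)
print(f"Z = {z_cal:.3f}")
print("reject H0" if abs(z_cal) >= z_crit else "do not reject H0")
```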
t-test

Hypothesis Testing of a Population Mean (when σ is unknown)

Suppose a population has mean µ, which is unknown, and standard deviation σ, which is also unknown. A small sample of size n < 30 is selected from the population and the sample mean X̄ and sample standard deviation s are calculated. The testing procedure used for this kind of information is called the t-test for testing a specified value of µ, i.e. μ₀.

The test procedure for the t-test is given below.

Procedure:

1. Formulating Hypothesis: We frame the null and alternative hypothesis. Three different forms of null
and alternative hypothesis are possible which are:

a) 𝑯𝟎 : 𝜇 = 𝜇0 𝑎𝑛𝑑 𝑯𝟏 : 𝜇 ≠ 𝜇0

b) 𝑯𝟎 : 𝜇 ≤ 𝜇0 𝑎𝑛𝑑 𝑯𝟏 : 𝜇 > 𝜇0

c) 𝑯𝟎 : 𝜇 ≥ 𝜇0 𝑎𝑛𝑑 𝑯𝟏 : 𝜇 < 𝜇0
2. The level of significance α is decided; it can be 1%, 5% or 10%.

3. Test statistic: t = (X̄ − μ₀) / (s/√n)

where X̄ = ΣX / n and s = √[Σ(X − X̄)² / (n − 1)]

4. Critical Region

It depends upon the alternative hypothesis:

If H₁: μ ≠ μ₀, then reject H₀ when |tcal| ≥ t(α/2, v), i.e. tcal > t(α/2, v) or tcal < −t(α/2, v)

If H₁: μ > μ₀, then reject H₀ when tcal > t(α, v)

If H₁: μ < μ₀, then reject H₀ when tcal < −t(α, v)

where v = n − 1 is the degrees of freedom for the t-test.

5. Calculation

Substitute all the sample information into the test statistic and compute its value.

6. Conclusion
If the calculated value of t falls in the critical region, then the null hypothesis is rejected under the provided information and we conclude that the specified value μ₀ is not a correct estimate of the population mean.
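A minimal Python sketch of the t-test procedure on an invented small sample; if SciPy is available, scipy.stats.ttest_1samp(data, mu0) should give the same t value.

```python
import math

# One-sample t-test for a specified mean mu0 when sigma is unknown.
# The small sample below is invented for illustration.
data = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.5]
mu0 = 12.0
n = len(data)
x_bar = sum(data) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

t_cal = (x_bar - mu0) / (s / math.sqrt(n))
v = n - 1  # degrees of freedom
print(f"t = {t_cal:.3f} with {v} degrees of freedom")
# Compare |t| with the tabulated t(alpha/2, v) to reach a conclusion.
```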

𝝌𝟐 Test

Chi Squared Test

A chi-squared test is a test whose statistic follows a chi-square distribution under the null hypothesis. This test is generally used for three purposes, which are as follows:

1. Chi square test for independence

The chi-square statistic is commonly used for testing relationships between categorical variables. The null hypothesis of the chi-square test of independence is that no relationship exists between the categorical variables in the population; that is, they are independent.
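As an illustrative sketch (assuming SciPy is installed and using invented counts), the test of independence can be run on a contingency table as follows:

```python
from scipy.stats import chi2_contingency

# Chi-square test of independence on a 2x2 contingency table of counts.
# The counts are invented for illustration.
observed = [[30, 20],   # e.g. group A: yes / no
            [25, 35]]   # e.g. group B: yes / no

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, df = {dof}")
# A small p-value (e.g. p < 0.05) -> reject H0 of independence.
```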

2. Chi square test for goodness of fit

"Chi-squared test" often refers to tests for which the distribution of the test statistic approaches the χ² distribution asymptotically, meaning that the sampling distribution (if the null hypothesis is true) of the test statistic approximates a chi-squared distribution more and more closely as the sample size increases.
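A small goodness-of-fit sketch, assuming SciPy is installed; the die-roll counts are invented, and scipy.stats.chisquare defaults to equal expected frequencies:

```python
from scipy.stats import chisquare

# Goodness-of-fit sketch: do observed die-roll counts fit a fair die?
# The counts are invented for illustration; with no f_exp given,
# chisquare assumes equal expected frequencies for all categories.
observed = [18, 22, 16, 25, 19, 20]   # counts for faces 1..6

chi2, p_value = chisquare(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
# A large p-value -> no evidence against the hypothesized distribution.
```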

3. Chi square test for variance

The chi-square test for variance is a statistical procedure with a chi-square-distributed test statistic that is used to determine whether the variance of a variable, obtained from a particular sample, is equal to a known or hypothesized population variance of the same variable. Determining the population variance exactly would require examining the entire population; in practice it is usually sufficient to test a hypothesized value of the population variance using a representative sample. The variable being tested should be measured on a numerical scale.

Chi Squared Test Procedure:

1. Formulating Hypothesis: We frame the null and alternative hypothesis. Three different
forms of null and alternative hypothesis are possible which are:

a) H₀: σ² = σ₀² and H₁: σ² ≠ σ₀²

b) H₀: σ² ≤ σ₀² and H₁: σ² > σ₀²

c) H₀: σ² ≥ σ₀² and H₁: σ² < σ₀²

2. The level of significance α is decided; it can be 1%, 5% or 10%.

3. Test statistic: χ² = (n − 1)s² / σ₀²
where

n = the sample size

s² = the sample variance

σ₀² = the hypothesized population variance

4. Critical Region

It depends upon the alternative hypothesis. Writing χ²(p, n−1) for the p-th quantile of the chi-square distribution with n − 1 degrees of freedom:

If H₁: σ² ≠ σ₀², then reject H₀ when χ² > χ²(1−α/2, n−1) or χ² < χ²(α/2, n−1)

If H₁: σ² > σ₀², then reject H₀ when χ² > χ²(1−α, n−1)

If H₁: σ² < σ₀², then reject H₀ when χ² < χ²(α, n−1)

5. Calculation

Substitute all the sample information into the test statistic and compute its value.

6. Conclusion

If the calculated value of χ² falls in the critical region, then the null hypothesis is rejected under the provided information and we conclude that the population variance is not equal to the specific value we assumed.
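A short numerical sketch of this procedure, with invented sample figures and two-sided critical values read from a chi-square table at α = 0.05 and 24 degrees of freedom:

```python
# Chi-square test for a population variance, following the procedure above.
# All numbers are invented for illustration.
n = 25            # sample size
s2 = 9.5          # sample variance
sigma2_0 = 6.0    # hypothesized population variance
# Two-sided critical values at alpha = 0.05 with n-1 = 24 df,
# taken from a chi-square table: lower 12.401, upper 39.364.
chi2_lower, chi2_upper = 12.401, 39.364

chi2_cal = (n - 1) * s2 / sigma2_0
print(f"chi-square = {chi2_cal:.3f}")
if chi2_cal > chi2_upper or chi2_cal < chi2_lower:
    print("reject H0: variance differs from the hypothesized value")
else:
    print("do not reject H0")
```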

Skewness and Kurtosis


 Skewness is a lack of symmetry in a distribution around its mean.

 Skewness is the measure of the asymmetry of a distribution.

 Measures of skewness tell us the direction and the extent of the skewness.

 The skewness of a distribution may be positive or negative.

 The more the mean moves away from the mode, the larger the asymmetry or skewness.

 A distribution is said to be 'skewed' when the mean and the median fall at different points in the distribution, and the balance (or centre of gravity) is shifted to one side or the other, to the left or to the right.

Positive skewness:
Mean > Median > Mode

Negative skewness:
Mean < Median < Mode
KURTOSIS

 Kurtosis is a measure of the "peakedness" of a distribution.

 Kurtosis is another measure of the shape of a frequency curve.

 While skewness signifies the extent of asymmetry, kurtosis measures the degree of peakedness of a frequency distribution.

 Karl Pearson classified curves into three types on the basis of the shape of their peaks: mesokurtic, leptokurtic and platykurtic. A leptokurtic curve is more peaked than the normal curve, a platykurtic curve is flatter, and a mesokurtic curve has the same peakedness as the normal curve.
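For a quick numerical check of both measures, the sketch below uses SciPy's sample skewness and (excess) kurtosis on an invented data set; a positive skewness suggests a right-skewed distribution and an excess kurtosis near 0 suggests a mesokurtic shape.

```python
from scipy.stats import skew, kurtosis

# Sample skewness and excess kurtosis for an illustrative data set.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]   # a slightly right-skewed set

print(f"skewness = {skew(data):.3f}")             # > 0 suggests positive skew
print(f"excess kurtosis = {kurtosis(data):.3f}")  # 0 for a normal curve
```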

Describe a Frequency Distribution:

 To describe the major characteristics of a frequency distribution, we need to calculate the following five quantities:

 The number of observations, which describes the size of the data.

 A measure of central tendency such as the mean or median that provides information about the
center or average value.

 A measure of dispersion such as standard deviation that indicates the variability of the data.

 A measure of skewness that shows the lack of symmetry in the frequency distribution.

 A measure of kurtosis that gives information about its peakedness.

MIDS syllabus

Statistics

Statistics is the branch of science that deals with

• Collection of data

• Summarization of data

• Analysis of data

• Presentation of data and

• Interpretation of data

OR

It is the science concerned with the collection, presentation, and analysis of data in order to draw valid inferences from the given data.

Data

Data are pieces of information, usually numerical, that are collected through observation or experiment.

Uses of statistics

◦ Statistics are used to organize and summarize the information so that the researcher can see
what happened in the research study and can communicate the results to others.

◦ Statistics helps in collecting appropriate quantitative data.

◦ To make predictions and decisions regarding future outcomes.

◦ To estimate the unknown quantities.

◦ To establish associations between factors.

◦ Quality testing is another use of statistics in every area of life.

Why statistics
Knowledge in statistics provides you with the necessary tools and conceptual foundations in quantitative
reasoning to extract information intelligently from this sea of data.

• Statistical methods and analyses are often used to communicate research findings and to support
hypotheses and give credibility to research methodology and conclusions.

• It is important for researchers and also consumers of research to understand statistics so that they can be informed, evaluate the credibility and usefulness of information, and make appropriate decisions.

Role of statistics in social work:


◦ Understanding statistical concepts is essential for social work professionals. It is key to
understanding research and reaching evidence-based decisions in your own practice.

◦ When conducting social work research with the goal of advancing the knowledge in the field,
statistics is an essential tool that enables social workers to draw a story out of the mountains of
statistical data unearthed. According to the definition of statistics, it is the science of collecting,
analyzing, summarizing, and making inferences from data sets. Since conducting research means
you have to make sense of all the data compiled, statistics are enormously important for
drawing accurate conclusions about the topic being examined in the research.

Branches of Statistics

There are two branches of statistics

◦ Descriptive Statistics

◦ Inferential Statistics

Descriptive statistics are statistical procedures used to summarize, organize or simplify data. They provide a description of a population through numerical calculations, graphs or tables.

Inferential statistics consists of techniques that allow us to study samples and then make
generalizations about the population from which they are selected.
