1. Many business decisions involve the relationship between two or more variables.
E.g., what determines sales levels?
2. Regression analysis is used to develop an equation showing how the variables are
related.
3. The variable being predicted is called the dependent variable and is denoted by y. The
variables used to predict the value of the dependent variable are called the
independent variables and are denoted by x.
4. The average relationship between a dependent and independent variable is called a
regression.
5. The dependent variable is assumed to be a random variable, whereas the independent
variables are assumed to have fixed values.
Dependent Variable
Independent Variable
A variable on the basis of which the dependent variable is to be estimated is called an independent
variable. The independent variable is also called the regressor, predictor or explanatory variable. It is
denoted by X.
Regression Line/Equation
$\hat{Y} = a + bX$
1. a is the intercept of the regression line.
2. b is the slope parameter.
n is the total number of observations.
Regression Line/Equation
If X is the independent variable and Y is the dependent variable, then the relationship
described by a straight line $\hat{Y} = a + bX$ is called a regression line, which is used to find a linear
relationship between the two variables. For example, the relation between the Celsius and Fahrenheit
scales (temperatures), given by F = 32 + 1.8C, is a linear relation.
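As a minimal sketch of how a and b can be obtained from data, the Python below fits the line by least squares. The slope and intercept formulas are the standard least-squares ones and are an assumption here, since these notes do not state them; the function name `fit_line` and the sample temperatures are illustrative only.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)  # number of (X, Y) observations
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: standard least-squares formula (assumed, not given in the notes).
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the fitted line passes through (mean of X, mean of Y).
    a = sum_y / n - b * sum_x / n
    return a, b


# Illustrative data generated from F = 32 + 1.8C.
celsius = [0, 10, 20, 30, 40]
fahrenheit = [32 + 1.8 * c for c in celsius]

a, b = fit_line(celsius, fahrenheit)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```

Because the data lie exactly on a straight line, the fitted intercept and slope should recover 32 and 1.8 up to rounding error.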
The relation between X and Y depends on the value of b, so the following types of relationship
can exist in regression.
Positive relation
If the value of b is positive, there is a positive relation between X and Y: if X
increases, Y increases, and if X decreases, Y also decreases.
Negative relation
If the value of b is negative, there is a negative relation between X and Y: if
X increases, Y decreases, and if X decreases, Y increases.
No relation
Correlation
Correlation is a linear association between two random variables.
Correlation analysis shows us how to determine both the nature and the strength of the relationship
between two variables.
When changes in one variable appear to be linked with changes in the other variable,
the two variables are said to correlate.
In other words, the coefficient of correlation is a value that tells the
strength of the relationship between variables X and Y, i.e. how strongly positive or negative the
relation is.
$r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}$
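The coefficient of correlation can be computed directly from the sums in the formula above. The following is a small sketch; the function name `pearson_r` and the data are illustrative assumptions, not part of the notes.

```python
import math


def pearson_r(xs, ys):
    """Coefficient of correlation r from the raw sums over n pairs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x: perfect positive relation
y_down = [10, 8, 6, 4, 2]  # falls as x rises: perfect negative relation

print(pearson_r(x, y_up))    # 1.0
print(pearson_r(x, y_down))  # -1.0
```

Values of r near +1 or -1 indicate a strong positive or negative linear relation, and values near 0 indicate a weak one.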
Statistical Inference
The main objective of sampling is to draw conclusions about the unknown population from the
information provided by a sample. This is called statistical inference.
i) Estimation of parameter
1. Estimation of Parameter
Statistical inference about the unknown value of the population parameter is called
estimation of parameter. Suppose we are interested to know the average life of tires of a certain
firm. This means we want an estimate of something which is not known to us. So it is the
problem of estimation.
Important Terms
1) Estimate
The value an estimator takes when calculated using an actual sample of data.
For example, the value of $\bar{X}$ computed from a sample is an estimate of the population mean μ.
2) Estimator
An estimator is a rule or formula that tells how to calculate an estimate based on the
measurements contained in a sample. For example, $\bar{X} = \frac{\sum X}{n}$ is an estimator.
3) Estimation
The computation of a statistic from sample data for the purposes of obtaining a guess of the
unknown population parameter value.
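The distinction between the three terms can be illustrated with a short sketch: the function is the estimator (the rule), calling it on a sample is the estimation, and the number it returns is the estimate. The tire lifetimes below are hypothetical values made up for illustration.

```python
def sample_mean(sample):
    """Estimator: the rule X-bar = (sum of X) / n."""
    return sum(sample) / len(sample)


# Hypothetical tire lifetimes in thousands of km (illustrative data only).
tire_life = [40, 42, 38, 45, 35]

# Estimation: applying the estimator to the sample.
estimate = sample_mean(tire_life)

# Estimate: the resulting value, a guess at the population mean.
print(estimate)  # 40.0
```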
Types of estimation
1. Point estimation
Ex. A firm wishes to estimate the amount of time its salesmen spend on each sales call.
2. Interval Estimation
Testing of hypothesis
Hypothesis testing is a process which is used to check the validity of a statement about
a population parameter.
What is hypothesis?
A statement about the population parameter developed for the purpose of testing.
Types of hypothesis
1. Statistical hypothesis
2. Null hypothesis
A null hypothesis is any hypothesis which is tested for possible rejection or acceptance
under the assumption that it is true.
3. Alternative hypothesis
A statement specifying that the population parameter is some value other than the
one specified under the null hypothesis.
Any statistical analysis that tests a hypothesis rests on the following six steps, which are
known together as the procedure of hypothesis testing.
1. Formulation of hypotheses
The first step in hypothesis testing is to identify the problem and decide which
statement is the null hypothesis and which is the alternative. In notation, the null and
alternative hypotheses can be written as
Null hypothesis                  Alternative hypothesis
$H_0: \theta = \theta_0$         $H_1: \theta \neq \theta_0$
$H_0: \theta \leq \theta_0$      $H_1: \theta > \theta_0$
$H_0: \theta \geq \theta_0$      $H_1: \theta < \theta_0$
where $\theta$ is the population parameter and $\theta_0$ is the value of the parameter specified under the null hypothesis.
2. Level of significance
3. Test statistics
A statistic used as a basis for deciding whether the null hypothesis should be rejected is called a
test statistic.
4. Critical region
The critical region, or rejection region, is determined by $H_1$. The size of the critical region is equal to α.
5. Computation
The relevant test-statistic is calculated from the sample data. The calculated value is to be
compared with the tabulated value.
6. Conclusion
If the calculated value of the test statistic lies in the rejection region, the null hypothesis $H_0$ is
rejected and $H_1$ is accepted.
If the calculated value of the test statistic does not fall in the rejection region, then $H_0$ is
not rejected (loosely, $H_0$ is accepted).
Z-test
Hypothesis testing of a population mean (when σ is known)
A large sample of size n > 30 is selected from the population and the sample mean $\bar{X}$ is calculated. The
procedure used to test a specified value of µ, i.e. $\mu_0$, with this kind of information is called the Z-test.
The test procedure for the Z-test is given below.
Procedure:
1. We frame the null and alternative hypothesis. Three different forms of null and alternative hypothesis
are possible which are:
a) $H_0: \mu = \mu_0$ and $H_1: \mu \neq \mu_0$
b) $H_0: \mu \leq \mu_0$ and $H_1: \mu > \mu_0$
c) $H_0: \mu \geq \mu_0$ and $H_1: \mu < \mu_0$
2. The level of significance α is decided; common choices are 1%, 5% and 10%.
3. Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
4. Critical Region
A table that can be used to look up the tabulated value at different values of α is given below:
α    Two-sided (α/2)    One-sided right (α)    One-sided left (−α)
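The tabulated values for these three columns can also be computed from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_critical` and the dictionary keys are assumptions made for illustration.

```python
from statistics import NormalDist


def z_critical(alpha):
    """Critical values of the standard normal for a given alpha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return {
        # Two-sided: alpha/2 in each tail, reject if |Z| exceeds this.
        "two_sided": std_normal.inv_cdf(1 - alpha / 2),
        # One-sided right: reject if Z exceeds this.
        "right": std_normal.inv_cdf(1 - alpha),
        # One-sided left: reject if Z is below this (a negative value).
        "left": std_normal.inv_cdf(alpha),
    }


for alpha in (0.01, 0.05, 0.10):
    c = z_critical(alpha)
    print(
        f"{alpha:.2f}  +/-{c['two_sided']:.3f}  "
        f"{c['right']:.3f}  {c['left']:.3f}"
    )
```

For α = 0.05, for example, this gives approximately ±1.960 for the two-sided test and 1.645 and -1.645 for the one-sided tests.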
5. Calculation
Substitute the sample information into the test statistic and compute its value.
6. Conclusion
If the calculated value of Z falls in the critical region, then the null hypothesis is
rejected on the basis of the information provided, and we conclude that the specified value $\mu_0$ is not a correct
value for the population mean.
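The six steps of the Z-test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `z_test`, the labels used for the three forms of alternative hypothesis, and the numbers in the example are all hypothetical.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (Z, reject_H0) for a one-sample Z-test with known sigma."""
    # Steps 3 and 5: compute the test statistic.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    # Step 4: the critical region depends on the form of H1.
    if alternative == "two-sided":    # H1: mu != mu0
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":    # H1: mu > mu0
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":       # H1: mu < mu0
        reject = z < std_normal.inv_cdf(alpha)
    else:
        raise ValueError("alternative must be two-sided, greater or less")
    # Step 6: True means Z falls in the critical region.
    return z, reject


# Hypothetical example: n = 36, sample mean 52, sigma = 6, H0: mu = 50.
z, reject = z_test(sample_mean=52, mu0=50, sigma=6, n=36)
print(z, reject)  # 2.0 True
```

Here Z = 2.0 exceeds the two-sided 5% critical value of about 1.96, so the null hypothesis is rejected in this made-up example.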
Suppose there are two populations with means $\mu_1$ and $\mu_2$, which are unknown, and variances
$\sigma_1^2$ and $\sigma_2^2$, which are known. Two large samples of sizes $n_1$ and $n_2$ (each greater than 30) are selected from the
populations and the sample means $\bar{X}_1$ and $\bar{X}_2$ are calculated. The procedure used to test whether
the two population means are identical with this kind of information is called the two-sample
Z-test. The test procedure for the two-sample Z-test is given below.
Procedure:
1. We frame the null and alternative hypothesis. Three different forms of null and alternative hypothesis
are possible which are:
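a) $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 \neq 0$
b) $H_0: \mu_1 - \mu_2 \leq 0$ and $H_1: \mu_1 - \mu_2 > 0$
c) $H_0: \mu_1 - \mu_2 \geq 0$ and $H_1: \mu_1 - \mu_2 < 0$
OR
a) $H_0: \mu_1 = \mu_2$ and $H_1: \mu_1 \neq \mu_2$
b) $H_0: \mu_1 \leq \mu_2$ and $H_1: \mu_1 > \mu_2$
c) $H_0: \mu_1 \geq \mu_2$ and $H_1: \mu_1 < \mu_2$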
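The notes do not give the test statistic for the two-sample case. As an assumption, the sketch below uses the usual statistic for two independent samples with known variances, Z = (X̄1 − X̄2) / sqrt(σ1²/n1 + σ2²/n2), under the hypothesis that the two means are equal. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Z statistic for H0: mu1 - mu2 = 0 with known population variances.

    Assumed formula (not stated in the notes):
    Z = (mean1 - mean2) / sqrt(var1/n1 + var2/n2)
    """
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical example with two large samples.
z = two_sample_z(mean1=75, mean2=72, var1=64, var2=36, n1=64, n2=36)

# Two-sided test at the 5% level: reject H0 if |Z| exceeds the critical value.
critical = NormalDist().inv_cdf(1 - 0.05 / 2)
print(f"Z = {z:.3f}, critical = {critical:.3f}, reject H0: {abs(z) > critical}")
```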
t-test
Suppose a population has mean µ, which is unknown, and standard deviation σ, which is also
unknown. A small sample of size n < 30 is selected from the population and the sample mean $\bar{X}$ and sample
standard deviation s are calculated. The procedure used to test a specified value of µ, i.e. $\mu_0$, with this
kind of information is called the t-test.
Procedure:
1. Formulating Hypothesis: We frame the null and alternative hypothesis. Three different forms of null
and alternative hypothesis are possible which are:
a) $H_0: \mu = \mu_0$ and $H_1: \mu \neq \mu_0$
b) $H_0: \mu \leq \mu_0$ and $H_1: \mu > \mu_0$
c) $H_0: \mu \geq \mu_0$ and $H_1: \mu < \mu_0$
2. The level of significance α is decided; common choices are 1%, 5% and 10%.
3. Test statistic: $t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$
where $\bar{X} = \frac{\sum X}{n}$ and $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
4. Critical Region
5. Calculation
Substitute the sample information into the test statistic and compute its value.
6. Conclusion
If the calculated value of t falls in the critical region, then the null hypothesis is
rejected on the basis of the information provided, and we conclude that the specified value $\mu_0$ is not a correct
value for the population mean.
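The t statistic can be computed with Python's standard `statistics` module, whose `stdev` function uses the n − 1 denominator that appears in the formula for s. The sample below is hypothetical, and the critical value 2.262 is the tabulated two-sided 5% value for n − 1 = 9 degrees of freedom, entered by hand as it would be read from a t table.

```python
import math
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """t = (X-bar - mu0) / (s / sqrt(n)), with s using n - 1."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (x_bar - mu0) / (s / math.sqrt(n))


# Hypothetical small sample (n = 10), testing H0: mu = 12 vs H1: mu != 12.
sample = [12, 14, 11, 13, 15, 12, 14, 13, 12, 14]
t = t_statistic(sample, mu0=12)

# Tabulated value for alpha = 0.05 (two-sided) and 9 degrees of freedom.
t_critical = 2.262
print(f"t = {t:.3f}, reject H0: {abs(t) > t_critical}")
```

In this made-up example the sample mean is 13 and t is about 2.535, which lies in the critical region, so the null hypothesis would be rejected.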
𝝌𝟐 Test
In a chi-squared test, the test statistic follows a chi-square distribution. The test is generally used for three purposes,
which are as follows:
The chi-square statistic is commonly used for testing relationships between categorical
variables. The null hypothesis of the chi-square test is that no relationship exists between the categorical
variables in the population; they are independent.
The term chi-squared test often refers to tests for which the distribution of the test statistic approaches
the $\chi^2$ distribution asymptotically, meaning that the sampling distribution of the test statistic (if the null
hypothesis is true) approximates a chi-squared distribution more and more closely as sample sizes
increase.
The chi-square test for variance is a statistical procedure with a chi-square-distributed
test statistic that is used to determine whether the population variance of a variable, as judged from a
particular sample, equals a specified value $\sigma_0^2$. The test assumes that the sample is drawn from a
normally distributed population. Determining the population variance exactly would require examining the
entire population; in practice it is usually sufficient to estimate it from a representative sample. Because
the test works with the sample variance, the variable being tested must be measured on a numerical scale.
1. Formulating Hypothesis: We frame the null and alternative hypothesis. Three different
forms of null and alternative hypothesis are possible which are:
a) $H_0: \sigma^2 = \sigma_0^2$ and $H_1: \sigma^2 \neq \sigma_0^2$
b) $H_0: \sigma^2 \leq \sigma_0^2$ and $H_1: \sigma^2 > \sigma_0^2$
c) $H_0: \sigma^2 \geq \sigma_0^2$ and $H_1: \sigma^2 < \sigma_0^2$
2. The level of significance α is decided; common choices are 1%, 5% and 10%.
3. Test statistic: $\chi^2 = \frac{(n - 1) s^2}{\sigma_0^2}$
4. Critical Region
5. Calculation
Substitute the sample information into the test statistic and compute its value.
6. Conclusion
If the calculated value of $\chi^2$ falls in the critical region, then the null hypothesis is
rejected on the basis of the information provided, and we conclude that the population variance is not equal to
the specific value we assumed.
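The test statistic for the variance can be sketched in the same way. The sample and the hypothesized variance below are hypothetical, and the critical value 16.919 is the tabulated right-tail 5% value of the chi-square distribution with n − 1 = 9 degrees of freedom, entered by hand as it would be read from a table.

```python
from statistics import variance


def chi_square_variance_statistic(sample, sigma0_sq):
    """chi^2 = (n - 1) * s^2 / sigma0^2, with s^2 the sample variance."""
    n = len(sample)
    s_sq = variance(sample)  # sample variance (n - 1 denominator)
    return (n - 1) * s_sq / sigma0_sq


# Hypothetical sample (n = 10), testing H0: sigma^2 <= 1 vs H1: sigma^2 > 1.
sample = [12.0, 14.0, 11.0, 13.0, 15.0, 12.0, 14.0, 13.0, 12.0, 14.0]
chi2 = chi_square_variance_statistic(sample, sigma0_sq=1.0)

# Tabulated right-tail value for alpha = 0.05 and 9 degrees of freedom.
chi2_critical = 16.919
print(f"chi^2 = {chi2:.3f}, reject H0: {chi2 > chi2_critical}")
```

In this made-up example the statistic is about 14, which does not fall in the critical region, so the null hypothesis is not rejected.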
The more the mean moves away from the mode, the larger the asymmetry, or skewness.
A distribution is said to be 'skewed' when the mean and the median fall at different points in
the distribution and the balance (or centre of gravity) is shifted to one side or the other, to the left
or to the right.
Positive skewness
Mean > Median > Mode
Negative skewness
Mean < Median < Mode
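The two orderings above can be checked on data with a short sketch. The function name, the labels it returns and the two data sets are illustrative assumptions; the "symmetric" and "unclear" branches are additions for completeness and are not taken from the notes.

```python
from statistics import mean, median, mode


def skew_direction(data):
    """Classify skewness by comparing mean, median and mode."""
    m, md, mo = mean(data), median(data), mode(data)
    if m > md > mo:
        return "positive"
    if m < md < mo:
        return "negative"
    if m == md == mo:
        return "symmetric"  # assumed label, not from the notes
    return "unclear"        # the three measures do not follow either ordering


right_tailed = [1, 2, 2, 2, 3, 3, 4, 10]  # long tail to the right
left_tailed = [1, 7, 8, 8, 9, 9, 9, 10]   # long tail to the left

print(skew_direction(right_tailed))  # positive
print(skew_direction(left_tailed))   # negative
```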
KURTOSIS
While skewness signifies the extent of asymmetry, kurtosis measures the degree of peakedness
of a frequency distribution.
Karl Pearson classified curves into three types on the basis of the shape of their peaks:
mesokurtic, leptokurtic and platykurtic.
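The notes do not give a formula for kurtosis. As an assumption, the sketch below uses the moment coefficient β2 = m4 / m2², where m2 and m4 are the second and fourth central moments, with the conventional reading that β2 = 3 is mesokurtic, β2 > 3 is leptokurtic and β2 < 3 is platykurtic. The function name, tolerance and data are illustrative.

```python
def kurtosis_type(data, tol=1e-9):
    """Return (beta2, label) using beta2 = m4 / m2**2 (assumed measure)."""
    n = len(data)
    centre = sum(data) / n
    m2 = sum((x - centre) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - centre) ** 4 for x in data) / n  # fourth central moment
    beta2 = m4 / m2 ** 2
    if abs(beta2 - 3) <= tol:
        label = "mesokurtic"   # peak like the normal curve
    elif beta2 > 3:
        label = "leptokurtic"  # more peaked than the normal curve
    else:
        label = "platykurtic"  # flatter than the normal curve
    return beta2, label


flat = [1, 2, 3, 4, 5]                     # evenly spread values
peaked = [-5, 0, 0, 0, 0, 0, 0, 0, 0, 5]   # most values at the centre

print(kurtosis_type(flat))
print(kurtosis_type(peaked))
```

For these made-up data sets β2 works out to about 1.7 (platykurtic) and 5 (leptokurtic) respectively.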
A measure of central tendency such as the mean or median that provides information about the
center or average value.
A measure of dispersion such as standard deviation that indicates the variability of the data.
A measure of skewness that shows the lack of symmetry in the frequency distribution.
MIDS syllabus
Statistics
• Collection of data
• Summarization of data
• Analysis of data
• Interpretation of data
OR
It is the science concerned with the collection, presentation, and analysis of data and with drawing valid
inferences from the given data.
Data
Data are information, usually numerical, collected through observation or experiment.
Uses of statistics
◦ Statistics are used to organize and summarize the information so that the researcher can see
what happened in the research study and can communicate the results to others.
Why statistics
Knowledge in statistics provides you with the necessary tools and conceptual foundations in quantitative
reasoning to extract information intelligently from this sea of data.
• Statistical methods and analyses are often used to communicate research findings and to support
hypotheses and give credibility to research methodology and conclusions.
• It is important for researchers and also consumers of research to understand statistics so that they can
be informed, evaluate the credibility and usefulness of information, and make appropriate decisions.
◦ When conducting social work research with the goal of advancing the knowledge in the field,
statistics is an essential tool that enables social workers to draw a story out of the mountains of
statistical data unearthed. According to the definition of statistics, it is the science of collecting,
analyzing, summarizing, and making inferences from data sets. Since conducting research means
you have to make sense of all the data compiled, statistics are enormously important for
drawing accurate conclusions about the topic being examined in the research.
Branches of Statistics
◦ Descriptive Statistics
◦ Inferential Statistics
Descriptive statistics are statistical procedures used to summarize, organize or simplify data. They
describe a population through numerical calculations, graphs or tables.
Inferential statistics consists of techniques that allow us to study samples and then make
generalizations about the population from which they are selected.