AMIT SINGH/MBA MMS/SSJCET20024/S. S JHONDHALE COLLEGE

AMIT OMPRAKASH SINGH

SSJCET20024

MBA FY MMS 2021

SHIVAJIRAO S. JHONDHALE COLLEGE

BUSINESS STATISTICS

1. Distinguish between point estimation and interval estimation.


• Point estimate. A point estimate of a population parameter is a single value of a
statistic.
• For example, the sample mean x is a point estimate of the population mean μ.
Similarly, the sample proportion p is a point estimate of the population proportion P.

• Interval estimate. An interval estimate is defined by two numbers, between
which a population parameter is said to lie.
• For example, a < x < b is an interval estimate of the population mean μ. It indicates
that the population mean is greater than a but less than b.

2. Explain how an interval estimate is better than a point estimate.


• Point estimation gives us a particular value as an estimate of the population
parameter.
• Interval estimation gives us a range of values which is likely to contain the population
parameter. This interval is called a confidence interval.
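The two kinds of estimate can be sketched in a few lines of Python. The sample values below are illustrative (not from the text), and the normal critical value 1.96 is used as an approximation for the 95% confidence level:

```python
import statistics
from statistics import NormalDist

# Hypothetical sample (illustrative values, not from the text)
sample = [48, 52, 55, 47, 50, 53, 49, 51, 54, 46,
          50, 52, 48, 53, 49, 51, 47, 55, 50, 52]

n = len(sample)
x_bar = statistics.mean(sample)        # point estimate of the population mean
s = statistics.stdev(sample)           # sample standard deviation

# Interval estimate: a 95% confidence interval a < mu < b
z = NormalDist().inv_cdf(0.975)        # about 1.96
margin = z * s / n ** 0.5
a, b = x_bar - margin, x_bar + margin

print(f"point estimate: {x_bar:.2f}")
print(f"interval estimate: ({a:.2f}, {b:.2f})")
```

The point estimate is a single number, while the interval estimate attaches a stated level of confidence to a whole range of plausible values.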

3. What do you understand by 'Central Tendency'?


• Central tendency is a descriptive summary of a dataset through a single value that
reflects the centre of the data distribution. Along with the variability (dispersion) of a
dataset, central tendency is a branch of descriptive statistics.
• The central tendency is one of the most quintessential concepts in statistics.
• Mean (Average): Represents the sum of all values in a dataset divided by the total
number of the values.

• Median: The middle value in a dataset that is arranged in ascending order (from the
smallest value to the largest value). If a dataset contains an even number of values,
the median of the dataset is the mean of the two middle values.
• Mode: Defines the most frequently occurring value in a dataset. In some cases, a
dataset may contain multiple modes while some datasets may not have any mode at
all.
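The three measures can be computed with Python's standard `statistics` module; the small dataset below is illustrative:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]        # six values -> an even count

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(data)  # even count: mean of the two middle values, (3 + 5) / 2 = 4
mode = statistics.mode(data)      # most frequently occurring value: 3

print(mean, median, mode)
```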

4. Under what conditions is the median more suitable than other measures of central
tendency?
• The median is usually preferred to other measures of central tendency when your
data set is skewed (i.e., forms a skewed distribution) or you are dealing with ordinal
data. However, the mode can also be appropriate in these situations, but is
not as commonly used as the median.
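A small sketch makes the point concrete. In the right-skewed (illustrative) incomes below, one extreme value drags the mean upward while the median stays representative:

```python
import statistics

# Right-skewed data: one extreme income pulls the mean upward.
incomes = [20, 22, 25, 26, 28, 30, 250]   # illustrative figures

print(statistics.mean(incomes))    # about 57.3 -- distorted by the outlier
print(statistics.median(incomes))  # 26 -- still describes a typical value
```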

5. "Every average has its own peculiar characteristics. It is difficult to say which average is
the best." Explain with examples.
• No single average is best in all situations; each has its own peculiar characteristics, and
the choice depends on the nature of the data and the purpose of the analysis.
• The mean uses every observation and lends itself to further algebraic treatment, but it is
distorted by extreme values. For example, in the incomes 20, 22, 25, 26, 28, 30 and 250,
the mean of about 57 misrepresents a typical income.
• The median is unaffected by extreme values and suits skewed or ordinal data, but it
ignores the magnitudes of the other observations. In the example above, the median of
26 describes a typical income better than the mean does.
• The mode identifies the most frequent value and works even for categorical data (e.g.
the most commonly sold shoe size), but a dataset may have no mode or several modes.
• Hence it is difficult to declare any one average "the best"; the appropriate average
depends on the context.

6. What do you mean by dispersion? What are the different measures of dispersion?
• Dispersion is the state of being dispersed or spread. Statistical dispersion means the
extent to which numerical data are likely to vary about an average value. In other
words, dispersion helps to understand the distribution of the data.
• Types of Measures of Dispersion

There are two main types of dispersion methods in statistics which are:

• Absolute Measure of Dispersion

• Relative Measure of Dispersion

• Absolute Measure of Dispersion- An absolute measure of dispersion is expressed in the
same unit as the original data set. It expresses the variation in terms of the average
deviation of observations, such as the standard deviation or mean deviation. It includes
the range, standard deviation, quartile deviation, etc.
• The types of absolute measures of dispersion are:

1. Range: It is simply the difference between the maximum value and the minimum value
given in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 - 1 = 6
2. Variance: Subtract the mean from each value in the set, square each difference, add
the squares, and divide by the total number of values in the data set.
Variance: σ² = Σ(X − μ)² / N
3. Standard Deviation: The square root of the variance, i.e. S.D. (σ) = √(σ²).
4. Quartiles and Quartile Deviation: The quartiles are values that divide a list of numbers
into quarters. The quartile deviation is half of the distance between the third and the
first quartile.
5. Mean and Mean Deviation: The average of numbers is known as the mean and the
arithmetic mean of the absolute deviations of the observations from a measure of
central tendency is known as the mean deviation (also called mean absolute
deviation).
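The absolute measures listed above can all be computed with the standard library; the data reuse the range example from the text (note that `statistics.quantiles` uses the "exclusive" interpolation method by default, one of several conventions for quartiles):

```python
import statistics

data = [1, 3, 5, 6, 7]                              # the range example from the text

rng = max(data) - min(data)                         # range = 7 - 1 = 6
mu = statistics.mean(data)                          # mean = 4.4
var = statistics.pvariance(data)                    # sum((x - mu)^2) / N = 4.64
sd = statistics.pstdev(data)                        # standard deviation = sqrt(variance)
q1, q2, q3 = statistics.quantiles(data, n=4)        # quartiles (exclusive method)
qd = (q3 - q1) / 2                                  # quartile deviation
md = sum(abs(x - mu) for x in data) / len(data)     # mean (absolute) deviation

print(rng, var, sd, qd, md)
```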

7. Why is the standard deviation the most widely used measure of dispersion? Explain.
• Standard deviation is considered to be the best measure of dispersion and is
therefore, the most widely used measure of dispersion.
• It is based on all values and thus, provides information about the complete series.
Because of this reason, a change in even one value affects the value of standard
deviation.
• It is independent of origin but not of scale.
• It is useful in advanced statistical calculations, such as comparing the variability of two
data sets.
• It can be used in testing of hypothesis.
• It is capable of further algebraic treatment.
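The property "independent of origin but not of scale" can be verified directly (a small sketch with arbitrary numbers): shifting every value leaves the standard deviation unchanged, while multiplying every value multiplies it by the same factor.

```python
import math
import statistics

data = [4, 8, 6, 5, 3]
sd = statistics.pstdev(data)

shifted = [x + 100 for x in data]   # change of origin: SD is unchanged
scaled = [3 * x for x in data]      # change of scale: SD is multiplied by 3

print(math.isclose(statistics.pstdev(shifted), sd))      # True
print(math.isclose(statistics.pstdev(scaled), 3 * sd))   # True
```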

8. Define Skewness and Dispersion.


• Skewness is a measure of asymmetry of distribution about a certain point. A
distribution may be mildly asymmetric, strongly asymmetric, or symmetric. The
measure of asymmetry of a distribution is computed using skewness. In case of a
positive skewness, the distribution is said to be right-skewed and when the skewness
is negative, the distribution is said to be left-skewed. If the skewness is zero, the
distribution is symmetric. Skewness is measured on the basis of the Mean, Median, and
Mode. The value of skewness can be positive, negative, or undefined, depending on
whether the data points are skewed to the left or to the right.
• In statistics, dispersion is a measure of how spread out the data are, meaning it specifies
how the values within a data set differ from one another in size. It is the extent to which
a statistical distribution is spread around a central point, and it mainly determines the
variability of the items of a data set around that point. Simply put, it measures
the degree of variability around the mean value. The measures of dispersion are
important to determine the spread of data around a measure of location. For example,
the variance is a standard measure of dispersion which specifies how the data is
distributed about the mean. Other measures of dispersion are Range and Average
Deviation.

9. Define Kurtosis and Moments.


• Kurtosis is a statistical measure that is used to describe a distribution. Whereas
skewness differentiates extreme values in one versus the other tail, kurtosis measures
extreme values in either tail. Distributions with large kurtosis exhibit tail data
exceeding the tails of the normal distribution (e.g., five or more standard deviations
from the mean). Distributions with low kurtosis exhibit tail data that are generally less
extreme than the tails of the normal distribution.
• The moments of a function are quantitative measures related to the shape of the
function's graph. If the function represents mass, then the first moment is the center
of the mass, and the second moment is the rotational inertia. If the function is a
probability distribution, then the first moment is the expected value, the second
central moment is the variance, the third standardized moment is the skewness, and
the fourth standardized moment is the kurtosis. The mathematical concept is closely
related to the concept of moment in physics.
• For a distribution of mass or probability on a bounded interval, the collection of all the
moments (of all orders, from 0 to ∞) uniquely determines the distribution (Hausdorff
moment problem). The same is not true on unbounded intervals.
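The first four moments can be computed directly from their definitions; the data below are arbitrary illustrative values.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
m1 = sum(data) / n                              # first moment: the mean = 5.0
m2 = sum((x - m1) ** 2 for x in data) / n       # second central moment: variance = 4.0
m3 = sum((x - m1) ** 3 for x in data) / n       # third central moment
m4 = sum((x - m1) ** 4 for x in data) / n       # fourth central moment

skewness = m3 / m2 ** 1.5   # third standardized moment
kurtosis = m4 / m2 ** 2     # fourth standardized moment (equals 3 for a normal curve)

print(m1, m2, skewness, kurtosis)
```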

10. “Correlation and Regression are two sides of the same coin”. Explain.
• Correlation and regression are two sides of the same statistical coin. When you
measure the linear correlation of two variables, what you are in effect doing is laying
out a straight line that best fits the average "together-movement" of
these two variables.

11. What do you understand by linear regression?


• Linear regression attempts to model the relationship between two variables by fitting
a linear equation to observed data. One variable is considered to be an explanatory
variable, and the other is considered to be a dependent variable.

12. What is a scatter diagram? How does it help in studying correlation between two
variables, in respect of both its nature and extent?
• A scatter diagram (or scatter plot) is a technique used to examine the relationship
between two variables plotted along the X and Y axes. If the variables are correlated,
the points will fall along a line or curve, giving an idea of the nature of the relationship.
• The pattern of the points indicates both the nature and the extent of correlation:
• Perfect positive correlation; perfect negative correlation.
• High degree of positive correlation; high degree of negative correlation.
• Low degree of positive correlation; low degree of negative correlation.
• No correlation (points scattered with no discernible pattern).

13. Write short note on Karl Pearson’s coefficient of correlation.


• Karl Pearson’s coefficient of correlation is an extensively used mathematical method
in which the numerical representation is applied to measure the level of relation
between linearly related variables. The coefficient of correlation is expressed by “r”.
• Karl Pearson Correlation Coefficient Formula:
r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]
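The standard Pearson formula (covariance over the product of the deviation norms) can be sketched from scratch; the function name and the sample pairs below are illustrative:

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r: sum of co-deviations over the product of deviation norms."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sxy / (sx * sy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # close to 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # close to -1.0 (perfect negative)
```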

14. Write short note on Spearman’s Rank Correlation Coefficient.


• Spearman's Rank Correlation Coefficient is a non-parametric statistical measure
used to study the strength of association between the two ranked variables. This
method is applied to the ordinal set of numbers, which can be arranged in order, i.e.
one after the other so that ranks can be given to each.
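A minimal sketch of the rank correlation, using the usual formula ρ = 1 − 6Σd²/(n(n² − 1)); note this d²-based formula is exact only when there are no ties (ties here are given averaged ranks, an assumption of this sketch):

```python
def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    ordered = sorted(values)
    return [(sum(i + 1 for i, v in enumerate(ordered) if v == x) / ordered.count(x))
            for x in values]

def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); exact when there are no ties."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Two judges ranking four candidates (illustrative scores)
print(spearman_rho([10, 20, 30, 40], [1, 3, 2, 4]))   # 0.8
```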

15. Calculate the median for the following data.


Monthly Wages (Rs)    No. of workers
800-1,000             18
1,000-1,200           25
1,200-1,400           30
1,400-1,600           34
1,600-1,800           26
1,800-2,000           10

Monthly wages    Frequency    Cumulative frequency
800-1,000        18           18
1,000-1,200      25           43
1,200-1,400      30           73
1,400-1,600      34           107
1,600-1,800      26           133
1,800-2,000      10           143

Now, the median is the value of the (N/2)th = (143/2) = 71.5th item, which lies in the class
(1,200-1,400). Thus (1,200-1,400) is the median class. To determine the median within this
class, we use the interpolation formula:

M = L1 + ((N/2 − C) / f) × (L2 − L1)

where L1 and L2 are the lower and upper limits of the median class, C is the cumulative
frequency of the class preceding it, and f is the frequency of the median class.

M = 1,200 + ((71.5 − 43) / 30) × 200

= Rs 1,390
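The interpolation can be checked with a short script (a sketch of the same grouped-median calculation; note the arithmetic works out to Rs 1,390):

```python
# (lower limit, upper limit, frequency) for each wage class
classes = [(800, 1000, 18), (1000, 1200, 25), (1200, 1400, 30),
           (1400, 1600, 34), (1600, 1800, 26), (1800, 2000, 10)]

N = sum(f for _, _, f in classes)   # 143
half = N / 2                        # 71.5

cum = 0
for lower, upper, f in classes:
    if cum + f >= half:             # median class: cumulative frequency reaches N/2
        median = lower + (half - cum) * (upper - lower) / f
        break
    cum += f

print(median)   # 1200 + 28.5 * 200 / 30 = 1390.0
```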

16. What do you understand by non-parametric tests?


• In statistics, nonparametric tests are methods of statistical analysis that do not require
a distribution to meet the required assumptions to be analyzed (especially if the data
is not normally distributed). Due to this reason, they are sometimes referred to as
distribution-free tests. Nonparametric tests serve as an alternative to parametric tests
such as T-test or ANOVA that can be employed only if the underlying data satisfies
certain criteria and assumptions.
• The underlying data do not meet the assumptions about the population sample-
Generally, the application of parametric tests requires various assumptions to be
satisfied, for example that the data follow a normal distribution and that the population
variance is homogeneous. However, some data samples may show skewed distributions.
• The skewness makes parametric tests less powerful because the mean is no longer the
best measure of central tendency; it is strongly affected by the extreme values. At the
same time, nonparametric tests work well with skewed distributions
and distributions that are better represented by the median.
• The population sample size is too small-The sample size is an important assumption in
selecting the appropriate statistical method. If a sample size is reasonably large, the
applicable parametric test can be used. However, if a sample size is too small, it is
possible that you may not be able to validate the distribution of the data. Thus, the
application of nonparametric tests is the only suitable option.
• The analyzed data is ordinal or nominal-Unlike parametric tests that can work only
with continuous data, nonparametric tests can be applied to other data types such as
ordinal or nominal data. For such types of variables, the nonparametric tests are the
only appropriate solution.

17. What are Type I and Type II Errors in hypothesis testing?


• Type I error-When the null hypothesis is true and you reject it, you make a type I error.
The probability of making a type I error is α, which is the level of significance you set
for your hypothesis test. An α of 0.05 indicates that you are willing to accept a 5%
chance that you are wrong when you reject the null hypothesis. To lower this risk, you
must use a lower value for α. However, using a lower value for alpha means that you
will be less likely to detect a true difference if one really exists.
• Type II error-When the null hypothesis is false and you fail to reject it, you make a type
II error. The probability of making a type II error is β, which depends on the power of
the test. You can decrease your risk of committing a type II error by ensuring your test
has enough power. You can do this by ensuring your sample size is large enough to
detect a practical difference when one truly exists.
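The meaning of α can be illustrated by simulation: when the null hypothesis is true, a test at α = 0.05 should wrongly reject (commit a Type I error) about 5% of the time. The sketch below assumes a two-sided z-test with known σ = 1 and illustrative sample sizes:

```python
import random
from statistics import NormalDist

random.seed(42)                                 # reproducible illustration
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for a two-sided test

n, trials, rejections = 30, 2000, 0
for _ in range(trials):
    # H0 is TRUE here: the population mean really is 0 (sigma known to be 1)
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / n ** 0.5)
    if abs(z) > z_crit:
        rejections += 1                         # a Type I error

print(rejections / trials)                      # close to alpha = 0.05
```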

18. What is a Test Statistic? What are the commonly used test statistics in hypothesis
testing?
• A test statistic is a statistic (a quantity derived from the sample) used in statistical
hypothesis testing. A hypothesis test is typically specified in terms of a test statistic,
considered as a numerical summary of a data set that reduces the data to one
value that can be used to perform the hypothesis test.
• Commonly used test statistics include the z-statistic, the t-statistic, the F-statistic,
and the chi-square statistic.

• Concept of null hypothesis- A classic use of a statistical test occurs in process control
studies. For example, suppose that we are interested in ensuring that photomasks in
a production process have mean linewidths of 500 micrometers. The null hypothesis,
in this case, is that the mean linewidth is 500 micrometers. Implicit in this statement
is the need to flag photomasks which have mean linewidths that are either much
greater or much less than 500 micrometers. This translates into the alternative
hypothesis that the mean linewidths are not equal to 500 micrometers. This is a two-
sided alternative because it guards against alternatives in opposite directions; namely,

that the linewidths are too small or too large. The testing procedure works this way.
Linewidths at random positions on the photomask are measured using a scanning
electron microscope. A test statistic is computed from the data and tested against pre-
determined upper and lower critical values. If the test statistic is greater than the
upper critical value or less than the lower critical value, the null hypothesis is rejected
because there is evidence that the mean linewidth is not 500 micrometers.

• One-sided tests of hypothesis- Null and alternative hypotheses can also be one-sided.
For example, to ensure that a lot of light bulbs has a mean lifetime of at least 500
hours, a testing program is implemented. The null hypothesis, in this case, is that the
mean lifetime is greater than or equal to 500 hours. The complement or alternative
hypothesis that is being guarded against is that the mean lifetime is less than 500
hours. The test statistic is compared with a lower critical value, and if it is less than
this limit, the null hypothesis is rejected.
• Thus, a statistical test requires a pair of hypotheses; namely,
• H0: a null hypothesis
• Ha: an alternative hypothesis.
• Significance levels- The null hypothesis is a statement about a belief. We may doubt
that the null hypothesis is true, which might be why we are "testing" it. The alternative
hypothesis might, in fact, be what we believe to be true. The test procedure is
constructed so that the risk of rejecting the null hypothesis, when it is in fact true, is
small. This risk, α, is often referred to as the significance level of the test. By having a
test with a small value of α, we feel that we have actually "proved" something when
we reject the null hypothesis.

19. Distinguish between a One-tailed and Two-tailed test, give a diagram and an example
in each case.
• One tailed test- If you are using a significance level of .05, a one-tailed test allots all
of your alpha to testing the statistical significance in the one direction of interest. This
means that .05 is in one tail of the distribution of your test statistic. When using a one-
tailed test, you are testing for the possibility of the relationship in one direction and
completely disregarding the possibility of a relationship in the other direction. Let’s
return to our example comparing the mean of a sample to a given value x using a t-
test. Our null hypothesis is that the mean is equal to x. A one-tailed test will test either
if the mean is significantly greater than x or if the mean is significantly less than x, but
not both. Then, depending on the chosen tail, the mean is significantly greater than or
less than x if the test statistic is in the top 5% of its probability distribution or bottom
5% of its probability distribution, resulting in a p-value less than 0.05. The one-tailed
test provides more power to detect an effect in one direction by not testing the effect
in the other direction. A discussion of when this is an appropriate option follows.

• Two tailed test- If you are using a significance level of 0.05, a two-tailed test allots half
of your alpha to testing the statistical significance in one direction and half of your
alpha to testing statistical significance in the other direction. This means that .025 is

in each tail of the distribution of your test statistic. When using a two-tailed test,
regardless of the direction of the relationship you hypothesize, you are testing for the
possibility of the relationship in both directions. For example, we may wish to
compare the mean of a sample to a given value x using a t-test. Our null hypothesis is
that the mean is equal to x. A two-tailed test will test both if the mean is significantly
greater than x and if the mean is significantly less than x. The mean is considered
significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of
its probability distribution, resulting in a p-value less than 0.05.
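The difference shows up directly in the p-value: with the same observed test statistic, a one-tailed test can reject while the two-tailed test does not. The sketch below assumes a standard normal test statistic and an illustrative observed value z = 1.80:

```python
from statistics import NormalDist

norm = NormalDist()
z = 1.80                          # an observed test statistic (illustrative)

p_one = 1 - norm.cdf(z)           # one-tailed p-value: the upper tail only
p_two = 2 * (1 - norm.cdf(z))     # two-tailed p-value: both tails

print(round(p_one, 4))            # about 0.036 -> rejects at alpha = 0.05
print(round(p_two, 4))            # about 0.072 -> fails to reject at alpha = 0.05
```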

20. What do you mean by Critical Region and Acceptance Region of a test?


• The critical region is the region of values that corresponds to the rejection of the null
hypothesis at some chosen probability level. The shaded area under the Student's t
distribution curve is equal to the level of significance. The critical values are tabulated
and thus obtained from the Student's t table or another appropriate table. If the
absolute value of the t statistic is larger than the tabulated value, then t is in the critical
region.
• The acceptance region is the interval within the sampling distribution of the test
statistic that is consistent with the null hypothesis H0 from hypothesis testing. It is the
complementary region to the rejection region. The acceptance region is associated
with a probability 1−α.
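For a two-sided z-test at α = 0.05, the critical values and the probability carried by the acceptance region can be sketched with the standard library (the normal distribution is assumed here for illustration; for small samples a t table would be used instead):

```python
from statistics import NormalDist

alpha = 0.05
norm = NormalDist()

# Two-sided test: the critical region is everything outside (lower, upper)
lower = norm.inv_cdf(alpha / 2)        # about -1.96
upper = norm.inv_cdf(1 - alpha / 2)    # about +1.96

# The acceptance region (lower, upper) carries probability 1 - alpha
coverage = norm.cdf(upper) - norm.cdf(lower)
print(lower, upper, coverage)
```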
