
IGNOU

MS-08

IMPORTANT QUESTIONS AND ANSWERS

Q1) Discuss the role of Chi-square distribution in testing of hypothesis.

A chi-squared test, also written as a χ² test, is any statistical hypothesis test wherein the
sampling distribution of the test statistic is a chi-squared distribution when the null
hypothesis is true. Without other qualification, 'chi-squared test' is often used as short
for Pearson's chi-squared test.

Chi-squared tests are often constructed from a sum of squared errors, or through the
sample variance. Test statistics that follow a chi-squared distribution arise from an
assumption of independent normally distributed data, which is valid in many cases due
to the central limit theorem. A chi-squared test can be used to attempt rejection of the
null hypothesis that the data are independent.

Also considered a chi-squared test is a test in which this is asymptotically true, meaning
that the sampling distribution (if the null hypothesis is true) can be made to
approximate a chi-squared distribution as closely as desired by making the sample size
large enough. The chi-squared test is used to determine whether there is a significant
difference between the expected frequencies and the observed frequencies in one or
more categories.

Chi-Square goodness of fit test is a non-parametric test that is used to find out whether the
observed values of a given phenomenon differ significantly from the expected
values. In the Chi-Square goodness of fit test, the term goodness of fit is used to compare the
observed sample distribution with the expected probability distribution. The Chi-Square
goodness of fit test determines how well a theoretical distribution (such as normal,
binomial, or Poisson) fits the empirical distribution. In the Chi-Square goodness of fit test,
sample data is divided into intervals. Then the number of points that fall into each
interval is compared with the expected number of points in that interval.

Procedure for Chi-Square Goodness of Fit Test:

Set up the hypothesis for Chi-Square goodness of fit test:


A. Null hypothesis: In Chi-Square goodness of fit test, the null hypothesis assumes that
there is no significant difference between the observed and the expected value.
B. Alternative hypothesis: In Chi-Square goodness of fit test, the alternative hypothesis
assumes that there is a significant difference between the observed and the expected
value.
Compute the value of the Chi-Square goodness of fit test statistic using the following formula:

χ² = Σ (O - E)² / E

where χ² = the Chi-Square goodness of fit test statistic, O = observed value, and E = expected value.
Degrees of freedom: In the Chi-Square goodness of fit test, the degrees of freedom depend
on the distribution of the sample. The following table shows the distribution and the
associated degrees of freedom:

Type of distribution      No. of constraints      Degrees of freedom
Binomial distribution     1                       n - 1
Poisson distribution      2                       n - 2
Normal distribution       3                       n - 3
Hypothesis testing: Hypothesis testing in Chi-Square goodness of fit test is the same as
in other tests, like t-test, ANOVA, etc. The calculated value of Chi-Square goodness of fit
test is compared with the table value. If the calculated value of Chi-Square goodness of
fit test is greater than the table value, we will reject the null hypothesis and conclude
that there is a significant difference between the observed and the expected
frequency. If the calculated value of Chi-Square goodness of fit test is less than the table
value, we will accept the null hypothesis and conclude that there is no significant
difference between the observed and expected value.
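
As a minimal sketch, the same goodness-of-fit procedure can be carried out in Python with SciPy; the observed and expected frequencies below are made-up values used only for illustration:

# A minimal sketch of a Chi-Square goodness of fit test in Python.
# The observed/expected counts are hypothetical, chosen only for illustration.
from scipy.stats import chisquare

observed = [18, 22, 30, 30]   # observed frequencies in each category
expected = [25, 25, 25, 25]   # expected frequencies under the null hypothesis

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print("chi-square statistic:", round(stat, 3))
print("p-value:", round(p_value, 3))
# If p_value is below the chosen significance level (e.g. 0.05),
# reject the null hypothesis of no difference between observed and expected values.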

Q2) Marginal Revenue

ans) Marginal revenue is the increase in revenue that results from the sale of one
additional unit of output. While marginal revenue can remain constant over a certain
level of output, it follows the law of diminishing returns and will eventually slow down
as the output level increases. Perfectly competitive firms continue producing output
until marginal revenue equals marginal cost.

A company experiences its best results when production and sales continue until marginal
revenue equals marginal cost. Marginal cost is the additional cost incurred to produce and
sell one more unit. If marginal revenue exceeds marginal cost, this indicates the company made a
profit on the item sold. When marginal revenue falls below marginal cost, this is an
indicator that it is no longer profitable to produce and sell this good.

Marginal revenue for competitive firms is typically constant. This is because the market
dictates the optimal price level and companies do not have much, if any, discretion
over the price. Marginal revenue works differently for monopolies. Because
monopolies have control over the quantity of available goods in the market, marginal
revenue for a monopoly decreases as additional goods are sold, because selling a larger
quantity requires lowering the price.

Formula

The marginal revenue formula is calculated by dividing the change in total revenue by
the change in quantity sold.

To calculate the change in revenue, we simply subtract the revenue figure before the
last unit was sold from the total revenue after the last unit was sold.
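
A minimal sketch of this calculation in Python, with made-up revenue figures:

# A minimal sketch of the marginal revenue calculation; the figures are hypothetical.
def marginal_revenue(total_revenue_before, total_revenue_after, quantity_change=1):
    """Change in total revenue divided by the change in quantity sold."""
    return (total_revenue_after - total_revenue_before) / quantity_change

# Example: total revenue was 1000 after selling 100 units and 1008 after 101 units.
print(marginal_revenue(1000, 1008))  # 8.0 -> the 101st unit added 8 in revenue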

Q3) Explain Binomial and Normal distribution

ans)

The normal (z) distribution is a continuous distribution that arises in many natural
processes. "Continuous" means that between any two data values we could (at least in
theory) find another data value. For example, men's heights vary continuously and are
the result of so many tiny random influences that the overall distribution of men's
heights in America is very close to normal. Another example is the data values that we
would get if we repeatedly measured the mass of a reference object on a pan balance:
the readings would differ slightly because of random errors, and the readings taken as a
whole would have a normal distribution.

The bell-shaped normal curve has probabilities that are found as the area between any
two z values. You can use either Table A in your textbook or the normalcdf function on
your calculator as a way of finding these normal probabilities.

Not all natural processes produce normal distributions. For example, incomes
in America are the result of random natural capitalist processes, but the result is an
extremely right-skewed distribution.

A binomial distribution is very different from a normal distribution, and yet if the
sample size is large enough, the shapes will be quite similar.

The key difference is that a binomial distribution is discrete, not continuous. In other
words, it is NOT possible to find a data value between any two data values.

The requirements for a binomial distribution are

1) The r.v. of interest is the count of successes in n trials


2) The number of trials (or sample size), n, is fixed
3) Trials are independent, with fixed value p = P(success on a trial)
4) There are only two possible outcomes on each trial, called "success" and "failure."
(This is where the "bi" prefix in "binomial" comes from. If there were several possible
outcomes, we would need to use a multinomial distribution to account for them, but we
don't study multinomial distributions in the beginning AP Statistics course.)
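
As a minimal sketch of the point above that a binomial distribution with a large enough sample size is well approximated by a normal distribution, the following Python snippet (with illustrative values n = 100 and p = 0.5) compares an exact binomial probability with its normal approximation:

# A minimal sketch comparing a binomial distribution with its normal approximation.
# n and p are hypothetical values chosen only for illustration.
from scipy.stats import binom, norm
import math

n, p = 100, 0.5                      # number of trials and success probability
mu = n * p                           # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial

# P(45 <= X <= 55) computed exactly (discrete binomial) ...
exact = binom.cdf(55, n, p) - binom.cdf(44, n, p)
# ... and via the continuous normal approximation with continuity correction.
approx = norm.cdf(55.5, mu, sigma) - norm.cdf(44.5, mu, sigma)

print(round(exact, 4), round(approx, 4))   # the two values are very close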

Q4) Define Hypothesis. Explain various types of errors in testing of Hypothesis.


Describe various steps involved in the "Hypothesis Testing".

ans) In statistics, during a statistical survey or a research study, a hypothesis has to be set and
defined. It is termed a statistical hypothesis. It is actually an assumption about a
population parameter. However, it is not certain that this hypothesis will always prove to be
true. Hypothesis testing refers to the predefined formal procedures that statisticians
use to decide whether to accept or reject hypotheses. Hypothesis testing is defined
as the process of choosing hypotheses for a particular probability distribution, on the
basis of observed data.

Hypothesis testing is a core and important topic in statistics. In research hypothesis
testing, a hypothesis is an optional but important detail of the phenomenon. The null
hypothesis is defined as the hypothesis that the researcher aims to challenge. Generally,
the null hypothesis represents the current explanation or the vision of a feature which
the researcher is going to test. Hypothesis testing includes the tests that are used to
determine the outcomes that would lead to the rejection of a null hypothesis in order to
get a specified level of significance. This helps to know if the results have enough
information, provided that conventional wisdom is being utilized for the establishment
of the null hypothesis.

Hypothesis testing gives the following benefits:

1. They establish the focus and track for a research effort.


2. Their development helps the researcher shape the purpose of the research
movement.
3. They establish which variables will not be measured in a study and, similarly,
those which will be measured.
4. They require the researcher to provide an operational definition of the variables
of interest.

Process of Hypothesis Testing

1. State the hypotheses of importance


2. Determine the suitable test statistic
3. State the level of statistical significance
4. State the decision rule for rejecting / not rejecting the null hypothesis
5. Collect the data and complete the needed calculations
6. Decide to reject / not reject the null hypothesis

Errors in Research Testing:

It is common to make two types of errors while drawing conclusions in research:

Type I error: when we accept the research hypothesis (that is, reject the null hypothesis)
even though the null hypothesis is actually correct.

Type II error: when we reject the research hypothesis (that is, fail to reject the null
hypothesis) even though the null hypothesis is actually incorrect.

We illustrate the five steps to hypothesis testing in the context of testing a specified
value for a population proportion. The procedure for hypothesis testing is given below :

1. Set up a null hypothesis and alternative hypothesis.


2. Decide about the test criterion to be used.
3. Calculate the test statistic using the given values from the sample
4. Find the critical value at the required level of significance and degrees of
freedom.
5. Decide whether to accept or reject the hypothesis. If the calculated test statistic
value is less than the critical value, we accept the null hypothesis; otherwise we reject
it. A short worked sketch of these steps follows this list.
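
A minimal sketch of these five steps in Python for testing a specified value of a population proportion; the hypothesised proportion and the sample counts below are hypothetical:

# A minimal sketch of the five hypothesis-testing steps for a population proportion.
# Sample figures are hypothetical: H0: p = 0.50 vs H1: p != 0.50.
import math
from scipy.stats import norm

p0 = 0.50                 # hypothesised population proportion (step 1)
n, successes = 200, 116
p_hat = successes / n

# Steps 2-3: z test statistic for a proportion
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Step 4: critical value at the 5% significance level (two-sided)
z_critical = norm.ppf(1 - 0.05 / 2)     # about 1.96

# Step 5: decision rule
if abs(z) > z_critical:
    print(f"z = {z:.2f} > {z_critical:.2f}: reject the null hypothesis")
else:
    print(f"z = {z:.2f} <= {z_critical:.2f}: do not reject the null hypothesis")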

Q5) Explain the meaning of sampling distribution of a sample statistic. Obtain the
sampling distribution of mean in case of sampling from infinite populations.

ANS) The sampling distribution of the mean was defined in the section introducing
sampling distributions. This section reviews some important properties of the sampling
distribution of the mean introduced in the demonstrations in this chapter.
MEAN
The mean of the sampling distribution of the mean is the mean of the population from
which the scores were sampled. Therefore, if a population has a mean μ, then the mean
of the sampling distribution of the mean is also μ. The symbol μ_M is used to refer to the
mean of the sampling distribution of the mean. Therefore, the formula for the mean of
the sampling distribution of the mean can be written as:

μ_M = μ

VARIANCE
The variance of the sampling distribution of the mean is computed as follows:

σ²_M = σ² / N

That is, the variance of the sampling distribution of the mean is the population variance
divided by N, the sample size (the number of scores used to compute a mean). Thus, the
larger the sample size, the smaller the variance of the sampling distribution of the mean.

The standard error of the mean is the standard deviation of the sampling
distribution of the mean. It is therefore the square root of the variance of the sampling
distribution of the mean and can be written as:

σ_M = σ / √N

The standard error is represented by a σ because it is a standard deviation. The
subscript (M) indicates that the standard error in question is the standard error of the
mean.
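
A minimal simulation sketch in Python (with arbitrary population values) illustrating that the standard deviation of the sample means comes out close to σ/√N:

# A minimal sketch: simulate the sampling distribution of the mean.
# Population parameters and sample size are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 50, 10, 25          # population mean, SD, and sample size

# Draw many samples of size N and record each sample mean.
sample_means = rng.normal(mu, sigma, size=(10_000, N)).mean(axis=1)

print("mean of sample means:", round(sample_means.mean(), 2))   # close to mu
print("SD of sample means:  ", round(sample_means.std(), 2))    # close to sigma/sqrt(N) = 2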

In inferential statistics, we want to use characteristics of the sample (i.e. a statistic) to


estimate the characteristics of the population (i.e. a parameter). What happens when
we take a sample of size n from some population? If taken from a continuous distribution, how is
the sample mean distributed? If taken from a categorical population set of data,
how is that sample proportion distributed? One uses the sample mean (the statistic) to
estimate the population mean (the parameter) and the sample proportion (the statistic)
to estimate the population proportion (the parameter). In doing so, we need to know
the properties of the sample mean or the sample proportion. That is why we need to
study the sampling distribution of the statistics. We will begin with the sampling
distribution of the sample mean. Since the sample statistic is a single value that
estimates a population parameter, we refer to the statistic as a point estimate.
Before we begin, we will introduce a brief explanation of notation and some new terms
that we will use this lesson and in future lessons.
Notation:
Sample mean: the book uses ȳ (y-bar); most other sources use x̄ (x-bar)
Population mean: standard notation is the Greek letter μ
Sample proportion: the book uses π̂ (pi-hat); other sources use p̂ (p-hat)
Population proportion: the book uses π; other sources use p

Terms
Standard error: the standard deviation of a sample statistic
Standard deviation: relates to a sample
Parameters, e.g. the mean and SD, are summary measures of a population, e.g. μ and σ. These
are fixed.
Statistics, e.g. the sample mean and sample SD, are summary measures of a sample,
e.g. x̄ and s. These vary. Think about taking a sample: the sample isn't always the
same, and therefore the statistics change. This is the motivation behind this lesson - due to
this sampling variation the sample statistics themselves have a distribution that can be
described by some measure of central tendency and spread.

Note: The sample mean ȳ is random since its value depends on the sample chosen. It is
called a statistic. The population mean is fixed, usually denoted as μ.
The sampling distribution of the (sample) mean is also called the distribution of the
variable ȳ.

Q6) What is skewness ? Distinguish between Karl Pearson's and Bowley's coefficient of
skewness. Which one of these would you prefer and why ?

ans) What is Pearson's Coefficient of Skewness?


Karl Pearson developed two methods to find skewness in a sample.
1. Pearson's Coefficient of Skewness #1 uses the mode. The formula is:

Sk1 = (x̄ - Mo) / s

Where x̄ = the mean, Mo = the mode and s = the standard deviation for the
sample.
See: Pearson Mode Skewness.
2. Pearson's Coefficient of Skewness #2 uses the median. The formula is:

Sk2 = 3(x̄ - Md) / s

Where x̄ = the mean, Md = the median and s = the standard deviation for the
sample.
It is generally used when you don't know the mode.
Sample problem: Use Pearson's Coefficient #1 and #2 to find the skewness for data
with the following characteristics:
Mean = 70.5.
Median = 80.
Mode = 85.
Standard deviation = 19.33.
Pearson's Coefficient of Skewness #1 (Mode):
Step 1: Subtract the mode from the mean: 70.5 - 85 = -14.5.
Step 2: Divide by the standard deviation: -14.5 / 19.33 = -0.75.
Pearson's Coefficient of Skewness #2 (Median):
Step 1: Subtract the median from the mean: 70.5 - 80 = -9.5.
Step 2: Multiply Step 1 by 3: -9.5 x 3 = -28.5.
Step 3: Divide by the standard deviation: -28.5 / 19.33 = -1.47.
Caution: Pearson's first coefficient of skewness uses the mode. Therefore, if the mode is
made up of too few pieces of data it won't be a stable measure of central tendency. For
example, the mode in both these sets of data is 9:
1 2 3 4 5 6 7 8 9 9.
1 2 3 4 5 6 7 8 9 9 9 9 9 9 9 9 9 9 9 9 10 12 12 13.
In the first set of data, the mode only appears twice. This isn't a good measure of central
tendency, so you would be cautioned not to use Pearson's coefficient of skewness. The
second set of data has a more stable mode (the mode appears 12 times).
Therefore, Pearson's coefficient of skewness will likely give you a reasonable result.
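
A minimal sketch of both coefficients in Python, using the summary figures from the sample problem above:

# A minimal sketch of Pearson's two skewness coefficients,
# using the summary values given in the sample problem.
mean, median, mode, sd = 70.5, 80, 85, 19.33

sk1 = (mean - mode) / sd          # first coefficient (mode based)
sk2 = 3 * (mean - median) / sd    # second coefficient (median based)

print(round(sk1, 2))   # -0.75
print(round(sk2, 2))   # -1.47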

Interpretation
In general:

The direction of skewness is given by the sign.


The coefficient compares the sample distribution with a normal distribution. The
larger the absolute value, the more the distribution differs from a normal distribution.
A value of zero means no skewness at all.
A large negative value means the distribution is negatively skewed.
A large positive value means the distribution is positively skewed.

Q8) What is time series analysis ? Decompose a time series into its various components
and describe them.

ans) Time series analysis comprises methods for analyzing time series data in order to
extract meaningful statistics and other characteristics of the data. Time series
forecasting is the use of a model to predict future values based on previously observed
values. While regression analysis is often employed in such a way as to test theories that
the current values of one or more independent time series affect the current value of
another time series, this type of analysis of time series is not called "time series
analysis", which focuses on comparing values of a single time series or multiple
dependent time series at different points in time.

Methods for time series analysis may be divided into two classes: frequency-domain
methods and time-domain methods. The former include spectral analysis and wavelet
analysis; the latter include auto-correlation and cross-correlation analysis. In the time
domain, correlation and analysis can be made in a filter-like manner using scaled
correlation, thereby mitigating the need to operate in the frequency domain.
Additionally, time series analysis techniques may be divided into parametric and non-
parametric methods. The parametric approaches assume that the underlying stationary
stochastic process has a certain structure which can be described using a small number
of parameters (for example, using an autoregressive or moving average model). In these
approaches, the task is to estimate the parameters of the model that describes the
stochastic process. By contrast, non-parametric approaches explicitly estimate the
covariance or the spectrum of the process without assuming that the process has any
particular structure.

Decomposition methods are based on an analysis of the individual components of a time


series. The strength of each component is estimated separately and then substituted

into a model that explains the behavior of the time series. Two of the more important
decomposition methods are

Multiplicative decomposition

Additive decomposition

MULTIPLICATIVE DECOMPOSITION

The multiplicative decomposition model is expressed as the product of the four


components of a time series:

yt = TRt × St × Ct × It
These variables are defined as follows:

yt = Value of the time series at time t


TRt = Trend at time t
St = Seasonal component at time t
Ct = Cyclical component at time t

It = Irregular component at time t


Each component has a subscript t to indicate a specific time period. The time period can
be measured in weeks, months, quarters, years, and so forth.

For example, sales of air conditioners depend heavily on the season of the year; due to
population growth, sales of air conditioners also show a positive trend over time.
Suppose you use the following equation to estimate (and to explain) the trend in the
demand for air conditioners:

TRt = 1000 + 25t


Quarterly data is used, so t represents the time measured in quarters. This equation
indicates that over time, sales of air conditioners tend to rise by 25 units per quarter.
Using the trend equation, the forecast of air conditioner sales over the coming year
looks like this:

Seasonal factors are handled by giving different weights to each season that are used to

adjust the trend components. Assume that the seasonal factors for four seasons are as
follows:

These values show that the seasonal demand for air conditioners is strongest in the
third quarter and weakest in the fourth and first quarters. (If there is no seasonal effect,
then each of these factors would be equal to 1.) Incorporating the seasonal factors into
the model gives the following adjusted forecasts:
Now, suppose you estimate the four cyclical (quarterly) factors to be:

Incorporating the cyclical factors gives the following adjusted forecast for the four
quarters over the coming year:

ADDITIVE DECOMPOSITION

With additive decomposition, a time series is modeled as the sum of the trend, seasonal
effect, cyclical effect, and irregular effects. This is shown in the following equation:

yt = TRt + St + Ct + It
The additive decomposition method is more appropriate when the seasonal factors tend
to be steady from one year to the next. By contrast, multiplicative decomposition is
more widely used since many economic time series have a seasonal factor that grows
proportionately with the level of the time series. In other words, economic growth tends
to be multiplicative rather than linear, because returns are compounded over time.
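
A minimal sketch of a multiplicative trend-times-seasonal forecast in Python. The trend equation follows the air-conditioner example above (TRt = 1000 + 25t); the seasonal factors and the quarter index are hypothetical stand-ins, since the original tables are not reproduced here:

# A minimal sketch of a multiplicative decomposition forecast.
# Trend follows the example above (TRt = 1000 + 25t); the seasonal factors
# are hypothetical placeholders (strongest in Q3, weakest in Q1 and Q4).
seasonal = {1: 0.8, 2: 1.0, 3: 1.4, 4: 0.8}   # hypothetical quarterly factors

def forecast(t, quarter):
    trend = 1000 + 25 * t                     # TRt
    return trend * seasonal[quarter]          # yt = TRt x St (ignoring Ct and It)

for t, q in zip(range(21, 25), [1, 2, 3, 4]): # four quarters of a hypothetical coming year
    print(f"t = {t}, quarter {q}: forecast = {forecast(t, q):.0f}")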

Q9) What do you understand by 'Central tendency' ? Describe the measures of central
tendency.

ans) A measure of central tendency is a single value that attempts to describe a set of
data by identifying the central position within that set of data. As such, measures of
central tendency are sometimes called measures of central location. They are also
classed as summary statistics. The mean (often called the average) is most likely the
measure of central tendency that you are most familiar with, but there are others, such
as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency become more appropriate to
use than others. In the following sections, we will look at the mean, mode and median,
and learn how to calculate them and under what conditions they are most appropriate
to be used.

Mean (Arithmetic)

The mean (or average) is the most popular and well-known measure of central
tendency. It can be used with both discrete and continuous data, although its use is most
often with continuous data. The mean is equal to the sum of all the values in the data set
divided by the number of values in the data set. So, if we have n values in a data set and
they have values x1, x2, ..., xn, the sample mean, usually denoted by x̄ (pronounced "x bar"), is:

x̄ = (x1 + x2 + ... + xn) / n

This formula is usually written in a slightly different manner using the Greek capital
letter Σ, pronounced "sigma", which means "sum of...":

x̄ = Σx / n

You may have noticed that the above formula refers to the sample mean. So, why have
we called it a sample mean? This is because, in statistics, samples and populations have
very different meanings and these differences are very important, even if, in the case of
the mean, they are calculated in the same way. To acknowledge that we are calculating
the population mean and not the sample mean, we use the Greek lower case letter "mu",
denoted as μ:

μ = Σx / n

The mean is essentially a model of your data set. It is the value that is most common.
You will notice, however, that the mean is not often one of the actual values that you
have observed in your data set. However, one of its important properties is that it
minimises error in the prediction of any one value in your data set. That is, it is the value
that produces the lowest amount of error from all other values in the data set.

An important property of the mean is that it includes every value in your data set as
part of the calculation. In addition, the mean is the only measure of central tendency
where the sum of the deviations of each value from the mean is always zero.

When not to use the mean

The mean has one main disadvantage: it is particularly susceptible to the influence of
outliers. These are values that are unusual compared to the rest of the data set by being
especially small or large in numerical value. For example, consider the wages of staff at
a factory below:

Staff 1 2 3 4 5 6 7 8 9 10

Salary 15k 18k 16k 14k 15k 15k 12k 17k 90k 95k

The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests
that this mean value might not be the best way to accurately reflect the typical salary of
a worker, as most workers have salaries in the $12k to 18k range. The mean is being
skewed by the two large salaries. Therefore, in this situation, we would like to have a
better measure of central tendency. As we will find out later, taking the median would
be a better measure of central tendency in this situation.
Another time when we usually prefer the median over the mean (or mode) is when our
data is skewed (i.e., the frequency distribution for our data is skewed). If we consider
the normal distribution - as this is the most frequently assessed in statistics - when the
data is perfectly normal, the mean, median and mode are identical. Moreover, they all
represent the most typical value in the data set. However, as the data becomes skewed
the mean loses its ability to provide the best central location for the data because the
skewed data is dragging it away from the typical value. However, the median best
retains this position and is not as strongly influenced by the skewed values. This is
explained in more detail in the skewed distribution section later in this guide.

Median

The median is the middle score for a set of data that has been arranged in order of
magnitude. The median is less affected by outliers and skewed data. In order to
calculate the median, suppose we have the data below:

65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark - in this case, 56 (highlighted in bold). It is the
middle mark because there are 5 scores before it and 5 scores after it. This works fine
when you have an odd number of scores, but what happens when you have an even
number of scores? What if you had only 10 scores? Well, you simply have to take the
middle two scores and average the result. So, if we look at the example below:

65 55 89 56 35 14 56 55 87 45

We again rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

Only now we have to take the 5th and 6th score in our data set and average them to get
a median of 55.5.
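
A minimal sketch in Python, reusing the staff-salary figures above, showing how differently the mean and the median respond to the two large salaries:

# A minimal sketch comparing mean and median on the staff salary data above
# (salaries in thousands of dollars).
from statistics import mean, median

salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print("mean:  ", mean(salaries))     # 30.7 - pulled up by the two large salaries
print("median:", median(salaries))   # 15.5 - closer to a typical salary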

Q10) What is a random variable ? How is it used to define a probability distribution


ans) A random variable is defined as a function that associates a real number with each
outcome of an experiment.

In other words, a random variable is a generalization of the outcomes or events in a


given sample space. This is possible since the random variable by definition can change
so we can use the same variable to refer to different situations. Random variables make
working with probabilities much neater and easier.

A random variable in probability is most commonly denoted by capital X, and the small
letter x is then used to ascribe a value to the random variable.

For example, given that you flip a coin twice, the sample space for the possible
outcomes is given by the following:

S = {HH, HT, TH, TT}

There are four possible outcomes as listed in the sample space above, where H stands
for heads and T stands for tails.

The random variable X can be given by the following (here X counts the number of heads obtained):

X = 0, 1 or 2

To find the probability of one of those outcomes we denote that question as:

P(X = x)

which means the probability that the random variable X is equal to some real
number x.

In the above example, we can say:

Let X be a random variable defined as the number of heads obtained when two coins are
tossed. Find the probability that you obtain two heads.

So now we've been told what X is and that x = 2, so we write the above information as:

P(X = 2)

Since we already have the sample space, we know that there is only one outcome with
two heads, so we find the probability as:

P(X = 2) = 1/4

We can also simply write the above as:

P(2) = 1/4

From this example, you should be able to see that the random variable X refers to any of
the elements in a given sample space.

There are two types of random variables: discrete variables and continuous random
variables.

Discrete Random Variables


The word discrete means separate and individual. Thus discrete random variables are
those that take on integer values only. They never include fractions or decimals.

A quick example is the sample space of any number of coin flips, the outcomes will
always be integer values, and you'll never have half heads or quarter tails. Such a
random variable is referred to as discrete. Discrete random variables give rise to
discrete probability distributions.

Continuous Random Variable


Continuous is the opposite of discrete. Continuous random variables are those that take
on any value including fractions and decimals. Continuous random variables give rise to
continuous probability distributions.
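
A minimal sketch in Python that builds the probability distribution of X = number of heads in two coin flips directly from the sample space:

# A minimal sketch: probability distribution of X = number of heads in two coin flips,
# built directly from the sample space {HH, HT, TH, TT}.
from collections import Counter

sample_space = ["HH", "HT", "TH", "TT"]
counts = Counter(outcome.count("H") for outcome in sample_space)

distribution = {x: count / len(sample_space) for x, count in counts.items()}
print(distribution)            # {2: 0.25, 1: 0.5, 0: 0.25}
print(distribution[2])         # P(X = 2) = 0.25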

Q11) Skewness

ans) In probability theory and statistics, skewness is a measure of the asymmetry of the
probability distribution of a real-valued random variable about its mean. The skewness
value can be positive or negative, or even undefined.
The qualitative interpretation of the skew is complicated and unintuitive. Skew must
not be thought to refer to the direction the curve appears to be leaning; in fact, the
opposite is true. For a unimodal distribution, negative skew indicates that
the tail on the left side of the probability density function is longer or fatter than the
right side; it does not distinguish these two kinds of shape. Conversely, positive skew
indicates that the tail on the right side is longer or fatter than the left side. In cases
where one tail is long but the other tail is fat, skewness does not obey a simple rule. For
example, a zero value means that the tails on both sides of the mean balance out overall;
this is the case for a symmetric distribution, but is also true for an asymmetric
distribution where the asymmetries even out, such as one tail being long but thin, and
the other being short but fat. Further, in multimodal distributions and discrete
distributions, skewness is also difficult to interpret. Importantly, the skewness does not
determine the relationship of mean and median. In cases where it is necessary, data
might be transformed to have a normal distribution.

Q12) Bayes' Theorem

ans) In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or
Bayes' rule) describes the probability of an event, based on prior knowledge of
conditions that might be related to the event. For example, if cancer is related to age,
then, using Bayes' theorem, a person's age can be used to more accurately assess the
probability that they have cancer, compared to the assessment of the probability of
cancer made without knowledge of the person's age.

One of the many applications of Bayes' theorem is Bayesian inference, a particular
approach to statistical inference. When applied, the probabilities involved in Bayes'
theorem may have different probability interpretations. With the Bayesian probability
interpretation the theorem expresses how a subjective degree of belief should rationally
change to account for availability of related evidence. Bayesian inference is fundamental
to Bayesian statistics.

Bayes Theorem is a way of finding a probability when we know certain other


probabilities.

The formula is:

P(A|B) = P(A) P(B|A) / P(B)

It tells us how often A happens given that B happens, written P(A|B), when we know
how often B happens given that A happens, written P(B|A), and how likely A and B are
on their own.

P(A|B) is "Probability of A given B", the probability of A given that B happens


P(A) is Probability of A
P(B|A) is "Probability of B given A", the probability of B given that A happens
P(B) is Probability of B

When P(Fire) means how often there is fire, and P(Smoke) means how often we see
smoke, then:

P(Fire|Smoke) means how often there is fire when we see smoke.


P(Smoke|Fire) means how often we see smoke when there is fire.

So the formula kind of tells us "forwards" when we know "backwards" (or vice versa)

Example: If dangerous fires are rare (1%) but smoke is fairly common (10%) due to
factories, and 90% of dangerous fires make smoke then:

P(Fire|Smoke) = P(Fire) P(Smoke|Fire) / P(Smoke) = (1% x 90%) / 10% = 9%

In this case, 9% of the time we expect smoke to mean a dangerous fire.
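
A minimal sketch of the same calculation in Python:

# A minimal sketch of Bayes' theorem using the fire/smoke figures above.
def bayes(p_a, p_b_given_a, p_b):
    """P(A|B) = P(A) * P(B|A) / P(B)"""
    return p_a * p_b_given_a / p_b

p_fire, p_smoke, p_smoke_given_fire = 0.01, 0.10, 0.90
print(round(bayes(p_fire, p_smoke_given_fire, p_smoke), 2))   # 0.09, i.e. 9%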

Q13) Multistage sampling

ans) Multistage sampling refers to sampling plans where the sampling is carried out in
stages using smaller and smaller sampling units at each stage.

Multistage sampling can be a complex form of cluster sampling because it is a type of


sampling which involves dividing the population into groups (or clusters). Then, one or
more clusters are chosen at random and everyone within the chosen cluster is sampled.

Using all the sample elements in all the selected clusters may be prohibitively expensive
or unnecessary. Under these circumstances, multistage cluster sampling becomes
useful. Instead of using all the elements contained in the selected clusters, the
researcher randomly selects elements from each cluster. Constructing the clusters is the
first stage. Deciding what elements within the cluster to use is the second stage. The
technique is used frequently when a complete list of all members of the population does
not exist or is inappropriate to use.

In some cases, several levels of cluster selection may be applied before the final sample
elements are reached. For example, household surveys conducted by the Australian
Bureau of Statistics begin by dividing metropolitan regions into 'collection districts' and
selecting some of these collection districts (first stage). The selected collection districts
are then divided into blocks, and blocks are chosen from within each selected collection
district (second stage). Next, dwellings are listed within each selected block, and some
of these dwellings are selected (third stage). This method makes it unnecessary to
create a list of every dwelling in the region, requiring such a list only for the selected blocks. In
remote areas, an additional stage of clustering is used in order to reduce travel
requirements.

Q14) Absolute value function

ans) An absolute value function is a function that contains an algebraic expression


within absolute value symbols. Recall that the absolute value of a number is its distance
from 0 on the number line. To graph an absolute value function, choose several values of
x and find some ordered pairs.

Q15) Identity matrix

ans) A square matrix in which all the main diagonal elements are 1s and all the
remaining elements are 0s is called an Identity Matrix. The Identity Matrix is also
called the Unit Matrix or Elementary Matrix. The Identity Matrix is denoted by I_n
(read "I sub n"), where n represents the order of the matrix. One of the important properties of the
identity matrix is: A I_n = A, where A is any square matrix of order n x n.
Examples of Identity Matrix

[1]

| 1 0 |
| 0 1 |

| 1 0 0 |
| 0 1 0 |
| 0 0 1 |

are identity matrices of order 1 x 1, 2 x 2, 3 x 3, ..., n x n.
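
A minimal sketch in Python (NumPy), with an arbitrary matrix A chosen only to illustrate the property A I = A:

# A minimal sketch: the identity matrix and the property A * I = A.
import numpy as np

I3 = np.eye(3)                      # 3 x 3 identity matrix
A = np.array([[2, 1, 0],
              [4, 3, 7],
              [5, 8, 6]])           # arbitrary 3 x 3 matrix for illustration

print(np.array_equal(A @ I3, A))    # True
print(np.array_equal(I3 @ A, A))    # True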

Q16) Quantiles

ans) In statistics and the theory of probability, quantiles are cutpoints dividing the
range of a probability distribution into contiguous intervals with equal probabilities, or
dividing the observations in a sample in the same way. There is one less quantile than
the number of groups created. Thus quartiles are the three cut points that will divide a
dataset into four equal-size groups. Common quantiles have special names: for instance,
quartiles (creating four groups) and deciles (creating ten groups).
The groups created are termed halves, thirds, quarters, etc., though sometimes the
terms for the quantile are used for the groups created, rather than for the cut points.

q-Quantiles are values that partition a finite set of values into q subsets of (nearly) equal
sizes. There are q - 1 of the q-quantiles, one for each integer k satisfying 0 < k < q. In
some cases the value of a quantile may not be uniquely determined, as can be the case
for the median (2-quantile) of a uniform probability distribution on a set of even size.
Quantiles can also be applied to continuous distributions, providing a way to generalize
rank statistics to continuous variables. When the cumulative distribution function of a
random variable is known, the q-quantiles are the application of the quantile function
(the inverse function of the cumulative distribution function) to the values {1/q, 2/q, ...,
(q-1)/q}.
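
A minimal sketch in Python computing the quartiles (4-quantiles) of a small made-up dataset with NumPy:

# A minimal sketch: quartiles (4-quantiles) of a small illustrative dataset.
import numpy as np

data = [7, 15, 36, 39, 40, 41]
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])
print(q1, q2, q3)    # the three cut points that split the data into four groups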

Q17) Axioms of probability


ans) Given an event E in a sample space S which is either finite with N elements or
countably infinite with N = ∞ elements, then we can write

S = {E1, E2, ..., EN},

and a quantity P(Ei), called the probability of event Ei, is defined such that

1. 0 <= P(Ei) <= 1.

2. P(S) = 1.

3. Additivity: P(E1 ∪ E2) = P(E1) + P(E2), where E1 and E2 are mutually exclusive.

4. Countable additivity: P(E1 ∪ E2 ∪ ... ∪ En) = P(E1) + P(E2) + ... + P(En) for n = 1, 2, ...,
where E1, E2, ... are mutually exclusive (i.e., Ei ∩ Ej = ∅).

Q18) The power curve of a test

ans) The power of a hypothesis test is the probability of correctly rejecting the null
hypothesis when it is false. Plotted as a function of the true effect size (or of the
noncentrality parameter), this rejection probability traces out the power curve of the test.
Power depends on the effect size, as well as the sample size n and the significance level α.

Q19) Polynomial Function

ans) In mathematics, a polynomial is an expression consisting of variables (or


indeterminates) and coefficients, that involves only the operations of addition,
subtraction, multiplication, and non-negative integer exponents. Polynomials appear in
a wide variety of areas of mathematics and science. For example, they are used to form
polynomial equations, which encode a wide range of problems, from elementary word
problems to complicated problems in the sciences; they are used to define polynomial
functions, which appear in settings ranging from basic chemistry and physics to
economics and social science; they are used in calculus and numerical analysis to
approximate other functions. In advanced mathematics, polynomials are used to
construct polynomial rings and algebraic varieties, central concepts in algebra and
algebraic geometry.

Q20) Cluster sampling


ans) Cluster sampling refers to a type of sampling method . With cluster sampling, the
researcher divides the population into separate groups, called clusters. Then, a simple
random sample of clusters is selected from the population. The researcher conducts his
analysis on data from the sampled clusters.

Compared to simple random sampling and stratified sampling , cluster sampling has
advantages and disadvantages. For example, given equal sample sizes, cluster sampling
usually provides less precision than either simple random sampling or stratified
sampling. On the other hand, if travel costs between clusters are high, cluster sampling
may be more cost-effective than the other methods.

Q21) Delphi method of forecasting

ans) The Delphi method is a structured communication technique or method, originally


developed as a systematic, interactive forecasting method which relies on a panel of
experts. The experts answer questionnaires in two or more rounds. After each round, a
facilitator or change agent provides an anonymised summary of the experts' forecasts
from the previous round as well as the reasons they provided for their judgments. Thus,
experts are encouraged to revise their earlier answers in light of the replies of other
members of their panel. It is believed that during this process the range of the answers
will decrease and the group will converge towards the "correct" answer. Finally, the
process is stopped after a predefined stop criterion (e.g. number of rounds, achievement
of consensus, stability of results) and the mean or median scores of the final rounds
determine the results.

Delphi is based on the principle that forecasts (or decisions) from a structured group of
individuals are more accurate than those from unstructured groups. The technique can
also be adapted for use in face-to-face meetings, and is then called mini-Delphi or
Estimate-Talk-Estimate (ETE). Delphi has been widely used for business forecasting and
has certain advantages over another structured forecasting approach, prediction
markets.

Q22) Stratified sampling

ans) In statistics, stratified sampling is a method of sampling from a population.

In statistical surveys, when subpopulations within an overall population vary, it is


advantageous to sample each subpopulation (stratum) independently. Stratification is
the process of dividing members of the population into homogeneous subgroups before
sampling. The strata should be mutually exclusive: every element in the population
must be assigned to only one stratum. The strata should also be collectively exhaustive:
no population element can be excluded. Then simple random sampling or systematic
sampling is applied within each stratum. This often improves the representativeness of
the sample by reducing sampling error. It can produce a weighted mean that has less
variability than the arithmetic mean of a simple random sample of the population.

In computational statistics, stratified sampling is a method of variance reduction when


Monte Carlo methods are used to estimate population statistics from a known
population.

Q23) Linear function

ans) The linear function is popular in economics. It is attractive because it is simple and
easy to handle mathematically. It has many important applications.

Linear functions are those whose graph is a straight line.

A linear function has the following form

y = f(x) = a + bx

A linear function has one independent variable and one dependent variable. The
independent variable is x and the dependent variable is y.

a is the constant term or the y intercept. It is the value of the dependent variable when x
= 0.

b is the coefficient of the independent variable. It is also known as the slope and gives
the rate of change of the dependent variable.
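
A minimal sketch of a linear function in Python; the values of a and b are chosen only for illustration:

# A minimal sketch of a linear function y = a + b*x; a and b are illustrative values.
def linear(x, a=2.0, b=0.5):
    """a is the y-intercept; b is the slope (rate of change of y with x)."""
    return a + b * x

print(linear(0))    # 2.0 -> value of y when x = 0 (the intercept)
print(linear(10))   # 7.0 -> y increases by 0.5 for each unit increase in x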

Q24) Correlation coefficient

ans) A correlation coefficient is a number that quantifies a type of correlation and
dependence, meaning a statistical relationship between two or more values.

Types of correlation coefficients include:

Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r, a


measure of the strength and direction of the linear relationship between two variables
that is defined as the (sample) covariance of the variables divided by the product of
their (sample) standard deviations.
Intraclass correlation, a descriptive statistic that can be used when quantitative
measurements are made on units that are organized into groups; describes how
strongly units in the same group resemble each other.

Rank correlation, the study of relationships between rankings of different variables or


different rankings of the same variable

Spearman's rank correlation coefficient, a measure of how well the relationship


between two variables can be described by a monotonic function

Kendall tau rank correlation coefficient, a measure of the portion of ranks that match
between two data sets.

Goodman and Kruskal's gamma, a measure of the strength of association of the cross
tabulated data when both variables are measured at the ordinal level.
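
A minimal sketch computing Pearson's r and Spearman's rank correlation in Python with SciPy on a small made-up dataset:

# A minimal sketch: Pearson's r and Spearman's rank correlation on made-up data.
from scipy.stats import pearsonr, spearmanr

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 8]

r, p_r = pearsonr(x, y)          # linear association
rho, p_rho = spearmanr(x, y)     # monotonic (rank-based) association

print(round(r, 3), round(rho, 3))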

Q25) Non-sampling error

ans) In statistics, non-sampling error is a catch-all term for the deviations of estimates
from their true values that are not a function of the sample chosen, including various
systematic errors and random errors that are not due to sampling. Non-sampling errors
are much harder to quantify than sampling errors.

Non-sampling errors in survey estimates can arise from:

Coverage errors, such as failure to accurately represent all population units in the
sample, or the inability to obtain information about all sample cases;

Response errors by respondents due for example to definitional differences,


misunderstandings, or deliberate misreporting;

Mistakes in recording the data or coding it to standard classifications;

Other errors of collection, nonresponse, processing, or imputation of values for missing


or inconsistent data.

Q26) Name the types of Probability Sampling Method

ans) Sampling takes on two forms in statistics: probability sampling and non-
probability sampling:

Probability sampling uses random sampling techniques to create a sample.


Non-probability sampling techniques use non-random processes like researcher
judgment or convenience sampling.
Probability sampling is based on the fact that every member of a population has a
known and equal chance of being selected. For example, if you had a population of
100 people, each person would have odds of 1 out of 100 of being chosen. With non-
probability sampling, those odds are not equal. For example, a person might have a
better chance of being chosen if they live close to the researcher or have access to a
computer. Probability sampling gives you the best chance to create a sample that is
truly representative of the population.
Using probability sampling for finding sample sizes means that you can employ
statistical techniques like confidence intervals and margins of error to validate your
results.

Types of Probability Sampling


Simple random sampling is a completely random method of selecting subjects.
These can include assigning numbers to all subjects and then using a random
number generator to choose random numbers. Classic ball and urn experiments are
another example of this process (assuming the balls are sufficiently mixed). The
members whose numbers are chosen are included in the sample.
Stratified Random Sampling involves splitting subjects into mutually
exclusive groups and then using simple random sampling to choose members from
groups.
Systematic Sampling means that you choose every nth participant from a
complete list. For example, you could choose every 10th person listed.
Cluster Random Sampling is a way to randomly select participants from a list that
is too large for simple random sampling. For example, if you wanted to choose 1000
participants from the entire population of the U.S., it is likely impossible to get a
complete list of everyone. Instead, the researcher randomly selects areas (i.e. cities
or counties) and randomly selects from within those boundaries.
Multi-Stage Random sampling uses a combination of techniques.
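
A minimal sketch of two of these methods in Python; the population here is just an illustrative list of ID numbers:

# A minimal sketch of simple random sampling and systematic sampling.
# The "population" is an illustrative list of ID numbers.
import random

population = list(range(1, 101))        # IDs 1..100
random.seed(1)

# Simple random sampling: every member has an equal chance of selection.
simple_random_sample = random.sample(population, k=10)

# Systematic sampling: pick a random start, then take every 10th member.
start = random.randrange(10)
systematic_sample = population[start::10]

print(simple_random_sample)
print(systematic_sample)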

Advantages and Disadvantages


Each probability sampling method has its own unique advantages and disadvantages.

Advantages
Cluster sampling: convenience and ease of use.
Simple random sampling: creates samples that are highly representative of the
population.
Stratified random sampling: creates strata or layers that are highly
representative of strata or layers in the population.
Systematic sampling: creates samples that are highly representative of the
population, without the need for a random number generator.

Disadvantages
Cluster sampling: might not work well if unit members are not homogeneous (i.e.
if they are different from each other).
Simple random sampling: tedious and time consuming, especially when creating
larger samples.
Stratified random sampling: tedious and time consuming, especially when
creating larger samples.
Systematic sampling: not as random as simple random sampling.

Q27) Normal distribution

ans) In probability theory, the normal (or Gaussian) distribution is a very common
continuous probability distribution. Normal distributions are important in statistics and
are often used in the natural and social sciences to represent real-valued random
variables whose distributions are not known.

The normal distribution is useful because of the central limit theorem. In its most
general form, under some conditions (which include finite variance), it states that
averages of samples of observations of random variables independently drawn from
independent distributions converge in distribution to the normal, that is, become
normally distributed when the number of observations is sufficiently large. Physical
quantities that are expected to be the sum of many independent processes (such as
measurement errors) often have distributions that are nearly normal. Moreover, many
results and methods (such as propagation of uncertainty and least squares parameter
fitting) can be derived analytically in explicit form when the relevant variables are
normally distributed.

Q28) Null and Alternative hypothesis

ans) Definition of Null Hypothesis

A null hypothesis is a statistical hypothesis in which no significant difference
exists between the set of variables. It is the original or default statement, with no effect,
often represented by H0 (H-zero). It is always the hypothesis that is tested. It denotes
a certain value of a population parameter, such as μ, σ, or p. A null hypothesis can be
rejected, but it cannot be accepted just on the basis of a single test.

Definition of Alternative Hypothesis

A statistical hypothesis used in hypothesis testing, which states that there is a
significant difference between the set of variables. It is often referred to as the
hypothesis other than the null hypothesis, often denoted by H1 (H-one). It is what the
researcher seeks to prove in an indirect way, by using the test. It refers to a certain
value of a sample statistic, e.g., x̄, s, or p.

The acceptance of alternative hypothesis depends on the rejection of the null hypothesis
i.e. until and unless null hypothesis is rejected, an alternative hypothesis cannot be
accepted.

Q29) Standard deviation

ans) In statistics, the standard deviation (SD, also represented by the Greek letter sigma
σ or the Latin letter s) is a measure that is used to quantify the amount of variation or
dispersion of a set of data values. A low standard deviation indicates that the data points
tend to be close to the mean (also called the expected value) of the set, while a high
standard deviation indicates that the data points are spread out over a wider range of
values.

The standard deviation of a random variable, statistical population, data set, or


probability distribution is the square root of its variance. It is algebraically simpler,
though in practice less robust, than the average absolute deviation. A useful property of
the standard deviation is that, unlike the variance, it is expressed in the same units as
the data. There are also other measures of deviation from the norm, including mean
absolute deviation, which provide different mathematical properties from standard
deviation.

In addition to expressing the variability of a population, the standard deviation is


commonly used to measure confidence in statistical conclusions. For example, the
margin of error in polling data is determined by calculating the expected standard
deviation in the results if the same poll were to be conducted multiple times

Q30) Explain the concept of the power curve of a test and p-value of a test.

ans) Generally your null hypothesis is a value for some parameter (of e.g. a population
or family of random variables). You want to reject the null hypothesis if the data you
have is sufficiently unlikely if the null hypothesis were true.

A hypothesis test does this using a test statistic and a rejection region - with the null
hypothesis rejected if the test statistic falls in the rejection region. The rejection region
is typically parameterized by a value α, the significance level. This value is the
probability, if the null hypothesis is true, that your test statistic falls in the rejection
region. This is called a false positive or type-I error. The idea is that you can choose a
value for α that represents an acceptable chance of a false rejection.

Now, there is also a true value for the parameter in the null hypothesis. Given this value
as well as the rejection region for the test, there is some probability of rejecting the null
hypothesis. This probability (often referred to as 1 - β) is the power of the test.

Since the true value is unknown, the power of a test can also be described with a
curve, π(x), where x is the supposed true value for the parameter and π(x) is
the probability of rejecting the null hypothesis for that value of x. Notice that if one
chooses a larger α, the rejection region grows, and so does π(x) for each x.

The power of a test often also refers to the size of π(x) as x moves away from the
null hypothesis. Two tests with the same significance level can be compared in terms of
their power curves, with one test deemed more powerful if it has a higher π(x) when x is
not the null hypothesis.

If you perform a test, the p-value is the smallest α for which the test would reject the
null hypothesis. That is, it's the value of the significance level for which the test statistic
would have been on the boundary of the rejection region. Notice that this does not
depend on the chosen significance level, so increasing the power by increasing α will have no
effect on the p-value.

The p-value is also often used in place of the rejection region for the test statistic [1] to
determine whether to reject the null hypothesis. In this case it is rejected if and only
if p < α. So the power describes the probability that p < α. Then if a test has a
larger π(x) than another test for a fixed α, we can see that the more powerful test will tend
to generate smaller p-values.
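
A minimal sketch in Python of a power curve for a one-sided z-test of a mean; σ, n, and α are illustrative values:

# A minimal sketch: power curve of a one-sided z-test for a mean.
# sigma, n and alpha are illustrative values; H0: mu = 0 vs H1: mu > 0.
import numpy as np
from scipy.stats import norm

sigma, n, alpha = 1.0, 25, 0.05
z_crit = norm.ppf(1 - alpha)            # rejection region: z > z_crit

def power(true_mu):
    """Probability of rejecting H0 when the true mean is true_mu."""
    # Under the true mean, the z statistic is normal with mean true_mu*sqrt(n)/sigma.
    shift = true_mu * np.sqrt(n) / sigma
    return 1 - norm.cdf(z_crit - shift)

for mu in [0.0, 0.1, 0.3, 0.5]:
    print(f"true mean {mu:.1f}: power = {power(mu):.3f}")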

Q31) Explain the Gauss Jordan Method for solving a system of linear simultaneous
equation with the help of an example.

ANS) Let's start simple, and work our way up to messier examples.

Solve the following system of equations.

5x + 4y - z = 0
10y - 3z = 11
z = 3

It's fairly easy to see how to proceed in this case. I'll just back-substitute the z-value
from the third equation into the second equation, solve the result for y, and then
plug z and y into the first equation and solve the result for x.

10y - 3(3) = 11
10y - 9 = 11
10y = 20
y = 2

5x + 4(2) - (3) = 0
5x + 8 - 3 = 0
5x + 5 = 0
5x = -5
x = -1

Then the solution is (x, y, z) = (-1, 2, 3).

The reason this system was easy to solve is that the system was "triangular"; this refers
to the equations having the form of a triangle, because of the lower equations containing
only the later variables.

The point is that, in this format, the system is simple to solve. And Gaussian elimination
is the method we'll use to convert systems to this upper triangular form, using the row
operations we learned when we did the addition method.

Solve the following system of equations using Gaussian elimination.

-3x + 2y - 6z = 6
5x + 7y - 5z = 6
x + 4y - 2z = 8

No equation is solved for a variable, so I'll have to do the multiplication-and-


addition thing to simplify this system. In order to keep track of my work, I'll
write down each step as I go. But I'll do my computations on scratch paper. Here
is how I did it:

The first thing to do is to get rid of the leading x-terms in two of the rows. For
now, I'll just look at which rows will be easy to clear out; I can switch rows later
to get the system into "upper triangular" form. There is no rule that says I have to
use the x-term from the first row, and, in this case, I think it will be simpler to use
the x-term from the third row, since its coefficient is simply "1". So I'll multiply
the third row by 3, and add it to the first row. I do the computations on scratch
paper:

...and then I write down the results:


(When we were solving two-variable systems, we could multiply a row, rewriting
the system off to the side, and then add down. There is no space for this in a
three-variable system, which is why we need the scratch paper.)

Warning: Since I didn't actually do anything to the third row, I copied it down,
unchanged, into the new matrix of equations. I used the third row, but I didn't
actually change it. Don't confuse "using" with "changing".

To get smaller numbers for coefficients, I'll multiply the first row by one-half:

Now I'll multiply the third row by -5 and add this to the second row. I do my
work on scratch paper:

...and then I write down the results:

I didn't do anything with the first row, so I copied it down unchanged. I worked
with the third row, but I only worked on the second row, so the second row is
updated and the third row is copied over unchanged.

Okay, now the x-column is cleared out except for the leading term in the third
row. So next I have to work on the y-column.

Warning: Since the third equation has an x-term, I cannot use it on either of the
other two equations any more (or I'll undo my progress). I can work on the
equation, but not with it.

If I add twice the first row to the second row, this will give me a leading 1 in the
second row. I won't have gotten rid of the leading y-term in the second row, but I
will have converted it (without getting involved in fractions) to a form that is
simpler to deal with. (You should keep an eye out for this sort of simplification.)
First I do the scratch work:

...and then I write down the results:

Now I can use the second row to clear out the y-term in the first row. I'll multiply
the second row by -7 and add. First I do the scratch work:

...and then I write down the results:

I can tell what z is now, but, just to be thorough, I'll divide the first row by 43.
Then I'll rearrange the rows to put them in upper-triangular form:

Now I can start the process of back-solving:

y - 7(1) = -4
y - 7 = -4
y = 3

x + 4(3) - 2(1) = 8
x + 12 - 2 = 8
x + 10 = 8
x = -2

Then the solution is (x, y, z) = (-2, 3, 1).

Note: There is nothing sacred about the steps I used in solving the above system; there
was nothing special about how I solved this system. You could work in a different order
or simplify different rows, and still come up with the correct answer. These systems are
sufficiently complicated that there is unlikely to be one right way of computing the
answer. So don't stress over "how did she know to do that next?", because there is no
rule. I just did whatever struck my fancy; I did whatever seemed simplest or whatever
came to mind first. Don't worry if you would have used completely different steps. As
long as each step along the way is correct, you'll come up with the same answer.

In the above example, I could have gone further in my computations and been more
thorough-going in my row operations, clearing out all the y-terms other than that in the
second row and all the z-terms other than that in the first row. This is what the process
would then have looked like:

This way, I can just read off the values of x, y, and z, and I don't have to bother with the
back-substitution. This more-complete method of solving is called "Gauss-Jordan
elimination" (with the equations ending up in what is called "reduced-row-echelon
form"). Many texts only go as far as Gaussian elimination, but I've always found it easier
to continue on and do Gauss-Jordan.
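
A minimal sketch in Python that checks the worked example above by solving the same system numerically (NumPy's solver is not literal Gauss-Jordan step by step, but it solves the same system and confirms the hand-computed answer):

# A minimal sketch: solve the worked system numerically and confirm (-2, 3, 1).
import numpy as np

A = np.array([[-3, 2, -6],
              [ 5, 7, -5],
              [ 1, 4, -2]], dtype=float)
b = np.array([6, 6, 8], dtype=float)

print(np.linalg.solve(A, b))   # [-2.  3.  1.]
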
Q32) Mixed Auto-regressive moving average models

ans) In the statistical analysis of time series, autoregressive moving-average (ARMA)


models provide a parsimonious description of a (weakly) stationary stochastic process
in terms of two polynomials, one for the autoregression and the second for the moving
average. Given a time series of data Xt , the ARMA model is a tool for understanding and,
perhaps, predicting future values in this series. The model consists of two parts, an
autoregressive (AR) part and a moving average (MA) part. The AR part involves
regressing the variable on its own lagged (i.e., past) values. The MA part involves
modeling the error term as a linear combination of error terms occurring
contemporaneously and at various times in the past.

The model is usually referred to as the ARMA(p, q) model, where p is the order of the
autoregressive part and q is the order of the moving average part.
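
A minimal sketch of fitting an ARMA(1,1) model in Python with statsmodels; the series is simulated purely for illustration, and statsmodels expresses ARMA(p, q) as ARIMA(p, 0, q):

# A minimal sketch: fit an ARMA(1,1) model to a simulated series with statsmodels.
# The data are generated only so the example runs end-to-end.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
e = rng.normal(size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.6 * y[t - 1] + e[t] + 0.3 * e[t - 1]   # ARMA(1,1)-type series

model = ARIMA(y, order=(1, 0, 1))   # (p, d, q) with d = 0 -> ARMA(1, 1)
result = model.fit()

print(result.params)                # AR, MA and constant estimates
print(result.forecast(steps=4))     # forecasts for the next four periods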

Q33) Least Squares Criterion

ans) The method of least squares is a standard approach in regression analysis to the
approximate solution of overdetermined systems, i.e., sets of equations in which there
are more equations than unknowns. "Least squares" means that the overall solution
minimizes the sum of the squares of the residuals made in the results of every single
equation.

The most important application is in data fitting. The best fit in the least-squares sense
minimizes the sum of squared residuals (a residual being: the difference between an
observed value, and the fitted value provided by a model). When the problem has
substantial uncertainties in the independent variable (the x variable), then simple
regression and least squares methods have problems; in such cases, the methodology
required for fitting errors-in-variables models may be considered instead of that for
least squares.

Least squares problems fall into two categories: linear or ordinary least squares and
non-linear least squares, depending on whether or not the residuals are linear in all
unknowns. The linear least-squares problem occurs in statistical regression analysis; it
has a closed-form solution. The non-linear problem is usually solved by iterative
refinement; at each iteration the system is approximated by a linear one, and thus the
core calculation is similar in both cases.
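
A minimal sketch of an ordinary least squares straight-line fit in Python with NumPy, on made-up data points:

# A minimal sketch: ordinary least squares fit of a straight line to made-up points.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])
coeffs, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

intercept, slope = coeffs
print(round(intercept, 3), round(slope, 3))   # line minimising the sum of squared residuals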

Q34) Rank Correlation.


ans) In statistics, a rank correlation is any of several statistics that measure an ordinal
association - the relationship between rankings of different ordinal variables or
different rankings of the same variable, where a "ranking" is the assignment of the
labels "first", "second", "third", etc. to different observations of a particular variable. A
rank correlation coefficient measures the degree of similarity between two rankings,
and can be used to assess the significance of the relation between them. For example,
two common nonparametric methods of significance that use rank correlation are the
Mann-Whitney U test and the Wilcoxon signed-rank test.
