MS-08
A chi-squared test, also written as a χ² test, is any statistical hypothesis test wherein the
sampling distribution of the test statistic is a chi-squared distribution when the null
hypothesis is true. Without other qualification, 'chi-squared test' is often used as
shorthand for Pearson's chi-squared test.
Chi-squared tests are often constructed from a sum of squared errors, or through the
sample variance. Test statistics that follow a chi-squared distribution arise from an
assumption of independent normally distributed data, which is valid in many cases due
to the central limit theorem. A chi-squared test can be used to attempt rejection of the
null hypothesis that the data are independent.
Also considered a chi-squared test is a test in which this is asymptotically true, meaning
that the sampling distribution (if the null hypothesis is true) can be made to
approximate a chi-squared distribution as closely as desired by making the sample size
large enough. The chi-squared test is used to determine whether there is a significant
difference between the expected frequencies and the observed frequencies in one or
more categories.
The Chi-Square goodness of fit test is a non-parametric test that is used to find out whether the
observed value of a given phenomenon differs significantly from the expected
value. In the Chi-Square goodness of fit test, the term goodness of fit refers to comparing the
observed sample distribution with the expected probability distribution. The Chi-Square
goodness of fit test determines how well a theoretical distribution (such as the normal,
binomial, or Poisson) fits the empirical distribution. In the Chi-Square goodness of fit test,
sample data is divided into intervals; the number of points that fall into each
interval is then compared with the expected number of points in that interval.
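As a minimal sketch of the procedure described above, the statistic can be computed directly from its definition; the die-roll counts below are made-up illustrative data:

```python
# Chi-square goodness-of-fit statistic, computed from its definition:
# chi2 = sum over categories of (observed - expected)^2 / expected.
# The die-roll counts below are hypothetical.

observed = [8, 12, 10, 14, 6, 10]      # counts for faces 1..6 in 60 rolls
expected = [sum(observed) / 6] * 6     # fair die: 10 expected per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 4.0
```

With 6 - 1 = 5 degrees of freedom, the 5% critical value is about 11.07, so a statistic of 4.0 gives no evidence against the fair-die hypothesis.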
ans) Marginal revenue is the increase in revenue that results from the sale of one
additional unit of output. While marginal revenue can remain constant over a certain
level of output, it follows the law of diminishing returns and will eventually slow down,
as the output level increases. Perfectly competitive firms continue producing output
until marginal revenue equals marginal cost.
A company experiences its best results when production and sales continue until marginal
revenue equals marginal cost. Marginal cost is the additional expense of producing and
selling one more unit. If marginal revenue exceeds marginal cost, the company makes a
profit on the item sold. When marginal revenue falls below marginal cost, it is
no longer profitable to produce and sell this good.
Marginal revenue for competitive firms is typically constant. This is because the market
dictates the optimal price level and companies do not have much if any discretion
over the marginal price. Marginal revenue works differently for monopolies. Because
monopolies have control over the quantity of available goods in the market, marginal
revenue for a monopoly decreases as additional goods are sold, because the level of
goods being supplied is increasing.
Formula
The marginal revenue formula is calculated by dividing the change in total revenue by
the change in quantity sold: MR = ΔTR / ΔQ.
To calculate the change in revenue, we simply subtract the revenue figure before the
last unit was sold from the total revenue after the last unit was sold.
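The calculation above can be sketched in a few lines; the revenue and quantity figures are hypothetical:

```python
# Marginal revenue as the change in total revenue per extra unit sold.
# All numbers are made-up illustrative data.

def marginal_revenue(tr_before, tr_after, q_before, q_after):
    """MR = change in total revenue / change in quantity sold."""
    return (tr_after - tr_before) / (q_after - q_before)

# Selling the 101st unit raises total revenue from 1000 to 1009:
mr = marginal_revenue(1000, 1009, 100, 101)
print(mr)  # 9.0
```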
ans)
The normal (z) distribution is a continuous distribution that arises in many natural
processes. "Continuous" means that between any two data values we could (at least in
theory) find another data value. For example, men's heights vary continuously and are
the result of so many tiny random influences that the overall distribution of men's
heights in America is very close to normal. Another example is the data values that we
would get if we repeatedly measured the mass of a reference object on a pan balance:
the readings would differ slightly because of random errors, and the readings taken as a
whole would have a normal distribution.
The bell-shaped normal curve has probabilities that are found as the area between any
two z values. You can use either Table A in your textbook or the normalcdf function on
your calculator as a way of finding these normal probabilities.
Not all natural processes produce normal distributions. For example, incomes
in America are the result of random natural capitalist processes, but the result is an
extremely right-skewed distribution.
A binomial distribution is very different from a normal distribution, and yet if the
sample size is large enough, the shapes will be quite similar.
The key difference is that a binomial distribution is discrete, not continuous. In other
words, it is NOT possible to find a data value between any two data values.
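A small numerical sketch of the similarity claimed above: for n = 100 and p = 0.5, the exact binomial probability P(X = 50) is very close to the normal density at the same point (μ = np = 50, σ = √(np(1-p)) = 5).

```python
import math

# Compare the exact (discrete) binomial probability with the (continuous)
# normal density at the distribution's center.

n, p, k = 100, 0.5, 50
binom = math.comb(n, k) * p**k * (1 - p)**(n - k)

mu, sigma = n * p, math.sqrt(n * p * (1 - p))
normal = math.exp(-((k - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

print(round(binom, 4), round(normal, 4))   # both are close to 0.08
```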
ans) In statistics, during a statistical survey or research, a hypothesis has to be set and
defined; it is termed a statistical hypothesis. It is actually an assumption about a
population parameter, though it is by no means certain that this hypothesis will prove to be
true. Hypothesis testing refers to the predefined formal procedures that are used
by statisticians to decide whether to accept or reject hypotheses. Hypothesis testing is defined
as the process of choosing between hypotheses for a particular probability distribution, on the
basis of observed data.
Hypothesis testing is a core and important topic in statistics. In research hypothesis
testing, a hypothesis is a tentative but important statement about the phenomenon. The null
hypothesis is defined as the hypothesis that the researcher aims to challenge. Generally,
the null hypothesis represents the current explanation or accepted view of a feature which
the researcher is going to test. Hypothesis testing includes the tests that are used to
determine which outcomes would lead to the rejection of the null hypothesis at
a specified level of significance. This helps to know whether the results carry enough
information, provided that conventional wisdom is being utilized for the establishment
of the null hypothesis.
Type I error: rejecting the null hypothesis when it is actually true.
Type II error: failing to reject the null hypothesis when it is actually false.
We illustrate the five steps of hypothesis testing in the context of testing a specified
value for a population proportion: state the null and alternative hypotheses, choose a
significance level, compute the test statistic, determine the critical value (or p-value),
and draw a conclusion.
Q5) Explain the meaning of sampling distribution of a sample statistic. Obtain the
sampling distribution of mean in case of sampling from infinite populations.
ANS) The sampling distribution of the mean was defined in the section introducing
sampling distributions. This section reviews some important properties of the sampling
distribution of the mean introduced in the demonstrations in this chapter.
MEAN
The mean of the sampling distribution of the mean is the mean of the population from
which the scores were sampled. Therefore, if a population has a mean μ, then the mean
of the sampling distribution of the mean is also μ. The symbol μ_M is used to refer to the
mean of the sampling distribution of the mean. Therefore, the formula for the mean of
the sampling distribution of the mean can be written as:
μ_M = μ
VARIANCE
The variance of the sampling distribution of the mean is computed as follows:
σ²_M = σ² / N
That is, the variance of the sampling distribution of the mean is the population variance
divided by N, the sample size (the number of scores used to compute a mean). Thus, the
larger the sample size, the smaller the variance of the sampling distribution of the mean.
The standard error of the mean is the standard deviation of the sampling
distribution of the mean. It is therefore the square root of the variance of the sampling
distribution of the mean and can be written as:
σ_M = σ / √N
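A quick simulation sketch of this relationship, using a made-up standard-normal population: the standard deviation of many sample means should come out close to σ/√N.

```python
import math
import random
import statistics

# Simulation: the empirical standard error of the mean should be
# close to sigma / sqrt(N). Population: standard normal (sigma = 1).

random.seed(42)
N = 25                                   # sample size
sigma = 1.0                              # population standard deviation
means = [statistics.mean(random.gauss(0, sigma) for _ in range(N))
         for _ in range(5000)]

se_est = statistics.stdev(means)         # empirical standard error
se_theory = sigma / math.sqrt(N)         # 0.2
print(round(se_est, 3), se_theory)       # close to 0.2
```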
Terms
Standard error: the standard deviation of a sample statistic.
Standard deviation: relates to a sample.
Parameters, e.g. the mean and SD, are summary measures of a population, e.g. μ and σ. These
are fixed.
Statistics, e.g. the sample mean and sample SD, are summary measures of a sample,
e.g. x̄ and s. These vary. Think about taking a sample: the sample isn't always the
same, therefore the statistics change. This is the motivation behind this lesson. Due to
this sampling variation, the sample statistics themselves have a distribution that can be
described by some measure of central tendency and spread.
Note: The sample mean ȳ is random since its value depends on the sample chosen. It is
called a statistic. The population mean is fixed, usually denoted as μ.
The sampling distribution of the (sample) mean is also called the distribution of the
variable ȳ.
Q6) What is skewness ? Distinguish between Karl Pearson's and Bowley's coefficient of
skewness. Which one of these would you prefer and why ?
ans) 1. Pearson's Coefficient of Skewness #1 uses the mode. The formula is:
Sk₁ = (x̄ - Mo) / s
Where x̄ = the mean, Mo = the mode and s = the standard deviation for the
sample.
See: Pearson Mode Skewness.
2. Pearson's Coefficient of Skewness #2 uses the median. The formula is:
Sk₂ = 3(x̄ - Md) / s
Where x̄ = the mean, Md = the median and s = the standard deviation for the
sample.
It is generally used when you don't know the mode.
Sample problem: Use Pearson's Coefficient #1 and #2 to find the skewness for data
with the following characteristics:
Mean = 70.5.
Median = 80.
Mode = 85.
Standard deviation = 19.33.
Pearson's Coefficient of Skewness #1 (Mode):
Step 1: Subtract the mode from the mean: 70.5 - 85 = -14.5.
Step 2: Divide by the standard deviation: -14.5 / 19.33 = -0.75.
Pearson's Coefficient of Skewness #2 (Median):
Step 1: Subtract the median from the mean: 70.5 - 80 = -9.5.
Step 2: Multiply Step 1 by 3: -9.5 × 3 = -28.5.
Step 3: Divide by the standard deviation: -28.5 / 19.33 = -1.47.
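The same sample problem worked in a few lines of code:

```python
# Pearson's two skewness coefficients for the sample problem:
# mean = 70.5, median = 80, mode = 85, standard deviation = 19.33.

mean, median, mode, sd = 70.5, 80, 85, 19.33

sk1 = (mean - mode) / sd          # first coefficient (uses the mode)
sk2 = 3 * (mean - median) / sd    # second coefficient (uses the median)

print(round(sk1, 2))  # -0.75
print(round(sk2, 2))  # -1.47
```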
Caution: Pearson's first coefficient of skewness uses the mode. Therefore, if the mode is
made up of too few pieces of data it won't be a stable measure of central tendency. For
example, the mode in both these sets of data is 9:
1 2 3 4 5 6 7 8 9 9.
1 2 3 4 5 6 7 8 9 9 9 9 9 9 9 9 9 9 9 9 10 12 12 13.
In the first set of data, the mode only appears twice. This isn't a good measure of central
tendency, so you would be cautioned not to use Pearson's coefficient of skewness. The
second set of data has a more stable mode (it appears 12 times), so Pearson's coefficient
of skewness will likely give you a reasonable result.
Interpretation
In general, a negative coefficient indicates left (negative) skew and a positive coefficient
indicates right (positive) skew; the larger the absolute value, the stronger the skew.
Q8) What is time series analysis ? Decompose a time series into its various components
and describe them.
ans) Time series analysis comprises methods for analyzing time series data in order to
extract meaningful statistics and other characteristics of the data. Time series
forecasting is the use of a model to predict future values based on previously observed
values. While regression analysis is often employed in such a way as to test theories that
the current values of one or more independent time series affect the current value of
another time series, this type of analysis of time series is not called "time series
analysis", which focuses on comparing values of a single time series or multiple
dependent time series at different points in time.
Methods for time series analysis may be divided into two classes: frequency-domain
methods and time-domain methods. The former include spectral analysis and wavelet
analysis; the latter include auto-correlation and cross-correlation analysis. In the time
domain, correlation and analysis can be made in a filter-like manner using scaled
correlation, thereby mitigating the need to operate in the frequency domain.
Additionally, time series analysis techniques may be divided into parametric and non-
parametric methods. The parametric approaches assume that the underlying stationary
stochastic process has a certain structure which can be described using a small number
of parameters (for example, using an autoregressive or moving average model). In these
approaches, the task is to estimate the parameters of the model that describes the
stochastic process. By contrast, non-parametric approaches explicitly estimate the
covariance or the spectrum of the process without assuming that the process has any
particular structure.
A time series can be decomposed into a model that explains the behavior of the time
series. Two of the more important decomposition methods are:
Multiplicative decomposition
Additive decomposition
MULTIPLICATIVE DECOMPOSITION
y_t = TR_t × S_t × C_t × I_t
These variables are defined as follows: y_t is the observed value at time t, TR_t the trend
component, S_t the seasonal component, C_t the cyclical component, and I_t the irregular
component.
For example, sales of air conditioners depend heavily on the season of the year; due to
population growth, sales of air conditioners also show a positive trend over time.
Suppose you use the following equation to estimate (and to explain) the trend in the
demand for air conditioners:
Seasonal factors are handled by giving different weights to each season that are used to
adjust the trend components. Assume that the seasonal factors for four seasons are as
follows:
These values show that the seasonal demand for air conditioners is strongest in the
third quarter and weakest in the fourth and first quarters. (If there is no seasonal effect,
then each of these factors would be equal to 1.) Incorporating the seasonal factors into
the model gives the following adjusted forecasts:
Now, suppose you estimate the four cyclical (quarterly) factors to be:
Incorporating the cyclical factors gives the following adjusted forecast for the four
quarters over the coming year:
ADDITIVE DECOMPOSITION
With additive decomposition, a time series is modeled as the sum of the trend, seasonal
effect, cyclical effect, and irregular effects. This is shown in the following equation:
y_t = TR_t + S_t + C_t + I_t
The additive decomposition method is more appropriate when the seasonal factors tend
to be steady from one year to the next. By contrast, multiplicative decomposition is
more widely used since many economic time series have a seasonal factor that grows
proportionately with the level of the time series. In other words, economic growth tends
to be multiplicative rather than linear, because returns are compounded over time.
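An illustrative sketch of both models, recomposing a quarterly series from made-up trend, seasonal, cyclical, and irregular components (all numbers are hypothetical):

```python
# Multiplicative model: y_t = TR_t * S_t * C_t * I_t.
# Seasonal factors are ratios around 1; cyclical/irregular set to 1 here.

trend     = [100, 102, 104, 106]   # TR_t
seasonal  = [0.8, 0.9, 1.5, 0.8]   # S_t (strongest in the third quarter)
cyclical  = [1.0, 1.0, 1.0, 1.0]   # C_t
irregular = [1.0, 1.0, 1.0, 1.0]   # I_t

multiplicative = [t * s * c * i for t, s, c, i
                  in zip(trend, seasonal, cyclical, irregular)]
print([round(v, 1) for v in multiplicative])  # [80.0, 91.8, 156.0, 84.8]

# Additive model: y_t = TR_t + S_t + C_t + I_t.
# Seasonal effects are now absolute deviations, not ratios.
seasonal_add = [-20, -10, 50, -20]
additive = [t + s for t, s in zip(trend, seasonal_add)]
print(additive)  # [80, 92, 154, 86]
```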
Q9) What do you understand by 'Central tendency' ? Describe the measures of central
tendency.
ans) A measure of central tendency is a single value that attempts to describe a set of
data by identifying the central position within that set of data. As such, measures of
central tendency are sometimes called measures of central location. They are also
classed as summary statistics. The mean (often called the average) is most likely the
measure of central tendency that you are most familiar with, but there are others, such
as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency become more appropriate to
use than others. In the following sections, we will look at the mean, mode and median,
and learn how to calculate them and under what conditions they are most appropriate
to be used.
Mean (Arithmetic)
The mean (or average) is the most popular and well known measure of central
tendency. It can be used with both discrete and continuous data, although its use is most
often with continuous data (see our Types of Variable guide for data types). The mean is
equal to the sum of all the values in the data set divided by the number of values in the
data set. So, if we have n values in a data set and they have values x1, x2, ..., xn, the sample
mean, usually denoted by x̄ (pronounced "x bar"), is:
x̄ = (x1 + x2 + ... + xn) / n
This formula is usually written in a slightly different manner using the Greek capital
letter Σ, pronounced "sigma", which means "sum of...":
x̄ = Σx / n
You may have noticed that the above formula refers to the sample mean. So, why have
we called it a sample mean? This is because, in statistics, samples and populations have
very different meanings and these differences are very important, even if, in the case of
the mean, they are calculated in the same way. To acknowledge that we are calculating
the population mean and not the sample mean, we use the Greek lower case letter "mu",
denoted as μ:
μ = Σx / N
The mean is essentially a model of your data set: a single value chosen to best represent the data as a whole.
You will notice, however, that the mean is not often one of the actual values that you
have observed in your data set. However, one of its important properties is that it
minimises error in the prediction of any one value in your data set. That is, it is the value
that produces the lowest amount of error from all other values in the data set.
An important property of the mean is that it includes every value in your data set as
part of the calculation. In addition, the mean is the only measure of central tendency
where the sum of the deviations of each value from the mean is always zero.
The mean has one main disadvantage: it is particularly susceptible to the influence of
outliers. These are values that are unusual compared to the rest of the data set by being
especially small or large in numerical value. For example, consider the wages of staff at
a factory below:
Staff 1 2 3 4 5 6 7 8 9 10
Salary 15k 18k 16k 14k 15k 15k 12k 17k 90k 95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests
that this mean value might not be the best way to accurately reflect the typical salary of
a worker, as most workers have salaries in the $12k to 18k range. The mean is being
skewed by the two large salaries. Therefore, in this situation, we would like to have a
better measure of central tendency. As we will find out later, taking the median would
be a better measure of central tendency in this situation.
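The salary data from the table above (in $k) makes the point concretely: the two large salaries drag the mean far above what most workers earn, while the median resists them.

```python
import statistics

# Staff salaries from the example, in $k.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print(statistics.mean(salaries))    # 30.7
print(statistics.median(salaries))  # 15.5
```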
Another time when we usually prefer the median over the mean (or mode) is when our
data is skewed (i.e., the frequency distribution for our data is skewed). If we consider
the normal distribution - as this is the most frequently assessed in statistics - when the
data is perfectly normal, the mean, median and mode are identical. Moreover, they all
represent the most typical value in the data set. However, as the data becomes skewed
the mean loses its ability to provide the best central location for the data because the
skewed data is dragging it away from the typical value. However, the median best
retains this position and is not as strongly influenced by the skewed values. This is
explained in more detail in the skewed distribution section later in this guide.
Median
The median is the middle score for a set of data that has been arranged in order of
magnitude. The median is less affected by outliers and skewed data. In order to
calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark - in this case, 56 (highlighted in bold). It is the
middle mark because there are 5 scores before it and 5 scores after it. This works fine
when you have an odd number of scores, but what happens when you have an even
number of scores? What if you had only 10 scores? Well, you simply have to take the
middle two scores and average the result. So, if we look at the example below:
65 55 89 56 35 14 56 55 87 45
14 35 45 55 55 56 56 65 87 89
Only now we have to take the 5th and 6th score in our data set and average them to get
a median of 55.5.
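Both mark lists from the example can be checked with the standard library, which averages the middle pair automatically when the count is even:

```python
import statistics

# The two mark lists from the example: 11 scores (odd) and 10 scores (even).
odd_scores  = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]

print(statistics.median(odd_scores))   # 56
print(statistics.median(even_scores))  # 55.5
```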
A random variable in probability is most commonly denoted by capital X, and the small
letter x is then used to ascribe a value to the random variable.
For example, given that you flip a coin twice, the sample space for the possible
outcomes is given by the following:
S = {HH, HT, TH, TT}
There are four possible outcomes as listed in the sample space above, where H stands
for heads and T stands for tails.
To find the probability of one of those outcomes we denote that question as:
P(X = x)
which means the probability that the random variable is equal to some real
number x.
Let X be a random variable defined as the number of heads obtained when two coins are
tossed. Find the probability that you obtain two heads.
So now we've been told what X is and that x = 2, so we write the above information as:
P(X = 2)
Since we already have the sample space, we know that there is only one outcome with
two heads, so we find the probability as:
P(X = 2) = 1/4
From this example, you should be able to see that the random variable X refers to any of
the elements in a given sample space.
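The two-flip example can be checked by enumerating the sample space directly:

```python
from itertools import product

# Enumerate the sample space for two coin flips and compute
# P(X = 2), where X counts the number of heads.

sample_space = list(product("HT", repeat=2))   # HH, HT, TH, TT
favourable = [s for s in sample_space if s.count("H") == 2]

p = len(favourable) / len(sample_space)
print(p)  # 0.25
```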
There are two types of random variables: discrete variables and continuous random
variables.
A quick example is the sample space of any number of coin flips: the outcomes will
always be integer values, and you'll never have half heads or quarter tails. Such a
random variable is referred to as discrete. Discrete random variables give rise to
discrete probability distributions.
Q11) Skewness
ans) In probability theory and statistics, skewness is a measure of the asymmetry of the
probability distribution of a real-valued random variable about its mean. The skewness
value can be positive or negative, or even undefined.
The qualitative interpretation of the skew is complicated and unintuitive. Skew must
not be thought to refer to the direction the curve appears to be leaning; in fact, the
opposite is true (see below). For a unimodal distribution, negative skew indicates that
the tail on the left side of the probability density function is longer or fatter than the
right side; skewness alone does not distinguish these two kinds of shape. Conversely,
positive skew indicates that the tail on the right side is longer or fatter than the left side.
In cases where one tail is long but the other tail is fat, skewness does not obey a simple rule.
example, a zero value means that the tails on both sides of the mean balance out overall;
this is the case for a symmetric distribution, but is also true for an asymmetric
distribution where the asymmetries even out, such as one tail being long but thin, and
the other being short but fat. Further, in multimodal distributions and discrete
distributions, skewness is also difficult to interpret. Importantly, the skewness does not
determine the relationship of mean and median. In cases where it is necessary, data
might be transformed to have a normal distribution.
ans) In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or
Bayes' rule) describes the probability of an event, based on prior knowledge of
conditions that might be related to the event. For example, if cancer is related to age,
then, using Bayes' theorem, a person's age can be used to more accurately assess the
probability that they have cancer, compared to the assessment of the probability of
cancer made without knowledge of the person's age.
P(A|B) = P(A) P(B|A) / P(B)
It tells us how often A happens given that B happens, written P(A|B), when we know
how often B happens given that A happens, written P(B|A) , and how likely A and B are
on their own.
When P(Fire) means how often there is fire, and P(Smoke) means how often we see
smoke, then:
P(Fire|Smoke) = P(Fire) P(Smoke|Fire) / P(Smoke)
So the formula kind of tells us "forwards" when we know "backwards" (or vice versa)
Example: If dangerous fires are rare (1%) but smoke is fairly common (10%) due to
factories, and 90% of dangerous fires make smoke, then:
P(Fire|Smoke) = (1% × 90%) / 10% = 9%
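The fire/smoke numbers above plugged into Bayes' theorem:

```python
# Bayes' theorem: P(Fire|Smoke) = P(Fire) * P(Smoke|Fire) / P(Smoke).

p_fire = 0.01            # dangerous fires are rare
p_smoke = 0.10           # smoke is fairly common
p_smoke_given_fire = 0.90

p_fire_given_smoke = p_fire * p_smoke_given_fire / p_smoke
print(round(p_fire_given_smoke, 2))  # 0.09
```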
ans) Multistage sampling refers to sampling plans where the sampling is carried out in
stages using smaller and smaller sampling units at each stage.
Using all the sample elements in all the selected clusters may be prohibitively expensive
or unnecessary. Under these circumstances, multistage cluster sampling becomes
useful. Instead of using all the elements contained in the selected clusters, the
researcher randomly selects elements from each cluster. Constructing the clusters is the
first stage. Deciding what elements within the cluster to use is the second stage. The
technique is used frequently when a complete list of all members of the population does
not exist or is inappropriate.
In some cases, several levels of cluster selection may be applied before the final sample
elements are reached. For example, household surveys conducted by the Australian
Bureau of Statistics begin by dividing metropolitan regions into 'collection districts' and
selecting some of these collection districts (first stage). The selected collection districts
are then divided into blocks, and blocks are chosen from within each selected collection
district (second stage). Next, dwellings are listed within each selected block, and some
of these dwellings are selected (third stage). This method makes it unnecessary to
create a list of every dwelling in the region; a list is needed only for the selected blocks.
In remote areas, an additional stage of clustering is used, in order to reduce travel
requirements.
ans) A square matrix in which all the main diagonal elements are 1s and all the
remaining elements are 0s is called an Identity Matrix. The Identity Matrix is also
called the Unit Matrix or Elementary Matrix. The Identity Matrix is denoted by the letter
I_n, where n represents the order of the matrix. One of the important properties of the
identity matrix is: A·I_n = A, where A is any square matrix of order n.
Examples of Identity Matrix
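A small sketch using plain nested lists: build the n × n identity matrix and verify the property A·I = A for an arbitrary 3 × 3 matrix.

```python
# Build the n x n identity matrix and check A . I = A, using plain lists.

def identity(n):
    """I_n: 1s on the main diagonal, 0s elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Standard matrix product of two conformable matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

I3 = identity(3)
print(I3)                  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

A = [[2, 5, 1], [0, 3, 4], [7, 1, 6]]
print(matmul(A, I3) == A)  # True
```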
Q16) Quantiles
ans) In statistics and the theory of probability, quantiles are cutpoints dividing the
range of a probability distribution into contiguous intervals with equal probabilities, or
dividing the observations in a sample in the same way. There is one less quantile than
the number of groups created. Thus quartiles are the three cut points that will divide a
dataset into four equal-size groups. Common quantiles have
special names: for instance quartile, decile (creating 10 groups; see below for more).
The groups created are termed halves, thirds, quarters, etc., though sometimes the
terms for the quantile are used for the groups created, rather than for the cut points.
q-Quantiles are values that partition a finite set of values into q subsets of (nearly) equal
sizes. There are q - 1 of the q-quantiles, one for each integer k satisfying 0 < k < q. In
some cases the value of a quantile may not be uniquely determined, as can be the case
for the median (2-quantile) of a uniform probability distribution on a set of even size.
Quantiles can also be applied to continuous distributions, providing a way to generalize
rank statistics to continuous variables. When the cumulative distribution function of a
random variable is known, the q-quantiles are the application of the quantile function
(the inverse function of the cumulative distribution function) to the values {1/q, 2/q, ...,
(q - 1)/q}.
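A quick sketch of quartiles (4-quantiles) on a small dataset using the standard library:

```python
import statistics

# Quartiles: three cut points dividing the sorted data into four groups
# of (nearly) equal size.

data = list(range(1, 12))             # 1, 2, ..., 11
q = statistics.quantiles(data, n=4)   # default "exclusive" method
print(q)  # [3.0, 6.0, 9.0]
```

Note that `statistics.quantiles` also accepts `method="inclusive"`, which treats the data as the whole population rather than a sample and can give slightly different cut points.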
ans) The power of a hypothesis test is the probability of correctly rejecting the null
hypothesis when it is false. Power depends on the effect size, the sample size n, and the
significance level α.
Compared to simple random sampling and stratified sampling, cluster sampling has
advantages and disadvantages. For example, given equal sample sizes, cluster sampling
usually provides less precision than either simple random sampling or stratified
sampling. On the other hand, if travel costs between clusters are high, cluster sampling
may be more cost-effective than the other methods.
Delphi is based on the principle that forecasts (or decisions) from a structured group of
individuals are more accurate than those from unstructured groups. The technique can
also be adapted for use in face-to-face meetings, and is then called mini-Delphi or
Estimate-Talk-Estimate (ETE). Delphi has been widely used for business forecasting and
has certain advantages over another structured forecasting approach, prediction
markets.
ans) The linear function is popular in economics. It is attractive because it is simple and
easy to handle mathematically. It has many important applications.
y = f(x) = a + bx
A linear function has one independent variable and one dependent variable. The
independent variable is x and the dependent variable is y.
a is the constant term or the y intercept. It is the value of the dependent variable when x
= 0.
b is the coefficient of the independent variable. It is also known as the slope and gives
the rate of change of the dependent variable.
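A minimal sketch of y = a + bx with hypothetical values for the intercept and slope:

```python
# Linear function y = a + b*x: a is the intercept (value of y at x = 0),
# b is the slope (change in y per unit change in x). Values are made up.

def f(x, a=50, b=-2):
    return a + b * x

print(f(0))         # 50  -> the intercept a
print(f(1) - f(0))  # -2  -> the slope b
```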
Kendall's tau rank correlation coefficient: a measure of the proportion of ranks that
match between two data sets.
Goodman and Kruskal's gamma: a measure of the strength of association of cross-
tabulated data when both variables are measured at the ordinal level.
ans) In statistics, non-sampling error is a catch-all term for the deviations of estimates
from their true values that are not a function of the sample chosen, including various
systematic errors and random errors that are not due to sampling. Non-sampling errors
are much harder to quantify than sampling errors.
Coverage errors, such as failure to accurately represent all population units in the
sample, or the inability to obtain information about all sample cases;
ans) Sampling takes on two forms in statistics: probability sampling and non-
probability sampling:
Advantages
Cluster sampling: convenience and ease of use.
Simple random sampling: creates samples that are highly representative of the
population.
Stratified random sampling: creates strata or layers that are highly
representative of strata or layers in the population.
Systematic sampling: creates samples that are highly representative of the
population, without the need for a random number generator.
Disadvantages
Cluster sampling: might not work well if unit members are not homogeneous (i.e.
if they are different from each other).
Simple random sampling: tedious and time consuming, especially when creating
larger samples.
Stratified random sampling: tedious and time consuming, especially when
creating larger samples.
Systematic sampling: not as random as simple random sampling.
Q27) Normal distribution
ans) In probability theory, the normal (or Gaussian) distribution is a very common
continuous probability distribution. Normal distributions are important in statistics and
are often used in the natural and social sciences to represent real-valued random
variables whose distributions are not known.
The normal distribution is useful because of the central limit theorem. In its most
general form, under some conditions (which include finite variance), it states that
averages of samples of observations of random variables independently drawn from
independent distributions converge in distribution to the normal, that is, become
normally distributed when the number of observations is sufficiently large. Physical
quantities that are expected to be the sum of many independent processes (such as
measurement errors) often have distributions that are nearly normal. Moreover, many
results and methods (such as propagation of uncertainty and least squares parameter
fitting) can be derived analytically in explicit form when the relevant variables are
normally distributed.
The acceptance of the alternative hypothesis depends on the rejection of the null hypothesis;
that is, unless the null hypothesis is rejected, an alternative hypothesis cannot be
accepted.
ans) In statistics, the standard deviation (SD, also represented by the Greek letter sigma,
σ, or the Latin letter s) is a measure that is used to quantify the amount of variation or
dispersion of a set of data values. A low standard deviation indicates that the data points
tend to be close to the mean (also called the expected value) of the set, while a high
standard deviation indicates that the data points are spread out over a wider range of
values.
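A small worked example from the definition (square root of the mean squared deviation from the mean), cross-checked against the standard library:

```python
import statistics

# Population standard deviation of a small dataset.
data = [2, 4, 4, 4, 5, 5, 7, 9]          # mean = 5
mean = statistics.mean(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)

print(variance ** 0.5)                    # 2.0
print(statistics.pstdev(data))            # 2.0 (same result via the stdlib)
```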
Q30) Explain the concept of the power curve of a test and p-value of a test.
ans) Generally your null hypothesis is a value for some parameter (of e.g. a population
or family of random variables). You want to reject the null hypothesis if the data you
have is sufficiently unlikely if the null hypothesis were true.
A hypothesis test does this using a test statistic and a rejection region, with the null
hypothesis rejected if the test statistic falls in the rejection region. The rejection region
is typically parameterized by a value α, the significance level. This value is the
probability, if the null hypothesis is true, that your test statistic falls in the rejection
region. This is called a false positive or type-I error. The idea is that you can choose a
value for α that represents an acceptable chance of a false rejection.
Now, there is also a true value for the parameter in the null hypothesis. Given this value,
as well as the rejection region for the test, there is some probability of rejecting the null
hypothesis. This probability (often referred to as π) is the power of the test.
Since the true value is unknown, the power of a test can also be described with a
curve π(x), where x is the supposed true value of the parameter and π(x) is
the probability of rejecting the null hypothesis for that value of x. Notice that if one
chooses a larger α, the rejection region grows, and so does π(x) for each x.
The power of a test often also refers to the size of π(x) as x moves away from the
null hypothesis. Two tests with the same significance level can be compared in terms of
their power curves, with one test deemed more powerful if it has a higher π(x) when x is
not the null value.
If you perform a test, the p-value is the smallest α for which the test would reject the
null hypothesis. That is, it is the value of the significance level for which the test statistic
would have been on the boundary of the rejection region. Notice that this does not
depend on the chosen significance level, so enlarging the rejection region by increasing α
will have no effect on the p-value.
The p-value is also often used in place of the rejection region for the test statistic to
determine whether to reject the null hypothesis. In this case the null hypothesis is
rejected if and only if p < α, so the power describes the probability that p < α. Then if a
test has a larger π than another test for a fixed α, we can see that the more powerful
test will tend to generate smaller p-values.
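The power curve can be computed in closed form for a concrete test. The sketch below (my own illustration, under assumed conditions: a one-sided z-test of H0: μ = 0 for a Normal(x, 1) population, sample size 25, α = 0.05) shows π(x) equal to α at the null and rising toward 1 as x moves away from it.

```python
# Power curve pi(x) of a one-sided z-test for the mean: reject H0 (mu = 0)
# when the sample mean exceeds z_alpha / sqrt(n). The 1.6449 below is the
# upper 5% point of the standard normal, matching alpha = 0.05.
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def power(x, n=25, z_alpha=1.6449):
    """Probability of rejecting H0 when the true mean is x (alpha = 0.05)."""
    crit = z_alpha / math.sqrt(n)  # critical value for the sample mean under H0
    # The sample mean is Normal(x, 1/sqrt(n)); power is P(sample mean > crit).
    return 1 - normal_cdf((crit - x) * math.sqrt(n))

# pi(0) equals alpha; pi(x) grows toward 1 as x moves away from the null.
for x in [0.0, 0.2, 0.5, 1.0]:
    print(x, round(power(x), 3))
```

Evaluating `power` on a grid of x values traces exactly the curve π(x) described above, and increasing `z_alpha`'s implied α would raise the whole curve.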
Q31) Explain the Gauss Jordan Method for solving a system of linear simultaneous
equation with the help of an example.
ANS) Let's start simple, and work our way up to messier examples.
5x + 4y - z = 0
10y - 3z = 11
z = 3
10y - 3(3) = 11
10y - 9 = 11
10y = 20
y = 2
5x + 4(2) - (3) = 0
5x + 8 - 3 = 0
5x + 5 = 0
5x = -5
x = -1
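The back-substitution just performed on this triangular system can be sketched as a short routine (an illustrative helper of my own, not part of the original answer):

```python
# Back-substitution for an upper-triangular system Ux = b: solve the last
# equation first, then substitute upward, exactly as done by hand above.

def back_substitute(U, b):
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Subtract the already-known later variables, then divide.
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

# The triangular system from the example: 5x + 4y - z = 0, 10y - 3z = 11, z = 3.
U = [[5, 4, -1],
     [0, 10, -3],
     [0, 0, 1]]
b = [0, 11, 3]
print(back_substitute(U, b))  # [-1.0, 2.0, 3.0]
```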
The reason this system was easy to solve is that the system was "triangular"; this refers
to the equations having the form of a triangle, because the lower equations contain
only the later variables.
The point is that, in this format, the system is simple to solve. And Gaussian elimination
is the method we'll use to convert systems to this upper triangular form, using the row
operations we learned when we did the addition method.
-3x + 2y - 6z = 6
5x + 7y - 5z = 6
x + 4y - 2z = 8
The first thing to do is to get rid of the leading x-terms in two of the rows. For
now, I'll just look at which rows will be easy to clear out; I can switch rows later
to get the system into "upper triangular" form. There is no rule that says I have to
use the x-term from the first row, and, in this case, I think it will be simpler to use
the x-term from the third row, since its coefficient is simply "1". So I'll multiply
the third row by 3, and add it to the first row. I do the computations on scratch
paper:
Warning: Since I didn't actually do anything to the third row, I copied it down,
unchanged, into the new matrix of equations. I used the third row, but I didn't
actually change it. Don't confuse "using" with "changing".
To get smaller numbers for coefficients, I'll multiply the first row by one-half:
Now I'll multiply the third row by -5 and add this to the second row. I do my
work on scratch paper:
...and then I write down the results:
(Copyright Elizabeth Stapel 2003-2011 All Rights Reserved)
I didn't do anything with the first row, so I copied it down unchanged. I worked
with the third row, but I only worked on the second row, so the second row is
updated and the third row is copied over unchanged.
Okay, now the x-column is cleared out except for the leading term in the third
row. So next I have to work on the y-column.
Warning: Since the third equation has an x-term, I cannot use it on either of the
other two equations any more (or I'll undo my progress). I can work on the
equation, but not with it.
If I add twice the first row to the second row, this will give me a leading 1 in the
second row. I won't have gotten rid of the leading y-term in the second row, but I
will have converted it (without getting involved in fractions) to a form that is
simpler to deal with. (You should keep an eye out for this sort of simplification.)
First I do the scratch work:
Now I can use the second row to clear out the y-term in the first row. I'll multiply
the second row by -7 and add. First I do the scratch work:
I can tell what z is now, but, just to be thorough, I'll divide the first row by 43.
Then I'll rearrange the rows to put them in upper-triangular form:
y - 7(1) = -4
y - 7 = -4
y = 3
x + 4(3) - 2(1) = 8
x + 12 - 2 = 8
x + 10 = 8
x = -2
Note: There is nothing sacred about the steps I used in solving the above system; there
was nothing special about how I solved this system. You could work in a different order
or simplify different rows, and still come up with the correct answer. These systems are
sufficiently complicated that there is unlikely to be one right way of computing the
answer. So don't stress over "how did she know to do that next?", because there is no
rule. I just did whatever struck my fancy; I did whatever seemed simplest or whatever
came to mind first. Don't worry if you would have used completely different steps. As
long as each step along the way is correct, you'll come up with the same answer.
In the above example, I could have gone further in my computations and been more
thorough-going in my row operations, clearing out all the y-terms other than that in the
second row and all the z-terms other than that in the first row. This is what the process
would then have looked like:
This way, I can just read off the values of x, y, and z, and I don't have to bother with the
back-substitution. This more-complete method of solving is called "Gauss-Jordan
elimination" (with the equations ending up in what is called "reduced-row-echelon
form"). Many texts only go as far as Gaussian elimination, but I've always found it easier
to continue on and do Gauss-Jordan.
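The full Gauss-Jordan procedure can be sketched in code (a hedged illustration of my own, with partial pivoting added for numerical robustness; it reduces the augmented matrix to reduced-row-echelon form so the solution can be read off directly):

```python
# Gauss-Jordan elimination on the augmented matrix [A | b]: make each pivot 1
# and clear its entire column (above and below), so no back-substitution is
# needed at the end.

def gauss_jordan(A, b):
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the top.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Clear this column in every other row, not just the rows below.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [rv - factor * cv for rv, cv in zip(M[r], M[col])]
    return [row[-1] for row in M]

# The example system: -3x + 2y - 6z = 6, 5x + 7y - 5z = 6, x + 4y - 2z = 8.
A = [[-3, 2, -6], [5, 7, -5], [1, 4, -2]]
b = [6, 6, 8]
solution = gauss_jordan(A, b)
print(solution)  # approximately [-2.0, 3.0, 1.0]
```

Stopping the column-clearing loop at the rows below the pivot instead would give plain Gaussian elimination, after which back-substitution is still required.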
Q32) Mixed autoregressive moving-average models.
The model is usually referred to as the ARMA(p, q) model, where p is the order of the
autoregressive part and q is the order of the moving-average part.
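The simplest mixed case, ARMA(1,1), combines one autoregressive and one moving-average term: X_t = φ·X_{t-1} + e_t + θ·e_{t-1}. It can be simulated with a short sketch (the function name and parameter values below are my own illustration, not from the source):

```python
# Simulate an ARMA(1,1) process: each value depends on the previous value
# (autoregressive part, p = 1) and on the previous white-noise innovation
# (moving-average part, q = 1).
import random

def simulate_arma11(n, phi=0.6, theta=0.3, seed=0):
    rng = random.Random(seed)
    series, x_prev, e_prev = [], 0.0, 0.0
    for _ in range(n):
        e = rng.gauss(0, 1)                 # white-noise innovation
        x_t = phi * x_prev + e + theta * e_prev
        series.append(x_t)
        x_prev, e_prev = x_t, e
    return series

series = simulate_arma11(500)
print(len(series))  # 500
```

With |φ| < 1 the process is stationary, so a long simulated series hovers around zero rather than drifting away.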
ans) The method of least squares is a standard approach in regression analysis to the
approximate solution of overdetermined systems, i.e., sets of equations in which there
are more equations than unknowns. "Least squares" means that the overall solution
minimizes the sum of the squares of the residuals made in the results of every single
equation.
The most important application is in data fitting. The best fit in the least-squares sense
minimizes the sum of squared residuals (a residual being: the difference between an
observed value, and the fitted value provided by a model). When the problem has
substantial uncertainties in the independent variable (the x variable), then simple
regression and least squares methods have problems; in such cases, the methodology
required for fitting errors-in-variables models may be considered instead of that for
least squares.
Least squares problems fall into two categories: linear or ordinary least squares and
non-linear least squares, depending on whether or not the residuals are linear in all
unknowns. The linear least-squares problem occurs in statistical regression analysis; it
has a closed-form solution. The non-linear problem is usually solved by iterative
refinement; at each iteration the system is approximated by a linear one, and thus the
core calculation is similar in both cases.
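The linear case with one slope and one intercept has a well-known closed-form solution; the sketch below (an illustrative function of my own) fits y = a + bx by minimizing the sum of squared residuals.

```python
# Simple linear least squares: the slope is the ratio of the covariance of
# x and y to the variance of x, and the intercept makes the line pass
# through the point of means. This minimizes the sum of squared residuals.

def least_squares_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]  # roughly y = 1 + 2x with noise
a, b = least_squares_line(xs, ys)
print(round(a, 3), round(b, 3))
```

Because this problem is linear in the unknowns a and b, it falls into the first of the two categories described above and needs no iterative refinement.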