
DIETHER PAUL A. BELANDO    GRADE 9-THESSALONIANS    OCTOBER 24, 2019

A. Regression Definitions and Descriptions

What Is Regression?
Regression is a statistical measurement used in finance, investing, and other disciplines that attempts to
determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series
of other changing variables (known as independent variables).

Regression helps investment and financial managers to value assets and understand the relationships between
variables, such as commodity prices and the stocks of businesses dealing in those commodities.

Regression Explained

The two basic types of regression are linear regression and multiple linear regression, although there are non-linear regression methods for more complicated data and analysis. Linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y, while multiple regression uses two or more independent variables to predict the outcome.

Regression can help finance and investment professionals as well as professionals in other businesses.
Regression can also help predict sales for a company based on weather, previous sales, GDP growth, or other
types of conditions. The capital asset pricing model (CAPM) is an often-used regression model in finance for
pricing assets and discovering costs of capital.

The general form of each type of regression is:

• Linear regression: Y = a + bX + u
• Multiple regression: Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u

Where:

• Y = the variable that you are trying to predict (dependent variable).
• X = the variable that you are using to predict Y (independent variable).
• a = the intercept.
• b = the slope.
• u = the regression residual.

Regression takes a group of random variables, thought to be predicting Y, and tries to find a mathematical
relationship between them. This relationship is typically in the form of a straight line (linear regression) that
best approximates all the individual data points. In multiple regression, the separate variables are differentiated
by using numbers with subscripts.
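To make the linear form concrete, here is a minimal sketch in Python (scipy's linregress, with made-up X and Y values) that recovers the a, b, and u of the equation above:

    # A minimal sketch, with made-up X and Y values, of fitting
    # Y = a + bX + u using scipy.stats.linregress.
    import numpy as np
    from scipy import stats

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
    Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # dependent variable

    fit = stats.linregress(X, Y)
    print("intercept a:", fit.intercept)
    print("slope b:", fit.slope)

    # The residuals u are whatever the fitted line fails to explain.
    u = Y - (fit.intercept + fit.slope * X)
    print("residuals u:", u)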

Example of How Regression Analysis Is Used

Regression is often used to determine how specific factors, such as the price of a commodity, interest rates, particular industries, or sectors, influence the price movement of an asset. The aforementioned CAPM is
based on regression, and it is utilized to project the expected returns for stocks and to generate costs of capital.
A stock's returns are regressed against the returns of a broader index, such as the S&P 500, to generate a beta
for the particular stock.

Beta is the stock's risk in relation to the market or index and is reflected as the slope in the CAPM model. The
expected return for the stock in question would be the dependent variable Y, while the independent variable X
would be the market risk premium.
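As a sketch of that regression (the return series below are made up for illustration), estimating beta is just reading off the fitted slope:

    # A rough sketch of the beta regression with hypothetical return series:
    # regress the stock's returns on the index's returns; the slope is beta.
    import numpy as np
    from scipy import stats

    market_returns = np.array([0.010, -0.020, 0.015, 0.030, -0.010, 0.020])
    stock_returns  = np.array([0.012, -0.025, 0.020, 0.035, -0.015, 0.022])

    fit = stats.linregress(market_returns, stock_returns)
    print("beta (slope):", fit.slope)
    print("alpha (intercept):", fit.intercept)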

Additional variables such as the market capitalization of a stock, valuation ratios, and recent returns can be
added to the CAPM model to get better estimates for returns. These additional factors are known as the Fama-
French factors, named after the professors who developed the multiple linear regression model to better explain
asset returns.

B. Chi-Square (χ²) Statistic Definitions and Descriptions

What Is a Chi-Square Statistic?

A chi-square (χ²) statistic is a test that measures how expectations compare to actual observed data (or model results). The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large enough sample. For example, the results of tossing a coin 100 times meet these criteria.

Chi-square tests are often used in hypothesis testing.

What Does a Chi-Square Statistic Tell You?

There are two main kinds of chi-square tests: the test of independence, which asks a question of relationship, such as, "Is there a relationship between gender and SAT scores?"; and the goodness-of-fit test, which asks something like, "If a coin is tossed 100 times, will it come up heads 50 times and tails 50 times?"

For these tests, degrees of freedom are utilized to determine if a certain null hypothesis can be rejected based on
the total number of variables and samples within the experiment.

For example, when considering students and course choice, a sample size of 30 or 40 students is likely not large
enough to generate significant data. Getting the same or similar results from a study using a sample size of 400
or 500 students is more valid.

In another example, consider tossing a coin 100 times. The expected result of tossing a fair coin 100 times is that heads will come up 50 times and tails will come up 50 times. The actual result might be that heads comes up 45 times and tails comes up 55 times. The chi-square statistic shows any discrepancies between the expected results and the actual results.

Example of a Chi-Square Test


Imagine a random poll was taken across 2,000 different voters, both male and female. The people who responded were classified by their gender and whether they were Republican, Democrat, or Independent. Imagine a grid with the columns labeled Republican, Democrat, and Independent, and two rows labeled male and female. Assume the data from the 2,000 respondents is as follows:

            Republican   Democrat   Independent   Total
Male           400          300          100        800
Female         500          600          100       1200
Total          900          900          200       2000
The first step in calculating the chi-square statistic is to find the expected frequencies. These are calculated for each "cell" in the grid. Since there are two categories of gender and three categories of political view, there are six total expected frequencies. The formula for the expected frequency is:

E(r,c) = n(r) x n(c) / n

where:

n(r) = the total of row r (the row in question)
n(c) = the total of column c (the column in question)
n = the total sample size
In this example, the expected frequencies are:

• E(1,1) = (900 x 800) / 2,000 = 360
• E(1,2) = (900 x 800) / 2,000 = 360
• E(1,3) = (200 x 800) / 2,000 = 80
• E(2,1) = (900 x 1,200) / 2,000 = 540
• E(2,2) = (900 x 1,200) / 2,000 = 540
• E(2,3) = (200 x 1,200) / 2,000 = 120
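As a quick sketch in Python (numpy assumed), the whole expected-frequency grid falls out of one outer product of the row and column totals:

    # Reproducing the six expected frequencies above: each cell is
    # (row total x column total) / grand total.
    import numpy as np

    observed = np.array([[400, 300, 100],    # Male: Rep., Dem., Ind.
                         [500, 600, 100]])   # Female

    row_totals = observed.sum(axis=1)        # [ 800 1200]
    col_totals = observed.sum(axis=0)        # [900 900 200]
    n = observed.sum()                       # 2000

    expected = np.outer(row_totals, col_totals) / n
    print(expected)                          # [[360. 360.  80.]
                                             #  [540. 540. 120.]]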

Next, these values are used to calculate the chi-square statistic using the following formula:

χ² = Σ [ (O(r,c) - E(r,c))² / E(r,c) ]

where:

O(r,c) = the observed data for the given row and column

In this example, the contribution of each cell to the chi-square statistic is:

• Cell (1,1): (400 - 360)² / 360 = 4.44
• Cell (1,2): (300 - 360)² / 360 = 10
• Cell (1,3): (100 - 80)² / 80 = 5
• Cell (2,1): (500 - 540)² / 540 = 2.96
• Cell (2,2): (600 - 540)² / 540 = 6.67
• Cell (2,3): (100 - 120)² / 120 = 3.33

The chi-square statistic then equals the sum of these values, or 32.41. We can then look at a chi-square distribution table to see whether, given the degrees of freedom in our set-up ((rows - 1) x (columns - 1) = 2 here), the result is statistically significant.
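As a check, a short sketch with scipy reproduces the whole calculation, degrees of freedom included, in one call:

    # chi2_contingency returns the statistic, the p-value, the degrees of
    # freedom, and the expected-frequency table for an observed grid.
    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[400, 300, 100],
                         [500, 600, 100]])

    chi2, p, dof, expected = chi2_contingency(observed)
    print("chi-square:", chi2)           # about 32.41
    print("degrees of freedom:", dof)    # (2 - 1) x (3 - 1) = 2
    print("p-value:", p)                 # far below 0.05: reject independence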

C. T-test Definitions and Descriptions

What Is a T-Test?

[Figure: the t-distribution, used for the t-test. Image: Carnegie Mellon.]

The t-test tells you how significant the differences between groups are; in other words, it lets you know if those differences (measured in means/averages) could have happened by chance.

A t-test is a type of inferential statistic used to determine if there is a significant difference between the means
of two groups, which may be related in certain features. It is mostly used when the data sets, like the data set
recorded as the outcome from flipping a coin 100 times, would follow a normal distribution and may have
unknown variances. A t-test is used as a hypothesis testing tool, which allows testing of
an assumption applicable to a population. 

A t-test looks at the t-statistic, the t-distribution values, and the degrees of freedom to determine the probability
of difference between two sets of data. To conduct a test with three or more variables, one must use an analysis
of variance.

Explaining the T-Test

Essentially, a t-test allows us to compare the average values of two data sets and determine if they came from the same population. For example, if we were to take a sample of students from class A and another sample of students from class B, we would not expect them to have exactly the same mean and standard deviation. Similarly, samples taken from a placebo-fed control group and those taken from a drug-prescribed group should have a slightly different mean and standard deviation.

Mathematically, the t-test takes a sample from each of the two sets and establishes the problem statement by
assuming a null hypothesis that the two means are equal. Based on the applicable formulas, certain values are
calculated and compared against the standard values, and the assumed null hypothesis is accepted or rejected
accordingly.

If the null hypothesis qualifies to be rejected, it indicates that the data readings are strong and probably not due to chance.
The t-test is just one of many tests used for this purpose. Statisticians must additionally use tests other than the
t-test to examine more variables and tests with larger sample sizes. For a large sample size, statisticians use a z-
test. Other testing options include the chi-square test and the f-test.

There are three main types of t-tests, and they are broadly categorized as dependent (paired) and independent t-tests.

Example of a T-test

A very simple example: let's say you have a cold and you try a naturopathic remedy. Your cold lasts a couple of days. The next time you have a cold, you buy an over-the-counter pharmaceutical and the cold lasts a week. You survey your friends, and they all tell you that their colds were of a shorter duration (an average of three days) when they took the naturopathic remedy. What you really want to know is: are these results repeatable? A t-test can tell you by comparing the means of the two groups and letting you know the probability of those results happening by chance.
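A minimal sketch of that comparison in Python (the cold durations, in days, are made up for illustration; scipy's independent two-sample t-test does the work):

    # Comparing two groups of hypothetical cold durations with a t-test.
    from scipy import stats

    naturopathic = [2, 3, 3, 4, 2, 3, 4, 3]
    pharmaceutical = [6, 7, 5, 7, 6, 8, 6, 7]

    t_stat, p_value = stats.ttest_ind(naturopathic, pharmaceutical)
    print("t statistic:", t_stat)
    print("p-value:", p_value)   # a small p-value means the difference is
                                 # unlikely to have happened by chance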

D. Z-Test Definitions and Descriptions

What Is a Z-Test?
A z-test is a statistical test used to determine whether two population means are different when the variances are
known and the sample size is large. The test statistic is assumed to have a normal distribution, and nuisance
parameters such as standard deviation should be known in order for an accurate z-test to be performed.

A z-statistic, or z-score, is a number representing how many standard deviations above or below the population mean a score derived from a z-test is.

How Z-Tests Work


Examples of tests that can be conducted as z-tests include a one-sample location test, a two-sample location test,
a paired difference test, and a maximum likelihood estimate. Z-tests are closely related to t-tests, but t-tests are
best performed when an experiment has a small sample size. Also, t-tests assume the standard deviation is
unknown, while z-tests assume it is known. If the standard deviation of the population is unknown, the
assumption of the sample variance equaling the population variance is made.

Hypothesis Test
The z-test is also a hypothesis test in which the z-statistic follows a normal distribution. The z-test is best used
for greater-than-30 samples because, under the central limit theorem, as the number of samples gets larger, the
samples are considered to be approximately normally distributed. When conducting a z-test, the null and alternative hypotheses, the alpha level, and the z-score should be stated. Next, the test statistic should be calculated, and the results and conclusion stated.

Example of One-Sample Z-Test

Assume an investor wishes to test whether the average daily return of a stock is greater than 1%. A simple random sample of 50 returns is calculated and has an average of 2%. Assume the standard deviation of the returns is 2.5%. Therefore, the null hypothesis is that the average, or mean, return is equal to 1%.

Conversely, the alternative hypothesis is that the mean return is greater than 1%. Assume an alpha of 0.05 is selected with a two-tailed test. Consequently, 0.025 of the samples lie in each tail, and the alpha has a critical value of 1.96 or -1.96. If the value of z is greater than 1.96 or less than -1.96, the null hypothesis is rejected.

The value for z is calculated by subtracting the value of the average daily return selected for the test, or 1% in
this case, from the observed average of the samples. Next, divide the resulting value by the standard deviation
divided by the square root of the number of observed values. Therefore, the test statistic is calculated to be 2.83,
or (0.02 - 0.01) / (0.025 / (50)^(1/2)). The investor rejects the null hypothesis since z is greater than 1.96 and
concludes that the average daily return is greater than 1%.
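The same arithmetic is easy to reproduce in Python as a quick sketch (scipy's normal distribution is used only to attach a p-value):

    # Reproducing the one-sample z-test worked example above.
    import math
    from scipy.stats import norm

    sample_mean = 0.02   # observed average daily return (2%)
    mu0 = 0.01           # hypothesized mean under the null (1%)
    sigma = 0.025        # known standard deviation of returns (2.5%)
    n = 50               # sample size

    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    print("z:", round(z, 2))              # 2.83
    print("reject null:", abs(z) > 1.96)  # True at the 5% level
    print("two-tailed p:", 2 * (1 - norm.cdf(abs(z))))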
E. Analysis of Variance (ANOVA) Definitions and Descriptions

What Is Analysis of Variance (ANOVA)?

Analysis of variance (ANOVA) is an analysis tool used in statistics
that splits an observed aggregate variability found inside a data set into two parts: systematic factors and
random factors. The systematic factors have a statistical influence on the given data set, while the random
factors do not. Analysts use the ANOVA test to determine the influence that independent variables have on the
dependent variable in a regression study.

The t- and z-test methods developed in the 20th century were used for statistical analysis until 1918, when
Ronald Fisher created the analysis of variance method. ANOVA is also called the Fisher analysis of variance,
and it is the extension of the t- and z-tests. The term became well-known in 1925, after appearing in Fisher's
book, "Statistical Methods for Research Workers." It was employed in experimental psychology and later
expanded to subjects that were more complex.

What Does the Analysis of Variance Reveal?


The ANOVA test is the initial step in analyzing factors that affect a given data set. Once the test is finished, an
analyst performs additional testing on the methodical factors that measurably contribute to the data set's
inconsistency. The analyst utilizes the ANOVA test results in an f-test to generate additional data that aligns
with the proposed regression models.

The ANOVA test allows a comparison of more than two groups at the same time to determine whether a relationship exists between them. The result of the ANOVA formula, the F statistic (also called the F-ratio), allows for the analysis of multiple groups of data to determine the variability between samples and within samples.

If no real difference exists between the tested groups (the null hypothesis), the result of the ANOVA's F-ratio statistic will be close to 1. Fluctuations in its sampling will likely follow the Fisher F distribution. This is actually a group of distribution functions, with two characteristic numbers, called the numerator degrees of freedom and the denominator degrees of freedom.
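To make the F-ratio and its two degrees-of-freedom numbers concrete, here is a minimal sketch in Python (three made-up groups; numpy and scipy assumed) that computes F by hand and checks it against scipy.stats.f_oneway:

    # Hand-rolled F-ratio for three hypothetical groups, checked with scipy.
    import numpy as np
    from scipy.stats import f_oneway

    groups = [np.array([4.0, 5.0, 6.0, 5.0]),
              np.array([7.0, 8.0, 6.0, 7.0]),
              np.array([5.0, 6.0, 5.0, 4.0])]

    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = np.concatenate(groups).mean()

    # Between-group sum of squares (numerator df = k - 1)
    ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (denominator df = n - k)
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

    f_ratio = (ssb / (k - 1)) / (ssw / (n - k))
    print("F by hand:", f_ratio)             # 8.0 for these numbers

    f_scipy, p_value = f_oneway(*groups)
    print("F from scipy:", f_scipy, "p-value:", p_value)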

Example of How to Use ANOVA


A researcher might, for example, test students from multiple colleges to see if students from one of the colleges
consistently outperform students from the other colleges. In a business application, an R&D researcher might
test two different processes of creating a product to see if one process is better than the other in terms of cost
efficiency.

The type of ANOVA test used depends on a number of factors. It is applied when the data needs to be experimental. Analysis of variance is employed when there is no access to statistical software and ANOVA must be computed by hand. It is simple to use and best suited for small samples. With many experimental designs, the sample sizes have to be the same for the various factor level combinations.

ANOVA is helpful for testing three or more variables. It is similar to multiple two-sample t-tests, but it results in fewer type I errors and is appropriate for a range of issues. ANOVA groups differences by comparing the means of each group and includes spreading the variance out into diverse sources. It is employed with subjects, test groups, and between and within groups.

One-Way ANOVA Versus Two-Way ANOVA


There are two types of ANOVA: one-way (or unidirectional) and two-way. One-way or two-way refers to the
number of independent variables in your analysis of variance test. A one-way ANOVA evaluates the impact of
a sole factor on a sole response variable. It determines whether all the samples are the same. The one-way
ANOVA is used to determine whether there are any statistically significant differences between the means of
three or more independent (unrelated) groups.

A two-way ANOVA is an extension of the one-way ANOVA. With a one-way, you have one independent variable affecting a dependent variable. With a two-way ANOVA, there are two independent variables. For example, a two-way ANOVA allows a company to compare worker productivity based on two independent variables, such as salary and skill set. It is utilized to observe the interaction between the two factors and tests the effect of two factors at the same time.
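As a sketch of that salary-and-skill example (all data and the column names "salary", "skill", and "productivity" are hypothetical; statsmodels' formula interface is one common way to run a two-way ANOVA in Python):

    # A two-way ANOVA on made-up worker-productivity data.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    df = pd.DataFrame({
        "salary": ["low", "low", "high", "high"] * 3,
        "skill":  ["junior", "senior"] * 6,
        "productivity": [10, 14, 12, 18, 9, 15, 13, 17, 11, 14, 12, 19],
    })

    # Main effects of both factors plus their interaction.
    model = ols("productivity ~ C(salary) * C(skill)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))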
