Sheet: Introduction File: 246707238.xls.

ms_office Page 1 of 26
ExcelStats 2007.xls Version 1.0 2/15/08
Using Excel to do Statistics: Some Helpful Notes
John.O.McClain@cornell.edu
Johnson Graduate School of Management
Cornell University
Ithaca NY 14853
This workbook is intended for teaching purposes. You are welcome to use it in any manner,
and change it as you see fit. It comes without any guarantee whatsoever, and is distributed
free of charge.
This workbook tells you how to do a bunch of Statistics calculations using Excel. Excel has an Add-In
called the Analysis ToolPak. To find out if you have it working in versions before Excel 2007, go into the
Tools menu and select Add-Ins.
For Excel 2007:
1. Click the Microsoft Office Button, and then click Excel Options.
2. Click Add-Ins, and then in the Manage box, select Excel Add-ins, and click Go.
3. In the Add-Ins available box, select the Analysis ToolPak check box.
4. In the same box, select the Analysis ToolPak - VBA check box, and then click OK.
5. After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on
the Data tab.
Excel has 2 ways to do almost every statistical analysis, and in many cases this workbook illustrates both.
There are separate sheets for each topic listed below. You can find the sheets by selecting the appropriate
"tab" at the bottom of the screen.
Contents: These are the sheets in this workbook.
Introduction
Sorting
Frequencies & Graphs
Histogram
Scatter Plot
Descriptive Statistics
Rank & Percentile
Covariance
Correlation
Sampling
Confidence Intervals
One-Sample t-tests
Two-Sample t-tests
Regression
Additional File Available:
PredInt.xls: Contains a Visual Basic macro to do multiple regression with Prediction Intervals,
a feature that is not included in the Regression tool in the Analysis ToolPak.
If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it. If you get
prompted that the Analysis ToolPak is not currently installed, click Yes to install it.
Sheet: Sorting
Sorting a Data Set
*** Sorting does not require the Data Analysis package.
Sorting changes the data set. If you want to be able to restore the original order of the data, begin by
numbering the data points. The first column of the Example Data Set contains these numbers.
Example Data Set        Sorted "Ascending" by a    Sorted "Descending" by b
Numb.   a    b          Numb.   a    b             Numb.   a    b
  1     1    2            1     1    2               6     5   11
  2     2    3            2     2    3               8     6   10
  3     3    4           11     2    5               9     5    7
  4     4    3            3     3    4               7     8    6
  5     3    5            5     3    5               5     3    5
  6     5   11           10     3    4              11     2    5
  7     8    6            4     4    3               3     3    4
  8     6   10            6     5   11              10     3    4
  9     5    7            9     5    7               2     2    3
 10     3    4            8     6   10               4     4    3
 11     2    5            7     8    6               1     1    2
Instructions for the Sort Menu Item
Select the data set. (Hold down the Left Mouse Button and drag the cursor over cells A6 to C17.)
On the Data tab, select Sort
Use the arrow next to the Sort By window to select a from the pull-down list and then Click OK
The results should look like the data in the first shaded area next to the Example Data Set above.

Now repeat the steps, except select b from the pull-down list, and Largest to Smallest under Order.
The results should look like the data in the second shaded area next to the Example Data Set.
To return the data set to its original order, repeat the steps, selecting Numb. from the pull-down list,
and select Smallest to Largest under Order.
The Add Level button may be used to resolve ties.
For example, select Sort By a and Smallest to Largest, and then click Add Level.
In the new boxes, select Then by b and Smallest to Largest.
Compare the results to the first shaded area next to the Example Data Set. The order
has been arranged so that b is ascending when a is constant. For example, look at the
three points for which a = 3. The values for b are 4, 4 and 5, whereas in the first shaded area they
are 4, 5 and 4.
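For readers who want to check the sorting results outside Excel, here is a minimal Python sketch of the same steps. It is an illustration, not part of the workbook: the variable names are made up, and it assumes that ties keep their original order (Python's sort is stable), which is consistent with the shaded areas above.

```python
# Each row is (Numb., a, b), copied from the Example Data Set.
rows = [
    (1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 4, 3), (5, 3, 5), (6, 5, 11),
    (7, 8, 6), (8, 6, 10), (9, 5, 7), (10, 3, 4), (11, 2, 5),
]

# Sort by a, Smallest to Largest. Tied rows keep their original order.
by_a = sorted(rows, key=lambda r: r[1])

# Sort by b, Largest to Smallest.
by_b_desc = sorted(rows, key=lambda r: r[2], reverse=True)

# "Add Level": sort by a, then by b to resolve ties.
by_a_then_b = sorted(rows, key=lambda r: (r[1], r[2]))

# Restore the original order by sorting on Numb.
restored = sorted(by_b_desc, key=lambda r: r[0])

print([r[0] for r in by_a])
print([r[2] for r in by_a_then_b if r[1] == 3])
```

The last line prints the b values for the three points with a = 3 after the two-level sort, which is the case discussed above.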
Sheet: Frequencies & Graphs
Counting and Graphing Frequency of Observations
Data may or may not be numerical. The four counting functions illustrated below take this into account:
COUNTA counts all entries, ignoring blanks.
COUNT counts only numbers, excluding blanks.
COUNTBLANK counts the number of blank cells.
COUNTIF counts the number of entries that match a specified condition.
Data (cells A9:B24):
a        d
1        High
2        Low
3        Med
4        Med
3        Med
(blank)  High
5        Low
2        Med
1        Med
2        Low
-        Low
?        Med
1        High
3        Low
2        Low
2        Low

Excel Functions for variable a:
Entries = 15     =COUNTA($A$9:$A$24)
Numbers = 13     =COUNT($A$9:$A$24)
Blanks  = 1      =COUNTBLANK($A$9:$A$24)

a     Freq.
1     3     =COUNTIF($A$9:$A$24 , "=" & E13)
2     5     =COUNTIF($A$9:$A$24 , "=" & E14)
3     3
4     1
5     1
6     0
In COUNTIF, the first argument is the Range to be Counted and the second argument is The Condition.

Excel Functions for variable d:
Entries = 16
Numbers = 0
Blanks  = 0

d     Freq.
Low   7     =COUNTIF($B$9:$B$24,"=" & E25)
Med   6     =COUNTIF($B$9:$B$24,"=" & E26)
High  3     =COUNTIF($B$9:$B$24,"=" & E27)
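The four counting functions can be imitated in a few lines of Python, which may help in seeing what each one counts. This is only a sketch: the helper names mirror the Excel functions but are hypothetical, a blank cell is represented by None, and the entries "-" and "?" are stored as text.

```python
# Column a and column d of the example; None stands for the blank cell.
data_a = [1, 2, 3, 4, 3, None, 5, 2, 1, 2, "-", "?", 1, 3, 2, 2]
data_d = ["High", "Low", "Med", "Med", "Med", "High", "Low", "Med",
          "Med", "Low", "Low", "Med", "High", "Low", "Low", "Low"]

def counta(cells):
    """Count all entries, ignoring blanks (like COUNTA)."""
    return sum(c is not None for c in cells)

def count(cells):
    """Count only numbers (like COUNT)."""
    return sum(isinstance(c, (int, float)) and not isinstance(c, bool)
               for c in cells)

def countblank(cells):
    """Count blank cells (like COUNTBLANK)."""
    return sum(c is None for c in cells)

def countif(cells, value):
    """Count entries equal to value (like COUNTIF with an "=" condition)."""
    return sum(c is not None and c == value for c in cells)

print(counta(data_a), count(data_a), countblank(data_a))
print({v: countif(data_a, v) for v in range(1, 7)})
print({v: countif(data_d, v) for v in ("Low", "Med", "High")})
```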
Graphing Frequencies
Frequencies may be graphed in several ways. We will illustrate two kinds of bar charts and a pie chart.
Standard Bar Chart (Column Chart):
[Figure: column chart titled "Distribution of Shipment Sizes"; horizontal axis: Low, Med, High; vertical axis: Frequency (0 to 10)]
Excel has a Chart Wizard to help you. It works much faster if you select the range that contains your data
before you start making the graph. So begin by selecting the range E24:F27 above. Then,
On the Insert tab in the Charts group, select Column.
Then click on the first picture under 2-D Column
A chart appears, and your next job is to make it look like the figure above.
As long as the chart is selected, you should see Chart Tools at the top of the screen.
On the Layout tab, in the Labels group,
click on Legend and select None
click on Axis Titles, select Primary Vertical Axis Title, Rotated Title
and then type the word Frequency. (This enters the title for the Y axis.)
click on Chart Title, select Above Chart, and type the words Distribution of Shipment Sizes.
Right-click on a blank spot to the right of the chart title, and select Font, and change the font size to 10.
Now you may move your chart and change its size. To move it, just click once on it and drag it to a
new location. To change the size, click once on it and use the "handles" (little black boxes) on the corners.
Horizontal Bar Chart with Stacked Bars:
Select the range that contains your data and labels. This is E24:F27 in the example.
On the Insert tab in the Charts group, select Bar.
Then click on the second picture under 2-D Bar
Important: On the Design tab, in the Data group, click the Switch Row/Column button.
Now you may resize and move the graph as you please.
[Figure: horizontal stacked bar chart of Freq. (axis from 0 to 20) with segments Low, Med and High]
Pie Chart:
Select the range that contains your data and labels, E24:F27.
On the Insert tab in the Charts group, select Pie
Then click on the first picture under 2-D Pie
Click on the title and hit the Delete key.
Click on the legend and hit the Delete key.
On the Layout tab in the Labels group, select Data Labels, More Data Label Options.
In the popup window, un-check all of the options except the following:
Under "Label Contains", select Category Name, Percentage, and Show Leader Lines
Under "Label Position" select Outside End
[Figure: pie chart with outside labels Low 44%, Med 37%, High 19%]
Sheet: Histogram
Histograms
The Histogram tool in the Data Analysis package is a fast way to get a picture and table of the distribution
of your data. An example is shown below. Also shown are the Excel functions that give the same information.
NOTE: The Histogram tool cannot describe more than one variable at a time.
The term Bin refers to the Upper Limit of the range for which the frequency is calculated. Bin 2 in the table
below has frequency 6, because "Data" contains 6 values that are strictly greater than 1 and less-than-or-equal to 2.
Data    Output from Histogram Tool    Excel Functions
        Bin     Frequency             Bin     Freq.   Cumul.
1       1       4                     1       4       4
2       2       6                     2       6       10
3       3       4                     3       4       14
4       4       1                     4       1       15
3       More    1                     1000    1       16
1
5
2
1
2
3
2
1
3
2
2
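The binning rule described above (each bin is an upper limit, and a value belongs to the first bin that is greater than or equal to it) can be sketched in Python as follows. The function name and layout are illustrative assumptions, not something taken from Excel.

```python
from bisect import bisect_left
from itertools import accumulate

# The "Data" column from the example.
data = [1, 2, 3, 4, 3, 1, 5, 2, 1, 2, 3, 2, 1, 3, 2, 2]

def histogram(values, bins):
    """Return one count per bin (upper limits, sorted ascending),
    followed by a final count for "More"."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left finds the first bin whose upper limit is >= v;
        # values above every bin fall into the last slot ("More").
        counts[bisect_left(bins, v)] += 1
    return counts

freq = histogram(data, [1, 2, 3, 4])
cumul = list(accumulate(freq))
print(freq)
print(cumul)
```

Passing a different list of upper limits, for example [2, 4, 6], gives the frequencies for bins that you choose yourself.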
Instructions for the Histogram Data Analysis Tool.
On the Data tab, in the Analysis group, select Data Analysis
Select Histogram (double-click on it, or select OK)
Select the Input Range window, and either type or select the area that contains the data.
On the first try, leave Bin Range blank. Later you may wish to customize the histogram by putting a range
into this box (an example is given later).
If your area includes names for the variables, select the Labels check box.
If you want the results to be written on the current worksheet, select the Output Range button, then click
on the window next to that button and either type in or select a location for the output.
For example, if you type D8, the output will begin at cell D8 and continue down and to the right.
Check Chart Output if you want Excel to create a graph.
Click OK
UNFORTUNATE NOTE: At the time of this writing, there is an unfixed bug in Excel. If you try to move the
chart created by Histogram, it will separate into two pieces. However, if you save the file, then close it, and then
open it again, the chart will remain in one piece. So I recommend doing that now.
[Figure: histogram chart; horizontal axis: Bin (1, 2, 3, 4, More); vertical axis: Frequency (0 to 7)]
Improving the appearance of the histogram (after saving, closing and reopening the file):
The chart above, created by the Histogram tool, has been modified to look better.
First, I changed what was displayed inside the chart:
Delete the title (right-click on it and select Delete).
Delete the legend (same way).
Center the plot area in the chart (click above one of the bars and drag).
Next, I changed its shape:
Single-click on the chart and drag one of the "handles" (little boxes in the corners).
Formatting the numbers on the axes:
Sometimes the histogram tool creates bins with many more decimal places than is necessary. This
has an unfortunate effect on the appearance of the horizontal axis, but it is easy to fix.
Since the problem did not occur in the example above, we first have to create the problem and then fix it.
To create the problem:
Select cell D9 and enter the formula = 1/3
Now look at the graph. Notice how the display has changed on the horizontal axis. Not very pretty, is it?
To fix the problem:
The format of the numbers on the chart is the same as their format on the spreadsheet. Therefore,
Select the range of numbers below the word "Bin" (cells D8:D13 in the example above).
On the Home tab, in the Number group, in the pull-down list, select More Number Formats
In the popup window, select Number from the list of options and change the decimal places to 2.
Notice that the numbers in the graph are now displayed with 2 decimal places, which looks better.
You can use this method for any axis on any Excel graph, displaying however many decimal places
are appropriate for the situation.
Using Bins that You Choose
To tell Excel what bins you want to use for the data, put the range that contains them in the Bin Range box.
Notice that I had to include one cell above the desired range of bins, because the "Labels" box is checked.
                 Output from Histogram Tool
Desired Bins     Desired Bins   Frequency
2                2              10
4                4              5
6                6              1
                 More           0
Sheet: Scatter Plot
Scatter Diagrams (Scatter Plots)
Scatter Plots offer a way to visualize the relationship between two variables. Excel's Chart Group
makes it fairly easy to construct one. An example is shown below.
Example Data Set:
a y b
1 33 2
2 23 3
3 14 4
4 55 3
3 3 5
5 44 11
8 35 6
6 98 10
5 41 7
3 77 4
2 8 5
Instructions for Scatter Plots
Follow these steps to reproduce the chart above. Notice that it plots a and b, but that in the data, variable
y is in the column between a and b.
Begin by selecting the data range. Click on cell A6 and drag to cell A17; then, holding down the Ctrl key,
click on C6 and drag to C17. This selects both a and b, leaving out y.
On the Insert tab, in the Charts group, select Scatter, and click on the first picture.
Right-click on the legend (on the right side of the chart) and select Delete
Click on the chart title and type "Plot of a vs. b"
Right-click on the title, select Font and change the font size to 12.
On the Layout tab, use the Axis Titles button to insert titles for both axes as shown.
On the Layout tab, use the Gridlines button to insert gridlines as shown.
Now you may move your chart and change its size. To move it, just click once on it and drag it to a
new location. To change the size, click once on it and use the "handles" (little black boxes).
[Figure: scatter plot titled "Plot of a vs. b"; horizontal axis: a (0 to 10); vertical axis: b (0 to 12)]
Sheet: Descriptive Stats
Descriptive Statistics
The Descriptive Statistics tool in the Data Analysis package is a fast way to get a bunch of numbers that
describe your data. An example is shown below, together with the built-in Excel functions that give the
same information. Copy the Excel Functions to the next column to get a description of variable b.
Example Data Set
a    b
1    2
2    3
3    4
4    3
3    5
5    11
8    6
6    10
5    7
3    4
2    5

                           Output from Descriptive    Excel Functions:
                           Statistics Tool (a)        (a)
Mean                       3.818182                   3.818182
Standard Error             0.615234                   0.615234
Median                     3                          3
Mode                       3                          3
Standard Deviation         2.040499                   2.040499
Sample Variance            4.163636                   4.163636
Kurtosis                   0.260801                   0.260801
Skewness                   0.730477                   0.730477
Range                      7                          7
Minimum                    1                          1
Maximum                    8                          8
Sum                        42                         42
Count                      11                         11
Largest(2)                 6                          6
Smallest(2)                2                          2
Confidence Level(95.0%)    1.370826                   1.370826
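As a cross-check on the table, most of these statistics can be reproduced with Python's standard statistics module. This is a sketch under stated assumptions: the dictionary keys are simply the labels from the table, and Kurtosis, Skewness and the Confidence Level are left out because the standard library has no direct functions for them.

```python
import math
import statistics

# Variable a from the Example Data Set.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]

n = len(a)
ordered = sorted(a)
summary = {
    "Mean": statistics.mean(a),
    # Standard error of the mean = sample standard deviation / sqrt(n).
    "Standard Error": statistics.stdev(a) / math.sqrt(n),
    "Median": statistics.median(a),
    "Mode": statistics.mode(a),
    "Standard Deviation": statistics.stdev(a),   # divides by n-1
    "Sample Variance": statistics.variance(a),   # divides by n-1
    "Range": max(a) - min(a),
    "Minimum": min(a),
    "Maximum": max(a),
    "Sum": sum(a),
    "Count": n,
    "Largest(2)": ordered[-2],
    "Smallest(2)": ordered[1],
}

for name, value in summary.items():
    print(f"{name:20s} {value}")
```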
Instructions for the Descriptive Statistics Data Analysis Tool.
On the Data tab, in the Analysis group, select Data Analysis
Select Descriptive Statistics (double-click on it, or select OK)
Select the Input Range window, and either type or select the area that contains the data.
If your data is arranged so that each vertical column represents a variable, select the Columns button.
If your input range includes names for the variables, select the Labels in first row check box.
If you want the results to be written on the current worksheet, select the Output Range button,
click on the window next to that button and either type in or select a location for the output.
(If you type E6, the output will begin at cell E6 and continue down and to the right.)
Most Important: Check the Summary Statistics box.
Checking the Confidence Level for the Mean box gives a "Confidence Level" in the output, which is equal to
half of the width of a confidence interval.
Kth Largest or Kth Smallest: Checking the boxes and entering "2" as shown causes the output to include
the second smallest and second largest values in the data set.
Click OK
Sheet: Rank & Percentile
Rank and Percentile
The Rank and Percentile tool in the Data Analysis package is a fast way to get a copy of your data,
sorted from largest to smallest, with the associated ranks. An example is shown below.
The tool sorts the data in descending order before it reports ranks and percent ranks. Therefore the output table
shows the data in a different order. For example, the first point in the input is seventh in the output table.
The two related Excel functions, RANK() and PERCENTRANK(), are shown for the first 3 data points. The
first point in the Data is a = 2, and this value is tied for 7th rank. That puts it at the 26.6 percentile
of the data. The second data point is a = 4, which is in sole possession of rank 2, percentile 93.3.
Data           Related Excel Functions    Output from Rank and Percentile Tool
Point   a      RANK    PERCENTRANK        Point   a    Rank   Percent
1       2      7       26.60%             7       5    1      100.00%
2       4      2       93.30%             2       4    2      93.30%
3       3      3       66.60%             3       3    3      66.60%
4       1                                 5       3    3      66.60%
5       3                                 11      3    3      66.60%
6       1                                 14      3    3      66.60%
7       5                                 1       2    7      26.60%
8       2                                 8       2    7      26.60%
9       1                                 10      2    7      26.60%
10      2                                 12      2    7      26.60%
11      3                                 15      2    7      26.60%
12      2                                 16      2    7      26.60%
13      1                                 4       1    13     0.00%
14      3                                 6       1    13     0.00%
15      2                                 9       1    13     0.00%
16      2                                 13      1    13     0.00%
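The ranks and percent ranks in the table can be reproduced with a short Python sketch. The function names are hypothetical, and two assumptions are made to match the values shown: the percent rank is the share of the other data points that lie below the value, and it is truncated (not rounded) to three decimal places. The sketch only handles values that actually occur in the data.

```python
import math

# Variable a, in the original order of the 16 data points.
a = [2, 4, 3, 1, 3, 1, 5, 2, 1, 2, 3, 2, 1, 3, 2, 2]

def rank(value, values):
    """Descending rank: 1 + the number of entries larger than value."""
    return 1 + sum(v > value for v in values)

def percent_rank(value, values, digits=3):
    """Share of the other entries below value, truncated to `digits` places."""
    below = sum(v < value for v in values)
    raw = below / (len(values) - 1)
    scale = 10 ** digits
    return math.floor(raw * scale) / scale

# Imitate the tool's output: (point number, value), largest value first.
# The sort is stable, so tied values stay in point order.
tool_output = sorted(enumerate(a, start=1), key=lambda p: p[1], reverse=True)

for point, value in tool_output:
    print(point, value, rank(value, a), f"{percent_rank(value, a):.2%}")
```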
Instructions for the Rank and Percentile Data Analysis Tool.
On the Data tab, in the Analysis group, select Data Analysis
Select Rank and Percentile (double-click on it, or select OK)
Select the Input Range window, and either type or select the area that contains the data.
If your data is arranged so that each vertical column represents a variable, select the Columns button.
If your area includes names for the variables, select the Labels check box.
If you want the results to be written on the current worksheet, select the Output Range button, then click
on the window next to that button and either type in or select a location for the output.
Make sure that the Output Range does not overlap with the Input Range.
Click OK
The related Excel functions for the first data point are:
=RANK($B11,$B$11:$B$26)
=PERCENTRANK($B$11:$B$26,$B11)
Sheet: Covariance
Sample Covariance
Covariance measures the degree to which things "vary together". In that regard it is almost the
same as correlation (see the next page). In fact, correlation is more useful for quantifying the
relationship between two variables. The most common use of Covariance is when you are adding
two random variables, such as when you are forming a portfolio of different stocks.
Unfortunately, Excel does not offer an "unbiased" sample estimate of covariance. This is an error that should
have been remedied long ago, but Microsoft has not seen fit to fix it. To understand the problem, consider
Excel's variance function. There are two versions: Sample Variance VAR() and Population Variance VARP().
Both of these compute the sum of squared differences from the sample mean. However, Sample Variance
corrects for a statistical bias by dividing that sum by (n-1), where n is the size of the sample. Population Variance
divides by n, and therefore gives a smaller answer. Population Variance is correct ONLY IF the sample
is, in fact, the entire population. Sample Variance is appropriate when the sample is a small fraction of the
population, which is the more usual case.
To be consistent, Excel should have called its covariance function COVP() or the Population Covariance,
and should change the definition of COV() to Sample Covariance and calculate it using (n-1).
Until they make such a change, you can obtain unbiased estimates of covariance by multiplying Excel's
values by the ratio n/(n-1). The Sample Covariance table in the example below does this correction.
Finally, please note that the diagonal values in the covariance table are variances. Thus, 1.25 is the Population
Variance of a and 1.66667 is the Sample Variance of a.
Example Data Set with n = 4      Covariance Data Analysis Tool
a    b    c                           a        b        c
1    2    5                      a    1.25
2    3    4                      b    1        1.25
3    5    2                      c    -1       -1.25    1.25
4    4    3

Covariance Excel Function (Population Covariance)
     a        b        c
a    1.25
b    1        1.25
c    -1       -1.25    1.25

Sample Covariance: Excel Function multiplied by n/(n-1)
     a            b            c
a    1.6666667
b    1.3333333    1.6666667
c    -1.333333    -1.666667    1.6666667
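The difference between the two covariance definitions, and the n/(n-1) correction, can be checked with a small Python sketch. The function names here are made up for illustration; the population version divides by n and the sample version is obtained by applying the correction described above.

```python
# Example Data Set with n = 4.
a = [1, 2, 3, 4]
b = [2, 3, 5, 4]
c = [5, 4, 2, 3]

def population_covariance(x, y):
    """Average product of deviations from the means (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

def sample_covariance(x, y):
    """Unbiased estimate: the population value multiplied by n/(n-1)."""
    n = len(x)
    return population_covariance(x, y) * n / (n - 1)

for name, (x, y) in {"a,a": (a, a), "a,b": (a, b),
                     "a,c": (a, c), "b,c": (b, c)}.items():
    print(name, population_covariance(x, y), round(sample_covariance(x, y), 6))
```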
Instructions for the Covariance Data Analysis Tool.
On the Data tab, in the Analysis group, select Data Analysis
Select Covariance (double-click on it, or select OK)
Select the Input Range window, and either type or select the area that contains the data.
If your data is arranged so that each vertical column represents a variable, select the Columns button.
Otherwise, select the Rows button.
If your area includes names for the variables, select the Labels in first row check box.
If you want the results to be written on the current worksheet, select the Output Range button, then click
on the window next to that button and either type in or select a location for the output.
For example, if you type F15, the output will begin at cell F15 and continue down and to the right.
Make sure that the Output Range does not overlap with the Input Range.
Click OK
Remember, if you want Unbiased estimates, multiply Excel's Covariance by n/(n-1).
Sheet: Correlation
Sample Correlation
Correlation is a way to quantify a linear relationship between variables. The value of correlation
is between -1 and +1. Positive correlation means that the variables tend to move in the same
direction. That is, if one variable is above its mean, the other one is likely to be above its mean, too.
Height and weight of people are positively correlated, because very tall people usually weigh more
than very short people. Note that this is not always true, so the correlation is less than +1.0.
Negative correlation means that they tend to move in opposite directions. Mountain climbers know
that there is a negative correlation between altitude and stamina, because of decreasing oxygen.
Correlation of +1 or -1 means that the relationship between the two variables is perfectly linear.
When this happens, a "scatter plot" of the two variables yields a straight line. In the example below,
variables b and c have correlation of -1.
Example Data Set:        Correlation Data Analysis Tool:
a    b    c                   a       b       c
1    2    5              a    1
2    3    4              b    0.8     1
3    5    2              c    -0.8    -1      1
4    4    3

Correlation Excel Function:
     a       b       c
a    1
b    0.8     1
c    -0.8    -1      1
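The correlation values in the tables can be verified with the usual Pearson formula. The sketch below is illustrative only; the function name is an assumption and the data are the three example variables.

```python
import math

a = [1, 2, 3, 4]
b = [2, 3, 5, 4]
c = [5, 4, 2, 3]

def correlation(x, y):
    """Pearson correlation: sum of cross-products of deviations divided by
    the square root of the product of the sums of squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

print(correlation(a, b))   # a and b tend to move together
print(correlation(a, c))   # a and c tend to move in opposite directions
print(correlation(b, c))   # b and c are perfectly (negatively) linear
```

Note that c equals 7 minus b for every data point, which is why the correlation of b and c is exactly -1.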
Instructions for the Correlation Data Analysis Tool.
On the Data tab, in the Analysis group, select Data Analysis
Select Correlation (double-click on it, or select OK)
Select the Input Range window, and either type or select the area that contains the data.
If your data is arranged so that each vertical column represents a variable, select the Columns button.
Otherwise, select the Rows button.
If your area includes names for the variables, select the Labels check box.
If you want the results to be written on the current worksheet, select the Output Range button, then click
on the window next to that button and either type in or select a location for the output.
Make sure that the Output Range does not overlap with the Input Range.
Click OK
[Figures: two scatter plots, b vs. a and c vs. b; all axes run from 0 to 6]
Sheet: Sampling
Random Sampling
Example Data Set:              Example After Sorting:
FUND              RandNo       FUND              RandNo
Benchmarrk Div    0.999969     Freedom Cash      0.078951
Bradford          0.172857     Capital Cash      0.082888
BT INstit Treas   0.263466     Fortis            0.110691
Capital Cash      0.082888     Flex-fund         0.119541
Fidelity Cash     0.275826     Nationwide        0.165838
Flex-fund         0.119541     Bradford          0.172857
Fortis            0.110691     MarketWatch       0.183844
Freedom Cash      0.078951     Piermont Money    0.220191
Galaxy Money      0.291818     BT INstit Treas   0.263466
MarketWatch       0.183844     Fidelity Cash     0.275826
Nationwide        0.165838     NCC Funds         0.27604
NCC Funds         0.27604      Galaxy Money      0.291818
Piermont Money    0.220191     Benchmarrk Div    0.999969
To select a random sample of size n,
Put random numbers into the column next to the data set (instructions given below).
Select the first random number and then go to the Data tab and press the Sort Smallest to Largest (A to Z) button.
(See alternative instructions on worksheet Sorting.)
Your sample is the first n rows.

Here is how to put random numbers into cells B17:B29:
Tab: Data, Analysis group, Data Analysis, Random Number Generation
Number of Variables: (leave blank)
Number of Random Numbers: (leave blank)
Distribution: Uniform
Parameters Between: 0 and 1
Output Range: B17:B29

For a Sample of 8, choose the first 8 after sorting on the Random Numbers.
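The same idea (attach a random number to each item, sort on the random numbers, keep the first n rows) can be sketched in Python. The function name and its parameters are hypothetical; the optional rand_numbers argument is only there so that the worksheet's own random numbers can be replayed.

```python
import random

funds = ["Benchmarrk Div", "Bradford", "BT INstit Treas", "Capital Cash",
         "Fidelity Cash", "Flex-fund", "Fortis", "Freedom Cash",
         "Galaxy Money", "MarketWatch", "Nationwide", "NCC Funds",
         "Piermont Money"]

def random_sample(items, n, rand_numbers=None, seed=None):
    """Pair each item with a Uniform(0, 1) number, sort by that number,
    and return the first n items."""
    if rand_numbers is None:
        rng = random.Random(seed)
        rand_numbers = [rng.random() for _ in items]
    paired = sorted(zip(rand_numbers, items))
    return [item for _, item in paired[:n]]

# Replay the random numbers shown in the Example Data Set.
sheet_numbers = [0.999969, 0.172857, 0.263466, 0.082888, 0.275826,
                 0.119541, 0.110691, 0.078951, 0.291818, 0.183844,
                 0.165838, 0.27604, 0.220191]
sample_of_8 = random_sample(funds, 8, rand_numbers=sheet_numbers)
print(sample_of_8)

# A fresh sample with newly generated random numbers.
print(random_sample(funds, 8))
```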
Sheet: Confidence Intervals
Confidence Intervals
There are two ways to do confidence intervals: use built-in Excel functions, or use information from
the Descriptive Statistics tool in the Data Analysis package. They are both illustrated below.
Confidence Intervals from the Descriptive Statistics Data Analysis Tool.
First, generate the descriptive statistics (see the Descriptive Stats sheet in this workbook):
On the Data tab, in the Analysis group, select Data Analysis, Descriptive Statistics
Select your data range,
Check the Confidence Level for the Mean box and enter your desired confidence level in the box,
Check the Summary Statistics box.
Click OK. You should get the output shown below.
Example Data Set
a    b
1    2
2    3
3    4
4    3
3    5
5    11
8    6
6    10
5    7
3    4
2    5

Output from Descriptive Statistics Tool
                            a
Mean                        3.818182
Standard Error              0.615234
Median                      3
Mode                        3
Standard Deviation          2.040499
Sample Variance             4.163636
Kurtosis                    0.260801
Skewness                    0.730477
Range                       7
Minimum                     1
Maximum                     8
Sum                         42
Count                       11
Confidence Level (95.0%)    1.370826
Then, to get the confidence interval, add and subtract the "Confidence Level" from the "Mean".
Calculations: Lower Confidence Limit: 2.447356 = E16 - E29
Upper Confidence Limit: 5.189008 = E16 + E29
Interpretation: We have 95% confidence that the population mean for variable a
is in the interval 2.447 to 5.189.
Confidence Intervals using Built-in Excel Functions.
                                                      Cell   Value      Formula
Basic Calculations:         Average:                  E39    3.818182   =AVERAGE(A15:A25)
                            Standard Deviation:       E40    2.040499   =STDEV(A15:A25)
                            Sample Size, n:           E41    11         =COUNT(A15:A25)
Probability Calculations:   Confidence:               E42    0.95
                            Student's t (2-tail):     E43    2.228139   =TINV(1-E42,E41-1)
The Confidence Interval:    Lower Confidence Limit:          2.447356   =E39-E43*E40/SQRT(E41)
                            Upper Confidence Limit:          5.189008   =E39+E43*E40/SQRT(E41)
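The calculation with built-in functions translates directly into a few lines of Python. This is a sketch, not a replacement for the worksheet: Python's standard library has no inverse Student's t function, so the critical value 2.228139 is simply copied from the TINV result above and is therefore only valid for 95% confidence with 10 degrees of freedom.

```python
import math
import statistics

# Variable a from the Example Data Set.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]

# Copied from the worksheet: =TINV(1-0.95, 11-1). Assumed, not computed here.
T_CRIT_95_DF10 = 2.228139

def confidence_interval(values, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * stdev / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    half_width = t_crit * std_err   # the tool's "Confidence Level"
    return mean - half_width, mean + half_width

lower, upper = confidence_interval(a, T_CRIT_95_DF10)
print(f"95% confidence interval: {lower:.6f} to {upper:.6f}")
```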
Sheet: One-Sample t-tests
One-Sample t-Test
The easiest way to do a One-Sample t-Test in Excel is to use a Confidence Interval. However, this method
does not give a p-value directly. The second method is to construct the test statistic and compare it to a
critical value. The test statistic can be used to compute a p-value. Both methods are illustrated below.
One-Sample t-Tests using Confidence Intervals
Two-tail test: set up a (1 - α) confidence interval (see the sheet Confidence Intervals for instructions)
and reject H0 (the Null Hypothesis) if the value specified in H0 is outside the confidence interval.
One-tail test: use a confidence level of (1 - 2α) and reject H0 if the entire confidence interval lies on the
side of the value specified in H0 that is predicted by the Alternative Hypothesis, Ha (for example, entirely
above that value when Ha says the mean is larger).
Example Data (variable a): 1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2

Calculation of the Confidence Intervals:
                           Two-tail test: For α = 0.05,         One-tail test: For α = 0.05,
                           set up a 95% confidence interval:    set up a 90% confidence interval:
Average:                   3.8181818                            3.8181818
Standard Deviation:        2.040499                             2.040499
Sample Size, n:            11                                   11
Confidence:                0.95                                 0.90
Student's t (2-tail):      2.2281389                            1.8124611
Lower Confidence Limit:    2.4473559                            2.7030948
Upper Confidence Limit:    5.1890077                            4.9332688
Hypothesis Tests using Confidence Intervals:
In the following examples, assume that 4.4 has been given as the value to use in the null hypothesis.
(This is the value often referred to as μ0. Thus, μ0 = 4.4 in the examples.)
Two-tail test:
Example 1: H0: μ = 4.4, Ha: μ ≠ 4.4 (not equal)
Reject H0 if 4.4 is outside the confidence interval.
Result: 4.4 is between 2.447 and 5.189,
so the null hypothesis is NOT rejected.
One-tail test:
Example 2: H0: μ ≤ 4.4, Ha: μ > 4.4
Reject H0 if 4.4 is below the lower confidence limit.
Result: 4.4 is not below 2.703,
so the null hypothesis is NOT rejected.
Example 3: H0: μ ≥ 4.4, Ha: μ < 4.4
Reject H0 if 4.4 is above the upper confidence limit.
Result: 4.4 is not above 4.933,
so the null hypothesis is NOT rejected.
Note: If you choose to do a one-tail test, you must do one or the other of these, NEVER BOTH.
One-Sample t-Tests using the Test Statistic
The test statistic is (sample average - μ0)/(standard error). The critical level is the value from the
Student's t distribution. There are two ways to test the hypothesis (they give the same result):
Hypothesis test using the Test Statistic:
For 2 tails, reject H0 if the test statistic is larger in absolute value than the critical level.
For 1 tail, reject H0 if the test statistic is larger than the critical level in the direction predicted by Ha.
Hypothesis test using the P-values:
For 2 tails, reject H0 if the p-value is smaller than α.
For 1 tail, reject H0 if the p-value is smaller than α and the direction is consistent with Ha.
Basic Calculations:
Average:                       3.8181818
Standard Deviation:            2.040499
Sample Size, n:                11
Probability Calculations:
Hypothesized Value, μ0:        4.4
α:                             0.05
t-ratio, or Test Statistic:    -0.945687
p-value, one-tail:             0.183299
t Critical one-tail:           1.8124611
p-value, two-tail:             0.3665981
t Critical two-tail:           2.2281389
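The t-ratio and the two p-values can be reproduced in Python. Because the standard library has no Student's t distribution, this sketch integrates the t density numerically with Simpson's rule; that choice, and the helper names t_pdf and t_upper_tail, are assumptions made for illustration and are not how Excel computes the values.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): 0.5 minus the area from 0 to |t| (Simpson's rule)."""
    t = abs(t)
    h = t / steps                      # steps must be even
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
mu0 = 4.4                              # hypothesized value

n = len(a)
std_err = statistics.stdev(a) / math.sqrt(n)
t_ratio = (statistics.mean(a) - mu0) / std_err
p_one_tail = t_upper_tail(t_ratio, n - 1)
p_two_tail = 2 * p_one_tail

print(f"t-ratio: {t_ratio:.6f}")
print(f"p-value, one-tail: {p_one_tail:.6f}")
print(f"p-value, two-tail: {p_two_tail:.6f}")
```

The one-tail p-value computed this way ignores direction, so it still has to be combined with a check that the sample average falls on the side predicted by Ha.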
Tests using the Test Statistic (t-ratio):
Two-tail test:
Example: H0: μ = 4.4, Ha: μ ≠ 4.4 (not equal)
Result: absolute value of t-ratio of 0.94569 is
smaller than the two-tail critical value of 2.22814,
so the null hypothesis is NOT rejected.
One-tail test:
Example: H0: μ ≤ 4.4, Ha: μ > 4.4
Result: sample average of 3.818 is below 4.4,
which IS NOT consistent with Ha,
so the null hypothesis is NOT rejected.
Example: H0: μ ≥ 4.4, Ha: μ < 4.4
Result: sample average of 3.818 is below 4.4,
which IS consistent with Ha,
but the absolute value of the t-ratio of 0.94569 is
smaller than the one-tail critical value of 1.81246,
so the null hypothesis is NOT rejected.
Same Tests, using the p-value
Two-tail test:
Example: H0: μ = 4.4, Ha: μ ≠ 4.4 (not equal)
Result: p-value of 0.3666 is larger than 0.05,
so the null hypothesis is NOT rejected.
One-tail test:
Example: H0: μ ≤ 4.4, Ha: μ > 4.4
Result: sample average of 3.818 is below 4.4,
which IS NOT consistent with Ha,
so the null hypothesis is NOT rejected.
Example: H0: μ ≥ 4.4, Ha: μ < 4.4
Result: sample average of 3.818 is below 4.4,
which IS consistent with Ha,
but the p-value of 0.1833 is larger than 0.05,
so the null hypothesis is NOT rejected.
Note: Test Statistic and P-value ALWAYS GIVE THE SAME RESULT.
Sheet: Two-Sample t-tests
Two-Sample t-Tests.
There are three t-Tests in the Excel Data Analysis Tools, and each has a corresponding built-in function.
Data Analysis Tool: Excel Spreadsheet Formula:
t-Test: Paired Two-Sample for Means = TTEST(Array1, Array2, Tails, 1)
t-Test: Two-Sample Assuming Equal Variances = TTEST(Array1, Array2, Tails, 2)
t-Test: Two-Sample Assuming Unequal Variances = TTEST(Array1, Array2, Tails, 3)
The formulas above give only the p-value for the test. The Data Analysis Tools give the complete analysis.
t-Tests using the built-in function TTEST(array1, array2, tails, type)
Example Data Set
a    b
1    2
2    3
3    4
4    3
3    5
5    11
8    6
6    10
5    7
3    4
2    5

                           Paired      Equal σ     Unequal σ
Hypothesized Difference    0           0           0
p-value, one-tail:         0.016746    0.069742    0.0705883
p-value, two-tail:         0.033492    0.139485    0.1411765

TTEST(Array1, Array2, Tails, Type)
Array1 is the first data set.
Array2 is the second data set.
Tails = 1 for a one-tail test, or 2 for a two-tail test
Type = 1 for a Paired Two-sample test
Type = 2 for a Two-sample test assuming Equal variance
Type = 3 for a Two-sample test assuming Unequal variance
Example: = TTEST( $A$14:$A$24, $B$14:$B$24, 2, 1)
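To see what lies behind the Paired and Equal-variance p-values, here is a Python sketch that computes the t statistics from their textbook formulas and then the one-tail p-values. It is an illustration under assumptions: the function names are invented, the p-values come from numerical integration of the t density (Simpson's rule) because the standard library has no t distribution, and the Unequal-variance case is not covered.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|) using Simpson's rule on the interval [0, |t|]."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def paired_t(x, y, diff=0.0):
    """t statistic and df for the paired test on the differences x - y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    t = (statistics.mean(d) - diff) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

def pooled_t(x, y, diff=0.0):
    """t statistic and df for the two-sample test assuming equal variances."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y) - diff) / std_err
    return t, n1 + n2 - 2

a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
b = [2, 3, 4, 3, 5, 11, 6, 10, 7, 4, 5]

for label, (t, df) in {"Paired": paired_t(a, b),
                       "Equal variances": pooled_t(a, b)}.items():
    p1 = t_upper_tail(t, df)
    print(f"{label}: t = {t:.5f}, df = {df}, "
          f"p one-tail = {p1:.6f}, p two-tail = {2 * p1:.6f}")
```

Passing a nonzero diff, for example pooled_t(b, a, diff=1.2), corresponds to entering a value in the Hypothesized Mean Difference box of the Data Analysis tool.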
t-Tests using the t-Test Data Analysis tools
At this point you should be familiar with how to use the input boxes, so here is a brief list of the steps.
Tab: Data, group: Analysis, Data Analysis, t-Test: Two-Sample Assuming Equal Variances
Put the addresses of the two variables in their respective Input Range boxes.
In the Hypothesized Mean Difference box,
If your "null hypothesis" is that the two population means are equal, leave the box blank.
If your "null hypothesis" is that the two population means are different by a specified amount:
(In this case, the variable hypothesized to have the larger mean MUST BE "Variable 1". For example,
if H0 is that b has a larger mean than a, then Variable 1 Input Range must contain variable b.)
Then, type the hypothesized difference in the Hypothesized Mean Difference box.
For example, if the null hypothesis states that Variable 1's population mean is 7.4 units
larger than Variable 2's, enter 7.4 in the Hypothesized Mean Difference box.
If your Variable Ranges include a name for each variable, Check the Labels box.
The Alpha box is where you enter the type I error probability. (Excel's output does not report this
value, so be sure to note what value you used.)
Enter your Output Options in the usual way, and click OK.
Examples of each of the Tools are given below.
In the workbook, select a result cell to see its TTEST formula. The arguments of the example formula above are, in order: Array 1, Array 2, 2 tails, Paired Two-Sample test.
t-Test: Paired Two Sample for Means, Hypothesized Diff. = 0
                                         a           b           Built-in function TTEST
Mean                                     3.818182    5.454545
Variance                                 4.163636    8.272727
Observations                             11          11
Pearson Correlation                      0.645926
Hypothesized Mean Difference             0
df                                       10
t Stat                                   -2.46321
P(T<=t) one-tail (p-value, one-tail)     0.016746                0.016746
t Critical one-tail                      1.812462
P(T<=t) two-tail (p-value, two-tail)     0.033492                0.033492
t Critical two-tail                      2.228139
t-Test: Two-Sample Assuming Equal Variances, Hypothesized Diff. = 0
                                         a           b           Built-in function TTEST
Mean                                     3.818182    5.454545
Variance                                 4.163636    8.272727
Observations                             11          11
Pooled Variance                          6.218182
Hypothesized Mean Difference             0
df                                       20
t Stat                                   -1.53897
P(T<=t) one-tail (p-value, one-tail)     0.069742                0.069742
t Critical one-tail                      1.724718
P(T<=t) two-tail (p-value, two-tail)     0.139485                0.139485
t Critical two-tail                      2.085962
t-Test: Two-Sample Assuming Equal Variances, Hypothesized Diff. = 1.2
                                         b           a           Built-in function TTEST
Mean                                     5.454545    3.818182
Variance                                 8.272727    4.163636
Observations                             11          11
Pooled Variance                          6.218182
Hypothesized Mean Difference             1.2
df                                       20
t Stat                                   0.410391
P(T<=t) one-tail (p-value, one-tail)     0.342941                0.342941
t Critical one-tail                      1.724718
P(T<=t) two-tail (p-value, two-tail)     0.685882                0.685882
t Critical two-tail                      2.085962
Note: In each table, "t Stat" is the t-ratio, or Test Statistic.
Note on the Hypothesized Diff. = 1.2 table: Variable a has a lower Mean than variable b, so I had to input b as the first variable. If I had not done this, the output would be incorrect.
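As a rough check on the Equal Variances test with a Hypothesized Difference of 1.2, the following Python sketch recomputes the pooled variance, the degrees of freedom, and the t Stat from the Example Data Set. It assumes the usual pooled-variance formula; the variable names are illustrative only and do not come from Excel.

```python
import math
from statistics import mean, variance

# Example Data Set; b is treated as "Variable 1" because it has the larger mean
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
b = [2, 3, 4, 3, 5, 11, 6, 10, 7, 4, 5]
hypothesized_diff = 1.2  # H0: mean of b exceeds mean of a by 1.2

n1, n2 = len(b), len(a)
df = n1 + n2 - 2
# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * variance(b) + (n2 - 1) * variance(a)) / df
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
# Subtract the hypothesized difference before dividing by the standard error
t_stat = (mean(b) - mean(a) - hypothesized_diff) / std_err

print(df, round(pooled_var, 6), round(t_stat, 6))
```

The three printed values correspond to the df, Pooled Variance, and t Stat rows of the tool's output.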
t-Test: Two-Sample Assuming Unequal Variances, Hypothesized Diff. = 0
                                         a           b           Built-in function TTEST
Mean                                     3.818182    5.454545
Variance                                 4.163636    8.272727
Observations                             11          11
Hypothesized Mean Difference             0
df                                       18
t Stat                                   -1.53897
P(T<=t) one-tail (p-value, one-tail)     0.070603                0.070588
t Critical one-tail                      1.734063
P(T<=t) two-tail (p-value, two-tail)     0.141207                0.141177
t Critical two-tail                      2.100924
Note: "t Stat" is the t-ratio, or Test Statistic.
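In the Unequal Variances case the tool and TTEST give slightly different p-values. A likely explanation, as far as I understand Microsoft's documentation, is that the Data Analysis tool rounds the degrees of freedom to the nearest integer while TTEST uses the unrounded value. The Python sketch below computes the unrounded degrees of freedom with the Welch-Satterthwaite formula; the variable names are illustrative.

```python
from statistics import variance

# Example Data Set from the worksheet
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
b = [2, 3, 4, 3, 5, 11, 6, 10, 7, 4, 5]

# Variance of each sample mean
v1 = variance(a) / len(a)
v2 = variance(b) / len(b)

# Welch-Satterthwaite approximation to the degrees of freedom
df_exact = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
df_rounded = round(df_exact)  # the value shown in the tool's "df" row

print(round(df_exact, 4), df_rounded)
```

The unrounded value is only slightly above 18, which is consistent with the small gap between the two columns of p-values.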
Regression
Regression is a method to fit a linear function to a data set.
The objective is to estimate values of b0, b1 and b2 in the following equation:
y = b0 + b1 x1 + b2 x2
In this equation, y is called the Dependent Variable (sometimes called the Criterion Variable),
x1 and x2 are called the Independent Variables (or Predictor Variables),
b0 is called the "intercept", and b1 and b2 are the "slopes".
Collectively, b0, b1 and b2 are referred to as the Coefficients. (This is their label in the output.)
The results for the Example Data Set are shown below the instructions.
Example Data Set:
y x1 x2
-3 2 5
2 3 4
11 5 2
9 6 4
8 4 3
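To make the fitting step concrete, here is a minimal Python sketch that estimates b0, b1 and b2 for the Example Data Set by solving the least-squares normal equations. This is one standard way to fit the equation and is not necessarily how Excel does it internally; the helper solve is hypothetical and written only for this illustration.

```python
# Example Data Set from the worksheet
y = [-3, 2, 11, 9, 8]
x1 = [2, 3, 5, 6, 4]
x2 = [5, 4, 2, 4, 3]

def solve(A, rhs):
    """Solve A z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Design matrix: the leading column of 1s produces the intercept b0
X = [[1.0, u, v] for u, v in zip(x1, x2)]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]

# Normal equations: (X'X) b = X'y
b0, b1, b2 = solve(XtX, Xty)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

The three printed numbers can be compared with the Coefficients column of the Regression output.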
Instructions for the Regression Data Analysis Tool.
On the Data tab, in the Analysis group, select Data Analysis.
Select Regression (double-click on it, or select OK).
Select the Input Y Range window, and select the area that contains the Dependent Variable.
Select the Input X Range window, and select the area that contains the Independent Variable(s).
(If there are 2 or more Independent Variables, they must be side-by-side in the worksheet.)
You may specify Constant is Zero to force b0 (the intercept) to be zero.
If the first row of your area contains names for the variables, select Labels in the First Row.
You may set a Confidence Level for the confidence intervals for the coefficients.
If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output.
Make sure that the Output Range does not overlap with the Input Range.
Select additional output and plots that you would like. Excel's Normal Probability Plot is incorrect.
Click OK.
Note: Graphs produced by Excel's Regression program are badly sized. However, it is easy to change the size
by clicking on the graph and dragging one of the corner "handles". Output is shown below.
As of this writing, Microsoft Excel's Regression package malfunctions. I recommend using the PredictionInterval macro that I wrote, which is in a file called PredInt.xls available on the class web site, with instructions.
An alternative link for this file is:
http://forum.johnson.cornell.edu/faculty/mcclain/Software/PredInt.htm
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.99322
R Square 0.986486
Adjusted R Square 0.972973
Standard Error 0.948683
Observations 5
ANOVA
             df   SS      MS     F    Significance F
Regression   2    131.4   65.7   73   0.0135135
Residual     2    1.8     0.9
Total        4    133.2
            Coefficients  Standard Error  t Stat    P-value    Lower 95%   Upper 95%   Lower 95.0%  Upper 95.0%
Intercept   5.2           2.894823        1.79631   0.2142827  -7.2554266  17.655427   -7.2554266   17.6554266
x1          2.3           0.360555        6.379052  0.0237044  0.7486554   3.8513446   0.7486554    3.851344584
x2          -2.5          0.5             -5        0.0377496  -4.6513279  -0.3486721  -4.6513279   -0.348672137
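The Regression Statistics and the ANOVA F value can be recomputed from the data and the fitted coefficients. The Python sketch below does this with the textbook formulas for R Square, Adjusted R Square, Standard Error and F; it takes the coefficients 5.2, 2.3 and -2.5 from the output as given, and the variable names are illustrative.

```python
import math

# Example Data Set and the Coefficients reported in the output
y = [-3, 2, 11, 9, 8]
x1 = [2, 3, 5, 6, 4]
x2 = [5, 4, 2, 4, 3]
b0, b1, b2 = 5.2, 2.3, -2.5

predicted = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]
n, k = len(y), 2  # observations, number of independent variables
y_bar = sum(y) / n

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_resid = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
ss_regr = ss_total - ss_resid

r_square = ss_regr / ss_total
adj_r_square = 1 - (ss_resid / (n - k - 1)) / (ss_total / (n - 1))
std_error = math.sqrt(ss_resid / (n - k - 1))      # square root of MS Residual
f_stat = (ss_regr / k) / (ss_resid / (n - k - 1))  # MS Regression / MS Residual

print(round(r_square, 6), round(adj_r_square, 6), round(std_error, 6), round(f_stat, 4))
```

These values correspond to the R Square, Adjusted R Square, Standard Error and F entries in the SUMMARY OUTPUT.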
RESIDUAL OUTPUT
Observation   Predicted y   Residuals   Standard Residuals
1             -2.7          -0.3        -0.447214
2             2.1           -0.1        -0.149071
3             11.7          -0.7        -1.043498
4             9             1.78E-15    2.65E-15
5             6.9           1.1         1.639783
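The workbook does not say how the Standard Residuals column is calculated. Judging from the numbers above, each residual appears to be divided by the standard deviation of the residuals computed with n - 1 in the denominator. The Python sketch below applies that assumed rule to the Example Data Set, using the coefficients from the output.

```python
import math

# Example Data Set and the Coefficients reported in the output
y = [-3, 2, 11, 9, 8]
x1 = [2, 3, 5, 6, 4]
x2 = [5, 4, 2, 4, 3]
b0, b1, b2 = 5.2, 2.3, -2.5

# Residual = observed y minus predicted y
residuals = [yi - (b0 + b1 * u + b2 * v) for yi, u, v in zip(y, x1, x2)]

# Assumed rule: divide by the sample standard deviation of the residuals.
# With an intercept in the model the residuals average zero, so the mean is omitted.
n = len(residuals)
resid_sd = math.sqrt(sum(r * r for r in residuals) / (n - 1))
standard_residuals = [r / resid_sd for r in residuals]

for i, (r, s) in enumerate(zip(residuals, standard_residuals), start=1):
    print(i, round(r, 6), round(s, 6))
```

If the assumption holds, the last column printed matches the Standard Residuals column up to rounding.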
One of the "Line Fit Plots" as produced by Excel. Note that the plot is much "flatter" than is customary, and it is a "Column Chart".
[Chart: x1 Line Fit Plot, showing y and Predicted y against x1]
The other "Line Fit Plot" after changing its height to a more suitable value, and converting it to a "Line Chart".
[Chart: x2 Line Fit Plot, showing y and Predicted y against x2]
Residual Plots are shown below.
One of the "Residual Plots" as produced by Excel. Note that the plot is much "flatter" than is customary. "Scatter Plots" are easier to interpret if they are nearly square.
[Chart: x1 Residual Plot, showing Residuals against x1]
The other "Residual Plot" after changing its height to a more suitable value.
[Chart: x2 Residual Plot, showing Residuals against x2]
NOTE: Plots of Standardized Residuals are easily obtained, if you clicked the Standardized Residuals checkbox. Make copies of the Residual Plots, and then change the data source so that the dependent variable is the standardized residuals.
This plot is an example for variable x2. Note that I added gridlines as well.
[Chart: x2 Standardized Residual Plot, plotted against x2]