
ME 335 Lab Exercise

Goodness of Fit Test (Chi Squared Analysis)


Overview
This exercise uses a LabVIEW program to generate two sets of data which you will test
for adherence to the shape of a normal distribution, commonly referred to as the bell
curve. Specifically, you will learn to use EXCEL functions to:

Create a histogram of the observed data, which provides a visual indication of the
shape of the data distribution

Compare the predicted and measured numbers of occurrences in each bin of the
histogram, and use the values to perform a goodness-of-fit assessment using the
Chi-Squared test

These tests will be applied to both sets of data that you generate, and you will be asked to
interpret the results.

Procedure
Items to be turned in include a copy of the workbook created during this exercise and
your responses to the questions on the attached sheet (one workbook and one answer
sheet from each student), uploaded to Moodle.
1) Download the LabVIEW vi Goodness of fit Test Data Generator from Moodle.
2) Run the vi; the program will create two data sets.
3) Right-click the graph Dataset A and, under the Export menu, select Export Data
To Excel.
4) This will open a temporary Excel sheet. Copy the data from this temporary sheet to
your exercise file Goodness of Fit Exercise.xlsx and rename the sheet to Dataset
A.
Hint: Open your Histogram exercise file and rename it for this exercise.
Leave only one of the sheets, reuse the work you did for the previous exercise,
and update it accordingly to do Step 8.
5) Repeat steps 3 and 4 for Dataset B and name the Excel sheet accordingly.
6) Close the temporary Excel Sheets (no need to save them) and the LabVIEW Data
Generator program.
7) In Excel, go to the File menu and click on Options. In the window that pops up, click
on Add-Ins, then at the bottom of the screen make sure the Manage option is set to
Excel Add-ins and click on the Go button. In the window that pops up, make sure
that the following options are selected:

8) Create histograms for both data sets. Define the histogram bins (calculate the
Number of Bins, Bin Size, Bin Min and Bin Max) and calculate the nj (measured
number of occurrences) column. You could count by hand the measured number of
values that occurred in each bin, but it is easier to use one of EXCEL's built-in
functions. Use the FREQUENCY function to calculate the nj values for each bin.
The arguments for the FREQUENCY function are the range of data values and the
bin array values. Remember the procedure for an array function: put the cursor on the
cell where the formula is typed, press the mouse button and highlight the output range
for the array; click on the formula bar so that the cursor appears somewhere in the
formula that you just typed; and now press CTRL-SHIFT-ENTER simultaneously. As
a check, sum the measured number of occurrences over all the bins to make sure that
it equals the total number of points.
At this point you may want to start giving your results some structure to
facilitate calculations in later steps. You may also start using Excel functions you
already know, such as [=COUNT(DataRange)], and if you know how to name cells and
cell ranges, use those names in your calculations. Ask if you need help with this,
as it simplifies computation in Excel.
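For anyone who wants to cross-check the FREQUENCY output outside Excel, the counting rule can be sketched in Python. This is only a minimal sketch and not part of the lab workbook: the function name frequency and the sample numbers are hypothetical. It assumes the bin array holds the ascending upper limit of each bin, counts a value in the first bin whose upper limit it does not exceed, and returns one extra overflow count for values above the last limit, which mirrors how FREQUENCY behaves.

```python
from bisect import bisect_left


def frequency(data, bin_edges):
    """Count values per bin; bin_edges are ascending upper bin limits."""
    counts = [0] * (len(bin_edges) + 1)  # last slot: values above the top edge
    for x in data:
        # Index of the first edge >= x, i.e. the bin with prev_edge < x <= edge.
        counts[bisect_left(bin_edges, x)] += 1
    return counts


# Hypothetical sample values, not lab data.
sample = [1.2, 1.9, 2.0, 2.4, 3.1, 3.3, 3.8, 4.6, 5.0, 5.7]
edges = [2.0, 3.0, 4.0, 5.0]
nj = frequency(sample, edges)
print(nj)

# Check suggested in step 8: the counts must sum to the number of points.
assert sum(nj) == len(sample)
```

If the overflow count is nonzero, the largest bin value lies below the maximum of the data, which is worth knowing before comparing against the predicted counts.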
9) The next step in the analysis is to create a histogram using Excel's analysis tools.
Go to the Data tab and under the Analysis Tools choose Histogram. Select your
Xi dataset (without the column heading) and check the Chart Output box. Click on OK
and a table of values and a chart should appear in a new worksheet. You may have to
resize the chart to make it more readable. A set of bin ranges has been automatically
created, and the number of values in each range is displayed. For now, observe the
histogram and make a preliminary judgment as to whether the data set exhibits a
normal distribution.
10) Perform a goodness-of-fit analysis for the data set using the Chi-Squared test to test
the assumption that the generated data are normally distributed. Create the column n'j
(predicted number of occurrences) for each bin. To do this, first calculate the mean
value and the standard deviation for your data set. Put the results in convenient cells
(include labels so the worksheet is legible).
11) To calculate the χ² parameter, you need to know the measured number of occurrences
(or frequencies) in each bin (nj), and the predicted number of occurrences in each bin
(n'j).

The next step is to calculate the predicted number of occurrences in each bin, and
place these values in the column to the right of the nj values. These values are
calculated by integrating the normal distribution function over the range of values
corresponding to each bin. Fortunately, EXCEL has the built-in function NORM.DIST
(NORMDIST in older versions) to automate this task.
Label one column as Predicted Distribution and in the first cell under that column
type =NORM.DIST(Bin Value, Mean, Standard Deviation, TRUE). This column will
contain the fraction of the expected distribution based on the statistics of the data.
The NORM.DIST function is designed for a couple of similar tasks. When the fourth
argument is entered as TRUE (or 1), it returns the integral of the normal distribution
function from minus infinity up to the x value defined in the first argument. This
integral, of course, corresponds to the probability of getting values in that range.
In the next column over, label your data n'j. To calculate the expected occurrences
for the first bin, multiply the Predicted Distribution value by the number of points
in your dataset.
A similar approach is used for the other bins, but we must use differences of
the probabilities generated by the NORM.DIST function. The last bin also has a unique
formula: the relevant probability will be 1 minus the NORM.DIST function
evaluated at the second-to-last bin value.
Now sum the predicted values over all the bins. Does it equal the total number of
points? If not, you have an error somewhere. At this point you should examine the nj
and n'j values to see if they are reasonably distributed over the bins. If not, try to
adjust the bin array values to achieve a more even distribution.
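The predicted-count calculation in steps 10 and 11 can be sketched in Python as a cross-check on the spreadsheet. This is a hedged sketch under stated assumptions, not the required method: the function name expected_counts and the sample numbers are hypothetical, the bin values are taken to be ascending upper limits of each bin, and the standard deviation is assumed to be the sample standard deviation (n - 1 in the denominator), which the handout does not specify.

```python
from statistics import NormalDist, mean, stdev


def expected_counts(data, bin_edges):
    """Predicted occurrences n'j per bin for a normal fit to the data.

    bin_edges are ascending upper bin limits (at least two). The first bin
    is open below and the last bin is open above, as described in step 11.
    """
    n = len(data)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    fit = NormalDist(mean(data), stdev(data))
    # Cumulative probability at each bin value, like NORM.DIST(..., TRUE).
    cum = [fit.cdf(edge) for edge in bin_edges]
    probs = [cum[0]]  # first bin: everything up to the first bin value
    # Middle bins: difference of consecutive cumulative probabilities.
    probs += [cum[j] - cum[j - 1] for j in range(1, len(cum) - 1)]
    # Last bin: 1 minus the cumulative value at the second-to-last bin value.
    probs.append(1.0 - cum[-2])
    return [n * p for p in probs]


# Hypothetical sample values, not lab data.
sample = [1.2, 1.9, 2.0, 2.4, 3.1, 3.3, 3.8, 4.6, 5.0, 5.7]
edges = [2.0, 3.0, 4.0, 5.0, 6.0]
predicted = expected_counts(sample, edges)
print([round(p, 3) for p in predicted])

# Check suggested in step 11: predicted counts must sum to the number of points.
assert abs(sum(predicted) - len(sample)) < 1e-9
```

Because the first and last bins are treated as open-ended, the bin probabilities always sum to 1, so a mismatch in the spreadsheet total points to a formula error rather than to the data.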
12) Now you can calculate the χ² parameter and evaluate the distribution. In another
column evaluate the terms of the Chi-Squared sum, (nj - n'j)² / n'j, for each bin, and
sum them over all the bins to obtain χ².
13) To evaluate α(χ²), we need to know the degrees of freedom (ν), which for our case is
equal to the number of bins minus 2 (since two parameters used to define the
distribution, the mean and standard deviation, were calculated from the data values).
You could use Table 4.5 in our text to provide a value for α(χ²), but once again
EXCEL has a built-in function to simplify this process. In a cell, type the label
Alpha(Chi-Squared). In the cell below, type =CHIDIST(χ², ν). This function
returns the value of α(χ²). (The first argument (χ²) is the cell reference to the value
of χ²; the second argument is the degrees of freedom.)
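Steps 12 and 13 can also be sketched in Python. This is a minimal illustration, not Excel's internal algorithm: the function names chi_squared and chi2_right_tail and the bin counts are hypothetical. CHIDIST returns the right-tail probability of the Chi-Squared distribution; the sketch reproduces that quantity with a closed-form recurrence that is valid only for whole-number degrees of freedom. The degrees of freedom follow the convention stated in this handout (number of bins minus 2).

```python
import math


def chi_squared(observed, predicted):
    """Sum of (nj - n'j)^2 / n'j over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, predicted))


def chi2_right_tail(chi2, dof):
    """Right-tail probability of the Chi-Squared distribution (integer dof).

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1) with z = chi2 / 2,
    starting from Q(1, z) = exp(-z) or Q(1/2, z) = erfc(sqrt(z)).
    """
    if dof < 1 or int(dof) != dof or chi2 < 0:
        raise ValueError("dof must be a positive integer and chi2 >= 0")
    z = chi2 / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < dof / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical bin counts, not lab data.
observed = [8, 22, 41, 20, 9]
predicted = [10.0, 20.0, 40.0, 20.0, 10.0]

chi2 = chi_squared(observed, predicted)
dof = len(observed) - 2  # handout convention: number of bins minus 2
alpha = chi2_right_tail(chi2, dof)
print(chi2, dof, alpha)
```

The resulting alpha can then be compared with the 0.10 to 0.90 range given in the summary report questions.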
14) Plot a histogram graph that includes both the observed distribution and the expected
distribution.
15) Repeat the steps above for data set B. Put your output for each step into new
worksheets as before, using appropriate sheet names.

ME 335 Lab Chi Squared Analysis Summary Report


Turn in your Excel file and a document containing a summary of the exercise on Moodle.
Make sure to comment on the results for both data sets.
Collaboration between group members is expected during the data generation/collection.
However, the answers to the questions below should reflect the work of the individual.
1) Compare the histogram created manually to that created by Excel's Data Analysis
tool.
2) Do the histograms appear to support the contention that the data set exhibits a normal
distribution?

3) Discuss your conclusions from the Chi-Squared test. Does the probability value
support the hypothesis that the data set was drawn from a normal distribution? The
criteria for this exercise are that the alpha value for the data should be between 0.10
and 0.90 to support the assumed distribution as a reasonable model.
4) How does the data's Chi-Squared value compare to the critical Chi-Squared values?
Show what the result means graphically. Use the online Chi Squared Graph generator
found here: http://homepage.stat.uiowa.edu/~mbognar/applets/chisq.html

5) If the data do not appear to be normally distributed, does the column plot containing
the expected and observed distributions indicate the nature of the deviation from
normal behavior?

Make sure to support your observations by including the necessary graphs and tables.
Answer the questions with complete observations; no YES or NO answers. In short,
show me that you have learned the tools reviewed in Chapter 4.
