
Brandon Kline, Morgan Toal, and Marlayna Verenna

SMA 265 Applied Statistics


Project 2

PART I:

We were hired as data scientists to study the life cycle of the Kree Worm from 1964 to
1973. We were asked to test the hypothesis that the Kree Worm’s four significant life cycles
sync with the seasons. To do this, we ran a regression analysis and studied how the nitrate
level and the season affected the average number of worms in a tested cubic foot sample of
soil.

Table 1: Regression Analysis of the Kree Worm from 1964-1973

Table 1 reports a 95% confidence interval for the coefficient of each dummy variable. To
test the hypothesis that the worms have four significant life cycles, we look at these
confidence intervals. If the intervals for all three dummy coefficients contained zero, we
could not rule out that the seasons have no effect on the life cycles of the worms. The only
dummy variable whose confidence interval contains zero is D3. Since zero is not a plausible
value for the coefficients of the other two dummy variables, we conclude that the seasons do
have an effect on the life cycles of the Kree Worm.

The equation given by the multiple linear regression model is:
P = -68.53040454 + 4.859514877N + 27.49757439D1 + 27.59674418D2 + 1.951723932D3
To predict the number of worms in the area today given a guessed nitrate level of 29 ppm, we
plug 29 into this equation for N and plug in the binary values that correspond to the season
for the dummy variables. The binary code for the current season is 000 (D1 = D2 = D3 = 0), so
when we plug our values into the equation, we get P = -68.53040454 + 4.859514877(29). So
P = 72.3955 worms.

Table 2: Box Plot of the Life Cycle of the Kree Worm from 1964-1973

Where N is the Nitrate level in ppm, and P is the average number of worms in a tested
cubic foot sample of soil.
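
The prediction step in Part I can be reproduced with a short script. This is only a sketch of the arithmetic, not the tool we used for the regression itself: the coefficients are copied from the fitted equation, while the function name `predict_worms` and the `season_code` string argument are our own illustrative choices.

```python
# Coefficients copied from the fitted equation in Part I.
INTERCEPT = -68.53040454
NITRATE_SLOPE = 4.859514877
SEASON_COEFFS = (27.49757439, 27.59674418, 1.951723932)  # D1, D2, D3


def predict_worms(nitrate_ppm, season_code="000"):
    """Return the predicted average number of worms per cubic foot of soil.

    season_code is the three-digit dummy string D1 D2 D3 (hypothetical
    argument format), e.g. "000" for the baseline season.
    """
    if len(season_code) != 3 or set(season_code) - {"0", "1"}:
        raise ValueError("season_code must be three binary digits")
    dummies = [int(ch) for ch in season_code]
    if sum(dummies) > 1:
        raise ValueError("at most one seasonal dummy can be 1")
    # Only the dummy that is switched on contributes its coefficient.
    seasonal_shift = sum(c * d for c, d in zip(SEASON_COEFFS, dummies))
    return INTERCEPT + NITRATE_SLOPE * nitrate_ppm + seasonal_shift


if __name__ == "__main__":
    # Nitrate level of 29 ppm in the baseline season (code 000).
    print(round(predict_worms(29, "000"), 4))
```

Changing the season code shifts the prediction by the matching dummy coefficient while leaving the nitrate slope unchanged.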

PART II:

We were hired as data scientists to study the data for education expenditure by the
Department of Education for 1960 and 1970. To do this we created a regression model for each
year. A regression model allows us to study the relationship between two or more variables. In
each regression model we study the variables: Y, which is the per capita expenditure on public
education; X1, which is the per capita personal income; X2, which is the number of residents
per thousand under 18 years of age; and X3, which is the number of people per thousand residing
in urban areas. Tables 5 and 6 below show the relationship between these variables on scatter
plots. Each observation is also assigned to a geographical region through the three dummy
variables D1, D2, and D3, so each region is represented by a string of three binary numbers.
Region 1 (Northeast) is 000, region 2 (North Central) is 100, region 3 (South) is 010, and
region 4 (West) is 001.

By creating a regression model, we can find the coefficients needed for our linear
equation. The equation gives the per capita expenditure on public education based on the per
capita personal income, number of residents per thousand under 18 years of age, number of
people per thousand residing in urban areas, and the string of binary numbers pertaining to the
corresponding geographical region.
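
The region coding described above can be written as a small lookup table. This is a minimal sketch: the dictionary `REGION_DUMMIES` and the helper `encode_region` are hypothetical names introduced for illustration, and the codes themselves are the ones listed in the text.

```python
# Region name -> (D1, D2, D3), following the coding given in the text.
REGION_DUMMIES = {
    "Northeast": (0, 0, 0),      # region 1, the baseline (all dummies zero)
    "North Central": (1, 0, 0),  # region 2
    "South": (0, 1, 0),          # region 3
    "West": (0, 0, 1),           # region 4
}


def encode_region(region):
    """Return the (D1, D2, D3) dummy values for a region name."""
    try:
        return REGION_DUMMIES[region]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None


if __name__ == "__main__":
    for name in REGION_DUMMIES:
        print(name, encode_region(name))
```

Because the Northeast is coded 000, it acts as the reference region: each dummy coefficient measures the difference from the Northeast when the other variables are held fixed.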

Table 3: Regression Analysis for the Education Expenditure in 1960


The linear equation we get from the regression analysis is:
Y = 8.338180943 + 0.038207715X1 + 0.002590275X2 - 0.023654895X3 + 15.93896786D1 + 10.82695655D2 + 33.08308066D3
This equation was taken from the coefficients section of the regression model.

Table 4: Regression Analysis for the Education Expenditure in 1970

The linear equation we get from the regression analysis is:
Y = -177.14303 + 0.07877637X1 + 0.54047745X2 - 0.1239195X3 + 0.78393316D1 - 0.5681756D2 + 29.7941206D3
This equation was taken from the coefficients section of the regression model.
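
To show how the two fitted equations are used, the sketch below evaluates either one for a given set of inputs. The coefficients are copied from the 1960 and 1970 equations; the function name `predict_expenditure`, the `region_code` string argument, and the input values in the usage lines are assumptions made up for illustration and are not taken from our data.

```python
# Coefficients copied from the 1960 and 1970 regression equations.
COEFFS = {
    1960: {
        "intercept": 8.338180943,
        "x": (0.038207715, 0.002590275, -0.023654895),  # X1, X2, X3
        "d": (15.93896786, 10.82695655, 33.08308066),   # D1, D2, D3
    },
    1970: {
        "intercept": -177.14303,
        "x": (0.07877637, 0.54047745, -0.1239195),
        "d": (0.78393316, -0.5681756, 29.7941206),
    },
}


def predict_expenditure(year, x1, x2, x3, region_code="000"):
    """Evaluate the fitted equation for the given year.

    region_code is the three-digit string D1 D2 D3 (hypothetical argument
    format): 000 Northeast, 100 North Central, 010 South, 001 West.
    """
    if region_code not in {"000", "100", "010", "001"}:
        raise ValueError("region_code must be 000, 100, 010, or 001")
    c = COEFFS[year]
    dummies = [int(ch) for ch in region_code]
    y = c["intercept"]
    y += sum(b * x for b, x in zip(c["x"], (x1, x2, x3)))
    y += sum(b * d for b, d in zip(c["d"], dummies))
    return y


if __name__ == "__main__":
    # Placeholder inputs, NOT values from the data set.
    x1, x2, x3 = 2000, 350, 600
    for year in (1960, 1970):
        west = predict_expenditure(year, x1, x2, x3, "001")
        northeast = predict_expenditure(year, x1, x2, x3, "000")
        # The gap between regions equals the D3 coefficient for that year.
        print(year, round(west - northeast, 4))
```

Holding X1, X2, and X3 fixed, the difference between two regions depends only on the dummy coefficients, which is why the printed gap does not change with the placeholder inputs.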

Table 5: Education Expenditure in 1960


This scatter plot shows the per capita expenditure on public education plotted against
each of the three predictor variables (X1, X2, and X3) in 1960.

Table 6: Education Expenditure in 1970

This scatter plot shows the per capita expenditure on public education plotted against
each of the three predictor variables (X1, X2, and X3) in 1970.
