
Names: Shreya Maddireddy & Chris Lee

One Variable Statistics


Project
Step 7: Right now it’s just a bunch of dots. In fact, in the top right corner of your graph it should say dot plot.
Click on the arrows next to dot plot and change the graph to a histogram. Now change it to a box plot. Copy
and paste the dot plot, the box plot AND the histogram here and answer the questions:

1. What is the shape of the distribution? Does one type of graph show the shape more clearly than the others?
Explain.
The histogram shows the shape of the data more clearly than the other graphs because it shows
which group of values has the greatest frequency. All three graphs, however, show a right skew.
On the histogram and dot plot, identifying outliers is only conjecture because outliers are not
specifically marked.

2. At first glance, does it look like there are any outliers?


The modified box plot does show outliers as indicated by the dots not connected to the right “whisker”.

Step 8: Repeat steps 6 and 7 for another attribute of your choice. Copy and paste your graphs here and answer
the following questions:
1. What is the shape of the distribution? Does one type of graph show the shape more clearly than the others?
Explain.
Although all the graphs indicate that the data is symmetrical, the histogram shows the shape most
clearly because it reveals that the data is bimodal.

2. At first glance, does it look like there are any outliers?


Visually speaking, it does not look like there are any outliers.

Step 10: We are always interested in more than just the mean. Go up to the summary menu at the top of the
page and select “add basic statistics.” This should add some extra stuff to your summary box. Copy and paste
your summary box here and answer the following questions.
1. Which number refers to the standard deviation?
0.00236263
2. What does the second number in the summary stand for?
The second number in the summary stands for the count, or the number of data points.

Step 11: Go back up to the summary menu at the top and select “add five number summary.” Copy and paste
the new summary table here and answer the following questions:

1. What is the median batting average?


0.309
2. What is the IQR of the batting average data?
0.0135
3. Use the IQR to test for outliers. Are there any? If so, in which end of the distribution do they lie?
Low-end Outliers = None
High-End Outliers = 0.338, 0.358, 0.363
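
The IQR test used above can be sketched in a few lines of Python. This is only an illustration of the
1.5 x IQR rule: the quartiles Q1 = 0.3025 and Q3 = 0.316 are hypothetical values picked so that the
IQR equals 0.0135, and the sample list is made up apart from the three high values reported in the
answer. The real quartiles come from the summary table.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3):
    """Split out the values that fall below the lower fence or above the upper fence."""
    low_fence, high_fence = iqr_fences(q1, q3)
    low_outliers = sorted(v for v in values if v < low_fence)
    high_outliers = sorted(v for v in values if v > high_fence)
    return low_outliers, high_outliers


# Hypothetical quartiles (assumption): chosen only so that Q3 - Q1 = 0.0135.
q1, q3 = 0.3025, 0.316

# Illustrative batting averages (assumption), plus the three high values from the answer.
sample = [0.300, 0.305, 0.309, 0.312, 0.320, 0.338, 0.358, 0.363]

low_outliers, high_outliers = find_outliers(sample, q1, q3)
print("Low-end outliers:", low_outliers)
print("High-end outliers:", high_outliers)
```

Any value below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR is flagged, which is the same rule a modified
box plot uses to draw its separate dots.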

4. Using the standard deviation, calculate the percentage of data points that lie within one standard deviation
of the mean.
22.5%
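
One way to carry out this calculation is sketched below in Python. The data list is a small made-up
example, not the batting averages, and the sketch assumes the sample standard deviation (dividing by
n - 1); the software's summary table may use a different convention.

```python
from statistics import mean, stdev


def percent_within_one_sd(values):
    """Percentage of values in the interval [mean - sd, mean + sd]."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1), an assumption
    inside = [v for v in values if m - s <= v <= m + s]
    return 100 * len(inside) / len(values)


# Made-up data for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(percent_within_one_sd(sample))
```

For this sample the mean is 5 and the standard deviation is about 2.14, so six of the eight values
fall inside the interval.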
6. Would standard deviation or IQR be a better indicator of spread? Explain your choice.
Standard deviation is a better indicator of spread because it uses every value in the data set and
summarizes how far the data points typically lie from the mean. The IQR only shows the range of the
middle 50% of the data.
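
When weighing the two measures, it can help to see how each one responds when a single extreme value
is added to a data set. The Python sketch below uses made-up numbers, and its quartiles come from
`statistics.quantiles` with the default method, which may not match the quartile method used by the
graphing software.

```python
from statistics import quantiles, stdev


def iqr(values):
    """Interquartile range using the default quartile method of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Made-up data for illustration only.
base = [10, 12, 13, 15, 16, 18, 20]
with_extreme = base + [100]  # same data plus one extreme value

for label, data in (("base", base), ("with extreme value", with_extreme)):
    print(f"{label}: SD = {stdev(data):.2f}, IQR = {iqr(data):.2f}")
```

Comparing the two printed lines shows how much each measure moves when the extreme value is included.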

Step 13: One at a time, highlight the pitcher and catcher data and drag down a table for each one. Then drag
down a graph for each one. Let’s compare the salaries of catchers with the salaries of pitchers. Construct a
histogram for the salaries of catchers and one for the salaries of pitchers. Copy and paste those here and answer
the questions below:
1. Describe the shape of each distribution.
Both distributions, pitcher salaries and catcher salaries, are skewed right.
2. Do there appear to be any outliers?
For the catcher data there seem to be two outliers, while the pitcher data shows three outliers.
3. Can we really compare these histograms? Why not? (Hint: look at the scales on the axes.)
No, these graphs cannot be compared because they use different scales on the x- and y-axes.

Step 14: Let’s fix the scales so they match. Highlight the catcher histogram. Go up to the Graph menu and
select “Show Axis Links.” You should see little icons that look like metal links pop up in your graph. Move
your cursor over one and it should turn into a hand. Grab the link on the x-axis and drag it over to the x-axis on
your pitcher histogram. You should see the scale change to match the first one. Do the same thing with the
y-axis link. Copy and paste your new histograms here and answer the question below:
1. Did the scale make any difference in the overall shape?
Re-scaling the graphs did not make any significant difference in their shapes. Both still show a
right skew.

Step 15: Drag down a summary table for each data set. Include in each summary table basic statistics as well
as the five number summary. Copy and paste those summary tables here and answer the following questions.
1. What is the mean salary for pitchers? For catchers?
Pitchers = $1.92391e6 (about $1,923,910)
Catchers = $2.73056e6 (about $2,730,560)
2. What is the median salary for pitchers? For catchers?
Pitchers = $675,000
Catchers = $770,000
3. What is the standard deviation for pitchers? For catchers?
Pitchers = $2.99481e6 (about $2,994,810)
Catchers = $3.52807e6 (about $3,528,070)
4. Would standard deviation or IQR be a better indicator of spread? Explain your choice.
Standard deviation is a better indicator of spread because it uses every value in the data set and
summarizes how far the data points typically lie from the mean. The IQR only shows the range of the
middle 50% of the data.
5. Based on your answer to the previous question, determine if either data set has any outliers.
The catcher data has two outliers, while the pitcher data has three outliers.
6. Now that we have split the data for pitchers and catchers, make a few statements comparing and contrasting
the salaries of the two groups. (Hint: use your answers to the previous five questions.)
Generally, the catchers make more money on average (approximately 0.8 million dollars more). The
median salary of the catchers is also higher, and their standard deviation is larger, which indicates
that the catcher salaries are more spread out.
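
The comparison can be checked with a short Python sketch that uses only the means, medians, and
standard deviations reported in the answers above. The dictionary layout and variable names are just
for illustration.

```python
# Summary values as reported in the answers to questions 1-3.
pitchers = {"mean": 1.92391e6, "median": 675_000, "sd": 2.99481e6}
catchers = {"mean": 2.73056e6, "median": 770_000, "sd": 3.52807e6}

# Differences between the two groups (catchers minus pitchers).
mean_gap = catchers["mean"] - pitchers["mean"]
median_gap = catchers["median"] - pitchers["median"]
sd_gap = catchers["sd"] - pitchers["sd"]

print(f"Mean gap:   ${mean_gap:,.0f}")
print(f"Median gap: ${median_gap:,.0f}")
print(f"SD gap:     ${sd_gap:,.0f}")

# In each group the mean is far above the median, which is consistent with a right skew.
for name, group in (("Pitchers", pitchers), ("Catchers", catchers)):
    print(f"{name}: mean - median = ${group['mean'] - group['median']:,.0f}")
```

The gap between the means is much larger than the gap between the medians, so the few very high
salaries affect the comparison of means more than the comparison of medians.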
