
Statistics Project

Directions: You will be working on your own to complete the following statistics project. You can
choose a topic of your interest. See examples below or you can choose a topic of your own (MUST
email Mrs. Fletcher to get approval first). Be sure ALL work is shown, sources are cited (if you
collect data from another source), pictures of graphs are included, and explanations are clear,
concise, and grammatically correct. This will count as a summative assessment. Please refer to the
rubric to see how this project will be graded.

Examples of Topics
1. NBA players’ heights vs their weights.
2. NBA players’ points per game (PPG) vs number of years in the league.
3. Number of games a Quarterback wins vs his NFL salary.
4. Salary vs batting average for MLB players.
5. Hours of sleep the night before a test vs test grade.
6. Followers a singer has on Twitter vs number of Spotify downloads they have.
7. Amount a person exercises vs their body fat (BMI).
8. Number of people enlisting in the military vs the nation’s unemployment rate.
9. Years of schooling after High School vs Average salary.
10. Family size (# of immediate family members) vs GPA
11. Family size (# of immediate family members) vs travel experience (outside of US)
12. GPA vs travel experience (outside of US)

Part One: Introduction


1. State the topic of your project as a research question. Example: “Is there a relationship
between the wave height (in feet) and the number of surfers at the beach?”

Is there a correlation between the number of monthly Spotify listeners (in millions) and
the number of YouTube subscribers (in millions)?

2. Why did you choose this topic? What specific result do you expect to find? What type of
correlation do you expect to see (positive or negative) and how strong do you expect the
correlation to be (weak, moderate, or strong)? Why do you expect this result (your rationale)?
Explain.
I chose this topic because I find it interesting to compare how many people listen to an
artist on Spotify with how many people watch that artist's videos on YouTube. I expect to
find a moderate positive correlation. I expect this result because the more people who
listen to an artist on Spotify, the more people will go to YouTube and subscribe to the
artist's channel.

3. Define the two variables for your topic of choice.


Example: Variable 1 (x): Wave height (feet); Variable 2 (y): Number of surfers

Variable 1 (x): Number of monthly Spotify listeners (millions)

Variable 2 (y): Number of YouTube subscribers (millions)

Part Two: Collect and organize the data.


● Option 1: Collect a random sample of 25 participants
● Option 2: Collect data from a reliable source - must be a random sample of 25 data points; be
sure to cite your source

1. Describe your data collection process and sampling strategy.


a. If you located data on a website, cite your source and tell how you selected individuals
from that website to include in your sample.
b. If you obtained data from an office, store, or other similar source, explain where you
went and how you selected individuals to include in your sample.
c. If you surveyed individuals directly, describe how you selected individuals for the
sample. If you used a survey, include a copy of your survey.

I’m going to collect my data from a website. I will find the 100 most-listened-to artists on
Spotify, then put those artists on a wheel to spin. I will take the 25 artists the wheel
selects, look at each one's YouTube channel, and record how many subscribers it has.
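
The wheel spin described above is a simple random sample without replacement. As a rough sketch of the same idea in code (not the method actually used for this project), Python's `random.sample` draws 25 distinct items from a list of 100. The artist names below are hypothetical placeholders, and the seed value is an arbitrary assumption chosen only so the draw can be repeated.

```python
import random

# Hypothetical placeholder names: the real list would be the 100 artists
# taken from the website used for the project.
top_artists = [f"Artist {i}" for i in range(1, 101)]

# A fixed seed (arbitrary choice) makes the draw reproducible.
rng = random.Random(2024)

# Draw 25 distinct artists; every artist is equally likely to be chosen.
sample = rng.sample(top_artists, k=25)

print(sample)
```

Because `sample` never repeats an item, this matches removing an artist from the wheel once it has been picked.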

2. Create a table to organize your data. Be sure to label your table appropriately.

Artist                   # of Monthly Spotify     # of YouTube Subscribers
                         Listeners (millions)     (millions)

Dua Lipa                 67.1                     19.8
DJ Snake                 31.9                     21.9
Nicky Jam                24.1                     22.7
Pitbull                  26.2                     15.1
Demi Lovato              26.8                     17.5
Selena Gomez             35.7                     30.2
Billie Eilish            51.2                     44.1
Harry Styles             34.2                     11.4
Justin Bieber            88.6                     66.6
Major Lazer              20.2                     14.4
Arctic Monkeys           21.5                     5.83
Diplo                    29.0                     2.4
Mac Miller               16.7                     3.79
Halsey                   37.8                     11.2
Linkin Park              22.0                     18.5
Rihanna                  44.5                     38.3
Bad Bunny                43.7                     36.6
Red Hot Chili Peppers    21.5                     6.24
Imagine Dragons          49.9                     26.6
Jason Derulo             34.7                     16.1
Maluma                   30.3                     28
P!nk                     26.9                     11.2
Beyonce                  32.7                     23.9
Shakira                  31.5                     34.5
Marshmello               39.7                     54.4

Part Three: Statistical Analysis


1. Create a scatter plot (use Google Sheets). Be sure to include a title for your scatter plot, labels
for both axes, and an appropriately numbered scale for both axes.
2. Based on the scatter plot, describe the correlation between the two variables. Does it indicate
positive, negative, or no relationship between the variables? How strong is the correlation
between the two variables? Is it weak, moderate, or strong? Use sentences to describe what
this relationship means in terms of your context.

The scatter plot shows a weak positive correlation. This means that artists with more
monthly Spotify listeners will most likely have more YouTube subscribers.

3. Use Google Sheets to calculate the correlation coefficient (r). What do you get? Does the value
of r reinforce the impression conveyed by the scatter plot? Please include an interpretation
of its meaning.

The correlation coefficient is r = 0.710. This reinforces the positive relationship seen in
the scatter plot: the trend looks fairly strong among artists with fewer listeners and weaker
among artists with more listeners. An r of 0.710 indicates a moderate positive correlation.
A perfect positive correlation would be r = 1, and no correlation would be r = 0. A perfect
negative correlation would be r = -1, where one variable decreases as the other variable
increases.
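
As a cross-check on the spreadsheet value, the correlation coefficient can be recomputed directly from the data table. The sketch below uses the standard Pearson formula, r = Sxy / sqrt(Sxx * Syy), with the 25 pairs typed in table order. The function and variable names are illustrative choices, not part of the original project.

```python
from math import sqrt

# Monthly Spotify listeners (millions), in the order of the data table.
listeners = [67.1, 31.9, 24.1, 26.2, 26.8, 35.7, 51.2, 34.2, 88.6, 20.2,
             21.5, 29.0, 16.7, 37.8, 22.0, 44.5, 43.7, 21.5, 49.9, 34.7,
             30.3, 26.9, 32.7, 31.5, 39.7]

# YouTube subscribers (millions), in the same order.
subscribers = [19.8, 21.9, 22.7, 15.1, 17.5, 30.2, 44.1, 11.4, 66.6, 14.4,
               5.83, 2.4, 3.79, 11.2, 18.5, 38.3, 36.6, 6.24, 26.6, 16.1,
               28.0, 11.2, 23.9, 34.5, 54.4]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(listeners, subscribers)
print(f"r = {r:.3f}")
```

The printed value should land close to the 0.710 reported above, since it is computed from the same 25 data points.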
4. Graph the regression line and find the equation of the regression line using Google Sheets.

5. Discuss the slope of the regression line and interpret its meaning in terms of your context.

The slope is 0.706. This means that for every additional 1 million monthly listeners on
Spotify, the regression line predicts about 0.706 million (706,000) more YouTube subscribers.
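
The slope and intercept can also be recomputed from the data table as a check on the spreadsheet trendline. This is a minimal sketch of an ordinary least-squares fit, assuming the trendline is a plain linear one; the function name is an illustrative choice.

```python
# Monthly Spotify listeners (millions), in the order of the data table.
listeners = [67.1, 31.9, 24.1, 26.2, 26.8, 35.7, 51.2, 34.2, 88.6, 20.2,
             21.5, 29.0, 16.7, 37.8, 22.0, 44.5, 43.7, 21.5, 49.9, 34.7,
             30.3, 26.9, 32.7, 31.5, 39.7]

# YouTube subscribers (millions), in the same order.
subscribers = [19.8, 21.9, 22.7, 15.1, 17.5, 30.2, 44.1, 11.4, 66.6, 14.4,
               5.83, 2.4, 3.79, 11.2, 18.5, 38.3, 36.6, 6.24, 26.6, 16.1,
               28.0, 11.2, 23.9, 34.5, 54.4]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(listeners, subscribers)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}")
```

The slope is in units of million subscribers per million listeners, which is why it reads as "subscribers gained per extra listener" in context.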

6. Use the regression line’s equation to make a prediction in terms of your variables or context
about a y-value given a specific x-value (be sure to pick an x-value that makes sense).

a. What do you want to predict? Explain using your variables or context. Example: I want
to predict how many surfers we can expect at the beach with waves that are 2 feet high.

I want to predict how many YouTube subscribers an artist would have with 100
million monthly Spotify listeners.

b. What x-value are you going to use? Why did you choose this x-value?

For my x-value I’m going to use 100. Because it’s in millions, I want to find the
number of subscribers if there are 100 million listeners.

c. Complete the table below.

Regression Equation: y = 0.706x - 1.83

X-value    Plug into equation and show your work    Y-value

100        y = 0.706(100) - 1.83                    68.77

d. Explain the results of your prediction. What is the purpose of finding this prediction?
How would this information be beneficial? Who would benefit from these findings?
So if an artist has 100,000,000 monthly listeners, the artist would have about
68,770,000 YouTube subscribers. The purpose of this prediction is to estimate how
many YouTube subscribers an artist would have with 100,000,000 monthly listeners.
This would be beneficial because we could find out whether an artist is above or
below the predicted value. Justin Bieber has 66.6 million subscribers, which is
almost 68.77 million, but he only has 88.6 million monthly listeners. This means he
has more subscribers than the regression line predicts for his number of listeners.
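
The prediction in the table can be reproduced with a few lines of code. The sketch below plugs x = 100 into the regression equation y = 0.706x - 1.83 and converts the result from millions to a raw subscriber count. The function name `predict_subscribers` is an illustrative choice.

```python
# Coefficients of the regression line y = 0.706x - 1.83 from the project.
SLOPE = 0.706
INTERCEPT = -1.83


def predict_subscribers(listeners_millions):
    """Predict YouTube subscribers (millions) from monthly Spotify listeners (millions)."""
    return SLOPE * listeners_millions + INTERCEPT


predicted = predict_subscribers(100)

# Convert from millions to an actual number of subscribers.
predicted_count = round(predicted * 1_000_000)

print(f"{predicted:.2f} million subscribers ({predicted_count:,})")
```

Note that x = 100 is above the largest value in the sample (88.6), so this is an extrapolation and should be treated with more caution than a prediction inside the data range.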

Part Four: Results and Summary


1. Interpret the results of your statistical analysis in the context of your original research
question. Does your statistical analysis support your expected findings? Explain. (see rubric
for possible level 4 expectations)

Yes, it does. I expected a moderate positive correlation, and that is what I found. As more
people listen to an artist on Spotify, more people will like the artist. The artist might
have a music video of the song, so listeners might watch the video and subscribe to
the artist's channel on YouTube.

2. What conclusions, if any, do you believe you can draw as a result of your study? Were there
any surprises in the data collected? If the results were not expected, what factors might
explain your results? What did you learn about your variables? (see rubric for possible level
4 expectations)

There was a surprise. One of the outliers (Diplo) had 29 million monthly listeners on
Spotify, but he only had 2.4 million subscribers. So he had a high number of listeners,
but not a high number of subscribers. Another outlier was Justin Bieber, who skyrocketed
in both. He had 88.6 million monthly listeners and 66.6 million subscribers. These are way
above average. This suggests he is the most popular artist in my sample.
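
One way to put a number on how unusual these two artists are is the residual: the actual y-value minus the y-value predicted by the regression line. The sketch below computes residuals for Diplo and Justin Bieber using the equation y = 0.706x - 1.83 and their values from the data table. The names `residual` and `artists` are illustrative choices.

```python
# Coefficients of the regression line y = 0.706x - 1.83 from the project.
SLOPE = 0.706
INTERCEPT = -1.83

# (monthly Spotify listeners, YouTube subscribers), both in millions,
# copied from the data table.
artists = {
    "Diplo": (29.0, 2.4),
    "Justin Bieber": (88.6, 66.6),
}


def residual(x, y):
    """Return actual minus predicted subscribers (millions) for one artist."""
    predicted = SLOPE * x + INTERCEPT
    return y - predicted


residuals = {name: residual(x, y) for name, (x, y) in artists.items()}

for name, value in residuals.items():
    side = "above" if value > 0 else "below"
    print(f"{name}: {abs(value):.2f} million subscribers {side} the line")
```

A negative residual means fewer subscribers than the line predicts, and a positive residual means more.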

3. Connect your findings to the real world. Who would benefit from the results of your study?
Explain. (see rubric for possible level 4 expectations)

This would benefit people who want to be in the music industry because they could look at
their number of monthly listeners on Spotify and expect a certain number of YouTube
subscribers. Another benefit would be to see how artists compare to each other. You could
also listen to the artists who are really popular and see what kind of music they make and
whether it is similar. If a certain kind of music is popular, then you would want to make
that kind of music to get more Spotify listeners.

Here is how you will be graded:


Score Reasoning

4 In addition to a 3 score, student includes sophisticated applications such as:


❏ Vocabulary terms and definitions used to support claims and explanations
❏ Connections made beyond the context of the topic and/or uses data to support claims
❏ In depth, critical analysis of data using values and specific examples to support explanations

3 INTRODUCTION
❏ Topic is described as a research question
❏ Variables are defined
❏ Predictions are described/explained
DATA
❏ Data collection process is explained
❏ Appropriate amount of data is collected
❏ Table of data is included with appropriate labels
ANALYSIS
❏ Scatter plot is included with appropriate labels
❏ Correlation is described and explained
❏ Correlation coefficient is calculated and its interpretation is included
❏ Graph of scatter plot is included with regression line AND equation of regression line displayed
❏ Slope of regression line equation is discussed and interpreted in terms of context
❏ Prediction is made with equation of regression line with work shown and explanation of results
RESULTS/SUMMARY
❏ Results of study are interpreted, explained, and related back to original research question
❏ Conclusion is explained
❏ Connections are made within the context of the topic

2 Minor errors or omissions present in analysis and/or conclusion.

1 Major errors or omissions present in analysis and/or conclusion.

0 Project is not turned in OR part (or all) of the project is copied from another student
