Statistics/Human Evolutionary Anatomy

Data Description, Analysis and Evolutionary Significance

T2 2016-17

You have spent the last 10 weeks learning about data and representations:
boxplots, scatter plots, linear regressions, outliers, influential points, and correlations. In
this data-saturated world, it's really important to understand and apply your knowledge
to analyzing actual data. We've done this in class: you collected some data (kneeling
heights and arm spans, for example) and you looked at some data about manatees, flight
times, and growth rates. But, except for the data you collected, someone else scoured the
interwebs and books to find and organize that data for you. Data isn't always clean like
that. It doesn't always come in nice little matched-up pairs. Someone has to do that work.

Your analysis should have at least:
- a table of your selected data and a paragraph explaining what the variables represent
- a box plot of each variable, complete with a description (remember SOCS: shape, outliers, center, spread)
- a comparison of the variables, describing similarities and differences of the box plots
- the equation of the best-fit (regression) line
- an interpretation of the meaning of the slope in the context of your data
- some predictions based on the linear model (using the equation to find a missing value)
- a thorough analysis of outliers and influential points (removing an outlier from the data to see if it is influential)

1. Find your data! Which website(s) did you get it from?

https://ourworldindata.org/happiness-and-life-satisfaction/

https://ourworldindata.org/internet/

Year    Internet_usersNA    Happiness_usa

2006    229,295,552         7.182
2007    250,048,544         7.513
2008    250,578,896         7.280
2009    244,851,552         7.158
2010    249,116,528         7.164
2011    245,896,880         7.115
2012    263,516,048         7.026
2013    256,106,192         7.249
2014    263,777,120         7.151
2015    271,351,008         6.864

(Note: Survey question: "Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?")

3. What do your variables represent?

Note: You might have a variable named cal that represents calories. This is the place to
explain that.

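Year = Year in which the data were recorded

Internet_usersNA = Number of people who use the internet in North America (raw counts in the table; in millions in the statistics below)
Happiness_usa = Average self-reported happiness (life satisfaction) of people in the U.S., on the 0-to-10 ladder scale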

Note: You can have more than one claim - and you probably will. But you need to have at
least one claim that is supported by your analysis of the data in order to be On Target.

People's access to information results in less innocent happiness and more stress.
Shape (skewed left, skewed right, uniform, symmetrical, bimodal)

Internet users: bimodal
Happiness levels: bimodal

Center (mean, median, mode)

Internet users:
Median: 250.313
Mean: 252.453
Mode: NA

Happiness levels:
Median: 7.161
Mean: 7.1702
Mode: NA

Spread

Internet users:
Range: 42.056
IQR: 17.62
Standard Deviation: 11.9192

Happiness levels:
Range: 0.649
IQR: 0.134
Standard Deviation: 0.168018

Outliers

Internet users: NA
Happiness levels: 6.864, 7.513
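
The center, spread, and outlier values reported above can be double-checked with a short script. This is a minimal Python sketch, not necessarily how the values were originally obtained. It assumes that Internet users are expressed in millions, that quartiles are the medians of the lower and upper halves of the sorted data, that the standard deviation is the sample (n - 1) version, and that an outlier is any point more than 1.5 x IQR beyond a quartile. The names `quartiles` and `summarize` are made up for this illustration.

```python
from statistics import mean, median, stdev

# Data from the table, 2006-2015. Internet users converted to millions (assumption).
internet_millions = [229.295552, 250.048544, 250.578896, 244.851552, 249.116528,
                     245.896880, 263.516048, 256.106192, 263.777120, 271.351008]
happiness = [7.182, 7.513, 7.280, 7.158, 7.164, 7.115, 7.026, 7.249, 7.151, 6.864]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2          # the middle value is left out when n is odd
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)


def summarize(values):
    """Collect center, spread, and 1.5 x IQR outliers for one variable."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": iqr,
        "stdev": stdev(values),       # sample standard deviation (divides by n - 1)
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


for name, values in [("Internet users (millions)", internet_millions),
                     ("Happiness level", happiness)]:
    print(name)
    for key, value in summarize(values).items():
        print(f"  {key}: {value}")
```

Other quartile conventions exist, so software that interpolates between data points may report a slightly different IQR and different outlier fences.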

7. What do these center, shape, spread, outliers, and plot(s) tell you about the data?
Both histograms are bimodal, although there are several modes in the data.

Part C: Two-variable Analysis

8. Scatterplot of the data, with linear regression line (line of best fit).
9. a. Equation of the linear regression line (line of best fit).

Note: Replace x and y with the appropriate variable names.

b. What is the correlation coefficient? How well do your variables correlate

(strong, moderate, weak, none, and negative or positive)?
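
One way to get the equation for 9a and the correlation coefficient for 9b is to compute them directly from the table. The sketch below uses the standard least-squares formulas. It assumes Internet users (in millions) is the x-variable and happiness level is the y-variable; the function name `fit_line` is made up for this illustration.

```python
import math

# Data from the table, 2006-2015. Internet users converted to millions (assumption).
internet_millions = [229.295552, 250.048544, 250.578896, 244.851552, 249.116528,
                     245.896880, 263.516048, 256.106192, 263.777120, 271.351008]
happiness = [7.182, 7.513, 7.280, 7.158, 7.164, 7.115, 7.026, 7.249, 7.151, 6.864]


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)    # Pearson correlation coefficient
    return slope, intercept, r


slope, intercept, r = fit_line(internet_millions, happiness)
print(f"Happiness = {intercept:.4f} + ({slope:.6f}) * Internet_users_in_millions")
print(f"r = {r:.4f}")
```

The sign of `r` gives the direction of the relationship, and its distance from 0 indicates whether the correlation is strong, moderate, or weak.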

10. A couple of predictions.

Note: Pick a value for your x-variable and calculate the prediction for the y-variable
using your linear regression equation (from 9a). Show or explain how you did this.
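
A prediction like this amounts to plugging an x-value into the regression equation. The sketch below refits the line from the table and then predicts a happiness level for a chosen number of Internet users. The value of 260 million is only an example picked from inside the range of the data, and the names `fit_line` and `predict` are made up for this illustration.

```python
# Data from the table, 2006-2015. Internet users converted to millions (assumption).
internet_millions = [229.295552, 250.048544, 250.578896, 244.851552, 249.116528,
                     245.896880, 263.516048, 256.106192, 263.777120, 271.351008]
happiness = [7.182, 7.513, 7.280, 7.158, 7.164, 7.115, 7.026, 7.249, 7.151, 6.864]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(x, slope, intercept):
    """Plug x into y = intercept + slope * x."""
    return intercept + slope * x


slope, intercept = fit_line(internet_millions, happiness)
example_users = 260.0                 # hypothetical x-value, in millions
predicted = predict(example_users, slope, intercept)
print(f"Predicted happiness at {example_users} million users: {predicted:.3f}")
```

Predictions for x-values far outside the observed range (roughly 229 to 271 million) are extrapolations and should be treated with extra caution.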
11. Outliers and Influential Points

Note: Maybe your data set doesn't have any outliers. Say why you think this is true.
Maybe your data set has outliers. If it does, determine whether or not those outliers are
influential, and then explain. (Remember, a single variable could have an outlier while
the scatter plot doesn't, or vice versa.)

Happiness_levels_usa has two outliers, possibly because of an event in one year that
caused people's happiness to spike or drop by a larger-than-usual amount.

(with outlier)
(without outlier)

There is not a drastic change in the equation.
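
Whether an outlier is influential can be checked by refitting the line without it and comparing the two equations. The worksheet does not say which outlier was removed, so this sketch tries each of the two happiness outliers in turn (7.513 in 2007 and 6.864 in 2015). It assumes Internet users in millions as the x-variable; the names `fit_line` and `fit_without` are made up for this illustration.

```python
# Data from the table. Internet users converted to millions (assumption).
years = list(range(2006, 2016))
internet_millions = [229.295552, 250.048544, 250.578896, 244.851552, 249.116528,
                     245.896880, 263.516048, 256.106192, 263.777120, 271.351008]
happiness = [7.182, 7.513, 7.280, 7.158, 7.164, 7.115, 7.026, 7.249, 7.151, 6.864]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_without(year):
    """Refit the line after dropping the data point for one year."""
    keep = [i for i, y in enumerate(years) if y != year]
    return fit_line([internet_millions[i] for i in keep],
                    [happiness[i] for i in keep])


full_slope, full_intercept = fit_line(internet_millions, happiness)
print(f"All points:    y = {full_intercept:.4f} + ({full_slope:.6f}) x")
for year in (2007, 2015):             # the years of the two happiness outliers
    slope, intercept = fit_without(year)
    print(f"Without {year}:  y = {intercept:.4f} + ({slope:.6f}) x")
```

Comparing the printed slopes and intercepts side by side shows how much each removal moves the line, which is the basis for calling a point influential or not.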

Part D: Reasoning/Conclusion

Now it's time to present the story of the data in a clear way.

12. Provide at least two pieces of evidence & reasoning that support your chosen
claim.
a. First piece of evidence:

Reasoning:

Over time, as people gain more access to the internet and it comes to be treated as a human right,
it is possible that their blissful ignorance is taken away by the constant news that finds its
way to everyone through social media.

b. Second piece of evidence:

Reasoning:

Here you can see that as the internet access levels in North America per year increase,
the happiness levels of US citizens decrease.

13. Write a conclusion that supports your claim, and speaks about the
evolutionary significance.

Overall, people who have more access to the internet tend to be less happy, which tells
us that we should look more into what pushes us as humans to want that access so
much, since it is not benefiting us in terms of contentment.
Humans probably have a genetic predisposition to use screens, and this data tells us that
screen use ultimately affects our happiness negatively.

Further study of the connection between humans and causes of screen time addiction is
clearly warranted.

Standard | On Target | Above Target | Grade

Statistics & Probability: Representing & Describing Data
On Target:
- Represents data with appropriate plots for single variables and also for paired data
- Accurately describes the distributions of the data being studied
- Interprets linear model, including the correlation coefficient and making predictions
Above Target:
- Represents the data in a new and interesting way that addresses a question of your choosing
- Demonstrates a deep understanding of the content through a thorough analysis of the data

Science & Math Practices: Claim & Evidence
On Target:
- Explanation is well-written, organized, and refers to the data and the visualization; makes a claim and supports it with evidence, analysis, and reasoning.
Above Target:
- Explanation makes a claim and supports it with an eloquent, concise argument based on the data visualization and analysis.

Heredity and Evolution: Multiple lines of evidence to support a hypothesis or inference in biological findings
On Target:
- Data sets provide a reasonable sample size
- Correlation (or lack of correlation) between data is explicitly stated and related to the claim
- Claim is supported or refuted clearly by evidence gathered and analyzed
Above Target:
- Explanation of evidence is nuanced and support or refutation of claim is clear and explicit