Assignment 5
Data Files needed for these problems are in the Attached Files.
Problems:
3.31
\[ \frac{7+5+11+8+3+6+2+1+9+8}{10} = \frac{60}{10} = 6 \]
\[ \sqrt{\frac{(7-6)^2+(5-6)^2+(11-6)^2+(8-6)^2+(3-6)^2+(6-6)^2+(2-6)^2+(1-6)^2+(9-6)^2+(8-6)^2}{10}} = \sqrt{\frac{94}{10}} = \sqrt{\frac{47}{5}} \approx 3.0659 \]
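The arithmetic above can be checked with a short script. This is only a sketch: it assumes the ten values are treated as a population (the worked solution divides by 10, not 9), and the variable names are illustrative.

```python
import math

# The ten data values from the worked solution above.
data = [7, 5, 11, 8, 3, 6, 2, 1, 9, 8]

n = len(data)
mean = sum(data) / n

# Population variance: sum of squared deviations divided by N (not N - 1).
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", round(std_dev, 4))
```

If the values were instead a sample, the divisor would be 9 and the standard deviation would come out slightly larger.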
3.33
a. Calculate the mean, variance, and standard deviation for this population.
Mean 276.4902
Variance 76508.84
Stand Dev 276.6023
c. Compare your findings with what would be expected on the basis of the empirical rule. Are
The empirical rule is not a great estimate for the first standard deviation, but it is for the
second and third. This is most likely because the data is not symmetric.
3.37
a. Calculate the mean and standard deviation of the market capitalization for this population of
30 companies.
Mean 185.8
Standard Dev Pop 131.4613
This means that the average company has a market cap of 185.8, and we can expect about 68% of
companies to have between 54.3 and 317.3. Going out more than one standard deviation below the
mean does not make a lot of sense in this context because a market cap can’t go below zero.
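A quick sketch of the interval arithmetic behind that statement, using the mean and population standard deviation reported above. The `interval` helper is a hypothetical name introduced here for illustration.

```python
# Reported summary statistics for market capitalization.
mean = 185.8
std_dev = 131.4613

def interval(k):
    """Return (low, high) for mean +/- k standard deviations."""
    return (mean - k * std_dev, mean + k * std_dev)

for k in (1, 2, 3):
    low, high = interval(k)
    note = " (lower bound is negative)" if low < 0 else ""
    print(f"within {k} sd: {low:.1f} to {high:.1f}{note}")
```

The lower bound is already negative at two standard deviations, which is why the wider empirical-rule intervals are hard to interpret for this data.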
3.39
a. Does the study suggest that perceived usefulness of smartphones in educational settings
and use of smartphones for class purposes are positively correlated or negatively
correlated?
Since the students who use it more reported a higher usefulness, they are positively correlated.
purposes? Explain.
Yes, I think this is a clear cause-and-effect relation. The students who use their smartphones
more for class-related activities are going to see firsthand how they are benefiting from
them. They are also biased to believe the way they use their time is constructive.
3.41
a. Calculate the covariance between first weekend gross and U.S. gross, first weekend gross
First Vs US 779.0137275
First Vs World 3501.558968
US vs World 5289.87611
b. Calculate the coefficient of correlation between first weekend gross and U.S. gross, first
weekend gross and worldwide gross, and U.S. gross and worldwide gross.
First Vs US 0.728417481
First Vs World 0.823319135
US vs World 0.96419656
c. Which do you think is more valuable in expressing the relationship between first weekend
gross, U.S. gross, and worldwide gross—the covariance or the coefficient of correlation?
Explain.
The coefficient of correlation seems to be the best indication of that at a glance since it is always
d. Based on (a) and (b), what conclusions can you reach about the relationship between first
The US vs World gross seems to be the most correlated of the three pairs.
3.44
What are the properties of a set of numerical data?
Numerical data sets have a shape, measures of central tendency and variation.
3.45
While it can be measured in several ways, mean, median or mode for example, it is a number that
3.46
What are the differences among the mean, median, and mode, and what are the advantages and
disadvantages of each?
The mean is the average of the data set: the sum divided by the total number of points. The
median is the center value of the data set, and it has the advantage that it is not influenced by
extreme outliers. The mode is the most common element of a data set, which is useful because it
3.47
How do you interpret the first quartile, median, and third quartile?
The first quartile is greater than 25% of the data, the median is greater than 50%, and the third quartile is greater than 75%.
3.48
Variation refers to the overall spread of the data, the more spread out the higher the variation and the more
3.49
What does the Z score measure?
3.50
What are the differences among the various measures of variation, such as the range, interquartile
range, variance, standard deviation, and coefficient of variation, and what are the advantages and
disadvantages of each?
The range is the distance from the largest number in the set to the smallest, which gives you a general
sense of the kinds of numbers you are dealing with. The interquartile range tells you the spread of the middle
50% of the data set, which gives you a sense of where “most” numbers in the set are. The variance tells you
about the spread of the distribution, whether clustered or spread out. The standard deviation has the same
units as the data, and using the empirical rule you know roughly how much of the data is within how many
standard deviations of the mean. The coefficient of variation is normalized by dividing the standard
deviation by the mean of the data so it can more
3.51
How does the empirical rule help explain the ways in which the values in a set of numerical data
For approximately normal data sets, the empirical rule is a useful rule of thumb that says about 68% of
the data is within one standard deviation of the mean, 95% within two, and 99.7% within three. So if you know
both the mean and the standard deviation, you can tell whether a given data point should be considered
expected or rare.
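The idea of judging a point as expected or rare can be sketched with a Z score, the number of standard deviations a value lies from the mean. The function names, the mean of 100, and the standard deviation of 15 are all hypothetical, chosen only for illustration, and the thresholds assume roughly bell-shaped data.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

def describe(value, mean, std_dev):
    """Place a value in an empirical-rule band (assumes bell-shaped data)."""
    z = abs(z_score(value, mean, std_dev))
    if z <= 1:
        return "within 1 sd (about 68% of values fall here)"
    if z <= 2:
        return "within 2 sd (about 95% of values fall here)"
    if z <= 3:
        return "within 3 sd (about 99.7% of values fall here)"
    return "beyond 3 sd (rare under the empirical rule)"

# Hypothetical example: mean 100, standard deviation 15.
for value in (110, 130, 150):
    print(value, round(z_score(value, 100, 15), 2), describe(value, 100, 15))
```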
3.52
The empirical rule is an estimation about data sets and their distributions so long as they are “normal”, the
3.53
What is meant by the property of shape?
The shape of a data set describes how the data is distributed over the range, such as a standard symmetrical bell
3.54
Skewness refers to how symmetrical the data is, while kurtosis refers to whether the data has bigger or
smaller tails.
3.55
If a data shape is symmetric, the mean will be in the middle and the 1st and 3rd quartiles will be an
equal distance away from it. If it is positively skewed, then the mean will be greater than the median, and
the median will be closer to the 1st quartile. If the data is negatively skewed, then the mean will be less
than the median, and the median will be closer to
3.56
Covariance is specific to a given set of data while the coefficient of correlation is standardized so
you can use it to compare to other data sets. “Sets A and B are more correlated than C and D”
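A small sketch of why the coefficient of correlation is comparable across data sets while covariance is not. The data values are hypothetical, and the sample (n - 1) form of covariance is an assumption, since the text does not say which form was used.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1); assumed form for this sketch."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data; x_scaled is the same variable in units 1000 times smaller.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_scaled = [v * 1000 for v in x]

print("covariance:", covariance(x, y), "vs", covariance(x_scaled, y))
print("correlation:", correlation(x, y), "vs", correlation(x_scaled, y))
```

Changing the units multiplies the covariance by 1000 but leaves the correlation unchanged, which is what makes statements like the quoted comparison meaningful.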
3.61
a. Calculate the mean, median, range, and standard deviation for the call duration, which is
the amount of time spent speaking to customers on the phone. Interpret these measures of
mean 232.78
median 228
range 1076
standard dev 158.6866
The mean is relatively close to the median, so the data is not heavily skewed. Most of the data
(about 68%) can be found between 74 seconds and 391 seconds, so the data has a fairly wide spread.
Five-number summary
min 65
first q 138.25
median 228
third q 276.75
max 1141
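The five-number summary above is enough to sketch the boxplot's shape numerically. The 1.5 × IQR fence rule used here is a common boxplot convention and an assumption of this sketch, not something stated in the assignment.

```python
# Five-number summary reported above (call duration, in seconds).
minimum, q1, median, q3, maximum = 65, 138.25, 228, 276.75, 1141

iqr = q3 - q1  # spread of the middle 50% of calls

# Conventional boxplot fences (assumed rule: 1.5 * IQR beyond the quartiles).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print("IQR:", iqr)
print("fences:", lower_fence, "to", upper_fence)
print("max beyond upper fence:", maximum > upper_fence)

# Compare the two tails: a much longer upper tail suggests positive skew.
print("min to Q1:", q1 - minimum)
print("Q3 to max:", maximum - q3)
```

The distance from the third quartile to the maximum is far larger than the distance from the minimum to the first quartile, which points to a long right tail.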
c. Construct a boxplot and describe its shape.
The data has a mean greater than the median, so is skewed in the positive direction, there
d. What can you conclude about call center performance if a call duration target of less than
We know that both the mean and the median are under this target duration, so on average they are meeting
the goal. However, because the standard deviation is so high, a large number of calls are missing the target.
3.73
You are planning to study for your statistics examination with a group of classmates, one of whom
make sense for numerical data such as height and grade point average, these data sets should be switched
to get the appropriate charts. While the mean for height and grade point average makes sense, it should not