
Chpt 1

1.
Statistics as numerical facts refers to numerical measures such as medians, averages,
percents and indexes that help us understand a variety of business and economic situations. As a
discipline, statistics is the art and science of collecting, analyzing, presenting and
interpreting data.
2.

3.
4.

5.

6.
A categorical variable places individuals into categories, while a quantitative variable takes numerical
values.

a. Quantitative, because the number of nights is numerical

b. Categorical, because the three possible categories are listed.

c. Categorical, because the categories are yes and no.

d. Quantitative, because the age is a number.

e. Categorical, because the seven listed international destinations are the categories.
7.

8.

9.

10.
a. Quantitative, Ratio

b. Categorical, Nominal

c. Categorical, Ordinal

d. Quantitative, Ratio

e. Categorical, Nominal

11.

a. Quantitative, Ratio

b. Categorical, Ordinal

c. Categorical, Ordinal


d. Quantitative, Ratio

e. Categorical, Nominal

12.

13.

14.
a. Year is on the horizontal axis and the forecasts are on the vertical axis.

Connect consecutive years of the same manufacturer by a straight line.


b. Up to 2005, it appears that General Motors is the production leader. After 2005, it appears that Toyota
becomes the production leader.

c. The width of the bars has to be the same and the height has to be equal to the value in the table.
15.

16.

17.
18.
19.

20.

21.
d. Cannot be determined, because we have NO information about the population of women who did
NOT take the drug.

e. The larger the sample, the more reliable the results are. In medicine, it is very important that
the results are reliable, because people's lives could be at stake.

22.
23.

24.

b. Too generalized, because the average of 77 is based only on the sample of 5 students.

c. Correct, because the average of 77 is referred to as an estimate.

d. Too generalized, because it is possible that the population deviates from the given sample.

e. Too generalized, because it is possible that the population deviates from the given sample and thus the
five other students could have different grades (lower than 65 or higher than 90).

25.
a. The variables are the topics that were investigated.

VARIABLES=Exchange, Ticker Symbol, Market Cap, Price/Earnings Ratio, Gross Profit Margin

Thus there are 5 variables.

b. A categorical variable places individuals into categories, while a quantitative variable takes
numerical values.

CATEGORICAL=Exchange, Ticker Symbol


QUANTITATIVE=Market Cap, Price/Earnings Ratio, Gross Profit Margin

c. The frequency is the number of times the value is mentioned in the data set.

The percentage is the frequency divided by the sample size of 25, multiplied by 100.


The width of the bars has to be the same and the height has to be equal to the frequency.

d. The frequency is the number of times the value is mentioned in the data set.

The percent frequency is the frequency divided by the sample size of 25, multiplied by 100.

The width of the bars has to be the same and the height has to be equal to the frequency.
Chpt 2.1

1.

The frequency is the number of times a certain response was given.

The relative frequency is the frequency divided by the sample size of 120.

2.
3.

d. The width of the bars has to be the same and the height has to be equal to the frequency.
 
4.
a. A categorical variable places individuals into categories, while a quantitative variable takes
numerical values.

CATEGORICAL, because no numbers are given.

b. The frequency is the number of times the value occurs in the data set.

The percent frequency is the frequency divided by the sample size, rewritten as a percentage.
c. BAR CHART

The width of each bar has to be the same and the height has to be equal to the frequency.

PIE CHART

Draw a circle

The slice size (central angle of the circle) is the product of 360 degrees and the relative frequency of the
category (its percent of values divided by 100).

Draw the slices for each category.


d. The largest viewing audience is the show with the highest frequency, thus CSI.

The second largest viewing audience has the next highest frequency, thus Trace.
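
The frequency, percent frequency and pie slice calculations described above can be sketched in Python. This is only an illustration: the response labels and counts below are made up and are not the data of the exercise.

```python
from collections import Counter

# Hypothetical responses; labels and counts are invented for illustration.
responses = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

n = len(responses)                                    # sample size
frequency = Counter(responses)                        # times each value occurs
relative = {k: f / n for k, f in frequency.items()}   # frequency / sample size
percent = {k: 100 * r for k, r in relative.items()}   # relative frequency as a percentage
angle = {k: 360 * r for k, r in relative.items()}     # central angle of the pie slice

# List the categories from highest to lowest frequency.
for k, f in frequency.most_common():
    print(f"{k}: frequency={f}, percent={percent[k]:.1f}%, angle={angle[k]:.1f} degrees")
```

The category listed first has the highest frequency, which is how the largest audience in part d is identified.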

5.
a. The frequency is the number of times the value occurs in the data set.

The relative frequency is the frequency divided by the sample size.

b. BAR CHART
The width of each bar has to be the same and the height has to be equal to the frequency.

c. PIE CHART

Draw a circle

The slice size (central angle of the circle) is the product of 360 degrees and the relative frequency of the
category (its percent of values divided by 100).

Draw the slices for each category.

d. The three most common last names have the highest frequencies and thus are Smith, Johnson and
Williams.

 
6.
a. The frequency is the number of times the value occurs in the data set.

The relative frequency is the frequency divided by the sample size.


BAR CHART

The width of each bar has to be the same and the height has to be equal to the frequency.

b. NBC and CBS tie for first with 17 top-rated shows, while ABC is second with 15 top-rated shows.
FOX is last with only 1 top-rated show.
7.

The frequency is the number of times the value occurs in the data set.

The relative frequency is the frequency divided by the sample size.

In general, the food quality ratings are good or better (these categories have the highest
frequencies), and thus the food quality at the restaurants appears to be at least good.
8.

 
a. The frequency is the number of times the value occurs in the data set.

The relative frequency is the frequency divided by the sample size.


b. The most Hall of Famers is the category with the highest frequency:

P or Pitcher.

c. The fewest Hall of Famers is the category with the lowest frequency:

H (catcher) and 1st base

d. The most Hall of Famers is the category with the highest frequency:

R or right field.

e. Infielders appear to have fewer Hall of Famers than outfielders, because both the individual frequencies and the total
frequency are lower for the infielders.
9.

a. The frequency is the number of times the value occurs in the data set.

The relative frequency is the frequency divided by the sample size.

b. The width of each bar has to be the same and the height has to be equal to the frequency.

c. Most adults corresponds with the category with the highest frequency:

C or City

d. Most adults corresponds with the category with the highest frequency:

T or Small Town
e. Fewer people would live in the city, while more people would live in small towns and rural areas. The
number of people in the suburbs does not change much.

10.

a. The frequency is the number of times the value occurs in the data set.

b. The relative frequency is the frequency divided by the sample size.

The percent frequency is the relative frequency rewritten as a percentage (and rounded to the nearest
percent).

c. The width of each bar has to be the same and the height has to be equal to the frequency.

e. In Spain, the ratings are worse, because the rating of Bad increased.

Chpt 2.2
11.

a. The frequency is the number of values that fall within the interval.

b. The relative frequency is the frequency divided by the total frequency.

The percent frequency is the relative frequency multiplied by 100.

12.

The cumulative frequency of a class is its frequency plus the frequencies of all previous classes.

The cumulative relative frequency is the cumulative frequency divided by the total frequency.
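
As a minimal sketch of these two definitions, the running totals can be computed with itertools.accumulate. The class frequencies below are hypothetical and only illustrate the arithmetic.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed in class order.
frequencies = [4, 7, 9, 5]

total = sum(frequencies)                     # total frequency
cumulative = list(accumulate(frequencies))   # frequency plus all previous classes
cumulative_relative = [c / total for c in cumulative]

for f, c, cr in zip(frequencies, cumulative, cumulative_relative):
    print(f"frequency={f}, cumulative={c}, cumulative relative={cr:.2f}")
```

The last cumulative frequency always equals the total frequency, so the last cumulative relative frequency is 1.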
13.

HISTOGRAM

The width of each bar has to be the same and the height has to be equal to the frequency.

OGIVE

Draw the histogram using the cumulative frequencies.

The ogive connects, with straight lines, the point at the upper class limit on top of each bar to the
corresponding point on the next bar.

Then remove the bars of the histogram.


14.
a. Create a number line

For every given data value place a dot above the corresponding number on the number line.

b. The frequency is the number of values that fall within the interval.

c. The relative frequency is the frequency divided by the total frequency.

The percent frequency is the relative frequency multiplied by 100.
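
A short sketch of how values are counted into class intervals, assuming equal-width classes. The data values, the lower limit and the class width are made up for illustration.

```python
# Hypothetical data values and classes, chosen only for illustration.
data = [12, 17, 21, 24, 25, 29, 31, 33, 38, 14, 26, 27]
lower, width, n_classes = 10, 10, 3   # classes 10-19, 20-29, 30-39

# Frequency: number of values that fall within each interval.
freq = [0] * n_classes
for x in data:
    index = (x - lower) // width      # which class the value belongs to
    freq[index] += 1

total = sum(freq)
relative = [f / total for f in freq]  # frequency / total frequency
percent = [100 * r for r in relative] # relative frequency * 100

for i in range(n_classes):
    lo = lower + i * width
    print(f"{lo}-{lo + width - 1}: frequency={freq[i]}, "
          f"relative={relative[i]:.2f}, percent={percent[i]:.0f}")
```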


15.
a. The frequency is the number of values that fall within the interval.

b. The relative frequency is the frequency divided by the total frequency.

The percent frequency is the relative frequency multiplied by 100.

c. The cumulative frequency of a class is its frequency plus the frequencies of all previous classes.

d. The cumulative relative frequency is the cumulative frequency divided by the total frequency.

16.
a. The frequency is the number of values that fall within the interval.

b. The relative frequency is the frequency divided by the total frequency.

The percent frequency is the relative frequency multiplied by 100.

c. The cumulative frequency of a class is its frequency plus the frequencies of all previous classes.

The cumulative relative frequency is the cumulative frequency divided by the total frequency.

The cumulative percent frequency is the cumulative relative frequency multiplied by 100.
d. The width of each bar has to be the same and the height has to be equal to the frequency.
17.
c. The width of each bar has to be the same and the height has to be equal to the frequency.
18.
a. The lowest holiday spending is 180 dollars, while the highest holiday spending is 2050 dollars.

b. The frequency is the number of times the value occurs in the data set.

The percent frequency is the frequency divided by the total frequency, multiplied by 100.
c. The width of each bar has to be the same and the height has to be equal to the frequency.

The histogram is skewed to the right, because the highest bars are to the left in the histogram.

19.
a. The frequency is the number of times the value occurs in the data set.

b. The relative frequency is the frequency divided by the total frequency.

c. The cumulative frequency of a class is its frequency plus the frequencies of all previous classes.

d. The cumulative relative frequency is the cumulative frequency divided by the total frequency.
e. Draw the histogram using the cumulative frequencies.

The ogive connects, with straight lines, the point at the upper class limit on top of each bar to the
corresponding point on the next bar.

Then remove the bars of the histogram.

20.
a. The frequency is the number of values that fall within the interval.

The relative frequency is the frequency divided by the total frequency.


The percent frequency is the relative frequency multiplied by 100.
b. The width of the bars has to be the same and the height has to be equal to the frequency.

c. The distribution is skewed to the right, because the highest bars are to the left in the histogram.

d. The most frequent income class is less than 5 million dollars.

Tiger Woods and Phil Mickelson appear to have unusually high earnings.


21.
Chpt 2.3

22.
Place the digits of the tens to the left of the vertical line and the digits of the ones of every data value to
the right of the vertical line.

23.

Place the digits of the ones to the left of the vertical line and the digits of the tenths of every data value
to the right of the vertical line.
24.

25.

Place the digits of the tens to the left of the vertical line and the digits of the ones of every data value to
the right of the vertical line.
26.

a. Round every data value to the nearest dollar.

30 25 54 17 55 13 50 35 25 40 39 10 50 45 24 35 17 40 55 45 50 48 30 40

Place the digits of the tens to the left of the vertical line and the digits of the ones of every data value to
the right of the vertical line.

b. Round every data value to the nearest dollar.

30 11 25 5 30 10 15 20 15 20 63 11 30 15 14 13 7 20 25 18 20 20 20 36

Place the digits of the tens to the left of the vertical line and the digits of the ones of every data value to
the right of the vertical line.

Note: "High" indicates the outlier that was not mentioned in the stemplot.
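
The construction can be sketched in Python using the rounded values from part (a). The layout printed here (stem, vertical bar, sorted leaves) is one common convention and may differ in detail from the display in the textbook.

```python
from collections import defaultdict

# Rounded values from part (a) above.
values = [30, 25, 54, 17, 55, 13, 50, 35, 25, 40, 39, 10,
          50, 45, 24, 35, 17, 40, 55, 45, 50, 48, 30, 40]

# Group the ones digits (leaves) by their tens digit (stem).
stems = defaultdict(list)
for v in sorted(values):
    stem, leaf = divmod(v, 10)
    stems[stem].append(leaf)

# Tens digit left of the vertical line, ones digits to the right.
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```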
27.

a. Place the digits of the tens to the left of the vertical line and the digits of the ones of every data value
to the right of the vertical line.

28.

a. Place the digits of the tens to the left of the vertical line and the digits of the ones of every data value
to the right of the vertical line.

29.
b. The row percentages are every value divided by the row total, multiplied by 100.

c. The column percentages are every value divided by the column total, multiplied by 100.
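
A minimal sketch of both calculations on a small crosstabulation. The counts are hypothetical; only the method follows the definitions above.

```python
# Hypothetical crosstabulation counts (rows x columns), for illustration only.
table = [
    [20, 30],
    [10, 40],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Row percentages: each value divided by its row total, multiplied by 100.
row_pct = [[100 * v / rt for v in row] for row, rt in zip(table, row_totals)]

# Column percentages: each value divided by its column total, multiplied by 100.
col_pct = [[100 * v / ct for v, ct in zip(row, col_totals)] for row in table]

print("row percentages:", row_pct)
print("column percentages:", col_pct)
```

Each row of the row percentages sums to 100, and each column of the column percentages sums to 100, which is a quick check on the arithmetic.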
30.
31.
c. The width of the bars has to be the same and the height has to be equal to the percentage.

There appears to be a relationship between household income and education level, because the blue and
red bars of the same categories do not have roughly the same height.

32.
c. The distributions are completely different. Under 25 have mostly college degrees or less, while Over
100 have mostly college degrees or more.

Household income and education level have a relationship, because the values in each column are
different.

33.

b. For the under 15 group, the males have the highest percentage of too fast.

c. For the 15 or more group, the females have the highest percentage of too fast.

d. Females prefer slower greens, while males prefer faster greens.

Males with lower handicaps prefer slower greens.

Females with higher handicaps prefer slower greens.


34.

a. The given classes will be given in each column, while the three fund types are given in each row.
b. The frequency is the number of values that fall within the category and in this case is the same as the
row totals.

c. The frequency is the number of values that fall within the category and in this case is the same as the
column totals.

d. The crosstabulation contains the frequency distributions in its row totals and column totals.

e. There appears to be a relationship between the fund type and the average return over the past five
years, because the frequency is not roughly the same in each row or in each column of the
crosstabulation.
35.

a. The given classes will be given in each column, while the three fund types are given in each row.
b. The frequency is the number of values that fall within the category and in this case is the same as the
row totals.

c. There appears to be a relationship between the fund type and the expense ratio, because the
frequency is not roughly the same in each row or in each column of the crosstabulation.

36.

a. 5-Year Average return is on the horizontal axis and Net asset value is on the vertical axis.

b. There appears to be a positive relationship between 5-year average return and net asset value, because
the scatterplot slopes upwards.

This means that higher 5-year average returns tend to be associated with higher net asset values.
37.
38.
39.

a. The frequency is the number of times the value occurs in the data set.

The percent frequency is the frequency divided by the total frequency, multiplied by 100.

b. The width of the bars has to be the same and the height has to be equal to the frequency.
40.
a. The frequency is the number of times the value occurs in the data set.

The percent frequency is the frequency divided by the total frequency, multiplied by 100.
b. The width of the bars has to be the same and the height has to be equal to the frequency.
41.

a. The frequency is the number of times the value occurs in the data set.

The percent frequency is the frequency divided by the total frequency, multiplied by 100.

b. The width of the bars has to be the same and the height has to be equal to the frequency.
42.
a. The frequency is the number of times the value occurs in the data set.

The percent frequency is the frequency divided by the total frequency, multiplied by 100.
The width of the bars has to be the same and the height has to be equal to the frequency.

43.
a. The frequency is the number of times the value occurs in the data set.

The percent frequency is the frequency divided by the total frequency, multiplied by 100.

The width of the bars has to be the same and the height has to be equal to the frequency.
HISTOGRAM

The width of the bars has to be the same and the height has to be equal to the winning margin.

e. The smallest winning margin is 1 point, in the 25th Super Bowl, which was held in
Florida.

The largest winning margin is 45 points, in the 24th Super Bowl, which was held in LA.
a. The frequency is the number of times the value occurs in the data set.

The percent frequency is the frequency divided by the total frequency, multiplied by 100.
The width of the bars has to be the same and the height has to be equal to the frequency.

b. The distribution is skewed to the right, because the highest bars are to the left.

c. Most of the states have a population between 0 and 10 million.

There is one state with a population of more than 35 million.

44.
a. Round every data value to the nearest hundred.

1700 12700 7700 1900 3400 1800 8600 2200 11700 7300

Place the digits of the thousands to the left of the vertical line and the digits of the hundreds of every
data value to the right of the vertical line.
45.
a. Place the digits of the tens to the left of the vertical line and the digits of the ones of every data value
to the right of the vertical line.
b. Place the digits of the tens to the left of the vertical line and the digits of the ones of every data value
to the right of the vertical line.

c. The main difference is that in general the high temperatures are higher than the low temperatures.

The spread and the shape of the distributions appear to be roughly the same.
d. The frequency is the number of times a value of the data set is in the corresponding interval.

46.
a. High is on the horizontal axis and Low is on the vertical axis.

b. There is a positive relationship between the variables, because the pattern in the scatter diagram
slopes upwards.

47.
a. The frequencies are given in the row totals.

The percent frequency is the frequency divided by the total frequency, rewritten as a percentage.
The result appears to support the claim, because most of the individuals supported the claim.

b. The frequencies are given in the column totals.

The percent frequency is the frequency divided by the total frequency, rewritten as a percentage.

c. Yes, most of the people appear to favor the claim in Europe, while the number of people who favor
and who oppose appear to be about the same in the US.
48.

c. The results of parts (a) and (b) are not consistent, which is due to the difference in the number of at
bats that each of them has in the junior and the senior year.
49.
a. Determine the row and column totals.

b. The frequency distribution for year constructed is given in the last column of the crosstabulation
table.
The frequency distribution for fuel type is given in the last row of the crosstabulation table.

c. The column percentages are the value divided by the column total, multiplied by 100.

d. The row percentages are the value divided by the row total, multiplied by 100.

e. As the construction year becomes more recent, the most common fuel types are electricity and natural gas.
50.
a. Determine the number of individuals that correspond with each combination of profit (interval) and
equity (interval).

b. The row percentages are the data value divided by the row total, multiplied by 100.

c. As the profit increases, the stockholders' equity appears to increase too.

51.
a. Determine the number of individuals that correspond with each combination of profit (interval) and
market value (interval).

b. The row percentages are the data value divided by the row total, multiplied by 100.
c. As the profit increases, the market value appears to increase too.

52.
a. Profit is on the horizontal axis and Stockholders' equity is on the vertical axis.

b. There is a positive relationship between the variables, because the pattern in the scatter diagram
slopes upwards.
53.
a. Market Value is on the horizontal axis and Stockholders' equity is on the vertical axis.

b. There is a positive relationship between the variables, because the pattern in the scatter diagram
slopes upwards.
Chpt 3.1

1.

2.
3.

4.
5.
6.

d. Moving the line back appears to have decreased the percentage of shots made slightly, and thus fewer
shots appear to have been made.

The difference, however, is not very large, so it is possible that moving the 3-point line caused no
change.

7.
8.

9.
10.
11.
12.
13.
14.

15.
16.

17.
18.

b. The variation appears to be higher for the Eastern U.S. cities, because their measures of variation are
higher.
19.
c. The mean air quality appears to be the same in Pomona and Anaheim, but the spread in air quality is
higher in Anaheim.
20.

21.

b. The mean cost is higher in the city and the costs also have a higher spread in the city.

22.
23.

b. Difference: variation is higher for the 2006 season.

Improvement: Minimum score has increased.


24.

25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
d. The distribution is skewed to the left, because the skewness is negative.

36.
37.
38.
39.

40.
41.
d. This problem would not have been identified, because the value is an outlier, which was not detected
in part (c).
42.
43.
44.
45.

b. There is a negative linear relationship between the variables, because the scatterplot slopes
downwards and the points roughly lie on a straight line.

c. The covariance indicates a negative linear relationship, because the value is negative.
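
To show how the sign of the covariance reflects the direction of the relationship, here is a small sketch with made-up data (not the data of the exercise). It uses the sample covariance with divisor n - 1 and also computes the sample correlation coefficient as the covariance divided by the product of the two sample standard deviations.

```python
from math import sqrt

# Hypothetical paired observations where y falls as x rises.
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: sum of products of deviations, divided by n - 1.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Sample standard deviations.
s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

# Sample correlation coefficient, always between -1 and 1.
r_xy = s_xy / (s_x * s_y)

print(f"covariance = {s_xy:.2f}, correlation = {r_xy:.3f}")
```

A negative covariance goes with a downward-sloping scatterplot; the correlation coefficient adds a unit-free measure of how close the points are to a straight line.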
46.

b. There is a positive linear relationship between the variables, because the scatterplot slopes upwards
and the points roughly lie on a straight line.

c. The covariance indicates a positive linear relationship, because the value is positive.
47.

b. There is a positive linear relationship between the variables, because the scatterplot slopes upwards
and the points roughly lie on a straight line.

c. The covariance indicates a positive linear relationship, because the value is positive.
48.
49.
a.

b. Jobless Rate is on the horizontal axis and Delinquent Loan is on the vertical axis.
50.
a. DJIA is on the horizontal axis and S&P 500 is on the vertical axis.

b.
51.
c.

52.
53.

54.

b. This student will be admitted, because the student has a grade point average of 2.5.

55.
56.
57.

b. We note that the mean price per share in 2006 was higher than in 2009.

The prices per share were more variable in 2009 compared to 2006, because the standard deviation in
2009 is higher than 18.14.

58.
e. The skewness measure is positive, which indicates that the data is positively skewed. This is to be
expected, because most charges will be small, but some will be much larger.
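
As an illustration of a positive skewness measure, the sketch below uses made-up charges with one large value. It assumes the adjusted sample skewness formula n/((n-1)(n-2)) times the sum of ((x - mean)/s)^3, which is one common definition; the source does not state which formula was used.

```python
from math import sqrt

# Hypothetical charges: mostly small, with one much larger value.
charges = [12, 15, 18, 20, 22, 25, 30, 95]

n = len(charges)
mean = sum(charges) / n
s = sqrt(sum((x - mean) ** 2 for x in charges) / (n - 1))  # sample standard deviation

# Adjusted sample skewness (assumed formula, see the note above).
skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in charges)

print(f"skewness = {skewness:.2f}")
```

The single large charge pulls the right tail out, so the measure comes out positive.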

59.

c. The age at the first marriage has increased over the past 25 years, because the median ages of 25 years
ago are now the first quartiles.

60.
61.
62.

d. Some advantages are saving time and preventing penalty costs.

63.

c. The automobile should be preferred, because the means are the same, but the automobile has a
lower standard deviation, which means that its times are more reliable.
64.
65.
66.

There appears to be a slightly negative relationship between rooms and cost per night, because the
pattern in the scatterplot slopes slightly downwards.

d.
67.
a. Fair value is on the horizontal axis and Share Price is on the vertical axis.

Determine the sample correlation coefficient:


b. Fair value is on the horizontal axis and Earnings per Share is on the vertical axis.
Determine the sample correlation coefficient:
68.
a.
b. The season appears to be slightly better if the spring training is better. However, this improvement
is small.

This is because spring training is practice and does not count towards the actual
standings.

69.
70.

Chpt 4

1.

2.

3.
4.
a.
5.

6.
7.

8.

b.

9.
10.
11.
12.

13.
14.

15.
16.
17.

18.

19.
20.
21.

22.
23.
24.

25.
26.
27.
28.
29.

30.
31.

32.
a. The probability is the number of favorable outcomes divided by the number of possible outcomes,
thus we need to divide the count in the table by the total count of 657.

Determine the row and column totals too.


f. Most of the sales are non-US cars, while the fewest sales are US cars.
33.
a. The probability is the number of favorable outcomes divided by the number of possible outcomes,
thus we need to divide the count in the table by the total count of 1929.

Determine the row and column totals too.


34.
a. The probability of on time is the product of the percentage of flights for the airline and the
percentages of on time flight for the airline.

b. Southwest is the most likely airline, because Southwest has the highest row total in the table.

c. The probability of arriving on time is the column total of "On Time" in the joint probability table:
0.7718

d. US Airways is the most likely to arrive late, because this airline has the highest probability in the
column "Late".

Southwest is the least likely to arrive late, because this airline has the lowest probability in the column
"Late".

35.
a. The probability is the number of favorable outcomes divided by the number of possible outcomes,
thus we need to divide the count in the table by the total count of 200.

Determine the row and column totals too.
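
A sketch of how a joint probability table with its row and column totals can be built from counts. The total of 200 comes from the answer above, but the individual cell counts and the row and column labels are hypothetical.

```python
# Hypothetical 2x2 counts that sum to the stated total of 200.
# The cell values and the labels are invented for illustration.
counts = {
    ("row1", "col1"): 50,
    ("row1", "col2"): 30,
    ("row2", "col1"): 70,
    ("row2", "col2"): 50,
}

total = sum(counts.values())

# Joint probability: count in the cell divided by the total count.
joint = {cell: c / total for cell, c in counts.items()}

# Marginal probabilities: the row and column totals of the joint table.
row_marginal = {}
col_marginal = {}
for (row, col), p in joint.items():
    row_marginal[row] = row_marginal.get(row, 0) + p
    col_marginal[col] = col_marginal.get(col, 0) + p

print("joint:", joint)
print("row totals:", row_marginal)
print("column totals:", col_marginal)
```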


36.
37.

d. Yes, you should restrict the amount of the monthly payments that can be made using the plastic card
and not allow them to make more payments until the amount of the previous month has been paid.
38.
a. P(B) = 280/400 = 0.7

b. P(S) = (80+40)/400 = 0.3

c. P(M|S) = 80/(80+40) = 0.667
P(W|S) = 40/(80+40) = 0.333

d. P(M∩S) = P(M|S)*P(S) = 0.667*0.3 ≈ 0.2

P(W∩S) = P(W|S)*P(S) = 0.333*0.3 ≈ 0.1

e. P(S|M) = 80/200 = 0.4

f. P(S|W) = 40/200 = 0.2
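
The same arithmetic can be checked with a few lines of Python. The counts are the ones used in the answers above; the variable names are only illustrative.

```python
# Counts taken from the answers above.
total = 400
n_B = 280                      # count for event B
n_M, n_W = 200, 200            # totals for M and W
n_S_and_M, n_S_and_W = 80, 40  # counts for S together with M, and S together with W

n_S = n_S_and_M + n_S_and_W

p_B = n_B / total                  # a. P(B)
p_S = n_S / total                  # b. P(S)
p_M_given_S = n_S_and_M / n_S      # c. P(M|S)
p_W_given_S = n_S_and_W / n_S      #    P(W|S)
p_M_and_S = p_M_given_S * p_S      # d. multiplication law
p_W_and_S = p_W_given_S * p_S
p_S_given_M = n_S_and_M / n_M      # e. P(S|M)
p_S_given_W = n_S_and_W / n_W      # f. P(S|W)

print(round(p_B, 3), round(p_S, 3), round(p_M_given_S, 3), round(p_W_given_S, 3))
print(round(p_M_and_S, 3), round(p_W_and_S, 3), round(p_S_given_M, 3), round(p_S_given_W, 3))
```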

39.
40.
41.

42.
43.

Let A be the event that the car is small,
B the event that the car is big,
C the event that an accident occurs,
D the event that the accident is fatal.
P(A) = 0.18
P(B) = 1 - 0.18 = 0.82
P(C|A) = 0.18
Because the size of the car is independent of the probability of an accident
=> A is independent of C, and B is independent of C
=> P(C|A) = P(C) = P(C|B)
=> In both cases, A (the small car) or B (the big car), the probability of an accident is the same: 0.18
For A, the small car: P(D|CA) = 0.128
For B, the big car: P(D|CB) = 0.05
=> P(A|D) = P(A∩D)/P(D)
= P(A∩D) / [P(A∩D) + P(B∩D)]
= [P(A)*P(C|A)*P(D|CA)] / [P(A)*P(C|A)*P(D|CA) + P(B)*P(C|B)*P(D|CB)]
= 0.18*0.18*0.128 / [0.18*0.18*0.128 + 0.82*0.18*0.05]
= 0.00415 / [0.00415 + 0.00738]
= 0.00415 / 0.01153
= 0.36
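
The calculation above is an application of Bayes' theorem and can be reproduced as follows. The probabilities are the ones stated in the solution; the variable names are illustrative.

```python
# Probabilities as stated in the solution above.
p_A = 0.18             # small car
p_B = 1 - p_A          # big car
p_C = 0.18             # accident, same for both sizes (independence assumption)
p_D_given_CA = 0.128   # fatal, given an accident in a small car
p_D_given_CB = 0.05    # fatal, given an accident in a big car

# Joint probabilities of car size and a fatal accident.
p_A_and_D = p_A * p_C * p_D_given_CA
p_B_and_D = p_B * p_C * p_D_given_CB

# Bayes' theorem: P(A|D) = P(A and D) / P(D).
p_A_given_D = p_A_and_D / (p_A_and_D + p_B_and_D)

print(round(p_A_given_D, 2))
```

Because the accident probability is the same for both car sizes, it cancels from the numerator and denominator, so the result depends only on the shares of each size and the two fatality rates.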

44.

45.
46.

47.
48.
49.

50.
51.
a. The probability is the number of favorable outcomes divided by the number of possible outcomes,
which then means that the probability is the count in the given table divided by the total of 73,736.
52.
a. The probability is the number of favorable outcomes divided by the number of possible outcomes,
which then means that the probability is the count in the given table divided by the total of 2018.

53.
Result of the previous exercise:
54.
55.
56.

57.

58.
a.

59.

60.
Chpt 5

1.

2.
3.

4.
5.

6.
7.

8.
9.
a. The outcome is the age. The relative frequency is the number of children of that age divided by the total number
of children.
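
A minimal sketch of turning counts into a discrete probability distribution and checking that it is valid (every probability at least 0, and all probabilities summing to 1). The ages and counts are hypothetical, not the data of the exercise.

```python
# Hypothetical number of children at each age, for illustration only.
children_by_age = {6: 30, 7: 45, 8: 60, 9: 65}

total = sum(children_by_age.values())

# Relative frequency used as the probability of each outcome.
f = {age: count / total for age, count in children_by_age.items()}

# Validity check for a discrete probability distribution.
is_valid = all(p >= 0 for p in f.values()) and abs(sum(f.values()) - 1) < 1e-9

for age, p in f.items():
    print(f"f({age}) = {p:.3f}")
print("valid distribution:", is_valid)
```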
10.
11.

a. The outcome is the number of hours of the service call. The probability has to be the same for each
category (and the probabilities have to sum to 1).
12.

13.
14.
