Data Analysis Project
Misleading Statistics
Companies and media often manipulate and skew statistics to their
advantage in order to deliver their messages more convincingly.
Because statistics can be presented in many ways, it is quite easy
to present information in a misleading way.
If the statistics involve a survey, a proper method of selecting the
sample from the population is important (Simple Random Sampling,
Cluster Sampling, Stratified Random Sampling, etc.).
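The three sampling methods named above can be sketched in a few lines of Python. This is only an illustrative sketch: the school of 60 students, the grades, the class names, and the function names are all hypothetical and do not come from any survey in this presentation.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Pick n members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Sample the same fraction from every stratum (here, every grade)."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, k))
    return chosen

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters (here, classes) at random and survey everyone in them."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [member for name in picked for member in clusters[name]]

# Hypothetical school: 20 students in each of grades 10, 11 and 12
students = [(grade, i) for grade in (10, 11, 12) for i in range(20)]

srs = simple_random_sample(students, 12)
strat = stratified_sample(students, lambda s: s[0], 0.2)

# Hypothetical classes of 10 students each
classes = {f"class_{c}": students[c * 10:(c + 1) * 10] for c in range(6)}
clus = cluster_sample(classes, 2)
```

The stratified sample always contains the same number of students from each grade, while the simple random sample may over- or under-represent a grade by chance.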
We must all be aware of misleading graphs and the forms in which
they are presented to us, and be able to analyze and fix such
graphs so that they no longer mislead, to the benefit of our
society.
Two different bar graphs are made from
the same survey of favorite foods:
The same information can be accurately
presented in a non-misleading way:
[Bar graphs: categories Pizza, Hot Dogs, Hamburgers]
[Pie graph: Favorite Foods. Hamburgers 33%, Pizza 33%, Hot Dogs 34%]
If we take the same information and present it in a pie graph, we
can see a more accurate picture of the survey results.
Unlike the previous graph, which depicted hot dogs as the clear
favorite food by misrepresenting the origin on the y-axis, this pie
graph shows that all three foods are almost equally preferred, a
more realistic result.
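A small calculation shows how much a shifted origin can exaggerate a difference. The sketch below uses the 34% and 33% figures from the pie graph. The axis starting point of 32% is an assumption made only for illustration, since the original bar graph is not reproduced here.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the y-axis starts at axis_start instead of zero."""
    return (a - axis_start) / (b - axis_start)

hot_dogs, pizza = 34.0, 33.0

# Honest graph: y-axis starts at 0
honest = apparent_ratio(hot_dogs, pizza)

# Misleading graph: hypothetical y-axis starting at 32%
truncated = apparent_ratio(hot_dogs, pizza, axis_start=32.0)

print(f"Bar height ratio with origin at 0:  {honest:.2f}")
print(f"Bar height ratio with origin at 32: {truncated:.2f}")
```

With the origin at zero the two bars are nearly the same height. With the assumed origin of 32%, the hot dog bar is drawn twice as tall as the pizza bar even though the difference is one percentage point.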
Comparative Causes of Annual Deaths in
the United States Provided by CDC
A quick glance at this graph leads us to conclude that smoking
is the leading cause of death among Americans. However, an
in-depth analysis shows that the graph is greatly misleading.
Certain crucial information is missing. There is no way for us to
know whether or not the CDC has counted smokers who died from
diseases or accidents. There is a good chance that any smoker who
died from a disease has been counted among those who died from
smoking.
Here is a question to
ask the CDC:
A person who smokes has died from heart disease. What was
his cause of death?
Ways to fix misleading graphs  I
One way to fix a misleading graph is to present it in a different
way, as we did for the previous information (favorite foods).
However, it seems there is no significant change in the information
even after we make a pie graph from the initial bar graph. We now
have to question the method the CDC used to construct the graph.
Ways to fix misleading graphs  II
The way the CDC collected its data is very doubtful. The
graph does not provide any information about how many deaths
were actually caused by smoking. If smoking is not the primary
cause of a death, then it should not be recorded as the cause of
death.

The CDC may say that 418,000 of the people who died
were smokers, but it cannot say that they died
because of smoking.
The graph presented on the next slide is a more accurate
graph regarding the causes of annual deaths in the United
States:
Revised Graph Percentage of
Smokers in Each Cause of Annual
Death in the United States
Analysis of the Revised Graph
From the revised graph, we can tell that a certain percentage of
the people who died from each cause were smokers. This graph does
not imply that smoking is the leading cause of death in the United
States. It does, however, imply that smoking contributes to deaths
in the United States.
For instance, we can assume that smoking is closely related to cardiovascular
diseases such as heart disease, and to cancer, because chemicals in a
cigarette, such as tar, can block blood vessels, ultimately causing heart
disease. We can also assume that smoking is closely related to deaths
caused by drug use.
Most statistics do not establish a fact on their own. It is important to conduct
the survey and construct the presentation in the most accurate, reliable, and
non-misleading way possible, and the conclusions drawn from the presentation
must not be general, but precise and specific.
Price Per Barrel of Light Crude Oil
Leaving Saudi Arabia on Jan. 1
The pictograph on the left shows the increase in the price of
crude oil leaving Saudi Arabia. The sizes of the barrels are not
in proportion to the actual prices. The differences in barrel size
exaggerate the increases and decreases in the price of a barrel of
crude oil. It is, moreover, hard for readers to compare prices
visually from year to year. Therefore, this pictograph is
potentially misleading.
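One common way a pictograph exaggerates is by scaling both the height and the width of a picture by the same factor, so the drawn area grows with the square of the value. The sketch below assumes that kind of scaling, which may or may not be how this particular pictograph was drawn. It uses the 1973 and 1979 prices from this slide.

```python
def pictograph_exaggeration(old_value, new_value):
    """Compare the true ratio of two values with the visual ratio when a
    picture is scaled in both height and width (assumed drawing method)."""
    linear = new_value / old_value   # what a bar graph would show
    area = linear ** 2               # what a picture scaled in 2D shows
    volume = linear ** 3             # what a barrel read as a 3D object suggests
    return linear, area, volume

linear, area, volume = pictograph_exaggeration(2.41, 13.34)

print(f"True price ratio:     {linear:.1f}x")
print(f"Apparent area ratio:  {area:.1f}x")
print(f"Apparent volume ratio: {volume:.1f}x")
```

Under this assumption, a price that rose about 5.5 times would be drawn as a barrel with roughly 30 times the area, which is why a bar graph with bars of equal width is the safer choice.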
Year    Price     Increase
1973    $2.41
1974    $10.95    354.36%
1975    $10.46    -4.47%
1976    $11.51    10.04%
1977    $12.09    5.04%
1978    $12.70    5.05%
1979    $13.34    5.04%
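The Increase column can be checked with a short calculation: each entry is the change from the previous year's price, divided by the previous year's price. The sketch below reads the row after 1974 as 1975, which is an assumption based on the sequence of years on the graph axis.

```python
# Prices per barrel from the table above (the 1975 label is assumed)
prices = {
    1973: 2.41, 1974: 10.95, 1975: 10.46, 1976: 11.51,
    1977: 12.09, 1978: 12.70, 1979: 13.34,
}

def percent_changes(series):
    """Year-over-year percentage change: (new - old) / old * 100."""
    years = sorted(series)
    return {
        year: (series[year] - series[prev]) / series[prev] * 100
        for prev, year in zip(years, years[1:])
    }

changes = percent_changes(prices)
for year, pct in changes.items():
    print(f"{year}: {pct:+.2f}%")
```

A negative result marks a decrease, so the change from $10.95 to $10.46 comes out as a drop of about 4.47%, not an increase.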
Revised Graph
Instead of using barrels of different sizes to describe the
increase in prices, a properly constructed bar graph would
present the information more accurately.
[Bar graph: Price Per Barrel of Light Crude Oil Leaving Saudi Arabia on Jan. 1. x-axis: Years, 1973 to 1979; y-axis: Price Per Barrel, $0.00 to $16.00]
Revised Graph  II
[Line graph: Price Per Barrel of Light Crude Oil Leaving Saudi Arabia on Jan. 1. x-axis: Years, 1973 to 1979; y-axis: Prices Per Barrel, $0.00 to $16.00]
Another adequate way of fixing the graph is a line graph, which
effectively shows the increase in oil prices over time.
Chevy Advertisement
This misleading graph serves a purpose: to suggest that Chevy is
the most preferred car, and thus possibly to persuade others to
purchase a Chevy. However, if we look at the graph closely, we can
see that the y-axis does not start at zero. Viewers may take this
misleading graph at face value and come away with false and
inaccurate information.
In order to fix this misleading graph, we would need the precise
and accurate data with which to construct a properly designed
graph. The y-axis of the graph must also begin at 0 in order to
display an accurate comparison. Without sufficient information, we
can only be aware that graphs such as this are misleading because
they obscure the origin on the y-axis.
What makes some statistical
information accurate and reliable?
Statistics is a set of methods used to collect and analyze data.
Because it helps people make good decisions in uncertain situations,
many people tend to believe any statistic that is presented to them
by a company. However, as you have seen in our presentation,
statistics are very easy to manipulate; without adequate
understanding and analysis of the statistical information, it is easy
for us to take misleading statistics seriously. Accurate and reliable
statistics come from following a proper procedure: defining the
problem, collecting the data, analyzing the data, and reporting the
results.
These four steps must be carried out rationally and as accurately as possible
in order to prevent the statistics from becoming misleading.
We will explain appropriate ways to carry out the four steps and,
ultimately, to produce accurate and reliable statistics.
Defining the Problem
Every word in a statistical problem must be defined
specifically and accurately. For example, if the problem
were to count the number of inhabitants of Kerrisdale,
Vancouver, on a specific date, we would have to define
"inhabitants" to know whom to count in the survey. Kerrisdale
must also be defined specifically in order to decide where to
stop the survey. Factors such as newborn babies in the
hospital must be taken into consideration. If any one of these
pieces of information is not clearly defined, it would be
extremely difficult to begin gathering data.
Collecting the Data
Each kind of problem needs different information, and
therefore a different method of collecting the data.
One of the most important parts of establishing a
statistic is designing an effective way of collecting data.
We collect data from a population or from a sample.
When the survey is carried out on a sample selected from
the population, the sample must be able to provide exactly
the information required for the purpose of the survey. The
most exacting and informative form of data collection for
comparisons is the randomized controlled experiment. The
subjects are divided into groups, and the members of each
group are selected randomly.
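Random assignment into groups can be sketched as shuffling the list of subjects and then dealing them out in turn. The participant names, the two group labels, and the function name below are hypothetical and serve only as an illustration.

```python
import random

def randomly_assign(subjects, n_groups=2, seed=0):
    """Shuffle the subjects, then deal them out into n_groups groups
    so that chance alone decides who ends up in which group."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical list of 20 participants
participants = [f"person_{i}" for i in range(20)]
treatment, control = randomly_assign(participants)

print("Treatment group:", treatment)
print("Control group:  ", control)
```

Because chance decides the groups, any difference seen between them afterwards is less likely to come from how the groups were chosen.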
Analyzing the Data
~ Exploratory Methods ~
These methods often involve calculating averages and percentages and displaying
the information on a graph. Although exploratory methods may provide many pieces
of information, they may not answer specific questions or support definite
statements about a problem.
~ Confirmatory Methods ~
These methods are used to draw conclusions from the survey results and the
statistical information by answering specific questions. For example, using a
confirmatory method, a statistician can say that the price of oil leaving Saudi
Arabia has been increasing and is likely to continue to increase.
Neither of these methods should be overlooked. Both should be used
extensively to analyze the results of a statistical activity, so that the
conclusions reached are specific, credible, and accurate.
Reporting the Results
Inference is used to draw conclusions from a statistical activity;
even from a small collection of observations or
experimental results, careful and rational inference can
create an accurate and reliable generalization that can be
used for the benefit of society.
There are many forms of presentations, and they include bar
graphs, pie graphs, tables, or a set of percentages.
However, when drawing conclusions, one must take into
consideration the fact that the survey was carried out on a
specifically selected sample, not the entire population.
Therefore, using probability, the conclusions must state the
uncertainty that comes from anything the sample may have
excluded or misrepresented.
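One standard way to report this uncertainty is a confidence interval around a sample percentage. The sketch below uses the normal approximation for a proportion and the 34% hot dog figure from the favorite foods survey. The sample size of 100 is an assumption, since the presentation does not say how many people were surveyed.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for a sample proportion,
    using the normal approximation: p_hat +/- z * sqrt(p_hat(1 - p_hat) / n)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 34% chose hot dogs; a sample size of 100 is assumed for illustration
low, high = proportion_interval(0.34, 100)

print(f"Hot dogs: 34%, plausible range {low:.1%} to {high:.1%}")
```

Under this assumed sample size, the plausible range runs from roughly 25% to 43%, which easily includes the 33% recorded for the other two foods. The survey therefore could not support a claim that hot dogs are the favorite.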
How We Would Conduct
Statistical Activities
Everything that has been presented to
you in the previous slides must be
considered when carrying out a
statistical activity. When everything is
carefully done, the statistics will be
truly accurate and reliable.
Thank you for viewing our presentation!
