Statistics As Science
Introduction
We use statistics when the number of cases that can occur is so large that it would
be impractical or impossible to examine every one. Statistics allow us to estimate
results from a sample and to predict the probability of outcomes occurring in the
future. For a scientist to judge objectively whether or not a particular hypothesis is
supported by a set of collected data, an objective method for either accepting or
rejecting that hypothesis must be used. This is why, when collating results, it is
essential to include statistics that ascertain the degree of accuracy of the
hypothesis.
Statistical methods are used in all areas of science. The module explains how
common words like "significant," "control," and "random" have a different meaning
in the field of statistics than in everyday life.
Course Module
Statistics as Science
Modern science is often based on statements of statistical
significance and probability.
For example:
1) studies have shown that the probability of developing lung cancer is almost 20
times greater in cigarette smokers compared to nonsmokers (ACS, 2004);
2) there is a significant likelihood of a catastrophic meteorite impact on Earth
sometime in the next 200,000 years (Bland, 2005); and
3) first-born male children exhibit IQ test scores that are 2.82 points higher than
second-born males, a difference that is significant at the 95% confidence
level (Kristensen & Bjerkedal, 2007).
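To make a phrase like "significant at the 95% confidence level" concrete, here is a minimal Python sketch that computes a 95% confidence interval for the difference between two group means. The 2.82-point difference is from the example above, but the standard deviation and sample sizes are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical summary statistics, loosely echoing the birth-order IQ
# example (the 2.82-point difference is from the text; the SD and
# sample sizes are invented for illustration).
mean_first, mean_second = 102.82, 100.0
sd = 15.0          # assumed common standard deviation of IQ scores
n1 = n2 = 1000     # hypothetical sample sizes

diff = mean_first - mean_second
# Standard error of the difference between two independent means
se = math.sqrt(sd**2 / n1 + sd**2 / n2)
# 95% confidence interval using the normal approximation (z = 1.96)
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

# The difference is "significant at the 95% confidence level"
# when the interval excludes zero.
significant = ci_low > 0 or ci_high < 0
print(f"difference = {diff:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print("significant at 95%:", significant)
```

With these (invented) numbers the interval excludes zero, which is exactly what the claim of significance means: the observed difference is unlikely to be due to sampling variability alone.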
But why do scientists speak in terms that seem obscure? If cigarette smoking causes
lung cancer, why not simply say so? If we should immediately establish a colony on
the moon to escape extraterrestrial disaster, why not inform people? And if older
children are smarter than their younger siblings, why not let them know?
The reason is that none of these latter statements accurately reflects the data.
Scientific data rarely lead to absolute conclusions. Not all smokers die from lung
cancer – some smokers decide to quit, thus reducing their risk; some smokers may
die prematurely from cardiovascular disease or other conditions besides lung cancer;
and some smokers may simply never contract the disease. All data exhibit variability, and it is
the role of statistics to quantify this variability and allow scientists to make more
accurate statements about their data.
All measurements contain some uncertainty and error, and statistical methods help
us quantify and characterize this uncertainty. This helps explain why scientists often
speak in qualified statements. For example, no seismologist who
studies earthquakes would be willing to tell you exactly when an earthquake is
going to occur; instead, the US Geological Survey issues statements like this: "There
is ... a 62% probability of at least one magnitude 6.7 or greater earthquake in the
3-decade interval 2003-2032 within the San Francisco Bay Region" (USGS, 2007). This
may sound ambiguous, but it is in fact a very precise, mathematically-derived
description of how confident seismologists are that a major earthquake will occur,
and open reporting of error and uncertainty is a hallmark of quality
scientific research.
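As a rough illustration of how such a figure can be manipulated, the sketch below backs out an implied single-year probability from the 30-year number, under the simplifying assumption that each year is independent with the same probability. This is purely an illustrative model, not how the USGS actually derives its forecasts.

```python
# Back out an implied annual probability from the 30-year USGS figure,
# under the simplifying (purely illustrative) assumption that each year
# is independent and carries the same probability p.
p_30yr = 0.62                       # 62% probability over 2003-2032
p_none_30yr = 1 - p_30yr            # chance of NO such quake in 30 years
p_annual = 1 - p_none_30yr ** (1 / 30)
print(f"implied single-year probability: {p_annual:.3f}")
```

Under this assumption the implied annual probability is a little over 3%, which shows how a precise multi-decade statement can be unpacked into smaller time scales.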
Today, science and statistical analyses have become so intertwined that many
scientific disciplines have developed their own subsets of statistical techniques and
terminology. For example, the field of biostatistics (sometimes referred to as
biometry) involves the application of specific statistical techniques to disciplines in
biology such as population genetics, epidemiology, and public health. The field of
geostatistics has evolved to develop specialized spatial analysis techniques that help
geologists map the location of petroleum and mineral deposits; these spatial
analysis techniques have also helped Starbucks® determine the ideal distribution of
coffee shops based on maximizing the number of customers visiting each store. Used
correctly, statistical analysis goes well beyond finding the next oil field or cup of
coffee to illuminating scientific data in a way that helps validate scientific
knowledge.
According to Shari Messinger, statistical science is much more than data analysis;
it involves the incorporation of statistical methodology at all stages of research
and requires scientific expertise in the field of statistics. Appropriate use of
statistical methodology in data analysis means the data should be analyzed in a way
that is both scientifically and statistically sound. Statisticians are themselves
scientists collaborating in research, using their statistical expertise to determine
and apply the appropriate methodology for rigorously addressing important research
questions.
The time invested for a particular data analysis can take hours, or it can take
months. This depends on the research questions, the study design, the properties of
data gathered, and the target audience that will need to understand the results.
In statistics, variables are central to any analysis, and they need to be well
understood by the researcher. Even though the concept looks deceptively simple,
even experienced researchers can go wrong by choosing the wrong variables.
Unlike mathematical constants such as pi or e, variables can vary. In statistics, a
variable holds a value or description of what is being studied in the sample or
population.
For example, if a researcher aims to find the average height of a tribe in Colombia,
the variable would simply be the height of each person in the sample. This is a
simple measure for a simple statistical study. However, most statistical analyses
are not as straightforward.
In many cases, statistical variables do not contain numerical values but rather
something descriptive, such as the color of the fins of a fish or the kinds of
species in a given natural habitat.
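The distinction between numerical and descriptive variables can be sketched in Python. The sample records below are hypothetical, combining the height and fin-color examples above into one small dataset.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample records combining the two kinds of variables
# discussed above: a quantitative one (height) and a categorical one
# (fin color).
sample = [
    {"height_cm": 158.0, "fin_color": "red"},
    {"height_cm": 162.5, "fin_color": "blue"},
    {"height_cm": 149.8, "fin_color": "red"},
    {"height_cm": 171.2, "fin_color": "orange"},
]

# A quantitative variable supports arithmetic summaries such as a mean...
avg_height = mean(r["height_cm"] for r in sample)
# ...while a categorical variable is summarized by counting categories.
color_counts = Counter(r["fin_color"] for r in sample)

print(f"average height: {avg_height:.1f} cm")
print("fin color counts:", dict(color_counts))
```

The point of the sketch is that the two kinds of variables call for different summaries: means and standard deviations make sense for heights, but only counts and proportions make sense for fin colors.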
According to James Milford, statistics play a vital role in virtually all branches of
science. Statistical methods are commonly used for analyzing experimental results,
testing their significance, and displaying the results accordingly.
It could be argued that without the use and advancement of statistics and
statistical research, the empirical observations of today's scientists and inventors
would be far less accurate. We all benefit from the developments and improvements
that science has made to our day-to-day lives, most of the time taking for granted
what has been achieved with the aid of statistics.
1. Research
There are various research methods that use statistics to record their results,
especially when it comes to experiments in the field of science. Once a number of
experiments have been carried out, each will yield a result. Scientists can then
accumulate the statistics from each test to identify whether there are any
differences or changes over time. Statistics are important because they are a
simple and clear way of representing a result.
2. Significance
Statistics are a single number or a collection of numbers that show the
importance of a certain change or development. In technology this is important
because statistics can show the current trends in various technologies and their
development. Statistics are also good market research for companies that are
creating devices, because figures from areas around the country can help identify
the best device to move forward with.
3. Predictions
The value of statistics is that they can serve as predictions as well as
probabilities of certain trends. This is especially the case in scientific
experiments and studies, because much scientific research proceeds by trial and
error to find which reactions work best. If professionals are working on substances
for medical use, statistics can identify what works best.
Variables
“Variable” is a term frequently used in research projects. It is pertinent to define and
identify the variables while designing quantitative research projects. A variable
incites more interest in research than a constant does. It is therefore critical for
beginners in research to have clarity about this term and the related concepts.
A variable, to put it in layman's terms, is something that can change or can have
more than one value: “a variable, as the name implies, is something that varies.”
It may be weight, height, anxiety level, income, body temperature, and so on. Each
of these properties varies from one person to another and also takes different values
along a continuum. Variables can be demographic, physical, or social, and include
religion, income, occupation, temperature, humidity, language, food, fashion, etc.
Some variables are quite concrete and clear, such as gender, birth order, or blood
group, while others are considerably more abstract and vague.
It is pertinent for a researcher to know how certain variables within a study are
related to each other. It is thus important to define the variables to facilitate
accurate explanation of the relationships between them. There is no limit to the
number of variables that can be measured, although the more variables there are, the
more complex the study and the statistical analysis become. Moreover, the longer the
list of variables, the longer the time required for data collection. In mathematics,
symbols such as "x," "y," or "b" represent variables in an equation, while "pi" is a constant.
In an experimental example, if a study is investigating the differences between
males and females, gender would be a variable (some subjects in the study would be
men, and others would be women). If a study has only female subjects, gender
would not be a variable, since there would be only women. If a study includes both
males and females as subjects, but is not interested in differences between men and
women - and does not compare them, gender would not be a variable in that study.
If a study compares three different diets, but keeps all 3 diets the same in the
amount of sodium, then sodium isn't a variable in that study - it's a constant. Other
features of the diets would be variables of interest - maybe the calories or
carbohydrates or fat content.
The independent variable is the variable whose change isn’t affected by any other
variable in the experiment. Either the scientist has to change the independent
variable herself or it changes on its own; nothing else in the experiment affects or
changes it. Two examples of common independent variables are age and time.
There’s nothing you or anything else can do to speed up or slow down time or
increase or decrease age. They’re independent of everything else.
The dependent variable is what is being studied and measured in the experiment.
It’s what changes as a result of the changes to the independent variable. An example
of a dependent variable is how tall you are at different ages. The dependent variable
(height) depends on the independent variable (age).
The dependent variable measures the effect of the independent variable(s) on the
test units. We can also say that dependent variables are completely dependent on
the independent variable(s). Another name for the dependent variable is the
predicted variable, because its values are predicted by the predictor
(independent) variables. For example, a
student’s score could be a dependent variable because it could change depending on
several factors, such as how much he studied, how much sleep he got the night
before he took the test, or even how hungry he was when he took it. Usually when
one is looking for a relationship between two things, one is trying to find out what
makes the dependent variable change the way it does.
An easy way to think of independent and dependent variables is, when you’re
conducting an experiment, the independent variable is what you change, and the
dependent variable is what changes because of that. You can also think of the
independent variable as the cause and the dependent variable as the effect.
It can be a lot easier to understand the differences between these two variables with
examples, so let’s look at some sample experiments below.
Below are overviews of three experiments, each with their independent and
dependent variables identified.
Experiment 1: You want to figure out which brand of microwave popcorn pops the
most kernels so you can get the most value for your money. You test different
brands of popcorn to see which bag pops the most popcorn kernels.
Experiment 2: You want to see which type of fertilizer helps plants grow fastest, so
you add a different brand of fertilizer to each plant and see how tall they grow.
Quantitative Method
9
Statistics As Science
Experiment 3: You’re interested in how rising sea temperatures impact algae life, so
you design an experiment that measures the number of algae in a sample of water
taken from a specific ocean site under varying temperatures.
For each of the independent variables above, it’s clear that they can’t be changed by
other variables in the experiment. You have to be the one to change the popcorn and
fertilizer brands in Experiments 1 and 2, and the ocean temperature in Experiment
3 cannot be significantly changed by other factors. Changes to each of these
independent variables cause the dependent variables to change in the experiments.
A few more examples can highlight the importance and usage of dependent and
independent variables in a broader sense.
If one wants to estimate the cost of living of an individual, then factors such as
salary, age, and marital status are independent variables, while the cost of living
is highly dependent on those factors and is therefore designated as the dependent
variable.
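The cost-of-living example can be sketched as a simple linear prediction rule in Python. Every coefficient and input below is hypothetical, invented purely to illustrate how changing the independent variables changes the dependent one.

```python
# A sketch of predicting a dependent variable (monthly cost of living)
# from independent variables (salary and age) with a simple linear rule.
# All coefficients and inputs are hypothetical, for illustration only.

def predicted_cost_of_living(salary: float, age: int) -> float:
    base = 500.0                       # hypothetical baseline cost
    return base + 0.25 * salary + 5.0 * age

# Changing an independent variable changes the prediction; the
# dependent variable is whatever value comes out.
print(predicted_cost_of_living(salary=3000.0, age=25))  # -> 1375.0
print(predicted_cost_of_living(salary=4000.0, age=25))  # -> 1625.0
```

Raising the salary input raises the predicted cost of living, which is exactly the dependence the text describes: the inputs are the predictors, and the output is the predicted (dependent) variable.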
Independent and dependent variables always go in the same places on a graph. This
makes it easy to quickly see which variable is independent and which is dependent
when looking at a graph or chart. The independent variable always goes on the
x-axis, or horizontal axis. The dependent variable goes on the y-axis, or vertical
axis.
Here’s an example:
As you can see, this is a graph showing how the number of hours a student studies
affects the score she got on an exam. From the graph, it looks like studying up to six
hours helped her raise her score, but as she studied more than that her score
dropped slightly.
The amount of time studied is the independent variable, because it’s what she
changed, so it’s on the x-axis. The score she got on the exam is the dependent
variable, because it’s what changed as a result of the independent variable, and it’s
on the y-axis. It’s common to put the units in parentheses next to the axis titles,
which this graph does.
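Assuming matplotlib is available, a graph like the one described can be sketched as follows. The data points are hypothetical, invented to mirror the pattern described: the score rises with study time up to six hours, then dips slightly.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical data mirroring the described pattern: the exam score
# rises with study time up to six hours, then dips slightly.
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]       # independent -> x-axis
exam_score = [52, 60, 68, 75, 82, 90, 88, 86]  # dependent   -> y-axis

fig, ax = plt.subplots()
ax.plot(hours_studied, exam_score, marker="o")
ax.set_xlabel("Hours studied (hours)")  # units in parentheses, as noted
ax.set_ylabel("Exam score (points)")
fig.savefig("study_vs_score.png")
```

Note how the axis assignment follows the convention from the text: the variable the student controls (hours studied) goes on the x-axis, and the variable that responds (exam score) goes on the y-axis, each labeled with its units in parentheses.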