Quantitative Method

Module 002 Statistics as Science

At the end of this module you are expected to:

1. Explain statistics as a science;
2. Understand the concepts of statistics; and
3. Differentiate between dependent and independent variables.

Introduction
We use statistics when the number of possible cases is so large that it would be
impractical or impossible to examine every one, so we use statistical methods to
estimate results and perhaps predict the probability of outcomes occurring in the
future. In order for a scientist to make an objective judgment as to whether or not a
particular hypothesis is supported by a set of collected data, an objective method for
either accepting or rejecting that hypothesis must be used. This is why, when
collating results, it is essential to use statistics to ascertain the degree of accuracy of
the hypothesis.

All measurements contain some uncertainty and error, and statistical methods help
us quantify and characterize this uncertainty. This helps explain why scientists often
speak in qualified statements. For example, no seismologist who studies
earthquakes would be willing to tell you exactly when an earthquake is going to
occur; instead, the US Geological Survey issues statements like this: "There is ... a
62% probability of at least one magnitude 6.7 or greater earthquake in the 3-decade
interval 2003-2032 within the San Francisco Bay Region" (USGS, 2007). This may
sound ambiguous, but it is in fact a very precise, mathematically derived description
of how confident seismologists are that a major earthquake will occur, and open
reporting of error and uncertainty is a hallmark of quality scientific research.

Scientific research rarely leads to absolute certainty. There is some degree of
uncertainty in all conclusions, and statistics allow us to discuss that uncertainty.

Statistical methods are used in all areas of science. The module explores the
difference between (a) proving that something is true and (b) measuring the
probability of getting a certain result. It explains how common words like
"significant," "control," and "random" have a different meaning in the field of
statistics than in everyday life.

Course Module
Statistics as Science
Modern science is often based on statements of statistical
significance and probability.

For example:
1) studies have shown that the probability of developing lung cancer is almost 20
times greater in cigarette smokers compared to nonsmokers (ACS, 2004);
2) there is a significant likelihood of a catastrophic meteorite impact on Earth
sometime in the next 200,000 years (Bland, 2005); and
3) first-born male children exhibit IQ test scores that are 2.82 points higher than
second-born males, a difference that is significant at the 95% confidence
level (Kristensen & Bjerkedal, 2007).

But why do scientists speak in terms that seem obscure? If cigarette smoking causes
lung cancer, why not simply say so? If we should immediately establish a colony on
the moon to escape extraterrestrial disaster, why not inform people? And if older
children are smarter than their younger siblings, why not let them know?

The reason is that none of these latter statements accurately reflects the data.
Scientific data rarely lead to absolute conclusions. Not all smokers die from lung
cancer – some smokers decide to quit, thus reducing their risk, some smokers may
die prematurely from cardiovascular disease or diseases other than lung cancer, and some
smokers may simply never contract the disease. All data exhibit variability, and it is
the role of statistics to quantify this variability and allow scientists to make more
accurate statements about their data.

A common misconception is that statistics provide a measure of proof that
something is true, but they actually do no such thing. Instead, statistics provide a
measure of the probability of observing a certain result. This is a critical distinction.
For example, the American Cancer Society has conducted several massive studies of
cancer in an effort to make statements about the risks of the disease in US citizens.
Cancer Prevention Study I enrolled approximately 1 million people between 1959
and 1960, and Cancer Prevention Study II was even larger, enrolling 1.2 million
people in 1982. Both of these studies found much higher rates of lung cancer among
cigarette smokers compared to nonsmokers; however, not all individuals who
smoked contracted lung cancer (and, in fact, some nonsmokers did contract lung
cancer). Thus, the development of lung cancer is a probability-based event, not a
simple cause-and-effect relationship.

Statistical techniques allow scientists to put numbers to this probability, moving
from a statement like "If you smoke cigarettes, you are more likely to develop lung
cancer" to the one that started this module: "The probability of developing lung
cancer is almost 20 times greater in cigarette smokers compared to nonsmokers."
The quantification of probability offered by statistics is a powerful tool used widely
throughout science, but it is frequently misunderstood.
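The "20 times greater" figure is a relative risk: the probability of disease in the exposed group divided by the probability in the unexposed group. A minimal sketch in Python (the cohort counts below are hypothetical, invented only to illustrate the arithmetic; they are not the actual ACS data):

```python
# Hypothetical cohort counts (NOT the actual ACS study data),
# used only to illustrate how a relative risk is computed.
smokers_with_cancer = 200
smokers_total = 100_000
nonsmokers_with_cancer = 10
nonsmokers_total = 100_000

# Risk = probability of developing the disease within each group.
risk_smokers = smokers_with_cancer / smokers_total
risk_nonsmokers = nonsmokers_with_cancer / nonsmokers_total

# Relative risk = how many times more likely the exposed group is
# to develop the disease than the unexposed group.
relative_risk = risk_smokers / risk_nonsmokers
print(round(relative_risk, 2))  # 20.0
```

Any pair of group sizes and case counts can be substituted; it is the ratio of the two risks, not either risk alone, that carries the "times greater" meaning.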

Today, science and statistical analyses have become so intertwined that many
scientific disciplines have developed their own subsets of statistical techniques and
terminology. For example, the field of biostatistics (sometimes referred to as
biometry) involves the application of specific statistical techniques to disciplines in
biology such as population genetics, epidemiology, and public health. The field of
geostatistics has evolved to develop specialized spatial analysis techniques that help
geologists map the location of petroleum and mineral deposits; these spatial
analysis techniques have also helped Starbucks® determine the ideal distribution of
coffee shops based on maximizing the number of customers visiting each store. Used
correctly, statistical analysis goes well beyond finding the next oil field or cup of
coffee to illuminating scientific data in a way that helps validate scientific
knowledge.


In conclusion, statistics can describe how much uncertainty there is in scientific
results, but they cannot provide proof that something is true. Scientific research
rarely leads to absolute certainty. There is some degree of uncertainty in all
conclusions, and statistics allow us to discuss that uncertainty. Statistical methods
are used in all areas of science. The module explores the difference between
(a) proving that something is true and (b) measuring the probability of getting a
certain result. It explains how common words like "significant," "control," and
"random" have a different meaning in the field of statistics than in everyday life.

According to Shari Messinger, statistical science is much more than data analysis; it
involves the incorporation of statistical methodology at all stages of research and
requires scientific expertise in the field of statistics. Appropriate use of statistical
methodology in data analysis means the data should be analyzed in a way that is
both scientifically and statistically reasonable. Statisticians are themselves
scientists collaborating in research, using their statistical expertise to determine
and apply the appropriate methodology for rigorously addressing important
research questions.

The time invested often requires the following:

 Review of the research for basic understanding of the science
 Review of the data to understand the distributional properties of the variables collected
 Determination of the appropriate methodology to apply in analysis, corresponding to the hypothesis and design of the investigation
 Programming of the analysis using appropriate statistical software (specific to the particular data set)
 Review of the analytic results
 Reporting of the results

The time invested for a particular data analysis can take hours, or it can take
months. This depends on the research questions, the study design, the properties of
data gathered, and the target audience that will need to understand the results.

In statistics, variables are central to any analysis and they need to be understood
well by the researcher. Even though the concept looks deceptively simple, many
studies and experienced researchers can go wrong by using the wrong variables.

Unlike mathematical constants such as pi or e, variables can vary. In statistics,
variables contain a value or description of what is being studied in the sample or
population.

For example, if a researcher aims to find the average height of a tribe in Colombia,
the variable would simply be the height of the person in the sample. This is a simple
measure for a simple statistical study. However, most statistical analyses are not as
straightforward.
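For the height example above, the variable is simply each sampled person's height, and the statistic of interest is the sample mean. A minimal Python sketch (the height values below are invented sample data, not real measurements):

```python
import statistics

# Hypothetical sample: the height (in cm) of each person sampled.
# Each measurement is one value of the variable "height".
heights_cm = [158.0, 162.5, 171.0, 166.5, 159.0, 174.0]

# The average height of the sample estimates the average height
# of the population the sample was drawn from.
average_height = statistics.mean(heights_cm)
print(round(average_height, 1))  # 165.2
```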

In many cases, statistical variables do not contain numerical values but rather
something descriptive, such as the color of the fins of a fish or the kind of species in
a given natural habitat.

According to James Milford, statistics play a vital role in virtually all branches of
science. Statistical methods are commonly used for analyzing experimental results,
testing their significance, and displaying the results accordingly.


Statistics can be used to explain qualitative results as well as the more easily
deciphered quantitative ones - that is to say, it is possible for statistics to reveal
elements of an experiment that would ordinarily be described by a characteristic
value rather than in a measurable way.

The significance of statistical figures can be seen best when validating solid
arguments or predictions from hypotheses or conjectures that may seem
overwhelming to a layman. It is far easier for the general public to understand the
results of an experiment in greater clarity and detail if they have the simple
reference point of numbers rather than scientific language, mathematics, or
equations.

It could be argued that without the use, and advancements, of statistics and
statistical research the empirical observations of today's scientists and inventors
would be far less accurate and progressive. We all benefit from the developments
and improvements in science that have been made to our day-to-day life, most of the
time taking for granted what has been achieved with the aid of statistics.

Statistics are very important in science as well as technology for a number of
reasons.

1. Research
There are various research methods that use statistics to record their results,
especially when it comes to experiments in the field of science. Once a
number of experiments have been carried out, statistics will show a certain
result. Then, scientists can accumulate all the statistics from each test to
identify whether there are any differences or changes over time. It is
important to have statistics because they are a simple and stand-out way of
representing a result.

2. Significance
Statistics are a single number or a collection of numbers that show the
importance of a certain change or development. In terms of technology this is
important because statistics can show what the current trends are in terms
of various technologies and the development of them. It is also good market
research for companies that are creating various devices because statistics
from areas around the country can help identify what the best device is to
move forward with.

3. Predictions
The value of statistics is strong because they can serve as predictions as well
as probabilities of certain trends. This is especially the case in scientific
experiments and studies, because a lot of scientific research involves trial
and error and finding which reactions work best. If there are professionals
working on substances for medical use, statistics can identify what works
best.
Variables
“Variable” is a term frequently used in research projects. It is pertinent to define and
identify the variables while designing quantitative research projects. A variable
incites more excitement in research than a constant does. It is therefore critical for
beginners in research to have clarity about this term and the related concepts.

A variable, to put it in layman's terms, is something that can change or can have
more than one value. A variable, as the name implies, is “something that varies.”
It may be weight, height, anxiety level, income, body temperature, and so on. Each
of these properties varies from one person to another and also has different values
along a continuum. Variables could be demographic, physical, or social, and include
religion, income, occupation, temperature, humidity, language, food, fashion, etc.
Some variables can be quite concrete and clear, such as gender, birth order, or blood
group, while others can be considerably more abstract and vague.

A variable is “a property that takes on different values.” It is also a logical
grouping of attributes. Attributes are characteristics or qualities that describe an
object. For example, if gender is a variable, then male and female are the attributes.
If residence is the variable, then urban, semi-urban, and rural become the
attributes. The attributes here describe the residence of an individual.
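The variable-attribute relationship above can be sketched in a few lines of Python (the sample data is hypothetical, chosen only to mirror the residence example):

```python
from collections import Counter

# A variable is a logical grouping of attributes: the variable "residence"
# groups the attributes "urban", "semi-urban", and "rural".
# Hypothetical sample of six subjects, one attribute value each.
residence = ["urban", "rural", "urban", "semi-urban", "urban", "rural"]

# Counting the attributes summarizes how the variable is distributed
# across the sample.
counts = Counter(residence)
print(counts["urban"])  # 3 subjects live in an urban area
print(sorted(counts))   # the attributes observed for this variable
```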

It is pertinent for a researcher to know how certain variables within a study are
related to each other. It is thus important to define the variables to facilitate
accurate explanation of the relationships between them. There is no limit to the
number of variables that can be measured, although the more variables, the more
complex the study and the more complex the statistical analysis. Moreover, the
longer the list of variables, the longer the time required for data collection.

Variables can be defined in terms of measurable factors through a process of
operationalization. This converts difficult concepts into easily understandable ones,
which can then be measured empirically. It is essential to define the terms as
variables so that they can be quantified and measured; that is, the variable has to be
made to work, or become “operational.”

A variable is a characteristic or feature that varies, or changes, within a study. The
opposite of a variable is a constant: something that doesn't change. In math, the
symbols "x", "y" or "b" represent variables in an equation, while "pi" is a constant.
In an experimental example, if a study is investigating the differences between
males and females, gender would be a variable (some subjects in the study would be
men, and others would be women). If a study has only female subjects, gender
would not be a variable, since there would be only women. If a study includes both
males and females as subjects, but is not interested in differences between men and
women - and does not compare them, gender would not be a variable in that study.

If a study compares three different diets, but keeps all 3 diets the same in the
amount of sodium, then sodium isn't a variable in that study - it's a constant. Other
features of the diets would be variables of interest - maybe the calories or
carbohydrates or fat content.

Independent and Dependent Variables

There are two key variables in every experiment: the independent variable and the
dependent variable.

The independent variable is the variable whose change isn’t affected by any other
variable in the experiment. Either the scientist has to change the independent
variable herself or it changes on its own; nothing else in the experiment affects or
changes it. Two examples of common independent variables are age and time.
There’s nothing you or anything else can do to speed up or slow down time or
increase or decrease age. They’re independent of everything else.

Independent variables are variables that are manipulated or changed by
researchers and whose effects are measured and compared. Another name for the
independent variable is "predictor": independent variables are called predictors
because they predict or forecast the values of the dependent variable in the model.
The independent variable is also called the "regressor," "controlled variable,"
"manipulated variable," "explanatory variable," "exposure variable," and/or "input
variable."

Similarly, dependent variables are also called the "response variable," "regressand,"
"measured variable," "observed variable," "responding variable," "explained
variable," "outcome variable," "experimental variable," and/or "output variable."

In experimental research, an investigator manipulates one variable and measures
the effect of that manipulation on another variable. The variable that the researcher
manipulates is called the independent, or grouping, variable. The independent
variable is the variable that differs between the groups compared: all the members
of one group will have the same level of the independent variable, a second group
will have a different level of that same variable, and so on for a third or fourth
group, if present.

The dependent variable is what is being studied and measured in the experiment.
It’s what changes as a result of the changes to the independent variable. An example
of a dependent variable is how tall you are at different ages. The dependent variable
(height) depends on the independent variable (age).

The dependent variable measures the effect of the independent variable(s) on the
test units. We can also say that dependent variables are completely dependent on
the independent variable(s). Another name for the dependent variable is the
"predicted variable": dependent variables are named as such because their values
are predicted or explained by the predictor/independent variables. For example, a
student's score could be a dependent variable because it could change depending on
several factors, such as how much he studied, how much sleep he got the night
before the test, or even how hungry he was when he took it. Usually, when one is
looking for a relationship between two things, one is trying to find out what makes
the dependent variable change the way it does.

The dependent variable is the outcome variable measured in each subject, which
may be influenced by manipulation of the independent variable. In experimental
studies, where the independent variables are imposed and manipulated, the
dependent variable is the variable thought to be changed or influenced by the
independent variable.

An easy way to think of independent and dependent variables is, when you’re
conducting an experiment, the independent variable is what you change, and the
dependent variable is what changes because of that. You can also think of the
independent variable as the cause and the dependent variable as the effect.
It can be a lot easier to understand the differences between these two variables with
examples, so let’s look at some sample experiments below.

Examples of Independent and Dependent Variables in Experiments

Below are overviews of three experiments, each with their independent and
dependent variables identified.

Experiment 1: You want to figure out which brand of microwave popcorn pops the
most kernels so you can get the most value for your money. You test different
brands of popcorn to see which bag pops the most popcorn kernels.

Independent Variable: Brand of popcorn bag (it's the independent variable because
you are the one deciding the popcorn bag brands)

Dependent Variable: Number of kernels popped (this is the dependent variable
because it's what you measure for each popcorn brand)

Experiment 2: You want to see which type of fertilizer helps plants grow fastest, so
you add a different brand of fertilizer to each plant and see how tall they grow.

Independent Variable: Type of fertilizer given to the plant

Dependent Variable: Plant height

Experiment 3: You’re interested in how rising sea temperatures impact algae life, so
you design an experiment that measures the number of algae in a sample of water
taken from a specific ocean site under varying temperatures.

Independent Variable: Ocean temperature

Dependent Variable: The number of algae in the sample

For each of the independent variables above, it’s clear that they can’t be changed by
other variables in the experiment. You have to be the one to change the popcorn and
fertilizer brands in Experiments 1 and 2, and the ocean temperature in Experiment
3 cannot be significantly changed by other factors. Changes to each of these
independent variables cause the dependent variables to change in the experiments.
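Experiment 1 above can be sketched in code: the brand is the value we set (the independent variable), and the kernel count is the value we measure (the dependent variable). The brand names and counts below are made up purely for illustration:

```python
# Independent variable: the brand (we choose it).
# Dependent variable: kernels popped (we measure it).
# All brand names and counts are hypothetical illustration data.
kernels_popped = {
    "Brand A": 342,
    "Brand B": 398,
    "Brand C": 371,
}

# The "best" brand is the one whose measured dependent value is highest.
best_brand = max(kernels_popped, key=kernels_popped.get)
print(best_brand)  # Brand B
```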

A few more examples can highlight the importance and usage of dependent and
independent variables in a broader sense.

If one wants to measure the influence of different quantities of nutrient intake on
the growth of an infant, then the amount of nutrient intake can be the independent
variable, with the dependent variable being the growth of the infant measured by
height, weight, or other factor(s) as per the requirements of the experiment.

If one wants to estimate the cost of living of an individual, then factors such as
salary, age, and marital status are independent variables, while the cost of living is
highly dependent on such factors and is therefore designated as the dependent
variable.

In the case of time series analysis, forecasting the price of a particular commodity
again depends on various factors as per the study. Suppose we want to forecast the
value of gold, for example. In this case the seasonal factor can be an independent
variable on which the price of gold will depend.

In the case of a poor performance by a student in an examination, the independent
variables can be factors like the student not attending classes regularly, poor
memory, etc., and these will be reflected in the grade of the student. Here, the
dependent variable is the test score of the student.

Where Do You Put Independent and Dependent Variables on Graphs?

Independent and dependent variables always go in the same places on a graph. This
makes it easy for you to quickly see which variable is independent and which is
dependent when looking at a graph or chart. The independent variable always goes
on the x-axis, or the horizontal axis. The dependent variable goes on the y-axis, or
vertical axis.

Here’s an example:

Figure 1: Graph of Hours Studied and Score on Exam


URL: https://blog.prepscholar.com/hs-fs/hubfs/body_graph-
3.jpg?t=1533332397212&width=558&name=body_graph-3.jpg
Retrieved: August 06, 2018

As you can see, this is a graph showing how the number of hours a student studies
affects the score she got on an exam. From the graph, it looks like studying up to six
hours helped her raise her score, but as she studied more than that her score
dropped slightly.

The amount of time studied is the independent variable, because it’s what she
changed, so it’s on the x-axis. The score she got on the exam is the dependent
variable, because it’s what changed as a result of the independent variable, and it’s
on the y-axis. It’s common to put the units in parentheses next to the axis titles,
which this graph does.
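The graph can be represented as paired data: the independent variable (hours studied) supplies the x-values and the dependent variable (exam score) supplies the y-values. The numbers below are hypothetical, chosen only to mimic the described shape (scores rise up to six hours of study, then dip slightly):

```python
# x-axis: independent variable (hours studied).
# y-axis: dependent variable (exam score).
# Hypothetical (x, y) pairs shaped like the graph described above.
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_scores = [55, 62, 70, 78, 85, 91, 89, 88]

# Pair each x-value with its y-value, then find the x at which y peaks.
pairs = list(zip(hours_studied, exam_scores))
best_hours, best_score = max(pairs, key=lambda p: p[1])
print(best_hours)  # 6
```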

Independent Variable vs. Dependent Variable

Definition:
 Independent Variable: A variable is independent if it may vary freely and does not depend upon changes in other variables. It is usually denoted by x.
 Dependent Variable: A variable is dependent if it varies according to changes in other variables. It is usually denoted by y.

Example:
 Independent Variable: Time spent reviewing for an exam; how many spoonfuls of sugar you put in your tea.
 Dependent Variable: The marks from an exam; how sweet your tea is.

References and Supplementary Materials


Books and Journals
1. Melanie Alvarez; 2016; Dependent and Independent Variables; Florida; Rourke
Educational Media
2. Andy Field; Discovering Statistics Using IBM SPSS Statistics; Germany; Mohn Media
Mohndruck GmbH
Online Supplementary Reading Materials
1. Statistics in Science;
http://magazine.amstat.org/blog/2011/08/01/statscienceaug11/; August 06, 2018
2. Variables; http://www.pt.armstrong.edu/wright/hlpr/text/3.1.variables.htm; August
06, 2018
3. Dependent and Independent Variables Review;
https://www.khanacademy.org/math/pre-algebra/pre-algebra-equations-
expressions/pre-algebra-dependent-independent/a/dependent-and-independent-
variables-review; August 06, 2018

