Bio Stat
I. Prepare
1. Context
2. Sampling Method
Were the data collected in a way that is unbiased, or were the data collected in a way
that is biased (such as a procedure in which respondents volunteer to participate)?
II. Analyze
1. Graph the Data
2. Explore the Data
Are there any outliers (numbers very far away from almost all the other data)?
What important statistics summarize the data (such as the mean and standard
deviation)?
How are the data distributed?
Are there missing data?
Did many selected subjects refuse to respond?
III. Conclude
1. Significance
DEFINITIONS
A voluntary response sample (or self-selected sample) is one in which respondents
themselves decide whether to be included.
Statistical significance is achieved in a study when we get
a result that is very unlikely to occur by chance. A common criterion is that we have
statistical significance if the likelihood of an event occurring by chance is 5% or less.
Example:
- Getting 98 girls in 100 random births is statistically significant because such an
extreme outcome is not likely to result from random chance.
- Getting 52 girls in 100 births is not statistically significant because that event could
easily occur with random chance.
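As a rough illustration, here is a minimal Python sketch (assuming a fair 50/50 chance of a girl on each birth, so the counts follow a binomial distribution) that computes how likely each of the two outcomes above is under chance alone; the function name prob_at_least is hypothetical.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) girls in n births when each birth is a girl with probability p."""
    return sum(comb(n, x) * (p ** x) * ((1 - p) ** (n - x)) for x in range(k, n + 1))

print(prob_at_least(98, 100))  # about 4e-27, far below 5%: statistically significant
print(prob_at_least(52, 100))  # about 0.38, easily explained by chance: not significant
```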
With practical significance, it is possible that some treatment or finding is effective, but
common sense might suggest that the treatment or finding does not make enough of a
difference to justify its use or to be practical.
ANALYZING DATA: POTENTIAL PITFALLS
Here are a few items that could cause problems when analyzing data.
1. Misleading Conclusions
2. Sample Data Reported Instead of Measured
When collecting data from people, it is better to take measurements yourself instead of
asking subjects to report results.
3. Loaded Questions
If survey questions are not worded carefully, the results of a study can be misleading.
4. Order of Questions
Sometimes survey questions are unintentionally loaded by such factors as the order of
the items being considered.
5. Nonresponse
6. Percentages
Some studies cite misleading or unclear percentages. Note that 100% of some quantity
is all of it, but if there are references made to percentages that exceed 100%, such
references are often not justified.
BASIC TYPES OF DATA
A parameter is a numerical measurement describing some characteristic of a
population.
A statistic is a numerical measurement describing some characteristic of a sample.
EXAMPLE:
There are 17,246,372 high school students in the United States. In a study of 8505 U.S.
high school students 16 years of age or older, 44.5% of them said that they texted while
driving at least once during the previous 30 days (based on data in "Texting While
Driving and Other Risky Motor Vehicle Behaviors Among High School Students," by
Olsen, Shults, Eaton, Pediatrics, Vol. 131, No. 6).
The count of 17,246,372 is a parameter because it describes the entire population of all high school
students in the United States, while the value of 44.5% is a statistic because it is based only on the
sample of 8505 students. If we somehow knew the percentage of all 17,246,372
high school students who reported they had texted while driving, that percentage would
also be a parameter.
QUANTITATIVE/CATEGORICAL
Quantitative (or numerical) data consist of numbers representing counts or
measurements.
Categorical (or qualitative or attribute) data consist of names or labels (not numbers
that represent counts or measurements).
EXAMPLES:
1. Discrete Data of the Finite Type: Each of several physicians plans to count the
number of physical examinations given during the next full week. The data are
discrete data because they are finite numbers, such as 27 and 46 that result from a
counting process.
2. Discrete Data of the Infinite Type: Researchers plan to test the accuracy of a
blood typing test by repeating the process of submitting a sample of the same blood
(Type O+) until the test yields an error. It is possible that each researcher could
repeat this test forever without ever getting an error, but they can still count the
number of tests as they proceed. The collection of the numbers of tests is countable,
because you can count them, even though the counting could go on forever.
3. Continuous Data: When the typical patient has blood drawn as part of a routine
examination, the volume of blood drawn is between 0 mL and 50 mL. There are
infinitely many values between 0 mL and 50 mL. Because it is impossible to count
the number of different possible values on such a continuous scale, these amounts
are continuous data.
LEVELS OF MEASUREMENT
1. NOMINAL LEVEL
Data are at the nominal level of measurement if they consist only of names, labels, or
categories and cannot be arranged in an ordering scheme (such as low to high).
2. ORDINAL LEVEL
Data are at the ordinal level of measurement if they can be arranged in some
order, but differences (obtained by subtraction) between data values either cannot be
determined or are meaningless.
EXAMPLE:
Course Grades: A biostatistics professor assigns grades of A, B, C, D, or F. These
grades can be arranged in order, but we can't determine differences between the
grades. For example, we know that A is higher than B (so there is an ordering), but we
cannot subtract B from A (so the difference cannot be found).
3. INTERVAL LEVEL
Data are at the interval level of measurement if they can be arranged in order, and
differences between data values can be found and are meaningful; but data at this level
do not have a natural zero starting point at which none of the quantity is present.
EXAMPLES:
4. RATIO LEVEL
Data are at the ratio level of measurement if they can be arranged in order,
differences can be found and are meaningful, and there is a natural zero starting point
(where zero indicates that none of the quantity is present). For data at this level,
differences and ratios are both meaningful.
EXAMPLES:
BIG DATA
Big data refers to data sets so large and so complex that their analysis is beyond the
capabilities of traditional software tools. Analysis of big data may require software
simultaneously running in parallel on many different computers.
Data science involves applications of statistics, computer science, and software
engineering, along with some other relevant fields (such as biology and epidemiology).
MISSING DATA
A data value is missing completely at random if the likelihood of its being missing is
independent of its value or any of the other values in the data set. That is, any data
value is just as likely to be missing as any other data value.
A data value is missing not at random if the missing value is related to the reason that
it is missing.
Different Methods of Correcting Missing Data:
1. Delete Cases: One very common method for dealing with missing data is to
delete all subjects having any missing values.
2. Impute Missing Values: We impute missing data values when we substitute
values for them. There are different methods of determining the replacement values,
such as using the mean of the other values, or using a randomly selected value from
other similar cases, or using a method based on regression analysis.
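A minimal sketch (using pandas, with hypothetical column names and values) contrasting the two corrections described above: deleting cases that contain missing values versus imputing the column mean.

```python
import numpy as np
import pandas as pd

# Hypothetical data set with two missing values (NaN)
df = pd.DataFrame({
    "age": [34, 51, np.nan, 45, 29],
    "systolic_bp": [120, np.nan, 135, 142, 118],
})

# 1. Delete cases: drop every subject (row) that has any missing value
deleted = df.dropna()

# 2. Impute missing values: replace each gap with the mean of the other values in its column
imputed = df.fillna(df.mean(numeric_only=True))

print(deleted)
print(imputed)
```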
BASICS OF DESIGN OF EXPERIMENTS
The Gold Standard: Randomization with placebo/treatment groups is sometimes called
the “gold standard” because it is so effective. (A placebo, such as a sugar pill, has no
medicinal effect.)
DEFINITIONS
In an experiment, we apply some treatment and then proceed to observe its effects on
the individuals. (The individuals in experiments are called experimental units, and they
are often called subjects when they are people.)
In an observational study, we observe and measure specific characteristics, but we do
not attempt to modify the individuals being studied.
EXAMPLE:
Observational Study: Observe past data to conclude that ice cream causes drownings
(based on data showing that increases in ice cream sales are associated with increases
in drownings). The mistake is to miss the lurking variable of temperature and to fail
to see that as the temperature increases, ice cream sales increase and
drownings increase because more people swim.
Experiment: Conduct an experiment with one group treated with ice cream while
another group gets no ice cream. We would see that the rate of drowning victims is
about the same in both groups, so ice cream consumption has no effect on drownings.
Here, the experiment is clearly better than the observational study.
COLLECTING SAMPLE DATA
A simple random sample of n subjects is selected in such a way that every possible
sample of the same size n has the same chance of being chosen. (A simple random
sample is often called a random sample, but strictly speaking, a random sample has the
weaker requirement that all members of the population have the same chance of being
selected. That distinction is not so important in this text.)
In systematic sampling, we select some starting point and then select every kth (such
as every 50th) element in the population.
With convenience sampling, we simply use data that are very easy to get.
In stratified sampling, we subdivide the population into at least two different subgroups
(or strata) so that subjects within the same subgroup share the same characteristics
(such as gender). Then we draw a sample from each subgroup (or stratum).
In cluster sampling, we first divide the population area into sections (or clusters). Then
we randomly select some of those clusters and choose all the members from those
selected clusters.
In a multistage sample design, pollsters select a sample in different stages, and each
stage might use different methods of sampling.
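A minimal sketch of three of the sampling methods defined above, using pandas and NumPy. The population of 1,000 subjects with "id" and "sex" columns, the sample sizes, and the random seeds are all hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
population = pd.DataFrame({
    "id": range(1, 1001),
    "sex": rng.choice(["F", "M"], size=1000),
})

# Simple random sample: every possible sample of size n = 50 has the same chance of being chosen
srs = population.sample(n=50, random_state=1)

# Systematic sample: pick a random starting point, then take every kth subject
k = 20
start = int(rng.integers(0, k))
systematic = population.iloc[start::k]

# Stratified sample: subdivide by sex, then draw 25 subjects from each stratum
stratified = population.groupby("sex").sample(n=25, random_state=1)
```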
OBSERVATIONAL STUDIES
In a cross-sectional study, data are observed, measured, and collected at one point in
time, not over a period of time.
In a retrospective (or case-control) study, data are collected from a past time period
by going back in time (through examination of records, interviews, and so on).
In a prospective (or longitudinal or cohort) study, data are collected in the future
from groups that share common factors (such groups are called cohorts).
EXPERIMENTS
In a study, confounding occurs when we can see some effect, but we can’t identify the
specific factor that caused it.
Completely Randomized Experimental Design: Assign subjects to different treatment
groups through a process of random selection.
Randomized Block Design: A block is a group of subjects that are similar, but blocks
differ in ways that might affect the outcome of the experiment. Use the following
procedure: Form blocks (or groups) of subjects with similar characteristics; and
randomly assign treatments to the subjects within each block.
Matched Pairs Design: Compare two treatment groups (such as treatment and
placebo) by using subjects matched in pairs that are somehow related or have similar
characteristics.
Rigorously Controlled Design: Carefully assign subjects to different treatment groups,
so that those given each treatment are similar in the ways that are important to the
experiment. This can be extremely difficult to implement, and often we can never be
sure that we have accounted for all of the relevant factors.
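A minimal sketch of a completely randomized design, as described above: each subject is assigned to the treatment group or the placebo group purely by chance. The subject labels, group sizes, and random seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
subjects = [f"S{i:02d}" for i in range(1, 21)]  # 20 hypothetical subjects

shuffled = rng.permutation(subjects)  # a random ordering removes any systematic assignment
treatment_group = list(shuffled[:10])
placebo_group = list(shuffled[10:])

print("Treatment:", treatment_group)
print("Placebo:  ", placebo_group)
```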
SAMPLING ERRORS
A sampling error (or random sampling error) occurs when the sample has been
selected with a random method, but there is a discrepancy between a sample result and
the true population result; such an error results from chance sample fluctuations.
A nonsampling error is the result of human error, including such factors as wrong data
entries, computing errors, questions with biased wording, false data provided by
respondents, forming biased conclusions, or applying statistical methods that are not
appropriate for the circumstances.
A nonrandom sampling error is the result of using a sampling method that is not
random, such as using a convenience sample or a voluntary response sample.
Biostatistics plays a role in areas of study such as:
- Chronic diseases
- Cancer
- Human growth and development
- The relationship between genetics and the environment
- AIDS
- Environmental health (its impact and monitoring)
Biostatistics is integral to the advancement of knowledge, not only in public
health policy, but also in biology, health policy, clinical medicine, health economics,
genomics, proteomics, and a number of other disciplines.
The Role of Biostatisticians
Biostatisticians are said to be specialists in data evaluation, as it is their expertise
that allows them to take complex, mathematical findings of clinical trials and research-
related data and translate them into valuable information that is used to make public
health decisions. The work of biostatisticians is also required in government agencies
and legislative offices, where research is often used to influence change at the policy-
making level.
In short, these professionals use mathematics to enhance science and bridge the gap
between theory and practice.
Biostatisticians are required to develop statistical methods for clinical trials,
observational studies, longitudinal studies, and genomics.
What is Informatics?
Informatics, an emerging field also known as bioinformatics, is a
science that relies on the basic disciplines of science, mathematics, probability and
statistics, and computer science to build a solid statistical foundation for making
advances, improvements, and even breakthroughs in public health and medicine.
Health informatics is often said to meet at the intersection of information science,
computer science, and healthcare, as it deals with the resources, devices, and methods
required for the effective storage, use, and retrieval of information, while public health
informatics includes the application of informatics in public health areas, such as
surveillance, prevention, preparedness, and health promotion. Public health informatics
focuses on information and technology issues from the perspective of groups of
individuals.
Naturally, health informatics tools would include computers, making systems analysts
important members of public health informatics research teams. It is the responsibility of
expert informaticists to systematically apply information, computer science, and
technology into research, learning, and the practice of public health.
The Role of Systems Analysts in Informatics
Systems analysts are called upon to write and troubleshoot the software used by
biostatisticians and researchers. Their work may also include conducting their own
research, designing databases, and developing algorithms for processing and analyzing
information.
The main responsibilities of systems analysts in biostatistics and informatics include:
- The distribution indicates something about how and why that disease process
occurs.
Wikipedia defines an epidemic as the rapid spread of disease to a large number of
people in a given population within a short period of time. Epidemics occur when an
agent and susceptible hosts are present in adequate numbers, and the agent can be
effectively conveyed from a source to the susceptible hosts. It is therefore important to
determine the characteristics of epidemic diseases that cause them to die out and not
reappear for a long period of time.
The epidemic characteristics of diseases are:
1. The incubation period of the disease – refers to the time interval between the
infection and the appearance of the signs and symptoms of the disease
2. The various means by which a disease may spread
3. The speed of penetration of a disease into the community
4. The rapidity of the disappearance of a disease from a community
Steps in an Outbreak Investigation
1. Confirmation of Outbreak
2. Verify Diagnosis
3. Case Definition
4. Case Finding
5. Descriptive Epidemiology
6. Generate Hypothesis
7. Analytical Epidemiology
8. Evaluate Control Measures
9. Surveillance
Module 1 - Lesson 5: The Sample Size and Sampling Technique-2
The sample size is the portion of the general population taking part
in the study. It is an important feature of any empirical study in which the goal is
to draw conclusions about a population from a sample. The smaller your sample
size, the higher your margin of error and the lower your confidence level; this means
that your data become less reliable. Conversely, the greater the sample
size, the more “statistically significant” the result will be. In other words, if a very
large sample is used, even a small difference from the null hypothesis will be
statistically significant, even if it is not, in fact, practically important.
There are a number of different methods for calculating sample size. This
link https://www.statisticshowto.com/probability-and-statistics/find-sample-size/ shows
different methods of computing a sample size, such as Cochran's formula and Excel.
Likewise, you can also look for online calculators that can be useful for your research study.
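As a rough illustration of one of those methods, here is a minimal sketch of Cochran's formula with the finite population correction. The defaults are assumptions for illustration: z = 1.96 for a 95% confidence level, a conservative p = 0.5, and a 5% margin of error e; the function name is hypothetical.

```python
from math import ceil

def cochran_sample_size(z=1.96, p=0.5, e=0.05, population_size=None):
    """Cochran's formula: n0 = z^2 * p * (1 - p) / e^2, rounded up."""
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    if population_size is None:
        return ceil(n0)
    # Finite population correction for smaller populations
    return ceil(n0 / (1 + (n0 - 1) / population_size))

print(cochran_sample_size())                      # 385 subjects for a very large population
print(cochran_sample_size(population_size=1000))  # 278 subjects when the population has 1,000 members
```

Note how a smaller margin of error e drives the required sample size up, matching the relationship between sample size and margin of error described above.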
Once you’ve chosen the sample size for your study, you’ll need to determine
which sampling technique you’ll use to select your sample from the target
population. The sampling technique that’s right for you depends on the nature
and objectives of your study. There are several sampling techniques available,
and they can be subdivided into two groups: probability sampling and non-
probability sampling.
Click on this link https://www.slideserve.com/courtney/sampling-techniques to see the
various sampling techniques used in research. Examples are also provided for a better
understanding of the topic.
Module 1 - Lesson 6: Methods of
Collecting, Presenting, Organizing
and Summarizing Data
In this era when “information is power,” how we collect information should be one of our
issues of concern as well as which method of collecting data best answers our
individual needs. If the data collected are unreliable, they will surely affect the findings of the
study, thereby leading to false or invalid results. Conversely, if the collected data are
accurate, they can help researchers predict future occurrences and trends.
Data collection is the systematic process of gathering and measuring information from
a variety of sources to get a complete and accurate picture of an area of
interest. Surveys, interviews, and focus groups are principal tools for collecting
information. Today, with help from Web and some analytics tools, researchers are also
able to collect data from mobile devices, website traffic, server activity, and other
relevant sources, depending on the project and needs.
The video Learn Data Collection Methods | Data Science | Quantra Free Courses presents various methods of data collection.
Presentation of Data
The presentation of data is of utmost importance nowadays. After all, everything that’s
pleasing to our eyes never fails to grab our attention. The presentation of data refers
to exhibiting or arranging data in an attractive and useful manner such that they can be
easily interpreted.
The three main forms of data presentation are:
1. Textual presentation - In this method, data are presented in text format similar
to what is found in books, reports, and research papers.
2. Data tables - In this form, data are presented in rows and columns. It is a precise
way of showing all the data but it can be hard to interpret or see a pattern. It is
normally used to differentiate, classify, compare, and relate different datasets.
3. Diagrammatic Presentation - Data can further be presented in a simple and
even easier form by using diagrams, illustrations, images, or graphs. Changing raw
data into a diagrammatic form makes it quicker and easier to interpret.
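A minimal sketch showing one small data set in each of the three forms above, using pandas and matplotlib. The blood-type counts are hypothetical values used only for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

counts = pd.Series({"O+": 38, "A+": 34, "B+": 9, "AB+": 3}, name="patients")

# 1. Textual presentation
print(f"Of {counts.sum()} patients, {counts['O+']} had blood type O+.")

# 2. Data table (rows and columns)
print(counts.to_frame())

# 3. Diagrammatic presentation (a simple bar chart)
counts.plot(kind="bar", title="Patients by blood type")
plt.xlabel("Blood type")
plt.ylabel("Number of patients")
plt.show()
```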