
Sandoval, Louise Bianca N.

BIOSTATISTICS - HW

A. Define the following:


1. Statistics - It is the study of how to collect, organize, analyze, and interpret data.
It provides methods and techniques for summarizing and making inferences from
data, enabling researchers to draw meaningful conclusions about populations or
phenomena of interest.

2. Population - The population may simply be defined as the whole number of
people or inhabitants in a country or region. It also refers to the entire group of
individuals, organisms, or objects that share common characteristics and are the
focus of a particular study or analysis.

3. Sample - It is a subset of the population that is selected for study or analysis. It
consists of a smaller group of individuals, patients, or data points that are chosen
to represent the larger population of interest. Samples are often used in research
studies to draw conclusions about the population as a whole.

4. Variable - A variable is any characteristic, number, or quantity that can be
measured or counted. A variable may also be called a data item. In addition, a
variable is any characteristic, attribute, or measurement that can vary among
individuals or subjects within a population or sample. Moreover, variables in
nursing and healthcare research can include demographic information (such as
age, gender, and ethnicity), clinical measurements (such as blood pressure or
cholesterol levels), or treatment outcomes (such as response to medication or
recovery time).

5. Frequency Distribution Table - A frequency distribution table is a chart that
summarizes all the data under two columns - variables/categories, and their
frequency. It is a tabular summary of data that shows the frequency (or count) of
each value or category within a dataset. It organizes data into intervals or
categories and provides a clear summary of how often each interval or category
occurs. Frequency distribution tables are commonly used to analyze and interpret
healthcare data, such as patient demographics or clinical measurements.
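
As a rough illustration of the two-column layout described above, the following sketch builds a frequency distribution table from categorical data. The blood-type values are hypothetical and serve only as an example.

```python
from collections import Counter

# Hypothetical blood types recorded for 12 patients (illustrative data only).
blood_types = ["O", "A", "B", "O", "AB", "A", "O", "O", "B", "A", "O", "A"]

# Count how often each category occurs in the dataset.
frequency_table = Counter(blood_types)

# Print the two columns: category and frequency.
print(f"{'Blood type':<12}{'Frequency':>9}")
for category, count in sorted(frequency_table.items()):
    print(f"{category:<12}{count:>9}")
```

The frequencies always add up to the total number of observations, which is a quick way to check the table.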

6. Range - The range is a measure of the spread or dispersion of a set of data values.
It represents the difference between the highest and lowest values in the dataset.
The range provides a simple measure of variability and can be useful for
understanding the spread of clinical measurements or patient outcomes within a
population or sample.
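
A minimal sketch of the range calculation, assuming a small set of hypothetical systolic blood pressure readings:

```python
# Hypothetical systolic blood pressure readings in mmHg (illustrative data only).
systolic_bp = [118, 125, 132, 110, 141, 127]

# Range = highest value - lowest value.
data_range = max(systolic_bp) - min(systolic_bp)
print(data_range)  # 141 - 110 = 31
```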

7. Class Interval or Class Limit or Class Boundaries - Class intervals, class
limits, and class boundaries in Biostatistics refer to the intervals or categories
used to group data in a frequency distribution table.
Class Interval: It defines the range of values that are grouped together in the
frequency distribution.
Class Limit: Class limits are the smallest (lower limit) and largest (upper limit)
values that can belong to a given class in a frequency distribution.
Class Boundaries: Class boundaries are the exact points that separate one class
interval from another in a frequency distribution table.

8. True Class Boundaries - True class boundaries in Biostatistics are the precise
points that separate adjacent class intervals in a frequency distribution table. For
data recorded in whole units, they are calculated by subtracting 0.5 units from the
lower limit and adding 0.5 units to the upper limit of each class interval, so that
adjacent classes share a common boundary.
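
The calculation can be sketched as follows. The function name, the `unit` parameter, and the class limits are assumptions made for illustration; `unit` is the precision of the recorded data (1 for whole numbers), so the adjustment is half a unit.

```python
def true_class_boundaries(lower_limit, upper_limit, unit=1):
    """Return (lower boundary, upper boundary) for one class interval.

    `unit` is the precision of the recorded data; half of it is
    subtracted from the lower limit and added to the upper limit.
    """
    half_unit = unit / 2
    return lower_limit - half_unit, upper_limit + half_unit


# Hypothetical class limits for data recorded in whole units.
class_limits = [(10, 19), (20, 29), (30, 39)]
boundaries = [true_class_boundaries(low, high) for low, high in class_limits]
print(boundaries)  # [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]
```

Note that the upper boundary of one class equals the lower boundary of the next, so no gap remains between classes.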

9. Cumulative Frequency - Cumulative frequency in Biostatistics is the running
total of frequencies obtained by adding up the frequencies of each class interval as
one moves through the frequency distribution table. It provides information about
the total number of observations that fall below a certain value or within a certain
range.

10. Frequency - It refers to the number of times a particular value, category, or event
occurs within a dataset. It represents the count or occurrence of each value or
event and is used to quantify the distribution of data.
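
To show how frequency and cumulative frequency relate, here is a short sketch that computes the running total for a set of hypothetical class intervals. The intervals and counts are made up for illustration.

```python
from itertools import accumulate

# Hypothetical class intervals and their frequencies (illustrative data only).
class_intervals = ["10-19", "20-29", "30-39", "40-49"]
frequencies = [3, 7, 6, 4]

# Cumulative frequency is the running total of the frequencies.
cumulative = list(accumulate(frequencies))

for interval, freq, cum in zip(class_intervals, frequencies, cumulative):
    print(f"{interval:<8}{freq:>4}{cum:>4}")
# cumulative is [3, 10, 16, 20]
```

The last cumulative frequency equals the total number of observations.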

11. Class Midpoint or Class Mark - The class midpoint or class mark in
Biostatistics is the average of the lower and upper class limits of a class interval in
a frequency distribution table. It serves as a representative value for each class
interval and is often used in calculations and analyses involving grouped data.
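
One common use of class midpoints is estimating the mean of grouped data. The sketch below assumes hypothetical class limits and frequencies and treats each midpoint as the representative value of its class.

```python
# Hypothetical class limits and frequencies (illustrative data only).
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49)]
frequencies = [3, 7, 6, 4]

# Class midpoint = (lower limit + upper limit) / 2.
midpoints = [(low + high) / 2 for low, high in class_limits]
print(midpoints)  # [14.5, 24.5, 34.5, 44.5]

# Grouped mean = sum(midpoint * frequency) / sum(frequency).
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / sum(frequencies)
print(grouped_mean)  # 600 / 20 = 30.0
```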

B. Answer the following:


1. Differentiate Qualitative Variable from Quantitative Variable.
Quantitative Variables are variables whose values result from counting or measuring
something, while Qualitative Variables are not measurable variables. Their values do
not result from measuring or counting. The main difference between the two lies in the
type of information they represent. Qualitative variables describe categorical attributes or
characteristics, while quantitative variables represent measurable quantities or numerical
values.

2. Enumerate the different steps in a statistical investigation.


1. Identify a problem
- clarify the problem and formulate questions that can be answered with data
2. Collect data
- design and implement a plan to collect or obtain appropriate data
3. Analyse data
- select and apply appropriate techniques to analyse the data
4. Interpret and communicate the results
- interpret the results of analysis in a way that relates to the original question

For example, the procedures can be stated as follows if a researcher is utilizing the
statistical investigation approach to look into a potential correlation between the number
of liters of soft drink a person consumes each week and their BMI (bivariate data):
a. Identify a problem
- does there appear to be an association? Do people who drink more soft drink
tend to have a higher BMI?
b. Collect data
- collect bivariate data (litres consumed and BMI) for a sufficient number of
people.
- the source of the data should be recorded so that someone reading the final
report could verify the data themselves.
c. Analyse data
- this is where we "do the maths" by applying graphical or numerical techniques
to analyse the data.
- the researcher might construct a percentage two-way frequency table or a
scattergraph, determine the strength of the relationship by calculating the
correlation coefficient, or find an equation of a line or curve that best describes
the apparent relationship.
d. Interpret and communicate the results
- comment on whether the analysis indicates that there is an association between
the variables.
- the interpretation of the results should be related to the original question and
communicated in a systematic and concise manner.

Note: a statistician must consider whether they will survey an entire population of
interest (census) or a representative group from within the population (sample).
The process of selecting a sample must be as unbiased as possible to keep the data
as representative as possible.
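
The "analyse data" step in the soft drink example could be sketched as below, using the Pearson correlation coefficient as one possible measure of the strength of a linear relationship. The function name and the six data pairs are hypothetical and chosen only to illustrate the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical bivariate data: litres of soft drink per week and BMI.
litres_per_week = [0.5, 1.0, 2.0, 3.5, 5.0, 6.0]
bmi = [21.0, 22.5, 23.0, 26.0, 27.5, 30.0]

r = pearson_r(litres_per_week, bmi)
print(round(r, 2))  # a value close to +1 suggests a strong positive association
```

A strong correlation in a sample like this would indicate an association only; it would not by itself show that soft drink consumption causes a higher BMI.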

3. Enumerate the different types of data:


a. According to continuity of measurement
- Continuous Data: Continuous data are measurements that can take on any value
within a given range and can be further divided into smaller increments. These
data are typically obtained through measurements and can have infinite possible
values. Examples include age, weight, height, blood pressure, and laboratory test
results.

- Discrete Data: These are measurements that can only take on specific, distinct
values, usually integers. These data are counted rather than measured and often
represent counts or frequencies of events. Examples include the number of
patients in a hospital ward, the number of medical errors in a healthcare facility,
or the number of patients with a specific medical condition.

b. According to scale of measurement


- Nominal Data: Nominal data are categorical data that represent categories or
groups with no inherent order or ranking. These data are typically used to classify
items or individuals into distinct groups based on shared characteristics. Examples
include gender (male, female), ethnicity (Caucasian, African American, Asian),
and type of disease (hypertension, diabetes, cancer).

- Ordinal Data: Ordinal data are categorical data that represent categories or
groups with a meaningful order or ranking. While the categories have a specified
order, the differences between the categories may not be uniform or quantifiable.
Examples include pain intensity (mild, moderate, severe), patient satisfaction
ratings (poor, fair, good, excellent), and educational attainment (high school
diploma, bachelor's degree, master's degree).

- Interval Data: Interval data are numerical data that represent measurements where the
differences between values are meaningful and consistent, but there is no true zero
point. These data have a specific order, and the differences between values are
equal and measurable. Examples include temperature measured in degrees Celsius
or Fahrenheit and calendar dates.

- Ratio Data: Ratio data are numerical data that represent measurements where the
differences between values are meaningful, consistent, and there is a true zero
point. These data have a specific order, the differences between values are equal
and measurable, and ratios of values are meaningful. Examples include height,
weight, age, and laboratory measurements such as blood glucose levels.
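
The difference between interval and ratio data can be illustrated with temperature. Degrees Celsius have no true zero, so a ratio of two Celsius values is not meaningful, while the Kelvin scale does have a true zero. The temperatures below are arbitrary example values.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return celsius + 273.15


# Dividing Celsius values gives 2.0, but 40 degrees C is not "twice as hot" as 20.
ratio_celsius = 40 / 20

# On the Kelvin scale the ratio is meaningful, and it is only about 1.07.
ratio_kelvin = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print(ratio_celsius, round(ratio_kelvin, 2))
```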

c. According to their pattern or arrangement


- Raw Data: Raw data are the original, unprocessed measurements or observations collected
from a study or experiment. These data have not been manipulated or analyzed
and may require cleaning and preparation before analysis.

- Grouped Data: Grouped data are data that have been organized into groups or intervals to
simplify analysis and presentation. These data are often used in frequency
distributions and histograms to summarize large datasets and identify patterns or
trends.

- Time Series Data: Time series data are data collected over time at regular
intervals. These data are used to analyze trends, patterns, and changes in variables
over time. Examples include monthly patient admission rates, yearly mortality
rates, or quarterly blood pressure measurements.

- Cross-Sectional Data: Cross-sectional data are data collected from a single point
in time or from a single snapshot of a population. These data provide a snapshot
of the population at a specific moment and are used to assess prevalence,
associations, or relationships among variables at that point in time.

4. Enumerate and define the different branches of statistics.

Numerous statistical fields are relevant to the study and practical use of healthcare. The
definitions of the major statistical subfields are as follows:
1. Descriptive Statistics
Descriptive statistics involve methods for summarizing and describing the main features of a
dataset. This branch of statistics includes measures such as mean, median, mode,
range, variance, standard deviation, and percentiles. Descriptive statistics are used
to organize, simplify, and present data in a meaningful way, providing insights
into the characteristics and patterns present in the data.
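
Several of the measures listed above can be computed with Python's standard `statistics` module. The recovery times are hypothetical, and the variance and standard deviation here are the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical recovery times in days for 8 patients (illustrative data only).
recovery_days = [4, 5, 5, 6, 7, 7, 7, 9]

summary = {
    "mean": statistics.mean(recovery_days),      # 50 / 8 = 6.25
    "median": statistics.median(recovery_days),  # (6 + 7) / 2 = 6.5
    "mode": statistics.mode(recovery_days),      # 7 occurs most often
    "range": max(recovery_days) - min(recovery_days),
    "variance": statistics.variance(recovery_days),  # sample variance
    "std_dev": statistics.stdev(recovery_days),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```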

2. Inferential Statistics
Inferential statistics involve methods for making inferences or predictions about a
population based on sample data. This branch of statistics includes techniques
such as hypothesis testing, confidence intervals, and regression analysis.
Inferential statistics allow researchers to draw conclusions, test hypotheses, and
make predictions about populations using sample data.
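
As one example of inference from a sample, the sketch below computes an approximate 95% confidence interval for a population mean. It assumes a normal (z) approximation for simplicity; for a small sample like this one, a t-based interval would usually be preferred. The function name and cholesterol values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    sample_mean = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # z is about 1.96 when confidence is 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical total cholesterol values in mg/dL (illustrative data only).
cholesterol = [182, 195, 201, 176, 210, 188, 199, 205, 191, 184]

lower, upper = mean_confidence_interval(cholesterol)
print(round(lower, 1), round(upper, 1))
```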

3. Biostatistics
Biostatistics, also known as biometry or biometrics, is the application of statistical
methods to biological, biomedical, and healthcare-related data. This branch of
statistics focuses on the design, analysis, and interpretation of studies in medicine,
public health, epidemiology, and other healthcare fields. Biostatistics plays a
crucial role in healthcare research, clinical trials, epidemiological studies, and
evidence-based practice.

4. Clinical Trials
These are experimental studies conducted to evaluate the safety, efficacy, and
effectiveness of new medical treatments, interventions, or therapies. This branch
of statistics involves the design, analysis, and interpretation of clinical trial data to
assess the impact of healthcare interventions on patient outcomes. Clinical trials
often employ randomized controlled trials (RCTs) and other experimental designs
to compare treatment groups and control groups.

5. Epidemiology
It is the study of the distribution and determinants of health-related events,
diseases, and conditions in populations. This branch of statistics involves the
collection, analysis, and interpretation of data on disease prevalence, incidence,
risk factors, and outcomes to identify patterns, trends, and risk factors associated
with health and disease. Epidemiological studies inform public health
interventions, disease prevention strategies, and healthcare policy decisions.

6. Survival Analysis
Survival analysis, also known as time-to-event analysis, is a branch of statistics
that focuses on the analysis of time until an event of interest occurs. This branch
of statistics is commonly used in healthcare research to analyze patient survival
times, disease progression, time to recurrence, or time to recovery. Survival
analysis techniques include Kaplan-Meier curves, Cox proportional hazards
models, and parametric survival models.
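
A minimal sketch of the Kaplan-Meier estimate mentioned above is given here. The function name and the follow-up data are hypothetical; an event value of 1 means the event was observed and 0 means the observation was censored. In practice a dedicated statistics package would normally be used.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        event_count = 0
        removed = 0
        # Gather every subject whose follow-up ends at this time.
        while i < len(data) and data[i][0] == current_time:
            event_count += data[i][1]
            removed += 1
            i += 1
        if event_count > 0:
            # Multiply by the fraction of at-risk subjects without the event.
            survival *= 1 - event_count / at_risk
            curve.append((current_time, survival))
        # Subjects with an event and censored subjects both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up times in months (illustrative data only).
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]

for time, probability in kaplan_meier(times, events):
    print(time, round(probability, 3))
```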

7. Bayesian Statistics
It is a branch of statistics that involves the use of Bayesian methods and principles
for statistical inference. This approach to statistics incorporates prior knowledge
or beliefs about a population into the analysis, updating these beliefs based on
observed data to make probabilistic inferences. Bayesian statistics is used in
healthcare research to model uncertainty, incorporate expert knowledge, and make
predictions about patient outcomes or treatment effects.
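
The idea of updating prior beliefs with observed data can be sketched with a Beta prior for a treatment response rate. This is a simple conjugate (Beta-Binomial) model chosen for illustration; the function name and the counts of responders are hypothetical.

```python
def update_beta(prior_alpha, prior_beta, successes, failures):
    """Update a Beta(prior_alpha, prior_beta) prior with binomial data."""
    return prior_alpha + successes, prior_beta + failures


# Prior: Beta(1, 1), i.e. every response rate is considered equally likely.
# Hypothetical data: 14 of 20 patients responded to the treatment.
posterior_alpha, posterior_beta = update_beta(1, 1, successes=14, failures=6)

# Posterior mean of the response rate = alpha / (alpha + beta) = 15 / 22.
posterior_mean = posterior_alpha / (posterior_alpha + posterior_beta)
print(posterior_alpha, posterior_beta, round(posterior_mean, 3))
```

With more observed data, the posterior depends less on the prior and more on the data.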

5. Differentiate Probability sampling techniques from non-probability sampling
techniques and list down each method.

Probability sampling techniques involve methods where each member of the
population has a known, non-zero chance of being selected as part of the sample. These
techniques allow researchers to calculate the probability of each member being selected
and make it more likely that the sample is representative of the population.
1. Simple Random Sampling:
In simple random sampling, every member of the population has an equal chance
of being selected as part of the sample. This method involves randomly selecting
individuals from the population without any systematic bias or predetermined
criteria.
2. Stratified Random Sampling:
Stratified random sampling involves dividing the population into homogeneous
subgroups or strata based on certain characteristics (e.g., age, gender, disease
severity) and then randomly selecting samples from each stratum. This method
ensures that each subgroup is represented in the sample, and represented
proportionally when the number drawn from each stratum matches its share of the
population.
3. Systematic Random Sampling:
Systematic random sampling involves selecting every kth member from the
population after randomly selecting a starting point. The value of k is calculated
by dividing the population size by the desired sample size. This method provides
a systematic approach to sampling while maintaining randomness.
4. Cluster Sampling:
This involves dividing the population into clusters or groups, then randomly
selecting clusters to include in the sample. Within each selected cluster, all
members are included in the sample. This method is useful when it's impractical
or expensive to sample individuals directly.
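
The four probability sampling methods above can be sketched with Python's `random` module. The population of 100 patient IDs, the two wards used as strata, and the clusters of 20 are all hypothetical; the stratified example assumes proportional allocation.

```python
import random

rng = random.Random(42)           # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical patient IDs 1 to 100
sample_size = 10

# 1. Simple random sampling: every member has an equal chance of selection.
simple_random = rng.sample(population, sample_size)

# 2. Stratified random sampling with proportional allocation.
strata = {"ward_A": population[:60], "ward_B": population[60:]}
stratified = []
for members in strata.values():
    n_stratum = round(sample_size * len(members) / len(population))
    stratified.extend(rng.sample(members, n_stratum))

# 3. Systematic sampling: k = population size / sample size,
#    then every kth member after a random starting point.
k = len(population) // sample_size
start = rng.randrange(k)
systematic = population[start::k]

# 4. Cluster sampling: randomly choose whole clusters and keep all members.
clusters = [population[i:i + 20] for i in range(0, len(population), 20)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [member for cluster in chosen_clusters for member in cluster]

print(simple_random, stratified, systematic, cluster_sample, sep="\n")
```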

Non-probability sampling techniques involve methods where the likelihood of any
particular member of the population being selected as part of the sample is unknown or
cannot be determined. These techniques may introduce bias into the sample, and the
results may not be representative of the population.

1. Convenience Sampling:
Convenience sampling involves selecting individuals who are readily available or
easily accessible to the researcher. This method is convenient and cost-effective
but may not produce a representative sample, as it may exclude certain segments
of the population.
2. Purposive Sampling:
Purposive sampling involves selecting individuals based on specific criteria or
characteristics determined by the researcher. This method allows researchers to
target specific groups of interest but may introduce bias if the selection criteria are
not well-defined or if certain groups are excluded.
3. Snowball Sampling:
Snowball sampling involves selecting initial participants who then refer or recruit
additional participants from their social network. This method is useful for
studying hard-to-reach populations or individuals with shared characteristics but
may result in a biased sample if participants are not representative of the
population.
4. Quota Sampling:
Quota sampling involves selecting individuals based on predetermined quotas or
targets for certain characteristics (e.g., age, gender, occupation) to ensure that the
sample reflects the population's composition. However, participants are not
selected randomly, which may introduce bias into the sample.

In summary, probability sampling techniques involve methods where every member of
the population has a known chance of being selected, while non-probability sampling
techniques involve methods where the likelihood of selection is unknown or cannot be
determined. Each method has its advantages and limitations, and the choice of sampling
technique depends on the research objectives, resources, and constraints.

6. Enumerate and define the different methods of collecting data. List down the
advantage and disadvantages of each.
Various methods are used to collect data for research and analysis. Here are different
methods of collecting data along with their definitions, advantages, and disadvantages:
1. Surveys or Questionnaires
This involves the collection of data through self-reported responses to structured
or semi-structured questions. These questions can be administered in person, over
the phone, through mail, or online.
Advantages:
- Surveys allow for the collection of large amounts of data from a diverse
population.
- They are relatively cost-effective and can be administered remotely.
- Surveys can be standardized to ensure consistency in data collection.
Disadvantages:
- Response rates may be low, leading to potential non-response bias.
- There may be errors in self-reporting due to recall bias or social
desirability bias.
- Designing effective surveys requires careful consideration of question
wording, format, and order.

2. Interviews
Interviews involve face-to-face or telephone conversations between a researcher
and participant, during which the researcher asks questions and records responses.
Advantages:
- Interviews allow for in-depth exploration of topics and clarification of
responses.
- Researchers can adapt questions and probe further based on participant
responses.
- Interviews can facilitate rapport-building and trust between the researcher
and participant.
Disadvantages:
- Interviews can be time-consuming and resource-intensive.
- There may be interviewer bias or social desirability bias affecting
responses.
- The presence of the interviewer may influence participant responses.

3. Observational Studies
Observational studies involve the systematic observation and recording of
behaviors, events, or phenomena without direct intervention or manipulation by
the researcher.
Advantages:
- Observational studies allow for the collection of data in naturalistic
settings, reflecting real-world behavior.
- They can generate rich, qualitative data that provide insights into complex
phenomena.
- Observational studies are less likely to suffer from response bias compared
to self-report methods.
Disadvantages:
- Observer bias may occur if researchers interpret or record data selectively.
- Observational studies may be time-consuming and require trained
observers.
- It may be challenging to maintain objectivity and consistency across
observers.

4. Clinical Trials
This involves experimental studies conducted to evaluate the safety, efficacy, and
effectiveness of new medical treatments, interventions, or therapies.
Advantages:
- Clinical trials provide rigorous evidence for assessing the impact of
healthcare interventions on patient outcomes.
- Randomization and blinding techniques help minimize bias and
confounding variables.
- Clinical trials can inform evidence-based practice and guide clinical
decision-making.
Disadvantages:
- Clinical trials can be expensive, time-consuming, and resource-intensive.
- Ethical considerations must be addressed, including informed consent and
patient safety.
- Generalizability of findings may be limited due to strict inclusion criteria
and controlled settings.

5. Secondary Data Analysis
Secondary data analysis involves the use of existing data sources, such as medical
records, administrative databases, or public health surveys, for research purposes.
Advantages:
- Secondary data analysis is cost-effective and time-efficient, as data
collection has already been conducted.
- Large datasets may provide statistical power and allow for subgroup
analyses.
- Secondary data sources can provide longitudinal or population-level data
that are not feasible to collect directly.

Disadvantages:
- Data quality may vary across sources, leading to potential measurement
errors or missing information.
- Researchers have limited control over data collection methods and
variables measured.
- It may be challenging to access or obtain permission to use certain
datasets.
