Stats

DEFINITION OF STATISTICS

Statistics is a branch of mathematics that involves the collection, analysis, interpretation, presentation,
and organization of data. It provides methods for summarizing and making inferences from data,
enabling researchers, analysts, and decision-makers to draw meaningful conclusions about a population
or phenomenon based on a representative sample.

Let's consider a scenario:

• A company that manufactures smartphones is interested in understanding the satisfaction level of
its customers regarding the battery life of its latest smartphone model.

To investigate this, they decide to conduct a statistical study.

Here's how the scenario unfolds:

Objective: The company's main goal is to determine the average satisfaction level of customers regarding
battery life.

Data Collection: They randomly select a sample of 300 customers who have purchased the latest
smartphone model. Each customer is asked to rate their satisfaction with the battery life on a scale from
1 to 10, with 1 being very dissatisfied and 10 being very satisfied.

Data Analysis and Summary:

- The collected data is organized and summarized using descriptive statistics, such as calculating the
mean (average), median, and standard deviation of the satisfaction ratings.

- A histogram or box plot may be created to visualize the distribution of satisfaction ratings.
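
As a rough illustration of the summary step, the sketch below computes the mean, median, and standard
deviation with Python's standard statistics module. The ratings list is made-up data chosen for
illustration; it is not taken from the study described here.

```python
import statistics

# Hypothetical ratings on the 1-10 scale (the study itself would have 300 of them).
ratings = [7, 8, 6, 9, 7, 5, 8, 7, 10, 6, 7, 8]

mean_rating = statistics.mean(ratings)      # arithmetic average
median_rating = statistics.median(ratings)  # middle value of the sorted ratings
sd_rating = statistics.stdev(ratings)       # sample standard deviation (n - 1 denominator)

print(f"mean={mean_rating:.2f}, median={median_rating}, sd={sd_rating:.2f}")
```

The sample standard deviation is used because the ratings come from a sample rather than from every
customer.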

Inferential Statistics: (Descriptive statistics first summarizes all of the data so that inferential
statistics can analyze it properly.)

- With the sample data, the company can make inferences about the entire population of customers
who purchased the smartphone.

- Confidence intervals can be calculated to estimate a range that is likely to contain the true average
satisfaction level of the population.
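
A minimal sketch of such an interval, assuming the sample is large enough (300 ratings) for the normal
approximation to be reasonable. The function name mean_confidence_interval and the data are
hypothetical; for small samples a t-based interval would be the usual choice.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `data`."""
    n = len(data)
    mean = statistics.fmean(data)
    std_err = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err

# 300 made-up ratings, built by repeating a short illustrative pattern.
ratings = [7, 8, 6, 9, 7, 5, 8, 7, 10, 6, 7, 8] * 25
low, high = mean_confidence_interval(ratings)
print(f"95% CI for the mean rating: ({low:.2f}, {high:.2f})")
```

The interval is centered on the sample mean, and it narrows as the sample size grows because the
standard error shrinks with the square root of n.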

Hypothesis Testing:

- The company might want to test a hypothesis, such as whether the average satisfaction level is
significantly different from a predefined target value (e.g., 7 on the scale).

- A statistical test, such as a t-test, can be conducted to assess the significance of the observed
differences.
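
The test against a target value of 7 can be sketched as follows. The function one_sample_t_test and
the data are hypothetical, and the p-value uses a normal approximation to the t distribution, which is
an assumption that is only reasonable for large samples such as the 300 ratings here. A statistics
library such as SciPy (scipy.stats.ttest_1samp) would use the exact t distribution.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_t_test(data, target):
    """Return the t statistic and an approximate two-sided p-value."""
    n = len(data)
    std_err = statistics.stdev(data) / math.sqrt(n)
    t_stat = (statistics.fmean(data) - target) / std_err
    # Large-sample shortcut: treat the t statistic as approximately standard normal.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value

ratings = [7, 8, 6, 9, 7, 5, 8, 7, 10, 6, 7, 8] * 25   # 300 made-up ratings
t_stat, p_value = one_sample_t_test(ratings, target=7)
print(f"t = {t_stat:.2f}, approximate p = {p_value:.4f}")
```

A small p-value suggests the sample mean differs from the target by more than chance alone would
typically produce.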

Presentation of Findings:

- The results of the study, including descriptive statistics, inferential statistics, and any significant
findings from hypothesis testing, are presented in a clear and understandable manner.

- Visual aids, such as charts or graphs, may be used to enhance the presentation.

Decision-Making:

- Based on the statistical analysis, the company can make informed decisions regarding potential
improvements to the smartphone's battery life or marketing strategies to address customer satisfaction.

This scenario illustrates how statistics can be applied in a real-world business context to gather insights,
make informed decisions, and enhance the overall understanding of customer satisfaction.

Branches of Statistics

Statistics can be broadly categorized into two main branches:

1. Descriptive Statistics:

- Measures of Central Tendency: Includes mean, median, and mode, which provide a central value or
typical value of a set of data.

- Measures of Dispersion: Involves range, variance, and standard deviation, which quantify the spread
or dispersion of data points.

- Frequency Distributions: Organizing and summarizing data into tables or graphs to show the
frequency of different values or ranges.

Let's consider a scenario:

• A high school wants to examine the academic performance of its students. The school
administration is interested in understanding the overall performance of students in a recent
final exam for a particular subject. Descriptive statistics can be employed to summarize and
analyze the exam scores.

Here's how the scenario unfolds:

Objective: The school administration aims to describe the performance of students in the final exam for
a specific subject.

Data Collection: The exam scores of 100 students who took the final exam in the subject are collected.
Each student's score is recorded as a numerical value.

Data Analysis and Summary:

- Descriptive statistics are used to summarize the exam scores. This includes calculating measures such
as the mean (average), median, mode, range, and standard deviation.

- A frequency distribution may be created to show the distribution of scores and highlight common
score ranges.
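
The range, mode, and a simple frequency distribution can be sketched like this. The scores are
invented for illustration, and the 10-point bin width is an arbitrary choice.

```python
from collections import Counter
import statistics

# Hypothetical exam scores (the scenario has 100; a few are enough to show the idea).
scores = [55, 62, 68, 71, 71, 74, 78, 80, 83, 85, 85, 85, 90, 94]

score_range = max(scores) - min(scores)   # spread between lowest and highest score
modes = statistics.multimode(scores)      # most frequent score(s), as a list

# Frequency distribution: group scores into 10-point bins (50-59, 60-69, ...).
bins = Counter((s // 10) * 10 for s in scores)
for lower in sorted(bins):
    print(f"{lower}-{lower + 9}: {'#' * bins[lower]}")
```

Printing one "#" per score gives a rough text histogram, which is often enough to see where the scores
cluster before drawing a proper chart.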

Visualization:

- A histogram or a box plot can be generated to visually represent the distribution of exam scores.

- A pie chart or bar graph might be used to show the percentage of students who fall into different
performance categories (e.g., excellent, good, satisfactory, needs improvement).

Interpretation:

- The mean score provides an average performance level, while the standard deviation indicates the
extent of variability in the scores.

- The median gives the middle score and the mode the most common score, which complement the mean as
measures of central tendency.

Identifying Trends and Patterns:

- Descriptive statistics can help identify any trends or patterns in the distribution of scores. For
example, skewness in the distribution may suggest whether most students scored high with a few very
low scores (negative skew) or most scored low with a few very high scores (positive skew).

Communication of Results:

- The summary statistics and visual representations are communicated to teachers, parents, and
students in a report or presentation.

- The report may highlight areas where students excelled or struggled, helping to guide educational
interventions or improvements in teaching methods.

Decision-Making:

- The school administration can use the descriptive statistics to make informed decisions about
potential changes in the curriculum, teaching strategies, or additional support for students based on
their performance levels.

This scenario demonstrates how descriptive statistics can be applied to summarize and communicate the
essential features of a dataset, providing valuable insights for decision-making in an educational context.

2. Inferential Statistics:

- Hypothesis Testing: Aims to make inferences or decisions about a population based on a sample of
data. It involves testing a hypothesis about the population parameter.

- Regression Analysis: Examines the relationship between one or more independent variables and a
dependent variable. It is used for predicting or modeling.

- Analysis of Variance (ANOVA): Compares means among different groups to determine if there are
statistically significant differences (for example, whether outcomes differed after a particular decision
was made).

- Probability Distributions: Describe the likelihood of different outcomes in a random experiment.
Common distributions include the normal, binomial, and Poisson distributions.
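
To make the three named distributions concrete, here is a small sketch that evaluates one probability
from each. The helper names binomial_pmf and poisson_pmf and the example numbers are illustrative
assumptions.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events when lam events are expected on average)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

p_heads = binomial_pmf(3, 10, 0.5)              # exactly 3 heads in 10 fair coin flips
p_arrivals = poisson_pmf(2, 4.0)                # exactly 2 arrivals when 4 are expected
p_normal = NormalDist(mu=0, sigma=1).cdf(1.96)  # P(Z <= 1.96) for a standard normal

print(p_heads, p_arrivals, p_normal)
```

The binomial and Poisson distributions describe counts (discrete outcomes), while the normal
distribution describes a continuous quantity, so it is evaluated through its cumulative probability.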

Let's consider a scenario:

• A pharmaceutical company has developed a new drug intended to lower blood pressure. The
company is interested in making inferences about the effectiveness of the drug on a larger
population based on a sample of patients who participated in a clinical trial.

Here's how the scenario unfolds:

Objective: The pharmaceutical company aims to infer the impact of their new blood pressure drug on
the overall population.

Data Collection and Experimental Design:

- A randomized controlled clinical trial is conducted with 500 participants who have high blood
pressure.

- Half of the participants are randomly assigned to receive the new drug, while the other half receive a
placebo: a treatment that appears real but has no active ingredients and is designed to have no
therapeutic benefit. A placebo can be a sugar pill, a water or saline (salt water) injection, or even a
sham surgical procedure.

- Blood pressure measurements are recorded before and after the treatment for each participant.

Data Analysis and Summary:

- Descriptive statistics are used to summarize the blood pressure measurements within each group
(drug and placebo), including mean, median, and standard deviation.

- Inferential statistics, such as a t-test or ANOVA, are used to compare the mean blood pressure
changes between the two groups.

Confidence Intervals:

- The company calculates confidence intervals around the mean difference in blood pressure changes
to estimate a range that is likely to contain the true population effect.

- For example, they may report a 95% confidence interval for the mean difference in blood pressure
changes after using the drug.
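
A sketch of a confidence interval for the difference between two group means, using unpooled
variances and a normal approximation. The function name and the blood pressure changes are made up for
illustration, and the tiny lists are only for readability: with so few values a t-based interval would
be more appropriate than the normal approximation assumed here.

```python
import math
import statistics
from statistics import NormalDist

def mean_difference_ci(group_a, group_b, confidence=0.95):
    """Normal-approximation CI for mean(group_a) - mean(group_b)."""
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    std_err = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * std_err, diff + z * std_err

# Hypothetical changes in systolic pressure (after minus before), in mmHg.
drug_change = [-12, -9, -15, -8, -11, -14, -10, -13]
placebo_change = [-3, -1, -5, 0, -4, -2, -6, -2]

low, high = mean_difference_ci(drug_change, placebo_change)
print(f"95% CI for the difference in mean change: ({low:.2f}, {high:.2f}) mmHg")
```

If the whole interval lies below zero, the data are consistent with the drug lowering blood pressure
more than the placebo does.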

Hypothesis Testing:

- The company may formulate hypotheses, such as whether there is a statistically significant difference
in blood pressure changes between the drug and placebo groups.

- Statistical tests are conducted to assess the significance of observed differences and determine if they
are likely to be real or due to random chance.
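
One way to see what "due to random chance" means is a permutation test. Note that this is a different
technique from the t-test or ANOVA mentioned earlier, swapped in here because it needs no distribution
tables. The sketch shuffles the group labels many times and counts how often a difference at least as
large as the observed one appears. The data, function name, and number of shuffles are illustrative
assumptions.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Approximate two-sided p-value for a difference in means via label shuffling."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical changes in systolic pressure (after minus before), in mmHg.
drug_change = [-12, -9, -15, -8, -11, -14, -10, -13]
placebo_change = [-3, -1, -5, 0, -4, -2, -6, -2]

p_value = permutation_p_value(drug_change, placebo_change)
print(f"approximate p-value: {p_value}")
```

If random relabeling almost never reproduces the observed gap, chance alone is an unlikely
explanation for it.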

Generalization to the Population:

- Based on the results from the sample, the pharmaceutical company makes inferences about the
broader population of individuals with high blood pressure.

- They may conclude, for instance, that the new drug is effective in lowering blood pressure in the
overall population.

Communication of Results:

- The findings, including statistical significance and effect size, are communicated to regulatory
authorities, healthcare professionals, and the public in a comprehensive report.

- The report may also discuss any potential side effects or limitations of the study.

Decision-Making:

- The pharmaceutical company, regulatory bodies, and healthcare providers use the inferential
statistics to make decisions about the drug's approval, marketing, and prescription recommendations.

This scenario illustrates how inferential statistics are applied to draw conclusions about a population
based on a sample, particularly in the context of assessing the efficacy of a new medical intervention.

Beyond these two main branches, there are several specialized areas within statistics, including:

Biostatistics: Focuses on statistical methods and techniques in biology and medical research.

Econometrics: Applies statistical methods to economic data to test hypotheses and forecast future
trends.

Social Statistics: Deals with statistical analysis of social phenomena, such as demographics and social
trends.

Statistical Quality Control: Concentrates on maintaining and improving the quality of processes and
products through statistical methods.

Bayesian Statistics: Utilizes Bayesian probability to update beliefs or probabilities based on new
evidence.

Multivariate Statistics: Involves the analysis of data with more than one outcome variable, exploring
relationships and patterns among multiple variables simultaneously.

These branches represent the diverse applications of statistics in various fields, each with its own set of
methods and techniques tailored to address specific types of data and research questions.

Variables and Their Types

• In statistics and research, a variable is any characteristic, attribute, or quantity that can be
measured or counted. Variables can be classified into different types based on their nature and
the level of measurement. The main types of variables are:

Categorical Variables:

- Nominal Variables: These variables represent categories with no inherent order or ranking. Examples
include gender, ethnicity, or eye color.

- Ordinal Variables: Categories have a meaningful order, but the intervals between them are not
necessarily equal. Examples include education levels or socioeconomic classes.

Numerical Variables:

- Discrete Variables: These are variables that can only take distinct, separate values, typically integers.
Examples include the number of students in a class or the count of cars in a parking lot.

- Continuous Variables: These variables can take any value within a range and can have infinite decimal
places. Examples include height, weight, or temperature.

Independent and Dependent Variables:

- Independent Variable: The variable that is manipulated or changed in an experiment or study. It is
believed to have a direct effect on the dependent variable and is used to explain or predict changes in
it.

- Dependent Variable: The variable that is observed or measured in response to changes in the
independent variable. It is the outcome variable being studied and measured.

Let's consider a scenario:

• In the field of education, researchers are investigating the impact of different teaching
methods on students' academic performance. The independent variable is the teaching method
used, and the dependent variable is the students' academic performance.

Scenario: Impact of Teaching Methods on Academic Performance

Objective: Researchers aim to understand how different teaching methods affect students' academic
performance in a high school mathematics course.

Independent Variable: Teaching Method

Three teaching methods are chosen as the levels of the independent variable:

- Traditional Lectures

- Interactive Group Discussions

- Online Simulations

Dependent Variable: Academic Performance

The academic performance of students is measured through their scores in a standardized mathematics
test administered at the end of the semester.

Experimental Design:

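- 150 students are randomly assigned to the three teaching methods. Because a different teacher
delivering each method would mix teacher-specific effects with the effect of the method, each method
should ideally be delivered by the same teacher, or by several teachers who rotate across methods.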

- The students' initial math scores, demographic information, and any other relevant factors are
recorded as covariates to control for their influence.

Data Collection:

- At the end of the semester, all students take the same standardized mathematics test, and their
scores are recorded.

Data Analysis:

- The researchers use statistical analysis, such as Analysis of Variance (ANOVA), to compare the mean
scores of students in the three teaching methods.

- Post-hoc tests may be conducted to identify specific differences between pairs of teaching methods if
the ANOVA indicates a significant overall difference.
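
The core of a one-way ANOVA is the F statistic: the variation between group means divided by the
variation within groups. The sketch below computes it by hand. The function name and the scores are
hypothetical, and turning F into a p-value requires the F distribution, which a library routine such
as scipy.stats.f_oneway handles.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Made-up test scores for the three teaching methods.
lectures = [70, 72, 68, 75, 71]
discussions = [80, 78, 85, 82, 79]
simulations = [74, 76, 73, 77, 75]

f_stat = one_way_anova_f([lectures, discussions, simulations])
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ by more than the spread inside each group would suggest.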

Results:

- The researchers find that there is a statistically significant difference in mean test scores among the
three teaching methods.

Conclusion:

- Based on the statistical analysis, the researchers conclude that the teaching method has a significant
impact on students' academic performance.

Interpretation:

- Further analysis reveals that the group receiving interactive group discussions had the highest mean
score, suggesting that this teaching method may be more effective in enhancing academic performance
compared to traditional lectures and online simulations.

Implications and Recommendations:

- The research findings may have implications for educational policies and teaching practices.

- Recommendations might be made for incorporating more interactive group discussions in
mathematics classes to improve students' understanding and performance.

In this scenario, the independent variable is the teaching method because it is manipulated by the
researchers to observe its effect on the dependent variable, which is the academic performance of the
students.

Explanatory and Response Variables:

- Explanatory Variable: Similar to the independent variable, it is used to explain or predict changes in
the response variable.

- Response Variable: Similar to the dependent variable, it is the variable being studied and measured.

Qualitative and Quantitative Variables:

- Qualitative Variables: Include nominal and ordinal variables. They represent qualities or
characteristics.

- Quantitative Variables: Include discrete and continuous variables. They represent quantities or
numerical values.

Binary Variables:

- A special case of categorical variables with only two possible values, such as 0 and 1, true and false, or
yes and no.

Understanding the types of variables is crucial in designing experiments, conducting statistical analyses,
and drawing meaningful conclusions from data. The choice of the appropriate statistical methods often
depends on the types of variables involved in a study.

Levels of Measurement

• In statistics, levels of measurement, also known as scales of measurement, refer to the different
ways in which variables can be categorized based on the nature of the data. There are four main
levels of measurement, each with its own characteristics and constraints:

Nominal Level of Measurement:

- Represents categories or labels with no inherent order or ranking.
- Only allows for qualitative classification.
- Examples include gender, eye color, or types of cars.
- Operations like counting or determining mode are applicable.

Ordinal Level of Measurement:

- Represents categories with a meaningful order or ranking.
- Intervals between categories are not necessarily equal or meaningful.
- Provides more information than the nominal level but lacks precise measurement.
- Examples include educational levels (e.g., high school, college, graduate) or socioeconomic classes.
- Operations like counting, determining mode or median, and comparing ranks are applicable.

Interval Level of Measurement:

- Represents ordered numerical values, where the intervals between values are consistent and
meaningful.
- Has no true zero point, meaning that a value of zero does not indicate the absence of the
characteristic being measured.
- Examples include temperature measured in Celsius or Fahrenheit.
- Operations like addition, subtraction, determining mean, and calculating standard deviation are
applicable. Ratios are not meaningful: 20 °C is not twice as hot as 10 °C.

Ratio Level of Measurement:

- Similar to interval level but has a true zero point, indicating the absence of the characteristic being
measured.
- Allows for all arithmetic operations, including multiplication and division.
- Examples include height, weight, income, and age.
- Operations like addition, subtraction, multiplication, division, determining mean, and calculating
standard deviation are applicable.

These levels of measurement represent a hierarchy, where each level includes the characteristics of the
levels below it and adds additional properties. Researchers need to carefully consider the level of
measurement when designing studies, selecting statistical methods, and interpreting results, as the
appropriate statistical analyses depend on the nature of the data and the level of measurement.
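
The hierarchy can be sketched as a small lookup in which each level inherits the operations of the
levels below it. The names LEVELS, NEW_OPERATIONS, and permitted_operations are hypothetical, and
listing the median under the ordinal level is a standard convention assumed here.

```python
# Levels ordered from least to most informative.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations that first become meaningful at each level.
NEW_OPERATIONS = {
    "nominal": {"count", "mode"},
    "ordinal": {"rank comparison", "median"},
    "interval": {"addition", "subtraction", "mean", "standard deviation"},
    "ratio": {"multiplication", "division"},
}

def permitted_operations(level):
    """All operations meaningful at `level`, including those inherited from lower levels."""
    position = LEVELS.index(level)
    operations = set()
    for lower_level in LEVELS[: position + 1]:
        operations |= NEW_OPERATIONS[lower_level]
    return operations

print(sorted(permitted_operations("ordinal")))
```

For example, "mean" is not in the ordinal set, which is why averaging ranked categories such as
education levels is usually discouraged.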

Population and Sample

• In statistics, a population and a sample are two important concepts that are fundamental to the
process of gathering and analyzing data.

Population:

A population refers to the entire group that is the subject of the study. It includes all individuals, items,
or data points that meet certain criteria and share a common characteristic.

Example: If you are studying the average height of all adult males in a country, the population would
consist of every adult male in that country.

Sample:

A sample is a subset of the population, selected for the purpose of the study. It is chosen in a way that
allows it to represent the larger population from which it is drawn.

Example: In the height study mentioned earlier, instead of measuring the height of every adult male in
the country (population), you might select a sample of, say, 500 adult males, and measure their heights.

Key Points:

- The population is the entire group you want to study, while the sample is a smaller group selected from
the population.
- Due to practical constraints such as time, cost, and resources, researchers often use samples to make
inferences about populations.
- The goal is for the sample to be representative of the population, so that findings from the sample can
be generalized to the entire population.
- Statistical techniques are often used to analyze data from samples and draw conclusions or make
predictions about the population.

Example:

Suppose you want to study the average income of residents in a city. The population would be all
residents in the city, but it might be impractical to collect data from every single resident. Instead, you
could randomly select a sample of households, collect income data from those households, and use the
sample information to estimate the average income for the entire population.

Understanding the distinction between population and sample is crucial in the design of experiments,
surveys, and research studies, as it helps ensure the validity and generalizability of the findings.

Parameter and Statistic

In statistics, "parameter" and "statistic" are terms that refer to numerical measures that describe aspects
of a population or a sample, respectively.

Parameter:

A parameter is a numerical characteristic of a population. It is a fixed value that provides information
about the entire population.

Example: If you want to know the average income of all households in a city, the actual average income
(mean) of all households in the entire city is a population parameter.

Statistic:

A statistic is a numerical characteristic of a sample. It is a measure calculated from data collected from a
subset (sample) of the population.

Example: If you randomly select 100 households from the city and calculate the average income (mean)
of these households, that calculated average is a sample statistic.

Key Points:

- Parameters describe characteristics of an entire population, while statistics describe characteristics of a
sample.
- Parameters are usually fixed values (although they may not be known), while statistics vary from
sample to sample.
- Parameters are typically denoted by Greek letters (e.g., μ for population mean), while statistics are
usually denoted by corresponding Latin letters (e.g., x̄ for sample mean).
- The goal of inferential statistics is often to use sample statistics to make inferences about population
parameters.
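
The key points above can be illustrated with a short simulation. It builds a hypothetical population
of student heights (the numbers are invented), computes the population mean once, and then draws
several random samples to show that the sample mean changes from sample to sample.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of all 1,200 students in a school.
heights = [rng.gauss(165, 8) for _ in range(1200)]

mu = statistics.fmean(heights)  # parameter: fixed, but usually unknown in practice

# Statistics: the mean of each of five random samples of 50 students.
sample_means = [statistics.fmean(rng.sample(heights, 50)) for _ in range(5)]

print(f"population mean (parameter): {mu:.2f}")
print("sample means (statistics):", [round(m, 2) for m in sample_means])
```

Each sample mean lands near the population mean but not exactly on it, which is the variability that
inferential statistics has to account for.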

Example:

Suppose you want to estimate the average height of all students (population) in a particular school. If
you measure the height of 50 randomly selected students (sample) and calculate the average height
from this sample, that average is a sample statistic. The average height of all students in the school,
which you are trying to estimate, is the population parameter.

Sampling Techniques

Sampling techniques in statistics are methods used to select a subset of elements from a larger
population in order to make inferences about the population. The goal is to obtain a sample that is
representative of the population to draw valid conclusions. Here are some common sampling techniques
used in statistics:

Simple Random Sampling:

Every individual in the population has an equal chance of being selected, and every possible sample of
the chosen size is equally likely to be drawn.

Procedure: Use a random method, such as a random number generator or drawing lots, to ensure each
individual has an equal chance of being chosen.
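
A minimal sketch of simple random sampling with Python's random module. The customer ID list stands in
for a sampling frame and is hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement

customer_ids = list(range(1, 1001))  # hypothetical sampling frame of 1,000 customers
chosen = simple_random_sample(customer_ids, 10, seed=1)
print(chosen)
```

Passing a seed makes the draw reproducible, which helps when a sample needs to be audited later.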

Stratified Random Sampling:

The population is divided into subgroups or strata and then random samples are taken from each
stratum.

Procedure: Ensure that each subgroup is adequately represented in the sample by proportionally
sampling from each stratum.
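
Proportional stratified sampling might look like the sketch below. The function name, the stratum_of
callback, and the student data are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Sample the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # group units by stratum

    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # take at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical school: 200 grade 9 students and 100 grade 10 students.
students = [("grade9", i) for i in range(200)] + [("grade10", i) for i in range(100)]
picked = stratified_sample(students, stratum_of=lambda s: s[0], fraction=0.1, seed=3)
print(len(picked))
```

Because each grade contributes 10% of its members, the sample keeps the same 2:1 grade mix as the
population.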

Systematic Sampling:

Select every kth individual from the population after choosing a random starting point.

Procedure: Determine the sampling interval (k) by dividing the population size by the desired sample
size.
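
The interval calculation and selection can be sketched as follows. The function name is hypothetical,
and the sketch assumes the population is held in a list and that the sample size does not exceed the
population size.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick every k-th unit after a random start, where k = population size // n."""
    rng = random.Random(seed)
    k = len(population) // n   # sampling interval
    start = rng.randrange(k)   # random starting point within the first interval
    return [population[start + i * k] for i in range(n)]

frame = list(range(100))       # hypothetical ordered list of 100 units
selected = systematic_sample(frame, 10, seed=0)
print(selected)
```

If the list has a repeating pattern with the same period as k, systematic sampling can give a biased
sample, so the ordering of the list deserves a quick check.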

Cluster Sampling:

The population is divided into clusters, and then a random sample of clusters is selected. All individuals
within the selected clusters are included in the sample.

Procedure: Randomly select clusters and include all members of those clusters in the sample.
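
A sketch of cluster sampling in which whole clusters are drawn at random and every member of a chosen
cluster is kept. The classroom data and function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters and return all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # choose cluster names
    return [member for name in chosen for member in clusters[name]]

# Hypothetical clusters: four classrooms with three students each.
classrooms = {
    room: [f"{room}-student{i}" for i in range(1, 4)]
    for room in ["9A", "9B", "9C", "9D"]
}
surveyed = cluster_sample(classrooms, 2, seed=5)
print(surveyed)
```

Randomness enters only at the cluster level: once a classroom is chosen, all of its students are
included.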

Multistage Sampling:

Combines several sampling methods in stages. It may involve both cluster sampling and stratified
sampling.

Procedure: Select clusters at the first stage, and then apply another sampling method (such as simple
random sampling) within each selected cluster.
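
A two-stage version of this procedure can be sketched as below: clusters are drawn first, then a
simple random sample is taken inside each chosen cluster. The school data, function name, and sample
sizes are illustrative assumptions.

```python
import random

def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: simple random sample within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        # Never request more units than the cluster contains.
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return sample

# Hypothetical clusters: five schools with 30 students each.
schools = {
    name: [f"{name}-{i}" for i in range(30)]
    for name in ["North", "South", "East", "West", "Central"]
}
picked = multistage_sample(schools, n_clusters=2, n_per_cluster=5, seed=11)
print(picked)
```

Compared with plain cluster sampling, only some members of each chosen cluster are surveyed, which
reduces cost when clusters are large.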

Convenience Sampling:

Involves selecting individuals who are easiest to reach or most convenient to include in the study.

Procedure: No specific method; researchers choose participants based on availability and accessibility.

Snowball Sampling:

Participants are selected, and then they refer or recruit additional participants, leading to the sample
growing like a snowball.

Procedure: The sample expands as participants refer others to participate.

Quota Sampling:

Researchers aim to maintain a specific proportion of certain characteristics within the sample.

Procedure: Participants are selected based on pre-defined quotas for characteristics like age, gender, or
socioeconomic status.

Purposive Sampling (or Judgmental Sampling):

Researchers deliberately select individuals based on specific criteria relevant to the research question.

Procedure: Participants are chosen based on the researcher's judgment about their suitability for the
study.

The choice of a specific sampling technique depends on the research question, the characteristics of the
population, available resources, and practical considerations. It is important to carefully consider the
implications of each method to ensure the validity and reliability of the statistical analysis.
