
LESSON 1 - A | Basic Concepts of Statistics

Statistics is a branch of mathematics that deals with the collection, classification, description, and interpretation of data
obtained by the conduct of surveys and experiments. Its fundamental purpose is to describe and draw inferences about
the numerical properties of a population.

Population - is the entirety of individuals or objects of interest.


Sample - is a portion or part of the population of interest.

The measures of the population are called parameters and the measures of the sample are called statistics or estimates.

Data are facts gathered through experience, observation, or experiment. They may be numbers, words, or images that
record measurements or observations of a set of variables. Data can be drawn from a population or from a sample.

Types of Data:
a. Qualitative Data - categorical data, which take the form of categories or attributes (e.g., sex, year level,
religion).
b. Quantitative Data - numerical data obtained from measurements (e.g., weight, height, age, scores).

Presentation of Data:
The study of statistics begins with the collection of data and measurements. Data collected should be organized
systematically for easier and faster interpretation. They may be presented in tabular or graphical form.

A table is used to present data in a systematic and organized manner so that reading and interpretation are simpler
and easier. When a table is used, remember the following:
1. Give the table a title.
2. Arrange the data systematically in columns. The columns must be properly labeled.

Another type of tabular presentation is the frequency distribution table. It is an arrangement of the data that shows the
frequency of occurrence of different values of variables.
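
As a minimal sketch of a frequency distribution table, the Python snippet below counts how often each value occurs.
The quiz scores are hypothetical and made up only for illustration.

```python
from collections import Counter

# Hypothetical quiz scores (illustrative data, not from the lesson)
scores = [7, 8, 8, 9, 10, 7, 8, 9, 8, 10]

# Count how many times each distinct value occurs
frequency = Counter(scores)

# Print the table with one labeled column per quantity
print("Score  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```

The frequencies always add up to the number of measures, which is a quick check that no value was missed.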

Graphical Presentation of Data


● Histogram - is made up of vertical bars that are joined together, making it an appropriate graph for continuous data.
The base of each bar or rectangle spans the class boundaries, and its height corresponds to the class
frequency.
● Bar Graph - is used to present discrete data, where the bars are separated.
● Pie Chart - is used to show percentage distribution.
● Line Graph - a graph that is used to show trends over a period of time.

A variable is a characteristic of an individual or object that can be measured. A variable must vary, that is, take
different values in the study.

LESSON 1 - B | Measures of Central Tendency and Measures of Variability

Aside from tables and graphs, another way of describing a set of data is by stating a single numerical value associated
with it. This value is where all the other values in a distribution tend to cluster. It is called the average or measure of
central tendency. There are three kinds of average: the mean, median, and mode.

MEAN
The mean is the most commonly used measure of central position. It is the sum of all the measures divided by the number
of measures. The mean is used to describe a set of data where the measures cluster or concentrate at a point. As the
measures cluster around each other, a single value can represent the whole set. It is, however, affected by extreme
measures, that is, very high or very low measures can easily change the value of the mean.
MEDIAN
The median is the middle entry or term in a set of data arranged in either increasing or decreasing order.

To find the median of a given set of data, take note of the following:
1. Arrange the data in either increasing or decreasing order.
2. Locate the middle value. If the number of cases is odd, the middle value is the median. If the number of cases is even,
take the arithmetic mean of the two middle measures.
MODE
The mode is another measure of position. The mode is the measure or value which occurs most frequently in a set of
data. It is the value with the greatest frequency.

To find the mode for a set of data:

1. Select the measure that appears most often in the set.
2. If two or more measures appear the same number of times, and that frequency is greater than that of any other
measure, then each of these values is a mode.
3. If every measure appears the same number of times, then the set of data has no mode.
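
The three averages can be sketched in Python as follows. The scores are hypothetical, and find_modes is an
illustrative helper (not a standard function) written to follow the three rules for the mode given above. It assumes
a non-empty list.

```python
import statistics
from collections import Counter

def find_modes(values):
    """Return the list of modes of a non-empty list; [] means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # Rule 3: several distinct values, all equally frequent -> no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    # Rules 1 and 2: every value that reaches the highest frequency is a mode
    return [v for v, c in counts.items() if c == highest]

# Hypothetical scores; 50 is an extreme measure
scores = [3, 5, 5, 6, 7, 8, 50]

mean_all = statistics.mean(scores)             # 84 / 7 = 12
mean_without_extreme = statistics.mean(scores[:-1])  # 34 / 6, about 5.67
median_odd = statistics.median(scores)         # middle of 7 values -> 6
median_even = statistics.median(scores[:-1])   # mean of 5 and 6 -> 5.5
modes = find_modes(scores)                     # 5 occurs twice -> [5]

print(mean_all, mean_without_extreme, median_odd, median_even, modes)
```

Removing the single extreme score moves the mean from 12 to about 5.67, while the median only moves from 6 to 5.5.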

MEASURES OF VARIABILITY
A measure of variability is a summary statistic that represents the amount of dispersion in a dataset.

How spread out are the values?


While a measure of central tendency describes the typical value, measures of variability define how far away the data
points tend to fall from the center. We talk about variability in the context of a distribution of values. A low dispersion
indicates that the data points tend to be clustered tightly around the center. High dispersion signifies that they tend to fall
further away.

Variance - average of squared distances from the mean


Standard Deviation - square root of the variance; it expresses the typical distance from the mean in the same units as the data
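
A minimal Python sketch of both measures, using hypothetical data. It divides by the number of measures, which matches
the "average of squared distances" definition above (the population form). When the data are a sample, the sum of
squared distances is commonly divided by n - 1 instead; that variant is available as statistics.variance.

```python
import math
import statistics

# Hypothetical measures (illustrative data)
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                          # 40 / 8 = 5.0
squared_distances = [(x - mean) ** 2 for x in data]   # 9, 1, 1, 1, 0, 0, 4, 16
variance = sum(squared_distances) / len(data)         # 32 / 8 = 4.0
std_dev = math.sqrt(variance)                         # 2.0

# Cross-check against the standard library's population versions
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(std_dev, statistics.pstdev(data))

print(variance, std_dev)
```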

LESSON 2 | Random Variables and Probability Distributions

A random variable is a quantitative variable whose value is determined by the outcome of a random experiment.

Notation:
We use a CAPITAL LETTER, say X, to denote a random variable, and a corresponding small letter, x, for one of its
values.

● Discrete Random Variable - has a finite number of possible values, or infinitely many values that can be listed
using whole numbers (countable). These values usually arise from counts.
● Continuous Random Variable - can take any value within an interval, so its infinitely many possible values cannot be
listed using whole numbers. These values usually arise from measurements.

PROBABILITY - Probability defines the likelihood of occurrence of an event.

● Experiment: A trial or an operation conducted to produce an outcome is called an experiment.


● Random Experiment: An experiment that has a well-defined set of possible outcomes, but whose actual outcome cannot be
predicted with certainty, is called a random experiment.
● Sample Space: All the possible outcomes of an experiment together constitute a sample space.
● Event: Any set of outcomes of a random experiment (a subset of the sample space) is called an event.

Probability Distribution of a Discrete Random Variable


● tells all of its possible values [x] along with their associated probabilities [P(x)]
● Any probability distribution of a discrete random variable must satisfy the following 2 conditions:
1. Each probability P(x) must be between 0 and 1: 0 ≤ P(x) ≤ 1. (If the probability of an event is 0, the event is
impossible; if it is 1, the event is certain.)
2. The sum of all the probabilities is 1: ∑ P(x) = 1
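
The two conditions can be checked with a short Python sketch. The example distribution (X = number of heads in two
tosses of a fair coin) and the helper name is_valid_distribution are illustrative assumptions. Fractions are used so
that the sum is exact.

```python
from fractions import Fraction

def is_valid_distribution(dist):
    """Check the two conditions for a discrete probability distribution."""
    # Condition 1: every probability lies between 0 and 1
    each_in_range = all(0 <= p <= 1 for p in dist.values())
    # Condition 2: the probabilities add up to exactly 1
    sums_to_one = sum(dist.values()) == 1
    return each_in_range and sums_to_one

# X = number of heads in two tosses of a fair coin (outcomes: TT, TH, HT, HH)
heads_in_two_tosses = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# A table whose probabilities add up to 5/4, so it is not a distribution
not_a_distribution = {0: Fraction(1, 2), 1: Fraction(3, 4)}

print(is_valid_distribution(heads_in_two_tosses))
print(is_valid_distribution(not_a_distribution))
```

With decimal probabilities, comparing the sum to 1 using math.isclose avoids false failures caused by rounding.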

Probability Histogram of a Discrete Random Variable


● a graphical representation of a probability distribution.
● Displays possible values of a random variable along the horizontal axis, probabilities along the vertical axis.

LESSON 3 | Sampling Techniques

1. PROBABILITY SAMPLING
● Probability sampling is a sampling technique in which every member of the population has a known, nonzero chance of
being selected for the sample; in the simplest case, every member has an equal chance.
● Also known as random sampling
● Unbiased

2. NON-PROBABILITY SAMPLING
● Nonprobability sampling is a method of sampling in which selection is not random, so the chance that a given
individual from the population will be selected is not known.
● Also known as non-random sampling
● Biased

Four Types of Random Sampling Techniques:

1. Simple Random Sampling (SRS) - Simple random sampling is a sampling technique in which every element of the
population has the same probability of being selected for inclusion in the sample.

a. Lottery Method - Every member is assigned a unique number. These numbers are put in a jar and thoroughly
mixed. After that, the researcher picks some numbers without looking at them, and the people with those numbers
are included in the study.
b. Use of Table of Random Numbers - This table consists of a series of digits (0-9) that are generated randomly.
The numbers are arranged in rows and columns and can be read in any direction. All the digits are equally
probable.
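
In Python, the lottery method can be imitated with the standard random module. This is only a sketch: the population
of 50 numbered members and the seed value are assumptions chosen for illustration.

```python
import random

# Hypothetical population: 50 members, each assigned a unique number
population = list(range(1, 51))

# A fixed seed makes the draw repeatable; omit it for a fresh draw each run
rng = random.Random(42)

# Draw 10 numbers without replacement; every member is equally likely
sample = rng.sample(population, k=10)

print(sorted(sample))
```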

2. Systematic sampling - Systematic sampling is a random sampling technique in which a list of elements of the
population is used as a sampling frame and the elements to be included in the desired sample are selected by skipping
through the list at regular intervals.
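
A minimal Python sketch of this idea follows. The helper name systematic_sample, the random starting point, and the
list of 100 numbered elements are assumptions made for illustration. The interval is the population size divided by
the desired sample size, ignoring any remainder.

```python
import random

def systematic_sample(frame, sample_size, rng):
    """Pick every k-th element of the sampling frame after a random start."""
    interval = len(frame) // sample_size   # the skip interval k
    start = rng.randrange(interval)        # random start among the first k elements
    return frame[start::interval][:sample_size]

# Hypothetical sampling frame: elements numbered 1 to 100
frame = list(range(1, 101))

chosen = systematic_sample(frame, sample_size=10, rng=random.Random(0))
print(chosen)
```

Here the interval is 100 // 10 = 10, so consecutive selected elements are always 10 positions apart.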

3. Stratified Sampling - Stratified Sampling is a random sampling technique in which the population is first divided into
strata and then samples are randomly selected separately from each stratum.
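
The sketch below illustrates stratified sampling in Python. It assumes proportional allocation (the same fraction is
drawn from every stratum), which is one common choice and not the only one. The grade-level strata and the helper name
stratified_sample are hypothetical.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Randomly select the same fraction of members from each stratum."""
    chosen = {}
    for name, members in strata.items():
        size = round(len(members) * fraction)      # sample size for this stratum
        chosen[name] = rng.sample(members, size)   # simple random sample within it
    return chosen

# Hypothetical population divided into two strata by grade level
strata = {
    "Grade 11": [f"G11-{i}" for i in range(1, 61)],   # 60 students
    "Grade 12": [f"G12-{i}" for i in range(1, 41)],   # 40 students
}

chosen = stratified_sample(strata, fraction=0.10, rng=random.Random(1))
for name, members in chosen.items():
    print(name, len(members), members)
```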

4. Cluster or Area Sampling - Cluster or area sampling is a random sampling technique in which the entire population is
broken into small groups or clusters, and then, some of the clusters are randomly selected. The data from randomly
selected clusters are the ones that are analyzed.
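
A short Python sketch of cluster sampling, assuming hypothetical class sections as the clusters. Note that whole
clusters are drawn at random, and every member of a selected cluster enters the sample.

```python
import random

# Hypothetical clusters: class sections and their members
clusters = {
    "Section A": ["A1", "A2", "A3"],
    "Section B": ["B1", "B2", "B3", "B4"],
    "Section C": ["C1", "C2"],
    "Section D": ["D1", "D2", "D3"],
}

rng = random.Random(7)   # fixed seed so the draw is repeatable

# Step 1: randomly select some of the clusters (here, 2 of the 4)
selected_names = rng.sample(sorted(clusters), k=2)

# Step 2: include every member of each selected cluster
sample = [member for name in selected_names for member in clusters[name]]

print(selected_names, sample)
```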

Four Types of Non-Random Sampling Techniques:

1. Convenience Sampling - Convenience sampling is the most common type of non-probability sampling, which focuses
on gaining information from participants (the sample) who are ‘convenient’ for the researcher to access.

2. Purposive Sampling - This technique, also called judgmental or selective sampling, focuses on samples taken based on
the researcher's judgement.

3. Snowball Sampling - In this technique, the researcher chooses a possible respondent for the study at hand. Then each
respondent is asked to give recommendations or referrals to other potential respondents.

4. Quota Sampling - It is the nonprobability counterpart of stratified random sampling. In this technique, the
researcher starts by identifying quotas, which are predefined control categories such as age, gender, education, or
religion, and then selects participants non-randomly until each quota is filled.

LESSON 4 | Normal Distribution

The Normal Probability Distribution is a probability distribution of a continuous random variable. It models random
variables obtained through measurement, such as the height and weight of students. Its graph is sometimes called the
bell curve. It is used to describe the characteristics of populations and helps us visualize the inferences we make
about the population. Many sets of data approximately follow this pattern, which is why it is widely used in business,
schools, and other fields.

Characteristics of Normal Curve

The graphical representation of the normal distribution is popularly known as a normal curve. The normal curve is
described clearly by the following characteristics.
1. The distribution curve is bell-shaped.
2. The curve is symmetrical about its center.
3. The mean, median, and mode coincide at the center.
4. The width of the curve is determined by the standard deviation of the distribution.
5. The tails of the curve extend in both directions and flatten out indefinitely along the horizontal axis. The tails
are thus asymptotic to the baseline: they keep approaching the horizontal axis but never touch it. A line that a curve
approaches in this way is called an asymptote.
6. The total area under a normal curve is 1.

Empirical Rule

For normally distributed data, the empirical rule tells you approximately what percentage of the data falls within a
certain number of standard deviations from the mean:

About 68% of the data falls within one standard deviation of the mean.
About 95% of the data falls within two standard deviations of the mean.
About 99.7% of the data falls within three standard deviations of the mean.
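
These percentages can be checked with Python's statistics.NormalDist. In this sketch the mean of 100 and the standard
deviation of 15 are arbitrary, hypothetical values; the same shares result for any normal distribution.

```python
from statistics import NormalDist

# Hypothetical normal distribution (any mean and standard deviation would do)
dist = NormalDist(mu=100, sigma=15)

def share_within(k):
    """Area under the curve within k standard deviations of the mean."""
    upper = dist.mean + k * dist.stdev
    lower = dist.mean - k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The computed areas are about 0.6827, 0.9545, and 0.9973, which the empirical rule rounds to 68%, 95%, and 99.7%.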

Understanding the Standard Normal Curve

A Standard Normal Curve is a normal probability distribution that has a mean μ = 0 and a standard deviation σ = 1. To
calculate probabilities involving the normal distribution, we must calculate areas under the curve. The Table of Areas
under the Normal Curve is also known as the z-table. The z-score is a measure of relative standing: it tells how many
standard deviations a value lies above or below the mean.
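
As a sketch, a z-score can be computed with the usual formula z = (x - μ) / σ, and the corresponding area under the
standard normal curve can be obtained in Python instead of being read from a printed table. The score of 85, the mean
of 75, and the standard deviation of 5 are hypothetical values, and z_score is an illustrative helper name.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

# Hypothetical example: a score of 85 where the mean is 75 and the standard deviation is 5
z = z_score(85, mu=75, sigma=5)          # (85 - 75) / 5 = 2.0
area_below = standard_normal.cdf(z)      # area under the curve to the left of z

print(z, round(area_below, 4))
```

Here z = 2.0, and the area to the left of z is about 0.9772, so the score is higher than roughly 97.7% of the values.
Printed z-tables differ in which area they report (for example, the area from 0 to z instead of the area to the left
of z), so check the table's heading before comparing.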
