
Lesson 1: Classification & Organization of Data

Data Management - It is the development, execution, and
supervision of plans, policies, programs, and practices that
control, protect, deliver, and enhance the value of data and
information assets.
➢ It is the administrative process by which the required
data is acquired, validated, stored, protected, and
processed, and by which its accessibility, reliability,
and timeliness are ensured to satisfy the needs of
data users.
Statistics - The word statistics originated from the word
“status”, meaning “state”.
➢ It is the science that deals with the collection,
classification, analysis, and interpretation of
numerical facts or data, in such a way that valid
conclusions and meaningful predictions can be
drawn from them.
General Purposes of Statistics - Statistics are used to
organize and summarize the information so that the
researcher can see what happened in the research study
and can communicate the results to others.
➢ Statistics help the researcher to answer the
questions that initiated the research by determining
exactly what general conclusions are justified based
on the specific results that were obtained.
Methods of Data Gathering
➢ Direct or interview method
➢ Indirect or questionnaire method
➢ Registration method
➢ Observation method
➢ Experimental method
Direct or interview method
➢ It is a person-to-person encounter between the
source of information, the interviewee, and the one
who gathers information, the interviewer.
Indirect or questionnaire method
➢ It is the technique in which a questionnaire is used to
elicit the information or data needed.
Registration method
➢ It obtains data from the records of a government
agency authorized by law to keep such data or
information and to make these available to
researchers.
Example: Registration of Birth, Registration of
Marriage, Registration of Death
Observation method
➢ It is the technique in which data, particularly those
pertaining to the behaviors of individuals or groups of
individuals during a given situation, are gathered by
observing.
➢ To observe is to notice using all appropriate senses:
to see, hear, feel, taste, and smell.
➢ This is also used when the respondents cannot read
or write.
Experimental method
➢ It is a system used to gather data from the results of
a series of experiments performed on some controlled
and experimental variables. This is commonly used
in scientific inquiries.
➢ Independent variable (IV) - The independent
variable in an experiment is the variable that is
systematically manipulated by the investigator
(CAUSE)
➢ Dependent variable (DV) - The dependent variable
in an experiment is the variable that the
investigator measures to determine the effects of the
independent variable (EFFECT)
Scientific Method - the data from the experiment force a
conclusion consonant with reality. Thus scientific
methodology has a built-in safeguard for ensuring that
truth assertions of any sort about reality must conform to
what is demonstrated to be objectively true about the
phenomena before the assertions are given the status of
scientific truth.
➢ Descriptive Statistics - It involves the collection
and classification of data.
Example:
A. A bowler wants to find his bowling average for the
past 10 games.
B. A teacher wishes to determine the percentage of
students who passed the examination.
➢ Inferential Statistics - It involves the analysis and
interpretation of data.
Example:
A. A manager would like to predict, based on previous
years’ sales, the sales performance of a company
for the next five years.
B. A politician would like to estimate, based on an
opinion poll, his chance of winning in the upcoming
senatorial election.
Population - a population is the set of measurements
corresponding to the entire collection of units about which
the information is sought. It is the group of objects/subjects
about which conclusions are to be drawn.
Example:
A. The scores of all the students of Senior High School in
EAC-Cavite.
B. All children of any age who have older or younger
siblings in Barangay Lucsuhin.
Sample - a sample is a set of individuals selected from a
population, usually intended to represent the population
in a research study.
Example:
A. The scores of 50 students of Senior High School in
EAC-Cavite.
B. The 40 children who actually participated in one
specific study about siblings in Barangay Lucsuhin.
Sample Determination Formula
n = N / (1 + Ne²)
Where:
n = Sample size
N = Population size
e = Desired margin of error (usually 0.05 or 5%)
Example Problem
Compute the sufficient sample size of a target population
consisting of 1 524 sixth-graders in a given school district
using the sample determination formula.
Given:
N = 1 524
e = 0.05
Determine the sample size for each grade level given in the
table below.
Data - are measurements or observations. A data set is a
collection of measurements or observations.
Datum - is a single measurement or observation and is
commonly called a score or raw score.
➢ The measurements that are made on the subjects of
an experiment are also called data.
➢ Usually data consist of the measurements of the
dependent variable or of other subject
characteristics, such as age, gender, number of
subjects, and so on. The data as originally
measured are often referred to as raw or original
scores.
Types of Data
Qualitative Data - Data that deal with categories or
attributes
Example
➢ Color of Skin
➢ Course in Computer Engineering
Quantitative Data - Data that deal with numerical values
Example
➢ Number of units in one semester
➢ Grade point average

Discrete and Continuous Data
➢ Discrete Data - Data that are obtained by counting
Example
➢ Number of students in the classroom
➢ Number of cars in the parking lot
➢ Continuous Data - Data that are obtained by
measuring
Example
➢ Area of a mango farm in Pampanga
➢ Volume of water in a pool in Pansol, Laguna
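Returning to the sample determination example problem (N = 1 524, e = 0.05): the sketch below assumes the formula is Slovin's formula, n = N / (1 + Ne²), which matches the symbols n, N, and e listed in these notes. The function name `slovin_sample_size` is hypothetical, and rounding up to the next whole respondent is an assumption rather than something the notes specify.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Sample size under Slovin's formula n = N / (1 + N * e**2) (assumed formula)."""
    if population_size <= 0 or not 0 < margin_of_error < 1:
        raise ValueError("need N > 0 and 0 < e < 1")
    raw = population_size / (1 + population_size * margin_of_error ** 2)
    # Assumption: round up, since a fraction of a respondent cannot be sampled.
    return math.ceil(raw)


# Example problem from the notes: N = 1 524 sixth-graders, e = 0.05
n = slovin_sample_size(1524, 0.05)
print(n)  # 1524 / 4.81 is about 316.84, so this prints 317
```

Under these assumptions, about 317 of the 1 524 sixth-graders would be sampled. The per-grade-level part of the problem depends on a table that is not available here, so it is not attempted.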
Classification of Quantitative Data
➢ Continuous Data - It can assume any of an infinite
number of values and can be associated with points
on a continuous line interval.
Example: Height, Weight, Volume
➢ Discrete Data - It results from either a finite number
of possible values or a countable number of possible
values.
Example: number of students, number of books,
and number of patients.
Parameter - is a value, usually a numerical value that
describes a population.
➢ A parameter is usually derived from measurements
of the individuals in the population
Statistic - is a value, usually a numerical value that
describes a sample. A statistic is usually derived from
measurements of the individuals in the sample.
Sampling Error - is a naturally occurring discrepancy, or
error, that exists between a sample statistic and the
corresponding population parameter.
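The relationship between a parameter, a statistic, and sampling error can be illustrated with a small Python sketch. The population below is hypothetical (randomly generated scores, not data from these notes), and the mean is used as the value of interest only as an example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: exam scores of 1 000 students (illustrative only).
population = [random.randint(40, 100) for _ in range(1000)]

# Simple random sample of 50 units drawn from that population.
sample = random.sample(population, 50)

parameter = statistics.mean(population)  # describes the population
statistic = statistics.mean(sample)      # describes the sample
sampling_error = statistic - parameter   # discrepancy between the two

print(f"parameter={parameter:.2f} statistic={statistic:.2f} "
      f"sampling error={sampling_error:+.2f}")
```

Changing the seed draws a different sample and therefore a different statistic, while the parameter stays fixed as long as the population does not change.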
Variable - It is any property or characteristic of some event,
object, or person that may have different values at different
times depending on the conditions.
➢ Qualitative Variable - a qualitative variable
describes an object or individual by placing the
object or individual into a category or group.
Examples are gender, nationality, color, types of
personality, and product brand.
➢ Quantitative Variable - a quantitative variable has a
value or numerical measurement to which operations
can be applied. Examples are age, height, and weight.
Level of measurement (The Hierarchy of Levels)
Nominal - labels qualitative data into mutually exclusive
categories
➢ At this level of measurement, the numbers in the
variable are used only to classify the data. Words,
letters, and alpha-numeric symbols can be used.
Example: Manila, Makati, Cavite
Ordinal - ranks qualitative data according to its degree
➢ At this level of measurement, the numbers indicate an order
Example: Low, Average, High; 1st, 2nd, 3rd
Interval - numerical data that has order and whose differences
can be determined; it does not have a “true” zero
➢ At this level of measurement, the numbers tell the
distances between the measurements in addition to
the classification and ordering.
Example: Temperature
Ratio - numerical data that has order, whose differences can
be determined, and that has a “true” zero
➢ Has an absolute zero that is meaningful
Example: Speed, Height, Weight

Lesson 2: Measures of Central Tendency
Measures of Central Tendency - A single value that
describes the center of a distribution:
➢ Mean - also known as the “average” or “arithmetic
mean”
➢ The sum of all values in a dataset divided by the
total number of observations
➢ Median - the middlemost score
➢ Mode - the most frequent score

Lesson 3: Measures of Dispersion

Measures of Absolute Dispersion
Range
R = HS – LS
Where:
➢ HS is the highest score
➢ LS is the lowest score
Example: Find the range of the scores of 9 students in an
examination: 46 46 53 55 69 74 75 82 90
R = HS – LS
R = 90 – 46
R = 44
The range is 44
IQR (Interquartile Range)
Q1 – Lower Quartile
Q3 – Upper Quartile
The Interquartile Range and Semi-Interquartile Range
IQR = Q3 – Q1
Semi-IQR = (Q3 – Q1) / 2
Example: Find the IQR of the scores of 9 students in an
examination: 46 46 53 55 69 74 75 82 90
Q1 = (1/4) (n+1)
Q1 = (0.25) (10) = 2.5th data point
Q1 = 46 + 0.5 (53 – 46)
Q1 = 46 + 3.5
Q1 = 49.5 lower quartile
Q3 = (3/4) (n+1)
Q3 = (0.75) (10) = 7.5th data point
Q3 = 75 + 0.5 (82 – 75)
Q3 = 75 + 3.5
Q3 = 78.5 upper quartile
IQR = Q3 – Q1
IQR = 78.5 – 49.5 = 29
Outlier
➢ A value that "lies outside" (is much smaller or larger
than) most of the other values in a set of data.
➢ One way to determine if a data point is an outlier is
to use the interquartile range (IQR) method.
➢ Lower Boundary: Q1 – 1.5 IQR
➢ Upper Boundary: Q3 + 1.5 IQR
Example #3:
The following measures represent the raw scores of
39 students in a 100-point examination in Statistics.
The raw scores were arranged from lowest to highest.
42 45 46 48 50 50 51 52 52 53 56 56 57 58 59 60 60 62 62
63 63 63 64 64 65 65 66 68 70 70 70 70 71 72 72 75 77 78
100
Determine if there are outliers in the set.
Solution to get Outliers
Step 1. The scores must be arranged from lowest to highest.
42 45 46 48 50 50 51 52 52 53 56 56 57 58 59 60 60 62 62
63 63 63 64 64 65 65 66 68 70 70 70 70 71 72 72 75 77 78
100
Step 3. Find the lower quartile (Q1). Q1 = ¼ (n+1)
Q1 = 0.25 (39 + 1) = 10. The 10th score is 53
Step 4. Find the upper quartile (Q3). Q3 = ¾ (n+1)
Q3 = 0.75 (39 + 1) = 30. The 30th score is 70
Step 5. Calculate the lower boundary and upper boundary.
Lower Boundary: Q1 – 1.5 IQR
53 – 1.5 (70 – 53) = 27.5
Upper Boundary: Q3 + 1.5 IQR
70 + 1.5 (70 – 53) = 95.5
The score of 100 is considered an outlier because it lies
above the upper boundary of 95.5.
The Variance
S² = Σ(x – x̄)² / (n – 1)
Where:
x – scores
x̄ – mean
n – number of observations in the sample
The Standard Deviation
S = √[ Σ(x – x̄)² / (n – 1) ]
Where:
x – scores
x̄ – mean
n – number of observations in the sample
Example:
The birth weights in pounds of five (5) babies born in a
certain hospital on a certain day are 9.2, 6.4, 10.5, 8.1, and
7.8. Find the variance and standard deviation of the weights.
The variance can be found by following these four steps.
➢ Find the mean.
➢ Subtract the mean from each of the five
samples/observations.
➢ Square these deviations from the mean.
➢ Take the average of these squared deviations,
dividing by n – 1 because the data are a sample.
Following the above steps yields:
x̄ = (9.2 + 6.4 + 10.5 + 8.1 + 7.8) / 5 = 8.4
Σ(x – x̄)² = 0.64 + 4.00 + 4.41 + 0.09 + 0.36 = 9.5
S² = 9.5 / 4 = 2.375 Therefore, the variance is 2.375
S = √2.375 = 1.5411 Therefore, the standard deviation is 1.5411
Measures of Relative Dispersion
These are unit-less and are used when one wishes to
compare the scatter of one distribution with another
distribution.
Standard Score
It measures how many standard deviations a value is above
or below the mean. Note: Not really a measure of relative
dispersion but related.
Example
Rowan got a grade of 75% in English and a grade of 90% in
History. The mean grade in English is 65% and the standard
deviation is 10%, whereas in History, the mean grade is 80%
and the standard deviation is 20%. In which subject did
Rowan

Lesson 4: Symmetric and Asymmetric Distribution
Symmetric Distribution
➢ Property of a distribution that has the mean as the
center, acting as a mirror image of the two sides of
the distribution
➢ Most of the data values are found near the mean,
tapering off on both sides of the mean. The mean is
equal to the median
Symmetrical distribution occurs when the values of
variables occur at regular frequencies and the mean,
median, and mode occur at the same point.
Asymmetric Distribution
➢ Lack of symmetry
➢ Can be right-skewed distribution or left-skewed
distribution
Right-skewed distribution
➢ The scores of students who did not study for a major
examination
➢ The household income of Filipinos in a certain area
in the National Capital Region
Left-skewed distribution
➢ The number of people buying Christmas presents
during December.
➢ The hours that children spend playing with their
gadgets
Skewness is a measure or a criterion of how asymmetric
the distribution of data is about the mean.
Pearson Coefficient of skewness
is a method developed by Karl Pearson to find skewness in
a sample using descriptive statistics like the mean and
mode.
Sk = 3 (x̄ – Md) / s
Where: x̄ – is the sample mean
Md – is the median
s – standard deviation
Sk = 0, symmetrical
Sk = positive; positively skewed
Sk = negative; negatively skewed
Measure of Kurtosis
➢ Kurtosis is a measure of whether the data are
heavy-tailed or light-tailed relative to a normal
distribution.
➢ That is, data sets with high kurtosis tend to have
heavy tails, or outliers. Data sets with low kurtosis
tend to have light tails, or lack of outliers.
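Because heavy tails show up as outliers, it is useful to be able to flag them. The IQR method from Lesson 3 can be sketched in Python as follows, using the two sets of examination scores from these notes. The function names `quartile` and `iqr_outliers` are hypothetical, and the quartile rule is the (n + 1) position rule used in the worked examples, which differs from the default of some software packages.

```python
def quartile(sorted_scores, fraction):
    """Quartile by the (n + 1) position rule, with linear interpolation
    between neighbouring scores when the position is fractional."""
    n = len(sorted_scores)
    position = fraction * (n + 1)        # 1-based position in the sorted list
    lower_index = int(position)          # whole part of the position
    remainder = position - lower_index   # fractional part of the position
    # Clamp so very small data sets cannot index outside the list.
    lower_index = min(max(lower_index, 1), n)
    upper_index = min(lower_index + 1, n)
    low = sorted_scores[lower_index - 1]
    high = sorted_scores[upper_index - 1]
    return low + remainder * (high - low)


def iqr_outliers(scores):
    """Return Q1, Q3, lower boundary, upper boundary, and the outliers."""
    data = sorted(scores)
    q1, q3 = quartile(data, 0.25), quartile(data, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return q1, q3, lower, upper, outliers


exam_9 = [46, 46, 53, 55, 69, 74, 75, 82, 90]
exam_39 = [42, 45, 46, 48, 50, 50, 51, 52, 52, 53, 56, 56, 57, 58, 59,
           60, 60, 62, 62, 63, 63, 63, 64, 64, 65, 65, 66, 68, 70, 70,
           70, 70, 71, 72, 72, 75, 77, 78, 100]

print(iqr_outliers(exam_9))   # (49.5, 78.5, 6.0, 122.0, [])
print(iqr_outliers(exam_39))  # (53.0, 70.0, 27.5, 95.5, [100])
```

The second line of output agrees with the worked solution: Q1 = 53, Q3 = 70, boundaries of 27.5 and 95.5, and the score of 100 flagged as the only outlier.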

Positively skewed (or right-skewed) distribution is a type of
distribution in which most values are clustered around the
left tail of the distribution while the right tail of the
distribution is longer.
Negatively skewed (or left-skewed) distribution is a type of
distribution in which more values are concentrated on the
right side (tail) of the distribution graph while the left tail
of the distribution graph is longer.
Example 1
• For 108 randomly selected high school students, the
following IQ frequency distribution was obtained. Find the
coefficient of skewness of the distribution.
Leptokurtic
➢ Indicates a positive excess kurtosis. The leptokurtic
distribution shows heavy tails on either side,
indicating large outliers.
➢ (Leptokurtic Distribution: High Degree of
Peakedness)
Platykurtic
➢ A platykurtic distribution shows a negative excess
kurtosis.
➢ The kurtosis reveals a distribution with flat tails.
➢ The flat tails indicate the small outliers in a
distribution
Measure of Kurtosis
➢ For ungrouped data
➢ For grouped data
Example 1
• For 108 randomly selected high school students, the
following IQ frequency distribution was obtained. Find the
coefficient of skewness & kurtosis of the distribution.
Key Differences Between Skewness and Kurtosis


These are the fundamental differences between skewness
and kurtosis:
➢ The characteristic of a frequency distribution that
ascertains its symmetry about the mean is called
skewness. On the other hand, kurtosis means the
relative pointedness of the standard bell curve,
defined by the frequency distribution.
➢ Skewness is a measure of the degree of
lopsidedness in the frequency distribution.
Conversely, kurtosis is a measure of the degree of
tailedness in the frequency distribution.
➢ Skewness is an indicator of lack of symmetry, i.e.
the left and right sides of the curve are unequal
with respect to the central point. In contrast,
kurtosis measures whether the data are peaked
or flat with respect to the probability distribution.
➢ Skewness shows how much, and in which direction,
the values deviate from the mean. In contrast,
kurtosis explains how tall and sharp the central peak
is.
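The kurtosis formulas for ungrouped and grouped data are not reproduced in these notes. As a rough illustration only, the Python sketch below assumes the common moment-based definition for ungrouped data, excess kurtosis = m4 / m2² – 3, where m2 and m4 are the second and fourth central moments. The function names, the two small data sets, and the label "mesokurtic" for zero excess kurtosis are additions for illustration and do not come from the notes.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis m4 / m2**2 - 3 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3


def classify(values, tolerance=1e-9):
    """Label a data set by the sign of its excess kurtosis."""
    k = excess_kurtosis(values)
    if k > tolerance:
        return "leptokurtic"   # positive excess kurtosis, heavy tails
    if k < -tolerance:
        return "platykurtic"   # negative excess kurtosis, light tails
    return "mesokurtic"        # added label: excess kurtosis of zero


# Hypothetical data: mostly zeros with two extreme values (heavy tails).
heavy_tailed = [0] * 8 + [-3, 3]
# Hypothetical data: values split evenly between two points (no tails).
flat = [-1, 1, -1, 1]

print(classify(heavy_tailed), classify(flat))  # leptokurtic platykurtic
```

A normal distribution has an excess kurtosis of zero under this definition, which is why the sign of the result separates the leptokurtic and platykurtic cases described above.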
