Let's say we have a dataset representing the scores of 20 students on a math test:
85, 92, 76, 88, 92, 85, 76, 88, 92, 76, 85, 88, 92, 76, 88, 85, 92, 76, 88, 85
To create a frequency distribution, we count how often each score appears in the dataset:
Score Frequency
76 5
85 5
88 5
92 5
This frequency distribution provides a clear summary of how often each score occurs in the
dataset, which can be useful for further analysis and interpretation.
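The counting step above can be reproduced with a short script. The following is a minimal sketch in Python using `collections.Counter` from the standard library; the variable names are illustrative and not part of the original example.

```python
from collections import Counter

# Scores of the 20 students from the example above
scores = [85, 92, 76, 88, 92, 85, 76, 88, 92, 76,
          85, 88, 92, 76, 88, 85, 92, 76, 88, 85]

# Counter maps each distinct score to the number of times it occurs
frequency = Counter(scores)

print("Score Frequency")
for score in sorted(frequency):
    print(score, frequency[score])
```

A useful check on any frequency table is that the frequencies add up to the number of observations, here 20.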
1. Dice Rolls: Suppose you roll a fair six-sided die 100 times and record the outcome of each roll. The
frequency distribution of the outcomes would show how many times each number (1 through 6)
appears in the 100 rolls.
Outcome Frequency
1 17
2 14
3 22
4 18
5 15
6 14
2. Letter Frequencies in a Text: Count the frequency of each letter in a given text. For example, in
the sentence "The quick brown fox jumps over the lazy dog", the frequency distribution of letters
would show how many times each letter appears.
Letter Frequency
A 1
B 1
C 1
D 1
E 3
F 1
G 1
H 2
I 1
J 1
K 1
L 1
M 1
N 1
O 4
P 1
Q 1
R 2
S 1
T 2
U 2
V 1
W 1
X 1
Y 1
Z 1
3. Exam Scores: As in the math test example above, a frequency distribution of exam scores indicates
how many students received each score on a test.
Score Frequency
60 3
65 5
70 8
75 12
80 10
85 7
90 4
95 1
4. Height of Students: Suppose you measure the height of students in a class and record the data.
The frequency distribution would show how many students fall into different height ranges.
Height Range Frequency
140-150 cm 5
151-160 cm 10
161-170 cm 15
171-180 cm 8
181-190 cm 2
5. Number of Siblings: In a survey of students, you ask how many siblings they have. The frequency
distribution would show how many students have 0 siblings, 1 sibling, 2 siblings, and so on.
Number of Siblings Frequency
0 10
1 15
2 8
3 5
4 or more 2
Frequency distributions are useful for summarizing large datasets and understanding the
distribution of values within them.
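As an illustration of how such a table can be produced, the letter frequencies from example 2 can be counted programmatically. This is a sketch only, assuming that upper and lower case are counted together and that spaces are ignored, as the table in example 2 suggests; the variable names are hypothetical.

```python
from collections import Counter

sentence = "The quick brown fox jumps over the lazy dog"

# Keep only alphabetic characters and fold case so "T" and "t" count together
letters = [ch.upper() for ch in sentence if ch.isalpha()]
letter_frequency = Counter(letters)

print("Letter Frequency")
for letter in sorted(letter_frequency):
    print(letter, letter_frequency[letter])
```

The same pattern applies to the other examples: collect the raw observations in a list, then count them.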
Consider the ages of 10 individuals:
25, 30, 35, 40, 45, 50, 55, 60, 65, 70
1. Mean (Average): The mean is calculated by summing up all the values in the dataset
and dividing by the total number of values.
$$\text{Mean} = \frac{25+30+35+40+45+50+55+60+65+70}{10} = \frac{475}{10} = 47.5$$
2. Median: The median is the middle value of the dataset when it's arranged in ascending
order. If there is an even number of values, the median is the average of the two middle
values.
25, 30, 35, 40, 45, 50, 55, 60, 65, 70
Since there are 10 values, the median is the average of the 5th and 6th values:
$$\text{Median} = \frac{45+50}{2} = \frac{95}{2} = 47.5$$
3. Mode: The mode is the value that appears most frequently in the dataset. Here every
age appears exactly once, so the dataset has no mode.
These measures provide different insights into the central tendency of the dataset. In
this example, the mean and median are equal (both 47.5), which is consistent with a
symmetric distribution, while the lack of a mode means that no age stands out as
the most common among the individuals.
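The three measures can be checked with Python's standard `statistics` module. This is a minimal sketch; treating a dataset in which no value repeats as having no mode is a convention assumed here, and the variable names are illustrative.

```python
import statistics
from collections import Counter

ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]

mean_age = statistics.mean(ages)      # sum of the values divided by their count
median_age = statistics.median(ages)  # average of the two middle values when n is even

# Report no mode when no value repeats; otherwise list the most frequent value(s)
counts = Counter(ages)
highest = max(counts.values())
modes = [] if highest == 1 else [v for v, c in counts.items() if c == highest]

print(mean_age, median_age, modes)
```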
MEASURES OF DISPERSION
25, 30, 35, 40, 45, 50, 55, 60, 65, 70
Here are two common measures of dispersion: range and standard deviation.
1. Range: The range is the difference between the maximum and minimum values in the
dataset.
$$\text{Range} = \text{Maximum value} - \text{Minimum value}$$
In this dataset, the minimum age is 25 and the maximum age is 70, so
$$\text{Range} = 70 - 25 = 45$$
The range of ages in this dataset is 45, indicating that the ages span a range of
45 years.
2. Standard Deviation: The standard deviation measures the typical distance of the
data points from the mean of the dataset. It indicates how much the values
in the dataset differ from the mean.
The formula for the sample standard deviation is:
$$\text{Standard Deviation} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
where $x_i$ are the individual data points, $\bar{x}$ is the mean of the dataset, and $n$
is the number of data points.
First, we calculate the mean of the ages:
$$\text{Mean} = \frac{25+30+35+40+45+50+55+60+65+70}{10} = \frac{475}{10} = 47.5$$
Then, we calculate the squared deviations from the mean for each data point:
$$(25-47.5)^2, (30-47.5)^2, \ldots, (70-47.5)^2$$
After summing up these squared deviations, dividing by $n-1$ (where $n$ is the
number of data points), and taking the square root, we get the standard deviation.
$$\text{Standard Deviation} = \sqrt{\frac{(25-47.5)^2 + (30-47.5)^2 + \cdots + (70-47.5)^2}{9}}$$
The squared deviations sum to 2062.5, so the standard deviation is
$\sqrt{2062.5 / 9} \approx 15.14$ years.
These measures help us understand how spread out the ages are in the dataset. A
larger range or standard deviation indicates greater variability among the ages, while a
smaller range or standard deviation indicates less variability.
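Both measures can be computed directly from their definitions. The sketch below follows the steps described above and then compares the result with `statistics.stdev` from the Python standard library, which also uses the $n-1$ divisor; the variable names are illustrative.

```python
import math
import statistics

ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]

# Range: maximum value minus minimum value
age_range = max(ages) - min(ages)

# Sample standard deviation, step by step (divisor n - 1)
n = len(ages)
mean_age = sum(ages) / n
squared_deviations = [(x - mean_age) ** 2 for x in ages]
sample_sd = math.sqrt(sum(squared_deviations) / (n - 1))

# Cross-check against the standard library's sample standard deviation
library_sd = statistics.stdev(ages)

print(age_range, round(sample_sd, 2), round(library_sd, 2))
```

If the ten ages were the entire population rather than a sample, the divisor would be $n$ instead of $n-1$, which `statistics.pstdev` computes.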
GRAPHING DISTRIBUTION
One common way to graph a distribution is by using a histogram. Let's use the example of
the exam scores dataset we discussed earlier:
Exam Scores:
Let's say we decide to divide the data into 4 intervals (bins), each of width 10 (the range of scores
is 35, from 60 to 95, so four intervals of width 10 cover it). Here's how we can construct the histogram:
[Histogram: Frequency (vertical axis) against Exam Scores (horizontal axis), with intervals 60-69, 70-79, 80-89, and 90-99.]
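In this histogram, the horizontal axis shows the exam score intervals and the vertical axis shows
the frequency, that is, the number of scores falling in each interval.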
Histograms provide a visual representation of the distribution of data, making it easier to identify
patterns, trends, and central tendency.
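A text histogram of this kind can be generated with a few lines of Python. The sketch below is an illustration under stated assumptions: it takes the exam-score frequencies listed in example 3 (scores from 60 to 95) as the data, which may not be the data behind the original chart, groups them into intervals of width 10 starting at 60, and draws the bars sideways with one `#` per student. The names `score_frequency`, `BIN_WIDTH`, and `bin_counts` are hypothetical.

```python
from collections import Counter

# Score -> number of students, taken from the exam-score table in example 3
score_frequency = {60: 3, 65: 5, 70: 8, 75: 12, 80: 10, 85: 7, 90: 4, 95: 1}

BIN_WIDTH = 10
bin_counts = Counter()
for score, count in score_frequency.items():
    bin_start = (score // BIN_WIDTH) * BIN_WIDTH  # e.g. 65 -> 60, 95 -> 90
    bin_counts[bin_start] += count

# One "#" per student in each interval, drawn as a horizontal bar
for bin_start in sorted(bin_counts):
    label = f"{bin_start}-{bin_start + BIN_WIDTH - 1}"
    print(f"{label:>5} | {'#' * bin_counts[bin_start]}")
```

Changing `BIN_WIDTH` shows how the choice of interval width affects the shape of the histogram.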