STAT 110

Summary of Statistics 110

For preparatory-year students, Faculty of Science

CH. 3
A simplified explanation in Arabic and English
(varied examples and past exam questions)

Prepared by
Mr. Gasim Mudawi

Mobile 0502180703

NEW
1432-1433
CHAPTER 3: DATA DESCRIPTION
The chapter consists of four sections.

1- MEASURES OF CENTRAL TENDENCY:

There are four measures of central tendency:

1. Mean (average)
2. Median
3. Mode
4. Midrange

MEAN:

The sum of the values (observations) divided by the total number of values.

It is used only with quantitative data.
The sample mean is denoted by \( \bar{X} \), while the population mean is denoted by \( \mu \).
EXAMPLE 3-1:
Find the mean of 2, 9, 4.
SOL.:

\( \bar{X} = \frac{\sum x}{n} = \frac{2 + 9 + 4}{3} = \frac{15}{3} = 5 \)
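The calculation in Example 3-1 can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the notes.

```python
from statistics import mean

values = [2, 9, 4]  # data from Example 3-1

# Mean = sum of the values divided by how many values there are.
manual_mean = sum(values) / len(values)

# The standard library computes the same quantity.
library_mean = mean(values)

print(manual_mean, library_mean)
```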

EXAMPLE 3-2:
Find the value of x if the values 2, x, 9 have a mean equal to 5.

SOL.:
\( \bar{X} = \frac{\sum x}{n} = 5 \)
\( \frac{2 + x + 9}{3} = 5 \)
\( 11 + x = 15 \)
\( x = 15 - 11 = 4 \)
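The rearrangement used in Example 3-2 (multiply the mean by n, then subtract the known values) can be sketched as a small helper. The function name `missing_value` is hypothetical and chosen only for illustration.

```python
def missing_value(known_values, n, target_mean):
    """Return the one unknown value that makes n values average to target_mean."""
    # mean = (sum of known values + x) / n, so x = n * mean - sum of known values
    return n * target_mean - sum(known_values)

x = missing_value([2, 9], n=3, target_mean=5)
print(x)
```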

MEDIAN:
The value in the middle of the data after the values are arranged in ascending or descending order.
EXAMPLE 3-3:
Find the median of these values: 2, 5, 9, 7, 3
SOL.: First, arrange the values in ascending or descending order.
Arranged data: 2, 3, 5, 7, 9
Then the median is equal to 5 (the value in the middle of the values).

EXAMPLE 3-4:
Find the median of 2, 5, 9, 7, 3, 12
Arranged data: 2, 3, 5, 7, 9, 12
The number of values is even, so the median is the mean of the two middle values:

\( \text{Median} = \frac{5 + 7}{2} = 6 \)
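As a cross-check of Examples 3-3 and 3-4, the sketch below sorts the data and picks the middle value, or averages the two middle values when the count is even. The helper `median_by_hand` is an illustrative name; `statistics.median` from the standard library is used only for comparison.

```python
from statistics import median

def median_by_hand(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [2, 5, 9, 7, 3]        # Example 3-3
even_data = [2, 5, 9, 7, 3, 12]   # Example 3-4

print(median_by_hand(odd_data), median(odd_data))
print(median_by_hand(even_data), median(even_data))
```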

MODE:
The value that occurs most frequently in the data.

EXAMPLE 3-5:
Find the mode of 6, 7, 2, 6, 5, 6
SOL.: Mode = 6 (the data are unimodal)

EXAMPLE 3-6:
Find the mode of 6, 7, 2, 6, 5, 7
SOL.: Mode = 6 and 7 (the data are bimodal)

REMARKS (very important):
If the data have one mode, they are called unimodal.
If the data have two modes, they are called bimodal.
If the data have more than two modes, they are called multimodal.
one mode: unimodal
two modes: bimodal
more than two modes: multimodal
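The classification in the remarks can be sketched in Python as follows. The function name `describe_modes` is hypothetical. The "no mode" branch (no value repeats) is an assumption based on a common textbook convention; the notes mention it only as an option in a past exam question.

```python
from collections import Counter

def describe_modes(values):
    """Return (list of modes, label) for a non-empty data set."""
    counts = Counter(values)
    top = max(counts.values())        # highest frequency in the data
    if top == 1:
        # Assumption: if no value repeats, the data are said to have no mode.
        return [], "no mode"
    modes = sorted(v for v, c in counts.items() if c == top)
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "multimodal")

print(describe_modes([6, 7, 2, 6, 5, 6]))  # data from Example 3-5
print(describe_modes([6, 7, 2, 6, 5, 7]))  # data from Example 3-6
```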

MIDRANGE:

\( \text{Midrange} = \frac{\text{highest value} + \text{lowest value}}{2} \)

EXAMPLE 3-7:
Find the midrange of 2, 9, 4.

\( \text{Midrange} = \frac{\text{highest value} + \text{lowest value}}{2} = \frac{9 + 2}{2} = 5.5 \)
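A minimal Python sketch of the midrange formula, applied to the data of Example 3-7 (the function name `midrange` is illustrative):

```python
def midrange(values):
    """Average of the highest and lowest values."""
    return (max(values) + min(values)) / 2

print(midrange([2, 9, 4]))
```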

2- MEASURES OF VARIATION (Dispersion):

1. Range
2. Sample variance (denoted by \( S^2 \))
3. Sample standard deviation (denoted by \( S \))
4. Coefficient of variation (denoted by C.V)

These measures are called measures of variation or measures of dispersion.

1- RANGE:
The highest value minus the lowest value.

EXAMPLE 3-8:

Find the range of 0, -16, 9, -7, 3

Range = 9 - (-16) = 25

2- VARIANCE:

\( S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \) OR \( S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \)

where \( \bar{X} \) is the mean (average).

EXAMPLE 3-9:
Find the variance of 2, 4, 9.
SOL.:

\( S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \)

x        x^2
2        4
4        16
9        81
Sum 15   Sum 101

\( S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{101 - \frac{15^2}{3}}{3 - 1} = \frac{101 - 75}{2} = 13 \)
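To see that the two variance formulas agree on the data of Example 3-9, the sketch below implements both and compares them with `statistics.variance`, which also uses the n - 1 divisor. The function names are illustrative.

```python
from statistics import variance

def variance_shortcut(values):
    """Sample variance via (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def variance_definition(values):
    """Sample variance via the sum of squared deviations from the mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [2, 4, 9]
print(variance_shortcut(data), variance_definition(data), variance(data))
```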

3- STANDARD DEVIATION:

Definition: the square root of the variance.

\( S = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \)

EXAMPLE 3-10:

Find the standard deviation of 2, 4, 9.

\( S = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{101 - \frac{15^2}{3}}{3 - 1}} = \sqrt{13} \approx 3.61 \)
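Example 3-10 can be verified by taking the square root of the variance found in Example 3-9 and comparing it with `statistics.stdev` (the sample standard deviation in the Python standard library). The variable names are illustrative.

```python
import math
from statistics import stdev

data = [2, 4, 9]
sample_variance = 13              # result of Example 3-9

# Standard deviation = square root of the variance.
s = math.sqrt(sample_variance)

print(round(s, 2), round(stdev(data), 2))
```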

4- COEFFICIENT OF VARIATION:

Used to compare the variation of two phenomena (e.g. age and salary). For example, to compare workers' ages with their salaries, divide the standard deviation of age by the mean age, divide the standard deviation of salary by the mean salary, and then compare the two results; the larger value indicates the greater variation.

\( C.V = \frac{S}{\bar{X}} \times 100\% \)
EXAMPLE 3-11:

Find C.V for 2, 4, 9.

We found \( \bar{X} = 5 \) and \( S = 3.61 \), then

\( C.V = \frac{S}{\bar{X}} \times 100\% = \frac{3.61}{5} \times 100\% = 72.2\% \)
EXAMPLE 3-12:
The average age of the employees at a certain company is 30 years with a standard
deviation of 5 years; the average salary of the employees is $40,000 with a standard
deviation of $5,000. Which one has more variation, age or salary?
Age: \( C.V = \frac{S}{\bar{X}} \times 100\% = \frac{5}{30} \times 100\% = 16.67\% \)
Salary: \( C.V = \frac{S}{\bar{X}} \times 100\% = \frac{5000}{40000} \times 100\% = 12.5\% \)
Since 16.67% > 12.5%, age has more variation.
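The comparison in Example 3-12 can be sketched in Python. The function name `coefficient_of_variation` and the variable names are illustrative only.

```python
def coefficient_of_variation(std_dev, mean):
    """C.V as a percentage: standard deviation divided by the mean, times 100."""
    return std_dev / mean * 100

age_cv = coefficient_of_variation(5, 30)
salary_cv = coefficient_of_variation(5000, 40000)

# The larger C.V marks the more variable quantity.
more_variable = "age" if age_cv > salary_cv else "salary"
print(round(age_cv, 2), round(salary_cv, 2), more_variable)
```

Because C.V is a ratio, the units (years, dollars) cancel, which is what makes the two quantities comparable.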

REMARKS (very important):
The relationship between the mean, the median, and the mode:

1. MEAN < MEDIAN < MODE
Then the distribution is negatively skewed (skewed to the left).

2. MODE < MEDIAN < MEAN
Then the distribution is positively skewed (skewed to the right).

3. MODE = MEDIAN = MEAN
Then the distribution is symmetrical.
3- MEASURES OF POSITION:
1. STANDARD SCORE
2. QUARTILES
3. OUTLIER

1- STANDARD SCORE:

\( Z = \frac{X - \bar{X}}{S} \)

EXAMPLE 3-13:
If the mean of a set of data is 19 and the value 23.5 has a z-score of 0.75, what must the
standard deviation be?
SOL.:
\( X = 23.5 \), \( \bar{X} = 19 \), \( Z = 0.75 \), \( S = ? \)
\( Z = \frac{X - \bar{X}}{S} \)
\( 0.75 = \frac{23.5 - 19}{S} \)
\( 0.75 S = 4.5 \), so \( S = \frac{4.5}{0.75} = 6 \)
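The algebra of Example 3-13 can be sketched in Python: one helper computes a z-score, and a second (hypothetical) helper rearranges the same formula to recover the standard deviation. Both function names are illustrative.

```python
def z_score(x, mean, std_dev):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev

def std_dev_from_z(x, mean, z):
    """Rearranged form of the z-score formula: S = (x - mean) / z."""
    return (x - mean) / z

s = std_dev_from_z(23.5, 19, 0.75)   # data from Example 3-13
print(s, z_score(23.5, 19, s))
```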

EXAMPLE 3-14:
The mean of the marks in test A is 80 and the standard deviation is 5.
A student gets 70 in this test.
His relative position (z-score) in this test is:
A) 6 B) 36 C) -3 D) -2

If the student in EXAMPLE 3-14 has a z-score of -1 in test B, what can one say
about his relative position?
A) Test (B) is better than test (A) B) The information is not sufficient
C) Test (A) is better than test (B) D) Test (A) is similar to test (B)

2. QUARTILES

Quartiles divide the data into 4 groups.
Quartiles are denoted by Q1, Q2, and Q3.
The second quartile (Q2) is the median.

3. OUTLIER
An outlier is an extremely high or an extremely low data value when compared with
the rest of the data values. It is usually obvious from the given data without any formula,
but there is a rule for confirming whether a value is an outlier or not.
Interquartile range = IQR = Q3 - Q1

Steps for finding an outlier:

To determine whether a data value can be considered an outlier:
1. Compute Q1 and Q3.
2. Find the IQR = Q3 - Q1
3. Compute 1.5 * IQR
4. Compute Q1 - 1.5 * IQR and Q3 + 1.5 * IQR
5. Compare the data value with Q1 - 1.5 * IQR and Q3 + 1.5 * IQR
If x < Q1 - 1.5 * IQR or x > Q3 + 1.5 * IQR, then x is considered an outlier.

EXAMPLE 3-15: Given the data set 5, 6, 12, 13, 15, 18, 22, 50, find the outlier.
First we should arrange the data; here the data are already arranged.
Arranged data: 5, 6, 12, 13, 15, 18, 22, 50
Q2 = (13+15)/2 = 14 (the median of all the data)
Q1 = (6+12)/2 = 9 (the median of the lower half 5, 6, 12, 13)
Q3 = (18+22)/2 = 20 (the median of the upper half 15, 18, 22, 50)
IQR = Q3 - Q1 = 20 - 9 = 11
1.5 * IQR = 1.5 * 11 = 16.5
Q1 - 1.5 * IQR = 9 - 16.5 = -7.5
Q3 + 1.5 * IQR = 20 + 16.5 = 36.5

The acceptable range is from -7.5 to 36.5.

The value 50 is outside this range, so 50 is an outlier.
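The outlier steps applied in Example 3-15 can be sketched in Python. Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data, as in the example. For an odd number of values this sketch leaves the middle value out of both halves, which is an assumption, since the notes only show an even-sized data set. All function names are illustrative, and the sketch expects at least two values.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]       # values below the middle
    upper = ordered[-half:]      # values above the middle
    return median(lower), median(ordered), median(upper)

def find_outliers(values):
    """Return (outliers, (lower limit, upper limit)) using the 1.5 * IQR rule."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high], (low, high)

data = [5, 6, 12, 13, 15, 18, 22, 50]
outliers, limits = find_outliers(data)
print(quartiles(data), limits, outliers)
```

Note that library routines such as `statistics.quantiles` use interpolation rules by default and may give slightly different quartiles than the median-of-halves method used in these notes.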

4- EXPLORATORY DATA ANALYSIS:
Stem and leaf plot:
EXAMPLE 3-16:
Construct a stem and leaf plot for the following data: 20, 32, 13, 14, 43, 02, 57, 23, 36, 32, 25, 31, 33, 32,
44, 32, 52, 44, 51, 45.
SOL.:
First we must arrange the data in ascending order as follows:
02, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44, 44, 45, 51, 52, 57
Stem  Leaf
0     2
1     3 4
2     0 3 5
3     1 2 2 2 2 3 6
4     3 4 4 5
5     1 2 7
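The construction in Example 3-16 can be sketched in Python for non-negative two-digit values: the tens digit is the stem and the ones digit is the leaf. The function name `stem_and_leaf` is illustrative, and the value 02 is written as 2 because Python does not allow integer literals with leading zeros.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return dict(groups)

data = [20, 32, 13, 14, 43, 2, 57, 23, 36, 32, 25, 31, 33, 32,
        44, 32, 52, 44, 51, 45]

plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, " ".join(str(leaf) for leaf in plot[stem]))
```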

BOX PLOT GRAPH:

To draw a box plot we find Q1, Q2, and Q3 together with the minimum and maximum values,
as follows. The box extends from Q1 to Q3 with a line at Q2, and lines run from the box to the minimum and maximum values.

EXAMPLE 3-17:
Given the data set 5, 6, 12, 13, 15, 18, 22, 50, construct a box plot.
First we should arrange the data; here the data are already arranged.
Arranged data: 5, 6, 12, 13, 15, 18, 22, 50
Q2 = (13+15)/2 = 14
Q1 = (6+12)/2 = 9
Q3 = (18+22)/2 = 20

Min. = 5, Q1 = 9, Q2 = 14, Q3 = 20, Max. = 50

REMARKS (very important):
1. If the median is near the center of the box, the distribution is approximately
symmetric.
2. If the lines are about the same length, the distribution is approximately
symmetric.

3. If the median falls to the left of the center of the box, the distribution is
positively skewed.
4. If the right line is longer than the left line, the distribution is positively
skewed.

5. If the median falls to the right of the center of the box, the distribution is
negatively skewed.
6. If the left line is longer than the right line, the distribution is negatively
skewed.

PAST EXAM QUESTIONS ON CHAPTER 3
Prepared by: Gasim Mudawi – Mobile 0502180703

Use the following to answer questions 22-24*:

The numbers of days absent for 7 students are 1, 3, 0, 4, 2, 1, 3. Answer the
following three questions:
22. The values of the central tendency measures are …
a) mean = 2, mode = 1
b) mean = 2, median = 2
c) mean = 2, range = 4
d) median = 2, mode = 3
23. The values of the measures of variation are …
a) standard deviation = 1.414, midrange = 2
b) variance = 2, range = 2
c) standard deviation = 1.414, range = 4
d) variance = 1.414, range = 4

24. The value of the coefficient of variation is …
a) 47.1% b) 35.4% c) 28.3% d) 70.7%

25. Which measures are affected the most by extremely small or extremely large
values?
a) Mode and midrange.
b) Mean and midrange.
c) Mean and mode.
d) Mode and median.

* Reference: First term, first exam, form A, 1431-1432H.

Use the following to answer questions 21-30*:

The ages in years of 8 patients who were randomly selected from a certain hospital are:
10, 30, 2, 16, 7, 38, 26, 23
21. Referring to the mode of the patients' ages, the data are said to be/have …
a) multimodal b) no mode c) bimodal d) unimodal

22. Referring to the sample data of the patients' ages, any age value greater than … is
considered to be an outlier.
a) 56 b) 57.25 c) 8 d) -1.25

23. The midrange of the patients' ages is …
a) 21 b) 21.5 c) 20 d) 16.5

24. The median of the patients' ages is …
a) 19 b) 11.5 c) 19.5 d) 27.5

25. Referring to the given data, the shape of the distribution of the patients' ages is …
a) symmetrical b) left skewed c) unknown d) right skewed

26. If the coefficient of variation of the patients' heights is 14.221%, then
which is more variable, the patients' ages or heights?
a) Height is more variable than age.
b) They have the same variability.
c) Cannot tell.
d) Age is more variable than height.
27. The standard deviation of the patients' ages is …
a) 152.86 b) 53.74 c) 59.54 d) 12.36
28. The mean of the patients' ages is …
a) 118.75 b) 152 c) 76.00 d) 19.00
29. The IQR of the patients' ages is …
a) 16 b) 19.5 c) 11.5 d) 19.5
30. The range of the patients' ages is …
a) 20 b) 36 c) 16.5 d) 13
* Reference: First term, first exam, form D, 1432-1433H.

The most appropriate measure of central tendency for the values -21, -24, -27, -23, -25,
-26, -22, -30, -29, -31, 35 is the …
a) mode b) median c) mean d) midrange
What is the most appropriate measure of central tendency for the following data set?
A, C, A, B, C, B, A
a) median b) mean c) midrange d) mode

Which measures of central tendency are not affected by outliers?
a) Mean and mode.
b) Weighted mean and mean.
c) Mean and midrange.
d) Mean and median.
The stem part for the number 654 is …
a) 6 b) 54 c) 4 d) 65
Use the following to answer questions 12-13:
The ages of a department's instructors are shown in the following stem and leaf plot:
Stem  Leaf
3     0 0 0 3 3 3 7 7 7
4     2 2 2 2 6 6 6 6 8 8 8 8 8
5     0 0 0 0 5 5 5 5 5 7 7 7
6     1 1 1 2 2 4 6 6 6 6 7
7     4 4 4 4 6 6 6 9
8     3 7 9
12. The range of the raw data for the above stem and leaf plot is …
a) 9 b) 6 c) 5 d) 59
13. The raw data set for the stem and leaf plot is called …
a) unimodal b) bimodal c) multimodal d) trimodal
When we want to compare the variability of students' grades and heights, we should use
the …
a) standard deviation b) variance c) coefficient of variation d) range
If a data value is not within the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR, then this value is called:
a) the first quartile b) the third quartile c) the median d) an outlier
What is the statistical term for the mean that is obtained by using all the data values
of a specific population?
a) variable b) statistic c) parameter d) quality
What is the statistical term for the mean that is obtained by using a group of the data
values of a specific population?
a) variable b) statistic c) parameter d) quality
When a study is conducted on a group of students from KAU, every measurement
calculated from this study would be called a …
a) statistic b) sample c) population d) parameter

When a study is conducted on all students from KAU, every measurement calculated
from this study would be called a …
a) statistic b) sample c) population d) parameter

Use the following to answer questions 15-17:

The ages of a department's instructors are shown in the following stem and leaf plot:
Stem  Leaf
3     1
4     2 2 2 6 6
5     0 7
6     1
15. The value of the median of the raw data for the above stem and leaf plot is …
a) 42 b) 46 c) 2 d) 6
16. The raw data set for the stem and leaf plot is called …
a) unimodal b) bimodal c) multimodal d) trimodal

17. The midrange of the raw data for the above stem and leaf plot is …
a) 6 b) 59 c) 9 d) 46

Use the following boxplot to answer questions 12-14:

[Boxplot figure not recoverable from the source; horizontal axis from 0 to 40 in steps of 5]

12. The value of the IQR is …
a) 35 b) 25 c) 40 d) 10
13. The range of the raw data for the above boxplot is …
a) 25 b) 40 c) 10 d) 35
14. The shape of the distribution is …
a) bimodal b) symmetric c) left skewed d) right skewed

The End …..

