Lecture 1:
1. Rules:
• Mean (Average): $\bar{x} = \frac{\sum f x}{n}$
• Median: the middle value of the numbers arranged from smallest to
largest, if the count of numbers is odd.
If the count is even, the median is the sum of the two middle values of
the arranged numbers divided by 2.
• Mode: the value that occurs most frequently.
• IQR: Q3 - Q1
• Outliers: any value lower than Q1 - 1.5 (IQR) or higher than
Q3 + 1.5 (IQR)
• Variance: $S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
• Standard deviation: $S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
• z-score: $z = \frac{x - \bar{x}}{S}$
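The rules above can be tried out in a few lines of Python. This is only a minimal sketch: the list `toy` is made-up illustration data (it does not come from the lecture), and the helper names `sample_variance`, `sample_std` and `z_score` are chosen here for readability.

```python
import math
from statistics import mean, median, mode

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

def z_score(x, x_bar, s):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - x_bar) / s

# Hypothetical toy data, used only to illustrate the formulas.
toy = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(toy)           # 40 / 8
mid = median(toy)           # even count: average of the two middle values
most_common = mode(toy)     # the value that appears most often
s2 = sample_variance(toy)   # 32 / 7
s = sample_std(toy)
z_of_9 = z_score(9, x_bar, s)

print(x_bar, mid, most_common, round(s2, 4), round(s, 4), round(z_of_9, 4))
```

Note that the division is by n - 1, matching the variance rule above, not by n.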
2. Examples:
1) Given the following numerical dataset:
244 191 160 187 180 176 174 205 211 183 211 180 194 200
i. Calculate the mean, median, the first quartile, the third quartile and the
IQR
ii. Are there any outliers in this data? If so, identify them.
iii. Construct the box plot for this data.
Solution:
First step: sort the values from smallest to largest.
160 174 176 180 180 183 187 191 194 200 205 211 211 244
I. Mean: $\bar{x} = \frac{160 + 174 + 176 + 180 + 180 + 183 + 187 + 191 + 194 + 200 + 205 + 211 + 211 + 244}{14} = \frac{2696}{14} \approx 192.57$
Median (Q2): $\frac{187 + 191}{2} = 189$
To find Q1 and Q3, first place the median in the middle of the sorted dataset:
160 174 176 180 180 183 187 189 191 194 200 205 211 211 244
Then split the values into two halves, one before the median and one after it.
160 174 176 180 180 183 187
The median of the first half is 180, so Q1 = 180.
191 194 200 205 211 211 244
The median of the second half is 205, so Q3 = 205.
IQR = q3 – q1 = 205 – 180 = 25
II. Outliers: Q1 - 1.5 (IQR) = 180 - 1.5 (25) = 142.5
Q3 + 1.5 (IQR) = 205 + 1.5 (25) = 242.5
The outliers are the values smaller than 142.5 or greater than 242.5, so
244 is the only outlier.
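The quartile and outlier steps can be checked with a short Python sketch. It assumes the same convention as the solution (Q1 and Q3 are the medians of the lower and upper halves); other quartile conventions, such as the default of `statistics.quantiles`, can give slightly different values. The function name `five_number_summary` is chosen here for illustration.

```python
from statistics import median

data = [244, 191, 160, 187, 180, 176, 174, 205, 211, 183, 211, 180, 194, 200]

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # values before the median position
    upper = ordered[half + n % 2:]   # values after it (middle value skipped if n is odd)
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

low, q1, q2, q3, high = five_number_summary(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

The five values returned by `five_number_summary` are also exactly what a box plot needs.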
III. [Box plot: Min, Q1, Q2, Q3 and Max marked on an axis with ticks at 0, 200 and 250]
2) Ali and Hany took different exams (A and B). Ali's grade on exam A was 86
and Hany's grade on exam B was 82. If we want to compare their test scores,
who did better, Ali or Hany?
          Mean   Standard deviation (S)
Exam A    79     3.8
Exam B    77     2.5
Solution:
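First, why use the z-score at all? When two students sit two different exam versions, the higher raw grade is not automatically the better result. Each grade has to be compared with the scores on that student's own exam: the first student's grade against exam A, and the second student's grade against exam B. That is what the z-score does.
For Ali: $z_{\text{exam A}} = \frac{x - \bar{x}}{S} = \frac{86 - 79}{3.8} = 1.842$
For Hany: $z_{\text{exam B}} = \frac{x - \bar{x}}{S} = \frac{82 - 77}{2.5} = 2$
Hany has the higher z-score, so Hany did better relative to his exam, even though Ali's raw grade is higher.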
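As a quick check of the comparison, here is a minimal Python sketch using the means and standard deviations from the table. The function name `z_score` and the variable names are illustrative only.

```python
def z_score(x, mean, std):
    """Standardise a grade: distance from the exam mean in standard deviations."""
    return (x - mean) / std

z_ali = z_score(86, 79, 3.8)    # Ali, exam A
z_hany = z_score(82, 77, 2.5)   # Hany, exam B

# The student with the larger z-score did better relative to their own exam.
better = "Hany" if z_hany > z_ali else "Ali"
print(round(z_ali, 3), round(z_hany, 3), better)
```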
1. Rules:
1. z-score: $z = \frac{x - \bar{x}}{S}$, where $\bar{x}$ is the mean of $x$ and $S$ is the standard deviation
2. $r = \frac{\sum z_x z_y}{n - 1}$ (Correlation coefficient rule)
3. $b_1 = r \left( \frac{s_y}{s_x} \right)$
4. $b_0 = \bar{y} - b_1 \bar{x}$
5. $\hat{y} = b_1 x + b_0$ (Predicted value)
6. Error $= |\hat{y} - y|$, where $\hat{y}$ is the predicted value and $y$ is the real value
Week (x)        1  2  3  4  5  6  7  8
Attendance (y)  73 78 84 88 29 35 39 44
I. Calculate the correlation coefficient between the attendance of students and the
semester weeks. Comment on its value. ($\bar{x}$ (mean of weeks) = 4.50, $\bar{y}$ (mean of
attendance) = 58.75, $s_x$ = 2.449489743, $s_y$ = 24.27079374).
II. How many students would attend the play games room at week number
ten?
III. What is the error in the predicted value at week 4?
Solution:
I. Here we compute r. To get it, extend the table above with 3 rows: the first row is the z-score of x, the second is the z-score of y, and the third is the product of the two z-score rows. When computing them, remember to keep the digits after the decimal point (the table below keeps four). After that we compute r.
z-score of x: $z_x = \frac{x - \bar{x}}{S}$
Week (x) 1 2 3 4 5 6 7 8
Attendance (y) 73 78 84 88 29 35 39 44
𝒛𝒙 -1.4289 -1.0206 -0.6124 -0.2041 0.2041 0.6124 1.0206 1.4289
𝒛𝒚 0.5871 0.7931 1.0403 1.2052 -1.2258 -0.9785 -0.8137 -0.6077
𝒛𝒙 ∗ 𝒛𝒚 -0.8389 -0.8095 -0.6371 -0.2460 -0.2502 -0.5992 -0.8305 -0.8684
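Sum the last row to get r:
$\sum z_x z_y = -5.0798$
$r = \frac{\sum z_x z_y}{n - 1} = \frac{-5.0798}{8 - 1} = -0.7257$
Since r is negative and its magnitude lies between 0.6 and 0.99, the relation between week and attendance is strong and inverse: attendance tends to fall as the weeks go on.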
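The table-based calculation of r can be reproduced with a short Python sketch. It follows the same rule (sum of the z-score products divided by n - 1) and uses `statistics.stdev`, which is the sample standard deviation. The function name `correlation` is illustrative.

```python
from statistics import mean, stdev

weeks = [1, 2, 3, 4, 5, 6, 7, 8]
attendance = [73, 78, 84, 88, 29, 35, 39, 44]

def correlation(xs, ys):
    """Correlation coefficient as the sum of z_x * z_y divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)      # sample standard deviations (n - 1)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

r = correlation(weeks, attendance)
print(round(r, 4))
```

Because the code does not round the intermediate z-scores, its result can differ from a hand calculation in the last decimal places.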
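With $b_1 = r \left( \frac{s_y}{s_x} \right) = -7.19$ and $b_0 = \bar{y} - b_1 \bar{x} = 91.1$:
$\hat{y} = b_1 x + b_0 = 91.1 - 7.19x$
II. At $x = 10$: $\hat{y} = 91.1 - 7.19(10) = 19.2 \approx 19$ students
III. At $x = 4$: $\hat{y} = 91.1 - 7.19(4) \approx 62$
Error $= |\hat{y} - y| = |62 - 88| = 26$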
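The regression line, the prediction and the error can be sketched in Python from the same week and attendance data, using the rules for $b_1$, $b_0$, $\hat{y}$ and Error listed earlier. The function name `predict` is illustrative. The code keeps full precision, so the week 4 error comes out near 25.65 before rounding, while the hand calculation rounds $\hat{y}$ to 62 first and gets 26.

```python
from statistics import mean, stdev

weeks = [1, 2, 3, 4, 5, 6, 7, 8]
attendance = [73, 78, 84, 88, 29, 35, 39, 44]

x_bar, y_bar = mean(weeks), mean(attendance)
s_x, s_y = stdev(weeks), stdev(attendance)

# r from the z-score rule, then slope and intercept from the lecture rules.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(weeks, attendance)) / (len(weeks) - 1)
b1 = r * (s_y / s_x)
b0 = y_bar - b1 * x_bar

def predict(x):
    """Predicted attendance for week x."""
    return b1 * x + b0

y_hat_week4 = predict(4)
error_week4 = abs(y_hat_week4 - attendance[3])   # real value at week 4 is 88
y_hat_week10 = predict(10)

print(round(b1, 2), round(b0, 1))
print(round(y_hat_week4, 2), round(error_week4, 2), round(y_hat_week10, 2))
```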
Another Example:
Solution:
i. Sales = 332.0269 + 3.1924 (Advertising cost)
This equation has the same form as $\hat{y} = b_1 x + b_0$,
which means $b_1 = 3.1924$.
We are also given the standard deviation of x and the standard deviation of y, so
$b_1 = r \left( \frac{s_y}{s_x} \right)$
$3.1924 = r \left( \frac{61.9476}{12.5277} \right)$, so $r = 0.6456$
An r between 0.6 and 0.99 is a strong positive (direct) relation.
ii. Calculate the real and estimated values of sales at week 30
The real value is missing from the data, so we recover it from the mean (average):
$\bar{y} = \frac{\sum y}{n}$
$436.6667 = \frac{385 + 400 + 395 + 365 + y + 440 + 490 + 420 + 560}{9}$
$436.6667 = \frac{3455 + y}{9}$, so $y = 475$
To calculate the estimated value, we use this equation:
Sales = 332.0269 + 3.1924 (Advertising cost) = 332.0269 + 3.1924 (30) =
427.7989
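The three calculations in this example (r from the slope, the missing real value from the mean, and the estimated value from the line) can be checked with a small Python sketch. All numbers are taken from the solution above; the variable and function names are illustrative.

```python
# Regression line from the example: Sales = b0 + b1 * (Advertising cost)
b0, b1 = 332.0269, 3.1924
s_x, s_y = 12.5277, 61.9476      # standard deviations of x and y

# b1 = r * (s_y / s_x)  =>  r = b1 * (s_x / s_y)
r = b1 * (s_x / s_y)

# The mean of the 9 sales values is known, and the 8 known values sum to 3455,
# so the missing value is n * mean minus that sum.
y_bar, n, known_sum = 436.6667, 9, 3455
missing_y = y_bar * n - known_sum

def estimated_sales(advertising_cost):
    """Estimated sales from the regression line."""
    return b0 + b1 * advertising_cost

estimate = estimated_sales(30)
print(round(r, 4), round(missing_y), round(estimate, 4))
```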
Lecture 4:
1. Rules:
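The Bayes Rule is a way of going from P(X|Y), known from the
training dataset, to find P(Y|X).
Bayes Rule: $P(Y|X) = \frac{P(X|Y)\,P(Y)}{P(X)}$
Naïve Bayes: the attributes are assumed independent given the class, so $P(X|C_i) = \prod_{k} P(x_k|C_i)$, and X is assigned to the class $C_i$ with the largest $P(X|C_i)\,P(C_i)$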
2. Example:
Age      Income   Student   Credit rating   Buys Computer
<=30     High     No        Fair            No
<=30     High     No        Excellent       No
31..40   High     No        Fair            Yes
>40      Medium   No        Fair            Yes
>40      Low      Yes       Fair            Yes
>40      Low      Yes       Excellent       No
31..40   Low      Yes       Excellent       Yes
<=30     Medium   No        Fair            No
<=30     Low      Yes       Fair            Yes
>40      Medium   Yes       Fair            Yes
<=30     Medium   Yes       Excellent       Yes
31..40   Medium   No        Excellent       Yes
31..40   High     Yes       Fair            Yes
>40      Medium   No        Excellent       No
Given the training dataset above, use the naive Bayesian classifier to classify
this data:
X = (age <=30, Income = medium, Student = yes, Credit rating = Fair)
Will X buy a computer or not?
Solution:
First step: compute the probability of each class.
Class:
• C1:buys_computer = ‘yes’ C2:buys_computer = ‘no’
◼ Compute P(Ci) for each class:
◼ P(C1) = P(buys computer = “yes”) = 9/14 = 0.643
◼ P(C2) = P(buys computer = “no”) = 5/14= 0.357
Second step: take each column (attribute) of the new data and compute its probability given each class.
Class:
• C1:buys computer = ‘yes’ C2:buys computer = ‘no’
Data to be classified:
• X = (age <=30, Income = medium, Student = yes, Credit rating = Fair)
It can also be written another way, as below:
◼ Compute P(X|Ci) for each class
P(age = “<=30” | buys computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys computer = “no”) = 1/5 = 0.2
P(credit rating = “fair” | buys computer = “yes”) = 6/9 = 0.667
P(credit rating = “fair” | buys computer = “no”) = 2/5 = 0.4
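Do not forget these two steps:
multiply all the “yes” probabilities together,
multiply all the “no” probabilities together.
◼ X = (age <= 30 , income = medium, student = yes, credit rating = fair)
P(X|Ci) : P(X | buys computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
Then multiply the “yes” product by P(yes) and the “no” product by P(no):
P(X | buys computer = “yes”) x P(buys computer = “yes”) = 0.044 x 0.643 = 0.028
P(X | buys computer = “no”) x P(buys computer = “no”) = 0.019 x 0.357 = 0.007
Since 0.028 > 0.007, X is classified as buys computer = “yes”.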
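The whole procedure (class priors, per-attribute conditional probabilities, their product, and the final multiplication by the prior) can be sketched in Python by counting rows of the training table. This is a minimal sketch with plain relative frequencies and no smoothing; the function name `naive_bayes_scores` is illustrative.

```python
from collections import Counter

# Training rows: (age, income, student, credit_rating, buys_computer)
rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def naive_bayes_scores(rows, x):
    """Return {class: P(X|class) * P(class)} for the attribute tuple x."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, count in class_counts.items():
        score = count / len(rows)                       # prior P(Ci)
        for k, value in enumerate(x):
            matches = sum(1 for row in rows
                          if row[-1] == label and row[k] == value)
            score *= matches / count                    # P(x_k | Ci)
        scores[label] = score
    return scores

x = ("<=30", "medium", "yes", "fair")
scores = naive_bayes_scores(rows, x)
prediction = max(scores, key=scores.get)
print({label: round(s, 3) for label, s in scores.items()}, prediction)
```

The class with the larger score is the predicted class; dividing by P(X) is not needed because it is the same for both classes.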
Use the K-Nearest Neighbor (KNN) algorithm with the Manhattan distance as the
similarity measure to predict the weight of the person with ID number 11, whose
height is 5.5 and age is 38, based on their height
and age. (Use K = 5 as the number of neighbors to be considered.)
Solution:
ID   Height   Age   Distance                              Rank   Weight
1    5        45    $|5 - 5.5| + |45 - 38| = 7.5$         5      77
2    5.11     26    $|5.11 - 5.5| + |26 - 38| = 12.39$    8
3    5.6      30    $|5.6 - 5.5| + |30 - 38| = 8.1$       6
4    5.9      34    $|5.9 - 5.5| + |34 - 38| = 4.4$       3      59
5    4.8      40    $|4.8 - 5.5| + |40 - 38| = 2.7$       2      72
6    5.8      36    $|5.8 - 5.5| + |36 - 38| = 2.3$       1      60
7    5.3      19    $|5.3 - 5.5| + |19 - 38| = 19.2$      10
8    5.8      28    $|5.8 - 5.5| + |28 - 38| = 10.3$      7
9    5.5      23    $|5.5 - 5.5| + |23 - 38| = 15$        9
10   5.6      32    $|5.6 - 5.5| + |32 - 38| = 6.1$       4      58
Weight of ID 11 $= \frac{77 + 59 + 72 + 60 + 58}{5} = 65.2$ kg
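The ranking and averaging steps can be sketched in Python. The table lists weights only for the five nearest neighbours, so the other weights are stored as `None` here rather than guessed; the function names `manhattan` and `knn_predict_weight` are illustrative.

```python
# (id, height, age, weight); weight is None where the table does not list it.
people = [
    (1, 5.0, 45, 77), (2, 5.11, 26, None), (3, 5.6, 30, None), (4, 5.9, 34, 59),
    (5, 4.8, 40, 72), (6, 5.8, 36, 60), (7, 5.3, 19, None), (8, 5.8, 28, None),
    (9, 5.5, 23, None), (10, 5.6, 32, 58),
]

def manhattan(a, b):
    """Manhattan distance: sum of absolute differences per feature."""
    return sum(abs(p - q) for p, q in zip(a, b))

def knn_predict_weight(people, query, k):
    """Average the weights of the k people closest to query = (height, age)."""
    ranked = sorted(people, key=lambda p: manhattan((p[1], p[2]), query))
    nearest = ranked[:k]
    # Works here because all k nearest neighbours have a listed weight.
    return [p[0] for p in nearest], sum(p[3] for p in nearest) / k

neighbour_ids, predicted = knn_predict_weight(people, (5.5, 38), k=5)
print(neighbour_ids, predicted)
```

Height and age are used on their raw scales, as in the solution, so the age difference dominates the distance; scaling the features first is a common alternative.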
3. Example On Confusion Matrix:
Actual Class\Predicted class cancer = yes cancer = no Total
a) Given the following distance matrix, use the DBSCAN algorithm to find
the final clusters. Determine for each point whether it is core, border, or
a noise point.
Use the following parameters as inputs for DBSCAN:
(Eps = 2, MinPts = 2)
First step: for each point, list the points whose distance from it is within Eps, like this:
N (A1) = {} noise
N (A2) = {} noise
N (A3) = {A5, A6} core
N (A4) = {A8} core
N (A5) = {A3, A6} core
N (A6) = {A3, A5} core
N (A7) = {} noise
N (A8) = {A4} core
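Next, decide which points are core, border and noise. How do we tell them apart?
◼ A core point has at least MinPts points in its neighborhood. Take
N (A4) = {A8}
as an example: it counts as 2, because the point itself is included, N (A4) = {A8, A4}.
◼ A border point does not meet the MinPts condition but has a core point in its neighborhood, as we will see
in the next example.
◼ A noise point neither meets the MinPts condition nor has a core point in its neighborhood.
Last step: do not forget to group the points into clusters. Here A3, A5 and A6 form one cluster and A4 and A8 form another, while A1, A2 and A7 are noise.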
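The labelling and clustering steps can be sketched in Python starting from the neighbourhood lists N(A1) to N(A8) given in the solution. This is a minimal sketch, not a full DBSCAN implementation: it takes the neighbourhoods as input instead of computing them from distances, and the function names `classify` and `build_clusters` are illustrative.

```python
MIN_PTS = 2

# Eps-neighbourhoods from the solution, excluding the point itself.
neighbours = {
    "A1": set(), "A2": set(), "A3": {"A5", "A6"}, "A4": {"A8"},
    "A5": {"A3", "A6"}, "A6": {"A3", "A5"}, "A7": set(), "A8": {"A4"},
}

def classify(neighbours, min_pts):
    """Label each point core, border or noise (a point counts itself)."""
    core = {p for p, nb in neighbours.items() if len(nb) + 1 >= min_pts}
    labels = {}
    for p, nb in neighbours.items():
        if p in core:
            labels[p] = "core"
        elif nb & core:          # not core, but a core point is nearby
            labels[p] = "border"
        else:
            labels[p] = "noise"
    return labels

def build_clusters(neighbours, labels):
    """Grow one cluster per connected group of core points, attaching borders."""
    clusters, seen = [], set()
    for start in sorted(neighbours):
        if labels[start] != "core" or start in seen:
            continue
        cluster, stack = set(), [start]
        while stack:
            p = stack.pop()
            if p in cluster:
                continue
            cluster.add(p)
            if labels[p] == "core":      # only core points expand the cluster
                stack.extend(neighbours[p] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters

labels = classify(neighbours, MIN_PTS)
clusters = build_clusters(neighbours, labels)
print(labels)
print(clusters)
```

With MinPts = 2 and the point itself counted, any point that has at least one neighbour is already core, so no border points appear in this example.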
Solution: