
19CS2205 Data Science

LAB EXPERIMENT #7: Linear Transformation for Random Variables

Date of the Session: ___/___/___

Time of the Session: ____ to ____

Pre-requisite
1. Required basic knowledge of probability theory.

Pre-lab.
1. Explain what a random variable is.

Answer: A random variable is a variable whose value is unknown, or a function that assigns values to each of an experiment's outcomes. Random variables are often used in econometric or regression analysis to determine statistical relationships among one another.

2. Explain linear transformation of a random variable.


Answer: When a linear transformation is applied to a random variable, a new random variable is created. If X is a random variable and a and b are constants, then a + bX is a linear transformation of X. A linear transformation of X is another random variable.

3. What is Min-Max normalization?

Answer: Min-Max normalization is one of the most common ways to normalize data. It scales the data to lie between 0 and 1. The normalization helps us to understand the data easily.
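
The scaling described in this answer is usually written as x' = (x - min) / (max - min). A minimal sketch of that formula follows; the function name `min_max_normalize` and the sample values are assumptions made for illustration and are not part of the lab sheet.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample values, chosen only for illustration.
sample = [10.0, 15.0, 20.0, 30.0]
scaled = min_max_normalize(sample)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

The order of the values and their relative spacing are preserved; only the scale changes.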

Output:

[Handwritten first and last rows of the Iris data frame; values illegible.]

       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

mean:
sepal_length    5.843333
sepal_width     3.054000
petal_length    3.758667
petal_width     1.198667
dtype: float64

median:
sepal_length    5.80
sepal_width     3.00
petal_length    4.35
petal_width     1.30
dtype: float64
In-lab.

Data-set
The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by the British statistician, eugenicist, and biologist Ronald Fisher in 1936. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other.

The dataset is available in the following link.
https://www.kaggle.com/arshid/iris-flower-dataset

a. Read data from the .CSV file. Get the basic statistics like mean, median, variance and standard deviation of the attributes sepal_length, sepal_width, petal_length, petal_width of the data set.
b. Find the correlation between the attributes (sepal_length, sepal_width) and (petal_length, petal_width).
c. Make a linear transformation of the attribute sepal_length by adding 1.5 to it. Similarly, make a linear transformation of the attribute petal_length by multiplying it by two.
d. Now find the correlation between the attributes (sepal_length, sepal_width) and (petal_length, petal_width). Then analyze it and draw a conclusion.
e. Make a linear transformation of the attribute sepal_length by subtracting 1.5 from it. Similarly, make a linear transformation of the attribute petal_length by dividing it by two.
f. Now find the correlation between the attributes (sepal_length, sepal_width) and (petal_length, petal_width). Then analyze it and draw a conclusion.
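
Steps a to f can be sketched with pandas as below. This is only an illustration under stated assumptions: the six rows are hypothetical stand-ins for the Kaggle file (the lab itself would load it with `pd.read_csv`), and the variable names are invented for the example. The point it shows is that adding or subtracting a constant, or multiplying or dividing by a positive constant, leaves the Pearson correlation unchanged.

```python
import pandas as pd

# Hypothetical measurements standing in for a few Iris rows; the real lab
# would load the downloaded CSV file with pd.read_csv instead.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1, 6.5],
    "sepal_width":  [3.5, 3.0, 3.3, 2.7, 3.0, 3.2],
    "petal_length": [1.4, 1.4, 6.0, 5.1, 5.9, 5.1],
    "petal_width":  [0.2, 0.2, 2.5, 1.9, 2.1, 2.0],
})

# a. Basic statistics of the four attributes, one row per attribute.
stats = pd.DataFrame({
    "mean": df.mean(), "median": df.median(),
    "variance": df.var(), "std": df.std(),
})
print(stats)

# b. Correlation of each attribute pair before any transformation.
r_sepal = df["sepal_length"].corr(df["sepal_width"])
r_petal = df["petal_length"].corr(df["petal_width"])

# c. Add 1.5 to sepal_length and multiply petal_length by two.
r_sepal_added = (df["sepal_length"] + 1.5).corr(df["sepal_width"])
r_petal_doubled = (df["petal_length"] * 2).corr(df["petal_width"])

# e. Subtract 1.5 from sepal_length and divide petal_length by two.
r_sepal_subtracted = (df["sepal_length"] - 1.5).corr(df["sepal_width"])
r_petal_halved = (df["petal_length"] / 2).corr(df["petal_width"])

# d, f. The coefficients agree up to floating-point rounding.
print(r_sepal, r_sepal_added, r_sepal_subtracted)
print(r_petal, r_petal_doubled, r_petal_halved)
```

Multiplying by a negative constant would flip the sign of the coefficient, which is why the sketch only claims invariance for positive scale factors.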

Writing space for the Problem: (For Student's use only)


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("/content/IRIS.csv")
df.head()
df.describe()
# numeric_only=True skips the text column "species"
df.std(numeric_only=True)
df.var(numeric_only=True)
df.corr(numeric_only=True)

Std:
sepal_length    0.828066
sepal_width     0.433594
petal_length    1.764420
petal_width     0.763161
dtype: float64

Variance:
sepal_length    0.685694
sepal_width     0.188004
petal_length    3.113179
petal_width     0.582414
dtype: float64

Correlation matrix:
              sepal_length  sepal_width  petal_length  petal_width
sepal_length      1.000000    -0.109369      0.871754     0.817954
sepal_width      -0.109369     1.000000     -0.420516    -0.356544
petal_length      0.871754    -0.420516      1.000000     0.962757
petal_width       0.817954    -0.356544      0.962757     1.000000

[The handwritten matrix also has columns for the added sepal length and the divided petal length. Those entries, the pairwise correlation values and the data frame rows that follow are illegible.]

df['sepal_length'].corr(df['sepal_width'])
df['petal_length'].corr(df['petal_width'])

df['added_sepal_length'] = df['sepal_length'] + 1.5
df.head()
plt.hist(df['added_sepal_length'])

df['division_petal_length'] = df['petal_length'] / 2
df.head()
plt.hist(df['division_petal_length'])

[Remaining handwritten lines illegible.]

Output:
[Handwritten rows of the Iris data frame with the transformed columns; values illegible.]

Output:
[Handwritten first rows of the supermarket sales data frame (branches A and C, cities Yangon and Naypyitaw, product lines such as Health and beauty, Electronic accessories, Home and lifestyle, Sports); values illegible.]

Correlation:
0.6339620885890689

Min Max Scaler:
[Handwritten scaled values; Name: Total, Length: 1000, dtype: float64]
[Handwritten scaled values; Name: Unit price, Length: 1000, dtype: float64]


Post-lab.

DATA SET
A super market XXXX is selling products such as Health and beauty, Electronic accessories, Home and lifestyle, Food and beverages, Sports. You can get the data-set in the following link.

https://www.kaggle.com/aungpyaeap/supermarket-sales

Find the correlation between the attributes 'Unit price' and 'Total'. Then transform the attributes 'Unit price' and 'Total' using min-max normalization. Then find the correlation between 'Unit price' and 'Total' and draw a conclusion.

Writing space for the Problem: (For Student's use only)
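
A possible way to approach this task is sketched below. The five rows are illustrative values only and the helper `min_max` is an assumed name; the lab itself would read the Kaggle file. Because min-max normalization is a linear transformation with a positive scale factor, the correlation before and after scaling should match.

```python
import pandas as pd

# Illustrative rows only; the real lab would read the supermarket sales CSV.
sales = pd.DataFrame({
    "Unit price": [74.69, 15.28, 46.33, 58.22, 86.31],
    "Total": [548.97, 80.22, 340.53, 489.05, 634.38],
})


def min_max(column):
    """Rescale a numeric column to the interval [0, 1]."""
    return (column - column.min()) / (column.max() - column.min())


r_before = sales["Unit price"].corr(sales["Total"])
scaled = sales.apply(min_max)  # apply the scaling column by column
r_after = scaled["Unit price"].corr(scaled["Total"])

print(scaled)
print(r_before, r_after)  # equal up to floating-point rounding
```

scikit-learn's `MinMaxScaler` performs the same rescaling; plain pandas arithmetic is used here so that the formula stays visible.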

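import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

df1 = pd.read_csv("/content/supermarket_sales - Sheet1.csv")
df1.head()

# correlation before normalization
df1['Unit price'].corr(df1['Total'])

# min-max normalization of both attributes
scaler = MinMaxScaler()
df1[['Unit price', 'Total']] = scaler.fit_transform(df1[['Unit price', 'Total']])

# correlation after normalization
df1['Unit price'].corr(df1['Total'])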

Viva Voce:

1. If rolling a die is a random variable, then write the possible values it can take.

Answer: When a die is rolled, the possible outcomes are 1, 2, 3, 4, 5, 6.
Let X be the random variable.
X = {1, 2, 3, 4, 5, 6}

2. Consider X (a random variable) to be the number of heads obtained in three tosses of a coin. What are the possible values it will take?

Answer: When 3 coins are tossed, the outcomes are TTT, TTH, THT, HTT, HTH, THH, HHT, HHH.
Number of heads: 0, 1, 2, 3
X = {0, 1, 2, 3}
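
The sample space in this answer can be checked by enumeration. The short sketch below uses only the standard library; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Sample space of three coin tosses: every sequence of H (heads) and T (tails).
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]

# X maps each outcome to its number of heads; count how often each value occurs.
heads = Counter(outcome.count("H") for outcome in outcomes)

print(outcomes)                      # the 8 possible outcomes
print(sorted(heads))                 # possible values of X
print(dict(sorted(heads.items())))   # number of outcomes giving each value
```

Dividing each count by 8 gives the probability of each value of X for a fair coin.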

3. What do you understand about the linear-transformation of an attribute of a data-set?

Answer: Linear transformation can refer to either:
a) adding a constant to each term in a dataset, or
b) multiplying each term in a dataset by a constant.
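
To see what such a transformation does to an attribute, the sketch below applies y = a + b*x to a few hypothetical values and compares the mean and standard deviation before and after. The function name `linear_transform` and the numbers are assumptions made for illustration.

```python
from statistics import mean, pstdev


def linear_transform(values, a, b):
    """Return a + b*x for every x in values."""
    return [a + b * x for x in values]


# Hypothetical attribute values, for illustration only.
x = [2.0, 4.0, 6.0, 8.0]
y = linear_transform(x, a=1.5, b=2.0)

# The mean is shifted and scaled; the standard deviation is only scaled.
print(mean(x), pstdev(x))
print(mean(y), pstdev(y))
```

In general the mean becomes a + b times the old mean, and the standard deviation becomes |b| times the old standard deviation, so adding a constant alone does not change the spread.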

Comment of the Evaluator (if Any)          Evaluator's Observation

Marks Secured: ____ out of ____

Full Name of the Evaluator:

Signature of the Evaluator                  Date of Evaluation:
LAB EXPERIMENT #8: Discrete Random Variables

Date of the Session: ___/___/___

Time of the Session: ____ to ____

Pre-lab
1. What is expectation of a random variable?

Answer: The expectation of a random variable is the long-term average value of the random variable. The expected value, usually denoted as μ or E(X), is calculated using E(X) = Σ x f(x).

2. Calculate Expectation of a discrete random variable.

Answer: For a discrete random variable, the expected value is calculated by summing the product of the value of the random variable and its associated probability, taken over all of the values of the random variable. The expected value, usually denoted as μ, of a discrete random variable is calculated using: μ = E(X) = Σ x P(X = x).
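
As a worked instance of this sum, the sketch below computes the expected value of a fair six-sided die. The helper name `expected_value` is an assumption for the example; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def expected_value(pmf):
    """Sum of value * probability over a dict of {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())


# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(die.values()) == 1  # probabilities must sum to one

print(expected_value(die))  # 7/2
```

The result 7/2 = 3.5 is not a value the die can show, which illustrates that the expectation is a long-run average rather than a likely outcome.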

3. Explain independence of two random variables.


Answer: Two random variables X and Y are independent if knowing the value of one of them does not change the probabilities of the other one.
In other words, if X and Y are independent, we can write
P(Y = y | X = x) = P(Y = y), for all x, y.
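
An equivalent test for independence is P(X = x, Y = y) = P(X = x) P(Y = y) for all x, y. The sketch below checks that condition for two small joint distributions; the function names and the two examples (a pair of fair dice, and a pair in which Y copies X) are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two fair dice rolled separately: 36 equally likely pairs.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}


def marginal(dist, axis):
    """Marginal distribution of the first (axis=0) or second (axis=1) variable."""
    out = {}
    for pair, p in dist.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out


def is_independent(dist):
    """True when P(X=x, Y=y) == P(X=x) * P(Y=y) for every combination of x and y."""
    px, py = marginal(dist, 0), marginal(dist, 1)
    return all(dist.get((x, y), 0) == px[x] * py[y] for x in px for y in py)


print(is_independent(joint))   # True

# A dependent case: Y always equals X.
copied = {(x, x): Fraction(1, 6) for x in range(1, 7)}
print(is_independent(copied))  # False
```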

Output:
[Handwritten output: rows of the Iris data frame, rows for Iris-setosa, the distinct sepal_length values (beginning 5.1, 4.9, 4.7, 4.6, 5.0, 5.4), and a table of sepal length, frequency and probability; remaining values illegible.]

In-lab
Data-set
The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by the British statistician, eugenicist, and biologist Ronald Fisher in 1936. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other.

The dataset is available in the following link.
https://www.kaggle.com/arshid/iris-flower-dataset

a. From the above data set only consider the species iris-setosa, get all the distinct values and their frequencies for the attribute sepal-length.
b. Calculate the probability of each distinct value of the attribute sepal-length for species iris-setosa.
c. Now, calculate the expected value of the attribute sepal-length for species iris-setosa and draw your conclusion.
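
The three steps can be sketched with the standard library as follows. The readings are hypothetical stand-ins for the sepal-length column of one species, and the variable names are illustrative. The sketch treats the relative frequency of each distinct value as its probability.

```python
from collections import Counter

# Hypothetical sepal-length readings for one species; in the lab these would
# come from the sepal_length column filtered to Iris-setosa.
lengths = [5.1, 4.9, 5.0, 5.1, 5.0, 5.0, 4.9, 5.4]

# a. Distinct values and how often each occurs.
frequency = Counter(lengths)

# b. Probability of each distinct value as its relative frequency.
n = len(lengths)
probability = {value: count / n for value, count in frequency.items()}

# c. Expected value: sum of value * probability over the distinct values.
expected = sum(value * p for value, p in probability.items())

for value in sorted(frequency):
    print(value, frequency[value], probability[value])
print(expected, sum(lengths) / n)
```

The two numbers on the last line agree up to rounding: with relative frequencies as probabilities, the expected value is the sample mean, which is one conclusion the task appears to be aiming at.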

Writing space for the Problem: (For Student's use only)

import pandas as pd
import numpy as np

df = pd.read_csv("/content/IRIS.csv")
df1 = df.loc[df['species'] == 'Iris-setosa']

total_count = np.count_nonzero(df1['sepal_length'])
temp = df1['sepal_length'].unique()
print(temp)

frequency = []
probability = []
for i in temp:
    # count how many rows hold the distinct value i
    count = 0
    for j in df1['sepal_length']:
        if i == j:
            count = count + 1
    frequency.append(count)
    probability.append(count / total_count)
print(frequency)
print(probability)

pd.DataFrame({'Sepal length': temp, 'Frequency': frequency, 'Probability': probability})

expected_value = 0
i = 0
for value in temp:
    expected_value = expected_value + (value * probability[i])
    i = i + 1
print("Expected value", expected_value)

Expected value:
[Handwritten value not reliably legible.]
Output:
[Handwritten output: rows of the Iris data frame, rows for Iris-virginica, the distinct values, and a table of value, frequency and probability; values illegible.]

Post-Lab:
a. From the above data set only consider the species Iris-virginica, get all the distinct values and their frequencies for the attribute Petal Length.

import pandas as pd
df = pd.read_csv("/content/IRIS.csv")
df1 = df.loc[df['species'] == 'Iris-virginica']

b. Calculate the probability of each distinct value of the attribute Petal Length for species Iris-virginica.

temp = df1['petal_length'].unique()
print(temp)
total_count = len(df1)
frequency = []
probability = []
for i in temp:
    # count how many rows hold the distinct value i
    count = 0
    for j in df1['petal_length']:
        if i == j:
            count = count + 1
    frequency.append(count)
    probability.append(count / total_count)
print(frequency)
print(probability)

c. Now, calculate the expected value of the attribute petal-length for species Iris-virginica and draw your conclusion.

pd.DataFrame({'Petal length': temp, 'Frequency': frequency, 'Probability': probability})

expected_value = 0
i = 0
for value in temp:
    expected_value = expected_value + (value * probability[i])
    i = i + 1
print("Expected value", expected_value)

Expected value:
[Handwritten value not reliably legible.]

Viva-Voce

1. What do you understand by probability mass function (PMF) and probability density function (PDF)?

Answer:
A probability mass function (PMF) is a function that gives the probability that a discrete random variable is exactly equal to some value.

A probability density function (PDF) is a statistical expression that defines a probability distribution for a continuous random variable, as opposed to a discrete random variable.
2. What is the difference between joint probability and conditional probability?

Joint Probability:
It is the probability of two events occurring simultaneously.

Conditional Probability:
It is the probability of an event occurring given that another event has already occurred.
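
The difference can be made concrete with two fair dice. In the sketch below the events "first die is even" and "total is eight" are arbitrary choices for illustration, and the function names are assumptions. The conditional probability is computed as P(A and B) / P(A).

```python
from fractions import Fraction
from itertools import product

# Two fair dice; every ordered pair is equally likely.
space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate over (first, second)."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))


def first_is_even(outcome):
    return outcome[0] % 2 == 0


def total_is_eight(outcome):
    return outcome[0] + outcome[1] == 8


# Joint probability: both events happen together.
joint = prob(lambda o: first_is_even(o) and total_is_eight(o))

# Conditional probability: total is eight, given that the first die is even.
conditional = joint / prob(first_is_even)

print(joint, conditional)  # 1/12 1/6
```

The joint probability is measured against all 36 outcomes, while the conditional probability is measured only against the 18 outcomes in which the first die is even.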

Comment of the Evaluator (if Any)          Evaluator's Observation


Marks Secured: out of

Full Name of the Evaluator:
