
Pramod Kumar Mishra

Prof

Levin & Rubin, Statistics for Management, Pearson Ed.

Albright & Winston, Business Analytics: Data Analysis and Decision Making
makeby
Excel
Descriptive
diu Analysis
rep
sass e
GAS
MATCH
Python Excel solver
R

Analytics Anglyn's
y
current
Decision onlysituation
making

ANOVA (Analysis of Variance)
ARIMA (Auto-Regressive Integrated Moving Average)
Regression
Decision tree
Neural Network

sampling
Data management

usepinII
paramanaganent

Panel Data
Log zu

2029 Pooled Daba

Data source: Primary / Secondary

Primary: Questionnaire design (valid, reliable, unbiased) -> Pilot test -> Full-scale data collection -> Raw data -> Data management

1
Data array (R x C): columns = variables, rows = respondents
Data -> Statistics -> Information

A's CXR
Process: Acquire -> Store -> Clean -> Protect -> Retrieve -> Analyse


Puriff

Data quality check
1 Error free / accurate
2 Complete
3 Consistent
4 Timely / updated
5 Unique
6 Valid

qmat
1 Data cleansing / purification
2 Data coding
3 Data filtering
4 Data restructuring

5 Data c.IEfEea
Quantitative Sagerigate
Precoding
During the data collection stage: not many open-ended questions, no leading questions, not many sensitive questions
During data entry:
1 No formatting issues
2 No errors related to special characters, empty values, missing data

Pre
Coding

condensing

Data coding
1 Assign a code to a variable (the data code)
2 Screening
3 Missing cases: prefixed rule
4 Outlier treatment: box plot
un
75
prima rage

ai P
Quartile: 4 parts
Decile: 10 parts
Percentile: 100 parts

uanh

Data restructuring
1 Merging sheets / responses
2 Removing duplicate responses / variables
3 Sorting: ascending / descending
4 Selecting variables / respondents
5 Splitting

Data transformation
1 Recode: categorisation (recode data into a new variable)
2 Computing
3 Transpose

Data reduction (reducing the size of the data)
1 Remove respondents
2 Normalization (mean 0, variance 1)
3 Principal Component Analysis
4 Factor Analysis
a
XI
8142279454

Pramod wishra uohyd.ac in

zsyfM gunpux
population
t I
I n

s o
continuous
distribution
flag

2
El E
fact i e
652A

K E C a too

factortualysis
tflteibf.ee
E7Ii7
if.ya
3 X X t 27cL tKrs
i

i
kn Corelatient factor
Gender

www.f Edga
Q1
an kn
Income Y a ki t Azk

L f
Fini'T
Je 1 i X
A az
H
an

Reg coefficient

Ypred
error

Y yn e

Ee 0

Y b
If
9 Ane t ask an xn t
Check data reliability (dependable, consistent, etc.): Cronbach's alpha
o a si

mi n req 9707
LH
to
Research

VI Nu Nz Vu
R N 11 7 ul

ru

R 13

7 4 7 x
fly
E fl E ful
if
a

Ej
7 ful
t'Independent
Dependent

X Y

t t
I 4

an g

X
f te iz

6 toy Total var

Val X q2
Cleansing tools: OpenRefine (Google), Data Wrangler

rake treatment
Missing

the
Descriptive stats, Predictive, Prescriptive (OR)
SPSS
SAS
pyruanU
Sw
YA
IndianSoftware
Analysis

Central tendency

sD N A SA
p
E
I Z 3 U 5
e

c
Mean Average
Mean
Population
tape Median
t I
SDCs Mode
SDG

Arithmetic Geometric Harmonic


they
Exclusive

germ Include

ungoaped

e
ga

so to 375 3 75
w y 42 5 4 170 5
got
0 475 5 237 5
o IS
zo 5 52 5 M 210
57 5 2 115 544.86
no 10
is 807 5
us Fe
y Effe
Et
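Assumed mean method (A = assumed mean)

$\bar{x} = A + \frac{\sum f d}{N}$, where $d = m - A$ and m is the class mid-point.

C.I.     f    m    d = m - 35    fd
0-10     6    5    -30           -180
10-20    10   15   -20           -200
20-30    18   25   -10           -180
30-40    14   35     0              0
40-50    12   45    10            120
50-60    10   55    20            200
60-70     6   65    30            180
Total    76                       -60

Let A = 35:
$\bar{x} = 35 + \frac{-60}{76} = 35 - 0.7894 = 34.21$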
preecidig
a
Muir
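Median (M), grouped data

$M = l + \frac{N/2 - cf}{f} \times i$, where l is the lower limit of the median class, cf the cumulative frequency of the preceding class, f the frequency of the median class and i the class width.

C.I.     f    cumulative frequency
0-10     6     6
10-20    10   16
20-30    18   34
30-40    14   48   <- median class
40-50    12   60
50-60    10   70
60-70     6   76

N/2 = 76/2 = 38, so the median class is 30-40 (cf = 34, f = 14).
$M = 30 + \frac{38 - 34}{14} \times 10 \approx 32.86$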

2 as
307

For odd n: median = the ((n + 1)/2)-th observation

Quartiles: Q1 = 25%, Q2 = 50% (median), Q3 = 75%

Median: positional average

12 d 5 4 20 6 I 0
16

iii iii i ii
9
ay
data
advantage in
non symmetrical

e t of
dis xi

et
e x

et x
as
H

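$Q_1 = l + \frac{N/4 - cf}{f} \times i = 20 + \frac{19 - 16}{18} \times 10 = 21.67$ (N/4 = 19, class 20-30)
$Q_2 = 32.86$ (median)
$Q_3 = l + \frac{3N/4 - cf}{f} \times i = 40 + \frac{57 - 48}{12} \times 10 = 47.5$ (3N/4 = 57, class 40-50)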

Quartile deviation: $QD = \frac{Q_3 - Q_1}{2}$

Coefficient of quartile deviation: $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

HE ditzes f da

dist is not
if
perfectly symmetric
Percentile: 100 parts

Decile: 10 parts

50th percentile = 5th decile = Q2 = median

S P Gupta Intro to stah's

G C Barry
Quiz
Assignment
wig
dSep

Mode
The observation occurring the highest number of times in a distribution.

I 2.2 3 3,3 10 i 5 6 6 6 6 I5

I l
2 2

3 3
10 I
5 I

I5 I

f 2.2,3,3 3,3 10 i 5 6 6 6 6 I5

Multiple modes

Bimodal

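Grouped data:
$\text{Mode} = l + \frac{d_1}{d_1 + d_2} \times i$

l = lower limit of the modal class
$d_1 = f_1 - f_0$; $f_0$ = frequency of the class preceding the modal class, $f_1$ = frequency of the modal class
$d_2 = f_1 - f_2$; $f_2$ = frequency of the class succeeding the modal class
i = length of the class interval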

mom et i

et

C.I.     f
0-10     6
10-20    10   ($f_0$)
20-30    18   ($f_1$, modal class)
30-40    14   ($f_2$)
40-50    12
50-60    10
60-70     6
Σf       76

$\text{Mode} = 20 + \frac{18 - 10}{(18 - 10) + (18 - 14)} \times 10 = 20 + 6.67 = 26.67$

Summary of central tendency values
Mean = 34.2
Median = 32.8
Mode = 26.7
Q1 = 21.6, Q3 = 47.5
Inter-quartile range = Q3 - Q1 = 25.9
Coefficient of QD = 0.37
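The grouped-data formulas for the mean, median, quartiles and mode can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function names (grouped_mean, grouped_quantile, grouped_mode) are hypothetical, every class is assumed to have the same width, and the mode helper assumes a single modal class whose frequency differs from at least one neighbour.

```python
# Frequency table from the notes: class lower limits, equal width of 10.
lowers = [0, 10, 20, 30, 40, 50, 60]
freqs = [6, 10, 18, 14, 12, 10, 6]
width = 10


def grouped_mean(lowers, freqs, width):
    """Mean of grouped data using class mid-points."""
    mids = [low + width / 2 for low in lowers]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


def grouped_quantile(lowers, freqs, width, q):
    """Interpolated quantile: l + ((q*N - cf) / f) * i."""
    target = q * sum(freqs)
    cum = 0  # cumulative frequency of the classes before the current one
    for low, f in zip(lowers, freqs):
        if cum + f >= target:
            return low + (target - cum) / f * width
        cum += f
    raise ValueError("q must lie in (0, 1]")


def grouped_mode(lowers, freqs, width):
    """Mode = l + d1 / (d1 + d2) * i for the class with the highest frequency."""
    k = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1, d2 = f1 - f0, f1 - f2
    return lowers[k] + d1 / (d1 + d2) * width


mean = grouped_mean(lowers, freqs, width)
median = grouped_quantile(lowers, freqs, width, 0.50)
q1 = grouped_quantile(lowers, freqs, width, 0.25)
q3 = grouped_quantile(lowers, freqs, width, 0.75)
mode = grouped_mode(lowers, freqs, width)
```

With the table from the notes this gives mean ≈ 34.21, median ≈ 32.86, Q1 ≈ 21.67, Q3 = 47.5 and mode ≈ 26.67.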

Mean, Median, Mode relationship

mean > median > mode (mode < median < mean): +vely skewed; extreme values on the right-hand side

mean = median = mode: symmetric distribution

mode > median > mean: -vely skewed (asymmetric); extreme values on the left-hand side

For mild skewness, approximately: Mode = 3 Median - 2 Mean

rn
new
Dispersion: variation (scattering) of observations from the arithmetic mean

Measures: Variance, SD, Quartile Deviation, Coefficient of Variation, Mean Absolute Deviation, Standard Error

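Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, where $x - \bar{x}$ is the deviation and n = no. of observations
$SD = \sqrt{s^2}$

x: 1, 2, 4, 5, 3; n = 5; $\bar{x} = 3$
$\sum (x - \bar{x})^2 = 4 + 1 + 1 + 4 + 0 = 10$
$s^2 = 10 / 4 = 2.5$
$SD = \sqrt{2.5} = 1.58$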

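Grouped data: $\sigma^2 = \frac{\sum f d^2}{N}$, with $d = m - \bar{x}$ ($\bar{x} = 34.2 \approx 34$)

C.I.     f    m    d     d^2    f d^2
0-10     6    5    -29   841    5046
10-20    10   15   -19   361    3610
20-30    18   25    -9    81    1458
30-40    14   35     1     1      14
40-50    12   45    11   121    1452
50-60    10   55    21   441    4410
60-70     6   65    31   961    5766
Total    76                     21756

$\sigma^2 = 21756 / 76 = 286.3$
$SD = \sqrt{286.3} = 16.9$

Coefficient of variation: $CV = \frac{SD}{\text{mean}} \times 100 = \frac{16.9}{34.2} \times 100 = 49.4\%$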
good
eve hot

Higher CV: more variation, dispersed, not consistent (inconsistent)
Lower CV: better (more consistent)

U 6
MGR 120 10
E

150 II
Marz

10 100
Umar I

To
d L

CVmaez
00
i lo
150
A
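Step deviation method: $d' = \frac{m - A}{i}$, with A = 35 and i = 10

C.I.     f    m    d'    d'^2   f d'   f d'^2
0-10     6    5    -3    9      -18    54
10-20    10   15   -2    4      -20    40
20-30    18   25   -1    1      -18    18
30-40    14   35    0    0        0     0
40-50    12   45    1    1       12    12
50-60    10   55    2    4       20    40
60-70     6   65    3    9       18    54
Total    76                      -6   218

$SD = i \times \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} = 10 \times \sqrt{2.868 - 0.006} = 16.9$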
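Skewness

Galton's (Bowley's) skewness:
$S_k = \frac{Q_3 + Q_1 - 2 Q_2}{Q_3 - Q_1}$

Moment-based skewness:
$S_k = \frac{\sum (x - \bar{x})^3 / N}{s^3}$

$-0.5 < S_k < 0.5$: mild skewness
$-1.0 < S_k < -0.5$: -vely skewed
$0.5 < S_k < 1.0$: +vely skewed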
forgeted
54 1
she
Efcc
mm
C I f m d z fun
23 Bug 3174
169 1690
o Lo 10
15 2 13

3 9 162
2030 18 25 I

49 686
30 40 ly g o z

Uo 50 12 45 2189 3468
I 17
E 9180
Ef'Go

I
Eff 16602
60
27 7 28

Ef
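Kurtosis (K): tailedness / peakedness

$K = \frac{\sum (x - \bar{x})^4 / N}{s^4}$; for grouped data $K = \frac{\sum f (x - \bar{x})^4 / N}{s^4}$, where $s = \sqrt{\sum f (x - \bar{x})^2 / N}$

$K = \frac{3001980 / 60}{22888.66} = \frac{50033}{22888.66} \approx 2.19 < 3$: platykurtic

K = 3: normal (mesokurtic)
K > 3: leptokurtic
K < 3: platykurtic

Excess kurtosis: $K_e = K - 3$
$K_e = 0$: mesokurtic
$K_e < 0$: platykurtic
$K_e > 0$: leptokurtic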

Summary statistics (descriptive statistics)
1 Frequency distribution: for nominal / ordinal data
2 Central tendency: mean, median, mode (Q1, Q3)
3 Dispersion: variance

Crosstabulation: Chi-square test


Iron

Rn Rotate
garden µ

Rows 3

1
Column

Ige

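Inferential statistics
3rd Oct 2019
Conditional probability

A & B are independent: $P(A \mid B) = P(A)$ and $P(A \cap B) = P(A) P(B)$

A & B disjoint: $P(A \cap B) = 0$

A & B are dependent: $P(A \mid B) \neq P(A)$ (statistically dependent)

One coin toss: S = {H, T}, P(H) = 1/2 = P(T)
Two coin tosses: S = {HH, HT, TT, TH}

Joint probability: $P(A \cap B)$
Prior probability: $P(A)$, $P(B)$
Posterior probability: $P(A \mid B)$

Kolmogorov definition:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, $P(B) > 0$
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, $P(A) > 0$

From these, $P(A \cap B) = P(A \mid B) P(B)$ and $P(A \cap B) = P(B \mid A) P(A)$, so

$P(A \cap B) = P(B \mid A) P(A) = P(A \mid B) P(B)$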

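E.g. Dice rolled twice

S = {11, 12, ..., 16, 21, ..., 26, ..., 61, ..., 66}: 36 outcomes

A: 1st digit is 2 = {21, 22, 23, 24, 25, 26}
B: sum is divisible by 2 = {11, 13, 15, 22, 24, 26, 31, 33, 35, ...}

P(A) = 6/36
P(B) = 18/36
A ∩ B = {22, 24, 26}, P(A ∩ B) = 3/36

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{3/36}{18/36} = \frac{1}{6}$

$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{3/36}{6/36} = \frac{1}{2}$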
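The two-dice example can be verified by enumerating the sample space. The sketch below is illustrative only; the helper names prob and cond_prob are hypothetical, and exact fractions are used so that the results match the hand calculation.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of two dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) as an exact fraction over the equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond_prob(event_a, event_b):
    """P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(lambda o: event_a(o) and event_b(o)) / prob(event_b)


def first_is_two(o):
    return o[0] == 2          # event A: first die shows 2


def sum_is_even(o):
    return (o[0] + o[1]) % 2 == 0  # event B: sum divisible by 2


p_a = prob(first_is_two)
p_b = prob(sum_is_even)
p_a_given_b = cond_prob(first_is_two, sum_is_even)
p_b_given_a = cond_prob(sum_is_even, first_is_two)
```

Here p_a is 1/6 (6/36), p_b is 1/2 (18/36), p_a_given_b is 1/6 and p_b_given_a is 1/2.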

Bayes' theorem
$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$, $P(B) \neq 0$

Bayes' formula:
$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid A^c) P(A^c)}$

since $P(B) = P(A \cap B) + P(A^c \cap B)$, where $A^c$ is the complement of A.

Prob. of an ace given a spade, in a pack of playing cards (A = ace, B = spade):
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/52}{13/52} = \frac{1}{13}$

Mis Ans
i
B Y si PLA1B
t
PCBIAI.PH
P BIA PLA t PfB A PLAY

rental
f
the
ipfan.IN

PCA I PCA l t.dz


52
spade

plants BI
PY.in IfT Ia
14 she 14 sa 2 s2

4 Ahs
Tz

Test: T+ve / T-ve; D = disease (Dengue), D^c = no disease

Confusion matrix:
          D                 D^c
T+ve      correct           Type I error
T-ve      Type II error     correct

Calculate the conditional probabilities by Bayes' theorem:
$P(D \mid T^+)$, $P(D^c \mid T^+)$, $P(D \mid T^-)$, $P(D^c \mid T^-)$

Test T+ve / T-ve; U = user, U^c = non-user
$P(T^+ \mid U) = 0.99$, $P(T^- \mid U^c) = 0.99$, so $P(T^+ \mid U^c) = 0.01$
$P(U) = 0.005$, $P(U^c) = 0.995$

$P(T^+ \mid U) P(U) = 0.99 \times 0.005 = 0.00495$
$P(T^+ \mid U^c) P(U^c) = 0.01 \times 0.995 = 0.00995$

$P(U^c \mid T^+) = \frac{0.01 \times 0.995}{0.01 \times 0.995 + 0.99 \times 0.005} = \frac{0.00995}{0.0149} = 66.78\%$

s s
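Plant   Production   Std (S)   Non-std (S^c)
I       6000         75%       25%
II      4000         80%       20%
III     2000         99%        1%
Total   12000

P(I) = 6000/12000 = 0.5
P(II) = 4000/12000 = 0.33
P(III) = 2000/12000 = 0.17

$P(I \mid S) = \frac{P(S \mid I) P(I)}{P(S \mid I) P(I) + P(S \mid II) P(II) + P(S \mid III) P(III)}$
$= \frac{0.75 \times 0.5}{0.75 \times 0.5 + 0.8 \times 0.33 + 0.99 \times 0.17} = \frac{0.375}{0.375 + 0.264 + 0.1683} = \frac{0.375}{0.8073} = 0.4645 \approx 46\%$

$P(II \mid S) = \frac{0.264}{0.8073} = 32.7\% \approx 33\%$

$P(III \mid S) = \frac{0.1683}{0.8073} = 20.8\% \approx 21\%$

$P(I \mid S) + P(II \mid S) + P(III \mid S) = 100\%$

$P(I \mid S^c) = \frac{P(S^c \mid I) P(I)}{P(S^c \mid I) P(I) + P(S^c \mid II) P(II) + P(S^c \mid III) P(III)}$
$= \frac{0.25 \times 0.5}{0.25 \times 0.5 + 0.2 \times 0.33 + 0.01 \times 0.17} = \frac{0.125}{0.125 + 0.066 + 0.0017} = \frac{0.125}{0.1927} = 64.87\% \approx 65\%$

$P(II \mid S^c) = \frac{0.066}{0.1927} = 34.25\% \approx 34\%$
$P(III \mid S^c) = \frac{0.0017}{0.1927} = 0.88\% \approx 0.9\%$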
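The plant example is Bayes' formula applied to three mutually exclusive sources. A minimal Python sketch follows; the function name posterior and the dictionary layout are assumptions made for illustration, and the priors are the rounded values used in the notes (0.5, 0.33, 0.17).

```python
def posterior(priors, likelihoods):
    """Bayes' formula: P(H | E) = P(E | H) P(H) / sum over all H of P(E | H) P(H)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # P(E), the denominator
    return {h: j / total for h, j in joint.items()}


# Prior probability that an item comes from each plant (rounded as in the notes).
priors = {"I": 0.50, "II": 0.33, "III": 0.17}

# P(standard | plant) and its complement P(non-standard | plant).
p_standard = {"I": 0.75, "II": 0.80, "III": 0.99}
p_nonstandard = {h: 1 - p for h, p in p_standard.items()}

post_standard = posterior(priors, p_standard)
post_nonstandard = posterior(priors, p_nonstandard)
```

post_standard is roughly {I: 0.46, II: 0.33, III: 0.21} and post_nonstandard roughly {I: 0.65, II: 0.34, III: 0.01}; each set of posteriors sums to 1.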

wl9
that
Probability distribution

Discrete                                   Continuous
Categorical (qualitative) variable         Scaled (quantitative) variable
Binomial distribution                      Normal distribution
Poisson distribution
(special case of binomial)
Success / failure: Bernoulli trial
Probability mass function                  Probability density function
Prob. mass function (pmf), binomial:
$f(x) = \binom{n}{x} p^x q^{n - x}$, $x = 0, 1, \dots, n$

f(x): probability of success x times in n trials
p = probability of success
q = 1 - p = probability of failure
n = no. of trials

1ftor
Toss a coin ten times: $2^{10} = 1024$ outcomes

At least 4 heads?
Let X = random variable for getting a head; unbiased coin, so p = 1/2 = q, n = 10
Probability mass function:
$f(x) = \binom{10}{x} \left(\frac{1}{2}\right)^x \left(\frac{1}{2}\right)^{10 - x} = \frac{\binom{10}{x}}{1024}$, x = 0, 1, 2, ..., 10

x      f(x)
0      1/1024
1      10/1024
2      45/1024
3      120/1024
4      210/1024
5      252/1024
6      210/1024
7      120/1024
8      45/1024
9      10/1024
10     1/1024
Sum    1024/1024 = 1

Probability distribution function f(x), plotted for x = 0, 1, 2, ..., 10

Coin: unbiased
Population: normal
Sample: normal

At least 4 heads will turn up: $P(X \geq 4) = \frac{848}{1024}$

At most 4: $P(X \leq 4) = \frac{386}{1024}$
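The coin-toss table can be reproduced with the binomial pmf. The sketch below assumes an unbiased coin (p = 0.5) and uses a hypothetical helper name, binom_pmf; math.comb is part of the Python standard library (3.8+).

```python
from math import comb


def binom_pmf(x, n, p):
    """f(x) = C(n, x) * p**x * q**(n - x) with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.5
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

# Numerators over 1024, matching the table in the notes.
numerators = [comb(n, x) for x in range(n + 1)]

p_at_least_4 = sum(pmf[4:])   # P(X >= 4)
p_at_most_4 = sum(pmf[:5])    # P(X <= 4)
```

The numerators come out as 1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1; P(X >= 4) is 848/1024 ≈ 0.828 and P(X <= 4) is 386/1024 ≈ 0.377.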
Decision
manky
9 c
bratty A Quality
Place97 7024
Check

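Poisson distribution
Binomial distribution: $f(x) = \binom{n}{x} p^x q^{n - x}$, x = 0 to n

As $n \to \infty$ and $p \to 0$ with $np = \lambda$ fixed, the binomial becomes the Poisson, with x = 0 to $\infty$.

Pmf for the Poisson distribution:
$f(x) = \frac{e^{-\lambda} \lambda^x}{x!}$

$\lambda$ = mean = np
e = 2.718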

gothOct2019

statistics
Summary
central tendency
as well as deviation

Poisson distribution
$f(x) = \frac{e^{-\lambda} \lambda^x}{x!}$

$\lambda = np$
e = 2.718

E.g. $\lambda$ = 10 min: avg. waiting time of a customer at a bank

Probability distribution

Discrete                                   Continuous
Categorical (qualitative) variable         Scaled (quantitative) variable
Binomial distribution                      Normal distribution
Poisson distribution
(special case of binomial)
Success / failure
Probability mass function

Binomial: 2 responses (success / failure), i.e. binary responses

SC Gupta
i Gc Berry Kapoor

Jockyiq.Renegig.ballu.TL

$f(x) = \frac{e^{-\lambda} \lambda^x}{x!}$, when $\lambda = 10$

x      f(x)
0      0.00005
1      0.00045
2      0.00227
3      0.00757
4      0.01892
5      0.03783
6      0.06306
7      0.09008
8      0.11260
9      0.12511
10     0.12511
11     0.11374
12     0.09478
13     0.07291
14     0.05208
15     0.03472

Probability distribution function f(x), plotted for x = 0, 1, 2, ..., 15

P(X ≤ 15) = f(0) + f(1) + ... + f(15) = 0.9512

Exactly 10: P(X = 10) = 0.125 = 12.51%
P(X ≤ 5) = f(0) + f(1) + f(2) + f(3) + f(4) + f(5) = 0.0671

P(X ≤ 10) = 0.5830

P(5 ≤ X ≤ 10) = 0.5537 ≈ 55%
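The Poisson values can be recomputed directly from the pmf. This is a small sketch with hypothetical helper names (poisson_pmf, poisson_cdf), using only the standard library.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """f(x) = e**(-lam) * lam**x / x!"""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x): sum of the pmf from 0 to x inclusive."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 10
p_exactly_10 = poisson_pmf(10, lam)
p_at_most_15 = poisson_cdf(15, lam)
# P(5 <= X <= 10) = P(X <= 10) - P(X <= 4)
p_5_to_10 = poisson_cdf(10, lam) - poisson_cdf(4, lam)

# Second example in the notes: lam = 5
f0_lam5 = poisson_pmf(0, 5)
f1_lam5 = poisson_pmf(1, 5)
```

For lam = 10 this gives P(X = 10) ≈ 0.1251, P(X ≤ 15) ≈ 0.9513 and P(5 ≤ X ≤ 10) ≈ 0.5537; for lam = 5, f(0) ≈ 0.006738 and f(1) ≈ 0.033690.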
Off Every
5 min there is some accident at Rajadianaga
in
PLI accidents a
day
P L5 n n i e

P 5 u n
I
p lo n n
al
$\lambda = 5$
$f(x) = \frac{e^{-5} 5^x}{x!}$

$f(0) = \frac{5^0 e^{-5}}{0!} = 0.006737$

$f(1) = 5 e^{-5} = 0.033689$

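Normal distribution

1 Simple pdf: $f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$, $-\infty < x < \infty$; x represents the population

2 Standard normal: $Z = \frac{x - \mu}{\sigma}$, $Z \sim N(\mu = 0, \sigma = 1)$

$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2 / 2}$, $-\infty < z < \infty$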
flat e d y
Arg ar class u
EFI of
45
Aug cut legs

6 s S
kg

Confidence interval & area probabilities

$\mu \pm 1\sigma$: 68%
$\mu \pm 2\sigma$: 95%
$\mu \pm 3\sigma$: 99.73%
Confidence level
a

Cost
time Jeonfedar
complexity

One tail, normal distribution (mean = 0, SD = 1)

Confidence   Significance   z
90%          10%            1.28
95%          5%             1.65
99%          1%             2.33

Two tails
Confidence   z
90%          -1.65, +1.65
95%          -1.96, +1.96
99%          -2.58, +2.58
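The critical z values can be looked up programmatically instead of from a table. The sketch below uses statistics.NormalDist from the Python standard library (3.8+); the wrapper name z_critical and its tails parameter are hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence, tails=2):
    """Positive critical value of the standard normal for a confidence level.

    tails=1 puts all of alpha in one tail; tails=2 splits alpha equally.
    """
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / tails)


one_tail = {c: round(z_critical(c, tails=1), 2) for c in (0.90, 0.95, 0.99)}
two_tail = {c: round(z_critical(c, tails=2), 2) for c in (0.90, 0.95, 0.99)}
```

Rounded to two decimals, one_tail is {0.90: 1.28, 0.95: 1.64, 0.99: 2.33} and two_tail is {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}; tables usually quote the 1.645 value as 1.65.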

mOct
12 9
Test of hypothesis: a tentative assumption about a population parameter

Descriptive stats
Z, t, F tests (normal distribution tests)
Chi-square test
Basis for hypothesis: literature, evidence

Null hypothesis H0: income does not impact expenditure
Alternate hypothesis H1: income impacts expenditure

Rejection region: H0 is rejected
Acceptance region (no rejection): H0 is deemed to be accepted

Ho to Mi lo X Random variable
H Mt to thinnest
A stats table
jfnr
confidence led 95
994
Sing
gun wel
sfj.cn
significance
level
a

I lend
confidence

significance
area

Krejcie & Morgan (1970)
More sample size, higher representation

E.g. Average expense per month: $\mu$ = INR 10,000

Std deviation per month: $\sigma$ = INR 2,000

99.73% of expenses lie within $\mu \pm 3\sigma$ = (4000, 16000)

X = random variable (expense); std normal variate $Z = \frac{X - \mu}{\sigma}$

(i) $P(X > 12000) = P(Z > 1) = 1 - 0.8413 = 15.87\%$

(ii) $P(10000 < X < 14000) = P(0 < Z < 2) = 0.9772 - 0.5 = 0.4772$
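The two expense probabilities can be checked with the standard library's NormalDist, which standardises internally. The variable names below are illustrative only.

```python
from statistics import NormalDist

# Monthly expense modelled as normal with mean 10,000 and SD 2,000 (INR).
expense = NormalDist(mu=10_000, sigma=2_000)

# (i) P(X > 12000) = 1 - P(X <= 12000), i.e. P(Z > 1)
p_above_12000 = 1 - expense.cdf(12_000)

# (ii) P(10000 < X < 14000), i.e. P(0 < Z < 2)
p_between = expense.cdf(14_000) - expense.cdf(10_000)

# Share of expenses within mu +/- 3 sigma = (4000, 16000)
p_within_3sd = expense.cdf(16_000) - expense.cdf(4_000)
```

This gives about 0.1587 for (i), 0.4772 for (ii) and 0.9973 for the three-sigma band.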

17thOct WH

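Sampling & estimation

Krejcie & Morgan

When $\sigma$ is known:
$n = \left(\frac{z \sigma}{e}\right)^2 = \frac{z^2 \sigma^2}{e^2}$, e = error (accuracy)

E.g. at 95% confidence z = 1.96; $\sigma$ = 4, e = 0.05 (5% accuracy)

$n = \frac{(1.96)^2 \times 16}{(0.05)^2} = \frac{61.4656}{0.0025} = 24586$

if $\sigma$ = 16: n = 393379.84

Standard error: $se = \frac{\sigma}{\sqrt{n}}$

$\sigma = 1$ for the standard normal distribution, so $se = \frac{1}{\sqrt{n}}$
n = 10: se = 0.316
n = 100: se = 0.1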

When $\sigma$ is unknown and a proportion p is given (binomial):

$n = \frac{z^2 p q}{e^2} = \frac{(1.96)^2 \times 0.5 \times 0.5}{(0.05)^2} = \frac{0.9604}{0.0025} \approx 384$

Krejcie & Morgan (1970): 384

s
n
µ
Heterogeneous population, any distribution (non-parametric):
$n = \frac{\chi^2 N P (1 - P)}{d^2 (N - 1) + \chi^2 P (1 - P)}$

$\chi^2$ = chi-square = 3.84
d = degree of accuracy (0.05)
P = population proportion (0.5)
N = population

N = 100:
$n = \frac{3.84 \times 100 \times 0.5 \times 0.5}{(0.05)^2 \times 99 + 3.84 \times 0.5 \times 0.5} = \frac{96}{1.2075} = 79.5 \approx 80$

N             n
1,000         278
10,000        370
1,000,000     384
10,000,000    384
TFggggg.gg
000
250
384 4 1536

pop Gop 0,000 heterogr
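The Krejcie & Morgan sample sizes can be tabulated with a few lines of Python. The function name krejcie_morgan and its default arguments (chi-square 3.84, P = 0.5, d = 0.05) are assumptions that mirror the values used in the notes.

```python
def krejcie_morgan(population, chi2=3.84, p=0.5, d=0.05):
    """n = chi2 * N * P * (1 - P) / (d**2 * (N - 1) + chi2 * P * (1 - P))."""
    numerator = chi2 * population * p * (1 - p)
    denominator = d**2 * (population - 1) + chi2 * p * (1 - p)
    return numerator / denominator


# Sample size for a range of population sizes, rounded to whole respondents.
sizes = {
    n_pop: round(krejcie_morgan(n_pop))
    for n_pop in (100, 1_000, 10_000, 1_000_000, 10_000_000)
}
```

sizes comes out as {100: 80, 1000: 278, 10000: 370, 1000000: 384, 10000000: 384}: the required sample levels off near 384 however large the population becomes.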

Central limit theorem

$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

As n becomes very large, $\bar{x} \to \mu$ and $s \to \sigma$: the sample statistic approaches the population parameter.

Irrespective of the shape of the population, the sampling distribution of the sample mean approaches normal, so the sample statistic approximates the population parameter.

CLT holds good.

Test of hypothesis

Statement tentative analysis to euisling


when

Before analysis

Based on LR
Observation Descriptive

Not known no hypothesis

1hypothesis True false


1hypothesis is deemed to be accepted
from sample

Typicalsamplesige

Suicide informal
faemw borrowing

Hypothesis testing steps

1 Formulate the hypotheses H0 and H1, based on literature review, observation and research questions

2 Decide on the significance level (Type I error)

3 Decide on the appropriate distribution and find the requisite test statistic

4 Find the critical value at the given significance level

5 Compare the statistic calculated in step 3 with the critical value. If the statistic is greater than (or smaller than) the limit of the acceptance region, the alternative hypothesis is accepted; otherwise the null hypothesis is accepted

6 Draw the inference about the population from step 5

ITE7Eg uii
iC gew

9 5 o o5

9 O 09
to
rejection

Fix

I
y a quae

1 re side

1 65

Precision T
acceptance level I

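                   Z test             t test
Distribution       Normal             Approximately normal
Sample size        > 30               <= 30
$\sigma$           Known              Unknown
Critical value     from Z table       from t table, df = n - 1

One sample:
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$, $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ (s = std deviation calculated from the sample, w.r.t. the sample mean)

Two samples:
$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$, $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

$s_p$ = combined (pooled) SD: $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$, df = $n_1 + n_2 - 2$

Z test of proportion:
One sample: $Z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$
Two samples: $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p} (1 - \bar{p}) \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$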

E.g. Population: avg. distance $\mu$ = 50,000 km, $\sigma$ = 4,000 km

Sample: $\bar{x}$ = 48,000 km, standard error $\sigma / \sqrt{n}$ = 1,000 km (n = 16)

H0: $\mu$ = 50,000
H1: $\mu \neq$ 50,000 (two-tailed)
$Z_{tab} = \pm 1.96$ (two tails, 5% level of significance)

$Z_{cal} = \frac{48000 - 50000}{1000} = -2$

if $\bar{x}$ = 52,000: $Z_{cal} = 2$

Correct statement (confidence interval):
Upper limit: $\mu + 1.96 \frac{\sigma}{\sqrt{n}} = 50000 + 1.96 \times \frac{4000}{4} = 51960$
Lower limit: $\mu - 1.96 \frac{\sigma}{\sqrt{n}} = 48040$

At the 5% level of significance the CI is 48,040 to 51,960 km, with an average of 50,000 km.
soooo II
52580
CI more
9 11
gyu zu accurate
out CI
52000 E CI
hyp I j lo
ay ep gh
1st lily
o

representation
50680 sample true
rot
if c old aggro of
the

wastage t
cost t
boss9
Unhappy

26thOct WH
T 2 Sample test
test mean relation
close no difference
comparison

I test
7 test

t Ii
splgyn.tt nu

sp
f tha 2

Men Women
Eff
i
Mean
s.i.im
Lumphetize 930 930

case 2 tes

Null

I Hi
McFMz
M Mz
so ooo 4sooo twanged
2 o
M cuz
10oooh

2 E
Sooo

54
9 12
Z

it

1 96

$Z_{cal}$ = 9.12
$Z_{tab}$ = +1.96 (upper tail), -1.96 (lower tail)

$Z_{cal} > Z_{tab}$

The calculated value falls in the rejection region:
H0 is rejected
H1 is deemed to be accepted

There is statistical evidence against the null hypothesis, which at this stage supports the claim that men earn more, with this sample.

Eg men women
i in

ri 7

25 27
error 11 smpk.TT
n
Z II
64T

t tier Cri nd

spy't t's

Sp
t ooooT
YXlo.int
J2 50

JSYOXflo.gov
IT
10,000
t 50001
k
1
27

1.80

tea teals
0 05,50

falsies 07 Lill
teas I 2.40

to I
table to 2

Test: DAD change

E.g. Before DAD / After DAD
Claim: at least 10 marks higher

Marks in DAD

Sample   Before   After   Difference (d)
1        30       42      12
2        42       50       8
3        60       60       0
4        55       60       5
5        38       49      11
6        47       55       8
7        55       65      10
8        82       90       8
9        75       79       4
10       49       64      15

1 Difference
2 Mean of the differences
3 SD
4 Standard error
5 Use t: $t = \frac{\bar{d}}{s_d / \sqrt{n}}$

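31st Oct
Chi-square test ($\chi^2$)
Non-parametric test

Sample need not be normal

Sample size normally higher

Variables are categorical (qualitative)

Test statistic:
$\chi^2 = \sum \frac{(O - E)^2}{E}$, O = observed, E = expected

$E = \frac{R_T \times C_T}{N}$, $R_T$ = row total, $C_T$ = column total

dof = (R - 1)(C - 1), R = no. of rows, C = no. of columns

Compare $\chi^2_{cal}$ with $\chi^2_{table}$ at the given $\alpha$ and dof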

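E.g. Income (low / high) by gender, n = 200

              Low    High   Total
Gender 1      55     35      90
Gender 2      65     45     110
Total        120     80     200

Association

H0: income is independent of gender

H1: income is dependent on gender

Expected values:
$E_{11} = \frac{90 \times 120}{200} = 54$, $E_{12} = \frac{90 \times 80}{200} = 36$
$E_{21} = \frac{110 \times 120}{200} = 66$, $E_{22} = \frac{110 \times 80}{200} = 44$

Calculation:

Observed (O)   Expected (E)   O - E   (O - E)^2   (O - E)^2 / E
55             54              1      1           0.0185
35             36             -1      1           0.0278
65             66             -1      1           0.0152
45             44              1      1           0.0227
                                      $\chi^2_{cal}$ = 0.0842

From the table: $\chi^2_{0.05, 1}$ = 3.84 (tab)

dof = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1

Comparing $\chi^2_{cal}$ with $\chi^2_{tab}$: 0.084 < 3.84, so H0 is accepted.
No statistical evidence to reject H0 at this stage.
H0 is retained and we conclude that there is no association between gender and income.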
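The chi-square statistic for a contingency table can be computed without any library. In the sketch below the function name chi_square is hypothetical, and the first row of observed counts (55, 35) is an assumption inferred from the row and column totals, since only the second row (65, 45) is legible in the notes.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for an R x C table.

    Expected count for each cell = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Rows: the two gender groups; columns: low / high income.
observed = [[55, 35], [65, 45]]  # first row inferred from the totals (assumption)
chi2_cal, dof = chi_square(observed)

chi2_tab = 3.84  # table value at alpha = 0.05, dof = 1
reject_h0 = chi2_cal > chi2_tab
```

Here chi2_cal ≈ 0.084 with dof = 1, so reject_h0 is False and H0 (independence) is retained.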

Marks in DAD

Sample   Before   After   d     d - 8.1   (d - 8.1)^2
1        30       42      12     3.9      15.21
2        42       50       8    -0.1       0.01
3        60       60       0    -8.1      65.61
4        55       60       5    -3.1       9.61
5        38       49      11     2.9       8.41
6        47       55       8    -0.1       0.01
7        55       65      10     1.9       3.61
8        82       90       8    -0.1       0.01
9        75       79       4    -4.1      16.81
10       49       64      15     6.9      47.61
Total                     81              166.90

1 Difference: d = After - Before
2 Mean of the differences: $\bar{d} = \frac{81}{10} = 8.1$
3 SD: $s_d = \sqrt{\frac{166.9}{9}} = 4.3$
4 Standard error: $SE = \frac{s_d}{\sqrt{n}} = \frac{4.3}{\sqrt{10}} = 1.36$
5 Use t: $t = \frac{\bar{d}}{SE}$

df = n - 1 = 9, one-tail test

$t_{tab}(0.05, 9) = 1.83$

Compare tea 4 teas

we have tea Sttas


falling 1 in o N
ft is accepted

Before and after there is no


charge
claim is not valid in this case
The has failed T
training

T 2 27
J
6 f 54

no stat evidence that the add


5 males train'y
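The paired-difference steps (difference, mean, SD, standard error, t) can be sketched in Python as follows. The before / after marks are as reconstructed from the notes, and the helper name paired_t with its mu0 parameter (the hypothesised mean difference) is an assumption; the sketch only computes the statistic and does not decide which hypothesis the notes intended.

```python
from math import sqrt
from statistics import mean, stdev

before = [30, 42, 60, 55, 38, 47, 55, 82, 75, 49]
after = [42, 50, 60, 60, 49, 55, 65, 90, 79, 64]

# Step 1: differences, After - Before
diffs = [a - b for a, b in zip(after, before)]

# Steps 2-4: mean difference, sample SD (n - 1 divisor), standard error
d_bar = mean(diffs)
s_d = stdev(diffs)
se = s_d / sqrt(len(diffs))


def paired_t(mu0=0.0):
    """Step 5: t = (d_bar - mu0) / SE, with df = n - 1."""
    return (d_bar - mu0) / se


df = len(diffs) - 1
```

This gives d_bar = 8.1, s_d ≈ 4.31, se ≈ 1.36 and df = 9; paired_t() can then be compared with the table value for the chosen tail and significance level.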

6 NovWB
ANOVA
Analysis of variance: tests the variation among the means of k samples (groups)

ANOVA table (variation b/w vs. variation within):

Source of variation        Sum of squares   dof      Mean square (MS)       F ratio           P value
B/w column variation       SSB              k - 1    MSB = SSB / (k - 1)    F = MSB / MSW     from F dist. (k - 1, n - k)
Within column variation    SSW              n - k    MSW = SSW / (n - k)
Total variation            SSB + SSW        n - 1

F ratio = variance b/w samples / variance within samples

k = no. of groups, n = total no. of observations

stat b w sample mean Antonia


diff

E.g. Mileage of three brands

        Brand I   Brand II   Brand III
        10        12          8
        15        16         12
        12        11         16
        13        15         18
         9        18         14
Mean    11.8      14.4       13.6

Grand mean: $\bar{\bar{x}} = \frac{11.8 + 14.4 + 13.6}{3} = 13.3$

Stage 1: B/w column variance

n    sample mean   grand mean   difference   squared   n x squared
5    11.8          13.3         -1.5         2.25      11.25
5    14.4          13.3          1.1         1.21       6.05
5    13.6          13.3          0.3         0.09       0.45
                                             SSB =     17.75

MSB = SSB / (k - 1) = 17.75 / 2 = 8.875

Stage 2: Within column variance

Brand I                Brand II               Brand III
x     (x - 11.8)^2     x     (x - 14.4)^2     x     (x - 13.6)^2
10     3.24            12     5.76             8    31.36
15    10.24            16     2.56            12     2.56
12     0.04            11    11.56            16     5.76
13     1.44            15     0.36            18    19.36
 9     7.84            18    12.96            14     0.16
Sum   22.80                  33.20                  59.20

SSW = 22.8 + 33.2 + 59.2 = 115.2
MSW = SSW / (n - k) = 115.2 / 12 = 9.6

Stage 3: ANOVA table

Source of variation        Sum of squares   dof           Mean square   F ratio   P value
B/w column variation       17.75            k - 1 = 2     8.875         0.92      0.42
Within column variation    115.20           n - k = 12    9.60
Total variation            132.95           n - 1 = 14

Stage 4: $F_{tab}(0.05; 2, 12) = 3.89$

Stage 5: Compare the F ratio with the table value
$F_{cal} < F_{tab}$ (0.92 < 3.89): H0 is accepted
No significant variation among the mileages of cars from the various brands.

Since P value > 0.05, H0 is accepted.
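The one-way ANOVA stages can be reproduced with plain Python. The function name one_way_anova is hypothetical. Note that the code uses the unrounded grand mean, so SSB comes out as 17.73 rather than the 17.75 obtained with the rounded value 13.3.

```python
groups = {
    "Brand I": [10, 15, 12, 13, 9],
    "Brand II": [12, 16, 11, 15, 18],
    "Brand III": [8, 12, 16, 18, 14],
}


def one_way_anova(samples):
    """Return (SSB, SSW, F ratio) for a list of samples."""
    k = len(samples)                       # number of groups
    n = sum(len(s) for s in samples)       # total observations
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-column variation: group size times squared distance of group mean
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-column variation: squared distance of each value from its group mean
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return ssb, ssw, msb / msw


ssb, ssw, f_ratio = one_way_anova(list(groups.values()))
f_tab = 3.89  # table value at alpha = 0.05 with (2, 12) dof
reject_h0 = f_ratio > f_tab
```

This yields SSB ≈ 17.73, SSW = 115.2 and F ≈ 0.92, which is below 3.89, so H0 is retained.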
7thNov 2019
Covariance & correlation

Variance: $\frac{\sum (x - \bar{x})^2}{n}$ (variability within X, or within Y)
Covariance: $Cov(X, Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$

X      Y      x - xbar   y - ybar   (x - xbar)(y - ybar)
2      3      -1         -2          2
5      4       2         -1         -2
1      5      -2          0          0
3      6       0          1          0
4      7       1          2          2
Σ 15   25                            2

xbar = 15/5 = 3, ybar = 25/5 = 5

$Cov(X, Y) = \frac{2}{5} = 0.4$

Covariance is not a standardised statistical value.

X & Y are +vely related.

Standardised: correlation coefficient r

Correlation: degree of relationship
$r^2$: degree of variability

X      Y      XY     X^2    Y^2
2      3       6      4      9
5      4      20     25     16
1      5       5      1     25
3      6      18      9     36
4      7      28     16     49
Σ 15   25     77     55    135

1 Pearson's r: scale (quantitative) data

2 Spearman's $\rho$: categorical (ordinal) data

1 Pearson's r
$r = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} = \frac{n \sum XY - \sum X \sum Y}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}$

$r = \frac{5 \times 77 - 15 \times 25}{\sqrt{(5 \times 55 - 15^2)(5 \times 135 - 25^2)}} = \frac{385 - 375}{\sqrt{(275 - 225)(675 - 625)}} = \frac{10}{\sqrt{50 \times 50}} = 0.2$

X & Y are +vely correlated with r = 0.2

Degree of variability: $r^2 = 0.04$

Strength of the correlation:
|r| > 0.5: high (+ve / -ve)
|r| = 0.5: medium
|r| < 0.5: low
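2 Spearman's $\rho$

$\rho = 1 - \frac{6 \sum d^2}{n (n^2 - 1)}$, d = difference in ranks

X      Y      R(X)   R(Y)   d     d^2
2      3      4      5      -1    1
5      4      1      4      -3    9
1      5      5      3       2    4
3      6      3      2       1    1
4      7      2      1       1    1
Σ                                 16

$\rho = 1 - \frac{6 \times 16}{5 \times 24} = 1 - \frac{96}{120} = 0.2$

Same marks (ties): use the average rank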
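Both coefficients can be computed from the five (X, Y) pairs with a short Python sketch. The function names are hypothetical, and the simple ranks helper assumes there are no ties (with ties the average rank would be needed, as the notes point out).

```python
from math import sqrt

xs = [2, 5, 1, 3, 4]
ys = [3, 4, 5, 6, 7]


def pearson_r(xs, ys):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))


def ranks(values):
    """Rank 1 = largest value; assumes all values are distinct (no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference in ranks."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n**2 - 1))


r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
```

For this data both r and rho come out as 0.2.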


Reg 7
Multiple correlation
n
y

ydnm dy
MK t C
ft m o Cf
M O x

mi
DI
Ast

n
g

min

Does regression create causality in a relation (cause & effect)?

An association may not be causality.

$y = a + bx$
a = intercept (constant)
b = regression coefficient (slope)

Regression doesn't prove causality. It is useful for prediction and for missing data; it can be used for interpretation of causality, but only under given conditions.
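We have $y = a + bx$.
Normal equations:
$\sum y = na + b \sum x$
$\sum xy = a \sum x + b \sum x^2$

By substitution / elimination:
$b = b_{yx} = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$ (y dependent on x)

$b_{xy} = \frac{n \sum xy - \sum x \sum y}{n \sum y^2 - (\sum y)^2}$ (x dependent on y)

$b_{yx} \cdot b_{xy} = \frac{(n \sum xy - \sum x \sum y)^2}{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]} = r^2$

$r = \sqrt{b_{yx} \, b_{xy}}$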
Nozomi
BED

F f

x      y      xy     x^2    y^2
4      3      12     16      9
3      7      21      9     49
7      1       7     49      1
6      7      42     36     49
9      10     90     81    100
4      0       0     16      0
3      4      12      9     16
5      7      35     25     49
5      2      10     25      4
1      5       5      1     25
Σ 47   46     234    267    302

$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{10 \times 234 - 47 \times 46}{10 \times 267 - 47^2} = \frac{2340 - 2162}{2670 - 2209} = \frac{178}{461} = 0.39$

$\bar{y} = a + b \bar{x}$, so $a = \bar{y} - b \bar{x} = 4.6 - 0.39 \times 4.7 = 2.77$

Hence we have $\hat{y} = 2.77 + 0.39x$

Validation: $\hat{y} = 2.767 + 0.39 \times 4.7 = 4.6 = \bar{y}$

Next class: error of the model
next class Error


of model

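x      y      y_hat   e = y - y_hat   e^2      (y_hat - 4.6)^2
4      3      4.33    -1.33            1.77    0.07
3      7      3.94     3.06            9.36    0.44
7      1      5.50    -4.50           20.25    0.81
6      7      5.11     1.89            3.57    0.26
9      10     6.28     3.72           13.84    2.82
4      0      4.33    -4.33           18.75    0.07
3      4      3.94     0.06            0.00    0.44
5      7      4.72     2.28            5.20    0.01
5      2      4.72    -2.72            7.40    0.01
1      5      3.16     1.84            3.39    2.07
Total 47  46  46.03   -0.03           83.53    7.01

SSE = 83.53: unpredicted (residual)
SSR = 7.01: predicted

The model: $\hat{y} = 2.77 + 0.39x$

x: predictor / independent / explanatory variable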


g
predictor
I Independent explanatory
Independent
w
E
g r
o r
7 Or qq
b
s 617,46 City Residuals

EH
Errors
ee
z
I

I z s u s s 7 a 9 co x

at bae te
g variability that
se is creating in
co
2
g
linearly
a
t
y
2 e 2 77 O 39k
f
e
y g
lo
Error sum of squares:
$SSE = \sum (y - \hat{y})^2 = 83.53$ (variation of the observed values from the predicted values)

Regression sum of squares:
$SSR = \sum (\hat{y} - \bar{y})^2 = 7.01$ (variation of the predicted values from the mean)

$SST = SSR + SSE = 90.54$

Coefficient of determination:
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = \frac{7.01}{90.54} = 0.08$

$1 - R^2 = \frac{SSE}{SST} = 0.92$ (error)


y

I tbg
r

ii

2
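$r = \frac{10 \times 234 - 47 \times 46}{\sqrt{(10 \times 267 - 47^2)(10 \times 302 - 46^2)}} = \frac{178}{\sqrt{461 \times 904}}$

$r^2 = 0.08$

$r = 0.28$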

Con
coff Heft of det

b b o 39
legless coff
Standardised to395
be regression coff
q Gj i 0.15
Close to R

sample to R th t
one independent variable

This model is not a good fit.

Error: $1 - R^2$ = 92%
g

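Adjusted R squared

$SSE = \sum (y - \hat{y})^2$, $SSR = \sum (\hat{y} - \bar{y})^2$, $SST = \sum (y - \bar{y})^2 = SSE + SSR$

$\text{Adjusted } R^2 = 1 - (1 - R^2) \frac{n - 1}{n - k - 1}$, k = no. of independent variables

$= 1 - (1 - 0.08) \times \frac{9}{8} = 1 - 1.035 = -0.035$

Adjusted $R^2$ is less than $R^2$. Adding more independent (explanatory) variables to the model will not help, because the sample size is very small.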

Standard error: $S_e = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{\frac{83.53}{8}} = 3.23$

Summarize the model fit

$R^2$ = 0.08
Adjusted $R^2$ = -0.035
$S_e$ = 3.23
Multiple R = $\sqrt{R^2}$ = 0.2828 ≈ 0.28

$\hat{y} = 2.767 + 0.39x \pm 3.23$

Higher $R^2$: better model fit
Higher adjusted $R^2$: better model fit

Under / over fitting model

$R^2$ high, $S_e$ low: good model
$R^2$ very high (close to 1), $S_e$ very low: overfitting model
$R^2$ low, $S_e$ high: underfitting model

Multicollinearity effect

High collinearity: highly correlated variables should be further analysed

Normality check: normal plot, Kolmogorov-Smirnov test

ANOVA
H0: all regression coefficients are zero
H1: not all regression coefficients are zero (at least one coefficient is different from zero)

Source of variation   dof              Sum of squares   Mean square   F ratio   P value
Regression            k = 1            SSR = 7.01       7.01          0.67      0.437
Residual              n - k - 1 = 8    SSE = 83.53      10.44
Total variation       n - 1 = 9        SST = 90.54

P value = 0.437 > 0.05: overall, the model is not a good fit; H0 is accepted.

$F_{cal}$ = 0.67, $F_{critical}$ = 5.318
$F_{cal} < F_{critical}$: H0 is accepted
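The whole regression walk-through (slope, intercept, SSE, SSR, R squared, adjusted R squared, standard error and F ratio) can be recomputed with plain Python. The function names fit_line and fit_summary are hypothetical. Because the code keeps the unrounded slope b = 178/461 instead of 0.39, SSR comes out as about 6.87 rather than 7.01 and F as about 0.66 rather than 0.67; the conclusions are unchanged.

```python
from math import sqrt

xs = [4, 3, 7, 6, 9, 4, 3, 5, 5, 1]
ys = [3, 7, 1, 7, 10, 0, 4, 7, 2, 5]


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx**2)
    a = sy / n - b * sx / n
    return a, b


def fit_summary(xs, ys, k=1):
    """Goodness-of-fit measures for a simple regression (k = 1 predictor)."""
    a, b = fit_line(xs, ys)
    n = len(xs)
    y_bar = sum(ys) / n
    preds = [a + b * x for x in xs]
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))   # residual variation
    ssr = sum((p - y_bar) ** 2 for p in preds)           # explained variation
    sst = sse + ssr
    r2 = ssr / sst
    return {
        "a": a,
        "b": b,
        "sse": sse,
        "ssr": ssr,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "std_error": sqrt(sse / (n - k - 1)),
        "f_ratio": (ssr / k) / (sse / (n - k - 1)),
    }


summary = fit_summary(xs, ys)
```

summary gives b ≈ 0.386, a ≈ 2.785, SSE ≈ 83.53, R squared ≈ 0.076, a slightly negative adjusted R squared, a standard error of about 3.23 and an F ratio well below the critical value 5.318.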
