
INTRODUCTION

Applied Statistics and Computing Lab, Indian School of Business


LEARNING GOALS
• What is the importance of statistics?
• When is statistics needed?
• Where can statistics be used?


A SMALL STORY FROM THE PAST
• During World War II, the Royal Air Force (RAF) wanted to fit their aircraft with armor
• ‘Where’ to fit this armor?
• Imagine:
– all the aircraft were shot in the exact same places
– each German bomber attacks aircraft in the exact same manner

• In reality, neither is true: there is variation.
• To answer such questions, we need ‘STATISTICS’.

Statistics is the grammar of science – KARL PEARSON

THE STORY CONTD…
• First gather/collect some information
• Relevant information
• ‘DATA’: relevant information, collected with the aim of answering certain questions


THE STORY CONTD…
• Acknowledge variability, collect data, answer the question.
• The aim is to find an answer that is valid for the ‘population’ [the set of all objects on which we want to make inferences]
• The RAF concluded that armor had to be fitted in all the places where the aircraft had bullet holes.

THE STORY CONTD…
• Abraham Wald, a famous statistician, didn’t agree with this.
• Fit the armor in places with no damage!
• The RAF considered only one part of the population.
• We call a part/subset of the population a ‘sample’.
• This sample (aircraft that returned after combat) is not representative of the population

THE STORY CONTD…
• What about the aircraft that did not survive?
• The planes that survived showed only that the damage they had sustained was not fatal.
• As we will soon see, for accurate conclusions to be drawn, a sample needs to be representative of the population that it is taken from.
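Wald's argument can be illustrated with a small simulation. The sketch below is only an illustration under made-up assumptions: the aircraft sections, the per-hit probabilities in `lethality`, and the function name `simulate_returning_hits` are all hypothetical and are not taken from any wartime data.

```python
import random


def simulate_returning_hits(n_planes=10_000, seed=0):
    """Tally hits per section for all planes and for survivors only."""
    rng = random.Random(seed)
    # Hypothetical chance that a single hit on a section downs the plane.
    lethality = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}
    sections = list(lethality)
    all_hits = dict.fromkeys(sections, 0)
    survivor_hits = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        # Each plane takes 1 to 4 hits, equally likely on any section.
        hits = [rng.choice(sections) for _ in range(rng.randint(1, 4))]
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() > lethality[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                survivor_hits[s] += 1
    return all_hits, survivor_hits


if __name__ == "__main__":
    all_hits, survivor_hits = simulate_returning_hits()
    for section in all_hits:
        print(section, all_hits[section], survivor_hits[section])
```

In the full population the hits are spread roughly evenly across sections, but among the returning planes the engine and cockpit show far fewer holes. Someone looking only at the survivors would therefore armor the wrong places.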

EXAMPLE 1
• The Used Car industry of the US
• How are these cars priced?
• How do you measure the rate at which a used car is to be sold?
• How can you determine its value based on several characteristics?


EXAMPLE 1 CONTD…
• Do leather seats attract customers more than the size of the engine?
• If a Buick has a 4-cylinder engine and leather seats but has travelled more miles than a Cadillac with a 6-cylinder engine, are the prices the same?
• What is the expected retail price of a one-year-old Chevrolet sedan with a 4-cylinder engine that has already run 12,987 miles?

EXAMPLE 2
• Cardiovascular diseases (CVD) are unfortunately becoming more common
• If a person were to ask a doctor to evaluate their CVD risk, how would the doctor go about it?
• We often hear that being overweight increases the risk of CVD
• Not entirely accurate
• It is actually the ‘body fat’, or ‘adipose tissue’ (AT), that matters, along with the degree of obesity

EXAMPLE 2 CONTD…
• Studies have shown that individuals with excess body fat in the abdominal area have a higher risk
• ‘Computed Tomography’ (CT scan) is the only technique that allows for the precise and reliable measurement of the AT (at any site in the body).
• Drawbacks of CT:
– Many physicians do not have access to this method to evaluate their patients
– Irradiation of the patient (suppresses the immune system)
– Expensive


EXAMPLE 2 CONTD…
• Is there a simpler yet reasonably accurate way to predict AT area? That is:
– Easily available
– Inexpensive
– Risk free

• A group of researchers (Jean-Pierre Després, Denis Prud’homme, Marie-Christine Pouliot, Angelo Tremblay, and Claude Bouchard) conducted a study with the aim of predicting the area of abdominal AT using simple anthropometric measurements, i.e. measurements on the human body
• Various measurements were considered:
– Weight, subcutaneous skin-fold thickness, hip circumference, waist circumference, etc.


EXAMPLE 2 CONTD… STATISTICAL INVESTIGATION
• Now the question is, can any of these help predict the AT area? For example:
– ‘Can the waist circumference of an individual predict the amount of AT he/she has?’

• This is where statistical investigation begins: with a question.
• Observing the waist circumference (WC) of all people is not feasible; hence only a few people are considered, and inferences are drawn from the observations made on them
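One simple way to turn such a question into a calculation is to fit a straight line through sample observations and use it to predict. The sketch below uses ordinary least squares on synthetic numbers generated in the code itself; it is not the study's data or the study's method, and the coefficients used to generate the sample (`-200`, `3.4`, noise of `30`) are arbitrary assumptions chosen only for illustration.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


rng = random.Random(1)
# Synthetic sample (NOT real measurements): waist circumference in cm
# and abdominal AT area in cm^2 for 100 hypothetical individuals.
waist = [rng.uniform(65, 120) for _ in range(100)]
at_area = [-200 + 3.4 * w + rng.gauss(0, 30) for w in waist]

intercept, slope = fit_line(waist, at_area)
# Predicted AT area for a hypothetical person with a 90 cm waist.
predicted_at_90 = intercept + slope * 90
print(f"AT = {intercept:.1f} + {slope:.2f} * WC; prediction at 90 cm: {predicted_at_90:.1f}")
```

The fitted line comes from a sample of 100 people, yet it is used to say something about people who were never measured. That step from sample to population is what the rest of a statistical investigation has to justify.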

EXAMPLE 3
• If a student knew his/her internal assessment marks and previous year’s CGPA, could they get an idea of how they might perform in the finals?
• Suppose that a student had the following information on 50 students from the previous batch:
– Marks in the final examination
– Marks in 3 internal assessment tests held during the academic year
– CGPA obtained in the previous year

EXAMPLE 3 CONTD…
• Simply using this data and a few statistical tools, we can answer various interesting questions:
– Can a student predict what range his/her final examination score will lie in?
– If only the best two internal marks are considered, do the 2nd internal marks have a more important effect on the final score?
– Does performing well in two internals and badly in the other affect the finals?
– Is it correct to say that someone who did well (or not so well) in internals will do well (or not so well) in the finals?
– Does the previous year’s CGPA, which doesn’t depend on the present course, affect the final scores?
  • If yes, it could mean that the previous year’s CGPA captures the innate ability of the student, which we otherwise cannot measure!


EXAMPLE 4
• Marketing research
• The scenario:
– There are 4 stores: ‘OfficeStar’, ‘Paper & Co.’, ‘Office Equipment’, ‘Supermarket’
– Some customers have visited and made purchases from each of these stores
– The stores collect feedback from each of the customers. Each customer rates each store on a scale from 1 to 5 (1 being the lowest and 5 the highest) on the following attributes:
  • Large choice (wide variety)
  • Low prices
  • Service quality
  • Product quality
  • Convenience
  • Preference score (overall satisfaction score)

EXAMPLE 4 CONTD…
These stores are interested in answering the following questions:
• What part of the variation in the ratings between stores is because of the customers and not the stores themselves?
• Does a particular class of customers (age wise, gender wise, locality wise etc.) prefer a particular store?
• Does a particular store serve a particular class of people more efficiently?
Statistics can help provide answers to the above questions with a reasonable level of accuracy.
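The first question above, how much of the variation in ratings is due to customers rather than stores, can be sketched as a sum-of-squares split over a customers × stores table. This is one simple way to frame it, not necessarily the method the stores or this tutorial would use, and the ratings in the example table are invented purely for illustration.

```python
def variation_shares(ratings):
    """Split total variation in a customers x stores table into shares.

    ratings[i][j] is the score customer i gave store j. Returns the
    fraction of the total sum of squares attributable to differences
    between customers, between stores, and what is left over.
    """
    n_cust = len(ratings)
    n_store = len(ratings[0])
    grand = sum(map(sum, ratings)) / (n_cust * n_store)
    cust_means = [sum(row) / n_store for row in ratings]
    store_means = [sum(row[j] for row in ratings) / n_cust for j in range(n_store)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    if ss_total == 0:
        raise ValueError("all ratings are identical; there is no variation to split")
    ss_cust = n_store * sum((m - grand) ** 2 for m in cust_means)
    ss_store = n_cust * sum((m - grand) ** 2 for m in store_means)
    ss_resid = ss_total - ss_cust - ss_store
    return {
        "customers": ss_cust / ss_total,
        "stores": ss_store / ss_total,
        "residual": ss_resid / ss_total,
    }


# Hypothetical preference scores: 5 customers (rows) x 4 stores (columns).
example_ratings = [
    [4, 3, 2, 5],
    [5, 4, 3, 5],
    [2, 2, 1, 3],
    [3, 3, 2, 4],
    [4, 2, 2, 4],
]
shares = variation_shares(example_ratings)
for source, share in shares.items():
    print(f"{source}: {share:.0%}")
```

A large "customers" share would suggest that some customers simply rate everything high or low, so raw differences between stores should be read with that in mind.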


CONCLUDING REMARKS
• The difficulty in providing straightforward answers to all the above questions arises from the fact that there is variability.
– Different cars, different users and hence different status of the car after a year
– Different people, different ages, different weights etc.

• The idea behind introducing the above few examples is to emphasize the need for statistical investigation when a question needs to be answered, or a hypothesis tested for accuracy, in the presence of variation.

CONCLUDING REMARKS CONTD…
The rest of this tutorial is intended to help the user understand the concepts behind several statistical techniques and apply them effectively.


Thank you
