Analysis of Data
Exploring, Displaying, and Examining Data
Learning Objectives
Understand that exploratory data analysis techniques provide insights and data diagnostics by emphasizing visual representations of the data.
Understand how cross-tabulation is used to examine relationships involving categorical variables, serves as a framework for later statistical testing, and makes an efficient tool for data visualization and later decision-making.
This Booth Research Services ad suggests that the researcher's role is to make sense of data displays. Great data exploration and analysis delivers insight from data.
Figure: Data analysis, divided into exploratory analysis (e.g. pie chart, bar chart) and confirmatory analysis
Data analysis
Descriptive Statistics
Descriptive statistics fall into one of two categories: (a) measures of central tendency (mean, median and mode) and (b) measures of dispersion (standard deviation and variance). Their purpose is to explore hunches that may have come up during the course of the research process, but most people compute them to look at the normality of their numbers.
Relational Statistics
Relational statistics fall into one of three categories: univariate, bivariate and multivariate analysis.
Univariate: univariate analysis is the study of one variable for a subpopulation.
Bivariate: bivariate analysis is the study of the relationship between two variables.
Multivariate: multivariate analysis is the study of the relationship between three or more variables.
Inferential Statistics (Inductive Statistics)
Inferential statistics fall into one of two categories: (a) tests for difference of means and (b) tests for statistical significance. The tests for statistical significance are further subdivided into parametric and nonparametric, depending upon whether the researcher is inferring to the larger population as a whole (parametric) or only to the people in one sample (nonparametric). The purpose of difference-of-means tests is to test the hypothesis, and the most common technique is called the z-test. The most common parametric tests of significance are the F-test, t-test, ANOVA and regression. The most common nonparametric tests of significance are chi-square, the Mann-Whitney U-test and the Kruskal-Wallis test.
Types of Central Tendency or Average
Averages are of two types: (1) mathematical averages, such as the mean, and (2) positional averages, such as the median and the mode.
The most commonly used measure of central tendency is the mean. To compute the mean, add up all the numbers and divide the total by how many numbers there are. It is not a halfway point (that is the median) but a kind of center that balances high numbers against low numbers. The mean can be categorized as: (a) arithmetic, (b) geometric and (c) harmonic.
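Arithmetic mean:
In the case of an individual series, that is, when only the values x are given:
Arithmetic mean: $\bar{x} = \frac{\sum x}{N}$, where N is the number of items.
In the case of a discrete series, i.e. when the values x and their frequencies f are given:
Arithmetic mean: $\bar{x} = \frac{\sum fx}{\sum f}$
In the case of a continuous series, that is, when class intervals are given:
Arithmetic mean: $\bar{x} = \frac{\sum fm}{\sum f}$, where m is the mid-point of each class interval.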
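The three arithmetic-mean formulas can be sketched in Python as follows. This is a minimal illustration: the function names and the sample figures are hypothetical, and a continuous series is represented as a list of (lower, upper) class limits whose mid-points stand in for the values.

```python
# Arithmetic mean for individual, discrete and continuous series.
# Function names and sample data are hypothetical illustrations.

def mean_individual(xs):
    """Individual series: mean = sum(x) / N."""
    return sum(xs) / len(xs)

def mean_discrete(xs, fs):
    """Discrete series: mean = sum(f * x) / sum(f), f = frequency of each x."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def mean_continuous(classes, fs):
    """Continuous series: mean = sum(f * m) / sum(f), m = class mid-point."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return mean_discrete(mids, fs)

print(mean_individual([4, 8, 6, 2]))                                # 5.0
print(mean_discrete([1, 2, 3], [2, 3, 5]))                          # 2.3
print(mean_continuous([(0, 10), (10, 20), (20, 30)], [5, 10, 5]))  # 15.0
```

Treating the mid-point as the value of every item in a class is an assumption of the grouped formula, which is why the mean of a continuous series is an approximation.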
Merits of arithmetic mean
It is simple and easy to understand.
It considers each and every item of the series.
Everyone who applies the formula obtains the same answer.
It is rigidly defined.
It is capable of further algebraic treatment.
It does not depend on the position of items in the series.
It is least affected by fluctuations of sampling.
Demerits of arithmetic mean
It is unduly affected by extreme values.
It may introduce error in the case of open-end classes.
It cannot be located graphically.
It cannot be determined by inspection, as the mode can.
It is not suitable for qualitative data.
In a highly skewed distribution, the mean is not a good measure of central tendency.
Geometric mean:
It is defined as the nth root of the product of n items: $GM = \sqrt[n]{x_1 x_2 \cdots x_n}$.
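As a minimal sketch of this definition, the function below computes the nth root of the product through logarithms. This gives the same value but avoids overflow when many items are multiplied. The function name and the example values are hypothetical. Python 3.8 and later also ship `statistics.geometric_mean`.

```python
import math

def geometric_mean(xs):
    """nth root of the product of n positive items, computed as exp(mean(log x))."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

print(round(geometric_mean([2, 8]), 6))     # 4.0  (square root of 16)
print(round(geometric_mean([1, 3, 9]), 6))  # 3.0  (cube root of 27)
```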
Harmonic mean:
It is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations: $HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$.
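A short sketch of the harmonic mean follows. It assumes positive observations. The speed example is hypothetical, and it shows the typical use of the harmonic mean: averaging rates over equal distances. Python's standard library also provides `statistics.harmonic_mean`.

```python
def harmonic_mean(xs):
    """n divided by the sum of reciprocals, i.e. the reciprocal of the
    arithmetic mean of the reciprocals. Assumes positive values."""
    if any(x <= 0 for x in xs):
        raise ValueError("values must be positive")
    return len(xs) / sum(1 / x for x in xs)

# Hypothetical trip: equal distances driven at 40 km/h and at 60 km/h
print(round(harmonic_mean([40, 60]), 6))  # 48.0, not the arithmetic 50
```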
Median:
Median refers to the middle value in a distribution. It is a positional average. It is the exact midpoint of a ranked distribution of numbers: not an average of the values but the halfway point. When the number of items is even, the median is the mean of the two middle values.
Merits of median
Useful in the case of open-end classes, since it is a positional average.
Easy to calculate in the case of unequal class intervals.
Not strongly affected by extreme observations.
Can be determined graphically.
Demerits of median
Data must be arranged in order before the median can be found.
Does not take into account each and every item of the series.
Not capable of further algebraic treatment.
Affected by fluctuations in sampling.
Erratic if the number of items is small.
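The sketch below finds the median by ranking the data first, as the demerits above note is necessary. It then contrasts the median with the arithmetic mean when one extreme value enters the series. The income figures are hypothetical.

```python
def median(xs):
    """Middle value of the ranked data; mean of the two middle values if N is even."""
    if not xs:
        raise ValueError("median of an empty series is undefined")
    ranked = sorted(xs)            # the data must be arranged first
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

incomes = [20, 22, 25, 27, 30]          # hypothetical figures
with_outlier = incomes[:-1] + [300]     # replace the largest value with an extreme one

print(median(incomes), sum(incomes) / len(incomes))                 # 25 24.8
print(median(with_outlier), sum(with_outlier) / len(with_outlier))  # 25 78.8
```

The median stays at 25 while the mean jumps from 24.8 to 78.8. This illustrates why the median is not strongly affected by extreme observations, whereas the mean is.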
Mode:
Mode is that value of the variate which occurs with maximum frequency. Mode can also be defined as that value about which the items are most closely concentrated. It is the value which has the greatest frequency density in its immediate neighbourhood. Mode can be applied to nominal, ordinal or interval data. It is always the value, either qualitative or quantitative, that occurs most often.
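For ungrouped data, the mode can be sketched with a frequency count. The function below returns every value that shares the maximum frequency, so a bimodal series yields two modes. Because it only counts occurrences, it also works for qualitative labels. The function name and data are hypothetical. Python 3.8 and later offer `statistics.multimode` with the same behaviour. Grouped (class-interval) data would need an interpolation formula, which is not shown here.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the maximum frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([3, 5, 5, 7, 5, 9]))                # [5]
print(modes(["tea", "coffee", "tea", "juice"]))  # ['tea']
print(modes([1, 1, 2, 2, 3]))                    # [1, 2]  (bimodal)
```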
Standard Deviation
Standard deviation is defined as the square root of the mean of the squared deviations from the arithmetic mean: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$. It is denoted by the Greek letter σ (sigma). It is an absolute measure of dispersion; a small standard deviation means a high degree of stability or uniformity of the observations as well as homogeneity of the series.
Variance
The square of the standard deviation is called variance.
Coefficient of variation:
It is a relative measure of dispersion corresponding to the standard deviation: $CV = \frac{\sigma}{\bar{x}} \times 100$. It was given by Karl Pearson. It is used in problems where a researcher wants to compare the variability of two or more series. The series for which the coefficient of variation is greater is said to be more variable, that is, less uniform and less consistent.
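The three dispersion measures can be sketched together in Python. The code follows the definitions above: variance is the mean of the squared deviations from the arithmetic mean (divisor N), the standard deviation is its square root, and the coefficient of variation is σ / x̄ × 100. The series is hypothetical. Note that Python's `statistics.pstdev` uses the same divisor N, whereas `statistics.stdev` uses N - 1 for sample estimates.

```python
import math

def variance(xs):
    """Mean of the squared deviations from the arithmetic mean (divisor N)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def standard_deviation(xs):
    """Square root of the variance."""
    return math.sqrt(variance(xs))

def coefficient_of_variation(xs):
    """Karl Pearson's CV = sigma / mean * 100, expressed as a percentage."""
    mean = sum(xs) / len(xs)
    return standard_deviation(xs) * 100 / mean

a = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical series, mean = 5
print(variance(a))                       # 4.0
print(standard_deviation(a))             # 2.0
print(coefficient_of_variation(a))       # 40.0
```

When two series are compared, the one with the larger CV is the more variable, as described above, even if the two series have different means or units.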
Skewness
Skewness refers to departure from symmetry. A distribution in which the mean, median and mode do not coincide is said to be asymmetrical or skewed; if the mean, median and mode are identical, the distribution is symmetrical.
Types of skewness
(i) Positively skewed distribution
(ii) Negatively skewed distribution
Positively skewed distribution
A distribution in which the mean is largest, the mode is smallest and the median lies between the mode and the mean is called a positively skewed distribution.
Measures of skewness
(i) Absolute measure of skewness
(ii) Relative measure of skewness
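As an illustration, the sketch below uses one common pair of measures, which is an assumption since the text above does not name a specific formula. The absolute measure is mean minus mode. The relative measure is Karl Pearson's coefficient of skewness, (mean - mode) / σ. The mode is taken as the single most frequent value, so the sketch assumes the series has one clear mode. The data are hypothetical.

```python
import math
from collections import Counter
from statistics import median

def skewness_measures(xs):
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    mode = Counter(xs).most_common(1)[0][0]   # assumes one clear mode
    absolute = mean - mode                    # absolute measure: mean - mode
    relative = absolute / sd                  # Karl Pearson's coefficient of skewness
    return mean, median(xs), mode, absolute, relative

data = [1, 2, 2, 2, 3, 3, 4, 5, 9]  # hypothetical series with a long right tail
mean, med, mode, absolute, relative = skewness_measures(data)
print(round(mean, 2), med, mode)    # 3.44 3 2
print(relative > 0)                 # True
```

Here mean > median > mode, matching the description of a positively skewed distribution above, and both measures come out positive.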
Correlation
Correlation analysis refers to the techniques used for measuring the degree of relationship between variables. Correlation analysis deals with the association between two or more variables.
Significance
Correlation enables us to determine the degree of relationship existing between the variables.
It aids in locating the important variables on which others depend.
It helps in the fields of science and philosophy.
It is used by economists and business executives.
It is helpful in making predictions for the future which are likely to be more valuable and nearer to reality.
Types of correlation
Positive or negative correlation
A correlation is said to be positive if both variables vary in the same direction: if one variable increases, the other, on average, also increases, and if one variable decreases, the other, on average, also decreases. A correlation is said to be negative if the variables vary in opposite directions: as one increases, the other, on average, decreases.
Simple, partial and multiple correlation
(a) Simple correlation. Simple correlation involves the study of only two variables.
(b) Partial correlation. In partial correlation, three or more variables are under study, but only two variables are considered to be influencing each other, the effect of the others being held constant.
(c) Multiple correlation. Multiple correlation involves the study of three or more variables simultaneously.
Linear and non-linear correlation
This distinction is based upon the constancy of the ratio of change between the variables.
(a) Linear correlation. If the amount of change in one variable bears a constant ratio to the amount of change in the other variable, the correlation is linear.
(b) Non-linear or curvilinear correlation. Correlation is said to be non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
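One widely used technique for measuring the degree of simple linear correlation is Karl Pearson's coefficient, r. The text above does not prescribe a method, so this choice is an assumption. The sketch computes r from deviations about the two means. The data are hypothetical. Python 3.10 and later also provide `statistics.correlation`.

```python
import math

def pearson_r(xs, ys):
    """r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)   # undefined if either series is constant

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```

Because r measures only linear association, a strong curvilinear relationship can still produce an r close to zero.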
Z-Test
The Z-test is based on the normal probability distribution and is used for judging the significance of several statistical measures, particularly the mean. The relevant test statistic, Z, is worked out and compared with its probable value (read from a table showing the area under the normal curve) at a specified level of significance, to judge the significance of the measure concerned. The test statistic is the value obtained from the sample data that corresponds to the parameter under investigation. This test is used even when the binomial distribution or the t-distribution is applicable, on the presumption that such a distribution tends to approximate the normal distribution as n becomes larger. The Z-test is generally used for comparing the mean of a sample with some hypothesised mean for the population when the sample is large or the population variance is known. The Z-test is also used for judging the significance of the difference between the means of two independent samples when the samples are large or the population variance is known.
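The procedure above can be sketched with the standard formulas. The one-sample statistic is z = (x̄ - μ0) / (σ / √n). The statistic for two independent samples is z = (x̄1 - x̄2) / √(σ1²/n1 + σ2²/n2). In this sketch, `statistics.NormalDist` replaces the printed normal table when finding the critical value for a two-tailed test. The function names, the 5% default and the sample figures are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_decision(z, alpha=0.05):
    """Two-tailed decision: compare |z| with the critical value for alpha."""
    critical = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    return critical, p_value, abs(z) > critical        # True means reject H0

def one_sample_z(sample_mean, mu0, sigma, n):
    """Sample mean against a hypothesised population mean, sigma known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Difference between the means of two independent samples, sigmas known."""
    return (mean1 - mean2) / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# Hypothetical figures: n = 64, sample mean 52, hypothesised mean 50, sigma 8
z = one_sample_z(52, 50, 8, 64)
critical, p, reject = z_decision(z)
print(z, round(critical, 2), reject)  # 2.0 1.96 True
```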
t-Test or t-Distribution
The t-test is based on the t-distribution and is considered an appropriate test for judging the significance of a sample mean, or the significance of the difference between the means of two samples, when the samples are small and the population variance is not known. When two samples are related, we use the paired t-test (also known as the difference test) to judge the significance of the mean of the differences between the two related samples.
t is calculated from the sample data and then compared with its probable value based on the t-distribution, at a specified level of significance and for the relevant degrees of freedom, to accept or reject the null hypothesis.
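A minimal sketch of the one-sample and paired t statistics follows. It uses the sample standard deviation s, which has divisor n - 1, and returns n - 1 degrees of freedom. Python's standard library has no t-distribution, so the critical value must still come from a t table. The function names and the before/after scores are hypothetical.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)); returns (t, degrees of freedom)."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1

def paired_t(before, after):
    """Paired (difference) test: a one-sample t-test on the differences,
    against a hypothesised mean difference of zero."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)

# Hypothetical scores of four subjects before and after a treatment
t, df = paired_t([10, 12, 14, 16], [12, 13, 17, 18])
print(round(t, 2), df)  # 4.9 3
```

For 3 degrees of freedom, a t table gives a two-tailed 5% critical value of about 3.182. Since 4.9 exceeds it, the null hypothesis of no mean difference would be rejected in this hypothetical case.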
Introduction
Hypothesis testing enables researchers to make probability statements about population parameter(s). A hypothesis may never be proved absolutely, but in practice it is accepted if it has withstood critical testing. A research hypothesis is a predictive statement, capable of being tested by scientific methods, that relates an independent variable to a dependent variable. In layman's terms, a hypothesis means a mere assumption or supposition to be proved or disproved. In terms of research, however, a hypothesis is a formal question that the researcher intends to resolve. Hypothesis testing is the process of putting the hypothesis to a test or set of tests to determine its validity.
Alternative hypothesis (Ha)
An alternative hypothesis (Ha) is the logical opposite of the null hypothesis; that is, the alternative hypothesis must be true when the null hypothesis is found to be false. In other words, the alternative hypothesis states that the specific population parameter value is not equal to the value stated in the null hypothesis (or, in a one-tailed test, that it is greater or less than that value).
The word report is derived from the Latin word "Reportare", which means to carry back (Re = back + Portare = to carry). A report is a description of an event carried back to someone who was not present on the scene. It is a formal communication written for a specific purpose, and includes a description of the procedures followed for collection and analysis of data, their significance, the conclusions drawn from them and recommendations, if required. A report can be described as a statement prepared to present facts relating to planning, coordination, performance and the general state of business in an organization. A report may be defined as a document in which a given problem is examined for the purpose of conveying information, reporting findings, putting forward ideas and making recommendations.
Preliminary pages
Title and date, acknowledgement, preface or foreword, table of contents, list of tables and illustrations
Main text
Introduction, statement of findings, recommendations, results, implications drawn from the research results, summary
End matter
Appendices, bibliography
The length of the report should be long enough to cover the subject but short enough to maintain interest.
It should not be dull.
Abstract terminology and technical jargon should be avoided.
It should provide quick knowledge of the main findings.
The layout should be appropriate to the objectives.
It should be free from grammatical mistakes.
It must present a logical analysis of the subject matter.
It should show originality.
It must also state the policy implications relating to the problem under consideration.
It should have relevant appendices, bibliography and index.
It must be attractive.
Mechanics of Writing a Report
Size and physical design
Procedure
Layout
Treatment of quotations
Footnotes
Documentation style
Punctuation and abbreviations in footnotes
Use of statistics, charts and graphs
The final draft
Bibliography/References
Index
Oral Presentations
Getting Started
Identify the expectations of the assignment.
How long should the presentation be? What is the intent of this presentation? To inform? Persuade? Critique? Educate? Inspire?
Figure: Parts of the rhetorical situation (content, purpose, delivery, audience, presenter)
The diagram above illustrates the parts of a rhetorical situation (like the one created by your oral presentation). Identifying the purpose and the audience of your presentation will allow you to choose the appropriate content and delivery style. For example, if the purpose is to persuade and the audience is biased, the content should include especially compelling evidence. You should also deliver this content in a way that assures your skeptical audience that you are a trustworthy expert.