
STATISTICAL TOOLS FOR DATA ANALYSIS

The tools used for data analysis differ with the type of data. Different statistical tools suit a) qualitative and quantitative data and b) time series, cross-section and panel data.

The choice of procedure and analytical tool depends on the objectives of the study. Tools for data analysis are classified into three broad categories:
Univariate Tools
Bivariate Tools
Multivariate Tools
This classification is based on the number of variables a tool uses.

Classification of Statistical Tools (Contd)

Univariate Tools for Data Analysis
Frequency Tables & Distributions
Histogram
Ogive or Cumulative Frequency Curve
Pie Charts
Measures of Central Tendency (MCTs)
Measures of Dispersion
Kurtosis, etc.

Some of these tools are visual aids.

Classification of statistical Tools (Contd)

Bivariate Tools for Data Analysis
Cross Tables
Scatter Plots
Correlation
Bivariate Regression
Trend Lines
Binary Choice (Logistic Regression, Linear Probability Models using two variables), etc.

Classification of Statistical Tools (Contd)

A few Multivariate Tools are as follows:
Multiple Regression
Factor Analysis
Cluster Analysis
Discriminant Analysis
Multivariate Analysis of Variance (MANOVA)
Conjoint Analysis
Canonical Correlation
Multi-Dimensional Scaling (MDS)
Structural Equation Modeling (SEM)
Logistic Regression using more than 2 variables, etc.

Master Table for survey Research


When data are collected through a questionnaire in social research, a master table is prepared to summarize the data for further analysis. A master table can be prepared either manually or with a software package such as Excel. Code numbers can be used for the responses. After the data are summarized, different statistical tools can be used to analyse them.

1. Univariate Tools
The primary objectives of univariate tools are a) to introduce the sample to the reader and b) to examine the nature of a variable in terms of its distribution. Some of these tools are used as follows:
Frequency Tables. A frequency table summarizes the raw data in a concise, systematic and meaningful way. Cumulative frequency tables, histograms, pie charts and distributions can be prepared from it. Broad conclusions about the nature of the distribution of the data can be drawn from these tools.
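As an illustration of how a frequency table and its cumulative frequencies can be built from coded responses, here is a minimal Python sketch. The response codes are hypothetical and stand in for one column of a master table.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical coded responses from one column of a master table
responses = [1, 2, 2, 3, 1, 2, 3, 3, 3, 4]

freq = Counter(responses)                 # code -> number of occurrences
categories = sorted(freq)                 # codes in ascending order
counts = [freq[c] for c in categories]    # frequency of each code
cumulative = list(accumulate(counts))     # running total (basis of an ogive)
n = len(responses)

print("Code  Freq  Percent  Cum.Freq")
for c, f, cf in zip(categories, counts, cumulative):
    print(f"{c:>4}  {f:>4}  {100 * f / n:>7.1f}  {cf:>8}")
```

The `counts` column is what a histogram or pie chart would display, and the `cumulative` column is what an ogive would plot.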

Univariate Tools (Contd)


Summary Statistics (Descriptive Statistics)
Often the researcher wants to represent a set of data on a variable by a single number. For example, a researcher has a set of observations on the income of a group of persons and wants to summarize the variable for the group in terms of the average and the deviation from the average. Such statistical tools are known as descriptive statistics, since the numbers describe the distribution of the variable.

Univariate Tools (Contd)

Some of the summary statistics are: Measures of Central Tendency, Measures of Dispersion and Measures of Peakedness.
a) Measures of Central Tendency (MCT):
Arithmetic Mean: $AM = \sum X_i / N$ (simple average).
Weighted Arithmetic Mean: $WAM = \sum W_i X_i / \sum W_i$ (takes into account the importance of each value to the overall total).
Geometric Mean: used when we need the average rate of growth of a series of numbers. $GM = (X_1 X_2 \cdots X_N)^{1/N}$, i.e. the Nth root of the product of the N values of X.

Univariate Tools (Contd)


Harmonic Mean: It is used where a series contains extreme values (usually high values). For example, consider the series 12, 13, 16, 18, 11, 16, 19, 20, 18, 17, 14, 89, 99. The arithmetic mean may not represent this series well. In such cases a harmonic mean gives less weight to the higher values. $HM = N / \sum (1/X_i)$, i.e. the reciprocal of the arithmetic mean of the reciprocals 1/12, 1/13, ... of the N values of X.
Median: The middle item when the values are arranged in order.
Mode: The value that occurs most often.
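The measures of central tendency can be compared on the series quoted above. The sketch below uses Python's standard `statistics` module (Python 3.8 or later is assumed); the values and weights used for the weighted mean are hypothetical.

```python
import statistics

# Series quoted in the text; 89 and 99 are the extreme values
series = [12, 13, 16, 18, 11, 16, 19, 20, 18, 17, 14, 89, 99]

am = statistics.mean(series)             # arithmetic mean
gm = statistics.geometric_mean(series)   # Nth root of the product
hm = statistics.harmonic_mean(series)    # reciprocal of the mean of reciprocals
median = statistics.median(series)       # middle item of the ordered series
modes = statistics.multimode(series)     # every value tied for most frequent

# Weighted arithmetic mean with hypothetical values and weights
values = [10, 20, 30]
weights = [1, 2, 3]
wam = sum(w * x for w, x in zip(weights, values)) / sum(weights)

print(f"AM={am:.2f}  GM={gm:.2f}  HM={hm:.2f}  median={median}  modes={modes}")
print(f"Weighted AM={wam:.2f}")
```

For this series the harmonic mean lies below the geometric mean, which lies below the arithmetic mean, showing how the two extreme values pull the arithmetic mean upward.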

Univariate Tools (Contd)


Measures of Dispersion: Range, Mean Deviation, Variance and Standard Deviation.
They have several implications and uses in analysing a set of observations. For example, in a normal distribution the mean ± one standard deviation covers about 68% of the observations.
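A small sketch of the four dispersion measures follows, on hypothetical data and using population (divide-by-N) formulas as an assumption. The last lines check the share of a normal distribution lying within one standard deviation of the mean.

```python
import statistics

# Hypothetical observations
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
data_range = max(data) - min(data)                              # Range
mean_deviation = sum(abs(x - mean) for x in data) / len(data)   # Mean Deviation
variance = statistics.pvariance(data)                           # population Variance
std_dev = statistics.pstdev(data)                               # population Standard Deviation

# Share of a standard normal distribution within mean +/- 1 standard deviation
z = statistics.NormalDist()
coverage = z.cdf(1) - z.cdf(-1)

print(data_range, mean_deviation, variance, std_dev)
print(f"Coverage within one SD: {coverage:.4f}")
```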

Statistical Tests
Z and 't' tests are used to examine the significance of the difference between sample and population means. Similarly, χ² and 'F' tests are used to examine differences in variances: the χ² test compares a sample variance with a population variance, and the 'F' test compares two sample variances.
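As a sketch of a one-sample 't' test, the statistic is the difference between the sample mean and the hypothesized population mean divided by the standard error of the mean. The sample values and the hypothesized mean below are hypothetical, and the critical value is taken from a standard t-table (two-tailed, 5% level, 7 degrees of freedom).

```python
import math
import statistics

# Hypothetical sample and hypothesized population mean
sample = [52, 48, 55, 51, 49, 53, 50, 54]
mu0 = 50

n = len(sample)
sample_mean = statistics.mean(sample)
s = statistics.stdev(sample)                      # sample SD (divides by n - 1)
t_stat = (sample_mean - mu0) / (s / math.sqrt(n)) # computed 't' value
df = n - 1                                        # degrees of freedom

t_critical = 2.365  # tabulated value: two-tailed, 5% level, 7 df
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Here the computed value is below the tabulated value, so the difference between the sample mean and the hypothesized mean is not significant at the 5% level.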

2. Bivariate Tools
Bivariate tools are used to highlight the relationship between two variables. Some of the bivariate tools are:
a) Cross Tables, Graphs and Scatter Plots
b) Correlations (Rank and Simple)
c) Bivariate linear and non-linear regression
d) Binary Choice (Logistic Regression, Linear Probability Models using two variables)
e) Trend Lines

Bivariate Tools (Contd..)


Scatter Plots: These give an idea of the nature of the relationship between the variables.
Correlation: Rank (Spearman) and simple (Karl Pearson) correlations are used in bivariate data analysis. The two types differ in the kind of data they use: ordinal (rank order) data are used for rank correlation, whereas metric data are used for simple correlation. Each uses a specific formula to calculate the correlation coefficient, which ranges from -1 to 1. The correlation coefficient indicates the direction and the extent of correlation. Correlation does not examine any cause-and-effect relationship, but the relationship studied should have construct validity. Redundant relationships should be avoided.
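The two correlation coefficients can be sketched as follows. This is a minimal illustration on hypothetical data: the rank correlation is computed here as the simple correlation of the ranks (with average ranks for ties), which is one common way of obtaining Spearman's coefficient.

```python
import math

def pearson(x, y):
    """Simple (Karl Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Rank (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: y rises with x, but not linearly
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson = {pearson(x, y):.4f}, Spearman = {spearman(x, y):.4f}")
```

Because y always increases with x, the rank correlation is exactly 1, while the simple correlation is slightly below 1 because the relationship is not a straight line.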

Bivariate Tools (Contd..)

Bivariate Linear and Non-Linear Regression
1. Linear Regression: The simplest relationship between two variables is a linear one, which can be specified as $Y_i = \alpha + \beta X_i + U_i$, where Y is the dependent variable, X is the independent variable and U is the error term or disturbance term.

Bivariate Tools (Contd..)


A scatter plot gives us some idea about the relationship between two variables. There could be alternative lines representing the relationship between the variables. Consider $\sum e_i$ and $\sum e_i^2$ about the alternative lines. $\sum e_i^2$ will be non-negative and will vary with the spread of the points around the line. Each line has its own intercept $\alpha$ and slope $\beta$, so $\sum e_i^2$ is a function of $\alpha$ and $\beta$. We therefore minimize it with respect to $\alpha$ and $\beta$, which identifies the line with the least squared error.

Bivariate Tools (Contd..)


What is $e_i$? $e_i$ = actual observation on Y - estimated Y, i.e. $e_i = Y_i - \hat{Y}_i$. Therefore

$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum [Y_i - (\alpha + \beta X_i)]^2$

This has to be minimized with respect to $\alpha$ and $\beta$ to get the best-fitting line representing the relationship between X and Y.

Bivariate Tools (Contd..)


The process of minimization gives two normal equations in two unknowns. Solving these equations gives the formulae for estimating $\alpha$ and $\beta$:
$\hat{\beta} = \sum x_i y_i / \sum x_i^2$ (in deviation form, where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$)
$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$
These estimates are known as Least Squares Estimates. With these estimated values of the intercept and the slope we can write the equation of the line of best fit.
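The deviation-form formulae can be applied directly. The following is a minimal sketch on hypothetical data; the function name `ols` is an illustrative choice, not part of any package.

```python
def ols(x, y):
    """Least squares intercept and slope for y = alpha + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squares in deviation form
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx                  # slope
    alpha = mean_y - beta * mean_x    # intercept
    return alpha, beta

# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

alpha, beta = ols(x, y)
print(f"Estimated line: Y = {alpha:.2f} + {beta:.2f} X")
```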

Bivariate Tools (Contd..)


The null hypothesis (H0):
A null hypothesis that is commonly tested is $H_0: \beta = 0$. This means that there is no linear relation between X and Y, i.e. the regression line is a straight line parallel to the X axis. H0 is rejected if the absolute value of the computed 't' value is more than the tabulated 't' value for the given degrees of freedom and significance level.
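A sketch of the test on the slope follows, under the usual assumptions of the two-variable model: the computed 't' is the estimated slope divided by its standard error, with n - 2 degrees of freedom. The data are hypothetical and the critical value is taken from a standard t-table (two-tailed, 5% level, 3 degrees of freedom).

```python
import math

# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

beta = sxy / sxx                 # least squares slope
alpha = mean_y - beta * mean_x   # least squares intercept

# Residual sum of squares and estimated error variance
rss = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
df = n - 2
sigma2 = rss / df

se_beta = math.sqrt(sigma2 / sxx)   # standard error of the slope
t_stat = beta / se_beta             # computed 't' under H0: beta = 0

t_critical = 3.182  # tabulated value: two-tailed, 5% level, 3 df
reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```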

Bivariate Tools (Contd..)


The Coefficient of Determination ( R2 )
Three quantities can be calculated from the line of regression with respect to the given Y and X values:
TSS: Total Sum of Squares of the deviations
ESS: Explained Sum of Squares
RSS: Residual Sum of Squares
These are related by TSS = ESS + RSS.

$R^2 = ESS / TSS$. When RSS declines, ESS tends to TSS and $R^2$ approaches 1 (one). $R^2$ is known as the explanatory power of the model.
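The three sums of squares and the coefficient of determination can be computed as in the sketch below, which uses hypothetical data and fits the line by least squares first.

```python
# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
beta = sxy / sxx
alpha = mean_y - beta * mean_x

fitted = [alpha + beta * a for a in x]

tss = sum((b - mean_y) ** 2 for b in y)               # total variation in Y
ess = sum((f - mean_y) ** 2 for f in fitted)          # variation explained by the line
rss = sum((b - f) ** 2 for b, f in zip(y, fitted))    # unexplained (residual) variation

r_squared = ess / tss
print(f"TSS={tss:.3f}  ESS={ess:.3f}  RSS={rss:.3f}  R2={r_squared:.4f}")
```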

Forms of Bivariate Regression Models and their uses


Various forms of two-variable regression models have different objectives and uses. A few examples:
1. Simple Linear Model
$Y_i = \alpha + \beta X_i + U_i$, where Y is the dependent variable, X is the independent variable and U is the error term. It highlights the linear relationship between Y and X, as discussed earlier.

2. Linear Trend
The linear growth of a variable can be calculated using a simple regression model such as $Y = \alpha + \beta t + u$, where Y is the variable under consideration and t is the time or trend variable. Whether the trend of the variable over the time period is positive or negative is determined by the sign of the slope $\beta$.

3. Log-Linear Model
$Y_i = \alpha X_i^{\beta} e^{u_i}$. Taking logs: $\ln Y_i = \ln \alpha + \beta \ln X_i + u_i$. It is an exponential regression model (known as the double-log or log-linear model). This model is popular in applied work since the slope coefficient measures the elasticity of Y with respect to X (the % change in Y for a 1% change in X). Example: to estimate the advertising elasticity of a product we may use this model, specifying Sales Volume = f(Advertising Expenditure). The slope will give the advertising elasticity.
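The double-log model can be estimated by applying least squares to the logged variables. In the sketch below the advertising and sales figures are hypothetical and are constructed so that the true elasticity is 0.5, which lets the estimate be checked.

```python
import math

def ols(x, y):
    """Least squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data generated from Sales = 3 * Adv^0.5 (no error term)
adv = [1.0, 2.0, 4.0, 8.0, 16.0]
sales = [3.0 * a ** 0.5 for a in adv]

# Regress ln(Sales) on ln(Adv); the slope is the elasticity
ln_alpha, elasticity = ols([math.log(a) for a in adv],
                           [math.log(s) for s in sales])
alpha = math.exp(ln_alpha)   # recover the scale parameter from its log

print(f"alpha = {alpha:.3f}, advertising elasticity = {elasticity:.3f}")
```

An elasticity of 0.5 means that a 1% rise in advertising expenditure is associated with a 0.5% rise in sales volume.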

4. Semi-log Regression Model.


The semi-log model can be used to measure the growth rate of a variable over a time period. This model is specified as $\ln Y_i = \ln \alpha + \beta t + u$.
It is known as a semi-log model since only one variable appears in log form. It is also known as the log-lin model.

Semi-log Regression Model (Contd)


In the semi-log model the slope coefficient measures the constant proportional (relative) change in Y for a given absolute change in X ('t' in the above equation). Slope × 100 gives the instantaneous (point-in-time) percentage growth rate of Y. The compound growth rate can be found from the formula: $[\text{antilog}(\beta) - 1] \times 100$.
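The two growth rates can be sketched as follows. The series is hypothetical and is constructed to grow at exactly 5% per period, and natural logs are assumed, so the antilog is the exponential function.

```python
import math

def ols(x, y):
    """Least squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical series growing at a compound rate of 5% per period
t = list(range(10))
y = [100.0 * 1.05 ** k for k in t]

# Regress ln(Y) on t
ln_alpha, beta = ols(t, [math.log(v) for v in y])

instantaneous_growth = beta * 100              # slope x 100
compound_growth = (math.exp(beta) - 1) * 100   # [antilog(slope) - 1] x 100

print(f"Instantaneous: {instantaneous_growth:.3f}%  Compound: {compound_growth:.3f}%")
```

The compound rate recovers the 5% built into the series, while the instantaneous rate is slightly lower.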

5. Quadratic/ Cubic Model:


The forms of a quadratic or a cubic model could be $Y = a + bX + cX^2 + u$ or $Y = a + bX + cX^2 + dX^3 + u$.
Since these models use one independent variable, they can be categorized under the two-variable regression equations. Quadratic models are used to examine whether a minimum or a maximum exists in the curve depicting the relationship between X and Y. Examples: total revenue curves, average cost curves, etc. Cubic models are used in total cost functions, etc.
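Once the coefficients of a quadratic model have been estimated, the turning point follows from setting the first derivative b + 2cX to zero. The sketch below assumes the coefficients are already available; the total revenue coefficients are hypothetical, and `turning_point` is an illustrative helper name.

```python
def turning_point(a, b, c):
    """Turning point of Y = a + b*X + c*X^2 and whether it is a maximum or minimum."""
    if c == 0:
        raise ValueError("c must be non-zero for a quadratic model")
    x_star = -b / (2 * c)                       # where dY/dX = b + 2cX = 0
    y_star = a + b * x_star + c * x_star ** 2   # value of Y at the turning point
    kind = "maximum" if c < 0 else "minimum"    # sign of the second derivative 2c
    return x_star, y_star, kind

# Hypothetical total revenue curve: TR = 100 X - 2 X^2
x_star, y_star, kind = turning_point(0.0, 100.0, -2.0)
print(f"TR has a {kind} of {y_star} at X = {x_star}")
```

A negative coefficient on the squared term gives a maximum, as in a total revenue curve, while a positive one gives a minimum, as in an average cost curve.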

Assignment 2
Collect relevant data and estimate the five forms of two-variable regression models explained above. Use the SPSS package for estimation. Interpret the results.
Exercises of randomly selected groups will be discussed in the next class. One class will be assigned for discussion of the results and their interpretation.
