Statistical Tools For Data Analysis
The choice of tools for data analysis depends on the type of data. Different statistical tools are used for a) qualitative and quantitative data and b) time-series, cross-section and panel data.
The procedures and analytical tools to be used should follow from the objectives of the study. Tools for data analysis are classified into three broad categories, based on the number of variables a tool uses: a) univariate tools, b) bivariate tools and c) multivariate tools.
Bivariate Tools for Data Analysis: cross tables, scatter plots, correlation, bivariate regression, trend lines, binary choice models (logistic regression, linear probability models using two variables), etc.
1. Univariate Tools
The primary objectives of univariate tools are a) to introduce the sample to the reader and b) to examine the nature of each variable in terms of its distribution. Some of these tools are used as follows. Frequency tables: the objective of a frequency table is to summarize the raw data in a concise, systematic and meaningful way. Cumulative frequency tables, histograms, pie charts and distributions can be prepared from it, and broad conclusions about the nature of the distribution of the data can be drawn from these tools.
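A frequency table can be sketched in a few lines of Python using only the standard library. The data below are hypothetical and the function name `frequency_table` is illustrative, not part of SPSS or any package; each row holds the value, its frequency, the cumulative frequency and the relative frequency.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, cumulative frequency, relative frequency)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):  # list values in ascending order
        freq = counts[value]
        cumulative += freq
        rows.append((value, freq, cumulative, freq / n))
    return rows

# Hypothetical raw data: number of children in ten sampled households.
raw = [2, 1, 3, 2, 0, 2, 1, 4, 2, 3]
table = frequency_table(raw)
print(f"{'Value':>5} {'Freq':>5} {'CumFreq':>8} {'RelFreq':>8}")
for value, freq, cum, rel in table:
    print(f"{value:>5} {freq:>5} {cum:>8} {rel:>8.2f}")
```

The frequency column is what a bar chart or histogram would plot, and the relative-frequency column is what a pie chart would show.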
Univariate Tools (Contd)
Some of the summary statistics are measures of central tendency, measures of dispersion and measures of peakedness (kurtosis).
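a) Measures of central tendency (MCT):
Arithmetic mean (simple average): $AM = \frac{\sum X_i}{N}$
Weighted arithmetic mean, which takes into account the importance of each value to the overall total: $WAM = \frac{\sum W_i X_i}{\sum W_i}$
Geometric mean: the GM is used when we need the average rate of growth of a series of numbers. It is the Nth root of the product of the N values: $GM = \sqrt[N]{X_1 X_2 \cdots X_N}$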
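The summary statistics above can be computed directly in Python as a sketch. The means follow the formulas in the text; for dispersion and peakedness, which the text names without formulas, this sketch assumes the population standard deviation and the moment coefficient of kurtosis beta_2 = m4 / m2^2 (equal to 3 for a normal distribution), both computed with divide-by-N moments. Packages such as SPSS may report bias-corrected or excess (beta_2 - 3) versions, so their numbers can differ. All data are hypothetical.

```python
import math

def arithmetic_mean(xs):
    # AM = sum(X_i) / N
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    # WAM = sum(W_i * X_i) / sum(W_i)
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

def geometric_mean(xs):
    # GM = Nth root of (X_1 * ... * X_N), computed as exp(mean of logs)
    # to avoid overflow; all X_i must be positive.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def std_dev(xs):
    # Dispersion (assumed measure): population standard deviation.
    m = arithmetic_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def kurtosis(xs):
    # Peakedness (assumed measure): moment coefficient beta_2 = m4 / m2**2.
    m = arithmetic_mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(arithmetic_mean(data))  # 5.0
print(std_dev(data))          # 2.0
print(kurtosis(data))         # 2.78125 (flatter than a normal curve)
print(weighted_mean([60, 70, 80], [3, 1, 1]))  # 66.0
print(geometric_mean([2, 8]))  # approximately 4.0
```

For growth rates, the geometric mean is applied to growth factors: a series that doubles in one period and grows eightfold in the next has an average growth factor of about 4 per period, whereas the arithmetic mean of 2 and 8 (that is, 5) would overstate it.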
Statistical Tests
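Z and t tests are used to examine the significance of the difference between a sample mean and a population mean (the Z test when the population variance is known, the t test when it is estimated from the sample). Similarly, χ² and F tests are used to examine differences in variance: the χ² test compares a sample variance with a population variance, and the F test compares the variances of two samples.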
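The test statistics can be sketched with Python's standard `statistics` module. This is a minimal illustration under assumed, hypothetical data and hypothesized values (mu0 = 11, sigma = 2, sigma0^2 = 4); only the Z test's two-sided p-value is computed here (via `statistics.NormalDist`), while the t, χ² and F statistics would be compared against their distributions with the stated degrees of freedom using tables or a package such as SPSS.

```python
import math
from statistics import NormalDist, mean, stdev, variance

def z_test(sample, mu0, sigma):
    """One-sample Z statistic and two-sided p-value (population s.d. known)."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

def t_statistic(sample, mu0):
    """One-sample t statistic with n - 1 df (s.d. estimated from the sample)."""
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))

def chi_square_variance_statistic(sample, sigma0_sq):
    """(n - 1) s^2 / sigma0^2, chi-square with n - 1 df under H0."""
    return (len(sample) - 1) * variance(sample) / sigma0_sq

def f_statistic(sample1, sample2):
    """Ratio of two sample variances, F with (n1 - 1, n2 - 1) df under H0."""
    return variance(sample1) / variance(sample2)

sample = [12, 14, 10, 16, 13]          # hypothetical sample
second_sample = [10, 11, 12, 13, 14]   # hypothetical second sample
z, p = z_test(sample, mu0=11, sigma=2)
print(f"Z = {z:.3f}, two-sided p = {p:.4f}")
print(f"t = {t_statistic(sample, 11):.3f} (df = {len(sample) - 1})")
print(f"chi-square = {chi_square_variance_statistic(sample, 4):.3f}")
print(f"F = {f_statistic(sample, second_sample):.3f}")
```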
2. Bivariate Tools
Bivariate tools are used to highlight the relationship between two variables. Some of the bivariate tools are a) cross tables, graphs and scatter plots, b) correlations (rank and simple), c) bivariate linear and non-linear regression, d) binary choice models (logistic regression, linear probability models using two variables) and e) trend lines.
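Simple (Pearson) and rank (Spearman) correlation can be sketched as follows. This assumes Spearman's coefficient is computed as the Pearson correlation of the ranks, with tied values given the average of their positions; the function names and data are illustrative only.

```python
import math

def pearson_r(x, y):
    """Simple (product-moment) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # increases with x, but not linearly
print(f"Pearson r = {pearson_r(x, y):.4f}")
print(f"Spearman rho = {spearman_rho(x, y):.4f}")
```

Because y rises steadily with x, the rank correlation is exactly 1, while the simple correlation is slightly below 1 because the relationship is not linear.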
Bivariate Linear and Non-Linear Regression
1. Linear Regression: The simplest relationship between two variables is a linear one, which can be specified as $Y_i = \alpha + \beta X_i + U_i$, where Y is the dependent variable, X is the independent variable and U is the error (disturbance) term.
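The parameters are estimated by ordinary least squares (OLS): the residual sum of squares $\sum e_i^2 = \sum [Y_i - (\hat{\alpha} + \hat{\beta} X_i)]^2$ is minimized with respect to $\hat{\alpha}$ and $\hat{\beta}$ to obtain the best-fitting line representing the relationship between X and Y.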
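Setting the derivatives of $\sum e_i^2$ to zero gives the standard closed-form OLS estimates $\hat{\beta} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$. A minimal Python sketch of these formulas follows; the function name `ols_fit` and the data are hypothetical.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) minimising the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx                 # slope
    alpha_hat = y_bar - beta_hat * x_bar  # intercept
    return alpha_hat, beta_hat

x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]  # hypothetical observations
alpha_hat, beta_hat = ols_fit(x, y)
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
print(f"residual sum of squares = {sum(e ** 2 for e in residuals):.3f}")
```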
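$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$, where ESS is the explained sum of squares, RSS the residual sum of squares and TSS = ESS + RSS the total sum of squares. As RSS declines, ESS approaches TSS and R² approaches 1 (one). R² is known as the explanatory power (goodness of fit) of the model.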
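The decomposition can be checked numerically with a short Python sketch, assuming the same hypothetical data as a simple OLS fit with an intercept. It computes R² both as ESS/TSS and as 1 - RSS/TSS; the two agree because TSS = ESS + RSS holds for an OLS line with an intercept.

```python
def ols_fit(x, y):
    # Closed-form OLS estimates for Y = alpha + beta * X.
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    beta = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))
    return y_bar - beta * x_bar, beta

def r_squared(x, y):
    """Return R^2 computed as ESS/TSS and as 1 - RSS/TSS."""
    alpha, beta = ols_fit(x, y)
    y_hat = [alpha + beta * a for a in x]
    y_bar = sum(y) / len(y)
    tss = sum((b - y_bar) ** 2 for b in y)             # total sum of squares
    ess = sum((f - y_bar) ** 2 for f in y_hat)         # explained sum of squares
    rss = sum((b - f) ** 2 for b, f in zip(y, y_hat))  # residual sum of squares
    return ess / tss, 1 - rss / tss

x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]  # hypothetical observations
r2_ess, r2_rss = r_squared(x, y)
print(f"ESS/TSS = {r2_ess:.4f}, 1 - RSS/TSS = {r2_rss:.4f}")
```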
2. Linear Trend
The linear growth of a variable can be estimated with a simple regression model $Y_t = \alpha + \beta t + U_t$, where Y is the variable under consideration and t is the time (trend) variable. Whether the variable has a positive or negative trend over the period is determined by the sign of the slope β.
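A linear trend is simply the bivariate OLS fit with t = 1, 2, ..., n as the independent variable. The sketch below assumes that coding of t; the series, the function names and the "no linear trend" label are hypothetical. The slope β is the average change in Y per period.

```python
def linear_trend(series):
    """Fit Y_t = alpha + beta * t + U_t by OLS, with t = 1, 2, ..., n."""
    n = len(series)
    t = range(1, n + 1)
    t_bar = (n + 1) / 2
    y_bar = sum(series) / n
    s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, series))
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    beta = s_ty / s_tt
    alpha = y_bar - beta * t_bar
    return alpha, beta

def trend_direction(beta):
    # The sign of the slope determines the direction of the trend.
    if beta > 0:
        return "positive"
    if beta < 0:
        return "negative"
    return "no linear trend"

sales = [120, 128, 131, 140, 146, 155]  # hypothetical annual sales
alpha, beta = linear_trend(sales)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, trend is {trend_direction(beta)}")
```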
3. Log-Linear Model
$Y_i = \alpha X_i^{\beta} e^{U_i}$; taking logs, $\ln Y_i = \ln \alpha + \beta \ln X_i + U_i$.
This is an exponential regression model (known as the double-log or log-linear model). The model is popular in applied work because the slope coefficient β measures the elasticity of Y with respect to X (the percentage change in Y due to a one percent change in X). Example: to estimate the advertising elasticity of a product, we may specify Sales volume = f(advertising expenditure) in this form; the slope gives the advertising elasticity.
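Because the model is linear in logs, it can be estimated by running OLS on ln X and ln Y. The Python sketch below assumes hypothetical advertising and sales figures generated from an exact elasticity of 0.3, so the fit should recover β = 0.3 and α = 50; real data would of course include the error term.

```python
import math

def log_linear_fit(x, y):
    """Estimate ln Y = ln(alpha) + beta * ln X + U by OLS; return (alpha, beta)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    beta = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))
    ln_alpha = my - beta * mx
    return math.exp(ln_alpha), beta  # beta is the elasticity of Y w.r.t. X

# Hypothetical data: sales volume = 50 * (advertising expenditure) ** 0.3
adv_expenditure = [10, 20, 40, 80]
sales_volume = [50 * a ** 0.3 for a in adv_expenditure]
alpha, elasticity = log_linear_fit(adv_expenditure, sales_volume)
print(f"alpha = {alpha:.2f}, advertising elasticity = {elasticity:.3f}")
```

An elasticity of 0.3 would be read as: a 1% increase in advertising expenditure is associated with roughly a 0.3% increase in sales volume.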
Assignment 2
Collect relevant data, estimate the five forms of two-variable regression models explained above using the SPSS package, and interpret the results.
The exercises of randomly selected groups will be discussed in the next class; one class will be devoted to discussing the results and their interpretation.