Original Title: stats_project.docx

Uploaded by Syeda Raiha Raza Gardezi

Attribution Non-Commercial (BY-NC)

Final Project

1/1/2011

CORRELATION:

Correlation is a statistical measure of the relationship between two variables. Possible correlations range from +1 to −1.
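As a quick sketch of how such a coefficient behaves (assuming Python with scipy installed; the data below are hypothetical illustration values, not this project's series):

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical paired measurements of two variables X and Y.
x = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
y = [3.1, 5.2, 6.8, 9.1, 10.5, 13.0]

# Pearson's r: appropriate when X and Y are sampled from
# normally distributed populations.
r, p_value = pearsonr(x, y)

# Spearman's rank correlation: used when normality cannot be assumed.
rho, p_rho = spearmanr(x, y)

# Correlation is symmetric: swapping X and Y gives the same coefficient.
r_swapped, _ = pearsonr(y, x)

print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```

Both coefficients land in [−1, +1], and interchanging the two variables leaves them unchanged.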

A zero correlation indicates that there is no relationship between the variables. A correlation of −1 indicates a perfect negative correlation: as one variable goes up, the other goes down. A correlation of +1 indicates a perfect positive correlation: both variables move in the same direction together. In other words, correlation computes the value of the Pearson correlation coefficient r, whose value ranges from −1 to +1, and answers the STRENGTH of the linear association between paired variables, say X and Y.

Correlation is calculated whenever:

- both X and Y are measured in each subject, to quantify how much they are linearly associated;
- in particular, Pearson's product-moment correlation coefficient is used when the assumption that both X and Y are sampled from normally distributed populations is satisfied;
- Spearman's rank-order correlation coefficient is used if the assumption of normality is not satisfied;
- correlation is not used when the variables are manipulated, for example in experiments.

If you interchange the variables X and Y in the calculation, you will get the same value of the correlation coefficient.

REGRESSION:

Regression is the technique of fitting a simple equation to real data points. The most typical type is linear regression (you use the equation of a straight line rather than some other type of curve), constructed using the least-squares method: the line you choose is the one that minimizes the sum of the squares of the distances between the line and the data points. It is customary to use "a" or "alpha" for the intercept of the line and "b" or "beta" for the slope, so linear regression gives a formula of the form y = bx + a. Linear regression quantifies goodness of fit with r², sometimes shown in uppercase as R².
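A minimal sketch of that least-squares calculation in Python, using the closed-form slope and intercept (the data are hypothetical illustration values):

```python
# Least-squares fit of y = b*x + a, using the closed-form formulas
# b = S_xy / S_xx and a = y_bar - b * x_bar.

def linear_fit(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope ("beta")
    a = y_bar - b * x_bar    # intercept ("alpha")
    return a, b

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
a, b = linear_fit(x, y)
print(f"y = {b:.3f} x + {a:.3f}")
```

This choice of a and b is exactly the one that minimizes the sum of squared vertical distances from the points to the line.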

The "best" linear regression model is obtained by selecting the variables (X's) with at least a strong correlation to Y, i.e. r >= 0.80 or r <= -0.80. Linear regression assumes the same underlying distribution for all variables; it will therefore underestimate the association between the independent and dependent variables when the X's and Y come from different underlying distributions.

The regression tells us the FORM of linear association that best predicts Y from the values of X.

Linear regression is used whenever:

- at least one of the independent variables (Xi's) is used to predict the dependent variable Y; some of the Xi's may be dummy variables (Xi = 0 or 1) used to code nominal variables;
- the X variable is manipulated, e.g. in an experiment.

Linear regression is not symmetric in terms of X and Y: interchanging X and Y gives a different regression model (X in terms of Y) than the original (Y in terms of X).

Q2. Give any example of spurious correlation between two REAL WORLD variables and highlight the hidden factor.

"Spurious relation (or correlation): a situation in which measures of two or more variables are statistically related (they covary) but are not in fact causally linked, usually because the statistical relation is caused by a third variable. When the effects of the third variable are removed, they are said to have been partialed out. A spurious correlation is sometimes called an 'illusory correlation.' Lurking variable: a third variable that causes a correlation between two others; sometimes, like the troll under the bridge, an unpleasant surprise when discovered. A lurking variable is a source of a spurious correlation. For example, if researchers found a correlation between individuals' college grades and their income later in life, they might wonder whether doing well in school increased income. It might; but good grades and high income could both be caused by a third (lurking or hidden) variable such as a tendency to work hard."

For example, if the students in a psychology class who had long hair got higher scores on the midterm than those who had short hair, there would be a correlation between hair length and test scores. Not many people, however, would believe that there was a causal link and that, for example, students who wished to improve their grades should let their hair grow. The real cause might be gender: that is, women (who usually have longer hair) did better on the test. Or that might be a spurious relationship too. The real cause might be class rank: seniors did better on the test than sophomores and juniors, and, in this class, the women (who also had longer hair) were mostly seniors, whereas the men (with shorter hair) were mostly sophomores and juniors.

Here, hair length is one variable and grades the other, while the hidden factor behind the higher grades is likely gender: in this class the women achieved the higher grades. Treating hair length as the cause of the grades illustrates the fallacy of inferring causation from the correlation of two variables.

Regression analysis serves three purposes:

a. explaining the relationship between the Y and X variables with a model;
b. estimating and testing the intensity of their relationship;
c. predicting the y value for a given, fixed x value.

Applications of regression analysis exist in almost every field. In economics, the dependent variable might be a family's consumption expenditure and the independent variables might be the family's income, number of children in the family, and other factors that would affect the family's consumption patterns. In political science, the dependent variable might be a state's level of welfare spending and the independent variables measures of public opinion and institutional variables that would cause the state to have higher or lower levels of welfare spending. In sociology, the dependent variable might be a measure of the social status of various occupations and the independent variables characteristics of the occupations (pay, qualifications, etc.). In psychology, the dependent variable might be an individual's racial tolerance as measured on a standard scale, with indicators of social background as independent variables. In education, the dependent variable might be a student's score on an achievement test and the independent variables characteristics of the student's family, teachers, or school.

MATHEMATICAL EQUATION:

b1 = 0.976, b0 = 250.553

Ŷ = b0 + b1 X1

Ŷ = 395.5817809 + 0.976 X1

Therefore, for every 1-unit change in X1 there is a 0.976 change in Ŷ, as 0.976 is the gradient of the function. The Y intercept of the function is 395.5817809.

r² = SSR / SST = 1.23 / 1.347 = 0.912927. The value of r² is quite high (91.29 %), meaning there is a strong relationship and dependence between X and Y.

Adjusted r² = 0.910439

After adjusting for the number of explanatory variables and the sample size, the adjusted r² is also very high (91.04 %), which means there is strong dependence between X and Y.
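This adjustment can be checked with the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). A sketch in Python, taking n = 37 observations (from the degrees-of-freedom calculation later in the project) and k = 1 explanatory variable:

```python
# Check of the adjusted R^2 formula, using the values reported in this
# project: R^2 = 0.912927, n = 37 observations, k = 1 explanatory variable.

def adjusted_r2(r2, n, k):
    # Penalise R^2 for the number of explanatory variables k.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 0.912927
adj = adjusted_r2(r2, n=37, k=1)
print(f"adjusted R^2 = {adj:.6f}")
```

With these inputs the formula reproduces the reported 0.910439; the adjusted value is always at most the raw R².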

(v)

SCATTER PLOT

[Scatter plot: Series1 (values 0–20000) plotted against X (0–40), with a 2-period moving average of Series1.]

The scatter plot also supports our analysis that, on average, both variables show the same movements and trend.

DESCRIPTIVE STATISTICS: Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data. There are three major characteristics of a single variable that we tend to look at:

The Distribution. The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of persons who had each value.

Central Tendency. The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are three major types of estimates of central tendency: the mean, the median, and the mode.

Dispersion. Dispersion refers to the spread of the values around the central tendency. There are three common measures of dispersion: the range, the standard deviation, and the variance.
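All of these summaries are available in Python's standard statistics module; a small sketch with hypothetical data:

```python
import statistics

# Hypothetical sample used to illustrate the summaries above.
data = [4, 8, 6, 5, 3, 8, 7, 6, 6, 9]

# Central tendency: mean, median, mode.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Dispersion: range, sample standard deviation, sample variance.
value_range = max(data) - min(data)
stdev = statistics.stdev(data)
variance = statistics.variance(data)

print(mean, median, mode, value_range, round(variance, 3))
```

Note that stdev and variance here are the sample versions (n − 1 in the denominator); statistics.pstdev and statistics.pvariance give the population versions.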


APPLICATION OF F-TEST

MSR = SSR / k = 7.83525 × 10⁷

SSE = Σ(Y − Ŷ)² = 1346829407

MSE = 37411927.98

F = MSR / MSE = 2.09432

F c.v. = 0.323985026

Since F = 2.09432 > F c.v. = 0.323985026, reject H0: B1 = B2 = 0. Therefore Ha: at least one B ≠ 0.
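The general decision rule can be sketched as follows (scipy assumed available; α = 0.05 is an assumed significance level, and the resulting tabulated critical value need not equal the one quoted above):

```python
from scipy.stats import f

# Generic decision rule for the overall F test: reject H0 (all slope
# coefficients are zero) when the computed F statistic exceeds the
# critical value F(k, n - k - 1) at significance level alpha.
# k = 1 and n = 37 match this project; alpha = 0.05 is an assumption.
k, n, alpha = 1, 37, 0.05

f_crit = f.ppf(1 - alpha, dfn=k, dfd=n - k - 1)

def decide(F_stat, critical):
    return "reject H0" if F_stat > critical else "fail to reject H0"

print(f"F critical value at alpha = {alpha}: {f_crit:.4f}")
```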


APPLICATION OF t TEST

HYPOTHESIS: H0: B1 = 0; Ha: B1 ≠ 0

t = (b1 − B1) / S(b1)

d.f. = n − 2 = 37 − 2 = 35

Standard error of the model = [Σ(Ŷ − Y)² / 35]^0.5 = 3364957

S(b1) = standard error of the model / [Σ(X − X̄)²]^0.5 = 3364957 / √1285049376 = 93.86846

t = 1.13051

t c.v. = 0.324174

As t > t c.v., reject H0: B1 = 0. Therefore, as the gradient is not equal to 0, there is a linear relationship between Y and X and they are dependent on each other.
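This slope test can be sketched generically (scipy assumed; the slope, standard error, and α = 0.05 below are illustrative assumptions, not this project's data):

```python
from scipy.stats import t as t_dist

# Sketch of the slope t test: t = (b1 - B1) / S(b1) with B1 = 0 under H0,
# judged against the two-sided critical value on n - 2 degrees of freedom.
# The slope and standard error below are illustrative values.

def slope_t_test(b1, s_b1, n, alpha=0.05):
    t_stat = b1 / s_b1                         # (b1 - 0) / S(b1)
    t_crit = t_dist.ppf(1 - alpha / 2, n - 2)  # two-sided critical value
    return t_stat, t_crit, abs(t_stat) > t_crit

t_stat, t_crit, reject = slope_t_test(b1=1.99, s_b1=0.35, n=37)
print(f"t = {t_stat:.3f}, critical value = {t_crit:.4f}, reject H0: {reject}")
```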

P-VALUE ANALYSIS:

Thus, accept H0:

Mean of X = 6350.59, Variance = 72784603.35, d.f. = 35

Mean of Y = 6549.84, Variance = 78646765.38, d.f. = 36

Source of variation    SS                SS d.f.   MS             F
Between groups         536810.8654       1         536810.8654    0.0071
Within groups          5378744671        71        75756967.2
Total                  5379281481.87     72

(Total SS = 5378744671 + 536810.8654 = 5379281481.87.)
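The table above is a single-factor ANOVA comparing two series; scipy's f_oneway carries out the same between- versus within-group computation on any samples (the data here are small hypothetical groups, not the project's X and Y series):

```python
from scipy.stats import f_oneway

# Single-factor ANOVA on two hypothetical groups: f_oneway returns the
# F statistic (between-group MS / within-group MS) and its p-value.
group_x = [5.1, 6.4, 7.2, 5.8, 6.9]
group_y = [5.5, 6.1, 7.0, 6.3, 6.6]

F, p = f_oneway(group_x, group_y)
print(f"F = {F:.4f}, p = {p:.4f}")
```

As in the table, a very small F (far below the critical value) means the group means are statistically indistinguishable.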

[Residual plot: residuals of Series1 (roughly −5000 to 6000) plotted against X (0–20000), scattered around zero.]

As the residuals are identically distributed around the mean, i.e. 0, the errors are independent and random, and so they are i.i.d.
