249 views

Uploaded by Vivek Garg

- Repeatability Standard Deviation
- Correlation & Regression
- Matrix Diagram
- Multiple Linear Regression in Data Mining
- The Unscrambler Methods
- Training Older Employees
- Impact of Office Design on Employees’ Productivity- A Case study of Banking
- Simple Linear Regression
- Lab 4 Regression
- Effects of International Institutional Factors on Earnings Qualit تابع التطبيق.pdf
- Cph 14
- 2. Bivariate Distributions and Estimating Relations (1)
- Linear Regression.docx
- ME12 Ch. 5 (Recovered)
- SDA 3E Chapter 6 (1)
- Final Project
- reg corre
- Chapter 2 correlation and regression.doc
- The Role of Employee Relations in The
- medium

You are on page 1of 6

A scatter diagram is a graphical technique used to analyze the relationship between two

variables.One variable is plotted on the horizontal axis and the other is plotted on the vertical

axis. The pattern of their intersecting points can graphically show relationship patterns. Most

often a scatter diagram is used to prove or disprove cause-and-effect relationships. While the

diagram shows relationships, it does not by itself prove that one variable causes the other. In

addition to showing possible cause and- effect relationships, a scatter diagram can show that two

variables are from a common cause that is unknown or that one variable can be used as a

surrogate for the other.

Use a scatter diagram to examine theories about cause-and-effect relationships and to search for

root causes of an identified problem. Use a scatter diagram to design a control system to ensure

that gains from quality improvement efforts are maintained.

Collect data. Gather 50 to 100 paired samples of data that show a possible relationship.

Draw the diagram. Draw roughly equal horizontal and vertical axes of the diagram, creating a

square plotting area. Label the axes in convenient multiples (1, 2, 5, etc.) increasing on the

horizontal axes from left to right and on the vertical axis from bottom to top. Label both axes.

Plot the paired data. Plot the data on the chart, using concentric circles to indicate repeated data

points.

Title and label the diagram.

Interpret the data. Scatter diagrams will generally show one of six possible correlations

between the variables:

Strong Positive Correlation The value of Y clearly increases as the value of X increases.

Strong Negative Correlation The value of Y clearly decreases as the value of X increases.

Weak Positive Correlation The value of Y increases slightly as the value of X increases.

Weak Negative Correlation The value of Y decreases slightly as the value of X increases.

Complex Correlation The value of Y seems to be related to the value of X, but the

relationship is not easily determined.

No Correlation There is no demonstrated connection between the two variables.

Example

A town planning team, during an investigation of road accidents, identified a number of possible

causes. Three main causes were suspected: the speed of the vehicles, the traffic density and the

local weather conditions. As there was no clear evidence available to support any of these

hypotheses, they decided to measure them, and used Scatter Diagrams to check whether the link

between any of the causes was strong enough to take further action.

In order to get sufficient measures, they made daily measures for two months, using local road

sensors and reports from the ambulance service. Scatter Diagrams were drawn for each possible

cause against the accident count. The results enabled the following conclusion to be made:

• There was a low, positive correlation with traffic density.

• There was an inconclusive correlation with road conditions.

• There was a high, positive correlation with traffic speed, with accidents dropping off

more sharply under 30 mph.

As a consequence, more traffic speed control measures were installed, including signs and

surfaces. This resulted in a measurable decrease in accidents.

Other examples

• A baker suspects that the standing time of the dough is affecting the way it rises. A

Scatter Diagram of rise time against measured bread density shows a fair correlation on

an inverse U-shaped distribution. He thus uses the time at the highest point on the curve

to get the best chance of well-risen bread.

• It is suspected that press temperature is causing rejects in a plastic forming process. A

Scatter Diagram shows a high positive correlation, prompting a redesign of the press,

including the use of more heat-resistant materials. This results in a significant reduction

in the number of rejected pieces.

• A personnel department plots salary against the results of a motivation survey. The result

is a weak negative correlation. A second Scatter Diagram, plotting time in company

against motivation, gives a higher correlation. A motivation program is targeted

accordingly and results in a steady increase in the score given to motivation in the

company's next personnel survey.

Line of regression

Regression

Simple regression is used to examine the relationship between one dependent and one

independent variable. After performing an analysis, the regression statistics can be used to

predict the dependent variable when the independent variable is known. Regression goes beyond

correlation by adding prediction capabilities.

People use regression on an intuitive level every day. In business, a well-dressed man is thought

to be financially successful. A mother knows that more sugar in her children's diet results in

higher energy levels. The ease of waking up in the morning often depends on how late you went

to bed the night before. Quantitative regression adds precision by developing a mathematical

formula that can be used for predictive purposes.

For example, a medical researcher might want to use body weight (independent variable) to

predict the most appropriate dose for a new drug (dependent variable). The purpose of running

the regression is to find a formula that fits the relationship between the two variables. Then you

can use that formula to predict values for the dependent variable when only the independent

variable is known. A doctor could prescribe the proper dose based on a person's body weight.

The regression line (known as the least squares line) is a plot of the expected value of the

dependent variable for all values of the independent variable. Technically, it is the line that

"minimizes the squared residuals". The regression line is the one that best fits the data on a

scatterplot.

Using the regression equation, the dependent variable may be predicted from the independent

variable. The slope of the regression line (b) is defined as the rise divided by the run. The y

intercept (a) is the point on the y axis where the regression line would intercept the y axis. The

slope and y intercept are incorporated into the regression equation. The intercept is usually called

the constant, and the slope is referred to as the coefficient. Since the regression model is usually

not a perfect predictor, there is also an error term in the equation.

In the regression equation, y is always the dependent variable and x is always the independent

variable. Here are three equivalent ways to mathematically describe a linear regression model.

y = a + bx + e

The significance of the slope of the regression line is determined from the t-statistic. It is the

probability that the observed correlation coefficient occurred by chance if the true correlation is

zero. Some researchers prefer to report the F-ratio instead of the t-statistic. The F-ratio is equal to

the t-statistic squared.

The t-statistic for the significance of the slope is essentially a test to determine if the regression

model (equation) is usable. If the slope is significantly different than zero, then we can use the

regression model to predict the dependent variable for any value of the independent variable.

On the other hand, take an example where the slope is zero. It has no prediction ability because

for every value of the independent variable, the prediction for the dependent variable would be

the same. Knowing the value of the independent variable would not improve our ability to

predict the dependent variable. Thus, if the slope is not significantly different than zero, don't use

the model to make predictions.

The coefficient of determination (r-squared) is the square of the correlation coefficient. Its value

may vary from zero to one. It has the advantage over the correlation coefficient in that it may be

interpreted directly as the proportion of variance in the dependent variable that can be accounted

for by the regression equation. For example, an r-squared value of .49 means that 49% of the

variance in the dependent variable can be explained by the regression equation. The other 51% is

unexplained.

The standard error of the estimate for regression measures the amount of variability in the points

around the regression line. It is the standard deviation of the data points as they are distributed

around the regression line. The standard error of the estimate can be used to develop confidence

intervals around a prediction.

Example

expenditures and its sales volume. The independent variable is advertising budget and the

dependent variable is sales volume. A lag time of one month will be used because sales are

expected to lag behind actual advertising expenditures. Data was collected for a six month

period. All figures are in thousands of dollars. Is there a significant relationship between

advertising budget and sales volume?

Indep. Var. Depen. Var

4.2 27.1

6.1 30.4

3.9 25.0

5.7 29.7

7.3 40.1

5.9 28.8

--------------------------------------------------

Standard error of the estimate = 2.637

t-test for the significance of the slope = 3.961

Degrees of freedom = 4

Two-tailed probability = .0149

r-squared = .807

You might make a statement in a report like this: A simple linear regression was performed on

six months of data to determine if there was a significant relationship between advertising

expenditures and sales volume. The t-statistic for the slope was significant at the .05 critical

alpha level, t(4)=3.96, p=.015. Thus, we reject the null hypothesis and conclude that there was a

positive significant relationship between advertising expenditures and sales volume.

Furthermore, 80.7% of the variability in sales volume could be explained by advertising

expenditures.

- Repeatability Standard DeviationUploaded byBalram Ji
- Correlation & RegressionUploaded bygooseberrie
- Matrix DiagramUploaded byKomal Shujaat
- Multiple Linear Regression in Data MiningUploaded byakirank1
- The Unscrambler MethodsUploaded byMostafa Afify
- Training Older EmployeesUploaded byAnonymous I3gfw0tuzk
- Impact of Office Design on Employees’ Productivity- A Case study of BankingUploaded byani ni mus
- Simple Linear RegressionUploaded byHicham Tou Nsi
- Lab 4 RegressionUploaded byYogi Tresno Patriatama
- Effects of International Institutional Factors on Earnings Qualit تابع التطبيق.pdfUploaded byMohamed Kamel Maaty
- Cph 14Uploaded bySulistyono
- 2. Bivariate Distributions and Estimating Relations (1)Uploaded byCharles Bryan M. Alicaya
- Linear Regression.docxUploaded bySantosh Kanasepatil
- ME12 Ch. 5 (Recovered)Uploaded byMarjuree Fuchsia Quibod
- SDA 3E Chapter 6 (1)Uploaded byxinearpinger
- Final ProjectUploaded bySamiullah Saqib
- reg correUploaded bytesfu
- Chapter 2 correlation and regression.docUploaded bySumesh Mirashi
- The Role of Employee Relations in TheUploaded byNoer Prast
- mediumUploaded byHaythem Mzoughi
- c1972eb702a3924c95bd7fe835c8b6f2_e96425aece0ffe07c042f47ee40874b5Uploaded bykarthu48
- Reaser9 7rezeanu p66-782Uploaded byca ta
- Seminario Ecología.pdfUploaded byttii0
- CCP403Uploaded byapi-3849444
- 0140-0039 pdfUploaded byDaniel Hunter
- RegannA14NOV05Refresh21OCT10Uploaded byFelipe Fernandez
- STATISTIK HAKIM.xlsxUploaded byHakim
- FB8916Ch1.pdfUploaded byadane
- MBA2038 1T10 Wk4 ForecastingUploaded byChristine Cueva
- Paper Dispenser in Every BuildingUploaded byAngelica

- discriminant analysisUploaded byMeenakshi Thapar
- design of experimentUploaded byYanjie Zhang
- AP Statistics Semester Exam ReviewUploaded bykiraayumi6789
- The Impact of the Mariel Boatlift on the Miami Labor Market_David CardUploaded byMecobio
- Master ThesisUploaded bykehedsiah
- Penalized RegressionUploaded byPino Bacada
- bmd110226-76-92.pdfUploaded byYoussef Aly
- The Role of Anchoring Bias in the Equity MarketUploaded byZahra Asad Hamdani
- Relationships Between Glucose Excursion and the ActivationUploaded bymannkindfan
- Parkinson’s diseaseUploaded bySamar Hamady
- Internasional Jurnal PoeUploaded byPurnama Putra
- Human Factors- The Journal of the Human Factors and Ergonomics Society-2003-Carnahan-408-23Uploaded byAJay
- 4. F.Y.B.sc. Com.sci. Statistics SyllabusUploaded byananddangade
- Print_EDM_CART_Intro_2013.pdfUploaded bybharadwajdes3
- 137518997 Air TransportationUploaded byNarendra Vaidya
- Jurnal Isol Ing.Uploaded byNurul Anisah
- Leadership Across Hierarchical LevelsUploaded bySwarada Chitale
- rss-grad-diploma-module4-solutions-specimen-b.pdfUploaded byA Pirzada
- Multiple Regression AnalysisUploaded bypolobook3782
- Application of Supervised Learning Methods to Better Predict Building Energy Performance 1 %285%29Uploaded byJosé Daniel Hernández Sánchez
- 11Uploaded byAnonymous 932wiZUNjD
- Multiple Imputation MethodUploaded bySrea Nov
- Blader AreaUploaded bySyed Irtaza
- ch06Uploaded byPratik Shah
- The Impact of Ownership Structure on the Performance ofPalestinian Listed CompaniesUploaded byInternational Journal of Business Marketing and Management
- Prediction and Application of Solar Radiation With Soft Computing Over Traditional and Conventional Approach – a Comprehensive ReviewUploaded byahonenkone
- Exame Final IIUploaded byMariana Gameiro
- IRJET-Rice and Jute yield forecast over Bihar regionUploaded byIRJET Journal
- Deaton Analysis of Household SurveyUploaded byFrancisco Carrillo
- 2006 MBL 3 Research Report WA Van NiekerkUploaded bySCRUPEUSS