Uploaded by Xiao Ho

Albright DADM 5e_PPT_Ch 10

2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or

duplicated, or posted to a publicly accessible website, in whole or in part.

BUSINESS ANALYTICS:

DATA ANALYSIS AND

DECISION MAKING

Relationships

Introduction

(slide 1 of 2)

Regression analysis is the study of relationships between variables.

There are two potential objectives of regression

analysis: to understand how the world operates and

to make predictions.

Two basic types of data are analyzed:

Cross-sectional data are usually gathered from approximately the same period of time from a population.

Time series data involve one or more variables that are

observed at several, usually equally spaced, points in

time.

Time series observations are often related to their own previous

values, a property called autocorrelation, which adds

complications to the analysis.

Introduction

(slide 2 of 2)

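In every regression study there is a single variable that

we are trying to explain or predict, called the dependent

variable.

To help explain or predict it, we use one or more explanatory variables.

If there is only one explanatory variable, the analysis is called simple regression.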

If there are several explanatory variables, it is called

multiple regression.

Regression can be linear (straight-line relationships) or

nonlinear (curved relationships).

Scatterplots:

Graphing Relationships

begin regression analysis.

A scatterplot is a graphical plot of two

variables, an X and a Y.

If there is any relationship between the

two variables, it is usually apparent from

the scatterplot.

Example 10.1:

Drugstore Sales.xlsx

(slide 1 of 2)

between promotional expenditures and sales at Pharmex.

Solution: Pharmex has collected data from 50 randomly

selected metropolitan regions.

There are two variables: Pharmex's promotional

expenditures as a percentage of those of the leading

competitor (Promote) and Pharmex's sales as a

percentage of those of the leading competitor (Sales).

A partial listing of the data is shown below.

Example 10.1:

Drugstore Sales.xlsx

(slide 2 of 2)

procedure to create a scatterplot.

axis because the store believes that large promotional

expenditures tend to cause larger values of sales.

Example 10.2:

Overhead Costs.xlsx

(slide 1 of 3)

among overhead, machine hours, and production runs at

Bendrix.

Solution: Data file contains observations of overhead costs,

machine hours, and number of production runs at Bendrix.

Each observation (row) corresponds to a single month.

Example 10.2:

Overhead Costs.xlsx

(slide 2 of 3)

explanatory variable (Machine Hours and

Production Runs) and the dependent

variable (Overhead).

Example 10.2:

Overhead Costs.xlsx

(slide 3 of 3)

creating a time series graph for any of the

variables.

Check for relationships among the multiple

explanatory variables (Machine Hours

versus Production Runs).

may not be obvious otherwise.

The typical relationship you hope to see is a straight-line,

or linear, relationship.

This doesn't mean that all points lie on a straight line, but that

the points tend to cluster around a straight line.

clearly nonlinear.

Outliers

(slide 1 of 2)

outliers, observations that fall outside of the

general pattern of the rest of the observations.

of interest, then it is probably best to delete it from

the analysis.

If it isn't clear whether outliers are members of the

relevant population, run the regression analysis with

them and again without them.

If the results are practically the same with and without them, it

is probably best to report the results with the outliers

included.

Otherwise, you can report both sets of results with a

verbal explanation of the outliers.

Outliers

(slide 2 of 2)

top right) is the company CEO, whose salary is

well above that of all of the other employees.

Unequal Variance

depends on the value of the explanatory variable.

The figure below illustrates an example of this.

amount spent increases as salary increases, which is evident

from the fan shape.

linear regression analysis, but there are ways to deal with it.

No Relationship

is no relationship between a pair of

variables.

scatterplot appears as a shapeless swarm

of points.

Correlations: Indicators of

Linear Relationships (slide 1 of 2)

indicate the strength of linear relationships between

pairs of variables.

that summarizes the information in a scatterplot.

It measures the strength of linear relationships only.

The usual notation for a correlation between variables X

and Y is r_XY.

The numerator of the equation is also a measure of

association between X and Y, called the covariance

between X and Y.

because it depends on the units of measurement.

Correlations: Indicators of

Linear Relationships (slide 2 of 2)

plus or minus, you can tell whether the two

variables are positively or negatively related.

Unlike covariances, correlations are completely

unaffected by the units of measurement.

linear relationship.

A correlation with magnitude close to 1 indicates a strong

linear relationship.

A correlation equal to -1 (negative correlation) or

+1 (positive correlation) occurs only when the linear

relationship between the two variables is perfect.

relevant descriptors only for linear relationships.
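
The contrast between covariance and correlation can be checked numerically. The sketch below is a minimal illustration on made-up numbers (not data from the chapter's files): it computes both measures, then re-expresses X in different units. The helper names `covariance` and `correlation` are hypothetical, and the sample (n - 1) denominator is an assumption.

```python
import math

def covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Correlation: covariance divided by the product of the standard deviations."""
    sx = math.sqrt(covariance(x, x))
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

# Hypothetical data, not taken from any of the chapter's data sets.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Re-express X in different units (for example, dollars converted to cents).
x_cents = [100 * v for v in x]

cov_original = covariance(x, y)
cov_rescaled = covariance(x_cents, y)
r_original = correlation(x, y)
r_rescaled = correlation(x_cents, y)

print(cov_original, cov_rescaled)  # the covariance is multiplied by 100
print(r_original, r_rescaled)      # the correlation does not change
```

The covariance changes with the units, which is why it is hard to interpret on its own, while the correlation stays the same.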

linear relationships and the strengths of

these relationships, but they do not

quantify them.

Simple linear regression quantifies the

relationship where there is a single

explanatory variable.

A straight line is fitted through the

scatterplot of the dependent variable Y

versus the explanatory variable X.

(slide 1 of 2)

the line that makes the vertical distance from the points

to the line as small as possible.

A fitted value is the predicted value of the dependent

variable.

explanatory value.

(slide 2 of 2)

Observed Value = Fitted Value + Residual

The best-fitting line through the points of a scatterplot is

the line with the smallest sum of squared residuals.

It is the line quoted in regression outputs.

and intercept.
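
As a rough sketch of the least squares idea, the code below fits a line to a small made-up data set using the standard closed-form slope and intercept, then checks that each observed value equals its fitted value plus its residual. The data and the function name `least_squares_line` are illustrative assumptions, not part of the chapter.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the line with the smallest sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

def sse_for(a_try, b_try, x, y):
    """Sum of squared residuals for an arbitrary candidate line."""
    return sum((yi - (a_try + b_try * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = least_squares_line(x, y)
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sse = sum(e ** 2 for e in residuals)

print(a, b)
print(sse, sse_for(a + 0.1, b, x, y))  # shifting the line increases the sum
```

Trying any other intercept or slope in `sse_for` gives a sum of squared residuals at least as large as the least squares value.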

Drugstore Sales.xlsx (slide 1 of 2)

least squares line for sales as a function of promotional expenses

at Pharmex.

Solution: Select Regression from the StatTools Regression and

Classification dropdown list.

Use Sales as the dependent variable and Promote as the

explanatory variable.

The regression output is shown below and on the next slide.

Drugstore Sales.xlsx (slide 2 of 2)

Predicted Sales = 25.1264 + 0.7623 × Promote
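
To make the slope's meaning concrete, the fitted equation can be wrapped in a small function. This is only a sketch: the function name and the Promote values 100 and 101 are illustrative choices, not values from the data set.

```python
def predicted_sales(promote):
    """Predicted Sales from the fitted line quoted above (both are percentages of the leader's)."""
    return 25.1264 + 0.7623 * promote

at_100 = predicted_sales(100)
at_101 = predicted_sales(101)

print(at_100)           # prediction when Promote is 100
print(at_101 - at_100)  # one extra unit of Promote adds the slope, about 0.7623
```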

Overhead Costs.xlsx (slide 1 of 2)

Objective: To use the StatTools Regression procedure to regress

overhead expenses at Bendrix against machine hours and then

against production runs.

Solution: The Bendrix manufacturing data set has two potential

explanatory variables, Machine Hours and Production Runs.

The regression output for Overhead with Machine Hours as the

single explanatory variable is shown below.

Overhead Costs.xlsx (slide 2 of 2)

explanatory variable is shown below.

Predicted Overhead = 48621 + 34.7 × Machine Hours

Predicted Overhead = 75606 + 655.1 × Production Runs

useful the regression line is for predicting Y values from X values.

Because there are numerous residuals, it is useful to summarize

them with a single numerical measure.

denoted s_e.

It is essentially the standard deviation of the residuals.

It is given by this equation:

to the standard error of estimate.

In general, the standard error of estimate indicates the level of

accuracy of predictions made from the regression equation.
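
For reference, the usual textbook definition of the standard error of estimate in simple regression is sketched below. The symbols are assumptions chosen to match the later multiple-regression slide: e_i is the i-th residual and n is the number of observations. The denominator is n - 2 because two parameters (intercept and slope) are estimated.

```latex
s_e = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 2}}
```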

of fit of the least squares line.

dependent variable explained by the regression.

It always ranges between 0 and 1.

The better the linear fit is, the closer R^2 is to 1.

Formula for R^2:

In simple linear regression, R^2 is the square of

the correlation between the dependent variable

and the explanatory variable.
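
The two summary measures can be computed directly from the residuals. The sketch below assumes the usual definitions, s_e = sqrt(SSE / (n - 2)) and R^2 = 1 - SSE / SST, where SSE is the sum of squared residuals and SST is the sum of squared deviations of Y from its mean. It uses made-up data and then checks that R^2 equals the squared correlation.

```python
import math

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)  # SST: total variation of Y

slope = sxy / sxx
intercept = my - slope * mx
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

s_e = math.sqrt(sse / (n - 2))   # standard error of estimate
r_squared = 1 - sse / syy        # fraction of variation explained
r = sxy / math.sqrt(sxx * syy)   # correlation between X and Y

print(s_e, r_squared, r ** 2)
```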

Multiple Regression

explanatory variables could be included in the regression

equation. This is the realm of multiple regression.

If there are two explanatory variables, you are fitting a plane

to the data in three-dimensional space.

The regression equation is still estimated by the least

squares method, but it is not practical to do this by hand.

There is a slope term for each explanatory variable in the

equation, but the interpretation of these terms is different.

The standard error of estimate and R^2 summary measures

are almost exactly as in simple regression.

Many types of explanatory variables can be included in the

regression equation.

the explanatory variables, then a typical multiple

regression equation has the form shown below, where

a is the Y-intercept, and b_1 through b_k are the slopes.

Predicted Y = a + b_1 X_1 + b_2 X_2 + ... + b_k X_k

regression coefficients.

Each slope coefficient is the expected change in Y

when this particular X increases by one unit and the

other Xs in the equation remain constant.

other Xs are included in the regression equation.
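
Because hand calculation is impractical here, a numerical routine is normally used. The following is a minimal sketch using NumPy's `numpy.linalg.lstsq` on made-up, noise-free data generated from a known equation, so the recovered coefficients can be compared with the true ones. The data and variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical, noise-free data generated from Y = 10 + 2*X1 + 5*X2.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
Y = 10 + 2 * X1 + 5 * X2

# Design matrix: a column of ones for the intercept a, then one column per X.
A = np.column_stack([np.ones_like(X1), X1, X2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a, b1, b2 = coef

def predict(x1, x2):
    """Predicted Y from the fitted equation."""
    return a + b1 * x1 + b2 * x2

# Raising X1 by one unit while holding X2 fixed changes the prediction by b1.
change = predict(4.0, 4.0) - predict(3.0, 4.0)
print(a, b1, b2, change)
```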

Overhead Costs.xlsx

Objective: To use StatTools's Regression procedure to estimate the

equation for overhead costs at Bendrix as a function of machine

hours and production runs.

Solution: Select Regression from the StatTools Regression and

Classification dropdown list. Then choose the Multiple option and

specify the single dependent (D) variable and the two independent (I) variables.

The coefficients in the output below indicate that the estimated

regression equation is: Predicted Overhead = 3997 + 43.54 × Machine

Hours + 883.62 × Production Runs.

Estimate and R-Square

The multiple regression output is very similar to simple

regression output.

The standard error of estimate is essentially the standard

deviation of residuals, but it is now given by the equation

below, where n is the number of observations and k is the

number of explanatory variables:

dependent variable explained by the combined set of

explanatory variables, but it has a serious drawback: it can only

increase (it never decreases) when extra explanatory variables are added to an

equation.

Adjusted R^2 is an alternative measure that adjusts R^2 for the

number of explanatory variables in the equation.

belong in the equation.
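
The behavior of these measures can be illustrated with simulated data. The sketch below assumes the usual definitions s_e = sqrt(SSE / (n - k - 1)) and adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1). It fits one equation with a useful explanatory variable and another that also includes a pure-noise variable. The data, the seed, and the helper `fit_summary` are hypothetical.

```python
import numpy as np

def fit_summary(X, y):
    """Least squares fit; returns (s_e, R^2, adjusted R^2) for k explanatory columns."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    sse = float(residuals @ residuals)
    sst = float(((y - y.mean()) ** 2).sum())
    s_e = float(np.sqrt(sse / (n - k - 1)))
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return s_e, r2, adj_r2

rng = np.random.default_rng(0)
n = 30
x1 = rng.uniform(0, 10, n)
y = 5 + 3 * x1 + rng.normal(0, 2, n)
noise = rng.normal(0, 1, n)  # unrelated to y by construction

small = fit_summary(x1.reshape(-1, 1), y)
big = fit_summary(np.column_stack([x1, noise]), y)

print("one variable:  s_e=%.3f  R2=%.4f  adjR2=%.4f" % small)
print("plus noise:    s_e=%.3f  R2=%.4f  adjR2=%.4f" % big)
```

R^2 for the larger equation is never below that of the smaller one, whereas adjusted R^2 is free to fall when the added variable contributes little.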

Modeling Possibilities

can be included in regression equations:

Dummy variables

Interaction variables

Nonlinear transformations

to modeling the relationship between a

dependent variable and potential

explanatory variables.

produce much better fits than you could

Dummy Variables

cannot be measured on a quantitative scale.

dependent variable, so they need to be included in the

regression equation.

A dummy variable is a variable with possible values of 0 and 1.

It is also called a 0-1 variable or an indicator variable.

It equals 1 if the observation is in a particular category, and 0

if it is not.

When there are more than two categories (example: quarters)

In

Example 10.3:

Bank Salaries.xlsx

(slide 1 of 3)

analyze whether the bank discriminates against females in

terms of salary.

Solution: Data set includes the following variables for each

of the 208 employees of the bank: Education (categorical),

Grade (categorical), Years1 (years with this bank), Years2

(years of previous work experience), Age, Gender

(categorical with two values), PCJob (categorical yes/no),

Salary.

Example 10.3:

Bank Salaries.xlsx

(slide 2 of 3)

categorical variables, using IF functions or

the StatTools Dummy procedure.

Then run a regression analysis with Salary as

the dependent variable, using any

combination of numerical and dummy

explanatory variables.

Education) that the dummies are based on.

Always use one fewer dummy than the number

of categories for any categorical variable.
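
As a sketch of the "one fewer dummy than categories" rule, the snippet below builds 0-1 columns for a categorical variable with four categories (quarters) and leaves out a chosen reference category. The function name `make_dummies`, the column names, and the sample values are illustrative assumptions. In the spreadsheet setting the same thing is done with IF functions or the StatTools Dummy procedure.

```python
def make_dummies(values, reference):
    """Create 0-1 columns for every observed category except the reference category."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference must be one of the observed categories")
    return {
        f"is_{c}": [1 if v == c else 0 for v in values]
        for c in categories
        if c != reference
    }

# Hypothetical observations of a four-category variable.
quarters = ["Q1", "Q2", "Q3", "Q4", "Q1", "Q3"]
dummies = make_dummies(quarters, reference="Q4")

for name, column in dummies.items():
    print(name, column)
```

An observation in the reference category has 0 in every dummy column, so each dummy coefficient is interpreted relative to that category.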

Example 10.3:

Bank Salaries.xlsx

(slide 3 of 3)

appears below.

Interaction Variables

equation, like the one above, you are allowing the

intercepts of the two lines to differ, but you are forcing

the lines to be parallel.

To be more realistic, you might want to allow them to

have different slopes.

You can do this by including an interaction variable.

variables.

Include an interaction variable in a regression equation if you

believe the effect of one explanatory variable on Y depends

on the value of another explanatory variable.
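
A minimal sketch of how an interaction variable lets the slopes differ: the made-up, noise-free data below follow one line for the group with the dummy equal to 0 and a different line for the group with the dummy equal to 1. The numbers are hypothetical and are not the bank's results.

```python
import numpy as np

# Hypothetical data: salary = 30 + 1.0*years when female = 0,
# and salary = 28 + 0.6*years when female = 1.
years = np.array([1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0])
female = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
salary = np.where(female == 1, 28 + 0.6 * years, 30 + 1.0 * years)

# The interaction variable is the product of the two explanatory variables.
interaction = years * female

A = np.column_stack([np.ones_like(years), years, female, interaction])
coef, *_ = np.linalg.lstsq(A, salary, rcond=None)
a, b_years, b_female, b_inter = coef

slope_male = b_years              # slope when female = 0
slope_female = b_years + b_inter  # slope when female = 1

print(coef)
print(slope_male, slope_female)
```

Without the interaction column, the fit would be forced to use a single common slope for both groups.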

Bank Salaries.xlsx (slide 1 of 2)

Objective:

see whether the effect of years of experience on salary is different

across the two genders.

Solution: First, form an interaction variable that is the product of

Years1 and Female, using an Excel formula or the Interaction option

from the StatTools Data Utilities dropdown menu.

Include the interaction variable in addition to the other variables in the

regression equation.

The multiple regression output appears below.

Bank Salaries.xlsx (slide 2 of 2)

shown graphically below.

Nonlinear Transformations

The general linear regression equation has the form:

Predicted Y = a + b_1 X_1 + b_2 X_2 + ... + b_k X_k

It is linear in the sense that the right side of the

equation is a constant plus a sum of products of

constants and variables.

The variables can be transformations of original

variables.

because of curvature detected in scatterplots.

You can transform the dependent variable Y or any of the

explanatory variables, the Xs. Or you can do both.

Typical nonlinear transformations include: the natural

logarithm, the square root, the reciprocal, and the square.

Example 10.4:

Cost of Power.xlsx

(slide 1 of 3)

nonlinear function of demand, and if it is, what form the

nonlinearity takes.

Solution: The data set lists the number of units of electricity

produced (Units) and the total cost of producing these (Cost) for

a 36-month period.

Start with a scatterplot of Cost versus Units.

Example 10.4:

Cost of Power.xlsx

(slide 2 of 3)

values.

The negative-positive-negative behavior of residuals suggests

a parabola, that is, a quadratic relationship with the square

of Units included in the equation.

Create a new variable (Units)^2 in the data set and then use

multiple regression to estimate the equation for Cost with

both Units and (Units)^2 included.
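
The quadratic step can be sketched as follows, using made-up cost figures generated from a known concave curve rather than the Cost of Power data. A straight-line fit leaves the negative-positive-negative residual pattern, and adding the squared variable as a second explanatory column removes it.

```python
import numpy as np

# Hypothetical data: cost follows a concave quadratic in units.
units = np.array([100.0, 200.0, 300.0, 400.0, 500.0, 600.0])
cost = 5000 + 40 * units - 0.03 * units ** 2

# Straight-line fit and its residuals.
A_lin = np.column_stack([np.ones_like(units), units])
coef_lin, *_ = np.linalg.lstsq(A_lin, cost, rcond=None)
resid_lin = cost - A_lin @ coef_lin

# New variable (Units)^2, then a multiple regression on Units and (Units)^2.
units_sq = units ** 2
A_quad = np.column_stack([np.ones_like(units), units, units_sq])
coef_quad, *_ = np.linalg.lstsq(A_quad, cost, rcond=None)
resid_quad = cost - A_quad @ coef_quad

print(np.sign(resid_lin))          # negative at the ends, positive in the middle
print(coef_quad)                   # close to 5000, 40, -0.03
print(np.abs(resid_quad).max())    # essentially zero for this noise-free example
```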

Example 10.4:

Cost of Power.xlsx

(slide 3 of 3)

on the scatterplot. This curve is shown below, on the left.

Finally, try a logarithmic fit by creating a new variable,

Log(Units), and then regressing Cost against this variable. This

curve is shown below, on the right.

widely in regression analysis is that they are fairly easy to interpret.

A logarithmic transformation of Y is often useful when the

distribution of Y values is skewed to the right.

Bank Salaries.xlsx (slide 1 of 2)

Objective:

logarithm of salary as the dependent variable.

Solution: The distribution of salaries of the 208 employees

shows some skewness to the right.

First, create the Log(Salary) variable.

Then run the regression, with Log(Salary) as the dependent

variable and Female and Years1 as the explanatory variables.

Bank Salaries.xlsx (slide 2 of 2)

The

general:

The R^2 values with Y and Log(Y) as dependent variables

are not directly comparable. They are percentages of variation

explained for different variables.

The s_e values with Y and Log(Y) as dependent variables

are usually of totally different magnitudes. To make the

s_e from the log equation comparable, you need to go

through the procedure described in the example so that

the residuals are in original units.

To interpret any term of the form bX in the log equation,

you should first express b as a percentage. Then when X

increases by one unit, the expected percentage change

in Y is approximately this percentage b.
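
The "approximately" in this rule can be made concrete. The sketch below assumes the logarithm is the natural logarithm and uses a hypothetical coefficient b = 0.05. It compares the rule-of-thumb percentage (100b) with the exact percentage change implied by the equation, which is 100(e^b - 1).

```python
import math

def pct_change_in_y(b, delta_x=1.0):
    """Exact percentage change in predicted Y when X rises by delta_x in a Log(Y) equation."""
    return (math.exp(b * delta_x) - 1) * 100

b = 0.05  # hypothetical coefficient of X in the Log(Y) equation

approx = b * 100            # rule of thumb: about 5%
exact = pct_change_in_y(b)  # exact value implied by the natural-log equation

print(approx, exact)
```

The approximation is close for small coefficients and drifts as b grows: for b = 0.5 the rule of thumb says 50%, while the exact change is about 65%.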

Constant Elasticity

Relationships

firm grounding in economic theory is the constant

elasticity relationship.

It has the form shown in the equation below:

The effect of a one-unit change in any X on Y depends on

the levels of the other Xs in the equation.

The dependent variable is expressed as a product of

explanatory variables raised to powers.

When any explanatory variable X changes by 1%, the

predicted value of the dependent variable changes by a

constant percentage, regardless of the value of

this X or the values of the other Xs.
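
A small numerical check of the constant elasticity property, assuming a multiplicative form Predicted Y = a × X_1^b_1 × X_2^b_2 with hypothetical values for a, b_1, and b_2. A 1% increase in X_1 produces the same percentage change in predicted Y at very different levels of both variables.

```python
def predicted_y(x1, x2, a=2.0, b1=-1.5, b2=0.8):
    """Multiplicative (constant elasticity) equation with hypothetical parameters."""
    return a * x1 ** b1 * x2 ** b2

def pct_change_when_x1_rises_1pct(x1, x2):
    """Percentage change in predicted Y when x1 increases by 1% and x2 stays fixed."""
    before = predicted_y(x1, x2)
    after = predicted_y(1.01 * x1, x2)
    return (after / before - 1) * 100

low = pct_change_when_x1_rises_1pct(10.0, 5.0)
high = pct_change_when_x1_rises_1pct(500.0, 80.0)

print(low, high)  # the same percentage at both starting points, close to b1 = -1.5
```

Taking logarithms of both sides turns this form into a linear equation in Log(X_1) and Log(X_2), which is why it can be estimated with ordinary multiple regression on the logged variables.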

Example 10.5:

Car Sales.xlsx

(slide 1 of 2)

Objective:

estimate a multiplicative relationship for automobile sales as a function

of price, income, and interest rate.

Solution: The data set contains annual data on domestic auto sales in

the United States.

Variables include: Sales (in units), Price Index (consumer price index of

transportation), Income (real disposable income), and Interest (prime rate).

First,

Then run a multiple regression with Log(Sales) as the dependent variable

and Log(Price Index), Log(Income), and Log(Interest) as the explanatory

variables.

Example 10.5:

Car Sales.xlsx

(slide 2 of 2)

production time (or cost) to the

cumulative volume of output since the

production process first began.

times tend to decrease by a relatively

constant percentage every time cumulative

output doubles.

This constant is often called the learning rate.

Equation for Learning Rate (where LN refers

to the natural logarithm):
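
A form consistent with the solution steps in Example 10.6 is sketched below. It assumes the multiplicative model Time = a × Batch^b, where the symbol b for the exponent is my notation. Each doubling of cumulative output then multiplies production time by 2^b, which is the learning rate.

```latex
b = \frac{\ln(\text{learning rate})}{\ln(2)}
\qquad\Longleftrightarrow\qquad
\text{learning rate} = e^{\,b \ln(2)} = 2^{b}
```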

Example 10.6:

Learning Curve.xlsx

(slide 1 of 2)

equation to estimate the learning rate for

production time.

Solution: Data set contains the times (in hours)

to produce each batch of a new product at

Presario Company.

Example 10.6:

Learning Curve.xlsx

(slide 2 of 2)

creating a scatterplot of Log(Time) versus Log(Batch). The multiplicative

model implies that it should be approximately linear.

Log(Batch). The resulting equation is:

Now solve for the learning rate (multiply through by LN(2) and then take

antilogs).
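
The "multiply through by LN(2) and then take antilogs" step can be sketched in a few lines. The slope value used here is hypothetical (it is not the estimate from the Learning Curve data), and the function name is an illustrative choice.

```python
import math

def learning_rate(b):
    """Learning rate implied by the slope b of Log(Time) on Log(Batch)."""
    # Multiply through by LN(2), then take antilogs.
    return math.exp(b * math.log(2))

b = -0.152  # hypothetical slope, for illustration only
rate = learning_rate(b)

print(rate)  # about 0.90: each doubling of output cuts time to roughly 90% of its previous level
```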

To see if the regression equation will be successful in

predicting new values of the dependent variable, split the

original data into two subsets: one for estimation and one

for validation.

Then the values of the explanatory variables from the second

subset are substituted into the equation to obtain predicted

values for the dependent variable.

Finally, these predicted values are compared to the known

values of the dependent variable in the second subset.

If the agreement is good, there is reason to believe that the

regression equation will predict well for the new data.
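
The validation steps above can be sketched as follows on simulated data. The split sizes, the seed, and the two agreement measures (root mean squared prediction error and the correlation between actual and predicted values) are illustrative choices, not the chapter's prescribed procedure.

```python
import numpy as np

# Hypothetical data: Y depends linearly on X plus random noise.
rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0, 100, n)
y = 50 + 3 * x + rng.normal(0, 10, n)

# Split into an estimation subset (first 30) and a validation subset (last 10).
est, val = slice(0, 30), slice(30, n)

# Estimate the equation on the first subset only.
A_est = np.column_stack([np.ones(30), x[est]])
coef, *_ = np.linalg.lstsq(A_est, y[est], rcond=None)

# Substitute the validation X values into the estimated equation.
pred_val = coef[0] + coef[1] * x[val]

# Compare the predictions with the known Y values in the validation subset.
errors = y[val] - pred_val
rmse_val = float(np.sqrt(np.mean(errors ** 2)))
r_val = float(np.corrcoef(y[val], pred_val)[0, 1])

print(coef)
print(rmse_val, r_val)
```

If the validation error is of similar size to the standard error of estimate from the estimation subset, that is some evidence the equation will predict well on new data.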

Overhead Costs Validation.xlsx

Objective: To validate the original Bendrix regression for

making predictions at another plant.

Solution: Bendrix would like to predict overhead costs for

another plant by using data on machine hours and production

runs at this second plant.

The first step is to see how well the regression from the first

plant fits data from the other plant.
