AN EXPLORATORY STUDY ON GOAL PROGRAMMING AS AN ALTERNATIVE METHOD TO DEVELOP PREDICTION EQUATIONS

LAU CHIK KONG

A dissertation submitted in partial fulfillment of the requirements for the award of degree of Master of Science (Mathematics)

Faculty of Science Universiti Teknologi Malaysia

OCTOBER, 2004

PSZ 19:16 (Pind. 1/97)

Universiti Teknologi Malaysia

THESIS STATUS DECLARATION FORM

TITLE: AN EXPLORATORY STUDY ON GOAL PROGRAMMING AS AN ALTERNATIVE METHOD TO DEVELOP PREDICTION EQUATIONS

ACADEMIC SESSION: 2004/2005

I, LAU CHIK KONG (CAPITAL LETTERS), declare that this thesis (Undergraduate Project Report / Master's / Doctor of Philosophy)* is to be kept at the Library of Universiti Teknologi Malaysia subject to the following conditions of use:

1. The thesis is the property of Universiti Teknologi Malaysia.

2. The Library of Universiti Teknologi Malaysia is permitted to make copies for study purposes only.

3. The Library is permitted to make copies of this thesis as exchange material between institutions of higher learning.

4. **Please tick (/):

[ ] CONFIDENTIAL (Contains information of a confidential nature or of importance to the security of Malaysia as set out in the OFFICIAL SECRETS ACT 1972)

[ ] RESTRICTED (Contains RESTRICTED information as determined by the organization/body where the research was carried out)

[ ] UNRESTRICTED

Certified by

(SIGNATURE OF AUTHOR)

(SIGNATURE OF SUPERVISOR)

Permanent address:

NCS 50171, 96009

SIBU, SARAWAK.

Dr. Maizah Hura Ahmad

Name of Supervisor

Date:

Date:

NOTES: * Delete whichever is not applicable.

** If the thesis is CONFIDENTIAL or RESTRICTED, please attach a letter from the relevant authority/organization stating the reasons and the period for which the thesis needs to be classified as CONFIDENTIAL or RESTRICTED.

A thesis here means a thesis for the degree of Doctor of Philosophy or a Master's degree by research, or a dissertation for study by coursework and research,

or an Undergraduate Project Report (PSM).

"I hereby declare that I have read this dissertation and in my opinion this dissertation is sufficient in terms of scope and quality for the award of the degree of Master of Science (Mathematics)."

Signature :

Supervisor : Dr. Maizah Hura Ahmad

Date :


"I declare that this dissertation entitled "An Exploratory Study On Goal Programming As An Alternative Method To Develop Prediction Equations" is the result of my own research except as cited in the references. The dissertation has not been accepted for any degree and is not concurrently submitted in candidature of any other degree."

Signature

Name

LAU CHIK KONG

Date


Especially for my loving parents, dad Lau Beng Tiong and mum Loi Kiik Bee, and my younger brother, Lau Chik Muan.


ACKNOWLEDGEMENTS

Great gratitude and appreciation are expressed to all those who played a part in the successful completion of this dissertation, entitled 'An Exploratory Study On Goal Programming As An Alternative Method To Develop Prediction Equations', whether directly or indirectly.

In particular, I would like to thank my supervisor, Dr. Maizah Hura Ahmad, for her dedication, support, useful guidance and advice. Her patience and help really encouraged me to finish this dissertation right on time. I also need to thank her for having the trust and confidence in me to handle this dissertation on my own.

Lots of gratitude and special thanks also to my family for their support, advice and encouragement throughout my study in Universiti Teknologi Malaysia (UTM). Last but not least, lots of thanks to my friends for their cooperation and help.


ABSTRACT

One of the most promising techniques for multiple objective decision analysis is goal programming. Goal programming is a powerful tool which draws upon the highly developed and tested technique of linear programming, but provides a simultaneous solution to a complex system of competing objectives. The least squares method in regression analysis is also a popular technique used in decision making. It is an approach used in the study of relations between variables, particularly for the purpose of understanding how one variable depends on one or more other variables. However, one of its main problems is that the method of least squares is biased by extreme cases. This study proposes goal programming as an alternative to analyze such problems. The analyses were done using the QM for Windows and MINITAB software packages.


ABSTRAK

Goal programming is one of the most effective methods for multiple-objective decision analysis. It is also a technique that improves on linear programming in providing a simultaneous solution for a complex system. The least squares method in regression analysis is likewise a well-known technique in decision making. This method studies the relationships between variables, particularly in understanding how one variable depends on one or more other variables. However, the main problem with the least squares method is the influence of extreme cases. This study proposes goal programming as an alternative method to overcome this problem. The analyses in this study use the QM for Windows and MINITAB programs.

TABLE OF CONTENTS

CHAPTER    SUBJECT

           COVER
           DECLARATION
           DEDICATION
           ACKNOWLEDGEMENTS
           ABSTRACT
           ABSTRAK
           TABLE OF CONTENTS
           LIST OF TABLES
           LIST OF FIGURES
           LIST OF SYMBOLS
           LIST OF APPENDICES

CHAPTER 1  RESEARCH FRAMEWORK
           1.0 Introduction
           1.1 Research Background
           1.2 Objectives of the Study
           1.3 Importance of the Study
           1.4 Scopes of the Study
           1.5 Thesis Organization

CHAPTER 2  RESEARCH METHODOLOGY
           2.0 Introduction
           2.1 Regression
               2.1.1 Definition of Regression
               2.1.2 The Purposes and Benefits of Regression Analysis
               2.1.3 Simple Regression
               2.1.4 Possible Criteria for Fitting a Line
               2.1.5 Using Residuals to Test the Assumptions of the Regression Model
           2.2 Multiple Regression
               2.2.1 The Mathematical Model
           2.3 The Method of Least Squares
               2.3.1 Polynomial Least Squares Fitting
           2.4 The Least Squares Line
               2.4.1 Residuals of the Least Squares Line
           2.5 Linear Multiple Regression Least Squares
               2.5.1 Residuals of Multiple Regression Least Squares
           2.6 Linear Goal Programming
               2.6.1 History of Goal Programming
               2.6.2 Advantages and Disadvantages of Goal Programming
           2.7 Goal Programming Model Formulation
               2.7.1 Preemptive Goal Programming
               2.7.2 Weighted Goal Programming
           2.8 Solution Methods of Goal Programming
               2.8.1 The Graphical Method
               2.8.2 The Modified Simplex (Multiphase) Method
           2.9 Goal Programming Complications
               2.9.1 Negative Right-hand Side Value
               2.9.2 A Tie in Selecting the Incoming Variable (Pivot Column)
               2.9.3 A Tie in Selecting the Outgoing Variable (Pivot Row)
               2.9.4 Alternative Optimal Solutions
               2.9.5 An Infeasible Problem
               2.9.6 An Unbounded Solution
           2.10 Sensitivity/Post-Optimality Analysis
               2.10.1 A Change in v_s
               2.10.2 A Change in u_i
               2.10.3 A Change in b_i
               2.10.4 A Change in y_i,s
               2.10.5 Addition of a New Goal
               2.10.6 Addition of a New Decision Variable
           2.11 Regression Analysis for Determining Relative Weighting or Goal Constraint Parameter Estimation
           2.12 Summary

CHAPTER 3  CASE STUDY ON USING LEAST SQUARES METHOD
           3.0 Introduction
           3.1 Background of Data
           3.2 Outliers
               3.2.1 Box Plot (Box and Whisker Plots)
               3.2.2 Existence of Outliers
           3.3 Analysis Using Least Squares Method
               3.3.1 Analysis on Data Set 1
               3.3.2 Analysis on Data Set 2
               3.3.3 Analysis on Data Set 3
           3.4 Concluding Remarks

CHAPTER 4  CASE STUDY ON USING GOAL PROGRAMMING
           4.0 Introduction
           4.1 Converting a Least Squares Problem into a Goal Programming
           4.2 Analysis Using Goal Programming
               4.2.1 Analysis on Data Set 1
               4.2.2 Analysis on Data Set 2
               4.2.3 Analysis on Data Set 3
           4.3 Concluding Remarks

CHAPTER 5  COMPARISON BETWEEN LEAST SQUARES METHOD AND GOAL PROGRAMMING
           5.0 Introduction
           5.1 Mean Absolute Percentage Error (MAPE)
           5.2 MAPE for Data Set 1 Analyzed Using Least Squares Method and Goal Programming
               5.2.1 Discussion of the Results: Data Set 1
           5.3 MAPE for Data Set 2 Analyzed Using Least Squares Method and Goal Programming
               5.3.1 Discussion of the Results: Data Set 2
           5.4 MAPE for Data Set 3 Analyzed Using Least Squares Method and Goal Programming
               5.4.1 Discussion of the Results: Data Set 3
           5.5 Discussion of the Results
           5.6 Concluding Remarks

CHAPTER 6  CONCLUSIONS AND SUGGESTIONS FOR FUTURE INVESTIGATION
           6.0 Introduction
           6.1 Conclusions
           6.2 Suggestions for Future Investigation

           REFERENCES
           APPENDICES
           Appendix A - F

LIST OF TABLES

TABLE NO.  TITLE

2.1   Seven observations of annual starting salary and grade-point average
2.2   Procedure for achieving a goal
2.3   Data for single-goal model
2.4   The general initial modified simplex tableau
2.5   The general initial simplex tableau when the system constraints exist
2.6   The initial simplex tableau
2.7   The second tableau
2.8   The third tableau
2.9   The fourth tableau
2.10  The final tableau
3.1   The data of mass and elemental carbon
3.2   The data of enrollment, number of mailings and lead time
3.3   The data of gold, copper, silver and aluminium
5.1   The predicted values and errors for set 1
5.2   The predicted values and errors for set 1 with mild outlier removed
5.3   The predicted values and errors for set 1 with extreme outlier removed
5.4   The predicted values and errors for set 1 with both mild and extreme outliers removed
5.5   MAPE for data set 1
5.6   The predicted values and errors for set 2
5.7   The predicted values and errors for set 2 with the first mild outlier removed
5.8   The predicted values and errors for set 2 with the second mild outlier removed
5.9   The predicted values and errors for set 2 with the third mild outlier removed
5.10  The predicted values and errors for set 2 with all the mild outliers removed
5.11  MAPE for data set 2
5.12  The predicted values and errors for set 3
5.13  The predicted values and errors for set 3 with the first extreme outlier removed
5.14  The predicted values and errors for set 3 with the second extreme outlier removed
5.15  The predicted values and errors for set 3 with both extreme outliers removed
5.16  MAPE for data set 3
5.17  MAPE for data set containing outliers

LIST OF FIGURES

FIGURE NO.  TITLE

2.1   Scatter of the data on annual starting salary and grade-point average
2.2   Straight line fit based on inspection of the data
2.3   Typical error in fitting points with a line
2.4   The weakness of using Σ(Yi - Ŷi) to fit a line
2.5   Scatter of demand and price
2.6   Effect of price on demand, holding advertising constant
2.7   The multiple regression plane
2.8   Achievement of the system constraints
2.9   Achievement of the profit goal
2.10  Achievement of the profit and workforce goals
3.1   Box plots
3.2   Comparative box plot for mass (µg/cm²)
3.3   Comparative box plot for number of mailings (x 1000)
3.4   Comparative box plot for silver ($ per oz)
3.5   Comparative box plot for aluminium (cents per lb)
3.6   The scatter plot for mass (µg/cm²) depending on elemental carbon (µg/cm²)

LIST OF SYMBOLS

a - Intercept of response variable, y
b - Estimated regression coefficient
b - Vector of b
e - Error / residual term
GP - Goal programming
i - 1, 2, ..., m
iqr - Interquartile range
j - 1, 2, ..., n
k - 1, 2, ..., K
K - The total number of preemptive priority factors
L - Minimize function
LS - Least squares
MAD - Mean absolute deviation
MAPE - Mean absolute percentage error
MPE - Mean percentage error
MSE - Mean square error
n - Number of predictions / forecasts
p - The total number of system constraints
p(x) - Polynomial function / fitting curve
QM - Quantitative method
RMSE - Root mean square error
s - 1, 2, ..., S
S - The total number of the decision and deviational variables, where S = n + 2m
x - Independent (predictor) variable
y - Dependent (response) variable

Σ(Yi - Ŷi) - Sum of errors
Σ(Yi - Ŷi)² - Sum of the squares of the errors / least squares / ordinary least squares
Ŷ - Predicted / estimated value
Z - Objective function
β - Parameter of regression equation
x_k - Independent variable k
b_i - The associated right-hand side value
g_k - Level of achievement of the goal in priority k, where g = (g1, g2, ..., gK)
P_k - The priority factor of the kth goal
r_k,s - The index number for priority k under the sth basic or nonbasic variable
v_s - The function of preemptive priority factors and weights associated with the sth basic or nonbasic variable
x_j - The jth decision variable
y_i,s - Element in the ith row under the sth basic or nonbasic variable; that is, the coefficient of the sth basic or nonbasic variable in goal i
d_i^- - Negative deviational variable from the ith goal (underachievement)
d_i^+ - Positive deviational variable from the ith goal (overachievement)
u_i - Positive numerical weight assigned to the negative deviational variable, d_i^-, of the ith constraint
w_i - Positive numerical weight assigned to the positive deviational variable, d_i^+, of the ith constraint
x_j* - The optimal value of the decision variables
b_i' - New value of b_i
g_k' - New value of g_k
r_k,s' - New value of r_k,s
u_i' - New value of u_i
v_s' - New value of v_s
y_i,s' - New value of y_i,s
- The pivot row
- The pivot column
- The coefficient associated with variable j in the ith goal
- Artificial variable
- The column for assigning the preemptive priority factors and weights to the basic variables
- The row for assigning the preemptive priority factors and weights to the basic and nonbasic variables
- Excess or surplus variable
- Super priority factor / artificial objective function
- Slack variable
- The function of preemptive factors and weights associated with the ith basic variable
- The index row
- Basic variable

LIST OF APPENDICES

APPENDIX  TITLE

A1  The Output of MINITAB for Data Set 2 (remove first mild outlier)
A2  The Output of MINITAB for Data Set 2 (remove second mild outlier)
A3  The Output of MINITAB for Data Set 2 (remove third mild outlier)
A4  The Output of MINITAB for Data Set 2 (remove all mild outliers)
B1  The Output of MINITAB for Data Set 3 (contain outliers)
B2  The Output of MINITAB for Data Set 3 (remove first extreme outlier)
B3  The Output of MINITAB for Data Set 3 (remove second extreme outlier)
B4  The Output of MINITAB for Data Set 3 (remove both extreme outliers)
C1  Solution from QM for Windows for Data Set 1 (contain outliers)
C2  Solution from QM for Windows for Data Set 1 (remove mild outlier)
C3  Solution from QM for Windows for Data Set 1 (remove extreme outlier)
C4  Solution from QM for Windows for Data Set 1 (remove both mild and extreme outliers)
D1  Solution from QM for Windows for Data Set 2 (contain outliers)
D2  Solution from QM for Windows for Data Set 2 (remove first mild outlier)
D3  Solution from QM for Windows for Data Set 2 (remove second mild outlier)
D4  Solution from QM for Windows for Data Set 2 (remove third mild outlier)
D5  Solution from QM for Windows for Data Set 2 (remove all mild outliers)
E1  Solution from QM for Windows for Data Set 3 (contain outliers)
E2  Solution from QM for Windows for Data Set 3 (remove first extreme outlier)
E3  Solution from QM for Windows for Data Set 3 (remove second extreme outlier)
E4  Solution from QM for Windows for Data Set 3 (remove both extreme outliers)
F   Terminology

CHAPTER 1

RESEARCH FRAMEWORK

1.0 Introduction

Many decision problems involve multiple objectives. Often, these objectives conflict with one another. A number of techniques have been proposed for multiple-objective decision making. One of the most promising techniques for multiple objective decision analysis is goal programming. Goal programming is a powerful tool which draws upon the highly developed and tested technique of linear programming, but provides a simultaneous solution to a complex system of competing objectives (Lee, 1981). Goal programming can handle decision problems having a single goal with multiple subgoals.

Generally, many decision problems in organizations which involve multiple objectives are not easy to analyze by optimization techniques such as linear programming. Multiple-criteria decision making (MCDM) or multiple-objective decision making (MODM) has been a popular topic of management science during the past decade. A number of different approaches to MCDM or MODM have been proposed, such as multiattribute utility theory, multiple-objective linear programming, goal programming, compromise programming and various heuristics, which are methods based on rules developed through experience. Goal programming is among the most widely accepted and applied techniques. The primary reason for the wide popularity of goal programming appears to be its underlying philosophy of "satisficing" (Lee and Shim, 1986).


Nobel laureate Herbert A. Simon (1981) suggested that the satisficing approach, rather than the optimizing one, is based on the concept of bounded rationality. This approach has emerged as a pragmatic methodology of decision making.

1.1 Research Background

A regression model is a mathematical equation that describes the relationship between two or more variables. The dependent variable is the one being explained, and the independent variables are the ones used to explain the variation in the dependent variable.

Regression techniques are associated with the fitting of straight lines, curves, or surfaces to a set of observations, where the fit is for one reason or another imperfect. The straight line is the simplest curve that can be fitted to a set of n paired observations (x1, y1), (x2, y2), ..., (xn, yn). The least squares method is the most frequently used procedure for obtaining a linear function. A problem of fitting occurs only if the fit is for some reason imperfect. For it to be a statistical problem, there must be some random element present in the data which leads to this inexactitude of fit. It is the nature of this random element that determines the appropriate method of fitting, that is, of estimating the constants or parameters in the equation.

In simple linear regression analysis, the estimated regression model is Y = a + bX, where Y denotes the predicted dependent variable and X denotes the independent variable. In multiple regression, the estimated regression model is

Y = a + Σ_{j=1}^{n} b_j X_j

where Y denotes the predicted dependent variable and the X_j denote the independent variables. Although the method of least squares is one of the best known and most widely utilized methods employed in the analyses of making predictions or forecasts of the future, most previous efforts in this area suffer from several disadvantages. According to Campbell (1972), one of the main problems is that the method of least squares is biased by extreme cases. The current study proposes goal programming as an alternative to analyze such problems.
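The contrast the study investigates can be sketched numerically. A goal-programming formulation of a fitting problem introduces negative and positive deviational variables for each observation and minimizes their sum, which for a simple line is equivalent to a least absolute deviations (LAD) fit. The data below, and the brute-force search (valid for simple regression, where an optimal LAD line passes through two of the data points), are illustrative assumptions, not the dissertation's data sets:

```python
from itertools import combinations

# Hypothetical data: y grows roughly as 2x + 1, plus one extreme case.
data = [(1, 3.1), (2, 5.0), (3, 6.9), (4, 9.1), (5, 30.0)]  # (5, 30.0) is the outlier

def ols(points):
    """Ordinary least squares slope/intercept via the normal equations."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def lad(points):
    """Goal-programming-style fit: minimize the sum of absolute deviations
    (the sum of the negative and positive deviational variables d- + d+).
    For simple regression an optimal LAD line passes through two of the
    data points, so a brute-force search over point pairs suffices."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        cost = sum(abs(y - (a + b * x)) for x, y in points)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]

a_ls, b_ls = ols(data)
a_gp, b_gp = lad(data)
print(f"least squares : y = {a_ls:.2f} + {b_ls:.2f}x")
print(f"LAD (GP-style): y = {a_gp:.2f} + {b_gp:.2f}x")
```

On this data the outlier drags the least squares slope far above 2, while the deviational-variable fit stays close to the trend of the other four points.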

1.2 Objectives of the Study

The following are objectives of this study:

i. To develop prediction equations using the least squares method.

ii. To develop prediction equations using goal programming.

iii. To compare the accuracy of goal programming and the least squares method.

1.3 Importance of the Study

Many prediction equations have been obtained using the least squares method. These equations have been used in various areas such as educational system planning, financial planning and economic policy analysis. This study explores goal programming as an alternative method to produce prediction equations. This is because goal programming is a widely accepted and applied technique in multiple objective decision analysis.

1.4 Scopes of the Study

This study focuses on the use of the linear goal programming method to produce prediction equations in regression analysis problems. Only three data sets are considered. The first set consists of only one independent variable, the second set has two independent variables, while the third set has three independent variables.

1.5 Thesis Organization

This dissertation is organized into six chapters. Chapter 1 discusses the research framework. It begins with the introduction to goal programming and the least squares method. The objectives, importance and scope of this study are also presented.

Chapter 2 reviews the least squares method and goal programming. First, the least squares line and multiple regression least squares will be reviewed. Then, the modeling of goal programming will be discussed. The discussion starts with the background of goal programming. Formulation and methodology of the goal programming model are also presented. Finally, some complications and post-optimality analysis in goal programming are explained.

Chapter 3 begins with a discussion on outliers in a data set. In this chapter, analyses of data sets using the least squares method are carried out. Chapter 4 presents the analysis of the same data sets using the goal programming model.

In Chapter 5, comparisons between the least squares method and goal programming are made.

Chapter 6 summarizes and concludes the whole study and makes some suggestions for future investigation.

CHAPTER 2

RESEARCH METHODOLOGY

2.0 Introduction

This chapter will first discuss the method of least squares. Then, the linear goal programming will be reviewed.

2.1 Regression

Regression analysis is an approach or a research tool in statistics that is used to study the relationships between variables, especially for the purpose of understanding how one variable relates or depends on one or more other variables (Wittink, 1988). For example, how is crime related to residents' age, sex, employment, and income? How do students' performances relate to their family income and the food they eat?

2.1.1 Definition of Regression

'Regression' is often used to indicate "the return to a mean or average value" (Wittink, 1988). More than one hundred years ago, the term regression was introduced to statistics by Francis Galton in a series of papers, the most famous being Galton (1886), to describe a hereditary phenomenon. In these papers, he reported that the average height of sons with tall fathers is less than the fathers' height (both measured at adult ages). Similarly, the average height of sons with short fathers was reported to be greater than their fathers' height. In his data, Galton emphasized the "regression toward the mean" phenomenon. He also found a positive relationship between the height of fathers and the height of their sons, which lay approximately on a straight line.

Galton's paper is well worth reading as an example of the many practical considerations that have to be kept in mind in collecting and interpreting data. For Galton, the important point that justified his calling this a regression line was that the slope was less than unity (implying the regression, or movement towards the population mean).

Today, any study of relations between variables is often accomplished through, and referred to as, regression analysis (Wittink, 1988). The technique is used heavily in business and government activities, and in the social sciences, especially in economics and related disciplines. It is also a technique for quantifying the relationship between a criterion variable (dependent variable) and one or more predictor variables (independent variables). In particular, the quality of decisions often depends on the quantification of relationships between variables.

2.1.2 The Purposes and Benefits of Regression Analysis

Regression may be used for two main purposes. They are

(i) to predict the criterion variable based on specified values for the predictor variable(s), and

(ii) to understand how the predictor variable(s) influence or relate to the criterion variable.


The following are the benefits of using regression analysis. This tool

(i) suggests and quantifies the nature of relations between variables,

(ii) provides consistent predictions,

(iii) may provide superior predictions, and

(iv) may save time or allow a decision maker to focus on nonquantifiable aspects.

2.1.3 Simple Regression

Definition 2.1

The simple linear regression model assumes that there is a line with vertical or y intercept a and slope b, called the true or population regression line. When a value of the independent variable x is fixed and an observation on the dependent variable y is made,

y = a + bx + e

Without the random deviation e, all observed (x, y) points would fall exactly on the population regression line. The inclusion of e in the model equation allows points to deviate from the line by random amounts.
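The role of the random deviation e can be illustrated with a small simulation; the population values a = 2, b = 3 and the error standard deviation below are assumed purely for illustration:

```python
import random

# Illustrative (assumed) population line: y = a + b*x + e
a, b, sigma = 2.0, 3.0, 0.5
random.seed(42)  # reproducible draws

xs = [0.5 * i for i in range(1, 11)]
# Each observation deviates from the population line by a random amount e
ys = [a + b * x + random.gauss(0.0, sigma) for x in xs]

for x, y in zip(xs, ys):
    deviation = y - (a + b * x)  # the random deviation e for this point
    print(f"x = {x:3.1f}  y = {y:6.2f}  e = {deviation:+.2f}")
```

Without the random term the printed deviations would all be zero; with it, the observed points scatter around the population regression line.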

Simple regression has only two variables, that is, a criterion variable and one predictor variable. Let us illustrate simple regression with an annual starting salary problem. Suppose we believe that annual starting salary relates to academic performance. Consider the following data, which were obtained from seven fellow students.


Table 2.1 : Seven Observations of Annual Starting Salary and Grade-point Average

Annual starting salary (dollars) Grade-point average

20,000 2.8

24,500 3.4

23,000 3.2

25,000 3.8

20,000 3.2

22,500 3.4

24,000 3.4

To examine whether a relationship exists, and what the nature of the relationship may be, it is useful to plot a scatter of the data, as shown in Figure 2.1. Here, annual starting salary is called the dependent variable or response Y, and grade-point average is called the independent variable, factor or regressor X.


Figure 2.1 : Scatter of data on annual starting salary and grade-point average

From this scatter it seems clear that grade-point average does affect annual starting salary. There is a tendency for a positive relation to exist between the two variables: as the grade-point average increases, the annual starting salary also tends to increase. Here, we can say that students with higher grade-point averages tend to have higher starting salaries.

To quantify the relationship, we need to specify its functional form. The most common and convenient functional form is a linear or straight line, relating Y to X. This is called the regression of Y on X. But we must remember that the number of data points is very small, so we cannot make this statement with a great deal of confidence. And if there are other relevant predictor variables to be considered, this scatter may provide invalid information. However, based on the scatter in Figure 2.1, we can conclude that the relationship between the variables is approximately a linear function.

From Figure 2.1, it appears that a linear function is a reasonable approximation of the relationship between these variables. Let us use Y to denote an individual's starting salary, and X to indicate the individual's corresponding grade-point average. Now, we can specify that

Y=a+bX

(2.1)

where a is the intercept (that is, the estimated value for Y when X = 0) and b is the slope (that is, the estimated change in Y for a one-unit increase in X).

Now the question is how do we obtain the values of a and b based on the seven data points. One procedure is to draw the most reasonable or best straight line we can through the data in the scatter in Figure 2.1. An example of such a drawing is shown in Figure 2.2.


~ 25000
ca •
-
DJ 24000 •
tnfi)' 23000 •
c .. •
.- "
~- 22000
;:g
21000
-
ca 20000
::l •
C
c 19000
ca
2.3 2.8 3.3 3.8 4.3
Grade-point average Figure 2.2 : Straight line fit based on inspection of the data

From this straight line, we try to find the values of a and b. If the graph were extended to include the origin, we could determine the intercept, a. The slope coefficient, b, is obtained by finding the change in annual starting salary that corresponds to a unit increase in grade-point average as indicated by the line. But the results are influenced by inaccuracy in drawing the straight line. Hence, a numerical procedure like the least squares method should be used. This method will be discussed in detail later.
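As a preview of that numerical procedure, the least squares estimates for the seven observations in Table 2.1 can be computed directly from the standard formulas (this sketch anticipates the formulas reviewed later in the chapter):

```python
# Grade-point averages and annual starting salaries from Table 2.1
gpa    = [2.8, 3.4, 3.2, 3.8, 3.2, 3.4, 3.4]
salary = [20000, 24500, 23000, 25000, 20000, 22500, 24000]

n = len(gpa)
sx, sy = sum(gpa), sum(salary)
sxy = sum(x * y for x, y in zip(gpa, salary))
sxx = sum(x * x for x in gpa)

# Least squares estimates: slope b = Sxy/Sxx, intercept a = ybar - b*xbar
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = (sy - b * sx) / n
print(f"Y = {a:.2f} + {b:.2f} X")  # → Y = 4416.67 + 5520.83 X
```

So each additional grade point is associated with roughly $5,521 more annual starting salary, without any of the inaccuracy of fitting by eye.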

2.1.4 Possible Criteria for Fitting a Line

What is a good fit? A good fit is a fit that makes the total error small (Wonnacott and Wonnacott, 1981). One typical error (deviation) is shown in Figure 2.3. It can be defined as the vertical distance from the observed Yi to the fitted value Ŷi on the line, that is, Yi - Ŷi. The error is positive if the observed Yi is above the line and negative when the observed Yi is below the line.


[Figure: a fitted line through scattered points, with the vertical error, Yi - Ŷi, marked at Xi]

Figure 2.3 : Typical error in fitting points with a line

To minimize the total error, the following criteria could be considered (Wonnacott and Wonnacott, 1981):

(a) A fitted line that minimizes the sum of all these errors,

Σ(Yi - Ŷi), summed over i = 1 to n.

Using this criterion, the two fitted lines shown in Figure 2.4 fit the observations equally well. The fit in panel (a) is intuitively a good one and the fit in panel (b) is a bad one. The problem is one of signs: in both cases, positive errors just offset negative errors, leaving their sum equal to zero. This criterion must be rejected because it makes no distinction between bad fits and good ones.


[Figure: panel (a) shows an intuitively good fit; panel (b) a bad one with the same zero sum of errors]

Figure 2.4 : The weakness of using Σ(Yi - Ŷi) to fit a line

(b) One of the ways to overcome the sign problem is to minimize the sum of the squares of the errors, that is:

Σ(Yi - Ŷi)²

This criterion is called least squares, or ordinary least squares (OLS). Its advantages are:

(i) In overcoming the sign problem by squaring the errors, least squares produces very manageable algebra, closely related to the geometric theorem of Pythagoras.

(ii) There are two theoretical justifications for least squares, namely the Gauss-Markov theorem and the maximum likelihood criterion for a normal regression model.
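The difference between the two criteria is easy to demonstrate: a deliberately bad flat line can score a perfect zero under the sum-of-errors criterion, while the sum of squared errors exposes it. The four points below are illustrative assumptions:

```python
# Four points lying exactly on the line y = x
points = [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)]

def errors(points, a, b):
    """Vertical deviations Yi - (a + b*Xi) from the line y = a + b*x."""
    return [y - (a + b * x) for x, y in points]

# Line (a): the intuitive fit y = x
good = errors(points, 0.0, 1.0)
# Line (b): a deliberately bad flat line through the mean, y = 2.5
bad = errors(points, 2.5, 0.0)

# Criterion 1: sum of errors -- both fits score a "perfect" zero
print(sum(good), sum(bad))
# Criterion 2: sum of squared errors -- only the good fit scores zero
print(sum(e * e for e in good), sum(e * e for e in bad))
```

The flat line's positive and negative errors cancel to zero under criterion (a), but its squared errors sum to 5.0 while the true line's sum to 0, which is exactly why the least squares criterion is preferred.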

2.1.5 Using Residuals to Test the Assumptions of the Regression Model

One of the major uses of residual analysis is to test some of the assumptions underlying regression. The following are the assumptions of simple regression analysis.

a) The model is linear.

b) The error terms have constant variance.


c) The error terms are independent.

d) The error terms are normally distributed, where e ~ N(0, σ²).
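A minimal residual check against these assumptions might look like the following sketch; the observed and predicted values are assumed for illustration:

```python
# Illustrative (assumed) observed/predicted pairs after a least squares fit
observed  = [3.0, 4.8, 7.1, 8.9, 11.2]
predicted = [3.1, 5.0, 7.0, 9.0, 10.9]

residuals = [y - yhat for y, yhat in zip(observed, predicted)]

# Residuals should average close to zero and show roughly constant spread
mean_r = sum(residuals) / len(residuals)
var_r  = sum((r - mean_r) ** 2 for r in residuals) / (len(residuals) - 1)
print(f"mean of residuals : {mean_r:+.3f}  (should be close to 0)")
print(f"sample variance   : {var_r:.4f}")

# A crude independence check: count sign changes in the residual sequence;
# long runs of one sign suggest correlated error terms
signs = [r > 0 for r in residuals]
changes = sum(1 for s1, s2 in zip(signs, signs[1:]) if s1 != s2)
print(f"sign changes in sequence: {changes} of {len(residuals) - 1}")
```

In practice these checks are usually done graphically (residual plots and normal probability plots), but the same quantities underlie those plots.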

2.2 Multiple Regression

Multiple regression is the extension of simple regression. It takes account of more than one independent variable X. This technique should be used when we want to investigate the effect on Y of several variables simultaneously. Many times, we wish to include the other variables influencing Y in a multiple regression analysis. The reasons are:

(i) To reduce stochastic error and hence reduce the residual variance. This makes confidence intervals more precise.

(ii) To eliminate bias that might occur if we just ignore a variable that substantially affects Y.

Let us consider the following hypothetical data. Four observations are available on the demand for a product offered at different prices.

Demand (units)    Price (dollars per unit)
90                5
80                10
100               10
90                15

These data are plotted in Figure 2.5.

Figure 2.5 : Scatter of demand and price

From the simple linear model (2.1),

Y = a + bX

where Y is demand and X is price. From this scatter plot, we see that price has no effect on demand because the slope, b = 0. So, we conclude that demand does not depend on price.
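Reading the slope off the data confirms the claim. This sketch applies the simple least squares slope formula to the four demand/price pairs tabulated in the next section:

```python
# The four demand/price observations plotted in Figure 2.5
price  = [5, 10, 10, 15]
demand = [90, 80, 100, 90]

n = len(price)
sx, sy = sum(price), sum(demand)
sxy = sum(x * y for x, y in zip(price, demand))
sxx = sum(x * x for x in price)

# Simple least squares slope of demand on price
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
print(f"slope b = {b}")  # → slope b = 0.0
```

The fitted slope is exactly zero, so a simple regression of demand on price alone suggests, misleadingly, that price does not matter.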

Now, we add another predictor variable which may affect demand. For example, suppose that the product has been advertised at different expenditure levels shown below.

Y                 X1                X2
Demand (units)    Price (dollars)   Advertising (dollars)
90                5                 100
80                10                100
100               10                300
90                15                300

If demand depends on price as well as on advertising, we should estimate the effect of price on demand, holding advertising expenditure constant, or the effect of advertising on demand, holding price constant. If both price and advertising affect


demand, then in general a valid indication of the effect of price on demand cannot be obtained without explicitly considering advertising's effect on demand at the same time.

We can illustrate this example graphically. To do this, we focus on the relation between demand and price, holding advertising constant. This relation is depicted in Figure 2.6. The graph shows two parallel straight lines, with each line representing the effect of price on demand at a given advertising level.

[Two parallel lines of demand (units) versus price (dollars per unit): an upper line for advertising of $300 and a lower line for advertising of $100.]

Figure 2.6 : Effect of price on demand, holding advertising constant

This example illustrates two ways in which the additional variable X2 improves our analysis:

(i) We obtain a better fit to the data. This should allow us to make more precise statistical conclusions about how X1 affects Y.

(ii) We can display the relationship of demand to price while advertising is held constant.


2.2.1 The Mathematical Model

Y is now to be regressed on the two independent variables X1 and X2. Our model, which includes X2 as a predictor variable, is

Y = a + b1X1 + b2X2    (2.2)

where b1 is geometrically interpreted as the slope of the plane as we move in the X1 direction, keeping X2 constant; thus b1 is the marginal effect of X1 on Y. Similarly, b2 is the slope of the plane as we move in the X2 direction, keeping X1 constant; thus b2 is the marginal effect of X2 on Y. More generally,

b1 = the increase in Y if X1 is increased one unit, while all other regressors are held constant    (2.3)

Proof:

Suppose that, in addition to X1, there is only one other regressor X2; that is,

Y = a + b1X1 + b2X2

To establish (2.3), simply take the partial derivative of Y with respect to X1 in the equation above, that is,

∂Y/∂X1 = b1

We can easily confirm that this simple interpretation of b1 is valid because the regression is linear. If it is not, then ∂Y/∂X ≠ b. For example, if the regression is of the non-linear form

Y = a + b1X + b2X² + cZ

then

∂Y/∂X = b1 + 2b2X

To establish (2.3) without calculus, hold Z constant at its initial value Z0, and increase X from its initial X0 to (X0 + 1). Substituting into the equation above, we may write

Initial Y = a + bX0 + cZ0

New Y = a + b(X0 + 1) + cZ0

Difference = increase in Y = b

It is easy to confirm that this is still true if there are several Z variables.

To generalize the regression model to problems involving any number of predictor variables, we use

Y = a + b1X1 + b2X2 + ... + biXi + ε    (2.4)

where

Y = dependent or response variable,

X1, X2, ..., Xi = independent or predictor variables,

ε is the random component of the model and is called the random error.

This model is often referred to as the general linear model. It is general because it allows for an arbitrary number, i, of predictor variables, and, for each of the i predictor variables specified, the effects are assumed to be linear.
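As a quick numerical illustration of the general linear model (2.4) (not part of the dissertation; the data below are made up), the coefficients can be estimated with numpy's least squares routine:

```python
import numpy as np

# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2 (no error term)
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 2.0]])      # leading column of ones carries the intercept a
y = np.array([3.0, 7.0, 7.0, 12.0])

# Least squares estimates of (a, b1, b2)
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(coef)                          # recovers approximately [2. 3. -1.]
```

Because the data lie exactly on the plane, the estimates reproduce the generating coefficients.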

2.3 The Method of Least Squares

The least squares method is a computational technique for determining the 'best' equation describing a set of points, (x1, y1), (x2, y2), ..., (xn, yn), where 'best' is defined geometrically (Larsen and Marx, 2001). It assumes that the best-fit curve of a given type is the curve that minimizes the sum of squared deviations (the least squares error) from the given set of data.

Given data on Y and X that are relevant to the problem, the most common procedure for computing the intercept, a, and the slope coefficient, b, is the least squares method. Once the values of a and b are obtained, we can compute a predicted value for Y.

Suppose the data points are (x1, y1), (x2, y2), ..., (xn, yn), where x is the independent variable and y is the dependent variable. The fitting curve, or the desired polynomial p(x), can be written as

p(x) = Σ_{k=0}^{m} pk x^k    (2.5)

where p0, p1, ..., pm are to be determined. The method of least squares chooses as 'solution' those pk's that minimize the sum of squares of the vertical distances from the data points to the presumed polynomial. The fitting curve p(x) has a deviation (error) d from each data point, i.e., d1 = y1 − p(x1), d2 = y2 − p(x2), ..., dn = yn − p(xn). We give the label 'best' to the polynomial p(x) whose coefficients minimize the function L, where

L = d1² + d2² + ... + dn² = Σ_{i=1}^{n} di² = Σ_{i=1}^{n} [yi − p(xi)]² = min    (2.6)

2.3.1 Polynomial Least Squares Fitting

Polynomials are among the most commonly used types of curves in regression. Applications of the method of least squares to curve fitting with polynomials are briefly discussed as follows:

The Least Squares Line

The least squares line method uses a straight line y = a + bx to approximate a given set of data, (x1, y1), (x2, y2), ..., (xn, yn), where n ≥ 2.

The Least Squares Parabola

The least squares parabola method uses a second degree curve y = a + bx + cx² to approximate a given set of data, (x1, y1), (x2, y2), ..., (xn, yn), where n ≥ 3.

The Least Squares mth Degree Polynomial

The least squares mth degree polynomial method uses an mth degree polynomial y = a0 + a1x + a2x² + ... + amx^m to approximate a given set of data, (x1, y1), (x2, y2), ..., (xn, yn), where n ≥ m + 1.
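All three polynomial cases above can be computed with numpy.polyfit; the following is a minimal sketch on made-up data:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x**2        # points lying exactly on a parabola

line = np.polyfit(x, y, 1)            # least squares line (degree 1)
parabola = np.polyfit(x, y, 2)        # least squares parabola (degree 2)

# polyfit returns coefficients from the highest power down to the constant
print(parabola)                       # recovers approximately [0.5, 2.0, 1.0]
```

Since the data are exactly quadratic, the degree-2 fit recovers the generating coefficients, while the degree-1 fit is only the best straight-line compromise.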


Multiple Regression Least Squares

Multiple regression estimates outcomes which may be affected by more than one control parameter, or where more than one control parameter is being changed at the same time, e.g., Y = a + b1X1 + b2X2.

In the next section, linear least squares and multiple regression least squares are discussed in more detail.

2.4 The Least-squares Line

The method of least squares can be applied to the special case where p(x) is a linear polynomial. The least squares line involves one dependent variable, Y, and one independent variable, X.

The least squares line uses a straight line

y = a + bx + e    (2.7)

where

a = y-intercept of the line,
b = slope of the line, and
e = error term

to approximate the given set of data, (x1, y1), (x2, y2), ..., (xn, yn), where n ≥ 2.

Theorem 3.1

Given n points (x1, y1), (x2, y2), ..., (xn, yn), the straight line y = a + bx minimizing

L = Σ_{i=1}^{n} [yi − p(xi)]² = Σ_{i=1}^{n} [yi − (a + bxi)]²

has slope

b = [n Σ xiyi − (Σ xi)(Σ yi)] / [n Σ xi² − (Σ xi)²]

and y-intercept

a = (Σ yi − b Σ xi) / n = ȳ − b x̄

Note that a and b are unknown coefficients while all xi and yi are given. To minimize the least squares error, the unknown coefficients a and b must yield zero first derivatives.

Proof:

The proof is accomplished by the usual device of taking the partial derivatives of L with respect to a and b, setting the resulting expressions equal to zero, and solving. By the first step we get

∂L/∂a = (−2) Σ_{i=1}^{n} [yi − (a + bxi)] = 0

and

∂L/∂b = (−2) Σ_{i=1}^{n} xi[yi − (a + bxi)] = 0

Expanding the above equations, we have

Σ yi = na + b Σ xi    (2.8)

and

Σ xiyi = a Σ xi + b Σ xi²    (2.9)

From (2.8),

a = (Σ yi − b Σ xi) / n    (2.10)

Then, substitute (2.10) into (2.9). We will obtain

b[(Σ xi)² − n Σ xi²] = Σ xi Σ yi − n Σ xiyi

or

b = [n Σ xiyi − (Σ xi)(Σ yi)] / [n Σ xi² − (Σ xi)²]    (2.11)

(2.10) and (2.11) give the solutions for a and b stated in Theorem 3.1.

A line that fits the data well makes the residuals small. Requiring that the sum of residuals, Σ_{i=1}^{n} ei, be small is futile, since large negative residuals can offset large positive ones. Indeed, any line through the point (x̄, ȳ) has Σ_{i=1}^{n} ei = 0.
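The closed-form slope and intercept of Theorem 3.1, and the fact that the fitted line's residuals sum to zero, can be checked with a short script (the data below are hypothetical):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Slope and intercept from the closed-form least squares formulas (2.10), (2.11)
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a = (np.sum(y) - b * np.sum(x)) / n          # equivalently: y.mean() - b * x.mean()

residuals = y - (a + b * x)
print(round(b, 4), round(a, 4))              # b = 1.96, a = 0.14 for this data
print(np.isclose(residuals.sum(), 0.0))      # the residuals sum to zero
```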

2.4.1 Residuals of Least-squares Line

Residuals measure 'goodness of fit'. The difference between an observed value yi of the dependent variable and the value of the least-squares line when x = xi is called the ith residual. In other words, a residual is the difference between an actual value (yi) in the sample and the fitted value (ŷi). With the sample data for Y and X, we can obtain a and b. With these estimates we can obtain fitted values for Y using the sample data. The magnitude of a residual reflects the failure of the least-squares line to 'model' that particular point.


Definition 2.2

Let a and b be the least-squares coefficients associated with the sample (x1, y1), (x2, y2), ..., (xn, yn). For any value of x, the quantity ŷ = a + bx is known as the predicted value of y. For each i, i = 1, 2, ..., n, the difference yi − ŷi = yi − (a + bxi) is called a residual.

A residual plot is a graph of the ith residual versus xi, for all i. Applied statisticians find residual plots to be very helpful in assessing the appropriateness of fitting a straight line through a set of points.

Theorem 2.2

The sum of the residuals equals zero. Using the definition for the simple linear model fitted by the least squares method,

(yi − ŷi) = yi − (a + bxi)
          = yi − (ȳ − b x̄ + bxi)

then

Σ(yi − ŷi) = Σ yi − Σ ȳ + b Σ x̄ − b Σ xi
           = Σ yi − nȳ + bn x̄ − b Σ xi
           = Σ yi − Σ yi + b Σ xi − b Σ xi
           = 0

2.5 Linear Multiple Regression Least-squares

Linear multiple regression predicts an outcome (the dependent variable) which may be affected by more than one control parameter (the independent variables), or where more than one control parameter is being changed at the same time.


In this section, only multiple regression least-squares with two predictor variables will be discussed. The model for one dependent variable, Y, and two independent variables, X1 and X2, is

Y = a + b1X1 + b2X2 + e    (2.12)

for a given data set (y1, x11, x21), (y2, x12, x22), ..., (yn, x1n, x2n), where n ≥ 3. The best fitting curve has the least squares error

L = Σ_{i=1}^{n} [yi − p(x1i, x2i)]² = Σ_{i=1}^{n} [yi − (a + b1x1i + b2x2i)]² = min    (2.13)

Note that a, b1, and b2 are unknown coefficients while x1i, x2i, and yi are given. To minimize the least squares error, the unknown coefficients a, b1, and b2 must yield zero first derivatives. That is,

∂L/∂a = (−2) Σ_{i=1}^{n} [yi − (a + b1x1i + b2x2i)] = 0,

∂L/∂b1 = (−2) Σ_{i=1}^{n} x1i[yi − (a + b1x1i + b2x2i)] = 0,

and

∂L/∂b2 = (−2) Σ_{i=1}^{n} x2i[yi − (a + b1x1i + b2x2i)] = 0.

Expanding the above equations, we have

Σ yi = na + b1 Σ x1i + b2 Σ x2i ,    (2.14)

Σ x1iyi = a Σ x1i + b1 Σ x1i² + b2 Σ x1ix2i ,    (2.15)

and

Σ x2iyi = a Σ x2i + b1 Σ x1ix2i + b2 Σ x2i²    (2.16)

The unknown coefficients a, b1, and b2 can hence be obtained by solving the above linear equations simultaneously. This is a system of three linear equations in three unknowns, so it usually provides a unique solution for the least-squares regression coefficients a, b1 and b2. These values are called the least squares estimates of the coefficients.

The formula for b1 estimates the effect of X1 on Y, holding X2 constant. Similarly, the formula for b2 estimates the effect of X2 on Y, holding X1 constant.


Finally, a is the intercept, the estimated value of the criterion variable when both X1 and X2 are zero.

One of the differences between fitting a straight-line regression and a multiple regression is the computational difficulty. One needs to solve (i + 1) linear equations simultaneously, and this becomes very cumbersome when working with a calculator.

A central difference in interpretation between simple and multiple regression is that the slope coefficients for the explanatory variables in multiple regression are partial coefficients, while the slope coefficient in simple regression gives the marginal relationship between the response variable and a single explanatory variable. That is, each slope in multiple regression represents the 'effect' on the response variable of a one-unit increment in the corresponding explanatory variable, holding constant the values of the other explanatory variables. The simple-regression slope effectively ignores the other explanatory variables (Fox, 2004).

The linear multiple-regression model defines a plane in the three-dimensional {Y, X1, X2} space, as shown in Figure 2.7.

Figure 2.7 : The multiple regression plane
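For the demand example of Section 2.2, the three normal equations (2.14) to (2.16) can be assembled and solved directly; this sketch uses numpy.linalg.solve (the code itself is an illustration, not part of the dissertation):

```python
import numpy as np

# Data from the demand example: (demand y, price x1, advertising x2)
y  = np.array([90.0, 80.0, 100.0, 90.0])
x1 = np.array([5.0, 10.0, 10.0, 15.0])
x2 = np.array([100.0, 100.0, 300.0, 300.0])
n = len(y)

# Coefficient matrix and right hand side of the normal equations (2.14)-(2.16)
A = np.array([[n,         x1.sum(),       x2.sum()],
              [x1.sum(),  (x1**2).sum(),  (x1*x2).sum()],
              [x2.sum(),  (x1*x2).sum(),  (x2**2).sum()]])
rhs = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])

a, b1, b2 = np.linalg.solve(A, rhs)
print(a, b1, b2)   # 90.0, -2.0, 0.1 for this data
```

The fit here is exact: demand falls by 2 units per dollar of price and rises by 0.1 unit per dollar of advertising, matching the two parallel lines of Figure 2.6.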


2.5.1 Residuals of Multiple Regression Least-squares

Residuals for multiple regression least-squares are defined just as for the least-squares line. That is,

yi = a + b1x1i + b2x2i + ei

or

ŷi = a + b1x1i + b2x2i

where ŷ is the predicted value of y. Then, the difference

ei = yi − ŷi = yi − (a + b1x1i + b2x2i)    (2.17)

is called a residual.

In the following sections, linear goal programming will be discussed.

2.6 Linear Goal Programming

Goal programming problems can be classified according to the types of mathematical programming models involved, such as linear programming, integer programming and nonlinear programming. Goal programming problems have multiple goals instead of a single objective (Hillier and Lieberman, 2001). In this dissertation, only the linear goal programming model is explored: goal programming problems that fit the linear programming framework, where each objective function is linear.

The following section will discuss the history, advantages and disadvantages, the formulation, solution method, sensitivity analysis and some complications of goal programming.


2.6.1 History of Goal Programming

Goal programming is an extension of linear programming. It was first developed and introduced by A. Charnes and W.W. Cooper in 1961 (Goicoechea, 1982). It was further refined by Y. Ijiri in 1965 (Goicoechea, 1982). In 1968, B. Contini considered goal programming under conditions of uncertainty. Major applications were developed by V. Jaaskelainen, S.M. Lee and J.P. Ignizio in the 1970s (Wu and Coppins, 1981). Since 1968, many goal programming related studies have been published (Ignizio, 1976).

Goal programming is a widely accepted and applied technique in various functional areas, such as academic planning and administration, accounting analysis, advertising media scheduling, capital budgeting, decision-support system design, economic policy analysis, energy resources planning, financial planning, inventory management, marketing logistics, military strategies and planning, organizational analysis, production scheduling, quality control, urban planning and predicting student performance (Ignizio, 1976; Lee and Shim, 1986).

2.6.2 Advantages and Disadvantages of the Goal Programming

Goal programming is one of the most popular and powerful techniques for multiple objective decision analysis (Lee and Shim, 1986).

The following are the advantages of goal programming (Hughes and Grawoig, 1973):

a) It allows for an ordinal ranking of goals, where low-priority goals are considered only after higher-priority goals have been satisfied to the fullest extent possible.

b) It is useful in situations where the multiple goals conflict and cannot all be fully achieved.


c) It is used to "satisfice" rather than to "optimize". In linear programming, one seeks the optimal solution. In goal programming, each goal may be incorporated into the model at a value judged to be satisfactory, not necessarily optimal.

d) It is appropriate for finding a satisfactory solution when many objectives or goals must be considered.

However there are some disadvantages of goal programming. These include the following:

a) More time and thought are required in the construction of the model; that is, the decision-maker needs considerable time to define the goals and the constraints.

b) More decision-maker involvement is required, namely in the establishment of aspiration levels and weightings.

c) The subjectivity of the weights assigned to priority levels and goal deviations may be of concern.

In the next section, the formulation of goal programming model will be presented.

2.7 Goal Programming Model Formulation

The formulation of goal programming problem is similar to that of linear programming problems (Wu and Coppins, 1981). According to Charnes and Cooper (1961), goal programming extends the linear programming formulation to accommodate mathematical programming with multiple objectives. The major differences are an explicit consideration of goals and the various priorities associated with the different goals.

According to Ignizio (1976) to formulate goal programming model, the following steps should be followed:

i. Define the decision variables.

ii. State the system constraints and goal constraints.

iii. Determine the preemptive priority factors and the relative weights (if need be).

iv. Develop the objective function.

v. State the nonnegativity requirement.

Goal programming's objective function is always minimized and must be composed of deviational variables only. It minimizes the deviations of the compromise solution from target goals, weighted and prioritized.

In the formulation, two types of variables are used: decision variables and deviational variables. There are two categories of constraints: structural or system constraints (strict, as in traditional linear programming) and goal constraints, which are expressions of the original functions with target goals set a priori, together with positive and negative deviational variables.

The general goal programming model can be expressed as follows:

Minimize Z = Σ_{i=1}^{m} (di⁻ + di⁺)    (2.18)

Subject to the linear constraints:

Goal constraints: Σ_{j=1}^{n} aij xj + di⁻ − di⁺ = bi , i = 1, 2, ..., m

System constraints: Σ_{j=1}^{n} aij xj (≤, =, or ≥) bi , i = m+1, ..., m+p

with xj, di⁻, di⁺ ≥ 0, for i = 1, 2, ..., m and j = 1, 2, ..., n,

where there are m goals, p system constraints and n decision variables, and

Z = objective function
aij = the coefficient associated with variable j in the ith goal
xj = the jth decision variable
bi = the associated right hand side value
di⁻ = negative deviational variable from the ith goal (underachievement)
di⁺ = positive deviational variable from the ith goal (overachievement)


Both over- and underachievement of a goal cannot occur simultaneously. Hence, one or both of these variables must have a zero value; that is,

di⁺ × di⁻ = 0

Both variables satisfy the nonnegativity requirement that applies to all other linear programming variables; that is,

di⁺, di⁻ ≥ 0

Table 2.2 shows three basic options for achieving a goal:

Table 2.2 : Procedure for achieving a goal

Minimize      Goal                                        If goal is achieved
di⁻           Minimize the underachievement               di⁻ = 0, di⁺ ≥ 0
di⁺           Minimize the overachievement                di⁻ ≥ 0, di⁺ = 0
di⁻ + di⁺     Minimize both under- and overachievement    di⁻ = 0, di⁺ = 0

2.7.1 Preemptive Goal Programming

Before solving a goal programming problem, the goals need to be ranked. Preemptive goal programming is also called non-Archimedean or lexicographic goal programming (Ignizio, 1983, 1985a). In preemptive goal programming, the objectives can be divided into different priority classes. Here it is assumed that no two goals have equal priority. The goals are given ordinal rankings called preemptive priority factors. These preemptive priority factors have the relationship

P1 >>> P2 >>> ... >>> Pk >>> Pk+1

where >>> means "very much greater than". This priority ranking is absolute. Therefore, the P1 goal is so much more important than the P2 goal that the P2 goal will never be attempted until the P1 goal is achieved to the greatest extent possible.

The priority relationship implies that multiplication by n, however large n may be, cannot make a lower-level goal as important as a higher goal (that is, Pj > nPj+1). In formulating a goal programming model having prioritized goals, the preemptive priority factors are incorporated into the objective function as weights for the deviational variables.

Using equation (2.18), the preemptive goal programming model can be presented as:

Minimize Z = Σ_{i=1}^{m} Pk (di⁻ + di⁺)    (2.19)

Subject to the linear constraints:

Goal constraints: Σ_{j=1}^{n} aij xj + di⁻ − di⁺ = bi , i = 1, 2, ..., m

System constraints: Σ_{j=1}^{n} aij xj (≤, =, or ≥) bi , i = m+1, ..., m+p

with xj, di⁻, di⁺ ≥ 0, i = 1, 2, ..., m and j = 1, 2, ..., n,

where there are m goals, p system constraints, k priority levels and n decision variables, and

Pk = the priority factor of the kth goal

Here, the difference between equations (2.18) and (2.19) is the priority factor in the objective function.

2.7.2 Weighted Goal Programming

The weighting of deviational variables at the same priority level should be considered in the goal programming model formulation. These weights reflect the relative importance of each deviation. Charnes and Cooper (1977) stated the weighted goal programming model as:

Minimize Z = Σ_{i=1}^{m} (Wi⁻ di⁻ + Wi⁺ di⁺)    (2.20)


Subject to the linear constraints:

Goal constraints: Σ_{j=1}^{n} aij xj + di⁻ − di⁺ = bi , i = 1, 2, ..., m

System constraints: Σ_{j=1}^{n} aij xj (≤, =, or ≥) bi , i = m+1, ..., m+p

with xj, di⁻, di⁺ ≥ 0, i = 1, 2, ..., m and j = 1, 2, ..., n,

where there are m goals, p system constraints and n decision variables, and

Wi⁻ = positive numerical weight assigned to the negative deviational variable di⁻ of the ith constraint

Wi⁺ = positive numerical weight assigned to the positive deviational variable di⁺ of the ith constraint

While Ijiri (1965) had introduced the idea of combining preemptive priorities and weighting, Charnes and Cooper (1977) suggested the goal programming model as:

Minimize Z = Σ_{i=1}^{m} Σ_{k=1}^{K} Pk (Wik⁻ di⁻ + Wik⁺ di⁺)    (2.21)

Subject to the linear constraints:

Goal constraints: Σ_{j=1}^{n} aij xj + di⁻ − di⁺ = bi , i = 1, 2, ..., m

System constraints: Σ_{j=1}^{n} aij xj (≤, =, or ≥) bi , i = m+1, ..., m+p

with xj, di⁻, di⁺ ≥ 0, i = 1, 2, ..., m and j = 1, 2, ..., n,

where there are m goals, p system constraints, K priority levels and n decision variables, and

Z = objective function
Pk = the priority factor of the kth goal
Wik⁻ = positive numerical weight assigned to the negative deviational variable di⁻ of the ith constraint at priority level k
Wik⁺ = positive numerical weight assigned to the positive deviational variable di⁺ of the ith constraint at priority level k
di⁻ = negative deviational variable from the ith goal (underachievement)
di⁺ = positive deviational variable from the ith goal (overachievement)
aij = the coefficient associated with variable j in the ith goal
xj = the jth decision variable
bi = the associated right hand side value

The following are examples of a single-goal and multiple-goal problems:

A Single-Goal Problem

As an example, an electronic firm manufactures two types of electronic calculators, A and B. The relevant data are shown in Table 2.3. If 50 hours are available each week in each department, how many calculators should the company produce to maximize profit? (Wu and Coppins, 1981).

Table 2.3 : Data for single-goal model

                           Type A    Type B
Profit, $                  6.00      12.00
Hours in department I      1         2
Hours in department II     2         3

Decision Variables

x1 = number of A calculators
x2 = number of B calculators

System / Structural Constraints

x1 + 2x2 ≤ 50     (hours constraint in department I)
2x1 + 3x2 ≤ 50    (hours constraint in department II)


Goal Constraints

The only goal for the company in this case is to maximize profit. Here, we establish a target (goal) for profit and then try to find a solution that comes as close as possible to achieving the goal. Let us establish the profit goal as $1500 (per week). Now, define the deviational variables, which indicate by how much the goal is under- or overachieved. The goal constraint for profit maximization can be formulated as

6.00x1 + 12.00x2 + d⁻ − d⁺ = 1500

where d⁻ = underachievement of the $1500 profit goal

d⁺ = overachievement of the $1500 profit goal

In this problem, if total profit < $1500, then d⁻ > 0 and d⁺ = 0. On the other hand, if total profit > $1500, then d⁺ > 0 and d⁻ = 0. If total profit = $1500, then d⁻ = d⁺ = 0. So at least one of the deviational variables must always be zero. Thus, d⁻ × d⁺ = 0.

Objective Function

The company wishes to minimize the underachievement of this goal. So, we must drive d⁻ toward zero. Thus, the objective function becomes

Minimize Z = d⁻

Now the complete model is

Minimize     Z = d⁻
Subject to   6x1 + 12x2 + d⁻ − d⁺ = 1500
             x1 + 2x2 ≤ 50
             2x1 + 3x2 ≤ 50

with x1, x2, d⁻, d⁺ ≥ 0
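Since the objective and all constraints are linear, this single-goal model can be checked numerically as an ordinary linear program. The sketch below uses scipy.optimize.linprog, which is an assumption of this illustration; the dissertation itself solves such models graphically or by the modified simplex method.

```python
from scipy.optimize import linprog

# Variable order: [x1, x2, d_minus, d_plus]
c = [0, 0, 1, 0]                        # minimize Z = d_minus
A_eq = [[6, 12, 1, -1]]                 # 6x1 + 12x2 + d- - d+ = 1500 (profit goal)
b_eq = [1500]
A_ub = [[1, 2, 0, 0],                   # x1 + 2x2 <= 50 (department I)
        [2, 3, 0, 0]]                   # 2x1 + 3x2 <= 50 (department II)
b_ub = [50, 50]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4)
print(res.fun)                          # minimal underachievement of the $1500 goal
```

With the capacities as stated, the maximum attainable weekly profit is only $200, so the solver reports d⁻ = 1300: the $1500 goal is underachieved by $1,300.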

A Multiple-Goal Problem

As an example, Digital Devices is a firm that specializes in producing disk drives for various computer manufacturers. Currently, the company produces two types of disk drives: T11 and D100. The T11 drive requires 8 minutes of processing time in assembly center one and 3 minutes in assembly center two. A D100 drive requires 4 minutes of processing time in assembly center one and also 4 minutes in assembly center two. The normal operation time is 60 hours per week in assembly center one and 40 hours per week in assembly center two.

Digital Devices currently has a contract to deliver 420 T11 drives. There is almost unlimited demand for D100 drives. The current market prices provide the following unit profits: T11, $120 and D100, $80. The management of Digital Devices has set the following goals in the order of their importance (Lee and Shim, 1986):

P1: Produce at least 420 T11 drives.

P2: Avoid any underutilization of normal operation hours in the two assembly centers.

P3: Avoid any overtime operation in assembly center one.

P4: Achieve the profit goal of $80,000.

Decision Variables

x1 = number of T11 disk drives produced
x2 = number of D100 disk drives produced

Goal Constraints

T11 Disk Drives: The first goal is to produce at least 420 T11 disk drives.

x1 + d1⁻ − d1⁺ = 420

where d1⁻ = underachievement of the sales goal for T11 drives

d1⁺ = overachievement of the sales goal for T11 drives

If x1 < 420, then d1⁻ > 0 and d1⁺ = 0. On the other hand, if x1 > 420, then d1⁺ > 0 and d1⁻ = 0. If x1 = 420, then d1⁻ = d1⁺ = 0. To achieve the sales goal of 420 T11 drives, we must minimize d1⁻ to zero. This is accomplished by

Minimize Z = P1 d1⁻

Operation Hours of Assembly Centers: The second goal is to avoid underutilization of normal operation hours in the two assembly centers.

8x1 + 4x2 + d2⁻ − d2⁺ = 3600    (assembly center one)

3x1 + 4x2 + d3⁻ − d3⁺ = 2400    (assembly center two)

where d2⁻ = underutilization of normal operation time of 3600 minutes in assembly center one

d2⁺ = overtime operation in assembly center one

d3⁻ = underutilization of normal operation time of 2400 minutes in assembly center two

d3⁺ = overtime operation in assembly center two

To achieve the second goal, we should minimize the negative deviations in the above goal constraints:

Minimize Z = P1d1⁻ + P2d2⁻ + P2d3⁻

Overtime Operation in Assembly Center One: In formulating the second goal, we have already developed the normal operation hour constraint for assembly center one and defined d2⁺ as overtime operation in assembly center one. Consequently, we do not need a new goal constraint for the third goal. We simply minimize d2⁺:

Minimize Z = P1d1⁻ + P2d2⁻ + P2d3⁻ + P3d2⁺

Profit Goal: Management's final goal is to achieve the profit goal of $80,000. This goal constraint can be formulated as

120x1 + 80x2 + d4⁻ − d4⁺ = 80,000

where d4⁻ = underachievement of the $80,000 profit

d4⁺ = overachievement of the $80,000 profit

This goal can be achieved by minimizing d4⁻ as follows:

Minimize Z = P1d1⁻ + P2d2⁻ + P2d3⁻ + P3d2⁺ + P4d4⁻

Now the complete model can be presented as follows:

Minimize     Z = P1d1⁻ + P2d2⁻ + P2d3⁻ + P3d2⁺ + P4d4⁻
Subject to   x1 + d1⁻ − d1⁺ = 420
             8x1 + 4x2 + d2⁻ − d2⁺ = 3600
             3x1 + 4x2 + d3⁻ − d3⁺ = 2400
             120x1 + 80x2 + d4⁻ − d4⁺ = 80,000

with x1, x2 and all deviational variables ≥ 0

2.8 Solution Method of Goal Programming

In this section, two types of goal programming solution methods will be discussed. They are the graphical method and the modified simplex method.

2.8.1 The Graphical Method

As with linear programming, the graphical method is useful for those goal programming problems which involve only two decision variables. Naturally, while this approach does not work for most practical problems, it does offer valuable insight into the underlying theory of goal programming.

In goal programming, we try to minimize the deviation from the goal with the highest priority to the fullest possible extent. Then the goal with the second-highest priority is considered, and so on. This sequential "satisficing" procedure is used in the graphical method.

According to Ignizio (1976), the steps of the graphical approach are listed as follows:

1. Plot all the system and goal constraints in terms of the decision variables (these will simply be straight lines or planes in a linear model).

2. Determine the solution(s) space for the priority 1 goals.


3. Move to the set of goals having the next-highest priority and determine the "best" solution space for this set of goals, where this "best" solution cannot degrade the achievement values already obtained for higher-priority goals.

4. If, at any time in the process, the solution space is reduced to a single point, we may terminate the procedure because no further improvement is possible.

5. Repeat steps 3 and 4 until either we converge to a single point or we have evaluated all the priority levels.

Again, the clearest explanation of this approach may be shown via a simple, but typical example as follows:

Solving a problem with the graphical method

The production manager for Kitchen Brite Cookware wants to schedule a day's production run for two types of electric toaster, the Plain and the Gaudy. The production of a Plain toaster requires 1.0 person-hour, 2.0 square feet of sheet metal, and 0.5 pound of wiring. Making a Gaudy toaster uses up 2.0 person-hours, 2.5 square feet of sheet metal, and 0.4 pound of wiring.

Available for the day's run are 310 person-hours, 500 square feet of sheet metal, and 120 pounds of wiring. Production of one Plain toaster per day would maintain 0.2 persons in the work force, and one Gaudy toaster per day is associated with 0.5 persons. The profit per Plain toaster is $5, while the profit per Gaudy toaster is $8. The manager has the following goals (Cooke, 1985):

P1: Profit of $1,350 per day

P2: Work force of 72 people

The goal programming formulation of this problem is as follows.

Let x1 = number of Plain toasters to produce per day
    x2 = number of Gaudy toasters to produce per day

Minimize     Z = P1d1⁻ + P2d2⁻
Subject to   5x1 + 8x2 + d1⁻ − d1⁺ = 1,350      (profit goal)
             0.2x1 + 0.5x2 + d2⁻ − d2⁺ = 72    (work force goal)
             x1 + 2x2 ≤ 310                    (person-hours constraint)
             2x1 + 2.5x2 ≤ 500                 (sheet metal constraint)
             0.5x1 + 0.4x2 ≤ 120               (wiring constraint)

where d1⁻ = underachievement of the $1,350 profit

d1⁺ = overachievement of the $1,350 profit

d2⁻ = underachievement of the 72-person work force

d2⁺ = overachievement of the 72-person work force

First of all, we need to plot the system constraints. This problem has three system constraints (person-hours, sheet metal, and wiring), which are plotted as shown in Figure 2.8.

[Graph of the feasible region defined by the three system constraints in the (x1, x2) plane.]

Figure 2.8 : Achievement of the system constraints


Next, consider the goal constraints. One goal constraint will be plotted at a time, following the order of the objective function. The most important goal is the profit goal in the first constraint, which is attained by minimizing d1⁻. The profit goal constraint is plotted as shown in Figure 2.9. When d1⁻ is minimized, the feasible area becomes the shaded area. Any point in the shaded area will satisfy the profit goal because the total profit will be $1,350 or more and d1⁻ = 0.

[Graph of the reduced feasible area after minimizing d1⁻: the part of the system-constraint region where profit is at least $1,350.]

Figure 2.9 : Achievement of the profit goal

The second goal is to minimize the underachievement of the work force goal. This can be accomplished by minimizing d2⁻ in the second goal constraint. However, this goal must be sought within the feasible area already defined by satisfying the first goal. Thus, the feasible area becomes further reduced, as shown in Figure 2.10.


[Graph of the further reduced feasible area after also minimizing d2⁻.]

Figure 2.10 : Achievement of the profit and work force goals

It is obvious that point A (x1, x2) is the optimal solution. The values of x1 and x2 can be obtained by solving the two goal equalities simultaneously. Here, x1 = 110 and x2 = 100 are obtained. Substituting x1 = 110 and x2 = 100 into all of the goal constraints gives

d1⁻ = d1⁺ = d2⁻ = d2⁺ = 0

At this solution point both goals are completely attained because there is no conflict between the goals.
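The graphical result can be cross-checked numerically. The sketch below, which assumes SciPy is available (variable names are illustrative), solves the same model lexicographically: one linear program per priority level, with each level's attained deviation frozen as an upper bound before the next level is solved.

```python
from scipy.optimize import linprog

# Variable order: x1, x2, d1-, d1+, d2-, d2+ (all >= 0 by default).
A_eq = [[5.0, 8.0, 1, -1, 0, 0],      # profit goal      = 1,350
        [0.2, 0.5, 0, 0, 1, -1]]      # work force goal  = 72
b_eq = [1350, 72]
A_ub = [[1.0, 2.0, 0, 0, 0, 0],       # person-hours <= 310
        [2.0, 2.5, 0, 0, 0, 0],       # sheet metal  <= 500
        [0.5, 0.4, 0, 0, 0, 0]]       # wiring       <= 120
b_ub = [310, 500, 120]

# Priority 1: minimize d1- (profit underachievement).
c1 = [0, 0, 1, 0, 0, 0]
p1 = linprog(c1, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)

# Priority 2: minimize d2- while freezing d1- at its attained level.
c2 = [0, 0, 0, 0, 1, 0]
p2 = linprog(c2, A_ub=A_ub + [c1], b_ub=b_ub + [p1.fun],
             A_eq=A_eq, b_eq=b_eq)
print(p2.x[:2])  # x1 = 110, x2 = 100, as in the graphical solution
```

Both optimal deviation values come out zero, matching the graphical conclusion that the two goals do not conflict.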

2.8.2 The Modified Simplex (Multiphase) Method

The modified simplex method is a general solution technique for all types of goal programming problems. It is an iterative algorithm just like the regular simplex method for linear programming. Because of the unique features of the goal programming model, a number of modifications are necessary in the simplex operation.


To apply this method, the first thing we need to do is to develop the initial modified simplex tableau. The general initial modified simplex tableau is shown in Table 2.4.

Table 2.4 : The general initial modified simplex tableau

  Cj                      v1    v2   ...  vn    vn+1   ...  vn+m    vn+m+1   ...  vn+2m
  Cb    Basic    Solution x1    x2   ...  xn    d1⁻    ...  dm⁻     d1⁺      ...  dm⁺
        variables    b
  u1    d1⁻          b1   y1,1  y1,2 ...  y1,n  y1,n+1 ...  y1,n+m  y1,n+m+1 ...  y1,n+2m
  u2    d2⁻          b2   y2,1  y2,2 ...  y2,n  y2,n+1 ...  y2,n+m  y2,n+m+1 ...  y2,n+2m
  ...
  um    dm⁻          bm   ym,1  ym,2 ...  ym,n  ym,n+1 ...  ym,n+m  ym,n+m+1 ...  ym,n+2m
         PK          gK   rK,1  rK,2 ...  rK,n  rK,n+1 ...  rK,n+m  rK,n+m+1 ...  rK,n+2m
  Zj−Cj  PK−1        gK−1 rK−1,1     ...                             ...          rK−1,n+2m
         ...
         P2          g2   r2,1  r2,2 ...  r2,n  r2,n+1 ...  r2,n+m  r2,n+m+1 ...  r2,n+2m
         P1          g1   r1,1  r1,2 ...  r1,n  r1,n+1 ...  r1,n+m  r1,n+m+1 ...  r1,n+2m

where

j = 1, 2, ..., n

i = 1, 2, ..., m

k = 1, 2, ..., K

s = 1, 2, ..., S

xj = the initial set of nonbasic variables

di⁺ = the initial set of nonbasic variables

di⁻ = the initial set of basic variables

vs = the function of preemptive priority factors and weights associated with the sth basic or nonbasic variable


ui = the function of preemptive priority factors and weights associated with the ith basic variable

bi = the right-hand side value of the ith goal

yi,s = the element in the ith row under the sth basic or nonbasic variable; that is, the coefficient of the sth basic or nonbasic variable in goal i

Pk = the kth priority level

gk = the level of achievement of the goals in priority k, where g = (g1, g2, ..., gK)

rk,s = the index number for priority k under the sth basic or nonbasic variable

All the elements in the initial tableau, except for rk,s and gk, are simply obtained from the mathematical model (2.18). However, rk,s and gk must be computed as follows:

rk,s = Σi=1..m (yi,s · ui) − vs        (2.22)

and

gk = Σi=1..m (bi · ui)        (2.23)
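Equations (2.22) and (2.23) are just weighted sums down each column, so they translate directly into code. In the sketch below (plain Python; all names are illustrative), each ui and vs is represented as a dict mapping a priority level k to its weight; the small check uses the x1 column of Table 2.6 from the worked example later in this section.

```python
def index_number(k, y_col, u, v_s):
    """r_{k,s} = sum_i y[i][s] * u_i(k) - v_s(k)   -- equation (2.22)."""
    return sum(y_i * u_i.get(k, 0) for y_i, u_i in zip(y_col, u)) - v_s.get(k, 0)

def achievement(k, b, u):
    """g_k = sum_i b_i * u_i(k)   -- equation (2.23)."""
    return sum(b_i * u_i.get(k, 0) for b_i, u_i in zip(b, u))

# Basic rows of Table 2.6: d1- carries P1, d3- carries P2, the rest weight 0.
u = [{1: 1}, {}, {2: 1}, {}]
b = [100, 100, 0, 2200]
x1_col = [1, 2, 1, 20]                 # coefficients of x1 in the four goals
print(achievement(1, b, u))            # g1 = 100
print(index_number(1, x1_col, u, {}))  # P1-row entry under x1 = 1
```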

If system constraints exist in the goal programming model, some further steps have to be taken. A system constraint can appear in the following three ways:

1. If the system constraint is Σj aij xj ≤ bi (j = 1, ..., n), a slack variable Si will be added to this equation. The equation will become

   Σj aij xj + Si = bi

   The slack variable Si will be defined as the initial basic variable.


2. If the system constraint is Σj aij xj = bi, an artificial variable Ai will be added to this equation. The equation will become

   Σj aij xj + Ai = bi

   The artificial variable Ai will be used as the initial basic variable.

3. If the system constraint is Σj aij xj ≥ bi, an excess or surplus variable Ei and an artificial variable Ai will be added to this equation. The equation will become

   Σj aij xj − Ei + Ai = bi

   The artificial variable Ai will be used as the initial basic variable.
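The three cases can be collected into one small conversion routine. The sketch below (plain Python, hypothetical names) returns the extra columns to append to the ith system constraint and the variable that starts in the basis:

```python
def augment(sense, i):
    """Convert the ith system constraint to an equality.
    Returns ({new variable: coefficient}, initial basic variable)."""
    if sense == "<=":                      # case 1: add a slack S_i
        return {f"S{i}": 1.0}, f"S{i}"
    if sense == "=":                       # case 2: add an artificial A_i
        return {f"A{i}": 1.0}, f"A{i}"
    if sense == ">=":                      # case 3: surplus E_i and artificial A_i
        return {f"E{i}": -1.0, f"A{i}": 1.0}, f"A{i}"
    raise ValueError(f"unknown sense: {sense}")

cols, basic = augment(">=", 2)
print(cols, basic)  # {'E2': -1.0, 'A2': 1.0} A2
```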

Then, a new priority factor, P0, must be introduced. P0 is defined as the super priority factor, the highest priority factor of all, where P0 »> P1 »> P2 »> ... »> PK. P0 also represents the artificial objective function. The initial simplex tableau when system constraints exist is shown in Table 2.5.


Table 2.5 : The general initial simplex tableau when the system constraints exist

  Cj                      v1 ... vn   ...   vn+2m+q+r+t
  Cb    Basic    Solution x1 ... xn   d1⁻ ... dm⁻   d1⁺ ... dm⁺   S1 ... Sq   E1 ... Er   A1 ... At
        variables    b
  u1    d1⁻          b1      y1,1     ...   y1,n        ...       y1,n+2m+q+r+t
  u2    d2⁻          b2      y2,1     ...   y2,n        ...       y2,n+2m+q+r+t
  ...
  um    dm⁻          bm
        S1           bm+1
        ...
        Sq           bm+q
        A1           bm+q+1
        ...
        At           bm+q+t  ym+q+t,1 ...               ...       ym+q+t,n+2m+q+r+t
         PK          gK      rK,1     ...   rK,n        ...       rK,n+2m+q+r+t
  Zj−Cj  PK−1        gK−1    rK−1,1   ...               ...       rK−1,n+2m+q+r+t
         ...
         P1          g1      r1,1     ...               ...       r1,n+2m+q+r+t
         P0          g0      r0,1     ...   r0,n        ...       r0,n+2m+q+r+t

The elements of Table 2.5 are defined as in Table 2.4, with the following additions:

i = m+1, m+2, ..., m+p

k = 0, 1, 2, ..., K

Si = slack variable for the ith goal

Ei = excess or surplus variable for the ith goal

Ai = artificial variable for the ith goal

P0 = the super priority factor, which is assigned to the artificial variables Ai in the objective function


By following the steps given below, the optimal solution to the goal programming model may be derived (Ignizio, 1976).

Step 1: Initialization. Establish the initial modified simplex tableau and the index row for priority level 1 only. Set k = 1 and proceed to Step 2.

Step 2: Check for optimality. Examine gk. If gk is zero, go to Step 7. Otherwise, examine each positive valued index number rk,s in the kth index row. Select the largest positive rk,s for which there are no negative valued index numbers, at a higher priority, in the same column. Designate this column as s′. Ties in the selection of rk,s may be broken arbitrarily. If no such rk,s may be found, go to Step 7. Otherwise, go to Step 3.

Step 3: Determining the pivot column and incoming nonbasic variable.

Step 4: Determine the pivot row and outgoing basic variable. Determine the row associated with the minimum nonnegative value of

bi / yi,s′

In the event of ties, select the row having the basic variable with the higher priority level. Designate this row as i′. The basic variable associated with row i′ is the outgoing basic variable.

Step 5: Establishment of the new tableau.

(i) Set up a new tableau with all yi,s, bi, rk,s and gk elements empty. Exchange the position of the basic variable heading in row i′ (of the previous tableau) with the nonbasic variable heading in column s′ (of the previous tableau).

(ii) Row i′ of the new tableau (except for yi′,s′) is obtained by dividing row i′ of the previous tableau by yi′,s′.

(iii) Column s′ of the new tableau (except for yi′,s′) is obtained by dividing column s′ of the previous tableau by (−yi′,s′).

(iv) The remaining elements are computed as follows:


ŷi,s = yi,s − (yi′,s)(yi,s′) / yi′,s′        (2.24)

b̂i = bi − (bi′)(yi,s′) / yi′,s′        (2.25)

where b̂i and ŷi,s represent the new set of elements to be computed, and bi and yi,s represent the previous values of these elements (from the previous tableau).

(v) After that, the new values for rk,s and gk are established. These values must be computed for the kth priority level and all higher priority levels. They can be obtained simply through the use of equations (2.22) and (2.23).

(vi) Return to Step 2.

Step 6: Check the optimality for the new solution.

Step 7: Evaluate the next lower priority level. Set k = k + 1. If k exceeds K (the total number of priority levels), then stop; the solution is optimal. If k ≤ K, establish the index row for Pk and go to Step 2.
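Steps 4 and 5 amount to a single Gauss-Jordan pivot. The sketch below (plain Python, hypothetical names) applies the pivot-row division of Step 5(ii) together with equations (2.24)-(2.25) for the remaining elements; it keeps every column explicitly rather than using the compact exchange form, so the pivot column simply becomes a unit vector. The check reproduces the first pivot of the worked example that follows.

```python
def pivot_update(T, b, ip, sp):
    """Pivot on element T[ip][sp].

    The pivot row is divided by the pivot element; every other row i is
    updated by y_new = y - y_pivot_row * (T[i][sp] / pivot), which is
    equation (2.24) for the coefficients and (2.25) for the b column."""
    p = T[ip][sp]
    newT = [row[:] for row in T]
    newb = b[:]
    newT[ip] = [v / p for v in T[ip]]
    newb[ip] = b[ip] / p
    for i in range(len(T)):
        if i == ip:
            continue
        f = T[i][sp] / p
        newT[i] = [T[i][s] - T[ip][s] * f for s in range(len(T[0]))]  # (2.24)
        newb[i] = b[i] - b[ip] * f                                    # (2.25)
    return newT, newb

# First pivot of the worked example: pivot row d3- (index 2), pivot column x1
# (index 0); only the x1..x4 columns of Table 2.6 are shown here.
T = [[1, 1, 1, 1], [2, 2, 0, 0], [1, 0, -3, -8], [20, 30, 20, 24]]
b = [100, 100, 0, 2200]
newT, newb = pivot_update(T, b, 2, 0)
print(newT[0], newb[0])  # [0.0, 1.0, 4.0, 9.0] 100.0 -- the d1- row of Table 2.7
```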

The following illustrates an example of solving a problem using the modified simplex method.

Solving problem with the modified simplex method

Given the problem to

Minimize Z = P1 d1⁻ + P2 d3⁻ + P3 d4⁺

Subject to

x1 + x2 + x3 + x4 + d1⁻ − d1⁺ = 100
2x1 + 2x2 + d2⁻ = 100
x1 − 3x3 − 8x4 + d3⁻ − d3⁺ = 0
20x1 + 30x2 + 20x3 + 24x4 + d4⁻ − d4⁺ = 2200

with all variables nonnegative.


Solve by the simplex method (Lee and Coppins, 1981).

Step 1: Developing the initial simplex tableau

Similar to linear programming, the initial solution always starts at the origin, where all the decision variables and positive deviational variables have a zero value. This leaves all the negative deviational variables with a solution value. These variables are entered into the basis column and their solution values are entered into the solution column. In the example, when zero is substituted for the decision variables, the system of constraints becomes

d1⁻ = 100
d2⁻ = 100
d3⁻ = 0
d4⁻ = 2200

Then the variables and their solution values are entered into the initial simplex tableau as shown in Table 2.6.

As discussed earlier, the Cj values are replaced by the preemptive priority factors or differential weights. Variables without preemptive priority factors are considered to have a zero Cj value. All coefficients are recorded in the tableau in the same way as in the simplex method of linear programming. The Zj value row is completely eliminated.


Table 2.6 : The initial simplex tableau

  Cj                 0   0   0   0   P1   0    P2   0    0    0    P3
  Cb   Basic   Soln  x1  x2  x3  x4  d1⁻  d2⁻  d3⁻  d4⁻  d1⁺  d3⁺  d4⁺
       var     b
  P1   d1⁻     100   1   1   1   1   1    0    0    0   -1    0    0
  0    d2⁻     100   2   2   0   0   0    1    0    0    0    0    0
  P2   d3⁻     0     1   0  -3  -8   0    0    1    0    0   -1    0   ←
  0    d4⁻     2200  20  30  20  24  0    0    0    1    0    0   -1
  Zj−Cj:
  P3           0     0   0   0   0   0    0    0    0    0    0   -1
  P2           0     1   0  -3  -8   0    0    0    0    0   -1    0
  P1           100   1   1   1   1   0    0    0    0   -1    0    0
                     ↑
  (← denotes the pivot row and ↑ denotes the pivot column)

Since goal programming is always a minimization technique, Zj − Cj values are used. The Zj − Cj values are stored in a k × n matrix, where k is the number of preemptive priority levels and n is the total number of decision and deviational variables. Before obtaining the values of Zj − Cj, the Zj values must be computed first. The calculation is shown as follows:

For the solution column:

Zj (solution) = Σ(Cb × solution values)        (2.26)
              = P1 × 100 + 0 × 100 + P2 × 0 + 0 × 2200 = 100P1

and for each variable column:

Zj = Σ(Cb × coefficients)        (2.27)

For example,

x1: P1 × 1 + 0 × 2 + P2 × 1 + 0 × 20 = P1 + P2
x2: P1 × 1 + 0 × 2 + P2 × 0 + 0 × 30 = P1


After the Zj − Cj values of each column are calculated, they are substituted into the Zj − Cj matrix. Since the preemptive priority factors Pj are not commensurable, each of their coefficients is entered separately into its own row and column, as shown in Table 2.6. In this example, the Zj − Cj matrix has dimension 3 × 11, as there are three preemptive priority levels and eleven variables (four decision and seven deviational).

Step 2: Selecting the pivot column

Similar to the simplex method of linear programming, the basic approach in selecting the pivot column is to choose the column with the largest nonnegative Zj − Cj value. Recall the relationship of preemptive priority factors P1 »> P2 »> ..., where »> means 'very much greater than'. When this is applied, the values of each Zj − Cj column can easily be compared, and the one with the largest value is chosen as the pivot column. If there is no positive value in that row, move up one row and find the column with the largest positive Zj − Cj value. If no such column can be found, the solution is obtained. If there exists a tie between columns, check the next row with a lower priority level.

In this example, the x1 column is selected as the pivot column. There is a tie among the x1, x2, x3 and x4 columns in the P1 row. The tie is broken when the next level of priority, P2, is considered, where x1 has the larger positive value; thus it is chosen as the pivot column. Now x1 is the entering variable.

Step 3: Determine the pivot row

To select the pivot row, divide each solution value by the coefficient on the same row in the pivot column. The row that has the minimum nonnegative quotient is chosen as the pivot row. If a tie between rows exists, select the one with the higher priority deviational variable. In the example, the outgoing variable is found to be d3⁻ in row number three.

Step 4: Determine the new solution

To determine the new values of the pivot row, divide each element of the pivot row by the pivot element, that is, the element at the intersection of the pivot column and pivot row. The new value of each element in the other rows is calculated with the following formula:

new value = old value − (element in the pivot column of that row × corresponding new value in the pivot row)

The new tableau (second simplex tableau) is shown in Table 2.7. In this new tableau, x1 is a basic variable with value 0 and d3⁻ is eliminated from the basic variables.

Table 2.7 : The second simplex tableau

  Cj                 0   0   0    0    P1   0    P2   0    0    0    P3
  Cb   Basic   Soln  x1  x2  x3   x4   d1⁻  d2⁻  d3⁻  d4⁻  d1⁺  d3⁺  d4⁺
       var     b
  P1   d1⁻     100   0   1   4    9    1    0   -1    0   -1    1    0
  0    d2⁻     100   0   2   6    16   0    1   -2    0    0    2    0   ←
  0    x1      0     1   0  -3   -8    0    0    1    0    0   -1    0
  0    d4⁻     2200  0   30  80   184  0    0   -20   1    0    20  -1
  Zj−Cj:
  P3           0     0   0   0    0    0    0    0    0    0    0   -1
  P2           0     0   0   0    0    0    0   -1    0    0    0    0
  P1           100   0   1   4    9    0    0   -1    0   -1    1    0
                                  ↑

Step 5: Test the optimality

To determine whether a solution is optimal, consider the Zj − Cj matrix. If the gk (solution) value is zero, the solution is obtained. If there are positive values in the Zj − Cj matrix, and for every positive value in the Zj − Cj matrix at least one negative value exists at a higher priority level in the same column, then the final solution is attained.

Refer to the second tableau. The solution has not been obtained yet. The pivot column, x4, and pivot row, d2⁻, are determined. The completed third tableau is shown in Table 2.8.

Computation continues for two more tableaux, and the optimum solution is found. The fourth and fifth tableaux are shown in Tables 2.9 and 2.10.

Table 2.8 : The third tableau

  Cj                  0   0     0    0   P1  0      P2    0   0   0     P3
  Cb   Basic   Soln   x1  x2    x3   x4  d1⁻ d2⁻    d3⁻   d4⁻ d1⁺ d3⁺   d4⁺
       var     b
  P1   d1⁻     175/4  0  -1/8   5/8  0   1  -9/16   1/8   0  -1  -1/8   0
  0    x4      25/4   0   1/8   3/8  1   0   1/16  -1/8   0   0   1/8   0   ←
  0    x1      50     1   1     0    0   0   1/2    0     0   0   0     0
  0    d4⁻     1050   0   7     11   0   0  -23/2   3     1   0  -3    -1
  Zj−Cj:
  P3           0      0   0     0    0   0   0      0     0   0   0    -1
  P2           0      0   0     0    0   0   0     -1     0   0   0     0
  P1           175/4  0  -1/8   5/8  0   0  -9/16   1/8   0  -1  -1/8   0
                                ↑

Table 2.9 : The fourth tableau

  Cj                   0   0     0   0     P1  0      P2    0   0   0     P3
  Cb   Basic   Soln    x1  x2    x3  x4    d1⁻ d2⁻    d3⁻   d4⁻ d1⁺ d3⁺   d4⁺
       var     b
  P1   d1⁻     100/3   0  -1/3   0  -5/3   1  -2/3    1/3   0  -1  -1/3   0   ←
  0    x3      50/3    0   1/3   1   8/3   0   1/6   -1/3   0   0   1/3   0
  0    x1      50      1   1     0   0     0   1/2    0     0   0   0     0
  0    d4⁻     2600/3  0   10/3  0  -88/3  0  -40/3   20/3  1   0  -20/3 -1
  Zj−Cj:
  P3           0       0   0     0   0     0   0      0     0   0   0    -1
  P2           0       0   0     0   0     0   0     -1     0   0   0     0
  P1           100/3   0  -1/3   0  -5/3   0  -2/3    1/3   0  -1  -1/3   0
                                                      ↑


Table 2.10 : The final tableau

  Cj                 0   0   0   0   P1   0    P2  0   0   0   P3
  Cb   Basic   Soln  x1  x2  x3  x4  d1⁻  d2⁻  d3⁻ d4⁻ d1⁺ d3⁺ d4⁺
       var     b
  P2   d3⁻     100   0  -1   0  -5    3  -2    1   0  -3  -1   0
  0    x3      50    0   0   1   1    1  -1/2  0   0  -1   0   0
  0    x1      50    1   1   0   0    0   1/2  0   0   0   0   0
  0    d4⁻     200   0   10  0   4  -20   0    0   1   20  0  -1
  Zj−Cj:
  P3           0     0   0   0   0    0   0    0   0   0   0  -1
  P2           100   0  -1   0  -5    3  -2    0   0  -3  -1   0
  P1           0     0   0   0   0   -1   0    0   0   0   0   0

Note that in the first four tableaux we are trying to satisfy the first priority goal. This is finally achieved in the fifth tableau. Now we shift our attention to the P2 row of the Zj − Cj matrix. However, the only candidate to enter is d1⁻, which has a coefficient of −1 in the P1 row. Therefore it cannot enter, or else P1 optimality would be destroyed. We are unable to achieve the second priority goal at this point. Finally, we can consider the P3 row. Since it contains no positive entry and g3 = 0, the third priority goal has been achieved. We therefore conclude that the tableau shown in Table 2.10 is optimal, with x1* = 50 and x3* = 50. Goal 1 and goal 3 have been achieved; goal 2 has not been achieved. There are underachievements d3⁻ = 100 and d4⁻ = 200. The unattained portion of the goal is shown by the nonzero gk (solution) value, which is 100 at P2. This is further verified by the positive value, 3, in column d1⁻ of the Zj − Cj matrix. The positive value implies that these are conflicting goals. The conflict occurs between goals P1 and P2.
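The five tableaux can be cross-checked with an off-the-shelf LP solver. The sketch below (assuming SciPy; names are illustrative) minimizes each priority's deviational variable in turn, freezing the attained value before moving to the next level, and reproduces x1* = 50, x3* = 50, d3⁻ = 100 and d4⁻ = 200.

```python
from scipy.optimize import linprog

# Variable order: x1, x2, x3, x4, d1-, d2-, d3-, d4-, d1+, d3+, d4+
A_eq = [[1, 1, 1, 1, 1, 0, 0, 0, -1, 0, 0],
        [2, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        [1, 0, -3, -8, 0, 0, 1, 0, 0, -1, 0],
        [20, 30, 20, 24, 0, 0, 0, 1, 0, 0, -1]]
b_eq = [100, 100, 0, 2200]

def unit(j, n=11):
    c = [0.0] * n
    c[j] = 1.0
    return c

A_ub, b_ub, res = [], [], None
for j in (4, 6, 10):            # P1: d1-,  P2: d3-,  P3: d4+
    res = linprog(unit(j), A_ub=A_ub or None, b_ub=b_ub or None,
                  A_eq=A_eq, b_eq=b_eq)
    A_ub, b_ub = A_ub + [unit(j)], b_ub + [res.fun]   # freeze this level

print(res.x[0], res.x[2])       # x1* = 50, x3* = 50
```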


2.9 Goal Programming Complications

In applying goal programming to multiple objective decision problems, we may face a few types of complicated situations (Lee and Shim, 1982; Markland and Sweigart, 1987). These special problems are discussed next.

2.9.1 Negative Right-hand Side Value

If a goal constraint has a negative right-hand side value, we must multiply both sides of the constraint by -1 to make the right-hand side value positive. Then, we introduce the deviational variables to the goal constraint. If we want to minimize the negative deviation in the original constraint, we should minimize the positive deviational variable in the new goal constraint, and vice versa.

For example, consider the following goal:

G1: −x1 − 2x2 + d1⁻ − d1⁺ = −25

We had wished to minimize d1⁻.

The form of the goal without deviational variables is

G1: −x1 − 2x2 ≥ −25

Multiplying both sides by −1 gives

G1′: x1 + 2x2 ≤ 25

and, reintroducing the deviational variables,

G1′: x1 + 2x2 + d1⁻ − d1⁺ = 25

where d1⁺ must now be minimized.

2.9.2 A Tie in Selecting the Incoming Variable (Pivot Column)

In selecting the pivot column, consider the largest Zj − Cj value at the highest priority level. If two or more columns have the same largest Zj − Cj value and the tie cannot be broken even at lower priority levels, then we can choose one of the tied columns as the pivot column on an arbitrary basis.

2.9.3 A Tie in Selecting the Outgoing Variable (Pivot Row)

In selecting the pivot row, we divide the solution values by the associated positive coefficients in the pivot column. The row with the smallest ratio (nonnegative quotient) is selected as the pivot row. If two or more rows have the same ratio, the tie can be broken by choosing the row that has the highest priority factor.

2.9.4 Alternative Optimal Solutions

This situation can occur if one or more of the nonbasic variable columns have zero Zj − Cj values in the final simplex tableau. The multiple optimum solution is determined by computing a new tableau. When conflicts exist among the goals, this situation will not happen.

2.9.5 An Infeasible Problem

When system constraints exist in a goal programming model, we need to assign the super priority P0 to these constraints. If a conflict exists among the system constraints, the gk value at the P0 level will be positive in the final simplex tableau. The problem will remain infeasible if the conflict among the system constraints cannot be resolved.


2.9.6 Unbounded Solutions

In most real world goal programming problems unbounded solutions do not occur, since every goal is constrained and the goal tends to be set at a level that is not easily reached. It is possible, however, to omit important constraints in a goal programming problem, as well as have an unrealistic priority structure. When this happens, an unbounded solution could occur. Such an unbounded solution would require the decision maker to reanalyze the goal structure of the problem.

2.10 Sensitivity / Post Optimality Analysis

An analysis of the effect of parameter changes after determining the optimal solution is a very important part of any solution process. This procedure is broadly defined as post optimal sensitivity analysis. There usually exists some degree of uncertainty in real world problems concerning the model parameters, which raises issues such as:

1. Which facilities and/or products may be discontinued.

2. What may be gained from and how much one should pay for additional resources.

3. What the impact will be of increases or shortages in resources and increases or decreases in inflation and/or interest rates.

If the optimal solution is relatively sensitive to changes in certain parameters, special effort should be directed to forecasting the future values of these parameters. Sensitivity analysis provides us with a systematic procedure for analyzing all of the aspects listed above and, as such, can well be the most important phase in the total decision making framework.

From the final modified simplex tableau, one can obtain enough data to perform a sensitivity analysis for changes in Cj (priority factors), bi (goal levels or resources), and yi,s (technological coefficients). The types of changes that will be investigated are as follows:

• A change in the weighting factor at priority k for the sth nonbasic variable, vs.

• A change in the weighting factor at priority k for the ith basic variable, ui.

• A change in the original right-hand side value of goal i, bi.

• A change in the original coefficient associated with the ith goal and the sth nonbasic variable, yi,s.

• The addition of a new goal.

• The addition of a new decision variable.

All of these changes will be presented next.

2.10.1 A Change in vs

vs denotes the function of the preemptive priority factors and weights associated with a particular nonbasic variable. If vs is changed, the elements of the index rows, rk,s, will change.

Let the new function of vs be represented by v̄s and the new value of rk,s be represented by r̄k,s. Using equation (2.22), the formula for r̄k,s can be obtained:

r̄k,s = Σi=1..m (yi,s · ui) − v̄s        (2.28)

Proof:

From (2.27), we have

Zj = Σ(Cb × coefficients)

Subtracting Cj from both sides, this equation becomes

Zj − Cj = Σ(Cb × coefficients) − Cj

Writing this for priority level k under the sth variable gives

rk,s = Σ(ui × yi,s) − vs

As a result of a change in the value of rk,s, there may be an impact on the optimality of the new tableau. For example, if rk,s was negative and is changed to a positive value where there are no negative valued index numbers at a higher priority in the same column, then the optimal solution will change.

2.10.2 A Change in ui

ui represents the function of the preemptive priority factors and weights associated with a particular basic variable. A change in ui can affect both the index row values rk,s and the achievement level values gk. Let the new function of ui be represented by ūi and the new gk by ḡk. Thus, using equations (2.22) and (2.23),

r̄k,s = Σi=1..m (yi,s · ūi) − vs        (2.29)

and

ḡk = Σi=1..m (bi · ūi)        (2.30)

Proof:

From (2.26), we have

Zj (solution) = Σ(Cb × solution values)

Subtracting Cj from both sides, this equation becomes

Zj − Cj = Σ(Cb × solution values) − Cj

Writing this for priority level k gives

gk = Σ(ui × bi)

For the new ūi,

ḡk = Σ(ūi × bi)

2.10.3 A Change in bi

The bi value often represents an estimate of resource availability or an aspiration level. Any such change is usually an important aspect of real world problem solutions. The effect of a change in bi is evident in both b and gk. The new b value can be computed as

b̂ = B⁻¹ · b′        (2.31)

where b̂ is the new right-hand side column vector, b′ reflects the new set of values in the original problem formulation, and B⁻¹ is the inverse of the basis matrix.

Proof:

A general linear programming problem with m equation constraints and n nonnegative variables takes the form

Ax = b        (2.32)

If B represents the m columns of A which correspond to the basic variables, then A can be written as

A = (B R)        (2.33)

where B is the m × m matrix of the basis and R is the m × (n − m) matrix of the nonbasic variables. Then

(B R) x = b

Multiplying both sides by B⁻¹,

B⁻¹ (B R) x = B⁻¹ b        (2.34)


that is,

(I  B⁻¹R) x = B⁻¹ b        (2.35)

so that a new right-hand side b′ yields b̂ = B⁻¹ b′. The new gk values are determined by

ḡk = Σi=1..m (ui · b̂i)        (2.36)
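Equation (2.31) is easy to verify numerically on the modified simplex example of Section 2.8.2, whose final basis is {d3⁻, x3, x1, d4⁻}. Building B from those four columns of the constraint set and applying B⁻¹ to the original right-hand side must reproduce the final solution values. A sketch assuming NumPy:

```python
import numpy as np

# Columns of the final basis {d3-, x3, x1, d4-}, taken from the four goal
# constraints of the worked example.
B = np.array([[0.0,  1.0,  1.0, 0.0],
              [0.0,  0.0,  2.0, 0.0],
              [1.0, -3.0,  1.0, 0.0],
              [0.0, 20.0, 20.0, 1.0]])
b_prime = np.array([100.0, 100.0, 0.0, 2200.0])

b_hat = np.linalg.solve(B, b_prime)   # b_hat = B^{-1} b', equation (2.31)
print(b_hat)  # d3- = 100, x3 = 50, x1 = 50, d4- = 200
```

Changing any entry of b′ and re-solving gives the updated solution values without re-running the simplex iterations, which is the point of equation (2.31).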

2.10.4 A Change in yi,s

The changes in yi,s considered here are associated with nonbasic variables (in the final tableau under consideration); that is, with only the yi,s coefficients. A change in the coefficient yi,s has an effect on the index rows rk,s. This is given by

ŷi,s = B⁻¹ · y′i,s        (2.37)

(this equation is the same as (2.31) with bi changed to yi,s) and

r̄k,s = Σi=1..m (ui · ŷi,s) − vs        (2.38)

where ŷi,s is the new vector of yi,s values in the final modified simplex tableau under the sth nonbasic variable and y′i,s is the changed vector of initial yi,s coefficients under the sth nonbasic variable.

2.10.5 Addition of a New Goal

Particular care must be given to the addition of a new goal. First, this goal must be commensurable with any other goals at the same priority level. A new goal will also change the size of the basis. So, we must eliminate from the new goal any nonzero coefficients of basic variables that appear in it.

2.10.6 Addition of a New Decision Variable

The addition of a new variable requires a new column in the tableau. This column is associated with the new, nonbasic variable and a change in yi,s. That is, originally, the new variable does not exist and thus all its yi,s coefficients are zero. Once the new variable has been added to the problem, we have, in effect, changed the yi,s coefficients of a nonbasic variable from all zero to some new values. Consequently, we may find the new set of yi,s values in the final tableau with the equation ŷi,s = B⁻¹ · y′i,s. This, of course, also requires computation of a new rk,s as discussed above.

In the next section, the relation between goal programming and the least squares method will be presented.

2.11 Regression Analysis for Determining Relative Weighting or Goal Constraint Parameter Estimation

Goal programming in the form of a constrained regression model was used quite some time ago by Charnes, Cooper and Ferguson (1955). By minimizing deviations, the goal programming model can generate decision variable values that are the same as the beta values in some types of multiple regression models. Charnes, Cooper and Sueyoshi (1986, 1988) suggested that their goal programming model serves the valuable purpose of cross-checking answers from other methodologies. Likewise, multiple regression models can also be used to combine multiple criteria measures more accurately for use as goal programming model parameters (Schniederjans, 1995).


2.12 Summary

In this chapter, the analysis for simple and multiple regression has been presented. The method of least squares for simple and multiple regression was also discussed. This was followed by the history and advantages of goal programming. The formulation, methods, complications and sensitivity analysis of goal programming were also explained using some examples.

Finally, the relation between goal programming and least squares method was briefly presented.

CHAPTER 3

CASE STUDY ON USING LEAST SQUARES METHOD

3.0 Introduction

Chapter 2 discussed two popular and powerful methods in the operational research and statistics fields. In this chapter, the least squares method will be used to analyze the data sets. The chapter begins with a description of the data.

3.1 Background of Data

Three data sets were chosen for analysis. All of the data sets contain outliers. Set 1 relates one dependent variable (Y) with one independent variable (X), set 2 relates one dependent variable (Y) with two independent variables (X1, X2), while set 3 relates one dependent variable (Y) with three independent variables (X1, X2, X3). The data sets are as follows:

Data Set 1

Carbon aerosols have been identified as a contributing factor in a number of air quality problems. In a chemical analysis of diesel engine exhaust, X = mass (µg/cm²) and Y = elemental carbon (µg/cm²) were recorded ("Comparison of Solvent Extraction and Thermal Optical Carbon Analysis Methods: Application to Diesel Vehicle Exhaust Aerosol", Environmental Science and Technology (1984): 231-234).

Table 3.1 : The data of mass and elemental carbon

Observation X, Y, elemental Observation X, Y, elemental
number mass carbon number mass carbon
1 164.2 181 16 78.9 86
2 156.9 156 17 387.8 310
3 109.8 115 18 135.0 141
4 111.4 132 19 82.9 90
5 87.0 96 20 117.9 130
6 161.8 170 21 108.1 102
7 230.9 193 22 89.4 91
8 106.5 110 23 76.4 97
9 97.6 94 24 131.7 128
10 79.7 77 25 100.8 88
11 118.7 106
12 248.8 204
13 102.4 98
14 64.2 76
15 89.4 89

Carbon aerosols are dangerous to our health because they degrade air quality. This set of data has 25 pairs (Xi, Yi) of observations, as tabulated in Table 3.1. In this set, mass (X) is the independent variable while elemental carbon (Y) is the dependent variable. So, this is a simple linear regression problem.
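As a preview of the analysis, the least squares line for Data Set 1 can be obtained in a few lines (the sketch assumes NumPy; it applies the b0, b1 least squares formulas of Chapter 2 via a degree-1 polynomial fit):

```python
import numpy as np

# Table 3.1: X = mass, Y = elemental carbon (both in micrograms/cm^2).
mass = np.array([164.2, 156.9, 109.8, 111.4, 87.0, 161.8, 230.9, 106.5,
                 97.6, 79.7, 118.7, 248.8, 102.4, 64.2, 89.4, 78.9, 387.8,
                 135.0, 82.9, 117.9, 108.1, 89.4, 76.4, 131.7, 100.8])
carbon = np.array([181, 156, 115, 132, 96, 170, 193, 110, 94, 77, 106, 204,
                   98, 76, 89, 86, 310, 141, 90, 130, 102, 91, 97, 128, 88.0])

b1, b0 = np.polyfit(mass, carbon, 1)   # slope and intercept of Y = b0 + b1*X
print(b1, b0)
```

The positive slope confirms the obvious relationship: more exhaust mass goes with more elemental carbon.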

Data Set 2

The administrator for an organization that conducts management seminar programs is interested in examining the relationship between seminar enrollments (Y), the number of mailings (X1), and the lead time of mailings (X2) of seminar

64

announcements. Data were obtained from a sample of n = 25 management seminars offered by the organization and are listed in Table 3.2.

Table 3.2 : The data of enrollment, the number of mailings and the lead time of mailings

Obser-  Enrollment,  Num. of     Lead     Obser-  Enrollment,  Num. of     Lead
vation  Y            Mailings,   Time,    vation  Y            Mailings,   Time,
number               X1          X2       number               X1          X2
                     (× 1,000)   (weeks)                       (× 1,000)   (weeks)
1 27 6.5 3 16 19 3.7 6
2 29 6.5 2 17 36 9.1 12
3 41 13.0 15 18 43 23.0 13
4 36 8.1 13 19 40 23.5 10
5 22 4.0 6 20 38 9.0 9
6 40 11.5 13 21 40 7.0 12
7 52 18.0 17 22 42 12.5 16
8 39 10.0 12 23 21 5.0 6
9 27 7.1 4 24 29 6.8 12
10 28 6.5 10 25 35 7.2 14
11 24 7.0 5
12 29 7.3 11
13 33 7.5 12
14 35 7.5 12
15 27 4.9 9

The second set of data concerns management seminar programs. 25 observations (Yi, X1i, X2i) were recorded. The relationship between enrollment and the number of mailings and lead time would be deterministic if the value of enrollment were completely determined, with no uncertainty, once the values of the number of mailings and lead time had been specified.


Data Set 3

The U.S. Bureau of Mines produces data on the price of minerals. Table 3.3 shows the average prices per year for several minerals over a decade.

Table 3.3 : The data of gold, copper, silver and aluminium

Observation  Y, Gold     X1, Copper      X2, Silver  X3, Aluminium
number       ($ per oz)  (cents per lb)  ($ per oz)  (cents per lb)
1 161.1 64.2 4.4 39.8
2 308.0 93.3 11.1 61.0
3 613.0 101.3 20.6 71.6
4 460.0 84.2 10.5 76.0
5 376.0 72.8 8.0 76.0
6 424.0 76.5 11.4 77.8
7 361.0 66.8 8.1 81.0
8 318.0 67.0 6.1 81.0
9 368.0 66.1 5.5 81.0
10 448.0 82.5 7.0 72.3
11 438.0 120.5 6.5 110.1
12 382.6 130.9 5.5 87.8

There are four variables (minerals) in this data set: gold, copper, silver and aluminium. Gold and silver are measured in $ per oz while copper and aluminium are measured in cents per lb. The objective here is to predict the average price of gold. Here, gold is the dependent variable, denoted by Y, while copper, silver and aluminium are independent variables, denoted by X1, X2 and X3.


3.2 Outliers

Definition 3.1

An outlier is an unusually small or large data value (Devore and Peck, 2001).

Definition 3.2

Outliers are data points that lie apart from the rest of the points, or are data points that are apart, or far, from the mainstream of the other data (Black, 2001).

Definition 3.3

Outliers are observations with a unique combination of characteristics identifiable as distinctly different from the other observations (Hair et al, 1998).

Outliers can be classified into one of four classes. The first class arises from a procedural error, such as a data entry error or a mistake in coding. These outliers should be identified in the data cleaning stage, but if overlooked, they should be eliminated or recorded as missing values. The second class of outlier is the observation that occurs as the result of an extraordinary event, which then explains the uniqueness of the observation. The researcher must decide whether the extraordinary event should be represented in the sample. If so, the outlier should be retained in the analysis; if not, it should be deleted. The third class of outlier comprises extraordinary observations for which the researcher has no explanation. Although these are the outliers most likely to be omitted, they may be retained if the researcher feels they represent a valid segment of the population. The fourth and final class of outlier contains observations that fall within the ordinary range of values on each of the variables but are unique in their combination of values across the variables (Hair et al., 1998).

In linear regression, an outlier is defined as an observation for which the studentized residual (ri or ri*) is large in magnitude compared to other observations in the data set. Observations are judged as outliers on the basis of how unsuccessful the fitted regression equation is in accommodating them (Chatterjee and Hadi, 1988).


Potential outliers are observations that have extremely large residuals. They do not fit in with the pattern of the remaining data points and are not at all typical of the rest of the data. As a result, outliers are given careful attention in regression analysis in order to determine the reasons for the large fluctuations between the observed and predicted responses (Richard & Robert, 1980).

Both the predictor and dependent variables will have their parts to play in deciding whether an observation is unusual. The predictor variables determine whether a point has high leverage. The value of the dependent variable, Y, for a given set of X values will determine whether the point is an outlier.

As every data point has an influence on the regression model, outliers can exert an overly important influence on the model because of their distance from other points. Thus an examination of outliers is worth considering before a set of data is analyzed. In the next section, box plot which is a simple technique to identify outliers in a data set will be presented.
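Since the discussion above rests on studentized residuals, a compact way to compute the internally studentized residuals ri through the hat matrix is sketched below (assuming NumPy; the hat-matrix route is the standard one, not taken from the text):

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally studentized residuals r_i = e_i / (s * sqrt(1 - h_ii)),
    where h_ii is the ith diagonal entry of the hat matrix H = X(X'X)^-1 X'."""
    X = np.column_stack([np.ones(len(y)), X])   # prepend the intercept column
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    e = y - H @ y                               # ordinary residuals
    n, p = X.shape
    s = np.sqrt(e @ e / (n - p))                # residual standard deviation
    return e / (s * np.sqrt(1.0 - np.diag(H)))

# Tiny illustration: a straight line with one point pushed off it.
x = np.arange(10.0)
y = 2.0 * x + 1.0
y[5] += 30.0                                    # plant an outlier
r = studentized_residuals(x, y)
print(int(np.argmax(np.abs(r))))  # 5 -- the planted point stands out
```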

3.2.1 Box Plot (Box and Whisker Plots)

A box plot is a diagram that uses the upper and lower quartiles, along with the median and the two most extreme values, to depict a distribution graphically. It is one technique for detecting outliers in a data set and appears in many statistics and management textbooks. There are two types of box plot: the skeletal and the modified box plot (Devore and Peck, 2001).

Definition 3.4

Lower quartile = median of the lower half of the sample
Upper quartile = median of the upper half of the sample

The interquartile range (iqr), a resistant measure of variability, is given by

iqr = upper quartile - lower quartile


Definition 3.5

An observation is an outlier if it is more than 1.5 iqr away from the closest end of the box (the closest quartile). An outlier is extreme if it is more than 3 iqr from the closest end of the box, and it is mild otherwise. A modified box plot represents mild outliers by shaded circles and extreme outliers by open circles. Whiskers extend on each end to the most extreme observations that are not outliers.

The box plot is determined from five specific numbers.

1. The median (Q2).

2. The lower quartile (Q1).

3. The upper quartile (Q3).

4. The smallest value in the distribution.

5. The largest value in the distribution.

The box of the plot is determined by locating the median and the lower and upper quartiles on a continuum. A box is drawn around the median with the lower and upper quartiles (Q1 and Q3) as the box endpoints. These box endpoints (Q1 and Q3) are referred to as the hinges of the box.

At a distance of 1.5 · iqr outward from the lower and upper quartiles are what are referred to as inner fences. A whisker, a line segment, is drawn from the lower hinge of the box outward to the smallest data value inside the fences; a second whisker is drawn from the upper hinge of the box outward to the largest such data value. The inner fences are established as follows.

Q1 - 1.5 · iqr        Q3 + 1.5 · iqr

If there are data beyond the inner fences, then outer fences can be constructed:

Q1 - 3 · iqr        Q3 + 3 · iqr

Figure 3.1 shows the features of a box plot.


[Diagram: a box with hinges at the lower and upper quartiles; inner fences lie 1.5 · iqr and outer fences 3 · iqr beyond each hinge]

Figure 3.1 : Box plot

Data values that lie outside the mainstream of values in a distribution are viewed as outliers; they may simply be the more extreme values of a data set. Values in the data distribution that are outside the inner fences but within the outer fences are referred to as mild outliers. Values outside the outer fences, marked by 0 on the graph, are extreme outliers.
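The construction above can be sketched in a few lines of code. The following is a minimal Python sketch (not from any software used in this study), assuming the quartile convention of Definition 3.4, in which each quartile is the median of the corresponding half of the sorted sample; all function names are illustrative.

```python
# Sketch of Definitions 3.4 and 3.5: quartiles as medians of the halves,
# inner fences at 1.5*iqr and outer fences at 3*iqr beyond the quartiles.
def median(sorted_vals):
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def classify_outliers(values):
    s = sorted(values)
    half = len(s) // 2
    lq = median(s[:half])                           # lower quartile
    uq = median(s[-half:])                          # upper quartile
    iqr = uq - lq                                   # interquartile range
    lo_in, hi_in = lq - 1.5 * iqr, uq + 1.5 * iqr   # inner fences
    lo_out, hi_out = lq - 3 * iqr, uq + 3 * iqr     # outer fences
    # mild: beyond an inner fence but within the outer fences
    mild = [x for x in s if (lo_out <= x < lo_in) or (hi_in < x <= hi_out)]
    # extreme: beyond an outer fence
    extreme = [x for x in s if x < lo_out or x > hi_out]
    return mild, extreme
```

For example, applied to the silver prices analyzed later in this chapter, `classify_outliers` reports 20.6 as a mild outlier and no extreme outliers, agreeing with the hand calculation.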

3.2.2 Existence of Outliers

Using the box plot technique, data sets 1, 2 and 3 will be shown to contain outliers. Only the independent variables, X, are tested.

Data Set 1

First, the data need to be arranged from the smallest value to the largest value (or vice versa) as follows:

xi : 64.2, 76.4, 78.9, 79.7, 82.9, 87.0, 89.4, 89.4, 97.6, 100.8, 102.4, 106.5, 108.1, 109.8, 111.4, 117.9, 118.7, 131.7, 135.0, 156.9, 161.8, 164.2, 230.9, 248.8, 387.8

The quantities needed for constructing the modified box plot are as follows:

Median = 108.1

Lower quartile = 88.2
Upper quartile = 145.95

iqr = upper quartile - lower quartile = 145.95 - 88.2 = 57.75

1.5 · iqr = 1.5 · 57.75 = 86.625


3 · iqr = 3 · 57.75 = 173.25

Thus,

Upper edge of box (upper quartile) + 1.5 · iqr = 145.95 + 86.625 = 232.575
Lower edge of box (lower quartile) - 1.5 · iqr = 88.2 - 86.625 = 1.575

So 248.8 and 387.8 are both outliers at the upper end, and there are no outliers at the lower end.

Since

Upper edge of box + 3 · iqr = 145.95 + 173.25 = 319.2,

387.8 is an extreme outlier and 248.8 is only a mild outlier.
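As a quick cross-check, the Data Set 1 arithmetic above can be recomputed in a few lines of Python. This is a sketch under the Definition 3.4 quartile convention, with rounding applied to absorb floating-point error.

```python
# Recompute the Data Set 1 quartiles and fences (values from the text).
data = sorted([64.2, 76.4, 78.9, 79.7, 82.9, 87.0, 89.4, 89.4, 97.6,
               100.8, 102.4, 106.5, 108.1, 109.8, 111.4, 117.9, 118.7,
               131.7, 135.0, 156.9, 161.8, 164.2, 230.9, 248.8, 387.8])

def med(s):
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

half = len(data) // 2
mid = med(data)                        # median, 108.1
lq, uq = med(data[:half]), med(data[-half:])   # 88.2 and 145.95
iqr = round(uq - lq, 2)                # 57.75
inner = round(uq + 1.5 * iqr, 3)       # upper inner fence, 232.575
outer = round(uq + 3.0 * iqr, 3)       # upper outer fence, 319.2
```

Comparing the data against `inner` and `outer` confirms that 248.8 is a mild outlier and 387.8 an extreme outlier.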

The MINITAB box plot is presented in Figure 3.2.

[MINITAB character box plot; * marks the mild outlier and 0 the extreme outlier; axis from 60 to 360]

Figure 3.2 : Comparative box plot for mass (µg/cm2)

Data Set 2

Here the data for the independent variables X1 (number of mailings, × 1000) and X2 (lead time, in weeks) will be tested.

For independent variable number of mailings,

X1i : 3.7, 4.0, 4.9, 5.0, 6.5, 6.5, 6.5, 6.8, 7.0, 7.0, 7.1, 7.2, 7.3, 7.5, 7.5, 8.1, 9.0, 9.1, 10.0, 11.5, 12.5, 13.0, 18.0, 23.0, 23.5

The quantities needed for constructing the modified box plot are as follows:

Median = 7.3

Lower quartile = 6.5
Upper quartile = 10.75
iqr = 10.75 - 6.5 = 4.25

1.5 · iqr = 1.5 · 4.25 = 6.375


3 · iqr = 3 · 4.25 = 12.75

Thus,

Upper edge of box + 1.5 · iqr = 10.75 + 6.375 = 17.125
Lower edge of box - 1.5 · iqr = 6.5 - 6.375 = 0.125

So, 18.0, 23.0 and 23.5 are all outliers at the upper end.

Since

Upper edge of box + 3 · iqr = 10.75 + 12.75 = 23.5,

there are no extreme outliers in this data set.

The MINITAB box plot is presented in Figure 3.3.

[MINITAB character box plot; the three upper-end mild outliers are marked; axis from 4.0 to 24.0]

Figure 3.3 : Comparative box plot for number of mailings (× 1000)

For independent variable lead time of mailings,

X2i : 2, 3, 4, 5, 6, 6, 6, 9, 9, 10, 10, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 14, 15, 16, 17

The quantities needed for constructing the modified box plot are as follows:

Median = 12
Lower quartile = 6
Upper quartile = 13
iqr = 13 - 6 = 7

1.5 · iqr = 1.5 · 7 = 10.5
3 · iqr = 3 · 7 = 21

Thus,

Upper edge of box + 1.5 · iqr = 13 + 10.5 = 23.5
Lower edge of box - 1.5 · iqr = 6 - 10.5 = -4.5

So, there are no outliers for lead time of mailings, X2.


Data Set 3

The data for the independent variables X1 (copper, cents per lb), X2 (silver, $ per oz) and X3 (aluminium, cents per lb) will be tested.

For independent variable copper,

X1i : 64.2, 66.1, 66.8, 67.0, 72.8, 76.5, 82.5, 84.2, 93.3, 101.3, 120.5, 130.9

The quantities needed for constructing the modified box plot are as follows:

Median = 79.5

Lower quartile = 66.9
Upper quartile = 97.3

iqr = 97.3 - 66.9 = 30.4
1.5 · iqr = 1.5 · 30.4 = 45.6
3 · iqr = 3 · 30.4 = 91.2

Thus,

Upper edge of box + 1.5 · iqr = 97.3 + 45.6 = 142.9
Lower edge of box - 1.5 · iqr = 66.9 - 45.6 = 21.3

There are no outliers for X1.

For independent variable silver,

X2i : 4.4, 5.5, 5.5, 6.1, 6.5, 7.0, 8.0, 8.1, 10.5, 11.1, 11.4, 20.6

The quantities needed for constructing the modified box plot are as follows:

Median = 7.5

Lower quartile = 5.8
Upper quartile = 10.8
iqr = 10.8 - 5.8 = 5.0
1.5 · iqr = 1.5 · 5.0 = 7.5
3 · iqr = 3 · 5.0 = 15.0

Thus,

Upper edge of box + 1.5 · iqr = 10.8 + 7.5 = 18.3
Lower edge of box - 1.5 · iqr = 5.8 - 7.5 = -1.7

So, 20.6 is an outlier at the upper end.


Since

Upper edge of box + 3 · iqr = 10.8 + 15.0 = 25.8,

20.6 is only a mild outlier.

The MINITAB box plot is presented in Figure 3.4.

[MINITAB character box plot; * marks the mild outlier; axis from 6.0 to 21.0]

Figure 3.4 : Comparative box plot for silver ($ per oz)

For independent variable aluminium,

X3i : 39.8, 61.0, 71.6, 72.3, 76.0, 76.0, 77.8, 81.0, 81.0, 81.0, 87.8, 110.1

The quantities needed for constructing the modified box plot are as follows:

Median = 76.9

Lower quartile = 71.95
Upper quartile = 81.0

iqr = 81.0 - 71.95 = 9.05

1.5 · iqr = 1.5 · 9.05 = 13.575
3 · iqr = 3 · 9.05 = 27.15

Thus,

Upper edge of box + 1.5 · iqr = 81.0 + 13.575 = 94.575
Lower edge of box - 1.5 · iqr = 71.95 - 13.575 = 58.375

So, 110.1 is an outlier at the upper end and 39.8 is an outlier at the lower end. Since

Upper edge of box + 3 · iqr = 81.0 + 27.15 = 108.15
Lower edge of box - 3 · iqr = 71.95 - 27.15 = 44.8,

39.8 and 110.1 are both extreme outliers.

The MINITAB box plot is presented in Figure 3.5.


[MINITAB character box plot; 0 marks the extreme outliers at both ends; axis from 45 to 105]

Figure 3.5 : Comparative box plot for aluminium (cents per lb)

3.3 Analysis Using Least Squares Method

In this section, the three data sets will be analyzed using the least squares method to produce the best polynomial.

3.3.1 Analysis on Data Set 1

In this data set, only the first 20, 19 and 18 pairs (yi, xi) of observations will be used to analyze the data under the following conditions: outliers retained, the mild or the extreme outlier removed, and both the mild and the extreme outliers removed. The last five observations will be used for prediction using the least squares line equations obtained from this data set.

(i) Contained outliers

Recall Table 3.1. It is calculated that

Σ xi = 2731.8,  Σ xi² = 484531.16,  Σ yi = 2654,  Σ xiyi = 444011.2  (sums over i = 1, ..., 20)

Using equations (2.10) and (2.11), we have

b = [n Σ xiyi - (Σ xi)(Σ yi)] / [n Σ xi² - (Σ xi)²]
  = [20(444011.2) - 2731.8(2654)] / [20(484531.16) - (2731.8)²]
  = 0.7316

and

a = [Σ yi - b Σ xi] / n
  = [2654 - 0.7316(2731.8)] / 20
  = 32.7708

Then, the least-squares line is

yi = a + bxi + ei

yi = 32.7708 + 0.7316xi + ei

or

ŷi = 32.7708 + 0.7316xi        (3.1)

The scatter plot with this best straight line is shown in Figure 3.6.
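The computation above can be reproduced directly from the summary sums. The following Python sketch applies the slope and intercept formulas of equations (2.10) and (2.11); note that, like the text, it carries the rounded slope into the intercept.

```python
# Slope b and intercept a of the least-squares line from the summary
# sums for the first 20 observations of Data Set 1 (values from the text).
n = 20
sum_x, sum_x2 = 2731.8, 484531.16     # sum of x_i and of x_i squared
sum_y, sum_xy = 2654.0, 444011.2      # sum of y_i and of x_i * y_i

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = round(b, 4)                       # 0.7316
a = round((sum_y - b * sum_x) / n, 4) # 32.7708
```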

[Scatter plot: elemental carbon (µg/cm2) on the vertical axis, 0 to 400, against mass (µg/cm2) on the horizontal axis, 0 to 600]

Figure 3.6 : The scatter plot of elemental carbon (µg/cm2) against mass (µg/cm2)


(ii) Remove mild outlier

Here, the 12th observation (the upper-end mild outlier), (y12, x12) = (204, 248.8), will be removed. The remaining 19 pairs of observations will be used for analysis.

Recall Table 3.1. It is calculated that

Σ xi = 2483,  Σ xi² = 422629.72,  Σ yi = 2450,  Σ xiyi = 393256  (sums over i = 1, ..., 19)

Using equations (2.10) and (2.11), we have b = 0.7446

and

a = 31.6345

Then, the least-squares line is

ŷi = 31.6345 + 0.7446xi

(3.2)

(iii) Remove extreme outlier

Now, the 17th observation (the upper-end extreme outlier), (y17, x17) = (310, 387.8), will be removed.

Recall Table 3.1. It is calculated that

Σ xi = 2344.0,  Σ xi² = 334142.32,  Σ yi = 2344,  Σ xiyi = 323793.2  (sums over i = 1, ..., 19)

Using equations (2.10) and (2.11), we have b = 0.7698

and

a = 28.3933

Then, the least-squares line is

ŷi = 28.3933 + 0.7698xi

(3.3)


(iv) Remove both mild and extreme outliers

Now, the 12th and 17th observations (the mild and the extreme outlier) will be removed.

Recall Table 3.1. It is calculated that

Σ xi = 2095.2,  Σ xi² = 272240.88,  Σ yi = 2140,  Σ xiyi = 273038  (sums over i = 1, ..., 18)

Using equations (2.10) and (2.11), we have b = 0.8442

and

a = 20.6206

Then, the least-squares line is

ŷi = 20.6206 + 0.8442xi

(3.4)

3.3.2 Analysis on Data Set 2

For this set of data, only 20 pairs (yi, x1i, x2i) of observations will be used for analysis. Then 19 and 17 pairs of observations will be used to analyze the data when one of the three mild outliers, and then all three mild outliers, are removed from the data set using the MINITAB software package.

(i) Contained outliers

Recall Table 3.2. It is calculated that

Σ x1i = 193.7,  Σ x1i² = 2481.57,  Σ x2i = 194,  Σ x2i² = 2206,  Σ x1ix2i = 2111.5,
Σ yi = 665,  Σ x1iyi = 7127.2,  Σ x2iyi = 6959  (sums over i = 1, ..., 20)

Using equations (2.14) to (2.16), we have the normal equations

an + b1 Σ x1i + b2 Σ x2i = Σ yi
a Σ x1i + b1 Σ x1i² + b2 Σ x1ix2i = Σ x1iyi
a Σ x2i + b1 Σ x1ix2i + b2 Σ x2i² = Σ x2iyi

which, on substituting the sums, become

20a + 193.7b1 + 194b2 = 665        (3.5)
193.7a + 2481.57b1 + 2111.5b2 = 7127.2        (3.6)
194a + 2111.5b1 + 2206b2 = 6959        (3.7)

Solving equations (3.5), (3.6) and (3.7) simultaneously, we obtain

a = 22.2,  b1 = 1.1,  b2 = 0.0167
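The simultaneous solution of such a 3 × 3 system can be sketched in pure Python by Gauss-Jordan elimination. This is an illustration, not the method used in the text; since the printed coefficients are rounded, the check below verifies only that the computed solution satisfies the equations as entered.

```python
# Solve a small linear system by Gauss-Jordan elimination with
# partial pivoting; each row holds the coefficients plus the
# right-hand side of one normal equation.
def solve(rows):
    n = len(rows)
    m = [row[:] for row in rows]          # work on a copy
    for col in range(n):
        # bring the row with the largest entry in this column up (pivot)
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        # eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

normal_eqs = [[20.0, 193.7, 194.0, 665.0],       # equation (3.5)
              [193.7, 2481.57, 2111.5, 7127.2],  # equation (3.6)
              [194.0, 2111.5, 2206.0, 6959.0]]   # equation (3.7)
a, b1, b2 = solve(normal_eqs)
# the solution satisfies each equation up to floating-point error
for c0, c1, c2, rhs in normal_eqs:
    assert abs(c0 * a + c1 * b1 + c2 * b2 - rhs) < 1e-6
```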

Thus, the least-squares line is

ŷi = 22.2 + 1.1x1i + 0.0167x2i        (3.8)

(ii) Remove first mild outlier

The first mild outlier is (y7, x1,7, x2,7) = (52, 18.0, 17). The remaining 19 pairs of observations will be used for analysis. Using the MINITAB software package, the least-squares line is

ŷi = 23.1 + 0.932x1i + 0.02x2i        (3.9)

The output of the MINITAB analysis is given in Appendix A1.
