
TRANSPORTATION RESEARCH RECORD 1617, Paper No. 98-1115

Annual Average Daily Traffic Prediction Model for County Roads
DADANG MOHAMAD, KUMARES C. SINHA, THOMAS KUCZEK,
AND CHARLES F. SCHOLER

D. Mohamad, K. C. Sinha, and C. F. Scholer, Department of Civil Engineering, and T. Kuczek, Department of Statistics, Purdue University, West Lafayette, IN 47907.

A traffic prediction model that incorporates relevant demographic variables for county roads was developed. Field traffic data were collected from 40 out of 92 counties in Indiana. The selection of a county was based on population, state highway mileage, per capita income, and the presence of interstate highways. Three to four automatic traffic counters were installed in each selected county. Most counters installed on the selected road sections were based on the standard 48-hour traffic counts. Then, the obtained average daily traffic was converted to annual average daily traffic by means of adjustment factors. Multiple regression analysis was conducted to develop the model. There were quantitative and qualitative predictor variables used in the model development. To validate the developed model, additional field traffic data were collected from eight randomly selected counties. The accuracy measures of the validation showed the high accuracy of the model. The statistical analyses also found that the independent variables employed in the model were statistically significant. The number of independent variables included in the model was kept to a minimum.

Traffic monitoring involves the collection of many types of data, such as traffic volume, traffic composition, vehicle speeds, and vehicle weights. One of the most important traffic-monitoring variables is the annual average daily traffic (AADT) for a section of road. AADT is defined as the average, over a year, of the number of vehicles that pass through a particular section of a road each day. The most reliable approach to obtain the AADT values is to install automatic traffic recorders (ATRs) at all sections of each road in the networks. However, this is not a practical approach because of the expense involved in purchasing and installing ATRs. This study focuses on the problem of installing a limited number of ATRs to estimate AADT on county roads, which are not currently monitored by the Indiana Department of Transportation (INDOT).

The common approach for developing the AADT prediction model is multiple linear regression. Prediction of AADT on road sections is a difficult task as it is dependent on many variables. However, a desirable traffic prediction model is the one that requires the least number of independent variables to monitor. The AADT prediction model must be reasonably easy and economical to use. The developed traffic prediction model should also produce information for decision makers in a form that does not require extensive training to understand.

BACKGROUND INFORMATION

All states maintain ongoing traffic-monitoring programs on the state highways for planning, estimating vehicle miles traveled, tracking volume trends, and conducting traffic engineering analysis (1). In contrast, there is no program of a similar nature available for county roads. Traffic monitoring on county roads is not a responsibility of state departments of transportation. Instead, it is a responsibility of county highway departments.

Several studies have been conducted for INDOT to forecast AADT on the state highway systems (2–4). Since INDOT maintains traffic-monitoring programs on the state highway systems and records the traffic counts continuously each year, the development of traffic forecasting models for the state highway systems is possible. However, little attention has been focused on the prediction of daily traffic on county roads due to lack of traffic records.

Traffic forecasting methodology is highly advanced at the urban level. Most large metropolitan areas have developed and implemented a fairly sophisticated set of computer-based travel simulation models based on the common four-step process. At the county level, however, this process is not nearly so advanced. Most county roads carry a low average daily traffic (ADT) volume. Therefore, an easy-to-implement traffic prediction model is believed to be desirable for local agencies.

LITERATURE REVIEW

Estimation of AADT

Normally, the estimation of AADT on county roads, if available, is obtained by a short-duration count. The standard short duration that has been applied in some counties is 48 hours, taken some time during the period between Monday noon and Friday noon. Then, the observed traffic data are multiplied by adjustment (monthly variation) factors. This method assumes that the traffic on county roads has little daily or weekly variation, and, consequently, only monthly adjustment factors are considered. In addition, it is also assumed that there are few vehicles with more than two axles on county roads, and, therefore, no adjustment factor for the number of axles is applied.

Traffic Prediction Models

An elasticity-based approach is the most commonly used approach in the development of prediction models. It was applied to develop traffic prediction models for the rural state highway systems in New York (5). The models predicted future-year AADT as a function of base-year AADT, modified by various demographic factors. It was found that each functional class model had different significant factors. For example, for the interstate traffic prediction model, AADT
was a function of county automobiles and town households. For the principal arterial model, AADT was a function of county households and town population; and for the minor arterial and major collector model, AADT was a function of town households.

A similar study was conducted in Indiana to develop traffic prediction models for rural state highways (2). Factors suspected to influence traffic (population, vehicle registration, and employment) were considered independent variables in the model development.

Neither of the studies mentioned above tested for collinearity problems. Principal component analysis, which can remedy the collinearity problem, was introduced in other Indiana studies (3,4). It was found that the elasticity-based model was more efficient than the principal component model.

There were several other studies that applied regression analysis. In the Kentucky study (6), a model to forecast highway traffic volumes on the state highway systems was developed in a two-step modeling process. In the first step, a linear regression related average travel to personal income, fuel price, and total miles of streets and highways. In the second step, cross-tabulation models related site-specific traffic growth to statewide ADT.

In the Minnesota study (7), the use of regression analysis for estimating AADT for road sections was investigated. Seven potential predictor variables of traffic volume were chosen from the Minnesota Department of Transportation road-log database. The variables considered were county population, number of lanes, width of road section, a two-category qualitative variable indicating whether there is control of access to the road section, a four-category qualitative variable indicating road functional class, a five-category qualitative variable indicating availability status of the road section to trucks, and a three-category qualitative variable indicating type of locale.

Some other factors that were assumed to affect traffic growth pattern were investigated in a study (8) that included geographic location, type and width of pavement, proximity to an urban area, and types of services the roadway provides.

None of the studies mentioned developed traffic prediction models for county roads, because county road traffic data were unavailable. In the present study, an effort has been made to collect traffic data on county roads. All the methodologies used in the aforementioned studies, especially the analysis applied in the Minnesota study, were adopted in the present study's model development.

METHODOLOGY

The most commonly used form of a multiple regression equation is as follows:

$$\mathrm{AADT}_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_m X_{im} + \varepsilon_i \tag{1}$$

where

$\mathrm{AADT}_i$ = the value of the dependent variable in the $i$th observation, $i = 1, \ldots, n$,
$X_{ij}$ = the value of the $j$th independent variable in the $i$th observation, $j = 1, \ldots, m$,
$\beta_0$ = constant term,
$\beta_j$ = regression coefficient for the $j$th independent variable,
$\varepsilon_i$ = error term,
$n$ = number of observations, and
$m$ = number of independent variables.

The validation of the selected regression model is required in the model development. Collection of new data was conducted in the present study to check its predictive ability. Validation of a fitted regression is the demonstration or confirmation that the model is sound and effective for the purpose for which it was intended. Validation of the model requires assessing the effectiveness of the fitted equation against an independent set of data and is essential if confidence in the model is to be expected.

Bias and variance are sometimes incorporated into a single measure called the mean squared prediction error (MSPR). MSPR is defined as the average squared difference between independent observations and predictions from the fitted equation for the corresponding values of the independent variables.

$$\mathrm{MSPR} = \frac{\sum_{i=1}^{n^*} \left( Y_i - \hat{Y}_i \right)^2}{n^*} \tag{2}$$

where

$Y_i$ = the value of the response variable in the $i$th validation case,
$\hat{Y}_i$ = the predicted value for the $i$th validation case based on the model-building data set, and
$n^*$ = the number of cases in the validation data set.

If the MSPR is fairly close to the mean squared error (MSE) based on the regression fit to the model-building data set, then the error mean square MSE for the selected regression model is not seriously biased and gives an appropriate indication of the predictive ability of the model. If the MSPR is much larger than the MSE, one should rely on the mean squared prediction error as an indicator of how well the selected regression model will predict in the future.

VARIABLE DESCRIPTION

Response Variable

The only response variable that needed to be predicted was AADT of county roads. In the present study, the standard 48-hour traffic counts or coverage count stations were installed to collect daily traffic volume on county roads. The automatic traffic counters were installed within 40 selected counties from February to August of 1996.

Independent Variables

These independent variables are generally county characteristics:

• County Population (CPOP): The County and City Data Book (9) provides county population values. These values were taken as an independent variable on the assumption that AADT is dependent on the number of people living nearby.
• County Households (CHH): The number of occupied housing units is the same as the number of households. The data can be obtained from the County and City Data Book (9).
• County Vehicle Registration (CVR): The total number of vehicle registrations in the county where a road segment is located is published each year (10).
• County Employment (CEMP): County employment, which can also be obtained from the County and City Data Book (9), is an economic variable.
• County Per Capita Income (CPCI): Similar to employment, it is believed that daily traffic will increase as the per capita income increases. The County and City Data Book (9) provides the per capita income at the state and county level.
• County State Highway Mileage (CSHM), Arterial Mileage (ART), and Collector Mileage (COLL): CSHM, ART, and COLL are adopted on the assumption that there exists a significant relationship between state highway mileage and AADT on county roads. It is assumed that the more state highway mileage that is available, the less daily traffic there is on county roads due to the different pavement condition between state roads and nonstate roads. These variables were obtained from INDOT (11).
• Location (LOCALE): Location is a variable that indicates type of locale, rural or urban. This variable is assumed to have a significant relationship with AADT. A county road within urban areas will carry more daily traffic than a county road within rural areas. The urban and rural classifications are based on population size. Roads located in areas with a population of fewer than 5,000 are classified as rural roads. Roads in areas with a population of more than 5,000 are classified as small urban to urbanized roads (3,12).
• Presence of Interstate Highways (INT): More than half of the counties in Indiana (40 out of 92 counties) have no interstate segments within their county (11). Even though most interstate systems have no exits on the county roads, it is assumed that daily traffic using the interstate highways will divert if the county roads provide a shortcut to the final destination.
• Accessibility (ACCESS): A county road that has easy access to the state highway system is expected to carry more daily traffic than a county road that has difficult access. The term easy means that a county road is closest to the state systems or closest to an urban area.

AADT PREDICTION MODEL

Many predictor variables of interest in traffic prediction models are quantitative. These quantitative variables are county population, households, employment, width of road section, per capita income, gas consumption, and vehicle registration. However, there are some common predictor variables of interest in traffic prediction models that are qualitative. Examples of common qualitative predictor variables in the traffic study are functional class, access control, a qualitative variable indicating locale, and a qualitative variable indicating availability status of road section to trucks (7).

AADT Database

Three of the 11 considered variables were qualitative, requiring three indicator variables. A description of all variables included in the analysis follows.

• AADT (Y): the only response variable that will be predicted;
• LOCALE (X1): 1 = urban; 0 = rural;
• ACCESS (X2): 1 = easy access or close to the state highway; 0 = otherwise;
• INT (X3): 1 = having interstate; 0 = none;
• CPOP (X4): county population;
• CSHM (X5): total state highway mileage of a county;
• CPCI (X6): per capita income;
• CHH (X7): total households;
• CVR (X8): total vehicle registration of a county;
• CEMP (X9): total employment;
• ART (X10): total arterial mileage of a county;
• COLL (X11): total collector mileage of a county.

Descriptive Statistics

Descriptive statistics, including the mean, standard deviation, minimum, and maximum values of each variable, were generated by the SAS procedure MEANS (13). Outlying values of a variable were detected by examining its standard deviation, which is sensitive to the influence of a few extreme observations such as outliers.

Relationships between Pairs of Variables

The correlations between AADT and X1 and between AADT and X2 were 0.55 and 0.46, respectively, indicating that the AADT was significantly influenced by the location and accessibility of the roads. The correlations between AADT and X4, X7, X8, and X9 were 0.32, 0.30, 0.31, and 0.30, respectively, indicating that the correlation between AADT and these four demographic variables was also significant. However, these four variables were highly correlated with each other.

The correlation between AADT and X10 was 0.35, indicating that the AADT was affected by the total arterial mileage within the county in which the road sections were located. The correlations between AADT and X5 and X11, however, were rather marginal to insignificant.

Model Development

Full Model

An initial regression fit of a first-order model with ordinary least squares, using all 11 predictor variables, was conducted. A fundamental assumption underlying multiple linear regression analyses is that all random errors have the same variance (14). Outliers may be considered a special case of unequal variances since such observations may have very large variances.

The inclusion of a large number of variables in a regression model often results in multicollinearity. Multicollinearity in turn often produces coefficient estimates that have large variances, with a consequent lack of statistical significance, or that have incorrect signs (15). One of the statistics commonly used to ascertain the degree and nature of multicollinearity is the variance inflation factor (VIF). The results of the SAS procedure REG with the VIF option and the SAS procedure PLOT are presented in Figures 1 and 2, respectively.

Regression diagnostics for this initial fit suggested several problems. First, the variance inflation factors of some variables were very large, indicating that some predictor variables may be involved in multicollinearities. The variance inflation factor of a predictor is defined as 1/(1 − R²), where R² is the coefficient of multiple determination obtained when that predictor is regressed on the other predictors. In Figure 1, the R² of the full regression was 0.4644. Since 1/(1 − 0.4644) = 1.87, any variable with a VIF value exceeding 1.87 was more closely related to the other independent variables than to the response variable. It can be observed that most predictor variables have a VIF value greater than 1.87.
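The VIF check described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only, not the SAS procedure REG output used in the study. The function names `variance_inflation_factors` and `vif_threshold` and the small demonstration matrix are hypothetical. Each VIF is computed by regressing one predictor on the remaining predictors, and the comparison level 1/(1 − R²) is computed from the R² of the fitted model, as in the text.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X.

    R_j^2 is the coefficient of determination obtained when column j
    is regressed (with an intercept) on all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    vifs = np.empty(m)
    for j in range(m):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        ss_res = residuals @ residuals
        ss_tot = ((y - y.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


def vif_threshold(model_r_squared):
    """Comparison level 1 / (1 - R^2) based on the fitted model's R^2."""
    return 1.0 / (1.0 - model_r_squared)


if __name__ == "__main__":
    # Hypothetical predictors: the first two columns are nearly collinear.
    demo = [[1, 1.1, 0], [2, 1.9, 1], [3, 3.2, 0], [4, 3.8, 1], [5, 5.1, 0]]
    print(variance_inflation_factors(demo))
    # R^2 values reported in the text for Figure 1 and Figure 4.
    print(round(vif_threshold(0.4644), 2))  # about 1.87
    print(round(vif_threshold(0.7511), 2))  # about 4.02
```

Predictors whose VIF exceeds the value returned by `vif_threshold` would be flagged under the rule of thumb quoted in the text.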
FIGURE 1 First-order regression model with all 11 predictor variables.

FIGURE 2 Plot of residual against predicted value of Y.


Second, due to the presence of serious multicollinearity, nonsignificant results in individual tests on the regression coefficients can occur. For example, only two predictor variables were found to be significant in the regression results above. Multicollinearity increases the instability of the coefficient estimates (16). One approach to remedy the multicollinearity problem is to express the model in terms of centered independent variables.

Finally, the residual plot against predicted values revealed that the error variance was not constant. Figure 2 shows a prototype picture of residual plots when the error variance increases with predictor variables. This type of departure from constancy of the error variance is called the megaphone type. The plot suggests that the larger the predicted value of Y, the more spread out the residuals are. Transformation on the response variable for stabilizing the error variance is conducted in the following section.

Variable Transformations

To remedy unequal error variances from a regression analysis, a transformation on Y was conducted. The most common transformation form is the logarithmic transformation Y′ = log10(Y); in this study, Y′ = log10(AADT). Such a transformation on Y may, at the same time, also help normalize the error terms.

To remedy a multicollinearity problem, centering the predictor variables is one of the most common approaches. Centering the variables places the intercept at the means of all the variables. Therefore, if the variables have been centered, the intercept has no effect on the multicollinearity of the other variables (17). Centering involves taking the difference between each observation and the mean of all observations for the variable.

Variable Selection

Since many of the variables appeared to be unimportant and some problems were encountered in the initial regression, subset selection procedures to identify promising models were used. Several selection options are available in the SAS procedure REG. These options are R-SQUARE, Cp Versus P Plot, Forward Selection, Backward Elimination, and Stepwise Selection. Three of them were used in this study.

First, the subset models were chosen using Backward Elimination by starting with the full model and then eliminating at each step the one variable whose deletion would cause the residual sum of squares to increase the least. Some predictor variables were insignificant in the first step, so they were deleted one by one from the model, beginning with the smallest partial sum of squares or the largest P-value, until no further deletion was possible. At the end of the seventh step, four predictor variables were found to be statistically significant. These variables were X1, X2, X4, and X10, as defined earlier.

Second, an optimum subset model is one that, for a given number of variables, produces the minimum error sum of squares, or, equivalently, the maximum R². Conceptually, the only way to ensure that the best model for each subset size has been found is to examine all possible subset regressions. This procedure, for m predictor variables, requires computing 2^m equations for finding optimum subsets for all subset sizes. Since this study has 11 predictor variables, 2^11 = 2,048 regression equations must be evaluated in order to find an optimum subset model. Fortunately, the SAS procedure REG has an R-SQUARE option to provide such a computation. The subset models were ranked within each subset size from the best to the worst fitting model.

The full model accounted for 76.6 percent of the variation in the dependent variable AADT. Of the single-variable subsets, the first best, X1 (location), accounted for 58.1 percent of the variation in AADT, 18.5 percent below the maximum. The second best single-variable subset, X2 (accessibility), accounted for 54.8 percent of the variation in AADT. The best two-variable model, X1 and X2, accounted for 72.6 percent, only 4 percent below the maximum.

It was also observed that R-SQUARE remained virtually unchanged down to subsets of size four, indicating that the 4-variable model including X1 (location), X2 (accessibility), X4 (county population), and X10 (total arterial mileage in a county), which matched the result of the backward elimination, may be most useful. Its R² was 75.1 percent (1.5 percent below the maximum), and the remaining models, the 5-variable to the 10-variable models, increased R² by only 0.7 percent to 1.4 percent.

Third, the SAS procedure REG provides another variable selection aid by plotting the Cp statistic versus P. The resulting plot for this study appears in Figure 3. According to the literature (18), for any given number of selected predictor variables, larger Cp values indicate equations with larger error mean squares. For any subset model with Cp > (p + 1), there is evidence of bias due to an incompletely specified model. On the other hand, if Cp < (p + 1), the full model is said to be overspecified, that is, it contains too many variables.

Figure 3 reveals that the first three subsets, whose Cp values are greater than (p + 1), reflect the contribution of bias due to an incomplete model or omitted variables. The pattern of Cp values is quite typical for situations in which multicollinearity is serious. Beginning with P = 11 predictor variables, the Cp values initially became smaller than (p + 1) but eventually started to increase. In this plot, there is a definite corner at P = 4, where the Cp values increased rapidly with smaller subset size. Hence, a model with four variables appeared to be a good choice. All four-variable models had Cp values of less than (p + 1), indicating that these models could be used for prediction. However, the first four-variable model, which included X1, X2, X4, and X10, had the highest R².

All variable selection methods gave the same result: the AADT prediction model must include these four predictor variables. Therefore, the regression model on these four predictor variables was investigated in more detail in the following section.

Reduced Model

On the basis of the previous analyses, a model with four selected predictor variables, X1, X2, X4, and X10, was investigated. The regression on these four predictor variables is presented in Figure 4. This figure shows that the multicollinearity greatly decreased compared with the results in Figure 1. In Figure 4 the R² is 0.7511, so the corresponding VIF threshold was 1/(1 − 0.7511) ≈ 4. All coefficients had a VIF value of less than four, which indicated that multicollinearity did not exist in the model. Further, all t-statistics showed the significance of the estimated coefficients at the 5 percent level.

All coefficient signs were as expected. For example, the positive sign for X1 indicated that urban county roads have more daily traffic than rural county roads. Similarly, the positive sign for X2 indicated that the easy-access county roads carry more traffic than the difficult-access county roads. The positive sign for X4 showed that the roads within a high-population county carry more traffic than the roads within a low-population county. Finally, the negative sign for
FIGURE 3 Cp versus P plot.

FIGURE 4 Regression for the best four-variable model.


X10 indicated that the more arterial mileage there is within a county, the less traffic there is on county roads. It is, therefore, expected that drivers on local roads will divert as soon as they reach arterial roads.

Diagnostics

The purpose of diagnostics is to exhibit the degree of appropriateness of the reduced model. In this section, a number of refined diagnostics for checking the adequacy of the reduced regression model were conducted. These include methods for detecting outliers, nonconstancy of error variance, and nonnormality of error terms.

Presence of Outliers

Residual outliers can be identified from residual plots against predicted values of the response variable and from stem-and-leaf plots. Both plots indicated that there were some outliers in the data set. Two observations had a residual of more than one. A possible cause for one observation was that the traffic counter did not count for a full 48-hour period; it counted for only 28 hours because the counter was removed early by an unknown party. The outlier in the other observation probably occurred because the traffic counter was installed too close to the city of Indianapolis, whose population was far beyond the range of values in this study.

Nonconstancy of Error Variance

If the model is correct, the residuals should be without pattern or structure, which can be examined by plotting the residuals against the predicted values. The results of this plot suggest the constancy of error variances. As mentioned earlier, the logarithmic transformation on the response variable helped stabilize the error variance.

Nonnormality of Error Terms

The error terms are often assumed to follow a normal distribution. Since the validity of many results often depends on the validity of the normal assumption, a test for normality was conducted. The normality of the error terms can be studied informally by examining the residuals in a variety of graphic ways.

The standard and widely used tests for normality are the Shapiro-Wilk test and normal probability plots. The SAS procedure UNIVARIATE was used in this study to test the normality of error terms for the best four-variable model. The normal probability plot and the W-value obtained from the UNIVARIATE procedure appear in Figure 5. It shows that the W-value, or the coefficient of correlation between the ordered residuals and their expected values under normality, was 0.9791. A small value of W leads to the rejection of the null hypothesis of normality. Controlling the α risk at 0.05, using the table “Critical Values for Coefficient of Correlation Between Ordered Residuals and Expected Values under Normality When Distribution of Error Terms Is Normal” (14), the critical value for n = 89 was determined as 0.9859. Since the observed coefficient was lower than this level, the distribution of the error terms departed slightly from a normal distribution.

The departure of error terms from a normal distribution was probably due to the presence of the previously discussed outliers in the data set. This was visually indicated by the normal probability plot in Figure 5. It is clear from Figure 5 that five observations deviate from the expected straight line, indicating that the distribution of the error terms is slightly non-normal, which supports the previous paragraph’s assertion.

Pre- and Postanalysis of Outliers

An alternative model without two outliers was regressed. The results of this regression, including the regression of the full data model, are tabulated in Table 1 for comparison purposes.

Both models had the constancy of error variances. The model without two outliers fitted the assumption of normality and had an R² larger than the R² of the full data model. Therefore, this model was selected as a final model for the AADT prediction on paved county roads. The model is written as follows:

$$\log_{10}(\mathrm{AADT}) = 4.82 + 0.82\,X_1 + 0.84\,X_2 + 0.24\,\log(X_4) - 0.46\,\log(X_{10}) \tag{3}$$

Model Validation

In the model development, the fitted regression model (Equation 3) was finally selected as the AADT prediction model for paved county roads, and it was then checked for validity. The model was validated by collecting new data on the response variable, which were then compared with the predicted values obtained from the selected model.

Eight counties in Indiana were randomly selected to obtain average daily traffic for validation purposes. These counties were in addition to the previously selected 40 counties. However, the values of their independent variables were still within the model range.

The validation method used was designed to calibrate the predictive capability of the selected regression model. To calibrate the predictive ability of the regression model fitted from the model-building data set, the predicted values of the AADT were calculated by entering the validation data set into the model. The observed and predicted values of the AADT are presented in Table 2. Then, the mean squared prediction error as in Equation 2 was calculated. Since logarithmic transformation was used in the model development, both observed and predicted AADT were transformed into logarithmic form. The MSPR for the AADT prediction model was 0.0510.

Since the MSPR did not differ greatly from the MSE obtained from the model-building data set, the MSE of the selected regression model was not seriously biased and gave an appropriate indication of the predictive ability of the model. These validation results supported the appropriateness of the model selected.

The percentage difference between the AADT observed from the counters and the AADT predicted from the model was also examined. The comparison of these two AADT values is presented in Table 2. The percentage difference between them ranged from 1.56 percent to 34.18 percent, with an average difference of 16.78 percent.

SUMMARY AND CONCLUSIONS

This study developed an efficient, inexpensive, and easy-to-use method to estimate the daily traffic on county roads. Multiple regression analysis was used to develop the AADT prediction model. It was found that transformation of the response variable and centering of the independent variables were required because all variables were not normally distributed.
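The two preprocessing steps named above, together with the log-scale MSPR of Equation 2 used in the validation, can be sketched as follows. This is a minimal Python and NumPy sketch under stated assumptions, not the SAS workflow of the study. The function names and all numbers in the demonstration are hypothetical. Reusing the means of the model-building data when centering validation data is an assumed convention that the text does not spell out.

```python
import numpy as np


def log10_response(aadt):
    """Logarithmic transformation of the response: Y' = log10(AADT)."""
    return np.log10(np.asarray(aadt, dtype=float))


def center_columns(X, means=None):
    """Subtract column means from X and return (centered X, means used).

    If means is given (for example, the means of the model-building
    data), those values are used instead of the means of X itself.
    """
    X = np.asarray(X, dtype=float)
    if means is None:
        means = X.mean(axis=0)
    return X - means, means


def mspr_log10(observed_aadt, predicted_aadt):
    """Mean squared prediction error (Equation 2) on the log10 scale."""
    diff = log10_response(observed_aadt) - log10_response(predicted_aadt)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    # Hypothetical predictor values (population, arterial mileage).
    predictors = [[30000.0, 80.0], [50000.0, 120.0], [70000.0, 100.0]]
    centered, means = center_columns(predictors)
    print(centered.mean(axis=0))  # column means are zero after centering

    # Hypothetical observed and predicted AADT for validation sites.
    observed = [450.0, 1200.0, 300.0]
    predicted = [500.0, 1000.0, 330.0]
    print(mspr_log10(observed, predicted))
```

Comparing the value returned by `mspr_log10` with the MSE of the fitted model follows the validation logic described in the text.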

FIGURE 5 W-value and normal probability plot.
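The quantity described in the text as the coefficient of correlation between the ordered residuals and their expected values under normality can be sketched as below. This is an illustration only. The plotting positions (k − 0.375)/(n + 0.25) are an assumption, the function names are hypothetical, and the result is not the Shapiro-Wilk W statistic that SAS UNIVARIATE reports, so values need not match Figure 5.

```python
from statistics import NormalDist

import numpy as np


def expected_normal_scores(n):
    """Approximate expected standard normal order statistics for size n.

    Assumes plotting positions (k - 0.375) / (n + 0.25), k = 1..n.
    """
    k = np.arange(1, n + 1)
    positions = (k - 0.375) / (n + 0.25)
    standard_normal = NormalDist()
    return np.array([standard_normal.inv_cdf(p) for p in positions])


def normal_probability_correlation(residuals):
    """Correlation between ordered residuals and expected normal scores.

    Scaling the scores by sqrt(MSE) would not change the correlation,
    so the unscaled scores are used.
    """
    ordered = np.sort(np.asarray(residuals, dtype=float))
    scores = expected_normal_scores(ordered.size)
    return float(np.corrcoef(ordered, scores)[0, 1])


if __name__ == "__main__":
    # Hypothetical residuals with one extreme value.
    residuals = [0.0] * 9 + [10.0]
    print(normal_probability_correlation(residuals))
```

A coefficient below the tabulated critical value for the sample size (0.9859 for n = 89 at α = 0.05 in the text) would indicate a departure from normality.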

Several variable selection methods in multiple regression analysis were employed to select the best model in the building process. The number of significant predictor variables was four. County population and county arterial mileage were two quantitative significant factors; and location and accessibility were two qualitative significant factors affecting the daily traffic on paved county roads.

R² and normality assumption were adopted as the most useful criteria to gauge the goodness of a statistical model. The R² of the model was 0.77, which indicated that the four significant factors of the model explained about 77 percent of the variability in AADT. The W-value of the model under normality was 0.9865, indicating that their error terms were normally distributed.

Some outliers were observed in data sets. These outliers were analyzed in the model-building process. A model without outliers was finally suggested for use by the county highway department for the AADT prediction on its county roads. A validation of the models through the collection of new traffic data from eight randomly selected counties was conducted and the results showed that the developed models produced predicted values of AADT close to the observed values. The predictive ability of the models, which was calculated as the MSPR, was 0.0510. Since the MSPR of the model did not differ greatly from the MSE, the selected model was not seriously biased and was an appropriate indication of the predictive ability of the model. The MSE of the model obtained from the model-building data set was 0.1606.

Multiple linear regression provided reasonable statistical results. The traffic prediction models were developed as accurately as possible within the limitations of traffic data. All parameter estimates were found to be statistically significant and had an expected sign. It was found that no other variables would provide additional significant predictive power in the models. The developed models could be updated as new traffic data became available. The predictive ability of the developed models was proven through model validation, and the results supported the appropriateness of the developed models.

TABLE 1 Pre- and Postanalysis of Outliers

TABLE 2 Observed and Predicted AADT

REFERENCES

1. Robertson, H. D., J. E. Hummer, and D. C. Nelson. Manual of Transportation Engineering Studies. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1994.
2. Fricker, J. D., and S. K. Saha. Traffic Volume Forecasting Methods for Rural State Highways—Final Report. FHWA/IN/JHRP-86/20. Joint Highway Research Project, School of Civil Engineering, Purdue University, West Lafayette, Ind., 1987.
3. Saha, S. K. The Development of a Procedure to Forecast Traffic Volumes on Urban Segments of the State and Interstate Highway Systems. Ph.D. dissertation. School of Civil Engineering, Purdue University, West Lafayette, Ind., 1990.
4. Clark, D. E. Updating Models to Forecast Traffic Volume on Rural Segments of the State Highway System. Master’s thesis. School of Civil Engineering, Purdue University, West Lafayette, Ind., 1993.
5. Neveu, A. J. Quick-Response Procedures to Forecast Rural Traffic. In Transportation Research Record 944, TRB, National Research Council, Washington, D.C., 1983, pp. 47–53.
6. Deacon, J. A., J. G. Pigman, and A. Mohenzadeh. Traffic Volume Estimates and Growth Trends. UKTRP-87-32. Kentucky Transportation Research Program, University of Kentucky, Lexington, 1987.
7. Cheng, C. Optimal Sampling for Traffic Volume Estimation. Ph.D. dissertation. Carlson School of Management, University of Minnesota, 1992.
8. Morf, T. F., and F. V. Houska. Traffic Growth Patterns on Rural Highways. Bulletin 194, HRB, National Research Council, Washington, D.C., 1958, pp. 33–41.
9. Bureau of the Census. County and City Data Book 1995: A Statistical Abstract Supplement. U.S. Department of Commerce.
10. Motor Vehicle Registration in Indiana by County. Indiana Bureau of Motor Vehicles, 1995.
11. Division of Program Development Mileage. Functional Classification Code. Indiana Department of Transportation, 1995.
12. Kumapley, R. K. Estimating Statewide Vehicle-Miles of Travel in Indiana. Master’s thesis. School of Civil Engineering, Purdue University, West Lafayette, Ind., 1994.
13. Freund, R. J., and R. C. Littell. SAS System for Regression, 2nd ed. SAS Institute Inc., Cary, N.C., 1991.
14. Neter, J., M. H. Kutner, C. J. Nachtsheim, and W. Wasserman. Applied Linear Statistical Models, 4th ed. Irwin, Chicago, 1996.
15. Myers, R. H. Classical and Modern Regression with Application, 2nd ed. PWS and Kent Publishing Company, Inc., Boston, 1990.
16. Chatterjee, S., and B. Price. Regression Analysis by Example, 2nd ed. John Wiley & Sons, Inc., New York, 1991.
17. Belsley, D. A., E. Kuh, and R. E. Welsch. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, Inc., New York, 1980.
18. Rawlings, O. J. Applied Regression Analysis: A Research Tool. Wadsworth & Brooks, Pacific Grove, Calif., 1988.

Publication of this paper sponsored by Committee on Transportation Planning Applications.
