
CICTP 2019 4274

Improving the Quality of Pavement Performance Data in Pavement Management System

Jialing Jiang1; Fujian Ni2; Qiao Dong3; and Linyi Yao4

1
Downloaded from ascelibrary.org by Carleton University on 09/09/19. Copyright ASCE. For personal use only; all rights reserved.

Dept. of Transportation, Southeast Univ., No. 2, Southeast University Rd., Nanjing,
China (corresponding author). E-mail: 2917747916@qq.com
2
Dept. of Transportation, Southeast Univ., No. 2, Southeast University Rd., Nanjing,
China. E-mail: nifujian@gmail.com
3
Dept. of Transportation, Southeast Univ., No. 2, Southeast University Rd., Nanjing,
China. E-mail: qiaodong@seu.edu.cn
4
Dept. of Transportation, Southeast Univ., No. 2, Southeast University Rd., Nanjing,
China. E-mail: 767011072@qq.com

Abstract
High-quality data are a prerequisite for accurate analysis results.
However, because of unavoidable errors, the quality of pavement performance data is
often low, which interferes considerably with maintenance decisions. In this paper, we
first briefly review data quality control aspects and methods. Second, we propose a
data quality control framework suitable for large-scale use in pavement
management systems. The framework consists of data preprocessing, data inspection,
and data correction. Third, we use the rut depth data of a coastal highway as a case
study to verify the framework's feasibility and effectiveness. Finally, we discuss future work.

Keywords: Data quality control; Pavement performance; Data cleaning; Quality management.

INTRODUCTION
A pavement management system is a decision-support tool that helps highway
management departments plan the use of maintenance funds rationally and
maximize the return on investment. The system draws on several disciplines,
such as operations research and systems engineering.
It usually comprises six modules, with the data acquisition module as the
foundation of the others. Pavement performance data include rut depth,
deflection, surface distress, the international roughness index, the sideway-force
coefficient, etc. By analyzing pavement performance data, highway management
departments can assess road conditions and develop a corresponding maintenance
plan to improve pavement performance.

© ASCE


Data quality has been defined in many ways since researchers began to study it in the
1950s. Data quality is the extent to which the data are suitable for use (Huang and Wang 1999).
Cappiello et al. (2004) defined data quality as the degree to which data meet customers’
expectations. In order to measure data quality, Wang and Strong (1996) defined the
concept of “data quality dimension”, which represents an aspect of data quality. Haug
and Arlbjørn (2011) summarized existing data quality dimensions and considered accuracy,
timeliness, consistency and completeness as the core dimensions. Cai and Zhu (2015)
divided data quality into five dimensions, each of which consists of several elements.
The choice of core data quality elements depends on the situation; for example,
for medical data, accuracy and timeliness are more important, as a
doctor needs previous medical records as a reference to learn more about a patient’s
physical condition. In this paper, accuracy, consistency and completeness are regarded
as the core data quality elements in a pavement management system.
An efficient pavement management system is supported by accurate,
consistent, complete, and timely data. Data quality directly determines whether a
system can give correct and timely feedback on pavement conditions. Low data
quality leads to a deviation between assessment results and the actual condition of
pavements, which can result in wasted maintenance funds and negligible
improvement in pavement performance. Research has found that data with large
variability can interfere greatly with pavement performance assessment (Jia et al.
2018). The Practical Guide for Quality Management of Pavement Condition Data
Collection (Pierce et al. 2013) points out that just a one percent difference in the area of
low-severity fatigue cracking can make a 12-point difference in the 100-point
pavement condition index (PCI) calculation. Therefore, the assessment of pavement
performance must be based on high-quality data, which is an essential guarantee of
objective and effective assessment. As to data quality
management, meteorology has formed a mature data quality control system, which
consists of inspection of historical and file extremes and a check of internal and time
consistency (Feng et al. 2004). Dai et al. (2018) integrated deep learning networks and statistical
quality control models. The data quality management program proposed by the
Federal Highway Administration focuses on the source of data production to avoid
mistakes and gives practical methods to improve data quality during
data collection. When it comes to data cleaning, however, there is not yet a systematic
inspection and correction method for pavement management systems.
Based on the exploration and analysis of large amounts of pavement performance
data, this paper puts forward a data quality control process that consists of a data
inspection process and data correction methods and that takes pavement
structure, maintenance information and the traffic environment into account.


ARCHITECTURE
In this paper, the data quality control framework is divided into three stages:
data preprocessing, data inspection, and data correction. Data preprocessing mainly deals
with data format problems and missing-data problems. During this stage, data are
converted into the form required for convenient computation. Data
inspection uses principles of mathematical statistics and the pavement performance
attenuation law to inspect the preprocessed data and divide them into two groups: correct
and wrong. Data correction replaces erroneous data by using various
interpolation methods. The architecture is as follows (see Figure 1).

Figure 1. Architecture

DATA PREPROCESSING

Data format inspection

Since pavement performance data over the years are not necessarily measured
with the same type of equipment, units are sometimes inconsistent.
Before data analysis, all data should be converted to standard units.
Confusion between numeric data and textual data also causes extra trouble in the
subsequent data analysis.
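
The format inspection described above might be sketched as follows. This is only an illustration, not the system's actual implementation: the unit labels, the conversion table `TO_MM`, and the function name `to_millimetres` are all assumptions introduced here. Readings that cannot be parsed as numbers, or that carry an unknown unit, are returned as NaN so that they can later be handled as missing data.

```python
import math

# Hypothetical conversion factors to millimetres; the unit labels are assumptions.
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}

def to_millimetres(value, unit):
    """Convert one reading to millimetres; return NaN if it cannot be interpreted."""
    try:
        number = float(str(value).strip())  # handles numbers stored as text
    except ValueError:
        return math.nan                     # textual residue such as "n/a"
    factor = TO_MM.get(unit.strip().lower())
    if factor is None:
        return math.nan                     # unknown unit label
    return number * factor

# Hypothetical rut depth readings with mixed types and units.
readings = [("12.5", "mm"), (1.3, "cm"), ("n/a", "mm"), ("0.5", "in")]
standardized = [to_millimetres(value, unit) for value, unit in readings]
```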


Missing data inspection


During field testing, the starting and ending stake numbers assigned
by inspectors differ from year to year, which can leave some stake numbers
without detection data. In addition, during data entry, some detection data are
lost due to human negligence or system failure. The impact of a small proportion of
data loss on subsequent maintenance decisions is negligible. If there is a
wide range of data loss, however, it deserves sufficient attention and its causes
should be identified.
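
A minimal sketch of the missing data inspection, assuming stake positions are expressed in metres and readings are expected at a fixed interval. The function names and the sample survey are hypothetical; the paper does not state a numeric threshold for a "wide range" of data loss, so none is applied here.

```python
def missing_stakes(start_m, end_m, step_m, measured):
    """Return the expected stake positions (in metres) that have no reading."""
    expected = range(start_m, end_m + 1, step_m)
    return [stake for stake in expected if stake not in measured]

def missing_ratio(start_m, end_m, step_m, measured):
    """Share of expected stake positions that have no reading."""
    expected_count = len(range(start_m, end_m + 1, step_m))
    return len(missing_stakes(start_m, end_m, step_m, measured)) / expected_count

# Hypothetical survey: a reading is expected every 10 m from 0 m to 100 m.
measured = {0: 3.1, 10: 3.0, 20: 3.3, 50: 3.4, 60: 3.2, 70: 3.5, 90: 3.6, 100: 3.4}
gaps = missing_stakes(0, 100, 10, measured)
ratio = missing_ratio(0, 100, 10, measured)
```

The ratio can then be compared against whatever loss level the managing agency considers acceptable.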
Data transformation
In order to reduce the error caused by inconsistent stake numbers, the entire
road can be divided into small sections, so that the pavement data are
converted to the same form. For instance, the asphalt pavement maintenance
specification stipulates that the rut depth measurement interval is 10 meters. The
pavement is divided into 100-meter sections and the average rut depth
within each 100 meters is calculated for subsequent data inspection.
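
The transformation can be illustrated with a short sketch. It assumes readings are stored as (stake position in metres, value) pairs; the function name and the sample values are hypothetical.

```python
from collections import defaultdict

def section_means(readings, section_length_m=100):
    """Average (stake_m, value) readings within fixed-length sections.

    Returns a dict keyed by the starting position of each section in metres.
    """
    groups = defaultdict(list)
    for stake_m, value in readings:
        groups[int(stake_m // section_length_m)].append(value)
    return {
        index * section_length_m: sum(values) / len(values)
        for index, values in sorted(groups.items())
    }

# Hypothetical rut depth readings (mm) at 10 m spacing, with some gaps.
readings = [(0, 4.0), (10, 6.0), (90, 5.0), (100, 8.0), (150, 10.0)]
means = section_means(readings)
```

Because each section is averaged over whichever readings fall inside it, a shift of the starting stake by a few metres between years changes the section value only slightly.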

DATA INSPECTION PROCESS

Reasonable range inspection


When pavement performance drops to a certain state, for example when rut depth
exceeds 15 mm, appropriate maintenance measures are taken to restore pavement
performance for safe driving. Because maintenance funds differ between
regions, the level of maintenance varies. When determining the reasonable range of
detection indicators, it is necessary to consider comprehensively the road structure,
maintenance intensity, climate, traffic axle load, and other factors in the region. It is
worth noting that, when analyzing a specific case, the data should be separated by
route, lane, and direction before statistical analysis is performed and the
reasonable range is determined. The same road section has the same pavement structure,
traffic volume and climate, so the detection data of its stake numbers should
differ little. However, the data of a few stake numbers differ significantly
from the others because of deviation of the driving route and instability
of the instrument. In this case, the 3σ principle or the box plot is used to
identify abnormal values.
The 3σ principle states that, when the data follow a normal
distribution, a value deviating from the mean by more than three standard
deviations is regarded as an abnormal value. Under the assumption of a normal
distribution, the probability that a value deviates from the mean by more than three
standard deviations is about 0.0027, which is a small-probability event, so such a
value can be regarded as an outlier.
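
The 3σ check can be sketched as below. The use of the population standard deviation and the sample values are assumptions made for illustration. Note that the rule needs a reasonably large sample: with only a handful of values, a single extreme reading inflates the standard deviation so much that it cannot exceed the 3σ limit.

```python
import math
import statistics

def three_sigma_outliers(values):
    """Return the values lying more than three standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [v for v in values if abs(v - mean) > 3 * sd]

# Two-sided tail probability beyond three standard deviations for a normal distribution.
tail_probability = 1 - math.erf(3 / math.sqrt(2))

# Hypothetical section: twenty readings near 5 mm and one suspicious reading.
values = [5.0 + 0.1 * (i % 5 - 2) for i in range(20)] + [15.0]
outliers = three_sigma_outliers(values)
```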
In the box plot, outliers are values less than QL - 1.5 IQR or greater than
QU + 1.5 IQR. QL stands for the lower quartile: 25% of the sample data are
smaller than it. QU stands for the upper quartile: 25% of the sample data are
larger than it. IQR stands for the interquartile range, the
difference between the upper quartile QU and the lower quartile QL. Since the box
plot is based on the quartiles and the interquartile range, which are little affected
by extreme values and do not require the data to follow a particular distribution,
the identification of abnormal data is more objective and reliable.
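
A sketch of the box plot rule using Python's standard library. Quartile conventions differ between tools; this example relies on the default method of `statistics.quantiles`, which is an assumption rather than something the paper specifies, and the sample values are hypothetical.

```python
import statistics

def box_plot_outliers(values, k=1.5):
    """Return values below QL - k*IQR or above QU + k*IQR."""
    q_lower, _, q_upper = statistics.quantiles(values, n=4)
    iqr = q_upper - q_lower
    low, high = q_lower - k * iqr, q_upper + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical rut depths (mm) for one section; one reading is suspicious.
values = [4.8, 4.9, 5.0, 5.0, 5.1, 5.1, 5.2, 5.3, 9.0]
outliers = box_plot_outliers(values)
```

Unlike the 3σ check, this rule can flag the suspicious reading even in a sample of nine values, because the quartiles barely move when one value is extreme.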

Pavement performance attenuation law inspection


The development of pavement performance follows a certain law under the
repeated effects of vehicle load and the environment. Factors such as pavement
structure design, construction level, maintenance level, and vehicle axle load
determine the decay mode of pavement performance. A pavement with a reasonable
structural design and a high construction level shows slower initial performance
attenuation. Proper maintenance slows down the later
rate of performance attenuation.
As road age increases, pavement performance
declines in the absence of road maintenance. Specifically,
rut depth increases, the roughness of the road surface deteriorates, skid
resistance decreases, the pavement structure strength decreases, and pavement
distress increases. After pavement maintenance, pavement performance improves
to a certain extent, and the subsequent attenuation rate depends
on the maintenance measures. Previous researchers have summarized the attenuation
patterns of pavement performance, and each pattern corresponds to several complex
formulas. However, such complex formulas are not necessarily needed to improve the
quality of pavement performance data. This paper establishes the following inspection
process according to the overall trend of pavement performance attenuation.
The first detection data after the latest pavement overhaul are usually regarded
as the benchmark data y1. The next detection data after the benchmark are y2, and
so on. Taking rut depth as an example, if there has been no maintenance between
the two detections, the data should satisfy formula
(1):

$y_2 > y_1$ (1)

Considering error sources such as measuring instruments, environmental changes,
and travel route offsets, the latter detection data are also considered correct if
they fluctuate within a reasonable range of the former data. The reasonable range
coefficient c is obtained by considering the allowable error level and the growth rate
of the index. Formula (1) can therefore be changed to formula (2), with c given by formula (3):


$y_2 \geq c \cdot y_1$ (2)

$c = (1 - a) \times (1 + b)^{n}$ (3)

where a is the allowable error level, b is the growth rate of the index, and n is the
interval between the two inspections.
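
Formulas (2) and (3) can be checked with a few lines of code. The parameter values below (a = 0.10, b = 0.05, n = 1) and the rut depths are purely illustrative assumptions; the paper does not report the values it used.

```python
def tolerance_coefficient(a, b, n):
    """Formula (3): c = (1 - a) * (1 + b) ** n."""
    return (1 - a) * (1 + b) ** n

def passes_trend_check(y1, y2, a, b, n):
    """Formula (2): the later value y2 must be at least c times the earlier value y1."""
    return y2 >= tolerance_coefficient(a, b, n) * y1

# Illustrative values only: 10% allowable error, 5% growth per interval, one interval.
c = tolerance_coefficient(a=0.10, b=0.05, n=1)              # 0.9 * 1.05
plausible = passes_trend_check(6.0, 5.8, 0.10, 0.05, 1)     # small drop within tolerance
implausible = passes_trend_check(6.0, 5.0, 0.10, 0.05, 1)   # drop too large without maintenance
```

With these assumed values, a later rut depth may fall slightly below the earlier one (here down to 0.945 times it) and still be accepted as measurement scatter.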
Summarize the error rate of each detection. If the error rate is below 50%, check
backwards in accordance with the inspection rules. There are two
possibilities when the error rate exceeds 50%:
(a) Maintenance records between the detections are missing, so correct data
from the latter detection were misclassified as abnormal.
(b) There was a significant error in the latter detection, which caused most of
its data to be incorrect.
Which case applies must be determined from the specific circumstances. For the
first case, check whether the subsequent data develop smoothly on the basis of
the latter detection data, that is, whether attenuation continues after the road
performance has improved. If so, rutting treatment was actually carried out between
the two detections, and the latter detection data are regarded as correct. If instead
the subsequent detection data develop smoothly on the basis of the former detection
data, the latter data are abnormal and should not participate in the
subsequent inspection process. For the second case, the latter detection data are regarded as
abnormal directly and excluded from the subsequent inspection process.
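
The error-rate step can be sketched as follows, assuming two consecutive detections are stored as equally long lists of section values and that c comes from formula (3). The function names and data are hypothetical, and treating a rate of exactly 50% as below the threshold is an assumption, since the text leaves that case open. Deciding between cases (a) and (b) still requires comparison with later detections and is not automated here.

```python
def survey_error_rate(previous, current, c):
    """Flag sections whose current value is below c * previous value.

    Returns (error_rate, flags), where flags[i] is True for a failing section.
    """
    flags = [cur < c * prev for prev, cur in zip(previous, current)]
    return sum(flags) / len(flags), flags

def triage(error_rate, threshold=0.5):
    """Decide how a detection is handled, following the 50% rule."""
    if error_rate > threshold:
        return "review whole detection"   # cases (a) or (b)
    return "flag individual sections"

previous = [6.0, 6.5, 7.0, 5.5]            # hypothetical rut depths (mm)
rate_a, flags_a = survey_error_rate(previous, [6.2, 3.0, 7.4, 5.9], c=0.945)
rate_b, flags_b = survey_error_rate(previous, [3.0, 3.2, 7.1, 2.9], c=0.945)
```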

Consistency inspection
The indexes are not isolated; there is a certain relationship between them.
Research (Patrick and Soliman 2018) has found that pavement roughness is closely
related to pavement structure strength and pavement distress. When the pavement
structure strength decreases, pavement cracks develop rapidly and the pavement
condition deteriorates. Detection data that are inconsistent with this relationship
can therefore be eliminated.

DATA CORRECTION
Following the data inspection process above, abnormal data
that do not pass inspection are treated as missing data in subsequent processing. The
abnormal data include missing data, data outside the reasonable range, data that
do not satisfy the pavement performance attenuation law, and data that are not
consistent. The missing data are filled by an appropriate interpolation method to
achieve the goal of data correction.


Mean interpolation method


The mean interpolation method replaces missing values with the mean
value of sample data, and can be divided into the total mean interpolation
method and the group mean interpolation method. The total mean interpolation
method replaces all missing values with the mean value of a given performance
indicator over the entire highway. It is simple and convenient but ignores the differences in
road structure, environment and vehicle load between road sections, which greatly
reduces the variation of the data and seriously distorts their distribution. This
interpolation method is not recommended for use in a pavement management system.
The group mean interpolation method divides the entire expressway into many small
sections according to the road structure, vehicle load, environment and other factors,
and fills the missing values with the mean value of each small section. The group
mean interpolation method considers the differences between road sections and is
less destructive to the distribution of the original data. Compared with leaving
missing values unprocessed, its impact on the subsequent analysis is small.
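
A minimal sketch of group mean interpolation, assuming missing values are encoded as NaN and each reading carries a group label for its road section. The labels, names and values are hypothetical.

```python
import math
from collections import defaultdict

def group_mean_fill(values, groups):
    """Replace NaN entries with the mean of the valid values in the same group."""
    valid = defaultdict(list)
    for value, group in zip(values, groups):
        if not math.isnan(value):
            valid[group].append(value)
    means = {group: sum(vs) / len(vs) for group, vs in valid.items()}
    # A group with no valid values at all stays NaN.
    return [
        means.get(group, math.nan) if math.isnan(value) else value
        for value, group in zip(values, groups)
    ]

# Hypothetical rut depths (mm) in two sections with different conditions.
values = [4.0, math.nan, 6.0, 10.0, math.nan, 12.0]
groups = ["A", "A", "A", "B", "B", "B"]
filled = group_mean_fill(values, groups)
```

In this example the total mean of the valid values is 8.0, so total mean interpolation would fill both gaps with 8.0, a value that matches neither section. The group means (5.0 and 11.0) preserve the difference between the sections.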

Regression interpolation method


The regression interpolation method uses the relationship between the
independent variables and the dependent variable to interpolate the missing values.
A regression model between the data variable y and the influence variables xi
is established with correct data and then used to predict the missing values. The
regression model is an empirical prediction model based on a large amount of data. It is
easy to operate, concise and clear, and can achieve a certain prediction accuracy, so it is
suitable for use in a pavement management system. The road age, the
cumulative number of axle loads, etc. can be used as variables, and the data can
be fitted with different mathematical models, such as linear, exponential
and polynomial models. The model with the highest fitting accuracy is selected as the
final prediction model.
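
The model selection step might look like the sketch below. It is a simplified illustration under several assumptions: road age is the only explanatory variable, only linear and exponential forms are compared (a polynomial candidate is omitted to keep the example within the standard library), the exponential model is fitted by least squares on the logarithm of y, and "fitting accuracy" is measured by the sum of squared errors on the fitting data. All names and data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

def fit_models(xs, ys):
    """Fit a linear and an exponential model; return {name: predictor}."""
    a, b = fit_line(xs, ys)
    log_a, k = fit_line(xs, [math.log(y) for y in ys])  # requires y > 0
    return {
        "linear": lambda x: a + b * x,
        "exponential": lambda x: math.exp(log_a + k * x),
    }

def best_model(xs, ys):
    """Select the candidate with the smallest sum of squared errors."""
    models = fit_models(xs, ys)

    def sse(predict):
        return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))

    name = min(models, key=lambda m: sse(models[m]))
    return name, models[name]

# Hypothetical valid rut depths (mm) by road age (years); year 5 is missing.
ages = [1, 2, 3, 4, 6]
rut = [3.0, 4.1, 4.9, 6.1, 8.0]
name, predict = best_model(ages, rut)
estimate_year5 = predict(5)
```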

Linear interpolation
Although the relationship between road performance and a single variable is not
linear, the more complex the interpolation method is, the greater the error in
the interpolation results. The error caused by the linear interpolation method is lower than
that of the curve interpolation method. In addition, the linear interpolation method is
simple in principle and in operation, and is suitable for large-scale applications.
The principle is as follows: the detection data corresponding
to times t1 and t2 are known to be y1 and y2, and time t lies between t1 and
t2. If the data corresponding to time t are missing, the filling value y is
calculated by formula (4):


$y = y_1 + (t - t_1) \times \frac{y_2 - y_1}{t_2 - t_1}$ (4)
Sometimes there are cases where the value of the tail interpolation is large,
and such cases are marked for subsequent adjustment.
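
Formula (4) translates directly into code. The function name and the sample years and rut depths are hypothetical; the range check simply enforces the stated condition that t lies between t1 and t2.

```python
def linear_fill(t, t1, y1, t2, y2):
    """Formula (4): estimate the value at time t from (t1, y1) and (t2, y2)."""
    if not t1 < t < t2:
        raise ValueError("t must lie strictly between t1 and t2")
    return y1 + (t - t1) * (y2 - y1) / (t2 - t1)

# Hypothetical case: rut depth was 4.0 mm in 2014 and 6.0 mm in 2016; 2015 is missing.
filled_2015 = linear_fill(2015, 2014, 4.0, 2016, 6.0)
```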

CASE STUDY
The data used in this paper were all downloaded from the pavement management
system developed by the Ni Fujian Research Group of Southeast University. This paper
selected the rut depth data of the Jiangsu Coastal Expressway over the years as the research
object. In order to reduce the error caused by the misplacement of stations, the
entire road is divided into small sections of 100 meters. As a result, the rut depth data
are also converted to the same form. Since there was an overhaul in 2010, the data
before 2011 are removed to ensure accuracy.
The development of the raw rut depth with time is shown in Figure 2.
After the data inspection process explained above was performed, three
data correction methods were applied to the preliminarily processed data. The results are
also shown in Figure 2.

Figure 2. The comparison between three interpolation methods


As can be seen in Figure 2, in the beginning there is not much difference
among the three interpolation methods, and the interpolation results are similar to the
raw data. However, as time goes on, the differences between the three interpolation
methods gradually emerge. Data processed with the linear interpolation method are
significantly larger than the raw data. This can be attributed to the fact that the tail value
is too large, so some adjustments are needed. The results of the mean interpolation
method and the regression interpolation method are always very close. However, the
data processed by the regression interpolation method keep growing with time, while
those processed by the mean interpolation method do not.
It is worth noting that in the figure the curves of the
three interpolation methods appear quite different from the curve of the raw data. This
can be attributed to two reasons: first, the standard applied is
too strict, and it can be loosened in practical applications; second, the
absence of some maintenance history data has also influenced the result.

CONCLUSION
This paper has built a data quality control framework to identify and correct abnormal data
in a pavement management system, comprising data preprocessing, data inspection and
data correction. Data preprocessing converts pavement performance data into the
needed form, data inspection then identifies abnormal data, and data correction finally
uses interpolation methods to replace the abnormal data.
This work applies data quality control to the pavement management system.
The idea has great potential to improve pavement performance data and to support
scientific maintenance decision-making.

FUTURE WORK
The indicator consistency inspection currently covers only the relationship
between pavement distress and structural strength. The relationships between other
indicators are not clear, so they are not included in the scope of inspection. Future research
will determine the correlations between other indicators and improve the data
inspection process.

REFERENCES
Cai, L., and Zhu, Y. (2015). “The Challenges of Data Quality and Data Quality
Assessment in the Big Data Era.” Data Science Journal, 14, p. 2.
Cappiello, C., Francalanci, C., and Pernici, B. (2004). “Data quality assessment from
user’s perspective.” Proceedings of the 2004 International Workshop on
Information Quality in Information Systems, New York: ACM, pp. 68–73.
Dai, W., Yoshigoe, K., and Parsley, W. (2018). “Improving Data Quality Through Deep
Learning and Statistical Models.” In: Latifi, S. (ed.), Information Technology -
New Generations. Advances in Intelligent Systems and Computing, vol. 558.
Springer, Cham.


Feng, S., Hu, Q., and Qian, W. (2004). “Quality control of daily meteorological data in
China, 1951–2000: a new dataset.” Int. J. Climatol., 24, pp. 853–870.
doi:10.1002/joc.1047
Haug, A., and Arlbjørn, J. S. (2011). “Barriers to Master Data Quality.” Journal of
Enterprise Information Management, 24, 3, pp. 288–303.
Huang, K.T., and Wang, W.E. (1999). Quality Information and Knowledge. Upper
Saddle River, NJ: Prentice Hall.
Jia, X., Huang, B., Zhu, D., Dong, Q., and Woods, M. (2018). “Influence of Measurement
Variability of International Roughness Index on Uncertainty of Network-Level
Pavement Evaluation.” Journal of Transportation Engineering, Part B:
Pavements, 144, 2, 04018007.
Patrick, G., and Soliman, H. (2018). “Roughness Prediction Models Using Pavement
Surface Distresses in Different Canadian Climatic Regions.” Transportation
Research Board 97th Annual Meeting, 15 p.
Pierce, L. M., McGovern, G., and Zimmerman, K. A. (2013). Practical Guide for
Quality Management of Pavement Condition Data Collection.
FHWA-HIF-14-006. FHWA, U.S. Department of Transportation.
Wang, R.Y., and Strong, D.M. (1996). “Beyond Accuracy: What Data Quality Means
to Data Consumers.” Journal of Management Information Systems, 12, 4, pp.
5–33.
