Downloaded from ascelibrary.org by Carleton University on 09/09/19. Copyright ASCE. For personal use only; all rights reserved.
Abstract
High-quality data are the prerequisite for accurate analysis results.
However, due to some inevitable errors, the quality of pavement performance data is
low, which greatly interferes with maintenance decisions. In this paper, we first
briefly review data quality control aspects and methods. Second, we propose a
novel data quality control framework suitable for large-scale use in pavement
management systems. The framework consists of data preprocessing, data inspection,
and data correction. Third, we use the rut depth data of a coastal highway as a case
study to verify its feasibility and effectiveness. Finally, we discuss future work.
INTRODUCTION
A pavement management system is an auxiliary decision-making tool which
helps highway management departments make rational plans with maintenance funds
and achieve maximum investment benefit ratios. The system is built with the
combination of several subjects such as operations research and systems engineering.
There are usually six modules in the system, with the data acquisition module as the
foundation of the other modules. Pavement performance data consists of rut depth,
deflection value, surface distress, international roughness index, side-way force
coefficient, etc. By analyzing pavement performance data, highway management
departments can assess road conditions and develop a corresponding maintenance
plan to improve pavement performance.
© ASCE
CICTP 2019
CICTP 2019 4275
Data quality has been defined in various ways since researchers began to study it in
the 1950s. Data quality is the extent to which the data are suitable for use (Huang 1999).
Cappiello (2004) defined data quality as the degree to which data meets customers’
expectations. In order to measure data quality, Wang and Strong (1996) defined the
concept of “data quality dimension” which represents aspects of data quality. Haug
(2011) summarized existing data quality dimensions and considered accuracy,
timeliness, consistency and completeness as the core dimensions. Cai and Zhu (2015)
divided data quality into five dimensions, each of which consists of several elements.
The choice of core data quality elements differs across situations; for example,
for medical data, accuracy and timeliness are more important because a
doctor needs previous medical records as a reference to learn more about patients’
physical condition. In this paper, accuracy, consistency and completeness are regarded
as the core data quality elements in a pavement management system.
An efficient pavement management system is supported by accurate,
consistent, complete, and timely data. Data quality directly determines whether a
system can give correct and timely feedback on pavement conditions. Low data
quality leads to a deviation between assessment results and the actual condition of
pavements, which can result in wasted maintenance funds and negligible
improvement in pavement performance. Research has found that data with large
variability can cause great interference with pavement performance assessment (Jia
2018). Practical Guide for Quality Management of Pavement Condition Data
Collection (Pierce 2013) points out that just a one percent difference in the area of
low-severity fatigue cracking can make a 12-point difference in the 100-point
pavement condition index (PCI) calculation. Therefore, the assessment of pavement
performance must be based on high-quality data, which is an essential guarantee for
objective and effective assessment on pavement performance. As to data quality
management, meteorology has formed a mature data quality control system, which
consists of inspection of historical and file extremes and a check of internal and time
consistency (Feng 2004). Dai (2018) integrates deep learning networks and statistical
quality control models. The data quality management program proposed by the
Federal Highway Administration focuses on the source of data production to avoid
mistakes and gives practical methods to improve data quality during the process of
data collection. When it comes to data cleaning, there is not yet a systematic
inspection and correction method for pavement management systems.
Through the exploration and analysis of large amounts of pavement performance
data, this paper puts forward a data quality control process which consists of a data
inspection process and data correction methods that take into account pavement
structure, maintenance information and traffic environment.
ARCHITECTURE
In this paper, the data quality control framework is divided into three stages:
data preprocessing, data inspection, and data correction. Data preprocessing mainly
deals with data format problems and missing-data problems. During this stage, data
are converted into the required form that is convenient for computation. Data
inspection uses the principles of mathematical statistics and the pavement performance
attenuation law to inspect the preprocessed data and divide them into two groups:
correct and wrong. Data correction interpolates the erroneous data by using various
interpolation methods. The architecture is as follows (see Figure 1).
Figure 1. Architecture
DATA PREPROCESSING
Data transformation
In order to reduce the error caused by inconsistent stake numbers, the entire
road can be divided into small sections, and the pavement data are converted to
the same sectioned form. For instance, the asphalt pavement maintenance
specification stipulates that the rut depth measurement interval is 10 meters. The
pavement is divided into 100-meter sections, and the average rut depth within each
100-meter section is calculated for subsequent data inspection.
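The sectioning step described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function name `section_means`, the input layout of (stake position in meters, rut depth in millimeters) pairs, and the sample readings are all assumptions made for the example.

```python
from collections import defaultdict


def section_means(readings, section_length_m=100):
    """Average rut depth per fixed-length road section.

    readings: iterable of (stake_m, rut_depth_mm) pairs, where stake_m is the
    distance along the road in meters (hypothetical input layout).
    Returns a dict mapping each section's start position to its mean rut depth.
    """
    buckets = defaultdict(list)
    for stake_m, rut_mm in readings:
        # Map every measurement to the start of the section that contains it.
        start = int(stake_m // section_length_m) * section_length_m
        buckets[start].append(rut_mm)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}


# Made-up readings for illustration only.
readings = [(0, 4.0), (10, 6.0), (90, 5.0), (100, 8.0), (110, 10.0)]
means = section_means(readings)
```

Keying the result by section start position means detections from different years can be compared section by section even when the individual stake numbers do not line up exactly.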
QL stands for the lower quartile, which means that 25% of the sample data is
smaller than it. QU stands for the upper quartile, which means that 25% of the
sample data is larger than it. IQR stands for the interquartile range and is the
difference between the upper quartile QU and the lower quartile QL. Since the box
plot is based on the quartiles and the interquartile range, the result of
identifying the abnormal data is more objective and reliable.
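A box-plot check built on QL, QU and IQR can be sketched as below. The fence multiplier `k = 1.5` is the conventional box-plot choice and is an assumption here, since the surviving text does not state the threshold used; the function name and the sample values are likewise illustrative.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the box-plot fences.

    k = 1.5 is the conventional multiplier (an assumption, not taken from
    the paper). Values below QL - k*IQR or above QU + k*IQR are flagged.
    """
    ql, _, qu = statistics.quantiles(values, n=4, method="inclusive")
    iqr = qu - ql
    low, high = ql - k * iqr, qu + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up section averages of rut depth (mm); 30 is an implausible reading.
sample = [4, 5, 5, 6, 6, 7, 7, 8, 30]
abnormal = iqr_outliers(sample)
```

Because the fences depend only on quartiles, a single extreme reading barely moves them, which is why this check is less sensitive to the outliers it is trying to find than a mean and standard deviation rule would be.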
repeated effects of vehicle load and the environment. Factors such as pavement
structure design, construction level, maintenance level, and vehicle axle load
determine the decay mode of pavement performance. The initial performance
attenuation of a pavement with a reasonable structural design and a high
construction level will be slower. Proper maintenance will slow the later
rate of performance attenuation.
As road age increases, pavement performance declines in the absence of road
maintenance. Concretely, the rut depth increases, the roughness of the road
surface deteriorates, the anti-sliding performance decreases, the pavement
structure strength decreases, and pavement distress increases. After pavement
maintenance, pavement performance is improved to a certain extent. The subsequent
performance attenuation rate differs according to the maintenance measures.
Previous researchers have summarized the attenuation patterns of pavement
performance, and each pattern corresponds to several complex formulas. However,
such complex formulas are not necessarily needed to improve the quality of
pavement performance data. This paper establishes the following inspection
process according to the overall trend of pavement performance attenuation.
The first detection data after the latest pavement overhaul are usually regarded
as the benchmark data y1. The next detection data after the benchmark data are y2,
and so on. Taking rut depth data as an example, if there has been no maintenance
between two detections, the two values should satisfy the following formula
(1):
y2 > y1 (1)
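Formula (1) can be applied to a series of detections as in the sketch below. It covers only the check in formula (1); the record layout (rut depth plus a flag saying whether maintenance took place since the previous detection), the function name, and the sample values are assumptions made for illustration.

```python
def flag_trend_violations(detections):
    """Return indexes of detections that violate y2 > y1 (formula (1)).

    detections: list of (rut_depth, maintained_since_previous) tuples in
    chronological order (hypothetical layout). A detection is flagged when
    no maintenance occurred since the previous detection and its rut depth
    did not increase.
    """
    flagged = []
    for i in range(1, len(detections)):
        y1 = detections[i - 1][0]
        y2, maintained = detections[i]
        if not maintained and not (y2 > y1):
            flagged.append(i)
    return flagged


# Made-up rut depths (mm); the fourth detection follows a maintenance action.
history = [(5.0, False), (6.0, False), (5.5, False), (3.0, True), (3.4, False)]
violations = flag_trend_violations(history)
```

This simple version compares each detection with its immediate predecessor, so a predecessor that is itself erroneous can cause a sound value to be flagged; in practice the comparison could be made against the most recent detection that passed inspection.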
y2 ≥ c ⋅ y1 (2)
where a is the allowable error level, b is the growth rate of the index, and n is the
interval between two inspections.
Consistency inspection
The indexes are not isolated; there are certain relationships between them.
Research (Patrick and Soliman 2018) has found that pavement roughness is closely
related to pavement structure strength and pavement distress. When the pavement
structure strength decreases, pavement cracks develop rapidly and the pavement
condition deteriorates. Detection data that contradict this relationship are
therefore treated as inconsistent and can be eliminated.
DATA CORRECTION
Following the above data inspection process, the abnormal data that do not pass
inspection are regarded as missing data for subsequent processing. The
abnormal data include missing data, data outside the reasonable range, data that
do not satisfy the pavement performance attenuation law, and data that are not
consistent. The missing data are filled by an appropriate interpolation method to
achieve the goal of data correction.
Linear interpolation
Although the relationship between road performance and a single variable is not
linear, a more complex interpolation method tends to introduce greater error into
the interpolation results. The error caused by the linear interpolation method is
lower than that of the curve interpolation method. In addition, the linear
interpolation method is simple in principle and easy to operate, and is suitable for
large-scale applications.
The principle is as follows: the detection data corresponding to time t1 and
time t2 are known to be y1 and y2, and time t lies between t1 and t2. If the data
corresponding to time t are missing, the filling value y is calculated by formula
(4):
y = y1 + (t − t1) × (y2 − y1) / (t2 − t1) (4)
Sometimes there are cases where the value of the tail interpolation is large,
and such cases are marked for subsequent adjustment.
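Formula (4) translates directly into code. The sketch below is illustrative only: the function name `linear_fill` and the sample values are assumptions, and the guard against t1 = t2 is an addition that the paper does not discuss.

```python
def linear_fill(t, t1, y1, t2, y2):
    """Filling value y at time t from formula (4).

    (t1, y1) and (t2, y2) are the known detections on either side of the
    missing one. Raises ValueError when t1 == t2, because the slope is
    undefined in that case (a guard added for this sketch).
    """
    if t2 == t1:
        raise ValueError("t1 and t2 must differ")
    return y1 + (t - t1) * (y2 - y1) / (t2 - t1)


# Made-up example: rut depth 4.0 mm in 2015 and 6.0 mm in 2017, 2016 missing.
filled = linear_fill(2016, 2015, 4.0, 2017, 6.0)
```

The same expression extrapolates when t lies outside the interval from t1 to t2, which is one way a large tail value can arise, so filled values at the end of a series are worth marking for review as the text describes.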
CASE STUDY
The data used in the paper were all downloaded from pavement management
system developed by Ni Fujian Research Group of Southeast University. This paper
Figure 2. The comparison between three interpolation methods
As the figure shows, in the beginning there is not much difference
among the three interpolation methods, and the interpolation results are similar to
the raw data. However, as time goes on, the difference between the three
interpolation methods gradually emerges. Data processed with the linear
interpolation method are significantly larger than the raw data. This can be
attributed to the fact that the tail value is too large and some adjustments are
needed. The results of the mean interpolation method and the regression
interpolation method are always very close. However, the data processed by the
regression interpolation method keep growing with time while those processed by
the mean interpolation method do not.
It is worth noting that, in the figure above, the curves of the three
interpolation methods appear quite different from the curve of the raw data. This
can be attributed to two reasons: first, the standard applied is too strict, and
it can be loosened in practical applications; second, the absence of some
maintenance history data has also influenced the result.
CONCLUSION
This paper has built a data quality control framework to identify and correct
abnormal data in a pavement management system, consisting of data preprocessing,
data inspection and data correction. Data preprocessing converts pavement
performance data into the needed form, data inspection then identifies abnormal
data, and data correction finally replaces the abnormal data using interpolation
methods.
This work applies data quality control to the pavement management system.
The approach has great potential to improve pavement performance data and to
support scientific maintenance decision-making.
FUTURE WORK
The indicator consistency inspection currently covers only the relationship
between pavement distress and structural strength. The relationships among other
indicators are not yet clear, so they are not included in the scope of inspection.
Future research will determine the correlations among other indicators and improve
the data inspection process.
REFERENCES
Cai, L. and Zhu, Y. (2015), “The Challenges of Data Quality and Data Quality
Assessment in the Big Data Era.” Data Science Journal, 14, p. 2.
Cappiello, C., Francalanci, C., and Pernici, B. (2004), “Data quality assessment from
user’s perspective.” Proceedings of the 2004 International Workshop on
Information Quality in Information Systems, New York: ACM, pp. 68–73.
Dai, W., Yoshigoe, K., and Parsley, W. (2018), “Improving Data Quality Through Deep
Learning and Statistical Models.” In: Latifi, S. (ed.) Information Technology -
New Generations. Advances in Intelligent Systems and Computing, vol. 558.
Springer, Cham.
Feng, S., Hu, Q., and Qian, W. (2004), Quality control of daily meteorological data in
China, 1951–2000: a new dataset. Int. J. Climatol., 24: 853-870.
doi:10.1002/joc.1047
Haug, A. and Arlbjørn, J. S. (2011), "Barriers to Master Data Quality," Journal of
Enterprise Information Management, 24, 3, pp. 288-303.
Huang, K.T. and Wang, W.E. (1999) Quality Information and Knowledge, Upper
Saddle River, NJ: Prentice Hall.
Jia, X., Huang, B., Zhu, D., Dong, Q., Woods, M. (2018) Influence of Measurement