Venkatanarayana, Smith, and Demetsky

QUANTUM-FREQUENCY ALGORITHM FOR THE AUTOMATED IDENTIFICATION OF TRAFFIC PATTERNS

Ramkumar Venkatanarayana
Transportation Systems Engineer
Smart Travel Laboratory*
Phone: 434-906-5677
ramkumar@virginia.edu

Brian L. Smith (Corresponding Author)
Associate Professor*
Phone: 434-243-8585
briansmith@virginia.edu

Michael J. Demetsky
Professor and Chair*
Phone: 434-982-2325
mjd@virginia.edu

*Department of Civil Engineering
University of Virginia
P. O. Box 400742
351 McCormick Road
Charlottesville, VA 22904-4742
Fax: 434-982-2951


A Paper Submitted for Presentation at the 2007 Annual Meeting of the Transportation Research Board and Publication in the Transportation Research Record

Total words = 4460 + 250*10 (9 Figures + 1 Table) = 7,460

ABSTRACT


Knowledge of the “normal” traffic flow pattern is required for a number of transportation applications. Traditionally, the simple historic average has been considered as the best way to derive the traffic pattern. However, this method may often be significantly biased by the presence of incidents. One solution to avoid this bias is through visual inspection of the data by experts. The experts could potentially identify anomalies caused by incidents, and thereby identify the underlying “normal” traffic patterns. Three main challenges of this approach are: (1) the bias introduced due to subjectivity, (2) the additional time required to analyze the data manually, and (3) the increasing sizes of the available traffic data sets. To address the above challenges, and also exploit the potential of information technology, new data analysis tools are essential. In this research, a new tool, the Quantum-Frequency algorithm, was developed. This algorithm can aid in the automated identification of traffic flow patterns from large datasets. The paper presents the algorithm along with its theoretical basis. Finally, in the case study presented in the paper, the algorithm was able to automatically identify a “reasonable” traffic pattern from a large set of archived data. When compared to the historic average, it was found that the pattern identified by the Quantum-Frequency algorithm resulted in 39% lower cumulative deviation from the pattern identified manually by experts.

Key words: Traffic flow pattern, demand, normal traffic, quantization, quantum-frequency algorithm

INTRODUCTION

An understanding of the demand for travel is critical to the provision of effective transportation services. A common approach to measuring demand is through counting the volume of traffic that is actually using the transportation facilities. Of course, given the stochastic nature of traffic, and the disruptions to system capacity caused by incidents, traffic counts for a single day (or even a few days) may very well provide a poor indication of demand. For this reason, a need has been identified for the “extraction” of demand (referred to in this paper as traffic patterns) from large quantities of volume data (1, 2).

An example of this need can be seen in the work zone permitting process. Many state agencies use QuickZone, a software program for estimating delays caused by work zones, when considering permit requests. The QuickZone user guide states: “Without an accurate demand, QuickZone will not generate usable results. The demand needs to be available in hourly counts for each day of the week (3).” The user guide continues on to explain the calculation of “hourly demand patterns from Average Daily Traffic.”

As seen in the QuickZone example, the simple historic average of traffic counts has traditionally been used in practice as the underlying traffic pattern. However, the simple historic average method will likely be significantly biased by the presence of incidents. When experts attempt to counter this bias through visual examination, the process becomes very time consuming and cumbersome. Further, the results of a manual examination process lead to concerns over bias and repeatability. Therefore, there is a need to develop new and better tools to automate the identification of traffic patterns from large scale traffic volume data sets. This paper describes research at the University of Virginia’s Center for Transportation Studies that has addressed this need through the development of a new approach to traffic pattern extraction – the Quantum-Frequency algorithm.

The paper begins with background information on the state-of-the-art and the state-of-the-practice for identifying traffic flow patterns. This is followed by the theoretical basis for the proposed Quantum-Frequency algorithm, and a section describing the new algorithm in detail. Finally, a case study and preliminary results are presented from the application of the Quantum-Frequency algorithm to a large traffic data archive.

BACKGROUND

To provide context for the proposed algorithm described in this paper, this section presents background information on the state-of-the-art and state-of-the-practice in traffic pattern identification.

Traffic Flow Patterns – Definition

The inherent existence of traffic flow patterns is widely acknowledged by the transportation profession. For example, Garber and Hoel (4) state that “the regular observation of traffic volumes over the years has identified certain characteristics showing that although traffic volume at a section of a road varies from time to time, this variation is repetitive and rhythmic.” The Highway Capacity Manual (HCM) (5) echoes the concept of repeatability, and further stresses the importance of these traffic demand patterns. For example, page 8-2 states “Traffic demand varies by month of the year, day of the week, hour of the day, and sub-hourly interval within the

hour. These variations are important if highways are to effectively serve peak demands without breakdown.” And page 8-4 talks of “typical hourly variation patterns,” and further says that “typical morning and evening peak hours are evident for urban commuter routes on weekdays.” Various applications commonly refer to these patterns as the “typical patterns,” or “normal patterns” (6, 7).

In spite of such wide-spread acceptance of the concept of traffic patterns, no rigorous definition has been found in the literature reviewed. The non-existence of a basic definition posed a significant barrier for both developing a new algorithm, and for validating the results from its application. These barriers were overcome by comparing the results with the engineering judgment of field experts, as will be described later in the case study section.

When addressing the recurring diurnal (i.e., daily) patterns of traffic flows, the HCM (5) references the results from McShane and Crowley’s 1976 TRB paper. In this paper, the authors considered 77 days of traffic in a Toronto urban street setting. They demonstrated the regularity of traffic flow patterns, during weekdays, primarily based on the average traffic for all the days, at each point of time, as shown in Figure 1. They further demonstrated that 95% of the days fall within a small region around this average. Since the inter-day variation was so low, the authors argued that the historic average is a good approximation of the diurnal traffic flow pattern. The present day simple historic average algorithm has likely derived its foundation and acceptance from this seminal paper.

State-of-the-Practice

For any location of interest, the state-of-the practice is to define the traffic pattern as the simple historic average of volume data collected over a (generally) very short period of time. All this data is then simply averaged by the time of the day (for example, see 4). Often, the average is computed using only the same day of the week, or just the weekdays or the weekends. The reasons for the popularity of this approach are likely its simplicity, and the ability to statistically manipulate the final result.
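The simple historic average described above can be sketched in a few lines. This is only an illustrative sketch, not the procedure of any particular agency: the function name `historic_average`, the list-of-lists input layout, and the use of `None` for missing records are assumptions made here, and the volumes are invented.

```python
from collections import defaultdict
from statistics import mean


def historic_average(daily_volumes):
    """Average the volume at each time-of-day slot across all days.

    daily_volumes: one list per day, one entry per time slot.
    A None entry marks a missing record and is skipped (assumed convention).
    """
    by_slot = defaultdict(list)
    for day in daily_volumes:
        for slot, volume in enumerate(day):
            if volume is not None:
                by_slot[slot].append(volume)
    return [mean(by_slot[slot]) for slot in sorted(by_slot)]


# Three hypothetical days with three time slots each.
# On the second day an "incident" depresses the middle slot.
days = [
    [100, 900, 400],
    [120, 300, 420],
    [110, 880, 410],
]
pattern = historic_average(days)
```

In this invented example the middle slot averages to roughly 693, well below the two unaffected days (900 and 880), which illustrates how a single incident can pull the average away from the typical value.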

FIGURE 1 Daily repeatability of hourly traffic variation from HCM (McShane and Crowley, 1976) (Figure copyright, National Academy of Sciences, used with permission)

However, the simple historic average method is susceptible to significant bias in locations prone to incidents. For example, contrast Figure 1 above with Figure 2 which represents traffic in January 2004 from I-395 North, near the Virginia-Washington D.C. border (please note that data from this link, 90272 and time period will be used throughout this paper). When examining Figure 2, remember that each series plotted represents volume for a single weekday in January. Clearly, the regular patterns witnessed in Toronto in 1976 are not as readily apparent in the I-395 data.

FIGURE 2 Hourly traffic flow variation on weekdays (Link 90272, all 22 weekdays of January 2004; x-axis: Time of Day, 0:00 to 23:00; y-axis: Traffic Demand Volume, veh. per hour per lane, 0 to 2000; one series per weekday, 1/1/04 through 1/30/04)

The simple historic average for this data is shown in Figure 8 later in the paper. That figure also shows the engineering judgment of field experts in identifying the typical traffic flow pattern. Such expert analysis through visualization is an often used improvement from the simple historic average method. Sometimes, a similar result is obtained by pre-selecting one or more days in a period as the representative days (7, Nichols, 2005 (unpublished data)). However, these methods present other challenges. First, subjectivity may be introduced. Different experts viewing the same data might come up with different patterns. Even the same expert might come up with different patterns by analyzing the same data twice. Secondly, these methodologies require human resources, and extensive time to analyze the data. Thirdly, these approaches ignore vast stores of data now available from ITS deployments. Visser and Molenkamp (8) effectively sum up the state-of-the-practice for traffic pattern identification with this statement:

Determining the daily and weekly patterns is a bit of an art, more than a science: results are partly dependent on every individual’s own frame of reference (e.g. Is the Thursday before Easter a regular weekday as far as traffic is concerned?).

Literature Review – Other Traffic Flow Pattern Identification Algorithms

In the recent years, several researchers have attempted to find alternative approaches for extracting traffic flow patterns. The approaches generally are based on advanced computing techniques developed for pattern recognition in other domains, such as fuzzy logic (9), principal component analysis (PCA) and support vector machines (SVM) (10), clustering (11, 12, 13), and wavelets (14). However, key deficiencies are evident in these approaches.

Although research papers have been published on approaches such as fuzzy logic, PCA, and SVM, these are mainly theoretical. Practical applications of these approaches have not been presented in these cases. A major drawback for supervised pattern recognition algorithms is their need for prior knowledge of the patterns. Based on this knowledge, they can compare and recognize if new datasets fall under those patterns or not. However, the final goal of the traffic flow pattern identification is to find the patterns, and, therefore, knowing them in advance makes the method redundant. Retraining the algorithm for each location and time period will make the algorithm ill-suited for wide implementation. The wavelets approach also has this same drawback.

For unsupervised algorithms, including clustering, a similarity measure or distance (dissimilarity) measure has to be defined based on the features of interest. In the previous research, we found that various features and distance metrics could not consistently provide valid results for different locations and time periods (13). In addition, selecting appropriate clustering parameter values remains more of an art than science. The numerous missing or suspect traffic data records also pose a significant challenge to define and compute features for each day of traffic. For example, Weijermars and van Berkum (12) had to remove 199 days out of the entire year of 2003, because of missing data. Compared to the 118 days finally used in that research, the missing data represents a large portion that could significantly impact the final results. Further, the clustering approaches cannot provide any semantic meaning to their outputs.

Theoretical Foundation for the Quantum-Frequency Algorithm

To overcome the deficiencies of the existing practices, the Quantum-Frequency Algorithm is proposed in this paper. This algorithm was developed by a thorough investigation of the definition and concept of “normal patterns,” and a consideration of the thought-process used by experts in manually identifying patterns. Its theoretical foundation is first presented in this section. The Algorithm itself is described in the next section.

As a rigorous definition for traffic patterns was not found, the first step involved investigating each part of the phrase – “normal traffic flow pattern.” “Traffic flow” or “traffic demand” is well-defined, as the count of vehicles. Upon careful consideration, the words “normal” and “pattern” required rigorous definitions. Using these concepts, the traffic data was explored to unearth important findings. All these research results are first summarized here briefly, and are then followed by a more in-depth discussion.

1. a. Pattern theory: This field was explored for a generic definition of a pattern, and then for methods to identify patterns given data.
   b. Normal: “Normal” is a concept, and its rigorous definition could not be identified. Wherever the word “normal” is used, the audience seem to understand the

meaning subjectively, without any (or much) explanation. Two important findings are that (1) the concept of “normal” requires a context, and (2) the dictionary synonyms of “normal” and “typical” suggest the repeating aspect of an event.
   c. Outlier detection: These statistical theories are quite well-developed. We reviewed these theories for potentially identifying the traffic anomalies. Even though they could not be used, a brief overview is presented for completeness.
2. Understanding the expert’s view: How does an expert use judgment to extract the traffic flow pattern by visualizing the data? This step also provided important insights. However, the theoretical understanding came from diverse fields:
   a. Visualization and Vision science: How do humans (and also machines) perceive images, and understand them? How are separations in graphs understood visually?
   b. Quantization: Signal/image processing, and digital compression fields extensively use quantization – mainly vector quantization. Quantization is the process of approximating a signal so as to ignore superfluous details, and yet retain the crucial information.

“Pattern” and “Normal”

Pattern theory presupposes an underlying structure. It focuses on understanding and identifying complex patterns from their basic properties, called primitive features. These primitive features appear so indivisible and readily apparent that they seem to need no further explanation. Formalizing these features and their relations - which together form the patterns - describes the essence of pattern theory (13). Formal, mathematical structures are studied in traffic flow theories. The main objective in this research is to identify a time-sequence of values that typify traffic flow. These concepts are inherently applied to some degree in all the potential algorithms discussed before.

The concept of “normal” was found to occur quite frequently in the fields of literature, and medicine, more so in psychology. In literature, the word “normal” appeared quite frequently, but without any detailed description. The dictionary meanings and synonyms of “normal” and “typical” clearly suggested a repeating structure. Repetition of events determines whether it is “normal.” The literature clearly pointed out several important details:

1. In some cases, the opposite concept “abnormal” is well defined. For example, in medicine, “normal” is often defined through its difference (or opposite) from pathological (i.e. the “abnormal”). In these cases, the characteristics of the “pathological” or the “abnormal” are well defined. In an attempt to emulate this approach, “outliers” were studied. Those details are presented in the next sub-section.
2. No universal definition of “normal” exists. Davis and Bradley (15) explore the word “normal” in great detail. They present several examples to show that “normal” could mean any of “perfect,” “healthy,” “conforming to a standard,” “typical,” “usual,” “statistical average” etc. In essence, our findings coincide with theirs:

Everyone knows what “normal” is. The word means “not deviating from a type or standard,” but what standard? … And, while the definition varies with the referent, it generally describes some commonly held understanding, a culturally accepted belief about what is typical, usual, and natural.

3. From all the studies, it can be concluded that the context is very important for understanding what is normal.
4. Defining “abnormal” may sometimes be easier, and its complement would automatically be the “normal.”
5. What Hooton (16) says for the medical field also applies to transportation (for traffic flow patterns):

[The term “normal”] smacks of the hypothetical human being who is described so exhaustively and ingenuously in textbooks of anatomy and who never turns up in the dissecting room tailored according to specifications.

Further, “normal” could be hypothetical and even a single sample might not exhibit all the so-called “normal” characteristics.

Outlier Detection

This field was explored for potentially defining and identifying the traffic data “abnormalities” directly. Excluding these from the parent dataset would have resulted in identifying the underlying “normal” traffic patterns. An observation is classified as an outlier if it is “significantly” different from the rest of the data (17). However, these algorithms could not be applied for various reasons as noted here. All the algorithms reviewed require one or more of the following inputs (all of which are unknown for traffic data):

1. The definition of a model (from which the outlier is different).
2. A distance metric.
3. The number of outliers expected (18).

Two other challenges that surfaced from the literature review are:

• One of the properties of the outlier detection algorithms reviewed is that an observation is either a good data (fitting the underlying model) or an outlier. There is no provision for part of that observation to be good and the other part as outlier (which is often the case with time series traffic data – when there is an incident during part of the day).
• Kosinski (19) states that if the amount of contamination is large (>35 or 45%), the existing outlier detection algorithms cannot detect them properly. However, many days within a month could have some incident for part of the day – i.e. only a few or even no completely entirely normal days may really exist.

Visualization and Vision Science

A review of research in these fields provided a better understanding of how humans (and machines) perceive images and graphs. The important findings are:

• The following finding suggests the strength in scalar quantization of the traffic volume values at each point in time, before combining together the “normal” quantums from various points in time. “… the kind of errors that are evident in perceiving impossible objects seem to indicate that at least some visual processes work initially at a local level and only later fit the results into a global framework.” (20)
• The importance of color contrast on attention and perception are stressed in the literature. The visual perception of proximal and similar data points, and simplification in the form of closure and continuity are also noted. “Automatic, stimulus-based attention shifting causes this [attention] mechanism to shift toward either movement or areas where preattentive features have identified strong patterns of color, intensity, or size contrast.” (21) And “The visual system knits together a remarkable illusion of continuity from the succession of saccades [the quick movements of the eye], extracting interpretations from high-information features like sharp corners …” (21). The idea of grouping together the adjacent values, and separating them from other values that are far away are deduced from these findings. The first finding confirms the decision to collapse adjacent quantums with any values into one large normal range. The latter finding suggests using a default value of “0” for “insignificant frequency,” for maximum contrast.
• Abstracting away some details may not impact the information perceived. The following findings strongly suggest that quantization and regrouping adjacent quantums are comparable to visual analyses. “This number [8] of bits [for representing images] is quite common for 2 reasons. … using more bits will generally not improve the visual appearance of the image – the adapted human eye usually is unable to see improvements beyond 6 bits (although the total range that can be seen under different conditions can exceed 10 bits) – hence using more bits would be wasteful. There are exceptions: in certain scientific or medical applications, 12, 16, or more bits may be retained for more exhaustive examination by humans or machine.” (22) And “There appears to be a general Principle of Selective Omission of Information at work in all biological information processing systems. The sensory organs simplify and organize their inputs, supplying the higher processing centers with aggregated forms of information which, to a considerable extent, predetermine the patterned structures that the higher centers can detect.” (21)

The understanding of the visual processes has significantly improved the development of the Quantum-Frequency algorithm. However, the findings could not always be directly related to the

theoretical explanations of the algorithm, as vision scientists openly acknowledge that much is understood in that field and much more remains (20). And this situation is currently unavoidable.

Quantization

Quantization is the process of approximating a continuous signal to a finite and small discrete set. Vector quantization is extensively used in communications, signal/image processing and digital data compression (23, 24). In this research, the simpler form of scalar quantization is used. One definition of an N-point scalar (one-dimensional) quantizer Q is a mapping Q: R → C, where R is the real line and C ≡ {y1, y2, y3, … , yN} ⊂ R is the output set or codebook with size | C | = N (23). For the present research study, the input vector is also discrete. The codebook is defined dynamically, based on the pre-selected quantization function, and the resolution of interest input by the analyst (as the quantum size). The uniform floor function presented earlier was selected as the quantization function, for its simplicity. The quantization effectively abstracted away a lot of details not relevant for the analysis of traffic patterns, which depended on the repetition of closely spaced traffic volume values at each point in time.

QUANTUM-FREQUENCY ALGORITHM

The Quantum-Frequency algorithm was developed by the research team based on insight gained from exploring several diverse fields, as explained in the theoretical foundation above. This section presents a detailed explanation of the proposed algorithm.

Algorithm Description

The Quantum-Frequency algorithm evolved from several fields based on a careful consideration of the concepts “patterns,” “normal,” “outliers,” “visualization,” and “quantization.” A fundamental observation, and an assumption for the Quantum-Frequency Algorithm, is that the (unaffected) normal traffic data from different days are similar and close to each other, when compared to the abnormal traffic data. When the appropriate parameters are used, the similar “normal” traffic flows emerge as the set of frequently occurring traffic flows. In the Quantum-Frequency Algorithm, these frequently occurring traffic flows are directly identified in an automated manner.

The pseudocode of the algorithm is presented first in Figure 3. Following this, details of the algorithm are explained using traffic flow data from all 22 weekdays in January 2004 for the link 90272 described earlier.
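As a compact restatement of the uniform floor quantizer discussed in the Quantization subsection, the mapping can be written as below. The symbols q (the quantum size chosen by the analyst) and v (a traffic volume value) are notation assumed here for illustration; the codebook is written for the N lowest quantum levels.

```latex
Q(v) = q \left\lfloor \frac{v}{q} \right\rfloor ,
\qquad
C = \{\, 0,\; q,\; 2q,\; \dots,\; (N-1)\,q \,\} \subset \mathbb{R}
```

Under this form, with q = 60 vehicles/hour/lane, any volume in [0, 60) maps to 0 and any volume in [60, 120) maps to 60.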

Step 1: The traffic volume values (for each point in time, for all the days) are first quantized as a scalar variable.
Step 2: (For each point in time) The distribution of the traffic volumes from all the days is determined, i.e. the number of times a quantized volume value occurs. These numbers are referred as frequency (of occurrence of that volume value).
Step 3: A frequency value of insignificance is provided to the algorithm. If a volume value occurs fewer times than this frequency, the frequency for that quantum is set to zero, i.e. it is ignored in further calculations in this algorithm.
Step 4: (For each point in time) If two neighboring quantums have frequency>0, the quantums are grouped together. Their frequencies are added, to represent the frequency of the new grouped quantum. This process is repeated until no two neighboring quantums have frequency>0.
Step 5: (For each point in time) The quantum with the highest frequency is declared as the typical or normal quantum. Only the traffic volume values falling within this quantum are considered in further calculations of normal traffic pattern. These volume values are designated as the “normal” volume values for the entire time period. The “abnormal values” are the complement of the normal values in the entire input dataset. (For each point in time) The highest and lowest “normal values” signify the range within which the traffic volumes of most days fall.
Step 6: A time-series plot of all the “normal values” from Step 5 provides a quick view of the identified normal traffic pattern. A time-series plot of all the abnormal values is also prepared.
Step 7: One final “normal daily traffic pattern” for the entire period is determined by averaging all the “normal values” from the above step.

FIGURE 3 Quantum-Frequency algorithm pseudocode

The quantum size in the first step and the frequency value of insignificance in the third step are the two key parameters of this algorithm. In step 1, quantizing is the process of approximating a traffic volume value to the nearest value in a pre-defined set of discrete values. The discrete data set is referred to as cells or quantums, from here on. These quantums are similar to the bins in a histogram. It is important to select an appropriate quantization function. In this research, the simplest function – the floor function (or step function) is selected. This is illustrated in Figure 4, i.e. any traffic volume value between 0 and 59 vehicles/hour/lane is considered as 0. Any value between 60 and 119 is considered as 60, etc. Once the “normal” range of traffic volume values are identified in step 5, only the actual values (from the input dataset) are used in further calculations, i.e. there is no loss of information as the quantization is used only to find the distribution of the traffic flow (volume) values.
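The steps above can be sketched in Python as follows. This is a minimal illustration written for this text, not the authors' SAS implementation: the function names, the data layout, and the sample volumes are all hypothetical. One interpretation is also assumed and flagged in the code: a quantum is treated as insignificant when its frequency is less than or equal to the insignificance value, which follows the description of that parameter as the maximum number of repetitions not deserving to be called “normal.”

```python
import math
from collections import Counter
from statistics import mean


def normal_values(volumes, quantum_size=60, insignificant_freq=0):
    """Steps 1-5 for ONE point in time: return the 'normal' raw volumes."""
    # Step 1: floor-quantize each observed volume.
    quantums = [quantum_size * math.floor(v / quantum_size) for v in volumes]
    # Step 2: frequency of occurrence of each quantum.
    freq = Counter(quantums)
    # Step 3: ignore insignificant quantums.
    # ASSUMPTION: insignificant means frequency <= insignificant_freq.
    kept = sorted(q for q, n in freq.items() if n > insignificant_freq)
    if not kept:
        return []
    # Step 4: group runs of neighboring quantums (frequencies add up).
    groups = [[kept[0]]]
    for q in kept[1:]:
        if q - groups[-1][-1] == quantum_size:
            groups[-1].append(q)
        else:
            groups.append([q])
    # Step 5: the group with the highest total frequency is "normal".
    best = max(groups, key=lambda g: sum(freq[q] for q in g))
    low, high = best[0], best[-1] + quantum_size
    # Only the actual (unquantized) values are carried forward.
    return [v for v in volumes if low <= v < high]


def normal_pattern(volumes_by_time, **params):
    """Step 7: average the normal values at each point in time."""
    pattern = []
    for volumes in volumes_by_time:
        normals = normal_values(volumes, **params)
        pattern.append(mean(normals) if normals else None)
    return pattern


# Hypothetical volumes (veh/hour/lane) observed at one hour on eight days.
sample = [1500, 1510, 1550, 1570, 1600, 900, 1490, 2000]
loose = normal_values(sample)                        # insignificant_freq=0
tight = normal_values(sample, insignificant_freq=1)  # drops lone quantums
```

With these invented numbers, the default setting keeps six of the eight values (900 and 2000 fall in isolated quantums), while an insignificance value of 1 also drops 1490, giving a tighter normal range at the cost of labeling more points abnormal. Step 6 (plotting) is omitted from the sketch.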

FIGURE 4 Floor function for uniform quantization (x-axis: Actual Data, and Step/Quantum Size, 0 to 300; y-axis: quantized value, 0 to 300 in steps of 60)

Elaborating step 2, the frequency distributions of the raw traffic volume data (at 13:00), and the corresponding quantized data are presented in Figures 5 and 6. It can be seen that the relative positions of the volume values have not changed - the quantization has simply made the “pattern” more pronounced. The ovals marked with the red data form the neighboring quantums with the highest frequency (when combined together). All the values outside the ovals would usually be viewed by experts as abnormal.

FIGURE 5 Histogram of raw volume data

FIGURE 6 Histogram of the quantized traffic volume data

Steps 4 and 5 of the algorithm are illustrated in Figure 7. This figure represents time series data, with the “y” axis representing quantum values (i.e. volumes) and the “x” axis representing time. For each point of time, the frequency for each quantum from the January data is shown by the number. The yellow quantums depict the neighbors at each point of time, which combine together to form the highest frequency. The “normal” interval is also outlined, which bounds the yellow quantums. This figure depicts the results by considering the parameters quantum size=60, and insignificant frequency=0.

FIGURE 7 Time-series plot to illustrate “normal” pattern range based on quantum frequencies

CASE STUDY

In order to assess the potential of the Quantum-Frequency Algorithm, a case study was conducted in which the traffic flow pattern was investigated for a freeway link in Virginia. Traffic data from the Traffic Monitoring System (TMS) in Virginia was used. The TMS collects data for several continuous count stations across the state. These data are collected at lane level at 15-minute intervals. These datasets for the past several years can be accessed through the Smart Travel Laboratory at the University of Virginia. The Quantum-Frequency algorithm was applied to several months of this data (at 1-hour aggregations) across different links and years. The preliminary results are promising, and quite consistent. One specific example is presented in this paper. Note that for the sake of consistency, the example presented is the same link and time period used for illustrative purposes earlier in the paper.

Algorithm Implementation

The Quantum-Frequency Algorithm was implemented in SAS, a statistical analysis software package. For analyzing a month of data, executing the code requires less than 30 seconds. The deployed algorithm creates results in the form of several plots, and also data outputs. The following plots are included in the output: (1) A plot of the raw data, (2) A plot of all the normal points identified, (3) A plot of all the abnormal points, (4) A plot of the normal pattern interval, the final normal traffic flow pattern, and the historic average. The data output includes a matrix of (a) the input dataset, (b) all the identified normal points, (c) the simple historic average, (d) the identified normal average (i.e. the average of all the normal points), (e) the number of sample

points that contributed to this identified normal average, (f) the normal range (both high and low limits), (f) the average of complete normal days (i.e., days consisting of only normal points), and (g) the parameter values.

Selection of Algorithm Parameters

The “quantum” value represents the resolution of interest to the analyst. For example, the capacity of a freeway is often presented as 2400 passenger cars/hour/lane or 2350 passenger cars/hour/lane, but not at a much lower precision. The initial value selected for this parameter is 60 vehicles/hour/lane. The main rationale for this selection is that traffic volume values (such as capacity in the HCM) are often rounded to the nearest multiple of 50. Further, selecting a value of 60 vehicles/hour/lane allows ready application of the same value to any temporal aggregation of the traffic flow (volume) data. For example, 15-minute data can be analyzed with a quantum of 15 directly, and the results are comparable. It should be noted that the quantum value is used only for abstracting the details in the data, during the first few steps. Once the frequently occurring traffic flows are determined, the final calculations use only the actual data. Therefore, the quantum conversion does not involve any kind of information loss.

The “insignificant frequency” value indirectly represents the minimum repetitions that deserve to be called “normal.” Actually, it represents the maximum number of repetitions that does not deserve to be called part of the “normal” group. A value of “0” for this parameter suggests that the contrast of several days in one group, followed by an empty quantum, is significant enough to discount any data beyond that boundary. As can be seen in Figure 6, a value of “1” would result in more tightly bounding the normal interval. However, there is a tradeoff: many more data points would be labeled as abnormal.

Results

The pattern identified by the Quantum-Frequency Algorithm was compared to the patterns identified manually by experts in the area of traffic data analysis and planning. The experts contributing to this work were: (1) Ms. Cathy McGhee, Senior Research Scientist at the Virginia Transportation Research Council, an expert in systems operations and simulation, and (2) Mr. Keith Nichols, Senior Transportation Planner at the Hampton Roads Planning District Commission, an expert in transportation planning. The experts were given the input dataset, and asked to identify the “normal traffic data pattern,” as useful for their purposes using the modified historic average method. These results are shown in Figure 8 below, along with the results from the Quantum-Frequency algorithm and the historic averages. HA1 is the simple historic average of all days, and HA2 is the historic average after removing data for January 1. HA2 is also presented, as most analysts are likely to remove the January 1 data irrespective of the actual values.
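The roles of the “quantum” and “insignificant frequency” parameters discussed under Selection of Algorithm Parameters can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the normal group is found by starting at the most frequently occupied quantum and growing outward until a neighboring quantum holds no more than the insignificant frequency of observations. The function names, the tie-breaking rule, and the sample flows are hypothetical.

```python
from collections import Counter


def normal_range(flows, quantum=60, insignificant_frequency=0):
    """Return (low, high) flow limits of the assumed "normal" group of quanta.

    Assumption: the group starts at the most frequently occupied quantum and
    grows outward while the neighboring quantum holds more observations than
    `insignificant_frequency`.
    """
    # Abstract each observed flow to the index of the quantum it falls in.
    counts = Counter(int(f // quantum) for f in flows)
    # Most frequent quantum; ties are broken toward the lower quantum.
    mode = max(counts, key=lambda q: (counts[q], -q))
    lo = hi = mode
    while counts.get(lo - 1, 0) > insignificant_frequency:
        lo -= 1
    while counts.get(hi + 1, 0) > insignificant_frequency:
        hi += 1
    return lo * quantum, (hi + 1) * quantum


def normal_average(flows, quantum=60, insignificant_frequency=0):
    """Average the actual (unquantized) flows that fall inside the normal range."""
    low, high = normal_range(flows, quantum, insignificant_frequency)
    normal = [f for f in flows if low <= f < high]
    return sum(normal) / len(normal), (low, high)


if __name__ == "__main__":
    # Hypothetical flows (vehicles/hour/lane) at one time of day over eight days.
    flows = [1510, 1525, 1540, 1495, 1530, 1580, 900, 1620]
    for freq in (0, 1):
        avg, limits = normal_average(flows, quantum=60, insignificant_frequency=freq)
        print(freq, limits, round(avg, 2))
```

With these hypothetical flows, an insignificant frequency of 0 excludes only the isolated low value, while a value of 1 narrows the normal interval to a single quantum and labels four of the eight points as abnormal, which mirrors the tradeoff described above. The quantized values are used only to locate the interval; the average is taken over the actual data.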

[Figure: Typical Traffic Pattern, Comparing Results from Different Methods. Axes: Time of Day (0:00 to 23:00) vs. Traffic Demand (vphpl, 0 to 2000). Series: Operations/Simulation, Planning, QF Algorithm, HA1 (Simple Historic Average), HA2.]

FIGURE 8 Comparison of results from different methods

The results of the various algorithms differ mainly for the AM peak period, where the performance of the Quantum-Frequency algorithm is significantly better than the others.

In addition to the plots, a number of traditional error metrics such as Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the Root Mean Squared Error (RMSE) were calculated using the expert patterns as the “ground truth.” Both the historic averages and the Quantum-Frequency algorithm results were compared to the typical traffic patterns identified by the experts. The average results are presented in Table 1 below. Note that the average absolute error (or deviation) from the “true” expert-defined pattern is nearly 40% less for the Quantum-Frequency algorithm than for the simple historic average (HA1) approach. The corresponding improvement over the modified historic average (HA2) is nearly 18%.

TABLE 1 Error Measures on Comparing the Results from Different Methods

                                      Error Metric
Method                                MAE       MAPE      RMSE
Historic Average (HA1)                66.10     8.15      90.87
Average – without Jan 1 data (HA2)    48.85     4.94      66.48
Quantum-Frequency Algorithm           40.11     4.41      58.39
Improvement (over HA1)                39.3 %    45.8 %    35.8 %
Improvement (over HA2)                17.9 %    10.7 %    12.2 %
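For readers who wish to reproduce this kind of comparison, the three error metrics and the percentage improvement can be computed as in the following Python sketch. The standard textbook definitions are assumed, with the expert pattern treated as the ground truth. The function names and the short sample series are hypothetical. The final two lines apply the improvement formula to the MAE values reported in Table 1.

```python
import math


def mae(expert, estimate):
    """Mean Absolute Error between the expert pattern and an estimated pattern."""
    return sum(abs(e - p) for e, p in zip(expert, estimate)) / len(expert)


def mape(expert, estimate):
    """Mean Absolute Percentage Error, in percent; expert values must be nonzero."""
    return 100.0 * sum(abs(e - p) / e for e, p in zip(expert, estimate)) / len(expert)


def rmse(expert, estimate):
    """Root Mean Squared Error between the expert pattern and an estimated pattern."""
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(expert, estimate)) / len(expert))


def improvement(baseline_error, new_error):
    """Percentage reduction of new_error relative to baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical three-interval patterns (vphpl), for illustration only.
    expert = [100.0, 200.0, 400.0]
    estimate = [110.0, 180.0, 400.0]
    print(mae(expert, estimate), mape(expert, estimate), rmse(expert, estimate))

    # MAE values reported in Table 1: HA1 = 66.10, HA2 = 48.85, Q-F = 40.11.
    print(round(improvement(66.10, 40.11), 1))  # improvement over HA1
    print(round(improvement(48.85, 40.11), 1))  # improvement over HA2
```

Applied to the reported MAE values, the improvement formula gives 39.3 % over HA1 and 17.9 % over HA2, consistent with the "nearly 40%" and "nearly 18%" figures quoted in the text.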

In addition to the final normal traffic flow pattern, the Quantum-Frequency algorithm also provides an upper and a lower bound that contains the normal days. Figure 9 shows that the typical traffic patterns identified by the experts are almost completely contained within these bounds identified by the Quantum-Frequency algorithm.

[Figure: Typical Traffic Pattern, Comparing Results from Different Methods. Axes: Time of Day (0:00 to 23:00) vs. Traffic Demand (vphpl, 0 to 2000). Series: Operations/Simulation, Planning, QF Normal Upper Bound, QF Normal Lower Bound.]

FIGURE 9 Comparison of expert results with Q-F algorithm normal bounds

CONCLUSION

A new algorithm (the Quantum-Frequency algorithm) has been developed for identifying traffic flow patterns from large datasets. The algorithm has a robust theoretical basis from several diverse fields, and overcomes several drawbacks from the existing and other proposed methods. As evident in the case study, the algorithm is capable of automatically providing results that are more consistent with expert judgment than the biased historic average approach most frequently used today.
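Returning to the bounds comparison of Figure 9, the statement that the expert patterns are "almost completely contained" within the normal bounds can be quantified as the share of time intervals in which an expert value lies between the lower and upper bound. The Python sketch below is one possible way to compute that share. The function name and the sample values are hypothetical and are not taken from the case study.

```python
def fraction_within_bounds(pattern, lower, upper):
    """Share of intervals where lower[i] <= pattern[i] <= upper[i]."""
    inside = sum(1 for p, lo, hi in zip(pattern, lower, upper) if lo <= p <= hi)
    return inside / len(pattern)


if __name__ == "__main__":
    # Hypothetical four-interval example (vphpl), for illustration only.
    expert_pattern = [500, 1200, 1800, 900]
    lower_bound = [400, 1000, 1500, 950]
    upper_bound = [600, 1400, 1900, 1100]
    print(fraction_within_bounds(expert_pattern, lower_bound, upper_bound))
```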
