Wang & Wang: Designed Web Process Sequence Analysis

ADAPTABLE ALGORITHM FOR DESIGNED WEB PROCESS SEQUENCE DATA ANALYSIS
Hai Wang Sobey School of Business Saint Mary's University Halifax, Nova Scotia B3H 3C3, Canada hwang@smu.ca Shouhong Wang Charlton College of Business University of Massachusetts Dartmouth Dartmouth, MA 02747, USA swang@umassd.edu

ABSTRACT A significant interest for Web process design is to discover the discrepancies between the users' transaction process sequences and the desired process sequence for the Web transaction process. Sequence data analysis has been an important approach to analyzing Web log data in the e-commerce field. There have been many methods for sequence data analysis; however, few existing methods can be applied to analyzing designed Web transaction process sequence data for improving Web process design. This paper proposes an adaptable sequence matching algorithm for analyzing designed Web process sequence data for discovering knowledge about the Web process design. An application of this algorithm to a case of online shopping cart abandonment analysis is presented. Keywords: Web design, Web process sequence data analysis, designed Web process sequences, shopping cart abandonment. 1. Introduction A Web process is a business process carried out on the World Wide Web. Traditionally, Web process design has been studied in the field of workflow analysis [Cardoso & Sheth 2003; Cardoso 2006]. Recent research has suggested that Web page design has significant impact on online consumers’ attitude and behavior towards Web processes [Chatterjee 2008]. A right Web page design must meet specific needs of right consumers for right Web processes [Zhou et al. 2004; Shergill & Chen 2005; Singh et al. 2008]. Web log sequences (i.e., user click-stream sequences) represent users' Web access behaviors in carrying out Web processes. Massive Web log sequences can be used to discover the general patterns of these process sequences [Wen et al. 2007; Greco & Guzzo 2007]. These patterns are useful for us to understand the online consumers’ behaviors as well as problems in Web process design. For instance, by analyzing online shopping Web log sequences of an online store, one might be able to understand more about why online shoppers abandon shopping carts so often, and how the design of the Web site and transaction processes can be improved to reduce shopping cart abandonment. This paper presents a Web log sequence analysis method for improving Web process design. In the literature, various data analysis methods have been proposed to analyze Web log data [Cadez et al. 2000; Ester et al. 2002; Jiang & Tuzhilin 2006; Mobasher et al.2002; Manavoglu et al. 2003; Yang & Padmanabhan 2005]. In general, these methods aim at discovering interesting patterns of sets of sequences [Fayyad et al. 1996; Pei et al. 2004]. Common approaches to analyzing sequence data include time series analysis (e.g., [LeBaron & Weigend 1998]), association rules induction (e.g., [Lee et al. 2003]), and sequences pattern discovery (e.g., [Dutta et al. 2007]). While many research reports have been emphasizing on the performance of algorithms, exploring applications of process sequence data analysis results directly to e-commerce is imperative [Wu et al. 2000]. In this paper, we present a new method of analyzing Web log sequence data for the diagnosis of Web process design. This algorithm can be used to reveal useful information and develop knowledge for improving Web process design. The remainder of this paper is organized as follows. First, we provide a brief overview and discussion of major methods used for analyzing sequence data. Next, we develop an adaptable process sequence matching algorithm of

Page 104

c. Finally. the difference between two sequences S1 and S2 is measured by the Euclidean distance between the two vectors. In general cases of sequence data where the time is not necessarily a useful static reference dimension. we conclude with a summary of the study.. fi  1 if S1(i)  S2(i) (3) fi  0 otherwise Missing values are treated as dissimilarity. we focus on improving Web process design through analyzing online consumers’ Web access behavior in Web processes. c. y} S2: {a. a. stock price). y} where a. Because of this property of time series. the order of the elements in a sequence is not important. a sequence is a vector [Everitt 1980]. c. xt = a +  i 1 p bi xt-i +  j 1 q cj εt-j + εt (1) where t is the time. A general formula to calculate the distance between two sequences is d = wd D + wi I + wr R (4) where D is the number of deletion operation. van der Aalst et al. DNA sequences). 2. Cardoso & Sheth 2003.g. A classical time series analysis method is the linear autoregressive moving average (ARMA) model [Box & Jenkin 1970] expressed as follows.. t. 2009 analysis of Web process sequence data. xt is the value of the time-dependent variable at time t. 2. Although there have been a variety of methods for time series analysis. In terms of the equivalence between a sequence and a vector. click-streams). NO 2. However. An online consumer’ Web accesses in a Web process can be treated as a process sequence.2.g. t. in the SAM method. I is the number of insertion operation. insertion. approaches to time series analysis are structured. and cj are regression parameters. n is the number of the elements of the longest sequence. Also. d= f i 1 n i (2) where i is the position index of the longest sequence. p. m. the time series analysis methods. such as the ARMA method. 2. However. f i is the dissimilarity of the elements at position i. 2. and reordering operations respectively. wi.3.g. b i. p. the similarity between two sequences is measured by the necessary operations to covert one sequence to the other. i and j are window indices. In this paper. Sequence Alignment Method According to the sequence alignment method (SAM). a sequence is a set of elements arranged in a certain order [Sankoff & Kruskal 1983]. We can convert S1 into S2 by applying the following operations: { Page 105 . suppose that we have the following two sequences: S1: {a. For instance. This marks the limitations of the ADM method in cases where the order of the elements is crucial (e. are not particularly useful for sequence data analysis. and wr are predetermined weights for deletion. 2008]. online consumers’ attitude and behavior towards Web processes have received little attention. Four major categories of data analysis methods for processing sequence data are discussed below. and has been extensively studied through workflow analysis and management [Ould 1995. Associate Distance Measure Method According to the associate distance measure (ADM) method. The SAM method also uses a distance measure to calibrate the similarity between sequences. Time Series Analysis Time series are sequence data measured strictly against the time dimension. d. we present an application of the proposed algorithm to a case of shopping cart abandonment. y represent distinct values. m. and εt is the residual term. d.Journal of Electronic Commerce Research. the time window method is typically used to describe the pattern of the time-dependent variable (e. Cardoso 2006.. existing time series analysis methods are powerless in dealing with multiple dependent variables as well as nonnumerical sequences (e. Rozinat & van der Aalst 2008. In the ADM method.. Related Work: Methods of Process Sequence Data Analysis Web process design is a topic in the field of business process design. d. m.1. and wd.g. in contrast to data analysis methods for general cases of sequences where time is not a key factor (e. click-streams). Then. A process sequence is defined as a set of ordered elements. VOL 10. R is the number of reordering operation.

g. and associates each Web access event with a time stamp. If we choose wd=wi=2 and wr=1. (2) An actual DWPS is a variance of a desired sequence. but may not be important for many real-life sequence data analysis problems [Song et al. (1) The time aspect is important in DWPS for understanding the online consumers' behaviors. (3) Insert t to the fifth element. The formal definition of DWPS in the next section incorporates the time aspect. 2. High frequency patterns methods assume the frequency is the criterion for discovering patterns with high frequency. (2) Delete the fifth element. 1999] are algorithms in this category of sequence data analysis.. Later. there have been several popular methods for processing sequence data analysis. In the e-commerce context. a good Web process design should not have too many DWPS for a specific process to avoid confusion. 3. DWPS are less spontaneous than browsing-oriented sequences because each DWPS has a predetermined start state and an expected termination state. such as FreeSpan [Han et al. High frequency patterns methods rely heavily on predefined thresholds of frequency. This study investigates DWPS data analysis for the diagnosis of Web process design to improve e-commerce practices. casual Web navigation and DWPS often mix together. Thus. [Hay et al. More importantly. but do not represent the time measure explicitly. called “norm process sequence”. there are two major types of Web log process sequences: (1) browsing-oriented: casual information search and Web navigation. and continuously improve the Web design. This property makes these methods inadequate in cases where the frequency is not important. the distance between S1 and S2 is 5. Analyzing Designed Web Process Sequence Data Page 106 . GSP [Srikant & Agrawal 1996]. Intuitively. For instance. Thus. the general formula for distance calculation might be over simplified. a buyer may not be so sure about the specific item that she/he needs. monitor the users’ online shopping behaviors against the design. In comparison with the ADM method. frequency is just one of the indicators of sequence data. the existing methods of data analysis on process sequence data are not applicable. etc. AprioriAll [Agrawal & Srikant 1995]. High Frequency Patterns Methods High frequency patterns methods aim to discover the most frequent patterns from the sequence database based on predefined criteria. Discussion In summary. distances measures for process sequences are not relevant. many algorithms. Clearly. and can be used as a reference for Web log analyses. To improve the Web process design. DWPS have two unique characteristics in comparison with casual information search and Web navigation. 2004.4. adding and removing items repeatedly from the shopping cart. DWPS are non-numerical. a norm process represents the expected online consumers’ behavior by the designer. Nevertheless. and it represents the expected behavior of online consumers to carry out the Web process. the SAM method takes the order of the elements into account in measuring the distance between the sequences. A norm process sequence is usually short. these methods require candidate sequences as seeds to discover high frequency patterns. In the business field. and ought to be taken into account in analysis. 2004] use recursive projections to generate candidate sequences and use pattern growth approaches to discover high frequency patterns. 2. On the other hand. but have their limitations in general non-numerical process sequence cases. The start state of a DWPS corresponds to the start point of the Web process. but has not paid much attention to DWPS. and therefore spends much time browsing. 2001]. A designed Web process sequence (DWPS) is a Web log sequence designed by the designer of a Web process. Originally.Wang & Wang: Designed Web Process Sequence Analysis (1) Reorder the second and the third elements. the time measure in the sequence is missing in the SAM method. and (2) transaction-oriented: designed Web process sequences (DWPS) such as the online shopping process. the motivation of research into DWPS is to support pre-meditated Web process design by averting ad hoc Web design styles. A Web process may have more than one norm process sequences. and the termination state corresponds to the exit point of the Web process. The ADM and SAM methods are often used for sequence pattern analysis. 2007]) has been focusing on browsing-oriented sequences. Existing research on Web log analysis (e. Dutta et al. depending upon the Web process design.5. and the frequencies of similar process sequences are not particularly important. and PSP [Masseglia et al. and a new algorithm must be developed for this study. Also. 2000] and PrefixSpan [Pei et al. time series analysis methods have been used for sequence analysis with a single time sensitive variable during the past decades. one must make a distinction between these two types of Web logs. A norm process sequence is an ideal sequence expected by the designer for the entire online transaction process. The development of an effective method to separate these two types of Web logs and analyze valid DWPS is imperative.

. We develop an adaptable process sequence matching method in this study. tNorm-T>} |{ <eNorm-1. . there exists a norm DWPS that is considered to be the desired DWPS for the user to complete the process. . For instance. ei. eNorm-T } (9) where ENorm-i is any local iteration loop of norm designed partial Web process events which does not include the designed termination event eNorm-T. . . tNorm-1>. Θ can be defined in the Extended Backus-Naur Form (EBNF) notation as follows [Aho et al. ψ = { eNorm-1.. . 124> } where a is the event "Enter the Home page". NO 2. <e. <e1. . and ends with the designed termination event eNorm-T..2. α. <Click Finish.2. θk = { e1. Updating Ψ for a Web process is important. <Click Registration.. which might include certain local iteration loops. <eNorm-T. Formally. For a designed Web process such as an online purchasing process. but must have one certain termination event (e. and use numbers to encode the time with respect to the time of the initial event. α) (8) where α is the norm designed partial Web process sequence which does not include the designed termination event eNorm-T. 2006]: Θ ::= <e1. t>. ti>. } (7) The interest of this study is to analyze DWPS to improve Web process design. . Ψ may include local iteration loop process sequences (e. 309>. A norm designed Web process sequence has its corresponding norm designed Web process event sequence. Adaptable process sequence matching Analyzing DWPS is to measure the dissimilarity between all Θ and Ψ. 0>. tT> if the process is aborted. . For example. ENorm-i. A DWPS has its corresponding designed Web process event sequence which contains events only. e2. tNorm-1>. 2006]: Ψ ::= {<eNorm-1. and none of the existing sequence matching method can be used directly for analyzing DWPS because of the irregularity of DWPS. and eT is the designed outcome of the Web process. we assume that there is only one Ψ. <e2. 313>. <b. The norm times are the desired times for the corresponding norm events. <eT. <Type User-ID. and can be viewed as one of the outcomes of data analysis on DWPS. as defined below. 304>. where e is an event and t is the time of the event. <c. t1>. <b.Journal of Electronic Commerce Research. Ψ might not be unique. tNorm> | (<eNorm. Designed Web Process Sequences Analysis 3. . t> ) (6) which means that a DWPS has its determined time-stamped initial event and arbitrary numbers of follow-up timestamped events. A real DWPS may not end with <e T. 0>. b is the event "Click Registration".g. For a Web process. . in an online shopping Web Page 107 . . [Boyer & Moore 1977. . c is the event "Type User-ID" … T is the designed outcome "Click Finish". for a Web process in investigation.g. Process sequence matching is a division of pattern matching in computing (e. . 3. 1977]). 5>. VOL 10. that is. . Our adaptable process sequence matching method is different from the exact process sequence matching methods in that heuristics relevant to DWPS must be applied. For example. Let Ψ denote a DWPS with norm events and their norm times. .1. First. 2009 3. t1> | (Θ. Designed Web Process Sequences A DWPS is a set of process event-time pairs <e. t2>. 428> } If we use symbols to encode the event occurrences. In this study only one layer loop is considered. Two characteristics of DWPS call for the adaptable process sequence matching method. tNorm>. t1> is the determined designed initial event-time pair for the Web process. <ei. tNorm-T> } α ::= <eNorm. repeating shopping items). Ψ can be defined in the EBNF notation as follows [Aho et al.. in the above case. tT> } (5) where k is the process actor. ψ of a Web process is determined by the design of the Web process. Knuth et al. . 9>. Knowledge and interpretations of those discrepancies can be useful for improving the design of the Web process. the above DWPS can be rewritten as Θ1054 = { <a. 2>. . DWPSID#1054 = { <Enter the Home page. <c. confirm the payment). a sequence matching method is always related to a particular problem domain. There has been a very rich literature on sequence matching covering a wide variety of topics. <eT. Θk = { <e1. Ψ = { <a.. However. depending upon the Web process design.. .1.g. the Web designer might define the following Ψ for the online registration process based on the Web design and experimental data of times. In this study. <eNorm-T. . as defined below. 70> } Formally. 5>. and are usually estimated by using typical DWPS data or experimental DWPS data. tT> is the terminate event-time pair. the following is a process sequence of online registration. … <T. uncertain local loops in the process sequences are normally involved. <T.. This study uses process sequence matching for analyzing DWPS. The approach discussed here can be extended to multiple Ψ cases.

ψ: Norm designed Web process event sequence. but the computational cost used for such a search would be prohibitively expensive.4) that includes m rows and 4 columns. If the event in Ψ is a local loop. and then use ψ and the corresponding t values to generate Ψ. uncertain interruptions of the process sequences might happen. the number of the processes that has reached the event. The first event e Norm-1 of Ψ is used to search Θ.2. The adaptable matching process goes through the following steps.3. the customer might leave the Web site in the middle of shopping.5. the number of the processes that has reached the corresponding event (or local loop). Assemble DWPS The first step of the adaptable process sequence matching method is to assemble DWPS based on the Web log database to obtain Ω. δ. and may or may not return. It is possible to search the entire process sequence database to assemble interrupted process sequences. The output of the adaptable matching process is a table. For instance. Notations: w: Web log. If eNorm-1 is found in Θ. Each Θ  Ω is the input of the adaptable matching process. Γ: DWPS summary table. 3. To reduce the complexity of the followed matching process. customer) might re-do a part of the process by clicking a “back” button. Input: A Web log database Π. randomly select complete samples of Θ Ω. First. Each row of the table lists the event and local loops specified by Ψ. casual Web navigation logs will be excluded so that valid DWPS will be used for the analysis. a process actor (e. and the average interval time between the corresponding event and the next event. the withdrawn parts are excluded when assembling Θ. A: Web process actor. e: Event of Web process. is applied in searching the Web log database in order not to search the entire database for exceptional cases. an adaptable matching process takes place.. Ψ: Norm designed Web process sequence.4. Second. θ: Assembled designed Web process event sequence. t> to find average t for each e and each event in E. which could be a set of event. The algorithm is summarized as follows. The interval between leaving and revisiting is uncertain. use their segments of process pairs <e. To improve the performance of process sequence matching. the threshold for interruption intervals. Adaptable Matching Once Ψ and Ω are generated based on the Web log database.2. it is assumed that the process is inadequately terminated. and is used to match Ψ. Page 108 .2. In such a way. 3. First. δ: Threshold for interruption intervals of the Web process. For instance.g. It has two sub-steps. if the online shopping customer left the Web site and did not return to the process within a half hour. Each norm event in ψ must be reached by the process actor in the correct sequence. E Norm-i.Wang & Wang: Designed Web Process Sequence Analysis process the customer can repeat purchasing items for arbitrary times. two practical methods are used in assembling DWPS. Generate norm DWPS for the Web process The second step of the adaptable process sequence matching method is to generate Ψ for the Web process. as described next. Ω: Assembled designed Web process sequences database.2. the set of Θ. a norm designed Web process event sequence ψ. Second. Each row lists event. in an online shopping Web process. 3. Θ: Assembled designed Web process sequence. and the average interval time between the corresponding event (or local loop) and the next event (or local loop). E: Local iteration loop of norm designed partial Web process events. a threshold for interruption intervals of the Web process δ. ψ (the norm designed Web process event sequence) is generated by the Web designer to normalize the desired event sequence for the Web process. and the process continues to the second event eNorm-2. Second. indicator of local loop. This process proceeds until eNorm-T is reached. 3. ψ might include local loop prototype E. is used to search Θ repeatedly until the loop is ended in Θ.2. T: Cumulative interrupted interval of a Web process. or Θ is ill-terminated. q: Number of event-time pairs in Θ. The proposed adaptable process sequence matching method meets the needs in these two aspects. Π: Web log databases. t: Time of the event of Web process. then record the time t1. as explained in Equation (9). Output: A DWPS summary table Γ(m. The adaptable process sequence matching algorithm The above DWPS adaptable process sequence matching method was implemented in C++. for each of the process actors.

Match ΘΩ against Ψ. and produce table Γ. Set T=0 and exit the case. t>Θ . estimated number of online shoppers of a few recent purchasing cycles) for analysis. tj>Θ (j=2…q). Hence. the number of DWPS in the Web log database is about the magnitude of the number of online shopping visitors. ti >. ti > (ei Ei). exit the case. find average ti for ei. the third and fourth columns are zeros. t+δ].g. Select first w Π. 2-4. do the following. For each ΘΩ. do 3-3 through 3-5 until Ω =Ø . find <ei. Search Π for all w with A while A is active in the time window [t. find average ti for ei. ti >. ti > (ei Ei).Journal of Electronic Commerce Research. } Ω =the set of Θ. one might choose a limited number of DWPS (e. t+δ]. For the first event-time pair <e1. cumulate these t. 2-1. search Π for all w with A while A is active in the time window [t. Assemble designed Web process sequences Θ. t1>. ti >. The time complexity of this algorithm is O(|Π|2). NO 2. process event e.4) so that the first two columns list all events in ψ and indicators of local loops. Generate Ψ using all obtained <ei. The set of Θ is Ω. practically. because the time spans of usual DWPS are not long. For each ei ψ. VOL 10. Delete Θ from Ω. A= the process actor of w If (A is not in Ω) { For each event e in w { t = the time of e. 3-1. 3-4. Listing 1. 1-2. The pseudo-code of this algorithm is presented in Listing 1. identify the process actor A. 1-4. search each of the complete Θ to find <ei. and time t. } For each Ei ψ { Compute <ei. ti >. multiple full scans of the entire Web log database seldom occur. all event-time pairs are ordered in the time sequence. 2-2. } Assemble w into Θ for actor A. and generate <ei. so that the first event-time pair of Θ must be <e1. so that the first event-time pair of Θ must be <e1. 2-3. Case 1: ejθ and ejψ (ejEi when ej is a local loop event). If T< δ. For each event-time pair <ej. otherwise exit this step. cumulate these t. and add (tj-tj-1) to the fourth column in the row ej-1 of table Γ. */ Randomly select n complete Θ from Ω. Step 2. /* Step 2. Generate the norm designed Web process sequence Ψ. t1 >. Assemble all w into Θ for actor A. in principle. } Delete w from Π. More importantly. 3-2. and q is the length of Θ. Randomly select n complete Θ from Ω. 2009 Step 1. ti >. ti > and <Ei.. 3-3. the average time complexity of this algorithm is considered to be scalable with a large data size of Web logs. Theoretically. Step 3. t>Θ . Case 2: ejθ and ej  ψ. Page 109 . For each ei ψ { Search each of the complete Θ to find <ei. */ While (Π=Ø) { Select first w Π. Initialize table Γ(m. Add tj to T. Assemble designed Web process sequences Θ.. t1 > and all event-time pairs are ordered in the time sequence. 3-5. Delete w from Π. the size of a Web log database could be accumulated to several gigabytes. Generate the norm designed Web process sequence Ψ. and generate <ei. For each Ei ψ. However. If A is not in Ω. Adaptable Algorithm for Designed Web Process Sequence Analysis /* Step 1. Note that. 1-3. add 1 to the third column in row ei of table Γ. 1-1. Add 1 to the third column in row ei of table Γ. and obtain <Ei. Repeat Step 1-1 through Step 1-3 until Π=Ø. and <Ei. where |Π| is the number of DWPS in the Web log database Π.

proteomics. There have been many sequence data analysis methods. /* Step 3. Each log entry included the visitor's IP address.7%) Θ were processes without shopping cart abandonment. CL 2009. These process logs were recorded by the server of the Web site. VI 2009]. To discover knowledge about shopping cart abandonment for a particular Web site. and each of them was designed for a certain type of problem. and SPSS PASW Modeler [2009]. checkout process is confusing. 821 (28. our algorithm is aimed to discover the patterns of variance of actual Web process sequences against a desired norm process. process point.4) so that the first two columns list all events in ψ and indicators of local loops.com [2009]. Each Θ started from adding an item to the shopping cart. A Case of Online Shopping Cart Abandonment In this section we describe the use of the proposed DWPS data analysis method for an online shopping cart abandonment case. and produce table Γ. common-sense explanations have not been fully studied in the literature. According to eshopability. and process timing. Our proposed adaptable process sequence matching algorithm is distinct from the existing methods by integrating the following two characteristics. After examining the Web site. 4. For each event-time pair <ej.Wang & Wang: Designed Web Process Sequence Analysis } Generate Ψ using all obtained <ei. data analysis on DWPS is particularly useful. T=0. } We have reviewed three well known commercial software products for sequence data analysis: IBM DB2 Intelligent Miner [2009]. Add (tj-tj-1) to the fourth column in the row ej-1 of table Γ. While all of these competitive software products are capable to extract general patterns and trends from the sequence data set. However. 75% of respondents abandoned their cart without making a purchase. but not merely the general patterns hidden in the Web log sequence data. in terms of matching. add 1 to the third column in row ei of table Γ. Among these Θ. genomics. [CO 2009. Repetition of choosing merchandise was the only local loop of processes. We believe that our algorithm has reached a new development beyond tuning an existing algorithm. First. none of them is able to perform adaptable process sequence data matching for the diagnosis of Web process design. tj>Θ (j>1) { If (ejθ and ejψ for ejEi when ej is a local loop event) { Add 1 to the third column in row ei of table Γ. 2856 Θ which reached a shopping cart were assembled for this experiment. 821 complete Θ were used to determine the norm time for each norm event. linguistics.} } } Delete Θ from Ω. ti >. a bad Web site usability. */ Initialize table Γ(m. There are many reasons for shopping cart abandonment: shopping information is unavailable. the third and fourth columns are zeros. the Web designer is able to learn useful information about online shopping cart abandonment to identify the weakest links of the entire Web process sequence. If (T>= δ) {exit. and 2035 (71. For each ΘΩ { For the first event-time pair <e1. etc. t1>. These data were used to determine Ψ (norm designed Web process sequence). Page 110 . Web log data mining. This experiment was to analyze those reasons.3%) Θ abandoned the shopping carts for some reasons. The context of adaptable matching in this study is analysis of designed Web process sequences. ti > and <Ei. By analyzing the results of the proposed method. etc. SAS Enterprise Miner [2009]. As discussed in the literature review of Section 2. sequence data analysis has been widely applied to various fields including finance. Shopping cart abandonment is costing online merchants billions dollars a year [ME 2009]. Second. in terms of pattern extraction. Match ΘΩ against Ψ. ψ (norm designed Web process event sequence) was generated. The Web log database used in this case study was process logs of an online house-hardware catalog sales Web site. but not exact matching. and was supposed to be ended with confirm of purchase. } If (ejθ and ej  ψ) { T = T + tj . our algorithm is designed to perform adaptable matching.

06 - Table 2..Provide other payment alternatives (e.9% N/A not be competitive cart (283/2035) .68 10.Improve back buttons . NO 2.Show shipping and handling costs information earlier . the data analysis results were reviewed by online sales managers and Web designers and lead to recommendations for improving Web design to reduce shopping cart abandonment.Shipping cost is higher entering shipping (855/2035) +508% than expected address After entering 8. the shopping cart was treated as abandoned in this experiment. Second. The Analysis Results of the Web Process Sequences and Interpretations Abandonment Cumulative Point Percentage Time Interpretations Deviation Left the Web site after . the majority of shopping cart abandonment cases (42%) happened right after the check-out point before the payment process.Invalid credit card card number before (102/2035) +896% number confirm .Prices of products might adding an item to a 13.Process might be confusing In the middle of .Shopping cart might not be functioning After check-out before 42. PayPal) . it was naturally considered that Page 111 . A Segment of Sequence Matching Results Event (ei) Adding an item to shopping cart Adding more than one item to shopping cart Check out Enter shipping address Enter credit card number Confirm Local loop No Yes No No No No Number of processes has reached ei 2856 2573 1941 1086 923 821 Average interval time between ei and ei+1 (Seconds) 32. The adaptable process sequence matching method was applied.Not all merchandise items have competitive prices .24 41.Make shopping cart visible all the time .0% . First. VOL 10.Searching and browsing shopping before 31. Given the fact that the majority of shopping cart abandonment cases took place after the check-out point.Process might be confusing and the shopper is lost . 2009 δ (threshold for interruption intervals of the Web process) was set to 1 hour.75 44. In this case. and the output table Γ was obtained (see Table 1 for a segment).Journal of Electronic Commerce Research.12 54. that is. Table 1.1% products are not easy check-out (632/2035) + 356% . As the processes with shopping cart abandonment wasted much time in the middle shopping process.0% . there were significant time deviations between the DWPS with shopping cart abandonment and the designed norm DWPS.Improve back buttons Meaningful knowledge development through data analysis always depends upon the interactions among business insiders and data analysts. The table revealed interesting facts.Improve searching and browsing products . the design for searching and browsing products was called for further examination. if a customer left the Web site for longer than 1 hour without reaching the check out point. The design of back buttons might also need to be inspected to see whether these buttons were helpful for searching and browsing products.Confusion Recommendations for Web Designer .The shopper does not like shipping address (163/2035) +615% to use credit card number before entering credit card number After entering credit 5.Make shopping cart visible all the time .g.0% .

Acknowledgment The comments of three anonymous reviewers have contributed significantly to the revision of the article. Techniques and Tools. R. “ A fast string search algorithm. P. Datta. The data analysis results and their interpretations by the online sales team are summarized in Table 2. E.” International Journal of Data Warehousing and Mining.” Journal of Electronic Commerce Research. No. H. 1977.. REFERENCES Agrawal.” Communications of the ACM. 21. V. 2009]. and M. 15. As a powerful technique. We conclude that designed Web process sequence data analysis is useful for improving e-commerce practices. G. “Semantic e-workflow composition. R. we have applied this method to a shopping cart abandonment case. Chatterjee. Cluster Analysis. customers and suppliers in the world wide web. and A. 20: 762-772. C. No. “Mining sequential patterns. CA: Holden-Day. Fayyad. <http://content. D. Cardoso. 1980. sequence data analysis has been widely applied in various fields. VanderMeer. D. <http://clickz.html?page=2245891> [accessed on January 30. P. Smyth and S. Aho. and R. Box. The analysis of the output of this method by business insiders and Web process designers can reveal useful information and develop knowledge for improving Web process design. 11:5157.websitegear.com. 2007. 1996. D. Sheth. Stolorz. R. J. “Poseidon: A framework to assist Web process design based on business cases. Ullman. Ramamritham. The first author is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC Grant 312423). Taiwan. 4: 99-118. 2006. and A. Srikant. M. Compilers: Principles. Indeed. eshopability. 181: 855-871. M. Guzzo. S. we are convinced that this method can be useful for improving Web process design. 1:51-61.com/article/shopping_cart_abandonment. 2009]. R. P. NY: Addison-Wesley. 2000. and match these irregular DWPS against the norm DWPS. Vol. P. 1995. Clearly. <http://www.. Sethi.. No. This paper proposes the adaptable process sequence matching method for analyzing DWPS. Heckerman. P. No. Keskinocak. it was a challenging task for the Web designer to design the way of presenting shipping and handling cost information. 1970.. as illustrated in this case.htm> [accessed on January 30. 2003. 3-14. “Mining scientific data. 2007.” Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. NY: Halsted Press. 3:191-225. J.. 1: 23-55. Vol.” Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Cadez. This method is to assemble DWPS in the online process database.Wang & Wang: Designed Web Process Sequence Analysis the uncertainty of shipping and handling costs contributed to the problem. Everitt.” Proceedings of the 11th International Conference on Data Engineering. A. content. 2002. “Web site mining: a new way to spot competitors. CL . and G. A. “An information-theoretic framework for process structure and data mining. 39. Vol. Time Series Analysis: Forecasting and Control. and J. Moore. 249-258.websitegear. as shipping and handling costs highly depended on the sizes and weights of hardware items. B. clickz. 2nd edition.eshopability.com. San Francisco. 2008. 5. Jenkin. Meek. K. “A fast method for discovering critical edge sequences in e-commerce catalogs. an effective use of the proposed adaptable algorithm is always supported by a full understanding of the specific Web design and a comprehensive content analysis of the particular Web process. Haussler and P.” Journal of Intelligent Information Systems. 2006. 3. M. 280-284. I. Vol.” International Journal of Cooperative Information Systems. White. A good sequence data analysis method shall address specific types of problems.com.com/showPage. and K. CO. Vol. Cardoso.” Communication of the ACM. Dutta. 9. New York. 2009]. Lam. D. V. Page 112 . “Are unclicked ads wasted? Enduring effects of banner and pop-up ad exposure on brand memory and attitudes. Ester. Schubert. Conclusions Our discussion on existing methods for analyzing process sequence data as well as the characteristics of DWPS suggests that the current existing sequence data analysis methods are inadequate for analyzing DWPS for Web process design. Vol. No. and J. U. “Visualization of navigation patterns on a web site using model-based clustering. Greco. as this study suggests. Kriegel. As an application example. Based on the case study. Vol. G.” European Journal of Operational Research.com> [accessed on January 30. Boyer.

and W. 16.” Data Mining and Knowledge Discovery.com> [accessed on April 22. 2002. P. 2009]. No. Q. S. No. “U. and R. 2008. Verbeek. “GHIC: A hierarchical pattern-based clustering algorithm for grouping web transactions. Zhang “Discovering rules for predicting customers’ attitude toward Internet retailers. J.. Business Processes: Modelling and Analysis for Re-engineering and Improvement.tv/tips/shopping_cart_abandonment. Pei. 1998. Hay. Vol. Sankoff. “Mining the change of customer behavior in an internet shopping mall. J. MA: Addison-Wesley. Wen. Vol. Dayal and M. Kim. <http://visibility. Chichester. D. No. Knuth. G. No.” Proceedings of the 6th IEEE Conference on Data Mining. and Z. PASW Modeler. 2007. D. Kim. 2: 248-261. “Conformance checking of service behavior. U. Chiang and D. B.com> [accessed on April 25. VOL 10. Wu. 9.. J. England: John Wiley & Sons. Yang.” Proceedings of 2000 International Conference on Knowledge Discovery and Data Mining. 2009]. 2000. 2006. 5. “Mining process models with non-free-choice constructs.” Proceedings of the 2nd IEEE Conference on Data Mining. LeBaron. Wang. 21. No. 10:135-144. W. 33. Wets and K. Vanhoof. No. Baack..” Journal of Electronic Commerce Research. H. Mortazavi-Asl. Sun.. No.” Journal of Electronic Commerce Research. “A bootstrap evaluation of the effect of data splitting on financial time series. marketingexperiments. Vol. France. 2009]. H. Hsu. Pei. Han. <http://www. and J. “An efficient algorithm for Web usage mining. 2008. 4:10041017. 10:1424-1440. Nakagawa. 4:228-239. Boston. Lin. No. R.ibm. C. Padmanabhan. S.. P. 2004. Vol. bisibility. 2004 Page 113 . B. Q. S. Rozinat and E.” Expert Systems with Applications. M. Ouyang. “Progressive pattern miner: An efficient algorithm for mining general temporal association rules. “Using sequential and non-sequential patterns for predictive web usage mining tasks. A. 2003. SAS Enterprise Miner. H.” IEEE Transactions on Knowledge and Data Engineering.” Networking and Information Systems Journal. Luo and M. H. Cicchetti. Dumas. “Identifying purchase-history sensitive shopper segments using scanner panel data and sequence alignment methods. K. Jiang. 8. 1:323-350. “FreeSpan: Frequent pattern-projected sequential pattern mining. Chen. Srikant. E. W. String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. L. Agrawal. 2003.. “Probabilistic user behavior models. MA. 307-318.com> [accessed on April 23. C. L. and A. J. 2003. J. Avignon. J. Wang and J. SPSS. Rozinat. van der Aalst. Poncelet and R. 2005. No. 2:145-180. Giles. Manavoglu.. T. W. U. N. J. 1983. “Data mining: How research meets practical development?” Knowledge and Information Systems. Chen.” IEEE Transactions on Knowledge and Data Engineering. X. 2009]. Vol. Pavlov and C.” IEEE Transactions on Knowledge and Data Engineering. 203-210. H. B.” IEEE Transactions on Neural Networks. J. 5/6: 571-603. Vol. “Improving personalization solutions through optimal segmentation of customer bases. No. B. H. 3:13:1-13:30.” Journal of Retailing and Consumer Services. Pratt. D. Vol. No. Vol. and S. Song. Lee. Mobasher. 2009]. 17. 2000. Vol. C. Vol. 2:78-92. Vol. M. Kruskal. “Fast pattern matching in strings. Vol. Yu. DB2 Intelligent Miner. No. 5. 1977. 669-672. 2009 Han.spss.com. Vol.. Hispanic consumer e-commerce preferences: Expectations and attitudes towards Web content. M.html> [accessed September 14. G.. Zhou.. ME.” Journal of Electronic Commerce Research. P. Ould.Journal of Electronic Commerce Research. Chen and C. and B. M. Joh. 9:1300-1304.tv. Mortazavi-Asl. 1995.. Masseglia.” SIAM Journal of Computing. L. K. R. A. VI. 355-359. 2. P.sas.” Knowledge and Information Systems.. M. Tuzhilin. “Mining navigation patterns using a sequence alignment method. Dayal and M. 1:213-220. W. C. <http://www. 2008. <http://www.. B..marketingexperiments. F. S. 2:162-175. 2005. 6. T. SAS. P.com/> [accessed on January 30. Kundu and C. H. Shergill.” ACM Transactions on Internet Technology. Singh. J. Hurtado. Piatesky-Shapiro et al. Y. G. “Web-based shopping: Consumers’ attitudes towards online shopping in New Zealand. No.” Proceedings of the 3rd IEEE Conference on Data Mining. NO 2. van der Aalst. 2000. No. Pinto. T. P. M. <http://www. H. D. Reading. M. 9. 15.” Proceedings of the 5th International Conference on Extending Database Technology. 1996. Morris and V. Dai. “Conformance checking of processes based on monitoring real behavior. S. Time Warps. “Mining sequential patterns by pattern-growth: The PrefixSpan approach. A. Hsu. Vol. and A.” Information Systems. C. Vol.S. van der Aalst. Timmermans and P. 2004. 1999. Popkowski-Leszczyc. 6:150-163. 6.. “Mining sequential patterns: generalizations and performance improvements. IBM. K. L. 3:157-168. Weigend. 15. 1:64-95. Vol. Chen.