Supply Chain Management Journal

Improving Freight Operations and Planning by Applying Data Mining Techniques Over Data Collected From Intelligent Transport Systems

Florin Codrut NEMTANU, Valentin IORDACHE, Angel Ciprian CORMOS
University Politehnica of Bucharest
florin.nemtanu@ieee.org

Abstract

These days, the freight transport domain faces many challenges that need to be addressed using appropriate data. A changing regulatory and technological environment for logistics, energy, environmental and safety considerations, impacts on global competitiveness, and the need to do more using fewer resources are some of these challenges. With the adoption of just-in-time supply chain management solutions and increasing congestion on motorways, better knowledge of freight movements can serve to improve freight operations and planning. Given current advanced Intelligent Transport Systems technologies, the large amount of high-quality data available and improved collection methodologies, data mining software can produce results that reveal new information about the freight transport system. Data mining can provide information beyond the reach of general statistical analysis, the original databases being used to derive various rules and patterns which can be applied to support decision-making. This information will be the key to better-managed freight operations and long-range planning and forecasting.

Keywords: freight transport, data mining, intelligent transport systems, data sources

Introduction

Road freight transport plays a very important role in the whole supply chain, and at some points the distribution of goods can become its bottleneck. Intelligent Transport Systems involve the application of information and communication technologies to the planning and operation of transport systems (B. McQueen & J. McQueen, 1999), making it possible to use resources efficiently, reduce environmental impacts, increase safety and, last but not least, in many cases ensure a certain level of transport service (without which the service would be useless).

Currently, there are many organizations that have stored huge amounts of data generated over time. The availability of detailed data obtained from ITS systems opens the door to systematic evaluation of freight transportation system performance. Data mining methods of data analysis are modern and powerful tools that allow discovering in these data relationships and patterns that characterize the effectiveness of freight transport, thus occupying an important place in any business. They have the power of predicting potential problems based on archived or real-time data, but before using them, the resulting models from the mining process must be tried out and tested.

Data mining techniques are widely used in many fields such as financial analysis, marketing, human resources, astronomy, business, fraud detection, medical and health research and scientific discovery. In the transport domain they have been used in areas like traffic management, accident analysis, pavement management systems, travel forecasting, ITS systems, information systems or public transport. Related to freight transport, some of the benefits of data mining are:
- It speeds up the process of data analysis.
- It reveals facts about customers.
- It determines the distribution schedules.
- It enhances efficiency and saves money.

By analyzing data provided by ITS systems, such as speeds, speed patterns, vehicle weights, congestion and accident analysis, traffic volumes, pavement conditions or weather characteristics, one can create models that support freight delivery and planning. Potential applications may include arrival time prediction, schedule adjustment, transit planning and scheduling, real-time service management and others.

In order to solve a specific problem, one must first identify what kind of data is available, collect it, determine which part of that data is useful, extract it and transform it so that it can be used by the data mining application. This is called sourcing and aggregation of data.

Figure 1. Steps of the data mining process (data sourcing and aggregation; data exploration (data mining); delivery of personalized information)
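
The sourcing and aggregation step described above can be illustrated with a minimal Python sketch. It assumes hypothetical loop-detector records stored as dictionaries of strings; the field names, the sample values and the 12 m length threshold used to flag trucks are illustrative assumptions, not taken from any particular ITS system.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from a roadside detector.
RAW_RECORDS = [
    {"detector": "D1", "timestamp": "2010-05-03T08:15:00",
     "speed_kmh": "62.5", "vehicle_len_m": "16.2", "operator_note": "ok"},
    {"detector": "D1", "timestamp": "2010-05-03T08:16:00",
     "speed_kmh": "", "vehicle_len_m": "4.1", "operator_note": ""},
    {"detector": "D2", "timestamp": "2010-05-03T17:40:00",
     "speed_kmh": "88.0", "vehicle_len_m": "4.3", "operator_note": "ok"},
]

# Assumption: only these fields are useful for the mining task.
USEFUL_FIELDS = ("detector", "timestamp", "speed_kmh", "vehicle_len_m")
TRUCK_MIN_LENGTH_M = 12.0  # assumed threshold, for illustration only


def source_and_aggregate(records):
    """Select the useful fields, drop incomplete records, transform types."""
    prepared = []
    for record in records:
        selected = {field: record.get(field, "") for field in USEFUL_FIELDS}
        if not selected["speed_kmh"] or not selected["vehicle_len_m"]:
            continue  # a required measurement is missing
        moment = datetime.fromisoformat(selected["timestamp"])
        prepared.append({
            "detector": selected["detector"],
            "hour": moment.hour,
            "speed_kmh": float(selected["speed_kmh"]),
            "is_truck": float(selected["vehicle_len_m"]) >= TRUCK_MIN_LENGTH_M,
        })
    return prepared


prepared = source_and_aggregate(RAW_RECORDS)
```

The result is a list of uniform, typed rows that a mining tool could consume directly; in practice the selection and transformation rules depend on the problem being solved.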

The next step is choosing a data mining application based on criteria such as complexity, algorithms included, ease of use and, last but not least, cost. Finally, the patterns and models produced by the application are verified, interpreted and transformed into knowledge.

In this paper we analyze the sources available for ITS data, recommend methods for choosing the appropriate mining technique and software tool, and suggest potential applications in the context of freight transportation.

1. Data sourcing and aggregation

In order to assess the impact of a freight management program, it is important
to take into account the effects of all types of vehicles and the entire transportation system. That is why, when developing freight performance models, freight operators should consider using data from Intelligent Transport Systems. Many types of data are available from different sources, and it is the miner's responsibility to integrate, select, filter, transform or format that data.

ITS sources use a massive number of sensors embedded in the infrastructure, in vehicles or in portable devices, that will generate vast amounts of data about location, route, lane, number of vehicles, length and weight of the vehicle, date, time, position or speed. Methods for collecting ITS data have evolved considerably, from the use of traditional on-road sensors to Floating Car methods, which are more cost-effective solutions.

The most widely used class of traffic sensors is represented by loop detectors, which are systems embedded in roadways that sense the presence of a vehicle over them. They are used to count vehicles, determine their speed (if two consecutive loops are used on the same lane) or classify vehicles based on length and number of axles (also using multiple loops). Other sources for speed measurement are digital tachographs, which record a vehicle's speed over time, infrared sensors and microwave radar technology, which can detect moving vehicles. Laser scanners are also used to create 3D models of vehicles for the purpose of classification.

Weigh-In-Motion is an effective tool for weight enforcement and for collecting data about truck weights, speeds, axle configurations and volumes without requiring the vehicle to stop. Such systems are usually based on bending plate, load cell or piezoelectric sensors.

Because of their ability to transmit moving images in a closed circuit,
enabling human operators in decision making, video cameras quickly found application in the field of traffic surveillance and detection. As technologies have developed, image analysis began to be automated, based on electronic processing of information using different algorithms and techniques. CCTV (Closed Circuit TeleVision) traffic monitoring systems are used to detect congestion, to notice accidents, to read the license plates of cars or to classify vehicles.

Radio-Frequency IDentification (RFID) is a technology that uses radio waves to exchange data between a reader and an electronic tag attached to an object, for the purpose of identification and tracking. RFID tags can incorporate data about the owner, vehicle characteristics, the type of goods transported, origin, destination etc. Newer tags can connect to external sensors and create data logs.

The Global Positioning System (GPS) is a well-known space-based global navigation satellite system (GNSS) that provides location and time information.

Following the development of cooperative systems, new methods of traffic data acquisition have emerged. The "floating car data" collection method is based on vehicles that serve as mobile sensors. Relevant information (speed values, location, direction of travel, activation of internal systems such as ABS or ESP, information from rain sensors, thermometer, etc.) is aggregated and sent anonymously to traffic management centers or to other vehicles.

Road surface condition and weather sensors are used to determine temperature, the dry, wet or icy condition of the road, the presence and quantity of snow or rain, dew point and relative humidity.

Most of the raw data collected by the ITS systems and sensors presented above is never used as such. Usually data is aggregated over time or used to calculate other indices like volume (the number of vehicles that passed over the detector during a period of time), occupancy (the average occupancy of the detector during a period of time), vehicle kilometers traveled (VKT, the estimated total kilometers traveled by vehicles in a segment during a time period), vehicle hours traveled (VHT, the estimated total hours vehicles spent travelling in a segment during a time period), density (the number of vehicles occupying a given length of a lane or roadway at a particular instant), travel time or delay (K.A. Tufte).

Other sources (private or public) that can be used in conjunction with ITS data are usually databases which can provide information about:
1- access to physical facilities needed for freight movement;
2- safety of vehicles and prevention of loss of or damage to products/freight being delivered;
3- capacity of the road to move freight at desired volume/weight level;
4- travel time (dwell time, processing time and transit time);
5- tonnage and value of shipments to different markets;
6- freight productivity (annual [...]
[...] agreements, economy, inflation, fuel prices, and logistical factor changes).

These databases are created either by the freight operator itself or by entities like departments of transport, data collection agencies, transportation research boards and committees on freight transportation, which identify and publicize sources of data and data needs, and assist analysts and decision makers in the effective use of freight transport data.

The aggregation of data is the most difficult and time-consuming element of the data mining process. The miner has to integrate information from several sources and choose what data is relevant; one might need to transform it into a suitable form, to deal with missing values,
and to remove duplicate or erroneous records and extreme values.

2. Data exploration

Everyone has a very difficult task when it comes to choosing what data mining software to use. This also applies to the freight managers involved in this process. The right choice must meet the user's needs and budget. Therefore, two major problems arise: the first is to select the proper data mining technique that can be applied to the type of data available, and the second is to choose the software itself.

To determine the proper technique, one has to understand the data sources, the data parameters (numerical, logical, Boolean etc.) and the quality of the data. Databases can be relational, spatial, temporal, deductive, object-oriented, transactional, textual and many more. Techniques such as classification, clustering, association, numeric prediction, neural networks, genetic algorithms and others are available. In (M.F. Jiang et al, 1999) the authors illustrate the relationship between six types of data mining techniques and the four most common types of databases: transaction databases can be analyzed using the association rules technique; relational databases can be analyzed using classification techniques; textual databases can be analyzed using the pattern-based similarity search technique; temporal databases can be analyzed using pattern-based similarity search or time series discovery techniques.

Next we provide some examples of applied techniques taken from the available literature related to the transport domain. The classification technique is used to mine a database with logical and numerical values containing information about asphalt projects, to find rules and patterns that support decision making within a pavement management system (K. Nassar, 2007).
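
To make the idea of mining rules by classification concrete, the following sketch applies OneR, one of the simplest rule-learning classifiers, to a handful of made-up project records. It is only an illustration under stated assumptions: the attribute names and values are hypothetical, and OneR is not claimed to be the method used in the cited study.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Return (attribute, rule, errors) for the single attribute whose
    value -> majority-class rule misclassifies the fewest training rows."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attribute in attributes:
        # Count how often each class occurs for each value of the attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attribute]][row[target]] += 1
        # The rule predicts the most frequent class for each attribute value.
        rule = {value: classes.most_common(1)[0][0]
                for value, classes in counts.items()}
        errors = sum(1 for row in rows if rule[row[attribute]] != row[target])
        if best is None or errors < best[2]:
            best = (attribute, rule, errors)
    return best


# Hypothetical paving project records (illustrative values only).
projects = [
    {"heavy_truck_route": True,  "thickness": "thin",  "needs_repair": True},
    {"heavy_truck_route": True,  "thickness": "thick", "needs_repair": True},
    {"heavy_truck_route": False, "thickness": "thin",  "needs_repair": False},
    {"heavy_truck_route": False, "thickness": "thick", "needs_repair": False},
    {"heavy_truck_route": True,  "thickness": "thin",  "needs_repair": True},
    {"heavy_truck_route": False, "thickness": "thin",  "needs_repair": True},
]

attribute, rule, errors = one_r(projects, "needs_repair")
```

On this toy data the learned rule is keyed on the truck-route attribute. Numerical attributes would first have to be binned into ranges before a rule of this kind could be learned.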

When searching a database of accident statistics that contains groups of attributes, the association rules technique can be used to discover common combinations of attributes that occur most frequently within a given data set (P. Haluzova, 2008). In (D.H. Lee et al., 2004) traffic incident situations are investigated using a relation finder algorithm and clustering techniques over a database with numerical records grouped into categories. Identification of rear-end crash patterns on instrumented freeways is analyzed in (A. Pande & M. Abdel-Aty, 2005) over a traffic surveillance database using neural network techniques. In (R.E. Turochy & B.D. Pierce, 2004) the authors mined an archive of traffic data for short-term traffic forecasting using a form of nonparametric regression called k-NN (Nearest Neighbor).

As seen above, no single technique can be recommended, and every miner in the freight transport domain should establish from the start which technique is suitable for the data set at hand. When this is not clear, one must choose software that offers a variety of data mining techniques, although this could be an expensive choice.

In (K. Collier et al., 1999) the authors proposed a methodology for selecting from among the assortment of commercially available data mining software tools. They suggest four categories of criteria for evaluating data mining tools: performance, functionality, usability, and support of ancillary activities.
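
The k-NN nonparametric regression mentioned above for short-term traffic forecasting can be sketched in a few lines. This is a generic illustration, not the procedure of the cited study: the use of two past readings as the pattern, the Euclidean distance, k = 3 and the sample volumes are all assumptions.

```python
def knn_forecast(history, current, k=3):
    """Forecast the next value as the mean outcome of the k past
    patterns closest (Euclidean distance) to the current pattern.

    history: list of (pattern, next_value) pairs from the archive
    current: tuple with the most recent readings
    """
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(history, key=lambda item: distance(item[0], current))[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical archive: two consecutive 15-minute volumes and the
# volume observed in the following interval (vehicles per interval).
archive = [
    ((100, 120), 140),
    ((300, 320), 330),
    ((110, 125), 150),
    ((90, 115), 130),
    ((305, 310), 300),
]

forecast = knn_forecast(archive, (105, 122), k=3)
```

The three archived patterns closest to the current readings are the low-volume ones, so the forecast is the mean of their outcomes. No model is fitted in advance, which is what makes the method nonparametric.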

Taking into account the domain discussed in this paper, we have selected below some of the most important criteria:
- Platform variety. It is known that the platform solutions adopted in the transport sector are diverse and somewhat individualistic (proprietary).
- Heterogeneous Data Access. It is necessary to cover a large variety of databases, because data is obtained using different technologies, different
methods and may have different structures.
- Data size. It is a very important criterion because in the transport domain data sets tend to be very large. The software must produce results in a reasonable amount of time, working with multiple database types and using all the information from them without producing contradictory or unintelligible results.
- Robustness. The software must be stable, without crashing regularly, especially in analyses that require long working times.
- Algorithmic variety. It is a very important criterion if the necessary algorithms are known. If not, the software should have as many algorithms implemented as possible.
- Model validation. It is a required characteristic of the software because the obtained models have to be validated before being deployed into a working environment.
- Data cleansing and Data filtering. It is an important characteristic that allows the user to work with the values of the data set.
- Data visualization. The software should report the results using a variety of graphical methods.
- Error reporting. Should be easy to use and help the user.
- User interface. It has to be easy to navigate and uncomplicated for both the data aggregation step and the visualization of the results; a poor interface can lead to mistakes and misinterpretations of the analysis.
- Cost.

To choose the appropriate software, each criterion considered important is assigned a weight according to its importance. Each software tool is then scored for comparison, making it possible to find the most suitable one.

Various data mining tool comparisons and evaluations were made over time. (J.F. Elder IV & D.W. Abbott, 1998) and the KDNuggets website
made some of them. But relying on these evaluations is not always the best idea, because software tools are evolving rapidly, incorporating new characteristics and techniques. Everyone must complete their own comparison study through which to choose the appropriate tool. Good help is given by organizations that run surveys among data mining tool users and create popularity reports. If a software tool is extensively used, this could be an indication that it will probably get good scores on many evaluation criteria. The most recent survey was made by Rexer Analytics (K. Rexer, 2010).

3. Applications in freight transportation planning

The knowledge discovered by the data mining process will be evaluated and interpreted, and according to (P. Haluzova, 2008) there are three categories of discovered knowledge:
- Evident knowledge, which is the common knowledge of an expert. Even if it does not offer anything new, it can show us that the method applied is working well.
- Interesting knowledge that yields a new point of view. This is the main aim of the process.
- Knowledge that seems to be unclear or is at variance with expert knowledge. Usually this type of knowledge should be eliminated, but sometimes it must be taken into consideration because it can expose a new point of view that applies to all the problems.

In (R. Kohavi, 2001) the author states that data mining serves two goals: insight, which is identifying patterns and trends that are easy to understand and use, and prediction, which means building a model that will predict based on input data. Each data mining application in transport engineering will have to reach one of these two goals.

In (J. Patnaik et al., 2006) the authors studied the development of an effective bus scheduling plan using data mining techniques on APC (Automatic Passenger Counters) data, applying a methodology for clustering the state variables (the number of served passengers and halting stations in each vehicle trip) and using that for service
planning. To analyze the data, extract it and determine demand patterns, data mining techniques like classification and regression trees and hierarchical clustering were used. The same idea may be applied to goods distribution with regular schedules, using data from RFID tags or other identification instruments.

The traffic generated by heavy freight vehicles is one cause of disruptions of the traffic flow and the appearance of congestion. One of the solutions to this problem that leads to improved traffic flow is the formation of vehicle platoons. Trucks can become largely self-driven with the help of Advanced Driver Assistant Systems (ADAS), which allow the automatic forming of close platoons in order to reduce air drag, fuel consumption and the space needed by trucks on the road (E. Savelsberg, 2008; P. Meisen et al., 2008). ADAS enable trucks to adapt their speed to the traffic environment and follow the truck in front at close distance. By using ad-hoc networks, trucks could communicate by dynamically forming communication networks, in which each vehicle is directly linked to its nearest neighbors or to the intelligent infrastructure.

The benefits are numerous. The distance between trucks in a platoon can be reduced when using wireless communication with the truck in front, because of the faster response of the involved systems. Trucks can communicate with each other and form platoons based on certain characteristics (destination, route, weight per horsepower or power-to-weight ratio). By communicating with the intelligent infrastructure, this data can also be sent to a traffic management center to build a database that can be mined in order to discover patterns and predictive models that can be used by freight planners (C.H. Cheong & M.H. Wong, 2006). They can choose a certain route and a certain time window in which there is a higher probability of finding similar trucks and creating platoons.

Complexica Inc. developed a system called MINDS that analyzes data from an array of in-vehicle sensors to
determine when truck drivers are in danger of falling asleep, using advanced neural network technology. Many other systems that detect the drowsiness of a driver are emerging or are already deployed. The information provided by such systems can be used in conjunction with data related to the route traveled by the driver, speed profile, weather conditions and time of day (hour, day/night) to find patterns corresponding to each driver and adapt drivers' working hours accordingly, thus increasing safety by preventing accidents.

A more rational use of resources can reduce the number of freight vehicles on the road and increase the efficiency of freight transportation. Similar to already existing car-sharing services, truck-sharing could be implemented. Different companies may share the same vehicles to deliver goods if they have similar routes, to avoid situations in which vehicles are not fully loaded, or to shorten the distribution time and avoid peak hours. An interesting task is to match user profiles based on various information types, but this can be solved reliably using pattern analysis techniques. For each vehicle and each freight operator a user profile must be created containing preferences like type of goods, weight and volume, along with specifications concerning the journey itself. The vehicle and the driver can be monitored using in-vehicle sensors, and map and trajectory information can be gathered with the help of GPS, storing a user profile on a central server.

With sensor networks along the road measuring traffic flow and road condition, freight managers can run pattern analysis tasks to obtain information about the door-to-door travel time, including parking and waiting, and the cost of following a certain route, and to predict future traffic loads, taking into account special events (such as games, festivals or holidays), as well as weather forecasts, planned road maintenance and so on. Up-to-date time-of-day and day-of-week traffic statistics would allow freight operators to route their vehicles so as to avoid traffic congestion and better serve their customers.

On the logistic level, the


