
Temporal data mining:

Temporal data mining “is a process of Knowledge Discovery in Temporal Databases that enumerates structures (temporal patterns or models) over the temporal data”. Examples of temporal patterns or models (trends) in the data are event associations over time, retrieval of time series by pattern similarity, and time series indexing and segmentation. In the telecom market domain, temporal data mining helps to identify temporal patterns such as the fluctuation of bandwidth prices, the range of variation in bandwidth demand, the range of variation in bandwidth usage, the period of demand, and the statistical variation of service usage, app usage and device usage.

Time Series

A sequence of continuous real-valued elements, such as service prices, product prices or bandwidth prices, is known as a time series. The time series data is first put into a reduced representation, for example a piecewise linear approximation or the coefficients of a Discrete Fourier Transform or Discrete Wavelet Transform. The Euclidean distance is then used to compare time series and identify trends (patterns and models): series with similar shapes are found according to a predefined, domain-specific measure of similarity.
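
The shape comparison described above can be illustrated with a minimal sketch. It works on raw values and skips the transform step; the z-normalization, the function names and the sample prices are assumptions made for illustration and are not taken from the proposal.

```python
import math

def z_normalize(series):
    """Rescale to zero mean and unit variance so that shapes, not price levels, are compared."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0] * len(series)  # a constant series has no shape
    return [(x - mean) / std for x in series]

def euclidean_distance(a, b):
    """Euclidean distance between two series of equal length."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(query, candidates):
    """Return the name of the candidate whose normalized shape is closest to the query."""
    q = z_normalize(query)
    return min(
        candidates,
        key=lambda name: euclidean_distance(q, z_normalize(candidates[name])),
    )

# Hypothetical weekly bandwidth prices (illustrative values only).
query = [10.0, 12.0, 15.0, 13.0, 11.0]
candidates = {
    "region_a": [20.0, 24.0, 30.0, 26.0, 22.0],  # same shape at twice the level
    "region_b": [15.0, 13.0, 11.0, 12.0, 14.0],  # roughly the opposite shape
}
print(most_similar(query, candidates))  # prints "region_a"
```

Normalizing before measuring the distance is one possible domain-specific choice: it treats two price curves as similar when they rise and fall together, even if one market is more expensive overall.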

Sequence Mining

The goal of this research study in mining sequential patterns from the commercial databases of the telecom company is to predict customer buying behavior. The proposed tool will use a novel sequence mining algorithm that extracts statistically significant patterns (similarity measures) of the form A → B [t] from event sequences, where an event of type B follows an event of type A within time t, with the goal of finding the optimal t for each extracted pattern.

Proposed Sequence Mining Approach

Step 1: Define the Event Sequence

An event sequence S is a collection of events over R, ordered in linear time. Given a set R = { A₁, …, Aₘ } of event attributes with respective domains D(A₁), …, D(Aₘ), an event e over R is an (m + 1)-tuple (a₁, …, aₘ, t), where aᵢ ∈ D(Aᵢ) and t ∈ ℕ is the occurrence time of e.

Each event has a type and a time of occurrence. Therefore, given a class E of basic event types, an event is a pair (e, t), where e ∈ E and t ∈ ℕ. An event sequence S is an ordered sequence of events, represented as S = (e₁, t₁), (e₂, t₂), …, (eₙ, tₙ), where eᵢ ∈ E for all i ∈ {1, …, n} and tᵢ ≤ tᵢ₊₁ for all i ∈ {1, …, n – 1}. Taking the event types A, B, C and D, an example of such an event sequence is S = (B, 4), (D, 6), (C, 6), (A, 8), (B, 12), (D, 15).
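
As a rough illustration of this definition, the sketch below represents an event as a (type, time) pair and checks the two conditions stated above: every type belongs to E, and times never decrease. The names `Event` and `make_sequence` are hypothetical helpers and not part of the proposed tool.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

class Event(NamedTuple):
    event_type: str  # e, an element of the set E of event types
    time: int        # t, a natural number giving the occurrence time

def make_sequence(pairs: Iterable[Tuple[str, int]], event_types: Set[str]) -> List[Event]:
    """Build an event sequence and verify that it satisfies the definition."""
    seq = [Event(e, t) for e, t in pairs]
    for ev in seq:
        if ev.event_type not in event_types:
            raise ValueError(f"unknown event type: {ev.event_type!r}")
        if ev.time < 0:
            raise ValueError("occurrence times must be natural numbers")
    # Times must be non-decreasing; equal times are allowed, as with (D, 6), (C, 6).
    for prev, cur in zip(seq, seq[1:]):
        if prev.time > cur.time:
            raise ValueError("events must be ordered by occurrence time")
    return seq

E = {"A", "B", "C", "D"}
S = make_sequence([("B", 4), ("D", 6), ("C", 6), ("A", 8), ("B", 12), ("D", 15)], E)
print(len(S), S[0])
```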

Step 2: Frequent Pattern Discovery methodology


The proposed sequence mining approach discovers the temporal associations between pairs of

event types in a given event sequence using statistical measures to calculate the frequency of

occurrence of these associations within the sequence.

The input is an event sequence S = (e₁, t₁), (e₂, t₂), …, (eₙ, tₙ), a user-defined minimum frequency threshold min_freq, and a maximum temporal difference boundary _Tmax. Let E be the finite set of event types contained in the sequence, for example S = (B, 4), (D, 6), (C, 6), (A, 8), (B, 12), (D, 15). The proposed approach discovers frequent patterns that belong to the pattern class X → Y [_TS, _TE]. In this pattern, X and Y are event types that belong to E, and [_TS, _TE] is the relative time frame within which event Y happens after event X with a frequency ≥ min_freq. _TS and _TE are positive integers with _TS ≤ _TE ≤ _Tmax. The frequency f of a pattern is the number of distinct occurrences of X for which the pattern occurs, divided by the total number of occurrences of X in the sequence. If the pattern occurs several times for the same occurrence of X, it is counted only once.
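
The frequency definition above can be checked on the example sequence with a short sketch. The exhaustive search over all pairs and time frames is a brute-force stand-in chosen for clarity; it is an assumption of this example and should not be read as the proposed novel algorithm. The names `pattern_frequency` and `frequent_patterns` are hypothetical.

```python
from itertools import product

def pattern_frequency(seq, x, y, ts, te):
    """Frequency of X -> Y [ts, te]: the share of X occurrences that are
    followed by at least one Y whose time offset lies within [ts, te]."""
    x_times = [t for e, t in seq if e == x]
    y_times = [t for e, t in seq if e == y]
    if not x_times:
        return 0.0
    # Each occurrence of X counts at most once, however many Y events match it.
    hits = sum(1 for tx in x_times if any(ts <= ty - tx <= te for ty in y_times))
    return hits / len(x_times)

def frequent_patterns(seq, min_freq, t_max):
    """Enumerate every pattern with 1 <= ts <= te <= t_max and frequency >= min_freq."""
    types = sorted({e for e, _ in seq})
    found = []
    for x, y in product(types, repeat=2):
        for ts in range(1, t_max + 1):
            for te in range(ts, t_max + 1):
                f = pattern_frequency(seq, x, y, ts, te)
                if f >= min_freq:
                    found.append((x, y, ts, te, f))
    return found

S = [("B", 4), ("D", 6), ("C", 6), ("A", 8), ("B", 12), ("D", 15)]
print(pattern_frequency(S, "B", "D", 2, 3))  # 1.0: D follows both B events (offsets 2 and 3)
print(pattern_frequency(S, "B", "D", 2, 2))  # 0.5: only the B at time 4 has a D at offset 2
```

In the first call both occurrences of B (at times 4 and 12) are followed by a D within the frame [2, 3], so the frequency is 2/2. Narrowing the frame to [2, 2] excludes the pair (B, 12), (D, 15), which gives 1/2.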

Step 3: Implementation of prototype

The main components of the prototype are the update component, the ETL component and the evaluation component.

a) The update component consists of a set of agents that communicate with the web sources to retrieve commercial transaction data and store it locally. As the company has branches all over the world, the data format differs between sources; hence a dedicated agent is used to check the data for consistency, to schedule the daily automatic update, and to clean and prepare the data before it is stored in the research database of the system.

b) The ETL (Extract-Transform-Load) component consists of a set of tools that prepare the data before a data mining algorithm uses it. Each tool depends on a specific algorithm, hence each algorithm needs its own tool, and the proposed temporal data mining algorithm is no exception: the sequence mining algorithm (the data mining engine component) needs a specific type of cleaned and prepared input in order to run.

The ETL component prepares a table in the database with just two fields, Event description and Event timestamp, which record the type of each event and the time at which it occurred. The results produced by the algorithm have the form: LHS Event Type → RHS Event Type [_TS, _TE], Frequency.

c) The evaluation component takes the set of rules produced by the sequence mining algorithm, searches the stored rules for a pattern with the same LHS and RHS, and decides whether or not to update the rule that is already stored. The three parameters that are checked are _TS, _TE and the frequency value. The patterns produced by the evaluation component are stored. The pattern warehouse already includes some standard event types: the price of a service, the high/low demand fluctuation and the high/low usage fluctuation. Each pattern consists of many events, and every event can be related to a business transaction.

[Figure: pattern warehouse schema. Labels recoverable from the diagram: Product, Event, Price, Service, Event Hierarchy, Pattern, Fact Table, Volume, Service Pattern, Event Pattern, Demand, Time Granularity (Year, Quarter, Month, Week, Day); field labels include Product ID, Product Description, Event ID, Event Description, Event Level, Event Condition, Price ID, Service ID, Service Description, Pattern ID, Pattern Description, Volume ID, Demand ID, Time ID, High value, Low value.]
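
The two-field event table prepared by the ETL component and the update decision made by the evaluation component can be sketched as follows. This is only an illustration under stated assumptions: the raw record keys `type` and `time`, the table and column names, the in-memory SQLite database, and the rule "update whenever _TS, _TE or the frequency differs" are all hypothetical, since the proposal does not specify them.

```python
import sqlite3

def load_events(conn, raw_rows):
    """ETL sketch: reduce raw records to (description, timestamp) rows,
    drop malformed ones, and store the rest ordered by time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_description TEXT NOT NULL, event_timestamp INTEGER NOT NULL)"
    )
    cleaned = []
    for row in raw_rows:
        desc = str(row.get("type", "")).strip().upper()
        ts = row.get("time")
        if desc and isinstance(ts, int) and ts >= 0:
            cleaned.append((desc, ts))
    cleaned.sort(key=lambda r: r[1])  # stable sort keeps source order for ties
    conn.executemany("INSERT INTO events VALUES (?, ?)", cleaned)
    return len(cleaned)

def evaluate_rule(store, rule):
    """Evaluation sketch: find the stored pattern with the same LHS and RHS and
    update it only if ts, te or the frequency changed. Returns True on a write."""
    lhs, rhs, ts, te, freq = rule
    key, new_values = (lhs, rhs), (ts, te, freq)
    if store.get(key) == new_values:
        return False  # identical rule already stored, nothing to do
    store[key] = new_values
    return True

conn = sqlite3.connect(":memory:")
raw = [{"type": " b ", "time": 4}, {"type": "d", "time": 6}, {"type": "", "time": 7}]
print(load_events(conn, raw))  # 2: the row with an empty type is dropped

store = {}
print(evaluate_rule(store, ("B", "D", 2, 3, 1.0)))  # True: new pattern stored
print(evaluate_rule(store, ("B", "D", 2, 3, 1.0)))  # False: unchanged
```

Keying the store on the (LHS, RHS) pair mirrors the search for equal patterns described above; a production version would presumably keep the patterns in the warehouse tables rather than in a dictionary.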
