DM Te

Temporal data mining:
Temporal data mining “is a process of Knowledge Discovery in Temporal Databases that
enumerates structures (temporal patterns or models) over the temporal data”. Examples of
temporal patterns or models (trends) in the data are event associations over time,
similarity-based time series retrieval, time series indexing and segmentation. In the telecom
market domain, temporal data mining helps to identify temporal patterns such as the fluctuation
of bandwidth prices, the range of variation in bandwidth demand, the range of variation in
bandwidth usage, the periods of demand, and the statistical variation of service usage, app
usage and device usage.
Time Series
A sequence of continuous real-valued elements, such as service prices, product prices or
bandwidth prices, is known as a time series. The time series data is represented in a reduced
form, for example through a piecewise linear representation, Discrete Fourier Transforms or
Discrete Wavelet Transforms. The Euclidean distance between these representations is then used
as a predefined, domain-specific measure of similarity: comparing time series with this measure
reveals which of them have similar shapes, and so identifies trends (patterns and models).
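The comparison described above can be illustrated with a minimal sketch. It assumes equal-length series, a naive Discrete Fourier Transform, and a truncation to the first k coefficients; the function names, the value of k and the price values are hypothetical and are not taken from the proposed tool.

```python
import cmath
import math

def dft_prefix(series, k):
    """First k coefficients of the unitary DFT of a real-valued series (naive O(n*k))."""
    n = len(series)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x * cmath.exp(-2j * cmath.pi * f * t / n)
                    for t, x in enumerate(series))
        for f in range(k)
    ]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (real or complex)."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

def dft_distance(s1, s2, k=4):
    """Distance between two equal-length series in the reduced DFT space."""
    return euclidean(dft_prefix(s1, k), dft_prefix(s2, k))

# Made-up price series, for illustration only.
rising = [10, 12, 11, 13, 12, 14, 13, 15]
rising_shifted = [x + 0.5 for x in rising]
falling = list(reversed(rising))

print(dft_distance(rising, rising_shifted))  # small: same shape, shifted level
print(dft_distance(rising, falling))         # larger: opposite trend
```

Because the transform is scaled to be unitary, the distance on the first k coefficients never exceeds the Euclidean distance on the raw series, which is why such reduced representations are commonly used for similarity search.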
Sequence Mining
The goal of this research study in mining sequential patterns from the commercial databases of
the telecom company is to predict customer buying behavior. The proposed tool will use a
novel sequence mining algorithm that extracts statistically significant patterns
(similarity measures) of the form A → B[t], where an event of type B follows an event of type A
within time t, from event sequences, with the goal of finding the optimal t for each of the
extracted patterns.
Proposed Sequence Mining Approach
Step 1: Define the Event Sequence
Let R = { A1, …, Am } be a set of event attributes with respective domains DA1, …, DAm. An
event e over R is an ( m + 1 )-tuple ( a1, …, am, t ), where ai ∈ DAi and t ∈ ℕ is the
occurrence time of e. An event sequence S is a collection of events over R, ordered in linear
time.
Each event has a type and a time of occurrence. Therefore, given a class E of basic event types,
an event is a pair ( e, t ), where e ∈ E and t ∈ ℕ. An event sequence S is an ordered sequence of
events, represented as S = ( e1, t1 ), ( e2, t2 ), …, ( en, tn ), where ei ∈ E and ti ≤ ti+1 for
all i ∈ { 1, …, n - 1 }. For the event types A, B, C and D, an example of such an event sequence
is S = ( B, 4 ), ( D, 6 ), ( C, 6 ), ( A, 8 ), ( B, 12 ), ( D, 15 ).
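As an illustration, the event sequence defined in Step 1 can be represented as a list of (type, time) pairs. The sketch below assumes the simple case in which each event carries only a type and an integer occurrence time; the names Event and is_event_sequence are hypothetical.

```python
from typing import List, NamedTuple

class Event(NamedTuple):
    etype: str  # event type, an element of E
    time: int   # occurrence time, a natural number

def is_event_sequence(events: List[Event]) -> bool:
    """Check that times are natural numbers and non-decreasing (t_i <= t_{i+1})."""
    times_ok = all(e.time >= 0 for e in events)
    ordered = all(a.time <= b.time for a, b in zip(events, events[1:]))
    return times_ok and ordered

# The example sequence from the text.
S = [Event("B", 4), Event("D", 6), Event("C", 6),
     Event("A", 8), Event("B", 12), Event("D", 15)]

print(is_event_sequence(S))
```

Two events may share a timestamp, as ( D, 6 ) and ( C, 6 ) do, because the ordering condition is non-strict.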
Step 2: Frequent Pattern Discovery methodology

The proposed sequence mining approach discovers the temporal associations between pairs of
event types in a given event sequence, using statistical measures to calculate the frequency of
occurrence of these associations within the sequence.
The inputs are an event sequence S = ( e1, t1 ), ( e2, t2 ), …, ( en, tn ), a user-defined
minimum frequency threshold min_freq, and a maximum temporal difference boundary ΔTmax. Let E
be the finite set of event types contained in the event sequence; for the example sequence
S = ( B, 4 ), ( D, 6 ), ( C, 6 ), ( A, 8 ), ( B, 12 ), ( D, 15 ), E = { A, B, C, D }. The proposed
approach discovers frequent patterns that belong to the pattern class X → Y [ΔTS, ΔTE]. Here
X and Y are event types that belong to E, and [ΔTS, ΔTE] is the relative time frame within
which event Y happens after event X with a frequency ≥ min_freq. ΔTS and ΔTE are positive
integers with ΔTS ≤ ΔTE ≤ ΔTmax. The frequency f of a pattern is calculated as the number of
distinct occurrences of X for which the pattern occurs, divided by the total number of times X
occurs in the sequence. If the pattern occurs several times for the same occurrence of X, it is
counted only once. In the example sequence, the pattern B → D [2, 3] occurs for both
occurrences of B (D follows at distances 2 and 3), so f = 2/2 = 1.
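The frequency definition above can be made concrete with a short sketch. It is a brute-force enumeration written for illustration only and is not the novel algorithm proposed in this study; the function names and the parameter names ts, te and t_max (standing for the lower bound, the upper bound and the maximum temporal difference) are assumptions.

```python
from itertools import product

def pattern_frequency(seq, x, y, ts, te):
    """Fraction of occurrences of x that are followed by some y
    at a time distance between ts and te (inclusive)."""
    x_times = [t for e, t in seq if e == x]
    y_times = [t for e, t in seq if e == y]
    if not x_times:
        return 0.0
    # Each occurrence of x counts at most once, however many y match it.
    hits = sum(1 for tx in x_times
               if any(ts <= ty - tx <= te for ty in y_times))
    return hits / len(x_times)

def mine_patterns(seq, min_freq, t_max):
    """Enumerate every (x, y, ts, te) with 1 <= ts <= te <= t_max
    whose frequency is at least min_freq."""
    types = sorted({e for e, _ in seq})
    found = []
    for x, y in product(types, repeat=2):
        for ts in range(1, t_max + 1):
            for te in range(ts, t_max + 1):
                f = pattern_frequency(seq, x, y, ts, te)
                if f >= min_freq:
                    found.append((x, y, ts, te, f))
    return found

# The example sequence from the text.
S = [("B", 4), ("D", 6), ("C", 6), ("A", 8), ("B", 12), ("D", 15)]

print(pattern_frequency(S, "B", "D", 2, 3))  # both occurrences of B match
print(pattern_frequency(S, "B", "D", 2, 2))  # only the B at time 4 matches
for pattern in mine_patterns(S, min_freq=1.0, t_max=4):
    print(pattern)
```

The enumeration tries every pair of event types and every admissible time frame, which is only practical for small inputs; it serves to show what counts as a frequent pattern rather than how to find one efficiently.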
Step 3: Implementation of prototype
The main components of the prototype are the update component, the ETL component and the
evaluation component.
a) The update component consists of a set of agents that communicate with the web sources to
retrieve commercial transaction data and store it locally. Because the company has branches all
over the world, the data format is not the same for all sources. A dedicated agent is therefore
used to check the data for consistency, to schedule the daily automatic update, and to clean and
prepare the data before it is stored in the research database of the system.
b) The ETL (Extract-Transform-Load) component consists of a set of tools that prepare the data
before a data mining algorithm uses it. Each tool depends on a specific algorithm, hence each
algorithm needs its own tool, and the proposed temporal data mining algorithm also uses one.
The sequence mining algorithm (the data mining engine component) needs a specific type of
cleaned and prepared input in order to run.
The ETL component prepares a table in the database with just two fields, Event description and
Event timestamp, which describe the event type and the time at which it occurred. The results
produced by the algorithm have the form: LHS Event Type → RHS Event Type [ΔTS, ΔTE],
Frequency.
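A minimal sketch of the two-field table prepared by the ETL component is given below, using an in-memory SQLite database. The table name, the column names, the integer timestamp and the cleaning rules (trimming, upper-casing, dropping empty descriptions) are assumptions made for illustration; the text does not specify them.

```python
import sqlite3

def load_events(conn, raw_rows):
    """Clean raw (description, timestamp) rows and load them into the events table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_description TEXT NOT NULL, "
        "event_timestamp INTEGER NOT NULL)"
    )
    # Hypothetical cleaning: normalise the description, coerce the timestamp,
    # and skip rows with an empty description.
    cleaned = [
        (desc.strip().upper(), int(ts))
        for desc, ts in raw_rows
        if desc and desc.strip()
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?)", cleaned)
    conn.commit()

def read_event_sequence(conn):
    """Return the events as (description, timestamp) pairs ordered by time."""
    cur = conn.execute(
        "SELECT event_description, event_timestamp "
        "FROM events ORDER BY event_timestamp, rowid"
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
load_events(conn, [(" b ", "4"), ("A", 8), ("", 5), ("d", 6)])
print(read_event_sequence(conn))
```

The ordered result has the same shape as the event sequence defined in Step 1, so it can be passed directly to a sequence mining routine.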
c) The evaluation component takes the set of rules produced by the sequence mining algorithm
and, for each produced rule (pattern), searches for a stored pattern with the same LHS and RHS
and decides whether or not to update the rule that is already stored. The three parameters that
are checked are ΔTS, ΔTE and the frequency value. The patterns produced by the evaluation
component are stored. The pattern warehouse already includes some standard event types: the
price of a service, the high/low demand fluctuation and the high/low usage fluctuation. Each
pattern consists of many events, and every event can be related to a business transaction.
[Figure: pattern warehouse schema. Entities legible in the diagram: Fact Table, Product,
Service, Price, Volume, Demand, Pattern, Service Pattern, Event, Event Pattern, Event Hierarchy
and Time Granularity (Year, Quarter, Month, Week, Day).]
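The check performed by the evaluation component can be sketched as follows. The text does not state the rule that decides when a stored pattern is updated, so this sketch assumes a simple policy: a stored rule is replaced whenever any of the three checked parameters differs. The names Rule and evaluate, and the use of one stored rule per (LHS, RHS) pair, are likewise assumptions.

```python
from typing import Dict, NamedTuple, Tuple

class Rule(NamedTuple):
    lhs: str          # LHS event type
    rhs: str          # RHS event type
    ts: int           # lower bound of the time frame
    te: int           # upper bound of the time frame
    frequency: float

def evaluate(store: Dict[Tuple[str, str], Rule], new_rule: Rule) -> str:
    """Compare new_rule with the stored rule that has the same LHS and RHS.

    Assumed policy: insert if absent, replace if any of the three
    checked parameters differs, otherwise leave the store untouched.
    """
    key = (new_rule.lhs, new_rule.rhs)
    old = store.get(key)
    if old is None:
        store[key] = new_rule
        return "inserted"
    if (old.ts, old.te, old.frequency) != (new_rule.ts, new_rule.te, new_rule.frequency):
        store[key] = new_rule
        return "updated"
    return "unchanged"

warehouse: Dict[Tuple[str, str], Rule] = {}
print(evaluate(warehouse, Rule("B", "D", 2, 3, 1.0)))
print(evaluate(warehouse, Rule("B", "D", 2, 3, 1.0)))
print(evaluate(warehouse, Rule("B", "D", 2, 4, 1.0)))
```

A production policy would likely be more selective, for example by updating only when the frequency improves, but that choice depends on requirements the text leaves open.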
