Temporal data mining “is a process of Knowledge Discovery in Temporal Databases that
enumerates structures (temporal patterns or models) over the temporal data”. Examples of
temporal patterns or models (trends) in the data are event associations over time,
similarity-based time series retrieval, and time series indexing and segmentation. In the telecom market
domain, temporal data mining helps to identify temporal patterns such as the fluctuation of
bandwidth prices, the range of variation in bandwidth demand, the range of variation in bandwidth
usage, the period of demand time, statistical variation of service usage, statistical variation of
Time Series
A sequence of continuous real-valued elements, such as service prices, product prices or bandwidth
prices, is known as a time series. This time series data is represented by piecewise linear
Then the Euclidean distance is used to identify trends (patterns and models) by comparing
time series and measuring the similarity of their shapes,
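As a minimal sketch of the comparison step, the Euclidean distance between two equal-length series can be computed as below. The function name and the sample values are illustrative assumptions, and the sketch compares raw samples directly; it does not reproduce the piecewise linear representation mentioned above.

```python
import math


def euclidean_distance(series_a, series_b):
    """Euclidean distance between two equal-length time series."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    # Square root of the sum of squared point-wise differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(series_a, series_b)))


# Hypothetical bandwidth price series (illustrative values, not from the study).
reference = [10.0, 12.0, 11.0, 13.0]
same_shape = [10.0, 12.0, 11.0, 13.0]
other_shape = [13.0, 8.0, 15.0, 9.0]

# A smaller distance means the two series have more similar shapes.
closest = min([same_shape, other_shape],
              key=lambda s: euclidean_distance(reference, s))
```

A distance of zero means the two series are identical; ranking candidate series by distance to a reference gives a simple similarity-based retrieval.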
Sequence Mining
The goal of mining sequential patterns from the commercial databases of the telecom company
in this research study is to predict customer buying behavior. The proposed tool will use a
novel sequence mining algorithm that extracts statistically significant patterns
(similarity measures) of the form A → B [t], where an event of type B follows an event of type A
within time t, from event sequences, with the goal of finding the optimal t for each of the extracted
patterns.
An event sequence S is a collection of events over R (linear time). In a given set R = { A1, …,
1 )-tuple ( a1, …, am, t ), where ai∈ DAi and t ∈ ℵ, being the occurrence time of e.
Each event has a type and a time of occurrence. Therefore, given a class E of basic event types, an
event is a pair (e, t), where e ∈ E and t ∈ ℵ. An event sequence S is an ordered sequence of
events, represented as S = ( e1, t1 ), ( e2, t2 ), …, ( en, tn ), where ei ∈ E and ti ≤ ti+1 for all i
∈ {1,…, n – 1 }. Taking into account the various event types A, B, C and D an example of such
event types in a given event sequence using statistical measures to calculate the frequency of
frequency threshold min_freq, and a maximum temporal difference boundary defined as _Tmax.
In a finite set named E consisting of different event types that are contained in the event
approach discovers frequent patterns that belong to the pattern class X → Y [_TS, _TE]. In this
notation, X and Y are event types that belong to E, and [_TS, _TE] is the relative time frame
within which event Y happens after event X with a frequency ≥ min_freq. _TS and _TE are positive
integers with _TS ≤ _TE ≤ _Tmax. The frequency f of a pattern is the number of distinct
occurrences of X that are followed by a Y within the time frame, divided by the total number of
times X occurs in the sequence. Multiple occurrences of the pattern that share the same occurrence
of X are counted once.
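The frequency definition above can be illustrated with a short sketch. The function name, the argument names `t_s` and `t_e` (standing for _TS and _TE) and the sample sequence are assumptions made for illustration; this is not the study's mining algorithm, only the frequency measure for one candidate pattern.

```python
def pattern_frequency(sequence, x, y, t_s, t_e):
    """Frequency of the pattern X -> Y [t_s, t_e] in an event sequence.

    sequence: list of (event_type, time) pairs.
    Each occurrence of X counts at most once, however many Y events
    fall inside its time frame.
    """
    x_times = [t for e, t in sequence if e == x]
    y_times = [t for e, t in sequence if e == y]
    if not x_times:
        return 0.0
    matched = sum(
        1 for tx in x_times
        if any(t_s <= ty - tx <= t_e for ty in y_times)
    )
    return matched / len(x_times)


# Hypothetical event sequence with event types A, B and C.
events = [("A", 1), ("B", 3), ("A", 4), ("B", 5), ("B", 6), ("A", 10), ("C", 11)]

min_freq = 0.5  # illustrative threshold
freq = pattern_frequency(events, "A", "B", t_s=1, t_e=2)
is_frequent = freq >= min_freq
```

In this sequence the A at time 4 is followed by two B events inside the frame, but it contributes only once, so two of the three A occurrences match.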
The main components of the prototype are the update component, the ETL component and the evaluation component.
a) The update component consists of a set of agents that communicate with the web sources to
retrieve commercial transaction data and store it locally. As the company has branches all
over the world, the data format differs between sources; hence a dedicated agent is used
to check the data for consistency, to schedule the daily automatic update, and to clean and prepare
the data before it is stored in the research database of the system.
b) The ETL (Extract-Transform-Load) component consists of a set of tools that prepare
the data before a data mining algorithm uses it. Each tool depends on a specific algorithm,
hence each algorithm needs its own tool. The proposed temporal data mining algorithm also uses
a tool. The sequence mining algorithm (data mining engine component) needs a specific type of
The ETL component prepares a table in the database with just two fields, Event description and
Event timestamp, which record the event type and the time at which it occurred. The results
produced by the algorithm have the form: LHS Event Type → RHS Event Type [_TS, _TE],
Frequency.
c) The evaluation component takes the set of rules produced by the sequence mining algorithm,
searches the stored rules for one with the same pattern (LHS and RHS) as each produced rule,
and decides whether or not to update the rule that is already stored. The three parameters that are
checked are _TS, _TE and the frequency value. The patterns produced by the evaluation
component are stored. The pattern warehouse already includes some standard event types: the
price of a service, the high/low demand fluctuation and the high/low usage fluctuation. Each
pattern consists of many events and every event can be related with a business transaction.
[Figure: pattern warehouse schema. Entities shown: Product, Service, Price, Volume, Demand, Event, Event Hierarchy, Pattern, Event Pattern, Service Pattern, Fact Table and Time Granularity (Year, Quarter, Month, Week, Day).]
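The evaluation step described in c) could be sketched as follows. The `Rule` class mirrors the result form (LHS, RHS, _TS, _TE, frequency). The function name, the dictionary used as a pattern store and the update policy are assumptions: the text does not say when a stored rule is replaced, so this sketch replaces it whenever any of the three checked parameters differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str          # LHS event type
    rhs: str          # RHS event type
    t_s: int          # start of the relative time frame (_TS)
    t_e: int          # end of the relative time frame (_TE)
    frequency: float


def evaluate(store, new_rule):
    """Insert or update a rule in the store; return True if the store changed.

    Rules are matched on (LHS, RHS). Assumed policy: a stored rule is
    replaced when its time frame or frequency differs from the new rule.
    """
    key = (new_rule.lhs, new_rule.rhs)
    old = store.get(key)
    if old is not None and (
        (old.t_s, old.t_e, old.frequency)
        == (new_rule.t_s, new_rule.t_e, new_rule.frequency)
    ):
        return False  # identical rule already stored, nothing to do
    store[key] = new_rule
    return True


# Hypothetical rules with illustrative event type names.
store = {}
first = evaluate(store, Rule("price_drop", "high_demand", 1, 2, 0.5))
repeat = evaluate(store, Rule("price_drop", "high_demand", 1, 2, 0.5))
changed = evaluate(store, Rule("price_drop", "high_demand", 1, 3, 0.5))
```

Keying the store on the (LHS, RHS) pair keeps one rule per pattern, so a later mining run refines an existing entry instead of duplicating it.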