Weekly Meeting

30th August 2022


Isura Manchanayaka
Doctor of Philosophy – Engineering and IT [056957F]
Order Relaxed Accuracy
• For each event (𝑡, 𝑒) at the 𝑖-th position of the true sequence of events,
• Check whether any event of class 𝑒 appears in the predicted sequence at position 𝑖 or in the next 𝑟 positions, where 𝑟 is an arbitrarily chosen window size called the relaxation
• If so, the event is counted as a correct prediction; otherwise it is not counted
• Relaxed accuracy is calculated as $\text{Relaxed accuracy} = \frac{\text{number of correct predictions}}{\text{total number of events}}$
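The procedure above can be sketched in Python as follows. This is a minimal sketch, not the code behind these results. It assumes the true and predicted sequences are position-aligned lists of event classes (timestamps dropped, since only order matters) and that the window for position 𝑖 covers predicted positions 𝑖 to 𝑖 + 𝑟, so that 𝑟 = 0 gives regular sequential accuracy. The name `order_relaxed_accuracy` is hypothetical.

```python
from typing import Hashable, Sequence


def order_relaxed_accuracy(true_marks: Sequence[Hashable],
                           pred_marks: Sequence[Hashable],
                           r: int) -> float:
    """Fraction of true events whose class appears in the predicted
    sequence at the same position or within the next r positions.

    Assumption: both sequences are aligned by position and timestamps
    have already been dropped (ORA only uses the order of events).
    """
    if r < 0:
        raise ValueError("relaxation r must be non-negative")
    if not true_marks:
        return 0.0
    correct = 0
    for i, e in enumerate(true_marks):
        # Predicted classes at positions i, i+1, ..., i+r
        window = pred_marks[i:i + r + 1]
        if e in window:
            correct += 1
    return correct / len(true_marks)


true_seq = ["a", "b", "c", "a"]
pred_seq = ["b", "a", "c", "c"]
for r in (0, 1, 2):
    print(r, order_relaxed_accuracy(true_seq, pred_seq, r))
# 0 0.25
# 1 0.5
# 2 0.5
```

With 𝑟 = 0 the window holds only the prediction at position 𝑖, which is exactly plain accuracy; larger 𝑟 can only add correct predictions, so the metric is non-decreasing in 𝑟.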
Questions
• How does the order relaxed accuracy perform on noisy synthetic
data?
• How does the order relaxed accuracy perform on random data?
• How does the order relaxed accuracy perform on real data?
Synthetic Data – n=10 (four figure slides; the plots are not recoverable from the text)
Moving to large datasets
• IRA dataset
• 8.76 million tweets
• 3613 users
Coordinated Activity
Extracting an Active Time Window
• Starting from 2014-08-14
• Ending in 2014-08-28
• 885 users
• 264k tweets
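A hypothetical sketch of the window extraction. It assumes tweets are available as (user, timestamp) pairs and that the end date 2014-08-28 is inclusive, which the slide does not state. The function name `extract_window` and the input layout are illustrative assumptions; the events are returned in time order because a temporal point process model such as RMTPP consumes them as a time-ordered sequence.

```python
from datetime import datetime, timedelta
from typing import Hashable, List, Set, Tuple

Tweet = Tuple[Hashable, datetime]  # (user, timestamp), assumed layout


def extract_window(tweets: List[Tweet],
                   start: str = "2014-08-14",
                   end: str = "2014-08-28") -> Tuple[List[Tweet], Set[Hashable]]:
    """Keep tweets in [start, end], sorted by time, plus the set of active users."""
    lo = datetime.fromisoformat(start)
    hi = datetime.fromisoformat(end) + timedelta(days=1)  # assumption: end day inclusive
    window = sorted((tw for tw in tweets if lo <= tw[1] < hi), key=lambda tw: tw[1])
    users = {user for user, _ in window}
    return window, users


tweets = [("a", datetime(2014, 8, 13, 23)), ("c", datetime(2014, 8, 28, 12)),
          ("b", datetime(2014, 8, 14)), ("a", datetime(2014, 8, 29))]
window, users = extract_window(tweets)
print([u for u, _ in window])  # ['b', 'c']
```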
Performing RMTPP on the dataset
• Time error: 16.503 s
• Accuracy: 1.16%
Pruning Retweet Network
• A retweet network is built using the data
• Nodes – Users
• Edges – weighted by the number of retweets between two users; the edges are undirected
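• The nodes are iteratively pruned based on the sum of the weights of the edges incident to each node
• If $\sum_{v \in V} w_{\{u,v\}} \le \theta$, then 𝑢 is pruned from the graph, where $V$ is the set of remaining nodes, $w_{\{u,v\}}$ is the number of retweets between $u$ and $v$ (0 if they are not connected) and $\theta$ is the pruning threshold
• This is repeated until no node is pruned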
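The pruning rule can be sketched in plain Python as below. Assumptions not taken from the slides: the input is a list of (retweeter, original author) pairs, self-retweets are dropped, and all nodes at or below the threshold are removed together in each round. Because removing nodes can only lower the remaining nodes' weight sums, batch removal ends in the same graph as removing nodes one at a time. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Graph = Dict[Hashable, Dict[Hashable, int]]


def build_retweet_graph(retweets: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected graph whose edge weight is the number of retweets between two users."""
    adj = defaultdict(lambda: defaultdict(int))
    for retweeter, author in retweets:
        if retweeter == author:
            continue  # assumption: self-retweets are ignored
        # Store the single undirected edge weight symmetrically
        adj[retweeter][author] += 1
        adj[author][retweeter] += 1
    return {u: dict(nbrs) for u, nbrs in adj.items()}


def prune_retweet_network(graph: Graph, threshold: float) -> Graph:
    """Repeatedly drop every node whose summed edge weight is <= threshold."""
    g = {u: dict(nbrs) for u, nbrs in graph.items()}
    while True:
        doomed = {u for u, nbrs in g.items() if sum(nbrs.values()) <= threshold}
        if not doomed:
            return g
        # Remove the pruned nodes and every edge that touches them
        g = {u: {v: w for v, w in nbrs.items() if v not in doomed}
             for u, nbrs in g.items() if u not in doomed}


retweets = [("A", "B")] * 5 + [("B", "C")] * 3 + [("C", "D")] + [("A", "C")] * 2
pruned = prune_retweet_network(build_retweet_graph(retweets), threshold=2)
print(sorted(pruned))  # ['A', 'B', 'C']
```

With threshold 2 only D (weight 1) is removed; with threshold 5 the removals cascade (D, then C, then A and B) and the graph becomes empty.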
Results
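Relaxation | IRA Data – 885 users | IRA Data (Pruned with threshold 400) – 57 users | Randomly Generated Data – 57 users
---------- | -------------------- | ----------------------------------------------- | ----------------------------------
0          | 1.16%                | 2.59%                                           | 1.79%
5          | 1.64%                | 3.07%                                           | 1.80%
10         | 1.82%                | 3.31%                                           | 1.80%
100        | 2.18%                | 4.27%                                           | 1.80%
1000       | 2.38%                | 4.69%                                           | 1.80%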
Order Relaxed Accuracy
• For each user in the true sequence of events, the order relaxed accuracy per user is the number of that user's events that are correctly predicted, divided by the number of times that user appears in the true sequence of events
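A minimal sketch of per-user ORA, assuming the event classes are user IDs, the true and predicted sequences are position-aligned lists, and the window for position 𝑖 covers predicted positions 𝑖 to 𝑖 + 𝑟. The name `per_user_ora` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_user_ora(true_users: Sequence[Hashable],
                 pred_users: Sequence[Hashable],
                 r: int) -> Dict[Hashable, float]:
    """Per-user ORA: correct predictions for a user / occurrences of that user."""
    occurrences = Counter(true_users)
    hits = Counter()
    for i, u in enumerate(true_users):
        # Correct if u appears at predicted position i or within the next r positions
        if u in pred_users[i:i + r + 1]:
            hits[u] += 1
    # Counter returns 0 for users that were never hit
    return {u: hits[u] / n for u, n in occurrences.items()}


print(per_user_ora(["a", "b", "a", "c"], ["a", "a", "c", "c"], r=0))
# {'a': 0.5, 'b': 0.0, 'c': 1.0}
```

Slicing makes each check cost O(𝑟), which is fine for a sketch; for 𝑟 = 1000 over hundreds of thousands of events, a sliding-window counter of predicted users would be faster.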
Results – Number of users with individual relaxed accuracy above a given lower bound (IRA Data – Without Pruning)

Accuracy Lower Bound | Relaxation = 0 | Relaxation = 10 | Relaxation = 100 | Relaxation = 1000
-------------------- | -------------- | --------------- | ---------------- | -----------------
80%                  | 1/885 (0.11%)  | 1/885 (0.11%)   | 2/885 (0.23%)    | 5/885 (0.56%)
50%                  | 1/885 (0.11%)  | 4/885 (0.45%)   | 5/885 (0.56%)    | 5/885 (0.56%)
20%                  | 3/885 (0.34%)  | 4/885 (0.45%)   | 5/885 (0.56%)    | 5/885 (0.56%)
10%                  | 4/885 (0.45%)  | 5/885 (0.56%)   | 5/885 (0.56%)    | 5/885 (0.56%)
5%                   | 4/885 (0.45%)  | 5/885 (0.56%)   | 5/885 (0.56%)    | 6/885 (0.68%)
0%                   | 5/885 (0.56%)  | 5/885 (0.56%)   | 5/885 (0.56%)    | 8/885 (0.90%)
Results – Number of users with individual relaxed accuracy above a given lower bound (IRA Data – Pruned with threshold 400)
Accuracy Lower Bound | Relaxation = 0 | Relaxation = 10 | Relaxation = 100 | Relaxation = 1000
-------------------- | -------------- | --------------- | ---------------- | -----------------
80%                  | 0/57 (0%)      | 0/57 (0%)       | 2/57 (3.51%)     | 2/57 (3.51%)
50%                  | 1/57 (1.75%)   | 2/57 (3.51%)    | 2/57 (3.51%)     | 2/57 (3.51%)
20%                  | 2/57 (3.51%)   | 2/57 (3.51%)    | 2/57 (3.51%)     | 2/57 (3.51%)
10%                  | 2/57 (3.51%)   | 2/57 (3.51%)    | 2/57 (3.51%)     | 2/57 (3.51%)
5%                   | 2/57 (3.51%)   | 2/57 (3.51%)    | 2/57 (3.51%)     | 2/57 (3.51%)
0%                   | 2/57 (3.51%)   | 2/57 (3.51%)    | 2/57 (3.51%)     | 2/57 (3.51%)
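Each table entry has the form k/N (p%). A hypothetical helper that produces such an entry from a dictionary of per-user accuracies is sketched below. It assumes N is the total number of users in the dataset (885 or 57) and that "above" means a strict inequality, so the 0% row counts users with at least one correct prediction. The name `users_above` is illustrative.

```python
from typing import Dict, Hashable


def users_above(per_user: Dict[Hashable, float], bound: float, n_users: int) -> str:
    """Format 'k/N (p%)', where k counts users whose accuracy is strictly above bound."""
    k = sum(1 for acc in per_user.values() if acc > bound)
    return f"{k}/{n_users} ({100 * k / n_users:.2f}%)"


print(users_above({"a": 0.5, "b": 0.0, "c": 1.0}, bound=0.2, n_users=3))
# 2/3 (66.67%)
```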
Analysis
• ORA gives a better metric for predictions on noisy data
• ORA ignores errors in time, since the timestamp is irrelevant and only the order of events matters
• ORA is more natural than Time Relaxed Accuracy, since ORA with 𝑟 = 0 reduces to regular sequential accuracy
• A small fraction of the users dominate the correct predictions, which suggests that these users behave in highly predictable ways
Future Work
• Need to examine the activities of these highly predictable users
• Need to add noisy data to IRA dataset
Thank You
