Professional Documents
Culture Documents
駱宏毅 Hung-Yi Lo
p ( x, y )
MI ( X ; Y ) p ( x, y ) log( )
x, y p( x) p( y )
2007/07/18
3
Mutual Information (MI) (2/2)
For example, say a discrete random variable X represents visibility
at a certain moment in time and random variable Y represents
wind speed at that moment.
The mutual information between X and Y:
p( X good , Y high )
MI ( X ; Y ) p( X good , Y high ) log( )
p( X good ) p(Y high )
p( X bad , Y high )
p( X bad , Y high ) log( )
p( X bad ) p(Y high )
p( X good , Y low)
p( X good , Y low) log( )
p( X good ) p(Y low)
p( X bad , Y low)
p( X bad , Y low) log( )
p( X bad ) p(Y low)
2007/07/18
4
Pointwise Mutual Information (PMI)
PMI measures the mutual dependence of between two
instances or realizations of random variables.
Positive => high correlated
Zeros => no information (independence)
Negative => opposite correlated
p( X x, Y y )
PMI ( X x, Y y ) log( )
p( X x) p(Y y )
2007/07/18
Mutual Information and
5
X=Good X=Bad
2007/07/18
Mutual Information and
6
Y=High Y=Low
2007/07/18
Mutual Information and
7
MI ( X ; Y ) 0.35 * 0.6650
0.1
0.05 * (1.4816)
0.1* (0.9933)
p( X good ) 0.4 p( X bad ) 0.6
0.5 * 0.4155
p(Y high ) 0.45 p(Y low) 0.55
MI ( X ; Y ) 0.2671
2007/07/18
Mutual Information and
8
p ( S s, PT pt )
log( )
p ( S s ) p ( PT pt )
2007/07/18
Pointwise Mutual Information Model for 11
2007/07/18
Pointwise Mutual Information Model for 12
p( PT pt S s)
log( ) log( Node )
p( PT p
knode
t S k)
2007/07/18
Pointwise Mutual Information Model for 13
p( S s PT pt )
log( ) log( PathType )
p(S s PT m)
m path_ type
2007/07/18
14
Mutual Information for MRN (1/2)
Unlike PMI, which models the dependency of two instances
of random variables, mutual information is generally used to
compute the dependency of two random variables.
2007/07/18
15
Mutual Information for MRN (2/2)
p( S t , PT t )
MI ( S ; PT ) p( S t , PT t ) log( )
p( S t ) p( PT t )
p( S f , PT t )
p( S f , PT t ) log( )
p( S f ) p( PT t )
p( S t , PT f )
p( S t , PT f ) log( )
p( S t ) p( PT f )
p( S f , PT f )
p( S f , PT f ) log( )
p( S f ) p( PT f )
2007/07/18
17
Meta Constraints (2/4)
Meta-constraint 1: maximum path length
The farther away a node/link is to the source node, the less impact it
has on the semantics of the source node.
The longer a path is, the harder to make sense of it.
In our experiment we chose the maximum path length somewhere
between four and six.
2007/07/18
18
Meta Constraints (3/4)
Meta-constraint 3: node and link type constraints
One can specify that at least one of the nodes in a path type needs to
be of type person, or that one link in the path needs to be a murder
link.
2007/07/18
19
Meta Constraints (4/4)
Meta-constraint 5: structure constraints
In one of our experiments we ask the system to only consider paths
whose source and target nodes are the same, which we call the “loop
constraint”.
We also define a “uniform link constraint” to consider only paths with
only one single link type such as [A cites B cites C cites D].
2007/07/18
20
Finding Abnormal Nodes in the MRN (1/2)
The semantic profiles of a node: represented by using path
types as features and the dependencies (i.e., contribution or
PMI or MI) of each node with respect to each path type as
feature values.
2007/07/18
22
Outlier Detection (1/3)
Clustering-based outlier detection
CLARANS, BIRCH and CLUE.
Extract outliers as those points that cannot be clustered into any
clusters based on a given clustering method.
2007/07/18
23
Outlier Detection (2/3)
Distance-based outlier detection
Identifies outlier points simply as those that look very different from
their neighbor points.
Distance-based outlier detectors look for outliers from a local point of
view instead of a global one. That is, they do not look for a point that
is significantly different from the rest of the world, but for a point that
is significantly different from its closest points.
2007/07/18
24
Outlier Detection (3/3)
Distance-based outlier detection is useful in a security
domain, since malicious individuals who try to disguise
themselves by playing certain roles but fail to get everything
right are likely to still be different from genuine players of
that role.
2007/07/18
UNICORN: An Unsupervised Abnormal Node 25
Discovery Framework
UNICORN, as an abnormal animal, is the abbreviation for
“UNsupervised Instance disCOvery in multi-Relational
Networks”.
2007/07/18
26
UNICORN Pseudo-Code (1/2)
2007/07/18
27
UNICORN Pseudo-Code (2/2)
2007/07/18
28
Local Node Discovery
Goal: identify a node that is abnormally connected to a given
node s.
We consider only the paths that start from s, and can be done by
simply adding one meta-constraint of type 6 during the feature
value generation stage.
2007/07/18
29
謝謝大家
2007/07/18