Cyberphysical systems:
Distributed Computing, Synchronization and Data Mining
Nick Freris
Cyberphysical Systems Laboratory
New York University Abu Dhabi
https://wp.nyu.edu/cpslab
Cyberphysical systems (CPS)
▪ Large networks of smart devices
CPS: Distributed Computing, Synchronization and Data Mining
▪ The trinity of C’s
Applications
Smart grids
Smart highways
Railway networks
Air traffic
Transportation systems
mHealth
Healthcare
Formation control
Defense
Design Challenges
▪ Scalability
• Millions of nodes
▪ Wireless communication
• Fault tolerance to time-varying topologies and broken links
▪ Security
• Electronic warfare
▪ Big Data
• Mining & Learning
▪ Limited batteries
• Energy-hungry operations such as GPS
▪ Privacy
• Anonymization of data (e.g., medical)
Proposed Solutions
▪ Scalability
• Distributed computing
▪ Wireless communication
• Distributed communication protocols
▪ Big Data
• Data mining on compressed data
▪ Limited batteries
• Clock synchronization
Outline
▪ Distributed computing
• Gossip algorithms
▪ Clock synchronization
• Protocols & limitations
▪ Data mining
• Exact information retrieval from inexact data
Distributed computing
Gossip algorithms
▪ Consensus: distributed asynchronous averaging
• Similarly: min, max, product
[Figure: example network with node values]
▪ Analysis: randomized activation
▪ Implementation: exponential clocks
• Node $i$ ticks $\sim \mathrm{Exp}(\lambda_i)$ $\Leftrightarrow$ node $i$ is activated w.p. $p_i = \lambda_i / \sum_j \lambda_j$
▪ Exponential convergence
▪ Applications: distributed control, optimization, synchronization/localization, wireless protocols
Tsitsiklis et al., Boyd et al., Dimakis et al.
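The asynchronous averaging scheme above can be sketched as a small simulation. This is a minimal illustration under our own assumptions: a hypothetical 5-node ring with made-up initial values and clock rates, where the node that wakes up contacts a uniformly random neighbor. Instead of simulating the exponential clocks explicitly, it samples the active node with probability $\lambda_i / \sum_j \lambda_j$, which is the probability that node $i$'s clock ticks first.

```python
import random

def gossip_average(values, neighbors, rates, ticks=3000, seed=0):
    """Asynchronous pairwise gossip (a sketch). With independent exponential
    clocks of rates lambda_i, the next tick belongs to node i with probability
    lambda_i / sum_j lambda_j, so the active node is sampled directly."""
    rng = random.Random(seed)
    x = list(values)
    nodes = list(range(len(x)))
    for _ in range(ticks):
        i = rng.choices(nodes, weights=rates)[0]   # node whose clock ticks next
        j = rng.choice(neighbors[i])               # random neighbor (assumption)
        x[i] = x[j] = 0.5 * (x[i] + x[j])          # both nodes keep the average
    return x

# Hypothetical 5-node ring with arbitrary initial values and clock rates.
values = [1.0, 3.0, 2.0, 5.0, 4.5]
neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
rates = [1.0, 2.0, 1.0, 0.5, 1.0]
x = gossip_average(values, neighbors, rates)
print(x)  # every entry is close to the initial average
```

Each update preserves $x_i + x_j$, so the network sum (and hence the average) never changes; replacing the average by $\min(x_i, x_j)$ or $\max(x_i, x_j)$ gives min or max consensus in the same way.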
Extension to linear systems
Randomized Kaczmarz
▪ Iterative algorithm for solving $Ax = b$
• Randomized selection of a row $i_k$
• Projection onto the solution space of the selected row, $\{x : A^{(i_k)} x = b_{i_k}\}$
• Alternating projections
▪ Least squares:
• Solving the normal equations $A^T A x = A^T b$ is a bad idea (it squares the condition number)
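A minimal sketch of randomized Kaczmarz for a consistent system. It assumes rows are sampled with probability proportional to their squared norms, a common choice; the slide only says "randomized selection". The matrix and true solution are hypothetical.

```python
import random

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Randomized Kaczmarz for a consistent system Ax = b (a sketch)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    row_norms = [sum(a * a for a in row) for row in A]   # ||A^(i)||^2
    x = [0.0] * n
    for _ in range(iters):
        # Select row i with probability ||A^(i)||^2 / ||A||_F^2
        i = rng.choices(range(m), weights=row_norms)[0]
        residual = b[i] - sum(A[i][k] * x[k] for k in range(n))
        step = residual / row_norms[i]
        # Orthogonal projection of x onto the hyperplane {x : A^(i) x = b_i}
        x = [x[k] + step * A[i][k] for k in range(n)]
    return x

A = [[3.0, 1.0], [1.0, 2.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]   # consistent rhs
x_rk = randomized_kaczmarz(A, b)
print(x_rk)  # approximately [1.0, -2.0]
```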
Optimal de-noising
▪ LS for an inconsistent system:
• Solution: project $b$ onto the range space of $A$ and solve $Ax = b_{R(A)}$ [FZ’13]
• Randomized selection of a column
• Projection onto the orthogonal complement of the selected column
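The column step can be written as follows. This is a sketch in our own notation, assuming $A_{(j)}$ denotes column $j$ of $A$ and columns are sampled with probability proportional to their squared norms:

```latex
z^{(0)} = b, \qquad
z^{(k+1)} = z^{(k)} - \frac{\langle A_{(j_k)}, z^{(k)} \rangle}{\|A_{(j_k)}\|_2^2}\, A_{(j_k)},
\qquad \Pr[j_k = j] = \frac{\|A_{(j)}\|_2^2}{\|A\|_F^2}
```

Each step removes the component of $z$ along one column and leaves $b_{R(A)^\perp}$ untouched, so $z^{(k)} \to b_{R(A)^\perp}$ and $b - z^{(k)} \to b_{R(A)}$, the consistent right-hand side of $Ax = b_{R(A)}$.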
Putting the pieces together
▪ RK and de-noising:
Randomized Kaczmarz
Termination criteria
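Combining the two projections gives Randomized Extended Kaczmarz (REK). The sketch below follows that structure, but two details are our assumptions: the row step uses the freshly updated $z$, and a fixed iteration count replaces the termination criteria. For the small inconsistent test system, the normal equations give $x_{LS} = [1/3, 1/3]$.

```python
import random

def rek(A, b, iters=20000, seed=0):
    """Sketch of Randomized Extended Kaczmarz for min_x ||Ax - b||_2."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    row_w = [sum(v * v for v in A[i]) for i in range(m)]            # ||A^(i)||^2
    col_w = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]  # ||A_(j)||^2
    x, z = [0.0] * n, list(b)
    for _ in range(iters):
        # Column step: strip the component of z along a random column,
        # so z drifts towards the part of b orthogonal to R(A).
        j = rng.choices(range(n), weights=col_w)[0]
        c = sum(A[i][j] * z[i] for i in range(m)) / col_w[j]
        z = [z[i] - c * A[i][j] for i in range(m)]
        # Row step: ordinary Kaczmarz projection for A x = b - z.
        i = rng.choices(range(m), weights=row_w)[0]
        r = b[i] - z[i] - sum(A[i][k] * x[k] for k in range(n))
        x = [x[k] + (r / row_w[i]) * A[i][k] for k in range(n)]
    return x

# Small inconsistent system; its least-squares solution is [1/3, 1/3].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0, 0.0]
x_rek = rek(A, b)
print(x_rek)
```

Plain randomized Kaczmarz on this inconsistent system would keep hopping between the three lines; the column step is what removes the noise component $b_{R(A)^\perp}$ so that the row steps see a consistent system.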
Analysis of REK
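▪ Rate of convergence (ZF’13):
$\mathbb{E}\|x^{(k)} - x_{LS}\|_2^2 \le \left(1 - \frac{1}{\kappa_F^2(A)}\right)^k \left[\|x_{LS}\|_2^2 + c\,\|b_{R(A)^\perp}\|_2^2\, k\right]$, where $\kappa_F(A) = \|A\|_F\,\|A^\dagger\|_2$
• Same exponent, no delay
▪ Expected number of arithmetic operations:
• proportional to
▪ sparsity
▪ squared condition number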
Experiments
▪ Implementation in C
• REK-C
• REK-BLAS (level-1 BLAS routines + Blendenpik)
▪ Comparison
• MATLAB backslash (\)
• LAPACK
▪ DGELSY (QR factorization)
▪ DGELSD (SVD)
▪ LSRN
• Blendenpik
Experiments
[Figure: results for sparse and dense matrices]
RK Gossiping
▪ $A$ is the incidence matrix; solve $Ax = 0$
• RK is distributed
• Average consensus is a special case
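To see why average consensus is a special case, take one edge $e = (i, j)$ and assume the convention that its incidence row is $e_i - e_j$. One RK projection onto $\{x : x_i - x_j = 0\}$ is then the following worked step:

```latex
x \leftarrow x - \frac{x_i - x_j}{\|e_i - e_j\|_2^2}\,(e_i - e_j)
\quad\Longrightarrow\quad
x_i,\ x_j \leftarrow \tfrac{1}{2}(x_i + x_j)
```

Since $\|e_i - e_j\|_2^2 = 2$, only the two endpoints change and both move to their average, which is exactly the pairwise gossip update; only nodes $i$ and $j$ need to communicate.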
RK Gossiping
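▪ $L$ is the Laplacian matrix: $L = A^T A \in \mathbb{R}^{n \times n}$
[Figure: 5-node example graph with edges 1–2, 1–3, 2–3, 2–4, 3–5, 4–5]
$$L = \begin{pmatrix} 2 & -1 & -1 & 0 & 0 \\ -1 & 3 & -1 & -1 & 0 \\ -1 & -1 & 3 & 0 & -1 \\ 0 & -1 & 0 & 2 & -1 \\ 0 & 0 & -1 & -1 & 2 \end{pmatrix}$$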
▪ Laplacian systems
$Lx = b$: solve $A^T y = b$ and $Ax = y$
• Running 2 RKs in parallel
• Same convergence
▪ Extensions
• Online measurements
• Varying topologies
• Accelerated versions
▪ Applications
• Synchronization / localization
• Control / optimization
• Effective resistance computation
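The two-RK idea can be sketched on the 5-node example graph above (nodes renumbered from 0). The sketch interleaves one step of each iteration, which is our reading of "in parallel"; the right-hand side $b$ is hypothetical and must sum to zero so that $Lx = b$ is solvable. It also rebuilds $L = A^T A$ from the edge list as a check.

```python
import random

def incidence_matrix(n, edges):
    """Edge-node incidence matrix: row for edge (i, j) has +1 at i, -1 at j."""
    A = [[0.0] * n for _ in edges]
    for r, (i, j) in enumerate(edges):
        A[r][i], A[r][j] = 1.0, -1.0
    return A

def rk_step(M, rhs, v, rng, w):
    """One randomized Kaczmarz projection for M v = rhs; w[i] = ||M_i||^2."""
    i = rng.choices(range(len(M)), weights=w)[0]
    step = (rhs[i] - sum(a * t for a, t in zip(M[i], v))) / w[i]
    for k, a in enumerate(M[i]):
        v[k] += step * a          # updates v in place

def laplacian_solve(n, edges, b, iters=5000, seed=0):
    """Solve L x = b with L = A^T A via two interleaved RK iterations:
    A^T y = b (one row per node) and A x = y (one row per edge)."""
    rng = random.Random(seed)
    A = incidence_matrix(n, edges)
    At = [list(col) for col in zip(*A)]
    w_A = [sum(a * a for a in row) for row in A]     # = 2 for every edge
    w_At = [sum(a * a for a in row) for row in At]   # = node degrees
    y, x = [0.0] * len(edges), [0.0] * n
    for _ in range(iters):
        rk_step(At, b, y, rng, w_At)   # RK on A^T y = b
        rk_step(A, y, x, rng, w_A)     # RK on A x = y, using the current y
    return A, x

# The example graph whose Laplacian is shown above, 0-indexed.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
b = [1.0, 0.0, 0.0, 0.0, -1.0]     # hypothetical; entries sum to zero
A, x = laplacian_solve(5, edges, b)
L = [[sum(A[e][u] * A[e][v] for e in range(len(A))) for v in range(5)]
     for u in range(5)]
print(L)   # the 5x5 Laplacian A^T A
print([round(sum(L[u][v] * x[v] for v in range(5)), 6) for u in range(5)])
```

Every row of $A^T$ touches only one node's incident edges, and every row of $A$ touches only the two endpoints of an edge, so each step needs only local communication.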
Clock Synchronization
▪ Applications
• Sensor Networks: Localization / Tracking, Data Fusion, Duty-cycling
• Wireless Networks: Slotted protocols, Scheduling
• Control: Distributed control / optimization, formation control
• and more…
Is it even possible? (FGK’11)
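[Figure: local clocks vs. true time $t$; node $i$ has skew $a_i$ and offset $b_i$; Node 1 is the reference]
$a_j = \dfrac{r_{1,j}^{(2)} - r_{1,j}^{(1)}}{s_1^{(2)} - s_1^{(1)}}$, …
Unknowns: 4
Rank = 3
NO! (the unknowns cannot all be identified)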
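A simpler one-way case illustrates both halves of the slide. This is our own simplified setup, not the FGK’11 derivation: the reference node time-stamps messages at $s$; node $j$'s clock reads $a t + b$; and the delay $d$ is constant. The skew is recovered exactly from two messages, but shifting delay into offset leaves every timestamp unchanged, so offset and delay cannot be told apart. All numbers are hypothetical.

```python
def receive_time(s, a, b, d):
    """Node j's clock reading when a message sent at reference time s arrives
    after delay d, assuming node j's clock reads a*t + b."""
    return a * (s + d) + b

def skew_estimate(s1, s2, r1, r2):
    """Two-message skew estimate a_j = (r2 - r1) / (s2 - s1)."""
    return (r2 - r1) / (s2 - s1)

a, b, d = 1.0001, 5.0, 0.002          # hypothetical skew, offset and delay
s = [10.0, 20.0]                      # send times on the reference clock
r = [receive_time(t, a, b, d) for t in s]
a_hat = skew_estimate(s[0], s[1], r[0], r[1])

# Moving delta of delay into the offset leaves every timestamp unchanged,
# so one-way messages cannot tell (b, d) apart from (b + a*delta, d - delta).
delta = 0.001
r_alt = [receive_time(t, a, b + a * delta, d - delta) for t in s]
print(a_hat, max(abs(u - v) for u, v in zip(r, r_alt)))
```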
Synchronization
▪ Protocols
• Internet (ms): NTP
• Sensornets (μs): RBS, FTSP, TPSN, DiSync
▪ How-to
• Noisy relative measurements
$y_{ij} = x_i - x_j + w_{ij}$
RK Smoothing
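▪ Pairwise averaging (projection of $(\hat{x}_i, \hat{x}_j)$ onto $\hat{x}_i - \hat{x}_j = y_{ij}$)
Node $i$ sets: $\hat{x}_i \leftarrow \frac{1}{2}(\hat{x}_i + \hat{x}_j + y_{ij})$
Node $j$ sets: $\hat{x}_j \leftarrow \frac{1}{2}(\hat{x}_i + \hat{x}_j - y_{ij})$
▪ Over-smoothing
• Neighborhood averaging: node $i$ uses all $j \in N_i$
Trade-off: convergence speed vs. number of exchanged messages
Localization is analogous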
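The pairwise step can be simulated on a small network. This is a minimal sketch under our own assumptions: a hypothetical 5-node graph, hypothetical true offsets, noiseless measurements $y_{ij} = x_i - x_j$, and a uniformly random active link at each step. Starting from zero, the estimates converge to the true offsets shifted to zero mean; only relative offsets are observable.

```python
import random

def rk_smoothing(n, edges, y, iters=5000, seed=0):
    """Pairwise smoothing of relative measurements y[(i, j)] ~ x_i - x_j:
    each step projects (xh_i, xh_j) onto the line xh_i - xh_j = y_ij."""
    rng = random.Random(seed)
    xh = [0.0] * n
    for _ in range(iters):
        i, j = rng.choice(edges)              # random active link (assumption)
        s, m = xh[i] + xh[j], y[(i, j)]
        xh[i], xh[j] = 0.5 * (s + m), 0.5 * (s - m)
    return xh

x_true = [0.3, -1.2, 2.0, 0.7, -0.4]          # hypothetical clock offsets
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
y = {(i, j): x_true[i] - x_true[j] for i, j in edges}   # noiseless for the demo
xh = rk_smoothing(5, edges, y)
mean = sum(x_true) / len(x_true)
print([round(v, 6) for v in xh])              # x_true shifted to zero mean
```

Each step preserves $\hat{x}_i + \hat{x}_j$, which is why the estimates keep zero mean. With noisy measurements, plain pairwise projections only hover near the least-squares estimate rather than converging to it.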
Simulations
[Figure: simulation results showing faster convergence and energy savings]
A model for clocks (FBK’13)
▪ State (Ornstein–Uhlenbeck process)
• Independent Brownian motions
▪ Skew
▪ Time display
MBCSP
▪ Skew estimation
$\alpha = \dfrac{r_2 - r_1}{s_2 - s_1}, \qquad \xi = \log \alpha$
• Linear filtering (Kalman–Bucy)
• Distributed implementation
▪ Offset estimation
$\tau = (s_2 + \alpha d) - r_2$
$d = \dfrac{1}{2\alpha}\left[(r_2 - s_2) + (r_1 - s_1) + (s_2 - r_1)\left(1 - \dfrac{1}{\alpha}\right)\right]$
• Network smoothing
• Delay estimation
MBCSP
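▪ Accuracy
[Figure: $s_1, s_2$ known; $\alpha$ estimated; $r_2$ predictable]
$\alpha = \dfrac{r_2 - r_1}{s_2 - s_1}$
$\mathrm{accur} := |r_2 - r_2^{pr}|$, where $r_2^{pr}$ is the predicted value of $r_2$
▪ Experiments
• < 1 μs
• ~45% better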
Data Mining
[Figure: original vs. quantized data; watermarking]
Exact data mining from inexact data
Optimal distance estimation
▪ Distance estimation has various data analytics applications
• Clustering / Classification
• Anomaly detection
• Similarity search (k-NN)
Now we can do all this very efficiently, directly on the compressed data!
Underlying operation: k-NN search
▪ k-Nearest Neighbor (k-NN) similarity search
▪ Issues that arise:
• How to compress the data?
• How to speed up the search?
▪ We have to estimate tight bounds on the distance metric using just the compressed representation
$D = [d_{ij}] := [\|x_i - x_j\|_2]$
$NN(i) := \arg\min_{j \neq i} d_{ij}$
[Figure: example with pairwise distances]: $NN(1) = 3$, $NN(2) = 3$, $NN(3) = 2$
Similarity search
[Figure: a query and database sequences at distances D = 7.3, D = 10.2 (k-NN), D = 17, D = 22]
Speed-up
[Figure: filter-and-refine search pipeline]
Simplified DB + simplified query → upper/lower bounds on distance → candidate superset → verify against original DB → final answer set
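The filter-and-refine pipeline can be sketched independently of how the bounds are obtained. In this sketch the bounds are synthetic (true distance plus or minus a random slack), standing in for bounds computed from compressed data, and all distances are hypothetical. Because every true distance lies within its bounds, the answer always matches a brute-force k-NN scan while only the candidate superset is checked against the original data.

```python
import heapq
import random

def knn_filter_refine(bounds, exact_dist, k):
    """k-NN search with distance bounds computed on compressed data.
    bounds[i] = (lower_i, upper_i); exact_dist(i) needs the original data."""
    # The true k-NN radius cannot exceed the k-th smallest upper bound.
    radius = heapq.nsmallest(k, (ub for _, ub in bounds))[-1]
    # Filter: any item whose lower bound exceeds the radius is pruned.
    candidates = [i for i, (lb, _) in enumerate(bounds) if lb <= radius]
    # Refine: verify the candidate superset against the original data.
    return sorted(candidates, key=exact_dist)[:k], len(candidates)

rng = random.Random(1)
dists = [rng.uniform(0, 100) for _ in range(200)]    # hypothetical true distances
bounds = [(max(0.0, d - rng.uniform(0, 5)), d + rng.uniform(0, 5)) for d in dists]
answer, n_checked = knn_filter_refine(bounds, lambda i: dists[i], k=5)
print(answer, n_checked)
```

Tighter bounds shrink the candidate superset, which is why better distance estimation translates directly into fewer verifications against the original database.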
Compressing weblog data
Use the Euclidean distance to match time series. But how should we compress the data?
§ Approximate the Euclidean distance using the DFT
[Figure: weblog time series $x(n)$ over a 1-year span (32 samples) and its DFT $X(f)$; the first 5 coefficients plus their symmetric ones are kept]
Similarity search
§ Approximate the Euclidean distance using the DFT
[Figure: the same $x(n)$ and $X(f)$; now the best 5 coefficients plus their symmetric ones are kept]
Objective
▪ Calculate the tightest possible upper/lower bounds using the coefficients with the highest energy
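The effect of keeping the first versus the best coefficients can be sketched with a naive DFT. Assumptions: a hypothetical zero-mean series, so the DC term is skipped (the slides do not say how DC is handled), and "best" means largest magnitude among frequencies 1..N/2−1, each kept together with its conjugate-symmetric partner $N - f$. For a real series $|X_f| = |X_{N-f}|$, and by Parseval $\sum_n x_n^2 = \frac{1}{N}\sum_f |X_f|^2$, so kept energy is a direct measure of approximation quality.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) unnormalized DFT: X[f] = sum_n x[n] exp(-2*pi*i*f*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * f * n / N) for n in range(N))
            for f in range(N)]

def with_mirrors(freqs, N):
    """Add the conjugate-symmetric partner N - f of every kept frequency f."""
    return sorted({g for f in freqs for g in (f, N - f)})

def first_k(X, k):
    return with_mirrors(range(1, k + 1), len(X))

def best_k(X, k):
    N = len(X)
    half = sorted(range(1, N // 2), key=lambda f: abs(X[f]), reverse=True)
    return with_mirrors(half[:k], N)

def energy(X, idx):
    return sum(abs(X[f]) ** 2 for f in idx)

N = 32
x = [math.sin(2 * math.pi * 9 * n / N) + 0.3 * math.sin(2 * math.pi * 2 * n / N)
     for n in range(N)]          # hypothetical zero-mean series
X = dft(x)
print(energy(X, first_k(X, 5)), energy(X, best_k(X, 5)), energy(X, range(N)))
```

Here the dominant component sits at frequency 9, outside the first five, so the first-k scheme misses most of the energy while best-k captures essentially all of it; the price is storing the positions of the kept coefficients.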
x X = DFT(x) Q = DFT(q) q
-1.7313 0.0000 0.0000 -1.3356
-0.7221 -12.0861 + 4.4812i -2.4756 + 1.7973i 0.6182
-1.2267 7.0708 - 3.3545i -2.8455 - 0.8510i -1.6135
-0.4194 0.8045 + 4.2567i -5.5245 - 2.4452i 1.0842
-0.4194 0.2386 + 2.0592i -2.4342 - 1.9897i 0.7981
-0.0158 -1.6003 + 6.7154i -2.1082 + 1.6303i 0.2667
-0.8231 -2.6539 + 0.8595i -3.6438 - 1.9424i -1.2048
-1.2267 5.8845 + 3.7689i -1.5989 - 0.1514i 0.7654
-0.3185 -2.0182 + 3.9356i 1.6186 - 2.3544i 0.6182
-0.3185 -3.9484 + 0.2234i 15.2057 - 6.0515i -1.4500
-0.1167 -1.2440 + 2.4538i -3.6746 + 0.3413i -0.2892
-0.5203 -1.6268 + 1.0393i -0.0532 + 0.6724i 0.6264
0.0851 -1.0458 + 4.0774i -1.8331 - 0.6000i 0.2503
0.6906 -4.2436 + 0.5988i 0.1642 + 1.3907i -1.0740
0.1861 -6.4020 - 0.9529i -7.3632 - 5.7938i 0.7163
0.7915 -3.3663 - 1.0825i -2.2362 - 1.7287i 0.7654
0.7915 -2.9264 -5.1339 -1.5073
1.2961 -3.3663 + 1.0825i -2.2362 + 1.7287i 1.1987
2.3052 -6.4020 + 0.9529i -7.3632 + 5.7938i 0.7735
1.9016 -4.2436 - 0.5988i 0.1642 - 1.3907i 0.1277
0.1861 -1.0458 - 4.0774i -1.8331 + 0.6000i -1.2865
-0.0158 -1.6268 - 1.0393i -0.0532 - 0.6724i 0.6509
0.5897 -1.2440 - 2.4538i -3.6746 - 0.3413i 0.5283
0.8924 -3.9484 - 0.2234i 15.2057 + 6.0515i -1.0413
0.1861 -2.0182 - 3.9356i 1.6186 + 2.3544i 0.9207
-1.7313 5.8845 - 3.7689i -1.5989 + 0.1514i 1.2150
-0.3185 -2.6539 - 0.8595i -3.6438 + 1.9424i 0.4302
-0.9240 -1.6003 - 6.7154i -2.1082 - 1.6303i -1.2211
-0.5203 0.2386 - 2.0592i -2.4342 + 1.9897i 1.0678
-0.4194 0.8045 - 4.2567i -5.5245 + 2.4452i 1.0351
-0.3185 7.0708 + 3.3545i -2.8455 + 0.8510i -1.4337
2.2043 -12.0861 - 4.4812i -2.4756 - 1.7973i -1.0004
§ Find the best k coefficients of X (k = 4)
||X|| (magnitude vector) X Q ||Q||
0.0000 0.0000 0.0000 0
12.8901 -12.0861 + 4.4812i -2.4756 + 1.7973i 3.0592
7.8261 7.0708 - 3.3545i -2.8455 - 0.8510i 2.9700
4.3321 0.8045 + 4.2567i -5.5245 - 2.4452i 6.0414
2.0729 0.2386 + 2.0592i -2.4342 - 1.9897i 3.1439
6.9035 -1.6003 + 6.7154i -2.1082 + 1.6303i 2.6650
2.7897 -2.6539 + 0.8595i -3.6438 - 1.9424i 4.1292
6.9880 5.8845 + 3.7689i -1.5989 - 0.1514i 1.6061
4.4229 -2.0182 + 3.9356i 1.6186 - 2.3544i 2.8571
3.9548 -3.9484 + 0.2234i 15.2057 - 6.0515i 16.3656
2.7512 -1.2440 + 2.4538i -3.6746 + 0.3413i 3.6904
1.9304 -1.6268 + 1.0393i -0.0532 + 0.6724i 0.6745
4.2094 -1.0458 + 4.0774i -1.8331 - 0.6000i 1.9288
4.2856 -4.2436 + 0.5988i 0.1642 + 1.3907i 1.4004
6.4725 -6.4020 - 0.9529i -7.3632 - 5.7938i 9.3694
3.5360 -3.3663 - 1.0825i -2.2362 - 1.7287i 2.8265
2.9264 -2.9264 -5.1339 5.1339
3.5360 -3.3663 + 1.0825i -2.2362 + 1.7287i 2.8265
6.4725 -6.4020 + 0.9529i -7.3632 + 5.7938i 9.3694
4.2856 -4.2436 - 0.5988i 0.1642 - 1.3907i 1.4004
4.2094 -1.0458 - 4.0774i -1.8331 + 0.6000i 1.9288
1.9304 -1.6268 - 1.0393i -0.0532 - 0.6724i 0.6745
2.7512 -1.2440 - 2.4538i -3.6746 - 0.3413i 3.6904
3.9548 -3.9484 - 0.2234i 15.2057 + 6.0515i 16.3656
4.4229 -2.0182 - 3.9356i 1.6186 + 2.3544i 2.8571
6.9880 5.8845 - 3.7689i -1.5989 + 0.1514i 1.6061
2.7897 -2.6539 - 0.8595i -3.6438 + 1.9424i 4.1292
6.9035 -1.6003 - 6.7154i -2.1082 - 1.6303i 2.6650
2.0729 0.2386 - 2.0592i -2.4342 + 1.9897i 3.1439
4.3321 0.8045 - 4.2567i -5.5245 + 2.4452i 6.0414
7.8261 7.0708 + 3.3545i -2.8455 + 0.8510i 2.9700
12.8901 -12.0861 - 4.4812i -2.4756 - 1.7973i 3.0592
§ Identify the smallest magnitude among the k kept coefficients (→ power): minPower = 6.9035
§ The remaining powers are less than minPower
§ We also keep the sum of squares of the remaining powers (their total energy)
§ Calculate distance from k known coeffs
§ Optimize the distance over the remaining coefficients
[Figure labels: Optimization, Generalization]
Waterfilling
[Figure, built up over several steps: water-filling between Q and X]
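The steps above (keep the best k coefficients, remember minPower and the energy of the rest, then optimize over the unknowns) can be sketched for the case where only X is compressed and the query Q is known in full. This is our own single water-filling sketch, not the published double water-filling algorithm; the helper names and the random test vectors are hypothetical. On the discarded coefficients, $|X_f - Q_f|^2 = |X_f|^2 + |Q_f|^2 - 2\,\mathrm{Re}(X_f \bar{Q}_f)$. The cross term is bounded by the largest achievable $\sum_f |X_f||Q_f|$ subject to $\sum_f |X_f|^2 = e$ and $|X_f| \le$ minPower, and that maximum has the water-filling form $|X_f| = \min(\lambda |Q_f|, \text{minPower})$.

```python
import random

def waterfill(a, energy, cap, iters=200):
    """max sum(a_i * x_i) s.t. sum(x_i^2) = energy, 0 <= x_i <= cap.
    Optimal x_i = min(lam * a_i, cap); the water level lam is found by bisection."""
    pos = [v for v in a if v > 0]
    if not pos or energy <= 0:
        return 0.0
    if energy >= len(pos) * cap ** 2:
        # All useful coordinates saturate; leftover energy sits where a_i = 0.
        return cap * sum(pos)
    lo, hi = 0.0, cap / min(pos)          # at hi every coordinate is capped
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(min(lam * v, cap) ** 2 for v in pos) < energy:
            lo = lam
        else:
            hi = lam
    return sum(v * min(hi * v, cap) for v in pos)

def compress(X, k):
    """Keep the k largest-magnitude coefficients, the smallest kept magnitude
    (minPower) and the energy of everything that was discarded."""
    order = sorted(range(len(X)), key=lambda f: abs(X[f]), reverse=True)
    kept = {f: X[f] for f in order[:k]}
    min_power = min(abs(v) for v in kept.values())
    e_rest = sum(abs(X[f]) ** 2 for f in order[k:])
    return kept, min_power, e_rest

def distance_bounds(compressed, Q):
    """Lower/upper bounds on sum_f |X_f - Q_f|^2 from compressed X and full Q."""
    kept, min_power, e_rest = compressed
    known = sum(abs(v - Q[f]) ** 2 for f, v in kept.items())
    a = [abs(Q[f]) for f in range(len(Q)) if f not in kept]
    e_q = sum(v * v for v in a)
    s = waterfill(a, e_rest, min_power)   # largest possible sum |X_f||Q_f|
    return known + e_q + e_rest - 2 * s, known + e_q + e_rest + 2 * s

rng = random.Random(0)
X = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(32)]
Q = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(32)]
comp = compress(X, 8)
lb, ub = distance_bounds(comp, Q)
true_d2 = sum(abs(u - v) ** 2 for u, v in zip(X, Q))
print(lb, true_d2, ub)
```

Without the minPower cap, the best one can say is Cauchy–Schwarz, $\sum_f |X_f||Q_f| \le \sqrt{e\,e_q}$; the cap can only lower the maximum, so the water-filling bounds are never looser than the Cauchy–Schwarz ones.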
▪ When both sequences are compressed, the optimal distance estimation can be solved using a double water-filling process.
The bounds are optimally tight
Theorem (FV’12, VFK’13): The computation of lower and upper bounds given the aforementioned compression can be solved exactly using double water-filling. The lower and upper bounds are optimally tight; no tighter solutions can be provided.
Sketch of proof
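Let $p_x^+$ ($p_x^-$) denote the retained (discarded) coefficients of $x$, and $p_q^+$ ($p_q^-$) those of $q$.
$$\max_{x,q} \sum_{i\in p_q^+\cap p_x^+} q_i x_i + \sum_{i\in p_x^-\cap p_q^+} q_i x_i + \sum_{i\in p_q^-\cap p_x^+} q_i x_i + \sum_{i\in p_q^-\cap p_x^-} q_i x_i$$
The first sum involves only retained (known) coefficients, so it suffices to solve
$$\max_{x,q} \sum_{i\in p_x^-\cap p_q^+} q_i x_i + \sum_{i\in p_q^-\cap p_x^+} q_i x_i + \sum_{i\in p_q^-\cap p_x^-} q_i x_i$$
subject to $x_i \in [0, U_x]$, $q_i \in [0, U_q]$ and
$$\sum_{i\in p_x^-\cap p_q^+} x_i^2 = e_x^{(1)}, \quad \sum_{i\in p_x^-\cap p_q^-} x_i^2 = e_x - e_x^{(1)}, \quad \sum_{i\in p_q^-\cap p_x^+} q_i^2 = e_q^{(1)}, \quad \sum_{i\in p_q^-\cap p_x^-} q_i^2 = e_q - e_q^{(1)}$$
▪ The first two sums are uncoupled water-fillings
▪ The last sum is bounded by Cauchy–Schwarz (C-S): $\sum_{i\in p_q^-\cap p_x^-} q_i x_i \le \sqrt{e_x - e_x^{(1)}}\,\sqrt{e_q - e_q^{(1)}}$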
Double Water-filling algorithm
Intuition:
Water-fill for the discarded coefficients of the two vectors separately, using the appropriate energy allocation
Zero-overhead algorithm
Complexity: Θ(n log n)
Experiments
▪ Unica database: IBM web traffic for the year 2010
§ Marketing/Adwords recommendation
– Analysis/storage of weblog queries (1 TB of data per month)
– GBS: scheduling advertising campaigns / pricing
[Figure: sample weblog query terms]
Experiments
▪ The previous (10–20%) improvement in distance estimation can significantly reduce the search space when searching for k-NN
▪ We retrieve 20–80% fewer sequences than other approaches
Extensions
▪ Cosine similarity (text documents):
$\cos(x, y) = 1 - L_2(x, y)^2 / 2$ (for unit-norm $x, y$)
▪ Correlation (financial analysis):
$\mathrm{corr}(x, y) = 1 - L_2(x, y)^2 / 2$ (for normalized signals $x, y$)
▪ Dynamic Time Warping (flexible similarity metric)
[Figure: Dynamic Time Warping example, Halloween vs. Christmas]
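Both identities follow from $\|u - v\|_2^2 = 2 - 2\langle u, v\rangle$ for unit vectors $u, v$. The check below assumes "normalized" means unit norm for cosine similarity, and zero mean plus unit norm for correlation (Pearson correlation is the cosine of the mean-centered vectors); the two signals are hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def l2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def unit(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def centered_unit(v):
    m = sum(v) / len(v)
    return unit([a - m for a in v])

def cosine(x, y):
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

def pearson(x, y):
    """Pearson correlation = cosine similarity of the mean-centered vectors."""
    return cosine([a - sum(x) / len(x) for a in x], [b - sum(y) / len(y) for b in y])

x = [3.0, 1.0, 4.0, 1.0, 5.0]      # hypothetical signals
y = [2.0, 7.0, 1.0, 8.0, 2.0]
print(cosine(x, y), 1 - l2(unit(x), unit(y)) ** 2 / 2)
print(pearson(x, y), 1 - l2(centered_unit(x), centered_unit(y)) ** 2 / 2)
```

Since both measures are monotone in the Euclidean distance after normalization, the same distance bounds on compressed data carry over to them.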
Conclusions
▪ Distributed Computing
• Improvements for sparse systems
• Design method for gossip algorithms
▪ Clock synchronization
• Fundamental limits
• Distributed protocols
▪ Data Mining
• Exact mining from inexact data
• Optimal distance bounds
Other research (1)
1. Compressed sensing
• Recursive algorithm for streaming data
2. Wireless resource allocation
• MAC, cross-layer design
Other research (2)
4. Data mining
• Cluster-preserving compression
• kNN-preserving watermarking
Acknowledgements
▪ Collaborators:
• Anastasios Zouzias, IBM-Research
• Michail Vlachos, IBM-Research
• Martin Vetterli, EPFL
• P. R. Kumar, Texas A&M University
• Vivek Borkar, Indian Institute of Technology Bombay
▪ Funding
References
[FGK11] N. Freris, S. Graham and P. R. Kumar, “Fundamental Limits on Synchronizing Clocks over Networks,” IEEE Transactions on Automatic Control, vol. 56, no. 2, pp. 1352–1364, June 2011.
[FZ12] N. Freris and A. Zouzias, “Fast distributed smoothing of relative measurements,” 51st IEEE Conference on Decision and Control (CDC), pp. 1411–1416, Dec. 2012.
[AZ13] A. Zouzias and N. Freris, “Randomized Extended Kaczmarz for Solving Least Squares,” SIAM Journal on Matrix Analysis and Applications, vol. 34, no. 2, pp. 773–793, 2013.
[FBK13] N. Freris, V. Borkar and P. R. Kumar, “Distributed model-based clock synchronization in wireless sensor networks,” submitted to IEEE/ACM Transactions on Networking, Nov. 2013.
[FZ14] N. Freris and A. Zouzias, “Randomized gossip algorithms for solving Laplacian systems,” (invited paper) submitted to 53rd IEEE Conference on Decision and Control (CDC), Mar. 2014.
[VFK15] M. Vlachos, N. Freris and A. Kyrillidis, “Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain,” International Journal on Very Large Data Bases (VLDBJ), vol. 24, no. 1, pp. 1–24, 2015.
CPSLab is hiring!
https://wp.nyu.edu/cpslab