
Cyberphysical Systems:
Distributed Computing, Synchronization and Data Mining

Nick Freris
Cyberphysical Systems Laboratory
New York University Abu Dhabi
https://wp.nyu.edu/cpslab


Cyberphysical systems (CPS)
▪  Large networks of smart devices
▪  The trinity of C’s
▪  30 billion devices by 2020

How to design such systems?

CPS: Distributed Computing, Synchronization and Data Mining
Applications
▪  Smart grids
▪  Transportation systems: smart highways, railway networks, air traffic
▪  Healthcare: mHealth
▪  Defense: formation control

Design Challenges
▪  Scalability
•  Millions of nodes
▪  Wireless communication
•  Fault tolerance under time-varying topologies and broken links
▪  Security
•  Electronic warfare
▪  Big Data
•  Mining & learning
▪  Limited batteries
•  Energy-hungry operations such as GPS
▪  Privacy
•  Anonymization of data (e.g., medical)

Need for new theories

Proposed Solutions
▪  Scalability
•  Distributed computing
▪  Wireless communication
•  Distributed communication protocols
▪  Big Data
•  Data mining on compressed data
▪  Limited batteries
•  Clock synchronization

Need for new theories

Outline

▪  Distributed computing
•  Gossip algorithms

▪  Clock synchronization
•  Protocols & limitations

▪  Data mining
•  Exact information retrieval from inexact data


Distributed computing



Motivation
▪  Cloud computing
•  Large-scale
•  In-network processing
▪  Fault tolerance
•  Broken links
•  Mobile agents
▪  Liveness
•  Message passing is expensive
•  So is GPS (for synchronization)

Gossip algorithms
▪  Consensus: distributed asynchronous averaging
•  Similarly: min, max, product
[Figure: pairwise averaging on a small example network]
▪  Analysis: randomized activation
▪  Implementation: exponential clocks
•  Node i ticks with i.i.d. exponential inter-tick times of mean $1/\lambda_i$ ⟺ node i is activated w.p. $P_i = \lambda_i / \sum_j \lambda_j$
▪  Exponential convergence
▪  Applications: distributed control, optimization, synchronization/localization, wireless protocols
Tsitsiklis et al., Boyd et al., Dimakis et al.

Extension to linear systems

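A minimal sketch of randomized pairwise gossip averaging, assuming each node carries an independent exponential clock of rate λ_i (the hypothetical `rates` argument) and, when it ticks, averages with one uniformly chosen neighbor. The first node to tick is node i with probability λ_i / Σ_j λ_j, which is the equivalence stated on the slide. The graph, initial values and rates are illustrative only.

```python
import random


def gossip_average(values, neighbors, rates, iters=5000, seed=1):
    """Randomized pairwise gossip averaging driven by exponential clocks.

    values:    initial node values
    neighbors: adjacency lists of an undirected graph
    rates:     clock rate lambda_i of each node (hypothetical parameter)
    """
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    # time of the next tick of every node (i.i.d. Exp(lambda_i) inter-tick times)
    next_tick = [rng.expovariate(rates[i]) for i in range(n)]
    for _ in range(iters):
        # the node whose clock ticks first is activated (w.p. lambda_i / sum(lambda))
        i = min(range(n), key=lambda k: next_tick[k])
        j = rng.choice(neighbors[i])
        x[i] = x[j] = (x[i] + x[j]) / 2.0  # pairwise averaging preserves the sum
        next_tick[i] += rng.expovariate(rates[i])
    return x


# 5-node example graph with edges 1-2, 2-3, 1-3, 2-4, 4-5, 3-5 (0-indexed below)
neighbors = [[1, 2], [0, 2, 3], [1, 0, 4], [1, 4], [3, 2]]
values = [1.0, 3.0, 2.0, 7.0, 5.0]
x = gossip_average(values, neighbors, rates=[1.0, 2.0, 1.0, 0.5, 1.5])
print(x)  # every entry approaches the average 3.6
```

Replacing the averaging line with `max` or `min` gives the other aggregates mentioned on the slide.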
Randomized Kaczmarz
▪  Iterative algorithm for solving Ax = b
•  Randomized selection of a row $i_k$
•  Projection to the solution space of the selected row, $\{x : A_{(i_k)} x = b_{i_k}\}$ (alternating projections)
▪  Exponential convergence in m.s. (SV’09, FZ’13)
•  Rate of convergence: $1 - 1/\kappa_F^2(A)$, where $\kappa_F^2(A) := \|A\|_F^2 / \sigma_{\min}^2(A)$
•  Performance depends on row scaling
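A minimal dense sketch of randomized Kaczmarz for a consistent system, assuming the Strohmer–Vershynin sampling rule (row i drawn with probability ‖A_(i)‖²/‖A‖_F²), which is the sampling under which the 1 − 1/κ_F²(A) rate holds. The 3×2 system is a made-up example; matrices are plain lists of rows.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Randomized Kaczmarz for a consistent system Ax = b.

    Row i is sampled with probability ||A_i||^2 / ||A||_F^2 and the iterate is
    projected onto the hyperplane {x : A_i x = b_i}.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    row_norms2 = [sum(a * a for a in row) for row in A]
    x = [0.0] * n
    for _ in range(iters):
        i = rng.choices(range(m), weights=row_norms2)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / row_norms2[i]
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x


# Small consistent example: A x_true = b
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * xt for a, xt in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
print(x_hat)  # close to [1.0, -2.0]
```

Rescaling a row changes its sampling weight, which is why performance depends on row scaling.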
Noisy case
▪  Noisy measurements:
▪  Oscillatory behavior
•  Asymptotically constrained in a ball (N’10, FZ’12)

▪  Least squares via the normal equations $A^T A x = A^T b$:
•  Bad idea (squares the condition number)

Optimal de-noising
▪  LS for an inconsistent system:
•  Solution: projection to the range space of A, $Ax = b_{\mathcal{R}(A)}$ [FZ’13]
•  Randomized selection of a column; projection to the orthogonal complement of the selected column
•  Same rate of convergence: $1 - 1/\kappa_F^2(A)$, $\kappa_F^2(A) := \|A\|_F^2 / \sigma_{\min}^2(A)$

Putting the pieces together
▪  RK and de-noising (REK):
•  Randomized orthogonal projection
•  Randomized Kaczmarz
•  Termination criteria

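A minimal sketch of Randomized Extended Kaczmarz as described above: a randomized orthogonal projection drives z toward the component of b outside the range of A, and an RK step is applied to the de-noised system Ax = b − z. It runs a fixed number of iterations instead of the termination criteria, uses the freshly updated z in the RK step, and the inconsistent 4×2 system is a made-up example whose LS solution is (4/3, 1).

```python
import random


def rek(A, b, iters=3000, seed=0):
    """Randomized Extended Kaczmarz sketch for min ||Ax - b||_2 (A given as a list of rows)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    row_w = [sum(v * v for v in row) for row in A]   # ||A_(i)||^2
    col_w = [sum(v * v for v in col) for col in cols]  # ||A^(j)||^2
    x = [0.0] * n
    z = list(b)  # converges to the component of b orthogonal to range(A)
    for _ in range(iters):
        # randomized orthogonal projection: remove from z its component along a random column
        j = rng.choices(range(n), weights=col_w)[0]
        coef = sum(c * zi for c, zi in zip(cols[j], z)) / col_w[j]
        z = [zi - coef * c for zi, c in zip(z, cols[j])]
        # randomized Kaczmarz step on the de-noised system Ax = b - z
        i = rng.choices(range(m), weights=row_w)[0]
        r = b[i] - z[i] - sum(a * xk for a, xk in zip(A[i], x))
        x = [xk + (r / row_w[i]) * a for xk, a in zip(x, A[i])]
    return x


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [1.0, 2.0, 2.0, 1.0]  # inconsistent: no exact solution exists
x_ls = rek(A, b)
print(x_ls)  # close to [4/3, 1], the least-squares solution
```

On this system plain RK keeps oscillating around the LS solution, as described on the noisy-case slide; the z-update removes that noise floor.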
Analysis of REK
▪  Rate of convergence (ZF’13):
$$\mathbb{E}\|x^{(k)} - x_{LS}\|^2 \le \left(1 - \frac{1}{\kappa_F^2(A)}\right)^k \left[\|x_{LS}\|^2 + c\,\|b_{\mathcal{R}(A)^\perp}\|^2\, k\right]$$
▪  Same exponent, no delay
▪  Expected number of arithmetic operations:
•  proportional to
▪  sparsity
▪  squared condition number

Suitable for sparse systems

Experiments
▪  Implementation in C
•  REK-C
•  REK-BLAS (level-1 BLAS routines + Blendenpik)
▪  Comparison
•  Matlab backslash \
•  LAPACK
▪  DGELSY (QR factorization)
▪  DGELSD (SVD)
▪  LSNR
•  Blendenpik

[Figure: comparison on sparse and dense systems]
Outperformance for sparse systems

RK Gossiping
▪  A is the incidence matrix (m × n: n vertices, m edges)
[Figure: example graph with vertices 1-5]
$$A = \begin{pmatrix} -1 & 1 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & -1 & 1 \end{pmatrix}$$
•  RK is distributed
•  Average consensus is a special case: $Ax = 0$
▪  Smoothing of relative measurements
•  $y_{ij} = x_i - x_j + w_{ij}$

Convergence: second eigenvalue of the Laplacian, $\lambda_2(G)$

RK Gossiping
▪  L is the Laplacian matrix, $L = A^T A$ (n × n)
$$L = \begin{pmatrix} 2 & -1 & -1 & 0 & 0 \\ -1 & 3 & -1 & -1 & 0 \\ -1 & -1 & 3 & 0 & -1 \\ 0 & -1 & 0 & 2 & -1 \\ 0 & 0 & -1 & -1 & 2 \end{pmatrix}$$

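A quick plain-Python check that the Laplacian above equals AᵀA for the incidence matrix of the 5-vertex example graph. The order of the edge rows is an assumption recovered from the figure; any row order gives the same AᵀA.

```python
# Incidence matrix of the example graph (rows = edges, columns = vertices 1..5)
A = [
    [-1, 1, 0, 0, 0],
    [0, -1, 1, 0, 0],
    [1, 0, -1, 0, 0],
    [0, 1, 0, -1, 0],
    [0, 0, 1, 0, -1],
    [0, 0, 0, -1, 1],
]


def gram(A):
    """Return A^T A for a dense matrix given as a list of rows."""
    n = len(A[0])
    return [[sum(row[p] * row[q] for row in A) for q in range(n)] for p in range(n)]


L = gram(A)
for row in L:
    print(row)  # diagonal = vertex degrees, off-diagonal = -1 per edge
```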
RK Gossiping
▪  Laplacian systems $Lx = b$: solve $A^T y = b$, then $Ax = y$
•  Running 2 RKs in parallel
•  Same convergence
▪  Extensions
•  Online measurements
•  Varying topologies
•  Accelerated versions
▪  Applications
•  Synchronization / localization
•  Control / optimization
•  Effective resistance computation

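A minimal sketch of the two-RK approach to Lx = b, assuming b sums to zero (so it lies in the range of L). Started from zero, RK on Aᵀy = b converges to the minimum-norm solution, which lies in range(A), so the second system Ax = y is consistent and its solution satisfies AᵀAx = b. For readability this sketch runs the two RKs one after the other rather than in parallel; the helper names are hypothetical.

```python
import random

EDGES = [(0, 1), (1, 2), (0, 2), (1, 3), (2, 4), (3, 4)]  # 5-vertex example graph, 0-indexed


def incidence(edges, n):
    """Edge-vertex incidence matrix with (A x)_e = x_i - x_j for edge e = (i, j)."""
    A = [[0.0] * n for _ in edges]
    for e, (i, j) in enumerate(edges):
        A[e][i], A[e][j] = 1.0, -1.0
    return A


def rk(rows, rhs, n_unknowns, iters, rng):
    """Plain randomized Kaczmarz started at 0 (rows sampled by squared norm)."""
    w = [sum(a * a for a in r) for r in rows]
    x = [0.0] * n_unknowns
    for _ in range(iters):
        k = rng.choices(range(len(rows)), weights=w)[0]
        res = rhs[k] - sum(a * xi for a, xi in zip(rows[k], x))
        x = [xi + (res / w[k]) * a for xi, a in zip(x, rows[k])]
    return x


def laplacian_solve(edges, n, b, iters=4000, seed=0):
    """Solve L x = b, L = A^T A, with two RK passes (requires sum(b) == 0)."""
    rng = random.Random(seed)
    A = incidence(edges, n)
    At = [[A[e][v] for e in range(len(edges))] for v in range(n)]
    y = rk(At, b, len(edges), iters, rng)  # stage 1: one row per vertex
    return rk(A, y, n, iters, rng)         # stage 2: one row per edge (pairwise updates)


b = [1.0, -2.0, 0.5, 3.0, -2.5]  # sums to zero
x = laplacian_solve(EDGES, 5, b)
print(x)  # minimum-norm solution of L x = b (entries sum to ~0)
```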


Clock Synchronization


Motivation
[Figure: two clocks reading 11:58 and 12:01]

▪  Applications
•  Sensor networks: localization / tracking, data fusion, duty-cycling
•  Wireless networks: slotted protocols, scheduling
•  Control: distributed control / optimization, formation control
•  and more…

How accurately can we synchronize clocks?

Is it even possible? (FGK’11)
[Figure: clock of node i, with skew $a_i$ and offset $b_i$, against the time t of node 1 (reference)]
▪  Skew: $a_j = \frac{r^{(2)}_{1,j} - r^{(1)}_{1,j}}{s^{(2)}_1 - s^{(1)}_1}$, …
▪  Unknowns: 4
▪  Rank = 3

NO!

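A minimal sketch of why the answer is no, under a simplified affine model that is our reading rather than necessarily the exact model of FGK’11: node j's clock reads a_j t + b_j when reference node 1 reads t, and the forward and reverse delays d_1j, d_j1 are unknown constants. With u = a_j d_1j and v = a_j d_j1, every timestamp is linear in θ = (a_j, b_j, u, v), and the b_j column is always the u column minus the v column. The helpers `matrix_rank` and `design_matrix` and all numbers are hypothetical.

```python
def matrix_rank(M, tol=1e-9):
    """Numerical rank by Gauss-Jordan elimination with partial pivoting (small dense matrices)."""
    M = [[float(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < tol:
            continue  # column c depends on the earlier columns
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rows):
            if r != rank:
                f = M[r][c] / M[rank][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[rank])]
        rank += 1
    return rank


def design_matrix(fwd_send, rev_recv):
    """Rows of the linear system in theta = (a_j, b_j, u, v).

    Forward packet sent at s (node 1 clock):       r     = a_j*s   + b_j + u
    Reverse packet received at rho (node 1 clock): sigma = a_j*rho + b_j - v
    """
    rows = [[s, 1.0, 1.0, 0.0] for s in fwd_send]
    rows += [[rho, 1.0, 0.0, -1.0] for rho in rev_recv]
    return rows


M = design_matrix([0.0, 1.0, 2.5, 4.0], [0.7, 1.9, 3.3])
rank = matrix_rank(M)  # 3, although there are 4 unknowns: the offset b_j is not identifiable

# The skew, however, is identifiable from two forward packets
a_j, b_j, d_1j = 1.0002, 3.0, 0.05
s = [0.0, 1.0]
r = [a_j * (t + d_1j) + b_j for t in s]
skew = (r[1] - r[0]) / (s[1] - s[0])
print(rank, skew)
```

Adding more packets adds rows but never breaks the column dependency, which is why no number of timestamp exchanges resolves the offset in this model.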
Synchronization
▪  Protocols
•  Internet (ms): NTP
•  Sensornets (μs): RBS, FTSP, TPSN, DiSync

▪  How to
•  Noisy relative measurements: $y_{ij} = x_i - x_j + w_{ij}$
•  Jacobi algorithm (GK’06, BH’06)

▪  Neighborhood averaging
[Figure: node i and its neighbors j]

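A minimal sketch of neighborhood averaging as a Jacobi iteration, assuming node 0 acts as the reference (fixed at 0) and every other node i sets x̂_i ← mean over neighbors j of (x̂_j + y_ij), which is the Jacobi step for the least-squares problem with y_ij = x_i − x_j + w_ij. The graph and offsets are illustrative, and the measurements are noiseless so the exact offsets are recovered.

```python
EDGES = [(0, 1), (1, 2), (0, 2), (1, 3), (2, 4), (3, 4)]  # 5-vertex example graph, 0-indexed
N = 5


def jacobi_sync(y, n, iters=200):
    """Neighborhood averaging (Jacobi) for relative measurements y[(i, j)] ~ x_i - x_j.

    Node 0 is the reference and keeps x_0 = 0; every other node i sets
    x_i <- mean over neighbors j of (x_j + y_ij), using the previous round's values.
    """
    nbrs = {v: [] for v in range(n)}
    for (i, j), yij in y.items():
        nbrs[i].append((j, yij))   # x_i ~ x_j + y_ij
        nbrs[j].append((i, -yij))  # x_j ~ x_i - y_ij
    x = [0.0] * n
    for _ in range(iters):
        x = [0.0 if i == 0 else sum(x[j] + yij for j, yij in nbrs[i]) / len(nbrs[i])
             for i in range(n)]
    return x


x_true = [0.0, 1.5, -2.0, 0.7, 3.0]                    # offsets relative to node 0
y = {(i, j): x_true[i] - x_true[j] for i, j in EDGES}  # noiseless measurements
x = jacobi_sync(y, N)
print(x)  # approaches x_true
```

Each round every node exchanges a message with all of its neighbors, which is the message cost the pairwise scheme on the next slide trades against convergence speed.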
RK Smoothing
▪  Pairwise averaging
•  Node i sets: $\hat{x}_i \leftarrow \frac{1}{2}(\hat{x}_i + \hat{x}_j + y_{ij})$
•  Node j sets: $\hat{x}_j \leftarrow \frac{1}{2}(\hat{x}_i + \hat{x}_j - y_{ij})$
▪  Over-smoothing
•  Neighborhood averaging: for all $j \in N_i$

Trade-off: convergence speed vs. number of exchanged messages

Localization is analogous

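A minimal sketch of pairwise RK smoothing, assuming edges are sampled uniformly (all incidence rows have the same norm, so this matches RK sampling). Each update is the RK projection onto {x : x_i − x_j = y_ij}; the sum of the estimates is preserved, so from a zero start the offsets are recovered up to their common mean. Graph and offsets are illustrative.

```python
import random

EDGES = [(0, 1), (1, 2), (0, 2), (1, 3), (2, 4), (3, 4)]  # 5-vertex example graph, 0-indexed


def pairwise_smoothing(y, n, iters=3000, seed=0):
    """Randomized pairwise smoothing of relative measurements y[(i, j)] ~ x_i - x_j.

    Each step picks a random edge (i, j) and sets
        x_i <- (x_i + x_j + y_ij) / 2,   x_j <- (x_i + x_j - y_ij) / 2,
    i.e. a randomized Kaczmarz step on the incidence matrix.
    """
    rng = random.Random(seed)
    edges = list(y)
    x = [0.0] * n
    for _ in range(iters):
        i, j = rng.choice(edges)
        s = x[i] + x[j]
        x[i], x[j] = (s + y[(i, j)]) / 2, (s - y[(i, j)]) / 2
    return x


x_true = [0.0, 1.5, -2.0, 0.7, 3.0]
y = {(i, j): x_true[i] - x_true[j] for i, j in EDGES}  # noiseless measurements
x = pairwise_smoothing(y, 5)
print(x)  # approaches x_true minus its mean (0.64)
```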
Simulations
[Figure: simulation results]
Faster convergence
Energy savings

A model for clocks (FBK’13)
▪  State (Ornstein-Uhlenbeck process)
•  Independent Brownian motions
▪  Skew
▪  Time display

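MBCSP
▪  Skew estimation
[Figure: timestamps $s_1, s_2$ and $r_1, r_2$ exchanged between two nodes]
$$\alpha = \frac{r_2 - r_1}{s_2 - s_1}, \qquad X \sim \log \alpha$$
•  Linear filtering (Kalman-Bucy)
•  Distributed implementation

▪  Offset estimation
[Figure: two-way exchange; one node records $s_1, r_2$, the other $r_1, s_2$]
$$\tau = (s_2 + \alpha d) - r_2, \qquad d = \frac{1}{2\alpha}\left[(r_2 - s_2) + (r_1 - s_1) + (s_2 - r_1)\left(1 - \frac{1}{\alpha}\right)\right]$$
•  Network smoothing
•  Delay estimation
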
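A small sketch of the two estimators in the skew-free special case α = 1, where the delay and offset formulas reduce to the familiar two-way (NTP-style) estimates d = ½[(r₁ − s₁) + (r₂ − s₂)] and τ = s₂ + d − r₂. The timestamps are synthetic, delays are assumed symmetric, and neither the general-α case nor the Kalman-Bucy filtering of X ∼ log α is modelled; the function names are hypothetical.

```python
def skew_estimate(s1, s2, r1, r2):
    """alpha = (r2 - r1) / (s2 - s1): packets sent at s1, s2 (sender clock), received at r1, r2 (receiver clock)."""
    return (r2 - r1) / (s2 - s1)


def two_way_delay_offset(s1, r1, s2, r2):
    """Skew-free (alpha = 1) two-way exchange with symmetric delay d.

    Node A sends at s1 (A's clock); node B receives at r1 and replies at s2 (B's clock);
    node A receives the reply at r2 (A's clock). Returns (d, tau), tau = B's clock - A's clock.
    """
    d = ((r1 - s1) + (r2 - s2)) / 2
    tau = (s2 + d) - r2
    return d, tau


# Synthetic exchange: B runs 5.0 ahead of A, one-way delay 0.3
s1 = 10.0
r1 = s1 + 0.3 + 5.0
s2 = r1 + 0.5           # B waits 0.5 before replying
r2 = (s2 - 5.0) + 0.3
d, tau = two_way_delay_offset(s1, r1, s2, r2)

# Skew: B's clock = 1.001 * A's clock + 5, constant delay 0.3
alpha = skew_estimate(0.0, 100.0, 1.001 * 0.3 + 5.0, 1.001 * 100.3 + 5.0)
print(d, tau, alpha)
```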
MBCSP
▪  Accuracy
[Figure: $\alpha = \frac{r_2 - r_1}{s_2 - s_1}$ with $s_1, s_2$ known, $\alpha$ estimated and $r_2$ predictable]
$$\text{accur} := |r_2^{pr} - r_2|$$

▪  Experiments
•  < 1 μs
•  ~45% better



Data Mining

Motivation
▪  Information retrieval is a large industry…
•  Biology, finance, engineering, marketing, vision/graphics, etc.
▪  …but data are hardly ever maintained in their original form
[Figure: compression (original vs. quantized) and rights protection (watermarking)]

Exact data mining from inexact data

Optimal distance estimation

This work presents optimal estimation of Euclidean distance in the compressed domain. This result cannot be further improved.

$D = 11.5$, $\hat{D} \in [11, 12]$

▪  Our approach is applicable to any orthonormal data compression basis (Fourier, wavelets, Chebyshev, PCA, etc.)
▪  Our method allows up to
•  57% better distance estimation
•  80% less computational effort
•  128:1 compression efficiency

CompressiveMining
▪  Time-series data are customarily compressed in order to:
•  Save storage space
•  Reduce transmission bandwidth
•  Achieve faster processing / data analysis
•  Remove noise

▪  Distance estimation has various data analytics applications
•  Clustering / classification
•  Anomaly detection
•  Similarity search (k-NN)

Now we can do all this very efficiently, directly on the compressed data!

Underlying operation: k-NN search
▪  k-Nearest Neighbor (k-NN) similarity search
▪  Issues that arise:
•  How to compress the data?
•  How to speed up the search?
▪  We have to estimate tight bounds on the distance metric using just the compressed representation
$$D = [d_{ij}] := [\|x_i - x_j\|_2], \qquad NN(i) := \arg\min_{j \neq i} d_{ij}$$
[Figure: example points with NN(1) = 3, NN(2) = 3, NN(3) = 2]

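A brute-force baseline for the definitions above, in plain Python (`math.dist` needs Python 3.8+). It computes every distance exactly, which is the cost that the compressed-domain bounds are meant to avoid; the points are a toy example chosen to reproduce the NN pattern of the figure (0-indexed).

```python
import math


def nearest_neighbors(points):
    """NN(i) = argmin_{j != i} ||x_i - x_j||_2 for every point i."""
    return [min((j for j in range(len(points)) if j != i),
                key=lambda j: math.dist(points[i], points[j]))
            for i in range(len(points))]


def knn(query, database, k):
    """Indices of the k database sequences closest to `query` (Euclidean distance)."""
    dists = sorted((math.dist(query, seq), idx) for idx, seq in enumerate(database))
    return [idx for _, idx in dists[:k]]


print(nearest_neighbors([(0.0, 0.0), (5.0, 0.0), (3.0, 0.0)]))
print(knn([0.0, 0.0, 0.1], [[0, 0, 0], [1, 1, 1], [5, 5, 5], [0.5, 0, 0]], 2))
```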
Similarity search
Objective: compare the query with all sequences in the DB and return the k most similar sequences to the query.
[Figure: query compared with database sequences at distances D = 7.3, 10.2, 11.8, 17, 22]

Speed-up
[Figure: simplified query (keywords 1, 2, 3, …) → simplified DB → upper / lower bounds on distance → candidate superset → verify against original DB → final answer set]

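A minimal filter-and-refine sketch of the pipeline above for 1-NN search, assuming an orthonormal DFT as the compression and keeping only a few coefficients (here the first ones plus their conjugate partners, a hypothetical choice). By Parseval, the distance over the kept coefficients never exceeds the true distance, so candidates are visited in increasing lower-bound order and the scan stops once a bound exceeds the best verified distance. The data are synthetic noisy sinusoids.

```python
import cmath
import math
import random


def dft(x):
    """Orthonormal DFT, so sum_f |X_f - Q_f|^2 equals sum_t (x_t - q_t)^2 (Parseval)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(n)]


def lower_bound(Xc, Qc):
    """Distance restricted to the kept coefficients; never exceeds the true distance."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(Xc, Qc)))


def nn_search(query, db, keep):
    """1-NN: prune with lower bounds on the simplified DB, verify on the original DB."""
    Q = dft(query)
    Qc = [Q[f] for f in keep]
    simplified = [[X[f] for f in keep] for X in map(dft, db)]
    lbs = [lower_bound(Xc, Qc) for Xc in simplified]
    best, best_d, verified = None, math.inf, 0
    for i in sorted(range(len(db)), key=lbs.__getitem__):
        if lbs[i] >= best_d:
            break  # every remaining candidate is provably farther away
        verified += 1
        d = math.dist(query, db[i])  # verify against the original sequence
        if d < best_d:
            best, best_d = i, d
    return best, best_d, verified


def noisy_sine(rng, n):
    shift = rng.uniform(0, 4)
    return [math.sin(2 * math.pi * (t + shift) / n) + 0.3 * rng.gauss(0, 1) for t in range(n)]


rng = random.Random(3)
n = 16
db = [noisy_sine(rng, n) for _ in range(50)]
query = [math.sin(2 * math.pi * t / n) for t in range(n)]
keep = [0, 1, 2, n - 2, n - 1]
best, best_d, verified = nn_search(query, db, keep)
print(best, verified)  # verified is usually well below len(db)
```

Tighter lower bounds prune earlier; this is where the optimal bounds of the following slides pay off.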
Compressing weblog data
Use Euclidean distance to match time series. But how should we compress the data?

▪  The data are highly periodic, so we can use Fourier decomposition.
▪  Instead of using the first Fourier coefficients, we can use the best ones.
[Figure: 1-year query volume for “analytics and optimization” and “consulting services”]

Similarity Search
▪  Approximate the Euclidean distance using DFT

x(n) X(f)
0.4326 65.0630
2.0981 0.4721 + 20.2755i
1.9728 4.6455 - 0.7204i
1.6851 -11.6083 - 5.8807i
2.8316 1.0588 - 3.9980i
1.6407 0.3014 + 3.4492i
0.4515 -1.3769 + 3.3947i
0.4892 0.5637 + 3.4632i
0.1619 -2.7711 + 0.3124i
0.0128 -3.6572 - 1.4000i
0.1739 0.1078 - 6.0011i
0.5518 -1.9892 + 2.8143i
0.0365 -2.6120 + 3.4182i
2.1467 -3.9104 - 0.4136i
2.0103 -1.2362 + 3.5155i
2.1243 -2.2397 - 0.6038i
3.1910 -2.7173
3.2503 -2.2397 + 0.6038i
3.1547 -1.2362 - 3.5155i
2.3223 -3.9104 + 0.4136i
2.6167 -2.6120 - 3.4182i
1.2805 -1.9892 - 2.8143i
1.9949 0.1078 + 6.0011i
3.6184 -3.6572 + 1.4000i
2.9266 -2.7711 - 0.3124i
3.7846 0.5637 - 3.4632i
5.0386 -1.3769 - 3.3947i
3.4449 0.3014 - 3.4492i
2.0039 1.0588 + 3.9980i
2.5751 -11.6083 + 5.8807i
2.1752 4.6455 + 0.7204i
2.8652 0.4721 - 20.2755i

Values annotated on the slide: 11.5517; with the first 5 coefficients + symmetric ones: 7.9234; with the best 5 coefficients + symmetric ones: 11.1624
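A small sketch of why the best coefficients beat the first ones: for a periodic signal whose energy sits at a single frequency, the first few DFT coefficients capture almost nothing, while the highest-energy ones capture everything. Groups are ranked by the energy they carry (a conjugate pair counts twice), which guarantees the best-k choice never retains less energy than the first-k choice; the signal and all helper names are illustrative assumptions.

```python
import cmath
import math


def dft(x):
    """Orthonormal DFT (energy-preserving)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(n)]


def with_partners(freqs, n):
    """Add the conjugate-symmetric partner n - f of every kept frequency f."""
    return set(freqs) | {(n - f) % n for f in freqs}


def group_energy(X, f):
    """Energy of frequency f together with its conjugate partner."""
    return sum(abs(X[g]) ** 2 for g in with_partners([f], len(X)))


def first_k(X, k):
    return with_partners(range(k), len(X))


def best_k(X, k):
    """The k frequency groups (among 0..n//2) carrying the most energy."""
    ranked = sorted(range(len(X) // 2 + 1), key=lambda f: group_energy(X, f), reverse=True)
    return with_partners(ranked[:k], len(X))


def kept_energy(X, freqs):
    return sum(abs(X[f]) ** 2 for f in freqs)


n = 32
x = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]  # periodic signal at frequency 5
X = dft(x)
e_first = kept_energy(X, first_k(X, 3))
e_best = kept_energy(X, best_k(X, 3))
print(e_first, e_best, sum(v * v for v in x))  # ~0, ~16, 16
```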
Objective
▪  Calculate the tightest possible upper/lower bounds using the coefficients with the highest energy

▪  This will result in better pruning of the search space → faster search

x X Q q
-1.7313 0.0000 0.0000 -1.3356
-0.7221 -12.0861 + 4.4812i -2.4756 + 1.7973i 0.6182
-1.2267 7.0708 - 3.3545i -2.8455 - 0.8510i -1.6135
-0.4194 0.8045 + 4.2567i -5.5245 - 2.4452i 1.0842
-0.4194 0.2386 + 2.0592i -2.4342 - 1.9897i 0.7981
-0.0158 -1.6003 + 6.7154i -2.1082 + 1.6303i 0.2667
-0.8231 -2.6539 + 0.8595i -3.6438 - 1.9424i -1.2048
-1.2267 5.8845 + 3.7689i -1.5989 - 0.1514i 0.7654
-0.3185 -2.0182 + 3.9356i 1.6186 - 2.3544i 0.6182
-0.3185 -3.9484 + 0.2234i 15.2057 - 6.0515i -1.4500
-0.1167 -1.2440 + 2.4538i -3.6746 + 0.3413i -0.2892
-0.5203 -1.6268 + 1.0393i -0.0532 + 0.6724i 0.6264
0.0851 -1.0458 + 4.0774i -1.8331 - 0.6000i 0.2503
0.6906 -4.2436 + 0.5988i 0.1642 + 1.3907i -1.0740
0.1861 -6.4020 - 0.9529i -7.3632 - 5.7938i 0.7163
0.7915 -3.3663 - 1.0825i -2.2362 - 1.7287i 0.7654
0.7915 -2.9264 -5.1339 -1.5073
1.2961 -3.3663 + 1.0825i -2.2362 + 1.7287i 1.1987
2.3052 -6.4020 + 0.9529i -7.3632 + 5.7938i 0.7735
1.9016 -4.2436 - 0.5988i 0.1642 - 1.3907i 0.1277
0.1861 -1.0458 - 4.0774i -1.8331 + 0.6000i -1.2865
-0.0158 -1.6268 - 1.0393i -0.0532 - 0.6724i 0.6509
0.5897 -1.2440 - 2.4538i -3.6746 - 0.3413i 0.5283
0.8924 -3.9484 - 0.2234i 15.2057 + 6.0515i -1.0413
0.1861 -2.0182 - 3.9356i 1.6186 + 2.3544i 0.9207
-1.7313 5.8845 - 3.7689i -1.5989 + 0.1514i 1.2150
-0.3185 -2.6539 - 0.8595i -3.6438 + 1.9424i 0.4302
-0.9240 -1.6003 - 6.7154i -2.1082 - 1.6303i -1.2211
-0.5203 0.2386 - 2.0592i -2.4342 + 1.9897i 1.0678
-0.4194 0.8045 - 4.2567i -5.5245 + 2.4452i 1.0351
-0.3185 7.0708 + 3.3545i -2.8455 + 0.8510i -1.4337
2.2043 -12.0861 - 4.4812i -2.4756 - 1.7973i -1.0004

▪  Find the best k coefficients of X (k = 4)
▪  Identify the smallest magnitude among them (→ power): minPower = 6.9035
▪  The remaining powers are less than minPower
▪  We also keep the sum of squares of the remaining powers, e²
▪  Calculate the distance from the k known coefficients
▪  Optimize the distance from the remaining coefficients

||X|| X Q ||Q||
0.0000 0.0000 0.0000 0
12.8901 -12.0861 + 4.4812i -2.4756 + 1.7973i 3.0592
7.8261 7.0708 - 3.3545i -2.8455 - 0.8510i 2.9700
4.3321 0.8045 + 4.2567i -5.5245 - 2.4452i 6.0414
2.0729 0.2386 + 2.0592i -2.4342 - 1.9897i 3.1439
6.9035 -1.6003 + 6.7154i -2.1082 + 1.6303i 2.6650
2.7897 -2.6539 + 0.8595i -3.6438 - 1.9424i 4.1292
6.9880 5.8845 + 3.7689i -1.5989 - 0.1514i 1.6061
4.4229 -2.0182 + 3.9356i 1.6186 - 2.3544i 2.8571
3.9548 -3.9484 + 0.2234i 15.2057 - 6.0515i 16.3656
2.7512 -1.2440 + 2.4538i -3.6746 + 0.3413i 3.6904
1.9304 -1.6268 + 1.0393i -0.0532 + 0.6724i 0.6745
4.2094 -1.0458 + 4.0774i -1.8331 - 0.6000i 1.9288
4.2856 -4.2436 + 0.5988i 0.1642 + 1.3907i 1.4004
6.4725 -6.4020 - 0.9529i -7.3632 - 5.7938i 9.3694
3.5360 -3.3663 - 1.0825i -2.2362 - 1.7287i 2.8265
2.9264 -2.9264 -5.1339 5.1339
3.5360 -3.3663 + 1.0825i -2.2362 + 1.7287i 2.8265
6.4725 -6.4020 + 0.9529i -7.3632 + 5.7938i 9.3694
4.2856 -4.2436 - 0.5988i 0.1642 - 1.3907i 1.4004
4.2094 -1.0458 - 4.0774i -1.8331 + 0.6000i 1.9288
1.9304 -1.6268 - 1.0393i -0.0532 - 0.6724i 0.6745
2.7512 -1.2440 - 2.4538i -3.6746 - 0.3413i 3.6904
3.9548 -3.9484 - 0.2234i 15.2057 + 6.0515i 16.3656
4.4229 -2.0182 - 3.9356i 1.6186 + 2.3544i 2.8571
6.9880 5.8845 - 3.7689i -1.5989 + 0.1514i 1.6061
2.7897 -2.6539 - 0.8595i -3.6438 + 1.9424i 4.1292
6.9035 -1.6003 - 6.7154i -2.1082 - 1.6303i 2.6650
2.0729 0.2386 - 2.0592i -2.4342 + 1.9897i 3.1439
4.3321 0.8045 - 4.2567i -5.5245 + 2.4452i 6.0414
7.8261 7.0708 + 3.3545i -2.8455 + 0.8510i 2.9700
12.8901 -12.0861 - 4.4812i -2.4756 - 1.7973i 3.0592

Waterfilling
[Figure: water-filling of the available energy over the coefficients of Q and X, with levels capped at minPower]

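A minimal sketch of the single-sided case (query Q known in full, X compressed as in the walk-through): keep the best k coefficient groups, remember minPower and the discarded energy e², and bound ‖x − q‖² by water-filling that energy over the discarded coefficients, since |X_f − Q_f|² lies between (|X_f| − |Q_f|)² and (|X_f| + |Q_f|)². This sketch ignores the conjugate-symmetry coupling and is not the authors' double water-filling; `compress_best_k`, `waterfill` and `distance_bounds` are hypothetical names, and the water level is found by bisection rather than in closed form.

```python
import cmath
import math
import random


def dft(x):
    """Orthonormal DFT (Parseval: sum_f |X_f - Q_f|^2 == sum_t (x_t - q_t)^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(n)]


def compress_best_k(X, k):
    """Keep the k coefficient groups (frequency + conjugate partner) of largest magnitude.

    Returns (kept, min_power, e2): kept maps frequency -> coefficient, min_power is the
    smallest kept magnitude (every discarded |X_f| <= min_power), e2 the discarded energy.
    """
    n = len(X)
    groups = [sorted({f, (n - f) % n}) for f in range(n // 2 + 1)]
    top = sorted(groups, key=lambda g: abs(X[g[0]]), reverse=True)[:k]
    kept = {f: X[f] for g in top for f in g}
    min_power = min(abs(X[g[0]]) for g in top)
    e2 = sum(abs(X[f]) ** 2 for f in range(n) if f not in kept)
    return kept, min_power, e2


def waterfill(q, energy, cap):
    """max sum_i b_i * q_i  s.t.  sum_i b_i^2 = energy, 0 <= b_i <= cap  (all q_i >= 0).

    The optimum has the form b_i = min(cap, lam * q_i); lam is found by bisection.
    Assumes feasibility: energy <= len(q) * cap^2.
    """
    pos = [qi for qi in q if qi > 0]
    if energy <= 0 or not pos:
        return 0.0
    if cap * cap * len(pos) <= energy:
        # all useful coefficients saturate; leftover energy sits where q_i = 0 and adds nothing
        return cap * sum(pos)

    def used(lam):
        return sum(min(cap, lam * qi) ** 2 for qi in pos)

    lo, hi = 0.0, 1.0
    while used(hi) < energy:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if used(mid) < energy:
            lo = mid
        else:
            hi = mid
    return sum(min(cap, hi * qi) * qi for qi in pos)


def distance_bounds(Q, kept, min_power, e2):
    """Lower/upper bounds on ||x - q||^2 from the compressed X and the full query spectrum Q."""
    known = sum(abs(kept[f] - Q[f]) ** 2 for f in kept)
    q_rest = [abs(Q[f]) for f in range(len(Q)) if f not in kept]
    eq = sum(a * a for a in q_rest)
    cross = waterfill(q_rest, e2, min_power)  # max of sum |X_f| * |Q_f| over discarded f
    return known + e2 + eq - 2 * cross, known + e2 + eq + 2 * cross


# ||X|| column of the walk-through table (frequencies 0..16, then mirrored)
mags = [0.0, 12.8901, 7.8261, 4.3321, 2.0729, 6.9035, 2.7897, 6.9880, 4.4229,
        3.9548, 2.7512, 1.9304, 4.2094, 4.2856, 6.4725, 3.5360, 2.9264]
slide_kept, slide_min_power, _ = compress_best_k(mags + mags[15:0:-1], 4)
print(slide_min_power)  # 6.9035, as on the slide

# Random check: the true squared distance lies between the bounds
rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(32)]
q = [rng.gauss(0, 1) for _ in range(32)]
kept, min_power, e2 = compress_best_k(dft(x), 4)
lb, ub = distance_bounds(dft(q), kept, min_power, e2)
true_d2 = sum((a - b) ** 2 for a, b in zip(x, q))
```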
▪  When both sequences are compressed, the optimal distance estimation can be solved using a double waterfilling process.
[Figure: waterfilling vs. double waterfilling]

The bounds are optimally tight
Theorem (FV’12, VFK’13): The computation of lower and upper bounds given the aforementioned compression can be solved exactly using double water-filling. The lower and upper bounds are optimally tight; no tighter solutions can be provided.

Sketch of proof
▪  Energy allocation: maximize a sum of products $q_i x_i$ of coefficient magnitudes over the coefficients discarded by x, by q, or by both, subject to
$$x_i \in [0, U_x], \quad q_i \in [0, U_q], \quad \sum_i x_i^2 = e_x, \quad \sum_i q_i^2 = e_q$$
▪  For a fixed split of the energies, $e_x = e_x^{(1)} + (e_x - e_x^{(1)})$ and $e_q = e_q^{(1)} + (e_q - e_q^{(1)})$, over the index sets, the problem separates into uncoupled waterfillings
▪  Cauchy-Schwarz (C-S) handles the coupled term via $\sqrt{e_x - e_x^{(1)}}\,\sqrt{e_q - e_q^{(1)}}$

Double Water-filling algorithm

Intuition: water-fill for the discarded coefficients of the two vectors separately, using the appropriate energy allocation.

Zero-overhead algorithm
Complexity: Θ(n log n)
Experiments
▪  Unica database: IBM web traffic for the year 2010
▪  Marketing/AdWords recommendation
•  Analysis/storage of weblog queries (1 TB of data per month)
•  GBS: scheduling advertising campaigns / pricing
[Figure: example query time series, e.g., “CUSTOMER EXPERIENCE”, “BUSINESS CONSULTING”, “GLENN FINCH IBM”]

Experiments
▪  Our analytic solution is 300x faster than convex optimizers
▪  LB/UB are 20% tighter than the state of the art
▪  The previous (10-20%) improvement in distance estimation can significantly reduce the search space when searching for k-NN: we retrieve 20%-80% fewer sequences than other approaches

Extensions

▪  Cosine similarity (text documents): cos(x,y) = 1 - L2(x,y)²/2 (for unit-norm x, y)
▪  Correlation (financial analysis): corr(x,y) = 1 - L2(x,y)²/2 (for normalized signals x, y)
▪  Dynamic Time Warping (flexible similarity metric)
[Figure: Dynamic Time Warping example (“Halloween”, “Christmas”)]

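A quick numerical check of the two identities, which is why Euclidean-distance bounds carry over to these similarity measures: for unit-norm vectors ‖x − y‖² = 2 − 2⟨x, y⟩. We read "normalized signal" as mean-centered and scaled to unit L2 norm (an assumption), under which the inner product is exactly Pearson correlation. The vectors are arbitrary examples.

```python
import math


def unit(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def normalize(x):
    """Mean-center, then scale to unit L2 norm (our reading of 'normalized signal')."""
    m = sum(x) / len(x)
    return unit([v - m for v in x])


def cosine(x, y):
    return sum(a * b for a, b in zip(x, y)) / (
        math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
cos_from_l2 = 1 - math.dist(unit(x), unit(y)) ** 2 / 2
corr_from_l2 = 1 - math.dist(normalize(x), normalize(y)) ** 2 / 2
print(cosine(x, y), cos_from_l2)
print(pearson(x, y), corr_from_l2)  # both 0.6 up to rounding
```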
Conclusions

▪  Distributed computing
•  Improvements for sparse systems
•  Design method for gossip algorithms

▪  Clock synchronization
•  Fundamental limits
•  Distributed protocols

▪  Data mining
•  Exact mining from inexact data
•  Optimal distance bounds

Other research (1)

1.  Compressed sensing
•  Recursive algorithm for streaming data

2.  Wireless resource allocation
•  MAC, cross-layer design

3.  Video streaming
•  Cross-layer architecture for quality optimization

Other research (2)

4.  Data mining
•  Cluster-preserving compression
•  kNN-preserving watermarking

5.  Mobile Health

Acknowledgements
▪  Collaborators:
•  Anastasios Zouzias, IBM Research
•  Michail Vlachos, IBM Research
•  Martin Vetterli, EPFL
•  P. R. Kumar, Texas A&M University
•  Vivek Borkar, Indian Institute of Technology Bombay

▪  Funding

References

[FGK11] N. Freris, S. Graham and P. R. Kumar, “Fundamental Limits on Synchronizing Clocks over Networks.” IEEE Transactions on Automatic Control, vol. 56, no. 2, pp. 1352-1364, June 2011.

[FZ12] N. Freris and A. Zouzias, “Fast distributed smoothing of relative measurements.” 51st IEEE Conference on Decision and Control (CDC), pp. 1411-1416, Dec. 2012.

[AZ13] A. Zouzias and N. Freris, “Randomized Extended Kaczmarz for Solving Least Squares.” SIAM Journal on Matrix Analysis and Applications, vol. 34(2), pp. 773-793, 2013.

[FBK13] N. Freris, V. Borkar and P. R. Kumar, “Distributed model-based clock synchronization in wireless sensor networks.” Submitted to IEEE/ACM Transactions on Networking, Nov. 2013.

[FZ14] N. Freris and A. Zouzias, “Randomized gossip algorithms for solving Laplacian systems.” (invited paper) Submitted to 53rd IEEE Conference on Decision and Control (CDC), Mar. 2014.

[VFK15] M. Vlachos, N. Freris and A. Kyrillidis, “Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain.” International Journal on Very Large Data Bases (VLDBJ), vol. 24(1), pp. 1-24, 2015.



CPSLab is hiring!
https://wp.nyu.edu/cpslab


Thank you