You are on page 1of 19

US 20110289351A1

( 19) United Sta tes


( 12) Pa tent Applica tion Publica tion ( 10) Pub. N o. : US 2011/0289351 A1
RASHMI et a l. ( 4 3) Pub. D a te: N OV. 24 , 2011
( 54 ) D ISTRIBUTED STORAGE SYSTEM AN D A Publica tion C la s s i? ca tion
METHOD THEREOF ( 51) Int- C l
( 7 5) l t K V RASHMI ( US) N 'h B G06F 11/20 ( 2006. 01)
nven ors : . . ' 1 a r .
52 US. C l. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 14 /632~ 7 14 /E11. 084
SHAH, ( US) ; P. Vij a y KUMAR, ( )
( Us ) ( 57 ) ABSTRAC T
_ Embodiments of the dis clos ure rela te to a dis tributed s tora ge
( 7 3) As s lgnee? IN D IAN IN STITUTE OF s y s tem compris ing of s tora ge nodes s toring s ource da ta
SC IEN C E, Ba nga lore ( IN ) a mongs t them in a coded, a nd ty pica lly , redunda nt ma nner.
The da ta to be s tored in the s tora ge nodes ma y be dis s emi
( 21) Appl. N o. : 13/110, 534 na ted from the s ource a cros s the network in a dis tributed
ma nner. The s y s tem a ls o includes end us ers who recover the
( 22) Filed; Ma y 18, 2011 s ource da ta by connecting to s ubs ets of the s tora ge nodes . A
fa iled s tora ge node ma y be repa ired by downloa ding da ta
( 30) Foreign Applica tion Priority D a ta from s ubs ets of ex is ting nodes . The s tora ge s pa ce req uired in
the nodes , a nd the network ba ndwidth utiliZ ed for repa ir a re
Ma y 21, 2010 ( IN ) . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 21/C HE/2010 minimiZ ed
114 m0
106
End [ us er
106 102
Stora ge node
116
Repla cement
node
Source W
da ta '
I 110 112
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 1 0f 6 US 2011/0289351 A1
HBO
we
End ueer
Q D RB
106 102
Stora ge nq q e
N etwork
Repla cement
node
114
Source
H03 112
Figure 1
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 2 0f 6 US 2011/0289351 A1
Mes s a ge C ons truction o1b All > Encoding
Sy mbols mes s a ge ma trix '
Figure 2
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 3 0f 6 US 2011/0289351 A1
nd Us er
~ ( a te coilector)
node ik
a ta recovery
nodes
Figure 3[ \
D a ta recons truction block a t end- us er
w? 1' , 7 , 7 1
Input m if . s
from k m. j rj U Era s ure Hr _ Source
da ta * " decoding " da ta
recovery .
nodes r I
- Ji'j r 7
+ 311;
Figure 3B
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 4 0f 6 US 2011/0289351 A1
b1
Helper nodes
Figure 4 A
Heiper N ode:
FYI. As s is ting Block
Figure 4 B
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 5 0f 6 US 2011/0289351 A1
Repa ir Block a t repia cement nply de
Input 1; ? Eur [ 1.
- 1
from it; a ir/ilk 151w
Figure 4 C
N ode 1 N a me 2 N ix ie 3 i ? ii? i 1i
K2 gm, 2114 m? a + my % + m?
* y
K m2 m5 m8 332 + m5 ~+ ~ mg
/ R23 { 116 m5 mg + mg 11 m3
m3 mq ? ny nra mi? my iimg
a m m3 m2 " '1' m2 + m3 mii + 2m; + 3mg mi + 3m? + 2mg
: 21. 1% m5 m4 + m5 7 + mg mgi + 21115 ~ ~ 31: 15 115 1+ 31115 + 2ma
m; m3 m; + ms + mg m; < 1 21% + 3mg m; + Ema + 2mg
N ude % Ma de 2 N a cie 3 iimz ie d N a de 5
Figure 5
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 6 0f 6 US 2011/0289351 A1
Kmies W i? i co? mi da ta
a 4 ! a An P1 0 a t. T E. u nu
g
Fiod? - W iiimm da m
Figure 6
US 2011/0289351A1
D ISTRIBUTED STORAGE SYSTEM AN D A
METHOD THEREOF
TEC HN IC AL FIELD
[ 0001] Embodiments of the pres ent dis clos ure rela te to dis
tributed s tora ge networks . More pa rticula rly , the embodi
ments rela te to a n optimiz ed dis tributed s tora ge s y s tem for
repa iring fa iled s tora ge nodes a nd recons tructing the s ource
da ta from the da ta s tored in s ubs ets of nodes .
BAC KGROUN D
[ 0002] In a dis tributed s tora ge s y s tem, informa tion perta in
ing to a da ta ? le is dis pers ed a cros s nodes in a netW ork in s uch
a ma nner tha t a n end- us er ca n retrieve the da ta s tored by
ta pping into a s ubs et of the nodes . A popula r option tha t
reduces netW ork conges tion a nd tha t lea ds to increa s ed res il
iency in the fa ce of node fa ilures , is to employ era s ure coding,
for ex a mple by ca lling upon ma x imum- dis ta nce- s epa ra ble
( MD S) codes s uch a s Reed- Solomon ( RS) codes . Let B be the
tota l ? le s iz e mea s ured in terms of s y mbols over a ? nite ? eld
Fq of s iz e q . W ith RS codes , da ta is s tored a cros s n nodes in the
netW ork in s uch a W a y tha t the entire da ta ca n be recovered by
a da ta collector by connecting to a ny k nodes , a proces s of
da ta recovery tha t W e W ill refer to a s recons truction. Severa l
dis tributed s tora ge s y s tems s uch a s RAID - 6, Ocea nStore a nd
Tota l Reca ll employ s uch a n era s ure- coding option.
[ 0003] Regenera ting a lgorithms or codes in a dis tributed
s tora ge netW ork perform the regenera tion of a fa iled node.
Upon fa ilure of a n individua l node, a s elf- s us ta ining da ta
s tora ge netW ork mus t neces s a rily pos s es s the a bility to regen
era te ( or repa ir) a fa iled node. An obvious mea ns to a ccom
plis h this is to permit the repla cement node to connect to a ny
k nodes , doW nloa d the entire da ta , a nd ex tra ct the da ta tha t
W a s s tored in the fa iled node. But doW nloa ding the entire B
units of da ta in order to recover the da ta s tored in a s ingle node
tha t s tores only a fra ction of the entire da ta ? le is W a s teful, a nd
ra is es the q ues tion a s to W hether there is a better option.
[ 0004 ] C onventiona l RS codes trea t ea ch fra gment s tored in
a node a s a s ingle s y mbol belonging to the ? nite ? eld Fq . It ca n
be s hoW n tha t W hen individua l nodes a re res tricted to perform
only linea r opera tions over Fq , the tota l a mount of da ta doW n
loa d needed to repa ir a fa iled node ca n be no s ma ller tha n B,
the s iz e of the entire ? le. In contra s t, regenera ting codes a re
codes over a vector a lpha bet a nd hence trea t ea ch fra gment a s
being compris ed of a s y mbols over the ? nite ? eld Fq . Linea r
opera tions over R, in this ca s e, permit the tra ns fer of a fra ction
of the da ta s tored a t a pa rticula r node. Apa rt from this neW
pa ra meter 0t, tW o other pa ra meters ( d, [ 3) a re a s s ocia ted W ith
regenera ting codes . Under the de? nition of regenera ting
codes , a fa iled node is permitted to connect to a ? x ed number
d of the rema ining nodes W hile doW nloa ding [ 32a s y mbols
from ea ch node. This proces s is termed a s regenera tion a nd
the tota l a mount d[ 3 of da ta doW nloa ded for repa ir purpos es a s
the repa ir ba ndW idth. Ty pica lly , W ith a regenera ting code, the
a vera ge repa ir ba ndW idth is s ma ll compa red to the s iz e of the
? le B.
[ 0005] It W ill be a s s umed throughout the des cription, tha t
for [ n, k, d] regenera ting code, k, d a re the minimum va lues
under W hich recons truction a nd regenera tion ca n a lW a y s be
gua ra nteed. This res tricts the ra nge of d to
k d n- l ( 1)
[ 0006] The ? rs t ineq ua lity a ris es beca us e if the regenera
tion pa ra meter d W ere les s tha n the da ta - recons truction
N ov. 24 , 2011
pa ra meter k then one could, in fa ct, recons truct da ta by con
necting to a ny d< k nodes ( trea ting the da ta s tored in every
other node a s a function of tha t s tored in thes e d nodes )
thereby contra dicting the minima lity of k. Fina lly , W hile a
regenera ting code over Fq is a s s ocia ted W ith the collection of
pa ra meters
it W ill be found more convenient to rega rd pa ra meters { n, k,
d} a s prima ry a nd { ( x , [ 3, B} a s s econda ry a nd thus W e W ill
ma ke freq uent references in the s eq uel, to a code W ith thes e
pa ra meters a s a n [ n, k, d] regenera ting code ha ving pa ra meter
s et ( 0t, [ 3, B) .
[ 0007 ] Ex a ct vers us Functiona l Regenera tion: In the con
tex t of a regenera ting code, by functiona l regenera tion of a
fa iled node v, W e W ill mea n, repla cement of the fa iled node by
a neW node v' in s uch a W a y tha t folloW ing repla cement, the
res ulting netW ork of n nodes continues to pos s es s the recon
s truction a nd regenera tion properties . In contra s t, by ex a ct
regenera tion mea ns repla cement of a fa iled node vby a
repla cement node v' tha t s tores ex a ctly the s a me da ta a s W a s
s tored in node v. An ex a ct- regenera ting code is a regenera ting
code tha t ha s the ca pa bility of ex a ctly regenera ting ea ch
ins ta nce of a fa iled node. C lea rly W here it is pos s ible, a n
ex a ct- regenera tion code is to be preferred over a functiona l
regenera tion code s ince under functiona l regenera tion; there
is need for the netW ork to inform a ll nodes in the netW ork of
the repla cement, W herea s this is clea rly unneces s a ry under
ex a ct regenera tion.
[ 0008] C ut- Set Bound a nd the Stora ge- Repa ir- Ba ndW idth
Tra de- off ha s a ma j or res ult in the ? eld of regenera ting codes ,
tha t us es the cut- s et bound of netW ork coding to es ta blis h tha t
the pa ra meters of a functiona l regenera ting code mus t neces
s a rily s a tis fy :
1H ( 2)
[ 0009] It is des ira ble to minimiz e both 0t a s W ell a s [ 3 s ince
minimiz ing a res ults in a minimum s tora ge s olution W hile
minimiz ing [ 3 ( for ? x ed d) res ults in a s tora ge s olution tha t
minimiz es repa ir ba ndW idth. It ca n be deduced from ( 2) tha t
it is not pos s ible to minimiz e both 0t a nd [ 3 s imulta neous ly a nd
there is in fa ct a tra deoffbetW een choices of the pa ra meters 0t
a nd [ 3. The tW o ex treme points in this tra deoff a re termed the
minimum s tora ge regenera tion ( MSR) a nd minimum ba nd
W idth regenera tion ( MBR) points res pectively . The pa ra m
eters 0t a nd [ 3 for the MSR point on the tra deoff ca n be
obta ined by ? rs t minimiz ing 0t a nd then minimiz ing [ 3 to
obta in
_ B ( 3)
a _ I,
_ B
B _ k( d - k + 1)
[ 0010] Revers ing the order, lea ds to the MBR point W hich
thus corres ponds to
US 2011/0289351Al
[ 0011] W e de? ne a n optima l [ n, k, d] regenera ting code a s
a code W ith pa ra meters ( 0t, [ 3, B) s a tis fy ing the tW in req uire
ments tha t
l) the pa ra meter s et ( 0t, [ 3, B) a chieves the cut- s et bound W ith
eq ua lity a nd
2) decrea s ing either 0t or [ 3 W ill res ult in a neW pa ra meter s et
tha t viola tes the cut s et bound.
[ 0012] An MSR code is then de? ned a s a n [ n, k, d] regen
era ting code W hos e pa ra meters ( 0t, [ 3, B) s a tis fy ( 3) a nd s imi
la rly , a n MBR code a s one W ith pa ra meters ( 0t, [ 3, B) s a tis fy
ing ( 4 ) . C lea rly , both MSR a nd MBR codes a re optima l
regenera ting codes .
[ 0013] The na ture of the cut- s et bound permits a divide
a nd- conq uer a pproa ch to be us ed in the a pplica tion of optima l
regenera ting codes to la rge ? le s iZ es , thereby s implify ing
s y s tem implementa tion
[ 0014 ] Given a n optima l [ n, k, d] regenera ting code W ith
pa ra meter s et ( 0t, [ 3, B) , a s econd optima l regenera ting code
W ith pa ra meter s et ( 00: 6 0t, [ 3': 6 [ 3, B': 6 B) for a ny pos itive
integer 6 ca n be cons tructed, by dividing the 6 B mes s a ge
s y mbols into 6 groups of B s y mbols ea ch, a nd a pply ing the
( 0t, [ 3, B) code to ea ch group independently . Secondly , a
common fea ture of both MSR a nd MBR regenera ting codes is
tha t in either ca s e, their pa ra meter s et ( 0t, [ 3, B) is s uch tha t
both 0t a nd B a re multiples of [ 3. It folloW s tha t if one ca n
cons truct a n ( optima l) [ n, k, d] MSR or MBR code W ith [ 3: 1,
then one ca n cons truct a n ( optima l) [ n, k, d] MSR or MBR
code for a ny la rger va lue of [ 3.
[ 0015] The problem of minimiZ ing repa ir ba ndW idth for
functiona l repa ir of a fa iled s tora ge node is cons idered, W here
the evolution of the s tora ge netW ork through a s eq uence of
fa ilures a nd regenera tions is repres ented a s a netW ork, W ith a ll
pos s ible da ta collectors repres ented a s s inks . The da ta - recon
s truction req uirement is formula ted a s a multica s t netW ork
coding problem, W ith the netW ork ha ving a n in? nite number
of nodes . The cut- s et a na ly s is of this netW ork lea ds to the
rela tion betW een the pa ra meters of a regenera ting code given
in eq ua tion ( 2) . It ca n be s een tha t there is a tra deoff betW een
the choice of the pa ra meters 0t a nd [ 3 for a ? x ed B a nd this is
termed a s the s tora ge pa ir ba ndW idth tra deoff. This tra deoff is
a chieva ble under functiona l- regenera tion. HoW ever, the cod
ing s chemes s ugges ted a re not ex plicit a nd req uire la rge ? eld
s iz e.
[ 0016] A code s tructure tha t gua ra ntees ex a ct- regenera tion
of j us t the s y s tema tic nodes is provided for the MSR point
W ith pa ra meters [ n: d+ l, k, d>: 2k l] . This code ma kes us e
of interference a lignment, a nd is termed a s the MISER code.
Subs eq uently , it W a s s hoW n tha t for this s et of pa ra meters a nd
the code introduced for ex a ct- regenera tion of only the s y s
tema tic nodes ca n a ls o be us ed to repa ir the non- s y s tema tic
( pa rity ) node fa ilures ex a ctly provided repa ir cons truction
s chemes a re a ppropria tely des igned. Such a n ex plicit repa ir
s cheme is indeed des igned, W hich conta ins a n ex a ct- regener
a ting MSR code for pa ra meter s et [ n: 5, k: 3, d: 4 ] . A proof of
non- a chieva bility of the cut- s et bound on ex a ct- regenera tion
a t the MSR point, for the pa ra meters [ n, k, d< 2k 3] W hen
N ov. 24 , 2011
[ 3: 1 On the other ha nd, the MSR point is s hoW n to be a chiev
a ble in the limiting ca s e of B a pproa ching in? nity ( i. e. , [ 3
a pproa ching in? nity ) .
SUMMARY
[ 0017 ] The s hortcomings of the prior a rt a re overcome a nd
a dditiona l a dva nta ges a re provided through the provis ion of a
method a nd s y s tem a s des cribed in the des cription.
[ 0018] In one embodiment, the dis clos ure provides a dis
tributed da ta s tora ge netW ork. The netW ork compris es of a
plura lity of s tora ge nodes . An encoder is con? gured to obta in
a nd a s s ign pre- determined a mount of da ta to be s tored in ea ch
s tora ge node in the netW ork. An end us er compris ing of da ta
recons truction block con? gured to obta in s ource da ta by con
necting to one of a plura lity of s ubs ets of the s tora ge nodes .
Als o, W hen a s tora ge node fa ils , it is repla ced by a neW node
a nd cons is ting of a repa ir block for retrieving the da ta s tored
da ta s tored in the fa iled node by connecting to one of a
plura lity of s ubs ets of s tora ge nodes . Further, ea ch s tora ge
node cons is ts of a da ta a s s is ting block to compute a nd tra ns
mit da ta to a s s is t in the repa ir of fa iled nodes .
[ 0019] In a nother embodiment, the dis clos ure provides a
method of obta ining a dis tributed s tora ge netW ork. The
method compris es of performing coding opera tion on s ource
da ta of s iZ e B for obta ining a s et of coded da ta , a s s igning ea ch
node W ith a prede? ned a mount of coded da ta ( 0t) , performing
da ta recons truction opera tion a t the da ta recons truction block
a t a n end us er by connecting to a predetermined number of
s tora ge nodes ( k) , performing repa ir opera tion a t repa ir block
of the neW node repla cing a fa iled s tora ge node by retrieving
da ta s tored in the fa iled node by doW nloa ding prede? ned
a mount of coded da ta ( [ 3) from a predetermined number of
s tora ge nodes ( d) ; a nd a s s is ting in the repa ir of fa iled node by
performing a computa tion by the repa ir a s s is ting block of d
s tora ge nodes chos en by s a id neW node a nd tra ns mitting the
res ult of the computa tion to s a id neW node.
BRIEF D ESC RIPTION OF THE
AC C OMPAN YIN G FIGURES
[ 0020] The novel fea tures a nd cha ra cteris tics of the dis clo
s ure a re s et forth in the a ppended cla ims . The embodiments of
the dis clos ure its elf, hoW ever, a s W ell a s a preferred mode of
us e, further obj ectives a nd a dva nta ges thereof, W ill bes t be
unders tood by reference to the folloW ing deta iled des cription
of a n illus tra tive embodiment W hen rea d in conj unction W ith
the a ccompa ny ing ? gures . One or more embodiments a re
noW des cribed, by W a y of ex a mple only , W ith reference to the
a ccompa ny ing ? gures in W hich:
[ 0021] FIG. 1 illus tra tes one embodiment of a dis tributed
s tora ge netW ork W ith s tora ge nodes .
[ 0022] FIG. 2 illus tra tes the block dia gra m of the Encoder.
[ 0023] FIG. 3A illus tra tes da ta recovery by a n end us er
( da ta collector) by connecting to a ny k nodes ,
[ 0024 ] FIG. 3B illus tra tes a block dia gra m of the da ta
recons truction block a t the end- us ers .
[ 0025] FIG. 4 A illus tra tes repa ir of a fa iled node, node f,
W ith the repla cement node f doW nloa ding da ta from a ny d
rema ining nodes .
[ 0026] FIG. 4 B illus tra tes the intera ction betW een the
repla cement node f a nd repa ir a s s is ting block of the helper
node.
[ 0027 ] FIG. 4 C illus tra tes repa irblock a t repla cement node,
a s one embodiment.
US 2011/0289351Al
[ 0028] FIG. 5 illus tra tes a n ex a mple of the tW in code fra me
W ork.
[ 0029] FIG. 6 illus tra tes da ta deploy ment us ing the TW in
code fra mework.
[ 0030] The ? gures depict embodiments of the dis clos ure
for purpos es of illus tra tion only . One s killed in the a rt W ill
rea dily recogniz e from the folloW ing des cription tha t a lterna
tive embodiments of the s tructures a nd methods illus tra ted
herein ma y be employ ed W ithout depa rting from the prin
ciples of the dis clos ure des cribed herein.
D ETAILED D ESC RIPTION
[ 0031] The foregoing ha s broa dly outlined the fea tures a nd
technica l a dva nta ges of the pres ent dis clos ure in order tha t the
deta iled des cription of the dis clos ure tha t folloW s ma y be
better unders tood. Additiona l fea tures a nd a dva nta ges of the
dis clos ure W ill be des cribed hereina fter W hich form the s ub
j ect of the cla ims of the dis clos ure. It s hould be a pprecia ted by
thos e s killed in the a rt tha t the conception a nd s peci? c
embodiment dis clos ed ma y be rea dily utiliZ ed a s a ba s is for
modify ing or des igning other s tructures for ca rry ing out the
s a me purpos es of the pres ent dis clos ure. It s hould a ls o be
rea liZ ed by thos e s killed in the a rt tha t s uch eq uiva lent con
s tructions do not depa rt from the s pirit a nd s cope of the
dis clos ure a s s et forth in the a ppended cla ims . The novel
fea tures W hich a re believed to be cha ra cteris tic of the dis clo
s ure, both a s to its orga niZ a tion a nd method of opera tion,
together W ith further obj ects a nd a dva nta ges W ill be better
unders tood from the following des cription W hen cons idered
in connection W ith the a ccompa ny ing ? gures . It is to be
ex pres s ly unders tood, hoW ever, tha t ea ch of Figures is pro
vided for the purpos e of illus tra tion a nd des cription only a nd
is not intended a s a de? nition of the limits of the pres ent
dis clos ure.
[ 0032] An ex empla ry embodiment of the pres ent dis clos ure
is a s y s tem 100 to ena ble da ta to be s tored in a dis tributed
ma nner compris ing of a s et of n s tora ge nodes 106, ea ch
ha ving a ca pa city of a s y mbols , connected through a netW ork
a nd s toring s y mbols coded from a s et of B s ource s y mbols .
FIG. 1 is a n illus tra tion of the dis tributed s tora ge s y s tem. An
encoder 108 is illus tra ted a s block dia gra m repres enta tion in
the FIG. 2 W hich computes the coded s y mbols to be s tored in
the s tora ge nodes . End- us ers 102, a ls o ca lled da ta collectors ,
ha ving connectivity to a k- s iZ ed s ubs et of the s tora ge nodes
con? gured to decode the s tored coded- s y mbols in order to
recover the da ta . A repla cement 112 for fa iled node 110, us ing
a regenera ting decoder ( repa ir block 114 ) , being a ble to
recover the da ta tha t W a s s tored in the fa iled node by doW n
loa ding [ 3 s y mbols ea ch from a d- s iZ ed s ubs et of the ex is ting
s tora ge nodes .
[ 0033] The encoder 108 is con? gured to encode the s ource
da ta , a nd a s s ign pre- determined a mount of da ta to be s tored in
ea ch s tora ge node 106 in the netW ork 104 . An end us er 102
compris ing of a proces s ing unit con? gured to obta in s ource
da ta by connecting to one of a plura lity of s ubs ets of the
s tora ge nodes S 106. Als o, ea ch s tora ge node 106 cons is ts of
a repa ir block 114 ( RB) or a mea ns for repla cing a ny fa iled
s tora ge node to repa ir a s tora ge node 110 W ith a neW s tora ge
node 112 a nd retrieving the da ta s tored in s a id fa iled node by
connecting to one of a plura lity of s ubs ets of s tora ge nodes .
The s tora ge node 106, 112 a ls o cons is ts of a repa ir a s s is ting
block 116 ( RAB) or mea ns for computing a nd tra ns mitting
da ta to a s s is t in the repa ir of fa iled nodes . The repa ir a s s is ting
block 116 ( RAB) is s hoW n in FIG. 4 B.
N ov. 24 , 2011
[ 0034 ] FIGS. 3A a nd 3B illus tra te the da ta recovery a t the
end us er ( da ta collector) , a nd FIGS. 4 A, 4 8 a nd 4 C illus tra te
repa ir of a fa iled node. The proces s of a da ta collector recov
ering the s ource da ta is ca lled recons truction, a nd the proces s
of repla cement node obta ining the da ta tha t W a s s tored in the
fa iled node is ca lled repa ir or regenera tion. The a mount of
da ta doW nloa d tha t the repla cement node needs to perform is
termed a s the repa ir ba ndW idth, d[ 3.
[ 0035] D a ta D is tribution is us ed to dis tribute da ta , W hich is
genera ted a t a point s ource, a cros s a netW ork to ena ble recon
s truction of da ta in a dis tributed ma nner. A s tora ge node tha t
conta ins the a ppropria te encoded da ta is referred a s a ? lled
node, a nd s tora ge nodes tha t do not conta in a ny da ta a s the
empty nodes .
[ 0036] Initia lly , a ll the s tora ge nodes a re a s s umed to be
empty . Firs t, the da ta s ource encodes the da ta to be s tored in
its neighbouring empty nodes , a nd tra ns mits this da ta to them.
Subs eq uently , the nodes in the neighbourhood of ? lled nodes
doW nloa d da ta from them us ing the regenera tion a lgorithm.
This proces s goes on till a ll the nodes a re ? lled.
[ 0037 ] The tota l a mount of da ta tra ns fer in this proces s is
the minimum pos s ible, a nd eq ua l to the a mount of da ta tra ns
fer req uired if a centra l node W ould ha ve tra ns mitted a ppro
pria te encoded da ta to a ll the s tora ge nodes . HoW ever, the
dis tributed na ture of the da ta dis tribution ha s s evera l a dva n
ta ges . Loa d ba la ncing is a chieved, a nd the tra ? ic is uniform
a cros s the netW ork. The s ource node ca n become una va ila ble
a fter the initia l tra ns mis s ion of coded da ta to the s tora ge
nodes in its neighbourhood. Though the tota l a mount of da ta
tra ns fer is eq ua l, or nea rly eq ua l, in both ca s es , the number of
hops a cros s the netW ork tha t a ny da ta pa cket W ill ha ve to
tra vel in the ca s e of thes e codes W ill be cons idera bly les s er,
thereby reducing the netW ork conges tion.
[ 0038] The da ta s tored in the ith s tora ge node in the s y s tem
is completely determined a s ingle encoding vector, a s com
pa red to a la rge genera tor ma trix in genera l. The encoding
vector s uf? ces for encoding, recons truction, a nd regenera
tion. The s hort length of the encoding vector reduces the
overhea d a s s ocia ted W ith the need for nodes to communica te
their encoding vectors to the D C during recons truction, a nd to
the repla cement node during regenera tion of a fa iled node.
[ 0039] The dis clos ure provides three va ria tions of s uch dis
tributed s tora ge s y s tems , a s des cribed s ubs eq uently . The
choice of the s y s tem to be us ed W ill depend on the tra de- off
betW een the cos ts of s tora ge s pa ce, netW ork ba ndW idth utili
Z a tion a nd connectivity in the netW ork. The ? rs t va ria tion
W orks on the netW ork ba ndW idth utiliZ a tion a nd repa irs a
fa iled node us ing the minimum pos s ible repa ir ba ndW idth.
Als o, this va ria tion minimiZ es the s tora ge s pa ce. The s econd
va ria tion is s uited W hen there is a higher price for s tora ge
s pa ce. It ? rs t minimiZ es the a mount of da ta needed to be
s tored in ea ch node, a nd then reduces the repa ir ba ndW idth to
the minimum pos s ible for this a mount of s tora ge. Thes e tW o
s y s tems ena ble a da ta collector to recover the entire s ource
da ta by doW nloa ding da ta from a ny k nodes , a nd a ls o permit
node repa ir by connecting to a ny d nodes . The ? rs t a nd s econd
va ria tions a re ca lled a s common product- ma trix fra meW ork.
The third va ria tion s imulta neous ly a chieves both the a b s olute
minimum s tora ge, a s W ell a s a bs olute minimum repa ir ba nd
W idth. The third va ria tion is ca lled a s TW in- code fra meW ork.
[ 004 0] All three va ria tions fa ll under a common fra me
W ork, in W hich the s tora ge nodes s tore coded da ta or s y mbols
tha t is ex pres s ed by the ma trix product ex pres s ion IP M. Ea ch
codeW ord in the dis tributed s tora ge code ca n be repres ented
US 2011/0289351 A1
by a n ( nx a ) code ma trix C whos e ith row cti, conta ins a
s y mbols s tored by the ith node. Ea ch code ma trix is a product
C IW M ( 1)
of a n ( n>< d) encoding ma trix II a nd a n ( d>< 0t) mes s a ge ma trix
M. The entries of the ma trix 11) a re ? x ed a priori a nd a re
independent of the mes s a ge s y mbols . The mes s a ge ma trix M
conta ins the B mes s a ge s y mbols , with s ome s y mbols pos s ibly
repea ted. The ith row 1] ) , - of II a re referred a s the encoding
vector of node i a s it is this vector tha t is us ed to encode the
mes s a ge into the form in which it is s tored within the ith node:
C IZ IIIPIM ( 2)
where the s upers cript t is us ed to denote the tra ns pos e of a
ma trix . The choice of T a nd M depends on the va ria tion under
cons idera tion. Ea ch node is a s s ocia ted with a vector ( a row of
1I ) , a nd the encoder multiplies tha t vector with M to obta in
the s y mbols s tored in the node. The s ource s y mbols or da ta
a re from a ? nite ? eld Fq , a nd a ny s tored s y mbol is a linea r
combina tion of thes e s y mbols over Fq . The s iZ e of the ? nite
? eld req uired is of the order of n, a nd no further res trictions
a re impos ed on the ? eld s iZ e.
[ 004 1] The entire da ta is divided into s tripes of s iZ es corre
s ponding to [ 3: 1. Since ea ch s tripe is of minima l s iZ e, the
complex ity of encoding, recons truction a nd regenera tion
opera tions a re cons idera bly lowered, a nd s o a re the buffer
s iZ es req uired a t da ta collectors a nd repla cement nodes . Fur
thermore, the opera tions tha t need to be performed on ea ch
s tripe a re identica l a nd independent, a nd hence ca n be per
formed in pa ra llel e? iciently by a GPU/FPGA/multi- core
proces s or.
[ 004 2] The end- us ers downloa d the da ta s tored in a s ubs et
of k nodes , a nd a ls o downloa d a diges t ( i. e. , a pos s ibly con
dens ed form) of their node vectors , a nd a decoder a t the
end- us er ca n recover the s ource da ta from this informa tion.
[ 004 3] A repla cement node cons is ts of a decoder tha t ? rs t
a s s imila tes the downloa ded da ta , a nd s ubs eq uently decodes
this coded da ta to recover the da ta tha t wa s s tored in the fa iled
node.
[ 004 4 ] Ea ch s tora ge node ca n a ls o a s s is t in the repa ir of a ny
fa iled node. Onbeing conta cted by the node repla cing a fa iled
node, a s tora ge node computes certa in linea r combina tions of
the s y mbols s tored in it, this linea r combina tion being depen
dent on the encoding vector of the fa iled node, a nd pa s s es on
the res ult.
[ 004 5] The following a re the s teps performed in recon
s truction a nd regenera tion for the three va ria tions of the dis
tributed s tora ge s y s tem.
[ 004 6] In one embodiment, the ? rs t va ria tion is one of the
product- ma trix fra mework which ha s a da ta collector ca n
recover the entire s ource by downloa ding da ta from a ny k
s tora ge nodes , a nd a fa iled node ca n be regenera ted by a ny d
ex is ting nodes . The common s tructure of the code ma trices
lea ds to common a rchitectures for both da ta - recons truction
a nd ex a ct- regenera tion.
[ 004 7 ] D a ta - recons truction a mounts to recovering the mes
s a ge ma trix M from the KO. s y mbols obta ined from a n a rbi
tra ry s et of k s tora ge nodes . Let the s et of k nodes to which the
da ta collector connects be denoted a s { i 1, . . . , ik} . The ith node
in the s et pa s s es on the mes s a ge vector to the da ta
collector. The da ta collector thus obta ins the product ma trix
where 11) D C is the s ub ma trix of cons is ting of the k rows [ i] , .
. . , ik} . It then us es the properties of the ma trices a nd M to
N ov. 24 , 2011
recover the mes s a ge. The precis e procedure for recovering M
is a function of the pa rticula r cons truction. Ea ch node in the
network is a s s ocia ted to a dis tinct ( d>< 1) encoding vector 11) , .
In the regenera tion proces s , a rela ted vector ul- of length 0t is
us ed, tha t conta ins a s ubs et of the components of 11) , .
[ 004 8] The repla cement of a fa iled node is illus tra ted in
FIGS. 4 A, 4 B a nd 4 C . To regenera te a fa iled node f, the node
repla cing the fa iled node connects to a n a rbitra ry s ubs et [ h] ,
. . , hd} of d s tora ge nodes which we will refer to a s the d
helper nodes which is s hown in FIG. 4 A. Ea ch helper node
pa s s es on the inner product of a s y mbols s tored in it with pf,
to the repla cement node a s s hown in FIG. 4 B, the helper node
hj pa s s es
[ 004 9] The repla cement node thus obta ins the product
ma trix which is s hown in FIG. 4 C , i. e.
where wrepa l- r is the s ubma trix of 11) cons is ting of the d rows
{ hp . . . , hd} . From this it turns out, a s will be s hown
s ubs eq uently , tha t one ca n recover the des ired s y mbols . Here
a ga in, the precis e procedure is dependent on the pa rticula r
cons truction. An importa nt fea ture of the product- ma trix con
s truction is tha t ea ch of the nodes pa rticipa ting in the
regenera tion of node f, needs only ha s knowledge of the
encoding vector of the fa iled node f a nd not the identity of the
other nodes pa rticipa ting in the regenera tion. This s igni?
ca ntly s impli? es the opera tion of the s y s tem.
[ 0050] In one embodiment the product- ma trix MBR code
cons truction is des cribed. The s peci? c ma ke- up of the encod
ing ma trix 11) a nd the mes s a ge ma trix M tha t res ults in a n [ n,
k, d] MBR code with [ 3: 1 . A nota ble fea ture of the cons truc
tion is tha t it is a pplica ble to a ll fea s ible va lues of [ n, k, d] , i. e. ,
a ll n, k, d s a tis fy ing k d n- l. Since the code is req uired to
be a n MBR code with [ 3: 1, it mus t pos s es s the da ta - recon
s truction a nd ex a ct- regenera tion properties req uired of a
regenera ting code, a nd in a ddition, ha ve pa ra meters { ( x , B}
tha t s a tis fy the eq ua tions for the MBR point. In pa rticula r, the
ex pres s ion for the ? le s iZ e B a t the MBR point ca n be rewrit
ten in the form:
[ 0051] Thus the pa ra meter s et of the des ired [ n, k, d] MBR
code is ( ( Fd, [ 3: 1, B: ( ( k+ 1) /2+ k( d k) ) .
[ 0052] Let S be a ( k>< k) ma trix cons tructed s o tha t the
k( k+ 1) /2 entries in the upper- tria ngula r ha lf of the ma trix a re
? lled up by k( k+ 1) /2 dis tinct mes s a ge s y mbols dra wn from
the s et { ul- ] >Bi: l. The k( k 1) /2 entries in the s trictly lower
tria ngula r portion of the ma trix a re then chos en s o a s to ma ke
the ma trix S a s y mmetric ma trix . The rema ining k ( d- k)
mes s a ge s y mbols a re us ed to ? ll up a s econd ( k>< ( d k) ) ma trix
T. The mes s a ge ma trix M is then de? ned a s the ( d>< d) s y m
metric ma trix given by
s T ( 3)
: iT' 01
US 2011/0289351 A1
[ 0053] The s y mmetry of the ma trix W ill be ex ploited W hen
ena bling node repa ir. N ex t, de? ne the encoding ma trix to be
a ny ( n>< d) ma trix of the form
W here ( I) a nd A a re ( n>< k) a nd ( n>< ( d k) ) ma trices res pectively ,
chos en in s uch a W a y tha t
[ 0054 ] 1) a ny d roW s of P a re linea rly independent,
[ 0055] 2) a ny k roW s of ( I) a re linea rly independent.
[ 0056] MBR Ex a ct- Regenera tion of a ny fa iled node ca n be
a chieved by connecting to a ny d of the ( n- 1) rema ining
nodes .
[ 0057 ] Let Ip fbe the roW of P corres ponding to the fa iled
node f. The d s y mbols s tored in the fa iled node corres pond to
the vector
111W ( 4 )
[ 0058] The repla cement for the fa iled node f connects to a n
a rbitra ry s et { hJ- Ij II, . . . , d} ofd helper nodes . Upon being
conta cted by the repla cement node, the helper node com
putes the inner product
a nd pa s s es on this va lue to the repla cement node. Thus , in the
pres ent cons truction, the vector pf eq ua ls Ipf its elf. The
repla cement node thus obta ins the d s y mbols Prepm- rM Ipf
from the d helper nodes , W here
21
w!
Pra wn: Th2
1; ,
[ 0059] By cons truction, the ( d>< d) ma trix PrePm- r is invert
ible. Thus , the repla cement node recovers M11) fthrough mul
tiplica tion on the left by Pdwpm- r. Since M is s y mmetric,
a nd this is precis ely the da ta previous ly s tored in the fa iled
node.
[ 0060] In MBR D a ta - Recons truction, a ll the B mes s a ge
s y mbols ca n be recovered by connecting to a ny k nodes , i. e. ,
the mes s a ge s y mbols ca n be recovered through linea r opera
tions on the entries of a ny k roW s of the ma trix C .
be the ( kx d) s ub- ma trix of P, corres ponding to the k roW s of
to W hich the da ta collector connects . Thus the da ta collector
ha s a cces s to the s y mbols
[ 0062] By cons truction, C ID D C is a non- s ingula r ma trix .
Hence, by multiply ing the ma trix PD C M on the left by C IJD C T
1, one ca n recover ? rs t T a nd s ubs eq uently , S.
[ 0063] Any ex a ct- regenera ting code ca n be ma de s y s tem
a tic through a non- s ingula r tra ns forma tion of the mes s a ge
s y mbols . In the pres ent ca s e, there is a s impler a pproa ch, in
W hich the ma trix ca n be chos en in s uch a W a y tha t the code is
a utoma tica lly s y s tema tic. W e s imply ma ke the choice:
N ov. 24 , 2011
W here Ik is the ( kx k) identity ma trix , 0 is a ( k>< ( d k) ) Z ero
ma trix , ( I9 a nd Al a re ma trices of s iZ es ( ( n k) >< k) a nd ( ( n k) ><
( d k) ) res pectively , s uch tha t [ 'EID Al] is a C a uchy ma trix .
C lea rly the code is s y s tema tic. It ca n be veri? ed tha t the
ma trix ha s the properties lis ted in MBR Ex a ct- Regenera tion.
[ 0064 ] A node repla cing the fa iled node needs to doW nloa d
the a bs olute minimum a mount of da ta , i. e. , doW nloa ds only
W ha t it s tores : d[ 3: ot. Further, the s tora ge s pa ce req uired a t
ea ch node a ha s the minimum pos s ible va lue for this s etting,
eq ua l to
[ 0065] As in the fra mework, the coded- s y mbols s tored in
the s tora ge nodes ca n be ex pres s ed by the ma trix product
ex pres s ion [ C IJA] M, W herein a ny k roW s of ( I) a nd a ny d roW s
of P a re linea rly independent.
[ 0066] In one embodiment va ria tion 2 W hich is ca lled MSR
is des cribed. As in va ria tion 1, a da ta collector ca n recover the
entire s ource by doW nloa ding da ta from a ny k s tora ge nodes ,
a nd a fa iled node ca n be regenera ted by a ny d ex is ting nodes .
[ 0067 ] Ea ch s tora ge node s tores the a bs olute minimum
a mount of da ta , i. e. , l/k of the tota l s ource da ta . For this
s etting, W e ena ble a ny fa iled node to be repa ired, W hen
d>: 2k 2, W ith [ 3 ha ving minimum pos s ible va lue.
[ 0068] In a nother embodiment the product- ma trix MSR
code cons truction is ex pla ined, identify the s peci? c ma ke- up
of the encoding ma trix a nd the mes s a ge ma trix M tha t res ults
in a n [ n, k, d] MSR code W ith [ 3: 1. The cons truction a pplies
to a ll pa ra meters [ n, k, dZ 2k- 2] . Since the code is req uired to
be a n MSR code W ith [ 3: 1, it mus t pos s es s the da ta - recon
s truction a nd ex a ct- regenera tion properties req uired of a
regenera ting code, a nd in a ddition, the pa ra meters { ( x , B}
s a tis fy eq ua tion the conditions for the MSR point. W e ? rs t
cons truct MSR codes for the ca s e W hen d: 2k 2 a nd s ubs e
q uently ex tend it to the ca s e W hen d> 2k 2.
[ 0069] The cons truction a n MSR code in the product- ma
trix forma t for d: 2k 2 is a s folloW s :
[ 007 0] At the MSR point W ith d: 2k 2 W e ha ve
0t: d k+ 1 Ik- 1, ( 9)
a nd hence
d: 2a . ( 10)
Als o,
[ 007 1] The ( d>< 0t) mes s a ge ma trix M is
S1 ( 12)
S2 1,
US 2011/0289351 A1
W here S1 a nd S2 a re ( otx ) s y mmetric ma trices cons tructed
s uch tha t the 0t( 0t+ 1) /2 entries in the upper- tria ngula r pa rt of
ea ch of the tW o ma trices a re ? lled up by 0t( 0t+ 1) /2 dis tinct
mes s a ge s y mbols . Thus , a ll the B: 0t( 0t+ 1) mes s a ge s y mbols
a re conta ined in the tW o ma trices S 1 a nd S2. The entries in the
s trictly loW er- tria ngula r portion of the tW o ma trices S l a nd S2
a re chos en s o a s to ma ke the ma trices S 1 a nd S2 s y mmetric.
The encoding ma trix is a n ( n>< d) ma trix given by
W here ( I) is a n ( n>< 0t) ma trix a nd A is a n ( n>< n) dia gona l ma trix .
The elements of a re chos en s uch tha t the folloW ing conditions
a re s a tis ? ed:
1) a ny d roW s of II a re linea rly independent,
2) a ny a roW s of ( I) a re linea rly independent a nd
3) the n dia gona l elements of A a re dis tinct.
[ 007 2] The a bove req uirements a re met by choos ing II to be
a Va ndermonde ma trix W ith elements chos en ca refully to
s a tis fy the third condition. Then under our code- cons truction
fra mework, the roW of the ( n>< 0t) product ma trix C IPM,
conta ins a code s y mbols s tored by the ith node.
[ 007 3] In one embodiment, MSR Ex a ct- Regenera tion of
a ny fa iled node ca n be a chieved by connecting to a ny d: 2k 2
of the rema ining ( n 1) nodes .
[ 007 4 ] Let [ q fj x fq ff] be the roW of corres ponding to the
fa iled node. Thus the ot s y mbols s tored in the fa iled node W ere
W W 4 ) ? ! " : f91+ 7 ~/ f921 ( 14 )
[ 007 5] The repla cement for the fa iled node f connects to a n
a rbitra ry s et { hj | j : 1, . . . , d} ofd helper nodes . Upon being
conta cted by the repla cement node, the helper node com
putes the inner product 1p h_ Mq ) /l a nd pa s s es on this va lue to
the repla cement node. Thus , in the pres ent cons truction, the
vector ufeq ua ls Bf. The repla cement node thus obta ins the d
s y mbols lI'mPm- rMBffrom the d helper nodes , W here
2; ,
if
l repa ir = 3
1; ,
[ 007 6] By cons truction, the ( d>< d) ma trix Prepa y is invert
ible. Thus the repla cement node noW ha s a cces s to
Slff
S2 ff
Q f
[ 007 7 ] As S1 a nd S2 a re s y mmetric ma trices , the repla ce
ment node ha s thus a cq uired through tra ns pos ition, both
( pt/? ll a nd ( PZ SZ I. Us ing this it ca n obta in
W hich is precis ely the da ta previous ly s tored in the fa iled
node.
[ 007 8] In one embodiment of the MSR D a ta - Recons truc
tion, a ll the B mes s a ge s y mbols ca n be recovered by connect
ing to a ny knodes , i. e. , the mes s a ge s y mbols ca nbe recovered
through linea r opera tions on the entries of a ny k roW s of the
code ma trix C .
N ov. 24 , 2011
be the ( k>< d) s ubma trix of, conta ining the k roW s of W hich
corres pond to the k nodes to W hich the da ta collector con
nects . Hence the da ta collector obta ins the s y mbols
[ 0080] The da ta collector ca n pos t- multiply this term W ith
C ID D C to obta in
[ ( 1) 0651 + AD C < I>D C S2I< I>Z C = < I>D C S1< I>Z C + AD C < I>D C S2< I>Z C ( 18)
[ 0081] N ex t, let the ma trices P a nd Q be de? ned a s
PID D C S1D D C T,
Q IQ D C SZ Q D C T' ( 19)
[ 0082] As S1 a nd S2 a re s y mmetric, the s a me is true ofthe
ma trices P a nd Q . In terms of P a nd Q , the da ta collector ha s
a cces s to the s y mbols of the ma trix
P+ AD C Q ( 20)
The ( i, j ) , l i, j k, element of this ma trix is
W hile the ( j , i) element is given by
W here eq ua tion ( 22) folloW s from the s y mmetry of P a nd Q .
By cons truction, a ll the 7 , - a re dis tinct a nd hence us ing ( 21)
a nd ( 22) , the da ta collector ca n s olve for the va lues of Pi] , Q U
for a ll i# j .
[ 0084 ] C ons ider ? rs t the ma trix P. Let C ID D C be given by
2'1 ( 23)
gird
[ 0085] All the non- dia gona l elements of P a re knoW n. The
elements in the ith roW ( ex cluding the dia gona l element) a re
given by
( PC - 511% - - - . >1 i+ 1 - - - Q + 1] - ( 24 )
[ 0086] HoW ever, the ma trix to the right is non- s ingula r by
cons truction a nd hence the da ta collector ca n obta in
[ 0087 ]
a cces s to
Selecting the ? rs t a of thes e, the da ta collector ha s
US 2011/0289351A1
( 26)
[ 0088] The ma trix on the left is a ls o non- s ingula r by con
s truction a nd hence the da ta collector ca n recover S1. Simi
la rly , us ing the va lues of the non- dia gona l elements of Q , the
da ta collector ca n recover S2.
[ 0089] In one embodiment, a s s ta ted in code cons truction
fra mework, tha t every ex a ct- regenera ting code ha s a s y s tem
a tic vers ion a nd further, tha t the code could be ma de s y s tem
a tic through a proces s of mes s a ge- s y mbol rema pping. The
folloW ing ma kes this more ex plicit in the contex t of the prod
uct- ma trix MSR code.
[ 0090] Let lPk be the ( k>< d) s ubma trix of 11', conta ining the
k roW s of II corres ponding to the k nodes W hich a re chos en to
be ma de s y s tema tic. The s et of k0. s y mbols s tored in thes e k
nodes a re given by the elements of the ( k>< 0t) ma trix 1PkM. Let
U be a ( k>< 0t) ma trix conta ining the B: k0t s ource s y mbols .
Ma pping
\ IJkMIU, ( 26)
a nd s olve for the entries of M in terms of the s y mbols in U.
This is precis ely the da ta - recons truction proces s tha t ta kes
pla ce W hen a da ta collector connects to the k chos en nodes .
Thus , the va lue of the entries in M ca n be obta ined by folloW
ing the procedure outlined in MSR D a ta - Recons truction.
Then, us e this M to obta in the code C IPM. C lea rly , in this
repres enta tion, the k chos en nodes s tore the s ource s y mbols U
in uncoded form.
[ 0091] In one embodiment, a n ex plicit MSR Product- Ma
trix C odes for dZ Z k- Z is des cribed. An MSR code for
d: 2k 2 ca n be us ed to obta in MSR codes for a ll dZ Z k- Z .
[ 0092] An ex plicit [ n': n+ 1, k': k+ 1, d': d+ 1] ex a ct- regen
era ting code C ' tha t a chieves the cut- s et bound a t the MSR
point ca n be us ed to cons truct a n ex plicit [ n, k, d] ex a ct
regenera ting code C tha t a ls o a chieves the cut- s et bound a t the
MSR point. Furthermore if d': a k'+ b in code C ', d: a k+ b+ ( a
1) in code C . If C ' is linea r, s o is C .
[ 0093] If both codes opera te a t the MSR point, then the
number of mes s a ge s y mbols B', B in the tW o ca s es mus t
s a tis fy
[ 0094 ] Firs t cons truct a n MSR- point- optima l [ n', d'] ex a ct
regenera ting code C in s y s tema tic form W ith the ? rs t k' roW s
conta ining the B' mes s a ge s y mbols . Let C " be the s ubcode of
C ' cons is ting of a ll code ma trices in C ' W hos e top roW is the
a ll- Z ero roW ( i. e. , the ? rs t a of the B' mes s a ge s y mbols a re a ll
Z ero) . C lea rly , the s ubcode C " is of s iZ e q BLOIq B. N ote tha t
C " a ls o pos s es s es the s a me ex a ct- regenera tion a nd da ta - re
cons truction properties a s does the pa rent code C '.
[ 0095] Let the code C noW be formed from s ubcode C " by
puncturing ( i. e. , deleting) the ? rs t roW in ea ch code ma trix of
C " . C lea rly , code C is a ls o of s iZ e q B. W e cla im tha t C is a n [ n,
k, d] ex a ct- regenera ting code. The da ta - recons truction
req uirement req uires tha t the B underly ing mes s a ge s y mbols
be recovera ble from the contents of a ny k roW s of a code
ma trix C in C . But this folloW s s ince, by a ugmenting the
N ov. 24 , 2011
ma trices of code C by pla cing a t the top a n a dditiona l a ll- Z ero
roW , to obta in a code ma trix in C " a nd code C " ha s the
property tha t the da ta ca n be recovered from a ny ( k+ 1) roW s
of ea ch code ma trix in C " . A s imila r a rgument s hoW s tha t code
C a ls o pos s es s es the ex a ct- regenera tion property . C lea rly if C '
is linea r, s o is code C . Fina lly ,
[ 0096] C odes for the ca s e d>: 2k 2 ca n noW be obta ined
s imply by itera ting the a bove procedure multiple times .
[ 0097 ] In one embodiment of the tW in- code fra mework, the
n s tora ge nodes in the dis tributed netW ork a re pa rtitioned
a rbitra rily into tW o ty pes , la belled a s Ty pe 0 a nd Ty pe 1 nodes .
The number of nodes of Ty pe 0 is denoted by no, a nd the
number of nodes of Ty pe 1 is denoted by n1. To a rrive a t the
da ta s tored in nodes of Ty pe 0, the mes s a ge is encoded us ing
a n a rbitra ry linea r block code C O. To a rrive a t the da ta s tored
in Ty pe 1 nodes , the mes s a ge s y mbols a re ? rs t permuted by
mea ns of s imple tra ns pos ition a nd then encoded by a s econd
a rbitra ry linea r block code C 1. The tW o codes do not neces
s a rily ha ve to be dis tinct. Repa ir of a fa iled node of one ty pe
is a ccomplis hed by doW nloa ding da ta from a s ubs et of nodes
of the other ty pe. Further, a da ta - collector ca n recover the
entire mes s a ge by connecting to a s ubs et of the nodes of the
s a me ty pe The mes s a ge to be s tored a cros s the netW ork com
pris es of B s y mbols dra W n from the ? nite ? eld F q . For iIO, 1,
let C , - be a n a rbitra ry [ n, , k] code over Fq ha ving ( k>< ni) gen
era tor ma trix Gi. Let g( l. , Z ) ( 1 l ni) denote the 1th | column of
Gi.
[ 0098] In one embodiment of the tW in- code fra meW ork is
encoding. The mes s a ge is ? rs t divided into fra gments com
pris ing of k2 s y mbols . The code for a s ingle fra gment a nd
cons tructions for the ca s e W hen the s iZ e of the mes s a ge is
la rger ca n be obta ined by conca tena ting multiple s uch fra g
ments . Thus , W e ha ve
BIk2 ( 27 )
[ 0099] Let the k2 s y mbols be a rra nged in the form of a ( k>< k)
ma trix MO, W hich W e W ill ca ll the mes s a ge ma trix . Let
M1 2 M5, ( 28)
W here the s upers cript t denotes the tra ns pos e. For iIO, 1, the
da ta s tored in the nodes of Ty pe i is obta inedus ing code C l. a nd
mes s a ge ma trix M, - a s folloW s . Ea ch node of Ty pe i s tores the
k s y mbols corres ponding to a column of the ( k>< ni) ma trix
MiGi. More s peci? ca lly , node 1 ( 1 l ni) s tores the s y mbols
in its 1th lcolumn, given by
Migw) ( 29)
g0. is encoding vector of this node. Thus , under our encoding
a lgorithm, every node s tores k s y mbols .
[ 0100] In one embodiment, the proces s es of da ta - recon
s truction a nd node- repa ir is des cribed under the TW in- code
fra meW ork, the proces s of node- repa ir is reduced to one of
era s ure decoding of the cons tituent codes . This is des cribed in
deta il in the s ubs eq uent s ubs ections . Here, W e de? ne the term
fea s ible s et of nodes W hich W ill a id in the des cription.
US 2011/0289351A1
[ 0101] A fea s ible s et ofnodes is a s et ofnodes ofthe s a me
ty pe, s a y Ty pe i, i in { 0, 1} , s uch tha t the components of the
codeword of the C ode C , - s tored in thes e nodes permit era s ure
decoding of C ode C i. For ex a mple, in the ca s e when C ode C O
is MD S, a ny s et ofk nodes of Ty pe 0 is a fea s ible s et.
[ 0102] In one embodiment, the da ta - recons truction a nd
node- repa ir for the ca s e when both C O a nd C 1 a re MD S codes
a re des cribed. MD S codes s erve a s a s imple a nd concrete
ex a mple, a nd MD S codes turn in a s trong performa nce in tha t
they s imulta neous ly minimiz e the s tora ge per node a s well a s
the a mount of downloa d req uired to repa ir a fa iled node. The
codes res ulting from the Twin- code fra mework when the
cons tituent codes a re MD S a re ca lled a s Twin- MD S ( dis trib
uted s tora ge) codes .
[ 0103] Minimiz ing Both Stora ge per N ode a nd Repa ir
Ba ndwidth for the ca s e when both C O a nd C l a re MD S codes
over Fq . Under the Twin- code fra mework, da ta - recons truc
tion ca n be a ccomplis hed by downloa ding the da ta s tored in
a ny fea s ible s et of nodes of the s a me ty pe. A fa iled node of one
ty pe ca n be repa ired by downloa ding one s y mbol ea ch from
a ny fea s ible s et of nodes of the other ty pe. In the pres ent ca s e,
the cons tituent codes C O a nd C 1 a re MD S, a nd permit decod
ing from a ny k nodes . Thus , in this ca s e,
[ 0104 ] a da ta - collector ca n recover the entire ? le by con
necting to a ny k nodes of the s a me ty pe a nd downloa ding
the k2 s y mbols s tored in them, a nd
[ 0105] a repla cement node of a certa in ty pe ca n recover
the k s y mbols tha t were s tored in the fa iled node by
downloa ding a s ingle s y mbol over Fq from a ny s ubs et of
k nodes belonging to the other ty pe.
[ 0106] For the da ta - recons truction in the a bove ca s e, let the
da ta - collector connects to the ? rs t k nodes of Ty pe 1. Thus ,
the da ta - collector ga ins a cces s to the k2 s y mbols :
[ 0107 ] The MD S property of code C l gua ra ntees linea r
independence of the corres ponding k columns of G1, a nd
hence the da ta - collector ca n recover the mes s a ge ma trix M1
by inverting the ma trix formed by thes e columns . Eq uiva
lently , the a ction of the da ta - collector ca n a ls o be viewed a s
era s ure decoding of code C 1 in the pres ence of ( nl k) era
s ures . For this , cons ider the k s y mbols in the ? rs t row of the
ma trix in eq ua tion ( 30) . C lea rly , thes e a re k components of
the codeword of code C l corres ponding to the k mes s a ge
s y mbols in the ? rs t row of M1. Thus , ea ch of the rows of
mes s a ge ma trix Ml ca n be decoded independently , a llowing
the da ta - collector to perform s ca la r decoding of the vector
code.
[ 0108] For node- repa ir, let node f of Ty pe 0 fa ils . Thus the
repla cement node needs to recover the k s y mbols
Mogm) ( 31)
[ 0109] Further, let the node connects to the ? rs t k nodes of
Ty pe 1. Under the Twin- code fra mework, the repla cement
node ma kes known its identity to ea ch of the helper nodes . In
turn, the 1th helper node ( 1 l k) | pa s s es the inner product of
the k s y mbols Mlgm? l s tored in it with the encoding vector
ga m of the fa iled node: gta mMlgui) Thus , the repla cement
node ga ins a cces s to the k s y mbols
N ov. 24 , 2011
[ 0110] N ow, s etting
1 A r ( 33)
H _ 510mm
we s ee tha t the repla cement node ha s a cces s to
lit[ g( l, l) - - - g( l, k) ] ( 34 )
[ 0111] It is clea r tha t [ 1t ca n be recovered by era s ure decod
ing of the MD S code C 1 under ( nl k) era s ures . Reca ll tha t by
cons truction, we ha ve MOIMtI, a nd hence
[ 0112] Thus , the vector 11. compris es of precis ely the k s y m
bols req uired by the repla cement node. It follows tha t repa ir
of a node of Ty pe 0 is reduced to era s ure decoding of code C 1.
[ 0113] The Twin- MD S codes a re optima l in the following
s ens e. Under the da ta - recons truction condition tha t ( a ) the
mes s a ge be recovera ble from one or more s ubs ets of k nodes ,
( b) every node mus t belong to a t lea s t one s uch s ubs et, a nd ( c)
ea ch node s tore a n eq ua l a mount of da ta , the s tora ge s pa ce of
B/kIk s y mbols req uired per node is clea rly the minimum
req uired. Furthermore, the downloa d of k s y mbols from the
network to recover the k s y mbols s tored prior to fa ilure, is
a ls o clea rly the minimum pos s ible. Thus , Twin- MD S codes
s imulta neous ly minimiZ e the s tora ge a nd the repa ir- ba nd
width. Us ing twin code fra mework s imulta neous minimiZ a
tion of the two res ources is a chieved.
[ 0114 ] FIG. 5 illus tra tes a numerica l ex a mple, where the
network ha s n: 9 nodes , with no: 4 , nl: 5, k: 3, giving
B: k2: 9. The code pres ented opera tes over the ? nite ? eld F7 .
The mes s a ge ma trices MO a nd Ml( : M O) popula ted by the
mes s a ge s y mbols { ml- PH, a nd the genera tor ma trices GO a nd
G1 a re:
m1m4 m7 1001 10111
[ 0115] It ca n be veri? ed tha t this code permits a da ta - col
lector to recons truct the entire mes s a ge from a ny three nodes
of the s a me ty pe. The ? gure a ls o ex pla ins the repa ir of node 1
of Ty pe 0, which ha s a n encoding vector [ 1 0 0] , with the help
ofnodes 2, 3 a nd 4 ofTy pe l.
[ 0116] In a Twin- code fra mework, a da ta - collector ca n
recover the entire mes s a ge by connecting to a ny fea s ible
s ubs et of nodes of the s a me ty pe. To s ee how da ta - recons truc
tion is a ccomplis hed, let us s uppos e tha t the da ta - collector
connects to a fea s ible s ubs et of Ty pe 1 nodes . Res tricting our
a ttention to nodes of Ty pe 1 a lone, the s y s tem is identica l to a
dis tributed s tora ge s y s tem employ ing only code C 1. Thus , the
da ta - collector ca n recover the entire mes s a ge by a pply ing the
decoding procedure of code C l, k times .
[ 0117 ] In one embodiment of the Twin- code fra mework, a
repla cement node of a certa in ty pe ca n recover the s y mbols
s tored in the fa iled node by downloa ding a s ingle s y mbol
from a ny fea s ible s ubs et of the other ty pe. The repa ir a lgo
rithm reduces the problem of node- repa ir to one of da ta
recons truction, however with a n a mount of downloa d which
is clos e to a s ma ll fra ction of the entire da ta . Let the fa ilure of
node f is of Ty pe 0. The repla cement node des ires to recover
the k s y mbols
US 2011/0289351A1
M .
05m)
Under the Twin- code fra mework, the repla cement node con
nects to a fea s ible s et of Ty pe 1 nodes a nd ma kes known its
identity . In turn, ea ch helper node pa s s es ba ck a n inner prod
uct of the k s y mbols Mlgui) s tored in it with the encoding
vector ga m of the fa iled node: helper node 1 pa s s es the s y mbol
[ 0118] As in the ca s e of MD S codes , s etting
E g gla nMl ( 37 )
the repla cement node ha s a cces s to the s y mbols
{ u guy blnode Z in the pa rticula r fea s ible s et] >l ( 3 8)
[ 0119] FIG. 6 illus tra tes da ta deploy ment us ing the Twin
code fra mework where ( a ) the point s ource tra ns fers coded
da ta to a s ubs et of nodes . ( b) , ( c) Thes e nodes now help in the
repa ir of nodes in their neighbourhood tha t ha ve not
received da ta . The s y s tem eventua lly rea ches a s ta te a s s hown
in ( d) .
[ 0120] Since the s et of helper nodes form a fea s ible s et of
Ty pe 1 nodes , it is clea r tha t the repla cement node ca n recover
pf through era s ure decoding of code C l. Under our fra me
work, MOIM l a nd hence,
[ 0121] The vector [ 1. thus compris es of precis ely the k s y m
bols req uired by the repla cement node. In this ma nner, our
repa ir a lgorithm req uires a downloa d eq ua l to a fra ction 1/k of
the da ta downloa d req uired during the da ta - recons truction
proces s .
[ 0122] The following a re the a dva nta ges :
[ 0123] D uring regenera tion of a fa iled node, the informa
tion pa s s ed on to the repla cement node by a helper node is
only a function of the index of the fa iled node. Thus , it is
independent of the identity of the d 1 other nodes tha t a re
pa rticipa ting in the regenera tion. Once a ga in, this reduces the
communica tion overhea d by req uiring les s informa tion to be
dis s emina ted.
[ 0124 ] Ha ndling Sy s tem D y na mics for a ny n: The s y s tem
works for a ll va lues of n, a nd independent of the va lues of the
pa ra meters k a nd d. This ma kes the code cons tructions pre
s ented here pra ctica lly a ppea ling, a s in a ny rea l- world dis
tributed s tora ge a pplica tion s uch a s a peer- to- peer s tora ge,
cloud s tora ge, etc, ex pa ns ion or s hrinking of the s y s tem s iZ e
is very na tura l. For ex a mple, in peer- to- peer s y s tems , indi
vidua l nodes a re free to come a nd go a t their own will.
[ 0125] The encoding ma trix 1P, for the codes des cribed, ca n
be chos en a s a Va nderrnonde ma trix . Then ea ch node vector
ca n be des cribed by j us t a s ca la r. Moreover with this choice,
the encoding, recons truction, a nd regenera tion opera tions
a re, for the mos t pa rt, identica l to encoding or decoding of
conventiona l Reed- Solomon codes .
[ 0126] D y na mic Ada pta tion to the N etwork Ba ndwidths is
a chieved. The ba s ic form of the propos ed codes a s s umes a
ra ther rigid s tructure, tha t of connecting to ex a ctly k nodes
a nd downloa ding a ll the da ta for recons truction, a nd connect
ing to ex a ctly d nodes for regenera tion. The propos ed codes
N ov. 24 , 2011
ca n be ea s ily enha nced to a highly ? ex ible s etup, where a da ta
collector or a new node repla cing a fa iled node ca n connect to
a va ria ble number of nodes a nd downloa d different a mounts
of da ta from ea ch node to which it connects to. In this wa y , it
ca n ma ke us e of pa ra llel downloa ding when it ha s a higher
connectivity thereby ma king its downloa d much fa s ter. Als o,
when the links a re not homogeneous , it ca n choos e to down
loa ds higher a mounts from the nodes with fa s ter link, a nd
les s er from nodes with s lower link.
[ 0127 ] The C a s e of N on- uniform s tora ge s pa ces , tha t is
when s tora ge ca pa cities of the nodes a re different, ea ch s tripe
ca n be encoded s epa ra tely with a different n ( a nd pos s ibly a
different k a nd d a s well) . A s tripe will s ee only thos e s tora ge
nodes in which the da ta perta ining to tha t s tripe is s tored, thus
s ea mles s ly ha ndling the ca s e of va ria ble 0t.
[ 0128] Recovering intermedia te ( Pa rtia l) D a ta is a chieved.
Since ea ch s tripe ca n be decoded independently , when only a
s ma ll chunk of intermedia te da ta is req uired, only the s tripes
conta ining the des ired da ta ca n be decoded, ins tea d of decod
ing the entire da ta .
[ 0129] Ea s ter D ownloa ds is a chieved. In the dis tributed
s tora ge s y s tem, a da ta collector ( repla cement node) ca n tra de
the network us a ge to increa s e the downloa d s peed. Ins tea d of
s ending downloa d req ues ts for recons tructing ( or, regenera t
ing) a s tripe to s ome k ( or, d) nodes , the da ta collector ca n
s end req ues ts to s ome k+ 1 ( or, d+ 1) nodes . Out of thes e, the k
( or, d) pa rts of the s tripe tha t a rrive ? rs t ca n be us ed for
recons truction ( regenera tion) , thereby decrea s ing the net
time for da ta recons truction ( node regenera tion) .
[ 0130] For Pa ra meters N ot C onforming to Either va ria tion:
Va ria tion 1 req uires the minimum pos s ible s tora ge per,
wherea s the va ria tion 2 point req uires the minimum pos s ible
repa ir ba ndwidth per s tora ge node ( though the a mount of
s tora ge is higher tha n the MSR point) . D ue to the s triped
na ture of our s y s tem, the s y s tem ca n be opera ted a t a ny
intermedia te point between via s tora ge- s pa ce s ha ring.
REFEREN C ES
[ 0131] [ 1] D . A. Pa tters on, G. Gibs on, a nd R. H. Ka tZ , A
C a s e for Redunda nt Arra y s of Inex pens ive D is ks ( RAID ) ,
in Proc. AC M SIGMOD interna tiona l conference on ma n
a gement of da ta , C hica go, USA, June 1988, pp. 109- 116.
[ 0132] [ 2] S. Rhea , P. Ea ton, D . Geels , H. W ea thers poon, B.
Z ha o, a nd J. Kubia towicZ , Pond: The Ocea nStore Proto
ty pe, in Proc. 2nd USEN IX conference on File a nd Stor
a ge Technologies ( FAST) , 2003, pp. 1- 14 .
[ 0133] [ 3] R. Bha gwa n, K. Ta ti, Y. C . C heng, S. Sa va ge, a nd
G. M. Voelker, Tota l Reca ll: Sy s tem Support for Auto
ma ted Ava ila bility Ma na gement, in Proc. 1s t conference
on Sy mpos ium on N etworked Sy s tems D es ign a nd Imple
menta tion ( N SD I) , 2004 .
[ 0134 ] [ 4 ] A. G. D ima kis , P. B. Godfrey , M. W a inwright,
a nd K. Ra mcha ndra n, N etwork C oding for dis tributed
s tora ge s y s tems , in Proc. 26th IEEE Interna tiona l C onfer
ence on C omputer C ommunica tions ( IN FOC OM) ,
Anchora ge, Ma y 2007 , pp. 2000- 2008.
[ 0135] [ 5] Y. W u, A. G. D ima kis , a nd K. Ra mcha ndra n,
D eterminis tic Regenera ting codes for D is tributed Stor
a ge, in Proc. 4 5thAnnua lAllerton C onference on C ontrol,
C omputing, a nd C ommunica tion, Urba na - C ha mpa ign,
September 2007 .
[ 0136] [ 6] N . B. Sha h, K. V. Ra s hmi, P. V. Kuma r, a nd K.
Ra mcha ndra n, Ex plicit C odes Minimiz ing Repa ir Ba nd
US 2011/0289351 A1
W idth for D is tributed Stora ge, in Proc. IEEE Informa tion
Theory W orks hop ( ITW ) , C a iro, Ja nua ry 2010.
[ 0137 ] [ 7 ] C . Sub a nd K. Ra mcha ndra n, Ex a ct- Repa ir
MD S C odes for D is tributed Stora ge Us ing Interference
Alignment, in Proc. IEEE Interna tiona l Sy mpos ium on
Informa tion Theory ( ISIT) , Aus tin, June 2010, pp. 161
165.
[ 0138] [ 8] Y. W u, Ex is tence a nd C ons truction of C a pa city
Achieving N etwork C odes for D is tributed Stora ge, in
IEEE Journa l on Selected Area s in C ommunica tions , vol.
28, no. 2, 2010, pp. 27 7 - 288.
[ 0139] [ 9] A. G. D ima kis , P. B. Godfrey , Y. W u, M. W a in
W right, a nd K. Ra mcha ndra n, N etwork C oding for D is
tributed Stora ge Sy s tems , IEEE Tra ns a ctions on Inforrna
tion Theory , vol. 56, no. 9, pp. 4 539- 4 551, 2010.
[ 014 0] [ 10] A. D uminuco a nd E. Biers a ck, A Pra ctica l
Study of Regenera ting C odes for Peer- to- Peer Ba ckup
Sy s tems , in Proc. 29th IEEE Interna tiona l C onference on
D is tributed C omputing Sy s tems ( IC D C S) , June 2009, pp.
37 6- 384 .
[ 014 1] [ 1 1] Y. W u a ndA. D ima kis , Reducing Repa ir Tra f
? c for Era s ure C oding- Ba s ed Stora ge via Interference
Alignment, in Proc. IEEE Interna tiona l Sy mpos ium on
Informa tion Theory ( ISIT) , Seoul, July 2009, pp. 227 6
2280.
[ 014 2] [ 12] K. V. Ra s hmi, N . B. Sha h, P. V. Kuma r, a nd K.
Ra mcha ndra n, Ex plicit C ons truction of Optima l Ex a ct
Regenera ting C odes for D is tributed Stora ge, in Proc. 4 7 th
Annua l Allerton C onference on C ommunica tion, C ontrol,
a nd C omputing, Urba na - C ha mpa ign, September 2009, pp.
124 3- 124 9.
[ 014 3] [ 13] D . C ullina , A. G. D ima kis , a nd T. Ho, Sea rch
ing for Minimum Stora ge Regenera ting C odes , in Proc.
4 7 th Annua l Allerton C onference on C ommunica tion,
C ontrol, a nd C omputing, Urba na - C ha mpa ign, September
2009.
[ 014 4 ] [ 14 ] N . B. Sha h, K. V. Ra s hmi, P. V. Kuma r, a nd K.
Ra mcha ndra n, Interference Alignment in Regenera ting
C odes for D is tributed Stora ge: N eces s ity a nd C ode C on
s tructions , IEEE Tra ns a ctions on Informa tion Theory ,
s ubmitted for publica tion. [ Online] . Ava ila ble: a rXiv:
1005. 1634 v2 [ cs . IT]
[ 014 5] [ 15] Y. W u, A C ons truction of Sy s tema tic MD S
C odes W ith Minimum Repa ir Ba ndW idth, IEEE Tra ns a c
tions on Informa tion Theory , s ubmitted for publica tion.
[ Online] . Ava ila ble: a rXiv: 0910. 24 86 [ cs . IT]
[ 014 6] [ 16] V. R. C a da mbe, S. A. Ja fa r, a nd H. Ma leki,
D is tributed D a ta Stora ge W ith Minimum Stora ge Regen
era ting C odes iEx a ct a nd Functiona l Repa ir a re As y mp
totica lly Eq ua lly Ef? cient. [ Online] . Ava ila ble: a rXiv:
1004 . 4 299 [ cs . IT]
[ 014 7 ] [ 17 ] C . Suh a nd K. Ra mcha ndra n, On the Ex is tence
of Optima l Ex a ct- Repa ir MD S C odes for D is tributed Stor
a ge. [ Online] . Ava ila ble: a rXiv: 1004 . 4 663 [ cs . IT]
[ 014 8] [ 18] N . B. Sha h, K. V. Ra s hmi, a nd P. V. Kuma r, A
Flex ible C la s s of Regenera ting C odes for D is tributed Stor
a ge, in Proc. IEEE Interna tiona l Sy mpos ium on Inforrna
tion Theory ( ISIT) , Aus tin, June 2010, pp. 194 3- 194 7 .
[ 014 9] [ 19] N . B. Sha h, K. V. Ra s hmi, P. V. Kuma r, a nd K.
Ra mcha ndra n, D is tributed Stora ge C odes W ith Repa ir
by - Tra ns fer a nd N on- a chieva bility of Interior Points on the
Stora ge- Ba ndW idth Tra deoff, IEEE Tra ns a ctions on
Informa tion Theory , s ubmitted for publica tion. [ Online] .
Ava ila ble: a rXiv: 1011. 2361 [ cs . IT]
N ov. 24 , 2011
[ 0150] [ 20] D . S. Berns tein, Ma trix ma thema tics : Theory ,
fa cts , a nd formula s W ith a pplica tion to linea r s y s tems
theory . Princeton Univers ity Pres s , 2005.
[ 0151] [ 21] M. Ma dda h- Ali, A. Mota ha ri, a ndA. Kha nda ni,
C ommunica tion over MIMO X C ha nnels : Interference
Alignment, D ecompos ition, a nd Performa nce Ana ly s is ,
IEEE Tra ns a ctions on Informa tion Theory , vol. 54 , no. 8,
pp. 34 57 - 34 7 0, 2008.
[ 0152] [ 22] V. C a da mbe a nd S. Ja fa r, Interference Align
ment a nd Spa tia l D egrees of Freedom for the k Us er Inter
ference C ha nnel, IEEE Tra ns a ctions on Informa tion
Theory , vol. 54 , no. 8, pp. 34 25- 34 4 1, Augus t 2008.
[ 0153] [ 23] A. G. D ima kis , P. B. Godfrey , Y W u, M. W a in
W right, a nd K. Ra mcha ndra n, N etW ork C oding for D is
tributed Stora ge Sy s tems , IEEE Tra ns , on Inf. Theory , vol.
56, no. 9, pp, 4 539- 4 551, 2010.
[ 0154 ] [ 24 ] K. V. Ra s hmi, N . B. Sha h, a nd P. V. Kuma r,
Optima l Ex a ct- Regenera ting C odes for the MSR a nd
MBR Points via a Product- Ma trix C ons truction, IEEE
Tra ns . on Inf. Theory , s ubmitted for publica tion. [ Online] .
Ava ila ble: a rx iv: 1005. 4 17 8 [ cs . IT]
[ 0155] [ 25] Z . W a ng, A. G. D ima kis , a nd J. Bruck,
Rebuilding for Arra y C odes in D is tributed Stora ge Sy s
tems , in AC TEMT, D ecember 2010.
[ 0156] [ 26] P. Elia s , Error- free C oding, IRE Tra ns a ctions
on Informa tion Theory , vol. 4 , no. 4 , pp. 29- 37 , September
1 954 .
[ 0157 ] The pres ent dis clos ure is not to be limited in terms of
the pa rticula r embodiments des cribed in this a pplica tion,
W hich a re intended a s illus tra tions of va rious a s pects . Ma ny
modi? ca tions a nd va ria tions ca n be ma de W ithout depa rting
from its s pirit a nd s cope, a s W ill be a ppa rent to thos e s killed in
the a rt. Functiona lly eq uiva lent methods a nd devices W ithin
the s cope of the dis clos ure, in a ddition to thos e enumera ted
herein, W ill be a ppa rent to thos e s killed in the a rt from the
foregoing des criptions . Such modi? ca tions a nd va ria tions a re
intended to fa ll W ithin the s cope of the a ppended cla ims . The
pres ent dis clos ure is to be limited only by the terms of the
a ppended cla ims , a long W ith the full s cope of eq uiva lents to
W hich s uch cla ims a re entitled. It is a ls o to be unders tood tha t
the terminology us ed herein is for the purpos e of des cribing
pa rticula r embodiments only , a nd is not intended to be limit
ing.
[ 0158] W ith res pect to the us e of a ny plura l a nd/ or s ingula r
terms herein, thos e ha ving s kill in the a rt ca n tra ns la te from
the plura l to the s ingula r a nd/ or from the s ingula r to the plura l
a s is a ppropria te to the contex t a nd/or a pplica tion. The va ri
ous s ingula r/plura l permuta tions ma y be ex pres s ly s et forth
herein for s a ke of cla rity .
[ 0159] In a ddition, W here fea tures or a s pects of the dis clo
s ure a re des cribed in terms of Ma rkus h groups , thos e s killed
in the a rt W ill recogniZ e tha t the dis clos ure is a ls o thereby
des cribed in terms of a ny individua l member or s ubgroup of
members of the Ma rkus h group.
[ 0160] W hile va rious a s pects a nd embodiments ha ve been
dis clos ed herein, other a s pects a nd embodiments W ill be
a ppa rent to thos e s killed in the a rt. The va rious a s pects a nd
embodiments dis clos ed herein a re for purpos es of illus tra tion
US 2011/0289351A1
a nd a re not intended to be limiting, with the true s cope a nd
s pirit being indica ted by the following cla ims .
1. A dis tributed da ta s tora ge network, compris ing:
a plura lity of s tora ge nodes ;
a n encoder con? gured to obta in a nd a s s ign pre- determined
a mount of da ta to be s tored in ea ch s tora ge node in the
network;
a da ta recons truction block a t ea ch end- us er to obta in
s ource da ta by connecting to one of a plura lity of s ubs ets
of the s tora ge nodes ; a nd
mea ns for repla cing a ny fa iled s tora ge node with a new
node;
a repa ir block in ea ch s tora ge node repla cing a fa iled s tor
a ge node for retrieving the da ta s tored in s a id fa iled node
by connecting to one of a plura lity of s ubs ets of s tora ge
nodes ; a nd
a repa ir a s s is ting block in ea ch s tora ge node for computing
a nd tra ns mitting da ta to a s s is t in the repa ir of fa iled
nodes .
2. The dis tributed da ta s tora ge network a s cla imed in cla im
1, wherein the da ta recons truction block a t a n end- us er per
forms da ta recons truction by s electing a ny one of the prede
termined s ubs ets of s tora ge nodes ; downloa ding the da ta
s tored in the s tora ge nodes in the s ubs et s elected; a nd decod
ing the da ta downloa ded to obta in the s ource da ta .
3. The dis tributed da ta s tora ge network a s cla imed in cla im
1, wherein the repa ir block in the new node repla cing a fa iled
s tora ge node choos es a ny one of the predetermined s ubs ets of
s tora ge nodes ; downloa ds a predetermined a mount of da ta
from ea ch of the s tora ge nodes in the chos en s ubs et; a nd
decodes the downloa ded da ta to obta in the da ta tha t wa s
s tored in the fa iled node.
4 . The dis tributed da ta s tora ge network a s cla imed in cla im
3, wherein the predetermined a mount of da ta downloa ded by
the repa ir block is the res ult of computa tion performed by the
repa ir a s s is ting block of the s tora ge nodes in s a id chos en
s ubs et.
5. A method of obta ining a dis tributed s tora ge network
compris ing a cts of:
performing coding opera tion on s ource da ta ( of s iZ e B) for
obta ining a s et of coded da ta ;
a s s igning ea ch node with a prede? ned a mount of coded
da ta ( or) ;
performing da ta recons truction opera tion a t the recons truc
tion block a t a n end us er by connecting to a predeter
mined number of s tora ge nodes ( k) ; a nd
performing repa ir opera tion a t repa ir block of a new node
repla cing a fa iled s tora ge node by retrieving da ta s tored
in the fa iled node by downloa ding prede? ned a mount of
coded da ta ( [ 3) from a predetermined number of s tora ge
nodes ( d) ; a nd a s s is ting in the repa ir of fa iled node by
performing a computa tion by the repa ir a s s is ting block
of d s tora ge nodes chos en by s a id new node a nd tra ns
mitting res ult of computa tion to s a id new node.
6. The method a s cla imed in cla im 5, wherein the pre
de? ned a mount of da ta downloa ded by the repa ir block of a
new node repla cing a fa iled node is minima l compa red to the
conventiona l techniq ues .
7 . The method a s cla imed in cla im 5, wherein the coded
da ta is obta ined by ta king linea r combina tions of the s ource
da ta with a predetermined s et of encoding vectors .
8. The method a s cla imed in cla im 7 , wherein the coded
da ta ca n be repres ented a s a product of a predetermined
encoding ma trix a nd a predetermined mes s a ge ma trix .
11
N ov. 24 , 2011
9. The method a s cla imed in cla im 8, wherein the mes s a ge
ma trix is obta ined us ing the s ource da ta .
10. The method a s cla imed in cla im 8, wherein coded da ta
s tored in a ny of the s tora ge nodes is repres ented a s a product
of a predetermined vector with the mes s a ge ma trix .
11. The method a s cla imed in cla im 5, wherein the com
puta tion performed a t the repa ir a s s is ting block of a s tora ge
node a s s is ting in the repa ir of a ny fa iled node is repres ented a s
a linea r combina tion of s y mbols s tored in s a id s tora ge node.
12. The method a s cla imed in cla im 11, wherein the linea r
combina tion is ba s ed on the encoding vector a s s ocia ted with
the fa iled node.
13. The method a s cla imed in cla im 5, wherein the coded
da ta is obta ined ba s ed on the rela tion between the tota l
a mount of da ta to be s tored ( B) , a mount of da ta s tored in ea ch
node ( or) , a mount of da ta downloa ded from ea ch node while
retrieving the da ta s tored in a fa iled node ( [ 3) , number s tora ge
nodes to which the end- us ers connect ( k) a nd number of
s tora ge nodes to which a new s tora ge node repla cing a fa iled
node connects ( d) .
14 . The method a s cla imed in cla im 5, wherein the coded
da ta is obta ined a nd a s s igned to ea ch s tora ge node ba s ed on
minimum s tora ge regenera tion ( MSR) , minimum ba ndwidth
regenera tion ( MBR) a nd twin- code techniq ue.
15. The method a s cla imed in cla im 13, wherein the pre
de? ned a mount of coded da ta in ea ch s tora ge node is lea s t for
the MSR techniq ue.
16. The method a s cla imed in cla im 13, wherein the pre
de? ned a mount of da ta downloa ded by a new s tora ge node
repla cing a fa iled node is lea s t for the MBR techniq ue.
17 . The method a s cla imed in cla im 13, wherein the pre
de? ned a mount of coded da ta in ea ch s tora ge node a nd the
prede? ned a mount of da ta downloa ded by a new s tora ge node
repla cing a fa iled node a re lea s t for the twin- code techniq ue.
18. The method a s cla imed in cla im 5, wherein coded da ta
is obta ined us ing MSR techniq ue a nd the da ta recons truction
opera tion a t a ny end- us er compris es a cts of s electing one of
the s ubs ets of a predetermined number of nodes ( k) ; down
loa ding the da ta s tored in the s tora ge nodes in the s ubs et
s elected; a nd decoding the da ta downloa ded to obta in the
s ource da ta .
19. The method a s cla imed in cla im 5, wherein coded da ta
is obta ined us ing the MSR techniq ue a nd retrieving da ta
s tored in a fa iled s tora ge node by a new node repla cing the
fa iled node compris es a cts of choos ing one of the s ubs ets of a
predetermined number of nodes ( d) ; downloa ding a predeter
mined a mount of da ta ( [ 3) from ea ch of the s tora ge nodes in
the chos en s ubs et; a nd decoding the downloa ded da ta to
obta in the da ta tha t wa s s tored in the fa iled node.
20. The method a s cla imed in cla im 14 , wherein coded da ta
is obta ined us ing MBR techniq ue a nd the da ta recons truction
opera tion a t a ny end- us er compris es a cts of s electing one of
the s ubs ets of a predetermined number of nodes ( k) ; down
loa ding the da ta s tored in the s tora ge nodes in the s ubs et
s elected; a nd decoding the da ta downloa ded to obta in the
s ource da ta .
21. The method a s cla imed in cla im 14 , wherein coded da ta
is obta ined us ing MBR techniq ue a nd retrieving da ta s tored in
a fa iled s tora ge node by a new node repla cing the fa iled node
compris es a cts of choos ing one of the s ubs ets of a predeter
mined number of nodes ( d) ; downloa ding a predetermined
a mount of da ta ( [ 3) from ea ch of the s tora ge nodes in the
chos en s ubs et; a nd decoding the downloa ded da ta to obta in
the da ta tha t wa s s tored in s a id fa iled node.
US 2011/0289351A1
22. The method a s cla imed in cla im 5, wherein for tW in
code techniq ue ea ch node in the network is a s s igned W ith one
of a plura lity of ty pes .
23. The method a s cla imed in cla im 5, W herein the da ta
recons truction opera tion is performed by choos ing one ty pe
of nodes to connect to from a plura lity of ty pes ; choos ing one
of a ll pos s ible s ubs ets of a predetermined number of nodes ( k)
cons is ting of nodes of the chos en ty pe; doW nloa ding the da ta
s tored in the s tora ge nodes in the chos en s ub s et; a nd decoding
the da ta doW nloa ded to obta in the s ource da ta .
24 . The method a s cla imed in cla im 5, W herein da ta to be
s tored in the neW node repla cing a fa iled s tora ge node is
obta ined by choos ing one ty pe of nodes to connect to from a
plura lity of ty pes , s a id ty pe being dis tinct from the ty pe of the
repla cement node; choos ing one of a ll pos s ible s ubs ets of a
predetermined number of nodes ( k) cons is ting of nodes of the
chos en ty pe; doW nloa ding a predetermined a mount of da ta
( [ 3) from ea ch of the s tora ge nodes in the chos en s ubs et; a nd
decoding the doW nloa ded da ta .
25. The method a s cla imed in cla im 5, W herein the coded
da ta s tored in the s tora ge nodes a s s ocia ted to ea ch one of a
plura lity of ty pes is obta ined us ing a ny a rbitra ry linea r error
correcting code.
26. The method a s cla imed in cla im 25, W herein the da ta
recons truction opera tion is performed by choos ing one ty pe
of nodes to connect to from a plura lity of ty pes ; choos ing one
N ov. 24 , 2011
of a ll pos s ible s ubs ets cons is ting of nodes of the chos en ty pe,
the s a id pos s ible s ubs ets being dependent on the linea r error
correcting code a s s ocia ted W ith the chos en ty pe; doW nloa d
ing the da ta s tored in the s tora ge nodes in the chos en s ubs et;
a nd decoding the da ta doW nloa ded us ing the decoding a lgo
rithm of the s a id linea r error- correcting code.
27 . The method a s cla imed in cla im 25, W herein da ta to be
s tored in the neW node repla cing a fa iled s tora ge node is
obta ined by choos ing one ty pe of nodes to connect to from a
plura lity of ty pes , the s a id ty pe being dis tinct from the ty pe of
the repla cement node; choos ing one of a ll pos s ible s ubs ets
cons is ting of nodes of the chos en ty pe, the s a id pos s ible
s ubs ets being dependent on the linea r error- correcting code
a s s ocia ted W ith the chos en ty pe; doW nloa ding a predeter
mined a mount of da ta ( [ 3) from ea ch of the s tora ge nodes in
the chos en s ubs et; a nd us ing the decoding a lgorithm of the
linea r error- correcting code a s s ocia ted W ith the chos en ty pe.
28. The method a s cla imed in cla im 5, W herein the da ta to
be s tored in the s tora ge nodes ca n be deploy ed to the nodes in
a dis tributed ma nner.
29. The method a s cla imed in 28, W herein a node tha t ha s
not received the da ta a cts a s the repla cement of a fa iled node,
a nd doW nloa ds the des ired da ta from one of a plura lity of
s ubs ets of nodes W hich ha ve received the da ta .
* * * * *

You might also like