process gives every file a unique absolute identifier (SHA-1 collisions are considered nearly impossible) that can be verified quickly. Unlike with URLs, you can be certain that a CHK reference will point to the exact file intended. CHKs also permit identical copies of a file inserted by different people to be automatically coalesced because every user will calculate the same key for the file.
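The coalescing property can be illustrated with a minimal sketch. It assumes the key is simply the SHA-1 digest of the raw file bytes; the function names are hypothetical, and the actual Freenet key encoding is not shown here.

```python
import hashlib


def content_hash_key(data: bytes) -> str:
    """Illustrative CHK: hex SHA-1 digest of the file contents (assumed form)."""
    return hashlib.sha1(data).hexdigest()


def verify(data: bytes, key: str) -> bool:
    """Check that retrieved data matches the key it was requested under."""
    return content_hash_key(data) == key


# Two users insert byte-identical copies independently.
alice_copy = b"the same document"
bob_copy = b"the same document"

alice_key = content_hash_key(alice_copy)
bob_key = content_hash_key(bob_copy)

# Both derive the same key, so the copies coalesce into one entry.
same_key = alice_key == bob_key

# Any change to the contents yields data that no longer verifies.
tampered_ok = verify(b"the same document, altered", alice_key)
```

Because the key depends only on the contents, a reader who holds the key can recompute the digest and reject anything that does not match.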
Signed-subspace keys. The signed-subspace key (SSK) sets up a personal namespace that anyone can read but only its owner can write to. You could create a subspace for an archive on the Vietnam War, for example, by first generating a random public-private key pair to identify it. To add a file you first choose a short text description, such as politics/us/pentagon-papers. You would then calculate the file’s SSK by hashing the public half of the subspace key and the descriptive string independently before concatenating them and hashing again. Signing the file with the private half of the key provides an integrity check as every node that handles a signed-subspace file verifies its signature before accepting it.
To retrieve a file from a subspace, you need only the subspace’s public key (perhaps stored on your “keyring”) and the descriptive string, from which you can recreate the SSK. Adding or updating a file, on the other hand, requires the private key in order to generate a valid signature. SSKs thus facilitate trust by guaranteeing that the same pseudonymous person created all files in the subspace, even though the subspace is not tied to a real-world identity. For example, you can use SSKs to send out a newsletter, to publish a Web site, or (operated in reverse) to receive e-mail.
Typically, SSKs are used to store indirect files containing pointers to CHKs rather than to store data files directly. Indirect files combine the human readability and publisher authentication of SSKs with the fast verification of CHKs. They also allow data to be updated while preserving referential integrity. To perform an update, the data’s owner first inserts a new version of the data, which will get a new CHK because the file contents are different. The owner then updates the SSK to point to the new version. The new version will be available by the original SSK, and the old version will remain accessible by the old CHK. Indirect files can also be used to split large files into multiple pieces by inserting each part under a separate CHK and creating an indirect file that points to all the parts.
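The SSK derivation and the indirect-file update described above can be sketched as follows. This is a toy model under stated assumptions: the public key is a placeholder byte string, the concatenation order (key hash first, then description hash) is a guess because the text does not specify it, the in-memory `store` dictionary stands in for the network, and signing is omitted because it would need a public-key library. All function names are hypothetical.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def chk(data: bytes) -> str:
    """Illustrative content-hash key: SHA-1 of the contents."""
    return sha1(data).hex()


def ssk(public_key: bytes, description: str) -> str:
    """Hash the key and the description separately, concatenate, hash again.

    The concatenation order is an assumption of this sketch.
    """
    key_hash = sha1(public_key)
    description_hash = sha1(description.encode("utf-8"))
    return sha1(key_hash + description_hash).hex()


store = {}  # stand-in for the network: key -> stored bytes


def publish(public_key: bytes, description: str, data: bytes) -> str:
    """Insert data under its CHK, then point the SSK at that CHK."""
    content_key = chk(data)
    store[content_key] = data
    # The indirect file here is just the CHK string (signature omitted).
    store[ssk(public_key, description)] = content_key.encode("ascii")
    return content_key


def fetch(public_key: bytes, description: str) -> bytes:
    """Follow the indirect file from the SSK to the current CHK."""
    pointer = store[ssk(public_key, description)].decode("ascii")
    return store[pointer]


public_key = b"placeholder-public-key"  # hypothetical key material
name = "politics/us/pentagon-papers"

old_chk = publish(public_key, name, b"version 1")
new_chk = publish(public_key, name, b"version 2")

current = fetch(public_key, name)  # resolves to the newest version
previous = store[old_chk]          # old version still reachable by its CHK
```

The reader needs only the public key and the descriptive string to recompute the SSK, while the old CHK keeps resolving to the earlier contents after the update.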
42
JANUARY • FEBRUARY 2002
http://computer.org/internet/
IEEE Internet Computing
Peer-to-Peer Networking
Related Work in P2P
The best-known systems similar to Freenet are Napster (http://www.napster.com/) and Gnutella (http://gnutella.wego.com/), which both implement large-scale pooling of disk space among individual users. The major difference is that whereas Freenet provides a file-storage service, these systems provide a file-sharing service. That is, participants make files available to others but do not push files to other nodes for storage. This architecture means that data is not persistent in the network; rather, files are available only when their originators (or subsequent requesters) are online. Another difference is that neither system attempts to provide anonymity. Gnutella is also extremely inefficient, broadcasting thousands of messages per request.
Freenet more closely resembles the Eternity service, which was described in a proposal for a highly survivable network for permanently and anonymously archiving information. [1] However, the proposal lacked specifics on how to efficiently implement such a service. Free Haven is an Eternity-like anonymous P2P publication system that uses trust mechanisms and file trading to enforce server accountability and user anonymity. [2] Unfortunately, it can take a very long time, even days, to retrieve files from it.
Security Issues
Several recently developed P2P file-storage systems focus on efficient data location rather than privacy and security against malicious participants. Systems such as OceanStore, [3] Cooperative File System (CFS), [4] and PAST [5] are all based on routing models in which each node is assigned a fixed identity and maintains some knowledge of nodes whose identities vary in specified ways from its own. These systems deterministically place data on nodes that most closely match the data’s globally unique identifier (GUID). A user can thus locate data by progressively visiting nodes whose identities match more and more bits of the desired GUID. The main advantage to these systems is that they can provide strong guarantees that data will be located within certain time bounds (generally logarithmic) if it exists. Thus, they can provide better handling of issues like storage management.
The main disadvantage of these systems relative to Freenet is that they are more difficult to secure against attack. It is easier for a malicious node to manipulate its identity to gain responsibility for a particular piece of data and suppress it. Links and routing are also more visible and deterministically structured, making it easier to trace messages and harder to route around malicious nodes that sabotage requests (for example, by pretending data could not be found). PAST, as currently constituted, also requires users to trust external smart cards.
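The idea of visiting nodes whose identities match more and more bits of the GUID can be illustrated with a toy sketch. It is not the routing algorithm of OceanStore, CFS, or PAST; the 4-bit identifiers, the neighbor table, and the function names are all invented for illustration.

```python
from typing import Dict, List


def shared_prefix_bits(a: int, b: int, width: int) -> int:
    """Count the leading bits (out of `width`) on which a and b agree."""
    return width - (a ^ b).bit_length()


def route(start: int, guid: int, neighbors: Dict[int, List[int]],
          width: int) -> List[int]:
    """Greedily hop to the neighbor sharing the longest prefix with guid.

    Stops when no neighbor improves on the current node.
    """
    path = [start]
    current = start
    while True:
        best = max(neighbors[current],
                   key=lambda n: shared_prefix_bits(n, guid, width),
                   default=current)
        if (shared_prefix_bits(best, guid, width)
                <= shared_prefix_bits(current, guid, width)):
            return path
        current = best
        path.append(current)


# Hypothetical 4-bit network: each node knows a few other nodes.
neighbors = {
    0b0000: [0b1000],
    0b1000: [0b1100, 0b0000],
    0b1100: [0b1110, 0b1000],
    0b1110: [0b1100],
}

path = route(0b0000, 0b1111, neighbors, width=4)
# path visits 0b0000, 0b1000, 0b1100, 0b1110: one more matching bit per hop
```

Each hop fixes at least one more leading bit, which is where the logarithmic bound mentioned above comes from. The same determinism is what makes the structure easier to trace: every step of the path is predictable from the GUID.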
Privacy Issues
Systems focusing on privacy for information consumers include browser proxy ser-
continued on p.43