Final Database

Distributed Databases and
Peer-to-Peer Databases
Supervised By:
Dr. Bassam Hasan Hammo
Ayman Fetyani
Mohammed Musaddaq
Mohammed Ghanem
Distributed Databases
Architecture
Peer-to-Peer
What are Peer-to-Peer systems?
• All nodes are both clients and servers
• Multiple connections between nodes
• Notion of “equality”, hence “peers”
• Pure P2P = zero servers
Potential benefits of P2P systems
• Scale up to very large numbers of peers
• Dynamic self-organization
• Load balancing
• Parallel processing
• High availability through massive replication
A generic P2P system
• A user at a peer may access sharable
data at remote peers
[Figure: three peers, each running P2P software over its own private and sharable data]
Distributed database system
(DDBS)
• Distribution transparency
– Global schema
• Common data descriptions
• Distributed data placement
– Centralized control through global catalog
– Distributed functions
• Schema mapping
• Query processing
• Transaction management
• Access control
• Etc.
[Figure: queries and transactions submitted to a distributed database system spanning Site 1, Site 2 and Site 3, with local DBMS1 and DBMS2]
DBS categories
• The various DBS categories can be characterized along the following three dimensions:
• (i) Distribution, ranging from a centralized architecture (no distribution) (D0), through client-server distribution (moderate distribution) (D1), to peer-to-peer or full-scale distribution (D2);
• (ii) Autonomy, ranging from zero autonomy (tight integration) (A0), through semi-autonomy (loose integration) (A1), to full autonomy or total isolation (A2);
• (iii) Heterogeneity, ranging from zero heterogeneity (homogeneous systems) (H0) to full heterogeneity (H1).
DBS categories(cont.)
• (A0,D1,H0): distributed database systems, i.e., no heterogeneity and no autonomy.
• (A1,D0,H1): heterogeneous federated database systems.
• (A1,D1,H1): distributed heterogeneous federated database systems.
• (A2,D1,H1): multi-databases.
• (A2,D2,H1): distributed multi-databases.
• These systems belong to the class of MDBSs: they are highly decentralized, heterogeneous and totally independent of one another, in the sense that each component DBS is unaware of the existence of the other DBSs and their databases.
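The triples listed above can be read as a lookup table. The following is a minimal sketch of that reading, not part of the original slides; the names `DBS_CATEGORIES` and `classify` are hypothetical, and triples that the slides do not name simply return `None`.

```python
# Category names for the (autonomy, distribution, heterogeneity) triples
# that the slides list explicitly. Any other triple is left unnamed.
DBS_CATEGORIES = {
    ("A0", "D1", "H0"): "distributed database system",
    ("A1", "D0", "H1"): "heterogeneous federated database system",
    ("A1", "D1", "H1"): "distributed heterogeneous federated database system",
    ("A2", "D1", "H1"): "multi-database",
    ("A2", "D2", "H1"): "distributed multi-database",
}


def classify(autonomy, distribution, heterogeneity):
    """Return the category name for a triple, or None if it is not listed."""
    return DBS_CATEGORIES.get((autonomy, distribution, heterogeneity))
```

For example, `classify("A2", "D2", "H1")` returns the distributed multi-database label, while a centralized homogeneous system such as `("A0", "D0", "H0")` is not in the list and yields `None`.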
P2P vs DDBS
Data Integration Architecture
(a) FDBS/MDBS (b) PDBS

MDBS and PDBS
Simplified system architecture of (a) an MDBS and (b) a PDBS

P2P network topologies
• Unstructured systems
– No predefined topology for linking the peers to each other. Query routing is done by flooding.
– e.g. SETI@home, Gnutella
• Structured (DHT) systems
– There is a specific topology for peer linking.
– DHTs support a routing mechanism that allows users to efficiently find the peer responsible for a key.
– e.g. CAN, Chord, Pastry, P-Grid
• Super-peer (hybrid) systems
– Some peers are responsible for indexing and locating the shared data.
– e.g. Napster, Edutella
P2P unstructured network
• High autonomy (a peer only needs to know a neighbor to log in)
• Searching by flooding the network
– general, inefficient
• High fault tolerance with replication
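Flooding can be sketched as a breadth-first broadcast bounded by a time-to-live (TTL). This is a minimal illustration under assumptions that are not in the slides: the function name `flood_search`, the TTL mechanism, and the five-peer topology are all hypothetical. It counts every forwarded message, which is where the inefficiency comes from.

```python
from collections import deque


def flood_search(neighbors, data, start, key, ttl):
    """Flood a query from `start`; return (peers holding `key`, messages sent)."""
    hits = set()
    visited = {start}
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if key in data.get(peer, ()):
            hits.add(peer)
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the query
        for nxt in neighbors.get(peer, ()):
            messages += 1  # each forward costs a message, even a redundant one
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, remaining - 1))
    return hits, messages


# Hypothetical overlay: each peer only knows its direct neighbors.
neighbors = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p4"],
    "p3": ["p1", "p4"],
    "p4": ["p2", "p3", "p5"],
    "p5": ["p4"],
}
# The same item is replicated on two peers.
data = {"p4": {"song.mp3"}, "p5": {"song.mp3"}}
```

With a TTL of 2 the query from `p1` reaches only the replica on `p4`; raising the TTL to 3 also finds the one on `p5`, at the cost of more messages. Because the item is replicated, the search still succeeds if one of the two holders leaves.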

P2P structured network
• Efficient exact-match search
– O(log n) for put(key,value), get(key)
• Limited autonomy since a peer is responsible for a range of keys
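The idea that each peer is responsible for a range of keys can be sketched with a hash ring. This is a simplified illustration, not any particular DHT: the class `ToyDHT` and the helper `ring_id` are hypothetical, and the sketch uses a global sorted view of the ring, so it shows key responsibility only and not the hop-by-hop routing that gives real DHTs their O(log n) lookups.

```python
import bisect
import hashlib


def ring_id(name, bits=16):
    """Hash a peer name or a key onto a ring of 2**bits positions."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Each peer stores the keys that hash between its predecessor and itself."""

    def __init__(self, peer_names):
        # Sort peers by ring position (position collisions are ignored here).
        self.ring = sorted((ring_id(p), p) for p in peer_names)
        self.ids = [pid for pid, _ in self.ring]
        self.store = {p: {} for p in peer_names}

    def responsible_peer(self, key):
        # First peer at or after the key's position, wrapping around the ring.
        i = bisect.bisect_left(self.ids, ring_id(key)) % len(self.ring)
        return self.ring[i][1]

    def put(self, key, value):
        self.store[self.responsible_peer(key)][key] = value

    def get(self, key):
        return self.store[self.responsible_peer(key)].get(key)
```

A peer cannot choose which keys it stores, since the hash function decides; this is the limited autonomy noted above. Only exact keys can be looked up, because hashing destroys any ordering or similarity between keys.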
Super-peer network
• Super-peers can perform complex functions (meta-data management, indexing, access control, etc.)
– Efficiency and QoS
– Restricted autonomy
– A super-peer (SP) is a single point of failure => use several
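The last point can be sketched as an index replicated on several super-peers, with clients falling back to the next one when a lookup fails. This is a minimal sketch under assumptions not in the slides: the class `SuperPeer`, the helpers `publish_everywhere` and `locate`, and the `alive` flag used to simulate a crash are all hypothetical.

```python
class SuperPeer:
    """Keeps an index from keyword to the peers that share matching data."""

    def __init__(self, name):
        self.name = name
        self.alive = True  # set to False to simulate a failure
        self.index = {}    # keyword -> set of peer names

    def publish(self, peer, keywords):
        for kw in keywords:
            self.index.setdefault(kw, set()).add(peer)

    def lookup(self, keyword):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return set(self.index.get(keyword, ()))


def publish_everywhere(super_peers, peer, keywords):
    """Replicate the index entry on every super-peer."""
    for sp in super_peers:
        sp.publish(peer, keywords)


def locate(super_peers, keyword):
    """Ask super-peers in order and return the first answer obtained."""
    for sp in super_peers:
        try:
            return sp.lookup(keyword)
        except ConnectionError:
            continue  # this super-peer failed, try the next one
    raise ConnectionError("no super-peer reachable")
```

With two super-peers, a lookup still succeeds after the first one fails; with a single super-peer, the same failure makes all shared data unlocatable.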
Requirements for P2P data
management (1)
• Autonomy of peers
– Peers should be able to join or leave at any time and to control their data with respect to other (trusted) peers
• Query expressiveness
– Key lookup, keyword search, SQL-like queries
• Efficiency
– Efficient use of bandwidth, computing power, storage
Requirements for P2P data
management (2)
• Quality of service (QoS)
– User-perceived efficiency: completeness of results, response time, data
consistency, …
• Fault-tolerance
– Efficiency and QoS despite failures
• Security
– Data access control in the context of very open systems
P2P systems comparison
Requirements          Unstructured  DHT   Super-peer
Autonomy              high          low   average
Query expressiveness  high          low   high
Efficiency            low           high  high
QoS                   low           high  high
Fault-tolerance       high          high  low
Security              low           low   high

Data management in P2P systems
• Current research focuses on
– Decentralized schema mappings
• PeerDB: unstructured network, keyword search only
– Extending DHT for complex querying
• PIER: exact-match and join queries
– Query reformulation
• Edutella: super-peer, RDF-based schemas
• Piazza: graph of pair-wise schema mappings
– Replication
• generally limited to static read-only files
• P-Grid addresses updates in structured networks
Data management in APPA (Atlas
P2P Architecture)
• Objectives
– Scalability, availability and performance
• Main features
– Network-independent architecture
– Layered, service-based architecture
– Replication with semantics-based reconciliation
– Decentralized schema management
– Schema-based query support and optimization
– Peer data caching
• Prototype on JXTA
– Network-independent P2P services
Network independent APPA
Different APPA architectures
Schema management in APPA
• Takes advantage of the collaborative
nature of the applications
– Peers that wish to cooperate agree on a
Common Schema Description (CSD)
• Given 2 CSD relation definitions, an
example of peer mapping at peer p is:
–
• Peer mappings stored as P2P data
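One way to picture a peer mapping that is itself stored as P2P data is as a small record published under a key. The sketch below is purely hypothetical: the relation and attribute names, the `PeerMapping` record, the key format, and the in-memory `shared_store` are invented for illustration and are not taken from APPA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerMapping:
    peer: str             # peer that owns the local relation
    local_relation: str   # hypothetical local relation name
    local_attrs: tuple    # attributes of the local relation
    csd_relations: tuple  # (name, attrs) pairs of the CSD relations it maps to


def mapping_key(peer, local_relation):
    """Key under which the mapping is published as ordinary shared data."""
    return f"mapping:{peer}:{local_relation}"


def uncovered_attrs(mapping):
    """Local attributes that appear in none of the mapped CSD relations."""
    covered = {a for _, attrs in mapping.csd_relations for a in attrs}
    return [a for a in mapping.local_attrs if a not in covered]


shared_store = {}  # stand-in for whatever storage the P2P network offers

m = PeerMapping(
    peer="p",
    local_relation="patient_visits",
    local_attrs=("patient_id", "doctor", "date"),
    csd_relations=(
        ("csd_patient", ("patient_id", "name")),
        ("csd_visit", ("patient_id", "doctor", "date")),
    ),
)
shared_store[mapping_key(m.peer, m.local_relation)] = m
```

Any peer that knows the key can then fetch the mapping and check, for instance with `uncovered_attrs`, whether every local attribute is expressible in terms of the common schema.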
Validation
• Implementation on JXTA
– Some support for APPA’s basic services
– Network-independent
• Experimentation on large clusters and
grid [Grid 5000]
• Simulation to scale up to very large P2P
systems
– Using SimJava and Brite
Thank you
We hope this presentation has been informative. Thank you for listening.
