03 en Distributed DBMS

BASICS OF DISTRIBUTED SYSTEMS
DISTRIBUTED DATABASES
INDEXES
 Indexes are intended to speed up data access
 Selected fields are indexed
 Each index has its own structure and file
 Each index entry stores a search key and a pointer to the actual record (search-key → pointer)
 In most cases the index file is smaller than the data file
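
The search-key/pointer idea can be sketched in a few lines of Python. This is only an illustration, not how any particular DBMS stores its indexes: the `data_file` list, its records and the `lookup` helper are hypothetical, and a "pointer" is simply a position in the list.

```python
import bisect

# Hypothetical data file: a list of records; a "pointer" is the record's position.
data_file = [
    {"id": 17, "name": "Ona"},
    {"id": 3, "name": "Jonas"},
    {"id": 42, "name": "Rasa"},
    {"id": 8, "name": "Petras"},
]

# The index holds only (search-key, pointer) pairs, kept sorted by key.
index = sorted((rec["id"], pos) for pos, rec in enumerate(data_file))
keys = [key for key, _ in index]


def lookup(search_key):
    """Binary-search the index, then follow the pointer into the data file."""
    i = bisect.bisect_left(keys, search_key)
    if i < len(keys) and keys[i] == search_key:
        _, pointer = index[i]
        return data_file[pointer]
    return None
```

The index is smaller than the data file because it keeps only the key and the pointer, and the lookup inspects a handful of index entries instead of every record.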

STRUCTURES FOR INDEX STORAGE
 How the index will be stored depends on the DBMS

 Typical structures for index storage:
 Binary trees
 B-trees
 Clusters
 Hash functions
 ...
INDEX TYPES
 Indexes can be:
 PRIMARY index is intended as a primary means to uniquely identify any row in the
table, so unlike UNIQUE it should not be used on any columns which allow NULL values
 UNIQUE refers to an index where all rows of the index must be unique
 INDEX refers to a normal non-unique index. Non-distinct values for the index are
allowed, so the index may contain rows with identical values in all columns of the index
 FULLTEXT indexes are only useful for full-text searches done with the MATCH() /
AGAINST() clause, unlike the three above, which are typically implemented internally
using B-trees
 ...
CORRECT WAY TO ADD INDEXES
 In the initial phase, do not add indexes
 Dedicate time to DB optimization, as it is as important as design, logic, etc.
 Create indexes based on the queries you use in the system
 One table might have several indexes
 An index can be composed of multiple fields
 A multi-field index can also be used for a condition on a single field
EXPLAIN COMMAND
 It is written before a SELECT (or other) query
 It returns information on how records are selected
 The KEY field shows which indexes were used
 The EXTRA field provides additional information on data selection
 Good example
 EXTRA = index
 TYPE = const
 Bad example
 EXTRA = filesort
 TYPE = ALL
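
A rough way to see the effect described above is to compare the plan of the same query before and after an index is created. The sketch below assumes SQLite, whose `EXPLAIN QUERY PLAN` output has a different layout from the KEY / TYPE / EXTRA columns on this slide, so it shows only the idea. The `cars` table, its data and the `plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER PRIMARY KEY, user_id INTEGER, model TEXT)")
conn.executemany(
    "INSERT INTO cars (user_id, model) VALUES (?, ?)",
    [(i % 10, f"model-{i}") for i in range(100)],
)

QUERY = "SELECT * FROM cars WHERE user_id = 3"


def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


plan_before = plan(QUERY)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_cars_user ON cars (user_id)")
plan_after = plan(QUERY)   # the new index is named in the plan

print(plan_before)
print(plan_after)
```

The first plan corresponds to the "bad example" (every row is read), the second to the "good example" (the index is used); the exact wording of the plan text depends on the SQLite version.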
ANALYZE COMMAND
 Works similarly to the EXPLAIN command
 The search type is shown
 The number of inspected rows is shown
 The proportion of filtered-out rows is shown
 In some DBMSs, execution times are shown as well

SITUATION
 We have fields: User, Car, CarColor.
 Search for all cars that belong to a specific user
 Search for all cars of a specific color
 Search for all cars that belong to a specific user and are of a specific color
 Simple indexes
 user_id
 color_id
 Combined indexes
 user_id,color_id
 As the user_id index and the user_id,color_id index begin with the same field, the combined index can be used as a user_id index as well
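
The situation above can be checked with a small experiment. The sketch assumes SQLite via Python's `sqlite3` module; other DBMSs may plan these queries differently. The `cars` table, the `model` column, the generated data and the `plan` helper are hypothetical. Only the combined index is created, to see which of the three searches it can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "color_id INTEGER, model TEXT)"
)
conn.executemany(
    "INSERT INTO cars (user_id, color_id, model) VALUES (?, ?, ?)",
    [(i % 10, i % 4, f"model-{i}") for i in range(200)],
)
# Only the combined index exists; there is no separate user_id index.
conn.execute("CREATE INDEX idx_user_color ON cars (user_id, color_id)")


def plan(where):
    """Return SQLite's plan text for SELECT * FROM cars WHERE <where>."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM cars WHERE " + where
    ).fetchall()
    return " | ".join(row[3] for row in rows)


by_user = plan("user_id = 3")                         # served by the combined index
by_user_and_color = plan("user_id = 3 AND color_id = 1")  # served by the combined index
by_color = plan("color_id = 1")                       # index does not start with color_id

for text in (by_user, by_user_and_color, by_color):
    print(text)
```

In this setup the search by color alone falls back to scanning the table, which is why a separate color_id index is still listed among the simple indexes.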
DATABASE DISTRIBUTION
 Centralized database
 All data is stored in one place (server)
 All DBMS functions are executed on this server
 Parallel database
 Data is stored on different servers located in the same LAN
 Each server has its own DBMS
 Distributed database
 Data is stored on different servers in different locations, connected over the internet
 Each server has its own DBMS
THE NEED FOR PARALLEL, DISTRIBUTED DB
 Specifics of the enterprise or its departments, access limitation
 Data is used by a specific department only, which does not need all the data
 Increased data availability and integrity
 Different parts of the database are independent, so a failure in one part does not crash the whole database
 Faster data selection
 Some queries might run faster if they are divided into small, parallel queries
DIFFERENT DB ARCHITECTURES
 Shared memory
 Shared disk
 Nothing is shared

REQUIREMENTS FOR DISTRIBUTED DB
 Local independence (nodes and their work are independent)
 Independence from a central node
 Uninterrupted operation (integrity and availability)
 Location independence (it does not matter where the node is)
 Fragmentation independence
 Replication independence (data replication is not noticed by the end user)
 Processing of distributed queries (automated and optimized data distribution and execution)
 Distributed transaction management
 Hardware and OS independence
 DBMS independence
PARTITIONS
 A DBMS might support partitions rather than database distribution
 Within the same database, the data of one table is divided into multiple tables
 Vertical partition
 Horizontal partition
TRANSPARENCY
 Data is distributed across different locations
 The user writes queries without knowing where the data is located
HOMOGENEOUS AND HETEROGENEOUS DB
 Homogeneous DB
 The same DBMS runs on each of the servers
 These DBMSs can exchange data directly
 Heterogeneous DB
 Different DBMSs run on the servers
 The databases might even be of different types (relational, file, etc.)
 A special interface is needed for communication between the DBMSs
QUERY EXECUTION
 Direct
 The query is executed on the server that receives it
 Indirect
 The query is executed on a different server, to which the first server forwards it
[Figure: indirect and direct query execution]
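
Direct and indirect execution can be pictured with a toy model. Everything in the sketch is hypothetical (the `Node` class, the node names and the table contents); it only shows the routing idea and does not correspond to any real DBMS protocol.

```python
class Node:
    """A hypothetical database server holding some tables locally."""

    def __init__(self, name, tables):
        self.name = name
        self.tables = tables  # {table_name: list of rows}
        self.peers = []       # other nodes this server can reach

    def execute(self, table):
        """Return (rows, path), where path lists the nodes the query visited."""
        if table in self.tables:
            # Direct execution: the receiving server answers by itself.
            return self.tables[table], [self.name]
        for peer in self.peers:
            if table in peer.tables:
                # Indirect execution: this server transfers the query onward.
                rows, path = peer.execute(table)
                return rows, [self.name] + path
        raise LookupError(f"no known node stores table {table!r}")


vilnius = Node("vilnius", {"users": [("u1",), ("u2",)]})
kaunas = Node("kaunas", {"cars": [("c1",)]})
vilnius.peers.append(kaunas)

direct = vilnius.execute("users")   # answered locally
indirect = vilnius.execute("cars")  # forwarded to the node that stores the table
```

The caller uses the same `execute` call in both cases and does not need to know which node stores the table.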
REPLICATION AND FRAGMENTATION
 Replication – usage of multiple copies of the same data

 Data loss prevention
 Load balancing
 Slower data update (multiple copies have to be synchronized)
 Fragmentation – data division into separate parts

 Load balancing
 A mechanism for tracking where each piece of data is stored is needed
REPLICATION
 No replicas
 Full replication
 Partial replication
 Replication is useful in situations where data selection is more common than data update
 Replication can be done synchronously (all copies are updated at the same time) or asynchronously (the copies are updated from time to time)
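
The two replication modes can be compared with a toy key-value store. The `ReplicatedStore` class and its methods are hypothetical and leave out networking and failure handling; the sketch only shows when the replicas see an update.

```python
from collections import deque


class ReplicatedStore:
    """Toy key-value store with one primary copy and several replicas."""

    def __init__(self, replica_count, synchronous):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.synchronous = synchronous
        self.pending = deque()  # updates not yet applied to the replicas

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: every copy is updated before the write returns.
            for replica in self.replicas:
                replica[key] = value
        else:
            # Asynchronous: the update is queued and shipped later.
            self.pending.append((key, value))

    def sync(self):
        """Apply queued updates to all replicas (the 'from time to time' step)."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value

    def read(self, replica_no, key):
        """Reads can be served by any replica, which spreads the load."""
        return self.replicas[replica_no].get(key)


sync_store = ReplicatedStore(replica_count=2, synchronous=True)
sync_store.write("x", 1)

async_store = ReplicatedStore(replica_count=2, synchronous=False)
async_store.write("x", 1)
stale = async_store.read(0, "x")  # the replica has not seen the update yet
async_store.sync()
fresh = async_store.read(0, "x")
```

The synchronous store pays for every write with one update per copy, while the asynchronous store returns sooner but can serve outdated reads until the next synchronization.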
CORRECTNESS OF THE FRAGMENTATION
 Completeness
 Every record belongs to one of the fragments
 Uniqueness
 The intersection of any two fragments is an empty set
 Each record should belong to one fragment only
 Reconstruction
 It should be possible to get all data back from all the fragments
 It should be possible to get data from a specific fragment (whether or not it is known in which fragment the data is stored)
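
The three criteria above (completeness, uniqueness, and getting the full data back) can be expressed as simple set checks. The sketch assumes rows are distinct, hashable tuples and that fragments are plain lists; the `check_fragmentation` function and the sample data are hypothetical.

```python
def check_fragmentation(records, fragments):
    """Check completeness, uniqueness and reconstruction for a set of fragments.

    records   -- the original rows (hashable, assumed distinct)
    fragments -- list of lists of rows
    """
    original = set(records)
    in_fragments = [row for fragment in fragments for row in fragment]

    # Completeness: every original record appears in some fragment.
    complete = original <= set(in_fragments)
    # Uniqueness: no record is stored more than once across the fragments.
    unique = len(in_fragments) == len(set(in_fragments))
    # Reconstruction: the union of the fragments gives back exactly the original.
    reconstructible = set(in_fragments) == original
    return complete, unique, reconstructible


rows = [(1, "red"), (2, "blue"), (3, "red"), (4, "green")]
good = [[rows[0], rows[2]], [rows[1], rows[3]]]
overlapping = [[rows[0], rows[1]], [rows[1], rows[2], rows[3]]]
lossy = [[rows[0]], [rows[1], rows[2]]]
```

Here `good` passes all three checks, `overlapping` stores one row twice, and `lossy` drops a row, so it is neither complete nor reconstructible.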
HORIZONTAL FRAGMENTATION
 Data is divided into fragments by rows
TYPES OF HORIZONTAL FRAGMENTATION
 Round-robin
 Records are distributed to the fragments in sequence, one after another
 To find a record, all DB fragments have to be searched
 Hash value
 Similar to round-robin, but simpler to use for complex fields
 Range
 Suitable for searching by range
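
The three placement rules can be sketched as small functions. The function names, the sample rows and the use of `zlib.crc32` as the hash are assumptions made for the illustration; a real DBMS chooses its own hash function and range boundaries.

```python
import zlib


def round_robin(rows, n):
    """Row i goes to fragment i mod n; placement ignores the row's content."""
    fragments = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        fragments[i % n].append(row)
    return fragments


def hash_fragment_no(key, n):
    """Map any key (even a long string) to a fragment number."""
    # crc32 is used because it is stable across runs, unlike Python's hash() on str.
    return zlib.crc32(str(key).encode("utf-8")) % n


def by_hash(rows, n, key):
    fragments = [[] for _ in range(n)]
    for row in rows:
        fragments[hash_fragment_no(key(row), n)].append(row)
    return fragments


def by_range(rows, upper_bounds, key):
    """Fragment j holds keys below upper_bounds[j]; the last fragment takes the rest."""
    fragments = [[] for _ in range(len(upper_bounds) + 1)]
    for row in rows:
        j = sum(1 for bound in upper_bounds if key(row) >= bound)
        fragments[j].append(row)
    return fragments


rows = [(1, "Ona"), (2, "Jonas"), (3, "Rasa"), (4, "Petras"), (5, "Ieva"), (6, "Tomas")]
rr = round_robin(rows, 3)
hashed = by_hash(rows, 3, key=lambda r: r[1])
ranged = by_range(rows, [3, 5], key=lambda r: r[0])
```

With round-robin placement a lookup has to visit every fragment, with hash placement the fragment number can be recomputed from the key, and with range placement a range query only needs the fragments whose bounds overlap the requested range.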
VERTICAL FRAGMENTATION
 Data is divided into fragments by fields
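
Division by fields can be sketched as follows. The sketch assumes that every fragment keeps the key field so that the full rows can be rebuilt later; this is an assumption of the example, and the `split_vertically` and `rejoin` helpers and the sample `cars` rows are hypothetical.

```python
def split_vertically(rows, key, field_groups):
    """Split dict rows into one fragment per field group; each keeps the key."""
    return [
        [{name: row[name] for name in (key, *fields)} for row in rows]
        for fields in field_groups
    ]


def rejoin(fragments, key):
    """Rebuild full rows by joining the fragments on the key field."""
    merged = {}
    for fragment in fragments:
        for part in fragment:
            merged.setdefault(part[key], {}).update(part)
    return list(merged.values())


cars = [
    {"id": 1, "user_id": 10, "color_id": 2, "model": "A"},
    {"id": 2, "user_id": 11, "color_id": 3, "model": "B"},
]
owner_part, detail_part = split_vertically(
    cars, key="id", field_groups=[("user_id",), ("color_id", "model")]
)
restored = rejoin([owner_part, detail_part], key="id")
```

A query that needs only the owner of a car can read `owner_part` alone, while a query that needs all fields has to join the fragments on `id`.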
MIXED FRAGMENTATION
QUESTIONS?
NOW, BY E-MAIL, …
