
BASICS OF DISTRIBUTED SYSTEMS

DISTRIBUTED DATABASES
INDEXES

 Indexes are intended to speed up data access


 Selected fields are indexed
 Each index has its own structure and file
 Each index stores pointers to the real records, as (search-key, pointer) pairs

 In most cases the index file is smaller than the data file
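
The (search-key, pointer) idea above can be sketched as follows. This is only an illustration, not how any particular DBMS stores its indexes: the data file is a plain Python list, the "pointer" is a list position, and the names data_file, index and lookup are hypothetical.

```python
import bisect

# Hypothetical data file: records in arrival order; the list position acts as the pointer.
data_file = [
    {"id": 17, "name": "Ona"},
    {"id": 3, "name": "Jonas"},
    {"id": 42, "name": "Rasa"},
    {"id": 8, "name": "Tomas"},
]

# The index: (search-key, pointer) pairs kept sorted by search key.
index = sorted((rec["id"], pos) for pos, rec in enumerate(data_file))
keys = [key for key, _ in index]


def lookup(search_key):
    """Binary-search the index, then follow the pointer to the real record."""
    i = bisect.bisect_left(keys, search_key)
    if i < len(keys) and keys[i] == search_key:
        _, pointer = index[i]
        return data_file[pointer]
    return None


found = lookup(42)    # located through the index, without scanning data_file
missing = lookup(5)   # no such key in the index
```

The index holds only keys and positions, which is why it tends to be smaller than the data it points to.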


STRUCTURES FOR INDEX STORAGE

 How the index will be stored depends on the DBMS


 Typical structures for index storage:
 Binary trees
 B-trees
 Clusters
 Hash functions
 ...
INDEX TYPES

 Index types:
 PRIMARY index is intended as a primary means to uniquely identify any row in the
table, so unlike UNIQUE it should not be used on any columns which allow NULL values
 UNIQUE refers to an index where all rows of the index must be unique
 INDEX refers to a normal non-unique index. Non-distinct values for the index are
allowed, so the index may contain rows with identical values in all columns of the index
 FULLTEXT indexes are only useful for full-text searches done with the MATCH() /
AGAINST() clause, unlike the three above, which are typically implemented internally
using B-trees
 ...
CORRECT WAY TO ADD INDEXES

 In the initial phase, do not add indexes


 Dedicate time to DB optimization; it is as important as
design, logic, etc.
 Create indexes based on queries you use in the system
 One table might have several indexes
 Indexes can be composed of multiple fields
 A multi-field index can also be used for a condition on a single field (its leading field)
EXPLAIN COMMAND

 It is written before a SELECT (or other) query


 It returns information on how records are selected
 The KEY field shows which index was used
 The EXTRA field provides additional information on data selection
 Good example
 EXTRA = index
 TYPE = const
 Bad example
 EXTRA = filesort
 TYPE = ALL
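
A minimal sketch of the same workflow, using SQLite's EXPLAIN QUERY PLAN through Python's standard sqlite3 module as a stand-in. Its output format differs from the KEY / EXTRA / TYPE fields described above, and the orders table and idx_orders_customer index are hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, float(i)) for i in range(1000)],
)


def plan(sql):
    """Return the textual detail column of EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT * FROM orders WHERE customer_id = 7"

plan_before = plan(query)  # no usable index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # the new index can now be used for the lookup

print(plan_before)
print(plan_after)
```

Comparing the plan before and after adding an index is the practical way to confirm that a query actually benefits from it.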
ANALYZE COMMAND

 Works similarly to the EXPLAIN command


 The search type is shown
 The number of inspected rows is presented
 The proportion of filtered-out rows is presented

 In some DBMSs execution times are presented as well


SITUATION

 We have fields: User, Car, CarColor.


 Search for all cars that belong to a specific user
 Search for all cars of a specific color
 Search for all cars that belong to a specific user and are of a specific color
 Simple indexes
 user_id
 color_id
 Combined indexes
 user_id,color_id
 Because the combined index (user_id, color_id) begins with user_id, it can
be used as a user_id index as well
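
The prefix behaviour of the combined index can be checked with a small experiment. This is a sketch using SQLite through Python's sqlite3 module, not the DBMS the slides assume; the cars table layout, the model column and the idx_user_color name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "color_id INTEGER, model TEXT)"
)
conn.executemany(
    "INSERT INTO cars (user_id, color_id, model) VALUES (?, ?, ?)",
    [(i % 50, i % 7, f"model-{i}") for i in range(1000)],
)
# One combined index on (user_id, color_id); no separate single-field indexes.
conn.execute("CREATE INDEX idx_user_color ON cars (user_id, color_id)")


def plan(sql):
    """Return the textual detail column of EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Condition on the leading field only: the combined index is usable.
by_user = plan("SELECT * FROM cars WHERE user_id = 3")
# Condition on both fields: the combined index is usable.
by_user_and_color = plan("SELECT * FROM cars WHERE user_id = 3 AND color_id = 2")
# Condition on the second field only: the index prefix does not match,
# so a table scan is expected here.
by_color = plan("SELECT * FROM cars WHERE color_id = 2")

for label, text in [("user", by_user), ("user+color", by_user_and_color), ("color", by_color)]:
    print(label, "->", text)
```

In this setup the color-only search would still need its own color_id index, which is why the slide lists it separately.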
DATABASE DISTRIBUTION

 Centralized database
 All data is stored in one place (server)
 All DBMS functions are executed on this server
 Parallel database
 Data is stored on different servers located in the same LAN
 Each server has its own DBMS
 Distributed database
 Data is stored on different servers in different locations, connected over the internet
 Each server has its own DBMS
THE NEED FOR PARALLEL AND DISTRIBUTED DB

 Specifics of the enterprise or its departments, access limitation


 A department uses only its own data and does not need all of it

 Increase of data availability and integrity


 Different parts of the database are independent, so the failure of one part
does not bring down the whole database
 Faster data selection
 Some queries might run faster if they are divided into small, parallel queries
DIFFERENT DB ARCHITECTURES

 Shared memory
 Shared disk
 Shared nothing (nothing is shared)


REQUIREMENTS FOR DISTRIBUTED DB

 Local autonomy (nodes and their work are independent)


 No reliance on a central node
 Continuous operation (integrity and availability)
 Location independence (it does not matter where the node is)
 Fragmentation independence
 Replication independence (data replication is not noticed by the end user)
 Distributed query processing (automated and optimized data distribution and execution)
 Distributed transaction management
 Hardware and OS independence
 DBMS independence
PARTITIONS

 A DBMS might support partitions rather than database distribution


 Within the same database, the data of one table is divided into multiple tables
 Vertical partition
 Horizontal partition
TRANSPARENCY

 Data is distributed across different locations
 The user writes queries without knowing where the data is located
HOMOGENEOUS AND HETEROGENEOUS DB

 Homogeneous DB
 The same DBMS runs on each of the servers
 These DBMSs can exchange data directly

 Heterogeneous DB
 A different DBMS runs on each of the servers
 It might even be a different type of DB (relational, file, etc.)
 A special interface is needed for communication between the DBMSs
QUERY EXECUTION

 Direct
 Executed on one server

 Indirect
 Executed on a different server,
forwarded by the first one
[Figure: indirect and direct query execution]
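
The difference between direct and indirect execution can be sketched with two in-memory nodes. This is a toy model under stated assumptions: each node stores key-value rows locally, knows its peers, and forwards a query it cannot answer itself. The Node class and its query method are hypothetical.

```python
class Node:
    """A server that stores some rows locally and knows its peer servers."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows      # key -> value stored on this server
        self.peers = []       # other servers this one can forward to

    def query(self, key):
        """Return (value, route); route lists the servers that handled the query."""
        if key in self.rows:
            # Direct: the server that received the query executes it.
            return self.rows[key], [self.name]
        for peer in self.peers:
            if key in peer.rows:
                # Indirect: the query is forwarded and executed on another server.
                value, route = peer.query(key)
                return value, [self.name] + route
        return None, [self.name]


a = Node("A", {"x": 1})
b = Node("B", {"y": 2})
a.peers = [b]
b.peers = [a]

direct = a.query("x")     # answered by A itself
indirect = a.query("y")   # received by A, executed on B
unknown = a.query("z")    # stored nowhere
```

From the caller's side both calls look the same, which is the transparency property described earlier: the user does not need to know where the data is located.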
REPLICATION AND FRAGMENTATION

 Replication – use of multiple copies of the same data


 Data loss prevention
 Load balancing
 Slower data updates (multiple copies have to be synchronized)

 Fragmentation – data division into separate parts


 Load balancing
 A mechanism for locating the data is needed
REPLICATION

 No replicas
 Full replication
 Partial replication

 Replication is useful in situations where data selection is more common than data
update

 Replication can be done synchronously (all copies are updated at the same time) or
asynchronously (the copies are updated from time to time)
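
The two replication modes can be contrasted in a short sketch. It is a simplified single-process model, not a real replication protocol: the Primary and Replica classes, the pending queue and the flush method are hypothetical.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # updates not yet shipped to the replicas

    def write_sync(self, key, value):
        """Synchronous: every copy is updated before the write returns."""
        self.data[key] = value
        for replica in self.replicas:
            replica.data[key] = value

    def write_async(self, key, value):
        """Asynchronous: update locally now, ship to the replicas later."""
        self.data[key] = value
        self.pending.append((key, value))

    def flush(self):
        """Periodic job that brings the replicas up to date."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


replicas = [Replica(), Replica()]
primary = Primary(replicas)

primary.write_sync("a", 1)    # visible on all replicas immediately
primary.write_async("b", 2)   # replicas are stale until the next flush
stale = "b" not in replicas[0].data
primary.flush()
caught_up = all(replica.data == primary.data for replica in replicas)
```

The synchronous write does more work per update, while the asynchronous one returns sooner at the cost of replicas that can temporarily serve old data.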
CORRECTNESS OF THE FRAGMENTATION

 Completeness
 Every record belongs to one of the fragments
 Uniqueness (disjointness)
 The intersection of any two fragments is an empty set
 Each record should belong to one fragment only
 Reconstruction
 It should be possible to get all data back from all the fragments
 It should be possible to get data from a specific fragment (whether or not it is
known which fragment stores it)
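
The three conditions above can be checked mechanically. The sketch below assumes horizontal fragments of records that carry a unique "id" field; the function name check_fragmentation and the sample data are hypothetical.

```python
def check_fragmentation(records, fragments):
    """Return (complete, disjoint, reconstructable) for a horizontal fragmentation."""
    fragment_ids = [rec["id"] for frag in fragments for rec in frag]

    # Completeness: every original record appears in some fragment.
    complete = {rec["id"] for rec in records} <= set(fragment_ids)

    # Uniqueness: no record id appears in more than one fragment.
    disjoint = len(fragment_ids) == len(set(fragment_ids))

    # Reconstruction: the union of the fragments gives back the original table.
    rebuilt = sorted((rec for frag in fragments for rec in frag), key=lambda r: r["id"])
    reconstructable = rebuilt == sorted(records, key=lambda r: r["id"])

    return complete, disjoint, reconstructable


records = [{"id": i, "city": city} for i, city in enumerate(["Vilnius", "Kaunas", "Vilnius", "Klaipeda"])]

good = [[records[0], records[2]], [records[1], records[3]]]
overlapping = [[records[0], records[1]], [records[1], records[2], records[3]]]
lossy = [[records[0]], [records[1], records[2]]]

good_result = check_fragmentation(records, good)
overlapping_result = check_fragmentation(records, overlapping)
lossy_result = check_fragmentation(records, lossy)
```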
HORIZONTAL FRAGMENTATION
 Data is divided into fragments by rows
TYPES OF HORIZONTAL FRAGMENTATION

 Round-robin
 Sequential distribution (records are assigned to the fragments in turn)
 To find a record, all DB fragments have to be searched

 Hash value
 Similar to round-robin, but simpler to use for complex fields

 Range
 Suitable for search by range
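
The three strategies can be compared side by side in a small sketch. It makes several assumptions that are not from the slides: records are Python dictionaries, CRC32 stands in for the hash function, and range boundaries are given as a sorted list. All function names are hypothetical.

```python
import bisect
import zlib


def round_robin(records, n):
    """Assign records to n fragments in turn: 0, 1, ..., n-1, 0, 1, ..."""
    fragments = [[] for _ in range(n)]
    for position, rec in enumerate(records):
        fragments[position % n].append(rec)
    return fragments


def hash_fragment(value, n):
    """Map a field value to a fragment number; CRC32 is stable across runs."""
    return zlib.crc32(str(value).encode("utf-8")) % n


def by_hash(records, n, field):
    """Place each record in the fragment chosen by hashing one of its fields."""
    fragments = [[] for _ in range(n)]
    for rec in records:
        fragments[hash_fragment(rec[field], n)].append(rec)
    return fragments


def by_range(records, boundaries, field):
    """Fragment 0 holds values below boundaries[0], the last one values
    at or above boundaries[-1], and the rest the ranges in between."""
    fragments = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        fragments[bisect.bisect_right(boundaries, rec[field])].append(rec)
    return fragments


records = [{"id": i, "name": f"user-{i}"} for i in range(10)]

rr = round_robin(records, 3)
hashed = by_hash(records, 4, "name")
ranged = by_range(records, [3, 7], "id")

# With hashing, a lookup by name needs to visit only one fragment.
target_fragment = hash_fragment("user-5", 4)
```

With round-robin the fragment of a record depends only on its arrival order, so a lookup has to visit every fragment; with hashing or ranges the fragment can be computed from the searched value itself.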
VERTICAL FRAGMENTATION
 Data is divided into fragments by fields
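
A minimal sketch of splitting by fields, assuming every fragment keeps the key field so that the original rows can be rebuilt by joining on it. The employee records and the function names split_vertically and rebuild are hypothetical.

```python
def split_vertically(records, key, field_groups):
    """Create one fragment per field group; each fragment also keeps the key field."""
    fragments = []
    for fields in field_groups:
        fragment = [{key: rec[key], **{f: rec[f] for f in fields}} for rec in records]
        fragments.append(fragment)
    return fragments


def rebuild(fragments, key):
    """Join the fragments on the key field to restore the full records."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


employees = [
    {"id": 1, "name": "Ona", "dept": "IT", "salary": 3200},
    {"id": 2, "name": "Jonas", "dept": "HR", "salary": 2800},
]

public_part, private_part = split_vertically(
    employees, "id", [["name", "dept"], ["salary"]]
)
restored = rebuild([public_part, private_part], "id")
```

Repeating the key in each fragment is what makes reconstruction possible; without it the rows of different fragments could not be matched.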
MIXED FRAGMENTATION
QUESTIONS?
NOW, BY E-MAIL, …
