
NSHM KNOWLEDGE CAMPUS, DURGAPUR - GOI (College Code: 273)


CA1 Assessment

Distributed Query Optimization Techniques

Presented By

Student Name: DEBASIS GARAI


University Roll No.: 27300121013
University Registration No.: 212730100110040 (2021-22)
Branch: Computer Science Engineering
Year: 3rd
Semester: 6th

Paper Name: DISTRIBUTED SYSTEMS


Paper Code: PEC-IT601B
TABLE OF CONTENTS

• What are distributed systems?
• Why do we need query optimization?
• Distributed Query Processing Architecture
• Optimal utilization of resources
• Distributed query processing
• Key techniques for optimization
What are Distributed Systems?

A distributed system is a collection of independent computers or nodes that work together to provide a unified and coherent set of services. In
a distributed system, these nodes are connected and communicate with each other to achieve a common goal or provide a specific
functionality. The key characteristics of distributed systems include:

1. Independent Nodes: Nodes in a distributed system are separate entities, each having its own memory, processing power, and possibly its own operating system.
2. Communication: Nodes in a distributed system communicate with each other by passing messages. Communication can occur through various methods, such as direct inter-process communication or a network.
3. Shared Resources: Distributed systems often share resources such as data, files, or computational capabilities among the nodes. This sharing allows for efficient utilization of resources and collaboration.
4. Concurrency: Distributed systems handle multiple tasks or processes concurrently. Nodes can work independently on different parts of a task, contributing to parallel processing and improved performance.
5. Scalability: Distributed systems can scale horizontally by adding more nodes to the network. This scalability allows them to handle increased workloads and adapt to changing requirements.
6. Fault Tolerance: Distributed systems are designed to be resilient in the face of failures. If one node fails, others can continue to operate, ensuring the system's availability.
7. Consistency: Maintaining data consistency across distributed nodes is a challenge. Distributed systems employ various mechanisms, such as distributed transactions and consensus algorithms, to ensure data consistency.
Why do we need to optimize queries in distributed systems?

• Performance Enhancement: Optimize queries to reduce latency, improve throughput, and enhance overall system performance.
• Resource Efficiency: Ensure efficient utilization of distributed resources, minimizing communication overhead and conserving bandwidth.
• Cost Reduction: Optimized queries lead to cost savings, particularly in scenarios where data transfer incurs charges.
• Scalability Support: Facilitate horizontal scalability by distributing and processing data efficiently across multiple nodes.
• Consistency and Reliability: Maintain data consistency across distributed nodes, preserving the integrity of the database.
• Adaptation to Heterogeneity: Handle diverse hardware and software configurations in distributed environments for seamless query execution.
• Adherence to SLAs: Meet service level agreements by optimizing queries to achieve agreed-upon standards.
• Improved User Experience: Faster response times contribute to an enhanced overall user experience.
Distributed Query Processing Architecture

• In a distributed database system, processing a query involves optimization at both the global and the local level. The query enters the database system at the client or controlling site. Here, the user is validated, and the query is checked, translated, and optimized at a global level.
• Distributed query optimization requires evaluating a large number of query trees, each of which produces the required results of the query. This is primarily due to the large amount of replicated and fragmented data. Hence, the target is to find a good, near-optimal solution rather than the best possible one.

• The main issues for distributed query optimization are:
1. Optimal utilization of resources in the distributed system.
2. Query trading.
3. Reduction of solution space of the query.
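
To make the idea of reducing the solution space concrete, here is a minimal Python sketch. It is an illustration only: the relation names, the sizes in `SIZES`, the flat selectivity, and the cost model (sum of intermediate result sizes) are all assumptions, not part of the slides. It compares scoring every join order against a simple smallest-first heuristic that builds a single plan.

```python
from itertools import permutations

# Hypothetical relation sizes in rows; a real optimizer reads these from catalog statistics.
SIZES = {"A": 1000, "B": 100, "C": 10}
# Assumed flat join selectivity of 1/100, kept as an integer divisor for exact arithmetic.
SELECTIVITY_DIVISOR = 100


def plan_cost(order):
    """Cost of a left-deep join order: the sum of intermediate result sizes."""
    rows = SIZES[order[0]]
    cost = 0
    for name in order[1:]:
        rows = rows * SIZES[name] // SELECTIVITY_DIVISOR
        cost += rows
    return cost


def exhaustive_plan(relations):
    """Score every join order (n! candidates) and keep the cheapest."""
    return min(permutations(relations), key=plan_cost)


def greedy_plan(relations):
    """Heuristic: join the smallest relations first, so only one plan is built."""
    return tuple(sorted(relations, key=SIZES.get))


if __name__ == "__main__":
    relations = ["A", "B", "C"]
    print("exhaustive:", exhaustive_plan(relations), plan_cost(exhaustive_plan(relations)))
    print("greedy:    ", greedy_plan(relations), plan_cost(greedy_plan(relations)))
```

With these assumed sizes the heuristic happens to match the exhaustive search, but in general a heuristic only guarantees a reasonable plan, which is the trade-off the slide describes.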
Optimal Utilization of Resources: Techniques

1. Operation Shipping: In operation shipping, the operation is run at the site where the data is stored and not at the client site. The results are then transferred to the client site. This is appropriate for operations where the operands are available at the same site. Example: Select and Project operations.
2. Data Shipping: In data shipping, the data fragments are transferred to the database server, where the operations are executed. This is used in operations where the operands are distributed at different sites. This is also appropriate in systems where the communication costs are low, and local processors are much slower than the client server.
3. Hybrid Shipping: This is a combination of data and operation shipping. Here, data fragments are transferred to the high-speed processors, where the operation runs. The results are then sent to the client site.
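
The choice between these shipping strategies can be sketched as a transfer-cost comparison. The Python below is a hedged illustration, not a real optimizer: the `Operand` class, the site names, and the byte counts are hypothetical, and the cost model counts only bytes moved over the network, ignoring processor speed.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    site: str        # site where the fragment is stored (hypothetical name)
    size_bytes: int  # assumed size of the fragment


def transfer_cost(operands, exec_site, result_bytes, client_site):
    """Bytes moved if the operation runs at exec_site."""
    # Every operand stored elsewhere must be shipped to the execution site.
    moved = sum(op.size_bytes for op in operands if op.site != exec_site)
    # The result must be shipped back unless it was computed at the client.
    if exec_site != client_site:
        moved += result_bytes
    return moved


def choose_site(operands, result_bytes, client_site="client"):
    """Pick the execution site with the lowest estimated transfer cost."""
    candidates = sorted({op.site for op in operands} | {client_site})
    return min(
        candidates,
        key=lambda s: transfer_cost(operands, s, result_bytes, client_site),
    )


if __name__ == "__main__":
    # A selection over one large fragment: running at the data site wins.
    select_ops = [Operand("s1", 1_000_000)]
    print(choose_site(select_ops, result_bytes=1_000))
    # A join over two sites: ship the small fragment to the large one.
    join_ops = [Operand("s1", 1_000_000), Operand("s2", 10_000)]
    print(choose_site(join_ops, result_bytes=5_000))
```

In the first case the operation runs where the data lives (operation shipping). In the second, the smaller fragment moves to the site holding the larger one and only the result travels to the client, which resembles the hybrid approach.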
Distributed Query Processing

1. Parallel Execution: Utilize parallel processing to enhance query performance in a distributed environment.
2. Data Fragmentation and Replication: Efficiently manage data distribution through fragmentation and replication strategies.
3. Query Routing: Route queries to relevant nodes effectively, optimizing the path for query execution.
4. Global Query Optimization: Tackle challenges of optimizing queries across multiple nodes for a cohesive and efficient approach.
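
Query routing and parallel execution can be illustrated together with a small Python sketch. Everything here is assumed for illustration: the node names, the id ranges in `FRAGMENTS`, and the in-memory rows stand in for real sites, and threads stand in for remote execution.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments: node name -> ((low id, high id), rows).
FRAGMENTS = {
    "node1": ((0, 99), [(5, "a"), (42, "b")]),
    "node2": ((100, 199), [(120, "c"), (150, "d")]),
    "node3": ((200, 299), [(250, "e")]),
}


def route(lo, hi):
    """Return only the nodes whose id range overlaps the query range [lo, hi]."""
    return [
        node
        for node, ((frag_lo, frag_hi), _) in FRAGMENTS.items()
        if frag_lo <= hi and lo <= frag_hi
    ]


def local_select(node, lo, hi):
    """Selection executed at one node over its own fragment."""
    _, rows = FRAGMENTS[node]
    return [row for row in rows if lo <= row[0] <= hi]


def run_query(lo, hi):
    """Route the query, run the local selections in parallel, merge the results."""
    nodes = route(lo, hi)
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda n: local_select(n, lo, hi), nodes))
    return sorted(row for part in parts for row in part)


if __name__ == "__main__":
    print(route(40, 130))
    print(run_query(40, 130))
```

For the range 40 to 130, only `node1` and `node2` are contacted; `node3` is skipped because its fragment cannot contain matching rows.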
Key Techniques for Optimization

1. Partitioning Strategies: Optimize data distribution through horizontal and vertical partitioning techniques.
2. Replication Strategies: Improve availability and performance by strategically implementing data replication methods.
3. Caching Mechanisms: Enhance query response times by implementing efficient caching mechanisms for frequently accessed data.
4. Load Balancing: Ensure even distribution of workload among distributed nodes for optimal resource utilization.
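
As an illustration of the first technique, the following Python sketch shows horizontal and vertical partitioning on a tiny in-memory table. The sample rows, the column names, and the modulo-based placement rule are assumptions chosen for the example; real systems may use range, hash, or list partitioning instead.

```python
# Hypothetical sample table; the rows and column names are made up for illustration.
ROWS = [
    {"id": 1, "name": "emp1", "dept": "CSE", "salary": 100},
    {"id": 2, "name": "emp2", "dept": "IT", "salary": 200},
    {"id": 3, "name": "emp3", "dept": "CSE", "salary": 300},
    {"id": 4, "name": "emp4", "dept": "IT", "salary": 400},
]


def horizontal_partition(rows, n_nodes, key="id"):
    """Split rows across nodes: each node gets whole rows, chosen by key modulo n_nodes."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[row[key] % n_nodes].append(row)
    return parts


def vertical_partition(rows, column_groups, key="id"):
    """Split columns across fragments; every fragment keeps the key so rows can be rejoined."""
    return [
        [{key: row[key], **{col: row[col] for col in cols}} for row in rows]
        for cols in column_groups
    ]


if __name__ == "__main__":
    for node, part in enumerate(horizontal_partition(ROWS, 2)):
        print("node", node, [row["id"] for row in part])
    for fragment in vertical_partition(ROWS, [["name", "dept"], ["salary"]]):
        print(fragment[0])
```

Keeping the key column in every vertical fragment is what allows the original rows to be reconstructed with a join on `id`.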
THANK YOU
