
Unit 8

I was absent…
Sorting (not very important)
1. Sorting may be requested explicitly by the query (e.g., order by), or it may be an important
processing step for other operations, such as joins.
2. If the records fit completely in main memory, standard sorting algorithms such as quick sort apply.
3. Otherwise, some records remain on disk and the relation must be sorted by external sorting.

Sorting Operations (not very important)


1. We may build an index on the relation and then use the index to read the relation in sorted
order.
2. For relations that fit in memory, techniques like quick sort can be used.
3. For relations that do not fit in memory, external sort-merge is a good choice.
4. Sorting means arranging the data in either ascending or descending order.
5. We use sorting not only to produce sorted output, but also to satisfy the preconditions of
various database algorithms.
6. In query processing, sorting is used to perform relational operations such as joins
efficiently.
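The external sort-merge idea from point 3 can be sketched as below. This is a minimal illustration, not a DBMS implementation: the function name external_sort_merge and the memory_capacity parameter are made up for this example, and in-memory lists stand in for the sorted runs that a real system would write to disk.

```python
import heapq
from itertools import islice

def external_sort_merge(records, memory_capacity):
    """Sort records while sorting at most memory_capacity records at a time."""
    it = iter(records)
    runs = []  # assumption: each list stands in for a sorted run file on disk
    while True:
        chunk = list(islice(it, memory_capacity))
        if not chunk:
            break
        chunk.sort()  # run creation: any in-memory sort works for one chunk
        runs.append(chunk)
    # Merge phase: repeatedly output the smallest head element among all runs.
    return list(heapq.merge(*runs))

data = [24, 19, 31, 33, 14, 16, 21, 3, 2, 7]
sorted_data = external_sort_merge(data, memory_capacity=3)
print(sorted_data)
```

With memory_capacity=3, the ten records are split into four sorted runs, which are then merged in a single pass.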

Join Operation (not very important)


1. Several different algorithms can implement joins:
a. Nested Loop Join:
i. In this method, the tables are joined using an outer loop over the records of one
table and an inner loop over the records of the other.
ii. The join condition is tested in the innermost loop; if a pair of records satisfies it,
the pair is stored in the result set, otherwise the next record is checked.
iii. The algorithm for this method can be written as below:
- For each record t in T // T ⨝θ S
- Loop
- For each record s in S
- Loop
- Check whether the pair (t, s) satisfies the join condition θ
- I missed this part
- In the above algorithm, each record t of the outer table T is compared with each
record s of the inner table S.
- Hence, it is a very costly type of join.
- In the worst case, it requires NT + BT seeks
- and (NT * BS) + BT block transfers, where NT is the number of records in T, and
BT and BS are the numbers of blocks in T and S.
b. Block Nested-Loop Join
c. Indexed Nested Loop Join
d. Merge-Join:
i. Tables used in the join query may be sorted or unsorted.
ii. Sorted tables make the join more efficient.
iii. In this method, the columns used to join the two tables are also used to sort
them.
iv. It uses the merge sort technique to sort the records of the two tables.
v. Once the sorting is done, the join condition is applied to get the result.
vi. This method is usually used for natural joins and equi-joins.
vii. The cost of this join is then:
- Cost of seeks = BT/m + BS/m
- Cost of block transfers = BT + BS
e. Hash-Join:
i. This method is also useful for natural joins and equi-joins.
ii. We apply a hash function h to the joining column to divide the records of each
table into different partitions (buckets).
2. Choice based on cost estimate:
a. Example: Number of records of Customers: 1000
Number of records of Depositors: 5000
Number of blocks of Customers: 400
Number of blocks of Depositors: 100
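The join algorithms and the nested-loop cost formulas above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the sample rows are hypothetical, Python's built-in hashing stands in for the hash function h, and the cost example assumes that the second "Number of Blocks" figure (100) belongs to Depositors.

```python
from collections import defaultdict

def nested_loop_join(T, S, theta):
    """Compare every record t of the outer table with every record s of the inner table."""
    return [(t, s) for t in T for s in S if theta(t, s)]

def merge_join(T, S, key_t, key_s):
    """Equi-join: sort both inputs on the join key, then merge them."""
    T, S = sorted(T, key=key_t), sorted(S, key=key_s)
    result, i, j = [], 0, 0
    while i < len(T) and j < len(S):
        kt, ks = key_t(T[i]), key_s(S[j])
        if kt < ks:
            i += 1
        elif kt > ks:
            j += 1
        else:
            # Pair T[i] with every S record that shares this key value.
            k = j
            while k < len(S) and key_s(S[k]) == kt:
                result.append((T[i], S[k]))
                k += 1
            i += 1
    return result

def hash_join(T, S, key_t, key_s):
    """Equi-join: bucket S by the hash of its join key, then probe with each t."""
    buckets = defaultdict(list)
    for s in S:
        buckets[key_s(s)].append(s)
    return [(t, s) for t in T for s in buckets.get(key_t(t), [])]

def nested_loop_cost(n_outer, b_outer, b_inner):
    """Worst-case (seeks, block transfers): NT + BT and (NT * BS) + BT."""
    return n_outer + b_outer, n_outer * b_inner + b_outer

# Hypothetical sample rows: (customer_id, name) and (customer_id, account).
customers = [("C1", "Ann"), ("C2", "Bob"), ("C3", "Cy")]
depositors = [("C1", "A-101"), ("C3", "A-215"), ("C1", "A-305")]
first = lambda row: row[0]

nl = nested_loop_join(customers, depositors, lambda t, s: t[0] == s[0])
mj = merge_join(customers, depositors, first, first)
hj = hash_join(customers, depositors, first, first)

# Cost estimate with the figures from the example above.
customer_outer = nested_loop_cost(n_outer=1000, b_outer=400, b_inner=100)
depositor_outer = nested_loop_cost(n_outer=5000, b_outer=100, b_inner=400)
print(customer_outer, depositor_outer)
```

All three functions return the same set of matching pairs. Under the assumed figures, putting Customers in the outer loop needs far fewer block transfers than putting Depositors there, which is the kind of comparison a cost-based choice relies on.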

Evaluation of expressions
1. The obvious way to evaluate an expression is simply to evaluate one operation at a time, in
appropriate order.
2. There are two general approaches while evaluating an expression which are:
a. Materialization:
i. In this method, the given expression is evaluated one relational operation at a
time.
ii. Each operation in the expression is evaluated one by one in appropriate order
and result of each operation is materialized (or created) in a temporary relation
which becomes input for subsequent operation.
iii. E.g., πP Name, email-ID (σcategory='Nobel' (book) ⨝ publisher)
iv. In the above example, the relation created by the selection on the book relation is a
temporary relation; the join is then evaluated between this temporary relation
and the publisher relation, which produces another temporary relation.
v. By repeating the process, we eventually evaluate the operation at the root of the
tree, which gives the final result.
vi. A disadvantage of this approach is the need to construct temporary relations,
which must be written to disk.
vii. Cost:
1. Generally, the cost of evaluating an expression is the addition of cost of
all operations and cost of writing intermediate result to disk.
2. We assume that records of the result accumulate in a buffer and when
the buffer is full, they are written to the disk. [overall cost = sum of
individual operations + cost of writing intermediate results to disk]
viii. Double buffering: Use two output buffers for each operation, when one buffer
is full, write it to disk while the other is getting filled.
b. Pipelining
i. The problem with materialization is that it creates many temporary files and
therefore a lot of input and output.
ii. With pipelined evaluation, the operations form a queue, and results are
passed from one operation to the next as they are computed; hence the
technique's name.
iii. Avoids write-outs of entire intermediate relations.
iv. This is an alternative approach used to evaluate several operations
simultaneously with the results of operations passed onto the next without the
need to store a temporary relation.
v. Consider the expression, πa1, a2 (r⨝s)
vi. With materialization, evaluation would involve creating a temporary
relation to hold the whole result of the join and then reading that result back to
perform the projection.
vii. With pipelining, when the join operation generates a tuple of its
result, it passes that tuple immediately to the project operation for processing.
By combining the join and the projection, we avoid creating intermediate results.
viii. Advantages
1. Reduces the cost of query evaluation by eliminating the cost of
reading and writing temporary relations.
2. It can start generating query results quickly if the root operator of the
query evaluation plan is combined in a pipeline with its inputs.
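The difference between the two approaches can be sketched for πa1, a2 (r⨝s) with Python generators. This is a minimal illustration: the sample relations, the shared join attribute k, and the function names are assumptions made for this example.

```python
def join(r, s):
    """Yield joined tuples one at a time (natural join on the shared attribute 'k')."""
    for tr in r:
        for ts in s:
            if tr["k"] == ts["k"]:
                yield {**tr, **ts}

def project(tuples, attrs):
    """Yield each input tuple restricted to the requested attributes."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

# Hypothetical sample relations.
r = [{"k": 1, "a1": "x"}, {"k": 2, "a1": "y"}]
s = [{"k": 1, "a2": 10}, {"k": 2, "a2": 20}, {"k": 3, "a2": 30}]

# Materialization: the whole join result is built as a temporary relation first.
temp = list(join(r, s))
materialized = list(project(temp, ["a1", "a2"]))

# Pipelining: project pulls each joined tuple as soon as join produces it,
# so no temporary relation is ever stored.
pipeline = project(join(r, s), ["a1", "a2"])
first = next(pipeline)  # the first result is available before the join finishes
pipelined = [first] + list(pipeline)
print(materialized == pipelined)
```

Both strategies produce the same tuples; the pipelined version simply never holds the full join result.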

Query Optimization
1. Query optimization is the process of selecting the most efficient query evaluation plan from
among the many strategies usually possible for processing a given query, especially if the query is complex.
2. E.g., Relation Schema

Instructor (ID, name, dept_name, salary)

Teaches (ID, course_ID, sec_ID, semester, year)

Course (Course_ID, title, dept_name, credits)

Find the name of all instructors in the music department together with the course title of all the
courses that the instructor teaches.

Ans1: πname, title (σdept_name = ‘music’ (instructor ⨝ (teaches ⨝ πcourse_id, title (course))))

Ans2 … … …

3. I arrived late.
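The idea that one query has several equivalent evaluation plans with different costs can be sketched for the instructor query above. This is an illustration under assumptions: the sample rows and helper functions are hypothetical, and the second plan (applying the selection before the join) is just one possible alternative, not necessarily the Ans2 that is elided in these notes.

```python
def natural_join(r, s):
    """Join dict-tuples on every attribute name the two inputs share."""
    out = []
    for tr in r:
        for ts in s:
            if all(tr[a] == ts[a] for a in tr.keys() & ts.keys()):
                out.append({**tr, **ts})
    return out

def select(r, pred):
    return [t for t in r if pred(t)]

def project(r, attrs):
    return [{a: t[a] for a in attrs} for t in r]

# Hypothetical sample rows following the schema in the notes.
instructor = [
    {"ID": 1, "name": "Mozart", "dept_name": "music", "salary": 40000},
    {"ID": 2, "name": "Einstein", "dept_name": "physics", "salary": 95000},
    {"ID": 3, "name": "Wu", "dept_name": "finance", "salary": 90000},
]
teaches = [
    {"ID": 1, "course_id": "MU-199", "sec_id": 1, "semester": "Spring", "year": 2018},
    {"ID": 2, "course_id": "PHY-101", "sec_id": 1, "semester": "Fall", "year": 2017},
    {"ID": 3, "course_id": "FIN-201", "sec_id": 1, "semester": "Spring", "year": 2018},
]
course = [
    {"course_id": "MU-199", "title": "Music Video Production", "dept_name": "music", "credits": 3},
    {"course_id": "PHY-101", "title": "Physical Principles", "dept_name": "physics", "credits": 4},
    {"course_id": "FIN-201", "title": "Investment Banking", "dept_name": "finance", "credits": 3},
]
is_music = lambda t: t["dept_name"] == "music"
teaches_course = natural_join(teaches, project(course, ["course_id", "title"]))

# Plan 1 (Ans1): join everything first, then select the music department.
joined_all = natural_join(instructor, teaches_course)
plan1 = project(select(joined_all, is_music), ["name", "title"])

# Plan 2 (assumed alternative): select music instructors first, then join.
joined_music = natural_join(select(instructor, is_music), teaches_course)
plan2 = project(joined_music, ["name", "title"])

print(plan1 == plan2, len(joined_all), len(joined_music))
```

Both plans return the same answer, but the second builds a smaller intermediate join result, which is the kind of difference an optimizer looks for.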


This includes:

- Data stored in the database
- Database server
- Database management system
- Other database workflow applications

Database Security Issues


1. No security testing before deployment.
2. Poor encryption and data breaches come together.
3. Stolen database backups.
4. Weak and complex database (DB) infrastructure.
5. Irregularities in databases.
6. Limitless administration access, poor data protection.
7. Inadequate key management.

Encryption and Decryption
- ………
- Different algorithms are used for encryption. These algorithms generate keys related to the
encrypted data.
- These keys link the encryption and decryption procedures.
- The encrypted data can be decrypted only by using these keys.
- A DBMS can use encryption to protect information in certain situations where the normal
security mechanism of the DBMS is not adequate.
- Encryption is a technique for providing data privacy.
- In encryption, the message to be encrypted is known as plain text.
- The output of the encryption process is known as cipher text.
- The cipher text is then transmitted over the network.
- The process of converting the cipher text back to plain text is called decryption.
- Encryption is performed at the transmitting end and decryption is performed at the receiving end.

Public Key Cryptography

- Public key cryptography is an encryption technique that uses a paired public and private
(asymmetric) key for secure data communication.
- A message sender uses the recipient's public key to encrypt a message.
- Only the recipient's private key can be used to decrypt the sender's message.
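The public and private key roles described above can be sketched with a toy RSA example. This is only a classroom-sized illustration and is not secure: the primes p and q, the exponent e, and the function names are assumptions chosen for this sketch, and real systems use vetted libraries with far larger keys and padding. It needs Python 3.8 or later for the modular inverse via pow.

```python
# Toy key generation with tiny primes (assumed values, for illustration only).
p, q = 61, 53
n = p * q                 # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)       # published by the recipient
private_key = (d, n)      # kept secret by the recipient

def encrypt(m, key):
    """Sender side: encrypt the integer message m with the recipient's public key."""
    exponent, modulus = key
    return pow(m, exponent, modulus)

def decrypt(c, key):
    """Recipient side: decrypt the cipher value c with the private key."""
    exponent, modulus = key
    return pow(c, exponent, modulus)

message = 65              # the message must be an integer smaller than n
cipher = encrypt(message, public_key)
plain = decrypt(cipher, private_key)
print(cipher, plain)
```

Anyone may encrypt with the public key, but only the holder of the private key can recover the message.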

Secret key Cryptography


- Here only one key is used for both encryption and decryption.
- This type of encryption is also referred to as symmetric encryption.
-
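The single-key idea can be sketched with a toy XOR cipher, in which the same secret key both encrypts and decrypts. This is an illustration only: the function name and sample message are made up, and repeating-key XOR is not a secure cipher; real systems use a vetted algorithm such as AES.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = secrets.token_bytes(16)              # the shared secret key
plain_text = b"account balance: 500"       # hypothetical message

cipher_text = xor_bytes(plain_text, key)   # encryption at the transmitting end
recovered = xor_bytes(cipher_text, key)    # decryption with the same key
print(recovered)
```

Because XOR is its own inverse, applying the same function with the same key a second time restores the plain text, which mirrors the symmetric property described above.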
