
ASSIGNMENT NO.

1. Explain Mobile database with Architecture?


Ans:

A Mobile Database is a database that is portable and physically separate from the corporate database server, but capable of communicating with that server from remote sites, allowing the sharing of various kinds of data.

With mobile databases, users have access to corporate data on a laptop, PDA, or other Internet access device, as required by applications at remote sites.

The components of a mobile database environment include:

 Corporate database server and DBMS that manages and stores the corporate data and
provides corporate applications
 Remote database and DBMS that manages and stores the mobile data and provides
mobile applications
 Mobile database platform that includes a laptop, PDA, or other Internet access devices
 Two-way communication links between the corporate and mobile DBMS.

Based on the particular requirements of mobile applications, in many cases the user of a
mobile device may log on to a corporate database server and work with the data there,
while in other cases the user may download data and work with it on the mobile device, or
upload data captured at the remote site to the corporate database. The communication
between the corporate and mobile databases is usually intermittent: a connection is
typically established only for short periods of time at irregular intervals. Although unusual,
some applications require direct communication between the mobile databases. The two
main issues associated with mobile databases are the management of the mobile database
and the communication between the mobile and corporate databases. In the following
section, we identify the requirements of mobile DBMSs.

The additional functionality required for mobile DBMSs includes the capability to:

 communicate with the centralized database server through modes such as wireless or
Internet access
 replicate data on the centralized database server and the mobile device
 synchronize data on the centralized database server and the mobile device
 capture data from a range of sources such as the Internet
 manage data on the mobile device
 analyze data on the mobile device
 create customized and personalized mobile applications
The mobile database includes the following components:

1. The main system database that stores all the data and is linked to the mobile database.

2. The mobile database that allows users to view information even while on the move. It
shares information with the main database.

3. The device that uses the mobile database to access data. This device can be a mobile
phone, laptop etc.

4. A communication link that allows the transfer of data between the mobile database and
the main database.
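To make the two-way communication link concrete, the following is a minimal sketch in Python, assuming the on-device store is SQLite and that upload() is a hypothetical stand-in for the real transport to the corporate DBMS; it shows data captured offline at the remote site and later pushed over the intermittent connection:

import sqlite3

def capture_reading(conn, value):
    # Data captured at the remote site is stored locally first,
    # flagged as not yet uploaded.
    conn.execute("INSERT INTO readings(value, synced) VALUES (?, 0)", (value,))
    conn.commit()

def sync_to_corporate(conn, upload):
    # When the intermittent link is available, push unsynced rows to the
    # corporate server and mark them as replicated.
    rows = conn.execute("SELECT id, value FROM readings WHERE synced = 0").fetchall()
    for row_id, value in rows:
        upload(value)  # hypothetical transport call to the corporate DBMS
        conn.execute("UPDATE readings SET synced = 1 WHERE id = ?", (row_id,))
    conn.commit()

conn = sqlite3.connect(":memory:")  # stands in for the on-device database
conn.execute("CREATE TABLE readings(id INTEGER PRIMARY KEY, value REAL, synced INTEGER)")
capture_reading(conn, 36.6)            # works offline
sync_to_corporate(conn, upload=print)  # later, when the link is up

A real mobile DBMS would also pull changes down from the corporate server and resolve conflicting updates; this sketch shows only the upload half.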
Some advantages of mobile databases are:

1. The data in a database can be accessed from anywhere using a mobile database. It
provides wireless database access.

2. Mobile databases are kept synchronized with the main database system, and multiple
users can access the data through a seamless delivery process.

3. Mobile databases require very little support and maintenance.

4. The mobile database can be synchronized with multiple devices such as mobiles,
computer devices, laptops etc.
Some disadvantages of mobile databases are:

1. The mobile data is less secure than data that is stored in a conventional stationary
database. This presents a security hazard.

2. The mobile unit that houses a mobile database may frequently lose power because of its
limited battery. This should not lead to loss of data in the database.

2. Explain Temporal databases?


Ans:

A temporal database is generally understood as a database capable of supporting storage and reasoning about time-based data. For example, medical applications may benefit from temporal database support: a record of a patient's medical history has little value unless the test results, e.g. the temperatures, are associated with the times at which they are valid, since we may wish to reason about the periods in time in which the patient's temperature changed.

Temporal databases store temporal data, i.e. data that is time dependent (time varying). Typical
temporal database scenarios and applications include time-dependent/time-varying economic
data, such as:

 Share prices

 Exchange rates

 Interest rates

 Company profits

The desire to model such data means that we need to store not only the respective value but also
an associated date or a time period for which the value is valid. Typical queries expressed
informally might include:

 Give me last month's history of the Dollar-Pound Sterling exchange rate.

 Give me the share prices of the NYSE on October 17, 1996.

Example:

Date              | Real world event                                   | Database action                          | What the database shows
April 3, 1975     | John is born                                       | Nothing                                  | There is no person called John Doe
April 4, 1975     | John's father officially reports John's birth      | Inserted: Person (John Doe, Smallville)  | John Doe lives in Smallville
August 26, 1994   | After graduation, John moves to Big town, but forgets to register his new address | Nothing | John Doe lives in Smallville
December 26, 1994 | Nothing                                            | Nothing                                  | John Doe lives in Smallville
December 27, 1994 | John registers his new address                     | Updated: Person (John Doe, Big town)     | John Doe lives in Big town
April 1, 2001     | John dies                                          | Deleted: Person (John Doe)               | There is no person called John Doe
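The table above describes a conventional (non-temporal) database, which only ever shows the latest recorded fact. By contrast, here is a minimal valid-time sketch in Python with SQLite (the table and column names are illustrative, not from the original text), in which each fact carries the period during which it is true in the real world:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_address("
             "name TEXT, city TEXT, valid_from TEXT, valid_to TEXT)")
conn.executemany("INSERT INTO person_address VALUES (?,?,?,?)", [
    # the period during which each fact was true in the real world
    ("John Doe", "Smallville", "1975-04-03", "1994-08-25"),
    ("John Doe", "Big town",   "1994-08-26", "2001-04-01"),
])

day = "1994-12-26"  # point-in-time ("as of") query
cur = conn.execute(
    "SELECT city FROM person_address "
    "WHERE name = ? AND valid_from <= ? AND ? <= valid_to",
    ("John Doe", day, day))
print(cur.fetchone()[0])  # Big town

On December 26, 1994 this valid-time query answers Big town, where John actually lived, whereas the conventional database in the table above still showed Smallville.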

3. Explain Spatial databases?

Ans :

Spatial data is associated with geographic locations such as cities, towns etc. A spatial database is optimized to store and query data representing objects defined in a geometric space.

Characteristics of Spatial Database

A spatial database system has the following characteristics

1. It is a database system
2. It offers spatial data types (SDTs) in its data model and query language.
3. It supports spatial data types in its implementation, providing at least spatial indexing and
efficient algorithms for spatial join.

Example

A road map is a visualization of geographic information. A road map is a 2-dimensional object which contains points, lines, and polygons that can represent cities, roads, and political boundaries such as states or provinces.

In general, spatial data can be of two types:

1. Vector data: This data is represented as discrete points, lines and polygons
2. Raster data: This data is represented as a matrix of square cells.
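As a small illustration of a query over vector data, the following pure-Python sketch (the coordinates are made up) performs a rectangular range query over city points, the kind of filter that a spatial index such as an R-tree would accelerate in a real spatial database:

cities = {              # illustrative coordinates, not real data
    "A": (2.0, 3.0),
    "B": (7.5, 1.0),
    "C": (4.2, 6.8),
}

def in_window(point, xmin, ymin, xmax, ymax):
    # True if the point lies inside the rectangular query window.
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax

# Range query: which cities fall inside the window [0, 5] x [0, 5]?
print([name for name, p in cities.items() if in_window(p, 0, 0, 5, 5)])  # ['A']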

4. Write short note on ARIES?

Ans:

ARIES stands for “Algorithm for Recovery and Isolation Exploiting Semantics.” It was designed
to support the needs of industrial strength transaction processing systems. ARIES uses logs to
record the progress of transactions and their actions which cause changes to recoverable data
objects. The log is the source of truth and is used to ensure that committed actions are reflected
in the database, and that uncommitted actions are undone. Conceptually the log is a single ever-
growing sequential file (append-only). Every log record has a unique log sequence number
(LSN), and LSNs are assigned in ascending order.

Log records are first written to volatile storage (e.g. in-memory), and at certain times – such as
transaction commit – the log records up to a certain point (LSN) are written to stable storage.
This is known as forcing the log up to that LSN. A system may periodically force the log buffers
as they fill up.

Log records may contain redo and undo information. A log record containing both is called an undo-redo log record; there may also be undo-only log records and redo-only log records. A redo record contains the information needed to redo a change made by a transaction (if it has been lost). An undo record contains the information needed to reverse a change made by a transaction (in the event of rollback). This information can simply be a copy of the before/after image of the data object, or it can be a description of the operation that needs to be performed to undo/redo the change.
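A sketch of the log-record structure just described, in Python (the field names are illustrative, not from ARIES itself): every record carries an ascending LSN and may carry redo and/or undo information.

from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                    # unique, ascending log sequence number
    txn_id: str                 # transaction that made the change
    page_id: Optional[str]      # data object affected (None for e.g. commit records)
    redo: Optional[str] = None  # how to reapply the change (after-image or operation)
    undo: Optional[str] = None  # how to reverse the change (before-image or operation)

# An undo-redo record for a transaction T1 updating page C, for example:
rec = LogRecord(lsn=1, txn_id="T1", page_id="C", redo="C := 20", undo="C := 10")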

All of this logging is in aid of recovery from failure. There are three basic types of failure we
need to concern ourselves with:

1. Failure of a transaction (such that its updates need to be undone).


2. Failure of the database management system itself – in this scenario we assume that
volatile storage contents are lost and recovery must be performed using the nonvolatile
versions of the database and log.
3. Failure of media/device – in this scenario the contents of just that media are lost, and the
lost data must be recovered using an image copy (archive dump) version of the lost data
plus the log. Recovery independence is the notion that it should be possible to perform
media recovery or restart recovery of objects at different granularities rather than only at
the entire database level.

The ARIES recovery procedure consists of three main steps:

1. Analysis

The analysis step identifies the dirty (updated) pages in the buffer and the set of
transactions active at the time of the crash. The appropriate point in the log where the
REDO operation should start is also determined.

2. REDO

The REDO phase actually reapplies updates from the log to the database. Generally, the
REDO operation is applied to only committed transactions. However, in ARIES, this is
not the case. Certain information in the ARIES log will provide the start point for REDO,
from which REDO operations are applied until the end of the log is reached. In addition,
information stored by ARIES and in the data pages will allow ARIES to determine
whether the operation to be redone has actually been applied to the database and hence
need not be reapplied. Thus only the necessary REDO operations are applied during
recovery.

3. UNDO

During the UNDO phase, the log is scanned backwards and the operations of transactions
that were active at the time of the crash are undone in reverse order. The information
needed for ARIES to accomplish its recovery procedure includes the log, the Transaction
Table, and the Dirty Page Table. In addition, checkpointing is used. These two tables are
maintained by the transaction manager and written to the log during checkpointing.
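The following is a highly simplified Python skeleton of the three passes, assuming an in-memory list of log records written as plain dicts (shaped like the LogRecord sketch earlier); a real ARIES implementation would also handle checkpoints, compensation log records, and page-LSN comparisons during REDO:

def recover(log, committed):
    # log: list of records in LSN order, each a dict with keys
    # "lsn", "txn", "page", "redo", "undo"; committed: set of txn ids.

    # 1. Analysis: reconstruct the dirty pages (with the LSN that first
    #    dirtied each one) and the transactions active at the crash.
    dirty, active = {}, set()
    for rec in log:
        if rec["page"] is not None:
            dirty.setdefault(rec["page"], rec["lsn"])
            active.add(rec["txn"])
    active -= committed

    # 2. REDO: repeat history from the smallest recLSN onward, even for
    #    loser transactions (page-LSN checks are omitted in this sketch).
    start = min(dirty.values(), default=0)
    for rec in log:
        if rec["lsn"] >= start and rec["redo"] is not None:
            print("REDO", rec["lsn"], rec["redo"])

    # 3. UNDO: scan backward and reverse the updates of loser transactions.
    for rec in reversed(log):
        if rec["txn"] in active and rec["undo"] is not None:
            print("UNDO", rec["lsn"], rec["undo"])

log = [
    {"lsn": 1, "txn": "T1", "page": "C", "redo": "C := 20", "undo": "C := 10"},
    {"lsn": 2, "txn": "T2", "page": "B", "redo": "B := 5",  "undo": "B := 3"},
]
recover(log, committed={"T2"})  # T1 is the loser: only its update is undone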

Checkpointing in ARIES consists of the following: writing a begin_checkpoint record to the log, writing an end_checkpoint record to the log, and writing the LSN of the begin_checkpoint record to a special file. This special file is accessed during recovery to locate the last checkpoint information. With the end_checkpoint record, the contents of both the Transaction Table and the Dirty Page Table are appended to the end of the log. To reduce the cost, fuzzy checkpointing is used so that the DBMS can continue to execute transactions during checkpointing. Additionally, the contents of the DBMS cache do not have to be flushed to disk during checkpointing, since the Transaction Table and Dirty Page Table (which are appended to the log on disk) contain the information needed for recovery. Note that if a crash occurs during checkpointing, the special file will refer to the previous checkpoint, which is used for recovery.

Consider the recovery example shown in Figure 23.5. There are three transactions: T1, T2, and T3.
T1 updates page C, T2 updates pages B and C, and T3 updates page A.
Figure 23.5(a) shows the partial contents of the log, and Figure 23.5(b) shows the contents of the
Transaction Table and Dirty Page Table. Now, suppose that a crash occurs at this point. Since a
checkpoint has occurred, the address of the associated begin_checkpoint record is retrieved,
which is location 4. The analysis phase starts from location 4 until it reaches the end. The
end_checkpoint record would contain the Transaction Table and Dirty Page Table in Figure
23.5(b), and the analysis phase will further reconstruct these tables. When the analysis phase
encounters log record 6, a new entry for transaction T3 is made in the Transaction Table and a
new entry for page A is made in the Dirty Page Table. After log record 8 is analyzed, the status
of transaction T2 is changed to committed in the Transaction Table. Figure 23.5(c) shows the two
tables after the analysis phase.
For the REDO phase, the smallest LSN in the Dirty Page Table is 1. Hence the REDO will start
at log record 1 and proceed with the REDO of updates. The LSNs {1, 2, 6, 7} corresponding to
the updates for pages C, B, A, and C, respectively, are not less than the LSNs of those pages (as
shown in the Dirty Page Table). So those data pages will be read again and the updates reapplied
from the log (assuming the actual LSNs stored on those data pages are less than the
corresponding log entries). At this point, the REDO phase is finished and the UNDO phase starts.
From the Transaction Table (Figure 23.5(c)), UNDO is applied only to the active transaction T3.
The UNDO phase starts at log entry 6 (the last update for T3) and proceeds backward in the log.
The backward chain of updates for transaction T3 (only log record 6 in this example) is followed
and undone.

5. State the ACID properties, with example?

Ans :

A transaction is a single logical unit of work which accesses and possibly modifies the contents
of a database. Transactions access data using read and write operations.
In order to maintain consistency in a database, before and after transaction, certain properties are
followed. These are called ACID properties.

Atomicity
By this, we mean that either the entire transaction takes place at once or doesn’t happen at all.
There is no midway i.e. transactions do not occur partially. Each transaction is considered as one
unit and either runs to completion or is not executed at all. It involves the following two operations.
—Abort: If a transaction aborts, changes made to database are not visible.
—Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.

Consider the following transaction T consisting of T1 and T2: a transfer of 100 from account X to
account Y. T1 debits X (read(X); X := X - 100; write(X)) and T2 credits Y (read(Y); Y := Y + 100; write(Y)).
If the transaction fails after completion of T1 but before completion of T2 (say, after write(X)
but before write(Y)), then the amount has been deducted from X but not added to Y. This results in
an inconsistent database state. Therefore, the transaction must be executed in its entirety in order to
ensure the correctness of the database state.
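A minimal sketch of atomicity using SQLite transactions in Python, with the accounts and amounts of the example above and a simulated failure between the two writes:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 500), ("Y", 200)])
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 100 WHERE name = 'X'")
    raise RuntimeError("simulated crash between write(X) and write(Y)")
    # the matching credit to Y never runs:
    # conn.execute("UPDATE account SET balance = balance + 100 WHERE name = 'Y'")
except RuntimeError:
    conn.rollback()  # abort: the partial debit of X is undone

print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('X', 500), ('Y', 200)] -- all or nothing

Because the transaction aborts, the debit of X is rolled back and the database is left exactly as it was before T started.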

 
Consistency
This means that integrity constraints must be maintained so that the database is consistent before
and after the transaction. It refers to the correctness of a database. Referring to the example above,
the total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As a
result T is incomplete.

 
Isolation
This property ensures that multiple transactions can occur concurrently without leading to
inconsistency of database state. Transactions occur independently without interference. Changes
occurring in a particular transaction will not be visible to any other transaction until that
particular change in that transaction is written to memory or has been committed. This property
ensures that the execution of transactions concurrently will result in a state that is equivalent to a
state achieved these were executed serially in some order.
Let X= 500, Y = 500.
Consider two transactions T and T”.

Suppose T has been executed till Read (Y) and then T’’ starts. As a result , interleaving of
operations takes place due to which T’’ reads correct value of X but incorrect value of Y and
sum computed by
T’’: (X+Y = 50, 000+500=50, 500)
is thus not consistent with the sum at end of transaction:
T: (X+Y = 50, 000 + 450 = 50, 450).
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take
place in isolation and changes should be visible only after a they have been made to the main
memory.
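The interleaving can be traced in plain Python (no real DBMS involved; the variables simply play the roles of the database items):

X, Y = 500, 500

# T executes read(X), X := X - 50, write(X) ...
X = X - 50    # X is now 450

# ... T is interrupted before it updates Y; T'' runs now and sums:
print(X + Y)  # 950 -- inconsistent: 50 units appear to be lost

# T resumes: read(Y), Y := Y + 50, write(Y)
Y = Y + 50
print(X + Y)  # 1000 -- consistent again once T has completed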

 
Durability:
This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even if system
failure occurs. These updates now become permanent and are stored in a non-volatile memory.
The effects of the transaction, thus, are never lost.

The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency of
a database in such a way that each transaction is a group of operations that acts as a single unit,
produces consistent results, acts in isolation from other operations, and makes updates that are
durably stored.

6. What is Serializability testing? Discuss how Serializability is tested?

Ans :

A Serialization (Precedence) Graph is used to test the Serializability of a schedule.

Assume a schedule S. For S, we construct a graph known as a precedence graph. This graph is a
pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices contains
all the transactions participating in the schedule. The set of edges contains all edges Ti → Tj for
which one of the following three conditions holds:

1. Add an edge Ti → Tj if Ti executes write (Q) before Tj executes read (Q).
2. Add an edge Ti → Tj if Ti executes read (Q) before Tj executes write (Q).
3. Add an edge Ti → Tj if Ti executes write (Q) before Tj executes write (Q).

 If the precedence graph contains an edge Ti → Tj, then in any equivalent serial schedule all
the instructions of Ti must be executed before the first instruction of Tj.
 If the precedence graph for schedule S contains a cycle, then S is non-serializable. If the
precedence graph has no cycle, then S is serializable.
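The test can also be written directly as code. The following Python sketch represents a schedule as a list of (transaction, action, item) triples (an assumed encoding, not from the original text), builds the precedence-graph edges, and checks them for a cycle with a depth-first search:

def precedence_edges(schedule):
    # An edge Ti -> Tj is added whenever an operation of Ti conflicts with
    # a later operation of Tj: same item, different transactions, and at
    # least one of the two operations is a write.
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if qi == qj and ti != tj and "w" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    # Depth-first search for a cycle in the precedence graph.
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    def dfs(node, path):
        if node in path:
            return True
        return any(dfs(nxt, path | {node}) for nxt in graph.get(node, []))
    return any(dfs(u, set()) for u in graph)

# A schedule is conflict serializable iff has_cycle(precedence_edges(S)) is False.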

Explanation:

Read(A): In T1, no subsequent writes to A, so no new edges
Read(B): In T2, no subsequent writes to B, so no new edges
Read(C): In T3, no subsequent writes to C, so no new edges
Write(B): B is subsequently read by T3, so add edge T2 → T3
Write(C): C is subsequently read by T1, so add edge T3 → T1
Write(A): A is subsequently read by T2, so add edge T1 → T2
Write(A): In T2, no subsequent reads to A, so no new edges
Write(C): In T1, no subsequent reads to C, so no new edges
Write(B): In T3, no subsequent reads to B, so no new edges

Precedence graph for schedule S1: edges T2 → T3, T3 → T1 and T1 → T2.

The precedence graph for schedule S1 contains a cycle (T1 → T2 → T3 → T1), which is why
schedule S1 is non-serializable.
Explanation:

Read(A): In T4, no subsequent writes to A, so no new edges
Read(C): In T4, no subsequent writes to C, so no new edges
Write(A): A is subsequently read by T5, so add edge T4 → T5
Read(B): In T5, no subsequent writes to B, so no new edges
Write(C): C is subsequently read by T6, so add edge T4 → T6
Write(B): B is subsequently read by T6, so add edge T5 → T6
Write(C): In T6, no subsequent reads to C, so no new edges
Write(A): In T5, no subsequent reads to A, so no new edges
Write(B): In T6, no subsequent reads to B, so no new edges

Precedence graph for schedule S2: edges T4 → T5, T4 → T6 and T5 → T6.

The precedence graph for schedule S2 contains no cycle, which is why schedule S2 is serializable.

7. Illustrate a serial schedule in which T1 is followed by T2, such that the values after final
execution are Rs 855 and Rs 2145, given that the current value of account A is Rs 1000 and
account B is Rs 2000?

Ans :

Given: the current balance of account A is Rs 1000 and account B is Rs 2000.

The values after the transactions are Rs 855 and Rs 2145 for account A and account B
respectively.

Let us assume that T1 transfers Rs 50 from A to B, and that T2, which runs after T1
completes, transfers 10% of A's remaining balance from A to B.

Therefore the transactions T1 followed by T2 using serial scheduling are as follows:

T1                       T2
Read(A)       → 1000
A := A - 50   → 950
Write(A)      → 950
Read(B)       → 2000
B := B + 50   → 2050
Write(B)      → 2050
                         Read(A)         → 950
                         Temp := A * 0.1 → 95
                         A := A - Temp   → 855
                         Write(A)        → 855
                         Read(B)         → 2050
                         B := B + Temp   → 2145
                         Write(B)        → 2145
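The arithmetic of the schedule can be verified with a few lines of Python:

A, B = 1000, 2000

# T1: transfer Rs 50 from A to B
A = A - 50      # 950
B = B + 50      # 2050

# T2: transfer 10% of A's balance from A to B
Temp = A * 0.1  # 95.0
A = A - Temp    # 855.0
B = B + Temp    # 2145.0

print(A, B)     # 855.0 2145.0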

8. Discuss the Basic steps of Query processing with example ?

Ans :

Query Processing means the entire process or activity of translating a query into low-level
instructions, optimizing the query to save resources, estimating or evaluating the cost of the
query, and extracting the data from the database.
Goal: To find an efficient Query Execution Plan for a given SQL query which minimizes
the cost considerably, especially time.
Cost factors: disk accesses (which typically consume time) and read/write operations (which
typically need resources such as memory/RAM).
The major steps involved in query processing are depicted in Figure 1 - Steps in Database
Query Processing, and discussed below.
Let us discuss the whole process with an example. Let us consider the following two relations as
the example tables for our discussion;

Employee(Eno, Ename, Phone)


Proj_Assigned(Eno, Proj_No, Role, DOP)
where,
Eno is Employee number,
Ename is Employee name,
Proj_No is Project Number in which an employee is assigned,
Role is the role of an employee in a project,
DOP is duration of the project in months.
With this information, let us write a query to find the list of all employees who are working in a
project which is more than 10 months old.
SELECT Ename
FROM Employee, Proj_Assigned
WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10;
Input:
A query written in SQL is given as input to the query processor. For our case, let us consider the
SQL query written above.
Step 1: Parsing
In this step, the parser of the query processor module checks the syntax of the query, the user’s
privileges to execute the query, the table names and attribute names, etc. The correct table
names, attribute names and the privilege of the users can be taken from the system catalog (data
dictionary).
Step 2: Translation
If we have written a valid query, then it is converted from the high-level language SQL into a
low-level representation in Relational Algebra.
For example, our SQL query can be converted into a Relational Algebra equivalent as follows;
πEname(σDOP>10 Λ Employee.Eno=Proj_Assigned.Eno(Employee X Proj_Assigned))
Step 3: Optimizer
Optimizer uses the statistical data stored as part of data dictionary. The statistical data are
information about the size of the table, the length of records, the indexes created on the table, etc.
Optimizer also checks for the conditions and conditional attributes which are parts of the query.
Step 4: Execution Plan
A query can be expressed in many ways. At this stage, the query processor module uses the
information collected in step 3 to find different relational algebra expressions that are
equivalent to the one we have written and that return the same result.
For our example, the query written in Relational Algebra can also be written as the one given
below;
πEname(Employee ⋈Eno (σDOP>10 (Proj_Assigned)))
So far, we have got two execution plans. The only condition is that both plans should give the
same result.
Step 5: Evaluation
Although the execution plans constructed through statistical data return the same result
(obviously), they differ in terms of the time needed to execute the query and the space
required to execute it. Hence, it is mandatory to choose the one plan which consumes
the least cost.
At this stage, we choose one execution plan of the several we have developed. This execution
plan accesses data from the database to give the final result.
In our example, the second plan may be good. In the first plan, we join two relations (a costly
operation) and then apply the condition (conditions are considered as filters) on the joined
relation. This consumes more time as well as space.
In the second plan, we filter one of the tables (Proj_Assigned) first, and the result is joined with
the Employee table. This join may need to compare fewer records. Hence, the second plan
is the better one (given the information known, though not always).
Output:
The result is shown to the user.
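For a hands-on view of steps 4 and 5, SQLite exposes the plan its optimizer chooses through EXPLAIN QUERY PLAN; the following Python sketch runs it on the example query (the exact output format varies across SQLite versions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee(Eno INTEGER PRIMARY KEY, Ename TEXT, Phone TEXT)")
conn.execute("CREATE TABLE Proj_Assigned(Eno INTEGER, Proj_No INTEGER, Role TEXT, DOP INTEGER)")

for row in conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT Ename
        FROM Employee, Proj_Assigned
        WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10"""):
    print(row)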

9. Discuss measures of query cost for Selection, Join and other operations,
with example.
Ans :
There are multiple possible evaluation plans for a query, and it is important to be able to compare
the alternatives in terms of their (estimated) cost, and choose the best plan. To do so, we must
estimate the cost of individual operations, and combine them to get the cost of a query evaluation
plan. Thus, as we study evaluation algorithms for each operation, we also outline how to estimate
the cost of the operation.
The cost of query evaluation can be measured in terms of a number of different resources,
including disk accesses, CPU time to execute a query, and, in a distributed or parallel database
system, the cost of communication.
In large database systems, the cost to access data from disk is usually the most important cost,
since disk accesses are slow compared to in-memory operations. Moreover, CPU speeds have
been improving much faster than disk speeds. Thus, it is likely that the time spent in disk
activity will continue to dominate the total time to execute a query. The CPU time taken for a
task is harder to estimate since it depends on low-level details of the execution code. Although
real-life query optimizers do take CPU costs into account, for simplicity we ignore CPU
costs here and use only disk-access costs to measure the cost of a query-evaluation plan.
There are alternative ways of evaluating a given query:
– Equivalent expressions
– Different algorithms for each operation
For example, the query of Question 8 has two equivalent expressions: joining Employee and
Proj_Assigned first and then applying the selection DOP > 10, or applying the selection to
Proj_Assigned first and then joining the (smaller) result with Employee. Likewise, the join
itself could be computed with different algorithms, such as a nested-loop join or a merge
join, each with a different cost.
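As a concrete illustration of measuring cost by disk accesses, the following sketch compares textbook-style estimates for a selection (the numbers b_r and h_i are made up for illustration):

b_r = 10_000  # number of disk blocks occupied by the relation (made up)
h_i = 3       # height of a B+-tree index on the selection attribute (made up)

cost_linear_scan = b_r         # a linear scan reads every block
cost_index_lookup = h_i + 1    # descend the index, then fetch one data block

print(cost_linear_scan, cost_index_lookup)  # 10000 vs 4 block accesses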
10. Consider the following schedule:
S = r1(x); r2(z); r3(x); r3(y); w1(x); w3(y); r2(y); w2(z); w2(y)
i) Show the cycle and the Precedence Graph. ii) Is it Conflict Serializable?

Ans:

Schedule:

T1        T2        T3
r(x)
          r(z)
                    r(x)
                    r(y)
w(x)
                    w(y)
          r(y)
          w(z)
          w(y)

For x: the operations on x are r1(x), r3(x) and w1(x). The only conflict between different
transactions is r3(x) before w1(x), which adds the edge T3 → T1.

For y: the operations on y are r3(y), w3(y), r2(y) and w2(y). The conflicts r3(y) before
w2(y), w3(y) before r2(y), and w3(y) before w2(y) all add the same edge T3 → T2.

For z: the operations on z are r2(z) and w2(z), both in T2, so no edge is added.

Precedence graph: vertices T1, T2 and T3, with edges T3 → T1 and T3 → T2.

This schedule is conflict serializable, as the precedence graph has no loop or cycle; an
equivalent serial order is T3, T1, T2.
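The result can be double-checked with the same kind of code sketched in Question 6, here written as a self-contained snippet:

S = [("T1", "r", "x"), ("T2", "r", "z"), ("T3", "r", "x"), ("T3", "r", "y"),
     ("T1", "w", "x"), ("T3", "w", "y"), ("T2", "r", "y"), ("T2", "w", "z"),
     ("T2", "w", "y")]
edges = {(ta, tb)
         for i, (ta, aa, qa) in enumerate(S)
         for (tb, ab, qb) in S[i + 1:]
         if qa == qb and ta != tb and "w" in (aa, ab)}
print(sorted(edges))  # [('T3', 'T1'), ('T3', 'T2')] -- no cycle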
