1. What is Query Optimization? Briefly discuss the techniques of Query Optimization with suitable examples.
Ans. Query optimization is of great importance for the performance of a relational database, especially for the execution of complex
SQL statements. A query optimizer decides the best methods for implementing each query.
The query optimizer selects, for instance, whether or not to use indexes for a given query, and which join methods to use when
joining multiple tables. These decisions have a tremendous effect on SQL performance, and query optimization is a key technology
for every application, from operational systems to data warehouse and analytical systems to content management systems.
There are several techniques to optimize query performance in SQL, and some useful practices to reduce the cost. But the process of
optimization is iterative: one needs to write the query, check query performance using IO statistics or the execution plan, and then
optimize it. This cycle needs to be followed iteratively for query optimization.
The SQL Server itself also finds the optimal and minimal plan to execute the
query.
Indexing
An index is a data structure used to provide quick access to the table based on a search key. It helps in minimizing the disk access to
fetch the rows from the database. An indexing operation can be a scan or a seek. An index scan traverses the entire index for the
required data, while an index seek goes directly to the qualifying rows through the search key.
For example (the SELECT and FROM clauses below are illustrative, completing the fragment in the source):
SELECT p.ProductID, pc.Name, sod.OrderQty
FROM Product p
JOIN ProductCategory pc
ON p.ProductCategoryID = pc.ProductCategoryID
JOIN SalesOrderDetail sod
ON p.ProductID = sod.ProductID
WHERE p.ProductID > 1
In the above query, a total of 99% of the query execution time goes into the index seek operation. Ensuring that a query can use an
index seek rather than a full scan is therefore an effective optimization.
2. Indexes should not be made on columns that are frequently modified, i.e., columns on which the UPDATE command is applied
frequently.
3. Indexes should be made on foreign keys where INSERT, UPDATE, and DELETE are concurrently performed. This allows such
operations and joins on the foreign key to be resolved through the index instead of a full table scan.
Selection
Only the rows and columns that are required should be selected, instead of selecting all of them. SELECT * is highly inefficient as it
scans all the columns of the table, including those that are not required.
To fetch distinct rows, the DISTINCT keyword can be used; to perform this task, it basically groups together related rows and then
removes duplicates. A GROUP BY operation is costly. So to fetch distinct rows and remove duplicate rows, one might instead use
more attributes in the SELECT operation.
Comparing the execution of such queries shows that the DISTINCT operation takes more time to fetch the unique rows. So, it is
better to add more attributes in the SELECT query to improve the performance and still get unique rows.
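The comparison above can be sketched as follows; the table and column names are illustrative:

```sql
-- Costly: DISTINCT must group and de-duplicate rows
SELECT DISTINCT City FROM Customers;

-- Often cheaper: include enough attributes (e.g. the key)
-- that the selected rows are already unique
SELECT CustomerID, City FROM Customers;
```

Including the key column makes duplicate elimination unnecessary, which is why adding attributes to the SELECT list can outperform DISTINCT.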
2. What do you understand by query optimization? What are query trees? Explain with an
example.
ANS:-
3. Consider the following two tables...... Consider the query i) Draw the query tree for the given
query.
4. List and explain the steps followed to process a high level query.
Query Processing would mean the entire process or activity which involves query translation into low level instructions, query
optimization to save resources, cost estimation or evaluation of query, and extraction of data from the database.
Goal: To find an efficient Query Execution Plan for a given SQL query which would minimize the cost considerably, especially time.
Cost Factors: Disk accesses [which typically consumes time], read/write operations [which typically needs resources such as
memory/RAM].
The major steps involved in query processing are as follows. When a query is placed, it is at first scanned, parsed and validated. An
internal representation of the query is then created, such as a query tree or a query graph. Then alternative execution strategies are
devised for retrieving results from the database tables. The process of choosing the most appropriate execution strategy for query
processing is called query optimization.
Relational algebra defines the basic set of operations of relational database model. A sequence of relational algebra operations forms
a relational algebra expression. The result of this expression represents the result of a database query.
In DDBMS, query optimization is a crucial task. The complexity is high, since the number of alternative strategies may increase
exponentially with the number of sites and relations involved.
Hence, in a distributed system, the target is often to find a good execution strategy for query processing rather than the best one. The
SQL queries are translated into equivalent relational algebra expressions before optimization. A query is at first decomposed into
smaller query blocks. These blocks are translated to equivalent relational algebra expressions. Optimization includes optimization of
each block and then optimization of the query as a whole.
11. Describe the architecture of distributed databases with the help of a diagram.
Distributed Database Architecture
A distributed database system allows applications to access data from local and remote
databases. In a homogenous distributed database system, each database is an Oracle Database.
In a heterogeneous distributed database system, at least one of the databases is not an Oracle
Database. Distributed databases use a client/server architecture to process information requests.
The section contains the following topics:
• Homogenous Distributed Database Systems
• Heterogeneous Distributed Database Systems
• Client/Server Database Architecture
Homogenous Distributed Database Systems
A homogenous distributed database system is a network of two or more Oracle Databases that
reside on one or more machines. Figure 31-1 illustrates a distributed system that connects three
databases: hq, mfg, and sales. An application can simultaneously access or modify the data in
several databases in a single distributed environment. For example, a single query from a
Manufacturing client on local database mfg can retrieve joined data from the products table on
the local database and the dept table on the remote hq database.
For a client application, the location and platform of the databases are transparent. You can also
create synonyms for remote objects in the distributed system so that users can access them with
the same syntax as local objects. For example, if you are connected to database mfg but want to
access data on database hq, creating a synonym on mfg for the remote dept table enables you to
issue this query:
SELECT * FROM dept;
In this way, a distributed system gives the appearance of native data access. Users on mfg do not
have to know that the data they access resides on remote databases.
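The synonym setup described above can be sketched in Oracle syntax; this assumes a database link named hq already exists from mfg to the remote database:

```sql
-- On mfg: create a synonym for the remote dept table at hq
CREATE SYNONYM dept FOR dept@hq;

-- Users on mfg can now query the remote table as if it were local
SELECT * FROM dept;
```

The synonym hides both the location and the link name from client applications, which is what gives the appearance of native data access.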
Figure 31-1 Homogeneous Distributed Database
13. List the functions, advantages and disadvantages of DDBMS (Distributed
Database Management Systems)
• Improved ease and flexibility of application development
• Developing and maintaining applications at geographically distributed sites of an organization is facilitated owing to
transparency of data distribution and control.
• Improved reliability and availability. When the data and DDBMS software are distributed over several sites, one site may fail while
other sites continue to operate. Only the data and software that exist at the failed site cannot be accessed. This is achieved by the
isolation of faults to their site of origin without affecting the other databases connected to the network.
• Improved performance.
• A distributed DBMS fragments the database by keeping the data closer to where it is needed most.
• Data localization reduces the contention for CPU and I/O services and simultaneously reduces access delays involved in wide
area networks.
• When a large database is distributed over multiple sites, smaller databases exist at each site. As a result, local queries and
transactions accessing data at a single site have better performance because of the smaller local databases.
• In addition, each site has a smaller number of transactions executing than if all transactions are submitted to a single
centralized database.
• Moreover, interquery and intraquery parallelism can be achieved by executing multiple queries at different sites, or by
breaking up a query into a number of subqueries that execute in parallel. This contributes to improved performance.
• Easier expansion
• In a distributed environment, expansion of the system in terms of adding more data, increasing database sizes, or adding more
processors is much easier than in a centralized system.
Functions of DDBMS
• The ability to access remote sites and transmit queries and data
among the various sites via a communication network.
• The ability to devise execution strategies for queries and transactions that access data from more than one site and to
synchronize the access to distributed data and maintain the integrity of the overall database.
• The ability to decide which copy of a replicated data item to access and to maintain the consistency of copies of a replicated
data item.
• The ability to recover from individual site crashes and from new types of failures, such as the failure of communication links.
• Security.
• Distributed transactions must be executed with the proper management of the security of the data and the authorization/access
privileges of users.
14. What are Mobile Databases? Explain the characteristics of mobile databases. Give an
application of mobile databases.
What is mobile computing?
• Users with portable computers still have network connections while they move.
• Mobile Computing is an umbrella term used to describe technologies that enable people to access network services anyplace,
anytime, and anywhere.
• Mobile data-driven applications enable us to access any data from anywhere, anytime.
Examples:
• Salespersons can update sales records on the move.
• Reporters can update the news database anytime.
• Doctors can retrieve a patient's medical history from anywhere.
Mobile DBMSs are needed to support these applications' data processing capabilities.
The characteristics of mobile computing include high communication latency, intermittent wireless connectivity, limited battery
life, and changing client location.
Latency is caused by the processes unique to the wireless medium, such as coding data for wireless transfer, and tracking
and filtering wireless signals at the receiver
Intermittent connectivity can be intentional or unintentional; unintentional disconnections happen in areas where wireless
signals cannot reach, e.g., elevator shafts or subway tunnels; Intentional disconnections occur by user intent, e.g., during an
airplane takeoff, or when the mobile device is powered down
Battery life is directly related to battery size, and indirectly related to the mobile device’s capabilities
Client locations are expected to change, which alters the network topology and may cause their data requirements to change
All these characteristics impact data management, and robust mobile applications must consider them
To compensate for high latencies and unreliable connectivity, clients cache replicas of important, frequently accessed data, and
work offline, if necessary; Besides increasing data availability and response time, caching can also reduce client power consumption
by eliminating the need to make energy-consuming wireless data transmission for each data access
The server may not be able to reach a client; A client may be unreachable because it is dozing – in an energy-conserving state in
which many subsystems are shut down – or because it is out of range of a base station; In either case, neither client nor server can
reach the other, and modifications must be made to the architecture in order to compensate for this case;
Proxies for unreachable components are added to the architecture; For a client (and symmetrically for a server), the
proxy can cache updates intended for the server; When a connection becomes available, the proxy automatically forwards
these cached updates to their ultimate destination
18. What is the difference between discretionary and mandatory access control?
DAC: stands for Discretionary Access Control.
MAC: stands for Mandatory Access Control.

DAC: the owner can determine the access and privileges and can restrict the resources based on the identity of the users.
MAC: the system alone determines the access, and the resources are restricted based on the clearance of the subjects.

DAC: users are provided access based on their identity, not using levels.
MAC: users are restricted based on their power and level of hierarchy.

DAC: has complete trust in users.
MAC: has trust only in administrators.

DAC: is vulnerable to Trojan horses.
MAC: prevents virus flow from a higher level to a lower level.
19. Discuss the types of privileges at the account level and those at the relation level.
There are two levels of privileges to be assigned for using the database system: the account level and the relation (or table) level.
• At the account level, each account holds particular privileges independently specified by the database administrator.
• At the relation level, the access privileges for each individual relation or view in the database are controlled by the database
administrator.
Account level. It includes:
1. CREATE SCHEMA or CREATE TABLE privilege, to create a schema or table.
2. CREATE VIEW privilege.
3. ALTER privilege, to perform changes such as adding or removing attributes.
4. DROP privilege, to delete relations or views.
5. MODIFY privilege, to insert, delete, or update tuples.
6. SELECT privilege, to retrieve information from the database.
Relation level:
• It refers to either a base relation or a view (virtual relation).
• Each type of command can be applied for each user by specifying the individual relation. The access matrix model, an
authorization model, is used for granting and revoking privileges.
20. What is the goal of encryption? What process is involved in encrypting data and then recovering it at the other end.
With more and more organizations moving to hybrid and multicloud environments, concerns are growing about public cloud security
and protecting data across complex environments. Enterprise-wide data encryption and encryption key management can help protect
data on-premises and in the cloud.
Cloud service providers (CSPs) may be responsible for the security of the cloud, but customers are responsible for security in the
cloud, especially the security of any data. An organization’s sensitive data must be protected, while allowing authorized users to
perform their job functions. This protection should not only encrypt data, but also provide robust encryption key management, access
control and audit logging capabilities.
Robust data encryption and key management solutions should offer:
• A centralized management console for data encryption and encryption key policies and configurations
• Encryption at the file, database and application levels for on-premise and cloud data
• Role and group-based access controls and audit logging to help address compliance
• Automated key lifecycle processes for on-premise and cloud encryption keys
Cryptography is the science of encoding information before sending via unreliable communication paths so that only an authorized
receiver can decode and use it.
The coded message is called cipher text and the original message is called plain text. The process of converting plain text to cipher
text by the sender is called encoding or encryption. The process of converting cipher text to plain text by the receiver is called
decoding or decryption.
The entire procedure of communicating using cryptography is as follows: the sender encrypts the plain text into cipher text using an
encryption key, transmits the cipher text over the communication path, and the receiver decrypts it back into plain text using the
corresponding decryption key.
21. What is flow control as a security measure? What type of flow control exist?
Distributed systems encompass a lot of data flow from one site to another and also within a site. Flow control prevents data from
being transferred in such a way that it can be accessed by unauthorized agents. A flow policy lists out the channels through which
information can flow. It also defines security classes for data as well as transactions. Two types of flow exist: explicit flows, in which
a value is directly assigned or transferred, and implicit flows, in which information leaks indirectly, for example through the control
path of a program.
22. Discuss what is meant by each of the following terms:
a) Database authorization
Authorization is the process where the database manager gets information about the authenticated user. Part of that information
is determining which database operations the user can perform and which data objects a user can access.
b) Access Control
Database access control is a method of allowing access to a company's sensitive data only to those people (database users) who
are allowed to access such data, and restricting access for unauthorized persons. It includes two main components: authentication
and authorization.
c) Data Encryption
Data encryption is a way of translating data from plaintext (unencrypted) to ciphertext (encrypted). Only users who hold the
corresponding decryption key can read the encrypted data, which protects it from unauthorized access.
d) Privileged (system) account
A privileged account is a login credential to a server, firewall, or another administrative account. Often, privileged accounts are
referred to as admin accounts. Your Local Windows Admin accounts and Domain Admin accounts are examples of admin accounts.
Other examples are Unix root accounts, Cisco enable, etc.
e) Database audit
Database auditing involves observing a database so as to be aware of the actions of database users. Database administrators and
consultants often set up auditing for security purposes, for example, to ensure that those without the permission to access information
do not access it.
f) Audit trail
Whenever an action is performed on the database resources, an audit trail of information is generated, including what database
object was impacted, who performed the operation, and when. If the DBMS supports a very high level of auditing, a record of what
actually changed might also be maintained.
g) Granting a privilege
The GRANT (privilege) statement grants privileges on the database as a whole or on individual tables, views, sequences or
procedures. It controls access to database objects, roles, and DBMS resources. Details about using the GRANT statement with role
objects are described in GRANT (role).
h) Revoking a privilege
The REVOKE SQL Definition Privileges authorization statement removes from one or more users or groups the privilege of
performing selected actions on a specified access module, schema, table, view, function, procedure or table procedure.
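A minimal sketch of granting and then revoking a privilege; the user name alice and the table employees are illustrative:

```sql
-- Grant alice permission to read and insert rows in employees
GRANT SELECT, INSERT ON employees TO alice;

-- Later, withdraw only the INSERT privilege
REVOKE INSERT ON employees FROM alice;
```

After the REVOKE, alice retains SELECT access but can no longer insert rows.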
i) Covert channels
A covert channel is any communication channel that can be exploited by a process to transfer information in a manner that
violates the system's security policy. In short, covert channels transfer information using non-standard methods against the system
design.
23. What is mixed fragmentation? Give an example.
Hybrid Data Fragmentation:
This is the combination of horizontal and vertical fragmentation. Horizontal fragmentation yields subsets of the rows to be
distributed over the database, and vertical fragmentation yields subsets of the columns of the table.
This type of fragmentation can be done in any order; it does not have any particular order and is based solely on user requirements.
But it should satisfy the fragmentation conditions (completeness, reconstruction, and disjointness). Consider the EMPLOYEE table
with the fragmentations below.
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used. This is the most flexible
fragmentation technique since it generates fragments with minimal extraneous information. However, reconstruction of the original
table is often an expensive task.
Hybrid fragmentation can be done in two alternative ways −
• At first, generate a set of horizontal fragments; then generate vertical fragments from one or more of the horizontal
fragments.
• At first, generate a set of vertical fragments; then generate horizontal fragments from one or more of the vertical
fragments.
SELECT EMP_ID, EMP_FIRST_NAME, EMP_LAST_NAME, AGE
FROM EMPLOYEE WHERE EMP_LOCATION = 'INDIA';
SELECT EMP_ID, DEPTID FROM EMPLOYEE WHERE EMP_LOCATION = 'INDIA';
SELECT EMP_ID, EMP_FIRST_NAME, EMP_LAST_NAME, AGE
FROM EMPLOYEE WHERE EMP_LOCATION = 'US';
SELECT EMP_ID, PROJID FROM EMPLOYEE WHERE EMP_LOCATION = 'US';
This is a hybrid or mixed fragmentation of EMPLOYEE table.
24. How is horizontal partitioning of a relation specified? How can a relation be put back together from a complete horizontal
partitioning?
Horizontal partitioning divides a table into multiple tables. Each table then contains the same number of columns, but fewer rows. For
example, a table that contains 1 billion rows could be partitioned horizontally into 12 tables, with each smaller table representing one
month of data for a specific year.
To horizontally partition a table, select a single table in a model, and click the Horizontal Partition icon on the Transformations
toolbar. Use the Horizontal Partitioning Wizard to:
▪ Specify how many partitioned tables to create.
▪ Enter a name for the partitioned tables.
▪ Enter, for notational purposes only, criteria for how you place rows from the table you choose to partition into the new
partitions. You can enter a script (SQL SELECT statement) and store the text for annotation purposes.
Result of Horizontally Partitioning a Table
When you click Horizontal Partition to horizontally partition a table, you:
▪ Create a new table for each partition that you specify. Each partitioned table contains all primary key and non-key
columns from the source table. The primary key of each partitioned table is the primary key of the source table. The
partitioned tables appear in the Model Explorer under the Tables folder.
▪ Create all relationships associated with the source table that you horizontally partition, and preserve all migrating keys.
▪ Preserve the properties from the source columns. The properties from the source table are not preserved.
The primary key is duplicated in every partition to allow the original table to be reconstructed; applying the UNION operation to all
of the horizontal fragments rebuilds the complete relation.
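Reconstruction from a complete horizontal partitioning can be sketched in SQL; the monthly partition names are illustrative:

```sql
-- Rebuild the full table from its horizontal partitions.
-- UNION ALL is used because the fragments are disjoint,
-- so no duplicate elimination is needed.
SELECT * FROM sales_2023_01
UNION ALL
SELECT * FROM sales_2023_02
UNION ALL
SELECT * FROM sales_2023_03;
```

Plain UNION would also work but would pay an unnecessary duplicate-removal cost when the fragments are disjoint.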
25. How is vertical partitioning of a relation specified? How can a relation be put back together from a complete vertical
partitioning?
Ans. Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns.
Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns
even when already normalized. The primary key is duplicated in each fragment to allow the original table to be reconstructed, using a
join operation on the primary key.
26. Consider the following relation: i) Give an example of two simple predicates that would be meaningful for the relation for
horizontal partitioning. ii) Give an example of two simple predicates that would be meaningful for the relation for vertical
partitioning.
Ans. Horizontal: Let R be a relation, and A1, ..., An be its attributes with the corresponding domains Dom(A1), ..., Dom(An). A
predicate represents a pure boolean expression over the attributes of a relation R and constants of the attributes' domains. An atomic
predicate p is a relationship among attributes and constants of a relation. For example, (A1 < A2) and (A3 >= 5) are atomic
predicates. Then, the set of all predicates over a relation R is:
φ ::= p | ¬φ | φ1 ∧ φ2 | φ1 ∨ φ2
We define horizontal partitioning as a pair (R, φ), where R is a relation and φ is a predicate, which partitions R into at most 2
fragments (sub-relations) with the identical structure (i.e. the same set of attributes), one per each truth value of φ. The first fragment
includes all tuples t of R which satisfy φ, i.e. t ⊨ φ. The second fragment includes all tuples t of R which do not satisfy φ, i.e. t ⊭ φ. It
is possible for one of the fragments to be empty if all tuples of R either satisfy or do not satisfy φ. Note that the partitioning (R, φ) is
identical to (R, ¬φ). If we apply the predicate true (or false) to a relation, then it remains undivided.
Example 1. Let R = (A1 int, A2 int, A3 date) be a relation. It can be divided into 2 partitions by using one of the following predicates:
– φ = (A1 = A2), which results in a fragment where the values of A1 and A2 are equal for all tuples, and a fragment where those
values are different.
– φ = (A3 >= '01-01-07') ∧ (A3 …
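Reconstruction of a complete vertical partitioning can be sketched in SQL; the fragment and column names are illustrative:

```sql
-- Each vertical fragment keeps the primary key EMP_ID,
-- so an equi-join on it rebuilds the original EMPLOYEE rows.
SELECT f1.EMP_ID, f1.EMP_FIRST_NAME, f1.EMP_LAST_NAME,
       f2.DEPTID, f2.PROJID
FROM EMPLOYEE_FRAG1 f1
JOIN EMPLOYEE_FRAG2 f2 ON f1.EMP_ID = f2.EMP_ID;
```

This is why the primary key must be duplicated into every vertical fragment: without it, the rows could not be matched back up.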
29. How do spatial databases differ from regular databases? What are the different types of spatial
data?
Spatial database: it answers where things are.
Regular database: it answers what and how much things are.

Spatial database: it describes the absolute and relative location of geographical objects.
Regular database: it describes characteristics of geographical features that are qualitative or quantitative in nature.

Spatial database: satellite maps and scanned images help to obtain spatial data.
Regular database: forest managers, fire departments, environmental groups, and online media help to obtain non-spatial data.

Spatial database: relationships among spatial attributes are implicit. For example, boundaries 1 and 2 could be neighbours, but this
cannot be explicitly represented.
Regular database: relationships among non-spatial attributes are explicit. For example, two different attributes may be a part of, a
subclass of, a member of, or represented in the form of arithmetic values or orders.

Spatial database: types of spatial data are Raster Data (composed of grids or pixels and identified by rows and columns) and Vector
Data (composed of points, lines, and polygons).
Regular database: types of non-spatial data are Nominal Data, Ordinal Data, Interval Data, and Ratio Data.

Spatial database: examples of spatial data are maps, photographs, satellite images, scanned images, roads, rivers, contours, etc.
Regular database: examples of non-spatial data are names, phone numbers, area, postal code, rainfall, population, etc.
30. What is an Object identifier? Explain with an example. What are its advantages and
disadvantages?
An identifier is a string of characters (up to 255 characters in length) used to identify first-class Snowflake “named” objects,
including table columns:
• Identifiers are specified at object creation time and then are referenced in queries and DDL/DML statements.
• Identifiers can also be defined in queries as aliases (e.g. SELECT a+b AS "the sum";).
Object identifiers, often simply referred to as object names, must be unique within the context of the object type and the “parent”
object:
Account
Identifiers for account objects (users, roles, warehouses, databases, etc.) must be unique across the entire account.
Databases
Identifiers for schemas must be unique within the database. To enable resolving schemas that have the same identifiers across
databases, Snowflake supports fully-qualifying the schema identifiers in the form of:
<database_name>.<schema_name>
Schemas
Identifiers for schema objects (tables, views, file formats, stages, etc.) must be unique within the schema. To enable resolving
objects that have the same identifiers in different databases/schemas, Snowflake supports fully-qualifying the object identifiers
in the form of:
<database_name>.<schema_name>.<object_name>
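Fully-qualified object identifiers can be used directly in queries; the database, schema and table names below are illustrative:

```sql
-- Resolves the table unambiguously, even when other schemas
-- or databases contain an object with the same name
SELECT * FROM sales_db.public.orders;
```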
Tables
Identifiers for objects contained in a table (such as columns and constraints) must be unique within the table.
● It possesses consolidated historical data, which helps the organization to analyze its business.
● A data warehouse helps executives to organize, understand, and use their data to make strategic decisions.
1. Non-technical people: business users are the non-technical people who need to gather information in a summarized, elementary
form.
2 FEATURES
The four characteristics of a data warehouse, also called features of a data warehouse, include SUBJECT ORIENTED, TIME
VARIANT, INTEGRATED and NON-VOLATILE.
The three prominent ones among these are INTEGRATED, TIME VARIANT and NON-VOLATILE. Subject oriented, on the other
hand, is a unique feature of the data warehouse. These features differentiate a data warehouse from any other set of databases or data
by characterization.
1. Subject Oriented
Analysis of the data for the decision makers of a business can be done easily by restricting it to a particular subject area of the data
warehouse. This makes understanding and analysis of the data concise and straightforward by excluding information that is not
needed for decision-making on that subject. This means that the ongoing operations of an organization are not taken into
consideration.
2. Integrated
Data warehouses consist of data from different variable sources integrated under one platform. This data obtained is extracted and
transformed maintaining uniformity without depending on the source it was obtained from, this feature is known as Integrated.
Standards are established which are universally acceptable for the data present in the warehouse.
3. Time Variant
One of the important properties of the data warehouse is the historical perspective it holds. It keeps the huge volume of data from all
databases stored in accordance with the elements of time. It consists of a temporal element and an extensive time horizon. The
inability to change the element of time is an essential aspect of time variance. A record key is used to display time variance.
4. Non-Volatile
Data is loaded into the data warehouse in batches, protecting it from momentary changes. This means that once data is fed in, no
alteration or changes can be made. This inability to be erased is called the non-volatile character of the data warehouse environment:
the data is read-only and allows only two functions to be performed, access and loading.
https://www.geeksforgeeks.org/difference-between-database-system-and-data-warehouse/
https://www.geeksforgeeks.org/multidimensional-data-model/
https://www.geeksforgeeks.org/difference-between-olap-and-oltp-in-dbms/
A schema is a logical description of the entire database. It includes the name and description of records of all record types, including
all associated data-items and aggregates. Much like a database, a data warehouse is also required to maintain a schema. A database
uses the relational model, while a data warehouse uses the Star, Snowflake, and Fact Constellation schemas.
A Star Schema in a data warehouse is one in which the center of the star has one fact table and a number of associated dimension
tables. It is known as a star schema because its structure resembles a star. The Star Schema data model is the simplest type of data
warehouse schema.
In the following Star Schema example, the fact table is at the center which contains keys to every dimension table like Dealer_ID,
Model ID, Date_ID,
Product_ID, Branch_ID & other attributes like Units sold and revenue.
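A typical query against such a star schema joins the central fact table to its dimension tables; the table and column names below are illustrative:

```sql
-- Total units sold and revenue per dealer, sliced by the date dimension
SELECT d.DealerName,
       SUM(f.UnitsSold) AS TotalUnits,
       SUM(f.Revenue)   AS TotalRevenue
FROM SalesFact f
JOIN DealerDim d ON f.Dealer_ID = d.Dealer_ID
JOIN DateDim   t ON f.Date_ID   = t.Date_ID
WHERE t.Year = 2023
GROUP BY d.DealerName;
```

The fact table supplies the measures (units sold, revenue) while the dimension tables supply the attributes used for filtering and grouping.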
Snowflake Schema in data warehouse is a logical arrangement of tables in a multidimensional database such that the ER diagram
resembles a snowflake shape. A Snowflake Schema is an extension of a Star Schema, and it adds
additional dimensions. The dimension tables are normalized which splits data
into additional tables.
In the following Snowflake Schema example, Country is further normalized
into an individual table.
Fact Constellation Schema:
● A fact constellation schema has multiple fact tables, for example sales and shipping.
● The sales fact table is the same as that in the star schema.
● The shipping fact table has five dimensions, namely item_key, time_key, shipper_key, from_location, to_location.
● The shipping fact table also contains two measures, namely dollars sold and units sold.
● It is also possible to share dimension tables between fact tables. For example, the time, item, and location dimension tables
are shared between the sales and shipping fact tables.
Data Marts
https://www.javatpoint.com/data-warehouse-what-is-data-mart
● A data mart is a subordinate of the data warehouse which helps in providing output for a specific group of users, such as a
single department or line of business.
https://www.geeksforgeeks.org/characteristics-and-functions-of-data-warehouse/