
Week 9 - Tutorial

Database Performance Tuning and Query Optimization

1. What is SQL performance tuning?

SQL performance tuning is the process of making SQL queries run faster and use fewer
resources on the server side. Well-written queries return the required information from
the database quickly and place less load on the server.

2. What is the focus of most performance tuning activities, and why does that focus exist?

Most performance tuning activities focus on minimizing the number of input/output (I/O)
operations, because these operations are much slower than reading data directly from the
data cache. Reducing I/O improves the speed and resource utilization of the database and
the applications that use it, which in turn improves user experience and system
throughput.

3. How are database statistics obtained?

Database statistics are measurements gathered by a Database Management System (DBMS)
that describe the characteristics and performance of database objects. These statistics
cover tables, indexes, and system resources such as the number of processors, processor
speed, and available temporary space. They play an important role in optimizing query
processing efficiency and overall database performance.

Database statistics can be gathered either manually by the database administrator (DBA)
or automatically by the DBMS itself. To gather them manually, the DBA runs commands
provided by the DBMS. For example, many DBMS vendors support an ANALYZE command for
collecting statistics, although the exact syntax varies between products. Many DBMSs
also offer built-in mechanisms for automatic statistics collection.
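
As a minimal sketch of manual statistics collection, the Python snippet below uses SQLite (through the standard sqlite3 module) as a stand-in DBMS. The employee table, its columns, and the index name are made up for illustration. In SQLite, ANALYZE writes its results to the sqlite_stat1 table; other products store and expose statistics differently.

```python
import sqlite3

def gather_statistics(conn):
    """Run ANALYZE, then return the rows SQLite keeps in sqlite_stat1."""
    conn.execute("ANALYZE")
    return conn.execute(
        "SELECT tbl, idx, stat FROM sqlite_stat1 ORDER BY idx"
    ).fetchall()

# Hypothetical table and index, used only to have something to analyze.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_areacode TEXT)"
)
conn.execute("CREATE INDEX idx_emp_areacode ON employee (emp_areacode)")
conn.executemany(
    "INSERT INTO employee (emp_areacode) VALUES (?)",
    [("615",)] * 6 + [("901",)] * 4,
)

stats = gather_statistics(conn)
for tbl, idx, stat in stats:
    # stat is a text field: the row count, followed by the average number
    # of rows per distinct value of the indexed column.
    print(tbl, idx, stat)
```

The optimizer reads these figures when it chooses between access paths, so they should be refreshed after large data changes.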

4. If indexes are so important, why not index every column in every table? (Include a brief
discussion of the role played by data sparsity).

Indexing every column in every table may seem like a simple way to improve query
performance, but it has major drawbacks. Although indexes are essential for fast data
access, each one adds storage, processing, and maintenance costs. Some of the reasons why
indexing every column in every table is not common practice are listed below:
• Index maintenance overhead: Each index stores key values and pointers to the
corresponding table rows, and it must be updated every time a row is inserted,
updated, or deleted in the base table. Indexing every column in every table
multiplies this work, slows down data modifications, and requires more storage.

• Performance: If there are many indexes, the DBMS takes more time to evaluate and
choose between the possible access paths, which can lengthen query optimization and
lead to slower queries.

• Storage overhead: Indexes use additional disk space, which can be significant,
especially for tables with many rows and columns. Indexing every column in every
table increases the amount of index data to manage, which leads to higher storage
costs and resource usage.

• Data sparsity: Data sparsity refers to the number of different values in a column.
It helps to determine whether indexing a particular column would be beneficial.
Columns with low sparsity, which have a limited number of distinct values, may not
benefit significantly from indexing. For example, a column like marital status has
only a few distinct values and therefore low sparsity. On the other hand, columns
with high sparsity, such as an email address column whose values are highly varied,
can benefit from indexing because an index narrows a search to very few rows. By
analyzing the sparsity of each column, administrators can prioritize indexing those
with high sparsity and avoid unnecessary indexes on columns with low sparsity,
which optimizes resource utilization within the database system.
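
The sparsity comparison above can be measured directly. The sketch below counts distinct values per column with SQLite through Python's sqlite3 module. The customer table, its two columns, the sample data, and the helper name column_sparsity are all assumptions made for illustration.

```python
import sqlite3

def column_sparsity(conn, table, column):
    """Return (distinct_values, total_rows, ratio) for one column.

    table and column are pasted into the SQL text, so they must be
    trusted identifiers, never user input.
    """
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    ratio = distinct / total if total else 0.0
    return distinct, total, ratio

# Hypothetical sample data: unique emails, two marital status values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (email TEXT, marital_status TEXT)")
rows = [
    (f"user{i}@example.com", "married" if i % 2 else "single")
    for i in range(100)
]
conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)

for col in ("email", "marital_status"):
    # A ratio near 1.0 means high sparsity (a good index candidate);
    # a ratio near 0.0 means low sparsity.
    print(col, column_sparsity(conn, "customer", col))
```

Here email has one distinct value per row, while marital_status has only two values across all rows, so an index on email is far more selective.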

5. Most query optimization techniques are designed to make the optimizer’s work
easier. What factors should you keep in mind if you intend to write conditional
expressions in SQL code?

When writing conditional expressions in SQL code, the following points should be
considered to optimize query performance:

• Use simple columns or literals as operands in a conditional expression. Avoid
conditional expressions that apply functions to columns whenever possible.

• Numeric field comparisons are faster than character, date, and NULL comparisons.
For example, comparing a numeric field such as Age to a literal value (Age = 30) is
faster than comparing a date field or a character field.

• Equality comparisons are generally faster than inequality comparisons. For example,
a comparison like Quantity > 100 is slower than a comparison like Quantity = 100.

• Transform conditional expressions to use literals. For example, rewrite an
expression like Quantity - 10 = 5 as Quantity = 15 to simplify and optimize the
query.

• When using multiple conditional expressions, write the equality conditions first,
because equality conditions are faster to process.

• If there are multiple AND conditions, write the condition most likely to be false
first.

• Avoid using the NOT logical operator whenever possible. For example, instead of
using NOT (Quantity > 100), we should use Quantity <= 100.
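
To illustrate the advice on literals, the sketch below asks SQLite for its query plan for Quantity - 10 = 5 and for the equivalent Quantity = 15. The order_line table, its index, and the sample data are hypothetical, and the behaviour shown is SQLite's. Other optimizers may rewrite such expressions on their own.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Hypothetical table with an index on the column used in the condition.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_line ("
    "line_id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER)"
)
conn.execute("CREATE INDEX idx_order_line_quantity ON order_line (quantity)")
conn.executemany(
    "INSERT INTO order_line (product, quantity) VALUES (?, ?)",
    [(f"P{i}", i % 50) for i in range(500)],
)

# Arithmetic on the column hides it from the index: full table scan.
expression_sql = "SELECT * FROM order_line WHERE quantity - 10 = 5"
# The same condition against a plain literal can use the index.
literal_sql = "SELECT * FROM order_line WHERE quantity = 15"

expression_plan = plan_details(conn, expression_sql)
literal_plan = plan_details(conn, literal_sql)
print("expression:", expression_plan)
print("literal:   ", literal_plan)
```

Both statements return the same rows, but only the second lets SQLite search the index instead of scanning the whole table.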

6. What does RAID stand for, and what are some commonly used RAID levels?

RAID stands for Redundant Array of Independent Disks. It is used in computer storage to
combine multiple physical disk drives into a single logical unit for the purpose of data
redundancy, performance enhancement, or both. RAID systems are designed to balance
performance and fault tolerance by distributing data across multiple disks in various
configurations.

RAID systems support various RAID levels, each of which offers a different
configuration for data redundancy and performance enhancement. Some commonly used
RAID levels are:

RAID 0: It utilizes striping (data blocks are spread over separate drives) without
redundancy. It is used purely for performance improvement.

RAID 1: It utilizes mirroring for data redundancy. Here the same data blocks are written
to separate drives. It provides data redundancy and fault tolerance because each disk in
the array contains an identical copy of the data.

RAID 3: It utilizes striping with dedicated parity, where data is distributed across disks
and parity information is stored on a dedicated disk for fault tolerance. It offers good read
performance for sequential data access but may suffer in write performance due to parity
calculation.

RAID 5: It utilizes striping with distributed parity, providing both performance and fault
tolerance.
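
The parity used by RAID 3 and RAID 5 is typically a bytewise XOR of the data blocks in a stripe. The sketch below shows how a lost block can be rebuilt from the surviving blocks plus the parity block. The block contents and the helper name xor_blocks are made up for illustration, and real controllers work on much larger blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)
    )

# One stripe spread over three hypothetical data drives.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second drive: XOR of the survivors and the
# parity block yields the missing block again.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```

The difference between the two levels is where the parity lives: RAID 3 keeps it on one dedicated disk, while RAID 5 rotates it across all disks.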

7. Answer questions 7 (a) and (b), based on the following query:

SELECT EMP_LNAME, EMP_FNAME, EMP_AREACODE, EMP_SEX
FROM EMPLOYEE
WHERE EMP_SEX = 'F' AND EMP_AREACODE = '615'
ORDER BY EMP_LNAME, EMP_FNAME;

(a) What is the likely data sparsity of the EMP_SEX column?

The query filters the EMP_SEX column for employees of a specific gender (female).
Most likely there are only two distinct values in the EMP_SEX column: M (male)
and F (female). The data sparsity of the EMP_SEX column is therefore low, as
this column has only two possible values.

(b) What indexes should you create? Write the required SQL commands.

To optimize the query performance for the given SQL statement, we should create the
following indexes. EMP_SEX is not indexed because of its low sparsity.

The first index, idx_EMP_AREACODE, covers the EMP_AREACODE column used in the WHERE
clause.

The second index, idx_EMP_LNAME_FNAME, covers the columns EMP_LNAME and EMP_FNAME
used in the ORDER BY clause.

SQL commands:

CREATE INDEX idx_EMP_AREACODE ON EMPLOYEE (EMP_AREACODE);

CREATE INDEX idx_EMP_LNAME_FNAME ON EMPLOYEE (EMP_LNAME, EMP_FNAME);
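
As a rough check, the sketch below builds a small EMPLOYEE table in SQLite, creates one index on EMP_AREACODE and one on (EMP_LNAME, EMP_FNAME), and runs the query. The EMP_NUM key column, the sample rows, and the index names ex_idx_areacode and ex_idx_name are assumptions made for illustration. Which index the optimizer actually picks depends on the DBMS and on its statistics, so the plan is printed rather than assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EMP_NUM is a hypothetical key column; the other columns come from the query.
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    "EMP_NUM INTEGER PRIMARY KEY, EMP_LNAME TEXT, EMP_FNAME TEXT, "
    "EMP_AREACODE TEXT, EMP_SEX TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE (EMP_LNAME, EMP_FNAME, EMP_AREACODE, EMP_SEX) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Smith", "Anne", "615", "F"),
        ("Jones", "Mary", "615", "F"),
        ("Jones", "Beth", "615", "F"),
        ("Brown", "Carl", "615", "M"),
        ("Adams", "Dana", "901", "F"),
    ],
)

# One index for the WHERE column, one for the ORDER BY columns.
conn.execute("CREATE INDEX ex_idx_areacode ON EMPLOYEE (EMP_AREACODE)")
conn.execute("CREATE INDEX ex_idx_name ON EMPLOYEE (EMP_LNAME, EMP_FNAME)")

QUERY = (
    "SELECT EMP_LNAME, EMP_FNAME, EMP_AREACODE, EMP_SEX "
    "FROM EMPLOYEE "
    "WHERE EMP_SEX = 'F' AND EMP_AREACODE = '615' "
    "ORDER BY EMP_LNAME, EMP_FNAME"
)
result = conn.execute(QUERY).fetchall()
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)

print(result)
print(plan)
```

Comparing the printed plan before and after creating each index is a simple way to see whether a given index is used at all.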
