You are on page 1of 59

SQL: Queries, Schemas, and Joins-

OH MY!!!
Geoff Noel
You may ask yourself
Well, How did I get
here.
David Byrne Once in a lifetime



Databases come in all different shapes and sizes. They can be flat files of ASCII
data (like Access or Q&A) or complex binary tree structures (Oracle or Sybase).
In any form, a database is a data store, or a place that holds data.

If a database is simply a collection of data, then what keeps track of
changes to this data?

That is the job of the database management system, or DBMS. Some DBMSs are
relational. Those are RDBMS. The relational part refers to the fact that separate
collections of data within the reaches of the RDBMS can be looked at together in
unison. The RDBMS is responsible for ensuring the integrity of the database.
Sometimes, things will get out of whack and the RDBMS will keep all that data in
line.
Databases Overview
What Is a Database?

What is a database and database management system.
A database is a collection of related data.
A database management system (DBMS) is a collection of programs that enables
users to create and maintain a database.

Why use a database management system?
To control redundancy.
To restrict unauthorized access.
To provide persistent storage for program objects and data structures.
To permit inference and actions using rules.
To provide multiple user interfaces.
To representing complex relationships among data.
To enforce integrity constraints.
To provide backup and recovery.
Database Client/Server Methodology

The evolution of relational data storage began in 1970 with the work of Dr.
E. F. Codd, who proposed a set of 12 rules for identifying relationships
between pieces of data. Codd's rules formed the basis for the development
of systems to manage data. Today, Relational Database Management
Systems (RDBMS) are the result of Codd's vision.

Data in an RDBMS are stored as rows of distinct information in tables. A
structured language is used to query (retrieve), store and change the data.
The Structured Query Language (SQL) is an ANSI standard, and all major
commercial RDBMS vendors provide mechanisms for issuing SQL
commands.
Single-Tier Database Design
The early development of RDBMS applications utilized an integrated model of user interface code,
Application code and database libraries. This single binary model ran only on a local machine,
typically a mainframe. The applications were simple but inefficient and did not work over LANs. The
model did not scale the application and user interface code was tightly coupled to the database
libraries.
The monolithic single-tier database design
The two-tier database design
Two-Tier Database Design

Two-tier model appeared with the advent of server technology. Communication-protocol development and
extensive use of local and wide area networks allowed the database developer to create an application front
end that accessed data through a connection (socket) to the back-end server. A two-tier database design,
where the client software is connected to the database through a socket connection.
Client programs (applying a user interface) send SQL requests to the database server. The server returns the
appropriate results, and the client is responsible for the formatting and display of the data. Clients still use a
vendor-provided library of functions that manage the communication between client and server. Most of these
libraries are written in either the C language or Perl.
Commercial database vendors realized the potential for adding intelligence to the database server. They
created proprietary techniques that allowed the database designer to develop macro programs for simple
data manipulation. These macros, called stored procedures, can cause problems relating to version control
and maintenance. Because a stored procedure is an executable program living on the database, it is possible
for the stored procedure to attempt to access named columns of a database table after the table has been
changed.
For example, if a column with the name id is changed to cust_id, the meaning of the original stored
procedure is lost. The advent of triggers, which are stored procedures executed automatically when some
action (such as insert) happens with a particular table or tables, can compound these difficulties when the
data returned from a query are not expected. Again, this can be the result of the trigger reading a table
column that has been altered.
The vendor-provided library limits them. Switching from one
database vendor to another requires a rewrite of a significant
amount of code to the client application.

Version control is an issue. When the vendor updates the client-side
libraries, the applications that utilize the database must be
recompiled and redistributed.

Vendor libraries deal with low-level data manipulation. Typically,
the base library only deals with queries and updates of single rows
or columns of data. This can be enhanced on the server side by
creating a stored procedure, but the complexity of the system then
increases.

All of the intelligence associated with using and manipulating the data is
implemented in the client application, creating large client-side runtimes. This
drives up the cost of each client set.

Limitations of Two-Tier Database Design
Three-Tier Database Design
In a multi-tier design, the client communicates with an intermediate server that provides a
Layer of abstraction from the RDBMS. The intermediate layer is designed to handle multiple
Client requests and manage the connection to one or more database servers. There does not
have to be just three tiers, but conceptually this is the next step.
Three-Tier Database Design
Advantages of Three-Tier Design

It is multithreaded to manage multiple client connections simultaneously.

It can accept connections from clients over a variety of vendor-neutral
protocols (from HTTP to TCP/IP), then hand off the requests to the
appropriate vendor-specific database servers, returning the replies to the
appropriate clients.

It can be programmed with a set of "business rules" that manage the
manipulation of the data. Business rules could include anything from
restricting access to certain portions of data to making sure that data is
properly formatted before being inserted or updated.

It prevents the client from becoming too heavy by centralizing process-
intensive tasks and abstracting data representation to a higher level.
It isolates the client application from the database system and frees a
company to switch database systems without having to rework the business
rules.
It can asynchronously provide the client with the status of a current data
table or row.
Different RDBMS - History
DB2/UDB
NCR Teradata
Ingres
Informix
Sybase
MySQL
Gupta/Centura -SQLbase
DBase
Paradox . . . Many others


Other
Databases
7.x
8.0.x
8.1.x ( aka 8i)
9.2
10G
Oracle
Sybase/ SQL Server 6.0
6.5
7.0
2000
2005
MS SQL Server
Various Methods of Connection
ODBC
JDBC
Native
OLE/ADO
BCP
SQL-NET
SQL-LOADER
HPL
Tools

I-SQL
SQL Worksheet
Enterprise Manager
Toad
DB Artisan
Query Analyzer
Win SQL
O-SQL
SQL-Plus


Enterprise Manager
DB Artisan
MS Query
Access
Crystal
Data Dictionary
ERD (Entity
Relationship Diagram)

Schema By Object Type
Schema By Owner
SQLbase SQL Talk utility
SQLbase SQL COnsole
What is a Schema ?
Pronounced -> skee-ma.

The structure of a database system, described in a formal language supported by
the database management system (DBMS). In a relational Database (RDBMS),
the schema defines the tables, the fields in each table, and the
Relationships between fields and tables. Schemas are generally stored in a data
dictionary. Although a schema is defined in text database language, the term is
often used to refer to a graphical depiction of the database structure.

Database Objects
DDL- data definition
Tables
Columns
Data-types
Indexes
Primary key
Foreign key
Views
Table-spaces
Partitions
Constraints
Synonyms
DML- data manipulation
Packages
Triggers
Procedures
Functions
Constraints
Sequences
Database actions
S Select .. > Queries
U - Update
I - Insert
D - Delete
Table Basics
A relational database system
contains one or more objects
called tables. The data or
information for the database are
stored in these tables. Tables are
uniquely identified by their names
and are comprised of columns and
rows. Columns contain the column
name, data type, and any other
attributes for the column. Rows
contain the records or data for the
columns. Here is a sample table
called "weather".

city, state, high, and low are the
columns. The rows contain the
data for this table:
Weather
city state high low
Phoenix Arizona 105 90
Tucson Arizona 101 92
Flagstaff Arizona 88 69
San Diego California 77 60
Albuquerque New Mexico 80 72
Primary Key (PK)
A primary key is a table
column that can be used to
uniquely identify every row of
the table. Any column that
has this property will do --
these columns are called
candidate keys. A table can
have many candidate keys
but only one primary key. The
primary key cannot be null.



FooNumber FirstName LastName BarTab
21 Fred Jones 47
32 Bill Smith 23
87 Wendy Jones -
32 Bob Stikino 943
Composite Primary Key
A is a primary key consisting of more than one column. In the above
example, the combinations (RecordNo,FirstName), (RecordNo,Lastname),
(RecordNo,FirstName,Lastname), and (FirstName,LastName) are all
candidate keys. Any combination including Age is not a candidate key
because it contains a null.

Often, database designers add an extra column to their table designs, a
column defined as an integer, which will hold a number. In Microsoft
Access, this is an autonumber, in MySQL it's an auto-increment, in Oracle
it's a sequence, and in SQL/Server it's an identity column. As these names
suggest, this integer is automatically assigned by the database, usually
incrementally, sometimes using an initial value and increment that you can
specify. Some databases allow these numbers to be generated randomly.

The purpose of this type of automatically generated number is to act as the
surrogate primary key, usually in those situations similar to the above
where candidate keys are multi-column. The awkwardness of a multi-
column candidate key becomes apparent as soon as you define a foreign
key on it.
A foreign key is a column, or
combination of columns, that contain
values that are found in the primary key
of some table (including, possibly, itself).
A foreign key may be null, and almost
always is not unique.That last statement
may be counterintuitive, so let's take
another example. Here we have two
tables that are related via a foreign key

As you can see, the ClrFK column in the
second table is a foreign key to the ClrPK
primary key in the first table. Notice that
the ClrPK values are unique and not null,
but the ClrFK values may be null and
often repeat. A null foreign key means
that that particular row does not
participate in the relationship. The fact
that many foreign key values repeat
simply reflects the fact that it's a one-to-
many relationship.

In a one-to-many relationship, the
primary key has the "one" value, and the
foreign key has the "many" values. The
trick to remembering this is to keep in
mind that the primary key must be
unique.
ClrPK ColourName
10 yellow
20 red
30 green
40 blue
AdjPK AdjectiveName ClrFK
904 angry 20
913 envious 30
937 lazy -
941 lonely 40
954 fearful 10
979 jealous 30
991 furious 20
Foreign Key (FK)
It Aint nothing its NULL
Null means either "don't know" or "not
applicable" -- it's not really the same as zero or
any other default value, but more importantly,
null is treated quite differently from other values
in SQL, because it literally has no value.

Here's an example of a "don't know" null ->

As you can see, Fred's value is null, which you
could interpret as meaning that Fred didn't take
the test (maybe he has a medical exemption and
will take the test another day). It would be wrong
to assign a zero, because that would be
interpreted as Fred having taken the test and not
getting a single answer right!
Student TestResult
Joe 87
Bill 73
Mary 56
Fred null
Sam 92
Now consider the following query

SELECT AVG(TestResult) FROM Students;

Aggregate functions like AVG() and SUM() ignore nulls, so this query will return (87+73+56+92)/4=77,
which is certainly better than 87+73+56+0+92)/5=61.6 which you'd get using a zero default. Often a default
value is just wrong for a column where you expect to take aggregates.


An example of a column that would take a "not applicable" null is Date Terminated in a human resources
database, where the value would be null for all active employees. To test for nulls, you can filter them out in
the WHERE clause

SELECT EmployeeID , (DateTerminated - DateHired) AS LengthOfService FROM EmployeeTable
WHERE DateTerminated IS NOT NULL

which would give results only for terminated employees. If you didn't have the WHERE clause, the above
query would return null for every active employee, because any expression involving a null yields a null
result.

Alternatively, you can use the COALESCE function to supply a non-null value

SELECT EmployeeID , ( COALESCE(DateTerminated,GETDATE()) - DateHired) AS LengthOfService
FROM EmployeeTable where GETDATE()

returns today's date and therefore provides an accurate measure for the length of service of active
employees. So for terminated employees, DateTerminated is not null, and the calculation is the same as
above, while for active employees, DateTerminated is null so COALESCE uses today's date instead.

What SQL?
SQL or SEQUEL ?

SQL (pronounced "ess-que-el")

The acronym SQL stands for Structured Query Language

SQL is used to communicate with a database. According to ANSI (American National
Standards Institute), it is the standard language for relational database management
systems. SQL statements are used to perform tasks such as update data on a database,
or retrieve data from a database. Some common relational database management
systems that use SQL are: Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc.
Although most database systems use SQL, most of them also have their own additional
proprietary extensions that are usually only used on their system. However, the standard
SQL commands such as "Select", "Insert", "Update", "Delete", "Create", and "Drop" can
be used to accomplish almost everything that one needs to do with a database.

Some people claim that this is a bad name, because
SQL isn't (properly) structured
it's more than just queries (e.g. insert, update, delete)
it isn't a real computing language (Very Debatable)

In any case, SQL is a database query language that was adopted as an industry standard
in 1986. It has undergone two important revisions, SQL2 (also called SQL-92), and SQL3
(also called SQL-99).
Selecting Data

The select statement is used to query the
database and retrieve selected data that
match the criteria that you specify. Here is
the format of a simple select statement:
select "column1" [,"column2",etc] from
"tablename" [where "condition"];
[ ] = optional
The column names that follow the select
keyword determine which columns will be
returned in the results. You can select as
many column names that you'd like, or you
can use a "*" to select all columns.
The table name that follows the keyword
from specifies the table that will be queried
to retrieve the desired results.
The where clause (optional) specifies which
data values or rows will be returned or
displayed, based on the criteria described
after the keyword where.
Conditional selections used in the where
clause:
= Equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
<> Not equal to
LIKE *See next page Special Operator
select first, last, city from empinfo; select last,
city, age from empinfo where age > 30;

select first, last, city, state from empinfo where
first LIKE 'J%';

select * from empinfo;

select first, last, from empinfo where last LIKE
'%s';

select first, last, age from empinfo where last
LIKE '%illia%';

select * from empinfo where first = 'Eric';
Sample Table: empinfo
first last id
ag
e
city state
John Jones
9998
0
45 Payson Arizona
Mary Jones
9998
2
25 Payson Arizona
Eric
Edward
s
8823
2
32 San Diego
Californi
a
Mary
An
n
Edward
s
8823
3
32 Phoenix Arizona
Ginger Howell
9800
2
42
Cottonwoo
d
Arizona
Sebastia
n
Smith
9200
1
23 Gila Bend Arizona
Gus Gray
2232
2
35 Bagdad Arizona
Mary
An
n
May
3232
6
52 Tucson Arizona
Erica
William
s
3232
7
60 Show Low Arizona
Leroy Brown
3238
0
22 Pinetop Arizona
Elroy Cleaver
3238
2
22 Globe Arizona
select first, last, city from empinfo; select
last, city, age from empinfo where age >
30;


Sample Table: empinfo
first last id
ag
e
city state
John Jones
9998
0
45 Payson Arizona
Mary Jones
9998
2
25 Payson Arizona
Eric
Edward
s
8823
2
32 San Diego
Californi
a
Mary Ann
Edward
s
8823
3
32 Phoenix Arizona
Ginger Howell
9800
2
42
Cottonwoo
d
Arizona
Sebastia
n
Smith
9200
1
23 Gila Bend Arizona
Gus Gray
2232
2
35 Bagdad Arizona
Mary Ann May
3232
6
52 Tucson Arizona
Erica Williams
3232
7
60 Show Low Arizona
Leroy Brown
3238
0
22 Pinetop Arizona
Elroy Cleaver
3238
2
22 Globe Arizona
last city age
Jones Payson 45
Edwards San Diego 32
Edwards Phoenix 32
Howell Cottonwood 42
Gray Bagdad 35
May Tucson 52
Williams Show Low 60
select first, last, city, state from empinfo
where first LIKE 'J%';

Sample Table: empinfo
first last id age city state
John Jones 99980 45 Payson Arizona
Mary Jones 99982 25 Payson Arizona
Eric Edwards 88232 32 San Diego California
Mary Ann Edwards 88233 32 Phoenix Arizona
Ginger Howell 98002 42 Cottonwood Arizona
Sebastian Smith 92001 23 Gila Bend Arizona
Gus Gray 22322 35 Bagdad Arizona
Mary Ann May 32326 52 Tucson Arizona
Erica Williams 32327 60 Show Low Arizona
Leroy Brown 32380 22 Pinetop Arizona
Elroy Cleaver 32382 22 Globe Arizona
first last city state
John Jones Payson Arizona

select * from empinfo;

Sample Table: empinfo
first last id
ag
e
city state
John Jones 99980 45 Payson Arizona
Mary Jones 99982 25 Payson Arizona
Eric Edwards 88232 32 San Diego California
Mary Ann Edwards 88233 32 Phoenix Arizona
Ginger Howell 98002 42 Cottonwood Arizona
Sebastian Smith 92001 23 Gila Bend Arizona
Gus Gray 22322 35 Bagdad Arizona
Mary Ann May 32326 52 Tucson Arizona
Erica Williams 32327 60 Show Low Arizona
Leroy Brown 32380 22 Pinetop Arizona
Elroy Cleaver 32382 22 Globe Arizona
first last id age city state
John Jones 99980 45 Payson Arizona
Mary Jones 99982 25 Payson Arizona
Eric Edwards 88232 32 San Diego California
Mary Ann Edwards 88233 32 Phoenix Arizona
Ginger Howell 98002 42 Cottonwood Arizona
Sebastian Smith 92001 23 Gila Bend Arizona
Gus Gray 22322 35 Bagdad Arizona
Mary Ann May 32326 52 Tucson Arizona
Erica Williams 32327 60 Show Low Arizona
Leroy Brown 32380 22 Pinetop Arizona
Elroy Cleaver 32382 22 Globe Arizona
select first, last, from empinfo where last
LIKE '%s';

Sample Table: empinfo
first last id age city state
John Jones 99980 45 Payson Arizona
Mary Jones 99982 25 Payson Arizona
Eric Edwards 88232 32 San Diego California
Mary Ann Edwards 88233 32 Phoenix Arizona
Ginger Howell 98002 42 Cottonwood Arizona
Sebastian Smith 92001 23 Gila Bend Arizona
Gus Gray 22322 35 Bagdad Arizona
Mary Ann May 32326 52 Tucson Arizona
Erica Williams 32327 60 Show Low Arizona
Leroy Brown 32380 22 Pinetop Arizona
Elroy Cleaver 32382 22 Globe Arizona
first last
John Jones
Mary Jones
Eric Edwards
Mary Ann Edwards
Erica Williams
select first, last, age from empinfo where last
LIKE '%illia%';

Sample Table: empinfo
first last id age city state
John Jones 99980 45 Payson Arizona
Mary Jones 99982 25 Payson Arizona
Eric Edwards 88232 32 San Diego California
Mary Ann Edwards 88233 32 Phoenix Arizona
Ginger Howell 98002 42 Cottonwood Arizona
Sebastian Smith 92001 23 Gila Bend Arizona
Gus Gray 22322 35 Bagdad Arizona
Mary Ann May 32326 52 Tucson Arizona
Erica Williams 32327 60 Show Low Arizona
Leroy Brown 32380 22 Pinetop Arizona
Elroy Cleaver 32382 22 Globe Arizona
first last age
Erica Williams 60
select * from empinfo where first = 'Eric';
Sample Table: empinfo
first last id age city state
John Jones 99980 45 Payson Arizona
Mary Jones 99982 25 Payson Arizona
Eric Edwards 88232 32 San Diego California
Mary Ann Edwards 88233 32 Phoenix Arizona
Ginger Howell 98002 42 Cottonwood Arizona
Sebastian Smith 92001 23 Gila Bend Arizona
Gus Gray 22322 35 Bagdad Arizona
Mary Ann May 32326 52 Tucson Arizona
Erica Williams 32327 60 Show Low Arizona
Leroy Brown 32380 22 Pinetop Arizona
Elroy Cleaver 32382 22 Globe Arizona
first last id age city state
Eric Edwards 88232 32 San Diego California
The LIKE pattern matching operator can also be
used in the conditional selection of the where
clause. Like is a very powerful operator that allows
you to select only rows that are "like" what you
specify. The percent sign "% can be used as
wild card to match any possible character
that might appear before or after the characters
specified.

For example:

select first, last, city from empinfo where
First LIKE 'Er%';

This SQL statement will match any first names
that start with 'Er'.

Strings must be in single quotes.

Or you can specify,
select first, last from empinfo where last
LIKE '%s';

This statement will match any last names that end
in a 's'.

select * from empinfo where first = 'Eric';

This will only select rows where the first name
equals 'Eric' exactly.

Sample Table: empinfo
first last id age city state
John Jones 99980 45 Payson Arizona
Mary Jones 99982 25 Payson Arizona
Eric Edwards 88232 32 San Diego California
Mary Ann Edwards 88233 32 Phoenix Arizona
Ginger Howell 98002 42 Cottonwood Arizona
Sebastian Smith 92001 23 Gila Bend Arizona
Gus Gray 22322 35 Bagdad Arizona
Mary Ann May 32326 52 Tucson Arizona
Erica Williams 32327 60 Show Low Arizona
Leroy Brown 32380 22 Pinetop Arizona
Elroy Cleaver 32382 22 Globe Arizona
The JOIN concept

JOIN is a query clause that can be used with the SELECT, UPDATE, and DELETE
data query statements to simultaneously affect rows from multiple tables. There
are several distinct types of JOIN statements that return different data result sets.

Joined tables must each include at least one field in both tables that contain
comparable data. For example, if you want to join a Customer table and a
Transaction table, they both must contain a common element, such as
CustomerID column, to serve as a key on which the data can be matched. Tables
can be joined on multiple columns so long as the columns have the potential to
supply matching information. Column names across tables don't have to be the
same, although for readability this standard is generally preferred.


When you do use like column names in multiple tables, you must use fully
qualified column names. This is a dot notation that combines the names of
tables and columns. For example, if I have two tables, Customer and Transaction,
and they both contain the column CustomerID, Id use the dot notation, as in
Customer.CustomerID and Transaction.CustomerID, to let the database know
which column from which table Im referring.


Now that weve examined the basic theory, lets take a look at the various types of
joins and examples of each.




The basic JOIN statement

A basic JOIN statement has the following format:

SELECT Customer.CustomerID, TransID, TransAmt FROM Customer JOIN Transaction
ON Customer.CustomerID = Transaction.CustomerID;

In practice, you'd never use the example above because the type of join is not specified. In this case,
SQL Server assumes an INNER JOIN. You can get the equivalent to this query by using the
statement:

SELECT Customer.CustomerID, TransID, TransAmt FROM Customer, Transaction;

However, the example is useful to point out a few noteworthy concepts:

TransID and TransAmt do not require fully qualified names because they exist in only one of the
tables. You can use fully qualified names for readability if you wish.
The Customer table is considered to be the left table because it was called first. Likewise, the
Transaction table is the right table.
You can use more than two tables, in which case each one is naturally joined to the cumulative
result in the order they are listed, unless controlled by other functionality such as join hints or
parenthesis.
You may use WHERE and ORDER BY clauses with any JOIN statement to limit the scope of your
results. Note that these clauses are applied to the results of your JOIN statement.
SQL Server does not recognize the semicolon (;), but I use it in the included examples to denote
the end of each statement, as would be expected by most other RDBMSs.

Another addition to your SQL toolbox Although the JOIN statement is often perceived as a
complicated concept, you will see that its a powerful timesaving resource thats relatively easy to
understand. Use this functionality to get related information from multiple tables with a single query
and to skillfully reference normalized data. Once youve mastered JOINs, you can elegantly maneuver
within even the most complex database.

Inner join
In relational databases, a join operation matches records
in two tables. The two tables must be joined by at least
one common field. That is, the join field is a member of
both tables. Typically, a join operation is part of a
SELECT query.

select * from A, B where A.x = B.y

The column names (x and y in this example) are often,
but not necessarily, the same.
Outer Join
(database)outer join - A less commonly used variant of the inner join relational
database operation. An inner join selects rows from two tables such that the
value in one column of the first table also appears in a certain column of the
second table. For an outer join, the result also includes all rows from the first
operand ("left outer join", "*="), or the second operand ("right outer join",
"=*"), or both ("full outer join", "*=*"). A field in a result row will be null if
the corresponding input table did not contain a matching row.

For example, if we want to list all employees and their employee number, but
not all employees have a number, then we could say (in SQL):

SELECT employee.name, empnum.number WHERE employee.id *=
empnum.id

The "*=" means "left outer join" and means that all rows from the
"employee" table will appear in the result, even if there is no match for their ID
in the empnum table.

The notorious CROSS JOIN

The CROSS JOIN has earned a bad reputation because its very resource
intensive and returns results of questionable usefulness. When you use the CROSS JOIN,
you're given a result set containing every possible combination of the rows returned from
each table. Take the following example:

SELECT CustomerName, TransDate, TransAmt FROM Customer CROSS JOIN
Transaction;


With the CROSS JOIN, you arent actually free to limit the results, but you can use the ORDER BY
clause to control the way they are returned. If the tables joined in this example contained only five
rows each, you would get 25 rows of results. Every CustomerName would be listed as associated with
every TransDate and TransAmt.


I really did try to come up with examples where this function was useful, and they were all very
contrived. However, Im sure someone out there is generating lists of all their products in all possible
colors or something similar, or we wouldnt have this wonderful but dangerous feature.

The INNER JOIN drops rows
When you perform an INNER JOIN, only rows that match up are returned. Any
time a row from either table doesnt have corresponding values from the other
table, it is disregarded. Because stray rows arent included, you dont have any of
the left and right nonsense to deal with and the order in which you present
tables matters only if you have more than two to compare. Since this is a simple
concept, heres a simple example:

SELECT CustomerName, TransDate FROM Customer INNER JOIN
Transaction ON Customer.CustomerID = Transaction.CustomerID;

If a row in the Transaction table contains a CustomerID thats not listed in the
Customer table, that row will not be returned as part of the result set. Likewise, if
the Customer table has a CustomerID with no corresponding rows in the Transaction table, the
row from the Customer table wont be returned.

The OUTER JOIN can include mismatched rows

OUTER JOINs, sometimes called complex joins, arent actually complicated. They are so-called
because SQL Server performs two functions for each OUTER JOIN.


The first function performed is an INNER JOIN. The second function includes the rows that the INNER
JOIN would have dropped. Which rows are included depends on the type of OUTER JOIN that is used
and the order the tables were presented.


There are three types of an OUTER JOIN: LEFT, RIGHT, and FULL. As youve probably guessed, the
LEFT OUTER JOIN keeps the stray rows from the left table (the one listed first in your query
statement). In the result set, columns from the other table that have no corresponding data are filled
with NULL values. Similarly, the RIGHT OUTER JOIN keeps stray rows from the right table, filling
columns from the left table with NULL values. The FULL OUTER JOIN keeps all stray rows as part of
the result set. Here is your example:

SELECT CustomerName, TransDate, TransAmt FROM Customer LEFT OUTER JOIN Transaction ON
Customer.CustomerID = Transaction.CustomerID;

Customer names that have no associated transactions will still be displayed. However, transactions
with no corresponding customers will not, because we used a LEFT OUTER JOIN and the Customer
table was listed first.


In SQL Server, the word OUTER is actually optional.

The clauses LEFT JOIN, RIGHT JOIN, and FULL JOIN are equivalent to LEFT OUTER JOIN, RIGHT OUTER
JOIN, and FULL OUTER JOIN, respectively.

SQL Subquery

It is possible to embed a SQL statement within another. When
this is done on the WHERE or the HAVING statements, we
have a subquery construct. What is subquery useful for? First, it
can also be used to join tables. Also, there are cases where the
only way to correlate two tables is through a subquery.
The syntax is as follows:
SELECT "column_name1"
FROM "table_name"
WHERE "column_name2" [Comparison Operator]
(SELECT "column_name1"
FROM "table_name"
WHERE [Condition])
[Comparison Operator] could be equality operators such as =, >,
<, >=, <=. It can also be a text operator such as "LIKE."
Let's use the same example as we did to illustrate SQL joins:
store_name Sales Date
Los Angeles $1500 Jan-05-1999
San Diego $250 Jan-07-1999
Los Angeles $300 Jan-08-1999
Boston $700 Jan-08-1999
region_name store_name
East Boston
East New York
West Los Angeles
West San Diego
Table Store_Information
Table Geography
and we want to use a subquery to find the sales of all stores in the West region.
To do so, we use the following SQL statement:

SELECT SUM(Sales) FROM Store_Information
WHERE Store_name IN
(SELECT store_name FROM Geography
WHERE region_name = 'West')
SUM(Sales)
2050
In this example, instead of joining the two tables directly and then adding up only the sales
amount for stores in the West region, we first use the subquery to find out which stores
are in the West region, and then we sum up the sales amount for these stores.
select 'ALTER '||OBJECT_TYPE|| '||OWNER||'.'||OBJECT_NAME||' compile;' from
dba_objects where status='INVALID';
'ALTER'||OBJECT_TYPE||''||OWNER||'.'||OBJECT_NAME||'COMPILE;'
ALTER PROCEDURE MAXDATA.P_REPLACE_PROD compile;
ALTER PROCEDURE MAXDATA.P_SETPROTOTYPE compile;
ALTER PROCEDURE MAXDATA.UPDATE_LV10MAST compile;
ALTER PROCEDURE MAXAPP.P_COMPPROC compile;
Microsoft SQL Server Reporting Services


Originally not to be available until the Yukon release of SQL Server, Microsoft decided to release
Reporting Services early because of the customer excitement they heard. Why the excitement?
Reporting Services fills a need that many organizations are faced withthe need to build
business intelligence and reporting solutions. Until now, developers were required to embed
reports into their applications, or organizations were required to purchase expensive and
sometimes problematic third-party reporting solutions. Now, Reporting Services offer a complete
solution for distributing reports across the enterprise; enabling businesses to make decisions
better and faster.

Overview of Reporting Services
Reporting Services is a scalable, secure, robust reporting solution for SQL Server. It supports the
complete reporting lifecycle by including tools for report creation, execution, distribution, and
management. New users can have Reporting Services installed and new reports published within
a matter of hours instead of days or weeks.

Reporting Services consists of the following key components:

Report Designer: Supports the report creation phase of the report lifecycle. It is an add-on tool for
any edition of Visual Studio .NET 2003, suitable for both programmers and non-programmers.
Report Server: Provides services for execution and distribution of reports.
Report Manager: A Web-based administration tool for managing the Report Server.

Report Designer
Report Designer is a Visual Studio .NET 2003 add-on and is included with Reporting Services
(see Figure A). As the name implies, it provides developers and non-developers an intuitive tool
to create sophisticated reports. Users get standard reporting functionality such as grouping,
sorting, and report formatting. This should be sufficient for most reporting needs. For more
advanced reports, the Report Designer has full VB.NET support. Plus, designers can add ActiveX
controls to their reports to create rich, live, interactive reports.

One of the more compelling features of Report Designer is the ability to have dynamic, query-
based parameters. This eliminates the administrator having to maintain parameter lists for all the
reports (i.e., department names, office locations, employee names, etc.). You simply have to
create a new dataset and tie the results to the parameter. It even allows cascading parameters.

At the heart of Reporting Services architecture is the Report Definition Language (RDL), which is
an XML-based standard for defining reports. RDL is key to the Reporting Services success by
allowing third parties to publish reports to the Reporting Server. There are already product
offerings from independent software vendors (ISVs) today.

Though Reporting Services requires SQL Server as its repository, Report Designer can connect to
all types of data sources including OLE DB, ODBC, Oracle, SQL Server, and others. It also has
many rendering options such as HTML, Microsoft Excel, PDF, CSV, XML, and others. The list can
also be extended by third parties or by using the Reporting Services extension library.

Report Server
The Report Server provides the repository, management, execution, and delivery functions. It is
scalable and secure, and can support the most demanding reporting needs. It consists of several
subcomponents, including:

Request Handler: Handles all inbound server requests and routes them to the appropriate
component.
Scheduling and Delivery Processor: Provides the scheduling and delivering functionality, and it
can be extended to deliver reports to other devices such as fax machines or printers.
Report Processor: Provides the execution functionality and it, too, can be extended to render to
new output formats such as a Microsoft Word document.
Report Server Database: All the data required by Reporting Services is stored in the Report
Server Database, which must be a SQL Server. This includes everything from server settings to
report definitions, even to cached data from a report execution.

Report Manager
Report Manager provides administrators an easy-to-use tool for configuring the server and
managing the reports. Using Report Manager, administrators can configure security, change
server settings, schedule reports for execution, and maintain the structure of the report folders
(see Figure B).

Administrators have a variety of options for executing reports, including on-demand, cached reporting
with an adjustable expiration period, and flexible report scheduling. All of these options are
configurable at the report level through Report Manager.


Report Manager supports both push and pull distribution options. For e-mail delivery, Report Manager
can include a link to the report, attach the report to the message, or embed reports directly into the
message using Web archive. This eliminates one extra step for the reader of the report.


Where you need Reporting Services

Here are a few typical scenarios where Reporting Services will be invaluable:

New application development:
Most applications have reporting requirements that are sometimes very complex. With Reporting
services, the business analyst, who usually has a better understanding of what is required, is able to
fulfill these requirements. This frees a developer to perform other development tasks and not develop
tedious, time-consuming reports. Rarely do I meet a developer who enjoys writing reports.
Furthermore, these reports will be easier to maintain and support.

Existing applications:
Because of the complexity and time required to embed reports into applications, reports are often
created outside of the application using third-party tools and distributed manually or through batch
jobs. By using Reporting Services, these applications can easily be extended to include a complete
reporting solution, embedded within the application.

Executive dashboard
Executive dashboard is the buzzword for providing executives a comprehensive view of their business,
commonly in an Enterprise Portal. Reporting Services includes several key features for an executive
dashboard, such as My Reports, My Subscriptions, push/pull delivery, numerous rendering options,
and support for Web services.


Though very powerful, there are a couple of scenarios that would not be suitable for Reporting
Services:

Applications using third-party reporting solutions:
Reporting Services supply only a migration tool
for Microsoft Access. If you have a significant investment in another tool and it supports your needs,
youll need to carefully consider the costs of migrating to Reporting Services.

Enterprise reporting solution:
Though non-programmers can be very productive with Report Designer, there are reporting tools
absent from Report Designer that experienced report creators will miss. For example, Reporting
Services does not include a database abstraction layer to hide database details from users. This
abstraction is very useful if you want to deploy a reporting solution to a large, semi-technical audience.



Dirty Little Secret SHHH!!
Whether you are writing a simple database in Access/Visual Basic or a huge client-
server web application, SQL is a powerful tool that is a must-have skill for any
developer. For a beginner it is yet another skill you need to learn, but there is a
shortcut that will allow you to write SQL without knowing SQL, and as a result of
using these shortcuts you will actually learn to write native SQL. So what are these
shortcuts?, I hear you cry, well, read on and find out.

All good database and report writing applications allow you to create database
queries visually by deciding which tables to include, the relationship between tables,
the sort order and the criteria of which records to include or not to include. Certain
products (including Crystal Reports (SQL Query Designer), Microsoft Access and the
very basic 'Visdata' demo that comes with Visual Basic) allow you to view this query
as a native SQL statement. Be warned: Access puts lots of extra rubbish in the
statement that you don't need, such as square [] and curved () brackets; it also
prefixes fields with table names even if you are only using one.

So the answer to 'How do I write advanced SQL queries without knowing any SQL?'
is to design your query in, say, MS-Access, click the View -> SQL View menu option,
cut and paste into your own application, and hey-presto! You are an instant SQL
expert!

You might also like