
exploring IT: Java Programming Grade 12

Learning Unit 2 SQL Revision


2.1 Introduction
This Learning Unit revises SQL concepts from Grade 11 by studying SQL statements that can be
performed on a single table. We will progress by first looking at SQL to query the entire table,
eliminate columns, eliminate rows and then aggregate functions to summarise the data in the table.
In this Learning Unit, we will be using the database PrintDB available in both MS Access and
MySQL. The database is not normalised. The database contains a single table called PrintLogs
consisting of many records detailing print jobs sent by users on a network at a school named
MySchool over the period 23 to 30 May 2017. The database is available on the Funworks web site at
http://www.funworks.co.za/FileDownloads/JavaFiles.aspx or
http://www.funworks.co.za/FileDownloads/DelphiFiles.aspx.
The design and datasheet views of the PrintLogs table are indicated below:

2.1.1 Design View of PrintLogs

The print logs consist of details about the user, the printer to which they send the job and the print job.

• User
The user’s first name, surname, date of birth (DOB) and email are stored in the second, third,
fourth and fifth fields. A user may either be a member of staff at the school or a student.
Students are identified with the word “student” after the @ sign. For example,
Alejandra.Olson@student.MySchool.co.za. Staff members only have the school’s domain name
after the @ sign. For example, CastroN@MySchool.co.za.

• Printer
The details of the printer are stored in the printer name, printer model and the printer serial
number fields.

• Print Job
The Date field stores the date and time of the print job sent to a printer by a user. The last six
fields store the details of the particular print job sent by the user to the printer. These details
include the total number of pages, the number of pages that are colour in the print job, the
number of copies, the size in kilobytes and whether the job was actually printed. A print job may
consist of colour and black pages. To determine the number of black and white pages, subtract
the number of colour pages from the total pages. Not all print jobs are printed. A user may not
have rights to print to the printer, or enough credits to print the number of pages.


2.1.2 Datasheet view of PrintLogs


A sample of the first few rows and columns is shown in the following table:


2.2 Extracting Data from the Whole Table


In SQL, we use the SELECT statement to extract data from the table, followed by ‘*’ to indicate that
we want to extract all fields for each record. By omitting a WHERE clause ALL the rows will be
displayed.
SELECT *
FROM <tableName>

ACTIVITY 1
Open the database PrintDB.accdb and perform the following SQL query to extract all the data in the
table:
SELECT *
FROM PrintLogs

You will notice that all the columns (fields) are included (too wide to show in this document) and there
are 2642 rows (records).

2.2.1 Sorting
A table can be sorted according to the field/s indicated in an ORDER BY clause:
SELECT *
FROM <tableName>
ORDER BY <field1>, <field2>,…,<fieldN>

2.2.1.1 Sorting by One Field Only


Perform the following SQL query to extract all the data in the table, but sorted by the Surname field:
SELECT *
FROM PrintLogs
ORDER BY Surname

You will notice that all the columns (fields) are included, as before, but an order has been created on
the Surname field. The order applies to the Surname only, ascending alphabetically, and no other
additional fields will be sorted. All of the users appear to be unique, so don’t assume that the first
names are sorted as well; if more than one user had the same surname then a sort on the first name
would only be coincidental.


2.2.1.2 Sorting by More than One Field


A more constructive display can be achieved by sorting on the Surname, once again, but including
the Date as a secondary sort; notice that, in the previous result, the dates are not in order within each surname.
Perform the following SQL query to extract all the data in the table, sorted on the Surname field
and then the Date field:
SELECT *
FROM PrintLogs
ORDER BY Surname, Date

Notice that the records have, once again, been sorted (firstly) on the Surname field (Adams, Aguilar,
Alexander, Allen,…), but now there is a Date order within each surname. When sorting on multiple
fields, the first sort is maintained and any subsequent sorts take place within the previous sort.
Looking at Rebecca Alexander, in particular, she printed, in order, on the 23rd, 25th and 29th of May;
she printed twice on the 25th and the times are also ordered because the time portion is part of the
combined date/time field.
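
If you find it easier to reason about a multi-field sort in Java, the following minimal sketch mimics ORDER BY Surname, Date on a small in-memory list. The PrintJob class, the method name and the sample dates are hypothetical and chosen only for illustration; they are not part of PrintDB.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one row of PrintLogs (only two fields are kept).
class PrintJob {
    final String surname;
    final String date; // text such as "2017-05-25 08:15" sorts chronologically

    PrintJob(String surname, String date) {
        this.surname = surname;
        this.date = date;
    }
}

class MultiFieldSort {
    // Mirrors ORDER BY Surname, Date: the first key decides the order,
    // the second key is only consulted when two surnames are equal.
    static List<PrintJob> sortBySurnameThenDate(List<PrintJob> jobs) {
        List<PrintJob> sorted = new ArrayList<>(jobs);
        sorted.sort(Comparator.comparing((PrintJob j) -> j.surname)
                              .thenComparing(j -> j.date));
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample rows, deliberately out of order.
        List<PrintJob> jobs = Arrays.asList(
            new PrintJob("Allen", "2017-05-24 10:02"),
            new PrintJob("Alexander", "2017-05-29 09:30"),
            new PrintJob("Alexander", "2017-05-23 11:45"),
            new PrintJob("Adams", "2017-05-26 08:15"));
        for (PrintJob j : sortBySurnameThenDate(jobs)) {
            System.out.println(j.surname + "  " + j.date);
        }
    }
}
```

The two Alexander rows end up next to each other with the earlier date first, which is the same "sort within a sort" behaviour described above.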
You can reverse the order of any of the sort fields by adding the keyword DESC to the required
field/s.
Change the query to extract all the data in the table, sorted on the Surname field, in reverse
order, and then the Date field:
SELECT *
FROM PrintLogs
ORDER BY Surname DESC, Date


Notice that the result starts with the last user alphabetically, ‘Young’, but the date/time order within each
surname (for example, ‘Wright’) is still ascending.

2.2.1.3 Using TOP/LIMIT to Limit Rows


We can limit the number of rows displayed in a table using TOP in MS Access and LIMIT in MySQL.
TOP/LIMIT is useful with the ORDER BY clause, reducing the rows extracted by selecting only the
first <n> rows according to the sort applied. Although it would still work, using TOP/LIMIT doesn’t
make much sense without ORDER BY. The syntax of a SQL statement using TOP/LIMIT is:

MS Access:
SELECT TOP <n> *
FROM <tableName>
ORDER BY <field1>,…,<fieldN>

MySQL:
SELECT *
FROM <tableName>
ORDER BY <field1>,…,<fieldN>
LIMIT <n>

Perform the following SQL query to extract the three oldest users in the table, sorted on the DOB
field. The DOB field indicates the age of the user. In MS Access, the date is in the format 1970/03/02
and in MySQL 1970-03-02.

ACTIVITY 2
Type in the SQL statement below and run it.

MS Access:
SELECT TOP 3 *
FROM PrintLogs
ORDER BY DOB

MySQL:
SELECT *
FROM PrintLogs
ORDER BY DOB
LIMIT 3

In MS Access, we expected to see three results in the Result Set, but ended up with 43. The reason
is that there were more than three rows for the oldest user, and TOP includes every row that ties on the
sorted value, so all of the rows for that user were included.
We will successfully get the correct result when we use the keyword DISTINCT, in a later section.
In MySQL, only three results are shown, all relating to the same user. MS Access and MySQL may
return different results for similar queries. This happens in rare cases.
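
The difference between the two results can be pictured with a short Java sketch. It is only a model of the behaviour observed above, not how either DBMS is implemented; the class and method names and the DOB values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TopWithTies {
    // LIMIT-style: exactly the first n values of an already sorted list.
    static List<String> firstN(List<String> sorted, int n) {
        int end = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, end));
    }

    // TOP-style, as observed in MS Access: the first n values plus every
    // following value that ties with the n-th one.
    static List<String> firstNWithTies(List<String> sorted, int n) {
        List<String> result = firstN(sorted, n);
        if (n <= 0 || sorted.size() <= n) {
            return result;
        }
        String last = result.get(n - 1);
        for (int i = n; i < sorted.size() && sorted.get(i).equals(last); i++) {
            result.add(sorted.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up DOB values, already sorted; the oldest user has five print jobs.
        List<String> dobs = Arrays.asList("1950-02-11", "1950-02-11", "1950-02-11",
                "1950-02-11", "1950-02-11", "1961-07-30");
        System.out.println(firstN(dobs, 3).size());         // 3
        System.out.println(firstNWithTies(dobs, 3).size()); // 5
    }
}
```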

2.2.1.4 TOP/LIMIT without ORDER BY


If we leave out the ORDER BY clause, the first three rows will be displayed. The DBMS will extract the
first three rows of the table, typically in the order the records were added to the table.
Type in the query below using TOP/LIMIT without an ORDER BY clause.

MS Access:
SELECT TOP 3 *
FROM PrintLogs

MySQL:
SELECT *
FROM PrintLogs
LIMIT 3

Check that the first three rows of the table are produced.


2.2.1.5 Using TOP/LIMIT to find the Largest or Smallest


We can find the largest or smallest using TOP/LIMIT and the ORDER BY clause if we limit the result
to one record.
Type in the following SQL statement to find the print job with the highest number of colour
pages.

MS Access:
SELECT TOP 1 *
FROM PrintLogs
ORDER BY TotalColourPages DESC

MySQL:
SELECT *
FROM PrintLogs
ORDER BY TotalColourPages DESC
LIMIT 1

In MS Access, two rows are returned both from Josiah Thomas who printed a similar job of 120 colour
pages on the 23rd and 29th May.

In MySQL, only the print job of 120 pages printed by Josiah Thomas on the 23rd May is displayed.

In both cases we can see who was responsible for the printing as all the fields related to the record
are returned. We could determine the highest number of colour pages using the aggregate function
MAX; however, MAX on its own would not determine WHO was responsible for the printing. We will
return to this concept when we revise aggregate functions.

EXERCISE 1

Write SQL statements for the following queries using the table PrintLogs:
1. Display the table in descending order of the size of the print job.
2. Display the table sorted according to the date, then the printer name and then the surname of
the user
3. Display the top 10 largest print jobs in kilobytes.
4. Display the 20 lowest records sorted by cost.
5. Find the print job with the highest number of pages.
6. Find the smallest print job in kilobytes.
7. Normalise the database PrintDB to third normal form as a revision exercise. We will continue to
use the original database in the rest of this Learning Unit.


2.3 Limiting the Columns (Fields)


Up to this point, we have selected all columns (fields) by simply adding ‘*’ after the SELECT verb. To
choose the fields you want displayed, list the fields in the order you want them displayed, instead of
using the ‘wildcard’ ‘*’. If we do not use the WHERE clause, all the rows in the table will still be
displayed.
The general syntax of a column limiting query is:
SELECT <field1>,…,<fieldN>
FROM <tableName>

ACTIVITY 3
Perform the following SQL query to display only the Surname, FirstName and DOB fields from
PrintLogs:
SELECT Surname, FirstName, DOB
FROM PrintLogs

You can see clearly that only three columns exist in the Result Set and all the rows (2642) are
returned in the table.
Change the query to sort the table from oldest to youngest.
SELECT Surname, FirstName, DOB
FROM PrintLogs
ORDER BY DOB

The same three columns are shown with the same number of rows and no alphabetical ordering on
either the Surname or FirstName fields, only on the DOB field, where you see the oldest user listed
first.

2.3.1 Using DISTINCT


If we limit the columns, as we did in the previous query, we can remove duplicates more successfully.
The keyword DISTINCT instructs the query to only fetch unique rows based on the fields selected,
ignoring duplicates. The syntax is simple: the keyword DISTINCT immediately follows SELECT:
SELECT DISTINCT <field1>,…,<fieldN>
FROM <tableName>
Type in the following SQL query to display only the Surname, FirstName and DOB fields from
PrintLogs, ordered by DOB, but only unique (DISTINCT) rows:
SELECT DISTINCT Surname, FirstName, DOB
FROM PrintLogs
ORDER BY DOB

Interestingly, because we now have a list of all unique names, the result set indicates a total of 340
records, which must be the total number of users in the table.
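
The phrase "unique rows based on the fields selected" is worth pausing on: two rows count as duplicates only if every selected field matches. The Java sketch below illustrates this idea with an in-memory list; the class name and the choice of a list of strings per row are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class DistinctRows {
    // Each row is the list of selected field values. A LinkedHashSet keeps the
    // first copy of each row and drops later rows whose values are all equal.
    static List<List<String>> distinct(List<List<String>> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // Selecting only Surname and FirstName: repeated users collapse.
        List<List<String>> namesOnly = Arrays.asList(
            Arrays.asList("Alexander", "Rebecca"),
            Arrays.asList("Alexander", "Rebecca"),
            Arrays.asList("Thomas", "Josiah"));
        // Adding a date column makes every row different again.
        List<List<String>> namesAndDates = Arrays.asList(
            Arrays.asList("Alexander", "Rebecca", "2017-05-23"),
            Arrays.asList("Alexander", "Rebecca", "2017-05-25"),
            Arrays.asList("Thomas", "Josiah", "2017-05-23"));
        System.out.println(distinct(namesOnly).size());     // 2
        System.out.println(distinct(namesAndDates).size()); // 3
    }
}
```

This is why DISTINCT removes duplicates "more successfully" once the columns are limited: the fewer fields that are selected, the more rows become identical.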

We can now get back to using the TOP <n> directive more successfully. Combining TOP <n> with
DISTINCT and ORDER BY will show us who the 3 eldest users are.


Perform the following SQL query to display the oldest 3 users’ Surname, FirstName and DOB
fields from PrintLogs, ordered by DOB:
SELECT DISTINCT TOP 3 Surname, FirstName, DOB
FROM PrintLogs
ORDER BY DOB

In MS Access, the Result Set looks like this:

In MySQL, where LIMIT 3 is placed at the end of the query instead of TOP 3, a similar Result Set is produced:

EXERCISE 2

1. List all the users in the table PrintLogs displaying each user only once.
2. Create a unique alphabetical list of printers showing only the printer name.

2.4 Calculations
Not only can we limit the number of fields in a query, but we can create extra calculated, or derived
fields. Calculated fields can use the existing fields to generate a new answer field, using an arithmetic
formula, which is not a field in the original table. For example, each record in the table has a
TotalPages and a Copies field; this means that if the Copies field has a value of 1, the total number
of pages in the print job would be the same as the TotalPages field, but, if Copies has a value of 2,
then the overall total would be 2 x TotalPages. We can produce a query to show the overall total
number of pages for each print job by multiplying the two fields, TotalPages and Copies, for each
record.
If we need to perform a calculation for ALL the records in a table, then we place the calculation in the
SELECT part of the SQL statement next to any fields that are listed. The Result Set will display the
listed fields as columns and a new column for each calculation. This field is not added to the table; it
is generated by the SQL statement.
The general form of a SQL statement with a calculation and listed fields where the calculation is
performed on all the rows in the table is:
SELECT <field1>,…,<fieldN>, <calculation>
FROM <tableName>

By convention, we normally place the calculation after the listed fields.


ACTIVITY 4
Code a SQL query to display the Date, Surname, FirstName, TotalPages, Copies and
TotalPages x Copies for each record:
SELECT Date, Surname, FirstName, TotalPages, Copies,
TotalPages * Copies
FROM PrintLogs

It is good practice to include the fields used in the calculation so you can check your answer.
The Result Set should look like:

In MS Access, notice that 2642 records are displayed with an extra field called Expr1005 which is the
product of the TotalPages and Copies field. In MySQL, the calculation is used as a heading for the
new column.

2.4.1 Using an Alias to Name a Generated Field


The DBMS provides a default heading for a calculated field unless you provide your own title, known as an alias,
using the AS keyword followed by your chosen name for the field.
Change the previous query to add the alias OverallTotalPages after the calculated field.
SELECT Date, Surname, FirstName, TotalPages, Copies,
TotalPages * Copies AS OverallTotalPages
FROM PrintLogs


Notice how the title changed for the calculated field in MS Access:

In MySQL:

2.4.1.1 Naming Aliases Correctly


MS Access allows the naming of aliases that are keywords in SQL or consist of more than 1 word, as
long as you surround the alias with square brackets ([ ]). For example, if you wanted to name the
alias above using 3 separate words, you could by stating:
TotalPages * Copies AS [Overall Total Pages]

Or, if you wanted to provide an alias ‘BY’, to abbreviate Birth Year, you would also have to use square
brackets around ‘BY’ ([BY]) as ‘BY’ is a keyword used in clauses like ORDER BY.
Although MS Access allows the use of square brackets to enclose field names with spaces, as
indicated above, MySQL is not as tolerant and will not accept square brackets around field names;
MySQL uses backticks (`) for this purpose instead.
In general, the rules for identifier names for a variable, method or program name state that the
identifier should not contain any spaces or special characters. In keeping with this concept, it is good
practice to apply these rules when naming generated fields in a database. It is far better, for both MS
Access and MySQL, to use a convention known as CamelCase, where the spaces are removed and
the first letter of each word is capitalised, representing the humps of a camel. CamelCase is
subdivided into upper and lower CamelCase:

• upper CamelCase capitalises the first letter of every word, including the first word: CamelCase
• lower CamelCase capitalises the first letter of every word except the first word: camelCase
CamelCase is used for many of the field names in the PrintLogs table. Another acceptable convention
is to use underscores ‘_’ (<Shift> + <->) to combine separate words.
For example, TotalPages * Copies AS Overall_Total_Pages

Underscores do work, but on screen or in print the underscore in a field name is sometimes obscured by
the bottom border of the field’s cell and can be misinterpreted as a space.


2.4.2 Arithmetic Functions


In SQL, we can perform calculations without referring to a table. For example, SELECT 5+11 will
return 16.

ACTIVITY 5
In MS Access, complete the table to determine the effect of the arithmetic functions INT and
ROUND.

Arithmetic Function   Example                   Output
INT                   SELECT INT(4.0)
                      SELECT INT(4.1)
                      SELECT INT(4.9)
                      SELECT INT(4.5)
ROUND                 SELECT ROUND(4.1)
                      SELECT ROUND(4.9)
                      SELECT ROUND(4.5)
                      SELECT ROUND(4.445)
                      SELECT ROUND(4.445,1)
                      SELECT ROUND(4.449,2)

If you are using MySQL, complete the table to determine the purpose of FLOOR, ROUND,
CEILING and TRUNCATE.

Arithmetic Function   Example                     Output
FLOOR                 SELECT FLOOR(4.0)
                      SELECT FLOOR(4.1)
                      SELECT FLOOR(4.9)
                      SELECT FLOOR(4.5)
ROUND                 SELECT ROUND(4.1)
                      SELECT ROUND(4.9)
                      SELECT ROUND(4.5)
                      SELECT ROUND(4.445)
                      SELECT ROUND(4.445,1)
                      SELECT ROUND(4.449,2)
CEILING               SELECT CEILING(4.0)
                      SELECT CEILING(4.1)
                      SELECT CEILING(4.9)
                      SELECT CEILING(4.5)
TRUNCATE              SELECT TRUNCATE(1, 0)
                      SELECT TRUNCATE(1.699,1)
                      SELECT TRUNCATE(185,-1)

The functions FLOOR and CEILING were named metaphorically: the ceiling is the highest part of a
room and the floor is the lowest part of a room.
http://www.afterhoursprogramming.com/tutorial/SQL/Rounding-Numbers/


2.4.2.1 INT/FLOOR
The INT function truncates any real number or expression to the whole number portion only; i.e. it
rounds down. For example, INT(14.2) = 14 and INT(14.9) = 14. FLOOR is the MySQL equivalent of
INT.

ACTIVITY 6
Perform the following SQL query to display the TotalPages, TotalColourPages, and the
percentage of the print job that consisted of colour pages. Only the whole number percentage
must be displayed:
SELECT TotalPages, TotalColourPages,
INT(TotalColourPages/TotalPages * 100)
FROM PrintLogs
Check the Result Set answers to confirm that the INT function works correctly.

2.4.2.2 ROUND
The ROUND function rounds a result, either up or down according to the normal convention: if the
decimal portion is >= 0.5, the rounding is upwards to the next whole number, otherwise it’s rounded
down.
Change the SQL query to display only the TotalPages, TotalColourPages, and the percentage
of the print job that consisted of colour pages, but replace the INT function call with a call to the
ROUND function:
SELECT TotalPages, TotalColourPages,
ROUND(TotalColourPages/TotalPages * 100)
FROM PrintLogs

Notice that the value that should have been 66.666… is rounded up to 67, and the value that should
have been 33.333… is rounded down to 33.

The ROUND function can also be used with a parameter to indicate how many decimal places must
be included in the answer.
Change the query to display only the TotalPages, TotalColourPages, and the percentage of
the print job that consisted of colour pages, but replace the ROUND function call with a call to
the ROUND(expression, numberOfDecimalPlaces) function to round to 2 decimal places:
SELECT TotalPages, TotalColourPages,
ROUND(TotalColourPages/TotalPages * 100, 2)
FROM PrintLogs

In MS Access, some of the results are rounded to 2 decimal places, some have none and
some have one. This is because the ROUND function only displays the significant, non-zero
decimal places, regardless of what was requested.


In MySQL, all the numbers are displayed to two decimal places, regardless of whether they are zero.

2.4.2.3 CEILING
In MySQL, CEILING rounds the decimal up to the next integer. The decimal is irrelevant in this
function as it is always eliminated when the number rounds up to an integer.
http://www.afterhoursprogramming.com/tutorial/SQL/Rounding-Numbers/

2.4.2.4 TRUNCATE
The TRUNCATE function in MySQL is not commonly used, since it simply removes digits from your
number without rounding. TRUNCATE takes two parameters: the first is the number and the second is
the position to which to truncate.
For example,
SELECT TRUNCATE(1, 0);      -- returns 1
SELECT TRUNCATE(1.699, 1);  -- returns 1.6
SELECT TRUNCATE(185, -1);   -- returns 180


You can see in the example that the third SQL statement has -1 for the position to truncate, indicating
the first position on the left of the decimal place. The result is 180 because the 5 is truncated, but
replaced with 0 so that the answer is a valid approximation of the original number.
http://www.afterhoursprogramming.com/tutorial/SQL/Rounding-Numbers/
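
The same four ideas (round down, round up, round to nearest, and truncate) also exist in Java, which may help you to predict what the SQL functions will return. The sketch below is only a rough analogue that reuses the positive example values from this section; the truncate helper is hypothetical, and values that lie exactly halfway or are negative may behave differently from one DBMS to another, so test those in your own database.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class RoundingDemo {
    // Rough analogue of TRUNCATE(value, places): digits are dropped, never rounded.
    // A negative number of places truncates to the left of the decimal point.
    static String truncate(String value, int places) {
        return new BigDecimal(value).setScale(places, RoundingMode.DOWN).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(Math.floor(14.9));     // 14.0 (like INT/FLOOR)
        System.out.println(Math.ceil(14.2));      // 15.0 (like CEILING)
        System.out.println(Math.round(66.666));   // 67   (like ROUND)
        System.out.println(Math.round(33.333));   // 33
        System.out.println(truncate("1.699", 1)); // 1.6
        System.out.println(truncate("185", -1));  // 180
    }
}
```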

2.4.2.5 RND/RAND
MS Access provides RND and MySQL uses RAND to randomly generate a number between 0 and 1,
not including 1. Replace RND with RAND if you are using MySQL.

ACTIVITY 7
Type in the following SQL statements to build up, step by step, a random whole number from 2 to 6
(inclusive) using the RND/RAND function.
Run each query more than once to check that a different result is produced each time.
Can you explain the use of a seed?

MS Access                     MySQL                           Result
SELECT RND()                  SELECT RAND()
SELECT RND() * 5              SELECT RAND() * 5
SELECT INT(RND() * 5)         SELECT FLOOR(RAND() * 5)
SELECT INT(RND() * 5) + 2     SELECT FLOOR(RAND() * 5) + 2

In general, to produce one random integer between A and B (inclusive), use the formula:
MS Access: SELECT INT(RND() * (B-A+1)) + A
MySQL: SELECT FLOOR(RAND() * (B-A+1)) + A

OR
MS Access: SELECT INT(RND() * (NoofOptions)) + StartNum
MySQL: SELECT FLOOR(RAND() * (NoofOptions)) + StartNum
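
The formula can be checked in Java, where the arithmetic is identical. In this minimal sketch the class and method names are made up for illustration; scale takes the raw random value r (at least 0 and less than 1) as a parameter so that the two extreme cases can be tested directly.

```java
import java.util.Random;

class RandomRange {
    // Same arithmetic as INT(RND() * (B-A+1)) + A, for a raw value r in [0, 1).
    static int scale(double r, int a, int b) {
        return (int) Math.floor(r * (b - a + 1)) + a;
    }

    // Draws one random integer between a and b (inclusive).
    static int randomBetween(Random rng, int a, int b) {
        return scale(rng.nextDouble(), a, b);
    }

    public static void main(String[] args) {
        System.out.println(scale(0.0, 2, 6));      // smallest raw value gives 2
        System.out.println(scale(0.999999, 2, 6)); // a value just below 1 gives 6
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(randomBetween(rng, 2, 6)); // always from 2 to 6
        }
    }
}
```

Because the raw value never reaches 1, multiplying by (B-A+1) and rounding down gives exactly B-A+1 possible whole numbers, starting at 0; adding A shifts them to the range A to B.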

2.4.3 Arithmetic Operators


2.4.3.1 MOD
The MOD operator works the same as the remainder operator you may have come across in your
programming language: it calculates the remainder of division. For example,
11 MOD 4 = 3, because 4 divides into 11 twice, leaving a remainder of 3;
12 MOD 4 = 0, because 4 divides into 12 exactly 3 times, leaving no remainder.
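
In Java the remainder operator is written %, and the two examples above can be checked with the short sketch below. The class and method names are made up for illustration, and only positive whole numbers are used; remainders of negative numbers are not covered here.

```java
class RemainderDemo {
    // How many whole times b fits into a (for positive whole numbers).
    static int quotient(int a, int b) {
        return a / b;
    }

    // What is left over after the whole divisions are taken out.
    static int remainder(int a, int b) {
        return a % b;
    }

    public static void main(String[] args) {
        System.out.println(quotient(11, 4));  // 2
        System.out.println(remainder(11, 4)); // 3
        System.out.println(remainder(12, 4)); // 0
        // Quotient and remainder always rebuild the original number: 2 * 4 + 3 = 11
        System.out.println(quotient(11, 4) * 4 + remainder(11, 4)); // 11
    }
}
```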

2.4.3.2 Integer Division


Integer division is division in which the fractional part (remainder) is discarded, producing a whole
number. MS Access has a \ operator, as opposed to the real division / operator. Division using the \
operator rounds both numbers to integers, divides the first number by the second number, and then
truncates the result to an integer, effectively producing integer division between two integers.
MySQL uses the DIV operator for integer division.
Complete the following table to investigate integer division:


MS Access              MySQL                     Result
SELECT 23 \ 7          SELECT 23 DIV 7
SELECT 14 \ 4          SELECT 14 DIV 4
SELECT 2365 \ 1000     SELECT 2365 DIV 1000
SELECT -45 \ 5         SELECT -45 DIV 5
SELECT 678 \ -50       SELECT 678 DIV -50

ACTIVITY 8
Many printers have the ability to print in ‘pamphlet’ style, which means that 4 actual pages fit on 1
sheet of paper, either A4 with 4 A5 pages, or A3 with 4 A4 pages, back-to-back, in the correct order.
This means that if the total number of pages in the job is not a multiple of 4, there will be some wasted
(blank) pages included.
Write a SQL query to display the TotalPages and the number of wasted pages if a pamphlet
style print is used for a document. The WastedPages is calculated by subtracting TotalPages
MOD 4 from 4:
SELECT TotalPages, 4 - TotalPages MOD 4 AS WastedPages
FROM PrintLogs

This query produces correct results for all page totals other than page counts that are exact multiples
of 4, for which it returns 4 instead of 0.
Can you think of a way to produce the correct answer including multiples of 4 in SQL?

EXERCISE 3

Write the SQL statement only using SELECT to perform the following calculations. Test each SQL
statement to check your answers.
1. Generate a random number between 10 and 20.
2. Generate a random number between 13 and 47.
3. Generate a random number between 0 and 99.
3.1. Is it possible to always have an answer that consists of 2 digits?
4. Determine the remainder after 63 is divided by 8.
5. Divide 25 by 3, rounded to three decimal places.
6. Divide 125 by 26 using integer division.
6.1. Can you provide two different ways of doing this calculation?
7. Calculate 1.23² + 4.798² divided by 7.65 – 2.3 rounded to the nearest whole number.
8. Can you think of an example when \ in MS Access will not produce the same result as DIV in
MySQL?

2.4.4 String Manipulation


SQL provides functions to manipulate text. Specifically, the LEFT(), RIGHT() and MID() functions are
used to extract parts of strings. The LEN() function in MS Access / LENGTH() function in MySQL
determines the number of characters in a string. We can also use the ‘&’ operator in MS Access or
the CONCAT function in MySQL to concatenate (join) strings together. There are many more
functions. To get an idea, you can look at the text manipulation functions available in MS Excel.
These functions can generally be used in SQL statements. Since these functions are quite simple to
interpret, we are going to look at them all together:
Examples:

Function/Operator   Example                 Result        Explanation
LEFT                LEFT('Hello', 2)        'He'          The left-most 2 characters of 'Hello'
RIGHT               RIGHT('Goodbye', 3)     'bye'         The right-most 3 characters of 'Goodbye'
MID                 MID('Goodbye', 4, 2)    'db'          The 2 characters starting at position 4 of 'Goodbye'

MS Access
LEN                 LEN('John')             4             The number of characters in ‘John’
&                   4 & 'th June'           '4th June'    Combines the integer 4 with ‘th June’

MySQL
LENGTH              LENGTH('John')          4             The number of characters in ‘John’
CONCAT              CONCAT(4,'th June')     '4th June'    Combines the integer 4 with ‘th June’

Note that MID has three parameters: the first is the string from which text is to be extracted, the
second is the position of the starting character (the string is numbered from 1) and the third
is a count of the number of characters to extract.
For example, MID('Oh Happy Day', 4, 5) will produce Happy

Position   1  2  3  4  5  6  7  8  9 10 11 12
Character  O  h     H  a  p  p  y     D  a  y
(positions 3 and 9 are spaces; the 5 characters extracted are positions 4 to 8)
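
Java has no LEFT, RIGHT or MID methods, but each can be sketched with substring, which makes the parameters easier to compare with what you already know. The helper methods below are hypothetical, written only for this illustration. They assume that start is at least 1 and no larger than the length of the string, and they simply stop at the end of the string if more characters are requested than exist.

```java
class TextFunctions {
    // LEFT(s, n): the first n characters.
    static String left(String s, int n) {
        return s.substring(0, Math.min(n, s.length()));
    }

    // RIGHT(s, n): the last n characters.
    static String right(String s, int n) {
        return s.substring(s.length() - Math.min(n, s.length()));
    }

    // MID(s, start, count): SQL numbers characters from 1, Java from 0,
    // so the starting position is shifted down by one.
    static String mid(String s, int start, int count) {
        int begin = start - 1;
        int end = Math.min(begin + count, s.length());
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        System.out.println(left("Hello", 2));           // He
        System.out.println(right("Goodbye", 3));        // bye
        System.out.println(mid("Goodbye", 4, 2));       // db
        System.out.println(mid("Oh Happy Day", 4, 5));  // Happy
        System.out.println("John".length());            // 4 (like LEN/LENGTH)
        System.out.println(4 + "th June");              // 4th June (like & or CONCAT)
    }
}
```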

ACTIVITY 9
We are going to make a SQL query that will uniquely display:

• The Surname
• The FirstName
• The Surname and initial, with a comma and space separating them
• The total number of characters in the Surname and FirstName
Type in the query in MS Access, using the '&' operator to join the strings:
SELECT DISTINCT Surname,FirstName,
Surname & ', ' & LEFT(FirstName,1) AS SurnInit,
LEN(Surname) + LEN(FirstName) AS TotalNamesLength
FROM PrintLogs


If you are using MySQL, type in the following query using the CONCAT function with the fields
listed as parameters separated by commas:
SELECT DISTINCT Surname,FirstName,
CONCAT(Surname,', ',LEFT(FirstName,1)) AS SurnInit,
LENGTH(Surname) + LENGTH(FirstName) AS TotalNamesLength
FROM PrintLogs

The result set should look like this:

Check the first few records for:

• uniqueness (distinct),
• correct joining of the Surname and Initial (note the inclusion of the literal text ‘, ‘ to get a
comma and space after the Surname),
• correct total for the sum of the characters of the Surname and FirstName, using
LEN/LENGTH

EXERCISE 4

1. Write a SQL statement to create a code for each user using the first three letters of the
surname, the 3rd and 4th letter of the first name and a single random digit. Name the generated
field Code.
2. Code a SQL statement to generate a code for each user using the last 2 letters of the surname
followed by two random digits. Ensure that if a 9 is generated then 09 is added.
3. Create a code for each user using the initial of their surname followed by the initial of the first
name and the number of characters in both fields combined.

2.4.5 Date and Time Functions


The Date field in the PrintLogs table stores both the date and time value of a specific moment in time
called a timestamp. A timestamp is not a combination of two fields but an atomic field referencing a
fixed point in time represented by the date and time.
There are a number of Date/Time functions available in SQL to isolate various parts of the date time
value.

• DATE(datetimevalue) – MySQL
• YEAR(datetimevalue)
• MONTH(datetimevalue)
• DAY(datetimevalue)
• TIME(datetimevalue) – MySQL
• HOUR(datetimevalue)
• MINUTE(datetimevalue)
• SECOND(datetimevalue)
• NOW()
These functions can be used together with the NOW() function, which accesses the date/time value of
the computer system.

ACTIVITY 10
As an introduction, create a query that will display each of the date parts of the DOB field (year,
month, day), as separate fields, to test the functions.
SELECT DOB,
YEAR(DOB) AS BirthYear,
MONTH(DOB) AS BirthMonth,
DAY(DOB) AS BirthDay
FROM PrintLogs

The field values are now integers and are not part of a date/time value.
If you are using MySQL, code a new SQL statement to isolate the DATE and TIME values from
the Date field.
SELECT Date, DATE(Date) AS PrintDate, TIME(Date) AS PrintTime
FROM PrintLogs

2.4.5.1 Calculating Age


You can calculate the age in years of a person by subtracting the birth year from the current year. We
can obtain the birth year using YEAR(DOB) and the current year can be obtained by finding the year
portion of the current (system) date, wrapping the YEAR() function around the NOW() function:
YEAR(NOW()).
Change the previous query to add another column which will reflect the age for each user:
SELECT DOB,
YEAR(DOB) AS BirthYear,
MONTH(DOB) AS BirthMonth,
DAY(DOB) AS BirthDay,
YEAR(NOW()) - YEAR(DOB) AS Age
FROM PrintLogs

The query shows the following:

The query is fairly accurate, but it actually shows what the age will be this year, irrespective of
whether the birth month and day have been reached this year or not. A slightly more accurate age
can be determined by dividing the difference between the current date (NOW()) and the birth date
(DOB) by 365.25 (to accommodate leap years). Any date/time value is actually the total number of
days, hours, minutes and seconds (usually based on a start date of 1900/01/01). Dividing the
difference by 365.25 converts it to years and provides more accuracy than the previous query.
In MySQL, the TIMESTAMPDIFF function can be used to accurately determine the age in years.
If you are using MS Access, change the previous query to provide a more accurate age, by
adding the AccurateAge field:

MS Access:
SELECT DOB,
YEAR(DOB) AS BirthYear,
MONTH(DOB) AS BirthMonth,
DAY(DOB) AS BirthDay,
YEAR(NOW()) - YEAR(DOB) AS Age,
(NOW() - DOB)/365.25 AS AccurateAge
FROM PrintLogs

MySQL:
SELECT DOB,
YEAR(DOB) AS BirthYear,
MONTH(DOB) AS BirthMonth,
DAY(DOB) AS BirthDay,
YEAR(NOW()) - YEAR(DOB) AS Age,
TIMESTAMPDIFF(YEAR,DOB,NOW()) AS AccurateAge
FROM PrintLogs

In MS Access, some of the ages are one less in whole years because the birth month and day have
not yet been reached this year:

In MS Access use the INT function to remove the decimal places from the AccurateAge result
to produce an integer.
In MySQL, AccurateAge is an integer:
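
The difference between the two age calculations can also be seen in a short Java sketch using the java.time classes. The class and method names are made up for illustration, "today" is fixed at 30 May 2017 (the last day covered by the print logs) so that the output does not change, and the second date of birth is invented.

```java
import java.time.LocalDate;
import java.time.Period;

class AgeCalc {
    // Same idea as YEAR(NOW()) - YEAR(DOB): ignores the month and day.
    static int yearDifference(LocalDate dob, LocalDate today) {
        return today.getYear() - dob.getYear();
    }

    // Completed years only: one less if the birthday has not been reached yet.
    static int ageInYears(LocalDate dob, LocalDate today) {
        return Period.between(dob, today).getYears();
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2017, 5, 30);
        LocalDate earlyBirthday = LocalDate.of(1970, 3, 2);   // birthday already passed
        LocalDate lateBirthday = LocalDate.of(1970, 11, 15);  // birthday still to come
        System.out.println(yearDifference(earlyBirthday, today)); // 47
        System.out.println(ageInYears(earlyBirthday, today));     // 47
        System.out.println(yearDifference(lateBirthday, today));  // 47
        System.out.println(ageInYears(lateBirthday, today));      // 46
    }
}
```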


2.4.6 Aggregate Functions


Up to now, we have applied functions to field values in a record and have produced a new field for
ALL the records in the table. Aggregate functions summarise the data in the table to produce ONE
result. The aggregate functions that we deal with are: MAX(), MIN(), AVG(), SUM() and COUNT().
The aggregate functions answer questions like:

• How many print jobs were done? (COUNT)
• What was the average number of pages printed across all the print jobs? (AVG)
• What was the maximum or minimum file size of all the print jobs? (MAX/MIN)
• What was the total cost of all the print jobs? (SUM)
These functions summarise the data in the table; i.e. the result is not found in the table, it uses the
table data to find a summary value.

2.4.6.1 COUNT

ACTIVITY 11
How many rows and columns do you expect to be produced by the query using the COUNT
function?
SELECT COUNT(*) AS TotalPrintJobs
FROM PrintLogs
Type in your query to check your answer.
This query produces only ONE answer, which is the total number of
records in the table (2642).
Change the query to remove the star and replace it with a specific field.
SELECT COUNT(FirstName) AS TotalPrintJobs
FROM PrintLogs

The same result will be displayed, as the number of rows will be the same for any field if all the
fields have a value. The COUNT function can return different values if fields have null values.
Type in the following SQL statement to insert a record into the table with null values. The SQL
statement will only provide a value for the FirstName field; the rest of the fields will have no
values.
INSERT INTO PrintLogs (FirstName)
VALUES ('TestName')
View the table and scroll to the last record in the table.

Run the query to count the number of values in the FirstName field.
SELECT COUNT(FirstName) AS TotalPrintJobs
FROM PrintLogs

The result returned will be 2643.


Change the query to count the number of values in the Surname field.
SELECT COUNT(Surname) AS TotalPrintJobs
FROM PrintLogs

The result will be 2642 since the Surname field of the new record does not have a value.


In conclusion, COUNT(<FieldName>) will count the number of non-null entries for the chosen
field.
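
To make the difference visible in one result, the two forms of COUNT can be placed side by side. This is a small sketch only; the aliases AllRows, RowsWithSurname and MissingSurnames are illustrative names chosen for this example.

```sql
SELECT COUNT(*) AS AllRows,
       COUNT(Surname) AS RowsWithSurname,
       COUNT(*) - COUNT(Surname) AS MissingSurnames
FROM PrintLogs
```

COUNT(*) counts every row, while COUNT(Surname) skips rows where Surname is null, so the third column shows how many records have no surname. If the TestName record is still in the table, MissingSurnames should be at least 1.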

2.4.6.2 AVG

ACTIVITY 12
The AVG function will determine the average of the field sent as a parameter to the function.
Type in the SQL query to determine the average number of total pages in each print job.
SELECT AVG(TotalPages) AS AveragePages
FROM PrintLogs
Change the query to round the result to one decimal place to produce
11.6.
SELECT ROUND(AVG(TotalPages),1) AS AveragePages
FROM PrintLogs

2.4.6.3 MIN
The smallest value of a field can be determined using the MIN function.
Code the SQL statement to determine the smallest print job size.
SELECT MIN(SizeKB) AS MinFileSize
FROM PrintLogs

If we wanted to know who sent the smallest print job, we could try by
including the name of the field.
Adapt the query to include the FirstName and Surname fields.
SELECT FirstName, Surname, MIN(SizeKB) AS MinFileSize
FROM PrintLogs

The query will produce an error in MS Access explaining that the FirstName field is not part of an
aggregate function, which basically means that the field and the MIN result are unrelated.
In MySQL, the query may execute (recent versions reject it unless the ONLY_FULL_GROUP_BY mode is
switched off), but the FirstName and Surname returned are those of an arbitrary record, typically
the first record of the table, which is nonsensical.

To find WHO printed the smallest print job, we can use ORDER BY with TOP (MS Access) or LIMIT
(MySQL) to return the record which combines the related fields.


In MS Access, type in the query:
SELECT TOP 1 FirstName, Surname, SizeKB
FROM PrintLogs
ORDER BY SizeKB
In MySQL, type in the query:
SELECT FirstName, Surname, SizeKB
FROM PrintLogs
ORDER BY SizeKB
LIMIT 1

Once again, MS Access produces a solution of 15 results instead of 1 as more than one user has
printed a job of 1 KB.

In MySQL, only one result is returned, even though more than one record has a size of 1 KB.

2.4.6.4 MAX
Type in the following query to determine the largest print job.
SELECT MAX(SizeKB) AS MaxFileSize
FROM PrintLogs
Change the query to determine which user sent the largest print job.
Now change the query to find both the largest and smallest print job.
SELECT MAX(SizeKB) AS MaxFileSize, MIN(SizeKB) AS MinFileSize
FROM PrintLogs


Note that the largest and smallest print jobs are returned in a single Result Set, but these values
do not relate to one record.

2.4.6.5 SUM
To add up all the values in a field, the SUM function can be used. Do not confuse SUM and COUNT.
SUM finds the total value of all the field’s values and COUNT determines the number of rows in the
table.
Code the query to add up the values in the Cost field.
SELECT SUM(Cost) AS TotalCost
FROM PrintLogs

Remember, all of the aggregate functions produce a single value.
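
Several aggregate functions can also be combined in one SELECT, which still produces a single row. The sketch below applies the five functions from this section to the Cost field; the aliases are illustrative names only.

```sql
SELECT COUNT(*) AS NumberOfJobs,
       SUM(Cost) AS TotalCost,
       AVG(Cost) AS AverageCost,
       MIN(Cost) AS CheapestJob,
       MAX(Cost) AS DearestJob
FROM PrintLogs
```

Each column is a summary of the whole table, so the cheapest and dearest values in the row do not belong to the same print job.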

EXERCISE 5

Code SQL statements to solve the following problems:


1. Determine the average number of colour pages printed rounded to two decimal places.
2. Determine the largest number of total pages printed.
3. Count the number of users in the table PrintLogs.
4. Determine the latest hour that a print job occurred in any day.
5. Determine the user that printed at the latest hour in any day.
5.1. Will your answer find more than one user if there are many users that have printed in the same
hour?
5.2. Would the use of DISTINCT solve your problem?

2.5 Limiting the Rows Selected for a Query


In the previous section, we looked at limiting the columns (fields) in a query; we are now going to look
at limiting the rows (records). Row limiting is achieved by adding a WHERE clause after the table
name:
SELECT <field1>,…,<fieldn> (or ‘*’) FROM <tablename>
WHERE <condition>

The <condition> can consist of any Boolean condition, using any number of fields. In its simplest
form, the condition consists of a field name, followed by a relational operator (=, <, >, >=, <=, <>)
and then a comparison value:
WHERE <field> (=, <, >, >=, <=, <>) <value>

Remember, <> is the way we express ‘not equal to’.

The query will select all records WHERE the condition is met. The number of rows returned
from the query will usually be fewer than the number of rows in the original table.


2.5.1 Basic Conditions


There are many conditions that produce meaningful queries. We will study some with reference to
the PrintLogs table:
In each query, check the number of rows returned.

ACTIVITY 13
Perform the following queries and check that the required outcome is obtained:
Find the records for users who printed documents consisting of more than 5 pages.
SELECT Surname, FirstName, TotalPages
FROM PrintLogs
WHERE TotalPages > 5

The result set produces 450 records (from 2642), meaning that the remainder must have printed 5 or
fewer pages.
You can also compare fields with fields (or calculations involving fields):
Find the records that contained more colour pages than black and white.
SELECT Surname, FirstName, TotalPages, TotalColourPages
FROM PrintLogs
WHERE TotalColourPages > TotalPages - TotalColourPages

You can even involve functions (but not aggregates) in a WHERE clause.
Find the users that are 30 years old or younger; show each user only once.
SELECT DISTINCT Surname, FirstName, YEAR(NOW())-YEAR(DOB) AS AGE
FROM PrintLogs
WHERE YEAR(NOW())-YEAR(DOB) <= 30

2.5.2 Compound Conditions


Conditions can be combined using the logical operators NOT, AND and OR. Recall that these logical
operators are evaluated in the following order:
1. NOT
2. AND
3. OR
To change the order of operations for logical operators you will need to use brackets.
Find the users, without duplicates, that were born on Valentine’s Day (14 February).
SELECT DISTINCT Surname, FirstName, DOB
FROM PrintLogs
WHERE MONTH(DOB) = 2 AND DAY(DOB) = 14
Find the users, without duplicates, who have ‘J’ as the first letter of their first name.
SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE LEFT(FirstName,1) = 'J'
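
To see why brackets matter when AND and OR are mixed, consider a hypothetical request that is not part of the activities: find the users, without duplicates, born in February on either the 14th or the 29th. A sketch of the query is:

```sql
SELECT DISTINCT Surname, FirstName, DOB
FROM PrintLogs
WHERE MONTH(DOB) = 2 AND (DAY(DOB) = 14 OR DAY(DOB) = 29)
```

Without the brackets, AND is evaluated before OR, so the condition would be read as (MONTH(DOB) = 2 AND DAY(DOB) = 14) OR DAY(DOB) = 29 and would also return users born on the 29th of any other month.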


2.5.3 More Operators for Conditions


In addition to the previous WHERE clause conditions, there are a number of operators included in
SQL. We will look at each of them individually:

2.5.3.1 BETWEEN…AND

ACTIVITY 14
Type in the query to find the records for print jobs that had a total page count of between 5 and
10 pages, using a compound condition.
SELECT Surname, FirstName, TotalPages
FROM PrintLogs
WHERE TotalPages >= 5 AND TotalPages <= 10

Note: The field TotalPages must be repeated for each condition. We cannot leave out the field in the
second condition, even though it makes sense in English.
WHERE TotalPages >= 5 AND <= 10

We can, however, replace the compound condition with BETWEEN…AND…


Type in the following query to find the records for print jobs that had a total page count of
between 5 and 10 pages, using BETWEEN…AND.
SELECT Surname, FirstName, TotalPages FROM PrintLogs
WHERE TotalPages BETWEEN 5 AND 10

Note that the values 5 and 10 are included and using BETWEEN…AND is somewhat simpler than
using a compound condition. It can also be used for real and string data.
Type in the query using real numbers and check that it works.
SELECT Surname, FirstName, TotalPages
FROM PrintLogs
WHERE TotalPages BETWEEN 5.1 AND 10.2
Type in the query using string values and check that it works.
SELECT Surname, FirstName, TotalPages
FROM PrintLogs
WHERE FirstName BETWEEN 'Anne' AND 'Frank'

2.5.3.2 IN and NOT IN


IN and NOT IN can be used when a field, or an expression containing a field, needs to be compared
to a set of values.

ACTIVITY 15
If we wanted to find all the distinct users who were born in the months that have 31 days, we
could perform the query:
SELECT DISTINCT Surname, FirstName, DOB
FROM PrintLogs
WHERE MONTH(DOB) = 1 OR MONTH(DOB) = 3 OR MONTH(DOB) = 5 OR MONTH(DOB)
= 7 OR MONTH(DOB) = 8 OR MONTH(DOB) = 10 OR MONTH(DOB) = 12

We can simplify the query using the IN operator, which combines the values that MONTH(DOB) is
compared to into a set of values.
Perform the query using the IN operator to observe the same outcome:
SELECT DISTINCT Surname, FirstName, DOB
FROM PrintLogs
WHERE MONTH(DOB) IN (1,3,5,7,8,10,12)


Perform the following query to find users who were born in the months that don’t have 31 days,
simply by adding the NOT operator before IN:
SELECT DISTINCT Surname, FirstName, DOB
FROM PrintLogs
WHERE MONTH(DOB) NOT IN (1,3,5,7,8,10,12)

The IN operator also works with strings as values for the set elements. For example, if we wanted to
list all of the print jobs for users who have surnames: 'Adams', 'Bailey' or 'Carr'.
Type in the following query.
SELECT *
FROM PrintLogs
WHERE Surname IN ('Adams', 'Bailey', 'Carr')
Check that you get the results for the 3 surnames only.
Replacing BETWEEN…AND with IN
As long as the set of values used for IN is not too large to input, the IN statement can replace the
BETWEEN...AND statement.
Type in the following queries to check that they produce the same result.
SELECT *
FROM PrintLogs
WHERE MONTH(DOB) BETWEEN 3 AND 10

SELECT *
FROM PrintLogs
WHERE MONTH(DOB) IN (3,4,5,6,7,8,9,10)

2.5.3.3 LIKE
The LIKE operator is used to compare strings and sub-strings; it usually includes a wildcard
character that can extract partial matches, depending on where the wildcard is placed. In MS Access
the wildcard is ‘*’; in MySQL it is ‘%’. The syntax of the LIKE operator (shown with the MS Access
wildcard) is as follows:
WHERE <stringfield> LIKE '*sub-string*'

For example,
WHERE <stringfield> LIKE 'abc*'

would extract all records where the <stringfield> value starts with ‘abc’, regardless of the rest of the
<stringfield> value. This would be the same as:
WHERE LEFT(<stringfield>,3) = 'abc'

Similarly,
WHERE <stringfield> LIKE '*abc'

would extract all records where the <stringfield> value ends with ‘abc’, which is the same as:
WHERE RIGHT(<stringfield>,3) = 'abc'

ACTIVITY 16
Perform the following queries to test the different versions of the LIKE operator.
Explain the functions of each query.
SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE Surname LIKE 'Car*'

SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE Surname LIKE '*son'

SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE Surname LIKE '*don*'
Rewrite the conditions that can be replaced with the functions LEFT or RIGHT.
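
The queries in this activity use the MS Access wildcard ‘*’. In MySQL, the wildcard for any run of characters is ‘%’, so the same three searches might be written as in the sketch below (the comments use MySQL comment syntax).

```sql
-- Surnames that start with 'Car'
SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE Surname LIKE 'Car%';

-- Surnames that end with 'son'
SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE Surname LIKE '%son';

-- Surnames that contain 'don' anywhere
SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE Surname LIKE '%don%';
```

Only the first two conditions can be replaced with LEFT or RIGHT, because the third does not fix the position of the sub-string.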

2.5.3.4 IS NULL
We can check if a field does not have a value or is null. Note that we do not use = but the word IS.
For example, WHERE <field> IS NULL

or, if we only want fields that have values: WHERE <field> IS NOT NULL

Type in the following query to list the printers, without duplicates, that have no serial number.
SELECT DISTINCT PrinterName
FROM PrintLogs
WHERE PrinterSerialNo IS NULL

EXERCISE 6

Write SQL statements for the following:


1. List all print jobs sent to the printer ‘HP LaserJet P2055dn’, sorted according to surname then
first name.
2. List print jobs sent to any ‘Xerox WorkCentre’ printer.
3. Count the number of print jobs sent to ‘HP LaserJet P2055dn’.
4. Count the number of print jobs sent on the 23rd of May.
5. What is the average cost of colour print jobs sent to ‘Xerox WorkCentre 7835 PCL6’?
6. List all the print jobs that did not print.
7. Count the number of print jobs that did not print.
8. Count the number of Xerox printers.
9. Count the number of users who are students. A student has the word ‘student’ in their email
address.
10. Find the student(s) with the smallest print job.
11. Calculate the total number of pages sent to the ‘prepxerox7835’ printer.
12. Calculate the total cost of printing of all the print jobs.

2.6 Using Embedded Queries


We often need to extract records that require a comparison to an aggregate, for example: ‘Find all
print jobs that have a number of pages that is higher than the average number of pages of all the print
jobs’, or ‘Find the records that have a copy cost that is lower than the average copy cost of all the
print jobs’. A field value can be compared to a value in a WHERE clause, but it can’t be compared to
an aggregate. So, the condition:
WHERE TotalPages > AVG(TotalPages)

would not work.


To calculate the average, we need a completely separate SQL statement. The SQL statement to
calculate the average will be embedded in the WHERE clause as the right-hand operand of the
comparison:
WHERE TotalPages > (SELECT AVG(TotalPages) FROM PrintLogs)

ACTIVITY 17
Type in the query to find the print jobs whose total pages are above average.
SELECT *
FROM PrintLogs
WHERE TotalPages > (SELECT AVG(TotalPages) FROM PrintLogs)
Type in the query to determine the print jobs whose cost is less than average.
SELECT *
FROM PrintLogs
WHERE Cost < (SELECT AVG(Cost) FROM PrintLogs)

Returning to the problem of finding the smallest print jobs, we used ORDER BY and TOP/LIMIT.

MS Access:
SELECT TOP 1 FirstName, Surname, SizeKB
FROM PrintLogs
ORDER BY SizeKB

MySQL:
SELECT FirstName, Surname, SizeKB
FROM PrintLogs
ORDER BY SizeKB
LIMIT 1

MS Access returned 15 rows and MySQL returned 1.


Type in the query that uses an embedded SQL query to determine the smallest print job AND
display the details of the user/s who printed the jobs.
SELECT FirstName, Surname, SizeKB
FROM PrintLogs
WHERE SizeKB = (SELECT MIN(SizeKB) FROM PrintLogs)

HINT: To check that the embedded query SELECT MIN(SizeKB) FROM PrintLogs has no errors,
you can run the query on its own before adding it to the larger query.
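
The same pattern should work for the largest print job. The sketch below swaps MIN for MAX in the embedded query and is offered as one possible approach, not the only one.

```sql
SELECT FirstName, Surname, SizeKB
FROM PrintLogs
WHERE SizeKB = (SELECT MAX(SizeKB) FROM PrintLogs)
```

Because the comparison is made against the value returned by the embedded query, every record that shares the largest size is returned in both MS Access and MySQL.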

EXERCISE 7

Write SQL statements to solve the following problems. Not all of these queries are embedded.
1. Find all the users whose surname is longer than average.
2. Find all print jobs that cost above R10.00
3. Find all print jobs that cost more than average. Remember to take the number of pages into
account.
4. List the largest print job in KB printed in black and white to the HP LaserJet P2055dn printer.
Display the user name and the size of the print job. There may be more than one user in this
result.
5. Count the number of students who print to any Xerox machine and have printed more than the
average number of colour pages.


2.7 GROUP BY
We could use a simple aggregate function call to determine the average age of the users. For
simplicity’s sake, we will use the inaccurate formula to determine the age in years.
SELECT AVG(YEAR(NOW())-YEAR(DOB)) AS AveAge
FROM PrintLogs

And the query result is:


What if we wanted to know if there is a correlation between the alphabetical category of the surname
and the average age? In other words, do users who have surnames starting with ‘A’ have a
significantly different average age to those starting with ‘B’? Statistically, this shouldn’t be the case,
but we might see some interesting results.
A WHERE clause can be used to restrict the rows generated by the query to find the average age for
surnames beginning with ‘A’ using the following query:

ACTIVITY 18
Type in the query to determine the average age for surnames beginning with ‘A’:
SELECT AVG(YEAR(NOW())-YEAR(DOB)) AS AveAge
FROM PrintLogs
WHERE LEFT(Surname,1) = 'A'

Recall that the WHERE clause eliminates rows from the existing table, so this query only returns a
result for the first letter of the alphabet and not for ALL letters.
Notice that the average is different. This is contrary to expectation, so it would be interesting to see
the results for all the other letters of the alphabet. We could perform the same query over and
over for the other 25 letters of the alphabet, but that kind of repetition cannot be expressed with a
WHERE clause in a single SQL query.
The GROUP BY clause provides the result of an aggregate function for each category (group) of
rows. The first letter of the surname can be set as the GROUP BY criterion so that the
aggregate no longer returns a single result for the entire table, as in the first
example, or for a single letter, as in the second example using a WHERE clause.
GROUP BY returns the aggregate for each GROUP indicated by the GROUP
BY criterion. The word each (or every, or per) in a question often implies that the query
will involve GROUP BY.
Perform the following query to test the outcome using GROUP BY:
SELECT AVG(YEAR(NOW())-YEAR(DOB)) AS AveAge
FROM PrintLogs
GROUP BY LEFT(Surname, 1)

We get this result set with a single column for AveAge:


The results show a confusing spread of values and there are only 21 rows,
none of which indicate the specific letter that the value pertains to.
It is always a good idea to display what is being GROUPed BY as the first,
left-most, field of the query.


Perform the query again to include the letter as the first field:
SELECT LEFT(Surname, 1) AS Letter, AVG(YEAR(NOW())-YEAR(DOB)) AS
AveAge
FROM PrintLogs
GROUP BY LEFT(Surname, 1)

In MySQL, the alias Letter can be used in place of the expression
in the GROUP BY clause:
SELECT LEFT(Surname, 1) AS Letter,
AVG(YEAR(NOW())-YEAR(DOB)) AS AveAge
FROM PrintLogs
GROUP BY Letter

The result set relates the AveAge value to the corresponding Letter.
Notice that the order of the first letters was not specified in the
query using ORDER BY, yet the list is in alphabetical order.
This is a common side effect of GROUP BY: the result set is often ORDERed
BY the natural order of the data type that is used in the
GROUP BY, without any specific request to do so. This order is not
guaranteed by every database application (recent versions of MySQL, for
example, do not promise it), so add an ORDER BY if the order matters. We can
also apply an ORDER BY that specifies a different
ordering criterion.
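
As an illustration of a different ordering criterion, the sketch below lists the letters from the highest average age to the lowest instead of alphabetically. It repeats the full expression in ORDER BY so that it should run in both MS Access and MySQL.

```sql
SELECT LEFT(Surname, 1) AS Letter,
       AVG(YEAR(NOW()) - YEAR(DOB)) AS AveAge
FROM PrintLogs
GROUP BY LEFT(Surname, 1)
ORDER BY AVG(YEAR(NOW()) - YEAR(DOB)) DESC
```

Replacing the ORDER BY expression with LEFT(Surname, 1) asks for alphabetical order explicitly, which is safer than relying on the order GROUP BY happens to produce.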
Consider the observations from the result set:

 The average age for each letter is certainly inconsistent; this may be attributed to the fact
that the populations in each category are not the same.
 Only 21 letters are shown; this means that surnames starting with the missing letters were
not present. There could be many reasons for this, notably the size of the sample and cultural
bias in terms of where the names were sourced.
In general, the larger the population and the more equitable the spread
of surnames within the population, the more consistent the
averages should be; however, using GROUP BY does provide quite
relevant information.
It is clear that the distribution is inconsistent, with
surnames starting with ‘B’, ‘R’ and ‘H’ being significantly more common
than the rest.
Type in the following query to COUNT the number of
surnames that start with each letter, sorted from the
fewest to the most surnames.
SELECT LEFT(Surname, 1) AS Letter,
COUNT(*) AS NumberOf
FROM PrintLogs
GROUP BY LEFT(Surname, 1)
ORDER BY COUNT(*)

In MySQL you can order according to the alias NumberOf.


SELECT LEFT(Surname, 1) AS Letter,
COUNT(*) AS NumberOf
FROM PrintLogs
GROUP BY LEFT(Surname, 1)
ORDER BY NumberOf


Note that in MySQL, the ORDER BY clause can use functions or an alias.
We can find the three letters that occur the most.
Type in the query in MS Access to reverse the order and limit to the top three.
SELECT TOP 3 LEFT(Surname, 1) AS Letter,
COUNT(*) AS NumberOf FROM PrintLogs
GROUP BY LEFT(Surname, 1)
ORDER BY COUNT(*) DESC
Or type the query into MySQL
SELECT LEFT(Surname, 1) AS Letter,
COUNT(*) AS NumberOf FROM PrintLogs
GROUP BY LEFT(Surname, 1)
ORDER BY NumberOf DESC
LIMIT 3

2.7.1 GROUP BY on Multiple Fields


We can even GROUP BY multiple fields. For example, we can count how many print jobs each user has
sent to each printer.

ACTIVITY 19
Type in the query to count the number of print jobs sent to a printer by each user. Make sure
the two fields that are in the GROUP BY clause are listed in the Result Set.
SELECT PrinterName, Surname, COUNT(*) as Total
FROM PrintLogs
GROUP BY PrinterName, Surname

The first rows of the Result Set show that the printer name is repeated for each user. The printers are
grouped in alphabetical order, and the users that have printed to each printer are listed in alphabetical
order together with the total. This COUNT function determines that there are four rows in the table
with a user name of “Alexander” and a PrinterName of “carmen hp printer”.

Check that there are 4 entries in the table where user name is “Alexander” and PrinterName is
“carmen hp printer”
SELECT COUNT(*) as Total
FROM PrintLogs
WHERE surname = 'Alexander' AND PrinterName= 'carmen hp printer'

We can swop the fields in the GROUP BY to count the number of print jobs a user has sent to
different printers.
Change the query by swopping the fields around in the GROUP BY clause.


SELECT PrinterName, Surname, COUNT(*) as Total
FROM PrintLogs
GROUP BY Surname, PrinterName

The Result Set is GROUPed BY the Surname field in alphabetical order with each Surname repeated
for a different printer. The PrinterName is sorted alphabetically for each user together with the total
print jobs sent to the printer by a particular user.

2.7.2 GROUP BY with HAVING


Recall that WHERE eliminates rows in the existing or the original table. The GROUP BY clause
produces a different table to the existing table. If we want to remove rows in the table that is
produced by a GROUP BY clause, we cannot use WHERE. We need a new clause. HAVING works in the same
way as WHERE works on the original table: it removes rows, but in a table produced by a GROUP BY
clause.
HAVING removes rows from a GROUP BY Result Set and must appear after the GROUP BY clause.
Let’s look at the previous query to demonstrate the use of HAVING. What if we wanted to show only
those records that have a count of 10 or more?
Perform the following query in MS Access and check the result:
SELECT PrinterName, Surname, COUNT(*) as Total
FROM PrintLogs
GROUP BY Surname, PrinterName
HAVING Count(*) >= 10
In MySQL, the alias Total can be used in the HAVING clause
SELECT PrinterName, Surname, COUNT(*) as Total
FROM PrintLogs
GROUP BY Surname, PrinterName
HAVING Total >= 10


Note that only records with a total of 10 or more are displayed. The users “Banks”, “Beck” and
“Brewer” each appear twice because they have sent 10 or more print jobs to two different printers.

2.7.3 GROUP BY, ORDER BY and HAVING Summary

 The table produced is a SUMMARY of the actual table where the aggregate (max, min, count,
sum or avg) is calculated for each GROUP BY item.

 The query must involve an aggregate function.

 GROUP BY often returns the groups in the natural sort order of the data being
grouped by, but its primary function is not a sort and the order is not guaranteed.

 Another field can be used to create a different sort using ORDER BY and ORDER BY must be
the last clause used in a query.

 GROUP BY can make use of more than 1 field.

 If you want to restrict rows based on an aggregate in a condition, the HAVING clause is used
instead of WHERE and is coded immediately after the GROUP BY clause.
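
The clauses in this summary can appear together in one query. The sketch below is a hypothetical example (the threshold of 10 and the alias ColourJobs are illustrative choices) that counts the colour print jobs per printer and keeps only the busier printers.

```sql
SELECT PrinterName, COUNT(*) AS ColourJobs
FROM PrintLogs
WHERE TotalColourPages > 0
GROUP BY PrinterName
HAVING COUNT(*) >= 10
ORDER BY COUNT(*) DESC
```

WHERE removes rows from the original table before grouping, HAVING removes groups after the aggregate has been calculated, and ORDER BY comes last.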

EXERCISE 8

1. Find the average cost of print jobs sent to each printer.
2. Find the average cost to print a single page of print jobs sent to each printer.
3. Determine the total cost of the print jobs sent to each printer.
4. Determine the total cost sent to each printer. Include VAT in the cost of each print job. Do not
include print jobs that were not printed.
5. Determine the total cost sent to each printer. Include VAT in the cost of each print job.
6. Calculate the total number of colour pages sent to each printer.
7. Determine the cheapest print jobs sent to each printer.

2.8 Queries that Alter the Data in a Table


The 3 remaining query types INSERT (to add a new record), UPDATE (to edit/change (a) record/s)
and DELETE (to delete/remove (a) record/s) are provided by SQL to change the data in the database.
We will revise the syntax of each and provide examples that will demonstrate the effect of each on the
PrintDB database.


2.8.1 INSERT
There are 3 versions of the INSERT query. We can insert all the fields for a record, only some of the
fields, or we can use other records to provide the values for some of the fields.

2.8.1.1 Inserting All Fields


The syntax is:
INSERT INTO <tableName>
VALUES (<field1Data>, <field2Data>, …, <fieldNData>)

This version requires that:

 Data for all fields must be provided, separated by commas (‘,’)
 The order of the data for the fields must correspond exactly to the order of the fields in the
table
 In MS Access, string data must be enclosed in quotes. Double (" ") or single (' ') quotes are
acceptable, although MS Access does not always interpret string data in single quotes
correctly (especially strings that contain multiple words, special characters or embedded
keywords), so it is safer to use double quotes
 In MySQL, string data must be enclosed in single quotes
 In MS Access, Date/Time data must be enclosed in ‘#’ symbols, for example,
#1963-03-20#
 In MySQL, Date/Time data must be enclosed in single quotes, for example, '1963-03-20'
 All other data types, numbers and Boolean, are not enclosed in anything
This is quite a tedious task using the PrintLogs table, as there are many (14) fields; but, it’s a
worthwhile exercise to perform such a query because attention to the data type detail for each field is
critical, to ensure the query will work.
We are going to perform a query that will INSERT a new print job record with the following data:

Field              Data
Date               #2017/05/23 9:33:00#
Surname            'Admin'
FirstName          'Admin'
DOB                #2000/01/01#
EMail              'Admin@MySchool.co.za'
PrinterName        'hsxeroxd95'
PrinterModel       'HP LaserJet P2055dn'
PrinterSerialNo    'CNCKF11815'
TotalPages         5
TotalColourPages   0
Copies             1
Cost               0.56
SizeKB             1513
Printed            True

ACTIVITY 20
Type in the following query to INSERT the record with the data above:
INSERT INTO PrintLogs
VALUES
(#2017/05/23 9:33:00#, "Admin", "Admin", #2000/01/01#,
"Admin@MySchool.co.za", "hsxeroxd95", "HP LaserJet P2055dn",
"CNCKF11815", 5, 0, 1, 0.56, 1513, True)
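
The query above uses MS Access conventions (‘#’ around dates). A MySQL version might look like the sketch below, assuming the MySQL PrintLogs table has the same 14 fields in the same order and that the Printed field accepts True.

```sql
-- MySQL version: strings and Date/Time values are both enclosed in single quotes
INSERT INTO PrintLogs
VALUES
('2017-05-23 09:33:00', 'Admin', 'Admin', '2000-01-01',
 'Admin@MySchool.co.za', 'hsxeroxd95', 'HP LaserJet P2055dn',
 'CNCKF11815', 5, 0, 1, 0.56, 1513, True);
```

The order of the values is unchanged; only the way the date and string values are written differs between the two database applications.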

2.8.1.2 Inserting Specified Fields


To insert some of the fields, we need to list the fields that will receive values, followed by the values.
The order and type of the fields must match the order and type of the values:
INSERT INTO tableName
(<fieldTitle1>, <fieldTitle2>,…, <fieldTitleN>)
VALUES
(<fieldValue1>, <fieldValue2>, …, <fieldValueN>)

This version requires that:

 The values in the 2nd set of brackets (after the VALUES keyword) must correspond exactly
to the field names in the 1st set of brackets in terms of data type and order
 Not all of the fields of the table need to be specified, and the fields listed do not need to be
in the same order as they appear in the table; any fields that are not specified will have
NULL data for the new record
 The rules about quotes (for string data) and ‘#’ symbols (for Date/Time values) still apply
here.
This version of the INSERT query is essential for a table that has an Autonumber primary key field
as an autonumber field cannot be assigned in a query; it is automatically assigned by the database
application. We can’t demonstrate the effect of performing this kind of INSERT as the table
PrintLogs doesn’t have an Autonumber primary key; however, this version works even if there isn’t a
primary key. If we wanted to INSERT a record with a limited number of fields (not all the fields in the
table), and in a different order of fields to those actually in the table, this version works perfectly.
Perform the following query that will INSERT a partial record into the PrintLogs table:
INSERT INTO PrintLogs (Copies, Printed, FirstName, Surname)
VALUES (1, True, "Joe", "Soap")

This is a pretty meaningless record, but it does exist; you can use the data find tool in your database
application to locate the surname ‘Soap’ or the first name ‘Joe’. The position of the record in the table
will be determined by the database application (in this case, it turned out to be record number 2299).
In MySQL, the record is entered in the last position of the table.

As long as you match the data values to the selected fields in order and type, the query will work.
INSERT INTO PrintLogs (Copies, Printed, FirstName, Surname)
VALUES (1, True, "Joe", "Soap")

2.8.1.3 Inserting Multiple Records Using an Embedded SELECT


Sometimes it is required to add records to a table with similar data to an existing record in the table.
This version of INSERT can add multiple records to a table using one INSERT statement with an
embedded SELECT.
The syntax is:
INSERT INTO tableName (<fieldTitle1>, <fieldTitle2>,…, <fieldTitleN>)
SELECT <fieldTitle1>, <fieldTitle2>,…, <fieldTitleN>
FROM tableName
WHERE <condition>

This syntax may be a little confusing to understand, so consider a worked example from the table
PrintLogs:
The user ‘Schultz’, ‘Faith’ completed 10 print jobs, as shown below:


Suppose we want to add records for ‘Joe Soap’ that are an exact copy of the records for Faith
Schultz; i.e. Joe Soap printed exactly the same jobs as Faith Schultz. All the details of Faith
Schultz’s print jobs must be allocated to Joe Soap, except for the personal details, which must be
Joe Soap’s own. We embed a SELECT statement to list the field names we want to carry over from
Faith Schultz and, for the personal details of Joe Soap, we ‘hard code’ (provide actual data) in the
place of the field. The WHERE clause in the embedded SELECT statement needs to provide a
condition that will isolate the rows for Faith Schultz.
In MS Access, the field called Date is problematic as it is considered a reserved word.
Change the name of the field Date to PDate in Design View of the table PrintLogs.
Code the following query to INSERT records with an embedded SELECT:
INSERT INTO PrintLogs (PDate, Surname, FirstName, DOB, EMail,
PrinterName, PrinterModel, PrinterSerialNo, TotalPages,
TotalColourPages, Copies, Cost, SizeKB, Printed)
SELECT PDate, 'Soap', 'Joe', #2000/02/23#, 'SoapJ@MySchool.co.za',
PrinterName, PrinterModel, PrinterSerialNo, TotalPages,
TotalColourPages, Copies, Cost, SizeKB, Printed
FROM PrintLogs
WHERE Surname = 'Schultz' AND FirstName = 'Faith'
In MySQL, the field called Date does not cause any problems. Type in the query using the field
name Date. Remember to enclose the value for the date field in single quotes.
INSERT INTO PrintLogs (Date, Surname, FirstName, DOB, EMail,
PrinterName, PrinterModel, PrinterSerialNo, TotalPages,
TotalColourPages, Copies, Cost, SizeKB, Printed)
SELECT Date, 'Soap', 'Joe', '2000/02/23', 'SoapJ@MySchool.co.za',
PrinterName, PrinterModel, PrinterSerialNo, TotalPages,
TotalColourPages, Copies, Cost, SizeKB, Printed
FROM PrintLogs
WHERE Surname = 'Schultz' AND FirstName = 'Faith'
Run a query or look up in the table to check that Joe Soap has the same 10 records (all
information included) as Faith Schultz, except for his personal data which replaced that of Faith
Schultz:
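One way to run this check is to count the records for both users side by side. The sketch below is only one possible query, and the alias NumJobs is a name chosen here for illustration. If the INSERT worked, both rows should show the same count. (MySQL accepts the comment line starting with --; in MS Access, leave it out.)

```sql
-- NumJobs is an illustrative alias, not a field in PrintLogs.
SELECT Surname, FirstName, COUNT(*) AS NumJobs
FROM PrintLogs
WHERE (Surname = 'Schultz' AND FirstName = 'Faith')
   OR (Surname = 'Soap' AND FirstName = 'Joe')
GROUP BY Surname, FirstName
```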

Hint: In MS Access, rename the field Date to PDate in Design View first; MS Access does not accept a field
named Date in an INSERT query because it sees it as a reserved word.
Change the query to insert records for Lee Destiny, born on 10 April 1984, who printed to the
hsadminxerox7835 printer on the same days as Maya Hill, with identical print jobs. The MS Access version is
shown; in MySQL, keep the field name Date and write the date of birth as '1984/04/10'.
INSERT INTO PrintLogs (PDate, Surname, FirstName, DOB, EMail,
PrinterName, PrinterModel, PrinterSerialNo, TotalPages,
TotalColourPages, Copies, Cost, SizeKB, Printed)
SELECT PDate, 'Destiny', 'Lee', #1984/04/10#,
'DestinyL@MySchool.co.za', PrinterName, PrinterModel, PrinterSerialNo,
TotalPages, TotalColourPages, Copies, Cost, SizeKB, Printed
FROM PrintLogs
WHERE Surname = 'Hill' AND FirstName = 'Maya' AND PrinterName =
'hsadminxerox7835'
Check that the records have been inserted using the query:
SELECT *
FROM PrintLogs
WHERE Surname = 'Destiny' AND FirstName = 'Lee' AND PrinterName =
'hsadminxerox7835'

2.8.2 UPDATE Statement


The UPDATE query has only one form. Including a WHERE clause restricts the rows (records) that are
updated; leaving out the WHERE clause updates all the records in the table.
The syntax of an UPDATE query is:
UPDATE tableName
SET field1 = value1 [, field2 = value2, …, fieldN = valueN]
[WHERE <condition>]

Square brackets ([ ]) indicate optional code; i.e. you can update one field or many fields. Separate
successive fieldX = valueX assignments with commas and only use the keyword SET once, after the
table name.

2.8.2.1 UPDATE without a WHERE Clause


Not using a WHERE clause will UPDATE or change all the records in the table. An UPDATE without
a WHERE clause can be used to change the value of a field for all records using a formula.
The accounting department may want to have the Cost field updated for all records to include VAT.
It is possible to calculate the Cost including VAT without actually changing the Cost field, by using a
SELECT query that generates a calculated field to include VAT; but if the accounting department
insists that the Cost field itself must include VAT, then the field must be permanently changed.
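As a rough sketch of the first option, a SELECT query can show the VAT-inclusive cost as a calculated field while leaving the stored Cost untouched. The alias CostInclVAT is made up for this illustration, and the factor 1.14 is the same one used in the UPDATE query of Activity 21. (MySQL accepts the comment line starting with --; in MS Access, leave it out.)

```sql
-- Cost stays as stored; CostInclVAT exists only in the query result.
SELECT Surname, FirstName, Cost, Cost * 1.14 AS CostInclVAT
FROM PrintLogs
```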

ACTIVITY 21
Code the query to change all Cost values to include VAT:
UPDATE PrintLogs
SET Cost = Cost * 1.14

2.8.2.2 Using a WHERE Clause


A print job should only attract a Cost value if the job was actually printed. To comply with the
accounting department’s requests, we can set all values for the Cost field to 0 if the job wasn’t printed.
Code an UPDATE query to SET all Cost values to 0 for non-printed jobs:
UPDATE PrintLogs
SET Cost = 0
WHERE NOT(Printed)

This would be the same as:


UPDATE PrintLogs
SET Cost = 0
WHERE Printed = False

Logically, the conditions NOT(Printed) and Printed = False mean the same.
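To confirm the update, a query along the following lines could count the non-printed jobs that still carry a cost; after the UPDATE it would be expected to return 0. The alias StillCharged is an illustrative name only. (MySQL accepts the comment line starting with --; in MS Access, leave it out.)

```sql
-- Counts non-printed jobs whose Cost was not reset to 0.
SELECT COUNT(*) AS StillCharged
FROM PrintLogs
WHERE Printed = False AND Cost <> 0
```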
If you wanted to change a record in a table that has a primary key to uniquely identify each record,
you would simply include a WHERE clause that restricts the row to match the primary key field value:
WHERE PrimaryKeyField = Value

Since there is no primary key field in the PrintLogs table, we would have to include a WHERE
clause that finds the correct record/s using a compound condition. The combination of the Surname
and FirstName may not be unique; however, in this example the email address will be unique.
The print jobs sent by ‘Henry Bates’ are displayed below with the Surname, FirstName and Email
fields.

It has been discovered that ‘Henry’ is actually ‘Henrietta’, a female, and that she is not a student,
contrary to what the Email address field indicates.
Code the following query to SET the FirstName to ‘Henrietta’ and change the Email address to
BatesH@MySchool.co.za:
UPDATE PrintLogs
SET FirstName = 'Henrietta', EMail = 'BatesH@MySchool.co.za'
WHERE Surname = 'Bates' AND FirstName = 'Henry'
Now run the following query to display Henrietta Bates:
SELECT Surname, FirstName, Email FROM PrintLogs
WHERE Surname = 'Bates' AND FirstName = 'Henrietta'

You will see that the relevant fields have been updated:

Rewrite the UPDATE query to change ‘Henry’ to ‘Henrietta’ and her email address using the
email address to change the correct records. Obviously, you cannot run this query on the
database as the records have already been altered.

2.8.3 DELETE Statement


The DELETE query is the simplest, but potentially the most damaging query. With no WHERE
clause, you will delete ALL the records in the table and they cannot be retrieved later.

In a similar fashion to the UPDATE query, a well-structured table should offer a primary key that can
be used to DELETE a single record using:
WHERE PrimaryKeyField = Value

The syntax of the DELETE query to delete ALL records in a table is:
DELETE *
FROM <tableName>

Note that this will not remove the table itself, only all the data in the table. The ‘*’ is not necessary in
the statement; MS Access accepts it, but MySQL does not, so it is safest to leave it out. A better version of
the DELETE statement is:
DELETE
FROM <tableName>
WHERE fieldname = value

This more controlled DELETE statement will restrict the rows deleted using the WHERE clause.
For example, suppose that all the print jobs sent to the printer with PrinterSerialNo ‘CNB7H8R1X5’ were
queued and waiting to be printed. Unfortunately, a power surge irreparably damaged the printer and
all the print jobs were lost, making it necessary to delete the print jobs sent to this printer.
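Because deleted records cannot be retrieved, it may be worth previewing the affected rows first by using the same WHERE clause in a SELECT query. A minimal sketch, with JobsToDelete as an illustrative alias (MySQL accepts the comment line starting with --; in MS Access, leave it out):

```sql
-- Preview only: counts the rows a DELETE with this WHERE clause would remove.
SELECT COUNT(*) AS JobsToDelete
FROM PrintLogs
WHERE PrinterSerialNo = 'CNB7H8R1X5'
```

If the count is what you expect, the same condition can then be used in the DELETE statement.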
The total number of print jobs sent to each printer is shown alongside; the number of print jobs
sent to PrinterSerialNo ‘CNB7H8R1X5’ was 19. The query produces 23 printer serial numbers
for the printers in the table.

ACTIVITY 22
Code a query to produce the table shown alongside:
Perform the query to DELETE the records for the printer with the serial
number 'CNB7H8R1X5':
DELETE FROM PrintLogs
WHERE PrinterSerialNo = 'CNB7H8R1X5'
Run the previous query to check that the printer no
longer exists.
Write down another query to check whether the printer’s records have
been successfully deleted.

EXERCISE 9

1. Insert a new printer with the printer name ‘HPFastPrint’, model ‘HPFastPrint2500’ and serial
number ‘HP25001234’. No one has printed to this printer as yet, so the rest of the fields will be
blank.
2. A printer ‘prepxerox7835’ was replaced by a temporary printer called ‘XeroxReplacement’. A
serial number and model have not been assigned to this printer. Insert this new printer using all
the print jobs that were sent to the ‘prepxerox7835’ and then delete all the ‘prepxerox7835’
records.
3. Change the size of each print job so that it is rounded to the nearest 100 KB.
4. The print cost has doubled for all print jobs sent to the Xerox machines.
