The print logs consist of details about the user, the printer to which they send the job and the print job.
User
The user’s first name, surname, date of birth (DOB) and email are stored in the second, third,
fourth and fifth fields. A user may either be a member of staff at the school or a student.
Students are identified with the word “student” after the @ sign. For example,
Alejandra.Olson@student.MySchool.co.za. Staff members only have the school’s domain name
after the @ sign. For example, CastroN@MySchool.co.za.
Printer
The details of the printer are stored in the printer name, printer model and the printer serial
number fields.
Print Job
The Date field stores the date and time of the print job sent to a printer by a user. The last six
fields store the details of the particular print job sent by the user to the printer. These details
include the total number of pages, the number of pages that are colour in the print job, the
number of copies, the size in kilobytes and whether the job was actually printed. A print job may
consist of colour and black pages. To determine the number of black and white pages, subtract
the number of colour pages from the total pages. Not all print jobs are printed. A user may not
have the rights to print to the printer, or may not have enough credits to print the number of pages.
ACTIVITY 1
Open the database PrintDB.accdb and perform the following SQL query to extract all the data in the
table:
SELECT *
FROM PrintLogs
You will notice that all the columns (fields) are included (too wide to show in this document) and there
are 2642 rows (records).
2.2.1 Sorting
A table can be sorted according to the field/s indicated in an ORDER BY clause:
SELECT *
FROM <tableName>
ORDER BY <field1>, <field2>,…,<fieldN>
You will notice that all the columns (fields) are included, as before, but the rows are now ordered on
the Surname field. The order applies to the Surname field only, in ascending alphabetical order; no
other fields are sorted. All of the users appear to be unique, so do not assume that the first
names are sorted as well; if more than one user had the same surname, any order in the first names
would be purely coincidental.
Notice that the records have, once again, been sorted (firstly) on the Surname field (Adams, Aguilar,
Alexander, Allen,…), but now there is a Date order within each surname. When sorting on multiple
fields, the first sort is maintained and any subsequent sorts take place within the previous sort.
Looking at Rebecca Alexander, in particular, she printed, in order, on the 23rd, 25th and 29th of May;
she printed twice on the 25th and the times are also ordered because the time portion is part of the
combined date/time field.
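The two-level sort described above can be sketched as the query below. It assumes the PrintLogs table and the Surname and Date fields used elsewhere in this unit. The line starting with -- is an explanatory comment; MS Access does not accept comments in a query, so leave it out there.

```sql
-- Sort on Surname first; within each surname, sort on the combined date/time field
SELECT *
FROM PrintLogs
ORDER BY Surname, Date;
```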
You can reverse the order of any of the sort fields by adding the keyword DESC to the required
field/s.
Change the query to extract all the data in the table, sorted on the Surname field, in reverse
order, and then the Date field:
SELECT *
FROM PrintLogs
ORDER BY Surname DESC, Date
Notice that the result starts with the last surname alphabetically, ‘Young’, but the date/time order
within each surname is still ascending.
MS Access:
SELECT TOP <n> *
FROM <tableName>
ORDER BY <field1>,…,<fieldN>
MySQL:
SELECT *
FROM <tableName>
ORDER BY <field1>,…,<fieldN>
LIMIT <n>
Perform the following SQL query to extract the three oldest users in the table, sorted on the DOB
field. The DOB field indicates the age of the user. In MS Access, the date is in the format 1970/03/02
and in MySQL 1970-03-02.
ACTIVITY 2
Type in the SQL statement below and run it.
MS Access:
SELECT TOP 3 *
FROM PrintLogs
ORDER BY DOB
MySQL:
SELECT *
FROM PrintLogs
ORDER BY DOB
LIMIT 3
In MS Access, we expected to see three results in the Result Set, but ended up with 43. The reason
is that there were more than three rows for the oldest user, so all of the rows for that user were included.
We will get the correct result when we use the keyword DISTINCT, in a later section.
In MySQL, only three results are shown, all relating to the same user. MS Access and MySQL may
return different results for similar queries, although this happens only in rare cases.
MS Access:
SELECT TOP 3 *
FROM PrintLogs
MySQL:
SELECT *
FROM PrintLogs
LIMIT 3
Check that the first three rows of the table are produced.
MS Access:
SELECT TOP 1 *
FROM PrintLogs
ORDER BY TotalColourPages DESC
MySQL:
SELECT *
FROM PrintLogs
ORDER BY TotalColourPages DESC
LIMIT 1
In MS Access, two rows are returned both from Josiah Thomas who printed a similar job of 120 colour
pages on the 23rd and 29th May.
In MySQL, only the print job of 120 pages printed by Josiah Thomas on the 23rd May is displayed.
In both cases we can see who was responsible for the printing as all the fields related to the record
are returned. We could determine the total number of colour pages using the aggregate function
MAX, however MAX on its own would not determine WHO was responsible for the printing. We will
return to this concept when we revise aggregate functions.
EXERCISE 1
Write SQL statements for the following queries using the table PrintLogs:
1. Display the table in descending order of the size of the print job.
2. Display the table sorted according to the date, then the printer name and then the surname of
the user
3. Display the top 10 largest print jobs in kilobytes.
4. Display the 20 lowest records sorted by cost.
5. Find the print job with the highest number of pages.
6. Find the smallest print job in kilobytes.
7. Normalise the database PrintDB to third normal form as a revision exercise. We will continue to
use the original database in the rest of this Learning Unit.
ACTIVITY 3
Perform the following SQL query to display only the Surname, FirstName and DOB fields from
PrintLogs:
SELECT Surname, FirstName, DOB
FROM PrintLogs
You can see clearly that only three columns exist in the Result Set and all the rows (2642) are
returned in the table.
Change the query to sort the table from oldest to youngest.
SELECT Surname, FirstName, DOB
FROM PrintLogs
ORDER BY DOB
The same three columns are shown with the same number of rows and no alphabetical ordering on
either the Surname or FirstName fields, only on the DOB field, where you see the oldest user listed
first.
Interestingly, because we now have a list of all unique names, the result set indicates a total of 340
records, which must be the total number of users in the table.
We can now get back to using the TOP <n> directive more successfully. Combining TOP <n> with
DISTINCT and ORDER BY will show us who the 3 eldest users are.
Perform the following SQL query to display the oldest 3 users’ Surname, FirstName and DOB
fields from PrintLogs, ordered by DOB:
SELECT DISTINCT TOP 3 Surname, FirstName, DOB
FROM PrintLogs
ORDER BY DOB
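MySQL has no TOP keyword, so a MySQL version of this query would probably combine DISTINCT with LIMIT, as in the sketch below. It assumes the same PrintLogs fields and has not been checked against the PrintDB data.

```sql
-- MySQL: the three oldest users, each user listed only once
SELECT DISTINCT Surname, FirstName, DOB
FROM PrintLogs
ORDER BY DOB
LIMIT 3;
```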
EXERCISE 2
1. List all the users in the table PrintLogs displaying each user only once.
2. Create a unique alphabetical list of printers showing only the printer name.
2.4 Calculations
Not only can we limit the number of fields in a query, but we can create extra calculated, or derived
fields. Calculated fields can use the existing fields to generate a new answer field, using an arithmetic
formula, which is not a field in the original table. For example, each record in the table has a
TotalPages and a Copies field; this means that if the Copies field has a value of 1, the total number
of pages in the print job would be the same as the TotalPages field, but, if Copies has a value of 2,
then the overall total would be 2 x TotalPages. We can produce a query to show the overall total
number of pages for each print job by multiplying the two fields, TotalPages and Copies, for each
record.
If we need to perform a calculation for ALL the records in a table, then we place the calculation in the
SELECT part of the SQL statement next to any fields that are listed. The Result Set will display the
listed fields as columns and a new column for each calculation. This field is not added to the table, it
is generated by the SQL statement.
The general form of a SQL statement with a calculation and listed fields where the calculation is
performed on all the rows in the table is:
SELECT <field1>,…,<fieldN>, <calculation>
FROM <tableName>
ACTIVITY 4
Code a SQL query to display the Date, Surname, FirstName, TotalPages, Copies and
TotalPages x Copies for each record:
SELECT Date, Surname, FirstName, TotalPages, Copies,
TotalPages * Copies
FROM PrintLogs
It is good practice to include the fields used in the calculation so you can check your answer.
The Result Set should look like:
In MS Access, notice that 2642 records are displayed with an extra field called Expr1005 which is the
product of the TotalPages and Copies field. In MySQL, the calculation is used as a heading for the
new column.
Notice how the title changed for the calculated field in MS Access:
In My SQL:
Or, if you wanted to provide an alias ‘BY’, to abbreviate Birth Year, you would also have to use square
brackets around ‘BY’ ([BY]) as ‘BY’ is a keyword used in clauses like ORDER BY.
Although MS Access allows the use of square brackets to enclose field names with spaces, as
indicated above, MySQL does not accept square brackets; a name containing spaces would have to be
enclosed in backticks (`) instead.
In general, the rules for identifier names for a variable, method or program name state that the
identifier should not contain any spaces or special characters. In keeping with this concept, it is good
practice to apply these rules when naming generated fields in a database. It is far better, for both MS
Access and MySQL, to use a convention known as CamelCase, where the spaces are removed and
the first letter of each word is capitalised, representing the humps of a camel. CamelCase is
subdivided into upper and lower CamelCase:
• upper CamelCase capitalises the first letter of every word, including the first word: CamelCase
• lower CamelCase capitalises the first letter of every word except the first word: camelCase
CamelCase is used for many of the field names in the PrintLogs table. Another acceptable convention
is to use underscores ‘_’ (<Shift> + <->) to combine separate words.
For example, TotalPages * Copies AS Overall_Total_Pages
Underscores do work, but on screen or in print an underscore is sometimes obscured by the
bottom border of the field’s cell and can be misinterpreted as a space.
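As an illustration of naming a calculated field, the sketch below repeats the Activity 4 query with an alias. The alias OverallTotalPages is only an example name in upper CamelCase; any name that follows the rules above could be used. The -- line is a comment and must be left out in MS Access, which does not accept comments in a query.

```sql
-- The calculated column is headed OverallTotalPages instead of a generated name
SELECT Date, Surname, FirstName, TotalPages, Copies,
       TotalPages * Copies AS OverallTotalPages
FROM PrintLogs;
```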
ACTIVITY 5
In MS Access, complete the table to determine the effect of the arithmetic functions INT and
ROUND.
INT
SELECT INT(4.0)
SELECT INT(4.1)
SELECT INT(4.9)
SELECT INT(4.5)
ROUND
SELECT ROUND(4.1)
SELECT ROUND(4.9)
SELECT ROUND(4.5)
SELECT ROUND(4.445)
SELECT ROUND(4.445,1)
SELECT ROUND(4.449,2)
If you are using MySQL, complete the table to determine the purpose of FLOOR, ROUND,
CEILING and TRUNCATE.
FLOOR
SELECT FLOOR(4.0)
SELECT FLOOR(4.1)
SELECT FLOOR(4.9)
SELECT FLOOR(4.5)
ROUND
SELECT ROUND(4.1)
SELECT ROUND(4.9)
SELECT ROUND(4.5)
SELECT ROUND(4.445)
SELECT ROUND(4.445,1)
SELECT ROUND(4.449,2)
CEILING
SELECT CEILING(4.0)
SELECT CEILING(4.1)
SELECT CEILING(4.9)
SELECT CEILING(4.5)
TRUNCATE
SELECT TRUNCATE(1,0)
SELECT TRUNCATE(1.699,1)
SELECT TRUNCATE(185,-1)
The functions FLOOR and CEILING were named metaphorically: the ceiling is the highest part of a
room and the floor is the lowest.
http://www.afterhoursprogramming.com/tutorial/SQL/Rounding-Numbers/
2.4.2.1 INT/FLOOR
The INT function reduces any real number or expression to a whole number by rounding down. For
example, INT(14.2) = 14 and INT(14.9) = 14. For a negative number, rounding down gives the next
lower whole number, so INT(-14.2) = -15. FLOOR is the MySQL equivalent of INT.
ACTIVITY 6
Perform the following SQL query to display the TotalPages, TotalColourPages, and the
percentage of the print job that consisted of colour pages. Only the whole number percentage
must be displayed:
SELECT TotalPages, TotalColourPages,
INT(TotalColourPages/TotalPages * 100)
FROM PrintLogs
Check the Result Set answers to confirm that the INT function works correctly.
2.4.2.2 ROUND
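The ROUND function rounds a result either up or down according to the normal convention: if the
decimal portion is greater than 0.5, the rounding is upwards to the next whole number; if it is less
than 0.5, it is rounded down. An exact half is treated differently by the two systems: MySQL rounds
ROUND(4.5) up to 5, whereas MS Access rounds an exact half to the nearest even number, so
ROUND(4.5) gives 4 and ROUND(5.5) gives 6.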
Change the SQL query to display only the TotalPages, TotalColourPages, and the percentage
of the print job that consisted of colour pages, but replace the INT function call with a call to the
ROUND function:
SELECT TotalPages, TotalColourPages,
ROUND(TotalColourPages/TotalPages * 100)
FROM PrintLogs
The ROUND function can also be used with a parameter to indicate how many decimal places must
be included in the answer.
Change the query to display only the TotalPages, TotalColourPages, and the percentage of
the print job that consisted of colour pages, but replace the ROUND function call with a call to
the ROUND(expression, numberOfDecimalPlaces) function to round to 2 decimal places:
SELECT TotalPages, TotalColourPages,
ROUND(TotalColourPages/TotalPages * 100, 2)
FROM PrintLogs
In MS Access, some of the results are rounded to 2 decimal places, some have one and some have
none. This is because the ROUND function only displays the number of significant, non-zero,
decimal places, regardless of what was requested.
In MySQL, all the numbers are displayed to two decimal places, regardless of whether they are zero.
2.4.2.3 CEILING
In MySQL, CEILING rounds a decimal number up to the next integer; a number that is already a whole
number is left unchanged. The size of the decimal portion is irrelevant, as any decimal portion
causes the number to round up.
http://www.afterhoursprogramming.com/tutorial/SQL/Rounding-Numbers/
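The behaviour of CEILING can be checked with a few literal values. The values below are arbitrary examples chosen for this sketch (MySQL only) and are not taken from the activity table.

```sql
-- MySQL only: CEILING always rounds up to the next whole number
SELECT CEILING(7.2);   -- returns 8
SELECT CEILING(7.0);   -- returns 7 (already a whole number)
SELECT CEILING(-7.2);  -- returns -7 (rounding up moves a negative number towards zero)
```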
2.4.2.4 TRUNCATE
The TRUNCATE function in MySQL is not commonly used, since it simply removes digits from a
number without rounding. TRUNCATE takes two parameters: the first is the number and the second is
the position to which to truncate.
For example,
SELECT TRUNCATE(1, 0); -- returns 1
SELECT TRUNCATE(1.699, 1); -- returns 1.6
SELECT TRUNCATE(185, -1); -- returns 180
You can see in the example that the third SQL statement has -1 for the position to truncate, indicating
the first position on the left of the decimal place. The result is 180 because the 5 is truncated, but
replaced with 0 so that the answer is a valid approximation of the original number.
http://www.afterhoursprogramming.com/tutorial/SQL/Rounding-Numbers/
2.4.2.5 RND/RAND
MS Access provides RND and MySQL provides RAND to randomly generate a number between 0 and 1,
not including 1. Replace RND with RAND if you are using MySQL.
ACTIVITY 7
Type in the following SQL statements to generate a random number between 2 and 7 using the
RND/RAND function.
Run each query more than once to check that a different result is produced each time.
Explain the purpose of a seed.
In general, to produce one random integer between A and B (inclusive), use the formula:
MS Access: SELECT INT(RND() * (B-A+1)) + A
MySQL: SELECT FLOOR(RAND() * (B-A+1)) + A
OR
MS Access: SELECT INT(RND() * (NoofOptions)) + StartNum
MySQL: SELECT FLOOR(RAND() * (NoofOptions)) + StartNum
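As a worked instance of the formula, take A = 2 and B = 7. There are B-A+1 = 6 possible values, so the random fraction is multiplied by 6 and the start number 2 is added. The sketch below shows both versions; run only the one that matches your database, and leave out the -- comment lines in MS Access, which does not accept comments in a query.

```sql
-- MS Access: random integer from 2 to 7 (6 options, starting at 2)
SELECT INT(RND() * 6) + 2;

-- MySQL: the same calculation using FLOOR and RAND
SELECT FLOOR(RAND() * 6) + 2;
```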
ACTIVITY 8
Many printers have the ability to print in ‘pamphlet’ style, which means that 4 actual pages fit on 1
sheet of paper, either A4 with 4 A5 pages, or A3 with 4 A4 pages, back-to-back, in the correct order.
This means that if the total number of pages in the job is not a multiple of 4, there will be some wasted
(blank) pages included.
Write a SQL query to display the TotalPages and the number of wasted pages if a pamphlet
style print is used for a document. The WastedPages is calculated by subtracting
TotalPages MOD 4 from 4:
SELECT TotalPages, 4 - TotalPages MOD 4 AS WastedPages
FROM PrintLogs
This query produces correct results for all page totals other than page counts that are exact multiples
of 4.
Can you think of a way to produce the correct answer including multiples of 4 in SQL?
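One possible answer, offered as a sketch rather than the only solution, is to apply MOD 4 a second time. For a page count of 8, TotalPages MOD 4 is 0, 4 - 0 is 4, and 4 MOD 4 is 0 wasted pages; for a page count of 5 the steps give 1, then 3, then 3 wasted pages. The -- line is a comment and must be left out in MS Access.

```sql
-- A second MOD 4 turns the 4 produced for exact multiples of 4 into 0
SELECT TotalPages, (4 - TotalPages MOD 4) MOD 4 AS WastedPages
FROM PrintLogs;
```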
EXERCISE 3
Write the SQL statement only using SELECT to perform the following calculations. Test each SQL
statement to check your answers.
1. Generate a random number between 10 and 20.
2. Generate a random number between 13 and 47.
3. Generate a random number between 0 and 99.
3.1. Is it possible to always have an answer that consists of 2 digits?
4. Determine the remainder after 63 is divided by 8.
5. Divide 25 by 3, rounded to three decimal places.
6. Divide 125 by 26 using integer division.
6.1. Can you provide two different ways of doing this calculation?
7. Calculate 1.23² + 4.798² divided by 7.65 – 2.3 rounded to the nearest whole number.
8. Can you think of an example when \ in MS Access will not produce the same result as DIV in
MySQL?
functions. To get an idea, you can look at the text manipulation functions available in MS Excel.
These functions can generally be used in SQL statements. Since these functions are quite simple to
interpret, we are going to look at them all together:
Examples:
MS Access
My SQL
Note that MID has three parameters: the first is the string from which text is to be extracted, the
second is the position of the starting character (the string is numbered from 1) and the third
is a count of the number of characters to extract.
For example, MID('Oh Happy Day', 4, 5) will produce Happy
Position:   1  2  3  4  5  6  7  8  9 10 11 12
Character:  O  h  ␣  H  a  p  p  y  ␣  D  a  y
(␣ marks a space; the 5 characters extracted are at positions 4 to 8)
ACTIVITY 9
We are going to make a SQL query that will uniquely display:
• The Surname
• The FirstName
• The Surname and initial, with a comma and space separating them
• The total number of characters in the Surname and FirstName
Type in the query in MS Access, using the '&' operator to join the strings:
SELECT DISTINCT Surname,FirstName,
Surname & ', ' & LEFT(FirstName,1) AS SurnInit,
LEN(Surname) + LEN(FirstName) AS TotalNamesLength
FROM PrintLogs
If you are using MySQL, type in the following query using the CONCAT function with the fields
listed as parameters separated by commas:
SELECT DISTINCT Surname,FirstName,
CONCAT(Surname,', ',LEFT(FirstName,1)) AS SurnInit,
LENGTH(Surname) + LENGTH(FirstName) AS TotalNamesLength
FROM PrintLogs
Check the Result Set for:
• uniqueness (distinct),
• correct joining of the Surname and initial (note the inclusion of the literal text ', ' to get a
comma and space after the Surname),
• correct total for the sum of the characters of the Surname and FirstName, using
LEN/LENGTH.
EXERCISE 4
1. Write a SQL statement to create a code for each user using the first three letters of the
surname, the 3rd and 4th letter of the first name and a single random digit. Name the generated
field Code.
2. Code a SQL statement to generate a code for each user using the last 2 letters of the surname
followed by two random digits. Ensure that if a 9 is generated then 09 is added.
3. Create a code for each user using the initial of their surname followed by the initial of their first
name and the number of characters in both fields combined.
DATE(datetimevalue) – MySQL only
YEAR(datetimevalue)
MONTH(datetimevalue)
DAY(datetimevalue)
TIME(datetimevalue) – MySQL only
HOUR(datetimevalue)
MINUTE(datetimevalue)
SECOND(datetimevalue)
NOW()
These functions can be used together with the NOW() function, which accesses the date/time value of
the computer system.
ACTIVITY 10
As an introduction, create a query that will display each of the date parts of the DOB field (year,
month, day), as separate fields, to test the functions.
SELECT DOB,
YEAR(DOB) AS BirthYear,
MONTH(DOB) AS BirthMonth,
DAY(DOB) AS BirthDay
FROM PrintLogs
The field values are now integers and are not part of a date/time value.
If you are using MySQL, code a new SQL statement to isolate the DATE and TIME values from
the Date field.
SELECT Date, DATE(Date) AS PrintDate, TIME(Date) AS PrintTime
FROM PrintLogs
The query is fairly accurate, but it actually shows what the age will be this year, irrespective of
whether the birth month and day have been reached this year or not. A slightly more accurate age
can be determined by dividing the difference between the current date (NOW()) and the birth date
(DOB) by 365.25 (to accommodate leap years). Any date/time value is actually the total number of
days, hours, minutes and seconds (based on a start date of 1900/01/01, usually). Dividing the
difference by 365.25 converts it to years and provides more accuracy than the previous query.
In MySQL, the TIMESTAMPDIFF function can be used to accurately determine the age in years.
If you are using MS Access, change the previous query to provide a more accurate age, by
adding the AccurateAge field:
MS Access:
SELECT DOB,
YEAR(DOB) AS BirthYear,
MONTH(DOB) AS BirthMonth,
DAY(DOB) AS BirthDay,
YEAR(NOW()) - YEAR(DOB) AS Age,
(NOW() - DOB)/365.25 AS AccurateAge
FROM PrintLogs
MySQL:
SELECT DOB,
YEAR(DOB) AS BirthYear,
MONTH(DOB) AS BirthMonth,
DAY(DOB) AS BirthDay,
YEAR(NOW()) - YEAR(DOB) AS Age,
TIMESTAMPDIFF(YEAR,DOB,NOW()) AS AccurateAge
FROM PrintLogs
In MS Access, some of the ages are one less in whole years because the birth month and day have
not yet been reached this year:
In MS Access use the INT function to remove the decimal places from the AccurateAge result
to produce an integer.
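A minimal sketch of this change is shown below, assuming the same DOB field and the 365.25-day approximation described earlier; only the Age and AccurateAge columns are kept to shorten the example. The -- line is a comment and must be left out when typing the query into MS Access.

```sql
-- MS Access: INT drops the decimal part of the age in years
SELECT DOB,
       YEAR(NOW()) - YEAR(DOB) AS Age,
       INT((NOW() - DOB) / 365.25) AS AccurateAge
FROM PrintLogs;
```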
In MySQL, AccurateAge is an integer:
2.4.6.1 COUNT
ACTIVITY 11
How many rows and columns do you expect to be produced by the query using the COUNT
function?
SELECT COUNT(*) AS TotalPrintJobs
FROM PrintLogs
Type in your query to check your answer.
This query produces only ONE answer, which is the total number of
records in the table (2642).
Change the query to remove the star and replace it with a specific field.
SELECT COUNT(FirstName) AS TotalPrintJobs
FROM PrintLogs
The same result will be displayed, as the number of rows will be the same for any field if the fields
have a value. The COUNT function can at times return different values if fields have null values.
Type in the following SQL statement to insert a record into the table with null values. The SQL
statement will only provide a value for the FirstName field; the rest of the fields will have no
values.
INSERT INTO PrintLogs (FirstName)
VALUES ('TestName')
View the table and scroll to the last record in the table.
Run the query to count the number of values in the Surname field.
SELECT COUNT(Surname) AS TotalPrintJobs
FROM PrintLogs
The result will be 2642, because the Surname field of the new record does not have a value and is
therefore not counted.
In conclusion, COUNT(<FieldName>) will count the number of non-null entries for the chosen
field.
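The difference between the forms of COUNT can be seen side by side in one query. The sketch below assumes the test record with only a FirstName has been inserted; the column aliases are illustrative names. AllRows and RowsWithFirstName should then be one more than RowsWithSurname. Leave out the -- comment line in MS Access.

```sql
-- COUNT(*) counts every row; COUNT(<field>) skips rows where that field is null
SELECT COUNT(*) AS AllRows,
       COUNT(FirstName) AS RowsWithFirstName,
       COUNT(Surname) AS RowsWithSurname
FROM PrintLogs;
```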
2.4.6.2 AVG
ACTIVITY 12
The AVG function will determine the average of the field sent as a parameter to the function.
Type in the SQL query to determine the average number of total pages in each print job.
SELECT AVG(TotalPages) AS AveragePages
FROM PrintLogs
Change the query to round the result to one decimal place to produce
11.6.
SELECT ROUND(AVG(TotalPages),1) AS AveragePages
FROM PrintLogs
2.4.6.3 MIN
The smallest value of a field can be determined using the MIN function.
Code the SQL statement to determine the smallest print job size.
SELECT MIN(SizeKB) AS MinFileSize
FROM PrintLogs
If we wanted to know who sent the smallest print job, we could try by
including the name of the field.
Adapt the query to include the FirstName and Surname fields.
SELECT FirstName, Surname, MIN(SizeKB) AS MinFileSize
FROM PrintLogs
The query will produce an error in MS Access explaining that the FirstName field is not part of an
aggregate function, which basically means that the field and the MIN result are unrelated.
In MySQL, the query may execute, depending on the server settings (with the ONLY_FULL_GROUP_BY
mode enabled it is also rejected with an error). When it does execute, the FirstName and Surname
returned are those of the first record of the table, which is nonsensical.
To find WHO printed the smallest print job, we can use ORDER BY together with TOP (MS Access) or
LIMIT (MySQL) to return the record which combines the related fields.
Once again, MS Access produces a solution of 15 results instead of 1 as more than one user has
printed a job of 1 KB.
In MySQL, only one result is returned, even though more than one record has a size of 1 KB.
2.4.6.4 MAX
Type in the following query to determine the largest print job.
SELECT MAX(SizeKB) AS MaxFileSize
FROM PrintLogs
Change the query to determine which user sent the largest print job.
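One way to approach this, following the ORDER BY approach used for the smallest print job, is sketched below. Run only the version that matches your database, and leave out the -- comment lines in MS Access, which does not accept comments in a query.

```sql
-- MS Access: TOP 1 may return more than one row if several jobs tie for the largest size
SELECT TOP 1 FirstName, Surname, SizeKB
FROM PrintLogs
ORDER BY SizeKB DESC;

-- MySQL: LIMIT 1 returns exactly one row, even if there is a tie
SELECT FirstName, Surname, SizeKB
FROM PrintLogs
ORDER BY SizeKB DESC
LIMIT 1;
```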
Now change the query to find both the largest and smallest print job.
SELECT MAX(SizeKB) AS MaxFileSize, MIN(SizeKB) AS MinFileSize
FROM PrintLogs
Note that the largest and smallest print job sizes are returned in a single Result Set, but these values
do not relate to one record.
2.4.6.5 SUM
To add up all the values in a field, the SUM function can be used. Do not confuse SUM and COUNT.
SUM finds the total of all the values in a field, whereas COUNT determines the number of rows in the
table.
Code the query to add up the values in the Cost field.
SELECT SUM(Cost) AS TotalCost
FROM PrintLogs
EXERCISE 5
The <condition> can consist of any Boolean condition, using any number of fields. In its simplest
form, the condition consists of a field name, followed by a relational operator (=, <, >, >=, <=, <>)
and then a comparison value:
WHERE <field> (=, <, >, >=, <=, <>) <value>
ACTIVITY 13
Perform the following queries and check that the required outcome is obtained:
Find the records for users who printed documents consisting of more than 5 pages.
SELECT Surname, FirstName, TotalPages
FROM PrintLogs
WHERE TotalPages > 5
The result set produces 450 records (from 2642), meaning that the remainder must have printed 5 or
fewer pages.
You can also compare fields with fields (or calculations involving fields):
Find the records that contained more colour pages than black and white.
SELECT Surname, FirstName, TotalPages, TotalColourPages
FROM PrintLogs
WHERE TotalColourPages > TotalPages - TotalColourPages
You can even involve functions (but not aggregates) in a WHERE clause.
Find the users that are 30 years old or younger; show each user only once.
SELECT DISTINCT Surname, FirstName, YEAR(NOW())-YEAR(DOB) AS AGE
FROM PrintLogs
WHERE YEAR(NOW())-YEAR(DOB) <= 30
2.5.3.1 BETWEEN…AND
ACTIVITY 14
Type in the query to find the records for print jobs that had a total page count of between 5 and
10 pages.
SELECT Surname, FirstName, TotalPages
FROM PrintLogs
WHERE TotalPages >= 5 AND TotalPages <= 10
Note: The field TotalPages must be repeated for each condition. We cannot leave out the field in the
second condition, even though it makes sense in English. The following clause is therefore not valid:
WHERE TotalPages >= 5 AND <= 10
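The same range can be written with the BETWEEN…AND operator. The sketch below assumes the same three fields as the compound-condition query above. Leave out the -- comment line in MS Access.

```sql
-- Equivalent to TotalPages >= 5 AND TotalPages <= 10
SELECT Surname, FirstName, TotalPages
FROM PrintLogs
WHERE TotalPages BETWEEN 5 AND 10;
```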
Note that the values 5 and 10 are included and using BETWEEN…AND is somewhat simpler than
using a compound condition. It can also be used for real and string data.
Type in the query using real numbers and check that it works.
SELECT Surname, FirstName, TotalPages
FROM PrintLogs
WHERE TotalPages BETWEEN 5.1 AND 10.2
Type in the query using string values and check that it works.
SELECT Surname, FirstName, TotalPages
FROM PrintLogs
WHERE FirstName BETWEEN 'Anne' AND 'Frank'
ACTIVITY 15
If we wanted to find all the distinct users who were born in the months that have 31 days, we
could perform the query:
SELECT DISTINCT Surname, FirstName, DOB
FROM PrintLogs
WHERE MONTH(DOB) = 1 OR MONTH(DOB) = 3 OR MONTH(DOB) = 5 OR MONTH(DOB)
= 7 OR MONTH(DOB) = 8 OR MONTH(DOB) = 10 OR MONTH(DOB) = 12
We can simplify the query using the IN operator, which compares MONTH(DOB) to a set of
values.
Perform the query using the IN operator to observe the same outcome:
SELECT DISTINCT Surname, FirstName, DOB
FROM PrintLogs
WHERE MONTH(DOB) IN (1, 3, 5, 7, 8, 10, 12)
The IN operator also works with strings as values for the set elements. For example, we may want to
list all of the print jobs for users who have the surnames 'Adams', 'Bailey' or 'Carr'.
Type in the following query.
SELECT *
FROM PrintLogs
WHERE Surname IN ('Adams', 'Bailey', 'Carr')
Check that you get the results for the 3 surnames only.
Replacing BETWEEN…AND with IN
As long as the set of values used for IN is not too large to input, the IN statement can replace the
BETWEEN...AND statement.
Type in the following queries to check that they produce the same result.
SELECT *
FROM PrintLogs
WHERE MONTH(DOB) BETWEEN 3 AND 10
SELECT *
FROM PrintLogs
WHERE MONTH(DOB) IN (3,4,5,6,7,8,9,10)
2.5.3.3 LIKE
The LIKE operator is used to compare strings and sub-strings; it usually includes a wildcard
character that can extract partial matches, depending on where the wildcard is placed. The wildcard
is '*' in MS Access and '%' in MySQL; the examples that follow use the MS Access form. The
syntax of the LIKE operator is as follows:
WHERE <stringfield> LIKE '*sub-string*'
For example,
WHERE <stringfield> LIKE 'abc*'
would extract all records where the <stringfield> value starts with 'abc', regardless of the rest of the
<stringfield> value. This would be the same as:
WHERE LEFT(<stringfield>,3) = 'abc'
Similarly,
WHERE <stringfield> LIKE '*abc'
would extract all records where the <stringfield> value ends with 'abc'.
ACTIVITY 16
Perform the following queries to test the different versions of the LIKE operator.
Explain the functions of each query.
SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE Surname LIKE 'Car*'
SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE Surname LIKE '*son'
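In MySQL the wildcard for any number of characters is % rather than *, so the two LIKE conditions above would be written as in the sketch below.

```sql
-- MySQL: surnames starting with 'Car'
SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE Surname LIKE 'Car%';

-- MySQL: surnames ending with 'son'
SELECT DISTINCT Surname, FirstName
FROM PrintLogs
WHERE Surname LIKE '%son';
```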
2.5.3.4 IS NULL
We can check if a field does not have a value or is null. Note that we do not use = but the word IS.
For example, WHERE <field> IS NULL
or, if we only want fields that have values: WHERE <field> IS NOT NULL
Type in the following query to list the printers, without duplicates, that have no serial number.
SELECT DISTINCT PrinterName
FROM PrintLogs
WHERE PrinterSerialNo IS NULL
EXERCISE 6
To calculate the average, we need a completely separate SQL statement. The SQL statement to
calculate the average will be embedded in the WHERE clause as the right-hand operand of the
condition:
WHERE TotalPages > (SELECT AVG(TotalPages) FROM PrintLogs)
ACTIVITY 17
Type in the query to find the print jobs whose total pages are above average.
SELECT *
FROM PrintLogs
WHERE TotalPages > (SELECT AVG(TotalPages) FROM PrintLogs)
Type in the query to determine the print jobs whose cost is less than average.
SELECT *
FROM PrintLogs
WHERE Cost < (SELECT AVG(Cost) FROM PrintLogs)
Returning to the problem of finding the smallest print jobs, we used ORDER BY and TOP/LIMIT.
MS Access:
SELECT TOP 1 FirstName, Surname, SizeKB
FROM PrintLogs
ORDER BY SizeKB
MySQL:
SELECT FirstName, Surname, SizeKB
FROM PrintLogs
ORDER BY SizeKB
LIMIT 1
HINT: To check that the embedded query SELECT MIN(SizeKB) FROM PrintLogs has no errors,
you can run the query on its own before adding it to the larger query.
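An embedded version of the smallest-print-job query might look like the sketch below, which follows the same pattern as the above-average queries in Activity 17. Because the outer query compares every row with the minimum, it should list all the jobs that share the smallest size in both MS Access and MySQL. Leave out the -- comment line in MS Access.

```sql
-- The inner query finds the smallest size; the outer query lists every job of that size
SELECT FirstName, Surname, SizeKB
FROM PrintLogs
WHERE SizeKB = (SELECT MIN(SizeKB) FROM PrintLogs);
```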
EXERCISE 7
Write SQL statements to solve the following problems. Not all of these queries are embedded.
1. Find all the users whose surname is longer than average.
2. Find all print jobs that cost above R10.00
3. Find all print jobs that cost more than average. Remember to take the number of pages into
account.
4. List the largest print job in KB printed in black and white to the HP LaserJet P2055dn printer.
Display the user name and the size of the print job. There may be more than one user in this
result.
5. Count the number of students who print to any Xerox machine and have printed more than the
average number of colour pages.
2.7 GROUP BY
We could use a simple aggregate function call to determine the average age of the users. For
simplicity's sake, we will use the inaccurate formula to determine the age in years.
SELECT AVG(YEAR(NOW())-YEAR(DOB)) AS AveAge
FROM PrintLogs
ACTIVITY 18
Type in the query to determine the average age for surnames beginning with ‘A’:
SELECT AVG(YEAR(NOW())-YEAR(DOB)) AS AveAge
FROM PrintLogs
WHERE LEFT(Surname,1) = 'A'
Recall that the WHERE clause eliminates rows from the existing table, so this query only returns a
result for the first letter of the alphabet and not for ALL letters.
Notice that the average is different. This is contrary to expectation, so it would be interesting to see
the results for all the other letters of the alphabet. We can certainly perform the same query over and
over for the other 25 letters of the alphabet, but that would involve repetition which cannot be
performed in a single SQL query.
The GROUP BY clause provides the result of an aggregate function for each category of rows
satisfying a condition. The first letter of the surname can be set as the GROUP BY criterion so that
the aggregate no longer returns a single result for the entire table, as in the first example, or for a
single letter, as in the second example, using a WHERE clause. GROUP BY returns the aggregate
for each GROUP indicated by the GROUP BY criterion. The word each (for every or per) in a question
often implies that the query will involve GROUP BY.
Perform the following query to test the outcome using GROUP BY:
SELECT AVG(YEAR(NOW())-YEAR(DOB)) AS AveAge
FROM PrintLogs
GROUP BY LEFT(Surname, 1)
Perform the query again to include the letter as the first field:
SELECT LEFT(Surname, 1) AS Letter, AVG(YEAR(NOW())-YEAR(DOB)) AS AveAge
FROM PrintLogs
GROUP BY LEFT(Surname, 1)
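The difference between the WHERE version and the GROUP BY version can be tried out on a tiny table. This is only a sketch: it uses Python's built-in sqlite3 module instead of MS Access or MySQL, so SUBSTR(Surname, 1, 1) stands in for LEFT(Surname, 1), the year 2017 is hard coded in place of YEAR(NOW()), and the five rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PrintLogs (Surname TEXT, DOB TEXT)")
conn.executemany(
    "INSERT INTO PrintLogs VALUES (?, ?)",
    [
        ("Alexander", "2000-03-01"),
        ("Adams", "1980-07-15"),
        ("Bates", "1999-01-20"),
        ("Brown", "2001-11-02"),
        ("Castro", "1975-05-30"),
    ],
)

# WHERE: a single average, for the letter 'A' only.
where_rows = conn.execute(
    "SELECT AVG(2017 - CAST(SUBSTR(DOB, 1, 4) AS INTEGER)) AS AveAge "
    "FROM PrintLogs "
    "WHERE SUBSTR(Surname, 1, 1) = 'A'"
).fetchall()

# GROUP BY: one average for each first letter.
group_rows = conn.execute(
    "SELECT SUBSTR(Surname, 1, 1) AS Letter, "
    "AVG(2017 - CAST(SUBSTR(DOB, 1, 4) AS INTEGER)) AS AveAge "
    "FROM PrintLogs "
    "GROUP BY SUBSTR(Surname, 1, 1) "
    "ORDER BY Letter"
).fetchall()

print(where_rows)  # [(27.0,)]
print(group_rows)  # [('A', 27.0), ('B', 17.0), ('C', 42.0)]
```

The WHERE query produces one row; the GROUP BY query produces one row per letter, which is the summary we were after.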
Note that in MySQL, the ORDER BY clause can use functions or an alias.
We can find the three letters that occur the most.
Type in the query in MS Access to reverse the order and limit to the top three.
SELECT TOP 3 LEFT(Surname, 1) AS Letter, COUNT(*) AS NumberOf
FROM PrintLogs
GROUP BY LEFT(Surname, 1)
ORDER BY COUNT(*) DESC
Or type the query into MySQL:
SELECT LEFT(Surname, 1) AS Letter, COUNT(*) AS NumberOf
FROM PrintLogs
GROUP BY LEFT(Surname, 1)
ORDER BY NumberOf DESC
LIMIT 3
ACTIVITY 19
Type in the query to count the number of print jobs sent to a printer by each user. Make sure
the two fields that are in the GROUP BY clause are listed in the Result Set.
SELECT PrinterName, Surname, COUNT(*) as Total
FROM PrintLogs
GROUP BY PrinterName, Surname
The first rows of the Result Set show that the printer name is repeated for each user. The printers are
grouped in alphabetical order, and the users that have printed to each printer are listed in alphabetical
order together with the total. This COUNT function determines that there are four rows in the table
with a Surname of “Alexander” and a PrinterName of “carmen hp printer”.
Check that there are 4 entries in the table where Surname is “Alexander” and PrinterName is
“carmen hp printer”:
SELECT COUNT(*) as Total
FROM PrintLogs
WHERE surname = 'Alexander' AND PrinterName= 'carmen hp printer'
We can swop the fields in the GROUP BY to count the number of print jobs a user has sent to
different printers.
Change the query by swopping the fields around in the GROUP BY clause.
The Result Set is GROUPed BY the Surname field in alphabetical order with each Surname repeated
for a different printer. The PrinterName is sorted alphabetically for each user together with the total
print jobs sent to the printer by a particular user.
The table produced is a SUMMARY of the actual table where the aggregate (max, min, count,
sum or avg) is calculated for each GROUP BY item.
GROUP BY sorts by the fields it groups according to the natural sort order of the data being
grouped by, but its primary function is not a sort.
Another field can be used to create a different sort using ORDER BY and ORDER BY must be
the last clause used in a query.
If you want to restrict rows based on an aggregate in a condition, the HAVING clause is used
instead of WHERE and is coded immediately after the GROUP BY clause.
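The clause order described above (GROUP BY, then HAVING, then ORDER BY last) can be sketched as follows. The example assumes Python's built-in sqlite3 module as a stand-in for MS Access or MySQL, and the six rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PrintLogs (PrinterName TEXT, Surname TEXT)")
conn.executemany(
    "INSERT INTO PrintLogs VALUES (?, ?)",
    [
        ("carmen hp printer", "Alexander"),
        ("carmen hp printer", "Alexander"),
        ("carmen hp printer", "Alexander"),
        ("carmen hp printer", "Bates"),
        ("hsxeroxd95", "Castro"),
        ("hsxeroxd95", "Castro"),
    ],
)

# Count the print jobs for each user, keep only the users with more
# than one job (HAVING), and sort by the total (ORDER BY comes last).
busy_users = conn.execute(
    "SELECT Surname, COUNT(*) AS Total "
    "FROM PrintLogs "
    "GROUP BY Surname "
    "HAVING COUNT(*) > 1 "
    "ORDER BY Total DESC"
).fetchall()

print(busy_users)  # [('Alexander', 3), ('Castro', 2)]
```

The condition COUNT(*) > 1 cannot go in a WHERE clause because the count only exists after the rows have been grouped.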
EXERCISE 8
2.8.1 INSERT
There are 3 versions of the INSERT query. We can insert all the fields for a record, only some of the
fields, or we can use another record to provide the values for some of the fields.
ACTIVITY 20
Type in the following query to INSERT the record with the data above:
INSERT INTO PrintLogs
VALUES
(#2017/05/23 9:33:00#, 'Admin', 'Admin', #2000/01/01#,
'Admin@MySchool.co.za', 'hsxeroxd95', 'HP LaserJet P2055dn',
'CNCKF11815', 5, 0, 1, 0.56, 1513, True)
INSERT INTO <tableName> (<field1>, <field2>, …, <fieldN>)
VALUES
(<fieldValue1>, <fieldValue2>, …, <fieldValueN>)
The data values in the 2nd set of brackets (after the VALUES keyword) must correspond exactly,
in data type and order, to the field names provided in the 1st set of brackets.
Not all of the fields of the table need to be specified; any fields that are not specified will
have NULL data for the new record.
The fields listed do not need to be in the same order as they appear in the table.
The rules about quotes (for string data) and ‘#’ symbols (for Date/Time values) still apply
here.
This version of the INSERT query is essential for a table that has an Autonumber primary key field
as an autonumber field cannot be assigned in a query; it is automatically assigned by the database
application. We can’t demonstrate the effect of performing this kind of INSERT as the table
PrintLogs doesn’t have an Autonumber primary key; however, this version works even if there isn’t a
primary key. If we wanted to INSERT a record with a limited number of fields (not all the fields in the
table), and in a different order of fields to those actually in the table, this version works perfectly.
Perform the following query that will INSERT a partial record into the PrintLogs table:
INSERT INTO PrintLogs (Copies, Printed, FirstName, Surname)
VALUES (1, True, 'Joe', 'Soap')
This is a pretty meaningless record, but it does exist; you can use the data find tool in your database
application to locate the surname ‘Soap’ or the first name ‘Joe’. The position of the record in the table
will be determined by the database application (in this case, it turned out to be record number 2299).
In MySQL, the record is entered in the last position of the table.
As long as you match the data values to the selected fields in order and type, the query will work.
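The effect of a partial INSERT can be checked on a cut-down version of the table. This is a sketch only: it assumes Python's built-in sqlite3 module rather than MS Access or MySQL, the table has just five of the real fields, and 1 is used in place of True for the Printed field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PrintLogs ("
    "FirstName TEXT, Surname TEXT, Email TEXT, Copies INTEGER, Printed INTEGER)"
)

# The field list is in a different order to the table, and Email is
# left out, so Email will be NULL for the new record.
conn.execute(
    "INSERT INTO PrintLogs (Copies, Printed, FirstName, Surname) "
    "VALUES (1, 1, 'Joe', 'Soap')"
)

row = conn.execute(
    "SELECT FirstName, Surname, Email, Copies, Printed FROM PrintLogs"
).fetchone()

print(row)  # ('Joe', 'Soap', None, 1, 1)
```

Python displays the NULL in the Email field as None.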
This syntax may be a little confusing to understand, so consider a worked example from the table
PrintLogs:
The user ‘Schultz’, ‘Faith’ completed 10 print jobs, as shown below:
Suppose we want to add records for ‘Joe Soap’ that are an exact copy of the records for Faith Schultz,
i.e. Joe Soap printed exactly the same jobs as Faith Schultz. All the details of Faith Schultz’s print
jobs must be allocated to Joe Soap, except for the personal details, which must be Joe Soap’s own. We
embed a SELECT statement that lists the field names we want to carry over from Faith Schultz; for the
personal details of Joe Soap, we ‘hard code’ (provide actual data) in the place of the field. The WHERE
clause in the embedded SELECT statement needs to provide a condition that will isolate the rows for
Faith Schultz.
In MS Access, the field called Date is problematic as it is considered a reserved word.
Change the name of the field Date to be PDate in Design View of the table Printlogs.
Code the following query to INSERT records with an embedded SELECT:
INSERT INTO PrintLogs (PDate, Surname, FirstName, DOB, EMail,
PrinterName, PrinterModel, PrinterSerialNo, TotalPages,
TotalColourPages, Copies, Cost, SizeKB, Printed)
Hint: Change the design view named field Date to PDate; MS-Access doesn’t like a field named
Date in an INSERT query – it sees it as a reserved word.
Change the query to insert records for Lee Destiny, born on 10 April 1984, who printed to the
hsadminxerox7835 printer on the same days as Maya Hill, with identical print jobs.
INSERT INTO PrintLogs (PDate, Surname, FirstName, DOB, EMail,
PrinterName, PrinterModel, PrinterSerialNo, TotalPages,
TotalColourPages, Copies, Cost, SizeKB, Printed)
SELECT PDate, 'Destiny', 'Lee', #1984/04/10#,
'DestinyL@MySchool.co.za', PrinterName, PrinterModel, PrinterSerialNo,
TotalPages, TotalColourPages, Copies, Cost, SizeKB, Printed
FROM PrintLogs
WHERE Surname = 'Hill' AND FirstName = 'Maya' AND PrinterName =
'hsadminxerox7835'
Check that the records have been inserted using the query:
SELECT *
FROM PrintLogs
WHERE Surname = 'Destiny' AND FirstName = 'Lee' AND PrinterName =
'hsadminxerox7835'
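An INSERT with an embedded SELECT can be tried out on a small table first. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MS Access or MySQL; the table has only five of the real fields, the field is named PDate as suggested in the hint, and the dates and page counts are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PrintLogs ("
    "PDate TEXT, Surname TEXT, FirstName TEXT, PrinterName TEXT, TotalPages INTEGER)"
)
conn.executemany(
    "INSERT INTO PrintLogs VALUES (?, ?, ?, ?, ?)",
    [
        ("2017-05-01", "Hill", "Maya", "hsadminxerox7835", 3),
        ("2017-05-02", "Hill", "Maya", "hsadminxerox7835", 7),
        ("2017-05-03", "Hill", "Maya", "carmen hp printer", 1),
    ],
)

# Copy the job details from Maya Hill's rows; the personal details
# ('Destiny', 'Lee') are hard coded in place of the field names.
conn.execute(
    "INSERT INTO PrintLogs (PDate, Surname, FirstName, PrinterName, TotalPages) "
    "SELECT PDate, 'Destiny', 'Lee', PrinterName, TotalPages "
    "FROM PrintLogs "
    "WHERE Surname = 'Hill' AND FirstName = 'Maya' "
    "AND PrinterName = 'hsadminxerox7835'"
)

copied = conn.execute(
    "SELECT PDate, TotalPages FROM PrintLogs "
    "WHERE Surname = 'Destiny' AND FirstName = 'Lee' "
    "ORDER BY PDate"
).fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM PrintLogs").fetchone()[0]

print(copied)      # [('2017-05-01', 3), ('2017-05-02', 7)]
print(total_rows)  # 5
```

Only the two jobs sent to hsadminxerox7835 are copied; the job sent to the other printer is excluded by the WHERE clause.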
Square brackets ([ ]) indicate optional code; i.e. you can update one field or many fields. Separate
successive fieldX = valueX assignments with commas and only use the keyword SET once, after the
table name.
ACTIVITY 21
Code the query to change all Cost values to include VAT:
UPDATE PrintLogs
SET Cost = Cost * 1.14
Code the query to set the Cost to 0 for all print jobs that were not printed:
UPDATE PrintLogs
SET Cost = 0
WHERE NOT(Printed)
Logically, the conditions NOT(Printed) and Printed = False mean the same.
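The VAT update and the update that sets the cost of unprinted jobs to 0 can be sketched on a small table. The example assumes Python's built-in sqlite3 module as a stand-in for MS Access or MySQL; Printed is stored as 1 (True) or 0 (False), and the three rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PrintLogs (Surname TEXT, Cost REAL, Printed INTEGER)")
conn.executemany(
    "INSERT INTO PrintLogs VALUES (?, ?, ?)",
    [("Alexander", 1.00, 1), ("Bates", 2.00, 0), ("Castro", 0.50, 1)],
)

# No WHERE clause: every row is updated.
vat_rows = conn.execute("UPDATE PrintLogs SET Cost = Cost * 1.14").rowcount

# NOT Printed and Printed = 0 (False) select the same rows.
not_printed = conn.execute(
    "SELECT COUNT(*) FROM PrintLogs WHERE NOT Printed"
).fetchone()[0]
equals_false = conn.execute(
    "SELECT COUNT(*) FROM PrintLogs WHERE Printed = 0"
).fetchone()[0]

# WHERE clause: only the unprinted jobs are updated.
zero_rows = conn.execute(
    "UPDATE PrintLogs SET Cost = 0 WHERE NOT Printed"
).rowcount

costs = [
    (surname, round(cost, 2))
    for surname, cost in conn.execute(
        "SELECT Surname, Cost FROM PrintLogs ORDER BY Surname"
    )
]
print(vat_rows, zero_rows)  # 3 1
print(costs)  # [('Alexander', 1.14), ('Bates', 0.0), ('Castro', 0.57)]
```

An UPDATE without a WHERE clause changes every record, so it is worth checking the condition with a SELECT before running the UPDATE on the real table.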
If you wanted to change a record in a table that has a primary key to uniquely identify each record,
you would simply include a WHERE clause that restricts the row to match the primary key field value:
WHERE PrimaryKeyField = Value
Since there is no primary key field in the PrintLogs table, we would have to include a WHERE
clause that finds the correct record/s using a compound condition. The combination of the Surname
and FirstName may not be unique; however, in this example the email address will be unique.
The print jobs sent by ‘Henry Bates’ are displayed below with the Surname, FirstName and Email
fields.
It has been discovered that ‘Henry’ is actually ‘Henrietta’ and that she is not a student,
contrary to what the Email address field indicates.
Code the following query to SET the FirstName to ‘Henrietta’ and change the Email address to
BatesH@MySchool.co.za:
UPDATE PrintLogs
SET FirstName = 'Henrietta', EMail = 'BatesH@MySchool.co.za'
WHERE Surname = 'Bates' AND FirstName = 'Henry'
Now run the query to display Henrietta Bates:
SELECT Surname, FirstName, Email FROM PrintLogs
WHERE Surname = 'Bates' AND FirstName = 'Henrietta'
You will see that the relevant fields have been updated:
Rewrite the UPDATE query to change ‘Henry’ to ‘Henrietta’ and her email address, this time using
the email address in the WHERE clause to identify the correct records. Obviously, you cannot run
this query on the database as the records have already been altered.
In a similar fashion to the UPDATE query, a well-structured table should offer a primary key that can
be used to DELETE a single record using:
WHERE PrimaryKeyField = Value
The syntax of the DELETE query to delete ALL records in a table is:
DELETE *
FROM <tableName>
Note that this will not remove the table itself, only all the data in the table. The ‘*’ is not necessary in
the statement. A better version of the DELETE statement is:
DELETE
FROM <tableName>
WHERE fieldname = value
This more controlled DELETE statement will restrict the rows deleted using the WHERE clause.
For example, suppose all the print jobs sent to the printer with PrinterSerialNo ‘CNB7H8R1X5’ were
queued and waiting to be printed. Unfortunately, a power surge irreparably damaged the printer and
all print jobs were lost, making it necessary to delete the print jobs sent to this printer.
The total number of print jobs sent to each printer is
shown alongside; the number of print jobs sent to
PrinterSerialNo ‘CNB7H8R1X5’ was 19. The query produces
23 printer serial numbers for the printers in the table.
ACTIVITY 22
Code a query to produce the table shown alongside:
Perform the query to DELETE the print jobs sent to the printer with the serial
number 'CNB7H8R1X5':
DELETE FROM PrintLogs
WHERE PrinterSerialNo = 'CNB7H8R1X5'
Run the previous query to check that the printer no
longer exists.
Write down another query to check if the printer has
been successfully deleted.
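The pattern of counting, deleting and counting again can be sketched on a small table. The example assumes Python's built-in sqlite3 module as a stand-in for MS Access or MySQL, and the three rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PrintLogs (Surname TEXT, PrinterSerialNo TEXT)")
conn.executemany(
    "INSERT INTO PrintLogs VALUES (?, ?)",
    [
        ("Alexander", "CNB7H8R1X5"),
        ("Bates", "CNB7H8R1X5"),
        ("Hill", "CNCKF11815"),
    ],
)

count_sql = "SELECT COUNT(*) FROM PrintLogs WHERE PrinterSerialNo = ?"

# Count the matching rows before deleting them.
before = conn.execute(count_sql, ("CNB7H8R1X5",)).fetchone()[0]

# The WHERE clause restricts the DELETE to one printer's print jobs.
deleted = conn.execute(
    "DELETE FROM PrintLogs WHERE PrinterSerialNo = 'CNB7H8R1X5'"
).rowcount

# Count again: no matching rows should be left, and the rows for the
# other printer should be untouched.
after = conn.execute(count_sql, ("CNB7H8R1X5",)).fetchone()[0]
remaining = conn.execute("SELECT COUNT(*) FROM PrintLogs").fetchone()[0]

print(before, deleted, after, remaining)  # 2 2 0 1
```

Running the same condition in a SELECT COUNT(*) first shows how many records the DELETE will remove before any data is lost.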
EXERCISE 9
1. Insert a new printer with the printer name ‘HPFastPrint’, model ‘HPFastPrint2500’ and serial
number ‘HP25001234’. No one has printed to this printer as yet, so the rest of the fields will be
blank.
2. A printer ‘prepxerox7835’ was replaced by a temporary printer called ‘XeroxReplacement’. A
serial number and model have not been assigned to this printer. Insert this new printer using all
the print jobs that were sent to the ‘prepxerox7835’ and then delete all the ‘prepxerox7835’
records.
3. Change the size of each print job to be rounded to the nearest 100 in KB.
4. The print cost has doubled for all print jobs sent to the Xerox machines. Update the Cost field
accordingly.