
What does Star Wars have to do with RPG?

Don't worry, the nostalgia trip is almost over. As an adult, I realize that I'll never soar through the galaxy
in my X-Wing or own a real light saber; those technologies are well beyond my grasp. And while my
hopes of someday rescuing the Princess and destroying the Death Star erode a bit more each day, I can
take a cue from a real-life Star Wars icon, George Lucas: This is my prequel.

In a recent article entitled Embedding SQL in /free, I outlined some of the methods that I use when
embedding SQL in free-format RPG. That article presumes that the reader already understands
embedded SQL and largely addresses "the ugly factor." In recent months, though, it's become apparent
to me that many of our RPG brethren are still looking for the back story. The e-mail lists are frequently
abuzz with basic how-to questions about embedded SQL, so in this article I'm going to lay out a Primer
for embedding SQL in RPG. While this article addresses RPG specifically, SQL may also be embedded in
COBOL, C, and even REXX programs. Also, since my prior article focused on SQL in free-format, I'm going
to deviate from my normal mode of operation and use fixed-format for the examples in this article.

SQL: Why and When

I want to get into the meat of the topic, so I'm not going to spend a lot of time here. Plenty of other
good thinkers have weighed in on this topic before, so I'm just going to hit what I think are the
highlights.

1. Working with Groups of Data. SQL is terrific at selecting and manipulating groups of data. If you want
to update/delete all the records in a file matching a certain criteria, use SQL. SQL can change or delete a
group of records in a single statement, whereas native I/O would require you to loop through a file and
issue individual update or delete statements.
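As a sketch (the file and field names here are hypothetical), a single statement handles the entire group:

update custmast set credlim = 0 where status = 'D'

delete from custmast where status = 'D'

The native I/O equivalent would be a SETLL/READE loop issuing an individual UPDATE or DELETE operation for each matching record.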

2. Massaging Columns. SQL has a slew of columnar functions designed to tally, total, summarize,
manipulate, and calculate columns of data. Many program features such as substringing, averaging, and
math functions can be performed during the record selection phase. Columns can even be returned that
don't exist in the file, such as counts, calculations, literals, and dates.
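For instance, a sketch (again with hypothetical file and field names) that massages columns during selection:

select substr(name, 1, 10), salary * 1.05, current_date, 'ACTIVE'
  from empmast

The first column is a substring, the second a calculation, and the last two don't exist in the file at all.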

3. Aggregate Data. SQL lets you easily aggregate columns in like rows. For instance, if you wanted to find
a list of all the different zip codes in a mailing address file and count how many addresses were in each
zip code, SQL can easily accomplish this in a single statement. In native I/O you would have to loop
through a file and increment counter fields or use arrays and/or multiple-occurrence data structures to
aggregate like data. Combine this capability with 1 and 2 above and SQL absolutely shines.
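A sketch of that zip code example, assuming a mailing file named MAILADDR with a ZIP field:

select zip, count(*)
  from mailaddr
  group by zip
  order by zip

One statement returns each distinct zip code alongside the number of addresses in it.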

4. Performance. A constant debate rages on the e-mail lists about whether native I/O or SQL performs
better. The answer, as usual, is "it depends." If I were doing anything that falls into numbers 1-3 above, I
would use SQL. If you are doing record-level random retrieval (such as a CHAIN), then native I/O should
perform better. This is hardly a given, though: IBM's DB2 development is largely focused on improving
the SQL engine and its performance, so you may see different results at different OS levels. The SQL
engine determines access paths at execution time. If possible, the access plan is then stored in the
program object and reused on subsequent calls. Sometimes, as with dynamic SQL, this is not possible,
and the plan is recreated on each PREPARE execution. More information on access paths is available at
the IBM Information Center. As a general rule, you should not reference logical files in SQL statements;
reference the physical file name instead. If an appropriate access path does not exist, the engine must
generate one on the fly, so building appropriate indexes can greatly benefit your SQL performance.

There are many other capabilities of SQL; this list is hardly exhaustive. I just wanted to lay out some basic
ground rules. SQL is extremely flexible and capable, which is why I tend to prefer it, but I do not espouse
the total replacement of native I/O with SQL. Pick the right tool for the right job. While some argue that
you should use SQL because it is an "industry standard," I think that is a poor reason for completely
dismissing a unique and powerful tool like native I/O. The database access method you choose is not an
exclusive decision. You can use SQL and native I/O together in the same program. So you could use SQL
to select a group of records or aggregates to work with, perhaps selecting only the key fields, and then
use native I/O to CHAIN to another file using the keys to retrieve the detail.

Getting Started

In order to write programs using embedded SQL, you must have the DB2 Query Manager and SQL
Development Kit installed on your machine (that's licensed program 5722-ST1 according to my V5R2
box). A quick test to see if you already have this installed is to enter "STRSQL" on the command line. If
you have it, the "Interactive SQL" program will start, which you can use to execute ad hoc queries
against your DB2 database. If you do not have the Development Kit installed, you cannot write programs
that use SQL, so be sure you have this before you proceed.

It is important to note that you do not need the development kit or any additional runtimes to execute
programs written with embedded SQL. The SQL engine used to process the SQL commands is built into
the DB2 core, so you can run the programs anywhere. You just can't compile them anywhere. This can
be very handy if you manage multiple machines but only have a single development environment. While
you can still use a tool like Operations Navigator to execute SQL on machines without Interactive SQL,
neither one is a very good option for executing complex series of statements, especially if the function is
a recurring job. That's what programs are for!

In order to create an RPG source member that can use embedded SQL, you must use a source type of
SQLRPGLE. To compile an SQLRPGLE source member, use the "Create SQL ILE RPG Object" (CRTSQLRPGI)
command. If you are using PDM, option 14 will create a bound program, and option 15 will create a
*MODULE object. If you are issuing the command manually, use the OBJTYPE parameter to indicate
*PGM or *MODULE. Also, if your program updates a file that is not journaled, you must change the
command option COMMIT to *NONE, otherwise the updates will not occur. Personally, this was such a
constant hassle for me that I changed my command default.
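For example (the library, source file, and program names here are placeholders), compiling from the command line might look like this:

CRTSQLRPGI OBJ(MYLIB/MYPGM) SRCFILE(MYLIB/QRPGLESRC) SRCMBR(MYPGM)
           OBJTYPE(*PGM) COMMIT(*NONE)

This is essentially the command PDM option 14 runs behind the scenes, with COMMIT(*NONE) spelled out.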

Before too many of you more experienced SQL programmers out there send me a million e-mails: yes,
there are two other ways to handle the commitment control problem. First, you could add With NC to
your SQL statements. This signifies that no commitment control should be used on this particular
statement. The problem with this approach is the additional hassle of needing to remember it for every
statement that updates the database. Second, you could use the SET OPTION SQL statement. SET
OPTION is perhaps worthy of an article in its own right. It allows you to enforce certain program options
in the code itself rather than on the compile command:

c/exec sql
c+ set option commit=*none,
c+   datfmt=*iso
c/end-exec

There are more options for set, and many people swear by it. For whatever they are worth, here are my
issues with SET OPTION. First of all, I'd have to remember to add a block in every embedded SQL
program I write. Call it lazy or senile, I just don't like needing to remember stuff like that. By changing
the command default, I only have to remember it when I upgrade my OS level. Second, I've run into
some funky side effects when debugging programs that have SET OPTION blocks: namely the lines
presented don't appear to match the lines being executed. This may have been corrected in a PTF or
subsequent releases, but I haven't felt the need to address it since I changed my command defaults.

The SQL Pre-Compiler

SQLRPGLE programs go through a two-step compile process. Before the RPG code is sent to the
standard RPG IV compiler, it must first pass muster with the "SQL pre-compiler." The pre-compiler's
main job is to validate all of your SQL statements and convert them to dynamic program calls, after
which it passes the altered code on to the main RPG IV compiler. If you are interested in seeing what the
converted code looks like, set the LIST parameter to *PRINT and then view the spooled output. Your
original SQL statements will be commented and followed by the code inserted by the pre-compiler.

Now for "the good, the bad, and the ugly." The good news is that this process allows us to easily embed
SQL. If we had to write code to validate our own SQL and then develop our own API calls, embedding
SQL would be inordinately difficult. The bad is that the SQL pre-compiler team and the RPG IV compiler
team are not one and the same: in fact, they aren't even in the same location or the same division. This
has a tendency to create a lag between features supported in RPGLE and features supported in
SQLRPGLE. I'm not commenting on this state of affairs, just pointing it out. Over the last several years
the SQL pre-compiler team has done a lot of work to try to rectify this situation, and it is improving.

But there is some ugly. Should your compile fail at the pre-compiler level, the error listing you get back
will be of very little assistance compared to the RPG IV error listings. The messages are short and cryptic
and offer nothing in the way of corrective suggestions. Also, the pre-compiler appears to simply stop at
the first sign of trouble, so even if you use LIST(*PRINT) you only receive the source code up to the first
error. This may take a little getting used to and can surely be frustrating when you are first starting out.
My advice is to stick with it: with a little practice, your SQL statements won't cause a lot of compile-time
problems.

Embedding Basics

Issuing an SQL statement within your source code has some rules as well. In your RPG source code, you
must indicate to the SQL pre-compiler that you are in fact executing SQL. To do so, you must wrap all
SQL in /EXEC SQL and /END-EXEC compiler directives beginning in column 7:

c/exec sql
c+ create table midserve/sqltest
c+   ( id int not null primary key,
c+     name char(35) not null with default,
c+     address char(35) not null with default,
c+     city char(35) not null with default,
c+     state char(2) not null with default,
c+     zip numeric(5,0) not null with default )
c/end-exec

Notice that the lines between the two compiler directives all begin with c+. Also, only one SQL
statement is allowed in each pair of directives, so if you wanted to drop this table before issuing the
CREATE TABLE statement it would look like so:

c/exec sql
c+ drop table midserve/sqltest
c/end-exec

c/exec sql
c+ create table midserve/sqltest
c+   ( id int not null primary key,
c+     name char(35) not null with default,
c+     address char(35) not null with default,
c+     city char(35) not null with default,
c+     state char(2) not null with default,
c+     zip numeric(5,0) not null with default )
c/end-exec

You can insert SQL like this anywhere in your code, including subroutines and subprocedures. This SQL
doesn't use any program variables or return any values. Such SQL is not terribly functional beyond the
sort of example listed above, so let's turn our attention to using program variables with our SQL.

Host Variables

Host variable is a term that you see frequently in embedded SQL discussions. Since your RPG program is
the "host" of these SQL statements, the term host variable refers to any variable declared in the RPG
program. We can use these variables when we execute SQL as parameters or as return values. Host
variables are always preceded in SQL by a colon. The following example will add ten records to our
table created above by incrementing a counter variable and using the counter for the ID field:

d counter s 10i 0 inz

c for counter=1 to 10
c/exec sql
c+ insert into midserve/sqltest
c+ ( id )
c+ values( :counter )
c/end-exec
c endfor

You can easily mix and match host variable and literals as well. In this example, I use three host variables
and one literal as INSERT values.

d counter s 10i 0 inz
d city s 35a inz( 'STAUNTON' )
d zip s 5s 0 inz( 24401 )

c for counter=1 to 10
c/exec sql
c+ insert into midserve/sqltest
c+ ( id , city , state , zip )
c+   values( :counter ,
c+           :city ,
c+           'VA' ,
c+           :zip
c+         )
c/end-exec
c endfor

Granted, this code doesn't do a whole lot, but it does demonstrate how to reference a host variable. It
also demonstrates that, within the confines of the compiler directives, embedded SQL is basically free-
format. In this case, I shifted the SQL text over to be in line with the fixed-format code around it, but I
could have just as easily left it left-justified like the CREATE TABLE statements above. In the INSERT
statement I put all the field names I'm inserting on one row, but I listed each of the insert values on its
own row and indented the closing parenthesis of the VALUES clause. These are merely stylistic
demonstrations: you should experiment and find a style that is comfortable for you to code and easy for
you to read. It's also a good time to point out that the SQL engine is case-insensitive. I'm using lowercase
above, but you could just as easily use uppercase if it suits you. Of course, the literal value 'VA' above is
capitalized because that's how I want it in the database.

I should also point out that the host variable names are the same as the field names in the file. This is
not a coincidence but an intentional choice. Unlike native I/O, the fields of the file are not automatically
available to the program as variables. Because of this difference, there is no conflict in creating
variables with the same names. I could have just as easily used a variable called "JOELS_TOWN" instead
of "CITY".

There are some additional rules for host variable naming. I address them in more detail in the prior
article, but to sum them up in one fell swoop: the names should be unique. In other words, don't use
host variables in subprocedures that also exist as global names, and vice-versa. Personally, I make all my
host variables global variables. I will frequently prefix them with sql_ to prevent any questions. I am also
fond of using externally defined data structures for my host variables to prevent the SQL engine from
performing unnecessary translations. Above we defined ID as "int". If we tried to insert a variable
defined as "3S 0", the engine would have to convert it to "int" on the fly. To avoid this, I try to use
host variables that are defined exactly the same as the database fields. These conventions have worked
well for me and, once instituted as a standard, are easy to maintain.
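A sketch of that externally defined approach, assuming the SQLTEST table created earlier:

d sqlRec e ds extname('SQLTEST')

The subfields of sqlRec are pulled from the SQLTEST definition at compile time, so each host variable automatically matches its database field exactly, and a change to the file only requires a recompile.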

Returning a Value into a Host Variable

SQL reads data from a database using the SELECT statement. In embedded SQL, you must specify the
target(s) for the return value(s) to populate. We can accomplish this by adding the INTO clause in our
SELECT statement:

d counter s 10i 0 inz

c/exec sql
c+ select count(*)
c+ into :counter
c+ from midserve/sqltest
c/end-exec

c counter dsply
c eval *inlr = *on

This complete program counts the total number of records in a file and retrieves that value into a
program variable named "COUNTER". Notice again that the host variable is referenced in the SQL by the
use of the ":". As I implied before, you can also use a datastructure name as the return variable:

d myDS ds
d id 10a varying
d name 35a
d zip 5s 0

c/exec sql
c+ select trim(char(id)), name, zip
c+ into :myDS
c+ from midserve/sqltest
c/end-exec

I've mixed things up a bit here for demonstration purposes. ID is defined as an "int" in the database, but
I want the character version of that value, so I've used a couple of SQL functions to convert it for me.
More importantly though, I've retrieved 3 field values without needing to list all three host variables.
This is very handy, especially if you are retrieving large numbers of fields from a file.

Handling Multiple Records

If you display these results, you will only see the first record, even though there are 10 records in the file
and this particular SELECT statement should definitely return all 10 of them. Since SQL excels at
processing groups of data, this situation is fairly typical. In order to process multiple rows, we need a
facility for navigating the returned rows. In SQL, this is known as a cursor; in other languages and
interfaces it is frequently referred to as a RecordSet.

The order of events for using a cursor is thus:

Declare
Open
Fetch (Repeat as necessary)
Close

Declaring a cursor is fairly straightforward. Declaring a cursor for the select statement above would look
like this:

c/exec sql
c+ declare mainCursor cursor
c+ for
c+ select trim(char(id)), name, zip
c+ from midserve/sqltest
c/end-exec

The differences are the addition of the DECLARE and FOR clauses and the removal of the INTO clause. We
still need an INTO clause, but it will be part of the FETCH statement. In this example, "mainCursor" is the
name that we will use to reference this cursor: you can basically name cursors however you like, but as
always it's good advice to make the names mean something.

Now that we have declared a cursor, we need to indicate to the program that we are ready to use it.
This is important because cursors can be reusable (more on this later). To make the cursor ready for use,
we must open it:

c/exec sql
c+ open mainCursor
c/end-exec

This will prepare the cursor and the data for retrieval. Now we must fetch the data into our host
variable(s). At this point we have some options. The first option is to make "myDS" a multiple-occurrence
data structure (MODS) and fetch all 10 rows into our MODS at once. Below is a complete sample:

d myDS ds occurs(10)
d id 10a varying
d name 35a
d zip 5s 0
d i s 10i 0 inz

c/exec sql
c+ declare mainCursor cursor
c+ for
c+ select trim(char(id)), name, zip
c+ from midserve/sqltest
c/end-exec

c/exec sql
c+ open mainCursor
c/end-exec

c/exec sql
c+ fetch next
c+ from mainCursor
c+ for 10 rows
c+ into :myDS
c/end-exec

c for i=1 to %elem( myDS )
c eval %occur( myDS ) = i
c myDS dsply
c endfor

c eval *inlr = *on

The limitations of this approach should be obvious: it forces you to hard-code the number of
occurrences, which must be greater than or equal to the number of rows you fetch. You can return fewer
rows than the size of the MODS, but if you try to fetch more rows than the defined size of the MODS,
you will get an error. This hard coding may not be a problem, depending on the needs of your program.
One good example would be a program that totals sales figures by month: since there can only be twelve
(calendar) months, fetching all the records at once into a MODS with 12 occurrences makes perfect
sense.

Looping through a Cursor

For the majority of cases, however, you probably want to be able to loop through the cursor as if you
were reading a file. To do so, simply remove the for x rows clause. You will also need to add a
conditioning test, typically on SQLCOD or SQLSTT:

d myDS ds
d id 10a varying
d name 35a
d zip 5s 0

c/exec sql
c+ declare mainCursor cursor
c+ for
c+ select trim(char(id)), name, zip
c+ from midserve/sqltest
c/end-exec

c/exec sql
c+ open mainCursor
c/end-exec

c/exec sql
c+ fetch next
c+ from mainCursor
c+ into :myDS
c/end-exec

c dow SQLSTT = '00000'
c myDS dsply

c/exec sql
c+ fetch next
c+ from mainCursor
c+ into :myDS
c/end-exec

c enddo

c eval *inlr = *on

You can also make a cursor scrollable by adding the scroll clause to the declare statement:

c/exec sql
c+ declare mainCursor scroll cursor
c+ for
c+ select trim(char(id)), name, zip
c+ from midserve/sqltest
c/end-exec

If you do not include the scroll clause, the cursor can only be read forward, but a scrollable cursor has
the additional capabilities of fetch first, fetch last, and fetch prior. There are additional fetch capabilities
as well.
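For example, once the scrollable cursor above is open, stepping backward one row looks like this:

c/exec sql
c+ fetch prior
c+ from mainCursor
c+ into :myDS
c/end-exec

fetch first and fetch last work the same way, positioning to the first or last row of the result set.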

SQLCOD and SQLSTT

In the previous example I referred to these two variables, but you won't find them defined in the source
code. These variables both come from the "SQL Communications Area" (SQLCA). SQLCA is a data
structure that is automatically included with every SQLRPGLE source member:

D* SQL Communications area
D SQLCA DS
D SQLAID 1 8A
D SQLABC 9 12B 0
D SQLCOD 13 16B 0
D SQLERL 17 18B 0
D SQLERM 19 88A
D SQLERP 89 96A
D SQLERRD 97 120B 0 DIM(6)
D SQLERR 97 120A
D SQLER1 97 100B 0
D SQLER2 101 104B 0
D SQLER3 105 108B 0
D SQLER4 109 112B 0
D SQLER5 113 116B 0
D SQLER6 117 120B 0
D SQLWRN 121 131A
D SQLWN0 121 121A
D SQLWN1 122 122A
D SQLWN2 123 123A
D SQLWN3 124 124A
D SQLWN4 125 125A
D SQLWN5 126 126A
D SQLWN6 127 127A
D SQLWN7 128 128A
D SQLWN8 129 129A
D SQLWN9 130 130A
D SQLWNA 131 131A
D SQLSTT 132 136A
D* End of SQLCA

The fields most used from SQLCA are SQLCOD and SQLSTT. Both of these fields return information to the
program indicating the resulting state of the most recently executed SQL statement. SQLCODEs and
SQLSTATEs always have corresponding values, so you can check either one, but I find the SQLSTATEs to
be more informative.

Briefly, I primarily use SQLSTT values of '00000' and '02000'. '00000' indicates that the statement was
successful and has no warnings: when fetching, this indicates that a row was found. '02000' basically
means no rows were found. For a complete overview of Codes and States, refer to IBM's DB2 UDB for
iSeries SQLCODEs and SQLSTATEs manual.
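As a sketch, a typical not-found test after a single-row select against our sample table (the ID value here is arbitrary):

d name s 35a

c/exec sql
c+ select name
c+ into :name
c+ from midserve/sqltest
c+ where id = 5
c/end-exec

c if SQLSTT = '02000'
c 'no match' dsply
c endif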
